Preliminary evaluation of deep learning for first-line diagnostic prediction of tumor mutational status
Summary
Detecting tumour gene mutations via DNA or RNA sequencing is crucial for prescribing targeted therapies. Recent work has shown promising results for predicting tumoral mutational status using deep learning on histopathological images. However, the utility of these methods alongside sequencing for efficient population-level diagnosis remains uncertain.
In this retrospective study, we use a standard prediction pipeline based on a convolutional neural network for the detection of cancer driver genomic alterations in The Cancer Genome Atlas (TCGA) breast (BRCA, n = 719), lung (LUAD, n = 541) and colon (COAD, n = 459) cancer datasets.
We propose three diagnostic strategies that use deep learning methods as first-line diagnostic tools. Focusing on cancer driver genes such as KRAS, EGFR or TP53, we show that these methods can reduce the number of DNA sequencing tests by up to 49.9% while maintaining a high sensitivity (95%). In a context of limited resources, these methods increase sensitivity to up to 69.8% when DNA sequencing capacity covers 30% of patients, up to 85.1% at 50% capacity, and up to 91.8% at 70% capacity.
These methods can also be used to prioritize patients, with a positive predictive value of up to 90.6% among the 10% of patients most likely to carry the mutation. Limitations of this study include the lack of external validation on non-TCGA data, the dependence on the prevalence of mutations in the datasets, and the use of a standard deep learning method on a limited dataset. Future studies using state-of-the-art methods and larger datasets are needed for better evaluation and clinical implementation.
Introduction
Targeted cancer therapies are specialized and efficient treatments that have revolutionized cancer care in the last few years. Their higher degree of specialization requires increasingly detailed information about each patient, which in turn requires more specialized diagnostic tests. For example, the presence or absence of genomic mutations can be associated with the response to a targeted therapy: Wee1 inhibitors are efficient only on cancers in which TP53 is mutated. Somatic mutations are routinely detected by DNA sequencing. However, these tests face a threefold limitation: they have a long waiting period, require a large amount of tissue, and are expensive. Therefore, there is a growing need to identify new biomarkers, together with associated screening and diagnostic strategies, to improve the efficiency of diagnostic workflows in medical oncology.
More recently, deep learning methods have been used for many image analysis tasks in digital pathology, such as tumour detection, tumour subtyping, quantification of cell numbers, classification of cell types, and prediction of RNA-seq expression. They have also shown promising results for predicting mutational status from whole slide images (WSI) of tissue stained with hematoxylin and eosin. The seminal work of Coudray et al. showed that key mutations in lung cancer could be identified from histopathology slides. Many other studies have followed and have shown similar results in brain, bladder, colorectal, breast, gastric, and liver cancers, as well as in pan-cancer studies, demonstrating a link between histomorphology and genetic features. These studies mostly report the area under the ROC curve (AUC), a metric that is well suited to comparing different approaches but that does not by itself quantify the benefit of a method in clinical routine.
WSI are already produced routinely in the diagnostic workflow, and deep learning methods are cost-effective, applicable to any digitized slide, and highly scalable. Therefore, a deep learning-based solution that assesses the tumoral mutational status of a patient directly from the WSI appears to be a potentially valid diagnostic strategy. Here, we evaluate the benefits of using a standard deep learning pipeline for mutational status prediction on WSI in the diagnostic strategy for patients with breast, lung, and colorectal cancer. We simulate three diagnostic strategies in which the deep learning pipeline serves as a first-line diagnostic tool in a clinical context, before DNA sequencing is used.
The first strategy, “Save-all,” considers the number of diagnostic tests that can be avoided while preserving high sensitivity. The second strategy, “Fixed-Capacity,” considers, when only a limited number of diagnostic tests is available, the proportion of mutated patients that DNA sequencing identifies (sensitivity). In other words, it maximizes the number of patients who will later benefit from the associated targeted therapy for a given DNA testing capacity. The last strategy, “Prioritization,” considers the number of mutated patients found in a small part of the patient population selected for fast-tracking. The rationale behind this strategy is that the earlier patients have access to the best therapy, the higher their chance of remission might be.
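The three strategies can be illustrated with a minimal sketch on toy data. This is not the pipeline used in the study: the function names, the ranking of patients by predicted score, the rounding of capacities to whole patients, and the toy scores are all assumptions made for illustration, and ties between scores are simply broken by input order.

```python
import numpy as np


def _rank_labels(scores, labels):
    """Return true labels ordered from the highest to the lowest predicted score."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores, kind="stable")  # most suspicious patients first
    return labels[order]


def save_all(scores, labels, target_sensitivity=0.95):
    """'Save-all': fraction of sequencing tests avoided at a target sensitivity.

    Assumes at least one mutated patient in `labels`.
    """
    ranked = _rank_labels(scores, labels)
    found = np.cumsum(ranked)  # mutated patients found after testing the top k
    needed = int(np.ceil(target_sensitivity * ranked.sum()))
    # Smallest number of top-ranked patients to sequence to reach `needed`.
    n_tested = int(np.searchsorted(found, needed)) + 1
    return 1.0 - n_tested / len(ranked)


def fixed_capacity(scores, labels, capacity):
    """'Fixed-Capacity': sensitivity when only a fraction of patients is sequenced."""
    ranked = _rank_labels(scores, labels)
    n_tested = int(round(capacity * len(ranked)))
    return ranked[:n_tested].sum() / ranked.sum()


def prioritization(scores, labels, top_fraction=0.10):
    """'Prioritization': positive predictive value among the top-ranked patients."""
    ranked = _rank_labels(scores, labels)
    n_top = max(1, int(round(top_fraction * len(ranked))))
    return ranked[:n_top].mean()


# Hypothetical model scores and true mutational status for ten patients.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

print("tests avoided at 100% sensitivity:", save_all(toy_scores, toy_labels, 1.0))
print("sensitivity at 50% capacity:", fixed_capacity(toy_scores, toy_labels, 0.5))
print("PPV in top 20%:", prioritization(toy_scores, toy_labels, 0.2))
```

All three quantities are read off the same ranking of patients by predicted score, which is why a single model output can serve each strategy; only the cut-off rule differs.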
Finally, we show the relevance of our deep learning algorithm for each strategy in these realistic screening scenarios by reporting its efficiency for each gene that both has a predictable mutational status and is clinically relevant. We also demonstrate the efficiency of the “Fixed-Capacity” strategy in reducing population inequalities.