Retrospective Diagnostic Study of Ummon AutoReader Performance

A Tool for Prescreening Cytology of Cervical Lesions

Summary

Cervical cancer is a preventable disease, yet its persistent incidence highlights the need for more effective screening strategies. Among existing methods, p16/Ki-67 dual-stain cytology is noted for its cost-effectiveness, but its manual reading is time-consuming and carries a risk of overlooking precancerous cells.

To address these issues, we developed Ummon AutoReader, a software tool that automates the prescreening of cytology slides using a hybrid artificial intelligence approach, with the aim of minimizing the omission of precancerous cells and easing cytology reading.

Objectives: This study aims to evaluate the diagnostic accuracy of Ummon AutoReader by comparing the sensitivity and specificity of manual versus software-assisted readings of p16/Ki-67 dual-stain cytology. Additionally, it evaluates the analytical performance of Ummon AutoReader.

Methods: We analyzed a representative population of 110 cases collected from the routine workflow of a pathology laboratory. Manual diagnoses, retrieved from clinical records, were compared with software-assisted diagnoses made by three cytopathologists using Ummon AutoReader. All discordances were resolved by a consensus committee.

Results: Software-assisted diagnosis with Ummon AutoReader achieved a higher sensitivity than manual reading (100% vs. 81.9%), with identical specificity (100% for both methods). The software detected 98% of cells in a sample of 200 cells and distinguished dual-stained from non-dual-stained cells with an area under the curve (AUC) of 0.893. Inter-scanner consistency was confirmed in all 10 cases examined, and the inter-scanner correlation of cell scores increased from 0.749 (n=37) without calibration to 0.813 (n=39) with calibration.

Conclusion: Ummon AutoReader increases the sensitivity of p16/Ki-67 cytology reading without compromising specificity. It also shows robust inter-scanner reliability and eases cytological evaluation, making it a valuable tool for the early detection of cervical cancer.
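For readers less familiar with the metrics reported in the Results, the following Python sketch shows how slide-level sensitivity and specificity, a cell-level AUC, and an inter-scanner correlation of cell scores can be computed. It is an illustration only, not the analysis code used in this study: the function names and toy inputs are hypothetical, the consensus diagnosis is assumed to serve as the reference standard, and the correlation is assumed to be Pearson's coefficient, since the type of correlation is not specified here.

```python
from __future__ import annotations

import math
from typing import Sequence


def sensitivity_specificity(reference: Sequence[bool], predicted: Sequence[bool]) -> tuple[float, float]:
    """Slide-level sensitivity and specificity of `predicted` against `reference`
    (assumed here to be the consensus diagnosis). Assumes at least one
    reference-positive and one reference-negative slide."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # true positives
    fn = sum(1 for r, p in pairs if r and not p)      # missed positive slides
    tn = sum(1 for r, p in pairs if not r and not p)  # true negatives
    fp = sum(1 for r, p in pairs if not r and p)      # false alarms
    return tp / (tp + fn), tn / (tn + fp)


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Empirical AUC: probability that a dual-stained cell outscores a
    non-dual-stained cell, ties counted as 1/2 (normalized Mann-Whitney U)."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired cell scores from two scanners (assumed metric)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


# Toy inputs (made up, not study data).
print(sensitivity_specificity([True, True, False, False], [True, False, False, False]))  # (0.5, 1.0)
print(auc([0.9, 0.8], [0.1, 0.8]))  # 0.875
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
```

The pairwise AUC loop is quadratic in the number of cells; for large cell sets, a rank-based Mann-Whitney computation gives the same value more efficiently.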

Introduction

Cervical cancer is a significant global health concern, ranking among the top three cancers affecting women younger than 45 years worldwide (Arbyn et al., 2020). In 2022, an estimated 565,541 new cases of cervical cancer and 280,479 deaths from the disease occurred worldwide (Momenimovahed et al., 2023). The persistent incidence of the disease underscores the importance of effective screening strategies for its prevention and early detection (Bhatla & Singhal, 2020; Jansen et al., 2020; Perkins et al., 2023; Sawaya et al., 2019). These strategies often rely on primary HPV testing because of its high sensitivity for detecting precancerous lesions: a negative HPV test indicates a very low risk of cervical cancer over the following decade (Dillner et al., 2008; Gage et al., 2014; Katki et al., 2011). However, HPV testing has only moderate specificity because it cannot distinguish transient from persistent infections (Catarino et al., 2015), so HPV-positive results require additional triage before colposcopy referral (Cuschieri et al., 2018; Sawaya et al., 2019; Wentzensen et al., 2016). Triage often relies on cytology (Papanicolaou test), but the limited reproducibility of cytology leads to frequent retesting (Stoler et al., 2001; Wright Jr et al., 2014).

Another promising triage strategy is the concomitant detection, in the same cell, of p16, an HPV-activated protein, and Ki-67, a cell proliferation marker. Dual staining for p16/Ki-67 has demonstrated higher accuracy than cytology in detecting cervical precancerous lesions (Carozzi et al., 2013; Clarke et al., 2019; Ouh et al., 2024; Schmidt et al., 2011; Wentzensen et al., 2012, 2015, 2019; Wright et al., 2017). It is commercialized by Roche as the CINTec Plus™ technology (Bergeron et al., 2015; Schmidt et al., 2011), and health-economic studies have reported improved cost-effectiveness (Barré et al., 2017; Petry et al., 2017). Recently, the updated ASCCP cervical cancer management guidelines incorporated dual-stain testing for the triage of HPV-positive individuals, supporting the early diagnosis of cervical precancer and cancer (Clarke et al., 2024). These guidelines highlight that, compared with cytology, dual-stain triage requires fewer colposcopies and detects cervical intraepithelial neoplasia grade 3 or worse (CIN3+) earlier.

Nevertheless, the manual interpretation of CINTec Plus™ smear tests presents several challenges. Each smear requires meticulous double examination by a cytotechnologist and a cytopathologist, often involving the scrutiny of tens to hundreds of thousands of cells. Despite this thoroughness, positive cells, which may be rare within such a vast population, can still be missed. Automation through algorithmic software has emerged as a promising way to mitigate these challenges. Wentzensen et al. (2021) introduced a deep learning algorithm to automate p16/Ki-67 dual-stain reading and demonstrated the clinical relevance of such tools by showing that automated dual-stain reading provided better risk stratification than both Pap cytology and manual dual-stain reading. In that study, however, manual reading was performed in a study setting, which may have induced a Hawthorne effect (McCambridge et al., 2014), a change in behavior due to awareness of being observed that could inflate the sensitivity of manual reading relative to routine conditions. Furthermore, the algorithm was not externally validated on whole-slide images acquired with other scanners.

In this study, we conduct a comprehensive evaluation of Ummon AutoReader. First, we assess its diagnostic accuracy by comparing the sensitivity and specificity of software-assisted diagnosis with those of fully manual diagnosis on a dataset of 110 slides, focusing on the detection of slides containing at least one dual-stained cell. We then examine the individual components of the automated process, namely cell detection, scoring, and ranking, to characterize the performance at each stage. Lastly, we assess the robustness of the algorithm and the impact of the calibration process through an inter-scanner evaluation of 10 slides. Together, these analyses aim to characterize Ummon AutoReader's capabilities and its potential utility in clinical settings.
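The slide-level criterion described above (a slide is positive when it contains at least one dual-stained cell), combined with the scoring and ranking steps, can be illustrated with the minimal Python sketch below. It is hypothetical and does not reproduce Ummon AutoReader's implementation: the `Cell` record, the 0 to 1 score scale, the `top_k` gallery size, and the idea that the reader reviews only the highest-ranked cells are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cell:
    """A detected cell with a hypothetical dual-stain score in [0, 1]."""
    cell_id: int
    score: float


def rank_cells(cells: list[Cell], top_k: int) -> list[Cell]:
    """Return the top_k cells by descending score (assumed review gallery)."""
    return sorted(cells, key=lambda c: c.score, reverse=True)[:top_k]


def slide_is_positive(reader_calls: dict[int, bool]) -> bool:
    """Slide-level rule: positive if the reader confirms at least one
    dual-stained cell among the reviewed cells (cell_id -> confirmed?)."""
    return any(reader_calls.values())


# Toy example with made-up scores (not study data).
cells = [Cell(1, 0.20), Cell(2, 0.95), Cell(3, 0.60)]
gallery = rank_cells(cells, top_k=2)
print([c.cell_id for c in gallery])  # [2, 3]
reader_calls = {2: True, 3: False}  # hypothetical reader decisions
print(slide_is_positive(reader_calls))  # True
```

Under this assumed design, a truly dual-stained cell is missed only if it is not detected or is scored below the review cut-off, which is one reason to evaluate detection, scoring, and ranking separately.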