
Introduction
Predictive models could support clinicians in identifying patients who may benefit from cancer investigations. We aimed to examine published evidence on machine learning (ML) models developed to estimate cancer risk based on symptoms and other patient characteristics.
Methods
Using MEDLINE, Scopus, and EMBASE, we performed a systematic review of studies published between 2014 and 2024 that included data on signs/symptoms for cancer risk prediction. We used the QUADAS-AI tools to assess study quality. We performed a quantitative synthesis of diagnostic performance, including accuracy, sensitivity, specificity, and area under the curve (AUC). We also assessed adherence to TRIPOD guidelines.
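For readers less familiar with the performance measures named above, the following is a minimal sketch of how they are conventionally computed. It is not the review's own analysis code: the function names `diagnostic_metrics` and `auc_from_scores` are hypothetical, the AUC is computed with the standard rank-based (Mann-Whitney) formulation, and all counts and scores are illustrative rather than taken from any included study.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity, and specificity from a 2x2 confusion table.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives (hypothetical inputs).
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion table is empty")
    return {
        # Share of all patients classified correctly.
        "accuracy": (tp + tn) / total,
        # Share of patients with cancer that the model flags.
        "sensitivity": tp / (tp + fn) if (tp + fn) else math.nan,
        # Share of patients without cancer that the model clears.
        "specificity": tn / (tn + fp) if (tn + fp) else math.nan,
    }


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case,
    counting ties as one half (Mann-Whitney formulation)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Illustrative counts only; not data from the review.
    print(diagnostic_metrics(tp=40, fp=10, fn=10, tn=40))
    print(auc_from_scores([0.9, 0.8, 0.4], [0.1, 0.5, 0.2]))
```

Under this formulation, an AUC of 0.5 corresponds to a model that ranks cases no better than chance and an AUC of 1 to one that ranks every positive case above every negative case.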
Results
Of the 5646 articles initially identified, 34 met the inclusion criteria. Included studies most frequently examined lung cancer (n = 9), mesothelioma (n = 7), and gastrointestinal cancers (n = 4), and used hospital electronic health records (n = 8) or publicly available online datasets (n = 13). In addition to signs/symptoms (n = 34), most models included sociodemographic characteristics (n = 27) and lifestyle factors (n = 20). Internal validation was performed in 70% of studies. ML models demonstrated variable performance, with AUC values ranging from 0.60 to 1 during validation. Random Forest, Support Vector Machine, Decision Tree, and Multilayer Perceptron showed the best predictive performance. Most studies (94.1%) had a high risk of bias for the index test.
Conclusion
ML models have been reported to demonstrate potential in managing complex data for cancer risk prediction. However, the current evidence is heterogeneous and frequently limited by bias and incomplete reporting. Further validation and thorough assessments of real-world performance are necessary before these models can be considered reliable for clinical use.
Read full article: https://onlinelibrary.wiley.com/doi/10.1002/cam4.71463










