From data to diagnosis: A review of the current state of the art in lung cancer prediction using machine learning

Samuel Fanijo 1, *, Olusola Olabisi Ogunseye 2, Olumayowa Adeleke Idowu 3, Olufemi Olulaja 4 and Oghenetanure Ryan Enaworu 5

1 Department of Computer Science, Iowa State University, USA.
2 Department of Community, Environment and Policy, Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, Arizona, USA.
3 Department of Economics, University of Pittsburgh, Pittsburgh, USA.
4 African Center of Excellence for Genomics of Infectious Disease, Redeemer's University, Nigeria.
5 Department of Microbiology and Pharmacology, St. George’s University, St. George, Grenada.
 
Review
International Journal of Science and Research Archive, 2022, 06(02), 292–314.
Article DOI: 10.30574/ijsra.2022.6.2.0118
Publication history: 
Received on 12 April 2022; revised on 21 June 2022; accepted on 24 June 2022
 
Abstract: 
Lung cancer remains a leading cause of cancer-related mortality worldwide, with 1.8 million lung cancer-related deaths recorded in 2022 alone. This is often due to late-stage diagnosis and the complexity of the disease's molecular subtypes, which makes early detection and personalized treatment essential for improving patient outcomes. Predictive biomarkers (biological indicators that help detect, monitor, and guide treatment) help address this challenge. However, traditional methods of biomarker discovery often struggle to cope with the heterogeneity of lung cancer and the vast datasets generated by genomic, proteomic, and imaging technologies. In response, machine learning (ML) has emerged as a tool capable of analyzing these large, heterogeneous datasets and identifying novel biomarkers that conventional techniques may overlook. This paper reviews the current state of predictive biomarker discovery in lung cancer, focusing on the application of machine learning approaches. It examines the types of biomarkers used in lung cancer diagnosis and treatment, recent advancements in ML-driven biomarker discovery, and the challenges that persist, such as data quality, model overfitting, and interpretability. The paper concludes with recommendations for future research directions, emphasizing the need for improved data integration, better model interpretability, and clinical validation of biomarkers so that machine learning can realize its full potential in revolutionizing lung cancer care.
 
Keywords: 
Lung cancer; Predictive biomarkers; Artificial Intelligence; Machine learning; Genomic data
 