Using machine learning to understand and improve care and outcomes for patients with head and neck cancer

Access & Terms of Use
open access
Copyright: Kotevski, Damian
Abstract
Head and neck cancer (HNC) is a complex disease with diversity in treatment modality and survival by anatomical site of origin. There is limited knowledge of the utility of oncology information systems (OIS) for the collection and reporting of HNC data during routine clinical practice to investigate prognostic factors and predict head and neck cancer-specific survival (HNCSS). Routinely collected structured data were extracted from an OIS across seven major hospitals in Australia for patients diagnosed with HNC between 2000 and 2017 and treated with definitive radiotherapy. Deaths were obtained from the National Death Index via record linkage, and HNCSS was measured from the date of diagnosis until death from HNC. Open-source machine learning and nomogram models were used to predict HNCSS and perform multivariable analysis to identify prognostic factors. Descriptive and survival analyses were used to identify inter-hospital variation in data collection, primary radiotherapy treatment, and survival. A random sample of clinical radiation oncology documents from an OIS was anonymised using a customised open-source tool (Microsoft Presidio) to evaluate the use of unstructured information for medical research. Not all user-defined fields were routinely completed and not all hospitals relied solely on the OIS, with one hospital collecting disease information in a parallel database. However, structured information collected in a standardised way with minimal missing data during routine clinical practice in an OIS can be used to predict two-year HNCSS with high performance. Evidence of inter-hospital variation in data completeness, primary radiotherapy dose, and five-year HNCSS was detected. The presence of missing data in the OIS reduced the number of predictors for prognostic analysis and prevented exploratory analysis to explain differences in survival by hospital.
Lastly, applying the anonymisation tool to unstructured clinical information sourced from an OIS demonstrated safe and secure use for some fields and a need to improve the detection and removal of person names. Data mining techniques for unstructured data or strategies to improve structured data collection should be explored to enable the development of prediction models using more complete data, more patients, and more variables, followed by external validation to confirm model performance.
Publication Year
2023
Resource Type
Thesis
Degree Type
PhD Doctorate