Predicting Future Days in Hospital Using Health Insurance Claims

Access & Terms of Use
open access
Copyright: Xie, Yang
Abstract
Healthcare administrators worldwide are striving to lower the cost of care whilst improving the quality of care given. Hospitalisation is the largest component of health expenditure. Therefore, earlier identification of those at higher risk of being hospitalised would help healthcare administrators and health insurers to develop better plans and strategies. This thesis investigated how to utilise modern data-mining methods, and claims data from a large population collected across several years, to provide predictions of future hospitalisations.
Prior to modelling, an exploratory data analysis (EDA) was performed on the claims data set. The EDA aimed at understanding the properties of the data, inspecting qualitative features, and discovering new patterns and associations in the data through summarisation and visualisation. In addition, to ensure reproducibility in large-scale data analysis, a suite of software tools was developed, including functions for data pre-processing, feature engineering, modelling and result evaluation.
In the first experiment, a regression decision tree algorithm was used, along with data from 242,075 individuals over three years, to predict the number of days in hospital (DIH) in the third year, based on hospital admissions and procedure claims data from the initial two years. Results indicated that the proposed model significantly improved predictions over two established baseline methods (predicting a constant number of days for each customer, and using the number of days in hospital of the previous year as the forecast for the following year), and achieved reasonable accuracy (AUC=0.843) when evaluated on the whole population.
The second experiment further considered the hospital visits and historical claims data as temporal events, and developed a time series data mining approach to predict DIH. In the proposed method, the data were windowed at four different timescales (bi-monthly, quarterly, half-yearly and yearly) to construct regularly spaced time series features extracted from such events, resulting in four associated prediction models. These temporal models were evaluated and compared on their predictive performance in forecasting DIH. The non-yearly (i.e. half-yearly, quarterly and bi-monthly) models outperformed the yearly model when tested on the entire population.
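The windowing step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `window_totals`, the 24-month observation period, the sample events, and the choice to attribute a whole stay to its month of admission are all assumptions made for illustration. Only the four timescales come from the abstract.

```python
# Window lengths in months for the four timescales named in the abstract.
WINDOW_MONTHS = {"bi-monthly": 2, "quarterly": 3, "half-yearly": 6, "yearly": 12}


def window_totals(events, n_months, window_months):
    """Sum a per-event value into regularly spaced windows (hypothetical helper).

    events: iterable of (month_index, value) pairs, month_index in [0, n_months).
    Returns a list with one total per window, oldest window first.
    """
    if n_months % window_months != 0:
        raise ValueError("observation period must be a whole number of windows")
    series = [0] * (n_months // window_months)
    for month_index, value in events:
        if not 0 <= month_index < n_months:
            raise ValueError(f"month_index {month_index} outside observation period")
        # Integer division maps each month to the window that contains it.
        series[month_index // window_months] += value
    return series


# Made-up claims for one member over an assumed 24-month observation period:
# (month of admission, days in hospital for that admission).
events = [(0, 2), (1, 1), (7, 4), (13, 3), (23, 5)]

# One feature vector per timescale; each would feed its own prediction model.
features = {
    name: window_totals(events, n_months=24, window_months=w)
    for name, w in WINDOW_MONTHS.items()
}

for name, series in features.items():
    print(name, series)
```

The same events yield a longer, sparser series at finer timescales and a shorter, denser one at coarser timescales, which is the trade-off the four temporal models are compared on.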
Author(s)
Xie, Yang
Supervisor(s)
Lovell, Nigel
Redmond, Stephen
Publication Year
2017
Resource Type
Thesis
Degree Type
PhD Doctorate