Studies have found that a small portion of the population accounts for the majority of health care spending, highlighting the importance of predicting high-cost users for health care management and policy. Most prior research on high-cost user prediction models is based on diagnosis data, with additional cost and health care utilization data used to improve prediction accuracy. To further improve the prediction of high-cost users, researchers have been testing various new data sources, such as self-reported health status data. In this study, we use three categories of medical check-up data (laboratory tests, self-reported medical history, and self-reported health behavior) to build high-cost user prediction models and to assess medical check-up features as predictors of high-cost users. Using three data-mining models (logistic regression, random forest, and neural network), we show that, under the diagnosis-based approach, medical check-up data marginally improve diagnosis-based prediction models. Under the cost-based approach, we find that medical check-up data marginally improve cost-based prediction models and that medical check-up data can be a viable alternative to diagnosis data in predicting high-cost users.
Keywords: health care cost; health insurance; high-cost users; medical check-up; predictive models.