Prediction of hospitalization due to heart diseases by supervised learning methods

Wuyang Dai; Theodora S Brisimi; William G Adams; Theofanie Mela; Venkatesh Saligrama; Ioannis Ch Paschalidis

doi:10.1016/j.ijmedinf.2014.10.002

Int J Med Inform. 2015 Mar;84(3):189-97. doi: 10.1016/j.ijmedinf.2014.10.002. Epub 2014 Oct 16.

Authors

Wuyang Dai¹, Theodora S Brisimi¹, William G Adams², Theofanie Mela³, Venkatesh Saligrama¹, Ioannis Ch Paschalidis⁴

Affiliations

¹ Department of Electrical & Computer Engineering, and Division of Systems Engineering, Boston University, 8 Saint Mary's Street, Boston, MA 02215, United States.
² Department of Pediatrics, Boston University School of Medicine and Boston Medical Center, 88 East Concord Street, Boston, MA 02118, United States.
³ Electrophysiology Lab/Arrhythmia Service, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114, United States.
⁴ Department of Electrical & Computer Engineering, and Division of Systems Engineering, Boston University, 8 Saint Mary's Street, Boston, MA 02215, United States. Electronic address: yannisp@bu.edu.

Abstract

Background: In 2008, the United States spent $2.2 trillion on healthcare, 15.5% of its GDP, and 31% of that expenditure went to hospital care. Even modest reductions in hospital care costs therefore matter. A 2009 study showed that nearly $30.8 billion in hospital care costs during 2006 was potentially preventable, with heart diseases responsible for about 31% of that amount.

Methods: Our goal is to accurately and efficiently predict heart-related hospitalizations based on the available patient-specific medical history. To the best of our knowledge, the approaches we introduce are novel for this problem. The prediction of hospitalization is formulated as a supervised classification problem. We use de-identified Electronic Health Record (EHR) data from a large urban hospital in Boston to identify patients with heart diseases. Patients are labeled and randomly partitioned into a training and a test set. We apply five machine learning algorithms, namely Support Vector Machines (SVM), AdaBoost using trees as the weak learner, logistic regression, a naïve Bayes event classifier, and a variation of a Likelihood Ratio Test adapted to the specific problem. Each model is trained on the training set and then tested on the test set.
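As a rough illustration of the workflow described in Methods (label, randomly partition, train, test), the sketch below uses only the Python standard library. It is not the authors' implementation: the data are synthetic, the three binary features and their rates are invented for the example, the 60/40 split is an arbitrary choice, and the classifier is a simple naive-Bayes-style log-likelihood ratio score, not the paper's adapted Likelihood Ratio Test. All function names are hypothetical.

```python
import math
import random

def split_train_test(records, test_fraction=0.4, seed=0):
    """Randomly partition labeled records into a training and a test set."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_bernoulli(train, n_features):
    """Estimate P(feature=1 | label) for labels 0 and 1 with add-one smoothing."""
    counts = {0: [1] * n_features, 1: [1] * n_features}
    totals = {0: 2, 1: 2}
    for features, label in train:
        totals[label] += 1
        for j, x in enumerate(features):
            counts[label][j] += x
    return {y: [c / totals[y] for c in counts[y]] for y in (0, 1)}

def log_likelihood_ratio(features, probs):
    """Log of P(features | label=1) / P(features | label=0), assuming independent features."""
    score = 0.0
    for j, x in enumerate(features):
        p1, p0 = probs[1][j], probs[0][j]
        score += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
    return score

def make_synthetic(n, seed=1):
    """Synthetic stand-in for de-identified EHR data (assumption, not real data)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0  # 1 = hospitalized
        rates = (0.7, 0.6, 0.5) if label else (0.2, 0.3, 0.5)
        features = [1 if rng.random() < r else 0 for r in rates]
        records.append((features, label))
    return records

records = make_synthetic(1000)
train, test = split_train_test(records)
probs = fit_bernoulli(train, n_features=3)

# Score each held-out patient and classify with a threshold of 0 on the log ratio.
scores = [(log_likelihood_ratio(f, probs), y) for f, y in test]
predictions = [(1 if s > 0.0 else 0, y) for s, y in scores]
accuracy = sum(p == y for p, y in predictions) / len(predictions)
print(f"test accuracy: {accuracy:.3f}")
```

Keeping the test set untouched until the final scoring step is what makes the reported accuracy an estimate of performance on unseen patients.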

Results: All five models show consistent results, which may, to some extent, indicate the limit of achievable prediction accuracy. With a false alarm rate under 30%, the detection rate can be as high as 82%. If used in practice, these accuracy rates would translate to considerable potential savings.
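To make the two reported quantities concrete: the detection rate is the fraction of truly hospitalized patients that a classifier flags, and the false alarm rate is the fraction of non-hospitalized patients it flags. The sketch below, with made-up scores and labels and hypothetical function names, sweeps a decision threshold and picks the highest detection rate whose false alarm rate stays within a cap. It illustrates the metric only and does not reproduce the paper's numbers.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (detection_rate, false_alarm_rate) when flagging score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives

def best_detection_under(scores, labels, max_false_alarm):
    """Highest detection rate among thresholds with false alarm rate <= max_false_alarm."""
    best = (0.0, 0.0, None)  # (detection rate, false alarm rate, threshold)
    for t in sorted(set(scores), reverse=True):
        det, fa = rates_at_threshold(scores, labels, t)
        if fa <= max_false_alarm and det > best[0]:
            best = (det, fa, t)
    return best

# Illustrative classifier scores and true labels (1 = hospitalized); not real data.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

detection, false_alarm, threshold = best_detection_under(scores, labels, 0.30)
print(detection, false_alarm, threshold)
```

Sweeping the threshold over all scores traces out the ROC curve, and a statement such as "82% detection at under 30% false alarms" corresponds to one operating point on that curve.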

Keywords: Electronic Health Records (EHRs); Heart diseases; Hospitalization; Machine learning; Predictive models; Prevention.

Publication types

Comparative Study
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Artificial Intelligence*
Bayes Theorem
Boston
Electronic Health Records
Heart Diseases*
Hospitalization*
Humans
Likelihood Functions
Logistic Models
ROC Curve
Risk Assessment / methods*
