Large unbalanced credit scoring using Lasso-logistic regression ensemble

PLoS One. 2015 Feb 23;10(2):e0117844. doi: 10.1371/journal.pone.0117844. eCollection 2015.

Abstract

Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data are first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision trees, Lasso-logistic regression, and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
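
The pipeline outlined in the abstract (balance and diversify the data by clustering and bagging, fit a Lasso-logistic regression on each balanced subset, then aggregate and score by AUC and F-measure) can be illustrated with a minimal sketch. This is one plausible reading and not the authors' implementation: the abstract does not specify the clustering method, the subset sizes, the aggregation rule, or the optimizer. The k-means step, the averaging of predicted probabilities, the proximal-gradient solver, and every function name and parameter value below are assumptions made for illustration, and the toy data is synthetic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_lasso_logistic(X, y, lam=0.01, lr=0.1, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent (assumed solver)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        w = w - lr * (X.T @ (p - y) / n)
        # Soft-thresholding is the proximal step for the L1 (Lasso) penalty.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
        b -= lr * np.mean(p - y)
    return w, b

def kmeans_labels(X, k, rng, n_iter=20):
    """Plain k-means (assumed clustering method) returning a cluster label per row."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def fit_ensemble(X, y, n_models=10, k=5, lam=0.01, seed=0):
    """Train one Lasso-logistic model per balanced, diversified subset."""
    rng = np.random.default_rng(seed)
    X_min, X_maj = X[y == 1], X[y == 0]
    labels = kmeans_labels(X_maj, k, rng)
    per_cluster = max(1, len(X_min) // k)
    models = []
    for _ in range(n_models):
        # Bagging: bootstrap the minority class.
        boot = X_min[rng.integers(0, len(X_min), size=len(X_min))]
        # Draw the same number of majority rows from every non-empty cluster.
        parts = []
        for j in range(k):
            members = X_maj[labels == j]
            if len(members):
                parts.append(members[rng.integers(0, len(members), size=per_cluster)])
        maj = np.vstack(parts)
        Xb = np.vstack([boot, maj])
        yb = np.concatenate([np.ones(len(boot)), np.zeros(len(maj))])
        models.append(fit_lasso_logistic(Xb, yb, lam=lam))
    return models

def predict_proba(models, X):
    """Aggregate by averaging member probabilities (assumed aggregation rule)."""
    return np.mean([sigmoid(X @ w + b) for w, b in models], axis=0)

def auc(y, scores):
    """Rank-based AUC (Mann-Whitney statistic); ties are not averaged."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def f_measure(y, scores, threshold=0.5):
    """F1 score at a fixed probability threshold."""
    pred = scores >= threshold
    tp = np.sum(pred & (y == 1))
    if tp == 0:
        return 0.0
    precision, recall = tp / pred.sum(), tp / np.sum(y == 1)
    return 2 * precision * recall / (precision + recall)

def make_toy_data(seed=1, n_maj=2000, n_min=100, d=5):
    """Synthetic unbalanced data: only the first two features carry signal."""
    rng = np.random.default_rng(seed)
    X_maj = rng.normal(size=(n_maj, d))
    X_min = rng.normal(size=(n_min, d))
    X_min[:, :2] += 1.5
    X = np.vstack([X_maj, X_min])
    y = np.concatenate([np.zeros(n_maj, dtype=int), np.ones(n_min, dtype=int)])
    return X, y
```

A typical call would be `X, y = make_toy_data()`, then `models = fit_ensemble(X, y)` and `scores = predict_proba(models, X)`, followed by `auc(y, scores)` and `f_measure(y, scores)`. In practice the metrics should be computed on held-out data, and the two importance measures mentioned in the abstract are not reproduced here because the abstract does not define them.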

MeSH terms

  • Algorithms*
  • Financial Management / methods
  • Financial Management / statistics & numerical data*
  • Humans
  • Logistic Models*
  • Machine Learning
  • Models, Economic*
  • Reproducibility of Results

Grants and funding

The authors received no specific funding for this work.