Semisupervised Feature Selection Based on Relevance and Redundancy Criteria

IEEE Trans Neural Netw Learn Syst. 2017 Sep;28(9):1974-1984. doi: 10.1109/TNNLS.2016.2562670. Epub 2016 May 20.

Abstract

Feature selection aims to gain relevant features for improved classification performance and remove redundant features for reduced computational cost. How to balance these two factors is a problem especially when the categorical labels are costly to obtain. In this paper, we address this problem using semisupervised learning method and propose a max-relevance and min-redundancy criterion based on Pearson's correlation (RRPC) coefficient. This new method uses the incremental search technique to select optimal feature subsets. The new selected features have strong relevance to the labels in supervised manner, and avoid redundancy to the selected feature subsets under unsupervised constraints. Comparative studies are performed on binary data and multicategory data from benchmark data sets. The results show that the RRPC can achieve a good balance between relevance and redundancy in semisupervised feature selection. We also compare the RRPC with classic supervised feature selection criteria (such as mRMR and Fisher score), unsupervised feature selection criteria (such as Laplacian score), and semisupervised feature selection criteria (such as sSelect and locality sensitive). Experimental results demonstrate the effectiveness of our method.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.