Stopping rules for mutual information-based feature selection
Jan Mielniczuk, Paweł Roman Teisseyre
Abstract
In recent years, feature selection methods based on mutual information have attracted significant attention. Most of the proposed methods use sequential forward search, which at each step adds the feature that is most relevant in explaining the class variable when considered together with the already chosen features. Such procedures produce a ranking of features ordered by relevance. However, a significant limitation of all existing methods is the lack of stopping rules that separate the relevant features at the top of the ranking from the irrelevant ones. An appropriate stopping rule is particularly important in domains where one wants to determine precisely the set of features affecting the class variable and discard the irrelevant ones (e.g. in genome-wide association studies, the goal is to pinpoint the DNA mutations affecting a disease). In this work we propose stopping rules based on the distribution of an approximation of conditional mutual information, given that all relevant features have already been selected. We show that this distribution is approximately chi-square, with an appropriate number of degrees of freedom, provided the features are discretized into a moderate number of bins. The proposed stopping rules are based on quantiles of this distribution and the related p-values, which are compared with thresholds used in multiple hypothesis testing. Importantly, the proposed methods do not require additional validation data and are independent of the classifier. Extensive simulation experiments indicate that the rules separate relevant features from irrelevant ones. We show experimentally that the Positive Selection Rate (the fraction of relevant features that are correctly selected) approaches 1 as the sample size increases, while the False Discovery Rate (the fraction of selected features that are irrelevant) remains controlled.
Experiments on 17 benchmark datasets indicate that in 13 cases the classification models built on features selected by the proposed methods achieve significantly higher accuracy than models based on all available features.
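The stopping rule sketched in the abstract can be illustrated in a few lines of Python. This is our own illustrative sketch, not the authors' code: it assumes discrete (binned) features, encodes the already-selected features as a single integer-valued array `z`, and uses the standard asymptotic result that, under conditional independence, 2N·Î(X;Y|Z) is approximately chi-square with (|X|−1)(|Y|−1)|Z| degrees of freedom. The PSR/FDR helpers mirror the definitions given above; all function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def cond_mutual_info(x, y, z):
    """Plug-in estimate of I(X;Y|Z) in nats for 1-D discrete arrays.

    `z` encodes the joint levels of the already-selected features
    (e.g. tuples of their binned values mapped to integers).
    """
    total = 0.0
    for zv in np.unique(z):
        mask = z == zv
        pz = mask.mean()                      # P(Z = zv)
        xs, ys = x[mask], y[mask]
        for xv in np.unique(xs):
            for yv in np.unique(ys):
                pxy = np.mean((xs == xv) & (ys == yv))   # P(x, y | z)
                if pxy > 0:
                    px = np.mean(xs == xv)               # P(x | z)
                    py = np.mean(ys == yv)               # P(y | z)
                    total += pz * pxy * np.log(pxy / (px * py))
    return total

def stopping_p_value(x, y, z):
    """p-value for H0: X independent of Y given Z, via the
    approximation 2N * I_hat ~ chi2 with (|X|-1)(|Y|-1)|Z| df."""
    n = len(x)
    df = (len(np.unique(x)) - 1) * (len(np.unique(y)) - 1) * len(np.unique(z))
    stat = 2.0 * n * cond_mutual_info(x, y, z)
    return chi2.sf(stat, df)

def positive_selection_rate(selected, relevant):
    """Fraction of relevant features that were correctly selected."""
    return len(set(selected) & set(relevant)) / len(relevant)

def false_discovery_rate(selected, relevant):
    """Fraction of selected features that are irrelevant."""
    return len(set(selected) - set(relevant)) / max(len(selected), 1)
```

In a forward search, one would stop as soon as `stopping_p_value` for the best remaining candidate exceeds a multiple-testing-corrected threshold (e.g. a Bonferroni-adjusted significance level).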
|Journal series||Neurocomputing, ISSN 0925-2312, e-ISSN 1872-8286, (N/A 100 pkt)|
|Publication size in sheets||0.95|
|Keywords in English||Entropy; Mutual information; Interaction information; Feature selection; Multiple hypothesis testing; Stopping rules|
|Score||100.0, 09-01-2020, ArticleFromJournal|
* The citation count presented is obtained through Internet information analysis and is close to the number calculated by the Publish or Perish system.