Stopping rules for mutual information-based feature selection

Jan Mielniczuk , Paweł Roman Teisseyre


In recent years feature selection methods based on mutual information have attracted a significant attention. Most of the proposed methods are based on sequential forward search which add at each step a feature which is most relevant in explaining a class variable when considered together with the already chosen features. Such procedures produce ranking of features ordered according to their relevance. However significant limitation of all existing methods is lack of stopping rules which separate relevant features placed on the top of the ranking list from irrelevant ones. Finding an appropriate stopping rule is particularly important in domains where one wants to precisely determine the set of features affecting the class variable and discard the irrelevant ones (e.g. in genome-wide association studies the goal is to precisely determine mutations in DNA affecting the disease). In this work we propose stopping rules which are based on distribution of approximation of conditional mutual information given that all relevant features have been already selected. We show that the distribution is approximately chi square with appropriate number of degrees of freedom provided features are discretized into moderate number of bins. The proposed stopping rules are based on quantiles of the distribution and related p-values which are compared with thresholds used in multiple hypothesis testing. Importantly the proposed methods do not require additional validation data and are independent from the classifier. The extensive simulation experiments indicate that the rules separate relevant features from the irrelevant ones. We show experimentally that Positive Selection Rate (fraction of features correctly selected as relevant with respect to all relevant features) approaches 1, when sample size increases. At the same time, False Discovery Rate (fraction of irrelevant features selected with respect to all selected features) is controlled. The experiments on 17 benchmark datasets indicate that the classification models, built on features selected by the proposed methods, in 13 cases achieve significantly higher accuracy than the models based on all available features.
Author Jan Mielniczuk (FMIS / DSPFM) - [Instytut Podstaw Informatyki Polskiej Akademii Nauk (IPI PAN) [Polish Academy of Sciences (PAN)]]
Jan Mielniczuk,,
- Department of Stochastic Processes and Financial Mathematics
- Instytut Podstaw Informatyki Polskiej Akademii Nauk
, Paweł Roman Teisseyre (FMIS)
Paweł Roman Teisseyre,,
- Faculty of Mathematics and Information Science
Journal seriesNeurocomputing, ISSN 0925-2312, e-ISSN 1872-8286
Issue year2019
Publication size in sheets0.95
Keywords in Polishreguła stopu, teoria informacji, warunkowa informacja wzajemna
Keywords in Englishstopping rule, information theory, variable selection
ASJC Classification2805 Cognitive Neuroscience; 1702 Artificial Intelligence; 1706 Computer Science Applications
Abstract in PolishW pracy wyprowadzono rozkład asymptotyczny przybliżenia CIFE dla warunkowej informacji wzajemnej i wykorzystano go do konstrukcji reguły stopu w problemie regresji. Działanie metody zbadano na zbiorach syntetycznych i rzeczywistych.
Languageen angielski
Score (nominal)140
Score sourcejournalList
ScoreMinisterial score = 140.0, 05-02-2020, ArticleFromJournal
Publication indicators WoS Citations = 1; Scopus SNIP (Source Normalised Impact per Paper): 2017 = 1.516; WoS Impact Factor: 2018 = 4.072 (2) - 2018=3.824 (5)
Citation count*
Share Share

Get link to the record

* presented citation count is obtained through Internet information analysis and it is close to the number calculated by the Publish or Perish system.
Are you sure?