What Do We Choose When We Err? Model Selection and Testing for Misspecified Logistic Regression Revisited

Jan Mielniczuk, Paweł Roman Teisseyre

Abstract

The problem of fitting a logistic regression to a binary model while allowing for misspecification of the response function is reconsidered. We introduce a two-stage procedure which first orders the predictors with respect to the deviances of the models with the predictor in question omitted, and then chooses the minimizer of the Generalized Information Criterion in the resulting nested family of models. In contrast to an exhaustive search, this allows a large number of potential predictors to be considered. We prove that the procedure consistently chooses the model t∗ which is closest, in the averaged Kullback-Leibler sense, to the true binary model t. We then consider the interplay between t and t∗ and prove that, for a monotone response function, t∗ is necessarily nonempty when there is genuine dependence of the response on the predictors. This implies consistency of the deviance test of significance under misspecification. For a class of distributions of predictors that includes the normal family, Ruud’s result asserts that t∗ = t. Numerical experiments reveal that, for normally distributed predictors, the probability of correct selection and the power of the deviance test depend monotonically on Ruud’s proportionality constant η.
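The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `logistic_deviance` and `two_stage_select`, the choice of penalty a_n = log n (the BIC-type case of a Generalized Information Criterion of the form deviance + a_n · model size), the Newton solver, and the simulated data with a Cauchy-cdf response are all assumptions made for illustration.

```python
import numpy as np


def logistic_deviance(X, y, n_iter=50, tol=1e-8):
    """Fit a logistic model with intercept by Newton's method; return -2 * log-likelihood."""
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])      # design matrix with intercept column
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        eta = Z @ beta
        prob = 1.0 / (1.0 + np.exp(-eta))
        w = prob * (1.0 - prob)
        grad = Z.T @ (y - prob)               # score vector
        hess = Z.T @ (Z * w[:, None])         # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = Z @ beta
    return -2.0 * np.sum(y * eta - np.logaddexp(0.0, eta))


def two_stage_select(X, y, penalty=None):
    """Order predictors by leave-one-out deviance, then minimize GIC over the nested family."""
    n, p = X.shape
    a_n = np.log(n) if penalty is None else penalty   # assumed BIC-type penalty
    # Stage 1: deviance of the full model with predictor j omitted.
    # A larger deviance after omission marks a more important predictor.
    dev_omit = np.array(
        [logistic_deviance(np.delete(X, j, axis=1), y) for j in range(p)]
    )
    order = np.argsort(-dev_omit)
    # Stage 2: GIC for the nested models built from the first k ordered predictors.
    gic = [logistic_deviance(X[:, order[:k]], y) + a_n * k for k in range(p + 1)]
    k_best = int(np.argmin(gic))
    return order, sorted(order[:k_best].tolist())


# Simulated example (assumed setup): normal predictors, only the first two matter,
# and the true response function is the Cauchy cdf rather than the logistic one.
rng = np.random.default_rng(0)
n, p = 2000, 6
X = rng.standard_normal((n, p))
s = 1.5 * X[:, 0] - 1.5 * X[:, 1]
q = 0.5 + np.arctan(s) / np.pi
y = (rng.random(n) < q).astype(float)

order, selected = two_stage_select(X, y)
print("ordering:", order.tolist())
print("selected:", selected)
```

The sketch fits p + 1 models in the ordering stage and p + 1 models in the selection stage, in contrast to the 2^p fits an exhaustive search would need. Because the predictors here are normal, one would expect the selected set to contain the two truly active predictors despite the misspecified response function, in line with the result cited in the abstract.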
Authors
Jan Mielniczuk (FMIS / DSPFM) [Instytut Podstaw Informatyki PAN (IPI PAN), MNiSW [80], Polska Akademia Nauk (PAN)]
- Department of Stochastic Processes and Financial Mathematics
- Instytut Podstaw Informatyki PAN
Paweł Roman Teisseyre (FMIS)
- Faculty of Mathematics and Information Science
Pages 271-296
Publication size in sheets 1.25
Book Matwin Stan, Mielniczuk Jan (eds.): Challenges in Computational Statistics and Data Mining, Studies in Computational Intelligence, vol. 605, 2016, Springer International Publishing, ISBN 978-3-319-18780-8, [978-3-319-18781-5], 399 p., DOI:10.1007/978-3-319-18781-5
Keywords in English Incorrect model specification; Variable selection; Logistic regression
ASJC Classification 1702 Artificial Intelligence
DOI 10.1007/978-3-319-18781-5_15
URL https://link.springer.com/chapter/10.1007%2F978-3-319-18781-5_15
Language English
Score (nominal) 0
Publication indicators Scopus SNIP (Source Normalised Impact per Paper): 2016 = 0.376