What Do We Choose When We Err? Model Selection and Testing for Misspecified Logistic Regression Revisited
Jan Mielniczuk, Paweł Roman Teisseyre
Abstract: The problem of fitting logistic regression to a binary model allowing for misspecification of the response function is reconsidered. We introduce a two-stage procedure which consists in first ordering predictors with respect to the deviances of the models with the predictor in question omitted, and then choosing the minimizer of a Generalized Information Criterion (GIC) in the resulting nested family of models. This allows a large number of potential predictors to be considered, in contrast to an exhaustive method. We prove that the procedure consistently chooses the model t∗ which is the closest, in the averaged Kullback–Leibler sense, to the true binary model t. We then consider the interplay between t and t∗ and prove that for a monotone response function, when there is genuine dependence of the response on the predictors, t∗ is necessarily nonempty. This implies consistency of a deviance test of significance under misspecification. For a class of distributions of the predictors, including the normal family, Ruud's result asserts that t∗ = t. Numerical experiments reveal that for normally distributed predictors the probability of correct selection and the power of the deviance test depend monotonically on Ruud's proportionality constant η.
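The two-stage procedure described above can be sketched in code. This is a minimal illustration, not the authors' implementation: the BIC-like GIC penalty a_n = log n, the use of scikit-learn's nearly unpenalized `LogisticRegression` as the fitting routine, and the helper names `deviance` and `two_stage_gic` are all assumptions introduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def deviance(X, y):
    """Residual deviance of a logistic fit (intercept always included)."""
    if X.shape[1] == 0:
        p = np.full(len(y), y.mean())  # intercept-only model
    else:
        # large C makes the sklearn fit close to unpenalized ML (assumption)
        m = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
        p = m.predict_proba(X)[:, 1]
    return 2 * len(y) * log_loss(y, p)

def two_stage_gic(X, y, penalty=None):
    n, p = X.shape
    a_n = np.log(n) if penalty is None else penalty  # BIC-like choice (assumption)
    full_dev = deviance(X, y)
    # Stage 1: rank each predictor by the deviance increase when it is omitted
    scores = [deviance(np.delete(X, j, axis=1), y) - full_dev for j in range(p)]
    order = np.argsort(scores)[::-1]  # most important predictors first
    # Stage 2: minimize GIC = deviance + a_n * (number of parameters)
    # over the nested family built from the ordering
    best, best_gic = (), np.inf
    for k in range(p + 1):
        subset = sorted(order[:k])
        gic = deviance(X[:, subset], y) + a_n * (k + 1)  # +1 for the intercept
        if gic < best_gic:
            best, best_gic = tuple(subset), gic
    return best
```

Because the search is restricted to the p + 1 nested models induced by the ordering, only O(p) fits are needed in each stage, in contrast to the 2^p fits of an exhaustive search.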
Publication size in sheets: 1.25
In: Matwin Stan, Mielniczuk Jan (eds.): Challenges in Computational Statistics and Data Mining, Studies in Computational Intelligence, vol. 605, Springer International Publishing, 2016, ISBN 978-3-319-18780-8 [978-3-319-18781-5], 399 p., DOI: 10.1007/978-3-319-18781-5
Keywords in English: Incorrect model specification; Variable selection; Logistic regression
Publication indicators: 2016 = 0.376