An, Hongzhi, Huang, Da, Yao, Qiwei and Zhang, Cun-Hui
Stepwise searching for feature variables in high-dimensional linear regression.
The London School of Economics and Political Science, London, UK.
We investigate the classical stepwise forward and backward search methods for selecting sparse models in linear regression when the number of candidate variables p exceeds the number of observations n. In the noiseless case, we give explicit upper bounds on the number of forward search steps needed to recover all relevant variables, provided each step of the forward search is approximately optimal, up to a constant fraction, in reducing the residual sum of squares. Under mild conditions, these upper bounds are of the same order as the size of the true sparse model. In the presence of noise, traditional information criteria such as BIC and AIC are designed for p < n and may fail spectacularly when p > n. To overcome this difficulty, two information criteria, BICP and BICC, are proposed to serve as stopping rules for the stepwise searches. The forward search with noise is proved to be, with high probability, approximately optimal relative to the optimal noiseless forward search, so the upper bounds on the number of steps still apply. The proposed BICP is proved to stop the forward search as soon as all relevant variables are recovered, and to remove all redundant variables in the backward deletion. This yields selection consistency of the estimated models. The proposed methods are illustrated in a simulation study, which indicates that they outperform a LASSO selector whose penalty parameter is set at a fixed value.
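The greedy forward search described in the abstract can be sketched as follows. This is a minimal illustrative implementation of the noiseless case only: at each step it adds the candidate variable giving the largest reduction in the residual sum of squares (RSS), with a simple fixed step budget rather than the paper's BICP/BICC stopping rules; all function and variable names are hypothetical.

```python
import numpy as np

def forward_stepwise(X, y, max_steps=None):
    """Greedy forward search: at each step, add the candidate variable
    that most reduces the residual sum of squares (RSS).

    Returns the search path as a list of (selected_indices, rss) pairs,
    one entry per step. Stopping uses a fixed step budget, not the
    BICP/BICC criteria proposed in the paper.
    """
    n, p = X.shape
    if max_steps is None:
        max_steps = min(n - 1, p)
    selected, remaining, path = [], list(range(p)), []
    for _ in range(max_steps):
        best_rss, best_j = np.inf, None
        for j in remaining:
            cols = selected + [j]
            # Least-squares fit on the current candidate set.
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = float(np.sum((y - X[:, cols] @ beta) ** 2))
            if rss < best_rss:
                best_rss, best_j = rss, j
        selected.append(best_j)
        remaining.remove(best_j)
        path.append((list(selected), best_rss))
    return path

# Noiseless toy example with p = 10 candidates, of which only
# variables 0 and 3 are relevant.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3]
path = forward_stepwise(X, y, max_steps=2)
# In the noiseless case the search recovers {0, 3} in two steps,
# after which the RSS is numerically zero.
print(sorted(path[-1][0]), path[-1][1])
```

The paper's point is that with noise this greedy step remains approximately optimal with high probability, so the same step-count bounds apply; a data-driven stopping rule such as the proposed BICP then replaces the fixed `max_steps` budget.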