LSE Research Online LSE Library Services

Ranking-based variable selection for high-dimensional data

Baranowski, Rafal, Chen, Yining (ORCID: 0000-0003-1697-1920) and Fryzlewicz, Piotr (ORCID: 0000-0002-9676-902X) (2020) Ranking-based variable selection for high-dimensional data. Statistica Sinica, 30 (3), pp. 1485-1516. ISSN 1017-0405

Text - Accepted Version

Identification Number: 10.5705/ss.202017.0139

Abstract

We propose a ranking-based variable selection (RBVS) technique that identifies important variables influencing the response in high-dimensional data. RBVS uses subsampling to identify the covariates that appear nonspuriously at the top of a chosen variable ranking. We study the conditions under which such a set is unique, and show that it can be recovered successfully from the data by our procedure. Unlike many existing high-dimensional variable selection techniques, among all relevant variables, RBVS distinguishes between important and unimportant variables, and aims to recover only the important ones. Moreover, RBVS does not require model restrictions on the relationship between the response and the covariates, and, thus, is widely applicable in both parametric and nonparametric contexts. Lastly, we illustrate the good practical performance of the proposed technique by means of a comparative simulation study. The RBVS algorithm is implemented in rbvs, a publicly available R package.
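The subsampling idea in the abstract can be illustrated with a small sketch. It is a simplified toy, not the authors' algorithm and not the interface of the rbvs R package. It assumes absolute Pearson correlation as the variable ranking, fixes the size k of the top set by hand, and uses hypothetical names (`abs_pearson`, `top_set_frequencies`) and synthetic data invented for the illustration. It draws repeated subsamples, ranks the covariates on each one, and records how often each set of k covariates occupies the top of the ranking.

```python
import random
from collections import Counter


def abs_pearson(x, y):
    """Absolute Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def top_set_frequencies(X, y, k, m, n_subsamples, rng):
    """Fraction of subsamples in which each k-subset of columns tops the ranking.

    X is a list of rows, y the response, m the subsample size (drawn without
    replacement). The ranking measure here is an assumption of this sketch.
    """
    n, p = len(X), len(X[0])
    counts = Counter()
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), m)
        ys = [y[i] for i in idx]
        scores = [abs_pearson([X[i][j] for i in idx], ys) for j in range(p)]
        top = sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]
        counts[frozenset(top)] += 1
    return {s: c / n_subsamples for s, c in counts.items()}


# Synthetic example: 20 covariates, only columns 0 and 1 drive the response.
rng = random.Random(0)
n, p = 200, 20
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] + 3 * row[1] + rng.gauss(0, 1) for row in X]

freqs = top_set_frequencies(X, y, k=2, m=100, n_subsamples=50, rng=rng)
best = max(freqs, key=freqs.get)  # most frequently top-ranked pair of columns
print(sorted(best), freqs[best])
```

A set that tops the ranking in nearly every subsample is unlikely to be there by chance, whereas covariates that reach the top only occasionally are candidates for spurious appearances. Choosing k from the data, rather than fixing it as done here, is outside the scope of this sketch.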

Item Type: Article
Official URL: http://www3.stat.sinica.edu.tw/statistica/
Additional Information: © 2020 Institute of Statistical Science, Academia Sinica
Divisions: Statistics
Subjects: H Social Sciences > HA Statistics
Date Deposited: 18 Sep 2018 10:02
Last Modified: 20 Oct 2021 01:29
URI: http://eprints.lse.ac.uk/id/eprint/90233
