LSE Research Online LSE Library Services

The MIDAS touch: accurate and scalable missing-data imputation with deep learning

Lall, Ranjit ORCID: 0000-0003-1455-3506 and Robinson, Thomas ORCID: 0000-0001-7097-1599 (2022) The MIDAS touch: accurate and scalable missing-data imputation with deep learning. Political Analysis, 30 (2). 179 - 196. ISSN 1047-1987

Full text: The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning - Accepted Version (3MB)

Identification Number: 10.1017/pan.2020.49

Abstract

Principled methods for analyzing missing values, based chiefly on multiple imputation, have become increasingly popular yet can struggle to handle the kinds of large and complex data that are also becoming common. We propose an accurate, fast, and scalable approach to multiple imputation, which we call MIDAS (Multiple Imputation with Denoising Autoencoders). MIDAS employs a class of unsupervised neural networks known as denoising autoencoders, which are designed to reduce dimensionality by corrupting and attempting to reconstruct a subset of data. We repurpose denoising autoencoders for multiple imputation by treating missing values as an additional portion of corrupted data and drawing imputations from a model trained to minimize the reconstruction error on the originally observed portion. Systematic tests on simulated as well as real social science data, together with an applied example involving a large-scale electoral survey, illustrate MIDAS’s accuracy and efficiency across a range of settings. We provide open-source software for implementing MIDAS.
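The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' software, and every specific choice in it is an assumption made for illustration: the function names `train_dae` and `draw_imputations` are hypothetical, the network has a single tanh hidden layer, missing entries are zero-filled, the loss is squared error, and repeated draws are produced by re-applying random corruption at prediction time. The one element taken from the abstract is that the reconstruction error is computed only on the originally observed entries.

```python
import numpy as np

def train_dae(X, hidden=8, corrupt_rate=0.2, epochs=200, lr=0.05, seed=0):
    """Fit a tiny denoising autoencoder to an array that contains NaNs."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)                 # True where a value was originally observed
    X0 = np.where(observed, X, 0.0)         # missing entries enter as zeros, i.e. as corrupted input
    n, d = X0.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d)); b2 = np.zeros(d)
    n_obs = max(int(observed.sum()), 1)
    for _ in range(epochs):
        keep = rng.random((n, d)) > corrupt_rate   # randomly corrupt (zero out) a subset of inputs
        X_in = X0 * keep
        H = np.tanh(X_in @ W1 + b1)                # encoder
        X_hat = H @ W2 + b2                        # decoder
        # Squared reconstruction error, restricted to originally observed entries
        grad_out = 2.0 * (X_hat - X0) * observed / n_obs
        gW2 = H.T @ grad_out
        gb2 = grad_out.sum(axis=0)
        gZ = (grad_out @ W2.T) * (1.0 - H ** 2)    # backpropagate through tanh
        gW1 = X_in.T @ gZ
        gb1 = gZ.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2

def draw_imputations(X, params, m=5, corrupt_rate=0.2, seed=1):
    """Return m completed copies of X; observed values are left untouched."""
    W1, b1, W2, b2 = params
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)
    X0 = np.where(observed, X, 0.0)
    completed = []
    for _ in range(m):
        keep = rng.random(X0.shape) > corrupt_rate  # fresh corruption gives a different draw each time
        X_hat = np.tanh((X0 * keep) @ W1 + b1) @ W2 + b2
        completed.append(np.where(observed, X, X_hat))
    return completed
```

Calling `draw_imputations(X, train_dae(X), m=5)` on a numeric array `X` with NaNs would yield five completed datasets, which could then be analyzed separately and combined in the usual multiple-imputation workflow. The sketch assumes roughly standardized continuous columns; categorical variables would need a different output layer and loss.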

Item Type: Article
Official URL: https://www.cambridge.org/core/journals/political-...
Additional Information: © 2021 The Authors
Divisions: International Relations
Subjects: J Political Science > JA Political science (General)
Date Deposited: 04 Jan 2021 17:24
Last Modified: 12 Dec 2024 02:24
URI: http://eprints.lse.ac.uk/id/eprint/108170
