Active learning with biased non-response to label requests

Robinson, Thomas ORCID: 0000-0001-7097-1599, Tax, Niek, Mudd, Richard and Guy, Ido (2024) Active learning with biased non-response to label requests. Data Mining and Knowledge Discovery, 38 (4). 2117 - 2140. ISSN 1384-5810

Text (Robinson_active-learning-with-biased-non-response--published) - Published Version
Available under License Creative Commons Attribution.

Identification Number: 10.1007/s10618-024-01026-x

Abstract

Active learning can improve the efficiency of training prediction models by identifying the most informative new labels to acquire. However, non-response to label requests can impact active learning’s effectiveness in real-world contexts. We conceptualise this degradation by considering the type of non-response present in the data, demonstrating that biased non-response is particularly detrimental to model performance. We argue that biased non-response is likely in contexts where the labelling process, by nature, relies on user interactions. To mitigate the impact of biased non-response, we propose a cost-based correction to the sampling strategy, the Upper Confidence Bound of the Expected Utility (UCB-EU), that can plausibly be applied to any active learning algorithm. Through experiments, we demonstrate that our method successfully reduces the harm from labelling non-response in many settings. However, we also characterise settings where the non-response bias in the annotations remains detrimental under UCB-EU for specific sampling methods and data generating processes. Finally, we evaluate our method on a real-world dataset from an e-commerce platform. We show that UCB-EU yields substantial performance improvements to conversion models that are trained on clicked impressions. Most generally, this research serves both to better conceptualise the interplay between types of non-response and model improvements via active learning, and to provide a practical, easy-to-implement correction that mitigates model degradation.
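
The abstract names UCB-EU but does not state its formula, so the following is only an illustrative sketch of the general idea of a response-aware sampling correction, not the authors' method. It assumes that each candidate carries an informativeness score from some base active learning strategy, and that response counts are tracked per candidate group; the function names, the dictionary keys, the exploration constant c, and the specific confidence bonus are all hypothetical choices made for this example.

```python
import math


def ucb_expected_utility(informativeness, responses, requests, total_requests, c=1.0):
    """Hypothetical score: informativeness times an optimistic response-rate estimate.

    The bonus term is a standard UCB-style bonus chosen for illustration;
    it is an assumption, not the formula from the paper.
    """
    if requests == 0:
        # No label request made yet for this group: stay fully optimistic.
        p_ucb = 1.0
    else:
        p_hat = responses / requests  # observed response rate
        bonus = c * math.sqrt(math.log(max(total_requests, 2)) / requests)
        p_ucb = min(1.0, p_hat + bonus)  # cap at a valid probability
    return informativeness * p_ucb


def select_candidate(candidates, total_requests, c=1.0):
    """Return the id of the candidate with the highest hypothetical score."""
    best = max(
        candidates,
        key=lambda x: ucb_expected_utility(
            x["informativeness"], x["responses"], x["requests"], total_requests, c
        ),
    )
    return best["id"]


# Made-up example: "a" looks more informative but its group rarely responds.
candidates = [
    {"id": "a", "informativeness": 0.9, "responses": 1, "requests": 20},
    {"id": "b", "informativeness": 0.6, "responses": 18, "requests": 20},
]
chosen = select_candidate(candidates, total_requests=40)
```

In this made-up example, a strategy that ranks by informativeness alone would request a label for "a", whereas the sketched correction prefers "b" because a request to "a" is unlikely to be answered.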

Item Type:	Article
Official URL:	https://link.springer.com/journal/10618
Additional Information:	© 2024 The Author(s)
Divisions:	Methodology
Subjects:	H Social Sciences > H Social Sciences (General); Q Science > QA Mathematics > QA75 Electronic computers. Computer science; H Social Sciences > HF Commerce
Date Deposited:	13 May 2024 10:18
Last Modified:	25 Oct 2025 17:03
URI:	http://eprints.lse.ac.uk/id/eprint/123029
