An instrumental variable approach to confounded off-policy evaluation

Xu, Yang, Zhu, Jin, Shi, Chengchun ORCID: 0000-0001-7773-2099, Luo, Shikai and Song, Rui (2023) An instrumental variable approach to confounded off-policy evaluation. Proceedings of Machine Learning Research, 202. 38848 - 38880. ISSN 1938-7228

Text (An Instrumental Variable Approach to Confounded Off-Policy Evaluation) - Accepted Version

Abstract

Off-policy evaluation (OPE) aims to estimate the return of a target policy using pre-collected observational data generated by a potentially different behavior policy. In many cases, there exist unmeasured variables that confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded sequential decision making. We show that, as in single-stage decision making, an IV enables us to correctly identify the target policy’s value in infinite-horizon settings. Furthermore, we propose a number of policy value estimators and illustrate their effectiveness through extensive simulations and real data analysis from a world-leading short-video platform.

Item Type:	Article
Official URL:	https://proceedings.mlr.press/
Additional Information:	© 2023 The Author(s)
Divisions:	Statistics
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Date Deposited:	23 May 2024 14:33
Last Modified:	20 Nov 2025 20:37
URI:	http://eprints.lse.ac.uk/id/eprint/123599
