An instrumental variable approach to confounded off-policy evaluation

Xu, Yang, Zhu, Jin, Shi, Chengchun ORCID: 0000-0001-7773-2099, Luo, Shikai and Song, Rui (2023) An instrumental variable approach to confounded off-policy evaluation. Proceedings of Machine Learning Research, 202. 38848 - 38880. ISSN 1938-7228

Text (An Instrumental Variable Approach to Confounded Off-Policy Evaluation) - Accepted Version (814kB)

Abstract

Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In many cases, there exist unmeasured variables that confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded sequential decision making. Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy’s value in infinite horizon settings as well. Furthermore, we propose a number of policy value estimators and illustrate their effectiveness through extensive simulations and real data analysis from a world-leading short-video platform.
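
To make the single-stage intuition mentioned in the abstract concrete, the following is a minimal sketch of the textbook Wald (ratio) IV estimator on synthetic data. It is not the estimator proposed in the paper, which addresses the sequential, infinite-horizon setting. The data-generating process, the coefficient values, and the function names (`simulate`, `naive_effect`, `wald_effect`) are all assumptions made for illustration: a binary instrument `z` shifts the action `a` but affects the reward `r` only through `a`, while an unmeasured confounder `u` drives both `a` and `r`.

```python
import random


def simulate(n, seed=0):
    """Generate (z, a, r) triples from a hypothetical confounded model.

    z: binary instrument, independent of the confounder u
    a: binary action, influenced by both z and u
    r: reward with an assumed true action effect of 2.0, plus confounding via u
    The confounder u is deliberately NOT returned (it is unmeasured).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        u = rng.gauss(0.0, 1.0)
        a = 1 if 1.5 * z + u + rng.gauss(0.0, 1.0) > 0.75 else 0
        r = 2.0 * a + 3.0 * u + rng.gauss(0.0, 1.0)
        rows.append((z, a, r))
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def naive_effect(rows):
    """Difference in mean reward between actions; ignores confounding."""
    r1 = mean([r for _, a, r in rows if a == 1])
    r0 = mean([r for _, a, r in rows if a == 0])
    return r1 - r0


def wald_effect(rows):
    """Wald ratio: effect of z on r divided by effect of z on a."""
    r_diff = mean([r for z, _, r in rows if z == 1]) - mean(
        [r for z, _, r in rows if z == 0]
    )
    a_diff = mean([a for z, a, _ in rows if z == 1]) - mean(
        [a for z, a, _ in rows if z == 0]
    )
    return r_diff / a_diff


rows = simulate(200_000)
naive = naive_effect(rows)
wald = wald_effect(rows)
print(f"naive difference in means: {naive:.3f}")
print(f"Wald IV estimate:          {wald:.3f}")
```

In this assumed model the action effect is constant at 2.0, so the Wald ratio should land near 2.0, whereas the naive comparison is biased upward because `u` raises both the chance of taking the action and the reward.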

Item Type: Article
Official URL: https://proceedings.mlr.press/
Additional Information: © 2023 The Author(s)
Divisions: Statistics
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Date Deposited: 23 May 2024 14:33
Last Modified: 19 Nov 2024 03:21
URI: http://eprints.lse.ac.uk/id/eprint/123599
