Xu, Yang, Zhu, Jin, Shi, Chengchun ORCID: 0000-0001-7773-2099, Luo, Shikai and Song, Rui (2023) An instrumental variable approach to confounded off-policy evaluation. Proceedings of Machine Learning Research, 202. 38848 - 38880. ISSN 1938-7228
Text (An Instrumental Variable Approach to Confounded Off-Policy Evaluation)
- Accepted Version
Abstract
Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In many cases, there exist unmeasured variables that confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded sequential decision making. Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy’s value in infinite horizon settings as well. Furthermore, we propose a number of policy value estimators and illustrate their effectiveness through extensive simulations and real data analysis from a world-leading short-video platform.
Item Type: | Article |
---|---|
Official URL: | https://proceedings.mlr.press/ |
Additional Information: | © 2023 The Author(s) |
Divisions: | Statistics |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Date Deposited: | 23 May 2024 14:33 |
Last Modified: | 12 Nov 2024 03:09 |
URI: | http://eprints.lse.ac.uk/id/eprint/123599 |