Forward and backward state abstractions for off-policy evaluation

Hao, Meiling, Su, Pingfan, Hu, Liyuan, Szabo, Zoltan ORCID: 0000-0001-6183-7603, Zhao, Qianyu and Shi, Chengchun ORCID: 0000-0001-7773-2099 (2024) Forward and backward state abstractions for off-policy evaluation. arXiv. (Submitted)

Text (Szabo_forward-and-backward-state-abstractions) - Submitted Version (1MB)

Abstract

Off-policy evaluation (OPE) is crucial for evaluating a target policy’s impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions – originally designed for policy learning – in the context of OPE. Our contributions are three-fold: (i) We define a set of irrelevance conditions central to learning state abstractions for OPE. (ii) We derive sufficient conditions for achieving irrelevance in Q-functions and marginalized importance sampling ratios, the latter obtained by constructing a time-reversed Markov decision process (MDP) based on the observed MDP. (iii) We propose a novel two-step procedure that sequentially projects the original state space into a smaller space, which substantially simplifies the sample complexity of OPE arising from high cardinality.
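
As a rough illustration of what it means for a state abstraction to be irrelevant for policy evaluation, the sketch below builds a small tabular MDP and checks that collapsing its states leaves the target policy's value unchanged. It is not the paper's procedure. It uses a classical model-irrelevance (bisimulation-style) abstraction and computes values exactly from a known model rather than estimating them from off-policy data. All names (`policy_value`, `phi`, `split`) and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def policy_value(P, R, pi, gamma, init):
    """Exact discounted value of pi.

    P[a, s, t]: transition probabilities, R[s, a]: expected rewards,
    pi[s, a]: action probabilities, init[s]: initial state distribution.
    """
    n_states = R.shape[0]
    # State-to-state transition matrix and expected reward under pi.
    P_pi = np.einsum("sa,ast->st", pi, P)
    r_pi = (pi * R).sum(axis=1)
    # Solve the Bellman equation V = r_pi + gamma * P_pi V.
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    return float(init @ V)


gamma = 0.9

# Abstract MDP with 2 states and 2 actions (illustrative numbers).
P_abs = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.3, 0.7], [0.6, 0.4]]])
R_abs = np.array([[1.0, 0.0], [0.0, 2.0]])
pi_abs = np.array([[0.5, 0.5], [0.1, 0.9]])

# Abstraction phi maps 4 ground states onto the 2 abstract states.
phi = np.array([0, 0, 1, 1])

# split[s, t]: given the destination block, how mass is spread over the
# ground states inside that block. Each row sums to 1 within each block.
split = np.array([[0.7, 0.3, 0.5, 0.5],
                  [0.2, 0.8, 0.1, 0.9],
                  [0.6, 0.4, 0.3, 0.7],
                  [0.5, 0.5, 0.9, 0.1]])

# Ground MDP: rewards, policy, and block-level transitions depend on s
# only through phi(s), which is the model-irrelevance condition.
P = P_abs[:, phi][:, :, phi] * split[None, :, :]
R = R_abs[phi]
pi = pi_abs[phi]

init = np.array([0.1, 0.2, 0.3, 0.4])
init_abs = np.bincount(phi, weights=init)  # push init through phi

v_ground = policy_value(P, R, pi, gamma, init)
v_abs = policy_value(P_abs, R_abs, pi_abs, gamma, init_abs)
print(v_ground, v_abs)  # the two values agree up to floating-point error
```

The ground problem has four states and the abstract one has two, yet both yield the same policy value, which is the sense in which the discarded within-block detail is irrelevant here.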

Item Type: Monograph (Report)
Official URL: https://arxiv.org/
Divisions: Statistics
Subjects: H Social Sciences > HA Statistics
Date Deposited: 02 Jul 2024 07:54
Last Modified: 25 Oct 2024 10:09
URI: http://eprints.lse.ac.uk/id/eprint/124074
