Value enhancement of reinforcement learning via efficient and robust trust region optimization

Shi, Chengchun, Qi, Zhengling, Wang, Jianing and Zhou, Fan (2023) Value enhancement of reinforcement learning via efficient and robust trust region optimization. Journal of the American Statistical Association. pp. 1-15. ISSN 0162-1459

Text (Value_Enhancement_of_Reinforcement_Learning_via_Efficient_and_Robust_Trust_Region_Optimization) - Accepted Version
Repository staff only until 20 July 2024.
Available under License Creative Commons Attribution Non-commercial.

Identification Number: 10.1080/01621459.2023.2238942

Abstract

Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy that maximizes the cumulative rewards in sequential decision making. Most methods in the existing literature are developed in online settings where the data are easy to collect or simulate. Motivated by high-stakes domains such as mobile health studies with limited and pre-collected data, we study offline reinforcement learning methods in this article. To efficiently use these datasets for policy optimization, we propose a novel value enhancement method to improve the performance of a given initial policy computed by existing state-of-the-art RL algorithms. Specifically, when the initial policy is not consistent, our method will output a policy whose value is no worse and often better than that of the initial policy. When the initial policy is consistent, under some mild conditions, our method will yield a policy whose value converges to the optimal one at a faster rate than the initial policy, achieving the desired “value enhancement” property. The proposed method is generally applicable to any parameterized policy that belongs to a certain pre-specified function class (e.g., deep neural networks). Extensive numerical studies are conducted to demonstrate the superior performance of our method. Supplementary materials for this article are available online.

Item Type: Article
Additional Information: © 2023 American Statistical Association
Divisions: Statistics
Subjects: H Social Sciences > HA Statistics
Date Deposited: 23 Apr 2024 11:15
Last Modified: 26 Apr 2024 10:51
URI: http://eprints.lse.ac.uk/id/eprint/122756
