Shi, Chengchun ORCID: 0000-0001-7773-2099, Qi, Zhengling, Wang, Jianing and Zhou, Fan (2023) Value enhancement of reinforcement learning via efficient and robust trust region optimization. Journal of the American Statistical Association. pp. 1-15. ISSN 0162-1459
Accepted Version. Available under License Creative Commons Attribution Non-commercial.
Abstract
Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy that maximizes the cumulative rewards in sequential decision making. Most methods in the existing literature are developed in online settings where the data are easy to collect or simulate. Motivated by high-stakes domains such as mobile health studies with limited and pre-collected data, in this article, we study offline reinforcement learning methods. To efficiently use these datasets for policy optimization, we propose a novel value enhancement method to improve the performance of a given initial policy computed by existing state-of-the-art RL algorithms. Specifically, when the initial policy is not consistent, our method will output a policy whose value is no worse and often better than that of the initial policy. When the initial policy is consistent, under some mild conditions, our method will yield a policy whose value converges to the optimal one at a faster rate than the initial policy, achieving the desired "value enhancement" property. The proposed method is generally applicable to any parameterized policy that belongs to a certain pre-specified function class (e.g., deep neural networks). Extensive numerical studies are conducted to demonstrate the superior performance of our method. Supplementary materials for this article are available online.
Item Type: | Article |
---|---|
Additional Information: | © 2023 American Statistical Association |
Divisions: | Statistics |
Subjects: | H Social Sciences > HA Statistics |
Date Deposited: | 23 Apr 2024 11:15 |
Last Modified: | 15 Nov 2024 06:39 |
URI: | http://eprints.lse.ac.uk/id/eprint/122756 |