Robust offline reinforcement learning with heavy-tailed rewards

Zhu, Jin ORCID: 0000-0001-8550-5822, Wan, Runzhe, Qi, Zhengling, Luo, Shikai and Shi, Chengchun ORCID: 0000-0001-7773-2099 (2024) Robust offline reinforcement learning with heavy-tailed rewards. In: Dasgupta, Sanjoy, Mandt, Stephan and Li, Yingzhen (eds.) Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024. Proceedings of Machine Learning Research, Valencia, Spain, 541-549.

Text (Robust Offline Reinforcement Learning with Heavy-Tailed Rewards) - Accepted Version
Download (1MB)

Abstract

This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-means method with offline RL, enabling straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also adeptly manages heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods when the logged dataset exhibits heavy-tailed reward distributions. The implementation of the proposal is available at https://github.com/Mamba413/ROOM.
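For illustration only, and not the authors' implementation (see the linked repository for that): the sketch below shows the median-of-means idea the abstract describes, applied to a one-dimensional sample of heavy-tailed returns, together with a crude pessimistic lower bound derived from the spread of the block means. The function names and the uncertainty proxy are hypothetical choices made for this sketch.

import numpy as np


def median_of_means(values, n_blocks=10, seed=None):
    # Split the sample into n_blocks disjoint blocks at random,
    # average within each block, and take the median of the block means.
    # Robust when rewards are heavy-tailed and the plain sample mean is unstable.
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    shuffled = values[rng.permutation(len(values))]
    block_means = np.array([b.mean() for b in np.array_split(shuffled, n_blocks)])
    return np.median(block_means), block_means


def pessimistic_value(values, n_blocks=10, seed=None):
    # Hypothetical pessimism rule for illustration: shift the
    # median-of-means estimate down by the median absolute deviation
    # of the block means, a cheap uncertainty estimate.
    mom, block_means = median_of_means(values, n_blocks, seed)
    uncertainty = np.median(np.abs(block_means - mom))
    return mom - uncertainty


# Heavy-tailed synthetic "returns": Student-t with 1.5 degrees of freedom
# (finite mean, infinite variance), shifted so the true mean is 1.0.
rng = np.random.default_rng(0)
returns = rng.standard_t(df=1.5, size=5000) + 1.0
print("sample mean:      ", returns.mean())
print("median of means:  ", median_of_means(returns, seed=0)[0])
print("pessimistic value:", pessimistic_value(returns, seed=0))

The design rationale, as the abstract indicates, is that the median of block means stays stable under infinite-variance rewards where the sample mean does not, and the dispersion of the block means gives an uncertainty estimate that can be subtracted from the value estimate in the spirit of pessimism.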

Item Type: Book Section
Official URL: https://proceedings.mlr.press/
Additional Information: © 2024 The Author(s)
Divisions: Statistics
Subjects: H Social Sciences > HA Statistics
Date Deposited: 23 Apr 2024 08:30
Last Modified: 20 Dec 2024 00:20
URI: http://eprints.lse.ac.uk/id/eprint/122740
