Robust offline reinforcement learning with heavy-tailed rewards

Zhu, Jin ORCID: 0000-0001-8550-5822, Wan, Runzhe, Qi, Zhengling, Luo, Shikai and Shi, Chengchun ORCID: 0000-0001-7773-2099 (2024) Robust offline reinforcement learning with heavy-tailed rewards. In: Dasgupta, Sanjoy, Mandt, Stephan and Li, Yingzhen (eds.) Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024. Proceedings of Machine Learning Research, Valencia, Spain, 541-549.

Text (Robust Offline Reinforcement Learning with Heavy-Tailed Rewards) - Accepted Version
Download (1MB)

Abstract

This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-means method with offline RL, enabling straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also adeptly manages heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods when the logged dataset exhibits heavy-tailed reward distributions. The implementation of the proposal is available at https://github.com/Mamba413/ROOM.
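For illustration only, and not the authors' implementation (see the linked repository for that): the sketch below shows the median-of-means idea the abstract describes, applied to a one-dimensional sample of heavy-tailed returns, together with a crude pessimistic lower bound derived from the spread of the block means. The function names and the uncertainty proxy are hypothetical choices made for this sketch.

import numpy as np


def median_of_means(values, n_blocks=10, seed=None):
    # Split the sample into n_blocks disjoint blocks at random,
    # average within each block, and take the median of the block means.
    # Robust when rewards are heavy-tailed and the plain sample mean is unstable.
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    shuffled = values[rng.permutation(len(values))]
    block_means = np.array([b.mean() for b in np.array_split(shuffled, n_blocks)])
    return np.median(block_means), block_means


def pessimistic_value(values, n_blocks=10, seed=None):
    # Hypothetical pessimism rule for illustration: shift the
    # median-of-means estimate down by the median absolute deviation
    # of the block means, a cheap uncertainty estimate.
    mom, block_means = median_of_means(values, n_blocks, seed)
    uncertainty = np.median(np.abs(block_means - mom))
    return mom - uncertainty


# Heavy-tailed synthetic "returns": Student-t with 1.5 degrees of freedom
# (finite mean, infinite variance), shifted so the true mean is 1.0.
rng = np.random.default_rng(0)
returns = rng.standard_t(df=1.5, size=5000) + 1.0
print("sample mean:      ", returns.mean())
print("median of means:  ", median_of_means(returns, seed=0)[0])
print("pessimistic value:", pessimistic_value(returns, seed=0))

The design rationale, as the abstract indicates, is that the median of block means stays stable under infinite-variance rewards where the sample mean does not, and the dispersion of the block means gives an uncertainty estimate that can be subtracted from the value estimate in the spirit of pessimism.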

Item Type: Book Section
Official URL: https://proceedings.mlr.press/
Additional Information: © 2024 The Author(s)
Divisions: Statistics
Subjects: H Social Sciences > HA Statistics
Date Deposited: 23 Apr 2024 08:30
Last Modified: 20 Dec 2024 00:20
URI: http://eprints.lse.ac.uk/id/eprint/122740
