Zhu, Jin ORCID: 0000-0001-8550-5822, Wan, Runzhe, Qi, Zhengling, Luo, Shikai and Shi, Chengchun ORCID: 0000-0001-7773-2099 (2024) Robust offline reinforcement learning with heavy-tailed rewards. In: Dasgupta, Sanjoy, Mandt, Stephan and Li, Yingzhen, (eds.) Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024. International Conference on Machine Learning, Valencia, Spain, 541 - 549.
Shi, Chengchun ORCID: 0000-0001-7773-2099, Wan, Runzhe, Song, Ge, Luo, Shikai, Zhu, Hongtu and Song, Rui (2023) A multiagent reinforcement learning framework for off-policy evaluation in two-sided markets. Annals of Applied Statistics, 17 (4). 2701 - 2722. ISSN 1932-6157
Wan, Runzhe, Zhang, Sheng, Shi, Chengchun ORCID: 0000-0001-7773-2099, Luo, Shikai and Song, Rui (2021) Pattern transfer learning for reinforcement learning in order dispatching. In: International Joint Conference on Artificial Intelligence, 2021-08-19 - 2021-08-26. (In Press)
Shi, Chengchun ORCID: 0000-0001-7773-2099, Wan, Runzhe, Chernozhukov, Victor and Song, Rui (2021) Deeply-debiased off-policy interval estimation. In: International Conference on Machine Learning, 2021-07-18 - 2021-07-24, Online. (In Press)
Shi, Chengchun ORCID: 0000-0001-7773-2099, Wan, Runzhe, Song, Rui, Lu, Wenbin and Leng, Ling (2020) Does the Markov decision process fit the data: testing for the Markov property in sequential decision making. In: International Conference on Machine Learning, 2020-07-12 - 2020-07-18, Online. (In Press)