Ma, Tao ORCID: 0000-0002-8062-9217, Yang, Xuzhi and Szabo, Zoltan (2024) To switch or not to switch? Balanced policy switching in offline reinforcement learning. arXiv. (Submitted)
Yang, Xuzhi and Wang, Tengyao ORCID: 0000-0003-2072-6645 (2024) Multiple-output composite quantile regression through an optimal transport lens. Proceedings of Machine Learning Research, 247. pp. 5076-5122. ISSN 2640-3498