Ma, Tao ORCID: 0000-0002-8062-9217, Yang, Xuzhi and Szabo, Zoltan ORCID: 0000-0001-6183-7603
(2024)
To switch or not to switch? Balanced policy switching in offline reinforcement learning.
arXiv.
(Submitted)
Yang, Xuzhi and Wang, Tengyao ORCID: 0000-0003-2072-6645
(2024)
Multiple-output composite quantile regression through an optimal transport lens.
Proceedings of Machine Learning Research, 247.
pp. 5076-5122.
ISSN 2640-3498