LSE Research Online LSE Library Services

Rotting infinitely many-armed bandits

Kim, Jung-Hun, Vojnovic, Milan ORCID: 0000-0003-1382-022X and Yun, Se-Young (2022) Rotting infinitely many-armed bandits. In: Proceedings of the 39th International Conference on Machine Learning. Journal of Machine Learning Research, pp. 11229-11254.


Abstract

We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate $\varrho = o(1)$. We show that this learning problem has an $\Omega(\max\{\varrho^{1/3}T, \sqrt{T}\})$ worst-case regret lower bound, where $T$ is the time horizon. We show that a matching upper bound $\tilde{O}(\max\{\varrho^{1/3}T, \sqrt{T}\})$, up to a poly-logarithmic factor, can be achieved by an algorithm that uses a UCB index for each arm and a threshold value to decide whether to continue pulling an arm or remove the arm from further consideration, when the algorithm knows the value of the maximum rotting rate $\varrho$. We also show that an $\tilde{O}(\max\{\varrho^{1/3}T, T^{3/4}\})$ regret upper bound can be achieved by an algorithm that does not know the value of $\varrho$, by using an adaptive UCB index along with an adaptive threshold value.
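
The UCB-plus-threshold idea for the known-$\varrho$ case can be illustrated with a minimal simulation sketch. This is not the paper's algorithm: the abstract does not specify the index or the threshold, so everything concrete here is an assumption made for illustration. In particular, the sketch assumes initial mean rewards drawn uniformly from [0, 1], a mean that drops by exactly `rho` per pull, Gaussian noise, a plain sample-mean UCB index, a threshold of `1 - max(rho**(1/3), horizon**-0.5)`, and regret measured against a benchmark mean reward of 1 per round. The function name `ucb_threshold_policy` and all parameter names are hypothetical.

```python
import math
import random


def ucb_threshold_policy(horizon, rho, noise_sd=0.1, seed=0):
    """Simulate a simplified UCB-with-threshold policy on rotting arms.

    Returns (regret, arms_tried). All modelling choices are illustrative
    assumptions, not taken from the paper.
    """
    rng = random.Random(seed)

    # Assumed threshold: discard an arm once its UCB index falls below
    # 1 - delta, with delta chosen to mirror the two terms of the bound.
    delta = max(rho ** (1 / 3), horizon ** -0.5)
    threshold = 1.0 - delta

    def new_arm():
        # Assumption: a fresh arm's initial mean is Uniform[0, 1).
        return {"mean": rng.random(), "pulls": 0, "reward_sum": 0.0}

    arm = new_arm()
    arms_tried = 1
    total_mean_reward = 0.0

    for _ in range(horizon):
        # Collect the current mean, then observe a noisy reward.
        total_mean_reward += arm["mean"]
        reward = arm["mean"] + rng.gauss(0.0, noise_sd)
        arm["pulls"] += 1
        arm["reward_sum"] += reward

        # Rotting (assumed worst case): the mean drops by rho per pull.
        arm["mean"] = max(0.0, arm["mean"] - rho)

        # Assumed UCB index: sample mean plus a confidence bonus.
        average = arm["reward_sum"] / arm["pulls"]
        bonus = noise_sd * math.sqrt(2.0 * math.log(horizon) / arm["pulls"])

        # Threshold test: keep pulling, or remove the arm and draw a new one.
        if average + bonus < threshold:
            arm = new_arm()
            arms_tried += 1

    # Regret against the assumed benchmark of mean reward 1 per round.
    regret = horizon * 1.0 - total_mean_reward
    return regret, arms_tried


if __name__ == "__main__":
    for rho in (0.0, 1e-4, 1e-2):
        regret, arms_tried = ucb_threshold_policy(horizon=10_000, rho=rho)
        print(f"rho={rho:g}  regret={regret:.1f}  arms_tried={arms_tried}")
```

Because the sample mean in this sketch averages over all pulls of an arm, it lags behind a rotting mean; handling that lag properly is part of what a careful index has to do, and the sketch makes no attempt at it.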

Item Type: Book Section
Official URL: https://proceedings.mlr.press/v162/kim22j.html
Additional Information: © The Author(s).
Divisions: Statistics
Subjects: H Social Sciences > HA Statistics
Date Deposited: 30 Sep 2022 09:45
Last Modified: 03 Oct 2024 18:30
URI: http://eprints.lse.ac.uk/id/eprint/116714
