Wang, Yiliu, Chen, Wei and Vojnovic, Milan ORCID: 0000-0003-1382-022X (2024) Combinatorial bandits for maximum value reward function under value-index feedback. In: ICLR 2024 The Twelfth International Conference on Learning Representations, 2024-05-07 - 2024-05-11, Messe Wien Exhibition and Congress Center, Vienna, Austria.
Abstract
We investigate the combinatorial multi-armed bandit problem where an action is to select $k$ arms from a set of base arms, and its reward is the maximum of the sample values of these $k$ arms, under a weak feedback structure that only returns the value and index of the arm with the maximum value. This novel feedback structure is much weaker than the semi-bandit feedback previously studied and is only slightly stronger than the full-bandit feedback, and thus it presents a new challenge for the online learning task. We propose an algorithm and derive a regret bound for instances where arm outcomes follow distributions with finite supports. Our algorithm introduces a novel concept of biased arm replacement to address the weak feedback challenge, and it achieves a distribution-dependent regret bound of $O((k/\Delta)\log(T))$ and a distribution-independent regret bound of $\tilde{O}(\sqrt{T})$, where $\Delta$ is the reward gap and $T$ is the time horizon. Notably, our regret bound is comparable to the bounds obtained under the more informative semi-bandit feedback. We demonstrate the effectiveness of our algorithm through experimental results.
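The value-index feedback model described above can be illustrated with a small simulation. The sketch below is not the paper's algorithm; it only shows what the learner observes each round under this feedback structure. The names `pull`, `supports`, and `probs` are hypothetical, and finite-support distributions are assumed as in the paper's setting.

```python
import random

def pull(action, supports, probs, rng):
    """Simulate one round of the max-value reward under value-index feedback.

    action   : list of selected arm indices (the super-arm of size k)
    supports : supports[i] is the finite support of arm i's outcome distribution
    probs    : probs[i] are the matching probabilities (hypothetical setup)

    Returns only (maximum sampled value, index of the arm attaining it);
    the learner never observes the other k-1 sampled values.
    """
    samples = {i: rng.choices(supports[i], weights=probs[i])[0] for i in action}
    winner = max(samples, key=lambda i: (samples[i], -i))  # ties -> smallest index
    return samples[winner], winner

# Example: three base arms with finite supports; arm 2 always yields 3.0,
# which dominates the other arms' supports, so it always wins.
rng = random.Random(0)
supports = [[0.0, 1.0], [0.0, 2.0], [3.0]]
probs    = [[0.5, 0.5], [0.5, 0.5], [1.0]]
value, index = pull([0, 1, 2], supports, probs, rng)  # -> (3.0, 2)
```

Because only the winning value and its index are revealed, the learner receives strictly less information than under semi-bandit feedback (all k sampled values) but slightly more than under full-bandit feedback (the reward alone).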
| Item Type: | Conference or Workshop Item (Paper) |
| --- | --- |
| Additional Information: | © 2024 The Author(s) |
| Divisions: | Statistics |
| Subjects: | H Social Sciences > HA Statistics |
| Date Deposited: | 19 Jun 2024 10:39 |
| Last Modified: | 01 Oct 2024 04:02 |
| URI: | http://eprints.lse.ac.uk/id/eprint/123919 |