Efficient Algorithms for Combinatorial-Bandits with Monotonicity
Published in NeurIPS 2025 OPT Workshop, 2025
Wagde, A., Saha, A.
In combinatorial bandits, a learner picks a subset of arms each round and observes a reward aggregated by an unknown function — prior work required strong structural assumptions (e.g., linearity) on that aggregator, leading to intractable sample complexities in the general case. We show that under monotonicity alone, two arms can be compared by randomizing over the remaining arms in the chosen set, effectively isolating a pairwise signal from combinatorial feedback. This reduction lets us plug in any pairwise-preference bandit algorithm, matching state-of-the-art sample complexity while dropping the restrictive aggregation assumptions that made previous approaches brittle.
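The comparison-by-randomization idea can be illustrated with a toy simulation. This is a minimal sketch, not the paper's algorithm: it assumes a sum aggregator and Gaussian noise, and the names `utility`, `aggregate`, and `compare` are illustrative. To compare arms `i` and `j`, both are played alongside the same randomly drawn companion arms, so the aggregate rewards differ (in expectation) only through the two arms being compared.

```python
import random

# Hypothetical toy setup (not from the paper): latent per-arm utilities
# and a monotone aggregator; here a simple sum, but any monotone f works.
random.seed(0)
K, k = 8, 4  # K arms total, subsets of size k played each round
utility = [random.random() for _ in range(K)]

def aggregate(subset):
    # Monotone aggregator, unknown to the learner in the bandit setting.
    return sum(utility[a] for a in subset)

def compare(i, j, trials=2000):
    """Estimate P(arm i beats arm j) by drawing the same random k-1
    companion arms for both plays, isolating the pairwise signal."""
    wins = 0
    others = [a for a in range(K) if a not in (i, j)]
    for _ in range(trials):
        companions = random.sample(others, k - 1)
        # Noisy combinatorial feedback on each chosen subset.
        r_i = aggregate(companions + [i]) + random.gauss(0, 0.1)
        r_j = aggregate(companions + [j]) + random.gauss(0, 0.1)
        wins += r_i > r_j
    return wins / trials  # > 0.5 suggests arm i is preferred to arm j
```

Such an estimated win probability is exactly the kind of pairwise-preference feedback that a dueling-bandit algorithm consumes, which is how the reduction lets any pairwise-preference method be plugged in.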
