Efficient Algorithms for Combinatorial-Bandits with Monotonicity
Published in NeurIPS 2025 OPT Workshop, 2025
Wagde, A., Saha, A.
In combinatorial bandits, a learner picks a subset of arms each round and observes a reward aggregated by an unknown function — prior work required strong structural assumptions (e.g., linearity) on that aggregator, leading to intractable sample complexities in the general case. We show that under monotonicity alone, two arms can be compared by randomizing over the remaining arms in the chosen set, effectively isolating a pairwise signal from combinatorial feedback. This reduction lets us plug in any pairwise-preference bandit algorithm, matching state-of-the-art sample complexity while dropping the restrictive aggregation assumptions that made previous approaches brittle.
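The comparison-by-randomization idea can be illustrated with a toy simulation. This is a minimal sketch, not the paper's algorithm: it assumes a sum aggregator and Gaussian noise, and the names `utility`, `aggregate`, and `compare` are illustrative. To compare arms `i` and `j`, both are played alongside the same randomly drawn companion arms, so the aggregate rewards differ (in expectation) only through the two arms being compared.

```python
import random

# Hypothetical toy setup (not from the paper): latent per-arm utilities
# and a monotone aggregator; here a simple sum, but any monotone f works.
random.seed(0)
K, k = 8, 4  # K arms total, subsets of size k played each round
utility = [random.random() for _ in range(K)]

def aggregate(subset):
    # Monotone aggregator, unknown to the learner in the bandit setting.
    return sum(utility[a] for a in subset)

def compare(i, j, trials=2000):
    """Estimate P(arm i beats arm j) by drawing the same random k-1
    companion arms for both plays, isolating the pairwise signal."""
    wins = 0
    others = [a for a in range(K) if a not in (i, j)]
    for _ in range(trials):
        companions = random.sample(others, k - 1)
        # Noisy combinatorial feedback on each chosen subset.
        r_i = aggregate(companions + [i]) + random.gauss(0, 0.1)
        r_j = aggregate(companions + [j]) + random.gauss(0, 0.1)
        wins += r_i > r_j
    return wins / trials  # > 0.5 suggests arm i is preferred to arm j
```

Such an estimated win probability is exactly the kind of pairwise-preference feedback that a dueling-bandit algorithm consumes, which is how the reduction lets any pairwise-preference method be plugged in.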
