Emilie Kaufmann - Solving pure exploration tasks in bandit models
Séminaire « Probabilités et Statistique »

Bandit models are well studied in the machine learning community owing to their numerous applications, notably to online content optimization. In a multi-armed bandit model, an agent sequentially tries different actions, called arms; each arm is associated with an unknown probability distribution. In a pure exploration task, the agent wants to learn something about these distributions (e.g., which arm has the largest expectation, in the best arm identification task) using as few samples from them as possible. After presenting a lower bound on the number of samples needed by any algorithm that solves the task with a given error probability, I will present some algorithms matching this lower bound, at least when the error probability is small. The first one, Track and Stop, is directly inspired by the lower bound but can have a high computational cost. To mitigate this cost in the particular case of best arm identification, I will then advocate a family of algorithms called “Top Two” algorithms.
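For readers unfamiliar with this line of work: the fixed-confidence lower bound alluded to above is presumably the one of Garivier and Kaufmann (2016). For any algorithm whose error probability is at most δ on every bandit model μ, it reads

$$\mathbb{E}_{\mu}[\tau_\delta] \;\geq\; T^*(\mu)\,\mathrm{kl}(\delta, 1-\delta), \qquad \frac{1}{T^*(\mu)} = \sup_{w \in \Sigma_K}\; \inf_{\lambda \in \mathrm{Alt}(\mu)}\; \sum_{a=1}^{K} w_a\, \mathrm{KL}(\mu_a, \lambda_a),$$

where $\tau_\delta$ is the stopping time, $\mathrm{kl}$ is the binary relative entropy, $\Sigma_K$ is the simplex over the $K$ arms, and $\mathrm{Alt}(\mu)$ is the set of models whose correct answer differs from that of $\mu$. Track and Stop repeatedly solves this optimization problem at the empirical model and tracks the maximizing weights, which explains both its optimality as δ → 0 and its computational cost.

By contrast, a Top Two algorithm only needs to pick a “leader” and a “challenger” at each round. As a purely illustrative sketch (not the exact algorithm from the talk), here is a minimal Python implementation of one member of the family, Top Two Thompson Sampling, for Gaussian arms with unit variance; the parameter β, the flat prior, and the fixed sampling budget standing in for a proper stopping rule are all simplifying assumptions.

```python
import numpy as np

def top_two_thompson(true_means, n_rounds=2000, beta=0.5, seed=0):
    """Illustrative Top Two Thompson Sampling for Gaussian arms (unit variance).

    true_means : unknown arm means, used here only to simulate rewards.
    beta       : probability of pulling the leader instead of the challenger.
    """
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.ones(k)                                # one forced pull per arm
    sums = rng.normal(true_means, 1.0)                 # initial observations

    for _ in range(n_rounds):
        # Under a flat prior, the posterior on each mean is N(empirical mean, 1/count).
        theta = rng.normal(sums / counts, 1.0 / np.sqrt(counts))
        arm = leader = int(np.argmax(theta))           # leader: posterior argmax
        if rng.random() >= beta:
            # Challenger: resample until the posterior argmax differs from the leader.
            while arm == leader:
                theta = rng.normal(sums / counts, 1.0 / np.sqrt(counts))
                arm = int(np.argmax(theta))
        sums[arm] += rng.normal(true_means[arm], 1.0)  # pull the arm, observe a reward
        counts[arm] += 1

    return int(np.argmax(sums / counts))               # recommend the empirical best arm

# Example: three Gaussian arms; the sketch should usually recommend index 2.
print(top_two_thompson(np.array([0.2, 0.4, 0.5])))
```

Each round only involves posterior sampling and argmax computations, which is what makes Top Two algorithms much cheaper per round than Track and Stop.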