- Title: Bayesian Bandits for Algorithm Selection: Latent-State Modeling and Spatial Reward Structures
- Supervisors: Prof. Christian Brownlees, Prof. David Rossell
- Institution: Universitat Pompeu Fabra & Barcelona School of Economics
- Objective: To develop and evaluate bandit algorithms capable of dynamically selecting forecasting models under non-stationary and spatially-structured reward distributions.
- Team Members: Marvin Ernst, Oriol Gelabert, Melisa Vadenja
Many real-world systems, from online platforms to adaptive control, require sequential decisions in uncertain environments where rewards evolve over time or depend on spatial features. This project explores how bandit algorithms can be enhanced to adapt to such settings by modeling hidden latent-state dynamics and spatial correlation among arms.
- Setting: Each day (or time step) $t$, a decision-maker selects one arm (algorithm) out of $K$ options and receives a stochastic reward.
- Objective: Minimize cumulative regret over a finite horizon $T$, compared to the best dynamic or spatially optimal policy (formalized just below this list).
- Extensions:
  - Dynamic Bandits: Rewards evolve due to unobserved latent states.
  - Spatial Bandits: Arms are embedded in a continuous space and share a smooth reward structure.
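For concreteness, the cumulative regret targeted here can be written against a dynamic oracle as follows; the notation $\mu_{k,t}$ for the time-$t$ mean reward of arm $k$ is ours and may not match the project's internal write-up:

$$
R_T \;=\; \sum_{t=1}^{T} \Big( \max_{k \in \{1,\dots,K\}} \mu_{k,t} \;-\; \mu_{a_t,\,t} \Big),
$$

where $a_t$ denotes the arm pulled at time $t$. The summand at a single $t$ is the instantaneous regret reported among the evaluation metrics below; in the spatial setting the maximum runs over the continuous arm space rather than $\{1,\dots,K\}$.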
- Dynamic Bandits:
  - Implemented two Bayesian latent-state models (a simplified sketch follows this list):
    - M1: Arm-specific Hidden Markov Models (HMMs)
    - M2: A globally shared latent HMM
  - Compared them against autoregressive (AR) and classical baselines.
  - Showed that M1-TS consistently outperforms the alternatives across settings, especially in adapting to hidden regime switches.
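To make the M1-TS idea concrete, below is a minimal Python sketch of Thompson sampling over arm-specific latent regimes. It deliberately assumes the HMM parameters (transition matrix, regime means, noise level) are known and only filters the hidden state; the project's M1 instead performs full Bayesian inference over these quantities in JAGS. All variable names and numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative environment: K arms, each with a hidden 2-state regime.
# (Values below are hypothetical; the project's M1 infers them in JAGS.)
K, T = 3, 2000
P = np.array([[0.98, 0.02],          # regime transition matrix (shared here for brevity)
              [0.02, 0.98]])
mu = np.array([[0.2, 0.8],           # mu[k, s]: mean reward of arm k in regime s
               [0.5, 0.5],
               [0.9, 0.1]])
sigma = 0.3                          # assumed-known Gaussian reward noise

true_state = rng.integers(0, 2, size=K)    # hidden regime of each arm
belief = np.full((K, 2), 0.5)              # filtered P(regime | history), per arm

def gaussian_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

cum_reward = 0.0
for t in range(T):
    # Thompson sampling over latent states: sample a regime per arm from the
    # current belief and score each arm by its sampled regime mean.
    sampled_state = np.array([rng.choice(2, p=belief[k]) for k in range(K)])
    arm = int(np.argmax(mu[np.arange(K), sampled_state]))

    # Environment step: regimes evolve, the chosen arm emits a noisy reward.
    true_state = np.array([rng.choice(2, p=P[s]) for s in true_state])
    reward = rng.normal(mu[arm, true_state[arm]], sigma)
    cum_reward += reward

    # Forward-filter update: predict all arms through P, correct the pulled arm.
    belief = belief @ P
    lik = gaussian_pdf(reward, mu[arm], sigma)
    belief[arm] = belief[arm] * lik
    belief[arm] /= belief[arm].sum()

print(f"average reward over {T} steps: {cum_reward / T:.3f}")
```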
- Spatial Bandits:
  - Benchmarked GP-based UCB/TS against:
    - Zoom-In: a tree-based region-refinement strategy for Lipschitz bandits.
    - Classical UCB as a baseline.
  - Demonstrated the impact of kernel assumptions (noise, length scale) on GP performance (illustrated in the sketch after this list).
  - Analyzed the robustness of Zoom-In in irregular landscapes and its sensitivity to hyperparameters.
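The GP-UCB baseline can be sketched in a few lines. The snippet below assumes scikit-learn is available and fixes an RBF-plus-noise kernel so that the role of the length-scale and noise assumptions stays explicit; the reward surface, kernel values, and exploration schedule are illustrative choices, not the project's actual benchmark configuration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)

# Hypothetical smooth reward surface on [0, 1].
def true_mean(x):
    return np.sin(3 * np.pi * x) * np.exp(-x)

grid = np.linspace(0, 1, 200).reshape(-1, 1)        # discretised arm space
kernel = RBF(length_scale=0.1) + WhiteKernel(noise_level=0.05)

X_hist, y_hist = [], []
for t in range(1, 51):
    if len(X_hist) < 2:
        x_next = rng.uniform(0, 1)                   # seed with random pulls
    else:
        # optimizer=None keeps the assumed kernel hyper-parameters fixed, which
        # is exactly the setting where mis-specified length scale or noise hurts.
        gp = GaussianProcessRegressor(kernel=kernel, optimizer=None, normalize_y=True)
        gp.fit(np.array(X_hist).reshape(-1, 1), np.array(y_hist))
        mean, std = gp.predict(grid, return_std=True)
        beta = 2.0 * np.log(t + 1)                   # illustrative exploration schedule
        ucb = mean + np.sqrt(beta) * std             # GP-UCB acquisition
        x_next = float(grid[np.argmax(ucb), 0])

    reward = true_mean(x_next) + rng.normal(0, 0.1)
    X_hist.append(x_next)
    y_hist.append(reward)

best_x = grid[np.argmax(true_mean(grid.ravel()))][0]
print(f"last pull at x={X_hist[-1]:.3f}, true optimum near x={best_x:.3f}")
```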
- Hybrid Strategies (Future Direction):
  - Proposed combining GP-based exploration with Zoom-In-style regional refinement to balance accuracy and scalability (one possible shape of this combination is sketched below).
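As one purely illustrative reading of this direction, the sketch below shows the Zoom-In-style refinement component on its own: regions are scored with a simple UCB-plus-width rule and split as evidence accumulates, and a GP posterior mean could replace the empirical region means to supply the GP-based exploration side. The landscape, constants, and splitting rule are ad hoc and are not the project's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def true_mean(x):                       # hypothetical reward landscape on [0, 1]
    return np.sin(3 * np.pi * x) * np.exp(-x)

# Each region is [low, high, pull_count, reward_sum]; we pull a region's midpoint
# and split promising regions once they have accumulated enough evidence.
regions = [[0.0, 1.0, 0, 0.0]]
T = 300

for t in range(1, T + 1):
    # UCB-style score per region: empirical mean + count bonus + width bonus
    # (the width term plays the role of a Lipschitz slack; constants are ad hoc).
    scores = []
    for low, high, n, s in regions:
        mean = s / n if n > 0 else np.inf           # force one pull per new region
        bonus = np.sqrt(2 * np.log(t) / n) if n > 0 else np.inf
        scores.append(mean + bonus + (high - low))
    i = int(np.argmax(scores))

    low, high, n, s = regions[i]
    x = 0.5 * (low + high)
    reward = true_mean(x) + rng.normal(0, 0.1)
    regions[i] = [low, high, n + 1, s + reward]

    # Zoom in: once a region has been sampled often relative to its width, split it
    # in half (the parent's pull history is discarded here for simplicity).
    if regions[i][2] >= 10 and (high - low) > 0.02:
        regions.pop(i)
        mid = 0.5 * (low + high)
        regions += [[low, mid, 0, 0.0], [mid, high, 0, 0.0]]

best = max(regions, key=lambda r: (r[3] / r[2]) if r[2] else -np.inf)
print(f"most promising region: [{best[0]:.3f}, {best[1]:.3f}]")
```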
- Evaluation Metrics (a small helper computing these is sketched after this list):
  - Cumulative Regret
  - Instantaneous Regret
  - Distance to the Best Arm (for spatial scenarios)
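A small helper along these lines computes all three metrics from oracle information; the function name and signature are hypothetical, but the regret definitions match the formula given earlier.

```python
import numpy as np

def regret_metrics(mean_chosen, mean_best, x_chosen=None, x_best=None):
    """Per-step evaluation metrics from oracle information (names are ours).

    mean_chosen[t]: mean reward of the arm pulled at step t
    mean_best[t]:   mean reward of the best arm available at step t
    x_chosen/x_best: optional arm locations for the spatial scenarios
    """
    inst = np.asarray(mean_best, dtype=float) - np.asarray(mean_chosen, dtype=float)
    out = {"instantaneous": inst, "cumulative": np.cumsum(inst)}
    if x_chosen is not None and x_best is not None:
        diff = np.asarray(x_chosen, dtype=float) - np.asarray(x_best, dtype=float)
        diff = diff.reshape(len(inst), -1)            # (T, d) arm locations
        out["distance_to_best"] = np.linalg.norm(diff, axis=1)
    return out

# Example: regret_metrics([0.4, 0.7], [0.8, 0.8])["cumulative"] -> array([0.4, 0.5])
```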
- Methods Implemented:
  - Bayesian Inference in JAGS (for M1/M2)
  - Autoregressive Bandits
  - GP-UCB, GP-TS
  - Zoom-In Algorithm with multiple scoring rules (UCB, tuned-UCB, TS)
  - Baseline UCB and Thompson Sampling
Our methods build on key contributions in bandit theory, Bayesian inference, and spatial modeling:
- Desautels et al. (2014) – Gaussian Process Bandits with Exploration-Exploitation. Introduced a framework for safe exploration using GPs under constraints, demonstrating the effectiveness of GP-UCB in structured spaces.
- Chowdhury & Gopalan (2017) – Bandit Optimization with Gaussian Processes: Theory and Algorithms. Provided theoretical regret bounds and adaptive confidence bounds for GP-UCB, enhancing its robustness under unknown noise conditions.
- Salgia et al. (2021) – Lipschitz Bandits without the Lipschitz Constant. Extended the Zooming algorithm to settings without prior knowledge of smoothness constants, leading to more adaptive exploration in spatial bandits.
- Kandasamy et al. (2018) – Parallelised Bayesian Optimisation via Thompson Sampling. Discussed Bayesian optimization with Gaussian Processes, including Thompson Sampling in high-dimensional input spaces.
- Kleinberg et al. (2008) – Regret Bounds for Restless Bandits. Introduced the theoretical foundations for the Zooming algorithm and regret minimization under Lipschitz continuity assumptions.
- Lattimore & Szepesvári (2020) – Bandit Algorithms. Comprehensive textbook laying the theoretical groundwork for classical and modern bandit methods, including UCB, Thompson Sampling, and structured bandits.
- Hamilton (1994) – Time Series Analysis. Fundamental resource on autoregressive models and hidden Markov models, which we adapted for non-stationary bandits in our dynamic modeling framework.
This project advances the understanding of structured bandit problems by providing detailed empirical evidence and implementation insights. We highlight the strengths of Bayesian models in non-stationary settings and reveal the computational trade-offs in spatial domains. Our findings advocate for adaptive, problem-driven strategy design, and motivate further research in hybrid and scalable bandit architectures.