Contextual Multi-Armed Bandit

For the contextual multi-armed bandit (cMAB), used when user information (context) is available, we implemented a generalisation of the Thompson sampling algorithm (Agrawal and Goyal, 2014) based on PyMC3.
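Conceptually, the algorithm fits one Bayesian logistic regression per action on the context features: at each step it draws a set of coefficients from each action's posterior and plays the action whose sampled success probability is highest. The snippet below is only a minimal sketch of this selection step, with made-up independent Gaussian approximations of the coefficient posteriors (the actual Cmab class fits the posteriors with PyMC3 and is not limited to this approximation).

import numpy as np

# Conceptual sketch (not the pybandits implementation): Thompson sampling for one
# context vector x, assuming each action has a logistic regression whose coefficient
# posterior is approximated by an independent Gaussian per feature.
def thompson_select(x, posterior_means, posterior_stds, rng=None):
    rng = rng or np.random.default_rng()
    sampled_probs = {}
    for action, mu in posterior_means.items():
        beta = rng.normal(mu, posterior_stds[action])             # draw coefficients from the posterior
        sampled_probs[action] = 1.0 / (1.0 + np.exp(-x @ beta))   # P(r=1 | beta, x) via the sigmoid
    return max(sampled_probs, key=sampled_probs.get)              # play the action with highest sampled probability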


The following notebook shows how to use the Cmab class, which implements the algorithm above.

[1]:
import numpy as np
import pandas as pd
from pybandits.core.cmab import Cmab

First, we need to define the input context matrix \(X\) of size \((n\_samples, n\_features)\) and the list of possible actions \(a_i \in A\).

[2]:
# context
n_samples = 1000
n_features = 5
X = 2 * np.random.random_sample((n_samples, n_features)) - 1  # random float in the interval (-1, 1)
print('X: context matrix of shape (n_samples, n_features)')
print(X[:10])
X: context matrix of shape (n_samples, n_features)
[[-0.53211475 -0.40592956  0.05892565 -0.88067628 -0.84061481]
 [-0.95680954 -0.00540581  0.09148556 -0.82021004 -0.63425381]
 [-0.87792928 -0.51881823 -0.51767022 -0.05385187 -0.64499044]
 [-0.10569516  0.30847784 -0.353929   -0.94831998 -0.52175713]
 [-0.05088401  0.17155683 -0.4322128  -0.07509104 -0.78919832]
 [-0.88604157  0.55037109  0.42634479 -0.87179776 -0.69767766]
 [-0.0022063   0.99304089  0.76398198 -0.87343131 -0.12363411]
 [ 0.36371019  0.6660538   0.17177652 -0.08891719 -0.91070485]
 [-0.1056742  -0.72879406 -0.69367421 -0.8684397   0.70903817]
 [-0.15422305  0.31069811 -0.47487951  0.00853137  0.23793364]]
[3]:
# define actions
actions_ids = ['action A', 'action B', 'action C']

We can now initialise the bandit given the number of features and the list of actions \(a_i\).

[4]:
# init contextual Multi-Armed Bandit model
cmab = Cmab(n_features=n_features, actions_ids=actions_ids)

The predict function below returns the action selected by the bandit at time \(t\): \(a_t = \operatorname{argmax}_k P(r=1|\beta_k, x_t)\). The bandit selects one action for each sample of the context matrix \(X\).

[5]:
# predict action
pred_actions, _ = cmab.predict(X)
print('Recommended action: {}'.format(pred_actions[:10]))
Recommended action: ['action C' 'action C' 'action B' 'action B' 'action C' 'action C'
 'action B' 'action C' 'action B' 'action C']

Now, we observe the rewards from the environment. In this example rewards are randomly simulated.

[6]:
# simulate reward from environment
simulated_rewards = np.random.randint(2, size=n_samples)
print('Simulated rewards: {}'.format(simulated_rewards[:10]))
Simulated rewards: [1 0 0 0 0 0 0 0 1 1]
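The random rewards above are enough to exercise the API, but they carry no signal. Purely as an illustration, one could instead draw rewards from a hidden context-dependent model so that the bandit has something to learn; the weights below are made up:

# illustrative alternative: Bernoulli rewards whose probability depends on context and action
hidden_weights = {'action A': np.array([ 1.0,  0.0, 0.0, 0.0, 0.0]),
                  'action B': np.array([ 0.0,  1.0, 0.0, 0.0, 0.0]),
                  'action C': np.array([-1.0,  0.0, 0.0, 0.0, 1.0])}
logits = np.array([x @ hidden_weights[a] for x, a in zip(X, pred_actions)])
reward_probs = 1.0 / (1.0 + np.exp(-logits))              # sigmoid of the hidden linear model
simulated_rewards = np.random.binomial(1, reward_probs)   # one Bernoulli reward per sample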

Finally, we update the model by providing, for each sample: (i) its context \(x_t\), (ii) the action \(a_t\) selected by the bandit, and (iii) the corresponding reward \(r_t\).

[7]:
# update model
cmab.update(X, actions=pred_actions, rewards=simulated_rewards)
Auto-assigning NUTS sampler...
Initializing NUTS using adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [beta4, beta3, beta2, beta1, beta0, alpha]
Sampling 2 chains for 500 tune and 1_000 draw iterations (1_000 + 2_000 draws total) took 5 seconds.
Auto-assigning NUTS sampler...
Initializing NUTS using adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [beta4, beta3, beta2, beta1, beta0, alpha]
Sampling 2 chains for 500 tune and 1_000 draw iterations (1_000 + 2_000 draws total) took 3 seconds.
Auto-assigning NUTS sampler...
Initializing NUTS using adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [beta4, beta3, beta2, beta1, beta0, alpha]
Sampling 2 chains for 500 tune and 1_000 draw iterations (1_000 + 2_000 draws total) took 3 seconds.
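Putting the pieces together, a typical online workflow repeats the predict / observe / update cycle on successive batches of contexts. The loop below is a sketch that only reuses the predict and update calls shown above; the batch size and the random rewards are placeholders for a real environment.

# illustrative online loop: predict actions, observe rewards, update the bandit
batch_size = 100
for start in range(0, n_samples, batch_size):
    X_batch = X[start:start + batch_size]
    actions_batch, _ = cmab.predict(X_batch)                            # select one action per sample
    rewards_batch = np.random.randint(2, size=len(X_batch))             # placeholder for observed rewards
    cmab.update(X_batch, actions=actions_batch, rewards=rewards_batch)  # refresh the posteriors via MCMC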