Contextual Multi-Armed Bandit
For the contextual multi-armed bandit (cMAB), where user information (context) is available, we implemented a generalisation of the Thompson sampling algorithm (Agrawal and Goyal, 2014) based on PyMC.
The following notebook contains a usage example of the class CmabBernoulli, which implements the algorithm above.
[1]:
import numpy as np
from pybandits.cmab import CmabBernoulli
from pybandits.model import BayesianLogisticRegression, BnnLayerParams, BnnParams, StudentTArray
[2]:
n_samples = 1000
n_features = 5
First, we need to define the input context matrix \(X\) of size (\(n\_samples, n\_features\)) and the mapping of possible actions \(a_i \in A\) to their associated model.
[3]:
# context
X = 2 * np.random.random_sample((n_samples, n_features)) - 1  # random floats in the half-open interval [-1, 1)
print("X: context matrix of shape (n_samples, n_features)")
print(X[:10])
X: context matrix of shape (n_samples, n_features)
[[-6.27729994e-02 3.64241569e-02 8.07490987e-01 -4.69264936e-01
4.40862013e-01]
[ 5.22715536e-01 -1.01686473e-01 1.58127367e-01 -4.48403154e-01
3.32056267e-01]
[-2.37493482e-01 5.31812669e-01 9.21579411e-01 -3.74646767e-04
2.99091133e-01]
[-8.59296990e-01 -3.35181661e-02 -1.25170200e-02 8.33872560e-01
4.96203274e-01]
[ 8.46801110e-01 -1.60253205e-01 -8.76845482e-01 -1.24047908e-01
4.95856839e-01]
[ 3.12709496e-01 -5.83789543e-01 -1.15841358e-01 -7.30177852e-01
-1.17754922e-01]
[-1.53730939e-01 6.63229007e-01 7.00016201e-01 1.80066455e-01
-7.58149537e-01]
[ 1.89990895e-01 9.54926377e-01 -5.98092073e-01 9.89581378e-01
-8.99822348e-01]
[ 9.43759372e-01 -5.06548261e-01 1.11480662e-01 -9.36462854e-01
-4.79464061e-01]
[ 7.14464058e-01 -7.31254754e-01 -6.29802029e-01 1.83901148e-02
-9.81210640e-01]]
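The values printed above change on every run because `np.random.random_sample` draws from NumPy's global random state. For reproducible experiments, a seeded generator can be used instead. The following is a minimal sketch; the seed value 42 and the name `rng` are arbitrary choices and not part of the original notebook.

```python
import numpy as np

n_samples = 1000
n_features = 5

# A seeded generator makes the context matrix reproducible across runs.
rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily for illustration

# Draw floats uniformly from the half-open interval [-1, 1).
X = rng.uniform(low=-1.0, high=1.0, size=(n_samples, n_features))

print(X.shape)
```

Re-creating the generator with the same seed yields the same matrix, which makes later predictions and updates easier to compare between runs.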
[4]:
# define action model
bias = StudentTArray.cold_start(mu=1, sigma=2, shape=1)
weight = StudentTArray.cold_start(shape=(n_features, 1))
layer_params = BnnLayerParams(weight=weight, bias=bias)
model_params = BnnParams(bnn_layer_params=[layer_params])
update_method = "VI"
update_kwargs = {"fit": {"n": 100}, "batch_size": 128, "optimizer_type": "adam"}
actions = {
    "a1": BayesianLogisticRegression(
        model_params=model_params, update_method=update_method, update_kwargs=update_kwargs
    ),
    "a2": BayesianLogisticRegression(
        model_params=model_params, update_method=update_method, update_kwargs=update_kwargs
    ),
}
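Each action is modelled by a Bayesian logistic regression: given a weight vector and a bias, the reward probability for a context \(x\) is the sigmoid of \(x \cdot w + b\), and the parameters are treated as random variables rather than point estimates. The sketch below illustrates that idea in plain NumPy only; it does not use or reproduce PyBandits internals. The Student-t settings (`mu`, `sigma`, `nu`) and the helper names `sample_student_t` and `reward_probability` are hypothetical, chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_student_t(mu, sigma, nu, shape):
    """Draw from a location-scale Student-t distribution."""
    return mu + sigma * rng.standard_t(df=nu, size=shape)


def reward_probability(X, weight, bias):
    """Logistic model: P(r=1 | x) = sigmoid(x @ weight + bias)."""
    logits = X @ weight + bias
    return 1.0 / (1.0 + np.exp(-logits))


n_features = 5
X = rng.uniform(-1.0, 1.0, size=(3, n_features))

# One random draw of the parameters (hypothetical prior settings).
weight = sample_student_t(mu=0.0, sigma=1.0, nu=5.0, shape=(n_features, 1))
bias = sample_student_t(mu=1.0, sigma=2.0, nu=5.0, shape=1)

p = reward_probability(X, weight, bias)  # one probability per row, shape (3, 1)
print(p.ravel())
```

Repeating the draw gives a different set of probabilities each time, which is the source of exploration in Thompson sampling.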
We can now init the bandit given the mapping of actions \(a_i\) to their model.
[5]:
# init contextual Multi-Armed Bandit model
cmab = CmabBernoulli(actions=actions)
The predict function below returns the action selected by the bandit at time \(t\): \(a_t = \arg\max_k P(r=1|\beta_k, x_t)\), where \(\beta_k\) denotes the model parameters of action \(k\). The bandit selects one action for each sample of the context matrix \(X\).
[6]:
# predict action
pred_actions, _, _ = cmab.predict(X)
print("Recommended action: {}".format(pred_actions[:10]))
Recommended action: ['a2', 'a2', 'a1', 'a1', 'a2', 'a2', 'a1', 'a2', 'a2', 'a1']
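The selection rule can be sketched independently of the library. Assuming one sampled reward probability per action and per sample is already available (here filled with random numbers under the hypothetical name `sampled_probs`), the chosen action for each row is the one with the highest sampled probability.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 6
action_ids = ["a1", "a2"]

# Hypothetical sampled probabilities P(r=1 | beta_k, x_t), one column per action.
sampled_probs = rng.uniform(size=(n_samples, len(action_ids)))

# argmax over the action axis gives the index of the selected action per row.
best_idx = np.argmax(sampled_probs, axis=1)
selected = [action_ids[i] for i in best_idx]
print(selected)
```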
Now, we observe the rewards and the context from the environment. In this example, both the rewards and the context are randomly simulated.
[7]:
# simulate reward from environment
simulated_rewards = np.random.randint(2, size=n_samples).tolist()
print("Simulated rewards: {}".format(simulated_rewards[:10]))
Simulated rewards: [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
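Because the rewards above are drawn independently of both the context and the chosen action, the bandit has nothing to learn from them. A simulation in which the reward depends on the context can be sketched as follows. The per-action weight vectors in `true_weights` are made up for illustration, and the stand-in `pred_actions` replaces the bandit's output so that the block runs on its own.

```python
import numpy as np

rng = np.random.default_rng(7)
n_samples, n_features = 1000, 5
X = rng.uniform(-1.0, 1.0, size=(n_samples, n_features))

# Stand-in for the bandit's recommendations.
pred_actions = rng.choice(["a1", "a2"], size=n_samples).tolist()

# Hypothetical ground-truth weights: each action responds to a different feature.
true_weights = {
    "a1": np.array([3.0, 0.0, 0.0, 0.0, 0.0]),
    "a2": np.array([0.0, -3.0, 0.0, 0.0, 0.0]),
}

# Reward probability of the chosen action under a logistic model.
logits = np.array([X[i] @ true_weights[a] for i, a in enumerate(pred_actions)])
probs = 1.0 / (1.0 + np.exp(-logits))

# One Bernoulli draw per sample, converted to a plain list of 0/1 ints.
simulated_rewards = (rng.uniform(size=n_samples) < probs).astype(int).tolist()
print(simulated_rewards[:10])
```

With rewards of this kind, repeated predict and update rounds give the per-action models a signal to pick up, whereas purely random rewards do not.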
Finally, we update the model by providing, for each sample: (i) its context \(x_t\), (ii) the action \(a_t\) selected by the bandit, and (iii) the corresponding reward \(r_t\).
[8]:
# update model
cmab.update(context=X, actions=pred_actions, rewards=simulated_rewards)
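Conceptually, each sample only carries information about the action that was actually played, so an update can be pictured as splitting the batch by action and fitting each action's model on its own rows. The sketch below shows only that splitting step in NumPy; whether PyBandits organises its update this way internally is an assumption, and `split_by_action` is a hypothetical helper.

```python
import numpy as np


def split_by_action(context, actions, rewards):
    """Group context rows and rewards by the action that was played."""
    actions = np.asarray(actions)
    rewards = np.asarray(rewards)
    return {
        str(a): (context[actions == a], rewards[actions == a])
        for a in np.unique(actions)
    }


X = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]])
pred_actions = ["a1", "a2", "a1", "a2"]
rewards = [1, 0, 0, 1]

batches = split_by_action(X, pred_actions, rewards)
for action, (ctx, rew) in batches.items():
    print(action, ctx.shape, rew.tolist())
```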