cMAB Simulation

This notebook shows a simulation framework for the contextual multi-armed bandit (cMAB). It can be used to study the behaviour of the bandit algorithm, to evaluate results, and to run experiments on simulated data under different context, reward and action settings.

[1]:
from sklearn.datasets import make_classification

from pybandits.cmab import CmabBernoulli
from pybandits.cmab_simulator import CmabSimulator
from pybandits.model import BayesianLogisticRegression, BnnLayerParams, BnnParams, StudentTArray
/home/runner/.cache/pypoetry/virtualenvs/pybandits-vYJB-miV-py3.10/lib/python3.10/site-packages/pydantic/_migration.py:283: UserWarning: `pydantic.generics:GenericModel` has been moved to `pydantic.BaseModel`.
  warnings.warn(f'`{import_path}` has been moved to `{new_location}`.')
/home/runner/.cache/pypoetry/virtualenvs/pybandits-vYJB-miV-py3.10/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

First we need to define the simulation parameters. The parameters are split into two parts. The general parameters contain:

  • Number of update rounds

  • Number of samples per batch in each update round

  • Seed for reproducibility

  • Verbosity enabler

  • Visualization enabler

The problem definition parameters contain:

  • Number of groups

  • Number of features

Data are processed in batches of size n>=1. For each batch of simulated samples, the cMAB selects one action per sample and collects the corresponding simulated reward. The prior parameters are then updated based on the rewards returned for the recommended actions.

[2]:
# general simulator parameters
n_updates = 10
batch_size = 100
random_seed = None
verbose = True
visualize = True
[3]:
# problem definition simulation parameters
n_groups = 3
n_features = 5

Next, we initialize the context matrix \(X\) and the groups of samples. Samples that belong to the same group have features that come from the same distribution. Then, the action model and the cMAB are defined. We define three actions, each with a Bayesian Logistic Regression model. The model is defined by a Student-T prior for the intercept and a Student-T prior for each feature coefficient.

[4]:
# init context matrix and groups

context, group = make_classification(
    n_samples=batch_size * n_updates, n_features=n_features, n_informative=n_features, n_redundant=0, n_classes=n_groups
)
group = [str(g) for g in group]
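
The context above holds batch_size * n_updates rows, so each update round can consume one contiguous slice of it. The following is a minimal sketch of that bookkeeping using plain Python lists. The helper iter_batches and the toy data are hypothetical and only illustrate the slicing; this is not how CmabSimulator is implemented internally.

```python
from typing import Iterator, List, Sequence, Tuple


def iter_batches(
    rows: Sequence, labels: Sequence[str], batch_size: int
) -> Iterator[Tuple[int, List, List[str]]]:
    """Yield (round index, batch rows, batch group labels) in order."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    for start in range(0, len(rows), batch_size):
        stop = start + batch_size
        yield start // batch_size, list(rows[start:stop]), list(labels[start:stop])


# Toy stand-in for the context matrix: 6 samples with 2 features each.
toy_context = [[0.1 * i, -0.1 * i] for i in range(6)]
toy_group = ["0", "1", "2", "0", "1", "2"]

batches = list(iter_batches(toy_context, toy_group, batch_size=2))
batch_sizes = [len(rows) for _, rows, _ in batches]
print(batch_sizes)  # three update rounds of two samples each
```

With batch_size=100 and n_updates=10, the same slicing would give ten rounds of 100 samples.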
[5]:
# define action model


def create_model_params(n_features, bias_mu, bias_sigma):
    """Create model parameters for Bayesian Logistic Regression."""

    bias = StudentTArray.cold_start(mu=bias_mu, sigma=bias_sigma, shape=1)
    weight = StudentTArray.cold_start(shape=(n_features, 1))
    layer_params = BnnLayerParams(weight=weight, bias=bias)
    model_params = BnnParams(bnn_layer_params=[layer_params])
    return model_params


actions = {
    "a1": BayesianLogisticRegression(
        model_params=create_model_params(n_features=n_features, bias_mu=1, bias_sigma=2), update_method="VI"
    ),
    "a2": BayesianLogisticRegression(
        model_params=create_model_params(n_features=n_features, bias_mu=1, bias_sigma=2), update_method="VI"
    ),
    "a3": BayesianLogisticRegression(
        model_params=create_model_params(n_features=n_features, bias_mu=1, bias_sigma=2), update_method="VI"
    ),
}
# init contextual Multi-Armed Bandit model
cmab = CmabBernoulli(actions=actions)
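
To make the role of the priors more concrete, the sketch below draws one set of coefficients from Student-t priors and turns a context vector into a reward probability through the logistic function. It uses only the Python standard library. The degrees of freedom (nu=5) and the weight prior (mu=0, sigma=1) are assumptions made for illustration, since the defaults used by StudentTArray.cold_start are not shown in this notebook, and all function names here are hypothetical.

```python
import math
import random
from typing import Sequence


def sample_student_t(rng: random.Random, mu: float, sigma: float, nu: float) -> float:
    """Draw from a location-scale Student-t as mu + sigma * Z / sqrt(V / nu)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    return mu + sigma * z / math.sqrt(v / nu)


def sigmoid(s: float) -> float:
    """Logistic function, evaluated on the branch that cannot overflow."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def sample_reward_probability(
    rng: random.Random,
    x: Sequence[float],
    bias_mu: float = 1.0,
    bias_sigma: float = 2.0,
    nu: float = 5.0,  # assumed degrees of freedom
) -> float:
    """Sample bias and weights from their priors, then apply the logistic link."""
    bias = sample_student_t(rng, bias_mu, bias_sigma, nu)
    weights = [sample_student_t(rng, 0.0, 1.0, nu) for _ in x]  # assumed weight prior
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


rng = random.Random(0)
p = sample_reward_probability(rng, x=[0.5, -1.2, 0.3, 0.0, 2.0])
print(0.0 <= p <= 1.0)  # True
```

Each call produces a different probability because the coefficients are random draws; the variational update ("VI") then narrows these distributions as rewards are observed.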

Finally, we need to define the probabilities of positive rewards for each action/group pair, i.e. the ground truth. For example, ‘Action A’: 0.8 for group ‘0’ means that if the bandit selects ‘Action A’ for samples that belong to group ‘0’, then the environment returns a positive reward with 80% probability. Here, probs_reward is set to None so that, as the code comment notes, the reward probabilities are initialized randomly using splines.

[6]:
# init probability of rewards randomly using splines
probs_reward = None
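
The ground truth described above can be pictured as a lookup table from (group, action) to a Bernoulli success probability. The following sketch uses a hypothetical nested dict and a hypothetical draw_reward helper; the exact format that CmabSimulator expects for probs_reward is not shown in this notebook, so this should not be read as the library's input format.

```python
import random

# Hypothetical ground truth: probability of a positive reward per group and action.
toy_probs_reward = {
    "0": {"a1": 0.2, "a2": 0.8, "a3": 0.5},
    "1": {"a1": 0.7, "a2": 0.1, "a3": 0.3},
    "2": {"a1": 0.6, "a2": 0.2, "a3": 0.6},
}


def draw_reward(rng: random.Random, group: str, action: str) -> int:
    """Return 1 with the ground-truth probability for (group, action), else 0."""
    return int(rng.random() < toy_probs_reward[group][action])


rng = random.Random(42)
n_draws = 10_000
positive_rate = sum(draw_reward(rng, "0", "a2") for _ in range(n_draws)) / n_draws
print(round(positive_rate, 2))  # close to the ground-truth value of 0.8
```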

Now, we initialize the cMAB as shown in the previous notebook and the CmabSimulator with the parameters set above.

[7]:
# init simulation
cmab_simulator = CmabSimulator(
    mab=cmab,
    group=group,
    batch_size=batch_size,
    n_updates=n_updates,
    probs_reward=probs_reward,
    context=context,
    verbose=verbose,
)

Now we can start the simulation by executing run(), which performs the following steps:

For i = 0 to n_updates - 1:
    Extract batch[i] of samples from X
    For each simulated sample in batch[i], the model recommends the action with the highest reward probability and collects the corresponding simulated reward
    Model priors are updated using the recommended actions and the returned rewards
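
The steps above can be sketched as a small, self-contained loop. To keep it runnable with the standard library only, the sketch swaps the Bayesian logistic regression for a much simpler Beta-Bernoulli Thompson sampler that keeps one belief per (group, action) and ignores the context features. TOY_TRUTH, TOY_ACTIONS and run_toy_simulation are hypothetical names, and the loop only illustrates the batch structure, not the internals of CmabSimulator.run().

```python
import random

# Hypothetical ground truth: P(positive reward) per group and action.
TOY_TRUTH = {
    "0": {"a1": 0.1, "a2": 0.9, "a3": 0.1},
    "1": {"a1": 0.9, "a2": 0.1, "a3": 0.1},
    "2": {"a1": 0.1, "a2": 0.1, "a3": 0.9},
}
TOY_ACTIONS = ["a1", "a2", "a3"]


def run_toy_simulation(n_updates: int, batch_size: int, seed: int = 0) -> dict:
    """Return how often each action was recommended per group."""
    rng = random.Random(seed)
    groups = sorted(TOY_TRUTH)
    # Beta(alpha, beta) belief per (group, action), starting from a uniform Beta(1, 1).
    belief = {(g, a): [1, 1] for g in groups for a in TOY_ACTIONS}
    counts = {(g, a): 0 for g in groups for a in TOY_ACTIONS}

    for _ in range(n_updates):
        # 1. Extract a batch (here: draw group labels at random).
        batch = [rng.choice(groups) for _ in range(batch_size)]

        # 2. Recommend one action per sample and collect its simulated reward.
        observed = []
        for g in batch:
            sampled = {a: rng.betavariate(*belief[(g, a)]) for a in TOY_ACTIONS}
            action = max(sampled, key=sampled.get)
            reward = int(rng.random() < TOY_TRUTH[g][action])
            observed.append((g, action, reward))

        # 3. Update the beliefs only after the whole batch has been served.
        for g, action, reward in observed:
            belief[(g, action)][0] += reward
            belief[(g, action)][1] += 1 - reward
            counts[(g, action)] += 1
    return counts


counts = run_toy_simulation(n_updates=10, batch_size=100)
for g in sorted(TOY_TRUTH):
    most_recommended = max(TOY_ACTIONS, key=lambda a: counts[(g, a)])
    print(g, most_recommended)
```

Because the beliefs change only between batches, the first batch is served from the uninformed priors; later batches concentrate on the action with the highest ground-truth probability for each group.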

Finally, we can visualize the results of the simulation. Consistent with the ground truth, ‘a2’ was the action recommended most often for samples that belong to group ‘0’, ‘a1’ for group ‘1’, and both ‘a1’ and ‘a3’ for group ‘2’.

[8]:
cmab_simulator.run()
/home/runner/.cache/pypoetry/virtualenvs/pybandits-vYJB-miV-py3.10/lib/python3.10/site-packages/pymc/data.py:384: FutureWarning: Data is now always mutable. Specifying the `mutable` kwarg will raise an error in a future release
  warnings.warn(
/home/runner/.cache/pypoetry/virtualenvs/pybandits-vYJB-miV-py3.10/lib/python3.10/site-packages/pytensor/link/c/cmodule.py:2968: UserWarning: PyTensor could not link to a BLAS installation. Operations that might benefit from BLAS will be severely degraded.
This usually happens when PyTensor is installed via pip. We recommend it be installed via conda/mamba/pixi instead.
Alternatively, you can use an experimental backend such as Numba or JAX that perform their own BLAS optimizations, by setting `pytensor.config.mode == 'NUMBA'` or passing `mode='NUMBA'` when compiling a PyTensor function.
For more options and details see https://pytensor.readthedocs.io/en/latest/troubleshooting.html#how-do-i-configure-test-my-blas-library
  warnings.warn(
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
/home/runner/work/pybandits/pybandits/pybandits/simulator.py:221: RuntimeWarning: overflow encountered in exp
  return np.where(s >= 0, 1 / (1 + np.exp(-s)), np.exp(s) / (1 + np.exp(s))).item()
/home/runner/work/pybandits/pybandits/pybandits/simulator.py:221: RuntimeWarning: invalid value encountered in scalar divide
  return np.where(s >= 0, 1 / (1 + np.exp(-s)), np.exp(s) / (1 + np.exp(s))).item()
/home/runner/work/pybandits/pybandits/pybandits/simulator.py:328: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  self._results = pd.concat((self._results, batch_results), ignore_index=True)
/home/runner/.cache/pypoetry/virtualenvs/pybandits-vYJB-miV-py3.10/lib/python3.10/site-packages/rich/live.py:231:
UserWarning: install "ipywidgets" for Jupyter support
  warnings.warn('install "ipywidgets" for Jupyter support')
Finished [100%]: Average Loss = 144.04
Finished [100%]: Average Loss = 425.72
Finished [100%]: Average Loss = 543.87
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
Finished [100%]: Average Loss = 44.001
Finished [100%]: Average Loss = 281.2
Finished [100%]: Average Loss = 810.11
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
Finished [100%]: Average Loss = 30.812
Finished [100%]: Average Loss = 54.165
Finished [100%]: Average Loss = 1,077.2
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
Finished [100%]: Average Loss = 49.97
Finished [100%]: Average Loss = 4.2044
Finished [100%]: Average Loss = 482.39
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
Finished [100%]: Average Loss = 33.724
Finished [100%]: Average Loss = 2.7433
Finished [100%]: Average Loss = 608.91
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
Finished [100%]: Average Loss = 24.827
Finished [100%]: Average Loss = 0.14918
Finished [100%]: Average Loss = 438.71
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
Finished [100%]: Average Loss = 49.209
Finished [100%]: Average Loss = 0.41397
Finished [100%]: Average Loss = 267.89
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
Finished [100%]: Average Loss = 31.827
Finished [100%]: Average Loss = 207.31
/home/runner/.cache/pypoetry/virtualenvs/pybandits-vYJB-miV-py3.10/lib/python3.10/site-packages/pymc/data.py:384: FutureWarning: Data is now always mutable. Specifying the `mutable` kwarg will raise an error in a future release
  warnings.warn(
Sampling: [bias_0, out, weight_0]
/home/runner/.cache/pypoetry/virtualenvs/pybandits-vYJB-miV-py3.10/lib/python3.10/site-packages/pymc/data.py:384: FutureWarning: Data is now always mutable. Specifying the `mutable` kwarg will raise an error in a future release
  warnings.warn(
Sampling: [bias_0, out, weight_0]
/home/runner/.cache/pypoetry/virtualenvs/pybandits-vYJB-miV-py3.10/lib/python3.10/site-packages/pymc/data.py:384: FutureWarning: Data is now always mutable. Specifying the `mutable` kwarg will raise an error in a future release
  warnings.warn(
Sampling: [bias_0, out, weight_0]
/home/runner/work/pybandits/pybandits/pybandits/simulator.py:221: RuntimeWarning: overflow encountered in exp
  return np.where(s >= 0, 1 / (1 + np.exp(-s)), np.exp(s) / (1 + np.exp(s))).item()
/home/runner/work/pybandits/pybandits/pybandits/simulator.py:221: RuntimeWarning: invalid value encountered in scalar divide
  return np.where(s >= 0, 1 / (1 + np.exp(-s)), np.exp(s) / (1 + np.exp(s))).item()
Finished [100%]: Average Loss = 36.6
Finished [100%]: Average Loss = 0.55147
Finished [100%]: Average Loss = 158.57
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
Sampling: [bias_0, out, weight_0]
Finished [100%]: Average Loss = 50.125
Finished [100%]: Average Loss = 2.9248
Finished [100%]: Average Loss = 37.483
2025-06-23 11:39:31.191 | INFO     | pybandits.simulator:_print_results:583 - Simulation results (first 10 observations):

2025-06-23 11:39:31.212 | INFO     | pybandits.simulator:_print_results:584 - Count of actions selected by the bandit:

2025-06-23 11:39:31.215 | INFO     | pybandits.simulator:_print_results:585 - Observed proportion of positive rewards for each action:
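
A note on the RuntimeWarning lines in the log above: the sigmoid at simulator.py:221 passes both branch expressions to np.where, and NumPy evaluates both before selecting, so np.exp(s) overflows for large positive s (and the resulting inf / inf is the "invalid value" in the divide) even though that branch is then discarded. The returned value is still the correct one. The following is a minimal sketch of a warning-free variant, not the library's implementation; the name stable_sigmoid is hypothetical.

```python
import warnings

import numpy as np


def stable_sigmoid(s):
    """Sigmoid that only ever exponentiates a non-positive number."""
    s = np.asarray(s, dtype=float)
    # z = exp(-|s|) lies in [0, 1], so it cannot overflow.
    z = np.exp(-np.abs(s))
    # s >= 0: 1 / (1 + exp(-s)) = 1 / (1 + z)
    # s <  0: exp(s) / (1 + exp(s)) = z / (1 + z)
    return np.where(s >= 0, 1.0 / (1.0 + z), z / (1.0 + z))


with warnings.catch_warnings():
    # Turn any warning into an exception to show that none is raised.
    warnings.simplefilter("error")
    probs = stable_sigmoid([-1000.0, 0.0, 1000.0])

print(probs)
```

scipy.special.expit computes the same function and could be used instead if SciPy is available.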

Furthermore, we can examine how many times each action was selected in each group and batch, and the proportion of positive rewards observed for each action in each group.

[9]:
cmab_simulator.selected_actions_count
[9]:
action        a1   a2   a3  cum_a1  cum_a2  cum_a3
group batch
0     0.0     11    8   11      11       8      11
1     0.0     13    8   16      13       8      16
2     0.0     14    6   13      14       6      13
0     1.0      2    6   23      13      14      34
1     1.0     15   14    3      28      22      19
2     1.0      7   16   14      21      22      27
0     2.0     13   21    9      26      35      43
1     2.0     13   16    2      41      38      21
2     2.0      9   12    5      30      34      32
0     3.0      7   10    5      33      45      48
1     3.0     19   14    2      60      52      23
2     3.0     19   24    0      49      58      32
0     4.0     18   13    3      51      58      51
1     4.0     17   19    1      77      71      24
2     4.0     17   10    2      66      68      34
0     5.0     13   13    0      64      71      51
1     5.0     19   18    0      96      89      24
2     5.0     20   16    1      86      84      35
0     6.0     17   15    1      81      86      52
1     6.0     17   12    1     113     101      25
2     6.0     25   12    0     111      96      35
0     7.0     24   16    0     105     102      52
1     7.0     18   13    0     131     114      25
2     7.0     14   15    0     125     111      35
0     8.0     26    8    0     131     110      52
1     8.0     26   10    0     157     124      25
2     8.0     20    8    2     145     119      37
0     9.0     29    5    5     160     115      57
1     9.0     17    6    2     174     130      27
2     9.0     29    6    1     174     125      38
0     total  160  115   57     160     115      57
1     total  174  130   27     174     130      27
2     total  174  125   38     174     125      38
total total  508  370  122     508     370     122
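
Reading the table, the cum_* columns appear to be running totals of the per-batch counts within each group, and the counts in each batch add up to batch_size. The sketch below reproduces both relations with pandas for the first two batches, using values copied by hand from the table. It is an illustration under that reading, not the simulator's own code, and the variable names are hypothetical.

```python
import pandas as pd

# Per-batch selection counts for batches 0 and 1, copied from the table above.
counts = pd.DataFrame(
    {
        "group": [0, 1, 2, 0, 1, 2],
        "batch": [0, 0, 0, 1, 1, 1],
        "a1": [11, 13, 14, 2, 15, 7],
        "a2": [8, 8, 6, 6, 14, 16],
        "a3": [11, 16, 13, 23, 3, 14],
    }
).set_index(["group", "batch"])

# Running total of each action's count within each group.
cumulative = counts.groupby(level="group").cumsum().add_prefix("cum_")
table = counts.join(cumulative)

# Number of selections in each batch, summed over groups and actions.
samples_per_batch = counts.groupby(level="batch").sum().sum(axis=1)

print(table)
print(samples_per_batch)
```

Each entry of samples_per_batch should equal the batch_size of 100 set at the top of the notebook.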
[10]:
cmab_simulator.positive_reward_proportion
[10]:
              proportion
action group
a1     0            0.05
       1        0.534483
       2        0.718391
a2     0        0.469565
       1        0.630769
       2           0.848
a3     0        0.105263
       1        0.037037
       2        0.052632
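
The two tables can be combined to get one overall positive reward proportion per action. The sketch below does this by hand: it multiplies each per-group proportion by the matching "total" count from selected_actions_count to recover the number of positive rewards, then divides by the total number of selections. All values are copied from the outputs above, the variable names are hypothetical, and nothing here calls the simulator.

```python
import pandas as pd

groups = pd.Index([0, 1, 2], name="group")

# Per-group selection totals, copied from the "total" rows of selected_actions_count.
counts = pd.DataFrame(
    {"a1": [160, 174, 174], "a2": [115, 130, 125], "a3": [57, 27, 38]},
    index=groups,
)

# Per-group positive reward proportions, copied from positive_reward_proportion.
proportion = pd.DataFrame(
    {
        "a1": [0.05, 0.534483, 0.718391],
        "a2": [0.469565, 0.630769, 0.848],
        "a3": [0.105263, 0.037037, 0.052632],
    },
    index=groups,
)

# Recover the number of positive rewards; rounding undoes the 6-digit display precision.
positives = (counts * proportion).round()

# Overall proportion per action = total positive rewards / total selections.
overall = positives.sum() / counts.sum()

# Action with the highest observed proportion in each group.
best_per_group = proportion.idxmax(axis=1)

print(overall.round(3))
print(best_per_group)
```

These are observed rates from a single run with random_seed = None, so a re-run would give different numbers.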