{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# cMAB Simulation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook demonstrates a simulation framework for the contextual multi-armed bandit (cMAB). It lets you study the behaviour of the bandit algorithm, evaluate results, and run experiments on simulated data under different context, reward, and action settings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import make_classification\n",
    "\n",
    "from pybandits.cmab import CmabBernoulli\n",
    "from pybandits.cmab_simulator import CmabSimulator\n",
    "from pybandits.model import BayesianLogisticRegression, BnnLayerParams, BnnParams, StudentTArray"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we define the simulation parameters, which are split into two parts. The general parameters are:\n",
    "- Number of update rounds\n",
    "- Number of samples per batch in each update round\n",
    "- Seed for reproducibility\n",
    "- Verbosity flag\n",
    "- Visualization flag\n",
    "\n",
    "The problem definition parameters are:\n",
    "- Number of groups\n",
    "- Number of features\n",
    "\n",
    "Data are processed in batches of size n >= 1. For each sample in a batch, the cMAB selects one action and collects the corresponding simulated reward. The prior parameters are then updated based on the rewards returned for the recommended actions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# general simulator parameters\n",
    "n_updates = 10\n",
    "batch_size = 100\n",
    "random_seed = None\n",
    "verbose = True\n",
    "visualize = True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# problem definition simulation parameters\n",
    "n_groups = 3\n",
    "n_features = 5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we initialize the context matrix $X$ and the groups of samples. Samples that belong to the same group have features that come from the same distribution.\n",
    "Then we define the action models and the cMAB. We define three actions, each backed by a Bayesian logistic regression model. Each model is specified by a Student-t prior on the intercept and a Student-t prior on each feature coefficient."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# init context matrix and groups\n",
    "\n",
    "context, group = make_classification(\n",
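    "    n_samples=batch_size * n_updates,\n",
    "    n_features=n_features,\n",
    "    n_informative=n_features,\n",
    "    n_redundant=0,\n",
    "    n_classes=n_groups,\n",
    "    random_state=random_seed,  # reuse the seed defined above so the context is reproducible\n",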
    ")\n",
    "group = [str(g) for g in group]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define action model\n",
    "\n",
    "\n",
    "def create_model_params(n_features, bias_mu, bias_sigma):\n",
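    "    \"\"\"Create model parameters for a single-layer Bayesian logistic regression.\"\"\"\n",
    "\n",
    "    # Student-t prior on the intercept, built from the given mu and sigma\n",
    "    bias = StudentTArray.cold_start(mu=bias_mu, sigma=bias_sigma, shape=1)\n",
    "    # Student-t priors on the feature coefficients, using the default prior parameters\n",
    "    weight = StudentTArray.cold_start(shape=(n_features, 1))\n",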
    "    layer_params = BnnLayerParams(weight=weight, bias=bias)\n",
    "    model_params = BnnParams(bnn_layer_params=[layer_params])\n",
    "    return model_params\n",
    "\n",
    "\n",
    "actions = {\n",
    "    \"a1\": BayesianLogisticRegression(\n",
    "        model_params=create_model_params(n_features=n_features, bias_mu=1, bias_sigma=2), update_method=\"VI\"\n",
    "    ),\n",
    "    \"a2\": BayesianLogisticRegression(\n",
    "        model_params=create_model_params(n_features=n_features, bias_mu=1, bias_sigma=2), update_method=\"VI\"\n",
    "    ),\n",
    "    \"a3\": BayesianLogisticRegression(\n",
    "        model_params=create_model_params(n_features=n_features, bias_mu=1, bias_sigma=2), update_method=\"VI\"\n",
    "    ),\n",
    "}\n",
    "# init contextual Multi-Armed Bandit model\n",
    "cmab = CmabBernoulli(actions=actions)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we need to define the probability of a positive reward for each action/group pair, i.e. the ground truth. For example, a probability of 0.8 for action 'a1' and group '0' means that if the bandit selects 'a1' for samples that belong to group '0', the environment returns a positive reward with 80% probability. Here we leave `probs_reward` unset, so these probabilities are initialized randomly."
   ]
  },
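  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of what such a ground truth encodes, the cell below builds a small table of reward probabilities and reads off the best action per group. It is a standalone sketch that is not passed to the simulator: the names `example_probs`, `example_best`, `example_rng` and `example_rewards` are hypothetical, the values are made up, and the plain dictionary is not necessarily the format that `probs_reward` expects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "# Hypothetical ground truth (made-up values): reward probability per group and action.\n",
    "example_probs = {\n",
    "    \"0\": {\"a1\": 0.05, \"a2\": 0.80, \"a3\": 0.05},\n",
    "    \"1\": {\"a1\": 0.80, \"a2\": 0.05, \"a3\": 0.05},\n",
    "}\n",
    "\n",
    "# The best action for a group is the one with the highest reward probability.\n",
    "example_best = {g: max(p, key=p.get) for g, p in example_probs.items()}\n",
    "print(example_best)\n",
    "\n",
    "# Simulate the environment: action 'a2' played 10,000 times on samples of group '0'.\n",
    "example_rng = random.Random(0)\n",
    "example_rewards = [int(example_rng.random() < example_probs[\"0\"][\"a2\"]) for _ in range(10_000)]\n",
    "\n",
    "# The empirical rate of positive rewards should be close to the ground-truth value of 0.80.\n",
    "print(sum(example_rewards) / len(example_rewards))"
   ]
  },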
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# init probability of rewards randomly using splines\n",
    "probs_reward = None"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we initialize the `CmabSimulator` with the cMAB defined above (set up as shown in the previous notebook) and the simulation parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# init simulation\n",
    "cmab_simulator = CmabSimulator(\n",
    "    mab=cmab,\n",
    "    group=group,\n",
    "    batch_size=batch_size,\n",
    "    n_updates=n_updates,\n",
    "    probs_reward=probs_reward,\n",
    "    context=context,\n",
    "    verbose=verbose,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can start the simulation by executing `run()`, which performs the following steps:\n",
    "```\n",
    "For i = 1 to n_updates:\n",
    "    Extract batch[i] of samples from X\n",
    "    For each sample in batch[i], the model recommends the action with the highest reward probability, and the corresponding simulated reward is collected\n",
    "    Model priors are updated using the recommended actions and the returned rewards\n",
    "```\n",
    "Finally, we can visualize the results of the simulation. For each group, the action recommended most often is expected to be the one with the highest ground-truth reward probability. Because the reward probabilities are initialized randomly here, the best action per group can differ from run to run."
   ]
  },
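  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the steps above concrete, the cell below sketches the same batch loop on a toy problem. It is a simplified stand-in, not how `CmabSimulator` or `CmabBernoulli` are implemented: instead of Bayesian logistic regression on the context, it assumes one Beta-Bernoulli Thompson sampling posterior per group and action. All `toy_*` names and the probability values are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "# Hypothetical ground truth (made-up values): reward probability per group and action.\n",
    "toy_probs = {\n",
    "    \"0\": {\"a1\": 0.05, \"a2\": 0.80, \"a3\": 0.05},\n",
    "    \"1\": {\"a1\": 0.80, \"a2\": 0.05, \"a3\": 0.05},\n",
    "}\n",
    "toy_actions = [\"a1\", \"a2\", \"a3\"]\n",
    "toy_rng = random.Random(0)\n",
    "\n",
    "# One Beta(alpha, beta) posterior per (group, action) pair, starting from a uniform Beta(1, 1).\n",
    "toy_posterior = {g: {a: [1, 1] for a in toy_actions} for g in toy_probs}\n",
    "toy_counts = {g: {a: 0 for a in toy_actions} for g in toy_probs}\n",
    "\n",
    "for _ in range(10):  # update rounds\n",
    "    # Extract a batch of 100 simulated samples; here each sample is just its group label.\n",
    "    toy_batch = [toy_rng.choice(list(toy_probs)) for _ in range(100)]\n",
    "    toy_feedback = []\n",
    "    for g in toy_batch:\n",
    "        # Recommend the action whose sampled reward probability is highest.\n",
    "        action = max(toy_actions, key=lambda a: toy_rng.betavariate(*toy_posterior[g][a]))\n",
    "        # Collect the simulated binary reward from the ground truth.\n",
    "        reward = int(toy_rng.random() < toy_probs[g][action])\n",
    "        toy_feedback.append((g, action, reward))\n",
    "        toy_counts[g][action] += 1\n",
    "    # Update the posteriors only after the whole batch has been collected.\n",
    "    for g, action, reward in toy_feedback:\n",
    "        toy_posterior[g][action][0] += reward\n",
    "        toy_posterior[g][action][1] += 1 - reward\n",
    "\n",
    "# Number of times each action was recommended, per group.\n",
    "print(toy_counts)"
   ]
  },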
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cmab_simulator.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Furthermore, we can examine the number of times each action was selected and the proportion of positive rewards for each action."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cmab_simulator.selected_actions_count"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cmab_simulator.positive_reward_proportion"
   ]
  }
 ],
 "metadata": {
  "hide_input": false,
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "metadata": {
     "collapsed": false
    },
    "source": []
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}