{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Stochastic Multi-Armed Bandit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the stochastic multi-armed bandit (sMAB), we implemented a Bernoulli multi-armed bandit based on Thompson sampling algorithm ([Agrawal and Goyal, 2012](http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf))." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![title](img/smab.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following notebook contains an example of usage of the class Smab, which implements the algorithm above." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we need to define the list of possible actions $a_i \\in A$ and the priors parameters for each Beta distibution $\\alpha, \\beta$. By setting them all to 1, all actions have the same probability to be selected by the bandit at the beginning before the first update." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# define actions\n", "action_ids = [\"Action A\", \"Action B\", \"Action C\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "from pybandits.model import Beta\n", "from pybandits.smab import SmabBernoulli" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n_samples = 1000" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we need to define the mapping of possible actions $a_i \\in A$ to their associated model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# define action model\n", "actions = {\n", " \"a1\": Beta(),\n", " \"a2\": Beta(),\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now init the bandit given the mapping of actions $a_i$ to their model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# init stochastic Multi-Armed Bandit model\n", "smab = SmabBernoulli(actions=actions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The predict function below returns the action selected by the bandit at time $t$: $a_t = argmax_k \\theta_k^t$, where $\\theta_k^t$ is the sample from the Beta distribution $k$ at time $t$. The bandit selects one action at time when n_samples=1, or it selects batches of samples when n_samples>1." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# predict actions\n", "pred_actions, _ = smab.predict(n_samples=n_samples)\n", "print(\"Recommended action: {}\".format(pred_actions[:10]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we observe the rewards from the environment. In this example rewards are randomly simulated. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# simulate reward from environment\n", "simulated_rewards = np.random.randint(2, size=n_samples)\n", "print(\"Simulated rewards: {}\".format(simulated_rewards[:10]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we update the model providing per each action sample: (i) the action $a_t$ selected by the bandit, (ii) the corresponding reward $r_t$." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "smab.update(actions=pred_actions, rewards=simulated_rewards)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "hide_input": false, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } } }, "nbformat": 4, "nbformat_minor": 4 }