{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Sample Size Determination"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This tutorial shows how to compute the minimum sample size needed for an A/B test experiment with two variants, so \n",
    "called control and treatment groups. This problem is usually referred as __Sample Size Determination (SSD)__. \n",
    "\n",
    "Let's import first the tools needed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from abexp.core.design import SampleSize"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Formulate hp #1__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Which kind of A/B experiment do you intend to run?\n",
    "\n",
    "* __Compare means__: the experiment aims to compare the mean of a certain metrics in the control group versus the \n",
    "treatment group. This metrics is a continuous variable and it represents the kpi of the experiment, e.g. revenue.\n",
    "\n",
    "* __Compare proportions__: the experiment aims to compare the proportion/probability of a certain metrics the control \n",
    "group versus the treatment group. This metrics represents the kpi of the experiment, e.g. %churners, probability of \n",
    "having premium users."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Compare means"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Formulate hp #2__\n",
    "\n",
    "Here you need to define the desirable minimum delta between control and treatment groups:\n",
    "\n",
    "* What is the mean of the control group?\n",
    "* What is the standard deviation of the control group?\n",
    "* What is the desirable/expected mean of the treatment group?\n",
    "\n",
    "Define these according to your domain expertise. Please formulate reasonable values that you expect see at the end of \n",
    "the experiment (after that the treatment will be applied to the treatment group)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Compute sample size__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Minimum sample size per each group = 6280\n"
     ]
    }
   ],
   "source": [
    "sample_size = SampleSize.ssd_mean(mean_contr=790, mean_treat=800, std_contr=200, alpha=0.05, power=0.8)\n",
    "print('Minimum sample size per each group = {}'.format(sample_size))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Compare proportions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Formulate hp #2__\n",
    "\n",
    "Here you need to define the desirable minimum delta between control and treatment groups:\n",
    "\n",
    "* What is the proportion in the control group?\n",
    "* What is the desirable/expected proportion in the treatment group?\n",
    "\n",
    "Define these according to your domain expertise. Please formulate reasonable values that you expect see at the end of \n",
    "the experiment (after that the treatment will be applied to the treatment group)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Compute sample size__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Minimum sample size per each group = 8538\n"
     ]
    }
   ],
   "source": [
    "sample_size = SampleSize.ssd_prop(prop_contr=0.33, prop_treat=0.31, alpha=0.05, power=0.8)\n",
    "print('Minimum sample size per each group = {}'.format(sample_size))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Statistics behind"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "``abexp`` masks the statistical techniques applied in the background. Sample Size Determination is achieved \n",
    "via power analysis. Given the values of the three parameters below, it estimate the minimum sample size required: \n",
    "\n",
    "* significance level, default 0.05\n",
    "* power, default 0.80\n",
    "* estimation of the desirable minimum effect size, specific to the experiment \n",
    "\n",
    "The statistical tests used in this context are respectively *t-test* to compare means and *z-test* to compare \n",
    "proportions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Notes__\n",
    "\n",
    "* ``alpha`` and ``power`` are respectively set to 0.05 and 0.8, which are the suggested default values. Be careful if \n",
    "you want to change them.\n",
    "* Power analysis is valid on the assumption that sample data are normally distributed."
   ]
  }
 ],
 "metadata": {
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "metadata": {
     "collapsed": false
    },
    "source": []
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}