{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Analysis Bayesian Approach" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial shows how to perform post-test analysis of an A/B test experiment with two variants, so called control and \n", "treatment groups, using bayesian statistics. It handles both the case of means comparison and conversions comparison.\n", "\n", "Let's import first the tools needed." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from abexp.core.analysis_bayesian import BayesianAnalyzer\n", "from abexp.core.analysis_bayesian import BayesianGLMAnalyzer\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compare means" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we want to compare the *average revenue per user* of the control group versus the treatment group." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Revenue for users\n", "np.random.seed(42)\n", "revenue_contr = np.random.randint(low=400, high=500, size=10000)\n", "revenue_treat = np.random.randint(low=500, high=700, size=10000)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Define the analyzer\n", "analyzer = BayesianAnalyzer()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "logp = -1.18e+05, ||grad|| = 3.0081e+10: 100%|██████████| 22/22 [00:00<00:00, 773.97it/s] \n", "Multiprocess sampling (4 chains in 4 jobs)\n", "CompoundStep\n", ">Metropolis: [nu_minus_one]\n", ">Metropolis: [std_treat]\n", ">Metropolis: [std_contr]\n", ">Metropolis: [mean_treat]\n", ">Metropolis: [mean_contr]\n", "Sampling 4 chains, 0 divergences: 100%|██████████| 202000/202000 [02:51<00:00, 1181.01draws/s]\n", "The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.\n", "The estimated number of effective samples is smaller than 200 for some parameters.\n" ] } ], "source": [ "prob, lift, diff_means, ci = analyzer.compare_mean(obs_contr=revenue_contr, obs_treat=revenue_treat)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Probability that mean revenue(treatment) is greater than mean revenue(control) = 94.79%\n" ] } ], "source": [ "print('Probability that mean revenue(treatment) is greater than mean revenue(control) = {:.2%}'.format(prob))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Lift between treatment and control = 33.20%\n" ] } ], "source": [ "print('Lift between treatment and control = {:.2%}'.format(lift))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result of bayesian A/B testing is the probability that the treatment group perform better than the control group \n", "i.e. highest mean revenue per user value in the current example. This is a very intuitive way of doing A/B testing \n", "because it does not introduce any statistical measures (e.g. p-value) which are more difficult to be interpreted by \n", "non statisticians.\n", "\n", "We can set an arbitrary threshold to define how to consider the outcome of the bayesian test, e.g. if ``prob`` $>$ \n", "``90%`` we can conclude to a significative effect of the treatment on the mean revenue per user when compare to the \n", "control group." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compare proportions" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Number of users that made a purchase\n", "purchase_contr = 470\n", "purchase_treat = 500\n", "\n", "\n", "# Total number of users\n", "total_usr_treat = 5000\n", "total_usr_contr = 5000" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "prob, lift = analyzer.compare_conv(conv_contr=purchase_contr,\n", " conv_treat=purchase_treat,\n", " nobs_contr=total_usr_treat,\n", " nobs_treat=total_usr_contr)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Probability that mean revenue(treatment) is greater than mean revenue(control) = 84.45%\n" ] } ], "source": [ "print('Probability that mean revenue(treatment) is greater than mean revenue(control) = {:.2%}'.format(prob))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Lift between treatment and control = 6.37%\n" ] } ], "source": [ "print('Lift between treatment and control = {:.2%}'.format(lift))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bayesian GLM" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we want to compare the *average revenue per user* of the control group versus the treatment group. We are also \n", "interested to differentiate the results based on some categorical features of the input samples (i.e. \n", "``seniority_level``, ``country``)." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Define the analyzer\n", "analyzer = BayesianGLMAnalyzer()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Multivariate Regression__" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "df = pd.DataFrame([[1, 4, 35],\n", " [0, 4, 5],\n", " [1, 3, 28],\n", " [0, 1, 5],\n", " [0, 2, 1],\n", " [1, 0, 1.5]], columns=['group', 'seniority_level', 'revenue'])" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Auto-assigning NUTS sampler...\n", "Initializing NUTS using adapt_diag...\n", "Multiprocess sampling (4 chains in 4 jobs)\n", "NUTS: [lam, seniority_level, group, Intercept]\n", "Sampling 4 chains, 0 divergences: 100%|██████████| 8000/8000 [00:03<00:00, 2035.12draws/s]\n", "The number of effective samples is smaller than 25% for some parameters.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
meanstdmin25%50%75%maxProb<0Prob>0
Intercept1.0484602.940644-13.254892-0.3723760.9672422.37286226.8603660.303250.69675
group0.5767850.551946-1.4258420.1956780.5727840.9579112.7389900.147250.85275
seniority_level1.6465751.287070-2.4387780.8176721.3528012.2574628.2198040.050500.94950
lam0.7747181.3908440.0012020.1015340.2968130.82110616.3589890.000001.00000
\n", "
" ], "text/plain": [ " mean std min 25% 50% 75% \\\n", "Intercept 1.048460 2.940644 -13.254892 -0.372376 0.967242 2.372862 \n", "group 0.576785 0.551946 -1.425842 0.195678 0.572784 0.957911 \n", "seniority_level 1.646575 1.287070 -2.438778 0.817672 1.352801 2.257462 \n", "lam 0.774718 1.390844 0.001202 0.101534 0.296813 0.821106 \n", "\n", " max Prob<0 Prob>0 \n", "Intercept 26.860366 0.30325 0.69675 \n", "group 2.738990 0.14725 0.85275 \n", "seniority_level 8.219804 0.05050 0.94950 \n", "lam 16.358989 0.00000 1.00000 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stats = analyzer.multivariate_regression(df, 'revenue')\n", "stats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the last column ``Prob>0``, the table above shows that there is there is ``85.27%`` of probability that ``revenue`` \n", "of *group 1* is greater than *group 2*. Moreover it also shows that there is``94.95%`` of probability that \n", "``seniority level`` is positively associated to ``revenue``.\n", "\n", "For the sake of providing a general summary of statistics the table also shows: the intercept and lambda (``lam``) of \n", "the regression model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Hierarchical regression__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If your are not familiar with hierarchical regression have a look at the blog \n", "https://twiecki.io/blog/2014/03/17/bayesian-glms-3/." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "df = pd.DataFrame([[0, 5, 'USA'],\n", " [0, 5, 'USA'],\n", " [0, 100, 'Italy'],\n", " [1, 100, 'USA'],\n", " [1, 100, 'USA'],\n", " [1, 100, 'France']], columns=['group', 'revenue', 'country'])\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "scrolled": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Auto-assigning NUTS sampler...\n", "Initializing NUTS using jitter+adapt_diag...\n", "Multiprocess sampling (4 chains in 4 jobs)\n", "NUTS: [eps, beta, alpha, sigma_beta, sigma_alpha, mu_beta, mu_alpha]\n", "Sampling 4 chains, 816 divergences: 100%|██████████| 6000/6000 [02:10<00:00, 45.87draws/s] \n", "There were 52 divergences after tuning. Increase `target_accept` or reparameterize.\n", "There were 364 divergences after tuning. Increase `target_accept` or reparameterize.\n", "The acceptance probability does not match the target. It is 0.2979906043312202, but should be close to 0.8. Try to increase the number of tuning steps.\n", "There were 75 divergences after tuning. Increase `target_accept` or reparameterize.\n", "The acceptance probability does not match the target. It is 0.6628490775514363, but should be close to 0.8. Try to increase the number of tuning steps.\n", "There were 325 divergences after tuning. Increase `target_accept` or reparameterize.\n", "The acceptance probability does not match the target. It is 0.7113696800957767, but should be close to 0.8. Try to increase the number of tuning steps.\n", "The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.\n", "The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.\n", "The estimated number of effective samples is smaller than 200 for some parameters.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
meanstdmin25%50%75%maxProb<0Prob>0
mu_alpha-0.0280850.989639-3.581447-0.695825-0.1322190.6881853.5981910.541000.45900
mu_beta0.1767660.993789-3.468508-0.4870230.3092180.8324373.5887250.397500.60250
alpha__USA14.07489437.636252-171.899366-0.9907960.31733211.625923240.5211790.458750.54125
alpha__Italy32.56469146.492324-57.351711-0.5323050.94573699.803488163.6130530.391500.60850
alpha__France2.5475046.700164-40.234538-0.4678541.0407514.97180091.0830580.355500.64450
beta__USA22.41934143.726614-140.604607-0.1454411.60378633.143822272.0225840.261500.73850
beta__Italy-1.96774858.002111-484.885230-3.5178650.3490323.400547481.3916530.448500.55150
beta__France34.93947045.972820-86.950038-0.0486461.92814394.856067208.5327130.256500.74350
sigma_alpha26.19733442.1251000.1901350.5289371.93784651.083900458.6401770.000001.00000
sigma_beta36.30963754.4662050.0756080.9896055.20323459.455603434.3678470.000001.00000
eps60.21896746.7600940.1039700.66405367.35677199.604387282.4302190.000001.00000
\n", "
" ], "text/plain": [ " mean std min 25% 50% \\\n", "mu_alpha -0.028085 0.989639 -3.581447 -0.695825 -0.132219 \n", "mu_beta 0.176766 0.993789 -3.468508 -0.487023 0.309218 \n", "alpha__USA 14.074894 37.636252 -171.899366 -0.990796 0.317332 \n", "alpha__Italy 32.564691 46.492324 -57.351711 -0.532305 0.945736 \n", "alpha__France 2.547504 6.700164 -40.234538 -0.467854 1.040751 \n", "beta__USA 22.419341 43.726614 -140.604607 -0.145441 1.603786 \n", "beta__Italy -1.967748 58.002111 -484.885230 -3.517865 0.349032 \n", "beta__France 34.939470 45.972820 -86.950038 -0.048646 1.928143 \n", "sigma_alpha 26.197334 42.125100 0.190135 0.528937 1.937846 \n", "sigma_beta 36.309637 54.466205 0.075608 0.989605 5.203234 \n", "eps 60.218967 46.760094 0.103970 0.664053 67.356771 \n", "\n", " 75% max Prob<0 Prob>0 \n", "mu_alpha 0.688185 3.598191 0.54100 0.45900 \n", "mu_beta 0.832437 3.588725 0.39750 0.60250 \n", "alpha__USA 11.625923 240.521179 0.45875 0.54125 \n", "alpha__Italy 99.803488 163.613053 0.39150 0.60850 \n", "alpha__France 4.971800 91.083058 0.35550 0.64450 \n", "beta__USA 33.143822 272.022584 0.26150 0.73850 \n", "beta__Italy 3.400547 481.391653 0.44850 0.55150 \n", "beta__France 94.856067 208.532713 0.25650 0.74350 \n", "sigma_alpha 51.083900 458.640177 0.00000 1.00000 \n", "sigma_beta 59.455603 434.367847 0.00000 1.00000 \n", "eps 99.604387 282.430219 0.00000 1.00000 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stats = analyzer.hierarchical_regression(df, group_col='group', cat_col='country', kpi_col='revenue')\n", "stats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the table above we will focus on the *beta* parameters which represents the coefficients of the hierarchical \n", "regression. In the last column ``Prob>0``, the table shows per each country the probability that ``revenue`` of \n", "*group 1* is greater than *group 2*. In this way we can have an idea of the country in which the treatment was more \n", "effective.\n", "\n", "For the sake of providing a general summary of statistics the table also shows: the *alpha* parameters which are the \n", "intercepts of the hierarchical regression; mu, sigma and eps which are the hyperpriors of the regression." ] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } } }, "nbformat": 4, "nbformat_minor": 4 }