Analysis: Bayesian Approach¶
This tutorial shows how to perform post-test analysis of an A/B test experiment with two variants, the so-called control and treatment groups, using Bayesian statistics. It covers both the comparison of means and the comparison of conversions.
Let’s first import the tools we need.
[1]:
import numpy as np
import pandas as pd
from abexp.core.analysis_bayesian import BayesianAnalyzer
from abexp.core.analysis_bayesian import BayesianGLMAnalyzer
import warnings
warnings.filterwarnings('ignore')
Compare means¶
Here we want to compare the average revenue per user of the control group versus the treatment group.
[2]:
# Revenue for users
np.random.seed(42)
revenue_contr = np.random.randint(low=400, high=500, size=10000)
revenue_treat = np.random.randint(low=500, high=700, size=10000)
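As a quick sanity check (this cell is not part of the original notebook), we can inspect the sample means; by construction they should be roughly 450 and 600:
# Sanity check: uniform integers in [400, 500) and [500, 700) have expected means ~449.5 and ~599.5
print('Mean revenue (control):   {:.2f}'.format(revenue_contr.mean()))
print('Mean revenue (treatment): {:.2f}'.format(revenue_treat.mean()))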
[3]:
# Define the analyzer
analyzer = BayesianAnalyzer()
[4]:
prob, lift, diff_means, ci = analyzer.compare_mean(obs_contr=revenue_contr, obs_treat=revenue_treat)
logp = -1.18e+05, ||grad|| = 3.0081e+10: 100%|██████████| 22/22 [00:00<00:00, 773.97it/s]
Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
>Metropolis: [nu_minus_one]
>Metropolis: [std_treat]
>Metropolis: [std_contr]
>Metropolis: [mean_treat]
>Metropolis: [mean_contr]
Sampling 4 chains, 0 divergences: 100%|██████████| 202000/202000 [02:51<00:00, 1181.01draws/s]
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
[5]:
print('Probability that mean revenue(treatment) is greater than mean revenue(control) = {:.2%}'.format(prob))
Probability that mean revenue(treatment) is greater than mean revenue(control) = 94.79%
[6]:
print('Lift between treatment and control = {:.2%}'.format(lift))
Lift between treatment and control = 33.20%
The result of Bayesian A/B testing is the probability that the treatment group performs better than the control group, i.e. has a higher mean revenue per user in the current example. This is a very intuitive way of doing A/B testing because it does not introduce any statistical measures (e.g. p-value) which are harder for non-statisticians to interpret.
We can set an arbitrary threshold to define how to interpret the outcome of the Bayesian test, e.g. if prob \(>\) 90% we can conclude that the treatment has a significant effect on the mean revenue per user when compared to the control group.
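In code, such a decision rule could look like the following sketch (the 90% threshold is an arbitrary choice, not something prescribed by abexp):
# Arbitrary decision threshold: declare an effect if prob > 90%
threshold = 0.90
if prob > threshold:
    print('Significant effect of the treatment on mean revenue per user')
else:
    print('No significant effect detected')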
Compare proportions¶
Here we want to compare the proportion of users who made a purchase (i.e. the conversion rate) in the control group versus the treatment group.
[7]:
# Number of users that made a purchase
purchase_contr = 470
purchase_treat = 500
# Total number of users
total_usr_treat = 5000
total_usr_contr = 5000
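For reference (this cell is not in the original notebook), the observed conversion rates can be computed directly:
# Observed conversion rates: 470/5000 = 9.40% and 500/5000 = 10.00%
print('Conversion rate (control):   {:.2%}'.format(purchase_contr / total_usr_contr))
print('Conversion rate (treatment): {:.2%}'.format(purchase_treat / total_usr_treat))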
[8]:
prob, lift = analyzer.compare_conv(conv_contr=purchase_contr,
                                   conv_treat=purchase_treat,
                                   nobs_contr=total_usr_contr,
                                   nobs_treat=total_usr_treat)
[9]:
print('Probability that purchase proportion(treatment) is greater than purchase proportion(control) = {:.2%}'.format(prob))
Probability that purchase proportion(treatment) is greater than purchase proportion(control) = 84.45%
[10]:
print('Lift between treatment and control = {:.2%}'.format(lift))
Lift between treatment and control = 6.37%
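Applying the same arbitrary 90% threshold as in the means comparison, prob = 84.45% would not be considered significant:
# With prob ~= 84.45% < 90%, we would not declare a significant effect
if prob > 0.90:
    print('Significant effect of the treatment on the conversion rate')
else:
    print('No significant effect detected')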
Bayesian GLM¶
Here we want to compare the average revenue per user of the control group versus the treatment group. We are also interested in differentiating the results based on some categorical features of the input samples (i.e. seniority_level, country).
[11]:
# Define the analyzer
analyzer = BayesianGLMAnalyzer()
Multivariate Regression
[12]:
df = pd.DataFrame([[1, 4, 35],
[0, 4, 5],
[1, 3, 28],
[0, 1, 5],
[0, 2, 1],
[1, 0, 1.5]], columns=['group', 'seniority_level', 'revenue'])
[13]:
stats = analyzer.multivariate_regression(df, 'revenue')
stats
Auto-assigning NUTS sampler...
Initializing NUTS using adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [lam, seniority_level, group, Intercept]
Sampling 4 chains, 0 divergences: 100%|██████████| 8000/8000 [00:03<00:00, 2035.12draws/s]
The number of effective samples is smaller than 25% for some parameters.
[13]:
| | mean | std | min | 25% | 50% | 75% | max | Prob<0 | Prob>0 |
|---|---|---|---|---|---|---|---|---|---|
| Intercept | 1.048460 | 2.940644 | -13.254892 | -0.372376 | 0.967242 | 2.372862 | 26.860366 | 0.30325 | 0.69675 |
| group | 0.576785 | 0.551946 | -1.425842 | 0.195678 | 0.572784 | 0.957911 | 2.738990 | 0.14725 | 0.85275 |
| seniority_level | 1.646575 | 1.287070 | -2.438778 | 0.817672 | 1.352801 | 2.257462 | 8.219804 | 0.05050 | 0.94950 |
| lam | 0.774718 | 1.390844 | 0.001202 | 0.101534 | 0.296813 | 0.821106 | 16.358989 | 0.00000 | 1.00000 |
In the last column Prob>0, the table above shows that there is an 85.27% probability that the revenue of group 1 is greater than that of group 0. Moreover, it also shows that there is a 94.95% probability that seniority_level is positively associated with revenue.
For the sake of providing a general summary of statistics, the table also shows the intercept and lambda (lam) of the regression model.
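If we want to read these probabilities programmatically, the following sketch assumes that stats is a pandas DataFrame indexed by parameter name, as displayed above:
# Probability that the 'group' coefficient is positive (treatment effect)
prob_group = stats.loc['group', 'Prob>0']
print('P(group coefficient > 0) = {:.2%}'.format(prob_group))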
Hierarchical Regression
If you are not familiar with hierarchical regression, have a look at this blog post: https://twiecki.io/blog/2014/03/17/bayesian-glms-3/.
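As a rough sketch of the model structure (following the standard hierarchical GLM formulation from the blog post above; the exact priors used by abexp may differ), each country \(j\) gets its own intercept and slope drawn from shared hyperpriors:
\[
\alpha_j \sim \mathcal{N}(\mu_\alpha, \sigma_\alpha^2), \qquad
\beta_j \sim \mathcal{N}(\mu_\beta, \sigma_\beta^2), \qquad
\mathrm{revenue}_i = \alpha_{j[i]} + \beta_{j[i]} \cdot \mathrm{group}_i + \varepsilon_i
\]
This maps onto the parameter names in the table below: alpha__USA, beta__Italy, etc. are the per-country coefficients, while mu_alpha, mu_beta, sigma_alpha, sigma_beta and eps are the shared hyperparameters.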
[14]:
df = pd.DataFrame([[0, 5, 'USA'],
[0, 5, 'USA'],
[0, 100, 'Italy'],
[1, 100, 'USA'],
[1, 100, 'USA'],
[1, 100, 'France']], columns=['group', 'revenue', 'country'])
[15]:
stats = analyzer.hierarchical_regression(df, group_col='group', cat_col='country', kpi_col='revenue')
stats
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [eps, beta, alpha, sigma_beta, sigma_alpha, mu_beta, mu_alpha]
Sampling 4 chains, 816 divergences: 100%|██████████| 6000/6000 [02:10<00:00, 45.87draws/s]
There were 52 divergences after tuning. Increase `target_accept` or reparameterize.
There were 364 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.2979906043312202, but should be close to 0.8. Try to increase the number of tuning steps.
There were 75 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.6628490775514363, but should be close to 0.8. Try to increase the number of tuning steps.
There were 325 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.7113696800957767, but should be close to 0.8. Try to increase the number of tuning steps.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
[15]:
| | mean | std | min | 25% | 50% | 75% | max | Prob<0 | Prob>0 |
|---|---|---|---|---|---|---|---|---|---|
| mu_alpha | -0.028085 | 0.989639 | -3.581447 | -0.695825 | -0.132219 | 0.688185 | 3.598191 | 0.54100 | 0.45900 |
| mu_beta | 0.176766 | 0.993789 | -3.468508 | -0.487023 | 0.309218 | 0.832437 | 3.588725 | 0.39750 | 0.60250 |
| alpha__USA | 14.074894 | 37.636252 | -171.899366 | -0.990796 | 0.317332 | 11.625923 | 240.521179 | 0.45875 | 0.54125 |
| alpha__Italy | 32.564691 | 46.492324 | -57.351711 | -0.532305 | 0.945736 | 99.803488 | 163.613053 | 0.39150 | 0.60850 |
| alpha__France | 2.547504 | 6.700164 | -40.234538 | -0.467854 | 1.040751 | 4.971800 | 91.083058 | 0.35550 | 0.64450 |
| beta__USA | 22.419341 | 43.726614 | -140.604607 | -0.145441 | 1.603786 | 33.143822 | 272.022584 | 0.26150 | 0.73850 |
| beta__Italy | -1.967748 | 58.002111 | -484.885230 | -3.517865 | 0.349032 | 3.400547 | 481.391653 | 0.44850 | 0.55150 |
| beta__France | 34.939470 | 45.972820 | -86.950038 | -0.048646 | 1.928143 | 94.856067 | 208.532713 | 0.25650 | 0.74350 |
| sigma_alpha | 26.197334 | 42.125100 | 0.190135 | 0.528937 | 1.937846 | 51.083900 | 458.640177 | 0.00000 | 1.00000 |
| sigma_beta | 36.309637 | 54.466205 | 0.075608 | 0.989605 | 5.203234 | 59.455603 | 434.367847 | 0.00000 | 1.00000 |
| eps | 60.218967 | 46.760094 | 0.103970 | 0.664053 | 67.356771 | 99.604387 | 282.430219 | 0.00000 | 1.00000 |
In the table above we focus on the beta parameters, which represent the coefficients of the hierarchical regression. In the last column Prob>0, the table shows, for each country, the probability that the revenue of group 1 is greater than that of group 0. In this way we can get an idea of the countries in which the treatment was more effective. (Note that the sampler reported many divergences and a large rhat on this toy dataset, so these particular numbers should be read as illustrative only.)
For the sake of providing a general summary of statistics, the table also shows the alpha parameters, which are the intercepts of the hierarchical regression, and mu, sigma and eps, which are the hyperpriors of the regression.
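As before, these per-country probabilities can be extracted programmatically; a sketch, assuming stats is a pandas DataFrame indexed by parameter name as shown above:
# Rank countries by the probability that the treatment effect (beta) is positive
betas = stats.loc[stats.index.str.startswith('beta__'), 'Prob>0']
print(betas.sort_values(ascending=False))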