abexp.core

abexp.core.design

class abexp.core.design.SampleSize

Bases: object

This class provides utilities to be used before running A/B test experiments: minimum sample size determination, power calculation and effect size estimation. It handles both means comparison and proportions comparison. Results are computed via power analysis, either in closed form or by simulation, under the assumption that the sample data are normally distributed.

static ssd_mean(mean_contr, mean_treat, std_contr, alpha=0.05, power=0.8)

Sample size determination (SSD) to compare means. Compute the minimum sample size needed to run A/B test experiments. The result is computed via power analysis with the closed-form solution of a t-test. The effect size is estimated with Cohen's d coefficient.

Parameters
  • mean_contr (float) – Mean of the control group.

  • mean_treat (float) – Mean of the treatment group.

  • std_contr (float > 0) – Standard deviation of the control group. It assumes that the standard deviation of the control group is equal to the standard deviation of the treatment group.

  • alpha (float in interval (0,1)) – Significance level, default 0.05. It is the probability of a type I error, that is, of rejecting the null hypothesis when it is true.

  • power (float in interval (0,1)) – Statistical power of the test, default 0.8. It is one minus the probability of a type II error. Power is the probability that the test correctly rejects the null hypothesis if the alternative hypothesis is true.

Returns

sample_size – Minimum sample size per each group

Return type

int
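
As a rough illustration of the closed-form calculation, the sketch below uses the normal approximation n = 2 * ((z_(1 - alpha/2) + z_power) / d)^2 with Cohen's d as the effect size. It is not the library's implementation: ssd_mean is documented as solving a t-test power analysis, which usually returns a slightly larger value. The helper name approx_ssd_mean is hypothetical.

```python
import math
from statistics import NormalDist


def approx_ssd_mean(mean_contr, mean_treat, std_contr, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided, two-sample test of means."""
    # Cohen's d: difference between the means in units of the common standard deviation.
    d = abs(mean_treat - mean_contr) / std_contr
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


print(approx_ssd_mean(mean_contr=10.0, mean_treat=11.0, std_contr=2.0))  # 63
```

Halving the effect size roughly quadruples the required sample size, because d enters the formula squared.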

static ssd_mean_sim(mean_contr, mean_treat, std_contr, alpha=0.05, power=0.8, sims=1000, start_size=100, step_size=50, max_size=10000)

Sample size determination (SSD) to compare means with simulation. Compute the minimum sample size needed to run A/B test experiments. The result is computed via power analysis by simulating repeated t-tests.

Parameters
  • mean_contr (float) – Mean of the control group.

  • mean_treat (float) – Mean of the treatment group.

  • std_contr (float > 0) – Standard deviation of the control group. It assumes that the standard deviation of the control group is equal to the standard deviation of the treatment group.

  • alpha (float in interval (0,1)) – Significance level, default 0.05. It is the probability of a type I error, that is, of rejecting the null hypothesis when it is true.

  • power (float in interval (0,1)) – Statistical power of the test, default 0.8. It is one minus the probability of a type II error. Power is the probability that the test correctly rejects the null hypothesis if the alternative hypothesis is true.

  • sims (int) – Number of simulations, default 1000.

  • start_size (int) – Initial sample size, default 100, used for the first iteration.

  • step_size (int) – Spacing between sample sizes, default 50. This is the distance between two adjacent sample sizes, sample_size[i+1] - sample_size[i].

  • max_size (int) – Maximum sample size, default 10000. The function returns this value if the desired power is not reached via simulation.

Returns

sample_size – Minimum sample size per each group

Return type

int
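
The simulation approach can be sketched as follows: for each candidate sample size, draw many pairs of normal samples, test each pair, and stop at the first size whose rejection rate reaches the desired power. This is a minimal sketch under stated assumptions, not the library's code: it uses only the standard library, replaces the t-test with a z-test on the sample statistics, and the function names and the seed argument are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance


def simulated_power(n, mean_contr, mean_treat, std, alpha, sims, rng):
    """Fraction of simulated experiments with n users per group that reject the null."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(sims):
        contr = [rng.gauss(mean_contr, std) for _ in range(n)]
        treat = [rng.gauss(mean_treat, std) for _ in range(n)]
        # Standard error of the difference of means (equal group sizes).
        std_err = ((variance(contr) + variance(treat)) / n) ** 0.5
        if abs(mean(treat) - mean(contr)) / std_err > z_crit:
            rejections += 1
    return rejections / sims


def approx_ssd_mean_sim(mean_contr, mean_treat, std, alpha=0.05, power=0.8,
                        sims=1000, start_size=100, step_size=50, max_size=10000,
                        seed=None):
    """Smallest simulated sample size reaching the power, or max_size if none does."""
    rng = random.Random(seed)
    size = start_size
    while size < max_size:
        if simulated_power(size, mean_contr, mean_treat, std, alpha, sims, rng) >= power:
            return size
        size += step_size
    return max_size


print(approx_ssd_mean_sim(0.0, 1.0, 1.0, sims=200, start_size=10, step_size=10,
                          max_size=100, seed=0))
```

A smaller step_size gives a finer answer at the cost of more simulated experiments.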

static ssd_prop(prop_contr, prop_treat, alpha=0.05, power=0.8)

Sample size determination (SSD) to compare proportions. Compute the minimum sample size needed to run A/B test experiments. The result is computed via power analysis with the closed-form solution of a z-test. The effect size is estimated with Cohen's h coefficient.

Parameters
  • prop_contr (float in interval (0,1)) – Proportion in the control group.

  • prop_treat (float in interval (0,1)) – Proportion in the treatment group.

  • alpha (float in interval (0,1)) – Significance level, default 0.05. It is the probability of a type I error, that is, of rejecting the null hypothesis when it is true.

  • power (float in interval (0,1)) – Statistical power of the test, default 0.8. It is one minus the probability of a type II error. Power is the probability that the test correctly rejects the null hypothesis if the alternative hypothesis is true.

Returns

sample_size – Minimum sample size per each group

Return type

int
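
For proportions, the same kind of closed-form calculation can be sketched with Cohen's h, h = 2*arcsin(sqrt(p1)) - 2*arcsin(sqrt(p2)), as the effect size of a two-sided z-test. The function names below are hypothetical and the code only illustrates the formula; it is not taken from the library.

```python
import math
from statistics import NormalDist


def cohens_h_sketch(p1, p2):
    """Difference between the arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def approx_ssd_prop(prop_contr, prop_treat, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided, two-sample z-test."""
    h = abs(cohens_h_sketch(prop_treat, prop_contr))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / h) ** 2)


print(approx_ssd_prop(prop_contr=0.25, prop_treat=0.5))  # 58
```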

abexp.core.planning

class abexp.core.planning.Planning

Bases: object

static planning_diff_mean(avg_n_users_per_day, mean_contr, mean_treat, std_contr, alpha=0.05, power=0.8)

Use the sample size determination with means comparison from the core.design.SampleSize class to estimate the number of days that a test must run to achieve the desired significance and power level.

Parameters
  • avg_n_users_per_day (int) – The number of users per day that can be directed to the variant.

  • mean_contr (float) – Mean of the control group.

  • mean_treat (float) – Mean of the treatment group.

  • std_contr (float > 0) – Standard deviation of the control group. It assumes that the standard deviation of the control group is equal to the standard deviation of the treatment group.

  • alpha (float in interval (0,1)) – Significance level, default 0.05. It is the probability of a type I error, that is, of rejecting the null hypothesis when it is true.

  • power (float in interval (0,1)) – Statistical power of the test, default 0.8. It is one minus the probability of a type II error. Power is the probability that the test correctly rejects the null hypothesis if the alternative hypothesis is true.

Returns

n_days – Minimum number of days to run the A/B test.

Return type

int

static planning_diff_prop(avg_n_users_per_day, prop_contr, prop_treat, alpha=0.05, power=0.8)

Use the sample size determination with proportions comparison from the core.design.SampleSize class to estimate the number of days that a test must run to achieve the desired significance and power level.

Parameters
  • avg_n_users_per_day (int) – The number of users per day that can be directed to the variant.

  • prop_contr (float in interval (0,1)) – Proportion in the control group.

  • prop_treat (float in interval (0,1)) – Proportion in the treatment group.

  • alpha (float in interval (0,1)) – Significance level, default 0.05. It is the probability of a type I error, that is, of rejecting the null hypothesis when it is true.

  • power (float in interval (0,1)) – Statistical power of the test, default 0.8. It is one minus the probability of a type II error. Power is the probability that the test correctly rejects the null hypothesis if the alternative hypothesis is true.

Returns

n_days – Minimum number of days to run the A/B test.

Return type

int
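
Turning a required sample size into a test duration is a simple division. The sketch below assumes that the daily traffic is split across all groups and that the per-group sample size is already known; how the library itself accounts for the number of groups is not stated here, so the helper days_to_run and its n_groups argument are hypothetical.

```python
import math


def days_to_run(sample_size_per_group, avg_n_users_per_day, n_groups=2):
    """Days needed to collect n_groups * sample_size_per_group users."""
    total_users = n_groups * sample_size_per_group
    # Round up: a partially filled day still has to be run.
    return math.ceil(total_users / avg_n_users_per_day)


print(days_to_run(sample_size_per_group=63, avg_n_users_per_day=20))  # 7
```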

abexp.core.allocation

class abexp.core.allocation.Allocator

Bases: object

This class provides some utils to be used before running A/B test experiments. Groups allocation is the process that assigns (allocates) a list of users either to group A (e.g. control) or to group B (e.g. treatment). This class provides functionalities to randomly allocate users in two or more groups (A/B/C/…).

static blocks_randomization(df, id_col, stratum_cols, ngroups=2, prop=[None], seed=None)

Randomly allocate users to n groups within blocks. Users with similar characteristics (features) define a block, and randomization is conducted within each block. This yields balanced, homogeneous groups of similar sizes.

Parameters
  • df (pd.DataFrame) – Input dataset of users.

  • id_col (str) – Column name of the user ids.

  • stratum_cols (list) – List of column names to be stratified over

  • ngroups (int) – Number of group variations, default 2.

  • prop (array_like of floats in interval (0,1)) – Proportions of users in each group. By default, each group has the same amount of users.

  • seed (int, default None) – Seed for the random state. Repeated calls with the same inputs and the same seed return identical results.

Returns

  • df (pd.DataFrame) – Dataset of users with additional column for the group variation

  • stats (pd.DataFrame) – Statistics of the number of users contained in each group
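
The idea of randomizing within blocks can be sketched without pandas. The example below is an illustration only: it works on a list of dictionaries rather than a DataFrame, supports equal group sizes only, and the name block_randomize is hypothetical.

```python
import random
from collections import defaultdict


def block_randomize(users, stratum_keys, ngroups=2, seed=None):
    """Map each user id to a group, randomizing separately within each stratum."""
    rng = random.Random(seed)
    # A block is the set of users sharing the same values on all stratum keys.
    blocks = defaultdict(list)
    for user in users:
        blocks[tuple(user[key] for key in stratum_keys)].append(user["id"])
    assignment = {}
    for ids in blocks.values():
        rng.shuffle(ids)
        # Dealing the shuffled ids round-robin keeps group sizes within one of each other.
        for position, user_id in enumerate(ids):
            assignment[user_id] = position % ngroups
    return assignment


users = [{"id": i, "country": "IT" if i % 2 else "US"} for i in range(8)]
print(block_randomize(users, ["country"], seed=1))
```

Each country contributes the same number of users to both groups, which complete randomization does not guarantee.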

static complete_randomization(user_id, ngroups=2, prop=None, seed=None)

Randomly allocate users to n groups.

Parameters
  • user_id (array_like) – Array of user ids.

  • ngroups (int) – Number of group variations, default 2.

  • prop (array_like of floats in interval (0,1)) – Proportions of users in each group. By default, each group has the same amount of users. Proportion should sum up to 1.

  • seed (int, default None) – Seed for the random state. Repeated calls with the same inputs and the same seed return identical results.

Returns

  • df (pd.DataFrame) – Dataset of user ids with additional column for the group variation

  • stats (pd.DataFrame) – Statistics of the number of users contained in each group
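
Complete randomization amounts to shuffling the user ids and cutting the shuffled list according to the requested proportions. The following standard-library sketch shows one way to do that; the name complete_randomize and the dictionary return value are assumptions made for illustration, not the library's interface.

```python
import random


def complete_randomize(user_ids, ngroups=2, prop=None, seed=None):
    """Map each user id to a group index, honouring the requested proportions."""
    rng = random.Random(seed)
    ids = list(user_ids)
    rng.shuffle(ids)
    if prop is None:
        prop = [1 / ngroups] * ngroups  # equal shares by default
    assignment, start, cumulative = {}, 0, 0.0
    for group, share in enumerate(prop):
        cumulative += share
        # The last group takes whatever is left, so rounding never drops a user.
        end = len(ids) if group == len(prop) - 1 else round(cumulative * len(ids))
        for user_id in ids[start:end]:
            assignment[user_id] = group
        start = end
    return assignment


groups = complete_randomize(range(10), prop=[0.7, 0.3], seed=3)
print(sum(1 for g in groups.values() if g == 0))  # 7
```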

abexp.core.analysis_frequentist

class abexp.core.analysis_frequentist.FrequentistAnalyzer

Bases: object

This class provides tools to perform analysis after A/B test experiments with frequentist statistical approach. It handles both the case of means comparison and conversions comparison with closed-form-solutions. It also includes bootstrapping and homogeneity checks of the observed samples.

bootstrap(data, func, rep=500, seed=None)

Perform bootstrapping on the observed dataset. This technique makes inference about a certain estimate (e.g. sample mean) for a certain population parameter (e.g. population mean) by resampling with replacement from the observed dataset. This technique does not make assumptions on the observed samples distribution.

Parameters
  • data (array_like of shape (n_samples, n_days)) – Input samples for bootstrapping.

  • func (function, default np.mean) – Function used to aggregate samples at each bootstrapping iteration. The function must compute its aggregation along axis=0.

  • rep (int, default 500.) – Number of resampling repetitions.

  • seed (int, default None) – Seed for the random state. Repeated calls with the same inputs and the same seed return identical results.

Returns

stats – Summary statistics of bootstrapping (median, 2.5 percentile, 97.5 percentile).

Return type

pandas DataFrame
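
A percentile bootstrap can be sketched in a few lines. The example is a simplified, one-dimensional illustration using only the standard library; the documented method accepts two-dimensional input and returns a DataFrame, so the name bootstrap_summary and the dictionary it returns are hypothetical.

```python
import random
from statistics import mean, median, quantiles


def bootstrap_summary(data, func=mean, rep=500, seed=None):
    """Median and 2.5/97.5 percentiles of func over resamples drawn with replacement."""
    rng = random.Random(seed)
    # Each repetition resamples len(data) points with replacement and aggregates them.
    estimates = [func(rng.choices(data, k=len(data))) for _ in range(rep)]
    # n=40 yields cut points every 2.5%, so the first and last are the 2.5 and 97.5 percentiles.
    cuts = quantiles(estimates, n=40, method="inclusive")
    return {"median": median(estimates), "p2.5": cuts[0], "p97.5": cuts[-1]}


print(bootstrap_summary([float(x) for x in range(1, 101)], rep=300, seed=42))
```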

check_homogeneity(df, group, cat_cols, verbose=False)

Check variables homogeneity of the samples considered in the experiment. The goal is to verify homogeneity between control and treatment groups. It performs univariate logistic regression per each variable of the input samples where the dependent variable is the group variation.

Parameters
  • df (pandas DataFrame of shape (n_samples, n_variables)) – Input samples to be checked.

  • group (array-like of shape (n_samples,)) – Groups variation of each sample (either 0 or 1).

  • cat_cols (list) – List of the column names to be considered as categorical variables.

  • verbose (bool) – Print detailed information of the logistic regression.

Returns

stats – Statistics of the logistic regression (coefficients, p-values, etc.)

Return type

pandas DataFrame

compare_conv_obs(obs_contr, obs_treat, alpha=0.05)

Compare conversions from observed samples. Compare the conversions of the control group versus the conversions of the treatment group. The result is computed with z-test (closed-form solution) given the observed samples of the two groups. It assumes that sample data are normally distributed.

Parameters
  • obs_contr (array_like) – Observations of the control sample. It is a boolean vector (0 or 1) that indicates whether the i-th sample of the array was converted or not.

  • obs_treat (array_like) – Observations of the treatment sample. It is a boolean vector (0 or 1) that indicates whether the i-th sample of the array was converted or not.

  • alpha (float in interval (0,1)) – Significance level, default 0.05. It is the probability of a type I error, that is, of rejecting the null hypothesis when it is true.

Returns

  • p_value (float in interval (0,1)) – p-value for the statistical test.

  • ci_contr (tuple) – confidence interval for the control group.

  • ci_treat (tuple) – confidence interval for the treatment group.

compare_conv_stats(conv_contr, conv_treat, nobs_contr, nobs_treat, alpha=0.05)

Compare conversions from statistics. Compare the conversions of the control group versus the conversions of the treatment group. The result is computed with z-test (closed-form solution) given the groups statistics. It assumes that sample data are normally distributed.

Parameters
  • conv_contr (int > 0) – Number of conversions in the control group.

  • conv_treat (int > 0) – Number of conversions in the treatment group.

  • nobs_contr (int > 0) – Total number of observations of the control group.

  • nobs_treat (int > 0) – Total number of observations of the treatment group.

  • alpha (float in interval (0,1)) – Significance level, default 0.05. It is the probability of a type I error, that is, of rejecting the null hypothesis when it is true.

Returns

  • p_value (float in interval (0,1)) – p-value for the statistical test.

  • ci_contr (tuple) – confidence interval for the control group.

  • ci_treat (tuple) – confidence interval for the treatment group.
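
A two-proportion z-test computed from group statistics can be sketched as follows. The pooled standard error and the Wald intervals are common textbook choices and are assumptions here; the library may compute its test statistic or confidence intervals differently. The name z_test_conversions is hypothetical.

```python
from statistics import NormalDist


def z_test_conversions(conv_contr, conv_treat, nobs_contr, nobs_treat, alpha=0.05):
    """Two-sided pooled z-test for two proportions, plus per-group Wald intervals."""
    p_contr = conv_contr / nobs_contr
    p_treat = conv_treat / nobs_treat
    # Under the null hypothesis both groups share one conversion rate.
    pooled = (conv_contr + conv_treat) / (nobs_contr + nobs_treat)
    std_err = (pooled * (1 - pooled) * (1 / nobs_contr + 1 / nobs_treat)) ** 0.5
    z = (p_treat - p_contr) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    def wald_ci(p, n):
        half_width = z_crit * (p * (1 - p) / n) ** 0.5
        return (p - half_width, p + half_width)

    return p_value, wald_ci(p_contr, nobs_contr), wald_ci(p_treat, nobs_treat)


p_value, ci_contr, ci_treat = z_test_conversions(300, 370, 1000, 1000)
print(p_value < 0.05)  # True
```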

compare_mean_obs(obs_contr, obs_treat, alpha=0.05)

Compare means from observed samples. Compare the mean of the control group versus the mean of the treatment group. The result is computed with t-test (closed-form solution) given the observed samples of the two groups. It assumes that sample data are normally distributed.

Parameters
  • obs_contr (array_like) – Observation of the control sample. It contains the value to be analyzed per each sample.

  • obs_treat (array_like) – Observation of the treatment sample. It contains the value to be analyzed per each sample.

  • alpha (float in interval (0,1)) – Significance level, default 0.05. It is the probability of a type I error, that is, of rejecting the null hypothesis when it is true.

Returns

  • p_value (float in interval (0,1)) – p-value for the statistical test.

  • ci_contr (tuple) – confidence interval for the control group.

  • ci_treat (tuple) – confidence interval for the treatment group.

compare_mean_stats(mean_contr, mean_treat, std_contr, nobs_contr, nobs_treat, alpha=0.05)

Compare means from statistics. Compare the mean of the control group versus the mean of the treatment group. The result is computed with t-test (closed-form solution) given the groups statistics. It assumes that sample data are normally distributed.

Parameters
  • mean_contr (float) – Mean of the control group.

  • mean_treat (float) – Mean of the treatment group.

  • std_contr (float > 0) – Standard deviation of the control group. It assumes that control and treatment group have the same standard deviation.

  • nobs_contr (int > 0) – Number of observations in the control group

  • nobs_treat (int > 0) – Number of observations in the treatment group

  • alpha (float in interval (0,1)) – Significance level, default 0.05. It is the probability of a type I error, that is, of rejecting the null hypothesis when it is true.

Returns

  • p_value (float in interval (0,1)) – p-value for the statistical test.

  • ci_contr (tuple) – confidence interval for the control group.

  • ci_treat (tuple) – confidence interval for the treatment group.

abexp.core.analysis_bayesian

class abexp.core.analysis_bayesian.BayesianAnalyzer

Bases: object

This class provides tools to perform analysis after A/B test experiments with bayesian statistical approach. It handles both the case of means comparison and conversions comparison with closed-form-solutions or simulation. Bayesian analysis does not make any normality assumptions on the sample data.

compare_conv(conv_contr, conv_treat, nobs_contr, nobs_treat)

Compare conversions from statistics. Compare the conversions of the control group versus the conversions of the treatment group. The result is computed via bayesian analysis with a closed-form solution based on the concept of conjugate priors.

Reference paper: John Cook, Exact calculation of beta inequalities (2005).

Parameters
  • conv_contr (int > 0) – Number of conversions in the control group.

  • conv_treat (int > 0) – Number of conversions in the treatment group.

  • nobs_contr (int > 0) – Number of observations in the control group

  • nobs_treat (int > 0) – Number of observations in the treatment group

Returns

  • prob (float in interval (0,1)) – probability that treatment group is better than control group

  • lift (float in interval (0,1)) – lift between the two groups
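
With a Beta prior, the posterior of each conversion rate is again a Beta distribution, and the probability that the treatment rate exceeds the control rate can be written as a finite sum when the posterior parameters are integers. The sketch below assumes a uniform Beta(1, 1) prior, which the documentation does not specify, and evaluates that sum in log space for numerical stability. The function names are hypothetical, and the library's exact computation may differ.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def prob_treat_beats_contr(conv_contr, conv_treat, nobs_contr, nobs_treat):
    """P(rate_treat > rate_contr) under independent Beta posteriors."""
    # Posterior parameters under a uniform Beta(1, 1) prior (an assumption).
    a_c, b_c = conv_contr + 1, nobs_contr - conv_contr + 1
    a_t, b_t = conv_treat + 1, nobs_treat - conv_treat + 1
    total = 0.0
    for i in range(a_t):
        total += exp(
            log_beta(a_c + i, b_c + b_t)
            - log(b_t + i)
            - log_beta(1 + i, b_t)
            - log_beta(a_c, b_c)
        )
    return total


print(round(prob_treat_beats_contr(30, 30, 100, 100), 6))  # 0.5
```

Identical data in both groups gives a probability of 0.5, as symmetry requires.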

compare_mean(obs_contr, obs_treat, n=50000)

Compare means from observed samples. Compare the mean of the control group versus the mean of the treatment group. The result is computed via bayesian analysis with Markov chain Monte Carlo (MCMC) simulation.

Reference paper: John K. Kruschke, Bayesian Estimation Supersedes the t Test (2012)

Parameters
  • obs_contr (array_like) – Observation of first sample

  • obs_treat (array_like) – Observation of second sample

  • n (int, default 50000) – The number of samples to draw in MCMC

Returns

  • prob (float in interval (0,1)) – Probability that treatment group is better than control group

  • lift (float in interval (0,1)) – Lift between the two groups

  • diff_means (float) – Difference of means. The treatment group mean - control group mean.

  • ci (Tuple of floats) – Credible intervals. Lower and upper values of the interval [low, high].

class abexp.core.analysis_bayesian.BayesianGLMAnalyzer

Bases: object

The class provides tools to perform analysis after A/B test experiments with bayesian statistical approach. It provides techniques based on bayesian generalized linear model (GLM) with multivariate and hierarchical regression. Bayesian analysis does not make any normality assumptions on the sample data.

hierarchical_regression(df, group_col, cat_col, kpi_col)

Compare means from observed samples. Compare the mean of the control group versus the mean of the treatment group. The result is computed via bayesian hierarchical generalized linear model (GLM).

Parameters
  • df (pandas DataFrame of shape (n_samples, n_variables)) – Input samples data set

  • group_col (str) – Column name in the input dataset of the group variation

  • cat_col (str) – Column name in the input dataset of the categorical variable

  • kpi_col (str) – Column name in the input dataset of the kpi

Returns

stats – Summary statistics of the model

Return type

pandas DataFrame

multivariate_regression(df, kpi_col, family=<pymc3.glm.families.StudentT object>)

Compare means from observed samples. Compare the mean of the control group versus the mean of the treatment group. The result is computed via bayesian generalized linear model (GLM) with robust multivariate regression.

Parameters
  • df (pandas DataFrame of shape (n_samples, n_variables)) – Input samples data set

  • kpi_col (str) – Column name in the input dataset of the kpi

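  • family (pymc3.glm.families, default StudentT) – Family distribution of the GLM. The default StudentT family uses a StudentT likelihood with an identity link function and the priors lam (HalfCauchy) and nu = 1.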

Returns

stats – Summary statistics of the model

Return type

pandas DataFrame

abexp.statistics

abexp.statistics.stats_metrics

abexp.statistics.stats_metrics.cohens_d(mu_1, mu_2, std)

Compute the standardized effect size as difference between the two means divided by the standard deviation.

Parameters
  • mu_1 (float) – Mean of the first sample.

  • mu_2 (float) – Mean of the second sample.

  • std (float > 0) – Pooled standard deviation. It assumes that the variance of each population is the same.

Returns

effect_size – Effect size as Cohen's d coefficient

Return type

float

abexp.statistics.stats_metrics.cohens_h(p1, p2)

Compute the effect size as a measure of distance between two proportions or probabilities. It is the difference between their arcsine transformations, h = 2*arcsin(sqrt(p1)) - 2*arcsin(sqrt(p2)).

Parameters
  • p1 (float in interval (0,1)) – Proportion or probability of the first sample.

  • p2 (float in interval (0,1)) – Proportion or probability of the second sample.

Returns

effect_size – Effect size as Cohen's h coefficient

Return type

float

abexp.statistics.stats_metrics.pooled_std(sample1, sample2)

Compute pooled standard deviation between two samples.

Parameters
  • sample1 (array_like) – Observation of first sample

  • sample2 (array_like) – Observation of second sample

Returns

pooled_std – Pooled standard deviation of the two samples

Return type

float > 0
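
The usual definition of the pooled standard deviation weights each sample variance by its degrees of freedom. The sketch below implements that textbook formula with the standard library; whether the library uses exactly this weighting is an assumption, and the name pooled_std_sketch is hypothetical.

```python
from statistics import variance


def pooled_std_sketch(sample1, sample2):
    """Square root of the degrees-of-freedom weighted average of the sample variances."""
    n1, n2 = len(sample1), len(sample2)
    pooled_var = (
        (n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)
    ) / (n1 + n2 - 2)
    return pooled_var ** 0.5


print(pooled_std_sketch([1, 2, 3], [2, 4, 6]))  # sqrt(2.5), about 1.58
```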

abexp.statistics.stats_tests

abexp.statistics.stats_tests.normal_test(x, method='dagostino') → float

Perform a normality test for the sample data. It tests if the sample comes from a normal distribution.

Parameters
  • x (array_like) – The array of sample data

  • method (string) – Statistical method used to perform the normality test, default ‘dagostino’. Its value can be either ‘dagostino’, to perform the D’Agostino and Pearson’s test, which combines skew and kurtosis, or ‘shapiro’, to perform the Shapiro-Wilk test for normality. For N > 5000 the p-value with ‘shapiro’ may not be accurate.

Returns

p_val – p-value for the test

Return type

float in interval (0,1)

abexp.statistics.stats_tests.permutation_test(obs_1, obs_2, reps=1000) → float

Run the permutation test, a statistical significance test based on resampling. The distribution of the test statistic under the null hypothesis is approximated by computing the test statistic on N random rearrangements of the observed data points, where N is the number of repetitions.

Parameters
  • obs_1 (array_like) – Observation of first sample

  • obs_2 (array_like) – Observation of second sample

  • reps (int > 0) – Number of random permutations to draw, default 1000.

Returns

p_val – p-value for the test

Return type

float in interval (0,1)
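
A permutation test on the difference of means can be sketched as follows. The choice of test statistic (absolute difference of means), the seed argument and the name permutation_p_value are assumptions made for this illustration; the library's statistic is not specified above.

```python
import random
from statistics import mean


def permutation_p_value(obs_1, obs_2, reps=1000, seed=None):
    """Share of random relabelings whose mean difference is at least the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(obs_1) - mean(obs_2))
    pooled = list(obs_1) + list(obs_2)
    n1 = len(obs_1)
    extreme = 0
    for _ in range(reps):
        # Under the null hypothesis the group labels are exchangeable.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n1]) - mean(pooled[n1:]))
        if diff >= observed:
            extreme += 1
    return extreme / reps


print(permutation_p_value([1, 2, 3], [1, 2, 3], seed=0))  # 1.0
```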

abexp.visualization

abexp.visualization.analysis_plots

class abexp.visualization.analysis_plots.AnalysisPlot

static barplot(bars, yerr, figsize=(10, 8), width=0.4, fontsize=14, xlabel=None, ylabel=None, groupslabel=None, title=None, rotation=0, capsize=None, legendloc=None)

Make a bar plot with confidence intervals for N groups (A/B/C…) given M segments.

Parameters
  • bars (array_like of shape (n_groups, n_segments)) – Height of the bars.

  • yerr (array_like of shape (n_groups,)) – Lower and upper limit of the confidence interval error bar (y-err_low, y+err_upp) per each group.

  • figsize (Tuple, default (10, 8)) – Figure dimension (width, height) in inches.

  • width (float, default 0.4) – Width of the bars.

  • fontsize (float, default 14) – Font size of the elements in the figure.

  • xlabel (list of length n_segments, default None) – List of labels for the segments on the x axis.

  • ylabel (string, default None) – Label for the y axis.

  • groupslabel (list of length n_groups, default None) – List of labels for group variations.

  • title (str, default None) – Title of the figure.

  • rotation (float, default 0) – Degree of rotation for xlabels.

  • capsize (float, default None) – Width of the confidence intervals cap.

  • legendloc (str, default None) – Location of the legend. Possible values: ‘center’, ‘best’, ‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’, ‘upper center’, ‘lower center’, ‘center left’, ‘center right’.

Returns

fig – Output figure

Return type

matplotlib figure

static forest_plot(y, ci, figsize=(10, 8), fontsize=14, xlabel=None, ylabel=None, annotation=None, annotationlabel=None, title=None, rotation=0, capsize=None, legendloc=None, marker='s')

Make forest plot with confidence intervals for N groups.

Parameters
  • y (array_like of shape (n_groups,)) – Vertical coordinate of the central data points

  • ci (array_like of shape (n_groups,)) – Confidence intervals +/- values for y.

  • figsize (Tuple, default (10, 8)) – Figure dimension (width, height) in inches.

  • fontsize (float, default 14) – Font size of the elements in the figure.

  • xlabel (list of length n_groups, default None) – List of labels for the groups on the x axis.

  • ylabel (string, default None) – Label for the y axis.

  • annotation (list of length n_groups) – Annotation value to be displayed per each bar

  • annotationlabel (list of shape) – Annotation label description to be displayed per each bar

  • title (str, default None) – Title of the figure.

  • rotation (float, default 0) – Degree of rotation for xlabels.

  • capsize (float, default None) – Width of the confidence intervals cap.

  • legendloc (str, default None) – Location of the legend. Possible values: ‘center’, ‘best’, ‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’, ‘upper center’, ‘lower center’, ‘center left’, ‘center right’.

  • marker (str, default 's') – Marker style. Possible values: ‘.’, ‘,’, ‘o’, ‘8’, ‘s’, ‘P’, ‘h’, ‘H’, ‘+’, ‘x’, ‘X’, ‘d’, ‘D’, ‘_’, etc.

Returns

fig – Output figure

Return type

matplotlib figure

static timeseries_plot(y, ci, figsize=(15, 10), fontsize=14, xlabel=None, ylabel=None, groupslabel=None, title=None, rotation=45, capsize=None, legendloc=None)

Make time series plot with confidence intervals for N groups.

Parameters
  • y (array_like of shape (n_groups, n_days)) – Input time series

  • ci (array_like of shape (n_groups, n_days)) – Confidence intervals +/- values for y.

  • figsize (Tuple, default (15, 10)) – Figure dimension (width, height) in inches.

  • fontsize (float, default 14) – Font size of the elements in the figure.

  • xlabel (list of length n_days, default None) – List of labels for the days on the x axis.

  • ylabel (string, default None) – Label for the y axis.

  • groupslabel (list of length n_groups, default None) – List of labels for group variations.

  • title (str, default None) – Title of the figure.

  • rotation (float, default 45) – Degree of rotation for xlabels.

  • capsize (float, default None) – Width of the confidence intervals cap.

  • legendloc (str, default None) – Location of the legend. Possible values: ‘center’, ‘best’, ‘upper left’, ‘upper right’, ‘lower left’, ‘lower right’, ‘upper center’, ‘lower center’, ‘center left’, ‘center right’.

Returns

fig – Output figure

Return type

matplotlib figure