uplift_analysis.utils module

This module implements some basic utility functions required for the analysis and evaluation procedure.

uplift_analysis.utils.is_multi_action(actions: pandas.core.series.Series, neutral_indicator: Union[int, str]) → bool

This function checks whether the input series of actions contains a single action (apart from the neutral action) or multiple possible actions (multiple treatments).

Parameters
  • actions (pd.Series) – A Pandas series representing a set of observed actions.

  • neutral_indicator (Union[int,str]) – The action value associated with the neutral action.

Returns

A boolean that is True if the set is associated with multiple actions, and False otherwise.

Return type

bool
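A minimal sketch of how such a check could work, assuming it simply counts the distinct non-neutral values in the series. The name is_multi_action_sketch is hypothetical, and this is not necessarily how the library implements the check.

```python
import pandas as pd


def is_multi_action_sketch(actions: pd.Series, neutral_indicator) -> bool:
    # Hypothetical re-implementation, for illustration only.
    # Drop the neutral action, then count the distinct actions that remain.
    non_neutral = actions[actions != neutral_indicator]
    return non_neutral.nunique() > 1


single = pd.Series([0, 1, 0, 1, 1])  # one treatment (1) plus the neutral action (0)
multi = pd.Series([0, 1, 2, 0, 3])   # three treatments plus the neutral action (0)
print(is_multi_action_sketch(single, neutral_indicator=0))
print(is_multi_action_sketch(multi, neutral_indicator=0))
```

The neutral indicator can also be a string, for example "control", as long as it matches the values stored in the series.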

uplift_analysis.utils.is_binary_response(responses: pandas.core.series.Series) → bool

This function checks whether the input series of responses is of binary type.

Parameters

responses (pd.Series) – A Pandas series representing a set of observed responses.

Returns

A boolean that is True if the set is associated with a binary response, and False otherwise.

Return type

bool
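A minimal sketch of one possible criterion, assuming "binary" means that every non-missing response is either 0 or 1. Both the criterion and the name is_binary_response_sketch are assumptions; the library may define a binary response differently.

```python
import pandas as pd


def is_binary_response_sketch(responses: pd.Series) -> bool:
    # Hypothetical criterion: every non-missing value is 0 or 1.
    return bool(responses.dropna().isin([0, 1]).all())


converted = pd.Series([0, 1, 1, 0, 1])      # e.g. conversion flags
revenue = pd.Series([0.0, 12.5, 3.2, 0.0])  # e.g. a continuous response
print(is_binary_response_sketch(converted))
print(is_binary_response_sketch(revenue))
```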

uplift_analysis.utils.get_standard_error_mean(sample_size, std, z: Optional[float] = 1.96)

A function for computing the one-sided margin of a confidence interval around the sample mean estimator, that is, the standard error of the mean scaled to the desired confidence interval coverage. Refer to this page for more details.

Parameters
  • sample_size – The size of the sample (number of observations) for which the standard error estimation is required. It could be a scalar value, for a single computation, or an array-like input for multiple computations at once.

  • std – The standard deviations, associated with each of the elements in sample_size.

  • z (Optional[float]) – The z-score of the standard normal distribution corresponding to the desired confidence interval coverage. The default value, 1.96, corresponds to a 95% confidence interval.

Returns

The standard error of mean estimation corresponding to the provided sample sizes and standard deviations.
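As a hedged illustration, the margin described above is commonly computed as z · std / √n. The sketch below assumes that formula; the name margin_of_mean_sketch is hypothetical and the library's own implementation may differ.

```python
import numpy as np


def margin_of_mean_sketch(sample_size, std, z=1.96):
    # Assumed formula: margin = z * std / sqrt(n), applied element-wise,
    # so scalars and array-like inputs are both supported.
    sample_size = np.asarray(sample_size, dtype=float)
    std = np.asarray(std, dtype=float)
    return z * std / np.sqrt(sample_size)


# Single computation.
print(margin_of_mean_sketch(100, 10.0))
# Multiple computations at once: quadrupling the sample size halves the margin.
print(margin_of_mean_sketch([100, 400], [10.0, 10.0]))
```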

uplift_analysis.utils.get_standard_error_proportion(sample_size, proportion_estimate, z: float = 1.96)

A function for computing the one-sided margin of a confidence interval around the proportion estimator (the expectation of a binary random variable), that is, the standard error of the proportion scaled to the desired confidence interval coverage. Refer to this page for more details.

Parameters
  • sample_size – The size of the sample (number of observations) for which the standard error estimation is required. It could be a scalar value, for a single computation, or an array-like input for multiple computations at once.

  • proportion_estimate – The estimated proportions, associated with each of the elements in sample_size.

  • z (Optional[float]) – The z-score of the standard normal distribution corresponding to the desired confidence interval coverage. The default value, 1.96, corresponds to a 95% confidence interval.

Returns

The standard error of proportion estimation corresponding to the provided sample sizes and proportion_estimates.
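As a hedged illustration, the margin for a proportion is commonly computed as z · √(p(1 − p) / n). The sketch below assumes that formula; the name margin_of_proportion_sketch is hypothetical and the library's own implementation may differ.

```python
import numpy as np


def margin_of_proportion_sketch(sample_size, proportion_estimate, z=1.96):
    # Assumed formula: margin = z * sqrt(p * (1 - p) / n), applied element-wise.
    n = np.asarray(sample_size, dtype=float)
    p = np.asarray(proportion_estimate, dtype=float)
    return z * np.sqrt(p * (1.0 - p) / n)


# Single computation.
print(margin_of_proportion_sketch(100, 0.5))
# Multiple computations at once.
print(margin_of_proportion_sketch([100, 400], [0.5, 0.2]))
```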

uplift_analysis.utils.proportions_test(proportion_est_1, proportion_est_2, sample_siz_1, sample_siz_2)

This function implements a hypothesis test for the difference between the proportions of two groups.

Given the proportion estimates of two groups and the sample size of each group, it tests the null hypothesis that the populations from which the two groups were sampled have identical proportions. The alternative hypothesis is two-tailed: it states only that the two population proportions differ. Because the test is two-tailed, the order of the two groups is arbitrary.

For more details, see the Stat Trek page on Hypothesis Test: Difference Between Proportions.

All the inputs can be array-like for performing multiple computations at once, or scalar values, for performing a single test.

Parameters
  • proportion_est_1 – The estimated proportion for the first group.

  • proportion_est_2 – The estimated proportion for the second group.

  • sample_siz_1 – The sample size of the first group.

  • sample_siz_2 – The sample size of the second group.

Returns

The result(s) of the test(s) performed, in the form of p-value(s). Each p-value is the probability of observing a difference at least as extreme as the one measured, assuming the null hypothesis is true.

Return type

p_vals
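A minimal sketch of a two-tailed, two-proportion z-test with a pooled standard error, which is the standard textbook form of this test. It assumes SciPy is available; the name proportions_test_sketch is hypothetical, and the library's implementation may differ in its details.

```python
import numpy as np
from scipy.stats import norm


def proportions_test_sketch(proportion_est_1, proportion_est_2, sample_siz_1, sample_siz_2):
    # Hypothetical re-implementation, for illustration only.
    p1, p2, n1, n2 = (
        np.asarray(x, dtype=float)
        for x in (proportion_est_1, proportion_est_2, sample_siz_1, sample_siz_2)
    )
    # Pooled proportion under the null hypothesis of equal proportions.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Standard error of the difference between the two sample proportions.
    se = np.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z_score = (p1 - p2) / se
    # Two-tailed p-value: probability mass in both tails beyond |z|.
    return 2.0 * norm.sf(np.abs(z_score))


# Single test.
print(proportions_test_sketch(0.6, 0.4, 200, 200))
# Several tests at once; swapping the groups leaves the p-value unchanged.
print(proportions_test_sketch([0.6, 0.4], [0.4, 0.6], [200, 200], [200, 200]))
```

The sketch does not guard against a pooled proportion of exactly 0 or 1, where the standard error is zero.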

uplift_analysis.utils.t_test(mu_1, mu_2, sample_siz_1, sample_siz_2, std_1, std_2)

This function implements a hypothesis test for the difference between the means of two groups.

Given the mean estimates of two groups, the sample size of each group, and their standard deviations, the function uses a t-test to examine the null hypothesis that the populations from which the two groups were sampled have identical means. The alternative hypothesis is two-tailed: it states only that the two population means differ. Because the test is two-tailed, the order of the two groups is arbitrary.

For more details, see the Stat Trek page on Hypothesis Test: Difference Between Means.

All the inputs can be array-like for performing multiple computations at once, or scalar values, for performing a single test.

Parameters
  • mu_1 – The estimated mean for the first group.

  • mu_2 – The estimated mean for the second group.

  • sample_siz_1 – The sample size of the first group.

  • sample_siz_2 – The sample size of the second group.

  • std_1 – The standard deviation of the first group.

  • std_2 – The standard deviation of the second group.

Returns

The result(s) of the test(s) performed, in the form of p-value(s). Each p-value is the probability of observing a difference at least as extreme as the one measured, assuming the null hypothesis is true.

Return type

p_vals
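A minimal sketch of a two-tailed t-test computed from summary statistics. It assumes an unequal-variance (Welch) test with Welch–Satterthwaite degrees of freedom; the source does not state which variant the library uses, so this choice is an assumption. The name t_test_sketch is hypothetical. The result is compared against scipy.stats.ttest_ind_from_stats with equal_var=False.

```python
import numpy as np
from scipy.stats import t as t_dist
from scipy.stats import ttest_ind_from_stats


def t_test_sketch(mu_1, mu_2, sample_siz_1, sample_siz_2, std_1, std_2):
    # Hypothetical re-implementation, for illustration only.
    mu_1, mu_2, n1, n2, std_1, std_2 = (
        np.asarray(x, dtype=float)
        for x in (mu_1, mu_2, sample_siz_1, sample_siz_2, std_1, std_2)
    )
    # Per-group variance of the sample mean.
    v1 = std_1 ** 2 / n1
    v2 = std_2 ** 2 / n2
    # Standard error of the difference between the two sample means.
    se = np.sqrt(v1 + v2)
    t_stat = (mu_1 - mu_2) / se
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1.0) + v2 ** 2 / (n2 - 1.0))
    # Two-tailed p-value: probability mass in both tails beyond |t|.
    return 2.0 * t_dist.sf(np.abs(t_stat), dof)


p_sketch = t_test_sketch(5.2, 4.8, 120, 150, 1.1, 1.4)
p_scipy = ttest_ind_from_stats(5.2, 1.1, 120, 4.8, 1.4, 150, equal_var=False).pvalue
print(p_sketch, p_scipy)
```

Note the different argument order: the sketch follows the signature documented above, whereas ttest_ind_from_stats takes the mean, standard deviation and sample size of each group in turn.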