uplift_analysis.data module
This module implements EvalSet, the primary dataclass required for performing uplift analysis.
- class uplift_analysis.data.EvalSet(df: pandas.core.frame.DataFrame, name: Union[str, NoneType] = None, observed_action_field: Union[str, NoneType] = 'observed_action', response_field: Union[str, NoneType] = 'response', score_field: Union[str, NoneType] = 'score', proposed_action_field: Union[str, NoneType] = 'proposed_action', control_indicator: Union[str, int, NoneType] = 0, _is_evaluated: bool = False, _is_binary_response: bool = False, _is_multiple_actions: bool = False)
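As an illustration of the input this class expects, the sketch below builds a small DataFrame whose column names follow the default field names in the signature above. The values are made up for the example, and the commented-out construction is an assumption based only on that signature; it has not been run against the package.

```python
import pandas as pd

# Toy data; column names mirror the constructor's default field names.
df = pd.DataFrame(
    {
        "observed_action": [0, 1, 1, 0],  # 0 is the default control_indicator
        "response": [0, 1, 0, 1],
        "score": [0.1, 0.9, 0.4, 0.3],
        "proposed_action": [1, 1, 1, 1],
    }
)

# With the package installed, the frame would presumably be wrapped like this
# (inferred from the signature above, not verified here):
# from uplift_analysis.data import EvalSet
# eval_set = EvalSet(df=df, name="demo")
```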
- classmethod conf_fields() → Dict[str, List]
Returns a dict specifying the fields required for performing the evaluation, along with their expected types.
- set_props(**kwargs) → None
Configures the object by specifying the field names required for performing the uplift analysis.
- Parameters
**kwargs – Arbitrary keyword arguments.
- property is_evaluated: bool
This property is set by evaluation.Evaluator once the evaluation procedure is completed.
- property is_binary_response: bool
Indicates whether the dataset is associated with a binary response.
- property is_multiple_actions: bool
Indicates whether the dataset is associated with multiple actions.
- set_problem_type()
This method performs two checks:
Whether the dataset is associated with a single action (besides the neutral action) or with multiple possible actions (multiple treatments).
Whether the dataset is associated with a binary response.
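The two checks can be sketched roughly as follows. This is a standalone illustration, not the library's implementation: the helper name `infer_problem_type` is hypothetical, and the rule "binary means every non-missing value is 0 or 1" is an assumption.

```python
import pandas as pd


def infer_problem_type(df, observed_action_field="observed_action",
                       response_field="response", control_indicator=0):
    """Hypothetical helper: return (is_multiple_actions, is_binary_response)."""
    # Distinct observed actions, excluding the neutral (control) action.
    actions = set(df[observed_action_field].drop_duplicates().tolist())
    actions.discard(control_indicator)
    is_multiple_actions = len(actions) > 1

    # Assumption: the response is binary when all non-missing values are 0 or 1.
    responses = set(df[response_field].dropna().drop_duplicates().tolist())
    is_binary_response = responses <= {0, 1}
    return is_multiple_actions, is_binary_response


single = pd.DataFrame({"observed_action": [0, 1, 1], "response": [0, 1, 0]})
multi = pd.DataFrame({"observed_action": [0, 1, 2], "response": [0.5, 2.0, 1.0]})
print(infer_problem_type(single))  # (False, True)
print(infer_problem_type(multi))   # (True, False)
```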
- sort_and_rank() → None
This method sorts the dataset by the provided score in descending order, and assigns a new index corresponding to each observation's relative percentile in terms of score.
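A minimal sketch of this kind of ordering in plain pandas is shown below. It is written as a free function that returns a new frame, whereas the method above returns None; the index name `normalized_index` and the convention that row k of n gets index k / n are assumptions made for the illustration.

```python
import numpy as np
import pandas as pd


def sort_and_rank(df, score_field="score"):
    # Highest scores first; mergesort is stable, so ties keep their original order.
    ranked = df.sort_values(score_field, ascending=False, kind="mergesort")
    n = len(ranked)
    # Row k (1-based) gets index k / n: the fraction of the dataset covered
    # when the top-k observations are selected.
    ranked.index = pd.Index(np.arange(1, n + 1) / n, name="normalized_index")
    return ranked


df = pd.DataFrame({"score": [0.2, 0.9, 0.5, 0.1], "response": [0, 1, 1, 0]})
ranked = sort_and_rank(df)
print(ranked)
```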
- infer_subgroup_assignment() → None
This method specifies, for each observation in the dataset, which group it was assigned to: either an actual action (different from the neutral action) or the neutral one.
In addition, it marks observations in which the observed/assigned action matches the action recommended by the model; this is needed when there are multiple treatments.
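The grouping described here could look roughly like the sketch below. The helper name and the output columns `is_control`, `is_treated` and `is_intersect` are hypothetical names chosen for the example; the library may store this information differently.

```python
import pandas as pd


def infer_subgroups(df, observed_action_field="observed_action",
                    proposed_action_field="proposed_action", control_indicator=0):
    out = df.copy()
    observed = out[observed_action_field]
    # Neutral (control) observations versus observations that received an action.
    out["is_control"] = observed == control_indicator
    out["is_treated"] = ~out["is_control"]
    # Treated observations whose observed action equals the recommended action.
    out["is_intersect"] = out["is_treated"] & (observed == out[proposed_action_field])
    return out


df = pd.DataFrame({"observed_action": [0, 1, 2, 2], "proposed_action": [1, 1, 1, 2]})
groups = infer_subgroups(df)
print(groups)
```

With a single treatment, every treated observation trivially matches the recommendation, so the intersection flag only carries extra information in the multiple-treatments case.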
- get_cumulative_counts() → None
This method computes the cumulative count of observations in each group. This stage is performed after the dataframe has been sorted and ranked according to the provided score.
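For illustration, cumulative group counts over an already-sorted frame can be obtained with a running sum of indicator columns. The column names below are hypothetical and the data is made up.

```python
import pandas as pd

# Rows are assumed to be sorted by score already, highest first.
df = pd.DataFrame({"is_treated": [True, False, True, True, False]})
df["is_control"] = ~df["is_treated"]

# Running number of treated / control observations among the top-k rows.
df["treated_count"] = df["is_treated"].astype(int).cumsum()
df["control_count"] = df["is_control"].astype(int).cumsum()
print(df)
```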
- response_averaging() → None
This method averages the responses for each extent of exposure and for each subgroup.
- compute_uplift() → None
This method computes the uplift as the difference between the average response of the treated group and that of the untreated (control) group, per percentile. The same is done for the intersection group, in which the recommended actions match the actions actually assigned/observed; this is relevant for the multiple treatments/actions scenario.
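A rough sketch of such an uplift curve, computed from cumulative averages over a frame that is assumed to be sorted by score already, is given below. The function name `uplift_curve` is hypothetical, and this is an illustration of the idea rather than the library's code.

```python
import pandas as pd


def uplift_curve(df, observed_action_field="observed_action",
                 response_field="response", control_indicator=0):
    """Hypothetical helper; df is assumed to be sorted by score, highest first."""
    control = df[observed_action_field] == control_indicator
    treated = ~control
    resp = df[response_field].astype(float)
    # Cumulative response sum divided by cumulative group size, per prefix.
    treated_avg = resp.where(treated, 0.0).cumsum() / treated.astype(int).cumsum()
    control_avg = resp.where(control, 0.0).cumsum() / control.astype(int).cumsum()
    # Prefixes that contain no member of one of the groups yield NaN.
    return treated_avg - control_avg


df = pd.DataFrame({"observed_action": [1, 0, 1, 0], "response": [1, 0, 1, 1]})
uplift = uplift_curve(df)
print(uplift)
```

The intersection-group variant would follow the same pattern, with the treated mask additionally restricted to rows where the observed action equals the proposed one.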
- compute_gain() → None
This method computes the gain as the product of the uplift curve and the cumulative count of control/non-treated cases. The gain signal exposes the trade-off between the number of exposed cases and the reduction in uplift. Interpreting the gain signal is harder when the response is real-valued rather than binary.
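The trade-off can be seen in a tiny made-up example: as more cases are exposed, the uplift drops while the count grows, so their product rises and then falls. The numbers and variable names below are invented for the illustration.

```python
import pandas as pd

# Made-up uplift per cut-off and made-up cumulative control counts.
uplift = pd.Series([0.5, 0.375, 0.25, 0.125])
control_count = pd.Series([1, 2, 3, 4])

# Element-wise product, following the description above.
gain = uplift * control_count
print(gain.tolist())  # [0.5, 0.75, 0.75, 0.5]
```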
- compute_expected_response() → None
This method computes the expected response per group by weighting the average response of the exposed group together with the observed average response of the control group on the complementary part of the dataset, taking the relevant percentiles into account.
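One possible reading of this weighting is sketched below: at a cut-off that exposes a fraction q of the ranked dataset, the expected response is q times the treated average within the exposed part, plus (1 - q) times the control average on the remainder. The function name, its arguments, and the choice to use 0 where the remainder contains no control observations are all assumptions.

```python
import numpy as np


def expected_response(q, treated_avg, control_sum, control_count):
    """
    Hypothetical helper, one entry per cut-off:
    q             fraction of the ranked dataset that is exposed
    treated_avg   average treated response within the exposed part
    control_sum   cumulative sum of control responses up to the cut-off
    control_count cumulative number of control observations up to the cut-off
    """
    q = np.asarray(q, dtype=float)
    treated_avg = np.asarray(treated_avg, dtype=float)
    control_sum = np.asarray(control_sum, dtype=float)
    control_count = np.asarray(control_count, dtype=float)

    # Control totals on the complementary (non-exposed) part of the dataset.
    rest_sum = control_sum[-1] - control_sum
    rest_count = control_count[-1] - control_count
    # Assumption: use 0 where the complement holds no control observations.
    rest_avg = np.divide(rest_sum, rest_count,
                         out=np.zeros_like(rest_sum), where=rest_count > 0)
    return q * treated_avg + (1.0 - q) * rest_avg


result = expected_response(q=[0.5, 1.0], treated_avg=[1.0, 0.5],
                           control_sum=[0.0, 1.0], control_count=[1, 2])
print(result.tolist())  # [1.0, 0.5]
```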
- compute_relative_lift() → None
This method computes the difference between the expected response computed at an earlier stage and the overall average response of the control group.
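As a small numeric illustration of that difference (values and variable names are made up):

```python
import pandas as pd

# Made-up expected response per cut-off, indexed by exposed fraction.
expected = pd.Series([1.0, 0.75, 0.5], index=[0.25, 0.5, 1.0])
# Made-up responses of all control observations.
control_responses = pd.Series([0.0, 1.0, 0.0, 1.0])

baseline = control_responses.mean()   # overall control average
relative_lift = expected - baseline
print(relative_lift.tolist())  # [0.5, 0.25, 0.0]
```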
- get_quantile_interval() → float
This method returns the “sampling interval” between consecutive observations in the dataset, in terms of quantiles. It can be used, for example, to compute the area under the uplift curve.
- Returns
The quantile interval.
- Return type
float
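To show how such an interval might be used, the sketch below approximates the area under a made-up uplift curve with a rectangle rule. It assumes that n ranked observations are spaced 1/n apart in quantile terms; the helper name is hypothetical and the library's actual definition may differ.

```python
import numpy as np


def quantile_interval(n_observations):
    # Assumption: n equally spaced ranked observations are 1/n apart.
    return 1.0 / n_observations


uplift = np.array([0.5, 0.25, 0.25, 0.0])  # made-up uplift values per observation
dq = quantile_interval(len(uplift))

# Rectangle-rule approximation of the area under the uplift curve.
area = float(np.sum(uplift) * dq)
print(area)  # 0.25
```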