uplift_analysis.scoring module

This module implements a scoring utility, wrapped in a class named Scorer. Given one or more scoring configurations, each specifying the relevant fields and the function to apply to them, the utility scores every observation in the input dataset. When multiple scoring configurations are provided, the scores of the individual methods are combined into a new score by weighting the magnitude of the score associated with each action (relevant for a multiple-actions scenario).

Notes

  • Scorer also supports use-cases with multiple treatments.

class uplift_analysis.scoring.Scorer(scoring_configuration: Optional[Union[Dict, List[Dict]]] = None)

The Scorer class scores the observations in a given dataset according to a provided configuration or set of configurations.

Parameters

scoring_configuration (Optional[Union[Dict, List[Dict]]]) – A single configuration or a list of configurations (each represented as a dict) specifying the scoring methods.

set_scoring_config(scoring_configuration: Union[Dict, List[Dict]]) → None

A method for setting the scoring configuration associated with the object.

Parameters

scoring_configuration (Union[Dict, List[Dict]]) – A single configuration or a list of configurations (each represented as a dict) specifying the scoring methods.

calculate_scores(dataset: Union[Dict[str, numpy.ndarray], pandas.core.frame.DataFrame], scoring_configuration: Optional[Union[Dict, List[Dict]]] = None) → Tuple

This function serves as the primary interface of the class. Given a dataset and a scoring configuration, it returns the corresponding score for each observation in the set, accompanied by the recommended action.

Parameters
  • dataset (Union[Dict[str, np.ndarray], pd.DataFrame]) – The dataset to be scored.

  • scoring_configuration (Optional[Union[Dict, List[Dict]]]) – The configuration according to which the observations will be scored.

Returns

  • rankings (np.ndarray) – The relative rank of each observation in the dataset, in the range (0, 1]; a higher rank corresponds to a higher uplift score.

  • scored_actions (np.ndarray) – The serial index of the action corresponding to the highest score, per observation.

  • scores (np.ndarray) – The score for each observation, according to the provided configuration.

  • action_dim (int) – The number of actions taken into account.

multiple_scoring_methods_calc(dataset: Union[Dict[str, numpy.ndarray], pandas.core.frame.DataFrame], scoring_methods: List[Dict])

This function applies a set of scoring method configurations to the provided dataset and returns the resulting scores, together with the recommended actions obtained by combining the computed scores.

Parameters
  • dataset (Union[Dict[str, np.ndarray], pd.DataFrame]) – The set of observations to score.

  • scoring_methods (List[Dict]) – A list of dictionaries representing the scoring method configurations.

Returns

  • rankings (np.ndarray) – The relative rank of each observation in the dataset, in the range (0, 1]; a higher rank corresponds to a higher uplift score.

  • scored_actions (np.ndarray) – The serial index of the action corresponding to the highest score, per observation.

  • scores (np.ndarray) – The score for each observation, according to the provided configuration.

  • action_dim (int) – The number of actions taken into account.

combine_scores(rankings: numpy.ndarray, scored_actions: numpy.ndarray, action_dim: int)

A function for combining the recommendations and scores resulting from multiple scoring methods, according to their relative rankings.

Parameters
  • rankings (np.ndarray) – An array containing the relative ranking for each observation (row), according to each scoring method (column).

  • scored_actions (np.ndarray) – An array containing the recommended action for each observation (row), according to each scoring method (column).

  • action_dim (int) – The cardinality of the action space.

Returns

  • combined_rankings (np.ndarray) – The relative rank of each observation in the dataset, in the range (0, 1]; a higher rank corresponds to a higher uplift score.

  • combined_score_action (np.ndarray) – The serial index of the action corresponding to the highest score, per observation.

  • combined_score (np.ndarray) – The score for each observation, according to the provided configuration.

  • action_dim (int) – The quantity of actions taken into account.
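The reference above does not spell out the exact combination rule, so the following is only a minimal sketch of one plausible rank-weighted scheme, not the library's implementation. It assumes that action indices are 0-based integers in [0, action_dim), that each method's relative rank is added to the action that method recommends, and that the action with the largest accumulated weight wins. The function name combine_by_rank_weight is hypothetical.

```python
import numpy as np


def combine_by_rank_weight(rankings, scored_actions, action_dim):
    """Hypothetical rank-weighted vote; NOT the library's combine_scores."""
    rankings = np.asarray(rankings, dtype=float)          # shape (n_obs, n_methods)
    scored_actions = np.asarray(scored_actions, dtype=int)  # same shape, 0-based actions
    n_obs, n_methods = rankings.shape

    # Accumulate, per observation, the rank mass each action receives.
    weights = np.zeros((n_obs, action_dim))
    rows = np.repeat(np.arange(n_obs), n_methods)
    np.add.at(weights, (rows, scored_actions.ravel()), rankings.ravel())

    combined_action = weights.argmax(axis=1)  # action with the largest weight
    combined_score = weights.max(axis=1)      # weight of that action

    # Relative rank in (0, 1]; ties are broken by position (an assumption).
    order = np.argsort(np.argsort(combined_score, kind="stable"), kind="stable")
    combined_rankings = (order + 1) / n_obs
    return combined_rankings, combined_action, combined_score, action_dim


# Two observations scored by two methods over two actions.
rankings = np.array([[0.5, 1.0], [1.0, 0.5]])
scored_actions = np.array([[0, 1], [1, 1]])
comb_rank, comb_action, comb_score, dim = combine_by_rank_weight(
    rankings, scored_actions, action_dim=2
)
```

In this toy input the two methods disagree on the first observation, and the method with the higher relative rank decides the action; on the second observation both methods agree, so their ranks add up.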

single_scoring_method_calc(dataset: Union[Dict[str, numpy.ndarray], pandas.core.frame.DataFrame], scoring_method: Dict)

This function applies a single scoring method configuration to the provided dataset and returns the resulting scores, together with the recommended actions according to these scores.

Parameters
  • dataset (Union[Dict[str, np.ndarray], pd.DataFrame]) – The set of observations to score.

  • scoring_method (Dict) – A dictionary representing the scoring method configuration.

Returns

  • rankings (np.ndarray) – The relative rank of each observation in the dataset, in the range (0, 1]; a higher rank corresponds to a higher uplift score.

  • scored_action (np.ndarray) – The serial index of the action corresponding to the highest score, per observation.

  • observation_score (np.ndarray) – The score for each observation, according to the provided configuration.

  • action_dim (int) – The quantity of actions taken into account.
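To make the four returned values concrete, here is a minimal sketch of how they could be derived from a matrix of per-action scores. It is an illustration under stated assumptions, not the library's code: the input layout (one row per observation, one column per action), the 0-based action index, and the function name score_and_recommend are all hypothetical.

```python
import numpy as np


def score_and_recommend(action_scores):
    """Hypothetical helper mirroring the documented return values."""
    action_scores = np.asarray(action_scores, dtype=float)
    if action_scores.ndim == 1:
        # A single action: treat the vector as one column.
        action_scores = action_scores[:, None]
    action_dim = action_scores.shape[1]

    scored_action = action_scores.argmax(axis=1)   # best action per observation
    observation_score = action_scores.max(axis=1)  # score of that action

    # Relative rank in (0, 1]; the highest score receives 1.0.
    order = np.argsort(np.argsort(observation_score, kind="stable"), kind="stable")
    rankings = (order + 1) / observation_score.shape[0]
    return rankings, scored_action, observation_score, action_dim


# Three observations, two candidate actions.
action_scores = np.array([[0.1, 0.4], [0.9, 0.2], [0.3, 0.3]])
rankings, scored_action, observation_score, action_dim = score_and_recommend(action_scores)
```

When two actions tie, as in the third row, argmax selects the first of them; whether the library resolves ties the same way is not stated in this reference.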

score_computation(dataset: Union[Dict[str, numpy.ndarray], pandas.core.frame.DataFrame], scoring_method: Dict) → numpy.ndarray

This function applies a single scoring method configuration to the provided dataset in order to compute the scores.

Parameters
  • dataset (Union[Dict[str, np.ndarray], pd.DataFrame]) – The dataset containing the observations that require scoring.

  • scoring_method (Dict) – A dictionary specifying the scoring method configuration.

Returns

The resulting scores, corresponding to the provided scoring method configuration.

Return type

np.ndarray

static rank_scores(observation_score: numpy.ndarray) → numpy.ndarray

A method for computing the relative rank of each observation within the provided dataset, according to its computed score.

Parameters

observation_score (np.ndarray) – An array representing the score for each observation.

Returns

A relative value in the range (0, 1] indicating the score rank (within the given dataset) of each observation.

Return type

np.ndarray
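As an illustration of a relative rank in (0, 1], the sketch below divides each observation's ordinal position by the number of observations, so that the highest score maps to 1.0. This is one common way to compute such a rank and is an assumption about the behaviour; the library may handle ties or normalization differently. The function name relative_rank is hypothetical.

```python
import numpy as np


def relative_rank(observation_score):
    """Map each score to a relative rank in (0, 1]; the highest score gets 1.0."""
    scores = np.asarray(observation_score, dtype=float)
    n = scores.shape[0]
    # argsort of argsort gives the 0-based ordinal position of each score.
    # Ties are broken by position, which is an assumption of this sketch.
    order = np.argsort(np.argsort(scores, kind="stable"), kind="stable")
    return (order + 1) / n


scores = np.array([0.2, -1.0, 3.5, 0.7])
ranks = relative_rank(scores)  # array([0.5, 0.25, 1.0, 0.75])
```

Because the smallest possible rank is 1/n rather than 0, the interval is open at 0 and closed at 1, matching the documented range.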