pybandits

pybandits.smab

class pybandits.smab.BaseSmabBernoulli(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[BaseBeta | BaseSmabZoomingModel], strategy: Strategy)

Bases: BaseMab, ABC

Base model for a Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling.

Parameters:
  • actions (Dict[ActionId, Union[BaseBeta, BaseSmabZoomingModel]]) – The dictionary of possible actions and their associated model.

  • strategy (Strategy) – The strategy used to select actions.

actions_manager: SmabActionsManager[BaseBeta | BaseSmabZoomingModel]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

predict(n_samples: Annotated[int, Gt(gt=0)] = 1, forbidden_actions: Set[ActionId] | None = None) SmabPredictions

Predict actions.

Parameters:
  • n_samples (PositiveInt, default=1) – Number of samples to predict.

  • forbidden_actions (Optional[Set[ActionId]], default=None) – Set of forbidden actions. If specified, the model discards the forbidden_actions and considers only the remaining allowed_actions. By default, the model considers all actions as allowed_actions. Note that actions = allowed_actions ∪ forbidden_actions.

Returns:

  • actions (List[UnifiedActionId]) – The actions selected by the multi-armed bandit model.

  • probs (Union[List[Dict[UnifiedActionId, Probability]], List[Dict[UnifiedActionId, MOProbability]]]) – The probabilities of getting a positive reward for each action.

update(actions: List[ActionId], rewards: List[BinaryReward] | List[List[BinaryReward]], quantities: List[float | List[float] | None] | None = None, actions_memory: List[ActionId] | None = None, rewards_memory: List[BinaryReward] | List[List[BinaryReward]] | None = None)

Update the stochastic Bernoulli bandit given the list of selected actions and their corresponding binary rewards.

Parameters:
  • actions (List[ActionId] of shape (n_samples,), e.g. ['a1', 'a2', 'a3', 'a4', 'a5']) – The selected action for each sample.

  • rewards (List[Union[BinaryReward, List[BinaryReward]]] of shape (n_samples, n_objectives)) –

    The binary reward for each sample.
    If strategy is not MultiObjectiveBandit, rewards should be a list, e.g.

    rewards = [1, 0, 1, 1, 1, …]

    If strategy is MultiObjectiveBandit, rewards should be a list of lists, e.g. (with n_objectives=2):

    rewards = [[1, 1], [1, 0], [1, 1], [1, 0], [1, 1], …]

  • quantities (Optional[List[Union[float, List[float], None]]]) – The value associated with each action. If None, the value is not used, i.e. the action is non-quantitative.

  • actions_memory (Optional[List[ActionId]]) – List of previously selected actions.

  • rewards_memory (Optional[Union[List[BinaryReward], List[List[BinaryReward]]]]) – List of previously collected rewards.
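
For orientation, here is a minimal sketch of one predict/update round using the methods documented above. It assumes an already constructed SmabBernoulli instance named smab, that the returned SmabPredictions unpacks into actions and probabilities, and that the binary rewards are observed externally; it is an illustration, not a prescribed workflow.

    # Hypothetical sketch: smab is an existing SmabBernoulli instance.
    actions, probs = smab.predict(n_samples=3)    # assuming SmabPredictions unpacks as (actions, probs)
    rewards = [1, 0, 1]                           # one binary reward per selected action, observed externally
    smab.update(actions=actions, rewards=rewards)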

class pybandits.smab.SmabBernoulli(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[Beta | SmabZoomingModel], strategy: ClassicBandit)

Bases: BaseSmabBernoulli

Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling.

References

Analysis of Thompson Sampling for the Multi-armed Bandit Problem (Agrawal and Goyal, 2012) http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf

Parameters:
  • actions_manager (SmabActionsManagerSO) – The manager for actions and their associated models.

  • strategy (ClassicBandit) – The strategy used to select actions.

actions_manager: SmabActionsManager[Beta | SmabZoomingModel]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

strategy: ClassicBandit
class pybandits.smab.SmabBernoulliBAI(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[Beta | SmabZoomingModel], strategy: BestActionIdentificationBandit)

Bases: BaseSmabBernoulli

Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling, and Best Action Identification strategy.

References

Analysis of Thompson Sampling for the Multi-armed Bandit Problem (Agrawal and Goyal, 2012) http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf

Parameters:
  • actions_manager (SmabActionsManagerSO) – The manager for actions and their associated models.

  • strategy (BestActionIdentificationBandit) – The strategy used to select actions.

actions_manager: SmabActionsManager[Beta | SmabZoomingModel]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

strategy: BestActionIdentificationBandit
class pybandits.smab.SmabBernoulliCC(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[BetaCC | SmabZoomingModelCC], strategy: CostControlBandit)

Bases: BaseSmabBernoulli

Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling, and Cost Control strategy.

The sMAB is extended to include control of the action cost. Each action is associated with a predefined “cost”. At prediction time, the model considers the actions whose expected reward is above a pre-defined lower bound. Among these actions, the one with the lowest associated cost is recommended. The expected reward interval for feasible actions is defined as [(1-subsidy_factor) * max_p, max_p], where max_p is the highest sampled expected reward.
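
As a plain-Python illustration of this selection rule (not the library implementation; the sampled probabilities, costs and subsidy factor below are made-up numbers):

    # Toy illustration of the cost-control selection rule.
    sampled_p = {"a1": 0.80, "a2": 0.75, "a3": 0.40}   # sampled expected rewards per action
    costs = {"a1": 3.0, "a2": 1.0, "a3": 0.5}          # predefined action costs
    subsidy_factor = 0.1
    max_p = max(sampled_p.values())                    # 0.80
    feasible = [a for a, p in sampled_p.items() if p >= (1 - subsidy_factor) * max_p]
    selected = min(feasible, key=lambda a: costs[a])   # cheapest feasible action -> "a2"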

References

Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints (Daulton et al., 2019) https://arxiv.org/abs/1911.00638

Multi-Armed Bandits with Cost Subsidy (Sinha et al., 2021) https://arxiv.org/abs/2011.01488

Parameters:
  • actions_manager (SmabActionsManagerCC) – The manager for actions and their associated models.

  • strategy (CostControlBandit) – The strategy used to select actions.

actions_manager: SmabActionsManager[BetaCC | SmabZoomingModelCC]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

strategy: CostControlBandit
class pybandits.smab.SmabBernoulliMO(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[BetaMO], strategy: MultiObjectiveBandit)

Bases: BaseSmabBernoulli

Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling, and Multi-Objectives strategy.

The reward pertaining to an action is a multidimensional vector instead of a scalar value. In this setting, different actions are compared according to the Pareto order between their expected reward vectors; the actions whose expected rewards are not inferior to those of any other action are called Pareto optimal actions, and together they constitute the Pareto front.

References

Thompson Sampling for Multi-Objective Multi-Armed Bandits Problem (Yahyaa and Manderick, 2015) https://www.researchgate.net/publication/272823659_Thompson_Sampling_for_Multi-Objective_Multi-Armed_Bandits_Problem

Parameters:
  • actions_manager (SmabActionsManagerMO) – The manager for actions and their associated models.

  • strategy (MultiObjectiveBandit) – The strategy used to select actions.

actions_manager: SmabActionsManager[BetaMO]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

strategy: MultiObjectiveBandit
class pybandits.smab.SmabBernoulliMOCC(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[BetaMOCC], strategy: MultiObjectiveCostControlBandit)

Bases: BaseSmabBernoulli

Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling, with a combined Multi-Objective (MO) and Cost Control (CC) strategy.

This bandit allows the reward to be a multidimensional vector and includes control of the action cost. It merges the Multi-Objective and Cost Control strategies.

Parameters:
  • actions_manager (SmabActionsManagerMOCC) – The manager for actions and their associated models.

  • strategy (MultiObjectiveCostControlBandit) – The strategy used to select actions.

actions_manager: SmabActionsManager[BetaMOCC]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

strategy: MultiObjectiveCostControlBandit

pybandits.cmab

class pybandits.cmab.BaseCmabBernoulli(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: CmabActionsManager[BaseBayesianNeuralNetwork | BaseCmabZoomingModel], strategy: Strategy)

Bases: BaseMab, ABC

Base model for a Contextual Bernoulli Multi-Armed Bandit with Thompson Sampling.

Parameters:
  • actions (Dict[ActionId, Union[BaseBayesianNeuralNetwork, BaseCmabZoomingModel]]) – The dictionary of possible actions and their associated model.

  • strategy (Strategy) – The strategy used to select actions.

actions_manager: CmabActionsManager[BaseBayesianNeuralNetwork | BaseCmabZoomingModel]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseMab__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

predict(context: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], forbidden_actions: Set[ActionId] | None = None) CmabPredictions

Predict actions.

Parameters:
  • context (ArrayLike of shape (n_samples, n_features)) – Matrix of contextual features.

  • forbidden_actions (Optional[Set[ActionId]], default=None) – Set of forbidden actions. If specified, the model discards the forbidden_actions and considers only the remaining allowed_actions. By default, the model considers all actions as allowed_actions. Note that actions = allowed_actions ∪ forbidden_actions.

Returns:

  • actions (List[ActionId] of shape (n_samples,)) – The actions selected by the multi-armed bandit model.

  • probs (Union[List[Dict[UnifiedActionId, Probability]], List[Dict[UnifiedActionId, MOProbability]]]) – The probabilities of getting a positive reward for each action.

  • ws (Union[List[Dict[UnifiedActionId, float]], List[Dict[UnifiedActionId, List[float]]]]) – The weighted sum of logistic regression logits.

update(actions: List[ActionId], rewards: List[BinaryReward] | List[List[BinaryReward]], context: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], quantities: List[float | List[float] | None] | None = None, actions_memory: List[ActionId] | None = None, rewards_memory: List[BinaryReward] | List[List[BinaryReward]] | None = None, context_memory: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None)

Update the contextual Bernoulli bandit given the list of selected actions and their corresponding binary rewards.

Parameters:
  • actions (List[ActionId] of shape (n_samples,), e.g. ['a1', 'a2', 'a3', 'a4', 'a5']) – The selected action for each sample.

  • rewards (List[Union[BinaryReward, List[BinaryReward]]] of shape (n_samples, n_objectives)) –

    The binary reward for each sample.
    If strategy is not MultiObjectiveBandit, rewards should be a list, e.g.

    rewards = [1, 0, 1, 1, 1, …]

    If strategy is MultiObjectiveBandit, rewards should be a list of lists, e.g. (with n_objectives=2):

    rewards = [[1, 1], [1, 0], [1, 1], [1, 0], [1, 1], …]

  • context (ArrayLike of shape (n_samples, n_features)) – Matrix of contextual features.

  • quantities (Optional[List[Union[float, List[float], None]]]) – The value associated with each action. If None, the value is not used, i.e. the action is non-quantitative.

  • actions_memory (Optional[List[ActionId]]) – List of previously selected actions.

  • rewards_memory (Optional[Union[List[BinaryReward], List[List[BinaryReward]]]]) – List of previously collected rewards.

  • context_memory (Optional[ArrayLike] of shape (n_samples, n_features)) – Matrix of contextual features.
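
A minimal sketch of one contextual predict/update round, analogous to the sMAB example above. It assumes an already constructed CmabBernoulli instance named cmab and that the returned CmabPredictions unpacks into actions, probabilities and weighted sums; the context values are random placeholders.

    import numpy as np

    # Hypothetical sketch: cmab is an existing CmabBernoulli instance.
    context = np.random.rand(3, 5)                       # (n_samples, n_features) contextual features
    actions, probs, ws = cmab.predict(context=context)   # assuming CmabPredictions unpacks as (actions, probs, ws)
    rewards = [0, 1, 1]                                   # one binary reward per selected action
    cmab.update(actions=actions, rewards=rewards, context=context)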

classmethod update_old_state(state: Dict[str, str | int | float | bool | None | Dict[str, str | int | float | bool | None | Dict[str, Serializable] | List[Serializable]] | List[str | int | float | bool | None | Dict[str, Serializable] | List[Serializable]]], delta: PositiveProbability | None) Dict[str, str | int | float | bool | None | Dict[str, str | int | float | bool | None | Dict[str, Serializable] | List[Serializable]] | List[str | int | float | bool | None | Dict[str, Serializable] | List[Serializable]]]

Update the model state to the current version. Besides the updates in the MAB class, it also loads legacy Bayesian Logistic Regression model parameters into the new Bayesian Neural Network model.

Parameters:
  • state (Dict[str, Serializable]) – The internal state of a model (actions, strategy, etc.) of the same type. The state is expected to be in an old PyBandits format, from a version below the currently supported one.

  • delta (Optional[PositiveProbability]) – The delta value to be set in the actions_manager. If None, it will not be set. This is relevant only for adaptive window models.

Returns:

state – The updated state of the model. The state is in the current format of PyBandits, with actions_manager and delta added if needed.

Return type:

Dict[str, Serializable]

class pybandits.cmab.CmabBernoulli(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: CmabActionsManager[BayesianNeuralNetwork | CmabZoomingModel], strategy: ClassicBandit)

Bases: BaseCmabBernoulli

Contextual Bernoulli Multi-Armed Bandit with Thompson Sampling.

References

Thompson Sampling for Contextual Bandits with Linear Payoffs (Agrawal and Goyal, 2014) https://arxiv.org/pdf/1209.3352.pdf

Parameters:
  • actions_manager (CmabActionsManagerSO) – The manager for actions and their associated models.

  • strategy (ClassicBandit) – The strategy used to select actions.

actions_manager: CmabActionsManager[BayesianNeuralNetwork | CmabZoomingModel]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseMab__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

strategy: ClassicBandit
class pybandits.cmab.CmabBernoulliBAI(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: CmabActionsManager[BayesianNeuralNetwork | CmabZoomingModel], strategy: BestActionIdentificationBandit)

Bases: BaseCmabBernoulli

Contextual Bernoulli Multi-Armed Bandit with Thompson Sampling, and Best Action Identification strategy.

References

Analysis of Thompson Sampling for the Multi-armed Bandit Problem (Agrawal and Goyal, 2012) http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf

Parameters:
  • actions_manager (CmabActionsManagerSO) – The manager for actions and their associated models.

  • strategy (BestActionIdentificationBandit) – The strategy used to select actions.

actions_manager: CmabActionsManager[BayesianNeuralNetwork | CmabZoomingModel]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseMab__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

strategy: BestActionIdentificationBandit
class pybandits.cmab.CmabBernoulliCC(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: CmabActionsManager[BayesianNeuralNetworkCC | CmabZoomingModelCC], strategy: CostControlBandit)

Bases: BaseCmabBernoulli

Contextual Bernoulli Multi-Armed Bandit with Thompson Sampling, and Cost Control strategy.

The cMAB is extended to include control of the action cost. Each action is associated with a predefined “cost”. At prediction time, the model considers the actions whose expected reward is above a pre-defined lower bound. Among these actions, the one with the lowest associated cost is recommended. The expected reward interval for feasible actions is defined as [(1-subsidy_factor) * max_p, max_p], where max_p is the highest sampled expected reward.

References

Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints (Daulton et al., 2019) https://arxiv.org/abs/1911.00638

Multi-Armed Bandits with Cost Subsidy (Sinha et al., 2021) https://arxiv.org/abs/2011.01488

Parameters:
  • actions_manager (CmabActionsManagerCC) – The manager for actions and their associated models.

  • strategy (CostControlBandit) – The strategy used to select actions.

actions_manager: CmabActionsManager[BayesianNeuralNetworkCC | CmabZoomingModelCC]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseMab__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

strategy: CostControlBandit

pybandits.model

class pybandits.model.BaseBayesianNeuralNetwork(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, model_params: BnnParams, update_method: str = 'MCMC', update_kwargs: dict | None = None)

Bases: Model, ABC

Bayesian Neural Network model for binary classification.

This class implements a Bayesian Neural Network with an arbitrary number of fully connected layers using PyMC for binary classification tasks. It supports both Markov Chain Monte Carlo (MCMC) and Variational Inference (VI) methods for posterior inference.

References

Bayesian Learning for Neural Networks (Radford M. Neal, 1995) https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=db869fa192a3222ae4f2d766674a378e47013b1b

Parameters:
  • model_params (BnnParams) – The parameters of the Bayesian Neural Network, including weights and biases for each layer and their initial values for resetting

  • update_method (str, optional) – The method used for posterior inference, either “MCMC” or “VI” (default is “MCMC”).

  • update_kwargs (Optional[dict], optional) – A dictionary of keyword arguments for the update method. For MCMC, it contains ‘trace’ settings. For VI, it contains both ‘trace’ and ‘fit’ settings.

Notes

  • The model uses tanh activation for hidden layers and sigmoid activation for the output layer.

  • The output layer is designed for binary classification tasks, with probabilities modeled using a Bernoulli likelihood.

class Config

Bases: object

arbitrary_types_allowed = True
property approx_history: ndarray | None
arrange_update_kwargs()
check_context_matrix(context: ndarray)

Check and cast context matrix.

Parameters:

context (np.ndarray of shape (n_samples, n_features)) – Matrix of contextual features.

Returns:

context – Matrix of contextual features.

Return type:

pandas DataFrame of shape (n_samples, n_features)

classmethod cold_start(n_features: Annotated[int, Gt(gt=0)], hidden_dim_list: List[Annotated[int, Gt(gt=0)]] | None = None, update_method: Literal['VI', 'MCMC'] = 'MCMC', update_kwargs: dict | None = None, dist_params_init: Dict[str, float] | None = None, **kwargs) Self

Initialize a Bayesian Neural Network with a cold start.

Parameters:
  • n_features (PositiveInt) – Number of input features for the network.

  • hidden_dim_list (Optional[List[PositiveInt]], optional) – List of dimensions for the hidden layers of the network. If None, no hidden layers are added.

  • update_method (UpdateMethods) – Method to update the network, either “MCMC” or “VI”. Default is “MCMC”.

  • update_kwargs (Optional[dict], optional) – Additional keyword arguments for the update method. Default is None.

  • dist_params_init (Optional[Dict[str, float]], optional) – Initial distribution parameters for the network weights and biases. Default is None.

  • **kwargs – Additional keyword arguments for the BayesianNeuralNetwork constructor.

Returns:

An instance of the Bayesian Neural Network initialized with the specified parameters.

Return type:

Self
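
For example, a cold-start network with one hidden layer might be created as follows (a sketch based on the signature above; the dimensions and update method are arbitrary choices):

    # Sketch: 8 input features, one hidden layer of 16 units, variational inference updates.
    bnn = BayesianNeuralNetwork.cold_start(n_features=8, hidden_dim_list=[16], update_method="VI")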

create_model(x: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], y: List[BinaryReward] | ndarray | None = None, is_predict: bool = False) Model

Create a PyMC model for Bayesian Neural Network.

This method builds a PyMC model with the network architecture specified in model_params. The model uses tanh activation for hidden layers and sigmoid for the output layer.

Parameters:
  • x (ArrayLike) – Input features of shape (n_samples, n_features)

  • y (Union[List[BinaryReward], np.ndarray]) – Binary target values of shape (n_samples,)

  • is_predict (bool) – If True, process samples independently. If False, process all samples at once. In the predict step, we would like to sample the model parameters independently for each sample. In the update step, this is not required.

Returns:

PyMC model object with the specified neural network architecture

Return type:

PymcModel

Notes

The model structure follows these steps:

  1. For each layer, create weight and bias variables from StudentT distributions.

  2. Apply linear transformations and activations through the layers. When samples are processed independently (is_predict=True), the linear transformation is applied to each row separately, so random variables are not shared; otherwise it is applied to the whole matrix at once, so random variables are shared.

  3. Apply sigmoid activation at the output.

  4. Use Bernoulli likelihood for binary classification.

classmethod create_model_params(n_features: Annotated[int, Gt(gt=0)], hidden_dim_list: List[Annotated[int, Gt(gt=0)]], **dist_params_init) BnnParams

Creates model parameters for a Bayesian neural network (BNN) model according to dist_params_init. This method initializes the distribution parameters for each layer of a BNN using the specified number of features, hidden dimensions, and distribution initialization parameters.

Parameters:
  • n_features (PositiveInt) – The number of input features for the BNN.

  • hidden_dim_list (List[PositiveInt]) – A list of integers specifying the number of hidden units in each hidden layer. If None, no hidden layers are added.

  • **dist_params_init (dict, optional) – Additional parameters for initializing the distribution of weights and biases.

Returns:

An instance of BnnParams containing the initialized layer parameters.

Return type:

BnnParams

classmethod get_layer_params_name(layer_ind: Annotated[int, Gt(gt=0)]) Tuple[str, str]
property input_dim: Annotated[int, Gt(gt=0)]

Returns the expected input dimension of the model.

Returns:

The number of input features expected by the model, derived from the shape of the weight matrix in the first layer’s parameters.

Return type:

int

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_params: BnnParams
model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

sample_proba(context: ndarray) List[Tuple[Probability, float]]

Samples probabilities and weighted sums from the prior predictive distribution.

Parameters:

context (ArrayLike) – The context matrix for which the probabilities are to be sampled.

Returns:

Each element is a tuple containing the probability of a positive reward and the corresponding weighted sum between contextual feature quantities and sampled coefficients.

Return type:

List[ProbabilityWeight]

update_kwargs: dict | None
update_method: str
class pybandits.model.BaseBeta(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1)

Bases: Model, ABC

Beta Distribution model for Bernoulli multi-armed bandits.

Parameters:
  • n_successes (PositiveInt = 1) – Counter of the number of successes.

  • n_failures (PositiveInt = 1) – Counter of the number of failures.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

sample_proba(n_samples: Annotated[int, Gt(gt=0)]) List[Probability]

Sample the probability of getting a positive reward.

Parameters:

n_samples (PositiveInt) – Number of samples to draw.

Returns:

prob – Sampled probabilities of getting a positive reward, one per sample.

Return type:

List[Probability]

property std: float

The corrected standard deviation (Bessel’s correction) of the binary distribution of successes and failures.
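
Conceptually, sample_proba draws from the Beta posterior implied by the success and failure counters. A plain numpy sketch of the idea (not the library code; the counter values are arbitrary):

    import numpy as np

    n_successes, n_failures = 12, 4
    samples = np.random.beta(n_successes, n_failures, size=5)   # 5 sampled probabilities of a positive reward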

class pybandits.model.BaseBetaMO(*, models: List[Beta])

Bases: ModelMO, ABC

Base beta Distribution model for Bernoulli multi-armed bandits with multi-objectives.

Parameters:

models (List[Beta] of length (n_objectives,)) – List of Beta distributions.

classmethod cold_start(n_objectives: Annotated[int, Gt(gt=0)], **kwargs) BetaMO

Utility function to create a multi-objective Beta model, or a child model with cost control, with default parameters.

Parameters:
  • n_objectives (PositiveInt) – The number of objectives, i.e. the number of Beta models to create.

  • kwargs (Dict[str, Any]) – Additional arguments for the multi-objective Beta child model.

Returns:

beta_mo – The multi-objective Beta model.

Return type:

BetaMO

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

models: List[Beta]
sample_proba(n_samples: Annotated[int, Gt(gt=0)]) List[List[Probability]]

Sample the probability of getting a positive reward.

Parameters:

n_samples (PositiveInt) – Number of samples to draw.

Returns:

prob – Probabilities of getting a positive reward for each sample and objective.

Return type:

List[MOProbability]

class pybandits.model.BayesianLogisticRegression(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, model_params: BnnParams, update_method: str = 'MCMC', update_kwargs: dict | None = None)

Bases: BayesianNeuralNetwork

A Bayesian Logistic Regression model that inherits from BayesianNeuralNetwork. This model is a specialized version of a Bayesian Neural Network with a single layer, designed specifically for logistic regression tasks. The model parameters are validated to ensure that the model adheres to this single-layer constraint.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

classmethod validate_model_params(model_params)
class pybandits.model.BayesianLogisticRegressionCC(*, cost: Annotated[float, Ge(ge=0)], n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, model_params: BnnParams, update_method: str = 'MCMC', update_kwargs: dict | None = None)

Bases: BayesianLogisticRegression, ModelCC

A Bayesian Logistic Regression model with cost control.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

class pybandits.model.BayesianNeuralNetwork(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, model_params: BnnParams, update_method: str = 'MCMC', update_kwargs: dict | None = None)

Bases: BaseBayesianNeuralNetwork

Bayesian Neural Network class. This class implements a Bayesian Neural Network by extending the BaseBayesianNeuralNetwork. It provides functionality for probabilistic modeling and inference using neural networks.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

class pybandits.model.BayesianNeuralNetworkCC(*, cost: Annotated[float, Ge(ge=0)], n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, model_params: BnnParams, update_method: str = 'MCMC', update_kwargs: dict | None = None)

Bases: BaseBayesianNeuralNetwork, ModelCC

Bayesian Neural Network model for binary classification with cost constraint.

This class implements a Bayesian Neural Network with an arbitrary number of fully connected layers using PyMC for binary classification tasks. It supports both Markov Chain Monte Carlo (MCMC) and Variational Inference (VI) methods for posterior inference.

References

Bayesian Learning for Neural Networks (Radford M. Neal, 1995) https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=db869fa192a3222ae4f2d766674a378e47013b1b

Parameters:
  • model_params (BnnParams) – The parameters of the Bayesian Neural Network, including weights and biases for each layer and their initial values for resetting

  • update_method (str, optional) – The method used for posterior inference, either “MCMC” or “VI” (default is “MCMC”).

  • update_kwargs (Optional[dict], optional) – A dictionary of keyword arguments for the update method. For MCMC, it contains ‘trace’ settings. For VI, it contains both ‘trace’ and ‘fit’ settings.

  • cost (NonNegativeFloat) – Cost associated to the Bayesian Neural Network model.

Notes

  • The model uses tanh activation for hidden layers and sigmoid activation for the output layer.

  • The output layer is designed for binary classification tasks, with probabilities modeled using a Bernoulli likelihood.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

class pybandits.model.Beta(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1)

Bases: BaseBeta

Beta Distribution model for Bernoulli multi-armed bandits.

Parameters:
  • n_successes (PositiveInt = 1) – Counter of the number of successes.

  • n_failures (PositiveInt = 1) – Counter of the number of failures.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pybandits.model.BetaCC(*, cost: Annotated[float, Ge(ge=0)], n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1)

Bases: BaseBeta, ModelCC

Beta Distribution model for Bernoulli multi-armed bandits with cost control.

Parameters:
  • n_successes (PositiveInt = 1) – Counter of the number of successes.

  • n_failures (PositiveInt = 1) – Counter of the number of failures.

  • cost (NonNegativeFloat) – Cost associated to the Beta distribution.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pybandits.model.BetaMO(*, models: List[Beta])

Bases: BaseBetaMO

Beta Distribution model for Bernoulli multi-armed bandits with multi-objectives.

Parameters:

models (List[Beta] of length (n_objectives,)) – List of Beta distributions.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pybandits.model.BetaMOCC(*, cost: Annotated[float, Ge(ge=0)], models: List[Beta])

Bases: BaseBetaMO, ModelCC

Beta Distribution model for Bernoulli multi-armed bandits with multi-objectives and cost control.

Parameters:
  • models (List[BetaCC] of shape (n_objectives,)) – List of Beta distributions.

  • cost (NonNegativeFloat) – Cost associated to the Beta distribution.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pybandits.model.BnnLayerParams(*, weight: StudentTArray, bias: StudentTArray)

Bases: PyBanditsBaseModel

Represents the parameters of a Bayesian neural network (BNN) layer.

Parameters:
  • weight (StudentTArray) – The weight parameter of the BNN layer, represented as a StudentTArray.

  • bias (StudentTArray) – The bias parameter of the BNN layer, represented as a StudentTArray.

bias: StudentTArray
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

weight: StudentTArray
class pybandits.model.BnnParams(*, bnn_layer_params: ~typing.List[~pybandits.model.BnnLayerParams] | None, bnn_layer_params_init: ~typing.List[~pybandits.model.BnnLayerParams] = <factory>)

Bases: PyBanditsBaseModel

Represents the parameters of a Bayesian Neural Network (BNN), including both the current layer parameters and the initial layer parameters. We keep the init parameters in case we need to reset the model.

Parameters:
  • bnn_layer_params (List[BnnLayerParams]) – A list of BNN layer parameters representing the current state of the model.

  • bnn_layer_params_init (List[BnnLayerParams]) – A list of BNN layer parameters representing the initial state of the model.

bnn_layer_params: List[BnnLayerParams] | None
bnn_layer_params_init: List[BnnLayerParams]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod validate_inputs(values)
class pybandits.model.Model(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1)

Bases: BaseModelSO, ABC

Class to model the prior distributions for single objective.

Parameters:
  • n_successes (PositiveInt = 1) – Counter of the number of successes.

  • n_failures (PositiveInt = 1) – Counter of the number of failures.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

abstract sample_proba(**kwargs) List[Probability] | List[List[Probability]] | List[Tuple[Probability, float]]

Sample the probability of getting a positive reward.

class pybandits.model.ModelCC(*, cost: Annotated[float, Ge(ge=0)])

Bases: BaseModelCC, ABC

Class to model action cost.

Parameters:

cost (NonNegativeFloat) – Cost associated to the action.

cost: Annotated[float, Ge(ge=0)]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pybandits.model.ModelMO(*, models: List[Model])

Bases: BaseModelMO, ABC

Class to model the prior distributions for multi-objective.

Parameters:

models (List[Model]) – The list of models for each objective.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

models: List[Model]
class pybandits.model.StudentTArray(*, mu: List[float] | List[List[float]], sigma: List[Annotated[float, Ge(ge=0)]] | List[List[Annotated[float, Ge(ge=0)]]], nu: List[Annotated[float, Gt(gt=0)]] | List[List[Annotated[float, Gt(gt=0)]]])

Bases: PyBanditsBaseModel

A class representing an array of Student’s t-distributions with parameters mu, sigma, and nu. The distribution of a specific element (e.g., a single parameter of a layer) is defined by the corresponding elements in the lists. The mean values are represented by mu, the scale (standard deviation) values by sigma, and the degrees of freedom by nu.

Parameters:
  • mu (Union[List[float], List[List[float]]]) – The mean values of the Student’s t-distributions. Can be a 1D (for the layer bias term) or 2D list (for the layer weight term).

  • sigma (Union[List[NonNegativeFloat], List[List[NonNegativeFloat]]]) – The scale (standard deviation) values of the Student’s t-distributions. Must be non-negative. Can be a 1D or 2D list.

  • nu (Union[List[PositiveFloat], List[List[PositiveFloat]]]) – The degrees of freedom of the Student’s t-distributions. Must be positive. Can be a 1D or 2D list.

classmethod cold_start(shape: Annotated[int, Gt(gt=0)] | Tuple[Annotated[int, Gt(gt=0)], ...], mu: float = 0.0, sigma: Annotated[float, Ge(ge=0)] = 10.0, nu: Annotated[float, Gt(gt=0)] = 5.0) StudentTArray
static convert_list_to_array(input_list: List[float] | List[List[float]]) bool
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

mu: List[float] | List[List[float]]
nu: List[Annotated[float, Gt(gt=0)]] | List[List[Annotated[float, Gt(gt=0)]]]
property params
property shape: Tuple[Annotated[int, Gt(gt=0)], ...]
sigma: List[Annotated[float, Ge(ge=0)]] | List[List[Annotated[float, Ge(ge=0)]]]
classmethod validate_inputs(values)
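
A minimal construction sketch based on the parameters above (the values are arbitrary; a 1D array such as a two-element layer bias term):

    # Explicit construction with matching 1D lists for mu, sigma and nu.
    bias_prior = StudentTArray(mu=[0.0, 0.0], sigma=[10.0, 10.0], nu=[5.0, 5.0])
    # Presumably equivalent via the documented cold_start helper and its defaults.
    bias_prior = StudentTArray.cold_start(shape=2)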

pybandits.quantitative_model

class pybandits.quantitative_model.BaseCmabZoomingModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], BayesianNeuralNetwork | None], base_model_cold_start_kwargs: Dict[str, Any])

Bases: ZoomingModel, ABC

Zooming model for CMAB.

Parameters:
  • dimension (PositiveInt) – Number of parameters of the model.

  • comparison_threshold (Float01) – Comparison threshold.

  • segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment is more than the average number of samples in all segments by this factor, the segment is considered interesting. If the number of samples in a segment is less than the average number of samples in all segments by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.

  • n_comparison_points (PositiveInt) – Number of comparison points.

  • n_max_segments (PositiveInt) – Maximum number of segments.

  • sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[BayesianNeuralNetwork]]) – Mapping of segments to Bayesian Neural Network models.

  • base_model_cold_start_kwargs (Dict[str, Any]) – Keyword arguments for the base model cold start.

base_model_cold_start_kwargs: Dict[str, Any]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_ZoomingModel__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], BayesianNeuralNetwork | None]
classmethod validate_n_features(value)
class pybandits.quantitative_model.BaseSmabZoomingModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], Beta | None])

Bases: ZoomingModel, ABC

Zooming model for sMAB.

Parameters:
  • dimension (PositiveInt) – Number of parameters of the model.

  • comparison_threshold (Float01) – Comparison threshold.

  • segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment is more than the average number of samples in all segments by this factor, the segment is considered interesting. If the number of samples in a segment is less than the average number of samples in all segments by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.

  • n_comparison_points (PositiveInt) – Number of comparison points.

  • n_max_segments (PositiveInt) – Maximum number of segments.

  • sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[Beta]]) – Mapping of segments to Beta models.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_ZoomingModel__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], Beta | None]
class pybandits.quantitative_model.CmabZoomingModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], BayesianNeuralNetwork | None], base_model_cold_start_kwargs: Dict[str, Any])

Bases: BaseCmabZoomingModel

Zooming model for CMAB.

Parameters:
  • dimension (PositiveInt) – Number of parameters of the model.

  • comparison_threshold (Float01) – Comparison threshold.

  • segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment is more than the average number of samples in all segments by this factor, the segment is considered interesting. If the number of samples in a segment is less than the average number of samples in all segments by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.

  • n_comparison_points (PositiveInt) – Number of comparison points.

  • n_max_segments (PositiveInt) – Maximum number of segments.

  • sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[BayesianNeuralNetwork]]) – Mapping of segments to Bayesian Neural Network models.

  • base_model_cold_start_kwargs (Dict[str, Any]) – Keyword arguments for the base model cold start.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_ZoomingModel__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

class pybandits.quantitative_model.CmabZoomingModelCC(*, cost: Callable[[float | Annotated[float, Ge(ge=0)]], Annotated[float, Ge(ge=0)]], n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], BayesianNeuralNetwork | None], base_model_cold_start_kwargs: Dict[str, Any])

Bases: BaseCmabZoomingModel, QuantitativeModelCC

Zooming model for CMAB with cost control.

Parameters:
  • dimension (PositiveInt) – Number of parameters of the model.

  • comparison_threshold (Float01) – Comparison threshold.

  • segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment is more than the average number of samples in all segments by this factor, the segment is considered interesting. If the number of samples in a segment is less than the average number of samples in all segments by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.

  • n_comparison_points (PositiveInt) – Number of comparison points.

  • n_max_segments (PositiveInt) – Maximum number of segments.

  • sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[BayesianNeuralNetwork]]) – Mapping of segments to Bayesian Neural Network models.

  • base_model_cold_start_kwargs (Dict[str, Any]) – Keyword arguments for the base model cold start.

  • cost (Callable[[Union[float, NonNegativeFloat]], NonNegativeFloat]) – Cost function mapping the action quantity to a non-negative cost.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_ZoomingModel__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

class pybandits.quantitative_model.QuantitativeModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)])

Bases: BaseModelSO, ABC

Base class for quantitative models.

Parameters:

dimension (PositiveInt) – Number of parameters of the model.

dimension: Annotated[int, Gt(gt=0)]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

abstract sample_proba(**kwargs) List[Tuple[Tuple[Tuple[Float_0_1, ...], Probability], ...]]

Sample the model.

class pybandits.quantitative_model.QuantitativeModelCC(*, cost: Callable[[float | Annotated[float, Ge(ge=0)]], Annotated[float, Ge(ge=0)]])

Bases: BaseModelCC, ABC

Class to model quantitative action cost.

Parameters:

cost (Callable[[Union[float, NonNegativeFloat]], NonNegativeFloat]) – Cost function mapping the action quantity to a non-negative cost.

cost: Callable[[float | Annotated[float, Ge(ge=0)]], Annotated[float, Ge(ge=0)]]
classmethod deserialize_cost(value)

Deserialize cost from string representation if needed.

encode_cost(value)
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

static serialize_cost(cost_value) str

Serialize cost value to string representation.

classmethod validate_cost(value)

Deserialize cost from string representation if needed.
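
Since the cost here is a callable over the action quantity rather than a fixed scalar, a hypothetical cost function might look as follows (a sketch; the linear form is an arbitrary choice):

    # Hypothetical cost callable: cost grows linearly with the chosen quantity.
    def quantity_cost(quantity: float) -> float:
        return 2.0 * quantity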

class pybandits.quantitative_model.Segment(*, intervals: Tuple[Tuple[Float_0_1, Float_0_1], ...])

Bases: PyBanditsBaseModel

This class is used to represent a segment of the quantities space. A segment is defined by a list of intervals, thus representing a hyper-rectangle.

Parameters:

intervals (Tuple[Tuple[Float01, Float01], ...]) – Intervals of the segment.

intervals: Tuple[Tuple[Float_0_1, Float_0_1], ...]
property intervals_array: ndarray
is_adjacent(other: Segment) bool

Check if two segments are adjacent. Segments are adjacent if they share a face, meaning they have identical intervals in all dimensions except one, where they touch.

Parameters:

other (Segment) – Segment to check for adjacency.

Returns:

Whether the segments are adjacent.

Return type:

bool

property maxs: ndarray
property mins: ndarray
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod segment_intervals_to_tuple(value)
split() Tuple[Segment, Segment]
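
A small usage sketch of the Segment API documented above (a 1D segment covering the whole quantity range; the exact split point is up to the implementation):

    # Sketch: a one-dimensional segment over [0, 1].
    whole = Segment(intervals=((0.0, 1.0),))
    left, right = whole.split()      # presumably two sub-segments covering the original interval
    left.is_adjacent(right)          # expected to be True, since the two halves share a face
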
class pybandits.quantitative_model.SmabZoomingModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], Beta | None])

Bases: BaseSmabZoomingModel

Zooming model for sMAB.

Parameters:
  • dimension (PositiveInt) – Number of parameters of the model.

  • comparison_threshold (Float01) – Comparison threshold.

  • segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment is more than the average number of samples in all segments by this factor, the segment is considered interesting. If the number of samples in a segment is less than the average number of samples in all segments by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.

  • n_comparison_points (PositiveInt) – Number of comparison points.

  • n_max_segments (PositiveInt) – Maximum number of segments.

  • sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[Beta]]) – Mapping of segments to Beta models.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_ZoomingModel__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

class pybandits.quantitative_model.SmabZoomingModelCC(*, cost: Callable[[float | Annotated[float, Ge(ge=0)]], Annotated[float, Ge(ge=0)]], n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], Beta | None])

Bases: BaseSmabZoomingModel, QuantitativeModelCC

Zooming model for sMAB with cost control.

Parameters:
  • dimension (PositiveInt) – Number of parameters of the model.

  • comparison_threshold (Float01) – Comparison threshold.

  • segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment is more than the average number of samples in all segments by this factor, the segment is considered interesting. If the number of samples in a segment is less than the average number of samples in all segments by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.

  • n_comparison_points (PositiveInt) – Number of comparison points.

  • n_max_segments (PositiveInt) – Maximum number of segments.

  • sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[Beta]]) – Mapping of segments to Beta models.

  • cost (Callable[[Union[float, NonNegativeFloat]], NonNegativeFloat]) – Cost function mapping the action quantity to a non-negative cost.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_ZoomingModel__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

class pybandits.quantitative_model.ZoomingModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], Model | None])

Bases: QuantitativeModel, ABC

This class is used to implement the zooming method. The approach is based on adaptive discretization of the quantitative action space. The space is represented as a hypercube whose number of dimensions is given by dimension. After each update step, the model checks whether each segment is interesting or a nuisance based on segment_update_factor. If a segment is interesting, it can be split into two segments. In contrast, adjacent nuisance segments can be merged based on comparison_threshold. The number of segments can be limited using n_max_segments.

References

Multi-Armed Bandits in Metric Spaces (Kleinberg, Slivkins, and Upfal, 2008) https://arxiv.org/pdf/0809.4882

Parameters:
  • dimension (PositiveInt) – Number of parameters of the model.

  • comparison_threshold (Float01) – Comparison threshold.

  • segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment is more than the average number of samples in all segments by this factor, the segment is considered interesting. If the number of samples in a segment is less than the average number of samples in all segments by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.

  • n_comparison_points (PositiveInt) – Number of comparison points.

  • n_max_segments (PositiveInt) – Maximum number of segments.

  • sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[Model]]) – Mapping of segments to models.

classmethod cold_start(dimension: Annotated[int, Gt(gt=0)] = 1, comparison_threshold: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, **kwargs) Self

Create a cold start model.

Returns:

Cold start model.

Return type:

ZoomingModel
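
For instance, a cold-start zooming model for a one-dimensional quantitative action might be created as follows (a sketch based on the signature above, using the concrete sMAB subclass):

    # Sketch: one quantitative dimension, default segmentation settings.
    zooming = SmabZoomingModel.cold_start(dimension=1)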

comparison_threshold: Float_0_1
classmethod deserialize_sub_actions(value)

Convert sub_actions from a dict with string keys (JSON representation) to a dict with tuple keys (object representation).

dimension: Annotated[int, Gt(gt=0)]
is_similar_performance(segment1: Segment, segment2: Segment) bool

Check if two segments have similar performance.

Parameters:
  • segment1 (Segment) – First segment.

  • segment2 (Segment) – Second segment.

Returns:

Whether the segments have similar performance.

Return type:

bool

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_ZoomingModel__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

n_comparison_points: Annotated[int, Gt(gt=0)]
n_max_segments: Annotated[int, Gt(gt=0)] | None
sample_proba(**kwargs) List[Tuple[Tuple[Tuple[Float_0_1, ...], Probability], ...]]

Sample an action value from each of the intervals.

segment_update_factor: Float_0_1
property segmented_actions: Dict[Segment, Model | None]
serialize_sub_actions(value)
sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], Model | None]
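
The split/merge bookkeeping described above can be illustrated with a small standalone sketch. This is one plausible reading of the rule (treating "by this factor" as a multiplicative margin around the average sample count), not pybandits' actual implementation:

    # Standalone sketch of the interesting / nuisance classification, assuming a
    # multiplicative margin around the average sample count. Illustrative only.
    from typing import Dict, Tuple

    Segment = Tuple[float, float]  # hypothetical 1-D segment as (low, high)

    def classify_segments(counts: Dict[Segment, int], segment_update_factor: float = 0.1) -> Dict[Segment, str]:
        avg = sum(counts.values()) / len(counts)
        labels = {}
        for segment, n_samples in counts.items():
            if n_samples > avg * (1 + segment_update_factor):
                labels[segment] = "interesting"  # candidate for splitting
            elif n_samples < avg * (1 - segment_update_factor):
                labels[segment] = "nuisance"     # candidate for merging with a neighbour
            else:
                labels[segment] = "neutral"
        return labels

    print(classify_segments({(0.0, 0.5): 40, (0.5, 0.75): 12, (0.75, 1.0): 8}))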

pybandits.strategy

class pybandits.strategy.BestActionIdentificationBandit(*, exploit_p: Float_0_1 | None = 0.5)

Bases: Strategy

Best-Action Identification (BAI) strategy for multi-armed bandits.

References

Simple Bayesian Algorithms for Best-Arm Identification (Russo, 2018) https://arxiv.org/pdf/1602.08448.pdf

Parameters:

exploit_p (Optional[Float01], 0.5 if not specified) – Tuning parameter taking a value in [0, 1] which specifies the probability of selecting the best or an alternative action. If exploit_p is 1, the bandit always selects the action with the highest probability of getting a positive reward, i.e. it behaves as a Greedy strategy. If exploit_p is 0, the bandit always selects the action with the second highest probability of getting a positive reward.

compare_best_actions(actions: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], Beta]) float

Compare the two best actions, i.e. the two actions with the highest expected means of getting a positive reward.

Parameters:

actions (Dict[UnifiedActionId, Beta])

Returns:

pvalue – p-value result of the statistical test.

Return type:

float

exploit_p: Float_0_1 | None
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod numerize_exploit_p(v)
select_action(p: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], float], actions: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], BaseModel] | None = None) ActionId | Tuple[ActionId, Tuple[float, ...]]

With probability self.exploit_p, select the best action (i.e. the action with the highest probability of getting a positive reward); with probability 1-self.exploit_p, select the second best action (i.e. the action with the second highest probability of getting a positive reward).

Parameters:
  • p (Dict[UnifiedActionId, Probability]) – The dictionary of actions and their sampled probability of getting a positive reward.

  • actions (Optional[Dict[UnifiedActionId, BaseModel]]) – The dictionary of actions and their associated model.

Returns:

selected_action – The selected action.

Return type:

UnifiedActionId

with_exploit_p(exploit_p: Float_0_1 | None) Self

Instantiate a mutated best action identification strategy with an altered exploit_p.

Parameters:

exploit_p (Optional[Float01], 0.5 if not specified) – Tuning parameter taking a value in [0, 1] which specifies the probability of selecting the best or an alternative action. If exploit_p is 1, the bandit always selects the action with the highest probability of getting a positive reward, i.e. it behaves as a Greedy strategy. If exploit_p is 0, the bandit always selects the action with the second highest probability of getting a positive reward.

Returns:

mutated_best_action_identification – The mutated best action identification strategy.

Return type:

BestActionIdentificationBandit
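
A minimal usage sketch of this strategy, assuming the per-action probabilities have already been sampled from the corresponding models (the import path follows the class path shown above):

    from pybandits.strategy import BestActionIdentificationBandit

    strategy = BestActionIdentificationBandit(exploit_p=0.8)
    sampled = {"a1": 0.72, "a2": 0.55, "a3": 0.31}  # sampled reward probabilities per action
    # With probability 0.8 this returns "a1" (the best action), otherwise "a2" (the second best).
    selected = strategy.select_action(p=sampled)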

class pybandits.strategy.ClassicBandit

Bases: Strategy

Classic multi-armed bandits strategy.

References

Analysis of Thompson Sampling for the Multi-armed Bandit Problem (Agrawal and Goyal, 2012) http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf

Thompson Sampling for Contextual Bandits with Linear Payoffs (Agrawal and Goyal, 2014) https://arxiv.org/pdf/1209.3352.pdf

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

select_action(p: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], float], actions: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], BaseModel] | None = None) ActionId | Tuple[ActionId, Tuple[float, ...]]

Select the action with the highest probability of getting a positive reward.

Parameters:
  • p (Dict[UnifiedActionId, Probability]) – The dictionary of actions and their sampled probability of getting a positive reward.

  • actions (Optional[Dict[UnifiedActionId, BaseModel]]) – The dictionary of actions and their associated model.

Returns:

selected_action – The selected action.

Return type:

UnifiedActionId

class pybandits.strategy.CostControlBandit(*, subsidy_factor: Float_0_1 | None = 0.5)

Bases: CostControlStrategy

Cost Control (CC) strategy for multi-armed bandits.

Bandits are extended to include a control of the action cost. Each action is associated with a predefined “cost”. At prediction time, the model considers the actions whose expected rewards are above a pre-defined lower bound. Among these actions, the one with the lowest associated cost is recommended. The expected reward interval for feasible actions is defined as [(1-subsidy_factor)*max_p, max_p], where max_p is the highest expected reward sampled value.

References

Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints (Daulton et al., 2019) https://arxiv.org/abs/1911.00638

Multi-Armed Bandits with Cost Subsidy (Sinha et al., 2021) https://arxiv.org/abs/2011.01488

Parameters:

subsidy_factor (Optional[Float01], 0.5 if not specified) – Number in [0, 1] that defines the smallest tolerated reward probability, hence the set of feasible actions. If subsidy_factor is 1, the bandit always selects the action with the minimum cost. If subsidy_factor is 0, the bandit always selects the action with the highest probability of getting a positive reward (it behaves as a classic Bernoulli bandit).

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod numerize_subsidy_factor(v)
select_action(p: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], Probability], actions: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], BaseModel]) ActionId | Tuple[ActionId, Tuple[float, ...]]

Select the action with the minimum cost among the set of feasible actions, i.e. the actions whose expected rewards fall within [(1-subsidy_factor)*max_p, max_p], where max_p is the highest sampled expected reward.

Parameters:
  • p (Dict[UnifiedActionId, Probability]) – The dictionary of actions and their sampled probability of getting a positive reward.

  • actions (Dict[UnifiedActionId, BetaCC]) – The dictionary of actions and their cost.

Returns:

selected_action – The selected action.

Return type:

UnifiedActionId

subsidy_factor: Float_0_1 | None
with_subsidy_factor(subsidy_factor: Float_0_1 | None) Self

Instantiate a mutated cost control bandit strategy with an altered subsidy factor.

Parameters:

subsidy_factor (Optional[Float01], 0.5 if not specified) – Number in [0, 1] that defines the smallest tolerated reward probability, hence the set of feasible actions. If subsidy_factor is 1, the bandit always selects the action with the minimum cost. If subsidy_factor is 0, the bandit always selects the action with the highest probability of getting a positive reward (it behaves as a classic Bernoulli bandit).

Returns:

mutated_cost_control_bandit – The mutated cost control bandit strategy.

Return type:

CostControlBandit
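
The feasibility rule above can be written out directly. The following standalone sketch (not the library's code) shows how the feasible set and the cheapest feasible action follow from sampled probabilities and predefined costs:

    # Standalone sketch of the cost-control selection rule.
    sampled_p = {"a1": 0.80, "a2": 0.72, "a3": 0.35}  # sampled reward probabilities
    costs = {"a1": 5.0, "a2": 2.0, "a3": 0.5}         # predefined action costs
    subsidy_factor = 0.2

    max_p = max(sampled_p.values())
    lower_bound = (1 - subsidy_factor) * max_p        # feasible interval is [lower_bound, max_p]
    feasible = [a for a, p in sampled_p.items() if p >= lower_bound]
    selected = min(feasible, key=costs.get)           # cheapest feasible action -> "a2"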

class pybandits.strategy.CostControlStrategy

Bases: Strategy, ABC

Cost Control (CC) strategy for multi-armed bandits.

Bandits are extended to include a control of the action cost. Each action is associated with a predefined “cost”.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pybandits.strategy.MultiObjectiveBandit

Bases: MultiObjectiveStrategy

Multi-Objective (MO) strategy for multi-armed bandits.

The reward pertaining to an action is a multidimensional vector instead of a scalar value. In this setting, different actions are compared according to the Pareto order between their expected reward vectors; the actions whose expected rewards are not inferior to those of any other action are called Pareto optimal actions, and together they constitute the Pareto front.

References

Thompson Sampling for Multi-Objective Multi-Armed Bandits Problem (Yahyaa and Manderick, 2015) https://www.researchgate.net/publication/272823659_Thompson_Sampling_for_Multi-Objective_Multi-Armed_Bandits_Problem

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

select_action(p: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], List[Probability]], **kwargs) ActionId | Tuple[ActionId, Tuple[float, ...]]

Select an action at random from the Pareto optimal set of actions. The Pareto optimal action set (Pareto front) A* is the set of actions that are not dominated by any other action. The dominance relation is established based on the objective reward probability vectors.

Parameters:

p (Dict[ActionId, List[Probability]]) – The dictionary of actions and their sampled probability of getting a positive reward for each objective.

Returns:

selected_action – The selected action.

Return type:

ActionId
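
A short usage sketch, assuming the per-objective probabilities have already been sampled (the strategy itself takes no constructor arguments, as shown above):

    from pybandits.strategy import MultiObjectiveBandit

    strategy = MultiObjectiveBandit()
    p = {"a1": [0.9, 0.2], "a2": [0.5, 0.8], "a3": [0.4, 0.1]}  # probabilities per objective
    # "a3" is dominated by both "a1" and "a2", so the Pareto front is {"a1", "a2"}.
    pareto_front = strategy.get_pareto_front(p)
    selected = strategy.select_action(p)  # drawn at random from the Pareto front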

class pybandits.strategy.MultiObjectiveCostControlBandit

Bases: MultiObjectiveStrategy, CostControlStrategy

Multi-Objective (MO) with Cost Control (CC) strategy for multi-armed bandits.

This strategy allows the reward to be a multidimensional vector and includes a control of the action cost. It merges the Multi-Objective and Cost Control strategies.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

select_action(p: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], List[Probability]], actions: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], BetaMOCC]) ActionId | Tuple[ActionId, Tuple[float, ...]]

Select the action with the minimum cost among the Pareto optimal set of actions. The Pareto optimal action set (Pareto front) A* is the set of actions that are not dominated by any other action. The dominance relation is established based on the objective reward probability vectors.

Parameters:

p (Dict[UnifiedActionId, List[Probability]]) – The dictionary of actions and their sampled probability of getting a positive reward for each objective.

Returns:

selected_action – The selected action.

Return type:

UnifiedActionId

class pybandits.strategy.MultiObjectiveStrategy

Bases: Strategy, ABC

Multi Objective Strategy to select actions in multi-armed bandits.

classmethod get_pareto_front(p: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], List[Probability]]) List[ActionId | Tuple[ActionId, Tuple[float, ...]]]

Create the Pareto optimal set of actions (Pareto front) A*, identified as the set of actions that are not dominated by any other action.

Parameters:

p (Dict[UnifiedActionId, List[Probability]]) – The dictionary of actions and their sampled probability of getting a positive reward for each objective.

Returns:

pareto_front – The list of Pareto optimal actions.

Return type:

List[UnifiedActionId]

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pybandits.strategy.Strategy

Bases: PyBanditsBaseModel, ABC

Strategy to select actions in multi-armed bandits.

classmethod get_expected_value_from_state(state: Dict[str, Any], field_name: str) float
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod numerize_field(v, field_name: str)
abstract select_action(p: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], Probability], actions: Dict[ActionId, BaseModel] | None) ActionId | Tuple[ActionId, Tuple[float, ...]]

Select the action.

pybandits.strategy.random() x in the interval [0, 1).

pybandits.actions_manager

class pybandits.actions_manager.ActionsManager(delta: PositiveProbability | None = None, actions: Dict[ActionId, Model] | None = None, action_ids: Set[ActionId] | None = None, quantitative_action_ids: Set[ActionId] | None = None, kwargs: Dict[str, Any] | None = None, actions_with_change: Set[Tuple[ActionId, Annotated[int, Ge(ge=0)]]] | None = None)

Bases: PyBanditsBaseModel, ABC

Base class for managing actions and their associated models. The class allows the bandit to account for non-stationarity by providing an adaptive window scheme for action updates; change point detection is based on this adaptive windowing scheme.

References

Scaling Multi-Armed Bandit Algorithms (Fouché et al., 2019) https://edouardfouche.com/publications/S-MAB_FOUCHE_KDD19.pdf

Parameters:
  • actions (Dict[ActionId, Model]) – The list of possible actions, and their associated Model.

  • delta (Optional[PositiveProbability]) – The confidence level for the adaptive window. None for skipping the change point detection.

actions: Dict[ActionId, BaseModel]
actions_with_change: Set[Tuple[ActionId, Annotated[int, Ge(ge=0)]]]
classmethod at_least_one_action_is_defined(v)
delta: PositiveProbability | None
property maximum_memory_length: Annotated[int, Ge(ge=0)]

Get maximum possible memory length based on current action statistics.

Returns:

Maximum memory length allowed.

Return type:

NonNegativeInt

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'json_encoders': {<class 'collections.deque'>: <class 'list'>}}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

update(actions: List[ActionId], rewards: List[BinaryReward] | List[List[BinaryReward]], quantities: List[float | List[float] | None] | None = None, actions_memory: List[ActionId] | None = None, rewards_memory: List[BinaryReward] | List[List[BinaryReward]] | None = None, **kwargs)

Update the models associated with the given actions using the provided rewards. With an adaptive window size, the update is performed by resetting the action models and retraining them on the new data.

Parameters:
  • actions (List[ActionId]) – The selected action for each sample.

  • rewards (Union[List[BinaryReward], List[List[BinaryReward]]]) – The reward for each sample.

  • quantities (Optional[List[Union[float, List[float], None]]]) – The value associated with each action. If none, the value is not used, i.e. non-quantitative action.

  • actions_memory (Optional[List[ActionId]]) – List of previously selected actions.

  • rewards_memory (Optional[Union[List[BinaryReward], List[List[BinaryReward]]]]) – List of previously collected rewards.
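
As a rough illustration of the retraining step described above (a standalone sketch, not the library's adaptive-window logic): when memory is supplied, an update amounts to refitting each action model from the retained history plus the new batch.

    # Standalone sketch: recompute per-action success/failure counts from memory plus a new batch.
    # pybandits additionally decides how much memory to retain via change point detection (delta).
    from collections import Counter

    actions_memory, rewards_memory = ["a1", "a2", "a1"], [1, 0, 1]
    new_actions, new_rewards = ["a2", "a1", "a2"], [1, 0, 1]

    all_actions = actions_memory + new_actions
    all_rewards = rewards_memory + new_rewards
    successes = Counter(a for a, r in zip(all_actions, all_rewards) if r == 1)
    failures = Counter(a for a, r in zip(all_actions, all_rewards) if r == 0)
    # A refit Bernoulli model for action a could then use the counts successes[a] and failures[a].
    print(dict(successes), dict(failures))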

class pybandits.actions_manager.CmabActionsManager(delta: PositiveProbability | None = None, actions: Dict[ActionId, Model] | None = None, action_ids: Set[ActionId] | None = None, quantitative_action_ids: Set[ActionId] | None = None, kwargs: Dict[str, Any] | None = None, actions_with_change: Set[Tuple[ActionId, Annotated[int, Ge(ge=0)]]] | None = None)

Bases: ActionsManager, BaseModel, Generic[CmabModelType]

Manages actions and their associated models for cMAB models. The class allows the bandit to account for non-stationarity by providing an adaptive window scheme for action updates.

Parameters:
  • actions (Dict[ActionId, BaseBayesianNeuralNetwork]) – The list of possible actions, and their associated Model.

  • delta (Optional[PositiveProbability], 0.1 if not specified.) – The confidence level for the adaptive window.

actions: Dict[ActionId, CmabModelType]
classmethod check_models(v)
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'json_encoders': {<class 'collections.deque'>: <class 'list'>}}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

update(actions: List[ActionId], rewards: List[BinaryReward] | List[List[BinaryReward]], quantities: List[float | List[float] | None] | None, context: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], actions_memory: List[ActionId] | None = None, rewards_memory: List[BinaryReward] | List[List[BinaryReward]] | None = None, context_memory: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None)

Update the models associated with the given actions using the provided rewards. With an adaptive window size, the update is performed by resetting the action models and retraining them on the new data.

Parameters:
  • actions (List[ActionId]) – The selected action for each sample.

  • rewards (Union[List[BinaryReward], List[List[BinaryReward]]]) – The reward for each sample.

  • quantities (Optional[List[Union[float, List[float], None]]]) – The value associated with each action. If none, the value is not used, i.e. non-quantitative action.

  • context (ArrayLike of shape (n_samples, n_features)) – Matrix of contextual features.

  • actions_memory (Optional[List[ActionId]]) – List of previously selected actions.

  • rewards_memory (Optional[Union[List[BinaryReward], List[List[BinaryReward]]]]) – List of previously collected rewards.

  • context_memory (Optional[ArrayLike] of shape (n_samples, n_features)) – Matrix of contextual features.

pybandits.actions_manager.CmabActionsManagerCC

alias of CmabActionsManager[Union[BayesianNeuralNetworkCC, CmabZoomingModelCC]]

pybandits.actions_manager.CmabActionsManagerSO

alias of CmabActionsManager[Union[BayesianNeuralNetwork, CmabZoomingModel]]

class pybandits.actions_manager.SmabActionsManager(delta: PositiveProbability | None = None, actions: Dict[ActionId, Model] | None = None, action_ids: Set[ActionId] | None = None, quantitative_action_ids: Set[ActionId] | None = None, kwargs: Dict[str, Any] | None = None, actions_with_change: Set[Tuple[ActionId, Annotated[int, Ge(ge=0)]]] | None = None)

Bases: ActionsManager, BaseModel, Generic[SmabModelType]

Manages actions and their associated models for sMAB models. The class allows the bandit to account for non-stationarity by providing an adaptive window scheme for action updates.

Parameters:
  • actions (Dict[ActionId, BaseBeta]) – The list of possible actions, and their associated Model.

  • delta (Optional[PositiveProbability], 0.1 if not specified.) – The confidence level for the adaptive window.

actions: Dict[ActionId, SmabModelType]
classmethod all_actions_have_same_number_of_objectives(actions: Dict[ActionId, SmabModelType])
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'json_encoders': {<class 'collections.deque'>: <class 'list'>}}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

update(actions: List[ActionId], rewards: List[BinaryReward] | List[List[BinaryReward]], quantities: List[float | List[float] | None] | None, actions_memory: List[ActionId] | None = None, rewards_memory: List[BinaryReward] | List[List[BinaryReward]] | None = None)

Update the models associated with the given actions using the provided rewards. With an adaptive window size, the update is performed by resetting the action models and retraining them on the new data.

Parameters:
  • actions (List[ActionId]) – The selected action for each sample.

  • rewards (Union[List[BinaryReward], List[List[BinaryReward]]]) – The reward for each sample.

  • quantities (Optional[List[Union[float, List[float], None]]]) – The value associated with each action. If none, the value is not used, i.e. non-quantitative action.

  • actions_memory (Optional[List[ActionId]]) – List of previously selected actions.

  • rewards_memory (Optional[Union[List[BinaryReward], List[List[BinaryReward]]]]) – List of previously collected rewards.

pybandits.actions_manager.SmabActionsManagerCC

alias of SmabActionsManager[Union[BetaCC, SmabZoomingModelCC]]

pybandits.actions_manager.SmabActionsManagerMO

alias of SmabActionsManager[BetaMO]

pybandits.actions_manager.SmabActionsManagerMOCC

alias of SmabActionsManager[BetaMOCC]

pybandits.actions_manager.SmabActionsManagerSO

alias of SmabActionsManager[Union[Beta, SmabZoomingModel]]
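
A rough construction sketch (in normal use the actions manager is assembled inside the bandit classes); it assumes Beta is importable from pybandits.model and that Beta() defaults to an uninformative prior:

    from pybandits.actions_manager import SmabActionsManagerSO
    from pybandits.model import Beta  # assumption: Beta lives in pybandits.model

    manager = SmabActionsManagerSO(
        actions={"a1": Beta(), "a2": Beta()},  # assumption: Beta() defaults to an uninformative prior
        delta=0.1,                             # enable adaptive-window change point detection
    )
    manager.update(
        actions=["a1", "a2", "a1"],
        rewards=[1, 0, 1],
        quantities=None,  # no quantitative actions in this example
    )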

pybandits.smab_simulator

class pybandits.smab_simulator.SmabSimulator(*, smab: BaseSmabBernoulli, n_updates: Annotated[int, Gt(gt=0)] = 10, batch_size: Annotated[int, Gt(gt=0)] = 100, probs_reward: Dict[ActionId, Probability | Callable[[ndarray], Probability]] | Dict[str, Dict[ActionId, Probability | Callable[[ndarray], Probability]]] | None = None, save: bool = False, path: str = '', file_prefix: str = '', random_seed: Annotated[int, Ge(ge=0)] | None = None, verbose: bool = False, visualize: bool = False)

Bases: Simulator

Simulate environment for stochastic multi-armed bandits.

This class performs simulation of stochastic Multi-Armed Bandits (sMAB). Data are processed in batches of size n>=1. For each batch of simulated samples, the sMAB selects one action per sample and collects the corresponding simulated reward. Then, prior parameters are updated based on the rewards returned for the recommended actions.

Parameters:

mab (BaseSmabBernoulli) – sMAB model.

mab: BaseSmabBernoulli
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_Simulator__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

probs_reward: Dict[ActionId, Probability | Callable[[ndarray], Probability]] | Dict[str, Dict[ActionId, Probability | Callable[[ndarray], Probability]]] | None
classmethod replace_null_and_validate_probs_reward(values)
classmethod validate_probs_reward_columns(values)
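
A construction sketch, hedged: it assumes SmabBernoulli.cold_start exists with an action_ids argument and that the Simulator base class exposes a run() method, neither of which is documented in this section:

    from pybandits.smab import SmabBernoulli
    from pybandits.smab_simulator import SmabSimulator

    # Assumption: cold_start builds a bandit with uninformative priors from a set of action ids.
    smab = SmabBernoulli.cold_start(action_ids={"a1", "a2"})

    simulator = SmabSimulator(
        smab=smab,                            # the bandit to simulate
        n_updates=10,                         # number of prediction/update rounds
        batch_size=100,                       # simulated samples per round
        probs_reward={"a1": 0.3, "a2": 0.6},  # ground-truth reward probability per action
        random_seed=0,
        verbose=True,
    )
    simulator.run()  # assumption: run() is provided by the Simulator base class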

pybandits.cmab_simulator

class pybandits.cmab_simulator.CmabSimulator(*, cmab: BaseCmabBernoulli, n_updates: Annotated[int, Gt(gt=0)] = 10, batch_size: Annotated[int, Gt(gt=0)] = 100, probs_reward: Dict[ActionId, Callable[[ndarray], Probability] | Callable[[ndarray, ndarray], Probability]] | Dict[str, Dict[ActionId, Callable[[ndarray], Probability] | Callable[[ndarray, ndarray], Probability]]] | None = None, save: bool = False, path: str = '', file_prefix: str = '', random_seed: Annotated[int, Ge(ge=0)] | None = None, verbose: bool = False, visualize: bool = False, context: ndarray, group: List | None = None)

Bases: Simulator

Simulate environment for contextual multi-armed bandit models.

This class simulates the information required by the contextual bandit. Generated data are processed by the bandit in batches of size n>=1. For each batch of samples, actions are recommended by the bandit and the corresponding simulated rewards are collected. Bandit policy parameters are then updated based on the rewards returned for the recommended actions.

Parameters:
  • mab (BaseCmabBernoulli) – Contextual multi-armed bandit model

  • context (np.ndarray of shape (n_samples, n_feature)) – Context matrix of samples features.

  • group (Optional[List] with length=n_samples) – Group to which each sample belongs. Samples which belong to the same group have features drawn from the same distribution and the same probability of receiving a positive/negative feedback from each action. If not supplied, all samples are assigned to the same group.

context: ndarray
group: List | None
mab: BaseCmabBernoulli
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_Simulator__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

probs_reward: Dict[ActionId, Callable[[ndarray], Probability] | Callable[[ndarray, ndarray], Probability]] | Dict[str, Dict[ActionId, Callable[[ndarray], Probability] | Callable[[ndarray, ndarray], Probability]]] | None
classmethod replace_nulls_and_validate_sizes_and_dtypes(values)
classmethod validate_probs_reward_columns(values)

pybandits.offline_policy_evaluator

pybandits.offline_policy_estimator

Comprehensive Offline Policy Evaluation (OPE) estimators.

This module provides a complete set of estimators for OPE.

class pybandits.offline_policy_estimator.BalancedInverseProbabilityWeighting(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: GeneralizedInverseProbabilityWeighting

Balanced Inverse Probability Weighting estimator.

References

Balanced Off-Policy Evaluation in General Action Spaces (Sondhi, Arbour, and Dimmery, 2020) https://arxiv.org/pdf/1906.03694

Parameters:
  • alpha (Float01, defaults to 0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, defaults to 10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, defaults to None) – Random seed for bootstrap sampling.

estimate_sample_rewards(reward: ndarray, expected_importance_weight: ndarray, **kwargs) ndarray

Estimate the sample rewards.

Parameters:
  • reward (np.ndarray) – Array of rewards corresponding to each action.

  • expected_importance_weight (np.ndarray) – Array of expected importance weights.

Returns:

sample_reward – Estimated rewards.

Return type:

np.ndarray

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: ClassVar = 'b-ipw'
class pybandits.offline_policy_estimator.BaseOfflinePolicyEstimator(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: PyBanditsBaseModel, ABC

Base class for all OPE estimators.

This class defines the interface for all OPE estimators and provides a common method for estimating the policy value.

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

alpha: Float_0_1
estimate_policy_value_with_confidence_interval(**kwargs) Tuple[float, float, float, float]

Estimate the policy value with a confidence interval.

Parameters:

action (np.ndarray) – Array of actions taken.

Returns:

Estimated policy value, mean, lower bound, and upper bound of the confidence interval.

Return type:

Tuple[float, float, float, float]

abstract estimate_sample_rewards(**kwargs) ndarray

Estimate sample rewards.

Returns:

Estimated sample rewards.

Return type:

np.ndarray

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_bootstrap_samples: int
name: ClassVar
random_state: int | None
class pybandits.offline_policy_estimator.DirectMethod(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: BaseOfflinePolicyEstimator

Direct Method (DM) estimator.

This estimator uses the evaluation policy and the estimated expected rewards to estimate the sample rewards.

References

The Offset Tree for Learning with Partial Labels (Beygelzimer and Langford, 2009) https://arxiv.org/pdf/0812.4044

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

estimate_sample_rewards(estimated_policy: ndarray, expected_reward: ndarray, **kwargs) ndarray

Estimate the sample rewards.

Parameters:
  • estimated_policy (np.ndarray) – Array of action distributions.

  • expected_reward (np.ndarray) – Array of expected rewards.

Returns:

sample_reward – Estimated sample rewards.

Return type:

np.ndarray

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: ClassVar = 'dm'
class pybandits.offline_policy_estimator.DoublyRobust(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: GeneralizedDoublyRobust

Doubly Robust (DR) estimator.

References

Doubly Robust Policy Evaluation and Optimization (Dudík, Erhan, Langford, and Li, 2014) https://arxiv.org/pdf/1503.02834

More Robust Doubly Robust Off-policy Evaluation (Farajtabar, Chow, and Ghavamzadeh, 2018) https://arxiv.org/pdf/1802.03493

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_GeneralizedDoublyRobust__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

name: ClassVar = 'dr'
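
The DR estimate combines a direct-method term with an importance-weighted correction of the reward model's residual. The following standalone numpy sketch shows that per-sample quantity on synthetic data; it is not a call into this class, whose array conventions are those of GeneralizedDoublyRobust.estimate_sample_rewards:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 1000, 3
    action = rng.integers(0, k, size=n)                  # logged actions as column indices
    reward = rng.binomial(1, 0.4, size=n).astype(float)  # logged binary rewards
    propensity = np.full(n, 1.0 / k)                     # logging-policy probability of the logged action
    policy = rng.dirichlet(np.ones(k), size=n)           # evaluation policy, shape (n, k)
    q_hat = rng.uniform(0.2, 0.6, size=(n, k))           # reward model estimates, shape (n, k)

    w = policy[np.arange(n), action] / propensity        # importance weights
    dm_term = (policy * q_hat).sum(axis=1)               # direct-method term
    residual = reward - q_hat[np.arange(n), action]      # reward-model residual on the logged action
    dr_sample_rewards = dm_term + w * residual           # per-sample doubly robust estimates
    print(dr_sample_rewards.mean())                      # DR estimate of the policy value
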
class pybandits.offline_policy_estimator.DoublyRobustWithOptimisticShrinkage(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None, shrinkage_factor: Annotated[float, Ge(ge=0)] = 0.0)

Bases: DoublyRobust

Optimistic version of DRos estimator.

This estimator uses a shrinkage factor to shrink the importance weight in the native DR.

References

Doubly Robust Off-Policy Evaluation with Shrinkage (Su, Dimakopoulou, Krishnamurthy, and Dudik, 2020) https://arxiv.org/pdf/1907.09623

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

  • shrinkage_factor (float, default=0.0) – Shrinkage factor for the importance weights. If set to 0 or infinity, the estimator is equivalent to the native DM or DR estimators, respectively.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_GeneralizedDoublyRobust__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

name: ClassVar = 'dros-opt'
shrinkage_factor: Annotated[float, Ge(ge=0)]
class pybandits.offline_policy_estimator.DoublyRobustWithPessimisticShrinkage(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None, shrinkage_factor: Annotated[float, Gt(gt=0)] = inf)

Bases: DoublyRobust

Pessimistic version of DRos estimator.

This estimator uses a shrinkage factor to shrink the importance weight in the native DR.

References

Doubly Robust Off-Policy Evaluation with Shrinkage (Su, Dimakopoulou, Krishnamurthy, and Dudik, 2020) https://arxiv.org/pdf/1907.09623

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

  • shrinkage_factor (float, default=inf) – Shrinkage factor for the importance weights.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_GeneralizedDoublyRobust__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

name: ClassVar = 'dros-pess'
shrinkage_factor: Annotated[float, Gt(gt=0)]
class pybandits.offline_policy_estimator.GeneralizedDoublyRobust(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: BaseOfflinePolicyEstimator, ABC

Abstract generalization of the Doubly Robust (DR) estimator.

References

Doubly Robust Policy Evaluation and Optimization (Dudík, Erhan, Langford, and Li, 2014) https://arxiv.org/pdf/1503.02834

More Robust Doubly Robust Off-policy Evaluation (Farajtabar, Chow, and Ghavamzadeh, 2018) https://arxiv.org/pdf/1802.03493

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

estimate_sample_rewards(action: ndarray, reward: ndarray, propensity_score: ndarray, estimated_policy: ndarray, expected_reward: ndarray, **kwargs) ndarray

Estimate the sample rewards.

Parameters:
  • action (np.ndarray) – Array of actions taken.

  • reward (np.ndarray) – Array of rewards corresponding to each action.

  • propensity_score (np.ndarray) – Array of propensity scores.

  • estimated_policy (np.ndarray) – Array of action distributions.

  • expected_reward (np.ndarray) – Array of expected rewards.

Returns:

sample_reward – Estimated rewards.

Return type:

np.ndarray

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_GeneralizedDoublyRobust__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

class pybandits.offline_policy_estimator.GeneralizedInverseProbabilityWeighting(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: BaseOfflinePolicyEstimator, ABC

Abstract generalization of the Inverse Probability Weighting (IPW) estimator.

References

Learning from Logged Implicit Exploration Data (Strehl, Langford, Li, and Kakade, 2010) https://arxiv.org/pdf/1003.0120

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

estimate_sample_rewards(reward: ndarray, shrinkage_method: Callable | None, **kwargs) ndarray

Estimate the sample rewards.

Parameters:
  • reward (np.ndarray) – Array of rewards corresponding to each action.

  • shrinkage_method (Optional[Callable]) – Shrinkage method for the importance weights.

Returns:

sample_reward – Estimated sample rewards.

Return type:

np.ndarray

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pybandits.offline_policy_estimator.InverseProbabilityWeighting(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: GeneralizedInverseProbabilityWeighting

Inverse Probability Weighting (IPW) estimator.

References

Learning from Logged Implicit Exploration Data (Strehl, Langford, Li, and Kakade, 2010) https://arxiv.org/pdf/1003.0120

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

estimate_sample_rewards(action: ndarray, reward: ndarray, propensity_score: ndarray, estimated_policy: ndarray, shrinkage_method: Callable | None = None, **kwargs) ndarray

Estimate the sample rewards.

Parameters:
  • action (np.ndarray) – Array of actions taken.

  • reward (np.ndarray) – Array of rewards corresponding to each action.

  • propensity_score (np.ndarray) – Array of propensity scores.

  • estimated_policy (np.ndarray) – Array of action distributions.

Returns:

sample_reward – Estimated sample rewards.

Return type:

np.ndarray

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: ClassVar = 'ipw'
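
For reference, a standalone numpy sketch of the textbook IPW computation on synthetic data (not this class's internal code; the class additionally provides bootstrap confidence intervals via estimate_policy_value_with_confidence_interval). The self-normalized variant used by SNIPS, documented below, is also shown:

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 500, 4
    action = rng.integers(0, k, size=n)                  # logged actions as column indices
    reward = rng.binomial(1, 0.5, size=n).astype(float)  # logged binary rewards
    propensity = np.full(n, 1.0 / k)                     # logging-policy probability of the logged action
    policy = rng.dirichlet(np.ones(k), size=n)           # evaluation policy, shape (n, k)

    w = policy[np.arange(n), action] / propensity        # importance weights
    ipw_value = (w * reward).mean()                      # IPW estimate of the policy value
    snips_value = (w * reward).sum() / w.sum()           # self-normalized (SNIPS) variant
    print(ipw_value, snips_value)
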
class pybandits.offline_policy_estimator.ReplayMethod(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: BaseOfflinePolicyEstimator

Replay Method estimator.

This estimator is a simple baseline that estimates the policy value by averaging the rewards of the matched samples.

References

Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms (Li, Chu, Langford, and Wang, 2011) https://arxiv.org/pdf/1003.5956

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

estimate_sample_rewards(action: ndarray, reward: ndarray, estimated_policy: ndarray, **kwargs) ndarray

Estimate the sample rewards.

Parameters:
  • action (np.ndarray) – Array of actions taken.

  • reward (np.ndarray) – Array of rewards corresponding to each action.

  • estimated_policy (np.ndarray) – Array of action distributions.

Returns:

sample_reward – Estimated sample rewards.

Return type:

np.ndarray

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: ClassVar = 'rep'
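
Example — a minimal numpy sketch of the replay idea with hypothetical data: only the samples where the evaluation policy would have picked the logged action contribute, and their rewards are averaged.

    import numpy as np

    logged_action = np.array([0, 1, 1, 0, 2])   # actions taken by the logging policy
    reward = np.array([1, 0, 1, 1, 0])          # observed binary rewards
    policy_action = np.array([0, 1, 0, 0, 2])   # actions the evaluation policy would choose

    matched = logged_action == policy_action    # replay keeps only the matched samples
    policy_value = reward[matched].mean()       # average reward over matched samples
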
class pybandits.offline_policy_estimator.SelfNormalizedDoublyRobust(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: GeneralizedDoublyRobust

Self-Normalized Doubly Robust (SNDR) estimator.

This estimator applies self-normalized importance weights within the Doubly Robust (DR) estimator, normalizing the importance weights in the correction term by their empirical mean (a sketch follows at the end of this entry).

References

Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning (Kallus and Uehara, 2019) https://arxiv.org/pdf/1906.03735

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_GeneralizedDoublyRobust__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

name: ClassVar = 'sndr'
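
Example — the general per-sample form of the SNDR estimate (following Kallus and Uehara, 2019) as a conceptual numpy sketch; the reward-model predictions (q_hat_*) are hypothetical and the exact expression inside pybandits may differ.

    import numpy as np

    reward = np.array([1.0, 0.0, 1.0])
    weights = np.array([1.6, 0.4, 2.0])          # importance weights pi_e / pi_b
    q_hat_logged = np.array([0.7, 0.3, 0.6])     # reward model at the logged action
    q_hat_policy = np.array([0.65, 0.4, 0.55])   # reward model averaged under pi_e

    normalized_weights = weights / weights.mean()   # self-normalization of the weights
    # DR estimate with the importance-weighted correction term self-normalized.
    sample_rewards = q_hat_policy + normalized_weights * (reward - q_hat_logged)
    policy_value = sample_rewards.mean()
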
class pybandits.offline_policy_estimator.SelfNormalizedInverseProbabilityWeighting(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: InverseProbabilityWeighting

Self-Normalized Inverse Propensity Score (SNIPS) estimator.

References

The Self-normalized Estimator for Counterfactual Learning (Swaminathan and Joachims, 2015) https://papers.nips.cc/paper_files/paper/2015/file/39027dfad5138c9ca0c474d71db915c3-Paper.pdf

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

estimate_sample_rewards(action: ndarray, reward: ndarray, propensity_score: ndarray, estimated_policy: ndarray, shrinkage_method: Callable | None = None, **kwargs) ndarray

Estimate the sample rewards.

Parameters:
  • action (np.ndarray) – Array of actions taken.

  • reward (np.ndarray) – Array of rewards corresponding to each action.

  • propensity_score (np.ndarray) – Array of propensity scores.

  • estimated_policy (np.ndarray) – Array of action distributions.

  • shrinkage_method (Optional[Callable]) – Shrinkage method for the importance weights.

Returns:

sample_reward – Estimated sample rewards.

Return type:

np.ndarray

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: ClassVar = 'snips'
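
Example — the self-normalized variant of the IPW formula as a minimal numpy sketch with hypothetical values; normalizing by the sum of the weights bounds the estimate and typically reduces variance relative to plain IPW.

    import numpy as np

    reward = np.array([1, 0, 1, 1])
    weights = np.array([1.6, 0.4, 2.0, 2.0])   # importance weights pi_e / pi_b

    policy_value = (weights * reward).sum() / weights.sum()   # SNIPS estimate
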
class pybandits.offline_policy_estimator.SubGaussianDoublyRobust(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: GeneralizedDoublyRobust

SubGaussian Doubly Robust estimator.

References

Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning (Metelli, Russo, and Restelli, 2021) https://proceedings.neurips.cc/paper_files/paper/2021/file/4476b929e30dd0c4e8bdbcc82c6ba23a-Paper.pdf

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_GeneralizedDoublyRobust__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

name: ClassVar = 'sg-dr'
class pybandits.offline_policy_estimator.SubGaussianInverseProbabilityWeighting(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None, shrinkage_factor: Float_0_1 = 0.0)

Bases: InverseProbabilityWeighting

SubGaussian Inverse Probability Weighting estimator.

References

Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning (Metelli, Russo, and Restelli, 2021) https://proceedings.neurips.cc/paper_files/paper/2021/file/4476b929e30dd0c4e8bdbcc82c6ba23a-Paper.pdf

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

  • shrinkage_factor (Float01, default=0.0) – Shrinkage factor for the importance weights.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: ClassVar = 'sg-ipw'
shrinkage_factor: Float_0_1
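
Example — one common form of the sub-Gaussian weight shrinkage from Metelli, Russo, and Restelli (2021), in which shrinkage_factor (lam below) interpolates between the raw weights (lam=0) and fully flattened weights (lam=1). This is a conceptual numpy sketch with hypothetical values, not necessarily the exact expression used inside pybandits.

    import numpy as np

    reward = np.array([1, 0, 1, 1])
    weights = np.array([1.6, 0.4, 8.0, 2.0])   # raw importance weights pi_e / pi_b
    lam = 0.3                                  # shrinkage_factor in [0, 1]

    # Harmonic shrinkage: large weights are smoothly capped toward 1 / lam.
    shrunk_weights = weights / ((1.0 - lam) + lam * weights)
    policy_value = (shrunk_weights * reward).mean()
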
class pybandits.offline_policy_estimator.SwitchDoublyRobust(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None, switch_threshold: float = inf)

Bases: DoublyRobust

Switch Doubly Robust (Switch-DR) estimator.

This estimator uses a switching rule on the importance weights: samples whose weight exceeds the threshold fall back to the model-based (direct) estimate, while the remaining samples use the DR estimate (a sketch follows at the end of this entry).

References

Optimal and Adaptive Off-policy Evaluation in Contextual Bandits (Wang, Agarwal, and Dudik, 2017) https://arxiv.org/pdf/1507.02646

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (Optional[int], default=None) – Random seed for bootstrap sampling.

  • switch_threshold (float, default=inf) – Threshold on the importance weight above which the importance-weighted (DR) correction is replaced by the model-based (direct) estimate.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_GeneralizedDoublyRobust__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

name: ClassVar = 'switch-dr'
switch_threshold: float
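
Example — a conceptual numpy sketch of the switching rule from Wang, Agarwal, and Dudik (2017): samples whose importance weight exceeds switch_threshold drop the importance-weighted correction and keep only the model-based term. The reward-model predictions (q_hat_*) and values are hypothetical, and pybandits' per-sample computation may differ in detail.

    import numpy as np

    reward = np.array([1.0, 0.0, 1.0])
    weights = np.array([1.6, 0.4, 12.0])         # importance weights pi_e / pi_b
    q_hat_logged = np.array([0.7, 0.3, 0.6])     # reward model at the logged action
    q_hat_policy = np.array([0.65, 0.4, 0.55])   # reward model averaged under pi_e
    switch_threshold = 10.0

    keep_correction = weights <= switch_threshold   # switch rule on the importance weight
    sample_rewards = q_hat_policy + keep_correction * weights * (reward - q_hat_logged)
    policy_value = sample_rewards.mean()
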