pybandits
pybandits.smab
- class pybandits.smab.BaseSmabBernoulli(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[BaseBeta | BaseSmabZoomingModel], strategy: Strategy)
Bases:
BaseMab, ABC
Base model for a Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling.
- Parameters:
actions (Dict[ActionId, Union[BaseBeta, BaseSmabZoomingModel]]) – Dictionary of the possible actions and their associated model.
strategy (Strategy) – The strategy used to select actions.
- actions_manager: SmabActionsManager[BaseBeta | BaseSmabZoomingModel]
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- predict(n_samples: Annotated[int, Gt(gt=0)] = 1, forbidden_actions: Set[ActionId] | None = None) SmabPredictions
Predict actions.
- Parameters:
n_samples (PositiveInt, default=1) – Number of samples to predict.
forbidden_actions (Optional[Set[ActionId]], default=None) – Set of forbidden actions. If specified, the model discards the forbidden_actions and only considers the remaining allowed_actions. By default, the model considers all actions as allowed_actions. Note that actions = allowed_actions ∪ forbidden_actions.
- Returns:
actions (List[UnifiedActionId]) – The actions selected by the multi-armed bandit model.
probs (Union[List[Dict[UnifiedActionId, Probability]], List[Dict[UnifiedActionId, MOProbability]]]) – The probabilities of getting a positive reward for each action.
- update(actions: List[ActionId], rewards: List[BinaryReward] | List[List[BinaryReward]], quantities: List[float | List[float] | None] | None = None, actions_memory: List[ActionId] | None = None, rewards_memory: List[BinaryReward] | List[List[BinaryReward]] | None = None)
Update the stochastic Bernoulli bandit given the list of selected actions and their corresponding binary rewards.
- Parameters:
actions (List[ActionId] of shape (n_samples,), e.g. ['a1', 'a2', 'a3', 'a4', 'a5']) – The selected action for each sample.
rewards (List[Union[BinaryReward, List[BinaryReward]]] of shape (n_samples, n_objectives)) –
- The binary reward for each sample.
- If strategy is not MultiObjectiveBandit, rewards should be a list, e.g.
rewards = [1, 0, 1, 1, 1, …]
- If strategy is MultiObjectiveBandit, rewards should be a list of lists, e.g. (with n_objectives=2):
rewards = [[1, 1], [1, 0], [1, 1], [1, 0], [1, 1], …]
quantities (Optional[List[Union[float, List[float], None]]]) – The value associated with each action. If None, the value is not used, i.e. the action is non-quantitative.
actions_memory (Optional[List[ActionId]]) – List of previously selected actions.
rewards_memory (Optional[Union[List[BinaryReward], List[List[BinaryReward]]]]) – List of previously collected rewards.
- class pybandits.smab.SmabBernoulli(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[Beta | SmabZoomingModel], strategy: ClassicBandit)
Bases:
BaseSmabBernoulli
Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling.
References
Analysis of Thompson Sampling for the Multi-armed Bandit Problem (Agrawal and Goyal, 2012) http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf
- Parameters:
actions_manager (SmabActionsManagerSO) – The manager for actions and their associated models.
strategy (ClassicBandit) – The strategy used to select actions.
- actions_manager: SmabActionsManager[Beta | SmabZoomingModel]
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- strategy: ClassicBandit
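A minimal usage sketch. The import path and actions= constructor argument of SmabActionsManager are assumptions inferred from the type annotations above; the predict and update calls follow the BaseSmabBernoulli signatures.

```python
from pybandits.model import Beta
from pybandits.smab import SmabBernoulli
from pybandits.strategy import ClassicBandit
# NOTE: the import path and constructor of SmabActionsManager are assumed here;
# check the actions_manager documentation for the exact API.
from pybandits.actions_manager import SmabActionsManager

# Two actions, each modeled by a Beta prior (defaults: 1 success, 1 failure).
mab = SmabBernoulli(
    actions_manager=SmabActionsManager(actions={"a1": Beta(), "a2": Beta()}),
    strategy=ClassicBandit(),
)

# Recommend actions for 4 samples, then feed back the observed binary rewards.
actions, probs = mab.predict(n_samples=4)
mab.update(actions=actions, rewards=[1, 0, 1, 1])
```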
- class pybandits.smab.SmabBernoulliBAI(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[Beta | SmabZoomingModel], strategy: BestActionIdentificationBandit)
Bases:
BaseSmabBernoulli
Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling, and Best Action Identification strategy.
References
Analysis of Thompson Sampling for the Multi-armed Bandit Problem (Agrawal and Goyal, 2012) http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf
- Parameters:
actions_manager (SmabActionsManagerSO) – The manager for actions and their associated models.
strategy (BestActionIdentificationBandit) – The strategy used to select actions.
- actions_manager: SmabActionsManager[Beta | SmabZoomingModel]
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- strategy: BestActionIdentificationBandit
- class pybandits.smab.SmabBernoulliCC(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[BetaCC | SmabZoomingModelCC], strategy: CostControlBandit)
Bases:
BaseSmabBernoulli
Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling, and Cost Control strategy.
The sMAB is extended to include a control of the action cost. Each action is associated with a predefined “cost”. At prediction time, the model considers only the actions whose expected reward is above a pre-defined lower bound. Among these actions, the one with the lowest associated cost is recommended. The expected reward interval for feasible actions is defined as [(1 - subsidy_factor) * max_p, max_p], where max_p is the highest sampled expected reward.
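The selection rule can be illustrated with plain Python (a conceptual sketch with made-up numbers, not the library's internal implementation):

```python
# Sampled expected rewards and predefined costs for three actions.
probs = {"a1": 0.80, "a2": 0.75, "a3": 0.40}
costs = {"a1": 5.0, "a2": 2.0, "a3": 1.0}
subsidy_factor = 0.2

max_p = max(probs.values())                                    # 0.80
lower_bound = (1 - subsidy_factor) * max_p                     # 0.64
feasible = [a for a, p in probs.items() if p >= lower_bound]   # ["a1", "a2"]

# Among the feasible actions, recommend the cheapest one.
selected = min(feasible, key=costs.get)                        # "a2"
```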
References
Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints (Daulton et al., 2019) https://arxiv.org/abs/1911.00638
Multi-Armed Bandits with Cost Subsidy (Sinha et al., 2021) https://arxiv.org/abs/2011.01488
- Parameters:
actions_manager (SmabActionsManagerCC) – The manager for actions and their associated models.
strategy (CostControlBandit) – The strategy used to select actions.
- actions_manager: SmabActionsManager[BetaCC | SmabZoomingModelCC]
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- strategy: CostControlBandit
- class pybandits.smab.SmabBernoulliMO(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[BetaMO], strategy: MultiObjectiveBandit)
Bases:
BaseSmabBernoulli
Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling, and Multi-Objectives strategy.
The reward pertaining to an action is a multidimensional vector instead of a scalar value. In this setting, different actions are compared according to the Pareto order between their expected reward vectors, and the actions whose expected rewards are not inferior to those of any other action are called Pareto optimal actions; together they constitute the Pareto front.
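A conceptual sketch of Pareto dominance over expected reward vectors (illustrative only, with made-up numbers; the bandit derives these quantities from its sampled models):

```python
# Expected reward vectors per action, two objectives each.
expected = {"a1": (0.9, 0.2), "a2": (0.5, 0.6), "a3": (0.4, 0.5)}

def dominates(u, v):
    """u Pareto-dominates v: at least as good on every objective, strictly better on one."""
    return all(ui >= vi for ui, vi in zip(u, v)) and any(ui > vi for ui, vi in zip(u, v))

# Pareto-optimal actions are those not dominated by any other action.
pareto_front = [
    a for a, u in expected.items()
    if not any(dominates(v, u) for b, v in expected.items() if b != a)
]
# -> ["a1", "a2"]  ("a3" is dominated by "a2")
```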
References
Thompson Sampling for Multi-Objective Multi-Armed Bandits Problem (Yahyaa and Manderick, 2015) https://www.researchgate.net/publication/272823659_Thompson_Sampling_for_Multi-Objective_Multi-Armed_Bandits_Problem
- Parameters:
actions_manager (SmabActionsManagerMO) – The manager for actions and their associated models.
strategy (MultiObjectiveBandit) – The strategy used to select actions.
- actions_manager: SmabActionsManager[BetaMO]
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- strategy: MultiObjectiveBandit
- class pybandits.smab.SmabBernoulliMOCC(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[BetaMOCC], strategy: MultiObjectiveCostControlBandit)
Bases:
BaseSmabBernoulli
Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling implementation for Multi-Objective (MO) with Cost Control (CC) strategy.
This Bandit allows the reward to be a multidimensional vector and include a control of the action cost. It merges the Multi-Objective and Cost Control strategies.
- Parameters:
actions_manager (SmabActionsManagerMOCC) – The manager for actions and their associated models.
strategy (MultiObjectiveCostControlBandit) – The strategy used to select actions.
- actions_manager: SmabActionsManager[BetaMOCC]
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- strategy: MultiObjectiveCostControlBandit
pybandits.cmab
- class pybandits.cmab.BaseCmabBernoulli(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: CmabActionsManager[BaseBayesianNeuralNetwork | BaseCmabZoomingModel], strategy: Strategy)
Bases:
BaseMab, ABC
Base model for a Contextual Multi-Armed Bandit for Bernoulli bandits with Thompson Sampling.
- Parameters:
actions (Dict[ActionId, Union[BaseBayesianNeuralNetwork, BaseCmabZoomingModel]]) – Dictionary of the possible actions and their associated model.
strategy (Strategy) – The strategy used to select actions.
- actions_manager: CmabActionsManager[BaseBayesianNeuralNetwork | BaseCmabZoomingModel]
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_BaseMab__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- predict(context: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], forbidden_actions: Set[ActionId] | None = None) CmabPredictions
Predict actions.
- Parameters:
context (ArrayLike of shape (n_samples, n_features)) – Matrix of contextual features.
forbidden_actions (Optional[Set[ActionId]], default=None) – Set of forbidden actions. If specified, the model discards the forbidden_actions and only considers the remaining allowed_actions. By default, the model considers all actions as allowed_actions. Note that actions = allowed_actions ∪ forbidden_actions.
- Returns:
actions (List[ActionId] of shape (n_samples,)) – The actions selected by the multi-armed bandit model.
probs (Union[List[Dict[UnifiedActionId, Probability]], List[Dict[UnifiedActionId, MOProbability]]]) – The probabilities of getting a positive reward for each action.
ws (Union[List[Dict[UnifiedActionId, float]], List[Dict[UnifiedActionId, List[float]]]]) – The weighted sum of logistic regression logits.
- update(actions: List[ActionId], rewards: List[BinaryReward] | List[List[BinaryReward]], context: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], quantities: List[float | List[float] | None] | None = None, actions_memory: List[ActionId] | None = None, rewards_memory: List[BinaryReward] | List[List[BinaryReward]] | None = None, context_memory: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None)
Update the contextual Bernoulli bandit given the list of selected actions and their corresponding binary rewards.
- Parameters:
actions (List[ActionId] of shape (n_samples,), e.g. ['a1', 'a2', 'a3', 'a4', 'a5']) – The selected action for each sample.
rewards (List[Union[BinaryReward, List[BinaryReward]]] of shape (n_samples, n_objectives)) –
- The binary reward for each sample.
- If strategy is not MultiObjectiveBandit, rewards should be a list, e.g.
rewards = [1, 0, 1, 1, 1, …]
- If strategy is MultiObjectiveBandit, rewards should be a list of lists, e.g. (with n_objectives=2):
rewards = [[1, 1], [1, 0], [1, 1], [1, 0], [1, 1], …]
context (ArrayLike of shape (n_samples, n_features)) – Matrix of contextual features.
quantities (Optional[List[Union[float, List[float], None]]]) – The value associated with each action. If None, the value is not used, i.e. the action is non-quantitative.
actions_memory (Optional[List[ActionId]]) – List of previously selected actions.
rewards_memory (Optional[Union[List[BinaryReward], List[List[BinaryReward]]]]) – List of previously collected rewards.
context_memory (Optional[ArrayLike] of shape (n_samples, n_features)) – Matrix of previously collected contextual features.
- classmethod update_old_state(state: Dict[str, str | int | float | bool | None | Dict[str, str | int | float | bool | None | Dict[str, Serializable] | List[Serializable]] | List[str | int | float | bool | None | Dict[str, Serializable] | List[Serializable]]], delta: PositiveProbability | None) Dict[str, str | int | float | bool | None | Dict[str, str | int | float | bool | None | Dict[str, Serializable] | List[Serializable]] | List[str | int | float | bool | None | Dict[str, Serializable] | List[Serializable]]]
Update the model state to the current version. Besides the updates in the MAB class, it also loads legacy Bayesian Logistic Regression model parameters into the new Bayesian Neural Network model.
- Parameters:
state (Dict[str, Serializable]) – The internal state of a model (actions, strategy, etc.) of the same type. The state is expected to be in the old format of PyBandits below the current supported version.
delta (Optional[PositiveProbability]) – The delta value to be set in the actions_manager. If None, it will not be set. This is relevant only for adaptive window models.
- Returns:
state – The updated state of the model. The state is in the current format of PyBandits, with actions_manager and delta added if needed.
- Return type:
Dict[str, Serializable]
- class pybandits.cmab.CmabBernoulli(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: CmabActionsManager[BayesianNeuralNetwork | CmabZoomingModel], strategy: ClassicBandit)
Bases:
BaseCmabBernoulli
Contextual Bernoulli Multi-Armed Bandit with Thompson Sampling.
References
Thompson Sampling for Contextual Bandits with Linear Payoffs (Agrawal and Goyal, 2014) https://arxiv.org/pdf/1209.3352.pdf
- Parameters:
actions_manager (CmabActionsManagerSO) – The manager for actions and their associated models.
strategy (ClassicBandit) – The strategy used to select actions.
- actions_manager: CmabActionsManager[BayesianNeuralNetwork | CmabZoomingModel]
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_BaseMab__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- strategy: ClassicBandit
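A minimal usage sketch, analogous to the sMAB example above. The import path and actions= argument of CmabActionsManager are assumptions inferred from the type annotations; the context shape and the predict/update calls follow the BaseCmabBernoulli signatures.

```python
import numpy as np

from pybandits.cmab import CmabBernoulli
from pybandits.model import BayesianNeuralNetwork
from pybandits.strategy import ClassicBandit
# NOTE: the import path and constructor of CmabActionsManager are assumed here.
from pybandits.actions_manager import CmabActionsManager

n_features = 3
mab = CmabBernoulli(
    actions_manager=CmabActionsManager(
        actions={
            "a1": BayesianNeuralNetwork.cold_start(n_features=n_features),
            "a2": BayesianNeuralNetwork.cold_start(n_features=n_features),
        }
    ),
    strategy=ClassicBandit(),
)

# Context matrix of shape (n_samples, n_features).
context = np.random.default_rng(0).normal(size=(4, n_features))
actions, probs, ws = mab.predict(context=context)
mab.update(actions=actions, rewards=[1, 0, 1, 1], context=context)
```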
- class pybandits.cmab.CmabBernoulliBAI(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: CmabActionsManager[BayesianNeuralNetwork | CmabZoomingModel], strategy: BestActionIdentificationBandit)
Bases:
BaseCmabBernoulli
Contextual Bernoulli Multi-Armed Bandit with Thompson Sampling, and Best Action Identification strategy.
References
Analysis of Thompson Sampling for the Multi-armed Bandit Problem (Agrawal and Goyal, 2012) http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf
- Parameters:
actions_manager (CmabActionsManagerSO) – The manager for actions and their associated models.
strategy (BestActionIdentificationBandit) – The strategy used to select actions.
- actions_manager: CmabActionsManager[BayesianNeuralNetwork | CmabZoomingModel]
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_BaseMab__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- strategy: BestActionIdentificationBandit
- class pybandits.cmab.CmabBernoulliCC(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: CmabActionsManager[BayesianNeuralNetworkCC | CmabZoomingModelCC], strategy: CostControlBandit)
Bases:
BaseCmabBernoulli
Contextual Bernoulli Multi-Armed Bandit with Thompson Sampling, and Cost Control strategy.
The cMAB is extended to include a control of the action cost. Each action is associated with a predefined “cost”. At prediction time, the model considers only the actions whose expected reward is above a pre-defined lower bound. Among these actions, the one with the lowest associated cost is recommended. The expected reward interval for feasible actions is defined as [(1 - subsidy_factor) * max_p, max_p], where max_p is the highest sampled expected reward.
References
Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints (Daulton et al., 2019) https://arxiv.org/abs/1911.00638
Multi-Armed Bandits with Cost Subsidy (Sinha et al., 2021) https://arxiv.org/abs/2011.01488
- Parameters:
actions_manager (CmabActionsManagerCC) – The manager for actions and their associated models.
strategy (CostControlBandit) – The strategy used to select actions.
- actions_manager: CmabActionsManager[BayesianNeuralNetworkCC | CmabZoomingModelCC]
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_BaseMab__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- strategy: CostControlBandit
pybandits.model
- class pybandits.model.BaseBayesianNeuralNetwork(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, model_params: BnnParams, update_method: str = 'MCMC', update_kwargs: dict | None = None)
Bases:
Model, ABC
Bayesian Neural Network model for binary classification.
This class implements a Bayesian Neural Network with an arbitrary number of fully connected layers using PyMC for binary classification tasks. It supports both Markov Chain Monte Carlo (MCMC) and Variational Inference (VI) methods for posterior inference.
References
Bayesian Learning for Neural Networks (Radford M. Neal, 1995) https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=db869fa192a3222ae4f2d766674a378e47013b1b
- Parameters:
model_params (BnnParams) – The parameters of the Bayesian Neural Network, including weights and biases for each layer and their initial values for resetting.
update_method (str, optional) – The method used for posterior inference, either “MCMC” or “VI” (default is “MCMC”).
update_kwargs (Optional[dict], optional) – A dictionary of keyword arguments for the update method. For MCMC, it contains ‘trace’ settings. For VI, it contains both ‘trace’ and ‘fit’ settings.
Notes
The model uses tanh activation for hidden layers and sigmoid activation for the output layer.
The output layer is designed for binary classification tasks, with probabilities modeled using a Bernoulli likelihood.
- property approx_history: ndarray | None
- arrange_update_kwargs()
- check_context_matrix(context: ndarray)
Check and cast context matrix.
- Parameters:
context (np.ndarray of shape (n_samples, n_features)) – Matrix of contextual features.
- Returns:
context – Matrix of contextual features.
- Return type:
pandas DataFrame of shape (n_samples, n_features)
- classmethod cold_start(n_features: Annotated[int, Gt(gt=0)], hidden_dim_list: List[Annotated[int, Gt(gt=0)]] | None = None, update_method: Literal['VI', 'MCMC'] = 'MCMC', update_kwargs: dict | None = None, dist_params_init: Dict[str, float] | None = None, **kwargs) Self
Initialize a Bayesian Neural Network with a cold start.
- Parameters:
n_features (PositiveInt) – Number of input features for the network.
hidden_dim_list (Optional[List[PositiveInt]], optional) – List of dimensions for the hidden layers of the network. If None, no hidden layers are added.
update_method (UpdateMethods) – Method to update the network, either “MCMC” or “VI”. Default is “MCMC”.
update_kwargs (Optional[dict], optional) – Additional keyword arguments for the update method. Default is None.
dist_params_init (Optional[Dict[str, float]], optional) – Initial distribution parameters for the network weights and biases. Default is None.
**kwargs – Additional keyword arguments for the BayesianNeuralNetwork constructor.
- Returns:
An instance of the Bayesian Neural Network initialized with the specified parameters.
- Return type:
Self
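For example, a single-hidden-layer network for a 3-feature context, updated with variational inference (a sketch based on the cold_start signature above):

```python
from pybandits.model import BayesianNeuralNetwork

# One hidden layer with 8 units; posterior inference via VI instead of MCMC.
bnn = BayesianNeuralNetwork.cold_start(
    n_features=3,
    hidden_dim_list=[8],
    update_method="VI",
)
assert bnn.input_dim == 3  # derived from the first layer's weight shape
```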
- create_model(x: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], y: List[BinaryReward] | ndarray | None = None, is_predict: bool = False) Model
Create a PyMC model for Bayesian Neural Network.
This method builds a PyMC model with the network architecture specified in model_params. The model uses tanh activation for hidden layers and sigmoid for the output layer.
- Parameters:
x (ArrayLike) – Input features of shape (n_samples, n_features)
y (Union[List[BinaryReward], np.ndarray]) – Binary target values of shape (n_samples,)
is_predict (bool) – If True, process samples independently. If False, process all samples at once. In the predict step, we would like to sample the model parameters independently for each sample. In the update step, this is not required.
- Returns:
PyMC model object with the specified neural network architecture
- Return type:
PymcModel
Notes
The model structure follows these steps:
1. For each layer, create weight and bias variables from StudentT distributions.
2. Apply linear transformations and activations through the layers. When samples are processed independently (the predict step), the linear transformation is applied to each row separately, so random variables are not shared; in the update step it is applied to the whole matrix at once, so random variables are shared.
3. Apply sigmoid activation at the output.
4. Use a Bernoulli likelihood for binary classification.
- classmethod create_model_params(n_features: Annotated[int, Gt(gt=0)], hidden_dim_list: List[Annotated[int, Gt(gt=0)]], **dist_params_init) BnnParams
Creates model parameters for a Bayesian neural network (BNN) model according to dist_params_init. This method initializes the distribution parameters for each layer of a BNN using the specified number of features, hidden dimensions, and distribution initialization parameters.
- Parameters:
n_features (PositiveInt) – The number of input features for the BNN.
hidden_dim_list (List[PositiveInt]) – A list of integers specifying the number of hidden units in each hidden layer. If None, no hidden layers are added.
**dist_params_init (dict, optional) – Additional parameters for initializing the distribution of weights and biases.
- Returns:
An instance of BnnParams containing the initialized layer parameters.
- Return type:
BnnParams
- classmethod get_layer_params_name(layer_ind: Annotated[int, Gt(gt=0)]) Tuple[str, str]
- property input_dim: Annotated[int, Gt(gt=0)]
Returns the expected input dimension of the model.
- Returns:
The number of input features expected by the model, derived from the shape of the weight matrix in the first layer’s parameters.
- Return type:
int
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- sample_proba(context: ndarray) List[Tuple[Probability, float]]
Samples probabilities and weighted sums from the prior predictive distribution.
- Parameters:
context (ArrayLike) – The context matrix for which the probabilities are to be sampled.
- Returns:
Each element is a tuple containing the probability of a positive reward and the corresponding weighted sum between contextual feature quantities and sampled coefficients.
- Return type:
List[ProbabilityWeight]
- update_kwargs: dict | None
- update_method: str
- class pybandits.model.BaseBeta(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1)
Bases:
Model, ABC
Beta Distribution model for Bernoulli multi-armed bandits.
- Parameters:
n_successes (PositiveInt = 1) – Counter of the number of successes.
n_failures (PositiveInt = 1) – Counter of the number of failures.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- sample_proba(n_samples: Annotated[int, Gt(gt=0)]) List[Probability]
Sample the probability of getting a positive reward.
- Parameters:
n_samples (PositiveInt) – Number of samples to draw.
- Returns:
probs – Sampled probabilities of getting a positive reward, one per sample.
- Return type:
List[Probability]
- property std: float
The corrected standard deviation (Bessel’s correction) of the binary distribution of successes and failures.
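For instance, with the concrete Beta subclass documented further below (a short sketch of the documented constructor, sample_proba, and std):

```python
from pybandits.model import Beta

# Posterior state after observing 10 successes and 3 failures.
beta = Beta(n_successes=10, n_failures=3)

samples = beta.sample_proba(n_samples=5)  # five draws of the success probability
print(samples, beta.std)
```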
- class pybandits.model.BaseBetaMO(*, models: List[Beta])
Bases:
ModelMO, ABC
Base beta Distribution model for Bernoulli multi-armed bandits with multi-objectives.
- Parameters:
models (List[Beta] of length (n_objectives,)) – List of Beta distributions.
- classmethod cold_start(n_objectives: Annotated[int, Gt(gt=0)], **kwargs) BetaMO
Utility function to create a multi-objective Beta model, or a child model with cost control, with default parameters (one Beta distribution per objective).
- Parameters:
n_objectives (PositiveInt) – The number of objectives, i.e. the number of Beta models to create.
kwargs (Dict[str, Any]) – Additional arguments for the multi-objective Beta child model.
- Returns:
beta_mo – The multi-objective Beta model.
- Return type:
BetaMO
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- sample_proba(n_samples: Annotated[int, Gt(gt=0)]) List[List[Probability]]
Sample the probability of getting a positive reward.
- Parameters:
n_samples (PositiveInt) – Number of samples to draw.
- Returns:
prob – Probabilities of getting a positive reward for each sample and objective.
- Return type:
List[MOProbability]
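For example, with the concrete BetaMO subclass documented below (a sketch based on the cold_start and sample_proba signatures above):

```python
from pybandits.model import BetaMO

# One independent Beta model per objective.
beta_mo = BetaMO.cold_start(n_objectives=2)

# Each element holds one sampled probability per objective,
# e.g. [[0.41, 0.73], [0.58, 0.22], [0.50, 0.49]].
probs = beta_mo.sample_proba(n_samples=3)
```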
- class pybandits.model.BayesianLogisticRegression(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, model_params: BnnParams, update_method: str = 'MCMC', update_kwargs: dict | None = None)
Bases:
BayesianNeuralNetwork
A Bayesian Logistic Regression model that inherits from BayesianNeuralNetwork. This model is a specialized version of a Bayesian Neural Network with a single layer, designed specifically for logistic regression tasks. The model parameters are validated to ensure that the model adheres to this single-layer constraint.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- classmethod validate_model_params(model_params)
- class pybandits.model.BayesianLogisticRegressionCC(*, cost: Annotated[float, Ge(ge=0)], n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, model_params: BnnParams, update_method: str = 'MCMC', update_kwargs: dict | None = None)
Bases:
BayesianLogisticRegression, ModelCC
A Bayesian Logistic Regression model with cost control.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- class pybandits.model.BayesianNeuralNetwork(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, model_params: BnnParams, update_method: str = 'MCMC', update_kwargs: dict | None = None)
Bases:
BaseBayesianNeuralNetwork
Bayesian Neural Network class. This class implements a Bayesian Neural Network by extending the BaseBayesianNeuralNetwork. It provides functionality for probabilistic modeling and inference using neural networks.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- class pybandits.model.BayesianNeuralNetworkCC(*, cost: Annotated[float, Ge(ge=0)], n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, model_params: BnnParams, update_method: str = 'MCMC', update_kwargs: dict | None = None)
Bases:
BaseBayesianNeuralNetwork, ModelCC
Bayesian Neural Network model for binary classification with cost constraint.
This class implements a Bayesian Neural Network with an arbitrary number of fully connected layers using PyMC for binary classification tasks. It supports both Markov Chain Monte Carlo (MCMC) and Variational Inference (VI) methods for posterior inference.
References
Bayesian Learning for Neural Networks (Radford M. Neal, 1995) https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=db869fa192a3222ae4f2d766674a378e47013b1b
- Parameters:
model_params (BnnParams) – The parameters of the Bayesian Neural Network, including weights and biases for each layer and their initial values for resetting.
update_method (str, optional) – The method used for posterior inference, either “MCMC” or “VI” (default is “MCMC”).
update_kwargs (Optional[dict], optional) – A dictionary of keyword arguments for the update method. For MCMC, it contains ‘trace’ settings. For VI, it contains both ‘trace’ and ‘fit’ settings.
cost (NonNegativeFloat) – Cost associated to the Bayesian Neural Network model.
Notes
The model uses tanh activation for hidden layers and sigmoid activation for the output layer.
The output layer is designed for binary classification tasks, with probabilities modeled using a Bernoulli likelihood.
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(context: Any, /) None
This function is meant to behave like a BaseModel method to initialise private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- class pybandits.model.Beta(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1)
Bases:
BaseBeta
Beta Distribution model for Bernoulli multi-armed bandits.
- Parameters:
n_successes (PositiveInt = 1) – Counter of the number of successes.
n_failures (PositiveInt = 1) – Counter of the number of failures.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pybandits.model.BetaCC(*, cost: Annotated[float, Ge(ge=0)], n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1)
Bases:
BaseBeta, ModelCC
Beta Distribution model for Bernoulli multi-armed bandits with cost control.
- Parameters:
n_successes (PositiveInt = 1) – Counter of the number of successes.
n_failures (PositiveInt = 1) – Counter of the number of failures.
cost (NonNegativeFloat) – Cost associated to the Beta distribution.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pybandits.model.BetaMO(*, models: List[Beta])
Bases:
BaseBetaMO
Beta Distribution model for Bernoulli multi-armed bandits with multi-objectives.
- Parameters:
models (List[Beta] of length (n_objectives,)) – List of Beta distributions.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pybandits.model.BetaMOCC(*, cost: Annotated[float, Ge(ge=0)], models: List[Beta])
Bases:
BaseBetaMO, ModelCC
Beta Distribution model for Bernoulli multi-armed bandits with multi-objectives and cost control.
- Parameters:
models (List[BetaCC] of shape (n_objectives,)) – List of Beta distributions.
cost (NonNegativeFloat) – Cost associated to the Beta distribution.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pybandits.model.BnnLayerParams(*, weight: StudentTArray, bias: StudentTArray)
Bases:
PyBanditsBaseModel
Represents the parameters of a Bayesian neural network (BNN) layer.
- Parameters:
weight (StudentTArray) – The weight parameter of the BNN layer, represented as a StudentTArray.
bias (StudentTArray) – The bias parameter of the BNN layer, represented as a StudentTArray.
- bias: StudentTArray
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- weight: StudentTArray
- class pybandits.model.BnnParams(*, bnn_layer_params: ~typing.List[~pybandits.model.BnnLayerParams] | None, bnn_layer_params_init: ~typing.List[~pybandits.model.BnnLayerParams] = <factory>)
Bases:
PyBanditsBaseModel
Represents the parameters of a Bayesian Neural Network (BNN), including both the current layer parameters and the initial layer parameters. We keep the init parameters in case we need to reset the model.
- Parameters:
bnn_layer_params (List[BnnLayerParams]) – A list of BNN layer parameters representing the current state of the model.
bnn_layer_params_init (List[BnnLayerParams]) – A list of BNN layer parameters representing the initial state of the model.
- bnn_layer_params: List[BnnLayerParams] | None
- bnn_layer_params_init: List[BnnLayerParams]
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- classmethod validate_inputs(values)
- class pybandits.model.Model(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1)
Bases:
BaseModelSO, ABC
Class to model the prior distributions for single objective.
- Parameters:
n_successes (PositiveInt = 1) – Counter of the number of successes.
n_failures (PositiveInt = 1) – Counter of the number of failures.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- abstract sample_proba(**kwargs) List[Probability] | List[List[Probability]] | List[Tuple[Probability, float]]
Sample the probability of getting a positive reward.
- class pybandits.model.ModelCC(*, cost: Annotated[float, Ge(ge=0)])
Bases:
BaseModelCC, ABC
Class to model action cost.
- Parameters:
cost (NonNegativeFloat) – Cost associated to the action.
- cost: Annotated[float, Ge(ge=0)]
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pybandits.model.ModelMO(*, models: List[Model])
Bases:
BaseModelMO, ABC
Class to model the prior distributions for multi-objective.
- Parameters:
models (List[Model]) – The list of models for each objective.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pybandits.model.StudentTArray(*, mu: List[float] | List[List[float]], sigma: List[Annotated[float, Ge(ge=0)]] | List[List[Annotated[float, Ge(ge=0)]]], nu: List[Annotated[float, Gt(gt=0)]] | List[List[Annotated[float, Gt(gt=0)]]])
Bases:
PyBanditsBaseModel
A class representing an array of Student’s t-distributions with parameters mu, sigma, and nu. The distribution of a specific element (e.g., a single parameter of a layer) is defined by the corresponding elements in the lists. The mean values are represented by mu, the scale (standard deviation) values by sigma, and the degrees of freedom by nu.
- Parameters:
mu (Union[List[float], List[List[float]]]) – The mean values of the Student’s t-distributions. Can be a 1D (for the layer bias term) or 2D list (for the layer weight term).
sigma (Union[List[NonNegativeFloat], List[List[NonNegativeFloat]]]) – The scale (standard deviation) values of the Student’s t-distributions. Must be non-negative. Can be a 1D or 2D list.
nu (Union[List[PositiveFloat], List[List[PositiveFloat]]]) – The degrees of freedom of the Student’s t-distributions. Must be positive. Can be a 1D or 2D list.
- classmethod cold_start(shape: Annotated[int, Gt(gt=0)] | Tuple[Annotated[int, Gt(gt=0)], ...], mu: float = 0.0, sigma: Annotated[float, Ge(ge=0)] = 10.0, nu: Annotated[float, Gt(gt=0)] = 5.0) StudentTArray
- static convert_list_to_array(input_list: List[float] | List[List[float]]) bool
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- mu: List[float] | List[List[float]]
- nu: List[Annotated[float, Gt(gt=0)]] | List[List[Annotated[float, Gt(gt=0)]]]
- property params
- property shape: Tuple[Annotated[int, Gt(gt=0)], ...]
- sigma: List[Annotated[float, Ge(ge=0)]] | List[List[Annotated[float, Ge(ge=0)]]]
- classmethod validate_inputs(values)
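For example, the default cold start yields weakly informative priors (mu=0.0, sigma=10.0, nu=5.0) with the requested shape (a sketch based on the cold_start signature above):

```python
from pybandits.model import StudentTArray

# A 3x2 prior for a layer's weights: every entry is StudentT(mu=0.0, sigma=10.0, nu=5.0).
weights = StudentTArray.cold_start(shape=(3, 2))

# A 1D prior for the layer's bias, with a tighter scale.
bias = StudentTArray.cold_start(shape=2, sigma=1.0)

print(weights.shape)  # (3, 2)
```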
pybandits.quantitative_model
- class pybandits.quantitative_model.BaseCmabZoomingModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], BayesianNeuralNetwork | None], base_model_cold_start_kwargs: Dict[str, Any])
Bases:
ZoomingModel, ABC
Zooming model for CMAB.
- Parameters:
dimension (PositiveInt) – Number of parameters of the model.
comparison_threshold (Float01) – Comparison threshold.
segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment exceeds the average number of samples across all segments by this factor, the segment is considered interesting. If the number of samples in a segment falls below the average across all segments by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.
n_comparison_points (PositiveInt) – Number of comparison points.
n_max_segments (PositiveInt) – Maximum number of segments.
sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[BayesianNeuralNetwork]]) – Mapping of segments to Bayesian Neural Network models.
base_model_cold_start_kwargs (Dict[str, Any]) – Keyword arguments for the base model cold start.
- base_model_cold_start_kwargs: Dict[str, Any]
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_ZoomingModel__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], BayesianNeuralNetwork | None]
- classmethod validate_n_features(value)
- class pybandits.quantitative_model.BaseSmabZoomingModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], Beta | None])
Bases:
ZoomingModel, ABC
Zooming model for sMAB.
- Parameters:
dimension (PositiveInt) – Number of parameters of the model.
comparison_threshold (Float01) – Comparison threshold.
segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment exceeds the average number of samples across all segments by this factor, the segment is considered interesting. If the number of samples in a segment falls below the average across all segments by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.
n_comparison_points (PositiveInt) – Number of comparison points.
n_max_segments (PositiveInt) – Maximum number of segments.
sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[Beta]]) – Mapping of segments to Beta models.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_ZoomingModel__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- class pybandits.quantitative_model.CmabZoomingModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], BayesianNeuralNetwork | None], base_model_cold_start_kwargs: Dict[str, Any])
Bases:
BaseCmabZoomingModel
Zooming model for CMAB.
- Parameters:
dimension (PositiveInt) – Number of parameters of the model.
comparison_threshold (Float01) – Comparison threshold.
segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment exceeds the average number of samples across all segments by this factor, the segment is considered interesting. If the number of samples in a segment falls below the average across all segments by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.
n_comparison_points (PositiveInt) – Number of comparison points.
n_max_segments (PositiveInt) – Maximum number of segments.
sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[BayesianNeuralNetwork]]) – Mapping of segments to Bayesian Neural Network models.
base_model_cold_start_kwargs (Dict[str, Any]) – Keyword arguments for the base model cold start.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_ZoomingModel__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- class pybandits.quantitative_model.CmabZoomingModelCC(*, cost: Callable[[float | Annotated[float, Ge(ge=0)]], Annotated[float, Ge(ge=0)]], n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], BayesianNeuralNetwork | None], base_model_cold_start_kwargs: Dict[str, Any])
Bases:
BaseCmabZoomingModel, QuantitativeModelCC
Zooming model for CMAB with cost control.
- Parameters:
dimension (PositiveInt) – Number of parameters of the model.
comparison_threshold (Float01) – Comparison threshold.
segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment exceeds the average number of samples across all segments by this factor, the segment is considered interesting. If the number of samples in a segment falls below the average across all segments by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.
n_comparison_points (PositiveInt) – Number of comparison points.
n_max_segments (PositiveInt) – Maximum number of segments.
sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[BayesianNeuralNetwork]]) – Mapping of segments to Bayesian Neural Network models.
base_model_cold_start_kwargs (Dict[str, Any]) – Keyword arguments for the base model cold start.
cost (Callable[[Union[float, NonNegativeFloat]], NonNegativeFloat]) – Cost function associated with the quantitative action, mapping a quantity to a non-negative cost.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_ZoomingModel__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- class pybandits.quantitative_model.QuantitativeModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)])
Bases:
BaseModelSO, ABC
Base class for quantitative models.
- Parameters:
dimension (PositiveInt) – Number of parameters of the model.
- dimension: Annotated[int, Gt(gt=0)]
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- abstract sample_proba(**kwargs) List[Tuple[Tuple[Tuple[Float_0_1, ...], Probability], ...]]
Sample the model.
- class pybandits.quantitative_model.QuantitativeModelCC(*, cost: Callable[[float | Annotated[float, Ge(ge=0)]], Annotated[float, Ge(ge=0)]])
Bases:
BaseModelCC, ABC
Class to model quantitative action cost.
- Parameters:
cost (Callable[[Union[float, NonNegativeFloat]], NonNegativeFloat]) – Cost function associated with the quantitative action, mapping a quantity to a non-negative cost.
- cost: Callable[[float | Annotated[float, Ge(ge=0)]], Annotated[float, Ge(ge=0)]]
- classmethod deserialize_cost(value)
Deserialize cost from string representation if needed.
- encode_cost(value)
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- static serialize_cost(cost_value) str
Serialize cost value to string representation.
- classmethod validate_cost(value)
Deserialize cost from string representation if needed.
- class pybandits.quantitative_model.Segment(*, intervals: Tuple[Tuple[Float_0_1, Float_0_1], ...])
Bases:
PyBanditsBaseModel
This class is used to represent a segment of the quantities space. A segment is defined by a list of intervals, thus representing a hyper rectangle.
- Parameters:
intervals (Tuple[Tuple[Float01, Float01], ...]) – Intervals of the segment.
- intervals: Tuple[Tuple[Float_0_1, Float_0_1], ...]
- property intervals_array: ndarray
- is_adjacent(other: Segment) bool
Check if two segments are adjacent. Segments are adjacent if they share a face, meaning they have identical intervals in all dimensions except one, where they touch.
- Parameters:
other (Segment) – Segment to check for adjacency.
- Returns:
Whether the segments are adjacent.
- Return type:
bool
- property maxs: ndarray
- property mins: ndarray
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- classmethod segment_intervals_to_tuple(value)
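For example, two 2D segments that share a face (a sketch based on the Segment constructor and is_adjacent above):

```python
from pybandits.quantitative_model import Segment

# Two hyper-rectangles over the [0, 1] x [0, 1] quantity space.
left = Segment(intervals=((0.0, 0.5), (0.0, 1.0)))
right = Segment(intervals=((0.5, 1.0), (0.0, 1.0)))

# Identical in the second dimension, touching at 0.5 in the first -> adjacent.
assert left.is_adjacent(right)
```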
- class pybandits.quantitative_model.SmabZoomingModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], Beta | None])
Bases:
BaseSmabZoomingModel
Zooming model for sMAB.
- Parameters:
dimension (PositiveInt) – Number of parameters of the model.
comparison_threshold (Float01) – Comparison threshold.
segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment exceeds the average number of samples across all segments by this factor, the segment is considered interesting. If the number of samples in a segment falls below the average across all segments by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.
n_comparison_points (PositiveInt) – Number of comparison points.
n_max_segments (PositiveInt) – Maximum number of segments.
sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[Beta]]) – Mapping of segments to Beta models.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_ZoomingModel__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- class pybandits.quantitative_model.SmabZoomingModelCC(*, cost: Callable[[float | Annotated[float, Ge(ge=0)]], Annotated[float, Ge(ge=0)]], n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], Beta | None])
Bases:
BaseSmabZoomingModel, QuantitativeModelCC
Zooming model for sMAB with cost control.
- Parameters:
dimension (PositiveInt) – Number of parameters of the model.
comparison_threshold (Float01) – Comparison threshold.
segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment exceeds the average number of samples across all segments by this factor, the segment is considered interesting. If the number of samples in a segment falls below the average across all segments by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.
n_comparison_points (PositiveInt) – Number of comparison points.
n_max_segments (PositiveInt) – Maximum number of segments.
sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[Beta]]) – Mapping of segments to Beta models.
cost (Callable[[Union[float, NonNegativeFloat]], NonNegativeFloat]) – Cost function associated with the quantitative action, mapping a quantity to a non-negative cost.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_ZoomingModel__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- class pybandits.quantitative_model.ZoomingModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], Model | None])
Bases:
QuantitativeModel, ABC
This class is used to implement the zooming method. The approach is based on adaptive discretization of the quantitative action space. The space is represented as a hypercube with dimension dimensions. After each update step, the model checks whether each segment is interesting or a nuisance based on segment_update_factor. If a segment is interesting, it can be split into two segments. In contrast, adjacent nuisance segments can be merged based on comparison_threshold. The number of segments can be limited using n_max_segments.
References
Multi-Armed Bandits in Metric Spaces (Kleinberg, Slivkins, and Upfal, 2008) https://arxiv.org/pdf/0809.4882
- Parameters:
dimension (PositiveInt) – Number of parameters of the model.
comparison_threshold (Float01) – Comparison threshold.
segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment exceeds the average number of samples across all segments by this factor, the segment is considered interesting. If the number of samples in a segment falls below the average across all segments by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.
n_comparison_points (PositiveInt) – Number of comparison points.
n_max_segments (PositiveInt) – Maximum number of segments.
sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[Model]]) – Mapping of segments to models.
- classmethod cold_start(dimension: Annotated[int, Gt(gt=0)] = 1, comparison_threshold: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, **kwargs) Self
Create a cold start model.
- Returns:
Cold start model.
- Return type:
Self
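For example, calling cold_start on the concrete SmabZoomingModel subclass documented below (an assumption consistent with the Self return type; only keyword arguments from the signature above are used):

```python
from pybandits.quantitative_model import SmabZoomingModel

# Zooming model over a 1-dimensional quantity in [0, 1]; the initial
# segmentation is refined (split) or coarsened (merged) as updates arrive.
model = SmabZoomingModel.cold_start(dimension=1, n_max_segments=16)
```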
- comparison_threshold: Float_0_1
- classmethod deserialize_sub_actions(value)
Convert sub_actions from a dict with string keys (json representation) to tuple (object representation).
- dimension: Annotated[int, Gt(gt=0)]
- is_similar_performance(segment1: Segment, segment2: Segment) bool
Check if two segments have similar performance.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_ZoomingModel__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- n_comparison_points: Annotated[int, Gt(gt=0)]
- n_max_segments: Annotated[int, Gt(gt=0)] | None
- sample_proba(**kwargs) List[Tuple[Tuple[Tuple[Float_0_1, ...], Probability], ...]]
Sample an action value from each of the intervals.
- segment_update_factor: Float_0_1
- serialize_sub_actions(value)
pybandits.strategy
- class pybandits.strategy.BestActionIdentificationBandit(*, exploit_p: Float_0_1 | None = 0.5)
Bases:
Strategy
Best-Action Identification (BAI) strategy for multi-armed bandits.
References
Simple Bayesian Algorithms for Best-Arm Identification (Russo, 2018) https://arxiv.org/pdf/1602.08448.pdf
- Parameters:
exploit_p (Optional[Float01], 0.5 if not specified) – Tuning parameter taking values in [0, 1] which specifies the probability of selecting the best or an alternative action. If exploit_p is 1, the bandit always selects the action with the highest probability of getting a positive reward, i.e. it behaves as a greedy strategy. If exploit_p is 0, the bandit always selects the action with the second-highest probability of getting a positive reward.
- compare_best_actions(actions: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], Beta]) float
Compare the two best actions, i.e. the two actions with the highest expected means of getting a positive reward.
- Parameters:
actions (Dict[UnifiedActionId, Beta])
- Returns:
pvalue – p-value result of the statistical test.
- Return type:
float
- exploit_p: Float_0_1 | None
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- classmethod numerize_exploit_p(v)
- select_action(p: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], float], actions: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], BaseModel] | None = None) ActionId | Tuple[ActionId, Tuple[float, ...]]
With probability self.exploit_p select the best action (i.e. the action with the highest probability of getting a positive reward); with probability 1-self.exploit_p select the second best action (i.e. the action with the second highest probability of getting a positive reward).
- Parameters:
p (Dict[UnifiedActionId, Probability]) – The dictionary of actions and their sampled probability of getting a positive reward.
actions (Optional[Dict[UnifiedActionId, BaseModel]]) – The dictionary of actions and their associated model.
- Returns:
selected_action – The selected action.
- Return type:
UnifiedActionId
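A minimal sketch of this selection rule (the helper below is hypothetical and ignores quantitative actions; the actual method operates on the sampled probabilities passed as p):
```python
import random
from typing import Dict

def bai_select_action(p: Dict[str, float], exploit_p: float = 0.5) -> str:
    """With probability exploit_p return the best action, otherwise the second best."""
    ranked = sorted(p, key=p.get, reverse=True)
    return ranked[0] if random.random() < exploit_p else ranked[1]

# With exploit_p=0.5, "a1" is returned roughly half of the time, "a2" otherwise.
print(bai_select_action({"a1": 0.8, "a2": 0.6, "a3": 0.1}, exploit_p=0.5))
```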
- with_exploit_p(exploit_p: Float_0_1 | None) Self
Instantiate a mutated best action identification strategy with an altered exploit_p.
- Parameters:
exploit_p (Optional[Float01], 0.5 if not specified) – Tuning parameter taking value in [0, 1] which specifies the probability of selecting the best or an alternative action. If exploit_p is 1, the bandit always selects the action with the highest probability of getting a positive reward, i.e. it behaves as a Greedy strategy. If exploit_p is 0, the bandit always selects the action with the second highest probability of getting a positive reward.
- Returns:
mutated_best_action_identification – The mutated best action identification strategy.
- Return type:
Self
- class pybandits.strategy.ClassicBandit
Bases:
Strategy
Classic multi-armed bandits strategy.
References
Analysis of Thompson Sampling for the Multi-armed Bandit Problem (Agrawal and Goyal, 2012) http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf
Thompson Sampling for Contextual Bandits with Linear Payoffs (Agrawal and Goyal, 2014) https://arxiv.org/pdf/1209.3352.pdf
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- select_action(p: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], float], actions: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], BaseModel] | None = None) ActionId | Tuple[ActionId, Tuple[float, ...]]
Select the action with the highest probability of getting a positive reward.
- Parameters:
p (Dict[UnifiedActionId, Probability]) – The dictionary of actions and their sampled probability of getting a positive reward.
actions (Optional[Dict[UnifiedActionId, BaseModel]]) – The dictionary of actions and their associated model.
- Returns:
selected_action – The selected action.
- Return type:
UnifiedActionId
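As a minimal sketch, the rule reduces to an argmax over the sampled probabilities (hypothetical helper, not the library's implementation):
```python
from typing import Dict

def classic_select_action(p: Dict[str, float]) -> str:
    """Return the action with the highest sampled probability of a positive reward."""
    return max(p, key=p.get)

print(classic_select_action({"a1": 0.42, "a2": 0.73, "a3": 0.11}))  # -> "a2"
```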
- class pybandits.strategy.CostControlBandit(*, subsidy_factor: Float_0_1 | None = 0.5)
Bases:
CostControlStrategy
Cost Control (CC) strategy for multi-armed bandits.
Bandits are extended to include a control of the action cost. Each action is associated with a predefined “cost”. At prediction time, the model considers the actions whose expected rewards are above a pre-defined lower bound. Among these actions, the one with the lowest associated cost is recommended. The expected reward interval for feasible actions is defined as [(1-subsidy_factor)*max_p, max_p], where max_p is the highest expected reward sampled value.
References
Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints (Daulton et al., 2019) https://arxiv.org/abs/1911.00638
Multi-Armed Bandits with Cost Subsidy (Sinha et al., 2021) https://arxiv.org/abs/2011.01488
- Parameters:
subsidy_factor (Optional[Float01], 0.5 if not specified) – Number in [0, 1] defining the smallest tolerated reward probability, and hence the set of feasible actions. If subsidy_factor is 1, the bandit always selects the action with the minimum cost. If subsidy_factor is 0, the bandit always selects the action with the highest probability of getting a positive reward (it behaves as a classic Bernoulli bandit).
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- classmethod numerize_subsidy_factor(v)
- select_action(p: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], Probability], actions: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], BaseModel]) ActionId | Tuple[ActionId, Tuple[float, ...]]
Select the action with the minimum cost among the set of feasible actions (the actions whose expected rewards fall within the interval [(1-subsidy_factor)*max_p, max_p], where max_p is the highest expected reward sampled value).
- Parameters:
p (Dict[UnifiedActionId, Probability]) – The dictionary of actions and their sampled probability of getting a positive reward.
actions (Dict[UnifiedActionId, BetaCC]) – The dictionary of actions and their cost.
- Returns:
selected_action – The selected action.
- Return type:
UnifiedActionId
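A minimal sketch of the feasible-set rule described above (hypothetical helper; the tie-break by higher probability is an assumption, not necessarily the library's behaviour):
```python
from typing import Dict

def cc_select_action(p: Dict[str, float], cost: Dict[str, float], subsidy_factor: float = 0.5) -> str:
    """Among actions whose sampled reward probability is at least (1 - subsidy_factor) * max_p,
    return the one with the lowest cost, breaking ties by higher probability."""
    max_p = max(p.values())
    feasible = [a for a in p if p[a] >= (1 - subsidy_factor) * max_p]
    return min(feasible, key=lambda a: (cost[a], -p[a]))

p = {"a1": 0.9, "a2": 0.7, "a3": 0.2}
cost = {"a1": 5.0, "a2": 1.0, "a3": 0.1}
# "a3" is infeasible (0.2 < 0.45); "a2" is the cheapest feasible action.
print(cc_select_action(p, cost, subsidy_factor=0.5))  # -> "a2"
```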
- subsidy_factor: Float_0_1 | None
- with_subsidy_factor(subsidy_factor: Float_0_1 | None) Self
Instantiate a mutated cost control bandit strategy with an altered subsidy factor.
- Parameters:
subsidy_factor (Optional[Float01], 0.5 if not specified) – Number in [0, 1] defining the smallest tolerated reward probability, and hence the set of feasible actions. If subsidy_factor is 1, the bandit always selects the action with the minimum cost. If subsidy_factor is 0, the bandit always selects the action with the highest probability of getting a positive reward (it behaves as a classic Bernoulli bandit).
- Returns:
mutated_cost_control_bandit – The mutated cost control bandit strategy.
- Return type:
Self
- class pybandits.strategy.CostControlStrategy
Bases:
Strategy
,ABC
Cost Control (CC) strategy for multi-armed bandits.
Bandits are extended to include a control of the action cost. Each action is associated with a predefined “cost”.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pybandits.strategy.MultiObjectiveBandit
Bases:
MultiObjectiveStrategy
Multi-Objective (MO) strategy for multi-armed bandits.
The reward pertaining to an action is a multidimensional vector instead of a scalar value. In this setting, different actions are compared according to the Pareto order between their expected reward vectors; the actions whose expected rewards are not inferior to those of any other action are called Pareto optimal actions, and together they constitute the Pareto front.
References
Thompson Sampling for Multi-Objective Multi-Armed Bandits Problem (Yahyaa and Manderick, 2015) https://www.researchgate.net/publication/272823659_Thompson_Sampling_for_Multi-Objective_Multi-Armed_Bandits_Problem
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- select_action(p: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], List[Probability]], **kwargs) ActionId | Tuple[ActionId, Tuple[float, ...]]
Select an action at random from the Pareto optimal set of actions. The Pareto optimal action set (Pareto front) A* is the set of actions not dominated by any action outside A*. The dominance relation is established based on the objective reward probability vectors.
- Parameters:
p (Dict[ActionId, List[Probability]]) – The dictionary of actions and their sampled probability of getting a positive reward for each objective.
- Returns:
selected_action – The selected action.
- Return type:
ActionId
- class pybandits.strategy.MultiObjectiveCostControlBandit
Bases:
MultiObjectiveStrategy
,CostControlStrategy
Multi-Objective (MO) with Cost Control (CC) strategy for multi-armed bandits.
This strategy allows the reward to be a multidimensional vector and include a control of the action cost. It merges the Multi-Objective and Cost Control strategies.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- select_action(p: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], List[Probability]], actions: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], BetaMOCC]) ActionId | Tuple[ActionId, Tuple[float, ...]]
Select the action with the minimum cost among the Pareto optimal set of actions. The Pareto optimal action set (Pareto front) A* is the set of actions not dominated by any action outside A*. The dominance relation is established based on the objective reward probability vectors.
- Parameters:
p (Dict[UnifiedActionId, List[Probability]]) – The dictionary of actions and their sampled probability of getting a positive reward for each objective.
- Returns:
selected_action – The selected action.
- Return type:
UnifiedActionId
- class pybandits.strategy.MultiObjectiveStrategy
Bases:
Strategy
,ABC
Multi Objective Strategy to select actions in multi-armed bandits.
- classmethod get_pareto_front(p: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], List[Probability]]) List[ActionId | Tuple[ActionId, Tuple[float, ...]]]
Create Pareto optimal set of actions (Pareto front) A* identified as actions that are not dominated by any action out of the set A*.
- Parameters:
p (Dict[UnifiedActionId, List[Probability]]) – The dictionary of actions and their sampled probability of getting a positive reward for each objective.
- Returns:
pareto_front – The list of Pareto optimal actions.
- Return type:
List[UnifiedActionId]
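A minimal sketch of the Pareto-front computation (hypothetical helper; the actual classmethod also supports quantitative action identifiers):
```python
from typing import Dict, List

def get_pareto_front(p: Dict[str, List[float]]) -> List[str]:
    """Return the actions whose reward-probability vectors are not dominated by any other action."""
    def dominates(u: List[float], v: List[float]) -> bool:
        # u dominates v if it is at least as good in every objective and strictly better in one.
        return all(ui >= vi for ui, vi in zip(u, v)) and any(ui > vi for ui, vi in zip(u, v))

    return [a for a in p if not any(dominates(p[b], p[a]) for b in p if b != a)]

p = {"a1": [0.9, 0.2], "a2": [0.5, 0.8], "a3": [0.4, 0.1]}
print(get_pareto_front(p))  # ['a1', 'a2'] -- 'a3' is dominated by both
```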
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pybandits.strategy.Strategy
Bases:
PyBanditsBaseModel
,ABC
Strategy to select actions in multi-armed bandits.
- classmethod get_expected_value_from_state(state: Dict[str, Any], field_name: str) float
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- classmethod numerize_field(v, field_name: str)
- abstract select_action(p: Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], Probability], actions: Dict[ActionId, BaseModel] | None) ActionId | Tuple[ActionId, Tuple[float, ...]]
Select the action.
- pybandits.strategy.random() → x in the interval [0, 1)
pybandits.actions_manager
- class pybandits.actions_manager.ActionsManager(delta: PositiveProbability | None = None, actions: Dict[ActionId, Model] | None = None, action_ids: Set[ActionId] | None = None, quantitative_action_ids: Set[ActionId] | None = None, kwargs: Dict[str, Any] | None = None, actions_with_change: Set[Tuple[ActionId, Annotated[int, Ge(ge=0)]]] | None = None)
Bases:
PyBanditsBaseModel
,ABC
Base class for managing actions and their associated models. The class allows accounting for non-stationarity by providing an adaptive windowing scheme for action updates; change point detection is based on this scheme.
References
Scaling Multi-Armed Bandit Algorithms (Fouché et al., 2019) https://edouardfouche.com/publications/S-MAB_FOUCHE_KDD19.pdf
- Parameters:
actions (Dict[ActionId, Model]) – The list of possible actions, and their associated Model.
delta (Optional[PositiveProbability]) – The confidence level for the adaptive window. None for skipping the change point detection.
- actions: Dict[ActionId, BaseModel]
- actions_with_change: Set[Tuple[ActionId, Annotated[int, Ge(ge=0)]]]
- classmethod at_least_one_action_is_defined(v)
- delta: PositiveProbability | None
- property maximum_memory_length: Annotated[int, Ge(ge=0)]
Get maximum possible memory length based on current action statistics.
- Returns:
Maximum memory length allowed.
- Return type:
NonNegativeInt
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'json_encoders': {<class 'collections.deque'>: <class 'list'>}}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- update(actions: List[ActionId], rewards: List[BinaryReward] | List[List[BinaryReward]], quantities: List[float | List[float] | None] | None = None, actions_memory: List[ActionId] | None = None, rewards_memory: List[BinaryReward] | List[List[BinaryReward]] | None = None, **kwargs)
Update the models associated with the given actions using the provided rewards. For an adaptive window size, the update is performed by resetting the action models and retraining them on the new data.
- Parameters:
actions (List[ActionId]) – The selected action for each sample.
rewards (Union[List[BinaryReward], List[List[BinaryReward]]]) – The reward for each sample.
quantities (Optional[List[Union[float, List[float], None]]]) – The value associated with each action. If none, the value is not used, i.e. non-quantitative action.
actions_memory (Optional[List[ActionId]]) – List of previously selected actions.
rewards_memory (Optional[Union[List[BinaryReward], List[List[BinaryReward]]]]) – List of previously collected rewards.
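The reset-and-retrain behaviour can be pictured with a minimal sketch (illustrative only; the actual adaptive windowing and change point detection follow Fouché et al., 2019, and are not reproduced here):
```python
from collections import defaultdict
from typing import Dict, List

def windowed_update(actions: List[str], rewards: List[int],
                    actions_memory: List[str], rewards_memory: List[int],
                    max_window: int) -> Dict[str, Dict[str, int]]:
    """Append the new batch to the retained memory, truncate to the last max_window
    samples, and rebuild per-action success/failure counts from scratch
    (i.e. "reset and retrain")."""
    all_actions = actions_memory + actions
    all_rewards = rewards_memory + rewards
    all_actions, all_rewards = all_actions[-max_window:], all_rewards[-max_window:]

    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"successes": 0, "failures": 0})
    for action, reward in zip(all_actions, all_rewards):
        counts[action]["successes" if reward else "failures"] += 1
    return dict(counts)

print(windowed_update(["a1", "a2"], [1, 0], ["a1"], [0], max_window=3))
```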
- class pybandits.actions_manager.CmabActionsManager(delta: PositiveProbability | None = None, actions: Dict[ActionId, Model] | None = None, action_ids: Set[ActionId] | None = None, quantitative_action_ids: Set[ActionId] | None = None, kwargs: Dict[str, Any] | None = None, actions_with_change: Set[Tuple[ActionId, Annotated[int, Ge(ge=0)]]] | None = None)
Bases:
ActionsManager
,BaseModel
,Generic[CmabModelType]
Manages actions and their associated models for cMAB models. The class allows accounting for non-stationarity by providing an adaptive window scheme for action updates.
- Parameters:
actions (Dict[ActionId, BaseBayesianNeuralNetwork]) – The list of possible actions, and their associated Model.
delta (Optional[PositiveProbability], 0.1 if not specified.) – The confidence level for the adaptive window.
- actions: Dict[ActionId, CmabModelType]
- classmethod check_models(v)
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'json_encoders': {<class 'collections.deque'>: <class 'list'>}}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- update(actions: List[ActionId], rewards: List[BinaryReward] | List[List[BinaryReward]], quantities: List[float | List[float] | None] | None, context: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], actions_memory: List[ActionId] | None = None, rewards_memory: List[BinaryReward] | List[List[BinaryReward]] | None = None, context_memory: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None)
Update the models associated with the given actions using the provided rewards. For an adaptive window size, the update is performed by resetting the action models and retraining them on the new data.
- Parameters:
actions (List[ActionId]) – The selected action for each sample.
rewards (Union[List[BinaryReward], List[List[BinaryReward]]]) – The reward for each sample.
quantities (Optional[List[Union[float, List[float], None]]]) – The value associated with each action. If none, the value is not used, i.e. non-quantitative action.
context (ArrayLike of shape (n_samples, n_features)) – Matrix of contextual features.
actions_memory (Optional[List[ActionId]]) – List of previously selected actions.
rewards_memory (Optional[Union[List[BinaryReward], List[List[BinaryReward]]]]) – List of previously collected rewards.
context_memory (Optional[ArrayLike] of shape (n_samples, n_features)) – Matrix of contextual features.
- pybandits.actions_manager.CmabActionsManagerCC
alias of
CmabActionsManager[Union[BayesianNeuralNetworkCC, CmabZoomingModelCC]]
- pybandits.actions_manager.CmabActionsManagerSO
alias of
CmabActionsManager[Union[BayesianNeuralNetwork, CmabZoomingModel]]
- class pybandits.actions_manager.SmabActionsManager(delta: PositiveProbability | None = None, actions: Dict[ActionId, Model] | None = None, action_ids: Set[ActionId] | None = None, quantitative_action_ids: Set[ActionId] | None = None, kwargs: Dict[str, Any] | None = None, actions_with_change: Set[Tuple[ActionId, Annotated[int, Ge(ge=0)]]] | None = None)
Bases:
ActionsManager
,BaseModel
,Generic[SmabModelType]
Manages actions and their associated models for sMAB models. The class allows accounting for non-stationarity by providing an adaptive window scheme for action updates.
- Parameters:
actions (Dict[ActionId, BaseBeta]) – The list of possible actions, and their associated Model.
delta (Optional[PositiveProbability], 0.1 if not specified.) – The confidence level for the adaptive window.
- actions: Dict[ActionId, SmabModelType]
- classmethod all_actions_have_same_number_of_objectives(actions: Dict[ActionId, SmabModelType])
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'json_encoders': {<class 'collections.deque'>: <class 'list'>}}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- update(actions: List[ActionId], rewards: List[BinaryReward] | List[List[BinaryReward]], quantities: List[float | List[float] | None] | None, actions_memory: List[ActionId] | None = None, rewards_memory: List[BinaryReward] | List[List[BinaryReward]] | None = None)
Update the models associated with the given actions using the provided rewards. For an adaptive window size, the update is performed by resetting the action models and retraining them on the new data.
- Parameters:
actions (List[ActionId]) – The selected action for each sample.
rewards (Union[List[BinaryReward], List[List[BinaryReward]]]) – The reward for each sample.
quantities (Optional[List[Union[float, List[float], None]]]) – The value associated with each action. If none, the value is not used, i.e. non-quantitative action.
actions_memory (Optional[List[ActionId]]) – List of previously selected actions.
rewards_memory (Optional[Union[List[BinaryReward], List[List[BinaryReward]]]]) – List of previously collected rewards.
- pybandits.actions_manager.SmabActionsManagerCC
alias of
SmabActionsManager[Union[BetaCC, SmabZoomingModelCC]]
- pybandits.actions_manager.SmabActionsManagerMO
alias of
SmabActionsManager[BetaMO]
- pybandits.actions_manager.SmabActionsManagerMOCC
alias of
SmabActionsManager[BetaMOCC]
- pybandits.actions_manager.SmabActionsManagerSO
alias of
SmabActionsManager[Union[Beta, SmabZoomingModel]]
pybandits.smab_simulator
- class pybandits.smab_simulator.SmabSimulator(*, smab: BaseSmabBernoulli, n_updates: Annotated[int, Gt(gt=0)] = 10, batch_size: Annotated[int, Gt(gt=0)] = 100, probs_reward: Dict[ActionId, Probability | Callable[[ndarray], Probability]] | Dict[str, Dict[ActionId, Probability | Callable[[ndarray], Probability]]] | None = None, save: bool = False, path: str = '', file_prefix: str = '', random_seed: Annotated[int, Ge(ge=0)] | None = None, verbose: bool = False, visualize: bool = False)
Bases:
Simulator
Simulate environment for stochastic multi-armed bandits.
This class performs simulation of stochastic Multi-Armed Bandits (sMAB). Data are processed in batches of size n>=1. For each batch of simulated samples, the sMAB selects one action per sample and collects the corresponding simulated reward. Then, prior parameters are updated based on the rewards returned for the recommended actions.
- Parameters:
mab (BaseSmabBernoulli) – sMAB model.
- mab: BaseSmabBernoulli
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_Simulator__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- probs_reward: Dict[ActionId, Probability | Callable[[ndarray], Probability]] | Dict[str, Dict[ActionId, Probability | Callable[[ndarray], Probability]]] | None
- classmethod replace_null_and_validate_probs_reward(values)
- classmethod validate_probs_reward_columns(values)
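The batch loop described above can be sketched without the simulator itself (illustrative Thompson Sampling loop; SmabSimulator handles this bookkeeping internally, together with saving and visualization). The probs_reward dictionary below plays the role of the simulated per-action reward probabilities.
```python
import random

probs_reward = {"a1": 0.3, "a2": 0.6}
counts = {a: {"successes": 0, "failures": 0} for a in probs_reward}

n_updates, batch_size = 10, 100
for _ in range(n_updates):
    # Thompson-style selection: sample from each action's Beta posterior, pick the max.
    batch_actions = []
    for _ in range(batch_size):
        sampled = {a: random.betavariate(c["successes"] + 1, c["failures"] + 1)
                   for a, c in counts.items()}
        batch_actions.append(max(sampled, key=sampled.get))
    # Simulate Bernoulli rewards and update the posteriors with the batch.
    for action in batch_actions:
        reward = random.random() < probs_reward[action]
        counts[action]["successes" if reward else "failures"] += 1

print(counts)  # "a2" should accumulate most of the pulls
```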
pybandits.cmab_simulator
- class pybandits.cmab_simulator.CmabSimulator(*, cmab: BaseCmabBernoulli, n_updates: Annotated[int, Gt(gt=0)] = 10, batch_size: Annotated[int, Gt(gt=0)] = 100, probs_reward: Dict[ActionId, Callable[[ndarray], Probability] | Callable[[ndarray, ndarray], Probability]] | Dict[str, Dict[ActionId, Callable[[ndarray], Probability] | Callable[[ndarray, ndarray], Probability]]] | None = None, save: bool = False, path: str = '', file_prefix: str = '', random_seed: Annotated[int, Ge(ge=0)] | None = None, verbose: bool = False, visualize: bool = False, context: ndarray, group: List | None = None)
Bases:
Simulator
Simulate environment for contextual multi-armed bandit models.
This class simulates information required by the contextual bandit. Generated data are processed by the bandit with batches of size n>=1. For each batch of samples, actions are recommended by the bandit and corresponding simulated rewards collected. Bandit policy parameters are then updated based on returned rewards from recommended actions.
- Parameters:
mab (BaseCmabBernoulli) – Contextual multi-armed bandit model
context (np.ndarray of shape (n_samples, n_feature)) – Context matrix of samples features.
group (Optional[List] with length=n_samples) – Group to which each sample belongs. Samples that belong to the same group have features drawn from the same distribution and the same probability of receiving positive/negative feedback from each action. If not supplied, all samples are assigned to the same group.
- context: ndarray
- group: List | None
- mab: BaseCmabBernoulli
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_Simulator__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- probs_reward: Dict[ActionId, Callable[[ndarray], Probability] | Callable[[ndarray, ndarray], Probability]] | Dict[str, Dict[ActionId, Callable[[ndarray], Probability] | Callable[[ndarray, ndarray], Probability]]] | None
- classmethod replace_nulls_and_validate_sizes_and_dtypes(values)
- classmethod validate_probs_reward_columns(values)
pybandits.offline_policy_evaluator
pybandits.offline_policy_estimator
Comprehensive Offline Policy Evaluation (OPE) estimators.
This module provides a complete set of estimators for OPE.
- class pybandits.offline_policy_estimator.BalancedInverseProbabilityWeighting(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)
Bases:
GeneralizedInverseProbabilityWeighting
Balanced Inverse Probability Weighting estimator.
References
Balanced Off-Policy Evaluation in General Action Spaces (Sondhi, Arbour, and Dimmery, 2020) https://arxiv.org/pdf/1906.03694
- Parameters:
alpha (Float01, defaults to 0.05) – Significance level for confidence interval estimation.
n_bootstrap_samples (int, defaults to 10000) – Number of bootstrap samples for confidence interval estimation.
random_state (int, defaults to None) – Random seed for bootstrap sampling.
- estimate_sample_rewards(reward: ndarray, expected_importance_weight: ndarray, **kwargs) ndarray
Estimate the sample rewards.
- Parameters:
reward (np.ndarray) – Array of rewards corresponding to each action.
expected_importance_weight (np.ndarray) – Array of expected importance weights.
- Returns:
sample_reward – Estimated rewards.
- Return type:
np.ndarray
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- name: ClassVar = 'b-ipw'
- class pybandits.offline_policy_estimator.BaseOfflinePolicyEstimator(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)
Bases:
PyBanditsBaseModel
,ABC
Base class for all OPE estimators.
This class defines the interface for all OPE estimators and provides a common method for estimating the policy value.
- Parameters:
alpha (Float01, default=0.05) – Significance level for confidence interval estimation.
n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.
random_state (int, default=None) – Random seed for bootstrap sampling.
- alpha: Float_0_1
- estimate_policy_value_with_confidence_interval(**kwargs) Tuple[float, float, float, float]
Estimate the policy value with a confidence interval.
- Parameters:
action (np.ndarray) – Array of actions taken.
- Returns:
Estimated policy value, mean, lower bound, and upper bound of the confidence interval.
- Return type:
Tuple[float, float, float, float]
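A minimal sketch of the bootstrap step, assuming an array of per-sample reward estimates as produced by estimate_sample_rewards (illustrative only, not the library's implementation):
```python
import numpy as np

rng = np.random.default_rng(0)
sample_reward = rng.uniform(size=500)          # stand-in for estimated sample rewards
alpha, n_bootstrap_samples = 0.05, 10_000

# Resample the per-sample estimates with replacement and collect the bootstrap means.
boot_means = np.array([
    rng.choice(sample_reward, size=sample_reward.size, replace=True).mean()
    for _ in range(n_bootstrap_samples)
])
lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
print(sample_reward.mean(), boot_means.mean(), lower, upper)
```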
- abstract estimate_sample_rewards(**kwargs) ndarray
Estimate sample rewards.
- Returns:
Estimated sample rewards.
- Return type:
np.ndarray
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- n_bootstrap_samples: int
- name: ClassVar
- random_state: int | None
- class pybandits.offline_policy_estimator.DirectMethod(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)
Bases:
BaseOfflinePolicyEstimator
Direct Method (DM) estimator.
This estimator uses the evaluation policy to estimate the sample rewards.
References
The Offset Tree for Learning with Partial Labels (Beygelzimer and Langford, 2009) https://arxiv.org/pdf/0812.4044
- Parameters:
alpha (Float01, default=0.05) – Significance level for confidence interval estimation.
n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.
random_state (int, default=None) – Random seed for bootstrap sampling.
- estimate_sample_rewards(estimated_policy: ndarray, expected_reward: ndarray, **kwargs) ndarray
Estimate the sample rewards.
- Parameters:
estimated_policy (np.ndarray) – Array of action distributions.
expected_reward (np.ndarray) – Array of expected rewards.
- Returns:
sample_reward – Estimated sample rewards.
- Return type:
np.ndarray
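A minimal sketch of the DM computation, assuming (n_samples, n_actions)-shaped arrays for the evaluation policy and the reward-model estimates:
```python
import numpy as np

estimated_policy = np.array([[0.7, 0.3],
                             [0.2, 0.8]])   # evaluation policy's action distribution per sample
expected_reward = np.array([[0.5, 0.1],
                            [0.4, 0.9]])    # reward model's estimate per action

# DM: expected reward under the evaluation policy, per sample.
sample_reward = (estimated_policy * expected_reward).sum(axis=1)
print(sample_reward)          # approximately [0.38, 0.8]
print(sample_reward.mean())   # estimated policy value
```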
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- name: ClassVar = 'dm'
- class pybandits.offline_policy_estimator.DoublyRobust(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)
Bases:
GeneralizedDoublyRobust
Doubly Robust (DR) estimator.
References
Doubly Robust Policy Evaluation and Optimization (Dudík, Erhan, Langford, and Li, 2014) https://arxiv.org/pdf/1503.02834
More Robust Doubly Robust Off-policy Evaluation (Farajtabar, Chow, and Ghavamzadeh, 2018) https://arxiv.org/pdf/1802.03493
- Parameters:
alpha (Float01, default=0.05) – Significance level for confidence interval estimation.
n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.
random_state (int, default=None) – Random seed for bootstrap sampling.
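A minimal sketch of the DR computation on toy arrays (assumed shapes: (n_samples,) for logged actions, rewards, and propensities; (n_samples, n_actions) for the policy and reward-model arrays):
```python
import numpy as np

action = np.array([0, 1])                         # logged actions
reward = np.array([1.0, 0.0])                     # observed rewards
propensity_score = np.array([0.5, 0.25])          # behaviour policy prob. of the logged action
estimated_policy = np.array([[0.7, 0.3],
                             [0.2, 0.8]])         # evaluation policy distribution
expected_reward = np.array([[0.5, 0.1],
                            [0.4, 0.9]])          # reward model estimates

idx = np.arange(len(action))
direct = (estimated_policy * expected_reward).sum(axis=1)          # DM term
weight = estimated_policy[idx, action] / propensity_score          # importance weights
correction = weight * (reward - expected_reward[idx, action])      # weighted residual correction
sample_reward = direct + correction
print(sample_reward.mean())   # DR estimate of the policy value
```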
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_GeneralizedDoublyRobust__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- name: ClassVar = 'dr'
- class pybandits.offline_policy_estimator.DoublyRobustWithOptimisticShrinkage(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None, shrinkage_factor: Annotated[float, Ge(ge=0)] = 0.0)
Bases:
DoublyRobust
Optimistic version of DRos estimator.
This estimator uses a shrinkage factor to shrink the importance weight in the native DR.
References
Doubly Robust Off-Policy Evaluation with Shrinkage (Su, Dimakopoulou, Krishnamurthy, and Dudik, 2020) https://arxiv.org/pdf/1907.09623
- Parameters:
alpha (Float01, default=0.05) – Significance level for confidence interval estimation.
n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.
random_state (int, default=None) – Random seed for bootstrap sampling.
shrinkage_factor (float, default=0.0) – Shrinkage factor for the importance weights. If set to 0 or infinity, the estimator is equivalent to the native DM or DR estimators, respectively.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_GeneralizedDoublyRobust__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- name: ClassVar = 'dros-opt'
- shrinkage_factor: Annotated[float, Ge(ge=0)]
- class pybandits.offline_policy_estimator.DoublyRobustWithPessimisticShrinkage(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None, shrinkage_factor: Annotated[float, Gt(gt=0)] = inf)
Bases:
DoublyRobust
Pessimistic version of DRos estimator.
This estimator uses a shrinkage factor to shrink the importance weight in the native DR.
References
Doubly Robust Off-Policy Evaluation with Shrinkage (Su, Dimakopoulou, Krishnamurthy, and Dudik, 2020) https://arxiv.org/pdf/1907.09623
- Parameters:
alpha (Float01, default=0.05) – Significance level for confidence interval estimation.
n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.
random_state (int, default=None) – Random seed for bootstrap sampling.
shrinkage_factor (float, default=inf) – Shrinkage factor for the importance weights.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_GeneralizedDoublyRobust__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- name: ClassVar = 'dros-pess'
- shrinkage_factor: Annotated[float, Gt(gt=0)]
- class pybandits.offline_policy_estimator.GeneralizedDoublyRobust(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)
Bases:
BaseOfflinePolicyEstimator
,ABC
Abstract generalization of the Doubly Robust (DR) estimator.
References
Doubly Robust Policy Evaluation and Optimization (Dudík, Erhan, Langford, and Li, 2014) https://arxiv.org/pdf/1503.02834
More Robust Doubly Robust Off-policy Evaluation (Farajtabar, Chow, and Ghavamzadeh, 2018) https://arxiv.org/pdf/1802.03493
- Parameters:
alpha (Float01, default=0.05) – Significance level for confidence interval estimation.
n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.
random_state (int, default=None) – Random seed for bootstrap sampling.
- estimate_sample_rewards(action: ndarray, reward: ndarray, propensity_score: ndarray, estimated_policy: ndarray, expected_reward: ndarray, **kwargs) ndarray
Estimate the sample rewards.
- Parameters:
action (np.ndarray) – Array of actions taken.
reward (np.ndarray) – Array of rewards corresponding to each action.
propensity_score (np.ndarray) – Array of propensity scores.
estimated_policy (np.ndarray) – Array of action distributions.
expected_reward (np.ndarray) – Array of expected rewards.
- Returns:
sample_reward – Estimated rewards.
- Return type:
np.ndarray
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_GeneralizedDoublyRobust__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- class pybandits.offline_policy_estimator.GeneralizedInverseProbabilityWeighting(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)
Bases:
BaseOfflinePolicyEstimator
,ABC
Abstract generalization of the Inverse Probability Weighting (IPW) estimator.
References
Learning from Logged Implicit Exploration Data (Strehl, Langford, Li, and Kakade, 2010) https://arxiv.org/pdf/1003.0120
- Parameters:
alpha (Float01, default=0.05) – Significance level for confidence interval estimation.
n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.
random_state (int, default=None) – Random seed for bootstrap sampling.
- estimate_sample_rewards(reward: ndarray, shrinkage_method: Callable | None, **kwargs) ndarray
Estimate the sample rewards.
- Parameters:
reward (np.ndarray) – Array of rewards corresponding to each action.
shrinkage_method (Optional[Callable]) – Shrinkage method for the importance weights.
- Returns:
sample_reward – Estimated sample rewards.
- Return type:
np.ndarray
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pybandits.offline_policy_estimator.InverseProbabilityWeighting(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)
Bases:
GeneralizedInverseProbabilityWeighting
Inverse Probability Weighting (IPW) estimator.
References
Learning from Logged Implicit Exploration Data (Strehl, Langford, Li, and Kakade, 2010) https://arxiv.org/pdf/1003.0120
- Parameters:
alpha (Float01, default=0.05) – Significance level for confidence interval estimation.
n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.
random_state (int, default=None) – Random seed for bootstrap sampling.
- estimate_sample_rewards(action: ndarray, reward: ndarray, propensity_score: ndarray, estimated_policy: ndarray, shrinkage_method: Callable | None = None, **kwargs) ndarray
Estimate the sample rewards.
- Parameters:
action (np.ndarray) – Array of actions taken.
reward (np.ndarray) – Array of rewards corresponding to each action.
propensity_score (np.ndarray) – Array of propensity scores.
estimated_policy (np.ndarray) – Array of action distributions.
- Returns:
sample_reward – Estimated sample rewards.
- Return type:
np.ndarray
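A minimal sketch of the IPW computation on toy arrays (same assumed shapes as in the DR example above):
```python
import numpy as np

action = np.array([0, 1, 1])
reward = np.array([1.0, 0.0, 1.0])
propensity_score = np.array([0.5, 0.25, 0.5])
estimated_policy = np.array([[0.7, 0.3],
                             [0.2, 0.8],
                             [0.1, 0.9]])

idx = np.arange(len(action))
weight = estimated_policy[idx, action] / propensity_score   # importance weights
sample_reward = reward * weight
print(sample_reward.mean())   # IPW estimate of the policy value
```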
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- name: ClassVar = 'ipw'
- class pybandits.offline_policy_estimator.ReplayMethod(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)
Bases:
BaseOfflinePolicyEstimator
Replay Method estimator.
This estimator is a simple baseline that estimates the policy value by averaging the rewards of the matched samples.
References
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms (Li, Chu, Langford, and Wang, 2011) https://arxiv.org/pdf/1003.5956
- Parameters:
alpha (Float01, default=0.05) – Significance level for confidence interval estimation.
n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.
random_state (int, default=None) – Random seed for bootstrap sampling.
- estimate_sample_rewards(action: ndarray, reward: ndarray, estimated_policy: ndarray, **kwargs) ndarray
Estimate the sample rewards.
- Parameters:
action (np.ndarray) – Array of actions taken.
reward (np.ndarray) – Array of rewards corresponding to each action.
estimated_policy (np.ndarray) – Array of action distributions.
- Returns:
sample_reward – Estimated sample rewards.
- Return type:
np.ndarray
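A minimal sketch of the replay idea, assuming a sample "matches" when the evaluation policy's most likely action equals the logged action (illustrative only):
```python
import numpy as np

action = np.array([0, 1, 1, 0])
reward = np.array([1.0, 0.0, 1.0, 0.0])
estimated_policy = np.array([[0.9, 0.1],
                             [0.2, 0.8],
                             [0.6, 0.4],
                             [0.7, 0.3]])

matched = estimated_policy.argmax(axis=1) == action
print(reward[matched].mean())   # average reward over the matched samples
```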
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- name: ClassVar = 'rep'
- class pybandits.offline_policy_estimator.SelfNormalizedDoublyRobust(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)
Bases:
GeneralizedDoublyRobust
Self-Normalized Doubly Robust (SNDR) estimator.
This estimator applies self-normalized importance weights within the DR estimator.
References
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning (Kallus and Uehara, 2019) https://arxiv.org/pdf/1906.03735
- Parameters:
alpha (Float01, default=0.05) – Significance level for confidence interval estimation.
n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.
random_state (int, default=None) – Random seed for bootstrap sampling.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_GeneralizedDoublyRobust__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- name: ClassVar = 'sndr'
- class pybandits.offline_policy_estimator.SelfNormalizedInverseProbabilityWeighting(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)
Bases:
InverseProbabilityWeighting
Self-Normalized Inverse Propensity Score (SNIPS) estimator.
References
The Self-normalized Estimator for Counterfactual Learning (Swaminathan and Joachims, 2015) https://papers.nips.cc/paper_files/paper/2015/file/39027dfad5138c9ca0c474d71db915c3-Paper.pdf
- Parameters:
alpha (Float01, default=0.05) – Significance level for confidence interval estimation.
n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.
random_state (int, default=None) – Random seed for bootstrap sampling.
- estimate_sample_rewards(action: ndarray, reward: ndarray, propensity_score: ndarray, estimated_policy: ndarray, shrinkage_method: Callable | None = None, **kwargs) ndarray
Estimate the sample rewards.
- Parameters:
action (np.ndarray) – Array of actions taken.
reward (np.ndarray) – Array of rewards corresponding to each action.
propensity_score (np.ndarray) – Array of propensity scores.
estimated_policy (np.ndarray) – Array of action distributions.
shrinkage_method (Optional[Callable]) – Shrinkage method for the importance weights.
- Returns:
sample_reward – Estimated sample rewards.
- Return type:
np.ndarray
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- name: ClassVar = 'snips'
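Example
SNIPS divides the importance-weighted reward sum by the sum of the weights instead of the sample size, which makes the estimate insensitive to the overall scale of the weights. A minimal NumPy sketch; the names are illustrative and not part of the pybandits API (policy_prob is the evaluation-policy probability of each logged action).
import numpy as np

def snips_value(reward, propensity_score, policy_prob):
    # Importance weights of the logged actions under the evaluation policy.
    w = policy_prob / propensity_score
    # Plain IPS would return np.mean(w * reward); SNIPS normalizes by the weight sum.
    return np.sum(w * reward) / np.sum(w)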
- class pybandits.offline_policy_estimator.SubGaussianDoublyRobust(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)
Bases:
GeneralizedDoublyRobust
SubGaussian Doubly Robust estimator.
References
Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning (Metelli, Russo, and Restelli, 2021) https://proceedings.neurips.cc/paper_files/paper/2021/file/4476b929e30dd0c4e8bdbcc82c6ba23a-Paper.pdf
- Parameters:
alpha (Float01, default=0.05) – Significance level for confidence interval estimation.
n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.
random_state (Optional[int], default=None) – Random seed for bootstrap sampling.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_GeneralizedDoublyRobust__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- name: ClassVar = 'sg-dr'
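Notes
As a sketch, the sub-Gaussian variant applies the weight transform from the referenced paper inside the DR correction term (illustrative notation, not taken from pybandits; \lambda \in [0, 1] is the shrinkage coefficient, and how this class chooses it is not visible in the signature above):
\hat{V}_{\mathrm{SG\text{-}DR}} = \frac{1}{n} \sum_{t=1}^{n} \left[ \hat{q}(x_t, \pi) + \tilde{w}_\lambda(x_t, a_t) \bigl( r_t - \hat{q}(x_t, a_t) \bigr) \right], \qquad \tilde{w}_\lambda = \frac{w}{(1 - \lambda) + \lambda\, w}.
With \lambda = 0 the transform is the identity and plain DR is recovered; larger \lambda dampens large weights, which yields the sub-Gaussian concentration behaviour the estimator is named after.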
- class pybandits.offline_policy_estimator.SubGaussianInverseProbabilityWeighting(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None, shrinkage_factor: Float_0_1 = 0.0)
Bases:
InverseProbabilityWeighting
SubGaussian Inverse Probability Weighting estimator.
References
Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning (Metelli, Russo, and Restelli, 2021) https://proceedings.neurips.cc/paper_files/paper/2021/file/4476b929e30dd0c4e8bdbcc82c6ba23a-Paper.pdf
- Parameters:
alpha (Float01, default=0.05) – Significance level for confidence interval estimation.
n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.
random_state (Optional[int], default=None) – Random seed for bootstrap sampling.
shrinkage_factor (Float01, default=0.0) – Shrinkage factor for the importance weights.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- name: ClassVar = 'sg-ipw'
- shrinkage_factor: Float_0_1
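Example
A minimal NumPy sketch of the same shrinkage transform applied to plain IPW, assuming pybandits follows the transform from the referenced paper; the names are illustrative and not part of the pybandits API.
import numpy as np

def sg_ipw_value(reward, propensity_score, policy_prob, shrinkage_factor=0.0):
    # Raw importance weights of the logged actions under the evaluation policy.
    w = policy_prob / propensity_score
    # Sub-Gaussian shrinkage: 0.0 leaves the weights unchanged (plain IPW),
    # values closer to 1.0 dampen large weights more aggressively.
    w_shrunk = w / ((1.0 - shrinkage_factor) + shrinkage_factor * w)
    return np.mean(w_shrunk * reward)

With shrinkage_factor=0.0 (the default) the estimate coincides with ordinary IPW.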
- class pybandits.offline_policy_estimator.SwitchDoublyRobust(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None, switch_threshold: float = inf)
Bases:
DoublyRobust
Switch Doubly Robust (Switch-DR) estimator.
This estimator applies a switching rule on the importance weights: samples whose weight is at most switch_threshold keep the full doubly robust (DR) correction, while the remaining samples fall back to the model-based (direct method) estimate (see the sketch at the end of this entry).
References
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits (Wang, Agarwal, and Dudik, 2017) https://arxiv.org/pdf/1507.02646
- Parameters:
alpha (Float01, default=0.05) – Significance level for confidence interval estimation.
n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.
random_state (Optional[int], default=None) – Random seed for bootstrap sampling.
switch_threshold (float, default=inf) – Threshold on the importance weight above which a sample's DR correction is dropped in favour of the model-based estimate.
- model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_post_init(_GeneralizedDoublyRobust__context: Any) None
Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
- name: ClassVar = 'switch-dr'
- switch_threshold: float
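Example
A minimal NumPy sketch of the per-sample rewards under the switch rule of the referenced paper; the names and the reward-model inputs are illustrative, not the pybandits API (q_hat_logged is the fitted reward of the logged action, q_hat_policy its expectation under the evaluation policy).
import numpy as np

def switch_dr_sample_rewards(reward, propensity_score, policy_prob,
                             q_hat_logged, q_hat_policy, switch_threshold=np.inf):
    # Importance weights of the logged actions under the evaluation policy.
    w = policy_prob / propensity_score
    # Keep the DR correction only where the weight is moderate; otherwise fall
    # back to the reward model alone (the direct method).
    use_dr = w <= switch_threshold
    return q_hat_policy + np.where(use_dr, w * (reward - q_hat_logged), 0.0)

With switch_threshold=inf (the default) every sample keeps its correction and the result coincides with plain DR; a finite threshold trades some bias for lower variance on heavy-tailed weights.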