pybandits

pybandits.smab

class pybandits.smab.BaseSmabBernoulli(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[BaseBeta | BaseSmabZoomingModel], strategy: BaseStrategy)

Bases: BaseMab, ABC

Base model for a Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling.

Parameters:
  • actions (Dict[ActionId, Union[BaseBeta, BaseSmabZoomingModel]]) – The dictionary of possible actions and their associated models.

  • strategy (Strategy) – The strategy used to select actions.

actions_manager: SmabActionsManager[BaseBeta | BaseSmabZoomingModel]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

predict(n_samples: Annotated[int, Gt(gt=0)] = 1, forbidden_actions: Set[ActionId] | None = None) SmabPredictions

Predict actions.

Parameters:
  • n_samples (PositiveInt, default=1) – Number of samples to predict.

  • forbidden_actions (Optional[Set[ActionId]], default=None) – Set of forbidden actions. If specified, the model discards the forbidden_actions and considers only the remaining allowed_actions. By default, the model considers all actions as allowed_actions. Note that: actions = allowed_actions U forbidden_actions.

Returns:

  • actions (List[UnifiedActionId]) – The actions selected by the multi-armed bandit model.

  • probs (Union[List[Dict[UnifiedActionId, Probability]], List[Dict[UnifiedActionId, MOProbability]]]) – The probabilities of getting a positive reward for each action.

update(actions: List[ActionId], rewards: List[BinaryReward] | List[List[BinaryReward]], quantities: List[float | List[float] | None] | None = None, actions_memory: List[ActionId] | None = None, rewards_memory: List[BinaryReward] | List[List[BinaryReward]] | None = None)

Update the stochastic Bernoulli bandit given the list of selected actions and their corresponding binary rewards.

Parameters:
  • actions (List[ActionId] of shape (n_samples,), e.g. ['a1', 'a2', 'a3', 'a4', 'a5']) – The selected action for each sample.

  • rewards (List[Union[BinaryReward, List[BinaryReward]]] of shape (n_samples, n_objectives)) –

    The binary reward for each sample.
    If strategy is not MultiObjectiveBandit, rewards should be a list, e.g.

    rewards = [1, 0, 1, 1, 1, …]

    If strategy is MultiObjectiveBandit, rewards should be a list of lists, e.g. (with n_objectives=2):

    rewards = [[1, 1], [1, 0], [1, 1], [1, 0], [1, 1], …]

  • quantities (Optional[List[Union[float, List[float], None]]]) – The value associated with each action. If None, the value is not used, i.e. the action is non-quantitative.

  • actions_memory (Optional[List[ActionId]]) – List of previously selected actions.

  • rewards_memory (Optional[Union[List[BinaryReward], List[List[BinaryReward]]]]) – List of previously collected rewards.
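The predict/update cycle this class implements is Beta–Bernoulli Thompson Sampling: each action keeps a Beta posterior over its success probability, prediction draws one sample per action and picks the argmax, and updates increment the Beta counts from binary rewards. A minimal stdlib sketch of that mechanism (illustrative action names and counts, not the pybandits API):

```python
import random

# Beta(alpha, beta) posterior per action: alpha = 1 + successes, beta = 1 + failures
posteriors = {"a1": [1, 1], "a2": [1, 1]}

def predict(rng):
    # Draw one success-probability sample per action; recommend the argmax
    samples = {a: rng.betavariate(alpha, beta) for a, (alpha, beta) in posteriors.items()}
    return max(samples, key=samples.get)

def update(actions, rewards):
    # Binary reward: 1 increments alpha (successes), 0 increments beta (failures)
    for action, reward in zip(actions, rewards):
        posteriors[action][0 if reward == 1 else 1] += 1

rng = random.Random(0)
chosen = [predict(rng) for _ in range(5)]
update(chosen, [1, 0, 1, 1, 0])
```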

class pybandits.smab.SmabBernoulli(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[Beta | SmabZoomingModel], strategy: ClassicBandit)

Bases: BaseSmabBernoulli

Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling.

References

Analysis of Thompson Sampling for the Multi-armed Bandit Problem (Agrawal and Goyal, 2012) http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf

Parameters:
  • actions_manager (SmabActionsManagerSO) – The manager for actions and their associated models.

  • strategy (ClassicBandit) – The strategy used to select actions.

actions_manager: SmabActionsManager[Beta | SmabZoomingModel]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

strategy: ClassicBandit
class pybandits.smab.SmabBernoulliBAI(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[Beta | SmabZoomingModel], strategy: BestActionIdentificationBandit)

Bases: BaseSmabBernoulli

Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling, and Best Action Identification strategy.

References

Analysis of Thompson Sampling for the Multi-armed Bandit Problem (Agrawal and Goyal, 2012) http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf

Parameters:
  • actions_manager (SmabActionsManagerSO) – The manager for actions and their associated models.

  • strategy (BestActionIdentificationBandit) – The strategy used to select actions.

actions_manager: SmabActionsManager[Beta | SmabZoomingModel]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

strategy: BestActionIdentificationBandit
class pybandits.smab.SmabBernoulliCC(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[BetaCC | SmabZoomingModelCC], strategy: CostControlBandit)

Bases: BaseSmabBernoulli

Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling, and Cost Control strategy.

The sMAB is extended to include control of the action cost. Each action is associated with a predefined “cost”. At prediction time, the model considers only the actions whose expected reward is above a predefined lower bound; among these, the action with the lowest associated cost is recommended. The expected reward interval for feasible actions is defined as [(1-subsidy_factor) * max_p, max_p], where max_p is the highest sampled expected reward.
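The selection rule above can be sketched directly (the sampled probabilities, costs, and subsidy_factor below are illustrative values, not the pybandits API):

```python
# One Thompson-sampled expected reward and a predefined cost per action (illustrative)
sampled_p = {"a1": 0.80, "a2": 0.75, "a3": 0.40}
costs = {"a1": 3.0, "a2": 1.0, "a3": 0.5}
subsidy_factor = 0.1

# Feasible interval is [(1 - subsidy_factor) * max_p, max_p]
max_p = max(sampled_p.values())
lower_bound = (1 - subsidy_factor) * max_p

# Among feasible actions, recommend the one with the lowest cost
feasible = [a for a, p in sampled_p.items() if p >= lower_bound]
recommended = min(feasible, key=costs.get)
```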

References

Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints (Daulton et al., 2019) https://arxiv.org/abs/1911.00638

Multi-Armed Bandits with Cost Subsidy (Sinha et al., 2021) https://arxiv.org/abs/2011.01488

Parameters:
  • actions_manager (SmabActionsManagerCC) – The manager for actions and their associated models.

  • strategy (CostControlBandit) – The strategy used to select actions.

actions_manager: SmabActionsManager[BetaCC | SmabZoomingModelCC]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

strategy: CostControlBandit
class pybandits.smab.SmabBernoulliMO(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[BetaMO], strategy: MultiObjectiveBandit)

Bases: BaseSmabBernoulli

Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling, and Multi-Objectives strategy.

The reward pertaining to an action is a multidimensional vector rather than a scalar. In this setting, actions are compared according to the Pareto order between their expected reward vectors; the actions whose expected rewards are not dominated by those of any other action are called Pareto optimal actions, and together they constitute the Pareto front.
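The Pareto order described above is easy to make concrete. A small sketch (expected reward vectors are illustrative; this is not the pybandits implementation):

```python
def dominates(u, v):
    # u Pareto-dominates v: at least as good in every objective, strictly better in one
    return all(ui >= vi for ui, vi in zip(u, v)) and any(ui > vi for ui, vi in zip(u, v))

def pareto_front(expected_rewards):
    # Pareto optimal actions: not dominated by any other action's reward vector
    return [a for a, r in expected_rewards.items()
            if not any(dominates(r2, r) for a2, r2 in expected_rewards.items() if a2 != a)]

rewards = {"a1": (0.9, 0.2), "a2": (0.5, 0.8), "a3": (0.4, 0.1)}
front = pareto_front(rewards)  # "a3" is dominated by "a2"
```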

References

Thompson Sampling for Multi-Objective Multi-Armed Bandits Problem (Yahyaa and Manderick, 2015) https://www.researchgate.net/publication/272823659_Thompson_Sampling_for_Multi-Objective_Multi-Armed_Bandits_Problem

Parameters:
  • actions_manager (SmabActionsManagerMO) – The manager for actions and their associated models.

  • strategy (MultiObjectiveBandit) – The strategy used to select actions.

actions_manager: SmabActionsManager[BetaMO]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

strategy: MultiObjectiveBandit
class pybandits.smab.SmabBernoulliMOCC(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: SmabActionsManager[BetaMOCC], strategy: MultiObjectiveCostControlBandit)

Bases: BaseSmabBernoulli

Stochastic Bernoulli Multi-Armed Bandit with Thompson Sampling for Multi-Objective (MO) and Cost Control (CC) strategy.

This bandit allows the reward to be a multidimensional vector and includes control of the action cost. It merges the Multi-Objective and Cost Control strategies.

Parameters:
  • actions_manager (SmabActionsManagerMOCC) – The manager for actions and their associated models.

  • strategy (MultiObjectiveCostControlBandit) – The strategy used to select actions.

actions_manager: SmabActionsManager[BetaMOCC]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

strategy: MultiObjectiveCostControlBandit

pybandits.cmab

class pybandits.cmab.BaseCmabBernoulli(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: CmabActionsManager, strategy: BaseStrategy)

Bases: BaseMab, ABC

Base model for a Contextual Multi-Armed Bandit for Bernoulli bandits with Thompson Sampling.

Parameters:
  • actions (Dict[ActionId, Union[BaseBayesianLogisticRegression, BaseQuantitativeBayesianNeuralNetwork]]) – The dictionary of possible actions and their associated models.

  • strategy (Strategy) – The strategy used to select actions.

actions_manager: CmabActionsManager
property input_dim: int

Returns the input feature dimension (number of context features).

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseMab__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

predict(context: ndarray, forbidden_actions: Set[ActionId] | None = None) CmabPredictions

Predict actions.

Parameters:
  • context (ArrayLike of shape (n_samples, n_features)) – Matrix of contextual features.

  • forbidden_actions (Optional[Set[ActionId]], default=None) – Set of forbidden actions. If specified, the model discards the forbidden_actions and considers only the remaining allowed_actions. By default, the model considers all actions as allowed_actions. Note that: actions = allowed_actions U forbidden_actions.

Returns:

  • actions (List[ActionId] of shape (n_samples,)) – The actions selected by the multi-armed bandit model.

  • probs (Union[List[Dict[UnifiedActionId, Probability]], List[Dict[UnifiedActionId, MOProbability]]]) – The probabilities of getting a positive reward for each action.

  • ws (Union[List[Dict[UnifiedActionId, float]], List[Dict[UnifiedActionId, List[float]]]]) – The weighted sum of logistic regression logits.

update(actions: List[ActionId], rewards: List[BinaryReward] | List[List[BinaryReward]], context: ndarray, quantities: List[float | List[float] | None] | None = None, actions_memory: List[ActionId] | None = None, rewards_memory: List[BinaryReward] | List[List[BinaryReward]] | None = None, context_memory: ndarray | None = None)

Update the contextual Bernoulli bandit given the list of selected actions and their corresponding binary rewards.

Parameters:
  • actions (List[ActionId] of shape (n_samples,), e.g. ['a1', 'a2', 'a3', 'a4', 'a5']) – The selected action for each sample.

  • rewards (List[Union[BinaryReward, List[BinaryReward]]] of shape (n_samples, n_objectives)) –

    The binary reward for each sample.
    If strategy is not MultiObjectiveBandit, rewards should be a list, e.g.

    rewards = [1, 0, 1, 1, 1, …]

    If strategy is MultiObjectiveBandit, rewards should be a list of lists, e.g. (with n_objectives=2):

    rewards = [[1, 1], [1, 0], [1, 1], [1, 0], [1, 1], …]

  • context (ArrayLike of shape (n_samples, n_features)) – Matrix of contextual features.

  • quantities (Optional[List[Union[float, List[float], None]]]) – The value associated with each action. If None, the value is not used, i.e. the action is non-quantitative.

  • actions_memory (Optional[List[ActionId]]) – List of previously selected actions.

  • rewards_memory (Optional[Union[List[BinaryReward], List[List[BinaryReward]]]]) – List of previously collected rewards.

  • context_memory (Optional[ArrayLike] of shape (n_samples, n_features)) – Matrix of contextual features.
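Conceptually, a contextual prediction round samples model parameters per action, scores the context, and applies a sigmoid; this is where the returned probs and ws come from. A stdlib-only caricature with Gaussian weight posteriors (a simplification of the Bayesian models this class actually uses; names and values are illustrative):

```python
import math
import random

rng = random.Random(0)

# Per-action posterior over a weight vector: (mean vector, shared std), 2 features
posterior = {"a1": ([0.5, -0.2], 0.1), "a2": ([-0.1, 0.4], 0.1)}

def predict(context_row):
    probs, ws = {}, {}
    for action, (mu, sigma) in posterior.items():
        w = [rng.gauss(m, sigma) for m in mu]                        # Thompson draw
        ws[action] = sum(wi * xi for wi, xi in zip(w, context_row))  # weighted sum (logit)
        probs[action] = 1 / (1 + math.exp(-ws[action]))              # sigmoid -> P(reward)
    return max(probs, key=probs.get), probs, ws

action, probs, ws = predict([1.0, 2.0])
```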

classmethod update_old_state(state: Dict[str, str | int | float | bool | None | Dict[str, str | int | float | bool | None | Dict[str, Serializable] | List[Serializable]] | List[str | int | float | bool | None | Dict[str, Serializable] | List[Serializable]]], delta: PositiveProbability | None) Dict[str, str | int | float | bool | None | Dict[str, str | int | float | bool | None | Dict[str, Serializable] | List[Serializable]] | List[str | int | float | bool | None | Dict[str, Serializable] | List[Serializable]]]

Update the model state to the current version. Besides the updates in the MAB class, it also loads legacy Bayesian Logistic Regression model parameters into the new Bayesian Neural Network model.

Parameters:
  • state (Dict[str, Serializable]) – The internal state of a model (actions, strategy, etc.) of the same type. The state is expected to be in an old PyBandits format, from a version below the currently supported one.

  • delta (Optional[PositiveProbability]) – The delta value to be set in the actions_manager. If None, it will not be set. This is relevant only for adaptive window models.

Returns:

state – The updated state of the model. The state is in the current format of PyBandits, with actions_manager and delta added if needed.

Return type:

Dict[str, Serializable]

class pybandits.cmab.BaseCmabBernoulliMO(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: CmabActionsManager[BaseBayesianNeuralNetworkMO], strategy: MultiObjectiveStrategy)

Bases: BaseCmabBernoulli, ABC

Base model for a Contextual Multi-Armed Bandit with Thompson Sampling and Multi-Objective strategy.

Parameters:
  • actions_manager (CmabActionsManager[BaseBayesianNeuralNetworkMO]) – The manager for actions and their associated models.

  • strategy (MultiObjectiveStrategy) – The strategy used to select actions.

actions_manager: CmabActionsManager[BaseBayesianNeuralNetworkMO]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseMab__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

strategy: MultiObjectiveStrategy
class pybandits.cmab.CmabBernoulli(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: CmabActionsManager[BayesianNeuralNetwork | QuantitativeBayesianNeuralNetwork], strategy: ClassicBandit)

Bases: BaseCmabBernoulli

Contextual Bernoulli Multi-Armed Bandit with Thompson Sampling.

References

Thompson Sampling for Contextual Bandits with Linear Payoffs (Agrawal and Goyal, 2014) https://arxiv.org/pdf/1209.3352.pdf

Parameters:
  • actions_manager (CmabActionsManagerSO) – The manager for actions and their associated models.

  • strategy (ClassicBandit) – The strategy used to select actions.

actions_manager: CmabActionsManager[BayesianNeuralNetwork | QuantitativeBayesianNeuralNetwork]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseMab__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

strategy: ClassicBandit
class pybandits.cmab.CmabBernoulliBAI(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: CmabActionsManager[BayesianNeuralNetwork | QuantitativeBayesianNeuralNetwork], strategy: BestActionIdentificationBandit)

Bases: BaseCmabBernoulli

Contextual Bernoulli Multi-Armed Bandit with Thompson Sampling, and Best Action Identification strategy.

References

Analysis of Thompson Sampling for the Multi-armed Bandit Problem (Agrawal and Goyal, 2012) http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf

Parameters:
  • actions_manager (CmabActionsManagerSO) – The manager for actions and their associated models.

  • strategy (BestActionIdentificationBandit) – The strategy used to select actions.

actions_manager: CmabActionsManager[BayesianNeuralNetwork | QuantitativeBayesianNeuralNetwork]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseMab__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

strategy: BestActionIdentificationBandit
class pybandits.cmab.CmabBernoulliCC(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: CmabActionsManager[BayesianNeuralNetworkCC | QuantitativeBayesianNeuralNetworkCC], strategy: CostControlBandit)

Bases: BaseCmabBernoulli

Contextual Bernoulli Multi-Armed Bandit with Thompson Sampling, and Cost Control strategy.

The cMAB is extended to include control of the action cost. Each action is associated with a predefined “cost”. At prediction time, the model considers only the actions whose expected reward is above a predefined lower bound; among these, the action with the lowest associated cost is recommended. The expected reward interval for feasible actions is defined as [(1-subsidy_factor) * max_p, max_p], where max_p is the highest sampled expected reward.

References

Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints (Daulton et al., 2019) https://arxiv.org/abs/1911.00638

Multi-Armed Bandits with Cost Subsidy (Sinha et al., 2021) https://arxiv.org/abs/2011.01488

Parameters:
  • actions_manager (CmabActionsManagerCC) – The manager for actions and their associated models.

  • strategy (CostControlBandit) – The strategy used to select actions.

actions_manager: CmabActionsManager[BayesianNeuralNetworkCC | QuantitativeBayesianNeuralNetworkCC]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseMab__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

strategy: CostControlBandit
class pybandits.cmab.CmabBernoulliMO(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: CmabActionsManager[BayesianNeuralNetworkMO], strategy: MultiObjectiveBandit)

Bases: BaseCmabBernoulliMO

Contextual Multi-Armed Bandit with Thompson Sampling and Multi-Objective strategy.

The reward for an action is a multidimensional vector. Actions are compared using Pareto order between their expected reward vectors. Pareto optimal actions are those not strictly dominated by any other action.

References

Thompson Sampling for Multi-Objective Multi-Armed Bandits Problem (Yahyaa and Manderick, 2015) https://www.researchgate.net/publication/272823659_Thompson_Sampling_for_Multi-Objective_Multi-Armed_Bandits_Problem

Parameters:
  • actions_manager (CmabActionsManagerMO) – The manager for actions and their associated models.

  • strategy (MultiObjectiveBandit) – The strategy used to select actions.

actions_manager: CmabActionsManager[BayesianNeuralNetworkMO]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseMab__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

strategy: MultiObjectiveBandit
class pybandits.cmab.CmabBernoulliMOCC(epsilon: Float_0_1 | None = None, default_action: ActionId | None = None, version: str | None = None, *, actions_manager: CmabActionsManager[BayesianNeuralNetworkMOCC], strategy: MultiObjectiveCostControlBandit)

Bases: BaseCmabBernoulliMO

Contextual Multi-Armed Bandit with Thompson Sampling for Multi-Objective (MO) and Cost Control (CC) strategy.

This bandit allows the reward to be a multidimensional vector and includes control of the action cost, merging Multi-Objective and Cost Control strategies.

Parameters:
  • actions_manager (CmabActionsManagerMOCC) – The manager for actions and their associated models.

  • strategy (MultiObjectiveCostControlBandit) – The strategy used to select actions.

actions_manager: CmabActionsManager[BayesianNeuralNetworkMOCC]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseMab__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

strategy: MultiObjectiveCostControlBandit

pybandits.model

class pybandits.model.BaseBayesianNeuralNetwork(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, model_params: BnnParams, update_method: Literal['VI', 'MCMC'] = 'VI', update_kwargs: dict | None = None, activation: Literal['tanh', 'relu', 'sigmoid', 'gelu'] = 'tanh', use_residual_connections: bool = False, feature_config: FeaturesConfig, random_seed: int | None = None)

Bases: Model, ABC

Bayesian Neural Network model for binary classification.

This class implements a Bayesian Neural Network with an arbitrary number of fully connected layers using NumPyro for binary classification tasks. It supports both Markov Chain Monte Carlo (MCMC) and Variational Inference (VI) methods for posterior inference.

References

Bayesian Learning for Neural Networks (Radford M. Neal, 1995) https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=db869fa192a3222ae4f2d766674a378e47013b1b

Parameters:
  • model_params (BnnParams) – The parameters of the Bayesian Neural Network, including weights and biases for each layer and their initial values for resetting

  • update_method (str, optional) – The method used for posterior inference, either “MCMC” or “VI” (default is “VI”).

  • update_kwargs (Optional[dict], optional) – A dictionary of keyword arguments for the update method. For MCMC, it contains ‘trace’ settings. For VI, it contains ‘fit’ settings and additional parameters like ‘epochs’, ‘optimizer_type’, ‘optimizer_kwargs’, ‘batch_size’, and ‘early_stopping_kwargs’. The ‘epochs’ parameter specifies the number of iterations for VI (maps to ‘step_size’ in numpyro’s API).

  • activation (str, optional) – The activation function to use for hidden layers. Supported values are: “tanh”, “relu”, “sigmoid”, “gelu” (default is “tanh”).

  • use_residual_connections (bool, optional) – Whether to use residual connections in the network. Residual connections are only added when the layer output dimension is greater than or equal to the input dimension (default is False).

  • early_stopping_config (Optional[EarlyStoppingConfig], optional) – Configuration for early stopping during VI training. If None, no early stopping is used (default is None). Only applicable when update_method is “VI”.

Examples

>>> # Create BNN with Student-t priors (default)
>>> bnn = BayesianNeuralNetwork.cold_start(
...     n_features=2,
...     hidden_dim_list=[16, 16],
...     dist_type="studentt",
...     dist_params_init={"mu": 0, "sigma": 1, "nu": 5}
... )
>>> # Create BNN with Normal priors
>>> bnn = BayesianNeuralNetwork.cold_start(
...     n_features=2,
...     hidden_dim_list=[16, 16],
...     dist_type="normal",
...     dist_params_init={"mu": 0, "sigma": 1}
... )

Notes

  • The model uses the specified activation function for hidden layers and sigmoid activation for the output layer.

  • The output layer is designed for binary classification tasks, with probabilities modeled using a Bernoulli likelihood.

  • When use_residual_connections is True, residual connections are added to hidden layers where the output dimension is >= input dimension. For expanding dimensions, the residual is zero-padded.
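The zero-padding rule in the last note can be sketched as follows (an illustration of the stated behaviour, not the internal implementation):

```python
def add_residual(layer_output, layer_input):
    # Residual only when output dim >= input dim; otherwise skip the connection
    if len(layer_output) < len(layer_input):
        return layer_output
    # Expanding dimensions: zero-pad the input before adding
    padded = layer_input + [0.0] * (len(layer_output) - len(layer_input))
    return [o + p for o, p in zip(layer_output, padded)]

expanded = add_residual([0.5, 0.5, 0.5], [1.0, 2.0])  # zero-padded residual
skipped = add_residual([0.5], [1.0, 2.0])             # contracting layer: no residual
```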

class Config

Bases: object

arbitrary_types_allowed = True
activation: Literal['tanh', 'relu', 'sigmoid', 'gelu']
property approx_history: ndarray | None
bias_var_name: ClassVar[str] = 'bias'
check_context_matrix(context: ndarray)

Validate the context input.

Context must be an array-like with numeric values and the correct number of columns. Categorical columns are validated to contain valid integer indices within their vocab range.

Parameters:

context (np.ndarray) – Matrix of contextual features of shape (n_samples, n_cols).

classmethod cold_start(n_features: Annotated[int, Gt(gt=0)], hidden_dim_list: List[Annotated[int, Gt(gt=0)]] | None = None, update_method: Literal['VI', 'MCMC'] = 'VI', update_kwargs: dict | None = None, dist_type: Literal['normal', 'studentt'] = 'studentt', dist_params_init: Dict[str, float] | None = None, activation: Literal['tanh', 'relu', 'sigmoid', 'gelu'] = 'tanh', use_residual_connections: bool = False, use_layerwise_scaling: bool = False, categorical_features: Dict[Annotated[int, Ge(ge=0)], Annotated[int, Ge(ge=0)]] | None = None, random_seed: int | None = None, **kwargs) Self

Initialize a Bayesian Neural Network with a cold start.

Parameters:
  • n_features (PositiveInt) – Total number of columns in the context array, including any categorical columns.

  • hidden_dim_list (Optional[List[PositiveInt]], optional) – List of dimensions for the hidden layers of the network. If None, no hidden layers are added.

  • update_method (UpdateMethods) – Method to update the network, either “MCMC” or “VI”. Default is “VI”.

  • update_kwargs (Optional[dict], optional) – Additional keyword arguments for the update method. Default is None.

  • dist_type (Literal["normal", "studentt"]) – Type of distribution to use for priors. Default is “studentt”.

  • dist_params_init (Optional[Dict[str, float]], optional) – Initial distribution parameters for the network weights and biases. Default is None. For Student-t distributions: requires “mu”, “sigma”, and “nu” parameters. For Normal distributions: requires “mu” and “sigma” parameters (no “nu” needed).

  • activation (str) – The activation function to use for hidden layers. Supported values are: “tanh”, “relu”, “sigmoid”, “gelu” (default is “tanh”).

  • use_residual_connections (bool) – Whether to use residual connections in the network (default is False).

  • use_layerwise_scaling (bool) – Whether to use layerwise scaling in the network (default is False). When applied, the sigma is scaled by the square root of the input dimension. This is useful to enable smoother convergence with Gaussian Process-like behavior.

  • categorical_features (Optional[Dict[int, int]], optional) – Categorical columns as {column_index: cardinality}. Each categorical column is modelled with a Bayesian embedding matrix; embedding_dim is set automatically to ceil(cardinality / _embedding_dim_divisor). Columns absent from this dict are treated as numerical.

  • random_seed (Optional[int], optional) – Seed for the JAX PRNG key. If None, a seed is drawn from OS entropy at construction time and stored on the instance, so the same initial key is reproduced after serialization. Pass an explicit integer for fully reproducible runs.

  • **kwargs – Additional keyword arguments for the BayesianNeuralNetwork constructor.

Returns:

An instance of the Bayesian Neural Network initialized with the specified parameters.

Return type:

Self

classmethod create_model_params(feature_config: FeaturesConfig, hidden_dim_list: List[Annotated[int, Gt(gt=0)]] | None, use_layerwise_scaling: bool = False, dist_class: type[BaseLocationScaleArray] = StudentTArray, **dist_params_init) BnnParams

Create model parameters for a Bayesian neural network (BNN) according to dist_params_init. This method initializes the distribution parameters for each layer of a BNN using the specified feature configuration, hidden dimensions, and distribution initialization parameters.

Parameters:
  • feature_config (FeaturesConfig) – Full input layout description. First-layer input dimension is feature_config.total_output_dim. EmbeddingParams are created when feature_config.categorical_features_configs is non-empty.

  • hidden_dim_list (Optional[List[PositiveInt]]) – Number of hidden units per hidden layer. If None, no hidden layers are added.

  • use_layerwise_scaling (bool) – Whether to use layerwise scaling in the network (default is False).

  • dist_class (type) – The distribution class to use for weights, biases, and embeddings, by default StudentTArray.

  • **dist_params_init (dict, optional) – Additional parameters for initializing the distribution of weights and biases.

Returns:

An instance of BnnParams containing the initialized layer parameters.

Return type:

BnnParams

create_update_model(batch_size: Annotated[int, Gt(gt=0)] | None = None) Callable

Create a NumPyro model function for Bayesian Neural Network.

This method builds a NumPyro model function with the network architecture specified in model_params. Data is passed as arguments to the returned model function. Minibatching is handled via numpyro.plate with subsample_size.

Numerical columns are passed through as-is. Categorical columns (identified by their column_index in feature_config) are modelled with Bayesian embedding matrices sampled as NumPyro random variables.

Parameters:

batch_size (Optional[PositiveInt]) – If provided, use minibatching with this batch size via numpyro.plate.

Returns:

NumPyro model function with the specified neural network architecture

Return type:

Callable

Notes

The model structure follows these steps:

  1. For each layer, create weight and bias variables from prior distributions.
  2. Sample embedding matrices for categorical features (if any).
  3. Apply linear transformations and activations through the layers.
  4. Apply a sigmoid activation at the output.
  5. Use a Bernoulli likelihood for binary classification.

static extract_sample(sampled_weights: List[Tuple[ndarray, ndarray]], sampled_embeddings: List[ndarray] | None, sample_idx: Annotated[int, Ge(ge=0)]) Tuple[List[Tuple[ndarray, ndarray]], List[ndarray] | None]

Extract the weights, biases, and embeddings for a specific sample.

Parameters:
  • sampled_weights (List[Tuple[np.ndarray, np.ndarray]]) – List of (weights, biases) per layer. Each weights has shape (n_samples, input_dim, output_dim), biases (n_samples, output_dim).

  • sampled_embeddings (Optional[List[np.ndarray]]) – Pre-sampled embedding vectors, one per categorical feature, each of shape (n_samples, emb_dim). None when no categorical features.

  • sample_idx (NonNegativeInt) – The index of the sample to extract.

Returns:

(weights_idx, embeddings_idx) sliced to a single sample (batch dim = 1).

Return type:

Tuple[List[Tuple[np.ndarray, np.ndarray]], Optional[List[np.ndarray]]]
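The slicing can be pictured with plain nested lists standing in for arrays (shapes follow the docstring; this is an illustration, not the actual implementation):

```python
# One layer, two posterior samples: weights (n_samples, in_dim, out_dim), biases (n_samples, out_dim)
sampled_weights = [(
    [[[0.1, 0.2]], [[0.3, 0.4]]],  # weights: n_samples=2, in_dim=1, out_dim=2
    [[0.0, 0.0], [1.0, 1.0]],      # biases:  n_samples=2, out_dim=2
)]

def extract_sample(sampled_weights, sample_idx):
    # Slice out one posterior sample per layer, keeping a singleton batch dimension
    return [([w[sample_idx]], [b[sample_idx]]) for w, b in sampled_weights]

single = extract_sample(sampled_weights, 1)
```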

feature_config: FeaturesConfig
forward_pass(sampled_weights: List[Tuple[ndarray, ndarray]], context: ndarray, sampled_embeddings: List[ndarray] | None = None) List[Tuple[Probability, float]]

Apply the neural network forward pass using pre-sampled weights, biases, and embeddings.

All stochastic parameters must be sampled externally (via sample_weights and sample_embeddings) before calling this method.

Parameters:
  • sampled_weights (List[Tuple[np.ndarray, np.ndarray]]) – List of (weights, biases) per layer from sample_weights. Each weights has shape (n_samples, input_dim, output_dim), biases (n_samples, output_dim).

  • context (np.ndarray) – Context matrix, shape (n_samples, feature_config.n_features). Categorical columns contain integer indices into the embedding vocabulary.

  • sampled_embeddings (Optional[List[np.ndarray]]) – Pre-sampled embedding vectors from sample_embeddings, one array per categorical feature, each of shape (n_samples, emb_dim). None when the model has no categorical features.

Returns:

Each element is (probability, weighted_sum) per sample.

Return type:

List[ProbabilityWeight]
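Because each context row carries its own weight draw, the forward pass is a batched per-sample matrix product rather than a single shared multiplication. A minimal numpy sketch of this idea, assuming one tanh hidden layer and no residual connections or embeddings:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, h = 6, 4, 3
context = rng.normal(size=(n, d))

# one weight/bias draw per sample, matching the shapes from sample_weights
w1, b1 = rng.normal(size=(n, d, h)), rng.normal(size=(n, h))
w2, b2 = rng.normal(size=(n, h, 1)), rng.normal(size=(n, 1))

# row i of the context is multiplied by its own weight matrix w1[i]
hidden = np.tanh(np.einsum("ni,nio->no", context, w1) + b1)
weighted_sum = (np.einsum("ni,nio->no", hidden, w2) + b2).ravel()
proba = 1.0 / (1.0 + np.exp(-weighted_sum))

# one (probability, weighted_sum) pair per sample
result = list(zip(proba, weighted_sum))
```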

classmethod get_embedding_var_name(feat_index: int) str

Return the NumPyro variable name for a categorical embedding matrix.

classmethod get_layer_params_name(layer_ind: Annotated[int, Gt(gt=0)]) Tuple[str, str]
property hidden_dim_list: List[int]

Returns the hidden layer dimensions of the model.

Returns:

Output dimension of each layer except the final output layer. Empty list when no hidden layers are present.

Return type:

List[int]

property input_dim: Annotated[int, Gt(gt=0)]

Returns the number of raw context columns expected by the model.

Returns:

Equal to feature_config.n_features: the number of columns the context numpy array must have. For categorical models this differs from the post-embedding dimension (feature_config.total_output_dim).

Return type:

PositiveInt

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_params: BnnParams
model_post_init(_BaseBayesianNeuralNetwork__context: Any) None

Initialize activation function PrivateAttr based on the activation setting.

random_seed: int | None
sample_embeddings(context: ndarray) List[ndarray] | None

Sample embedding vectors for each categorical feature given the context.

For each categorical feature, extracts the integer indices from the context and samples from the corresponding embedding distribution at those indices.

Parameters:

context (np.ndarray) – Context matrix, shape (n_samples, feature_config.n_features). Categorical columns contain integer indices into the embedding vocabulary.

Returns:

One array per categorical feature, each of shape (n_samples, emb_dim). None when the model has no categorical features.

Return type:

Optional[List[np.ndarray]]

sample_proba(context: ndarray) List[Tuple[Probability, float]]

Samples probabilities and logits from the prior predictive distribution.

Parameters:

context (np.ndarray) – The context matrix for which the probabilities are to be sampled.

Returns:

Each element is a tuple containing the probability of a positive reward and the network logit.

Return type:

List[ProbabilityWeight]

sample_weights(n_samples: Annotated[int, Gt(gt=0)]) List[Tuple[ndarray, ndarray]]

Sample weights and biases for each sample and each layer.

Parameters:

n_samples (PositiveInt) – The number of samples (users) to draw weights for. Must be positive.

Returns:

A list of length num_layers, where each element is (weights, biases) for that layer:
  • weights shape: (n_samples, input_dim, output_dim)

  • biases shape: (n_samples, output_dim)

Return type:

List[Tuple[np.ndarray, np.ndarray]]

transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ('update_method',)

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures', 'model_params')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ('activation', 'use_residual_connections')

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

update_kwargs: dict | None
update_method: Literal['VI', 'MCMC']
use_residual_connections: bool
classmethod validate_activation(v)
weight_var_name: ClassVar[str] = 'weight'
class pybandits.model.BaseBayesianNeuralNetworkMO(*, models: Annotated[list[BayesianNeuralNetwork], Len(min_length=1, max_length=None)])

Bases: ModelMO, ABC

Base class for Bayesian Neural Network with multi-objective.

Parameters:

models (List[BayesianNeuralNetwork]) – The list of Bayesian Neural Network models for each objective.

classmethod cold_start(n_objectives: Annotated[int, Gt(gt=0)], n_features: Annotated[int, Gt(gt=0)], hidden_dim_list: List[Annotated[int, Gt(gt=0)]] | None = None, update_method: Literal['VI', 'MCMC'] = 'VI', update_kwargs: dict | None = None, dist_type: Literal['normal', 'studentt'] = 'studentt', dist_params: Dict[str, float] | None = None, activation: Literal['tanh', 'relu', 'sigmoid', 'gelu'] = 'tanh', use_residual_connections: bool = False, use_layerwise_scaling: bool = False, **kwargs) Self

Initialize a multi-objective Bayesian Neural Network with a cold start.

Parameters:
  • n_objectives (PositiveInt) – Number of objectives (models) to create.

  • n_features (PositiveInt) – Number of input features for each network.

  • hidden_dim_list (Optional[List[PositiveInt]], optional) – List of dimensions for the hidden layers of each network.

  • update_method (UpdateMethods) – Method to update the networks.

  • update_kwargs (Optional[dict], optional) – Additional keyword arguments for the update method.

  • dist_type (Literal["normal", "studentt"]) – Type of distribution to use for priors. Default is “studentt”.

  • dist_params (Optional[Dict[str, float]], optional) – Initial distribution parameters for the network weights and biases.

  • activation (str) – The activation function to use for hidden layers. Supported values are: “tanh”, “relu”, “sigmoid”, “gelu” (default is “tanh”).

  • use_residual_connections (bool) – Whether to use residual connections in the network (default is False).

  • use_layerwise_scaling (bool) – Whether to use layerwise scaling in the network (default is False).

  • **kwargs – Additional keyword arguments.

Returns:

A multi-objective BNN with the specified number of objectives.

Return type:

BayesianNeuralNetworkMO

property hidden_dim_list: List[int]

Returns the hidden layer dimensions of the model.

Returns:

The output dimension of each layer except the last, derived from the shape of the weight matrices in the layer parameters.

Return type:

List[int]

property input_dim: Annotated[int, Gt(gt=0)]

Returns the expected input dimension of the model.

Returns:

The number of input features expected by the model, derived from the shape of the weight matrix in the first layer’s parameters of the first objective model.

Return type:

PositiveInt

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseBayesianNeuralNetworkMO__context: Any) None

Validate that all models have the same number of features.

models: Annotated[list[BayesianNeuralNetwork], Len(min_length=1, max_length=None)]
class pybandits.model.BaseBeta(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1)

Bases: Model, ABC

Beta Distribution model for Bernoulli multi-armed bandits.

Parameters:
  • n_successes (PositiveInt = 1) – Counter of the number of successes.

  • n_failures (PositiveInt = 1) – Counter of the number of failures.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

sample_proba(n_samples: Annotated[int, Gt(gt=0)]) List[Probability]

Sample the probability of getting a positive reward.

Parameters:

n_samples (PositiveInt) – Number of probability values to sample.

Returns:

probs – Sampled probabilities of getting a positive reward, one per sample.

Return type:

List[Probability]

property std: float

The corrected standard deviation (Bessel’s correction) of the binary distribution of successes and failures.
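Both members have direct numpy equivalents. Thompson sampling draws from Beta(n_successes, n_failures), and the corrected standard deviation is the ddof=1 sample std of the implied 0/1 outcome sequence (assuming that is what "Bessel's correction" refers to here):

```python
import numpy as np

rng = np.random.default_rng(3)
n_successes, n_failures = 8, 2

# Thompson sampling draw: one reward probability per requested sample
probs = rng.beta(n_successes, n_failures, size=5)

# corrected (ddof=1) standard deviation of the implied 0/1 outcomes
outcomes = np.array([1] * n_successes + [0] * n_failures)
std = outcomes.std(ddof=1)
```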

transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

class pybandits.model.BaseBetaMO(*, models: Annotated[list[Beta], Len(min_length=1, max_length=None)])

Bases: ModelMO, ABC

Base beta Distribution model for Bernoulli multi-armed bandits with multi-objectives.

Parameters:

models (List[Beta] of length (n_objectives,)) – List of Beta distributions.

classmethod cold_start(n_objectives: Annotated[int, Gt(gt=0)], **kwargs) BetaMO

Utility function to create a multi-objective Beta model, or a child model with cost control, with default parameters.

Parameters:
  • n_objectives (PositiveInt) – The number of objectives, i.e. the number of Beta distributions to create.

  • kwargs (Dict[str, Any]) – Additional arguments for the Beta child model.

Returns:

beta_mo – The multi-objective Beta model.

Return type:

BetaMO

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

models: Annotated[list[Beta], Len(min_length=1, max_length=None)]
class pybandits.model.BaseLocationScaleArray(*, mu: List[float] | List[List[float]], sigma: List[Annotated[float, Ge(ge=0)]] | List[List[Annotated[float, Ge(ge=0)]]])

Bases: PyBanditsBaseModel, ABC

Abstract base class for location-scale distribution arrays used in Bayesian Neural Networks.

Parameters:
  • mu (Union[List[float], List[List[float]]]) – The mean values of the distributions. Can be a 1D (for the layer bias term) or 2D list (for the layer weight term).

  • sigma (Union[List[NonNegativeFloat], List[List[NonNegativeFloat]]]) – The scale (standard deviation) values of the distributions. Must be non-negative. Can be a 1D or 2D list.

classmethod cold_start(shape: Annotated[int, Gt(gt=0)] | Tuple[Annotated[int, Gt(gt=0)], ...], mu: float = 0.0, sigma: Annotated[float, Ge(ge=0)] = 10.0, use_layerwise_scaling: bool = False, **kwargs) BaseLocationScaleArray

Template method for cold start initialization.

Common logic for shape normalization, validation, and parameter array creation is handled here. Subclasses override _get_distribution_specific_params to provide distribution-specific parameters.

Parameters:
  • shape (Union[PositiveInt, Tuple[PositiveInt, ...]]) – Dimensions of the distribution array.

  • mu (float) – Mean of the distribution, by default 0.0.

  • sigma (NonNegativeFloat) – Standard deviation of the distribution, by default 10.0.

  • use_layerwise_scaling (bool) – Whether to use layerwise scaling in the network (default is False). When applied, the sigma is scaled by the square root of the input dimension. This is useful to enable smoother convergence with Gaussian Process-like behavior.

  • **kwargs – Additional keyword arguments for distribution-specific parameters (e.g., nu for StudentTArray).

Returns:

An instance of the distribution array with the specified parameters.

Return type:

BaseLocationScaleArray

Raises:

ValueError – If shape has empty dimensions.
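The layerwise-scaling option can be sketched in a few lines. The sketch assumes that "scaled by the square root of the input dimension" means dividing sigma by sqrt(fan-in), the usual convention for keeping pre-activation variance roughly constant in the Gaussian Process limit:

```python
import numpy as np

# prior over a (input_dim, output_dim) weight matrix
input_dim, output_dim = 16, 4
base_sigma = 10.0

# layerwise scaling: shrink sigma by sqrt(fan-in) so the pre-activation
# variance stays roughly constant as the layer widens (GP-like behavior)
scaled_sigma = base_sigma / np.sqrt(input_dim)

mu = np.zeros((input_dim, output_dim))
sigma = np.full((input_dim, output_dim), scaled_sigma)
```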

static maybe_convert_list_to_array(input_list: List[float] | List[List[float]]) ndarray

Convert a list or list of lists to a numpy array.

Parameters:

input_list (Union[List[float], List[List[float]]]) – Input list to convert.

Returns:

Converted numpy array.

Return type:

np.ndarray

Raises:

ValueError – If the input list is not a valid 1D or 2D list.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseLocationScaleArray__context: Any) None

Initialize private numpy array attributes by converting lists to arrays once at initialization.

Parameters:

__context (Any) – Pydantic context (unused).

mu: List[float] | List[List[float]]
param_map: ClassVar[Dict[str, str]] = {'mu': 'loc', 'sigma': 'scale'}
property params: Dict[str, ndarray]

Get the parameters as a dictionary of numpy arrays.

Returns:

Dictionary containing ‘mu’ and ‘sigma’ as numpy arrays.

Return type:

Dict[str, np.ndarray]

sample_at_indices(indices: List[Annotated[int, Ge(ge=0)]] | ndarray) ndarray

Sample one row-vector per entry in indices from a 2-D distribution matrix.

For each i, draws independently from the distribution at row indices[i]. This is equivalent to sample_rvs(size=(len(indices), *row_shape))[np.arange(len(indices)), indices] but allocates only O(len(indices) × ncols) memory instead of O(len(indices) × nrows × ncols).

Parameters:

indices (Union[List[NonNegativeInt], np.ndarray] of shape (n,) with dtype int) – Row indices to sample from.

Returns:

Sampled instances.

Return type:

np.ndarray of shape (n, ncols)
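The memory-saving trick described above amounts to gathering only the requested rows of the parameter matrices before sampling. A numpy sketch assuming a Normal distribution:

```python
import numpy as np

rng = np.random.default_rng(4)
nrows, ncols = 100, 8
mu = rng.normal(size=(nrows, ncols))
sigma = np.abs(rng.normal(size=(nrows, ncols))) + 1e-6

indices = np.array([3, 3, 42, 7])

# gather only the needed rows, then draw one variate per entry:
# O(len(indices) * ncols) memory instead of sampling the full matrix
samples = rng.normal(loc=mu[indices], scale=sigma[indices])
```

Note that a repeated index (here, row 3 twice) yields two independent draws, matching the "draws independently from the distribution at row indices[i]" semantics.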

sample_rvs(size: Tuple[int, ...]) ndarray

Sample random variates from this distribution.

Parameters:

size (Tuple[int, ...]) – Shape of the output array.

Returns:

Array of sampled values.

Return type:

np.ndarray

property shape: Tuple[Annotated[int, Gt(gt=0)], ...]

Get the shape of the mu array.

Returns:

The shape of the mu array.

Return type:

Tuple[PositiveInt, …]

sigma: List[Annotated[float, Ge(ge=0)]] | List[List[Annotated[float, Ge(ge=0)]]]
to_numpyro_distribution() Distribution

Create a NumPyro distribution from this prior distribution array.

Maps internal parameter names (mu, sigma, nu) to NumPyro parameter names (loc, scale, df) using the subclass-defined param_map.

Returns:

A NumPyro distribution instance.

Return type:

npdist.Distribution
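The param_map lookup is a plain key rename. The sketch below stops short of importing numpyro; the resulting keyword dictionary is what would be unpacked into a constructor such as numpyro.distributions.StudentT(df=..., loc=..., scale=...):

```python
import numpy as np

# subclass-defined mapping from internal names to NumPyro parameter names
param_map = {"mu": "loc", "sigma": "scale", "nu": "df"}

# internal StudentT parameters, as stored on a StudentTArray
params = {"mu": np.zeros(3), "sigma": np.full(3, 10.0), "nu": np.full(3, 5.0)}

# rename to the keyword names a NumPyro distribution expects
numpyro_kwargs = {param_map[name]: value for name, value in params.items()}
```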

classmethod validate_input_shapes(values)

Validate that all array-like parameters have the same shape.

Parameters:

values (dict or BaseLocationScaleArray instance) – Dictionary of field values or an already-instantiated object.

Returns:

Validated values dictionary or the object itself if already instantiated.

Return type:

dict or BaseLocationScaleArray instance

Raises:

ValueError – If array-like parameters have different shapes or empty dimensions.

with_dist_parameters(**kwargs) BaseLocationScaleArray

Create a new instance with updated distribution parameters.

Parameters:

**kwargs – Parameters to update (e.g., mu, sigma, nu for StudentTArray). If empty, returns self unchanged.

Returns:

A new instance with the updated parameters.

Return type:

BaseLocationScaleArray

class pybandits.model.BayesianLogisticRegression(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, model_params: BnnParams, update_method: Literal['VI', 'MCMC'] = 'VI', update_kwargs: dict | None = None, activation: Literal['tanh', 'relu', 'sigmoid', 'gelu'] = 'tanh', use_residual_connections: bool = False, feature_config: FeaturesConfig, random_seed: int | None = None)

Bases: BayesianNeuralNetwork

A Bayesian Logistic Regression model that inherits from BayesianNeuralNetwork. This model is a specialized version of a Bayesian Neural Network with a single layer, designed specifically for logistic regression tasks. The model parameters are validated to ensure that the model adheres to this single-layer constraint.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseBayesianNeuralNetwork__context: Any) None

Initialize activation function PrivateAttr based on the activation setting.

transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ('update_method',)

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures', 'model_params')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ('activation', 'use_residual_connections')

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

classmethod validate_model_params(model_params)
class pybandits.model.BayesianLogisticRegressionCC(*, cost: Annotated[float, Ge(ge=0)], n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, model_params: BnnParams, update_method: Literal['VI', 'MCMC'] = 'VI', update_kwargs: dict | None = None, activation: Literal['tanh', 'relu', 'sigmoid', 'gelu'] = 'tanh', use_residual_connections: bool = False, feature_config: FeaturesConfig, random_seed: int | None = None)

Bases: BayesianLogisticRegression, ModelCC

A Bayesian Logistic Regression model with cost control.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseBayesianNeuralNetwork__context: Any) None

Initialize activation function PrivateAttr based on the activation setting.

transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ('update_method',)

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures', 'model_params')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ('activation', 'use_residual_connections')

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

class pybandits.model.BayesianNeuralNetwork(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, model_params: BnnParams, update_method: Literal['VI', 'MCMC'] = 'VI', update_kwargs: dict | None = None, activation: Literal['tanh', 'relu', 'sigmoid', 'gelu'] = 'tanh', use_residual_connections: bool = False, feature_config: FeaturesConfig, random_seed: int | None = None)

Bases: BaseBayesianNeuralNetwork

Bayesian Neural Network class. This class implements a Bayesian Neural Network by extending the BaseBayesianNeuralNetwork. It provides functionality for probabilistic modeling and inference using neural networks.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseBayesianNeuralNetwork__context: Any) None

Initialize activation function PrivateAttr based on the activation setting.

transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ('update_method',)

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures', 'model_params')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ('activation', 'use_residual_connections')

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

class pybandits.model.BayesianNeuralNetworkCC(*, cost: Annotated[float, Ge(ge=0)], n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, model_params: BnnParams, update_method: Literal['VI', 'MCMC'] = 'VI', update_kwargs: dict | None = None, activation: Literal['tanh', 'relu', 'sigmoid', 'gelu'] = 'tanh', use_residual_connections: bool = False, feature_config: FeaturesConfig, random_seed: int | None = None)

Bases: BaseBayesianNeuralNetwork, ModelCC

Bayesian Neural Network model for binary classification with cost constraint.

This class implements a Bayesian Neural Network with an arbitrary number of fully connected layers using NumPyro for binary classification tasks. It supports both Markov Chain Monte Carlo (MCMC) and Variational Inference (VI) methods for posterior inference.

Parameters:
  • model_params (BnnParams) – The parameters of the Bayesian Neural Network, including weights and biases for each layer and their initial values for resetting.

  • update_method (str, optional) – The method used for posterior inference, either “MCMC” or “VI” (default is “VI”).

  • update_kwargs (Optional[dict], optional) – A dictionary of keyword arguments for the update method. For MCMC, it contains ‘trace’ settings. For VI, it contains both ‘trace’ and ‘fit’ settings.

  • cost (NonNegativeFloat) – Cost associated to the Bayesian Neural Network model.

Notes

  • The model applies the configured activation function (tanh by default) to hidden layers and a sigmoid activation to the output layer.

  • The output layer is designed for binary classification tasks, with probabilities modeled using a Bernoulli likelihood.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseBayesianNeuralNetwork__context: Any) None

Initialize activation function PrivateAttr based on the activation setting.

transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ('update_method',)

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures', 'model_params')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ('activation', 'use_residual_connections')

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

class pybandits.model.BayesianNeuralNetworkMO(*, models: Annotated[list[BayesianNeuralNetwork], Len(min_length=1, max_length=None)])

Bases: BaseBayesianNeuralNetworkMO

Bayesian Neural Network model for multi-objective.

Parameters:

models (List[BayesianNeuralNetwork]) – The list of Bayesian Neural Network models for each objective.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pybandits.model.BayesianNeuralNetworkMOCC(*, cost: Annotated[float, Ge(ge=0)], models: Annotated[list[BayesianNeuralNetwork], Len(min_length=1, max_length=None)])

Bases: BaseBayesianNeuralNetworkMO, ModelMO, ModelCC

Bayesian Neural Network model for multi-objective with cost control.

Parameters:
  • models (List[BayesianNeuralNetwork]) – The list of Bayesian Neural Network models for each objective.

  • cost (NonNegativeFloat) – Cost associated to the Bayesian Neural Network model.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pybandits.model.Beta(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1)

Bases: BaseBeta

Beta Distribution model for Bernoulli multi-armed bandits.

Parameters:
  • n_successes (PositiveInt = 1) – Counter of the number of successes.

  • n_failures (PositiveInt = 1) – Counter of the number of failures.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

class pybandits.model.BetaCC(*, cost: Annotated[float, Ge(ge=0)], n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1)

Bases: BaseBeta, ModelCC

Beta Distribution model for Bernoulli multi-armed bandits with cost control.

Parameters:
  • n_successes (PositiveInt = 1) – Counter of the number of successes.

  • n_failures (PositiveInt = 1) – Counter of the number of failures.

  • cost (NonNegativeFloat) – Cost associated to the Beta distribution.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

class pybandits.model.BetaMO(*, models: Annotated[list[Beta], Len(min_length=1, max_length=None)])

Bases: BaseBetaMO

Beta Distribution model for Bernoulli multi-armed bandits with multi-objectives.

Parameters:

models (List[Beta] of length (n_objectives,)) – List of Beta distributions.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pybandits.model.BetaMOCC(*, cost: Annotated[float, Ge(ge=0)], models: Annotated[list[Beta], Len(min_length=1, max_length=None)])

Bases: BaseBetaMO, ModelCC

Beta Distribution model for Bernoulli multi-armed bandits with multi-objectives and cost control.

Parameters:
  • models (List[Beta] of length (n_objectives,)) – List of Beta distributions, one per objective.

  • cost (NonNegativeFloat) – Cost associated to the Beta distribution.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pybandits.model.BnnLayerParams(*, weight: NormalArray | StudentTArray, bias: StudentTArray | NormalArray)

Bases: PyBanditsBaseModel

Represents the parameters of a Bayesian neural network (BNN) layer.

Parameters:
  • weight (Union[NormalArray, StudentTArray]) – The weight parameter of the BNN layer, represented as either a NormalArray or StudentTArray.

  • bias (Union[StudentTArray, NormalArray]) – The bias parameter of the BNN layer, represented as either a StudentTArray or NormalArray.

bias: StudentTArray | NormalArray
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

weight: NormalArray | StudentTArray
class pybandits.model.BnnParams(*, bnn_layer_params: ~typing.List[~pybandits.model.BnnLayerParams] | None, bnn_layer_params_init: ~typing.List[~pybandits.model.BnnLayerParams] = <factory>, embedding_params: ~pybandits.model.EmbeddingParams | None = None, embedding_params_init: ~pybandits.model.EmbeddingParams | None = None)

Bases: PyBanditsBaseModel

Represents the parameters of a Bayesian Neural Network (BNN), including both the current layer parameters and the initial layer parameters. We keep the init parameters in case we need to reset the model.

Parameters:
  • bnn_layer_params (List[BnnLayerParams]) – A list of BNN layer parameters representing the current state of the model.

  • bnn_layer_params_init (List[BnnLayerParams]) – A list of BNN layer parameters representing the initial state of the model.

  • embedding_params (Optional[EmbeddingParams]) – Bayesian embedding matrices for categorical features. None when no categorical features are configured.

  • embedding_params_init (Optional[EmbeddingParams]) – Frozen copy of the initial embedding parameters for resetting. Set automatically.

bnn_layer_params: List[BnnLayerParams] | None
bnn_layer_params_init: List[BnnLayerParams]
embedding_params: EmbeddingParams | None
embedding_params_init: EmbeddingParams | None
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod validate_inputs(values)
class pybandits.model.CategoricalFeatureConfig(*, column_index: Annotated[int, Ge(ge=0)], cardinality: Annotated[int, Gt(gt=0)], embedding_dim: Annotated[int, Gt(gt=0)])

Bases: PyBanditsBaseModel

Configuration for a single categorical feature with Bayesian embedding.

The caller is responsible for pre-encoding categorical values as integer indices in the range [0, cardinality).

Parameters:
  • column_index (NonNegativeInt) – Column position of this feature in the input numpy array.

  • cardinality (PositiveInt) – Number of distinct integer category codes. The context array must contain pre-encoded integer indices in the range [0, cardinality).

  • embedding_dim (PositiveInt) – Dimensionality of the embedding vector for this feature.

cardinality: Annotated[int, Gt(gt=0)]
column_index: Annotated[int, Ge(ge=0)]
embedding_dim: Annotated[int, Gt(gt=0)]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pybandits.model.EarlyStopping(*, patience: Annotated[int, Gt(gt=0)] = 10, tolerance: Annotated[float, Gt(gt=0)] = 0.0001, diff_type: Literal['relative', 'absolute'] = 'relative')

Bases: PyBanditsBaseModel

Early stopping monitor for SVI training.

Monitors loss convergence and signals when training should stop. Stops after patience consecutive epochs where the loss change is below tolerance.

Parameters:
  • patience (PositiveInt) – Number of consecutive non-improving epochs required before stopping.

  • tolerance (PositiveFloat) – Threshold for convergence.

  • diff_type (Literal["relative", "absolute"]) – Type of difference to check: “relative” or “absolute”.

diff_type: Literal['relative', 'absolute']
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

patience: Annotated[int, Gt(gt=0)]
reset() None

Reset early stopping state for a new training run.

should_stop(loss: float) bool

Check if training should stop based on loss convergence.

tolerance: Annotated[float, Gt(gt=0)]
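The stopping rule documented above (stop after `patience` consecutive epochs whose loss change falls below `tolerance`) can be sketched in plain Python. This is a minimal re-implementation of the documented behaviour for illustration, not the library's class:

```python
class EarlyStoppingSketch:
    """Minimal re-implementation of the documented stopping rule."""

    def __init__(self, patience=10, tolerance=1e-4, diff_type="relative"):
        self.patience = patience
        self.tolerance = tolerance
        self.diff_type = diff_type
        self.reset()

    def reset(self):
        # Forget previous losses before a new training run.
        self._prev_loss = None
        self._n_flat_epochs = 0

    def should_stop(self, loss):
        if self._prev_loss is not None:
            diff = abs(loss - self._prev_loss)
            if self.diff_type == "relative":
                diff /= max(abs(self._prev_loss), 1e-12)
            # Count consecutive epochs with change below tolerance.
            self._n_flat_epochs = self._n_flat_epochs + 1 if diff < self.tolerance else 0
        self._prev_loss = loss
        return self._n_flat_epochs >= self.patience


monitor = EarlyStoppingSketch(patience=3, tolerance=1e-3, diff_type="absolute")
losses = [1.0, 0.5, 0.4999, 0.4998, 0.4997, 0.4996]
stopped_at = next((i for i, l in enumerate(losses) if monitor.should_stop(l)), None)
# The loss plateaus from index 2 on, so the third flat epoch is index 4.
```

With `patience=3`, the monitor fires on the third consecutive flat epoch; calling `reset()` makes the same instance reusable across training runs.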
class pybandits.model.EmbeddingParams(*, embeddings: ~typing.List[~pybandits.model.StudentTArray | ~pybandits.model.NormalArray], embeddings_init: ~typing.List[~pybandits.model.StudentTArray | ~pybandits.model.NormalArray] = <factory>)

Bases: PyBanditsBaseModel

Stores Bayesian embedding matrices for all categorical features.

Each embedding matrix has shape (cardinality, embedding_dim) and is stored as a BaseLocationScaleArray (StudentTArray or NormalArray) — the same representation used for layer weights in BnnLayerParams.

Parameters:
  • embeddings (List[Union[StudentTArray, NormalArray]]) – Ordered list of embedding matrix distributions, matching the order of FeaturesConfig.categorical_features_configs. Shape of each matrix: (cardinality, embedding_dim).

  • embeddings_init (List[Union[StudentTArray, NormalArray]]) – Frozen copy of the initial embeddings for resetting. Set automatically.

classmethod cold_start(feature_config: ~pybandits.model.FeaturesConfig, dist_class: type[~pybandits.model.BaseLocationScaleArray] = <class 'pybandits.model.StudentTArray'>, **dist_params_init) EmbeddingParams

Create EmbeddingParams with prior distributions for all categorical features.

Parameters:
  • feature_config (FeaturesConfig) – Feature configuration containing categorical feature specs.

  • dist_class (type) – The distribution class to use for embedding priors, by default StudentTArray.

  • **dist_params_init – Distribution parameters passed to BaseLocationScaleArray.cold_start (e.g. mu, sigma, nu for StudentT; mu, sigma for Normal).

Returns:

An EmbeddingParams instance with one embedding matrix per categorical feature, each of shape (cardinality, embedding_dim), initialised from dist_class cold-start priors.

Return type:

EmbeddingParams

embeddings: List[StudentTArray | NormalArray]
embeddings_init: List[StudentTArray | NormalArray]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
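The shape contract above (one matrix of shape (cardinality, embedding_dim) per categorical feature) can be checked with a small numpy sketch. The feature specs and the dict-of-arrays representation here are illustrative stand-ins for the library's distribution objects:

```python
import numpy as np

# Hypothetical categorical features as (cardinality, embedding_dim) pairs.
feature_specs = [(5, 8), (12, 4)]

# One location/scale pair per feature, mirroring a cold start with
# mu=0, sigma=1 priors for every embedding entry.
embeddings = [
    {"mu": np.zeros((card, dim)), "sigma": np.ones((card, dim))}
    for card, dim in feature_specs
]

# Each embedding matrix matches its feature's (cardinality, embedding_dim).
assert embeddings[0]["mu"].shape == (5, 8)
assert embeddings[1]["sigma"].shape == (12, 4)
```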

class pybandits.model.FeaturesConfig(*, n_features: ~typing.Annotated[int, ~annotated_types.Ge(ge=0)] = 0, categorical_features_configs: ~typing.List[~pybandits.model.CategoricalFeatureConfig] = <factory>)

Bases: PyBanditsBaseModel

Specification of the structure of a numpy context array.

Columns can appear in any order. Categorical features are identified by their explicit column_index; all remaining columns are treated as numerical.

Parameters:
  • n_features (int) – Total number of columns in the input numpy array. Default 0.

  • categorical_features_configs (List[CategoricalFeatureConfig]) – List of categorical feature configurations.

categorical_features_configs: List[CategoricalFeatureConfig]
property has_categorical: bool

True if at least one categorical feature is configured.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_features: Annotated[int, Ge(ge=0)]
property n_numerical: int

Number of numerical columns (derived: n_features minus the number of categorical features).

property numerical_indices: List[int]

Sorted list of column positions treated as numerical (not used by any categorical).

property total_output_dim: int

Total dimensionality of the concatenated vector fed into the first BNN layer.

= n_numerical + sum(cat.embedding_dim)
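The derived properties above follow directly from the column layout. A plain-Python sketch of the same arithmetic, with an illustrative layout rather than the actual class:

```python
# Hypothetical layout: 6 columns total, columns 1 and 4 categorical.
n_features = 6
categorical = {1: (10, 8), 4: (3, 4)}  # column_index -> (cardinality, embedding_dim)

# Columns not claimed by any categorical feature are numerical.
numerical_indices = sorted(set(range(n_features)) - set(categorical))
n_numerical = n_features - len(categorical)

# Input to the first BNN layer: raw numerical columns plus one
# embedding vector per categorical column.
total_output_dim = n_numerical + sum(dim for _, dim in categorical.values())
```

Here `numerical_indices` is `[0, 2, 3, 5]` and `total_output_dim` is `4 + 8 + 4 = 16`.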

class pybandits.model.Model(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1)

Bases: BaseModelSO, ABC

Class to model the prior distributions for single objective.

Parameters:
  • n_successes (PositiveInt = 1) – Counter of the number of successes.

  • n_failures (PositiveInt = 1) – Counter of the number of failures.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

abstract sample_proba(**kwargs) List[Probability] | List[List[Probability]] | List[Tuple[Probability, float]]

Sample the probability of getting a positive reward.

transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.
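For the single-objective Bernoulli case, sampling a reward probability from success/failure counters amounts to drawing from a Beta posterior. A minimal numpy sketch of that idea (an illustration of the Beta-Bernoulli mechanism, not the library's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Counters as documented: both start at 1, i.e. a uniform Beta(1, 1) prior.
n_successes, n_failures = 1, 1

# After observing 8 positive and 2 negative binary rewards:
n_successes += 8
n_failures += 2

# Thompson sampling draws one probability per prediction request.
samples = rng.beta(n_successes, n_failures, size=1000)
```

The posterior mean here is 9 / (9 + 3) = 0.75, so the sampled probabilities concentrate around that value as observations accumulate.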

class pybandits.model.ModelCC(*, cost: Annotated[float, Ge(ge=0)])

Bases: BaseModelCC, ABC

Class to model action cost.

Parameters:

cost (NonNegativeFloat) – Cost associated with the action.

cost: Annotated[float, Ge(ge=0)]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pybandits.model.ModelMO(*, models: Annotated[list[Model], Len(min_length=1, max_length=None)])

Bases: BaseModelMO, ABC

Class to model the prior distributions for multi-objective.

Parameters:

models (List[Model]) – The list of models for each objective.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

models: Annotated[list[Model], Len(min_length=1, max_length=None)]
class pybandits.model.NormalArray(*, mu: List[float] | List[List[float]], sigma: List[Annotated[float, Ge(ge=0)]] | List[List[Annotated[float, Ge(ge=0)]]])

Bases: BaseLocationScaleArray

A class representing an array of Normal distributions with parameters mu and sigma. A specific element's distribution (e.g., a single parameter of a layer) is defined by the corresponding elements in the lists. The mean values are represented by mu and the standard deviation values by sigma.

Normal distributions are simpler and faster than Student-t distributions, but less robust to outliers. They provide standard L2-like regularization.

Parameters:
  • mu (Union[List[float], List[List[float]]]) – The mean values of the Normal distributions. Can be a 1D (for the layer bias term) or 2D list (for the layer weight term).

  • sigma (Union[List[NonNegativeFloat], List[List[NonNegativeFloat]]]) – The standard deviation values of the Normal distributions. Must be non-negative. Can be a 1D or 2D list.

Examples

>>> # Create NormalArray with default parameters
>>> normal = NormalArray.cold_start(shape=(10, 5), mu=0.0, sigma=1.0)
>>> # Use in BNN
>>> bnn = BayesianNeuralNetwork.cold_start(
...     n_features=10,
...     dist_type="normal",
...     dist_params_init={"mu": 0, "sigma": 1}
... )
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_BaseLocationScaleArray__context: Any) None

Initialize private numpy array attributes by converting lists to arrays once at initialization.

Parameters:

__context (Any) – Pydantic context (unused).

class pybandits.model.StudentTArray(*, mu: List[float] | List[List[float]], sigma: List[Annotated[float, Ge(ge=0)]] | List[List[Annotated[float, Ge(ge=0)]]], nu: List[Annotated[float, Gt(gt=0)]] | List[List[Annotated[float, Gt(gt=0)]]])

Bases: BaseLocationScaleArray

A class representing an array of Student’s t-distributions with parameters mu, sigma, and nu. A specific element's distribution (e.g., a single parameter of a layer) is defined by the corresponding elements in the lists. The mean values are represented by mu, the scale (standard deviation) values by sigma, and the degrees of freedom by nu.

Parameters:
  • mu (Union[List[float], List[List[float]]]) – The mean values of the Student’s t-distributions. Can be a 1D (for the layer bias term) or 2D list (for the layer weight term).

  • sigma (Union[List[NonNegativeFloat], List[List[NonNegativeFloat]]]) – The scale (standard deviation) values of the Student’s t-distributions. Must be non-negative. Can be a 1D or 2D list.

  • nu (Union[List[PositiveFloat], List[List[PositiveFloat]]]) – The degrees of freedom of the Student’s t-distributions. Must be positive. Can be a 1D or 2D list.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_StudentTArray__context: Any) None

Initialize private numpy array attributes by converting lists to arrays once at initialization.

Parameters:

__context (Any) – Pydantic context (unused).

nu: List[Annotated[float, Gt(gt=0)]] | List[List[Annotated[float, Gt(gt=0)]]]
param_map: ClassVar[Dict[str, str]] = {'mu': 'loc', 'nu': 'df', 'sigma': 'scale'}
property shape: Tuple[Annotated[int, Gt(gt=0)], ...]

Get the shape of the mu array.

Returns:

The shape of the mu array.

Return type:

Tuple[PositiveInt, …]

classmethod validate_input_shapes(values)

Validate that all array-like parameters have the same shape.

Parameters:

values (dict or BaseLocationScaleArray instance) – Dictionary of field values or an already-instantiated object.

Returns:

Validated values dictionary or the object itself if already instantiated.

Return type:

dict or BaseLocationScaleArray instance

Raises:

ValueError – If array-like parameters have different shapes or empty dimensions.

pybandits.quantitative_model

class pybandits.quantitative_model.BaseCmabZoomingModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], BayesianNeuralNetwork | None], base_model_cold_start_kwargs: Dict[str, Any])

Bases: ZoomingModel, ABC

Zooming model for CMAB.

Parameters:
  • dimension (PositiveInt) – Number of parameters of the model.

  • comparison_threshold (Float01) – Comparison threshold.

  • segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment exceeds the average across all segments by this factor, the segment is considered interesting; if it falls below the average by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.

  • n_comparison_points (PositiveInt) – Number of comparison points.

  • n_max_segments (PositiveInt) – Maximum number of segments.

  • sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[BayesianNeuralNetwork]]) – Mapping of segments to Bayesian Neural Network models.

  • base_model_cold_start_kwargs (Dict[str, Any]) – Keyword arguments for the base model cold start.

base_model_cold_start_kwargs: Dict[str, Any]
property input_dim: int

Returns the input feature dimension (number of context features).

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_ZoomingModel__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], BayesianNeuralNetwork | None]
transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures', 'sub_actions')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ('dimension',)

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

classmethod validate_n_features(value)
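The `segment_update_factor` rule described in the parameter list above compares each segment's sample count to the average across segments. One plausible reading of that rule, sketched with illustrative counts (names and the multiplicative interpretation are assumptions, not taken from the library):

```python
import numpy as np

segment_counts = np.array([120, 80, 40, 160])  # samples observed per segment
factor = 0.1                                   # segment_update_factor

avg = segment_counts.mean()                    # 100.0
# More samples than average by the factor -> candidate for splitting.
interesting = segment_counts > avg * (1 + factor)
# Fewer samples than average by the factor -> candidate for merging.
nuisance = segment_counts < avg * (1 - factor)
```

With these counts, segments 0 and 3 (120 and 160 samples, above the 110 threshold) are interesting, while segments 1 and 2 (80 and 40, below 90) are nuisances.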
class pybandits.quantitative_model.BaseQuantitativeBayesianNeuralNetwork(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], bnn: BayesianNeuralNetwork)

Bases: QuantitativeModel, ABC

A Bayesian Neural Network based QuantitativeModel.

This class implements a quantitative model using a Bayesian Neural Network where quantities are used as input features to predict reward probabilities. The BNN learns the relationship between quantities and rewards.

Parameters:
  • dimension (PositiveInt) – Number of quantity dimensions (input features for the BNN).

  • bnn (BayesianNeuralNetwork) – The underlying Bayesian Neural Network model.

bnn: BayesianNeuralNetwork
classmethod cold_start(dimension: Annotated[int, Gt(gt=0)] = 1, n_features: Annotated[int, Ge(ge=0)] = 1, categorical_features: Dict[Annotated[int, Ge(ge=0)], Annotated[int, Ge(ge=0)]] | None = None, base_model_cold_start_kwargs: Dict[str, Any] | None = None, **kwargs) Self

Create a cold start QuantitativeBayesianNeuralNetwork model.

Parameters:
  • dimension (PositiveInt) – Dimension of the quantity (action) space. Default is 1.

  • n_features (NonNegativeInt) – Total number of columns in the context array, including any categorical columns. Default is 1.

  • categorical_features (Optional[Dict[NonNegativeInt, NonNegativeInt]]) – Categorical context columns as {column_index: cardinality}.

  • base_model_cold_start_kwargs (Optional[Dict[str, Any]], optional) – Keyword arguments passed to BayesianNeuralNetwork.cold_start. May include e.g. hidden_dim_list, update_method, update_kwargs, dist_type, dist_params_init, activation, use_residual_connections, use_layerwise_scaling. Default is None.

  • **kwargs – Additional keyword arguments for the QuantitativeBayesianNeuralNetwork constructor.

Returns:

A cold start QuantitativeBayesianNeuralNetwork model.

Return type:

Self

property input_dim: Annotated[int, Gt(gt=0)]

Returns the expected context dimension of the model (number of context columns).

Returns:

The number of context columns expected by the model, i.e. feature_config.n_features - dimension.

Return type:

PositiveInt

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

sample_proba(context: ndarray) List[Tuple[Callable[[ndarray], Probability], Callable[[ndarray], float]]]

Create, for each sample, a pair of callables that evaluate the probability of a positive reward and its weight at a given quantity, conditioned on the provided context.

Parameters:

context (np.ndarray) – The context at which to evaluate the probability.

Returns:

A list of (probability, weight) callables per sample, each taking a quantity (Union[float, np.ndarray]).

Return type:

List[QuantitativeProbabilityWeight]

transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ('dimension',)

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

class pybandits.quantitative_model.BaseSmabZoomingModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], Beta | None])

Bases: ZoomingModel, ABC

Zooming model for sMAB.

Parameters:
  • dimension (PositiveInt) – Number of parameters of the model.

  • comparison_threshold (Float01) – Comparison threshold.

  • segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment exceeds the average across all segments by this factor, the segment is considered interesting; if it falls below the average by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.

  • n_comparison_points (PositiveInt) – Number of comparison points.

  • n_max_segments (PositiveInt) – Maximum number of segments.

  • sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[Beta]]) – Mapping of segments to Beta models.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_ZoomingModel__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], Beta | None]
transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures', 'sub_actions')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ('dimension',)

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

class pybandits.quantitative_model.CmabZoomingModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], BayesianNeuralNetwork | None], base_model_cold_start_kwargs: Dict[str, Any])

Bases: BaseCmabZoomingModel

Zooming model for CMAB.

Parameters:
  • dimension (PositiveInt) – Number of parameters of the model.

  • comparison_threshold (Float01) – Comparison threshold.

  • segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment exceeds the average across all segments by this factor, the segment is considered interesting; if it falls below the average by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.

  • n_comparison_points (PositiveInt) – Number of comparison points.

  • n_max_segments (PositiveInt) – Maximum number of segments.

  • sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[BayesianNeuralNetwork]]) – Mapping of segments to Bayesian Neural Network models.

  • base_model_cold_start_kwargs (Dict[str, Any]) – Keyword arguments for the base model cold start.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_ZoomingModel__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures', 'sub_actions')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ('dimension',)

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

class pybandits.quantitative_model.CmabZoomingModelCC(*, cost: Callable[[float | Annotated[float, Ge(ge=0)]], Annotated[float, Ge(ge=0)]], n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], BayesianNeuralNetwork | None], base_model_cold_start_kwargs: Dict[str, Any])

Bases: BaseCmabZoomingModel, QuantitativeModelCC

Zooming model for CMAB with cost control.

Parameters:
  • dimension (PositiveInt) – Number of parameters of the model.

  • comparison_threshold (Float01) – Comparison threshold.

  • segment_update_factor (Float01) – Segment update factor. If the number of samples in a segment exceeds the average across all segments by this factor, the segment is considered interesting; if it falls below the average by this factor, the segment is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.

  • n_comparison_points (PositiveInt) – Number of comparison points.

  • n_max_segments (PositiveInt) – Maximum number of segments.

  • sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[BayesianNeuralNetwork]]) – Mapping of segments to Bayesian Neural Network models.

  • base_model_cold_start_kwargs (Dict[str, Any]) – Keyword arguments for the base model cold start.

  • cost (Callable[[Union[float, NonNegativeFloat]], NonNegativeFloat]) – Cost function that takes a quantity value and returns the associated cost.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_ZoomingModel__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures', 'sub_actions')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ('dimension',)

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

class pybandits.quantitative_model.QuantitativeBayesianNeuralNetwork(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], bnn: BayesianNeuralNetwork)

Bases: BaseQuantitativeBayesianNeuralNetwork

A Bayesian Neural Network based QuantitativeModel.

This class implements a quantitative model using a Bayesian Neural Network where quantities are used as input features to predict reward probabilities. The BNN learns the relationship between quantities and rewards.

Parameters:
  • dimension (PositiveInt) – Number of quantity dimensions (input features for the BNN).

  • bnn (BayesianNeuralNetwork) – The underlying Bayesian Neural Network model.

  • hidden_dim_list (Optional[List[PositiveInt]]) – List of hidden layer dimensions for the BNN. None means no hidden layers.

  • update_method (str) – The method used for posterior inference, either “MCMC” or “VI”.

  • update_kwargs (Optional[dict]) – Additional keyword arguments for the update method.

Examples

>>> # Create a cold start model with 2 quantity dimensions
>>> model = QuantitativeBayesianNeuralNetwork.cold_start(
...     dimension=2,
...     hidden_dim_list=[8, 4],
...     update_method="VI"
... )
>>> # Sample probability functions (context required for BNN)
>>> context = np.zeros((3, 1))  # (n_samples, n_features)
>>> prob_funcs = model.sample_proba(context=context)
>>> # Evaluate probability at a specific quantity
>>> prob, weight = prob_funcs[0]
>>> prob_at_q = prob(np.array([0.3, 0.7]))
>>> # Update with observations
>>> quantities = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
>>> rewards = [1, 0, 1]
>>> model._quantitative_update(quantities, rewards, context=context)
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ('dimension',)

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

class pybandits.quantitative_model.QuantitativeBayesianNeuralNetworkCC(*, cost: Callable[[float | Annotated[float, Ge(ge=0)]], Annotated[float, Ge(ge=0)]], n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], bnn: BayesianNeuralNetwork)

Bases: BaseQuantitativeBayesianNeuralNetwork, QuantitativeModelCC

A Bayesian Neural Network based QuantitativeModel with cost control.

This class extends QuantitativeBayesianNeuralNetwork with cost control functionality, allowing the model to incorporate cost considerations when making decisions.

Parameters:
  • dimension (PositiveInt) – Number of quantity dimensions (input features for the BNN).

  • bnn (BayesianNeuralNetwork) – The underlying Bayesian Neural Network model.

  • hidden_dim_list (Optional[List[PositiveInt]]) – List of hidden layer dimensions for the BNN. None means no hidden layers.

  • update_method (str) – The method used for posterior inference, either “MCMC” or “VI”.

  • update_kwargs (Optional[dict]) – Additional keyword arguments for the update method.

  • cost (Callable[[Union[float, NonNegativeFloat]], NonNegativeFloat]) – Cost function that takes a quantity value and returns the associated cost.

Examples

>>> # Create a cold start model with cost control
>>> model = QuantitativeBayesianNeuralNetworkCC.cold_start(
...     dimension=1,
...     hidden_dim_list=[4],
...     cost=lambda x: x * 0.1  # Linear cost function
... )
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ('dimension',)

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

class pybandits.quantitative_model.QuantitativeModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)])

Bases: BaseModelSO, ABC

Base class for quantitative models.

Parameters:

dimension (PositiveInt) – Number of parameters of the model.

dimension: Annotated[int, Gt(gt=0)]
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

abstract sample_proba(**kwargs) List[Callable[[ndarray], Probability]] | List[Tuple[Callable[[ndarray], Probability], Callable[[ndarray], float]]]

Sample the model.

Returns:

A list of callables: either probability functions (quantity -> Probability) or (probability, weight) tuples. List length is equal to the number of samples.

Return type:

Union[List[QuantitativeProbability], List[QuantitativeProbabilityWeight]]

transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ('dimension',)

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

class pybandits.quantitative_model.QuantitativeModelCC(*, cost: Callable[[float | Annotated[float, Ge(ge=0)]], Annotated[float, Ge(ge=0)]])

Bases: BaseModelCC, ABC

Class to model quantitative action cost.

Parameters:

cost (Callable[[Union[float, NonNegativeFloat]], NonNegativeFloat]) – Cost function that takes a quantity value and returns the associated cost.

cost: Callable[[float | Annotated[float, Ge(ge=0)]], Annotated[float, Ge(ge=0)]]
classmethod deserialize_cost(value)

Deserialize cost from string representation if needed.

encode_cost(value)
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

static serialize_cost(cost_value) str

Serialize cost value to string representation.

classmethod validate_cost(value)

Deserialize cost from string representation if needed.

class pybandits.quantitative_model.Segment(*, intervals: Tuple[Tuple[Float_0_1, Float_0_1], ...])

Bases: PyBanditsBaseModel

This class represents a segment of the quantities space. A segment is defined by a list of intervals and thus represents a hyperrectangle.

Parameters:

intervals (Tuple[Tuple[Float01, Float01], ...]) – Intervals of the segment.

intervals: Tuple[Tuple[Float_0_1, Float_0_1], ...]
property intervals_array: ndarray
is_adjacent(other: Segment) bool

Check if two segments are adjacent. Segments are adjacent if they share a face, meaning they have identical intervals in all dimensions except one, where they touch.

Parameters:

other (Segment) – Segment to check for adjacency.

Returns:

Whether the segments are adjacent.

Return type:

bool

property maxs: ndarray
property mins: ndarray
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod segment_intervals_to_tuple(value)
split() Tuple[Segment, Segment]
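The adjacency rule documented above (identical intervals in all dimensions except one, touching in that one) can be sketched with plain tuples of intervals. This illustrates the geometric criterion only; the function name and representation are illustrative, not the library's code:

```python
def is_adjacent(a, b):
    """Check whether two hyperrectangles (tuples of (lo, hi) intervals) share a face."""
    differing = [i for i, (ia, ib) in enumerate(zip(a, b)) if ia != ib]
    if len(differing) != 1:
        return False  # must differ in exactly one dimension
    (lo_a, hi_a), (lo_b, hi_b) = a[differing[0]], b[differing[0]]
    return hi_a == lo_b or hi_b == lo_a  # the differing intervals must touch


# Two 2-D segments sharing the face x = 0.5.
s1 = ((0.0, 0.5), (0.0, 1.0))
s2 = ((0.5, 1.0), (0.0, 1.0))
assert is_adjacent(s1, s2)

# Same y-interval but a gap along x -> not adjacent.
s3 = ((0.6, 1.0), (0.0, 1.0))
assert not is_adjacent(s1, s3)
```

Adjacency in this sense is what makes two segments candidates for merging back into a single hyperrectangle.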
class pybandits.quantitative_model.SmabZoomingModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], Beta | None])

Bases: BaseSmabZoomingModel

Zooming model for sMAB.

Parameters:
  • dimension (PositiveInt) – Number of parameters of the model.

  • comparison_threshold (Float01) – Comparison threshold.

  • segment_update_factor (Float01) – Segment update factor. If a segment's sample count exceeds the average over all segments by this factor, the segment is considered interesting; if it falls below the average by this factor, it is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.

  • n_comparison_points (PositiveInt) – Number of comparison points.

  • n_max_segments (PositiveInt) – Maximum number of segments.

  • sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[Beta]]) – Mapping of segments to Beta models.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_ZoomingModel__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures', 'sub_actions')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ('dimension',)

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

class pybandits.quantitative_model.SmabZoomingModelCC(*, cost: Callable[[float | Annotated[float, Ge(ge=0)]], Annotated[float, Ge(ge=0)]], n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], Beta | None])

Bases: BaseSmabZoomingModel, QuantitativeModelCC

Zooming model for sMAB with cost control.

Parameters:
  • dimension (PositiveInt) – Number of parameters of the model.

  • comparison_threshold (Float01) – Comparison threshold.

  • segment_update_factor (Float01) – Segment update factor. If a segment's sample count exceeds the average over all segments by this factor, the segment is considered interesting; if it falls below the average by this factor, it is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.

  • n_comparison_points (PositiveInt) – Number of comparison points.

  • n_max_segments (PositiveInt) – Maximum number of segments.

  • sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[Beta]]) – Mapping of segments to Beta models.

  • cost (Callable[[Union[float, NonNegativeFloat]], NonNegativeFloat]) – Cost associated to the Beta distribution.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_ZoomingModel__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures', 'sub_actions')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ('dimension',)

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

class pybandits.quantitative_model.ZoomingModel(*, n_successes: Annotated[int, Gt(gt=0)] = 1, n_failures: Annotated[int, Gt(gt=0)] = 1, dimension: Annotated[int, Gt(gt=0)], comparison_threshold: Float_0_1 = 0.1, segment_update_factor: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], Model | None])

Bases: QuantitativeModel, ABC

This class implements the zooming method, an approach based on adaptive discretization of the quantitative action space. The space is represented as a hypercube whose number of dimensions is given by dimension. After each update step, the model checks whether each segment is interesting or a nuisance based on segment_update_factor. An interesting segment can be split into two segments; conversely, adjacent nuisance segments can be merged based on comparison_threshold. The number of segments can be limited using n_max_segments.

References

Multi-Armed Bandits in Metric Spaces (Kleinberg, Slivkins, and Upfal, 2008) https://arxiv.org/pdf/0809.4882

Parameters:
  • dimension (PositiveInt) – Number of parameters of the model.

  • comparison_threshold (Float01) – Comparison threshold.

  • segment_update_factor (Float01) – Segment update factor. If a segment's sample count exceeds the average over all segments by this factor, the segment is considered interesting; if it falls below the average by this factor, it is considered a nuisance. Interesting segments can be split, while nuisance segments can be merged.

  • n_comparison_points (PositiveInt) – Number of comparison points.

  • n_max_segments (PositiveInt) – Maximum number of segments.

  • sub_actions (Dict[Tuple[Tuple[Float01, Float01], ...], Optional[Model]]) – Mapping of segments to models.
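
The interesting/nuisance classification driven by segment_update_factor can be sketched as follows (an illustrative reading of the rule above; the exact library criterion may differ):

```python
def classify_segments(sample_counts, segment_update_factor=0.1):
    # Illustrative sketch: segments well above the average sample count
    # are "interesting" (split candidates); segments well below it are
    # "nuisance" (merge candidates).
    mean = sum(sample_counts.values()) / len(sample_counts)
    interesting = [s for s, n in sample_counts.items()
                   if n > mean * (1 + segment_update_factor)]
    nuisance = [s for s, n in sample_counts.items()
                if n < mean * (1 - segment_update_factor)]
    return interesting, nuisance
```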

classmethod cold_start(dimension: Annotated[int, Gt(gt=0)] = 1, comparison_threshold: Float_0_1 = 0.1, n_comparison_points: Annotated[int, Gt(gt=0)] = 1000, n_max_segments: Annotated[int, Gt(gt=0)] | None = 32, **kwargs) Self

Create a cold start model.

Returns:

Cold start model.

Return type:

ZoomingModel

comparison_threshold: Float_0_1
classmethod deserialize_sub_actions(value)

Convert sub_actions from a dict with string keys (json representation) to tuple (object representation).

dimension: Annotated[int, Gt(gt=0)]
is_similar_performance(segment1: Segment, segment2: Segment) bool

Check if two segments have similar performance.

Parameters:
  • segment1 (Segment) – First segment.

  • segment2 (Segment) – Second segment.

Returns:

Whether the segments have similar performance.

Return type:

bool

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_ZoomingModel__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

n_comparison_points: Annotated[int, Gt(gt=0)]
n_max_segments: Annotated[int, Gt(gt=0)] | None
sample_proba(**kwargs) List[Callable[[ndarray], Probability]]

Sample probability functions from the model.

Returns:

A list of functions that evaluate probability at any given location.

Return type:

List[QuantitativeProbability]

segment_update_factor: Float_0_1
property segmented_actions: Dict[Segment, Model | None]
serialize_sub_actions(value)
sub_actions: Dict[Tuple[Tuple[Float_0_1, Float_0_1], ...], Model | None]
transfer_extendable_keys: ClassVar[Tuple[str, ...]] = ()

Accumulated extendable keys from all classes in the MRO. Changes to these emit warnings (but not errors) during transfer.

transfer_learned_keys: ClassVar[Tuple[str, ...]] = ('n_successes', 'n_failures', 'sub_actions')

Accumulated learned-state keys from all classes in the MRO. Used by transfer.py to decide which keys to copy from source to target.

transfer_structural_keys: ClassVar[Tuple[str, ...]] = ('dimension',)

Accumulated structural keys from all classes in the MRO. Mismatches in these raise ValueError during transfer.

pybandits.strategy

class pybandits.strategy.BaseStrategy

Bases: PyBanditsBaseModel, ABC

Abstract base strategy for selecting actions in multi-armed bandits.

This class defines the interface that all bandit strategies must implement. Strategies determine how to select actions based on their estimated rewards and other criteria.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

abstract select_action(p: Dict[ActionId, float | Callable[[ndarray], float]], actions: Dict[ActionId, BaseModel], **kwargs) ActionId | Tuple[ActionId, Tuple[float, ...]]

Select an action based on the strategy’s selection criteria.

Parameters:
  • p (Dict[ActionId, Union[float, Callable[[np.ndarray], float]]]) – Dictionary mapping action IDs to either: - float: Fixed probability of positive reward - Callable: Function that computes probability given quantity vector

  • actions (Dict[ActionId, BaseModel]) – Dictionary mapping action IDs to their associated models.

  • **kwargs – Additional strategy-specific parameters.

Returns:

The selected action ID, either a simple ActionId or a tuple of (ActionId, quantity_vector) for quantitative actions.

Return type:

UnifiedActionId

class pybandits.strategy.BestActionIdentificationBandit(*, exploit_p: Float_0_1 | None = 0.5)

Bases: ClassicBandit

Best-Action Identification (BAI) strategy for multi-armed bandits.

This strategy balances between exploitation and exploration by probabilistically choosing between the best action and the second-best action. It’s designed for scenarios where identifying the truly best action is important.

Parameters:

exploit_p (Optional[Float01], default=0.5) – Probability of selecting the best action versus the second-best action. - If exploit_p = 1: Always selects the best action (pure exploitation/greedy). - If exploit_p = 0: Always selects the second-best action. - If exploit_p = 0.5: Equal probability of selecting best or second-best.

References

Simple Bayesian Algorithms for Best-Arm Identification (Russo, 2018) https://arxiv.org/pdf/1602.08448.pdf
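
The selection rule can be sketched as follows (a minimal sketch of the behavior described above, assuming a dict of sampled reward probabilities per action):

```python
import random

def bai_select(sampled_p, exploit_p=0.5, rng=random.Random()):
    # With probability exploit_p pick the best action; otherwise pick
    # the second best (illustrative sketch, not the library code).
    ranked = sorted(sampled_p, key=sampled_p.get, reverse=True)
    if len(ranked) == 1 or rng.random() < exploit_p:
        return ranked[0]
    return ranked[1]
```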

exploit_p: Float_0_1 | None
model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod normalize_exploit_p(v)

Normalize the exploit_p field value to its default if None.

Parameters:

v (Any) – The exploit_p value to normalize.

Returns:

The original value if not None, otherwise 0.5.

Return type:

Float01

with_exploit_p(exploit_p: Float_0_1 | None) Self

Create a new instance with a different exploitation probability.

Parameters:

exploit_p (Optional[Float01], default=0.5) – Probability of selecting the best action versus the second-best action. - If exploit_p = 1: Always selects the best action (pure exploitation). - If exploit_p = 0: Always selects the second-best action. - If exploit_p = 0.5: Equal probability of selecting best or second-best.

Returns:

mutated_best_action_identification – A new instance with the specified exploitation probability.

Return type:

BestActionIdentificationBandit

class pybandits.strategy.ClassicBandit

Bases: SingleObjectiveStrategy

Classic Thompson Sampling strategy for multi-armed bandits.

This strategy implements pure exploitation by always selecting the action with the highest sampled probability of reward. It considers all actions without any filtering or cost considerations.

References

Analysis of Thompson Sampling for the Multi-armed Bandit Problem (Agrawal and Goyal, 2012) http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf

Thompson Sampling for Contextual Bandits with Linear Payoffs (Agrawal and Goyal, 2014) https://arxiv.org/pdf/1209.3352.pdf
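
A minimal Thompson Sampling step, assuming Beta-Bernoulli arms parameterized by (n_successes, n_failures) as in the Beta model (an illustrative sketch, not the library implementation):

```python
import random

def thompson_select(arms, rng=random.Random(0)):
    # Sample a reward probability from each arm's Beta posterior and
    # pick the argmax (illustrative sketch).
    samples = {a: rng.betavariate(s, f) for a, (s, f) in arms.items()}
    return max(samples, key=samples.get)

# (n_successes, n_failures) per action:
arms = {"a1": (10, 90), "a2": (60, 40)}
```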

get_prerequisites(p: Dict[ActionId, float | Callable[[ndarray], float]], actions: Dict[ActionId, BaseModel], constraint_list: List[Callable[[ndarray], bool]] | None) Dict[str, Any]

Compute prerequisites for classic bandit strategy.

Classic bandits don’t require any prerequisites as they consider all actions equally without additional filtering criteria.

Parameters:
  • p (Dict[ActionId, Union[float, Callable[[np.ndarray], float]]]) – Dictionary mapping action IDs to probability functions or values.

  • actions (Dict[ActionId, BaseModel]) – Dictionary mapping action IDs to their associated models.

  • constraint_list (Optional[List[Callable[[np.ndarray], bool]]]) – List of constraint functions (unused in classic bandit).

Returns:

Empty dictionary as no prerequisites are needed.

Return type:

Dict[str, Any]

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pybandits.strategy.CostControlBandit(*, subsidy_factor: Float_0_1 | None = 0.5)

Bases: SingleObjectiveStrategy, CostControlStrategy

Cost-controlled Thompson Sampling strategy for multi-armed bandits.

This strategy extends classic bandits by considering action costs. It first identifies a feasible set of actions whose rewards are within a tolerance of the best reward, then selects the lowest-cost action from this set.

The feasible action set is defined as those with expected rewards in the range [(1-subsidy_factor)*max_reward, max_reward], where max_reward is the highest sampled reward value.

Parameters:

subsidy_factor (Optional[Float01], default=0.5) – Tolerance factor defining the feasible action set. - If subsidy_factor = 1: Always selects minimum cost action. - If subsidy_factor = 0: Always selects highest reward action (classic bandit). - Values in between balance reward and cost considerations.

References

Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints (Daulton et al., 2019) https://arxiv.org/abs/1911.00638

Multi-Armed Bandits with Cost Subsidy (Sinha et al., 2021) https://arxiv.org/abs/2011.01488
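
The feasible-set rule above can be sketched as follows (illustrative, assuming scalar sampled rewards and a cost per action):

```python
def cost_control_select(sampled_p, costs, subsidy_factor=0.5):
    # Keep actions with sampled reward >= (1 - subsidy_factor) * max_reward,
    # then pick the cheapest one (illustrative sketch).
    best = max(sampled_p.values())
    feasible = [a for a, r in sampled_p.items()
                if r >= (1 - subsidy_factor) * best]
    return min(feasible, key=costs.get)
```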

get_prerequisites(p: Dict[ActionId, float | Callable[[ndarray], float]], actions: Dict[ActionId, BaseModel], constraint_list: List[Callable[[ndarray], bool]] | None) Dict[str, Any]

Compute the best available reward for defining the feasible action set.

This method finds the maximum reward value across all actions, which is used to determine the reward threshold for feasible actions.

Parameters:
  • p (Dict[ActionId, Union[float, Callable[[np.ndarray], float]]]) – Dictionary mapping action IDs to probability functions or values.

  • actions (Dict[ActionId, BaseModel]) – Dictionary mapping action IDs to their associated models.

  • constraint_list (Optional[List[Callable[[np.ndarray], bool]]]) – List of constraint functions for quantitative actions.

Returns:

Dictionary containing ‘best_value’: the maximum reward value.

Return type:

Dict[str, Any]

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pybandits.strategy.CostControlStrategy(*, subsidy_factor: Float_0_1 | None = 0.5)

Bases: PyBanditsBaseModel

Mixin class for cost-aware action selection strategies.

This class provides functionality for strategies that consider action costs in addition to rewards. It defines a feasible action set based on a tolerance threshold and selects the lowest-cost action from this set.

Parameters:

subsidy_factor (Optional[Float01], default=0.5) – Tolerance factor defining the feasible action set as those with rewards in the range [(1-subsidy_factor)*max_reward, max_reward]. - If subsidy_factor = 1: Selects minimum cost action (ignores rewards). - If subsidy_factor = 0: Selects highest reward action (ignores costs). - If subsidy_factor = 0.5: Balances between reward and cost.

References

Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints (Daulton et al., 2019) https://arxiv.org/abs/1911.00638

Multi-Armed Bandits with Cost Subsidy (Sinha et al., 2021) https://arxiv.org/abs/2011.01488

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

classmethod normalize_subsidy_factor(v)

Normalize the subsidy_factor field value to its default if None.

Parameters:

v (Any) – The subsidy_factor value to normalize.

Returns:

The original value if not None, otherwise 0.5.

Return type:

Float01

subsidy_factor: Float_0_1 | None
with_subsidy_factor(subsidy_factor: Float_0_1 | None) Self

Create a new instance with a different subsidy factor.

Parameters:

subsidy_factor (Optional[Float01], default=0.5) – Tolerance factor defining the feasible action set. - If subsidy_factor = 1: Selects minimum cost action (ignores rewards). - If subsidy_factor = 0: Selects highest reward action (ignores costs). - Values in between balance reward and cost considerations.

Returns:

A new instance with the specified subsidy factor.

Return type:

mutated_cost_control_bandit

class pybandits.strategy.MultiObjectiveBandit

Bases: MultiObjectiveStrategy

Multi-objective Thompson Sampling strategy for multi-armed bandits.

This strategy handles vector-valued rewards where each action produces multiple reward outcomes. Actions are selected from the Pareto front - the set of non-dominated actions where no other action is superior in all objectives.

The strategy uses Thompson Sampling for exploration by sampling from posterior distributions and then selecting uniformly from the resulting Pareto front.

References

Thompson Sampling for Multi-Objective Multi-Armed Bandits Problem (Yahyaa and Manderick, 2015) https://www.researchgate.net/publication/272823659_Thompson_Sampling_for_Multi-Objective_Multi-Armed_Bandits_Problem
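
The Pareto-front computation can be sketched as follows (an illustrative non-dominance check over sampled reward vectors):

```python
def pareto_front(sampled):
    # An action is dominated if some other action is at least as good in
    # every objective and strictly better in at least one.
    def dominates(u, v):
        return (all(x >= y for x, y in zip(u, v))
                and any(x > y for x, y in zip(u, v)))
    return [a for a, r in sampled.items()
            if not any(dominates(o, r) for b, o in sampled.items() if b != a)]
```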

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

objective_selector_class

alias of ClassicBandit

class pybandits.strategy.MultiObjectiveCostControlBandit(*, subsidy_factor: Float_0_1 | None = 0.5)

Bases: MultiObjectiveStrategy, CostControlStrategy

Multi-objective strategy with cost control for multi-armed bandits.

Combines multi-objective optimization with cost awareness. For each objective, identifies actions within a tolerance of the best reward, then considers only the lowest-cost actions from these feasible sets when computing the Pareto front.

This strategy is useful when actions have both multiple reward objectives and associated costs, requiring a balance between Pareto-optimality and cost efficiency.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

objective_selector_class

alias of CostControlBandit

class pybandits.strategy.MultiObjectiveStrategy

Bases: BaseStrategy, ABC

Abstract strategy for multi-objective multi-armed bandits.

This class handles bandits where each action has multiple reward objectives. It selects actions from the Pareto front - the set of non-dominated actions where no other action is better in all objectives.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

objective_selector_class: ClassVar[Type[SingleObjectiveStrategy]]
select_action(p: Dict[ActionId, List[float] | Callable[[ndarray], List[float]]], actions: Dict[ActionId, BaseModel]) ActionId | Tuple[ActionId, Tuple[float, ...]]

Select an action from the Pareto front.

This method finds all Pareto-optimal actions and randomly selects one, giving equal probability to each non-dominated action.

Parameters:
  • p (Dict[ActionId, Union[List[float], Callable[[np.ndarray], List[float]]]]) – Dictionary mapping action IDs to either: - List[float]: Fixed reward vector for multiple objectives - Callable: Function that computes reward vector given quantity

  • actions (Dict[ActionId, BaseModel]) – Dictionary mapping action IDs to their associated models.

Returns:

A randomly selected action from the Pareto front.

Return type:

UnifiedActionId

class pybandits.strategy.SingleObjectiveStrategy

Bases: BaseStrategy, ABC

Abstract strategy for single-objective multi-armed bandits.

This class handles bandits where each action has a single scalar reward. It provides a framework for refining actions based on constraints and selecting the best action according to strategy-specific criteria.

abstract get_prerequisites(p: Dict[ActionId, float | Callable[[ndarray], float]], actions: Dict[ActionId, BaseModel], constraint_list: List[Callable[[ndarray], bool]] | None) Dict[str, Any]

Compute prerequisites needed for strategy-specific action selection.

This method allows strategies to pre-compute values needed for their selection logic, such as the best available reward for cost control.

Parameters:
  • p (Dict[ActionId, Union[float, Callable[[np.ndarray], float]]]) – Dictionary mapping action IDs to probability functions or values.

  • actions (Dict[ActionId, BaseModel]) – Dictionary mapping action IDs to their associated models.

  • constraint_list (Optional[List[Callable[[np.ndarray], bool]]]) – List of constraint functions for quantitative actions.

Returns:

Dictionary of prerequisite values needed by the strategy.

Return type:

Dict[str, Any]

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

refine_p(p: Dict[ActionId, float | Callable[[ndarray], float]], actions: Dict[ActionId, BaseModel], constraint_list: List[Callable[[ndarray], bool]] | None) Dict[ActionId | Tuple[ActionId, Tuple[float, ...]], float]

Refine action probabilities by evaluating quantitative actions and filtering.

This method processes both standard and quantitative actions, evaluating quantitative functions at optimal points and filtering actions based on strategy-specific criteria.

Parameters:
  • p (Dict[ActionId, Union[float, Callable[[np.ndarray], float]]]) – Dictionary of actions and their probability functions or values.

  • actions (Dict[ActionId, BaseModel]) – Dictionary of actions and their associated models.

  • constraint_list (Optional[List[Callable[[np.ndarray], bool]]]) – List of constraint functions for quantitative actions.

Returns:

refined_p – Dictionary mapping unified action IDs to their refined probability values.

Return type:

Dict[UnifiedActionId, float]

select_action(p: Dict[ActionId, float | Callable[[ndarray], float]], actions: Dict[ActionId, BaseModel], constraint: Callable[[ndarray], bool] | None = None) ActionId | Tuple[ActionId, Tuple[float, ...]]

Select an action for single-objective optimization.

Parameters:
  • p (Dict[ActionId, Union[float, Callable[[np.ndarray], float]]]) – Dictionary mapping action IDs to either: - float: Fixed probability of positive reward - Callable: Function that computes probability given quantity vector

  • actions (Dict[ActionId, BaseModel]) – Dictionary mapping action IDs to their associated models.

  • constraint (Optional[Callable[[np.ndarray], bool]], default=None) – Optional constraint function that returns True if a quantity vector satisfies the constraints.

Returns:

The selected action ID, either a simple ActionId or a tuple of (ActionId, quantity_vector) for quantitative actions.

Return type:

UnifiedActionId

verify_and_select_from_quantitative_action(score_func: Callable[[ndarray], float], model: BaseModel, constraint_list: List[Callable[[ndarray], bool]] | None) ndarray | None

Public interface for verifying and selecting from quantitative actions.

This method wraps the private implementation to provide a clean public API for finding optimal quantities for quantitative actions.

Parameters:
  • score_func (Callable[[np.ndarray], float]) – Function that computes probability/score given a quantity vector.

  • model (BaseModel) – The model associated with this quantitative action.

  • constraint_list (Optional[List[Callable[[np.ndarray], bool]]]) – List of constraint functions that quantity must satisfy.

Returns:

Optimal quantity vector if found, None otherwise.

Return type:

Optional[np.ndarray]

pybandits.strategy.random() x in the interval [0, 1).

pybandits.actions_manager

class pybandits.actions_manager.ActionsManager(delta: PositiveProbability | None = None, actions: Dict[ActionId, Model] | None = None, action_ids: Set[ActionId] | None = None, quantitative_action_ids: Set[ActionId] | None = None, kwargs: Dict[str, Any] | None = None, actions_with_change: Set[Tuple[ActionId, Annotated[int, Ge(ge=0)]]] | None = None)

Bases: PyBanditsBaseModel, ABC

Base class for managing actions and their associated models. The class accounts for non-stationarity by providing an adaptive windowing scheme for action updates, on which the change-point detection is based.

References

Scaling Multi-Armed Bandit Algorithms (Fouché et al., 2019) https://edouardfouche.com/publications/S-MAB_FOUCHE_KDD19.pdf

Parameters:
  • actions (Dict[ActionId, Model]) – The list of possible actions, and their associated Model.

  • delta (Optional[PositiveProbability]) – The confidence level for the adaptive window. If None, change-point detection is skipped.

actions: Dict[ActionId, BaseModel]
actions_with_change: Set[Tuple[ActionId, Annotated[int, Ge(ge=0)]]]
classmethod at_least_one_action_is_defined(v)
delta: PositiveProbability | None
property maximum_memory_length: Annotated[int, Ge(ge=0)]

Get maximum possible memory length based on current action statistics.

Returns:

Maximum memory length allowed.

Return type:

NonNegativeInt

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'json_encoders': {<class 'collections.deque'>: <class 'list'>}}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

update(actions: List[ActionId], rewards: List[BinaryReward] | List[List[BinaryReward]], quantities: List[float | List[float] | None] | None = None, actions_memory: List[ActionId] | None = None, rewards_memory: List[BinaryReward] | List[List[BinaryReward]] | None = None, **kwargs)

Update the models associated with the given actions using the provided rewards. When an adaptive window is used, the update resets the action models and retrains them on the new data.

Parameters:
  • actions (List[ActionId]) – The selected action for each sample.

  • rewards (Union[List[BinaryReward], List[List[BinaryReward]]]) – The reward for each sample.

  • quantities (Optional[List[Union[float, List[float], None]]]) – The value associated with each action. If None, the value is not used, i.e. the action is non-quantitative.

  • actions_memory (Optional[List[ActionId]]) – List of previously selected actions.

  • rewards_memory (Optional[Union[List[BinaryReward], List[List[BinaryReward]]]]) – List of previously collected rewards.
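
The reset-and-retrain behavior can be sketched for Beta-Bernoulli models (an illustrative reconstruction; the library's memory handling is more involved):

```python
from collections import defaultdict

def windowed_update(actions, rewards, actions_memory=None, rewards_memory=None):
    # Rebuild Beta(n_successes, n_failures) counts from the remembered
    # window plus the new batch, as if the models were reset and retrained.
    all_actions = (actions_memory or []) + actions
    all_rewards = (rewards_memory or []) + rewards
    counts = defaultdict(lambda: [1, 1])  # Beta(1, 1) prior per action
    for a, r in zip(all_actions, all_rewards):
        counts[a][0 if r else 1] += 1     # index 0: successes, 1: failures
    return dict(counts)
```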

class pybandits.actions_manager.CmabActionsManager(delta: PositiveProbability | None = None, actions: Dict[ActionId, Model] | None = None, action_ids: Set[ActionId] | None = None, quantitative_action_ids: Set[ActionId] | None = None, kwargs: Dict[str, Any] | None = None, actions_with_change: Set[Tuple[ActionId, Annotated[int, Ge(ge=0)]]] | None = None)

Bases: ActionsManager, BaseModel, Generic[CmabModelType]

Manages actions and their associated models for cMAB models. The class accounts for non-stationarity by providing an adaptive windowing scheme for action updates.

Parameters:
  • actions (Dict[ActionId, BaseBayesianNeuralNetwork]) – The list of possible actions, and their associated Model.

  • delta (Optional[PositiveProbability], default=0.1) – The confidence level for the adaptive window.

actions: Dict[ActionId, CmabModelType]
actions_with_change: Set[Tuple[ActionId, NonNegativeInt]]
classmethod check_models(v)
delta: PositiveProbability | None
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'json_encoders': {<class 'collections.deque'>: <class 'list'>}}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

update(actions: List[ActionId], rewards: List[BinaryReward] | List[List[BinaryReward]], quantities: List[float | List[float] | None] | None, context: ndarray, actions_memory: List[ActionId] | None = None, rewards_memory: List[BinaryReward] | List[List[BinaryReward]] | None = None, context_memory: ndarray | None = None)

Update the models associated with the given actions using the provided rewards. When an adaptive window is used, the update resets the action models and retrains them on the new data.

Parameters:
  • actions (List[ActionId]) – The selected action for each sample.

  • rewards (Union[List[BinaryReward], List[List[BinaryReward]]]) – The reward for each sample.

  • quantities (Optional[List[Union[float, List[float], None]]]) – The value associated with each action. If None, the value is not used, i.e. the action is non-quantitative.

  • context (ArrayLike of shape (n_samples, n_features)) – Matrix of contextual features.

  • actions_memory (Optional[List[ActionId]]) – List of previously selected actions.

  • rewards_memory (Optional[Union[List[BinaryReward], List[List[BinaryReward]]]]) – List of previously collected rewards.

  • context_memory (Optional[ArrayLike] of shape (n_samples, n_features)) – Matrix of contextual features.

pybandits.actions_manager.CmabActionsManagerCC

alias of CmabActionsManager[Union[BayesianNeuralNetworkCC, QuantitativeBayesianNeuralNetworkCC]]

pybandits.actions_manager.CmabActionsManagerMO

alias of CmabActionsManager[BayesianNeuralNetworkMO]

pybandits.actions_manager.CmabActionsManagerMOCC

alias of CmabActionsManager[BayesianNeuralNetworkMOCC]

pybandits.actions_manager.CmabActionsManagerSO

alias of CmabActionsManager[Union[BayesianNeuralNetwork, QuantitativeBayesianNeuralNetwork]]

class pybandits.actions_manager.CmabActionsManager(delta: PositiveProbability | None = None, actions: Dict[ActionId, Model] | None = None, action_ids: Set[ActionId] | None = None, quantitative_action_ids: Set[ActionId] | None = None, kwargs: Dict[str, Any] | None = None, actions_with_change: Set[Tuple[ActionId, Annotated[int, Ge(ge=0)]]] | None = None)

Bases: ActionsManager, BaseModel, Generic[CmabModelType]

Manages actions and their associated models for cMAB models. The class accounts for non-stationarity by providing an adaptive window scheme for action updates.

Parameters:
  • actions (Dict[ActionId, BaseBayesianNeuralNetwork]) – The list of possible actions, and their associated Model.

  • delta (Optional[PositiveProbability], 0.1 if not specified.) – The confidence level for the adaptive window.

actions: Dict[ActionId, CmabModelType]
actions_with_change: Set[Tuple[ActionId, NonNegativeInt]]
classmethod check_models(v)
delta: PositiveProbability | None
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'json_encoders': {<class 'collections.deque'>: <class 'list'>}}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

update(actions: List[ActionId], rewards: List[BinaryReward] | List[List[BinaryReward]], quantities: List[float | List[float] | None] | None, context: ndarray, actions_memory: List[ActionId] | None = None, rewards_memory: List[BinaryReward] | List[List[BinaryReward]] | None = None, context_memory: ndarray | None = None)

Update the models associated with the given actions using the provided rewards. When an adaptive window is used, the update is performed by resetting the action models and retraining them on the new data.

Parameters:
  • actions (List[ActionId]) – The selected action for each sample.

  • rewards (Union[List[BinaryReward], List[List[BinaryReward]]]) – The reward for each sample.

  • quantities (Optional[List[Union[float, List[float], None]]]) – The value associated with each action. If None, the value is not used, i.e. the action is non-quantitative.

  • context (ArrayLike of shape (n_samples, n_features)) – Matrix of contextual features.

  • actions_memory (Optional[List[ActionId]]) – List of previously selected actions.

  • rewards_memory (Optional[Union[List[BinaryReward], List[List[BinaryReward]]]]) – List of previously collected rewards.

  • context_memory (Optional[ArrayLike] of shape (n_samples, n_features)) – Matrix of contextual features.
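The adaptive-window behaviour described in update above (reset the models of the touched actions, then retrain them on the retained memory plus the new batch) can be sketched with a toy per-action model. `ToyActionModel` and the bounded-deque memory are illustrative stand-ins, not pybandits internals:

```python
from collections import defaultdict, deque

class ToyActionModel:
    """Illustrative stand-in for a per-action model: counts successes/failures."""
    def __init__(self):
        self.successes = 0
        self.failures = 0

    def reset(self):
        self.successes = 0
        self.failures = 0

    def fit(self, rewards):
        for r in rewards:
            self.successes += r
            self.failures += 1 - r

def adaptive_window_update(models, memory, actions, rewards):
    """Reset each touched model, then retrain it on retained memory + new data."""
    for a, r in zip(actions, rewards):
        memory[a].append(r)              # bounded deque acts as the window
    for a in set(actions):
        models[a].reset()                # retrain-from-scratch semantics
        models[a].fit(memory[a])

models = defaultdict(ToyActionModel)
memory = defaultdict(lambda: deque(maxlen=100))  # window size 100, illustrative
adaptive_window_update(models, memory, ["a1", "a1", "a2"], [1, 0, 1])
```

The bounded deque discards the oldest observations once the window is full, so retraining from scratch on the memory approximates the reset-and-retrain scheme described above.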


class pybandits.actions_manager.SmabActionsManager(delta: PositiveProbability | None = None, actions: Dict[ActionId, Model] | None = None, action_ids: Set[ActionId] | None = None, quantitative_action_ids: Set[ActionId] | None = None, kwargs: Dict[str, Any] | None = None, actions_with_change: Set[Tuple[ActionId, Annotated[int, Ge(ge=0)]]] | None = None)

Bases: ActionsManager, BaseModel, Generic[SmabModelType]

Manages actions and their associated models for sMAB models. The class accounts for non-stationarity by providing an adaptive window scheme for action updates.

Parameters:
  • actions (Dict[ActionId, BaseBeta]) – The list of possible actions, and their associated Model.

  • delta (Optional[PositiveProbability], 0.1 if not specified.) – The confidence level for the adaptive window.

actions: Dict[ActionId, SmabModelType]
actions_with_change: Set[Tuple[ActionId, NonNegativeInt]]
classmethod all_actions_have_same_number_of_objectives(actions: Dict[ActionId, SmabModelType])
delta: PositiveProbability | None
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'json_encoders': {<class 'collections.deque'>: <class 'list'>}}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

update(actions: List[ActionId], rewards: List[BinaryReward] | List[List[BinaryReward]], quantities: List[float | List[float] | None] | None, actions_memory: List[ActionId] | None = None, rewards_memory: List[BinaryReward] | List[List[BinaryReward]] | None = None)

Update the models associated with the given actions using the provided rewards. When an adaptive window is used, the update is performed by resetting the action models and retraining them on the new data.

Parameters:
  • actions (List[ActionId]) – The selected action for each sample.

  • rewards (Union[List[BinaryReward], List[List[BinaryReward]]]) – The reward for each sample.

  • quantities (Optional[List[Union[float, List[float], None]]]) – The value associated with each action. If None, the value is not used, i.e. the action is non-quantitative.

  • actions_memory (Optional[List[ActionId]]) – List of previously selected actions.

  • rewards_memory (Optional[Union[List[BinaryReward], List[List[BinaryReward]]]]) – List of previously collected rewards.
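For the Beta models managed here, an sMAB update reduces to conjugate Beta–Bernoulli bookkeeping, and prediction to Thompson sampling. A minimal sketch (the action names and Beta(1, 1) priors are illustrative, not pybandits defaults):

```python
import random

# Beta(1, 1) priors for two illustrative actions: [alpha, beta]
posteriors = {"a1": [1, 1], "a2": [1, 1]}

def update(actions, rewards):
    """Increment the Beta parameters of each selected action."""
    for action, reward in zip(actions, rewards):
        posteriors[action][0] += reward        # alpha grows with successes
        posteriors[action][1] += 1 - reward    # beta grows with failures

def predict():
    """Thompson sampling: draw from each posterior, pick the argmax."""
    samples = {a: random.betavariate(alpha, beta)
               for a, (alpha, beta) in posteriors.items()}
    return max(samples, key=samples.get)

update(["a1", "a1", "a2"], [1, 1, 0])
```

After the call, the posterior for "a1" is Beta(3, 1) and for "a2" is Beta(1, 2), so `predict()` favours "a1" in expectation while still exploring.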

pybandits.actions_manager.SmabActionsManagerCC

alias of SmabActionsManager[Union[BetaCC, SmabZoomingModelCC]]

pybandits.actions_manager.SmabActionsManagerMO

alias of SmabActionsManager[BetaMO]

pybandits.actions_manager.SmabActionsManagerMOCC

alias of SmabActionsManager[BetaMOCC]

pybandits.actions_manager.SmabActionsManagerSO

alias of SmabActionsManager[Union[Beta, SmabZoomingModel]]


pybandits.smab_simulator

class pybandits.smab_simulator.SmabSimulator(*, smab: BaseSmabBernoulli, n_updates: Annotated[int, Gt(gt=0)] = 10, batch_size: Annotated[int, Gt(gt=0)] = 100, probs_reward: Dict[ActionId, Probability | Callable[[ndarray], Probability]] | Dict[str, Dict[ActionId, Probability | Callable[[ndarray], Probability]]] | None = None, save: bool = False, path: str = '', file_prefix: str = '', random_seed: Annotated[int, Ge(ge=0)] | None = None, verbose: bool = False, visualize: bool = False)

Bases: Simulator

Simulate environment for stochastic multi-armed bandits.

This class performs simulation of stochastic Multi-Armed Bandits (sMAB). Data are processed in batches of size n >= 1. For each batch of simulated samples, the sMAB selects one action per sample and collects the corresponding simulated reward. Prior parameters are then updated based on the rewards returned for the recommended actions.

Parameters:
  • mab (BaseSmabBernoulli) – sMAB model.

mab: BaseSmabBernoulli
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_Simulator__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

probs_reward: Dict[ActionId, Probability | Callable[[ndarray], Probability]] | Dict[str, Dict[ActionId, Probability | Callable[[ndarray], Probability]]] | None
classmethod replace_null_and_validate_probs_reward(values)
classmethod validate_probs_reward_columns(values)
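The batch loop sketched above (predict a batch, draw Bernoulli rewards from `probs_reward`, update the priors) can be illustrated in plain Python. The ground-truth probabilities, action names, and Beta(1, 1) priors below are made up for illustration and stand in for the SmabSimulator internals:

```python
import random

random.seed(0)
probs_reward = {"a1": 0.7, "a2": 0.3}            # illustrative ground truth
posteriors = {a: [1, 1] for a in probs_reward}   # Beta(1, 1) priors

n_updates, batch_size = 10, 100
for _ in range(n_updates):
    # Predict one action per sample via Thompson sampling
    actions = []
    for _ in range(batch_size):
        samples = {a: random.betavariate(*p) for a, p in posteriors.items()}
        actions.append(max(samples, key=samples.get))
    # Simulate Bernoulli rewards, then update the priors
    rewards = [int(random.random() < probs_reward[a]) for a in actions]
    for a, r in zip(actions, rewards):
        posteriors[a][0] += r
        posteriors[a][1] += 1 - r
```

Each processed sample adds exactly one pseudo-observation to the selected action's posterior, so after 10 updates of 100 samples the posteriors hold 1000 observations in total.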

pybandits.cmab_simulator

class pybandits.cmab_simulator.CmabSimulator(*, cmab: BaseCmabBernoulli, n_updates: Annotated[int, Gt(gt=0)] = 10, batch_size: Annotated[int, Gt(gt=0)] = 100, probs_reward: Dict[ActionId, Callable[[ndarray], Probability] | Callable[[ndarray, ndarray], Probability]] | Dict[str, Dict[ActionId, Callable[[ndarray], Probability] | Callable[[ndarray, ndarray], Probability]]] | None = None, save: bool = False, path: str = '', file_prefix: str = '', random_seed: Annotated[int, Ge(ge=0)] | None = None, verbose: bool = False, visualize: bool = False, context: ndarray, group: List | None = None)

Bases: Simulator

Simulate environment for contextual multi-armed bandit models.

This class simulates the information required by the contextual bandit. Generated data are processed by the bandit in batches of size n >= 1. For each batch of samples, actions are recommended by the bandit and the corresponding simulated rewards are collected. Bandit policy parameters are then updated based on the rewards returned for the recommended actions.

Parameters:
  • mab (BaseCmabBernoulli) – Contextual multi-armed bandit model.

  • context (np.ndarray of shape (n_samples, n_features)) – Context matrix of sample features.

  • group (Optional[List] with length=n_samples) – Group to which each sample belongs. Samples that belong to the same group have features drawn from the same distribution and the same probability of receiving a positive/negative feedback from each action. If not supplied, all samples are assigned to the same group.

batch_size: PositiveInt
context: ndarray
file_prefix: str
group: List | None
mab: BaseCmabBernoulli
n_updates: PositiveInt
path: str
random_seed: NonNegativeInt | None
save: bool
verbose: bool
visualize: bool
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_Simulator__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

probs_reward: Dict[ActionId, Callable[[ndarray], Probability] | Callable[[ndarray, ndarray], Probability]] | Dict[str, Dict[ActionId, Callable[[ndarray], Probability] | Callable[[ndarray, ndarray], Probability]]] | None
classmethod replace_nulls_and_validate_sizes_and_dtypes(values)
classmethod validate_probs_reward_columns(values)
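In the contextual case, `probs_reward` maps each action to a callable on the context, so the simulated reward probability depends on the sample's features. A sketch of that data-generation step (the sigmoid ground truth, weights, and the uniformly random logging policy standing in for the bandit are all illustrative):

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One reward-probability callable per action, evaluated on the context row
probs_reward = {
    "a1": lambda x: sigmoid(2.0 * x[0] - x[1]),
    "a2": lambda x: sigmoid(-x[0] + 2.0 * x[1]),
}

# Context matrix of shape (n_samples, n_features)
context = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(5)]

# A uniformly random policy stands in for the bandit's recommendations
actions = [random.choice(list(probs_reward)) for _ in context]
rewards = [int(random.random() < probs_reward[a](x))
           for a, x in zip(actions, context)]
```

The simulator draws one Bernoulli reward per sample from the selected action's context-dependent probability, exactly as the batch loop above describes.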

pybandits.offline_policy_evaluator


pybandits.offline_policy_estimator

Comprehensive Offline Policy Evaluation (OPE) estimators.

This module provides a complete set of estimators for OPE.

class pybandits.offline_policy_estimator.BalancedInverseProbabilityWeighting(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: GeneralizedInverseProbabilityWeighting

Balanced Inverse Probability Weighting estimator.

References

Balanced Off-Policy Evaluation in General Action Spaces (Sondhi, Arbour, and Dimmery, 2020) https://arxiv.org/pdf/1906.03694

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.


estimate_sample_rewards(reward: ndarray, expected_importance_weight: ndarray, **kwargs) ndarray

Estimate the sample rewards.

Parameters:
  • reward (np.ndarray) – Array of rewards corresponding to each action.

  • expected_importance_weight (np.ndarray) – Array of expected importance weights.

Returns:

sample_reward – Estimated rewards.

Return type:

np.ndarray

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: ClassVar = 'b-ipw'
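In IPW-style estimators, the per-sample estimate is the observed reward scaled by an importance weight, and the policy value is their mean. The sketch below uses the plain, unnormalized form; whether pybandits self-normalizes the weights is not shown in this reference, so treat the exact scaling as an assumption:

```python
import numpy as np

def estimate_sample_rewards(reward, importance_weight):
    """Plain IPW: scale each observed reward by its importance weight."""
    return reward * importance_weight

# Illustrative logged data: binary rewards and importance weights
reward = np.array([1.0, 0.0, 1.0, 1.0])
weight = np.array([0.5, 2.0, 1.5, 1.0])

sample_rewards = estimate_sample_rewards(reward, weight)
policy_value = sample_rewards.mean()
```

Weights above 1 up-weight samples where the evaluation policy is more likely than the logging policy to take the logged action, and vice versa.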
class pybandits.offline_policy_estimator.BaseOfflinePolicyEstimator(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: PyBanditsBaseModel, ABC

Base class for all OPE estimators.

This class defines the interface for all OPE estimators and provides a common method for estimating the policy value.

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

alpha: Float_0_1
estimate_policy_value_with_confidence_interval(**kwargs) Tuple[float, float, float, float]

Estimate the policy value with a confidence interval.

Parameters:

action (np.ndarray) – Array of actions taken.

Returns:

Estimated policy value, mean, lower bound, and upper bound of the confidence interval.

Return type:

Tuple[float, float, float, float]

abstract estimate_sample_rewards(**kwargs) ndarray

Estimate sample rewards.

Returns:

Estimated sample rewards.

Return type:

np.ndarray

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_bootstrap_samples: int
name: ClassVar
random_state: int | None
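estimate_policy_value_with_confidence_interval pairs the point estimate with a bootstrap confidence interval, presumably a percentile bootstrap over the per-sample reward estimates. A sketch under that assumption (the data and seed are illustrative):

```python
import numpy as np

def bootstrap_ci(sample_rewards, alpha=0.05, n_bootstrap_samples=10000,
                 random_state=0):
    """Percentile bootstrap over per-sample reward estimates."""
    rng = np.random.default_rng(random_state)
    n = len(sample_rewards)
    means = np.array([
        rng.choice(sample_rewards, size=n, replace=True).mean()
        for _ in range(n_bootstrap_samples)
    ])
    lower = np.percentile(means, 100 * alpha / 2)
    upper = np.percentile(means, 100 * (1 - alpha / 2))
    return sample_rewards.mean(), means.mean(), lower, upper

value, boot_mean, lower, upper = bootstrap_ci(np.array([0.5, 0.0, 1.5, 1.0]))
```

This matches the documented return shape: estimated policy value, bootstrap mean, and the lower and upper bounds of the (1 - alpha) confidence interval.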
class pybandits.offline_policy_estimator.DirectMethod(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: BaseOfflinePolicyEstimator

Direct Method (DM) estimator.

This estimator uses the evaluation policy to estimate the sample rewards.

References

The Offset Tree for Learning with Partial Labels (Beygelzimer and Langford, 2009) https://arxiv.org/pdf/0812.4044

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

estimate_sample_rewards(estimated_policy: ndarray, expected_reward: ndarray, **kwargs) ndarray

Estimate the sample rewards.

Parameters:
  • estimated_policy (np.ndarray) – Array of action distributions.

  • expected_reward (np.ndarray) – Array of expected rewards.

Returns:

sample_reward – Estimated sample rewards.

Return type:

np.ndarray

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: ClassVar = 'dm'
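
Concretely, given the two arrays that estimate_sample_rewards asks for, each DM sample reward is the model's expected reward averaged over the evaluation policy's action distribution. A small illustration with invented data:

```python
import numpy as np

# Two samples, three actions: evaluation-policy distribution per sample
estimated_policy = np.array([[0.2, 0.5, 0.3],
                             [0.6, 0.1, 0.3]])
# Model-based expected reward for every (sample, action) pair
expected_reward = np.array([[0.1, 0.4, 0.7],
                            [0.5, 0.2, 0.9]])

# DM: average the model's reward over the evaluation policy's actions
sample_reward = (estimated_policy * expected_reward).sum(axis=1)
```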
class pybandits.offline_policy_estimator.DoublyRobust(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: GeneralizedDoublyRobust

Doubly Robust (DR) estimator.

References

Doubly Robust Policy Evaluation and Optimization (Dudík, Erhan, Langford, and Li, 2014) https://arxiv.org/pdf/1503.02834

More Robust Doubly Robust Off-policy Evaluation (Farajtabar, Chow, and Ghavamzadeh, 2018) https://arxiv.org/pdf/1802.03493

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_GeneralizedDoublyRobust__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

name: ClassVar = 'dr'
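
Per Dudík et al., each DR sample reward is the model-based DM term plus an importance-weighted correction evaluated at the logged action. A sketch with invented data, not the library's code:

```python
import numpy as np

action = np.array([1, 0])                 # logged actions
reward = np.array([1.0, 0.0])             # logged rewards
propensity_score = np.array([0.5, 0.25])  # logging-policy prob. of logged action
estimated_policy = np.array([[0.2, 0.5, 0.3],
                             [0.6, 0.1, 0.3]])
expected_reward = np.array([[0.1, 0.4, 0.7],
                            [0.5, 0.2, 0.9]])

idx = np.arange(len(action))
# Importance weight of the logged action under the evaluation policy
w = estimated_policy[idx, action] / propensity_score
# DM baseline plus an importance-weighted correction at the observed action
dm_term = (estimated_policy * expected_reward).sum(axis=1)
sample_reward = dm_term + w * (reward - expected_reward[idx, action])
```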
class pybandits.offline_policy_estimator.DoublyRobustWithOptimisticShrinkage(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None, shrinkage_factor: Annotated[float, Ge(ge=0)] = 0.0)

Bases: DoublyRobust

Optimistic version of the DRos estimator.

This estimator uses a shrinkage factor to shrink the importance weight in the native DR.

References

Doubly Robust Off-Policy Evaluation with Shrinkage (Su, Dimakopoulou, Krishnamurthy, and Dudik, 2020) https://arxiv.org/pdf/1907.09623

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

  • shrinkage_factor (float, default=0.0) – Shrinkage factor for the importance weights. If set to 0 or infinity, the estimator is equivalent to the native DM or DR estimators, respectively.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_GeneralizedDoublyRobust__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

name: ClassVar = 'dros-opt'
shrinkage_factor: Annotated[float, Ge(ge=0)]
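
The shrinkage mapping from Su et al. (2020) replaces each importance weight w with λw / (w² + λ), which reproduces the equivalence noted above: λ = 0 zeroes every weight (reducing to DM), while λ → ∞ leaves w unchanged (recovering DR). A sketch, not the library's implementation:

```python
import numpy as np

def optimistic_shrinkage(w, shrinkage_factor):
    """Su et al. (2020) optimistic shrinkage: w_lambda = lambda*w / (w**2 + lambda)."""
    if shrinkage_factor == 0.0:
        return np.zeros_like(w)  # all weight on the model: reduces to DM
    return shrinkage_factor * w / (w ** 2 + shrinkage_factor)

w = np.array([0.5, 2.0, 10.0])
shrunk = optimistic_shrinkage(w, shrinkage_factor=1.0)
```

Note that large weights are shrunk the hardest, which is what controls the variance of the resulting DR correction term.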
class pybandits.offline_policy_estimator.DoublyRobustWithPessimisticShrinkage(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None, shrinkage_factor: Annotated[float, Gt(gt=0)] = inf)

Bases: DoublyRobust

Pessimistic version of the DRos estimator.

This estimator uses a shrinkage factor to shrink the importance weight in the native DR.

References

Doubly Robust Off-Policy Evaluation with Shrinkage (Su, Dimakopoulou, Krishnamurthy, and Dudik, 2020) https://arxiv.org/pdf/1907.09623

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

  • shrinkage_factor (float, default=inf) – Shrinkage factor for the importance weights.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_GeneralizedDoublyRobust__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

name: ClassVar = 'dros-pess'
shrinkage_factor: Annotated[float, Gt(gt=0)]
class pybandits.offline_policy_estimator.GeneralizedDoublyRobust(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: BaseOfflinePolicyEstimator, ABC

Abstract generalization of the Doubly Robust (DR) estimator.

References

Doubly Robust Policy Evaluation and Optimization (Dudík, Erhan, Langford, and Li, 2014) https://arxiv.org/pdf/1503.02834

More Robust Doubly Robust Off-policy Evaluation (Farajtabar, Chow, and Ghavamzadeh, 2018) https://arxiv.org/pdf/1802.03493

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

estimate_sample_rewards(action: ndarray, reward: ndarray, propensity_score: ndarray, estimated_policy: ndarray, expected_reward: ndarray, **kwargs) ndarray

Estimate the sample rewards.

Parameters:
  • action (np.ndarray) – Array of actions taken.

  • reward (np.ndarray) – Array of rewards corresponding to each action.

  • propensity_score (np.ndarray) – Array of propensity scores.

  • estimated_policy (np.ndarray) – Array of action distributions.

  • expected_reward (np.ndarray) – Array of expected rewards.

Returns:

sample_reward – Estimated rewards.

Return type:

np.ndarray

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_GeneralizedDoublyRobust__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

class pybandits.offline_policy_estimator.GeneralizedInverseProbabilityWeighting(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: BaseOfflinePolicyEstimator, ABC

Abstract generalization of the Inverse Probability Weighting (IPW) estimator.

References

Learning from Logged Implicit Exploration Data (Strehl, Langford, Li, and Kakade, 2010) https://arxiv.org/pdf/1003.0120

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

estimate_sample_rewards(reward: ndarray, shrinkage_method: Callable | None, **kwargs) ndarray

Estimate the sample rewards.

Parameters:
  • reward (np.ndarray) – Array of rewards corresponding to each action.

  • shrinkage_method (Optional[Callable]) – Shrinkage method for the importance weights.

Returns:

sample_reward – Estimated sample rewards.

Return type:

np.ndarray

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pybandits.offline_policy_estimator.InverseProbabilityWeighting(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: GeneralizedInverseProbabilityWeighting

Inverse Probability Weighting (IPW) estimator.

References

Learning from Logged Implicit Exploration Data (Strehl, Langford, Li, and Kakade, 2010) https://arxiv.org/pdf/1003.0120

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

estimate_sample_rewards(action: ndarray, reward: ndarray, propensity_score: ndarray, estimated_policy: ndarray, shrinkage_method: Callable | None = None, **kwargs) ndarray

Estimate the sample rewards.

Parameters:
  • action (np.ndarray) – Array of actions taken.

  • reward (np.ndarray) – Array of rewards corresponding to each action.

  • propensity_score (np.ndarray) – Array of propensity scores.

  • estimated_policy (np.ndarray) – Array of action distributions.

  • shrinkage_method (Optional[Callable], default=None) – Shrinkage method for the importance weights.

Returns:

sample_reward – Estimated sample rewards.

Return type:

np.ndarray

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: ClassVar = 'ipw'
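
Each IPW sample reward is the logged reward reweighted by the ratio of the evaluation policy's probability of the logged action to the logging policy's propensity score. An illustration with invented data:

```python
import numpy as np

action = np.array([1, 0, 2])
reward = np.array([1.0, 0.0, 1.0])
propensity_score = np.array([0.5, 0.25, 0.2])
estimated_policy = np.array([[0.2, 0.5, 0.3],
                             [0.6, 0.1, 0.3],
                             [0.1, 0.3, 0.6]])

idx = np.arange(len(action))
# Importance weight: evaluation-policy prob. / logging-policy propensity
w = estimated_policy[idx, action] / propensity_score
sample_reward = w * reward
```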
class pybandits.offline_policy_estimator.ReplayMethod(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: BaseOfflinePolicyEstimator

Replay Method estimator.

This estimator is a simple baseline that estimates the policy value by averaging the rewards of the matched samples.

References

Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms (Li, Chu, Langford, and Wang, 2011) https://arxiv.org/pdf/1003.5956

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

estimate_sample_rewards(action: ndarray, reward: ndarray, estimated_policy: ndarray, **kwargs) ndarray

Estimate the sample rewards.

Parameters:
  • action (np.ndarray) – Array of actions taken.

  • reward (np.ndarray) – Array of rewards corresponding to each action.

  • estimated_policy (np.ndarray) – Array of action distributions.

Returns:

sample_reward – Estimated sample rewards.

Return type:

np.ndarray

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: ClassVar = 'rep'
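
For a deterministic evaluation policy, the replay method keeps only the logged samples whose action matches the policy's choice and averages their rewards. A sketch with invented data:

```python
import numpy as np

action = np.array([1, 0, 2, 1])
reward = np.array([1.0, 0.0, 1.0, 0.0])
# Deterministic evaluation policy: one-hot distribution per sample
estimated_policy = np.array([[0.0, 1.0, 0.0],
                             [0.0, 0.0, 1.0],
                             [0.0, 0.0, 1.0],
                             [0.0, 1.0, 0.0]])

chosen = estimated_policy.argmax(axis=1)
matched = action == chosen             # keep samples where the policies agree
policy_value = reward[matched].mean()  # average reward over matched samples
```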
class pybandits.offline_policy_estimator.SelfNormalizedDoublyRobust(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: GeneralizedDoublyRobust

Self-Normalized Doubly Robust (SNDR) estimator.

This estimator applies self-normalized importance weights in the correction term of the DR estimator.

References

Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning (Kallus and Uehara, 2019) https://arxiv.org/pdf/1906.03735

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_GeneralizedDoublyRobust__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

name: ClassVar = 'sndr'
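
A sketch of the self-normalization with invented data (not the library's code): the importance weights in the DR correction term are divided by their empirical mean, which keeps the estimate bounded when individual weights blow up.

```python
import numpy as np

action = np.array([1, 0])
reward = np.array([1.0, 0.0])
propensity_score = np.array([0.5, 0.25])
estimated_policy = np.array([[0.2, 0.5, 0.3],
                             [0.6, 0.1, 0.3]])
expected_reward = np.array([[0.1, 0.4, 0.7],
                            [0.5, 0.2, 0.9]])

idx = np.arange(len(action))
w = estimated_policy[idx, action] / propensity_score
w_sn = w / w.mean()  # self-normalized importance weights (average to 1)
dm_term = (estimated_policy * expected_reward).sum(axis=1)
sample_reward = dm_term + w_sn * (reward - expected_reward[idx, action])
```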
class pybandits.offline_policy_estimator.SelfNormalizedInverseProbabilityWeighting(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: InverseProbabilityWeighting

Self-Normalized Inverse Propensity Score (SNIPS) estimator.

References

The Self-normalized Estimator for Counterfactual Learning (Swaminathan and Joachims, 2015) https://papers.nips.cc/paper_files/paper/2015/file/39027dfad5138c9ca0c474d71db915c3-Paper.pdf

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

estimate_sample_rewards(action: ndarray, reward: ndarray, propensity_score: ndarray, estimated_policy: ndarray, shrinkage_method: Callable | None = None, **kwargs) ndarray

Estimate the sample rewards.

Parameters:
  • action (np.ndarray) – Array of actions taken.

  • reward (np.ndarray) – Array of rewards corresponding to each action.

  • propensity_score (np.ndarray) – Array of propensity scores.

  • estimated_policy (np.ndarray) – Array of action distributions.

  • shrinkage_method (Optional[Callable]) – Shrinkage method for the importance weights.

Returns:

sample_reward – Estimated sample rewards.

Return type:

np.ndarray

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: ClassVar = 'snips'
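
SNIPS divides each importance weight by the empirical mean of all weights before reweighting the rewards, trading a small bias for much lower variance than plain IPW. An illustration with invented data:

```python
import numpy as np

action = np.array([1, 0, 2])
reward = np.array([1.0, 0.0, 1.0])
propensity_score = np.array([0.5, 0.25, 0.2])
estimated_policy = np.array([[0.2, 0.5, 0.3],
                             [0.6, 0.1, 0.3],
                             [0.1, 0.3, 0.6]])

idx = np.arange(len(action))
w = estimated_policy[idx, action] / propensity_score
# Normalize the weights by their empirical mean so they average to 1
w_sn = w / w.mean()
sample_reward = w_sn * reward
```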
class pybandits.offline_policy_estimator.SubGaussianDoublyRobust(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None)

Bases: GeneralizedDoublyRobust

SubGaussian Doubly Robust estimator.

References

Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning (Metelli, Russo, and Restelli, 2021) https://proceedings.neurips.cc/paper_files/paper/2021/file/4476b929e30dd0c4e8bdbcc82c6ba23a-Paper.pdf

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_GeneralizedDoublyRobust__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

name: ClassVar = 'sg-dr'
class pybandits.offline_policy_estimator.SubGaussianInverseProbabilityWeighting(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None, shrinkage_factor: Float_0_1 = 0.0)

Bases: InverseProbabilityWeighting

SubGaussian Inverse Probability Weighting estimator.

References

Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning (Metelli, Russo, and Restelli, 2021) https://proceedings.neurips.cc/paper_files/paper/2021/file/4476b929e30dd0c4e8bdbcc82c6ba23a-Paper.pdf

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (int, default=None) – Random seed for bootstrap sampling.

  • shrinkage_factor (Float01, default=0.0) – Shrinkage factor for the importance weights.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

name: ClassVar = 'sg-ipw'
shrinkage_factor: Float_0_1
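
Metelli et al. (2021) propose power-mean weight corrections; the s = 1 case maps each weight w to w / ((1 − λ) + λw), so λ = 0 recovers plain IPW while larger λ flattens heavy weights. The exact correction pybandits applies is not documented here; this sketch only illustrates the paper's s = 1 case:

```python
import numpy as np

def subgaussian_shrinkage(w, shrinkage_factor):
    """Power-mean (s=1) weight correction from Metelli et al. (2021):
    w_lambda = w / ((1 - lambda) + lambda * w)."""
    return w / ((1.0 - shrinkage_factor) + shrinkage_factor * w)

w = np.array([0.5, 2.0, 10.0])
shrunk = subgaussian_shrinkage(w, shrinkage_factor=0.5)
```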
class pybandits.offline_policy_estimator.SwitchDoublyRobust(*, alpha: Float_0_1 = 0.05, n_bootstrap_samples: int = 10000, random_state: int | None = None, switch_threshold: float = inf)

Bases: DoublyRobust

Switch Doubly Robust (Switch-DR) estimator.

This estimator uses a switching rule based on the propensity score to combine the DR and IPS estimators.

References

Optimal and Adaptive Off-policy Evaluation in Contextual Bandits (Wang, Agarwal, and Dudik, 2017) https://arxiv.org/pdf/1507.02646

Parameters:
  • alpha (Float01, default=0.05) – Significance level for confidence interval estimation.

  • n_bootstrap_samples (int, default=10000) – Number of bootstrap samples for confidence interval estimation.

  • random_state (Optional[int], default=None) – Random seed for bootstrap sampling.

  • switch_threshold (float, default=inf) – Threshold for the importance weight to switch between the DR and IPS estimators.

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(_GeneralizedDoublyRobust__context: Any) None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

name: ClassVar = 'switch-dr'
switch_threshold: float
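
The switching rule from Wang et al. (2017) keeps the importance-weighted correction only where the weight stays at or below switch_threshold and falls back to the model-based DM term elsewhere, so threshold = inf recovers plain DR. A sketch with invented data:

```python
import numpy as np

def switch_indicator(w, switch_threshold):
    """Wang et al. (2017): apply the IPS correction only where the weight is small."""
    return (w <= switch_threshold).astype(float)

action = np.array([1, 0])
reward = np.array([1.0, 0.0])
propensity_score = np.array([0.5, 0.25])
estimated_policy = np.array([[0.2, 0.5, 0.3],
                             [0.6, 0.1, 0.3]])
expected_reward = np.array([[0.1, 0.4, 0.7],
                            [0.5, 0.2, 0.9]])

idx = np.arange(len(action))
w = estimated_policy[idx, action] / propensity_score
s = switch_indicator(w, switch_threshold=2.0)
dm_term = (estimated_policy * expected_reward).sum(axis=1)
# Large-weight samples fall back to the model-based DM term alone
sample_reward = dm_term + s * w * (reward - expected_reward[idx, action])
```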