{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Introduction\n", "\n", "This notebook demonstrates the use of Bayesian models for classification tasks, focusing on Bayesian Logistic Regression and Bayesian Neural Networks. Bayesian methods provide a probabilistic approach to machine learning, allowing for uncertainty quantification in predictions, which is particularly useful in decision-making under uncertainty.\n", "The notebook is based on the example in https://www.pymc.io/projects/examples/en/latest/variational_inference/bayesian_neural_network_advi.html\n", "\n", "### Objectives\n", "\n", "#### Train Bayesian models, including:\n", "Bayesian Logistic Regression: A probabilistic version of logistic regression.\n", "Bayesian Neural Networks: Neural networks with Bayesian inference for parameter estimation.\n", "\n", "#### Evaluation:\n", "\n", "Evaluate the performance of the models using metrics such as:\n", "Accuracy: To measure the proportion of correct predictions.\n", "ROC-AUC: To assess the model's ability to distinguish between classes.\n", "Visualization:\n", "\n", "Visualize the results, including decision boundaries and uncertainty estimates, to better understand the behavior of Bayesian models." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pymc as pm\n", "from sklearn.datasets import make_moons\n", "from sklearn.metrics import accuracy_score, roc_auc_score\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import scale\n", "\n", "from pybandits.model import BayesianLogisticRegression, BayesianNeuralNetwork\n", "\n", "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot_map(value, grid, x_test, pred, metric):\n", " cmap = plt.get_cmap(\"coolwarm\")\n", "\n", " fig, ax = plt.subplots(figsize=(10, 6))\n", "\n", " # Create the contour plot\n", " contour = ax.contourf(*grid, value.reshape(100, 100), cmap=cmap)\n", "\n", " # Scatter plot for the test points\n", " ax.scatter(x_test[pred == 0, 0], x_test[pred == 0, 1], label=\"Class 0\")\n", " ax.scatter(x_test[pred == 1, 0], x_test[pred == 1, 1], color=\"r\", label=\"Class 1\")\n", "\n", " # Add a colorbar\n", " cbar = plt.colorbar(contour, ax=ax)\n", " cbar.ax.set_ylabel(f\"{metric} of Output\")\n", "\n", " # Set axis limits and labels\n", " ax.set(xlim=(-3, 3), ylim=(-3, 3), xlabel=\"X1\", ylabel=\"X2\")\n", "\n", " # Add a legend\n", " ax.legend()\n", "\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We first generate a binarly labeled data set, with a two dimensional feature space, and is not lineraly seprabale.\n", "We then split the data set to a training data setm and a test data set." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X, Y = make_moons(noise=0.2, random_state=0, n_samples=1000)\n", "X = scale(X)\n", "x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.5)\n", "y_train = y_train.tolist()\n", "y_test = y_test.tolist()\n", "fig, ax = plt.subplots()\n", "ax.scatter(X[Y == 0, 0], X[Y == 0, 1], label=\"Class 0\")\n", "ax.scatter(X[Y == 1, 0], X[Y == 1, 1], color=\"r\", label=\"Class 1\")\n", "ax.set(xlabel=\"X1\", ylabel=\"X2\", title=\"Data set\")\n", "ax.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using Bayesian logistic regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the cold_start method of BayesianLogisticRegression we can create a Bayesian Logistic Regresion model with default Student-T parameters.\n", "Default parameters values are mu (mean) = 0, sigma (standard deviation) = 10, nu (degrees of freedom) = 5.\n", "\n", "In this case we define the model to run with Variational inference which is usually faster than MCMC." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "blr = BayesianLogisticRegression.cold_start(n_features=2, update_method=\"VI\", update_kwargs={\"fit\": {\"n\": 40000}})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Applying the update method will calculate the approximated posterior of the parameters, and update the model. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "blr.update(context=x_train, rewards=y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to sample from the posterior distribution, we are creating the model with the new updated parameters, and generate samples using pm.sample_prior_predictive.\n", "\n", "We consider a point to belong to Class 1, if more than half of the samples belong to Class 1 is higher than 0.5." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_model = blr.create_model(x=x_test, is_predict=True)\n", "with _model:\n", " samples = pm.sample_prior_predictive(samples=500)\n", "\n", "pred_proba = samples[\"prior_predictive\"][\"out\"].squeeze().values.mean(axis=0)\n", "pred = pred_proba > 0.5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the Bayesian Logistic Regression model is linear, it's not able do separate the space accurately." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots()\n", "ax.scatter(x_test[pred == 0, 0], x_test[pred == 0, 1], label=\"Class 0\")\n", "ax.scatter(x_test[pred == 1, 0], x_test[pred == 1, 1], color=\"r\", label=\"Class 1\")\n", "ax.set(title=\"Predicted labels in testing set\", xlabel=\"X1\", ylabel=\"X2\")\n", "ax.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we calculate the accuracy and AUC score of the model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Calculate accuracy\n", "accuracy = accuracy_score(y_test, pred)\n", "print(f\"Accuracy: {accuracy}\")\n", "\n", "# Calculate AUC\n", "auc = roc_auc_score(y_test, pred_proba)\n", "print(f\"AUC: {auc}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can check the convergence by plotting the Evidence Lower Bound (ELBO) over iterations.\n", "ELBO is a metric used in Variational Inference to measure the convergence of the approximation to the true posterior.\n", "A higher ELBO indicates a better approximation.\n", "https://en.wikipedia.org/wiki/Evidence_lower_bound\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(blr.approx_history)\n", "plt.ylabel(\"ELBO\")\n", "plt.xlabel(\"iteration\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Predictions + uncertainty map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to visualize the posterior predictive distribution, we can create a grid of points in the input space and evaluate the model at those points." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "grid = np.mgrid[-3:3:100j, -3:3:100j].astype(float)\n", "grid_2d = grid.reshape(2, -1).T\n", "\n", "_model = blr.create_model(grid_2d, is_predict=True)\n", "with _model:\n", " samples = pm.sample_prior_predictive(samples=500)\n", "\n", "prob = samples[\"prior_predictive\"][\"out\"].squeeze().values.mean(axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the average of the output of the model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_map(prob, grid, x_test, pred, \"Mean\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the standard deviation of the output of the model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "std = samples[\"prior_predictive\"][\"out\"].squeeze().values.std(axis=0)\n", "plot_map(std, grid, x_test, pred, \"Standard Deviation\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using Bayesian Neural Network" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the following section we will demonstrate how a bayesian neural network (BNN) can capture complex patterns and relationships in data through multiple layers of abstraction, leading to better performance on tasks with high-dimensional features. In contrast, bayesian logistic regression is limited to linear relationships and may not effectively model intricate dependencies within the data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to define the model structure, we set the hidden_dim_list=[5, 5], which means that the model structure will have a 2 layers network - with first and second layers having five neurons.\n", "\n", "The neurons' Student-T distribution are initialized with mu = 0, sigma = 1 and nu = 5" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dist_params_init = {\"mu\": 0, \"sigma\": 1, \"nu\": 5}\n", "bnn = BayesianNeuralNetwork.cold_start(\n", " n_features=2,\n", " hidden_dim_list=[5, 5],\n", " update_method=\"VI\",\n", " dist_params_init=dist_params_init,\n", " update_kwargs={\"fit\": {\"n\": 100000}},\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Applying the update method will calculate the approximated posterior of the parameters, and update the model. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bnn.update(context=x_train, rewards=y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the separation is much better than the bayesian logistic model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_model = bnn.create_model(x_test, is_predict=True)\n", "with _model:\n", " samples = pm.sample_prior_predictive(samples=500)\n", "\n", "pred_proba = samples[\"prior_predictive\"][\"out\"].squeeze().values.mean(axis=0)\n", "pred = pred_proba > 0.5\n", "\n", "fig, ax = plt.subplots()\n", "ax.scatter(x_test[pred == 0, 0], x_test[pred == 0, 1])\n", "ax.scatter(x_test[pred == 1, 0], x_test[pred == 1, 1], color=\"r\")\n", "ax.set(title=\"Predicted labels in testing set\", xlabel=\"X\", ylabel=\"Y\");" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Calculate accuracy\n", "accuracy = accuracy_score(y_test, pred)\n", "print(f\"Accuracy: {accuracy}\")\n", "\n", "# Calculate AUC\n", "auc = roc_auc_score(y_test, pred_proba)\n", "print(f\"AUC: {auc}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plotting the ELBO convergence" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(bnn.approx_history)\n", "plt.ylabel(\"ELBO\")\n", "plt.xlabel(\"iteration\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Predictions + uncertainty map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now visualize the posterior predictive distribution of output of the bayesian neural network model\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "grid = np.mgrid[-3:3:100j, -3:3:100j].astype(float)\n", "grid_2d = grid.reshape(2, -1).T\n", "_model = bnn.create_model(grid_2d, is_predict=True)\n", "with _model:\n", " samples = pm.sample_prior_predictive(samples=500)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the average of the output of the model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mean = samples[\"prior_predictive\"][\"out\"].squeeze().values.mean(axis=0)\n", "plot_map(mean, grid, x_test, pred, \"Mean\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the standard deviation of the output of the model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "std = samples[\"prior_predictive\"][\"out\"].squeeze().values.std(axis=0)\n", "plot_map(std, grid, x_test, pred, \"Standard Deviation\")" ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 2 }