model

class LinearModel

Bases: Model

LinearModel object.

Attributes

data: pandas.DataFrame

Data for the linear regression model: a pandas.DataFrame containing all regressor variables \(X\) and the response variable \(y\).

response_variable: str

Response variable \(y\) of the linear model.

priors: dict

Priors for the regressors’ and variance parameters.

variable_names: list

The list of all model variables: the regressors \(X\), including the 'intercept' and the 'variance' \(\sigma^2\).

posteriors: dict

Posterior samples. Posteriors and their samples are stored as key-value pairs. Each sample is a numpy.ndarray with a number of rows equal to the number of iterations and a number of columns equal to the number of Markov chains.

Methods

posteriors_to_frame()

It organizes the posteriors in a pandas.DataFrame.

residuals()

It computes the residuals \(\epsilon\) with respect to predicted values \(\hat{y}\).

predict_distribution()

It predicts a posterior distribution for unobserved values.

likelihood()

It computes the likelihood of observations response_variable given a model 'mean' and 'variance'.

log_likelihood()

It computes the log likelihood of observations response_variable given a model 'mean' and 'variance'.

property data: DataFrame

Data for the linear regression model: a pandas.DataFrame containing all regressor variables \(X\) and the response variable \(y\).

Returns

pandas.DataFrame

Observed data of the model. It cannot be empty. It must contain regressor variables \(X\) and the response_variable \(y\).

Raises

TypeError

If data is not an instance of pandas.DataFrame.

ValueError

If data is an empty pandas.DataFrame.

likelihood(data: DataFrame) → ndarray

It computes the likelihood of observations response_variable given a model 'mean' and 'variance'.

Parameters

data: pandas.DataFrame

Data to use for likelihood computation. It cannot be empty. It must contain columns response_variable, 'mean' and 'variance'.

Returns

numpy.ndarray

Array of computed likelihoods. It has the same length as data. Each element is the likelihood computed for the corresponding row of data.

Raises

TypeError

If data is not an instance of pandas.DataFrame.

ValueError

Notes

The likelihood is computed with the normal distribution probability density function:

\[L(y) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp{- \frac{\left(y - \mu \right)^2}{2 \sigma^2}}\]

where \(\mu\) is the 'mean' column and \(\sigma^2\) is the 'variance' column.
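The formula above can be sketched row by row with NumPy. This is a minimal illustration, not the library's implementation: the function name `normal_likelihood` and the column name `'y'` are hypothetical, and the response variable name is passed in explicitly rather than read from a model.

```python
import numpy as np
import pandas as pd


def normal_likelihood(data: pd.DataFrame, response_variable: str) -> np.ndarray:
    """Row-wise normal density of the response given 'mean' and 'variance' columns."""
    y = data[response_variable].to_numpy(dtype=float)
    mu = data['mean'].to_numpy(dtype=float)
    sigma2 = data['variance'].to_numpy(dtype=float)
    # L(y) = 1 / sqrt(2 pi sigma^2) * exp(-(y - mu)^2 / (2 sigma^2))
    return np.exp(-(y - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)


# Hypothetical data: two observations under a standard normal model.
frame = pd.DataFrame({'y': [0.0, 1.0], 'mean': [0.0, 0.0], 'variance': [1.0, 1.0]})
values = normal_likelihood(frame, response_variable='y')
```

The returned array has one element per row of the input frame, matching the description in Returns.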

log_likelihood(data: DataFrame) → ndarray

It computes the log likelihood of observations response_variable given a model 'mean' and 'variance'.

Parameters

data: pandas.DataFrame

Data to use for log likelihood computation. It cannot be empty. It must contain columns response_variable, 'mean' and 'variance'.

Returns

numpy.ndarray

Array of computed log likelihoods. It has the same length as data. Each element is the log likelihood computed for the corresponding row of data.

Raises

TypeError

If data is not an instance of pandas.DataFrame.

ValueError

Notes

The log likelihood is computed as the log of the normal distribution probability density function:

\[l(y) = - \frac{1}{2} \log{\left(2 \pi \sigma^2\right)} - \frac{1}{2} \frac{\left(y - \mu \right)^2}{\sigma^2}\]

where \(\mu\) is the 'mean' column and \(\sigma^2\) is the 'variance' column.
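As a rough sketch of the expression above, the log likelihood can be evaluated directly instead of taking the logarithm of the density, which avoids underflow for observations far from the mean. The function name `normal_log_likelihood` and the column name `'y'` are hypothetical and not part of the library.

```python
import numpy as np
import pandas as pd


def normal_log_likelihood(data: pd.DataFrame, response_variable: str) -> np.ndarray:
    """Row-wise normal log density of the response given 'mean' and 'variance' columns."""
    y = data[response_variable].to_numpy(dtype=float)
    mu = data['mean'].to_numpy(dtype=float)
    sigma2 = data['variance'].to_numpy(dtype=float)
    # l(y) = -1/2 log(2 pi sigma^2) - 1/2 (y - mu)^2 / sigma^2
    return -0.5 * np.log(2 * np.pi * sigma2) - 0.5 * (y - mu) ** 2 / sigma2


# Hypothetical data: two observations under a standard normal model.
frame = pd.DataFrame({'y': [0.0, 1.0], 'mean': [0.0, 0.0], 'variance': [1.0, 1.0]})
log_values = normal_log_likelihood(frame, response_variable='y')
```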

property posteriors: dict[str, ndarray]

Posteriors of the regressors’ and variance parameters. Posteriors and their samples are stored as key-value pairs. Each sample is a numpy.ndarray with a number of rows equal to the number of iterations and a number of columns equal to the number of Markov chains.

Returns

dict

Posterior samples. Posteriors and their samples are stored as key-value pairs. Each sample is a numpy.ndarray with a number of rows equal to the number of iterations and a number of columns equal to the number of Markov chains.

Raises

TypeError
KeyError

If posteriors does not contain both 'intercept' and 'variance' keys.

ValueError

If a posterior sample is an empty numpy.ndarray.

posteriors_to_frame() → DataFrame

It organizes the posteriors in a pandas.DataFrame. Each posterior is a frame column. The length of the frame is the number of sampling iterations times the number of sampling chains.

Returns

pandas.DataFrame

Posterior samples organized in a pandas.DataFrame, one posterior per column. The length of the frame is the number of sampling iterations times the number of sampling chains.

Raises

ValueError

If posteriors are not available because the method LinearRegression.sample has not been called yet.
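The reshaping described above can be illustrated as follows. This is a sketch under assumptions: the helper `flatten_posteriors` is hypothetical, the samples are synthetic, and the order in which iterations and chains are interleaved in the flattened column (row-major here) may differ from what the library does.

```python
import numpy as np
import pandas as pd


def flatten_posteriors(posteriors: dict[str, np.ndarray]) -> pd.DataFrame:
    """Turn {name: (n_iterations, n_chains) array} into one flat column per posterior."""
    return pd.DataFrame({name: np.asarray(sample).flatten()
                         for name, sample in posteriors.items()})


# Synthetic samples: 500 iterations and 3 chains for each variable.
rng = np.random.default_rng(0)
posteriors = {'intercept': rng.normal(size=(500, 3)),
              'x_1': rng.normal(size=(500, 3)),
              'variance': rng.gamma(shape=2.0, size=(500, 3))}
posteriors_frame = flatten_posteriors(posteriors)
```

With 500 iterations and 3 chains, the resulting frame has 1500 rows and one column per posterior.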

predict_distribution(predictors: dict[str, float | int]) → np.ndarray

It predicts a posterior distribution for unobserved values. For each posterior sample, it draws a sample from the likelihood.

Parameters

predictors: dict

Values of predictors \(X\) at which to compute the posterior distribution. Each predictor has to be set as a key-value pair.

Returns

numpy.ndarray

Array of the predicted posterior distribution. It contains a number of elements equal to the number of regression iterations times the number of model Markov chains.

Raises

TypeError

If predictors is not a dict.

KeyError

If a predictors key is not a key of posteriors.

ValueError

If predictors is an empty dict.
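One way to picture the procedure is the sketch below: for each posterior sample it builds the model mean from the intercept and the regressor parameters, then draws one value from a normal likelihood with that mean and the sampled variance. The function name `predict_distribution_sketch`, the `seed` argument and the synthetic posteriors are assumptions made for illustration only.

```python
import numpy as np


def predict_distribution_sketch(posteriors: dict[str, np.ndarray],
                                predictors: dict[str, float],
                                seed: int = 0) -> np.ndarray:
    """Draw one value from the likelihood for every posterior sample."""
    rng = np.random.default_rng(seed)
    # Model mean per posterior sample: beta_0 + sum_j beta_j * x_j.
    mean = posteriors['intercept'].flatten().astype(float)
    for name, value in predictors.items():
        mean = mean + posteriors[name].flatten() * value
    # The 'variance' posterior holds sigma^2, so take the square root for the scale.
    std = np.sqrt(posteriors['variance'].flatten())
    return rng.normal(loc=mean, scale=std)


# Synthetic posteriors: 200 iterations and 2 chains.
rng = np.random.default_rng(1)
posteriors = {'intercept': rng.normal(1.0, 0.1, size=(200, 2)),
              'x_1': rng.normal(2.0, 0.1, size=(200, 2)),
              'variance': rng.gamma(shape=2.0, scale=0.5, size=(200, 2))}
predicted = predict_distribution_sketch(posteriors, predictors={'x_1': 3.0})
```

The output length is the number of iterations times the number of chains, consistent with the Returns description.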

See Also

LinearRegression

property priors: dict[str, dict[str, float | int]]

Priors for the regressors’ and variance parameters. Each prior is a key-value pair, where the value is a dict with:

  • hyperparameter names as keys

  • hyperparameter values as values.

Returns

dict

Priors for each random variable. It must contain the 'intercept' and 'variance' keys. Each value must be a dict with hyperparameter names as keys and hyperparameter values as values.

Raises

TypeError
ValueError
  • If priors is an empty dict,

  • if a priors’ value is an empty dict,

  • if a 'variance' value is not positive,

  • if a 'shape' value is not positive,

  • if a 'scale' value is not positive.

KeyError
  • If priors does not contain both 'intercept' and 'variance' keys,

  • if a prior’s hyperparameters are not:
    • 'mean' and 'variance' for a regression parameter \(\beta_j\) or

    • 'shape' and 'scale' for variance \(\sigma^2\).
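The ValueError and KeyError conditions listed above can be sketched as a standalone check. The function `check_priors` is hypothetical, the order of the checks is an assumption, and the input is assumed to already be a dict of dicts.

```python
def check_priors(priors: dict) -> None:
    """Raise ValueError or KeyError following the conditions listed for priors."""
    if len(priors) == 0:
        raise ValueError('priors is an empty dict')
    for required in ('intercept', 'variance'):
        if required not in priors:
            raise KeyError(f"priors does not contain the '{required}' key")
    for name, prior in priors.items():
        if len(prior) == 0:
            raise ValueError(f"prior for '{name}' is an empty dict")
        # The model variance uses 'shape' and 'scale'; regression parameters
        # use 'mean' and 'variance'.
        expected = {'shape', 'scale'} if name == 'variance' else {'mean', 'variance'}
        if set(prior) != expected:
            raise KeyError(f"prior for '{name}' must have hyperparameters {sorted(expected)}")
        positive = ('shape', 'scale') if name == 'variance' else ('variance',)
        for hyperparameter in positive:
            if prior[hyperparameter] <= 0:
                raise ValueError(f"'{hyperparameter}' of '{name}' is not positive")


valid_priors = {'intercept': {'mean': 0, 'variance': 1e6},
                'x_1': {'mean': 0, 'variance': 1e6},
                'variance': {'shape': 1, 'scale': 1e-6}}
check_priors(valid_priors)  # passes silently
```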

Notes

Each random variable is assigned a prior distribution:

  • each regressor parameter \(\beta_j\) is assigned a normal prior distribution with hyperparameters 'mean' \(\beta_j^0\) and 'variance' \(\Sigma_{\beta_j}^0\):

    \[\beta_j \sim N(\beta_j^0 , \Sigma_{\beta_j}^0)\]
  • the variance \(\sigma^2\) is assigned an inverse gamma prior distribution with hyperparameters 'shape' \(\kappa^0\) and 'scale' \(\theta^0\):

    \[\sigma^2 \sim \text{Inv-}\Gamma(\kappa^0, \theta^0)\]
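To get a feeling for these prior families, draws can be simulated with NumPy. This is only a sketch with hypothetical hyperparameter values. It assumes the common shape/scale parametrization of the inverse gamma, in which the reciprocal of a gamma draw with the same shape and scale \(1 / \theta^0\) is inverse gamma distributed; the library's own parametrization is not confirmed by this page.

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 1000

# Normal prior for a regression parameter (hypothetical hyperparameters).
# NumPy expects a standard deviation, hence the square root of 'variance'.
beta_prior = {'mean': 0, 'variance': 4}
beta_draws = rng.normal(loc=beta_prior['mean'],
                        scale=np.sqrt(beta_prior['variance']),
                        size=n_draws)

# Inverse gamma prior for the variance (hypothetical hyperparameters).
# Assumption: if G ~ Gamma(shape, scale=1/theta) then 1/G ~ Inv-Gamma(shape, theta).
variance_prior = {'shape': 3, 'scale': 2}
gamma_draws = rng.gamma(shape=variance_prior['shape'],
                        scale=1 / variance_prior['scale'],
                        size=n_draws)
variance_draws = 1 / gamma_draws
```

All draws of the variance are positive, which is why an inverse gamma is a natural prior family for \(\sigma^2\).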

Examples

Consider a linear regression of the response_variable \(y\) with respect to regressors \(x_1\), \(x_2\) and \(x_3\), according to the following model:

\[y \sim N(\mu, \sigma^2)\]
\[\mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3\]
then the sampler would require priors for:
  • parameter \(\beta_0\) of variable 'intercept', with 'mean' \(\beta_0^0\) and 'variance' \(\Sigma_{\beta_0}^0\)

  • parameter \(\beta_1\) of variable \(x_1\), with 'mean' \(\beta_1^0\) and 'variance' \(\Sigma_{\beta_1}^0\)

  • parameter \(\beta_2\) of variable \(x_2\), with 'mean' \(\beta_2^0\) and 'variance' \(\Sigma_{\beta_2}^0\)

  • parameter \(\beta_3\) of variable \(x_3\), with 'mean' \(\beta_3^0\) and 'variance' \(\Sigma_{\beta_3}^0\)

  • variable \(\sigma^2\), with 'shape' \(\kappa^0\) and 'scale' \(\theta^0\)

>>> import baypy
>>> model = baypy.model.LinearModel()
>>> model.priors = {'intercept': {'mean': 0, 'variance': 1e6},
...                 'x_1': {'mean': 0, 'variance': 1e6},
...                 'x_2': {'mean': 0, 'variance': 1e6},
...                 'x_3': {'mean': 0, 'variance': 1e6},
...                 'variance': {'shape': 1, 'scale': 1e-6}}
residuals() → DataFrame

It computes the residuals \(\epsilon\) with respect to predicted values \(\hat{y}\).

Returns

pandas.DataFrame

Returns a copy of data with 3 more columns: 'intercept', 'predicted' and 'residuals'.

Raises

ValueError

Notes

Predicted values are computed at data points \(X\) using the posterior mean of each regressor’s parameter:

\[\hat{y_i} = \beta_0 + \sum_{j = 1}^{m} \beta_j x_{i,j}\]

while residuals are the difference between the observed values and the predicted values of the response_variable:

\[\epsilon_i = y_i - \hat{y_i}\]
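The two formulas above can be sketched with pandas and NumPy. The function `residuals_sketch` is hypothetical, the data and posteriors are synthetic, and filling the added 'intercept' column with ones is an assumption about what that column contains.

```python
import numpy as np
import pandas as pd


def residuals_sketch(data: pd.DataFrame,
                     posteriors: dict[str, np.ndarray],
                     response_variable: str) -> pd.DataFrame:
    """Copy of data with 'intercept', 'predicted' and 'residuals' columns added."""
    frame = data.copy()
    frame['intercept'] = 1  # assumption: a constant regressor multiplying beta_0
    predicted = np.zeros(len(frame))
    for name, sample in posteriors.items():
        if name == 'variance':
            continue  # sigma^2 is not a regression parameter
        # Posterior mean of beta_j times the regressor column x_j.
        predicted += np.mean(sample) * frame[name].to_numpy(dtype=float)
    frame['predicted'] = predicted
    frame['residuals'] = frame[response_variable] - frame['predicted']
    return frame


# Synthetic example: y = 1 + 2 x exactly, with constant posterior samples.
data = pd.DataFrame({'x': [0.0, 1.0, 2.0], 'y': [1.0, 3.0, 5.0]})
posteriors = {'intercept': np.full((100, 2), 1.0),
              'x': np.full((100, 2), 2.0),
              'variance': np.full((100, 2), 0.5)}
result = residuals_sketch(data, posteriors, response_variable='y')
```

Because the synthetic posterior means reproduce the data exactly, every residual in this example is zero.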
property response_variable: str

Response variable \(y\) of the linear model.

Returns

str

Name of the response variable \(y\). It must be one of the columns of data.

Raises

TypeError

If response_variable is not a str.

property variable_names: list[str]

Variables of the linear model.

Returns

list

The list of all model variables: the regressors \(X\), including the 'intercept' and the 'variance' \(\sigma^2\).

class Model

Bases: ABC

Model object.

Abstract base class for creating model objects.

See Also

LinearModel

abstract property data: None
abstract likelihood(data: DataFrame) None
abstract log_likelihood(data: DataFrame) None
abstract property posteriors: None
abstract posteriors_to_frame() None
abstract predict_distribution(predictors: dict[str, float | int]) None
abstract property priors: None
abstract residuals() None
abstract property response_variable: None
abstract property variable_names: None