LinearModel¶

class LinearModel¶

Bases: Model

baypy.model.model.LinearModel object.

Attributes¶

baypy.model.linear_model.LinearModel.data: pandas.DataFrame: Data for the linear regression model. It is a pandas.DataFrame containing all regressor variables \(X\) and the response variable \(y\).
baypy.model.linear_model.LinearModel.response_variable: string: Response variable \(y\) of the linear model.
baypy.model.linear_model.LinearModel.priors: dict: Priors for the regressors’ and variance parameters.
baypy.model.linear_model.LinearModel.variable_names: list: The list of all model variables: the regressors \(X\), including the intercept, and the variance \(\sigma^2\).
baypy.model.linear_model.LinearModel.posteriors: dict: Posterior samples. Each posterior and its samples form a key-value pair. Each sample is a numpy.ndarray with a number of rows equal to the number of iterations and a number of columns equal to the number of Markov chains.

Methods¶

baypy.model.linear_model.LinearModel.posteriors_to_frame(): Organizes the posteriors in a pandas.DataFrame.
baypy.model.linear_model.LinearModel.residuals(): Computes the residuals \(\epsilon\) with respect to predicted values \(\hat{y}\).
baypy.model.linear_model.LinearModel.predict_distribution(): Predicts a posterior distribution for an unobserved value.
baypy.model.linear_model.LinearModel.likelihood(): Computes the likelihood of observations model.response_variable given a model 'mean' and 'variance'.
baypy.model.linear_model.LinearModel.log_likelihood(): Computes the log likelihood of observations model.response_variable given a model 'mean' and 'variance'.

property data: DataFrame¶

Data for the linear regression model. It is a pandas.DataFrame containing all regressor variables \(X\) and the response variable \(y\).

Returns¶

pandas.DataFrame: Observed data of the model. It cannot be empty. It must contain regressor variables \(X\) and the response variable \(y\).

Raises¶

TypeError: If data is not an instance of pandas.DataFrame.
ValueError: If data is an empty pandas.DataFrame.

likelihood(data: DataFrame) → ndarray¶

Computes the likelihood of observations response_variable given a model 'mean' and 'variance'.

Parameters¶

data: pandas.DataFrame: Data to use for likelihood computation. It cannot be empty. It must contain columns response_variable, 'mean' and 'variance'.

Returns¶

numpy.ndarray: Array of computed likelihoods. It has the same length as data. Each element is the likelihood computed for the corresponding row of data.

Raises¶

TypeError

If data is not an instance of pandas.DataFrame.

ValueError

If data is an empty pandas.DataFrame,
if response_variable is not a column of data,
if 'mean' is not a column of data,
if 'variance' is not a column of data.

Notes¶

The likelihood is computed with the normal distribution probability density function:

\[L(y) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(- \frac{\left(y - \mu \right)^2}{2 \sigma^2}\right)\]

where \(\mu\) is the 'mean' column and \(\sigma^2\) is the 'variance' column.
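
The formula above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not baypy's own implementation: the helper name normal_likelihood and the column name 'y' for the response variable are hypothetical.

```python
import numpy as np
import pandas as pd


def normal_likelihood(data: pd.DataFrame, response_variable: str) -> np.ndarray:
    """Evaluate the normal pdf of each observation given its 'mean' and 'variance'."""
    y = data[response_variable].to_numpy(dtype=float)
    mu = data['mean'].to_numpy(dtype=float)
    sigma2 = data['variance'].to_numpy(dtype=float)
    return np.exp(-(y - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)


# Hypothetical data: 'y' stands in for the response variable.
data = pd.DataFrame({'y': [0.0, 1.0],
                     'mean': [0.0, 0.0],
                     'variance': [1.0, 4.0]})
likelihood = normal_likelihood(data, response_variable='y')
```

The result has one element per row of data, matching the return value described above.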

log_likelihood(data: DataFrame) → ndarray¶

Computes the log likelihood of observations response_variable given a model 'mean' and 'variance'.

Parameters¶

data: pandas.DataFrame: Data to use for log likelihood computation. It cannot be empty. It must contain columns response_variable, 'mean' and 'variance'.

Returns¶

numpy.ndarray: Array of computed log likelihoods. It has the same length as data. Each element is the log likelihood computed for the corresponding row of data.

Raises¶

TypeError

If data is not an instance of pandas.DataFrame.

ValueError

If data is an empty pandas.DataFrame,
if response_variable is not a column of data,
if 'mean' is not a column of data,
if 'variance' is not a column of data.

Notes¶

The log likelihood is computed as the log of the normal distribution probability density function:

\[l(y) = - \frac{1}{2} \log\left(2 \pi \sigma^2\right) - \frac{1}{2} \frac{\left(y - \mu \right)^2}{\sigma^2}\]

where \(\mu\) is the 'mean' column and \(\sigma^2\) is the 'variance' column.
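
A minimal NumPy sketch of the same expression is shown here. It is not baypy's implementation; the helper name normal_log_likelihood and the column name 'y' are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def normal_log_likelihood(data: pd.DataFrame, response_variable: str) -> np.ndarray:
    """Evaluate the normal log pdf of each observation given 'mean' and 'variance'."""
    y = data[response_variable].to_numpy(dtype=float)
    mu = data['mean'].to_numpy(dtype=float)
    sigma2 = data['variance'].to_numpy(dtype=float)
    return -0.5 * np.log(2 * np.pi * sigma2) - 0.5 * (y - mu) ** 2 / sigma2


# Hypothetical data: 'y' stands in for the response variable.
data = pd.DataFrame({'y': [0.0, 1.0],
                     'mean': [0.0, 0.0],
                     'variance': [1.0, 4.0]})
log_likelihood = normal_log_likelihood(data, response_variable='y')

# Summing log likelihoods gives the joint log likelihood of independent rows.
total_log_likelihood = log_likelihood.sum()
```

Working on the log scale avoids the numerical underflow that multiplying many small densities would cause.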

property posteriors: dict¶

Posteriors of the regressors’ and variance parameters. Each posterior and its samples form a key-value pair. Each sample is a numpy.ndarray with a number of rows equal to the number of iterations and a number of columns equal to the number of Markov chains.

Returns¶

dict: Posterior samples. Each posterior and its samples form a key-value pair. Each sample is a numpy.ndarray with a number of rows equal to the number of iterations and a number of columns equal to the number of Markov chains.

Raises¶

TypeError

If posteriors is not a dict,
if a posterior sample is not a numpy.ndarray.

KeyError

If posteriors does not contain both intercept and variance keys.

ValueError

If a posterior sample is an empty numpy.ndarray.

posteriors_to_frame() → DataFrame¶

Organizes the posteriors in a pandas.DataFrame. Each posterior is a frame column. The length of the frame is the number of sampling iterations times the number of sampling chains.

Returns¶

pandas.DataFrame: Posterior samples organized in a pandas.DataFrame, one posterior per column. The length of the frame is the number of sampling iterations times the number of sampling chains.

Raises¶

ValueError: If posteriors are not available because the method baypy.regression.LinearRegression.sample() has not been called yet.
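
The reshaping described for posteriors_to_frame() can be sketched as below. The posterior arrays are synthetic, and the row order produced by flattening (row-major here) is an assumption; baypy may order the rows differently.

```python
import numpy as np
import pandas as pd

n_iterations, n_chains = 500, 3
rng = np.random.default_rng(0)

# Synthetic posteriors with shape (iterations, chains), as described above.
posteriors = {
    'intercept': rng.normal(size=(n_iterations, n_chains)),
    'x_1': rng.normal(size=(n_iterations, n_chains)),
    'variance': rng.gamma(2.0, size=(n_iterations, n_chains)),
}

# One column per posterior; every chain is stacked into the same column.
frame = pd.DataFrame({name: samples.flatten()
                      for name, samples in posteriors.items()})
```

The frame has n_iterations times n_chains rows and one column for each key of posteriors.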

predict_distribution(predictors: dict) → ndarray¶

Predicts a posterior distribution for an unobserved value. For each posterior sample, it draws a sample from the likelihood.

Parameters¶

predictors: dict: Values of the predictors \(X\) at which to compute the posterior distribution. Each predictor has to be set as a key-value pair.

Returns¶

numpy.ndarray: Array of the predicted posterior distribution. It contains a number of elements equal to the number of regression iterations times the number of model Markov chains.

Raises¶

TypeError: If predictors is not a dict.
KeyError: If a predictors key is not a key of posteriors.
ValueError: If predictors is an empty dict.
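
The prediction step can be sketched as follows: for each posterior sample, the mean is built from the intercept and the predictor values, and one value is drawn from a normal distribution with the sampled variance. The posterior arrays are synthetic and the code is an illustrative assumption, not baypy's implementation.

```python
import numpy as np

rng = np.random.default_rng(42)
n_iterations, n_chains = 1000, 2
shape = (n_iterations, n_chains)

# Synthetic posteriors standing in for the output of a sampler.
posteriors = {
    'intercept': rng.normal(1.0, 0.1, size=shape),
    'x_1': rng.normal(2.0, 0.1, size=shape),
    'variance': rng.uniform(0.5, 1.5, size=shape),
}
predictors = {'x_1': 3.0}

# Mean of the likelihood for each posterior sample.
mean = posteriors['intercept'].flatten()
for name, value in predictors.items():
    mean = mean + posteriors[name].flatten() * value

# One draw from N(mean, variance) per posterior sample.
std = np.sqrt(posteriors['variance'].flatten())
predicted = rng.normal(loc=mean, scale=std)
```

The resulting array holds n_iterations times n_chains draws, so it reflects both the uncertainty in the parameters and the observation noise.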

property priors: dict¶

Priors for the regressors’ and variance parameters.

Returns¶

dict: Priors for each random variable. It must contain both intercept and variance keys. Each value must be a dict with hyperparameter names as keys and hyperparameter values as values.

Raises¶

TypeError

If priors is not a dict,
if a priors’ value is not a dict.

ValueError

If priors is an empty dict,
if a priors’ value is an empty dict,
if a variance value is not positive,
if a shape value is not positive,
if a scale value is not positive.

KeyError

If priors does not contain both intercept and variance keys,
if a prior’s hyperparameters are not:
- mean and variance for a regression parameter \(\beta_j\) or
- shape and scale for variance \(\sigma^2\).

Notes¶

Each random variable is assigned a prior distribution:

each regressor parameter \(\beta_j\) is assigned a normal prior distribution with hyperparameters mean \(\beta_j^0\) and variance \(\Sigma_{\beta_j}^0\):

\[\beta_j \sim N(\beta_j^0 , \Sigma_{\beta_j}^0)\]

the variance \(\sigma^2\) is assigned an inverse gamma prior distribution with hyperparameters shape \(\kappa^0\) and scale \(\theta^0\):

\[\sigma^2 \sim \text{Inv-}\Gamma(\kappa^0, \theta^0)\]
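
To get a feel for these priors, draws can be simulated with NumPy. This sketch assumes the standard inverse gamma parameterization, whose density is proportional to \(x^{-\kappa - 1} \exp(-\theta / x)\); whether baypy uses exactly this convention is an assumption. The hyperparameter values are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 100_000

# Hypothetical hyperparameters in the dict layout described above.
priors = {'x_1': {'mean': 0.0, 'variance': 4.0},
          'variance': {'shape': 5.0, 'scale': 4.0}}

# Normal prior for a regression parameter: numpy expects a standard deviation.
beta_draws = rng.normal(loc=priors['x_1']['mean'],
                        scale=np.sqrt(priors['x_1']['variance']),
                        size=n_draws)

# Inverse gamma prior: if G ~ Gamma(shape, rate=scale), then 1 / G is
# inverse gamma with that shape and scale. numpy's gamma takes scale = 1 / rate.
gamma_draws = rng.gamma(shape=priors['variance']['shape'],
                        scale=1.0 / priors['variance']['scale'],
                        size=n_draws)
sigma2_draws = 1.0 / gamma_draws
```

Under this parameterization the prior mean of \(\sigma^2\) is \(\theta^0 / (\kappa^0 - 1)\) for \(\kappa^0 > 1\), which equals 1 for the values used here.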

Examples¶

Consider a linear regression of the response variable \(y\) with respect to regressors \(x_1\), \(x_2\) and \(x_3\), according to the following model:

\[y \sim N(\mu, \sigma^2)\]

\[\mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3\]

then the sampler would require priors for:

parameter \(\beta_0\) of variable intercept, with mean \(\beta_0^0\) and variance \(\Sigma_{\beta_0}^0\)
parameter \(\beta_1\) of variable \(x_1\), with mean \(\beta_1^0\) and variance \(\Sigma_{\beta_1}^0\)
parameter \(\beta_2\) of variable \(x_2\), with mean \(\beta_2^0\) and variance \(\Sigma_{\beta_2}^0\)
parameter \(\beta_3\) of variable \(x_3\), with mean \(\beta_3^0\) and variance \(\Sigma_{\beta_3}^0\)
variable \(\sigma^2\), with shape \(\kappa^0\) and scale \(\theta^0\)

>>> import baypy
>>> model = baypy.model.LinearModel()
>>> model.priors = {'intercept': {'mean': 0, 'variance': 1e6},
...                 'x_1': {'mean': 0, 'variance': 1e6},
...                 'x_2': {'mean': 0, 'variance': 1e6},
...                 'x_3': {'mean': 0, 'variance': 1e6},
...                 'variance': {'shape': 1, 'scale': 1e-6}}

residuals() → DataFrame¶

Computes the residuals \(\epsilon\) with respect to predicted values \(\hat{y}\).

Returns¶

pandas.DataFrame: Returns a copy of data with 3 more columns: intercept, predicted and residuals.

Raises¶

ValueError

If data is None because the property data has not been set,
if response_variable is not a column of data,
if posteriors is None because the sampling has not been done yet.

Notes¶

Predicted values are computed at data points \(X\) using the posterior mean of each regressor’s parameter:

\[\hat{y_i} = \beta_0 + \sum_{j = 1}^{m} \beta_j x_{i,j}\]

while residuals are the difference between the observed values and the predicted values of the response_variable:

\[\epsilon_i = y_i - \hat{y_i}\]
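
The two formulas above can be sketched with pandas as follows. The data and posteriors are synthetic, and filling the intercept column with ones is an assumption about the returned frame, not confirmed behavior of baypy.

```python
import numpy as np
import pandas as pd

# Hypothetical observations with one regressor 'x_1' and response 'y'.
data = pd.DataFrame({'x_1': [0.0, 1.0, 2.0], 'y': [1.1, 2.9, 5.2]})

# Constant synthetic posteriors, so the posterior means are known exactly.
posteriors = {'intercept': np.full((100, 2), 1.0),
              'x_1': np.full((100, 2), 2.0),
              'variance': np.full((100, 2), 0.1)}

result = data.copy()
result['intercept'] = 1.0  # assumed: a column of ones for beta_0
result['predicted'] = 0.0
for name, samples in posteriors.items():
    if name != 'variance':
        # Posterior mean of the parameter times the regressor column.
        result['predicted'] += samples.mean() * result[name]
result['residuals'] = result['y'] - result['predicted']
```

With these values the predicted column is 1, 3 and 5, and the residuals are the differences between y and those predictions.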

property response_variable: str¶

Response variable \(y\) of the linear model.

Returns¶

string: Name of the response variable \(y\). It must be one of the columns of data.

Raises¶

TypeError: If response_variable is not a str.

property variable_names: list¶

Variables of the linear model.

Returns¶

list: The list of all model variables: the regressors \(X\), including the intercept and the variance \(\sigma^2\).
