LinearModel
- class LinearModel
Bases: Model
baypy.model.model.LinearModel object.
Attributes
- baypy.model.linear_model.LinearModel.data: pandas.DataFrame
Data for the linear regression model: a pandas.DataFrame containing all regressor variables \(X\) and the response variable \(y\).
- baypy.model.linear_model.LinearModel.response_variable: string
Response variable \(y\) of the linear model.
- baypy.model.linear_model.LinearModel.priors: dict
Priors for the regressors’ and variance parameters.
- baypy.model.linear_model.LinearModel.variable_names: list
The list of all model variables: the regressors \(X\), including the intercept, and the variance \(\sigma^2\).
- baypy.model.linear_model.LinearModel.posteriors: dict
Posterior samples. Posteriors and their samples are key-value pairs. Each sample is a numpy.ndarray with a number of rows equal to the number of iterations and a number of columns equal to the number of Markov chains.
Methods
- baypy.model.linear_model.LinearModel.posteriors_to_frame()
Organizes the posteriors in a pandas.DataFrame.
- baypy.model.linear_model.LinearModel.residuals()
Computes the residuals \(\epsilon\) with respect to predicted values \(\hat{y}\).
- baypy.model.linear_model.LinearModel.predict_distribution()
Predicts a posterior distribution for unobserved values.
- baypy.model.linear_model.LinearModel.likelihood()
Computes the likelihood of the observations in model.response_variable given the model 'mean' and 'variance'.
- baypy.model.linear_model.LinearModel.log_likelihood()
Computes the log likelihood of the observations in model.response_variable given the model 'mean' and 'variance'.
- property data: DataFrame
Data for the linear regression model: a pandas.DataFrame containing all regressor variables \(X\) and the response variable \(y\).
Returns
- pandas.DataFrame
Observed data of the model. It cannot be empty. It must contain the regressor variables \(X\) and the response variable \(y\).
Raises
- likelihood(data: DataFrame) → ndarray
Computes the likelihood of the observations in response_variable given the model 'mean' and 'variance'.
Parameters
- data: pandas.DataFrame
Data to use for the likelihood computation. It cannot be empty. It must contain the columns response_variable, 'mean' and 'variance'.
Returns
- numpy.ndarray
Array of computed likelihoods. It has the same length as data. Each element is the likelihood computed for the corresponding row of data.
Raises
- TypeError
If data is not an instance of pandas.DataFrame.
- ValueError
If data is an empty pandas.DataFrame,
if response_variable is not a column of data,
if 'mean' is not a column of data,
if 'variance' is not a column of data.
Notes
The likelihood is computed with the normal distribution probability density function:
\[L(y) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(- \frac{\left(y - \mu \right)^2}{2 \sigma^2}\right)\]
where \(\mu\) is the 'mean' column and \(\sigma^2\) is the 'variance' column.
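The density above can be evaluated row by row with NumPy. The following is a minimal sketch, not baypy's implementation: the helper name `normal_likelihood` and the example column name `y` are assumptions made for illustration, and the checks simply mirror the errors documented for this method.

```python
import numpy as np
import pandas as pd


def normal_likelihood(data: pd.DataFrame, response_variable: str) -> np.ndarray:
    """Row-wise normal density of the response given 'mean' and 'variance'.

    Hypothetical helper for illustration; it is not part of baypy.
    """
    if not isinstance(data, pd.DataFrame):
        raise TypeError("data must be a pandas.DataFrame")
    if data.empty:
        raise ValueError("data cannot be empty")
    for column in (response_variable, "mean", "variance"):
        if column not in data.columns:
            raise ValueError(f"'{column}' is not a column of data")

    y = data[response_variable].to_numpy(dtype=float)
    mu = data["mean"].to_numpy(dtype=float)
    sigma2 = data["variance"].to_numpy(dtype=float)
    # Normal probability density function, evaluated element-wise
    return np.exp(-((y - mu) ** 2) / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)


# Assumed example data: the response column is called 'y'
frame = pd.DataFrame({"y": [0.0, 1.0], "mean": [0.0, 0.0], "variance": [1.0, 1.0]})
values = normal_likelihood(frame, "y")
```

The result has one element per row of the input frame, matching the documented return value.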
- log_likelihood(data: DataFrame) → ndarray
Computes the log likelihood of the observations in response_variable given the model 'mean' and 'variance'.
Parameters
- data: pandas.DataFrame
Data to use for the log likelihood computation. It cannot be empty. It must contain the columns response_variable, 'mean' and 'variance'.
Returns
- numpy.ndarray
Array of computed log likelihoods. It has the same length as data. Each element is the log likelihood computed for the corresponding row of data.
Raises
- TypeError
If data is not an instance of pandas.DataFrame.
- ValueError
If data is an empty pandas.DataFrame,
if response_variable is not a column of data,
if 'mean' is not a column of data,
if 'variance' is not a column of data.
Notes
The log likelihood is computed as the log of the normal distribution probability density function:
\[l(y) = - \frac{1}{2} \log\left(2 \pi \sigma^2\right) - \frac{1}{2} \frac{\left(y - \mu \right)^2}{\sigma^2}\]
where \(\mu\) is the 'mean' column and \(\sigma^2\) is the 'variance' column.
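The log density can be computed directly from the formula above, without first evaluating the density. This is a short sketch under assumptions: `normal_log_likelihood` and the column name `y` are hypothetical names used only for illustration, and input validation is omitted for brevity.

```python
import numpy as np
import pandas as pd


def normal_log_likelihood(data: pd.DataFrame, response_variable: str) -> np.ndarray:
    """Row-wise log of the normal density (hypothetical helper, not baypy code)."""
    y = data[response_variable].to_numpy(dtype=float)
    mu = data["mean"].to_numpy(dtype=float)
    sigma2 = data["variance"].to_numpy(dtype=float)
    # l(y) = -1/2 log(2 pi sigma^2) - 1/2 (y - mu)^2 / sigma^2
    return -0.5 * np.log(2.0 * np.pi * sigma2) - 0.5 * (y - mu) ** 2 / sigma2


# Assumed example data; the second observation lies 100 standard deviations from the mean
log_frame = pd.DataFrame({"y": [0.0, 100.0], "mean": [0.0, 0.0], "variance": [1.0, 1.0]})
log_values = normal_log_likelihood(log_frame, "y")
```

Working on the log scale keeps the second row finite, whereas the density itself would underflow to zero in double precision.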
- property posteriors: dict
Posteriors of the regressors’ and variance parameters. Posteriors and their samples are key-value pairs. Each sample is a numpy.ndarray with a number of rows equal to the number of iterations and a number of columns equal to the number of Markov chains.
Returns
- dict
Posterior samples as key-value pairs, one numpy.ndarray per posterior, with one row per iteration and one column per Markov chain.
Raises
- TypeError
If posteriors is not a dict,
if a posterior sample is not a numpy.ndarray.
- KeyError
If posteriors does not contain both intercept and variance keys.
- ValueError
If a posterior sample is an empty numpy.ndarray.
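To make the expected layout concrete, the sketch below builds a dictionary with the documented shape and applies the documented checks. The sample values are random placeholders rather than sampler output, the regressor name `x_1` and the helper `check_posteriors` are hypothetical, and the order in which the checks run is an assumption.

```python
import numpy as np

n_iterations, n_chains = 500, 3
rng = np.random.default_rng(0)

# Placeholder samples with shape (iterations, chains); not real sampler output
posteriors = {
    "intercept": rng.normal(size=(n_iterations, n_chains)),
    "x_1": rng.normal(size=(n_iterations, n_chains)),
    "variance": rng.gamma(2.0, 1.0, size=(n_iterations, n_chains)),
}


def check_posteriors(posteriors: dict) -> None:
    """Apply the documented checks (hypothetical helper; check order is assumed)."""
    if not isinstance(posteriors, dict):
        raise TypeError("posteriors must be a dict")
    for name, sample in posteriors.items():
        if not isinstance(sample, np.ndarray):
            raise TypeError(f"posterior sample '{name}' must be a numpy.ndarray")
        if sample.size == 0:
            raise ValueError(f"posterior sample '{name}' cannot be empty")
    for key in ("intercept", "variance"):
        if key not in posteriors:
            raise KeyError(f"posteriors must contain a '{key}' key")


check_posteriors(posteriors)
```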
- posteriors_to_frame() → DataFrame
Organizes the posteriors in a pandas.DataFrame. Each posterior is a frame column. The length of the frame is the number of sampling iterations times the number of sampling chains.
Returns
- pandas.DataFrame
Posterior samples organized in a pandas.DataFrame, one posterior per column. The length of the frame is the number of sampling iterations times the number of sampling chains.
Raises
- ValueError
If posteriors are not available because the method baypy.regression.LinearRegression.sample() has not been called yet.
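One way to obtain a frame of that length is to flatten each (iterations, chains) array into a single column. The sketch below assumes row-major flattening; the order in which baypy stacks the chains is not stated here, and `flatten_posteriors` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def flatten_posteriors(posteriors: dict) -> pd.DataFrame:
    """Stack each posterior sample into one column (hypothetical helper).

    Assumption: arrays are flattened in row-major order, so the frame length
    is iterations * chains. The actual ordering used by baypy may differ.
    """
    if posteriors is None:
        raise ValueError("posteriors are not available: run the sampler first")
    return pd.DataFrame(
        {name: np.asarray(sample).reshape(-1) for name, sample in posteriors.items()}
    )


# Placeholder samples: 3 iterations, 2 chains
example_posteriors = {
    "intercept": np.arange(6, dtype=float).reshape(3, 2),
    "variance": np.ones((3, 2)),
}
posterior_frame = flatten_posteriors(example_posteriors)
```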
- predict_distribution(predictors: dict) → ndarray
Predicts a posterior distribution for unobserved values. For each posterior sample, it draws a sample from the likelihood.
Parameters
- predictors: dict
Values of the predictors \(X\) at which to compute the posterior distribution. Each predictor has to be set as a key-value pair.
Returns
- numpy.ndarray
Array of the predicted posterior distribution. It contains a number of elements equal to the number of regression iterations times the number of model Markov chains.
Raises
- TypeError
If predictors is not a dict.
- KeyError
If a predictors key is not a key of posteriors.
- ValueError
If predictors is an empty dict.
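The procedure described for this method can be sketched as follows: for every posterior sample, form the linear mean at the given predictor values and draw once from a normal distribution with that mean and the sampled variance. This is an illustration under assumptions, not baypy's code: the function name and `seed` argument are hypothetical, and only the predictors supplied in the dictionary contribute to the mean.

```python
import numpy as np


def predict_distribution_sketch(posteriors: dict, predictors: dict, seed=None) -> np.ndarray:
    """Draw one value from the normal likelihood per posterior sample (illustrative)."""
    if not isinstance(predictors, dict):
        raise TypeError("predictors must be a dict")
    if not predictors:
        raise ValueError("predictors cannot be empty")
    for name in predictors:
        if name not in posteriors:
            raise KeyError(f"'{name}' is not a key of posteriors")

    rng = np.random.default_rng(seed)
    # Flatten (iterations, chains) arrays so each element is one posterior sample
    mean = posteriors["intercept"].reshape(-1).astype(float)
    for name, value in predictors.items():
        mean = mean + posteriors[name].reshape(-1) * value
    variance = posteriors["variance"].reshape(-1)
    # One draw from N(mean, variance) for each posterior sample
    return rng.normal(loc=mean, scale=np.sqrt(variance))


# Placeholder posteriors: 4 iterations, 2 chains, nearly zero variance
toy_posteriors = {
    "intercept": np.full((4, 2), 1.0),
    "x_1": np.full((4, 2), 2.0),
    "variance": np.full((4, 2), 1e-12),
}
predicted = predict_distribution_sketch(toy_posteriors, {"x_1": 3.0}, seed=0)
```

With these placeholder values every draw lies very close to 1 + 2 · 3 = 7, and the output has iterations times chains elements, as documented.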
See Also
- property priors: dict
Priors for the regressors’ and variance parameters. Each prior is a key-value pair, where the value is a dict with:
hyperparameter names as keys
hyperparameter values as values.
Returns
- dict
Priors for each random variable. It must contain both an intercept and a variance key. Each value must be a dict with hyperparameter names as keys and hyperparameter values as values.
Raises
- TypeError
- ValueError
- KeyError
If priors does not contain both intercept and variance keys,
if a prior’s hyperparameters are not: mean and variance for a regression parameter \(\beta_j\) or shape and scale for variance \(\sigma^2\).
Notes
A prior distribution is assigned to each random variable:
each regressor parameter \(\beta_j\) is assigned a normal prior distribution with hyperparameters mean \(\beta_j^0\) and variance \(\Sigma_{\beta_j}^0\):
\[\beta_j \sim N(\beta_j^0 , \Sigma_{\beta_j}^0)\]
the variance \(\sigma^2\) is assigned an inverse gamma prior distribution with hyperparameters shape \(\kappa^0\) and scale \(\theta^0\):
\[\sigma^2 \sim \text{Inv-}\Gamma(\kappa^0, \theta^0)\]
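To get a feel for what these priors imply, one can draw from them with NumPy. The sketch below is an illustration only: the helper `draw_from_priors`, the regressor name `x_1` and the hyperparameter values are hypothetical, and it assumes the standard shape and scale parametrization of the inverse gamma distribution, under which the reciprocal of a Gamma variable with the same shape and rate equal to the scale is inverse gamma distributed.

```python
import numpy as np

# Hypothetical priors for illustration, using the documented hyperparameter names
example_priors = {
    "intercept": {"mean": 0.0, "variance": 1e6},
    "x_1": {"mean": 0.0, "variance": 1e6},
    "variance": {"shape": 3.0, "scale": 2.0},
}


def draw_from_priors(priors: dict, size: int, seed=None) -> dict:
    """Draw `size` values from each prior (hypothetical helper, not baypy code)."""
    rng = np.random.default_rng(seed)
    draws = {}
    for name, hyper in priors.items():
        if name == "variance":
            # Assumed parametrization: if G ~ Gamma(shape, rate=scale),
            # then 1 / G ~ Inv-Gamma(shape, scale)
            gamma_draws = rng.gamma(hyper["shape"], 1.0 / hyper["scale"], size=size)
            draws[name] = 1.0 / gamma_draws
        else:
            # Normal prior with the given mean and variance
            draws[name] = rng.normal(hyper["mean"], np.sqrt(hyper["variance"]), size=size)
    return draws


prior_draws = draw_from_priors(example_priors, size=1000, seed=0)
```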
Examples
Consider a linear regression of the response variable \(y\) with respect to regressors \(x_1\), \(x_2\) and \(x_3\), according to the following model:
\[y \sim N(\mu, \sigma^2)\]
\[\mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3\]
The sampler then requires priors for:
parameter \(\beta_0\) of variable intercept, with mean \(\beta_0^0\) and variance \(\Sigma_{\beta_0}^0\)
parameter \(\beta_1\) of variable \(x_1\), with mean \(\beta_1^0\) and variance \(\Sigma_{\beta_1}^0\)
parameter \(\beta_2\) of variable \(x_2\), with mean \(\beta_2^0\) and variance \(\Sigma_{\beta_2}^0\)
parameter \(\beta_3\) of variable \(x_3\), with mean \(\beta_3^0\) and variance \(\Sigma_{\beta_3}^0\)
variable \(\sigma^2\), with shape \(\kappa^0\) and scale \(\theta^0\)
>>> import baypy.model
>>> model = baypy.model.LinearModel()
>>> model.set_priors({'intercept': {'mean': 0, 'variance': 1e6},
...                   'x_1': {'mean': 0, 'variance': 1e6},
...                   'x_2': {'mean': 0, 'variance': 1e6},
...                   'x_3': {'mean': 0, 'variance': 1e6},
...                   'variance': {'shape': 1, 'scale': 1e-6}})
- residuals() → DataFrame
Computes the residuals \(\epsilon\) with respect to predicted values \(\hat{y}\).
Returns
- pandas.DataFrame
A copy of data with 3 more columns: intercept, predicted and residuals.
Raises
- ValueError
If response_variable is not a column of data,
if posteriors is None because the sampling has not been done yet.
Notes
Predicted values are computed at data points \(X\) using the posterior mean of each regressor’s parameter:
\[\hat{y}_i = \beta_0 + \sum_{j = 1}^{m} \beta_j x_{i,j}\]
while residuals are the difference between the observed values and the predicted values of the response_variable:
\[\epsilon_i = y_i - \hat{y}_i\]
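The two formulas above translate into a few lines of pandas. This is a minimal sketch, not baypy's implementation: the helper name `residuals_sketch` and the toy data are hypothetical, and it assumes that the added intercept column is a column of ones.

```python
import numpy as np
import pandas as pd


def residuals_sketch(data: pd.DataFrame, posteriors: dict, response_variable: str) -> pd.DataFrame:
    """Return a copy of data with intercept, predicted and residuals columns (illustrative)."""
    if posteriors is None:
        raise ValueError("posteriors are not available: run the sampler first")
    if response_variable not in data.columns:
        raise ValueError(f"'{response_variable}' is not a column of data")

    frame = data.copy()
    frame["intercept"] = 1.0  # assumption: the intercept column holds ones
    predicted = np.zeros(len(frame))
    for name, sample in posteriors.items():
        if name == "variance":
            continue  # the variance does not enter the predicted mean
        # Posterior mean of the parameter times its regressor column
        predicted = predicted + frame[name].to_numpy(dtype=float) * np.mean(sample)
    frame["predicted"] = predicted
    frame["residuals"] = frame[response_variable] - frame["predicted"]
    return frame


# Toy data and placeholder posteriors, not real sampler output
toy_data = pd.DataFrame({"x_1": [0.0, 1.0, 2.0], "y": [1.0, 3.0, 5.5]})
toy_samples = {
    "intercept": np.full((4, 2), 1.0),
    "x_1": np.full((4, 2), 2.0),
    "variance": np.full((4, 2), 0.1),
}
residual_frame = residuals_sketch(toy_data, toy_samples, "y")
```

The input frame is left unchanged, since the new columns are written to a copy.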
- property response_variable: str
Response variable \(y\) of the linear model.
Returns
- string
Name of the response variable \(y\). It must be one of the columns of data.
Raises
- TypeError
If response_variable is not a str.