LinearModel
- class LinearModel
Bases: Model
LinearModel object.
Attributes
data : pandas.DataFrame
    Data for the linear regression model: a pandas.DataFrame containing all regressor variables \(X\) and the response variable \(y\).
response_variable : str
    Response variable \(y\) of the linear model.
priors : dict
    Priors for the regressors' and variance parameters.
variable_names : list
    The list of all model variables: the regressors \(X\), including the 'intercept', and the 'variance' \(\sigma^2\).
posteriors : dict
    Posterior samples. Posterior names and their samples are stored as key-value pairs. Each sample is a numpy.ndarray with a number of rows equal to the number of iterations and a number of columns equal to the number of Markov chains.
Methods
posteriors_to_frame()
    It organizes the posteriors in a pandas.DataFrame.
residuals()
    It computes the residuals \(\epsilon\) with respect to the predicted values \(\hat{y}\).
predict_distribution()
    It predicts a posterior distribution for an unobserved value.
likelihood()
    It computes the likelihood of the observations response_variable given a model 'mean' and 'variance'.
log_likelihood()
    It computes the log likelihood of the observations response_variable given a model 'mean' and 'variance'.
- property data: DataFrame
Data for the linear regression model: a pandas.DataFrame containing all regressor variables \(X\) and the response variable \(y\).
Returns
pandas.DataFrame
    Observed data of the model. It cannot be empty. It must contain the regressor variables \(X\) and the response_variable \(y\).
Raises
TypeError
    If data is not an instance of pandas.DataFrame.
ValueError
    If data is an empty pandas.DataFrame.
- likelihood(data: DataFrame) → ndarray
It computes the likelihood of the observations response_variable given a model 'mean' and 'variance'.
Parameters
data : pandas.DataFrame
    Data to use for the likelihood computation. It cannot be empty. It must contain the columns response_variable, 'mean' and 'variance'.
Returns
numpy.ndarray
    Array of computed likelihoods, with the same length as data. Each element is the likelihood of the corresponding row of data.
Raises
TypeError
    If data is not an instance of pandas.DataFrame.
ValueError
    If data is an empty pandas.DataFrame, or if any of response_variable, 'mean' or 'variance' is not a column of data.
Notes
The likelihood is computed with the normal distribution probability density function:
\[L(y) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(- \frac{\left(y - \mu \right)^2}{2 \sigma^2}\right)\]
where \(\mu\) is the 'mean' column and \(\sigma^2\) is the 'variance' column.
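As a rough cross-check of the likelihood formula, the following sketch evaluates the normal density row by row with NumPy. The helper normal_likelihood and the response column name 'y' are hypothetical; the sketch only mirrors the documented column requirements and is not baypy's own implementation.

```python
import numpy as np
import pandas as pd


def normal_likelihood(data: pd.DataFrame, response_variable: str) -> np.ndarray:
    """Row-wise normal likelihood of response_variable given 'mean' and 'variance'."""
    # Hypothetical helper: mirrors the documented checks, not baypy's own code.
    if not isinstance(data, pd.DataFrame):
        raise TypeError("data must be a pandas.DataFrame")
    if data.empty:
        raise ValueError("data must not be empty")
    for column in (response_variable, "mean", "variance"):
        if column not in data.columns:
            raise ValueError(f"column {column!r} is missing from data")

    y = data[response_variable].to_numpy(dtype=float)
    mu = data["mean"].to_numpy(dtype=float)
    sigma2 = data["variance"].to_numpy(dtype=float)
    # L(y) = exp(-(y - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2), one value per row
    return np.exp(-((y - mu) ** 2) / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)


# 'y' stands in for response_variable in this example.
frame = pd.DataFrame({"y": [0.0, 1.0], "mean": [0.0, 0.0], "variance": [1.0, 1.0]})
likelihoods = normal_likelihood(frame, response_variable="y")
```

The returned array has one entry per row of the frame, matching the documented return shape.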
- log_likelihood(data: DataFrame) → ndarray
It computes the log likelihood of the observations response_variable given a model 'mean' and 'variance'.
Parameters
data : pandas.DataFrame
    Data to use for the log likelihood computation. It cannot be empty. It must contain the columns response_variable, 'mean' and 'variance'.
Returns
numpy.ndarray
    Array of computed log likelihoods, with the same length as data. Each element is the log likelihood of the corresponding row of data.
Raises
TypeError
    If data is not an instance of pandas.DataFrame.
ValueError
    If data is an empty pandas.DataFrame, or if any of response_variable, 'mean' or 'variance' is not a column of data.
Notes
The log likelihood is computed as the log of the normal distribution probability density function:
\[l(y) = - \frac{1}{2} \log\left(2 \pi \sigma^2\right) - \frac{1}{2} \frac{\left(y - \mu \right)^2}{\sigma^2}\]
where \(\mu\) is the 'mean' column and \(\sigma^2\) is the 'variance' column.
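A minimal sketch of the log likelihood formula, using the hypothetical helper normal_log_likelihood (not part of baypy). It also illustrates one practical reason to work with the log form: for an observation far from the mean, the density itself underflows to 0.0 in double precision, while the log likelihood stays finite.

```python
import numpy as np
import pandas as pd


def normal_log_likelihood(data: pd.DataFrame, response_variable: str) -> np.ndarray:
    """Row-wise normal log likelihood; hypothetical helper, not baypy's own code."""
    y = data[response_variable].to_numpy(dtype=float)
    mu = data["mean"].to_numpy(dtype=float)
    sigma2 = data["variance"].to_numpy(dtype=float)
    # l(y) = -1/2 log(2 pi sigma^2) - 1/2 (y - mu)^2 / sigma^2
    return -0.5 * np.log(2 * np.pi * sigma2) - 0.5 * (y - mu) ** 2 / sigma2


# 'y' stands in for response_variable; the second observation is 40 standard deviations away.
frame = pd.DataFrame({"y": [0.0, 40.0], "mean": [0.0, 0.0], "variance": [1.0, 1.0]})
log_likelihoods = normal_log_likelihood(frame, response_variable="y")

# Evaluating the density directly (mean 0, variance 1 here) underflows to 0.0 for the
# distant observation, whereas the log form stays finite (about -800.92).
y = frame["y"].to_numpy()
direct = np.exp(-(y ** 2) / 2) / np.sqrt(2 * np.pi)
```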
- property posteriors: dict[str, ndarray]
Posteriors of the regressors' and variance parameters. Posterior names and their samples are stored as key-value pairs. Each sample is a numpy.ndarray with a number of rows equal to the number of iterations and a number of columns equal to the number of Markov chains.
Returns
dict
    Posterior samples, stored as key-value pairs of posterior names and samples. Each sample is a numpy.ndarray with a number of rows equal to the number of iterations and a number of columns equal to the number of Markov chains.
Raises
TypeError
    If posteriors is not a dict, or if a posterior sample is not a numpy.ndarray.
KeyError
    If posteriors does not contain both 'intercept' and 'variance' keys.
ValueError
    If a posterior sample is an empty numpy.ndarray.
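To make the expected structure concrete, here is a sketch of a posteriors dict with shape (iterations, chains) per parameter, plus a validator that mirrors the documented Raises rules. check_posteriors is a hypothetical helper, the order in which the checks run is an assumption, and the sample values are random placeholders rather than real sampler output.

```python
import numpy as np


def check_posteriors(posteriors: dict) -> None:
    """Hypothetical validator mirroring the documented rules of LinearModel.posteriors."""
    if not isinstance(posteriors, dict):
        raise TypeError("posteriors must be a dict")
    if "intercept" not in posteriors or "variance" not in posteriors:
        raise KeyError("posteriors must contain 'intercept' and 'variance'")
    for name, sample in posteriors.items():
        if not isinstance(sample, np.ndarray):
            raise TypeError(f"posterior {name!r} must be a numpy.ndarray")
        if sample.size == 0:
            raise ValueError(f"posterior {name!r} is empty")


# 1000 iterations (rows) x 3 Markov chains (columns) per parameter; placeholder values.
rng = np.random.default_rng(seed=0)
n_iterations, n_chains = 1000, 3
posteriors = {
    "intercept": rng.normal(size=(n_iterations, n_chains)),
    "x_1": rng.normal(size=(n_iterations, n_chains)),
    "variance": rng.gamma(shape=2.0, size=(n_iterations, n_chains)),
}
check_posteriors(posteriors)  # passes silently
```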
- posteriors_to_frame() → DataFrame
It organizes the posteriors in a pandas.DataFrame, with one column per posterior. The length of the frame is the number of sampling iterations times the number of sampling chains.
Returns
pandas.DataFrame
    Posterior samples, one posterior per column. The length of the frame is the number of sampling iterations times the number of sampling chains.
Raises
ValueError
    If posteriors are not available because the method LinearRegression.sample has not been called yet.
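The reshaping described above can be sketched with pandas by flattening each (iterations, chains) array into one column. posteriors_to_frame here is a hypothetical stand-in operating on a plain dict; the row order it produces (row-major, so chains alternate within each iteration) is an assumption and may differ from baypy's.

```python
import numpy as np
import pandas as pd


def posteriors_to_frame(posteriors: dict) -> pd.DataFrame:
    """Hypothetical stand-in: one column per posterior, iterations * chains rows."""
    if not posteriors:
        raise ValueError("posteriors are not available: run the sampler first")
    # ravel() flattens each (iterations, chains) array in row-major order, so rows
    # alternate between chains within each iteration; baypy may order them differently.
    return pd.DataFrame({name: sample.ravel() for name, sample in posteriors.items()})


posteriors = {
    "intercept": np.arange(6.0).reshape(3, 2),  # 3 iterations, 2 chains
    "variance": np.ones((3, 2)),
}
frame = posteriors_to_frame(posteriors)
print(frame.shape)  # (6, 2): 3 iterations * 2 chains rows, 2 posterior columns
```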
- predict_distribution(predictors: dict[str, float | int]) → ndarray
It predicts a posterior distribution for an unobserved value. For each posterior sample, it draws a sample from the likelihood.
Parameters
predictors : dict
    Values of the predictors \(X\) at which to compute the posterior distribution. Each predictor has to be set as a key-value pair.
Returns
numpy.ndarray
    Array of the predicted posterior distribution. Its number of elements equals the number of sampling iterations times the number of Markov chains.
Raises
TypeError
    If predictors is not a dict.
KeyError
    If a predictors key is not a key of posteriors.
ValueError
    If predictors is an empty dict.
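The procedure "for each posterior sample, draw a sample from the likelihood" can be sketched as follows: compute \(\mu = \beta_0 + \sum_j \beta_j x_j\) for every posterior sample, then draw once from \(N(\mu, \sigma^2)\) with that sample's variance. The function below is hypothetical; it takes posteriors as an explicit argument instead of reading them from a model, and the seed parameter is an illustrative addition.

```python
import numpy as np


def predict_distribution(posteriors: dict, predictors: dict, seed: int = 0) -> np.ndarray:
    """Posterior predictive draws at `predictors`; hypothetical sketch, not baypy's code."""
    if not isinstance(predictors, dict):
        raise TypeError("predictors must be a dict")
    if not predictors:
        raise ValueError("predictors must not be empty")
    for name in predictors:
        if name not in posteriors:
            raise KeyError(f"{name!r} is not a key of posteriors")

    # Linear predictor for every posterior sample: mu = beta_0 + sum_j beta_j * x_j
    mu = posteriors["intercept"].ravel().astype(float)
    for name, value in predictors.items():
        mu = mu + posteriors[name].ravel() * value
    sigma = np.sqrt(posteriors["variance"].ravel())

    # One draw from the normal likelihood N(mu, sigma^2) per posterior sample
    rng = np.random.default_rng(seed)
    return rng.normal(loc=mu, scale=sigma)


# Placeholder posteriors: 500 iterations x 2 chains, centred on beta_0 = 1, beta_1 = 2.
n_iterations, n_chains = 500, 2
rng = np.random.default_rng(1)
posteriors = {
    "intercept": rng.normal(1.0, 0.1, size=(n_iterations, n_chains)),
    "x_1": rng.normal(2.0, 0.1, size=(n_iterations, n_chains)),
    "variance": np.full((n_iterations, n_chains), 0.25),
}
draws = predict_distribution(posteriors, {"x_1": 3.0})
print(draws.shape)  # (1000,): iterations * chains draws
```

With these placeholder posteriors the draws should centre near \(1 + 2 \cdot 3 = 7\).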
- property priors: dict[str, dict[str, float | int]]
Priors for the regressors' and variance parameters. Each prior is a key-value pair, where the value is a dict with hyperparameter names as keys and hyperparameter values as values.
Returns
dict
    Priors for each random variable. It must contain the 'intercept' and 'variance' keys. Each value must be a dict with hyperparameter names as keys and hyperparameter values as values.
Raises
TypeError
ValueError
KeyError
    If priors does not contain both 'intercept' and 'variance' keys, or if a prior's hyperparameters are not 'mean' and 'variance' for a regression parameter \(\beta_j\), or 'shape' and 'scale' for the variance \(\sigma^2\).
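The documented KeyError conditions can be checked with a small validator. check_priors is a hypothetical helper; raising TypeError for a non-dict value is an assumption, and the conditions behind ValueError are not stated in this section, so the sketch does not check them.

```python
def check_priors(priors: dict) -> None:
    """Hypothetical validator mirroring the documented KeyError rules of LinearModel.priors."""
    if not isinstance(priors, dict):
        raise TypeError("priors must be a dict")  # assumed TypeError condition
    if "intercept" not in priors or "variance" not in priors:
        raise KeyError("priors must contain 'intercept' and 'variance'")
    for name, hyperparameters in priors.items():
        if not isinstance(hyperparameters, dict):
            raise TypeError(f"prior {name!r} must be a dict")  # assumed TypeError condition
        # 'variance' uses inverse gamma hyperparameters, every regressor a normal prior
        expected = {"shape", "scale"} if name == "variance" else {"mean", "variance"}
        if set(hyperparameters) != expected:
            raise KeyError(f"prior {name!r} needs hyperparameters {sorted(expected)}")


check_priors({
    "intercept": {"mean": 0, "variance": 1e6},
    "x_1": {"mean": 0, "variance": 1e6},
    "variance": {"shape": 1, "scale": 1e-6},
})  # passes silently
```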
Notes
Each random variable is assigned a prior distribution:
- each regressor parameter \(\beta_j\) is assigned a normal prior distribution with hyperparameters 'mean' \(\beta_j^0\) and 'variance' \(\Sigma_{\beta_j}^0\):
\[\beta_j \sim N(\beta_j^0, \Sigma_{\beta_j}^0)\]
- the variance \(\sigma^2\) is assigned an inverse gamma distribution with hyperparameters 'shape' \(\kappa^0\) and 'scale' \(\theta^0\):
\[\sigma^2 \sim \text{Inv-}\Gamma(\kappa^0, \theta^0)\]
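A sketch of drawing from these priors with NumPy, which can help when choosing hyperparameters. sample_priors is a hypothetical helper. It assumes 'variance' of a normal prior is a variance (so its square root is passed as the standard deviation) and that 'scale' \(\theta^0\) is the usual inverse gamma scale, in which case \(1/G\) with \(G \sim \Gamma(\kappa^0, \text{rate} = \theta^0)\) follows \(\text{Inv-}\Gamma(\kappa^0, \theta^0)\).

```python
import numpy as np


def sample_priors(priors: dict, n_draws: int, seed: int = 0) -> dict:
    """Draw from the priors described above; hypothetical helper, not part of baypy."""
    rng = np.random.default_rng(seed)
    draws = {}
    for name, hyper in priors.items():
        if name == "variance":
            # sigma^2 ~ Inv-Gamma(kappa0, theta0): if G ~ Gamma(shape=kappa0, rate=theta0),
            # then 1 / G ~ Inv-Gamma(kappa0, theta0). NumPy's gamma takes scale = 1 / rate.
            gamma_draws = rng.gamma(shape=hyper["shape"], scale=1.0 / hyper["scale"], size=n_draws)
            draws[name] = 1.0 / gamma_draws
        else:
            # beta_j ~ N(beta_j0, Sigma_j0): NumPy's normal takes the standard deviation.
            draws[name] = rng.normal(loc=hyper["mean"], scale=np.sqrt(hyper["variance"]), size=n_draws)
    return draws


priors = {
    "intercept": {"mean": 0.0, "variance": 4.0},
    "x_1": {"mean": 1.0, "variance": 0.25},
    "variance": {"shape": 3.0, "scale": 2.0},
}
draws = sample_priors(priors, n_draws=100_000)
```

For shape 3 and scale 2, the inverse gamma mean is \(\theta^0 / (\kappa^0 - 1) = 1\), which the sample mean of draws["variance"] should approach.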
Examples
Consider a linear regression of the response_variable \(y\) with respect to regressors \(x_1\), \(x_2\) and \(x_3\), according to the following model:
\[y \sim N(\mu, \sigma^2)\]
\[\mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3\]
Then the sampler would require priors for:
- parameter \(\beta_0\) of variable 'intercept', with 'mean' \(\beta_0^0\) and 'variance' \(\Sigma_{\beta_0}^0\)
- parameter \(\beta_1\) of variable \(x_1\), with 'mean' \(\beta_1^0\) and 'variance' \(\Sigma_{\beta_1}^0\)
- parameter \(\beta_2\) of variable \(x_2\), with 'mean' \(\beta_2^0\) and 'variance' \(\Sigma_{\beta_2}^0\)
- parameter \(\beta_3\) of variable \(x_3\), with 'mean' \(\beta_3^0\) and 'variance' \(\Sigma_{\beta_3}^0\)
- variable \(\sigma^2\), with 'shape' \(\kappa^0\) and 'scale' \(\theta^0\)
>>> import baypy
>>> model = baypy.model.LinearModel()
>>> model.priors = {
...     'intercept': {'mean': 0, 'variance': 1e6},
...     'x_1': {'mean': 0, 'variance': 1e6},
...     'x_2': {'mean': 0, 'variance': 1e6},
...     'x_3': {'mean': 0, 'variance': 1e6},
...     'variance': {'shape': 1, 'scale': 1e-6}
... }
- residuals() → DataFrame
It computes the residuals \(\epsilon\) with respect to the predicted values \(\hat{y}\).
Returns
pandas.DataFrame
    A copy of data with 3 additional columns: 'intercept', 'predicted' and 'residuals'.
Raises
ValueError
    If response_variable is not a column of data, or if posteriors is None because sampling has not been performed yet.
Notes
Predicted values are computed at the data points \(X\) using the posterior means of each regressor's parameter:
\[\hat{y}_i = \beta_0 + \sum_{j = 1}^{m} \beta_j x_{i,j}\]
while residuals are the differences between the observed and predicted values of the response_variable:
\[\epsilon_i = y_i - \hat{y}_i\]
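The two formulas above can be sketched with pandas as follows. compute_residuals is a hypothetical helper; it assumes the added 'intercept' column is a constant 1 multiplying \(\beta_0\), and that every posterior key other than 'variance' names a column of data.

```python
import numpy as np
import pandas as pd


def compute_residuals(data: pd.DataFrame, response_variable: str, posteriors: dict) -> pd.DataFrame:
    """Hypothetical re-implementation of the residuals formula using posterior means."""
    if response_variable not in data.columns:
        raise ValueError(f"{response_variable!r} is not a column of data")
    if posteriors is None:
        raise ValueError("posteriors are not available: run the sampler first")

    result = data.copy()
    result["intercept"] = 1.0  # assumed: constant column multiplying beta_0
    regressors = [name for name in posteriors if name != "variance"]
    # y_hat_i = sum over regressors (including the intercept) of mean(beta_j) * x_ij
    result["predicted"] = sum(posteriors[name].mean() * result[name] for name in regressors)
    # epsilon_i = y_i - y_hat_i
    result["residuals"] = result[response_variable] - result["predicted"]
    return result


data = pd.DataFrame({"x_1": [0.0, 1.0, 2.0], "y": [1.1, 2.9, 5.2]})
posteriors = {
    "intercept": np.full((4, 2), 1.0),  # posterior mean of beta_0 is 1
    "x_1": np.full((4, 2), 2.0),        # posterior mean of beta_1 is 2
    "variance": np.full((4, 2), 0.1),
}
frame = compute_residuals(data, "y", posteriors)
print(frame[["predicted", "residuals"]])
```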
- property response_variable: str
Response variable \(y\) of the linear model.
Returns
str
    Response variable \(y\) of the linear model.
Raises
TypeError
    If response_variable is not a str.