analysis
- compute_DIC(model: Model, print_summary: bool = True) -> dict
Computes and prints the Deviance Information Criterion (DIC) for the fitted model.
Parameters
- model : baypy.model.model.Model
The model, with data, regressors, response variable and priors, to be solved through Monte Carlo sampling.
- print_summary : bool, optional
If True, prints the deviance summary report. Default is True.
Returns
- dict
Dictionary with the deviance summary. It contains the keys:
  - "deviance at posterior means",
  - "posterior mean deviance",
  - "effective number of parameters",
  - "DIC".
Raises
- TypeError
If model is not a baypy.model.model.Model, or if print_summary is not a bool.
- ValueError
If model.posteriors is None because the sampling has not been done yet, if a posterior key is not a column of model.data, if model.data is an empty pandas.DataFrame, or if model.response_variable is not a column of model.data.
Notes
The DIC measures posterior predictive error by penalizing the fit of a model (its deviance) by its complexity, measured by the effective number of parameters. When comparing alternative models fitted to the same data, the model with the smaller DIC is preferred. Consider a linear regression of the response variable \(y\) on the regressors \(X\), according to the following model:
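\[y \sim N(\mu, \sigma^2)\]
\[\mu = \beta_0 + B X = \beta_0 + \sum_{j = 1}^m \beta_j x_j\]
then the likelihood of an observation is:
\[p \left( y \left\vert B,\sigma^2 \right. \right) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left( - \frac{\left(y - \mu \right)^2}{2 \sigma^2} \right) .\]
For the whole dataset, the likelihood is the product of these terms over the observations, so the log-likelihood is their sum. The deviance [1] [2] is defined as: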
\[D \left( y, B, \sigma^2 \right) = -2\log p \left( y \left\vert B,\sigma^2 \right. \right) .\]
The deviance at the posterior means of \(B\) and \(\sigma^2\), denoted by \(\overline{B}\) and \(\overline{\sigma^2}\), is:
\[D_{\overline{B}, \overline{\sigma^2}} (y) = D \left( y, \overline{B}, \overline{\sigma^2} \right) ,\]
while the posterior mean deviance is:
\[\overline{D} \left( y, B, \sigma^2 \right) = E \left[ D \left( y, B, \sigma^2 \right) \,\middle\vert\, y \right] ,\]
and the effective number of parameters is defined as:
\[pD = \overline{D} \left( y, B, \sigma^2 \right) - D_{\overline{B}, \overline{\sigma^2}} (y) .\]
The Deviance Information Criterion [1] is:
\[DIC = 2 \overline{D} \left( y, B, \sigma^2 \right) - D_{\overline{B}, \overline{\sigma^2}} (y) = \overline{D} \left( y, B, \sigma^2 \right) + pD = D_{\overline{B}, \overline{\sigma^2}} (y) + 2pD .\]
References
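The quantities defined in the Notes can be reproduced with a minimal NumPy sketch, assuming the normal linear model above. This is not the baypy implementation: the helpers normal_log_likelihood and dic_from_samples are hypothetical, the draws are assumed to be already flattened across Markov chains (baypy stores each posterior as an iterations by chains array), the dataset log-likelihood is taken as the sum over observations, and the "posterior" draws are faked around known values purely for illustration. The returned keys mirror those listed under Returns.

```python
import numpy as np


def normal_log_likelihood(y, mu, sigma2):
    # Log-likelihood of the whole dataset: sum over observations of log N(y_i | mu_i, sigma2)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2) - (y - mu) ** 2 / (2 * sigma2))


def dic_from_samples(y, X, beta_samples, sigma2_samples):
    """Hypothetical helper returning the DIC quantities of the Notes.

    y: (n,) observed responses; X: (n, m) regressors;
    beta_samples: (S, m + 1) posterior draws of [beta_0, beta_1, ..., beta_m];
    sigma2_samples: (S,) posterior draws of sigma^2.
    """
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend a column of ones for beta_0
    # Deviance D(y, B, sigma^2) evaluated at every posterior draw
    deviances = np.array([
        -2.0 * normal_log_likelihood(y, X1 @ b, s2)
        for b, s2 in zip(beta_samples, sigma2_samples)
    ])
    posterior_mean_deviance = deviances.mean()
    # Deviance evaluated at the posterior means of B and sigma^2
    deviance_at_means = -2.0 * normal_log_likelihood(
        y, X1 @ beta_samples.mean(axis=0), sigma2_samples.mean()
    )
    p_d = posterior_mean_deviance - deviance_at_means  # effective number of parameters
    return {
        "deviance at posterior means": deviance_at_means,
        "posterior mean deviance": posterior_mean_deviance,
        "effective number of parameters": p_d,
        "DIC": posterior_mean_deviance + p_d,
    }


# Illustrative data and fake draws scattered around known values (not a real posterior)
rng = np.random.default_rng(0)
n, S = 200, 1000
X = rng.normal(size=(n, 2))
y = 1.0 + X @ np.array([2.0, -0.5]) + rng.normal(scale=0.7, size=n)
beta_samples = np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.05, size=(S, 3))
sigma2_samples = 0.49 + rng.normal(scale=0.02, size=S)

result = dic_from_samples(y, X, beta_samples, sigma2_samples)
print(result)
```

Because DIC is computed here as the posterior mean deviance plus pD, it also equals the deviance at the posterior means plus 2pD, matching the identity in the Notes.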
- residuals_plot(model: Model) -> None
Plots the residuals \(\epsilon\) with respect to predicted values \(\hat{y}\).
Parameters
- model : baypy.model.model.Model
The model, with data, regressors, response variable and priors, to be solved through Monte Carlo sampling.
Raises
- TypeError
If model is not a baypy.model.model.Model.
- ValueError
If model.posteriors is None because the sampling has not been done yet, if a posterior key is not a column of model.data, if model.data is an empty pandas.DataFrame, or if model.response_variable is not a column of model.data.
Notes
Predicted values are computed at the data points \(X\) using the posterior mean of each regressor’s parameter. In the case of a linear model:
\[\hat{y}_i = \overline{\beta}_0 + \sum_{j = 1}^{m} \overline{\beta}_j x_{i,j}\]
while the residuals are the differences between the observed and the predicted values of the response variable (model.response_variable):
\[\epsilon_i = y_i - \hat{y}_i\]
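The two formulas above can be sketched in NumPy as follows. This is an illustrative sketch, not the baypy implementation: the helper predict_and_residuals and the regressor names x_1 and x_2 are hypothetical, X is a plain array whose columns follow the order of regressors (rather than model.data), and each posterior is assumed to be an iterations by chains array pooled into a single mean. A residuals-versus-predicted plot would show residuals on the vertical axis against y_hat on the horizontal axis.

```python
import numpy as np


def predict_and_residuals(y, X, posteriors, regressors):
    """Hypothetical helper: predictions at posterior means and residuals.

    posteriors maps "intercept" and each name in `regressors` to an array of
    shape (n_iterations, n_chains); column j of X holds regressor regressors[j].
    """
    # Posterior mean of each parameter, pooled over iterations and chains
    means = {name: samples.mean() for name, samples in posteriors.items()}
    # y_hat_i = mean(beta_0) + sum_j mean(beta_j) * x_ij
    y_hat = means["intercept"] + sum(means[r] * X[:, j] for j, r in enumerate(regressors))
    residuals = y - y_hat  # epsilon_i = y_i - y_hat_i
    return y_hat, residuals


# Illustrative data and fake posterior draws (3 chains, 500 iterations each)
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 0.5 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=50)
posteriors = {
    "intercept": 0.5 + rng.normal(scale=0.01, size=(500, 3)),
    "x_1": 1.5 + rng.normal(scale=0.01, size=(500, 3)),
    "x_2": -2.0 + rng.normal(scale=0.01, size=(500, 3)),
}
y_hat, residuals = predict_and_residuals(y, X, posteriors, ["x_1", "x_2"])
```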
- summary(posteriors: dict, alpha: float = 0.05, quantiles: list | None = None, print_summary: bool = True) -> dict
Prints a statistical summary for each posterior.
Parameters
- posteriors : dict
Posterior samples, as key-value pairs of posterior names and the corresponding samples. Each sample is a numpy.ndarray with one row per iteration and one column per Markov chain.
- alpha : float, optional
Significance level, used to compute the Highest Posterior Density (HPD) interval. It must be between 0 and 1. Default is 0.05.
- quantiles : list, optional
List of the quantiles to compute for each posterior. It cannot be empty and must contain only floats between 0 and 1. Default is [0.025, 0.25, 0.5, 0.75, 0.975].
- print_summary : bool, optional
If True, prints the statistical posterior summary report. Default is True.
Returns
- dict
Dictionary with the statistical summary of the posteriors. It contains the keys:
  - n_chain, the number of Markov chains,
  - n_iterations, the number of sampling iterations,
  - summary, the statistical summary of the posteriors, as a pandas.DataFrame,
  - quantiles, the quantiles summary of the posteriors, as a pandas.DataFrame.
Raises
- TypeError
If posteriors is not a dict, if a posterior sample is not a numpy.ndarray, if alpha is not a float, if quantiles is not a list, if a quantiles value is not a float, or if print_summary is not a bool.
- KeyError
If posteriors does not contain the intercept key.
- ValueError
If a posterior sample is an empty numpy.ndarray, if alpha is not between 0 and 1, if quantiles is an empty list, or if a quantiles value is not between 0 and 1.
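A minimal sketch of the kind of statistics this summary reports, assuming posteriors in the format described under Parameters. The HPD interval is estimated here as the shortest interval containing a fraction 1 - alpha of the pooled draws, a common estimator that is not necessarily the one baypy uses. The helper names hpd_interval and posterior_summary, the reported statistics (mean, sd, HPD bounds) and the nested dictionaries standing in for the pandas.DataFrame results are all hypothetical.

```python
import numpy as np


def hpd_interval(samples, alpha=0.05):
    """Shortest interval containing a (1 - alpha) share of the samples."""
    x = np.sort(np.ravel(samples))
    n = len(x)
    k = int(np.ceil((1 - alpha) * n))  # number of sorted draws inside the interval
    # Width of every candidate interval [x[i], x[i + k - 1]]
    widths = x[k - 1:] - x[: n - k + 1]
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]


def posterior_summary(posteriors, alpha=0.05, quantiles=None):
    """Hypothetical summary of posteriors shaped (n_iterations, n_chains)."""
    if quantiles is None:
        quantiles = [0.025, 0.25, 0.5, 0.75, 0.975]
    if "intercept" not in posteriors:
        raise KeyError("posteriors must contain the 'intercept' key")
    n_iterations, n_chains = posteriors["intercept"].shape
    stats, quants = {}, {}
    for name, samples in posteriors.items():
        flat = np.ravel(samples)  # pool all chains together
        low, high = hpd_interval(flat, alpha)
        stats[name] = {"mean": flat.mean(), "sd": flat.std(ddof=1), "HPD low": low, "HPD high": high}
        quants[name] = dict(zip(quantiles, np.quantile(flat, quantiles)))
    return {"n_chain": n_chains, "n_iterations": n_iterations, "summary": stats, "quantiles": quants}


rng = np.random.default_rng(3)
posteriors = {
    "intercept": rng.normal(0.0, 1.0, size=(2000, 3)),
    "x_1": rng.normal(5.0, 0.5, size=(2000, 3)),
}
result = posterior_summary(posteriors, alpha=0.05)
print(result["summary"]["intercept"])
```

Unlike an equal-tailed interval built from the 0.025 and 0.975 quantiles, the shortest-interval estimate adapts to skewed posteriors, which is the usual motivation for reporting an HPD interval.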
- trace_plot(posteriors: dict) -> None
Plots the trace and the probability density of each posterior. For each regression variable, the plot shows the trace of every Markov chain and the corresponding posterior density. The layout has one row per regression variable and two columns: traces on the left and densities on the right.
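The layout described above can be sketched with matplotlib. This is an illustrative sketch, not the baypy implementation: the function trace_plot_sketch and the posterior names are hypothetical, and a normalized histogram of the pooled chains stands in for the density panel (baypy may use a different density estimate, such as a kernel density estimate).

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def trace_plot_sketch(posteriors):
    """Hypothetical trace plot: one row per posterior, traces left, densities right."""
    n_vars = len(posteriors)
    fig, axes = plt.subplots(n_vars, 2, figsize=(10, 2.5 * n_vars), squeeze=False)
    for row, (name, samples) in enumerate(posteriors.items()):
        trace_ax, density_ax = axes[row]
        # Left column: one trace line per Markov chain (one column of samples each)
        for chain in range(samples.shape[1]):
            trace_ax.plot(samples[:, chain], linewidth=0.5)
        trace_ax.set_ylabel(name)
        # Right column: density of all chains pooled, approximated by a normalized histogram
        density_ax.hist(samples.ravel(), bins=50, density=True)
    axes[-1, 0].set_xlabel("iteration")
    fig.tight_layout()
    return fig, axes


rng = np.random.default_rng(2)
posteriors = {
    "intercept": rng.normal(1.0, 0.1, size=(1000, 3)),
    "x_1": rng.normal(-0.5, 0.05, size=(1000, 3)),
}
fig, axes = trace_plot_sketch(posteriors)
```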
Parameters
- posteriors : dict
Posterior samples, as key-value pairs of posterior names and the corresponding samples. Each sample is a numpy.ndarray with one row per iteration and one column per Markov chain.
Raises
- TypeError
If posteriors is not a dict, or if a posterior sample is not a numpy.ndarray.
- KeyError
If posteriors does not contain the intercept key.
- ValueError
If a posterior sample is an empty numpy.ndarray.
See Also