analysis
- compute_DIC(model: Model, print_summary: bool = True) -> dict[str, float]
It computes and prints the Deviance Information Criterion (DIC) for the fitted model.
Parameters
Returns
dict
    Dictionary with the deviance summary. It contains:
    key 'Deviance at posterior means',
    key 'Posterior mean deviance',
    key 'Effective number of parameters',
    key 'Deviance Information Criterion'.
Raises
TypeError
ValueError
    If model.posteriors is None because the sampling has not been done yet,
    if a posterior key is not a column of model.data,
    if model.data is an empty pandas.DataFrame,
    if model.response_variable is not a column of model.data.
Notes
The DIC estimates posterior predictive error by penalizing the fit of a model (its deviance) with its complexity, measured by the effective number of parameters. When comparing alternative models fitted to the same data, the model with the smaller DIC is preferred. Consider a linear regression of the response variable \(y\) on the regressors \(X\), according to the following model:
\[y \sim N(\mu, \sigma^2)\]
\[\mu = \beta_0 + B X = \beta_0 + \sum_{j = 1}^m \beta_j x_j\]
then the likelihood is:
\[p \left( y \middle\vert B, \sigma^2 \right) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( - \frac{\left( y - \mu \right)^2}{2 \sigma^2} \right) .\]
The deviance [1] [2] is defined as:
\[D \left( y, B, \sigma^2 \right) = -2 \log p \left( y \middle\vert B, \sigma^2 \right) .\]
The deviance at the posterior means of \(B\) and \(\sigma^2\), denoted by \(\overline{B}\) and \(\overline{\sigma^2}\), is:
\[D_{\overline{B}, \overline{\sigma^2}} (y) = D \left( y, \overline{B}, \overline{\sigma^2} \right)\]
while the posterior mean deviance is:
\[\overline{D} \left( y, B, \sigma^2 \right) = E \left( D(y, B, \sigma^2) \middle\vert y \right)\]
and the effective number of parameters is defined as:
\[pD = \overline{D} \left( y, B, \sigma^2 \right) - D_{\overline{B}, \overline{\sigma^2}} (y) .\]
The Deviance Information Criterion [1] is:
\[DIC = 2 \overline{D} \left( y, B, \sigma^2 \right) - D_{\overline{B}, \overline{\sigma^2}} (y) = \overline{D} \left( y, B, \sigma^2 \right) + pD = D_{\overline{B}, \overline{\sigma^2}} (y) + 2pD .\]
References
See Also
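As an illustration of the definitions in the Notes of compute_DIC, the four returned quantities can be reproduced with NumPy from posterior draws. This is only a sketch under stated assumptions, not the library's implementation: the function names gaussian_deviance and dic_sketch are hypothetical, the deviance is summed over observations assumed independent, the draws of all chains are assumed to be flattened into one axis, and the "posterior" draws below are synthetic stand-ins rather than output of a real sampler.

```python
import numpy as np


def gaussian_deviance(y, X, intercept, betas, variance):
    """Deviance D = -2 log p(y | B, sigma^2), summed over observations.

    Assumption: observations are independent, so log-likelihoods add up.
    """
    mu = intercept + X @ betas
    log_lik = (
        -0.5 * y.size * np.log(2.0 * np.pi * variance)
        - np.sum((y - mu) ** 2) / (2.0 * variance)
    )
    return -2.0 * log_lik


def dic_sketch(y, X, intercept_draws, beta_draws, variance_draws):
    """Compute the DIC summary from flattened posterior draws.

    Shapes (hypothetical layout): intercept_draws (n_draws,),
    beta_draws (n_draws, m), variance_draws (n_draws,).
    """
    # Posterior mean deviance: average the deviance over all draws.
    deviances = np.array([
        gaussian_deviance(y, X, b0, b, s2)
        for b0, b, s2 in zip(intercept_draws, beta_draws, variance_draws)
    ])
    mean_deviance = deviances.mean()
    # Deviance evaluated at the posterior means of the parameters.
    deviance_at_means = gaussian_deviance(
        y, X, intercept_draws.mean(), beta_draws.mean(axis=0), variance_draws.mean()
    )
    # Effective number of parameters pD and DIC = mean deviance + pD.
    p_d = mean_deviance - deviance_at_means
    return {
        "Deviance at posterior means": deviance_at_means,
        "Posterior mean deviance": mean_deviance,
        "Effective number of parameters": p_d,
        "Deviance Information Criterion": mean_deviance + p_d,
    }


# Synthetic data and stand-in draws, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=50)
result = dic_sketch(
    y,
    X,
    intercept_draws=rng.normal(1.0, 0.05, size=200),
    beta_draws=rng.normal([2.0, -1.0], 0.05, size=(200, 2)),
    variance_draws=rng.uniform(0.2, 0.3, size=200),
)
```

The dictionary keys mirror those listed under Returns, so the three equivalent expressions for the DIC given in the Notes can be checked directly on result.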
- residuals_plot(model: Model) -> None
It plots the residuals \(\epsilon\) with respect to predicted values \(\hat{y}\).
Parameters
model : Model
    The model with data, regressors, response variable and priors to be solved through Monte Carlo sampling.
Raises
TypeError
    If model is not a Model.
ValueError
    If model.posteriors is None because the sampling has not been done yet,
    if a posterior key is not a column of model.data,
    if model.data is an empty pandas.DataFrame,
    if model.response_variable is not a column of model.data.
Notes
Predicted values are computed at the data points \(X\) using the posterior mean of each regressor’s parameter. For a linear model:
\[\hat{y}_i = \beta_0 + \sum_{j = 1}^{m} \beta_j x_{i,j}\]
while the residuals are the differences between the observed and the predicted values of the response_variable:
\[\epsilon_i = y_i - \hat{y}_i\]
See Also
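The predicted values and residuals described in the Notes of residuals_plot can be sketched with NumPy as follows. The helper predict_and_residuals is hypothetical, a plain dict of arrays stands in for model.data, and the posterior mean of each parameter is assumed to be the average over all iterations and chains; the actual function may differ.

```python
import numpy as np


def predict_and_residuals(data, response, regressors, posteriors):
    """Return (y_hat, residuals) for a linear model at the posterior means.

    data: mapping from column name to 1-D array (stand-in for model.data).
    posteriors: mapping from parameter name to an array of shape
    (n_iterations, n_chains), including an 'intercept' key.
    """
    observed = np.asarray(data[response], dtype=float)
    # Start from the posterior mean of the intercept beta_0.
    y_hat = np.full(observed.shape, posteriors["intercept"].mean())
    # Add beta_j * x_j using the posterior mean of each beta_j.
    for name in regressors:
        y_hat = y_hat + posteriors[name].mean() * np.asarray(data[name], dtype=float)
    # Residuals: observed minus predicted values.
    return y_hat, observed - y_hat


# Toy inputs, for illustration only.
data = {"x1": np.array([0.0, 1.0, 2.0]), "y": np.array([1.0, 3.0, 5.2])}
posteriors = {"intercept": np.full((4, 2), 1.0), "x1": np.full((4, 2), 2.0)}
y_hat, residuals = predict_and_residuals(data, "y", ["x1"], posteriors)
```

Plotting residuals against y_hat then gives the kind of scatter that residuals_plot is described as producing.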
- summary(posteriors: dict[str, np.ndarray], alpha: float = 0.05, quantiles: list[float] = None, print_summary: bool = True) -> dict[str, int | str]
It prints a statistical summary for each posterior.
Parameters
posteriors : dict
    Posterior samples. Posteriors and their samples are key-value pairs. Each sample is a numpy.ndarray with a number of rows equal to the number of iterations and a number of columns equal to the number of Markov chains.
alpha : float, optional
    Significance level. It is used to compute the Highest Posterior Density (HPD) interval. It must be between 0 and 1. Default is 0.05.
quantiles : list, optional
    List of the quantiles to compute for each posterior. It cannot be empty. It must contain only floats between 0 and 1. Default is [0.025, 0.25, 0.5, 0.75, 0.975].
print_summary : bool, optional
    If True, prints the statistical posterior summary report. Default is True.
Returns
dict
    Dictionary with the statistical summary of the posteriors. It contains:
    key 'n_chain', the number of Markov chains,
    key 'n_iterations', the number of regression iterations,
    key 'summary', the statistical summary of the posteriors, as a pandas.DataFrame,
    key 'quantiles', the quantiles summary of the posteriors, as a pandas.DataFrame.
Raises
TypeError
KeyError
    If posteriors does not contain 'intercept' key.
ValueError
    If a posterior sample is an empty numpy.ndarray,
    if alpha is not between 0 and 1,
    if quantiles is an empty list,
    if a quantiles value is not between 0 and 1.
See Also
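To make the roles of alpha and quantiles in summary concrete, here is a minimal NumPy sketch of a per-posterior summary. It is an assumption-laden illustration, not the library's code: hpd_interval and summarize_sketch are hypothetical names, the chains are pooled before summarizing, and the HPD interval is approximated by the shortest interval containing a (1 - alpha) share of the sorted draws, which is one common empirical estimator.

```python
import numpy as np


def hpd_interval(draws, alpha=0.05):
    """Shortest interval holding a (1 - alpha) share of the draws."""
    sorted_draws = np.sort(np.asarray(draws, dtype=float).ravel())
    n = sorted_draws.size
    n_inside = int(np.ceil((1.0 - alpha) * n))
    # Width of every candidate interval spanning n_inside consecutive draws.
    widths = sorted_draws[n_inside - 1:] - sorted_draws[: n - n_inside + 1]
    start = int(np.argmin(widths))
    return sorted_draws[start], sorted_draws[start + n_inside - 1]


def summarize_sketch(posteriors, alpha=0.05,
                     quantiles=(0.025, 0.25, 0.5, 0.75, 0.975)):
    """Mean, standard deviation, HPD interval and quantiles per posterior."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1")
    report = {}
    for name, sample in posteriors.items():
        # Assumption: pool the draws of all chains into one vector.
        draws = sample.ravel()
        report[name] = {
            "mean": draws.mean(),
            "sd": draws.std(ddof=1),
            "hpd": hpd_interval(draws, alpha),
            "quantiles": np.quantile(draws, quantiles),
        }
    return report


# Toy posterior with 50 iterations and 2 chains, for illustration only.
toy = {"intercept": np.arange(100.0).reshape(50, 2)}
report = summarize_sketch(toy, alpha=0.5, quantiles=[0.25, 0.5, 0.75])
```

A smaller alpha widens the HPD interval, while quantiles only controls which columns appear in the quantile part of the report.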
- trace_plot(posteriors: dict[str, ndarray]) -> None
It plots the traces and the probability density for each posterior.
The plot shows the traces of each Markov chain for each regression variable, together with the corresponding posterior density. The plot layout has a number of rows equal to the number of regression variables and two columns: traces on the left and densities on the right.
Parameters
posteriors : dict
    Posterior samples. Posteriors and their samples are key-value pairs. Each sample is a numpy.ndarray with a number of rows equal to the number of iterations and a number of columns equal to the number of Markov chains.
Raises
TypeError
    If posteriors is not a dict,
    if a posterior sample is not a numpy.ndarray.
KeyError
    If posteriors does not contain 'intercept' key.
ValueError
    If a posterior sample is an empty numpy.ndarray.
See Also
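The conditions listed under Raises for trace_plot can be read as an input check on the posteriors dictionary. The sketch below, with the hypothetical helper check_posteriors, shows one way such a check could look; the order of the checks and the error messages are assumptions, not taken from the library.

```python
import numpy as np


def check_posteriors(posteriors):
    """Validate a posteriors mapping before plotting or summarizing it."""
    if not isinstance(posteriors, dict):
        raise TypeError("posteriors must be a dict")
    if "intercept" not in posteriors:
        raise KeyError("posteriors must contain an 'intercept' key")
    for name, sample in posteriors.items():
        if not isinstance(sample, np.ndarray):
            raise TypeError(f"posterior sample '{name}' must be a numpy.ndarray")
        if sample.size == 0:
            raise ValueError(f"posterior sample '{name}' is empty")


# A valid input: 100 iterations (rows) and 3 Markov chains (columns).
valid = {"intercept": np.zeros((100, 3)), "x1": np.ones((100, 3))}
check_posteriors(valid)  # passes silently
```

With the layout described above, each column of a sample is one chain, so a trace panel would draw one line per column against the iteration index.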