analysis¶

compute_DIC(model: Model, print_summary: bool = True) → dict¶

Computes and prints the Deviance Information Criterion (DIC) for the fitted model.

Parameters¶

model (baypy.model.model.Model): The model with data, regressors, response variable and priors to be solved through Monte Carlo sampling.
print_summary (bool, optional): If True, prints the deviance summary report. Default is True.

Returns¶

dict

Dictionary with deviance summary. It contains:

key deviance at posterior means,
key posterior mean deviance,
key effective number of parameters,
key DIC.

Raises¶

TypeError

If model is not a baypy.model.model.Model,
if print_summary is not a bool.

ValueError

If model.posteriors is None because the sampling has not been done yet,
if a posterior key is not a column of model.data,
if model.data is an empty pandas.DataFrame,
if model.response_variable is not a column of model.data.

Notes¶

The DIC measures posterior predictive error by penalizing the fit of a model (its deviance) by its complexity, which is determined by the effective number of parameters. When comparing alternative models, the model with the smaller DIC is preferred. Consider a linear regression of the response variable \(y\) on the regressors \(X\), according to the following model:

\[y \sim N(\mu, \sigma^2)\]

\[\mu = \beta_0 + B X = \beta_0 + \sum_{j = 1}^m \beta_j x_j\]

then the likelihood is:

\[p \left( y \left\vert B,\sigma^2 \right. \right) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( - \frac{\left(y - \mu \right)^2}{2 \sigma^2} \right) .\]

The deviance [1] [2] is defined as:

\[D \left( y, B, \sigma^2 \right) = -2\log p \left( y \left\vert B,\sigma^2 \right. \right) .\]

The deviance at the posterior means of \(B\) and \(\sigma^2\), denoted by \(\overline{B}\) and \(\overline{\sigma^2}\), is:

\[D_{\overline{B}, \overline{\sigma^2}} (y) = D \left( y, \overline{B}, \overline{\sigma^2} \right)\]

while the posterior mean deviance is:

\[\overline{D} \left( y, B, \sigma^2 \right) = E \left( D(y, B, \sigma^2) \left. \right\vert y \right)\]

and the effective number of parameters is defined as:

\[pD = \overline{D} \left( y, B, \sigma^2 \right) - D_{\overline{B}, \overline{\sigma^2}} (y) .\]

The Deviance Information Criterion [1] is:

\[DIC = 2 \overline{D} \left( y, B, \sigma^2 \right) - D_{\overline{B}, \overline{\sigma^2}} (y) = \overline{D} \left( y, B, \sigma^2 \right) + pD = D_{\overline{B}, \overline{\sigma^2}} (y) + 2pD .\]
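The definitions above can be illustrated with a minimal, standard-library sketch. It is not baypy's implementation: the function names, the layout of `draws` and the dictionary keys are all hypothetical, and the deviance is summed over observations under the assumption that they are independent.

```python
import math


def gaussian_deviance(y, X, beta_0, betas, sigma2):
    """Return D = -2 * log-likelihood, summed over all observations."""
    deviance = 0.0
    for y_i, x_i in zip(y, X):
        mu_i = beta_0 + sum(b * x for b, x in zip(betas, x_i))
        # -2 * log N(y_i | mu_i, sigma2)
        deviance += math.log(2 * math.pi * sigma2) + (y_i - mu_i) ** 2 / sigma2
    return deviance


def dic_from_draws(y, X, draws):
    """Compute the deviance summary from posterior draws.

    draws is a list of (beta_0, betas, sigma2) tuples, a hypothetical
    layout chosen for this sketch only.
    """
    n_draws = len(draws)
    # Posterior mean deviance: average the deviance over the draws.
    mean_deviance = sum(gaussian_deviance(y, X, *d) for d in draws) / n_draws

    # Deviance evaluated at the posterior means of the parameters.
    beta_0_bar = sum(d[0] for d in draws) / n_draws
    n_betas = len(draws[0][1])
    betas_bar = [sum(d[1][j] for d in draws) / n_draws for j in range(n_betas)]
    sigma2_bar = sum(d[2] for d in draws) / n_draws
    deviance_at_means = gaussian_deviance(y, X, beta_0_bar, betas_bar, sigma2_bar)

    # Effective number of parameters and DIC, as defined above.
    p_d = mean_deviance - deviance_at_means
    return {
        "deviance_at_posterior_means": deviance_at_means,
        "posterior_mean_deviance": mean_deviance,
        "effective_number_of_parameters": p_d,
        "DIC": mean_deviance + p_d,
    }


# Toy data and made-up posterior draws, for illustration only.
y = [3.1, 4.9, 7.2, 8.8]
X = [[1.0], [2.0], [3.0], [4.0]]
draws = [
    (1.0, [2.0], 0.10),
    (1.2, [1.9], 0.12),
    (0.9, [2.1], 0.09),
]
result = dic_from_draws(y, X, draws)
print(result)
```

With a single draw the posterior mean deviance coincides with the deviance at the posterior means, so \(pD\) is zero and the DIC reduces to the deviance itself.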

References¶

residuals_plot(model: Model) → None¶

Plots the residuals \(\epsilon\) with respect to predicted values \(\hat{y}\).

Parameters¶

model (baypy.model.model.Model): The model with data, regressors, response variable and priors to be solved through Monte Carlo sampling.

Raises¶

TypeError

If model is not a baypy.model.model.Model.

ValueError

If model.posteriors is None because the sampling has not been done yet,
if a posterior key is not a column of model.data,
if model.data is an empty pandas.DataFrame,
if model.response_variable is not a column of model.data.

Notes¶

Predicted values are computed at data points \(X\) using the posterior mean of each regressor’s parameter. In the case of a linear model:

\[\hat{y_i} = \beta_0 + \sum_{j = 1}^{m} \beta_j x_{i,j}\]

while residuals are the difference between the observed values and the predicted values of the response_variable:

\[\epsilon_i = y_i - \hat{y_i}\]
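As a rough illustration of the two formulas above, the following standard-library sketch computes predicted values and residuals from posterior means. The helper names `predict` and `residuals` and the toy numbers are assumptions made for this example; they are not part of baypy.

```python
def predict(X, intercept, coefficients):
    """Predicted value for each row of X: intercept + sum_j beta_j * x_ij."""
    return [
        intercept + sum(b * x for b, x in zip(coefficients, row))
        for row in X
    ]


def residuals(y, y_hat):
    """Observed minus predicted value, one residual per observation."""
    return [obs - pred for obs, pred in zip(y, y_hat)]


# Toy data; the intercept and slope stand in for posterior means.
X = [[1.0], [2.0], [3.0]]
y = [3.1, 4.9, 7.2]
y_hat = predict(X, intercept=1.0, coefficients=[2.0])
eps = residuals(y, y_hat)
print(y_hat)
print(eps)
```

Plotting `eps` against `y_hat` gives the kind of residuals plot described here; a pattern-free scatter around zero is what a well-specified linear model would be expected to show.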

summary(posteriors: dict, alpha: float = 0.05, quantiles: list | None = None, print_summary: bool = True) → dict¶

Prints a statistical summary for each posterior.

Parameters¶

posteriors (dict): Posterior samples. Posteriors and their samples are key-value pairs. Each sample is a numpy.ndarray with a number of rows equal to the number of iterations and a number of columns equal to the number of Markov chains.
alpha (float): Significance level. It is used to compute the Highest Posterior Density (HPD) interval. It must be between 0 and 1. Default is 0.05.
quantiles (list, optional): List of the quantiles to compute for each posterior. It cannot be empty. It must contain only floats between 0 and 1. Default is [0.025, 0.25, 0.5, 0.75, 0.975].
print_summary (bool, optional): If True, prints the statistical posterior summary report. Default is True.

Returns¶

dict

Dictionary with statistical summary of posteriors. It contains:

key n_chain, the number of Markov chains,
key n_iterations, the number of regression iterations,
key summary, the statistical summary of the posteriors, as a pandas.DataFrame,
key quantiles, quantiles summary of the posteriors, as a pandas.DataFrame.

Raises¶

TypeError

If posteriors is not a dict,
if a posterior sample is not a numpy.ndarray,
if alpha is not a float,
if quantiles is not a list,
if a quantiles value is not a float,
if print_summary is not a bool.

KeyError

If posteriors does not contain the intercept key.

ValueError

If a posterior sample is an empty numpy.ndarray,
if alpha is not between 0 and 1,
if quantiles is an empty list,
if a quantiles value is not between 0 and 1.
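How the HPD interval is computed is not stated here. As a hedged illustration of what alpha controls, the sketch below uses the common sorted-window approach, which assumes a unimodal posterior: pool the chains, sort the draws, and keep the narrowest window containing a fraction \(1 - \alpha\) of them. The function name `hpd_interval` is hypothetical, and a list of rows is used as a plain-Python stand-in for the two-dimensional numpy.ndarray described above.

```python
import math


def hpd_interval(sample, alpha=0.05):
    """Return the shortest interval holding a fraction (1 - alpha) of the draws.

    sample is a list of rows: one row per iteration, one value per chain.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    # Pool all chains together and sort the draws.
    draws = sorted(value for row in sample for value in row)
    if not draws:
        raise ValueError("sample must not be empty")
    n_draws = len(draws)
    # Number of consecutive sorted draws the interval has to cover.
    window = math.ceil((1 - alpha) * n_draws)
    # Slide the window over the sorted draws and keep the narrowest one.
    start = min(
        range(n_draws - window + 1),
        key=lambda i: draws[i + window - 1] - draws[i],
    )
    return draws[start], draws[start + window - 1]


# Toy posterior sample: 5 iterations (rows) by 2 chains (columns).
sample = [[0.0, 10.0], [11.0, 12.0], [13.0, 14.0], [30.0, 40.0], [50.0, 60.0]]
lower, upper = hpd_interval(sample, alpha=0.5)
print(lower, upper)
```

Unlike an equal-tailed interval built from the 0.025 and 0.975 quantiles, this interval is not forced to be symmetric in probability, so it shifts toward the region where the draws are densest.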

Parameters¶

posteriors (dict): Posterior samples. Posteriors and their samples are key-value pairs. Each sample is a numpy.ndarray with a number of rows equal to the number of iterations and a number of columns equal to the number of Markov chains.

Raises¶

TypeError

If posteriors is not a dict,
if a posterior sample is not a numpy.ndarray.

KeyError

If posteriors does not contain the intercept key.

ValueError

If a posterior sample is an empty numpy.ndarray.
