analysis

compute_DIC(model: Model, print_summary: bool = True) → dict[str, float]

Computes the Deviance Information Criterion (DIC) for the fitted model and, if print_summary is True, prints the deviance summary.

Parameters

model (Model): The model, with its data, regressors, response variable and priors, to be solved through Monte Carlo sampling.
print_summary (bool, optional): If True, prints the deviance summary report. Default is True.

Returns

dict

Dictionary with deviance summary. It contains:

key 'Deviance at posterior means',
key 'Posterior mean deviance',
key 'Effective number of parameters',
key 'Deviance Information Criterion'.

Raises

TypeError

If model is not a Model,
if print_summary is not a bool.

ValueError

If model.posteriors is None because sampling has not been run yet,
if a posterior key is not a column of model.data,
if model.data is an empty pandas.DataFrame,
if model.response_variable is not a column of model.data.

Notes

The DIC estimates the posterior predictive error of a model by penalizing its fit (the deviance) with its complexity, measured by the effective number of parameters. When comparing alternative models fitted to the same data, the model with the smaller DIC is preferred. Consider a linear regression of the response variable \(y\) on the regressors \(X\), according to the following model:

\[y \sim N(\mu, \sigma^2)\]

\[\mu = \beta_0 + B X = \beta_0 + \sum_{j = 1}^m \beta_j x_j\]

then the likelihood is:

\[p \left( y \left\vert B,\sigma^2 \right. \right) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( - \frac{\left(y - \mu \right)^2}{2 \sigma^2} \right) .\]

The deviance [1] [2] is defined as:

\[D \left( y, B, \sigma^2 \right) = -2\log p \left( y \left\vert B,\sigma^2 \right. \right) .\]

The deviance at the posterior means of \(B\) and \(\sigma^2\), denoted by \(\overline{B}\) and \(\overline{\sigma^2}\), is:

\[D_{\overline{B}, \overline{\sigma^2}} (y) = D \left( y, \overline{B}, \overline{\sigma^2} \right)\]

while the posterior mean deviance is:

\[\overline{D} \left( y, B, \sigma^2 \right) = E \left( D(y, B, \sigma^2) \left. \right\vert y \right)\]

and the effective number of parameters is defined as:

\[pD = \overline{D} \left( y, B, \sigma^2 \right) - D_{\overline{B}, \overline{\sigma^2}} (y) .\]

The Deviance Information Criterion [1] is:

\[DIC = 2 \overline{D} \left( y, B, \sigma^2 \right) - D_{\overline{B}, \overline{\sigma^2}} (y) = \overline{D} \left( y, B, \sigma^2 \right) + pD = D_{\overline{B}, \overline{\sigma^2}} (y) + 2pD .\]
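
The definitions above can be checked numerically. The following is a minimal sketch and not the library's implementation: it assumes the posterior samples are already available as flat NumPy arrays (here faked with random draws around known values), it assumes the deviance of a dataset is the sum of the per-observation deviances, and the helper names `deviance` and `dic_from_samples` are hypothetical. The dictionary keys mirror those listed under Returns.

```python
import numpy as np


def deviance(y, X, beta_0, beta, sigma2):
    """Return D = -2 * log-likelihood of a Gaussian linear model, summed over observations."""
    mu = beta_0 + X @ beta
    log_lik = -0.5 * np.log(2 * np.pi * sigma2) - (y - mu) ** 2 / (2 * sigma2)
    return -2.0 * np.sum(log_lik)


def dic_from_samples(y, X, beta_0_s, beta_s, sigma2_s):
    """Compute the deviance summary from posterior samples.

    beta_0_s and sigma2_s have shape (n_samples,), beta_s has shape (n_samples, m).
    """
    # Deviance evaluated at the posterior means of the parameters
    d_at_means = deviance(y, X, beta_0_s.mean(), beta_s.mean(axis=0), sigma2_s.mean())
    # Posterior mean deviance: average the deviance over all samples
    d_mean = np.mean(
        [deviance(y, X, b0, b, s2) for b0, b, s2 in zip(beta_0_s, beta_s, sigma2_s)]
    )
    p_d = d_mean - d_at_means
    return {
        "Deviance at posterior means": d_at_means,
        "Posterior mean deviance": d_mean,
        "Effective number of parameters": p_d,
        "Deviance Information Criterion": d_mean + p_d,
    }


# Synthetic data and fake "posterior" draws, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=50)
beta_0_s = rng.normal(1.0, 0.05, size=1000)
beta_s = rng.normal([2.0, -1.0], 0.05, size=(1000, 2))
sigma2_s = rng.uniform(0.2, 0.3, size=1000)  # kept strictly positive

result = dic_from_samples(y, X, beta_0_s, beta_s, sigma2_s)
for key, value in result.items():
    print(f"{key}: {value:.3f}")
```

The three equivalent expressions for the DIC given above hold for the returned values up to floating-point error, whatever samples are supplied.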

References

See Also

LinearRegression

residuals_plot(model: Model) → None

Plots the residuals \(\epsilon\) against the predicted values \(\hat{y}\).

Parameters

model (Model): The model, with its data, regressors, response variable and priors, to be solved through Monte Carlo sampling.

Raises

TypeError

If model is not a Model.

ValueError

If model.posteriors is None because sampling has not been run yet,
if a posterior key is not a column of model.data,
if model.data is an empty pandas.DataFrame,
if model.response_variable is not a column of model.data.

Notes

Predicted values are computed at the data points \(X\) using the posterior mean of each regression parameter. For a linear model:

\[\hat{y}_i = \overline{\beta}_0 + \sum_{j = 1}^{m} \overline{\beta}_j x_{i,j}\]

while residuals are the differences between the observed and the predicted values of the response_variable:

\[\epsilon_i = y_i - \hat{y}_i\]
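
The two formulas above can be illustrated with plain NumPy. This is a sketch under stated assumptions, not the library's code: the posterior samples are random draws standing in for real sampler output, and the regressor names `x_1` and `x_2` are hypothetical. Only the 'intercept' key and the (iterations, chains) sample shape come from this documentation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data with two regressors (hypothetical names x_1 and x_2)
X = rng.normal(size=(20, 2))
y = 0.5 + X @ np.array([1.5, -0.7]) + rng.normal(scale=0.1, size=20)

# Fake posterior samples, each of shape (n_iterations, n_chains)
posteriors = {
    "intercept": rng.normal(0.5, 0.02, size=(500, 3)),
    "x_1": rng.normal(1.5, 0.02, size=(500, 3)),
    "x_2": rng.normal(-0.7, 0.02, size=(500, 3)),
}

# Posterior means, pooled over iterations and chains
beta_0 = posteriors["intercept"].mean()
beta = np.array([posteriors["x_1"].mean(), posteriors["x_2"].mean()])

y_hat = beta_0 + X @ beta   # predicted values at the data points
residuals = y - y_hat       # observed minus predicted
```

Plotting `residuals` against `y_hat` gives the kind of scatter that residuals_plot is described as producing.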

See Also

LinearRegression

summary(posteriors: dict[str, np.ndarray], alpha: float = 0.05, quantiles: list[float] = None, print_summary: bool = True) → dict[str, int | str]

Computes a statistical summary for each posterior and, if print_summary is True, prints it.

Parameters

posteriors (dict): Posterior samples, as key-value pairs of posterior names and their samples. Each sample is a numpy.ndarray with as many rows as iterations and as many columns as Markov chains.
alpha (float, optional): Significance level used to compute the Highest Posterior Density (HPD) interval. It must be between 0 and 1. Default is 0.05.
quantiles (list, optional): List of the quantiles to compute for each posterior. It cannot be empty and must contain only floats between 0 and 1. Default is [0.025, 0.25, 0.5, 0.75, 0.975].
print_summary (bool, optional): If True, prints the statistical posterior summary report. Default is True.

Returns

dict

Dictionary with statistical summary of posteriors. It contains:

key 'n_chain', the number of Markov chains,
key 'n_iterations', the number of regression iterations,
key 'summary', the statistical summary of the posteriors, as a pandas.DataFrame,
key 'quantiles', quantiles summary of the posteriors, as a pandas.DataFrame.

Raises

TypeError

If posteriors is not a dict,
if a posterior sample is not a numpy.ndarray,
if alpha is not a float,
if quantiles is not a list,
if a quantiles value is not a float,
if print_summary is not a bool.

KeyError

If posteriors does not contain 'intercept' key.

ValueError

If a posterior sample is an empty numpy.ndarray,
if alpha is not between 0 and 1,
if quantiles is an empty list,
if a quantiles value is not between 0 and 1.
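
To make the roles of alpha and quantiles concrete, here is a minimal sketch of how quantiles and an HPD interval can be obtained from one posterior sample. It is not the library's implementation: the helper `hpd_interval` is hypothetical, the sample is random data, and pooling all chains before computing the interval is an assumption.

```python
import numpy as np


def hpd_interval(samples, alpha=0.05):
    """Return the shortest interval containing a fraction (1 - alpha) of the samples."""
    x = np.sort(np.asarray(samples).ravel())
    n = x.size
    n_in = int(np.ceil((1 - alpha) * n))        # number of points inside the interval
    widths = x[n_in - 1:] - x[: n - n_in + 1]   # width of every candidate interval
    start = int(np.argmin(widths))              # the narrowest candidate wins
    return x[start], x[start + n_in - 1]


rng = np.random.default_rng(2)
sample = rng.normal(loc=1.0, scale=0.5, size=(500, 3))  # (n_iterations, n_chains)
n_iterations, n_chains = sample.shape

quantiles = np.quantile(sample, [0.025, 0.25, 0.5, 0.75, 0.975])
lower, upper = hpd_interval(sample, alpha=0.05)
print(f"quantiles: {np.round(quantiles, 3)}")
print(f"{100 * (1 - 0.05):.0f}% HPD interval: [{lower:.3f}, {upper:.3f}]")
```

For a symmetric, unimodal posterior the HPD interval is close to the interval between the 0.025 and 0.975 quantiles; for skewed posteriors the two can differ noticeably.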

See Also

LinearRegression

trace_plot(posteriors: dict[str, ndarray]) → None

Plots the traces and the probability density for each posterior.

The plot shows, for each regression variable, the trace of each Markov chain and the corresponding posterior density. The layout has one row per regression variable and two columns: traces on the left and densities on the right.

Parameters

posteriors (dict): Posterior samples, as key-value pairs of posterior names and their samples. Each sample is a numpy.ndarray with as many rows as iterations and as many columns as Markov chains.

Raises

TypeError

If posteriors is not a dict,
if a posterior sample is not a numpy.ndarray.

KeyError

If posteriors does not contain 'intercept' key.

ValueError

If a posterior sample is an empty numpy.ndarray.
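
The layout described above, one row per regression variable with traces on the left and densities on the right, can be approximated with Matplotlib. This is only a sketch under assumptions: the posterior samples are random data, the regressor name `x_1` is hypothetical, and a normalized histogram stands in for whatever density estimate the library actually draws.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)

# Fake posterior samples, each of shape (n_iterations, n_chains)
posteriors = {
    "intercept": rng.normal(0.5, 0.1, size=(500, 3)),
    "x_1": rng.normal(1.5, 0.1, size=(500, 3)),
}

n_rows = len(posteriors)
fig, axes = plt.subplots(nrows=n_rows, ncols=2, squeeze=False, figsize=(8, 2.5 * n_rows))

for row, (name, sample) in enumerate(posteriors.items()):
    # Left column: plotting a 2-D array draws one line per column, i.e. per chain
    axes[row, 0].plot(sample, linewidth=0.5)
    axes[row, 0].set_title(f"Trace of {name}")
    # Right column: histogram of all chains pooled, normalized to a density
    axes[row, 1].hist(sample.ravel(), bins=40, density=True)
    axes[row, 1].set_title(f"Density of {name}")

fig.tight_layout()
fig.savefig("trace_plot_sketch.png")
```

Well-mixed chains overlap in the left panels and look like stationary noise around a common level; chains that drift or sit at different levels suggest the sampler has not converged.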

See Also

LinearRegression