analysis

compute_DIC(model: Model, print_summary: bool = True) -> dict

Computes and prints the Deviance Information Criterion (DIC) for the fitted model.

Parameters

model : baypy.model.model.Model

The model with data, regressors, response variable and priors to be solved through Monte Carlo sampling.

print_summary : bool, optional

If True, prints the deviance summary report. Default is True.

Returns

dict
Dictionary with deviance summary. It contains:
  • key deviance at posterior means,

  • key posterior mean deviance,

  • key effective number of parameters,

  • key DIC.

Raises

TypeError
  • If model is not a baypy.model.model.Model,

  • if print_summary is not a bool.

ValueError
  • If model.posteriors is None because the sampling has not been done yet,

  • if a posterior key is not a column of model.data,

  • if model.data is an empty pandas.DataFrame,

  • if model.response_variable is not a column of model.data.

See Also

baypy.regression.linear_regression.LinearRegression

Notes

The DIC measures posterior predictive error by penalizing the fit of a model (deviance) by its complexity, determined by the effective number of parameters. When comparing alternative models fitted to the same data, the model with the smaller DIC is preferred. Consider a linear regression of the response variable \(y\) with respect to regressors \(X\), according to the following model:

\[y \sim N(\mu, \sigma^2)\]
\[\mu = \beta_0 + B X = \beta_0 + \sum_{j = 1}^m \beta_j x_j\]

then the likelihood is:

\[p \left( y \left\vert B,\sigma^2 \right. \right) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( - \frac{\left(y - \mu \right)^2}{2 \sigma^2} \right) .\]

The deviance [1] [2] is defined as:

\[D \left( y, B, \sigma^2 \right) = -2\log p \left( y \left\vert B,\sigma^2 \right. \right) .\]

The deviance at the posterior means of \(B\) and \(\sigma^2\), denoted by \(\overline{B}\) and \(\overline{\sigma^2}\), is:

\[D_{{\overline{B}}, \overline{\sigma^2}} (y) = D \left( y, \overline{B}, \overline{\sigma^2} \right)\]

while the posterior mean deviance is:

\[\overline{D} \left( y, B, \sigma^2 \right) = E \left( D(y, B, \sigma^2) \left. \right\vert y \right)\]

and the effective number of parameters is defined as:

\[pD = \overline{D} \left( y, B, \sigma^2 \right) - D_{{\overline{B}}, \overline{\sigma^2}} (y) .\]

The Deviance Information Criterion [1] is:

\[DIC = 2 \overline{D} \left( y, B, \sigma^2 \right) - D_{{\overline{B}}, \overline{\sigma^2}} (y) = \overline{D} \left( y, B, \sigma^2 \right) + pD = D_{{\overline{B}}, \overline{\sigma^2}} (y) + 2pD .\]
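
As a rough illustration of the definitions above, the sketch below computes the four quantities with NumPy from arrays of posterior draws. It is not baypy's implementation: the helper names deviance and dic_from_draws, the synthetic data and the stand-in draws are all assumptions made for this example, and the deviance is summed over observations on the assumption that they are independent.

```python
import numpy as np


def deviance(y, X, intercept, betas, variance):
    """Return -2 * log-likelihood of a Gaussian linear model, summed over observations."""
    mu = intercept + X @ betas
    log_lik = -0.5 * np.log(2 * np.pi * variance) - (y - mu) ** 2 / (2 * variance)
    return -2.0 * np.sum(log_lik)


def dic_from_draws(y, X, intercept_draws, beta_draws, variance_draws):
    """Hypothetical helper: draws are 1-D (or 2-D for betas) arrays with chains already flattened."""
    # Deviance evaluated at the posterior means of the parameters.
    d_at_means = deviance(
        y, X, intercept_draws.mean(), beta_draws.mean(axis=0), variance_draws.mean()
    )
    # Posterior mean deviance: average the deviance over every draw.
    d_mean = np.mean(
        [
            deviance(y, X, b0, b, s2)
            for b0, b, s2 in zip(intercept_draws, beta_draws, variance_draws)
        ]
    )
    p_d = d_mean - d_at_means
    return {
        "deviance at posterior means": d_at_means,
        "posterior mean deviance": d_mean,
        "effective number of parameters": p_d,
        "DIC": d_mean + p_d,
    }


# Synthetic data and stand-in draws (not the output of a real sampler).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=50)

n_draws = 1000
intercept_draws = rng.normal(1.0, 0.05, size=n_draws)
beta_draws = rng.normal([2.0, -1.0], 0.05, size=(n_draws, 2))
variance_draws = rng.gamma(shape=50.0, scale=0.25 / 50.0, size=n_draws)

result = dic_from_draws(y, X, intercept_draws, beta_draws, variance_draws)
print(result)
```

The dictionary keys mirror the ones listed under Returns only for readability. The three equivalent forms of the DIC can be checked against each other from the returned values.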

References

residuals_plot(model: Model) -> None

Plots the residuals \(\epsilon\) with respect to predicted values \(\hat{y}\).

Parameters

model : baypy.model.model.Model

The model with data, regressors, response variable and priors to be solved through Monte Carlo sampling.

Raises

TypeError

If model is not a baypy.model.model.Model.

ValueError
  • If model.posteriors is None because the sampling has not been done yet,

  • if a posterior key is not a column of model.data,

  • if model.data is an empty pandas.DataFrame,

  • if model.response_variable is not a column of model.data.

See Also

baypy.regression.linear_regression.LinearRegression

Notes

Predicted values are computed at the data points \(X\) using the posterior mean of each regressor’s parameter. For a linear model:

\[\hat{y_i} = \beta_0 + \sum_{j = 1}^{m} \beta_j x_{i,j}\]

while residuals are the difference between the observed values and the predicted values of the response_variable:

\[\epsilon_i = y_i - \hat{y_i}\]
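
The two formulas above can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the data, the regressor names x_1 and x_2, and the posterior samples are made up, and the code does not reproduce how residuals_plot is implemented.

```python
import numpy as np

# Hypothetical data: 4 observations of 2 regressors and a response.
X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [4.0, 3.0]])
y = np.array([4.1, 2.9, 5.2, 8.8])

# Hypothetical posterior samples, shaped (n_iterations, n_chains).
posteriors = {
    "intercept": np.array([[0.9, 1.1], [1.0, 1.0]]),
    "x_1": np.array([[1.0, 1.0], [0.8, 1.2]]),
    "x_2": np.array([[0.5, 1.5], [1.0, 1.0]]),
}

# Posterior mean of each parameter, taken over all iterations and chains.
intercept_mean = posteriors["intercept"].mean()
beta_means = np.array([posteriors["x_1"].mean(), posteriors["x_2"].mean()])

# Predicted values at the data points and the corresponding residuals.
y_hat = intercept_mean + X @ beta_means
residuals = y - y_hat

print(y_hat)
print(residuals)
```

Plotting residuals against y_hat then gives the kind of scatter described for this function.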

summary(posteriors: dict, alpha: float = 0.05, quantiles: list | None = None, print_summary: bool = True) -> dict

Prints a statistical summary for each posterior.

Parameters

posteriors : dict

Posterior samples. Each key is a posterior name and each value is its sample: a numpy.ndarray with a number of rows equal to the number of iterations and a number of columns equal to the number of Markov chains.

alpha : float, optional

Significance level, used to compute the Highest Posterior Density (HPD) interval. It must be between 0 and 1. Default is 0.05.

quantiles : list, optional

List of the quantiles to compute for each posterior. It cannot be empty and must contain only floats between 0 and 1. Default is [0.025, 0.25, 0.5, 0.75, 0.975].

print_summary : bool, optional

If True, prints the statistical posterior summary report. Default is True.

Returns

dict
Dictionary with statistical summary of posteriors. It contains:
  • key n_chain, the number of Markov chains,

  • key n_iterations, the number of regression iterations,

  • key summary, the statistical summary of the posteriors, as a pandas.DataFrame,

  • key quantiles, quantiles summary of the posteriors, as a pandas.DataFrame.

Raises

TypeError
  • If posteriors is not a dict,

  • if a posterior sample is not a numpy.ndarray,

  • if alpha is not a float,

  • if quantiles is not a list,

  • if a quantiles value is not a float,

  • if print_summary is not a bool.

KeyError

If posteriors does not contain the intercept key.

ValueError
  • If a posterior sample is an empty numpy.ndarray,

  • if alpha is not between 0 and 1,

  • if quantiles is an empty list,

  • if a quantiles value is not between 0 and 1.

See Also

baypy.regression.linear_regression.LinearRegression
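
To make the roles of alpha and quantiles concrete, here is a minimal NumPy sketch of how an HPD interval and a set of quantiles could be computed from one posterior sample. The helpers hpd_interval and summarize are hypothetical names introduced for this example. The HPD interval is estimated as the shortest window containing a fraction 1 - alpha of the sorted draws, which is a common approach but not necessarily the one baypy uses.

```python
import numpy as np


def hpd_interval(sample, alpha=0.05):
    """Shortest interval containing a fraction (1 - alpha) of the draws."""
    flat = np.sort(np.asarray(sample, dtype=float).ravel())
    n = flat.size
    # Number of draws that must fall inside the interval.
    window = int(np.ceil((1 - alpha) * n))
    # Width of every candidate interval spanning `window` consecutive draws.
    widths = flat[window - 1:] - flat[: n - window + 1]
    start = int(np.argmin(widths))
    return flat[start], flat[start + window - 1]


def summarize(sample, alpha=0.05, quantiles=(0.025, 0.25, 0.5, 0.75, 0.975)):
    """Hypothetical helper: summarize one sample shaped (n_iterations, n_chains)."""
    sample = np.asarray(sample, dtype=float)
    n_iterations, n_chains = sample.shape
    lower, upper = hpd_interval(sample, alpha)
    return {
        "n_iterations": n_iterations,
        "n_chains": n_chains,
        "mean": sample.mean(),
        "hpd": (lower, upper),
        # Quantiles are computed over all iterations and chains pooled together.
        "quantiles": dict(zip(quantiles, np.quantile(sample, quantiles))),
    }


# Made-up sample: 4 iterations (rows) and 2 chains (columns).
sample = np.array([[0.0, 1.0], [2.0, 3.0], [10.0, 20.0], [30.0, 40.0]])
report = summarize(sample, alpha=0.5, quantiles=(0.25, 0.5, 0.75))
print(report)
```

With alpha=0.5 the interval must hold four of the eight draws, so the tightly clustered values at the low end are selected rather than the central ones; this is the difference between an HPD interval and an equal-tailed interval built from quantiles.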

trace_plot(posteriors: dict) -> None

Plots the traces and the probability density for each posterior. The plot shows the trace of each Markov chain for each regression variable, together with the corresponding posterior density. The layout has one row per regression variable and two columns: traces on the left and densities on the right.

Parameters

posteriors : dict

Posterior samples. Each key is a posterior name and each value is its sample: a numpy.ndarray with a number of rows equal to the number of iterations and a number of columns equal to the number of Markov chains.

Raises

TypeError
  • If posteriors is not a dict,

  • if a posterior sample is not a numpy.ndarray.

KeyError

If posteriors does not contain the intercept key.

ValueError

If a posterior sample is an empty numpy.ndarray.

See Also

baypy.regression.linear_regression.LinearRegression