TV Marketing¶

Model Set Up¶

Link to the dataset
Unfortunately, the database original source does not report the units on each variable.
Complete example code
Build a linear regression model for predicting Sales using TV as a predictor.

import pandas as pd

data = pd.read_csv(r'data/data.csv')

import matplotlib.pyplot as plt

fig_1, ax_1 = plt.subplots()

ax_1.plot(
    data['TV'].values,
    data['Sales'].values,
    marker='o',
    linestyle='',
    alpha=0.5
)

ax_1.set_xlabel('TV')
ax_1.set_ylabel('Sales')

ax_1.tick_params(bottom=False, top=False, left=False, right=False)

plt.show()

Sales do not follow a perfect linear relationship with respect to TV, but let’s try to fit a linear model anyway.
Set up a linear regression model, considering TV as regressor and Sales as the response variable.
Use non-informative priors for regressor and variance:

from baypy.model import LinearModel
import baypy as bp

model_1 = LinearModel()

model_1.data = data
model_1.response_variable = 'Sales'
model_1.priors = {
    'intercept': {'mean': 0, 'variance': 1e6},
    'TV': {'mean': 0, 'variance': 1e6},
    'variance': {'shape': 1, 'scale': 1e-6}
}

See LinearModel for more information on this class and its attributes and methods.

Sampling¶

Run the regression sampling on 3 Markov chains, with 500 iterations per each chain and discarding the first 50 burn-in draws:

from baypy.regression import LinearRegression

LinearRegression.sample(
    model= model_1,
    n_iterations=500,
    burn_in_iterations=50,
    n_chains=3,
    seed=137
)

See LinearRegression for more information on this class and its attributes and methods.

Convergence Diagnostics¶

Asses the model convergence diagnostics:

bp.diagnostics.effective_sample_size(posteriors=model_1.posteriors)

                       intercept       TV  variance
Effective Sample Size    1371.51  1280.94   1428.17

bp.diagnostics.autocorrelaion_summary(posteriors=model_1.posteriors)

        intercept        TV  variance
Lag 0    1.000000  1.000000  1.000000
Lag 1   -0.032025 -0.010411  0.000142
Lag 5    0.011572  0.015856 -0.033278
Lag 10   0.021948  0.010220 -0.003107
Lag 30  -0.028900 -0.023383  0.032368

bp.diagnostics.autocorrelation_plot(posteriors=model_1.posteriors)

See effective_sample_size, autocorrelation_summary and autocorrelation_plot for more details on diagnostics functions.
All diagnostics show a low correlation, indicating the chains converged to the stationary distribution.

Posteriors Analysis¶

Asses posterior analysis:

bp.analysis.trace_plot(posteriors=model_1.posteriors)

bp.analysis.residuals_plot(model=model_1)

See trace_plot and residuals_plot for more details on analysis functions.
Residuals form a shape and increase as the predicted variable increases. This suggests that the model does not fit the data well.
This is consistent with previous exploration of the data: there is no pure linear relationship between Sales and TV.
Comparing data to fitted model posteriors:

import numpy as np

posteriors_data = pd.DataFrame()
for posterior, posterior_sample in model_1.posteriors.items():
    posteriors_data[posterior] = np.asarray(posterior_sample).reshape(-1)
posteriors_data['error 1'] = np.random.normal(
    loc=0,
    scale=np.sqrt(posteriors_data['variance']),
    size=len(posteriors_data)
)

tv_1 = np.linspace(data['TV'].min(), data['TV'].max(), 50)


fig_2, ax_2 = plt.subplots()

for row in zip(*posteriors_data.to_dict('list').values()):
    sales_1 = row[0] + row[1]*tv_1 + row[3]
    ax_2.plot(tv_1, sales_1, color='blue', linewidth=1, alpha=0.1)
ax_2.plot(
    data['TV'].values,
    data['Sales'].values,
    marker='o',
    linestyle='',
    markerfacecolor='none',
    markeredgecolor='red',
    markeredgewidth=1.2
)

ax_2.set_xlabel('TV')
ax_2.set_ylabel('Sales')
ax_2.tick_params(bottom=False, top=False, left=False, right=False)

plt.tight_layout()

plt.show()

The posterior distribution is rather broad and includes all the data without identifying their true trend. Notice that some posteriors predict negative Sales for low values of TV.

Alternative Model Set Up¶

Try to check a relationship using the log-scale:

fig_3, ax_3 = plt.subplots()

ax_3.loglog(
    data['TV'].values,
    data['Sales'].values,
    marker='o',
    linestyle='',
    alpha=0.5
)

ax_3.set_xlabel('TV')
ax_3.set_ylabel('Sales')

plt.show()

The relationship is almost linear in the log-scale. Notice that most of the data are concentrated toward high values of TV.
Try to fit a linear model in the log-scale; it is required to transform the data:

data['log TV'] = np.log(data['TV'])
data['log Sales'] = np.log(data['Sales'])

Set up a linear regression model, considering log TV as regressor and log Sales as the response variable.
Use non-informative priors for regressor and variance:

model_2 = LinearModel()

model_2.data = data
model_2.response_variable = 'log Sales'
model_2.priors = {
    'intercept': {'mean': 0, 'variance': 1e6},
    'log TV': {'mean': 0, 'variance': 1e6},
    'variance': {'shape': 1, 'scale': 1e-6}
}

Sampling¶

Run the regression sampling on 3 Markov chains, with 500 iteration per each chain and discarding the first 50 burn-in draws:

LinearRegression.sample(
    model=model_2,
    n_iterations = 500,
    burn_in_iterations=50,
    n_chains=3,
    seed=137
)

Convergence Diagnostics¶

Asses the model convergence diagnostics:

bp.diagnostics.effective_sample_size(posteriors=model_2.posteriors)

                       intercept   log TV  variance
Effective Sample Size    1373.29  1321.11   1428.17

bp.diagnostics.autocorrelation_summary(posteriors=model_2.posteriors)

        intercept    log TV  variance
Lag 0    1.000000  1.000000  1.000000
Lag 1   -0.032124 -0.027163  0.000142
Lag 5    0.011421  0.014532 -0.033278
Lag 10   0.021877  0.021076 -0.003107
Lag 30  -0.028748 -0.030245  0.032368

bp.diagnostics.autocorrelation_plot(posteriors=model_2.posteriors)

All diagnostics show a low correlation, indicating the chains converged to the stationary distribution.

Posteriors Analysis¶

Asses posterior analysis:

bp.analysis.trace_plot(posteriors=model_2.posteriors)

bp.analysis.residuals_plot(model=model_2)

Residuals are generally improved with respect to the original model.
Residuals appear to reflect an increasing dispersion as predicted variable increase. However, as already mentioned, notice that most of the data are concentrated toward high values of TV. The slight growing pattern is partially caused by this data heterogeneity.

bp.analysis.summary(posteriors=model_2.posteriors)

Number of chains:           3
Sample size per chian:    500

Empirical mean, standard deviation, 95% HPD interval for each variable:

               Mean        SD   HPD min   HPD max
intercept  0.905983  0.073676  0.768024  1.057664
log TV     0.354768  0.015388  0.325168  0.384996
variance   0.044958  0.004671  0.035855  0.053536

Quantiles for each variable:

               2.5%       25%       50%       75%     97.5%
intercept  0.754455  0.858332  0.905386  0.952843  1.050115
log TV     0.325689  0.344946  0.354906  0.364422  0.386279
variance   0.036865  0.041719  0.044506  0.048087  0.054838

See summary for more details on this analysis function.
The summary reports a statistical evidence for a positive effect of log TV: \(10\%\) percent increase in TV would result in \(1.10^{0.354768} - 1 = 3.44\%\) percent increase in Sales.
Comparing data to fitted model posteriors:

posteriors_data = pd.DataFrame()
for posterior, posterior_sample in model_2.posteriors.items():
    posteriors_data[posterior] = np.asarray(posterior_sample).reshape(-1)
posteriors_data['error 2'] = np.random.normal(
    loc=0,
    scale=np.sqrt(posteriors_data['variance']),
    size=len(posteriors_data)
)

log_tv_2 = np.linspace(data['log TV'].min(), data['log TV'].max(), 50)


fig_4, ax_4 = plt.subplots()

for row in zip(*posteriors_data.to_dict('list').values()):
    log_sales_2 = row[0] + row[1]*log_tv_2 + row[3]
    ax_4.plot(log_tv_2, log_sales_2, color='blue', linewidth=1, alpha=0.1)
ax_4.plot(
    data['log TV'].values,
    data['log Sales'].values,
    marker='o',
    linestyle='',
    markerfacecolor='none',
    markeredgecolor='red',
    markeredgewidth=1.2
)

ax_4.set_xlabel('log TV')
ax_4.set_ylabel('log Sales')
ax_4.tick_params(bottom=False, top=False, left=False, right=False)

plt.tight_layout()

plt.show()

fig_5, ax_5 = plt.subplots()

for row in zip(*posteriors_data.to_dict('list').values()):
    log_sales_2 = row[0] + row[1]*log_tv_2 + row[3]
    ax_5.plot(
        np.exp(log_tv_2),
        np.exp(log_sales_2),
        color='blue',
        linewidth=1,
        alpha=0.1
    )
ax_5.plot(
    data['TV'].values,
    data['Sales'].values,
    marker='o',
    linestyle='',
    markerfacecolor='none',
    markeredgecolor='red',
    markeredgewidth=1.2
)

ax_5.set_xlabel('TV')
ax_5.set_ylabel('Sales')
ax_5.tick_params(bottom=False, top=False, left=False, right=False)

plt.tight_layout()

plt.show()

The alternative model’s posteriors catch in a better way the trend of the data and take into account data dispersion.
For completeness, compare the two models using the DIC:

bp.analysis.compute_DIC(model=model_1)

Deviance at posterior means          1038.13
Posterior mean deviance              1039.18
Effective number of parameters          1.05
Deviance Information Criterion       1040.23

bp.analysis.compute_DIC(model=model_2)

Deviance at posterior means           -56.87
Posterior mean deviance               -55.82
Effective number of parameters          1.05
Deviance Information Criterion        -54.77

See compute_DIC for more details on this analysis function.
DIC is lower for the alternative model, indicating a preference for the alternative model in log-scale.