Explanation: Core MMM Methodology

Version: 2.5.0

Marketing Mix Modelling (MMM) quantifies the impact of marketing activities on sales or other KPIs. AMMM provides a Bayesian framework for building flexible, interpretable marketing mix models.

Core Principles

Bayesian Approach: AMMM uses PyMC to treat model parameters as probability distributions, quantifying uncertainty rather than returning single point estimates. This yields credible intervals (typically reported as highest density intervals, HDIs) and allows prior knowledge to be incorporated formally.

Flexibility: Customizable media transformations (adstock, saturation), control variables, and prior distributions allow models to reflect business reality.

Interpretability: Model outputs include channel-specific coefficients, response curves, ROI estimates, and contribution decomposition.

Actionability: Results directly inform budget optimization and scenario planning.

Bayesian Framework

Key Concepts

Parameters as Distributions: Model parameters (channel effectiveness, adstock rates, saturation points) are probability distributions reflecting uncertainty about their true values.

Prior Distributions: Express initial beliefs about parameters before observing data. Common choices:

  • HalfNormal: For positive-only parameters (effectiveness coefficients, error standard deviations)

  • Beta: For parameters bounded between 0 and 1 (adstock retention rates)

  • Gamma: For positive parameters whose prior shape needs finer control (saturation parameters)

  • Normal: For unconstrained parameters

Likelihood Function: Quantifies how probable the observed data are under a given set of parameter values.

Posterior Distributions: Updated beliefs about parameters after observing data, obtained via Bayes’ Theorem:

P(parameter | Data) ∝ P(Data | parameter) × P(parameter)
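For a single parameter the proportionality can be checked numerically. The sketch below is a toy, not AMMM code: it assumes one coefficient β with a HalfNormal(1) prior and a Normal likelihood with known noise (σ = 1), evaluates prior × likelihood on a grid of β values, and normalises. Real MMMs have far too many parameters for a grid, which is why MCMC is used instead.

```python
import numpy as np

# Toy data (assumed): y_t = beta * x_t + noise, noise ~ Normal(0, 1), true beta = 0.7
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=50)
y = 0.7 * x + rng.normal(0.0, 1.0, size=50)

# Candidate values of beta; a HalfNormal prior puts no mass below zero
beta_grid = np.linspace(0.0, 3.0, 3001)

# log P(beta): HalfNormal(sigma=1), dropping the normalising constant
log_prior = -0.5 * beta_grid**2

# log P(data | beta): sum of Normal(y_t | beta * x_t, 1) log densities, up to a constant
residuals = y[None, :] - beta_grid[:, None] * x[None, :]
log_likelihood = -0.5 * np.sum(residuals**2, axis=1)

# P(beta | data) ∝ P(data | beta) × P(beta); subtract the max before exp for stability
log_post = log_likelihood + log_prior
posterior = np.exp(log_post - log_post.max())
posterior /= posterior.sum()  # normalise to probabilities over the grid

posterior_mean = float(np.sum(beta_grid * posterior))
```

In this conjugate-like toy the posterior mode can also be found analytically (Σxy / (Σx² + 1), floored at zero), which makes it a convenient check on the grid.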

MCMC Sampling: PyMC uses NUTS (No-U-Turn Sampler) to draw samples from posterior distributions, which are typically too complex for analytical solutions.

Benefits

  • Credible intervals (HDIs) for all parameters

  • Full posterior distributions for detailed inspection

  • Direct probability statements about parameters

  • Formal incorporation of prior knowledge

Core Model Equation

The typical AMMM model structure:

y_t = baseline_t + Σ_m [β_m · saturation(adstock(x_m,t))] + Σ_c [γ_c · z_c,t] + ε_t

where m indexes media channels and c indexes control variables.

Components

Target Variable (y_t): The outcome being modelled (sales, revenue, conversions).

Baseline (baseline_t): Expected value when marketing spend is zero:

  • Intercept (α): Fundamental base level

  • Trend: Long-term patterns

  • Seasonality: Handled by Prophet integration (required)

    • Yearly, weekly, daily patterns

    • Holiday effects

    • Prophet models seasonality internally with Fourier series

Media Inputs (x_m,t): Raw marketing effort (spend, impressions, GRPs).

Media Transformations:

  1. Adstock (Carry-over Effects): Models lagged advertising impact

    • Geometric adstock: adstocked_t = (1-θ) × input_t + θ × adstocked_{t-1}

    • Parameter θ (retention rate): between 0 and 1, typically given a Beta prior

    • adstock_max_lag: Maximum number of past periods included in the carry-over calculation

  2. Saturation (Diminishing Returns): Models non-linear response

    • Michaelis-Menten: Hyperbolic response curve

    • Logistic: S-shaped response curve

    • Parameters control the curve's shape and steepness and typically have Gamma or HalfNormal priors
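A minimal NumPy sketch of the two transformations follows. It assumes the normalised geometric form given above, interprets adstock_max_lag as keeping lags 0 to L-1 in a truncated convolution, and uses one common parameterisation for each saturation curve. The function names and parameters (alpha, lam, k, x0) are illustrative, not AMMM's API.

```python
import numpy as np

def geometric_adstock(x, theta):
    """Recursive form: a_t = (1 - theta) * x_t + theta * a_{t-1}, with a_{-1} = 0."""
    out = np.zeros(len(x))
    carry = 0.0
    for t, value in enumerate(x):
        carry = (1.0 - theta) * value + theta * carry
        out[t] = carry
    return out

def geometric_adstock_truncated(x, theta, max_lag):
    """Same weights, (1 - theta) * theta**l, but only for lags l = 0 .. max_lag - 1."""
    weights = (1.0 - theta) * theta ** np.arange(max_lag)
    # The first len(x) entries of the full convolution are causal: entry t mixes x_t, x_{t-1}, ...
    return np.convolve(np.asarray(x, dtype=float), weights)[: len(x)]

def michaelis_menten(x, alpha, lam):
    """Hyperbolic curve: alpha is the asymptote, lam the input that yields alpha / 2."""
    return alpha * x / (lam + x)

def logistic_saturation(x, k, x0):
    """S-shaped logistic with steepness k and midpoint x0, shifted so f(0) = 0."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0))) - 1.0 / (1.0 + np.exp(k * x0))

# Usage: a one-off burst of 100 decays geometrically, then passes through saturation
spend = np.array([100.0, 0.0, 0.0, 0.0, 0.0])
adstocked = geometric_adstock(spend, theta=0.5)
response = michaelis_menten(adstocked, alpha=1000.0, lam=50.0)
```

When max_lag is at least the series length, the truncated version reproduces the recursive one exactly; a shorter max_lag simply drops the smallest tail weights.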

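Channel Effectiveness (β_m): Scales the transformed media input into units of the target, setting the overall size of the channel's contribution. Because the transformed input is non-linear in spend, β_m is not a constant marginal impact; the marginal effect also depends on the saturation curve. Typically has a HalfNormal prior (positive effect).

Control Variables (z_c,t): External factors (promotions, competitor activity, economic indicators). Coefficients γ_c typically have Normal priors, or HalfNormal priors when the sign of the effect is known.

Error Term (ε_t): Random variation not explained by the model. Typically ε_t ~ Normal(0, σ_error), with a HalfNormal prior on σ_error.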
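To see how the components combine, the sketch below simulates y_t from the core equation with NumPy. Every concrete choice is hypothetical: two channels named tv and search, one promotion control, the parameter values, and a linear trend standing in for the Prophet baseline. Saturation uses the Michaelis-Menten form x / (lam + x), scaled to [0, 1), so in this sketch β_m is each channel's maximum contribution.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 104  # two years of weekly observations (assumed)

def geometric_adstock(x, theta):
    # a_t = (1 - theta) * x_t + theta * a_{t-1}
    out, carry = np.zeros(len(x)), 0.0
    for t, value in enumerate(x):
        carry = (1.0 - theta) * value + theta * carry
        out[t] = carry
    return out

def saturation(x, lam):
    # Michaelis-Menten scaled to [0, 1); lam is the half-saturation point
    return x / (lam + x)

# Hypothetical media inputs x_{m,t} and parameters
spend = {"tv": rng.gamma(2.0, 50.0, T), "search": rng.gamma(2.0, 20.0, T)}
theta = {"tv": 0.6, "search": 0.2}   # TV carries over longer than search
lam = {"tv": 80.0, "search": 30.0}
beta = {"tv": 300.0, "search": 150.0}

# One control variable z_{c,t}: a promotion flag with coefficient gamma_c
promo = rng.binomial(1, 0.2, T).astype(float)
gamma_promo = 50.0

baseline = 1000.0 + 0.5 * np.arange(T)  # intercept + trend; seasonality omitted here
media = {m: beta[m] * saturation(geometric_adstock(spend[m], theta[m]), lam[m]) for m in spend}
controls = gamma_promo * promo
eps = rng.normal(0.0, 20.0, T)          # epsilon_t ~ Normal(0, sigma_error = 20)

y = baseline + sum(media.values()) + controls + eps
```

Fitting reverses this process: given y_t and the inputs, the sampler infers plausible values of θ, lam, β and γ.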

Prior Specification

Priors guide model estimation and reflect domain knowledge:

Channel Effectiveness (β_m):

  • HalfNormal: For positive effects

  • Inform using past studies, domain expertise, or lift tests

Adstock (θ_m):

  • Beta distribution (0-1 constraint)

  • Digital channels: Lower values (shorter memory)

  • Traditional media: Higher values (longer memory)
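One way to sanity-check adstock priors is to translate θ into a carry-over half-life, the number of periods until the geometric weight θ^l falls to half, ln(0.5) / ln(θ). The Beta parameters below are hypothetical examples of a short-memory (digital) and a long-memory (traditional) prior, not AMMM defaults.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Beta(a, b) priors on theta; the prior mean is a / (a + b)
priors = {"digital": (2.0, 5.0), "traditional": (5.0, 2.0)}

def half_life(theta):
    """Periods until the carry-over weight theta**l drops to half its initial value."""
    return np.log(0.5) / np.log(theta)

summary = {}
for channel_type, (a, b) in priors.items():
    theta = rng.beta(a, b, size=10_000)   # draws from the prior
    summary[channel_type] = {
        "prior_mean": a / (a + b),
        "sample_mean": float(theta.mean()),
        "median_half_life": float(np.median(half_life(theta))),
    }
```

Under these example priors a digital channel's effect typically halves within one period, while a traditional channel's takes a little over two; if that contradicts what is known about the channels, adjust a and b.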

Saturation (λ_m):

  • Gamma or HalfNormal

  • Inform using lift test results when available

  • Otherwise use weakly informative priors

Best Practices:

  • Start with weakly informative priors

  • Visualize priors before fitting

  • Test sensitivity to prior choices

  • Document reasoning
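Visualising priors can start with a prior predictive check: draw parameters from the priors, push them through the model, and ask whether the implied outcomes are plausible before any data are used. The sketch below does this for one channel with NumPy. The prior families follow the list above, but their parameters, the constant spend of 100 per week, and the normalised geometric adstock with Michaelis-Menten saturation are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n_draws = 4000

# Hypothetical weakly informative priors for one channel
beta = np.abs(rng.normal(0.0, 500.0, n_draws))          # HalfNormal(sigma=500), target units
theta = rng.beta(3.0, 3.0, n_draws)                     # adstock retention in (0, 1)
lam = rng.gamma(shape=2.0, scale=50.0, size=n_draws)    # half-saturation spend

# Run one year of constant weekly spend through adstock, vectorised over prior draws
carry = np.zeros(n_draws)
for spend_t in np.full(52, 100.0):
    carry = (1.0 - theta) * spend_t + theta * carry

# Implied contribution in the final week for every prior draw
contribution = beta * carry / (lam + carry)
low, median, high = np.percentile(contribution, [3, 50, 97])
```

If the 3rd to 97th percentile range implies contributions far larger than total sales, or indistinguishable from zero, the priors need adjusting; a histogram of contribution is the visual form of the same check.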

Model Fitting

MCMC Process:

  1. Initialize parameter values

  2. Run multiple independent chains (typically 4)

  3. Tuning (warm-up) phase: The sampler adapts its step size and mass matrix; these iterations are discarded (typically 2000 iterations)

  4. Sampling phase: Collect posterior samples (typically 2000+ draws)

Key Parameters:

  • draws: Posterior samples per chain

  • tune: Tuning iterations

  • chains: Number of independent chains

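  • target_accept: Target acceptance probability for NUTS step-size adaptation (0.8-0.99); higher values force smaller steps, which reduces divergences but slows sampling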
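These parameters map onto keyword arguments of PyMC's pm.sample (draws, tune, chains, target_accept). How AMMM forwards them is not shown here, so the values below are a hypothetical configuration, and the sampler call is left as a comment so the snippet runs without PyMC installed.

```python
# Hypothetical sampler settings; names mirror pymc.sample keyword arguments
sampler_kwargs = {
    "draws": 2000,          # posterior draws kept per chain
    "tune": 2000,           # warm-up iterations per chain, discarded afterwards
    "chains": 4,            # independent chains, needed for R-hat
    "target_accept": 0.9,   # NUTS target acceptance; raise towards 0.99 if divergences appear
}

# Draws available for inference: tuning iterations are not included
total_posterior_draws = sampler_kwargs["draws"] * sampler_kwargs["chains"]

# With PyMC installed and inside a model context:
# idata = pm.sample(**sampler_kwargs)
```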

Convergence Diagnostics:

  • R-hat: Should be ≈1.0 (< 1.01 ideal)

    • Compares within-chain vs between-chain variance

    • Values > 1.05 indicate non-convergence

  • Effective Sample Size (ESS): Should exceed roughly 400 in total (about 100 per chain with 4 chains)

    • Accounts for autocorrelation

    • Low ESS indicates inefficient sampling

  • Trace Plots: Should resemble a “fuzzy caterpillar”

    • Horizontal band (stationarity)

    • Good mixing between chains

    • No trends or patterns

  • Divergences: Should be zero or minimal

    • Indicate sampler instability

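    • Often reduced by increasing target_accept; persistent divergences may call for reparameterizing the model or using more informative priors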
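To make the R-hat comparison concrete, the sketch below implements the classic split R-hat for one scalar parameter: each chain is split in half, and a pooled variance estimate is compared with the average within-chain variance. ArviZ's az.rhat, which PyMC users typically call, computes a more robust rank-normalized variant, so its values will differ slightly.

```python
import numpy as np

def split_rhat(samples):
    """Classic split R-hat for one scalar parameter; samples has shape (chains, draws)."""
    samples = np.asarray(samples, dtype=float)
    half = samples.shape[1] // 2
    # Split every chain into two halves so within-chain drift also inflates R-hat
    split = np.concatenate([samples[:, :half], samples[:, half:2 * half]], axis=0)
    n = split.shape[1]
    within = split.var(axis=1, ddof=1).mean()          # W: average within-chain variance
    between = n * split.mean(axis=1).var(ddof=1)        # B: variance of chain means, scaled by n
    pooled = (n - 1) / n * within + between / n         # pooled estimate of posterior variance
    return float(np.sqrt(pooled / within))

rng = np.random.default_rng(3)
mixed = rng.normal(0.0, 1.0, size=(4, 1000))              # four chains exploring the same region
stuck = mixed + np.array([0.0, 0.0, 0.0, 3.0])[:, None]   # one chain stuck elsewhere
```

When chains agree, between-chain variance is small relative to within-chain variance and R-hat sits near 1; a single stuck chain pushes it well past the 1.05 threshold.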

Model Outputs

Channel Coefficients (β_m):

  • Magnitude indicates effectiveness

  • HDI indicates uncertainty

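  • An HDI well clear of zero indicates strong evidence of an effect; with a HalfNormal prior β_m is always positive, so check whether the lower bound sits close to zero rather than whether it excludes zero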
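An HDI is the narrowest interval containing a given share of the posterior draws. The sketch below computes it from a 1-D array of samples by sliding a window over the sorted draws; it assumes a unimodal posterior, and the 94% default mirrors ArviZ's usual setting. The samples are simulated stand-ins for a channel's β_m posterior.

```python
import numpy as np

def hdi(samples, prob=0.94):
    """Narrowest interval holding `prob` of the samples (assumes one mode)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    k = int(np.floor(prob * n))        # index distance spanned by each candidate interval
    widths = x[k:] - x[: n - k]        # width of every window of k + 1 sorted draws
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k])

rng = np.random.default_rng(11)
beta_draws = rng.normal(0.5, 0.1, 4000)   # stand-in posterior for beta_m
low, high = hdi(beta_draws)
```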

Response Curves:

  • Visualize diminishing returns

  • Identify optimal spend levels

  • Compare channel dynamics

ROI Metrics:

  • Overall ROI: Total contribution / total spend

  • Marginal ROI (mROI): Return on the next dollar spent, i.e. the slope of the response curve at the current spend level

  • Use mROI for budget allocation decisions
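The difference between the two metrics is easiest to see on a single response curve. Assuming a hypothetical steady-state Michaelis-Menten curve, contribution(s) = alpha · s / (lam + s) with alpha = 1000 and lam = 200, the sketch below computes overall ROI as contribution / spend and mROI as the curve's slope, both analytically and by a one-unit finite difference.

```python
alpha, lam = 1000.0, 200.0   # hypothetical curve parameters
spend = 100.0                # current spend level

def contribution(s):
    return alpha * s / (lam + s)

roi = contribution(spend) / spend                           # average return per unit spent so far
mroi = alpha * lam / (lam + spend) ** 2                     # derivative d contribution / d spend
mroi_fd = contribution(spend + 1.0) - contribution(spend)   # return on the next unit of spend
```

Because the curve is concave, mROI (about 2.22) is below ROI (about 3.33): the channel has paid back well on average, yet the next unit of spend earns less, which is why allocation decisions rely on mROI.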

Contribution Analysis:

  • Decomposes target into components

  • Shows baseline vs marketing impact

  • Tracks contribution over time
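Because the model is additive, decomposition amounts to evaluating each term of the core equation separately and comparing it with the total. The sketch below uses small hypothetical component series, already on the target's scale, to compute each component's share of the fitted total, both overall and per period.

```python
import numpy as np

# Hypothetical fitted components over four periods, in target units
components = {
    "baseline": np.array([1000.0, 1010.0, 1020.0, 1030.0]),
    "tv": np.array([200.0, 250.0, 150.0, 100.0]),
    "search": np.array([100.0, 90.0, 130.0, 120.0]),
}
fitted = sum(components.values())

# Share of the total over the whole period (baseline vs marketing impact)
overall_share = {name: series.sum() / fitted.sum() for name, series in components.items()}
# Share within each period (contribution tracked over time)
share_over_time = {name: series / fitted for name, series in components.items()}
marketing_share = 1.0 - overall_share["baseline"]
```

With control variables that can be negative (for example competitor activity), individual shares can fall below zero or exceed one, so absolute contributions are often reported alongside shares.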

Budget Optimization:

  • Allocates budget to maximize returns

  • Uses response curves and mROI

  • Supports constraints on channel spend
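One simple way to use response curves and mROI for allocation is a greedy loop: hand out the budget in small increments, each time to the channel whose curve gains most from the next increment, while honouring per-channel minimum and maximum spend. For concave curves this approximately equalises mROI across unconstrained channels. This is an illustrative sketch with hypothetical Michaelis-Menten curves, not AMMM's optimizer, which may use a different algorithm.

```python
def response(spend, alpha, lam):
    """Hypothetical Michaelis-Menten response curve."""
    return alpha * spend / (lam + spend)

def allocate_budget(total, channels, step=1.0):
    """Greedy allocation in `step`-sized increments, respecting min/max spend per channel."""
    spend = {name: c["min"] for name, c in channels.items()}
    remaining = total - sum(spend.values())
    while remaining >= step:
        best, best_gain = None, float("-inf")
        for name, c in channels.items():
            if spend[name] + step > c["max"]:
                continue  # channel already at its upper bound
            gain = (response(spend[name] + step, c["alpha"], c["lam"])
                    - response(spend[name], c["alpha"], c["lam"]))
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            break  # every channel is capped; leave the rest unallocated
        spend[best] += step
        remaining -= step
    return spend

channels = {
    "tv": {"alpha": 1000.0, "lam": 200.0, "min": 0.0, "max": 600.0},
    "search": {"alpha": 400.0, "lam": 50.0, "min": 0.0, "max": 300.0},
}
plan = allocate_budget(500.0, channels)
```

Search receives the first increments because its curve is steeper at low spend; the loop then alternates until the two channels' marginal gains are roughly equal.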

Scenario Planning:

  • Tests “what-if” scenarios

  • Predicts outcomes under different budgets

  • Provides uncertainty intervals

Prophet Integration

Prophet handles all seasonality and trend decomposition (required component):

Configuration:

  • yearly_seasonality: Annual patterns (all data frequencies)

  • weekly_seasonality: Day-of-week patterns (for daily or sub-daily data)

  • daily_seasonality: Hour-of-day patterns (for sub-daily/intra-day data)

  • trend: Long-term trends

  • include_holidays: Holiday effects

Data Frequency Guidance:

  • Weekly data: Use yearly_seasonality=True, set weekly_seasonality=False and daily_seasonality=False

  • Daily data: Use yearly_seasonality=True and weekly_seasonality=True, set daily_seasonality=False

  • Intra-day data: Use all three seasonality parameters as needed
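The frequency guidance above can be captured in a small helper. The function name and frequency labels are hypothetical; the three keys match the seasonality arguments of Prophet's constructor, but how AMMM passes them through its configuration is not shown here. For intra-day data the helper enables all three as a starting point, since the text leaves that choice to the modeller.

```python
def seasonality_flags(frequency: str) -> dict:
    """Return Prophet seasonality switches for a given data frequency."""
    flags = {
        "weekly": (True, False, False),
        "daily": (True, True, False),
        "intraday": (True, True, True),   # starting point; disable what the data do not need
    }
    if frequency not in flags:
        raise ValueError(f"unsupported frequency: {frequency!r}")
    yearly, weekly, daily = flags[frequency]
    return {
        "yearly_seasonality": yearly,
        "weekly_seasonality": weekly,
        "daily_seasonality": daily,
    }

# Usage, assuming the prophet package is installed:
# from prophet import Prophet
# model = Prophet(**seasonality_flags("weekly"))
```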

Prophet components are automatically added to the model as control variables.

Convergence Best Practices

If convergence fails:

  1. Increase tune and draws

  2. Increase target_accept (0.95-0.99)

  3. Use more informative priors

  4. Simplify model (fewer channels/features)

  5. Check data quality

For large datasets:

  • Use more chains and draws

  • Increase computational resources

  • Monitor memory usage

For complex models:

  • Start simple, add complexity gradually

  • Validate at each step

  • Document changes

References

See also: