src.core.mmm_base_v2

Base class for Media Mix Models (MMM) with In-Graph Scaling.

This is version 2 of the MMM base class. It applies scaling within the PyMC model graph to avoid PyTensor compilation cache contamination issues. The design follows the architecture used by pymc-marketing.

Module Contents

class src.core.mmm_base_v2.BaseDelayedSaturatedMMMv2(date_column: str, channel_columns: List[str], adstock_max_lag: int, model_config: Dict | None = None, sampler_config: Dict | None = None, validate_data: bool = True, control_columns: List[str] | None = None, df_lift_test: pandas.DataFrame | None = None, **kwargs)

Bases: src.core.model.MMM

Base class for Media Mix Models with delayed adstock and logistic saturation.

Version 2: Implements in-graph scaling to avoid PyTensor cache contamination.

Key changes from v1:

  • Stores raw data instead of preprocessed data.

  • Computes scaling parameters but does not apply them to the data.

  • Applies scaling transformations within the PyMC model graph.

  • Eliminates tensor shape contamination issues.

This implementation is inspired by pymc-marketing’s approach to handling data transformations within the model graph using pm.Data containers.
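The following is a minimal plain-NumPy sketch of the v2 pattern, not this class's code. It assumes max-abs scale factors and uses a toy linear predictor in place of the adstock and saturation terms; `channel_scale`, `target_scale`, and `model_mean` are hypothetical names. In the real model the raw arrays would sit in `pm.Data` containers and the division would be a PyTensor operation inside the graph.

```python
import numpy as np

# Hypothetical raw inputs: 4 dates x 2 channels of spend, plus a target series.
X_raw = np.array([[10.0, 0.0], [20.0, 5.0], [40.0, 10.0], [0.0, 20.0]])
y_raw = np.array([100.0, 150.0, 260.0, 180.0])

# v2 pattern: compute scale factors once, but keep the stored data raw.
channel_scale = np.abs(X_raw).max(axis=0)  # one max-abs value per channel (assumed scaler)
target_scale = np.abs(y_raw).max()


def model_mean(X, beta):
    """Stand-in for the model graph: scaling is the first operation applied to X."""
    X_scaled = X / channel_scale        # "in-graph" scaling of raw inputs
    mu_scaled = X_scaled @ beta         # toy linear predictor on the scaled scale
    return mu_scaled * target_scale     # map back to original target units


beta = np.array([0.5, 0.3])
mu_fit = model_mean(X_raw, beta)

# New raw data goes straight in, with no separate preprocessing step,
# analogous to swapping the contents of a pm.Data container with pm.set_data.
X_new = np.array([[30.0, 10.0]])
mu_new = model_mean(X_new, beta)
```

Because only the scale factors are fixed at fit time, every input shape flows through the same function, which is the property the v2 design relies on.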

channel_scale_mean

Mean values for channel scaling.

Type:

pd.Series

channel_scale_std

Standard deviation for channel scaling.

Type:

pd.Series

target_scale_mean

Mean value for target scaling.

Type:

float

target_scale_std

Standard deviation for target scaling.

Type:

float

control_scale_mean

Mean values for control scaling.

Type:

pd.Series

control_scale_std

Standard deviation for control scaling.

Type:

pd.Series

control_columns

List of control variable columns.

Type:

Optional[List[str]]

adstock_max_lag

Maximum lag for adstock transformation.

Type:

int

yearly_seasonality

Number of Fourier modes for yearly seasonality.

Type:

Optional[int]

date_column

Name of the date column.

Type:

str

validate_data

Whether to validate input data.

Type:

bool

channel_columns

List of media channel columns.

Type:

List[str]

model_config

Model configuration.

Type:

Optional[Dict]

sampler_config

Sampler configuration.

Type:

Optional[Dict]

property default_sampler_config: Dict

Returns the default configuration for the PyMC sampler.

property output_var: str

Returns the name of the target variable used in the model.

compute_scaling_params(X: pandas.DataFrame, y: pandas.Series | numpy.ndarray) → None

Computes max-abs scaling parameters without applying them to the data.

This method calculates the maximum absolute values used for scaling, following the MaxAbsScaler approach used by PyMC-Marketing. The data itself is left unchanged; the actual scaling happens within the PyMC model graph.

Parameters:
  • X – Input features DataFrame containing channels and controls.

  • y – Target variable data.
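A minimal pandas sketch of computing max-abs scale factors while leaving the inputs untouched. The helper name `compute_max_abs_params`, its return layout, and the zero-column guard (mapping an all-zero column's scale to 1.0, as scikit-learn's MaxAbsScaler does) are assumptions for illustration, not the method's actual internals.

```python
import numpy as np
import pandas as pd


def _nonzero(scale: pd.Series) -> pd.Series:
    """Replace zero scales with 1.0 so later division never divides by zero."""
    return scale.mask(scale == 0, 1.0)


def compute_max_abs_params(X, y, channel_columns, control_columns=None):
    """Hypothetical helper: per-column max-abs factors; X and y are not modified."""
    channel_scale = _nonzero(X[channel_columns].abs().max())
    control_scale = None
    if control_columns:
        control_scale = _nonzero(X[control_columns].abs().max())
    target_scale = float(np.abs(np.asarray(y, dtype=float)).max())
    if target_scale == 0:
        target_scale = 1.0
    return channel_scale, control_scale, target_scale


X = pd.DataFrame({
    "date": pd.date_range("2024-01-07", periods=3, freq="W"),  # ignored by the helper
    "tv": [100.0, 300.0, 200.0],
    "radio": [0.0, 0.0, 0.0],      # all-zero channel: scale falls back to 1.0
    "price": [-2.0, 1.0, 0.5],     # control with a negative value
})
y = pd.Series([10.0, -40.0, 25.0])
ch, ctrl, t = compute_max_abs_params(X, y, ["tv", "radio"], ["price"])
```

Using the absolute value keeps signed controls (such as a price index) inside [-1, 1] once they are divided by their factor in the graph.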

build_model(X: pandas.DataFrame, y: pandas.Series | numpy.ndarray, **kwargs: Any) → None

Builds the PyMC model with in-graph scaling.

This is the core change from v1: scaling happens within the model graph through PyTensor operations, rather than by preprocessing the data beforehand.

Parameters:
  • X – Input features DataFrame.

  • y – Target variable data.

  • **kwargs – Additional keyword arguments.
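This NumPy sketch shows what an in-graph-scaled model computes for one set of parameter values: raw inputs are divided by their scale factors as the first operation, then passed through adstock and saturation, and the Gaussian likelihood is evaluated on the scaled target. It assumes an unnormalized geometric adstock and the logistic saturation (1 - exp(-λx)) / (1 + exp(-λx)) used in pymc-marketing. Controls, seasonality, priors, and the class's exact transforms are omitted, and all function and parameter names are hypothetical.

```python
import numpy as np


def geometric_adstock(x, alpha, l_max):
    """Carry-over along the date axis: out[t] = sum_{l < l_max} alpha**l * x[t - l]."""
    n_dates = x.shape[0]
    out = np.zeros(x.shape, dtype=float)
    for lag in range(min(l_max, n_dates)):
        out[lag:] += (alpha ** lag) * x[: n_dates - lag]  # alpha broadcasts per channel
    return out


def logistic_saturation(x, lam):
    """Equals 0 at x = 0 and approaches 1 as x grows."""
    return (1.0 - np.exp(-lam * x)) / (1.0 + np.exp(-lam * x))


def scaled_log_likelihood(X_raw, y_raw, channel_scale, target_scale, params, l_max):
    """Gaussian log-likelihood evaluated the way an in-graph-scaled model would."""
    X_scaled = X_raw / channel_scale   # scaling is the first op applied to raw inputs
    y_scaled = y_raw / target_scale    # likelihood is defined on the scaled target
    adstocked = geometric_adstock(X_scaled, params["alpha"], l_max)
    effect = logistic_saturation(adstocked, params["lam"])
    mu = params["intercept"] + effect @ params["beta"]  # shape (n_dates,)
    sigma = params["sigma"]
    resid = y_scaled - mu
    return float(np.sum(-0.5 * np.log(2.0 * np.pi * sigma**2) - resid**2 / (2.0 * sigma**2)))


X_raw = np.array([[10.0, 0.0], [20.0, 5.0], [40.0, 10.0], [0.0, 20.0]])
y_raw = np.array([100.0, 150.0, 260.0, 180.0])
channel_scale = np.abs(X_raw).max(axis=0)
target_scale = np.abs(y_raw).max()
params = {"alpha": np.array([0.5, 0.2]), "lam": np.array([2.0, 1.0]),
          "beta": np.array([0.4, 0.3]), "intercept": 0.2, "sigma": 0.1}
ll = scaled_log_likelihood(X_raw, y_raw, channel_scale, target_scale, params, l_max=2)
```

Because scaling lives inside the computation, rescaling both the raw data and its stored factor by the same constant leaves the likelihood unchanged.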

classmethod load(fname: str) → BaseDelayedSaturatedMMMv2

Loads a saved model instance from a NetCDF file.

Parameters:

fname – File path to load the model from.

Returns:

The loaded model instance, with its scaling parameters restored.

property default_model_config: Dict

Returns the default model configuration dictionary.

channel_contributions_forward_pass(channel_data: numpy.ndarray) → numpy.ndarray

Evaluates channel contributions using fitted model parameters.

Note: This method expects raw (unscaled) channel data and applies scaling internally using the stored scaling parameters.

Parameters:

channel_data – Raw input channel data (not scaled). Shape should be (n_dates, n_channels).

Returns:

Estimated channel contributions based on the fitted model. Shape will be (chains, draws, n_dates, n_channels).
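A vectorized NumPy sketch of how raw channel data could be mapped to an array of shape (chains, draws, n_dates, n_channels) using posterior samples. It assumes geometric adstock, logistic saturation, per-channel `alpha`, `lam`, and `beta` samples of shape (chains, draws, n_channels), and that results are multiplied back by the target scale; the function name and these parameter names are hypothetical, and the real method's internals may differ.

```python
import numpy as np


def channel_contributions_sketch(channel_data, channel_scale, target_scale,
                                 alpha, lam, beta, l_max):
    """Hypothetical forward pass over posterior draws.

    channel_data: raw spend, shape (n_dates, n_channels)
    alpha, lam, beta: posterior samples, shape (chains, draws, n_channels)
    returns: contributions, shape (chains, draws, n_dates, n_channels)
    """
    x = np.asarray(channel_data, dtype=float) / channel_scale  # apply stored scaling
    n_dates = x.shape[0]
    a = alpha[..., None, :]                                    # (chains, draws, 1, n_channels)
    adstocked = np.zeros(alpha.shape[:2] + x.shape)            # (chains, draws, n_dates, n_channels)
    for lag in range(min(l_max, n_dates)):
        adstocked[..., lag:, :] += (a ** lag) * x[: n_dates - lag]
    k = lam[..., None, :] * adstocked
    saturated = (1.0 - np.exp(-k)) / (1.0 + np.exp(-k))       # logistic saturation
    return beta[..., None, :] * saturated * target_scale       # back to target units (assumed)


rng = np.random.default_rng(0)
chains, draws, n_channels = 2, 3, 2
posterior = {name: rng.uniform(0.1, 0.9, size=(chains, draws, n_channels))
             for name in ("alpha", "lam", "beta")}
raw = np.array([[100.0, 50.0], [0.0, 80.0], [40.0, 0.0], [60.0, 20.0]])
contrib = channel_contributions_sketch(raw, raw.max(axis=0), 500.0,
                                       posterior["alpha"], posterior["lam"],
                                       posterior["beta"], l_max=3)
```

Inserting a length-1 axis for dates lets one loop over lags handle every chain and draw through broadcasting instead of iterating over samples.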