src.core.mmm_base_v2
Base class for Media Mix Models (MMM) with In-Graph Scaling.
Version 2 of the MMM base class applies scaling within the PyMC model graph to avoid PyTensor compilation-cache contamination. The design is based on the architecture used by pymc-marketing.
Module Contents
- class src.core.mmm_base_v2.BaseDelayedSaturatedMMMv2(date_column: str, channel_columns: List[str], adstock_max_lag: int, model_config: Dict | None = None, sampler_config: Dict | None = None, validate_data: bool = True, control_columns: List[str] | None = None, df_lift_test: pandas.DataFrame | None = None, **kwargs)
Bases: src.core.model.MMM
Base class for Media Mix Models with delayed adstock and logistic saturation.
Version 2: Implements in-graph scaling to avoid PyTensor cache contamination.
Key Changes from v1:
- Stores raw data instead of preprocessed data
- Computes scaling parameters but doesn’t apply them to data
- Applies scaling transformations within the PyMC model graph
- Eliminates tensor shape contamination issues
This implementation is inspired by pymc-marketing’s approach to handling data transformations within the model graph using pm.Data containers.
- channel_scale_mean
Mean values for channel scaling.
- Type: pd.Series
- channel_scale_std
Standard deviation for channel scaling.
- Type: pd.Series
- target_scale_mean
Mean value for target scaling.
- Type: float
- target_scale_std
Standard deviation for target scaling.
- Type: float
- control_scale_mean
Mean values for control scaling.
- Type: pd.Series
- control_scale_std
Standard deviation for control scaling.
- Type: pd.Series
- control_columns
List of control variable columns.
- Type: Optional[List[str]]
- adstock_max_lag
Maximum lag for adstock transformation.
- Type: int
- yearly_seasonality
Number of Fourier modes for seasonality.
- Type: Optional[int]
- date_column
Name of the date column.
- Type: str
- validate_data
Whether to validate input data.
- Type: bool
- channel_columns
List of media channel columns.
- Type: List[str]
- model_config
Model configuration.
- Type: Optional[Dict]
- sampler_config
Sampler configuration.
- Type: Optional[Dict]
- property default_sampler_config: Dict
Returns the default configuration for the PyMC sampler.
- property output_var: str
Returns the name of the target variable used in the model.
- compute_scaling_params(X: pandas.DataFrame, y: pandas.Series | numpy.ndarray) → None
Computes max-abs scaling parameters without applying them to the data.
This method calculates the maximum absolute values used for scaling, following the MaxAbsScaler approach used by PyMC-Marketing. The scaling itself is applied within the PyMC model graph.
- Parameters:
X – Input features DataFrame containing channels and controls.
y – Target variable data.
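The parameter computation described above can be sketched with pandas alone. This is a minimal illustration, not the class's actual implementation: the function name `max_abs_scaling_params`, the returned dictionary keys, and the zero-scale guard are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def max_abs_scaling_params(X, y, channel_columns, control_columns=None):
    """Hypothetical helper: compute max-abs scales and leave the data untouched."""
    # One scale per channel: the largest absolute value in that column.
    channel_scale = X[channel_columns].astype(float).abs().max(axis=0)
    # Assumption: guard all-zero columns so a later division cannot produce NaN.
    channel_scale = channel_scale.replace(0.0, 1.0)

    # A single scalar scale for the target.
    target_scale = float(np.abs(np.asarray(y, dtype=float)).max())

    control_scale = None
    if control_columns:
        control_scale = X[control_columns].astype(float).abs().max(axis=0)
        control_scale = control_scale.replace(0.0, 1.0)

    return {
        "channel_scale": channel_scale,
        "target_scale": target_scale,
        "control_scale": control_scale,
    }


X = pd.DataFrame(
    {
        "tv": [100.0, -250.0, 50.0],
        "search": [0.0, 0.0, 0.0],
        "price": [2.0, 4.0, -8.0],
    }
)
y = pd.Series([10.0, 40.0, 20.0])
X_before = X.copy()

params = max_abs_scaling_params(X, y, ["tv", "search"], ["price"])
```

Only the parameters are returned. `X` and `y` keep their raw values, which matches the documented behaviour of computing the scales without applying them.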
- build_model(X: pandas.DataFrame, y: pandas.Series | numpy.ndarray, **kwargs: Any) → None
Builds the PyMC model with in-graph scaling.
This is the core change from v1: scaling happens WITHIN the model graph using PyTensor operations rather than preprocessing the data beforehand.
- Parameters:
X – Input features DataFrame.
y – Target variable data.
**kwargs – Additional keyword arguments.
- classmethod load(fname: str) → BaseDelayedSaturatedMMMv2
Loads a saved model instance from a NetCDF file.
- Parameters:
fname – File path to load the model from.
- Returns:
Loaded model instance with scaling parameters.
- property default_model_config: Dict
Returns the default model configuration dictionary.
- channel_contributions_forward_pass(channel_data: numpy.ndarray) → numpy.ndarray
Evaluates channel contributions using fitted model parameters.
Note: This method expects RAW channel data and will apply scaling internally based on the stored scaling parameters.
- Parameters:
channel_data – Raw input channel data (not scaled). Shape should be (n_dates, n_channels).
- Returns:
Estimated channel contributions based on the fitted model. Shape will be (chains, draws, n_dates, n_channels).
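The shape contract above can be illustrated with a NumPy-only sketch of a forward pass. Everything here is an assumption made for illustration: the function names, the posterior parameter names (`alpha`, `lam`, `beta`), the unnormalised geometric adstock, the logistic saturation form, and the max-abs scaling. The real method may differ in any of these details, including the scale on which contributions are returned.

```python
import numpy as np


def geometric_adstock(x, alpha, l_max):
    """Carry over spend with decay alpha across l_max lags (unnormalised)."""
    out = np.zeros_like(x, dtype=float)
    n_dates = x.shape[0]
    for lag in range(l_max):
        # Spend from `lag` periods ago contributes with weight alpha**lag.
        out[lag:] += (alpha ** lag) * x[: n_dates - lag]
    return out


def logistic_saturation(x, lam):
    """Map non-negative input to [0, 1) with diminishing returns."""
    return (1 - np.exp(-lam * x)) / (1 + np.exp(-lam * x))


def forward_pass_sketch(channel_data, channel_scale, alpha, lam, beta, l_max):
    """Raw (n_dates, n_channels) data in, (chains, draws, n_dates, n_channels) out."""
    scaled = channel_data / channel_scale  # scaling applied internally
    chains, draws, _ = alpha.shape
    out = np.empty((chains, draws) + scaled.shape)
    for c in range(chains):
        for d in range(draws):
            adstocked = geometric_adstock(scaled, alpha[c, d], l_max)
            out[c, d] = beta[c, d] * logistic_saturation(adstocked, lam[c, d])
    return out


rng = np.random.default_rng(0)
channel_data = rng.uniform(0, 100, size=(5, 2))     # raw, unscaled spend
channel_scale = np.abs(channel_data).max(axis=0)
alpha = rng.uniform(0.1, 0.9, size=(2, 3, 2))       # (chains, draws, channels)
lam = rng.uniform(0.5, 2.0, size=(2, 3, 2))
beta = rng.uniform(0.1, 1.0, size=(2, 3, 2))

contributions = forward_pass_sketch(
    channel_data, channel_scale, alpha, lam, beta, l_max=3
)
zero_contributions = forward_pass_sketch(
    np.zeros((5, 2)), channel_scale, alpha, lam, beta, l_max=3
)
```

The caller passes raw spend and the function divides by the stored scale itself, mirroring the note that scaling is applied internally. With zero spend, the logistic saturation evaluates to zero, so every contribution is zero.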