src.llm_interpreter.inputs.schema_map

Schema definitions for AMMM CSV outputs.

This module defines typed dataclasses for the 12 CSV files generated by the AMMM pipeline, plus two legacy V1 files kept for backward compatibility. Each dataclass represents one row of its CSV file, with type-annotated fields.

Author: AMMM Team
Created: 2025-04-10
Last Modified: 2025-04-10

Module Contents

class src.llm_interpreter.inputs.schema_map.StationarityRow

Represents a row from stationarity_summary.csv.

Stationarity test results for each time series, using both the Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests. Generated during Phase 4 - Data Exploration (Pre-diagnostics).

is_problematic() → bool

Check whether the variable shows signs of non-stationarity.
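The exact rule behind is_problematic() is not documented here. A minimal sketch, assuming hypothetical adf_pvalue and kpss_pvalue fields and a 0.05 significance level, shows the usual way the two tests are combined: the ADF null hypothesis is a unit root and the KPSS null hypothesis is stationarity, so a high ADF p-value or a low KPSS p-value each point to non-stationarity.

```python
from dataclasses import dataclass

# Assumed significance level; the real module may use a different cutoff.
ALPHA = 0.05


@dataclass
class StationarityRowSketch:
    """Illustrative stand-in for StationarityRow; field names are hypothetical."""

    variable: str
    adf_pvalue: float   # ADF null hypothesis: the series has a unit root
    kpss_pvalue: float  # KPSS null hypothesis: the series is stationary

    def is_problematic(self) -> bool:
        # ADF fails to reject a unit root -> evidence of non-stationarity.
        adf_nonstationary = self.adf_pvalue >= ALPHA
        # KPSS rejects stationarity -> evidence of non-stationarity.
        kpss_nonstationary = self.kpss_pvalue < ALPHA
        # Assumed: either test alone is enough to flag the variable.
        return adf_nonstationary or kpss_nonstationary


row = StationarityRowSketch("tv_spend", adf_pvalue=0.32, kpss_pvalue=0.01)
print(row.is_problematic())  # True
```

Flagging when either test disagrees with stationarity is the conservative choice; requiring both would flag fewer variables.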

class src.llm_interpreter.inputs.schema_map.VIFRow

Represents a row from vif_summary.csv.

Variance Inflation Factor (VIF) analysis to detect multicollinearity between features. Generated during Phase 4 - Data Exploration (Pre-diagnostics).

Interpretation:
- VIF = 1: No correlation
- VIF < 5: Low correlation (acceptable)
- VIF 5-10: Moderate correlation (caution)
- VIF > 10: High multicollinearity (problematic)

severity_level() → Literal['none', 'low', 'moderate', 'high']

Classify the VIF value into a severity level.

is_multicollinear() → bool

Check whether the variable is flagged for high VIF.
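A sketch of how severity_level() and is_multicollinear() could map a VIF value onto the interpretation bands listed for this class. The vif field name, the boundary handling at exactly 5 and 10, and treating only the 'high' band as flagged are assumptions.

```python
from dataclasses import dataclass
from typing import Literal

Severity = Literal["none", "low", "moderate", "high"]


@dataclass
class VIFRowSketch:
    """Illustrative stand-in for VIFRow; field names are hypothetical."""

    variable: str
    vif: float

    def severity_level(self) -> Severity:
        # Bands follow the documented interpretation; boundaries are assumed.
        if self.vif <= 1.0:
            return "none"
        if self.vif < 5.0:
            return "low"
        if self.vif <= 10.0:
            return "moderate"
        return "high"

    def is_multicollinear(self) -> bool:
        # Assumed: only VIF > 10 counts as flagged multicollinearity.
        return self.severity_level() == "high"


rows = [VIFRowSketch(n, v) for n, v in [("a", 1.0), ("b", 3.2), ("c", 7.5), ("d", 12.0)]]
print([r.severity_level() for r in rows])  # ['none', 'low', 'moderate', 'high']
```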

class src.llm_interpreter.inputs.schema_map.TransferEntropyRow

Represents a row from transfer_entropy_summary.csv.

Measures bidirectional information transfer between variables using transfer entropy. Generated during Phase 4 - Data Exploration (Pre-diagnostics).

has_significant_transfer() → bool

Check whether there is significant information transfer in either direction.
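A minimal sketch of has_significant_transfer(), assuming hypothetical per-direction p-value fields and a 0.05 significance level. The real row may instead store raw transfer entropy values or use a different significance measure.

```python
from dataclasses import dataclass


@dataclass
class TransferEntropyRowSketch:
    """Illustrative stand-in for TransferEntropyRow; field names are hypothetical."""

    source: str
    target: str
    p_value_forward: float  # significance of source -> target transfer
    p_value_reverse: float  # significance of target -> source transfer
    alpha: float = 0.05     # assumed significance level

    def has_significant_transfer(self) -> bool:
        # Significant if either direction's p-value falls below alpha.
        return min(self.p_value_forward, self.p_value_reverse) < self.alpha
```

Because transfer entropy is directional, checking both p-values separately is what distinguishes it from a symmetric measure such as correlation.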

class src.llm_interpreter.inputs.schema_map.ModelSummaryRow

Represents a row from model_summary.csv.

Detailed summary of all fitted model parameters with posterior statistics. Generated during Phase 5 - Model Fitting (After MCMC sampling).

Parameter Types:
- intercept: Model intercept (baseline effect)
- likelihood_sigma: Noise/error standard deviation
- beta_channel[channel_name]: Channel effectiveness coefficient
- alpha[channel_name]: Adstock retention parameter (0-1)
- lam[channel_name]: Saturation steepness parameter

has_converged(threshold: float = 1.01) → bool

Check whether the parameter has converged (r_hat ≈ 1.0, judged against threshold).

get_parameter_type() → Literal['intercept', 'sigma', 'beta', 'alpha', 'lam', 'unknown']

Extract the parameter type from the parameter name.
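The documented parameter naming scheme is regular enough that get_parameter_type() can be sketched as matching the base name before any [channel_name] suffix. The parameter and r_hat field names and the r_hat <= threshold convergence rule are assumptions.

```python
from dataclasses import dataclass
from typing import Literal

ParamType = Literal["intercept", "sigma", "beta", "alpha", "lam", "unknown"]

# Base parameter names as listed under "Parameter Types".
_BASE_TO_TYPE = {
    "intercept": "intercept",
    "likelihood_sigma": "sigma",
    "beta_channel": "beta",
    "alpha": "alpha",
    "lam": "lam",
}


@dataclass
class ModelSummaryRowSketch:
    """Illustrative stand-in for ModelSummaryRow; only two fields are modeled."""

    parameter: str  # e.g. "beta_channel[tv]"; column name is hypothetical
    r_hat: float

    def has_converged(self, threshold: float = 1.01) -> bool:
        # Assumed rule: converged when r_hat does not exceed the threshold.
        return self.r_hat <= threshold

    def get_parameter_type(self) -> ParamType:
        # Drop any "[channel_name]" suffix, then look up the base name.
        base = self.parameter.split("[", 1)[0]
        return _BASE_TO_TYPE.get(base, "unknown")
```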

class src.llm_interpreter.inputs.schema_map.ELPDRow

Represents a row from ELPD_summary.csv.

Expected Log Pointwise Predictive Density (ELPD) and model diagnostics. Generated during Phase 7 - Post-Analysis (Model Diagnostics).

Metrics Included:
- n_samples: Number of posterior samples used
- n_data_points: Number of data points in the model
- good_k: Proportion of good Pareto k values (should be > 0.7)
- elpd_loo: Expected log pointwise predictive density (LOO-CV)
- p_loo: Effective number of parameters
- warning: Whether LOO diagnostic warnings were raised
- r_squared: Model R-squared value

as_float() → float | None

Convert the value to float if it is numeric; otherwise return None.

as_bool() → bool | None

Convert the value to bool if it represents a boolean; otherwise return None.
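ELPD_summary.csv mixes numeric metrics (elpd_loo, p_loo) with a boolean one (warning), which is why the row exposes typed accessors. A sketch, assuming the value is stored as raw text in a hypothetical value field and that the accepted boolean spellings are as shown:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ELPDRowSketch:
    """Illustrative stand-in for ELPDRow; field names are hypothetical."""

    metric: str
    value: str

    def as_float(self) -> Optional[float]:
        # Return None instead of raising when the value is not numeric.
        try:
            return float(self.value)
        except (TypeError, ValueError):
            return None

    def as_bool(self) -> Optional[bool]:
        # Accept common textual booleans (assumed set); anything else is None.
        text = str(self.value).strip().lower()
        if text in {"true", "1", "yes"}:
            return True
        if text in {"false", "0", "no"}:
            return False
        return None
```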

class src.llm_interpreter.inputs.schema_map.MediaPerformanceEffectRow

Represents a row from media_performance_effect.csv.

Media channel effectiveness with posterior statistics from Bayesian model. Generated during Phase 7 - Post-Analysis (Performance Calculation).

has_converged(threshold: float = 1.01) → bool

Check whether the parameter has converged (r_hat ≈ 1.0, judged against threshold).

class src.llm_interpreter.inputs.schema_map.MediaConversionEfficiencyRow

[LEGACY V1] Represents a row from media_conversion_efficiency.csv.

Note: This file is not generated in V2. Kept for backward compatibility.

class src.llm_interpreter.inputs.schema_map.MediaCostPerConversionRow

[LEGACY V1] Represents a row from media_cost_per_conversion.csv.

Note: This file is not generated in V2. Kept for backward compatibility.

class src.llm_interpreter.inputs.schema_map.MediaContributionPerSpendRow

Represents a row from media_contribution_per_spend.csv (V2).

Media channel ROI/contribution per spend with percentiles. Generated during Phase 7 - Post-Analysis (Performance Calculation).

class src.llm_interpreter.inputs.schema_map.MediaCostPerRevenueUnitRow

Represents a row from media_cost_per_revenue_unit.csv (V2).

Cost per revenue unit metrics with percentiles for each media channel. Generated during Phase 7 - Post-Analysis (Performance Calculation).

class src.llm_interpreter.inputs.schema_map.ResponseCurveFitRow

Represents a row from response_curve_fit_combined.csv.

Fitted response curves for all media channels showing diminishing returns. Generated during Phase 5 - Model Fitting (Visualization Phase).
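One common way to model diminishing returns in marketing mix models is a logistic saturation function with a steepness parameter, similar in role to lam. Whether AMMM's fitted curves use exactly this form is an assumption; the sketch only illustrates how each additional unit of spend yields a smaller gain.

```python
import math


def logistic_saturation(spend: float, lam: float) -> float:
    """Map spend to a response in [0, 1) that flattens as spend grows.

    This functional form is assumed for illustration only.
    """
    return (1 - math.exp(-lam * spend)) / (1 + math.exp(-lam * spend))


lam = 0.5  # hypothetical steepness value
levels = [0, 1, 2, 3, 4]
responses = [logistic_saturation(x, lam) for x in levels]
# Marginal gain from each extra unit of spend; these shrink as spend grows.
gains = [b - a for a, b in zip(responses, responses[1:])]
print([round(g, 3) for g in gains])
```

A larger lam makes the curve saturate at lower spend, so the marginal gains shrink faster.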

class src.llm_interpreter.inputs.schema_map.BudgetScenarioResultRow

Represents a row from budget_scenario_results.csv.

Results from budget scenario planning across different budget levels. Generated during Phase 8 - Budget Optimization.

Scenario Types:
- baseline: Current spend levels
- scenario-X: X% decrease in total budget
- scenario_+X: X% increase in total budget

is_baseline() → bool

Check whether this is the baseline scenario.

is_total_row() → bool

Check whether this is a total/aggregate row.
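Scenario names encode the budget change as a signed percentage. The sketch below parses them with a regular expression that accepts both documented naming forms (and scenario_-X). The channel field, the "total" marker for aggregate rows, and the budget_change_pct() helper are hypothetical.

```python
import re
from dataclasses import dataclass

# Accepts "scenario-10", "scenario_-10" and "scenario_+10" (assumed forms).
_SCENARIO_RE = re.compile(r"^scenario_?([+-]\d+(?:\.\d+)?)$")


@dataclass
class BudgetScenarioRowSketch:
    """Illustrative stand-in for BudgetScenarioResultRow; fields are hypothetical."""

    scenario: str
    channel: str  # hypothetical: "total" marks the aggregate row

    def is_baseline(self) -> bool:
        return self.scenario == "baseline"

    def is_total_row(self) -> bool:
        return self.channel.strip().lower() == "total"

    def budget_change_pct(self) -> float:
        # Hypothetical helper: signed % change in total budget vs baseline.
        if self.is_baseline():
            return 0.0
        match = _SCENARIO_RE.match(self.scenario)
        if match is None:
            raise ValueError(f"Unrecognized scenario name: {self.scenario!r}")
        return float(match.group(1))
```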

class src.llm_interpreter.inputs.schema_map.AllDecompRow

Represents a row from all_decomp.csv.

Time-series decomposition showing contribution of each channel and control variable. Generated during Phase 7 - Post-Analysis (Decomposition).

Note: This CSV has dynamic columns for each media channel and control variable. The structure is: date, [channel_contributions], [control_contributions], trend, intercept

get_channel_contribution(channel: str) → float

Get the contribution of a specific channel.

get_total_media_contribution(media_channels: list[str]) → float

Calculate the total contribution of the specified media channels.
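Because the channel and control columns of all_decomp.csv vary by model, one plausible representation keeps them in a dictionary keyed by column name. This sketch assumes that layout and that a channel missing from the row contributes 0.0.

```python
from dataclasses import dataclass, field


@dataclass
class AllDecompRowSketch:
    """Illustrative stand-in for AllDecompRow with dynamic columns in a dict."""

    date: str
    contributions: dict[str, float] = field(default_factory=dict)

    def get_channel_contribution(self, channel: str) -> float:
        # Assumed: a missing channel contributes 0.0 rather than raising.
        return self.contributions.get(channel, 0.0)

    def get_total_media_contribution(self, media_channels: list[str]) -> float:
        # Sum only the requested media channels, ignoring controls and trend.
        return sum(self.get_channel_contribution(ch) for ch in media_channels)


row = AllDecompRowSketch(
    date="2024-01-01",
    contributions={"tv": 120.0, "search": 80.5, "price": -15.0, "trend": 300.0},
)
print(row.get_total_media_contribution(["tv", "search"]))  # 200.5
```

Passing the media channel list explicitly keeps the row agnostic about which columns are media and which are controls.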

class src.llm_interpreter.inputs.schema_map.WaterfallDecompositionRow

Represents a row from waterfall_decomposition_data.csv.

Aggregated decomposition data for waterfall visualizations. Generated during Phase 7 - Post-Analysis (Visualization).

is_positive_contributor() → bool

Check whether the component has a positive contribution.

class src.llm_interpreter.inputs.schema_map.CSVSummary

Container for a single CSV file’s data and metadata.

is_empty() → bool

Check whether the CSV has no data.

class src.llm_interpreter.inputs.schema_map.AllCSVData

Container for all CSV outputs from AMMM pipeline.

Each attribute holds a CSVSummary object containing the parsed data from the respective CSV file.

classmethod from_dict(csv_dict: dict[str, list]) → AllCSVData

Create AllCSVData from a dictionary of CSV data.

Parameters:

csv_dict – Dictionary mapping CSV names to lists of dataclass instances

Returns:

AllCSVData object with CSVSummary attributes

get_available_files() → list[str]

Get the list of CSV files that were successfully loaded.

count_loaded_files() → int

Count how many CSV files were loaded.
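A trimmed sketch of the container pattern, with only three of the file attributes. The name and rows fields, the rule that unknown keys are ignored, and treating "loaded" as present and non-empty are assumptions.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class CSVSummarySketch:
    """Illustrative stand-in for CSVSummary; field names are hypothetical."""

    name: str
    rows: list = field(default_factory=list)

    def is_empty(self) -> bool:
        return len(self.rows) == 0


@dataclass
class AllCSVDataSketch:
    """Trimmed stand-in for AllCSVData with three of its attributes."""

    stationarity_summary: Optional[CSVSummarySketch] = None
    vif_summary: Optional[CSVSummarySketch] = None
    model_summary: Optional[CSVSummarySketch] = None

    @classmethod
    def from_dict(cls, csv_dict: dict[str, list]) -> "AllCSVDataSketch":
        # Keep only keys that match a known attribute; ignore the rest.
        known = {f.name for f in fields(cls)}
        kwargs = {
            name: CSVSummarySketch(name=name, rows=rows)
            for name, rows in csv_dict.items()
            if name in known
        }
        return cls(**kwargs)

    def get_available_files(self) -> list[str]:
        # "Loaded" is assumed to mean present and non-empty.
        loaded = []
        for f in fields(self):
            summary = getattr(self, f.name)
            if summary is not None and not summary.is_empty():
                loaded.append(f.name)
        return loaded

    def count_loaded_files(self) -> int:
        return len(self.get_available_files())
```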

src.llm_interpreter.inputs.schema_map.get_schema_class(csv_name: str)

Get the schema dataclass for a given CSV file name.

Parameters:

csv_name – Name of CSV file (with or without .csv extension)

Returns:

Dataclass type for the schema

Raises:

KeyError – If the CSV name is not found in the schema map
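A sketch of the lookup, with placeholder classes standing in for the real schemas. The SCHEMA_MAP registry name and its extension-free keys are hypothetical; the sketch only shows the documented behavior of accepting names with or without .csv and raising KeyError for unknown files.

```python
from dataclasses import dataclass


@dataclass
class PlaceholderStationarityRow:  # stand-in; the real schema has more fields
    variable: str


@dataclass
class PlaceholderVIFRow:  # stand-in; the real schema has more fields
    variable: str
    vif: float


# Hypothetical registry keyed by file name without extension.
SCHEMA_MAP: dict[str, type] = {
    "stationarity_summary": PlaceholderStationarityRow,
    "vif_summary": PlaceholderVIFRow,
}


def get_schema_class(csv_name: str) -> type:
    # Accept names with or without the ".csv" extension.
    stem = csv_name.removesuffix(".csv")
    try:
        return SCHEMA_MAP[stem]
    except KeyError:
        raise KeyError(f"No schema registered for {csv_name!r}") from None
```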

src.llm_interpreter.inputs.schema_map.get_column_mapping(csv_name: str) → dict[str, str]

Get column name mappings for a CSV file.

Parameters:

csv_name – Name of CSV file (with or without .csv extension)

Returns:

Dictionary mapping CSV column names to dataclass field names
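A column mapping like this is typically applied while reading rows, before constructing dataclass instances. A sketch using the standard csv module, with a hypothetical COLUMN_MAPPINGS table, made-up column headers for vif_summary.csv, and an assumed empty mapping for unknown files:

```python
import csv
import io

# Hypothetical mapping table; real CSV headers may differ.
COLUMN_MAPPINGS: dict[str, dict[str, str]] = {
    "vif_summary": {"Variable": "variable", "VIF": "vif"},
}


def get_column_mapping(csv_name: str) -> dict[str, str]:
    # Assumed: unknown files get an empty mapping (headers kept as-is).
    return COLUMN_MAPPINGS.get(csv_name.removesuffix(".csv"), {})


def read_renamed(csv_name: str, text: str) -> list[dict[str, str]]:
    """Read CSV text and rename columns to dataclass field names."""
    mapping = get_column_mapping(csv_name)
    reader = csv.DictReader(io.StringIO(text))
    # Columns without a mapping keep their original header.
    return [{mapping.get(k, k): v for k, v in row.items()} for row in reader]


rows = read_renamed("vif_summary.csv", "Variable,VIF\ntv_spend,3.2\n")
print(rows)  # [{'variable': 'tv_spend', 'vif': '3.2'}]
```

Values still arrive as strings, so type conversion (for example float(row["vif"])) happens when the dataclass is built.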