Input Validator (diagnostics.input_validator)
Utilities for validating input data before modelling. These checks help ensure data quality and prevent common issues prior to model training.
Key checks
check_nans(dataframe, target_col, media_cols, control_cols)
  Verifies that the specified columns contain no NaN values.
  Raises ValueError with a list of offending columns.
check_duplicate_columns(dataframe)
  Ensures column names are unique.
  Raises ValueError if duplicates are found.
check_date_column(date_series, config)
  Validates chronology, frequency, missing dates, and weekly start day.
  Attempts to parse using config.get('date_format') if provided.
  Raises ValueError if the dates are unsorted or irregularly spaced, or if gaps are detected.
check_column_variance(dataframe, columns, check_zeros_only=False)
  Detects columns with zero variance (or all zeros when check_zeros_only=True).
  Raises ValueError listing the columns with issues.
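As a rough illustration of the conditions the column-level checks above look for, the following pandas sketch reproduces them on toy data. The helper names (`find_nan_columns`, `find_duplicate_columns`, `find_constant_columns`) and the toy values are hypothetical and written only for this example; they are not part of `diagnostics.input_validator`, and the real validators may differ in detail (for instance in how they treat NaN when measuring variance).

```python
import pandas as pd


def find_nan_columns(dataframe, columns):
    """Return the columns from `columns` that contain at least one NaN."""
    return [col for col in columns if dataframe[col].isna().any()]


def find_duplicate_columns(dataframe):
    """Return column names that appear more than once."""
    return dataframe.columns[dataframe.columns.duplicated()].unique().tolist()


def find_constant_columns(dataframe, columns, check_zeros_only=False):
    """Return columns that are all zeros or, by default, hold one distinct value."""
    flagged = []
    for col in columns:
        series = dataframe[col]
        if check_zeros_only:
            if (series == 0).all():
                flagged.append(col)
        elif series.nunique(dropna=False) <= 1:
            flagged.append(col)
    return flagged


# Toy data, made up for illustration only.
toy = pd.DataFrame(
    {
        "tv_spend": [100.0, None, 120.0],
        "search_spend": [0.0, 0.0, 0.0],
        "price": [9.99, 9.99, 9.99],
    }
)
all_cols = ["tv_spend", "search_spend", "price"]

nan_cols = find_nan_columns(toy, all_cols)
constant_cols = find_constant_columns(toy, all_cols)
zero_cols = find_constant_columns(toy, all_cols, check_zeros_only=True)
```

Here `price` is flagged only in the default mode: it never changes, so it carries no signal for a model, but it is not all zeros.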
Usage example
import pandas as pd
from src.diagnostics.input_validator import (
    check_nans,
    check_duplicate_columns,
    check_date_column,
    check_column_variance,
)
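# Example inputs
# df is assumed to be a pandas DataFrame loaded beforehand that holds a "date"
# column, the "revenue" target, and the media and control columns listed below.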
config = {"date_format": None}
media_cols = ["tv_spend", "search_spend"]
control_cols = ["price", "competitor_index"]
# 1) Duplicate columns
check_duplicate_columns(df)
# 2) Date column integrity
check_date_column(df["date"], config)
# 3) NaNs across core columns
check_nans(df, target_col="revenue", media_cols=media_cols, control_cols=control_cols)
# 4) Zero-variance checks
check_column_variance(df, columns=media_cols + control_cols, check_zeros_only=False)
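The date check in step 2 covers several conditions at once, so here is a minimal pandas sketch of what chronology, frequency, gap, and weekly start-day checks can look like. `describe_date_issues` is a hypothetical helper for this example only. It assumes weekly data that starts on Mondays and returns messages instead of raising; neither choice is claimed to match the behaviour of `check_date_column`.

```python
import pandas as pd

WEEK = pd.Timedelta(days=7)


def describe_date_issues(date_series, date_format=None, expected_weekday=0):
    """Return a list of problems found in a weekly date column.

    expected_weekday follows the pandas convention: Monday=0 ... Sunday=6.
    """
    parsed = pd.to_datetime(date_series, format=date_format)
    dates = pd.Series(parsed).reset_index(drop=True)
    issues = []

    # Chronology: rows should already be in ascending date order.
    if not dates.is_monotonic_increasing:
        issues.append("dates are not sorted in ascending order")
        dates = dates.sort_values().reset_index(drop=True)

    # Frequency and gaps: a clean weekly series has only 7-day steps.
    steps = dates.diff().dropna()
    if (steps > WEEK).any():
        issues.append("missing weeks (gap larger than 7 days)")
    if (steps < WEEK).any():
        issues.append("irregular spacing (step shorter than 7 days)")

    # Weekly start day: every date should fall on the same weekday.
    if (dates.dt.dayofweek != expected_weekday).any():
        issues.append("not all dates fall on the expected weekday")

    return issues


# Toy inputs, made up for illustration; 2024-01-01 is a Monday.
clean = pd.Series(["2024-01-01", "2024-01-08", "2024-01-15", "2024-01-22"])
gappy = pd.Series(["2024-01-01", "2024-01-08", "2024-01-22"])
unsorted_dates = pd.Series(["2024-01-08", "2024-01-01", "2024-01-15"])

clean_issues = describe_date_issues(clean)
gappy_issues = describe_date_issues(gappy)
unsorted_issues = describe_date_issues(unsorted_dates)
```

Sorting before measuring the steps keeps the checks independent: an out-of-order but otherwise complete series is reported as unsorted only, not as having gaps.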
Notes
These validators print structured status messages; errors raise ValueError and are intended to fail fast.
For MMM-specific diagnostics (stationarity, VIF, transfer entropy), see the Pre-Diagnostics Guide.
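Because failures surface as ValueError, a pipeline can let the first error stop the run, or it can collect every message and then stop once. The sketch below shows the second option. `run_validators` and the two `fake_*` checks are hypothetical stand-ins that keep the example self-contained; they are not part of the module.

```python
def run_validators(validators):
    """Call each zero-argument validator and collect ValueError messages.

    Raises one ValueError summarising all failures, so the run still
    stops before modelling starts.
    """
    failures = []
    for name, validator in validators:
        try:
            validator()
        except ValueError as exc:
            failures.append(f"{name}: {exc}")
    if failures:
        raise ValueError("Input validation failed:\n" + "\n".join(failures))


def fake_passing_check():
    """Stand-in for a validator that finds no problems."""


def fake_failing_check():
    """Stand-in for a validator that rejects its input."""
    raise ValueError("columns with NaN values: ['tv_spend']")


try:
    run_validators(
        [("duplicates", fake_passing_check), ("nans", fake_failing_check)]
    )
except ValueError as exc:
    summary = str(exc)
```

With the real validators, each entry could wrap a call in a lambda, for example `("nans", lambda: check_nans(df, target_col="revenue", media_cols=media_cols, control_cols=control_cols))`.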