src.core.utils

This module contains utility functions for ammm.

Functions

estimate_menten_parameters(channel, original_dataframe, contributions, **kwargs): Estimate the parameters for the Michaelis-Menten function using curve fitting.
estimate_sigmoid_parameters(channel, original_dataframe, contributions, **kwargs): Estimate the parameters for the sigmoid function using curve fitting.
compute_sigmoid_second_derivative(x, alpha, lam): Compute the second derivative of the extended sigmoid function.
find_sigmoid_inflection_point(alpha, lam): Find the inflection point of the extended sigmoid function.
standardize_scenarios_dict_keys(d, keywords): Standardize the keys in a dictionary based on a list of keywords.
apply_sklearn_transformer_across_dim(data, func, dim_name, combined=False): Helper function to use scikit-learn functions with the xarray target.
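sigmoid_saturation(x, alpha, lam): Compute the sigmoid saturation value for a given input.
create_new_spend_data(spend, adstock_max_lag, one_time, spend_leading_up=None): Create new spend data for the channel forward pass.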

Module Contents

src.core.utils.estimate_menten_parameters(channel: str, original_dataframe: pandas.DataFrame, contributions: xarray.DataArray, **kwargs: Any) → List[float]

Estimate the parameters for the Michaelis-Menten function using curve fitting.

This function extracts the data for the specified channel from both original_dataframe and the contributions DataArray produced by the model. It then uses scipy's curve_fit to find the parameters of a Michaelis-Menten function that minimize the least-squares difference between the observed and predicted data.

Parameters:

channel (str) – The name of the marketing channel for which parameters are to be estimated.
original_dataframe (pd.DataFrame) – The original DataFrame containing the channel data.
contributions (xr.DataArray) – An xarray DataArray containing the contributions data, indexed by channel.
**kwargs (Any) – Additional keyword arguments for scipy.optimize.curve_fit and initial parameter estimates (e.g., maxfev, lam_initial_estimate, alpha_initial_estimate, x, y).

Returns:

The estimated parameters (typically [alpha, lam]) of the Michaelis-Menten function.

Return type:

List[float]
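
The fitting step described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the function form alpha * x / (lam + x), the helper name menten, and the synthetic spend and contribution arrays are all assumptions made for the example. Only scipy.optimize.curve_fit and its p0 and maxfev arguments come from SciPy itself.

```python
import numpy as np
from scipy.optimize import curve_fit


def menten(x, alpha, lam):
    """Assumed Michaelis-Menten form: ceiling alpha, half-saturation at x = lam."""
    return alpha * x / (lam + x)


# Synthetic, noise-free data standing in for channel spend (x) and contributions (y).
x = np.linspace(1.0, 200.0, 50)
y = menten(x, alpha=100.0, lam=40.0)

# Hypothetical starting values, playing the role of
# alpha_initial_estimate and lam_initial_estimate.
p0 = [float(y.max()), float(x.mean())]

# maxfev caps the number of function evaluations of the least-squares solver.
popt, _ = curve_fit(menten, x, y, p0=p0, maxfev=5000)
alpha_hat, lam_hat = (float(v) for v in popt)
```

On real data the contributions are noisy, so the recovered values depend on the starting estimates; passing explicit initial estimates through the keyword arguments is the usual way to stabilize the fit.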

src.core.utils.estimate_sigmoid_parameters(channel: str, original_dataframe: pandas.DataFrame, contributions: xarray.DataArray, **kwargs: Any) → List[float]

Estimate the parameters for the sigmoid function using curve fitting.

This function extracts the data for the specified channel from both original_dataframe and the contributions DataArray produced by the model. It then uses scipy's curve_fit to find the parameters of a sigmoid function that minimize the least-squares difference between the observed and predicted data.

Parameters:

channel (str) – The name of the marketing channel for which parameters are to be estimated.
original_dataframe (pd.DataFrame) – The original DataFrame containing the channel data.
contributions (xr.DataArray) – An xarray DataArray containing the contributions data, indexed by channel.
**kwargs (Any) – Additional keyword arguments for scipy.optimize.curve_fit and initial parameter estimates (e.g., maxfev, lam_initial_estimate, alpha_initial_estimate, x, y).

Returns:

The estimated parameters (typically [alpha, lam]) of the sigmoid function.

Return type:

List[float]

src.core.utils.compute_sigmoid_second_derivative(x: float | numpy.typing.NDArray[numpy.float64], alpha: float | numpy.typing.NDArray[numpy.float64], lam: float | numpy.typing.NDArray[numpy.float64]) → float | numpy.typing.NDArray[numpy.float64]

Compute the second derivative of the extended sigmoid function.

The second derivative of a function gives us information about the curvature of the function. In the context of the sigmoid function, it helps us identify the inflection point, which is the point where the function changes from being concave up to concave down, or vice versa.

Parameters:

x (Union[float, npt.NDArray[np.float64]]) – The input value(s) for which the second derivative is to be computed.
alpha (Union[float, npt.NDArray[np.float64]]) – The asymptotic maximum or ceiling value of the sigmoid function.
lam (Union[float, npt.NDArray[np.float64]]) – The parameter that affects how quickly the function approaches its upper and lower asymptotes.

Returns:

The second derivative of the sigmoid function at the input value(s).

Return type:

Union[float, npt.NDArray[np.float64]]
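
To make the curvature idea concrete, the sketch below approximates a second derivative numerically and compares it with a hand-derived closed form. It assumes the "extended sigmoid" has the form given for sigmoid_saturation, alpha * (1 - exp(-lam * x)) / (1 + exp(-lam * x)), which equals alpha * tanh(lam * x / 2). The module's own expression is not shown in this reference and may differ, and all function names here are hypothetical.

```python
import math


def sigmoid(x, alpha, lam):
    """Assumed curve, taken from the sigmoid_saturation formula."""
    return alpha * (1 - math.exp(-lam * x)) / (1 + math.exp(-lam * x))


def second_derivative_numeric(x, alpha, lam, h=1e-4):
    """Central finite-difference approximation of the second derivative."""
    left = sigmoid(x - h, alpha, lam)
    mid = sigmoid(x, alpha, lam)
    right = sigmoid(x + h, alpha, lam)
    return (right - 2 * mid + left) / h**2


def second_derivative_closed(x, alpha, lam):
    """Closed form derived from alpha * tanh(lam * x / 2)."""
    u = lam * x / 2
    return -alpha * lam**2 / 2 * math.tanh(u) / math.cosh(u) ** 2


numeric = second_derivative_numeric(1.0, alpha=2.0, lam=1.0)
closed = second_derivative_closed(1.0, alpha=2.0, lam=1.0)
```

Under this assumed form the second derivative is negative for x > 0 and positive for x < 0, so the curve is concave down over positive inputs.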

src.core.utils.find_sigmoid_inflection_point(alpha: float, lam: float) → Tuple[float, float]

Find the inflection point of the extended sigmoid function.

The inflection point of a function is the point where the function changes its curvature, i.e., it changes from being concave up to concave down, or vice versa. For the sigmoid function, this is the point where the function has its maximum rate of growth.

Parameters:

alpha (float) – The asymptotic maximum or ceiling value of the sigmoid function.
lam (float) – The parameter that affects how quickly the function approaches its upper and lower asymptotes.

Returns:

The x and y coordinates of the inflection point.

Return type:

Tuple[float, float]

Raises:

TypeError – If alpha or lam are not scalar values.

src.core.utils.standardize_scenarios_dict_keys(d: Dict[Any, Any], keywords: List[str]) → None

Standardize the keys in a dictionary based on a list of keywords.

This function iterates over the keys in the dictionary and the keywords. If a keyword is found in a key (case-insensitive), the key is replaced with the keyword.

Parameters:

d (Dict[Any, Any]) – The dictionary whose keys are to be standardized. This dictionary is modified in-place.
keywords (List[str]) – The list of keywords to standardize the keys to.
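
The key replacement described above can be sketched in plain Python. The function name standardize_keys_sketch and the sample dictionary are hypothetical, and the substring matching via lower() is an assumption; the module may match keys differently (for example with a regular expression).

```python
def standardize_keys_sketch(d, keywords):
    """Rename, in place, the first key containing each keyword (case-insensitive)."""
    for keyword in keywords:
        # Iterate over a snapshot of the keys because d is mutated in the loop.
        for key in list(d.keys()):
            if isinstance(key, str) and keyword.lower() in key.lower():
                d[keyword] = d.pop(key)
                break  # at most one key is renamed per keyword


scenarios = {"Budget_Scenario": 1, "NOISE level": 2, "other": 3}
standardize_keys_sketch(scenarios, ["budget", "noise"])
```

Because the dictionary is modified in place and the function returns None, callers that still need the original keys should pass a copy.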

src.core.utils.apply_sklearn_transformer_across_dim(data: xarray.DataArray, func: Callable[[numpy.ndarray], numpy.ndarray], dim_name: str, combined: bool = False) → xarray.DataArray

Helper function for applying scikit-learn functions to an xarray target.

Parameters:

data (xr.DataArray) – The xarray DataArray to transform.
func (Callable[[np.ndarray], np.ndarray]) – A scikit-learn transformer method (e.g., scaler.transform).
dim_name (str) – Name of the dimension to apply the function along.
combined (bool, optional) – Flag to indicate if the data coords have been combined or not. Defaults to False.

Returns:

The transformed DataArray.

Return type:

xr.DataArray

src.core.utils.sigmoid_saturation(x: float | numpy.typing.NDArray[numpy.float64], alpha: float | numpy.typing.NDArray[numpy.float64], lam: float | numpy.typing.NDArray[numpy.float64]) → float | numpy.typing.NDArray[numpy.float64]

Calculates the sigmoid saturation value for a given input.

The formula used is: alpha * (1 - exp(-lam * x)) / (1 + exp(-lam * x)).

Parameters:

x (Union[float, npt.NDArray[np.float64]]) – The input value(s).
alpha (Union[float, npt.NDArray[np.float64]]) – The asymptotic maximum or ceiling value(s). Must be > 0.
lam (Union[float, npt.NDArray[np.float64]]) – The parameter(s) that affects how quickly the function approaches its upper and lower asymptotes. A higher value of lam makes the curve steeper, while a lower value makes it more gradual. Must be > 0.

Returns:

The sigmoid saturation value(s).

Return type:

Union[float, npt.NDArray[np.float64]]

Raises:

ValueError – If alpha or lam is less than or equal to 0.
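
A scalar-only sketch of the documented formula and its parameter check follows. The name sigmoid_saturation_sketch is hypothetical, and the real function also accepts NumPy arrays, which this version does not handle.

```python
import math


def sigmoid_saturation_sketch(x, alpha, lam):
    """Scalar version of alpha * (1 - exp(-lam * x)) / (1 + exp(-lam * x))."""
    if alpha <= 0 or lam <= 0:
        raise ValueError("alpha and lam must be greater than 0.")
    return alpha * (1 - math.exp(-lam * x)) / (1 + math.exp(-lam * x))


# At x = 0 the numerator vanishes, so the curve starts at zero.
at_zero = sigmoid_saturation_sketch(0.0, alpha=3.0, lam=0.5)

# For large x the exponentials decay and the value approaches the ceiling alpha.
near_ceiling = sigmoid_saturation_sketch(50.0, alpha=3.0, lam=0.5)
```

A larger lam moves the curve toward alpha at smaller inputs, which matches the description of lam as a steepness parameter.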

src.core.utils.create_new_spend_data(spend: numpy.ndarray, adstock_max_lag: int, one_time: bool, spend_leading_up: numpy.ndarray | None = None) → numpy.ndarray

Create new spend data for the channel forward pass.

The spend array must contain one value per channel.

Parameters:

spend (np.ndarray) – The spend data for the channels.
adstock_max_lag (int) – The maximum lag for the adstock transformation.
one_time (bool) – If True, the spend is considered a one-time event, followed by zeros. If False, the spend is considered constant over adstock_max_lag + 1 periods.
spend_leading_up (Optional[np.ndarray], optional) – The spend leading up to the first observation. Defaults to None (treated as zeros).

Returns:

The new spend data array, typically of shape (2 * adstock_max_lag + 1, n_channels).

Return type:

np.ndarray

Raises:

ValueError – If spend_leading_up is provided and its length does not match the number of channels in spend.
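
The layout of the returned array can be illustrated with a small NumPy sketch. The name new_spend_sketch is hypothetical, and the row ordering (adstock_max_lag lead-in rows followed by adstock_max_lag + 1 rows of new spend) is an assumption chosen to match the documented shape; it is not taken from the module's source.

```python
import numpy as np


def new_spend_sketch(spend, adstock_max_lag, one_time, spend_leading_up=None):
    """Hypothetical re-creation of the documented behavior."""
    n_channels = len(spend)
    if spend_leading_up is None:
        spend_leading_up = np.zeros(n_channels)
    if len(spend_leading_up) != n_channels:
        raise ValueError("spend_leading_up must have one value per channel.")

    # adstock_max_lag rows of lead-in spend placed before the new spend.
    lead_in = np.tile(spend_leading_up, (adstock_max_lag, 1))
    if one_time:
        # One row of spend followed by adstock_max_lag rows of zeros.
        body = np.vstack([spend, np.zeros((adstock_max_lag, n_channels))])
    else:
        # The same spend repeated for adstock_max_lag + 1 periods.
        body = np.tile(spend, (adstock_max_lag + 1, 1))
    return np.vstack([lead_in, body])


spend = np.array([100.0, 200.0])
pulse = new_spend_sketch(spend, adstock_max_lag=2, one_time=True)
constant = new_spend_sketch(spend, adstock_max_lag=2, one_time=False)
```

With two channels and adstock_max_lag=2, both results have shape (5, 2); they differ only in whether the spend row is followed by zeros or repeated.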