
Python/STAN Implementation of Multiplicative Marketing Mix Model

The methodology of this project is based on this paper by Google, but is applied to a more complicated, real-world setting, where 1) there are 13 media channels and 46 control variables; 2) models are built in a stacked way.

1. Introduction

Marketing Mix Model, or Media Mix Model (MMM), is used by advertisers to measure how their media spending contributes to sales, so as to optimize future budget allocation. ROAS (return on ad spend) and mROAS (marginal ROAS) are the key metrics to look at. A high ROAS indicates the channel is efficient; a high mROAS means increasing spend in the channel will yield a high return at the current spending level.

Procedures

  1. Fit a regression model with priors on coefficients, using media channels' impressions (or spending) and control variables to predict sales;

  2. Decompose sales to each media channel's contribution. Channel contribution is calculated by comparing original sales and predicted sales upon removal of the channel;

  3. Compute ROAS and mROAS using channel contribution and spending.

Intuition of MMM

  • Offline channels' influence is hard to track. E.g., a customer sees a TV ad and later makes a purchase in store.
  • Media channels' influences are intertwined.

Actual Customer Journey: Multiple Touchpoints
A customer saw a product on TV > clicked on a display ad > clicked on a paid search ad > made a purchase of $30. In this case, 3 touchpoints contributed to the conversion, and they should all get credit for it.
[Figure: actual customer journey - multiple touchpoints]

What's trackable: Last Digital Touchpoint
Usually, only the last digital touchpoint can be tracked. In this case it is SEM, which will get all the credit for the conversion.
[Figure: what can be tracked - last touchpoint]
So, a good attribution model should take into account all the relevant variables leading to conversion.

1.1 Multiplicative MMM

Since media channels work interactively, a multiplicative model structure is adopted:

sales = X[1]^beta[1] * ... * X[K]^beta[K] * e^tau

Taking the log of both sides gives the linear form (log-log model):

ln(sales) = beta[1]*ln(X[1]) + ... + beta[K]*ln(X[K]) + tau

Constraints on Coefficients

  1. Media coefficients are positive.

  2. Control variables like discount, macro economy, and event/retail holiday are expected to have a positive impact on sales; their coefficients should also be positive.

1.2 Adstock

Media effect on sales may lag behind the original exposure and extend several weeks. This carry-over effect is modeled by Adstock. With weight w_j = D^((j-P)^2) on an exposure j weeks ago, the adstocked variable is

x*_t = (w_0*x_t + w_1*x_(t-1) + ... + w_(L-1)*x_(t-L+1)) / (w_0 + w_1 + ... + w_(L-1))

L: length of the media effect
P: peak/delay of the media effect, i.e., how many weeks it lags behind the first exposure
D: decay/retention rate of the media channel, i.e., the concentration of the effect
The media effect of the current week is a weighted average of the current week and the previous (L-1) weeks.

Adstock Example
[Figure: adstock example]

Adstock with Varying Decay
The larger the decay parameter, the more the effect is spread out over time.
[Figure: adstock parameter - decay]
Adstock with Varying Length
The impact of length is relatively minor. In model training, length can be fixed at 8 weeks, or any period long enough for the media effect to play out.
[Figure: adstock parameter - length]

import numpy as np
import pandas as pd

def apply_adstock(x, L, P, D):
    '''
    params:
    x: original media variable, array
    L: length
    P: peak, delay in effect
    D: decay, retention rate
    returns:
    array, adstocked media variable
    '''
    # zero-pad the first L-1 weeks so every week has a full lag window
    x = np.append(np.zeros(L-1), x)

    # weight on an exposure l weeks ago: D^((l-P)^2), peaking at l = P
    weights = np.zeros(L)
    for l in range(L):
        weight = D**((l-P)**2)
        weights[L-1-l] = weight
    
    adstocked_x = []
    for i in range(L-1, len(x)):
        x_array = x[i-L+1:i+1]
        xi = sum(x_array * weights)/sum(weights)
        adstocked_x.append(xi)
    adstocked_x = np.array(adstocked_x)
    return adstocked_x
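
For example, a single burst of 100 impressions with L=4, P=1, D=0.8 is spread over the following weeks, peaking one week (P=1) after exposure:

# a one-week burst of 100 impressions
x = np.array([100, 0, 0, 0, 0, 0])
apply_adstock(x, L=4, P=1, D=0.8)
# -> approximately [26.58, 33.23, 26.58, 13.61, 0., 0.]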

1.3 Diminishing Return

After a certain saturation point, increasing spend yields diminishing marginal returns; the channel loses efficiency as you keep overspending on it. The diminishing return is modeled by the Hill function:

Hill(x) = 1 / (1 + (x/K)^(-S))

K: half saturation point
S: slope

[Figure: Hill function with varying K and S]

def hill_transform(x, ec, slope):
    return 1 / (1 + (x / ec)**(-slope))
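
Since spending is scaled by its mean before fitting (see create_hill_model_data below), ec lives on the mean-spend scale; at x == ec the response is exactly 0.5:

hill_transform(np.array([0.25, 0.5, 1.0, 2.0, 4.0]), ec=1.0, slope=2.0)
# -> approximately [0.0588, 0.2, 0.5, 0.8, 0.9412]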

2. Model Specification & Implementation

Data

Four years (209 weeks) of weekly records of sales, media impressions, and media spending.

1. Media Variables

  • Media Impression (prefix='mdip_'): impressions of 13 media channels: direct mail, insert, newspaper, digital audio, radio, TV, digital video, social media, online display, email, SMS, affiliates, SEM.
  • Media Spending (prefix='mdsp_'): spending of media channels.

2. Control Variables

  • Macro Economy (prefix='me_'): CPI, gas price.
  • Markdown (prefix='mrkdn_'): markdown/discount.
  • Store Count ('st_ct')
  • Retail Holidays (prefix='hldy_'): one-hot encoded.
  • Seasonality (prefix='seas_'): month, with Nov and Dec further broken down into weeks; one-hot encoded.

3. Sales Variable ('sales')

df = pd.read_csv('data.csv')

# 1. media variables
# media impression
mdip_cols=[col for col in df.columns if 'mdip_' in col]
# media spending
mdsp_cols=[col for col in df.columns if 'mdsp_' in col]

# 2. control variables
# macro economics variables
me_cols = [col for col in df.columns if 'me_' in col]
# store count variables
st_cols = ['st_ct']
# markdown/discount variables
mrkdn_cols = [col for col in df.columns if 'mrkdn_' in col]
# holiday variables
hldy_cols = [col for col in df.columns if 'hldy_' in col]
# seasonality variables
seas_cols = [col for col in df.columns if 'seas_' in col]
base_vars = me_cols+st_cols+mrkdn_cols+hldy_cols+seas_cols

# 3. sales variables
sales_cols =['sales']

Model Architecture

The model is built in a stacked way. Three models are trained:

  • Control Model
  • Marketing Mix Model
  • Diminishing Return Model
[Figure: mmm_stan model architecture]

2.1 Control Model / Base Sales Model

Goal: predict base sales (X_ctrl) as an input variable to MMM; this represents the baseline sales trend without any marketing activities.

y = alpha + X1 * beta1 + X2 * beta2

X1: control variables positively related to sales, including macro economy, store count, markdown, holiday.
X2: control variables that may have either positive or negative impact on sales: seasonality.
Target variable: ln(sales).
The variables are mean-centered.

Priors

  • beta1, beta2 ~ Normal(0, 1)
  • alpha ~ Uniform(0, max_intercept)
  • noise_var ~ Inv-Gamma(0.05, 0.0005)
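
Note: mean_center_trandform (the misspelling is the repo's actual function name) is defined in the full script rather than in this README. A minimal sketch consistent with how it is used below, assuming it scales each column by its mean and returns the means for later rescaling:

def mean_center_trandform(df, cols):
    '''sketch: divide each column by its mean; return the frame and the means'''
    df_new = pd.DataFrame()
    sc = {}
    for col in cols:
        sc[col] = df[col].mean()
        df_new[col] = df[col] / sc[col]
    return df_new, sc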

import pystan
import os
os.environ['CC'] = 'gcc-10'
os.environ['CXX'] = 'g++-10'

# mean-centralize: sales, numeric base_vars
df_ctrl, sc_ctrl = mean_center_trandform(df, ['sales']+me_cols+st_cols+mrkdn_cols)
df_ctrl = pd.concat([df_ctrl, df[hldy_cols+seas_cols]], axis=1)

# variables positively related to sales: macro economy, store count, markdown, holiday
pos_vars = [col for col in base_vars if col not in seas_cols]
X1 = df_ctrl[pos_vars].values

# variables may have either positive or negtive impact on sales: seasonality
pn_vars = seas_cols
X2 = df_ctrl[pn_vars].values

ctrl_data = {
    'N': len(df_ctrl),
    'K1': len(pos_vars), 
    'K2': len(pn_vars), 
    'X1': X1,
    'X2': X2, 
    'y': df_ctrl['sales'].values,
    'max_intercept': min(df_ctrl['sales'])
}

ctrl_code1 = '''
data {
  int N; // number of observations
  int K1; // number of positive predictors
  int K2; // number of positive/negative predictors
  real max_intercept; // restrict the intercept to be less than the minimum y
  matrix[N, K1] X1;
  matrix[N, K2] X2;
  vector[N] y; 
}

parameters {
  vector<lower=0>[K1] beta1; // regression coefficients for X1 (positive)
  vector[K2] beta2; // regression coefficients for X2
  real<lower=0, upper=max_intercept> alpha; // intercept
  real<lower=0> noise_var; // residual variance
}

model {
  // Define the priors
  beta1 ~ normal(0, 1); 
  beta2 ~ normal(0, 1); 
  noise_var ~ inv_gamma(0.05, 0.05 * 0.01);
  // The likelihood
  y ~ normal(X1*beta1 + X2*beta2 + alpha, sqrt(noise_var));
}
'''

sm1 = pystan.StanModel(model_code=ctrl_code1, verbose=True)
fit1 = sm1.sampling(data=ctrl_data, iter=2000, chains=4)
fit1_result = fit1.extract()

MAPE of control model: 8.63%
Extract the control model parameters from the fit object and predict base sales -> df['base_sales']; a sketch of this step follows.
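
A sketch of that extraction, assuming base sales is the posterior-mean control-model prediction rescaled by the sales mean kept in sc_ctrl (illustrative, not the repo's verbatim code):

# posterior means of the control-model parameters
beta1_mean = fit1_result['beta1'].mean(axis=0)
beta2_mean = fit1_result['beta2'].mean(axis=0)
alpha_mean = fit1_result['alpha'].mean()
# predicted mean-scaled sales from control variables only = base sales
base_sales_pred = X1 @ beta1_mean + X2 @ beta2_mean + alpha_mean
df['base_sales'] = base_sales_pred * sc_ctrl['sales']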

2.2 Marketing Mix Model

Goal:

  • Find appropriate adstock parameters for media channels;
  • Decompose sales to media channels' contribution (and non-marketing contribution).

y = tau + X * beta,  where X combines log1p(adstock(X_media; L, P, D) / mu_media) and base sales

L: length of media impact
P: peak of media impact
D: decay of media impact
X: adstocked media impression variables and base sales
Target variable: ln(sales)
Variables are mean-centered.

Priors

  • decay ~ Beta(3, 3)
  • peak ~ Uniform(0, ceil(max_lag/2))
  • tau ~ Normal(0, 5)
  • beta ~ Half-Normal(0, 1)
  • noise_var ~ Inv-Gamma(0.05, 0.0005)
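
mean_log1p_trandform is likewise defined in the full script; a minimal sketch consistent with its usage here and with the log1p(cum_effect/mu) transform in the Stan code below (an assumption, not the repo's verbatim code):

def mean_log1p_trandform(df, cols):
    '''sketch: scale each column by its mean, then apply log1p; keep the means'''
    df_new = pd.DataFrame()
    sc = {}
    for col in cols:
        sc[col] = df[col].mean()
        df_new[col] = np.log1p(df[col] / sc[col])
    return df_new, sc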

df_mmm, sc_mmm = mean_log1p_trandform(df, ['sales', 'base_sales'])
mu_mdip = df[mdip_cols].apply(np.mean, axis=0).values
max_lag = 8
num_media = len(mdip_cols)
# padding zero * (max_lag-1) rows
X_media = np.concatenate((np.zeros((max_lag-1, num_media)), df[mdip_cols].values), axis=0)
X_ctrl = df_mmm['base_sales'].values.reshape(len(df),1)
model_data2 = {
    'N': len(df),
    'max_lag': max_lag, 
    'num_media': num_media,
    'X_media': X_media, 
    'mu_mdip': mu_mdip,
    'num_ctrl': X_ctrl.shape[1],
    'X_ctrl': X_ctrl, 
    'y': df_mmm['sales'].values
}

model_code2 = '''
functions {
  // the adstock transformation with a vector of weights
  real Adstock(vector t, row_vector weights) {
    return dot_product(t, weights) / sum(weights);
  }
}
data {
  // the total number of observations
  int<lower=1> N;
  // the vector of sales
  real y[N];
  // the maximum duration of lag effect, in weeks
  int<lower=1> max_lag;
  // the number of media channels
  int<lower=1> num_media;
  // matrix of media variables
  matrix[N+max_lag-1, num_media] X_media;
  // vector of media variables' mean
  real mu_mdip[num_media];
  // the number of other control variables
  int<lower=1> num_ctrl;
  // a matrix of control variables
  matrix[N, num_ctrl] X_ctrl;
}
parameters {
  // residual variance
  real<lower=0> noise_var;
  // the intercept
  real tau;
  // the coefficients for media variables and base sales
  vector<lower=0>[num_media+num_ctrl] beta;
  // the decay and peak parameter for the adstock transformation of
  // each media
  vector<lower=0,upper=1>[num_media] decay;
  vector<lower=0,upper=ceil(max_lag/2)>[num_media] peak;
}
transformed parameters {
  // the cumulative media effect after adstock
  real cum_effect;
  // matrix of media variables after adstock
  matrix[N, num_media] X_media_adstocked;
  // matrix of all predictors
  matrix[N, num_media+num_ctrl] X;
  
  // adstock, mean-center, log1p transformation
  row_vector[max_lag] lag_weights;
  for (nn in 1:N) {
    for (media in 1 : num_media) {
      for (lag in 1 : max_lag) {
        lag_weights[max_lag-lag+1] <- pow(decay[media], (lag - 1 - peak[media]) ^ 2);
      }
     cum_effect <- Adstock(sub_col(X_media, nn, media, max_lag), lag_weights);
     X_media_adstocked[nn, media] <- log1p(cum_effect/mu_mdip[media]);
    }
  X <- append_col(X_media_adstocked, X_ctrl);
  } 
}
model {
  decay ~ beta(3,3);
  peak ~ uniform(0, ceil(max_lag/2));
  tau ~ normal(0, 5);
  for (i in 1 : num_media+num_ctrl) {
    beta[i] ~ normal(0, 1);
  }
  noise_var ~ inv_gamma(0.05, 0.05 * 0.01);
  y ~ normal(tau + X * beta, sqrt(noise_var));
}
'''

sm2 = pystan.StanModel(model_code=model_code2, verbose=True)
fit2 = sm2.sampling(data=model_data2, iter=1000, chains=3)
fit2_result = fit2.extract()

Distribution of Media Coefficients
[Figure: media coefficients distribution; red line: mean, green line: median]

Decompose sales to media channels' contribution

Each media channel's contribution = total sales - sales upon removal of the channel
In the previous model fitting step, the parameters of the log-log model have been found:

ln(y) = tau + sum_i beta[i] * ln(X[i])

Plug them into the multiplicative model:

y = e^tau * prod_i X[i]^beta[i]

contribution of channel i = y - y / X[i]^beta[i]
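
A minimal sketch of this decomposition (the repo's mmm_decompose_contrib, shown in full in the issues below, also rescales contributions back to original sales; media_cols here is illustrative):

# X2: adstocked, mean-centered predictors + 1; beta, tau: posterior means;
# y_true2: mean-centered sales + 1
factor_df = X2 ** beta                         # column i holds X2[i]^beta[i]
y_pred = factor_df.prod(axis=1) * np.exp(tau)  # predicted sales (multiplicative)
mc_df = pd.DataFrame({
    col: y_true2 - y_true2 / factor_df[col]    # sales minus sales without the channel
    for col in media_cols})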

# decompose sales to media contribution
mc_df = mmm_decompose_media_contrib(mmm, df, y_true=df['sales_ln'])
adstock_params = mmm['adstock_params']
mc_pct, mc_pct2 = calc_media_contrib_pct(mc_df, period=52)

RMSE (log-log model): 0.04977
MAPE (multiplicative model): 15.71%

Adstock Parameters

{'dm': {'L': 8, 'P': 0.8147057071636012, 'D': 0.5048365638721349},
 'inst': {'L': 8, 'P': 0.6339321363933637, 'D': 0.40532404247040194},
 'nsp': {'L': 8, 'P': 1.1076944292039324, 'D': 0.4612905130128658},
 'auddig': {'L': 8, 'P': 1.8834110997525702, 'D': 0.5117823761413419},
 'audtr': {'L': 8, 'P': 1.9892680621155827, 'D': 0.5046141055524362},
 'vidtr': {'L': 8, 'P': 0.05520253973872224, 'D': 0.0846136627657064},
 'viddig': {'L': 8, 'P': 1.862571613911107, 'D': 0.5074553132446618},
 'so': {'L': 8, 'P': 1.7027472358912694, 'D': 0.5046386226501091},
 'on': {'L': 8, 'P': 1.4169662215350334, 'D': 0.4907407637366824},
 'em': {'L': 8, 'P': 1.0590065753144235, 'D': 0.44420264450045377},
 'sms': {'L': 8, 'P': 1.8487648735160152, 'D': 0.5090970201714644},
 'aff': {'L': 8, 'P': 0.6018657109295106, 'D': 0.39889023002777724},
 'sem': {'L': 8, 'P': 1.34945185610011, 'D': 0.47875793676213835}}

Notes:

  • For SEM, P=1.3, D=0.48 does not make much sense to me, because SEM is expected to have an immediate and concentrated impact (P=0, low decay). Same with online display.
  • Try more specific priors in a future model.

2.3 Diminishing Return Model

Goal: for each channel, find the relationship (fit a Hill function) between spending and contribution, so that ROAS and marginal ROAS can be calculated.

contribution = beta * Hill(x; K, S)

x: adstocked media channel spending
K: half saturation point
S: shape/slope
Target variable: the media channel's contribution
Variables are mean-centered.

Priors

  • ec ~ Beta(2, 2)
  • slope ~ Gamma(3, 1)
  • beta_hill ~ Half-Normal(0, 1)
  • noise_var ~ Inv-Gamma(0.05, 0.0005)

def create_hill_model_data(df, mc_df, adstock_params, media):
    y = mc_df['mdip_'+media].values
    L, P, D = adstock_params[media]['L'], adstock_params[media]['P'], adstock_params[media]['D']
    x = df['mdsp_'+media].values
    x_adstocked = apply_adstock(x, L, P, D)
    # centralize
    mu_x, mu_y = x_adstocked.mean(), y.mean()
    sc = {'x': mu_x, 'y': mu_y}
    x = x_adstocked/mu_x
    y = y/mu_y
        
    model_data = {
        'N': len(y),
        'y': y,
        'X': x
    }
    return model_data, sc
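
For example, building the input data for one channel (channel key as in adstock_params):

model_data, sc = create_hill_model_data(df, mc_df, adstock_params, 'sem')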

model_code3 = '''
functions {
  // the Hill function
  real Hill(real t, real ec, real slope) {
  return 1 / (1 + (t / ec)^(-slope));
  }
}

data {
  // the total number of observations
  int<lower=1> N;
  // y: vector of media contribution
  vector[N] y;
  // X: vector of adstocked media spending
  vector[N] X;
}

parameters {
  // residual variance
  real<lower=0> noise_var;
  // regression coefficient
  real<lower=0> beta_hill;
  // ec50 and slope for Hill function of the media
  real<lower=0,upper=1> ec;
  real<lower=0> slope;
}

transformed parameters {
  // a vector of the mean response
  vector[N] mu;
  for (i in 1:N) {
    mu[i] <- beta_hill * Hill(X[i], ec, slope);
  }
}

model {
  slope ~ gamma(3, 1);
  ec ~ beta(2, 2);
  beta_hill ~ normal(0, 1);
  noise_var ~ inv_gamma(0.05, 0.05 * 0.01); 
  y ~ normal(mu, sqrt(noise_var));
}
'''

# train hill models for all media channels
sm3 = pystan.StanModel(model_code=model_code3, verbose=True)
hill_models = {}
to_train = ['dm', 'inst', 'nsp', 'auddig', 'audtr', 'vidtr', 'viddig', 'so', 'on', 'sem']
for media in to_train:
    print('training for media: ', media)
    hill_model = train_hill_model(df, mc_df, adstock_params, media, sm3)
    hill_models[media] = hill_model

Diminishing Return Model (Fitted Hill Curve)
[Figure: fitted Hill curves]

Calculate overall ROAS and weekly ROAS

  • Overall ROAS = total media contribution / total media spending
  • Weekly ROAS = weekly media contribution / weekly media spending

Distribution of Weekly ROAS (Most Recent Year)
[Figure: weekly ROAS distribution; red line: mean, green line: median]

Calculate mROAS

Marginal ROAS represents the return on incremental spending based on current spending. For example: I've spent $100 on SEM; how much will the next $1 bring?
mROAS is calculated by increasing the current spending level by 1% and taking the incremental channel contribution over the incremental channel spending (a sketch follows the steps below).

  1. The current spending level cur_sp is an array of weekly spending in a given period.
    The next spending level next_sp is cur_sp increased by 1%.
  2. Plug cur_sp and next_sp into the Hill function:
    Current media contribution cur_mc = beta * Hill(cur_sp)
    Next-level media contribution next_mc = beta * Hill(next_sp)
  3. mROAS = (sum(next_mc) - sum(cur_mc)) / sum(0.01 * cur_sp)
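
A sketch of both computations for a single channel, assuming contrib and spend are arrays of weekly media contribution and adstocked spending, and beta, ec, slope, sc come from that channel's fitted Hill model (illustrative names):

# ROAS
roas = contrib.sum() / spend.sum()   # overall ROAS
weekly_roas = contrib / spend        # weekly ROAS

# mROAS: push current and +1% spending through the fitted Hill curve
cur_sp = spend
next_sp = cur_sp * 1.01
cur_mc = beta * hill_transform(cur_sp / sc['x'], ec, slope) * sc['y']
next_mc = beta * hill_transform(next_sp / sc['x'], ec, slope) * sc['y']
mroas = (next_mc.sum() - cur_mc.sum()) / (0.01 * cur_sp.sum())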

3. Results & Marketing Budget Optimization

Media Channel Contribution
80% of sales are contributed by non-marketing factors; marketing channels contribute 20%.
[Figure: marketing contribution plot]
Top contributors: TV, affiliates, SEM
[Figure: media contribution percentage plot]
ROAS
High ROAS: TV, insert, online display
[Figure: media channels contribution/ROAS plot]
mROAS
High mROAS: TV, insert, radio, online display
[Figure: media channels ROAS/mROAS plot]
Note on trivial channels (newspaper, digital audio, digital video, social): spending/impressions are too small to qualify, so their results are not trustworthy.

Q&A

Please check this running list of FAQ. If you have questions, comments, suggestions, or practical problems (when applying this script to your datasets) that are unaddressed in this list, feel free to open a discussion. You may also comment on my Medium article.
For bugs/errors in code, please open an issue. An issue is expected to be addressed by the following weekend.

References

[1] Bayesian Methods for Media Mix Modeling with Carryover and Shape Effects. https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46001.pdf
[2] STAN tutorials:
  • Prior Choice Recommendations. https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
  • PyStan Documentation. https://www.cnpython.com/pypi/pystan
  • PyStan Workflow. https://mc-stan.org/users/documentation/case-studies/pystan_workflow.html
  • A quick-start introduction to Stan for economists. https://nbviewer.jupyter.org/github/QuantEcon/QuantEcon.notebooks/blob/master/IntroToStan_basics_workflow.ipynb
  • HMC sampling. https://education.illinois.edu/docs/default-source/carolyn-anderson/edpsy590ca/lectures/9-hmc-and-stan/hmc_n_stan_post.pdf

If you like this project, please leave a 🌟 for motivation:)

mmm_stan's Issues

Struggled with -> RuntimeError: Initialization failed

Hi Sibyl,

First, let me thank you in advance for both your post (which is pretty good) and for your help!

I wanted to use your approach in an example, but I cannot fix the "RuntimeError: Initialization failed" when I do the ".sampling", and I really do not know what else to do. Could you give me any advice?

Here is the data I provide to "sampling", and the error message:

Model data:

  • N: 53
  • max_lag: 2 (my example is with monthly data)
  • num_media: 8
  • X_media: As it is quite big and I guess the numbers would mean nothing to you, I give you its shape: (54, 8)
  • mu_mdip: shape (8,)
  • num_ctrl: 1
  • X_ctrl: shape (53, 1)
  • y: shape (53,)

Error message:

"RuntimeError Traceback (most recent call last)
in
----> 1 fit2 = sm2.sampling(data=model_data2, iter=2000, chains=4, verbose=True, n_jobs=1)

C:\ProgramData\Anaconda3\lib\site-packages\pystan\model.py in sampling(self, data, pars, chains, iter, warmup, thin, seed, init, sample_file, diagnostic_file, verbose, algorithm, control, n_jobs, **kwargs)
811 call_sampler_args = izip(itertools.repeat(data), args_list, itertools.repeat(pars))
812 call_sampler_star = self.module._call_sampler_star
--> 813 ret_and_samples = _map_parallel(call_sampler_star, call_sampler_args, n_jobs)
814 samples = [smpl for _, smpl in ret_and_samples]
815

C:\ProgramData\Anaconda3\lib\site-packages\pystan\model.py in _map_parallel(function, args, n_jobs)
88 pool.join()
89 else:
---> 90 map_result = list(map(function, args))
91 return map_result
92

stanfit4anon_model_c6677ffefdee0513f144508ee1783d0c_6434600101167848.pyx in stanfit4anon_model_c6677ffefdee0513f144508ee1783d0c_6434600101167848._call_sampler_star()

stanfit4anon_model_c6677ffefdee0513f144508ee1783d0c_6434600101167848.pyx in stanfit4anon_model_c6677ffefdee0513f144508ee1783d0c_6434600101167848._call_sampler()

RuntimeError: Initialization failed."

I hope you can help me.
Once again, thank you so much.
Best regards.

Theoretical questions and tips for the code

Hi Sibyl,

Hoping you are well.
I have some theoretical questions about the model and some suggestions for the code.

About Marketing Mix Model (model 2):

  1. If I have to benchmark different models, which metrics do you suggest using? Every time I check, the MAPE and RMSE are pretty similar.

  2. How can we evaluate the impact of each control variable on the baseline?

  3. In your dataset, your media is often ON. What happens if we have interspersed media values with many 0s from campaign to campaign? In this case, if our lag effect is not sufficient to cover the gap, the cumulative effect will be 0.
    For this last problem, I decided to change the STAN code; it works, but I am not sure if it is the right approach.
    Here you can find the changed code:

// adstock, mean-center, log1p transformation
row_vector[max_lag] lag_weights;
for (nn in 1:N) {
  for (media in 1 : num_media) {
    for (lag in 1 : max_lag) {
      lag_weights[max_lag-lag+1] <- pow(decay[media], (lag - 1 - peak[media]) ^ 2);
    }
    cum_effect <- Adstock(sub_col(X_media, nn, media, max_lag), lag_weights);
    if (cum_effect == 0) {
      X_media_adstocked[nn, media] <- 0;
    }
    else {
      X_media_adstocked[nn, media] <- log1p(cum_effect/mu_mdip[media]);
    }
  }
  X <- append_col(X_media_adstocked, X_ctrl);
}
Instead of this modification, do you suggest other solutions? For example, creating a vector of the cumulative effect, transforming it to avoid 0, and then passing it through the formula log1p(cum_effect/mu_mdip[media]); could that be effective?

  4. How can we deal with variables that take both negative and positive values, e.g., seasonality (variation from negative to positive)? Do you think a transformation like (X-X.min())/(X.max()-X.min()) is a right approach, or do you suggest keeping the mean transformation?

  5. Why do we need to apply a log_mean_center transformation in the second model?


I think the parts below can improve the code and avoid some errors.

Control Model (model 1)
I think this solution can help if we want to use only one variable for one specific beta; otherwise problems can arise.
I suggest transforming every control variable X into matrix form to obtain the right shape.
For example:
pos_vars = [col for col in base_vars if col not in seas_cols] -> here we have more than one variable
X1 = np.matrix(df_ctrl[pos_vars].values).reshape(len(df), (len([pos_vars]) if type(pos_vars) == str else len(pos_vars)))
pn_vars = seas_cols[1] -> we have only one control variable inside
X2 = np.matrix(df_ctrl[pn_vars].values).reshape(len(df), (len([pn_vars]) if type(pn_vars) == str else len(pn_vars))) -> to maintain shape coherence later with the ctrl_data dictionary

ctrl_data = {
    'N': len(df_ctrl),
    'K1': X1.shape[1],  # instead of len(pos_vars)
    'K2': X2.shape[1],  # instead of len(pn_vars)
    'X1': X1,
    'X2': X2,
    'y': df_ctrl['base_sales'].values,
    'max_intercept': min(df_ctrl['total_volume'])
}

In addition, in every sm.sampling() I would add n_jobs=-1 to run the code faster (if it can be helpful).

As always, Sibyl, thank you very much for your help and for the code you published. You are giving big help to everyone who needs it.

Best regards

Not an issue, but a Thank you!

@sibylhe I wanted to convey my thanks for sharing this truly comprehensive example of Marketing Mix models using a STAN-based library/package. I come from the R side and have used STAN extensively, but I am not as well versed on the Python side with PyStan. This was a great tutorial to follow and use on different datasets that I have.

A sincere thank you! I'm not well versed with GitHub, but eventually I would like to share some of the examples I have developed using/modifying your code. Let me know how I can contribute here.

Sree

Attempting multiplication of list and float.

In the function below we save hill_model['data']['y'] as a list of floats.

def train_hill_model(df, mc_df, adstock_params, media, sm):
    data, sc = create_hill_model_data(df, mc_df, adstock_params, media)
    fit = sm.sampling(data=data, iter=2000, chains=4)
    fit_result = fit.extract()
    hill_model = {
        'beta_hill_list': fit_result['beta_hill'].tolist(),
        'ec_list': fit_result['ec'].tolist(),
        'slope_list': fit_result['slope'].tolist(),
        'sc': sc,
        'data': {
            'X': data['X'].tolist(),
            'y': data['y'].tolist(),
        }
    }
    return hill_model

However, later on we try to multiply hill_model['data']['y'] by a float. Since it is a list, it can't legally be multiplied by a float, and multiplication by an integer just repeats the list.

def evaluate_hill_model(hill_model, hill_model_params):
    x = hill_model['data']['X']
    y_true = hill_model['data']['y'] * hill_model['sc']['y']
    print(y_true)
    y_pred = hill_model_predict(hill_model_params, x) * hill_model['sc']['y']
    print(y_pred)
    print('mape on original data: ', 
         mean_absolute_percentage_error(y_true, y_pred))
    return y_true, y_pred

To be able to perform the multiplication hill_model['data']['y'] * hill_model['sc']['y'], hill_model['data']['y'] needs to be a numpy array, not a list.
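
A minimal fix along those lines: convert back to an array before scaling.

y_true = np.array(hill_model['data']['y']) * hill_model['sc']['y']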

How to set free max_lag for each media in second model with prior or fix command

Hello,

I'd like to know how we have to change the code below to have the lag free, or fixed, for every channel.

First case:
Set the lag free and use priors; I think max_lag ~ beta(1,2) or beta(2,>4) can be good.
Here you can find a beta prior distribution simulator: https://keisan.casio.com/exec/system/1180573226

Second case
In this case we can choose which lag we want to fix for each channel.
Example:
max_lag_media_a = 10
max_lag_media_b = 4
max_lag_media_c = 6
etc...

Thanks

(The issue quotes the full section 2.2 data-preparation and Stan model code, identical to the listing above.)

Please add a license.

GitHub’s policy

You're under no obligation to choose a license. However, without a license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work.

Since this repo is getting popular and already has over 15 forks, I think it would be wise to add a license.

Mroas, Optimal Mix and efficiency

Hi

With mROAS we know where to allocate the budget, to one channel instead of another, for the optimal mix. Higher mROAS means we need to allocate more budget to it, right?

Here are my questions:
Where, but how much?

Are we considering the efficiency of the specific media?
E.g., after a certain threshold of investment, channel efficiency can saturate, and it becomes useless to invest more in that specific channel.

In addition, if we have negative mROAS, in practical terms, are we saying that we need to avoid investing in that channel because it is counterproductive?

As always, thank you for the support

Potential error from line #483

Hi Sibyl,

First of all: thank you very much for your precious help! The code is really helpful.
I decided to re-run your example.

Line #483:
def mmm_decompose_contrib(mmm, df, original_sales=df['sales']):
This code doesn't connect properly with lines 514 and 517, regarding the "X" variable. Does it have to be "X2"?

In addition, at line #578:
mc_df = mmm_decompose_media_contrib(mmm, df, y_true=df['sales_ln'])
I think it has to be "mmm_decompose_contrib", or vice versa at line 483 "mmm_decompose_media_contrib".
Here I am finding another issue: y_true=df["sales_ln"]
In the code I didn't find any log transformation of df["sales"]; also, y_true is not generated inside the decompose function, only "y_true2" (a log transformation is made from lines 555 to 556). How do we have to modify the code?

Again,
Thank you for your support!
Best

Creating autoMMM library based on your code.

Hello, I've had the idea to create a media mix modeling library in python for a while. I really liked your model, so I wrapped it in a class that would make it easy to use. You can see the start of it here.

https://github.com/NichoGustafson/autoMMM

So far I haven't tested the code much, but I plan to work on it much more in the coming week. I'd like to add, in addition to your Bayesian model, an interface for simpler models as well. If you would like to collaborate on this project, please let me know; I'm very busy with work, so I need all the help I can get.

Thanks!

Nick

MAPE problem with photo

Hi Sibyl.

This formula:

def mean_absolute_percentage_error(y_true, y_pred): 
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

I have seen that if I am using the same values from inside a DataFrame instead of an array, with the exact same values, the results are totally different. I attached real examples using your example data.

I will explain better, focusing on the calculation inside mmm_decompose_contrib():
If you consider this: mc_tt["y_true"], mc_tt["y_pred"] = y_true2, y_pred
np.mean(np.abs((np.array(mc_tt["y_true"]) - np.array(mc_tt["y_pred"]))/np.array(mc_tt["y_true"]))) * 100
is totally different compared to np.mean(np.abs((np.array(y_true2) - np.array(y_pred)) / np.array(y_true2))) * 100. In your code it is in this last form, at lines 552 and 553:

print('mape (multiplicative model): ', 
         mean_absolute_percentage_error(y_true2, y_pred))

I have checked it step by step. The output of this step:
np.abs((np.array(y_true2) - np.array(y_pred)) / np.array(y_true2)) is not clear to me, because the output is a combination of arrays.
Here you can find the example.
[Screenshot: first example]
In this case, the final MAPE is 20%.

Whilst the output of this:
np.abs((np.array(mc_tt["y_true"]) - np.array(mc_tt["y_pred"]))/np.array(mc_tt["y_true"])) * 100 is a single array, which makes sense to me, because on that you then calculate the mean.
[Screenshot: second example]

In this second case, the MAPE is 11%.

If it is an error, I suggest putting the data required for the MAPE inside mmm_decompose_contrib() into mc_df, and then passing it through the formula as you did here for the first model:
print('mape: ', mean_absolute_percentage_error(df['sales'], df['base_sales']))

Different results between Jupyter Notebook and Readme

First of all AWESOME STUFF!

However, in an effort to reproduce your results locally, I spotted a few differences between your Jupyter notebook results and the ones given in the readme.

| Value | stan_mmm.ipynb | readme.md |
| --- | --- | --- |
| MAPE | 26.06330645337451 | 8.63 |
| RMSE (log-log model) | 0.13647800963534104 | 0.04977 |
| MAPE (multiplicative model) | 20.313393234793644 | 15.71 |

Have the readme results been generated using different priors/parameters, or am I missing something obvious?

Plug in for Media competitors and AdjR2

Hi Sibyl,

Sorry, I thought to answer you about this question:
"Why do you want to avoid 0's, or what's your purpose for changing the adstock? I don't understand your problem and goal; if you let me know, I might have a more specific answer." In the end you were right, it didn't change anything, so it's correct to leave the code like this.

I am sharing some chunks of the code that I added/modified, hoping they can be helpful.
Here I add a new part for the first model to get the R2 and AdjR2 (I don't know how helpful it can be, but I hope a little bit). Just plug it in at the end of the Stan code.

###FIRST MODEL

###STAN CODE

generated quantities {
  real rss;
  real totalss;
  real <upper=1> R2;
  real <upper=1> AdjR2;
  vector[N] mu;

  mu <- X1 * beta1 + X2 * beta2 + X3 * beta3 + alpha;
  rss <- dot_self(y-mu);
  totalss <- dot_self(y-mean(y));
  R2 <- 1 - rss/totalss;
  AdjR2 <- 1 - ((1-R2)*(N-1))/(N - (K1+K2+K3) - 1);
}
'''

sm1 = pystan.StanModel(model_code=ctrl_code1, verbose=True)
fit1 = sm1.sampling(data=ctrl_data, iter=5000, chains=5, n_jobs=-1, seed=10)
fit1_result = fit1.extract()
fit1_result["R2"].mean()
fit1_result["AdjR2"].mean()

In the second model, I plugged in a new chunk to also capture the media competitors' effect (forced to be negative). I create a new matrix Z, with its own peak and decay. In addition, it is possible to see the beta distributions for them.
However, I don't know if this is out of the scope of your publication.
Regarding this second step, I have one question. As you can see, I am calculating y_pred like this:
y_pred = factor_df.drop(mdip_comp_cols, axis =1).apply(np.prod, axis=1)
where "mdip_comp_cols" are the columns of the media competitors. I did this to avoid the negative values since I calculated the beta competitors like this:

for i in range(num_media_comp):
        colname = media_comp_vars[i]
        factor_df[colname] = -(X2[colname] ** (beta2 if type(beta2) == float else beta2[i]))

Do you know how to calculate y_pred considering also the effect of the media competitors?
In addition, I tried to get the R2 and AdjR2 out, without results; at the same time, I don't know how helpful it would be to have them. Finally, in this code I didn't consider the mc_delta for the proportion.

Here you can find the code. I started from the starting point of the second model because there are many steps in between.
It is only necessary to create this in the first lines of the importing phase, also because it is necessary to use them to check for multicollinearity among media:

mdip_comp_cols = [col for col in df.columns if 'atl_' in col]

###SECOND MODEL

df_mmm, sc_mmm = mean_log1p_trandform(df, ['total_volume', 'base_sales'])   
mu_mdip = df[mdip_cols].apply(np.mean, axis=0).values
mu_mdip_comp = df[mdip_comp_cols].apply(np.mean, axis=0).values
max_lag = 8
num_media = len(mdip_cols)
num_media_comp = len(mdip_comp_cols)
X_media = np.concatenate((np.zeros((max_lag-1, num_media)), df[mdip_cols].values), axis=0)
X_comp_media = np.concatenate((np.zeros((max_lag-1, num_media_comp)), df[mdip_comp_cols].values), axis=0)
X_ctrl = df_mmm['base_sales'].values.reshape(len(df),1)

###STAN CODE

model_data2 = {
    'N': len(df),
    'max_lag': max_lag, 
    'num_media': num_media,
    'num_media_comp': num_media_comp,
    'X_media': X_media, 
    'X_media_comp': X_comp_media,
    'mu_mdip': mu_mdip,
    'mu_mdip_comp': mu_mdip_comp,
    'num_ctrl': X_ctrl.shape[1],
    'X_ctrl': X_ctrl, 
    'y': df_mmm['total_volume'].values
}

model_code2 = '''
functions {
  // the adstock transformation with a vector of weights
  real Adstock(vector t, row_vector weights) {
    return dot_product(t, weights) / sum(weights);
  }
}
data {
  // the total number of observations
  int<lower=1> N;
  // the vector of sales
  real y[N];
  // the maximum duration of lag effect, in weeks
  int<lower=1> max_lag;
  // the number of media channels
  int<lower=1> num_media;
  // the number of competitors media for TV 
  int<lower=1> num_media_comp;
  // matrix of media variables
  matrix[N+max_lag-1, num_media] X_media;
  // matrix of media variables
  matrix[N+max_lag-1, num_media_comp] X_media_comp;
  // vector of media variables' mean
  real mu_mdip[num_media];
  // vector of media variables' mean competitors
  real mu_mdip_comp[num_media_comp];
  // the number of other control variables
  int<lower=1> num_ctrl;
  // a matrix of control variables
  matrix[N, num_ctrl] X_ctrl;
}
parameters {
  // residual variance
  real<lower=0> noise_var;
  // the intercept
  real tau;
  // the coefficients for media variables and base sales
  vector<lower=0>[num_media+num_ctrl] beta;
  // the coefficients for media competitors variables
  vector<upper=0>[num_media_comp] beta2;
  // the decay and peak parameter for the adstock transformation of
  // each media
  vector<lower=0,upper=1>[num_media] decay;
  vector<lower=0,upper=ceil(max_lag/2)>[num_media] peak;
  vector<lower=0,upper=1>[num_media_comp] decay2;
  vector<lower=0,upper=ceil(max_lag/2)>[num_media_comp] peak2;
}
transformed parameters {
  // the cumulative media effect after adstock
  real cum_effect;
   // the cumulative media effect after adstock of competitors
  real cum_effect_comp;
  // matrix of media variables after adstock
  matrix[N, num_media] X_media_adstocked;
  // matrix of media variables competitors after adstock
  matrix[N, num_media_comp] Z_media_adstocked;
  // matrix of all predictors
  matrix[N, num_media+num_ctrl] X;
  // matrix of all competitors predictors
  matrix[N, num_media_comp] Z;
  // adstock, mean-center, log1p transformation
  row_vector[max_lag] lag_weights;
  // adstock, mean-center, log1p transformation - for competitors
  row_vector[max_lag] lag_weights_comp;
  
  for (nn in 1:N) {
    for (media in 1 : num_media) {
      for (lag in 1 : max_lag) {
        lag_weights[max_lag-lag+1] <- pow(decay[media], (lag - 1 - peak[media]) ^ 2);
        }
      cum_effect <- Adstock(sub_col(X_media, nn, media, max_lag), lag_weights);
      X_media_adstocked[nn, media] <- log1p(cum_effect/mu_mdip[media]);
     } 
  X <- append_col(X_media_adstocked, X_ctrl);
  }
  for (nn in 1:N) {
    for (media_c in 1 : num_media_comp) {
      for (lag in 1 : max_lag) {
        lag_weights_comp[max_lag-lag+1] <- pow(decay2[media_c], (lag - 1 - peak2[media_c]) ^ 2);
        }
      cum_effect_comp <- Adstock(sub_col(X_media_comp, nn, media_c, max_lag), lag_weights_comp);
      Z_media_adstocked[nn, media_c] <- log1p(cum_effect_comp/mu_mdip_comp[media_c]);
     } 
  Z <- Z_media_adstocked;
  }
}
model {
  decay ~ beta(3,3);
  peak ~ uniform(0, ceil(max_lag/2));
  decay2 ~ beta(3,3);
  peak2 ~ uniform(0, ceil(max_lag/2));
  tau ~ normal(0, 5);
  for (i in 1 : num_media+num_ctrl) {
    beta[i] ~ normal(0, 1);
  }
  for (i in 1 : num_media_comp) {
    beta2[i] ~ normal(0, 1);
  }
  noise_var ~ inv_gamma(0.05, 0.05 * 0.01);
  y ~ normal(tau + X * beta + Z * beta2, sqrt(noise_var));
}
'''

###FITTING AND EXTRAPOLATION FEATURES

sm2 = pystan.StanModel(model_code=model_code2, verbose=True)
fit2 = sm2.sampling(data=model_data2, iter=4000, chains=5, n_jobs=-1, seed=10)
fit2_result = fit2.extract()
stan_utility.check_all_diagnostics(fit2)


def extract_mmm(fit_result, max_lag=max_lag, media_vars=mdip_cols, media_comp_vars = mdip_comp_cols, ctrl_vars=['base_sales'], extract_param_list=True):
    mmm = {}
    mmm['max_lag'] = max_lag
    mmm['media_vars'], mmm['media_comp_vars'], mmm['ctrl_vars'] = media_vars, media_comp_vars, ctrl_vars
    mmm['decay'] = decay = fit_result['decay'].mean(axis=0).tolist()
    mmm['decay2'] = decay2 = fit_result['decay2'].mean(axis=0).tolist()
    mmm['peak'] = peak = fit_result['peak'].mean(axis=0).tolist()
    mmm['peak2'] = peak2 = fit_result['peak2'].mean(axis=0).tolist()
    mmm['beta'] = fit_result['beta'].mean(axis=0).tolist()
    mmm['beta2'] = fit_result['beta2'].mean(axis=0).tolist()
    mmm['tau'] = fit_result['tau'].mean()
    if extract_param_list:
        mmm['decay_list'] = fit_result['decay'].tolist()
        mmm['decay2_list'] = fit_result['decay2'].tolist()
        mmm['peak_list'] = fit_result['peak'].tolist()
        mmm['peak2_list'] = fit_result['peak2'].tolist()
        mmm['beta_list'] = fit_result['beta'].tolist()
        mmm['beta2_list'] = fit_result['beta2'].tolist()
        mmm['tau_list'] = fit_result['tau'].tolist()
    
    adstock_params = {}
    media_names = [col.replace('metric_', '') for col in media_vars]
    for i in range(len(media_names)):
        adstock_params[media_names[i]] = {
            'L': max_lag,
            'P': peak[i],
            'D': decay[i]
        }
    mmm['adstock_params'] = adstock_params
    
    adstock_params_comp = {}
    media_comp_names = [col.replace('atl_', '') for col in media_comp_vars]
    for i in range(len(media_comp_names)):
        adstock_params_comp[media_comp_names[i]] = {
            'L': max_lag,
            'P': (peak2 if type(peak2) == float else peak2[i]),
            'D': (decay2 if type(decay2) == float else decay2[i])
        }
    mmm['adstock_params_comp'] = adstock_params_comp
    return mmm

mmm = extract_mmm(fit2, max_lag=max_lag, media_vars=mdip_cols, media_comp_vars=mdip_comp_cols, ctrl_vars=['base_sales'])

###BETA PLOT

beta_media = {}
for i in range(len(mmm['media_vars'])):
    md = mmm['media_vars'][i]
    betas = []
    for j in range(len(mmm['beta_list'])):
        betas.append(mmm['beta_list'][j][i])
    beta_media[md] = np.array(betas)

f = plt.figure(figsize=(18,15))
for i in range(len(mmm['media_vars'])):
    ax = f.add_subplot(5,3,i+1)
    md = mmm['media_vars'][i]
    x = beta_media[md]
    mean_x = x.mean()
    median_x = np.median(x)
    ax = sns.distplot(x)
    ax.axvline(mean_x, color='r', linestyle='-')
    ax.axvline(median_x, color='g', linestyle='-')
    ax.set_title(md)

###BETA COMP

beta_media2 = {}
for i in range(len(mmm['media_comp_vars'])):
    md = mmm['media_comp_vars'][i]
    betas = []
    for j in range(len(mmm['beta2_list'])):
        (betas.append(mmm['beta2_list'][j]) if type(mmm['beta2']) == float else betas.append(mmm['beta2_list'][j][i]))
    beta_media2[md] = np.array(betas)

f = plt.figure(figsize=(18,15))
for i in range(len(mmm['media_comp_vars'])):
    ax = f.add_subplot(5,3,i+1)
    md = mmm['media_comp_vars'][i]
    x = beta_media2[md]
    mean_x = x.mean()
    median_x = np.median(x)
    ax = sns.distplot(x)
    ax.axvline(mean_x, color='r', linestyle='-')
    ax.axvline(median_x, color='g', linestyle='-')
    ax.set_title(md)

###MEDIA CONTRIBUTION

def mmm_decompose_contrib(mmm, df, original_sales=df['total_volume']):
    # adstock params
    adstock_params = mmm['adstock_params']
    # coefficients, intercept
    adstock_params_comp = mmm['adstock_params_comp']
    beta, beta2, tau = mmm['beta'], mmm['beta2'], mmm['tau']
    # variables
    media_vars, media_comp_vars, ctrl_vars = mmm['media_vars'], mmm['media_comp_vars'], mmm['ctrl_vars']
    num_media, num_media_comp, num_ctrl = len(media_vars), len(media_comp_vars), len(ctrl_vars)
    # X_media2: adstocked, mean-centered media variables + 1
    # apply the adstock formula to every channel, transforming the values
    X_media2 = adstock_transform(df, media_vars, adstock_params)
    X_media_comp2 = adstock_transform(df, media_comp_vars, adstock_params_comp)
    # mean-center all media variables; sc holds the column means
    X_media2, sc_mmm2 = mean_center_trandform(X_media2, media_vars)
    X_media_comp2, sc_mmm_comp_2 = mean_center_trandform(X_media_comp2, media_comp_vars)
    # add 1 to every channel, so that all 0s become 1
    X_media2 = X_media2 + 1
    X_media_comp2 = X_media_comp2 + 1
    # X_ctrl2: mean-centered control variables + 1
    X_ctrl2, sc_mmm2_1 = mean_center_trandform(df[ctrl_vars], ctrl_vars)
    X_ctrl2 = X_ctrl2 + 1
    # y_true2: mean-centered sales variable + 1
    y_true2, sc_mmm2_2 = mean_center_trandform(df, ['total_volume'])
    y_true2 = y_true2 + 1
    # add the sc_mmm2_1 and sc_mmm2_2 entries to sc_mmm2 -- why?
    sc_mmm2.update(sc_mmm2_1)
    sc_mmm2.update(sc_mmm2_2)
    # X2 <- media variables + ctrl variable
    X2 = pd.concat([X_media2, X_ctrl2, X_media_comp2], axis=1)

    # 1. compute each media/control factor:
    # log-log model: log(sales) = log(X[0])*beta[0] + ... + log(X[13])*beta[13] + tau
    # multiplicative model: sales = X[0]^beta[0] * ... * X[13]^beta[13] * e^tau
    # each factor = X[i]^beta[i]
    # intercept = e^tau
    # this is the transformation from log-log back to the multiplicative form;
    # all estimated betas become exponents, e.g. beta[num_media+i] is the beta for base_sales
    factor_df = pd.DataFrame(columns=media_vars+ctrl_vars+media_comp_vars+['intercept'])
    for i in range(num_media):
        colname = media_vars[i]
        factor_df[colname] = X2[colname] ** beta[i]
    for i in range(num_ctrl):
        colname = ctrl_vars[i]
        factor_df[colname] = X2[colname] ** beta[num_media+i]
    for i in range(num_media_comp):
        colname = media_comp_vars[i]
        factor_df[colname] = -(X2[colname] ** (beta2 if type(beta2) == float else beta2[i]))
    factor_df['intercept'] = np.exp(tau)

    # 2. calculate the product of all factors -> y_pred
    # baseline = intercept * control factor = e^tau * X[13]^beta[13]
    # .apply(np.prod, axis=1): multiply across each row
    y_pred = factor_df.drop(mdip_comp_cols, axis =1).apply(np.prod, axis=1)
    factor_df['y_pred'], factor_df['y_true2'] = y_pred, y_true2
    factor_df['baseline'] = factor_df[['intercept']+ctrl_vars].apply(np.prod, axis=1)

    # 3. calculate each media factor's contribution
    # media contribution = total volume – volume upon removal of the media factor
    mc_df = pd.DataFrame(columns=media_vars+media_comp_vars+['baseline'])
    for col in (media_vars):
        mc_df[col] = factor_df['y_true2'] - factor_df['y_true2']/factor_df[col]
    for col in (media_comp_vars):
        mc_df[col] = factor_df['y_true2'] + factor_df['y_true2']/factor_df[col]
    mc_df['baseline'] = factor_df['baseline']
    mc_df['y_true2'] = factor_df['y_true2']

    # 4. scale contribution
    # the difference between mc_true (y_true2 - baseline) and mc_pred is
    # used to rescale/proportion each media factor's contribution
    # predicted total media contribution: sum of all media factors
    mc_df['mc_pred'] = mc_df[media_vars + media_comp_vars].apply(np.sum, axis=1)
    # true total media contribution: total volume - baseline
    mc_df['mc_true'] = mc_df['y_true2'] - mc_df['baseline']
    
    # 5. scale mc_df based on original sales
    mc_df['total_volume'] = original_sales
    for col in media_vars+media_comp_vars+['baseline']:
        mc_df[col] = mc_df[col]*mc_df['total_volume']/mc_df['y_true2']
    
    print('rmse (log-log model): ', 
         (mean_squared_error(y_true2, y_pred, squared = False))/(y_true2.mean())*100)
    print('mape (multiplicative model): ', 
         mean_absolute_percentage_error(y_true2, y_pred))
    return mc_df

###calculate media contribution percentage

def calc_media_contrib_pct(mc_df, media_vars=mdip_cols, media_comp_vars = mdip_comp_cols, sales_col='total_volume', period=52):
    '''
    returns:
    mc_pct: percentage over total sales
    mc_pct2: percentage over incremental sales (sales contributed by media channels)
    '''
    mc_pct = {}
    mc_pct2 = {}
    s = 0
    if period is None:
        for col in (media_vars+media_comp_vars+['baseline']):
            mc_pct[col] = (mc_df[col]/mc_df[sales_col]).mean()
    else:
        for col in (media_vars+media_comp_vars+['baseline']):
            mc_pct[col] = (mc_df[col]/mc_df[sales_col])[-period:].mean()
    for m in media_vars:
        s += mc_pct[m]
    for m in (media_vars+media_comp_vars):
        mc_pct2[m] = mc_pct[m]/s
    return mc_pct, mc_pct2

As always, thank you for your help; I hope these modifications can be helpful, or at least interesting.

MAPE formula potential issue with array

Hi Sibyl.

Hoping you are fine.
I have found a really strange situation that I want to share.

This formula:

def mean_absolute_percentage_error(y_true, y_pred): 
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

I have seen that if I am using the same values from inside a DataFrame instead of an array, with the exact same values, the results are totally different.

I will explain better, focusing on the calculation inside mmm_decompose_contrib():
If you consider this: mc_df["y_true3"], mc_df["y_pred"] = y_true2, y_pred
np.mean(np.abs((np.array(mc_df["y_true3"]) - np.array(mc_df["y_pred"]))/np.array(mc_df["y_true3"]))) * 100
is totally different compared to np.mean(np.abs((np.array(y_true2) - np.array(y_pred)) / np.array(y_true2))) * 100

I have checked it step by step. The output of this step:
np.abs((np.array(y_true2) - np.array(y_pred)) / np.array(y_true2)) is not clear to me, because the output is a combination of arrays,
whilst the output of this:
np.abs((np.array(mc_df["y_true3"]) - np.array(mc_df["y_pred"]))/np.array(mc_df["y_true3"])) * 100 is a single array, which makes sense to me.

If it is an error, I suggest putting the data required for the MAPE inside mmm_decompose_contrib() into mc_df, and then passing it through the formula.

how to get adstock

Hello, thanks for your great work. I am learning this now.
About the adstock, I ran the following code:

data=pd.DataFrame({"x":[100,100,100,100,100,100,100,100,100,100,100,100,100,100,100]})
L=4
P=1
D=0.8
x_adstocked = apply_adstock(data['x'].values, L, P, D)

I got these numbers:

array([ 26.58160553,  59.80861244,  86.39021797, 100., 100., 100., 100.,
       100., 100., 100., 100., 100., 100., 100., 100.])

My question is:
Are these numbers the right adstock, or is there any part I am missing?
Thanks.
