
Python/STAN Implementation of Multiplicative Marketing Mix Model

The methodology of this project is based on this paper by Google, but is applied to a more complicated, real-world setting, where 1) there are 13 media channels and 46 control variables; 2) models are built in a stacked way.

1. Introduction

Marketing Mix Model, or Media Mix Model (MMM), is used by advertisers to measure how their media spending contributes to sales, so as to optimize future budget allocation. ROAS (return on ad spend) and mROAS (marginal ROAS) are the key metrics to look at. A high ROAS indicates the channel is efficient; a high mROAS means increasing spend in the channel will yield a high return at the current spending level.

Procedures

  1. Fit a regression model with priors on coefficients, using media channels' impressions (or spending) and control variables to predict sales;

  2. Decompose sales to each media channel's contribution. Channel contribution is calculated by comparing original sales and predicted sales upon removal of the channel;

  3. Compute ROAS and mROAS using channel contribution and spending.

Intuition of MMM

  • Offline channels' influence is hard to track. E.g., a customer sees a TV ad and later makes a purchase in store.
  • Media channels' influences are intertwined.

Actual Customer Journey: Multiple Touchpoints
A customer saw a product on TV > clicked on a display ad > clicked on a paid search ad > made a purchase of $30. In this case, 3 touchpoints contributed to the conversion, and they should all get credit for it.
[Figure: actual customer journey - multiple touchpoints]

What's trackable: Last Digital Touchpoint
Usually, only the last digital touchpoint can be tracked. In this case it is SEM, which will get all the credit for the conversion.
[Figure: what can be tracked - last touchpoint]
So, a good attribution model should take into account all the relevant variables leading to conversion.

1.1 Multiplicative MMM

Since media channels work interactively, a multiplicative model structure is adopted:

sales = X[1]^beta[1] * ... * X[K]^beta[K] * e^tau

Taking the log of both sides gives the linear form (log-log model):

ln(sales) = beta[1]*ln(X[1]) + ... + beta[K]*ln(X[K]) + tau

Constraints on Coefficients

  1. Media coefficients are positive.

  2. Control variables like discount, macro economy, and event/retail holiday are expected to have a positive impact on sales; their coefficients should also be positive.

1.2 Adstock

Media effect on sales may lag behind the original exposure and extend several weeks. This carry-over effect is modeled by Adstock. With weight w_j = D^((j-P)^2) on an exposure j weeks ago, the adstocked variable is

x*_t = (w_0*x_t + w_1*x_(t-1) + ... + w_(L-1)*x_(t-L+1)) / (w_0 + w_1 + ... + w_(L-1))

L: length of the media effect
P: peak/delay of the media effect, i.e., how many weeks it lags behind the first exposure
D: decay/retention rate of the media channel, i.e., the concentration of the effect
The media effect of the current week is a weighted average of the current week and the previous (L-1) weeks.

Adstock Example
[Figure: adstock example]

Adstock with Varying Decay
The larger the decay parameter, the more the effect is spread out over time.
[Figure: adstock parameter - decay]
Adstock with Varying Length
The impact of length is relatively minor. In model training, length can be fixed at 8 weeks, or any period long enough for the media effect to play out.
[Figure: adstock parameter - length]

import numpy as np
import pandas as pd

def apply_adstock(x, L, P, D):
    '''
    params:
    x: original media variable, array
    L: length
    P: peak, delay in effect
    D: decay, retention rate
    returns:
    array, adstocked media variable
    '''
    # zero-pad the first L-1 weeks so every week has a full lag window
    x = np.append(np.zeros(L-1), x)

    # weight on an exposure l weeks ago: D^((l-P)^2), peaking at l = P
    weights = np.zeros(L)
    for l in range(L):
        weight = D**((l-P)**2)
        weights[L-1-l] = weight
    
    adstocked_x = []
    for i in range(L-1, len(x)):
        x_array = x[i-L+1:i+1]
        xi = sum(x_array * weights)/sum(weights)
        adstocked_x.append(xi)
    adstocked_x = np.array(adstocked_x)
    return adstocked_x
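
For example, a single burst of 100 impressions with L=4, P=1, D=0.8 is spread over the following weeks, peaking one week (P=1) after exposure:

# a one-week burst of 100 impressions
x = np.array([100, 0, 0, 0, 0, 0])
apply_adstock(x, L=4, P=1, D=0.8)
# -> approximately [26.58, 33.23, 26.58, 13.61, 0., 0.]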

1.3 Diminishing Return

After a certain saturation point, increasing spend yields diminishing marginal returns; the channel loses efficiency as you keep overspending on it. The diminishing return is modeled by the Hill function:

Hill(x) = 1 / (1 + (x/K)^(-S))

K: half saturation point
S: slope

[Figure: Hill function with varying K and S]

def hill_transform(x, ec, slope):
    return 1 / (1 + (x / ec)**(-slope))
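
Since spending is scaled by its mean before fitting (see create_hill_model_data below), ec lives on the mean-spend scale; at x == ec the response is exactly 0.5:

hill_transform(np.array([0.25, 0.5, 1.0, 2.0, 4.0]), ec=1.0, slope=2.0)
# -> approximately [0.0588, 0.2, 0.5, 0.8, 0.9412]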

2. Model Specification & Implementation

Data

Four years (209 weeks) of weekly records of sales, media impressions, and media spending.

1. Media Variables

  • Media Impression (prefix='mdip_'): impressions of 13 media channels: direct mail, insert, newspaper, digital audio, radio, TV, digital video, social media, online display, email, SMS, affiliates, SEM.
  • Media Spending (prefix='mdsp_'): spending of media channels.

2. Control Variables

  • Macro Economy (prefix='me_'): CPI, gas price.
  • Markdown (prefix='mrkdn_'): markdown/discount.
  • Store Count ('st_ct')
  • Retail Holidays (prefix='hldy_'): one-hot encoded.
  • Seasonality (prefix='seas_'): month, with Nov and Dec further broken down into weeks; one-hot encoded.

3. Sales Variable ('sales')

df = pd.read_csv('data.csv')

# 1. media variables
# media impression
mdip_cols=[col for col in df.columns if 'mdip_' in col]
# media spending
mdsp_cols=[col for col in df.columns if 'mdsp_' in col]

# 2. control variables
# macro economics variables
me_cols = [col for col in df.columns if 'me_' in col]
# store count variables
st_cols = ['st_ct']
# markdown/discount variables
mrkdn_cols = [col for col in df.columns if 'mrkdn_' in col]
# holiday variables
hldy_cols = [col for col in df.columns if 'hldy_' in col]
# seasonality variables
seas_cols = [col for col in df.columns if 'seas_' in col]
base_vars = me_cols+st_cols+mrkdn_cols+hldy_cols+seas_cols

# 3. sales variables
sales_cols =['sales']

Model Architecture

The model is built in a stacked way. Three models are trained:

  • Control Model
  • Marketing Mix Model
  • Diminishing Return Model
[Figure: mmm_stan model architecture]

2.1 Control Model / Base Sales Model

Goal: predict base sales (X_ctrl) as an input variable to MMM; this represents the baseline sales trend without any marketing activities.

y = alpha + X1 * beta1 + X2 * beta2

X1: control variables positively related to sales, including macro economy, store count, markdown, holiday.
X2: control variables that may have either positive or negative impact on sales: seasonality.
Target variable: ln(sales).
The variables are mean-centered.

Priors

  • beta1, beta2 ~ Normal(0, 1)
  • alpha ~ Uniform(0, max_intercept)
  • noise_var ~ Inv-Gamma(0.05, 0.0005)
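
Note: mean_center_trandform (the misspelling is the repo's actual function name) is defined in the full script rather than in this README. A minimal sketch consistent with how it is used below, assuming it scales each column by its mean and returns the means for later rescaling:

def mean_center_trandform(df, cols):
    '''sketch: divide each column by its mean; return the frame and the means'''
    df_new = pd.DataFrame()
    sc = {}
    for col in cols:
        sc[col] = df[col].mean()
        df_new[col] = df[col] / sc[col]
    return df_new, sc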

import pystan
import os
os.environ['CC'] = 'gcc-10'
os.environ['CXX'] = 'g++-10'

# mean-centralize: sales, numeric base_vars
df_ctrl, sc_ctrl = mean_center_trandform(df, ['sales']+me_cols+st_cols+mrkdn_cols)
df_ctrl = pd.concat([df_ctrl, df[hldy_cols+seas_cols]], axis=1)

# variables positively related to sales: macro economy, store count, markdown, holiday
pos_vars = [col for col in base_vars if col not in seas_cols]
X1 = df_ctrl[pos_vars].values

# variables may have either positive or negtive impact on sales: seasonality
pn_vars = seas_cols
X2 = df_ctrl[pn_vars].values

ctrl_data = {
    'N': len(df_ctrl),
    'K1': len(pos_vars), 
    'K2': len(pn_vars), 
    'X1': X1,
    'X2': X2, 
    'y': df_ctrl['sales'].values,
    'max_intercept': min(df_ctrl['sales'])
}

ctrl_code1 = '''
data {
  int N; // number of observations
  int K1; // number of positive predictors
  int K2; // number of positive/negative predictors
  real max_intercept; // restrict the intercept to be less than the minimum y
  matrix[N, K1] X1;
  matrix[N, K2] X2;
  vector[N] y; 
}

parameters {
  vector<lower=0>[K1] beta1; // regression coefficients for X1 (positive)
  vector[K2] beta2; // regression coefficients for X2
  real<lower=0, upper=max_intercept> alpha; // intercept
  real<lower=0> noise_var; // residual variance
}

model {
  // Define the priors
  beta1 ~ normal(0, 1); 
  beta2 ~ normal(0, 1); 
  noise_var ~ inv_gamma(0.05, 0.05 * 0.01);
  // The likelihood
  y ~ normal(X1*beta1 + X2*beta2 + alpha, sqrt(noise_var));
}
'''

sm1 = pystan.StanModel(model_code=ctrl_code1, verbose=True)
fit1 = sm1.sampling(data=ctrl_data, iter=2000, chains=4)
fit1_result = fit1.extract()

MAPE of control model: 8.63%
Extract the control model parameters from the fit object and predict base sales -> df['base_sales']; a sketch of this step follows.
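
A sketch of that extraction, assuming base sales is the posterior-mean control-model prediction rescaled by the sales mean kept in sc_ctrl (illustrative, not the repo's verbatim code):

# posterior means of the control-model parameters
beta1_mean = fit1_result['beta1'].mean(axis=0)
beta2_mean = fit1_result['beta2'].mean(axis=0)
alpha_mean = fit1_result['alpha'].mean()
# predicted mean-scaled sales from control variables only = base sales
base_sales_pred = X1 @ beta1_mean + X2 @ beta2_mean + alpha_mean
df['base_sales'] = base_sales_pred * sc_ctrl['sales']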

2.2 Marketing Mix Model

Goal:

  • Find appropriate adstock parameters for media channels;
  • Decompose sales to media channels' contribution (and non-marketing contribution).

y = tau + X * beta,  where X combines log1p(adstock(X_media; L, P, D) / mu_media) and base sales

L: length of media impact
P: peak of media impact
D: decay of media impact
X: adstocked media impression variables and base sales
Target variable: ln(sales)
Variables are mean-centered.

Priors

  • decay ~ Beta(3, 3)
  • peak ~ Uniform(0, ceil(max_lag/2))
  • tau ~ Normal(0, 5)
  • beta ~ Half-Normal(0, 1)
  • noise_var ~ Inv-Gamma(0.05, 0.0005)
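
mean_log1p_trandform is likewise defined in the full script; a minimal sketch consistent with its usage here and with the log1p(cum_effect/mu) transform in the Stan code below (an assumption, not the repo's verbatim code):

def mean_log1p_trandform(df, cols):
    '''sketch: scale each column by its mean, then apply log1p; keep the means'''
    df_new = pd.DataFrame()
    sc = {}
    for col in cols:
        sc[col] = df[col].mean()
        df_new[col] = np.log1p(df[col] / sc[col])
    return df_new, sc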

df_mmm, sc_mmm = mean_log1p_trandform(df, ['sales', 'base_sales'])
mu_mdip = df[mdip_cols].apply(np.mean, axis=0).values
max_lag = 8
num_media = len(mdip_cols)
# padding zero * (max_lag-1) rows
X_media = np.concatenate((np.zeros((max_lag-1, num_media)), df[mdip_cols].values), axis=0)
X_ctrl = df_mmm['base_sales'].values.reshape(len(df),1)
model_data2 = {
    'N': len(df),
    'max_lag': max_lag, 
    'num_media': num_media,
    'X_media': X_media, 
    'mu_mdip': mu_mdip,
    'num_ctrl': X_ctrl.shape[1],
    'X_ctrl': X_ctrl, 
    'y': df_mmm['sales'].values
}

model_code2 = '''
functions {
  // the adstock transformation with a vector of weights
  real Adstock(vector t, row_vector weights) {
    return dot_product(t, weights) / sum(weights);
  }
}
data {
  // the total number of observations
  int<lower=1> N;
  // the vector of sales
  real y[N];
  // the maximum duration of lag effect, in weeks
  int<lower=1> max_lag;
  // the number of media channels
  int<lower=1> num_media;
  // matrix of media variables
  matrix[N+max_lag-1, num_media] X_media;
  // vector of media variables' mean
  real mu_mdip[num_media];
  // the number of other control variables
  int<lower=1> num_ctrl;
  // a matrix of control variables
  matrix[N, num_ctrl] X_ctrl;
}
parameters {
  // residual variance
  real<lower=0> noise_var;
  // the intercept
  real tau;
  // the coefficients for media variables and base sales
  vector<lower=0>[num_media+num_ctrl] beta;
  // the decay and peak parameter for the adstock transformation of
  // each media
  vector<lower=0,upper=1>[num_media] decay;
  vector<lower=0,upper=ceil(max_lag/2)>[num_media] peak;
}
transformed parameters {
  // the cumulative media effect after adstock
  real cum_effect;
  // matrix of media variables after adstock
  matrix[N, num_media] X_media_adstocked;
  // matrix of all predictors
  matrix[N, num_media+num_ctrl] X;
  
  // adstock, mean-center, log1p transformation
  row_vector[max_lag] lag_weights;
  for (nn in 1:N) {
    for (media in 1 : num_media) {
      for (lag in 1 : max_lag) {
        lag_weights[max_lag-lag+1] <- pow(decay[media], (lag - 1 - peak[media]) ^ 2);
      }
     cum_effect <- Adstock(sub_col(X_media, nn, media, max_lag), lag_weights);
     X_media_adstocked[nn, media] <- log1p(cum_effect/mu_mdip[media]);
    }
  X <- append_col(X_media_adstocked, X_ctrl);
  } 
}
model {
  decay ~ beta(3,3);
  peak ~ uniform(0, ceil(max_lag/2));
  tau ~ normal(0, 5);
  for (i in 1 : num_media+num_ctrl) {
    beta[i] ~ normal(0, 1);
  }
  noise_var ~ inv_gamma(0.05, 0.05 * 0.01);
  y ~ normal(tau + X * beta, sqrt(noise_var));
}
'''

sm2 = pystan.StanModel(model_code=model_code2, verbose=True)
fit2 = sm2.sampling(data=model_data2, iter=1000, chains=3)
fit2_result = fit2.extract()

Distribution of Media Coefficients
[Figure: media coefficients distribution; red line: mean, green line: median]

Decompose sales to media channels' contribution

Each media channel's contribution = total sales - sales upon removal of the channel
In the previous model fitting step, the parameters of the log-log model have been found:

ln(y) = tau + sum_i beta[i] * ln(X[i])

Plug them into the multiplicative model:

y = e^tau * prod_i X[i]^beta[i]

contribution of channel i = y - y / X[i]^beta[i]
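
A minimal sketch of this decomposition (the repo's mmm_decompose_contrib, shown in full in the issues below, also rescales contributions back to original sales; media_cols here is illustrative):

# X2: adstocked, mean-centered predictors + 1; beta, tau: posterior means;
# y_true2: mean-centered sales + 1
factor_df = X2 ** beta                         # column i holds X2[i]^beta[i]
y_pred = factor_df.prod(axis=1) * np.exp(tau)  # predicted sales (multiplicative)
mc_df = pd.DataFrame({
    col: y_true2 - y_true2 / factor_df[col]    # sales minus sales without the channel
    for col in media_cols})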

# decompose sales to media contribution
mc_df = mmm_decompose_media_contrib(mmm, df, y_true=df['sales_ln'])
adstock_params = mmm['adstock_params']
mc_pct, mc_pct2 = calc_media_contrib_pct(mc_df, period=52)

RMSE (log-log model): 0.04977
MAPE (multiplicative model): 15.71%

Adstock Parameters

{'dm': {'L': 8, 'P': 0.8147057071636012, 'D': 0.5048365638721349},
 'inst': {'L': 8, 'P': 0.6339321363933637, 'D': 0.40532404247040194},
 'nsp': {'L': 8, 'P': 1.1076944292039324, 'D': 0.4612905130128658},
 'auddig': {'L': 8, 'P': 1.8834110997525702, 'D': 0.5117823761413419},
 'audtr': {'L': 8, 'P': 1.9892680621155827, 'D': 0.5046141055524362},
 'vidtr': {'L': 8, 'P': 0.05520253973872224, 'D': 0.0846136627657064},
 'viddig': {'L': 8, 'P': 1.862571613911107, 'D': 0.5074553132446618},
 'so': {'L': 8, 'P': 1.7027472358912694, 'D': 0.5046386226501091},
 'on': {'L': 8, 'P': 1.4169662215350334, 'D': 0.4907407637366824},
 'em': {'L': 8, 'P': 1.0590065753144235, 'D': 0.44420264450045377},
 'sms': {'L': 8, 'P': 1.8487648735160152, 'D': 0.5090970201714644},
 'aff': {'L': 8, 'P': 0.6018657109295106, 'D': 0.39889023002777724},
 'sem': {'L': 8, 'P': 1.34945185610011, 'D': 0.47875793676213835}}

Notes:

  • For SEM, P=1.3, D=0.48 does not make much sense to me, because SEM is expected to have an immediate and concentrated impact (P=0, low decay). Same with online display.
  • Try more specific priors in a future model.

2.3 Diminishing Return Model

Goal: for each channel, find the relationship (fit a Hill function) between spending and contribution, so that ROAS and marginal ROAS can be calculated.

contribution = beta * Hill(x; K, S)

x: adstocked media channel spending
K: half saturation point
S: shape/slope
Target variable: the media channel's contribution
Variables are mean-centered.

Priors

  • ec ~ Beta(2, 2)
  • slope ~ Gamma(3, 1)
  • beta_hill ~ Half-Normal(0, 1)
  • noise_var ~ Inv-Gamma(0.05, 0.0005)

def create_hill_model_data(df, mc_df, adstock_params, media):
    y = mc_df['mdip_'+media].values
    L, P, D = adstock_params[media]['L'], adstock_params[media]['P'], adstock_params[media]['D']
    x = df['mdsp_'+media].values
    x_adstocked = apply_adstock(x, L, P, D)
    # centralize
    mu_x, mu_y = x_adstocked.mean(), y.mean()
    sc = {'x': mu_x, 'y': mu_y}
    x = x_adstocked/mu_x
    y = y/mu_y
        
    model_data = {
        'N': len(y),
        'y': y,
        'X': x
    }
    return model_data, sc
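
For example, building the input data for one channel (channel key as in adstock_params):

model_data, sc = create_hill_model_data(df, mc_df, adstock_params, 'sem')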

model_code3 = '''
functions {
  // the Hill function
  real Hill(real t, real ec, real slope) {
  return 1 / (1 + (t / ec)^(-slope));
  }
}

data {
  // the total number of observations
  int<lower=1> N;
  // y: vector of media contribution
  vector[N] y;
  // X: vector of adstocked media spending
  vector[N] X;
}

parameters {
  // residual variance
  real<lower=0> noise_var;
  // regression coefficient
  real<lower=0> beta_hill;
  // ec50 and slope for Hill function of the media
  real<lower=0,upper=1> ec;
  real<lower=0> slope;
}

transformed parameters {
  // a vector of the mean response
  vector[N] mu;
  for (i in 1:N) {
    mu[i] <- beta_hill * Hill(X[i], ec, slope);
  }
}

model {
  slope ~ gamma(3, 1);
  ec ~ beta(2, 2);
  beta_hill ~ normal(0, 1);
  noise_var ~ inv_gamma(0.05, 0.05 * 0.01); 
  y ~ normal(mu, sqrt(noise_var));
}
'''

# train hill models for all media channels
sm3 = pystan.StanModel(model_code=model_code3, verbose=True)
hill_models = {}
to_train = ['dm', 'inst', 'nsp', 'auddig', 'audtr', 'vidtr', 'viddig', 'so', 'on', 'sem']
for media in to_train:
    print('training for media: ', media)
    hill_model = train_hill_model(df, mc_df, adstock_params, media, sm3)
    hill_models[media] = hill_model

Diminishing Return Model (Fitted Hill Curve)
[Figure: fitted Hill curves]

Calculate overall ROAS and weekly ROAS

  • Overall ROAS = total media contribution / total media spending
  • Weekly ROAS = weekly media contribution / weekly media spending

Distribution of Weekly ROAS (Most Recent Year)
[Figure: weekly ROAS distribution; red line: mean, green line: median]

Calculate mROAS

Marginal ROAS represents the return on incremental spending based on current spending. For example: I've spent $100 on SEM; how much will the next $1 bring?
mROAS is calculated by increasing the current spending level by 1% and taking the incremental channel contribution over the incremental channel spending (a sketch follows the steps below).

  1. The current spending level cur_sp is an array of weekly spending in a given period.
    The next spending level next_sp is cur_sp increased by 1%.
  2. Plug cur_sp and next_sp into the Hill function:
    Current media contribution cur_mc = beta * Hill(cur_sp)
    Next-level media contribution next_mc = beta * Hill(next_sp)
  3. mROAS = (sum(next_mc) - sum(cur_mc)) / sum(0.01 * cur_sp)
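
A sketch of both computations for a single channel, assuming contrib and spend are arrays of weekly media contribution and adstocked spending, and beta, ec, slope, sc come from that channel's fitted Hill model (illustrative names):

# ROAS
roas = contrib.sum() / spend.sum()   # overall ROAS
weekly_roas = contrib / spend        # weekly ROAS

# mROAS: push current and +1% spending through the fitted Hill curve
cur_sp = spend
next_sp = cur_sp * 1.01
cur_mc = beta * hill_transform(cur_sp / sc['x'], ec, slope) * sc['y']
next_mc = beta * hill_transform(next_sp / sc['x'], ec, slope) * sc['y']
mroas = (next_mc.sum() - cur_mc.sum()) / (0.01 * cur_sp.sum())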

3. Results & Marketing Budget Optimization

Media Channel Contribution
80% of sales are contributed by non-marketing factors; marketing channels contribute 20%.
[Figure: marketing contribution plot]
Top contributors: TV, affiliates, SEM
[Figure: media contribution percentage plot]
ROAS
High ROAS: TV, insert, online display
[Figure: media channels contribution/ROAS plot]
mROAS
High mROAS: TV, insert, radio, online display
[Figure: media channels ROAS/mROAS plot]
Note on trivial channels (newspaper, digital audio, digital video, social): spending/impressions are too small to qualify, so their results are not trustworthy.

Q&A

Please check this running list of FAQ. If you have questions, comments, suggestions, or practical problems (when applying this script to your datasets) that are unaddressed in this list, feel free to open a discussion. You may also comment on my Medium article.
For bugs/errors in code, please open an issue. An issue is expected to be addressed by the following weekend.

References

[1] Bayesian Methods for Media Mix Modeling with Carryover and Shape Effects. https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46001.pdf
[2] STAN tutorials:
  • Prior Choice Recommendations. https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations
  • PyStan Documentation. https://www.cnpython.com/pypi/pystan
  • PyStan Workflow. https://mc-stan.org/users/documentation/case-studies/pystan_workflow.html
  • A quick-start introduction to Stan for economists. https://nbviewer.jupyter.org/github/QuantEcon/QuantEcon.notebooks/blob/master/IntroToStan_basics_workflow.ipynb
  • HMC sampling. https://education.illinois.edu/docs/default-source/carolyn-anderson/edpsy590ca/lectures/9-hmc-and-stan/hmc_n_stan_post.pdf

If you like this project, please leave a 🌟 for motivation:)

mmm_stan's Issues

Struggled with -> RuntimeError: Initialization failed

Hi Sibyl,

First, let me thank you in advance for both your post (which is pretty good) and for your help!

I wanted to use your approach in an example, but I cannot fix the "RuntimeError: Initialization failed" when I do the ".sampling", and I really do not know what else to do. Could you give me any advice?

Here is the data I provide to "sampling", and the error message:

Model data:

  • N: 53
  • max_lag: 2 (my example is with monthly data)
  • num_media: 8
  • X_media: As it is quite big and I guess the numbers would mean nothing to you, I give you its shape: (54, 8)
  • mu_mdip: shape (8,)
  • num_ctrl: 1
  • X_ctrl: shape (53, 1)
  • y: shape (53,)

Error message:

"RuntimeError Traceback (most recent call last)
in
----> 1 fit2 = sm2.sampling(data=model_data2, iter=2000, chains=4, verbose=True, n_jobs=1)

C:\ProgramData\Anaconda3\lib\site-packages\pystan\model.py in sampling(self, data, pars, chains, iter, warmup, thin, seed, init, sample_file, diagnostic_file, verbose, algorithm, control, n_jobs, **kwargs)
811 call_sampler_args = izip(itertools.repeat(data), args_list, itertools.repeat(pars))
812 call_sampler_star = self.module._call_sampler_star
--> 813 ret_and_samples = _map_parallel(call_sampler_star, call_sampler_args, n_jobs)
814 samples = [smpl for _, smpl in ret_and_samples]
815

C:\ProgramData\Anaconda3\lib\site-packages\pystan\model.py in _map_parallel(function, args, n_jobs)
88 pool.join()
89 else:
---> 90 map_result = list(map(function, args))
91 return map_result
92

stanfit4anon_model_c6677ffefdee0513f144508ee1783d0c_6434600101167848.pyx in stanfit4anon_model_c6677ffefdee0513f144508ee1783d0c_6434600101167848._call_sampler_star()

stanfit4anon_model_c6677ffefdee0513f144508ee1783d0c_6434600101167848.pyx in stanfit4anon_model_c6677ffefdee0513f144508ee1783d0c_6434600101167848._call_sampler()

RuntimeError: Initialization failed."

I hope you can help me.
Once again, thank you so much.
Best regards.

Theoretical questions and tips for the code

Hi Sibyl,

Hoping you are well.
I have some theoretical questions about the model and some suggestions for the code.

About Marketing Mix Model (model 2):

  1. If I have to benchmark different models, which metrics do you suggest using? Every time I check, the MAPE and RMSE are pretty similar.

  2. How can we evaluate the impact of each control variable on the baseline?

  3. In your dataset, your media is often ON. What happens if we have interspersed media values with many 0s from campaign to campaign? In this case, if our lag effect is not sufficient to cover the gap, the cumulative effect will be 0.
    For this last problem, I decided to change the STAN code; it works, but I am not sure if it is the right approach.
    Here you can find the changed code:

// adstock, mean-center, log1p transformation
row_vector[max_lag] lag_weights;
for (nn in 1:N) {
  for (media in 1 : num_media) {
    for (lag in 1 : max_lag) {
      lag_weights[max_lag-lag+1] <- pow(decay[media], (lag - 1 - peak[media]) ^ 2);
    }
    cum_effect <- Adstock(sub_col(X_media, nn, media, max_lag), lag_weights);
    if (cum_effect == 0) {
      X_media_adstocked[nn, media] <- 0;
    }
    else {
      X_media_adstocked[nn, media] <- log1p(cum_effect/mu_mdip[media]);
    }
  }
  X <- append_col(X_media_adstocked, X_ctrl);
}
Instead of this modification, do you suggest other solutions? For example, creating a vector of the cumulative effect, transforming it to avoid 0, and then passing it through the formula log1p(cum_effect/mu_mdip[media]); could that be effective?

  4. How can we deal with variables that take both negative and positive values, e.g., seasonality (variation from negative to positive)? Do you think a transformation like (X-X.min())/(X.max()-X.min()) is a right approach, or do you suggest keeping the mean transformation?

  5. Why do we need to apply a log_mean_center transformation in the second model?


I think the parts below can improve the code and avoid some errors.

Control Model (model 1)
I think this solution can help if we want to use only one variable for one specific beta; otherwise problems can arise.
I suggest transforming every control variable X into matrix form to obtain the right shape.
For example:
pos_vars = [col for col in base_vars if col not in seas_cols] -> here we have more than one variable
X1 = np.matrix(df_ctrl[pos_vars].values).reshape(len(df), (len([pos_vars]) if type(pos_vars) == str else len(pos_vars)))
pn_vars = seas_cols[1] -> we have only one control variable inside
X2 = np.matrix(df_ctrl[pn_vars].values).reshape(len(df), (len([pn_vars]) if type(pn_vars) == str else len(pn_vars))) -> to maintain shape coherence later with the ctrl_data dictionary

ctrl_data = {
    'N': len(df_ctrl),
    'K1': X1.shape[1],  # instead of len(pos_vars)
    'K2': X2.shape[1],  # instead of len(pn_vars)
    'X1': X1,
    'X2': X2,
    'y': df_ctrl['base_sales'].values,
    'max_intercept': min(df_ctrl['total_volume'])
}

In addition, in every sm.sampling() I would add n_jobs=-1 to run the code faster (if it can be helpful).

As always, Sibyl, thank you very much for your help and for the code you published. You are giving big help to everyone who needs it.

Best regards

Not an issue, but a Thank you!

@sibylhe I wanted to convey my thanks for sharing this truly comprehensive example of Marketing Mix models using a STAN-based library/package. I come from the R side and have used STAN extensively, but I am not as well versed on the Python side with PyStan. This was a great tutorial to follow and use on different datasets that I have.

A sincere thank you! I'm not well versed with GitHub, but eventually I would like to share some of the examples I have developed using/modifying your code. Let me know how I can contribute here.

Sree

Attempting multiplication of list and float.

In the function below we save hill_model['data']['y'] as a list of floats.

def train_hill_model(df, mc_df, adstock_params, media, sm):
    data, sc = create_hill_model_data(df, mc_df, adstock_params, media)
    fit = sm.sampling(data=data, iter=2000, chains=4)
    fit_result = fit.extract()
    hill_model = {
        'beta_hill_list': fit_result['beta_hill'].tolist(),
        'ec_list': fit_result['ec'].tolist(),
        'slope_list': fit_result['slope'].tolist(),
        'sc': sc,
        'data': {
            'X': data['X'].tolist(),
            'y': data['y'].tolist(),
        }
    }
    return hill_model

However, later on we try to multiply hill_model['data']['y'] by a float. Since it is a list, it can't legally be multiplied by a float, and multiplication by an integer just repeats the list.

def evaluate_hill_model(hill_model, hill_model_params):
    x = hill_model['data']['X']
    y_true = hill_model['data']['y'] * hill_model['sc']['y']
    print(y_true)
    y_pred = hill_model_predict(hill_model_params, x) * hill_model['sc']['y']
    print(y_pred)
    print('mape on original data: ', 
         mean_absolute_percentage_error(y_true, y_pred))
    return y_true, y_pred

To be able to perform the multiplication hill_model['data']['y'] * hill_model['sc']['y'], hill_model['data']['y'] needs to be a numpy array, not a list.
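
A minimal fix along those lines: convert back to an array before scaling.

y_true = np.array(hill_model['data']['y']) * hill_model['sc']['y']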

How to set free max_lag for each media in second model with prior or fix command

Hello,

I'd like to know how we have to change the code below to have the lag free, or fixed, for every channel.

First case:
Set the lag free and use priors; I think max_lag ~ beta(1,2) or beta(2,>4) can be good.
Here you can find a beta prior distribution simulator: https://keisan.casio.com/exec/system/1180573226

Second case
In this case we can choose which lag we want to fix for each channel.
Example:
max_lag_media_a = 10
max_lag_media_b = 4
max_lag_media_c = 6
etc...

Thanks

(The issue quotes the full section 2.2 data-preparation and Stan model code, identical to the listing above.)

Please add a license.

GitHub’s policy

You're under no obligation to choose a license. However, without a license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work.

Since this repo is getting popular and already has over 15 forks, I think it would be wise to add a license.

Mroas, Optimal Mix and efficiency

Hi

With mROAS we know where to allocate the budget, to one channel instead of another, for the optimal mix. Higher mROAS means we need to allocate more budget to it, right?

Here are my questions:
Where, but how much?

Are we considering the efficiency of the specific media?
E.g., after a certain threshold of investment, channel efficiency can saturate, and it becomes useless to invest more in that specific channel.

In addition, if we have negative mROAS, in practical terms, are we saying that we need to avoid investing in that channel because it is counterproductive?

As always, thank you for the support

Potential error from line #483

Hi Sibyl,

First of all: thank you very much for your precious help! The code is really helpful.
I decided to re-run your example.

Line #483:
def mmm_decompose_contrib(mmm, df, original_sales=df['sales']):
This code doesn't connect properly with lines 514 and 517, regarding the "X" variable. Does it have to be "X2"?

In addition, at line #578:
mc_df = mmm_decompose_media_contrib(mmm, df, y_true=df['sales_ln'])
I think it has to be "mmm_decompose_contrib", or vice versa at line 483 "mmm_decompose_media_contrib".
Here I am finding another issue: y_true=df["sales_ln"]
In the code I didn't find any log transformation of df["sales"]; also, y_true is not generated inside the decompose function, only "y_true2" (a log transformation is made from lines 555 to 556). How do we have to modify the code?

Again,
Thank you for your support!
Best

Creating autoMMM library based on your code.

Hello, I've had the idea to create a media mix modeling library in python for a while. I really liked your model, so I wrapped it in a class that would make it easy to use. You can see the start of it here.

https://github.com/NichoGustafson/autoMMM

So far I haven't tested the code much, but I plan to work on it much more in the coming week. I'd like to add, in addition to your Bayesian model, an interface for simpler models as well. If you would like to collaborate on this project, please let me know; I'm very busy with work, so I need all the help I can get.

Thanks!

Nick

MAPE problem with photo

Hi Sibyl.

This formula:

def mean_absolute_percentage_error(y_true, y_pred): 
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

I have seen that if I am using the same values from inside a DataFrame instead of an array, with the exact same values, the results are totally different. I attached real examples using your example data.

I will explain better, focusing on the calculation inside mmm_decompose_contrib():
If you consider this: mc_tt["y_true"], mc_tt["y_pred"] = y_true2, y_pred
np.mean(np.abs((np.array(mc_tt["y_true"]) - np.array(mc_tt["y_pred"]))/np.array(mc_tt["y_true"]))) * 100
is totally different compared to np.mean(np.abs((np.array(y_true2) - np.array(y_pred)) / np.array(y_true2))) * 100. In your code it is in this last form, at lines 552 and 553:

print('mape (multiplicative model): ', 
         mean_absolute_percentage_error(y_true2, y_pred))

I have checked it step by step. The output of this step:
np.abs((np.array(y_true2) - np.array(y_pred)) / np.array(y_true2)) is not clear to me, because the output is a combination of arrays.
Here you can find the example.
[Screenshot: first example]
In this case, the final MAPE is 20%.

Whilst the output of this:
np.abs((np.array(mc_tt["y_true"]) - np.array(mc_tt["y_pred"]))/np.array(mc_tt["y_true"])) * 100 is a single array, which makes sense to me, because on that you then calculate the mean.
[Screenshot: second example]

In this second case, the MAPE is 11%.

If it is an error, I suggest putting the data required for the MAPE inside mmm_decompose_contrib() into mc_df, and then passing it through the formula as you did here for the first model:
print('mape: ', mean_absolute_percentage_error(df['sales'], df['base_sales']))

Different results between Jupyter Notebook and Readme

First of all AWESOME STUFF!

However, in an effort to reproduce your results locally, I spotted a few differences between your Jupyter notebook results and the ones given in the readme.

| Value | stan_mmm.ipynb | readme.md |
| --- | --- | --- |
| MAPE | 26.06330645337451 | 8.63 |
| RMSE (log-log model) | 0.13647800963534104 | 0.04977 |
| MAPE (multiplicative model) | 20.313393234793644 | 15.71 |

Have the readme results been generated using different priors/parameters, or am I missing something obvious?

Plug in for Media competitors and AdjR2

Hi Sibyl,

Sorry, I thought to answer you about this question:
"Why do you want to avoid 0's, or what's your purpose for changing the adstock? I don't understand your problem and goal; if you let me know, I might have a more specific answer." In the end you were right, it didn't change anything, so it's correct to leave the code like this.

I am sharing some chunks of the code that I added/modified, hoping they can be helpful.
Here I add a new part for the first model to get the R2 and AdjR2 (I don't know how helpful it can be, but I hope a little bit). Just plug it in at the end of the Stan code.

###FIRST MODEL

###STAN CODE

generated quantities {
  real rss;
  real totalss;
  real <upper=1> R2;
  real <upper=1> AdjR2;
  vector[N] mu;

  mu <- X1 * beta1 + X2 * beta2 + X3 * beta3 + alpha;
  rss <- dot_self(y-mu);
  totalss <- dot_self(y-mean(y));
  R2 <- 1 - rss/totalss;
  AdjR2 <- 1 - ((1-R2)*(N-1))/(N - (K1+K2+K3) - 1);
}
'''

sm1 = pystan.StanModel(model_code=ctrl_code1, verbose=True)
fit1 = sm1.sampling(data=ctrl_data, iter=5000, chains=5, n_jobs=-1, seed=10)
fit1_result = fit1.extract()
fit1_result["R2"].mean()
fit1_result["AdjR2"].mean()

In the second model, I plugged in a new chunk to also capture the media competitors' effect (forced to be negative). I create a new matrix Z, with its own peak and decay. In addition, it is possible to see the beta distributions for them.
However, I don't know if this is out of the scope of your publication.
Regarding this second step, I have one question. As you can see, I am calculating y_pred like this:
y_pred = factor_df.drop(mdip_comp_cols, axis =1).apply(np.prod, axis=1)
where "mdip_comp_cols" are the columns of the media competitors. I did this to avoid the negative values since I calculated the beta competitors like this:

for i in range(num_media_comp):
        colname = media_comp_vars[i]
        factor_df[colname] = -(X2[colname] ** (beta2 if type(beta2) == float else beta2[i]))

Do you know how to calculate y_pred considering also the effect of the media competitors?
In addition, I tried to get the R2 and AdjR2 out, without results; at the same time, I don't know how helpful it would be to have them. Finally, in this code I didn't consider the mc_delta for the proportion.

Here you can find the code. I started from the starting point of the second model because there are many steps in between.
It is only necessary to create this in the first lines of the importing phase, also because it is necessary to use them to check for multicollinearity among media:

mdip_comp_cols = [col for col in df.columns if 'atl_' in col]

###SECOND MODEL

df_mmm, sc_mmm = mean_log1p_trandform(df, ['total_volume', 'base_sales'])   
mu_mdip = df[mdip_cols].apply(np.mean, axis=0).values
mu_mdip_comp = df[mdip_comp_cols].apply(np.mean, axis=0).values
max_lag = 8
num_media = len(mdip_cols)
num_media_comp = len(mdip_comp_cols)
X_media = np.concatenate((np.zeros((max_lag-1, num_media)), df[mdip_cols].values), axis=0)
X_comp_media = np.concatenate((np.zeros((max_lag-1, num_media_comp)), df[mdip_comp_cols].values), axis=0)
X_ctrl = df_mmm['base_sales'].values.reshape(len(df),1)

###STAN CODE

model_data2 = {
    'N': len(df),
    'max_lag': max_lag, 
    'num_media': num_media,
    'num_media_comp': num_media_comp,
    'X_media': X_media, 
    'X_media_comp': X_comp_media,
    'mu_mdip': mu_mdip,
    'mu_mdip_comp': mu_mdip_comp,
    'num_ctrl': X_ctrl.shape[1],
    'X_ctrl': X_ctrl, 
    'y': df_mmm['total_volume'].values
}

model_code2 = '''
functions {
  // the adstock transformation with a vector of weights
  real Adstock(vector t, row_vector weights) {
    return dot_product(t, weights) / sum(weights);
  }
}
data {
  // the total number of observations
  int<lower=1> N;
  // the vector of sales
  real y[N];
  // the maximum duration of lag effect, in weeks
  int<lower=1> max_lag;
  // the number of media channels
  int<lower=1> num_media;
  // the number of competitors media for TV 
  int<lower=1> num_media_comp;
  // matrix of media variables
  matrix[N+max_lag-1, num_media] X_media;
  // matrix of media variables
  matrix[N+max_lag-1, num_media_comp] X_media_comp;
  // vector of media variables' mean
  real mu_mdip[num_media];
  // vector of media variables' mean competitors
  real mu_mdip_comp[num_media_comp];
  // the number of other control variables
  int<lower=1> num_ctrl;
  // a matrix of control variables
  matrix[N, num_ctrl] X_ctrl;
}
parameters {
  // residual variance
  real<lower=0> noise_var;
  // the intercept
  real tau;
  // the coefficients for media variables and base sales
  vector<lower=0>[num_media+num_ctrl] beta;
  // the coefficients for media competitors variables
  vector<upper=0>[num_media_comp] beta2;
  // the decay and peak parameter for the adstock transformation of
  // each media
  vector<lower=0,upper=1>[num_media] decay;
  vector<lower=0,upper=ceil(max_lag/2)>[num_media] peak;
  vector<lower=0,upper=1>[num_media_comp] decay2;
  vector<lower=0,upper=ceil(max_lag/2)>[num_media_comp] peak2;
}
transformed parameters {
  // the cumulative media effect after adstock
  real cum_effect;
   // the cumulative media effect after adstock of competitors
  real cum_effect_comp;
  // matrix of media variables after adstock
  matrix[N, num_media] X_media_adstocked;
  // matrix of media variables competitors after adstock
  matrix[N, num_media_comp] Z_media_adstocked;
  // matrix of all predictors
  matrix[N, num_media+num_ctrl] X;
  // matrix of all competitors predictors
  matrix[N, num_media_comp] Z;
  // adstock, mean-center, log1p transformation
  row_vector[max_lag] lag_weights;
  // adstock, mean-center, log1p transformation - for competitors
  row_vector[max_lag] lag_weights_comp;
  
  for (nn in 1:N) {
    for (media in 1 : num_media) {
      for (lag in 1 : max_lag) {
        lag_weights[max_lag-lag+1] <- pow(decay[media], (lag - 1 - peak[media]) ^ 2);
        }
      cum_effect <- Adstock(sub_col(X_media, nn, media, max_lag), lag_weights);
      X_media_adstocked[nn, media] <- log1p(cum_effect/mu_mdip[media]);
     } 
  X <- append_col(X_media_adstocked, X_ctrl);
  }
  for (nn in 1:N) {
    for (media_c in 1 : num_media_comp) {
      for (lag in 1 : max_lag) {
        lag_weights_comp[max_lag-lag+1] <- pow(decay2[media_c], (lag - 1 - peak2[media_c]) ^ 2);
        }
      cum_effect_comp <- Adstock(sub_col(X_media_comp, nn, media_c, max_lag), lag_weights_comp);
      Z_media_adstocked[nn, media_c] <- log1p(cum_effect_comp/mu_mdip_comp[media_c]);
     } 
  Z <- Z_media_adstocked;
  }
}
model {
  decay ~ beta(3,3);
  peak ~ uniform(0, ceil(max_lag/2));
  decay2 ~ beta(3,3);
  peak2 ~ uniform(0, ceil(max_lag/2));
  tau ~ normal(0, 5);
  for (i in 1 : num_media+num_ctrl) {
    beta[i] ~ normal(0, 1);
  }
  for (i in 1 : num_media_comp) {
    beta2[i] ~ normal(0, 1);
  }
  noise_var ~ inv_gamma(0.05, 0.05 * 0.01);
  y ~ normal(tau + X * beta + Z * beta2, sqrt(noise_var));
}
'''

###FITTING AND EXTRAPOLATION FEATURES

sm2 = pystan.StanModel(model_code=model_code2, verbose=True)
fit2 = sm2.sampling(data=model_data2, iter=4000, chains=5, n_jobs=-1, seed=10)
fit2_result = fit2.extract()
stan_utility.check_all_diagnostics(fit2)


def extract_mmm(fit_result, max_lag=max_lag, media_vars=mdip_cols, media_comp_vars = mdip_comp_cols, ctrl_vars=['base_sales'], extract_param_list=True):
    mmm = {}
    mmm['max_lag'] = max_lag
    mmm['media_vars'], mmm['media_comp_vars'], mmm['ctrl_vars'] = media_vars, media_comp_vars, ctrl_vars
    mmm['decay'] = decay = fit_result['decay'].mean(axis=0).tolist()
    mmm['decay2'] = decay2 = fit_result['decay2'].mean(axis=0).tolist()
    mmm['peak'] = peak = fit_result['peak'].mean(axis=0).tolist()
    mmm['peak2'] = peak2 = fit_result['peak2'].mean(axis=0).tolist()
    mmm['beta'] = fit_result['beta'].mean(axis=0).tolist()
    mmm['beta2'] = fit_result['beta2'].mean(axis=0).tolist()
    mmm['tau'] = fit_result['tau'].mean()
    if extract_param_list:
        mmm['decay_list'] = fit_result['decay'].tolist()
        mmm['decay2_list'] = fit_result['decay2'].tolist()
        mmm['peak_list'] = fit_result['peak'].tolist()
        mmm['peak2_list'] = fit_result['peak2'].tolist()
        mmm['beta_list'] = fit_result['beta'].tolist()
        mmm['beta2_list'] = fit_result['beta2'].tolist()
        mmm['tau_list'] = fit_result['tau'].tolist()
    
    adstock_params = {}
    media_names = [col.replace('metric_', '') for col in media_vars]
    for i in range(len(media_names)):
        adstock_params[media_names[i]] = {
            'L': max_lag,
            'P': peak[i],
            'D': decay[i]
        }
    mmm['adstock_params'] = adstock_params
    
    adstock_params_comp = {}
    media_comp_names = [col.replace('atl_', '') for col in media_comp_vars]
    for i in range(len(media_comp_names)):
        adstock_params_comp[media_comp_names[i]] = {
            'L': max_lag,
            'P': (peak2 if type(peak2) == float else peak2[i]),
            'D': (decay2 if type(decay2) == float else decay2[i])
        }
    mmm['adstock_params_comp'] = adstock_params_comp
    return mmm

mmm = extract_mmm(fit2, max_lag=max_lag, media_vars=mdip_cols, media_comp_vars=mdip_comp_cols, ctrl_vars=['base_sales'])

###BETA PLOT

beta_media = {}
for i in range(len(mmm['media_vars'])):
    md = mmm['media_vars'][i]
    betas = []
    for j in range(len(mmm['beta_list'])):
        betas.append(mmm['beta_list'][j][i])
    beta_media[md] = np.array(betas)

f = plt.figure(figsize=(18,15))
for i in range(len(mmm['media_vars'])):
    ax = f.add_subplot(5,3,i+1)
    md = mmm['media_vars'][i]
    x = beta_media[md]
    mean_x = x.mean()
    median_x = np.median(x)
    ax = sns.distplot(x)
    ax.axvline(mean_x, color='r', linestyle='-')
    ax.axvline(median_x, color='g', linestyle='-')
    ax.set_title(md)

###BETA COMP

beta_media2 = {}
for i in range(len(mmm['media_comp_vars'])):
    md = mmm['media_comp_vars'][i]
    betas = []
    for j in range(len(mmm['beta2_list'])):
        (betas.append(mmm['beta2_list'][j]) if type(mmm['beta2']) == float else betas.append(mmm['beta2_list'][j][i]))
    beta_media2[md] = np.array(betas)

f = plt.figure(figsize=(18,15))
for i in range(len(mmm['media_comp_vars'])):
    ax = f.add_subplot(5,3,i+1)
    md = mmm['media_comp_vars'][i]
    x = beta_media2[md]
    mean_x = x.mean()
    median_x = np.median(x)
    ax = sns.distplot(x)
    ax.axvline(mean_x, color='r', linestyle='-')
    ax.axvline(median_x, color='g', linestyle='-')
    ax.set_title(md)

###MEDIA CONTRIBUTION

def mmm_decompose_contrib(mmm, df, original_sales=df['total_volume']):
    # adstock params
    adstock_params = mmm['adstock_params']
    # coefficients, intercept
    adstock_params_comp = mmm['adstock_params_comp']
    beta, beta2, tau = mmm['beta'], mmm['beta2'], mmm['tau']
    # variables
    media_vars, media_comp_vars, ctrl_vars = mmm['media_vars'], mmm['media_comp_vars'], mmm['ctrl_vars']
    num_media, num_media_comp, num_ctrl = len(media_vars), len(media_comp_vars), len(ctrl_vars)
    # X_media2: adstocked, mean-centered media variables + 1
    # apply the adstock formula to every channel, transforming the values
    X_media2 = adstock_transform(df, media_vars, adstock_params)
    X_media_comp2 = adstock_transform(df, media_comp_vars, adstock_params_comp)
    # mean-center all media variables; sc holds the column means
    X_media2, sc_mmm2 = mean_center_trandform(X_media2, media_vars)
    X_media_comp2, sc_mmm_comp_2 = mean_center_trandform(X_media_comp2, media_comp_vars)
    # add 1 to every channel, so that all 0s become 1
    X_media2 = X_media2 + 1
    X_media_comp2 = X_media_comp2 + 1
    # X_ctrl2: mean-centered control variables + 1
    X_ctrl2, sc_mmm2_1 = mean_center_trandform(df[ctrl_vars], ctrl_vars)
    X_ctrl2 = X_ctrl2 + 1
    # y_true2: mean-centered sales variable + 1
    y_true2, sc_mmm2_2 = mean_center_trandform(df, ['total_volume'])
    y_true2 = y_true2 + 1
    # add the sc_mmm2_1 and sc_mmm2_2 entries to sc_mmm2 -- why?
    sc_mmm2.update(sc_mmm2_1)
    sc_mmm2.update(sc_mmm2_2)
    # X2 <- media variables + ctrl variable
    X2 = pd.concat([X_media2, X_ctrl2, X_media_comp2], axis=1)

    # 1. compute each media/control factor:
    # log-log model: log(sales) = log(X[0])*beta[0] + ... + log(X[13])*beta[13] + tau
    # multiplicative model: sales = X[0]^beta[0] * ... * X[13]^beta[13] * e^tau
    # each factor = X[i]^beta[i]
    # intercept = e^tau
    # this is the transformation from log-log back to the multiplicative form;
    # all estimated betas become exponents, e.g. beta[num_media+i] is the beta for base_sales
    factor_df = pd.DataFrame(columns=media_vars+ctrl_vars+media_comp_vars+['intercept'])
    for i in range(num_media):
        colname = media_vars[i]
        factor_df[colname] = X2[colname] ** beta[i]
    for i in range(num_ctrl):
        colname = ctrl_vars[i]
        factor_df[colname] = X2[colname] ** beta[num_media+i]
    for i in range(num_media_comp):
        colname = media_comp_vars[i]
        factor_df[colname] = -(X2[colname] ** (beta2 if type(beta2) == float else beta2[i]))
    factor_df['intercept'] = np.exp(tau)

    # 2. calculate the product of all factors -> y_pred
    # baseline = intercept * control factor = e^tau * X[13]^beta[13]
    # .apply(np.prod, axis=1): multiply across each row
    y_pred = factor_df.drop(mdip_comp_cols, axis =1).apply(np.prod, axis=1)
    factor_df['y_pred'], factor_df['y_true2'] = y_pred, y_true2
    factor_df['baseline'] = factor_df[['intercept']+ctrl_vars].apply(np.prod, axis=1)

    # 3. calculate each media factor's contribution
    # media contribution = total volume – volume upon removal of the media factor
    mc_df = pd.DataFrame(columns=media_vars+media_comp_vars+['baseline'])
    for col in (media_vars):
        mc_df[col] = factor_df['y_true2'] - factor_df['y_true2']/factor_df[col]
    for col in (media_comp_vars):
        mc_df[col] = factor_df['y_true2'] + factor_df['y_true2']/factor_df[col]
    mc_df['baseline'] = factor_df['baseline']
    mc_df['y_true2'] = factor_df['y_true2']

    # 4. scale contribution
    # the difference between mc_true (y_true2 - baseline) and mc_pred is
    # used to rescale/proportion each media factor's contribution
    # predicted total media contribution: sum of all media factors
    mc_df['mc_pred'] = mc_df[media_vars + media_comp_vars].apply(np.sum, axis=1)
    # true total media contribution: total volume - baseline
    mc_df['mc_true'] = mc_df['y_true2'] - mc_df['baseline']
    
    # 5. scale mc_df based on original sales
    mc_df['total_volume'] = original_sales
    for col in media_vars+media_comp_vars+['baseline']:
        mc_df[col] = mc_df[col]*mc_df['total_volume']/mc_df['y_true2']
    
    print('rmse (log-log model): ', 
         (mean_squared_error(y_true2, y_pred, squared = False))/(y_true2.mean())*100)
    print('mape (multiplicative model): ', 
         mean_absolute_percentage_error(y_true2, y_pred))
    return mc_df

###calculate media contribution percentage

def calc_media_contrib_pct(mc_df, media_vars=mdip_cols, media_comp_vars = mdip_comp_cols, sales_col='total_volume', period=52):
    '''
    returns:
    mc_pct: percentage over total sales
    mc_pct2: percentage over incremental sales (sales contributed by media channels)
    '''
    mc_pct = {}
    mc_pct2 = {}
    s = 0
    if period is None:
        for col in (media_vars+media_comp_vars+['baseline']):
            mc_pct[col] = (mc_df[col]/mc_df[sales_col]).mean()
    else:
        for col in (media_vars+media_comp_vars+['baseline']):
            mc_pct[col] = (mc_df[col]/mc_df[sales_col])[-period:].mean()
    for m in media_vars:
        s += mc_pct[m]
    for m in (media_vars+media_comp_vars):
        mc_pct2[m] = mc_pct[m]/s
    return mc_pct, mc_pct2

As always, thank you for your help; I hope these modifications can be helpful, or at least interesting.

MAPE formula potential issue with array

Hi Sibyl.

Hoping you are fine.
I have found a really strange situation that I want to share.

This formula:

def mean_absolute_percentage_error(y_true, y_pred): 
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

I have seen that if I am using the same values from inside a DataFrame instead of an array, with the exact same values, the results are totally different.

I will explain better, focusing on the calculation inside mmm_decompose_contrib():
If you consider this: mc_df["y_true3"], mc_df["y_pred"] = y_true2, y_pred
np.mean(np.abs((np.array(mc_df["y_true3"]) - np.array(mc_df["y_pred"]))/np.array(mc_df["y_true3"]))) * 100
is totally different compared to np.mean(np.abs((np.array(y_true2) - np.array(y_pred)) / np.array(y_true2))) * 100

I have checked it step by step. The output of this step:
np.abs((np.array(y_true2) - np.array(y_pred)) / np.array(y_true2)) is not clear to me, because the output is a combination of arrays,
whilst the output of this:
np.abs((np.array(mc_df["y_true3"]) - np.array(mc_df["y_pred"]))/np.array(mc_df["y_true3"])) * 100 is a single array, which makes sense to me.

If it is an error, I suggest putting the data required for the MAPE inside mmm_decompose_contrib() into mc_df, and then passing it through the formula.

how to get adstock

Hello, thanks for your great work. I am learning this now.
About the adstock, I ran the following code:

data=pd.DataFrame({"x":[100,100,100,100,100,100,100,100,100,100,100,100,100,100,100]})
L=4
P=1
D=0.8
x_adstocked = apply_adstock(data['x'].values, L, P, D)

I got these numbers:

array([ 26.58160553,  59.80861244,  86.39021797, 100., 100., 100., 100.,
       100., 100., 100., 100., 100., 100., 100., 100.])

My question is:
Are these numbers the right adstock, or is there any part I am missing?
Thanks.
