sibylhe / mmm_stan Goto Github PK
View Code? Open in Web Editor NEWPython/STAN Implementation of Multiplicative Marketing Mix Model, with deep dive into Adstock (carry-over effect), ROAS, and mROAS
License: MIT License
Python/STAN Implementation of Multiplicative Marketing Mix Model, with deep dive into Adstock (carry-over effect), ROAS, and mROAS
License: MIT License
Hi Sibyl,
Sorry I thought to asnwer you about this question.
"Why do you want to avoid 0's, or what's your purpose for changing the adstock? I don't understand your problem and goal, if you let me know, I might have more specific answer" At the end you were right, it didn't change anything. So it's correct to leave the code like this.
I am sharing some chunks of the code that I added/modified, hoping they can be helpful.
Here I add a new part for the First Model to have the R2 and the AdjR2 (I don't know how much can be helpful, but I hope a little bit). Just plug at the end of the Stan Code.
###FIRST MODEL
###STAN CODE
generated quantities {
real rss;
real totalss;
real <upper=1> R2 ;
real <upper=1> AdjR2;
vector[N] mu;
mu <- X1 * beta1 + X2 * beta2 + X3 * beta3 + alpha;
rss <- dot_self(y-mu);
totalss <- dot_self(y-mean(y));
R2 <- 1 - rss/totalss;
AdjR2 <- 1 - ((1-R2)*(N-1))/(N - (K1+K2+K3) - 1);
}
'''
sm1 = pystan.StanModel(model_code=ctrl_code1, verbose=True)
fit1 = sm1.sampling(data=ctrl_data, iter=5000, chains=5, n_jobs=-1, seed=10)
fit1_result` = fit1.extract()
fit1_result["R2"].mean()
fit1_result["AdjR2"].mean()
In the Second Model, I plugged into it a new chunk to have also the media competitors effect (forced as negative). I create a new matrix Z and for it a specific peak and decay. In addition is possible to see also distribution beta for them.
However, I don't know if this is out of the scope of your publication.
Regarding this second step, I have one question. As you can see, I am calculating the y_pred like this:
y_pred = factor_df.drop(mdip_comp_cols, axis =1).apply(np.prod, axis=1)
where "mdip_comp_cols" are the columns of the media competitors. I did this to avoid the negative values since I calculated the beta competitors like this:
for i in range(num_media_comp):
colname = media_comp_vars[i]
factor_df[colname] = -(X2[colname] ** (beta2 if type(beta2) == float else beta2[i]))
Do you know how to calculate the y_pred considering also the effect of the media competitors?
In addition I tried to take it out the R2 and the AdjR2 without having results. At the same time I don't know how much can be helpful to have it. Eventually, in this code I didn't consider the mc_delta for the proportion.
Here you can find the code. I started from the starting point of Second Model because there are many steps in between.
It is only necessary to create this in the first lines of the importing phase also because it is necessary to use them to check for multicollinearity among media:
mdip_comp_cols = [col for col in df.columns if 'atl_' in col]
###FIRST MODEL
df_mmm, sc_mmm = mean_log1p_trandform(df, ['total_volume', 'base_sales'])
mu_mdip = df[mdip_cols].apply(np.mean, axis=0).values
mu_mdip_comp = df[mdip_comp_cols].apply(np.mean, axis=0).values
max_lag = 8
num_media = len(mdip_cols)
num_media_comp = len(mdip_comp_cols)
X_media = np.concatenate((np.zeros((max_lag-1, num_media)), df[mdip_cols].values), axis=0)
X_comp_media = np.concatenate((np.zeros((max_lag-1, num_media_comp)), df[mdip_comp_cols].values), axis=0)
X_ctrl = df_mmm['base_sales'].values.reshape(len(df),1)
###STAN CODE
model_data2 = {
'N': len(df),
'max_lag': max_lag,
'num_media': num_media,
'num_media_comp': num_media_comp,
'X_media': X_media,
'X_media_comp': X_comp_media,
'mu_mdip': mu_mdip,
'mu_mdip_comp': mu_mdip_comp,
'num_ctrl': X_ctrl.shape[1],
'X_ctrl': X_ctrl,
'y': df_mmm['total_volume'].values
}
model_code2 = '''
functions {
// the adstock transformation with a vector of weights
real Adstock(vector t, row_vector weights) {
return dot_product(t, weights) / sum(weights);
}
}
data {
// the total number of observations
int<lower=1> N;
// the vector of sales
real y[N];
// the maximum duration of lag effect, in weeks
int<lower=1> max_lag;
// the number of media channels
int<lower=1> num_media;
// the number of competitors media for TV
int<lower=1> num_media_comp;
// matrix of media variables
matrix[N+max_lag-1, num_media] X_media;
// matrix of media variables
matrix[N+max_lag-1, num_media_comp] X_media_comp;
// vector of media variables' mean
real mu_mdip[num_media];
// vector of media variables' mean competitors
real mu_mdip_comp[num_media_comp];
// the number of other control variables
int<lower=1> num_ctrl;
// a matrix of control variables
matrix[N, num_ctrl] X_ctrl;
}
parameters {
// residual variance
real<lower=0> noise_var;
// the intercept
real tau;
// the coefficients for media variables and base sales
vector<lower=0>[num_media+num_ctrl] beta;
// the coefficients for media competitors variables
vector<upper=0>[num_media_comp] beta2;
// the decay and peak parameter for the adstock transformation of
// each media
vector<lower=0,upper=1>[num_media] decay;
vector<lower=0,upper=ceil(max_lag/2)>[num_media] peak;
vector<lower=0,upper=1>[num_media_comp] decay2;
vector<lower=0,upper=ceil(max_lag/2)>[num_media_comp] peak2;
}
transformed parameters {
// the cumulative media effect after adstock
real cum_effect;
// the cumulative media effect after adstock of competitors
real cum_effect_comp;
// matrix of media variables after adstock
matrix[N, num_media] X_media_adstocked;
// matrix of media variables competitors after adstock
matrix[N, num_media_comp] Z_media_adstocked;
// matrix of all predictors
matrix[N, num_media+num_ctrl] X;
// matrix of all competitors predictors
matrix[N, num_media_comp] Z;
// adstock, mean-center, log1p transformation
row_vector[max_lag] lag_weights;
// adstock, mean-center, log1p transformation - for competitors
row_vector[max_lag] lag_weights_comp;
for (nn in 1:N) {
for (media in 1 : num_media) {
for (lag in 1 : max_lag) {
lag_weights[max_lag-lag+1] <- pow(decay[media], (lag - 1 - peak[media]) ^ 2);
}
cum_effect <- Adstock(sub_col(X_media, nn, media, max_lag), lag_weights);
X_media_adstocked[nn, media] <- log1p(cum_effect/mu_mdip[media]);
}
X <- append_col(X_media_adstocked, X_ctrl);
}
for (nn in 1:N) {
for (media_c in 1 : num_media_comp) {
for (lag in 1 : max_lag) {
lag_weights_comp[max_lag-lag+1] <- pow(decay2[media_c], (lag - 1 - peak2[media_c]) ^ 2);
}
cum_effect_comp <- Adstock(sub_col(X_media_comp, nn, media_c, max_lag), lag_weights_comp);
Z_media_adstocked[nn, media_c] <- log1p(cum_effect_comp/mu_mdip_comp[media_c]);
}
Z <- Z_media_adstocked;
}
}
model {
decay ~ beta(3,3);
peak ~ uniform(0, ceil(max_lag/2));
decay2 ~ beta(3,3);
peak2 ~ uniform(0, ceil(max_lag/2));
tau ~ normal(0, 5);
for (i in 1 : num_media+num_ctrl) {
beta[i] ~ normal(0, 1);
}
for (i in 1 : num_media_comp) {
beta2[i] ~ normal(0, 1);
}
noise_var ~ inv_gamma(0.05, 0.05 * 0.01);
y ~ normal(tau + X * beta + Z * beta2, sqrt(noise_var));
}
'''
###FITTING AND EXTRAPOLATION FEATURES
sm2 = pystan.StanModel(model_code=model_code2, verbose=True)
fit2 = sm2.sampling(data=model_data2, iter=4000, chains=5, n_jobs=-1, seed=10)
fit2_result = fit2.extract()
stan_utility.check_all_diagnostics(fit2)
def extract_mmm(fit_result, max_lag=max_lag, media_vars=mdip_cols, media_comp_vars = mdip_comp_cols, ctrl_vars=['base_sales'], extract_param_list=True):
mmm = {}
mmm['max_lag'] = max_lag
mmm['media_vars'], mmm['media_comp_vars'], mmm['ctrl_vars'] = media_vars, media_comp_vars, ctrl_vars
mmm['decay'] = decay = fit_result['decay'].mean(axis=0).tolist()
mmm['decay2'] = decay2 = fit_result['decay2'].mean(axis=0).tolist()
mmm['peak'] = peak = fit_result['peak'].mean(axis=0).tolist()
mmm['peak2'] = peak2 = fit_result['peak2'].mean(axis=0).tolist()
mmm['beta'] = fit_result['beta'].mean(axis=0).tolist()
mmm['beta2'] = fit_result['beta2'].mean(axis=0).tolist()
mmm['tau'] = fit_result['tau'].mean()
if extract_param_list:
mmm['decay_list'] = fit_result['decay'].tolist()
mmm['decay2_list'] = fit_result['decay2'].tolist()
mmm['peak_list'] = fit_result['peak'].tolist()
mmm['peak2_list'] = fit_result['peak2'].tolist()
mmm['beta_list'] = fit_result['beta'].tolist()
mmm['beta2_list'] = fit_result['beta2'].tolist()
mmm['tau_list'] = fit_result['tau'].tolist()
adstock_params = {}
media_names = [col.replace('metric_', '') for col in media_vars]
for i in range(len(media_names)):
adstock_params[media_names[i]] = {
'L': max_lag,
'P': peak[i],
'D': decay[i]
}
mmm['adstock_params'] = adstock_params
adstock_params_comp = {}
media_comp_names = [col.replace('atl_', '') for col in media_comp_vars]
for i in range(len(media_comp_names)):
adstock_params_comp[media_comp_names[i]] = {
'L': max_lag,
'P': (peak2 if type(peak2) == float else peak2[i]),
'D': (decay2 if type(decay2) == float else decay2[i])
}
mmm['adstock_params_comp'] = adstock_params_comp
return mmm
mmm = extract_mmm(fit2, max_lag=max_lag, media_vars=mdip_cols, media_comp_vars=mdip_comp_cols, ctrl_vars=['base_sales'])
###BETA PLOT
beta_media = {}
for i in range(len(mmm['media_vars'])):
md = mmm['media_vars'][i]
betas = []
for j in range(len(mmm['beta_list'])):
betas.append(mmm['beta_list'][j][i])
beta_media[md] = np.array(betas)
f = plt.figure(figsize=(18,15))
for i in range(len(mmm['media_vars'])):
ax = f.add_subplot(5,3,i+1)
md = mmm['media_vars'][i]
x = beta_media[md]
mean_x = x.mean()
median_x = np.median(x)
ax = sns.distplot(x)
ax.axvline(mean_x, color='r', linestyle='-')
ax.axvline(median_x, color='g', linestyle='-')
ax.set_title(md)
###BETA COMP
beta_media2 = {}
for i in range(len(mmm['media_comp_vars'])):
md = mmm['media_comp_vars'][i]
betas = []
for j in range(len(mmm['beta2_list'])):
(betas.append(mmm['beta2_list'][j]) if type(mmm['beta2']) == float else betas.append(mmm['beta2_list'][j][i]))
beta_media2[md] = np.array(betas)
f = plt.figure(figsize=(18,15))
for i in range(len(mmm['media_comp_vars'])):
ax = f.add_subplot(5,3,i+1)
md = mmm['media_comp_vars'][i]
x = beta_media2[md]
mean_x = x.mean()
median_x = np.median(x)
ax = sns.distplot(x)
ax.axvline(mean_x, color='r', linestyle='-')
ax.axvline(median_x, color='g', linestyle='-')
ax.set_title(md)
###MEDIA CONTRIBUTION
def mmm_decompose_contrib(mmm, df, original_sales=df['total_volume']):
# adstock params
adstock_params = mmm['adstock_params']
# coefficients, intercept
adstock_params_comp = mmm['adstock_params_comp']
beta, beta2, tau = mmm['beta'], mmm['beta2'], mmm['tau']
# variables
media_vars, media_comp_vars, ctrl_vars = mmm['media_vars'], mmm['media_comp_vars'], mmm['ctrl_vars']
num_media, num_media_comp, num_ctrl = len(media_vars), len(media_comp_vars), len(ctrl_vars)
# X_media2: adstocked, mean-centered media variables + 1
# It apply adstock formula for every channel, transforming values
X_media2 = adstock_transform(df, media_vars, adstock_params)
X_media_comp2 = adstock_transform(df, media_comp_vars, adstock_params_comp)
# it paramtrize to mean 0 all media sc is mean
X_media2, sc_mmm2 = mean_center_trandform(X_media2, media_vars)
X_media_comp2, sc_mmm_comp_2 = mean_center_trandform(X_media_comp2, media_comp_vars)
#it sums to every channel +1: it means that all 0 are +1
X_media2 = X_media2 + 1
X_media_comp2 = X_media_comp2 + 1
# X_ctrl2, mean-centered control variables + 1
# It parametrize to mean 0 the base_sales
X_ctrl2, sc_mmm2_1 = mean_center_trandform(df[ctrl_vars], ctrl_vars)
X_ctrl2 = X_ctrl2 + 1
# y_true2, mean-centered sales variable + 1
y_true2, sc_mmm2_2 = mean_center_trandform(df, ['total_volume'])
y_true2 = y_true2 + 1
#it adds the line sc_mmm2_1 and sc_mm2_2 to sc_mmm2 -- why?
sc_mmm2.update(sc_mmm2_1)
sc_mmm2.update(sc_mmm2_2)
# X2 <- media variables + ctrl variable
X2 = pd.concat([X_media2, X_ctrl2, X_media_comp2], axis=1)
# 1. compute each media/control factor:
# log-log model: log(sales) = log(X[0])*beta[0] + ... + log(X[13])*beta[13] + tau
# multiplicative model: sales = X[0]^beta[0] * ... * X[13]^beta[13] * e^tau
# each factor = X[i]^beta[i]
# intercept = e^tau
# Here is transformation from log-log to base
# Indeed all the beta for parameter estimated are ^(**). B[9] = beta[num_media+i] = beta for base_sales
factor_df = pd.DataFrame(columns=media_vars+ctrl_vars+media_comp_vars+['intercept'])
for i in range(num_media):
colname = media_vars[i]
factor_df[colname] = X2[colname] ** beta[i]
for i in range(num_ctrl):
colname = ctrl_vars[i]
factor_df[colname] = X2[colname] ** beta[num_media+i]
for i in range(num_media_comp):
colname = media_comp_vars[i]
factor_df[colname] = -(X2[colname] ** (beta2 if type(beta2) == float else beta2[i]))
factor_df['intercept'] = np.exp(tau)
# 2. calculate the product of all factors -> y_pred
# baseline = intercept * control factor = e^tau * X[13]^beta[13]
# .apply(np.prod, axis = 1. Multiplication for every row)
y_pred = factor_df.drop(mdip_comp_cols, axis =1).apply(np.prod, axis=1)
factor_df['y_pred'], factor_df['y_true2'] = y_pred, y_true2
factor_df['baseline'] = factor_df[['intercept']+ctrl_vars].apply(np.prod, axis=1)
# 3. calculate each media factor's contribution
# media contribution = total volume – volume upon removal of the media factor
mc_df = pd.DataFrame(columns=media_vars+media_comp_vars+['baseline'])
for col in (media_vars):
mc_df[col] = factor_df['y_true2'] - factor_df['y_true2']/factor_df[col]
for col in (media_comp_vars):
mc_df[col] = factor_df['y_true2'] + factor_df['y_true2']/factor_df[col]
mc_df['baseline'] = factor_df['baseline']
mc_df['y_true2'] = factor_df['y_true2']
# 4. scale contribution
# difference between mc_true(y_true2 - baseline) and mc_pred. This difference
# used to regauge/scale/proportion w.r.t. each media factors
# predicted total media contribution: product of all media factors
mc_df['mc_pred'] = mc_df[media_vars + media_comp_vars].apply(np.sum, axis=1)
# true total media contribution: total volume - baseline
mc_df['mc_true'] = mc_df['y_true2'] - mc_df['baseline']
# 5. scale mc_df based on original sales
mc_df['total_volume'] = original_sales
for col in media_vars+media_comp_vars+['baseline']:
mc_df[col] = mc_df[col]*mc_df['total_volume']/mc_df['y_true2']
print('rmse (log-log model): ',
(mean_squared_error(y_true2, y_pred, squared = False))/(y_true2.mean())*100)
print('mape (multiplicative model): ',
mean_absolute_percentage_error(y_true2, y_pred))
return mc_df
###calculate media contribution percentage
def calc_media_contrib_pct(mc_df, media_vars=mdip_cols, media_comp_vars = mdip_comp_cols, sales_col='total_volume', period=52):
'''
returns:
mc_pct: percentage over total sales
mc_pct2: percentage over incremental sales (sales contributed by media channels)
'''
mc_pct = {}
mc_pct2 = {}
s = 0
if period is None:
for col in (media_vars+media_comp_vars+['baseline']):
mc_pct[col] = (mc_df[col]/mc_df[sales_col]).mean()
else:
for col in (media_vars+media_comp_vars+['baseline']):
mc_pct[col] = (mc_df[col]/mc_df[sales_col])[-period:].mean()
for m in media_vars:
s += mc_pct[m]
for m in (media_vars+media_comp_vars):
mc_pct2[m] = mc_pct[m]/s
return mc_pct, mc_pct2
As always thank you for your help, hoping these modifications can be helpful or at least interesting
@sibylhe I wanted to convey my thanks for sharing this truly comprehensive example of Marketing Mix models using STAN based library/package. I come from the R side and I have used STAN extensively, but not as well versed on the Python side with Pystan. This was a great tutorial to follow and use in different datasets that I have.
A sincere thank you! I'm not well versed with Git Hub, but eventually would like to share some of the examples I have developed using/modifying your code. Let me know how I can contribute here.
Sree
Hi Sibyl.
Hoping you are fine.
I have found a really strange situation that I want to share.
This formula:
def mean_absolute_percentage_error(y_true, y_pred):
y_true, y_pred = np.array(y_true), np.array(y_pred)
return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
I have seen that if I am using the same value that is inside a DataFrame instead of an array, with the exact same values, results are totally different.
I will better explain, focusing on the calculation inside the formula mmm_decompose_contrib():
If you consider this: mc_df["y_true3"], mc_df["y_pred"] = y_true2, y_pred
np.mean(np.abs((np.array(mc_df["y_true3"]) - np.array(mc_df["y_pred"]))/np.array(mc_df["y_true3"]))) * 100
is totally different compared to np.mean(np.abs((np.array(y_true2) - np.array(y_pred)) / np.array(y_true2))) * 100
I have check it step by step. And the output of this step:
np.abs((np.array(y_true2) - np.array(y_pred)) / np.array(y_true2)) it is not clear to me, because the output is a combination of array
whilst the output of this :
np.abs((np.array(mc_df["y_true3"]) - np.array(mc_df["y_pred"]))/np.array(mc_df["y_true3"])) * 100 is a singular array, and it makes sense to me.
If it was an error, I suggest inside the formula mmm_decompose_contrib() to put the data required for the MAPE, into the mc_df and then to pass it through the formula.
Hi Sibyl,
First, let me thank you in advance for both your post (which is pretty good) and for your help!
I wanted to use your approach in an example but I can not fix the "RuntimeError: Initialization failed" when I do the ".sampling" and I really do not know what else to do. Could you give me any little advice?
Here it goes the data I provide to "sampling", and the error messagge:
Model data:
Error message:
"RuntimeError Traceback (most recent call last)
in
----> 1 fit2 = sm2.sampling(data=model_data2, iter=2000, chains=4, verbose=True, n_jobs=1)
C:\ProgramData\Anaconda3\lib\site-packages\pystan\model.py in sampling(self, data, pars, chains, iter, warmup, thin, seed, init, sample_file, diagnostic_file, verbose, algorithm, control, n_jobs, **kwargs)
811 call_sampler_args = izip(itertools.repeat(data), args_list, itertools.repeat(pars))
812 call_sampler_star = self.module._call_sampler_star
--> 813 ret_and_samples = _map_parallel(call_sampler_star, call_sampler_args, n_jobs)
814 samples = [smpl for _, smpl in ret_and_samples]
815
C:\ProgramData\Anaconda3\lib\site-packages\pystan\model.py in _map_parallel(function, args, n_jobs)
88 pool.join()
89 else:
---> 90 map_result = list(map(function, args))
91 return map_result
92
stanfit4anon_model_c6677ffefdee0513f144508ee1783d0c_6434600101167848.pyx in stanfit4anon_model_c6677ffefdee0513f144508ee1783d0c_6434600101167848._call_sampler_star()
stanfit4anon_model_c6677ffefdee0513f144508ee1783d0c_6434600101167848.pyx in stanfit4anon_model_c6677ffefdee0513f144508ee1783d0c_6434600101167848._call_sampler()
RuntimeError: Initialization failed."
I hope you might help me,
Once again, thank you so much.
Best regards.
You're under no obligation to choose a license. However, without a license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work.
Since this repo is getting popular and already has over 15 forks, I think it would be wise to add a license.
Hello,
I'd like to know how we have to change the code below-mentioned to have LAG free or fixed for every channel.
First case:
Set Lag free and use Priors; I think max_lag ~ beta(1,2) or (2,>4) can be good.
Here you can have an example of prior distribution beta simulator: https://keisan.casio.com/exec/system/1180573226
Second case
In this case we can choose which lag we want to fix for each channel.
Example:
max_lag_media_a = 10
max_lag_media_b = 4
max_lag_media_c = 6
etc...
Thanks
# 2.2 Marketing Mix Model
df_mmm, sc_mmm = mean_log1p_trandform(df, ['sales', 'base_sales'])
mu_mdip = df[mdip_cols].apply(np.mean, axis=0).values
max_lag = 8
num_media = len(mdip_cols)
# padding zero * (max_lag-1) rows
X_media = np.concatenate((np.zeros((max_lag-1, num_media)), df[mdip_cols].values), axis=0)
X_ctrl = df_mmm['base_sales'].values.reshape(len(df),1)
model_data2 = {
'N': len(df),
'max_lag': max_lag,
'num_media': num_media,
'X_media': X_media,
'mu_mdip': mu_mdip,
'num_ctrl': X_ctrl.shape[1],
'X_ctrl': X_ctrl,
'y': df_mmm['sales'].values
}
model_code2 = '''
functions {
// the adstock transformation with a vector of weights
real Adstock(vector t, row_vector weights) {
return dot_product(t, weights) / sum(weights);
}
}
data {
// the total number of observations
int<lower=1> N;
// the vector of sales
real y[N];
// the maximum duration of lag effect, in weeks
int<lower=1> max_lag;
// the number of media channels
int<lower=1> num_media;
// matrix of media variables
matrix[N+max_lag-1, num_media] X_media;
// vector of media variables' mean
real mu_mdip[num_media];
// the number of other control variables
int<lower=1> num_ctrl;
// a matrix of control variables
matrix[N, num_ctrl] X_ctrl;
}
parameters {
// residual variance
real<lower=0> noise_var;
// the intercept
real tau;
// the coefficients for media variables and base sales
vector<lower=0>[num_media+num_ctrl] beta;
// the decay and peak parameter for the adstock transformation of
// each media
vector<lower=0,upper=1>[num_media] decay;
vector<lower=0,upper=ceil(max_lag/2)>[num_media] peak;
}
transformed parameters {
// the cumulative media effect after adstock
real cum_effect;
// matrix of media variables after adstock
matrix[N, num_media] X_media_adstocked;
// matrix of all predictors
matrix[N, num_media+num_ctrl] X;
// adstock, mean-center, log1p transformation
row_vector[max_lag] lag_weights;
for (nn in 1:N) {
for (media in 1 : num_media) {
for (lag in 1 : max_lag) {
lag_weights[max_lag-lag+1] <- pow(decay[media], (lag - 1 - peak[media]) ^ 2);
}
cum_effect <- Adstock(sub_col(X_media, nn, media, max_lag), lag_weights);
X_media_adstocked[nn, media] <- log1p(cum_effect/mu_mdip[media]);
}
X <- append_col(X_media_adstocked, X_ctrl);
}
}
model {
decay ~ beta(3,3);
peak ~ uniform(0, ceil(max_lag/2));
tau ~ normal(0, 5);
for (i in 1 : num_media+num_ctrl) {
beta[i] ~ normal(0, 1);
}
noise_var ~ inv_gamma(0.05, 0.05 * 0.01);
y ~ normal(tau + X * beta, sqrt(noise_var));
}
'''
In the function below we save hill_model['data']['y'] as a list of floats.
def train_hill_model(df, mc_df, adstock_params, media, sm):
data, sc = create_hill_model_data(df, mc_df, adstock_params, media)
fit = sm.sampling(data=data, iter=2000, chains=4)
fit_result = fit.extract()
hill_model = {
'beta_hill_list': fit_result['beta_hill'].tolist(),
'ec_list': fit_result['ec'].tolist(),
'slope_list': fit_result['slope'].tolist(),
'sc': sc,
'data': {
'X': data['X'].tolist(),
'y': data['y'].tolist(),
}
}
return hill_model
However later on we try to multiply hill_model['data']['y'] by a float. Since it is a list it can't legally be multiplied by a float, and multiplication by an integer just increases the size of the list.
def evaluate_hill_model(hill_model, hill_model_params):
x = hill_model['data']['X']
y_true = hill_model['data']['y'] * hill_model['sc']['y']
print(y_true)
y_pred = hill_model_predict(hill_model_params, x) * hill_model['sc']['y']
print(y_pred)
print('mape on original data: ',
mean_absolute_percentage_error(y_true, y_pred))
return y_true, y_pred
To be able to preform the multiplication hill_model['data']['y'] * hill_model['sc']['y'] then hill_model['data']['y'] needs to be a numpy array, not a list.
Hi Sibyl,
Hoping you are well.
I have some theoretical questions about the model and some suggestions for the code.
About Marketing Mix Model (model 2):
If I have to benchmark different models, which metrics do you suggest to use? Every time I check for the MAPE and RMSE are always pretty similar.
How can we evaluate the impact of each control variable on the baseline?
In your dataset, your Media is often ON. What happens if we have interspersed Media values with many 0s from campaign to campaign? In this case if our lag_effect willl not sufficient to cover this gap, the cumulative effect will be 0.
For this last problem, I decided to change the STAN code, it works, but I am not sure if it is the right approach.
Here you can find the changed code:
“””// adstock, mean-center, log1p transformation
row_vector[max_lag] lag_weights;
for (nn in 1:N) {
for (media in 1 : num_media) {
for (lag in 1 : max_lag) {
lag_weights[max_lag-lag+1] <- pow(decay[media], (lag - 1 - peak[media]) ^ 2);
}
cum_effect <- Adstock(sub_col(X_media, nn, media, max_lag), lag_weights);
if (cum_effect == 0) {
X_media_adstocked[nn, media] <- 0;
}
else {
X_media_adstocked[nn, media] <- log1p(cum_effect/mu_mdip[media]);
}
}
X <- append_col(X_media_adstocked, X_ctrl);
}
}
“””
Instead of this modification I made, do you suggest other solutions? For example, creating a vector of cumulative effect, transform it to avoid 0 and then pass it through the formula (log1p(cum_effect/mu_mdip[media), could be effective?
How can we deal if we have variables that are negative and positive as, e.g., seasonality (variation from negative to positive). Do you think a transformation as (X-X.min())/(X.max()-X.min()) can be a right approach, or do you suggest to keep the mean transformation?
Why do we need to apply a log_mean_center transformation in the second model?
I think the undermentioned parts can improve the code to avoid some errors.
Control Model (model 1)
I think this solution can help, if we want to use only one variable for one specific beta, otherwise problems can arise.
I suggest transforming every control variable X in the matrix form to obtain the shape.
For example:
pos_vars = pos_vars = [col for col in base_vars if col not in seas_cols] -> here we have more than one variables
X1 = np.matrix(df_ctrl[pos_vars].values).reshape(len(df),(len([pos _vars]) if type(pos_vars) == str else len(pos_vars))))
pn_vars = seas_cols[1] -> we have only one control variable inside
X2 = np.matrix(df_ctrl[pn_vars].values).reshape(len(df),(len([pn_vars]) if type(pn_vars) == str else len(pn_vars)))) –> to mantain coherence shape later with the ctrl_data dictionary
ctrl_data = {
'N': len(df_ctrl),
'K1': X1.shape[1], --> instead of len(pos_vars)
'K2': X2.shape[1], --> instead of len(pn_vars)
'X1': X1,
'X2': X2,
'y': df_ctrl[base_sales'].values,
'max_intercept': min(df_ctrl['total_volume'])
}
In addition, in every sm.sampling() I would add n_jobs=-1 to run faster the code (if it can be helpful).
As always, Sibyl, thank you very much for your help and for the code you published. You are giving a big help to everyone needs it.
Best regards
Hello,thanks for your great work. I am learning this now.
About the adstock, I ran the following code:
data=pd.DataFrame({"x":[100,100,100,100,100,100,100,100,100,100,100,100,100,100,100]})
L=4
P=1
D=0.8
x_adstocked = apply_adstock(data['x'].values, L, P, D)
Got these numbers:
array([ 26.58160553, 59.80861244, 86.39021797, 100. ,
100. , 100. , 100. , 100. ,
100. , 100. , 100. , 100. ,
100. , 100. , 100. ])
My question is:
Do these numbers are right adstock? or Is there any part i am missing?
Thanks.
Hi Sibyl
thanks to this repository, I now have a deeper understanding of marketing mix modeling
I have a question about line 541 of mmm_stan.py.
https://github.com/sibylhe/mmm_stan/blob/main/mmm_stan.py#L541
in README, definition of delta is mc_pred - mc_ture
, but delta is mc_true - mc_pred
in line 541 . Is there any problem?
Hello, I've had the idea to create a media mix modeling library in python for a while. I really liked your model, so I wrapped it in a class that would make it easy to use. You can see the start of it here.
https://github.com/NichoGustafson/autoMMM
So far I haven't tested the code much, but I plan to work on it much more in the coming week. I'd like to add, in addition to your Bayesian model, an interface for simpler models as well. If you would like to collaborate on this project, please let me know; I'm very busy with work, so I need all the help I can get.
Thanks!
Nick
Hi Sibyl.
This formula:
def mean_absolute_percentage_error(y_true, y_pred):
y_true, y_pred = np.array(y_true), np.array(y_pred)
return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
I have seen that if I am using the same value that is inside a DataFrame instead of an array, with the exact same values, results are totally different. I attached real examples with your data examples.
I will better explain, focusing on the calculation inside the formula mmm_decompose_contrib():
If you consider this: mc_tt["y_true"], mc_tt["y_pred"] = y_true2, y_pred
np.mean(np.abs((np.array(mc_tt["y_true"]) - np.array(mc_tt["y_pred"]))/np.array(mc_tt["y_true"]))) * 100
is totally different compared to np.mean(np.abs((np.array(y_true2) - np.array(y_pred)) / np.array(y_true2))) * 100. In your code it is as in this last form at line 552 and 553:
print('mape (multiplicative model): ',
mean_absolute_percentage_error(y_true2, y_pred))
I have check it step by step. And the output of this step:
np.abs((np.array(y_true2) - np.array(y_pred)) / np.array(y_true2)) it is not clear to me, because the output is a combination of array.
Here you can find the example.
In this case, the final MAPE is 20%
Whilst the output of this:
np.abs((np.array(mc_tt["y_true"]) - np.array(mc_tt["y_pred"]))/np.array(mc_tt["y_true"])) * 100 is a singular array, and it makes sense to me. Because on that you calculate then the mean
In this second case MAPE is 11%.
If it was an error, I suggest inside the formula mmm_decompose_contrib() to put the data required for the MAPE, into the mc_df and then to pass it through the formula as you did here for the first model:
print('mape: ', mean_absolute_percentage_error(df['sales'], df['base_sales']))
First of all AWESOME STUFF!
However, in an effort to reproduce your results locally I spotted a few differences between your Jupyter notebook results and the ones given in the readme.
Value | stan_mmm.ipynb | readme.md |
---|---|---|
MAPE | 26.06330645337451 | 8.63 |
RMSE (log-log model) | 0.13647800963534104 | 0.04977 |
MAPE (multiplicative model) | 20.313393234793644 | 15.71 |
Have the Readme results been generated using different priors/parameters, or am I missing something obvious?
Hi
With the MROAS we are knowing where to allocate the budget for a channel instead of another one, for the Optimal Mix. Higher mroas means we need to allocate more budget to it, right?
Here my questions:
Where, but how much?
Are we considering the efficiency of the specific media?
E.g. after a certain threshold of investment, channel efficiency can saturate, and become useless to invest in that specific channel
In addition, if we have negative mroas, in practical terms, we are saying that we need to avoid to invest in that channel, because is counterproductive?
As always, thank you for the support
Hi Sibyl,
First of all: thank you very much for your precious help! The code is really helpful.
I decided to re run your example.
The line #483
def mmm_decompose_contrib(mmm, df, original_sales=df['sales']):
This code doesn't find proper connection with line 514 and 517, about "X" variable. Has it to be "X2"?
In addition at line #578
mc_df = mmm_decompose_media_contrib(mmm, df, y_true=df['sales_ln'])
I think that has to be "mmm_decompose_contrib" or viceversa at 483 "mmm_decompose_media_contrib"
Here I am finding another issue: y_true =df["sales_ln"]
Into the code I dind't find any kind of log transformation about the df for "sales" and also before, y_true it is not generated into the mmm_decompose, only "y_true2", that from line 555 to 556 a log transformation is made. How can we have to modify the code?
Again,
Thank you for your support!
Best
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.