The documentation of CoxPHSurvivalAnalysis says:

Great work <a class="user-mention notranslate" data-hovercard-type="user" data-hoverca

Different results of CoxPHSurvivalAnalysis and CoxnetSurvivalAnalysis about scikit-survival HOT 10 CLOSED

sebp commented on June 3, 2024 1

Different results of CoxPHSurvivalAnalysis and CoxnetSurvivalAnalysis

from scikit-survival.

Comments (10)

leihuang commented on June 3, 2024 1

By referencing the scikit-learn documentation (eg, see here) and experimenting with parameters, I was able to match the results of the two models:

from sksurv.linear_model import CoxPHSurvivalAnalysis, CoxnetSurvivalAnalysis
from sksurv.datasets import load_veterans_lung_cancer
from sksurv.preprocessing import OneHotEncoder

X_, y_ = load_veterans_lung_cancer()
X = OneHotEncoder().fit_transform(X_)

f = CoxPHSurvivalAnalysis(alpha=1, n_iter=100000)
g = CoxnetSurvivalAnalysis(alphas=[1/X.shape[0]], alpha_min_ratio=1, n_alphas=1, l1_ratio=1e-16, tol=1e-09, normalize=False)

print(f)
print(g)

f.fit(X, y_)
g.fit(X, y_)

print(f.coef_)
print(g.coef_[:,0])
print('Relative errors:\n', (f.coef_ - g.coef_[:,0]) / f.coef_)

Output:

CoxPHSurvivalAnalysis(alpha=1, n_iter=100000, tol=1e-09, verbose=0)
CoxnetSurvivalAnalysis(alpha_min_ratio=1, alphas=[0.0072992700729927005],
            copy_X=True, l1_ratio=1e-16, max_iter=100000, n_alphas=1,
            normalize=False, penalty_factor=None, tol=1e-09, verbose=False)
[-8.15647317e-03 -6.63327564e-01 -2.36950223e-01 -1.04657588e+00
 -3.26009359e-02 -2.71347095e-04  5.39815287e-02  2.89290932e-01]
[-8.15323389e-03 -6.63228135e-01 -2.36836564e-01 -1.04645919e+00
 -3.25993975e-02 -2.71575816e-04  5.39863418e-02  2.89279963e-01]
Relative errors:
 [ 3.97142155e-04  1.49894366e-04  4.79676516e-04  1.11497071e-04
  4.71911560e-05 -8.42911801e-04 -8.91616158e-05  3.79195919e-05]

This is reassuring.

from scikit-survival.

leihuang commented on June 3, 2024 1

Now that the two models can give the same results, we can benchmark their performance and I find that CoxnetSurvivalAnalysis is about 35X faster than CoxPHSurvivalAnalysis. Given that CoxnetSurvivalAnalysis also implements a more general model (L1+L2 regularization), it makes sense to use it only for now, until one needs to dig into the details of the algorithm execution (where CoxPHSurvivalAnalysis provides better transparency).

from scikit-survival.

sebp commented on June 3, 2024

CoxPHSurvivalAnalysis and CoxnetSurvivalAnalysis are related but do not implement the same model. As you summarized, the difference is in penalty terms. Thus, the coefficients found by CoxPHSurvivalAnalysis and CoxnetSurvivalAnalysis will differ. In addition, CoxnetSurvivalAnalysis supports fitting a path of coefficients for varying penalization strength alpha without much computational overhead. CoxPHSurvivalAnalysis will return the result for a single a penalization strength alpha only. This is the main reason why constructors differ.

The fact that predict_survival_function is not available for CoxnetSurvivalAnalysis is an oversight which should be corrected.

from scikit-survival.

leihuang commented on June 3, 2024

Since CoxnetSurvivalAnalysis supports L1 and L2 regularization while CoxPHSurvivalAnalysis supports only L2 regularization, by carefully matching the model parameters they should return the same results, and this is what the codes above try to do. So, why do CoxPHSurvivalAnalysis(alpha=0.5, n_iter=100000) and CoxnetSurvivalAnalysis(alphas=[0.5], alpha_min_ratio=1, n_alphas=1, l1_ratio=1e-16, tol=1e-09, normalize=False) not give the same results?

from scikit-survival.

sebp commented on June 3, 2024

You cannot completely eliminate the l1 penalty in CoxnetSurvivalAnalysis, thus you aren't able to exactly match the result of CoxnetSurvivalAnalysis.

from scikit-survival.

leihuang commented on June 3, 2024

I think it makes sense to devote this issue to the different results between the two models, so I just changed the title back, and will close this thread now that it's resolved. Given that CoxnetSurvivalAnalysis seems a more desirable implementation as I concluded above, it is important to have those methods that are so far only available to CoxPHSurvivalAnalysis also available to CoxnetSurvivalAnalysis. I will open a different issue for that.

from scikit-survival.

plpxsk commented on June 3, 2024

Great work @leihuang !

from scikit-survival.

WellJoea commented on June 3, 2024

So which coef was the best recommended used for model？
the first one or the last one？
g.coef_[:,0] or g.coef_[:,-1]

from scikit-survival.

sebp commented on June 3, 2024

It depends on your application really. Usually, you want to do cross-validation to determine the optimal alpha and thus coefficients. For an example, see e.g. https://gist.github.com/sebp/d580d44c4beab3379c6dfda6810b33b8

from scikit-survival.

WellJoea commented on June 3, 2024

I see. Thank you very much!

from scikit-survival.

Different results of CoxPHSurvivalAnalysis and CoxnetSurvivalAnalysis about scikit-survival HOT 10 CLOSED

Comments (10)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent