sebp / scikit-survival

Survival analysis built on top of scikit-learn

License: GNU General Public License v3.0

Shell 0.43% Python 91.82% C++ 4.64% C 0.08% Cython 3.00% PowerShell 0.05%

survival-analysis machine-learning python scikit-learn

scikit-survival's Introduction

scikit-survival

scikit-survival is a Python module for survival analysis built on top of scikit-learn. It allows doing survival analysis while utilizing the power of scikit-learn, e.g., for pre-processing or doing cross-validation.

About Survival Analysis

The objective in survival analysis (also referred to as time-to-event or reliability analysis) is to establish a connection between covariates and the time of an event. What distinguishes survival analysis from traditional machine learning is that parts of the training data can only be partially observed – they are censored.

For instance, in a clinical study, patients are often monitored for a particular time period, and events occurring in this period are recorded. If a patient experiences an event, the exact time of the event can be recorded – the patient’s record is uncensored. In contrast, right-censored records refer to patients who remained event-free during the study period, so it is unknown whether an event occurred after the study ended. Consequently, survival analysis demands models that take this unique characteristic of such a dataset into account.
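
As a minimal sketch of what right censoring looks like in data, each record can be stored as an event indicator plus an observed time in a NumPy structured array. The patient values, the 365-day study length, and the field names `event` and `time` are assumptions made up for this illustration.

```python
import numpy as np

# Hypothetical study that follows patients for at most 365 days.
study_length = 365.0
# Made-up true times to event; in practice these are unknown for anyone
# whose event falls after the end of the study.
true_event_time = np.array([120.0, 400.0, 365.0, 900.0])

# The event is observed only if it happens within the study period.
event_observed = true_event_time <= study_length
# For censored patients we only know they were event-free until the study ended.
observed_time = np.minimum(true_event_time, study_length)

# Store both pieces of information side by side.
y = np.empty(len(observed_time), dtype=[("event", bool), ("time", float)])
y["event"] = event_observed
y["time"] = observed_time

print(y)
```

The second and fourth records are right censored: their stored time is the end of follow-up, not the time of the event.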

Requirements

Python 3.9 or later
ecos
joblib
numexpr
numpy
osqp
pandas 1.0.5 or later
scikit-learn 1.4
scipy
C/C++ compiler

Installation

The easiest way to install scikit-survival is to use Anaconda by running:

conda install -c sebp scikit-survival

Alternatively, you can install scikit-survival from source following this guide.

Examples

The user guide provides in-depth information on the key concepts of scikit-survival, an overview of available survival models, and hands-on examples in the form of Jupyter notebooks.

Help and Support

Documentation

HTML documentation for the latest release: https://scikit-survival.readthedocs.io/en/stable/
HTML documentation for the development version (master branch): https://scikit-survival.readthedocs.io/en/latest/
For a list of notable changes, see the release notes.

Bug reports

If you encounter a problem, please submit a bug report.

Questions

If you have a question on how to use scikit-survival, please use GitHub Discussions.
For general theoretical or methodological questions on survival analysis, please use Cross Validated.

Contributing

New contributors are always welcome. Please have a look at the contributing guidelines on how to get started and to make sure your code complies with our guidelines.

References

Please cite the following paper if you are using scikit-survival.

S. Pölsterl, "scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn," Journal of Machine Learning Research, vol. 21, no. 212, pp. 1–6, 2020.

@article{sksurv,
  author  = {Sebastian P{\"o}lsterl},
  title   = {scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {212},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v21/20-729.html}
}

scikit-survival's Issues

Survival Data sources

Hello,
Could you point me to some survival datasets on machines or equipment or even a webpage that I can scrape the details from?

Cox model with time varying covariates

Hi
Are there plans to add the Cox proportional hazard model with time varying covariates to the package?

cannot import name 'is_categorical_dtype' from 'pandas.core.common'

Pandas 0.18?

Is there a reason pandas 0.18 is required and not >=0.18?

Random forest survival for feature selection?

It seems that there should be an implementation of ensemble methods, but I cannot find the documentation for it. How can I make this work?
Thank you!

Add predict_survival_function and predict_cumulative_hazard_function to CoxnetSurvivalAnalysis

The two methods are currently only available to CoxPHSurvivalAnalysis, and it is desirable to have them available to CoxnetSurvivalAnalysis as well.

Discuss/request: required y survival status can be 0/1 integer in addition to bool

Currently, outcome y is required to be a structured array as (event, time) with boolean event.

I'd like to discuss/request whether we can also accept a 0/1 integer event status, in addition to a boolean type.

Motivation:

Sometimes I analyze the dataset in R and sometimes in Python. R accepts both boolean and 0/1 integers in survival analysis (see Surv's event).

It would be a good bonus to have sksurv also support both dtypes. Curious about others' thoughts.
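
Until such support exists, the conversion can be done on the caller's side. The helper below is a hypothetical sketch (its name and the field names `event` and `time` are assumptions, not part of scikit-survival): it accepts either a boolean or a strictly 0/1 status and returns a structured array with a boolean event field.

```python
import numpy as np

def to_structured_outcome(status, time):
    """Build an (event, time) structured array from a 0/1 or boolean status."""
    status = np.asarray(status)
    if status.dtype != bool:
        # Reject anything that is not strictly binary before casting.
        if not np.isin(status, (0, 1)).all():
            raise ValueError("status must contain only 0/1 or boolean values")
        status = status.astype(bool)
    y = np.empty(status.shape[0], dtype=[("event", bool), ("time", float)])
    y["event"] = status
    y["time"] = np.asarray(time, dtype=float)
    return y

# Example with R-style 0/1 event codes.
y = to_structured_outcome([1, 0, 1], [5.0, 12.0, 3.5])
print(y)
```

Checking for strictly binary values before casting avoids silently treating codes such as 2 as an event.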

wrong description

load_veterans_lung_cancer()
has the description: "Load and return the Worcester Heart Attack Study dataset"
I think it is a copy paste error (load_whas500() has the same description).

GradientBoostingSurvivalAnalysis needs to use min_impurity_decrease instead of min_impurity_split

Here's the error:


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-89-df62d5a185bd> in <module>()
----> 1 estimator = GradientBoostingSurvivalAnalysis(n_estimators=1000, random_state=0)

~/.venv/201710_price_opt_fake_data/lib/python3.4/site-packages/sksurv/ensemble/boosting.py in __init__(self, loss, learning_rate, n_estimators, criterion, min_samples_split, min_samples_leaf, min_weight_fraction_leaf, max_depth, min_impurity_split, random_state, max_features, max_leaf_nodes, subsample, dropout_rate, verbose)
    486                          max_features=max_features,
    487                          max_leaf_nodes=max_leaf_nodes,
--> 488                          verbose=verbose)
    489         self.dropout_rate = dropout_rate
    490 

TypeError: __init__() missing 1 required positional argument: 'min_impurity_decrease'

Seems that the call to Tree-based models must now use min_impurity_decrease instead of min_impurity_split. See the release note here: http://scikit-learn.org/stable/whats_new.html

Infinite loop in sksurv.linear_model.CoxPHSurvivalAnalysis.fit() if loss doesn't decrease

In file coxph.py, CoxPHSurvivalAnalysis.fit():
If loss_new > loss always, the check
elif i >= self.n_iter
is never performed and we don't break the loop after n_iter iterations.

Also verbose mode isn't informative enough.
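
One way to rule out this kind of non-termination is to let the iteration cap bound every branch of the loop, including the one taken when the loss increases. The following is a generic sketch of that pattern with step-halving, not the code in coxph.py; the function name, its arguments, and the toy loss functions are assumptions for illustration.

```python
def fit_with_iteration_cap(loss_fn, newton_step, w0, n_iter=100, tol=1e-9):
    """Newton-style updates with step-halving, bounded by n_iter in all cases."""
    w, loss = w0, loss_fn(w0)
    scale = 1.0
    for i in range(1, n_iter + 1):  # the cap applies to every branch below
        w_new = w - scale * newton_step(w)
        loss_new = loss_fn(w_new)
        if loss_new > loss:
            # Loss went up: retry from the same point with half the step.
            # This still consumes one iteration of the budget.
            scale *= 0.5
            continue
        converged = loss - loss_new < tol
        w, loss, scale = w_new, loss_new, 1.0
        if converged:
            return w, i, True
    return w, n_iter, False

# Well-behaved case: quadratic loss with its exact Newton step.
w_ok, iters_ok, conv_ok = fit_with_iteration_cap(
    lambda w: (w - 3.0) ** 2, lambda w: w - 3.0, w0=0.0)

# Pathological case: every proposed step increases the loss.
w_bad, iters_bad, conv_bad = fit_with_iteration_cap(
    lambda w: w * w, lambda w: -1.0, w0=1.0, n_iter=20)

print(w_ok, iters_ok, conv_ok)
print(w_bad, iters_bad, conv_bad)
```

Returning the iteration count and a convergence flag also gives a verbose mode something concrete to report.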

Installation Problem

I tried to install it but I get this error (I'm using Windows and Python 3.6 x64):

running install
running bdist_egg
running egg_info
running build_src
build_src
building extension "sksurv.bintrees._binarytrees" sources
building extension "sksurv.ensemble._coxph_loss" sources
building extension "sksurv.kernels._clinical_kernel" sources
building extension "sksurv.linear_model._coxnet" sources
building extension "sksurv.svm._prsvm" sources
building extension "sksurv.svm._minlip" sources
building data_files sources
build_src: building npy-pkg config files
writing scikit_survival.egg-info\PKG-INFO
writing dependency_links to scikit_survival.egg-info\dependency_links.txt
writing requirements to scikit_survival.egg-info\requires.txt
writing top-level names to scikit_survival.egg-info\top_level.txt
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.py' under directory 'examples'
warning: no files found matching '*.pxd' under directory 'sksurv'
warning: no directories found matching 'sksurv\linear_model\src\eigen\Eigen'
no previously-included directories found matching 'doc\_build'
no previously-included directories found matching 'doc\generated'
no previously-included directories found matching '*\__pycache__'
no previously-included directories found matching '*\.ipynb_checkpoints'
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '*~' found anywhere in distribution
warning: no previously-included files matching '*.bak' found anywhere in distribution
warning: no previously-included files matching '*.swp' found anywhere in distribution
warning: no previously-included files matching '*.pyo' found anywhere in distribution
writing manifest file 'scikit_survival.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
running build_ext
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
customize MSVCCompiler using build_ext
No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
customize MSVCCompiler
Missing compiler_cxx fix for MSVCCompiler
customize MSVCCompiler using build_ext
building 'sksurv.ensemble._coxph_loss' extension
compiling C sources
C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.14.26428\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Users\talat\PycharmProjects\SurvivalAnalysis_Lifetime_Example\lib\site-packages\numpy\core\include -IC:\Users\talat\PycharmProjects\SurvivalAnalysis_Lifetime_Example\lib\site-packages\numpy\core\include -IC:\Users\talat\PycharmProjects\SurvivalAnalysis_Lifetime_Example\include -IC:\Users\talat\AppData\Local\Programs\Python\Python36\include -IC:\Users\talat\AppData\Local\Programs\Python\Python36\include -I"C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.14.26428\include" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\ucrt" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\shared" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\um" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\winrt" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.17134.0\cppwinrt" /Tcsksurv\ensemble\_coxph_loss.c /Fobuild\temp.win-amd64-3.6\Release\sksurv\ensemble\_coxph_loss.obj
Traceback (most recent call last):
  File "setup.py", line 98, in <module>
    setup_package()
  File "setup.py", line 90, in setup_package
    setup(**metadata)
  File "C:\Users\talat\PycharmProjects\SurvivalAnalysis_Lifetime_Example\lib\site-packages\numpy\distutils\core.py", line 169, in setup
    return old_setup(**new_attr)
  File "C:\Users\talat\PycharmProjects\SurvivalAnalysis_Lifetime_Example\lib\site-packages\setuptools-39.1.0-py3.6.egg\setuptools\__init__.py", line 129, in setup
  File "C:\Users\talat\AppData\Local\Programs\Python\Python36\lib\distutils\core.py", line 148, in setup
    dist.run_commands()
  File "C:\Users\talat\AppData\Local\Programs\Python\Python36\lib\distutils\dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "C:\Users\talat\AppData\Local\Programs\Python\Python36\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "C:\Users\talat\PycharmProjects\SurvivalAnalysis_Lifetime_Example\lib\site-packages\numpy\distutils\command\install.py", line 62, in run
    r = self.setuptools_run()
  File "C:\Users\talat\PycharmProjects\SurvivalAnalysis_Lifetime_Example\lib\site-packages\numpy\distutils\command\install.py", line 56, in setuptools_run
    self.do_egg_install()
  File "C:\Users\talat\PycharmProjects\SurvivalAnalysis_Lifetime_Example\lib\site-packages\setuptools-39.1.0-py3.6.egg\setuptools\command\install.py", line 109, in do_egg_install
  File "C:\Users\talat\AppData\Local\Programs\Python\Python36\lib\distutils\cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "C:\Users\talat\AppData\Local\Programs\Python\Python36\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "C:\Users\talat\PycharmProjects\SurvivalAnalysis_Lifetime_Example\lib\site-packages\setuptools-39.1.0-py3.6.egg\setuptools\command\bdist_egg.py", line 172, in run
  File "C:\Users\talat\PycharmProjects\SurvivalAnalysis_Lifetime_Example\lib\site-packages\setuptools-39.1.0-py3.6.egg\setuptools\command\bdist_egg.py", line 158, in call_command
  File "C:\Users\talat\AppData\Local\Programs\Python\Python36\lib\distutils\cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "C:\Users\talat\AppData\Local\Programs\Python\Python36\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "C:\Users\talat\PycharmProjects\SurvivalAnalysis_Lifetime_Example\lib\site-packages\setuptools-39.1.0-py3.6.egg\setuptools\command\install_lib.py", line 11, in run
  File "C:\Users\talat\AppData\Local\Programs\Python\Python36\lib\distutils\command\install_lib.py", line 107, in build
    self.run_command('build_ext')
  File "C:\Users\talat\AppData\Local\Programs\Python\Python36\lib\distutils\cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "C:\Users\talat\AppData\Local\Programs\Python\Python36\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "C:\Users\talat\PycharmProjects\SurvivalAnalysis_Lifetime_Example\lib\site-packages\numpy\distutils\command\build_ext.py", line 262, in run
    self.build_extensions()
  File "C:\Users\talat\AppData\Local\Programs\Python\Python36\lib\distutils\command\build_ext.py", line 448, in build_extensions
    self._build_extensions_serial()
  File "C:\Users\talat\AppData\Local\Programs\Python\Python36\lib\distutils\command\build_ext.py", line 473, in _build_extensions_serial
    self.build_extension(ext)
  File "C:\Users\talat\PycharmProjects\SurvivalAnalysis_Lifetime_Example\lib\site-packages\numpy\distutils\command\build_ext.py", line 370, in build_extension
    **kws)
  File "C:\Users\talat\AppData\Local\Programs\Python\Python36\lib\distutils\_msvccompiler.py", line 423, in compile
    self.spawn(args)
  File "C:\Users\talat\AppData\Local\Programs\Python\Python36\lib\distutils\_msvccompiler.py", line 542, in spawn
    return super().spawn(cmd)
  File "C:\Users\talat\PycharmProjects\SurvivalAnalysis_Lifetime_Example\lib\site-packages\numpy\distutils\ccompiler.py", line 89, in <lambda>
    m = lambda self, *args, **kw: func(self, *args, **kw)
  File "C:\Users\talat\PycharmProjects\SurvivalAnalysis_Lifetime_Example\lib\site-packages\numpy\distutils\ccompiler.py", line 139, in CCompiler_spawn
    s, o = exec_command(cmd)
  File "C:\Users\talat\PycharmProjects\SurvivalAnalysis_Lifetime_Example\lib\site-packages\numpy\distutils\exec_command.py", line 213, in exec_command
    **env)
  File "C:\Users\talat\PycharmProjects\SurvivalAnalysis_Lifetime_Example\lib\site-packages\numpy\distutils\exec_command.py", line 256, in _exec_command
    text, err = proc.communicate()
  File "C:\Users\talat\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 830, in communicate
    stdout = self.stdout.read()
  File "C:\Users\talat\AppData\Local\Programs\Python\Python36\lib\encodings\cp1254.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 322: character maps to <undefined>
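
The last frames of the traceback show the compiler's output being decoded with the cp1254 code page, which has no character for byte 0x8d. The snippet below only reproduces that failure mode in isolation and shows how Python's `errors="replace"` handler behaves; it is not a patch for numpy.distutils, and the byte string is a made-up stand-in for the real compiler output.

```python
# Made-up compiler output containing a byte that cp1254 cannot map.
raw_output = b"cl.exe output \x8d"

# Strict decoding (the default) raises, as in the traceback above.
try:
    raw_output.decode("cp1254")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
lenient = raw_output.decode("cp1254", errors="replace")

print(strict_ok, repr(lenient))
```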

CoxPH SurvivalAnalysis and Singular Matrix Error

I'm going through the tutorial that uses the veterans lung cancer study, and I am using the same code for Cox regression on my own dataset. My problem is to calculate the days to graft failure after a transplant. The dataset has about 900 features after encoding and other preprocessing steps, and 130K rows. I prepared the data for Cox regression (data_x is a dataframe and data_y is a numpy array of status and survival_in_days) and took a sample of it to run. However, when I run the Cox regression, I get the error:
LinAlgError:Matrix is Singular
I manipulated my data in different ways, but I could not work out what the problem is or how to solve it.
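
The issue does not show the data, so the cause can only be guessed at. A frequent source of a singular matrix with many encoded features is columns that are constant or exact linear combinations of each other, for example a categorical variable for which every level received its own dummy column. The sketch below is a hypothetical diagnostic on a made-up matrix that looks for both conditions.

```python
import numpy as np

def find_degenerate_columns(X, tol=1e-12):
    """Return indices of constant columns and the rank of the centered matrix."""
    X = np.asarray(X, dtype=float)
    constant = np.flatnonzero(X.std(axis=0) <= tol)
    # Centering removes the column means, so columns that differ only by a
    # constant offset (such as complementary dummies) show up as rank loss.
    rank = np.linalg.matrix_rank(X - X.mean(axis=0))
    return constant, rank

# Made-up design matrix: columns 1 and 2 are a fully one-hot encoded binary
# feature (they always sum to 1), and column 3 is constant.
X = np.array([
    [0.5, 1.0, 0.0, 7.0],
    [1.5, 0.0, 1.0, 7.0],
    [2.0, 1.0, 0.0, 7.0],
    [3.5, 0.0, 1.0, 7.0],
    [4.0, 1.0, 0.0, 7.0],
])
constant, rank = find_degenerate_columns(X)
print(constant, rank, X.shape[1])
```

A rank below the number of columns suggests dropping redundant columns (for instance one dummy per categorical variable) before fitting. Sampling rows can make this worse, because rare levels may become constant in the sample.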

Move datasets into a separate package

Since the datasets may be useful for other packages too.

Add CoxnetSurvivalAnalysisCV to search for optimal alpha via cross-validation

Curious to know if it is possible to extract the sparse features in a Cox model from L1 models.

when applying this:

pipe = make_pipeline(
    StandardScaler(),
    CoxnetSurvivalAnalysis(l1_ratio=1)
)

sfm = SelectFromModel(pipe)
sfm.fit(X_train, y_train)

n_features = sfm.transform(X).shape[1]

... I get:

ValueError: The underlying estimator Pipeline has no `coef_` or `feature_importances_` attribute.
Either pass a fitted estimator to SelectFromModel or call fit before calling transform.

Thanks for all the input!
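
The error message indicates that SelectFromModel looks for `coef_` or `feature_importances_` on the object it was given, which here is the Pipeline rather than its final step. As a rough illustration of the selection step itself, and not of scikit-learn's or scikit-survival's API, the sketch below keeps the columns whose coefficient is non-zero. The coefficient vector, the data, and the threshold are made-up values standing in for a fitted L1-penalized model.

```python
import numpy as np

# Hypothetical coefficients of a fitted L1-penalized model (one per feature).
coef = np.array([0.0, 0.42, 0.0, -0.13, 0.0])
# Made-up feature matrix with 3 samples and 5 features.
X = np.arange(15, dtype=float).reshape(3, 5)

# Treat coefficients below a small threshold as exactly zero.
threshold = 1e-8
support = np.abs(coef) > threshold

# Keep only the columns the sparse model actually uses.
X_selected = X[:, support]
n_features = X_selected.shape[1]

print(np.flatnonzero(support), n_features)
```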

Explain how to interpret output of .predict() in API doc

(I also posted this as a question on Stack Overflow: https://stackoverflow.com/q/47274356/1870832 )

I'm confused how to interpret the output of .predict from a fitted CoxnetSurvivalAnalysis model in scikit-survival. I've read through the notebook Intro to Survival Analysis in scikit-survival and the API reference, but can't find an explanation. Below is a minimal example of what leads to my confusion:

import pandas as pd
from sksurv.datasets import load_veterans_lung_cancer
from sksurv.linear_model import CoxnetSurvivalAnalysis

# load data
data_X, data_y = load_veterans_lung_cancer()

# one-hot-encode categorical columns in X
categorical_cols = ['Celltype', 'Prior_therapy', 'Treatment']

X = data_X.copy()
for c in categorical_cols:
    dummy_matrix = pd.get_dummies(X[c], prefix=c, drop_first=False)
    X = pd.concat([X, dummy_matrix], axis=1).drop(c, axis=1)

# display final X to fit Cox Elastic Net model on
del data_X
print(X.head(3))

so here's the X going into the model:

   Age_in_years  Celltype  Karnofsky_score  Months_from_Diagnosis  \
0          69.0  squamous             60.0                    7.0   
1          64.0  squamous             70.0                    5.0   
2          38.0  squamous             60.0                    3.0   

  Prior_therapy Treatment  
0            no  standard  
1           yes  standard  
2            no  standard

...moving on to fitting model and generating predictions:

# Fit Model
coxnet = CoxnetSurvivalAnalysis()
coxnet.fit(X, data_y)

# What are these predictions?
preds = coxnet.predict(X)

preds has same number of records as X, but their values are wayyy different than the values in data_y, even when predicted on the same data they were fit on.

print(preds.mean()) 
print(data_y['Survival_in_days'].mean())

output:

-0.044114643249153422
121.62773722627738

So what exactly are preds? Clearly .predict means something pretty different here than in scikit-learn, but I can't figure out what. The API Reference says it returns "The predicted decision function," but what does that mean? And how do I get to the predicted estimate in months yhat for a given X? I'm new to survival analysis so I'm obviously missing something.
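
For Cox-type models the prediction is usually a risk score on an arbitrary scale (the linear predictor, i.e. a log hazard ratio) rather than a time: a higher score means the event is expected sooner. The sketch below illustrates that reading with made-up coefficients and data, and scores the ranking with a simple concordance measure. The function is a simplified illustration written for this example, not the library's implementation.

```python
import numpy as np

def concordance(event, time, risk):
    """Fraction of comparable pairs that the risk score orders correctly.

    A pair (i, j) is comparable if i had an observed event before time[j].
    """
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored record cannot be the earlier one in a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Made-up coefficients and features; risk = X @ beta is a log hazard ratio,
# so its mean has no reason to resemble the mean survival time.
beta = np.array([0.8, -0.5])
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 0.0]])
risk = X @ beta
time = np.array([30.0, 400.0, 150.0, 10.0])
event = np.array([True, False, True, True])

print(risk)
print(concordance(event, time, risk))
```

Here the record with the highest score has the shortest time, so every comparable pair is ordered correctly.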

Add predict_cumulative_hazard_function() and predict_survival_function() to all algorithms

CoxPHSurvivalAnalysis has these 2 methods, predict_cumulative_hazard_function() and predict_survival_function(). Please add them to CoxnetSurvivalAnalysis, GradientBoostingSurvivalAnalysis, FastSurvivalSVM, etc.

I looked up the source code in coxph.py, but would appreciate pointers on how to implement the same thing for other classes, thanks for a great package!
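
One possible starting point, assuming a model's prediction can be treated as a log hazard ratio under proportional hazards, is Breslow's estimator of the cumulative baseline hazard. The sketch below is written from the textbook formula with hypothetical function names. It is not taken from coxph.py, and whether the assumption is reasonable for models such as FastSurvivalSVM is a separate modeling question.

```python
import numpy as np

def breslow_cumulative_baseline_hazard(event, time, risk_score):
    """Breslow estimate of the cumulative baseline hazard at each event time."""
    event = np.asarray(event, dtype=bool)
    time = np.asarray(time, dtype=float)
    exp_risk = np.exp(np.asarray(risk_score, dtype=float))

    event_times = np.unique(time[event])
    increments = np.empty(len(event_times))
    for k, t in enumerate(event_times):
        n_events = np.sum(event & (time == t))   # events observed at time t
        at_risk = exp_risk[time >= t].sum()      # weighted size of the risk set
        increments[k] = n_events / at_risk
    return event_times, np.cumsum(increments)

def survival_probability(event_times, cum_hazard, risk_score, t):
    """S(t | x) = exp(-H0(t) * exp(risk_score)), with H0 as a step function."""
    idx = np.searchsorted(event_times, t, side="right") - 1
    h0 = cum_hazard[idx] if idx >= 0 else 0.0
    return float(np.exp(-h0 * np.exp(risk_score)))

# Made-up training outcomes and risk scores.
event = [True, False, True, True]
time = [1.0, 2.0, 3.0, 4.0]
risk = [0.3, -0.2, 0.0, 0.5]
event_times, cum_hazard = breslow_cumulative_baseline_hazard(event, time, risk)
print(event_times, cum_hazard)
print(survival_probability(event_times, cum_hazard, risk_score=0.0, t=3.0))
```

With all risk scores equal to zero the estimate reduces to the Nelson-Aalen estimator, which is a convenient sanity check.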

Support for scikit-learn 0.20

issue when try to install scikit-survival in windows 10

Hi, I have no issues installing it on Linux, but at the university I am working on a Windows machine, and there the conda install command does not work for me. Do I need to change the channel? I also tried installing with pip, but unfortunately that did not work either. Could you share a binary? I think that would be the only way around this.

Anyway, I want to congratulate the team that is working on this! Good job!

It is possible to plug in xgboost here, but is it worth it?

On my task it was even a bit slower than sklearn, since the bottleneck is not in tree boosting. But it can deal with missing features.

# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
import numbers

import numpy

from sklearn.base import BaseEstimator
from sklearn.ensemble._gradient_boosting import _random_sample_mask
from sklearn.ensemble.base import BaseEnsemble
from sklearn.ensemble.gradient_boosting import BaseGradientBoosting, VerboseReporter
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree._tree import DTYPE
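import xgboost

# Excerpt of a modified sksurv/ensemble/boosting.py: SurvivalAnalysisMixin,
# ZeroSurvivalEstimator, LOSS_FUNCTIONS and _sample_binomial_plus_one are
# defined or imported elsewhere in that module and are not shown here.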

class GradientBoostingSurvivalAnalysis(BaseGradientBoosting, SurvivalAnalysisMixin):
    """Gradient-boosted Cox proportional hazard loss with
    regression trees as base learner.

    Parameters
    ----------
    loss : {'coxph', 'squared', 'ipcwls'}, optional, default: 'coxph'
        loss function to be optimized. 'coxph' refers to partial likelihood loss
        of Cox's proportional hazards model. The loss 'squared' minimizes a
        squared regression loss that ignores predictions beyond the time of censoring,
        and 'ipcwls' refers to inverse-probability of censoring weighted least squares error.

    learning_rate : float, optional, default: 0.1
        learning rate shrinks the contribution of each tree by `learning_rate`.
        There is a trade-off between learning_rate and n_estimators.

    n_estimators : int, default: 100
        The number of boosting stages to perform. Gradient boosting
        is fairly robust to over-fitting so a large number usually
        results in better performance.

    max_depth : integer, optional, default: 3
        maximum depth of the individual regression estimators. The maximum
        depth limits the number of nodes in the tree. Tune this parameter
        for best performance; the best value depends on the interaction
        of the input variables.
        Ignored if ``max_leaf_nodes`` is not None.

    min_samples_split : integer, optional, default: 2
        The minimum number of samples required to split an internal node.

    min_samples_leaf : integer, optional, default: 1
        The minimum number of samples required to be at a leaf node.

    min_weight_fraction_leaf : float, optional, default: 0.
        The minimum weighted fraction of the input samples required to be at a
        leaf node.

    max_features : int, float, string or None, optional, default: None
        The number of features to consider when looking for the best split:
          - If int, then consider `max_features` features at each split.
          - If float, then `max_features` is a percentage and
            `int(max_features * n_features)` features are considered at each
            split.
          - If "auto", then `max_features=n_features`.
          - If "sqrt", then `max_features=sqrt(n_features)`.
          - If "log2", then `max_features=log2(n_features)`.
          - If None, then `max_features=n_features`.

        Choosing `max_features < n_features` leads to a reduction of variance
        and an increase in bias.

        Note: the search for a split does not stop until at least one
        valid partition of the node samples is found, even if it requires to
        effectively inspect more than ``max_features`` features.

    max_leaf_nodes : int or None, optional, default: None
        Grow trees with ``max_leaf_nodes`` in best-first fashion.
        Best nodes are defined as relative reduction in impurity.
        If None then unlimited number of leaf nodes.

    subsample : float, optional, default: 1.0
        The fraction of samples to be used for fitting the individual base
        learners. If smaller than 1.0 this results in Stochastic Gradient
        Boosting. `subsample` interacts with the parameter `n_estimators`.
        Choosing `subsample < 1.0` leads to a reduction of variance
        and an increase in bias.

    dropout_rate : float, optional, default: 0.0
        If larger than zero, the residuals at each iteration are only computed
        from a random subset of base learners. The value corresponds to the
        percentage of base learners that are dropped. In each iteration,
        at least one base learner is dropped. This is an alternative regularization
        to shrinkage, i.e., setting `learning_rate < 1.0`.

    random_state : int seed, RandomState instance, or None, default: None
        The seed of the pseudo random number generator to use when
        shuffling the data.

    verbose : int, default: 0
        Enable verbose output. If 1 then it prints progress and performance
        once in a while (the more trees the lower the frequency). If greater
        than 1 then it prints progress and performance for every tree.


    Attributes
    ----------
    feature_importances\_ : ndarray, shape = (n_features,)
        The feature importances (the higher, the more important the feature).

    estimators_ : ndarray of DecisionTreeRegressor, shape = (n_estimators, 1)
        The collection of fitted sub-estimators.

    train_score_ : ndarray, shape = (n_estimators,)
        The i-th score ``train_score_[i]`` is the deviance (= loss) of the
        model at iteration ``i`` on the in-bag sample.
        If ``subsample == 1`` this is the deviance on the training data.

    oob_improvement_ : ndarray, shape = (n_estimators,)
        The improvement in loss (= deviance) on the out-of-bag samples
        relative to the previous iteration.
        ``oob_improvement_[0]`` is the improvement in
        loss of the first stage over the ``init`` estimator.
    """
    def __init__(self, loss="coxph", learning_rate=0.1, n_estimators=100,
                 criterion='friedman_mse',
                 min_samples_split=2,
                 min_samples_leaf=1, min_weight_fraction_leaf=0.,
                 max_depth=3, min_impurity_split=None,
                 min_impurity_decrease=0., random_state=None,
                 max_features=None, max_leaf_nodes=None,
                 subsample=1.0, dropout_rate=0.0,
                 verbose=0):
        super().__init__(loss=loss,
                         learning_rate=learning_rate,
                         n_estimators=n_estimators,
                         subsample=subsample,
                         criterion=criterion,
                         min_samples_split=min_samples_split,
                         min_samples_leaf=min_samples_leaf,
                         min_weight_fraction_leaf=min_weight_fraction_leaf,
                         max_depth=max_depth,
                         min_impurity_split=min_impurity_split,
                         min_impurity_decrease=min_impurity_decrease,
                         init=ZeroSurvivalEstimator(),
                         random_state=random_state,
                         max_features=max_features,
                         max_leaf_nodes=max_leaf_nodes,
                         verbose=verbose)
        self.dropout_rate = dropout_rate

    def _check_params(self):
        """Check validity of parameters and raise ValueError if not valid. """
        self.n_estimators = int(self.n_estimators)
        if self.n_estimators <= 0:
            raise ValueError("n_estimators must be greater than 0 but "
                             "was %r" % self.n_estimators)

        if not 0.0 < self.learning_rate <= 1.0:
            raise ValueError("learning_rate must be within ]0; 1] but "
                             "was %r" % self.learning_rate)

        if not 0.0 < self.subsample <= 1.0:
            raise ValueError("subsample must be in ]0; 1] but "
                             "was %r" % self.subsample)

        if not 0.0 <= self.dropout_rate < 1.0:
            raise ValueError("dropout_rate must be within [0; 1[, but "
                             "was %r" % self.dropout_rate)

        if isinstance(self.max_features, str):
            if self.max_features == "auto":
                max_features = self.n_features_
            elif self.max_features == "sqrt":
                max_features = max(1, int(numpy.sqrt(self.n_features_)))
            elif self.max_features == "log2":
                max_features = max(1, int(numpy.log2(self.n_features_)))
            else:
                raise ValueError("Invalid value for max_features: %r. "
                                 "Allowed string values are 'auto', 'sqrt' "
                                 "or 'log2'." % self.max_features)
        elif self.max_features is None:
            max_features = self.n_features_
        elif isinstance(self.max_features, (numbers.Integral, numpy.integer)):
            if self.max_features < 1:
                raise ValueError("max_features must be in (0, n_features]")
            max_features = self.max_features
        else:  # float
            if 0. < self.max_features <= 1.:
                max_features = max(int(self.max_features * self.n_features_), 1)
            else:
                raise ValueError("max_features must be in (0, 1.0]")

        self.min_samples_split = int(self.min_samples_split)
        self.min_samples_leaf = int(self.min_samples_leaf)
        self.max_depth = int(self.max_depth)
        if self.max_leaf_nodes:
            self.max_leaf_nodes = int(self.max_leaf_nodes)

        self.max_features_ = max_features

        if self.loss not in LOSS_FUNCTIONS:
            raise ValueError("Loss '{0:s}' not supported. ".format(self.loss))

    def _fit_stage(self, i, X, y, y_pred, sample_weight, sample_mask,
                   random_state, scale, X_idx_sorted, X_csc=None, X_csr=None):
        """Fit another stage of ``n_classes_`` trees to the boosting model. """

        # compare against the builtin bool; the numpy.bool alias is not
        # available in every NumPy release
        assert sample_mask.dtype == bool
        loss = self.loss_

        # whether to use dropout in next iteration
        do_dropout = self.dropout_rate > 0. and 0 < i < len(scale) - 1

        for k in range(loss.K):
            residual = loss.negative_gradient(y, y_pred, k=k,
                                              sample_weight=sample_weight)

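            # induce regression learner on residuals; the keyword arguments
            # below are DecisionTreeRegressor parameters, and apart from
            # max_depth XGBRegressor may not recognize them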
            tree = xgboost.sklearn.XGBRegressor(
                criterion='friedman_mse',
                splitter='best',
                max_depth=self.max_depth,
                min_samples_split=self.min_samples_split,
                min_samples_leaf=self.min_samples_leaf,
                min_weight_fraction_leaf=self.min_weight_fraction_leaf,
                min_impurity_split=self.min_impurity_split,
                max_features=self.max_features,
                max_leaf_nodes=self.max_leaf_nodes)

            if self.subsample < 1.0:
                # no inplace multiplication!
                sample_weight = sample_weight * sample_mask.astype(numpy.float64)

            if X_csc is not None:
                tree.fit(X_csc, residual, sample_weight=sample_weight,
                         #check_input=False, X_idx_sorted=X_idx_sorted)
                        )
            else:
                tree.fit(X, residual, sample_weight=sample_weight,
                         #check_input=False, X_idx_sorted=X_idx_sorted)
                        )

            # add tree to ensemble
            self.estimators_[i, k] = tree

            # update tree leaves
            if do_dropout:
                # select base learners to be dropped for next iteration
                drop_model, n_dropped = _sample_binomial_plus_one(self.dropout_rate, i + 1, random_state)

                # adjust scaling factor of tree that is going to be trained in next iteration
                scale[i + 1] = 1. / (n_dropped + 1.)

                y_pred[:, k] = 0
                for m in range(i + 1):
                    if drop_model[m] == 1:
                        # adjust scaling factor of dropped trees
                        scale[m] *= n_dropped / (n_dropped + 1.)
                    else:
                        # pseudoresponse of next iteration (without contribution of dropped trees)
                        y_pred[:, k] += self.learning_rate * scale[m] * self.estimators_[m, k].predict(X).ravel()
            else:
                # update tree leaves
                if X_csr is not None:
                    loss.update_terminal_regions(tree, X_csr, y, residual, y_pred,
                                                 sample_weight, sample_mask,
                                                 self.learning_rate, k=k)
                else:
                    loss.update_terminal_regions(tree, X, y, residual, y_pred,
                                                 sample_weight, sample_mask,
                                                 self.learning_rate, k=k)

        return y_pred

    def _fit_stages(self, X, y, y_pred, sample_weight, random_state,
                    begin_at_stage=0, monitor=None, X_idx_sorted=None):
        """Iteratively fits the stages.

        For each stage it computes the progress (OOB, train score)
        and delegates to ``_fit_stage``.
        Returns the number of stages fit; might differ from ``n_estimators``
        due to early stopping.
        """
        n_samples = X.shape[0]
        do_oob = self.subsample < 1.0
        sample_mask = numpy.ones((n_samples, ), dtype=bool)
        n_inbag = max(1, int(self.subsample * n_samples))
        loss_ = self.loss_

        if self.verbose:
            verbose_reporter = VerboseReporter(self.verbose)
            verbose_reporter.init(self, begin_at_stage)

        X_csc = csc_matrix(X) if issparse(X) else None
        X_csr = csr_matrix(X) if issparse(X) else None

        if self.dropout_rate > 0.:
            scale = numpy.ones(self.n_estimators, dtype=float)
        else:
            scale = None

        # perform boosting iterations
        i = begin_at_stage
        for i in range(begin_at_stage, self.n_estimators):

            # subsampling
            if do_oob:
                sample_mask = _random_sample_mask(n_samples, n_inbag,
                                                  random_state)
                # OOB score before adding this stage
                y_oob_sample = y[~sample_mask]
                old_oob_score = loss_(y_oob_sample,
                                      y_pred[~sample_mask],
                                      sample_weight[~sample_mask])

            # fit next stage of trees
            y_pred = self._fit_stage(i, X, y, y_pred, sample_weight,
                                     sample_mask, random_state, scale, X_idx_sorted,
                                     X_csc, X_csr)

            # track deviance (= loss)
            if do_oob:
                self.train_score_[i] = loss_(y[sample_mask],
                                             y_pred[sample_mask],
                                             sample_weight[sample_mask])
                self.oob_improvement_[i] = (old_oob_score - loss_(y_oob_sample, y_pred[~sample_mask],
                                                                  sample_weight[~sample_mask]))
            else:
                # no need to fancy index w/ no subsampling
                self.train_score_[i] = loss_(y, y_pred, sample_weight)

            if self.verbose > 0:
                verbose_reporter.update(i, self)

            if monitor is not None:
                early_stopping = monitor(i, self, locals())
                if early_stopping:
                    break

        if self.dropout_rate > 0.:
            self.scale_ = scale

        return i + 1

    def fit(self, X, y, sample_weight=None, monitor=None):
        """Fit the gradient boosting model.

        Parameters
        ----------
        X : array-like, shape = (n_samples, n_features)
            Data matrix

        y : structured array, shape = (n_samples,)
            A structured array containing the binary event indicator
            as first field, and time of event or time of censoring as
            second field.

        sample_weight : array-like, shape = (n_samples,), optional
            Weights given to each sample. If omitted, all samples have weight 1.

        monitor : callable, optional
            The monitor is called after each iteration with the current
            iteration, a reference to the estimator and the local variables of
            ``_fit_stages`` as keyword arguments ``callable(i, self,
            locals())``. If the callable returns ``True`` the fitting procedure
            is stopped. The monitor can be used for various things such as
            computing held-out estimates, early stopping, model introspection, and
            snapshotting.

        Returns
        -------
        self : object
            Returns self.
        """
        random_state = check_random_state(self.random_state)

        X, event, time = check_arrays_survival(X, y, accept_sparse=['csr', 'csc', 'coo'], dtype=DTYPE)
        n_samples, self.n_features_ = X.shape

        X = X.astype(DTYPE)
        if sample_weight is None:
            sample_weight = numpy.ones(n_samples, dtype=numpy.float32)
        else:
            sample_weight = column_or_1d(sample_weight, warn=True)
            check_consistent_length(X, sample_weight)

        self._check_params()

        self.loss_ = LOSS_FUNCTIONS[self.loss](1)
        if isinstance(self.loss_, (CensoredSquaredLoss, IPCWLeastSquaresError)):
            time = numpy.log(time)

        self._init_state()
        self.init_.fit(X, (event, time), sample_weight)
        y_pred = self.init_.predict(X)
        begin_at_stage = 0

        # fit the boosting stages
        y = numpy.fromiter(zip(event, time), dtype=[('event', bool), ('time', numpy.float64)])
        n_stages = self._fit_stages(X, y, y_pred, sample_weight, random_state,
                                    begin_at_stage, monitor)
        # change shape of arrays after fit (early-stopping or additional tests)
        if n_stages != self.estimators_.shape[0]:
            self.estimators_ = self.estimators_[:n_stages]
            self.train_score_ = self.train_score_[:n_stages]
            if hasattr(self, 'oob_improvement_'):
                self.oob_improvement_ = self.oob_improvement_[:n_stages]

        return self

    def _dropout_predict_stage(self, X, i, K, score):
        for k in range(K):
            tree = self.estimators_[i, k]
            score += self.learning_rate * self.scale_[i] * tree.predict(X).reshape((X.shape[0], 1))
        return score

    def _dropout_decision_function(self, X):
        score = self.init_.predict(X).astype(numpy.float64)

        n_estimators, K = self.estimators_.shape
        for i in range(n_estimators):
            self._dropout_predict_stage(X, i, K, score)

        return score

    def _dropout_staged_decision_function(self, X):
        X = check_array(X, dtype=DTYPE, order="C")
        score = self._init_decision_function(X)

        n_estimators, K = self.estimators_.shape
        for i in range(n_estimators):
            self._dropout_predict_stage(X, i, K, score)
            yield score.copy()

    def _scale_prediction(self, score):
        if isinstance(self.loss_, (CensoredSquaredLoss, IPCWLeastSquaresError)):
            numpy.exp(score, out=score)
        return score

    def _decision_function(self, X):
        # if dropout wasn't used during training, proceed as usual,
        # otherwise consider scaling factor of individual trees
        if not hasattr(self, "scale_"):
            score = super()._decision_function(X)
        else:
            score = self._dropout_decision_function(X)

        return self._scale_prediction(score)

    def predict(self, X):
        """Predict risk scores.

        Parameters
        ----------
        X : array-like, shape = (n_samples, n_features)
            The input samples.

        Returns
        -------
        y : ndarray, shape = (n_samples,)
            The risk scores.
        """
        check_is_fitted(self, 'estimators_')

        X = check_array(X, dtype=DTYPE, order="C")
        score = self._decision_function(X)
        if score.shape[1] == 1:
            score = score.ravel()

        return score

    def staged_predict(self, X):
        """Predict hazard at each stage for X.

        This method allows monitoring (i.e. determine error on testing set)
        after each stage.

        Parameters
        ----------
        X : array-like, shape = (n_samples, n_features)
            The input samples.

        Returns
        -------
        y : generator of array of shape = (n_samples,)
            The predicted value of the input samples.
        """
        check_is_fitted(self, 'estimators_')

        # if dropout wasn't used during training, proceed as usual,
        # otherwise consider scaling factor of individual trees
        if not hasattr(self, "scale_"):
            for y in self._staged_decision_function(X):
                yield self._scale_prediction(y.ravel())
        else:
            for y in self._dropout_staged_decision_function(X):
                yield self._scale_prediction(y.ravel())
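
For reference, the monitor argument documented in fit above can be any callable that receives the iteration index, the estimator, and the local variables of _fit_stages, and returns True to stop. Below is a minimal sketch of such a callable, not part of the code above. The names EarlyStoppingMonitor, patience, and run_until_stopped are hypothetical, and the stand-in estimator with made-up oob_improvement_ values only exists to exercise the logic. It assumes subsample < 1.0 so that oob_improvement_ is filled.

```python
from types import SimpleNamespace


class EarlyStoppingMonitor:
    """Stop once the OOB improvement was non-positive `patience` times in a row."""

    def __init__(self, patience=3):
        self.patience = patience
        self._bad_rounds = 0

    def __call__(self, iteration, estimator, local_vars):
        # oob_improvement_[iteration] is set before the monitor is called
        if estimator.oob_improvement_[iteration] > 0:
            self._bad_rounds = 0
        else:
            self._bad_rounds += 1
        # returning True makes the boosting loop break
        return self._bad_rounds >= self.patience


def run_until_stopped(monitor, estimator, n_estimators):
    """Mimic the stopping logic of the boosting loop; return the number of stages fit."""
    i = 0
    for i in range(n_estimators):
        if monitor(i, estimator, {}):
            break
    return i + 1


# Stand-in for a fitted estimator; the numbers are made up for illustration.
fake_estimator = SimpleNamespace(oob_improvement_=[0.5, 0.2, -0.1, -0.2, -0.3, 0.1])
n_stages = run_until_stopped(EarlyStoppingMonitor(patience=3), fake_estimator, 6)
print(n_stages)
```

With a real model the same object would be passed as estimator.fit(X, y, monitor=EarlyStoppingMonitor(patience=3)).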

CoxPHSurvivalAnalysis().predict_survival_function() at unseen event times

Similar to the lifelines implementation below:

import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "var_a": [0, 0, 1, 1],
    "var_b": [0, 0, 1, 1],
    "tenure": [236, 112, 89, 678],
    "terminated": [0, 1, 1, 1],
})

cph = CoxPHFitter()
cph.fit(df, duration_col='tenure', event_col='terminated')
cph.predict_survival_function(df, times=[450])
...
# returns the survival function evaluated at that time
    | 0   | 1   | 2   | 3
450 | 1.0 | 0.0 | 0.0 | 0.616205

Is it possible to get the predict_survival_function for an unseen event time?

Currently I am able to get this:

...
estimator.predict_survival_function(X)
array([ StepFunction(x=array([  1,  ...]), y=array([ 0.99352964, ...])

But I cannot get the value at one specific event time.
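
To make the question concrete: a survival curve of this kind is a step function, so its value at an unseen time is the value of the last step that starts at or before that time. The sketch below shows that lookup in plain Python. The helper step_value and the numbers are made up for illustration, and returning 1.0 before the first time point is an assumption of this sketch. As far as I can tell, the StepFunction objects returned by scikit-survival are callable as well, so something like fn(450) may already do this, but I have not confirmed that for every version.

```python
from bisect import bisect_right


def step_value(times, values, t, before_first=1.0):
    """Evaluate a right-continuous step function at time t.

    times must be sorted ascending; values[i] holds from times[i]
    up to (but not including) times[i + 1].
    """
    i = bisect_right(times, t) - 1
    return before_first if i < 0 else values[i]


# Illustrative numbers only, not the output of any fitted model.
times = [89, 112, 236, 678]
surv = [0.9, 0.7, 0.6, 0.2]

at_450 = step_value(times, surv, 450)  # falls on the step that starts at 236
print(at_450)
```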

Build error on Windows: unable to find vcvarsall.bat

Hello,
I have Python 3.6 on Windows 10 and VS 2012. I assumed I should be able to use it, but I get an error when running the command below:

from sksurv.svm import FastSurvivalSVM

The error message:

I tried to modify _msvccompiler.py, but it seems to take the Visual Studio settings from the registry.

Please advise!
Thank you

Bus error on Linux

Some tests on Linux (in my environment) crash the interpreter with a bus error. Outside the tests, actually using GradientBoostingSurvivalAnalysis in my research also causes a bus error.

sksurv was built with gcc with -Og to troubleshoot the crash. sklearn and cython are from git.

Here is the log of one such test (I have removed the tests that passed).

nose.core: DEBUG: test loader is <nose.loader.TestLoader object at 0x40ca8970>
nose.core: DEBUG: defaultTest .
nose.core: DEBUG: Test names are ['tests.test_boosting.TestComponentwiseGradientBoosting']
nose.core: DEBUG: createTests called with None
nose.loader: DEBUG: load from tests.test_boosting.TestComponentwiseGradientBoosting (None)
nose.selector: DEBUG: Test name tests.test_boosting.TestComponentwiseGradientBoosting resolved to file None, module tests.test_boosting.TestComponentwiseGradientBoosting, call None
nose.selector: DEBUG: Final resolution of test name tests.test_boosting.TestComponentwiseGradientBoosting: file None module tests.test_boosting.TestComponentwiseGradientBoosting call None
nose: DEBUG: __import__ tests.test_boosting.TestComponentwiseGradientBoosting
nose: DEBUG: __import__ tests.test_boosting.TestComponentwiseGradientBoosting
nose: DEBUG: resolve: ['test_boosting', 'TestComponentwiseGradientBoosting'], tests.test_boosting.TestComponentwiseGradientBoosting, <module 'tests' (namespace)>, <module 'tests' (namespace)>
nose.loader: DEBUG: Load from module <class 'tests.test_boosting.TestComponentwiseGradientBoosting'>
nose.selector: DEBUG: wantFunction <function TestCase.__call__ at 0x40570c00>? False
nose.selector: DEBUG: wantClass <class 'type'>? None
nose.selector: DEBUG: wantFunction <function TestCase.__eq__ at 0x405708a0>? False
nose.selector: DEBUG: wantFunction <function TestCase.__hash__ at 0x405708e8>? False
nose.selector: DEBUG: wantFunction <function TestCase.__init__ at 0x40570588>? False
nose.selector: DEBUG: wantFunction <function TestCase.__repr__ at 0x40570978>? False
nose.selector: DEBUG: wantFunction <function TestCase.__str__ at 0x40570930>? False
nose.selector: DEBUG: wantFunction <function TestCase._addExpectedFailure at 0x40570ae0>? False
nose.selector: DEBUG: wantFunction <function TestCase._addSkip at 0x405709c0>? False
nose.selector: DEBUG: wantFunction <function TestCase._addUnexpectedSuccess at 0x40570b28>? False
nose.selector: DEBUG: wantFunction <function TestCase._baseAssertEqual at 0x40570f18>? False
nose.selector: DEBUG: wantFunction <function TestCase._deprecate at 0x405717c8>? False
nose.selector: DEBUG: wantFunction <function TestCase._feedErrorsToResult at 0x40570a98>? False
nose.selector: DEBUG: wantFunction <function TestCase._formatMessage at 0x40570db0>? False
nose.selector: DEBUG: wantFunction <function TestCase._getAssertEqualityFunc at 0x40570ed0>? False
nose.selector: DEBUG: wantFunction <function TestCase._truncateMessage at 0x40571108>? False
nose.selector: DEBUG: wantFunction <function TestCase.addCleanup at 0x40570618>? None
nose.selector: DEBUG: wantFunction <function TestCase.addTypeEqualityFunc at 0x405705d0>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertAlmostEqual at 0x40571030>? None
nose.selector: DEBUG: wantFunction <function TestCase._deprecate.<locals>.deprecated_func at 0x405718a0>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertCountEqual at 0x405713d8>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertDictContainsSubset at 0x40571390>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertDictEqual at 0x40571348>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertEqual at 0x40570f60>? None
nose.selector: DEBUG: wantFunction <function TestCase._deprecate.<locals>.deprecated_func at 0x40571810>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertFalse at 0x40570d20>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertGreater at 0x405714f8>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertGreaterEqual at 0x40571540>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertIn at 0x40571228>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertIs at 0x405712b8>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertIsInstance at 0x40571618>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertIsNone at 0x40571588>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertIsNot at 0x40571300>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertIsNotNone at 0x405715d0>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertLess at 0x40571468>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertLessEqual at 0x405714b0>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertListEqual at 0x40571150>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertLogs at 0x40570e88>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertMultiLineEqual at 0x40571420>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertNotAlmostEqual at 0x40571078>? None
nose.selector: DEBUG: wantFunction <function TestCase._deprecate.<locals>.deprecated_func at 0x405718e8>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertNotEqual at 0x40570fa8>? None
nose.selector: DEBUG: wantFunction <function TestCase._deprecate.<locals>.deprecated_func at 0x40571858>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertNotIn at 0x40571270>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertNotIsInstance at 0x40571660>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertNotRegex at 0x40571780>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertRaises at 0x40570df8>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertRaisesRegex at 0x405716a8>? None
nose.selector: DEBUG: wantFunction <function TestCase._deprecate.<locals>.deprecated_func at 0x40571a08>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertRegex at 0x40571738>? None
nose.selector: DEBUG: wantFunction <function TestCase._deprecate.<locals>.deprecated_func at 0x40571a50>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertSequenceEqual at 0x405710c0>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertSetEqual at 0x405711e0>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertTrue at 0x40570d68>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertTupleEqual at 0x40571198>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertWarns at 0x40570e40>? None
nose.selector: DEBUG: wantFunction <function TestCase.assertWarnsRegex at 0x405716f0>? None
nose.selector: DEBUG: wantFunction <function TestCase._deprecate.<locals>.deprecated_func at 0x40571930>? None
nose.selector: DEBUG: wantFunction <function TestCase.countTestCases at 0x40570780>? None
nose.selector: DEBUG: wantFunction <function TestCase.debug at 0x40570c48>? None
nose.selector: DEBUG: wantFunction <function TestCase.defaultTestResult at 0x405707c8>? None
nose.selector: DEBUG: wantFunction <function TestCase.doCleanups at 0x40570bb8>? None
nose.selector: DEBUG: wantFunction <function TestCase.fail at 0x40570cd8>? None
nose.selector: DEBUG: wantFunction <function TestCase._deprecate.<locals>.deprecated_func at 0x405719c0>? None
nose.selector: DEBUG: wantFunction <function TestCase._deprecate.<locals>.deprecated_func at 0x405718e8>? None
nose.selector: DEBUG: wantFunction <function TestCase._deprecate.<locals>.deprecated_func at 0x40571858>? None
nose.selector: DEBUG: wantFunction <function TestCase._deprecate.<locals>.deprecated_func at 0x40571930>? None
nose.selector: DEBUG: wantFunction <function TestCase._deprecate.<locals>.deprecated_func at 0x405718a0>? None
nose.selector: DEBUG: wantFunction <function TestCase._deprecate.<locals>.deprecated_func at 0x40571810>? None
nose.selector: DEBUG: wantFunction <function TestCase._deprecate.<locals>.deprecated_func at 0x40571978>? None
nose.selector: DEBUG: wantClass <class 'AssertionError'>? None
nose.selector: DEBUG: wantFunction <function TestCase.id at 0x40570858>? None
nose.selector: DEBUG: wantFunction <function TestCase.run at 0x40570b70>? None
nose.selector: DEBUG: wantFunction <function TestComponentwiseGradientBoosting.setUp at 0x470448e8>? None
nose.selector: DEBUG: wantFunction <function TestCase.shortDescription at 0x40570810>? None
nose.selector: DEBUG: wantFunction <function TestCase.skipTest at 0x40570c90>? None
nose.selector: DEBUG: wantFunction <function TestCase.subTest at 0x40570a50>? None
nose.selector: DEBUG: wantFunction <function TestCase.tearDown at 0x405706a8>? None
nose.selector: DEBUG: wantFunction <function TestComponentwiseGradientBoosting.test_feature_importances at 0x47044390>? True
nose.suite: DEBUG: Create suite for <nose.suite.ContextList object at 0x40ca8df0>
nose.suite: DEBUG: tests <nose.suite.ContextList object at 0x40ca8df0> context <class 'tests.test_boosting.TestComponentwiseGradientBoosting'>
nose.suite: DEBUG: Context suite for <nose.suite.ContextList object at 0x40ca8df0> (<class 'tests.test_boosting.TestComponentwiseGradientBoosting'>) (1194290256)
nose.suite: DEBUG: suite <nose.suite.ContextSuite context=TestComponentwiseGradientBoosting> has context TestComponentwiseGradientBoosting
nose.suite: DEBUG: get ancestry <class 'tests.test_boosting.TestComponentwiseGradientBoosting'>
nose.suite: DEBUG:  <class 'tests.test_boosting.TestComponentwiseGradientBoosting'> ancestors ['tests', 'test_boosting']
nose: DEBUG: __import__ tests.test_boosting
nose: DEBUG: resolve: ['test_boosting'], tests.test_boosting, <module 'tests' (namespace)>, <module 'tests' (namespace)>
nose.suite: DEBUG: suite <nose.suite.ContextSuite context=TestComponentwiseGradientBoosting> has ancestor tests.test_boosting
nose.suite: DEBUG:  <class 'tests.test_boosting.TestComponentwiseGradientBoosting'> ancestors ['tests']
nose: DEBUG: __import__ tests
nose: DEBUG: resolve: [], tests, <module 'tests' (namespace)>, <module 'tests' (namespace)>
nose.suite: DEBUG: suite <nose.suite.ContextSuite context=TestComponentwiseGradientBoosting> has ancestor tests
nose.loader: DEBUG: load from test_feature_importances (None)
nose.selector: DEBUG: Test name test_feature_importances resolved to file None, module test_feature_importances, call None
nose.selector: DEBUG: Final resolution of test name test_feature_importances: file None module test_feature_importances call None
nose: DEBUG: __import__ test_feature_importances
nose.failure: DEBUG: A failure! <class 'ImportError'> No module named 'test_feature_importances' ['  File "***/dist-packages/nose/loader.py", line 407, in loadTestsFromName\n    module = resolve_name(addr.module)\n', '  File "***/dist-packages/nose/util.py", line 312, in resolve_name\n    module = __import__(\'.\'.join(parts_copy))\n']
nose.suite: DEBUG: Create suite for [<nose.failure.Failure testMethod=runTest>]
nose.suite: DEBUG: tests [<nose.failure.Failure testMethod=runTest>] context None
nose.suite: DEBUG: wrap [<nose.failure.Failure testMethod=runTest>]
nose.suite: DEBUG: wrapping Failure: ImportError (No module named 'test_feature_importances')
nose.suite: DEBUG: Context suite for [Test(<nose.failure.Failure testMethod=runTest>)] (<class 'nose.failure.Failure'>) (1087144304)
nose.suite: DEBUG: suite <nose.suite.ContextSuite context=Failure> has context Failure
nose.suite: DEBUG: get ancestry <class 'nose.failure.Failure'>
nose.suite: DEBUG:  <class 'nose.failure.Failure'> ancestors ['nose', 'failure']
nose: DEBUG: __import__ nose.failure
nose: DEBUG: resolve: ['failure'], nose.failure, <module 'nose' from '***/dist-packages/nose/__init__.py'>, <module 'nose' from '***/dist-packages/nose/__init__.py'>
nose.suite: DEBUG: suite <nose.suite.ContextSuite context=Failure> has ancestor nose.failure
nose.suite: DEBUG:  <class 'nose.failure.Failure'> ancestors ['nose']
nose: DEBUG: __import__ nose
nose: DEBUG: resolve: [], nose, <module 'nose' from '***/dist-packages/nose/__init__.py'>, <module 'nose' from '***/dist-packages/nose/__init__.py'>
nose.suite: DEBUG: suite <nose.suite.ContextSuite context=Failure> has ancestor nose
nose.suite: DEBUG: Create suite for [<nose.suite.ContextSuite context=TestComponentwiseGradientBoosting>, <nose.suite.ContextSuite context=Failure>]
nose.suite: DEBUG: tests [<nose.suite.ContextSuite context=TestComponentwiseGradientBoosting>, <nose.suite.ContextSuite context=Failure>] context None
nose.suite: DEBUG: wrap [<nose.suite.ContextSuite context=TestComponentwiseGradientBoosting>, <nose.suite.ContextSuite context=Failure>]
nose.suite: DEBUG: wrapping <nose.suite.ContextSuite context=TestComponentwiseGradientBoosting>
nose.suite: DEBUG: wrapping <nose.suite.ContextSuite context=Failure>
nose.suite: DEBUG: get ancestry <class 'tests.test_boosting.TestComponentwiseGradientBoosting'>
nose.suite: DEBUG:  <class 'tests.test_boosting.TestComponentwiseGradientBoosting'> ancestors ['tests', 'test_boosting']
nose: DEBUG: __import__ tests.test_boosting
nose: DEBUG: resolve: ['test_boosting'], tests.test_boosting, <module 'tests' (namespace)>, <module 'tests' (namespace)>
nose.suite: DEBUG:  <class 'tests.test_boosting.TestComponentwiseGradientBoosting'> ancestors ['tests']
nose: DEBUG: __import__ tests
nose: DEBUG: resolve: [], tests, <module 'tests' (namespace)>, <module 'tests' (namespace)>
nose.suite: DEBUG: get ancestry <class 'nose.failure.Failure'>
nose.suite: DEBUG:  <class 'nose.failure.Failure'> ancestors ['nose', 'failure']
nose: DEBUG: __import__ nose.failure
nose: DEBUG: resolve: ['failure'], nose.failure, <module 'nose' from '***/dist-packages/nose/__init__.py'>, <module 'nose' from '***/dist-packages/nose/__init__.py'>
nose.suite: DEBUG:  <class 'nose.failure.Failure'> ancestors ['nose']
nose: DEBUG: __import__ nose
nose: DEBUG: resolve: [], nose, <module 'nose' from '***/dist-packages/nose/__init__.py'>, <module 'nose' from '***/dist-packages/nose/__init__.py'>
nose.suite: DEBUG: Context suite for [<nose.suite.ContextSuite context=TestComponentwiseGradientBoosting>] (<class 'tests.test_boosting.TestComponentwiseGradientBoosting'>) (1169485488)
nose.suite: DEBUG: suite <nose.suite.ContextSuite context=TestComponentwiseGradientBoosting> has context TestComponentwiseGradientBoosting
nose.suite: DEBUG: get ancestry <class 'tests.test_boosting.TestComponentwiseGradientBoosting'>
nose.suite: DEBUG:  <class 'tests.test_boosting.TestComponentwiseGradientBoosting'> ancestors ['tests', 'test_boosting']
nose: DEBUG: __import__ tests.test_boosting
nose: DEBUG: resolve: ['test_boosting'], tests.test_boosting, <module 'tests' (namespace)>, <module 'tests' (namespace)>
nose.suite: DEBUG: suite <nose.suite.ContextSuite context=TestComponentwiseGradientBoosting> has ancestor tests.test_boosting
nose.suite: DEBUG:  <class 'tests.test_boosting.TestComponentwiseGradientBoosting'> ancestors ['tests']
nose: DEBUG: __import__ tests
nose: DEBUG: resolve: [], tests, <module 'tests' (namespace)>, <module 'tests' (namespace)>
nose.suite: DEBUG: suite <nose.suite.ContextSuite context=TestComponentwiseGradientBoosting> has ancestor tests
nose.suite: DEBUG: get ancestry <class 'nose.failure.Failure'>
nose.suite: DEBUG:  <class 'nose.failure.Failure'> ancestors ['nose', 'failure']
nose: DEBUG: __import__ nose.failure
nose: DEBUG: resolve: ['failure'], nose.failure, <module 'nose' from '/usr/local/lib/python3.4/dist-packages/nose/__init__.py'>, <module 'nose' from '/usr/local/lib/python3.4/dist-packages/nose/__init__.py'>
nose.suite: DEBUG:  <class 'nose.failure.Failure'> ancestors ['nose']
nose: DEBUG: __import__ nose
nose: DEBUG: resolve: [], nose, <module 'nose' from '/usr/local/lib/python3.4/dist-packages/nose/__init__.py'>, <module 'nose' from '/usr/local/lib/python3.4/dist-packages/nose/__init__.py'>
nose.suite: DEBUG: Context suite for [<nose.suite.ContextSuite context=TestComponentwiseGradientBoosting>] (<module 'tests.test_boosting' from '***/tests/test_boosting.py'>) (1078604880)
nose.suite: DEBUG: suite <nose.suite.ContextSuite context=tests.test_boosting> has context tests.test_boosting
nose.suite: DEBUG: get ancestry <module 'tests.test_boosting' from '***/tests/test_boosting.py'>
nose.suite: DEBUG:  <module 'tests.test_boosting' from '***/tests/test_boosting.py'> ancestors ['tests']
nose: DEBUG: __import__ tests
nose: DEBUG: resolve: [], tests, <module 'tests' (namespace)>, <module 'tests' (namespace)>
nose.suite: DEBUG: suite <nose.suite.ContextSuite context=tests.test_boosting> has ancestor tests
nose.suite: DEBUG: get ancestry <class 'nose.failure.Failure'>
nose.suite: DEBUG:  <class 'nose.failure.Failure'> ancestors ['nose', 'failure']
nose: DEBUG: __import__ nose.failure
nose: DEBUG: resolve: ['failure'], nose.failure, <module 'nose' from '***/dist-packages/nose/__init__.py'>, <module 'nose' from '***/dist-packages/nose/__init__.py'>
nose.suite: DEBUG:  <class 'nose.failure.Failure'> ancestors ['nose']
nose: DEBUG: __import__ nose
nose: DEBUG: resolve: [], nose, <module 'nose' from '***/dist-packages/nose/__init__.py'>, <module 'nose' from '***/dist-packages/nose/__init__.py'>
nose.suite: DEBUG: Context suite for [<nose.suite.ContextSuite context=tests.test_boosting>] (<module 'tests' (namespace)>) (1191398640)
nose.suite: DEBUG: suite <nose.suite.ContextSuite context=tests> has context tests
nose.suite: DEBUG: get ancestry <module 'tests' (namespace)>
nose.suite: DEBUG: Context suite for [<nose.suite.ContextSuite context=tests>, <nose.suite.ContextSuite context=Failure>] (None) (1087016432)
nose.core: DEBUG: runTests called
nose.suite: DEBUG: suite 1087016432 (<nose.suite.ContextSuite context=None>) run called, tests: <generator object _get_wrapped_tests at 0x47315828>
nose.suite: DEBUG: suite 1087016432 setUp called, tests: <generator object _get_wrapped_tests at 0x47315918>
nose.suite: DEBUG: tests in 1087016432?
nose.suite: DEBUG: precache is [<nose.suite.ContextSuite context=tests>, <nose.suite.ContextSuite context=Failure>]
nose.suite: DEBUG: suite 1191398640 (<nose.suite.ContextSuite context=tests>) run called, tests: <generator object _get_wrapped_tests at 0x47315a80>
nose.suite: DEBUG: suite 1191398640 setUp called, tests: <generator object _get_wrapped_tests at 0x47315a80>
nose.suite: DEBUG: tests in 1191398640?
nose.suite: DEBUG: ancestor <module 'tests' (namespace)> may need setup
nose.suite: DEBUG: ancestor <module 'tests' (namespace)> does need setup
nose.suite: DEBUG: <nose.suite.ContextSuite context=tests> setup context <module 'tests' (namespace)>
nose.suite: DEBUG: completed suite setup
nose.suite: DEBUG: precache is [<nose.suite.ContextSuite context=tests.test_boosting>]
nose.suite: DEBUG: suite 1078604880 (<nose.suite.ContextSuite context=tests.test_boosting>) run called, tests: <generator object _get_wrapped_tests at 0x47315b20>
nose.suite: DEBUG: suite 1078604880 setUp called, tests: <generator object _get_wrapped_tests at 0x47315b20>
nose.suite: DEBUG: tests in 1078604880?
nose.suite: DEBUG: ancestor <module 'tests' (namespace)> may need setup
nose.suite: DEBUG: ancestor <module 'tests.test_boosting' from '***/tests/test_boosting.py'> may need setup
nose.suite: DEBUG: ancestor <module 'tests.test_boosting' from '***/tests/test_boosting.py'> does need setup
nose.suite: DEBUG: <nose.suite.ContextSuite context=tests.test_boosting> setup context <module 'tests.test_boosting' from '***/tests/test_boosting.py'>
nose.suite: DEBUG: completed suite setup
nose.suite: DEBUG: precache is [<nose.suite.ContextSuite context=TestComponentwiseGradientBoosting>]
nose.suite: DEBUG: suite 1169485488 (<nose.suite.ContextSuite context=TestComponentwiseGradientBoosting>) run called, tests: <generator object _get_wrapped_tests at 0x47315bc0>
nose.suite: DEBUG: suite 1169485488 setUp called, tests: <generator object _get_wrapped_tests at 0x47315bc0>
nose.suite: DEBUG: tests in 1169485488?
nose.suite: DEBUG: ancestor <module 'tests' (namespace)> may need setup
nose.suite: DEBUG: ancestor <module 'tests.test_boosting' from '***/tests/test_boosting.py'> may need setup
nose.suite: DEBUG: ancestor <class 'tests.test_boosting.TestComponentwiseGradientBoosting'> may need setup
nose.suite: DEBUG: ancestor <class 'tests.test_boosting.TestComponentwiseGradientBoosting'> does need setup
nose.suite: DEBUG: <nose.suite.ContextSuite context=TestComponentwiseGradientBoosting> setup context <class 'tests.test_boosting.TestComponentwiseGradientBoosting'>
nose: DEBUG: call fixture <class 'tests.test_boosting.TestComponentwiseGradientBoosting'>.setUpClass
nose.suite: DEBUG: completed suite setup
nose.suite: DEBUG: precache is [<nose.suite.ContextSuite context=TestComponentwiseGradientBoosting>]
nose.suite: DEBUG: suite 1194290256 (<nose.suite.ContextSuite context=TestComponentwiseGradientBoosting>) run called, tests: <generator object _get_wrapped_tests at 0x47315c60>
nose.suite: DEBUG: suite 1194290256 setUp called, tests: <generator object _get_wrapped_tests at 0x47315c60>
nose.suite: DEBUG: tests in 1194290256?
nose.suite: DEBUG: ancestor <module 'tests' (namespace)> may need setup
nose.suite: DEBUG: ancestor <module 'tests.test_boosting' from '***/tests/test_boosting.py'> may need setup
nose.suite: DEBUG: ancestor <class 'tests.test_boosting.TestComponentwiseGradientBoosting'> may need setup
nose.suite: DEBUG: completed suite setup
nose.suite: DEBUG: precache is [<tests.test_boosting.TestComponentwiseGradientBoosting testMethod=test_feature_importances>]
test_feature_importances (tests.test_boosting.TestComponentwiseGradientBoosting) ...

Kaplan Meier output consistent regardless of time

Very useful article, thanks guys. I'm having problems getting the Kaplan-Meier estimator to give a meaningful output. I've saved my results in a record array as shown below, with the cancellation event stored as a boolean and, as a float, the number of days before cancellation (or the number of days so far if there has not yet been a cancellation). I don't understand why the Kaplan-Meier estimator always predicts cancellations at a steady rate of one regardless of the time, and I don't have the code for the Kaplan-Meier estimator to check. Has anyone else had this problem, and how can it be solved?
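For readers who want to check the arithmetic by hand, here is a minimal pure-Python sketch of the product-limit (Kaplan-Meier) calculation. It is not the library's implementation; the function name `kaplan_meier` and the sample data are made up for illustration. One thing it makes visible: if no record is recognised as an observed event (for example because the event and time fields are swapped), the estimate stays at one for every time point.

```python
from collections import Counter


def kaplan_meier(events, times):
    """Return (unique_times, survival_probabilities) for right-censored data.

    events: iterable of bool, True if the event (e.g. cancellation) was observed.
    times: iterable of float, time to the event or to censoring.
    """
    events = [bool(e) for e in events]
    times = [float(t) for t in times]
    deaths = Counter(t for e, t in zip(events, times) if e)
    leaving = Counter(times)  # subjects leaving the risk set at each time

    at_risk = len(times)
    survival = 1.0
    unique_times, probabilities = [], []
    for t in sorted(leaving):
        # Multiply by the fraction of at-risk subjects that survive time t.
        survival *= 1.0 - deaths.get(t, 0) / at_risk
        unique_times.append(t)
        probabilities.append(survival)
        at_risk -= leaving[t]
    return unique_times, probabilities


# Hypothetical data: days until cancellation (True) or days observed so far (False).
events = [True, False, True, True, False]
days = [5.0, 8.0, 8.0, 12.0, 20.0]
km_times, km_prob = kaplan_meier(events, days)

# With every record censored, the estimate never leaves 1.0.
_, km_all_censored = kaplan_meier([False] * 5, days)
```

Comparing the output of a sketch like this with the library's result on a handful of records can help narrow down whether the input array is laid out the way the estimator expects.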

Different results of CoxPHSurvivalAnalysis and CoxnetSurvivalAnalysis

The documentation of CoxPHSurvivalAnalysis says:

Cox proportional hazards model.

And the documentation of CoxnetSurvivalAnalysis says:

Cox's proportional hazard's model with elastic net penalty.

So I assume the two classes implement the same model, and should return the same results when set with the same model parameters and given the same data. However, I see different results. Why? Also, what are the differences between them?

Code:

from sksurv.linear_model import CoxPHSurvivalAnalysis, CoxnetSurvivalAnalysis
from sksurv.datasets import load_veterans_lung_cancer
from sksurv.preprocessing import OneHotEncoder

X_, y = load_veterans_lung_cancer()
X = OneHotEncoder().fit_transform(X_)

# try to match the model parameters wherever possible
f = CoxPHSurvivalAnalysis(alpha=0.5, n_iter=100000)
g = CoxnetSurvivalAnalysis(alphas=[0.5], alpha_min_ratio=1, n_alphas=1, 
                           l1_ratio=1e-16, tol=1e-09, normalize=False)

print(f)
print(g)

f.fit(X, y)
g.fit(X, y)

print(f.coef_)
print(g.coef_[:,0])

Output:

CoxPHSurvivalAnalysis(alpha=0.5, n_iter=100000, tol=1e-09, verbose=0)
CoxnetSurvivalAnalysis(alpha_min_ratio=0.0001, alphas=[0.5], copy_X=True,
            l1_ratio=1e-16, max_iter=100000, n_alphas=1, normalize=False,
            penalty_factor=None, tol=1e-09, verbose=False)
[-8.34518623e-03 -7.21105070e-01 -2.80434400e-01 -1.11234345e+00
 -3.26083027e-02 -1.93213436e-04  6.22726190e-02  2.90289950e-01]
[-0.00346722 -0.05117406  0.06044394 -0.16433136 -0.03300373  0.0003172
 -0.00881617  0.06956854]

What I've gathered:

CoxPHSurvivalAnalysis is sksurv's own implementation of Cox Proportional Hazard model, and supports ridge (L2) regularization.
CoxnetSurvivalAnalysis is a wrapper of some C++ extension codes used by R's glmnet package, and supports elastic net (L1 and L2) regularization.
In the test files, CoxPHSurvivalAnalysis is tested with the Rossi dataset, while CoxnetSurvivalAnalysis is tested with the Breast Cancer dataset.
The two classes have different constructor signatures and methods (eg, only CoxPHSurvivalAnalysis has predict_survival_function).

Would it be a nice feature to have consolidated constructor signatures and methods for the two classes, and to have them tested on the same dataset for validation or comparison?

Thanks.

Alpha Behavior

When I run a model like this example:
mod = coxnet.CoxnetSurvivalAnalysis(n_alphas=30, l1_ratio=1.0)
There are times (and it could be data-specific) when the number of coefficient vectors returned by the model is less than the n_alphas I specified. For example, it often stops at 5 alphas deep.
The paths might have 15 variables > 0 at 5 alphas deep, which is fine. The strange thing I am seeing is this: if I set n_alphas=20 on the same data set, I end up getting more variables > 0 along the path (still stopping at 5 alphas deep), and vice versa: if I set n_alphas=40 on the same data set, I end up getting fewer variables > 0 along the path, and once again the algorithm automatically stops at 5 alphas deep. (I'm referring to the parameters as variables.)

I'm assuming this is a bug, because in my past work with elastic nets the alpha sequence decreases exponentially toward some minimum, and I should see more variables > 0 as I move closer to the minimum of this alpha curve. So if I see 15 variables at alpha=30, then I should see <15 at alpha<30 and the reverse.

Could there be some ratio somewhere that is picking up a variable of similar name to a global that is confusing the alpha parameters in the Elastic net?
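One possible explanation, offered as an assumption rather than a confirmed description of the implementation: if the alpha grid is log-spaced between a data-dependent maximum and `alpha_max * alpha_min_ratio` (the glmnet convention), then the fifth alpha depends on how many grid points there are. The sketch below uses a hypothetical helper `alpha_path` and a made-up `alpha_max` to show the effect.

```python
import math


def alpha_path(alpha_max, n_alphas, alpha_min_ratio):
    """Log-spaced sequence from alpha_max down to alpha_max * alpha_min_ratio."""
    if n_alphas == 1:
        return [alpha_max]
    step = math.log(alpha_min_ratio) / (n_alphas - 1)
    return [alpha_max * math.exp(i * step) for i in range(n_alphas)]


# Hypothetical values; alpha_max is normally derived from the data.
path_20 = alpha_path(alpha_max=1.0, n_alphas=20, alpha_min_ratio=0.0001)
path_40 = alpha_path(alpha_max=1.0, n_alphas=40, alpha_min_ratio=0.0001)

# The fifth alpha is smaller on the coarser grid, so less shrinkage is applied there.
fifth_20, fifth_40 = path_20[4], path_40[4]
```

Under this assumption, "5 alphas deep" with n_alphas=20 corresponds to a weaker penalty than "5 alphas deep" with n_alphas=40, which would be consistent with seeing more non-zero coefficients in the first case. It does not explain why the path stops early.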

Unable to install on windows

Unable to install on Windows. Do you have a prebuilt version as a whl file?

I tried pip install scikit-survival but ran into build issues with the C++ compiler. It would be great if you could provide a whl file.

GradientBoostingSurvivalAnalysis uses only a single CPU core

At least on a 2-core CPU, utilization is only 50% when fitting.

It seems that there is a bottleneck in the Python code, since xgboost utilizes all the cores, yet the implementation using xgboost tree regressors is even slower than the one using sklearn.

Error when running cross-validation example

I'm trying to run the examples from this talk: https://k-d-w.org/slides/pyconuk-2017/

I'm getting an error when I run this code snippet

from sksurv.datasets import load_breast_cancer
from sksurv.preprocessing import OneHotEncoder
from sksurv.linear_model import CoxnetSurvivalAnalysis
from sklearn.model_selection import GridSearchCV, KFold
X, y = load_breast_cancer()
Xt = OneHotEncoder().fit_transform(X)

cv = KFold(n_splits=5, shuffle=True, random_state=328)
coxnet = CoxnetSurvivalAnalysis(n_alphas=100,
    l1_ratio=1.0, alpha_min_ratio=0.01).fit(Xt, y)

gcv = GridSearchCV(coxnet,
    {"alphas": [[v] for v in coxnet.alphas_]},
    cv=cv).fit(Xt, y)

When I run this code, I get a failed assertion at

scikit-survival/sksurv/linear_model/coxnet.py

Line 192 in 71bba93

assert numpy.isfinite(coef).all()

It looks like something is going wrong with the optimisation and the coefficients are all 0 or nan.

Add support for pandas 0.23

pandas 0.23 isn't supported by scikit-survival version 0.5. I've made a start at identifying necessary changes here: badge@9f050bb

Currently, I'm having a problem with the nosetests where the TestOneHotEncoder class (and others?) expects pandas.DataFrame.select_dtypes to return a DataFrame whose columns are ordered alphabetically instead of reflecting their original location, so there are assertListEqual tests failing. I can't see that this has ever been the case in pandas, so I'm confused!

The reduce argument has been deprecated in pandas.DataFrame.apply, so the last line of table.apply(transform, axis=0, reduce=False) currently raises a DeprecationWarning.

Problem in the concordance metric

Hello,
The concordance metric gives me this error.

Regards.

Remove call of deprecated pandas API: Categorical.from_array

_get_dummies_1d in column.py calls Categorical.from_array, which has been deprecated since pandas 0.19 (pandas-dev/pandas!13854):

sksurv/column.py:147: FutureWarning: Categorical.from_array is deprecated, use Categorical instead
  cat = pandas.Categorical.from_array(pandas.Series(data))

ValueError: y must be a structured array with the first field being a binary class event indicator and the second field the time of the event/censoring

When performing ".fit(x,y)" I get the above error. However, my y looks like this:

[(1.0, 144.0),
 (1.0, 86.0),
 (1.0, 448.0),
 (1.0, 76.0),
 (1.0, 511.0),
 (1.0, 393.0),
 (0.0, 466.0),
 (0.0, 470.0),
 (1.0, 493.0),
 (1.0, 8.0),
 (1.0, 123.0),
 (1.0, 1439.0),
 (1.0, 3.0),
 (1.0, 691.0),
 (1.0, 989.0),
 (1.0, 320.0),
 (1.0, 570.0),
 (1.0, 702.0),
 (1.0, 327.0),
 (1.0, 280.0),
 (1.0, 71.0),
 (1.0, 359.0),
 (1.0, 389.0),
 (1.0, 263.0),
 (1.0, 97.0),
 (1.0, 166.0),
 (1.0, 161.0),
 (1.0, 579.0),
 (1.0, 95.0),
 (1.0, 396.0),
 (1.0, 7.0),
 (1.0, 67.0),
 (1.0, 1228.0),
 (0.0, 2246.0),
 (1.0, 801.0),
 (1.0, 866.0),
 (1.0, 454.0),
 (1.0, 313.0),
 (1.0, 557.0),
 (1.0, 214.0),
 (1.0, 119.0),
 (1.0, 151.0),
 (1.0, 360.0),
 (1.0, 6.0),
 (1.0, 378.0),
 (1.0, 684.0),
 (1.0, 11.0),
 (1.0, 548.0),
 (1.0, 315.0),
 (1.0, 359.0),
 (1.0, 455.0),
 (1.0, 375.0),
 (1.0, 98.0),
 (1.0, 136.0),
 (1.0, 294.0),
 (1.0, 62.0),
 (1.0, 717.0),
 (1.0, 286.0),
 (1.0, 679.0),
 (1.0, 595.0),
 (1.0, 164.0),
 (1.0, 82.0),
 (1.0, 66.0),
 (1.0, 28.0),
 (0.0, 1322.0),
 (0.0, 1405.0),
 (1.0, 204.0),
 (0.0, 218.0),
 (1.0, 699.0),
 (1.0, 178.0),
 (1.0, 165.0),
 (1.0, 637.0),
 (1.0, 226.0),
 (1.0, 109.0),
 (1.0, 815.0),
 (1.0, 263.0),
 (1.0, 240.0),
 (1.0, 414.0),
 (0.0, 1031.0),
 (1.0, 33.0),
 (1.0, 23.0),
 (1.0, 150.0),
 (1.0, 282.0),
 (1.0, 86.0),
 (0.0, 932.0),
 (0.0, 181.0),
 (1.0, 207.0),
 (1.0, 182.0),
 (1.0, 133.0),
 (0.0, 13.0),
 (0.0, 958.0),
 (1.0, 342.0),
 (1.0, 108.0),
 (1.0, 254.0),
 (1.0, 138.0),
 (0.0, 268.0),
 (0.0, 273.0),
 (0.0, 260.0),
 (0.0, 204.0),
 (0.0, 155.0),
 (1.0, 83.0),
 (1.0, 114.0),
 (0.0, 187.0),
 (0.0, 139.0),
 (1.0, 120.0),
 (0.0, 237.0),
 (1.0, 164.0),
 (1.0, 15.0),
 (1.0, 224.0),
 (0.0, 253.0),
 (0.0, 391.0),
 (0.0, 145.0),
 (0.0, 47.0),
 (0.0, 145.0),
 (0.0, 151.0),
 (0.0, 815.0),
 (1.0, 164.0),
 (0.0, 539.0),
 (1.0, 478.0),
 (1.0, 426.0),
 (1.0, 439.0),
 (1.0, 50.0),
 (1.0, 316.0),
 (1.0, 2005.0),
 (1.0, 3.0),
 (1.0, 1161.0),
 (1.0, 790.0),
 (1.0, 734.0),
 (1.0, 186.0),
 (1.0, 80.0),
 (1.0, 0.0),
 (1.0, 1311.0),
 (1.0, 167.0),
 (1.0, 452.0),
 (1.0, 2791.0),
 (1.0, 754.0),
 (1.0, 562.0),
 (1.0, 323.0),
 (1.0, 715.0),
 (1.0, 845.0),
 (1.0, 1179.0),
 (1.0, 272.0),
 (1.0, 626.0),
 (1.0, 3881.0),
 (1.0, 177.0),
 (1.0, 658.0),
 (1.0, 364.0),
 (1.0, 427.0),
 (1.0, 287.0),
 (1.0, 769.0),
 (1.0, 414.0),
 (1.0, 195.0),
 (1.0, 1807.0),
 (1.0, 1315.0),
 (1.0, 784.0),
 (1.0, 351.0),
 (1.0, 333.0),
 (1.0, 358.0),
 (1.0, 1050.0),
 (1.0, 357.0),
 (1.0, 399.0),
 (1.0, 98.0),
 (1.0, 62.0),
 (1.0, 460.0),
 (1.0, 364.0),
 (1.0, 29.0),
 (1.0, 701.0),
 (1.0, 68.0),
 (1.0, 342.0),
 (1.0, 419.0),
 (1.0, 99.0),
 (1.0, 41.0),
 (1.0, 164.0),
 (1.0, 88.0),
 (1.0, 502.0),
 (1.0, 801.0),
 (1.0, 62.0),
 (1.0, 880.0),
 (1.0, 485.0),
 (1.0, 587.0),
 (1.0, 24.0),
 (1.0, 42.0),
 (1.0, 34.0),
 (0.0, 1788.0),
 (1.0, 703.0),
 (1.0, 35.0),
 (1.0, 28.0),
 (0.0, 1246.0),
 (1.0, 187.0),
 (1.0, 30.0),
 (1.0, 60.0),
 (1.0, 541.0),
 (1.0, 543.0),
 (1.0, 179.0),
 (0.0, 218.0),
 (1.0, 532.0),
 (1.0, 12.0),
 (0.0, 115.0),
 (0.0, 104.0),
 (0.0, 581.0),
 (1.0, 119.0),
 (1.0, 302.0),
 (0.0, 793.0),
 (1.0, 496.0),
 (0.0, 131.0),
 (1.0, 883.0),
 (0.0, 195.0),
 (1.0, 666.0),
 (1.0, 327.0),
 (1.0, 276.0),
 (0.0, 181.0),
 (1.0, 394.0),
 (1.0, 141.0),
 (1.0, 772.0),
 (0.0, 218.0),
 (1.0, 385.0),
 (1.0, 112.0),
 (1.0, 99.0),
 (1.0, 154.0),
 (0.0, 4.0),
 (0.0, 294.0),
 (1.0, 148.0),
 (1.0, 33.0),
 (0.0, 229.0),
 (1.0, 5.0),
 (1.0, 124.0),
 (1.0, 501.0),
 (1.0, 213.0),
 (0.0, 214.0),
 (0.0, 137.0),
 (1.0, 202.0),
 (1.0, 343.0),
 (1.0, 244.0),
 (1.0, 567.0),
 (1.0, 144.0),
 (1.0, 242.0),
 (1.0, 54.0),
 (1.0, 198.0),
 (1.0, 164.0),
 (1.0, 161.0),
 (1.0, 165.0),
 (0.0, 541.0),
 (0.0, 868.0),
 (0.0, 953.0),
 (1.0, 285.0),
 (0.0, 286.0),
 (0.0, 452.0),
 (0.0, 167.0),
 (1.0, 37.0),
 (0.0, 48.0),
 (0.0, 241.0),
 (0.0, 71.0),
 (0.0, 588.0),
 (1.0, 146.0),
 (1.0, 505.0),
 (1.0, 0.0),
 (1.0, 555.0),
 (1.0, 1008.0),
 (1.0, 648.0),
 (1.0, 427.0),
 (1.0, 350.0),
 (1.0, 753.0),
 (1.0, 316.0),
 (1.0, 316.0),
 (1.0, 489.0),
 (1.0, 231.0),
 (1.0, 87.0),
 (1.0, 438.0),
 (1.0, 480.0),
 (0.0, 345.0),
 (0.0, 6.0),
 (1.0, 77.0),
 (0.0, 280.0),
 (0.0, 254.0),
 (0.0, 232.0),
 (0.0, 258.0),
 (0.0, 37.0),
 (1.0, 47.0),
 (0.0, 4.0),
 (0.0, 143.0),
 (0.0, 95.0),
 (0.0, 20.0),
 (0.0, 205.0),
 (0.0, 145.0),
 (0.0, 222.0),
 (0.0, 160.0),
 (1.0, 275.0),
 (1.0, 474.0),
 (0.0, 77.0),
 (0.0, 358.0),
 (0.0, 951.0),
 (1.0, 432.0),
 (1.0, 237.0),
 (0.0, 415.0),
 (1.0, 128.0),
 (0.0, 690.0),
 (1.0, 319.0),
 (1.0, 165.0),
 (1.0, 468.0),
 (1.0, 15.0),
 (0.0, 3.0),
 (1.0, 593.0),
 (1.0, 36.0),
 (1.0, 142.0),
 (1.0, 386.0),
 (1.0, 515.0),
 (1.0, 372.0),
 (1.0, 632.0),
 (1.0, 457.0),
 (1.0, 485.0),
 (1.0, 224.0),
 (0.0, 693.0),
 (1.0, 644.0),
 (0.0, 643.0),
 (1.0, 604.0),
 (1.0, 113.0),
 (1.0, 383.0),
 (0.0, 604.0),
 (1.0, 330.0),
 (1.0, 323.0),
 (1.0, 26.0),
 (1.0, 406.0),
 (0.0, 272.0),
 (1.0, 290.0),
 (1.0, 30.0),
 (1.0, 135.0),
 (1.0, 360.0),
 (1.0, 6.0),
 (1.0, 351.0),
 (1.0, 236.0),
 (0.0, 124.0),
 (0.0, 44.0),
 (0.0, 778.0),
 (0.0, 636.0),
 (0.0, 132.0),
 (0.0, 436.0),
 (0.0, 250.0),
 (0.0, 228.0),
 (1.0, 146.0),
 (1.0, 138.0),
 (1.0, 535.0),
 (1.0, 94.0),
 (1.0, 111.0),
 (1.0, 279.0),
 (1.0, 1458.0),
 (1.0, 77.0),
 (1.0, 328.0),
 (1.0, 508.0),
 (1.0, 100.0),
 (1.0, 82.0),
 (1.0, 346.0),
 (1.0, 519.0),
 (1.0, 254.0),
 (1.0, 638.0),
 (1.0, 147.0),
 (1.0, 153.0),
 (1.0, 7.0),
 (1.0, 282.0),
 (1.0, 235.0),
 (0.0, 1101.0),
 (0.0, 539.0),
 (0.0, 800.0),
 (0.0, 0.0)]

So I cannot understand why I get this error.
Thank you!
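The error message asks for a structured array, which is different from a plain list of tuples or a 2-D float array even when the values look the same when printed. A minimal sketch of the conversion, assuming y is currently a Python list of (event, time) tuples; the field names "event" and "time" are illustrative choices, not names required by the source:

```python
import numpy as np

# A few (event, time) tuples in the same layout as the list above.
pairs = [(1.0, 144.0), (1.0, 86.0), (0.0, 466.0), (0.0, 470.0)]

# Field names "event" and "time" are illustrative; the dtypes are what matter here:
# a boolean event indicator first, then a floating-point time.
y_struct = np.array(
    [(bool(event), time) for event, time in pairs],
    dtype=[("event", "?"), ("time", "<f8")],
)
```

Checking `y_struct.dtype` before calling fit shows whether the first field is boolean and the second numeric, which is what the message describes.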

Port to pytest

Numpy has moved from nose to pytest (see numpy/numpy#10856), we should do the same.

Question about what _get_survival_pairs() function does in NaiveSurvivalSVM

The _get_survival_pairs() function in NaiveSurvivalSVM is really interesting because it seems to convert the X data and y structured array into X_pairs and y_pairs that are suitable for running with a classification algorithm, in this case LinearSVC.

May I ask what kind of transformation it applies to make this possible? Using the veterans dataset, it looks like the number of samples increases considerably when running _get_survival_pairs():

X.shape
(137, 8)
y.shape
(137,)
X_pairs.shape
(8843, 8)
y_pairs.shape
(8843,)
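The shapes above are consistent with a pairwise transformation: 137 samples allow at most 137 * 136 / 2 = 9316 pairs, and 8843 is what could remain after dropping pairs whose order cannot be determined. The following is a generic sketch of that idea, not the library's code; `comparable_pairs` and the toy data are hypothetical.

```python
from itertools import combinations


def comparable_pairs(X, events, times):
    """Build difference vectors and +1/-1 labels from comparable sample pairs."""
    x_pairs, y_pairs = [], []
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # ties carry no ordering information in this sketch
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # the earlier sample is censored, so the order is unknown
        # Alternate the sign so that both classes appear in the training set.
        sign = 1 if len(y_pairs) % 2 == 0 else -1
        x_pairs.append([sign * (a - b) for a, b in zip(X[first], X[second])])
        y_pairs.append(sign)
    return x_pairs, y_pairs


# Toy data: three samples with one feature; the second sample is censored.
X_toy = [[1.0], [2.0], [3.0]]
events_toy = [True, False, True]
times_toy = [5.0, 10.0, 15.0]
x_pairs_toy, y_pairs_toy = comparable_pairs(X_toy, events_toy, times_toy)
```

A linear classifier trained on such difference vectors learns a weight vector that orders samples by risk, which is why a classification algorithm can be reused for a ranking problem.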

NaiveSurvivalSVM() throws value error when fitting

I'm going through the tutorial using the veterans lung cancer study, and when fitting NaiveSurvivalSVM() the error below is thrown. The other SVM models work with the same syntax. The X and y values are the same as in the tutorial.

File "u:\Projects.Experiment\scikit-survival\Main.py", line 482, in main
  model.fit(df_x_numeric, array_y)
File "C:\Program Files\Anaconda3\envs\test\lib\site-packages\sksurv\svm\naive_survival_svm.py", line 153, in fit
  x_pairs, y_pairs = self._get_survival_pairs(X, y, random_state)
File "C:\Program Files\Anaconda3\envs\test\lib\site-packages\sksurv\svm\naive_survival_svm.py", line 126, in _get_survival_pairs
  x_pairs.resize((k, X.shape[1]))
ValueError: cannot resize an array that references or is referenced by another array in this way. Use the resize function
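The error message itself points at the usual workaround: instead of resizing the array in place with the `ndarray.resize` method, create a new array of the target size. A small sketch with a made-up buffer (the names `x_pairs` and `k` mirror the traceback, the values are hypothetical):

```python
import numpy as np

# Hypothetical buffer allocated for the maximum number of pairs.
x_pairs = np.arange(18.0).reshape(6, 3)
k = 4  # number of rows actually filled

# Slicing plus copy returns a new array of the right size and does not
# depend on the reference check that the in-place ndarray.resize performs.
x_trimmed = x_pairs[:k].copy()

# numpy.resize (the function) also returns a new array instead of resizing in place.
x_resized = np.resize(x_pairs, (k, x_pairs.shape[1]))
```

Both variants leave the original buffer untouched, so other references to it stay valid.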

Generating concordance index vs alpha plot (CoxnetSurvivalAnalysis)

Do you by any chance have the full code to generate the graph on the 13th slide of this presentation?

https://k-d-w.org/slides/pyconuk-2017/

I tried to replicate the code on slide 12 but it doesn't converge and also doesn't show how to calculate the concordance index for each of the alphas. Was this done manually with code that's not yet incorporated into scikit-survival? If so, would you be able to share the code?

Any help would be much appreciated.

Edit: Never mind, figured it out.

Is Ensemble = Random Survival Forest?

Dear,

Thank you for this great package.

Is the ensemble module the same as a Random Survival Forest, e.g. the one provided in R?
Do you have an example or documentation on how to use it? I cannot make it work for feature selection in a high dimensional space (genes).

problem when run setup.py install

Hi, when I ran the command 'python setup.py install', I got a lookup error.

It shows:

setuptools-scm was unable to detect version for ...

Make sure you are either building from a fully intact git repository or PyPI tarballs...

Has anyone had the same problem? Could anyone give me a hint on how to fix it?

Thanks in advance

How can I save GradientBoostingSurvivalAnalysis model without pickle?

Add support for time varying features to Cox model

Is it possible to use time varying variables? E.g. if we want to use temperature as a variable, could we do that? How would we structure the data?

predict_cumulative_hazard_function(X) in CoxPHSurvivalAnalysis return the same function for all entries.

I have fitted a CoxPHSurvivalAnalysis model and am using it to calculate the survival function and the cumulative hazard function.
pred_survival = estimator.predict_survival_function(X_test_tmp)
The survival function for different entries look quite good like this:

pred_hazard = estimator.predict_cumulative_hazard_function(X_test_tmp)
However, the cumulative hazard functions for all entries are the same:

Any suggestions? Thank you.

For sklearn 0.19, cvxpy install differs for me

It seems I need to install cvxpy separately rather than from requirements.txt.

I followed the instructions and got:

(sksurv) scikit-survival$ conda install --file requirements.txt
Fetching package metadata ...........

PackageNotFoundError: Packages missing in current channels:

  - cvxpy

We have searched for the packages in the following channels:

  - https://repo.continuum.io/pkgs/main/osx-64
  - https://repo.continuum.io/pkgs/main/noarch
  - https://repo.continuum.io/pkgs/free/osx-64
  - https://repo.continuum.io/pkgs/free/noarch
  - https://repo.continuum.io/pkgs/r/osx-64
  - https://repo.continuum.io/pkgs/r/noarch
  - https://repo.continuum.io/pkgs/pro/osx-64
  - https://repo.continuum.io/pkgs/pro/noarch

so instead, I do:

set up blank environment sksurv
into that env, install cvxpy
comment out cvxpy from this repo's requirements.txt
now, conda install --file requirements.txt and
python setup.py install [for me, fails the first time - need to install Cython manually]

(FWIW)

macOS Sierra, Bash

Support for python 3.7

The latest version of Anaconda, 5.3.0, ships with python 3.7. scikit-survival will not install in Anaconda 5.3.0 because the cvxpy recipe requires python <= 3.6.

"sksurv/linear_model/src/coxnet/coxnet.h:18:10: fatal error: Eigen/Core: No such file or directory" when compiling scikit-survival from source

I was following instructions to install scikit-survival from source (because I want to use it with scikit-learn 0.20.0) and ran into the following error when running python setup.py install after creating sksurv environment and dependencies and activating it.

In file included from sksurv/linear_model/src/coxnet_wrapper.h:18,
                 from sksurv/linear_model/_coxnet.cpp:656:
sksurv/linear_model/src/coxnet/coxnet.h:18:10: fatal error: Eigen/Core: No such file or directory
 #include <Eigen/Core>
          ^~~~~~~~~~~~
compilation terminated.

Looks like a missing dependency?

Implement Cox model with elastic net penalty

The glmnet R package implements a coordinate descent algorithm to fit Cox's proportional hazards model with elastic net penalty as described in Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent. Adding a class similar to sklearn.linear_model.ElasticNet that implements this algorithm would be nice.

Example using `predict()`

More of a feature request than an issue, but I'm trying to use sksurv for some predictive modeling and am having a hard time interpreting the predict output for most of the estimators. For example, with GradientBoostingSurvivalAnalysis the predict(X) output is described as the hazard for X. The output from my test data, however, is often negative which doesn't make sense for a traditional hazard rate. Is there any way to include a predict step in your current example notebook?

More than anything, I would like to be able to predict the survival duration of items not in my training data (like the predict_expected in the lifelines library). If that functionality doesn't exist, I would be happy to submit a pull request to that end if you can advise me a bit on how to interpret the current predict output.

And a million thanks for making this code available and easy to install/use. This is cool stuff!
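On the negative values: assuming the predictions are risk scores on an arbitrary scale (an assumption for this sketch, not a statement taken from the documentation), only their ordering matters for a rank-based metric such as the concordance index, so negative numbers are not a contradiction in themselves. The pure-Python function below is a simplified illustration with made-up data, not the library's metric.

```python
def concordance_index(events, times, risk_scores):
    """Fraction of comparable pairs whose predicted risk order matches the observed order."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored sample cannot be the earlier one in a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # higher risk for the earlier event
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


# Hypothetical data: higher score means higher predicted risk.
events = [True, True, False, True]
times = [2.0, 4.0, 6.0, 8.0]
scores = [1.5, -0.2, -1.0, -3.0]
c_raw = concordance_index(events, times, scores)
# Shifting all scores by a constant leaves the ranking, and the index, unchanged.
c_shifted = concordance_index(events, times, [s + 10.0 for s in scores])
```

A score of this kind ranks samples but does not by itself give an expected survival duration; that would need an estimate of the survival function.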

Package imports error in SVM survival model

This might be specific to my package configuration, but when I try to run "from sksurv.svm import survival_svm", I get an import error in the survival_svm.py file: "ImportError: No module named 'sksurv.svm._prsvm'", with the error coming from the line "from .survival_svm import FastKernelSurvivalSVM, FastSurvivalSVM" in the __init__.py file of the same directory.

Generic estimator of survival and cumulative hazard functions

It seems the ensemble methods don't have these methods implemented. How do we generate survival function and cumulative hazard functions for the ensemble based survival methods?

Some initial questions after getting started

First of all, I really love this library, thank you for writing it! I've written code using it for the first time and I just have a few questions on some details. Apologies if some of these deal with survival analysis concepts that I am new to.

In Feature-Selection:-Which-Variable-is-Most-Predictive? in your feature scoring function for SelectKBest you run CoxPHSurvivalAnalysis() on each individual feature to get its c-index score. In scikit-learn, when you use SelectKBest(), your scoring function is normally meant to score and rank all the input features at once. Why not simply run CoxPHSurvivalAnalysis() on all the input features to get their scores in the feature selection step in the grid search?
I want to perform survival analysis using non-clinical feature data such as gene expression or other omics data. For such types of numeric continuous data with different scales most classification algorithms require that you standardize the data before. Do the CoxPH, Coxnet, and survival SVM methods also need such data standardized before use?
Since omics data typically has very high dimensionality, it is normally important to do feature selection. I've done this in the context of classification and there are many methods that can perform feature selection efficiently on 20,000 or more features. With survival analysis methods as feature selectors I thought it might not be the way to go? Would you have any recommendation on high dimensional data in survival analysis?
How does one do survival analysis using scikit-survival with multiple feature types? For example, combining gene expression and clinical features into one model? Ensemble methods aren't really intended to work for multiple feature type data on the same samples.

NaiveSurvivalSVM() bug with random_state default value and int seed

If you instantiate NaiveSurvivalSVM() with default values:

model = NaiveSurvivalSVM()

and you then call fit you get the following error:

~/soft/anaconda3/lib/python3.6/site-packages/sksurv/svm/naive_survival_svm.py in _get_survival_pairs(self, X, y, random_state)
    104 
    105         idx = numpy.arange(X.shape[0], dtype=int)
--> 106         random_state.shuffle(idx)
    107 
    108         n_pairs = int(comb(X.shape[0], 2))

AttributeError: 'NoneType' object has no attribute 'shuffle'

It also doesn't work if you instantiate with a random_state seed integer.

AttributeError: 'int' object has no attribute 'shuffle'

The code only wants a numpy RandomState instance.
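A common way to accept None, an integer seed, or a RandomState is to normalise the argument before use; scikit-learn provides `sklearn.utils.check_random_state` for this purpose. The sketch below reproduces the idea with numpy only; the helper name `as_random_state` is hypothetical and this is not the fix that was applied in the library.

```python
import numbers

import numpy as np


def as_random_state(seed):
    """Turn None, an int, or a RandomState into a numpy RandomState instance."""
    if seed is None:
        return np.random.RandomState()
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState")


idx = np.arange(10, dtype=int)
as_random_state(0).shuffle(idx)  # works for an int seed
as_random_state(None).shuffle(idx)  # and for the default value
```

Calling such a helper at the start of fit would let both the default value and an integer seed reach `shuffle` as a RandomState instance.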