2020plus's People

Contributors

ctokheim

2020plus's Issues

Regarding input file

Hi,

I noticed that variants on chrX and chrY are labeled chr23 and chr24 in the example input file (bladder.txt). Is this recommended practice for a user dataset?

regards,
Sigve
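
For anyone comparing their own input against the example file, here is a minimal sketch (not part of 20/20+) that tallies the chromosome labels in a tab-delimited mutation file, so that numeric labels such as chr23 or chr24 are easy to spot. The column name `Chromosome` and the helper name `count_chromosome_labels` are assumptions; adjust them to match your file.

```python
import csv
from collections import Counter


def count_chromosome_labels(path, column="Chromosome"):
    """Count how often each chromosome label occurs in a tab-delimited file."""
    counts = Counter()
    with open(path, newline="") as handle:
        # DictReader uses the first line as the header row.
        for row in csv.DictReader(handle, delimiter="\t"):
            counts[row[column]] += 1
    return counts
```

Calling `count_chromosome_labels("data/bladder.txt")` and printing the sorted items would show which convention a given file follows.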

Error in rule simMaf

I ran into a problem when running the following command:
$ snakemake -s Snakefile pretrained_predict -p --cores 1
--config mutations="/home/kjf/data/TCGA.LUAD.mutect.0458c57f-316c-4a7c-9294-ccd11c97c2f9.DR-10.0.somatic.maf" output_dir="output_bladder" trained_classifier="/home/kjf/data/2020plus_10k.Rdata"

The output is:
(2020plus) [kjf@localhost 2020plus-1.2.2]$ snakemake -s Snakefile pretrained_predict -p --cores 1 \
 --config mutations="/home/kjf/data/TCGA.LUAD.mutect.0458c57f-316c-4a7c-9294-ccd11c97c2f9.DR-10.0.somatic.maf" output_dir="output_bladder" trained_classifier="/home/kjf/data/2020plus_10k.Rdata"

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 features
1 finishSim
1 og
1 predict_test
1 pretrained_predict
10 simFeatures
10 simMaf
10 simOg
10 simSummary
10 simTsg
1 summary
1 tsg
57

[Sun Mar 31 14:20:18 2019]
rule simMaf:
input: /home/kjf/data/TCGA.LUAD.mutect.0458c57f-316c-4a7c-9294-ccd11c97c2f9.DR-10.0.somatic.maf
output: output_bladder/simulated_summary/chasm_sim_maf1.txt
jobid: 47
wildcards: iter=1

mut_annotate --log-level=INFO -b data//snvboxGenes.bed -i data//snvboxGenes.fa -c 1.5 -m /home/kjf/data/TCGA.LUAD.mutect.0458c57f-316c-4a7c-9294-ccd11c97c2f9.DR-10.0.somatic.maf -p 0 -n 1 --maf --seed=$((1*42)) -r 3 --unique -o output_bladder/simulated_summary/chasm_sim_maf1.txt
Traceback (most recent call last):
File "/home/kjf/anaconda3/envs/2020plus/bin/mut_annotate", line 6, in <module>
from prob2020.console.annotate import cli_main
File "/home/kjf/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/annotate.py", line 12, in <module>
import prob2020.cython.cutils as cutils
File "numpy.pxd", line 861, in init prob2020.cython.cutils (prob2020/cython/cutils.cpp:19495)
ValueError: numpy.ufunc has the wrong size, try recompiling
[Sun Mar 31 14:20:19 2019]
Error in rule simMaf:
jobid: 47
output: output_bladder/simulated_summary/chasm_sim_maf1.txt

RuleException:
CalledProcessError in line 135 of /home/kjf/2020plus-1.2.2/Snakefile:
Command 'set -euo pipefail; mut_annotate --log-level=INFO -b data//snvboxGenes.bed -i data//snvboxGenes.fa -c 1.5 -m /home/kjf/data/TCGA.LUAD.mutect.0458c57f-316c-4a7c-9294-ccd11c97c2f9.DR-10.0.somatic.maf -p 0 -n 1 --maf --seed=$((1*42)) -r 3 --unique -o output_bladder/simulated_summary/chasm_sim_maf1.txt' returned non-zero exit status 1.
File "/home/kjf/2020plus-1.2.2/Snakefile", line 135, in __rule_simMaf
File "/home/kjf/anaconda3/envs/2020plus/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/kjf/2020plus-1.2.2/.snakemake/log/2019-03-31T142018.562491.snakemake.log

Could you please help with this?
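
The `ValueError` above is raised while importing the compiled module `prob2020.cython.cutils`, and the message itself suggests recompiling. This usually means the extension was built against a different NumPy than the one now installed. Below is a small sketch, using only the standard library, for reproducing the import outside Snakemake and printing the installed NumPy version alongside any error. The helper name `try_import` is made up for illustration.

```python
import importlib


def try_import(module_name):
    """Import a module by name and return (ok, detail) instead of raising."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # a ValueError in the log above, ImportError elsewhere
        return False, "{0}: {1}".format(type(exc).__name__, exc)
    return True, getattr(module, "__version__", "version not reported")


# Check NumPy first, then the compiled extension that failed in the log.
for name in ("numpy", "prob2020.cython.cutils"):
    ok, detail = try_import(name)
    print(name, "OK" if ok else "FAILED", detail)
```

If NumPy imports cleanly but the extension fails with the same `ValueError`, reinstalling or rebuilding probabilistic2020 inside the same environment is the usual next step.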

Exception: 'rf_clf' not found

When running the sub-command 'classify' of 2020plus.py, I encounter the following error:


AN ERROR HAS OCCURRED: check the log file


Type: <type 'exceptions.LookupError'>
Exception: 'rf_clf' not found
Traceback:
File "/home/lixiangchun/.work/database/2020plus/2020plus-master/2020plus.py", line 341, in <module>
args.func() # run function corresponding to user's command
File "/home/lixiangchun/.work/database/2020plus/2020plus-master/2020plus.py", line 37, in _classify
src.classify.python.classifier.main(opts) # run code
File "/home/lixiangchun/.work/database/2020plus/2020plus-master/src/classify/python/classifier.py", line 186, in main
rrclf.clf.load(cli_opts['trained_classifier'])
File "/home/lixiangchun/.work/database/2020plus/2020plus-master/src/classify/python/r_random_forest_clf.py", line 138, in load
self.rf = ro.r["rf_clf"]
File "/home/lixiangchun/.work/software/install/anaconda2/lib/python2.7/site-packages/rpy2/robjects/__init__.py", line 341, in __getitem__
res = _globalenv.get(item)

I found that "rf_clf" is not defined in r_random_forest_clf.py or in the Rdata files (i.e. 2020plus_10k.Rdata).

I hope someone can help me fix this.

Xiangchun

Dependencies problem

Hi,

I am trying to run 20/20+ on a Linux machine with the following command:

snakemake -s Snakefile pretrained_predict -p --cores 40 --config mutations="/mnt/scratch/melichv/2020+/data/mutations_maftools.maf" data_dir="/mnt/scratch/melichv/2020+/data/" output_dir="/mnt/scratch/melichv/2020+/output/" trained_classifier="/mnt/scratch/melichv/2020+/data/2020plus_10k.Rdata"

At first, I got this error:

Failed to import the site module
Traceback (most recent call last):
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site.py", line 545, in <module>
    main()
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site.py", line 531, in main
    known_paths = addusersitepackages(known_paths)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site.py", line 282, in addusersitepackages
    user_site = getusersitepackages()
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site.py", line 258, in getusersitepackages
    user_base = getuserbase() # this will also set USER_BASE
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site.py", line 248, in getuserbase
    USER_BASE = get_config_var('userbase')
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/sysconfig.py", line 609, in get_config_var
    return get_config_vars().get(name)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/sysconfig.py", line 558, in get_config_vars
    _init_posix(_CONFIG_VARS)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/sysconfig.py", line 429, in _init_posix
    _temp = __import__(name, globals(), locals(), ['build_time_vars'], 0)
ModuleNotFoundError: No module named '_sysconfigdata_x86_64_conda_linux_gnu'

I fixed this with the solution posted here: https://stackoverflow.com/a/68685847. But then I encountered another problem that I do not know how to solve.

Traceback (most recent call last):
  File "/home/melichv/miniconda3/envs/2020plus/bin/mut_annotate", line 5, in <module>
    from prob2020.console.annotate import cli_main
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/annotate.py", line 12, in <module>
    import prob2020.cython.cutils as cutils
  File "__init__.pxd", line 918, in init prob2020.cython.cutils
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject

It is probably a problem with the version of some package. Could you please update the installation procedure?

Conda env:

# packages in environment at /home/melichv/miniconda3/envs/2020plus:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
_openmp_mutex             5.1                       1_gnu
_r-mutex                  1.0.0               anacondar_1
_sysroot_linux-64_curr_repodata_hack 3                   haa98f57_10
bcrypt                    3.2.0            py36h7b6447c_0
binutils_impl_linux-64    2.35.1               h27ae35d_9
binutils_linux-64         2.35.1              h454624a_30
blas                      1.0                         mkl
brotlipy                  0.7.0           py36h27cfd23_1003
bwidget                   1.9.11                        1
bzip2                     1.0.8                h7b6447c_0
c-ares                    1.19.1               h5eee18b_0
ca-certificates           2023.12.12           h06a4308_0
cairo                     1.14.12              h8948797_3
certifi                   2021.5.30        py36h06a4308_0
cffi                      1.14.0           py36h2e261b9_0
charset-normalizer        2.0.4              pyhd3eb1b0_0
cryptography              35.0.0           py36hd23ed53_0
curl                      7.67.0               hbc83047_0
cycler                    0.11.0             pyhd3eb1b0_0
dbus                      1.13.18              hb2f20db_0
docutils                  0.17.1           py36h06a4308_1
dropbox                   5.2.1                    py36_0    bioconda
expat                     2.5.0                h6a678d5_0
filechunkio               1.6                      py36_0    bioconda
fontconfig                2.14.1               h52c9d5c_1
freetype                  2.12.1               h4a9f257_0
fribidi                   1.0.10               h7b6447c_0
ftputil                   3.2                      py36_0    bioconda
gcc_impl_linux-64         7.5.0               h7105cf2_17
gcc_linux-64              7.5.0               h8f34230_30
gfortran_impl_linux-64    7.5.0               ha8c8e06_17
gfortran_linux-64         7.5.0               h96bb648_30
giflib                    5.2.1                h5eee18b_3
glib                      2.63.1               h5a9c865_0
graphite2                 1.3.14               h295c915_1
gsl                       2.4                  h14c3975_4
gst-plugins-base          1.14.0               hbbd80ab_1
gstreamer                 1.14.0               hb453b48_1
gxx_impl_linux-64         7.5.0               h0a5bf11_17
gxx_linux-64              7.5.0               hffc177d_30
harfbuzz                  1.8.8                hffaf4a1_0
icu                       58.2                 he6710b0_3
idna                      3.3                pyhd3eb1b0_0
intel-openmp              2022.1.0          h9e868ea_3769
jinja2                    3.0.3              pyhd3eb1b0_0
jpeg                      9e                   h5eee18b_1
kernel-headers_linux-64   3.10.0              h57e8cba_10
kiwisolver                1.3.1            py36h2531618_0
krb5                      1.16.4               h173b8e3_0
lcms2                     2.12                 h3be6417_0
ld_impl_linux-64          2.35.1               h7274673_9
libcurl                   7.67.0               h20c2e04_0
libdeflate                1.0                  h14c3975_1    bioconda
libedit                   3.1.20230828         h5eee18b_0
libev                     4.33                 h7f8727e_1
libffi                    3.2.1             hf484d3e_1007
libgcc-devel_linux-64     7.5.0               hbbeae57_17
libgcc-ng                 11.2.0               h1234567_1
libgfortran-ng            7.5.0               ha8ba4b0_17
libgfortran4              7.5.0               ha8ba4b0_17
libgomp                   11.2.0               h1234567_1
libnghttp2                1.52.0               ha637b67_1
libpng                    1.6.39               h5eee18b_0
libsodium                 1.0.18               h7b6447c_0
libssh2                   1.10.0               h37d81fd_2
libstdcxx-devel_linux-64  7.5.0               hf0c5c8d_17
libstdcxx-ng              11.2.0               h1234567_1
libtiff                   4.2.0                hecacb30_2
libuuid                   1.41.5               h5eee18b_0
libwebp                   1.2.4                h11a3e52_1
libwebp-base              1.2.4                h5eee18b_1
libxcb                    1.15                 h7f8727e_0
libxml2                   2.9.14               h74e7548_0
lz4-c                     1.9.4                h6a678d5_0
make                      4.2.1                h1bed415_1
markupsafe                2.0.1            py36h27cfd23_0
matplotlib                3.3.2                h06a4308_0
matplotlib-base           3.3.2            py36h817c723_0
mkl                       2018.0.3                      1
mkl_fft                   1.0.6            py36h7dd41cf_0
mkl_random                1.0.1            py36h4414c95_1
ncurses                   6.4                  h6a678d5_0
numpy                     1.15.4           py36h1d66e8a_0
numpy-base                1.15.4           py36h81de0dd_0
olefile                   0.46               pyhd3eb1b0_0
openssl                   1.1.1w               h7f8727e_0
pandas                    0.25.3           py36he6710b0_0
pango                     1.42.4               h049681c_0
paramiko                  2.8.1              pyhd3eb1b0_0
pcre                      8.45                 h295c915_0
pillow                    8.3.1            py36h5aabda8_0
pip                       21.2.2           py36h06a4308_0
pixman                    0.40.0               h7f8727e_1
probabilistic2020         1.2.3                    pypi_0    pypi
psutil                    5.8.0            py36h27cfd23_1
pycparser                 2.21               pyhd3eb1b0_0
pynacl                    1.4.0            py36h7b6447c_1
pyopenssl                 22.0.0             pyhd3eb1b0_0
pyparsing                 3.0.4              pyhd3eb1b0_0
pyqt                      5.9.2            py36h05f1152_2
pysam                     0.15.3           py36hda2845c_1    bioconda
pysftp                    0.2.9                    py36_0    bioconda
pysocks                   1.7.1            py36h06a4308_0
python                    3.6.10               h191fe78_1
python-dateutil           2.8.2              pyhd3eb1b0_0
pytz                      2021.3             pyhd3eb1b0_0
pyyaml                    5.4.1            py36h27cfd23_1
qt                        5.9.7                h5867ecd_1
r                         3.6.0                     r36_0
r-assertthat              0.2.1             r36h6115d3f_0
r-base                    3.6.1                h9bb98a2_1
r-bh                      1.69.0_1          r36h6115d3f_0
r-bit                     1.1_14            r36h96ca727_0
r-bit64                   0.9_7             r36h96ca727_0
r-blob                    1.1.1             r36h6115d3f_0
r-boot                    1.3_20            r36h6115d3f_0
r-class                   7.3_15            r36h96ca727_0
r-cli                     1.1.0             r36h6115d3f_0
r-cluster                 2.0.8             r36ha65eedd_0
r-codetools               0.2_16            r36h6115d3f_0
r-crayon                  1.3.4             r36h6115d3f_0
r-dbi                     1.0.0             r36h6115d3f_0
r-dbplyr                  1.4.0             r36h6115d3f_0
r-digest                  0.6.18            r36h96ca727_0
r-dplyr                   0.8.0.1           r36h29659fb_0
r-fansi                   0.4.0             r36h96ca727_0
r-foreign                 0.8_71            r36h96ca727_0
r-glue                    1.3.1             r36h96ca727_0
r-kernsmooth              2.23_15           r36ha65eedd_4
r-lattice                 0.20_38           r36h96ca727_0
r-magrittr                1.5               r36h6115d3f_4
r-mass                    7.3_51.3          r36h96ca727_0
r-matrix                  1.2_17            r36h96ca727_0
r-memoise                 1.1.0             r36h6115d3f_0
r-mgcv                    1.8_28            r36h96ca727_0
r-nlme                    3.1_139           r36ha65eedd_0
r-nnet                    7.3_12            r36h96ca727_0
r-pillar                  1.3.1             r36h6115d3f_0
r-pkgconfig               2.0.2             r36h6115d3f_0
r-plogr                   0.2.0             r36h6115d3f_0
r-prettyunits             1.0.2             r36h6115d3f_0
r-purrr                   0.3.2             r36h96ca727_0
r-r6                      2.4.0             r36h6115d3f_0
r-randomforest            4.6_14            r36ha65eedd_0
r-rcpp                    1.0.1             r36h29659fb_0
r-recommended             3.6.0                     r36_0
r-rlang                   0.3.4             r36h96ca727_0
r-rpart                   4.1_15            r36h96ca727_0
r-rsqlite                 2.1.1             r36h29659fb_0
r-spatial                 7.3_11            r36h96ca727_4
r-survival                2.44_1.1          r36h96ca727_0
r-tibble                  2.1.1             r36h96ca727_0
r-tidyselect              0.2.5             r36h29659fb_0
r-utf8                    1.1.4             r36h96ca727_0
readline                  7.0                  h7b6447c_5
requests                  2.27.1             pyhd3eb1b0_0
rpy2                      2.9.4           py36r36h481b005_0
scikit-learn              0.19.2           py36h4989274_0
scipy                     0.19.1           py36h9976243_3
setuptools                58.0.4           py36h06a4308_0
sip                       4.19.8           py36hf484d3e_0
six                       1.16.0             pyhd3eb1b0_1
snakemake                 3.13.3                   py36_0    bioconda
sqlite                    3.33.0               h62c20be_0
sysroot_linux-64          2.17                h57e8cba_10
tbb                       2021.8.0             hdb19cb5_0
tbb4py                    2021.3.0         py36hd09550d_0
tk                        8.6.12               h1ccaba5_0
tktable                   2.10                 h14c3975_0
tornado                   6.1              py36h27cfd23_0
tzlocal                   2.1                      py36_0
urllib3                   1.26.8             pyhd3eb1b0_0
wheel                     0.37.1             pyhd3eb1b0_0
wrapt                     1.12.1           py36h7b6447c_1
xz                        5.2.10               h5eee18b_1
yaml                      0.2.5                h7b6447c_0
zlib                      1.2.13               h5eee18b_0
zstd                      1.5.5                hc292b87_0
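
One thing the listing above shows is that `probabilistic2020` was installed from PyPI while `numpy` 1.15.4 is a conda build. A compiled extension built against a newer NumPy than the one present at run time is a common source of the "size changed" error, though the listing alone does not prove that is what happened here. Below is a rough sketch for pulling such lines out of saved `conda list` output. The function name `parse_conda_list` is hypothetical, and the parser assumes the whitespace-separated layout shown above.

```python
def parse_conda_list(text):
    """Map package name -> (version, build, channel) from `conda list` text."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the comment header
        if len(fields) < 3:
            continue  # not a package row in the expected layout
        name, version, build = fields[:3]
        channel = fields[3] if len(fields) > 3 else ""  # "" when no channel is shown
        packages[name] = (version, build, channel)
    return packages


# Two rows copied from the listing above.
listing = """\
# Name                    Version                   Build  Channel
numpy                     1.15.4           py36h1d66e8a_0
probabilistic2020         1.2.3                    pypi_0    pypi
"""
for name, info in parse_conda_list(listing).items():
    print(name, *info)
```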

Error in load("data/2020plus_10k.Rdata") : error reading from connection

Hi KarchinLab:
This error occurred when 2020plus was about to finish running.
2020plus_10k.Rdata is already located in the data directory: ../../2020plus/2020plus-1.2.3/data/2020plus_10k.Rdata

`[Sat Dec 11 05:30:02 2021]
rule predict_test:
input: data/2020plus_10k.Rdata, 2021.12.10_2020plus_1/features.txt, 2021.12.10_2020plus_1/simulated_summary/simulated_features.txt
output: 2021.12.10_2020plus_1/pretrained_output/results/r_random_forest_prediction.txt
jobid: 1
resources: tmpdir=/tmp

    python `which 2020plus.py` --log-level=INFO classify --trained-classifier data/2020plus_10k.Rdata --null-distribution 2021.12.10_2020plus_1/simulated_null_dist.txt --fe>
    python `which 2020plus.py` --out-dir 2021.12.10_2020plus_1/pretrained_output --log-level=INFO classify -n 200 --trained-classifier data/2020plus_10k.Rdata -d .7 -o 1.0 >

Version: 1.2.3
Command: /home/data/vip13t22/wes_cancer/biosoft/2020plus/2020plus-1.2.3//2020plus.py --log-level=INFO classify --trained-classifier data/2020plus_10k.Rdata --null-distribution >
Running Random forest . . .
Type: <class 'rpy2.rinterface.RRuntimeError'>
Exception: Error in load("data/2020plus_10k.Rdata") : error reading from connection

Traceback:
File "/home/data/vip13t22/wes_cancer/biosoft/2020plus/2020plus-1.2.3//2020plus.py", line 275, in <module>
args.func() # run function corresponding to user's command
File "/home/data/vip13t22/wes_cancer/biosoft/2020plus/2020plus-1.2.3//2020plus.py", line 37, in _classify
src.classify.python.classifier.main(opts) # run code
File "/home/data/vip13t22/wes_cancer/biosoft/2020plus/2020plus-1.2.3/src/classify/python/classifier.py", line 184, in main
rrclf.clf.load_cv(cli_opts['trained_classifier'])
File "/home/data/vip13t22/wes_cancer/biosoft/2020plus/2020plus-1.2.3/src/classify/python/r_random_forest_clf.py", line 164, in load_cv
ro.r('load("{0}")'.format(path))
File "/home/data/vip13t22/miniconda3/envs/snakemake/lib/python3.6/site-packages/rpy2/robjects/__init__.py", line 321, in __call__
res = self.eval(p)
File "/home/data/vip13t22/miniconda3/envs/snakemake/lib/python3.6/site-packages/rpy2/robjects/functions.py", line 178, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "/home/data/vip13t22/miniconda3/envs/snakemake/lib/python3.6/site-packages/rpy2/robjects/functions.py", line 106, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)


AN ERROR HAS OCCURRED: check the log file


[Sat Dec 11 05:30:05 2021]
Error in rule predict_test:
jobid: 1
output: 2021.12.10_2020plus_1/pretrained_output/results/r_random_forest_prediction.txt
shell:

    python `which 2020plus.py` --log-level=INFO classify --trained-classifier data/2020plus_10k.Rdata --null-distribution 2021.12.10_2020plus_1/simulated_null_dist.txt --fe>
    python `which 2020plus.py` --out-dir 2021.12.10_2020plus_1/pretrained_output --log-level=INFO classify -n 200 --trained-classifier data/2020plus_10k.Rdata -d .7 -o 1.0 >
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/data/vip13t22/wes_cancer/biosoft/2020plus/2020plus-1.2.3/.snakemake/log/2021-12-10T142639.085265.snakemake.log`
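
`error reading from connection` is raised by R's `load()` itself, so the file on disk is worth checking before anything else. An incomplete download or a placeholder file saved in place of the real data are possible causes, though nothing in the log confirms either. The sketch below checks whether a file looks like an intact R workspace. It handles only gzip-compressed files (R's default for `save()`) and uncompressed files, and `check_rdata` is an illustrative name, not part of 20/20+.

```python
import gzip
import zlib

# Leading bytes of an R workspace written by save(): XDR, ASCII, or native binary.
RDATA_HEADERS = (b"RDX2\n", b"RDX3\n", b"RDA2\n", b"RDA3\n", b"RDB2\n", b"RDB3\n")


def check_rdata(path):
    """Return (ok, message) for a gzip-compressed or uncompressed .Rdata file."""
    try:
        with open(path, "rb") as fh:
            magic = fh.read(5)
    except OSError as exc:
        return False, "cannot open file: {0}".format(exc)
    if magic[:2] == b"\x1f\x8b":  # gzip magic number
        try:
            with gzip.open(path, "rb") as fh:
                head = fh.read(5)
                # Read to the end so a truncated stream is detected.
                while fh.read(1 << 20):
                    pass
        except (OSError, EOFError, zlib.error) as exc:
            return False, "gzip stream is damaged or truncated: {0}".format(exc)
    else:
        head = magic  # bzip2/xz-compressed files are not handled in this sketch
    if head not in RDATA_HEADERS:
        return False, "unexpected header {0!r}".format(head)
    return True, "header and compression look intact"
```

Running `check_rdata("data/2020plus_10k.Rdata")` from the 2020plus directory would report whether the file is readable to the end.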

pandas.errors.EmptyDataError: No columns to parse from file

Hi,

I am receiving the following error during my run:

probabilistic2020 --log-level=INFO oncogene -c 1.5 -n 10000 -b data//snvboxGenes.bed -m output_fl/simulated_summary/chasm_sim_maf1.txt -i data//snvboxGenes.fa -p 4 --score-dir=data//scores -r 3 -o output_fl/simulated_summary/oncogene_sim1.txt
Version: 1.2.0
Command: /cluster/home/sigven/SOFTWARE/2020plus/bin/bin/probabilistic2020 --log-level=INFO oncogene -c 1.5 -n 10000 -b data//snvboxGenes.bed -m output_fl/simulated_summary/chasm_sim_maf1.txt -i data//snvboxGenes.fa -p 4 --score-dir=data//scores -r 3 -o output_fl/simulated_summary/oncogene_sim1.txt
Traceback (most recent call last):
File "/cluster/home/sigven/SOFTWARE/2020plus/bin/bin/probabilistic2020", line 11, in <module>
sys.exit(cli_main())
File "/usit/abel/u1/sigven/SOFTWARE/2020plus/lib/python3.6/site-packages/prob2020/console/probabilistic2020.py", line 284, in cli_main
main(opts)
File "/usit/abel/u1/sigven/SOFTWARE/2020plus/lib/python3.6/site-packages/prob2020/console/probabilistic2020.py", line 229, in main
result_df = rt.main(opts, mutation_df)
File "/usit/abel/u1/sigven/SOFTWARE/2020plus/lib/python3.6/site-packages/prob2020/console/randomization_test.py", line 366, in main
mut_df = pd.read_csv(opts['mutations'], sep='\t')
File "/usit/abel/u1/sigven/anaconda2/envs/2020plus/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usit/abel/u1/sigven/anaconda2/envs/2020plus/lib/python3.6/site-packages/pandas/io/parsers.py", line 440, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usit/abel/u1/sigven/anaconda2/envs/2020plus/lib/python3.6/site-packages/pandas/io/parsers.py", line 787, in __init__
self._make_engine(self.engine)
File "/usit/abel/u1/sigven/anaconda2/envs/2020plus/lib/python3.6/site-packages/pandas/io/parsers.py", line 1014, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usit/abel/u1/sigven/anaconda2/envs/2020plus/lib/python3.6/site-packages/pandas/io/parsers.py", line 1708, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 542, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
Error in job simOg while creating output file output_fl/simulated_summary/oncogene_sim1.txt.

Do you have any idea what is going wrong?
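
`EmptyDataError` is what `pd.read_csv` raises when a file has nothing to parse, so the file passed with `-m` (`output_fl/simulated_summary/chasm_sim_maf1.txt`) was probably written empty by an earlier step. Here is a minimal standard-library sketch for checking a tab-delimited file before handing it on. `summarize_tsv` is a hypothetical helper.

```python
import csv
import os


def summarize_tsv(path):
    """Return (n_columns, n_data_rows); zero columns means nothing to parse."""
    if os.path.getsize(path) == 0:
        return 0, 0
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])       # first line, or [] if there is none
        n_rows = sum(1 for _ in reader)  # remaining lines are data rows
    return len(header), n_rows
```

A result of `(0, 0)` for the simulated MAF file would point the investigation at the `simMaf` step rather than at `simOg`.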

Error executing 2020plus.py

Using probabilistic2020, I generated a feature file for my own data, but when I try to run classify from 2020plus.py I hit a Python error whose cause I could not figure out. The same error occurs with the example data (pancreatic_example data). I tried reinstalling the Python dependencies (numpy, pandas, rpy2, etc.) but still get the same error (in both Python 2.7 and Python 3.4).

With my data

Version: 1.1.0
Command: 2020plus.py train -f features_2020.txt -r classifier.Rdata
Training R's Random forest . . .


AN ERROR HAS OCCURRED: check the log file


Type: <class 'KeyError'>
Exception: 1
Traceback:
File "2020plus.py", line 341, in <module>
args.func() # run function corresponding to user's command
File "2020plus.py", line 43, in _train
src.train.python.train.main(opts) # run code
File "/home/mlscl3/2020/2020plus-1.1.0/src/train/python/train.py", line 33, in main
rrclf.train()
File "/home/mlscl3/2020/2020plus-1.1.0/src/classify/python/generic_classifier.py", line 50, in train
self.clf.fit(self.x, self.y)
File "/home/mlscl3/2020/2020plus-1.1.0/src/classify/python/r_random_forest_clf.py", line 102, in fit
label_counts[self.onco_num],
File "/usr/local/lib/python3.4/dist-packages/pandas/core/series.py", line 601, in __getitem__
result = self.index.get_value(self, key)
File "/usr/local/lib/python3.4/dist-packages/pandas/indexes/base.py", line 2169, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas/index.pyx", line 105, in pandas.index.IndexEngine.get_value (pandas/index.c:3567)
File "pandas/index.pyx", line 113, in pandas.index.IndexEngine.get_value (pandas/index.c:3250)
File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4289)
File "pandas/src/hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8555)
File "pandas/src/hashtable_class_helper.pxi", line 410, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8499)

With example data

Version: 1.1.0
Command: 2020plus.py --out-dir=result_compare classify -f pancan_example/features_pancan.txt -nd pancan_example/simulated_null_dist.txt
Running Random forest . . .


AN ERROR HAS OCCURRED: check the log file


Type: <class 'KeyError'>
Exception: 1
Traceback:
File "2020plus.py", line 341, in <module>
args.func() # run function corresponding to user's command
File "2020plus.py", line 37, in _classify
src.classify.python.classifier.main(opts) # run code
File "/home/mlscl3/2020/2020plus-1.1.0/src/classify/python/classifier.py", line 250, in main
rrclf.kfold_validation()
File "/home/mlscl3/2020/2020plus-1.1.0/src/classify/python/generic_classifier.py", line 212, in kfold_validation
self.y.iloc[train_ix].copy())
File "/home/mlscl3/2020/2020plus-1.1.0/src/classify/python/r_random_forest_clf.py", line 102, in fit
label_counts[self.onco_num],
File "/usr/local/lib/python3.4/dist-packages/pandas/core/series.py", line 601, in __getitem__
result = self.index.get_value(self, key)
File "/usr/local/lib/python3.4/dist-packages/pandas/indexes/base.py", line 2169, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas/index.pyx", line 105, in pandas.index.IndexEngine.get_value (pandas/index.c:3567)
File "pandas/index.pyx", line 113, in pandas.index.IndexEngine.get_value (pandas/index.c:3250)
File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4289)
File "pandas/src/hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8555)
File "pandas/src/hashtable_class_helper.pxi", line 410, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8499)
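
Both tracebacks end in `label_counts[self.onco_num]` with `KeyError: 1`, which means the lookup found no entry for label 1 in the label counts. Why that label is absent is not clear from the logs. The sketch below only illustrates the failure mode, with a plain dictionary standing in for the pandas Series. The names `count_labels` and `require_labels` and the label values in the usage lines are assumptions for illustration.

```python
from collections import Counter


def count_labels(labels):
    """Count occurrences of each class label."""
    return dict(Counter(labels))


def require_labels(label_counts, expected):
    """Return the expected labels that are absent from the counts."""
    return [label for label in expected if label not in label_counts]


counts = count_labels([0, 0, 2, 0, 2])  # no entry carries label 1
missing = require_labels(counts, expected=(0, 1, 2))
print(missing)  # [1]
```

Checking for missing classes before indexing turns an opaque `KeyError` into a message that names the absent label.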

Keep getting error: Error in job simMaf while creating output file

Hi, I'm getting the following error while running the sample files and couldn't find a fix for it. Can you please help?

ERROR:

snakemake -s Snakefile pretrained_predict -p --cores 1 \
 --config mutations="data/bladder.txt" output_dir="output_bladder" trained_classifier="data/2020plus_10k.Rdata"

Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 features
1 finishSim
1 og
1 predict_test
1 pretrained_predict
10 simFeatures
10 simMaf
10 simOg
10 simSummary
10 simTsg
1 summary
1 tsg
57

rule simMaf:
input: data/bladder.txt
output: output_bladder/simulated_summary/chasm_sim_maf3.txt
jobid: 50
wildcards: iter=3

mut_annotate --log-level=INFO -b data//snvboxGenes.bed -i data//snvboxGenes.fa -c 1.5 -m data/bladder.txt -p 0 -n 1 --maf --seed=$((3*42)) -r 3 --unique -o output_bladder/simulated_summary/chasm_sim_maf3.txt
Command: /root/miniconda3/envs/2020plus/bin/mut_annotate --log-level=INFO -b data//snvboxGenes.bed -i data//snvboxGenes.fa -c 1.5 -m data/bladder.txt -p 0 -n 1 --maf --seed=126 -r 3 --unique -o output_bladder/simulated_summary/chasm_sim_maf3.txt
There were 832 indels identified.
Kept 33771 mutations after droping mutations with missing information (Droped: 0)
Dropped 832 mutations after only keeping Missense_Mutation, Silent, Nonsense_Mutation, Splice_Site, Nonstop_Mutation, Translation_Start_Site. Indels are processed separately.
Dropped 182 mutations after only keeping valid SNVs
Dropped 0 mutations when removing duplicates
Working on chromosome: chr1 . . .
Finished working on chromosome: chr1.
Working on chromosome: chr19 . . .
Finished working on chromosome: chr19.
Working on chromosome: chr11 . . .
Finished working on chromosome: chr11.
Working on chromosome: chr2 . . .
Finished working on chromosome: chr2.
Working on chromosome: chr17 . . .
Finished working on chromosome: chr17.
Working on chromosome: chr3 . . .
Finished working on chromosome: chr3.
Working on chromosome: chr6 . . .
Finished working on chromosome: chr6.
Working on chromosome: chr12 . . .
Finished working on chromosome: chr12.
Working on chromosome: chr7 . . .
Finished working on chromosome: chr7.
Working on chromosome: chr5 . . .
Finished working on chromosome: chr5.
Working on chromosome: chr14 . . .
Finished working on chromosome: chr14.
Working on chromosome: chr16 . . .
Finished working on chromosome: chr16.
Working on chromosome: chr9 . . .
Finished working on chromosome: chr9.
Working on chromosome: chrX . . .
Finished working on chromosome: chrX.
Working on chromosome: chr10 . . .
Finished working on chromosome: chr10.
Working on chromosome: chr4 . . .
Finished working on chromosome: chr4.
Working on chromosome: chr8 . . .
Finished working on chromosome: chr8.
Working on chromosome: chr15 . . .
Finished working on chromosome: chr15.
Working on chromosome: chr20 . . .
Finished working on chromosome: chr20.
Working on chromosome: chr22 . . .
Finished working on chromosome: chr22.
Working on chromosome: chr13 . . .
Finished working on chromosome: chr13.
Working on chromosome: chr18 . . .
Finished working on chromosome: chr18.
Working on chromosome: chr21 . . .
Finished working on chromosome: chr21.
Working on chromosome: chrY . . .
Finished working on chromosome: chrY.
Working on chromosome: chrM . . .
Finished working on chromosome: chrM.
/bin/bash: line 1: 11394 Killed mut_annotate --log-level=INFO -b data//snvboxGenes.bed -i data//snvboxGenes.fa -c 1.5 -m data/bladder.txt -p 0 -n 1 --maf --seed=$((3*42)) -r 3 --unique -o output_bladder/simulated_summary/chasm_sim_maf3.txt
Error in job simMaf while creating output file output_bladder/simulated_summary/chasm_sim_maf3.txt.
RuleException:
CalledProcessError in line 135 of /root/Desktop/2020_Prod/2020plus-1.2.2/Snakefile:
Command 'mut_annotate --log-level=INFO -b data//snvboxGenes.bed -i data//snvboxGenes.fa -c 1.5 -m data/bladder.txt -p 0 -n 1 --maf --seed=$((3*42)) -r 3 --unique -o output_bladder/simulated_summary/chasm_sim_maf3.txt' returned non-zero exit status 137.
File "/root/Desktop/2020_Prod/2020plus-1.2.2/Snakefile", line 135, in __rule_simMaf
File "/root/miniconda3/envs/2020plus/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Removing output files of failed job simMaf since they might be corrupted:
output_bladder/simulated_summary/chasm_sim_maf3.txt
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
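
The `Killed` message and exit status 137 indicate that `mut_annotate` did not fail on its own: shells report 128 plus the signal number when a process is terminated by a signal, and 137 - 128 = 9 is SIGKILL. On Linux this is often the kernel's out-of-memory killer, although the log alone cannot confirm that. Here is a small sketch of the decoding, assuming a POSIX system (`describe_exit_status` is an illustrative name).

```python
import signal


def describe_exit_status(status):
    """Translate a shell exit status into a short description."""
    if status > 128:
        signum = status - 128  # shells encode "killed by signal N" as 128 + N
        try:
            return "terminated by " + signal.Signals(signum).name
        except ValueError:
            return "terminated by unknown signal {0}".format(signum)
    return "exited with code {0}".format(status)


print(describe_exit_status(137))  # terminated by SIGKILL
```

If memory is the cause, watching memory use while the `simMaf` job runs, or running on a machine with more RAM, would be a reasonable next check.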

Can't open file 'features': [Errno 2] No such file or directory

Hi,
While running the program, I hit the following error during the simFeatures step. Please let me know what the issue is and how best to resolve it.
Thanks a lot for your kind help.

`snakemake -s Snakefile pretrained_predict -p --cores 1 --config mutations="data/soma_fb_2020_mod.txt" output_dir="output_soma_freebayes" trained_classifier="data/2020plus_10k.Rdata"

rule simFeatures:
input: output_soma_freebayes/simulated_summary/chasm_sim_summary3.txt, output_soma_freebayes/simulated_summary/oncogene_sim3.txt, output_soma_freebayes/simulated_summary/tsg_sim3.txt
output: output_soma_freebayes/simulated_summary/simulated_features3.txt
jobid: 7
wildcards: iter=3

python `which 2020plus.py` features -s output_soma_freebayes/simulated_summary/chasm_sim_summary3.txt --tsg-test output_soma_freebayes/simulated_summary/tsg_sim3.txt -og-test output_soma_freebayes/simulated_summary/oncogene_sim3.txt -o output_soma_freebayes/simulated_summary/simulated_features3.txt
which: no 2020plus.py in (/home/ateeqanees/anaconda3/bin:/home/ateeqanees/miniconda3/envs/2020plus/bin:/home/ateeqanees/miniconda3/condabin:/home/ateeqanees/anaconda3/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:/home/ateeqanees/.local/bin:/home/ateeqanees/bin)
python: can't open file 'features': [Errno 2] No such file or directory
Error in job simFeatures while creating output file output_soma_freebayes/simulated_summary/simulated_features3.txt.
RuleException:
CalledProcessError in line 202 of /home/ateeqanees/Desktop/20_20/2020plus-1.2.2/Snakefile:
Command 'python `which 2020plus.py` features -s output_soma_freebayes/simulated_summary/chasm_sim_summary3.txt --tsg-test output_soma_freebayes/simulated_summary/tsg_sim3.txt -og-test output_soma_freebayes/simulated_summary/oncogene_sim3.txt -o output_soma_freebayes/simulated_summary/simulated_features3.txt' returned non-zero exit status 2.
File "/home/ateeqanees/Desktop/20_20/2020plus-1.2.2/Snakefile", line 202, in __rule_simFeatures
File "/home/ateeqanees/miniconda3/envs/2020plus/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
`

Thanks a lot for your help,
Dave
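
The "which: no 2020plus.py in (...)" line in the log above shows that the script was not found on PATH, so the substituted command was empty and python tried to open "features" as a file. A small sketch, assuming only the standard library, for checking which of the executables named in these logs are visible before starting snakemake; the helper name `missing_executables` is hypothetical.

```python
import shutil


def missing_executables(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


# Executable names taken from the logs in this thread.
for name in missing_executables(["2020plus.py", "mut_annotate", "snakemake"]):
    print(f"not on PATH: {name}")
```

If `2020plus.py` is reported as missing, adding the directory that contains it to PATH before running snakemake is one way to make the `which` call succeed.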

Should I train a new model ?

Dear Collin,
I want to run 2020plus on my own pan-cancer data, which contains no silent mutations (more than 130,000 mutations in total), to predict oncogenes and TSGs both pan-cancer and for specific cancer types. Should I train a new model on my data with --config drop_silent="yes" and then run predict, or just run pretrained_predict with your pre-trained 20/20+ classifier and the same config?
Thanks.

No such file or directory in 'feature file'

I got the error below.
Can you help me out?

Thanks.

python `which 2020plus.py` features   -s output_bladder/simulated_summary/chasm_sim_summary2.txt --tsg-test output_bladder/simulated_summary/tsg_sim2.txt -og-test output_bladder/simulated_summary/oncogene_sim2.txt   -o output_bladder/simulated_summary/simulated_features2.txt
python: can't open file 'features': [Errno 2] No such file or directory
python: can't open file 'features': [Errno 2] No such file or directory
python: can't open file 'features': [Errno 2] No such file or directory
python: can't open file 'features': [Errno 2] No such file or directory
python: can't open file 'features': [Errno 2] No such file or directory
python: can't open file 'features': [Errno 2] No such file or directory
python: can't open file 'features': [Errno 2] No such file or directory
Error in job simFeatures while creating output file output_bladder/simulated_summary/simulated_features8.txt.
Error in job simFeatures while creating output file output_bladder/simulated_summary/simulated_features7.txt.
Error in job simFeatures while creating output file output_bladder/simulated_summary/simulated_features4.txt.
Error in job simFeatures while creating output file output_bladder/simulated_summary/simulated_features6.txt.
Error in job simFeatures while creating output file output_bladder/simulated_summary/simulated_features1.txt.
Error in job simFeatures while creating output file output_bladder/simulated_summary/simulated_features2.txt.
Error in job simFeatures while creating output file output_bladder/simulated_summary/simulated_features10.txt.
RuleException:
CalledProcessError in line 202 of /addData01/01_Program_to_install/63.2020plus/2020plus-master/Snakefile:
Command 'python `which 2020plus.py` features   -s output_bladder/simulated_summary/chasm_sim_summary6.txt --tsg-test output_bladder/simulated_summary/tsg_sim6.txt -og-test output_bladder/simulated_summary/oncogene_sim6.txt   -o output_bladder/simulated_summary/simulated_features6.txt' returned non-zero exit status 2.
  File "/addData01/01_Program_to_install/63.2020plus/2020plus-master/Snakefile", line 202, in __rule_simFeatures
  File "/root/anaconda2/envs/2020plus/lib/python3.6/concurrent/futures/thread.py", line 56, in run
RuleException:
CalledProcessError in line 202 of /addData01/01_Program_to_install/63.2020plus/2020plus-master/Snakefile:
Command 'python `which 2020plus.py` features   -s output_bladder/simulated_summary/chasm_sim_summary10.txt --tsg-test output_bladder/simulated_summary/tsg_sim10.txt -og-test output_bladder/simulated_summary/oncogene_sim10.txt   -o output_bladder/simulated_summary/simulated_features10.txt' returned non-zero exit status 2.
  File "/addData01/01_Program_to_install/63.2020plus/2020plus-master/Snakefile", line 202, in __rule_simFeatures
  File "/root/anaconda2/envs/2020plus/lib/python3.6/concurrent/futures/thread.py", line 56, in run
RuleException:
CalledProcessError in line 202 of /addData01/01_Program_to_install/63.2020plus/2020plus-master/Snakefile:
Command 'python `which 2020plus.py` features   -s output_bladder/simulated_summary/chasm_sim_summary7.txt --tsg-test output_bladder/simulated_summary/tsg_sim7.txt -og-test output_bladder/simulated_summary/oncogene_sim7.txt   -o output_bladder/simulated_summary/simulated_features7.txt' returned non-zero exit status 2.
  File "/addData01/01_Program_to_install/63.2020plus/2020plus-master/Snakefile", line 202, in __rule_simFeatures
  File "/root/anaconda2/envs/2020plus/lib/python3.6/concurrent/futures/thread.py", line 56, in run
RuleException:
CalledProcessError in line 202 of /addData01/01_Program_to_install/63.2020plus/2020plus-master/Snakefile:
Command 'python `which 2020plus.py` features   -s output_bladder/simulated_summary/chasm_sim_summary8.txt --tsg-test output_bladder/simulated_summary/tsg_sim8.txt -og-test output_bladder/simulated_summary/oncogene_sim8.txt   -o output_bladder/simulated_summary/simulated_features8.txt' returned non-zero exit status 2.
  File "/addData01/01_Program_to_install/63.2020plus/2020plus-master/Snakefile", line 202, in __rule_simFeatures
  File "/root/anaconda2/envs/2020plus/lib/python3.6/concurrent/futures/thread.py", line 56, in run
RuleException:
CalledProcessError in line 202 of /addData01/01_Program_to_install/63.2020plus/2020plus-master/Snakefile:
Command 'python `which 2020plus.py` features   -s output_bladder/simulated_summary/chasm_sim_summary2.txt --tsg-test output_bladder/simulated_summary/tsg_sim2.txt -og-test output_bladder/simulated_summary/oncogene_sim2.txt   -o output_bladder/simulated_summary/simulated_features2.txt' returned non-zero exit status 2.
  File "/addData01/01_Program_to_install/63.2020plus/2020plus-master/Snakefile", line 202, in __rule_simFeatures
  File "/root/anaconda2/envs/2020plus/lib/python3.6/concurrent/futures/thread.py", line 56, in run
RuleException:
CalledProcessError in line 202 of /addData01/01_Program_to_install/63.2020plus/2020plus-master/Snakefile:
Command 'python `which 2020plus.py` features   -s output_bladder/simulated_summary/chasm_sim_summary1.txt --tsg-test output_bladder/simulated_summary/tsg_sim1.txt -og-test output_bladder/simulated_summary/oncogene_sim1.txt   -o output_bladder/simulated_summary/simulated_features1.txt' returned non-zero exit status 2.
  File "/addData01/01_Program_to_install/63.2020plus/2020plus-master/Snakefile", line 202, in __rule_simFeatures
  File "/root/anaconda2/envs/2020plus/lib/python3.6/concurrent/futures/thread.py", line 56, in run
RuleException:
CalledProcessError in line 202 of /addData01/01_Program_to_install/63.2020plus/2020plus-master/Snakefile:
Command 'python `which 2020plus.py` features   -s output_bladder/simulated_summary/chasm_sim_summary4.txt --tsg-test output_bladder/simulated_summary/tsg_sim4.txt -og-test output_bladder/simulated_summary/oncogene_sim4.txt   -o output_bladder/simulated_summary/simulated_features4.txt' returned non-zero exit status 2.
  File "/addData01/01_Program_to_install/63.2020plus/2020plus-master/Snakefile", line 202, in __rule_simFeatures
  File "/root/anaconda2/envs/2020plus/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message

Errors: libicuuc.so.54 not found

Type: <class 'ImportError'>
Exception: libicuuc.so.54: cannot open shared object file: No such file or directory
Traceback:
File "/share/home/luoylLab/zengyuchen/biosoft/2020plus-1.2.3/2020plus.py", line 263, in <module>
import src.classify.python.classifier
File "/share/home/luoylLab/zengyuchen/biosoft/2020plus-1.2.3/src/classify/python/classifier.py", line 3, in <module>
from src.classify.python.r_random_forest_clf import RRandomForest
File "/share/home/luoylLab/zengyuchen/biosoft/2020plus-1.2.3/src/classify/python/r_random_forest_clf.py", line 4, in <module>
import rpy2.robjects as ro
File "/share/home/luoylLab/zengyuchen/.conda/envs/2020plus/lib/python3.6/site-packages/rpy2/robjects/__init__.py", line 16, in <module>
import rpy2.rinterface as rinterface
File "/share/home/luoylLab/zengyuchen/.conda/envs/2020plus/lib/python3.6/site-packages/rpy2/rinterface/__init__.py", line 92, in <module>
from rpy2.rinterface._rinterface import (baseenv,


AN ERROR HAS OCCURRED: check the log file

Require snvboxGenes.fa file

Hello,

I am using the 2020+ tool to identify potential candidate driver genes. I can download the required files, such as snvboxGenes.bed or scores.tar.gz, but I cannot find the exact file required for genes.fa in the following command:

mut_annotate --summary -i genes.fa -b genes.bed -s score_dir -m mutations.txt -o summary.txt

I tried various FASTA files generated from the UCSC Table Browser, but none of them worked. Can you share the exact FASTA file you used in your published work? Thanks.

Include plots of p-value distribution

There is likely substantial benefit in including QQ plots of the p-value distribution as a default output. This would let users start from a precomputed null distribution and check whether they really need to create a null distribution from their own data.
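
To illustrate the request, here is a minimal sketch of how the points of such a QQ plot could be computed: sorted observed p-values are paired with the quantiles expected under a uniform null, both on the -log10 scale. The function name `qq_points` and the (rank - 0.5) / n plotting position are our choices, not something 2020plus provides.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) pairs on the -log10 scale."""
    observed = sorted(p_values)
    n = len(observed)
    if any(p <= 0 or p > 1 for p in observed):
        raise ValueError("p-values must lie in (0, 1]")
    points = []
    for rank, p in enumerate(observed, start=1):
        # Quantile expected for this rank if p-values were uniform on (0, 1).
        expected = (rank - 0.5) / n
        points.append((-math.log10(expected), -math.log10(p)))
    return points


for expected, observed in qq_points([0.04, 0.5, 0.9, 0.2]):
    print(f"{expected:.3f}\t{observed:.3f}")
```

Points close to the diagonal suggest that the null distribution in use is well calibrated for the data; systematic departure across the bulk of genes would argue for simulating a new one.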

CalledProcessError in line 342

I ran into the following issue after 96% of the jobs were done and the random forest step had started.
Version: 1.2.3
Command: /home/yasirniazi/2020plus-1.2.3/2020plus.py --log-level=INFO classify --trained-classifier data/2020plus_10k.Rdata --null-distribution output_bladder/simulated_null_dist.txt --features output_bladder/simulated_summary/simulated_features.txt --simulated --cv
Running Random forest . . .


AN ERROR HAS OCCURRED: check the log file


Type: <class 'rpy2.rinterface.RRuntimeError'>
Exception: Error in load("data/2020plus_10k.Rdata") : error reading from connection

Traceback:
File "/home/yasirniazi/2020plus-1.2.3/2020plus.py", line 275, in <module>
args.func() # run function corresponding to user's command
File "/home/yasirniazi/2020plus-1.2.3/2020plus.py", line 37, in _classify
src.classify.python.classifier.main(opts) # run code
File "/home/yasirniazi/2020plus-1.2.3/src/classify/python/classifier.py", line 184, in main
rrclf.clf.load_cv(cli_opts['trained_classifier'])
File "/home/yasirniazi/2020plus-1.2.3/src/classify/python/r_random_forest_clf.py", line 164, in load_cv
ro.r('load("{0}")'.format(path))
File "/home/yasirniazi/anaconda3/envs/2020plus/lib/python3.6/site-packages/rpy2/robjects/__init__.py", line 352, in __call__
res = self.eval(p)
File "/home/yasirniazi/anaconda3/envs/2020plus/lib/python3.6/site-packages/rpy2/robjects/functions.py", line 178, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "/home/yasirniazi/anaconda3/envs/2020plus/lib/python3.6/site-packages/rpy2/robjects/functions.py", line 106, in __call__
res = super(Function, self).__call__(*new_args, **new_kwargs)

Error in job predict_test while creating output file output_bladder/pretrained_output/results/r_random_forest_prediction.txt.
RuleException:
CalledProcessError in line 342 of /home/yasirniazi/2020plus-1.2.3/Snakefile:

Command '
python `which 2020plus.py` --log-level=INFO classify --trained-classifier data/2020plus_10k.Rdata --null-distribution output_bladder/simulated_null_dist.txt --features output_bladder/simulated_summary/simulated_features.txt --simulated --cv
python `which 2020plus.py` --out-dir output_bladder/pretrained_output --log-level=INFO classify -n 200 --trained-classifier data/2020plus_10k.Rdata -d .7 -o 1.0 --features output_bladder/features.txt --null-distribution output_bladder/simulated_null_dist.txt --random-seed 71 --cv
' returned non-zero exit status 1.
File "/home/yasirniazi/2020plus-1.2.3/Snakefile", line 342, in __rule_predict_test
File "/home/yasirniazi/anaconda3/envs/2020plus/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message

Error in rule SimSummary: /bin/bash: mut_annotate: command not found

Kindly help me resolve this issue.

I ran the following command:

snakemake -s Snakefile pretrained_predict -p --cores 1 --config mutations="data/bladder.txt" output_dir="output_bladder" trained_classifier="data/2020plus_10k.Rdata"

and got the following error:

/bin/bash: mut_annotate: command not found

Error in rule simSummary:
jobid: 17
output: output_bladder/simulated_summary/chasm_sim_summary1.txt
shell:
mut_annotate --log-level=INFO -b data//snvboxGenes.bed -i data//snvboxGenes.fa -c 1.5 -m data/bladder.txt -p 0 -n 1 --summary --seed=$((1*42)) --score-dir=data//scores --unique -r 3 -o output_bladder/simulated_summary/chasm_sim_summary1.txt
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

rpy2 may need a specified version

When I ran 'python 2020plus.py classify', I got the following error:

Type: <class 'AttributeError'>
Exception: module 'rpy2.robjects.pandas2ri' has no attribute 'py2ri'

I used rpy2 v3.1.0. I suggest that the author pin an rpy2 version to avoid this problem. Thanks.
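
As far as we know, rpy2 3.x renamed the conversion function `py2ri` to `py2rpy`, which would explain the AttributeError above. Apart from pinning an older rpy2, one possible workaround is to look up whichever name exists. The sketch below uses stand-in objects so that it runs without rpy2 installed; `pandas_to_r_converter` is a hypothetical helper, not part of 2020plus.

```python
from types import SimpleNamespace


def pandas_to_r_converter(pandas2ri_module):
    """Return the pandas-to-R conversion function under either name."""
    for name in ("py2rpy", "py2ri"):
        func = getattr(pandas2ri_module, name, None)
        if func is not None:
            return func
    raise AttributeError("no known pandas-to-R converter found")


# Stand-ins for rpy2.robjects.pandas2ri in older and newer releases.
old_api = SimpleNamespace(py2ri=lambda df: ("py2ri", df))
new_api = SimpleNamespace(py2rpy=lambda df: ("py2rpy", df))

print(pandas_to_r_converter(old_api)("frame"))
print(pandas_to_r_converter(new_api)("frame"))
```

In real use the argument would be the imported `rpy2.robjects.pandas2ri` module; whether the rest of the classifier code works unchanged with rpy2 3.x is not something this thread establishes.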

Not enough Mutated Oncogenes or TSGs Found in Your Data

Hi,
I am currently testing the 2020plus software on my MAF data set which is basically the TCGA MAF files for 16 cancer types concatenated. The algorithm ran without problems using the MAF file from the tutorial. My MAF file contains roughly 2.5 million mutations with 500,000 of them being classified as silent.

After most of the snakemake pipeline works as expected, I get this error in the final stages:

Version: 1.2.3
Command: /project/gcn/2020plus-1.2.3/2020plus.py --log-level=INFO train -d .7 -o 1.0 -n 1000 -r pancan16_ourmutation/trained.Rdata --features=pancan16_ourmutation/features.txt --random-seed 71
Training R's Random forest . . .
ERROR: There were either no or very few mutated oncogenes or tumor suppressor genes found in your data! Did you supply a full pan-cancer dataset? Or have you modified the training list of oncogenes or tumor suppressor genes? Or did you subset your mutations to not include oncogenes/tumor suppressor genes in the training list?
Error in job cv_predict while creating output files pancan16_ourmutation/output/results/r_random_forest_prediction.txt, pancan16_ourmutation/trained.Rdata.
RuleException:
CalledProcessError in line 304 of /project/gcn/2020plus-1.2.3/Snakefile:
Command '
python `which 2020plus.py` --log-level=INFO train -d .7 -o 1.0 -n 1000 -r pancan16_ourmutation/trained.Rdata --features=pancan16_ourmutation/features.txt --random-seed 71
python `which 2020plus.py` --log-level=INFO classify --trained-classifier pancan16_ourmutation/trained.Rdata --null-distribution pancan16_ourmutation/simulated_null_dist.txt --features pancan16_ourmutation/simulated_summary/simulated_features.txt --simulated
python `which 2020plus.py` --out-dir pancan16_ourmutation/output --log-level=INFO classify -n 200 -d .7 -o 1.0 --features pancan16_ourmutation/features.txt --null-distribution pancan16_ourmutation/simulated_null_dist.txt --random-seed 71
' returned non-zero exit status 1.
File "/project/gcn/2020plus-1.2.3/Snakefile", line 304, in __rule_cv_predict
File "/home/sasse/miniconda3/envs/2020plus/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
(2020plus) sasse@bohemianrhapsody:/project/gcn/2020plus-1.2.3>

I called the tool using:

snakemake -s Snakefile predict -p --cores 64 --config mutations="data/pancancer_16_onlyrequiredcols.maf" output_dir="pancan16_ourmutation"

where data/pancancer_16_onlyrequiredcols.maf is my edited MAF file and I leave all other data as in the tutorial.
Do you know why this error happens? Could it be that there is a problem with my mutation file or is the problem in the layout (e.g. not enough mutations in some of the known TSGs/oncogenes due to using only a subset of cancer types)?

I might add that the MAF file only contains the columns required according to this page, that is:

  • Hugo_Symbol (or named “Gene”)
  • Chromosome
  • Start_Position
  • End_Position
  • Reference_Allele
  • Tumor_Seq_Allele2 (or named “Tumor_Allele”)
  • Tumor_Sample_Barcode (or named “Tumor_Sample”)
  • Variant_Classification

Maybe that has to do with the error?
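
One way to rule the column layout in or out is to check the file header against the list above. This is a minimal sketch assuming a tab-delimited file whose first non-comment line is the header; the helper name `missing_maf_columns` is hypothetical, and the alternative names are the ones quoted in the list.

```python
import csv
import io

# Each tuple lists the accepted names for one required column.
REQUIRED_COLUMNS = [
    ("Hugo_Symbol", "Gene"),
    ("Chromosome",),
    ("Start_Position",),
    ("End_Position",),
    ("Reference_Allele",),
    ("Tumor_Seq_Allele2", "Tumor_Allele"),
    ("Tumor_Sample_Barcode", "Tumor_Sample"),
    ("Variant_Classification",),
]


def missing_maf_columns(handle):
    """Return required columns absent from a tab-delimited header."""
    header = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and leading comment lines such as "#version".
        if row and not row[0].startswith("#"):
            header = row
            break
    present = set(header)
    return [names[0] for names in REQUIRED_COLUMNS
            if not present.intersection(names)]


example = io.StringIO("Gene\tChromosome\tStart_Position\tEnd_Position\n")
print(missing_maf_columns(example))
```

For a real file, pass an open handle, for example `open("data/pancancer_16_onlyrequiredcols.maf", newline="")`. A clean result here would point away from the header and toward the content, such as how many known oncogenes and TSGs actually carry mutations.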

Thank you for some hints on that.

Best,

Roman

Docker image

Hi,

Would it be possible to provide a Docker image of 2020plus with all the necessary software dependencies? I am struggling to get it to run successfully (although I seem to have managed this previously), and I do not really understand which requirements are needed to make everything work.

Sigve

Error in job simMaf: CalledProcessError

We've gotten through the quick start and have trained our classifier. Now, when we try to run 2020plus, the following error is thrown:

Command: /anaconda3/envs/2020plus/bin/mut_annotate --log-level=INFO -b data//snvboxGenes.bed -i data//snvboxGenes.fa -c 1.5 -m data/bladder.txt -p 0 -n 1 --maf --seed=378 -r 3 --unique -o output_bladder/simulated_summary/chasm_sim_maf9.txt
There were 832 indels identified.
Kept 33771 mutations after droping mutations with missing information (Droped: 0)
Dropped 832 mutations after only keeping Missense_Mutation, Silent, Nonsense_Mutation, Splice_Site, Nonstop_Mutation, Translation_Start_Site. Indels are processed separately.
Dropped 182 mutations after only keeping valid SNVs
Dropped 0 mutations when removing duplicates
Working on chromosome: chr1 . . .
'N'
Traceback (most recent call last):
File "/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/python/utils.py", line 131, in wrapper
result = f(*args, **kwds)
File "/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/annotate.py", line 209, in singleprocess_permutation
drop_silent=opts['drop_silent'])
File "/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/python/permutation.py", line 732, in maf_permutation
num_permutations)
File "/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/python/sequence_context.py", line 226, in random_pos
pos_array = self.random_context_pos(n, num_permutations, contxt)
File "/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/python/sequence_context.py", line 204, in random_context_pos
random_pos = self.prng_dict[context].choice(available_pos, (num_permutations, num))
KeyError: 'N'
Traceback (most recent call last):
File "/anaconda3/envs/2020plus/bin/mut_annotate", line 10, in
sys.exit(cli_main())
File "/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/annotate.py", line 432, in cli_main
main(opts)
File "/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/annotate.py", line 417, in main
multiprocess_permutation(bed_dict, mut_df, opts, indel_df)
File "/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/annotate.py", line 136, in multiprocess_permutation
chrom_results = singleprocess_permutation(info)
File "/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/python/utils.py", line 131, in wrapper
result = f(*args, **kwds)
File "/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/annotate.py", line 209, in singleprocess_permutation
drop_silent=opts['drop_silent'])
File "/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/python/permutation.py", line 732, in maf_permutation
num_permutations)
File "/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/python/sequence_context.py", line 226, in random_pos
pos_array = self.random_context_pos(n, num_permutations, contxt)
File "/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/python/sequence_context.py", line 204, in random_context_pos
random_pos = self.prng_dict[context].choice(available_pos, (num_permutations, num))
KeyError: 'N'
Error in job simMaf while creating output file output_bladder/simulated_summary/chasm_sim_maf9.txt.
RuleException:
CalledProcessError in line 135 of /Users/josephnovak/Desktop/2020plus-master/Snakefile:
Command 'mut_annotate --log-level=INFO -b data//snvboxGenes.bed -i data//snvboxGenes.fa -c 1.5 -m data/bladder.txt -p 0 -n 1 --maf --seed=$((9*42)) -r 3 --unique -o output_bladder/simulated_summary/chasm_sim_maf9.txt' returned non-zero exit status 1.
File "/Users/josephnovak/Desktop/2020plus-master/Snakefile", line 135, in __rule_simMaf
File "/anaconda3/envs/2020plus/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Removing output files of failed job simMaf since they might be corrupted:
output_bladder/simulated_summary/chasm_sim_maf9.txt
Will exit after finishing currently running jobs.
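
The KeyError: 'N' above is raised while looking up a sequence context, which suggests that an ambiguous base ("N") was read from somewhere. One possible source, which the log does not confirm, is the gene FASTA file. A rough sketch for listing FASTA records that contain characters other than A, C, G, and T; the helper name `records_with_ambiguous_bases` is hypothetical.

```python
import io


def records_with_ambiguous_bases(handle, allowed="ACGT"):
    """Map FASTA record name to its count of characters outside `allowed`."""
    allowed_set = set(allowed)
    counts = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
        elif name is not None:
            # Compare case-insensitively so soft-masked bases are accepted.
            bad = sum(1 for base in line.upper() if base not in allowed_set)
            if bad:
                counts[name] = counts.get(name, 0) + bad
    return counts


example = io.StringIO(">geneA\nACGT\nNNAC\n>geneB\nacgt\n")
print(records_with_ambiguous_bases(example))
```

For the real file, pass `open("data/snvboxGenes.fa")` instead of the in-memory example. An empty result would point to another source for the 'N', such as the mutation file itself.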

Error in rule simFeatures

Thanks for your kind help with my last question.
I restarted the run with --cores 20:

$ snakemake -s Snakefile pretrained_predict -p --cores 20
--config mutations="/home/kjf/2020plus-1.2.2/data/bladder.txt" output_dir="output" trained_classifier="/home/kjf/2020plus-1.2.2/data/2020plus_10k.Rdata"

The run is now much faster.

However, when it reaches
25 of 48 steps (52%) done
an error occurs (I have picked out some of the messages and removed the duplicated parts):

############################
**[Tue Apr 2 13:50:01 2019]
rule simFeatures:
input: output/simulated_summary/chasm_sim_summary10.txt, output/simulated_summary/oncogene_sim10.txt, output/simulated_summary/tsg_sim10.txt
output: output/simulated_summary/simulated_features10.txt
jobid: 16
wildcards: iter=10

python `which 2020plus.py` features -s output/simulated_summary/chasm_sim_summary10.txt --tsg-test output/simulated_summary/tsg_sim10.txt -og-test output/simulated_summary/oncogene_sim10.txt -o output/simulated_summary/simulated_features10.txt

Version: 1.2.2
Command: /home/kjf/2020plus-1.2.2/2020plus.py features -s output/simulated_summary/chasm_sim_summary6.txt --tsg-test output/simulated_summary/tsg_sim6.txt -og-test output/simulated_summary/oncogene_sim6.txt -o output/simulated_summary/simulated_features6.txt


AN ERROR HAS OCCURRED: check the log file

Type: <class 'ModuleNotFoundError'>

Exception: No module named 'sklearn'
Traceback:
File "/home/kjf/2020plus-1.2.2/2020plus.py", line 263, in <module>
import src.classify.python.classifier
File "/home/kjf/2020plus-1.2.2/src/classify/python/classifier.py", line 2, in <module>
from src.classify.python.dummy_clf import DummyClf
File "/home/kjf/2020plus-1.2.2/src/classify/python/dummy_clf.py", line 1, in <module>
from sklearn.dummy import DummyClassifier

[Tue Apr 2 13:50:02 2019]
Error in rule simFeatures:
Error in rule simFeatures:
jobid: 8
jobid: 16
output: output/simulated_summary/simulated_features2.txt
output: output/simulated_summary/simulated_features10.txt

RuleException:
CalledProcessError in line 282 of /home/kjf/2020plus-1.2.2/Snakefile:
Command 'set -euo pipefail; python `which 2020plus.py` features -s output/summary.txt --tsg-test output/tsg.txt -og-test output/oncogene.txt -o output/features.txt' returned non-zero exit status 1.
File "/home/kjf/2020plus-1.2.2/Snakefile", line 282, in __rule_features
File "/home/kjf/anaconda3/envs/2020plus/lib/python3.6/concurrent/futures/thread.py", line 56, in run

Finished working on chromosome: chr13.
[Tue Apr 2 13:52:26 2019]
Finished job 30.
26 of 48 steps (54%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/kjf/2020plus-1.2.2/.snakemake/log/2019-04-02T123210.444145.snakemake.log**

Could you please help with the error? Thank you.
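
The ModuleNotFoundError above means that the Python interpreter running the rule cannot import sklearn. A short sketch for checking, from inside the same environment, which of the packages imported in the tracebacks on this page are available; the helper name `missing_modules` is hypothetical and the list of names is taken from those tracebacks.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level packages that appear in the tracebacks in these issues.
for name in missing_modules(["sklearn", "rpy2", "pysam", "prob2020"]):
    print(f"cannot import: {name}")
```

Running this with the `python` that snakemake invokes matters, because a different interpreter earlier on PATH can see a different set of packages.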

OSError: file `data//snvboxGenes.fa` not found

I'm getting an error while executing the following sample command:

snakemake -s Snakefile pretrained_predict -p --cores 1 --config mutations="data/bladder.txt" output_dir="output_bladder" trained_classifier="data/2020plus_10k.Rdata"

ERROR

Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 features
1 finishSim
1 og
1 predict_test
1 pretrained_predict
10 simFeatures
10 simMaf
10 simOg
10 simSummary
10 simTsg
1 summary
1 tsg
57

rule simMaf:
input: data/bladder.txt
output: output_bladder/simulated_summary/chasm_sim_maf5.txt
jobid: 51
wildcards: iter=5

mut_annotate --log-level=INFO -b data//snvboxGenes.bed -i data//snvboxGenes.fa -c 1.5 -m data/bladder.txt -p 0 -n 1 --maf --seed=$((5*42)) -r 3 --unique -o output_bladder/simulated_summary/chasm_sim_maf5.txt
Command: /root/Desktop/2020_Prod/2020plus/2020plus/bin/mut_annotate --log-level=INFO -b data//snvboxGenes.bed -i data//snvboxGenes.fa -c 1.5 -m data/bladder.txt -p 0 -n 1 --maf --seed=210 -r 3 --unique -o output_bladder/simulated_summary/chasm_sim_maf5.txt
Traceback (most recent call last):
File "/root/Desktop/2020_Prod/2020plus/2020plus/bin/mut_annotate", line 10, in
sys.exit(cli_main())
File "/root/Desktop/2020_Prod/2020plus/2020plus/lib/python3.6/site-packages/prob2020/console/annotate.py", line 432, in cli_main
main(opts)
File "/root/Desktop/2020_Prod/2020plus/2020plus/lib/python3.6/site-packages/prob2020/console/annotate.py", line 380, in main
gene_fa = pysam.Fastafile(opts['input'])
File "pysam/libcfaidx.pyx", line 123, in pysam.libcfaidx.FastaFile.__cinit__
File "pysam/libcfaidx.pyx", line 155, in pysam.libcfaidx.FastaFile._open
OSError: file data//snvboxGenes.fa not found
Error in job simMaf while creating output file output_bladder/simulated_summary/chasm_sim_maf5.txt.
RuleException:
CalledProcessError in line 135 of /root/Desktop/2020_Prod/2020plus/Snakefile:
Command 'mut_annotate --log-level=INFO -b data//snvboxGenes.bed -i data//snvboxGenes.fa -c 1.5 -m data/bladder.txt -p 0 -n 1 --maf --seed=$((5*42)) -r 3 --unique -o output_bladder/simulated_summary/chasm_sim_maf5.txt' returned non-zero exit status 1.
File "/root/Desktop/2020_Prod/2020plus/Snakefile", line 135, in __rule_simMaf
File "/root/Desktop/2020_Prod/2020plus/2020plus/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
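
The OSError above says that data/snvboxGenes.fa is not present relative to the working directory. A minimal sketch for checking the input paths that appear in the commands in this log before launching the pipeline; the helper name `missing_paths` is hypothetical, and the path list is copied from the commands shown on this page rather than from the Snakefile.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the given paths that do not exist on disk."""
    return [str(path) for path in paths if not Path(path).exists()]


# Paths as they appear in the mut_annotate and snakemake commands.
required = [
    "data/snvboxGenes.bed",
    "data/snvboxGenes.fa",
    "data/scores",
    "data/bladder.txt",
    "data/2020plus_10k.Rdata",
]
for path in missing_paths(required):
    print(f"missing: {path}")
```

The check must be run from the directory in which snakemake is started, since all of these paths are relative.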

Is grch38 supported?

Dear Karchin,
I have a MAF file from an alignment against GRCh38.
Could I get an hg38 version of 'snvboxGenes.bed' from you?

Thanks.

Exception: Error in randomForest.default(m, y, ...) : Bad sampsize specification

Hi

I'm trying to predict on my own pan-cancer mutation data (training gives the same error). At the randomForest step, the command
python 2020plus.py --log-level=INFO train -d .7 -o 1.0 -n 1000 -r output_pancan2/trained.Rdata --features=output_pancan2/features.txt --random-seed 71
fails in the R code with the error in the title. A little digging reveals that is_onco_pred and is_tsg_pred are both set to True, yet the number of TSGs in label_counts for this iteration is 0, which breaks randomForest.
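
The failure described above comes down to a class with zero members reaching the random forest. A small sketch of a guard that reports empty classes before training; the helper `empty_classes` and the label names "other", "oncogene", and "tsg" are illustrative assumptions and are not taken from the 2020plus source.

```python
from collections import Counter


def empty_classes(labels, expected_classes):
    """Return the expected classes that have no members in `labels`."""
    counts = Counter(labels)
    return [cls for cls in expected_classes if counts[cls] == 0]


# Hypothetical labels for the genes in one training iteration.
labels = ["other", "oncogene", "other", "other"]
print(empty_classes(labels, ["other", "oncogene", "tsg"]))
```

A check of this kind would turn the opaque "Bad sampsize specification" from R into a message that names the class with no training genes.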
