Comments (14)
Did you run the commands exactly as in the quick start? I'm not getting an error on the example. Also, what was the file format problem you had on the last issue you raised? Could it be something similar?
from 2020plus.
The only change we made to the quick start was not using 'which 2020plus.py'. I do not understand why it is used, or whether it has to be used exactly as shown in the quick start. When we use
python3 'which 2020plus.py' --out-dir=result_compare classify -f testfeature.txt -nd test1/pancan_example/simulated_null_dist.txt
(Used ` instead of ' as shown in command above)
we get an error: Unknown option: --
usage: python3 [option] ... [-c cmd | -m mod | file | -] [arg] ...
Hence, instead I have used
python3 2020plus.py --out-dir=result_compare classify -f testfeature.txt -nd test1/pancan_example/simulated_null_dist.txt
Could this be the problem?
from 2020plus.
Backticks execute the code in between them. Did you add the directory containing 2020plus.py to your path? Looks like you didn't and the command can't be found because of it.
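To make the PATH point concrete, here is a minimal Python sketch of the lookup that `which` performs. The helper name resolve_script and the temporary directory standing in for the 20/20+ install directory are illustrative assumptions, not part of 20/20+. If the lookup finds nothing, the backticks expand to an empty string and python3 receives --out-dir=... as its first argument, which would be consistent with the "Unknown option: --" message above.

```python
import os
import shutil
import stat
import tempfile


def resolve_script(name, search_path=None):
    """Return the full path of an executable `name` found on the search path.

    Returns None when nothing is found. With search_path=None the current
    PATH environment variable is searched, much like the shell's `which`.
    """
    return shutil.which(name, path=search_path)


# Throwaway directory standing in for the 20/20+ install directory (assumption).
with tempfile.TemporaryDirectory() as install_dir:
    script = os.path.join(install_dir, "2020plus.py")
    with open(script, "w") as handle:
        handle.write("#!/usr/bin/env python\n")
    # `which` only reports files that are executable.
    os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)

    # Directory is on the search path: the script is found.
    found = resolve_script("2020plus.py", search_path=install_dir)
    # Directory is not on the search path: nothing is found.
    missing = resolve_script(
        "2020plus.py", search_path=os.path.join(install_dir, "nowhere")
    )

print("found:", found)
print("missing:", missing)
```

Calling resolve_script("2020plus.py") with no search_path checks your real PATH; a result of None suggests the directory still needs to be exported.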
from 2020plus.
After exporting the PATH and executing the command
python3 `which 2020plus.py` --out-dir=result_compare classify -f testfeature.txt -nd test1/pancan_example/simulated_null_dist.txt
Version: 1.1.0
Command: /home/Documents/2020/2020plus-1.1.0/2020plus.py --out-dir=result_compare classify -f testfeature.txt -nd test1/pancan_example/simulated_null_dist.txt
Running Random forest . . .
****************************************
AN ERROR HAS OCCURRED: check the log file
****************************************
Type: <class 'KeyError'>
Exception: 1
Traceback:
File "/home/Documents/2020/2020plus-1.1.0/2020plus.py", line 341, in <module>
args.func() # run function corresponding to user's command
File "/home/Documents/2020/2020plus-1.1.0/2020plus.py", line 37, in _classify
src.classify.python.classifier.main(opts) # run code
File "/home/Documents/2020/2020plus-1.1.0/src/classify/python/classifier.py", line 250, in main
rrclf.kfold_validation()
File "/home/Documents/2020/2020plus-1.1.0/src/classify/python/generic_classifier.py", line 212, in kfold_validation
self.y.iloc[train_ix].copy())
File "/home/Documents/2020/2020plus-1.1.0/src/classify/python/r_random_forest_clf.py", line 102, in fit
label_counts[self.onco_num],
File "/usr/lib64/python3.3/site-packages/pandas/core/series.py", line 601, in __getitem__
result = self.index.get_value(self, key)
File "/usr/lib64/python3.3/site-packages/pandas/indexes/base.py", line 2169, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas/index.pyx", line 105, in pandas.index.IndexEngine.get_value (pandas/index.c:3342)
File "pandas/index.pyx", line 113, in pandas.index.IndexEngine.get_value (pandas/index.c:3045)
File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)
File "pandas/src/hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8146)
File "pandas/src/hashtable_class_helper.pxi", line 410, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8090)
The error remains the same. Any suggestions?
from 2020plus.
Did the features subcommand work for you before this step? You seem to be using a file called testfeature.txt, which is not an output of the quick start example. The 20/20+ quick start is only meant to test the installation of 20/20+; it is not meant to be modified to run your own data (see the tutorial for that).
Are you giving the classify subcommand a feature file that is just a "test" of a few lines? The command expects features generated from a large pancancer data set, because it performs a pancancer analysis. If you give 20/20+ only a few lines of test input features, or your own smaller data set, I'd expect to see the error message above.
So, I suspect the problems arise from the data files you are providing to the commands. Could you start from the beginning of the quick start and ONLY copy and paste the commands? Then tell me what works and what doesn't, because I do not know what you did for the features subcommand before the error you posted.
from 2020plus.
The features subcommand worked perfectly; testfeature.txt was created with the features subcommand from the quick start example data. The feature file is not a few-line test file. I copied and pasted the commands and still have the same issue.
[mallya@localhost pancan_example]$ python `which 2020plus.py` features \
> -og-test oncogene.txt \
> -tsg-test tsg.txt \
> --summary summary_pancan.txt \
> -o features_pancan.txt
Version: 1.1.0
Command: /home/mallya/Documents/2020/2020plus-1.1.0/2020plus.py features -og-test oncogene.txt -tsg-test tsg.txt --summary summary_pancan.txt -o features_pancan.txt
FINISHED SUCCESSFULLY!
[mallya@localhost pancan_example]$ python `which 2020plus.py` --out-dir=result_compare classify \
> -f features_pancan.txt \
> -nd simulated_null_dist.txt
Version: 1.1.0
Command: /home/mallya/Documents/2020/2020plus-1.1.0/2020plus.py --out-dir=result_compare classify -f features_pancan.txt -nd simulated_null_dist.txt
Running Random forest . . .
****************************************
AN ERROR HAS OCCURRED: check the log file
****************************************
Type: <class 'KeyError'>
Exception: 1
Traceback:
File "/home/mallya/Documents/2020/2020plus-1.1.0/2020plus.py", line 341, in <module>
args.func() # run function corresponding to user's command
File "/home/mallya/Documents/2020/2020plus-1.1.0/2020plus.py", line 37, in _classify
src.classify.python.classifier.main(opts) # run code
File "/home/mallya/Documents/2020/2020plus-1.1.0/src/classify/python/classifier.py", line 250, in main
rrclf.kfold_validation()
File "/home/mallya/Documents/2020/2020plus-1.1.0/src/classify/python/generic_classifier.py", line 212, in kfold_validation
self.y.iloc[train_ix].copy())
File "/home/mallya/Documents/2020/2020plus-1.1.0/src/classify/python/r_random_forest_clf.py", line 102, in fit
label_counts[self.onco_num],
File "/usr/lib64/python3.3/site-packages/pandas/core/series.py", line 601, in __getitem__
result = self.index.get_value(self, key)
File "/usr/lib64/python3.3/site-packages/pandas/indexes/base.py", line 2169, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas/index.pyx", line 105, in pandas.index.IndexEngine.get_value (pandas/index.c:3342)
File "pandas/index.pyx", line 113, in pandas.index.IndexEngine.get_value (pandas/index.c:3045)
File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)
File "pandas/src/hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8146)
File "pandas/src/hashtable_class_helper.pxi", line 410, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8090)
The example data was downloaded from http://karchinlab.org/data/2020+/pancan_example.tar.gz
from 2020plus.
I can't reproduce the error. Could you print the version of your python packages:
$ pip freeze
Also it might be helpful to add a print statement to the part of the source code that has the problem. Specifically right before line 101 of src/classify/python/r_random_forest_clf.py, add a print statement for label_counts and self.onco_num.
from 2020plus.
cycler==0.10.0
matplotlib==1.5.3
nose==1.3.7
numpy==1.11.2
pandas==0.19.1
probabilistic2020==1.0.7
pyparsing==2.1.10
pysam==0.9.1.4
python-dateutil==2.6.0
pytz==2016.7
rpy2==2.8.4
scikit-learn==0.18.1
scipy==0.18.1
singledispatch==3.4.0.3
six==1.10.0
from 2020plus.
r_random_forest_clf.py
from 2020plus.
After adding the print statement to r_random_forest_clf.py the following additional values were printed.
Running Random forest . . .
0 16519
Name: gene, dtype: int64
1
python3 `which 2020plus.py` --out-dir=result_compare classify -f features_pancan.txt -nd simulated_null_dist.txt
Version: 1.1.0
Command: /home/mallya/Documents/2020/2020plus-1.1.0/2020plus.py --out-dir=result_compare classify -f features_pancan.txt -nd simulated_null_dist.txt
Running Random forest . . .
0 16519
Name: gene, dtype: int64
1
****************************************
AN ERROR HAS OCCURRED: check the log file
****************************************
Type: <class 'KeyError'>
Exception: 1
Traceback:
File "/home/mallya/Documents/2020/2020plus-1.1.0/2020plus.py", line 341, in <module>
args.func() # run function corresponding to user's command
File "/home/mallya/Documents/2020/2020plus-1.1.0/2020plus.py", line 37, in _classify
src.classify.python.classifier.main(opts) # run code
File "/home/mallya/Documents/2020/2020plus-1.1.0/src/classify/python/classifier.py", line 250, in main
rrclf.kfold_validation()
File "/home/mallya/Documents/2020/2020plus-1.1.0/src/classify/python/generic_classifier.py", line 212, in kfold_validation
self.y.iloc[train_ix].copy())
File "/home/mallya/Documents/2020/2020plus-1.1.0/src/classify/python/r_random_forest_clf.py", line 104, in fit
label_counts[self.onco_num],
File "/usr/lib64/python3.3/site-packages/pandas/core/series.py", line 601, in __getitem__
result = self.index.get_value(self, key)
File "/usr/lib64/python3.3/site-packages/pandas/indexes/base.py", line 2169, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas/index.pyx", line 105, in pandas.index.IndexEngine.get_value (pandas/index.c:3342)
File "pandas/index.pyx", line 113, in pandas.index.IndexEngine.get_value (pandas/index.c:3045)
File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)
File "pandas/src/hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8146)
File "pandas/src/hashtable_class_helper.pxi", line 410, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8090)
from 2020plus.
I ran your exact versions of python packages on python 3.5.2. Didn't get an error. I got the following when adding those print statements:
Running Random forest . . .
0 16407
2 63
1 47
Name: gene, dtype: int64
1
Did you happen to change the training list of oncogenes and tumor suppressor genes (data/gene_lists/oncogenes.txt and data/gene_lists/tsgs.txt in the source code)? 0 here is passenger genes, 1 is the label for oncogenes, and 2 is the label for tumor suppressors.
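For anyone hitting the same KeyError, the following is a rough sketch of why a label count with only class 0 fails on a lookup of class 1. It assumes label_counts is a pandas Series of per-class gene counts (the printed "Name: gene, dtype: int64" suggests this, but the 20/20+ source is not quoted here); count_labels and missing_classes are hypothetical helpers, not functions from 20/20+.

```python
import pandas as pd

# Label meanings as described in this thread.
PASSENGER, ONCOGENE, TSG = 0, 1, 2


def count_labels(labels):
    """Count genes per class label (assumed to mirror how label_counts is built)."""
    return pd.Series(labels, name="gene").value_counts()


def missing_classes(label_counts, required=(ONCOGENE, TSG)):
    """Return the required class labels that have no genes at all."""
    return [label for label in required if label not in label_counts.index]


# Training lists intact: all three classes are present.
healthy = count_labels([PASSENGER] * 5 + [ONCOGENE] * 2 + [TSG] * 3)

# Training lists altered so that no gene matches: only class 0 remains.
broken = count_labels([PASSENGER] * 10)

print(missing_classes(healthy))  # []
print(missing_classes(broken))   # [1, 2]

try:
    broken.loc[ONCOGENE]  # label-based lookup of a class that is absent
except KeyError as err:
    print("KeyError:", err)
```

A check like missing_classes before the lookup would turn the bare KeyError into a clearer message that the oncogene or tumor suppressor training list matched no genes.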
from 2020plus.
I think I have changed the data/gene_lists/oncogenes.txt and data/gene_lists/tsgs.txt files. I have restored them and am checking now. If I need to run 2020plus on my own data, do I need to keep the same training lists, oncogenes.txt and tsgs.txt?
from 2020plus.
Output generated without plots.
[mallya@localhost pancan_example]$ python3 `which 2020plus.py` --out-dir=result_compare classify -f features_pancan.txt -nd simulated_null_dist.txt
Version: 1.1.0
Command: /home/mallya/Documents/2020/2020plus-1.1.0/2020plus.py --out-dir=result_compare classify -f features_pancan.txt -nd simulated_null_dist.txt
Running Random forest . . .
/home/mallya/Documents/2020/2020plus-1.1.0/src/utils/python/p_value.py:132: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
pval_adj = np.zeros(n)
Random forest significance test: 65 (39 novel) oncogenes, 109 (66 novel) tsg
FINISHED SUCCESSFULLY!
from 2020plus.
The two lists were established by cancer experts in the field. You don't need to modify oncogenes.txt and tsgs.txt for your own data. Editing them is only meant for advanced users who are familiar with both machine learning and what cancer experts consider bona fide cancer driver genes. 20/20+ learns features that are signatures of oncogenes and tumor suppressor genes, so that it can predict whether a newly mutated gene discovered in your data looks significantly like an oncogene or a tumor suppressor gene.
In terms of plots, did you install matplotlib? It's an optional dependency.
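One quick way to check is from the same Python interpreter you use to run 20/20+. This is a generic standard-library sketch, not something 20/20+ provides; is_installed is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be located by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


print("matplotlib available:", is_installed("matplotlib"))
```

Run it with the same python3 that runs 2020plus.py, since a package installed for one interpreter is not necessarily visible to another.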
from 2020plus.