ctokheim commented on June 3, 2024

Did you run the commands exactly as in the quick start? I'm not getting an error on the example. Also, what was the file format problem you had on the last issue you raised? Could it be something similar?

from 2020plus.

pradyumnasagar commented on June 3, 2024

The only change we made to the quick start was not using `which 2020plus.py`. I do not understand why it is used, or whether it has to be used exactly as shown in the quick start. When we run

python3 `which 2020plus.py` --out-dir=result_compare classify -f testfeature.txt -nd test1/pancan_example/simulated_null_dist.txt

(with backticks around `which 2020plus.py`, as in the quick start), we get this error:
Unknown option: --
usage: python3 [option] ... [-c cmd | -m mod | file | -] [arg] ...

So instead I used:

python3 2020plus.py --out-dir=result_compare classify -f testfeature.txt -nd test1/pancan_example/simulated_null_dist.txt

Could this be the problem?

ctokheim commented on June 3, 2024

Backticks execute the command between them and substitute its output into the surrounding command. Did you add the directory containing 2020plus.py to your PATH? It looks like you didn't, so `which` can't find the script.
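
To see why a missing PATH entry produces the `Unknown option` error, here is a minimal Python sketch that mimics the shell's command substitution. It relies on `shutil.which`, which searches PATH much like the `which` command and returns None when nothing is found; the helper name `build_command` is hypothetical and not part of 20/20+.

```python
import shutil
from typing import List, Optional


def build_command(script: str, args: List[str], path: Optional[str] = None) -> List[str]:
    """Mimic the shell line: python3 $(which SCRIPT) ARGS... (hypothetical helper).

    shutil.which() searches PATH (or the explicit `path` string) for an
    executable file named `script` and returns None if none is found.
    """
    resolved = shutil.which(script, path=path)
    # An empty command substitution simply disappears from the command line,
    # so the first user argument ends up where the script path should be.
    return ["python3"] + ([resolved] if resolved else []) + args


if __name__ == "__main__":
    print(build_command("2020plus.py", ["--out-dir=result_compare", "classify"]))
```

When the script is not on PATH, the resulting argument list starts with `python3 --out-dir=result_compare ...`, so python3 tries to parse `--out-dir=result_compare` as one of its own interpreter options and rejects it, which matches the error reported above.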

pradyumnasagar commented on June 3, 2024

After exporting PATH and running the command:

python3 `which 2020plus.py` --out-dir=result_compare classify -f testfeature.txt -nd test1/pancan_example/simulated_null_dist.txt 
Version: 1.1.0
Command: /home/Documents/2020/2020plus-1.1.0/2020plus.py --out-dir=result_compare classify -f testfeature.txt -nd test1/pancan_example/simulated_null_dist.txt
Running Random forest . . .
****************************************
AN ERROR HAS OCCURRED: check the log file
****************************************
Type: <class 'KeyError'>
Exception: 1
Traceback:
   File "/home/Documents/2020/2020plus-1.1.0/2020plus.py", line 341, in <module>
    args.func()  # run function corresponding to user's command
  File "/home/Documents/2020/2020plus-1.1.0/2020plus.py", line 37, in _classify
    src.classify.python.classifier.main(opts)  # run code
  File "/home/Documents/2020/2020plus-1.1.0/src/classify/python/classifier.py", line 250, in main
    rrclf.kfold_validation()
  File "/home/Documents/2020/2020plus-1.1.0/src/classify/python/generic_classifier.py", line 212, in kfold_validation
    self.y.iloc[train_ix].copy())
  File "/home/Documents/2020/2020plus-1.1.0/src/classify/python/r_random_forest_clf.py", line 102, in fit
    label_counts[self.onco_num],
  File "/usr/lib64/python3.3/site-packages/pandas/core/series.py", line 601, in __getitem__
    result = self.index.get_value(self, key)
  File "/usr/lib64/python3.3/site-packages/pandas/indexes/base.py", line 2169, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 105, in pandas.index.IndexEngine.get_value (pandas/index.c:3342)
  File "pandas/index.pyx", line 113, in pandas.index.IndexEngine.get_value (pandas/index.c:3045)
  File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)
  File "pandas/src/hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8146)
  File "pandas/src/hashtable_class_helper.pxi", line 410, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8090)

The error remains the same. Any suggestions?

ctokheim commented on June 3, 2024

Did the features subcommand work for you before this step? You seem to be using a file called testfeature.txt, which is not an output of the quick start example. The 20/20+ quick start is only meant to test the installation of 20/20+, not to be adapted to run your own data (see the tutorial for that).

Are you giving the classify subcommand a feature file that is just a "test" of a few lines? The command above expects features generated from a large pan-cancer data set, because it performs a pan-cancer analysis. If you are giving 20/20+ only a few lines of test input features, or your own smaller data set, I'd expect to see the error message above.

So I suspect the problems come from the data files you are providing to the commands. Could you start from the beginning of the quick start and ONLY copy and paste the commands? Then tell me what works and what doesn't, because I do not know what you ran for the features subcommand before the error you posted.

pradyumnasagar commented on June 3, 2024

The features subcommand worked perfectly; testfeature.txt was created by the features subcommand from the quick start example data. The feature file is not a few-line test file. I tried copying and pasting the commands and still have the same issue.


[mallya@localhost pancan_example]$ python `which 2020plus.py` features \
>      -og-test oncogene.txt \
>      -tsg-test tsg.txt \
>      --summary summary_pancan.txt \
>      -o features_pancan.txt
Version: 1.1.0
Command: /home/mallya/Documents/2020/2020plus-1.1.0/2020plus.py features -og-test oncogene.txt -tsg-test tsg.txt --summary summary_pancan.txt -o features_pancan.txt
FINISHED SUCCESSFULLY!
[mallya@localhost pancan_example]$ python `which 2020plus.py` --out-dir=result_compare classify \
>      -f features_pancan.txt \
>      -nd simulated_null_dist.txt
Version: 1.1.0
Command: /home/mallya/Documents/2020/2020plus-1.1.0/2020plus.py --out-dir=result_compare classify -f features_pancan.txt -nd simulated_null_dist.txt
Running Random forest . . .
****************************************
AN ERROR HAS OCCURRED: check the log file
****************************************
Type: <class 'KeyError'>
Exception: 1
Traceback:
   File "/home/mallya/Documents/2020/2020plus-1.1.0/2020plus.py", line 341, in <module>
    args.func()  # run function corresponding to user's command
  File "/home/mallya/Documents/2020/2020plus-1.1.0/2020plus.py", line 37, in _classify
    src.classify.python.classifier.main(opts)  # run code
  File "/home/mallya/Documents/2020/2020plus-1.1.0/src/classify/python/classifier.py", line 250, in main
    rrclf.kfold_validation()
  File "/home/mallya/Documents/2020/2020plus-1.1.0/src/classify/python/generic_classifier.py", line 212, in kfold_validation
    self.y.iloc[train_ix].copy())
  File "/home/mallya/Documents/2020/2020plus-1.1.0/src/classify/python/r_random_forest_clf.py", line 102, in fit
    label_counts[self.onco_num],
  File "/usr/lib64/python3.3/site-packages/pandas/core/series.py", line 601, in __getitem__
    result = self.index.get_value(self, key)
  File "/usr/lib64/python3.3/site-packages/pandas/indexes/base.py", line 2169, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 105, in pandas.index.IndexEngine.get_value (pandas/index.c:3342)
  File "pandas/index.pyx", line 113, in pandas.index.IndexEngine.get_value (pandas/index.c:3045)
  File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)
  File "pandas/src/hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8146)
  File "pandas/src/hashtable_class_helper.pxi", line 410, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8090)

The example data was downloaded from http://karchinlab.org/data/2020+/pancan_example.tar.gz

ctokheim commented on June 3, 2024

I can't reproduce the error. Could you print the versions of your Python packages:

$ pip freeze
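
If you prefer to collect the same information from inside Python, a minimal sketch is below. It assumes Python 3.8 or later (for `importlib.metadata`), which is newer than the Python 3.3 seen in the tracebacks in this thread; `package_versions` is a hypothetical helper. Run it with the same interpreter you use for 2020plus.py, since each interpreter can have its own installed packages.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def package_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return {package: installed version}, or None if it is not installed."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # A few of the packages that appear in the pip freeze output in this thread.
    wanted = ["numpy", "pandas", "scikit-learn", "rpy2", "matplotlib"]
    for pkg, ver in package_versions(wanted).items():
        print(f"{pkg}=={ver}" if ver else f"{pkg}: not installed")
```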

Also, it might help to add print statements to the part of the source code that has the problem. Specifically, right before line 101 of src/classify/python/r_random_forest_clf.py, print label_counts and self.onco_num.

pradyumnasagar commented on June 3, 2024

cycler==0.10.0
matplotlib==1.5.3
nose==1.3.7
numpy==1.11.2
pandas==0.19.1
probabilistic2020==1.0.7
pyparsing==2.1.10
pysam==0.9.1.4
python-dateutil==2.6.0
pytz==2016.7
rpy2==2.8.4
scikit-learn==0.18.1
scipy==0.18.1
singledispatch==3.4.0.3
six==1.10.0

ctokheim commented on June 3, 2024

r_random_forest_clf.py

pradyumnasagar commented on June 3, 2024

After adding the print statements to r_random_forest_clf.py, the following additional values were printed:

Running Random forest . . .
0 16519
Name: gene, dtype: int64
1

python3 `which 2020plus.py` --out-dir=result_compare classify      -f features_pancan.txt      -nd simulated_null_dist.txt
Version: 1.1.0
Command: /home/mallya/Documents/2020/2020plus-1.1.0/2020plus.py --out-dir=result_compare classify -f features_pancan.txt -nd simulated_null_dist.txt
Running Random forest . . .
0    16519
Name: gene, dtype: int64
1
****************************************
AN ERROR HAS OCCURRED: check the log file
****************************************
Type: <class 'KeyError'>
Exception: 1
Traceback:
   File "/home/mallya/Documents/2020/2020plus-1.1.0/2020plus.py", line 341, in <module>
    args.func()  # run function corresponding to user's command
  File "/home/mallya/Documents/2020/2020plus-1.1.0/2020plus.py", line 37, in _classify
    src.classify.python.classifier.main(opts)  # run code
  File "/home/mallya/Documents/2020/2020plus-1.1.0/src/classify/python/classifier.py", line 250, in main
    rrclf.kfold_validation()
  File "/home/mallya/Documents/2020/2020plus-1.1.0/src/classify/python/generic_classifier.py", line 212, in kfold_validation
    self.y.iloc[train_ix].copy())
  File "/home/mallya/Documents/2020/2020plus-1.1.0/src/classify/python/r_random_forest_clf.py", line 104, in fit
    label_counts[self.onco_num],
  File "/usr/lib64/python3.3/site-packages/pandas/core/series.py", line 601, in __getitem__
    result = self.index.get_value(self, key)
  File "/usr/lib64/python3.3/site-packages/pandas/indexes/base.py", line 2169, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 105, in pandas.index.IndexEngine.get_value (pandas/index.c:3342)
  File "pandas/index.pyx", line 113, in pandas.index.IndexEngine.get_value (pandas/index.c:3045)
  File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)
  File "pandas/src/hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8146)
  File "pandas/src/hashtable_class_helper.pxi", line 410, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8090)


ctokheim commented on June 3, 2024

I ran your exact versions of the Python packages on Python 3.5.2 and didn't get an error. I got the following when adding those print statements:

Running Random forest . . .
0 16407
2 63
1 47
Name: gene, dtype: int64
1

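Did you happen to change the training lists of oncogenes and tumor suppressor genes (data/gene_lists/oncogenes.txt and data/gene_lists/tsgs.txt in the source code)? Here, 0 is the label for passenger genes, 1 for oncogenes, and 2 for tumor suppressor genes. Your output contains only label 0, so there is no count for label 1 (self.onco_num), which is why label_counts[self.onco_num] fails with KeyError: 1.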
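
The KeyError: 1 can be reproduced with pandas alone. Below is a minimal sketch, assuming the training labels sit in a pandas Series encoded as described above; `class_counts` is a hypothetical helper, not part of 20/20+, that uses `Series.reindex(..., fill_value=0)` to report absent classes as zero instead of failing.

```python
import pandas as pd

# Label encoding described above: 0 = passenger, 1 = oncogene, 2 = tumor suppressor.
PASSENGER, ONCOGENE, TSG = 0, 1, 2


def class_counts(labels: pd.Series) -> pd.Series:
    """Count training genes per class, reporting 0 for classes that are absent."""
    counts = labels.value_counts()
    # value_counts() only contains labels that actually occur, so indexing a
    # missing label (counts[ONCOGENE]) raises KeyError; reindex fills it with 0.
    return counts.reindex([PASSENGER, ONCOGENE, TSG], fill_value=0)


if __name__ == "__main__":
    # Every gene labeled as a passenger, as in the failing run above.
    y = pd.Series([PASSENGER] * 5, name="gene")
    try:
        y.value_counts()[ONCOGENE]
    except KeyError as err:
        print("KeyError:", err)  # same exception type as in the traceback
    print(class_counts(y).to_dict())
```

Filling missing classes with zero only makes the problem visible; a classifier still cannot be trained for a class with no examples, so the real fix is restoring the gene lists.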

pradyumnasagar commented on June 3, 2024

I think I did change the data/gene_lists/oncogenes.txt and data/gene_lists/tsgs.txt files. I have restored them and am checking now. If I run 2020plus on my own data, do I need to keep the same training lists, oncogenes.txt and tsgs.txt?

pradyumnasagar commented on June 3, 2024

The output was generated, but without plots.

[mallya@localhost pancan_example]$ python3 `which 2020plus.py` --out-dir=result_compare classify      -f features_pancan.txt      -nd simulated_null_dist.txt
Version: 1.1.0
Command: /home/mallya/Documents/2020/2020plus-1.1.0/2020plus.py --out-dir=result_compare classify -f features_pancan.txt -nd simulated_null_dist.txt
Running Random forest . . .
/home/mallya/Documents/2020/2020plus-1.1.0/src/utils/python/p_value.py:132: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  pval_adj = np.zeros(n)
Random forest significance test: 65 (39 novel) oncogenes, 109 (66 novel) tsg
FINISHED SUCCESSFULLY!
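
The VisibleDeprecationWarning above comes from passing a non-integer size to `np.zeros` in src/utils/python/p_value.py (`pval_adj = np.zeros(n)`). Later NumPy releases turn this into a TypeError, so it is worth fixing. A minimal sketch of the usual fix, assuming `n` is a whole-number count that happens to be stored as a float, converts it to `int` before allocating; `allocate_adjusted_pvals` is a hypothetical helper, not the actual 20/20+ code.

```python
import numpy as np


def allocate_adjusted_pvals(n) -> np.ndarray:
    """Allocate the adjusted p-value array (hypothetical helper).

    np.zeros needs an integer shape; a float such as 3.0 triggers the
    deprecation warning in old NumPy and a TypeError in newer releases.
    """
    n_int = int(n)
    if n_int != n:
        # Refuse silent truncation of sizes like 2.5.
        raise ValueError(f"expected a whole-number size, got {n!r}")
    return np.zeros(n_int)


if __name__ == "__main__":
    print(allocate_adjusted_pvals(3.0))
```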

ctokheim commented on June 3, 2024

The two lists were established by cancer experts in the field. You don't need to modify oncogenes.txt and tsgs.txt for your own data. Editing them is only meant for advanced users who are familiar with both machine learning and what cancer experts consider bona fide cancer driver genes. 20/20+ learns features that are signatures of oncogenes and tumor suppressor genes, so that it can predict whether a mutated gene discovered in your data significantly resembles an oncogene or a tumor suppressor gene.
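
To illustrate how the two curated lists act as training labels, here is a minimal sketch using the encoding given earlier in this thread (0 = passenger, 1 = oncogene, 2 = tumor suppressor). The function names, the one-gene-per-line file format, and the choice to reject genes that appear on both lists are assumptions for illustration, not 20/20+'s actual code.

```python
from typing import Dict, Iterable, Set

PASSENGER, ONCOGENE, TSG = 0, 1, 2  # label encoding described earlier in the thread


def read_gene_list(lines: Iterable[str]) -> Set[str]:
    """Parse a gene list with one gene symbol per line (assumed format)."""
    return {line.strip() for line in lines if line.strip() and not line.startswith("#")}


def label_genes(genes: Iterable[str], oncogenes: Set[str], tsgs: Set[str]) -> Dict[str, int]:
    """Assign a training label to every gene (hypothetical helper)."""
    overlap = oncogenes & tsgs
    if overlap:
        # How overlaps should be handled is an assumption; here we refuse to guess.
        raise ValueError(f"genes on both lists: {sorted(overlap)}")
    labels: Dict[str, int] = {}
    for gene in genes:
        if gene in oncogenes:
            labels[gene] = ONCOGENE
        elif gene in tsgs:
            labels[gene] = TSG
        else:
            labels[gene] = PASSENGER
    return labels


if __name__ == "__main__":
    og = read_gene_list(["KRAS", "BRAF"])  # toy lists for illustration only
    tsg = read_gene_list(["TP53", "PTEN"])
    print(label_genes(["KRAS", "TP53", "TTN"], og, tsg))
```

If an edited list matches none of the genes in the feature file, no gene receives label 1, which is consistent with the KeyError: 1 seen earlier in this thread.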

In terms of plots, did you install matplotlib? It's an optional dependency.
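
A quick way to check whether matplotlib is importable is sketched below; `has_module` is a hypothetical helper built on `importlib.util.find_spec`, which returns None for a top-level module that cannot be found. Run it with the same interpreter you use for 2020plus.py: this thread shows both `python` and `python3` being used, and each may have its own set of installed packages.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("matplotlib"):
        print("matplotlib is importable by this interpreter")
    else:
        print("matplotlib is missing for this interpreter; install it to get plots")
```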
