Comments (3)

ctokheim commented on June 3, 2024

Hi @vmelichar.

I've attached an updated environment file for installing the 2020plus environment (2020plus_environment.yml.gz). It also worked for another user who recently had issues, and I just tested it myself and successfully ran the unit tests.

Before installing, I would either delete your current environment or change the name of the environment in the YAML file.

Then run conda (or mamba) to create the environment:

conda env create -f 2020plus_environment.yml

For me, though, conda was quite slow to create the environment, so I actually installed via mamba:

conda install mamba
mamba env create -f 2020plus_environment.yml

This should install everything that is necessary, including the R dependencies. You then just need to activate the environment.

A quick way to test whether the 2020plus code is likely installed correctly is to run the unit tests (this is quicker than trying to run 20/20+ on large data and only finding out later in the pipeline that there was a problem):

pip install nose
nosetests tests/test_features.py
nosetests tests/test_train.py
nosetests tests/test_classify.py

from 2020plus.

vmelichar commented on June 3, 2024

Thank you @ctokheim for the response. The environment installed perfectly and all the tests passed. But I am still getting errors when running on actual data. It might still be a problem with pandas...

This is the first error I get in the pipeline:

Traceback (most recent call last):
  File "/home/melichv/miniconda3/envs/2020plus/bin/probabilistic2020", line 8, in <module>
    sys.exit(cli_main())
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/probabilistic2020.py", line 284, in cli_main
    main(opts)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/probabilistic2020.py", line 229, in main
    result_df = rt.main(opts, mutation_df)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/randomization_test.py", line 392, in main
    opts['use_unmapped'])
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/python/count_frameshifts.py", line 37, in count_frameshift_total
    gene_df = fs_df[fs_df['Gene']==bed.gene_name]
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/pandas/core/frame.py", line 2982, in __getitem__
    return self._getitem_frame(key)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/pandas/core/frame.py", line 3082, in _getitem_frame
    return self.where(key)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/pandas/core/generic.py", line 9276, in where
    cond, other, inplace, axis, level, errors=errors, try_cast=try_cast
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/pandas/core/generic.py", line 9123, in _where
    axis=block_axis,
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 557, in where
    return self.apply("where", **kwargs)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 436, in apply
    kwargs[k] = obj.reindex(b_items, axis=axis, copy=align_copy)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/pandas/util/_decorators.py", line 221, in wrapper
    return func(*args, **kwargs)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/pandas/core/frame.py", line 3976, in reindex
    return super().reindex(**kwargs)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/pandas/core/generic.py", line 4514, in reindex
    axes, level, limit, tolerance, method, fill_value, copy
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/pandas/core/frame.py", line 3858, in _reindex_axes
    columns, method, copy, level, fill_value, limit, tolerance
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/pandas/core/frame.py", line 3906, in _reindex_columns
Dropped 139 mutations after only keeping valid SNVs
    allow_dups=False,
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/pandas/core/generic.py", line 4577, in _reindex_with_indexers
    copy=copy,
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 1251, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3362, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

Then there is also this one:

Traceback (most recent call last):
  File "/home/melichv/miniconda3/envs/2020plus/bin/mut_annotate", line 8, in <module>
    sys.exit(cli_main())
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/annotate.py", line 432, in cli_main
    main(opts)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/annotate.py", line 417, in main
    multiprocess_permutation(bed_dict, mut_df, opts, indel_df)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/annotate.py", line 86, in multiprocess_permutation
    indel_cts_dict = indel_df['Gene'].value_counts().to_dict()
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/pandas/core/generic.py", line 5179, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'value_counts'

And this one:

Traceback (most recent call last):
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/python/utils.py", line 131, in wrapper
    result = f(*args, **kwds)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/randomization_test.py", line 46, in singleprocess_permutation
    genes_with_mut = set(mut_df['Gene'].unique())
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/pandas/core/generic.py", line 5179, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'unique'
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/python/utils.py", line 131, in wrapper
    result = f(*args, **kwds)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/randomization_test.py", line 46, in singleprocess_permutation
    genes_with_mut = set(mut_df['Gene'].unique())
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/pandas/core/generic.py", line 5179, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'unique'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/melichv/miniconda3/envs/2020plus/bin/probabilistic2020", line 8, in <module>
    sys.exit(cli_main())
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/probabilistic2020.py", line 284, in cli_main
    main(opts)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/probabilistic2020.py", line 229, in main
    result_df = rt.main(opts, mutation_df)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/randomization_test.py", line 414, in main
    permutation_result = multiprocess_permutation(bed_dict, mut_df, opts)
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/randomization_test.py", line 178, in multiprocess_permutation
    for chrom_result in process_results:
  File "/home/melichv/miniconda3/envs/2020plus/lib/python3.6/multiprocessing/pool.py", line 761, in next
    raise value
AttributeError: 'DataFrame' object has no attribute 'unique'

Do you think there is a problem with my data and not with 2020plus?

Thank you for your help!
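One possible reading of the tracebacks above, offered as an assumption rather than a diagnosis: in pandas, selecting a column label that occurs more than once returns a DataFrame instead of a Series, which would be consistent with `mut_df['Gene']` having no `unique` attribute and with the "duplicate axis" error. A minimal sketch for checking a tab-delimited MAF header for repeated column names follows. The helper name `duplicated_columns` and the example file contents are hypothetical, and it uses only the standard library.

```python
import csv
import io
from collections import Counter


def duplicated_columns(maf_handle):
    """Return the column names that occur more than once in the header.

    Assumes a tab-delimited file whose header is the first line that
    does not start with '#'. Returns an empty list if no header is found.
    """
    reader = csv.reader(maf_handle, delimiter="\t")
    for row in reader:
        if row and not row[0].startswith("#"):
            counts = Counter(row)
            return sorted(name for name, n in counts.items() if n > 1)
    return []


# Hypothetical merged file in which the 'Gene' column appears twice.
example = io.StringIO("#version 2.4\nGene\tChromosome\tGene\nTP53\tchr17\tTP53\n")
print(duplicated_columns(example))
```

An empty list means the header has no repeated names, in which case the cause is more likely elsewhere in the input.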

vmelichar commented on June 3, 2024

There was indeed an error with my input file. I produced the MAF by merging multiple MAFs, but 20/20+ takes only 8 columns as input, so many rows were duplicates. I corrected the MAF so that it contains only the specific columns and no duplicate rows, and now everything works.

I appreciate your help earlier.
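For anyone hitting the same problem, the cleanup described above can be sketched roughly as follows. This is an illustration under assumptions, not the exact steps used in this thread: the function name `clean_maf` is hypothetical, and apart from `Gene` the column names in the example are placeholders, so substitute the columns that 20/20+ actually expects.

```python
import csv
import io


def clean_maf(in_handle, out_handle, keep_columns):
    """Keep only `keep_columns` from a tab-delimited MAF and drop rows
    that become identical once the other columns are removed.

    Returns the number of data rows written. Raises KeyError if a
    requested column is missing from the input header.
    """
    reader = csv.DictReader(in_handle, delimiter="\t")
    writer = csv.DictWriter(
        out_handle, fieldnames=keep_columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    seen = set()
    kept = 0
    for row in reader:
        key = tuple(row[col] for col in keep_columns)
        if key in seen:
            continue  # same mutation already written
        seen.add(key)
        writer.writerow(dict(zip(keep_columns, key)))
        kept += 1
    return kept


# Hypothetical merged MAF: the first two rows differ only in a column
# ('Center') that is not kept, so they collapse into one row.
merged = io.StringIO(
    "Gene\tChromosome\tStart_Position\tCenter\n"
    "TP53\tchr17\t7577120\tA\n"
    "TP53\tchr17\t7577120\tB\n"
    "KRAS\tchr12\t25398284\tA\n"
)
cleaned = io.StringIO()
n_rows = clean_maf(merged, cleaned, ["Gene", "Chromosome", "Start_Position"])
print(n_rows)
print(cleaned.getvalue())
```

Deduplicating on the kept columns, rather than on the full rows, is what removes rows that differed only in columns 20/20+ does not read.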
