
gecco's Introduction

Hi, I'm GECCO!

๐ŸฆŽ ๏ธOverview

GECCO (Gene Cluster prediction with Conditional Random Fields) is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).


🔧 Installing GECCO

GECCO is implemented in Python, and supports all versions from Python 3.7. It requires additional libraries that can be installed directly from PyPI, the Python Package Index.

Use pip to install GECCO on your machine:

$ pip install gecco-tool

If you'd rather use Conda, a package is available in the bioconda channel. You can install with:

$ conda install -c bioconda gecco

This will install GECCO, its dependencies, and the data needed to run predictions. This requires around 40MB of data to be downloaded, so it could take some time depending on your Internet connection. Once done, you will have a gecco command available in your $PATH.

Note that GECCO uses HMMER3, which can only run on PowerPC and recent x86-64 machines running a POSIX operating system. Therefore, GECCO will work on Linux and OSX, but not on Windows.
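As a rough illustration of the platform note above, the following sketch checks whether the current machine looks like a POSIX system on x86-64 or PowerPC. The set of machine strings and the name `is_supported_platform` are assumptions made for this example, not part of GECCO, and the check only mirrors the note above, which may not reflect newer releases.

```python
import platform
from typing import Optional

# Assumed spellings reported by platform.machine() for the architectures
# named above (recent x86-64 and PowerPC); this list is illustrative only.
SUPPORTED_MACHINES = {"x86_64", "amd64", "ppc64", "ppc64le"}


def is_supported_platform(system: Optional[str] = None, machine: Optional[str] = None) -> bool:
    """Return True if the OS is Linux/macOS and the CPU looks like x86-64 or PowerPC."""
    system = system if system is not None else platform.system()
    machine = (machine if machine is not None else platform.machine()).lower()
    # platform.system() reports macOS as "Darwin"; Windows is excluded on purpose.
    posix_like = system in {"Linux", "Darwin"}
    return posix_like and machine in SUPPORTED_MACHINES
```

For example, `print(is_supported_platform())` prints `True` on a Linux x86-64 workstation and `False` on Windows.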

🧬 Running GECCO

Once gecco is installed, you can run it from the terminal by giving it a FASTA or GenBank file with the genomic sequence you want to analyze, as well as an output directory:

$ gecco run --genome some_genome.fna -o some_output_dir
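When GECCO is driven from a Python script rather than typed in a terminal, the command above can be assembled as an argument list and passed to `subprocess.run`. This is a minimal sketch: `gecco_run_command` and `run_gecco` are hypothetical helpers, and the optional flags are the ones described in the parameter list below. Anything left unset is simply not passed, so GECCO's own defaults apply.

```python
import subprocess
from typing import List, Optional


def gecco_run_command(
    genome: str,
    output_dir: str,
    jobs: Optional[int] = None,
    cds: Optional[int] = None,
    threshold: Optional[float] = None,
    cds_feature: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `gecco run`; unset options are omitted."""
    cmd = ["gecco", "run", "--genome", genome, "-o", output_dir]
    # Only pass the options the caller set explicitly.
    optional = [
        ("--jobs", jobs),
        ("--cds", cds),
        ("--threshold", threshold),
        ("--cds-feature", cds_feature),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def run_gecco(genome: str, output_dir: str, **options) -> None:
    # check=True raises CalledProcessError if gecco exits with a non-zero status.
    subprocess.run(gecco_run_command(genome, output_dir, **options), check=True)
```

For example, `run_gecco("some_genome.fna", "some_output_dir", jobs=4)` runs the command above with four threads. Passing a list rather than a shell string avoids quoting problems with unusual file names.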

Additional parameters of interest are:

  • --jobs, which controls the number of threads that will be spawned by GECCO whenever a step can be parallelized. The default, 0, will autodetect the number of CPUs on the machine using os.cpu_count.
  • --cds, controlling the minimum number of consecutive genes a BGC region must have to be detected by GECCO. The default is 3.
  • --threshold, controlling the minimum probability for a gene to be considered part of a BGC region. Using a lower number will increase the number (and possibly length) of predictions, but reduce accuracy. The default of 0.8 was selected to optimize precision/recall on a test set of 364 BGCs from MIBiG 2.0.
  • --cds-feature, which can be supplied a feature name to extract genes if the input file already contains gene annotations instead of predicting genes with Pyrodigal. A common value for records downloaded from GenBank is --cds-feature CDS.
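To make the interplay between --threshold and --cds concrete, here is a simplified sketch that turns a list of per-gene BGC probabilities into candidate regions: runs of consecutive genes whose probability is at least the threshold, kept only if they contain at least the minimum number of genes. It illustrates the parameters as described above and is not GECCO's actual cluster-extraction code, which may apply additional rules; `candidate_regions` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple


def candidate_regions(
    probabilities: Sequence[float], threshold: float = 0.8, min_cds: int = 3
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) gene index ranges of BGC-like runs.

    A run is a maximal stretch of consecutive genes with probability >= threshold;
    runs shorter than `min_cds` genes are discarded.
    """
    regions = []
    start: Optional[int] = None
    for i, p in enumerate(probabilities):
        if p >= threshold:
            if start is None:
                start = i  # a new run begins
        elif start is not None:
            if i - start >= min_cds:
                regions.append((start, i))
            start = None
    # Close a run that extends to the last gene.
    if start is not None and len(probabilities) - start >= min_cds:
        regions.append((start, len(probabilities)))
    return regions
```

With probabilities `[0.1, 0.9, 0.95, 0.85, 0.2, 0.9, 0.9, 0.3]`, the defaults give a single region `(1, 4)`; lowering `min_cds` to 2 also keeps `(5, 7)`, and lowering the threshold to 0.1 yields one longer region `(0, 8)`, matching the trade-off described for --threshold.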

🔎 Results

GECCO will create the following files:

  • {genome}.genes.tsv: The genes file, containing the genes extracted or predicted from the input file, and per-gene BGC probabilities predicted by the CRF.
  • {genome}.features.tsv: The features file, containing the identified domains in the input sequences, in tabular format.
  • {genome}.clusters.tsv: If any were found, a clusters file, containing the coordinates of the predicted clusters along their putative biosynthetic type, in tabular format.
  • {genome}_cluster_{N}.gbk: If any were found, a GenBank file per cluster, containing the cluster sequence annotated with its member proteins and domains.
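For scripted post-processing, the output files can be located by their naming pattern and the TSV tables read with the standard `csv` module. This is a minimal sketch, assuming you know the `{genome}` prefix GECCO used for the output names and that the tables are tab-separated with a header row. No particular column names are assumed, and the helper names are hypothetical.

```python
import csv
import re
from pathlib import Path
from typing import Dict, List


def cluster_genbank_files(output_dir: str, genome: str) -> List[Path]:
    """Return the `{genome}_cluster_{N}.gbk` files, sorted by N numerically."""
    pattern = re.compile(re.escape(genome) + r"_cluster_(\d+)\.gbk$")
    matches = []
    for path in Path(output_dir).iterdir():
        m = pattern.match(path.name)
        if m:
            matches.append((int(m.group(1)), path))
    # Sorting on the integer keeps cluster_2 before cluster_10.
    return [path for _, path in sorted(matches)]


def read_tsv(path: str) -> List[Dict[str, str]]:
    """Read a table such as `{genome}.clusters.tsv` into a list of row dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```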

GECCO can also convert its results into other formats that may be more convenient for downstream use:

  • GFF3 format so they can be loaded into a genomic viewer (gecco convert clusters --format gff).
  • GenBank files with antiSMASH-style features so they can be loaded into BiG-SLiCE for further analysis (gecco convert gbk --format bigslice).
  • FASTA files with the sequences of all the predicted BGCs (gecco convert gbk --format fna) or with the sequences of all their proteins (gecco convert gbk --format faa).

For a more visual way to explore the predictions, you can open the GenBank files in a genome editing program such as UGENE. Alternatively, you can load the results into an antiSMASH report: see the Integrations page of the documentation for a step-by-step guide.

🔖 Reference

GECCO can be cited using the following preprint:

Accurate de novo identification of biosynthetic gene clusters with GECCO. Laura M Carroll, Martin Larralde, Jonas Simon Fleck, Ruby Ponnudurai, Alessio Milanese, Elisa Cappio Barazzone, Georg Zeller. bioRxiv 2021.05.03.442509; doi:10.1101/2021.05.03.442509

💭 Feedback

โš ๏ธ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug, please include as much information as you can about the issue, and try to recreate it in a simple, easily reproducible situation.

๐Ÿ—๏ธ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

โš–๏ธ License

This software is provided under the GNU General Public License v3.0 or later. GECCO is developed by the Zeller Team at the European Molecular Biology Laboratory in Heidelberg.

gecco's People

Contributors

althonos, joschif, lmc297

gecco's Issues

Is it a memory issue?

Hello, please excuse my ignorance :) I am having an issue running gecco on my laptop (MacBook Pro), and I get this error:
zsh: segmentation fault gecco -v run --genome contigs.fa

/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

I am trying to search for BGCs in a contigs file obtained by assembling reads from a soil sample (metagenome). Does gecco work directly on such files, or does it have to be one genome at a time? Am I missing something? Is it a memory issue or a problem with the installation? I ran gecco before on a RefSeq genome from NCBI and it worked fine.
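As a first, non-conclusive check on an assembly like the one described, it can help to know how many contigs the file contains and how large it is before re-running. The hypothetical `fasta_stats` helper below does this with the standard library only; it does not diagnose the segmentation fault itself.

```python
from typing import Iterable, Optional, Tuple


def fasta_stats(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (number of records, total length, longest record) for FASTA text."""
    count = total = longest = 0
    current: Optional[int] = None  # length of the record being read, if any
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A header closes the previous record and opens a new one.
            if current is not None:
                longest = max(longest, current)
            count += 1
            current = 0
        elif line and current is not None:
            current += len(line)
            total += len(line)
    if current is not None:
        longest = max(longest, current)
    return count, total, longest
```

Pass it an open file, for instance inside `with open("contigs.fa") as handle:`, and print the returned tuple.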

Problems installing gecco via conda

Hi

I installed gecco via conda install -c bioconda -y gecco

When I try calling gecco within my activated conda environment, I get this error message:

Traceback (most recent call last):
  File "<path_to>/envs/gecco/lib/python3.6/site-packages/gecco/cli/commands/_main.py", line 22, in <module>
    import importlib.metadata as importlib_metadata
ModuleNotFoundError: No module named 'importlib.metadata'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<path_to>/envs/gecco/bin/gecco", line 10, in <module>
    sys.exit(main())
  File "<path_to>/envs/gecco/lib/python3.6/site-packages/gecco/cli/__init__.py", line 9, in main
    from .commands._main import Main
  File "<path_to>/envs/gecco/lib/python3.6/site-packages/gecco/cli/commands/_main.py", line 24, in <module>
    import importlib_metadata
ModuleNotFoundError: No module named 'importlib_metadata'

Thanks in advance
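The traceback shows GECCO first trying the standard-library `importlib.metadata`, which only exists from Python 3.8 onwards, and then falling back to the `importlib_metadata` backport, which was not installed. The environment here runs Python 3.6, which is also older than the Python 3.7 minimum stated in the installation section. The sketch below reproduces the same fallback pattern with a clearer error message; `load_importlib_metadata` is a hypothetical name, not GECCO's code.

```python
import sys


def load_importlib_metadata():
    """Return a module providing `version()`, `PackageNotFoundError`, etc."""
    try:
        # Standard library from Python 3.8 onwards.
        import importlib.metadata as importlib_metadata
    except ImportError:
        try:
            # PyPI backport ("importlib-metadata"), only needed before 3.8.
            import importlib_metadata
        except ImportError as err:
            raise RuntimeError(
                f"Python {sys.version_info.major}.{sys.version_info.minor} has no "
                "importlib.metadata; install the 'importlib-metadata' backport "
                "or use Python >= 3.8"
            ) from err
    return importlib_metadata
```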

Do not run Pyrodigal if input is GenBank

Hi,
I ran GECCO v0.9.2 with default settings (gecco run --antismash-sideload with GenBank input) and it went smoothly. However, there was an issue when I tried to incorporate the annotations into antiSMASH:

ERROR    02/05 14:39:56   sideloaded area contains no complete CDS features in cfod1: Subregion(GECCO, 209296-212824, Unknown)

I thought it could be an issue with how the data were parsed by the sideloader, but the problem was that the genome, already gene-called in GenBank format, had been re-processed, so the coordinates of the genes and CDS no longer matched (the input was a fungal genome). I don't know if this is intended behaviour, but it breaks compatibility with any annotations the user may already have and with genomes sourced from databases. It also rules out running predictions on eukaryotic genomes, and redoing the gene calling is probably wasteful when it has already been provided.

Regards
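The README's --cds-feature option (see Running GECCO) is intended for this case: it extracts genes from existing annotations instead of predicting them with Pyrodigal. Below is a minimal sketch of a wrapper that passes `--cds-feature CDS` only for GenBank input, assuming that a file whose first line starts with `LOCUS` is GenBank. Whether v0.9.2 already offered this option is not stated here, and the helper names are hypothetical.

```python
from typing import List


def looks_like_genbank(path: str) -> bool:
    """Heuristic: GenBank flat files start with a LOCUS line, FASTA files with '>'."""
    with open(path) as handle:
        return handle.readline().startswith("LOCUS")


def gecco_args_for(path: str, output_dir: str) -> List[str]:
    cmd = ["gecco", "run", "--genome", path, "-o", output_dir]
    if looks_like_genbank(path):
        # Reuse the CDS features already in the record instead of
        # re-predicting genes with Pyrodigal.
        cmd += ["--cds-feature", "CDS"]
    return cmd
```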

Issue with installation in conda env

Create env and install gecco

$ conda create -n gecco
$ source activate gecco
$ pip install gecco-tool --user

Run gecco

$ gecco
Usage:
    gecco [-v | -vv | -q | -qq] <cmd> [<args>...]
    gecco --version
    gecco --help [<cmd>]

Run gecco --version and --help

$ gecco --version
gecco 0.6.2
$ gecco --help
x An unexpected error occurred. Consider opening a new issue on the bug tracker (https://github.com/zellerlab/GECCO/issues/new) if it persists, including the traceback below:
Traceback (most recent call last):

/home/fz274/.local/lib/python3.7/site-packages/gecco/cli/commands/_main.py:141 in execute

  138                 subcmd.quiet = self.quiet
  139                 subcmd.progress.disable = self.quiet > 0
  140             # run the subcommand
❱ 141             return subcmd.execute(ctx)
  142         except CommandExit as sysexit:
  143             return sysexit.code
  144         except KeyboardInterrupt:

/home/fz274/.local/lib/python3.7/site-packages/gecco/cli/commands/help.py:59 in execute

  56                 self.error("Unknown subcommand", repr(self.args["<cmd>"]))
  57                 return 1
  58             # Render the help message
❱ 59             doc = Main.doc() if subcmd_cls is None else subcmd_cls.doc()
  60             text = rich.text.Text(textwrap.dedent(doc).lstrip())
  61             console = rich.console.Console(file=self._stream, soft_wrap=True)
  62             console.print(text)

/home/fz274/.local/lib/python3.7/site-packages/gecco/cli/commands/_main.py:58 in doc

  55             commands = (
  56                 "    {:12}{}".format(name, typing.cast(Command, cmd).summary)
  57                 for name, cmd in sorted(
❱ 58                     cls._get_subcommands().items(), key=operator.itemgetter(0)
  59                 )
  60             )
  61         return (

/home/fz274/.local/lib/python3.7/site-packages/gecco/cli/commands/_main.py:36 in _get_subcommand

  33         commands = {}
  34         for cmd in pkg_resources.iter_entry_points(__parent__):
  35             try:
❱ 36                 commands[cmd.name] = cmd.load()
  37             except pkg_resources.DistributionNotFound as err:
  38                 pass
  39         return commands

/usr/local/software/anaconda/3.2019-10/lib/python3.7/site-packages/pkg_resources/__init__.py:244

  2440             )
  2441         if require:
  2442             self.require(*args, **kwargs)
❱ 2443         return self.resolve()
  2444
  2445     def resolve(self):
  2446         """

/usr/local/software/anaconda/3.2019-10/lib/python3.7/site-packages/pkg_resources/__init__.py:244

  2446         """
  2447         Resolve the entry point from its module and attrs.
  2448         """
❱ 2449         module = __import__(self.module_name, fromlist=['__name__'], level=0)
  2450         try:
  2451             return functools.reduce(getattr, self.attrs, module)
  2452         except AttributeError as exc:

/home/fz274/.local/lib/python3.7/site-packages/gecco/cli/commands/annotate.py:26 in <module>

  23
  24 from ._base import Command, CommandExit, InvalidArgument
  25 from .._utils import guess_sequences_format, in_context, patch_showwarnings
❱ 26 from ...crf import ClusterCRF
  27 from ...hmmer import PyHMMER, HMM, embedded_hmms
  28 from ...model import FeatureTable, ClusterTable, ProductType
  29 from ...orf import PyrodigalFinder

/home/fz274/.local/lib/python3.7/site-packages/gecco/crf/__init__.py:36 in <module>

  33 import tqdm
  34 import pkg_resources
  35 import sklearn_crfsuite
❱ 36 import sklearn.model_selection
  37 import sklearn.preprocessing
  38
  39 from ..model import Gene
ModuleNotFoundError: No module named 'sklearn.model_selection'

Check scikit-learn version

$ pip list | grep scikit-learn
scikit-learn                       0.24.2
All packages in env
$ pip list
Package                            Version    
---------------------------------- -----------
alabaster                          0.7.12     
amply                              0.1.4      
anaconda-client                    1.7.2      
anaconda-navigator                 1.9.7      
anaconda-project                   0.8.3      
appdirs                            1.4.4      
arrow                              0.17.0     
asn1crypto                         1.0.1      
astroid                            2.3.1      
astropy                            3.2.2      
atomicwrites                       1.3.0      
attrs                              19.2.0     
Autologging                        1.3.2      
Babel                              2.7.0      
backcall                           0.1.0      
backports.functools-lru-cache      1.5        
backports.os                       0.1.1      
backports.shutil-get-terminal-size 1.0.0      
backports.tempfile                 1.0        
backports.weakref                  1.0.post1  
bcrypt                             3.2.0      
beautifulsoup4                     4.8.0      
binaryornot                        0.4.4      
biopython                          1.78       
bitarray                           1.0.1      
bkcharts                           0.2        
bleach                             3.1.0      
bokeh                              1.3.4      
boto                               2.49.0     
boto3                              1.16.57    
botocore                           1.19.57    
Bottleneck                         1.2.1      
cached-property                    1.5.2      
carve                              0.2.1      
carveme                            1.4.1      
certifi                            2019.9.11  
cffi                               1.12.3     
chardet                            3.0.4      
chemicalchecker                    1.0.1      
Click                              7.0        
click-configfile                   0.2.3      
click-default-group                1.2.2      
click-log                          0.3.2      
cloudpickle                        1.2.2      
clyent                             1.2.2      
cobra                              0.20.0     
colorama                           0.4.1      
commonmark                         0.9.1      
conda                              4.7.12     
conda-build                        3.18.9     
conda-package-handling             1.6.0      
conda-verify                       3.4.2      
ConfigArgParse                     1.3        
configobj                          5.0.6      
configparser                       5.0.1      
contextlib2                        0.6.0      
cookiecutter                       1.7.2      
cryptography                       2.7        
csvsort                            1.6.1      
cycler                             0.10.0     
Cython                             0.29.13    
cytoolz                            0.9.0.1    
dask                               2.5.2      
datapackage                        1.15.1     
datrie                             0.8.2      
decorator                          4.4.0      
defusedxml                         0.6.0      
depinfo                            1.6.0      
diamond                            4.0.515    
diskcache                          5.1.0      
distributed                        2.5.2      
docopt                             0.6.2      
docplex                            2.19.202   
docutils                           0.15.2     
entrypoints                        0.3        
et-xmlfile                         1.0.1      
fastcache                          1.1.0      
filelock                           3.0.12     
Flask                              1.1.1      
fsspec                             0.5.2      
future                             0.17.1     
gecco-tool                         0.6.2      
gevent                             1.4.0      
gitdb                              4.0.5      
GitPython                          3.1.12     
glob2                              0.7        
gmpy2                              2.0.8      
goodtables                         2.5.2      
greenlet                           0.4.15     
h11                                0.12.0     
h5py                               2.9.0      
HeapDict                           1.0.1      
html5lib                           1.0.1      
httpcore                           0.12.2     
httpx                              0.16.1     
idna                               2.8        
ijson                              3.1.3      
imageio                            2.6.0      
imagesize                          1.1.0      
importlib-metadata                 0.23       
importlib-resources                5.1.0      
ipykernel                          5.1.2      
ipython                            7.8.0      
ipython-genutils                   0.2.0      
ipywidgets                         7.5.1      
isodate                            0.6.0      
isort                              4.3.21     
itsdangerous                       1.1.0      
jdcal                              1.4.1      
jedi                               0.15.1     
jeepney                            0.4.1      
Jinja2                             2.10.3     
jinja2-time                        0.2.0      
jmespath                           0.10.0     
joblib                             0.13.2     
json5                              0.8.5      
jsonlines                          2.0.0      
jsonpointer                        2.0        
jsonschema                         3.0.2      
jupyter                            1.0.0      
jupyter-client                     5.3.3      
jupyter-console                    6.0.0      
jupyter-core                       4.5.0      
jupyterlab                         1.1.4      
jupyterlab-server                  1.0.6      
keyring                            18.0.0     
kiwisolver                         1.1.0      
lazy-object-proxy                  1.4.2      
libarchive-c                       2.8        
lief                               0.9.0      
linear-tsv                         1.1.0      
llvmlite                           0.29.0     
locket                             0.2.0      
lxml                               4.4.1      
MarkupSafe                         1.1.1      
matplotlib                         2.2.5      
mccabe                             0.6.1      
memote                             0.12.0     
mistune                            0.8.4      
mkl-fft                            1.0.14     
mkl-random                         1.1.0      
mkl-service                        2.3.0      
mock                               3.0.5      
more-itertools                     7.2.0      
mpmath                             1.1.0      
msgpack                            0.6.1      
multipledispatch                   0.6.0      
navigator-updater                  0.2.1      
nbconvert                          5.6.0      
nbformat                           4.4.0      
networkx                           2.3        
nltk                               3.4.5      
nose                               1.3.7      
notebook                           6.0.1      
numba                              0.45.1     
numexpr                            2.7.0      
numpy                              1.17.2     
numpydoc                           0.9.1      
olefile                            0.46       
openpyxl                           3.0.0      
optlang                            1.4.4      
packaging                          19.2       
pandas                             0.25.1     
pandocfilters                      1.4.2      
paramiko                           2.7.2      
parso                              0.5.1      
partd                              1.0.0      
path.py                            12.0.1     
pathlib2                           2.3.5      
patsy                              0.5.1      
pep8                               1.7.1      
pexpect                            4.7.0      
pickleshare                        0.7.5      
Pillow                             6.2.0      
pip                                19.2.3     
pkginfo                            1.5.0.1    
pluggy                             0.13.0     
ply                                3.11       
poyo                               0.5.0      
prometheus-client                  0.7.1      
prompt-toolkit                     2.0.10     
psutil                             5.8.0      
psycopg2-binary                    2.8.6      
ptyprocess                         0.6.0      
PuLP                               2.4        
py                                 1.8.0      
pycodestyle                        2.5.0      
pycosat                            0.6.3      
pycparser                          2.19       
pycrypto                           2.6.1      
pycurl                             7.43.0.3   
pydantic                           1.7.3      
pyflakes                           2.1.1      
Pygments                           2.7.4      
pyhmmer                            0.3.0      
pylint                             2.4.2      
pylru                              1.2.0      
PyNaCl                             1.4.0      
pyodbc                             4.0.27     
pyOpenSSL                          19.0.0     
pyparsing                          2.4.2      
pyperclip                          1.6.0      
pyrodigal                          0.4.7      
pyrsistent                         0.15.4     
PySocks                            1.7.1      
pytest                             5.2.1      
pytest-arraydiff                   0.3        
pytest-astropy                     0.5.0      
pytest-doctestplus                 0.4.0      
pytest-openfiles                   0.4.0      
pytest-remotedata                  0.3.2      
python-crfsuite                    0.9.7      
python-dateutil                    2.8.0      
python-libsbml                     5.19.0     
python-libsbml-experimental        5.18.3     
python-slugify                     4.0.1      
pytz                               2019.3     
PyWavelets                         1.0.3      
PyYAML                             5.1.2      
pyzmq                              18.1.0     
QtAwesome                          0.6.0      
qtconsole                          4.5.5      
QtPy                               1.9.0      
ratelimiter                        1.2.0.post0
reframed                           1.2.0      
requests                           2.22.0     
rfc3986                            1.4.0      
rich                               10.1.0     
rope                               0.14.0     
ruamel-yaml                        0.15.46    
ruamel.yaml                        0.16.12    
ruamel.yaml.clib                   0.2.2      
s3transfer                         0.3.4      
scikit-image                       0.15.0     
scikit-learn                       0.24.2     
scipy                              1.6.3      
seaborn                            0.9.0      
SecretStorage                      3.1.1      
Send2Trash                         1.5.0      
setuptools                         41.4.0     
simpleeval                         0.9.10     
simplegeneric                      0.8.1      
singledispatch                     3.4.0.3    
six                                1.12.0     
sklearn                            0.0        
sklearn-crfsuite                   0.3.6      
smart-open                         4.2.0      
smetana                            1.2.0      
smmap                              3.0.4      
snakemake                          6.0.2      
sniffio                            1.2.0      
snowballstemmer                    2.0.0      
sortedcollections                  1.1.2      
sortedcontainers                   2.1.0      
soupsieve                          1.9.3      
Sphinx                             2.2.0      
sphinxcontrib-applehelp            1.0.1      
sphinxcontrib-devhelp              1.0.1      
sphinxcontrib-htmlhelp             1.0.2      
sphinxcontrib-jsmath               1.0.1      
sphinxcontrib-qthelp               1.0.2      
sphinxcontrib-serializinghtml      1.1.3      
sphinxcontrib-websupport           1.1.2      
spyder                             3.3.6      
spyder-kernels                     0.5.2      
SQLAlchemy                         1.3.9      
statistics                         1.0.3.5    
statsmodels                        0.10.1     
swiglpk                            4.65.1     
sympy                              1.4        
tables                             3.5.2      
tableschema                        1.20.0     
tabulate                           0.8.9      
tabulator                          1.53.1     
tblib                              1.4.0      
terminado                          0.8.2      
testpath                           0.4.2      
text-unidecode                     1.3        
threadpoolctl                      2.1.0      
toolz                              0.9.0      
toposort                           1.6        
tornado                            6.0.3      
tqdm                               4.36.1     
traitlets                          4.3.3      
travis-encrypt                     1.1.2      
typing-extensions                  3.7.4.3    
unicodecsv                         0.14.1     
urllib3                            1.24.2     
wcwidth                            0.1.7      
webencodings                       0.5.1      
Werkzeug                           0.16.0     
wheel                              0.33.6     
widgetsnbextension                 3.5.1      
wrapt                              1.11.2     
wurlitzer                          1.0.3      
xlrd                               1.2.0      
XlsxWriter                         1.2.1      
xlwt                               1.3.0      
zict                               1.0.0      
zipp                               0.6.0  

[Question] Documentation - Gecco use cases for 'annotation', downstream 'antismash'

Hi @althonos

I have some questions about the documentation. I know you mention some documentation here, and that you also have a disclaimer.

Before I ask my questions: I think there is a bug, or something wrong in the help text, for -vvv (verbose debugging). I do not think -vvv is working. Does it stand for "very very verbose"?

  • When I invoke it, the program exits:
    gecco -vvv run --genome GENOME.fasta -o gecco_GENOME >& verbose_GENOME_gecco.txt &
  • However, the same command works if I change -vvv to -vv.

Here is the relevant gecco --help text; it states that -vvv shows debug information:

gecco --help

Parameters:
    -h, --help                 show the message for ``gecco`` or
                               for a given subcommand.
    -q, --quiet                silence any output other than errors
                               (-qq silences everything).
    -v, --verbose              increase verbosity (-v is minimal,
                               -vv is verbose, and -vvv shows
                               debug information).
    -V, --version              show the program version and exit.
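For reference, the documented semantics (-v minimal, -vv verbose, -vvv debug, -q errors only, -qq silent) amount to counting repeated flags and mapping the count to a log level. The sketch below uses the standard-library `argparse` option `action="count"`; it is a hypothetical illustration of that mapping, not GECCO's actual CLI code, and the `VERBOSE` level and the `build_parser`/`log_level` names are assumptions made up for the example.

```python
import argparse
import logging

# Hypothetical extra level between INFO (20) and DEBUG (10), used for "-vv".
VERBOSE = 15
logging.addLevelName(VERBOSE, "VERBOSE")

# Index = number of -v flags, capped at 3: none, -v, -vv, -vvv.
LEVELS = [logging.WARNING, logging.INFO, VERBOSE, logging.DEBUG]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gecco-sketch")
    # action="count" turns -v, -vv, -vvv into the integers 1, 2, 3
    parser.add_argument("-v", "--verbose", action="count", default=0)
    parser.add_argument("-q", "--quiet", action="count", default=0)
    return parser


def log_level(verbose: int, quiet: int) -> int:
    """Map the flag counts to a logging level (hypothetical mapping)."""
    if quiet >= 2:
        return logging.CRITICAL + 1  # -qq: silence everything
    if quiet == 1:
        return logging.ERROR         # -q: only errors
    return LEVELS[min(verbose, len(LEVELS) - 1)]


if __name__ == "__main__":
    args = build_parser().parse_args(["-vvv"])
    print(args.verbose, logging.getLevelName(log_level(args.verbose, args.quiet)))
```

With this design, extra flags beyond -vvv are harmless: the count is clamped to the most verbose level instead of raising an error.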

I have some questions/feature requests:

  1. When do you use the gecco annotate command, and what is its purpose?
  2. In what scenarios would one use gecco with antismash for downstream post-processing? I could not understand the use case from the preprint.
  3. I assume you ran a downstream BiG-SLiCE analysis on your datasets. As a feature request, it would be nice to have gecco outputs (or scripts) in a format compatible with BiG-SLiCE.
  • I also note that you mention here that we should write our own scripts to make the output compatible with BiG-SLiCE.
Parameters - Cluster Detection:
    -c, --cds <N>                 the minimum number of coding sequences a
                                  valid cluster must contain. [default: 3]
    -m <m>, --threshold <m>       the probability threshold for cluster
                                  detection. Default depends on the
                                  post-processing method (0.4 for gecco,
                                  0.6 for antismash).
    --postproc <method>           the method to use for cluster validation
                                  (antismash or gecco). [default: gecco]
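The two detection options above can be read roughly as follows: genes whose predicted BGC probability reaches the threshold are grouped into runs of consecutive genes, and runs shorter than the --cds minimum are discarded. The sketch below is a simplified, hypothetical version of that filtering in plain Python (`extract_regions` is a made-up name, and using `>=` for the comparison is an assumption); GECCO's actual `gecco` and `antismash` post-processing methods may apply further validation that is not shown here.

```python
from typing import List, Optional, Sequence, Tuple


def extract_regions(
    probabilities: Sequence[float],
    threshold: float = 0.4,
    min_cds: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) gene-index ranges, end exclusive, of candidate clusters.

    A gene counts as part of a cluster when its probability is >= threshold;
    runs of such genes shorter than min_cds are dropped. Simplified sketch only.
    """
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(probabilities):
        if p >= threshold:
            if start is None:
                start = i  # open a new run of high-probability genes
        elif start is not None:
            if i - start >= min_cds:
                regions.append((start, i))
            start = None  # close the current run
    # a run may extend up to the last gene of the sequence
    if start is not None and len(probabilities) - start >= min_cds:
        regions.append((start, len(probabilities)))
    return regions


probs = [0.1, 0.5, 0.7, 0.9, 0.2, 0.6, 0.8, 0.1]
print(extract_regions(probs, threshold=0.4, min_cds=3))  # [(1, 4)]
```

Lowering min_cds to 2 on the same toy probabilities would also keep the short run (5, 7), which shows why a smaller --cds value or a lower threshold yields more predictions.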

GECCO workflow for immunoglobulins

Hello GECCO community.

I intend to use GECCO to identify gene conversion events in rabbit immunoglobulins (the B-cell receptors, i.e. antibody heavy and light chains).

Could somebody point me to any documentation that I could use as a guide?

I am not the first one to appreciate the value of GECCO for rabbit immunoglobulin studies. See Immunoglobulin gene conversion identification and analysis. However, no bioinformatics details are provided in that article.

Thank you.

Ivan

Visualization issue

Hi, I am having trouble visualizing the output from GECCO on antiSMASH. I followed the instructions in the GECCO documentation and uploaded the JSON file generated by GECCO into the "extra annotations" tab on antiSMASH but I do not see any of the information from GECCO after running antiSMASH. I get the same results as if I never uploaded the JSON file. Any idea how to solve this issue?

manual on the tool output

Hello,

Is there a manual that explains the output files? I am interested in seeing which BGCs are shared by a range of genomes. The command to run the tool seems very simple, but I am having trouble interpreting the output.

Many thanks!
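As a coarse first pass at "which BGCs are shared", one can compare the predicted cluster types across the per-genome `*.clusters.tsv` tables. The sketch below assumes a tab-separated table with a header row containing a `type` column whose multiple types are joined with `;` (the traceback in a later report shows GECCO building such a string with `";".join(...)`); the column name, the helper names, and the toy tables are assumptions made up for illustration, so check them against a real output file. Comparing types only says that, for example, both genomes have a polyketide cluster, not that the clusters are homologous.

```python
import csv
import io
from typing import Dict, Set, TextIO


def cluster_types(handle: TextIO, type_column: str = "type") -> Set[str]:
    """Collect the set of BGC types listed in one clusters table (assumed layout)."""
    types: Set[str] = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        # multiple types for one cluster are assumed to be joined with ';'
        types.update(t for t in row[type_column].split(";") if t)
    return types


def shared_types(per_genome: Dict[str, Set[str]]) -> Set[str]:
    """Types predicted in every genome (intersection across genomes)."""
    sets = list(per_genome.values())
    return set.intersection(*sets) if sets else set()


# Made-up toy tables standing in for two real *.clusters.tsv files.
genome_a = io.StringIO(
    "cluster_id\ttype\n"
    "a_cluster_1\tNRP;Polyketide\n"
    "a_cluster_2\tRiPP\n"
)
genome_b = io.StringIO(
    "cluster_id\ttype\n"
    "b_cluster_1\tPolyketide\n"
)
tables = {"genome_a": cluster_types(genome_a), "genome_b": cluster_types(genome_b)}
print(sorted(shared_types(tables)))  # ['Polyketide']
```

For cluster-level similarity rather than type-level overlap, a dedicated clustering tool such as BiG-SLiCE (mentioned in another thread here) is the more appropriate next step.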

CDS mode reports wrong coordinate end

Hi there,

I've been running GECCO on a few annotated genomes with --cds-feature CDS and sideloading the results into antiSMASH without problems. However, one genome has multiple CDS features for several genes, so I had to add --locus-tag protein_id to stop GECCO from crashing with an error like this:

$ gecco run -g genomic.gbk -o gecco_test --antismash-sideload --cds-feature CDS  -j $NCPUS
x An unexpected error occurred. Consider opening a new issue on the bug tracker ( https://github.com/zellerlab/GECCO/issues/new ) if it persists, including the traceback below:
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/z3382651/miniconda3/envs/gecco/lib/python3.9/site-packages/gecco/cli/commands/_main.py:153 │
│                                                                                                  │
│   150 │   │   │   │   subcmd.quiet = self.quiet                                                  │
│   151 │   │   │   │   subcmd.progress.disable = self.quiet > 0                                   │
│   152 │   │   │   # run the subcommand                                                           │
│ ❱ 153 │   │   │   return subcmd.execute(ctx)                                                     │
│   154 │   │   except CommandExit as sysexit:                                                     │
│   155 │   │   │   return sysexit.code                                                            │
│   156 │   │   except KeyboardInterrupt:                                                          │
│                                                                                                  │
│ /home/z3382651/miniconda3/envs/gecco/lib/python3.9/site-packages/gecco/cli/commands/run.py:332 i │
│                                                                                                  │
│   329 │   │   │   self._make_output_directory(extensions)                                        │
│   330 │   │   │   # load sequences and extract genes                                             │
│   331 │   │   │   sequences = self._load_sequences()                                             │
│ ❱ 332 │   │   │   genes = self._extract_genes(sequences)                                         │
│   333 │   │   │   if genes:                                                                      │
│   334 │   │   │   │   self.success("Found", "a total of", len(genes), "genes", level=1)          │
│   335 │   │   │   else:                                                                          │
│                                                                                                  │
│ /home/z3382651/miniconda3/envs/gecco/lib/python3.9/site-packages/gecco/cli/commands/annotate.py: │
│                                                                                                  │
│   233 │   │   │   self.success("Found", found, "genes in record", repr(record.id), level=2)      │
│   234 │   │   │   self.progress.update(task, advance=1)                                          │
│   235 │   │                                                                                      │
│ ❱ 236 │   │   return list(orf_finder.find_genes(sequences, progress=callback))                   │
│   237 │                                                                                          │
│   238 │   def _annotate_domains(self, genes: List["Gene"], whitelist: Optional[Container[str]] = │
│   239 │   │   from ...hmmer import PyHMMER, embedded_hmms                                        │
│                                                                                                  │
│ /home/z3382651/miniconda3/envs/gecco/lib/python3.9/site-packages/gecco/orf.py:187 in find_genes  │
│                                                                                                  │
│   184 │   │   │   │   │   protein = Protein(id=f"{record.id}_{i+1}", seq=prot_seq)               │
│   185 │   │   │   │   # check IDs are unique                                                     │
│   186 │   │   │   │   if protein.id in ids:                                                      │
│ ❱ 187 │   │   │   │   │   raise ValueError(f"Duplicate gene identifier found in {record.id!r}: { │
│   188 │   │   │   │   ids.add(protein.id)                                                        │
│   189 │   │   │   │   # fix coordinates (using 1-based, leftmost start in `Gene`, no STOP codon) │
│   190 │   │   │   │   if feature.location.strand == -1:                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Duplicate gene identifier found in 'QVQW01000001.1': 'DL546_001492'
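The error is consistent with the GenBank excerpt further down in this report: the gene with locus_tag DL546_002293 carries three alternative CDS features (RKU41647.1, RKU41651.1 and RKU41650.1), so an identifier taken from the locus tag is no longer unique per CDS, while each protein_id still is. The sketch below is a hypothetical helper (the name `duplicated_ids` is made up) that lists identifiers shared by several CDS features for a given qualifier, using the qualifiers copied from that excerpt.

```python
from collections import Counter
from typing import Dict, List


def duplicated_ids(cds_features: List[Dict[str, str]], qualifier: str) -> List[str]:
    """Return identifiers that label more than one CDS when using `qualifier`."""
    counts = Counter(feature[qualifier] for feature in cds_features)
    return sorted(name for name, n in counts.items() if n > 1)


# CDS qualifiers copied from the GenBank excerpt in this report.
cds = [
    {"locus_tag": "DL546_002159", "protein_id": "RKU41646.1"},
    {"locus_tag": "DL546_002293", "protein_id": "RKU41647.1"},
    {"locus_tag": "DL546_002293", "protein_id": "RKU41651.1"},
    {"locus_tag": "DL546_002293", "protein_id": "RKU41650.1"},
]
print(duplicated_ids(cds, "locus_tag"))   # ['DL546_002293']
print(duplicated_ids(cds, "protein_id"))  # []
```

On a real file, the same dictionaries could be built with Biopython by iterating over `SeqIO.parse(path, "genbank")` and reading the first value of `feature.qualifiers["locus_tag"]` and `feature.qualifiers["protein_id"]` for each feature of type `"CDS"`.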

With --locus-tag protein_id, GECCO runs without problems. However, antiSMASH then crashes while sideloading the results:

ERROR    10/06 06:01:10   sideloaded area contains no complete CDS features in QVQW01000070.1: Subregion(GECCO, 45922-53301, RiPP)

I looked at the GECCO output, and those are indeed the coordinates reported in the *.sideload.json and the *.features.tsv. I then inspected the source GenBank file and found that the reported coordinates end 3 nt before the actual CDS coordinates, i.e. 53301 instead of 53304.

These are the features reported in the range antiSMASH complains about:

QVQW01000070.1  RKU41646.1      45922   48397   -       PF00172 Pfam    2.1697438388095284e-07  7.84433781203734e-11    17      50      0.9998662132334699
QVQW01000070.1  RKU41647.1      49277   53301   +       PF00664 Pfam    8.584690299391551e-43   3.1036479751957886e-46  12      285     0.9990239121540062
QVQW01000070.1  RKU41647.1      49277   53301   +       PF00005 Pfam    1.019473891851838e-32   3.685733520794786e-36   356     514     0.9990239121540062
QVQW01000070.1  RKU41647.1      49277   53301   +       PF00664 Pfam    4.819150001206918e-48   1.7422812730321468e-51  653     922     0.9990239121540062
QVQW01000070.1  RKU41647.1      49277   53301   +       PF00005 Pfam    1.9178124750314499e-31  6.933523047836044e-35   994     1151    0.9990239121540062
QVQW01000070.1  RKU41651.1      49557   53301   +       PF00664 Pfam    9.83159809886755e-32    3.5544461673418474e-35  8       210     0.9980173661507098
QVQW01000070.1  RKU41651.1      49557   53301   +       PF00005 Pfam    9.31573975313726e-33    3.3679464038818724e-36  281     439     0.9980173661507098
QVQW01000070.1  RKU41651.1      49557   53301   +       PF00664 Pfam    4.3291279121459376e-48  1.565122166357895e-51   578     847     0.9980173661507098
QVQW01000070.1  RKU41651.1      49557   53301   +       PF00005 Pfam    1.7526661696993695e-31  6.336464821762001e-35   919     1076    0.9980173661507098
QVQW01000070.1  RKU41650.1      49930   53301   +       PF00664 Pfam    2.573268801701548e-08   9.303213310562356e-12   6       107     0.9965288403076623
QVQW01000070.1  RKU41650.1      49930   53301   +       PF00005 Pfam    8.123409368073151e-33   2.9368797426150224e-36  178     336     0.9965288403076623
QVQW01000070.1  RKU41650.1      49930   53301   +       PF00664 Pfam    3.673538032748807e-48   1.3281048563806245e-51  475     744     0.9965288403076623
QVQW01000070.1  RKU41650.1      49930   53301   +       PF00005 Pfam    1.5286475222431794e-31  5.526563710206723e-35   816     973     0.9965288403076623
QVQW01000070.1  RKU41648.1      50343   53301   +       PF00005 Pfam    6.760321537712605e-33   2.4440786470399873e-36  58      216     0.9943062519881032
QVQW01000070.1  RKU41648.1      50343   53301   +       PF00664 Pfam    2.9386520726827317e-48  1.0624194044406115e-51  355     624     0.9943062519881032
QVQW01000070.1  RKU41648.1      50343   53301   +       PF00005 Pfam    1.272544564498185e-31   4.600667261381724e-35   696     853     0.9943062519881032
QVQW01000070.1  RKU41649.1      50880   53301   +       PF00005 Pfam    2.1871313608716245e-11  7.90719942469857e-15    6       55      0.9909732046876351
QVQW01000070.1  RKU41649.1      50880   53301   +       PF00664 Pfam    2.017067394657362e-48   7.2923622366499e-52     194     463     0.9909732046876351
QVQW01000070.1  RKU41649.1      50880   53301   +       PF00005 Pfam    9.401288594063467e-32   3.398875124390263e-35   535     692     0.9909732046876351

And these are the corresponding annotations from the source gbk:

     gene            complement(45660..48678)
                     /locus_tag="DL546_002159"
     mRNA            complement(join(45660..46357,46417..47485,47549..48027,
                     48079..48203,48265..48288,48344..48678))
                     /locus_tag="DL546_002159"
                     /product="hypothetical protein"
     CDS             complement(join(45919..46357,46417..47485,47549..48027,
                     48079..48203,48265..48288,48344..48397))
                     /locus_tag="DL546_002159"
                     /codon_start=1
                     /product="hypothetical protein"
                     /protein_id="RKU41646.1"
                     /translation="MEGVGETSDEPHGSLRRACDQCRFRKIRCDKVTGTPCYHCRAAK
                     RECTSTGGQKRKDGRQRVSVSHSYERKIELVGQRLADIEKTLSNLTHLTISLGSSGTL
                     GRMSMTATAPSHASTGMESSTAYVTPSVDEETDETFEGNSSMTAHTVFASDFMEQAVT
                     SPLFNEKLSHDMKKALASLRQMVHLQSRRQAVHESIFVHQKPIFEGGLNQLPLPPTDI
                     VLRVLRETKNAPSETFMRYCVFITIRGFVDYCQRVFFATEEYSPTAFVIVNAGLYFLF
                     QERSLLADGSVNKAYRDLQNLCRDNLETALANLPLLLPPKRETVEALLLGVSLLVTLS
                     CCVFLAQCAQVLYSIDISKFTIAWQLNSAAASMCQALGWHRIQLTETEDTDDTRLASF
                     WFCYMQDKSLALRFGRTSVIRDLEITAPRCFGNMTDLSESWKHITALWIQTGSILGDM
                     YDHLYSPEALARPPEGRIETARRLADRKKQLFHGLEETSAALKQDADLVTAPSVGDAT
                     DSATRSKMVDMTVKSAEVSHLACLTLIYRALPPSPSFPSSFNVECLEAARLAFKRHAE
                     CMELSSDNFFARVGYLRWTLLYGTFSPLIVLFCHVIETSNQDDLQQLANFTASLEPLV
                     SLSLAMEKFYRLSRTLCQVATLYVEVKTQGQDQHDQDTSSISNDFDVYLNQLGFISSG
                     QHNSSGDMAIPGSNAAPDLETWFSGNSYIMGLMEEDLLDFDTHLNPP"
     gene            49277..53608
                     /locus_tag="DL546_002293"
     mRNA            join(49277..50231,50285..50711,50766..50921,50978..51542,
                     51596..53608)
                     /locus_tag="DL546_002293"
                     /product="hypothetical protein"
     mRNA            join(49277..49565,49630..50231,50285..50711,50766..50921,
                     50978..51542,51596..53608)
                     /locus_tag="DL546_002293"
                     /product="hypothetical protein"
     mRNA            join(49277..49455,49511..49565,49630..50711,50766..50921,
                     50978..51542,51596..53608)
                     /locus_tag="DL546_002293"
                     /product="hypothetical protein"
     mRNA            join(49277..49455,49511..49565,49630..50231,50285..50711,
                     50766..50921,50978..51542,51596..53608)
                     /locus_tag="DL546_002293"
                     /product="hypothetical protein"
     CDS             join(49277..49455,49511..49565,49630..50231,50285..50711,
                     50766..50921,50978..51542,51596..53304)
                     /locus_tag="DL546_002293"
                     /codon_start=1
                     /product="hypothetical protein"
                     /protein_id="RKU41647.1"
                     /translation="MYNDTIGWITTVLGIICMVAAGVLLPVMNFVFGKFVTVFNDFII
                     GKKSPEDFRSSINHYTLYFVYLFVAKFVLSYMWTTIVSINAIRLTRSLRIDFLKQTLR
                     QEIPYFDSAEAGSIAGNINRGGNLVNQGISERFGLTVQATTTFFSAFIVAFAVQWKLT
                     LICLSIVAANLIVVTVCVMIDSGIENKLNATWGEADKLAEEVFASIRNVHAFWAYGKL
                     SAKFEGLMQSTRHLAQRKPPIYAILFSVQFFCIYAGYGLAFWQGIRMYHRGEIDQPGG
                     VVTVILAVLLAAQGLTQIAPQIMVVSKAVGAADGLFKTIDRESKIDSLSTRGTTPQDC
                     HGEILLDKVQFAYPSRPSVQVLNGLSLVIPANKTTAIVGASGSGKSTIVSLLERWFEP
                     TSGTITFDGQPIQTLHISWLRINMRLVQQEPVLFSGTVYQNVVYGLSGTPQAELADDI
                     KLRLVEQACMAAFAHDFIEKLPDGYHTEIGERGRMLSGGQKQRLAIARAIISNPRVLL
                     LDEATSALDANAEHVVQQALNHVAAGRTTVVIAHRLSTVRGADNIVVMAKGTIVEQGT
                     HEELMRHGGAYFRLVRAQQLGRDDMGEDAPLHDDAEQPTTAPKTLSANALETNPEQAA
                     VQADIHYNLMKCLAIIIKEQRNLWFPCAIVGLAAVIGGGMYPALAVLFSRVLDAFALT
                     GDAMLKRGDFYALMFFVMALGNLVAYAAMGWMSSLVSQEIARSYRLDIFNNLIRQEMT
                     FFDDADNGTGALVSRLSTEPTAIQELLSSNIALLLTISVNLTSSCVLALAYGWKLGLT
                     LTFGALPPLVAAGYVRIQLESRLDKETASRFANSASVAAEAVSAIRTVASLTMENEVL
                     AKYEDSLRYVTRASAKSLVRTMFWYALSQSISFLSMALGFWYGGRLISFNEYTSQQFY
                     TVFVAVIFSGEAAASLFQYTSSITQAQGAANYVFNLRRQVDKDMRDNYPPRDGHSHSG
                     AAQVECKDLVFSYPRRPGSRVLNEVTLSVQPGQFIAFVGASGCGKTTMISLLERFYEP
                     TGGTILLDGVDSISSHLGQYRRHIALVQQEPVLYQGSLRENIALGIEDLPGGGSSTVT
                     DEDILEACRQANIDTFILSLPDGLSTRCGSQGLQFSGGQRQRIALARALTRKPRLLLL
                     DEATSSLDTESERIVQAALDDAAKGENTKRTTIAVAHRLSTIRNADLIFVFSRGRITE
                     VGRHEDLVRRKGMYHQMFLAQSLDEA"
     CDS             join(49557..49565,49630..50231,50285..50711,50766..50921,
                     50978..51542,51596..53304)
                     /locus_tag="DL546_002293"
                     /codon_start=1
                     /product="hypothetical protein"
                     /protein_id="RKU41651.1"
                     /translation="MWTTIVSINAIRLTRSLRIDFLKQTLRQEIPYFDSAEAGSIAGN
                     INRGGNLVNQGISERFGLTVQATTTFFSAFIVAFAVQWKLTLICLSIVAANLIVVTVC
                     VMIDSGIENKLNATWGEADKLAEEVFASIRNVHAFWAYGKLSAKFEGLMQSTRHLAQR
                     KPPIYAILFSVQFFCIYAGYGLAFWQGIRMYHRGEIDQPGGVVTVILAVLLAAQGLTQ
                     IAPQIMVVSKAVGAADGLFKTIDRESKIDSLSTRGTTPQDCHGEILLDKVQFAYPSRP
                     SVQVLNGLSLVIPANKTTAIVGASGSGKSTIVSLLERWFEPTSGTITFDGQPIQTLHI
                     SWLRINMRLVQQEPVLFSGTVYQNVVYGLSGTPQAELADDIKLRLVEQACMAAFAHDF
                     IEKLPDGYHTEIGERGRMLSGGQKQRLAIARAIISNPRVLLLDEATSALDANAEHVVQ
                     QALNHVAAGRTTVVIAHRLSTVRGADNIVVMAKGTIVEQGTHEELMRHGGAYFRLVRA
                     QQLGRDDMGEDAPLHDDAEQPTTAPKTLSANALETNPEQAAVQADIHYNLMKCLAIII
                     KEQRNLWFPCAIVGLAAVIGGGMYPALAVLFSRVLDAFALTGDAMLKRGDFYALMFFV
                     MALGNLVAYAAMGWMSSLVSQEIARSYRLDIFNNLIRQEMTFFDDADNGTGALVSRLS
                     TEPTAIQELLSSNIALLLTISVNLTSSCVLALAYGWKLGLTLTFGALPPLVAAGYVRI
                     QLESRLDKETASRFANSASVAAEAVSAIRTVASLTMENEVLAKYEDSLRYVTRASAKS
                     LVRTMFWYALSQSISFLSMALGFWYGGRLISFNEYTSQQFYTVFVAVIFSGEAAASLF
                     QYTSSITQAQGAANYVFNLRRQVDKDMRDNYPPRDGHSHSGAAQVECKDLVFSYPRRP
                     GSRVLNEVTLSVQPGQFIAFVGASGCGKTTMISLLERFYEPTGGTILLDGVDSISSHL
                     GQYRRHIALVQQEPVLYQGSLRENIALGIEDLPGGGSSTVTDEDILEACRQANIDTFI
                     LSLPDGLSTRCGSQGLQFSGGQRQRIALARALTRKPRLLLLDEATSSLDTESERIVQA
                     ALDDAAKGENTKRTTIAVAHRLSTIRNADLIFVFSRGRITEVGRHEDLVRRKGMYHQM
                     FLAQSLDEA"
     CDS             join(49930..50231,50285..50711,50766..50921,50978..51542,
                     51596..53304)
                     /locus_tag="DL546_002293"
                     /codon_start=1
                     /product="hypothetical protein"
                     /protein_id="RKU41650.1"
                     /translation="MIDSGIENKLNATWGEADKLAEEVFASIRNVHAFWAYGKLSAKF
                     EGLMQSTRHLAQRKPPIYAILFSVQFFCIYAGYGLAFWQGIRMYHRGEIDQPGGVVTV
                     ILAVLLAAQGLTQIAPQIMVVSKAVGAADGLFKTIDRESKIDSLSTRGTTPQDCHGEI
                     LLDKVQFAYPSRPSVQVLNGLSLVIPANKTTAIVGASGSGKSTIVSLLERWFEPTSGT
                     ITFDGQPIQTLHISWLRINMRLVQQEPVLFSGTVYQNVVYGLSGTPQAELADDIKLRL
                     VEQACMAAFAHDFIEKLPDGYHTEIGERGRMLSGGQKQRLAIARAIISNPRVLLLDEA
                     TSALDANAEHVVQQALNHVAAGRTTVVIAHRLSTVRGADNIVVMAKGTIVEQGTHEEL
                     MRHGGAYFRLVRAQQLGRDDMGEDAPLHDDAEQPTTAPKTLSANALETNPEQAAVQAD
                     IHYNLMKCLAIIIKEQRNLWFPCAIVGLAAVIGGGMYPALAVLFSRVLDAFALTGDAM
                     LKRGDFYALMFFVMALGNLVAYAAMGWMSSLVSQEIARSYRLDIFNNLIRQEMTFFDD
                     ADNGTGALVSRLSTEPTAIQELLSSNIALLLTISVNLTSSCVLALAYGWKLGLTLTFG
                     ALPPLVAAGYVRIQLESRLDKETASRFANSASVAAEAVSAIRTVASLTMENEVLAKYE
                     DSLRYVTRASAKSLVRTMFWYALSQSISFLSMALGFWYGGRLISFNEYTSQQFYTVFV
                     AVIFSGEAAASLFQYTSSITQAQGAANYVFNLRRQVDKDMRDNYPPRDGHSHSGAAQV
                     ECKDLVFSYPRRPGSRVLNEVTLSVQPGQFIAFVGASGCGKTTMISLLERFYEPTGGT
                     ILLDGVDSISSHLGQYRRHIALVQQEPVLYQGSLRENIALGIEDLPGGGSSTVTDEDI
                     LEACRQANIDTFILSLPDGLSTRCGSQGLQFSGGQRQRIALARALTRKPRLLLLDEAT
                     SSLDTESERIVQAALDDAAKGENTKRTTIAVAHRLSTIRNADLIFVFSRGRITEVGRH
                     EDLVRRKGMYHQMFLAQSLDEA"

My guess is that the issue is in L195 of orf.py, which assumes the presence of a stop codon and removes it.

Regards
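The offsets in the tables above are consistent with that guess: on the plus strand GECCO reports 49277..53301 for CDS features annotated as ending at 53304, and on the minus strand it reports 45922..48397 for complement(45919..48397). In both cases exactly one codon is missing at the 3' end of the gene, which on the minus strand is the leftmost coordinate. The sketch below is a hypothetical helper, not part of GECCO, that restores the annotated extent from the reported one under the assumption that exactly one stop codon was trimmed; something like it could be used to widen a sideloaded region so that antiSMASH sees complete CDS features.

```python
from typing import Tuple

STOP_CODON_LENGTH = 3


def restore_stop_codon(start: int, end: int, strand: str) -> Tuple[int, int]:
    """Extend 1-based, inclusive coordinates by one codon at the gene's 3' end.

    Assumes the reported coordinates exclude the stop codon, as observed in
    this report. On the minus strand the 3' end is the leftmost coordinate.
    """
    if strand == "+":
        return start, end + STOP_CODON_LENGTH
    if strand == "-":
        return start - STOP_CODON_LENGTH, end
    raise ValueError(f"unknown strand: {strand!r}")


# Coordinates taken from the features table and GenBank excerpt above.
print(restore_stop_codon(49277, 53301, "+"))  # (49277, 53304)
print(restore_stop_codon(45922, 48397, "-"))  # (45919, 48397)
```

Blindly adding three nucleotides is only a workaround: it would overshoot for any CDS whose annotation does not actually end in a stop codon, so checking for the stop codon before trimming it would be the more robust fix.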

Installation issue

Screenshot (140)
Hi, I am trying to install GECCO on my Windows laptop for a research project, and I keep getting this pyhmmer error.

Reporting a vulnerability

Hello!

I hope you are doing well!

We are a security research team. Our tool automatically detected a vulnerability in this repository, and we want to disclose it responsibly. GitHub has a feature called Private vulnerability reporting, which enables security researchers to disclose vulnerabilities privately. Unfortunately, it is not enabled for this repository.

Can you enable it, so that we can report it?

Thanks in advance!

PS: you can read about how to enable private vulnerability reporting here: https://docs.github.com/en/code-security/security-advisories/repository-security-advisories/configuring-private-vulnerability-reporting-for-a-repository

issue with bigslice

Hello, I am trying to run the second line, and it is not working:
gecco convert gbk -i bigslice_dir/dataset_1/U00096.2_cluster_1/ --format bigslice

x An unexpected error occurred. Consider opening a new issue on the bug tracker ( https://github.com/zellerlab/GECCO/issues/new ) if it persists, including the traceback below:
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│                                                                                                  │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/gecco/cli/commands │
│                                                                                                  │
│   150 │   │   │   │   subcmd.quiet = self.quiet                                                  │
│   151 │   │   │   │   subcmd.progress.disable = self.quiet > 0                                   │
│   152 │   │   │   # run the subcommand                                                           │
│ ❱ 153 │   │   │   return subcmd.execute(ctx)                                                     │
│   154 │   │   except CommandExit as sysexit:                                                     │
│   155 │   │   │   return sysexit.code                                                            │
│   156 │   │   except KeyboardInterrupt:                                                          │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/gecco/cli/commands │
│                                                                                                  │
│   195 │   │   │   # run the appropriate method                                                   │
│   196 │   │   │   if self.args["gbk"]:                                                           │
│   197 │   │   │   │   if self.args["--format"] == "bigslice":                                    │
│ ❱ 198 │   │   │   │   │   self._convert_gbk_bigslice(ctx)                                        │
│   199 │   │   │   │   elif self.args["--format"] == "fna":                                       │
│   200 │   │   │   │   │   self._convert_gbk_fna(ctx)                                             │
│   201 │   │   │   │   elif self.args["--format"] == "faa":                                       │
│                                                                                                  │
│ /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/gecco/cli/commands │
│                                                                                                  │
│    89 │   │   # load the original coordinates from the `*.clusters.tsv` files                    │
│    90 │   │   coordinates = {}                                                                   │
│    91 │   │   types = {}                                                                         │
│ ❱  92 │   │   for cluster_file in self.progress.track(cluster_files, task_id=task, precision="") │
│    93 │   │   │   cluster_fh = ctx.enter_context(open(cluster_file))                             │
│    94 │   │   │   for row in ClusterTable.load(cluster_fh):                                      │
│    95 │   │   │   │   ty = ";".join(sorted(ty.name for ty in row.type.unpack()))                 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: track() got an unexpected keyword argument 'precision'
