
deepbgc's Issues

Installation Failure

Hi there!
I've installed 'DeepBGC' on my Mac successfully, but the following issues were reported when I typed 'deepbgc info':
=================|_|===== version 0.1.30 =====
INFO 26/06 22:03:07 Available data files: ['Pfam-A.31.0.hmm.h3f', 'Pfam-A.31.0.hmm.h3i', 'Pfam-A.31.0.clans.tsv', 'Pfam-A.31.0.hmm', 'Pfam-A.31.0.hmm.h3p', 'Pfam-A.31.0.hmm.h3m']
INFO 26/06 22:03:07 ================================================================================
INFO 26/06 22:03:07 Available detectors: ['clusterfinder_retrained', 'clusterfinder_geneborder', 'clusterfinder_original', 'deepbgc']
INFO 26/06 22:03:07 --------------------------------------------------------------------------------
INFO 26/06 22:03:07 Model: clusterfinder_retrained
INFO 26/06 22:03:07 Loading model from: /Users/Work/Library/Application Support/deepbgc/data/0.1.0/detector/clusterfinder_retrained.pkl
WARNING 26/06 22:03:07 Model not supported: ('Package "hmmlearn" needs to be installed to run ClusterFinder HMM. ', 'Install extra dependencies using: \n pip install "deepbgc[hmm]"')
INFO 26/06 22:03:07 --------------------------------------------------------------------------------
INFO 26/06 22:03:07 Model: clusterfinder_geneborder
INFO 26/06 22:03:07 Loading model from: /Users/Work/Library/Application Support/deepbgc/data/0.1.0/detector/clusterfinder_geneborder.pkl
WARNING 26/06 22:03:07 Model not supported: ('Package "hmmlearn" needs to be installed to run ClusterFinder HMM. ', 'Install extra dependencies using: \n pip install "deepbgc[hmm]"')
INFO 26/06 22:03:07 --------------------------------------------------------------------------------
INFO 26/06 22:03:07 Model: clusterfinder_original
INFO 26/06 22:03:07 Loading model from: /Users/Work/Library/Application Support/deepbgc/data/0.1.0/detector/clusterfinder_original.pkl
WARNING 26/06 22:03:07 Model not supported: ('Package "hmmlearn" needs to be installed to run ClusterFinder HMM. ', 'Install extra dependencies using: \n pip install "deepbgc[hmm]"')
INFO 26/06 22:03:07 --------------------------------------------------------------------------------
INFO 26/06 22:03:07 Model: deepbgc
INFO 26/06 22:03:07 Loading model from: /Users/Work/Library/Application Support/deepbgc/data/0.1.0/detector/deepbgc.pkl
Using TensorFlow backend.
WARNING 26/06 22:03:08 Model not supported: ("Error unpickling model from path '/Users/Work/Library/Application Support/deepbgc/data/0.1.0/detector/deepbgc.pkl'", TypeError('Descriptors cannot not be created directly.\nIf this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.\nIf you cannot immediately regenerate your protos, some other possible workarounds are:\n 1. Downgrade the protobuf package to 3.20.x or lower.\n 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).\n\nMore information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates'))
INFO 26/06 22:03:08 ================================================================================
INFO 26/06 22:03:08 Available classifiers: ['product_activity', 'product_class']
INFO 26/06 22:03:08 --------------------------------------------------------------------------------
INFO 26/06 22:03:08 Model: product_activity
INFO 26/06 22:03:08 Loading model from: /Users/Work/Library/Application Support/deepbgc/data/0.1.0/classifier/product_activity.pkl
/Users/Work/opt/miniconda3/envs/deepbgc/lib/python3.7/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.18.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/Users/Work/opt/miniconda3/envs/deepbgc/lib/python3.7/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator RandomForestClassifier from version 0.18.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
INFO 26/06 22:03:08 Type: RandomForestClassifier
INFO 26/06 22:03:08 Version: 0.1.0
INFO 26/06 22:03:08 Timestamp: 1551781433.886473 (2019-03-05T18:23:53.886473)
INFO 26/06 22:03:08 --------------------------------------------------------------------------------
INFO 26/06 22:03:08 Model: product_class
INFO 26/06 22:03:08 Loading model from: /Users/Work/Library/Application Support/deepbgc/data/0.1.0/classifier/product_class.pkl
INFO 26/06 22:03:08 Type: RandomForestClassifier
INFO 26/06 22:03:08 Version: 0.1.0
INFO 26/06 22:03:08 Timestamp: 1551781410.019103 (2019-03-05T18:23:30.019103)
INFO 26/06 22:03:08 ================================================================================
WARNING 26/06 22:03:08 Some warnings detected, check the output above
It looks like something is blocking access to the file 'deepbgc.pkl', or something is incompatible with it. I tried to run the script as described in the README, using a partial sequence from my genome. It seems 'deepbgc.pkl' is not working as expected (see the picture).
error
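The TypeError in the log above is the protobuf 4.x incompatibility that the error message itself describes. A minimal sketch of the pure-Python workaround it suggests (an illustration, not an official DeepBGC fix; the other suggested option is downgrading with pip install "protobuf<=3.20"):

```python
import os

# Force the pure-Python protobuf implementation *before* TensorFlow (and
# therefore deepbgc) is imported, as the error message itself suggests.
# Slower than the C++ implementation, but avoids the "Descriptors cannot
# not be created directly" TypeError when unpickling deepbgc.pkl.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

# import deepbgc  # import only after the variable is set
```

Setting the variable in the shell before running the CLI should have the same effect.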

Installing deepBGC

I am trying to install deepBGC on my PC but I'm having some issues that I cannot solve.

I am using Python 3.7.0 and conda and have already added all the necessary channels. However, every time I try to install deepBGC I get this issue:

image

I have already tried with and without pinning the deepBGC version and get the same error every time.

Can you please help?

Thanks!

allow for gzip genome fasta input

It would be nice if deepbgc accepted gzipped genome FASTA input, which antiSMASH v5 allows. At least with deepbgc v0.1.13, the following error occurs if a gzipped FASTA is provided as input:

ERROR   03/10 07:41:20   Please provide a GenBank or FASTA sequence with an appropriate file extension.

DeepBGC failed with Exception: Unexpected error detecting protein domains using HMMER hmmscan

Hi, I have some problems using DeepBGC to identify the BGCs in Mycobacterium tuberculosis H37Rv (high-GC, Gram-positive).

Here is my DeepBGC version information:

(base) mima@123456:~/BGC/jieheganjun$ deepbgc info
 _____                  ____    ____   ____ 
 |  _ \  ___  ___ ____ | __ )  / ___) / ___)
 | | \ \/ _ \/ _ \  _ \|  _ \ | |  _ | |    
 | |_/ /  __/  __/ |_) | |_) || |_| || |___ 
 |____/ \___|\___| ___/|____/  \____| \____)
=================|_|===== version 0.1.26 =====
INFO    08/04 16:55:44   Available data files: ['Pfam-A.31.0.hmm', 'Pfam-A.31.0.hmm.h3f', 'Pfam-A.31.0.hmm.h3m', 'Pfam-A.31.0.hmm.h3i', 'Pfam-A.31.0.clans.tsv', 'Pfam-A.31.0.hmm.h3p']
INFO    08/04 16:55:44   ================================================================================
INFO    08/04 16:55:44   Available detectors: ['clusterfinder_retrained', 'clusterfinder_original', 'deepbgc', 'product_class', 'product_activity', 'clusterfinder_geneborder']
INFO    08/04 16:55:44   --------------------------------------------------------------------------------
INFO    08/04 16:55:44   Model: clusterfinder_retrained
INFO    08/04 16:55:44   Loading model from: /home/mima/.local/share/deepbgc/data/0.1.0/detector/clusterfinder_retrained.pkl
WARNING 08/04 16:55:44   Model not supported: ('Package "hmmlearn" needs to be installed to run ClusterFinder HMM. ', 'Install extra dependencies using: \n    pip install "deepbgc[hmm]"')
INFO    08/04 16:55:44   --------------------------------------------------------------------------------
INFO    08/04 16:55:44   Model: clusterfinder_original
INFO    08/04 16:55:44   Loading model from: /home/mima/.local/share/deepbgc/data/0.1.0/detector/clusterfinder_original.pkl
WARNING 08/04 16:55:44   Model not supported: ('Package "hmmlearn" needs to be installed to run ClusterFinder HMM. ', 'Install extra dependencies using: \n    pip install "deepbgc[hmm]"')
INFO    08/04 16:55:44   --------------------------------------------------------------------------------
INFO    08/04 16:55:44   Model: deepbgc
INFO    08/04 16:55:44   Loading model from: /home/mima/.local/share/deepbgc/data/0.1.0/detector/deepbgc.pkl
Using TensorFlow backend.
WARNING 08/04 16:55:49   From /home/mima/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING 08/04 16:55:49   From /home/mima/miniconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
INFO    08/04 16:55:50   Type: KerasRNN
INFO    08/04 16:55:50   Version: 0.1.0
INFO    08/04 16:55:50   Timestamp: 1551305667.986168 (2019-02-28T06:14:27.986168)
INFO    08/04 16:55:50   --------------------------------------------------------------------------------
INFO    08/04 16:55:50   Model: product_class
INFO    08/04 16:55:50   Loading model from: /home/mima/.local/share/deepbgc/data/0.1.0/detector/product_class.pkl
/home/mima/miniconda3/lib/python3.7/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.18.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/home/mima/miniconda3/lib/python3.7/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator RandomForestClassifier from version 0.18.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
INFO    08/04 16:55:50   Type: RandomForestClassifier
INFO    08/04 16:55:50   Version: 0.1.0
INFO    08/04 16:55:50   Timestamp: 1551781410.019103 (2019-03-05T18:23:30.019103)
INFO    08/04 16:55:50   --------------------------------------------------------------------------------
INFO    08/04 16:55:50   Model: product_activity
INFO    08/04 16:55:50   Loading model from: /home/mima/.local/share/deepbgc/data/0.1.0/detector/product_activity.pkl
INFO    08/04 16:55:50   Type: RandomForestClassifier
INFO    08/04 16:55:50   Version: 0.1.0
INFO    08/04 16:55:50   Timestamp: 1551781433.886473 (2019-03-05T18:23:53.886473)
INFO    08/04 16:55:50   --------------------------------------------------------------------------------
INFO    08/04 16:55:50   Model: clusterfinder_geneborder
INFO    08/04 16:55:50   Loading model from: /home/mima/.local/share/deepbgc/data/0.1.0/detector/clusterfinder_geneborder.pkl
WARNING 08/04 16:55:50   Model not supported: ('Package "hmmlearn" needs to be installed to run ClusterFinder HMM. ', 'Install extra dependencies using: \n    pip install "deepbgc[hmm]"')
INFO    08/04 16:55:50   ================================================================================
INFO    08/04 16:55:50   Available classifiers: ['product_class', 'product_activity']
INFO    08/04 16:55:50   --------------------------------------------------------------------------------
INFO    08/04 16:55:50   Model: product_class
INFO    08/04 16:55:50   Loading model from: /home/mima/.local/share/deepbgc/data/0.1.0/classifier/product_class.pkl
INFO    08/04 16:55:50   Type: RandomForestClassifier
INFO    08/04 16:55:50   Version: 0.1.0
INFO    08/04 16:55:50   Timestamp: 1551781410.019103 (2019-03-05T18:23:30.019103)
INFO    08/04 16:55:51   --------------------------------------------------------------------------------
INFO    08/04 16:55:51   Model: product_activity
INFO    08/04 16:55:51   Loading model from: /home/mima/.local/share/deepbgc/data/0.1.0/classifier/product_activity.pkl
INFO    08/04 16:55:51   Type: RandomForestClassifier
INFO    08/04 16:55:51   Version: 0.1.0
INFO    08/04 16:55:51   Timestamp: 1551781433.886473 (2019-03-05T18:23:53.886473)
INFO    08/04 16:55:51   ================================================================================
WARNING 08/04 16:55:51   Some warnings detected, check the output above

Then I typed the command deepbgc pipeline ./GCF_000195955.2_ASM19595v2_genomic.fna. Unfortunately, it failed.

(base) mima@123456:~/BGC/jieheganjun$ deepbgc pipeline ./GCF_000195955.2_ASM19595v2_genomic.fna 
 _____                  ____    ____   ____ 
 |  _ \  ___  ___ ____ | __ )  / ___) / ___)
 | | \ \/ _ \/ _ \  _ \|  _ \ | |  _ | |    
 | |_/ /  __/  __/ |_) | |_) || |_| || |___ 
 |____/ \___|\___| ___/|____/  \____| \____)
=================|_|===== version 0.1.26 =====
INFO    08/04 16:32:36   Loading model from: /home/mima/.local/share/deepbgc/data/0.1.0/detector/deepbgc.pkl
Using TensorFlow backend.
WARNING 08/04 16:32:36   From /home/mima/miniconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING 08/04 16:32:36   From /home/mima/miniconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
INFO    08/04 16:32:37   Loading model from: /home/mima/.local/share/deepbgc/data/0.1.0/classifier/product_class.pkl
/home/mima/miniconda3/lib/python3.7/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.18.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/home/mima/miniconda3/lib/python3.7/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator RandomForestClassifier from version 0.18.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
INFO    08/04 16:32:37   Loading model from: /home/mima/.local/share/deepbgc/data/0.1.0/classifier/product_activity.pkl
INFO    08/04 16:32:37   Processing input file 1/1: ./GCF_000195955.2_ASM19595v2_genomic.fna
INFO    08/04 16:32:37   ================================================================================
INFO    08/04 16:32:37   Processing record #1: NC_000962.3
WARNING 08/04 16:32:37   Updating record alphabet to generic_dna
INFO    08/04 16:32:37   Finding genes in record: NC_000962.3
INFO    08/04 16:32:47   Detecting Pfam domains in "NC_000962.3" using HMMER hmmscan, this might take a while...
WARNING 08/04 16:42:14   == HMMER hmmscan Error: ================
WARNING 08/04 16:42:14   
WARNING 08/04 16:42:14   == End HMMER hmmscan Error. ============
ERROR   08/04 16:42:14   Unexpected error detecting protein domains using HMMER hmmscan
Traceback (most recent call last):
  File "/home/mima/miniconda3/lib/python3.7/site-packages/deepbgc/main.py", line 113, in main
    run(argv)
  File "/home/mima/miniconda3/lib/python3.7/site-packages/deepbgc/main.py", line 102, in run
    args.func.run(**args_dict)
  File "/home/mima/miniconda3/lib/python3.7/site-packages/deepbgc/command/pipeline.py", line 177, in run
    step.run(record)
  File "/home/mima/miniconda3/lib/python3.7/site-packages/deepbgc/pipeline/annotator.py", line 35, in run
    pfam_annotator.annotate()
  File "/home/mima/miniconda3/lib/python3.7/site-packages/deepbgc/pipeline/pfam.py", line 97, in annotate
    self._run_hmmscan(protein_path, domtbl_path)
  File "/home/mima/miniconda3/lib/python3.7/site-packages/deepbgc/pipeline/pfam.py", line 73, in _run_hmmscan
    raise Exception("Unexpected error detecting protein domains using HMMER hmmscan")
Exception: Unexpected error detecting protein domains using HMMER hmmscan
ERROR   08/04 16:42:14   ================================================================================
ERROR   08/04 16:42:14   DeepBGC failed with Exception: Unexpected error detecting protein domains using HMMER hmmscan
ERROR   08/04 16:42:14   ================================================================================

I don't know how to solve this problem, so please help me. (By the way, I have tried putting this genome sequence into antiSMASH, and it works.) Thanks in advance.

Deepbgc installation through conda

Hi, I hope you are doing well

I am very interested in using deepbgc in my work; I think I could get interesting results with it. I have been trying to install deepbgc using conda, but I always get the following (screenshot). Do you know what I can do to solve it? Thanks in advance.
Screen Shot 2021-09-23 at 21 22 27

some problems while training my model

Hello! Recently I used deepbgc in my work, but I encountered some problems while training my model.
My code:

nohup deepbgc train --model deepbgc.json --output DeepBGC_antigen_model.pkl --config PFAM2VEC pfam2vec.csv Oantigen.pfam.tsv GeneSwap_Negatives.pfam.tsv &

It starts running, but after a while it seems to get stuck.
The end of the log:

Epoch 76/1000
  - 143s - loss: 0.0028 - acc: 0.9993 - precision: 0.8799 - recall: 0.8642 - auc_roc: 0.9911
Epoch 77/1000
  - 148s - loss: 0.0023 - acc: 0.9992 - precision: 0.8969 - recall: 0.8722 - auc_roc: 0.9913

This state has been going on for a long time (about 2 days; my input file is only 2 MB, including 515 BGCs).
What can I do?
Thank you!

How to use DeepBGC's results to the antiSMASH

I see that antiSMASH 6.0 has added a specification of a JSON-based file format that allows users to load gene cluster predictions from DeepBGC and display them alongside antiSMASH's own results. When I run DeepBGC, no *.json file is generated. I used the following command:
$ deepbgc pipeline test/data/BGC0000015.fa --output work/BGC0000015
So, how can I obtain the *.json file?

And another question: how do I load the DeepBGC result (*.json file) into antiSMASH?
Looking forward to your reply. Thank you very much!

MD5SUM does not match

The MD5 checksums at line 67 and line 74 of the script data.py do not match the md5_checksums file:
data.py

{
        'url': 'ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam{}/Pfam-A.hmm.gz'.format(PFAM_DB_VERSION),
        'target': PFAM_DB_FILE_NAME,
        'gzip': True,
        'after': util.run_hmmpress,
        'checksum': '79a3328e4c95b13949a4489b19959fc5',
        'versioned': False
    },
    {
        'url': 'ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam{}/Pfam-A.clans.tsv.gz'.format(PFAM_DB_VERSION),
        'target': PFAM_CLANS_FILE_NAME,
        'gzip': True,
        'checksum': 'a0a4590ffb2b33b83ef2b28f6ead886b',
        'versioned': False
    }

md5_checksums of Pfam31.0: ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam31.0/md5_checksums

8951c2a33e7f77c562473097d6ac2b33  Pfam-A.clans.tsv.gz
b6fda5fdc90d24fbc1484d3b641d4e32  Pfam-A.hmm.gz

different number of elements in *.bgc.tsv rows

Hi,

recently I encountered the problem of a different number of elements per row in the *.bgc.tsv output files. I am not sure if it is input-specific or just a result of the update to 1.25. In some rows there seem to be missing '\t's (the product_activity variable), so the columns are shifted and the file can't be correctly imported as TSV. Attached are the output I got and the .gbk that produced the error.

Thanks for having a look!

deepBGC_issue.zip

Does DeepBGC work with plants (Brassica spp.)?

I am a Ph.D. student working on metabolomics/specialized plant metabolic responses to pathogens.

Your deep learning approach applied to BGCs is very innovative. I'd like to apply it to my plant data to predict biosynthetic gene clusters.

Does DeepBGC work with plant (Brassica spp.) data?

IMPORTANT: these plants contain approximately 6 copies of each gene resulting from duplication, which raises a large question of neofunctionalization and subfunctionalization.

Thank you for your reply

ERROR: Could not find a version that satisfies the requirement tensorflow<2.0.0,>=1.15.4 (from deepbgc)

Hello, I am sorry to ask you for help again. I have run into some problems installing deepbgc on WSL2.

First try: I followed your instructions to install deepbgc with conda install deepbgc. But it is not the latest version; it is 0.1.10.

(base) b07@SB:~$ deepbgc --version
 _____                  ____    ____   ____
 |  _ \  ___  ___ ____ | __ )  / ___) / ___)
 | | \ \/ _ \/ _ \  _ \|  _ \ | |  _ | |
 | |_/ /  __/  __/ |_) | |_) || |_| || |___
 |____/ \___|\___| ___/|____/  \____| \____)
=================|_|===== version 0.1.10 =====
usage: deepbgc [-h] COMMAND ...

DeepBGC - Biosynthetic Gene Cluster detection and classification

Second try: I wanted to try your latest version to get the *.json file, so I followed the issue How to use DeepBGC's results to the antiSMASH and installed a specific version with conda install deepbgc==0.1.23.

(base) b07@SB:~$ conda install deepbgc==0.1.23
Collecting package metadata (current_repodata.json): -
done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): / 
done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: |
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

Third try: Finally, I just downloaded this file to install deepbgc manually.
image
Then I uncompressed it and tried pip install . in the current directory. #45 (comment)

(base) b07@SB:~/deepbgc-0.1.25$ tree
.
├── LICENSE
├── LICENSES_THIRD_PARTY
├── Makefile
├── README.md
├── deepbgc
│   ├── __init__.py
│   ├── __version__.py
│   ├── command
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── download.py
│   │   ├── info.py
│   │   ├── pipeline.py
│   │   ├── prepare.py
│   │   └── train.py
│   ├── data.py
│   ├── features.py
│   ├── main.py
│   ├── models
│   │   ├── __init__.py
│   │   ├── hmm.py
│   │   ├── random_forest.py
│   │   ├── rnn.py
│   │   └── wrapper.py
│   ├── output
│   │   ├── __init__.py
│   │   ├── antismash_json.py
│   │   ├── bgc_genbank.py
│   │   ├── cluster_tsv.py
│   │   ├── evaluation
│   │   │   ├── __init__.py
│   │   │   ├── bgc_region_plot.py
│   │   │   ├── pfam_score_plot.py
│   │   │   ├── pr_plot.py
│   │   │   └── roc_plot.py
│   │   ├── genbank.py
│   │   ├── pfam_tsv.py
│   │   ├── readme.py
│   │   └── writer.py
│   ├── pipeline
│   │   ├── __init__.py
│   │   ├── annotator.py
│   │   ├── classifier.py
│   │   ├── detector.py
│   │   ├── pfam.py
│   │   ├── protein.py
│   │   └── step.py
│   └── util.py
├── images
│   ├── deepbgc.architecture.png
│   ├── deepbgc.bgc.png
│   └── deepbgc.pipeline.png
├── setup.py
└── test
    ├── __init__.py
    ├── data
    │   ├── BGC0000015.classes.csv
    │   ├── BGC0000015.fa
    │   ├── BGC0000015.fa.gz
    │   ├── BGC0000015.gbk
    │   ├── BGC0000015.pfam.csv
    │   ├── BGC0000015.protein.fa
    │   ├── Pfam-A.PF00005.clans.tsv
    │   ├── Pfam-A.PF00005.hmm
    │   ├── Pfam-A.PF00005.hmm.h3f
    │   ├── Pfam-A.PF00005.hmm.h3i
    │   ├── Pfam-A.PF00005.hmm.h3m
    │   ├── Pfam-A.PF00005.hmm.h3p
    │   ├── clusterfinder_geneborder_test.json
    │   ├── deepbgc_test.json
    │   ├── negative.pfam.csv
    │   ├── pfam2vec.test.tsv
    │   └── random_forest_test.json
    ├── integration
    │   ├── __init__.py
    │   ├── commands
    │   │   ├── __init__.py
    │   │   ├── test_integration_pipeline.py
    │   │   └── test_integration_prepare.py
    │   └── pipeline
    │       ├── __init__.py
    │       ├── test_integration_pfam.py
    │       └── test_integration_protein.py
    ├── test_util.py
    └── unit
        ├── __init__.py
        ├── commands
        │   ├── __init__.py
        │   ├── test_unit_pipeline.py
        │   └── test_unit_train.py
        ├── output
        │   ├── __init__.py
        │   └── test_unit_writers.py
        └── test_unit_main.py

15 directories, 79 files

(base) b07@SB:~/deepbgc-0.1.25$  pip install .
Collecting argparse
  Using cached argparse-1.4.0-py2.py3-none-any.whl (23 kB)
Requirement already satisfied: biopython<=1.76,>=1.70 in /home/b07/miniconda3/lib/python3.8/site-packages (from deepbgc==0.1.25) (1.76)
Collecting scikit-learn<=0.21.3,>=0.18.2
  Downloading scikit-learn-0.21.3.tar.gz (12.2 MB)
     |████████████████████████████████| 12.2 MB 294 kB/s
Collecting pandas>=0.24.1
  Downloading pandas-1.2.3-cp38-cp38-manylinux1_x86_64.whl (9.7 MB)
     |████████████████████████████████| 9.7 MB 1.3 MB/s
Collecting numpy<1.17,>=1.16.1
  Downloading numpy-1.16.6.zip (5.1 MB)
     |████████████████████████████████| 5.1 MB 442 kB/s
Collecting keras<2.3.0,>=2.2.4
  Downloading Keras-2.2.5-py2.py3-none-any.whl (336 kB)
     |████████████████████████████████| 336 kB 104 kB/s
ERROR: Could not find a version that satisfies the requirement tensorflow<2.0.0,>=1.15.4 (from deepbgc)
ERROR: No matching distribution found for tensorflow<2.0.0,>=1.15.4

Unfortunately, I failed again. I hope you can help me; I want to get this antismash.json file. Thank you in advance.
image
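One likely cause, stated as an assumption: TensorFlow 1.x wheels were only published up to Python 3.7, and the pip log above shows a Python 3.8 site-packages path, so the requirement tensorflow<2.0.0,>=1.15.4 can never be satisfied there. A quick sanity check:

```python
import sys

def tf1_installable(version_info=sys.version_info):
    """True if this interpreter version can receive a TensorFlow 1.x wheel.

    TensorFlow 1.x (required here as tensorflow<2.0.0,>=1.15.4) published
    wheels only up to Python 3.7, so on 3.8+ pip reports "No matching
    distribution found" - exactly the error above.
    """
    return (version_info[0], version_info[1]) <= (3, 7)

# On the Python 3.8 environment from the log, tf1_installable() is False;
# creating a Python 3.7 conda environment is the usual way out.
```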

Fails with deepbgc. ImportError: cannot import name 'ChainMap'

I'm getting a failure after installing deepbgc using Anaconda. The installation finished,
but deepbgc fails with "ImportError: cannot import name 'ChainMap'".
Is it a Python version error?
My Python version is 3.6.
Steps as follows:
conda create -n myenv python==3.6
source activate myenv
and then I try deepbgc pipeline --help:
(deepbgc) [xxxxx@localhost xxxx]$ deepbgc pipeline --help
Traceback (most recent call last):
File "/home/xxxx/.conda/envs/deepbgc/bin/deepbgc", line 6, in
from deepbgc.main import main
File "/home/xxxx/.conda/envs/deepbgc/lib/python3.6/site-packages/deepbgc/__init__.py", line 2, in
from .pipeline import DeepBGCClassifier, DeepBGCDetector, HmmscanPfamRecordAnnotator, DeepBGCAnnotator, ProdigalProteinRecordAnnotator
File "/home/xxxx/.conda/envs/deepbgc/lib/python3.6/site-packages/deepbgc/pipeline/__init__.py", line 1, in
from .classifier import DeepBGCClassifier
File "/home/xxxx/.conda/envs/deepbgc/lib/python3.6/site-packages/deepbgc/pipeline/classifier.py", line 4, in
from deepbgc import util
File "/home/xxxx/.conda/envs/deepbgc/lib/python3.6/site-packages/deepbgc/util.py", line 8, in
import pandas as pd
File "/home/xxxx/.conda/envs/deepbgc/lib/python3.6/site-packages/pandas/__init__.py", line 121, in
from pandas.core.computation.api import eval
File "/home/xxxx/.conda/envs/deepbgc/lib/python3.6/site-packages/pandas/core/computation/api.py", line 3, in
from pandas.core.computation.eval import eval
File "/home/xxxx/.conda/envs/deepbgc/lib/python3.6/site-packages/pandas/core/computation/eval.py", line 12, in
from pandas.core.computation.engines import _engines
File "/home/xxxx/.conda/envs/deepbgc/lib/python3.6/site-packages/pandas/core/computation/engines.py", line 9, in
from pandas.core.computation.ops import _mathops, _reductions
File "/home/xxxx/.conda/envs/deepbgc/lib/python3.6/site-packages/pandas/core/computation/ops.py", line 19, in
from pandas.core.computation.scope import _DEFAULT_GLOBALS
File "/home/xxxx/.conda/envs/deepbgc/lib/python3.6/site-packages/pandas/core/computation/scope.py", line 17, in
from pandas.compat.chainmap import DeepChainMap
File "/home/xxxx/.conda/envs/deepbgc/lib/python3.6/site-packages/pandas/compat/chainmap.py", line 1, in
from typing import ChainMap, MutableMapping, TypeVar, cast
ImportError: cannot import name 'ChainMap'
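A plausible explanation, offered as an assumption to verify: typing.ChainMap only exists from Python 3.5.4 / 3.6.1 onwards, and conda create -n myenv python==3.6 can pin exactly 3.6.0, which would produce this ImportError inside pandas. A sketch that checks the interpreter rather than the packages:

```python
import sys

def has_typing_chainmap():
    """typing.ChainMap was added in Python 3.5.4 / 3.6.1; earlier
    interpreters raise ImportError, matching the traceback above."""
    try:
        from typing import ChainMap  # noqa: F401
        return True
    except ImportError:
        return False

# On a pinned python==3.6 (i.e. 3.6.0) environment this returns False;
# recreating the env with python>=3.6.1 (or 3.7) should resolve it.
print(sys.version.split()[0], has_typing_chainmap())
```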

deepbgc pipeline does not seem to use more than 2 cores?

Hello,

Thank you for taking the time to develop this tool; it does exactly what I need for my odd gene clusters. However, the one issue is that it takes more than 20 minutes per FASTA file for some bacterial genomes. Is this normal?

I have made a simple model both locally on my MacBook and on my university's computational cluster, and ran deepbgc pipeline on a single bacterial genome FASTA file; in both cases there is no improvement in processing speed past 2 cores. The HMMER user manual suggests it should be capable of utilizing all available cores. However, deepbgc pipeline --help does not list an option to specify how many CPUs it can use.

Am I missing something obvious here? Should it take 20 minutes for one fasta file? It seems like it would take decades to get through a metagenome at that rate. My apologies for the ignorance, but how does deepbgc call hmmscan/search?

I looked through some of the other issue reports, but we do not seem to be having the same problems. Any advice you can offer is greatly appreciated, I really would like to use this tool for my work.

Thanks

DeepBGC failed with ValueError: missing molecule_type in annotations

Hi,

When I tried to use DeepBGC to analyze a GenBank file downloaded from NCBI, I encountered an error like this:

Using TensorFlow backend.
WARNING 11/02 15:54:49 From /opt/anaconda3/envs/deepbgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING 11/02 15:54:49 From /opt/anaconda3/envs/deepbgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING 11/02 15:54:49 From /opt/anaconda3/envs/deepbgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING 11/02 15:54:49 From /opt/anaconda3/envs/deepbgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

WARNING 11/02 15:54:49 From /opt/anaconda3/envs/deepbgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.
WARNING 11/02 15:54:50 From /opt/anaconda3/envs/deepbgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

INFO 11/02 15:54:50 Loading model from: /Users/cynthiayo/Library/Application Support/deepbgc/data/0.1.0/classifier/product_class.pkl
/opt/anaconda3/envs/deepbgc/lib/python3.7/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.18.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
UserWarning)
/opt/anaconda3/envs/deepbgc/lib/python3.7/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator RandomForestClassifier from version 0.18.2 when using version 0.21.3. This might lead to breaking code or invalid results. Use at your own risk.
UserWarning)
INFO 11/02 15:54:52 Loading model from: /Users/cynthiayo/Library/Application Support/deepbgc/data/0.1.0/classifier/product_activity.pkl
INFO 11/02 15:54:52 Processing input file 1/1: sequence.gb
INFO 11/02 15:54:52 ================================================================================
INFO 11/02 15:54:52 Processing record #1: KB946332.1
INFO 11/02 15:54:52 Sequence already contains 802 CDS features, skipping CDS detection
INFO 11/02 15:54:52 Detecting Pfam domains in "KB946332.1" using HMMER hmmscan, this might take a while...
INFO 11/02 15:59:29 HMMER hmmscan Pfam detection done in 0h4m37s
INFO 11/02 16:00:07 Added 1795 Pfam domains (947 unique PFAM_IDs)
INFO 11/02 16:00:07 Detecting BGCs using deepbgc model in KB946332.1
INFO 11/02 16:00:10 Detected 3 BGCs using deepbgc model in KB946332.1
INFO 11/02 16:00:10 Classifying 3 BGCs using product_class model in KB946332.1
INFO 11/02 16:00:10 Classifying 3 BGCs using product_activity model in KB946332.1
INFO 11/02 16:00:10 Saving processed record KB946332.1
ERROR 11/02 16:00:10 missing molecule_type in annotations
Traceback (most recent call last):
File "/opt/anaconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/main.py", line 113, in main
run(argv)
File "/opt/anaconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/main.py", line 102, in run
args.func.run(**args_dict)
File "/opt/anaconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/command/pipeline.py", line 181, in run
writer.write(record)
File "/opt/anaconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/output/bgc_genbank.py", line 26, in write
SeqIO.write(cluster_record, self.fd, 'genbank')
File "/opt/anaconda3/envs/deepbgc/lib/python3.7/site-packages/Bio/SeqIO/__init__.py", line 530, in write
count = writer_class(handle).write_file(sequences)
File "/opt/anaconda3/envs/deepbgc/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 244, in write_file
count = self.write_records(records, maxcount)
File "/opt/anaconda3/envs/deepbgc/lib/python3.7/site-packages/Bio/SeqIO/Interfaces.py", line 218, in write_records
self.write_record(record)
File "/opt/anaconda3/envs/deepbgc/lib/python3.7/site-packages/Bio/SeqIO/InsdcIO.py", line 981, in write_record
self._write_the_first_line(record)
File "/opt/anaconda3/envs/deepbgc/lib/python3.7/site-packages/Bio/SeqIO/InsdcIO.py", line 744, in _write_the_first_line
raise ValueError("missing molecule_type in annotations")
ValueError: missing molecule_type in annotations
ERROR 11/02 16:00:10 ================================================================================
ERROR 11/02 16:00:10 DeepBGC failed with ValueError: missing molecule_type in annotations
ERROR 11/02 16:00:10 ================================================================================
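For anyone else hitting this: Biopython 1.78+ removed alphabets and now requires a molecule_type annotation when writing GenBank. A minimal workaround sketch (patching the record before it is written; this is my own snippet, not DeepBGC's code):

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Build a tiny stand-in record; real records come from SeqIO.parse(...).
record = SeqRecord(Seq("ATGC"), id="KB946332.1", name="KB946332")
record.annotations["molecule_type"] = "DNA"  # the annotation the GenBank writer demands

handle = StringIO()
SeqIO.write(record, handle, "genbank")  # no longer raises ValueError
assert handle.getvalue().startswith("LOCUS")
```

Re-saving the input file with this annotation set on every record should let the pipeline write its output GenBank files.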

Could you help me with the problem mentioned here?

Thank you!

Automatically re-annotate PFAM domains if they are incompatible

Hi,

I'm running version 0.1.26 and I'm getting the same error as #47

part of error message is below:

ERROR   06/07 22:04:58   pfam_id
Traceback (most recent call last):
  File "/mnt/data/sbusi/antismash/.snakemake/conda/bf12a359/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2656, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'pfam_id'

Thank you for your help!

WARNING: TensorFlow backend messages when running deepbgc info in the CLI

Hi,

I currently have deepbgc installed onto a conda environment. The program works but when I run it, I am getting a warning message that looks like the following:

Using TensorFlow backend.
WARNING 04/10 11:54:59 From /EFS/tools/miniconda/envs/deepbgc2/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING 04/10 11:54:59 From /EFS/tools/miniconda/envs/deepbgc2/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING 04/10 11:54:59 From /EFS/tools/miniconda/envs/deepbgc2/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING 04/10 11:54:59 From /EFS/tools/miniconda/envs/deepbgc2/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

WARNING 04/10 11:54:59 From /EFS/tools/miniconda/envs/deepbgc2/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.
WARNING 04/10 11:55:00 From /EFS/tools/miniconda/envs/deepbgc2/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

WARNING 04/10 11:55:00 From /EFS/tools/miniconda/envs/deepbgc2/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:181: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING 04/10 11:55:00 From /EFS/tools/miniconda/envs/deepbgc2/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:186: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING 04/10 11:55:00 From /EFS/tools/miniconda/envs/deepbgc2/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING 04/10 11:55:00 From /EFS/tools/miniconda/envs/deepbgc2/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING 04/10 11:55:00 From /EFS/tools/miniconda/envs/deepbgc2/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

Is there any way to deal with these messages? I currently have version 1.15.4 of the tensorflow package and version 0.1.30 of deepbgc installed in the environment.
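These are harmless deprecation warnings from Keras calling TF 1.x APIs. One way to quiet them, sketched here (it must run before anything imports TensorFlow; the environment variable hides the C++-side chatter, the logger level the Python-side deprecation notices):

```python
import logging
import os

# Must be set before TensorFlow is imported anywhere in the process.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"                  # hide C++ info/warning messages
logging.getLogger("tensorflow").setLevel(logging.ERROR)   # hide Python-side deprecations
```

Exporting TF_CPP_MIN_LOG_LEVEL=3 in the shell before calling deepbgc should have the same effect for the C++ messages.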

allow user-defined downloads directory

deepbgc download (version 0.1.10) downloads all files into the user's home directory. At our institute, home directories are very limited in size, so requiring the deepbgc-associated files to be downloaded there can cause significant problems for us. It would help if deepbgc download allowed users to specify the download location.
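A sketch of the requested behavior (assumption on my part: recent DeepBGC releases read a DEEPBGC_DOWNLOADS_DIR environment variable; the 0.1.10 release discussed here may not):

```shell
# Hypothetical target directory outside the size-limited home:
export DEEPBGC_DOWNLOADS_DIR=/scratch/deepbgc_data
# deepbgc download
```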

Retrain on MIBiG 2.0

Our current model is trained on MIBiG 1.4, we can retrain (and validate) the model.

We can also demonstrate how our current model performs on the newly added BGCs.

leave-class out validation

Hi, prihoda:

I ran into some problems when using DeepBGC and need your help. To use leave-class-out validation to demonstrate the performance of the model, I use the following command:

$ deepbgc train --model deepbgc.json --config PFAM2VEC pfam2vec.csv --validation vali_pos_Alkaloid.tsv --validation vali_neg.tsv train_pos_Alkaloid.tsv train_neg.tsv

where train_pos_Alkaloid.tsv contains 1077 samples from five classes (Polyketide, NRP, RiPP, Saccharide, and Terpene),
train_neg.tsv contains 6752 samples (a random two-thirds of the negative samples), vali_pos_Alkaloid.tsv contains 500 samples drawn by sampling with replacement according to the Alkaloid Pfam file, and vali_neg.tsv contains the remaining third of the negative samples.

Here, my question is:

  1. Is this command correct for leave-class-out validation?
  2. This process seems to require a huge amount of memory; I encountered a "Segmentation fault" during the run.

I look forward to hearing from you soon.

DeepBGC failed with TypeError: rsplit() takes no keyword arguments

Part of the error log:

ERROR 19/11 15:51:37 rsplit() takes no keyword arguments
Traceback (most recent call last):
File "/mnt/volume/miniconda/3/lib/python2.7/site-packages/deepbgc/main.py", line 113, in main
run(argv)
File "/mnt/volume/miniconda/3/lib/python2.7/site-packages/deepbgc/main.py", line 102, in run
args.func.run(**args_dict)
File "/mnt/volume/miniconda/3/lib/python2.7/site-packages/deepbgc/command/pipeline.py", line 173, in run
step.run(record)
File "/mnt/volume/miniconda/3/lib/python2.7/site-packages/deepbgc/pipeline/annotator.py", line 36, in run
pfam_annotator.annotate()
File "/mnt/volume/miniconda/3/lib/python2.7/site-packages/deepbgc/pipeline/pfam.py", line 120, in annotate
short_pfam_id = pfam_id.rsplit('.', maxsplit=1)[0]
TypeError: rsplit() takes no keyword arguments
ERROR 19/11 15:51:37 ================================================================================
ERROR 19/11 15:51:37 DeepBGC failed with TypeError: rsplit() takes no keyword arguments
ERROR 19/11 15:51:37 ================================================================================
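The python2.7 paths in the traceback are the clue: str.rsplit() only accepts the maxsplit keyword argument on Python 3. Passing it positionally works on both major versions; a minimal check:

```python
# The failing line uses rsplit('.', maxsplit=1); this positional form is
# equivalent and runs under both Python 2 and Python 3:
pfam_id = "PF00005.27"  # example Pfam accession with version suffix
short_pfam_id = pfam_id.rsplit('.', 1)[0]
assert short_pfam_id == "PF00005"
```

Running DeepBGC under Python 3 avoids the issue entirely.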

installation error

hello
I used these steps to install deepbgc, but I get an error at the end.

Set up Bioconda and Conda-Forge channels:

conda config --add channels bioconda
conda config --add channels conda-forge
Install DeepBGC using:

Create a separate DeepBGC environment and install dependencies

conda create -n deepbgc python=3.7 hmmer prodigal

Install DeepBGC into the environment using pip

conda activate deepbgc
pip install deepbgc

Error message:

TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:

  1. Downgrade the protobuf package to 3.20.x or lower.
  2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/miniconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/main.py", line 113, in main
run(argv)
File "/root/miniconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/main.py", line 102, in run
args.func.run(**args_dict)
File "/root/miniconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/command/pipeline.py", line 133, in run
min_bio_domains=min_bio_domains
File "/root/miniconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/pipeline/detector.py", line 35, in __init__
self.model = SequenceModelWrapper.load(model_path)
File "/root/miniconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/models/wrapper.py", line 188, in load
raise ValueError("Error unpickling model from path '{}'".format(path), e)
ValueError: ("Error unpickling model from path '/root/.local/share/deepbgc/data/0.1.0/detector/deepbgc.pkl'", TypeError('Descriptors cannot not be created directly.\nIf this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.\nIf you cannot immediately regenerate your protos, some other possible workarounds are:\n 1. Downgrade the protobuf package to 3.20.x or lower.\n 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).\n\nMore information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates'))
ERROR 20/11 16:30:10 ================================================================================
ERROR 20/11 16:30:10 DeepBGC failed with ValueError: Error unpickling model from path '/root/.local/share/deepbgc/data/0.1.0/detector/deepbgc.pkl'
ERROR 20/11 16:30:10 ================================================================================
ERROR 20/11 16:30:10 Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:

  1. Downgrade the protobuf package to 3.20.x or lower.
  2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
ERROR 20/11 16:30:10 ================================================================================
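Both workarounds from the error message itself, sketched (the exact version pin is an assumption; any protobuf at or below 3.20.x should satisfy it):

```shell
# Workaround 1: downgrade protobuf inside the deepbgc environment:
#   pip install "protobuf<=3.20.3"
# Workaround 2: force the pure-Python protobuf parser (slower, no reinstall needed):
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
```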

TensorFlow version problem

New installations of deepbgc fail because pip can only find newer (2.x) versions of TensorFlow:

root@a492456632b1:~# pip install deepbgc
Collecting deepbgc
  Using cached deepbgc-0.1.18-py3-none-any.whl (63 kB)
Collecting matplotlib<3.1,>=2.2.3
  Using cached matplotlib-3.0.3.tar.gz (36.6 MB)
Collecting numpy<1.17,>=1.16.1
  Using cached numpy-1.16.6.zip (5.1 MB)
ERROR: Could not find a version that satisfies the requirement tensorflow<2.0.0,>=1.12.0 (from deepbgc) (from versions: 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4, 2.2.0, 2.3.0rc0, 2.3.0rc1, 2.3.0rc2, 2.3.0)
ERROR: No matching distribution found for tensorflow<2.0.0,>=1.12.0 (from deepbgc)
root@a492456632b1:~#

Here is the info for the installed TensorFlow:

root@a492456632b1:~# pip show tensorflow
Name: tensorflow
Version: 2.3.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: [email protected]
License: Apache 2.0
Location: /usr/local/miniconda3/lib/python3.8/site-packages
Requires: wrapt, absl-py, keras-preprocessing, google-pasta, gast, grpcio, wheel, tensorboard, h5py, astunparse, opt-einsum, tensorflow-estimator, six, scipy, numpy, termcolor, protobuf
Required-by:
root@a492456632b1:~#

(screenshot attached)

Visualization

Hi, I would just like to find out if there are any alternative options for the visualization of the deepBGC output other than antiSMASH?

Replace hmmscan with hmmsearch?

I suggested this in a closed issue discussion, but I'll post it as a request. The hmmscan portion of deepbgc takes a LONG time, and is computationally unnecessary, I believe, for what you're trying to do. This post from the hmmer author explains why. (It's a very old post, but I believe it still applies to hmmer 3.3)
https://cryptogenomicon.org/2011/05/27/hmmscan-vs-hmmsearch-speed-the-numerology/

Feel free to correct me if I'm wrong or misunderstanding the goal of that part of the pipeline.
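To illustrate the proposed swap (placeholder paths; both tools search the same profile/sequence space and write the same --domtblout columns, so downstream parsing should be unchanged):

```shell
PFAM_DB=Pfam-A.31.0.hmm   # placeholder path to the pressed Pfam database
# Profile-indexed search, parallelizes poorly:
#   hmmscan  --domtblout out.domtbl.txt "$PFAM_DB" proteins.faa
# Sequence-parallel search, typically much faster over many proteins:
#   hmmsearch --domtblout out.domtbl.txt "$PFAM_DB" proteins.faa
```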

Provide alternative versions of DeepBGC model

Alternative versions of DeepBGC could be provided:

  • deepbgc_curated: MIBiG 1.4 filtered by e.g. minimal number of protein domains
  • deepbgc_mibig_1.4: All MIBiG 1.4 data
  • deepbgc_cf: ClusterFinder dataset of 617 BGCs

I am getting an issue regarding memory; how do I solve this?

(deepbgc) [devaraja@rocket ~]$ deepbgc pipeline deepbgc/deepbgc-0.1-3.29/test/data/BGC0000015.fa


 _____                  ____    ____   ____ 
 |  _ \  ___  ___ ____ | __ )  / ___) / ___)
 | | \ \/ _ \/ _ \  _ \|  _ \ | |  _ | |    
 | |_/ /  __/  __/ |_) | |_) || |_| || |___ 
 |____/ \___|\___| ___/|____/  \____| \____)
=================|_|===== version 0.1.29 =====
INFO 20/03 11:17:39 Loading model from: /gpfs/space/home/devaraja/.local/share/deepbgc/data/0.1.0/detector/deepbgc.pkl
Using TensorFlow backend.
WARNING 20/03 11:17:45 From /gpfs/space/home/devaraja/.conda/envs/deepbgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING 20/03 11:17:45 From /gpfs/space/home/devaraja/.conda/envs/deepbgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING 20/03 11:17:45 From /gpfs/space/home/devaraja/.conda/envs/deepbgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING 20/03 11:17:45 From /gpfs/space/home/devaraja/.conda/envs/deepbgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

WARNING 20/03 11:17:45 From /gpfs/space/home/devaraja/.conda/envs/deepbgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.
WARNING 20/03 11:17:46 From /gpfs/space/home/devaraja/.conda/envs/deepbgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
Aborted
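The std::system_error with "Resource temporarily unavailable" usually means thread creation failed under a cluster resource limit. A sketch of what I'd check and try before launching (assumption: the OpenMP/MKL-backed libraries in the environment honor this cap):

```shell
# Check the per-user process/thread limit on the node first with:  ulimit -u
export OMP_NUM_THREADS=1   # cap OpenMP thread pools so TensorFlow/NumPy spawn fewer threads
```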

run Deepbgc in fungal genomes

I see that DeepBGC uses Prodigal for gene prediction. Is there a way to run DeepBGC on fungal genomes using another gene prediction tool?

Thanks,
Juliana

Problem for training DeepBGC from scratch

Hi, Prihoda:
Here I have some problems I want to consult you about. I am trying to train DeepBGC from scratch. Following the "Readme.md" file, I downloaded the MIBiG.pfam.tsv and GeneSwap_Negatives.pfam.tsv files as the positive and negative samples. Then I used the following command to train DeepBGC:

deepbgc train --model deepbgc.json --output MyDeepBGCDetector.pkl MIBiG.pfam.tsv GeneSwap_Negatives.pfam.tsv

But I ran into the following error (screenshot attached):

Why do we need to load "pfam2vec.csv" during training? I think this should be obtained from the trained model.
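For context, the deepbgc.json model config shown in another issue on this page references a Pfam2VecTransformer with vector_path 'pfam2vec.csv': pfam2vec is a pretrained input embedding consumed during training, not a product of it, so it is supplied via --config. A command sketch (the deepbgc invocation is shown as a comment, mirroring the train commands used elsewhere on this page):

```shell
#   deepbgc train --model deepbgc.json --config PFAM2VEC pfam2vec.csv \
#       --output MyDeepBGCDetector.pkl MIBiG.pfam.tsv GeneSwap_Negatives.pfam.tsv
PFAM2VEC_PATH=pfam2vec.csv   # passed in as config, not produced by training
```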

defaults in script docs

It would help to include the default values in the argparse help. Example:

group.add_argument('-s', '--score', default=0.5, type=float,
                            help="Average protein-wise DeepBGC score threshold for extracting BGC regions from Pfam sequences (default: %(default)s).")

Otherwise, the user has to dig through your code to find the defaults.

Error during detector training

hello
I faced a problem during detector training on my sample.

Here is the error message. It says I didn't use a negative dataset, but I actually used one, called GeneSwap_Negatives.pfam.tsv.
I think deepbgc can't see the negative dataset because of an error in my command; --help didn't tell me where or how to put it.

  'optimizer': 'adam',
                  'shuffle': True,
                  'timesteps': 256,
                  'validation_size': 0,
                  'verbose': 1,
                  'weighted': True},
'input_params': {   'features': [   {'type': 'ProteinBorderTransformer'},
                                    {   'type': 'Pfam2VecTransformer',
                                        'vector_path': 'pfam2vec.csv'}]},
'type': 'KerasRNN'}

INFO 15/05 09:28:22 Loaded 41102 samples and 80777 domains from sample1_deepbgc_prepare_result.tsv
INFO 15/05 09:28:28 Loaded 10128 samples and 706950 domains from GeneSwap_Negatives.pfam.tsv
ERROR 15/05 09:28:33 Got target variable with only one value {0} in: ['sample1_deepbgc_prepare_result.tsv', 'GeneSwap_Negatives.pfam.tsv']
Traceback (most recent call last):
File "/root/esraa/miniconda3/envs/deepbgcv0.1.29/lib/python3.7/site-packages/deepbgc/main.py", line 113, in main
run(argv)
File "/root/esraa/miniconda3/envs/deepbgcv0.1.29/lib/python3.7/site-packages/deepbgc/main.py", line 102, in run
args.func.run(**args_dict)
File "/root/esraa/miniconda3/envs/deepbgcv0.1.29/lib/python3.7/site-packages/deepbgc/command/train.py", line 60, in run
train_samples, train_y = util.read_samples(inputs, target)
File "/root/esraa/miniconda3/envs/deepbgcv0.1.29/lib/python3.7/site-packages/deepbgc/util.py", line 574, in read_samples
'Did you provide positive and negative samples?')
ValueError: ("Got target variable with only one value {0} in: ['sample1_deepbgc_prepare_result.tsv', 'GeneSwap_Negatives.pfam.tsv']", 'At least two values are required to train a model. ', 'Did you provide positive and negative samples?')
ERROR 15/05 09:28:33 ================================================================================
ERROR 15/05 09:28:33 DeepBGC failed with ValueError: Got target variable with only one value {0} in: ['sample1_deepbgc_prepare_result.tsv', 'GeneSwap_Negatives.pfam.tsv']
ERROR 15/05 09:28:33 ================================================================================
ERROR 15/05 09:28:33 At least two values are required to train a model.
ERROR 15/05 09:28:33 Did you provide positive and negative samples?
ERROR 15/05 09:28:33 ================================================================================

my command:

deepbgc train --model templates/deepbgc.json --output MyDeepBGCDetector.pkl sample1_deepbgc_prepare_result.tsv GeneSwap_Negatives.pfam.tsv --config PFAM2VEC pfam2vec.csv -v ClusterFinder_Annotated_Contigs.full.gbk

how to get a cluster?

Hello! I want to ask how to get clusters from the per-Pfam output of the LSTM model.
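A sketch of the post-processing step, under the assumption that it mirrors the pipeline's --score averaging-and-thresholding behavior (the function name and exact merging rule are illustrative, not DeepBGC's actual code): keep Pfam positions whose score clears the threshold and merge consecutive positives into candidate BGC regions.

```python
def extract_regions(scores, threshold=0.5):
    """Merge runs of per-Pfam scores >= threshold into (start, end) index pairs."""
    regions, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i                      # open a new candidate region
        elif score < threshold and start is not None:
            regions.append((start, i - 1))  # close the region at the last positive
            start = None
    if start is not None:                  # region running to the end of the sequence
        regions.append((start, len(scores) - 1))
    return regions

# Two runs of above-threshold Pfams -> two candidate clusters:
assert extract_regions([0.1, 0.8, 0.9, 0.2, 0.7]) == [(1, 2), (4, 4)]
```

In the real pipeline, short regions can additionally be filtered out, e.g. by a minimum number of biosynthetic domains.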

Test run not working

Hi, I'm trying to implement this pipeline in my work, but I'm facing problems. The pipeline was conda-installed according to the instructions in your GitHub. Here's the error:

$ deepbgc --help
Traceback (most recent call last):
File "/apps/deepbgc/0.1.18/bin/deepbgc", line 6, in <module>
from deepbgc.main import main
File "/apps/deepbgc/0.1.18/lib/python3.7/site-packages/deepbgc/__init__.py", line 2, in <module>
from .pipeline import DeepBGCClassifier, DeepBGCDetector, HmmscanPfamRecordAnnotator, DeepBGCAnnotator, ProdigalProteinRecordAnnotator
File "/apps/deepbgc/0.1.18/lib/python3.7/site-packages/deepbgc/pipeline/__init__.py", line 1, in <module>
from .classifier import DeepBGCClassifier
File "/apps/deepbgc/0.1.18/lib/python3.7/site-packages/deepbgc/pipeline/classifier.py", line 4, in <module>
from deepbgc import util
File "/apps/deepbgc/0.1.18/lib/python3.7/site-packages/deepbgc/util.py", line 18, in <module>
from Bio.Alphabet import SingleLetterAlphabet, generic_dna
File "/apps/deepbgc/0.1.18/lib/python3.7/site-packages/Bio/Alphabet/__init__.py", line 21, in <module>
"Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information."
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.

Thanks in advance,

Alejandro

bug in get_proteins_by_id affecting pfam annotator

First of all thanks for developing DeepBGC and making it available to the community.

I came across a bug in HmmscanPfamRecordAnnotator when generating the proteins_by_id dictionary. The util function get_proteins_by_id currently loops through all the potential protein ids of a feature (e.g. unique_protein_id, protein_id, and locus_tag). This can cause a feature whose id is based on the protein_id qualifier to be overwritten by another feature that shares the same protein_id but was deduplicated using unique_protein_id. As a result, PFAM_domain features are placed incorrectly on the genomic sequence, because the protein_id used in the hmmscan output file matches a different feature and picks up the incorrect feature location.
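A minimal reproduction of the overwrite and a possible fix (the dictionary shapes and names here are illustrative, not DeepBGC's actual API):

```python
# Two features deduplicated by unique_protein_id but sharing a protein_id:
features = [
    {"unique_protein_id": "prot_1", "protein_id": "ABC123", "start": 100},
    {"unique_protein_id": "prot_2", "protein_id": "ABC123", "start": 900},
]

# Current behavior: indexing each feature under every candidate id lets the
# later feature clobber the earlier one under the shared key "ABC123".
proteins_by_id = {}
for feature in features:
    for key in ("unique_protein_id", "protein_id"):
        proteins_by_id[feature[key]] = feature
assert proteins_by_id["ABC123"]["start"] == 900  # wrong feature wins the lookup

# Fix sketch: only register an id if it is not already taken, so the first
# (correct) owner of a shared protein_id keeps it.
fixed = {}
for feature in features:
    for key in ("unique_protein_id", "protein_id"):
        fixed.setdefault(feature[key], feature)
assert fixed["ABC123"]["start"] == 100
```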

DeepBGC failed with KeyError: pfam_id

Hi,

I sometimes get an error:

INFO    05/03 10:31:02   Loading model from: /home/art/.local/share/deepbgc/data/0.1.0/classifier/product_activity.pkl
INFO    05/03 10:31:02   Processing input file 1/1: NZ_PZDX01000459.gbk
INFO    05/03 10:31:02   ================================================================================
INFO    05/03 10:31:02   Processing record #1: NZ_PZDX01000459
INFO    05/03 10:31:02   Sequence already contains 2 CDS features, skipping CDS detection
INFO    05/03 10:31:02   Reusing already existing HMMER hmmscan result: NZ_PZDX01000459/tmp/NZ_PZDX01000459.pfam.domtbl.txt
INFO    05/03 10:31:02   Added 1 Pfam domains (1 unique PFAM_IDs)
INFO    05/03 10:31:02   Detecting BGCs using deepbgc model in NZ_PZDX01000459
INFO    05/03 10:31:02   Detected 1 BGCs using deepbgc model in NZ_PZDX01000459
INFO    05/03 10:31:02   Classifying 1 BGCs using product_class model in NZ_PZDX01000459
ERROR   05/03 10:31:02   pfam_id
Traceback (most recent call last):
  File "/home/art/miniconda3/envs/deepbgc/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'pfam_id'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/art/miniconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/main.py", line 113, in main
    run(argv)
  File "/home/art/miniconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/main.py", line 102, in run
    args.func.run(**args_dict)
  File "/home/art/miniconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/command/pipeline.py", line 177, in run
    step.run(record)
  File "/home/art/miniconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/pipeline/classifier.py", line 41, in run
    class_scores = self.model.predict(cluster_pfam_sequences)
  File "/home/art/miniconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/models/wrapper.py", line 119, in predict
    X_list = self.transformer.transform(samples)
  File "/home/art/miniconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/features.py", line 29, in transform
    X_list = [self._transform_sequence(sequence) for sequence in samples]
  File "/home/art/miniconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/features.py", line 29, in <listcomp>
    X_list = [self._transform_sequence(sequence) for sequence in samples]
  File "/home/art/miniconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/features.py", line 38, in _transform_sequence
    return pd.concat([t.transform(sequence) for t in self.transformers], sort=False)
  File "/home/art/miniconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/features.py", line 38, in <listcomp>
    return pd.concat([t.transform(sequence) for t in self.transformers], sort=False)
  File "/home/art/miniconda3/envs/deepbgc/lib/python3.7/site-packages/deepbgc/features.py", line 204, in transform
    values = pd.get_dummies(X[self.column]).reindex(columns=self.unique_values, fill_value=0)
  File "/home/art/miniconda3/envs/deepbgc/lib/python3.7/site-packages/pandas/core/frame.py", line 3024, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/art/miniconda3/envs/deepbgc/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'pfam_id'
ERROR   05/03 10:31:02   ================================================================================
ERROR   05/03 10:31:02   DeepBGC failed with KeyError: pfam_id
ERROR   05/03 10:31:02   ================================================================================

What is the problem here?
Attached is the extracted contig that is throwing this error.

NZ_PZDX01000459.zip

Thanks!

how to create negative samples?

"To generate a single negative sample, a random reference bacterium and a random sample from the positive ClusterFinder set were selected. Each gene in the positive sample was replaced with a random gene from the reference bacteria, while considering only 1% of genes that were most similar in number of Pfam domains. In total, three samples were generated from each reference bacteria, producing 10 128 negative samples."

In theory, many negative samples could be created. Have you tried generating other numbers of negative samples for training?
Also, I can't figure out how to create a negative sample. In the description above, the operation seems to act on the positive sample rather than on the reference bacteria. After removing the regions similar to MIBiG, how did you create the negative samples?
Thank you very much!
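My reading of the quoted paragraph, as a sketch (the helper name and data shapes are hypothetical): for each gene in the positive sample, pick a replacement from the 1% of reference genes that are closest in Pfam-domain count.

```python
import random

def swap_genes(positive_domain_counts, reference_domain_counts, top_frac=0.01, seed=0):
    """Return indices of reference genes forming one gene-swapped negative sample."""
    rng = random.Random(seed)
    swapped = []
    for count in positive_domain_counts:
        # Rank reference genes by similarity in Pfam-domain count...
        ranked = sorted(range(len(reference_domain_counts)),
                        key=lambda i: abs(reference_domain_counts[i] - count))
        # ...and sample uniformly from the top 1% most similar ones.
        pool = ranked[:max(1, int(len(ranked) * top_frac))]
        swapped.append(rng.choice(pool))
    return swapped

# Toy example: 3 positive genes swapped against 5 reference genes.
negative_sample = swap_genes([2, 5, 1], [1, 2, 3, 4, 5])
assert len(negative_sample) == 3
```

Under this reading, the "operation object" is indeed the positive sample: its gene order is kept, but every gene's identity is replaced by a reference gene, so the negative sample has realistic domain-count structure without biosynthetic content.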

Conda install error, incomplete output from deepbgc pipeline, and IndexError: tuple index out of range

Hello DeepBGC developers,

I learned about this software from Geoffrey Hannigan's recent microseminar - fantastic. I gave it a quick test: I tried to install it with conda, but that didn't work. Conda output something like this:

UnsatisfiableError: The following specifications were found to be in conflict:
  - deepbgc
Use "conda search <package> --info" to see the dependencies for each package.

Anyway, I proceeded by creating an empty conda environment, conda-installing the dependencies, and pip-installing deepbgc. DeepBGC has worked so far, although there were a lot of deprecation warnings from TensorFlow. I downloaded the trained classifier with deepbgc download. Here is the output of deepbgc info:

 _____                  ____    ____   ____ 
 |  _ \  ___  ___ ____ | __ )  / ___) / ___)
 | | \ \/ _ \/ _ \  _ \|  _ \ | |  _ | |    
 | |_/ /  __/  __/ |_) | |_) || |_| || |___ 
 |____/ \___|\___| ___/|____/  \____| \____)
=================|_|===== version 0.1.7 =====
INFO    27/07 23:14:23   Available data files: ['Pfam-A.31.0.hmm', 'Pfam-A.31.0.hmm.h3m', 'Pfam-A.31.0.hmm.h3f', 'Pfam-A.31.0.hmm.h3i', 'Pfam-A.31.0.hmm.h3p', 'Pfam-A.31.0.clans.tsv']
INFO    27/07 23:14:23   ================================================================================
INFO    27/07 23:14:23   Available detectors: ['clusterfinder_geneborder', 'clusterfinder_retrained', 'deepbgc', 'clusterfinder_original']
INFO    27/07 23:14:23   --------------------------------------------------------------------------------
INFO    27/07 23:14:23   Model: clusterfinder_geneborder
INFO    27/07 23:14:23   Loading model from: /home/zarul/.local/share/deepbgc/data/0.1.0/detector/clusterfinder_geneborder.pkl
INFO    27/07 23:14:23   Type: GeneBorderHMM
INFO    27/07 23:14:23   Version: 0.1.0
INFO    27/07 23:14:23   Timestamp: 1551449863.1564941 (2019-03-01T22:17:43.156494)
INFO    27/07 23:14:23   --------------------------------------------------------------------------------
INFO    27/07 23:14:23   Model: clusterfinder_retrained
INFO    27/07 23:14:23   Loading model from: /home/zarul/.local/share/deepbgc/data/0.1.0/detector/clusterfinder_retrained.pkl
INFO    27/07 23:14:23   Type: DiscreteHMM
INFO    27/07 23:14:23   Version: 0.1.0
INFO    27/07 23:14:23   Timestamp: 1551449925.734045 (2019-03-01T22:18:45.734045)
INFO    27/07 23:14:23   --------------------------------------------------------------------------------
INFO    27/07 23:14:23   Model: deepbgc
INFO    27/07 23:14:23   Loading model from: /home/zarul/.local/share/deepbgc/data/0.1.0/detector/deepbgc.pkl
Using TensorFlow backend.
/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING: Logging before flag parsing goes to stderr.
W0727 23:14:23.820647 140009826789184 deprecation_wrapper.py:119] From /home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

W0727 23:14:23.829619 140009826789184 deprecation_wrapper.py:119] From /home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W0727 23:14:23.830600 140009826789184 deprecation_wrapper.py:119] From /home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

W0727 23:14:24.130435 140009826789184 deprecation_wrapper.py:119] From /home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

W0727 23:14:24.134406 140009826789184 deprecation.py:506] From /home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
W0727 23:14:24.515460 140009826789184 deprecation_wrapper.py:119] From /home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

I0727 23:14:24.652741 140009826789184 info.py:26] Type: KerasRNN
I0727 23:14:24.652947 140009826789184 info.py:27] Version: 0.1.0
I0727 23:14:24.653012 140009826789184 info.py:28] Timestamp: 1551305667.986168 (2019-02-28T06:14:27.986168)
I0727 23:14:24.653129 140009826789184 info.py:22] --------------------------------------------------------------------------------
I0727 23:14:24.653177 140009826789184 info.py:23] Model: clusterfinder_original
I0727 23:14:24.653223 140009826789184 wrapper.py:174] Loading model from: /home/zarul/.local/share/deepbgc/data/0.1.0/detector/clusterfinder_original.pkl
I0727 23:14:24.655262 140009826789184 info.py:26] Type: ClusterFinderHMM
I0727 23:14:24.655326 140009826789184 info.py:27] Version: 0.1.0
I0727 23:14:24.655377 140009826789184 info.py:28] Timestamp: 1551449904.5101252 (2019-03-01T22:18:24.510125)
I0727 23:14:24.655647 140009826789184 info.py:67] ================================================================================
I0727 23:14:24.655702 140009826789184 info.py:68] Available classifiers: ['product_class', 'product_activity']
I0727 23:14:24.655784 140009826789184 info.py:22] --------------------------------------------------------------------------------
I0727 23:14:24.655827 140009826789184 info.py:23] Model: product_class
I0727 23:14:24.655869 140009826789184 wrapper.py:174] Loading model from: /home/zarul/.local/share/deepbgc/data/0.1.0/classifier/product_class.pkl
/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.18.2 when using version 0.21.2. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/sklearn/base.py:306: UserWarning: Trying to unpickle estimator RandomForestClassifier from version 0.18.2 when using version 0.21.2. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
I0727 23:14:24.696886 140009826789184 info.py:26] Type: RandomForestClassifier
I0727 23:14:24.696993 140009826789184 info.py:27] Version: 0.1.0
I0727 23:14:24.697047 140009826789184 info.py:28] Timestamp: 1551781410.019103 (2019-03-05T18:23:30.019103)
I0727 23:14:24.700250 140009826789184 info.py:22] --------------------------------------------------------------------------------
I0727 23:14:24.700344 140009826789184 info.py:23] Model: product_activity
I0727 23:14:24.700432 140009826789184 wrapper.py:174] Loading model from: /home/zarul/.local/share/deepbgc/data/0.1.0/classifier/product_activity.pkl
I0727 23:14:24.710430 140009826789184 info.py:26] Type: RandomForestClassifier
I0727 23:14:24.710538 140009826789184 info.py:27] Version: 0.1.0
I0727 23:14:24.710602 140009826789184 info.py:28] Timestamp: 1551781433.886473 (2019-03-05T18:23:53.886473)
I0727 23:14:24.710908 140009826789184 info.py:78] ================================================================================
I0727 23:14:24.710979 140009826789184 info.py:80] All OK

I tried to run deepbgc on a file containing 25 contigs, all above 20 kb, but it didn't complete (I specified only the deepbgc detector). There was a tmp directory containing all the Prodigal and Pfam files, and an empty evaluation directory - not what I expected based on your example results notebook. Here are the last few lines of LOG.txt:

Detected 36 total BGCs using deepbgc model
Number of BGCs with predicted product_class: 
 no confident class: 22
 Polyketide: 8
 Saccharide: 3
 RiPP: 3
 Terpene: 1
 NRP: 1
Number of BGCs with predicted product_activity: 
 antibacterial: 26
 no confident class: 9
 cytotoxic: 1
 antifungal: 1
tuple index out of range
Traceback (most recent call last):
  File "/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/deepbgc/main.py", line 113, in main
    run(argv)
  File "/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/deepbgc/main.py", line 102, in run
    args.func.run(**args_dict)
  File "/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/deepbgc/command/pipeline.py", line 187, in run
    writer.close()
  File "/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/deepbgc/output/evaluation/pfam_score_plot.py", line 28, in close
    self.save_plot()
  File "/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/deepbgc/output/evaluation/pfam_score_plot.py", line 59, in save_plot
    axes[i].plot(x, y, lw=0.75, alpha=0.6, color=color, label=column)
  File "/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/matplotlib/axes/_axes.py", line 1666, in plot
    lines = [*self._get_lines(*args, data=data, **kwargs)]
  File "/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/matplotlib/axes/_base.py", line 225, in __call__
    yield from self._plot_args(this, kwargs)
  File "/home/zarul/Zarul/Software/anaconda3/envs/bgc/lib/python3.7/site-packages/matplotlib/axes/_base.py", line 399, in _plot_args
    ncx, ncy = x.shape[1], y.shape[1]
IndexError: tuple index out of range
================================================================================
DeepBGC failed with IndexError: tuple index out of range
================================================================================

Thank you.

No module named 'sklearn.ensemble.forest'

EDIT:

I fixed the issue by downgrading scikit-learn to version 0.21 (conda install scikit-learn=0.21). sklearn.ensemble.forest is deprecated in versions >= 0.22, which use sklearn.ensemble._forest instead.
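For context, the module path changed between releases: the public sklearn.ensemble.forest became the private sklearn.ensemble._forest from 0.22 on, which is why a model pickled against the old path fails to load. A tiny stdlib-only helper (a sketch, not an sklearn API) to check whether a given version string still has the old public path:

```python
def old_forest_module_expected(sklearn_version):
    """Return True if this scikit-learn version still exposes the public
    sklearn.ensemble.forest module (privatized to sklearn.ensemble._forest
    from 0.22 on). Pure 'major.minor' string comparison; a sketch, not an
    official sklearn API."""
    major, minor = (int(x) for x in sklearn_version.split(".")[:2])
    return (major, minor) < (0, 22)
```

A guard like this can be used to fail fast with a clear message ("pin scikit-learn<0.22") instead of an opaque ModuleNotFoundError when unpickling.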

Hi,

I created a conda environment and installed deepbgc. The command I used was:

conda create -n deepbgc deepbgc

I then ran deepbgc download followed by deepbgc info. I get a warning:

INFO 20/01 14:38:43 Type: KerasRNN
INFO 20/01 14:38:43 Version: 0.1.0
INFO 20/01 14:38:43 Timestamp: 1551305667.986168 (2019-02-27T22:14:27.986168)
INFO 20/01 14:38:43 ================================================================================
INFO 20/01 14:38:43 Available classifiers: ['product_activity', 'product_class']
INFO 20/01 14:38:43 --------------------------------------------------------------------------------
INFO 20/01 14:38:43 Model: product_activity
INFO 20/01 14:38:43 Loading model from: /home/nick/.local/share/deepbgc/data/0.1.0/classifier/product_activity.pkl
WARNING 20/01 14:38:43 Model not supported: No module named 'sklearn.ensemble.forest'
INFO 20/01 14:38:43 --------------------------------------------------------------------------------
INFO 20/01 14:38:43 Model: product_class
INFO 20/01 14:38:43 Loading model from: /home/nick/.local/share/deepbgc/data/0.1.0/classifier/product_class.pkl
WARNING 20/01 14:38:43 Model not supported: No module named 'sklearn.ensemble.forest'
INFO 20/01 14:38:43 ================================================================================
WARNING 20/01 14:38:43 Some warnings detected, check the output above

When I try the pipeline, I get an error:

ERROR 20/01 14:43:59 ================================================================================
ERROR 20/01 14:43:59 DeepBGC failed with ModuleNotFoundError: No module named 'sklearn.ensemble.forest'
ERROR 20/01 14:43:59 ================================================================================

The deepbgc package installed includes:

The following NEW packages will be INSTALLED:

_libgcc_mutex conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
_openmp_mutex conda-forge/linux-64::_openmp_mutex-4.5-1_gnu
absl-py conda-forge/linux-64::absl-py-0.11.0-py37h89c1867_0
appdirs conda-forge/noarch::appdirs-1.4.4-pyh9f0ad1d_0
astor conda-forge/noarch::astor-0.8.1-pyh9f0ad1d_0
biopython conda-forge/linux-64::biopython-1.76-py37h516909a_0
c-ares conda-forge/linux-64::c-ares-1.17.1-h36c2ea0_0
ca-certificates conda-forge/linux-64::ca-certificates-2020.12.5-ha878542_0
cached-property conda-forge/noarch::cached-property-1.5.1-py_0
certifi conda-forge/linux-64::certifi-2020.12.5-py37h89c1867_1
cycler conda-forge/noarch::cycler-0.10.0-py_2
deepbgc bioconda/noarch::deepbgc-0.1.21-py_0
freetype conda-forge/linux-64::freetype-2.10.4-h0708190_1
gast conda-forge/noarch::gast-0.4.0-pyh9f0ad1d_0
google-pasta conda-forge/noarch::google-pasta-0.2.0-pyh8c360ce_0
grpcio conda-forge/linux-64::grpcio-1.34.1-py37hb27c1af_0
h5py conda-forge/linux-64::h5py-3.1.0-nompi_py37h1e651dc_100
hdf5 conda-forge/linux-64::hdf5-1.10.6-nompi_h6a2412b_1114
hmmer bioconda/linux-64::hmmer-3.3.1-he1b5a44_0
importlib-metadata conda-forge/linux-64::importlib-metadata-3.4.0-py37h89c1867_0
joblib conda-forge/noarch::joblib-1.0.0-pyhd8ed1ab_0
jpeg conda-forge/linux-64::jpeg-9d-h36c2ea0_0
keras conda-forge/linux-64::keras-2.2.5-py37_1
keras-applications conda-forge/noarch::keras-applications-1.0.8-py_1
keras-preprocessi~ conda-forge/noarch::keras-preprocessing-1.1.2-pyhd8ed1ab_0
kiwisolver conda-forge/linux-64::kiwisolver-1.3.1-py37h2527ec5_1
krb5 conda-forge/linux-64::krb5-1.17.2-h926e7f8_0
lcms2 conda-forge/linux-64::lcms2-2.11-hcbb858e_1
ld_impl_linux-64 conda-forge/linux-64::ld_impl_linux-64-2.35.1-hea4e1c9_1
libblas conda-forge/linux-64::libblas-3.9.0-7_openblas
libcblas conda-forge/linux-64::libcblas-3.9.0-7_openblas
libcurl conda-forge/linux-64::libcurl-7.71.1-hcdd3856_8
libedit conda-forge/linux-64::libedit-3.1.20191231-he28a2e2_2
libev conda-forge/linux-64::libev-4.33-h516909a_1
libffi conda-forge/linux-64::libffi-3.3-h58526e2_2
libgcc-ng conda-forge/linux-64::libgcc-ng-9.3.0-h2828fa1_18
libgfortran-ng conda-forge/linux-64::libgfortran-ng-9.3.0-hff62375_18
libgfortran5 conda-forge/linux-64::libgfortran5-9.3.0-hff62375_18
libgomp conda-forge/linux-64::libgomp-9.3.0-h2828fa1_18
libgpuarray conda-forge/linux-64::libgpuarray-0.7.6-h14c3975_1003
liblapack conda-forge/linux-64::liblapack-3.9.0-7_openblas
libnghttp2 conda-forge/linux-64::libnghttp2-1.41.0-h8cfc5f6_2
libopenblas conda-forge/linux-64::libopenblas-0.3.12-pthreads_h4812303_1
libpng conda-forge/linux-64::libpng-1.6.37-h21135ba_2
libprotobuf conda-forge/linux-64::libprotobuf-3.14.0-h780b84a_0
libssh2 conda-forge/linux-64::libssh2-1.9.0-hab1572f_5
libstdcxx-ng conda-forge/linux-64::libstdcxx-ng-9.3.0-h6de172a_18
libtiff conda-forge/linux-64::libtiff-4.2.0-hdc55705_0
libwebp-base conda-forge/linux-64::libwebp-base-1.1.0-h36c2ea0_3
lz4-c conda-forge/linux-64::lz4-c-1.9.3-h9c3ff4c_0
mako conda-forge/noarch::mako-1.1.4-pyh44b312d_0
markdown conda-forge/noarch::markdown-3.3.3-pyh9f0ad1d_0
markupsafe conda-forge/linux-64::markupsafe-1.1.1-py37h5e8e339_3
matplotlib-base conda-forge/linux-64::matplotlib-base-3.3.3-py37h0c9df89_0
ncurses conda-forge/linux-64::ncurses-6.2-h58526e2_4
numpy conda-forge/linux-64::numpy-1.19.5-py37haa41c4c_1
olefile conda-forge/noarch::olefile-0.46-pyh9f0ad1d_1
openssl conda-forge/linux-64::openssl-1.1.1i-h7f98852_0
pandas conda-forge/linux-64::pandas-1.2.0-py37hdc94413_1
pillow conda-forge/linux-64::pillow-8.1.0-py37he6b4880_1
pip conda-forge/noarch::pip-20.3.3-pyhd8ed1ab_0
prodigal bioconda/linux-64::prodigal-2.6.3-h516909a_2
protobuf conda-forge/linux-64::protobuf-3.14.0-py37hcd2ae1e_1
pygpu conda-forge/linux-64::pygpu-0.7.6-py37h902c9e0_1002
pyparsing conda-forge/noarch::pyparsing-2.4.7-pyh9f0ad1d_0
python conda-forge/linux-64::python-3.7.9-hffdb5ce_0_cpython
python-dateutil conda-forge/noarch::python-dateutil-2.8.1-py_0
python_abi conda-forge/linux-64::python_abi-3.7-1_cp37m
pytz conda-forge/noarch::pytz-2020.5-pyhd8ed1ab_0
pyyaml conda-forge/linux-64::pyyaml-5.3.1-py37h5e8e339_2
readline conda-forge/linux-64::readline-8.0-he28a2e2_2
scikit-learn conda-forge/linux-64::scikit-learn-0.24.1-py37h69acf81_0
scipy conda-forge/linux-64::scipy-1.6.0-py37h14a347d_0
setuptools conda-forge/linux-64::setuptools-49.6.0-py37h89c1867_3
six conda-forge/noarch::six-1.15.0-pyh9f0ad1d_0
sqlite conda-forge/linux-64::sqlite-3.34.0-h74cdb3f_0
tensorboard conda-forge/linux-64::tensorboard-1.14.0-py37_0
tensorflow conda-forge/linux-64::tensorflow-1.14.0-h4531e10_0
tensorflow-base conda-forge/linux-64::tensorflow-base-1.14.0-py37h4531e10_0
tensorflow-estima~ conda-forge/linux-64::tensorflow-estimator-1.14.0-py37h5ca1d4c_0
termcolor conda-forge/noarch::termcolor-1.1.0-py_2
theano conda-forge/linux-64::theano-1.0.3-py37hfc679d8_1
threadpoolctl conda-forge/noarch::threadpoolctl-2.1.0-pyh5ca1d4c_0
tk conda-forge/linux-64::tk-8.6.10-h21135ba_1
toolchain pkgs/main/linux-64::toolchain-2.4.0-0
toolchain_c_linux~ pkgs/main/linux-64::toolchain_c_linux-64-2.4.0-0
toolchain_cxx_lin~ pkgs/main/linux-64::toolchain_cxx_linux-64-2.4.0-0
tornado conda-forge/linux-64::tornado-6.1-py37h5e8e339_1
typing_extensions conda-forge/noarch::typing_extensions-3.7.4.3-py_0
werkzeug conda-forge/noarch::werkzeug-1.0.1-pyh9f0ad1d_0
wheel conda-forge/noarch::wheel-0.36.2-pyhd3deb0d_0
wrapt conda-forge/linux-64::wrapt-1.12.1-py37h5e8e339_3
xz conda-forge/linux-64::xz-5.2.5-h516909a_1
yaml conda-forge/linux-64::yaml-0.2.5-h516909a_0
zipp conda-forge/noarch::zipp-3.4.0-py_0
zlib conda-forge/linux-64::zlib-1.2.11-h516909a_1010
zstd conda-forge/linux-64::zstd-1.4.8-ha95c52a_1

The scikit-learn version is the latest (0.24.1) - was the model trained using this version or an older one?

using deepBGC with metagenomes

Dear users, I wonder if I can use deepBGC with metagenomic samples? In the paper describing the software it is mentioned as a useful tool for this kind of data, but I don't know whether that is implemented in the current version. I ran a test with a sample (CPB-18), the scaffolds file obtained from SPAdes, and it quickly returned 0 matches. I don't understand whether this is a matter of the input format I used or something else. The same file returned several matches (BGCs) with antiSMASH.

I noticed these lines while running it

/mnt/ubi/andres/miniconda3/envs/deepbio/lib/python3.7/site-packages/sklearn/utils/deprecation.py:143: FutureWarning: The sklearn.tree.tree module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.tree. Anything that cannot be imported from sklearn.tree is now part of the private API.
  warnings.warn(message, FutureWarning)
/mnt/ubi/andres/miniconda3/envs/deepbio/lib/python3.7/site-packages/sklearn/base.py:334: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.18.2 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/mnt/ubi/andres/miniconda3/envs/deepbio/lib/python3.7/site-packages/sklearn/base.py:334: UserWarning: Trying to unpickle estimator RandomForestClassifier from version 0.18.2 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.

Before that, I ran the BGC sample included in the test folder and obtained 2 hits, as seen in the log attached here (BGC15 file).

Maybe I have a broken install of the program, I followed the conda instructions.
Please find attached the log from deepbgc info too.
pipeinfo.txt

BGC15.txt
sample.txt

Thanks for your help.

Installing 0.1.26 with Anaconda

I'm getting a failure when installing deepbgc 0.1.26 using Anaconda. It fails because no version of matplotlib-base==2.2.3 is available.

Is that version really a hard requirement?

------

[9/9] RUN conda create -n dbgcenv -c bioconda -y deepbgc=0.1.26:
#13 0.684 Collecting package metadata (current_repodata.json): ...working... done
#13 3.771 Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
#13 4.193 Collecting package metadata (repodata.json): ...working... done
#13 14.16 Solving environment: ...working...
#13 23.31 Found conflicts! Looking for incompatible packages.
#13 23.31 This can take several minutes. Press CTRL-C to abort.
#13 23.31 failed
#13 23.31
#13 23.31 PackagesNotFoundError: The following packages are not available from current channels:
#13 23.31
#13 23.31 - matplotlib-base==2.2.3
#13 23.31
#13 23.31 Current channels:
#13 23.31
#13 23.31 - https://conda.anaconda.org/bioconda/linux-64
#13 23.31 - https://conda.anaconda.org/bioconda/noarch
#13 23.31 - https://repo.anaconda.com/pkgs/main/linux-64
#13 23.31 - https://repo.anaconda.com/pkgs/main/noarch
#13 23.31 - https://repo.anaconda.com/pkgs/r/linux-64
#13 23.31 - https://repo.anaconda.com/pkgs/r/noarch
#13 23.31
#13 23.31 To search for alternate channels that may provide the conda package you're
#13 23.31 looking for, navigate to
#13 23.31
#13 23.31 https://anaconda.org
#13 23.31
#13 23.31 and use the search bar at the top of the page.
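One thing worth trying (an assumption on my part, not a verified fix): bioconda packages are built against conda-forge, and the bioconda documentation requires conda-forge to be in the channel list, so the pinned matplotlib-base build may simply be missing from the channels shown in the error. Adding conda-forge explicitly:

```shell
# bioconda packages expect the conda-forge channel to be available too;
# channel order matters (conda-forge before bioconda is the recommended setup)
conda create -n dbgcenv -c conda-forge -c bioconda deepbgc=0.1.26
```

If the exact pin still cannot be satisfied, installing deepbgc with pip inside a conda environment that provides the other dependencies is a possible fallback.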

Is there other way to download data set?[speed restriction]

Hi, I always run into trouble like the following when I download the model data with deepbgc download.

:~$ deepbgc download
 _____                  ____    ____   ____ 
 |  _ \  ___  ___ ____ | __ )  / ___) / ___)
 | | \ \/ _ \/ _ \  _ \|  _ \ | |  _ | |    
 | |_/ /  __/  __/ |_) | |_) || |_| || |___ 
 |____/ \___|\___| ___/|____/  \____| \____)
=================|_|===== version 0.1.10 =====
INFO    10/10 11:48:16   ================================================================================
INFO    10/10 11:48:16   Checking: /home/mima123456/.local/share/deepbgc/data/0.1.0/detector/deepbgc.pkl
INFO    10/10 11:48:16   Downloading: https://github.com/Merck/deepbgc/releases/download/v0.1.0/deepbgc.pkl
ERROR   10/10 11:57:55   104
Traceback (most recent call last):
  File "/home/mima123456/miniconda3/lib/python3.8/site-packages/deepbgc/main.py", line 113, in main
    run(argv)
  File "/home/mima123456/miniconda3/lib/python3.8/site-packages/deepbgc/main.py", line 102, in run
    args.func.run(**args_dict)
  File "/home/mima123456/miniconda3/lib/python3.8/site-packages/deepbgc/command/download.py", line 20, in run
    util.download_files(DOWNLOADS)
  File "/home/mima123456/miniconda3/lib/python3.8/site-packages/deepbgc/util.py", line 393, in download_files
    download_file(url=url, target_path=target_path, checksum=checksum, gzipped=gzipped)
  File "/home/mima123456/miniconda3/lib/python3.8/site-packages/deepbgc/util.py", line 415, in download_file
    urlretrieve(url, download_path)
  File "/home/mima123456/miniconda3/lib/python3.8/urllib/request.py", line 276, in urlretrieve
    block = fp.read(bs)
  File "/home/mima123456/miniconda3/lib/python3.8/http/client.py", line 454, in read
    n = self.readinto(b)
  File "/home/mima123456/miniconda3/lib/python3.8/http/client.py", line 498, in readinto
    n = self.fp.readinto(b)
  File "/home/mima123456/miniconda3/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/home/mima123456/miniconda3/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/home/mima123456/miniconda3/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer
ERROR   10/10 11:57:55   ================================================================================
ERROR   10/10 11:57:55   DeepBGC failed with ConnectionResetError: 104
ERROR   10/10 11:57:55   ================================================================================
ERROR   10/10 11:57:55   Connection reset by peer
ERROR   10/10 11:57:55   ================================================================================

I don't know why I can't download this data; it seems that my access to GitHub is speed-restricted. Is there any other way to download the data?
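As a workaround for flaky connections, the release file named in the log (deepbgc.pkl, from the GitHub releases URL shown) can be fetched with any downloader that retries, then placed into the target path the log reports (/home/.../deepbgc/data/0.1.0/detector/). A generic retry helper of the kind one might wrap around urllib.request.urlretrieve - a sketch, not part of the deepbgc CLI:

```python
import time

def retry(fn, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn() until it succeeds, backing off exponentially between
    attempts. Retries on OSError, which covers ConnectionResetError
    (errno 104) as seen in the traceback above."""
    for attempt in range(attempts):
        try:
            return fn()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            sleep(base_delay * (2 ** attempt))
```

Usage would be something like `retry(lambda: urlretrieve(url, path))`; the `sleep` parameter is injectable so the backoff can be tested without waiting.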

Prodigal Error

Error LOG part:

== Prodigal Error: ================

PRODIGAL v2.6.3 [February, 2016]
Univ of Tenn / Oak Ridge National Lab
Doug Hyatt, Loren Hauser, et al.

Request: Single Genome, Phase: Training
Reading in the sequence(s) to train...

Error: Sequence must be 20000 characters (only 10000 read).
(Consider running with the -p meta option or finding more contigs from the same genome.)
== End Prodigal Error. ============

Will running Prodigal in meta mode require changes to the source code, or can it be handled from the deepbgc command line?
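For reference, Prodigal's metagenomic mode is selected with its -p flag, so gene calling on short contigs can be run manually outside deepbgc (file names below are placeholders):

```shell
# run Prodigal in metagenomic mode on short contigs (< 20 kb),
# writing gene coordinates and protein translations
prodigal -i contigs.fna -p meta -o genes.gbk -a proteins.faa
```

Recent DeepBGC releases also appear to expose a --prodigal-meta-mode option on the pipeline; check `deepbgc pipeline --help` for your installed version before relying on it.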
