
antismash's People

Contributors

alexamk, friederikebiermann, hyphaltip, judsze, kblin, lisavader, malanjary-wur, sjshaw, zachcp, zreitz


antismash's Issues

Error in Lassopeptide detection code

Describe the bug
Running bacterial genomes in AS5-RC1 results in unexpected termination due to a bug in the lassopeptide prediction code.

Expected behavior
The run completes without crashing in the lassopeptide module.

To Reproduce

Reproducible example here:

# get an example gbk and a script to run docker with the latest tagged release
wget -O bgc.gbk https://mibig.secondarymetabolites.org/repository/BGC0001307/BGC0001307.1.final.gbk
wget https://gist.githubusercontent.com/zachcp/7b5fa4286a5e6f3b08fe638bac4cc5fb/raw/76f6f3ff5e2b7ca9f6b612a54508062a5e1f5080/run_as5rc1.sh

# convert to fasta and run with gene finding
seqmagick convert bgc.gbk bgc.fasta
bash run_as5rc1.sh bgc.fasta .  --genefinding-tool prodigal

Error Message Below

○ → bash run_as5rc1.sh bgc.fasta .  --genefinding-tool prodigal
Traceback (most recent call last):
  File "/usr/local/bin/antismash", line 11, in <module>
    sys.exit(entrypoint())
  File "/usr/local/lib/python3.5/dist-packages/antismash/__main__.py", line 120, in entrypoint
    sys.exit(main(sys.argv[1:]))
  File "/usr/local/lib/python3.5/dist-packages/antismash/__main__.py", line 109, in main
    antismash.run_antismash(sequence, options)
  File "/usr/local/lib/python3.5/dist-packages/antismash/main.py", line 572, in run_antismash
    result = _run_antismash(sequence_file, options)
  File "/usr/local/lib/python3.5/dist-packages/antismash/main.py", line 637, in _run_antismash
    analysis_timings = analyse_record(record, options, analysis_modules, module_results)
  File "/usr/local/lib/python3.5/dist-packages/antismash/main.py", line 263, in analyse_record
    run_module(record, module, options, previous_result, timings)
  File "/usr/local/lib/python3.5/dist-packages/antismash/main.py", line 235, in run_module
    results = module.run_on_record(record, results, options)
  File "/usr/local/lib/python3.5/dist-packages/antismash/modules/lassopeptides/__init__.py", line 75, in run_on_record
    return specific_analysis(record)
  File "/usr/local/lib/python3.5/dist-packages/antismash/modules/lassopeptides/specific_analysis.py", line 716, in specific_analysis
    motif = run_lassopred(record, cluster, candidate)
  File "/usr/local/lib/python3.5/dist-packages/antismash/modules/lassopeptides/specific_analysis.py", line 644, in run_lassopred
    result = determine_precursor_peptide_candidate(record, cluster, query, query.translation)
  File "/usr/local/lib/python3.5/dist-packages/antismash/modules/lassopeptides/specific_analysis.py", line 630, in determine_precursor_peptide_candidate
    valid, rodeo_score = run_rodeo(record, cluster, query, query_sequence[:end], query_sequence[end:])
  File "/usr/local/lib/python3.5/dist-packages/antismash/modules/lassopeptides/specific_analysis.py", line 596, in run_rodeo
    fimo_motifs, motif_score, fimo_scores = identify_lasso_motifs(leader, core)
  File "/usr/local/lib/python3.5/dist-packages/antismash/modules/lassopeptides/specific_analysis.py", line 423, in identify_lasso_motifs
    fimo_output = subprocessing.run_fimo_simple(motif_file, tempfile.name)
  File "/usr/local/lib/python3.5/dist-packages/antismash/common/subprocessing.py", line 299, in run_fimo_simple
    result = execute(command)
  File "/usr/local/lib/python3.5/dist-packages/antismash/common/subprocessing.py", line 91, in execute
    proc = Popen(commands, stdin=stdin_redir, stdout=stdout, stderr=stderr)
  File "/usr/lib/python3.5/subprocess.py", line 676, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.5/subprocess.py", line 1282, in _execute_child
    raise child_exception_type(errno_num, err_msg)
PermissionError: [Errno 13] Permission denied
Running antiSMASH FAILED

System (please complete the following information):
docker AS5 release

Additional context
Reproducible example in docker with public data attached above.
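The traceback ends in a PermissionError raised while launching the fimo subprocess. A minimal hedged check, assuming the binary being launched is the MEME suite's fimo and that it should be on the PATH inside the container:

# Hedged diagnostic: check whether a binary named "fimo" is on the PATH
# and carries the executable bit (the name is an assumption).
import os
import shutil

fimo = shutil.which("fimo")
if fimo is None:
    print("fimo not found on PATH")
elif not os.access(fimo, os.X_OK):
    print("fimo found at {} but it is not executable".format(fimo))
else:
    print("fimo looks usable: {}".format(fimo))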

Clusterblast downloads failing

Describe the bug
I've installed 5-0-0rc1 on a Redhat 6 system using python 3.7.3.

Clusterblast fails to download:

$ env PATH=/disks/patric-common/runtime/bin:$PATH  /disks/patric-common/runtime/antismash-5-0-0rc1/bin/download-antismash-databases --database-dir /vol/patric3/production/data/antismash/antismash-5-0-0rc1
/disks/patric-common/runtime/antismash-5-0-0rc1/lib/python3.7/site-packages/scss/selector.py:54: FutureWarning: Possible nested set at position 329
  ''', re.VERBOSE | re.MULTILINE)
Creating checksum of Pfam-A.hmm
PFAM file present and ok for version 27.0
Creating checksum of Pfam-A.hmm
PFAM file present and ok for version 31.0
Resfams database present and checked
Downloading ClusterBlast database.
Traceback (most recent call last):
  File "/disks/patric-common/runtime/lib/python3.7/urllib/request.py", line 1317, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/disks/patric-common/runtime/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/disks/patric-common/runtime/lib/python3.7/http/client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/disks/patric-common/runtime/lib/python3.7/http/client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/disks/patric-common/runtime/lib/python3.7/http/client.py", line 1016, in _send_output
    self.send(msg)
  File "/disks/patric-common/runtime/lib/python3.7/http/client.py", line 956, in send
    self.connect()
  File "/disks/patric-common/runtime/lib/python3.7/http/client.py", line 1392, in connect
    server_hostname=server_hostname)
  File "/disks/patric-common/runtime/lib/python3.7/ssl.py", line 412, in wrap_socket
    session=session
  File "/disks/patric-common/runtime/lib/python3.7/ssl.py", line 853, in _create
    self.do_handshake()
  File "/disks/patric-common/runtime/lib/python3.7/ssl.py", line 1117, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/disks/patric-common/runtime/antismash-5-0-0rc1/lib/python3.7/site-packages/antismash/download_databases.py", line 82, in download_file
    req = request.urlopen(url)
  File "/disks/patric-common/runtime/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/disks/patric-common/runtime/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/disks/patric-common/runtime/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/disks/patric-common/runtime/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/disks/patric-common/runtime/lib/python3.7/urllib/request.py", line 1360, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/disks/patric-common/runtime/lib/python3.7/urllib/request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/disks/patric-common/runtime/antismash-5-0-0rc1/bin/download-antismash-databases", line 11, in <module>
    load_entry_point('antismash==5.0.0rc1', 'console_scripts', 'download-antismash-databases')()
  File "/disks/patric-common/runtime/antismash-5-0-0rc1/lib/python3.7/site-packages/antismash/download_databases.py", line 352, in _main
    download(args)
  File "/disks/patric-common/runtime/antismash-5-0-0rc1/lib/python3.7/site-packages/antismash/download_databases.py", line 330, in download
    download_clusterblast(args.database_dir)
  File "/disks/patric-common/runtime/antismash-5-0-0rc1/lib/python3.7/site-packages/antismash/download_databases.py", line 307, in download_clusterblast
    download_if_not_present(CLUSTERBLAST_URL, archive_filename, CLUSTERBLAST_ARCHIVE_CHECKSUM)
  File "/disks/patric-common/runtime/antismash-5-0-0rc1/lib/python3.7/site-packages/antismash/download_databases.py", line 207, in download_if_not_present
    download_file(url, filename)
  File "/disks/patric-common/runtime/antismash-5-0-0rc1/lib/python3.7/site-packages/antismash/download_databases.py", line 84, in download_file
    raise DownloadError("ERROR: File not found on server.\nPlease check your internet connection.")
antismash.download_databases.DownloadError: ERROR: File not found on server.
Please check your internet connection.

I have installed the latest certifi package and verified that it holds the root certificate
"DST Root CA X3" that your site uses via Let's Encrypt.

Expected behavior
download-databases pulls the databases properly

To Reproduce
Install 5-0-0rc1 on a Redhat 6 system using python 3.7.3 via a python venv.

Screenshots
Pasted text above

System (please complete the following information):

  • OS: CentOS 6.10
  • Browser: N/A

Additional context

DeepBGC integration

Hi @kblin and team. We are happy about your involvement in the DeepBGC module; would you be interested in integrating the module into antiSMASH?

DeepBGC can be used programmatically, here's an example:

import os

from deepbgc import DeepBGCDetector, DeepBGCClassifier

# point DeepBGC at a local download directory next to this script
os.environ["DEEPBGC_DOWNLOADS_DIR"] = os.path.join(os.path.dirname(__file__), 'data')

detector = DeepBGCDetector('deepbgc', merge_max_protein_gap=1, merge_max_nucl_gap=2000, score_threshold=0.5)
product_classifier = DeepBGCClassifier('product_class')
activity_classifier = DeepBGCClassifier('product_activity')

detector.run(record)
product_classifier.run(record)
activity_classifier.run(record)

If you are interested, please let me know and I will be happy to get involved and discuss next steps.

multiprocessing issue

Hi,
I am doing a full-featured run of antismash 5.1 on a high-quality bacterial genome. I run the following command:

antismash -c 60 --fullhmmer --cf-create-clusters --cb-general --cb-knownclusters --cb-subclusters --asf --pfam2go --smcog-trees --output-dir 3_secondary_metab/C08 C08.gbk

and repeatedly get this warning:

Process ForkPoolWorker-141:
Traceback (most recent call last):
  File "/project02/miniconda3/envs/antismash/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/project02/miniconda3/envs/antismash/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/project02/miniconda3/envs/antismash/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "/project02/miniconda3/envs/antismash/lib/python3.7/multiprocessing/queues.py", line 354, in get
    return _ForkingPickler.loads(res)
  File "/project02/miniconda3/envs/antismash/lib/python3.7/site-packages/antismash/common/secmet/qualifiers/nrps_pks.py", line 103, in extend
    raise NotImplementedError("Extending this list won't work")
NotImplementedError: Extending this list won't work

It doesn't kill the run, but the job never finishes and there is no CPU usage after the last occurrence of the error.
I am using a recent conda installation of antismash on Linux Mint 19.2.

Error loading genbank record using secmet record

Hi guys,
I'm trying to use Record.from_genbank on this file. The run was generated at the beginning of February, and I just installed antismash again using pip (see below for versions).

Trying to load the file above, I get several exceptions. I used to run the function without problems a year or so ago.

from antismash.common.secmet import Record

rec = Record.from_genbank("Amycolatopsis_keratinaphila.gbk")

Traceback (most recent call last):
  File "/.../lib/python3.7/site-packages/antismash/common/secmet/record.py", line 340, in get_domain_by_name
    return self._domains_by_name[name]
KeyError: 'nrpspksdomains_ctg1_2108_Condensation_Starter.1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/.../lib/python3.7/site-packages/antismash/common/secmet/features/module.py", line 201, in from_biopython
    domains = [record.get_domain_by_name(domain) for domain in domain_names]
  File "/.../lib/python3.7/site-packages/antismash/common/secmet/features/module.py", line 201, in <listcomp>
    domains = [record.get_domain_by_name(domain) for domain in domain_names]
  File "/.../lib/python3.7/site-packages/antismash/common/secmet/record.py", line 342, in get_domain_by_name
    raise KeyError("record %s contains no domain named %s" % (self.id, name))
KeyError: 'record NZ_LT629789.1 contains no domain named nrpspksdomains_ctg1_2108_Condensation_Starter.1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/.../lib/python3.7/site-packages/antismash/common/secmet/record.py", line 762, in from_biopython
    record.add_feature(cls.from_biopython(feature, record=record))
  File "/.../lib/python3.7/site-packages/antismash/common/secmet/features/module.py", line 204, in from_biopython
    bio_feature.location, err))
ValueError: record does not contain domain referenced by module at [2331621:2334660](+): 'record NZ_LT629789.1 contains no domain named nrpspksdomains_ctg1_2108_Condensation_Starter.1'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../lib/python3.7/site-packages/antismash/common/secmet/record.py", line 774, in from_genbank
    records.append(Record.from_biopython(bio, taxon))
  File "/.../lib/python3.7/site-packages/antismash/common/secmet/record.py", line 764, in from_biopython
    raise SecmetInvalidInputError(str(err)) from err
antismash.common.secmet.errors.SecmetInvalidInputError: record does not contain domain referenced by module at [2331621:2334660](+): 'record NZ_LT629789.1 contains no domain named nrpspksdomains_ctg1_2108_Condensation_Starter.1'
pip freeze
antismash==5.1.1
bcbio-gff==0.6.6
biopython==1.76
cycler==0.10.0
helperlibs==0.1.11
Jinja2==2.11.1
joblib==0.14.1
kiwisolver==1.1.0
MarkupSafe==1.1.1
matplotlib==3.1.3
numpy==1.18.1
pyparsing==2.4.6
pyScss==1.3.5
pysvg-py3==0.2.2.post3
python-dateutil==2.8.1
scikit-learn==0.22.1
scipy==1.4.1
six==1.14.0

Warning from conda install.

Thank you very much for your efforts in getting antiSMASH v5 onto bioconda.

When running the new antismash via conda, however, I get the warning message below. Does it indicate errors that will happen in the future?

/home/lamma/miniconda3/envs/antismash_v5/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=FutureWarning)
/home/lamma/miniconda3/envs/antismash_v5/lib/python3.7/site-packages/scss/selector.py:54: FutureWarning: Possible nested set at position 329
  ''', re.VERBOSE | re.MULTILINE)

Add back NRPS Predictor3 to the AD domain substrate prediction step

Is your feature request related to a problem? Please describe.
We've noticed on a few clusters we've been working on that the Adenylation domain substrate predictions differ in AS5 relative to AS4.

Describe the solution you'd like
Would it be possible to restore the use of NRPSPredictor3 to the AS5 pipeline?

Describe alternatives you've considered

  1. Use AS4 and AS5 in parallel on the same sequences
  2. run NRPSPredictor3 ourselves

Additional context
Here is a breakdown of the Adenylation domain substrate differences between AS4 and AS5 on the mibig dataset: substrate_prediction_comparison. It seems most of the differences between the two programs are related to no_calls, which I would guess is due to differences in acceptance thresholds. However, there is also a small subset of sequences where the calls themselves differ.

ClusterBlast in antismash 5 returns only bacterial hits for fungal genomes

Describe the bug
ClusterBlast in antismash 5 returns only bacterial hits for fungal genomes

Expected behavior
ClusterBlast in antismash 4 returned fungal hits for fungal genomes

To Reproduce
Job: fungi-0d30459c-21b0-469a-84cf-146b64dcaebc

Two screenshots are attached for the same cluster, one from antismash4 and one from antismash5 (images omitted here).

The problem applies to all clusters :(

[known, sub]clusterblast results not shown if run locally

When run locally, antiSMASH 5 still retains the same bug as antiSMASH 4's clusterblast module: the SVGs are not shown in the HTML output because loading them violates the cross-origin resource sharing (CORS) policy.
Given that we now also store the clusterblast result as an object, would it be a good idea to reimplement the HTML visualization, e.g. by re-drawing the SVGs inline?

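A common interim workaround, assuming the SVGs are blocked only because the report is opened via file://: serve the output directory over HTTP so the browser loads the clusterblast SVGs from the same origin. A minimal sketch:

# Hedged workaround: serve the antiSMASH output directory locally, equivalent
# to running "python3 -m http.server" from inside that directory.
import http.server
import os
import socketserver

os.chdir("path/to/antismash/output")  # hypothetical output directory
handler = http.server.SimpleHTTPRequestHandler
with socketserver.TCPServer(("", 8000), handler) as httpd:
    httpd.serve_forever()  # then open http://localhost:8000/ in a browser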

No geneclusters.txt in the new version?

I looked for this file in my Antismash outputs but didn't find it.
Is there an option to generate this file, or is it no longer available in the new version?
Do you have any post-processing script to produce it?

Thank you in advance.

handling gbk files with leader sequence

Describe the bug
@kblin I have been following your suggestion of using secmet to extract info from the genbank files (see #216). This works for most files, but fails for files which contain a leader annotation, for example: prepeptide=leader, leader_location="[56991:57051](-)

For such files, I get the error

antismash.common.secmet.errors.SecmetInvalidInputError: Features that bridge the record origin cannot be directly created: join{[56991:57051](-)}

Expected behavior
These files should be parsed like others which don't have this string
Attached is a file which can help reproduce this error
CS1EDSR2D_564265.region001.gbk.txt

genefinding disabled

Describe the bug
I'm trying to run antiSMASH with the Docker image that was downloaded today around 1500hrs (Germany Time) as follows:

bash run_antismash-full input.fna '/output/folder' --cb-general --cb-knownclusters --cb-subclusters --asf --pfam2go

However, I keep getting the following error:

WARNING  12/07 13:14:50   Fasta header too long: renamed "Chlamy10_contig_108" to "c00108_Chlamy1.."
. . . many of these . . .
WARNING  12/07 13:14:50   Fasta header too long: renamed "Chlamy10_contig_109" to "c00109_Chlamy1.."

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/usr/local/lib/python3.5/dist-packages/antismash/common/record_processing.py", line 239, in ensure_cds_info
    genefinding(sequence, options)
  File "/usr/local/lib/python3.5/dist-packages/antismash/detection/genefinding/__init__.py", line 93, in run_on_record
    raise ValueError("Called find_genes, but genefinding disabled")
ValueError: Called find_genes, but genefinding disabled
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/antismash", line 11, in <module>
    sys.exit(entrypoint())
  File "/usr/local/lib/python3.5/dist-packages/antismash/__main__.py", line 124, in entrypoint
    sys.exit(main(sys.argv[1:]))
  File "/usr/local/lib/python3.5/dist-packages/antismash/__main__.py", line 113, in main
    antismash.run_antismash(sequence, options)
  File "/usr/local/lib/python3.5/dist-packages/antismash/main.py", line 574, in run_antismash
    result = _run_antismash(sequence_file, options)
  File "/usr/local/lib/python3.5/dist-packages/antismash/main.py", line 628, in _run_antismash
    cast(AntismashModule, genefinding))
  File "/usr/local/lib/python3.5/dist-packages/antismash/common/record_processing.py", line 388, in pre_process_sequences
    sequences = parallel_function(partial, ([sequence] for sequence in sequences))
  File "/usr/local/lib/python3.5/dist-packages/antismash/common/subprocessing/base.py", line 132, in parallel_function
    results = jobs.get(timeout=timeout)
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
ValueError: Called find_genes, but genefinding disabled
Running antiSMASH FAILED

I'm fairly certain I've installed Docker and the image correctly, as I followed the instructions on the docs pages.

System (please complete the following information):

  • OS: Ubuntu 18.04

Error during Java version check

Antismash fails the prerequisite check during installation with the following traceback:

silwer@ibch@dna ~ $ antismash --check-prereqs
/usr/local/lib/python3.6/dist-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=FutureWarning)
Traceback (most recent call last):
  File "/home/domain/silwer/.local/bin/antismash", line 8, in <module>
    sys.exit(entrypoint())
  File "/home/domain/silwer/.local/lib/python3.6/site-packages/antismash/__main__.py", line 126, in entrypoint
    sys.exit(main(sys.argv[1:]))
  File "/home/domain/silwer/.local/lib/python3.6/site-packages/antismash/__main__.py", line 115, in main
    antismash.run_antismash(sequence, options)
  File "/home/domain/silwer/.local/lib/python3.6/site-packages/antismash/main.py", line 612, in run_antismash
    result = _run_antismash(sequence_file, options)
  File "/home/domain/silwer/.local/lib/python3.6/site-packages/antismash/main.py", line 619, in _run_antismash
    _log_found_executables(options)
  File "/home/domain/silwer/.local/lib/python3.6/site-packages/antismash/main.py", line 718, in _log_found_executables
    version = " ({})".format(version_getter())  # pylint: disable=not-callable
  File "/home/domain/silwer/.local/lib/python3.6/site-packages/antismash/common/subprocessing/java.py", line 21, in run_java_version
    raise RuntimeError(msg % java)
RuntimeError: unexpected output from java: /usr/bin/java, check path

though java works just fine:

silwer@ibch@dna ~ $ java --version
Picked up _JAVA_OPTIONS: -Dswing.defaultlaf=com.sun.java.swing.plaf.gtk.GTKLookAndFeel -Dawt.useSystemAAFontSettings=on
openjdk 11.0.6 2020-01-14
OpenJDK Runtime Environment (build 11.0.6+10-post-Ubuntu-1ubuntu118.04.1)
OpenJDK 64-Bit Server VM (build 11.0.6+10-post-Ubuntu-1ubuntu118.04.1, mixed mode, sharing)

The reason for the crash is the presence of the custom _JAVA_OPTIONS, which were set through environment variables. If I unset them, then antismash passes all the checks.

silwer@ibch@dna ~ $ unset _JAVA_OPTIONS
silwer@ibch@dna ~ $ java -version
openjdk version "11.0.6" 2020-01-14
OpenJDK Runtime Environment (build 11.0.6+10-post-Ubuntu-1ubuntu118.04.1)
OpenJDK 64-Bit Server VM (build 11.0.6+10-post-Ubuntu-1ubuntu118.04.1, mixed mode, sharing)
silwer@ibch@dna ~ $ antismash --check-prereqs
/usr/local/lib/python3.6/dist-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=FutureWarning)
All prerequisites satisfied

It seems that the root of the problem is https://github.com/antismash/antismash/blob/master/antismash/common/subprocessing/java.py

which has a very basic check:

def run_java_version() -> str:
    """ Get the version of the java binary """
    java = get_config().executables.java
    command = [
        java,
        "-version",
    ]

    version_string = execute(command).stderr
    if not version_string.startswith("openjdk") and not version_string.startswith("java"):
        msg = "unexpected output from java: %s, check path"
        raise RuntimeError(msg % java)
    # get rid of the non-version stuff in the output
    return version_string.split()[2].strip('"')

which fails because of the startswith check. The issue can be solved by this naive check instead:

if "openjdk" not in version_string and "java" not in version_string:
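A hedged sketch of the kind of tolerant check suggested above, skipping leading noise such as the "Picked up _JAVA_OPTIONS" line; this is an illustration, not the actual antiSMASH implementation:

def run_java_version_tolerant(version_string: str) -> str:
    """Extract the java version from `java -version` stderr, ignoring
    leading noise such as 'Picked up _JAVA_OPTIONS: ...' lines."""
    for line in version_string.splitlines():
        line = line.strip()
        if line.startswith("openjdk") or line.startswith("java"):
            # e.g. 'openjdk version "11.0.6" 2020-01-14' -> '11.0.6'
            return line.split()[2].strip('"')
    raise RuntimeError("unexpected output from java, check path")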

antismash-hmmscan runtime error

Describe the bug
antismash 5.0 runtime error:
RuntimeError: hmmscan returned 1: 'Error: TC bit thresholds unavailable on model Enediyne-KS' while scanning '>CMLKBGAL_01274\nMSSKLIYTGKAKDIYTTEDEHVIRSVYKDQATMLNGARKETIEGKGVLNNQISSLIFEKLNAAGVATHFIERISDTEQLNKKVT'

Expected behavior
The run completes without the hmmscan error.

To Reproduce

antismash --cpus 8 --cf-create-clusters --smcog-trees --cb-knownclusters --asf --pfam2go --output-dir results/umgs/ERR2764806.metaspades.bin.4.fa --genefinding-gff3 ERR2764806.metaspades.bin.4.gff ERR2764806.metaspades.bin.4.fna


System (please complete the following information):

  • OS: CentOS 6.9
  • Browser: Chromium 77.0.3865.90

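The error message suggests hmmscan was run with trusted-cutoff thresholds (such as --cut_tc) against a profile that lacks TC lines. A small hedged diagnostic, assuming access to the HMM flat file in question (the filename below is a placeholder), lists the models without TC lines:

# Hedged diagnostic: list models in an HMMER flat file that have no TC line,
# which would make a trusted-cutoff scan fail on them.
def models_missing_tc(hmm_path):
    missing, name = [], None
    with open(hmm_path) as handle:
        for line in handle:
            if line.startswith("NAME "):
                name = line.split()[1]
                has_tc = False
            elif line.startswith("TC "):
                has_tc = True
            elif line.startswith("//"):
                if name and not has_tc:
                    missing.append(name)
                name = None
    return missing

print(models_missing_tc("bgc_seeds.hmm"))  # placeholder path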

Permission denied with smCOG

Describe the bug

I am seeing a 'Permission denied' error when using the --smcog-trees parameter; tested with a manual installation of antiSMASH and a venv installation. The application finishes successfully without this parameter.

[cjfields@compute-8-0 2019-05-16-test-as]$ antismash --logfile as.log --debug --cb-general --cb-knownclusters --cb-subclusters --asf --pfam2go --smcog-trees -c $SLURM_NPROCS --genefinding-gff GCF_002157265.1_ASM215726v1_genomic.gff GCF_002157265.1_ASM215726v1_genomic.fna
...
DEBUG    16/05 10:40:44   annotating CDS features with resist info: 1 CDSes
DEBUG    16/05 10:40:44   annotating CDS features with smcogs info: 174 CDSes
DEBUG    16/05 10:40:44   Checking if antismash.modules.smcog_trees should be run
INFO     16/05 10:40:44   Running antismash.modules.smcog_trees
INFO     16/05 10:40:44   Calculating and drawing phylogenetic trees of cluster genes with smCOG members
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/apps/software/Python/3.6.1-IGB-gcc-4.9.4/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/apps/software/Python/3.6.1-IGB-gcc-4.9.4/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/home/groups/hpcbio/apps/antismash/antismash-github/install/lib/python3.6/site-packages/antismash/modules/smcog_trees/trees.py", line 68, in smcog_tree_analysis
    draw_tree(input_number, output_dir, gene_id)
  File "/home/groups/hpcbio/apps/antismash/antismash-github/install/lib/python3.6/site-packages/antismash/modules/smcog_trees/trees.py", line 138, in draw_tree
    run_result = subprocessing.execute(command)
  File "/home/groups/hpcbio/apps/antismash/antismash-github/install/lib/python3.6/site-packages/antismash/common/subprocessing/base.py", line 86, in execute
    proc = Popen(commands, stdin=stdin_redir, stdout=stdout, stderr=stderr)
  File "/home/apps/software/Python/3.6.1-IGB-gcc-4.9.4/lib/python3.6/subprocess.py", line 707, in __init__
    restore_signals, start_new_session)
  File "/home/apps/software/Python/3.6.1-IGB-gcc-4.9.4/lib/python3.6/subprocess.py", line 1326, in _execute_child
    raise child_exception_type(errno_num, err_msg)
PermissionError: [Errno 13] Permission denied
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/groups/hpcbio/apps/antismash/antismash-github/install/bin/antismash", line 11, in <module>
    sys.exit(entrypoint())
  File "/home/groups/hpcbio/apps/antismash/antismash-github/install/lib/python3.6/site-packages/antismash/__main__.py", line 124, in entrypoint
    sys.exit(main(sys.argv[1:]))
  File "/home/groups/hpcbio/apps/antismash/antismash-github/install/lib/python3.6/site-packages/antismash/__main__.py", line 113, in main
    antismash.run_antismash(sequence, options)
  File "/home/groups/hpcbio/apps/antismash/antismash-github/install/lib/python3.6/site-packages/antismash/main.py", line 574, in run_antismash
    result = _run_antismash(sequence_file, options)
  File "/home/groups/hpcbio/apps/antismash/antismash-github/install/lib/python3.6/site-packages/antismash/main.py", line 638, in _run_antismash
    analysis_timings = analyse_record(record, options, analysis_modules, module_results)
  File "/home/groups/hpcbio/apps/antismash/antismash-github/install/lib/python3.6/site-packages/antismash/main.py", line 264, in analyse_record
    run_module(record, module, options, previous_result, timings)
  File "/home/groups/hpcbio/apps/antismash/antismash-github/install/lib/python3.6/site-packages/antismash/main.py", line 236, in run_module
    results = module.run_on_record(record, results, options)
  File "/home/groups/hpcbio/apps/antismash/antismash-github/install/lib/python3.6/site-packages/antismash/modules/smcog_trees/__init__.py", line 128, in run_on_record
    trees = generate_trees(smcogs_dir, record.get_cds_features_within_regions(), nrpspks_genes)
  File "/home/groups/hpcbio/apps/antismash/antismash-github/install/lib/python3.6/site-packages/antismash/modules/smcog_trees/trees.py", line 48, in generate_trees
    subprocessing.parallel_function(smcog_tree_analysis, args)
  File "/home/groups/hpcbio/apps/antismash/antismash-github/install/lib/python3.6/site-packages/antismash/common/subprocessing/base.py", line 132, in parallel_function
    results = jobs.get(timeout=timeout)
  File "/home/apps/software/Python/3.6.1-IGB-gcc-4.9.4/lib/python3.6/multiprocessing/pool.py", line 608, in get
    raise self._value
PermissionError: [Errno 13] Permission denied

Expected behavior

Well, that --smcog-trees would work, but I know this one is a difficult issue to wrangle, particularly with a complex tool with a lot of moving parts ;)

To Reproduce

Example data (GCF_002157265.1_ASM215726v1_genomic.fna and GCF_002157265.1_ASM215726v1_genomic.gff) from ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/157/265/GCF_002157265.1_ASM215726v1

Using Github checkout cbd4ad3 (May 14).

Screenshots

None

System (please complete the following information):

  • OS:

[cjfields@compute-8-0 2019-05-16-test-as]$ cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)

  • Browser [e.g. chrome, safari] (if relevant)

None (command line)

Additional context

The run passes all checks for installation; I am attaching a log file for this

My suspicion is that an application is being called that isn't installed, but I can't home in on which one it is. Both muscle and FastTree are in the path.
as.log

updatable pfam2go database

The current antismash release (5.1.1) comes with an outdated version of the pfam2go database. The one distributed with the tarball is dated !version date: 2018/02/24 12:32:44, while the current one (http://current.geneontology.org/ontology/external2go/pfam2go) is dated !version date: 2019/12/14 15:47:44. A lot has changed: the versions differ not only in the number of lines (10570 vs. 10332 in the latter) but also in the naming of some pathways, e.g. (note the hyphen in the G-protein)


< Pfam:PF00001 7tm_1 > GO:G-protein coupled receptor activity ; GO:0004930
< Pfam:PF00001 7tm_1 > GO:G-protein coupled receptor signaling pathway ; GO:0007186
---
> Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930
> Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor signaling pathway ; GO:0007186

see the attached diff (https://pastebin.com/4pKmuzmt) for a more detailed view.

Thus it would be very convenient to be able to change the version or, at least, drop the most recent file into the corresponding data dir. At the moment the name of the database file is hardcoded in several places:

modules/pfam2go/__init__.py:38:        pfam2go-march-2018.txt: mapping file for Pfam to Gene Ontology mapping
modules/pfam2go/__init__.py:41:    if path.locate_file(path.get_full_path(__file__, 'data', 'pfam2go-march-2018.txt')) is None:
modules/pfam2go/pfam2go.py:159:    full_gomap_as_ontologies = construct_mapping(path.get_full_path(__file__, 'data', 'pfam2go-march-2018.txt'))
modules/pfam2go/test/test_pfam2go.py:84:        data = path.get_full_path(os.path.dirname(__file__), 'data', 'pfam2go-march-2018.txt')
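Until the filename is configurable, a hedged stopgap is to fetch the current mapping and write it over the bundled file (assuming the file format is still compatible; back up the original first):

# Hedged stopgap sketch: replace the bundled pfam2go mapping with the current
# release from the Gene Ontology site.
from urllib import request

url = "http://current.geneontology.org/ontology/external2go/pfam2go"
# path inside the installed antismash package, as quoted above
target = "antismash/modules/pfam2go/data/pfam2go-march-2018.txt"

with request.urlopen(url) as response, open(target, "wb") as out:
    out.write(response.read())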

How reliable is the "Most similar known cluster" column?

The "Most similar known cluster" provides a % similarity to a significant BCG hit in blast but when that column returns say 3% or even 30%, is that really enough to take away that this BGC might produce that compound? Sorry if I am misunderstanding the purpose of that output.

Is there a way to combine Antismash run results?

Hello Team Antismash,

I am wondering if there is a way to combine Antismash results from different runs. For example, to take two JSON outputs, combine them, and then generate a new, combined html output? I played around a bit with the AntismashResults class but it looks like module results are either not stored or not parsed, so while I was able to load the JSON files, I was unable to generate fresh html. Is there some way to do this currently? If not, I think this might be a useful additional tool.

Regards,
zach cp

Tab-delimited output?

Apologies if my request is misplaced here. I see the TSV outputs were available in v3, with JSON being the preferred output format for v5.

I am looking for a simple script which can parse the antiSMASH output and return a frequency table of the detected BGCs along with feature data such as the predicted product. I am working with a bunch of metagenome-assembled genomes and am therefore trying to aggregate results in a meaningful way.
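As a hedged starting point (the "region" feature type and "product" qualifier are assumptions about the antiSMASH 5 GenBank output rather than a documented API), region products can be tabulated across output GenBank files with Biopython:

# Hedged sketch: count predicted products across antiSMASH 5 output GenBank
# files. Adjust feature/qualifier names and the glob pattern to your output.
from collections import Counter
from glob import glob

from Bio import SeqIO

counts = Counter()
for path in glob("antismash_output/*/*.gbk"):  # hypothetical output layout
    for record in SeqIO.parse(path, "genbank"):
        for feature in record.features:
            if feature.type == "region":
                counts.update(feature.qualifiers.get("product", []))

for product, n in counts.most_common():
    print(product, n, sep="\t")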

Error during check_prerequisites, because antiSMASH installation directory is not writable

Describe the bug
antiSMASH throws error during check_prerequisites. It tries to update /usr/local/lib/python3.5/dist-packages/antismash/detection/hmm_detection/data/bgc_seeds.hmm, but this file is not writable for the user (belongs to root).

Expected behavior
From my point of view, antiSMASH should not update any file in its installation when a user runs the tool. Is there an option for placing these prerequisites somewhere the user can write to? Or can I disable the check_prerequisites step?

To Reproduce

I am trying to install antiSMASH on our HPC cluster. Because antiSMASH has many dependencies, I wanted to use a Singularity container, so I pulled your docker container with Singularity 3.2.1:

singularity pull docker://antismash/standalone:5.0.0

In order to use the Singularity container, I changed the container execution in your wrapper script to:

singularity --debug run --writable-tmpfs \
    --bind ${INPUT_DIR}:${CONTAINER_SRC_DIR} \
    --bind ${OUTPUT_DIR}:${CONTAINER_DST_DIR} \
    standalone_5.0.0.sif \
    ${INPUT_FILE} \
    $@

Then I execute the wrapper script with a small test data set, provided by the user:

./run_antismash ./example_input/CP026121sequence.fasta \
    /example_output/ \
    -c 1 -v -d \
    --genefinding-tool prodigal \
    --logfile test.log

The error message I get is:

ERROR    19/07 11:05:51   antismash.detection.hmm_detection: preqrequisite failure: Failed to generate file '/usr/local/lib/python3.5/dist-packages/antismash/detection/hmm_detection/data/bgc_seeds.hmm'
Traceback (most recent call last):
  File "/usr/local/bin/antismash", line 11, in <module>
    sys.exit(entrypoint())
  File "/usr/local/lib/python3.5/dist-packages/antismash/__main__.py", line 124, in entrypoint
    sys.exit(main(sys.argv[1:]))
  File "/usr/local/lib/python3.5/dist-packages/antismash/__main__.py", line 113, in main
    antismash.run_antismash(sequence, options)
  File "/usr/local/lib/python3.5/dist-packages/antismash/main.py", line 574, in run_antismash
    result = _run_antismash(sequence_file, options)
  File "/usr/local/lib/python3.5/dist-packages/antismash/main.py", line 603, in _run_antismash
    check_prerequisites(options.all_enabled_modules, options)
  File "/usr/local/lib/python3.5/dist-packages/antismash/main.py", line 511, in check_prerequisites
    raise RuntimeError("Modules failing prerequisites")
RuntimeError: Modules failing prerequisites
Running antiSMASH FAILED

System (please complete the following information):

  • OS: Red Hat Enterprise Linux Server release 7.5 (Maipo)
  • Singularity 3.2.1

Additional context
I think I would have the same problem if I installed an environment module, because the installation directory is also not writable for users of our HPC cluster.

could not parse records from GFF3

Describe the bug
I encounter a parsing error for a GFF3 file when running standalone antiSMASH 5.0.0 from https://hub.docker.com/r/antismash/standalone on Linux. The error message says "could not parse records from GFF3".

Expected behavior
The same GFF3 file works on the public antiSMASH 5.0.0 web server.

To Reproduce
Here is the command:
/home/antismash/bin/run_antismash ~/data/sequences/SPAdes_Pcocy_lophfium_scaffolds.fasta ~/data/maker/pcocy/antismash --taxon fungi --tta-threshold 0.65 --cb-general --cb-subclusters --cb-knownclusters --smcog-trees --fullhmmer --asf --pfam2go --cf-create-clusters --genefinding-tool none --genefinding-gff3 ~/data/maker/pcocy/pcocy.all.functional.ipr.sprot.gff

System (please complete the following information):

  • OS: Ubuntu18.04

Additional context
GFF3 file was generated by MAKER.

Feature Request: Provide Custom Clusters to Clusterblast

Feature Request: Provide a supported method for adding clusters to Known/Subcluster-blast

Is your feature request related to a problem? Please describe.
Let's say you have a number of gene clusters that cannot be shared publicly but which you would like to incorporate into Antismash. Ideally, there would be a way to take these clusters, extract the relevant information, and incorporate them into the known-cluster or sub-cluster files so that when you run Antismash your clusters are included in the results.

Describe the solution you'd like

Take as input one or more GBK/JSON files and:

  1. choose known cluster or sub cluster
  2. generate the appropriate files:
    1. known:
      1. antismash/antismash/modules/clusterblast/data/known/clusters.txt
      2. antismash/antismash/modules/clusterblast/data/known/proteins.fasta
    2. sub:
      1. antismash/antismash/modules/clusterblast/data/sub/clusters.txt
      2. antismash/antismash/modules/clusterblast/data/sub/proteins.fasta
  3. lint/validate the files
  4. append these to the originals and rebuild the blast DBs
  5. Now clusterblast will incorporate the new clusters as desired.

Describe alternatives you've considered
Cluster and fasta files need to be generated to be used in clusterblast. These files can be generated during build/install or they can be specified at runtime. I have highlighted the option whereby we build the files beforehand. However, the ability to add files at runtime may be more attractive from an operational point of view: e.g. Antismash can still be built as before but you could specify extra filepaths to look for. This would require the same steps as above except (4.) would now act at runtime. It would need to extend the clusterblast module to handle linting/validation of the files and to either combine these files on-the-fly or run 2 blast jobs and combine the results.

Additional context
I think this is a potentially useful feature that would be of value to a wide range of academic and industrial groups.

NotImplementedError: Extending this list won't work

I manually installed antiSMASH v5.1 on a Linux server and did a test run with the minimal flag; the output looked fine.

A full-featured run, however, throws the following error:

Process ForkPoolWorker-62:
Traceback (most recent call last):
  File "miniconda3/envs/antismash/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "miniconda3/envs/antismash/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "miniconda3/envs/antismash/lib/python3.7/multiprocessing/pool.py", line 110, in worker
    task = get()
  File "miniconda3/envs/antismash/lib/python3.7/multiprocessing/queues.py", line 354, in get
    return _ForkingPickler.loads(res)
  File "miniconda3/envs/antismash/lib/python3.7/site-packages/antismash/common/secmet/qualifiers/nrps_pks.py", line 103, in extend
    raise NotImplementedError("Extending this list won't work")
NotImplementedError: Extending this list won't work

No structures for NRPS's in antiSMASH 5.0.0

Describe the bug
There are no structures drawn for NRPSs that did get predicted structures in antiSMASH 4.

To Reproduce
I've installed antiSMASH-5.0.0 from the Releases section of the Github antiSMASH site and it seems to work fine. I'm using GCA_000021785.1_ASM2178v1_genomic.gbff from NCBI (Bacillus cereus) for testing. When run in antiSMASH 4, the NRPS pages show predicted structures, whereas in antiSMASH 5, I get a link to the Norine web page that doesn't work.

I noticed that in the example antiSMASH output for S. coelicolor on the antiSMASH web site, the NRPSs do have structures available.

Is there something missing in the installation that I can add to get the structure prediction images? Going to the Norine web site is not useful in my context.

Bug in GeneFunctionAnnotations assignment

Describe the bug
I added a custom Resfam pHMM model that overlaps semantically with an existing smCOG annotation. I see the smCOG annotation (other) in the HTML output (gene color is gray).

Expected behavior
I expect the resistance annotation to take priority (e.g. pink gene in the html output). I can see both the smCOG and resist annotations in the side panel, and the resist annotation has the higher bitscore.

According to the comments in the GeneFunctionsAnnotations class, I would expect the resfams to take priority:

# then priority for resfam, then smcogs
annotations = self._by_tool.get("resfam", self._by_tool.get("smcogs"))
if annotations:

However, I believe the name of the tool is misspecified in line 182:

        # then priority for resfam, then smcogs
-        annotations = self._by_tool.get("resfam", self._by_tool.get("smcogs"))
+        annotations = self._by_tool.get("resist", self._by_tool.get("smcogs"))
        if annotations:
            return annotations[0].function

If both smcogs and resist annotations are present, the result defaults to smcogs.

To Reproduce
I can create a reproducible example if you disagree that the line above is a straightforward bug.

System (please complete the following information):

  • standalone docker

ValueError: numpy.dtype has the wrong size, try recompiling. Expected 88, got 96

Describe the bug
Trying to run a few strains from the command line, I got an error.
Expected behavior
I expected the run to work fine as usual.

To Reproduce
My command line
for i in GCF_000954135.1 ; do run_antismash.py -c 8 --taxon bacteria --input-type nucl --transatpks_da --transatpks_da_cutoff 2 --clusterblast --subclusterblast --smcogs --inclusive --full-hmmer --asf -v --outputfolder antismash_results_156_ralstonia/${i}.antismash ${i}.gbff; done

  File "/usr/local/bin/antismash-4.2.0/run_antismash.py", line 45, in <module>
    from antismash.specific_modules import (
  File "/usr/local/bin/antismash-4.2.0/antismash/specific_modules/lantipeptides/__init__.py", line 20, in <module>
    from .specific_analysis import specific_analysis
  File "/usr/local/bin/antismash-4.2.0/antismash/specific_modules/lantipeptides/specific_analysis.py", line 27, in <module>
    from antismash.specific_modules.lassopeptides.specific_analysis import distance_to_pfam, find_all_orfs
  File "/usr/local/bin/antismash-4.2.0/antismash/specific_modules/lassopeptides/__init__.py", line 20, in <module>
    from .specific_analysis import specific_analysis
  File "/usr/local/bin/antismash-4.2.0/antismash/specific_modules/lassopeptides/specific_analysis.py", line 33, in <module>
    from svm_lasso import svm_classify
  File "/usr/local/bin/antismash-4.2.0/antismash/specific_modules/lassopeptides/svm_lasso/svm_classify.py", line 54, in <module>
    import sklearn
  File "/usr/lib/python2.7/dist-packages/sklearn/__init__.py", line 57, in <module>
    from .base import clone
  File "/usr/lib/python2.7/dist-packages/sklearn/base.py", line 12, in <module>
    from .utils.fixes import signature
  File "/usr/lib/python2.7/dist-packages/sklearn/utils/__init__.py", line 10, in <module>
    from .murmurhash import murmurhash3_32
  File "__init__.pxd", line 155, in init sklearn.utils.murmurhash (sklearn/utils/murmurhash.c:6314)
ValueError: numpy.dtype has the wrong size, try recompiling. Expected 88, got 96

System (please complete the following information):
Debian 9

Bug in draw_tree (modules/smcog_trees/trees.py)

I used the manual installation from https://docs.antismash.secondarymetabolites.org/install/

execution command:

ls $INPUT/*.fna | sed 's/.*\///'| sed 's/.fna//'| parallel -j $CPU "echo ============ processing {};
          time antismash --cpus $CPU -v --taxon fungi --fullhmmer --cassis --cf-create-clusters --smcog-trees --cb-general --cb-subclusters --cb-knownclusters --asf --pfam2go --output-dir $OUT/{} --genefinding-gff3 antismashIN/{}.gff --logfile $OUTPUT/{}.log  antismashIN/{}.fna"

I got an error message:

INFO 05/06 03:00:36 Analysing record: EU1_51
INFO 05/06 03:00:36 Running antismash.detection.full_hmmer
INFO 05/06 03:00:36 Running whole-genome PFAM search
INFO 05/06 03:00:48 Detecting secondary metabolite clusters
INFO 05/06 03:00:48 Running antismash.detection.hmm_detection
INFO 05/06 03:00:49 Running antismash.detection.cassis
INFO 05/06 03:00:49 Detecting gene cluster regions using CASSIS
INFO 05/06 03:00:49 Running antismash.detection.clusterfinder_probabilistic
INFO 05/06 03:00:49 Running ClusterFinder to detect probabilistic gene clusters
INFO 05/06 03:00:49 No regions detected, skipping record
INFO 05/06 03:00:49 Analysing record: EU1_52
INFO 05/06 03:00:49 Running antismash.detection.full_hmmer
INFO 05/06 03:00:49 Running whole-genome PFAM search
INFO 05/06 03:01:03 Detecting secondary metabolite clusters
INFO 05/06 03:01:03 Running antismash.detection.hmm_detection
INFO 05/06 03:01:03 Running antismash.detection.cassis
INFO 05/06 03:01:03 Detecting gene cluster regions using CASSIS
INFO 05/06 03:01:03 Running antismash.detection.clusterfinder_probabilistic
INFO 05/06 03:01:03 Running ClusterFinder to detect probabilistic gene clusters
INFO 05/06 03:01:03 1 region(s) detected in record
INFO 05/06 03:01:03 Running antismash.detection.nrps_pks_domains
INFO 05/06 03:01:04 Running antismash.detection.genefunctions
INFO 05/06 03:01:05 Running antismash.modules.smcog_trees
INFO 05/06 03:01:05 Calculating and drawing phylogenetic trees of cluster genes with smCOG members
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/anaconda/envs/antismash5/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/opt/anaconda/envs/antismash5/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/antismash/modules/smcog_trees/trees.py", line 68, in smcog_tree_analysis
    draw_tree(input_number, output_dir, gene_id)
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/antismash/modules/smcog_trees/trees.py", line 161, in draw_tree
    label_func=lambda node: str(node).replace("|", " "))
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/Bio/Phylo/_utils.py", line 344, in draw
    import matplotlib.pyplot as plt
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/matplotlib/pyplot.py", line 32, in <module>
    import matplotlib.colorbar
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/matplotlib/colorbar.py", line 29, in <module>
    import matplotlib.collections as collections
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/matplotlib/collections.py", line 2056, in <module>
    docstring.interpd.update(LineCollection=artist.kwdoc(LineCollection))
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/matplotlib/artist.py", line 1583, in kwdoc
    leadingspace=2))
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/matplotlib/artist.py", line 1354, in pprint_setters
    accepts = self.get_valid_values(prop)
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/matplotlib/artist.py", line 1247, in get_valid_values
    return re.sub("\n *", " ", match.group(1))
  File "/opt/anaconda/envs/antismash5/lib/python3.6/re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/anaconda/envs/antismash5/bin/antismash", line 10, in <module>
    sys.exit(entrypoint())
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/antismash/__main__.py", line 124, in entrypoint
    sys.exit(main(sys.argv[1:]))
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/antismash/__main__.py", line 113, in main
    antismash.run_antismash(sequence, options)
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/antismash/main.py", line 574, in run_antismash
    result = _run_antismash(sequence_file, options)
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/antismash/main.py", line 638, in _run_antismash
    analysis_timings = analyse_record(record, options, analysis_modules, module_results)
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/antismash/main.py", line 264, in analyse_record
    run_module(record, module, options, previous_result, timings)
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/antismash/main.py", line 236, in run_module
    results = module.run_on_record(record, results, options)
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/antismash/modules/smcog_trees/__init__.py", line 128, in run_on_record
    trees = generate_trees(smcogs_dir, record.get_cds_features_within_regions(), nrpspks_genes)
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/antismash/modules/smcog_trees/trees.py", line 48, in generate_trees
    subprocessing.parallel_function(smcog_tree_analysis, args)
  File "/opt/anaconda/envs/antismash5/lib/python3.6/site-packages/antismash/common/subprocessing/base.py", line 132, in parallel_function
    results = jobs.get(timeout=timeout)
  File "/opt/anaconda/envs/antismash5/lib/python3.6/multiprocessing/pool.py", line 670, in get
    raise self._value
TypeError: expected string or bytes-like object

real 60m17.158s
user 144m35.199s
sys 15m16.297s

The error does not happen immediately; some trees were created before it.

Ubuntu 18.04

Antismash 4.2 --help says to follow the antismash call with -h, but this seems to result in no analysis occurring

Describe the bug
I am using Antismash 4.2 (installed via conda). The --help documentation says to use antismash -h [options], but this just results in the -h call showing all the help documentation, with no other execution or analysis occurring. How can I run antismash with custom options without this -h flag? The slurm-generated output log shows no errors of any kind.

Expected behavior
Antismash to begin analysis with the options I flagged

System (please complete the following information):

  • OS: Unix based cluster (genomedk)

Results compatible with previous genome annotation

We are running antismash to find biosynthetic capabilities of different organisms, but we already have our own annotation of these genomes.
Is there a way to give our own GenBank file with our annotation, to generate data compatible with our own open reading frames?

more docs on using a custom database location

I ran download-antismash-databases --database-dir $MY_SPECIAL_PATH, which worked fine, but I can't find any docs on how to use a custom database location when running antismash. I don't see any such parameter in the antismash script doc and no other information. I'm guessing that I have to set some environment variable (e.g. ANTISMASH_DB_DIR=$MY_SPECIAL_PATH), but I can't find any information in the docs on this.

It would help to have clear docs (script and/or at https://docs.antismash.secondarymetabolites.org/install/) for using a custom database path.

AttributeError: module 'scss' has no attribute 'Compiler' when antismash --check-prereqs

antismash --check-prereqs
Traceback (most recent call last):
  File "/usr/local/bin/antismash", line 9, in <module>
    entrypoint()
  File "antismash-5.0.0/antismash/__main__.py", line 124, in entrypoint
    sys.exit(main(sys.argv[1:]))
  File "/antismash-5.0.0/antismash/__main__.py", line 113, in main
    antismash.run_antismash(sequence, options)
  File "antismash-5.0.0/antismash/main.py", line 574, in run_antismash
    result = _run_antismash(sequence_file, options)
  File "antismash-5.0.0/antismash/main.py", line 596, in _run_antismash
    check_prerequisites(modules, options)
  File "antismash-5.0.0/antismash/main.py", line 504, in check_prerequisites
    res = module.check_prereqs(options)
  File "antismash-5.0.0/antismash/outputs/html/__init__.py", line 66, in check_prereqs
    return prepare_data()
  File "antismash-5.0.0/antismash/outputs/html/__init__.py", line 57, in prepare_data
    result = scss.Compiler(output_style="expanded").compile(flavour + ".scss")
AttributeError: module 'scss' has no attribute 'Compiler'

The file
https://github.com/antismash/antismash/blob/master/antismash/outputs/html/__init__.py
requires
https://github.com/Kronuz/pyScss
so the following should be run:
should be added:
pip uninstall libsass
pip uninstall scss
pip install pyScss

and in:
https://github.com/antismash/antismash/blob/master/antismash/outputs/html/__init__.py
add:
from scss import Compiler
under import scss

With that, the issue is fixed.

Greetings

i have some problems

hello

When I installed antismash following this web site (https://docs.antismash.secondarymetabolites.org/install/), a problem came up.

My OS is Ubuntu 18.04.

antismash --check-prereqs

When I run that command in the terminal, these errors come out:

ERROR 19/04 17:22:20 Failed to locate executable for 'long-orfs'
ERROR 19/04 17:22:20 Failed to locate executable for 'extract'
ERROR 19/04 17:22:20 Failed to locate executable for 'build-icm'
ERROR 19/04 17:22:20 Failed to locate executable for 'glimmer3'
ERROR 19/04 17:22:20 Failed to locate executable for 'glimmerhmm'
ERROR 19/04 17:22:20 Failed to locate executable for 'hmmpfam2'
ERROR 19/04 17:22:20 Failed to locate executable for 'hmmpfam2'
ERROR 19/04 17:22:20 Failed to locate executable for 'java'
ERROR 19/04 17:22:20 Failed to locate executable for 'hmmpfam2'
ERROR 19/04 17:22:20 Failed to locate executable for 'hmmpfam2'
ERROR 19/04 17:22:20 Not all prerequisites met

I downloaded and installed everything required, following the steps, but these errors still come out.
Please help me...

thank you

Extract module co-ordinates from antismash output.

Is your feature request related to a problem? Please describe.
Extract module nucleotide co-ordinates for NRPS-coding ORFs. Since v5, module co-ordinate inference has become difficult due to the lack of the txt folder from v4.1.2.

Describe the solution you'd like
Ideally, get a text file with tab separated output stating ORF-ID, Module-number, module.start, module.end

Describe alternatives you've considered

  1. An alternative would be to run antiSMASH v4.1.2, use the text files where the information exists, and create a table using the CAT domain co-ordinates. (Highly mistake-prone.)
  2. Since I am a complete programming newbie, parsing the JSON file is difficult, though I guess the information is located in it, judging from the web output (a parsing sketch is included after this issue).
  3. Partial solve but low throughput: using a JSON editor for Mac like Smart JSON Editor (http://www.smartjsoneditor.com/) and extracting, per ORF, the co-ordinates from the modules section of the JSON tree. (Note: this gives only amino acid positions, not nucleic acid positions.)

Additional context
I need to apply the alternative to a large number of gene clusters with long NRPSs, which makes manual co-ordinate extraction tedious and mistake-prone. This need not be a standard output, but it would be great if there were a flag in the offline antiSMASH run to output this info.

Advantages
This will enable meta-analysis on evolution of modules and help understand diversification mechanisms in NRPS.
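Not part of the original request, but a hedged sketch of one way to get per-domain nucleotide co-ordinates without hand-parsing the JSON: read the per-region GenBank output with Biopython. The aSDomain feature type and the locus_tag/aSDomain qualifiers are assumptions about the v5 output and should be checked against one of your own region files (newer releases also write aSModule features, which can be handled the same way).

# hedged sketch: tabulate domain co-ordinates from an antiSMASH v5 region GenBank file
# feature type and qualifier names are assumptions to verify against your own output
from Bio import SeqIO

def domain_coordinates(gbk_path):
    rows = []
    for record in SeqIO.parse(gbk_path, "genbank"):
        for feature in record.features:
            if feature.type != "aSDomain":
                continue
            orf = feature.qualifiers.get("locus_tag", ["?"])[0]
            domain = feature.qualifiers.get("aSDomain", ["?"])[0]
            # nucleotide co-ordinates on the region record (0-based, end-exclusive)
            rows.append((orf, domain, int(feature.location.start), int(feature.location.end)))
    return rows

for row in domain_coordinates("my_region.gbk"):
    print("\t".join(map(str, row)))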

KeyError when re-running antismashed gbk file

Describe the bug
When running antismash 5 in a docker container, I'm getting a KeyError when the update_prediction function from the nrps_pks/parsers module is called.

Expected behavior
I expect AD/KS domains to be sanitized and rebuilt from scratch since --skip-sanitisation is not flagged.

To Reproduce

wget https://mibig.secondarymetabolites.org/repository/BGC0000001/BGC0000001.1.region001.gbk
mkdir BGC0000001

## Gives error
docker run -v ${PWD}:/input/ -v ${PWD}/BGC0000001:/output/ --detach=false --rm \
    --user=$(id -u):$(id -g) antismash/standalone:5.0.0 BGC0000001.1.region001.gbk

## No error (but is the source of BGC0000001) 
efetch -db nuccore -format gb -id JF752342.1 > JF752342.1.gbk
docker run -v ${PWD}:/input/ -v ${PWD}/BGC0000001:/output/ --detach=false --rm \
    --user=$(id -u):$(id -g) antismash/standalone:5.0.0 JF752342.1.gbk

System (please complete the following information):

  • antismash/standalone:5.0.0

Additional context
Normally, I would think about just stripping out the annotations, but this is also interacting with issue #180, where we want to combine / reuse existing results.

thanks!

Error when running antiSMASH 4.2

I ran antiSMASH 4.2 on 10 genomes downloaded from NCBI. After downloading, I ran Prokka and then used the resulting *.gbk files as input for antiSMASH. Nine of the files ran fine, but for one file I got this error:

/home/vdsa/software/antismash-4.2.0/run_antismash.py -c 12 --transatpks_da --clusterblast --subclusterblast --knownclusterblast --smcogs --inclusive --borderpredict --full-hmmer --asf --tta -v --outputfolder GCA_003945505.1_ASM394550v1_genomic GCA_003945505.1_ASM394550v1_genomic.gbk
INFO 10/01 14:07:55 Loading detection plugins
INFO 10/01 14:07:55 Parsing the input sequence(s)
INFO 10/01 14:07:58 Analyzing record 1 (CP029618.1)
INFO 10/01 14:07:58 Detecting secondary metabolite signature genes for contig #1
INFO 10/01 14:08:12 Detecting secondary metabolite clusters using inclusive ClusterFinder algorithm for contig #1
INFO 10/01 14:08:12 Running whole-genome pfam search
INFO 10/01 14:43:38 Running ClusterFinder HMM to detect gene clusters
INFO 10/01 14:43:50 Running cluster-specific analyses
INFO 10/01 14:43:50 Calculating detailed predictions for Lanthipeptide clusters
INFO 10/01 14:48:04 Calculating detailed predictions for lasso peptide clusters
INFO 10/01 14:48:32 Predicting NRPS A domain substrate specificities with SANDPUMA
INFO 10/01 15:10:25 Predicting PKS AT domain substrate specificities by Yadav et al. PKS signature sequences
INFO 10/01 15:10:37 Predicting PKS AT domain substrate specificities by Minowa et al. method
INFO 10/01 15:10:59 Predicting CAL domain substrate specificities by Minowa et al. method
INFO 10/01 15:11:00 Predicting PKS KR activity and stereochemistry using KR fingerprints from Starcevic et al.
INFO 10/01 15:11:30 Aligning Tans-AT PKS domains
INFO 10/01 15:11:53 TransATPKS: constructing phylogeny tree for each KS domain
INFO 10/01 15:17:49 Phylogenetic analysis: predicting substrate specificity of KS
INFO 10/01 15:17:56 MAFFT: generating pairwise distance matrix of multiple sequence alignment of all domains
INFO 10/01 15:23:19 TRANSATPKS: generating distance matrix of assembly lines
Traceback (most recent call last):
File "/home/vdsa/software/antismash-4.2.0/run_antismash.py", line 1210, in
main()
File "/home/vdsa/software/antismash-4.2.0/run_antismash.py", line 559, in main
run_analyses(seq_record, options, plugins)
File "/home/vdsa/software/antismash-4.2.0/run_antismash.py", line 643, in run_analyses
cluster_specific_analysis(plugins, seq_record, options)
File "/home/vdsa/software/antismash-4.2.0/run_antismash.py", line 1191, in cluster_specific_analysis
plugin.specific_analysis(seq_record, options)
File "/home/vdsa/software/antismash-4.2.0/antismash/specific_modules/nrpspks/specific_analysis.py", line 99, in specific_analysis
classify_nrpspks_domains_ks(pksnrpsvars, seq_record, options)
File "/home/vdsa/software/antismash-4.2.0/antismash/specific_modules/nrpspks/nrpspks_classification.py", line 159, in classify_nrpspks_domains_ks
similar_bgc_per_cluster, new_cluster, new_cluster_index = cdt.run_calculate_distance(data_dir = data_dir, seq_simScore = KS_msa_dist, ksa_per_new_cluster=KS_per_cluster, cutoff_bgc_nr = bgcs_nr)
File "/home/vdsa/software/antismash-4.2.0/antismash/specific_modules/nrpspks/nrpspksdomainalign/calculate_distance_transATPKS.py", line 494, in run_calculate_distance
BGCs, DMS, cluster_seq, new_cluster, new_cluster_index = generate_BGCs_DMS(cluster=cluster, cluster_KSindex=cluster_KSindex, pseudo_aa=pseudo_aa, ksa_per_new_cluster=ksa_per_new_cluster)
File "/home/vdsa/software/antismash-4.2.0/antismash/specific_modules/nrpspks/nrpspksdomainalign/calculate_distance_transATPKS.py", line 119, in generate_BGCs_DMS
new_cluster, new_cluster_index = _get_new_cluster_index(ksa_per_new_cluster=ksa_per_new_cluster)
File "/home/vdsa/software/antismash-4.2.0/antismash/specific_modules/nrpspks/nrpspksdomainalign/calculate_distance_transATPKS.py", line 68, in _get_new_cluster_index
cluster_i_dict_wob[int(index)] = [ksa_index, cluster_i[k][0]]
ValueError: invalid literal for int() with base 10: 'KS?'

antismash.detection.cassis error

Describe the bug
Encountered the following error messages when running antiSMASH 5.0.0 from docker image with the option of --cassis.
antismash.detection.cassis: preqrequisite failure: Failed to locate executable for 'meme'
antismash.detection.cassis: preqrequisite failure: Failed to locate executable for 'fimo'

To Reproduce
/home/antismash/bin/run_antismash ~/data/maker/pcocy/SPAdes_Pcocy_lophfium_scaffolds.fasta ~/data/maker/pcocy/test --taxon fungi --cassis --tta-threshold 0.65 --cb-general --cb-subclusters --cb-knownclusters --smcog-trees --fullhmmer --asf --pfam2go --cf-create-clusters --genefinding-tool none --genefinding-gff3 /input/pcocy.all.functional.ipr.sprot.gff

System (please complete the following information):

  • OS: Ubuntu

Biopython Warning about partial codon form NCBI's genome genbank records

I was testing out antiSMASH (installed via Bioconda) on a few known pathogens. I downloaded GenBank records from NCBI (e.g. Clostridium difficile: GCA_002073735.2) and ran antiSMASH with default settings, and got the following warning. It doesn't keep antiSMASH from generating an output:

/usr/miniconda2/envs/antismash/lib/python2.7/site-packages/Bio/Seq.py:2576: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.

I downloaded 5 other genomes from NCBI and got the same warning for all of them.

Question: antismash summary of many genomes

Hello, antismash development team,
How can I do large-scale, programmatic analysis of the antiSMASH results for many genomes? Should I parse the *.gbk files, the *.json file, or regions.js?
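In case it helps others with the same question, here is a hedged sketch that summarises the .json results across many output folders. The records/areas/products keys are assumptions about the v5 JSON layout; inspect one file first and adjust the keys if they differ.

# hedged sketch: one row per detected region, across many antiSMASH result folders
import json
from pathlib import Path

def summarise(result_dir):
    for json_file in Path(result_dir).glob("*/*.json"):
        data = json.loads(json_file.read_text())
        for record in data.get("records", []):
            for area in record.get("areas", []):
                print(json_file.parent.name, record.get("id"),
                      area.get("start"), area.get("end"),
                      ";".join(area.get("products", [])), sep="\t")

summarise("antismash_results")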

Keep accession version information

Is your feature request related to a problem? Please describe.
Yes, basically I run antiSMASH on some RefSeq bacterial assemblies in fasta format. Some of those have multiple fasta records within a single assembly, and each fasta record has an ID of the form XXXXXXXX.#. My issue is that antiSMASH removes the last part of the accession ID (the .#) so that output files get names like XXXXXXXX_BGC.txt, and BGCs get names like XXXXXXXX_c1. Hence I have no direct way of matching my original fasta record IDs to the filenames. I have resorted to changing the IDs in my original input files to some custom unique values and keeping a separate table that matches my unique accessions to the original ones. Obviously this is cumbersome and far from optimal since I duplicate data and add extra steps whenever I want to match the antiSMASH results with other data.

Describe the solution you'd like
Simply keeping the original IDs of the input fasta records would be the ideal. This would be the best practice since software shouldn't modify the IDs of the input sequences, and the predictions technically correspond not only to an accession, but to a specific version as well.

Describe alternatives you've considered
I imagine there could be reasons why having a dot in the BGC_ID could cause some incompatibilities in the pipeline. An alternative would be to add a column to the output files that have the original unmodified ID of the input fasta record where a BGC is found.

Great software
Sur

No BGCs in B. thetaiotaomicron

Bacteroides thetaiotaomicron VPI-5482 (NCBI assembly ID: GCA_000011065.1) has experimentally verified BGCs in its genome, which are also present in MIBiG. So why am I getting no antiSMASH results for this strain? I ran antiSMASH both locally (conda) and on the antiSMASH web server.

Are antismash txt results 0-indexed or 1-indexed? right open or closed intervals?

I am running antiSMASH 4.2 installed through conda and saving only the txt output.

I cannot find explicit documentation indicating what the values in the "BGC_range" column of the _BGC.txt files mean.

For example if I had 1:2 in the range, and the sequence ACT, which bases are included in the feature?

  1. The first and second base of the sequence: AC.
  2. The first base of the sequence: A.
  3. The second base of the sequence: C.
  4. The second and third base of the sequence: CT.
    ??

Thanks

Best way to generate a data table of the information represented in the HTML output?

Whilst the HTML output is very useful to look at, having a tabular version of it would be more useful for generating summaries across samples or groups of samples, and for general use in downstream analysis. I know you used to have the genecluster.txt output but no longer generate it, so what is the best way to get such a tabular output now?
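One hedged workaround, pending an official tabular export: walk the per-region GenBank files and collect the region features into a table with Biopython. The region feature type and product qualifier are assumptions about the v5 output; verify them against one of your own files.

# hedged sketch: build a simple regions table from antiSMASH v5 region GenBank files
import csv
import glob
from Bio import SeqIO

with open("regions_table.tsv", "w", newline="") as out:
    writer = csv.writer(out, delimiter="\t")
    writer.writerow(["file", "record", "start", "end", "products"])
    for gbk in glob.glob("results/*/*.region*.gbk"):
        for record in SeqIO.parse(gbk, "genbank"):
            for feature in record.features:
                if feature.type == "region":
                    writer.writerow([gbk, record.id,
                                     int(feature.location.start),
                                     int(feature.location.end),
                                     ";".join(feature.qualifiers.get("product", []))])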

make header rename optional

Is your feature request related to a problem? Please describe.
Some of my genomes have long contig headers, and preserving them as such is useful for downstream stats. I understand the rationale behind renaming is to conform with the GenBank length limit, but it becomes cumbersome for users who don't intend to work with or submit in the GenBank format.

Describe the solution you'd like
Renaming headers should be optional

CASSIS not working in default docker installation

Describe the bug
A clear and concise description of what the bug is.

Expected behavior
A clear and concise description of what you expected to happen.

To Reproduce
For analysis issues, provide an accession number or a record fragment that can reproduce the problem.

For interaction issues, also include steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Screenshots
If applicable, add screenshots to help explain your problem.

System (please complete the following information):

  • OS: [e.g. iOS, Ubuntu]
  • Browser [e.g. chrome, safari] (if relevant)

Additional context
Add any other context about the problem here.
