
Comments (5)

SJShaw commented on July 29, 2024

The NRPS/PKS domains are annotated as aSDomain features in the GenBank outputs, so you can find the aSDomain feature whose domain_id matches the one in the CDS annotations and use the protein locations annotated there.

The sec_met_domain annotations aren't handled in the same way, as they only indicate the presence of at least one domain of that type (e.g. antiSMASH doesn't annotate each individual condensation domain as a unique sec_met_domain in the CDS feature annotations).
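As a rough sketch of the aSDomain approach described above, the following groups aSDomain features by CDS and sorts them by protein position. The qualifier names used here (locus_tag, aSDomain, protein_start, protein_end) are assumptions about the GenBank output and should be checked against your own file. The function name and the example data are hypothetical, made up purely for illustration.

```python
from collections import defaultdict
from types import SimpleNamespace


def ordered_domains(features):
    """Group aSDomain features by CDS and sort them by protein start.

    `features` is any iterable of objects exposing `.type` and `.qualifiers`
    (a dict mapping qualifier names to lists of strings), which is the shape
    of Biopython's SeqFeature.
    """
    by_cds = defaultdict(list)
    for feature in features:
        if feature.type != "aSDomain":
            continue
        quals = feature.qualifiers
        # Assumed qualifier names; verify them against your own GenBank file.
        locus = quals["locus_tag"][0]
        name = quals["aSDomain"][0]
        start = int(quals["protein_start"][0])
        end = int(quals["protein_end"][0])
        by_cds[locus].append((start, end, name))
    # Sorting the tuples orders the domains by protein start, then end.
    return {locus: sorted(domains) for locus, domains in by_cds.items()}


# Made-up features standing in for `record.features` from a parsed record.
example = [
    SimpleNamespace(type="aSDomain", qualifiers={
        "locus_tag": ["geneA"], "aSDomain": ["PKS_KR"],
        "protein_start": ["600"], "protein_end": ["780"]}),
    SimpleNamespace(type="CDS", qualifiers={"locus_tag": ["geneA"]}),
    SimpleNamespace(type="aSDomain", qualifiers={
        "locus_tag": ["geneA"], "aSDomain": ["PKS_KS"],
        "protein_start": ["10"], "protein_end": ["430"]}),
]

for locus, domains in ordered_domains(example).items():
    for start, end, name in domains:
        print(locus, name, start, end, sep="\t")
```

With Biopython installed, the same function should accept record.features from each record returned by SeqIO.parse("your_genbank_file.gbk", "genbank"). Domains that only appear as sec_met_domain qualifiers carry no coordinates, so they would not show up in this ordering.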

from antismash.

SJShaw commented on July 29, 2024

We deliberately removed this sort of text output file from antiSMASH 5: everyone has different needs, we can't satisfy them all with a single file, and maintaining many extra files will inevitably go wrong.

The JSON has the core data from which the locations can be derived, but for your case you may want to use the annotations from the GenBank file instead, as the details you need are reasonably simple.

Here's a Python snippet that will print out the equivalent of the table you'd like. It needs Biopython, which you can install with pip install biopython if you don't already have it. You can use it as a starting point.

from collections import defaultdict

from Bio import SeqIO

# Running count of the modules seen so far for each locus tag
module_counts = defaultdict(int)

for record in SeqIO.parse("your_genbank_file.gbk", "genbank"):
    for feature in record.features:
        # Only the antiSMASH module features are of interest here
        if feature.type != "aSModule":
            continue
        locus = feature.qualifiers["locus_tags"][0]
        module_counts[locus] += 1
        # Columns: locus tag, module number within that locus, start, end
        columns = [locus, module_counts[locus], feature.location.start, feature.location.end]
        print("\t".join(map(str, columns)))


somakchowdhury commented on July 29, 2024

Thank you for taking the time to sort this out. Indeed, having many ancillary result files is a recipe for disaster.
I shall implement your suggestion. Closing the issue.


TaehyungKwon commented on July 29, 2024

@SJShaw Hi, I'm trying to investigate the sequential order of biosynthetic domains in the GBK output from antiSMASH 5.1.2.
Using SeqIO, I extracted the domains annotated with /NRPS_PKS= annotations, but I could not work out how to handle /gene_functions="biosynthetic (rule-based-clusters)". The latter annotations do not include coordinates, so I could not align them with the /NRPS_PKS domain information.
Is there a workaround for finding all annotated domains and their order in each CDS using NRPS_PKS, gene_functions="biosynthetic (rule-based-clusters)", and sec_met_domain together?

Thanks,
Taehyung


TaehyungKwon commented on July 29, 2024

@SJShaw Thanks for the prompt response. I appreciate it.
I understand what you are saying, but I would still like to find a workaround.
For example, I found a putative T3-PKS that contains a PKS_KR domain in one of its CDSs. This PKS_KR domain was annotated with both the NRPS_PKS and gene_functions="biosynthetic (rule-based-clusters)" labels. However, the same CDS also has N- and C-terminal chalcone synthase domains, annotated only with the gene_functions="biosynthetic (rule-based-clusters)" label. Although chalcone synthase is crucial for rule-based T3-PKS detection, the locations of these canonical domains are not accessible. Is that correct?
This causes a problem when I try to organize all canonical biosynthetic domains, because I cannot order the domains with NRPS_PKS labels relative to those that only have gene_functions="biosynthetic (rule-based-clusters)" labels. Do I have to drop some canonical domains? Do you have any suggestions for filling in this missing information?
