
Comments (13)

zhangrengang commented on August 27, 2024

@oushujun, you are right. I have revised the number in the last comment to 2356.

from tesorter.

zhangrengang commented on August 27, 2024

Shujun, I will add it soon. These HMMs are based on nucleotide sequences and will not be merged with the protein-based databases.


zhangrengang commented on August 27, 2024

@oushujun, SINEs are now supported in the latest GitHub version by running:

TEsorter -db sine rice6.9.5.liban

The summary of output:

          Positive  Negative
SINE            26        17
non-SINE        30      2358

So the sensitivity = 26/(26+17) = 0.60 and the precision = 26/(26+30) = 0.46. This method does not seem to perform well compared with AnnoSINE.
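The two ratios above can be reproduced with a short Python sketch. The function name `sensitivity_precision` is hypothetical and not part of TEsorter; the counts are the ones from the table.

```python
def sensitivity_precision(tp, fn, fp):
    """Return (sensitivity, precision) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true SINEs that were detected
    precision = tp / (tp + fp)    # share of SINE calls that are true SINEs
    return sensitivity, precision


# Counts from the table: 26 SINEs called, 17 SINEs missed, 30 non-SINEs called.
sens, prec = sensitivity_precision(tp=26, fn=17, fp=30)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
# prints: sensitivity=0.60 precision=0.46
```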


oushujun commented on August 27, 2024

@zhangrengang, the SINE sequences were re-curated by the AnnoSINE authors and updated in the rice library: https://github.com/oushujun/riceTElib
There are 200-ish new SINE sequences added to the library, and some helitron sequences containing SINE fragments were cleaned. Can you incorporate this new library into TEsorter? Thank you.

Shujun


zhangrengang commented on August 27, 2024

@oushujun, I have added the library v7.0.0 to the folder TEsorter/test/. With this library, the summary of output is as follows:

          Positive  Negative
SINE           230         9
non-SINE        32      2356

So the sensitivity = 230/(230+9) = 0.96 and the precision = 230/(230+32) = 0.88.

The false positives are listed below. They can be reduced with a stricter e-value cutoff (e.g. tightening the threshold to 1e-6).

Os0190_INT#LTR/Copia    TEsorter        CDS     805     912     0.1     +       .       ID=Os0190_INT#LTR/Copia|SINE1_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=23.7;evalue=0.0001;probability=0.76
Os0222#DNAnona/MULE     TEsorter        CDS     188     321     0.08    +       .       ID=Os0222#DNAnona/MULE|SINE-1_ECa;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=38.5;evalue=1.3e-05;probability=0.44
Os0343#DNAnona/CACTA    TEsorter        CDS     516     602     0.04    +       .       ID=Os0343#DNAnona/CACTA|SINE2-1a_SBi;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=32.7;evalue=0.00054;probability=0.71
Os0521#DNAnona/hAT      TEsorter        CDS     1       101     0.15    +       .       ID=Os0521#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=58.7;evalue=9.6e-06;probability=0.7
Os0545#DNAnona/hAT      TEsorter        CDS     1       91      0.2     +       .       ID=Os0545#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=67.8;evalue=9.6e-08;probability=0.78
Os0563#DNAnona/hAT      TEsorter        CDS     119     264     0.09    +       .       ID=Os0563#DNAnona/hAT|SINE1_SO;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=59.7;evalue=8.5e-06;probability=0.53
Os0623#DNAnona/MULE     TEsorter        CDS     267     386     0.1     +       .       ID=Os0623#DNAnona/MULE|SINE1_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=58.3;evalue=0.00014;probability=0.54
Os0701#DNAnona/hAT      TEsorter        CDS     1       176     0.08    +       .       ID=Os0701#DNAnona/hAT|SINE1_SO;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=81.0;evalue=4.4e-05;probability=0.5
Os0848#DNAnona/MULE     TEsorter        CDS     216     335     0.09    +       .       ID=Os0848#DNAnona/MULE|SINE1_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=55.8;evalue=0.00035;probability=0.68
Os0902#DNAnona/hAT      TEsorter        CDS     1       100     0.16    +       .       ID=Os0902#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=59.5;evalue=3.7e-06;probability=0.65
Os0909#MITE/Stow        TEsorter        CDS     59      151     0.08    +       .       ID=Os0909#MITE/Stow|SINE2-3_OS;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=25.1;evalue=0.00014;probability=0.63
Os0926#LINE/unknown     TEsorter        CDS     1092    1222    0.03    +       .       ID=Os0926#LINE/unknown|AtSB6;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=22.8;evalue=0.00087;probability=0.81
Os0986#DNAnona/MULE     TEsorter        CDS     251     344     0.05    +       .       ID=Os0986#DNAnona/MULE|SINE2-1_SBi;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=22.0;evalue=7e-05;probability=0.91
Os0987#DNAnona/hAT      TEsorter        CDS     48      157     0.06    +       .       ID=Os0987#DNAnona/hAT|SINE1_SO;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=42.7;evalue=0.00044;probability=0.78
Os1087#DNAnona/Helitron TEsorter        CDS     43      130     0.12    +       .       ID=Os1087#DNAnona/Helitron|SINE16_OS;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=41.6;evalue=0.00052;probability=0.77
Os1103#DNAnona/hAT      TEsorter        CDS     2       101     0.13    +       .       ID=Os1103#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=70.2;evalue=7.7e-05;probability=0.63
Os1106#DNAnona/Tourist  TEsorter        CDS     157     336     0.03    +       .       ID=Os1106#DNAnona/Tourist|SINE-1_ATr;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=31.7;evalue=0.00078;probability=0.43
Os1181#DNAnona/hAT      TEsorter        CDS     7       105     0.11    +       .       ID=Os1181#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=71.1;evalue=0.00022;probability=0.77
Os1296#DNAnona/hAT      TEsorter        CDS     4       112     0.12    +       .       ID=Os1296#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=76.9;evalue=0.00013;probability=0.63
Os1418#DNAnona/hAT      TEsorter        CDS     2       104     0.1     +       .       ID=Os1418#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=58.7;evalue=0.00059;probability=0.64
Os1523#DNAauto/hAT      TEsorter        CDS     1       148     0.06    +       .       ID=Os1523#DNAauto/hAT|SINE1_SO;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=62.6;evalue=0.0006;probability=0.53
Os1615#DNAnona/hAT      TEsorter        CDS     1       158     0.11    +       .       ID=Os1615#DNAnona/hAT|SINE1_SO;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=53.6;evalue=2.1e-07;probability=0.63
Os2008#DNAnona/hAT      TEsorter        CDS     3       91      0.13    +       .       ID=Os2008#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=58.7;evalue=4.6e-05;probability=0.75
Os2057#DNAnona/hAT      TEsorter        CDS     144     304     0.06    +       .       ID=Os2057#DNAnona/hAT|BoSB13;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=25.9;evalue=0.00041;probability=0.84
Os2523#DNAnona/hAT      TEsorter        CDS     1       112     0.08    +       .       ID=Os2523#DNAnona/hAT|SINE1_SO;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=67.3;evalue=4e-05;probability=0.56
Os2861#DNAnona/CACTG    TEsorter        CDS     21      145     0.08    +       .       ID=Os2861#DNAnona/CACTG|SINE1_SO;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=58.3;evalue=4.5e-05;probability=0.51
Os3087_INT#LTR/Gypsy    TEsorter        CDS     677     823     0.12    +       .       ID=Os3087_INT#LTR/Gypsy|SINE1_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=76.9;evalue=2.2e-05;probability=0.45
Os3264#DNAauto/MLE      TEsorter        CDS     165     247     0.08    +       .       ID=Os3264#DNAauto/MLE|BoSB14A;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=26.3;evalue=0.00034;probability=0.74
Os3423#DNAnona/hAT      TEsorter        CDS     1050    1189    0.09    +       .       ID=Os3423#DNAnona/hAT|SINE1_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=78.2;evalue=0.00054;probability=0.75
Os3507_LTR#LTR/Gypsy    TEsorter        CDS     31      229     0.03    +       .       ID=Os3507_LTR#LTR/Gypsy|SINE2-1_OS;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=40.3;evalue=0.001;probability=0.71
Os3527_ICR#DNAnona/hAT  TEsorter        CDS     1       100     0.16    +       .       ID=Os3527_ICR#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=70.2;evalue=3.5e-06;probability=0.72
Os3565#DNAnona/hAT      TEsorter        CDS     252     474     0.03    +       .       ID=Os3565#DNAnona/hAT|SINE2-1_STu;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=34.7;evalue=1.2e-05;probability=0.53
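To illustrate the effect of a stricter e-value cutoff on records like those above, here is a minimal Python sketch that post-filters GFF3 lines by their `evalue` attribute. The helper names `parse_attributes` and `filter_by_evalue` are hypothetical, and this filters an existing output rather than changing how TEsorter itself is run. The three sample records are copied from the list.

```python
def parse_attributes(field):
    """Turn a GFF3 attribute column ('k=v;k=v') into a dict."""
    return dict(item.split("=", 1) for item in field.strip().split(";") if item)


def filter_by_evalue(gff_lines, max_evalue=1e-6):
    """Return sequence IDs of records whose 'evalue' is at most max_evalue."""
    kept = []
    for line in gff_lines:
        # Skip blank lines and GFF comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        # The sequence IDs here contain no whitespace, so splitting on any
        # whitespace (tabs or spaces) yields the nine GFF columns.
        cols = line.split(None, 8)
        attrs = parse_attributes(cols[8])
        if float(attrs["evalue"]) <= max_evalue:
            kept.append(cols[0])
    return kept


sample = [
    "Os0190_INT#LTR/Copia\tTEsorter\tCDS\t805\t912\t0.1\t+\t.\t"
    "ID=Os0190_INT#LTR/Copia|SINE1_MT;Name=SINE-SINE;gene=SINE;clade=SINE;"
    "coverage=23.7;evalue=0.0001;probability=0.76",
    "Os0545#DNAnona/hAT\tTEsorter\tCDS\t1\t91\t0.2\t+\t.\t"
    "ID=Os0545#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;"
    "coverage=67.8;evalue=9.6e-08;probability=0.78",
    "Os1615#DNAnona/hAT\tTEsorter\tCDS\t1\t158\t0.11\t+\t.\t"
    "ID=Os1615#DNAnona/hAT|SINE1_SO;Name=SINE-SINE;gene=SINE;clade=SINE;"
    "coverage=53.6;evalue=2.1e-07;probability=0.63",
]

print(filter_by_evalue(sample))
# prints: ['Os0545#DNAnona/hAT', 'Os1615#DNAnona/hAT']
```

Among the 32 records listed above, only Os0545 and Os1615 have e-values at or below 1e-6. Whether any true SINEs would also be lost at that cutoff is not shown here.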


oushujun commented on August 27, 2024

@zhangrengang Thank you for the prompt update! The non-SINE negative category should not be that high. The v7 library only has 2627 sequences. Can you double check? Thanks
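The double check requested here amounts to confirming that the four cells of the summary table add up to the library size. A minimal Python sketch, using the counts as they stand in this thread after the non-SINE negative count was revised to 2356 (the variable names are illustrative):

```python
# Cells of the v7 summary table (non-SINE negatives after the revision to 2356).
counts = {"SINE_pos": 230, "SINE_neg": 9, "nonSINE_pos": 32, "nonSINE_neg": 2356}
library_size = 2627  # number of sequences in the v7 library, as stated above

total = sum(counts.values())
print(total, total == library_size)
# prints: 2627 True
```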


yuzhenpeng commented on August 27, 2024

Hi, Ren-Gang,

I wonder if this upgrade works for animal genome annotations.

Thanks.

Zhenpeng


zhangrengang commented on August 27, 2024

@yuzhenpeng Hi, according to AnnoSINE, these SINE sequences are derived from plants, so I do not expect it to work well for animals.


yuzhenpeng commented on August 27, 2024

Thank you for your response.
By the way, if I want to annotate or classify animal TEs such as LINEs and SINEs, do you have any suggestions?

@zhangrengang


zhangrengang commented on August 27, 2024

@yuzhenpeng, in my opinion, the RepeatModeler + RepeatMasker pipeline is OK.


yuzhenpeng commented on August 27, 2024

But RepeatModeler does not seem to identify LINEs or SINEs accurately. Their reported proportion of the genome seems somewhat too high or too low.
@zhangrengang


zhangrengang commented on August 27, 2024

@yuzhenpeng sorry, I have no better solution.

@oushujun Shujun, do you have any suggestions?


yuzhenpeng commented on August 27, 2024

Thanks. I found a solution in oushujun/EDTA#231.
But identifying SINEs still seems difficult.

