Comments (13)
@oushujun, you are right. I have revised the number in my last comment to 2356.
from tesorter.
Shujun, I will add it soon. These HMMs are based on nucleotide sequences, and will not be merged with other protein-based databases.
from tesorter.
@oushujun, SINEs are now supported in the latest GitHub version by running:
TEsorter -db sine rice6.9.5.liban
The summary of output:
| | Positive | Negative |
|---|---|---|
| SINE | 26 | 17 |
| non-SINE | 30 | 2358 |
So the sensitivity = 26/(26+17) = 0.60 and the precision = 26/(26+30) = 0.46. This method does not seem to perform well compared with AnnoSINE.
from tesorter.
@zhangrengang, the SINE sequences were re-curated by the AnnoSINE authors and updated in the rice library: https://github.com/oushujun/riceTElib
About 200 new SINE sequences were added to the library, and some helitron sequences containing SINE fragments were cleaned. Can you incorporate this new library into TEsorter? Thank you.
Shujun
from tesorter.
@oushujun, I have added the library v7.0.0 to the folder TEsorter/test/. With this library, the summary of output is as follows:
| | Positive | Negative |
|---|---|---|
| SINE | 230 | 9 |
| non-SINE | 32 | 2356 |
So the sensitivity = 230/(230+9) = 0.96 and the precision = 230/(230+32) = 0.88.
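
The sensitivity and precision figures quoted in this thread follow directly from the two confusion tables. A minimal Python sketch that reproduces them (the helper name `sine_metrics` is hypothetical and not part of TEsorter):

```python
def sine_metrics(tp, fn, fp):
    """Return (sensitivity, precision) from confusion-table counts.

    tp: annotated SINEs classified as SINE
    fn: annotated SINEs not classified as SINE
    fp: non-SINE entries classified as SINE
    """
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return sensitivity, precision


# Counts copied from the two tables in this thread.
tables = {
    "rice6.9.5": (26, 17, 30),
    "v7.0.0": (230, 9, 32),
}

for label, (tp, fn, fp) in tables.items():
    sens, prec = sine_metrics(tp, fn, fp)
    print(f"{label}: sensitivity={sens:.2f} precision={prec:.2f}")
```

The true-negative cell does not enter either metric, so a correction to that cell leaves both values unchanged.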
The 32 false positive cases are listed below. They can be controlled with a stricter e-value cutoff (e.g. tightening the threshold to 1e-6).
Os0190_INT#LTR/Copia TEsorter CDS 805 912 0.1 + . ID=Os0190_INT#LTR/Copia|SINE1_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=23.7;evalue=0.0001;probability=0.76
Os0222#DNAnona/MULE TEsorter CDS 188 321 0.08 + . ID=Os0222#DNAnona/MULE|SINE-1_ECa;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=38.5;evalue=1.3e-05;probability=0.44
Os0343#DNAnona/CACTA TEsorter CDS 516 602 0.04 + . ID=Os0343#DNAnona/CACTA|SINE2-1a_SBi;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=32.7;evalue=0.00054;probability=0.71
Os0521#DNAnona/hAT TEsorter CDS 1 101 0.15 + . ID=Os0521#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=58.7;evalue=9.6e-06;probability=0.7
Os0545#DNAnona/hAT TEsorter CDS 1 91 0.2 + . ID=Os0545#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=67.8;evalue=9.6e-08;probability=0.78
Os0563#DNAnona/hAT TEsorter CDS 119 264 0.09 + . ID=Os0563#DNAnona/hAT|SINE1_SO;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=59.7;evalue=8.5e-06;probability=0.53
Os0623#DNAnona/MULE TEsorter CDS 267 386 0.1 + . ID=Os0623#DNAnona/MULE|SINE1_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=58.3;evalue=0.00014;probability=0.54
Os0701#DNAnona/hAT TEsorter CDS 1 176 0.08 + . ID=Os0701#DNAnona/hAT|SINE1_SO;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=81.0;evalue=4.4e-05;probability=0.5
Os0848#DNAnona/MULE TEsorter CDS 216 335 0.09 + . ID=Os0848#DNAnona/MULE|SINE1_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=55.8;evalue=0.00035;probability=0.68
Os0902#DNAnona/hAT TEsorter CDS 1 100 0.16 + . ID=Os0902#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=59.5;evalue=3.7e-06;probability=0.65
Os0909#MITE/Stow TEsorter CDS 59 151 0.08 + . ID=Os0909#MITE/Stow|SINE2-3_OS;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=25.1;evalue=0.00014;probability=0.63
Os0926#LINE/unknown TEsorter CDS 1092 1222 0.03 + . ID=Os0926#LINE/unknown|AtSB6;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=22.8;evalue=0.00087;probability=0.81
Os0986#DNAnona/MULE TEsorter CDS 251 344 0.05 + . ID=Os0986#DNAnona/MULE|SINE2-1_SBi;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=22.0;evalue=7e-05;probability=0.91
Os0987#DNAnona/hAT TEsorter CDS 48 157 0.06 + . ID=Os0987#DNAnona/hAT|SINE1_SO;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=42.7;evalue=0.00044;probability=0.78
Os1087#DNAnona/Helitron TEsorter CDS 43 130 0.12 + . ID=Os1087#DNAnona/Helitron|SINE16_OS;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=41.6;evalue=0.00052;probability=0.77
Os1103#DNAnona/hAT TEsorter CDS 2 101 0.13 + . ID=Os1103#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=70.2;evalue=7.7e-05;probability=0.63
Os1106#DNAnona/Tourist TEsorter CDS 157 336 0.03 + . ID=Os1106#DNAnona/Tourist|SINE-1_ATr;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=31.7;evalue=0.00078;probability=0.43
Os1181#DNAnona/hAT TEsorter CDS 7 105 0.11 + . ID=Os1181#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=71.1;evalue=0.00022;probability=0.77
Os1296#DNAnona/hAT TEsorter CDS 4 112 0.12 + . ID=Os1296#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=76.9;evalue=0.00013;probability=0.63
Os1418#DNAnona/hAT TEsorter CDS 2 104 0.1 + . ID=Os1418#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=58.7;evalue=0.00059;probability=0.64
Os1523#DNAauto/hAT TEsorter CDS 1 148 0.06 + . ID=Os1523#DNAauto/hAT|SINE1_SO;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=62.6;evalue=0.0006;probability=0.53
Os1615#DNAnona/hAT TEsorter CDS 1 158 0.11 + . ID=Os1615#DNAnona/hAT|SINE1_SO;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=53.6;evalue=2.1e-07;probability=0.63
Os2008#DNAnona/hAT TEsorter CDS 3 91 0.13 + . ID=Os2008#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=58.7;evalue=4.6e-05;probability=0.75
Os2057#DNAnona/hAT TEsorter CDS 144 304 0.06 + . ID=Os2057#DNAnona/hAT|BoSB13;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=25.9;evalue=0.00041;probability=0.84
Os2523#DNAnona/hAT TEsorter CDS 1 112 0.08 + . ID=Os2523#DNAnona/hAT|SINE1_SO;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=67.3;evalue=4e-05;probability=0.56
Os2861#DNAnona/CACTG TEsorter CDS 21 145 0.08 + . ID=Os2861#DNAnona/CACTG|SINE1_SO;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=58.3;evalue=4.5e-05;probability=0.51
Os3087_INT#LTR/Gypsy TEsorter CDS 677 823 0.12 + . ID=Os3087_INT#LTR/Gypsy|SINE1_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=76.9;evalue=2.2e-05;probability=0.45
Os3264#DNAauto/MLE TEsorter CDS 165 247 0.08 + . ID=Os3264#DNAauto/MLE|BoSB14A;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=26.3;evalue=0.00034;probability=0.74
Os3423#DNAnona/hAT TEsorter CDS 1050 1189 0.09 + . ID=Os3423#DNAnona/hAT|SINE1_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=78.2;evalue=0.00054;probability=0.75
Os3507_LTR#LTR/Gypsy TEsorter CDS 31 229 0.03 + . ID=Os3507_LTR#LTR/Gypsy|SINE2-1_OS;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=40.3;evalue=0.001;probability=0.71
Os3527_ICR#DNAnona/hAT TEsorter CDS 1 100 0.16 + . ID=Os3527_ICR#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=70.2;evalue=3.5e-06;probability=0.72
Os3565#DNAnona/hAT TEsorter CDS 252 474 0.03 + . ID=Os3565#DNAnona/hAT|SINE2-1_STu;Name=SINE-SINE;gene=SINE;clade=SINE;coverage=34.7;evalue=1.2e-05;probability=0.53
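
To gauge the effect of a stricter e-value cutoff on hits like these, the `evalue` attribute can be read from the ninth GFF3 column and compared with the threshold. The sketch below is a post-hoc filter on output lines, assuming whitespace-separated columns and the `key=value;` attribute layout shown above. The function names `parse_attributes` and `passes_evalue` are hypothetical and are not TEsorter options.

```python
def parse_attributes(gff_line):
    """Parse the 9th GFF3 column (key=value pairs joined by ';') into a dict."""
    attr_field = gff_line.split()[8]
    return dict(pair.split("=", 1) for pair in attr_field.split(";") if pair)


def passes_evalue(gff_line, max_evalue=1e-6):
    """Return True if the hit's e-value is at or below the cutoff."""
    return float(parse_attributes(gff_line)["evalue"]) <= max_evalue


# Three of the false-positive records listed above.
hits = [
    "Os0190_INT#LTR/Copia TEsorter CDS 805 912 0.1 + . "
    "ID=Os0190_INT#LTR/Copia|SINE1_MT;Name=SINE-SINE;gene=SINE;clade=SINE;"
    "coverage=23.7;evalue=0.0001;probability=0.76",
    "Os0545#DNAnona/hAT TEsorter CDS 1 91 0.2 + . "
    "ID=Os0545#DNAnona/hAT|SHANSINE_MT;Name=SINE-SINE;gene=SINE;clade=SINE;"
    "coverage=67.8;evalue=9.6e-08;probability=0.78",
    "Os1615#DNAnona/hAT TEsorter CDS 1 158 0.11 + . "
    "ID=Os1615#DNAnona/hAT|SINE1_SO;Name=SINE-SINE;gene=SINE;clade=SINE;"
    "coverage=53.6;evalue=2.1e-07;probability=0.63",
]

# Keep only the sequence IDs whose hit survives the 1e-6 cutoff.
kept = [h.split()[0] for h in hits if passes_evalue(h)]
print(kept)
```

Of the 32 records listed, only Os0545 (9.6e-08) and Os1615 (2.1e-07) have e-values at or below 1e-6, so such a cutoff would drop the other 30. Its effect on the true positives is not shown in this thread.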
from tesorter.
@zhangrengang Thank you for the prompt update! The non-SINE negative category should not be that high. The v7 library only has 2627 sequences. Can you double-check? Thanks.
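
One quick way to double-check a table like this is to sum its four cells and compare the total with the library size. A small sketch using the v7.0.0 counts as they currently appear in this thread, after the revision to 2356 (the variable names are illustrative):

```python
# Confusion-table cells for library v7.0.0, as posted in this thread.
cells = {
    "SINE/positive": 230,
    "SINE/negative": 9,
    "non-SINE/positive": 32,
    "non-SINE/negative": 2356,
}
library_size = 2627  # number of sequences in the v7 library, per this comment

total = sum(cells.values())
print(total, total == library_size)
```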
from tesorter.
Hi, Ren-Gang,
I wonder if this upgrade works for animal genome annotations.
Thanks.
Zhenpeng
from tesorter.
@yuzhenpeng Hi, according to AnnoSINE, these SINE sequences are derived from plants, so I think it should not work well with animals.
from tesorter.
Thank you for your response.
By the way, if I want to annotate or classify animal TEs, such as LINEs and SINEs, do you have any suggestions?
from tesorter.
@yuzhenpeng, in my opinion, the RepeatModeler + RepeatMasker pipeline is OK.
from tesorter.
But RepeatModeler does not seem to identify LINEs or SINEs accurately. Their proportion of the genome comes out a little too high or too low.
@zhangrengang
from tesorter.
@yuzhenpeng sorry, I have no better solution.
@oushujun Shujun, do you have any suggestions?
from tesorter.
Thanks. I found the solution from oushujun/EDTA#231.
But identifying SINEs still seems difficult.
from tesorter.