csu-kanghu / hite
High-precision TE Annotator
License: GNU General Public License v3.0
I'm curious as to whether one can use a library of already curated TEs to enhance the analysis and eliminate duplication with previous library work.
I have several species that we have manually curated and I'm hoping to use HiTE. I plan to compare the HiTE libraries to our curated TEs using any of several tools but was wondering if there is a mechanism built in that would allow me to do this automatically.
Dear developer
Thank you for designing such great software, which performs very well in both accuracy and running time!
When I performed TE detection on an animal genome of about 2.5 Gb, I ran both HiTE and EDTA2.2 and got the results shown in the figure:
The total number of TEs seems similar, but the proportions of SINEs and LINEs are quite different. Because these two TE types account for a large proportion of the genome, this nearly twofold difference confuses me. To be honest, the SINE proportion from HiTE is similar to that of previous studies, but its LINE proportion is much lower. Do I need to manually curate the TEs predicted by EDTA and HiTE? Or is there a strategy to integrate the results of the two tools to help me get as many real TE sequences as possible?
In addition, I used the RepeatModeler+RepeatMasker strategy, and the SINE and LINE proportions obtained were closer to those of EDTA2.2.
I sincerely hope to get your professional advice, which will be very helpful to me.
Best wishes
yulong
Hi! Thanks for the excellent tools!
How can I get the annotation of a specified TE type (e.g. LTR or TIR), similar to the "Divide and conquer" part of the EDTA pipeline? Many thanks!
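One possible workaround, offered only as a sketch: if the classified library uses RepeatMasker-style FASTA headers of the form `>name#Class/Family` (an assumption about the header format, not something confirmed in this thread), the consensus sequences of one class can be pulled out with a few lines of Python. The function name `filter_fasta_by_class` is hypothetical, and the class label to pass in depends on the naming used in the library.

```python
def filter_fasta_by_class(lines, wanted_class):
    """Keep FASTA records whose header class matches wanted_class.

    Assumes headers look like '>name#Class/Family' (RepeatMasker-style).
    Records without a '#' label are dropped.
    """
    kept = []
    keep = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Take the first token of the header and split off the label
            header = line[1:].split()[0] if len(line) > 1 else ""
            _, _, label = header.partition("#")
            # The class is the part before the first '/'
            keep = label.split("/")[0] == wanted_class
        if keep:
            kept.append(line)
    return kept


# Small in-memory example; a real run would read the library file instead
example = [
    ">TE_1#LTR/Gypsy", "ACGT",
    ">TE_2#DNA/hAT", "GGCC",
    ">TE_3#LTR/Copia", "TTAA",
]
ltr_records = filter_fasta_by_class(example, "LTR")
```

The filtered library could then be used as a custom library for a separate masking run, which gives a per-class annotation without changing the main pipeline.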
Hello, I have two questions about the HiTE output files:
First, the naming of transposon classifications in the HiTE.tbl and HiTE.out files seems to be inconsistent. Where can I find the correspondence between the two classification schemes?
Second, HiTE accepts the --miu parameter, but I did not find any statistics on transposon insertion time in the output files.
Looking forward to your response.
Hello,
I've just tried HiTE on a single chromosome, and I am now trying to run it on whole assemblies.
I would like a bit more information on the output files:
Am I right that the output file confident_TE.cons.fa.classified is the final, most complete set of TEs found?
However, it doesn't give the coordinates on the genome the way confident_TE.cons.fa.domain does.
I would like the coordinates of the TEs on the genome.
Best wishes,
Isabella
Dear Professor Hukang,
I am delighted to see that your work has developed a pipeline capable of accurately identifying full-length TEs. I have also been troubled by the overly fragmented TEs annotated by previous software like EDTA2. However, I believe EDTA2 has its advantages, such as their panEDTA pipeline, which allows the construction of a TE library at the pangenome level. This enables the use of a single library to annotate multiple genomes, facilitating comparison and analysis.
I would like to know if it is possible to use the panEDTA pipeline to cluster libraries constructed by HiTE for multiple genomes, thereby generating a panTElib. Perhaps you could consider adding this functionality in future updates? Additionally, I noticed that your article and the peer review comments highlight the high annotation accuracy of your pipeline. However, focusing only on full-length TEs may overlook many non-full-length TEs that are still abundant in the genome. Could I combine your annotation results with the more comprehensive results from EDTA2 to achieve a more complete and accurate annotation?
Thank you again for your work. Best wishes!
yfchen
Hi, thanks for this software. Are there any summary statistics in the output, such as the percentage of each TE type or the total TE percentage of the whole genome?
I would like to use the pipeline on a large plant genome. Would it be better to run it separately on each chromosome or directly on the entire genome? Are there any CPU or RAM requirements? Have you ever tested it on a large genome? Thanks!!
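As a rough way to get such numbers by hand, the sketch below sums annotated base pairs per TE class from a RepeatMasker-style .out table and expresses them as a percentage of the genome. It assumes the annotation file follows the standard RepeatMasker column layout (query begin and end in columns 6 and 7, class/family in column 11); the function name `summarize_out` and the genome size are placeholders. Overlapping annotations are counted twice, so the result is an upper estimate rather than an exact figure.

```python
from collections import defaultdict


def summarize_out(lines, genome_size):
    """Return {TE class: percent of genome} from RepeatMasker-style rows.

    Assumes whitespace-separated columns in the RepeatMasker .out layout.
    Overlapping hits are not merged, so totals can be overestimated.
    """
    bp_by_class = defaultdict(int)
    for line in lines:
        fields = line.split()
        # Data rows start with an integer score; skip headers and blanks
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])
        te_class = fields[10].split("/")[0]  # e.g. 'LTR' from 'LTR/Gypsy'
        bp_by_class[te_class] += end - begin + 1
    return {c: 100.0 * bp / genome_size for c, bp in bp_by_class.items()}


# Tiny made-up table for illustration only (not real HiTE output)
sample = [
    "   SW   perc perc perc  query     position in query    matching  repeat",
    "score   div. del. ins.  sequence  begin end   (left)   repeat    class/family",
    "",
    "  500   10.0  0.0  0.0  chr1  1    100  (900)  +  TE_1  LTR/Gypsy  1    100  (0)  1",
    "  300   5.0   0.0  0.0  chr1  201  250  (750)  C  TE_2  DNA/hAT    (0)  50   1    2",
]
percent_by_class = summarize_out(sample, genome_size=1000)
total_percent = sum(percent_by_class.values())
```

For a real genome, `genome_size` would be the total assembly length in base pairs, and the lines would come from reading the annotation file.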