csu-kanghu / hite



High-precision TE Annotator

License: GNU General Public License v3.0

Python 90.92% Shell 1.35% Perl 5.17% Dockerfile 0.10% Nextflow 2.42% JavaScript 0.04%

genome-annotation high-precision transposable-elements progressive-methods

hite's People


minghuaxu

hite's Issues

Incorporating known TEs

I'm curious whether one can use a library of already curated TEs to enhance the analysis and avoid duplicating previous library work.

I have several species that we have manually curated, and I'm hoping to use HiTE. I plan to compare the HiTE libraries to our curated TEs using one of several tools, but I was wondering whether there is a built-in mechanism that would let me do this automatically.
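Whether HiTE has a built-in option for this is not stated here. A common manual approach is to align the HiTE library against the curated library and flag HiTE consensus sequences that match a curated TE under the 80-80-80 rule of Wicker et al. (2007): at least 80% identity over at least 80 bp and at least 80% of the sequence length. The sketch below assumes BLAST+ tabular output produced with a custom format such as `blastn -query hite_lib.fa -subject curated_lib.fa -outfmt "6 qseqid sseqid pident length qlen"` (file names are hypothetical). The function `redundant_with_curated` is a hypothetical helper, not part of HiTE, and it judges each alignment on its own rather than summing multiple HSPs.

```python
from typing import Iterable, Set


def redundant_with_curated(blast_lines: Iterable[str],
                           min_identity: float = 80.0,
                           min_aln_len: int = 80,
                           min_cov: float = 0.8) -> Set[str]:
    """Return HiTE query IDs that hit a curated TE under an 80-80-80 style rule.

    Assumes each line is BLAST+ tabular output with the custom columns
    "qseqid sseqid pident length qlen". Each HSP is judged on its own;
    HSPs are not summed per query/subject pair (a simplification).
    """
    redundant: Set[str] = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        qseqid, _sseqid, pident, length, qlen = line.split()[:5]
        identity, aln_len, query_len = float(pident), int(length), int(qlen)
        if (identity >= min_identity and aln_len >= min_aln_len
                and aln_len / query_len >= min_cov):
            redundant.add(qseqid)
    return redundant


# Made-up hits: HiTE_1 passes all three thresholds,
# HiTE_2 fails the 80 bp minimum alignment length.
hits = [
    "HiTE_1\tcurated_A\t92.5\t950\t1000",
    "HiTE_2\tcurated_B\t95.0\t60\t1200",
]
print(redundant_with_curated(hits))  # {'HiTE_1'}
```

Entries not in the returned set are candidates for new families relative to the curated library; the thresholds are parameters so a stricter or looser rule can be tried.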

How to deal with huge differences in the running results of different software?

Dear developer

Thank you for designing such great software; its performance and running time are excellent in several respects!

I ran HiTE and EDTA2.2 to detect TEs in an animal genome of about 2.5 Gb and got the results shown in the figure:

The total TE content seems similar, but the proportions of SINEs and LINEs are quite different. Because these two classes account for a large fraction of the genome, an almost two-fold difference confuses me. To be honest, the SINE proportion from HiTE is similar to that of previous studies, but the LINE proportion is much lower. Do I need to manually curate the TEs predicted by EDTA and HiTE? Or is there a strategy for integrating the results of the two tools to recover as many real TE sequences as possible?
In addition, I tried the RepeatModeler + RepeatMasker strategy, and the SINE and LINE proportions it produced were closer to those from EDTA2.2.

I sincerely hope to get your professional advice, which would be very helpful to me.

Best wishes
yulong

Identify intact elements of a particular TE type

Hi! Thanks for the excellent tools!

How can I get the annotation of a specific TE type (e.g. LTR or TIR), like the "Divide and conquer" part of the EDTA pipeline? Many thanks!
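As a post-processing sketch, annotation hits of one TE type can be pulled out of HiTE.out, assuming that file follows RepeatMasker's standard .out column layout (query sequence in column 5, begin/end in columns 6 and 7, strand in column 9, repeat name in column 10, class/family in column 11). TIR elements are DNA transposons, so in RepeatMasker-style labels they usually sit under a `DNA/...` class; check the labels actually present in your own HiTE.out first, since HiTE's naming may differ. `parse_rm_out` and `hits_of_type` are hypothetical helpers, and filtering by class does not by itself tell you whether a hit is an intact, full-length element.

```python
from typing import Iterable, Iterator, List, NamedTuple


class RMHit(NamedTuple):
    seq: str       # query (genome) sequence name
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    strand: str    # "+" or "-"
    repeat: str    # matching repeat (library entry) name
    te_class: str  # class/family label, e.g. "LTR/Gypsy"


def parse_rm_out(lines: Iterable[str]) -> Iterator[RMHit]:
    """Yield hits from RepeatMasker-style .out lines, skipping headers."""
    for line in lines:
        f = line.split()
        # Header and blank lines do not start with a numeric SW score.
        if len(f) < 11 or not f[0].isdigit():
            continue
        # RepeatMasker writes "C" (complement) for minus-strand hits.
        strand = "+" if f[8] == "+" else "-"
        yield RMHit(f[4], int(f[5]), int(f[6]), strand, f[9], f[10])


def hits_of_type(lines: Iterable[str], prefix: str) -> List[RMHit]:
    """Keep hits whose top-level class (text before "/") equals prefix."""
    return [h for h in parse_rm_out(lines) if h.te_class.split("/")[0] == prefix]


example = [
    "   SW   perc perc perc  query   position in query  matching  repeat",
    "score   div. del. ins.  sequence    begin   end  (left)  repeat  class/family",
    "",
    "1200 10.2 1.0 0.5 chr1 20001 21500 (100) C TE_00001 LTR/Gypsy (0) 1520 1 1",
    " 800 12.0 2.0 1.0 chr2  5001  5600 (900) + TE_00002 DNA/hAT 1 600 (0) 2",
]
for hit in hits_of_type(example, "LTR"):
    print(hit)
```

To run it on a real file, pass an open handle instead of the list, e.g. `hits_of_type(open("HiTE.out"), "DNA")`.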

Classification and insertion time of transposons

Hello, I have two questions about the HiTE output files:

First, the transposon classification names in the HiTE.tbl and HiTE.out files seem inconsistent. Where can I find the correspondence between the two naming schemes?

Second, HiTE lets me set --miu, but I did not find any statistics on transposon insertion time in the output files.

Looking forward to your response.
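For context, LTR insertion time is usually estimated from the divergence between the two LTRs of a single element as T = K / (2μ), where K is the Jukes-Cantor-corrected distance between the LTRs and μ is the neutral substitution rate per site per year. This is the standard calculation used by tools such as LTR_retriever; the sketch below only illustrates that formula and does not describe how, or whether, HiTE uses its --miu value. The function name is hypothetical, and the default μ = 1.3e-8 is only an example value; substitute a rate appropriate for your species.

```python
import math


def ltr_insertion_time(ltr_identity: float, miu: float = 1.3e-8) -> float:
    """Estimate insertion time in years from the identity of the 5' and 3' LTRs.

    ltr_identity is a fraction in (0.25, 1.0]; miu is substitutions/site/year.
    """
    p = 1.0 - ltr_identity  # proportion of differing sites
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction requires 0 <= p < 0.75")
    # Jukes-Cantor distance K = -3/4 * ln(1 - 4p/3); log1p keeps precision for small p.
    k = -0.75 * math.log1p(-4.0 * p / 3.0)
    return k / (2.0 * miu)


# Two LTRs that are 99% identical date to roughly 0.39 million years
# with miu = 1.3e-8.
print(f"{ltr_insertion_time(0.99) / 1e6:.2f} Mya")
```

The factor of 2 reflects that both LTRs start identical at insertion and accumulate substitutions independently afterwards.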

Output files

Hello,

I've just tried HiTE, on a single chromosome for now, and am now trying to run it on whole assemblies.

I would like a bit more information on the output files:

Am I right that the output file confident_TE.cons.fa.classified is the final, most complete set of TEs found?

However, it doesn't give the coordinates on the genome the way confident_TE.cons.fa.domain does.

I would like the coordinates of the TEs on the genome.

Best wishes,

Isabella
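One way to get genome coordinates is from the genome annotation file rather than from the library FASTA. Assuming HiTE.out follows RepeatMasker's .out layout (not confirmed here), each hit can be converted to a BED line for use in genome browsers or interval tools. `rm_out_to_bed` is a hypothetical helper; the repeat name and class are joined with `#` only as a naming convention for the BED name column.

```python
from typing import Iterable, List


def rm_out_to_bed(lines: Iterable[str]) -> List[str]:
    """Convert RepeatMasker-style .out lines to BED6 lines.

    RepeatMasker coordinates are 1-based and inclusive; BED is 0-based and
    half-open, so only the start is shifted by one.
    """
    bed: List[str] = []
    for line in lines:
        f = line.split()
        if len(f) < 11 or not f[0].isdigit():
            continue  # skip the header block and blank lines
        chrom, start, end = f[4], int(f[5]), int(f[6])
        strand = "+" if f[8] == "+" else "-"  # "C" marks the minus strand
        name = f"{f[9]}#{f[10]}"  # e.g. "TE_00001#LTR/Gypsy"
        # BED scores should lie in 0-1000, so 0 is written instead of the SW score.
        bed.append(f"{chrom}\t{start - 1}\t{end}\t{name}\t0\t{strand}")
    return bed


line = "1200 10.2 1.0 0.5 chr1 20001 21500 (100) C TE_00001 LTR/Gypsy (0) 1520 1 1"
print(rm_out_to_bed([line])[0])
```

For a whole file, pass an open handle, e.g. `rm_out_to_bed(open("HiTE.out"))`, and write the returned lines to a `.bed` file.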

Constructing a panTElib

Dear Professor Hukang,

I am delighted to see that your work provides a pipeline capable of accurately identifying full-length TEs. I have also been troubled by the overly fragmented TE annotations produced by earlier software such as EDTA2. However, I believe EDTA2 has its advantages, such as its panEDTA pipeline, which builds a TE library at the pangenome level. This allows a single library to be used to annotate multiple genomes, which facilitates comparison and analysis.

I would like to know whether it is possible to use the panEDTA pipeline to cluster the libraries HiTE constructs for multiple genomes, thereby generating a panTElib. Perhaps you could consider adding this functionality in a future update? Additionally, I noticed that your article and the peer-review comments highlight the high annotation accuracy of your pipeline. However, focusing only on full-length TEs may overlook the many non-full-length TEs that are still abundant in the genome. Could I combine your annotation results with the more comprehensive results from EDTA2 to achieve a more complete and accurate annotation?

Thank you again for your work. Best wishes!
yfchen

Are there any summary statistics?

Hi, thanks for this software. Are there any summary statistics, such as the percentage of each TE type, or the total TE percentage of the whole genome?
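If HiTE.tbl is a RepeatMasker-style summary table, as its name suggests, it may already report these percentages. As a cross-check, the sketch below recomputes them from HiTE.out, assuming that file uses RepeatMasker's .out column layout. Overlapping hits are merged before counting so no base is counted twice, both within a class and in the overall total; `merged_length` and `te_percentages` are hypothetical helpers, and `genome_size` is the total assembly length you supply.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def merged_length(intervals: List[Tuple[int, int]]) -> int:
    """Total bp covered by 1-based inclusive intervals, counting overlaps once."""
    total = 0
    cur_start, cur_end = 0, -1  # empty sentinel interval
    for start, end in sorted(intervals):
        if start > cur_end:  # no overlap with the current run: close it
            total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    return total + (cur_end - cur_start + 1)


def te_percentages(lines: Iterable[str], genome_size: int) -> Dict[str, float]:
    """Percent of the genome covered by each top-level class, plus "Total"."""
    by_class: Dict[str, Dict[str, List[Tuple[int, int]]]] = defaultdict(
        lambda: defaultdict(list))
    for line in lines:
        f = line.split()
        if len(f) < 11 or not f[0].isdigit():
            continue  # skip header and blank lines
        top_class = f[10].split("/")[0]  # "LTR/Gypsy" -> "LTR"
        by_class[top_class][f[4]].append((int(f[5]), int(f[6])))

    percentages: Dict[str, float] = {}
    all_hits: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for top_class, per_seq in by_class.items():
        covered = 0
        for seq, intervals in per_seq.items():
            covered += merged_length(intervals)
            all_hits[seq].extend(intervals)
        percentages[top_class] = 100.0 * covered / genome_size
    # Merge across classes so bases hit by two classes count once in the total.
    total = sum(merged_length(iv) for iv in all_hits.values())
    percentages["Total"] = 100.0 * total / genome_size
    return percentages


example = [
    "500 5.0 0.0 0.0 chr1 100 199 (801) + TE_1 LTR/Gypsy 1 100 (0) 1",
    "500 5.0 0.0 0.0 chr1 150 249 (751) + TE_2 LTR/Copia 1 100 (0) 2",
    "400 8.0 0.0 0.0 chr1 300 399 (601) + TE_3 DNA/hAT 1 100 (0) 3",
]
print(te_percentages(example, genome_size=1000))
# {'LTR': 15.0, 'DNA': 10.0, 'Total': 25.0}
```

Non-TE classes such as Simple_repeat, if present in the file, would show up as their own keys and also count toward "Total"; filter those lines out first if you want TE-only figures.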

How is the annotation performance on large genomes (>10 Gb)?

I would like to use the pipeline on a large plant genome. Should I run it separately on each chromosome or directly on the entire genome? Are there any CPU and RAM requirements? Have you tested it on a large genome? Thanks!!
