Comments (3)
Yes, a mismatch in sequence IDs between the input FASTA and a preexisting hmmscan output file (*.domtbl) can cause this issue. It can be avoided by deleting the *.domtbl file or by using the -fw option. I will add a warning for this issue.
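To check whether a leftover *.domtbl really belongs to the current input, something like the following rough sketch may help. It is not part of TEsorter. It assumes the standard hmmscan `--domtblout` layout, where the query name is the fourth whitespace-separated column, and it further assumes that translated-frame IDs have the form `<sequence ID>|<frame tag>`. That second assumption is a guess about the naming, so adjust it if your IDs look different. The function names are hypothetical.

```python
def fasta_ids(fasta_path):
    """Return the set of sequence IDs (first word after '>') in a FASTA file."""
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def domtbl_query_ids(domtbl_path):
    """Return the set of query names (4th column) in an hmmscan --domtblout file."""
    queries = set()
    with open(domtbl_path) as handle:
        for line in handle:
            # Comment lines start with '#'; skip them and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.split()
            if len(fields) > 3:
                queries.add(fields[3])
    return queries


def unmatched_queries(fasta_path, domtbl_path):
    """Return domtbl query names that cannot be traced back to a FASTA ID."""
    ids = fasta_ids(fasta_path)
    unmatched = set()
    for query in domtbl_query_ids(domtbl_path):
        # Assumption: translated IDs look like "<seq_id>|<frame tag>",
        # so everything before the last '|' should be an input ID.
        base = query.rsplit("|", 1)[0]
        if query not in ids and base not in ids:
            unmatched.add(query)
    return unmatched
```

If `unmatched_queries("test.fa", "path/to/existing.domtbl")` returns a non-empty set, the table was probably produced from a different input and should be deleted or regenerated with -fw. The file paths here are placeholders.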
from tesorter.
Hi Shujun,
Please update TEsorter (the latest conda version is v1.4.6). With version 1.4, it works:
2023-05-03 15:37:42,995 -INFO- Command: /media/40T/wlx/zrg/share/home/app/.local/python3/bin/TEsorter test.fa
2023-05-03 15:37:42,996 -INFO- VARS: {'sequence': 'test.fa', 'hmm_database': 'rexdb', 'db_hmm': None, 'seq_type': 'nucl', 'prefix': None, 'force_write_hmmscan': False, 'processors': 4, 'tmp_dir': './tmp-63472734-e985-11ed-8106-4cd98fb9bbe7', 'min_coverage': 20, 'max_evalue': 0.001, 'min_probability': 0.5, 'no_cleanup': False, 'disable_pass2': False, 'pass2_rule': '80-80-80', 'no_library': False, 'no_reverse': False, 'genome': False, 'win_size': 270000, 'win_ovl': 30000, 'p2_identity': 80.0, 'p2_coverage': 80.0, 'p2_length': 80.0}
2023-05-03 15:37:42,996 -INFO- checking dependencies:
2023-05-03 15:37:43,098 -INFO- hmmer 3.3 OK
2023-05-03 15:37:44,391 -INFO- blastn 2.13.0+ OK
2023-05-03 15:37:44,392 -INFO- check database /media/40T/wlx/zrg/share/home/app/.local/python3/lib/python3.6/site-packages/TEsorter-1.4.1-py3.6.egg/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2023-05-03 15:37:44,392 -INFO- db file: /media/40T/wlx/zrg/share/home/app/.local/python3/lib/python3.6/site-packages/TEsorter-1.4.1-py3.6.egg/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm
2023-05-03 15:37:44,392 -INFO- REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm OK
2023-05-03 15:37:44,392 -INFO- Start classifying pipeline (ELEMENT mode)
2023-05-03 15:37:44,472 -INFO- total 1 sequences
2023-05-03 15:37:44,473 -INFO- HMM scanning against `/media/40T/wlx/zrg/share/home/app/.local/python3/lib/python3.6/site-packages/TEsorter-1.4.1-py3.6.egg/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm`
2023-05-03 15:37:44,521 -INFO- Start Pool with 4 process(es)
2023-05-03 15:37:44,522 -INFO- translating `./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7/chunk.1.fasta` in six frames
/media/40T/wlx/zrg/share/home/app/.local/python3/lib/python3.6/site-packages/Bio/Seq.py:2609: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
BiopythonWarning)
2023-05-03 15:37:44,627 -INFO- Start Pool with 4 process(es)
2023-05-03 15:37:44,628 -INFO- run CMD: `hmmscan --nobias --notextw --noali --domtblout ./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7/chunk.1.fasta.aa.domtbl /media/40T/wlx/zrg/share/home/app/.local/python3/lib/python3.6/site-packages/TEsorter-1.4.1-py3.6.egg/TEsorter/database/REXdb_protein_database_viridiplantae_v3.0_plus_metazoa_v3.hmm ./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7/chunk.1.fasta.aa > /dev/null`
2023-05-03 15:37:44,829 -INFO- generating gene anntations
2023-05-03 15:37:44,830 -INFO- 1 sequences classified by HMM
2023-05-03 15:37:44,830 -INFO- see protein domain sequences in `test.fa.rexdb.dom.faa` and annotation gff3 file in `test.fa.rexdb.dom.gff3`
2023-05-03 15:37:44,831 -INFO- classifying the unclassified sequences by searching against the classified ones
2023-05-03 15:37:44,886 -INFO- using the 80-80-80 rule
2023-05-03 15:37:44,886 -INFO- run CMD: `makeblastdb -in ./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7/pass1_classified.fa -dbtype nucl -out ./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7/pass1_classified.fa`
2023-05-03 15:37:46,226 -INFO- run CMD: `blastn -query ./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7/pass1_unclassified.fa -db ./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7/pass1_classified.fa -out ./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7/pass1_unclassified.fa.blastout -outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen qcovs qcovhsp sstrand' -num_threads 4 `
2023-05-03 15:37:47,403 -INFO- 0 sequences classified in pass 2
2023-05-03 15:37:47,404 -INFO- total 1 sequences classified.
2023-05-03 15:37:47,404 -INFO- see classified sequences in `test.fa.rexdb.cls.tsv`
2023-05-03 15:37:47,404 -INFO- writing library for RepeatMasker in `test.fa.rexdb.cls.lib`
2023-05-03 15:37:47,404 -INFO- writing classified protein domains in `test.fa.rexdb.cls.pep`
2023-05-03 15:37:47,404 -INFO- Summary of classifications:
Order    Superfamily    # of Sequences    # of Clade Sequences    # of Clades    # of full Domains
DIRS     unknown        1                 0                       0              0
2023-05-03 15:37:47,404 -INFO- cleaning the temporary directory ./tmp-63472734-e985-11ed-8106-4cd98fb9bbe7
2023-05-03 15:37:47,405 -INFO- Pipeline done.
I think the issue was caused by preexisting TEsorter results whose file names matched the input name even though the input content had since changed, so the existing results no longer matched the new content. After deleting those results, TEsorter runs normally.
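One way to spot this situation before rerunning is to look for result files that are older than the input. The sketch below is only a heuristic and not something TEsorter provides. It assumes results are named `<input>.<database>.*`, as in the `test.fa.rexdb.cls.tsv` style names shown in the log above, and it treats an older modification time as a sign that the file may be stale. The function name and the `db_name` default are my own choices.

```python
import glob
import os


def outputs_older_than_input(fasta_path, db_name="rexdb"):
    """List existing '<fasta>.<db>.*' files last modified before the FASTA was."""
    input_mtime = os.path.getmtime(fasta_path)
    # Escape the prefix so characters such as '[' in a path are taken literally.
    pattern = glob.escape(f"{fasta_path}.{db_name}.") + "*"
    return sorted(
        path
        for path in glob.glob(pattern)
        if os.path.getmtime(path) < input_mtime
    )
```

Any files it reports are candidates to review and delete by hand. Modification times can be misleading (for example after copying files), so this is a hint rather than proof.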