
taxonkit's Introduction

TaxonKit - A Practical and Efficient NCBI Taxonomy Toolkit

Related projects:

  • Taxid-Changelog: Tracks all changes to TaxIds, including deletions, additions, merges, reuse, and rank/name changes.
  • GTDB taxdump: GTDB taxonomy taxdump files with trackable TaxIds.
  • ICTV taxdump: NCBI-style taxdump files for International Committee on Taxonomy of Viruses (ICTV)

Features

Subcommands

Subcommand        Function
list              List taxonomic subtrees (TaxIds) below given TaxIds
lineage           Query taxonomic lineage of given TaxIds
reformat          Reformat lineage in canonical ranks
name2taxid        Convert taxon names to TaxIds
filter            Filter TaxIds by taxonomic rank range
lca               Compute lowest common ancestor (LCA) for TaxIds
taxid-changelog   Create TaxId changelog from dump archives
profile2cami*     Convert metagenomic profile table to CAMI format
cami-filter*      Remove taxa of given TaxIds and their descendants in CAMI metagenomic profile
create-taxdump*   Create NCBI-style taxdump files for custom taxonomy, e.g., GTDB and ICTV

Note: *New commands since the publication.


Benchmark

  1. Getting complete lineage for given TaxIds

    Versions: ETE=3.1.2, taxopy=0.5.0 (faster since 0.6.0), TaxonKit=0.7.2.

Dataset

  1. Download and uncompress taxdump.tar.gz: ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
  2. Copy names.dmp, nodes.dmp, delnodes.dmp and merged.dmp to the data directory $HOME/.taxonkit, e.g., /home/shenwei/.taxonkit.
  3. Optionally copy the files to another directory and point to it later with the flag --data-dir or the environment variable TAXONKIT_DB.

All-in-one command:

wget -c ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz 
tar -zxvf taxdump.tar.gz

mkdir -p $HOME/.taxonkit
cp names.dmp nodes.dmp delnodes.dmp merged.dmp $HOME/.taxonkit

Update dataset: Simply re-download the taxdump files, uncompress them, and overwrite the old ones.

Installation

Go to Download Page for more download options and changelogs.

TaxonKit is implemented in the Go programming language. Executable binary files for most popular operating systems are freely available on the release page.

Method 1: Download binaries (latest stable/dev version)

Download the compressed executable file for your operating system and uncompress it with the tar -zxvf *.tar.gz command or other tools. Then:

  1. For Linux-like systems

    1. If you have root privilege simply copy it to /usr/local/bin:

       sudo cp taxonkit /usr/local/bin/
      
    2. Or copy to anywhere in the environment variable PATH:

       mkdir -p $HOME/bin/; cp taxonkit $HOME/bin/
      
  2. For Windows, just copy taxonkit.exe to C:\WINDOWS\system32.

Method 2: Install via conda (latest stable version)

conda install -c bioconda taxonkit

Method 3: Install via homebrew (out of date)

brew install brewsci/bio/taxonkit

Method 4: Compile from source (latest stable/dev version)

  1. Install go

     wget https://go.dev/dl/go1.17.13.linux-amd64.tar.gz
    
     tar -zxf go1.17.13.linux-amd64.tar.gz -C $HOME/
    
     # or 
     #   echo "export PATH=$PATH:$HOME/go/bin" >> ~/.bashrc
     #   source ~/.bashrc
     export PATH=$PATH:$HOME/go/bin
    
  2. Compile TaxonKit

     # ------------- the latest stable version -------------
    
     go get -v -u github.com/shenwei356/taxonkit/taxonkit
    
     # The executable binary file is located in:
     #   ~/go/bin/taxonkit
     # You can also move it to anywhere in the $PATH
     mkdir -p $HOME/bin
     cp ~/go/bin/taxonkit $HOME/bin/
    
    
     # --------------- the development version --------------
    
     git clone https://github.com/shenwei356/taxonkit
     cd taxonkit/taxonkit/
     go build
    
     # The executable binary file is located in:
     #   ./taxonkit
     # You can also move it to anywhere in the $PATH
     mkdir -p $HOME/bin
     cp ./taxonkit $HOME/bin/
    

Bash-completion

Supported shell: bash|zsh|fish|powershell

Bash:

# generate completion shell
taxonkit genautocomplete --shell bash

# configure if never did.
# install bash-completion if the "complete" command is not found.
echo "for bcfile in ~/.bash_completion.d/* ; do source \$bcfile; done" >> ~/.bash_completion
echo "source ~/.bash_completion" >> ~/.bashrc

Zsh:

# generate completion shell
taxonkit genautocomplete --shell zsh --file ~/.zfunc/_taxonkit

# configure if never did
echo 'fpath=( ~/.zfunc "${fpath[@]}" )' >> ~/.zshrc
echo "autoload -U compinit; compinit" >> ~/.zshrc

fish:

taxonkit genautocomplete --shell fish --file ~/.config/fish/completions/taxonkit.fish

Citation

If you use TaxonKit in your work, please cite:

Shen, W., Ren, H., TaxonKit: a practical and efficient NCBI Taxonomy toolkit, Journal of Genetics and Genomics, https://doi.org/10.1016/j.jgg.2021.03.006

Contact

Create an issue to report bugs, propose new functions or ask for help.

License

MIT License


taxonkit's People

Contributors

apcamargo, shenwei356, tolot27


taxonkit's Issues

taxid-changelog bug: 10842

10842    2014-08-01   NEW                           Microvirus                  genus
10842    2016-11-01   L_CHANGE_LEN                  Microvirus                  genus
10842    2019-01-01   DELETE                        Microvirus                  genus
10842    2019-05-01   L_CHANGE_LEN                  Microvirus                  genus

Adding and removing nodes

Prerequisites

  • make sure you are using the latest version by taxonkit version
    • Version 0.3.0
  • read the usage

Describe your issue

  • describe the problem
    • I was wondering if you could add new subcommands for adding/removing nodes. I am suggesting the subcommands 'touch' and 'rm' based on unix executables.
  • provide a reproducible example
    • taxonkit touch 662 "Vibrio newspecies" #=> adds a new species under taxon id 662 with scientific name "Vibrio newspecies." Prints the new taxon id.
    • taxonkit rm 662 #=> deletes node 662 and all nodes under it. Prints the number of nodes deleted.

format options in lineage command

It would be great if the lineage command supported the same format options as the reformat command. That would avoid a second pipe and save processing time, especially for large datasets.

JSON output is malformed when running `taxonkit list`

Adding the --json flag to the example included in the documentation for taxonkit list produces output that is not well-formed JSON.

$ taxonkit list --json --ids 9605,239934 > result.json

This can be confirmed with a JSON linter like https://jsonlint.com/ or, as I discovered it, with Python's JSON parser.

>>> import json
>>> with open('result.json') as fh:
...   result = json.load(fh)
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/standage/anaconda3/envs/taxonkit/lib/python3.7/json/__init__.py", line 296, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/home/standage/anaconda3/envs/taxonkit/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/standage/anaconda3/envs/taxonkit/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/standage/anaconda3/envs/taxonkit/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 13 column 5 (char 149)
>>>


Inconsistent "taxonkit reformat" output for 446045

Hi @shenwei356, I just upgraded to 0.6.1 and I found some unexpected behavior when querying the lineage for taxid 446045 (Drosophila serrata species complex). The full lineage from taxonkit lineage is consistent and correct, but the abbreviated lineage from taxonkit reformat is inconsistent. The final taxon in the abbreviated lineage switches between 7215 (the correct genus), 32281 (a subgenus), and 2081351 (a totally unrelated genus that coincidentally shares the same name).

$ for i in {1..6}; do echo 446045 | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name -d / | taxonkit reformat --lineage-field 3 --show-lineage-taxids -d /; done
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;2081351;
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;32281;
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;2081351;
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;


Now in Brew

I made this work:

brew install brewsci/bio/taxonkit

Can I get kingdom using taxonkit?

The help message says reformat can only output superkingdom. Can I get kingdom info using taxonkit?

Output format can be formated by flag --format, available placeholders:

    {k}: superkingdom
    {p}: phylum
    {c}: class
    {o}: order
    {f}: family
    {g}: genus
    {s}: species
    {t}: subspecies/strain
    
    {S}: subspecies
    {T}: strain

duplicate taxid from name2taxid

Hello,

I am getting duplicate values from name2taxid when running

taxonkit name2taxid -i 2 filename

My input:
ESP_3 Bacteria
ESP_84 Bacteria
ESP_136 Bacteria
ESP_149 Bacteria
ESP_166 Bacteria
ESP_169 Bacteria
ESP_181 Bacteria
ESP_187 Bacteria
ESP_196 Bacteria

Output:
ESP_3 Bacteria 2
ESP_3 Bacteria 629395
ESP_84 Bacteria 2
ESP_84 Bacteria 629395
ESP_136 Bacteria 2
ESP_136 Bacteria 629395
ESP_149 Bacteria 2
ESP_149 Bacteria 629395
ESP_166 Bacteria 2
ESP_166 Bacteria 629395

Some lines as seen above are duplicated with a different taxid. There are no duplicates in the input.

Do you know what could be causing this?

Thank you!
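
The output in this report suggests that the name "Bacteria" matches two TaxIds (2 and 629395), which would explain one output row per match. As a minimal sketch, assuming tab-separated name2taxid-style output with the name in column 2 and the TaxId in column 3, names that map to more than one TaxId can be listed with awk. The sample rows are copied from the report; nothing here calls taxonkit itself.

```shell
# Sample rows copied from the report above (tab separation is an assumption).
output=$(printf 'ESP_3\tBacteria\t2\nESP_3\tBacteria\t629395\nESP_84\tBacteria\t2\nESP_84\tBacteria\t629395\n')

# For each name (column 2), collect the distinct TaxIds (column 3)
# and print only the names that have more than one.
ambiguous=$(printf '%s\n' "$output" | awk -F'\t' '
    !seen[$2 FS $3]++ {
        ids[$2] = (ids[$2] == "" ? $3 : ids[$2] "," $3)
        n[$2]++
    }
    END { for (name in n) if (n[name] > 1) print name "\t" ids[name] }')

echo "$ambiguous"
```

Listing the ambiguous names first makes it easier to decide, per name, which TaxId is the intended one before joining the results back to the input table.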

line-buffered output

Currently, taxonkit uses xopen's fully buffered output writers. Would it be possible to use at least a line-buffered writer? grep offers this through the additional parameter --line-buffered.
The rationale behind this request is piping BLAST's tabular output. BLAST is line-buffered, so I can process hits as they "appear", but not with taxonkit. Long-running BLAST jobs piped to taxonkit produce output only once a lot of hits have "appeared", regardless of the Linux tools unbuffer or stdbuf -oL.

"out of range" error for Reformat on long taxonomic strings

Prerequisites

  • make sure you are using the latest version by taxonkit version
  • read the usage

Describe your issue

  • describe the problem
    Receive an "out of range" error when using reformat option on certain taxa. See example below for TaxID 101020

  • provide a reproducible example

> echo 101020 | taxonkit lineage -
101020  cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;rosids;fabids;Rosales;Rosaceae;Rosoideae;Potentilleae;Fragariinae;Fragaria;Fragaria vesca;Fragaria vesca subsp. vesca

> echo 101020 | taxonkit reformat -
11:51:03.201 [ERRO] lineage-field (2) out of range (1):101020

R bindings for taxonkit?

Hello,

I am quite happy with taxonkit and discovered yesterday that there is also pytaxonkit. Is there also an R package with R bindings to taxonkit?

If not, can anyone help me start developing one? (I have developed R packages in the past; I am just not sure how to expose the functionality of taxonkit inside R.)

Thank you.

PS: I know an issue is not the best location for this request, unfortunately I did not have a better idea.

Difference between `--list-order` and `--list-ranks` unclear

For the new taxonkit filter command, I'm not sure what the difference is between the --list-ranks and --list-order flags. I can see that --list-order is more comprehensive. Can you explain the difference? Thank you for your patience! 😄

taxonkit Synonyms

Taxonkit v0.2.4 (linux/mac/windows)

Maybe in the next release, taxonkit could identify synonym names (from names.dmp).
For example:

"Chlorobium tepidum" is an "unofficial name" of "Chlorobaculum tepidum" (TaxId 1097). It would be great if taxonkit could identify synonyms too, because names like this appear in the literature.
I got no results when I tested the non-scientific name with name2taxid.

Thanks,
:)

Unsure about usage of --pseudo-strain

Consider the following two commands.

$ echo 36827 | taxonkit lineage | taxonkit reformat --add-prefix --format '{k};{p};{c};{o};{f};{g};{s};{T}'
36827   cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Clostridiales;Clostridiaceae;Clostridium;Clostridium botulinum;Clostridium botulinum B        k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium;s__Clostridium botulinum;T__
$ echo 36827 | taxonkit lineage | taxonkit reformat --add-prefix --format '{k};{p};{c};{o};{f};{g};{s};{T}' --pseudo-strain
36827   cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Clostridiales;Clostridiaceae;Clostridium;Clostridium botulinum;Clostridium botulinum B        k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Clostridiaceae;g__Clostridium;s__Clostridium botulinum;T__

The lowest taxon in this lineage is below species and has no rank, so I expected the --pseudo-strain to report the name of this taxon as the strain. It doesn't work with {T} or {t}. Am I doing something wrong?



reformat should support tab characters in format string

It would be great if reformat would support tab characters in the format string to separate the placeholders into separate columns.

I tried to provide tabs with \t as follows:

taxonkit reformat -f "{k}\t{p}\t{c}\t{o}\t{f}\t{g}\t{s}\t{S}"

A workaround is to quote the string in bash with ANSI-C quoting, like $'{k}\t{p}\t{c}\t{o}\t{f}\t{g}\t{s}\t{S}'. But that's not obvious to most users.
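
As a small sketch of the workaround, the format string can also be built with printf, which interprets \t in any POSIX shell and so avoids relying on the bash-specific $'...' quoting. The variable names fmt and n are illustrative only; the final taxonkit call is left as a comment and simply reuses the -f flag from the command above.

```shell
# Build the format string with literal tab characters.
fmt=$(printf '{k}\t{p}\t{c}\t{o}\t{f}\t{g}\t{s}\t{S}')

# Sanity check: splitting on tabs should yield one field per placeholder.
n=$(printf '%s\n' "$fmt" | awk -F'\t' '{ print NF }')
echo "$n"

# Usage (not run here):
#   taxonkit reformat -f "$fmt"
```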

Functionality request

Taxonkit is extremely useful for me, but I would like to request a feature for downloading the corresponding genomic sequences from NCBI for a given TaxId, which may interest others as well.

No output for leaf nodes of the taxonomy

Hello,

I am playing around with taxonkit and I got puzzled by its behavior. It looks like it will not produce output when searching for leaf nodes of the taxonomic tree. For example, this prints nothing:

taxonkit list --show-rank --ids 9913

same with lineage etc. Any time I search for a node with no children I get nothing back.

I am not sure I understand what is the rationale for this. Is this working as intended?

How would one use taxonkit to figure out that 9913 is the TaxId for Bos taurus?

[Question] How to get specific taxonomic fields (not full taxonomy string) from a list of taxon identifiers?

Prerequisites

  • [ X] make sure you are using the latest version by taxonkit version
  • [ X] read the usage

Describe your issue

What is the recommended method for getting only a subset of lineages with taxonkit lineage?
e.g., superkingdom,phylum,order,class,family,genus,species

  • describe the problem
    I made a post-processing script that puts it into the following format with prefixes:
    ["superkingdom", "phylum", "class", "order", "family", "genus", "species"], ["d__","p__","c__","o__","f__","g__","s__"])

This is what GTDB-Tk uses and I need to manually create this for a few taxa.

  • provide a reproducible example

For example, with the following file called euk.list:

$ cat euk.list
425265
76775

If I run taxonkit lineage -R -t euk.list, I get the following output:

425265	cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Ustilaginomycotina;Malasseziomycetes;Malasseziales;Malasseziaceae;Malassezia;Malassezia globosa;Malassezia globosa CBS 7966	131567;2759;33154;4751;451864;5204;452284;1538075;162474;742845;55193;76773;425265	no rank;superkingdom;clade;kingdom;subkingdom;phylum;subphylum;class;order;family;genus;species;strain
76775	cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Ustilaginomycotina;Malasseziomycetes;Malasseziales;Malasseziaceae;Malassezia;Malassezia restricta	131567;2759;33154;4751;451864;5204;452284;1538075;162474;742845;55193;76775	no rank;superkingdom;clade;kingdom;subkingdom;phylum;subphylum;class;order;family;genus;species
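
The reformat command shown elsewhere on this page (with --format and --add-prefix) looks like the built-in route for this. Purely as a post-processing sketch for the `lineage -R -t` output above, the following awk script pairs each name with its rank and keeps only the seven requested ranks, using the GTDB-style prefixes from the question. The column layout (TaxId, names, TaxIds, ranks) is taken from the sample output; the variable names are illustrative.

```shell
# One row copied from the report above (fields: TaxId, names, TaxIds, ranks).
row=$(printf '%s\t%s\t%s\t%s\n' \
    '76775' \
    'cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Ustilaginomycotina;Malasseziomycetes;Malasseziales;Malasseziaceae;Malassezia;Malassezia restricta' \
    '131567;2759;33154;4751;451864;5204;452284;1538075;162474;742845;55193;76775' \
    'no rank;superkingdom;clade;kingdom;subkingdom;phylum;subphylum;class;order;family;genus;species')

abbrev=$(printf '%s\n' "$row" | awk -F'\t' '
    BEGIN {
        nwant = split("superkingdom phylum class order family genus species", want, " ")
        split("d__ p__ c__ o__ f__ g__ s__", prefix, " ")
    }
    {
        split("", byrank)                        # reset the rank -> name map
        n = split($2, names, ";")
        split($4, ranks, ";")
        for (i = 1; i <= n; i++) byrank[ranks[i]] = names[i]
        out = ""
        for (j = 1; j <= nwant; j++)             # a missing rank leaves only its prefix
            out = out (j > 1 ? ";" : "") prefix[j] byrank[want[j]]
        print $1 "\t" out
    }')

echo "$abbrev"
```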

How to Extract GenBank 15-taxon format abbreviated lineage

Prerequisites

  • make sure you are using the latest version by taxonkit version
  • read the usage

Describe your issue

For example, the complete human lineage is
"cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Dipnotetrapodomorpha; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Boreoeutheria; Euarchontoglires; Primates; Haplorrhini; Simiiformes; Catarrhini; Hominoidea; Hominidae; Homininae; Homo; Homo sapiens;".
The abbreviated human lineage is
"Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo; Homo sapiens;", a 15-taxon format.
And the abbreviated lineage of Xenopus tropicalis is
"Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Amphibia; Batrachia; Anura; Pipoidea; Pipidae; Xenopodinae; Xenopus; Silurana; Xenopus tropicalis;".
The Taxonomy database allows taxa to be flagged as appearing (or not appearing) in the abbreviated classification line of GenBank flatfiles. For convenience, GenBank/EMBL-Bank/DDBJ and UniProtKB entries both store an abbreviated lineage.
How can I extract the GenBank 15-taxon abbreviated lineage, or could you add a reformat option for the 15-taxon format?
Thanks for the state-of-the-art tools!

Why require delnodes.dmp and merged.dmp?

I created a tool for converting the Genomes Taxonomy DataBase (GTDB) taxonomy to nodes.dmp & names.dmp files. The tool is gtdb_to_taxdump. The output works with taxonkit<0.5.0, but fails for taxonkit>=0.5.0, since you now require delnodes.dmp and merged.dmp. Why do you require these files? Could an option be included to override that requirement or is it absolutely necessary for taxonkit 0.5 to work?

taxonkit reformat omits species for some species taxids

Hi!

The following command does not output the species, I'm not sure why. Is this a bug or related to the taxonomy of this species in NCBI?

echo '272799' | taxonkit lineage --data-dir $BLASTDB | taxonkit --data-dir $BLASTDB reformat | csvtk -tH cut -f 1,3              
272799  Eukaryota;Chordata;Mammalia;Chiroptera;Vespertilionidae;Plecotus;

The full output of the taxon using

echo '272799' | taxonkit lineage --data-dir $BLASTDB

ends correctly with the species, Plecotus gaisleri.
More example taxids where this happens: 2493713, 1629512, 602068

Thanks for all the nice programs!

output taxid for certain rank or complete lineage

Like the format string in reformat, it would be interesting to have placeholders for the taxids of the available ranks.
Most useful for downstream analyses and visualization, i.e. of metagenomic data, would be the taxid of the species and subspecies rank. Often, the taxonomic classifiers are randomly or incorrectly choosing a certain strain or a lot of different strains of the same species/subspecies. That clutters the output.
Having such a placeholder can be easily used to filter the dataset during visualization rather than during processing.

filtering ranks

in nodes.dmp
superkingdom
subkingdom
kingdom
superphylum
phylum
subphylum
superclass
class
subclass
infraclass
cohort
subcohort
superorder
order
suborder
infraorder
parvorder
superfamily
family
subfamily
tribe
subtribe
genus
subgenus
section
subsection 
series
species group
species subgroup
species
subspecies
varietas
forma
no rank

taxonkit reformat error: panic: runtime error: invalid memory address or nil pointer dereference

hello,
I've been having some problems lately when using the latest version:

$ taxonkit list --ids 2,4751,13776,10239 --indent "" -j 8 > bac_fun_arc_vir_taxids.txt
$ taxonkit lineage bac_fun_arc_vir_taxids.txt -o bac_fun_arc_vir_taxids_lineage.txt -j 8
$ time taxonkit reformat -F -f "k__{k}|p__{p}|c__{c}|o__{o}|f__{f}|g__{g}|s__{s}" bac_fun_arc_vir_taxids_lineage.txt -o bac_fun_arc_vir_taxids_lineage_reformat.txt

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x6f1e49]

goroutine 284259 [running]:
github.com/shenwei356/taxonkit/taxonkit/cmd.glob..func7.1(0xc472103340, 0xae, 0x8, 0x8, 0x9, 0xc4720c4b00, 0x8)
	/home/shenwei/shenwei/script/Go/project/src/github.com/shenwei356/taxonkit/taxonkit/cmd/reformat.go:157 +0x579
github.com/shenwei356/breader.(*BufferedReader).run.func2.1(0xc471116850, 0xc458bea7e0, 0xc4585087c0, 0xc458c98840, 0xc448f384e0, 0xc471116860, 0x108a0, 0xc471f252c0, 0xa, 0xa)
	/home/shenwei/shenwei/script/Go/project/src/github.com/shenwei356/breader/BufferedReader.go:168 +0xef
created by github.com/shenwei356/breader.(*BufferedReader).run.func2
	/home/shenwei/shenwei/script/Go/project/src/github.com/shenwei356/breader/BufferedReader.go:160 +0x245

Uploading bac_fun_arc_vir_taxids.txt…

Thanks!

There are so many ranks

OMG, so many ranks:

class
family
forma
genus
infraclass
infraorder
kingdom
no rank
order
parvorder
phylum
species
species group
species subgroup
subclass
subfamily
subgenus
subkingdom
suborder
subphylum
subspecies
subtribe
superclass
superfamily
superkingdom
superorder
superphylum
tribe
varietas

`taxonkit lineage` stalls if taxID = 1

I'm using taxonkit v0.2.0 (installed via bioconda), and I was running taxonkit lineage on the "hits" file generated by centrifuge. taxonkit lineage would very quickly write out taxonomies for the first ~60000 hits, but then stall while memory use climbed to >300 GB. It turns out that one of the centrifuge hits had a taxID of "1" (centrifuge called this a "no rank"). I filtered out these "no rank" hits, which fixed the stalling issue.
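
The filtering step described above could look like the following sketch. It assumes the TaxId sits in the first tab-separated column, which may not match the actual layout of a centrifuge hits file; the sample IDs are placeholders, and the taxonkit call is left as a comment.

```shell
# Placeholder TaxIds, one per line; the root TaxId 1 is the one to drop.
hits=$(printf '9606\n1\n562\n')

# Keep every row whose first column is not the root TaxId.
filtered=$(printf '%s\n' "$hits" | awk -F'\t' '$1 != 1')
echo "$filtered"

# Usage (not run here):
#   printf '%s\n' "$filtered" | taxonkit lineage
```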

$TAXONKIT_DB variable for file locations?

And copy "names.dmp" and "nodes.dmp" to data directory: "$HOME/.taxonkit".

We use taxonkit on a shared server with many users.

Would you be able to honour an env variable like $TAXONKIT_DB so that we can put the files in one place that's not a home directory?

Then we can have global export TAXONKIT_DB=/opt/data/taxonkit/ and the two files will be found there?

We'd rather not set global aliases for --names-file and --nodes-file
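
The Dataset section of the README above mentions both the --data-dir flag and the TAXONKIT_DB environment variable. A minimal sketch of a shared setup follows, with a hypothetical helper (missing_taxdump_files is not part of taxonkit) that reports which of the four taxdump files named in the README are absent from the chosen directory. The path /opt/data/taxonkit is the one proposed in this issue.

```shell
# Hypothetical helper: print the taxdump files missing from a directory.
missing_taxdump_files() {
    dir=$1
    for f in names.dmp nodes.dmp delnodes.dmp merged.dmp; do
        [ -f "$dir/$f" ] || printf '%s\n' "$f"
    done
}

# Site-wide setting, e.g. in a shared shell profile.
export TAXONKIT_DB=/opt/data/taxonkit

# Prints nothing when all four files are in place.
missing_taxdump_files "$TAXONKIT_DB"
```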

Possible bug with taxonkit filter --black-list

When the user specifies a rank or a (comma-separated?) list of ranks for --black-list, these should be excluded from the output, correct? I have tried the following example several times with different ranks, and I get the same error message every time.

$ echo 349741 | taxonkit lineage -t | cut -f 3 | sed 's/;/\n/g' > taxids2.txt
$ cat taxids2.txt | taxonkit filter -B Family 
23:40:47.905 [ERRO] rank order not defined in rank file: no rank

Is this a bug, or am I misunderstanding this flag?



no lineage info when using custom nodes+names dmp

I created a script to convert the Genome Taxonomy Database (GTDB) taxonomy to nodes.dmp + names.dmp files. The output looks like:

names.dmp

1	|	all	|		|	synonym	|
1	|	root	|		|	scientific_name	|
2	|	d__Archaea	|		|	scientific_name	|
3	|	p__Halobacterota	|		|	scientific_name	|
4	|	c__Methanosarcinia	|		|	scientific_name	|
5	|	o__Methanosarcinales	|		|	scientific_name	|
6	|	f__Methanosarcinaceae	|		|	scientific_name	|
7	|	g__Methanosarcina	|		|	scientific_name	|
8	|	s__Methanosarcina mazei	|		|	scientific_name	|
9	|	RS_GCF_000979745.1	|		|	scientific_name	|
10	|	RS_GCF_000980175.1	|		|	scientific_name	|
11	|	RS_GCF_000980005.1	|		|	scientific_name	|
12	|	RS_GCF_000979595.1	|		|	scientific_name	|
13	|	RS_GCF_000979555.1	|		|	scientific_name	|
14	|	RS_GCF_000979915.1	|		|	scientific_name	|
15	|	RS_GCF_000970165.1	|		|	scientific_name	|
16	|	RS_GCF_000979125.1	|		|	scientific_name	|
17	|	RS_GCF_000979015.1	|		|	scientific_name	|
18	|	RS_GCF_000979925.1	|		|	scientific_name	|
19	|	RS_GCF_000980105.1	|		|	scientific_name	|

nodes.dmp

1	|	1	|	no rank	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
2	|	1	|	superkingdom	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
3	|	2	|	phylum	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
4	|	3	|	class	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
5	|	4	|	order	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
6	|	5	|	family	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
7	|	6	|	genus	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
8	|	7	|	species	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
9	|	8	|	subspecies	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
10	|	8	|	subspecies	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
11	|	8	|	subspecies	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
12	|	8	|	subspecies	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
13	|	8	|	subspecies	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
14	|	8	|	subspecies	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
15	|	8	|	subspecies	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
16	|	8	|	subspecies	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
17	|	8	|	subspecies	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
18	|	8	|	subspecies	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
19	|	8	|	subspecies	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|
20	|	8	|	subspecies	|	XX	|	0	|	0	|	11	|	1	|	1	|	0	|	0	|	0	|

taxonkit list works as expected, but taxonkit lineage does not provide any lineage info. For example:

1		no rank
2		superkingdom
10	;;;;;;;	subspecies
397	;;;	order
982	;;;;;	genus
541	;;;;	family
3844	;;;;;;	species

Any idea why I'm not getting the full lineage info? I tried to look at the taxonkit code to see if it was filtering based on the EMBL code or something else, but I don't see what the problem is (it doesn't help that I don't know Go).
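
One difference from NCBI's own names.dmp that may be worth checking: NCBI writes the name class as "scientific name" (with a space), whereas the sample above uses "scientific_name". Whether this is the cause of the empty lineages is an assumption, not something confirmed in this report. The sketch below only lists the distinct name classes in a names.dmp-style file so they can be compared by eye; the two rows are copied from the sample above.

```shell
# Two rows copied from the names.dmp sample above.
names=$(printf '1\t|\troot\t|\t\t|\tscientific_name\t|\n2\t|\td__Archaea\t|\t\t|\tscientific_name\t|\n')

# Field 4 (split on "|") is the name class; strip the surrounding tabs.
classes=$(printf '%s\n' "$names" | awk -F'|' '{ gsub(/\t/, "", $4); print $4 }' | sort -u)
echo "$classes"
```

If the only class listed is "scientific_name", regenerating the file with "scientific name" would be a cheap experiment.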

Export newick tree?

Hi,

I'm curious if it would be possible to use this tool to get a minimal rooted Newick tree of a set of taxids? I searched the documentation but couldn't find anything.

[ERRO] lineage-field (2) out of range (1)

I'm using taxonkit 0.3.0, and I'm getting the following error:

[ERRO] lineage-field (2) out of range (1):cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Cercopithecoidea;Cercopithecidae;Colobinae;Rhinopithecus;Rhinopithecus roxellana

The command is:

cat lineages.tsv | taxonkit reformat 

My table of lineages to reformat is:

cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Homo;Homo sapiens
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Pan;Pan troglodytes
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Homininae;Pan;Pan paniscus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hominidae;Ponginae;Pongo;Pongo abelii
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Hominoidea;Hylobatidae;Nomascus;Nomascus leucogenys
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Cercopithecoidea;Cercopithecidae;Cercopithecinae;Macaca;Macaca mulatta
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Cercopithecoidea;Cercopithecidae;Cercopithecinae;Macaca;Macaca fascicularis
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Cercopithecoidea;Cercopithecidae;Cercopithecinae;Papio;Papio anubis
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Cercopithecoidea;Cercopithecidae;Cercopithecinae;Chlorocebus;Chlorocebus sabaeus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Catarrhini;Cercopithecoidea;Cercopithecidae;Colobinae;Rhinopithecus;Rhinopithecus roxellana
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Platyrrhini;Cebidae;Callitrichinae;Callithrix;Callithrix;Callithrix jacchus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Simiiformes;Platyrrhini;Cebidae;Saimiriinae;Saimiri;Saimiri boliviensis
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Haplorrhini;Tarsiiformes;Tarsiidae;Carlito;Carlito syrichta
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Primates;Strepsirrhini;Lorisiformes;Galagidae;Otolemur;Otolemur garnettii
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Glires;Rodentia;Sciuromorpha;Sciuridae;Xerinae;Marmotini;Ictidomys;Ictidomys tridecemlineatus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Glires;Rodentia;Myomorpha;Dipodoidea;Dipodidae;Dipodinae;Jaculus;Jaculus jaculus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Glires;Rodentia;Myomorpha;Muroidea;Cricetidae;Arvicolinae;Microtus;Microtus ochrogaster
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Glires;Rodentia;Myomorpha;Muroidea;Cricetidae;Neotominae;Peromyscus;Peromyscus maniculatus;Peromyscus maniculatus bairdii
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Glires;Rodentia;Myomorpha;Muroidea;Cricetidae;Cricetinae;Mesocricetus;Mesocricetus auratus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Glires;Rodentia;Myomorpha;Muroidea;Muridae;Murinae;Mus;Mus;Mus musculus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Glires;Rodentia;Myomorpha;Muroidea;Muridae;Murinae;Rattus;Rattus norvegicus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Glires;Rodentia;Myomorpha;Muroidea;Spalacidae;Spalacinae;Nannospalax;Nannospalax galili
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Glires;Rodentia;Hystricomorpha;Bathyergidae;Heterocephalus;Heterocephalus glaber
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Glires;Rodentia;Hystricomorpha;Caviidae;Cavia;Cavia porcellus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Glires;Rodentia;Hystricomorpha;Chinchillidae;Chinchilla;Chinchilla lanigera
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Glires;Rodentia;Hystricomorpha;Octodontidae;Octodon;Octodon degus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Glires;Lagomorpha;Leporidae;Oryctolagus;Oryctolagus cuniculus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Euarchontoglires;Glires;Lagomorpha;Ochotonidae;Ochotona;Ochotona princeps
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Artiodactyla;Tylopoda;Camelidae;Vicugna;Vicugna pacos
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Artiodactyla;Tylopoda;Camelidae;Camelus;Camelus ferus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Artiodactyla;Whippomorpha;Cetacea;Odontoceti;Delphinidae;Tursiops;Tursiops truncatus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Artiodactyla;Whippomorpha;Cetacea;Odontoceti;Delphinidae;Orcinus;Orcinus orca
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Artiodactyla;Whippomorpha;Cetacea;Odontoceti;Physeteridae;Physeter;Physeter catodon
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Artiodactyla;Whippomorpha;Cetacea;Mysticeti;Balaenopteridae;Balaenoptera;Balaenoptera acutorostrata;Balaenoptera acutorostrata scammoni
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Artiodactyla;Ruminantia;Pecora;Bovidae;Antilopinae;Pantholops;Pantholops hodgsonii
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Artiodactyla;Ruminantia;Pecora;Bovidae;Bovinae;Bos;Bos taurus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Artiodactyla;Ruminantia;Pecora;Bovidae;Bovinae;Bison;Bison bison;Bison bison bison
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Artiodactyla;Ruminantia;Pecora;Bovidae;Caprinae;Capra;Capra hircus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Perissodactyla;Equidae;Equus;Equus;Equus caballus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Perissodactyla;Rhinocerotidae;Ceratotherium;Ceratotherium simum
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Carnivora;Feliformia;Felidae;Felinae;Felis;Felis catus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Carnivora;Caniformia;Canidae;Canis;Canis lupus;Canis lupus familiaris
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Carnivora;Caniformia;Mustelidae;Mustelinae;Mustela;Mustela putorius;Mustela putorius furo
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Carnivora;Caniformia;Ursidae;Ailuropoda;Ailuropoda melanoleuca
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Carnivora;Caniformia;Ursidae;Ursus;Ursus maritimus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Carnivora;Caniformia;Odobenidae;Odobenus;Odobenus rosmarus;Odobenus rosmarus divergens
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Carnivora;Caniformia;Phocidae;Leptonychotes;Leptonychotes weddellii
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Pholidota;Manidae;Manis;Manis pentadactyla
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Chiroptera;Megachiroptera;Pteropodidae;Pteropodinae;Pteropus;Pteropus alecto
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Chiroptera;Megachiroptera;Pteropodidae;Pteropodinae;Pteropus;Pteropus vampyrus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Chiroptera;Microchiroptera;Vespertilionidae;Eptesicus;Eptesicus fuscus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Chiroptera;Microchiroptera;Vespertilionidae;Myotis;Myotis davidii
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Chiroptera;Microchiroptera;Vespertilionidae;Myotis;Myotis lucifugus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Eulipotyphla;Erinaceidae;Erinaceinae;Erinaceus;Erinaceus europaeus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Eulipotyphla;Soricidae;Soricinae;Sorex;Sorex araneus
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Boreoeutheria;Laurasiatheria;Eulipotyphla;Talpidae;Condylura;Condylura cristata
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Afrotheria;Proboscidea;Elephantidae;Loxodonta;Loxodonta africana
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Afrotheria;Macroscelidea;Macroscelididae;Elephantulus;Elephantulus edwardii
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Afrotheria;Sirenia;Trichechidae;Trichechus;Trichechus manatus;Trichechus manatus latirostris
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Afrotheria;Chrysochloridae;Chrysochlorinae;Chrysochloris;Chrysochloris asiatica
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Afrotheria;Tenrecidae;Tenrecinae;Echinops;Echinops telfairi
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Afrotheria;Tubulidentata;Orycteropodidae;Orycteropus;Orycteropus afer;Orycteropus afer afer
cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Mammalia;Theria;Eutheria;Xenarthra;Cingulata;Dasypodidae;Dasypus;Dasypus novemcinctus

I don't see what the problem is with Rhinopithecus roxellana
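
The message reads as "lineage field 2 was requested, but the line has only 1 tab-separated field". taxonkit reformat looks for the lineage in column 2 by default, while the table above has a single column, so the species named in the error is probably just the first line that happened to be reported; passing `--lineage-field 1` may be all that is needed. The sketch below is a small pre-flight check under that reading; `lines_missing_field` is a hypothetical helper.

```python
def lines_missing_field(lines, field):
    """Return (line_number, n_fields) for lines with fewer than `field` columns."""
    short = []
    for number, line in enumerate(lines, start=1):
        n_fields = len(line.rstrip("\n").split("\t"))
        if n_fields < field:
            short.append((number, n_fields))
    return short


if __name__ == "__main__":
    table = [
        "cellular organisms;Eukaryota;Opisthokonta\n",  # one column only
        "9606\tcellular organisms;Eukaryota;Opisthokonta\n",  # two columns
    ]
    print(lines_missing_field(table, field=2))
    print(lines_missing_field(table, field=1))
```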

citation

Hey,

Cool tool - have you published this? How should we cite it in our paper?

how to use taxid-changelog

Describe your issue

Hello,

I'm confused regarding the exact purpose/use case of taxid-changelog.

Would I want to run the following:

wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/taxdmp*.zip
ls taxdmp*.zip | rush -j 1 'unzip {} names.dmp nodes.dmp merged.dmp delnodes.dmp -d {@_(.+)\.}'
cd ..
taxonkit taxid-changelog -i archive -o taxid-changelog.csv.gz --verbose

to ensure I have the most up-to-date lineage for use with taxonkit list?

Thanks!

autocompletion inside Tmux

taxonkit v0.3.0
Linux 3.10.0-693.11.6.el7.x86_64

Describe your issue

I forgot to source ~/.bash_completion, that's why auto-completion wasn't working.

accession numbers not found

Hi, I'm trying to go from BLAST output to TaxIDs. I extracted the BLAST accession numbers into a single file and then wanted to verify that they were in the prot.accession2taxid.gz database.

To do this I ran:
pigz -dc /taxdb2020/prot.accession2taxid.gz | csvtk grep -t -f accession.version -P acc.txt > output.txt

head acc.txt
AB111947.1
AB111947.1
MK072403.1
MK072403.1
MK072403.1

My output.txt file comes out empty. I would like to know if there are any suggestions for going from a list of accessions to TaxIDs.
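
A possible cause, offered as an assumption: accessions such as AB111947.1 and MK072403.1 have the two-letters-plus-six-digits shape used for GenBank nucleotide records, whereas protein accessions use three letters followed by digits. If so, they would be expected in nucl_gb.accession2taxid.gz rather than in the protein mapping. The sketch below streams a gzip-compressed accession2taxid file (assumed to have the header columns `accession`, `accession.version`, `taxid`, `gi`) and returns the TaxIds of the requested accessions; `lookup_taxids` is a hypothetical helper.

```python
import csv
import gzip


def lookup_taxids(mapping_path, accessions):
    """Map accession.version strings to TaxIds using an accession2taxid file."""
    wanted = set(accessions)  # also removes duplicate queries
    found = {}
    with gzip.open(mapping_path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            accession = row["accession.version"]
            if accession in wanted:
                found[accession] = row["taxid"]
                if len(found) == len(wanted):
                    break  # every query resolved, stop reading early
    return found


if __name__ == "__main__":
    import sys

    # Usage: python lookup.py mapping.accession2taxid.gz acc.txt
    with open(sys.argv[2]) as handle:
        queries = [line.strip() for line in handle if line.strip()]
    for accession, taxid in lookup_taxids(sys.argv[1], queries).items():
        print(accession, taxid, sep="\t")
```

Accessions missing from the returned dictionary were not present in the mapping file that was searched.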

Unnecessary stdin check for taxonkit filter?

In TaxonKit 0.7, I get the following error.

$ taxonkit filter --list-order
23:31:23.142 [ERRO] stdin not detected

The command checks for stdin even when no input is used. This can be worked around by echoing an empty string, but that seems unnecessary.

$ echo '' | taxonkit filter --list-order | tail
species
forma specialis,pathovar,subspecies
pathogroup,serogroup
biotype,genotype,serotype
aberration,morph,varietas,variety
subaberration,submorph,subvarietas,subvariety
form,forma
subform,subforma
strain
isolate

The same is true for taxonkit filter --list-rank.


Prerequisites

  • make sure you are using the latest version (check with taxonkit version)
  • read the usage

Describe your issue

  • describe the problem
  • provide a reproducible example

Ranks of interest for taxonkit lineage

I work mostly with bacteria, and especially for uncultured bacteria many lineage entries have the rank "no rank". Is it possible to specify which ranks I want returned? I am thinking of an option such as

--filter-ranks superkingdom,phylum,class,order,family,genus

Please let me know if my issue description is clear enough.
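
As a stopgap, the filtering can be done after the fact. The sketch below assumes the lineage names and the matching lineage ranks are both available as `;`-separated strings (as printed by taxonkit lineage with --show-lineage-ranks) and keeps only the requested ranks, leaving an empty slot for any rank the lineage lacks. `keep_ranks` is a hypothetical helper, not a taxonkit function; the reformat subcommand, which rewrites lineages in canonical ranks, may already cover this use case.

```python
def keep_ranks(names, ranks, wanted):
    """Return the lineage restricted to `wanted` ranks, in the order given."""
    name_list = names.split(";")
    rank_list = ranks.split(";")
    if len(name_list) != len(rank_list):
        raise ValueError("names and ranks must have the same number of entries")
    wanted_set = set(wanted)
    by_rank = {}
    for name, rank in zip(name_list, rank_list):
        if rank in wanted_set:
            # Keep the first entry if a rank occurs more than once.
            by_rank.setdefault(rank, name)
    return ";".join(by_rank.get(rank, "") for rank in wanted)


if __name__ == "__main__":
    names = "cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria"
    ranks = "no rank;superkingdom;phylum;class"
    print(keep_ranks(names, ranks, ["superkingdom", "phylum", "class", "order"]))
```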

no taxonomy for "1458427"

For some reason, taxonkit lineage does not return a taxonomy for taxonID 1458427, which is Comamonadaceae bacterium H1. I got taxonomies for all other taxa in my table (n =~ 2000), so it just appears to be an issue with taxonID 1458427. There is no warning.

example table

name	taxonomy_id	taxonomy_lvl	kraken_assigned_reads	added_reads	new_est_reads	fraction_total_reads
Calditerrivibrio nitroreducens	477976	S	53	0	53	0.00000
Streptococcus sp. oral taxon 071	712630	S	22	16	38	0.00000
Halothece sp. PCC 7418	65093	S	26	9	35	0.00000
Acinetobacter beijerinckii	262668	S	17	3	20	0.00000
Cycloclasticus zancles	1329899	S	11	0	11	0.00000
Bacillus velezensis	492670	S	86	14	100	0.00000
Atopobium rimae	1383	S	91	5	96	0.00000
Tatumella morbirosei	642227	S	66	0	66	0.00000
Paenibacillus sp. MAEPY2	1395587	S	196	10	206	0.00001
Comamonadaceae bacterium H1	1458427	S	0	0	0	0
Sulfolobales archaeon AZ1	1326980	S	10	0	10	0.00000

output

name	taxonomy_id	taxonomy_lvl	kraken_assigned_reads	added_reads	new_est_reads	fraction_total_reads
Calditerrivibrio nitroreducens	477976	S	53	0	53	0.00000	cellular organisms;Bacteria;Deferribacteres;Deferribacteres;Deferribacterales;Deferribacteraceae;Calditerrivibrio;Calditerrivibrio nitroreducens	131567;2;200930;68337;191393;191394;545865;477976
Streptococcus sp. oral taxon 071	712630	S	22	16	38	0.00000	cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus sp. oral taxon 071	131567;2;1783272;1239;91061;186826;1300;1301;712630
Halothece sp. PCC 7418	65093	S	26	9	35	0.00000	cellular organisms;Bacteria;Terrabacteria group;Cyanobacteria/Melainabacteria group;Cyanobacteria;Oscillatoriophycideae;Chroococcales;Aphanothecaceae;Halothece cluster;Halothece;Halothece sp. PCC 7418	131567;2;1783272;1798711;1117;1301283;1118;1890450;92682;76023;65093
Acinetobacter beijerinckii	262668	S	17	3	20	0.00000	cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pseudomonadales;Moraxellaceae;Acinetobacter;Acinetobacter beijerinckii	131567;2;1224;1236;72274;468;469;262668
Cycloclasticus zancles	1329899	S	11	0	11	0.00000	cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Thiotrichales;Piscirickettsiaceae;Cycloclasticus;Cycloclasticus zancles	131567;2;1224;1236;72273;135616;34067;1329899
Bacillus velezensis	492670	S	86	14	100	0.00000	cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus subtilis group;Bacillus amyloliquefaciens group;Bacillus velezensis	131567;2;1783272;1239;91061;1385;186817;1386;653685;1938374;492670
Atopobium rimae	1383	S	91	5	96	0.00000	cellular organisms;Bacteria;Terrabacteria group;Actinobacteria;Coriobacteriia;Coriobacteriales;Atopobiaceae;Atopobium;Atopobium rimae	131567;2;1783272;201174;84998;84999;1643824;1380;1383
Tatumella morbirosei	642227	S	66	0	66	0.00000	cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Erwiniaceae;Tatumella;Tatumella morbirosei	131567;2;1224;1236;91347;1903409;82986;642227
Paenibacillus sp. MAEPY2	1395587	S	196	10	206	0.00001	cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Paenibacillaceae;Paenibacillus;Paenibacillus sp. MAEPY2	131567;2;1783272;1239;91061;1385;186822;44249;1395587
Comamonadaceae bacterium H1	1458427	S	0	0	0	0
Sulfolobales archaeon AZ1	1326980	S	10	0	10	0.00000	cellular organisms;Archaea;TACK group;Crenarchaeota;Thermoprotei;Sulfolobales;Sulfolobaceae;Candidatus Aramenus;Candidatus Aramenus sulfurataquae	131567;2157;1783275;28889;183924;2281;118883;2489210;1326980

command

cat TEST.tsv | taxonkit lineage --threads 12 -i 2 -t --data-dir /path/to/taxonkit/taxdump/ > TEST_tax.tsv

conda-env

# packages in environment at /ebio/abt3_projects/software/dev/llmgp/.snakemake/conda/e0ee16ae:
#
# Name                    Version                   Build  Channel
blast                     2.5.0                hc0b0e79_3    bioconda
boost                     1.63.0                   py27_2    conda-forge
bracken                   2.2              py27h2d50403_1    bioconda
bzip2                     1.0.6                h470a237_2    conda-forge
ca-certificates           2018.11.29           ha4d7672_0    conda-forge
certifi                   2018.11.29            py27_1000    conda-forge
icu                       56.1                          4    conda-forge
jellyfish                 1.1.12               h2d50403_0    bioconda
kraken                    1.1                  h470a237_2    bioconda
kraken2                   2.0.7_beta      pl526h2d50403_0    bioconda
libffi                    3.2.1                hfc679d8_5    conda-forge
libgcc-ng                 7.2.0                hdf63c60_3    conda-forge
libstdcxx-ng              7.2.0                hdf63c60_3    conda-forge
ncurses                   6.1                  hfc679d8_2    conda-forge
openssl                   1.0.2p               h470a237_2    conda-forge
perl                      5.26.2               h470a237_0    conda-forge
pigz                      2.3.4                         0    conda-forge
pip                       18.1                  py27_1000    conda-forge
python                    2.7.15               h33da82c_6    conda-forge
readline                  7.0                  haf1bffa_1    conda-forge
setuptools                40.6.3                   py27_0    conda-forge
sqlite                    3.26.0               hb1c47c0_0    conda-forge
taxonkit                  0.3.0                         1    bioconda
tk                        8.6.9                ha92aebf_0    conda-forge
wheel                     0.32.3                   py27_0    conda-forge
zlib                      1.2.11               h470a237_4    conda-forge

Taxonkit list tabular output

Hi,
I would like to generate tab-separated output for further processing.
The output of the following command contains only spaces, which makes it difficult to separate the columns:

taxonkit list --ids 2157 -n -r --indent "" > archae_tree.txt

Is there an option like
sep = "\t" ?

Best, Michael
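
A post-processing sketch, assuming each line printed with the name and rank flags has the shape `<taxid> [<rank>] <name>`, possibly preceded by indentation. It converts such lines to tab-separated rows; `list_to_tsv` is a hypothetical helper and the pattern would need adjusting if the real output differs.

```python
import re

# Optional indentation, TaxId, rank in square brackets, then the name.
LINE = re.compile(r"^\s*(\d+) \[([^\]]*)\] (.*)$")


def list_to_tsv(lines):
    """Convert 'taxid [rank] name' lines into 'taxid<TAB>rank<TAB>name' rows."""
    rows = []
    for line in lines:
        match = LINE.match(line.rstrip("\n"))
        if match:  # blank or unexpected lines are skipped
            rows.append("\t".join(match.groups()))
    return rows


if __name__ == "__main__":
    import sys

    # Usage: taxonkit list --ids 2157 -n -r | python list_to_tsv.py
    for row in list_to_tsv(sys.stdin):
        print(row)
```

Because the name is the last captured group, names containing spaces stay in a single column.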

obsolete taxid check

Hi :),
A small problem appears when a taxid becomes obsolete. However, that information (and the new taxid) can be fetched from merged.dmp.
I suggest adding a check of the merged file when no taxid is found (or searching merged.dmp first, or something similar).
Example: taxid 92489 was replaced with 796334 (Erwinia oleae). I tested it with the list function.
Thanks
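
The suggested fallback can be sketched as follows, assuming merged.dmp lines have the form `old_taxid<TAB>|<TAB>new_taxid<TAB>|`. This illustrates the idea only and is not taxonkit's implementation; `load_merged` and `resolve_taxid` are hypothetical names.

```python
def load_merged(lines):
    """Parse merged.dmp lines into a dict of old TaxId -> new TaxId."""
    merged = {}
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) >= 2 and fields[0] and fields[1]:
            merged[fields[0]] = fields[1]
    return merged


def resolve_taxid(taxid, known, merged):
    """Return a current TaxId, following merged.dmp; None if unresolvable."""
    if taxid in known:
        return taxid
    # Fall back to the merged table only when the TaxId is not current.
    return merged.get(taxid)


if __name__ == "__main__":
    merged = load_merged(["92489\t|\t796334\t|\n"])
    known = {"796334"}  # TaxIds present in nodes.dmp
    print(resolve_taxid("92489", known, merged))
```

A `None` result means the TaxId is neither current nor merged, for instance because it is listed in delnodes.dmp or was never valid.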

Taxon taxid reassigned with reformat

Hello, I noticed some unexpected behavior today. When I query and reformat the lineage for taxid 2507530, taxonkit reformat re-assigns 2516889 as the taxid in the output (the last taxid in the line).

$ echo 2507530 \
    | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name --show-lineage-ranks \
    | taxonkit reformat --lineage-field 3 --show-lineage-taxids
2507530 2507530 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019    131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507530  Russula sp. 8 KA-2019   species no rank;superkingdom;clade;kingdom;subkingdom;phylum;subphylum;class;no rank;order;family;genus;no rank;species Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019     2759;5204;155619;452342;5401;5402;2516889
$ echo 2516889 \
    | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name --show-lineage-ranks \
    | taxonkit reformat --lineage-field 3 --show-lineage-taxids
2516889 2516889 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019    131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2516889  Russula sp. 8 KA-2019   species no rank;superkingdom;clade;kingdom;subkingdom;phylum;subphylum;class;no rank;order;family;genus;no rank;species Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019     2759;5204;155619;452342;5401;5402;2516889

It looks like these may be duplicated, unmerged taxids.

$ grep -e 2507530 -e 2516889 ~/.taxonkit/names.dmp 
2507530 |       Russula sp. 8 KA-2019   |       Russula sp. 8 KA-2019 <NCBI:txid2507530>        |       scientific name |
2516889 |       Russula sp. 8 KA-2019   |       Russula sp. 8 KA-2019 <NCBI:txid2516889>        |       scientific name |
$ grep 2516889 ~/.taxonkit/merged.dmp
$

Obviously, we should hope NCBI fixes this in the taxdump soon. But I'm assuming this is not the intended taxonkit behavior?
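
To see how widespread such cases are, the grep above can be generalized. The sketch below scans names.dmp-style lines and reports every scientific name shared by more than one TaxId. The column order (tax_id | name_txt | unique name | name class) is taken from the grep output above; duplicated_scientific_names is a hypothetical helper. If reformat looks TaxIds up by name (an assumption, not confirmed here), names in this report are the ones that could be resolved to the wrong TaxId.

```python
from collections import defaultdict


def duplicated_scientific_names(lines):
    """Return {name: sorted TaxIds} for scientific names used by several TaxIds."""
    by_name = defaultdict(set)
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        # Columns: tax_id | name_txt | unique name | name class |
        if len(fields) >= 4 and fields[0].isdigit() and fields[3] == "scientific name":
            by_name[fields[1]].add(int(fields[0]))
    return {name: sorted(ids) for name, ids in by_name.items() if len(ids) > 1}


sample = [
    "2507530\t|\tRussula sp. 8 KA-2019\t|\tRussula sp. 8 KA-2019 <NCBI:txid2507530>\t|\tscientific name\t|\n",
    "2516889\t|\tRussula sp. 8 KA-2019\t|\tRussula sp. 8 KA-2019 <NCBI:txid2516889>\t|\tscientific name\t|\n",
    "5402\t|\tRussula\t|\t\t|\tscientific name\t|\n",
]
print(duplicated_scientific_names(sample))
```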


Prerequisites

  • make sure you are using the latest version (check with taxonkit version)
  • read the usage

Describe your issue

  • describe the problem
  • provide a reproducible example

Update benchmarks

I just released version 0.6.0 of taxopy where the only change is that taxids are now encoded as integers instead of strings. The code is now faster and uses less memory.

Before:

== Taxopy
data: taxids.n100000.txt

elapsed time: 8.591
peak rss: 1090184

b15e76dfe8cd3d7455bcf633909e3e97  taxids.n100000.txt.taxopy.lineage
== Taxopy
data: taxids.n10000.txt

elapsed time: 5.119
peak rss: 1090260

8debf4d37a7997c8ffdc13fd05e5d042  taxids.n10000.txt.taxopy.lineage
== Taxopy
data: taxids.n1000.txt

elapsed time: 5.474
peak rss: 1090236

4f47c764880ca614f9ac67c442f06144  taxids.n1000.txt.taxopy.lineage
== Taxopy
data: taxids.n100.txt

elapsed time: 6.360
peak rss: 1090024

4f7b7f23224e37658171a48780270d90  taxids.n100.txt.taxopy.lineage
== Taxopy
data: taxids.n10.txt

elapsed time: 4.902
peak rss: 1090316

138e7cea6c35a595b6538a34c9d2b7b3  taxids.n10.txt.taxopy.lineage
== Taxopy
data: taxids.n1.txt

elapsed time: 4.921
peak rss: 1090000

c1eda42e466916f0ef566c99c478907a  taxids.n1.txt.taxopy.lineage
== Taxopy
data: taxids.n20000.txt

elapsed time: 5.966
peak rss: 1090024

b6ec2a1d717ddcd854c762bd555b03df  taxids.n20000.txt.taxopy.lineage
== Taxopy
data: taxids.n2000.txt

elapsed time: 6.667
peak rss: 1090112

3cf4c5b7d13f455ed645654d829fa484  taxids.n2000.txt.taxopy.lineage
== Taxopy
data: taxids.n40000.txt

elapsed time: 6.467
peak rss: 1090300

70ddd9aac0283a4c21800245b582c983  taxids.n40000.txt.taxopy.lineage
== Taxopy
data: taxids.n4000.txt

elapsed time: 5.004
peak rss: 1090120

09e46bef68ac2e532644e5356e7b9928  taxids.n4000.txt.taxopy.lineage
== Taxopy
data: taxids.n60000.txt

elapsed time: 7.177
peak rss: 1090052

26215e6e9a981800565b5de62eb48bda  taxids.n60000.txt.taxopy.lineage
== Taxopy
data: taxids.n6000.txt

elapsed time: 5.240
peak rss: 1090260

8da55d3d8e76f548b461dbb5322b1c77  taxids.n6000.txt.taxopy.lineage
== Taxopy
data: taxids.n80000.txt

elapsed time: 7.685
peak rss: 1090220

30d16a8b6ebef3c5ee20bee943981b39  taxids.n80000.txt.taxopy.lineage
== Taxopy
data: taxids.n8000.txt

elapsed time: 5.125
peak rss: 1090064

cfecede52e185ee41336c6c1316e1a4e  taxids.n8000.txt.taxopy.lineage

After:

== Taxopy
data: taxids.n100000.txt

elapsed time: 6.760
peak rss: 867460

b15e76dfe8cd3d7455bcf633909e3e97  taxids.n100000.txt.taxopy.lineage
== Taxopy
data: taxids.n10000.txt

elapsed time: 3.991
peak rss: 867532

8debf4d37a7997c8ffdc13fd05e5d042  taxids.n10000.txt.taxopy.lineage
== Taxopy
data: taxids.n1000.txt

elapsed time: 4.102
peak rss: 867668

4f47c764880ca614f9ac67c442f06144  taxids.n1000.txt.taxopy.lineage
== Taxopy
data: taxids.n100.txt

elapsed time: 3.995
peak rss: 865352

4f7b7f23224e37658171a48780270d90  taxids.n100.txt.taxopy.lineage
== Taxopy
data: taxids.n10.txt

elapsed time: 3.898
peak rss: 853752

138e7cea6c35a595b6538a34c9d2b7b3  taxids.n10.txt.taxopy.lineage
== Taxopy
data: taxids.n1.txt

elapsed time: 3.787
peak rss: 862808

c1eda42e466916f0ef566c99c478907a  taxids.n1.txt.taxopy.lineage
== Taxopy
data: taxids.n20000.txt

elapsed time: 4.277
peak rss: 867532

b6ec2a1d717ddcd854c762bd555b03df  taxids.n20000.txt.taxopy.lineage
== Taxopy
data: taxids.n2000.txt

elapsed time: 3.892
peak rss: 867624

3cf4c5b7d13f455ed645654d829fa484  taxids.n2000.txt.taxopy.lineage
== Taxopy
data: taxids.n40000.txt

elapsed time: 4.914
peak rss: 867564

70ddd9aac0283a4c21800245b582c983  taxids.n40000.txt.taxopy.lineage
== Taxopy
data: taxids.n4000.txt

elapsed time: 3.889
peak rss: 867280

09e46bef68ac2e532644e5356e7b9928  taxids.n4000.txt.taxopy.lineage
== Taxopy
data: taxids.n60000.txt

elapsed time: 5.625
peak rss: 867564

26215e6e9a981800565b5de62eb48bda  taxids.n60000.txt.taxopy.lineage
== Taxopy
data: taxids.n6000.txt

elapsed time: 3.785
peak rss: 867412

8da55d3d8e76f548b461dbb5322b1c77  taxids.n6000.txt.taxopy.lineage
== Taxopy
data: taxids.n80000.txt

elapsed time: 6.216
peak rss: 867372

30d16a8b6ebef3c5ee20bee943981b39  taxids.n80000.txt.taxopy.lineage
== Taxopy
data: taxids.n8000.txt

elapsed time: 3.883
peak rss: 867676

cfecede52e185ee41336c6c1316e1a4e  taxids.n8000.txt.taxopy.lineage
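
For comparing the two logs, a small parser helps. This is a minimal sketch that assumes the block layout shown above ("data:", "elapsed time:", "peak rss:"); parse_log is a hypothetical helper, and the units of the values are whatever the benchmark script reported.

```python
def parse_log(text):
    """Return {dataset: (elapsed_time, peak_rss)} from a benchmark log."""
    results = {}
    data = elapsed = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("data:"):
            data = line.split(":", 1)[1].strip()
        elif line.startswith("elapsed time:"):
            elapsed = float(line.split(":", 1)[1])
        elif line.startswith("peak rss:") and data is not None and elapsed is not None:
            results[data] = (elapsed, int(line.split(":", 1)[1]))
            data = elapsed = None
    return results


# One block from each log above; paste the full logs to compare every dataset.
before = parse_log("""== Taxopy
data: taxids.n100000.txt

elapsed time: 8.591
peak rss: 1090184
""")
after = parse_log("""== Taxopy
data: taxids.n100000.txt

elapsed time: 6.760
peak rss: 867460
""")

for name in sorted(before):
    if name in after:
        t0, m0 = before[name]
        t1, m1 = after[name]
        print(f"{name}\ttime ratio {t0 / t1:.2f}\trss ratio {m0 / m1:.2f}")
```

A ratio above 1 means the "after" run was faster or used less memory for that dataset.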

Can I filter taxids using a set of taxids?

I have a gene2taxid file, like this:

qseqid  staxids
TRINITY_DN18_c0_g2      6973
TRINITY_DN18_c0_g3      6973
TRINITY_DN18_c0_g1      6973
TRINITY_DN18_c1_g1      189913
TRINITY_DN59_c0_g1      4577
TRINITY_DN79_c0_g1      4577
TRINITY_DN17_c0_g1      4577
TRINITY_DN46_c0_g1      81932
TRINITY_DN46_c2_g1      2020875

and a microbes taxid file, like this:

2       Bacteria
2157    Archaea
10239   Viruses
33630   Alveolata
554915  Amoebozoa
5794    Apicomplexa
554296  Apusozoa
1401294 Breviatea
193537  Centroheliozoa
3041    Chlorophyta
28009   Choanoflagellida
190322  Collodictyonidae
3027    Cryptophyta
5758    Entamoeba
33682   Euglenozoa
207245  Fornicata
4751    Fungi

Can I extract the microbial genes using taxonkit?
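
One way to approach this is to expand the microbial TaxIds into the full set of their descendants and then keep the genes whose TaxId falls in that set. The sketch below does this in Python from nodes.dmp-style lines ("tax_id | parent tax_id | ..."). The helper names and the toy tree are hypothetical; only TaxId 2 (Bacteria) comes from the table above, and the child TaxIds are made up for illustration.

```python
from collections import defaultdict


def parse_nodes(lines):
    """Build {parent: [children]} from nodes.dmp lines ('tax_id | parent | ...')."""
    children = defaultdict(list)
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            child, parent = int(fields[0]), int(fields[1])
            if child != parent:  # the root lists itself as its parent
                children[parent].append(child)
    return children


def subtree(roots, children):
    """Return the given roots plus all of their descendants."""
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen


def filter_genes(rows, keep):
    """Keep (gene, taxid) rows whose taxid is in the keep set."""
    return [(gene, taxid) for gene, taxid in rows if taxid in keep]


# Toy tree: 1 is the root, 2 is Bacteria, 900001 and 900002 are made-up TaxIds.
toy_nodes = [
    "1\t|\t1\t|\tno rank\t|\n",
    "2\t|\t1\t|\tsuperkingdom\t|\n",
    "900001\t|\t2\t|\tspecies\t|\n",
    "900002\t|\t1\t|\tspecies\t|\n",
]
keep = subtree({2}, parse_nodes(toy_nodes))
genes = [("geneA", 900001), ("geneB", 900002)]
print(filter_genes(genes, keep))
```

According to the subcommand table, taxonkit list produces the subtree TaxIds for given TaxIds, so its output could serve as the keep set instead of parsing nodes.dmp by hand.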

About the special taxonID search

Dear WeiShen,
I have run into a special case when using taxonkit that I could not solve. First, I have a long list of NCBI taxonomy IDs belonging to several species. Second, I need to extract a specific taxonomy ID (e.g., 10239, Viruses) and all of its descendants from that list, and then print the taxon names (not the IDs). I have read the tutorial you provided, which is comprehensive and detailed. However, I did not find a procedure or command suited to my need. So I am curious whether taxonkit provides the function I need. Thanks for your beautiful work!
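
The request can be sketched outside taxonkit as an ancestor walk: for each TaxId in the list, follow parent links until the target (here 10239) or the root is reached, then print the name of each hit. The parent and name maps are assumed to be loaded from nodes.dmp and names.dmp; the helper names and the TaxIds 900001 and 900002 are made up for illustration.

```python
def has_ancestor(taxid, target, parent):
    """True if target is taxid itself or one of its ancestors in the parent map."""
    seen = set()
    while taxid is not None and taxid not in seen:
        if taxid == target:
            return True
        seen.add(taxid)  # guards against the root pointing at itself
        taxid = parent.get(taxid)
    return False


def names_under(taxids, target, parent, names):
    """Names of the listed TaxIds that are the target or lie below it."""
    return [names.get(t, str(t)) for t in taxids if has_ancestor(t, target, parent)]


# Toy data: 1 is the root, 10239 is Viruses, the 9000xx TaxIds are invented.
parent = {1: 1, 10239: 1, 900001: 10239, 900002: 1}
names = {10239: "Viruses", 900001: "toy virus", 900002: "toy non-virus"}
print(names_under([900001, 900002, 10239], 10239, parent, names))
```

Walking upward touches only the lineages of the listed TaxIds, which suits a short list; for a very long list, building the full descendant set of the target once may be the better choice.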
