Improves the taxonomic assignment of reads using the barcode information in linked-read data. The input and output formats are built to be compatible with Kraken2, but output from any classifier can be used with a little reformatting.
- A working python3 installation
- Standard Unix commands `awk` and `paste`
- ETE toolkit, installable through conda
- The first time you run cloudClassifier, you'll need an internet connection to download the NCBI taxonomy
cloudClassifier needs two files as input:
- The fastq of the linked reads, with barcodes marked with a "BX:Z:" or "BC:Z:" tag. This fastq MUST be sorted by barcode. You can use the `sort` command on Unix systems (with the `-S` option to limit memory usage)
- The taxonomic classification of those reads, presented in the same order as the fastq file. Use the default Kraken2 output format
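
Before running cloudClassifier, it can help to confirm that the fastq really is grouped by barcode. The sketch below is not part of cloudClassifier; the names `header_barcodes` and `is_grouped_by_barcode` are hypothetical. It assumes plain 4-line fastq records and only checks that each barcode forms one contiguous run of reads, which is weaker than a full sort order. How cloudClassifier itself treats reads without a barcode tag is not covered here; this sketch simply treats them as one extra group.

```python
import re
from typing import Iterable, Iterator, Optional

# Matches a barcode tag such as "BX:Z:ACGT-1" or "BC:Z:ACGT" in a fastq header.
BARCODE_TAG = re.compile(r"\b(?:BX|BC):Z:(\S+)")


def header_barcodes(lines: Iterable[str]) -> Iterator[Optional[str]]:
    """Yield the barcode of each record (None if untagged).

    Assumes 4-line fastq records, so every 4th line is a header.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            match = BARCODE_TAG.search(line)
            yield match.group(1) if match else None


def is_grouped_by_barcode(lines: Iterable[str]) -> bool:
    """Return True if every barcode occupies a single contiguous run of records."""
    seen = set()
    previous = object()  # sentinel that differs from any barcode
    for barcode in header_barcodes(lines):
        if barcode != previous:
            if barcode in seen:
                return False  # this barcode already appeared in an earlier run
            seen.add(barcode)
            previous = barcode
    return True
```

For a real file, pass an open file handle, for example `is_grouped_by_barcode(open("reads.fastq"))`, where `reads.fastq` is a placeholder name.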
cloudClassifier outputs a file in Kraken2 default output format, with improved taxonomic assignment.
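
Since both the classification input and the output use the default Kraken2 format, a small parser can be handy for inspecting either file. The following is a minimal sketch, not cloudClassifier code; `KrakenRecord` and `parse_kraken_line` are hypothetical names. It relies on the default Kraken2 output having five tab-separated fields: classification status (`C` or `U`), read ID, taxonomy ID, read length, and the list of k-mer to taxon mappings. The taxonomy ID and length are kept as strings because Kraken2 options such as `--use-names` or paired-end mode change how those fields look.

```python
from typing import NamedTuple


class KrakenRecord(NamedTuple):
    classified: bool  # True for "C", False for "U"
    read_id: str
    taxid: str        # kept as text; may hold a name if --use-names was used
    length: str       # kept as text; paired reads are reported as "len1|len2"
    kmer_hits: str    # space-separated "taxid:count" pairs


def parse_kraken_line(line: str) -> KrakenRecord:
    """Split one line of default Kraken2 output into its five fields."""
    status, read_id, taxid, length, kmer_hits = line.rstrip("\n").split("\t", 4)
    return KrakenRecord(status == "C", read_id, taxid, length, kmer_hits)
```

Comparing the parsed records of the input classification and of the cloudClassifier output, read by read, is one way to see which assignments changed.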