merenlab / illumina-utils
A library and collection of scripts to work with Illumina paired-end data (for CASAVA 1.7+ pipeline).
License: GNU General Public License v2.0
Hi,
I tried to do the installation of illumina-utils for anvio with pip-20.1.
The following error came up.
Using cached illumina-utils-2.7.tar.gz (3.3 MB)
ERROR: Command errored out with exit status 1:
command: /Users/virtual-envs/anvio-master/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/_b/40l3prb52ln490d96qf4ccmh0000gn/T/pip-install-n0dzn60q/illumina-utils/setup.py'"'"'; __file__='"'"'/private/var/folders/_b/40l3prb52ln490d96qf4ccmh0000gn/T/pip-install-n0dzn60q/illumina-utils/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/_b/40l3prb52ln490d96qf4ccmh0000gn/T/pip-pip-egg-info-y_xa8czy
cwd: /private/var/folders/_b/40l3prb52ln490d96qf4ccmh0000gn/T/pip-install-n0dzn60q/illumina-utils/
Complete output (7 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/_b/40l3prb52ln490d96qf4ccmh0000gn/T/pip-install-n0dzn60q/illumina-utils/setup.py", line 17, in <module>
reqs = [str(ir.req) for ir in install_reqs]
File "/private/var/folders/_b/40l3prb52ln490d96qf4ccmh0000gn/T/pip-install-n0dzn60q/illumina-utils/setup.py", line 17, in <listcomp>
reqs = [str(ir.req) for ir in install_reqs]
AttributeError: 'ParsedRequirement' object has no attribute 'req'
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
I downgraded pip to pip-19.3 and the installation worked without any problems.
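The traceback shows setup.py reading `ir.req` from objects produced by pip's requirement parser, and under pip 20.1 those objects have no `req` attribute. A minimal sketch of one way to avoid depending on pip's internals, assuming setup.py only needs the plain requirement strings from a requirements file (the helper name `read_requirements` and the default file name are hypothetical, not taken from the illumina-utils source):

```python
from pathlib import Path


def read_requirements(path="requirements.txt"):
    """Return requirement strings from a pip-style requirements file.

    Hypothetical helper: skips blank lines, comments, and option lines
    (those starting with '-') instead of calling pip's internal parser.
    """
    requirements = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        requirements.append(line)
    return requirements
```

The returned list could then be passed to `install_requires` in `setup()`. Until something like this is in place, downgrading pip as described above remains the workaround.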
Hi Meren,
There is a "compressed" parameter for fastq files, but I can't find one for fasta. Is it possible to add it or am I missing something?
Hello
rapidmerge.py uses multiprocessing.cpu_count() to get the number of available CPUs, which returns the number of CPUs in the machine. But this is not the same as the number of CPUs available to the process. For example, you can run in a taskset context or under a batch scheduler like Slurm.
see:
$ nproc
96
$ taskset -c 1 nproc
1
$ taskset -c 1 python3 -c "import multiprocessing; print(multiprocessing.cpu_count())"
96
I would suggest using len(os.sched_getaffinity(0)) instead of multiprocessing.cpu_count():
$ python3 -c "import os; print(len(os.sched_getaffinity(0)))"
96
$ taskset -c 1 python3 -c "import os; print(len(os.sched_getaffinity(0)))"
1
regards
Eric
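A minimal sketch of the suggestion above. `os.sched_getaffinity` is only available on some platforms (Linux has it, macOS and Windows do not), so this version falls back to `multiprocessing.cpu_count()` elsewhere. The function name `available_cpu_count` is hypothetical and not part of rapidmerge.py:

```python
import multiprocessing
import os


def available_cpu_count():
    """Number of CPUs this process is allowed to run on.

    Uses os.sched_getaffinity where the platform provides it and falls
    back to multiprocessing.cpu_count() everywhere else.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return multiprocessing.cpu_count()


if __name__ == "__main__":
    print(available_cpu_count())
```

Run under `taskset -c 1` on Linux, this should report the restricted count rather than the machine total, matching the `nproc` behaviour shown above.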
Some methods are not compatible with Python 3. For example:
iu-merge-pairs
^
SyntaxError: Missing parentheses in call to 'print'
Hi,
I'm running into an error trying to produce the quality plots with iu-filter-quality-minoche. This is the command I ran and the error I received:
#!/bin/bash
source activate anvio-7
iu-gen-configs samples_ada.txt -o 01_QC_minoche/
for ini in 01_QC_minoche/*.ini;
do iu-filter-quality-minoche $ini --visualize-quality-curves;
done
Quality scores visualization in progress: FAILED_REASON_N
Traceback (most recent call last):
File "/home/saatkinson/anaconda3/envs/anvio-7/bin/iu-filter-quality-minoche", line 314, in <module>
sys.exit(main(config, args))
File "/home/saatkinson/anaconda3/envs/anvio-7/bin/iu-filter-quality-minoche", line 265, in main
title = 'Mean PHRED scores for pairs tagged as "%s"' % entry_type)
File "/home/saatkinson/anaconda3/envs/anvio-7/lib/python3.6/site-packages/IlluminaUtils/utils/helperfunctions.py", line 558, in visualize_qual_stats_dict
subplots[tile] = plt.subplot(next(gs))
TypeError: 'Gs' object is not an iterator
All the other files associated with Minoche seem to have been generated, just not the plots.
Any help getting the quality plots to generate would be most appreciated!
Thanks,
Samantha
I suggest (and plan to implement) the following changes to iu-remove-ids-from-fastq:
-G, --generate-output-for-survived-only
-K, --keep-ids - if provided, then instead of removing the reads in the list, only the reads in the list will be kept (and the rest will be removed).
Using v2.10.
I've tried to use iu-trim-fastq but it failed with this error:
iu-trim-fastq -f 0 -t 100 R1.fastq.gz R1-TRIMMED-TO-100bp.fastq.gz
00% -- (num pairs processed: 1)
Traceback (most recent call last):
File "/project2/meren/VIRTUAL-ENVS/anvio-dev/bin/iu-trim-fastq", line 51, in <module>
sys.exit(main(input_file_path, output_file_path, args.trim_from, args.trim_to, compressed))
File "/project2/meren/VIRTUAL-ENVS/anvio-dev/bin/iu-trim-fastq", line 25, in main
output.store_entry(input.entry)
File "/project2/meren/VIRTUAL-ENVS/anvio-dev/lib/python3.6/site-packages/IlluminaUtils/lib/fastqlib.py", line 191, in store_entry
self.file_pointer.write('@' + e.header_line + '\n')
File "/project2/meren/VIRTUAL-ENVS/anvio-dev/lib/python3.6/gzip.py", line 260, in write
data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'
Maybe related to a python 2 to 3 migration?
Thanks for the help!
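The traceback shows a `str` being written to a gzip file object that expects bytes, which is typical of code carried over from Python 2. A minimal sketch of the difference, using only the standard library; the file name and record content are made up, and whether fastqlib should open its output in text mode or encode each line is left to the maintainers:

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.fastq.gz")

# Mode "wt" opens the gzip stream in text mode, so str can be written.
# The default binary mode ("wb") accepts only bytes-like objects.
with gzip.open(path, "wt") as output:
    output.write("@" + "read_1" + "\n")
    output.write("ACGT\n+\nIIII\n")

# Read the record back in text mode to confirm the round trip.
with gzip.open(path, "rt") as handle:
    lines = handle.read().splitlines()
```

The alternative is to keep binary mode and call `.encode()` on every string before writing it.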
ITS amplicons can vary greatly in length, so the insert may be too short for a partial overlap. Alternatively, the insert may be long enough for a partial overlap, but after the prefixes are trimmed from both reads you may end up in a situation that requires a complete overlap analysis instead.
Here is an example. Say this is read 1 in one of your ITS paired-end sequences:
@M01028:24:000000000-A49NB:1:1101:16134:1723 1:N:0:21
TACGTCAGCGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTACTGTTATTTACTACTACACTGCGTGAGCGGAACGAAAACAACAACACCTAAAATGTGGAATATAGCATATAGTCGACAAGAGAAATCTACGAAAAAACAAACAAAACTTTCAACAACGGATCTCTTGGTTCTCGCATCGATGAAGAGCGCAGCGAAATGCGATACCTAGTGTGAATTGCAGCCGTCGTGAAT
+
BBBBBFBBFBBBGGGGGGGFGGEHHHHHHHHHGHHGGHHHHHHHHHGGEGGGHBGHFHGHHHH5DFGHHHHHHHHHHHHHG>?B?EFEGGGGGGGDGHHHHHFHHGGH3F?3BFFGHFHHHHHHHBDGHHHHH/F/CFHHHHHHHHHHHGGHGGHGGHHHHHH.GHHHHHHHHHHHGGGGGHHHHGGGGGGGGGGGGGGGGFGGGGFGGG-@DF-@EFEDFFFFFFFFFFFFFFFFFFFFFFAFCFDA/A/
and this is read 2:
@M01028:24:000000000-A49NB:1:1101:16134:1723 2:N:0:21
GTTCAAAGATTCGATGATTCACGACGGCTGCAATTCACACTAGGTATCGCATTTCGCTGCGCTCTTCATCGATGCGAGAACCAAGAGATCCGTTGTTGAAAGTTTTGTTTGTTTTTTCGTAGATTTCTCTTGTCGACTATATGCTATATTCCACATTTTAGGTGTTGTTGTTTTCGTTCCGCTCACGCAGTGTAGTAGTAAATCACAGTAATGATCCTTCCGCAGGTTCACCTACGGAAACCTTGTTACGA
+
AABABFFFFFFFGGGGGGG6FGHGAAEEEGGFHGHFFF5BBB3BDFGGGGGGGHHHGGGGGGCGHHHHHHHHHHFEGGGGHHHHHHHGHHGHG3EGHH4BFFHHHHHGHHHHHHHGGHGHHGEHHFHHHHHHH1DGCGCHFGHFGDGFHHFGGHFGHHGF0GGGHGHHHHFHHHH?GHGGF?EGGHHG@BEFFBFFFBFFF0;FFFFFGGBF9BFF0FFGBFB@B;FFFFFFFFFFFF.BBFFBFBFBB?9
and this is your config.ini file to merge them:
[general]
project_name = CGCCTT_NNNNTCAGC_1
researcher_email = [email protected]
input_directory = /somewhere/on/your/disk
output_directory = /somewhere/on/your/disk/output
[files]
pair_1 = r1
pair_2 = r2
# following section is optional
[prefixes]
pair_1_prefix = ^....TCAGCGTAAAAGTCGTAACAAGGTTTC
pair_2_prefix = ^GTTCAAAGA[C,T]TCGATGATTCAC
And if you run these like this:
merge-illumina-pairs short.ini
read 1 and 2 will not merge, because after trimming the prefixes and reverse-complementing read 2 they would need to be aligned like this:
read 1:      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
read 2: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
unlike our expected partial overlap situation:
read 1: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
read 2:                   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
To solve this issue a new parameter is required. When m/o (mismatches at the overlapped region) fails miserably, the algorithm should check "the other way around" alternative, and pick the one with the best m/o.
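The idea of checking "the other way around" can be sketched as an exhaustive search over relative offsets that keeps the alignment with the lowest m/o. This is an illustration only, not the algorithm illumina-utils implements; the function name `best_overlap`, the `min_overlap` parameter, and the tie-breaking rule (prefer the longer overlap) are all assumptions:

```python
def best_overlap(read_1, read_2_rc, min_overlap=5):
    """Find the relative offset with the lowest mismatch/overlap ratio.

    read_2_rc is read 2 after reverse-complementing. A positive offset
    means read_2_rc starts inside read 1 (the usual partial overlap); a
    negative offset means read_2_rc starts before read 1 (the "other way
    around" case). Returns (offset, mismatches, overlap_length), or None
    if no offset gives at least min_overlap aligned bases.
    """
    best = None
    first = -(len(read_2_rc) - min_overlap)
    last = len(read_1) - min_overlap
    for offset in range(first, last + 1):
        start_1 = max(offset, 0)
        start_2 = max(-offset, 0)
        overlap = min(len(read_1) - start_1, len(read_2_rc) - start_2)
        if overlap < min_overlap:
            continue
        mismatches = sum(
            a != b
            for a, b in zip(read_1[start_1:start_1 + overlap],
                            read_2_rc[start_2:start_2 + overlap])
        )
        # Rank by m/o first, then prefer the longer overlap on ties.
        candidate = (mismatches / overlap, -overlap, offset, mismatches, overlap)
        if best is None or candidate[:2] < best[:2]:
            best = candidate
    return None if best is None else best[2:]
```

A real implementation would probably test only the two candidate layouts rather than every offset, and would weigh mismatches by base quality.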
Hi Meren,
I am getting this error when I run iu-merge-pairs - Config File Error: Unexpected value for "pair_1" section "files": RC_AB4_062216-R1.fastq. Not sure what the issue is. I prepared the samples.txt file (first two rows of the file shown below).
sample | r1 | r2 |
---|---|---|
RC_AB4_062216 | RC_AB4_062216-R1.fastq | RC_AB4_062216-R2.fastq |
I ran iu-gen-configs and then ran iu-merge-pairs as shown below (I also tried iu-filter-minoche to see if the issue was the iu-merge-pairs command itself but that also gave the same error).
iu-gen-configs samples.txt -o 01_QC
iu-merge-pairs --debug 01_QC/RC_AB4_062216.ini
Config File Error: Unexpected value for "pair_1" section "files": RC_AB4_062216-R1.fastq
Here is the appearance of the config file for the sample
[general]
project_name = RC_AB4_062216
researcher_email = [email protected]
input_directory = /media/shared/Onedrive/Postdoc_Gu/Projects/Oligotyping_RockCreek/metagenomics/raw_fq
output_directory = /media/shared/Onedrive/Postdoc_Gu/Projects/Oligotyping_RockCreek/metagenomics/01_QC
[files]
pair_1 = RC_AB4_062216-R1.fastq
pair_2 = RC_AB4_062216-R2.fastq
Any help with this would be greatly appreciated! (I have this foreboding feeling that I am just missing something extremely obvious!).
Thanks
Varun
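One thing worth ruling out, as an assumption rather than a confirmed diagnosis, is that the files named under [files] cannot be found inside input_directory. A minimal diagnostic sketch using only the standard library; the function name `missing_input_files` is hypothetical, and treating each entry as a comma-separated list of file names is also an assumption about the config format:

```python
import configparser
import os


def missing_input_files(ini_path):
    """List [files] entries that do not exist under input_directory."""
    # interpolation=None keeps '%' characters in paths from being
    # interpreted as configparser interpolation markers.
    config = configparser.ConfigParser(interpolation=None)
    config.read(ini_path)
    input_directory = config.get("general", "input_directory")
    missing = []
    for key in ("pair_1", "pair_2"):
        for name in config.get("files", key).split(","):
            path = os.path.join(input_directory, name.strip())
            if not os.path.isfile(path):
                missing.append(path)
    return missing
```

Calling `missing_input_files("01_QC/RC_AB4_062216.ini")` would return an empty list if both FASTQ files are where the config says they are, in which case the cause lies elsewhere.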
Dear all,
I am trying to use iu-demultiplex with these FASTQ files: https://github.com/caporaso-lab/mockrobiota/blob/master/data/mock-9/dataset-metadata.tsv
but they do not seem to be FASTQ files generated by CASAVA 1.8, as iu-demultiplex returns this error:
Header lines in your FASTQ file does not seem to be the ones illumina-utils
expects to see in a FASTQ file generated by CASAVA 1.8. If you call this
funciton with 'raw = True' parameter, all should be fine. If you are accessing
this function through a client, or in other words if you have no idea what this
message is telling you, try to re-run the program with --ignore-deflines
parameter. If that parameter is not available to you, then please send an e-mail
to [email protected]
and I don't understand the " 'raw = True' parameter ", as it is not an option of iu-demultiplex.
Could you tell me whether it is possible to use your program with these data?
They look like this:
head mock-forward-read.fastq
@ILLUMINA_0331:1:1101:1214:2235#NNNNNNNNNNNN/1
TACGTAGGGCGCAAGCGTTGTCCGGAATTANTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
a_aeceeegggggiiiiiiighiiihehifBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
head mock-index-read.fastq
@ILLUMINA_0331:1:1101:1214:2235#NNNNNNNNNNNN/1
NNNNNNNNNNNN
+
YYYYYYYYYYYY
head mock-reverse-read.fastq
@ILLUMINA_0331:1:1101:1214:2235#NNNNNNNNNNNN/2
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
(it's not always NNNNNNN in the index file)
Kind regards
Maria
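The headers shown above follow the older Illumina layout (`machine:lane:tile:x:y#index/read`) rather than the CASAVA 1.8 layout (`instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`). A minimal sketch that tells the two apart; the regular expressions and the function name `defline_style` are my own approximations and are not the expressions illumina-utils uses internally:

```python
import re

# CASAVA 1.8+: @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
CASAVA_1_8 = re.compile(
    r"^@?(?P<instrument>[^:\s]+):(?P<run>\d+):(?P<flowcell>[^:\s]+):"
    r"(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r"\s+(?P<read>[12]):(?P<filtered>[YN]):(?P<control>\d+):(?P<index>\S*)$"
)

# Older layout: @machine:lane:tile:x:y#index/read
PRE_1_8 = re.compile(
    r"^@?(?P<machine>[^:\s]+):(?P<lane>\d+):(?P<tile>\d+):"
    r"(?P<x>\d+):(?P<y>\d+)#(?P<index>[^/\s]+)/(?P<read>[12])$"
)


def defline_style(header):
    """Classify a FASTQ header line as 'casava-1.8', 'pre-1.8' or 'unknown'."""
    header = header.strip()
    if CASAVA_1_8.match(header):
        return "casava-1.8"
    if PRE_1_8.match(header):
        return "pre-1.8"
    return "unknown"
```

Checking the first header of each file this way before running a tool makes it clear up front whether an option such as --ignore-deflines, where available, is going to be needed.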
I am hoping someone can help me interpret this error message. I am using iu-filter-quality-minoche for some NextSeq metagenomes (4xR1, 4xR2 per sample) that I ran through trimmomatic.
For most of the metagenomes I can run iu-filter-quality-minoche just fine, but a few always fail at the same point in the processing. For this sample it is around 24,000 reads in. If I run iu-filter-quality-minoche on the raw data before trimmomatic I have no issues. So it seems trimmomatic is doing something to a read that is causing iu-filter-quality-minoche to crash. But I don't know how to interpret the error and thus troubleshoot the problem.
(num pairs processed: 23,000)
(num pairs processed: 24,000)
Traceback (most recent call last):
File "/miniconda3/bin/iu-filter-quality-minoche", line 313, in <module>
sys.exit(main(config, args))
File "/miniconda3/bin/iu-filter-quality-minoche", line 178, in main
p1_passed_qual, p1_trim_to, p1_fate = IsHighQuality(s1, q1, p)
File "/miniconda3/bin/iu-filter-quality-minoche", line 68, in IsHighQuality
trim_to = None if len(sequence) == trim_to else trim_to
UnboundLocalError: local variable 'trim_to' referenced before assignment
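An `UnboundLocalError` like this usually means a variable is assigned only inside a loop or branch that never ran for a particular input. One guess, not a confirmed diagnosis, is that trimmomatic left a zero-length read in the file. The sketch below reproduces the pattern with hypothetical functions. It is not the actual IsHighQuality code:

```python
def trim_point_unsafe(qualities, threshold=20):
    """Index of the first base below threshold, else the read length."""
    for position, quality in enumerate(qualities):
        if quality < threshold:
            trim_to = position
            break
        trim_to = position + 1
    # For an empty read the loop body never runs and trim_to is unbound.
    return trim_to


def trim_point_safe(qualities, threshold=20):
    """Same logic, with trim_to defined before the loop."""
    trim_to = 0
    for position, quality in enumerate(qualities):
        if quality < threshold:
            trim_to = position
            break
        trim_to = position + 1
    return trim_to
```

If empty reads are the cause, filtering them out beforehand (for example with trimmomatic's MINLEN step) should also avoid the crash.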
If the --ignore-deflines flag is used, the use of --compute-qual-dicts should throw an exception.
Related to #3.
First of all, I wanted to say that this is a great tool, so I just wanted to thank the developers!
I am trying to use iu-remove-ids-from-fastq to remove some reads that were mapped using bowtie2, but I have the following problem: in the bam output from the bowtie2 mapping the reads look like this:
fasta_02:23:B02CBACXX:8:2315:2667:7273
Whereas, if I look at the corresponding read in the fastq file, it looks like this:
@fasta_02:23:B02CBACXX:8:2315:2667:7273 1:N:0:GATCAG
And iu-remove-ids-from-fastq expects:
fasta_02:23:B02CBACXX:8:2315:2667:7273 1:N:0:GATCAG
Even though to my understanding the read name fasta_02:23:B02CBACXX:8:2315:2667:7273 is unique.
Could this behavior be modified?
Thank you!
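The requested behaviour amounts to comparing only the part of each header before the first whitespace. A minimal sketch of that matching rule, operating on a list of FASTQ lines; the function names `read_id` and `remove_reads` are hypothetical and this is not how iu-remove-ids-from-fastq is implemented:

```python
def read_id(defline):
    """Return the read name up to the first whitespace, without a leading '@'."""
    parts = defline.split()
    if not parts:
        return ""
    token = parts[0]
    return token[1:] if token.startswith("@") else token


def remove_reads(fastq_lines, ids_to_remove):
    """Drop four-line FASTQ records whose normalized ID is in ids_to_remove."""
    ids = {read_id(i) for i in ids_to_remove}
    kept = []
    for start in range(0, len(fastq_lines) - 3, 4):
        record = fastq_lines[start:start + 4]
        if read_id(record[0]) not in ids:
            kept.extend(record)
    return kept
```

With this rule, the name reported by bowtie2 and the full FASTQ header both reduce to the same key. In the meantime, a workaround is to build the ID list from the FASTQ headers themselves so that it includes the `1:N:0:GATCAG` part.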
Here is my case. I got my V1-V3 data sequenced by an external provider. They say they used the Illumina Casava pipeline version 1.8.3.
As a start I tried to generate a config file. I followed the steps listed at https://github.com/meren/illumina-utils:
I first generated a tab file listing all the sample names and the corresponding paired-end fastq files. Then I ran iu-gen-configs and I was surprised that instead of generating a single config file it generated a config file for each sample.
Then I decided to merge the paired-end fastq files for each sample by using iu-merge-pairs with the --compute-qual-dicts option. When I ran it for the first sample it produced the following error:
Error: Your input FASTQ files do not seem to be generated by CASAVA 1.8. Please use --ignore-deflines parameter.
I added the parameter as requested. Then I got another error message:
$ iu-merge-pairs --compute-qual-dicts --ignore-deflines 16001_posD09_CCTAAGACACTGCATA.ini
Traceback (most recent call last):
File "/usr/local/bin/iu-merge-pairs", line 770, in <module>
sys.exit(merger.run())
File "/usr/local/bin/iu-merge-pairs", line 398, in run
tile_number = self.input_1.entry.tile_number
File "/Library/Python/2.7/site-packages/IlluminaUtils/lib/fastqlib.py", line 82, in __getattr__
return getattr(self, '_'.join(['process', key]))()
(...)
File "/Library/Python/2.7/site-packages/IlluminaUtils/lib/fastqlib.py", line 82, in __getattr__
return getattr(self, '_'.join(['process', key]))()
File "/Library/Python/2.7/site-packages/IlluminaUtils/lib/fastqlib.py", line 73, in __getattr__
if key in ['__str__']:
RuntimeError: maximum recursion depth exceeded in cmp
I don't know what to do now. Can you, perhaps, advise?
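The repeated `__getattr__` frames in the traceback suggest a lazy-attribute pattern: looking up `tile_number` dispatches to `process_tile_number`, and if that method is missing the lookup re-enters `__getattr__` with an ever longer name until the recursion limit is reached. A minimal sketch of the pattern with a guard; the class `Entry` and its method are hypothetical stand-ins, not the fastqlib code:

```python
class Entry:
    """Hypothetical FASTQ entry with lazily computed fields."""

    def __init__(self, header):
        self.header = header

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails. Refuse
        # 'process_*' names here: without this guard a missing
        # process_<key> method would re-enter __getattr__ as
        # process_process_<key>, and so on until recursion fails.
        if key.startswith("process_") or key.startswith("__"):
            raise AttributeError(key)
        value = getattr(self, "process_" + key)()
        setattr(self, key, value)  # cache so later lookups are direct
        return value

    def process_first_field(self):
        return self.header.split(":")[0]
```

With the guard, asking for a field that cannot be derived (such as a tile number when deflines are ignored) raises a plain `AttributeError` instead of exhausting the stack. This fits the related issue on this page suggesting that --compute-qual-dicts should be rejected when --ignore-deflines is used.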
Dear developer,
I am trying to demultiplex an Illumina run using iu-demultiplex. Here is my command:
iu-demultiplex -s SampleSheet-RC.txt --r1 lane1_NoIndex_L001_R1_001-13C.fastq.gz --r2 lane1_NoIndex_L001_R3_001-13C.fastq.gz --index lane1_NoIndex_L001_R2_001-13C.fastq.gz -o output/
But I got the following error:
Output directory .............................: /Users/Jincheng/Desktop/tmp/output
Barcodes .....................................: 13 samples found
Traceback (most recent call last):
File "/Users/Jincheng/miniconda3/envs/py34/bin/iu-demultiplex", line 238, in <module>
d._run()
File "/Users/Jincheng/miniconda3/envs/py34/bin/iu-demultiplex", line 45, in _run
self.build_index()
File "/Users/Jincheng/miniconda3/envs/py34/bin/iu-demultiplex", line 116, in build_index
progress.update('~%.2f%% (num index reads with no barcode: %d (%.2f%% of all reads))' % (self.index.percent_read, missing_barcode, missing_barcode * 100.0 / num_index))
TypeError: a float is required
Could you help?
Thank you!
Jincheng
Hi Meren,
We finally switched to python3 and now I have a problem:
/groups/vampsweb/seqinfobin/anaconda3/bin/python3 ./fastaunique TTAGGC_NNNNTCAGC_1_MERGED_V6_PRIMERS_REMOVED
Traceback (most recent call last):
File "./fastaunique", line 74, in <module>
main(args)
File "./fastaunique", line 13, in main
input = u.SequenceSource(args.input_fasta, unique = True)
File "/groups/vampsweb/seqinfobin/anaconda3/lib/python3.6/site-packages/IlluminaUtils/lib/fastalib.py", line 94, in __init__
self.init_unique_hash()
File "/groups/vampsweb/seqinfobin/anaconda3/lib/python3.6/site-packages/IlluminaUtils/lib/fastalib.py", line 98, in init_unique_hash
hash = hashlib.sha1(self.seq.upper()).hexdigest()
TypeError: Unicode-objects must be encoded before hashing
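In Python 3, `hashlib` functions accept only bytes, so a `str` sequence has to be encoded first. A minimal sketch of the adjustment, with a made-up sequence; choosing UTF-8 is an assumption, although any ASCII-compatible encoding gives the same digest for plain nucleotide letters:

```python
import hashlib

sequence = "acgtACGT"

# Encode the upper-cased sequence to bytes before hashing.
digest = hashlib.sha1(sequence.upper().encode("utf-8")).hexdigest()
```

Applied to the line in the traceback, this would read `hashlib.sha1(self.seq.upper().encode('utf-8')).hexdigest()`.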
Hello -
I was trying to run the test example using this command:
(illumina-utils-dev) delaney@ada:~/software/illumina-utils/examples/demultiplexing$ iu-demultiplex -s barcode_to_sample.txt --r1 r1.fastq --r2 r2.fastq --index index.fastq -o output/
and get the following error:
Traceback (most recent call last):
File "/home/delaney/software/illumina-utils/scripts/iu-demultiplex", line 18, in <module>
import IlluminaUtils.lib.fastqlib as u
ModuleNotFoundError: No module named 'IlluminaUtils'
Any help is greatly appreciated!!
Dear illumina-utils Developer
I have Python 3.6 activated and I am receiving a strange error when trying to run illumina-utils.
Here is my command line and the error:
(py36) [dieunel@genomics ~]$ iu-filter-quality-minoche -h
Your active Python major version ('2') is not compatible with what illumina-utils expects :/ We recently switched to Python 3.
Could you please help?
Regards
Hi,
I noticed a small Python3 compatibility bug while running iu-merge-pairs.
The error message that I got was this:
Traceback (most recent call last):
File "/home/danielle/virtual-envs/illumina-utils-v2.0.0/bin/iu-merge-pairs", line 770, in <module>
sys.exit(merger.run())
File "/home/danielle/virtual-envs/illumina-utils-v2.0.0/bin/iu-merge-pairs", line 302, in run
r1_passed_Q30, r1_Q30 = self.passes_minoche_Q30(self.input_1.entry.Q_list[0:-len_overlap])
File "/home/danielle/virtual-envs/illumina-utils-v2.0.0/bin/iu-merge-pairs", line 666, in passes_minoche_Q30
Q30 = len([True for _q in base_qualities[0:half_length] if _q > 30])
TypeError: slice indices must be integers or None or have an __index__ method
This is an easy fix (I fixed it on my local machine), by changing line 666 from
Q30 = len([True for _q in base_qualities[0:half_length] if _q > 30])
to
Q30 = len([True for _q in base_qualities[0:int(half_length)] if _q > 30])
Just wanted to let you know!
Thanks for this excellent package!!
Best,
Danielle
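An alternative to wrapping the slice bound in `int()` is to compute it with floor division, which already yields an integer in Python 3. A minimal sketch, assuming `half_length` comes from dividing a length by two (the source of `half_length` is not shown in the traceback, and the quality values here are made up):

```python
base_qualities = [35, 36, 20, 38, 40, 12, 33]

# "/" returns a float in Python 3; "//" keeps the result an int.
half_length = len(base_qualities) // 2

Q30 = len([True for _q in base_qualities[0:half_length] if _q > 30])
```

Either change removes the `TypeError`. Floor division simply fixes the value where it is computed rather than where it is used.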