atropos's Introduction

Atropos

Atropos is a tool for specific, sensitive, and speedy trimming of NGS reads. It is a fork of the venerable Cutadapt read trimmer (https://github.com/marcelm/cutadapt, DOI:10.14806/ej.17.1.200), with the primary improvements being:

  1. Multi-threading support, including an extremely fast "parallel write" mode.
  2. Implementation of a new insert alignment-based trimming algorithm for paired-end reads that is substantially more sensitive and specific than the original Cutadapt adapter alignment-based algorithm. This algorithm can also correct mismatches between the overlapping portions of the reads.
  3. Options for trimming specific types of data (miRNA, bisulfite-seq).
  4. A new command ('detect') that will detect adapter sequences and other potential contaminants.
  5. A new command ('error') that will estimate the sequencing error rate, which helps to select the appropriate adapter- and quality- trimming parameter values.
  6. A new command ('qc') that generates read statistics similar to FastQC. The trim command can also compute read statistics both before and after trimming (using the '--stats' option).
  7. Improved summary reports, including support for serialization formats (JSON, YAML, pickle), support for user-defined templates (via the optional Jinja2 dependency), and integration with MultiQC.
  8. The ability to merge overlapping reads (this is experimental and the functionality is limited).
  9. The ability to write the summary report and log messages to separate files.
  10. The ability to read SAM/BAM files and read/write interleaved FASTQ files.
  11. Direct trimming of reads from an SRA accession.
  12. A progress bar, and other minor usability enhancements.

Manual installation

Atropos is available from PyPI and can be installed using pip.

First install dependencies:

  • Required
  • Optional Python libraries
    • pytest (for running unit tests)
    • progressbar2 or tqdm (progressbar support)
    • pysam (SAM/BAM input)
    • khmer 2.0+ (for detecting low-frequency adapter contamination)
    • jinja2 (for user-defined report formats)
    • ngstream (for SRA streaming), which requires ngs

Pip can be used to install atropos and optional dependencies, e.g.:

pip install atropos[tqdm,pysam,ngstream]

Conda

There is an Atropos recipe in Bioconda.

conda install -c bioconda atropos

Docker

A Docker image is available for Atropos in Docker Hub.

docker run jdidion/atropos <arguments>

Usage

Atropos is almost fully backward-compatible with cutadapt. If you currently use cutadapt, you can simply install Atropos and then substitute the executable name in your command line, with one key difference: you need to use options to specify input file names. For example:

atropos -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGAGTTA -o trimmed.fq.gz -se reads.fq.gz

To take advantage of multi-threading, set the --threads option:

atropos --threads 8 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGAGTTA -o trimmed.fq.gz -se reads.fq.gz

To take advantage of the new aligner (if you have paired-end reads with 3' adapters), set the --aligner option to 'insert':

atropos --aligner insert -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG \
  -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -o trimmed.1.fq.gz -p trimmed.2.fq.gz \
  -pe1 reads.1.fq.gz -pe2 reads.2.fq.gz

See the Documentation for more complete usage information.

Publications

Atropos is published in PeerJ.

Please cite as:

Didion JP, Martin M, Collins FS. (2017) Atropos: specific, sensitive, and speedy trimming of sequencing reads. PeerJ 5:e3720 https://doi.org/10.7717/peerj.3720

The results in the paper can be fully reproduced using the workflow defined in the paper directory.

The citation for the original Cutadapt paper is:

Marcel Martin. "Cutadapt removes adapter sequences from high-throughput sequencing reads." EMBnet.Journal, 17(1):10-12, May 2011. http://dx.doi.org/10.14806/ej.17.1.200

Links

Roadmap

1.2

  • Migrate to xphyle for file management.
  • Migrate to pokrok for progress bar management.
  • Accept multiple input files.
  • Support SAM output (including #33).
  • Direct streaming and trimming of reads from SRA and htsget using ngstream.
  • Read "cropping" (#50)
  • Support for ThruPlex-style adapters (in which barcode is part of query sequence; #55)
  • Accessibility:
    • Create recipe for homebrew.
    • Automatically update conda and homebrew recipes for each release.
    • Create Galaxy tool description using argparse2tool.
  • Improve documentation (#24)
  • Port over improvements in latest versions of Cutadapt https://cutadapt.readthedocs.io/en/stable/
  • Switch to using entry point instead of Atropos executable.

1.3

  • Add auto-trimming mode for paired-end reads.
  • Support for UMIs.
  • Provide PacBio- and nanopore-specific options (marcelm/cutadapt#120).
  • Provide option for RNA-seq data that will trim polyA sequence.
  • Add formal config file support (#53)
  • Automate crash reporting using sentry.
  • Look at [NGMerge] for improving read merging: https://github.com/harvardinformatics/NGmerge
  • Look at replacing pysam with pybam

1.4

  • Currently, InsertAligner requires a single 3' adapter for each end. Adapter trimming will later be generalized so that A) the InsertAligner can handle multiple matched pairs of adapters and/or B) multiple different aligners can be used for different adapters.
  • Integrate with AdapterBase for improved matching of detected contaminants to known adapters, automated trimming of datasets with known adapters, and (opt-in) submission of adapter information for novel datasets.
  • Migrate to seqio (https://github.com/jdidion/seqio) for reading/writing sequence files.
  • General-purpose read filtering based on read ID: marcelm/cutadapt#107.
  • Currently, SAM/BAM input files must be name sorted; add an option to 1) pre-sort reads inline using samtools or sambamba, or 2) cache each read in memory until its mate is found.

1.5

  • Provide more user control over anchoring of adapters: marcelm/cutadapt#53.
  • Enable user to define custom read structure: https://github.com/nh13/read-structure-examples
  • Support for paired-end demultiplexing
  • Demultiplexing based on barcodes: marcelm/cutadapt#118.
  • Consider supporting different error rates for read1 vs read2.
  • Add a ClipOverlapping modifier that will remove read overlaps (as opposed to merging).
  • Look more closely at providing solutions to the Illumina two-color chemistry issue:
    • Provide an option to exempt G calls from the assessment of quality
    • Trim 3′ Gs from reads
  • Also look at addressing any issues with one-color chemistry (iSeq).
  • Consider whether to support trimming/QC of raw IonTorrent data.

1.6

2.0

Beyond 2.0

  • Implement additional alternate alignment algorithms.
  • Implement the error detection algorithm in ADEPT: https://github.com/LANL-Bioinformatics/ADEPT
  • Explore new quality trimming algorithms
  • Scythe is an interesting new trimmer. Depending on how the benchmarks look in the forthcoming paper, we will add it to the list of tools we compare against Atropos, and perhaps implement their Bayesian approach for adapter match.
  • Experiment with replacing the multicore implementation with an asyncio-based implementation (using ProcessPoolExecutor and uvloop).
  • Automatic adaptive tuning of queue sizes to maximize the balance between memory usage and latency.
  • FastProNGS has some nice visualizations that could be included, rather than relying on MultiQC: https://github.com/Megagenomics/FastProNGS

While we consider the command-line interface to be stable, the internal code organization of Atropos is likely to change. At this time, we recommend that you do not interface directly with Atropos as a library (or that you be prepared for your code to break). The internal code organization will be stabilized as of version 2.0, which is planned for sometime in 2017.

If you would like to suggest additional enhancements, you can submit issues and/or pull requests at our GitHub page.

atropos's People

Contributors

antonkulaga, asellappen, essut, frederic-mahe, jdidion, jmarshall

atropos's Issues

Atropos can hang in multi-threaded mode due to insufficient memory

This is a subtle bug in which the main thread sometimes hangs while waiting to read the next record from the input file. It appears to occur only under a strictly regulated memory cap, such as in a cluster environment.

To address this, I have so far added the following:

  • Set the default batch size based on the queue sizes
  • Warn the user if their combination of batch and queue sizes might lead to excessive memory usage

atropos version 1.1.2 on conda hangs while pip version works

For your information:

conda install atropos

installs the atropos and tqdm packages as expected, but a call to atropos hangs forever, even on a small data set. Interrupting with Control+C gives this error:

2017-05-01 18:29:26,569 INFO: Starting 2 worker processes
2017-05-01 18:29:26,572 ERROR: Unknown error
Traceback (most recent call last):
  File "/home/cokelaer/miniconda3/envs/py3/lib/python3.5/site-packages/atropos/util/__init__.py", line 673, in run_interruptible
    func(*args, **kwargs)
  File "/home/cokelaer/miniconda3/envs/py3/lib/python3.5/site-packages/atropos/commands/multicore.py", line 291, in __call__
    self.ensure_alive)
  File "/home/cokelaer/miniconda3/envs/py3/lib/python3.5/site-packages/atropos/commands/multicore.py", line 507, in enqueue_all
    for item in iterable:
TypeError: iter() returned non-iterator of type 'tqdm'

Then, I tried the pip version:

conda remove atropos 
conda remove tqdm
pip install atropos

This does not seem to install tqdm, and running the previous atropos command then gives the expected output files.

This was on a python 3.5 conda environment.

base1 = r1_seq[i] IndexError: list index out of range

I get the following error with the latest master:

atropos.AtroposError: An error occurred at record 569 of batch 1237
2017-10-07 12:47:03,522 ERROR: Unexpected error in Worker process 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/atropos-1.1.14+10.g3060182-py3.6-linux-x86_64.egg/atropos/commands/base.py", line 74, in handle_records
    self.handle_record(context, record)
  File "/usr/local/lib/python3.6/site-packages/atropos-1.1.14+10.g3060182-py3.6-linux-x86_64.egg/atropos/commands/base.py", line 127, in handle_record
    return self.handle_reads(context, read1, read2)
  File "/usr/local/lib/python3.6/site-packages/atropos-1.1.14+10.g3060182-py3.6-linux-x86_64.egg/atropos/commands/trim/__init__.py", line 52, in handle_reads
    return self.record_handler.handle_record(context, read1, read2)
  File "/usr/local/lib/python3.6/site-packages/atropos-1.1.14+10.g3060182-py3.6-linux-x86_64.egg/atropos/commands/trim/__init__.py", line 70, in handle_record
    reads = self.modifiers.modify(read1, read2)
  File "/usr/local/lib/python3.6/site-packages/atropos-1.1.14+10.g3060182-py3.6-linux-x86_64.egg/atropos/commands/trim/modifiers.py", line 1054, in modify
    read1, read2 = mods(read1, read2)
  File "/usr/local/lib/python3.6/site-packages/atropos-1.1.14+10.g3060182-py3.6-linux-x86_64.egg/atropos/commands/trim/modifiers.py", line 401, in __call__
    self.correct_errors(read1, read2, insert_match)
  File "/usr/local/lib/python3.6/site-packages/atropos-1.1.14+10.g3060182-py3.6-linux-x86_64.egg/atropos/commands/trim/modifiers.py", line 252, in correct_errors

Here is the command that I run:

atropos trim \
  --aligner insert \
  -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG \
  -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT \
  -pe1 /cromwell-executions/quality_de_novo/2ce90b41-c1a7-4f7e-b31e-c0ae76ec1cb5/call-atropos_illumina_pe/inputs/pipelines/whales/graywhale/GWliver_S1_L001_R1_001.fastq.gz \
  -pe2 /cromwell-executions/quality_de_novo/2ce90b41-c1a7-4f7e-b31e-c0ae76ec1cb5/call-atropos_illumina_pe/inputs/pipelines/whales/graywhale/GWliver_S1_L001_R2_001.fastq.gz \
  -o GWliver_S1_L001_R1_001_trimmed.fastq.gz \
  -p GWliver_S1_L001_R2_001_trimmed.fastq.gz \
  --threads 8 \
  --correct-mismatches liberal \
  --trim-n

Here are the files:
1 and 2

Improve documentation on known adapters file

Clarify what the known adapters file is for (a database of adapters that can be referenced by name), and how it differs from specifying an adapter file to one of the -a/-g/-b options.

progress with "msg" option fails

FYI: with the progress option set to "msg" (and the import time fix), a new error occurs in the progress module.

File "/home/cokelaer/anaconda2/envs/py35/lib/python3.5/site-packages/atropos-1.0.14-py3.5-linux-x86_64.egg/atropos/serial.py", line 11, in run_serial
    for batch_size, batch in reader:
  File "/home/cokelaer/anaconda2/envs/py35/lib/python3.5/site-packages/atropos-1.0.14-py3.5-linux-x86_64.egg/atropos/progress.py", line 44, in __next__
    value = self.iterable.next()
AttributeError: 'BatchIterator' object has no attribute 'next'
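
The traceback points at a Python 2 idiom: in Python 3, iterators implement `__next__` and are advanced with the built-in `next()`, so `self.iterable.next()` fails. A minimal sketch of the pattern follows; `BatchIterator` and `ProgressWrapper` here are hypothetical stand-ins written for illustration, not the actual Atropos classes.

```python
class BatchIterator:
    """Hypothetical stand-in: yields (batch_size, batch) tuples."""

    def __init__(self, records, batch_size):
        self.records = list(records)
        self.batch_size = batch_size
        self.pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.pos >= len(self.records):
            raise StopIteration
        batch = self.records[self.pos:self.pos + self.batch_size]
        self.pos += self.batch_size
        return len(batch), batch


class ProgressWrapper:
    """Hypothetical wrapper that counts items while iterating."""

    def __init__(self, iterable):
        self.iterable = iter(iterable)
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Python 3: use the built-in next(), not self.iterable.next()
        value = next(self.iterable)
        self.count += 1
        return value


batches = list(ProgressWrapper(BatchIterator("ACGTACG", 3)))
```

Calling `next(self.iterable)` works for any object that defines `__next__`, which is all that Python 3 requires of an iterator.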

UnboundLocalError: local variable 'name3' referenced before assignment

I tried to build from source (as the latest Docker release does not contain the latest fix adding FASTA support for atropos detect output), and I got the following strange error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/__init__.py", line 199, in execute_cli
    retcode, _ = command.execute(args)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/__init__.py", line 73, in execute
    self.generate_reports(summary, options)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/__init__.py", line 153, in generate_reports
    generator.generate_reports(summary)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/reports.py", line 65, in generate_reports
    self.generate_text_report(fmt, summary, outfile, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/detect/reports.py", line 23, in generate_text_report
    generate_fasta(outfile, summary, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/detect/reports.py", line 125, in generate_fasta
    format_match(idx, match, records)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/detect/reports.py", line 119, in format_match
    "{} {}".format(idx, ";".join(name2 + name3)),
UnboundLocalError: local variable 'name3' referenced before assignment

The command that I ran was

/usr/local/bin/atropos detect -se ${file} -d heuristic -O fasta

where I substituted a real filename for ${file}; the input was single-end reads.

changes in log

I've noticed that the log file format has changed.

For instance, in the summary section

Total basepairs processed:                505,000 bp
     Read 1:       252,500 bp
     Read 2:       252,500 bp

is now:

Total basepairs processed:                505,000 bp
     Read 1       252,500 bp
     Read 2       252,500 bp

That is, the ":" sign is gone.

I am parsing the output in the github.com/sequana/sequana project, and this obviously affects my code. This is not a big deal, but I was wondering whether there was any rationale for changing the output with respect to the cutadapt output itself. I thought it was great to keep the two outputs similar so that I needed only one parser ;-)

I am also parsing the rest of the log, hoping it won't change. Thanks.
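
One way for a downstream parser to tolerate both layouts is to make the colon optional in the pattern. The sketch below is an illustration only (it is not code from sequana or Atropos), and the names `READ_BP` and `parse_read_bp` are made up for this example.

```python
import re

# Matches both "     Read 1:       252,500 bp" and
# "     Read 1       252,500 bp" (the colon is optional).
READ_BP = re.compile(r"^\s*Read (?P<read>[12]):?\s+(?P<bp>[\d,]+) bp\s*$")


def parse_read_bp(line):
    """Return (read_number, basepairs), or None if the line does not match."""
    match = READ_BP.match(line)
    if match is None:
        return None
    return int(match.group("read")), int(match.group("bp").replace(",", ""))


old_style = parse_read_bp("     Read 1:       252,500 bp")
new_style = parse_read_bp("     Read 1       252,500 bp")
```

Both calls return the same tuple, so a single parser can handle either report style.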

Progress bar counts batches rather than reads

The progress bar wraps a batch reader. It should be a little bit smarter to check whether the wrapped iterable is a batch reader and, if so, multiply the count by the batch size.
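
A rough sketch of the idea, assuming the wrapped reader yields `(batch_size, batch)` tuples as the serial runner's loop does. Rather than multiplying the batch count by a nominal batch size, this version sums the reported sizes, which also handles a short final batch. The function name `count_reads` is hypothetical.

```python
def count_reads(batch_reader):
    """Yield (reads_seen, batch) for an iterable of (batch_size, batch)."""
    reads_seen = 0
    for batch_size, batch in batch_reader:
        reads_seen += batch_size
        yield reads_seen, batch


# Hypothetical batches: two full batches of 2 reads and a final batch of 1.
batches = [(2, ["r1", "r2"]), (2, ["r3", "r4"]), (1, ["r5"])]
progress = [n for n, _ in count_reads(batches)]
```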

progress bar error

With the latest version of atropos, the --progress bar option raises an error:

File "...atropos/io/progress.py", line 145, in __next__
   self.update(self.value + value[0])
TypeError: unsupported operand type(s) for +: 'int' and 'dict'

There are also lots of ERROR messages, which according to the documentation can be ignored, but they look suspicious to me (maybe unrelated):

2017-09-21 02:16:50,970 ERROR: Waiting on Worker process 0 to terminate for 60.0 seconds
2017-09-21 02:16:55,860 ERROR: Worker process 5 waiting on batch for 60.0 seconds
2017-09-21 02:16:55,861 ERROR: Worker process 1 waiting on batch for 60.0 seconds
2017-09-21 02:16:55,862 ERROR: Worker process 6 waiting on batch for 60.0 seconds
2017-09-21 02:16:55,863 ERROR: Worker process 3 waiting on batch for 60.0 seconds

atropos detect output as fasta

Currently, atropos detect outputs a text file that I have to parse to make use of the detected adapters. It would be very useful to have an option to output FASTA instead, so that I can use it in other steps of the pipeline.

Config file vs cli for parameters

I like what you have done with this, and I am willing to help in any way to keep this project moving forward. One suggestion is to use a tab-delimited config file for inputting parameters, along the lines of the one I pasted below. The file can then simply be parsed into a named tuple and passed around easily. Even though the code is likely to break, I would like some sort of API to allow simpler integration into my pipelines. I have started to play around with that, but I am still very early into it. Again, let me know if there is something I might be able to help you with.

run_Mimir.txt
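
As a sketch of what is being proposed, a tab-delimited file of `name<TAB>value` lines could be loaded into a named tuple as below. The layout of the attached file is not shown here, so the two-column format, the `#` comment convention, and the parameter names are assumptions made for illustration.

```python
import csv
import io
from collections import namedtuple


def load_config(handle):
    """Parse 'name<TAB>value' lines into a named tuple.

    Blank lines and lines starting with '#' are skipped (assumed convention).
    """
    rows = [
        row for row in csv.reader(handle, delimiter="\t")
        if row and not row[0].startswith("#")
    ]
    names = [row[0] for row in rows]
    values = [row[1] for row in rows]
    Config = namedtuple("Config", names)
    return Config(*values)


# Hypothetical parameter names, for illustration only.
example = io.StringIO("# name\tvalue\nthreads\t8\naligner\tinsert\n")
config = load_config(example)
```

Values are kept as strings; converting them to integers or booleans would be up to the caller.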

Error with `detect` adapters

Hello!
I tried to detect adapter sequences from read sequences with atropos detect -pe1 r1.fastq -pe2 r2.fastq and received the following error:

2017-05-02 10:52:21,687 INFO: This is Atropos 1.1.2 with Python 3.5.3
2017-05-02 10:52:21,688 INFO: Detecting adapters and other potential contaminant sequences based on 12-mers in 10000 reads
2017-05-02 10:52:21,837 ERROR: Error executing command: detect
Traceback (most recent call last):
  File "/home/mik/miniconda3/envs/ATRO/lib/python3.5/site-packages/atropos/commands/__init__.py", line 199, in execute_cli
    retcode, _ = command.execute(args)
  File "/home/mik/miniconda3/envs/ATRO/lib/python3.5/site-packages/atropos/commands/__init__.py", line 74, in execute
    summary, options.report_file, options.report_formats)
  File "/home/mik/miniconda3/envs/ATRO/lib/python3.5/site-packages/atropos/commands/__init__.py", line 153, in generate_reports
    generator.generate_reports(summary)
  File "/home/mik/miniconda3/envs/ATRO/lib/python3.5/site-packages/atropos/commands/reports.py", line 51, in generate_reports
    self.add_derived_data(summary)
  File "/home/mik/miniconda3/envs/ATRO/lib/python3.5/site-packages/atropos/commands/reports.py", line 66, in add_derived_data
    for bp in summary['total_bp_counts'])
KeyError: 'total_bp_counts'

Trimming of adapters worked perfectly with this data.
Atropos was installed in a clean python 3.5 conda environment.

With best regards,
Vladimir

discrepancy in reports between cutadapt and atropos on the "expect" column

Here is a sample of a cutadapt report for a given adapter:

Overview of removed sequences (5')
length	count	expect	max.err	error counts
6	7	24.4	0	7
7	3	6.1	0	3
10	1	0.1	1	0 1
11	1	0.0	1	0 1
12	1	0.0	1	1

and atropos 1.1.14 reports:

Overview of removed sequences (5'):
length count expect max.err error counts    
                                    0 1         
------ ----- ------ ------- ------------
     6     7  146.3       0   7           
     7     3   36.6       0   3           
    10     1    4.6       1   0 1         
    11     1    2.3       1   0 1         
    12     1    1.1       1   1           

Everything is identical except the third column (expect), which is about 6 times the value reported by cutadapt in the first two rows and differs even more in the remaining rows.
I am not sure what the reason for this difference is, or whether it is a bug or intended behaviour.

Atropos uses legacy mode when paired-end quality/N-trimming is performed without also adapter-trimming

I would like to run Atropos on paired reads, with the adapters passed as a FASTA file. The following is the command I used:

atropos trim --input1 SRR927423_1.fastq.gz --input2 SRR927423_2.fastq.gz --log-file atropos_log.txt --report-file atropos_report.txt --trim-n -q 15 --minimum-length 100 --maximum-length 100 --max-n 1 --preserve-order --output SRR927423_1_trimmed --paired-output SRR927423_2_trimmed --known-adapters-file ./sequencing_adapters.fa

However, the log file said I was using legacy mode and read modifications were only being performed on the forward strand:

2017-07-10 14:44:03,238 INFO: Trimming 0 adapter with at most 10.0% errors in paired-end legacy mode ...
2017-07-10 14:44:03,238 WARNING: WARNING: Requested read modifications are applied only to the first
read since backwards compatibility mode is enabled. To modify both
reads, also use any of the -A/-B/-G/-U options. Use a dummy adapter
sequence when necessary: -A XXX

I tried adding the option "-A XXX" to my command; however, this gave me an error (see attached). Could you advise me on the parameters I could use to run Atropos on paired reads with a FASTA file of adapters?

Error with long reads

I am getting an error that seems to be related to calculating a large factorial (see overflow error in output below) probably due to the long inserts in the 2X300 bp data I am testing. This only occurs when I use the --aligner insert method, which also suggests it is a problem with calculating the probability for the long overlaps. In addition, if I trim 150 off each read (for example, using u -150 -U 150) the program runs. The untrimmed reads do not cause a problem when I use SeqPurge. I will include an example of some paired read files causing this problem.

drl@rhombus ~/Projects/MAT_dada2/fungi $ atropos --aligner insert -a GATCTCTTGGYTCTBGCATCGATGAAGAACG -A GGAAACCTTGTTACGACTTTTACTTCCTCTAAATGACCAA -pe1 Test_R1_001.fastq.gz -pe2 Test_R2_001.fastq.gz -o Test.out.R1.fastq.gz -p Test.out.R2.fastq.gz 
2017-09-08 09:29:25,492 INFO: This is Atropos 1.1.12 with Python 3.6.1
2017-09-08 09:29:25,497 INFO: Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
2017-09-08 09:29:25,510 ERROR: Atropos error
Traceback (most recent call last):
 File "/home/drl/miniconda3/lib/python3.6/site-packages/atropos/commands/base.py", line 74, in handle_records
   self.handle_record(context, record)
 File "/home/drl/miniconda3/lib/python3.6/site-packages/atropos/commands/base.py", line 127, in handle_record
   return self.handle_reads(context, read1, read2)
 File "/home/drl/miniconda3/lib/python3.6/site-packages/atropos/commands/trim/__init__.py", line 52, in handle_reads
   return self.record_handler.handle_record(context, read1, read2)
 File "/home/drl/miniconda3/lib/python3.6/site-packages/atropos/commands/trim/__init__.py", line 70, in handle_record
   reads = self.modifiers.modify(read1, read2)
 File "/home/drl/miniconda3/lib/python3.6/site-packages/atropos/commands/trim/modifiers.py", line 1054, in modify
   read1, read2 = mods(read1, read2)
 File "/home/drl/miniconda3/lib/python3.6/site-packages/atropos/commands/trim/modifiers.py", line 363, in __call__
   match = self.aligner.match_insert(read1.sequence, read2.sequence)
 File "/home/drl/miniconda3/lib/python3.6/site-packages/atropos/align/__init__.py", line 362, in match_insert
   prob = self.match_probability(insert_match[4], insert_match_size, **self.base_probs)
 File "/home/drl/miniconda3/lib/python3.6/site-packages/atropos/util/__init__.py", line 95, in __call__
   self.factorial(i) /
OverflowError: int too large to convert to float

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
 File "/home/drl/miniconda3/lib/python3.6/site-packages/atropos/util/__init__.py", line 673, in run_interruptible
   func(*args, **kwargs)
 File "/home/drl/miniconda3/lib/python3.6/site-packages/atropos/commands/base.py", line 23, in __call__
   self.process_batch(batch)
 File "/home/drl/miniconda3/lib/python3.6/site-packages/atropos/commands/base.py", line 58, in process_batch
   self.handle_records(context, records)
 File "/home/drl/miniconda3/lib/python3.6/site-packages/atropos/commands/trim/__init__.py", line 48, in handle_records
   super().handle_records(context, records)
 File "/home/drl/miniconda3/lib/python3.6/site-packages/atropos/commands/base.py", line 78, in handle_records
   idx, context['index'])) from err
atropos.AtroposError: An error occurred at record 6 of batch 1

Test_R1_001.fastq.gz
Test_R2_001.fastq.gz
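
The overflow is a general floating-point limit: 170! is the largest factorial that fits in a double, so converting the factorial of a 300 bp overlap to a float raises `OverflowError`. One common workaround is to work in log space with `math.lgamma`. The sketch below assumes the probability in question is a binomial tail (the chance of at least `matches` matching positions out of `size`, with a per-base match probability of 0.25); that assumption and the function names are not taken from the Atropos source.

```python
import math


def log_binom_coeff(n, k):
    """log(n choose k) via lgamma, avoiding huge integer factorials."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def binomial_tail(matches, size, p_match=0.25):
    """P(at least `matches` of `size` positions match by chance)."""
    log_p = math.log(p_match)
    log_q = math.log1p(-p_match)
    return sum(
        math.exp(log_binom_coeff(size, i) + i * log_p + (size - i) * log_q)
        for i in range(matches, size + 1)
    )


# A 600-position overlap is handled without overflow.
prob = binomial_tail(550, 600)
```

Each term is exponentiated only after the large logarithms have cancelled, so no intermediate value exceeds the float range.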

conda install: requirements incompatible for 3+

conda install atropos
Fetching package metadata ...............
Solving package specifications: .


UnsatisfiableError: The following specifications were found to be in conflict:
  - atropos -> khmer -> python 2.7*
  - python 3.6*
Use "conda info <package>" to see the dependencies for each package.

miRNA option Error

When I used the miRNA option, I got the error below.
According to the documentation, the miRNA option sets the adapter sequence to the Illumina small RNA adapter by default.
Do I need to specify the adapter sequence?

atropos trim --mirna -se input.fastq.gz -o output.fastq.gz
2017-09-13 11:39:26,521 INFO: This is Atropos 1.1.12+3.g1d4a9ee with Python 3.6.2
2017-09-13 11:39:26,521 ERROR: Error executing command: trim
Traceback (most recent call last):
  File "/path/to/atropos/atropos/commands/__init__.py", line 217, in execute_cli
    retcode, _ = command.execute(args)
  File "/path/to/atropos/atropos/commands/__init__.py", line 69, in execute
    options = self.parse_args(args)
  File "/path/to/atropos/atropos/commands/__init__.py", line 116, in parse_args
    return parser.parse(args)
  File "/path/to/atropos/atropos/commands/cli.py", line 50, in parse
    self.validate_command_options(options)
  File "/path/to/atropos/atropos/commands/trim/cli.py", line 690, in validate_command_options
    if (options.adapter is None and options.front is None and
AttributeError: 'Namespace' object has no attribute 'adapter'

ERROR: Unknown error

Hi John,

As you suggested, I attach the error log along with my command line and mini fastq files.

2017-08-05 01:06:56,535 INFO: This is Atropos 1.1.9 with Python 3.6.2
2017-08-05 01:06:56,539 INFO: Loading list of known contaminants from https://raw.githubusercontent.com/jdidion/atropos/master/atropos/adapters/sequencing_adapters.fa
2017-08-05 01:06:56,810 INFO: Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
2017-08-05 01:07:10,720 ERROR: Unknown error
Traceback (most recent call last):
  File "/site/ne/home/i0180769/apps/python/v3.6.2/lib/python3.6/site-packages/atropos/util/__init__.py", line 673, in run_interruptible
    func(*args, **kwargs)
  File "/site/ne/home/i0180769/apps/python/v3.6.2/lib/python3.6/site-packages/atropos/commands/base.py", line 23, in __call__
    self.process_batch(batch)
  File "/site/ne/home/i0180769/apps/python/v3.6.2/lib/python3.6/site-packages/atropos/commands/base.py", line 58, in process_batch
    self.handle_records(context, records)
  File "/site/ne/home/i0180769/apps/python/v3.6.2/lib/python3.6/site-packages/atropos/commands/trim/__init__.py", line 48, in handle_records
    super().handle_records(context, records)
  File "/site/ne/home/i0180769/apps/python/v3.6.2/lib/python3.6/site-packages/atropos/commands/base.py", line 73, in handle_records
    self.handle_record(context, record)
  File "/site/ne/home/i0180769/apps/python/v3.6.2/lib/python3.6/site-packages/atropos/commands/base.py", line 122, in handle_record
    return self.handle_reads(context, read1, read2)
  File "/site/ne/home/i0180769/apps/python/v3.6.2/lib/python3.6/site-packages/atropos/commands/trim/__init__.py", line 52, in handle_reads
    return self.record_handler.handle_record(context, read1, read2)
  File "/site/ne/home/i0180769/apps/python/v3.6.2/lib/python3.6/site-packages/atropos/commands/trim/__init__.py", line 70, in handle_record
    reads = self.modifiers.modify(read1, read2)
  File "/site/ne/home/i0180769/apps/python/v3.6.2/lib/python3.6/site-packages/atropos/commands/trim/modifiers.py", line 1054, in modify
    read1, read2 = mods(read1, read2)
  File "/site/ne/home/i0180769/apps/python/v3.6.2/lib/python3.6/site-packages/atropos/commands/trim/modifiers.py", line 401, in __call__
    self.correct_errors(read1, read2, insert_match)
  File "/site/ne/home/i0180769/apps/python/v3.6.2/lib/python3.6/site-packages/atropos/commands/trim/modifiers.py", line 253, in correct_errors
    base2 = BASE_COMPLEMENTS[r2_seq[j]]
IndexError: list index out of range
2017-08-05 01:07:10,731 DEBUG: Not generating report file

Here is my command line:
ADAPTER_FWD="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
ADAPTER_REV="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
atropos --aligner insert -a ${ADAPTER_FWD} -A ${ADAPTER_REV} \
  --correct-mismatches liberal -q 20 -m 30 --log-file atropos.log --log-level DEBUG \
  -o ${fastq_R1_name}.fq.gz -p ${fastq_R2_name}.fq.gz \
  -pe1 ${fastq_R1} -pe2 ${fastq_R2}

temp_1.fastq.gz
temp_2.fastq.gz

for many files I get atropos.io.seqio.FormatError: FASTQ file ended prematurely

2017-05-12 21:29:40,110 INFO: This is Atropos 0+unknown with Python 3.6.1
2017-05-12 21:29:40,114 INFO: Trimming 0 adapter with at most 10.0% errors in single-end mode ...
2017-05-12 21:48:45,045 ERROR: Atropos error
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/util/__init__.py", line 673, in run_interruptible
    func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/base.py", line 22, in __call__
    for batch in command_runner.iterator():
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/base.py", line 266, in __next__
    read_index, record = next(self.iterable)
  File "atropos/io/_seqio.pyx", line 222, in __iter__ (atropos/io/_seqio.c:5280)
atropos.io.seqio.FormatError: FASTQ file ended prematurely
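
A message like this usually means the input stops partway through a record, for example after an interrupted download or copy. A quick independent check is to walk the file record by record. The sketch below assumes the common four-line FASTQ layout (no wrapped sequence lines); the helper names are hypothetical and it is not Atropos's own parser.

```python
import gzip
import io


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt")


def count_fastq_records(handle):
    """Count complete 4-line records; raise ValueError on a truncated one."""
    count = 0
    while True:
        header = handle.readline()
        if not header:
            return count
        handle.readline()  # sequence line
        plus = handle.readline()
        qual = handle.readline()
        if not (header.startswith("@") and plus.startswith("+") and qual):
            raise ValueError(
                "incomplete record after {} complete records".format(count)
            )
        count += 1


complete = io.StringIO("@r1\nACGT\n+\nIIII\n")
truncated = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC")
n_complete = count_fastq_records(complete)
```

On a real file this would be called as `count_fastq_records(open_fastq(path))`; a truncated gzip stream typically raises `EOFError` from the `gzip` module before the record check is reached.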

Errors with NULL FASTQ sequences

Found a strange bug from some recent HiSeq data:

When running:
atropos --aligner insert -a GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT \
  -o /shared/data/trim1.fq.gz -p /shared/data/trim2.fq.gz \
  -pe1 /shared/data/test1.fq.gz -pe2 /shared/data/test2.fq.gz \
  -u 10 -U 7 -q 25 \
  --op-order ACQ

on the attached paired FASTQ files (a minimal example), I get this error:

Traceback (most recent call last):
  File "/shared/conda/lib/python3.6/site-packages/atropos/commands/base.py", line 74, in handle_records
    self.handle_record(context, record)
  File "/shared/conda/lib/python3.6/site-packages/atropos/commands/base.py", line 127, in handle_record
    return self.handle_reads(context, read1, read2)
  File "/shared/conda/lib/python3.6/site-packages/atropos/commands/trim/__init__.py", line 52, in handle_reads
    return self.record_handler.handle_record(context, read1, read2)
  File "/shared/conda/lib/python3.6/site-packages/atropos/commands/trim/__init__.py", line 70, in handle_record
    reads = self.modifiers.modify(read1, read2)
  File "/shared/conda/lib/python3.6/site-packages/atropos/commands/trim/modifiers.py", line 1054, in modify
    read1, read2 = mods(read1, read2)
  File "/shared/conda/lib/python3.6/site-packages/atropos/commands/trim/modifiers.py", line 363, in __call__
    match = self.aligner.match_insert(read1.sequence, read2.sequence)
  File "/shared/conda/lib/python3.6/site-packages/atropos/align/__init__.py", line 280, in match_insert
    seq2_rc = reverse_complement(seq2)
  File "/shared/conda/lib/python3.6/site-packages/atropos/util/__init__.py", line 428, in reverse_complement
    return "".join(BASE_COMPLEMENTS[base] for base in reversed(seq))
  File "/shared/conda/lib/python3.6/site-packages/atropos/util/__init__.py", line 428, in <genexpr>
    return "".join(BASE_COMPLEMENTS[base] for base in reversed(seq))
KeyError: '\x00'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/shared/conda/lib/python3.6/site-packages/atropos/util/__init__.py", line 674, in run_interruptible
    func(*args, **kwargs)
  File "/shared/conda/lib/python3.6/site-packages/atropos/commands/base.py", line 23, in __call__
    self.process_batch(batch)
  File "/shared/conda/lib/python3.6/site-packages/atropos/commands/base.py", line 58, in process_batch
    self.handle_records(context, records)
  File "/shared/conda/lib/python3.6/site-packages/atropos/commands/trim/__init__.py", line 48, in handle_records
    super().handle_records(context, records)
  File "/shared/conda/lib/python3.6/site-packages/atropos/commands/base.py", line 78, in handle_records
    idx, context['index'])) from err
atropos.AtroposError: An error occurred at record 998 of batch 1

It seems to coincide with some FASTQ records whose sequence lines consist entirely of NUL bytes (shown as ^@):
@170830_GRCF3_0496_AHTCV7BCXY:2:2216:21251:101305/1
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
+
####################################################################################################
@170830_GRCF3_0496_AHTCV7BCXY:2:2216:21251:101345/1
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
+
####################################################################################################

In contrast, trim_galore and other trimming software don't seem to have a problem with it.

test2.fq.gz
test1.fq.gz
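
The KeyError in the traceback follows directly from looking up each base in a complement table that has no entry for the NUL character. Below is a minimal sketch of that failure and of a pre-check that flags such records. The complement table here is a reduced stand-in, not the actual table in atropos, and `has_nul` is a hypothetical helper.

```python
# Reduced stand-in for a complement table; the real table has more entries.
BASE_COMPLEMENTS = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Same lookup pattern as in the traceback: fails on unknown characters."""
    return "".join(BASE_COMPLEMENTS[base] for base in reversed(seq))


def has_nul(seq):
    """True if the sequence line contains at least one NUL byte."""
    return "\x00" in seq


if __name__ == "__main__":
    print(reverse_complement("ACGTN"))
    bad = "\x00" * 5
    print(has_nul(bad))
    try:
        reverse_complement(bad)
    except KeyError as err:
        print("KeyError:", repr(err.args[0]))
```

Records flagged by such a check could be dropped or reported before alignment; which of the two is preferable is a design decision for the tool.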

atropos does not seem to cut most adapters and primers

I've tried atropos in two modes (with the default downloadable adapters, and with adapter detection followed by trimming with the detected adapters). It does not seem to clean the reads at all. I compared it with sickle, and the latter works much better but still does not cut all Illumina primers. I enclose FastQC reports (REPORTS.zip) for the original, atropos-trimmed, and sickle-trimmed reads. The SRA accession used is SRR2040662.

minor discrepancy between cutadapt and atropos report

I am aware that atropos can output reports in JSON, which is more robust, but I am currently still parsing the report in txt format and noticed this discrepancy in the single-end case compared to the cutadapt report. It is just a typo in the label. Note the correct "s" in "Reads with adapters":

=== Summary ===

Total reads processed:                   2,500
Reads with adapters:                         0 (0.0%)
Reads that were too short:                  18 (0.7%)
Reads written (passing filters):         2,482 (99.3%)

whereas in atropos 1.1.5 there is no "s" at the end of "adapter":

Reads                                  records   fraction
----------------------------------- ---------- ----------
Total reads processed:                  78,316
Reads with adapter:                        330       0.4%
Reads that were too short:                 285       0.4%
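
Until the label is unified, a text parser can accept both spellings. This is just a sketch of one way to do it; the function name `reads_with_adapters` is made up, and it only handles the single "Reads with adapter(s)" line shown in the two reports above.

```python
import re

# Matches "Reads with adapter:" and "Reads with adapters:" at line start.
ADAPTER_LINE = re.compile(r"^Reads with adapters?:\s+([\d,]+)", re.MULTILINE)


def reads_with_adapters(report_text):
    """Return the adapter read count as an int, or None if the line is absent."""
    match = ADAPTER_LINE.search(report_text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


if __name__ == "__main__":
    cutadapt_style = "Total reads processed:   2,500\nReads with adapters:   0 (0.0%)\n"
    atropos_style = "Total reads processed:   78,316\nReads with adapter:   330   0.4%\n"
    print(reads_with_adapters(cutadapt_style))
    print(reads_with_adapters(atropos_style))
```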

Error when using --sra with json output

The JSON serializer chokes when serializing the sra_reader. The sra_reader that is set in the option dict is intended to be transient. It just needs to be nulled out after use.
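
The idea of nulling out a transient field before serialization can be sketched as follows. Everything here is hypothetical: `FakeSraReader` stands in for whatever non-serializable object sits in the options, and `serializable_options` is an illustrative helper, not atropos code.

```python
import json


class FakeSraReader:
    """Hypothetical stand-in for a reader object that JSON cannot encode."""


def serializable_options(options, transient_keys=("sra_reader",)):
    """Return a copy of the options with transient entries replaced by None."""
    return {
        key: (None if key in transient_keys else value)
        for key, value in options.items()
    }


if __name__ == "__main__":
    options = {"threads": 4, "sra_reader": FakeSraReader()}
    try:
        json.dumps(options)
    except TypeError as err:
        print("before:", err)
    print("after:", json.dumps(serializable_options(options)))
```

Working on a copy leaves the live options untouched, which matters if the reader is still needed when the report is generated.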

single read adapter detection does not work

Whenever I try to do this from the latest docker, I get:

2017-05-12 18:35:04,245 INFO: Detecting adapters and other potential contaminant sequences based on 12-mers in 10000 reads
2017-05-12 18:35:04,432 ERROR: Error executing command: detect
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/__init__.py", line 199, in execute_cli
    retcode, _ = command.execute(args)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/__init__.py", line 74, in execute
    summary, options.report_file, options.report_formats)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/__init__.py", line 153, in generate_reports
    generator.generate_reports(summary)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/reports.py", line 51, in generate_reports
    self.add_derived_data(summary)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/reports.py", line 66, in add_derived_data
    for bp in summary['total_bp_counts'])
KeyError: 'total_bp_counts'

Error atropos trim using multiple threads

Hello,

I used atropos trim with --threads 4 and received the error shown below. When I run the same command without the threads option, everything runs well.

atropos trim --info-file info_file --no-default-adapters --known-adapters-file sequencing_adapters.fa --input1 SRR5829902_1.fastq.gz --input2 SRR5829902_2.fastq.gz --trim-n -q 20 --minimum-length 90 --maximum-length 101 --max-n 1 --preserve-order --report-file atropos_report_file --log-file atropos_logfile --output SRR5829902_1_atropos_trim --paired-output SRR5829902_2_atropos_trim --threads 4

Process Result process:
Traceback (most recent call last):
  File "/Users/ekopylova/miniconda/envs/atropos/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/Users/ekopylova/miniconda/envs/atropos/lib/python3.5/site-packages/atropos/commands/trim/__init__.py", line 874, in run
    num_batches if num_batches > 0 else None)
  File "/Users/ekopylova/miniconda/envs/atropos/lib/python3.5/site-packages/atropos/commands/trim/__init__.py", line 781, in finish
    self.writers.close()
AttributeError: 'ResultProcess' object has no attribute 'close'

Let me know if you require further info. Thanks!

Jenya

OverflowError: int too large to convert to float

Atropos crashes with "OverflowError: int too large to convert to float"

2017-10-06 22:44:50,847 ERROR: Unexpected error in Worker process 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/multicore.py", line 210, in run
    self.pipeline.process_batch(batch)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/multicore.py", line 146, in process_batch
    super().process_batch(batch)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/base.py", line 58, in process_batch
    self.handle_records(context, records)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/trim/__init__.py", line 48, in handle_records
    super().handle_records(context, records)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/base.py", line 73, in handle_records
    self.handle_record(context, record)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/base.py", line 122, in handle_record
    return self.handle_reads(context, read1, read2)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/trim/__init__.py", line 52, in handle_reads
    return self.record_handler.handle_record(context, read1, read2)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/trim/__init__.py", line 70, in handle_record
    reads = self.modifiers.modify(read1, read2)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/trim/modifiers.py", line 1054, in modify
    read1, read2 = mods(read1, read2)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/trim/modifiers.py", line 363, in __call__
    match = self.aligner.match_insert(read1.sequence, read2.sequence)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/align/__init__.py", line 359, in match_insert
    prob = self.match_probability(insert_match[4], insert_match_size, **self.base_probs)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/util/__init__.py", line 95, in __call__
    self.factorial(i) /
OverflowError: int too large to convert to float
2017-10-06 22:46:01,125 ERROR: Atropos error
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/util/__init__.py", line 673, in run_interruptible
    func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/multicore.py", line 291, in __call__
    self.ensure_alive)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/multicore.py", line 512, in enqueue_all
    fail_callback=fail_callback)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/multicore.py", line 426, in wait_on
    fail_callback()
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/trim/__init__.py", line 686, in ensure_alive
    super().ensure_alive()
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/multicore.py", line 249, in ensure_alive
    ensure_processes(self.worker_processes)
  File "/usr/local/lib/python3.6/site-packages/atropos-0+unknown-py3.6-linux-x86_64.egg/atropos/commands/multicore.py", line 394, in ensure_processes
    str(i) for i, a in enumerate(is_alive) if a != alive)))
atropos.commands.multicore.MulticoreError: One or more process exited: 0,1,2,3,4,5,6
2017-10-06 22:46:05,863 ERROR: Result process waiting on result for 70.1 seconds
2017-10-06 22:46:10,868 ERROR: Result process waiting on result for 75.1 seconds
2017-10-06 22:46:15,874 ERROR: Result process waiting on result for 80.1 seconds
2017-10-06 22:46:20,879 ERROR: Result process waiting on result for 85.1 seconds
2017-10-06 22:46:25,880 ERROR: Result process waiting on result for 90.1 seconds
2017-10-06 22:46:30,886 ERROR: Result process waiting on result for 95.1 seconds
2017-10-06 22:46:35,889 ERROR: Result process waiting on result for 100.1 seconds
2017-10-06 22:46:40,890 ERROR: Result process waiting on result for 105.1 seconds
2017-10-06 22:46:45,896 ERROR: Result process waiting on result for 110.1 seconds
2017-10-06 22:46:50,901 ERROR: Result process waiting on result for 115.1 seconds
2017-10-06 22:46:55,903 ERROR: Result process waiting on result for 120.1 seconds
2017-10-06 22:47:00,903 ERROR: Result process waiting on result for 125.1 seconds
2017-10-06 22:47:01,184 ERROR: Waiting on Result process to terminate for 60.1 seconds

The commands that I ran with the latest atropos docker container were:

cd /cromwell-executions/quality_de_novo/0cd5264f-4d18-4639-95c7-6b4f34cdd992/call-atropos_illumina_pe/execution
atropos trim \
  --aligner insert \
  -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG \
  -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT \
  -pe1 /cromwell-executions/quality_de_novo/0cd5264f-4d18-4639-95c7-6b4f34cdd992/call-atropos_illumina_pe/inputs/pipelines/whales/graywhale/GWliver_S1_L001_R1_001.fastq.gz \
  -pe2 /cromwell-executions/quality_de_novo/0cd5264f-4d18-4639-95c7-6b4f34cdd992/call-atropos_illumina_pe/inputs/pipelines/whales/graywhale/GWliver_S1_L001_R2_001.fastq.gz \
  -o GWliver_S1_L001_R1_001_trimmed.fastq.gz \
  -p GWliver_S1_L001_R2_001_trimmed.fastq.gz \
  --threads 8 \
  --correct-mismatches liberal

I can provide the input files if needed, but they are quite large (several GB).
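
The message "int too large to convert to float" is what Python raises when an integer above the double-precision range (about 1.8e308) is converted to float, and factorials pass that limit at 171!. A common way around it is to work in log space. The sketch below is not the atropos implementation: `binom_pmf_logspace` is an illustrative binomial probability computed with `math.lgamma`, shown only to demonstrate the technique.

```python
import math


def overflows_as_float(value):
    """True if converting the integer to float raises OverflowError."""
    try:
        float(value)
    except OverflowError:
        return True
    return False


def binom_pmf_logspace(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), using log-gamma instead of factorials."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_coeff + k * math.log(p) + (n - k) * math.log1p(-p))


if __name__ == "__main__":
    print(overflows_as_float(math.factorial(170)))
    print(overflows_as_float(math.factorial(171)))
    # Works for overlap sizes where 300! would not fit in a float.
    print(binom_pmf_logspace(75, 300, 0.25))
```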

gzipped file created by atropos cannot be parsed by pysam.

I have experienced a problem with the gzip file output by atropos (version 1.0.23).
The output fastq.gz file is correct; however, when it is parsed with the pysam library, it looks as though the gzip file is somehow corrupted, and the iteration stops (without an error). I am not sure whether this is a pysam issue or an atropos issue (or both). Here is the code used to scan the fastq.gz file created by atropos in multithreaded mode.

>>> import pysam
>>> fastq = pysam.FastxFile(self.filename)
>>> for i, record in enumerate(fastq):
...     pass
...
>>> print(i)
985

But the input fastq file has a million reads. I then decompressed and recompressed the file, and everything seemed fine. I have posted this issue in the atropos repository (not in pysam yet) to find out whether others have experienced it; I understand this may not be an atropos issue. I was using zlib 1.2.11 and pysam 0.11.2.2.
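
One possible explanation, offered here purely as an assumption and not confirmed in this report, is that the multithreaded output is a concatenation of several gzip members. Such a file is valid gzip, but a reader that stops after the first member sees only the first block of reads. The sketch below shows the difference using only the standard library; the two helper names are made up.

```python
import gzip
import zlib

FIRST = b"@r1\nACGT\n+\nIIII\n"
SECOND = b"@r2\nTTTT\n+\nIIII\n"

# Two gzip members written back to back: still a valid gzip file.
BLOB = gzip.compress(FIRST) + gzip.compress(SECOND)


def read_all_members(data):
    """gzip.decompress handles multi-member data and returns everything."""
    return gzip.decompress(data)


def read_first_member_only(data):
    """A single zlib decompressor stops at the end of the first member."""
    decomp = zlib.decompressobj(31)  # 31 selects the gzip container format
    return decomp.decompress(data)


if __name__ == "__main__":
    print(len(read_all_members(BLOB)), "bytes from all members")
    print(len(read_first_member_only(BLOB)), "bytes from the first member")
```

If this is the cause, decompressing and recompressing would produce a single-member file, which matches the observation that the recompressed file parses fine.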

No module named lzma

I downloaded this and tried a local install using python3 setup.py install --user. Now when I run it I get an error that I have been unable to trace. The full error stack is

[dsimpson@localhost atropos]$ atropos
Traceback (most recent call last):
  File "/home/dsimpson/.local/bin/atropos", line 4, in <module>
    __import__('pkg_resources').run_script('atropos==0+unknown', 'atropos')
  File "/usr/lib/python3.5/site-packages/pkg_resources/__init__.py", line 750, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1527, in run_script
    exec(code, namespace, namespace)
  File "/home/dsimpson/.local/lib/python3.5/site-packages/atropos-0+unknown-py3.5-linux-x86_64.egg/EGG-INFO/scripts/atropos", line 26, in <module>
    main()
  File "/home/dsimpson/.local/lib/python3.5/site-packages/atropos-0+unknown-py3.5-linux-x86_64.egg/EGG-INFO/scripts/atropos", line 23, in main
    sys.exit(execute_cli(args))
  File "/home/dsimpson/.local/lib/python3.5/site-packages/atropos-0+unknown-py3.5-linux-x86_64.egg/atropos/commands/__init__.py", line 189, in execute_cli
    print_subcommands()
  File "/home/dsimpson/.local/lib/python3.5/site-packages/atropos-0+unknown-py3.5-linux-x86_64.egg/atropos/commands/__init__.py", line 232, in print_subcommands
    command.get_help() for command in iter_commands())))
  File "/home/dsimpson/.local/lib/python3.5/site-packages/atropos-0+unknown-py3.5-linux-x86_64.egg/atropos/commands/__init__.py", line 232, in <genexpr>
    command.get_help() for command in iter_commands())))
  File "/home/dsimpson/.local/lib/python3.5/site-packages/atropos-0+unknown-py3.5-linux-x86_64.egg/atropos/commands/__init__.py", line 101, in get_help
    name=self.name, description=self.description.strip())
  File "/home/dsimpson/.local/lib/python3.5/site-packages/atropos-0+unknown-py3.5-linux-x86_64.egg/atropos/commands/__init__.py", line 95, in description
    return self.get_command_parser_class().description
  File "/home/dsimpson/.local/lib/python3.5/site-packages/atropos-0+unknown-py3.5-linux-x86_64.egg/atropos/commands/__init__.py", line 82, in get_command_parser_class
    mod = import_module(self.cli_module)
  File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 944, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 665, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/dsimpson/.local/lib/python3.5/site-packages/atropos-0+unknown-py3.5-linux-x86_64.egg/atropos/commands/detect/__init__.py", line 8, in <module>
    from atropos.commands.base import (
  File "/home/dsimpson/.local/lib/python3.5/site-packages/atropos-0+unknown-py3.5-linux-x86_64.egg/atropos/commands/base.py", line 8, in <module>
    from atropos.adapters import AdapterCache
  File "/home/dsimpson/.local/lib/python3.5/site-packages/atropos-0+unknown-py3.5-linux-x86_64.egg/atropos/adapters/__init__.py", line 15, in <module>
    from atropos.io.seqio import ColorspaceSequence, FastaReader
  File "/home/dsimpson/.local/lib/python3.5/site-packages/atropos-0+unknown-py3.5-linux-x86_64.egg/atropos/io/__init__.py", line 8, in <module>
    from atropos.io.compression import get_file_opener
  File "/home/dsimpson/.local/lib/python3.5/site-packages/atropos-0+unknown-py3.5-linux-x86_64.egg/atropos/io/compression.py", line 6, in <module>
    import lzma
  File "/usr/lib/python3.5/lzma.py", line 26, in <module>
    from _lzma import *
ImportError: No module named '_lzma'

I am assuming I did something wrong here, any ideas? Thank you.
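
A missing _lzma module usually means the Python interpreter was built without the liblzma development headers, so the problem lies in the Python build rather than in the install step. On the library side, the import could be made optional so that only .xz files are affected. The following is a sketch of that pattern, not the actual atropos code; `pick_opener` is a hypothetical helper.

```python
import gzip

try:
    import lzma  # unavailable when Python was built without liblzma
except ImportError:
    lzma = None


def pick_opener(path):
    """Choose a file opener by extension, failing late for missing codecs."""
    if path.endswith(".gz"):
        return gzip.open
    if path.endswith(".xz"):
        if lzma is None:
            raise RuntimeError(
                "xz support is unavailable: this Python lacks the lzma module")
        return lzma.open
    return open


if __name__ == "__main__":
    print(pick_opener("reads.fq.gz").__module__)
    print("lzma available:", lzma is not None)
```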

Known adapter detector not detecting known adapters

In the 'detect' command, the 'known' detector (-d known) is not identifying known adapters. The KnownDetector._find_contaminants method is using the wrong calculation to determine the match_frac (currently it's calculating abundance).

feature idea: adapters by platform name

In pipelines that deal with GEO, it is easy (with GEOParse and other libraries) to extract the instrument_model field from sample descriptions, with values like "Illumina HiSeq 2000". If users could give such values to atropos instead of manually googling the adapters, it would save a lot of time.

RFE: Similar options to trimmomatic's CROP

Sometimes I just want to trim "from xx bases onward" or "take just the first xx bases". In trimmomatic (considerably slower than atropos) this is done with the HEADCROP and CROP commands, respectively.

The rationale for this is stripping unique molecular identifiers (UMIs) from the actual reads from some targeted sequencing panels.

HEADCROP can probably be replaced with --cut, which, according to the documentation, just cuts a fixed number of bases.

With sequencing chemistries that produce fixed-length reads, like Illumina, a temporary CROP-like workaround could be --cut -(read length - adapter), but that sounds a bit hacky.
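
For reference, the two requested operations amount to simple slicing of the sequence and quality strings. This is only an illustration of the intended semantics in plain Python; the function names mirror the trimmomatic step names and are not atropos options.

```python
def headcrop(seq, qual, n):
    """Drop the first n bases (n >= 0) and their qualities."""
    return seq[n:], qual[n:]


def crop(seq, qual, length):
    """Keep only the first `length` bases (length >= 0) and their qualities."""
    return seq[:length], qual[:length]


if __name__ == "__main__":
    seq, qual = "AACCGGTTAC", "IIIIIHHHGG"
    print(headcrop(seq, qual, 6))
    print(crop(seq, qual, 4))
```

Note that crop leaves reads shorter than `length` unchanged, which is the behaviour the negative --cut workaround cannot give on variable-length reads.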

Parallelized atropos trim produces MulticoreError

Atropos trim when parallelized with compression on worker is failing on finish() command at this line.
Related to #30.

It appears that the code initializes cur_batch = 1 and increments it after writing each batch. When all batches have been written, cur_batch == total_batches + 1, but this check expects cur_batch == total_batches.

What's unclear to me is why this fails for only some samples and not all, since the condition should always hold. I believe the fix is simply to update this line to check cur_batch == total_batches + 1.
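
The off-by-one can be reproduced in isolation. This sketch only mimics the counter logic as described in this report (start at 1, increment after each write); `final_batch_counter` is a made-up name and the writing step is left as a comment.

```python
def final_batch_counter(total_batches, start=1):
    """Return the counter value after every batch has been written."""
    cur_batch = start
    for _ in range(total_batches):
        # ... write batch number cur_batch here ...
        cur_batch += 1
    return cur_batch


if __name__ == "__main__":
    total = 5
    cur = final_batch_counter(total)
    print(cur == total)      # the check as described in the report
    print(cur == total + 1)  # the proposed check
```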

To reproduce

atropos trim --log-level DEBUG -T 8 --preserve-order --no-default-adapters -u 6 -U 6 -o read1.trim.fastq.gz -p read2.trim.fastq.gz -pe1 read1.fastq.gz -pe2 read2.fastq.gz > trim.log

Causes the following error:

Process Result process:
Traceback (most recent call last):
  File "<PATH>/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "<PATH>/venv/lib/python3.5/site-packages/atropos/commands/trim/__init__.py", line 874, in run
    num_batches if num_batches > 0 else None)
  File "<PATH>/venv/lib/python3.5/site-packages/atropos/commands/trim/__init__.py", line 780, in finish
    total_batches))
atropos.commands.multicore.MulticoreError: OrderPreservingWriterResultHandler finishing without having seen 50482 batches

pip install fails on py3.6

Collecting atropos
 Using cached atropos-1.1.1.tar.gz
Building wheels for collected packages: atropos
 Running setup.py bdist_wheel for atropos ... error
 Complete output from command /home/endrebas/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-f13mmw81/atropos/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpb2sd14qqpip-wheel- --python-tag cp36:
 running bdist_wheel
 running build
 running build_py
 creating build
 creating build/lib.linux-x86_64-3.6
 creating build/lib.linux-x86_64-3.6/atropos
 copying atropos/_version.py -> build/lib.linux-x86_64-3.6/atropos
 copying atropos/__init__.py -> build/lib.linux-x86_64-3.6/atropos
 creating build/lib.linux-x86_64-3.6/tests
 copying tests/test_multicore.py -> build/lib.linux-x86_64-3.6/tests
 copying tests/test_paired.py -> build/lib.linux-x86_64-3.6/tests
 copying tests/test_colorspace.py -> build/lib.linux-x86_64-3.6/tests
 copying tests/test_align.py -> build/lib.linux-x86_64-3.6/tests
 copying tests/test_trim.py -> build/lib.linux-x86_64-3.6/tests
 copying tests/test_qualtrim.py -> build/lib.linux-x86_64-3.6/tests
 copying tests/test_atropos.py -> build/lib.linux-x86_64-3.6/tests
 copying tests/test_seqio.py -> build/lib.linux-x86_64-3.6/tests
 copying tests/utils.py -> build/lib.linux-x86_64-3.6/tests
 copying tests/test_filters.py -> build/lib.linux-x86_64-3.6/tests
 copying tests/test_xopen.py -> build/lib.linux-x86_64-3.6/tests
 copying tests/test_adapters.py -> build/lib.linux-x86_64-3.6/tests
 copying tests/test_modifiers.py -> build/lib.linux-x86_64-3.6/tests
 copying tests/__init__.py -> build/lib.linux-x86_64-3.6/tests
 creating build/lib.linux-x86_64-3.6/atropos/align
 copying atropos/align/__init__.py -> build/lib.linux-x86_64-3.6/atropos/align
 creating build/lib.linux-x86_64-3.6/atropos/adapters
 copying atropos/adapters/__init__.py -> build/lib.linux-x86_64-3.6/atropos/adapters
 creating build/lib.linux-x86_64-3.6/atropos/util
 copying atropos/util/colorspace.py -> build/lib.linux-x86_64-3.6/atropos/util
 copying atropos/util/__init__.py -> build/lib.linux-x86_64-3.6/atropos/util
 creating build/lib.linux-x86_64-3.6/atropos/commands
 copying atropos/commands/stats.py -> build/lib.linux-x86_64-3.6/atropos/commands
 copying atropos/commands/legacy_report.py -> build/lib.linux-x86_64-3.6/atropos/commands
 copying atropos/commands/base.py -> build/lib.linux-x86_64-3.6/atropos/commands
 copying atropos/commands/reports.py -> build/lib.linux-x86_64-3.6/atropos/commands
 copying atropos/commands/multicore.py -> build/lib.linux-x86_64-3.6/atropos/commands
 copying atropos/commands/cli.py -> build/lib.linux-x86_64-3.6/atropos/commands
 copying atropos/commands/__init__.py -> build/lib.linux-x86_64-3.6/atropos/commands
 creating build/lib.linux-x86_64-3.6/atropos/io
 copying atropos/io/compression.py -> build/lib.linux-x86_64-3.6/atropos/io
 copying atropos/io/seqio.py -> build/lib.linux-x86_64-3.6/atropos/io
 copying atropos/io/progress.py -> build/lib.linux-x86_64-3.6/atropos/io
 copying atropos/io/__init__.py -> build/lib.linux-x86_64-3.6/atropos/io
 creating build/lib.linux-x86_64-3.6/atropos/commands/error
 copying atropos/commands/error/report.py -> build/lib.linux-x86_64-3.6/atropos/commands/error
 copying atropos/commands/error/cli.py -> build/lib.linux-x86_64-3.6/atropos/commands/error
 copying atropos/commands/error/__init__.py -> build/lib.linux-x86_64-3.6/atropos/commands/error
 creating build/lib.linux-x86_64-3.6/atropos/commands/trim
 copying atropos/commands/trim/writers.py -> build/lib.linux-x86_64-3.6/atropos/commands/trim
 copying atropos/commands/trim/reports.py -> build/lib.linux-x86_64-3.6/atropos/commands/trim
 copying atropos/commands/trim/filters.py -> build/lib.linux-x86_64-3.6/atropos/commands/trim
 copying atropos/commands/trim/modifiers.py -> build/lib.linux-x86_64-3.6/atropos/commands/trim
 copying atropos/commands/trim/cli.py -> build/lib.linux-x86_64-3.6/atropos/commands/trim
 copying atropos/commands/trim/qualtrim.py -> build/lib.linux-x86_64-3.6/atropos/commands/trim
 copying atropos/commands/trim/__init__.py -> build/lib.linux-x86_64-3.6/atropos/commands/trim
 creating build/lib.linux-x86_64-3.6/atropos/commands/qc
 copying atropos/commands/qc/reports.py -> build/lib.linux-x86_64-3.6/atropos/commands/qc
 copying atropos/commands/qc/cli.py -> build/lib.linux-x86_64-3.6/atropos/commands/qc
 copying atropos/commands/qc/__init__.py -> build/lib.linux-x86_64-3.6/atropos/commands/qc
 creating build/lib.linux-x86_64-3.6/atropos/commands/detect
 copying atropos/commands/detect/reports.py -> build/lib.linux-x86_64-3.6/atropos/commands/detect
 copying atropos/commands/detect/cli.py -> build/lib.linux-x86_64-3.6/atropos/commands/detect
 copying atropos/commands/detect/__init__.py -> build/lib.linux-x86_64-3.6/atropos/commands/detect
 copying atropos/adapters/sequencing_adapters.fa -> build/lib.linux-x86_64-3.6/atropos/adapters
 UPDATING build/lib.linux-x86_64-3.6/atropos/_version.py
 set build/lib.linux-x86_64-3.6/atropos/_version.py to '1.1.1'
 running build_ext
 building 'atropos.align._align' extension
 creating build/temp.linux-x86_64-3.6
 creating build/temp.linux-x86_64-3.6/atropos
 creating build/temp.linux-x86_64-3.6/atropos/align
 gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/endrebas/anaconda3/include/python3.6m -c atropos/align/_align.c -o build/temp.linux-x86_64-3.6/atropos/align/_align.o
 gcc -pthread -shared -L/home/endrebas/anaconda3/lib -Wl,-rpath=/home/endrebas/anaconda3/lib,--no-as-needed build/temp.linux-x86_64-3.6/atropos/align/_align.o -L/home/endrebas/anaconda3/lib -lpython3.6m -o build/lib.linux-x86_64-3.6/atropos/align/_align.cpython-36m-x86_64-linux-gnu.so
 building 'atropos.commands.trim._qualtrim' extension
 creating build/temp.linux-x86_64-3.6/atropos/commands
 creating build/temp.linux-x86_64-3.6/atropos/commands/trim
 gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/endrebas/anaconda3/include/python3.6m -c atropos/commands/trim/_qualtrim.c -o build/temp.linux-x86_64-3.6/atropos/commands/trim/_qualtrim.o
 gcc: error: atropos/commands/trim/_qualtrim.c: No such file or directory
 gcc: fatal error: no input files
 compilation terminated.
 error: command 'gcc' failed with exit status 4

 ----------------------------------------
 Failed building wheel for atropos
 Running setup.py clean for atropos
Failed to build atropos
Installing collected packages: atropos
 Running setup.py install for atropos ... error
   Complete output from command /home/endrebas/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-f13mmw81/atropos/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-v1_jk3hp-record/install-record.txt --single-version-externally-managed --compile:
   running install
   running build
   running build_py
   creating build
   creating build/lib.linux-x86_64-3.6
   creating build/lib.linux-x86_64-3.6/atropos
   copying atropos/_version.py -> build/lib.linux-x86_64-3.6/atropos
   copying atropos/__init__.py -> build/lib.linux-x86_64-3.6/atropos
   creating build/lib.linux-x86_64-3.6/tests
   copying tests/test_multicore.py -> build/lib.linux-x86_64-3.6/tests
   copying tests/test_paired.py -> build/lib.linux-x86_64-3.6/tests
   copying tests/test_colorspace.py -> build/lib.linux-x86_64-3.6/tests
   copying tests/test_align.py -> build/lib.linux-x86_64-3.6/tests
   copying tests/test_trim.py -> build/lib.linux-x86_64-3.6/tests
   copying tests/test_qualtrim.py -> build/lib.linux-x86_64-3.6/tests
   copying tests/test_atropos.py -> build/lib.linux-x86_64-3.6/tests
   copying tests/test_seqio.py -> build/lib.linux-x86_64-3.6/tests
   copying tests/utils.py -> build/lib.linux-x86_64-3.6/tests
   copying tests/test_filters.py -> build/lib.linux-x86_64-3.6/tests
   copying tests/test_xopen.py -> build/lib.linux-x86_64-3.6/tests
   copying tests/test_adapters.py -> build/lib.linux-x86_64-3.6/tests
   copying tests/test_modifiers.py -> build/lib.linux-x86_64-3.6/tests
   copying tests/__init__.py -> build/lib.linux-x86_64-3.6/tests
   creating build/lib.linux-x86_64-3.6/atropos/align
   copying atropos/align/__init__.py -> build/lib.linux-x86_64-3.6/atropos/align
   creating build/lib.linux-x86_64-3.6/atropos/adapters
   copying atropos/adapters/__init__.py -> build/lib.linux-x86_64-3.6/atropos/adapters
   creating build/lib.linux-x86_64-3.6/atropos/util
   copying atropos/util/colorspace.py -> build/lib.linux-x86_64-3.6/atropos/util
   copying atropos/util/__init__.py -> build/lib.linux-x86_64-3.6/atropos/util
   creating build/lib.linux-x86_64-3.6/atropos/commands
   copying atropos/commands/stats.py -> build/lib.linux-x86_64-3.6/atropos/commands
   copying atropos/commands/legacy_report.py -> build/lib.linux-x86_64-3.6/atropos/commands
   copying atropos/commands/base.py -> build/lib.linux-x86_64-3.6/atropos/commands
   copying atropos/commands/reports.py -> build/lib.linux-x86_64-3.6/atropos/commands
   copying atropos/commands/multicore.py -> build/lib.linux-x86_64-3.6/atropos/commands
   copying atropos/commands/cli.py -> build/lib.linux-x86_64-3.6/atropos/commands
   copying atropos/commands/__init__.py -> build/lib.linux-x86_64-3.6/atropos/commands
   creating build/lib.linux-x86_64-3.6/atropos/io
   copying atropos/io/compression.py -> build/lib.linux-x86_64-3.6/atropos/io
   copying atropos/io/seqio.py -> build/lib.linux-x86_64-3.6/atropos/io
   copying atropos/io/progress.py -> build/lib.linux-x86_64-3.6/atropos/io
   copying atropos/io/__init__.py -> build/lib.linux-x86_64-3.6/atropos/io
   creating build/lib.linux-x86_64-3.6/atropos/commands/error
   copying atropos/commands/error/report.py -> build/lib.linux-x86_64-3.6/atropos/commands/error
   copying atropos/commands/error/cli.py -> build/lib.linux-x86_64-3.6/atropos/commands/error
   copying atropos/commands/error/__init__.py -> build/lib.linux-x86_64-3.6/atropos/commands/error
   creating build/lib.linux-x86_64-3.6/atropos/commands/trim
   copying atropos/commands/trim/writers.py -> build/lib.linux-x86_64-3.6/atropos/commands/trim
   copying atropos/commands/trim/reports.py -> build/lib.linux-x86_64-3.6/atropos/commands/trim
   copying atropos/commands/trim/filters.py -> build/lib.linux-x86_64-3.6/atropos/commands/trim
   copying atropos/commands/trim/modifiers.py -> build/lib.linux-x86_64-3.6/atropos/commands/trim
   copying atropos/commands/trim/cli.py -> build/lib.linux-x86_64-3.6/atropos/commands/trim
   copying atropos/commands/trim/qualtrim.py -> build/lib.linux-x86_64-3.6/atropos/commands/trim
   copying atropos/commands/trim/__init__.py -> build/lib.linux-x86_64-3.6/atropos/commands/trim
   creating build/lib.linux-x86_64-3.6/atropos/commands/qc
   copying atropos/commands/qc/reports.py -> build/lib.linux-x86_64-3.6/atropos/commands/qc
   copying atropos/commands/qc/cli.py -> build/lib.linux-x86_64-3.6/atropos/commands/qc
   copying atropos/commands/qc/__init__.py -> build/lib.linux-x86_64-3.6/atropos/commands/qc
   creating build/lib.linux-x86_64-3.6/atropos/commands/detect
   copying atropos/commands/detect/reports.py -> build/lib.linux-x86_64-3.6/atropos/commands/detect
   copying atropos/commands/detect/cli.py -> build/lib.linux-x86_64-3.6/atropos/commands/detect
   copying atropos/commands/detect/__init__.py -> build/lib.linux-x86_64-3.6/atropos/commands/detect
   copying atropos/adapters/sequencing_adapters.fa -> build/lib.linux-x86_64-3.6/atropos/adapters
   UPDATING build/lib.linux-x86_64-3.6/atropos/_version.py
   set build/lib.linux-x86_64-3.6/atropos/_version.py to '1.1.1'
   running build_ext
   building 'atropos.align._align' extension
   creating build/temp.linux-x86_64-3.6
   creating build/temp.linux-x86_64-3.6/atropos
   creating build/temp.linux-x86_64-3.6/atropos/align
   gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/endrebas/anaconda3/include/python3.6m -c atropos/align/_align.c -o build/temp.linux-x86_64-3.6/atropos/align/_align.o
   gcc -pthread -shared -L/home/endrebas/anaconda3/lib -Wl,-rpath=/home/endrebas/anaconda3/lib,--no-as-needed build/temp.linux-x86_64-3.6/atropos/align/_align.o -L/home/endrebas/anaconda3/lib -lpython3.6m -o build/lib.linux-x86_64-3.6/atropos/align/_align.cpython-36m-x86_64-linux-gnu.so
   building 'atropos.commands.trim._qualtrim' extension
   creating build/temp.linux-x86_64-3.6/atropos/commands
   creating build/temp.linux-x86_64-3.6/atropos/commands/trim
   gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/endrebas/anaconda3/include/python3.6m -c atropos/commands/trim/_qualtrim.c -o build/temp.linux-x86_64-3.6/atropos/commands/trim/_qualtrim.o
   gcc: error: atropos/commands/trim/_qualtrim.c: No such file or directory
   gcc: fatal error: no input files
   compilation terminated.
   error: command 'gcc' failed with exit status 4

   ----------------------------------------
Command "/home/endrebas/anaconda3/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-f13mmw81/atropos/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-v1_jk3hp-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-f13mmw81/atropos/

Trimming ThruPlex adapters

ThruPlex library prep places a UMI in the first 6 nt of each read, followed by a piece of adapter that contains all or part of TCAGTAGCTCA. Right now, attempting to trim with that sequence removes almost the entire read instead of just the adapter part. I found that the ThruPlex reads have to be preprocessed first to capture the UMIs.
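The preprocessing step described here could look roughly like the sketch below. It is not part of Atropos: `extract_umi` and its parameters are hypothetical names, and moving the UMI into the read name is an assumed convention. It only strips the 6 nt UMI; the adapter piece that follows is left for the trimmer.

```python
def extract_umi(name, sequence, qualities, umi_length=6):
    """Move a leading UMI from the read sequence into the read name.

    Hypothetical helper (not an Atropos function). Assumes the UMI
    occupies the first `umi_length` bases of the read, as described
    for ThruPlex libraries above.
    """
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities differ in length")
    if len(sequence) < umi_length:
        raise ValueError("read is shorter than the UMI")

    umi = sequence[:umi_length]
    # Keep any comment after the first space; append the UMI to the read ID.
    read_id, _, comment = name.partition(" ")
    new_name = f"{read_id}:{umi}"
    if comment:
        new_name = f"{new_name} {comment}"

    # Drop the UMI bases and their quality values from the read itself.
    return new_name, sequence[umi_length:], qualities[umi_length:]


# Example: 6 nt UMI, then the adapter piece, then the insert.
example = extract_umi("read1 1:N:0", "ACGTACTCAGTAGCTCAGGGG", "I" * 21)
```

After this step the read starts with the adapter piece, so the adapter can be handled separately from the UMI.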

How to specify multiple adapters?

Could not find this info (easily) in the docs. Is it -b adapter1 -b adapter2 or -b adapter1,adapter2?

I dunno if you would ever have multiple adapters in a file, but I guess you could have multiple barcodes...

SAM output does not contain header

Because output formatting is decoupled from output writing, it is not straightforward to initialize an output file with a header. As a result, the generated SAM output is invalid.
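One possible way around this, sketched under assumptions, is to have the writer emit the header lazily just before the first record, so the formatter never needs to know about it. `HeaderedSamWriter` is a hypothetical class, not part of Atropos, and the `@HD` line shown is just a minimal example header.

```python
import io


class HeaderedSamWriter:
    """Hypothetical wrapper that writes header lines once, before the first record."""

    def __init__(self, handle, header_lines):
        self._handle = handle
        self._header_lines = list(header_lines)
        self._header_written = False

    def write_record(self, record_line):
        # Emit the header only when the first record arrives.
        if not self._header_written:
            for line in self._header_lines:
                self._handle.write(line.rstrip("\n") + "\n")
            self._header_written = True
        self._handle.write(record_line.rstrip("\n") + "\n")


# Usage with an in-memory buffer and two unaligned records (flag 4).
buffer = io.StringIO()
writer = HeaderedSamWriter(buffer, ["@HD\tVN:1.6\tSO:unsorted"])
writer.write_record("read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII")
writer.write_record("read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII")
```

With this layout the formatting code stays unchanged; only the component that owns the file handle tracks whether the header has been written.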
