Donut Falls

Named after the beautiful Donut Falls

Location: 40.630°N 111.655°W, Elevation: 7,942 ft (2,421 m), Hiking level: easy

(Image credit: User submitted photos at alltrails.com)

More information about the trail leading up to this landmark can be found at utah.com/hiking/donut-falls

Donut Falls is a Nextflow workflow developed by @erinyoung at the Utah Public Health Laboratory for long-read nanopore sequencing of microbial isolates. It is built to run on Linux-based operating systems. Additional config options are needed for cloud batch usage.

Donut Falls is also included in the StaPH-B Toolkit (staphb-toolkit).

We made a wiki. Please read it!

Wiki table of contents:

Getting started

Install dependencies

Quick start

nextflow run UPHL-BioNGS/Donut_Falls -profile <singularity or docker> --sample_sheet <sample_sheet.csv>

Sample Sheets

The sample sheet is a CSV file with the header sample,fastq. Each row holds one sample name and its corresponding nanopore fastq.gz file. When Illumina fastq files are available for polishing or hybrid assembly, they are added to the end of each row under the column headers fastq_1 and fastq_2.

Option 1 : just nanopore reads

sample,fastq
test,long_reads_low_depth.fastq.gz

Option 2 : nanopore reads and at least one sample has Illumina paired-end fastq files

sample,fastq,fastq_1,fastq_2
sample1,sample1.fastq.gz,sample1_R1.fastq.gz,sample1_R2.fastq.gz
sample2,sample2.fastq.gz,,
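
Both layouts above can be generated from a list of records. The following is a minimal Python sketch, not part of Donut Falls: the function name build_sample_sheet is hypothetical, and the rule that fastq_1 and fastq_2 must be given together is an assumption based on the reads being paired-end.

```python
import csv
import io

REQUIRED = ["sample", "fastq"]
OPTIONAL = ["fastq_1", "fastq_2"]


def build_sample_sheet(rows):
    """Return sample-sheet CSV text for a list of dicts (hypothetical helper).

    Each dict needs 'sample' and 'fastq'; 'fastq_1' and 'fastq_2' are optional.
    The Illumina columns are written only if at least one row uses them.
    """
    for row in rows:
        missing = [key for key in REQUIRED if not row.get(key)]
        if missing:
            raise ValueError(f"row {row!r} is missing {missing}")
        # Assumption: paired-end reads are only useful as a complete pair.
        if bool(row.get("fastq_1")) != bool(row.get("fastq_2")):
            raise ValueError(f"row {row!r} needs both fastq_1 and fastq_2")

    has_illumina = any(row.get("fastq_1") for row in rows)
    header = REQUIRED + (OPTIONAL if has_illumina else [])

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=header,
        restval="",              # samples without Illumina reads get empty cells
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()
```

Passing the two rows from Option 2 (sample2 without fastq_1 and fastq_2) produces the same text as shown above, including the trailing empty cells for sample2.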

Switching assemblers

There are currently three options for assembly: flye (the default), raven, and unicycler.

These are specified with the assembler parameter. If Illumina reads are found, then flye and raven assemblies will be polished with those reads.

Note: more than one assembler can be chosen (e.g. params.assembler = 'flye,raven'). This runs the input files through each assembler listed. Listing an assembler more than once does not create additional assemblies with that tool (e.g. params.assembler = 'flye,flye,flye' still runs the input files through flye only once).
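
The behaviour described in the note can be illustrated with a short Python sketch. This is not the workflow's own Groovy code, and the real parsing may differ; parse_assemblers is a hypothetical name, and the set of accepted names is taken from the Credits list in this README.

```python
# Assembler names as listed under Credits in this README.
KNOWN_ASSEMBLERS = ("flye", "raven", "unicycler")


def parse_assemblers(value):
    """Split a comma-separated assembler string into a list without repeats."""
    selected = []
    for name in value.split(","):
        name = name.strip().lower()
        if not name:
            continue  # tolerate stray commas such as 'flye,'
        if name not in KNOWN_ASSEMBLERS:
            raise ValueError(f"unknown assembler: {name!r}")
        if name not in selected:
            selected.append(name)  # keep first occurrence, preserve order
    return selected
```

With this sketch, 'flye,raven' yields two entries and 'flye,flye,flye' yields one, which mirrors the note: each listed tool runs once.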

Reading the sequencing summary file

Although not used for anything else, the sequencing summary file can be read in and put through nanoplot to visualize the quality of a sequencing run. This is an optional file and can be set with 'params.sequencing_summary'.

nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sequencing_summary <sequencing summary file>
  • WARNING : Does not work with older versions of the summary file.

Examples

# nanopore assembly with flye followed by polishing if illumina files are supplied
nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sample_sheet sample_sheet.csv

# or with docker and specifying the assembler
nextflow run UPHL-BioNGS/Donut_Falls -profile docker --sample_sheet sample_sheet.csv --assembler flye

# hybrid assembly with unicycler where both nanopore and illumina files are required
nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sample_sheet sample_sheet.csv --assembler unicycler

# assembling with all three assemblers
# specifying the results to be stored in 'donut_falls_test_results' instead of 'donut_falls'
# using docker instead of singularity
nextflow run UPHL-BioNGS/Donut_Falls -profile docker --sample_sheet sample_sheet.csv --assembler unicycler,flye,raven


# using some test files (requires internet connection)
nextflow run UPHL-BioNGS/Donut_Falls -profile docker --sample_sheet sample_sheet.csv --test

# same as above
nextflow run UPHL-BioNGS/Donut_Falls -profile docker,test --sample_sheet sample_sheet.csv
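
The commands above all follow the same pattern, so they can be assembled programmatically when launching runs from a script. The sketch below is a hedged illustration only: donut_falls_command is a hypothetical helper, it uses just the flags that appear in this README, and it is an assumption that any particular combination of them is accepted by the workflow.

```python
import shlex


def donut_falls_command(sample_sheet=None, profile="singularity",
                        assemblers=None, sequencing_summary=None):
    """Build the argument list for a Donut Falls run (hypothetical helper)."""
    args = ["nextflow", "run", "UPHL-BioNGS/Donut_Falls", "-profile", profile]
    if sample_sheet:
        args += ["--sample_sheet", sample_sheet]
    if assemblers:
        # Several assemblers are passed as one comma-separated value.
        args += ["--assembler", ",".join(assemblers)]
    if sequencing_summary:
        args += ["--sequencing_summary", sequencing_summary]
    return args


if __name__ == "__main__":
    command = donut_falls_command(
        sample_sheet="sample_sheet.csv",
        profile="docker",
        assemblers=["unicycler", "flye", "raven"],
    )
    # Print the command for review instead of running it.
    print(shlex.join(command))
```

Returning a list rather than a string keeps the arguments safe to hand to subprocess.run without shell quoting problems.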

Credits

Donut Falls would not be possible without:

  • bandage : visualize gfa files
  • busco : assessment of assembly quality
  • bwa : aligning reads for polypolish
  • circulocov : read depth per contig
  • dnaapler : rotation
  • fastp : cleaning illumina reads (default values) and nanopore reads (minimum length = 1,000 & minimum Q = 12)
  • flye : de novo assembly (default assembler)
  • gfastats : assessment of assembly
  • medaka : polishing with nanopore reads
  • multiqc : amalgamation of results
  • nanoplot : fastq file QC visualization
  • polypolish : reduces sequencing artefacts through polishing with Illumina reads
  • pypolca : reduces sequencing artefacts through polishing with Illumina reads
  • rasusa : subsampling nanopore reads to 150X depth
  • raven : de novo assembly option (params.assembler = 'raven')
  • unicycler : hybrid assembly option (params.assembler = 'unicycler')
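
To make the 150X figure in the rasusa entry concrete: read depth is total sequenced bases divided by genome size, so the share of reads kept follows from simple arithmetic. The Python sketch below only illustrates that arithmetic; rasusa performs the actual subsampling, and the function name and the example genome size are hypothetical.

```python
def subsample_fraction(total_bases, genome_size, target_depth=150):
    """Fraction of bases to keep so that depth does not exceed target_depth."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    current_depth = total_bases / genome_size
    if current_depth <= target_depth:
        return 1.0  # already at or below the target, keep everything
    return target_depth / current_depth


if __name__ == "__main__":
    # Hypothetical run: 1.5 Gb of nanopore reads for a 5 Mb genome (300X).
    print(subsample_fraction(1_500_000_000, 5_000_000))
```

In this hypothetical case the run is at 300X, so about half of the bases would be kept to reach 150X.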

donut_falls's People

Contributors

erinyoung, k-florek

Forkers

k-florek

donut_falls's Issues

Version check

Needing updates: staphb/busco from 5.6.1-prok-bacteria_odb10_2024-01-08 to 5.6.1

Version check

Needing updates: staphb/fastp from 0.23.2 to 0.23.4

Version check

Needing updates: staphb/dragonflye from 1.0.14 to 1.1.1 staphb/fastp from 0.23.2 to 0.23.4 staphb/flye from 2.9.2 to 2.9.3 staphb/htslib from 1.17 to 1.19 staphb/nanoplot from 1.40.0 to 1.41.6 staphb/raven from 1.8.1 to 1.8.3

Version check

Needing updates: staphb/dragonflye from 1.0.14 to 1.1.2 staphb/fastp from 0.23.2 to 0.23.4 staphb/flye from 2.9.2 to 2.9.3 staphb/htslib from 1.17 to 1.19 staphb/nanoplot from 1.40.0 to 1.42.0 staphb/rasusa from 0.7.0 to 0.8.0 staphb/raven from 1.8.1 to 1.8.3

Sort output fasta

The following example shows how to use a closure to collect and sort all sequences in a FASTA file from shortest to longest:

Channel
    .fromPath('/data/sequences.fa')
    .splitFasta( record: [id: true, sequence: true] )
    .collectFile( name:'result.fa', sort: { it.size() } ) {
        it.sequence
    }
    .view { it.text }
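
For comparison, the same shortest-to-longest ordering can be sketched outside Nextflow. This Python version is an illustration and not a proposed implementation for the workflow: the function names are hypothetical, and unlike the snippet above, which emits only the sequences, it keeps each header with its sequence.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def sort_fasta_by_length(text):
    """Return FASTA text with records ordered from shortest to longest."""
    records = sorted(parse_fasta(text), key=lambda record: len(record[1]))
    return "".join(f">{header}\n{sequence}\n" for header, sequence in records)
```

Because Python's sort is stable, contigs of equal length stay in their original order.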

Replace circlator

This is what fails 99% of the time.

I'd like to use dnaapler instead.

Version check

Needing updates: staphb/dragonflye from 1.0.14 to 1.1.1 staphb/fastp from 0.23.2 to 0.23.4 staphb/flye from 2.9.2 to 2.9.3 staphb/htslib from 1.17 to 1.18 staphb/nanoplot from 1.40.0 to 1.41.6 staphb/raven from 1.8.1 to 1.8.3

Add homopolish

Homopolish uses a similar genome to reduce indel errors. It would be useful to have as an option.

Version check

Needing updates: staphb/dragonflye from 2.9.2 to 1.0.14

Editing .config file

Hi Erin,
I edited the UPHL.config file calling it FBPHL.config
Within the file, I edited the comment portion, replacing it with the path of my Donut_Falls.nf file:
nextflow run /home/bi_fellow/Donut_Falls/Donut_Falls.nf -c /home/bi_fellow/Donut_Falls/configs/FBPHL.config -with-dag donut_falls_$(date +"%y-%m-%d-%H%M%S").png
Was that right?
However, when I run this
nextflow run UPHL-BioNGS/Donut_falls -profile singularity --reads LR_fastqs/barcode01.fastq.gz --reads reads
I got this
UPHL-BioNGS/Donut_falls currently is sticked on revision: erin-dev -- you need to specify explicitly a revision with the option -r to use it
What could be the issue?
Thanks,
TJ

Add dragonflye assembly

This one is going to take some thought since it's like its own workflow, but it should probably get added. The process file is there, but it's not tested.

redo tests

The tests fail because of GitHub Actions rather than something wrong in the script:

  • flye assembly
  • masurca hybrid assembly
  • raven assembly from directory
  • raven assembly with polishing

Version check

Needing updates: staphb/dragonflye from 1.0.14 to 1.1.2 staphb/fastp from 0.23.2 to 0.23.4 staphb/flye from 2.9.2 to 2.9.3 staphb/htslib from 1.17 to 1.19 staphb/nanoplot from 1.40.0 to 1.42.0 staphb/polypolish from 0.5.0 to 0.6.0 staphb/rasusa from 0.7.0 to 0.8.0 staphb/raven from 1.8.1 to 1.8.3
