otb's Introduction

 +-------------------------------------------------------------------------------------+
 |                                             ,,                                      |
 |                                      mm    *MM                                      |
 |                                      MM     MM                                      |
 |                           ,pW"Wq.  mmMMmm   MM,dMMb.                                |
 |                          6W'   `Wb   MM     MM    `Mb                               |
 |                          8M     M8   MM     MM     M8                               |
 |                          YA.   ,A9   MM     MM.   ,M9                               |
 |                           `Ybmd9'    `Mbmo  P^YbmdP'                                |
 |                                                                                     |
 |-------------------------------------------------------------------------------------|
 |              _   ._   |        _|_  |_    _     |_    _    _  _|_  |                |
 |             (_)  | |  |  \/     |_  | |  (/_    |_)  (/_  _>   |_  o                |
 |                          /                                                          |
 | /   _    _   ._    _   ._ _    _      _.   _   _   _   ._ _   |_   |  o   _    _  \ |
 ||   (_|  (/_  | |  (_)  | | |  (/_    (_|  _>  _>  (/_  | | |  |_)  |  |  (/_  _>   ||
 | \   _|                                                                            / |
 +-------------------------------------------------------------------------------------+

Only the best (genome assemblies), or otb, is a Hi-C/HiFi pipeline designed specifically for phasing.

License: Public Domain

Check out the wiki and the tutorial.

In order to utilize this pipeline:

otb operates from a local directory and must be run as ./otb.sh, since it sources some shell scripts in the scr directory.

We recommend that you set NXF_SINGULARITY_LIBRARYDIR in your .bashrc (or equivalent shell startup file); otb containers will be stored at that location.
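
A minimal sketch of that setting, assuming a bash-like shell. The directory name otb_containers is a hypothetical placeholder, not something otb requires; any writable location with enough space for the container images should do.

```shell
# Assumption: "${HOME}/otb_containers" is an illustrative path only.
# Point it at any writable directory with room for the container images.
export NXF_SINGULARITY_LIBRARYDIR="${HOME}/otb_containers"

# Create the directory up front so it exists before the first run.
mkdir -p "${NXF_SINGULARITY_LIBRARYDIR}"
```

Adding these lines to the shell startup file keeps the setting in place for every new session.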

To use otb, download this repository and run ./otb.sh. For example:

 ./otb.sh --runner sge --mode homozygous --threads 40 -f otb_test_file_R2.fastq -r otb_test_file_R1.fastq --polish-type simple --reads otb_test.bam

otb runs in the following fashion (workflow diagram omitted):

otb is fully featured and uses the following software:

  • bbtools
  • genomescope2
  • hifiasm
  • jellyfish
  • pbadapterfilt
  • ragtag
  • samtools
  • shhquis
  • busco
  • hicstuff
  • any2fasta
  • blobtools
  • merfin
  • DeepVariant
  • yahs
  • bcftools
  • bwa
  • fcs-adaptor

otb now has a preprint: https://doi.org/10.32942/X2T897

otb is in the public domain in the United States per 17 U.S.C. § 105

otb's People

Contributors

astahlke, dluecke, molikd, sharupaul

otb's Issues

samtools flagstat, multiple bam files

Capture multiple BAMs as a list, then loop through and run flagstat once per BAM. This is a sanity check that the BAMs are the right size, have the correct number of reads, etc.
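
The loop described above could be sketched roughly as follows. This is an illustration in plain shell, not the pipeline's actual process definition: the function name flagstat_all and the .flagstat.txt output naming are assumptions, while samtools flagstat itself is the standard samtools subcommand.

```shell
# Run `samtools flagstat` once per BAM and write a report next to each file.
# flagstat_all and the *.flagstat.txt naming are hypothetical choices.
flagstat_all() {
  for bam in "$@"; do
    # Skip anything that is missing or empty rather than failing the whole loop.
    if [ ! -s "$bam" ]; then
      echo "skipping missing or empty file: $bam" >&2
      continue
    fi
    samtools flagstat "$bam" > "${bam%.bam}.flagstat.txt"
  done
}

# Example call (file names are placeholders):
# flagstat_all hic_R1.bam hic_R2.bam
```

Writing one report per BAM makes it easy to compare read counts across files afterwards.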

generate the hic files

yahs juicer_pre for yahs outputs, and juicebox tools juicer pre for fragments list from hicstuff.

ragtag patch step is hella slow.

The ragtag patch step with the ec reads (the one before Hi-C is run) is hella slow. We either need to speed it up or remove it.

gfa2fasta error in phasing mode

This line in gfa2fasta needs to be updated to match the genome output of any of the hifiasm modes; namely, in phasing mode hic.p_ctg is the genome:

file '*.bp.p_ctg.gfa.fasta' optional true into fasta_unoriented_ch, fasta_genome_ch, fasta_busco_ch, no_polish_yahs_align_genome_ch, fasta_fai_yahs_genome_ch
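
One way to picture the fix, outside of Nextflow, is a conversion step that accepts either primary-contig name. The sketch below is an assumption-laden illustration rather than the pipeline's code: the function names are hypothetical, and the awk one-liner is the common idiom for turning GFA segment lines into FASTA records.

```shell
# Convert GFA segment (S) lines to FASTA records: name in column 2, sequence in column 3.
gfa_to_fasta() {
  awk '/^S/ { print ">" $2; print $3 }' "$1"
}

# Convert whichever primary-contig GFA is present in a directory:
# *.bp.p_ctg.gfa (no Hi-C) or *.hic.p_ctg.gfa (phasing mode).
# convert_primary_gfas is a hypothetical helper name.
convert_primary_gfas() {
  dir="$1"
  for gfa in "$dir"/*.bp.p_ctg.gfa "$dir"/*.hic.p_ctg.gfa; do
    [ -e "$gfa" ] || continue   # an unmatched glob stays literal; skip it
    gfa_to_fasta "$gfa" > "$gfa.fasta"
  done
}
```

With output named this way, a pattern such as '*.p_ctg.gfa.fasta' would match the FASTA from either mode.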

Yahs integration

Yahs is to be added to otb. Ragtag will need to be run on the yahs output to align and rebuild the hap1 and hap2 outputs.

otb.sh needs to be updated for hic/mat/pat reads

params.hicreadf = "$baseDir/data/.R1.fastq.gz"
params.hicreadr = "$baseDir/data/.R2.fastq.gz"
params.matreadf = "$baseDir/data/.R1.fastq.gz"
params.matreadr = "$baseDir/data/.R2.fastq.gz"
params.patreadf = "$baseDir/data/.R1.fastq.gz"
params.patreadr = "$baseDir/data/.R2.fastq.gz"

otb needs correct arguments

Is ragtag a good default component?

Ragtag could potentially generate false positives and incorrectly capture inversions between haplotypes. Inverted haplotype of chromosome 4 on locust genome could be a good case study.

Is BAM the only supported input format for HiFi data?

Thank you for your work. I'm trying to use otb (only the best), but I see in the wiki that the input to --readbam is a BAM file. Can the input be in FASTA format instead? Most assembly software uses FASTA files, and I can't find my original BAM file.

I tried to run it, but an error occurred. I may have misunderstood something, so I hope you can give me some advice.

./otb.sh --runner none --mode homozygous --threads 40 -f 00.data/hic_R1.fastq -r 00.data/hic_R2.fastq --polish-type simple --bam hifi.fatsa

[Sun Jul 3 20:49:30 CST 2022]: checking runner
[Sun Jul 3 20:49:30 CST 2022]: none being used
[Sun Jul 3 20:49:30 CST 2022]: checking polishing
[Sun Jul 3 20:49:30 CST 2022]: simple/ragtag polishing
[Sun Jul 3 20:49:30 CST 2022]: building run parameters
[Sun Jul 3 20:49:30 CST 2022]: not running busco
[Sun Jul 3 20:49:30 CST 2022]: reads file(s) not given, exiting

thanks for your help!

IMG files are showing up in the home folder; can they go in the assembly run folder instead? This may need some config.

(base) [scott.geib@ceres ~]$ pwd
/home/scott.geib

-rwxrwxr-x. 1 scott.geib scott.geib 438M Sep 8 20:07 dmolik-blobtools.img
-rwxrwxr-x. 1 scott.geib scott.geib 521M Sep 8 19:52 dmolik-genomescope2.img
-rwxrwxr-x. 1 scott.geib scott.geib 74M Sep 8 19:53 dmolik-hifiasm.img
-rwxrwxr-x. 1 scott.geib scott.geib 59M Sep 8 19:53 dmolik-jellyfish.img
-rwxrwxr-x. 1 scott.geib scott.geib 400M Sep 8 19:56 dmolik-pbadapterfilt.img
-rwxrwxr-x. 1 scott.geib scott.geib 392M Sep 8 19:58 dmolik-ragtag.img
-rwxrwxr-x. 1 scott.geib scott.geib 122M Sep 8 20:00 dmolik-shhquis.img

memory limits with hifiasm

Large datasets can currently cause out-of-memory errors while running hifiasm; this can be limited (in part) with -f.
