Code Monkey home page Code Monkey logo

jelly2's Introduction

Jelly2

Note: this code is not yet fully functional! There are bugs that I am still working out. I will remove this statement and tag a release when the code is entirely debugged. That said, the code is essentially complete, with full implementation of the basic pipeline.

This project started out as a hack of PBJelly in an attempt to improve the efficiency of the code. I ended up rebuilding the core pipeline, focusing it down into four stages: Setup, Support, Assembly, and Placement. The code is designed for compatibility with the most recently developed standards, requiring PacBio subreads in the BAM format. It makes use of the cutting edge algorithms Minimap, Miniasm, and Racon, for ultra-fast assembly of gap sequences. The stages are managed by a single driver script, Jelly2.py, so this program is easy to use. To see the full range of options, run python Jelly2.py --help, but the basic usage is:

$ python Jelly2.py [options] <Scaffolds.fa> <Subreads.bam>

Make sure that the src directory is in your PYTHONPATH variable, so the driver script can find the modules. Though this repository was created on May 8, 2017, it contains the git tracking history of the above-linked PBJelly repository in which this project began.

Dependencies

I recommend using Pitchfork to deploy BLASR, along with SAMtools and bam2fastx

Design Philosophy

To find PacBio reads that span the gaps predicted in the scaffolds, gap-flanking sequences are extracted from the assembly. Separate Fasta files are created for the left flanks and the right flanks. The PacBio reads are then aligned against each flank file with BLASR. Supporting reads are those that align to each flank, spanning the predicted gap size within a certain margin of error. If there are enough support reads for a gap, the reads will be assembled with Minimap and Miniasm. A consensus will be determined with Minimap and Racon. If the assembled gap sequence passes quality control, it will be placed into the scaffold, filling the gap.

Contact

Any questions can be addressed to: [email protected]

jelly2's People

Contributors

dbrowneup avatar ressy avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar

Forkers

xma82

jelly2's Issues

Reads from multiple movies

Hi,

I am using Jelly2 to gap fill our genome assembly with PB reads, and I could not get Jelly2 to accept reads from multiple movies. First I tried combining subreads from multiple movies into a single bam file for blasr to map with, however blasr threw an error about mixed data sets, causing Jelly2 to fail. I then tried using an fofn containing the file names of the subread bamfiles as input for Jelly2 (which blasr v 5.3.2 allows), but Jelly2 threw an error and failed prior to sending out the mapping job to blasr.

I found a workaround where I used blasr to map the combined reads to the left and right gaps separately using an fofn as input, then started Jelly2 up at the sorting step in the Support stage. Would it be possible to allow an fofn as an acceptable input for read mapping (or, if you have already done so and I am not aware of it, advise how to make that functionality work)?

Otherwise, the Jelly2 wrapper works very nicely. I separately came up with a similar idea a few weeks ago and a friend told me about your program. I am happy that I don't have to re-invent the wheel!

Joel

QUAL values missing from BAM files, unable to write FastQ

Finally got everything working to the point where the program attempts to load QUAL values associated with the gap-bridging sequence. However, there are no QUAL values in the original PacBio BAM files. I have opened an issue with PacBio (https://github.com/PacificBiosciences/PacBioFileFormats/issues/50) to try and get this resolved.

[dbrowne@ada8 2017.05.09_Jelly2_Testing]$ cat OUT_JELLY2_TESTING_v21 
Found save point: support
Number of gaps loaded in gap_list: 38
6 gaps loaded for scaffold Contig17_1_1011970_pilon|arrow
Scaffold: Contig17_1_1011970_pilon|arrow Gap: 1 Size: 2509 Support: 0
Scaffold: Contig17_1_1011970_pilon|arrow Gap: 2 Size: 4400 Support: 0
Scaffold: Contig17_1_1011970_pilon|arrow Gap: 3 Size: 8609 Support: 97
Read Span: -5054
Read Span: 7754
Traceback (most recent call last):
  File "Jelly2.py", line 125, in <module>
    main()
  File "Jelly2.py", line 103, in main
    support.find_support(args)
  File "/scratch/user/dbrowne/Software/Jelly2/src/Support.py", line 79, in find_support
    quality = L.query_qualities[L.query_alignment_end:R.query_alignment_start]
TypeError: 'NoneType' object has no attribute '__getitem__'

------------------------------------------------------------
Sender: LSF System <lsfadmin@nxt1527>
Subject: Job 5689987: <JELLY2_TESTING_v21> in cluster <Main_Compute> Exited

Job <JELLY2_TESTING_v21> was submitted from host <login8> by user <dbrowne> in cluster <Main_Compute>.
Job was executed on host(s) <20*nxt1527>, in queue <sn_long>, as user <dbrowne> in cluster <Main_Compute>.
</home/dbrowne> was used as the home directory.
</scratch/user/dbrowne/2017.05_MAY/2017.05.09_Jelly2_Testing> was used as the working directory.
Started at Thu May 11 15:50:04 2017
Results reported on Thu May 11 15:50:35 2017

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#BSUB -J JELLY2_TESTING_v21 -L /bin/bash -W 40:00
#BSUB -n 20 -R "span[ptile=20] rusage[mem=2700] select[nxt]" -M 2700
#BSUB -o OUT_JELLY2_TESTING_v21
##
# Load Python, SAMtools
ml Python/2.7.12-intel-2016b
ml SAMtools/1.3.1-intel-2016b
# Load Jelly2, pysam, pyfaidx
export PYTHONPATH=$PYTHONPATH:/scratch/user/dbrowne/Software/Jelly2/src:/scratch/user/dbrowne/Software/lib/python2.7/site-packages
# Load BLASR
export LD_LIBRARY_PATH=/scratch/user/dbrowne/Software/pitchfork/deployment/lib:$LD_LIBRARY_PATH
export PATH=/scratch/user/dbrowne/Software/pitchfork/deployment/bin:$PATH
# Load Minimap, Miniasm, Racon
export PATH=/scratch/user/dbrowne/Software/racon/bin:/scratch/user/dbrowne/Software/minimap:/scratch/user/dbrowne/Software/miniasm:$PATH
#
cd /scratch/user/dbrowne/2017.05_MAY/2017.05.09_Jelly2_Testing
#
python Jelly2.py -t 20 -d 2 -b "--minSubreadLength 10000 --minAlnLength 200 --maxScore -1000" \
    Scaffolds_Q60HQ_1Mb.fa \
    PACBIO_Reads_ALL.bam

------------------------------------------------------------

Exited with exit code 1.

Resource usage summary:

    CPU time :                                   28.77 sec.
    Max Memory :                                 2499 MB
    Average Memory :                             1419.20 MB
    Total Requested Memory :                     54000.00 MB
    Delta Memory :                               51501.00 MB
    Max Processes :                              4
    Max Threads :                                5

The output (if any) is above this job summary.

IOError: file `sorted_gaps.L.bam` not found

Jelly2 test #7 failed after successfully running BLASR alignments against left and right flanks. There appears to be a problem with the code wrapping samtools sort.

[dbrowne@ada3 2017.05.09_Jelly2_Testing]$ cat OUT_JELLY2_TESTING_v7 
[INFO] 2017-05-09T22:59:48 [blasr] started.
[INFO] 2017-05-10T02:00:05 [blasr] ended.
[INFO] 2017-05-10T02:00:05 [blasr] started.
[INFO] 2017-05-10T05:00:01 [blasr] ended.
open: No such file or directory
[bam_sort_core] fail to open file sorted_gaps.L.bam
open: No such file or directory
[bam_sort_core] fail to open file sorted_gaps.R.bam
open: No such file or directory
[bam_index_build2] fail to open the BAM file.
open: No such file or directory
[bam_index_build2] fail to open the BAM file.
Traceback (most recent call last):
  File "Jelly2.py", line 94, in <module>
    main()
  File "Jelly2.py", line 85, in main
    support.find_support(args)
  File "/scratch/user/dbrowne/Software/Jelly2/src/Support.py", line 42, in find_support
    gapsL = pysam.AlignmentFile('sorted_gaps.L.bam', 'rb')
  File "pysam/libcalignmentfile.pyx", line 397, in pysam.libcalignmentfile.AlignmentFile.__cinit__ (pysam/libcalignmentfile.c:5831)
  File "pysam/libcalignmentfile.pyx", line 558, in pysam.libcalignmentfile.AlignmentFile._open (pysam/libcalignmentfile.c:7556)
IOError: file `sorted_gaps.L.bam` not found

------------------------------------------------------------
Sender: LSF System <lsfadmin@nxt1356>
Subject: Job 5569273: <JELLY2_TESTING_v7> in cluster <Main_Compute> Exited

Job <JELLY2_TESTING_v7> was submitted from host <login3> by user <dbrowne> in cluster <Main_Compute>.
Job was executed on host(s) <20*nxt1356>, in queue <sn_long>, as user <dbrowne> in cluster <Main_Compute>.
</home/dbrowne> was used as the home directory.
</scratch/user/dbrowne/2017.05_MAY/2017.05.09_Jelly2_Testing> was used as the working directory.
Started at Tue May  9 22:59:39 2017
Results reported on Wed May 10 05:00:02 2017

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#BSUB -J JELLY2_TESTING_v7 -L /bin/bash -W 40:00
#BSUB -n 20 -R "span[ptile=20] rusage[mem=2700] select[nxt]" -M 2700
#BSUB -o OUT_JELLY2_TESTING_v7
##
# Load required modules
ml Biopython/1.68-intel-2016b-Python-2.7.12
ml SAMtools/1.3.1-intel-2016b
# Add python libraries for Jelly2, pysam, pyfaidx
export PYTHONPATH="$PYTHONPATH:/scratch/user/dbrowne/Software/Jelly2/src:/scratch/user/dbrowne/Software/lib/python2.7/site-packages"
# Load BLASR
export LD_LIBRARY_PATH=/scratch/user/dbrowne/Software/pitchfork/deployment/lib:$LD_LIBRARY_PATH
export PATH=/scratch/user/dbrowne/Software/pitchfork/deployment/bin:$PATH
# Load Minimap, Miniasm, Racon
export PATH=/scratch/user/dbrowne/Software/minimap:/scratch/user/dbrowne/Software/miniasm:/scratch/user/dbrowne/Software/pitchfork/deployment/bin:$PATH
#
cd /scratch/user/dbrowne/2017.05_MAY/2017.05.09_Jelly2_Testing
#
python Jelly2.py -t 20 -b "--minSubreadLength 10000" Scaffolds_Q60HQ_1Mb.fa PACBIO_Reads_ALL.bam

------------------------------------------------------------

Exited with exit code 1.

Resource usage summary:

    CPU time :                                   57429.12 sec.
    Max Memory :                                 326 MB
    Average Memory :                             195.18 MB
    Total Requested Memory :                     54000.00 MB
    Delta Memory :                               53674.00 MB
    Max Swap :                                   1 MB
    Max Processes :                              5
    Max Threads :                                29

The output (if any) is above this job summary.

Note: approximately half of the PacBio subreads >10kb were aligned to the flanks. May want to tighten up BLASR alignment parameters to improve run time and reduce noise.

interested in using Jelly2

Hey there!

I'm very excited to see an alternative for PBJelly! I'm very interested in your program and would like to test it on my genome (few genomes, actually). I noticed that you have the disclaimer up there saying it is still buggy. What should I watch out for? What kind of metrics would be interested in knowing about its performance? Please let me know how I can help.

Thanks,

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.