Platanus_B README.md

Description

Platanus_B is a de novo assembler for isolated bacterial genomes. The features of this tool are as follows: (1) It requires at least one Illumina paired-end library. This library is useful for large-scale and/or high-resolution analysis. (2) It can also accept Oxford Nanopore and/or PacBio long reads ("iterate" command). (3) By iterating sequence extension, gap closing and error removal, it can achieve assemblies with high contiguity and accuracy in many cases. (4) As a utility function, it can combine multiple assemblies through the "combine" command.

Version

v1.3.2

Web site

http://platanus.bio.titech.ac.jp/

Author

Rei Kajitani at Tokyo Institute of Technology wrote the key source code.
Address for this tool: [email protected]

Requirements

  • OpenMP

    • To compile the source code.
  • Minimap2

  • Perl

    • To execute the scripts in this package.

Installation

Using Docker

docker pull rkajitani/platanus_b
docker run -it --rm -v $(pwd):/work -w /work rkajitani/platanus_b

Options of docker run can be modified according to users' environments and purposes (e.g., --rm and -v).

From source

make
cp platanus_b <installation_path>

Note that the absolute path of "sub_bin", which contains the Perl scripts and minimap2 used by the "iterate" and "combine" commands, is written into the platanus_b executable. Please recompile if you move sub_bin to another directory, or pass its new location with the -sub_bin option of "iterate". On macOS, please install OpenMP if compilation fails. It can be installed using Homebrew with "brew install libomp" or "brew install llvm".

Synopsis

Inputs

  • Illumina paired-end: PE_1.fq PE_2.fq (mandatory)
  • Oxford Nanopore long-reads: ONT.fq (optional)

Commands

platanus_b assemble -f PE_1.fq PE_2.fq 2>assemble.log
platanus_b iterate -c out_contig.fa -IP1 PE_1.fq PE_2.fq -ont ONT.fq 2>iterate.log

Final output

out_iterativeAssembly.fa
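
The two commands above can be chained in a small wrapper. The following is a minimal sketch, assuming platanus_b is on PATH and the default output prefix "out" is used. The names run_step, run_pipeline and DRY_RUN are hypothetical and introduced here only for illustration; they are not part of Platanus_B.

```shell
# Sketch: print (or run) the two Synopsis steps in order.
# Assumption: platanus_b is on PATH. run_step, run_pipeline and DRY_RUN
# are hypothetical names, not part of Platanus_B.

run_step() {
    # $1 = log file; remaining arguments = the command line to run.
    local log=$1
    shift
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (the default here): show the command instead of running it.
        printf '%s 2>%s\n' "$*" "$log"
    else
        "$@" 2>"$log"
    fi
}

run_pipeline() {
    # $1, $2 = Illumina paired-end files (mandatory); $3 = ONT reads (optional).
    local pe1=$1 pe2=$2 ont=${3:-}
    run_step assemble.log platanus_b assemble -f "$pe1" "$pe2" || return 1
    if [ -n "$ont" ]; then
        run_step iterate.log platanus_b iterate -c out_contig.fa \
            -IP1 "$pe1" "$pe2" -ont "$ont"
    else
        run_step iterate.log platanus_b iterate -c out_contig.fa \
            -IP1 "$pe1" "$pe2"
    fi
}
```

Calling "run_pipeline PE_1.fq PE_2.fq ONT.fq" prints both command lines; setting DRY_RUN=0 executes them instead, stopping if the assemble step fails.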

Contig assembly usage

Command

platanus_b assemble [OPTIONS] 2>log

Options

-o STR               : prefix of output files (default out, length <= 200)
-f FILE1 [FILE2 ...] : reads file (fasta or fastq, number <= 100)
-k INT               : initial k-mer size (default 32)
-K FLOAT             : maximum-k-mer factor (maximum-k = FLOAT*read-length, default  0.5)
-s INT               : step size of k-mer extension (>= 1, default 10)
-n INT               : initial k-mer coverage cutoff (default 0, 0 means auto)
-c INT               : minimum k-mer coverage (default 1)
-a FLOAT             : k-mer extension safety level (default 10.0)
-u FLOAT             : maximum difference for bubble crush (identity, default 0)
-d FLOAT             : maximum difference for branch cutting (coverage ratio, default 0.5)
-e FLOAT             : k-mer coverage depth (k = initial k-mer size specified by -k) of homozygous region (default auto)
-t INT               : number of threads (<= 100, default 1)
-m INT               : memory limit for making kmer distribution (GB, >=1, default 16)
-tmp DIR             : directory for temporary files (default .)
-kmer_occ_only       : only output k-mer occurrence table (out_kmer_occ.bin; default off)
-repeat              : mode to assemble repetitive sequences (e.g. 16S rRNA)

Input format:

Uncompressed and compressed (gzip or bzip2) files are accepted for the -f option.

Outputs:

PREFIX_contig.fa
PREFIX_kmerFrq.tsv

PREFIX is specified by -o
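
The -t and -m values have documented limits for "assemble" (-t <= 100, -m >= 1). As a minimal sketch, the helper below turns the resources available on a machine into a pair of options within those limits. The name assemble_resources is hypothetical, and how much memory to grant is left to the caller.

```shell
# Sketch: build "-t N -m N" for "platanus_b assemble" within the
# documented limits (-t <= 100, -m >= 1).
# assemble_resources is a hypothetical helper, not part of Platanus_B.

assemble_resources() {
    # $1 = available CPU threads, $2 = available memory in GB (integers).
    local threads=$1 mem_gb=$2
    if [ "$threads" -lt 1 ]; then threads=1; fi
    if [ "$threads" -gt 100 ]; then threads=100; fi
    if [ "$mem_gb" -lt 1 ]; then mem_gb=1; fi
    printf '%s\n' "-t $threads -m $mem_gb"
}
```

For a machine with 4 threads and 16 GB, "assemble_resources 4 16" prints "-t 4 -m 16", which can be appended to the assemble command line.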

Iteration of sequence-extension, gap-closing and error-removal

Command

platanus_b iterate [OPTIONS] 2>log

Options

-o STR                             : prefix of output file and directory (do not use "/", default out, length <= 200)
-c FILE1 [FILE2 ...]               : contig (or scaffold) file (fasta format)
-i INT                             : number of iterations (default 6)
-l INT                             : -l value of "scaffold" step
-u FLOAT                           : maximum difference for bubble crush (identity, default 0)
-ip{INT} PAIR1 [PAIR2 ...]         : lib_id inward_pair_file (reads in 1 file, fasta or fastq)
-IP{INT} FWD1 REV1 [FWD2 REV2 ...] : lib_id inward_pair_files (reads in 2 files, fasta or fastq)
-op{INT} PAIR1 [PAIR2 ...]         : lib_id outward_pair_file (reads in 1 file, fasta or fastq)
-OP{INT} FWD1 REV1 [FWD2 REV2 ...] : lib_id outward_pair_files (reads in 2 files, fasta or fastq)
-ont FILE1 [FILE2 ...]             : Oxford Nanopore long-read file (fasta or fastq)
-p FILE1 [FILE2 ...]               : PacBio long-read file (fasta or fastq)
-gc FILE1 [FILE2 ...]              : Guiding contig file; i.e. other assemblies, synthetic long-reads or corrected reads (fasta or fastq)
-t INT                             : number of threads (default 1)
-m INT                             : memory limit for making kmer distribution (GB, >=1, default 1)
-tmp DIR                           : directory for temporary files (default .)
-sub_bin DIR                       : directory for binary files which platanus_b use internally (e.g. minimap2) (default compilation-dir/sub_bin)
-keep_file                         : keep intermediate files (default off)
-trim_overlap                      : trim overlapping edges of scaffolds (default off)

Input format:

Uncompressed and compressed (gzip or bzip2) files are accepted for the -c, -ip, -IP, -op, -OP, -p, -x and -X options.
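
To inspect a compressed input by hand, the same kind of detection can be imitated in the shell. This is a minimal sketch, not the tool's own code: it assumes that "file -bL" describes gzip and bzip2 files with text starting "gzip compressed" and "bzip2 compressed", which may vary between versions of file. The name reader_for is hypothetical.

```shell
# Sketch: map a "file -bL" description to a command that streams the
# plain-text reads. Assumption: the description starts with
# "gzip compressed" or "bzip2 compressed" for compressed inputs.
# reader_for is a hypothetical helper, not part of Platanus_B.

reader_for() {
    # $1 = description text as printed by: file -bL <reads file>
    case $1 in
        gzip\ compressed*)  printf '%s\n' 'gzip -cd' ;;
        bzip2\ compressed*) printf '%s\n' 'bzip2 -cd' ;;
        *)                  printf '%s\n' 'cat' ;;
    esac
}
```

For example, with a hypothetical file ONT.fq.gz, the first FASTQ record can be viewed with: $(reader_for "$(file -bL ONT.fq.gz)") ONT.fq.gz | head -n 4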

Outputs:

PREFIX_iterativeAssembly.fa (including sequences below)

PREFIX is specified by -o


Notes

  • Options related to run time
    Although -t (number of threads) of all commands and -m (memory amount) of the "assemble" command are not mandatory, it is recommended to set them to match your machine environment. These options may severely affect the run time.
    e.g.,
    Available number of threads and memory amount are 4 and 16GB, respectively.
    -> -t 4 -m 16

  • Compressed input files
    Both uncompressed and compressed (gzip or bzip2) FASTA/FASTQ files are accepted. Formats are auto-detected. Internally, the "file -bL", "gzip -cd" and "bzip2 -cd" commands, which are available on most UNIX-like OSs, are used.

  • Paired-end (mate-pair) input
    The "iterate" command accepts paired-end and/or mate-pair libraries. Paired libraries are classified into "inward-pair" and "outward-pair" according to the sequence direction. For file formats, separate and interleaved files can be input through the -IP (-OP) and -ip (-op) options, respectively.

Inward-pair (usually called "paired-end", accepted in options "-IP" or "-ip"):

FWD --->
    5' -------------------- 3'
    3' -------------------- 5'
                    <--- REV 

Outward-pair (usually called "mate-pair", accepted in options "-OP" or "-op"):

                    ---> REV 
    5' -------------------- 3'
    3' -------------------- 5'
FWD <---

Example inputs:

Inward-pair (separate, insert=300)   : PE300_1.fq PE300_2.fq
Inward-pair (interleaved, insert=500): PE500_pair.fq
Outward-pair (separate, insert=2k)   : MP2k_1.fq MP2k_2.fq

Corresponding options:

-IP1 PE300_1.fq PE300_2.fq \
-ip2 PE500_pair.fq \
-OP3 MP2k_1.fq MP2k_2.fq
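
When many libraries are involved, the option string can be generated instead of typed. The sketch below numbers the libraries in the order given, mirroring the example above; whether library IDs must be consecutive is an assumption. The name library_options and the TYPE:FILE[,FILE] argument syntax are hypothetical, and file names containing spaces or commas are not handled.

```shell
# Sketch: build -IP{n}/-ip{n}/-OP{n}/-op{n} options from a list of
# library specs, numbering libraries 1, 2, 3, ... in argument order.
# library_options and the TYPE:FILE[,FILE] syntax are hypothetical.

library_options() {
    local id=0 out="" spec kind files
    for spec in "$@"; do
        id=$((id + 1))
        kind=${spec%%:*}     # text before the first ":" (IP, ip, OP or op)
        files=${spec#*:}     # text after the first ":"
        case $kind in
            IP|ip|OP|op) ;;
            *) echo "unknown library type: $kind" >&2; return 1 ;;
        esac
        # Commas separate the files of one library; turn them into spaces.
        files=$(printf '%s' "$files" | tr ',' ' ')
        out="$out -$kind$id $files"
    done
    # Drop the leading space added by the first iteration.
    printf '%s\n' "${out# }"
}
```

For the example inputs, "library_options IP:PE300_1.fq,PE300_2.fq ip:PE500_pair.fq OP:MP2k_1.fq,MP2k_2.fq" prints the three options on one line, ready to append to a platanus_b iterate command.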

platanus_b's Issues

Error(11): Error, Create link exception!!

Hello,
I am trying Platanus_B on a bacterial dataset. The first step (assemble) worked well, but the second step (iterate) failed with the error message "Error(11): Error, Create link exception!!
ln, cp, mv or cat command failed."

$platanus_b assemble -f Illumina_reads1_trimmed.fq Illumina_reads2_trimmed.fq -t 50

platanus_b version: 1.3.2
platanus_b assemble -f Illumina_reads1_trimmed.fq Illumina_reads2_trimmed.fq -t 50

K = 32, saving kmers from reads...
AVE_READ_LEN=149.906

KMER_EXTENSION:
K=32, KMER_COVERAGE=68.1114 (>= 9), COVERAGE_CUTOFF=9
K=42, KMER_COVERAGE=62.3832, COVERAGE_CUTOFF=9, PROB_SPLIT=10e-15.9546
K=52, KMER_COVERAGE=56.6551, COVERAGE_CUTOFF=9, PROB_SPLIT=10e-14.0795
K=62, KMER_COVERAGE=50.9269, COVERAGE_CUTOFF=9, PROB_SPLIT=10e-11.9534
K=72, KMER_COVERAGE=45.1988, COVERAGE_CUTOFF=8, PROB_SPLIT=10e-10.6336
K=82, KMER_COVERAGE=39.4706, COVERAGE_CUTOFF=6, PROB_SPLIT=10e-10.1412
K=92, KMER_COVERAGE=33.7424, COVERAGE_CUTOFF=3, PROB_SPLIT=10e-10.8317
K=102, KMER_COVERAGE=28.0143, COVERAGE_CUTOFF=1, PROB_SPLIT=10e-11.1251
K=107, KMER_COVERAGE=25.1502, COVERAGE_CUTOFF=1, PROB_SPLIT=10e-10.1444
K=108, KMER_COVERAGE=24.5774, COVERAGE_CUTOFF=1, PROB_SPLIT=10e-10.3728
K=109, KMER_COVERAGE=24.0046, COVERAGE_CUTOFF=1, PROB_SPLIT=10e-10.124
loading kmers...
connecting kmers...
removing branches...
BRANCH_DELETE_THRESHOLD=0.5
NUM_CUT=1328
NUM_CUT=3
NUM_CUT=0
TOTAL_NUM_CUT=1331
extracting reads (containing kmer used in contig assemble)...
K = 42, loading kmers from contigs...
K = 42, saving additional kmers(not found in contigs) from reads...
COVERAGE_CUTOFF = 9
loading kmers...
connecting kmers...
removing branches...
BRANCH_DELETE_THRESHOLD=0.5
NUM_CUT=278
NUM_CUT=1
NUM_CUT=0
TOTAL_NUM_CUT=279
extracting reads (containing kmer used in contig assemble)...
K = 52, loading kmers from contigs...
K = 52, saving additional kmers(not found in contigs) from reads...
COVERAGE_CUTOFF = 9
loading kmers...
connecting kmers...
removing branches...
BRANCH_DELETE_THRESHOLD=0.5
NUM_CUT=183
NUM_CUT=2
NUM_CUT=0
TOTAL_NUM_CUT=185
extracting reads (containing kmer used in contig assemble)...
K = 62, loading kmers from contigs...
K = 62, saving additional kmers(not found in contigs) from reads...
COVERAGE_CUTOFF = 9
loading kmers...
connecting kmers...
removing branches...
BRANCH_DELETE_THRESHOLD=0.5
NUM_CUT=127
NUM_CUT=1
NUM_CUT=0
TOTAL_NUM_CUT=128
extracting reads (containing kmer used in contig assemble)...
K = 72, loading kmers from contigs...
K = 72, saving additional kmers(not found in contigs) from reads...
COVERAGE_CUTOFF = 8
loading kmers...
connecting kmers...
removing branches...
BRANCH_DELETE_THRESHOLD=0.5
NUM_CUT=112
NUM_CUT=0
TOTAL_NUM_CUT=112
extracting reads (containing kmer used in contig assemble)...
K = 82, loading kmers from contigs...
K = 82, saving additional kmers(not found in contigs) from reads...
COVERAGE_CUTOFF = 6
loading kmers...
connecting kmers...
removing branches...
BRANCH_DELETE_THRESHOLD=0.5
NUM_CUT=161
NUM_CUT=1
NUM_CUT=0
TOTAL_NUM_CUT=162
extracting reads (containing kmer used in contig assemble)...
K = 92, loading kmers from contigs...
K = 92, saving additional kmers(not found in contigs) from reads...
COVERAGE_CUTOFF = 3
loading kmers...
connecting kmers...
removing branches...
BRANCH_DELETE_THRESHOLD=0.5
NUM_CUT=489
NUM_CUT=5
NUM_CUT=0
TOTAL_NUM_CUT=494
extracting reads (containing kmer used in contig assemble)...
K = 102, loading kmers from contigs...
K = 102, saving additional kmers(not found in contigs) from reads...
COVERAGE_CUTOFF = 1
loading kmers...
connecting kmers...
removing branches...
BRANCH_DELETE_THRESHOLD=0.5
NUM_CUT=6321
NUM_CUT=51
NUM_CUT=0
TOTAL_NUM_CUT=6372
extracting reads (containing kmer used in contig assemble)...
K = 107, loading kmers from contigs...
K = 107, saving additional kmers(not found in contigs) from reads...
COVERAGE_CUTOFF = 1
loading kmers...
connecting kmers...
removing branches...
BRANCH_DELETE_THRESHOLD=0.5
NUM_CUT=4417
NUM_CUT=41
NUM_CUT=0
TOTAL_NUM_CUT=4458
extracting reads (containing kmer used in contig assemble)...
K = 108, loading kmers from contigs...
K = 108, saving additional kmers(not found in contigs) from reads...
COVERAGE_CUTOFF = 1
loading kmers...
connecting kmers...
removing branches...
BRANCH_DELETE_THRESHOLD=0.5
NUM_CUT=3227
NUM_CUT=25
NUM_CUT=0
TOTAL_NUM_CUT=3252
extracting reads (containing kmer used in contig assemble)...
K = 109, loading kmers from contigs...
K = 109, saving additional kmers(not found in contigs) from reads...
COVERAGE_CUTOFF = 1
loading kmers...
connecting kmers...
removing branches...
BRANCH_DELETE_THRESHOLD=0.5
NUM_CUT=3078
NUM_CUT=25
NUM_CUT=0
TOTAL_NUM_CUT=3103
LENGTH_CUTOFF = 218
COVERAGE_CUTOFF = 3
removing erroneous nodes...
NUM_REMOVED_NODES=3124
NUM_REMOVED_NODES=534
NUM_REMOVED_NODES=6
NUM_REMOVED_NODES=0
TOTAL_NUM_REMOVED_NODES=3664
extracting reads (containing kmer used in contig assemble)...
K = 109, loading kmers from contigs...
K = 109, saving additional kmers(not found in contigs) from reads...
loading kmers...
connecting kmers...
assemble completed!

$platanus_b iterate -c out_contig.fa -IP1 Illumina_reads1-trimmed.fq Illumina_reads2-trimmed.fastq -ont ONT_reads.fa -t 50

checking sub-executables ...
/opt/binf-f-402/part2/src/Platanus_B/sub_bin/close_gap.pl
/opt/binf-f-402/part2/src/Platanus_B/sub_bin/combinatorial_gap_close.pl
/opt/binf-f-402/part2/src/Platanus_B/sub_bin/fasta_around_gap.pl
/opt/binf-f-402/part2/src/Platanus_B/sub_bin/fasta_grep.pl
/opt/binf-f-402/part2/src/Platanus_B/sub_bin/get_flanked_region_info_outer.pl
/opt/binf-f-402/part2/src/Platanus_B/sub_bin/get_flanked_region_info.pl
/opt/binf-f-402/part2/src/Platanus_B/sub_bin/paf_contained_short_seq_list_bubble_aware.pl
/opt/binf-f-402/part2/src/Platanus_B/sub_bin/paf_filter_flanking_pair.pl
/opt/binf-f-402/part2/src/Platanus_B/sub_bin/paf_filter_qcov.pl
/opt/binf-f-402/part2/src/Platanus_B/sub_bin/paf_match_short_seq_list_bubble_aware.pl
/opt/binf-f-402/part2/src/Platanus_B/sub_bin/paf_max_match_unique.pl
/opt/binf-f-402/part2/src/Platanus_B/sub_bin/reduce_filled_info.pl
/opt/binf-f-402/part2/src/Platanus_B/sub_bin/remove_redundant_seq.pl
/opt/binf-f-402/part2/src/Platanus_B/sub_bin/minimap2
all sub-executables found

Error(11): Error, Create link exception!!
ln, cp, mv or cat command failed.

Thanks a lot in advance for your help.

Can you add a resume option to platanus-allee?

Hi @rkajitani

Platanus-allee is a good assembler, but it is very slow and memory-consuming for large and complex plant genomes (genome size is ~3.6G). The most maddening thing is that we need to rerun from the very beginning if the program is interrupted (for example, by a power failure). Can you add an option to resume the run from the last successfully completed step? It would be very useful in the phasing stage.

Best,
Kun
