Code Monkey home page Code Monkey logo

cosine's Introduction

COSINE

COINE is an aligner tool for mapping long noisy reads (e.g. TGS reads)

Command options

Type cosine -h for list of command options and default values.

    --ref_filename STR         : input reference filename 
    --read_filename STR        : input read filename
    --output_ref_prefix STR    : reference index file prefix
    --output_read_prefix STR   : output prefix for alignment results
    
    --save_ref_fft             : index reference file

    --num_threads INT          : number of threads in dynamic programming step [1]
        
    --kmer_size INT            : k-mer size (recommended: 3 or 4) [3]
        
    --window_size INT          : window size [500]
    --window_shift INT         : distance between consecutive windows [10]
        
    --max_read_size INT        : maximum read fragment size [5000] 
                                 Split longer reads to < INT fragments

    --peak_search_window FLOAT : segment size (FLOAT * read_length) to search local maximum peaks
    --max_num_moments_peaks INT: maximum number of local peaks in each iteration to estimate mean and std [1000]
    
    --max_num_top_peaks INT    : maximum number of top peaks per fragment to be selected for dynamic programming step [10]
    --min_peak_significance_metric    FLOAT : minimum significance metric (mean + FLOAT * std)
    --peak_significance_metric_margin FLOAT : maximum margin (FLOAT * std) of a significant peak to the top peak value 

    --min_dp_score INT         : minimum dynamic programming score
    --dp_guided_band_len       : band length (2 x INT) in BDP 
    
    --m_match INT              : match score [1]
    --m_mismatch INT           : mismatch score [-1]
    --m_gap_open INT           : gap open score [-1]
    --m_gap_ext INT            : gap extension score [-1]
    --ins_portion              : maximum expected ins rate. [.6]
                                 This parameter is used in dynamic programming step.
    --del_portion              : maximum expected del rate. [.3]
                                 This parameter is used in dynamic programming step.
    --max_error_rate           : maximum expected error rate. [.6]
                                 This parameter is used in dynamic programming step.

    --fft_block_size INT       : fft block size [32768]
    --max_num_fft_blocks INT   : number of FFT blocks to be processed simultaneously.
                                 This parameter depends on GPU memory. 
    --max_fft_segment_size INT : maximum size of FFT segments.
                                 This parameter depends on GPU memory.
                                 Either set this or max_num_fft_blocks parameter.
                                 
    --seed                     : seed value for rand() 
                                 In order to replicate exact same results.
    --use_seed                 : use seed value to initialize rand() 
 

Examples

Generate index (FFT blocks)

Index file depends on fft_block_size, kmer_size, max_read_size, window_size and window shift parameters.

    bin/cosine --ref_filename sample_data/Ecoli.fa \
               --output_ref_prefix sample_out/Ecoli \
               --fft_block_size 32768  \
               --kmer_size 3 \
               --max_read_size 5000 \
               --window_size 500 \
               --window_shift 100 \
               --seed 0 \
               --save_ref_fft 1

Generated index files are *.bfft (binary) and *.header which has information on parameter settings and reference sequences.

> more Ecoli.header
@HD     1.00    KS:3    WL:500  WS:10   FL:32768        RS:5000 NS:1
@SQ     SN:U00096.3     LN:4641652      KN:464155       BN:15

@HD
1.00: COSINE version
KS: k-mer size
WL: window size
WS: window shift
FL: FFT block size
RS: maximum read size
NS: number of sequences in the reference file

@FQ [per sequence]
SN: sequence name
LN: sequence length
KN: length of sequence k-mer count vector
BN: number of FFT blocks

Run alignment

    bin/cosine --ref_filename sample_data/Ecoli.fa \
               --read_filename sample_data/Ecoli_reads_acc85.fa \
               --output_ref_prefix sample_out/Ecoli \
               --output_read_prefix sample_out/Ecoli_reads_acc85 \
               --fft_block_size 32768  \
               --max_num_fft_blocks 15 \
               --num_threads 10 \
               --kmer_size 3 \
               --max_read_size 5000 \
               --window_size 500 \
               --window_shift 100   \
               --min_dp_score 100 \
               --seed 1

Generated output files are *.sam (alignments) and intermediate file with detected peak information per read (*.peaks)

Suggested settings for Pacbio and ONT reads

For higher quality reads (read accuracy > 85%), it is recommened to use --kmer_size 3. It run 2-3x faster than --kmer_size 4 without noticable perfromane loss.
For ONT reads, we suggest the setting of --window_size 100 --window_shift 10 , and for PacBio reads, the setting of --window_size 250 --window_shift 50 for optimal performance.

System Requirement

Current implementation of COSINE is on NVIDIA GPU. It requires cufft and curand libraries from NVIDIA.

Cite

Under review.

License

Academic license.

Please contact Wong lab for any commercial license.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.