Code Monkey home page Code Monkey logo

bwa-pssm's Introduction

BWA-PSSM Readme

The usage of BWA-PSSM is very similar to that of BWA. Due to some of the underlying differences, howerver, the recommended parameters are different. More information can be found at the official web site:

http://bwa-pssm.binf.ku.dk/

Installation:

BWA-PSSM requires the gdsl library. It can be downloaded from:

http://home.gna.org/gdsl/

Once gdsl is installed, BWA-PSSM can be compiled by simply running 'make'.

Usage:

Converting damaged DNA to a PSSM

For use in ancient DNA sequencing applications, the original fastq file may be converted to a pssm and used for alignment. This approach modifies the probabilities of the bases likely to be degraded:

cat damaged.fastq | python scripts/fastq2wm.py

Note that if the base pairs are base 64, the -q option should be used:

cat damaged.fastq | python scripts/fastq2wm.py -q 64

Within both the 33 and 64 base pair scripts, the maximum length of the read needs to be set. Only reads shorter than the maximum read length are modified by the damage matrix. It is presumed that maximum length reads have not been degraded and should not be modified.

cat damaged.fastq | python scripts/fastq2wm.py --max-length 76

Low quality data

In the examples below, pssm-file can be either a pssm file as generated by fastq2wm33.pl or a regular fastq file. If it is a regular fastq file, it will be converted to a pssm internally:

bwa pssm -m 2000 index-file pssm-file | bwa samse index-file - pssm-file > out.sam

Helicos data

bwa pssm -N -i2 -o5 -O 0.015 -E 0.015 -D 0.03 -m 2000 index-file pssm-file | bwa samse index-file - pssm-file > out.sam

Short explanation of the new parameters

-z This options indicates that the pssm threshold will be the maximum possible score - * maximum-mismatch-penalty, where maximum-mismatch-penalty is equal to the difference between the match and mismatch score of the highest quality base found in the read. In other words, we allow the equivalent of high quality mismatches in the alignment.

-y Similar to -z, except it uses bwa's default mismatch criteria for creating the mismatch allowance. Suppose bwa's method determines that a read of length l should be allowed n mismatches. Then using -y for such a read will be equivalent to using the option -z (n - ). This can be useful in files containing reads of variable length.

-O Specify a probability for gap opens.

-E Specify a probability for gap extensions.

-D Specify a probability for deletions.

-m Indicate the size of the heap for storing partial matches. This is equivalent to the number of recursions allowed in the search tree. Small values lead to fast running times while large ones lead to more sensitivity.

Error model:

An error model can be introduced in the following format.

Each line has qual score and base followed by 4 scores for A, C, G and T. Fields separated by blanks Like this:

0 A 0. 0. 0. 0.
0 C 0. 0. 0. 0.
:
3 A 1.3 -2.1 -3.1 -2.0
3 C -1.9 2.0 -2.1 -1.8
:
:
50 T ...

The first column corresponds to the position. The second to the base for which the scores are entered. The final four columns correspond to the score for that row of the pssm. So the 6th column of the example above corresponds to a score of 0. for encountering a T, if there was an A present in the read.

If a certain position does not have additional information from the error model, then its PSSM scores are unchanged from the values calculated from the quality score.

bwa-pssm's People

Contributors

bpow avatar jmarshall avatar lh3 avatar pkerpedjiev avatar roelkluin avatar sjackman avatar sorensjm avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.