
hamstring

This Python module generates, checks, and corrects quaternary Hamming barcodes. The theory for generating quaternary (DNA) Hamming barcodes comes from Bystrykh, L. V. (2012). Generalized DNA Barcode Design Based on Hamming Codes. PLoS ONE. Currently, the hamstring module only supports Hamming(7,4) encoding (four data digits and three parity digits per barcode), but it could be generalized to other numbers of data and parity digits.

[Figure: Figure 1 from Bystrykh (2012)]

Usage

Generate Hamming DNA barcodes

generateBarcodes.py [-h] out

arguments:
  out         output barcode file name
  -h, --help  show this help message and exit

example output:

index base4 nucleotide gc
0 0000000 AAAAAAA 0.0
1 3303001 TTATAAC 0.14
2 2202002 GGAGAAG 0.57
3 1101003 CCACAAT 0.43
4 0303010 ATATACA 0.14
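
The rows above can be reproduced with a short sketch. It assumes that the script walks the four data digits in counting order, maps digits 0-3 to A, C, G, T, and reports GC content rounded to two decimals; these assumptions are inferred from the example rows, not taken from the script's source. `barcode_rows` is a hypothetical name.

```python
NUCLEOTIDES = "ACGT"  # assumed mapping: 0=A, 1=C, 2=G, 3=T

def barcode_rows():
    """Build (index, base4, nucleotide, gc) rows for all 4**4 data words."""
    rows = []
    for index in range(4 ** 4):
        # the four base-4 digits of index, most significant first
        d1, d2, d3, d4 = [(index // 4 ** k) % 4 for k in (3, 2, 1, 0)]
        # each parity digit makes its check group sum to 0 modulo 4
        p1 = -(d1 + d2 + d4) % 4
        p2 = -(d1 + d3 + d4) % 4
        p3 = -(d2 + d3 + d4) % 4
        code = [p1, p2, d1, p3, d2, d3, d4]
        nucleotide = "".join(NUCLEOTIDES[x] for x in code)
        gc = round((nucleotide.count("G") + nucleotide.count("C")) / 7.0, 2)
        rows.append((index, "".join(str(x) for x in code), nucleotide, gc))
    return rows

for row in barcode_rows()[:5]:
    print("%d %s %s %s" % row)
```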

Checksum a list of Hamming DNA barcodes

checkBarcodes.py [-h] list

arguments:
  list        list of barcodes to check, one per line
  -h, --help  show this help message and exit

example output:

in	fixed	checksum
CATAACT	CAAAACT	A > T at pos 3
AGAGAGA	AGAGAGA	ok
TCACAGC	TCACAGC	ok
GAACAGG	GAAAAGG	A > C at pos 4
CTATAGT	CTATAGT	ok
TTTAAAN	NNNNNNN	bad
NNNNNNN	NNNNNNN	bad

Tag fastq reads with a barcode (for generating a simulated dataset)

tagReads.py [-h] [-e erate] nb fastq out

arguments:
  nb          number of barcodes to generate
  fastq       fastq file to process
  out         name for new fastq file
  -e, --erate  error rate for single barcode base errors (default: 0.05)
  -h, --help  show this help message and exit

example input:

@HWI-EAS179_0001:5:1:7:119#0/1
CAGGGCGCGAATGNTTTGAGAGGGANATTGGAAANNNNNGATAGANNGGNCTATNNTGNNNNNNNNNNNNNNNNNN
+
HIHHHGHHHFDHH#EHHH?HHHDH>#DGGG@7@?##########################################

example output:

@HWI-EAS179_0001:5:1:7:119#0/1
CCATGGCCAGGGCGCGAATGNTTTGAGAGGGANATTGGAAANNNNNGATAGANNGGNCTATNNTGNNNNNNNNNNNNNNNNNN
+
HHHHHHHHIHHHGHHHFDHH#EHHH?HHHDH>#DGGG@7@?##########################################
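
The tagging step shown above can be sketched roughly as follows. This is an illustration rather than the script's actual code: `tag_read` is a hypothetical helper, the quality character `H` for the barcode bases is inferred from the example output, and the error model (at most one substituted base, applied with probability `erate`) is an assumption based on the option description. The read is truncated for brevity.

```python
import random

def tag_read(record, barcode, erate=0.05, rng=random):
    """Prepend barcode to a FASTQ record given as (header, seq, plus, qual)."""
    header, seq, plus, qual = record
    tag = list(barcode)
    # with probability erate, substitute one barcode base with a different one
    if rng.random() < erate:
        pos = rng.randrange(len(tag))
        tag[pos] = rng.choice([b for b in "ACGT" if b != tag[pos]])
    # barcode bases get a fixed quality character (assumed 'H')
    return (header, "".join(tag) + seq, plus, "H" * len(tag) + qual)

read = ("@HWI-EAS179_0001:5:1:7:119#0/1", "CAGGGCGCGAATG", "+", "HIHHHGHHHFDHH")
print("\n".join(tag_read(read, "CCATGGC", erate=0.0)))
```

With `erate=0.0` the barcode is always copied unchanged; passing a seeded `random.Random` instance as `rng` makes a simulated dataset reproducible.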

Check and fix barcodes in fastq file

fixFastq.py [-h] [-s] list fastq out

arguments:
  list        list of barcodes used in experiment, one per line
  fastq       fastq file to process
  out         name for new fastq file
  -s, --strict  change all barcodes not in list to 'N'
  -h, --help  show this help message and exit

example input:

@HWI-EAS179_0001:5:1:7:119#0/1
GCATGGCCAGGGCGCGAATGNTTTGAGAGGGANATTGGAAANNNNNGATAGANNGGNCTATNNTGNNNNNNNNNNNNNNNNNN
+
HHHHHHHHIHHHGHHHFDHH#EHHH?HHHDH>#DGGG@7@?##########################################

example output:

@HWI-EAS179_0001:5:1:7:119#0/1
CCATGGCCAGGGCGCGAATGNTTTGAGAGGGANATTGGAAANNNNNGATAGANNGGNCTATNNTGNNNNNNNNNNNNNNNNNN
+
HHHHHHHHIHHHGHHHFDHH#EHHH?HHHDH>#DGGG@7@?##########################################

Requirements

  • Python 2.7

Hamstring Module

The core hamstring module has no external module dependencies and should run under any OS.

base4Encode(n,d) converts the decimal integer n to quaternary (base-4) notation, left-padded with zeros to d digits. Example:

hamstring.base4Encode(22, 4)
[0, 1, 1, 2]
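
As a rough illustration of the conversion (not the module's own code), repeated division by 4 yields the digits from least to most significant. `base4_encode` is a hypothetical stand-in, and raising an error when n does not fit in d digits is an assumption; the module may behave differently.

```python
def base4_encode(n, d):
    """Return n as a list of d base-4 digits, most significant first."""
    if n < 0 or n >= 4 ** d:
        # assumption: reject values that need more than d digits
        raise ValueError("n does not fit in d quaternary digits")
    digits = []
    for _ in range(d):
        digits.append(n % 4)  # least significant digit first
        n //= 4
    return digits[::-1]

print(base4_encode(22, 4))  # [0, 1, 1, 2], since 22 = 1*16 + 1*4 + 2
```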

generateHamming(data,parity) generates a quaternary (DNA) Hamming code from data, a list of quaternary digits, using the number of parity digits given by parity. Example:

hamstring.generateHamming([0,1,1,2],3)
{'parity': [1, 1, 0], 'nucleotide': 'CCAACCG', 'data': [0, 1, 1, 2], 'base4': '1100112'}
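
The example is consistent with the following minimal sketch for the (7,4) case. It assumes the codeword layout p1 p2 d1 p3 d2 d3 d4, parity digits chosen so that each check group sums to 0 modulo 4, and the mapping 0=A, 1=C, 2=G, 3=T. `encode_hamming74` is a hypothetical name, not the module's implementation.

```python
NUCLEOTIDES = "ACGT"  # assumed mapping: 0=A, 1=C, 2=G, 3=T

def encode_hamming74(data):
    """Encode four quaternary data digits as a 7-digit Hamming codeword."""
    d1, d2, d3, d4 = data
    # each parity digit cancels the sum of the data digits it covers (mod 4)
    p1 = -(d1 + d2 + d4) % 4  # covers positions 1, 3, 5, 7
    p2 = -(d1 + d3 + d4) % 4  # covers positions 2, 3, 6, 7
    p3 = -(d2 + d3 + d4) % 4  # covers positions 4, 5, 6, 7
    code = [p1, p2, d1, p3, d2, d3, d4]
    return {
        "data": list(data),
        "parity": [p1, p2, p3],
        "base4": "".join(str(x) for x in code),
        "nucleotide": "".join(NUCLEOTIDES[x] for x in code),
    }

print(encode_hamming74([0, 1, 1, 2])["nucleotide"])  # CCAACCG
```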

decodeHamming(barcode,parity) decodes barcode, a nucleotide Hamming string with the number of parity digits given by parity, and corrects a single-base error if one is found. Example:

hamstring.decodeHamming('CCAACCG',3)
{'nucleotide': 'CCAACCG', 'chksum': 'ok'}

hamstring.decodeHamming('CCATCCG',3)
{'nucleotide': 'CCAACCG', 'chksum': 'T to A at 4'}

hamstring.decodeHamming('CCANCCG',3)
{'nucleotide': 'NNNNNNN', 'chksum': 'bad'}
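
Single-error correction for the (7,4) case can be sketched with three syndromes, one per parity digit. This is an illustration under the same assumed layout (p1 p2 d1 p3 d2 d3 d4, digits 0-3 mapped to A, C, G, T), not the module's code. `decode_hamming74` is a hypothetical name, and treating disagreeing non-zero syndromes as 'bad' is an assumption.

```python
NUCLEOTIDES = "ACGT"  # assumed mapping: 0=A, 1=C, 2=G, 3=T

def decode_hamming74(barcode):
    """Check a 7-base barcode and correct a single substitution if possible."""
    bad = {"nucleotide": "N" * 7, "chksum": "bad"}
    if len(barcode) != 7 or any(b not in NUCLEOTIDES for b in barcode):
        return bad
    c = [NUCLEOTIDES.index(b) for b in barcode]
    # each syndrome is the mod-4 sum of one check group; 0 means consistent
    s1 = (c[0] + c[2] + c[4] + c[6]) % 4
    s2 = (c[1] + c[2] + c[5] + c[6]) % 4
    s3 = (c[3] + c[4] + c[5] + c[6]) % 4
    nonzero = set(s for s in (s1, s2, s3) if s)
    if not nonzero:
        return {"nucleotide": barcode, "chksum": "ok"}
    if len(nonzero) > 1:
        return bad  # syndromes disagree: not a single-base error
    shift = nonzero.pop()  # how far the erroneous digit is off (mod 4)
    # the failing checks spell out the 1-based error position in binary
    pos = (1 if s1 else 0) + (2 if s2 else 0) + (4 if s3 else 0)
    c[pos - 1] = (c[pos - 1] - shift) % 4
    fixed = "".join(NUCLEOTIDES[x] for x in c)
    chksum = "%s to %s at %d" % (barcode[pos - 1], fixed[pos - 1], pos)
    return {"nucleotide": fixed, "chksum": chksum}

print(decode_hamming74("CCATCCG"))
```

In the second example above, only the third check fails (syndromes 0, 0, 3), which points at position 4; subtracting 3 from T (3) gives A (0).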

Author

Matt Shirley - mdshw5'at'gmail'.'com - http://mattshirley.com
