kmer

The k-mer features of a set of DNA/genomic sequences are important for revealing hidden patterns in that sequence population. Here, k is an integer constant that can range from 2 to several dozen, depending on the requirements of the application. K-mer features are therefore widely used in bioinformatics, for example as inputs when building new prediction methods.

This Python code generates k-mer features for a list of DNA/genomic sequences. Given a list of m DNA sequences, it returns a 2-d array with shape (m, 4^k) holding the k-mer features. For a specific k, the total number of k-mer features is 4^k, one for each possible string of length k over the bases A, C, G and T (for example, 4^6 = 4096 features for k = 6). For a DNA sequence, the value of each k-mer feature can be either the number of occurrences of this k-mer, or its fraction of occurrences relative to all k-mers counted in that sequence.
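To make the feature layout concrete, here is a minimal sketch of how such a vector could be computed for one sequence. It is not the implementation in kmer.py: the function name kmer_feature_vector, the lexicographic column order (AA, AC, AG, AT, ...), and the choice to skip windows containing characters other than A, C, G, T are all assumptions made for illustration. Only the parameter name write_number_of_occurrences is taken from the usage example in this README.

```python
from itertools import product


def kmer_feature_vector(seq, k, write_number_of_occurrences=False):
    """Return a list of length 4**k with one entry per possible k-mer.

    Hypothetical helper for illustration; the column order used here
    (lexicographic over "ACGT") is an assumption, not necessarily the
    order used by kmer.py.
    """
    alphabet = "ACGT"
    # Map every possible k-mer to a fixed column index.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}

    counts = [0] * (4 ** k)
    seq = seq.upper()
    # Slide a window of width k over the sequence.
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if kmer in index:  # assumption: skip windows with non-ACGT characters
            counts[index[kmer]] += 1

    if write_number_of_occurrences:
        return counts

    # Otherwise normalise the counts so the entries sum to 1.
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


# One row per sequence gives an m x 4**k table (here 2 x 16).
features = [kmer_feature_vector(s, k=2) for s in ["ATCGA", "TCGAC"]]
```

With k = 2, the sequence 'ATCGA' contains the four 2-mers AT, TC, CG and GA once each, so its count vector has four entries equal to 1 and its fraction vector has four entries equal to 0.25. A sequence shorter than k contains no k-mers at all, which in this sketch yields an all-zero row.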

The code is written with efficiency in mind and runs fast enough to process very large sets of sequences.

The code is written in Python 3.

How to use this code:

Example:

from kmer import kmer_featurization  # import the kmer_featurization class from the kmer.py file

seq_list = ['ATCGA', 'TCGAC']  # a list of DNA sequences

k = 3  # choose the value for k; a sequence shorter than k contains no k-mers
obj = kmer_featurization(k)  # initialize a kmer_featurization object
kmer_features = obj.obtain_kmer_feature_for_a_list_of_sequences(seq_list, write_number_of_occurrences=False)
# By default (write_number_of_occurrences=False), each feature is the fraction
# of occurrences of that k-mer, ranging from 0 to 1, as described above.
# Set write_number_of_occurrences=True to get the raw count of each k-mer instead.

# To featurize a single sequence passed as a string:
seq = 'ATCGAGC'
k = 6  
obj = kmer_featurization(k) 
kmer_feature = obj.obtain_kmer_feature_for_one_sequence(seq, write_number_of_occurrences=False)

Contributors

mindai
