cubscout.jl's Introduction

CUBScout

Codon Usage Bias (CUB) in Julia

CUBScout helps you work with codons! Beyond counting codons and finding codon frequency, CUBScout calculates Codon Usage Bias (CUB) and related expressivity predictions. Currently, CUBScout calculates:

  • Six measures of codon usage bias:
    • B, from Karlin and Mrazek, 1996
    • ENC, from Wright 1990
    • ENC', from Novembre, 2002
    • MCB, from Urrutia and Hurst, 2001
    • MILC, from Supek and Vlahovicek, 2005
    • SCUO, from Wan et al., 2004
  • Five expressivity measures based on codon usage bias:
    • CAI, from Sharp and Li, 1987
    • E, from Karlin and Mrazek, 1996
    • FOP, from Ikemura, 1981
    • GCB, from Merkl, 2003
    • MELP, from Supek and Vlahovicek, 2005
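The codon counts and codon frequencies mentioned above underlie every measure in this list. As a rough illustration only, here is a minimal sketch in plain Julia of what counting codons and deriving frequencies involves. The names `count_codons_sketch` and `codon_frequencies_sketch` are hypothetical and are not CUBScout functions; the sketch assumes an in-frame coding sequence given as a plain ASCII string.

```julia
# Hypothetical helpers for illustration; these are NOT part of CUBScout's API.

# Count non-overlapping codons in an in-frame coding sequence (ASCII string).
function count_codons_sketch(seq::AbstractString)
    s = uppercase(seq)
    counts = Dict{String,Int}()
    # Step through the sequence three nucleotides at a time;
    # trailing bases that do not fill a whole codon are ignored.
    for i in 1:3:(length(s) - 2)
        codon = s[i:i+2]
        counts[codon] = get(counts, codon, 0) + 1
    end
    return counts
end

# Convert raw counts into relative frequencies that sum to 1.
function codon_frequencies_sketch(counts::Dict{String,Int})
    total = sum(values(counts))
    return Dict(codon => n / total for (codon, n) in counts)
end

counts = count_codons_sketch("ATGGCTGCTAAATAA")  # ATG GCT GCT AAA TAA
freqs = codon_frequencies_sketch(counts)
```

Here `counts["GCT"]` is 2 out of 5 codons, so `freqs["GCT"]` is 0.4. Measures such as ENC or CAI are then computed from tables like these, grouped by amino acid.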

CUBScout is based on the fabulous coRdon package in R by Anamaria Elek, Maja Kuzman, and Kristian Vlahovicek. I am grateful for their clear code and encourage you to cite coRdon as well when using CUBScout.

You can install CUBScout from the Julia package manager (press ] at the julia> prompt to enter Pkg mode):

pkg> add CUBScout

Or for the dev version:

pkg> add CUBScout#main

CUBScout is under active development, and I welcome contributions or suggestions! Additional features I'm working on/would like to incorporate:

  • Performance improvements
  • Plotting support (e.g. BPlots)
  • Additional CUB measures, including S, RCDI, CDC, RCA, RCSU, and RCBS
  • Growth predictions derived from CUB, such as those in growthpred and gRodon

cubscout.jl's Issues

[RFC] Having methods for BioSequences types

Hi @gus-pendleton

Incorporating the metrics of this package is such a nice idea! I mentioned on Slack that I'm willing to incorporate some of them in my package GeneFinder.jl in the future. Currently, GeneFinder detects ORFs and stores them in a dedicated struct; using the BioSequences types, it can then return them as a vector of LongSubSeq{DNAAlphabet{4}}. Here is an example:

julia> seq = randdnaseq(10^3)

1000nt DNA Sequence:
GGGATTCACATCGTACCGCCCGGGTACCGCCGGCAACGTTGTCAGACCCATTGAAAGGCCTATGGCATCCTGTGCAAG

The GeneFinder function getcds() extracts all the possible ORFs into a Vector{LongSubSeq{DNAAlphabet{4}}}:

julia> getcds(seq)

36-element Vector{LongSubSeq{DNAAlphabet{4}}}:
 ATGCCATAG
 ATGGGTCTGACACTGCAGTCACTAGTTGCTAATAACAGACTGTCTCATATTTCATCAGAAGTGAGACTGTGTTGTTAG
 ATGTACTCAGACGATGGTACACGCAGTCAGACACGAATAGCGGGTAATTAA
 ATGGGAATGAAGCACCAGCAGCGCCTTACAGAAGGCACCCAAGTGTCCCGAAGCTGTCTCATATTTCATCAGAAGTGA
 ATGAAGCACCAGCAGCGCCTTACAGAAGGCACCTACCGCCAAGTGTCCCGAAGCTGTCTCATATTTCATCAGAAGTGA
 ATGGTACACGCAGTCAGACACGAATAG
 ATGCTCATGCGACCAGTTAGACCGCAGACTCCGATAGTCGTACAATCCCTAGTTCGGGGTTCGAGTTCTCCTAGATGA
 ATGCGACCAGTTAGACCGCAGACTCCGATAGTCCGATCGGTACAATCCCTAGTTCGGGGTTCGAGTTCTCCTAGATGA
 ATGGGGCATTAG
 ATGGGAAACGACAACATATGTACAATCCCTAGTTCGGGGTTCGAGTTCTCCTAG
 ATGTACAATCCCTAG
 ATGATAATTAGGCGTTCACGATACACAATGGAAGGAGAGGCGGGCCCTGCAGGCTAA
 ATGCGTACCTCGTAG
 ATGGAAGGAGAGGCGGGCCCTGCAGGCTAA
 ATGCACAAGTCGCTTGATATGGGGTGGTAG
 
 ATGCGCTAA
 ATGCTACCGGGTAGGGATAGCCCGATGGGGGGCATGGACCAGGGGCTGGACCCCGTTGCAAATAATTCCTCGAAATGA
 ATGGGGGGCATGGACTTAGGCGGCCCCAGCTTATTCTACCAGGGGCTGGACCCCGTTGCAAATAATTCCTCGAAATGA
 ATGGACTTAGGCGGCCCCAGCTTATTCTACCACCCCATACAGGGGCTGGACCCCGTTGCAAATAATTCCTCGAAATGA
 ATGGTTCTAATCTCGGGCAGCTACGAGGTACGCATGCAACAGGGGCTGGACCCCGTTGCAAATAATTCCTCGAAATGA
 ATGCAAAACCACTCCTCGATCTGTCAGGGGCTGGACCCCGTTGCAAATAATTCCTCGAAATGA
 ATGTTGTCGTTTCCCATTTTGTGTCCGATCGGACTATCGGAGTCTGCGGTCTAA
 ATGACTGTACTAGGTCACGGTCAATCCTAA
 ATGCCCCATACGTGA
 ATGAGCATGCGTCGGAAGCTTTGCTTACAACCGCGACAGAGATCAAGCAATTGA
 ATGCGTCGGAAGCTTTGCTTACAACCGCGACAGAGATCAAGCAATTGA
 ATGAAATATGAGACAGCTTCGGGACACTTGGAACCTTATCTGCTGGTGCTTCATTCCCATCTAGGAATCTGTTATTAG
 ATGAGACAGCTTCGGGACACTTGGAACCTTATCCTTACAATTGCAATGCGGTAG
 ATGCGGTAG
 ATGTGA

So I was curious whether the CUB or expressivity measures could be applied to a single CDS or to the vector of CDSs. Either way, I was wondering if there is any chance of having CUBScout methods that accept the BioSequences types directly, since these data structures are optimized for the Bio ecosystem.

Best.
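The request above, one entry point for a single CDS and another for a vector of CDSs, can be pictured with Julia's multiple dispatch. The following is only a sketch under stated assumptions: plain strings stand in for LongSubSeq{DNAAlphabet{4}}, and `cds_codon_counts_sketch` is a hypothetical name, not a CUBScout or GeneFinder function.

```julia
# Hypothetical sketch; plain Strings stand in for BioSequences types,
# and none of these names belong to CUBScout or GeneFinder.

# Method for a single CDS: count its in-frame codons.
function cds_codon_counts_sketch(cds::AbstractString)
    counts = Dict{String,Int}()
    for i in 1:3:(length(cds) - 2)
        codon = cds[i:i+2]
        counts[codon] = get(counts, codon, 0) + 1
    end
    return counts
end

# Method for a vector of CDSs: pool the per-CDS counts into one table.
function cds_codon_counts_sketch(cdss::AbstractVector{<:AbstractString})
    pooled = Dict{String,Int}()
    for cds in cdss
        for (codon, n) in cds_codon_counts_sketch(cds)
            pooled[codon] = get(pooled, codon, 0) + n
        end
    end
    return pooled
end

# Three short ORFs copied from the getcds output shown in this issue.
orfs = ["ATGCCATAG", "ATGGGGCATTAG", "ATGTGA"]
single = cds_codon_counts_sketch(orfs[1])
pooled = cds_codon_counts_sketch(orfs)
```

With methods specialized on the actual sequence types, the same pattern would avoid converting each CDS to a string first, which is presumably the optimization the issue has in mind.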
