RNA Splice Site Recognition

In eukaryotes, the initial RNA, or precursor mRNA (pre-mRNA), that is transcribed from a gene's DNA template must be processed before it becomes a mature messenger RNA (mRNA) that can direct protein synthesis. One step in this processing, called "RNA splicing," removes certain sequences referred to as intervening sequences, or introns. The final mRNA thus consists of the remaining sequences, called exons, which are joined to one another by the splicing process. Some RNA molecules can splice themselves, while others can be spliced in more than one way. Splicing research beginning with [1] found that alternative patterns of splicing at different junctions within a single pre-mRNA can yield a variety of mature mRNAs. Splicing errors are often caused by a mutation at a splice site, which can reduce the spliceosome's binding specificity for that site or abolish its function entirely. The resulting alternative transcripts can produce proteins with altered function, which can contribute to many human diseases [2].

Many studies have proposed models that recognize splice sites in order to reveal which ones carry a mutation that may cause a splicing error. The position weight matrix (PWM) is a model commonly used to recognize DNA-binding motifs by transforming aligned sequence data into a matrix of per-position base probabilities [3]. It can recognize simple sequence structures. However, growing evidence indicates that sequence specificities are captured more accurately by more complex techniques [4-6]. Recently, deep learning methods have outperformed other approaches on many problems, including the recognition of DNA-binding sites [7-9], but little work has focused specifically on recognizing RNA splice sites. Therefore, I plan to apply a convolutional neural network (CNN) to recognize splice sites and to classify whether a sequence is pathogenic.
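
To make the PWM idea concrete, here is a minimal sketch of how aligned sequences can be turned into a probability matrix and then used to score a candidate site. It is not this project's code: the function names `build_pwm` and `log_odds_score`, the pseudocount of 1, the uniform background of 0.25, and the four example sequences are all assumptions chosen for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pwm(sequences, pseudocount=1.0):
    """Return one dict per position mapping each base to its probability.

    Assumes equal-length sequences containing only A, C, G, T.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    total = len(sequences) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        # The pseudocount keeps unseen bases from getting probability zero.
        pwm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pwm


def log_odds_score(pwm, sequence, background=0.25):
    """Sum log2(p / background) over positions; higher means closer to the motif."""
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, sequence))


# Made-up, donor-site-like windows (illustrative only, not real data).
donor_like = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGT"]
pwm = build_pwm(donor_like)

print(round(pwm[3]["G"], 3))  # 0.625
print(log_odds_score(pwm, "CAGGTAAGT") > log_odds_score(pwm, "CAGCTAAGT"))  # True
```

Because each position is scored independently, a PWM cannot represent dependencies between positions, which is one reason more expressive models such as CNNs are considered for this task.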

Goals

  • Propose a framework to predict the effect of variants on splicing events
  • Establish a pipeline based on deep learning models to recognize splice sites and predict the effects of variants
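
One common way to frame "the effect of a variant" is to score the reference sequence and the mutated sequence with the same model and compare the two scores. The sketch below illustrates that framing only. The helper names `apply_snv` and `variant_effect` are hypothetical, and `toy_donor_score` is a placeholder standing in for a trained model; it simply checks for the canonical GT dinucleotide at a fixed offset.

```python
def apply_snv(sequence, position, ref, alt):
    """Return `sequence` with the base at 0-based `position` changed from ref to alt."""
    if sequence[position] != ref:
        raise ValueError(
            f"expected {ref!r} at position {position}, found {sequence[position]!r}"
        )
    return sequence[:position] + alt + sequence[position + 1:]


def variant_effect(score_fn, sequence, position, ref, alt):
    """Score change (variant minus reference) under any sequence-scoring function."""
    return score_fn(apply_snv(sequence, position, ref, alt)) - score_fn(sequence)


def toy_donor_score(sequence):
    # Placeholder scorer (assumption): 1.0 if GT occupies positions 3-4, else 0.0.
    return 1.0 if sequence[3:5] == "GT" else 0.0


reference = "CAGGTAAGT"
print(variant_effect(toy_donor_score, reference, 4, "T", "A"))  # -1.0 (GT disrupted)
print(variant_effect(toy_donor_score, reference, 0, "C", "G"))  # 0.0 (GT intact)
```

Any callable that maps a sequence to a number, such as a wrapper around a trained CNN's prediction, could be passed as `score_fn` in place of the toy scorer.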

Requirements and Installation

  • Python 3+
  • Keras with TensorFlow backend
  • NumPy
  • Pandas
  • scikit-learn (imported as sklearn)

Pseudocode for data preparation, i.e., splice site data and variant data

  1. Getting splice site data: Link
  2. Getting variant data: Link
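
Once sequences are obtained, a typical preparation step for a CNN is one-hot encoding. The sketch below shows that step under stated assumptions: the record layout (a sequence window paired with a 0/1 label), the function names `one_hot` and `encode_dataset`, and the example records are hypothetical and do not describe the actual dataset format.

```python
BASES = "ACGT"


def one_hot(sequence):
    """Encode a DNA string as rows of four floats in A, C, G, T order.

    Any other symbol (for example N) becomes an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in sequence.upper()]


def encode_dataset(records):
    """Turn (sequence, label) pairs into parallel lists of encoded inputs and int labels."""
    inputs = [one_hot(seq) for seq, _ in records]
    labels = [int(label) for _, label in records]
    return inputs, labels


# Made-up records: (sequence window, 1 = splice site, 0 = not a splice site).
records = [("CAGGTAAGT", 1), ("CAGCTANGT", 0)]
X, y = encode_dataset(records)

print(len(X), len(X[0]), len(X[0][0]))  # 2 9 4
print(X[0][0])  # [0.0, 1.0, 0.0, 0.0]  (the first base is C)
```

If all windows have the same length, passing `X` to `numpy.asarray` gives an array of shape (samples, length, 4), which is the layout a one-dimensional convolutional layer in Keras expects.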

Acknowledgement

I would like to thank Daniele Merico and Worrawat Engchuan for providing the dataset and for their guidance on splicing. The dataset was provided as part of the competition tracks of DLAI 2.

References

  1. Berget, Susan M., Claire Moore, and Phillip A. Sharp. "Spliced segments at the 5′ terminus of adenovirus 2 late mRNA." Proceedings of the National Academy of Sciences 74.8 (1977): 3171-3175.
  2. Faustino, Nuno André, and Thomas A. Cooper. "Pre-mRNA splicing and human disease." Genes & development 17.4 (2003): 419-437.
  3. Stormo, Gary D. "DNA binding sites: representation and discovery." Bioinformatics 16.1 (2000): 16-23.
  4. Rohs, Remo, et al. "Origins of specificity in protein-DNA recognition." Annual review of biochemistry 79 (2010): 233-269.
  5. Kazan, Hilal, et al. "RNAcontext: a new method for learning the sequence and structure binding preferences of RNA-binding proteins." PLoS computational biology 6.7 (2010): e1000832.
  6. Siggers, Trevor, and Raluca Gordân. "Protein–DNA binding: complexities and multi-protein codes." Nucleic acids research 42.4 (2013): 2099-2111.
  7. Alipanahi, Babak, et al. "Predicting the sequence specificities of DNA-and RNA-binding proteins by deep learning." Nature biotechnology 33.8 (2015): 831-838.
  8. Zhou, Jian, and Olga G. Troyanskaya. "Predicting effects of noncoding variants with deep learning-based sequence model." Nature methods 12.10 (2015): 931-934.
  9. Kelley, David R., Jasper Snoek, and John L. Rinn. "Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks." Genome research 26.7 (2016): 990-999.
  10. Dean, Victoria, et al. "Deep learning for branch point selection in RNA splicing." MLCB 2016. https://vdean.github.io/MLCB2016_paper.pdf
