Bioinformatics

Several Python programs that solve various bioinformatics computational problems from the Rosalind project. The repository also contains three mini projects that focus on analysing molecular data to answer specific questions.

Homework for the Introduction to Bioinformatics course at the University of Ljubljana, Faculty of Computer and Information Science. The course covered the following main topics: molecular biology, probability models, gene finding, sequence alignment, Markov models, models of evolution, phylogenetics and sequence assembly.

Rosalind

The solved problems are listed below. Each file name is the Rosalind ID of the problem; the description and dataset for that ID are available on the Rosalind website.

DNA.py Counting DNA Nucleotides
RNA.py Transcribing DNA into RNA
REVC.py Complementing a Strand of DNA
GC.py Computing GC Content
PROT.py Translating RNA into Protein
SUBS.py Finding a Motif in DNA
KMER.py k-Mer Composition
LEXF.py Enumerating k-mers Lexicographically
KMP.py Speeding Up Motif Finding
PROB.py Introduction to Random Strings
ORFR.py Finding Genes with ORFs
ORF.py Open Reading Frames
PDST.py Creating a Distance Matrix
HAMM.py Counting Point Mutations
EDIT.py Edit Distance
EDTA.py Edit Distance Alignment
GLOB.py Global Alignment with Scoring Matrix
LOCA.py Local Alignment with Scoring Matrix
GAFF.py Global Alignment with Scoring Matrix and Affine Gap Penalty
BA10A.py Compute the Probability of a Hidden Path
BA10B.py Compute the Probability of an Outcome Given a Hidden Path
BA10C.py Implement the Viterbi Algorithm
BA10I.py Implement Viterbi Learning
BA10K.py Implement Baum-Welch Learning
BA10D.py Compute the Probability of a String Emitted by an HMM
BA10H.py Estimate the Parameters of an HMM
BA10J.py Solve the Soft Decoding Problem
TRAN.py Transitions and Transversions
CONS.py Consensus and Profile
REVP.py Locating Restriction Sites
SPLC.py RNA Splicing
DBRU.py Constructing a De Bruijn Graph
PCOV.py Genome Assembly with Perfect Coverage
GREP.py Genome Assembly with Perfect Coverage and Repeats
LCSM.py Finding a Shared Motif

Mini projects

Each project has its own "porocilo.pdf" file containing a more detailed report, written in Slovenian.

Gene Prediction

We analyse the whole genome of the bacterium Mycoplasma genitalium and try to find the regions that encode genes, especially protein-coding genes. We determine the optimal ORF length for our algorithm by evaluating precision and recall.
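
The idea can be illustrated with a minimal sketch. It is not the project's actual code: the function names, the forward-strand-only scan, and the exact-interval matching against an annotation are all assumptions made for brevity. The stop codon set is a parameter because Mycoplasma species read TGA as tryptophan (translation table 4), so the standard default would need adjusting for this genome.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]  # half-open (start, end) in nucleotide coordinates


def find_orfs(seq: str, min_len: int,
              stops: Iterable[str] = ("TAA", "TAG", "TGA")) -> Set[Interval]:
    """Collect forward-strand ORFs (ATG ... stop) of at least min_len nucleotides."""
    stop_set = set(stops)
    orfs = set()
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in stop_set:
                if i + 3 - start >= min_len:
                    orfs.add((start, i + 3))
                start = None
    return orfs


def precision_recall(predicted: Set[Interval],
                     annotated: Set[Interval]) -> Tuple[float, float]:
    """Score predictions against annotated genes (assumed: exact interval match)."""
    true_positives = len(predicted & annotated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    return precision, recall
```

Sweeping `min_len` over a range of values and scoring each prediction set with `precision_recall` would give the trade-off curve from which a length can be picked.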

DNA Evolution

We search for the gene that best distinguishes the zebrafish (Danio rerio) from a group of six mammals. Using a distance measure derived from the Jukes-Cantor model, we compare the same gene across species and then plot a phylogenetic tree.
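
Under the Jukes-Cantor model, the distance between two aligned sequences is d = -(3/4) ln(1 - (4/3) p), where p is the proportion of sites that differ. The sketch below shows one way to compute it; the function names and the choice to skip gap columns are assumptions, not necessarily what the project does.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences.

    Columns with a gap ('-') in either sequence are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor_distance(a: str, b: str) -> float:
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(a, b)
    if p >= 0.75:
        # The correction is undefined once p reaches the saturation level 3/4.
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)
```

Computing this distance for every pair of species yields a distance matrix, which a clustering method can then turn into a tree.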

Sequence Reconstruction

For several gene sequences, we search for the minimal fragment length from which the original sequence can still be reconstructed. This is done iteratively: we first estimate the fragment length heuristically and then build a de Bruijn graph for smaller and smaller fragments, checking at each length whether the graph contains exactly one Eulerian path.
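
A small sketch of this procedure follows. It is an illustration under assumptions rather than the project's implementation: the names are hypothetical, fragments are taken to be all overlapping k-mers of the sequence, and uniqueness is checked by brute-force backtracking over Eulerian paths, which is only practical for short inputs.

```python
from collections import Counter, defaultdict
from typing import List, Optional, Set


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping fragments of length k, with repeats kept."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def eulerian_strings(fragments: List[str]) -> Set[str]:
    """Distinct strings spelled by Eulerian paths in the de Bruijn graph.

    Nodes are (k-1)-mers; each fragment is an edge from its prefix to its suffix.
    """
    edges = Counter((f[:-1], f[1:]) for f in fragments)  # edge multiplicities
    successors = defaultdict(set)
    for u, v in edges:
        successors[u].add(v)
    total = sum(edges.values())
    found = set()

    def extend(node: str, text: str, used: int) -> None:
        if used == total:  # every edge used exactly once
            found.add(text)
            return
        for nxt in successors.get(node, ()):
            if edges[(node, nxt)] > 0:
                edges[(node, nxt)] -= 1
                extend(nxt, text + nxt[-1], used + 1)
                edges[(node, nxt)] += 1  # backtrack

    for start in {u for u, _ in edges}:
        extend(start, start, 0)
    return found


def minimal_fragment_length(seq: str, start_k: int) -> Optional[int]:
    """Shrink k from start_k while the reconstruction stays unique and correct."""
    best = None
    for k in range(start_k, 1, -1):
        if eulerian_strings(kmers(seq, k)) == {seq}:
            best = k
        else:
            break
    return best
```

For the toy sequence "ACAGAT", fragments of length 2 admit two reconstructions ("ACAGAT" and "AGACAT"), while fragments of length 3 admit only the original, so `minimal_fragment_length("ACAGAT", 4)` returns 3.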
