Python programs that solve various computational bioinformatics problems from the Rosalind project. The repository also contains three mini projects that analyse molecular data to answer specific questions.
Homework for the Introduction to Bioinformatics course at the University of Ljubljana, Faculty of Computer and Information Science. The course covered the following main topics: molecular biology, probability models, gene finding, sequence alignment, Markov models, models of evolution, phylogenetics and sequence assembly.
Below, we list the solved problems together with the titles of the corresponding Rosalind problems, whose webpages contain the problem description and dataset. Each file name is the Rosalind ID of the individual problem.
DNA.py: Counting DNA Nucleotides
RNA.py: Transcribing DNA into RNA
REVC.py: Complementing a Strand of DNA
GC.py: Computing GC Content
PROT.py: Translating RNA into Protein
SUBS.py: Finding a Motif in DNA
KMER.py: k-Mer Composition
LEXF.py: Enumerating k-mers Lexicographically
KMP.py: Speeding Up Motif Finding
PROB.py: Introduction to Random Strings
ORFR.py: Finding Genes with ORFs
ORF.py: Open Reading Frames
PDST.py: Creating a Distance Matrix
HAMM.py: Counting Point Mutations
EDIT.py: Edit Distance
EDTA.py: Edit Distance Alignment
GLOB.py: Global Alignment with Scoring Matrix
LOCA.py: Local Alignment with Scoring Matrix
GAFF.py: Global Alignment with Scoring Matrix and Affine Gap Penalty
BA10A.py: Compute the Probability of a Hidden Path
BA10B.py: Compute the Probability of an Outcome Given a Hidden Path
BA10C.py: Implement the Viterbi Algorithm
BA10I.py: Implement Viterbi Learning
BA10K.py: Implement Baum-Welch Learning
BA10D.py: Compute the Probability of a String Emitted by an HMM
BA10H.py: Estimate the Parameters of an HMM
BA10J.py: Solve the Soft Decoding Problem
TRAN.py: Transitions and Transversions
CONS.py: Consensus and Profile
REVP.py: Locating Restriction Sites
SPLC.py: RNA Splicing
DBRU.py: Constructing a De Bruijn Graph
PCOV.py: Genome Assembly with Perfect Coverage
GREP.py: Genome Assembly with Perfect Coverage and Repeats
LCSM.py: Finding a Shared Motif
The individual "porocilo.pdf" files contain more detailed reports on the projects, written in Slovenian.
We analyse the whole genome of the bacterium Mycoplasma genitalium and try to find the regions that encode genes, particularly protein-coding genes. We determine the optimal ORF length threshold for our algorithm based on recall and precision.
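A minimal sketch of tuning the ORF length threshold, assuming that an ORF runs from an ATG to the first in-frame stop codon on the forward strand only (the reverse strand is not handled), and that a predicted ORF counts as correct when its end coordinate matches the end of an annotated gene. The functions find_orfs, precision_recall and best_min_length, and the use of the F1 score to combine precision and recall, are illustrative assumptions, not the project's actual code. Because M. genitalium uses the Mycoplasma genetic code (translation table 4), in which TGA encodes tryptophan rather than a stop, the stop-codon set is a parameter.

```python
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})


def find_orfs(seq: str, min_len: int, stops=STANDARD_STOPS) -> list[tuple[int, int]]:
    """Return forward-strand ORFs as 0-based half-open (start, end) pairs.

    The end coordinate includes the stop codon. Inner ATGs are ignored,
    so each stop codon yields at most one (the longest) ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # open a new ORF at the first ATG
            elif codon in stops:
                end = i + 3
                if end - start >= min_len:  # keep only ORFs above the threshold
                    orfs.append((start, end))
                start = None
    return sorted(orfs)


def precision_recall(predicted, annotated) -> tuple[float, float]:
    """Compare predictions with annotated genes by matching end coordinates (assumption)."""
    pred_ends = {end for _, end in predicted}
    true_ends = {end for _, end in annotated}
    tp = len(pred_ends & true_ends)
    precision = tp / len(pred_ends) if pred_ends else 0.0
    recall = tp / len(true_ends) if true_ends else 0.0
    return precision, recall


def best_min_length(seq, annotated, candidates, stops=STANDARD_STOPS):
    """Return (threshold, F1) for the candidate ORF length with the highest F1 score."""
    best = None
    for min_len in candidates:
        p, r = precision_recall(find_orfs(seq, min_len, stops), annotated)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if best is None or f1 > best[1]:
            best = (min_len, f1)
    return best
```

For a Mycoplasma genome one would call find_orfs(genome, min_len, stops={"TAA", "TAG"}) and sweep candidates such as range(90, 600, 30); the candidate range here is only an example.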
We search for the gene that best distinguishes zebrafish (Danio rerio) from a group of six mammals. Using a distance measure derived from the Jukes-Cantor model, we compare the same gene across species and then plot a phylogenetic tree.
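The Jukes-Cantor (JC69) distance corrects the observed proportion p of differing sites for multiple substitutions at the same site: d = -(3/4) ln(1 - (4/3) p). The sketch below assumes the gene sequences are already aligned to equal length and skips gap columns. The text does not say how the tree was built, so the hypothetical upgma function uses UPGMA purely as one illustrative clustering method; it returns a Newick topology without branch lengths.

```python
import math
from itertools import combinations


def jukes_cantor(a: str, b: str) -> float:
    """JC69 distance between two aligned sequences; columns with a gap are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf  # saturated: the logarithm is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def upgma(seqs: dict[str, str]) -> str:
    """Cluster named, aligned sequences with UPGMA and return a Newick topology."""
    size = {name: 1 for name in seqs}  # cluster label -> number of leaves
    d = {
        frozenset(pair): jukes_cantor(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }
    while len(size) > 1:
        pair = min(d, key=d.get)  # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        for c in size:
            if c not in pair:
                # size-weighted average of the distances to the two merged clusters
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        d = {k: v for k, v in d.items() if not (k & pair)}  # drop old clusters
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size)) + ";"
```

The resulting Newick string could then be drawn, for example with Biopython's Bio.Phylo module.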
For several gene sequences, we search for the minimal fragment length from which the original sequence can still be reconstructed. This is done iteratively: we first estimate the fragment length heuristically and then build a de Bruijn graph for progressively smaller fragments until only one unique Eulerian path can be found.
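A minimal sketch of the uniqueness check, assuming perfect coverage (every k-mer of the sequence is one fragment) and k >= 2. Nodes are (k-1)-mers and each fragment is an edge. The hypothetical eulerian_strings function enumerates Eulerian paths by brute-force backtracking and stops after two distinct reconstructions; this is exponential in the worst case and limited by Python's recursion depth, so it suits only short sequences. The hypothetical minimal_fragment_length and its k_start argument stand in for the heuristic starting estimate, which the text does not specify, and the scan assumes that once uniqueness is lost, shorter fragments do not restore it.

```python
from collections import Counter, defaultdict


def de_bruijn(kmers):
    """Map each (k-1)-mer prefix to a Counter of suffix nodes (edge multiplicities)."""
    graph = defaultdict(Counter)
    for kmer in kmers:
        graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def eulerian_strings(kmers, limit=2):
    """Return up to `limit` distinct strings spelled by Eulerian paths of the graph."""
    graph = de_bruijn(kmers)
    n_edges = len(kmers)
    out_deg = {u: sum(c.values()) for u, c in graph.items()}
    in_deg = Counter()
    for succ in graph.values():
        in_deg.update(succ)
    nodes = set(out_deg) | set(in_deg)
    # Start where out-degree exceeds in-degree; if balanced, try every node.
    starts = [u for u in nodes if out_deg.get(u, 0) - in_deg[u] == 1] or list(out_deg)
    found = set()

    def walk(node, path, used):
        if len(found) >= limit:
            return
        if used == n_edges:  # every fragment used exactly once
            found.add(path)
            return
        for nxt, count in graph[node].items():
            if count:
                graph[node][nxt] -= 1  # consume one copy of this edge
                walk(nxt, path + nxt[-1], used + 1)
                graph[node][nxt] += 1  # backtrack

    for s in sorted(starts):
        walk(s, s, 0)
    return found


def reconstructs_uniquely(seq, k):
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return eulerian_strings(kmers) == {seq}


def minimal_fragment_length(seq, k_start):
    """Decrease k from k_start and return the smallest k that still reconstructs seq."""
    best = None
    for k in range(k_start, 1, -1):
        if not reconstructs_uniquely(seq, k):
            break
        best = k
    return best
```

For "CATAGA", fragments of length 2 admit two reconstructions ("CATAGA" and "CAGATA"), while length 3 and above give a unique path, so minimal_fragment_length("CATAGA", 5) returns 3.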