eseraygun / python-alignment
Native Python library for generic sequence alignment
License: BSD 3-Clause "New" or "Revised" License
Since backtraceFrom is implemented by recursion (instead of iteration), calling the aligner on "long" sequences (more than 1000 items) results in a RecursionError with Python defaults. Raising the stack depth limit may cause other serious issues.
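One way around the recursion limit is to walk the score matrix with a plain loop. The following is a minimal standalone sketch, not the library's code: align_iterative and its scoring parameters are hypothetical names, it assumes simple Needleman-Wunsch scoring with a linear gap penalty, and it returns only one optimal alignment rather than all of them.

```python
def align_iterative(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment whose traceback uses a loop instead of recursion.

    Hypothetical helper for illustration; not part of python-alignment.
    Returns (score, pairs), where a gap is represented by None.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Walk back from the bottom-right corner; the call stack never grows.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                pairs.append((a[i - 1], b[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))  # item of a aligned to a gap
            i -= 1
        else:
            pairs.append((None, b[j - 1]))  # item of b aligned to a gap
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


score, pairs = align_iterative("abc", "ac")
```

The traceback depth is bounded by len(a) + len(b) loop iterations, so sequences longer than the default recursion limit do not raise RecursionError. The full score matrix is still held in memory.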
I am not sure whether a metric for this even exists: if there is a segment where the two sequences differ greatly in length, the current algorithm makes as many substitutions as possible at the left and fills up the rest with insertions/deletions. Wouldn't it be fairer to distribute those more evenly across the segment's symbol mass?
Any thoughts or future plans on incorporating multiple sequence alignment algorithms?
P.S. really found your library useful 😄
Does anyone know whether there is any documentation on the full functionality? Maybe someone has something local that they are willing to share? So far, I have only found the example code snippets that help you get started.
Especially interested in a short explanation on the differences between:
GlobalSequenceAlignment
StrictGlobalSequenceAlignment
LocalSequenceAlignment
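Until documentation turns up, here is a sketch of how three such variants are commonly distinguished in the textbook literature. It is an assumption about what the names mean, not something verified against the library's source: "strict global" charges for every gap (Needleman-Wunsch), "global" is taken here to mean that gaps at either end are free (often called semi-global), and "local" scores the best matching sub-segment (Smith-Waterman). The function best_score and its mode strings are hypothetical.

```python
def best_score(a, b, mode, match=2, mismatch=-1, gap=-2):
    """Score-only alignment in one of three modes (illustrative only).

    mode is one of "strict_global", "global", "local". These strings are
    made up for this sketch and are not python-alignment identifiers.
    """
    if mode not in ("strict_global", "global", "local"):
        raise ValueError("unknown mode: %r" % (mode,))
    n, m = len(a), len(b)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    if mode == "strict_global":
        # Only the strict variant pays for gaps at the very start.
        for i in range(1, n + 1):
            f[i][0] = i * gap
        for j in range(1, m + 1):
            f[0][j] = j * gap
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(f[i - 1][j - 1] + sub,
                       f[i - 1][j] + gap,
                       f[i][j - 1] + gap)
            if mode == "local":
                cell = max(cell, 0)  # a local alignment may restart anywhere
            f[i][j] = cell
            best = max(best, cell)
    if mode == "local":
        return best                   # best cell anywhere in the matrix
    if mode == "strict_global":
        return f[n][m]                # must consume both sequences fully
    # Free end gaps: the alignment may stop anywhere on the last row/column.
    return max(max(f[n]), max(row[m] for row in f))


for mode in ("strict_global", "global", "local"):
    print(mode, best_score("xxABCyy", "ABC", mode))
```

With the default scores, the pair "xxABCyy" / "ABC" separates the strict variant from the other two, while a pair such as "xxABCxx" / "yyABCyy" separates the local variant, because only local alignment can ignore mismatching material on both sides of both sequences.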
Hi,
Thank you for providing the module.
I was wondering whether you could provide information on the algorithms used?
e.g. Smith-Waterman?
(While your code is well structured, I am getting a bit lost with some of the variables.)
I am considering using / modifying it, but currently have the following blockers:
This library cannot be used with Python 3.x because it relies on Python 2 print statements such as: print 'v', v
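For debug output like this, one portable option is to build the line as a single string and print it with one parenthesized argument, which parses the same way on Python 2 and Python 3. The helper debug_line below is a hypothetical name for illustration, not something in the library. Adding from __future__ import print_function at the top of each module (available since Python 2.6) is the other common fix.

```python
def debug_line(name, value):
    """Format a debug line the way `print 'v', v` would have shown it."""
    return "%s %s" % (name, value)


v = [1, 2, 3]
# Python 2 only:      print 'v', v
# Python 2 and 3:
print(debug_line("v", v))
```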
When globally aligning sequences that deviate much, combinatorial explosion can quickly lead to excessive runtime memory consumption in the current implementation. And it is not always easy to detect those cases by score heuristics in a prior backtrace=False pass.
I believe these should be added:
(The limit could be based on stack depth or number of alternatives, for example.)
Example: I am trying to align OCRed images of German Fraktur script with their corresponding ground truth text. Sometimes the OCR fails miserably like so:
Mitreden andrer 274. Günſtiger Eindruck der Staatsrathsſitzungen 274.
(original line)
*0obe-ondrer '? '-änſiger Eindrue der Torerotheflgg,,
(OCR result)
In this case, StrictGlobalSequenceAligner tries to take more than 20 GB RSS (at which point I quit).
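A limit on the number of alternatives could be sketched as follows: enumerate the optimal alignments lazily with an explicit stack, and let the caller cap how many are taken. This is a standalone illustration under assumed Needleman-Wunsch scoring, not a patch for the library; iter_alignments and its parameters are hypothetical names.

```python
from itertools import islice


def iter_alignments(a, b, match=1, mismatch=-1, gap=-1):
    """Lazily yield every optimal global alignment, one at a time.

    Hypothetical helper; gaps are represented by None.
    """
    n, m = len(a), len(b)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap
    for j in range(1, m + 1):
        f[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            f[i][j] = max(f[i - 1][j - 1] + sub,
                          f[i - 1][j] + gap,
                          f[i][j - 1] + gap)

    # Depth-first walk from the corner. Partial paths are shared linked
    # tuples (pair, rest), so unfinished alternatives cost O(1) each.
    stack = [(n, m, None)]
    while stack:
        i, j, tail = stack.pop()
        if i == 0 and j == 0:
            pairs = []
            while tail is not None:
                pair, tail = tail
                pairs.append(pair)
            yield pairs
            continue
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if f[i][j] == f[i - 1][j - 1] + sub:
                stack.append((i - 1, j - 1, ((a[i - 1], b[j - 1]), tail)))
        if i > 0 and f[i][j] == f[i - 1][j] + gap:
            stack.append((i - 1, j, ((a[i - 1], None), tail)))
        if j > 0 and f[i][j] == f[i][j - 1] + gap:
            stack.append((i, j - 1, ((None, b[j - 1]), tail)))


# Take at most five alternatives, however many exist.
first_five = list(islice(iter_alignments("a" * 12, "a" * 6), 5))
```

Because the generator produces one alignment at a time, memory for the alternatives stays proportional to the path length instead of to the number of co-optimal alignments. The score matrix itself is still quadratic in the sequence lengths.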