python-alignment's People

Contributors

de-code, eseraygun

python-alignment's Issues

Long sequences run into RecursionError

Since backtraceFrom is implemented recursively rather than iteratively, calling the aligner on long sequences (more than about 1000 items) raises a RecursionError under Python's default recursion limit. Raising the limit via sys.setrecursionlimit may cause other serious issues.
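To illustrate the iterative alternative, here is a minimal sketch of a textbook Needleman-Wunsch global alignment whose traceback is a plain while loop. It is not the library's code: the function name global_align, the scoring defaults, and the index-pair output format are all assumptions made for this example.

```python
def global_align(a, b, match=2, mismatch=-1, gap=-2):
    """Textbook global alignment with a loop-based traceback.

    Hypothetical helper, not part of python-alignment. Returns the optimal
    score and a list of (index_in_a, index_in_b) pairs, where None marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Walk back from the bottom-right corner; no recursion involved,
    # so the sequence length is not bounded by the recursion limit.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                pairs.append((i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((i - 1, None))  # a[i-1] aligned with a gap
            i -= 1
        else:
            pairs.append((None, j - 1))  # b[j-1] aligned with a gap
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


score, pairs = global_align("abc", "ac")
# score is 2 and pairs is [(0, 0), (1, None), (2, 1)]
```

This sketch follows only one optimal path and keeps the full score matrix in memory, so time and memory still grow with the product of the two sequence lengths.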

Allow less greedy insertion/deletion

I am not sure whether there even exists a metric for this: if there is a segment where both sequences deviate a lot in their length, then the current algorithm will do as many substitutions to the left as possible and fill up the rest by insertions/deletions. Wouldn't it be fairer if it tried to distribute those more evenly across the segment's symbol mass?

Multiple sequence alignment?

Any thoughts or future plans on incorporating multiple sequence alignment algorithms?

P.S. really found your library useful 😄

Documentation

Does anyone know whether there is any documentation available on the full functionality? Maybe someone has something local that they are willing to share? So far, I have only found the example code snippets that help you get started.

Especially interested in a short explanation on the differences between:

  • GlobalSequenceAlignment
  • StrictGlobalSequenceAlignment
  • LocalSequenceAlignment
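The class names suggest the textbook distinction between global and local alignment, although the issue does not confirm how the library implements each class, and this sketch says nothing about the "strict" variant. Under that assumption, the following score-only example contrasts the two recurrences. The function name alignment_score and the scoring defaults are hypothetical.

```python
def alignment_score(a, b, local=False, match=2, mismatch=-1, gap=-2):
    """Score-only textbook alignment (hypothetical helper, not the library API).

    local=False: global alignment, every element of both sequences is scored.
    local=True:  local alignment, the best-scoring pair of subsequences wins
                 and cell scores are never allowed to drop below zero.
    """
    n, m = len(a), len(b)
    # First row: leading gaps are penalised only in the global case.
    prev = [0 if local else j * gap for j in range(m + 1)]
    best = 0
    for i in range(1, n + 1):
        cur = [0 if local else i * gap]
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            s = max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            if local:
                s = max(s, 0)       # restart the alignment when it goes negative
                best = max(best, s)
            cur.append(s)
        prev = cur
    # Global score sits in the last cell; local score is the best cell anywhere.
    return best if local else prev[m]


global_score = alignment_score("xxabcxx", "yyabcyy")
local_score = alignment_score("xxabcxx", "yyabcyy", local=True)
# global_score is 2 (three matches, four mismatches); local_score is 6 ("abc" only)
```

The global score pays for the mismatched flanks, while the local score ignores them and reports only the shared core.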

Algorithm details

Hi,

Thank you for providing the module.
I was wondering whether you could provide information on the algorithms used?
e.g. Smith-Waterman?
(While your code is well structured I am getting a bit lost with some of the variables)

I am considering using / modifying it, but currently have the following blockers:

  • It doesn't give me back the indices or the original sequence elements (because I am meant to encode them)
  • Recursion is good in general, but in this case it leads to excessive stack depth with long sequences
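One way to work around the first blocker on the caller's side is to keep the mapping between elements and integer codes yourself, so that anything expressed in codes can be translated back. The sketch below is independent of the library. The helper name build_vocabulary is hypothetical, and elements are assumed to be hashable.

```python
def build_vocabulary(*sequences):
    """Assign each distinct element an integer code, in order of first appearance.

    Hypothetical helper. Returns (encode, decode): a dict from element to code
    and a list that maps each code back to its element.
    """
    encode = {}
    decode = []
    for seq in sequences:
        for item in seq:
            if item not in encode:
                encode[item] = len(decode)
                decode.append(item)
    return encode, decode


words_a = "the quick brown fox".split()
words_b = "the brown fox jumps".split()
encode, decode = build_vocabulary(words_a, words_b)

encoded_a = [encode[w] for w in words_a]  # [0, 1, 2, 3]
encoded_b = [encode[w] for w in words_b]  # [0, 2, 3, 4]

# Any result expressed in codes can be mapped back to the original elements.
restored_b = [decode[code] for code in encoded_b]
```

Recovering positions as well as elements would still require the aligner to report indices, which this sketch does not address.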

Python 3.x support

Because it uses Python 2 print statements such as print 'v', v, this library cannot be used with Python 3.x.
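For reference, the Python 3 form of that statement is a function call. The variable v below is just a placeholder value for illustration.

```python
v = [1, 2, 3]  # placeholder value for this example

# Python 2 statement form (a SyntaxError on Python 3):
#     print 'v', v
# Function form, valid on Python 3:
print('v', v)
```

On Python 2.6 and later, adding from __future__ import print_function at the top of a module makes the function form behave the same way there, so a single code base can serve both versions.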

Unlimited backtrace is impractical

When globally aligning sequences that differ substantially, combinatorial explosion can quickly lead to excessive runtime memory consumption in the current implementation. And it is not always easy to detect those cases by score heuristics in a prior backtrace=False pass.

I believe these should be added:

  1. a package-exposed variable with a default limit (perhaps relative to the sequences' length)
  2. an optional parameter with an override limit to be able to control the quality-performance trade-off.

(The limit could be based on stack depth or number of alternatives, for example.)
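The two proposals above can be sketched as follows, using a limit on the number of alternatives. This is an illustration on a textbook global alignment, not a patch for the library: DEFAULT_MAX_ALIGNMENTS, bounded_alignments, and the max_alignments parameter are hypothetical names, and the fixed default of 10 is an arbitrary choice.

```python
DEFAULT_MAX_ALIGNMENTS = 10  # hypothetical module-level default limit


def bounded_alignments(a, b, max_alignments=DEFAULT_MAX_ALIGNMENTS,
                       match=2, mismatch=-1, gap=-2):
    """Enumerate co-optimal global alignments, stopping at max_alignments.

    Returns (score, alignments). Each alignment is a list of
    (index_in_a, index_in_b) pairs, with None marking a gap.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    results = []
    # Explicit stack instead of recursion; each entry holds a cell and the
    # path taken so far, stored back to front as a tuple of index pairs.
    stack = [(n, m, ())]
    while stack and len(results) < max_alignments:
        i, j, path = stack.pop()
        if i == 0 and j == 0:
            results.append(list(reversed(path)))
            continue
        # Push every predecessor cell that could have produced score[i][j].
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                stack.append((i - 1, j - 1, path + ((i - 1, j - 1),)))
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            stack.append((i - 1, j, path + ((i - 1, None),)))
        if j > 0 and score[i][j] == score[i][j - 1] + gap:
            stack.append((i, j - 1, path + ((None, j - 1),)))
    return score[n][m], results


best, all_found = bounded_alignments("aa", "a")
_, first_only = bounded_alignments("aa", "a", max_alignments=1)
# "aa" vs "a" has two co-optimal alignments; the override stops after one.
```

Because the search is depth-first, the number of pending stack entries stays bounded by the path length rather than by the number of alternatives, so the limit caps both the output size and the work done.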

Example: I am trying to align OCRed images of German Fraktur script with their corresponding ground truth text. Sometimes the OCR fails miserably like so:

    Mitreden andrer 274. Günſtiger Eindruck der Staatsrathsſitzungen 274. (original line)
    *0obe-ondrer '? '-änſiger Eindrue der Torerotheflgg,, (OCR result)

In this case, using StrictGlobalSequenceAligner tries to take more than 20 GB RSS (at which point I quit).
