pywfa's Introduction

pyWFA

A Python wrapper for wavefront alignment using WFA2-lib

Installation

To install from PyPI:

pip install pywfa

From conda:

conda install -c bioconda pywfa

Build from source:

git clone https://github.com/kcleal/pywfa
cd pywfa
pip install .

Overview

Alignment of pattern and text strings can be performed by accessing WFA2-lib functions directly:

from pywfa import WavefrontAligner

pattern = "TCTTTACTCGCGCGTTGGAGAAATACAATAGT"
text =    "TCTATACTGCGCGTTTGGAGAAATAAAATAGT"
a = WavefrontAligner(pattern)
score = a.wavefront_align(text)
assert a.status == 0  # alignment was successful
assert a.cigarstring == "3M1X4M1D7M1I9M1X6M"
assert a.score == -24
a.cigartuples
>>> [(0, 3), (8, 1), (0, 4), (2, 1), (0, 7), (1, 1), (0, 9), (8, 1), (0, 6)]
a.cigar_print_pretty()
>>> ALIGNMENT   3M1X4M1D7M1I9M1X6M
  ALIGNMENT.COMPACT 1X1D1I1X
  PATTERN    TCTTTACTCGCGCGTT-GGAGAAATACAATAGT
             ||| |||| ||||||| ||||||||| ||||||
  TEXT       TCTATACT-GCGCGTTTGGAGAAATAAAATAGT

The output of cigar_print_pretty can be directed to a file rather than stdout using:

a.cigar_print_pretty("file.txt")

To obtain a Python str of this printout, access the results object (see below).

Cigartuples follow the convention:

Operation Code
M 0
I 1
D 2
N 3
S 4
H 5
= 7
X 8
B 9
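As a minimal sketch of how this convention maps onto a CIGAR string, the pure-Python helpers below convert in both directions. The names `tuples_to_cigar` and `cigar_to_tuples` are illustrative and not part of pywfa; the package ships its own `cigartuples_to_str`, which should be preferred in practice.

```python
# Operation letters and integer codes, copied from the table above.
OP_TO_CODE = {"M": 0, "I": 1, "D": 2, "N": 3, "S": 4, "H": 5, "=": 7, "X": 8, "B": 9}
CODE_TO_OP = {code: op for op, code in OP_TO_CODE.items()}


def tuples_to_cigar(cigartuples):
    """Join (code, length) pairs into a CIGAR string such as '3M1X'."""
    return "".join(f"{length}{CODE_TO_OP[code]}" for code, length in cigartuples)


def cigar_to_tuples(cigar):
    """Split a CIGAR string into (code, length) pairs."""
    tuples = []
    digits = ""
    for char in cigar:
        if char.isdigit():
            digits += char  # accumulate a multi-digit run length
        else:
            tuples.append((OP_TO_CODE[char], int(digits)))
            digits = ""
    return tuples


tuples = [(0, 3), (8, 1), (0, 4), (2, 1), (0, 7), (1, 1), (0, 9), (8, 1), (0, 6)]
print(tuples_to_cigar(tuples))  # 3M1X4M1D7M1I9M1X6M
```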

For convenience, a results object can be obtained by calling the WavefrontAligner with a pattern and text:

pattern = "TCTTTACTCGCGCGTTGGAGAAATACAATAGT"
text =    "TCTATACTGCGCGTTTGGAGAAATAAAATAGT"
a = WavefrontAligner(pattern)
result = a(text)  # alignment result
result.__dict__
>>> {'pattern_length': 32, 'text_length': 32, 'pattern_start': 0, 'pattern_end': 32, 'text_start': 0, 'text_end': 32, 'cigartuples': [(0, 3), (8, 1), (0, 4), (2, 1), (0, 7), (1, 1), (0, 9), (8, 1), (0, 6)], 'score': -24, 'pattern': 'TCTTTACTCGCGCGTTGGAGAAATACAATAGT', 'text': 'TCTATACTGCGCGTTTGGAGAAATAAAATAGT', 'status': 0}

# Alignment can also be called with a pattern like this:
a(text, pattern)

# obtain a string in the same format as cigar_print_pretty
result.pretty
>>> 3M1X4M1D7M1I9M1X6M      ALIGNMENT
    1X1D1I1X      ALIGNMENT.COMPACT
          PATTERN    TCTTTACTCGCGCGTT-GGAGAAATACAATAGT
                     |||*|||| ||||||| |||||||||*||||||
          TEXT       TCTATACT-GCGCGTTTGGAGAAATAAAATAGT
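To make the relationship between the cigartuples and the printed rows concrete, here is a small pure-Python sketch that rebuilds the gapped PATTERN and TEXT rows from the fields of the results object. The function name `gapped_rows` is hypothetical, and the code reflects how the example output above reads (D consumes a pattern base, I consumes a text base); it is not pywfa's own implementation.

```python
def gapped_rows(pattern, text, cigartuples):
    """Rebuild the gapped pattern row, marker row and text row from a cigar."""
    p_row, marks, t_row = [], [], []
    p = t = 0  # current positions in pattern and text
    for code, length in cigartuples:
        for _ in range(length):
            if code in (0, 7, 8):  # M, = and X consume one base from each
                p_row.append(pattern[p])
                t_row.append(text[t])
                marks.append("|" if pattern[p] == text[t] else " ")
                p += 1
                t += 1
            elif code == 2:  # D: base present in the pattern only
                p_row.append(pattern[p])
                t_row.append("-")
                marks.append(" ")
                p += 1
            elif code == 1:  # I: base present in the text only
                p_row.append("-")
                t_row.append(text[t])
                marks.append(" ")
                t += 1
            else:
                raise ValueError(f"unsupported cigar code {code}")
    return "".join(p_row), "".join(marks), "".join(t_row)


pattern = "TCTTTACTCGCGCGTTGGAGAAATACAATAGT"
text = "TCTATACTGCGCGTTTGGAGAAATAAAATAGT"
tuples = [(0, 3), (8, 1), (0, 4), (2, 1), (0, 7), (1, 1), (0, 9), (8, 1), (0, 6)]
for row in gapped_rows(pattern, text, tuples):
    print(row)
```

The marker row here uses a blank for mismatches and gaps, as in the cigar_print_pretty output shown earlier.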

Configure

To configure the WavefrontAligner, options can be provided during initialization:

from pywfa import WavefrontAligner

a = WavefrontAligner(scope="score",
                     distance="affine2p",
                     span="end-to-end",
                     heuristic="adaptive")

Supported distance metrics are "affine" (default) and "affine2p". Scope can be "full" (default) or "score". Span can be "ends-free" (default) or "end-to-end". Heuristic can be None (default), "adaptive" or "X-drop".

When using heuristic functions it is recommended to check the status attribute:

pattern = "AAAAACCTTTTTAAAAAA"
text = "GGCCAAAAACCAAAAAA"
a = WavefrontAligner(heuristic="adaptive")
a(text, pattern)
a.status
>>> 0   # successful alignment, -1 indicates the alignment was stopped due to the heuristic
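One way to make this check hard to forget is to wrap it in a small guard. The sketch below assumes only that the object passed in exposes a `status` attribute where 0 means success, as described above; `AlignmentStopped` and `require_success` are hypothetical names, not part of pywfa.

```python
class AlignmentStopped(RuntimeError):
    """Raised when an alignment did not finish with status 0."""


def require_success(result):
    """Return the result unchanged if status is 0, otherwise raise."""
    if result.status != 0:
        raise AlignmentStopped(f"alignment ended with status {result.status}")
    return result


# Intended use with an aligner configured as above:
#   res = require_success(a(text, pattern))
```

Raising rather than returning a flag means a heuristic cut-off cannot silently flow into downstream code that reads the cigar or score.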

Default options

The WavefrontAligner will be initialized with the following default options:

Parameter Default value
pattern None
distance "affine"
match 0
gap_opening 6
gap_extension 2
gap_opening2 24
gap_extension2 1
scope "full"
span "ends-free"
pattern_begin_free 0
pattern_end_free 0
text_begin_free 0
text_end_free 0
heuristic None
min_wavefront_length 10
max_distance_threshold 50
steps_between_cutoffs 1
xdrop 20
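As a worked example of how these penalties combine, the sketch below recomputes the score of the Overview alignment from its cigartuples. It assumes the standard gap-affine cost (opening plus extension times length) and a mismatch penalty of 4; the mismatch value is not listed in the table above and is an assumption chosen because it reproduces the reported score of -24. The function names are illustrative only.

```python
def gap_cost(length, opening=6, extension=2):
    """Gap-affine cost of one gap: opening + extension * length."""
    return opening + extension * length


def gap_cost_2p(length, opening=6, extension=2, opening2=24, extension2=1):
    """Two-piece gap-affine cost: the cheaper of the two affine functions."""
    return min(opening + extension * length, opening2 + extension2 * length)


def affine_score(cigartuples, mismatch=4, opening=6, extension=2):
    """Score a cigar under gap-affine penalties, with a match score of 0."""
    penalty = 0
    for code, length in cigartuples:
        if code == 8:  # X: mismatch (penalty of 4 is an assumption)
            penalty += mismatch * length
        elif code in (1, 2):  # I or D: each run is scored as one gap
            penalty += gap_cost(length, opening, extension)
    return -penalty


tuples = [(0, 3), (8, 1), (0, 4), (2, 1), (0, 7), (1, 1), (0, 9), (8, 1), (0, 6)]
print(affine_score(tuples))  # -24: two mismatches (2 * 4) plus two 1 bp gaps (2 * 8)
print(gap_cost_2p(30))       # 54: the second piece (24 + 30) beats the first (6 + 60)
```

With the default values the two pieces cost the same at a gap length of 18, so "affine2p" only changes the penalty for gaps longer than that.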

Modifying the cigar

If desired, the cigar can be modified so that the end operation is either a soft-clip or a match. This makes the alignment cigar resemble those produced by bwa, for example:

from pywfa import WavefrontAligner, cigartuples_to_str

pattern = "AAAAACCTTTTTAAAAAA"
text = "GGCCAAAAACCAAAAAA"
a = WavefrontAligner(pattern)

res = a(text, clip_cigar=False)
print(cigartuples_to_str(res.cigartuples))
>>> 4I7M5D6M

res = a(text, clip_cigar=True)
print(cigartuples_to_str(res.cigartuples))
>>> 4S7M5D6M

An experimental feature is to trim short matches at the end of alignments. This results in alignments that approximate local alignments:

pattern = "AAAAAAAAAAAACCTTTTAAAAAAGAAAAAAA"
text = "ACCCCCCCCCCCAAAAACCAAAAAAAAAAAAA"
a = WavefrontAligner(pattern)

# The unmodified cigar may have short matches at the end:
res = a(text, clip_cigar=False)
res.cigartuples
>>> [(0, 1), (1, 5), (8, 6), (0, 7), (2, 5), (0, 5), (8, 1), (0, 7)]
res.aligned_text
>>> ACCCCCCCCCCCAAAAACCAAAAAAAAAAAAA
res.text_start, res.text_end
>>> 0, 32

# The minimum allowed block of matches can be set at e.g. 5 bp, which will trim off short matches
res = a(text, clip_cigar=True, min_aligned_bases_left=5, min_aligned_bases_right=5)
res.cigartuples
>>> [(4, 12), (0, 7), (2, 5), (0, 5), (8, 1), (0, 7)]
res.aligned_text
>>> AAAAACCAAAAAAAAAAAAA
res.text_start, res.text_end
>>> 12, 32

# Mismatch operations X can also be elided, note this occurs after the clip_cigar stage
res = a(text, clip_cigar=True, min_aligned_bases_left=5, min_aligned_bases_right=5, elide_mismatches=True)
res.cigartuples
>>> [(4, 12), (0, 7), (2, 5), (0, 13)]
res.aligned_text
>>> AAAAACCAAAAAAAAAAAAA

Note: The alignment score is currently not modified by trimming the cigar. However, pattern_start, pattern_end, text_start and text_end are updated when the cigar is modified.
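The two cigar modifications shown above can be approximated in plain Python. The sketch below handles only the left end of the alignment and is written to reproduce the example outputs in this section; it is not the library's implementation, and the names `clip_left` and `elide_mismatches` are hypothetical (pywfa exports its own `clip_cigartuples` and `elide_mismatches_from_cigar`).

```python
TEXT_CONSUMING = {0, 1, 7, 8}  # M, I, = and X each use up text bases


def clip_left(cigartuples, min_aligned=0):
    """Replace leading operations with a soft-clip until a match block
    of at least min_aligned bases is reached.

    Returns the new cigartuples and the number of clipped text bases.
    """
    ops = list(cigartuples)
    clipped = 0
    while ops and (ops[0][0] != 0 or ops[0][1] < min_aligned):
        code, length = ops.pop(0)
        if code in TEXT_CONSUMING:
            clipped += length
    if clipped:
        ops.insert(0, (4, clipped))  # 4 is the code for S
    return ops, clipped


def elide_mismatches(cigartuples):
    """Turn X into M and merge neighbouring M blocks."""
    merged = []
    for code, length in cigartuples:
        if code == 8:
            code = 0
        if merged and merged[-1][0] == code == 0:
            merged[-1] = (0, merged[-1][1] + length)
        else:
            merged.append((code, length))
    return merged


raw = [(0, 1), (1, 5), (8, 6), (0, 7), (2, 5), (0, 5), (8, 1), (0, 7)]
clipped, text_start = clip_left(raw, min_aligned=5)
print(clipped, text_start)  # [(4, 12), (0, 7), (2, 5), (0, 5), (8, 1), (0, 7)] 12
print(elide_mismatches(clipped))  # [(4, 12), (0, 7), (2, 5), (0, 13)]
```

The soft-clip length counts only operations that consume text, which is why the 1M, 5I and 6X at the start sum to 12 and match the reported text_start.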

pywfa's People

Contributors

acenglish, ilia-kats, kcleal

pywfa's Issues

To upgrade wfa2_lib to v2.2

wfa2_lib v2.2 has support for match score calculation. Would be great to upgrade the wrapper for wfa2_lib v2.2.

Running Overview example with conda pywfa package outputs "Illegal instruction", crashes

I'm trying to use the pywfa package from bioconda and it is causing an immediate crash with the message "Illegal instruction". If I clone the repo and install pywfa using pip, it works. I'm using Ubuntu 20.04.5 LTS and the CPU is "AMD EPYC 7402 24-Core Processor".

Steps to reproduce:

  1. Create a conda env with pywfa: conda create -n pywfa pywfa (currently it installs pywfa 0.5.1, build py39hf95cd2a_2)
  2. Activate the conda environment
  3. Copy and paste the example code from the Overview into a file, and run it

It crashes on the line with: a = WavefrontAligner(pattern)

issue with ubuntu 20 installation

Has anyone else had an issue installing with pip on Ubuntu 20.04?
Python version: 3.8.

The error running the test case after installation:

~/.local/lib/python3.8/site-packages/pywfa/tests$ python3 test.py
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    from pywfa.align import WavefrontAligner, clip_cigartuples, cigartuples_to_str, elide_mismatches_from_cigar
  File "/home/ice/.local/lib/python3.8/site-packages/pywfa/__init__.py", line 2, in <module>
    from .align import WavefrontAligner, clip_cigartuples, cigartuples_to_str, elide_mismatches_from_cigar
ImportError: /home/ice/.local/lib/python3.8/site-packages/pywfa/align.cpython-38-x86_64-linux-gnu.so: undefined symbol: wavefront_align

Colab - Import Error "undefined symbol: wavefront_align"

Hello !

I tried running the Overview example on Colab. After successfully installing pywfa-0.4.1 with pip, I tried to import the WavefrontAligner, but the following error is raised:

ImportError                               Traceback (most recent call last)
<ipython-input-5-c314336c1681> in <module>
----> 1 import pywfa.align

/usr/local/lib/python3.9/dist-packages/pywfa/__init__.py in <module>
      1 from __future__ import absolute_import
----> 2 from .align import WavefrontAligner, clip_cigartuples, cigartuples_to_str, elide_mismatches_from_cigar
      3 

ImportError: /usr/local/lib/python3.9/dist-packages/pywfa/align.cpython-39-x86_64-linux-gnu.so: undefined symbol: `wavefront_align`

I will try to run the example locally, but I would like to know whether anyone else has had the same issue or whether there is a way to fix it.

Illegal instruction: 4

Thanks for this tool, however I have some issues in using this. First I tried

pip install pywfa and I successfully imported the package but when I called a = WavefrontAligner(pattern), I get "Illegal instruction: 4" error.

Then I tried building from source, which failed too. I tried to compile from setup.py, which also failed somewhere with this error:
....

gcc-12 -Wall -g -fPIC -O3 -march=native  -DWFA_PARALLEL -fopenmp -I.. -c wavefront.c -o ../build/wavefront.o
make[2]: Leaving directory '/Users/asoylev/Programs/pywfa/pywfa/WFA2_lib/wavefront'
make --directory=bindings/cpp all
make[2]: Entering directory '/Users/asoylev/Programs/pywfa/pywfa/WFA2_lib'
make[2]: *** bindings/cpp: No such file or directory.  Stop.
make[2]: Leaving directory '/Users/asoylev/Programs/pywfa/pywfa/WFA2_lib'
make[1]: *** [Makefile:93: bindings/cpp] Error 2
make[1]: Leaving directory '/Users/asoylev/Programs/pywfa/pywfa/WFA2_lib'

With conda, I get the following error:

conda install -c bioconda pywfa
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: \ 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed                                                                                                            

UnsatisfiableError:

Note that I'm using mac m1. But I tried conda installation with ubuntu and got the same error.

Thank you

Providing output path for `WavefrontAligner.cigar_print_pretty()` doesn't suppress stderr

Hi,

Thanks for making this convenient python package. I've noticed that if I write the formatted alignment to a file using aligner.cigar_print_pretty("/path/to/file.txt") that it still dumps quite a lot of cigar string text to stderr. It would be great to suppress this so that time isn't wasted printing things, and so the output is still readable.

Thanks

cannot import pywfa

Specs:

Ubuntu: 20.04
Python: 3.9.15

I installed pywfa via pip install into a conda environment (using the local pip of the conda environment). When I try to import it I get the following error:

Traceback (most recent call last):

  File "/tmp/ipykernel_44566/3182516229.py", line 1, in <cell line: 1>
    import pywfa

  File "/home/avicenna/miniconda3/envs/ACORG/lib/python3.9/site-packages/pywfa/__init__.py", line 2, in <module>
    from .align import WavefrontAligner, clip_cigartuples, cigartuples_to_str, elide_mismatches_from_cigar

ImportError: /home/avicenna/miniconda3/envs/ACORG/lib/python3.9/site-packages/pywfa/align.cpython-39-x86_64-linux-gnu.so: undefined symbol: wavefront_align

Import Error on M1 Mac Python 3.10

Hello, I'm getting an import error similar to the other one reported here - but different enough that I'll make it a separate issue.

After pip install pywfa

>>> from pywfa.align import WavefrontAligner
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/Caskroom/miniconda/base/envs/py-3.10/lib/python3.10/site-packages/pywfa/__init__.py", line 2, in <module>
    from .align import WavefrontAligner, clip_cigartuples, cigartuples_to_str, elide_mismatches_from_cigar
ImportError: dlopen(/opt/homebrew/Caskroom/miniconda/base/envs/py-3.10/lib/python3.10/site-packages/pywfa/align.cpython-310-darwin.so, 0x0002): symbol not found in flat namespace (_cigar_print_pretty)
>>>
