
iranges's Introduction


Integer ranges in Python

Python implementation of the IRanges Bioconductor package.

To get started, install the package from PyPI:

pip install iranges

# To install optional dependencies
pip install iranges[optional]

IRanges

An IRanges holds a start position and a width, and is most typically used to represent coordinates along some genomic sequence. The interpretation of the start position depends on the application; for sequences, the start is usually a 1-based position, but other use cases may allow zero or even negative values.

from iranges import IRanges

starts = [1, 2, 3, 4]
widths = [4, 5, 6, 7]
x = IRanges(starts, widths)

print(x)
 ## output
 IRanges object with 4 ranges and 0 metadata columns
                start              end            width
 <ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
 [0]                1                5                4
 [1]                2                7                5
 [2]                3                9                6
 [3]                4               11                7
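The relationship between the three printed columns can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `range_ends` is a hypothetical helper, and it assumes the printed end is exclusive, i.e. `end = start + width`, which is what the output above suggests.

```python
def range_ends(starts, widths):
    """Return exclusive end positions, assuming end = start + width."""
    if len(starts) != len(widths):
        raise ValueError("starts and widths must have the same length")
    return [s + w for s, w in zip(starts, widths)]


starts = [1, 2, 3, 4]
widths = [4, 5, 6, 7]
ends = range_ends(starts, widths)
print(ends)  # [5, 7, 9, 11]
```

Under this assumption a range with start 1 and width 4 covers positions 1, 2, 3 and 4, and the printed end of 5 is the first position past the range.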

Interval Operations

IRanges supports most interval-based operations. For example, to compute gaps:

x = IRanges([-2, 6, 9, -4, 1, 0, -6, 10], [5, 0, 6, 1, 4, 3, 2, 3])

gaps = x.gaps()
print(gaps)
 ## output

 IRanges object with 2 ranges and 0 metadata columns
                start              end            width
 <ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
 [0]               -3               -2                1
 [1]                5                9                4
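To see where the two gaps come from, the same computation can be sketched without the library. The helpers `merge_ranges` and `find_gaps` are hypothetical names used only for illustration. The sketch assumes half-open intervals `[start, start + width)` and ignores zero-width ranges, which may differ from how the package handles them internally.

```python
def merge_ranges(starts, widths):
    """Merge overlapping or adjacent half-open intervals [start, start + width)."""
    # Zero-width ranges are skipped in this sketch (an assumption).
    intervals = sorted((s, s + w) for s, w in zip(starts, widths) if w > 0)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def find_gaps(starts, widths):
    """Return (start, end) pairs for the uncovered stretches between merged ranges."""
    merged = merge_ranges(starts, widths)
    return [(left[1], right[0]) for left, right in zip(merged, merged[1:])]


result = find_gaps([-2, 6, 9, -4, 1, 0, -6, 10], [5, 0, 6, 1, 4, 3, 2, 3])
print(result)  # [(-3, -2), (5, 9)]
```

The two pairs correspond to the rows printed above: start -3 with width 1, and start 5 with width 4.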

Or perform interval set operations:

x = IRanges([1, 5, -2, 0, 14], [10, 5, 6, 12, 4])
y = IRanges([14, 0, -5, 6, 18], [7, 3, 8, 3, 3])

intersection = x.intersect(y)
print(intersection)
 ## output
 IRanges object with 3 ranges and 0 metadata columns
                start              end            width
 <ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
 [0]               -2                3                5
 [1]                6                9                3
 [2]               14               18                4
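The intersection above can be reproduced with a short two-pointer sweep over the merged ranges. This is a sketch under the assumption of half-open intervals. `normalize` and `intersect_ranges` are hypothetical helpers, not part of the iranges API.

```python
def normalize(starts, widths):
    """Sort and merge half-open intervals [start, start + width) into disjoint ones."""
    intervals = sorted((s, s + w) for s, w in zip(starts, widths) if w > 0)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def intersect_ranges(a, b):
    """Intersect two sorted lists of disjoint [start, end) intervals."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


x = normalize([1, 5, -2, 0, 14], [10, 5, 6, 12, 4])
y = normalize([14, 0, -5, 6, 18], [7, 3, 8, 3, 3])
common = intersect_ranges(x, y)
print(common)  # [(-2, 3), (6, 9), (14, 18)]
```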

Overlap operations

IRanges uses nested containment lists under the hood to perform fast overlap and search-based operations. These methods typically return a list with one entry per interval in the query, each entry holding the indices of the matching subject intervals.

subject = IRanges([2, 2, 10], [1, 2, 3])
query = IRanges([1, 4, 9], [5, 4, 2])

overlap = subject.find_overlaps(query)
print(overlap)
 ## output
 [[1, 0], [], [2]]
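For comparison, a brute-force version of the same query makes the overlap rule explicit. This sketch does not use nested containment lists and is quadratic in the number of ranges. `find_overlaps_naive` is a hypothetical name, and the rule assumes half-open intervals, so two ranges overlap when each starts before the other ends.

```python
def find_overlaps_naive(subject, query):
    """For each query (start, width), list the indices of overlapping subject ranges."""
    hits = []
    for q_start, q_width in query:
        q_end = q_start + q_width
        hits.append([
            idx
            for idx, (s_start, s_width) in enumerate(subject)
            if s_start < q_end and q_start < s_start + s_width
        ])
    return hits


subject = [(2, 1), (2, 2), (10, 3)]
query = [(1, 5), (4, 4), (9, 2)]
overlaps = find_overlaps_naive(subject, query)
print(overlaps)  # [[0, 1], [], [2]]
```

The indices within each entry come out in ascending order here, whereas the library output above lists them as `[1, 0]`; the set of hits per query is the same.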

Similarly, one can perform search operations such as follow, precede or nearest.

query = IRanges([1, 3, 9], [2, 5, 2])
subject = IRanges([3, 5, 12], [1, 2, 1])

nearest = subject.nearest(query, select="all")
print(nearest)
 ## output
 [[0], [0, 1], [2]]
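The result above is consistent with a simple distance rule, sketched here in plain Python. This is an assumption about the semantics, not a description of the library's algorithm: overlapping or adjacent half-open ranges are treated as distance 0, and all subjects tied for the minimum distance are returned, mimicking `select="all"`. `nearest_naive` is a hypothetical helper.

```python
def nearest_naive(subject, query):
    """For each query (start, width), return indices of the closest subject ranges."""
    results = []
    for q_start, q_width in query:
        q_end = q_start + q_width
        distances = []
        for s_start, s_width in subject:
            s_end = s_start + s_width
            # Gap between two half-open intervals; 0 if they overlap or touch.
            distances.append(max(0, s_start - q_end, q_start - s_end))
        best = min(distances)
        results.append([i for i, d in enumerate(distances) if d == best])
    return results


query = [(1, 2), (3, 5), (9, 2)]
subject = [(3, 1), (5, 2), (12, 1)]
closest = nearest_naive(subject, query)
print(closest)  # [[0], [0, 1], [2]]
```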

Further Information

Note

This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.

iranges's People

Contributors

jkanche, ltla, pre-commit-ci[bot]

iranges's Issues

Question about the coordinate systems in use

Thanks for developing the genomic tool stack in Python. As a user from the R/Bioconductor ecosystem, I would like some clarification on the coordinate system in use.

Here is the current description of the coordinate system from BiocPy's documentation:

An IRanges holds a start position and a width, and is most typically used to represent coordinates along some genomic sequence. The interpretation of the start position depends on the application; for sequences, the start is usually a 1-based position, ...

>>> from iranges import IRanges
>>> starts = [-2, 6, 9, -4, 1, 0, -6, 10]
>>> widths = [5, 0, 6, 1, 4, 3, 2, 3]
>>> ir = IRanges(starts, widths)
>>> print(ir)
IRanges object with 8 ranges and 0 metadata columns
               start              end            width
    <ndarray[int32]> <ndarray[int32]> <ndarray[int32]>
[0]               -2                3                5
[1]                6                6                0
[2]                9               15                6
[3]               -4               -3                1
[4]                1                5                4
[5]                0                3                3
[6]               -6               -4                2
[7]               10               13                3

This seems to indicate that BiocPy's coordinate system uses a 1-based, half-open interval, e.g. [start, end).

However, this behavior differs from R's IRanges, where the coordinate system uses a 1-based closed interval, e.g. [start, end]:

> library(IRanges)
> starts = c(-2, 6, 9, -4, 1, 0, -6, 10)
> width = c(5, 0, 6, 1, 4, 3, 2, 3)
> ir <- IRanges(start = starts, width=width)
> ir
IRanges object with 8 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]        -2         2         5
  [2]         6         5         0
  [3]         9        14         6
  [4]        -4        -4         1
  [5]         1         4         4
  [6]         0         2         3
  [7]        -6        -5         2
  [8]        10        12         3

I wanted to clarify whether this design is intentional. If so, could the documentation note the difference between BiocPy's and R's IRanges? I was also wondering if we could add some utility functions to convert between the two coordinate systems, so that it is easy to port existing R scripts that expect a 1-based closed interval.

Thank you again for making the tool!
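The kind of conversion utility requested above could look like the following. This is only a sketch: `half_open_to_closed` and `closed_to_half_open` are hypothetical names, not existing functions in iranges, and the sketch assumes that starts are identical in both systems and only the end convention differs by one.

```python
def half_open_to_closed(ends):
    """Convert exclusive ends (start + width) to inclusive ends (start + width - 1)."""
    return [e - 1 for e in ends]


def closed_to_half_open(ends):
    """Convert inclusive ends (start + width - 1) to exclusive ends (start + width)."""
    return [e + 1 for e in ends]


# Ends as printed by the Python example in this issue.
python_ends = [3, 6, 15, -3, 5, 3, -4, 13]
r_ends = half_open_to_closed(python_ends)
print(r_ends)  # [2, 5, 14, -4, 4, 2, -5, 12]
```

The converted values match the `end` column of the R output shown in the issue, and applying `closed_to_half_open` to them returns the original Python ends.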
