Comments (8)

endrebak avatar endrebak commented on June 17, 2024 1

Good point! I never considered that. I should really think about how to do that in a general way for most overlap-based methods.

Having PyRanges questions on SO is mostly for my vanity :) But I do follow the tag there :)

from pyranges.

endrebak avatar endrebak commented on June 17, 2024

The way to do this would be

gr1.slack(10).join(gr2)

Thanks for trying pyranges btw. Ps. it has its own tag at Stack Overflow :)

jnasser3 avatar jnasser3 commented on June 17, 2024

This isn't quite the same though because this first adds the slack to gr1, and then does the join, so it returns the slacked gr1 instead of the original gr1.

+--------------+-----------+-----------+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |   Start_b |     End_b | Strand_b     |
| (category)   |   (int32) |   (int32) | (category)   |   (int32) |   (int32) | (category)   |
|--------------+-----------+-----------+--------------+-----------+-----------+--------------|
| chr1         |         0 |        20 | +            |        15 |        20 | +            |
+--------------+-----------+-----------+--------------+-----------+-----------+--------------+

I guess one could do gr1.join(gr2.slack(10)) but that will also change gr2.

What I'm looking for is functionality similar to the maxgap argument of GenomicRanges::findOverlaps, where the overlap returns the original ranges in gr1 and gr2 but also considers ranges separated by a distance of < maxgap.
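To make the request concrete, here is a minimal pure-Python sketch (not PyRanges or GenomicRanges code) of a join that pairs intervals lying closer than `maxgap` while reporting the original coordinates of both sides. The function name `join_with_maxgap`, the tuple layout, and the half-open coordinates are assumptions for illustration. Following the wording above, a pair counts as a hit when its gap is strictly smaller than `maxgap`.

```python
def join_with_maxgap(ranges_a, ranges_b, maxgap=0):
    """Pair intervals whose gap is smaller than `maxgap`.

    Intervals are (chromosome, start, end) tuples with half-open coordinates.
    The original, unextended coordinates are returned for both sides.
    """
    hits = []
    for chrom_a, start_a, end_a in ranges_a:
        for chrom_b, start_b, end_b in ranges_b:
            if chrom_a != chrom_b:
                continue
            # Distance between the intervals; negative when they truly overlap.
            gap = max(start_a, start_b) - min(end_a, end_b)
            if gap < maxgap:
                hits.append(((chrom_a, start_a, end_a), (chrom_b, start_b, end_b)))
    return hits


gr1 = [("chr1", 0, 10)]
gr2 = [("chr1", 15, 20)]
hits_loose = join_with_maxgap(gr1, gr2, maxgap=10)  # gap of 5 is tolerated
hits_strict = join_with_maxgap(gr1, gr2, maxgap=0)  # plain overlap only
```

With `maxgap=0` this reduces to an ordinary overlap test, so the two intervals from the example are only paired once some gap is allowed.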

Thanks for this package. It makes many workflows that were previously impossible in Python now possible (and fast!). I can move questions to SO and reserve GitHub for issues if that would be preferable.

endrebak avatar endrebak commented on June 17, 2024

Another good reason to have a slack-option is that you can require at least an overlap of X, by setting the slack to a negative value. Cf. #22. This would potentially also solve this SO question

endrebak avatar endrebak commented on June 17, 2024

If the minimum Start is greater than the slack, you can do

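import pyranges as pr
gr1 = pr.PyRanges(chromosomes="chr1", starts=[0], ends=[10], strands="+")
gr2 = pr.PyRanges(chromosomes="chr1", starts=[15], ends=[20], strands="+")
# extend gr1 by 10 on both sides, then join with gr2
j = gr1.slack(10).join(gr2)
# shrink the joined intervals by 10 to try to recover the original coordinates
j.slack(-10)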
# +--------------+-----------+-----------+--------------+-----------+-----------+--------------+
# | Chromosome   |     Start |       End | Strand       |   Start_b |     End_b | Strand_b     |
# | (category)   |   (int32) |   (int32) | (category)   |   (int32) |   (int32) | (category)   |
# |--------------+-----------+-----------+--------------+-----------+-----------+--------------|
# | chr1         |        10 |        10 | +            |        15 |        20 | +            |
# +--------------+-----------+-----------+--------------+-----------+-----------+--------------+
# Stranded PyRanges object has 1 rows and 7 columns from 1 chromosomes.

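As you can see above, this does not work in your example: the Start of 0 cannot go below 0 when the slack is added, so subtracting the slack again gives 10-10 instead of the original 0-10.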
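The failure mode can be reproduced without PyRanges. The sketch below uses a hypothetical helper, `clipped_slack`, that stands in for the slack operation and assumes starts are clipped at 0, which is what the 0-20 row earlier in the thread suggests.

```python
def clipped_slack(interval, amount):
    """Extend (or, for negative amounts, shrink) a (start, end) interval on both sides.

    Assumption for this sketch: a start that would become negative is clipped to 0.
    """
    start, end = interval
    return (max(start - amount, 0), end + amount)


near_origin = (0, 10)
extended = clipped_slack(near_origin, 10)     # start would be -10, clipped to 0
round_trip = clipped_slack(extended, -10)     # does not restore (0, 10)

far_from_origin = (50, 60)
safe_round_trip = clipped_slack(clipped_slack(far_from_origin, 10), -10)
```

The round trip only restores the input when no clipping happened, which is why the trick needs every Start to be at least as large as the slack.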

endrebak avatar endrebak commented on June 17, 2024

I am still working on this. Adding slack to join interfered with the new_pos argument to join, so I am working on a backwards-compatible way to fix it :)

endrebak avatar endrebak commented on June 17, 2024

I think I have an idea for how to add slack to overlap-operations:

1. before doing the operations, duplicate the Start and End columns,
   but rename to <UUID + Start>, <UUID + End>
2. call slack on the gr
3. do operation as usual
4. after the operation is done, replace Start and End
   with <UUID + Start> and <UUID + End> and drop the latter
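The four steps can be sketched in plain Python, with rows held as dicts rather than in a PyRanges object. The function name `join_with_slack`, the row layout, and the clipping of starts at 0 are assumptions for illustration, not the eventual PyRanges implementation.

```python
import uuid


def join_with_slack(rows_a, rows_b, slack):
    """Join on slack-extended coordinates, then report the original ones."""
    # Step 1: stash the original coordinates under collision-proof keys.
    tag = uuid.uuid4().hex
    start_key, end_key = tag + "Start", tag + "End"
    extended = []
    for row in rows_a:
        row = dict(row)  # copy, so the caller's rows are left untouched
        row[start_key], row[end_key] = row["Start"], row["End"]
        # Step 2: apply the slack (start clipped at 0).
        row["Start"] = max(row["Start"] - slack, 0)
        row["End"] = row["End"] + slack
        extended.append(row)

    # Step 3: the usual overlap join, here a naive nested loop.
    joined = []
    for a in extended:
        for b in rows_b:
            same_chrom = a["Chromosome"] == b["Chromosome"]
            if same_chrom and a["Start"] < b["End"] and b["Start"] < a["End"]:
                joined.append({**a, "Start_b": b["Start"], "End_b": b["End"]})

    # Step 4: restore the original coordinates and drop the temporary keys.
    for row in joined:
        row["Start"] = row.pop(start_key)
        row["End"] = row.pop(end_key)
    return joined


gr1 = [{"Chromosome": "chr1", "Start": 0, "End": 10}]
gr2 = [{"Chromosome": "chr1", "Start": 15, "End": 20}]
result = join_with_slack(gr1, gr2, slack=10)
no_slack = join_with_slack(gr1, gr2, slack=0)
```

Because the originals are stashed before the slack is applied, the result keeps Start 0 and End 10 even though the start was clipped during the join.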

Two more things I need to think about:

  1. in some situations you probably only want to add 5' slack or 3' slack. (This is not applicable to the merge/cluster operations: these work within a single pyranges, so the side does not matter, since I1 + 5' slack == I2 + 3' slack if I1 and I2 are two consecutive intervals.)

  2. in other cases you might want to add a different slack to the two, like 1Kbp to 5' and 500bp to 3'.

Should the slack-argument take a dict of {"3": 1000, "5": 500} or just {"3": 1000} if no 5' extension is desired and just an int if the two cases are equal? Will need to think about this.
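One way to picture the int-or-dict option is a small normalising helper. This is only a sketch of the idea under discussion: the names `normalize_slack` and `apply_slack` are hypothetical, and it assumes that on the minus strand the 5' end is the End coordinate and that starts are clipped at 0.

```python
def normalize_slack(slack):
    """Return (five_prime, three_prime) from an int or a {"5": ..., "3": ...} dict."""
    if isinstance(slack, int):
        return slack, slack
    # A missing key means no extension on that side.
    return slack.get("5", 0), slack.get("3", 0)


def apply_slack(start, end, strand, slack):
    """Extend a stranded interval; the 5' side depends on the strand."""
    five, three = normalize_slack(slack)
    if strand == "-":
        left, right = three, five  # 5' end is at End on the minus strand
    else:
        left, right = five, three
    return max(start - left, 0), end + right


plus = apply_slack(1000, 2000, "+", {"5": 500, "3": 1000})
minus = apply_slack(1000, 2000, "-", {"5": 500, "3": 1000})
only_three = apply_slack(1000, 2000, "+", {"3": 1000})
both = apply_slack(1000, 2000, "+", 100)
```

Accepting either form keeps the common symmetric case short while still allowing one-sided or asymmetric extensions.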

endrebak avatar endrebak commented on June 17, 2024

This is out in master :) Example usage: https://github.com/biocore-ntnu/pyranges/blob/master/tests/slack/test_slack.py
