Comments (8)
Good point! I never considered that. I should really think about how to do that in a general way for most overlap-based methods.
Having PyRanges questions on SO is mostly for my vanity :) But I do follow the tag there :)
from pyranges.
The way to do this would be
gr1.slack(10).join(gr2)
Thanks for trying pyranges, btw. P.S. It has its own tag on Stack Overflow :)
This isn't quite the same though because this first adds the slack to gr1, and then does the join, so it returns the slacked gr1 instead of the original gr1.
+--------------+-----------+-----------+--------------+-----------+-----------+--------------+
| Chromosome | Start | End | Strand | Start_b | End_b | Strand_b |
| (category) | (int32) | (int32) | (category) | (int32) | (int32) | (category) |
|--------------+-----------+-----------+--------------+-----------+-----------+--------------|
| chr1 | 0 | 20 | + | 15 | 20 | + |
+--------------+-----------+-----------+--------------+-----------+-----------+--------------+
I guess one could do gr1.join(gr2.slack(10)), but then the result contains the slacked gr2 coordinates instead of the original ones.
What I'm looking for is functionality similar to the maxgap argument of GenomicRanges::findOverlaps, where the overlap returns the original ranges in gr1 and gr2 but also considers ranges separated by a distance of < maxgap.
Thanks for this package. It makes many workflows that were previously impossible in Python now possible (and fast!). I can move questions to SO and leave the issues for GitHub if that would be preferable.
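For reference, the behaviour asked for here can be sketched without pyranges at all. This is a minimal illustration in plain Python, not how pyranges or GenomicRanges implement it. `join_with_slack` and its `slack` parameter are hypothetical names, intervals are assumed to be half-open (start, end) tuples on a single chromosome, and a pair is reported when it would overlap after widening the left interval by `slack`. Only the test uses the widened coordinates; the original ones are returned.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) on a single chromosome


def join_with_slack(
    left: List[Interval], right: List[Interval], slack: int = 0
) -> List[Tuple[Interval, Interval]]:
    """Pair up intervals that overlap once `left` is widened by `slack`.

    Hypothetical helper: only the overlap test uses the widened
    coordinates; the returned pairs hold the original intervals.
    """
    pairs = []
    for a_start, a_end in left:
        for b_start, b_end in right:
            # Standard half-open overlap test, applied to the widened `a`.
            if a_start - slack < b_end and b_start < a_end + slack:
                pairs.append(((a_start, a_end), (b_start, b_end)))
    return pairs


gr1 = [(0, 10)]
gr2 = [(15, 20)]
print(join_with_slack(gr1, gr2))            # [] since there is no true overlap
print(join_with_slack(gr1, gr2, slack=10))  # [((0, 10), (15, 20))]
```

The nested loop is quadratic and only meant to show the semantics; a real implementation would use a sorted or tree-based lookup.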
Another good reason to have a slack option is that you can require an overlap of at least X by setting the slack to a negative value. Cf. #22. This would potentially also solve this SO question.
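A rough sketch of the negative-slack idea, in plain Python rather than pyranges: shrinking one interval by X on both sides before testing for overlap rejects pairs that only overlap by a few positions at the edges. The helper name `overlaps_after_shrinking` is hypothetical and intervals are assumed to be half-open (start, end) tuples. Note that an interval lying entirely inside the shrunken one still counts, however short it is, so this only approximates "overlap of at least X".

```python
def overlaps_after_shrinking(a, b, min_overlap):
    """True if `a`, shrunk by `min_overlap` on both sides, still overlaps `b`.

    Hypothetical helper; `a` and `b` are half-open (start, end) tuples.
    """
    start, end = a[0] + min_overlap, a[1] - min_overlap
    if start >= end:
        # `a` is too short to survive the shrinking.
        return False
    return start < b[1] and b[0] < end


# Overlap of only 5 positions at the right edge of (0, 100): rejected.
print(overlaps_after_shrinking((0, 100), (95, 120), 10))  # False
# Overlap of 20 positions: accepted.
print(overlaps_after_shrinking((0, 100), (80, 120), 10))  # True
```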
If the minimum Start is greater than the slack, you can do
import pyranges as pr
gr1 = pr.PyRanges(chromosomes="chr1", starts=[0], ends=[10], strands = "+")
gr2 = pr.PyRanges(chromosomes="chr1", starts=[15], ends=[20], strands = "+")
j = gr1.slack(10).join(gr2)
j.slack(-10)
# +--------------+-----------+-----------+--------------+-----------+-----------+--------------+
# | Chromosome | Start | End | Strand | Start_b | End_b | Strand_b |
# | (category) | (int32) | (int32) | (category) | (int32) | (int32) | (category) |
# |--------------+-----------+-----------+--------------+-----------+-----------+--------------|
# | chr1 | 10 | 10 | + | 15 | 20 | + |
# +--------------+-----------+-----------+--------------+-----------+-----------+--------------+
# Stranded PyRanges object has 1 rows and 7 columns from 1 chromosomes.
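As you can see above, this does not work in your example: the Start of 0 cannot be extended below 0, so removing the slack again does not restore the original interval.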
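The arithmetic behind that result can be reproduced in a few lines of plain Python. This is a sketch under the assumption, inferred from the output above, that slack clips negative starts to 0. The `slack` function here is a stand-in written for illustration, not the pyranges method.

```python
def slack(interval, amount):
    """Widen (or, for negative amounts, shrink) an interval on both sides.

    Stand-in for illustration; assumes starts are clipped at 0.
    """
    start, end = interval
    return (max(start - amount, 0), end + amount)


# Start is smaller than the slack: the round trip loses the original.
widened = slack((0, 10), 10)
print(widened)              # (0, 20), the start was clipped at 0
print(slack(widened, -10))  # (10, 10), not the original (0, 10)

# Start is at least as large as the slack: the round trip is lossless.
print(slack(slack((50, 60), 10), -10))  # (50, 60)
```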
I am still working on this. Adding slack to join interfered with the new_pos argument to join, so I am working on a backwards-compatible way to fix it :)
I think I have an idea for how to add slack to overlap-operations:
1. Before doing the operation, duplicate the Start and End columns, but rename the copies to <UUID + Start> and <UUID + End>.
2. Call slack on the gr.
3. Do the operation as usual.
4. After the operation is done, replace Start and End with <UUID + Start> and <UUID + End> and drop the latter.
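The four steps above can be sketched in plain Python, with rows as dicts instead of a real PyRanges object. Everything here is an assumption made for illustration: `with_slack`, `make_join`, and the row layout are hypothetical, clipping at 0 is assumed, and the actual implementation in pyranges may look quite different.

```python
import uuid


def make_join(others):
    """Return a toy join that keeps all left columns and adds Start_b/End_b."""
    def join(rows):
        out = []
        for r in rows:
            for o in others:
                same_chrom = r["Chromosome"] == o["Chromosome"]
                if same_chrom and r["Start"] < o["End"] and o["Start"] < r["End"]:
                    out.append({**r, "Start_b": o["Start"], "End_b": o["End"]})
        return out
    return join


def with_slack(rows, amount, operation):
    prefix = uuid.uuid4().hex  # unlikely to clash with a real column name
    start_key, end_key = prefix + "Start", prefix + "End"
    # 1. Duplicate Start and End under the UUID-prefixed names.
    stashed = [{**r, start_key: r["Start"], end_key: r["End"]} for r in rows]
    # 2. Apply the slack (starts assumed to be clipped at 0).
    for r in stashed:
        r["Start"] = max(r["Start"] - amount, 0)
        r["End"] = r["End"] + amount
    # 3. Run the operation as usual on the widened intervals.
    result = operation(stashed)
    # 4. Restore the original coordinates and drop the helper columns.
    restored = []
    for r in result:
        r = dict(r)
        r["Start"] = r.pop(start_key)
        r["End"] = r.pop(end_key)
        restored.append(r)
    return restored


gr1 = [{"Chromosome": "chr1", "Start": 0, "End": 10}]
gr2 = [{"Chromosome": "chr1", "Start": 15, "End": 20}]
print(with_slack(gr1, 10, make_join(gr2)))
```

Because the original coordinates travel through the operation as ordinary columns, this only works for operations that carry the extra columns of the slacked side into the result.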
Two more things I need to think about:
- In some situations you probably only want to add 5' slack or 3' slack. (This is not applicable to the merge/cluster operations, since these work on a single PyRanges and hence it does not matter: I1 + 5' slack == I2 + 3' slack if I1 and I2 are two consecutive intervals.)
- In other cases you might want to add a different slack to the two ends, like 1 kbp to 5' and 500 bp to 3'.
Should the slack argument take a dict of {"3": 1000, "5": 500}, or just {"3": 1000} if no 5' extension is desired, and just an int if the two cases are equal? Will need to think about this.
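One way the int-or-dict argument could be handled is sketched below. This is only an illustration of the question raised here, not the API that ended up in pyranges: `normalize_slack` and `apply_slack` are hypothetical names, and the strand handling (5' is the Start side on "+" and the End side on "-") and the clipping at 0 are assumptions.

```python
def normalize_slack(slack):
    """Turn an int or a {"5": ..., "3": ...} dict into a (five, three) tuple."""
    if isinstance(slack, int):
        return slack, slack
    unknown = set(slack) - {"5", "3"}
    if unknown:
        raise ValueError(f"unknown slack keys: {sorted(unknown)}")
    # A missing key means no extension at that end.
    return slack.get("5", 0), slack.get("3", 0)


def apply_slack(start, end, strand, slack):
    """Extend one interval; on the minus strand the 5' end is the End side."""
    five, three = normalize_slack(slack)
    left, right = (three, five) if strand == "-" else (five, three)
    return max(start - left, 0), end + right


print(apply_slack(2000, 3000, "+", {"3": 1000, "5": 500}))  # (1500, 4000)
print(apply_slack(2000, 3000, "-", {"3": 1000, "5": 500}))  # (1000, 3500)
print(apply_slack(2000, 3000, "+", {"3": 1000}))            # (2000, 4000)
print(apply_slack(2000, 3000, "+", 10))                     # (1990, 3010)
```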
This is out in master :) Example usage: https://github.com/biocore-ntnu/pyranges/blob/master/tests/slack/test_slack.py