Thanks for developing this excellent resource.
On this point, I'm happy with the current solution posted on readthedocs, `pr.PyRanges(pandas.read_table(f))`. However, in terms of being 'too strict', this approach did give me the following error message:
The dataframe does not have all the columns Chromosome, Start and End
This surprised me until I realized that my columns were named "chrom", "start", and "end" (lower-case), so they didn't match. The `pr.read_bed` function was much more forgiving of variations on these names.
Of course I can manually rename these to the required "Chromosome", "Start" and "End" (capitalized), but "chrom/start/end" are the standard column names in the output of bedtools functions like unionbedg.
With the current solution based on `pr.PyRanges(pandas.read_table(f))`, is there a way to accommodate column labels like these? In the long run, it would be ideal if `read_bedlike` (or whatever it ends up being called) accepted these labels, so that it works smoothly with bedtools, which is pretty important for anyone working with BED-like files.
Thanks again for this project, it looks really promising.
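Since only the header names differ, one workaround is to normalise them before building the PyRanges object. The sketch below is hypothetical: neither `normalize_bed_columns` nor the `ALIASES` table is part of PyRanges. It assumes the headers differ from the required names only in case, or by a leading `#` as in commented header lines, and it uses only pandas.

```python
import pandas as pd

# Hypothetical alias table: lower-cased header -> name PyRanges expects.
ALIASES = {
    "chrom": "Chromosome",
    "chromosome": "Chromosome",
    "start": "Start",
    "end": "End",
    "strand": "Strand",
}


def normalize_bed_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with bedtools-style headers renamed for PyRanges."""
    mapping = {}
    for col in df.columns:
        # Ignore case and any leading '#' when looking up the alias.
        key = str(col).lstrip("#").lower()
        if key in ALIASES:
            mapping[col] = ALIASES[key]
    # Columns without an alias (e.g. score columns) are left unchanged.
    return df.rename(columns=mapping)


df = pd.DataFrame({"chrom": ["chr1"], "start": [10], "end": [20], "score": [3]})
print(list(normalize_bed_columns(df).columns))
# ['Chromosome', 'Start', 'End', 'score']
```

Used as `pr.PyRanges(normalize_bed_columns(pd.read_table(f)))`. If a file already contains both spellings (say `chrom` and `Chromosome`), this rename would produce duplicate column names, so a real reader would need to detect that case.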
DataFrames can only have one name per column. Therefore I cannot allow "chrom" and "Chromosome" to refer to the same column.
If I allowed PyRanges to have aliases for column names (e.g. `start` and `Start`), code that used the DataFrames underlying the PyRanges objects would break, since pandas implements no column aliasing.
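The limitation is visible in plain pandas: columns are looked up by their exact, case-sensitive label, and renaming replaces a column's single label rather than adding a second one. A small illustration using only pandas:

```python
import pandas as pd

df = pd.DataFrame({"Chromosome": ["chr1"], "Start": [100], "End": [200]})

print(df["Start"].tolist())  # [100]

# Labels are case-sensitive and there is no aliasing, so "start" is a different key.
try:
    df["start"]
except KeyError:
    print("no column named 'start'")

# Renaming swaps the one label a column has; it does not add an alias.
renamed = df.rename(columns={"Start": "start"})
print("Start" in renamed.columns, "start" in renamed.columns)  # False True
```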
One possibility is to write a function to read bedtools-like data:
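```python
import pandas as pd
import pyranges as pr

def read_bedtools(f):
    # Rename bedtools-style lower-case headers to the names PyRanges requires
    df = pd.read_table(f)
    df = df.rename(columns={"chrom": "Chromosome", "start": "Start", "end": "End", "strand": "Strand"})
    return pr.PyRanges(df)

# the opposite for write_bedtools
```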
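A possible `write_bedtools` is sketched below. Only the name comes from the comment above; the implementation is a hypothetical sketch that inverts the same rename and writes a tab-separated file with a header line. It takes a plain pandas DataFrame (for example `gr.df`, assuming a PyRanges version that exposes the underlying frame that way) and avoids any other PyRanges API.

```python
import io

import pandas as pd

# Inverse of the rename used when reading bedtools-style files.
TO_BEDTOOLS = {"Chromosome": "chrom", "Start": "start", "End": "end", "Strand": "strand"}


def write_bedtools(df: pd.DataFrame, f) -> None:
    """Write df as tab-separated text with bedtools-style lower-case headers.

    Hypothetical helper: f is a path or writable text buffer accepted by
    DataFrame.to_csv.
    """
    out = df.rename(columns=TO_BEDTOOLS)
    # Keep the header line and drop the pandas index.
    out.to_csv(f, sep="\t", index=False)


buf = io.StringIO()
frame = pd.DataFrame({"Chromosome": ["chr1"], "Start": [10], "End": [20]})
write_bedtools(frame, buf)
print(buf.getvalue().splitlines()[0].split("\t"))  # ['chrom', 'start', 'end']
```

Columns outside the mapping are written unchanged, and keeping the header means the output can be read back with the same `pd.read_table` plus rename approach.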
Thanks for the question and the kind words btw.