petl-developers / petlx
Optional extensions for petl based on third party libraries.
License: MIT License
A todataframe() convenience function to load a table into a pandas DataFrame would be useful (probably have to go via numpy structured array).
Hook packages into petl.fluent so they can be used in the fluent style, e.g., etl().fromgff3().
Proposed to add functions fromgff3, gff3unpackinfo and gff3intervaljoin for working with gff3 annotation files.
Ensure display() and displayall() functions in ipython integration module support unicode.
Link points to petl 0.3, very out of date!
Proposed to add utility function fromflagstat as convenience for parsing outputs of samtools flagstat.
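As a rough illustration of what such a parser might do, here is a minimal sketch. It assumes flagstat lines have the form "N + M description" (QC-passed count, QC-failed count, free text); the name parse_flagstat and the output field names are hypothetical, not the petlx API.

```python
import re

# Assumed line shape: "1000 + 0 in total (QC-passed reads + QC-failed reads)"
_FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def parse_flagstat(lines):
    """Build a petl-style table (header row followed by data rows)
    from an iterable of flagstat text lines. Hypothetical helper."""
    table = [("description", "qc_passed", "qc_failed")]
    for line in lines:
        m = _FLAGSTAT_LINE.match(line.strip())
        if m:  # silently skip lines that do not match the assumed shape
            table.append((m.group(3), int(m.group(1)), int(m.group(2))))
    return table


sample = [
    "1000 + 0 in total (QC-passed reads + QC-failed reads)",
    "950 + 0 mapped (95.00% : N/A)",
]
flagstat_table = parse_flagstat(sample)
```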
There are several flags available when opening an xlsx workbook, proposed to add these also to fromxlsx and pass through to openpyxl:
guess_types will enable (default) or disable type inference when reading cells.
data_only controls whether cells with formulae have either the formula (default) or the value stored the last time Excel read the sheet.
keep_vba controls whether any Visual Basic elements are preserved or not (default). If they are preserved they are still not editable.
Proposed to add some simple vcf utility functions for reshaping vcf files into various simpler table forms. E.g., fromvcf, vcfmeltsamples, vcfunpackinfo, vcfunpacksamples, vcfheader.
The attributes field is None if the actual field has a trailing semicolon.
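One way to tolerate a trailing semicolon is to skip the empty piece it leaves behind when splitting. The sketch below is an assumption about how a fix could look, not the actual gff3 module code; parse_gff3_attributes is a hypothetical name, and URL-unescaping of values is left out.

```python
def parse_gff3_attributes(field):
    """Parse a GFF3 attributes column such as "ID=gene1;Name=foo;"
    into a dict. Hypothetical helper for illustration only."""
    attrs = {}
    for pair in field.strip().split(";"):
        pair = pair.strip()
        if not pair:
            continue  # empty piece produced by a trailing semicolon
        key, sep, value = pair.partition("=")
        if sep:  # ignore pieces without a "key=value" shape
            attrs[key] = value
    return attrs
```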
Add a display() function/method to the ipython integration so you can get multiple tables to display their output from the same code cell.
I wonder if it would be useful to have a wiki for this project. Now that I'm trying to contribute, it would be nice to organize questions, new information, etc. in one place. Such things could easily get lost in the Google group. For example, "How do I run the test cases?".
Is it worth caching the trees in interval join containers? Currently trees are built each time an iterator is requested.
Proposed to add support for extracting from a GFF3 file for a specific region where the GFF is tabix indexed.
Proposed to add functions based on bounding box queries via the rtree module - http://toblerity.github.com/rtree/tutorial.html
E.g., bboxlookup(), bboxlookupone(), bboxjoin()
Proposed to add function intervalsubtract() to interval module behaving as bedtools subtract.
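The core of a subtract operation, removing from one interval every portion covered by a set of other intervals, can be sketched as below. This assumes half-open (start, stop) intervals and covers only the single-interval case; subtract_intervals is a hypothetical name and the real function would work on whole tables.

```python
def subtract_intervals(a, bs):
    """Return the parts of interval a = (start, stop) that are not
    covered by any interval in bs. Intervals are assumed half-open."""
    start, stop = a
    out = []
    for b_start, b_stop in sorted(bs):
        if b_stop <= start:
            continue  # b lies entirely before the remaining part of a
        if b_start >= stop:
            break  # b and all later intervals lie after a
        if b_start > start:
            out.append((start, b_start))  # keep the uncovered gap
        start = max(start, b_stop)
        if start >= stop:
            break  # nothing of a is left
    if start < stop:
        out.append((start, stop))  # keep the uncovered tail
    return out
```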
Proposed to add to/from functions for working with mongodb.
Would be nice to select rows from a given chromosome, without also having to provide coords.
Change default display to 5 rows similar to petl.head().
intervalleftjoin() should work if right table is missing one of the facet keys found in the left
Proposed to add 'Sheet1' as default sheet name.
It would be nice to be able to do...
a = tbl.values('foo').array()
RowContainer has moved in petl 0.11, need to update interval and gff3 modules.
Proposed to add module providing adapters for htsql. Including fromhtsql taking htsql object or connection string followed by query.
Currently the dtype for a structured array is inferred one column at a time. However, passing a sample of the data to np.rec.array() would infer a dtype for the whole table in one go, and would simplify the code.
It would be useful to have functions that support joining tables by overlapping ranges, rather than exact key values. E.g., to join a list of positions in a genome with records from a gene annotation table.
The proposal is to add functions intervaljoin and facetintervaljoin which join two tables based on overlapping ranges, with the faceted version combining a conventional key-based join with a range join.
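To make the idea of joining on overlapping ranges concrete, here is a deliberately naive sketch. It compares every left row with every right row, so it is quadratic; an interval tree would be the natural replacement for larger tables. The function name, argument names, and the half-open overlap test are assumptions for illustration, not the proposed signature.

```python
def interval_join_sketch(left, right, lstart, lstop, rstart, rstop):
    """Yield joined rows for every pair of left/right rows whose
    ranges overlap. Tables are lists with a header row first."""
    lhdr, rhdr = left[0], right[0]
    ls, le = lhdr.index(lstart), lhdr.index(lstop)
    rs, re_ = rhdr.index(rstart), rhdr.index(rstop)
    yield tuple(lhdr) + tuple(rhdr)
    for lrow in left[1:]:
        for rrow in right[1:]:
            # half-open overlap test (an assumption about coordinates)
            if lrow[ls] < rrow[re_] and rrow[rs] < lrow[le]:
                yield tuple(lrow) + tuple(rrow)


positions = [("pos_start", "pos_stop", "snp"), (5, 6, "a"), (50, 51, "b")]
genes = [("start", "stop", "gene"), (1, 10, "g1"), (40, 60, "g2"), (100, 200, "g3")]
joined = list(interval_join_sketch(positions, genes,
                                   "pos_start", "pos_stop", "start", "stop"))
```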
In addition to the existing interval lookup functions, it is proposed to add faceted versions of these functions, to allow for construction and query of multiple interval trees. The motivating use case is lookup of genomic locations, where you want one interval tree per chromosome.
The proposal is to add functions facetintervallookup, facetintervallookupone, facetintervalrecordlookup, facetintervalrecordlookupone, as faceted versions of the existing interval lookup functions.
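The faceting idea, one index per facet value such as one per chromosome, might look like the following sketch. A plain list per facet stands in for the interval tree, and the names and half-open overlap test are assumptions rather than the proposed API.

```python
from collections import defaultdict


def facet_interval_lookup_sketch(table, facet, start, stop):
    """Group rows by facet value and return a find(key, qstart, qstop)
    function that returns the rows overlapping the query range."""
    hdr = table[0]
    fi, si, ei = hdr.index(facet), hdr.index(start), hdr.index(stop)
    index = defaultdict(list)  # facet value -> [(start, stop, row), ...]
    for row in table[1:]:
        index[row[fi]].append((row[si], row[ei], row))

    def find(key, qstart, qstop):
        # a linear scan stands in for an interval tree query
        return [row for s, e, row in index.get(key, [])
                if s < qstop and qstart < e]

    return find


annotations = [
    ("chrom", "start", "stop", "gene"),
    ("chr1", 1, 100, "g1"),
    ("chr1", 200, 300, "g2"),
    ("chr2", 1, 100, "g3"),
]
find = facet_interval_lookup_sketch(annotations, "chrom", "start", "stop")
```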
Since petl 0.16 the cachetag convention is deprecated, remove cachetag methods in petlx and dependencies on deprecated members.
Proposed to add fromsav using the spss recipe on activestate's website.
Proposed to add functions for working with hdf5 via pytables.
Proposed to add fromdta using statsmodels.
Add support for working directly with Excel (XLS) files, probably via xlrd.
Proposed to add utility function to petlx.interval to return collapsed interval from a table with start, stop coords.
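If "collapsed" is read as merging overlapping or abutting intervals into the smallest set of covering intervals (an assumption, since the issue does not spell it out), the operation could be sketched like this. The name collapse_intervals is hypothetical.

```python
def collapse_intervals(intervals):
    """Merge overlapping or abutting (start, stop) pairs and return
    the merged intervals sorted by start."""
    merged = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # extend the previous interval rather than starting a new one
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged
```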
Add fromarray() function to petlx.array module (was postponed from #1).
The import pattern has changed: petl-developers/petl#230
Need to modify the integration module accordingly.
Modify fromxlsx to extract from first sheet in workbook, rather than fixed name 'Sheet1'.
Add a package petlx.ipython with function display() which takes table, converts to HTML and inlines in notebook.
I suspect fromtabix will currently break on files with no header rows.
The result of fromxlsx is not properly wrapped when using petl.fluent or petl.interactive.
Request convenience function torecarray() in petlx.array, to save having to type ".view(recarray)" all the time.
Add a range keyword argument to fromxlsx to allow extracting a table from a specific cell range.
Current use of the suffix notation in the petlx.interval module is not appropriate as start and stop values are not indices and so not a slice. Proposed to change to a find() method as per the underlying bx-python module.
>>> from petlx import intervallookup
Should import from petl.interval package.
Implement a fromsoup() method using Beautiful Soup to provide more flexibility and power for extracting tables from XML or HTML.
It would be convenient to be able to perform an interval left join but then have matching values from one or more fields of the right-hand table given as a list of values in a new column. I.e., the output would have one row per input row in the left-hand table.
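The requested output shape, one row per left row with the matching right-hand values gathered into a list, can be illustrated with a naive sketch. The function and argument names are hypothetical, only a single value field is handled, and half-open coordinates are assumed.

```python
def interval_left_join_values(left, right, lstart, lstop, rstart, rstop, value):
    """Yield each left row extended with a list of the `value` field
    from every overlapping right row (empty list if none overlap)."""
    lhdr, rhdr = left[0], right[0]
    ls, le = lhdr.index(lstart), lhdr.index(lstop)
    rs, re_ = rhdr.index(rstart), rhdr.index(rstop)
    vi = rhdr.index(value)
    yield tuple(lhdr) + (value,)
    for lrow in left[1:]:
        matches = [rrow[vi] for rrow in right[1:]
                   if lrow[ls] < rrow[re_] and rrow[rs] < lrow[le]]
        yield tuple(lrow) + (matches,)


regions = [("start", "stop"), (0, 50), (500, 600)]
features = [("fstart", "fstop", "name"), (10, 20, "x"), (40, 60, "y"), (90, 95, "z")]
listed = list(interval_left_join_values(regions, features,
                                        "start", "stop", "fstart", "fstop", "name"))
```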
Proposed to add a package petlx.tabix with function fromtabix which supports extracting data from a tab delimited file specifying a sequence region and coords.
This is a placeholder for adding support for creating a database table prior to loading it. I.e., a function similar to the standard petl.todb() function, but automatically generate a schema definition based on the table to be loaded, and execute the table creation, prior to loading.
It looks like sqlalchemy has good support for managing different SQL dialects, so proposed to use sqlalchemy as a dependency.
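The create-then-load flow can be sketched as follows. Note that this uses the standard library sqlite3 module as a stand-in rather than sqlalchemy, so it ignores the dialect question entirely; the type-guessing rules and the name create_and_load are assumptions made for illustration.

```python
import sqlite3


def create_and_load(conn, name, table):
    """Guess a column type per field, create the table, then insert
    the rows. `table` is a list with a header row first."""
    header, rows = table[0], table[1:]

    def sql_type(values):
        # crude inference: all ints -> INTEGER, all numbers -> REAL, else TEXT
        if all(isinstance(v, int) for v in values):
            return "INTEGER"
        if all(isinstance(v, (int, float)) for v in values):
            return "REAL"
        return "TEXT"

    columns = ", ".join(
        '"%s" %s' % (field, sql_type([row[i] for row in rows]))
        for i, field in enumerate(header)
    )
    conn.execute('CREATE TABLE "%s" (%s)' % (name, columns))
    placeholders = ", ".join("?" for _ in header)
    conn.executemany('INSERT INTO "%s" VALUES (%s)' % (name, placeholders), rows)


conn = sqlite3.connect(":memory:")
create_and_load(conn, "example", [("id", "score", "label"),
                                  (1, 2.5, "a"),
                                  (2, 3.0, "b")])
```

The sketch interpolates table and field names directly into SQL, so it is only suitable for trusted input.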
Moved from petl-developers/petl#225
Implement a toxlsx function using the openpyxl optimized writer: http://pythonhosted.org/openpyxl/optimized.html#optimized-writer
I'd like the most convenient possible method of loading a table of data from a petl row container into a numpy structured array for plotting or numerical processing. You can use the numpy fromiter function, but it would be nice to wrap that to make it even more convenient, with minimal specification of the datatype and no need to duplicate the field names (also maybe even guess the data type for fields if not specified).
The proposal is to add a toarray(tbl, dtype, n) function taking a table (row container) as first positional arg, a dtype as some convenient way of specifying the dtype to use for the structured array (possibly sparse?) and an integer n as a hint on the array size (passed through to fromiter).
It is also proposed to add a fromarray(a) function taking a 1D structured array as input and providing a view as a row container to allow round-tripping to and from numpy arrays and petl transformation functions.
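A minimal sketch of the round trip is shown below, wrapping numpy.fromiter as described. It assumes the dtype is given as one type string per field, in header order, so the field names need not be repeated; the eventual signature and any dtype guessing may well differ.

```python
import numpy as np


def toarray(table, dtype, n=-1):
    """Load a table (header row first) into a 1D structured array.
    `dtype` is assumed to be a list of type strings, one per field;
    `n` is a size hint passed to numpy.fromiter as `count`."""
    it = iter(table)
    header = next(it)
    full_dtype = np.dtype(list(zip(header, dtype)))
    return np.fromiter((tuple(row) for row in it), dtype=full_dtype, count=n)


def fromarray(a):
    """Yield a header row followed by data rows from a structured array."""
    yield tuple(a.dtype.names)
    for rec in a:
        yield tuple(rec.tolist())  # convert numpy scalars to Python values


tbl = [("x", "y"), (1, 2.5), (2, 3.5)]
arr = toarray(tbl, ["i8", "f8"])
roundtrip = list(fromarray(arr))
```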