petlx's People

Contributors

alimanfoo, dorissun, ianfiske

petlx's Issues

todataframe

A todataframe() convenience function to load a table into a pandas DataFrame would be useful (probably have to go via numpy structured array).

hook into petl.fluent

Hook packages into petl.fluent so they can be used in the fluent style, e.g., etl().fromgff3().

gff3 utilities

Proposed to add functions fromgff3, gff3unpackinfo and gff3intervaljoin for working with gff3 annotation files.
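As a rough illustration of what reading a GFF3 file into rows involves, here is a minimal sketch using only the standard library. The names iter_gff3 and parse_gff3_attributes are hypothetical and are not the petlx API; the sketch assumes tab-delimited lines with nine columns and key=value pairs separated by semicolons in the last column.

```python
from urllib.parse import unquote

GFF3_FIELDS = ("seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes")


def parse_gff3_attributes(text):
    """Split a column-9 string such as 'ID=gene1;Name=abc' into a dict."""
    attrs = {}
    for pair in text.strip().split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # GFF3 percent-encodes reserved characters in attribute values
        attrs[unquote(key)] = unquote(value)
    return attrs


def iter_gff3(lines):
    """Yield a header row, then one row per feature line (hypothetical helper)."""
    yield GFF3_FIELDS
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        cols[3], cols[4] = int(cols[3]), int(cols[4])
        cols[8] = parse_gff3_attributes(cols[8])
        yield tuple(cols)


sample = [
    "##gff-version 3\n",
    "chr1\tdemo\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=abc\n",
]
rows = list(iter_gff3(sample))
```

Keeping the attributes as a dict leaves the choice of which keys to unpack into their own columns to a later step.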

fromflagstat

Proposed to add utility function fromflagstat as convenience for parsing outputs of samtools flagstat.
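A minimal sketch of such a parser, assuming each flagstat line has the shape "<passed> + <failed> <description>". The function name parse_flagstat and the output field names are hypothetical, chosen only for illustration.

```python
import re

# Matches lines like "950 + 0 mapped (95.00%:-nan%)"
_FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def parse_flagstat(lines):
    """Yield a header row, then (description, qc_passed, qc_failed) rows."""
    yield ("description", "qc_passed", "qc_failed")
    for line in lines:
        match = _FLAGSTAT_LINE.match(line.strip())
        if match is None:
            continue  # ignore anything that is not a counter line
        passed, failed, description = match.groups()
        yield (description, int(passed), int(failed))


flagstat_output = [
    "1000 + 0 in total (QC-passed reads + QC-failed reads)",
    "0 + 0 duplicates",
    "950 + 0 mapped (95.00%:-nan%)",
]
stats = list(parse_flagstat(flagstat_output))
```

The description is kept verbatim, including any parenthetical, so that lines differing only in a suffix remain distinguishable.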

fromxlsx flags

There are several flags available when opening an xlsx workbook. Proposed to add these to fromxlsx as well and pass them through to openpyxl:

guess_types will enable (default) or disable type inference when reading cells.
data_only controls whether cells with formulae have either the formula (default) or the value stored the last time Excel read the sheet.
keep_vba controls whether any Visual Basic elements are preserved or not (default). If they are preserved they are still not editable.

vcf utilities

Proposed to add some simple vcf utility functions for reshaping vcf files into various simpler table forms. E.g., fromvcf, vcfmeltsamples, vcfunpackinfo, vcfunpacksamples, vcfheader.

ipy notebook display()

Add a display() function/method to the ipython integration so you can get multiple tables to display their output from the same code cell.

Project wiki?

I wonder if it would be useful to have a wiki for this project. Now that I'm trying to contribute, it would be nice to organize questions, new information, etc. in one place. Such things could easily get lost in the Google group. For example, "How do I run the test cases?"

fromgff3 with region

Proposed to add support for extracting from a GFF3 file for a specific region where the GFF is tabix indexed.

intervalsubtract

Proposed to add function intervalsubtract() to interval module behaving as bedtools subtract.
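The core operation can be sketched on plain (start, stop) tuples. This is an illustration under stated assumptions, not the proposed implementation: intervals are half-open, all lie on a single sequence, and the name subtract_intervals is hypothetical.

```python
def subtract_intervals(a, b):
    """Remove from each interval in `a` every portion covered by `b`.

    Intervals are half-open (start, stop) tuples. An interval in `a`
    that is only partly covered is split into its uncovered pieces.
    """
    b_sorted = sorted(b)
    result = []
    for start, stop in a:
        cursor = start  # left edge of the part not yet accounted for
        for b_start, b_stop in b_sorted:
            if b_stop <= cursor:
                continue  # lies entirely to the left of what remains
            if b_start >= stop:
                break  # sorted by start, so nothing further can overlap
            if b_start > cursor:
                result.append((cursor, b_start))
            cursor = max(cursor, b_stop)
            if cursor >= stop:
                break
        if cursor < stop:
            result.append((cursor, stop))
    return result


remaining = subtract_intervals([(10, 50)], [(20, 30), (40, 60)])
```

A table-based version would additionally group rows by sequence name and carry the other fields of each row through to the output.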

to/from mongodb

Proposed to add to/from functions for working with mongodb.

fromhtsql

Proposed to add module providing adapters for htsql. Including fromhtsql taking htsql object or connection string followed by query.

simplify toarray/torecarray()?

Currently the dtype for a structured array is inferred one column at a time. However, passing a sample of the data to np.rec.array() would infer a dtype for the whole table in one go, and would simplify the code.

interval joins

It would be useful to have functions that support joining tables by overlapping ranges, rather than exact key values. E.g., to join a list of positions in a genome with records from a gene annotation table.

The proposal is to add functions intervaljoin and facetintervaljoin which join two tables based on overlapping ranges, with the faceted version combining a conventional key-based join with a range join.
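To make the join semantics concrete, here is a naive nested-loop sketch over petl-style tables (an iterable of rows whose first row is the header). The function name and signature are hypothetical, intervals are assumed half-open, and a practical implementation would use an interval tree rather than comparing every pair of rows.

```python
def interval_join(left, right, lstart, lstop, rstart, rstop):
    """Yield joined rows wherever a left interval overlaps a right interval."""
    left_it, right_it = iter(left), iter(right)
    lhdr, rhdr = tuple(next(left_it)), tuple(next(right_it))
    li, lj = lhdr.index(lstart), lhdr.index(lstop)
    ri, rj = rhdr.index(rstart), rhdr.index(rstop)
    right_rows = [tuple(row) for row in right_it]
    yield lhdr + rhdr
    for lrow in left_it:
        lrow = tuple(lrow)
        for rrow in right_rows:
            # half-open intervals overlap when each starts before the other stops
            if lrow[li] < rrow[rj] and rrow[ri] < lrow[lj]:
                yield lrow + rrow


positions = [("name", "begin", "end"),
             ("snp1", 150, 151),
             ("snp2", 500, 501)]
genes = [("gene", "start", "stop"),
         ("geneA", 100, 200),
         ("geneB", 180, 300)]
joined = list(interval_join(positions, genes, "begin", "end", "start", "stop"))
```

A faceted variant would add an equality test on a key field, such as the chromosome, before checking for overlap.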

faceted interval lookups

In addition to the existing interval lookup functions, it is proposed to add faceted versions of these functions, to allow for construction and query of multiple interval trees. The motivating use case is lookup of genomic locations, where you want one interval tree per chromosome.

The proposal is to add functions facetintervallookup, facetintervallookupone, facetintervalrecordlookup, facetintervalrecordlookupone, as faceted versions of the existing interval lookup functions.
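The idea of one index per facet can be sketched as follows. The name facet_interval_lookup and its signature are hypothetical, and a plain list per facet stands in for the interval tree that a real implementation would build; intervals are assumed half-open.

```python
from collections import defaultdict


def facet_interval_lookup(rows, facet, start, stop):
    """Build one interval index per facet value and return a query function."""
    it = iter(rows)
    hdr = tuple(next(it))
    fi, si, ei = hdr.index(facet), hdr.index(start), hdr.index(stop)
    index = defaultdict(list)  # facet value -> [(start, stop, row), ...]
    for row in it:
        index[row[fi]].append((row[si], row[ei], tuple(row)))

    def find(key, qstart, qstop):
        """Return rows under `key` whose interval overlaps [qstart, qstop)."""
        return [row for s, e, row in index.get(key, ())
                if s < qstop and qstart < e]

    return find


annotations = [("chrom", "start", "stop", "gene"),
               ("chr1", 100, 200, "geneA"),
               ("chr2", 100, 200, "geneB")]
find = facet_interval_lookup(annotations, "chrom", "start", "stop")
hits = find("chr1", 150, 151)
```

Because each chromosome has its own index, a query never compares against intervals from other chromosomes.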

cachetag is deprecated

The cachetag convention has been deprecated since petl 0.16. Remove the cachetag methods in petlx and any dependencies on deprecated members.

fromsav

Proposed to add fromsav using the spss recipe on activestate's website.

to/from hdf5

Proposed to add functions for working with hdf5 via pytables.

fromdta

Proposed to add fromdta using statsmodels.

to/from xls

Add support for working directly with Excel (XLS) files, probably via xlrd.

collapsed intervals

Proposed to add a utility function to petlx.interval to return collapsed intervals from a table with start, stop coords.
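Collapsing can be sketched as a sort followed by a single merging pass. The name collapse_intervals is hypothetical, and the sketch assumes that intervals which merely touch (one stops where the next starts) should also be merged.

```python
def collapse_intervals(intervals):
    """Merge overlapping or touching (start, stop) intervals."""
    merged = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # extends the interval currently being built
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [tuple(pair) for pair in merged]


collapsed = collapse_intervals([(3, 8), (1, 5), (10, 12)])
```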

fromarray

Add fromarray() function to petlx.array module (was postponed from #1).

ipython display table

Add a package petlx.ipython with function display() which takes table, converts to HTML and inlines in notebook.

torecarray()

Request convenience function torecarray() in petlx.array, to save having to type ".view(recarray)" all the time.

interval use of suffix notation ([]) is inappropriate

Current use of the suffix notation in the petlx.interval module is not appropriate as start and stop values are not indices and so not a slice. Proposed to change to a find() method as per the underlying bx-python module.

interval doc error

>>> from petlx import intervallookup

Should import from petl.interval package.

fromsoup

Implement a fromsoup() method using Beautiful Soup to provide more flexibility and power for extracting tables from XML or HTML.

interval left join as list of values

It would be convenient to be able to perform an interval left join but then have matching values from one or more fields of the right hand table given as a list of values in a new column. I.e., the output would have one row per input row in the left hand table.
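The intended output shape can be illustrated with a naive sketch over petl-style tables (first row is the header). The function name, its parameters and the half-open interval convention are assumptions made for this example only.

```python
def interval_left_join_values(left, right, lstart, lstop,
                              rstart, rstop, value, newfield):
    """Yield each left row once, with overlapping right values as a list."""
    left_it, right_it = iter(left), iter(right)
    lhdr, rhdr = tuple(next(left_it)), tuple(next(right_it))
    li, lj = lhdr.index(lstart), lhdr.index(lstop)
    ri, rj, vi = rhdr.index(rstart), rhdr.index(rstop), rhdr.index(value)
    right_rows = [tuple(row) for row in right_it]
    yield lhdr + (newfield,)
    for lrow in left_it:
        lrow = tuple(lrow)
        matches = [rrow[vi] for rrow in right_rows
                   if lrow[li] < rrow[rj] and rrow[ri] < lrow[lj]]
        # an unmatched left row is kept, with an empty list
        yield lrow + (matches,)


regions = [("name", "begin", "end"),
           ("r1", 150, 190),
           ("r2", 500, 510)]
features = [("gene", "start", "stop"),
            ("geneA", 100, 200),
            ("geneB", 180, 300)]
result = list(interval_left_join_values(
    regions, features, "begin", "end", "start", "stop", "gene", "genes"))
```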

fromtabix via pysam

Proposed to add a package petlx.tabix with function fromtabix which supports extracting data from a tab delimited file specifying a sequence region and coords.

support automated database table creation upon loading

This is a placeholder for adding support for creating a database table prior to loading it. I.e., a function similar to the standard petl.todb() function, but automatically generate a schema definition based on the table to be loaded, and execute the table creation, prior to loading.
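The create-then-load flow might look like the sketch below. It uses the standard library sqlite3 module rather than any multi-dialect toolkit, purely to keep the example self-contained; the helper names infer_sql_type and create_and_load are hypothetical, and the type inference covers only integers, floats and text.

```python
import sqlite3


def infer_sql_type(values):
    """Pick a SQLite column type from a column's non-null values."""
    non_null = [v for v in values if v is not None]
    if non_null and all(isinstance(v, int) for v in non_null):
        return "INTEGER"
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        return "REAL"
    return "TEXT"


def create_and_load(conn, name, table):
    """Create table `name` from the header and data, then insert the rows."""
    header, *rows = [tuple(row) for row in table]
    columns = list(zip(*rows)) if rows else [()] * len(header)
    # identifiers are interpolated directly: only use trusted names here
    defs = ", ".join('"%s" %s' % (field, infer_sql_type(col))
                     for field, col in zip(header, columns))
    conn.execute('CREATE TABLE "%s" (%s)' % (name, defs))
    marks = ", ".join("?" * len(header))
    conn.executemany('INSERT INTO "%s" VALUES (%s)' % (name, marks), rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
create_and_load(conn, "demo", [("id", "name", "score"),
                               (1, "a", 1.5),
                               (2, "b", 2.0)])
declared = [row[2] for row in conn.execute('PRAGMA table_info("demo")')]
count = conn.execute('SELECT COUNT(*) FROM "demo"').fetchone()[0]
```

This version reads the whole table into memory to infer types; a sample of the first rows would be the obvious alternative for large tables.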

It looks like sqlalchemy has good support for managing different SQL dialects, so proposed to use sqlalchemy as a dependency.

Moved from petl-developers/petl#225

to/from numpy structured array

I'd like the most convenient possible method of loading a table of data from a petl row container into a numpy structured array for plotting or numerical processing. You can use the numpy fromiter function, but it would be nice to wrap that to make it even more convenient, with minimal specification of the datatype and no need to duplicate the field names (also maybe even guess the data type for fields if not specified).

The proposal is to add a toarray(tbl, dtype, n) function taking a table (row container) as first positional arg, a dtype as some convenient way of specifying the dtype to use for the structured array (possibly sparse?) and an integer n as a hint on the array size (passed through to fromiter).

It is also proposed to add a fromarray(a) function taking a 1D structured array as input and providing a view as a row container to allow round-tripping to and from numpy arrays and petl transformation functions.
