get_reader's People

Contributors

shawnbrown

get_reader's Issues

Add tests for CSVs encoded with a BOM (byte order mark)

Look into adding tests for the following encodings:

  • UTF-8
  • UTF-8 BOM
  • UTF-16BE
  • UTF-16LE
  • UTF-32BE
  • UTF-32LE

Encoding                 BOM bytes
UTF-8                    EF BB BF
UTF-16, big-endian       FE FF
UTF-16, little-endian    FF FE
UTF-32, big-endian       00 00 FE FF
UTF-32, little-endian    FF FE 00 00
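A minimal sketch of what such tests could exercise, assuming Python 3. The helpers `detect_bom` and `read_rows` are hypothetical and not part of get_reader; the BOM constants come from the standard-library `codecs` module and match the byte sequences listed above. The one subtle point is ordering: the UTF-32 little-endian BOM begins with the UTF-16 little-endian BOM, so the longer one has to be checked first.

```python
import codecs
import csv
import io

# Longest BOMs first: the UTF-32-LE BOM (FF FE 00 00) starts with the
# UTF-16-LE BOM (FF FE), so UTF-32 must be checked before UTF-16.
# The 'utf-8-sig', 'utf-16', and 'utf-32' codecs strip the BOM when
# decoding (the latter two also use it to pick the byte order).
BOM_CODECS = [
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
]


def detect_bom(data, default='utf-8'):
    """Return a codec name for the bytes in *data* based on its BOM."""
    for bom, codec in BOM_CODECS:
        if data.startswith(bom):
            return codec
    return default


def read_rows(data):
    """Decode *data* and parse it as CSV, returning a list of rows."""
    text = data.decode(detect_bom(data))
    return list(csv.reader(io.StringIO(text, newline='')))


# One test case per encoding in the issue's list.
SAMPLE = 'col1,col2\r\nα,β\r\n'
CASES = {
    'UTF-8': SAMPLE.encode('utf-8'),
    'UTF-8 BOM': codecs.BOM_UTF8 + SAMPLE.encode('utf-8'),
    'UTF-16BE': codecs.BOM_UTF16_BE + SAMPLE.encode('utf-16-be'),
    'UTF-16LE': codecs.BOM_UTF16_LE + SAMPLE.encode('utf-16-le'),
    'UTF-32BE': codecs.BOM_UTF32_BE + SAMPLE.encode('utf-32-be'),
    'UTF-32LE': codecs.BOM_UTF32_LE + SAMPLE.encode('utf-32-le'),
}
for name, data in CASES.items():
    assert read_rows(data) == [['col1', 'col2'], ['α', 'β']], name
```

Including a non-ASCII value in the sample matters: an ASCII-only fixture can decode "correctly" under the wrong codec and hide a bug.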

Project Scope

Now that get_reader is its own project, it would be useful to explicitly define its scope and goals. We can always redefine these terms in the future, but a working definition can help guide development and prevent scope creep.

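The initial motivation for get_reader was to provide a common interface for reading Unicode CSV data across different versions of Python. Reading Unicode CSV data is very different in Python 3 than it was in Python 2: Python 2's csv module operates on byte strings and does not support Unicode input directly, while Python 3's csv module works with text.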
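A minimal sketch of the kind of difference a common interface has to paper over. The function name `read_unicode_csv` is hypothetical and is not get_reader's API; the Python 2 branch assumes an ASCII-compatible encoding such as UTF-8, since Python 2's csv module cannot parse UTF-16 or UTF-32 bytes directly.

```python
import csv
import sys


def read_unicode_csv(path, encoding='utf-8'):
    """Yield each row of a CSV file as a list of text (unicode) strings.

    Hypothetical helper; not part of get_reader's API.
    """
    if sys.version_info[0] >= 3:
        # Python 3: the csv module reads text, so decode while opening.
        # newline='' lets csv handle line endings inside quoted fields.
        with open(path, newline='', encoding=encoding) as f:
            for row in csv.reader(f):
                yield row
    else:
        # Python 2: the csv module reads bytes, so decode each cell after
        # parsing. This only works for ASCII-compatible encodings like UTF-8.
        with open(path, 'rb') as f:
            for row in csv.reader(f):
                yield [cell.decode(encoding) for cell in row]
```

Because it is a generator, rows are produced one at a time and the file is closed once iteration finishes, so callers never hold the whole file in memory.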

Here's what I'm thinking for this working definition:

Essential Properties

The get_reader project should:

  1. Provide a common interface for reading tabular data across different versions of Python.
  2. Provide simplified interfaces to multiple data sources that might otherwise have unfamiliar APIs (like a simplified version of the IO tools sub-package in pandas, but without the overhead of a dependency as large as pandas).
  3. Be easily vendorable by simply copying it into another project's directory (no hard third-party dependencies and no modifications to get_reader's source code).
  4. Provide broad support for many different versions of Python.
  5. Read data using memory-efficient iteration (unless explicitly directed to do otherwise) to support reading data from sources that are larger than available memory.
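Property 3 rules out hard third-party dependencies. One common way to still support formats that need an extra package (such as Excel) is to import that package lazily and raise a helpful error only when the feature is actually used. A minimal sketch; `optional_import` and `require` are hypothetical helper names, not part of get_reader.

```python
import importlib


def optional_import(name):
    """Return the named module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def require(name, feature):
    """Return the named module, or raise an ImportError naming the feature
    that needs it, so only users of that feature must install the package."""
    module = optional_import(name)
    if module is None:
        msg = '{0} requires the optional {1!r} package'.format(feature, name)
        raise ImportError(msg)
    return module
```

A hypothetical Excel constructor would call something like `require('xlrd', 'from_excel()')` inside its own body, so merely importing get_reader never fails. `importlib.import_module` is available on Python 2.7 as well as Python 3, which fits property 4.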

Non-essential Properties

  1. Provide tools for working with reader and reader-like objects (e.g., ReaderLike for type checking).

Adding an Interface

Before adding an interface (e.g., from_sql() or from_excel()), it is useful to ask the following questions:

PROs:

  • Does the interface unify differences across multiple versions of Python? Bonus points if it unifies differences between Python 2 and 3.
  • Can the interface reduce the number of objects a user would otherwise need to manage explicitly (e.g., by automatically closing files or database cursors)?
  • Does using the interface take fewer lines of boilerplate code than reading the data directly would require? How many lines does it save? Can it do this reliably without introducing ambiguity or unpredictability?
  • Does the interface simplify reading data from sources that might otherwise have an unfamiliar API (e.g., DBF, Excel)?

CONs:

  • Does the interface obfuscate a standard or otherwise well-known API?
  • Would the feature introduce an API or behavior that is inconsistent with existing interfaces?
  • Would including the feature compromise the get_reader project's status as a lightweight, easy-to-include dependency?

from_sql() constructor

Explore the idea of adding a from_sql() constructor method to get_reader():

def from_sql(self, connection, table_or_query):
    """Return a reader object which will iterate over records
    from the given table or query result.
    """
    ...
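One way the proposed constructor might behave, written as a standalone function rather than a method and exercised against the standard-library `sqlite3` module. This is a sketch under several assumptions: treating a bare identifier as a table name (and anything else as a query) is a heuristic, the header-row-first output is a choice borrowed from how csv.reader presents a file with a header, and the `batch_size` parameter is hypothetical. Deciding whether a string names a table or holds a query is exactly the kind of ambiguity the checklist above asks about.

```python
import sqlite3


def from_sql(connection, table_or_query, batch_size=500):
    """Yield a header row, then each record, from a table or query.

    Hypothetical sketch for any DB-API 2.0 connection. A bare identifier
    is treated as a table name; any other string is run as a query.
    """
    if table_or_query.isidentifier():
        # Table names can't be bound as query parameters; quote instead.
        query = 'SELECT * FROM "{0}"'.format(table_or_query)
    else:
        query = table_or_query

    cursor = connection.cursor()
    try:
        cursor.execute(query)
        # The first item of each cursor.description entry is the column name.
        yield [column[0] for column in cursor.description]
        # fetchmany() keeps only one batch of records in memory at a time.
        while True:
            records = cursor.fetchmany(batch_size)
            if not records:
                break
            for record in records:
                yield list(record)
    finally:
        # Runs when the records are exhausted or the started generator is closed.
        cursor.close()


connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE t (a, b)')
connection.executemany('INSERT INTO t VALUES (?, ?)', [(1, 'x'), (2, 'y')])
print(list(from_sql(connection, 't')))
# prints [['a', 'b'], [1, 'x'], [2, 'y']]
print(list(from_sql(connection, 'SELECT b FROM t WHERE a > 1')))
# prints [['b'], ['y']]
```

Closing the cursor inside the generator is what lets the caller manage one object (the reader) instead of two.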

Unmerge and Fill Excel Values

Add an option for unmerging and filling cell values for Excel files.

Merged horizontal cells:

+---+-------+
| A |   B   |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+

Unmerged horizontal cells:

+---+---+---+
| A | B | B |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+

Merged vertical cells:

+---+---+---+
| A | B | C |
+---+---+---+
| 1 | 2 | 3 |
+   +---+---+
|   | 4 | 5 |
+---+---+---+

Unmerged vertical cells:

+---+---+---+
| A | B | C |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 4 | 5 |
+---+---+---+
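A minimal, library-agnostic sketch of the fill step, assuming the sheet has already been read into a list of rows (empty cells as `''`) and that its merged ranges are available as `(rlo, rhi, clo, chi)` tuples: 0-based, with exclusive upper bounds. That is the layout xlrd uses for `Sheet.merged_cells`, which xlrd only populates when a workbook is opened with `formatting_info=True`. The name `fill_merged` is hypothetical.

```python
def fill_merged(rows, merged_ranges):
    """Return a copy of *rows* with every cell in each merged range set to
    the range's top-left value.

    *merged_ranges* holds (rlo, rhi, clo, chi) tuples: 0-based, with
    rhi and chi exclusive.
    """
    filled = [list(row) for row in rows]
    for rlo, rhi, clo, chi in merged_ranges:
        value = filled[rlo][clo]
        for r in range(rlo, rhi):
            row = filled[r]
            # Pad short rows so every cell in the range exists.
            if len(row) < chi:
                row.extend([''] * (chi - len(row)))
            for c in range(clo, chi):
                row[c] = value
    return filled


# The two examples from this issue:
horizontal = [['A', 'B', ''], ['1', '2', '3']]
print(fill_merged(horizontal, [(0, 1, 1, 3)]))
# prints [['A', 'B', 'B'], ['1', '2', '3']]

vertical = [['A', 'B', 'C'], ['1', '2', '3'], ['', '4', '5']]
print(fill_merged(vertical, [(1, 3, 0, 1)]))
# prints [['A', 'B', 'C'], ['1', '2', '3'], ['1', '4', '5']]
```

Working on plain lists keeps the fill logic independent of whichever Excel library supplies the merged ranges.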
