shawnbrown / get_reader
Simplified csv.reader-like objects for CSV, Excel, DBF, and more.
Look to add tests for the following byte order marks (BOMs):
Encoding | Bytes |
---|---|
UTF-8 | EF BB BF |
UTF-16, big-endian | FE FF |
UTF-16, little-endian | FF FE |
UTF-32, big-endian | 00 00 FE FF |
UTF-32, little-endian | FF FE 00 00 |
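As a starting point for those tests, here is a minimal sketch of a BOM lookup built on the standard library's `codecs` constants. The `detect_bom()` helper and the encoding names it returns are assumptions made for illustration; they are not part of get_reader.

```python
import codecs

# Check the 4-byte marks first: the UTF-32 little-endian BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 little-endian BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, 'utf-32-be'),
    (codecs.BOM_UTF32_LE, 'utf-32-le'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
]


def detect_bom(data):
    """Return (encoding_name, bom_length) for the BOM that starts
    the byte string *data*, or (None, 0) if no known BOM is found.

    Hypothetical helper, not a get_reader function.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A test for each row of the table could feed the listed bytes to a helper like this and check the reported encoding. Note that UTF-16 little-endian data whose first character is NUL is indistinguishable from a UTF-32 little-endian BOM, so that case may deserve a test of its own.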
Now that get_reader is its own project, it would be useful to explicitly define its scope and goals. We can always redefine these terms in the future but a working definition can help guide development and prevent scope creep.
The initial motivation for get_reader was to provide a common interface for reading Unicode CSV data across different versions of Python. Reading Unicode CSV data is very different in Python 3 than it was in Python 2.
Here's what I'm thinking for this working definition:
Essential Properties
The get_reader project should:
- Provide a common interface for reading tabular data across different versions of Python.
- Provide simplified interfaces to multiple data sources that might otherwise have unfamiliar APIs (like a simplified version of the IO tools sub-package in pandas except without the overhead of a dependency as large as pandas).
- Be easily vendorable by simply copying it into the other project's directory (no hard third-party dependencies and no modifications to get_reader's source code).
- Provide broad support for many different versions of Python.
- Read data using memory-efficient iteration (unless explicitly directed to do otherwise)--to support reading data from sources that are larger than available memory.
Non-essential Properties
- Provide tools for working with reader and reader-like objects (e.g., ReaderLike for type checking).
Adding an Interface
Before adding an interface (e.g., from_sql(), from_excel(), etc.), it is useful to ask the following questions:
PROs:
- Does the interface unify differences across multiple versions of Python? Bonus points if it unifies differences between Python 2 and 3.
- Can the interface reduce the number of objects a user would otherwise need to manage explicitly (automatically closing files or database cursors)?
- Does using the interface take fewer lines of boilerplate code than reading the data directly would require? How many lines of boilerplate code does it save? Can it do this reliably without introducing ambiguity or unpredictability?
- Does the interface simplify reading data from sources that might otherwise have an unfamiliar API (e.g., DBF, Excel)?
CONs:
- Does the interface obfuscate a standard or otherwise well-known API?
- Would the feature introduce an API or behavior that is inconsistent with existing interfaces?
- Would including the feature compromise the get_reader project's status as a light-weight, easy-to-include dependency?
Change get_reader.from_excel() to accept keyword arguments that are passed on to the call to xlrd.open_workbook() (in get_reader.py).
See...
Line 372 in f481d95
Line 383 in f481d95
Line 603 in f481d95
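One way the requested change could look is sketched below. This is not the project's implementation: the standalone `from_excel()` function, its `worksheet` parameter, and the name `kwds` are assumptions made for illustration. Only the xlrd calls themselves (`open_workbook()`, `sheet_by_index()`, `sheet_by_name()`, `row_values()`, `release_resources()`) are taken from xlrd's API.

```python
def from_excel(path, worksheet=0, **kwds):
    """Yield rows from one worksheet of an Excel file.

    Illustrative sketch only. Any extra keyword arguments are
    forwarded unchanged to xlrd.open_workbook().
    """
    import xlrd  # Imported lazily so xlrd stays an optional dependency.

    book = xlrd.open_workbook(path, **kwds)
    try:
        # Accept either a sheet index or a sheet name (an assumption).
        if isinstance(worksheet, int):
            sheet = book.sheet_by_index(worksheet)
        else:
            sheet = book.sheet_by_name(worksheet)
        for index in range(sheet.nrows):
            yield sheet.row_values(index)
    finally:
        # Runs when iteration finishes or the generator is closed.
        book.release_resources()
```

With a signature like this, a caller could write `from_excel('data.xls', on_demand=True)` and the `on_demand` option would reach `open_workbook()` without get_reader needing to know about it.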
Explore the idea of adding a from_sql() constructor method to get_reader():
    def from_sql(self, connection, table_or_query):
        """Return a reader object which will iterate over records
        from the given table or query result.
        """
        ...
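To make the idea concrete, here is a rough sketch of how such a constructor might behave, written as a standalone generator function and tried against an in-memory SQLite database. Several details are assumptions rather than decisions from this issue: the "single word means table name" heuristic, yielding a header row built from `cursor.description`, and iterating the cursor directly (an optional DB-API extension that `sqlite3` supports).

```python
import sqlite3


def from_sql(connection, table_or_query):
    """Yield a header row, then one list per record, from the given
    table or query result (illustrative sketch only).
    """
    query = table_or_query.strip()
    if ' ' not in query:
        # Hypothetical heuristic: a single word is a table name.
        quoted = '"' + query.replace('"', '""') + '"'
        query = 'SELECT * FROM ' + quoted

    cursor = connection.cursor()
    try:
        cursor.execute(query)
        # The first item of each description entry is the column name.
        yield [column[0] for column in cursor.description]
        for row in cursor:  # Rows are fetched lazily, one at a time.
            yield list(row)
    finally:
        cursor.close()  # The caller never has to manage the cursor.


# Usage with a throwaway in-memory database.
connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE mytable (A TEXT, B INTEGER)')
connection.executemany('INSERT INTO mytable VALUES (?, ?)',
                       [('x', 1), ('y', 2)])

for row in from_sql(connection, 'mytable'):
    print(row)
```

Closing the cursor inside the generator speaks to the "fewer objects to manage" question above, while the table-or-query heuristic is exactly the kind of ambiguity the CONs list warns about, so a real implementation might prefer two explicit parameters.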
Add an option for unmerging and filling cell values for Excel files.
Merged horizontal cells:
A | B | |
---|---|---|
1 | 2 | 3 |
Unmerged horizontal cells:
A | B | B |
---|---|---|
1 | 2 | 3 |
Merged vertical cells:
A | B | C |
---|---|---|
1 | 2 | 3 |
| | 4 | 5 |
Unmerged vertical cells:
A | B | C |
---|---|---|
1 | 2 | 3 |
1 | 4 | 5 |
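The fill step itself can be sketched in a few lines, assuming the rows are already in memory as lists and the merged ranges are given as `(row_lo, row_hi, col_lo, col_hi)` tuples with exclusive upper bounds, which is the layout xlrd uses for a sheet's `merged_cells`. The `unmerge_and_fill()` name is hypothetical, and the sketch assumes every row is padded to the full width of the sheet.

```python
def unmerge_and_fill(rows, merged_ranges):
    """Return a copy of *rows* in which every cell of each merged
    range holds the value of that range's top-left cell.

    Hypothetical helper. Each range is (row_lo, row_hi, col_lo, col_hi)
    with exclusive upper bounds.
    """
    filled = [list(row) for row in rows]  # Leave the input untouched.
    for row_lo, row_hi, col_lo, col_hi in merged_ranges:
        value = filled[row_lo][col_lo]  # Merged value lives top-left.
        for r in range(row_lo, row_hi):
            for c in range(col_lo, col_hi):
                filled[r][c] = value
    return filled


# Horizontal example from above: "B" spans the second and third columns.
horizontal = unmerge_and_fill(
    [['A', 'B', ''], [1, 2, 3]],
    [(0, 1, 1, 3)],
)

# Vertical example from above: 1 spans the second and third rows.
vertical = unmerge_and_fill(
    [['A', 'B', 'C'], [1, 2, 3], ['', 4, 5]],
    [(1, 3, 0, 1)],
)
```

Because this version holds all rows in a list, it conflicts with the memory-efficient iteration goal. A streaming variant would need to look up merged ranges row by row instead.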