dataset's Introduction

dataset: databases for lazy people

In short, dataset makes reading and writing data in databases as simple as reading and writing JSON files.

Read the docs

To install dataset, fetch it with pip:

$ pip install dataset

Note: as of version 1.0, dataset is split into two packages, with the data export features now extracted into a stand-alone package, datafreeze. See the relevant repository here.
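A minimal taste, adapted from the quickstart examples that surface in the issues below:

import dataset

# Connect, get (or implicitly create) a table, insert a row, read it back.
db = dataset.connect('sqlite:///:memory:')
table = db['person']
table.insert(dict(name='John Doe', age=46))
print(table.find_one(name='John Doe'))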

dataset's People

Contributors

3onyc, abmyii, aniversarioperu, benfasoli, branch-battelle, cli248, conorreid, dnatag, florentdotme, gka, irontablee, j-e-d, kapily, madprog, matthewscholefield, mgaitan, mynameisfiber, oemmerson, paulfitz, pudo, remalloc, rufuspollock, saimn, saun4app, stefanw, timgates42, tktech, twds, victorkashirin, xrotwang

dataset's Issues

sqlalchemy-migrate throwing exception

Came across this module and wanted to play around with it. However, it kept throwing an error stating "can't import name schemavisitor" when I imported it. I traced the error back to sqlalchemy-migrate and saw other people reporting that sqlalchemy-migrate is incompatible with the latest version of SQLAlchemy. So I uninstalled SQLAlchemy, installed an older version, and sure enough, the error went away. Thought I would make you aware of the issue. I'm super excited to play around with this module, it looks really handy!

dataset requires mysql-python to work with MySQL

I get this error when I use dataset without the MySQLdb module installed:

Traceback (most recent call last):
  File "freeze_data.py", line 7, in <module>
    db = dataset.connect(config['db_path'])
  File "/Library/Python/2.7/site-packages/dataset-0.3.8-py2.7.egg/dataset/__init__.py", line 29, in connect
    return Database(url, schema=schema, reflectMetadata=reflectMetadata)
  File "/Library/Python/2.7/site-packages/dataset-0.3.8-py2.7.egg/dataset/persistence/database.py", line 36, in __init__
    engine = create_engine(url, **kw)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/__init__.py", line 332, in create_engine
    return strategy.create(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/strategies.py", line 64, in create
    dbapi = dialect_cls.dbapi(**dbapi_args)
  File "/Library/Python/2.7/site-packages/sqlalchemy/connectors/mysqldb.py", line 57, in dbapi
    return __import__('MySQLdb')
ImportError: No module named MySQLdb

which is easily solved with:

pip install mysql-python

How do I silence the "DEBUG:dataset.persistence…" messages?

I'm getting logging messages appearing in my script output. How do I silence them? And why are they even there in the first place?

>>> import dataset
>>> dt = dataset.connect('sqlite:///:memory:')
>>> table = dt['test']
DEBUG:dataset.persistence.database:Creating table: test on Engine(sqlite:///:memory:)
>>> table.insert({"id": "bv56fzi", "name": "This will fail"})
DEBUG:dataset.persistence.table:Creating column: name (<class 'sqlalchemy.types.UnicodeText'>) on 'test'
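One way to silence them is through the standard logging module, since the messages come from loggers under the dataset.persistence namespace shown in the output above:

import logging

# Raise the threshold for dataset's persistence loggers so their DEBUG
# messages are dropped.
logging.getLogger('dataset.persistence').setLevel(logging.WARNING)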

FreezeException error

Installed dataset from PyPI using pip. When I ran it I got the following:

aklaver@killi:~> datafreeze datafreeze_test.yml
Traceback (most recent call last):
  File "/usr/local/bin/datafreeze", line 9, in <module>
    load_entry_point('dataset==0.3', 'console_scripts', 'datafreeze')()
  File "/usr/local/lib/python2.7/site-packages/dataset/freeze/app.py", line 92, in main
    config = Configuration(args.config)
  File "/usr/local/lib/python2.7/site-packages/dataset/freeze/config.py", line 36, in __init__
    raise FreezeException(unicode(ioe))
NameError: global name 'FreezeException' is not defined

I looked in config.py and FreezeException is not imported.

Python 3-compatible MySQL connector

The quickstart tutorial suggests MySQLdb as the connector. However, MySQLdb doesn't support Python 3 right now; see https://pypi.python.org/pypi/MySQL-python/1.2.5.

An alternative is MySQL Connector/Python, which supports Python 3 and is also supported by SQLAlchemy:

mysql+mysqlconnector://<user>:<password>@<host>[:<port>]/<dbname>

I think the tutorial could also suggest MySQL Connector/Python as a Python 3-compatible MySQL connector.

Add upsert functionality

A common enough use case, I believe.

It would get rid of code like:

if not table.update(...):
    table.insert(...)
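Until such a method exists, a minimal helper along the lines of the snippet above might look like this (upsert is a hypothetical name; it relies on update() returning a falsy value when no row matched, as the snippet assumes):

def upsert(table, row, keys):
    # Update the row identified by the columns in `keys`; insert the row
    # if nothing matched.
    if not table.update(row, keys):
        table.insert(row)

# e.g. upsert(db['people'], dict(name='John', age=47), keys=['name'])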

Clarification on insert types

I have run into an issue where I need to explicitly state the datatype of a column I'm inserting. Luckily, dataset provides this functionality, but it isn't documented very well.

Case in point

In reading that and the documentation for Table.create_column, it would appear that to make a column a float, I would need to set its type to something like sqlalchemy.Float in the types dict. That isn't working, even if I import sqlalchemy.

Here is my actual source code. Any help figuring out how to explicitly set the types would be much appreciated!
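For reference, a sketch of the route that seems intended, assuming Table.create_column accepts a SQLAlchemy type as its docs suggest: declare the column type before any insert touches it.

import dataset
import sqlalchemy

db = dataset.connect('sqlite:///:memory:')
table = db['readings']

# Create the column with an explicit type so dataset does not guess it
# from the first inserted value.
table.create_column('weight', sqlalchemy.Float)
table.insert({'sensor': 'a1', 'weight': 2.5})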

Python 3 Support

SQLAlchemy already supports Python 3.
Also, please update the PyPI description, which currently states that Python 3+ is not supported.

connection schema handling on inserts

On my postgresql database (9.3.2), executing the quickstart examples works when connection schema is not specified:

>>> import dataset
>>> db = dataset.connect()
>>> table = db['person']
>>> print(table.columns)
['id']
>>> table.insert(dict(name='John Doe', age=46))
1

However, when connection schema is specified, the created table is not found when using insert - it seems to be looking instead in the current search_path:

>>> import dataset
>>> db = dataset.connect(schema="myschema")
>>> table = db['person']
>>> print(table.columns)
['id']
>>> table.insert(dict(name='John Doe', age=46))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/dataset/persistence/table.py", line 66, in insert
    self._ensure_columns(row, types=types)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/dataset/persistence/table.py", line 198, in _ensure_columns
    self.create_column(column, _type)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/dataset/persistence/table.py", line 224, in create_column
    Column(name, type)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/alembic/operations.py", line 365, in add_column
    schema=schema
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/alembic/ddl/impl.py", line 127, in add_column
    self._exec(base.AddColumn(table_name, column, schema=schema))
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/alembic/ddl/impl.py", line 76, in _exec
    conn.execute(construct, *multiparams, **params)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1651, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 717, in execute
    return meth(self, multiparams, params)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/sql/ddl.py", line 67, in _execute_on_connection
    return connection._execute_ddl(self, multiparams, params)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 771, in _execute_ddl
    compiled
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 927, in _execute_context
    context)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1076, in _handle_dbapi_exception
    exc_info
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 185, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 920, in _execute_context
    context)
  File "/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 425, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (ProgrammingError) relation "person" does not exist
 'ALTER TABLE person ADD COLUMN age INTEGER' {}

make class ResultIter inexhaustible

Currently, ResultIter is a self-iterator, which means it can only be consumed once. For instance,

>>> rows = table.all()
>>> list(rows)
>>> # return the correct list of OrderedDict
>>> list(rows)
>>> # raise ResourceClosedError: This result object is closed.

So we cannot reuse rows. If we want to do that, we need to call table.all() again.

I am wondering if there is any approach that we can make the ResultIter reusable.
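Until then, the simplest workaround is to materialize the iterator once and reuse the resulting list:

import dataset

db = dataset.connect('sqlite:///:memory:')
table = db['test']
table.insert({'name': 'a'})

# list() drains the one-shot ResultIter; the plain list can then be
# iterated any number of times.
rows = list(table.all())
print(rows)
print(rows)  # no ResourceClosedError on the second pass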

Connection pooling for postgresql

It seems pooling is disabled for Postgres by using NullPool. This causes my multithreaded app to deadlock after opening too many DB connections.

Is there any reason that we can't use the default pool configuration provided by sqlalchemy?
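For comparison, this is what SQLAlchemy's default QueuePool looks like when configured directly (the connection URL is a placeholder); the question is whether dataset could simply stop overriding it with NullPool:

from sqlalchemy import create_engine

# The default QueuePool keeps a bounded set of connections open and
# reuses them, instead of opening a new one per checkout like NullPool.
engine = create_engine(
    'postgresql://user:password@localhost/mydb',
    pool_size=5,       # connections held open in the pool
    max_overflow=10,   # extra connections permitted under load
)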

Order of an OrderedDict not preserved

It would be nice if, when passing an OrderedDict to 'insert', the order of the columns was preserved: i.e.

>>> data = collections.OrderedDict([('A', 1), ('B', 2), ('C', 3), ('D', 4)])
>>> mytable.insert(data, ['A', 'B'])
>>> mytable.find_one()[-1:]
[1, 2, 3, 4]

(above from memory, sorry)

Obviously if OrderedDicts with different key orderings are provided, the database wouldn't be able to handle both orders, and should stick with the first.

I'll look deeper into this when I get a moment - was hoping to have a pull request sorted but it's a little more complicated than I expected.

test_table_cache_updates failed in Python 3

The test_table_cache_updates test case fails under Python 3 with MySQL, but works fine under Python 2:

def test_table_cache_updates(self):
    tbl1 = self.db.get_table('people')
    tbl1.insert(dict(first_name='John', last_name='Smith'))
    tbl2 = self.db.get_table('people')

    assert list(tbl2.all()) == [(1, 'John', 'Smith')]

I checked the result of list(tbl2.all()). It was [(1, 'Smith', 'John')] (the order of first_name and last_name was reversed).

I also checked tbl1.table under Python 2 and Python 3 separately. The order of first_name and last_name differs in the reflected schema.Table:

>>> # python 2
>>> tbl1.table
>>> Table('people', MetaData(bind=Engine(mysql+mysqlconnector://chen:***@localhost/contacts)), Column('id', INTEGER(display_width=11), table=<people>, primary_key=True, nullable=False), Column('first_name', TEXT(), table=<people>), Column('last_name', TEXT(), table=<people>), schema=None)

>>> # python 3
>>> tbl1.table
>>> Table('people', MetaData(bind=Engine(mysql+mysqlconnector://chen:***@localhost/contacts)), Column('id', INTEGER(display_width=11), table=<people>, primary_key=True, nullable=False), Column('last_name', TEXT(), table=<people>), Column('first_name', TEXT(), table=<people>), schema=None)

So I guess this is a SQLAlchemy issue, and we should not rely on list(tbl2.all()) == [(1, 'John', 'Smith')]. How about comparing tbl1.table == tbl2.table instead?

create_table with custom id and type problem

When I go through the Database API under MySQL, table3 = db.create_table('population3', primary_id='race', primary_type='Text') raises an OperationalError:

OperationalError: (OperationalError) (1170, "BLOB/TEXT column 'race' used in key  
specification without a key length") '\nCREATE TABLE population3 (\n\trace TEXT 
NOT NULL, \n\tPRIMARY KEY (race)\n)\n\n' ()

However, if I change the primary_type to Integer, it works fine.
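One possible workaround, assuming the underlying SQLAlchemy engine is reachable as db.engine: issue the DDL yourself with a length-bounded VARCHAR, which MySQL can index, and let dataset pick the table up afterwards.

import dataset

db = dataset.connect('mysql://user:password@localhost/mydb')

# MySQL refuses to index an unbounded TEXT column, so create the table
# directly with a bounded VARCHAR primary key.
db.engine.execute("""
    CREATE TABLE population3 (
        race VARCHAR(64) NOT NULL,
        PRIMARY KEY (race)
    )
""")
table3 = db['population3']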

hstore error

Postgres 9.0.13
psycopg 2.4.6
SQLAlchemy 0.8.0

Ran datafreeze against Postgres with the following in a *.yaml file:

exports:
  - query: "SELECT category, plant_type FROM cell_per"
    filename: "type/{{plant_type}}.csv"
    format: csv

and got error:

aklaver@killi:~> datafreeze datafreeze_test.yaml
INFO:dataset.freeze.app:Running: SELECT category, plant_type FROM cell_per
/usr/local/lib64/python2.7/site-packages/sqlalchemy/dialects/postgresql/base.py:1672: SAWarning: Did not recognize type 'hstore' of column 'record_fld'
name, format_type, default, notnull, domains, enums, schema)

The directory and the files were created and the data was correct.

ValueError: Couldn't parse datetime string '1383102430626' - value is not a string.

I am opening a sqlite3 database with dataset. The database was generated by ghost - the nodejs blog software. I get a ValueError at line 49 in dataset/persistence/util.py presumably whenever dataset encounters a datetime.

>>> db = dataset.connect('sqlite:///ghost.db')
>>> for post in db['posts']:
...     print post
...
...
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/watson/github/dataset-test/env/lib/python2.7/site-packages/dataset/persistence/util.py", line 49, in next
    return dict(zip(self.keys, row))
ValueError: Couldn't parse datetime string '1383102430626' - value is not a string.


>>> db['posts'].columns
set([u'status', u'meta_description', u'meta_title', u'markdown', u'uuid', u'language', u'title', u'image', u'updated_at', u'created_by', u'id', u'created_at', u'featured', u'html', u'updated_by', u'published_by', u'published_at', u'author_id', u'slug', u'page'])

However, the same query in SQL works fine:

>>> for post in db.query('select * from posts'):
...     print post

PostgreSQL self.table.count() doesn't return a count when offset is used, and it ignores limit, so db[table].find(_offset=N) returns "TypeError: 'NoneType' object has no attribute '__getitem__'"

To support this argument:

SELECT count(*) FROM table_1 OFFSET 20;

 count
-------
(0 rows)

Explanation: Postgres computes the aggregate first and then applies the offset to the result. SELECT count(*) returns 700 (according to my test database) as a single row; applying an offset of 20 to that one-row result leaves nothing, hence zero rows.

SELECT count(*) FROM table_1 LIMIT 2;

 count
-------
   700
(1 row)

Explanation: similar to offset, Postgres computes the query and then applies the limit to it. Since only one row was to be returned and 1 < 2, that one row gets returned.

SELECT count(table_1.id) AS tbl_row_count FROM table_1
WHERE table_1.id IN (SELECT table_1.id FROM table_1 LIMIT ALL OFFSET 10);

 tbl_row_count
---------------
           690
(1 row)

So dataset needs to apply the third form of query when calculating the count in the find() method.
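In SQLAlchemy terms, that third form might look roughly like this (a sketch using 0.9-era select() syntax; the connection URL is a placeholder and table_1 comes from the examples above):

from sqlalchemy import MetaData, Table, create_engine, func, select

engine = create_engine('postgresql://user:password@localhost/mydb')
table_1 = Table('table_1', MetaData(), autoload=True, autoload_with=engine)

# Apply the OFFSET inside a subquery first, then count its rows, so the
# count matches what the windowed query will actually return.
limited = select([table_1.c.id]).offset(10).alias('limited')
row_count = engine.execute(
    select([func.count()]).select_from(limited)).scalar()
print(row_count)  # 690 in the example above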

Row Offset in SQL Server

I'm trying to use dataset with SQL Server to pull an entire table that has 180k rows and 52 columns.

Essentially I've written

rows = [x for x in table.all()]

This runs for a few minutes and then throws an exception:

---------------------------------------------------------------------------
CompileError                              Traceback (most recent call last)
<ipython-input-16-a9df205f5f00> in <module>()
----> 1 lrows = list(get(rows))

<ipython-input-15-9df065062177> in get(rows)
      1 rows = table.all()
      2 def get(rows):
----> 3     for item in rows:
      4         try:
      5             yield item

C:\Anaconda\lib\site-packages\dataset\persistence\util.pyc in next(self)
     42         row = self.rp.fetchone()
     43         if row is None:
---> 44             if self._next_rp():
     45                 return self.next()
     46             else:

C:\Anaconda\lib\site-packages\dataset\persistence\util.pyc in _next_rp(self)
     32     def _next_rp(self):
     33         try:
---> 34             self.rp = self.result_proxies.next()
     35             self.count += self.rp.rowcount
     36             self.keys = self.rp.keys()

C:\Anaconda\lib\site-packages\dataset\persistence\table.pyc in <genexpr>((q,))
    333             queries.append(self.table.select(whereclause=args, limit=qlimit,
    334                                              offset=qoffset, order_by=order_by))
--> 335         return ResultIter((self.database.executable.execute(q) for q in queries))
    336 
    337     def __len__(self):

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in execute(self, statement, *multiparams, **params)
   1612 
   1613         connection = self.contextual_connect(close_with_result=True)
-> 1614         return connection.execute(statement, *multiparams, **params)
   1615 
   1616     def scalar(self, statement, *multiparams, **params):

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in execute(self, object, *multiparams, **params)
    660                                                 object,
    661                                                 multiparams,
--> 662                                                 params)
    663         else:
    664             raise exc.InvalidRequestError(

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in _execute_clauseelement(self, elem, multiparams, params)
    752             compiled_sql = elem.compile(
    753                             dialect=dialect, column_keys=keys,
--> 754                             inline=len(distilled_params) > 1)
    755 
    756         ret = self._execute_context(

C:\Anaconda\lib\site-packages\sqlalchemy\sql\expression.pyc in compile(self, bind, dialect, **kw)
   1910             else:
   1911                 dialect = default.DefaultDialect()
-> 1912         return self._compiler(dialect, bind=bind, **kw)
   1913 
   1914     def _compiler(self, dialect, **kw):

C:\Anaconda\lib\site-packages\sqlalchemy\sql\expression.pyc in _compiler(self, dialect, **kw)
   1916         Dialect."""
   1917 
-> 1918         return dialect.statement_compiler(dialect, self, **kw)
   1919 
   1920     def __str__(self):

C:\Anaconda\lib\site-packages\sqlalchemy\dialects\mssql\base.pyc in __init__(self, *args, **kwargs)
    683     def __init__(self, *args, **kwargs):
    684         self.tablealiases = {}
--> 685         super(MSSQLCompiler, self).__init__(*args, **kwargs)
    686 
    687     def visit_now_func(self, fn, **kw):

C:\Anaconda\lib\site-packages\sqlalchemy\sql\compiler.pyc in __init__(self, dialect, statement, column_keys, inline, **kwargs)
    287         # dialect.label_length or dialect.max_identifier_length
    288         self.truncated_names = {}
--> 289         engine.Compiled.__init__(self, dialect, statement, **kwargs)
    290 
    291         if self.positional and dialect.paramstyle == 'numeric':

C:\Anaconda\lib\site-packages\sqlalchemy\engine\interfaces.pyc in __init__(self, dialect, statement, bind, compile_kwargs)
    785             self.statement = statement
    786             self.can_execute = statement.supports_execution
--> 787             self.string = self.process(self.statement, **compile_kwargs)
    788 
    789     @util.deprecated("0.7", ":class:`.Compiled` objects now compile "

C:\Anaconda\lib\site-packages\sqlalchemy\engine\interfaces.pyc in process(self, obj, **kwargs)
    804 
    805     def process(self, obj, **kwargs):
--> 806         return obj._compiler_dispatch(self, **kwargs)
    807 
    808     def __str__(self):

C:\Anaconda\lib\site-packages\sqlalchemy\sql\visitors.pyc in _compiler_dispatch(self, visitor, **kw)
     72 
     73             def _compiler_dispatch(self, visitor, **kw):
---> 74                 return getter(visitor)(self, **kw)
     75         else:
     76             # The optimization opportunity is lost for this case because the

C:\Anaconda\lib\site-packages\sqlalchemy\dialects\mssql\base.pyc in visit_select(self, select, **kwargs)
    745             # to use ROW_NUMBER(), an ORDER BY is required.
    746             if not select._order_by_clause.clauses:
--> 747                 raise exc.CompileError('MSSQL requires an order_by when '
    748                                               'using an offset.')
    749 

CompileError: MSSQL requires an order_by when using an offset.
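A workaround that sidesteps the compile error is to pass find() an explicit order_by naming a column the table actually has, since all() pages with the default order_by='id' ('RecordID' below is a placeholder column name):

# Page through the table ordered on a real column so MSSQL gets the
# ORDER BY it requires for OFFSET queries.
rows = [x for x in table.find(order_by='RecordID')]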

Creating a table raises a DatasetException

Hi,

When I attempt to create a table:

conn = dataset.connect('sqlite:///:memory:')
table = conn.get_table("test_table", primary_id='id', primary_type="Text")

I get the following exception:

DatasetException: The primary_type has to be either 'Integer' or 'Text'.

Hmm, maybe it's because I'm importing unicode_literals.

table = conn.get_table("test_table", primary_id='id', primary_type=str("Text"))

Still the same exception.

In Database.create_table, where this exception is raised:

if primary_type is 'Integer':
    #snip
elif primary_type is 'Text':
    #snip
else:
    raise DatasetException()

Within the create_table function...

>>> primary_type
'Text'
>>> primary_type is 'Text'
False
>>> primary_type == 'Text'
True

is tests identity, not simple equality. The default case of primary_type = 'Integer' works just fine due to interning; defining the string outside of the function scope (or importing unicode_literals, as above) causes an issue. Am I doing something wrong here, or is this a bug?
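For reference, a sketch of the equality-based check the analysis above argues for, mirroring the fragment from Database.create_table with the branch bodies still elided:

if primary_type == 'Integer':
    pass  # snip
elif primary_type == 'Text':
    pass  # snip
else:
    raise DatasetException(
        "The primary_type has to be either 'Integer' or 'Text'.")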

Read default URI from environment

We could read a default environment variable, such as DATABASE_URI when connect() is called with no arguments. This way dataset-based applications would have a uniform mechanism for their configuration.
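A sketch of the proposed behaviour (DATABASE_URI as suggested above; the wrapper name and the sqlite fallback are illustrative only):

import os
import dataset

def connect_default(url=None):
    # Fall back to the DATABASE_URI environment variable when no URL is
    # given, so deployments configure the database uniformly.
    if url is None:
        url = os.environ.get('DATABASE_URI', 'sqlite:///:memory:')
    return dataset.connect(url)

db = connect_default()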

Foreign Keys? (Question)

Hi!

This is not an issue, but a question.
First of all: this is a great approach. Very simple to use, built on SQLAlchemy. I love it!

Do you have a concept in mind for how to extend the project with foreign keys?

Connecting to DB with many tables is slow

I like the Pythonic interface dataset provides. When I connect to a MySQL DB with many (thousands of) tables, it seems to load all the table names upon connection, so it takes a long time.

db = dataset.connect('mysql+pymysql://user:pass@host/db')

I didn't see any way in the docs to just connect and then perform operations on a DB without it taking time to try to cache those bits of metadata. Any suggestions?

More searchable name

I'm taking a hard look at dataset, but after misplacing my link to it, I found that the terms I used to search for it on Google did not lead me back, despite several variations. This is an interesting project; I hope it can be made easier to find.

traceback on 'import dataset'

Just cloned trunk and ran pip install -e .
An error in sqlalchemy-migrate prevents the import:

import dataset
...
.../python2.7/site-packages/migrate/changeset/ansisql.py
---> 10 from sqlalchemy.schema import SchemaVisitor

ImportError: cannot import name SchemaVisitor

It seems it has moved:

from sqlalchemy.sql.base import SchemaVisitor

Stale table cache causes incorrect results

The following code snippet demonstrates the issue:

import dataset

db = dataset.connect('sqlite:///:memory:')

# We create test_table and insert a single row into it
t1 = db.get_table('test_table')
t1.insert(dict(key='value'))

# We access the table with a different Table object and we cannot see the
# property named key that we added above
t2 = db.get_table('test_table')
print list(t2.all()) # [OrderedDict([(u'id', 1)])]

The problem occurs only when accessing the data through two different Table objects.

PS: I cheated a little bit because I added a break at the end of the loop in Table.find; without it, Table.all() never returns. However, the problem described above appears in the version uploaded to PyPI too.

Returning as dict

I would really like it if objects were returned instead of dictionaries; it would make looping over the results a lot easier.

Missing MSSQL dialect?

I'm trying to interface with an MSSQL database and insert data into a table, but I get a KeyError in visitor.py.

Maybe I'm not using it right, or maybe it's a bug?

Redshift compatibility

Are there any plans to make dataset compatible with Redshift? Its SQL is a bit more limited than Postgres's, but it should work for the most part. It would be a great addition to dataset.

StopIteration exception propagating on Table.all()

Here is a relevant stacktrace:

    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/tornado/web.py", line 1218, in _when_complete
        callback()
      File "/usr/local/lib/python2.7/dist-packages/tornado/web.py", line 1239, in _execute_method
        self._when_complete(method(*self.path_args, **self.path_kwargs),
      File "/usr/local/lib/python2.7/dist-packages/tornado_json/utils.py", line 60, in _wrapper
        output = rh_method(self, input_)
      File "/usr/local/lib/python2.7/dist-packages/tornado/web.py", line 2395, in wrapper
        return method(self, *args, **kwargs)
      File "/home/hamza/gitr/cutthroat-server/src/cutthroat/api/room.py", line 128, in get
        return self.db_conn.list_rooms()
      File "/home/hamza/gitr/cutthroat-server/src/cutthroat/db.py", line 390, in list_rooms
        } for r in self.db['rooms'].all()
      File "/usr/local/lib/python2.7/dist-packages/dataset-0.5.0-py2.7.egg/dataset/persistence/table.py", line 398, in all
        return self.find()
      File "/usr/local/lib/python2.7/dist-packages/dataset-0.5.0-py2.7.egg/dataset/persistence/table.py", line 355, in find
        return ResultIter((self.database.executable.execute(q) for q in queries))
      File "/usr/local/lib/python2.7/dist-packages/dataset-0.5.0-py2.7.egg/dataset/persistence/util.py", line 30, in __init__
        raise StopIteration
    StopIteration

This started occurring in 0.5.0; 0.4.0 was not affected.

CSV import with type detection

It would be nice if dataset made loading data from CSV easy too. Currently, you need to go via the csv.DictReader boilerplate to get things into a database.

Perhaps I am whinging too much about needing to insert this code into the start of my IPython Notebooks?

with open('yummy-data.csv') as f:
    headers = csv.reader(f).next()

records = []

with open('yummy-data.csv') as f:
    reader = csv.DictReader(f, fieldnames=headers)
    for i, row in enumerate(reader):
        if i > 0:
            records.append(row)
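For comparison, csv.DictReader already consumes the header row by itself, so a hypothetical built-in loader might boil down to little more than this (type detection would sit inside the loop):

import csv
import dataset

db = dataset.connect('sqlite:///:memory:')
table = db['yummy']

# DictReader takes the first row as field names, so each subsequent row
# arrives as a dict ready for insert().
with open('yummy-data.csv') as f:
    for row in csv.DictReader(f):
        table.insert(row)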

Unable to run the test suite

After running python setup.py test -s test, all the tests fail. I get the same stack trace in all test cases:

ERROR: test_upsert_all_key (test.test_persistence.TableTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/grn/p/dataset/test/test_persistence.py", line 125, in setUp
    self.tbl.insert(row)
  File "/home/grn/p/dataset/dataset/persistence/table.py", line 74, in insert
    res = self.database.executable.execute(self.table.insert(row))
  File "/home/grn/.virtualenvs/dataset/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1598, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/home/grn/.virtualenvs/dataset/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 664, in execute
    return meth(self, multiparams, params)
  File "/home/grn/.virtualenvs/dataset/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 282, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/grn/.virtualenvs/dataset/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 761, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/home/grn/.virtualenvs/dataset/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 828, in _execute_context
    None, None)
  File "/home/grn/.virtualenvs/dataset/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1023, in _handle_dbapi_exception
    exc_info
  File "/home/grn/.virtualenvs/dataset/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 185, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/home/grn/.virtualenvs/dataset/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 824, in _execute_context
    context = constructor(dialect, self, conn, *args)
  File "/home/grn/.virtualenvs/dataset/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 525, in _init_compiled
    param.append(processors[key](compiled_params[key]))
  File "/home/grn/.virtualenvs/dataset/local/lib/python2.7/site-packages/sqlalchemy/sql/type_api.py", line 853, in process
    return process_param(value, dialect)
  File "/home/grn/p/dataset/dataset/persistence/util.py", line 67, in process_bind_param
    return (value / 1000 - self.epoch).total_seconds()
StatementError: unsupported operand type(s) for /: 'datetime.datetime' and 'int' (original cause: TypeError: unsupported operand type(s) for /: 'datetime.datetime' and 'int') u'INSERT INTO weather (date, place, temperature) VALUES (?, ?, ?)' []

I saw an old issue related to SQLite and datetime that introduced a fix for this database. Interestingly, the tests almost pass after commenting out the fix; only test_find hangs.

I'm on Debian Jessie (+ backports), Python 2.7.6, SQLite 3.8.2, SQLAlchemy 0.9.1.

freeze as a JSON serializable object?

I saw that the freeze API docs describe how to export a result set into files.

However, I am wondering whether, building on this, we could also support exporting a result set into a JSON-serializable object?

Currently we don't support this feature in dataset, do we?
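In the meantime, since rows come back as ordered dicts, a JSON-serializable structure is one comprehension away (datetime values would still need a custom encoder):

import json
import dataset

db = dataset.connect('sqlite:///:memory:')
db['posts'].insert({'title': 'hello'})

# Each row behaves like a dict, so the result set serializes directly.
serializable = [dict(row) for row in db['posts'].all()]
print(json.dumps(serializable))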

Relational model

Congrats on this work, which is looking good!
I'm particularly interested in the automatic schema feature, but do you plan to deal with one-to-many and many-to-many relationships in the future?
Thanks,

all()

Table.all() is a no argument shortcut to find() which in turn has the signature:

find(_limit=None, _offset=0, _step=5000, order_by='id', **filter)

Any particular reason the order_by='id' argument is there? I realize in the ORM world 'id' is almost taken to be a given, but if you are running against many databases that is not the case. You then get the following error:

con["cell_per"].all()

KeyError                                  Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 con["cell_per"].all()

/usr/local/lib/python2.7/site-packages/dataset/persistence/table.pyc in all(self)
    333
    334         rows = table.all()"""
--> 335         return self.find()
    336
    337     def __iter__(self):

/usr/local/lib/python2.7/site-packages/dataset/persistence/table.pyc in find(self, _limit, _offset, _step, order_by, **filter)
    268         if isinstance(order_by, (str, unicode)):
    269             order_by = [order_by]
--> 270         order_by = [self._args_to_order_by(o) for o in order_by]
    271
    272         args = self._args_to_clause(filter)

/usr/local/lib/python2.7/site-packages/dataset/persistence/table.pyc in _args_to_order_by(self, order_by)
    239             return self.table.c[order_by[1:]].desc()
    240         else:
--> 241             return self.table.c[order_by].asc()
    242
    243     def find(self, _limit=None, _offset=0, _step=5000,

/usr/local/lib64/python2.7/site-packages/sqlalchemy/util/_collections.pyc in __getitem__(self, key)
    152
    153     def __getitem__(self, key):
--> 154         return self._data[key]
    155
    156     def __delitem__(self, key):

KeyError: 'id'
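A workaround until the default changes is to name a column the table actually has ('category' is borrowed from the hstore issue's query above):

# Order on a real column so the default order_by='id' lookup never runs.
rows = list(con['cell_per'].find(order_by='category'))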

Case-(in)sensitivity: dataset tries to create existing columns

Consider the following:

A column named column exists in a table named table. I try to insert a row into table, but I accidentally capitalize the name as Column because I forgot it was lowercase. dataset will then try to create Column in table, because in the relevant check, set(["Column"]) - set([u'column']) still results in set(["Column"]): dataset is case-sensitive here.

Now, the issue is that since MySQL is case-INsensitive (for columns at least, on all platforms), dataset will try to create Column, but MySQL will throw an OperationalError because such a column, called column instead of Column, already exists.

I was wondering what your take on this is. I think dataset should handle this in a case-insensitive way (maybe sanitize all column names by .lower()ing them first?).

I also realize this wouldn't happen if the programmer were paying attention and remembered the naming convention being followed, but even so, dataset should somehow avoid this situation so the programmer never has to see an OperationalError.
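A sketch of the .lower() normalisation suggested above, applied on the calling side for now (normalize_row is a hypothetical helper):

import dataset

def normalize_row(row):
    # Lower-case every key so column matching is as case-insensitive in
    # Python as it is in MySQL.
    return {key.lower(): value for key, value in row.items()}

db = dataset.connect('sqlite:///:memory:')
db['table'].insert(normalize_row({'Column': 'value'}))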

Can't handle multibyte chars correctly

Hi. Thank you for creating this nice library.

I've run into a multibyte problem.

If MySQL's charset is utf8, SQLAlchemy's database connection URL should be set like the following:
mysql://foo:foo_password@localhost/bar_table?charset=utf8&use_unicode=0

Currently, database.py ignores the URL query string, so dataset can't handle multibyte characters correctly: SQLAlchemy converts multibyte characters to ? automatically.

--- database.py.orig    2013-11-16 00:55:53.000000000 +0900
+++ database.py 2013-11-16 00:55:46.000000000 +0900
@@ -24,6 +24,7 @@
             kw['poolclass'] = NullPool
         self.lock = threading.RLock()
         self.local = threading.local()
+        original_url = url
         if '?' in url:
             url, query = url.split('?', 1)
             query = parse_qs(query)
@@ -33,7 +34,7 @@
                 if len(schema_qs):
                     schema = schema_qs.pop()
         self.schema = schema
-        engine = create_engine(url, **kw)
+        engine = create_engine(original_url, **kw)
         self.url = url
         self.engine = construct_engine(engine)
         self.metadata = MetaData(schema=schema)

Is there any reason to ignore the URL query string? IMHO dataset should not ignore it, so that multibyte data is handled correctly.

Regards.

I need an option to stop the default "id" column being created on new tables

I'm dumping some data into a database. It's my first time using dataset. I try this…

import dataset

dt = dataset.connect('sqlite:///:memory:')
table = dt['test']
table.insert({"id": "bv56fzi", "name": "This will fail"})

And I get an exception…

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/dataset/persistence/table.py", line 70, in insert
    res = self.database.executable.execute(self.table.insert(row))
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 1614, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 662, in execute
    params)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 761, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 874, in _execute_context
    context)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 1024, in _handle_dbapi_exception
    exc_info
  File "/Library/Python/2.7/site-packages/sqlalchemy/util/compat.py", line 163, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 867, in _execute_context
    context)
  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/default.py", line 324, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (IntegrityError) datatype mismatch u'INSERT INTO test (id, name) VALUES (?, ?)' ('bv56fzi', 'This will fail')

It took me a while to work out that my id column was conflicting with the id INTEGER PRIMARY KEY created by dataset when the table was made.

I'm not sure whether the id is being created out of necessity (because you can't create a table with no columns) or out of design (because most of your users want an auto-incrementing id column and the default rowid is too obscure?).

Either way, it's a shame, because it meant there was seemingly no way for me to use dataset in this particular project. I ended up using DumpTruck, which I'd really rather not do.

Are there plans to allow creation of a table without the automatic id INTEGER PRIMARY KEY column?
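For what it's worth, the documented create_table() options may offer a workaround: make the string column itself the primary key, so nothing clashes with the automatic auto-increment id (a sketch; see also the primary_type issue above):

import dataset

db = dataset.connect('sqlite:///:memory:')

# Declare the string column as the primary key instead of letting dataset
# add its default id INTEGER PRIMARY KEY.
table = db.create_table('test', primary_id='id', primary_type='Text')
table.insert({"id": "bv56fzi", "name": "This should now work"})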
