
fe's Introduction

About

py-postgresql is a Python 3 package providing modules for working with PostgreSQL; primarily, a high-level driver for querying databases.

While py-postgresql is still usable for many purposes, asyncpg and PostgREST are likely more suitable for most applications:

py-postgresql does not currently have direct support for the high-level async interfaces provided by recent versions of Python. Future versions may change this.

Advisory

Many potentially breaking changes are planned for v2.0. If you have automated installations using PyPI, make sure that they specify a major version.

  • Support for older versions of PostgreSQL and Python will be removed. This will allow the driver to defer version parsing, fixing (#109), and better prepare for future versions.
  • The connection establishment strategy will be simplified to perform only one attempt. The sslmode parameter should be considered deprecated; v1.4 will provide a new security parameter implying sslmode=require. See (#122) and (#75).
  • StoredProcedure will be removed. See (#80).

Installation

From PyPI using pip:

python3 -m pip install py-postgresql

From GitHub using a full clone:

git clone https://github.com/python-postgres/fe.git
cd fe
python3 ./setup.py install

Basic Usage

import postgresql
db = postgresql.open('pq://user:password@host:port/database')

get_table = db.prepare("SELECT * from information_schema.tables WHERE table_name = $1")
print(get_table("tables"))

# Streaming, in a transaction.
with db.xact():
	for x in get_table.rows("tables"):
		print(x)

REPL with a connection bound to the db builtin:

python3 -m postgresql.bin.pg_python -I 'pq://postgres@localhost:5432/postgres'

fe's People

Contributors

elprans, jwp, mll, rhurlin, stefanor

fe's Issues

date and timestamp infinity broken

datetime.* doesn't support infinity, so a really high date will need to be used.

Should some text compensation be provided for input? Something like: if x == 'infinity': x = DateTimeInfinity?
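
A minimal sketch of the text-side compensation, assuming ISO-formatted input and using datetime's extremes as stand-ins for the infinities (the DateTimeInfinity name above, like everything here, is hypothetical):

import datetime

# datetime.* cannot represent true infinities, so use its extremes
# as sentinels for PostgreSQL's 'infinity'/'-infinity'.
INFINITY = datetime.datetime.max
NEG_INFINITY = datetime.datetime.min

def parse_timestamp(text, parse = datetime.datetime.fromisoformat):
	if text == 'infinity':
		return INFINITY
	if text == '-infinity':
		return NEG_INFINITY
	return parse(text)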

COPY row parser/serializer

Extend postgresql.string to include a means to make COPY rows from a sequence of Python strings or None values, where None represents NULL. For strings, this would mean properly escaping field content.

Also, include a parser to split and normalize the fields of a given serialized COPY row.

This will only support the traditional tab-separated-values format.
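
A sketch of what the pair might look like under the traditional text COPY rules (tab separators, \N for NULL, backslash escapes); the function names here are hypothetical:

import re

# Backslash must be escaped first; dicts preserve insertion order.
_ESCAPES = {'\\': '\\\\', '\t': '\\t', '\n': '\\n', '\r': '\\r'}
_UNESCAPES = {'\\\\': '\\', '\\t': '\t', '\\n': '\n', '\\r': '\r'}

def make_copy_row(fields):
	# fields: a sequence of str or None, None representing NULL.
	def escape(s):
		for ch, esc in _ESCAPES.items():
			s = s.replace(ch, esc)
		return s
	return '\t'.join(
		'\\N' if f is None else escape(f) for f in fields
	) + '\n'

def parse_copy_row(row):
	# Split a serialized COPY row and normalize its fields.
	def unescape(s):
		if s == '\\N':
			return None
		return re.sub(r'\\[\\tnr]', lambda m: _UNESCAPES[m.group(0)], s)
	return [unescape(f) for f in row.rstrip('\n').split('\t')]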

Formalize Type I/O Customization Interfaces

Currently, type I/O is undocumented and in need of some API refactoring.

Also, type I/O on statements and cursors needs to be adjustable.

Difficulties involve cases where the I/O routine may need to depend on backend-specific information like integer_datetimes.

Working with Threads

Add a section to the driver documentation about working with threads. Most drivers offer thread safety, but driver.pq3 does not.

Discuss: one connection per thread, synchronizing using a queue, and explicit locking.
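
For instance, the one-connection-per-thread pattern might look like this sketch (the connection IRI is a placeholder):

import threading
import postgresql

_local = threading.local()

def thread_db(iri = 'pq://user@localhost/database'):
	# driver.pq3 connections are not thread-safe; give each thread
	# its own connection rather than sharing one.
	if not hasattr(_local, 'db'):
		_local.db = postgresql.open(iri)
	return _local.db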

Rewrite pure-Python process_tuple as a factory

When doing tuple processing, it's always against a sequence of rows with the same number of columns. So, let's dynamically generate the code to process the tuples. By doing so, we should be able to get a performance gain in pure-Python cases.

The number of tuple processors will be consistent with the number of statements created; two per statement object.

Subsequently, process_chunk becomes a map() operation.
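
A sketch of the factory: generate a processor specialized on the exact column count, so process_chunk reduces to map() (names hypothetical):

def tuple_processor_factory(column_io):
	# column_io: one decoder callable per column.  The generated
	# lambda touches each column directly, with no inner loop.
	io = tuple(column_io)
	body = ', '.join(
		'io[%d](row[%d])' % (i, i) for i in range(len(io))
	)
	return eval('lambda row: (%s,)' % (body,), {'io': io})

# e.g. process_row = tuple_processor_factory((int, str))
# process_chunk = lambda rows: map(process_row, rows)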

Command tag only extracts first word

Currently, "ALTER TABLE" will come back as "ALTER"; only the first "word" is returned by extract_command().

extract_command() should probably just be self.data.strip(" \t\n0123456789")

DB-API documentation

Refine the recommendation of PG-API, talk about the implemented DB-API extensions, link to the PEP.

Use pkgutil to get package resources

I've seen some people complain about projects using __file__, and that's exactly what py-postgresql uses to get lib*.sql files.

Make a function that will try to hit pkgutil first, and fall back to __file__-based resolution.
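
A sketch of such a helper (the function name is hypothetical; pkgutil.get_data returns None when the loader can't provide the resource):

import os
import pkgutil

def get_resource(package, name):
	# Prefer pkgutil so zipped/frozen installations work.
	data = pkgutil.get_data(package, name)
	if data is not None:
		return data
	# Fall back to __file__-based resolution.
	module = __import__(package, fromlist=['__name__'])
	path = os.path.join(os.path.dirname(module.__file__), name)
	with open(path, 'rb') as f:
		return f.read()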

HTTP Cluster Management Interfaces

postgresql.cluster has made a serious effort to automate the management of PostgreSQL clusters using Python.

The next step is to expose these interfaces so that users may manage their PG data directories using a web interface: Web APIs (JSON and XML-RPC?) and a JavaScript GUI application.

The Web API will need the following actions:

  • Identify an Installation
  • Forget an Installation
  • Initialize a data directory
  • Remove a data directory
  • List data directories
  • Start a cluster
  • Stop a cluster
  • Restart a cluster
  • Reload a cluster
  • Permanently set a setting in a cluster
  • Get a setting in a cluster
  • Get all settings in a cluster
  • Write the HBA file
  • Read the HBA file
  • Create a Database
  • Drop a Database
    ... (database management?)
  • Create Role
  • Drop Role
    ... (role management?)
  • WAL directory relocation (other directories?)
  • Logging Configuration?

Other ideas: cluster configuration templates, configuration restoration (last known working config mgmt).

The GUI part may need to come later, but it would be nice to have something in the first round.

Rewrite cat_messages into buffer_messages

Use a mutable bytes array to manage the outgoing data.

This won't change element3.Message.serialize(), but it should offer some benefits with respect to avoiding reallocation of the out-buffer. Additionally, it would allow a more appropriate outgoing message consumption loop.

buffer_messages(buffer, messages) writes as many messages as possible to the buffer, returning the number written in full. The buffer size will be automatically increased in cases where the first message is too large for the buffer.
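
A sketch of the described contract, assuming each message exposes serialize() -> bytes as element3.Message does (MAX_BUFFER is a hypothetical soft capacity):

MAX_BUFFER = 8192

def buffer_messages(buffer, messages):
	# buffer: a reusable bytearray; returns the number of messages
	# written in full.
	written = 0
	for message in messages:
		data = message.serialize()
		if written and len(buffer) + len(data) > MAX_BUFFER:
			# Full; the caller flushes the buffer and calls again.
			break
		# When the first message alone exceeds MAX_BUFFER, the
		# bytearray simply grows to accommodate it.
		buffer += data
		written += 1
	return written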

Add a lock interface for supporting advisory locks

PostgreSQL's advisory locks offer users a handy tool.

Add a lock interface that is consistent with threading's Lock():

l = db.lock()
Creates a lock object with a generated identifier.

il = db.lock('explicit_id')
Creates a lock object with an explicit identifier.
Will need to derive a numeric ID from the string.

From there, l and il must support the methods provided by Lock():

l.acquire([blocking = 1])
l.release()

Additionally, context manager interfaces:

with l:
...

Where l.__enter__() is consistent with l.acquire(blocking = True).
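
A sketch of how such a lock might sit on top of pg_advisory_lock, with zlib.crc32 standing in for the string-to-numeric-ID derivation (the class and method layout are hypothetical):

import zlib

class AdvisoryLock(object):
	def __init__(self, db, identifier):
		if isinstance(identifier, str):
			# Derive a numeric ID from the string identifier.
			identifier = zlib.crc32(identifier.encode('utf-8'))
		self.db = db
		self.lock_id = identifier

	def acquire(self, blocking = True):
		if blocking:
			self.db.prepare("SELECT pg_advisory_lock($1)")(self.lock_id)
			return True
		r = self.db.prepare("SELECT pg_try_advisory_lock($1)")(self.lock_id)
		return r[0][0]

	def release(self):
		self.db.prepare("SELECT pg_advisory_unlock($1)")(self.lock_id)

	def __enter__(self):
		self.acquire(True)
		return self

	def __exit__(self, typ, val, tb):
		self.release()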

postgresql.cluster does not work on win32

Two options:

  1. Use pg_ctl.
  2. Implement/steal pg_kill (what pg_ctl uses to support Windows).

Part of the ultimate purpose of Cluster() is to provide a postgres daemon manager. Using pg_ctl tends to get in the
way of that: no access to the process result code, no access to any early failure messages on startup(?), and only
indirect access to the PostgreSQL "ping" tool used to wait for startup.

Practical problems implementing pg_kill: what happens when core changes it?
Problems using pg_ctl: difficulty providing quality interfaces (bad past experiences).

The way I see it:

pg_ctl = DBA tool.
postgresql.cluster.Cluster = Python Programmer's tool.

This distinction leads me to think that re-implementing pg_kill will be the right solution.

List the non-fatal socket exceptions

There appear to be many more fatal than non-fatal socket exceptions, so use a list of non-fatal socket exceptions to identify the fatal ones. =(
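
A sketch of the whitelist approach; the errno values listed are common non-fatal cases, not an exhaustive audit:

import errno

# Anything not in this set is treated as fatal to the connection.
NON_FATAL_ERRNOS = frozenset((
	errno.EINTR,        # interrupted system call; retry
	errno.EAGAIN,       # resource temporarily unavailable
	errno.EWOULDBLOCK,  # non-blocking operation would block
))

def is_fatal(exc):
	return getattr(exc, 'errno', None) not in NON_FATAL_ERRNOS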

Add a Library class for streamlining ILF use.

In my own code, I've come across a pattern of ILF use:

project/lib:

  • libfoo.sql
  • foo.py

Where the foo module creates a class, Foo, that references libfoo.sql. Often, some higher-level functionality is needed in addition to the mere collection of queries.

I'm currently thinking of implementing this using a metaclass that resolves the ILF location relative to the .py file of the class' module. Subsequently, we can automatically bind the library to a common location, '.lib', when the Foo class is instantiated against a connection.
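
A sketch of the metaclass idea; the attribute names are hypothetical, and the actual binding would go through the ILF machinery:

import os
import sys

class LibraryMeta(type):
	def __new__(mcs, name, bases, namespace):
		cls = super().__new__(mcs, name, bases, namespace)
		ilf = namespace.get('ilf')
		if ilf is not None:
			# Resolve the ILF location relative to the .py file of
			# the class' module.
			module = sys.modules[cls.__module__]
			cls.ilf_path = os.path.join(
				os.path.dirname(os.path.abspath(module.__file__)), ilf
			)
		return cls

class Foo(metaclass = LibraryMeta):
	ilf = 'libfoo.sql'
	# Instantiating Foo against a connection would then load the
	# library from Foo.ilf_path and bind it at '.lib'.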

Use memoryview more

Seems like it will be a great way to avoid all the memcpy's going on throughout .protocol.

Some problems: port.optimized needs to be taught about them, and some .parse() methods are written expecting bytes().
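
The core of the win, for illustration; slicing a memoryview yields views rather than copies:

data = b'D' + bytes(64)

# bytes slicing copies; memoryview slicing does not.
view = memoryview(data)
header, payload = view[:5], view[5:]

# .parse() methods expecting bytes() can convert at the boundary:
assert bytes(header) == data[:5]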

Generalize options in .clientparameters

Move the core option information in clientparameters into a resource/data module to be referenced by clientparameters.

Currently, option information is built and stored using make_option. With python-dev's apparent direction of moving to argparse, this no longer appears to be an appropriate long-term format.

(Might be wise to consider multi-lingual support for the descriptions.)

Track average message size

In order to properly calibrate fetch sizes and recvsize, some information about what's happening on the wire would be useful. For instance, if we find that a cursor's rows are relatively small on average, we should increase fetchsize in order to reduce the number of syscalls being issued.

The rather low recvsize has proven to be an impediment to performance.
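
A sketch of the tracking side, using an exponential moving average to derive a fetch size (all constants hypothetical):

class MessageSizeTracker(object):
	# Exponentially weighted moving average of observed sizes.
	def __init__(self, alpha = 0.2, initial = 512):
		self.alpha = alpha
		self.average = float(initial)

	def observe(self, size):
		self.average += self.alpha * (size - self.average)

	def fetch_size(self, target_bytes = 64 * 1024):
		# Fetch enough rows to fill roughly target_bytes per round
		# trip, reducing the number of syscalls being issued.
		return max(64, int(target_bytes / self.average))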

Add support for binary hstore

Title says it all.

New file:

postgresql/types/io/pg_hstore.py

Look at 8.5/9.0 send and recv functions to get started.

hstore is in contrib, so tests need to be conditional.
hstore should map to/from a Python dictionary.
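
A sketch of the binary format as the contrib send/recv functions appear to implement it (pair count, then length-prefixed keys and values, with a -1 length marking NULL); verify against the contrib source before relying on it:

import struct

def hstore_pack(mapping):
	parts = [struct.pack('!l', len(mapping))]
	for key, value in mapping.items():
		kb = key.encode('utf-8')
		parts.append(struct.pack('!l', len(kb)) + kb)
		if value is None:
			parts.append(struct.pack('!l', -1))
		else:
			vb = value.encode('utf-8')
			parts.append(struct.pack('!l', len(vb)) + vb)
	return b''.join(parts)

def hstore_unpack(data):
	count, = struct.unpack_from('!l', data, 0)
	offset, result = 4, {}
	for _ in range(count):
		klen, = struct.unpack_from('!l', data, offset)
		offset += 4
		key = data[offset:offset + klen].decode('utf-8')
		offset += klen
		vlen, = struct.unpack_from('!l', data, offset)
		offset += 4
		if vlen == -1:
			result[key] = None
		else:
			result[key] = data[offset:offset + vlen].decode('utf-8')
			offset += vlen
	return result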

Statement References

Implement a symbol definition extension that allows the binding to identify that the symbol's statement is a reference
to the statement that should be executed.

If the symbol's statement takes no parameters, the reference should be resolved at bind time; if the statement does
take parameters, the symbol's execution must be a two step process: resolve the reference using the given parameters,
and apply the remaining parameters to the dereferenced statement.

This will establish a dynamic query policy: SQL should be used to generate dynamic statements.

While it may be easier in Python with the appropriate abstractions, its implementation would require more semantic
knowledge of the statement's structure (execute this Python code in a special context here, here, and there). By
deferring to SQL, it will likely encourage more portable SQL generation.

Name: statement references | symbol pointers | ?

Format: [name&]

Use the '&' to lend toward C's reference operator.
I would like to prepend the character, but I don't want to interfere with grep'ing.

Add "do" method to api.Connection

Pretty easy change to support DO statements:

def do(self, code, language = None):
	self.execute(
		"DO " + quote_literal(code) +
		((" LANGUAGE " + language) if language is not None else "")
	)

Remove 2PC support

v1.0 removed documentation, and subsequent bugfixes will include deprecation warnings.

Queue this up on v1.1's list.

PGGSSLIB unsupported

Use ctypes/libffi to implement libpq's usage of PGGSSLIB.

This is a significant barrier to entry for applications that work with many users.

Chances are that this will be implemented as a separate project and then incorporated into the package tree.

Add .column execution method

Returns an iterator to the values in the first column of the result.

The trick is to alter the processing pipeline to quickly target the first column so that the least amount of time is spent recreating tuples.
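
In its simplest form (ignoring the pipeline retargeting the issue actually asks for), the method reduces to something like:

def column(self, *parameters):
	# Yield only the first column's values; the real implementation
	# would adjust the processing pipeline so the remaining columns
	# are never assembled into tuples at all.
	for row in self.rows(*parameters):
		yield row[0]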

Get rid of compose()

Often, just using the keyword hack, regular functions perform as well as the C compose(), and they are more portable.

Of course, performance checks should be made first.

RELEASE savepoint after ROLLBACK TO

Looks like rolling back to a savepoint doesn't "release" it.

This would only be potentially problematic in cases where xact() is repeatedly entered and exited with an exception. Nested xact()s are fairly safe because rolling back to an earlier savepoint will implicitly release the later ones.

Avoid the element3.Tuple formality

In some recent profiling, it became evident that using element3.Tuple() was rather costly for little functional gain. The only gain it provides is the ability to query the 'type' attribute.

So by special-casing some response message processing, we can just use plain tuple() objects to represent tuple data coming from PG.

Implement and Document Listening Services

Currently the support for Notify messages is pretty well undocumented. It looks like payload support is coming in 8.5, so it will be important for this to be more visible.

There are a few parts to implementing an appropriate set of listening interfaces:

  • Subscribe/Unsubscribe interface (basically, emit [UN]LISTEN statements)?
  • Notification hook management (punting here; single attribute, let user decide the implementation)
  • wait() method for simple event loops

The simple wait() [db.listen()?] method will perform pg_sleep() function calls in order to keep an active protocol transaction. This would allow interrupts to take place, though not reliably (small windows between sleep() calls).
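
A sketch of the subscription side and a naive wait loop; the function names mirror the list above and are speculative, while quote_ident is assumed to come from postgresql.string:

from postgresql.string import quote_ident

def subscribe(db, channel):
	db.execute("LISTEN " + quote_ident(channel))

def unsubscribe(db, channel):
	db.execute("UNLISTEN " + quote_ident(channel))

def wait(db, interval = 1.0):
	# pg_sleep keeps an active protocol transaction so that
	# notifications are delivered; interrupts only land in the
	# windows between sleep() calls.
	sleep = db.prepare("SELECT pg_sleep($1)")
	while True:
		sleep(interval)
		# ... dispatch queued notifications to the hook here ...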

Symbol Descriptions

Symbols don't provide any metadata. While statements can provide much of this information, procedure references require
some specialization with respect to the exposed description.

A full description is needed:

  • parameter types
  • parameter names
  • optional parameters (due to procedure defaults [an 8.4 feature])
  • result types
  • result column names (if applicable; single-column results may not have names)
  • result transformation/finalized type (if any; column/mapping)

How should COPY be described? o_O

I'm thinking along the lines of:

Bs.pg_description, Bs.sql_description

Where each attribute can express type identifiers in a native form...

Statements and procedures are a bit simpler than symbols, so I'm not optimistic about being able to directly reflect
their metadata attributes.

Make ARRAYs less stupid

Primarily, support lower bounds.

Evaluate some of the algorithms in the methods. I think a few need refactoring.

Simplify xact3.Instruction

Much of the state machine is overcomplicated to compensate for 'Q' and 'F' commands. By isolating those in different classes and restricting transactions to a single sync, much of the complexity can simply be removed.

Reform PQ Notice/Error Messages

Currently, the identifiers of notice message parts are being translated into descriptive identifiers; this covers all the parts currently emitted by PG, but will not cover future parts.

PQ Notice messages need to use the single-character part identifier to key the dictionary. Then, at a higher level, it can be translated into something more useful.

postgresql.api.Message instances will also need to carry the original Notice/Error message.
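
For reference, a sketch of the higher-level translation; the single-character keys are the protocol's notice field codes:

# Protocol field codes -> descriptive identifiers; unknown codes
# pass through untranslated instead of being dropped.
NOTICE_FIELDS = {
	'S': 'severity',
	'C': 'code',  # SQLSTATE
	'M': 'message',
	'D': 'detail',
	'H': 'hint',
	'P': 'position',
}

def describe_notice(raw):
	# raw: a dictionary keyed by single-character part identifiers.
	return {
		NOTICE_FIELDS.get(key, key): value
		for key, value in raw.items()
	}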

Dynamically Resolve typio dependencies

Currently, when the typio modules are loaded, decimal, xml.etree, and datetime are loaded immediately. These modules and their dependencies are substantial in size, so it would be beneficial to only load them when the I/O routines are requested.
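
One way to defer the imports, sketched with a registry of loader thunks (names hypothetical):

# The heavy module is only imported when its I/O routine is first
# requested, not when typio itself is loaded.
_loaders = {}
_resolved = {}

def register(name, loader):
	_loaders[name] = loader

def resolve(name):
	if name not in _resolved:
		_resolved[name] = _loaders[name]()
	return _resolved[name]

def _decimal_io():
	import decimal  # deferred until numeric I/O is requested
	return decimal.Decimal

register('numeric', _decimal_io)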

Protocol Transaction Queue

Refactor .protocol.client3.Connection to manage Transactions using a Queue.

Currently, Connections can only handle one Protocol Transaction at a given time. This requires that each Protocol Transaction consumes at least one round trip. With a Queue-based implementation, the Connection should keep track of which Protocol Transactions have had their Protocol Messages sent to the server, and which ones have not. This would allow multiple Protocol Transactions to be started without waiting for prior ones to finish.

Previously (and currently), xact3.Transaction took steps to allow multiple synchronize messages to be sent within a given Protocol Transaction. At one point, the intention was to allow transactions to be composed so that, if grouping were desired, it would be done inside a single Protocol Transaction. However, the difficulty of distinguishing which response messages belong to which Protocol Transaction controller stopped this from being practical, not to mention that it would have required more significant refactoring of the controllers.

Automatic Remoting

Per Hannu's talk,

Add functionality for automatic "remoting" of local Python code.

This feature will depend on a pg-python installation on the remote end.

Zero Copy

A connection-to-connection COPY management object which provides efficient transmission of COPY data.

This object will use a mutable buffer in conjunction with a function that will identify the current COPY state. Multiple target connections will need to be supported. Exceptions thrown by this object's main method will need to be generalized into an exception identifying the target that caused the failure. This is one component allowing the user to craft fault-tolerance capabilities; the other is the ability to dynamically remove a target from the copy manager, which allows the COPY to continue after the fault has been identified as an acceptable risk.

Deprecate gid parameter to db.xact(...)

The gid argument enabled a 2pc transaction. When given a gid, the transaction object required that the prepare() method be used before committing on exit. This was intended to provide an abstraction, but use in any distributed transaction manager is likely to lead to more pain than it's worth. For v1.0, the gid documentation will simply be removed; in 1.1, we'll probably throw a deprecation warning if it's provided. Subsequent versions will just throw a TypeError, as the feature will not exist.
