Comments (5)
You are right about both. Sorry for the late response - really busy at work and 10-month old daughter :)
The reconnect logic was only conceived to reconnect idle connections that were silently dropped by the server - for example, if you are running a lazy query once in an hour; and 30 minutes in the middle the server was restarted.
Trying to implement general retry logic on such a deep level is a muddy water from my experience and that's why I leave it to application developer.
Regarding the hangs - this is bug indeed. It turns out that there even is a simpler reproduction - instead of doing restarts, just shutdown the server and your script would hang.
I've fixed this in master. Can you please have alook?
from momoko.
Oops, forgot to show my complete reproduction script:
$ cat pgrestart.py
#!/usr/bin/python
# What happens when momoko is connected to a database and the postgresql server is
# restarted? momoko should recover, but doesn't.
from __future__ import print_function
import sys
import logging
import random
import time
import psycopg2
from tornado import gen, ioloop
import momoko
log = logging.getLogger('pgrestart')
@gen.coroutine
def test(dsn, io_loop):
log.info('creating pool: %r', dsn)
pool = momoko.Pool(dsn=dsn, size=1, ioloop=io_loop)
yield pool.connect()
log.info('pool: %r', pool)
while True:
log.debug('calling pool.getconn()')
try:
conn = yield pool.getconn()
except psycopg2.DatabaseError:
log.exception('error from pool.getconn()')
time.sleep(1)
continue
log.debug('querying conn %r', conn)
cur = yield conn.execute('select pg_backend_pid()')
pid = cur.fetchall()[0][0]
log.info('pid: %r', pid)
cur.close()
pool.putconn(conn)
time.sleep(random.uniform(0.2, 0.8))
pool.close()
def main():
dsn = sys.argv[1]
logging.basicConfig(format='[%(asctime)s %(levelname)-1.1s %(name)s] %(message)s',
level=logging.DEBUG)
logging.getLogger("momoko").setLevel(logging.INFO)
io_loop = ioloop.IOLoop.instance()
future = test(dsn, io_loop)
io_loop.add_future(future, lambda x: io_loop.stop())
io_loop.start()
main()
If I run this normally, the output is very boring:
python pgrestart.py 'host=localhost dbname=psycopg2_test'
[2016-07-04 16:25:46,807 I pgrestart] creating pool: 'host=localhost dbname=psycopg2_test'
[2016-07-04 16:25:46,819 I pgrestart] pool: <momoko.connection.Pool object at 0x7f55c52ffc50>
[2016-07-04 16:25:46,819 D pgrestart] calling pool.getconn()
[2016-07-04 16:25:46,820 D pgrestart] querying conn <momoko.connection.Connection object at 0x7f55bdec27d0>
[2016-07-04 16:25:46,821 I pgrestart] pid: 29496
[2016-07-04 16:25:47,528 D pgrestart] calling pool.getconn()
[2016-07-04 16:25:47,530 D pgrestart] querying conn <momoko.connection.Connection object at 0x7f55bdec27d0>
[2016-07-04 16:25:47,531 I pgrestart] pid: 29496
[...repeat forever...]
But if I start it running, and then in other terminal I run sudo service postgresql restart
, here's what I get almost every time:
[2016-07-04 16:26:52,676 D pgrestart] calling pool.getconn()
[2016-07-04 16:26:52,682 D pgrestart] querying conn <momoko.connection.Connection object at 0x7f020b16a7d0>
[2016-07-04 16:26:52,683 I pgrestart] pid: 29510
[2016-07-04 16:26:53,445 D pgrestart] calling pool.getconn()
[2016-07-04 16:26:53,448 E pgrestart] error from pool.getconn()
Traceback (most recent call last):
File "pgrestart.py", line 32, in test
conn = yield pool.getconn()
File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
DatabaseNotAvailable: No database connection available
[2016-07-04 16:26:54,450 D pgrestart] calling pool.getconn()
[2016-07-04 16:26:54,470 D pgrestart] querying conn <momoko.connection.Connection object at 0x7f020b16a7d0>
[2016-07-04 16:26:54,471 I pgrestart] pid: 29543
That is, one query fails while PG is down, and then we're back in business with a new backend. I'd love it if momoko hid that and did the necessary retries for me, but that's not the serious problem.
Here is the serious problem, which I see occasionally:
[2016-07-04 16:27:12,581 D pgrestart] calling pool.getconn()
[2016-07-04 16:27:12,583 D pgrestart] querying conn <momoko.connection.Connection object at 0x7f020b16a7d0>
[2016-07-04 16:27:12,584 I pgrestart] pid: 29634
[2016-07-04 16:27:13,087 D pgrestart] calling pool.getconn()
[2016-07-04 16:27:13,088 E pgrestart] error from pool.getconn()
Traceback (most recent call last):
File "pgrestart.py", line 32, in test
conn = yield pool.getconn()
File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
DatabaseNotAvailable: No database connection available
[2016-07-04 16:27:14,089 D pgrestart] calling pool.getconn()
[2016-07-04 16:27:14,090 E tornado.application] Exception in callback <functools.partial object at 0x7f020a8fa890>
Traceback (most recent call last):
File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/ioloop.py", line 600, in _run_callback
ret = callback()
File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/home/data/src/momoko/momoko/connection.py", line 579, in on_connection_available
conn = fut.result()
File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
DatabaseNotAvailable: No database connection available
...and that's it. My app is dead, frozen, hung. Nothing happens until I kill it and restart it.
Note that I can greatly increase the frequency of hangs by decreasing the sleep after "error from pool.getconn()". Eg. with a 1 sec sleep as shown, roughly 1 in 10 restarts result in a hung process. But if I reduce the sleep to 0.1 sec:
--- a/pgrestart.py
+++ b/pgrestart.py
@@ -32,7 +32,7 @@ def test(dsn, io_loop):
conn = yield pool.getconn()
except psycopg2.DatabaseError:
log.exception('error from pool.getconn()')
- time.sleep(1)
+ time.sleep(0.1)
continue
log.debug('querying conn %r', conn)
then my process hangs almost every single time. It's dead easy to reproduce.
from momoko.
And here is the output of running pgrestart.py
with momoko debugging cranked up:
[2016-07-04 16:31:47,029 D pgrestart] calling pool.getconn()
[2016-07-04 16:31:47,029 D momoko] Acquired free connection 8
[2016-07-04 16:31:47,030 D momoko] Obtained connection: 8
[2016-07-04 16:31:47,031 D pgrestart] querying conn <momoko.connection.Connection object at 0x7fe91745a7d0>
[2016-07-04 16:31:47,032 I pgrestart] pid: 29784
[2016-07-04 16:31:47,032 D momoko] About to release connection 8
[2016-07-04 16:31:47,033 D momoko] The connection is alive
[2016-07-04 16:31:47,033 D momoko] Handling free connection 8
[2016-07-04 16:31:47,033 D momoko] No outstanding requests - adding to free pool
[2016-07-04 16:31:47,611 D pgrestart] calling pool.getconn()
[2016-07-04 16:31:47,611 D momoko] Acquired free connection 8
[2016-07-04 16:31:47,611 D momoko] Obtained connection: 8
[2016-07-04 16:31:47,612 D momoko] Method failed Asynchronously
[2016-07-04 16:31:47,612 D momoko] About to release connection 8
[2016-07-04 16:31:47,612 D momoko] The connection is dead
[2016-07-04 16:31:47,612 D momoko] All connections are dead - aborting waiting queue
[2016-07-04 16:31:47,612 E pgrestart] error from pool.getconn()
Traceback (most recent call last):
File "pgrestart.py", line 32, in test
conn = yield pool.getconn()
File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/gen.py", line 1008, in run
value = future.result()
File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
DatabaseNotAvailable: No database connection available
[2016-07-04 16:31:47,713 D pgrestart] calling pool.getconn()
[2016-07-04 16:31:47,713 D momoko] All connections are dead
[2016-07-04 16:31:47,715 D momoko] Adding dead connection
[2016-07-04 16:31:47,716 E tornado.application] Exception in callback <functools.partial object at 0x7fe916be8520>
Traceback (most recent call last):
File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/ioloop.py", line 600, in _run_callback
ret = callback()
File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/home/data/src/momoko/momoko/connection.py", line 579, in on_connection_available
conn = fut.result()
File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/concurrent.py", line 232, in result
raise_exc_info(self._exc_info)
File "<string>", line 3, in raise_exc_info
DatabaseNotAvailable: No database connection available
from momoko.
Oh yeah, I'm seeing this on:
- Ubuntu 15.10
- system package postgresql-9.4 9.4.8-0ubuntu0.15.10
- system package python 2.7.9-1 (but
python --version
reports 2.7.10: weird!) - system package python-psycopg2 2.6.1-1build1
- latest git master of momoko (commit 678f494)
from momoko.
And thanks for thorough reproduction (as usual :)
from momoko.
Related Issues (20)
- momoko makes IOError in ioloop HOT 3
- Behavior of .execute() wo parameters does not match psycopg2 HOT 1
- code in ConnectionContainer.all_dead conflicts with Pool.putconn HOT 4
- The future returned by Connection.execute will never resolve HOT 1
- self.conns.empty() in Pool.close will not work as expected by big chance HOT 3
- How can I run the first sample of momoko HOT 6
- Pool fails to reconnect when using Unix-domain socket (OperationalError: server closed the connection unexpectedly) HOT 18
- Momoko Pool trying to use dead connections HOT 3
- Writing tests for tornado app, which uses momoko for postgres DB. HOT 2
- Cannot add momoko HOT 1
- Please help me figure out why this code freezes forever HOT 8
- Any suggestion on how I'd commit/rollback using a connection pool? HOT 3
- when using RealDictCursor, Pool.register_json error HOT 2
- Typo in connection.py (socker instead of socket) HOT 2
- error if trying to connect pool while Postgress is not online HOT 2
- NamedTupleCursor factory test is executed even if MOMOKO_TEST_HSTORE is disabled HOT 1
- Debian packaging HOT 1
- AttributeError: 'Future' object has no attribute 'set_exc_info' HOT 5
- Not worked with python 3.7 HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from momoko.