Code Monkey home page Code Monkey logo

Comments (5)

haizaar avatar haizaar commented on July 22, 2024 1

You are right about both. Sorry for the late response - really busy at work and 10-month old daughter :)

The reconnect logic was only conceived to reconnect idle connections that were silently dropped by the server - for example, if you are running a lazy query once in an hour; and 30 minutes in the middle the server was restarted.
Trying to implement general retry logic on such a deep level is a muddy water from my experience and that's why I leave it to application developer.

Regarding the hangs - this is bug indeed. It turns out that there even is a simpler reproduction - instead of doing restarts, just shutdown the server and your script would hang.

I've fixed this in master. Can you please have alook?

from momoko.

gward avatar gward commented on July 22, 2024

Oops, forgot to show my complete reproduction script:

$ cat pgrestart.py
#!/usr/bin/python

# What happens when momoko is connected to a database and the postgresql server is
# restarted? momoko should recover, but doesn't.

from __future__ import print_function

import sys
import logging
import random
import time

import psycopg2

from tornado import gen, ioloop
import momoko

log = logging.getLogger('pgrestart')


@gen.coroutine
def test(dsn, io_loop):
    log.info('creating pool: %r', dsn)
    pool = momoko.Pool(dsn=dsn, size=1, ioloop=io_loop)
    yield pool.connect()
    log.info('pool: %r', pool)

    while True:
        log.debug('calling pool.getconn()')
        try:
            conn = yield pool.getconn()
        except psycopg2.DatabaseError:
            log.exception('error from pool.getconn()')
            time.sleep(1)
            continue

        log.debug('querying conn %r', conn)
        cur = yield conn.execute('select pg_backend_pid()')
        pid = cur.fetchall()[0][0]
        log.info('pid: %r', pid)
        cur.close()
        pool.putconn(conn)

        time.sleep(random.uniform(0.2, 0.8))

    pool.close()


def main():
    dsn = sys.argv[1]

    logging.basicConfig(format='[%(asctime)s %(levelname)-1.1s %(name)s] %(message)s',
                        level=logging.DEBUG)
    logging.getLogger("momoko").setLevel(logging.INFO)

    io_loop = ioloop.IOLoop.instance()
    future = test(dsn, io_loop)
    io_loop.add_future(future, lambda x: io_loop.stop())
    io_loop.start()


main()

If I run this normally, the output is very boring:

python pgrestart.py 'host=localhost dbname=psycopg2_test'
[2016-07-04 16:25:46,807 I pgrestart] creating pool: 'host=localhost dbname=psycopg2_test'
[2016-07-04 16:25:46,819 I pgrestart] pool: <momoko.connection.Pool object at 0x7f55c52ffc50>
[2016-07-04 16:25:46,819 D pgrestart] calling pool.getconn()
[2016-07-04 16:25:46,820 D pgrestart] querying conn <momoko.connection.Connection object at 0x7f55bdec27d0>
[2016-07-04 16:25:46,821 I pgrestart] pid: 29496
[2016-07-04 16:25:47,528 D pgrestart] calling pool.getconn()
[2016-07-04 16:25:47,530 D pgrestart] querying conn <momoko.connection.Connection object at 0x7f55bdec27d0>
[2016-07-04 16:25:47,531 I pgrestart] pid: 29496
[...repeat forever...]

But if I start it running, and then in other terminal I run sudo service postgresql restart, here's what I get almost every time:

[2016-07-04 16:26:52,676 D pgrestart] calling pool.getconn()
[2016-07-04 16:26:52,682 D pgrestart] querying conn <momoko.connection.Connection object at 0x7f020b16a7d0>
[2016-07-04 16:26:52,683 I pgrestart] pid: 29510
[2016-07-04 16:26:53,445 D pgrestart] calling pool.getconn()
[2016-07-04 16:26:53,448 E pgrestart] error from pool.getconn()
Traceback (most recent call last):
  File "pgrestart.py", line 32, in test
    conn = yield pool.getconn()
  File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/gen.py", line 1008, in run
    value = future.result()
  File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/concurrent.py", line 232, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
DatabaseNotAvailable: No database connection available
[2016-07-04 16:26:54,450 D pgrestart] calling pool.getconn()
[2016-07-04 16:26:54,470 D pgrestart] querying conn <momoko.connection.Connection object at 0x7f020b16a7d0>
[2016-07-04 16:26:54,471 I pgrestart] pid: 29543

That is, one query fails while PG is down, and then we're back in business with a new backend. I'd love it if momoko hid that and did the necessary retries for me, but that's not the serious problem.

Here is the serious problem, which I see occasionally:

[2016-07-04 16:27:12,581 D pgrestart] calling pool.getconn()
[2016-07-04 16:27:12,583 D pgrestart] querying conn <momoko.connection.Connection object at 0x7f020b16a7d0>
[2016-07-04 16:27:12,584 I pgrestart] pid: 29634
[2016-07-04 16:27:13,087 D pgrestart] calling pool.getconn()
[2016-07-04 16:27:13,088 E pgrestart] error from pool.getconn()
Traceback (most recent call last):
  File "pgrestart.py", line 32, in test
    conn = yield pool.getconn()
  File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/gen.py", line 1008, in run
    value = future.result()
  File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/concurrent.py", line 232, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
DatabaseNotAvailable: No database connection available
[2016-07-04 16:27:14,089 D pgrestart] calling pool.getconn()
[2016-07-04 16:27:14,090 E tornado.application] Exception in callback <functools.partial object at 0x7f020a8fa890>
Traceback (most recent call last):
  File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/ioloop.py", line 600, in _run_callback
    ret = callback()
  File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/data/src/momoko/momoko/connection.py", line 579, in on_connection_available
    conn = fut.result()
  File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/concurrent.py", line 232, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
DatabaseNotAvailable: No database connection available

...and that's it. My app is dead, frozen, hung. Nothing happens until I kill it and restart it.

Note that I can greatly increase the frequency of hangs by decreasing the sleep after "error from pool.getconn()". Eg. with a 1 sec sleep as shown, roughly 1 in 10 restarts result in a hung process. But if I reduce the sleep to 0.1 sec:

--- a/pgrestart.py
+++ b/pgrestart.py
@@ -32,7 +32,7 @@ def test(dsn, io_loop):
             conn = yield pool.getconn()
         except psycopg2.DatabaseError:
             log.exception('error from pool.getconn()')
-            time.sleep(1)
+            time.sleep(0.1)
             continue

         log.debug('querying conn %r', conn)

then my process hangs almost every single time. It's dead easy to reproduce.

from momoko.

gward avatar gward commented on July 22, 2024

And here is the output of running pgrestart.py with momoko debugging cranked up:

[2016-07-04 16:31:47,029 D pgrestart] calling pool.getconn()
[2016-07-04 16:31:47,029 D momoko] Acquired free connection 8
[2016-07-04 16:31:47,030 D momoko] Obtained connection: 8
[2016-07-04 16:31:47,031 D pgrestart] querying conn <momoko.connection.Connection object at 0x7fe91745a7d0>
[2016-07-04 16:31:47,032 I pgrestart] pid: 29784
[2016-07-04 16:31:47,032 D momoko] About to release connection 8
[2016-07-04 16:31:47,033 D momoko] The connection is alive
[2016-07-04 16:31:47,033 D momoko] Handling free connection 8
[2016-07-04 16:31:47,033 D momoko] No outstanding requests - adding to free pool
[2016-07-04 16:31:47,611 D pgrestart] calling pool.getconn()
[2016-07-04 16:31:47,611 D momoko] Acquired free connection 8
[2016-07-04 16:31:47,611 D momoko] Obtained connection: 8
[2016-07-04 16:31:47,612 D momoko] Method failed Asynchronously
[2016-07-04 16:31:47,612 D momoko] About to release connection 8
[2016-07-04 16:31:47,612 D momoko] The connection is dead
[2016-07-04 16:31:47,612 D momoko] All connections are dead - aborting waiting queue
[2016-07-04 16:31:47,612 E pgrestart] error from pool.getconn()
Traceback (most recent call last):
  File "pgrestart.py", line 32, in test
    conn = yield pool.getconn()
  File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/gen.py", line 1008, in run
    value = future.result()
  File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/concurrent.py", line 232, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
DatabaseNotAvailable: No database connection available
[2016-07-04 16:31:47,713 D pgrestart] calling pool.getconn()
[2016-07-04 16:31:47,713 D momoko] All connections are dead
[2016-07-04 16:31:47,715 D momoko] Adding dead connection
[2016-07-04 16:31:47,716 E tornado.application] Exception in callback <functools.partial object at 0x7fe916be8520>
Traceback (most recent call last):
  File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/ioloop.py", line 600, in _run_callback
    ret = callback()
  File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/data/src/momoko/momoko/connection.py", line 579, in on_connection_available
    conn = fut.result()
  File "/var/lib/venv/renesys/local/lib/python2.7/site-packages/tornado/concurrent.py", line 232, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
DatabaseNotAvailable: No database connection available

from momoko.

gward avatar gward commented on July 22, 2024

Oh yeah, I'm seeing this on:

  • Ubuntu 15.10
  • system package postgresql-9.4 9.4.8-0ubuntu0.15.10
  • system package python 2.7.9-1 (but python --version reports 2.7.10: weird!)
  • system package python-psycopg2 2.6.1-1build1
  • latest git master of momoko (commit 678f494)

from momoko.

haizaar avatar haizaar commented on July 22, 2024

And thanks for thorough reproduction (as usual :)

from momoko.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.