Comments (15)

hbrunn commented on July 18, 2024

Is this really a problem or does it only look ugly in the logs?

from server-tools.

seb-elico commented on July 18, 2024

The cron job crashes on the client instance, then the server keeps sending "instance seems to be dead" emails until we restart the client instance (even though Odoo was still running fine). Having a stack trace in the log is definitely ok, but the cron shouldn't crash IMO.

hbrunn commented on July 18, 2024

hmm, but that's weird, because cronjobs' exceptions should be handled safely by cron anyways: https://github.com/OCA/OCB/blob/7.0/openerp/addons/base/ir/ir_cron.py#L140 - that's also the reason I didn't do any error handling in the first place.
Could you find out why this is not true on the client instance in question? Maybe a very old code base?

seb-elico commented on July 18, 2024

Thanks for this answer which makes a lot of sense. Let me check it and get back to you!

seb-elico commented on July 18, 2024

I checked the code of ir_cron.py and it's exactly the same as OCB v7

hbrunn commented on July 18, 2024

then the cron thread shouldn't crash. Do some debugging to find out what goes wrong here

seb-elico commented on July 18, 2024

The Odoo source code was pretty old anyway. I updated it to the latest OCB v7. I'll first check whether it still crashes (it shouldn't take long) and add some logging if that's the case. Otherwise I'll close the issue :)

hbrunn commented on July 18, 2024

@seb-elico does it work now?

seb-elico commented on July 18, 2024

@hbrunn Hi Holger! No, unfortunately it's still crashing. I haven't had time to investigate further so far. When I do, I'll try to add some logging or catch the exception inside alive(self) to see if that solves the problem. We have a lot of issues with other modules due to the poor quality of the internet connection here, so China is a good playground to test Odoo in "extreme" conditions ;) For instance, the fetchmail module has several issues due to connection timeouts, but the exceptions are handled inside the module and don't cause the cron to crash, hence my first suggestion: https://github.com/OCA/OCB/blob/7.0/addons/fetchmail/fetchmail.py#L259

seb-elico commented on July 18, 2024

@hbrunn I added some logging and changed the log level to debug. It's becoming clearer now (but still not completely clear).
First, about the exceptions: indeed, when an exception is raised, it's handled by the cron. And after the exception has been raised, the cron is able to start a new job. FYI, so far I have seen:

  • URLError: <urlopen error [Errno 110] Connection timed out>
  • URLError: <urlopen error [Errno 104] Connection reset by peer>

Second, regarding the cron that "crashes": after the server stopped receiving HTTP requests, I saw the following message in the log every time the cron tried to start the job:
openerp.addons.base.ir.ir_cron: Another process/thread is already busy executing job Dead man's switch client, skipping it.
However, 18 minutes later, I got a Connection timed out and then the cron started to launch jobs again. It feels like the cron was "frozen" and that it took 18 minutes to reach the timeout!
It might be possible that sometimes it's "frozen" even longer and maybe even forever...
I'm going to add more logging, including the data sent, so that I can use the RAM usage value to confirm that the job that fails is the same as the one that was launched a while earlier (18 minutes in my previous example). I'll also try to set up a short timeout (like 30 seconds) to see whether I keep getting long gaps between the send and the exception. The good thing is: it happens very often thanks to the bad internet connection!
To be continued...
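
The "already busy executing job ... skipping it" message suggests that each job is guarded by some kind of lock, so a run that is stuck on a network call keeps every later poll from starting it again. The toy sketch below is only an analogy in plain Python 3, not Odoo's actual code; `job_lock`, `poll` and `finish` are hypothetical names.

```python
import threading

# Stands in for whatever guards a single cron job (an assumption, not Odoo's code).
job_lock = threading.Lock()


def poll(job_name):
    """Try to start the job; skip it if an earlier run still holds the lock."""
    if not job_lock.acquire(blocking=False):
        return 'skipped: %s is already running' % job_name
    return 'started: %s' % job_name


def finish():
    """Release the lock once the job returns or raises."""
    job_lock.release()


first = poll("Dead man's switch client")   # the run that then blocks on the network
second = poll("Dead man's switch client")  # a later poll while the first is stuck
finish()                                   # e.g. the socket finally times out
third = poll("Dead man's switch client")   # polling works again
finish()
```

Under that reading, nothing has crashed: the cron is simply waiting for a request that has no upper bound on how long it may block.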

seb-elico commented on July 18, 2024

Please find below a log showing what I explained in my previous message (I added a "sent" log line in case of success and a "could not send" log line in case of failure):

09:28:36,770 8 DEBUG stable openerp.addons.base.ir.ir_cron: Starting job `Dead man's switch client`.
09:28:36,776 8 DEBUG stable openerp.addons.base.ir.ir_cron: cron.object.execute(u'stable', 1, '*', u'dead.mans.switch.client', u'alive')
09:28:36,780 8 DEBUG stable openerp.addons.dead_mans_switch_client.models.dead_mans_switch_client: sending {'ram': 13.310339821349043, 'user_count': 0, 'cpu': 0.0, 'database_uuid': u'12345678-90ab-cdef-1234-567890abcdef'}
09:28:37,324 8 DEBUG stable openerp.addons.dead_mans_switch_client.models.dead_mans_switch_client: sent {'ram': 13.310339821349043, 'user_count': 0, 'cpu': 0.0, 'database_uuid': u'12345678-90ab-cdef-1234-567890abcdef'}
09:28:37,324 8 DEBUG stable openerp.addons.base.ir.ir_cron: 0.548s (dead.mans.switch.client, alive)
09:28:43,596 8 DEBUG ? openerp.service.cron: cron1 polling for jobs
09:29:37,386 8 DEBUG ? openerp.service.cron: cron0 polling for jobs
09:29:44,638 8 DEBUG ? openerp.service.cron: cron1 polling for jobs
09:30:39,299 8 DEBUG ? openerp.service.cron: cron0 polling for jobs
09:30:39,303 8 DEBUG stable openerp.addons.base.ir.ir_cron: Starting job `Dead man's switch client`.
09:30:39,310 8 DEBUG stable openerp.addons.base.ir.ir_cron: cron.object.execute(u'stable', 1, '*', u'dead.mans.switch.client', u'alive')
09:30:39,315 8 DEBUG stable openerp.addons.dead_mans_switch_client.models.dead_mans_switch_client: sending {'ram': 13.310339821349043, 'user_count': 0, 'cpu': 0.1, 'database_uuid': u'12345678-90ab-cdef-1234-567890abcdef'}
09:30:45,701 8 DEBUG ? openerp.service.cron: cron1 polling for jobs
09:30:45,705 8 DEBUG stable openerp.addons.base.ir.ir_cron: Another process/thread is already busy executing job `Dead man's switch client`, skipping it.
09:31:46,766 8 DEBUG ? openerp.service.cron: cron1 polling for jobs
09:31:46,770 8 DEBUG stable openerp.addons.base.ir.ir_cron: Another process/thread is already busy executing job `Dead man's switch client`, skipping it.
09:32:47,832 8 DEBUG ? openerp.service.cron: cron1 polling for jobs
09:32:47,837 8 DEBUG stable openerp.addons.base.ir.ir_cron: Another process/thread is already busy executing job `Dead man's switch client`, skipping it.
09:33:48,857 8 DEBUG ? openerp.service.cron: cron1 polling for jobs
09:33:48,861 8 DEBUG stable openerp.addons.base.ir.ir_cron: Another process/thread is already busy executing job `Dead man's switch client`, skipping it.
09:34:49,889 8 DEBUG ? openerp.service.cron: cron1 polling for jobs
09:34:49,894 8 DEBUG stable openerp.addons.base.ir.ir_cron: Another process/thread is already busy executing job `Dead man's switch client`, skipping it.
09:35:52,570 8 DEBUG ? openerp.service.cron: cron1 polling for jobs
09:35:52,574 8 DEBUG stable openerp.addons.base.ir.ir_cron: Another process/thread is already busy executing job `Dead man's switch client`, skipping it.
09:36:53,601 8 DEBUG ? openerp.service.cron: cron1 polling for jobs
09:36:53,606 8 DEBUG stable openerp.addons.base.ir.ir_cron: Another process/thread is already busy executing job `Dead man's switch client`, skipping it.
09:37:54,637 8 DEBUG ? openerp.service.cron: cron1 polling for jobs
09:37:54,642 8 DEBUG stable openerp.addons.base.ir.ir_cron: Another process/thread is already busy executing job `Dead man's switch client`, skipping it.
09:38:55,673 8 DEBUG ? openerp.service.cron: cron1 polling for jobs
09:38:55,678 8 DEBUG stable openerp.addons.base.ir.ir_cron: Another process/thread is already busy executing job `Dead man's switch client`, skipping it.
09:39:56,740 8 DEBUG ? openerp.service.cron: cron1 polling for jobs
09:39:56,744 8 DEBUG stable openerp.addons.base.ir.ir_cron: Another process/thread is already busy executing job `Dead man's switch client`, skipping it.
09:40:59,440 8 DEBUG ? openerp.service.cron: cron1 polling for jobs
09:40:59,445 8 DEBUG stable openerp.addons.base.ir.ir_cron: Another process/thread is already busy executing job `Dead man's switch client`, skipping it.
09:41:01,950 8 DEBUG stable openerp.addons.dead_mans_switch_client.models.dead_mans_switch_client: could not send {'ram': 13.310339821349043, 'user_count': 0, 'cpu': 0.1, 'database_uuid': u'12345678-90ab-cdef-1234-567890abcdef'}
09:41:01,950 8 ERROR stable openerp.addons.base.ir.ir_cron: Call of self.pool.get('dead.mans.switch.client').alive(cr, uid, *()) failed in Job 10
Traceback (most recent call last):
  File "/opt/odoo/sources/odoo/openerp/addons/base/ir/ir_cron.py", line 136, in _callback
    method(cr, uid, *args)
  File "/opt/odoo/additional_addons/dead_mans_switch_client/models/dead_mans_switch_client.py", line 72, in alive
    raise e
URLError: <urlopen error [Errno 104] Connection reset by peer>
09:42:00,504 8 DEBUG ? openerp.service.cron: cron1 polling for jobs
09:42:00,508 8 DEBUG stable openerp.addons.base.ir.ir_cron: Starting job `Dead man's switch client`.
09:42:00,514 8 DEBUG stable openerp.addons.base.ir.ir_cron: cron.object.execute(u'stable', 1, '*', u'dead.mans.switch.client', u'alive')
09:42:00,518 8 DEBUG stable openerp.addons.dead_mans_switch_client.models.dead_mans_switch_client: sending {'ram': 13.324073147619526, 'user_count': 0, 'cpu': 0.0, 'database_uuid': u'12345678-90ab-cdef-1234-567890abcdef'}
09:42:01,206 8 DEBUG stable openerp.addons.dead_mans_switch_client.models.dead_mans_switch_client: sent {'ram': 13.324073147619526, 'user_count': 0, 'cpu': 0.0, 'database_uuid': u'12345678-90ab-cdef-1234-567890abcdef'}
09:42:01,206 8 DEBUG stable openerp.addons.base.ir.ir_cron: 0.692s (dead.mans.switch.client, alive)
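
To put a number on the gap, the timestamps of the "sending" line and the matching "sent" or "could not send" line can be subtracted. This is a small Python 3 sketch using only timestamps copied from the log above; the helper name `seconds_between` is made up for illustration.

```python
from datetime import datetime

LOG_TIME_FORMAT = '%H:%M:%S,%f'  # matches timestamps such as 09:30:39,315


def seconds_between(start, end):
    """Return the number of seconds between two log timestamps of the same day."""
    delta = (datetime.strptime(end, LOG_TIME_FORMAT)
             - datetime.strptime(start, LOG_TIME_FORMAT))
    return delta.total_seconds()


# Healthy run: "sending" at 09:28:36,780 and "sent" at 09:28:37,324
healthy = seconds_between('09:28:36,780', '09:28:37,324')
# Blocked run: "sending" at 09:30:39,315 and "could not send" at 09:41:01,950
blocked = seconds_between('09:30:39,315', '09:41:01,950')

print('healthy: %.3f s, blocked: %.3f s' % (healthy, blocked))
# prints "healthy: 0.544 s, blocked: 622.635 s"
```

In this excerpt the failed request therefore held the job for a little over ten minutes before the connection was reset.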

seb-elico commented on July 18, 2024

I added a 30-second timeout to urllib2.urlopen, and it seems much more stable now. The exceptions in the log are slightly different:

  • URLError: <urlopen error timed out>
  • URLError: <urlopen error _ssl.c:495: The handshake operation timed out>

I've never seen those exceptions before... Oh and I forgot to mention: the server is contacted on an HTTPS URL (hence the SSL exception).
I'm gonna let it run all night long to check if it "crashes/freezes" again and I'll let you know by tomorrow :)

seb-elico commented on July 18, 2024

@hbrunn Hi Holger! It seems that the timeout did the trick :D No crash/freeze of the cron during the night, although a lot of timeouts were raised (1 out of 10 requests on average). FYI, here's the patch I made, feel free to reuse it! The most important part is SEND_TIMEOUT.

--- a/dead_mans_switch_client/models/dead_mans_switch_client.py
+++ b/dead_mans_switch_client/models/dead_mans_switch_client.py
@@ -11,6 +11,7 @@ except ImportError:
 import urllib2
 from openerp.osv import orm

+SEND_TIMEOUT = 30

 class DeadMansSwitchClient(orm.AbstractModel):
     _name = 'dead.mans.switch.client'
@@ -53,15 +54,21 @@ class DeadMansSwitchClient(orm.AbstractModel):
             logger.error('No server configured!')
             return
         data = self._get_data(cr, uid, context=context)
-        logger.debug('sending %s', data)
-        urllib2.urlopen(
-            urllib2.Request(
-                url,
-                json.dumps({
-                    'jsonrpc': '2.0',
-                    'method': 'call',
-                    'params': data,
-                }),
-                {
-                    'Content-Type': 'application/json',
-                }))
+        logger.debug('Sending %s', data)
+        try:
+            urllib2.urlopen(
+                urllib2.Request(
+                    url,
+                    json.dumps({
+                        'jsonrpc': '2.0',
+                        'method': 'call',
+                        'params': data,
+                    }),
+                    {
+                        'Content-Type': 'application/json',
+                    }),
+                timeout=SEND_TIMEOUT)
+            logger.debug('Successfully sent %s', data)
+        except Exception, e:
+            logger.debug('Failed to send %s', data)
+            raise e
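
For readers on Python 3, where `urllib2` became `urllib.request`, roughly the same idea as the patch above might look like the sketch below. It is not part of the module: the function name `send_alive` and the `opener` argument are hypothetical, the latter added only so the network call can be faked in a test.

```python
import json
import logging
import urllib.request

logger = logging.getLogger(__name__)

SEND_TIMEOUT = 30  # seconds, the same value as in the patch above


def send_alive(url, data, timeout=SEND_TIMEOUT, opener=urllib.request.urlopen):
    """POST `data` as a JSON-RPC call and give up after `timeout` seconds."""
    body = json.dumps({
        'jsonrpc': '2.0',
        'method': 'call',
        'params': data,
    }).encode('utf-8')
    request = urllib.request.Request(
        url, body, {'Content-Type': 'application/json'})
    logger.debug('Sending %s', data)
    try:
        # Without `timeout`, a stalled connection can block for a very long time.
        response = opener(request, timeout=timeout)
    except OSError:
        # URLError and socket timeouts are both OSError subclasses in Python 3.
        logger.debug('Failed to send %s', data)
        raise  # a bare raise keeps the original traceback for the caller's log
    logger.debug('Successfully sent %s', data)
    return response
```

Re-raising keeps the behaviour discussed earlier in the thread: the cron still sees and logs the failure, but the job can no longer hang much longer than the timeout on a single socket operation.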

hbrunn commented on July 18, 2024

nice! Please make a PR with that, but read the timeout from an ir.config_parameter, maybe dead_mans_switch_client.send_timeout
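
A value stored in a configuration parameter would usually come back as a string, or as a false value when the key is missing, so it needs to be converted before it is passed as a timeout. The sketch below shows one possible conversion in Python 3; `parse_send_timeout` is a hypothetical helper, and the way the raw value is fetched (presumably through `get_param` on `ir.config_parameter`) is left out because it needs a running Odoo instance.

```python
DEFAULT_SEND_TIMEOUT = 30.0  # seconds, used when the parameter is missing or invalid


def parse_send_timeout(raw_value, default=DEFAULT_SEND_TIMEOUT):
    """Turn a stored parameter value into a positive timeout in seconds."""
    try:
        timeout = float(raw_value)
    except (TypeError, ValueError):
        # None, arbitrary objects or non-numeric strings fall back to the default.
        return default
    # Zero or negative values (including False, which converts to 0.0) are rejected.
    return timeout if timeout > 0 else default


# Assumption: raw_value would be whatever is stored under the key
# 'dead_mans_switch_client.send_timeout' suggested above.
timeout = parse_send_timeout('45')
```

Falling back to a default rather than raising means a mistyped parameter cannot stop the heartbeat from being sent at all.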

seb-elico commented on July 18, 2024

Done! #309
I also created a PR on your 7.0 branch so that you can update your own 7.0 PR :)
