
Comments (17)

peterhinch commented on August 11, 2024

That is odd. The lock instance is heavily used: if the code were (inexplicably) picking up a version with no __aexit__ the consequence would be immediately apparent. The fact that it occurs after a period of running implies a failure of the runtime environment. I suspect RAM issues.

What hardware are you using? Are you using frozen bytecode? How much free RAM is there when your test script is running?

If you're running on an ESP8266, especially if you're using TLS, it may be a consequence of lack of RAM or RAM fragmentation. You might want to run a test which periodically prints (or publishes) RAM statistics to see if this might be the issue. In MicroPython coding I have encountered errors which appear to make no sense: in every case the cause proved to be a lack of contiguous free RAM at runtime.
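A minimal sketch of such a periodic RAM report, assuming a uasyncio-based application. The names `ram_report` and `ram_monitor` are hypothetical and not part of mqtt_as. `gc.mem_free()` and `gc.mem_alloc()` are MicroPython-specific, so the sketch reports -1 where they are absent (for example under CPython).

```python
import gc

try:
    import uasyncio as asyncio  # MicroPython
except ImportError:
    import asyncio  # CPython fallback, only so the sketch can be tried off-board


def ram_report():
    """Return a one-line RAM summary after forcing a collection."""
    gc.collect()
    # mem_free/mem_alloc exist only in MicroPython's gc module.
    free = gc.mem_free() if hasattr(gc, "mem_free") else -1
    alloc = gc.mem_alloc() if hasattr(gc, "mem_alloc") else -1
    return "RAM free {} alloc {}".format(free, alloc)


async def ram_monitor(interval_s=20, report=print):
    """Hypothetical helper: emit a RAM report every interval_s seconds."""
    while True:
        report(ram_report())
        await asyncio.sleep(interval_s)
```

It could be started alongside the application with `loop.create_task(ram_monitor())`, or `report` could be replaced by a function that publishes the line. For fragmentation specifically, `micropython.mem_info()` also prints a "max free sz" figure.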

The other possibility aside from RAM issues is a problem with TLS. We have extensively tested the mqtt_as library on ESP8266 and Pyboard D, but TLS is uncharted territory.

from micropython-mqtt.

eRJx commented on August 11, 2024

I'm using an ESP32 WROVER with 520 KB of internal RAM plus 8 MB of SPIRAM. At the moment of failure there were 4005184 bytes of free RAM. It may matter that I was running a development build of MicroPython, v1.11-498-gf69ef97f2 on 2019-10-25.


peterhinch commented on August 11, 2024

That seems to imply that your script is using ~4MB of RAM. This seems high, given that our test scripts run on an ESP8266 with ~21K free. Have you checked for memory leaks? Have you a figure for maximum free block size when it failed?

Testing on ESP32 was restricted to units without SPIRAM. However I haven't read of any issues with SPIRAM firmware builds. My guess is still that RAM fragmentation may be the cause, depending on how your application uses all that RAM. Presumably if TLS were the problem it would fail immediately.


eRJx commented on August 11, 2024

I'm not convinced that memory fragmentation is the issue. The error message indicates memory corruption, because the Lock() object had already been allocated and then disappeared later.
I am currently testing several ESPs, and for several hours there has been no error.


peterhinch commented on August 11, 2024

> The error message indicates memory corruption

You would indeed think so. Prior experience with similar "inexplicable" errors suggests that fragmentation is a strong contender as the cause. There may be a bug in the MicroPython runtime which can cause corruption under these circumstances, but creating a test case might be challenging.

In practice, issuing gc.collect() periodically can help preempt fragmentation, but mqtt_as already does this.

I'm not sure I can offer any other ideas here beyond monitoring your RAM usage.


kevinkk525 commented on August 11, 2024


peterhinch commented on August 11, 2024

All I can say is I've seen similar errors in MicroPython code when contiguous RAM runs low. I suspect a bug in the MicroPython VM, but the question is how to produce a sensible test case.


eRJx commented on August 11, 2024

After about 70 hours of testing mqtt_as, the program stopped again with the message:

RAM free 4061904 alloc 36336
(b'v1/xxx/things/yyy/data/3', b'temp,c=48481', False)
(b'v1/xxx/things/yyy/data/3', b'temp,c=48483', False)
(b'v1/xxx/things/yyy/data/3', b'temp,c=48484', False)
(b'v1/xxx/things/yyy/data/3', b'temp,c=48485', False)
RAM free 4061952 alloc 36288
(b'v1/xxx/things/yyy/data/3', b'temp,c=48486', False)
(b'v1/xxx/things/yyy/data/3', b'temp,c=48487', False)
(b'v1/xxx/things/yyy/data/3', b'temp,c=48488', False)
(b'v1/xxx/things/yyy/data/3', b'temp,c=48489', False)
RAM free 4061984 alloc 36256
(b'v1/xxx/things/yyy/data/3', b'temp,c=48490', False)
(b'v1/xxx/things/yyy/data/3', b'temp,c=48491', False)
(b'v1/xxx/things/yyy/data/3', b'temp,c=48492', False)
(b'v1/xxx/things/yyy/data/3', b'temp,c=48493', False)
RAM free 4062032 alloc 36208
Traceback (most recent call last):
  File "main.py", line 1, in <module>
  File "<string>", line 42, in <module>
  File "<string>", line 40, in <module>
  File "/uasyncio/core.py", line 180, in run_until_complete
  File "/uasyncio/core.py", line 154, in run_forever
  File "/uasyncio/core.py", line 146, in run_forever
AssertionError: Unsupported coroutine yield value: <function> (of type <class 'function'>)
MicroPython v1.11 on 2019-05-29; ESP32 module with ESP32
Type "help()" for more information.
>>>

The broker connection was encrypted with SSL, and messages were published every 5 s for 70 h.
/uasyncio/core.py:

145:  else:
146:                        assert False, "Unsupported coroutine yield value: %r (of type %r)" % (ret, type(ret))

/uasyncio/mqtt_as.py was modified to temporarily switch the non-blocking sockets to blocking sockets:
0339efa
Main program:

from mqtt_as import MQTTClient
from mqtt_as import config 
import uasyncio as asyncio


def callback(topic, msg, retained):
    print((topic, msg, retained))

async def conn_han(client):
    await client.subscribe('v1/{}/things/{}/data/3'.format(config['user'],config['client_id']), 0)

async def main(client):
    await client.connect()
    n = 0
    while True:
        await asyncio.sleep(5)
        #print(' publish: v1/{}/things/{}/data/3'.format(config['user'],config['client_id']) , 'counter,null={}'.format(n))
        # If WiFi is down the following will pause for the duration.
        await client.publish('v1/{}/things/{}/data/3'.format(config['user'],config['client_id']) , 'temp,c={}'.format(n), qos = 0)
        n += 1



config['ssl'] = True

config['subs_cb'] = callback
config['connect_coro'] = conn_han








MQTTClient.DEBUG = True  
client = MQTTClient(config)
loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main(client))
finally:
    client.close()  


eRJx commented on August 11, 2024

I would like the device to reset itself when the program crashes, so that it starts working again. How can this be done correctly?


peterhinch commented on August 11, 2024

This is a question best raised in the forum as it applies regardless of the software you're running.

In essence the solution is a watchdog timer (WDT): a forum search on those keywords should pay dividends. WDTs are ideally implemented in hardware, with a piece of electronics which pulses the reset line unless it receives regular pulses from a pin pulsed by a coro in your application. A less ideal solution is a software WDT which might use a timer interrupt to do the same thing, issuing machine.reset() if a global timestamp has not been recently updated by a coro. The drawback of a software WDT is that a total crash may stop the ISR from running.
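A minimal sketch of the software WDT logic described above. `SoftWatchdog` and its parameters are hypothetical names, not an existing MicroPython API. The clock is injectable so the logic can be exercised off-board; on CPython, which lacks `time.ticks_ms()`, the sketch falls back to `time.monotonic()`.

```python
import time

# MicroPython provides wrap-safe millisecond ticks. CPython does not, so fall
# back to time.monotonic() there (assumption: only used for off-board testing).
if hasattr(time, "ticks_ms"):
    _now_ms = time.ticks_ms
    _elapsed_ms = time.ticks_diff
else:
    def _now_ms():
        return int(time.monotonic() * 1000)

    def _elapsed_ms(later, earlier):
        return later - earlier


class SoftWatchdog:
    """Hypothetical software WDT: calls on_timeout if feed() stops being called."""

    def __init__(self, timeout_ms, on_timeout, clock=_now_ms):
        self.timeout_ms = timeout_ms
        self.on_timeout = on_timeout  # on a real board this would be machine.reset
        self.clock = clock            # injectable so the logic can be tested
        self.last_fed = clock()

    def feed(self):
        """Update the timestamp; call this regularly from a coro."""
        self.last_fed = self.clock()

    def check(self, _timer=None):
        """Timer callback: fire on_timeout if the timestamp is stale."""
        if _elapsed_ms(self.clock(), self.last_fed) > self.timeout_ms:
            self.on_timeout()
```

On a board, something like `machine.Timer(0).init(period=1000, mode=machine.Timer.PERIODIC, callback=wdt.check)` could drive `check`, with a coro calling `wdt.feed()` every few seconds. The timer id and periods are assumptions to adapt to the port in use. Where the port provides a hardware `machine.WDT`, that is likely the more robust choice, since a total crash may stop the software check from running.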

I have less experience of long term running on ESP32, but my experience of ESP8266 is that these do very occasionally crash for reasons which are quite unaccountable (by me). They are definitely not running out of RAM and I have ensured that their hardware environment is ideal.
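For crashes that surface as an unhandled exception and drop to the REPL, as in the tracebacks earlier in this thread, a wrapper in main.py can trigger the reset without a watchdog. This is a minimal sketch: `run_with_reset` and its parameters are hypothetical names, and it does nothing for a hard hang, which still needs a WDT.

```python
import time


def run_with_reset(run, reset, delay_s=10, log=print, sleep=time.sleep):
    """Run `run()`; if it raises or returns, log, pause, then call `reset()`."""
    try:
        run()
        log("main returned unexpectedly")
    except Exception as exc:
        # KeyboardInterrupt is not an Exception subclass, so Ctrl-C still
        # propagates and reaches the REPL instead of rebooting.
        log("unhandled exception: %r" % (exc,))
    sleep(delay_s)  # leave time to read the message before rebooting
    reset()
```

On a board the call might look like `run_with_reset(lambda: loop.run_until_complete(main(client)), machine.reset)`.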


eRJx commented on August 11, 2024

I have been running four tests in parallel on four ESP32 WROVER boards (520 KB + 8 MB SPIRAM) since last Friday. mem_info() now shows:

stack:2416 out of 15360
GC: total: 4098240, used: 95600, free: 4002640
 No. of 1-blocks: 978, 2-blocks: 325, max blk sz: 393, max free sz: 248115

I can't judge whether these figures are good, but apart from the one unit that crashed as described above, the others have run without interruption. So the problem is most likely in the hardware. I'll keep testing and we'll see how the statistics turn out.


peterhinch commented on August 11, 2024

There is evidently no problem with memory so you're probably right about hardware.


eRJx commented on August 11, 2024

What bad luck... another one has just hung:

File "/uasyncio/core.py", line 85, in run_forever
AttributeError: 'PollEventLoop' object has no attribute 'call_soon'

The other two are still working.


peterhinch commented on August 11, 2024

I tested on ESP32 but have not done long term running tests on that platform. I have on ESP8266 and Pyboard D, with many weeks of cumulative runtime on each.

The Pyboard D is rock solid.

Alas ESP8266 is less so. The reference board and the WeMOS D1 Mini occasionally crash, despite my taking great care to ensure a stable, accurate and electrically quiet power supply. On the other hand I have a Sonoff Basic R3 unit with an ESP8285 which has been running for three weeks and is still going strong. Yet I have experienced other ESP8266 hardware (including earlier Sonoff models) which crashed so frequently as to be useless.

Obviously this doesn't help you a great deal - you may be breaking new ground by doing long running tests on ESP32 devices with PSRAM. If ESP8266/ESP8285 are anything to go by, hardware design quality seems variable.


eRJx commented on August 11, 2024

I tested four ESP32 boards for a month. Three have crashed and one is still working. I don't think the problem is in the software, so I think I can close the topic.


gmos commented on August 11, 2024

20/09/2020 14:46:46 - Traceback (most recent call last):
20/09/2020 14:46:46 - File "main.py", line 51, in
20/09/2020 14:46:46 - File "rmc/remcoo.py", line 211, in
20/09/2020 14:46:46 - File "/lib/uasyncio/core.py", line 87, in run_forever
20/09/2020 14:46:46 - AttributeError: 'PollEventLoop' object has no attribute 'call_soon'
20/09/2020 14:46:46 - Not in task.
20/09/2020 14:47:06 - main.py forcing reboot....
20/09/2020 14:47:06 - ets Jun 8 2016 00:22:57

ESP32 Wrover, on my own board.

Furthermore:
09/08/2020 14:49:06 - INFO:rmc.remcoo:650292543:Last application reset cause: hard
20/09/2020 14:47:46 - INFO:rmc.remcoo:653921259:Last application reset cause: hard
So the ESP had been running continuously for 42 days before the problem emerged.

This (and a couple of other weird errors) happens on more than one of the boards. I still have no clue whether it is HW or SW.
Most often it happens somewhere in AsyncIO. That is no surprise, since almost all the time is spent in AsyncIO, waiting for something useful to do.


peterhinch commented on August 11, 2024

ESP32 hardware with PSRAM has a known hardware issue. I've never managed to keep one going for anything like 42 days. It may be worth installing a non-PSRAM firmware build if you can cope with the reduced RAM (I haven't actually tested this).

That said, in long-term tests of non-PSRAM ESP32s (running a different application) I and another user have experienced occasional spontaneous reboots.

In my experience the only rock-solid reliable WiFi client is the Pyboard D which ran a test over many weeks exchanging nearly 1,000,000 messages. The test only ended when my wife briefly unplugged it, thinking it wouldn't notice...

