microfaas-worker's People

Contributors

abyrne55, allenz1120, yannip1234

microfaas-worker's Issues

Orchestrator should tell inactive workers to power down

Currently, when the orchestrator receives a worker request from a worker that's not registered in WORKERS (even if it is registered in AVAILABLE_WORKERS), its default behavior is to just drop the connection:

        try:
            w = WORKERS[str(self.worker_id)]
        except KeyError:
            log.error("Worker with unknown ID %s attempted to connect", self.worker_id)
            return

We'd be much better served by telling such a worker to power down. This could be done by changing the default behavior shown above, by updating the Worker state machine to tell inactive workers to power off, or both. Either way, it would minimize the power draw of inactive workers.
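A minimal sketch of the first option, for discussion only. The `POWER_DOWN_REPLY` payload, the `reply_for_unknown_worker` helper, and the contents of `WORKERS` here are all hypothetical; the real orchestrator's wire format and registry may look quite different.

```python
import json
import logging

log = logging.getLogger(__name__)

# Hypothetical stand-in for the orchestrator's registry of active workers.
WORKERS = {"103": object()}

# Hypothetical wire format for a power-down command (an assumption, not the real protocol).
POWER_DOWN_REPLY = json.dumps({"cmd": "power_down"}).encode() + b"\n"


def reply_for_unknown_worker(worker_id, workers=WORKERS):
    """Return the bytes to send back to an unregistered worker, or None if it is registered."""
    if str(worker_id) in workers:
        return None  # Known worker: the caller continues with normal handling.
    log.error(
        "Worker with unknown ID %s attempted to connect; telling it to power down",
        worker_id,
    )
    return POWER_DOWN_REPLY
```

In the handler, the `except KeyError` branch would then send this reply to the worker before returning, rather than dropping the connection silently.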

Orchestrator frequently tells VMWorkers to reboot due to unhandled request

(Post-fix bug report; documenting here for posterity)

Running a VM cluster with the latest code on the refactor branch was frequently producing warnings about unhandled requests, e.g.

Sep 03 12:02:53 beaglebone python3[1762]: INFO:root:Processed results of invocation csmZbi from worker 103
Sep 03 12:02:53 beaglebone python3[1762]: INFO:root:Processed results of invocation ixjaZm from worker 104
Sep 03 12:02:53 beaglebone python3[1762]: INFO:root:Sending pkill to VMWorker104
Sep 03 12:02:53 beaglebone python3[1762]: ERROR:root:VMWorker104 made request but no output events set
Sep 03 12:02:53 beaglebone python3[1762]: WARNING:root:Telling VMWorker104 to reboot due to unhandled request
Sep 03 12:02:53 beaglebone python3[1762]: INFO:root:Transmitted work to VMWorker105
Sep 03 12:02:54 beaglebone python3[1762]: INFO:root:Processed results of invocation FlS9lG from worker 105
Sep 03 12:02:54 beaglebone python3[1762]: INFO:root:Transmitted work to VMWorker106
Sep 03 12:02:54 beaglebone python3[1762]: INFO:root:Processed results of invocation b1oC9x from worker 106
Sep 03 12:02:54 beaglebone python3[1762]: INFO:root:Transmitted work to VMWorker103
Sep 03 12:02:55 beaglebone python3[1762]: INFO:root:Transmitted work to VMWorker105
Sep 03 12:02:55 beaglebone python3[1762]: INFO:root:Processed results of invocation 1JIlkt from worker 105
Sep 03 12:02:55 beaglebone python3[1762]: INFO:root:Processed results of invocation uf84SY from worker 103
Sep 03 12:02:55 beaglebone python3[1762]: INFO:root:Transmitted work to VMWorker106
Sep 03 12:02:55 beaglebone python3[1762]: INFO:root:Processed results of invocation Hh1bw7 from worker 106
Sep 03 12:02:55 beaglebone python3[1762]: INFO:root:Sending pkill to VMWorker106
Sep 03 12:02:55 beaglebone python3[1762]: ERROR:root:VMWorker106 made request but no output events set
Sep 03 12:02:55 beaglebone python3[1762]: WARNING:root:Telling VMWorker106 to reboot due to unhandled request
Sep 03 12:02:56 beaglebone python3[1762]: INFO:root:Transmitted work to VMWorker105
Sep 03 12:02:56 beaglebone python3[1762]: INFO:root:Transmitted work to VMWorker103
Sep 03 12:02:56 beaglebone python3[1762]: INFO:root:Processed results of invocation 1S6Zsc from worker 103
Sep 03 12:02:56 beaglebone python3[1762]: INFO:root:Sending pkill to VMWorker103
Sep 03 12:02:56 beaglebone python3[1762]: ERROR:root:VMWorker103 made request but no output events set
Sep 03 12:02:56 beaglebone python3[1762]: WARNING:root:Telling VMWorker103 to reboot due to unhandled request
Sep 03 12:02:56 beaglebone python3[1762]: INFO:root:Attempting to power up VMWorker106
Sep 03 12:02:57 beaglebone python3[1762]: INFO:root:Processed results of invocation iKHAFI from worker 105
Sep 03 12:02:57 beaglebone python3[1762]: INFO:root:Attempting to power up VMWorker104

This was concerning because VMWorkers appeared to be told to reboot unnecessarily right after being (appropriately) pkilled. The most likely explanation, however, is that the VMWorker managed to squeeze in one more worker request after the pkill job was executed but before the QEMU process had actually terminated, so the orchestrator wasn't expecting it. In other words, this is a harmless (though annoying) bug: the VM gets killed no matter what the orchestrator tells it in its final moments.

This bug should be fixed by 4d3b7d3. Please close this after confirmation.

Worker-side code should have its own repo

This repo is getting a little messy as we try to split the code up into proper modules. It would make sense to move the worker-side code (e.g., worker.py, micropg.py) to a different repo. It's probably best to create a "worker-filesystem" repo that holds both our initramfs file structure and the worker code.

Runaway starting of VMs near end of experiment

We're observing the VM server start OOM-killing processes near the end of our experiments, apparently because VMs keep getting created without bound. The log tail looks like:

Aug 20 11:08:22 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker103
Aug 20 11:08:26 beaglebone python3[490]: INFO:root:Transmitted work to VMWorker105
Aug 20 11:08:26 beaglebone python3[490]: INFO:root:Processed results of invocation drZG71 from worker 105
Aug 20 11:08:28 beaglebone python3[490]: INFO:root:Transmitted work to VMWorker105
Aug 20 11:08:28 beaglebone python3[490]: INFO:root:Processed results of invocation pNUgnz from worker 105
Aug 20 11:08:29 beaglebone python3[490]: INFO:root:Transmitted work to VMWorker105
Aug 20 11:08:29 beaglebone python3[490]: INFO:root:Processed results of invocation LXgBkQ from worker 105
Aug 20 11:08:29 beaglebone python3[490]: INFO:root:Transmitted work to VMWorker108
Aug 20 11:08:29 beaglebone python3[490]: INFO:root:Processed results of invocation 6yEaaU from worker 108
Aug 20 11:08:34 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker112
Aug 20 11:08:40 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker109
Aug 20 11:08:41 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker106
Aug 20 11:08:44 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker115
Aug 20 11:08:45 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker114
Aug 20 11:08:47 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker104
Aug 20 11:08:49 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker116
Aug 20 11:08:50 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker110
Aug 20 11:08:51 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker111
Aug 20 11:08:53 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker113
Aug 20 11:09:09 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker107
Aug 20 11:09:15 beaglebone python3[490]: INFO:root:Transmitted work to VMWorker106
Aug 20 11:09:22 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker103
Aug 20 11:09:34 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker112
Aug 20 11:09:40 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker109
Aug 20 11:09:44 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker115
Aug 20 11:09:45 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker114
Aug 20 11:09:47 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker104
Aug 20 11:09:49 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker116
Aug 20 11:09:50 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker110
Aug 20 11:09:51 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker111
Aug 20 11:09:53 beaglebone python3[490]: INFO:root:Attempting to power up VMWorker113

Orchestrator mislabeling logs

The orchestrator labels all result log files with the -vm postfix, even when working with BBB clusters. Logs from BBB clusters should be labeled with a -bbb postfix, and logs from "mixed" clusters can be labeled with something like -mixed.
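One way the postfix could be picked, as a rough sketch. The `log_postfix` helper and the "vm"/"bbb" kind strings are made up for illustration; the orchestrator may represent worker types differently (e.g., by class).

```python
def log_postfix(worker_kinds):
    """Pick a result-log postfix from the kinds of workers in the cluster.

    `worker_kinds` is an iterable of hypothetical kind strings, "vm" or "bbb".
    """
    kinds = set(worker_kinds)
    if not kinds:
        raise ValueError("cluster has no workers")
    if kinds == {"vm"}:
        return "-vm"
    if kinds == {"bbb"}:
        return "-bbb"
    # More than one kind of worker (or an unrecognized kind) counts as mixed.
    return "-mixed"
```

For example, `log_postfix(["bbb", "bbb"])` gives `-bbb`, while `log_postfix(["vm", "bbb"])` gives `-mixed`.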

Unnecessary warnings/exceptions when shutting down VMWorkers

When a VMWorker transitions into the OFF state, sometimes the pkill command fails to halt the corresponding VM before it makes another worker request. This triggers the following warning/exception:

Aug 19 13:36:57 beaglebone python3[20799]: WARNING:root:VMWorker104 made request but no output events set

and/or

Aug 19 13:36:57 beaglebone python3[20799]: ----------------------------------------
Aug 19 13:36:57 beaglebone python3[20799]: Exception happened during processing of request from ('192.168.1.104', 42042)
Aug 19 13:36:57 beaglebone python3[20799]: Traceback (most recent call last):
Aug 19 13:36:57 beaglebone python3[20799]:   File "/usr/lib/python3.7/socketserver.py", line 650, in process_request_thread
Aug 19 13:36:57 beaglebone python3[20799]:     self.finish_request(request, client_address)
Aug 19 13:36:57 beaglebone python3[20799]:   File "/usr/lib/python3.7/socketserver.py", line 360, in finish_request
Aug 19 13:36:57 beaglebone python3[20799]:     self.RequestHandlerClass(request, client_address, self)
Aug 19 13:36:57 beaglebone python3[20799]:   File "/usr/lib/python3.7/socketserver.py", line 720, in __init__
Aug 19 13:36:57 beaglebone python3[20799]:     self.handle()
Aug 19 13:36:57 beaglebone python3[20799]:   File "/home/debian/MicroFaaS/orchestrator.py", line 103, in handle
Aug 19 13:36:57 beaglebone python3[20799]:     self.data = self.request.recv(12288).strip()
Aug 19 13:36:57 beaglebone python3[20799]: ConnectionResetError: [Errno 104] Connection reset by peer
Aug 19 13:36:57 beaglebone python3[20799]: ----------------------------------------

These warnings/exceptions can be safely caught and ignored, as pkill ensures the VM is properly shut down, even if it lets a few erroneous worker requests slip out during the second or two it needs to kill QEMU.

(Note that this is in reference to code that's currently on the refactor branch.)
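A sketch of how the reset could be caught around the `recv` call from the traceback above. The `recv_or_none` helper and the `worker_is_off` flag are hypothetical; the assumption is that the handler can tell whether the worker has already transitioned to the OFF state.

```python
import logging

log = logging.getLogger(__name__)


def recv_or_none(sock, worker_is_off, bufsize=12288):
    """Read one request from `sock`.

    Returns None if the peer reset the connection while the worker is known
    to be shutting down; re-raises the error in every other case.
    """
    try:
        return sock.recv(bufsize).strip()
    except ConnectionResetError:
        if worker_is_off:
            # Expected: pkill tore down QEMU in the middle of a request.
            log.debug("Ignoring connection reset from a worker that is powering off")
            return None
        raise
```

Only swallowing the error when the worker is already OFF keeps genuine connection resets from active workers visible in the logs.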
