
horde-worker-regen's People

Contributors

db0, dependabot[bot], fripe070, gabrieljanczak, rikudousage, stinkerwue, superintendent2521, tazlin, zten


horde-worker-regen's Issues

Low system RAM environments fail with SDXL (or other high RAM footprint models)

Of note, I have observed that SDXL seems to struggle on 13 GB/16 GB RAM setups, despite this appearing to work locally when I have had fewer resources (9 GB of system RAM free).

I was able to observe a single SDXL job finish and submit, but on the next job (also SDXL) the inference hung on the sampler step in ComfyUI.

I suspect this may be tied to the hardcoded attempt to keep 9 GB of system memory free.
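For context, a sketch of the kind of free-RAM floor described above; the constant and function names are illustrative, not the worker's actual code, and psutil is assumed to be installed:

import psutil

# Illustrative only: a threshold check like the hardcoded one described above.
FREE_RAM_FLOOR_BYTES = 9 * 1024**3  # the ~9 GB floor mentioned in this issue

def can_load_large_model() -> bool:
    # Only allow another large model load if it keeps us above the floor.
    return psutil.virtual_memory().available > FREE_RAM_FLOOR_BYTES

print(can_load_large_model())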

Add alchemy features/forms

  • In addition to the previous worker's forms, deepdanbooru interrogation is now supported via horde_safety.
  • There is also some interest in returning the image "features" as extracted by a CLIP model. This already happens as part of interrogation in horde_safety, so it is effectively "free" in this respect.
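To illustrate what returning such features would mean, here is a generic CLIP feature-extraction snippet using the transformers library; this is not horde_safety's API, and the model name and image path are placeholders:

from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Generic illustration of CLIP image-feature extraction; horde_safety runs
# its own CLIP pipeline internally, this only shows the concept.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("generation.png")  # placeholder path to a generated image
inputs = processor(images=image, return_tensors="pt")
features = model.get_image_features(**inputs)  # a (1, 512) embedding tensor
print(features.shape)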

Terminal UI

The previous worker had a well-received curses UI. A similar sort of UI is certainly possible and would ideally show all the information as before (see link), plus:

  • The state of each process
  • An option to see each individual process's log file

Some of the previously shown information, such as the Worker Total and Entire Horde figures, could/should be moved to a separate stats screen to accommodate the per-process details.
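As a rough illustration, a minimal curses sketch of such a per-process status screen; the process names and states are placeholders, not the worker's real data model:

import curses
import time

def draw(stdscr):
    curses.curs_set(0)
    stdscr.nodelay(True)  # make getch() non-blocking
    processes = [("inference_0", "INFERENCE"), ("inference_1", "WAITING"), ("safety_0", "IDLE")]
    while True:
        stdscr.erase()
        stdscr.addstr(0, 0, "horde-worker-reGen -- process overview (press q to quit)")
        for row, (name, state) in enumerate(processes, start=2):
            stdscr.addstr(row, 0, f"{name:<14} {state}")
        stdscr.refresh()
        if stdscr.getch() == ord("q"):
            return
        time.sleep(0.1)

if __name__ == "__main__":
    curses.wrapper(draw)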

Make the queued megapixelsteps (and associated variables) configurable somehow

Is it possible to make the worker more efficient (in terms of total throughput) by allowing more queued megapixelsteps, at the cost of a higher average time-to-return for jobs (the time from pop to submit, as measured by the API /workers/ endpoint)?

If so, the megapixelsteps behavior should be made adjustable in the bridge data config somehow.

This may end up as not-planned, depending on the potential (or lack) of performance benefits.
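If such a setting were exposed in bridgeData.yaml, reading it might look like this sketch; the key name max_queued_megapixelsteps and its default value are invented for illustration:

import yaml  # PyYAML

with open("bridgeData.yaml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

# Hypothetical key; not a real bridgeData.yaml option today.
max_queued_mps = config.get("max_queued_megapixelsteps", 30)
print(f"Will queue up to {max_queued_mps} megapixelsteps of work.")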

Putting the files into a path that contains spaces crashes on startup

I cloned the project into "/mnt/Margarine/File Storage/Apps/horde-worker-reGen/", but when launching horde_bridge.sh I get a crash with this log:

Using jemalloc from /usr/lib/x86_64-linux-gnu
/tmp/mambafFGSqIcPNmU: line 2: /mnt/Margarine/File: No such file or directory
/tmp/mambafFGSqIcPNmU: line 3: micromamba: command not found
/tmp/mambafFGSqIcPNmU: line 5: exec: python: not found
download_models.py exited with error code. Aborting

I suspect there is a missing pair of quotes around a path variable in a script somewhere, causing the path to be cut at the space and thus not found. Would be great to get it fixed!
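The launcher itself is a shell script, but the suspected word-splitting hazard is easy to demonstrate from Python; this standalone sketch shows why an unquoted path containing a space breaks, and how quoting (or avoiding the shell entirely) fixes it:

import shlex
import subprocess
import sys

path = "/mnt/Margarine/File Storage/Apps/horde-worker-reGen"

print(f"cd {path}")               # broken: the shell would see two arguments
print(f"cd {shlex.quote(path)}")  # fixed: quoting keeps the path intact

# Passing an argument list (no shell involved) avoids the problem entirely.
subprocess.run([sys.executable, "-c", "import sys; print(sys.argv[1])", path])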

LoRa+TI downloads could be initiated ahead of inference

Currently, if a LoRa or TI is not on disk, the download is not started until the inference message is received and ComfyUI is entered. This could potentially be done earlier, perhaps as part of the preload response from an inference process.
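A rough sketch of what preload-time downloading could look like; download_lora, the job fields, and the pool size are illustrative names and values, not the worker's real API:

from concurrent.futures import Future, ThreadPoolExecutor

_download_pool = ThreadPoolExecutor(max_workers=2)

def download_lora(name: str) -> str:
    # Placeholder: fetch the file to disk and return the local path.
    return f"/models/loras/{name}.safetensors"

def on_preload(job: dict) -> list[Future]:
    # Kick off downloads as soon as the job is known; the inference step can
    # later wait on these futures instead of downloading synchronously.
    return [_download_pool.submit(download_lora, lora) for lora in job.get("loras", [])]

futures = on_preload({"loras": ["some-style-lora"]})
print([f.result() for f in futures])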

Support runtime config file selection

Please implement support for directing the worker to a configuration file other than bridgeData.yaml. This would allow multiple worker instances to be run from the same directory with different configurations, which is useful in multi-GPU cases to avoid duplicating the directory (and models) or working around it some other way.
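A minimal sketch of the requested behavior; the --bridge-data-file flag name is invented for illustration:

import argparse
from pathlib import Path

parser = argparse.ArgumentParser(description="horde-worker-reGen (sketch)")
parser.add_argument(
    "--bridge-data-file",
    type=Path,
    default=Path("bridgeData.yaml"),
    help="path to the bridge data config; lets several workers share one directory",
)
args = parser.parse_args()
print(f"Loading worker config from {args.bridge_data_file}")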

ERROR

critical libmamba Cannot activate, prefix does not exist at: 'C:\Users\yuuwa\OneDrive\Plocha\horde-worker-reGen-main\conda\envs\windows'

ERROR: Cannot install -r requirements.txt (line 5) because these package versions have conflicting dependencies.

The conflict is caused by:
hordelib 2.6.5 depends on mediapipe>=0.9.1.0
hordelib 2.6.4 depends on mediapipe>=0.9.1.0

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

I tried a lot of stuff but none of it helped me.

Single-model worker configs should mean a less aggressive memory cleanup scheme

The primary intent behind leaving a certain amount of system RAM free is to provide a cushion for other, potentially very large, models to load (such as SDXL models). However, when the worker is configured to run only a single model, the memory conditions become much more predictable, and the worker will fail anyway if an OOM occurs.

  • If the worker has one model only
    • If the model has only a single model file
      • Keep the model entirely on VRAM 100% of the time
    • If the model consists of multiple model files (as is the case with Stable Cascade)
      • Avoid offloading to disk if possible, swapping the models only between RAM and VRAM.

If failures are met in this situation, it's likely the memory-management overhead would only be encouraging the worker to run in very poor memory conditions (as it would constantly be loading off disk for little to no reason).
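A sketch of the decision logic described above; the policy names and function are illustrative, not the worker's actual code:

from enum import Enum, auto

class OffloadPolicy(Enum):
    PIN_TO_VRAM = auto()    # keep the model resident on the GPU
    RAM_VRAM_SWAP = auto()  # swap between system RAM and VRAM, never disk
    DEFAULT = auto()        # existing (aggressive) cleanup behavior

def pick_offload_policy(configured_models: list[str], component_files: int) -> OffloadPolicy:
    if len(configured_models) != 1:
        return OffloadPolicy.DEFAULT
    if component_files == 1:
        return OffloadPolicy.PIN_TO_VRAM
    # Multi-component models (e.g. Stable Cascade) stay out of disk offload.
    return OffloadPolicy.RAM_VRAM_SWAP

print(pick_offload_policy(["SDXL 1.0"], component_files=1))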

`n_iter` support

The reGen worker was written with this option in mind, and it presently (should) always assume that more than one image result is possible, but API and hordelib machinery may need to be added or adjusted to confirm this fully works as intended.

Allow workers to configure a 'pinned' model that will be preferred

I can see the merit in having a model such as SDXL configured to pop on its own queue, or at least pop preferentially (by omitting the other configured models from the pop while the pinned model doesn't have a job), so workers can offer the SD1.5 models while prioritizing SDXL (or whatever model they choose).
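A sketch of what such pinned-model pop behavior could look like; the function and its arguments are illustrative, not the worker's actual pop logic:

def models_for_next_pop(all_models: list[str], pinned_model: str, pinned_busy: bool) -> list[str]:
    # Offer only the pinned model while it is free, so it pops preferentially;
    # fall back to the full model list once the pinned model has a job.
    if not pinned_busy and pinned_model in all_models:
        return [pinned_model]
    return all_models

print(models_for_next_pop(["SDXL 1.0", "Deliberate"], "SDXL 1.0", pinned_busy=False))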

asyncio timeout on submit can put the worker into maintenance

Not quite sure what caused this stack trace, but it also doesn't appear in the trace logs.

2024-01-28 14:38:45.455 | ERROR    | asyncio.events:_run:80 - An error has been caught in function '_run', process 'MainProcess' (4053147), thread 'MainThread' (140531284598656):
Traceback (most recent call last):

  File "/home/db0/projects/horde-worker-reGen/run_worker.py", line 110, in <module>
    main(multiprocessing.get_context("spawn"))
    │    │               └ <bound method DefaultContext.get_context of <multiprocessing.context.DefaultContext object at 0x7fcffb50b910>>
    │    └ <module 'multiprocessing' from '/usr/lib/python3.10/multiprocessing/__init__.py'>
    └ <function main at 0x7fcffb678f70>

  File "/home/db0/projects/horde-worker-reGen/run_worker.py", line 71, in main
    start_working(
    └ <function start_working at 0x7fcff8f5d480>

  File "/home/db0/projects/horde-worker-reGen/horde_worker_regen/process_management/main_entry_point.py", line 22, in start_working
    process_manager.start()
    │               └ <function HordeWorkerProcessManager.start at 0x7fcff81b8430>
    └ <horde_worker_regen.process_management.process_manager.HordeWorkerProcessManager object at 0x7fcff7a822c0>

  File "/home/db0/projects/horde-worker-reGen/horde_worker_regen/process_management/process_manager.py", line 2468, in start
    asyncio.run(self._main_loop())
    │       │   │    └ <function HordeWorkerProcessManager._main_loop at 0x7fcff81b83a0>
    │       │   └ <horde_worker_regen.process_management.process_manager.HordeWorkerProcessManager object at 0x7fcff7a822c0>
    │       └ <function run at 0x7fcffb572950>
    └ <module 'asyncio' from '/usr/lib/python3.10/asyncio/__init__.py'>

  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
           │    │                  └ <coroutine object HordeWorkerProcessManager._main_loop at 0x7fcf0f1685f0>
           │    └ <function BaseEventLoop.run_until_complete at 0x7fcffb0ec3a0>
           └ <_UnixSelectorEventLoop running=True closed=False debug=False>
  File "/usr/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
    │    └ <function BaseEventLoop.run_forever at 0x7fcffb0ec310>
    └ <_UnixSelectorEventLoop running=True closed=False debug=False>
  File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
    │    └ <function BaseEventLoop._run_once at 0x7fcffb0ede10>
    └ <_UnixSelectorEventLoop running=True closed=False debug=False>
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
    │      └ <function Handle._run at 0x7fcffb0957e0>
    └ <Handle Task.task_wakeup(<Future cancelled>)>
> File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
    │    │            │    │           │    └ <member '_args' of 'Handle' objects>
    │    │            │    │           └ <Handle Task.task_wakeup(<Future cancelled>)>
    │    │            │    └ <member '_callback' of 'Handle' objects>
    │    │            └ <Handle Task.task_wakeup(<Future cancelled>)>
    │    └ <member '_context' of 'Handle' objects>
    └ <Handle Task.task_wakeup(<Future cancelled>)>

  File "/home/db0/projects/horde-worker-reGen/horde_worker_regen/process_management/process_manager.py", line 1676, in submit_single_generation
    async with self._aiohttp_session.put(
               │    │                └ <function ClientSession.put at 0x7fcff8c904c0>
               │    └ <aiohttp.client.ClientSession object at 0x7fcf0f1d14e0>
               └ <horde_worker_regen.process_management.process_manager.HordeWorkerProcessManager object at 0x7fcff7a822c0>

  File "/home/db0/projects/horde-worker-reGen/venv/lib/python3.10/site-packages/aiohttp/client.py", line 1141, in __aenter__
    self._resp = await self._coro
    │    │             │    └ <member '_coro' of '_BaseRequestContextManager' objects>
    │    │             └ <aiohttp.client._RequestContextManager object at 0x7fcf00904250>
    │    └ <member '_resp' of '_BaseRequestContextManager' objects>
    └ <aiohttp.client._RequestContextManager object at 0x7fcf00904250>
  File "/home/db0/projects/horde-worker-reGen/venv/lib/python3.10/site-packages/aiohttp/client.py", line 467, in _request
    with timer:
         └ <aiohttp.helpers.TimerContext object at 0x7fcf00906530>
  File "/home/db0/projects/horde-worker-reGen/venv/lib/python3.10/site-packages/aiohttp/helpers.py", line 721, in __exit__
    raise asyncio.TimeoutError from None
          │       └ <class 'asyncio.exceptions.TimeoutError'>
          └ <module 'asyncio' from '/usr/lib/python3.10/asyncio/__init__.py'>

asyncio.exceptions.TimeoutError
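One possible mitigation, sketched below, is to catch the timeout around the submit call and retry with backoff instead of letting it bubble up and crash the main loop; the URL and payload shape here are placeholders, not the worker's real submit call:

import asyncio
import aiohttp

async def submit_with_retry(session: aiohttp.ClientSession, payload: dict, attempts: int = 3) -> bool:
    for attempt in range(1, attempts + 1):
        try:
            async with session.put("https://example.invalid/generate/submit", json=payload) as response:
                return response.status == 200
        except asyncio.TimeoutError:
            # The API did not answer in time; back off and try again.
            await asyncio.sleep(2 * attempt)
    return False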

AMD GPU issues

When trying to use this with an AMD GPU, it doesn't work; it says that it is trying to search for an NVIDIA GPU.
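A quick way to check what the installed torch build actually supports (these are standard torch attributes; on an AMD card a ROCm build of torch is required, which reports a non-None torch.version.hip):

import torch

print("GPU available to torch:", torch.cuda.is_available())
print("CUDA runtime version:", torch.version.cuda)      # None on ROCm-only builds
print("ROCm (HIP) runtime version:", torch.version.hip)  # None on CUDA-only builds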

Post-processor crashing makes the worker stall

From the logs of cozmyc, I noticed some weird OOM errors from the post-processors even though there should be more than enough VRAM. I don't quite understand why, since I thought the post-processors run on the CPU and use system RAM.

(Cozmyc has a very old CPU, so this is probably relevant.)

2024-01-16 09:43:59.153 | ERROR    | hordelib.comfy_horde:send_sync:666 - execution_error, {'prompt_id': 'b0749e50-0fc3-423e-b157-d72a8511b395', 'node_id': 'face_restore_with_model', 'node_type': 'FaceRestoreWithModel', 'executed': ['model_loader', 'image_loader'], 'exception_message': 'Unable to allocate 384. MiB for an array with shape (4096, 4096, 3) and data type float64', 'exception_type': 'numpy.core._exceptions._ArrayMemoryError', 'traceback': ['  File "C:\\Users\\santiago\\AppData\\Local\\Programs\\Python\\Python310\\Lib\\site-packages\\hordelib\\_comfyui\\execution.py", line 154, in recursive_execute\n    output_data, output_ui = get_output_data(obj, input_data_all)\n', '  File "C:\\Users\\santiago\\AppData\\Local\\Programs\\Python\\Python310\\Lib\\site-packages\\hordelib\\_comfyui\\execution.py", line 84, in get_output_data\n    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)\n', '  File "C:\\Users\\santiago\\AppData\\Local\\Programs\\Python\\Python310\\Lib\\site-packages\\hordelib\\_comfyui\\execution.py", line 77, in map_node_over_list\n    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))\n', '  File "C:\\Users\\santiago\\AppData\\Local\\Programs\\Python\\Python310\\Lib\\site-packages\\hordelib\\nodes\\facerestore\\__init__.py", line 180, in restore_face\n    restored_img = face_helper.paste_faces_to_input_image()\n', '  File "C:\\Users\\santiago\\AppData\\Local\\Programs\\Python\\Python310\\Lib\\site-packages\\hordelib\\nodes\\facerestore\\facelib\\utils\\face_restoration_helper.py", line 527, in paste_faces_to_input_image\n    inv_soft_mask * pasted_face + (1 - inv_soft_mask) * upsample_img\n'],

We should look at our error handling in the post-processing process to make it fail more gracefully and inform the user.
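A sketch of that more graceful failure mode; the function names are illustrative, not the real hordelib entry points:

def run_post_processing_safely(post_process, image):
    try:
        return post_process(image)
    except MemoryError as err:
        # numpy's _ArrayMemoryError (seen in the log above) subclasses MemoryError.
        print(f"Post-processing failed ({err}); returning the unprocessed image.")
        return image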
