runpod / containers Goto Github PK

View Code? Open in Web Editor NEW

139.0 3.0 83.0 158.94 MB

🐳 | Dockerfiles for the RunPod container images used for our official templates.

Home Page: https://hub.docker.com/u/runpod

License: MIT License

Dockerfile 0.24% Shell 0.58% Jupyter Notebook 98.83% Python 0.26% HCL 0.08% HTML 0.01%

bittensor docker runpod stable-diffusion

containers's People

Contributors

Stargazers

Watchers

Forkers

marcus-arcadius heinthetaung damian0815 viczar lltcggie ty4nn binzcodes drewwalkup barshag joennlae niklastr dreamiphotos rulk davguij cjkihl fireinside tresmoa bassammq d8ahazard hashir788 swiecki ianscrivener hdworkstation103 abdullin camenduru lixw1994 astuanax pvanlaar toddw alexandre-wada llgoo msarmadi geniusfoever galindus therealadityashankar acidbubbles dusalex linuxem yvonne-aizawa furkangozukara synthintel0 supremesource bacoco yaroslavry sorokinvld slmagus jeremychu jjziets kunato oudeis tomohiro-sawada tissak kingdave100 s3714110 ashleykleynhans 5l1v3r1 pablodawson quangson93 madcap66 rkoyama1623 mattssn fluder-paradyne cwoehle wuminqi kopyl solitarythinker tungllm rslsp1 valachio panstech hughperkins mario4272 geofox flowtyone pepsighan joeaelkhoury nerdylive123 orellbuehler pandyamarut sam-ai56 kodxana

containers's Issues

env variable for --ServerApp.preferred_dir=/workspace

is it possible to use volume mount path env variable for --ServerApp.preferred_dir={from env} or like --notebook-dir={from env} user can start with volume mount path in my case /content instead of /workspace

containers/torch/start.sh

Line 19 in 564385a

    
           jupyter lab --allow-root --no-browser --port=8888 --ip=* --ServerApp.terminado_settings='{"shell_command":["/bin/bash"]}' --ServerApp.token=$JUPYTER_PASSWORD --ServerApp.allow_origin=* --ServerApp.preferred_dir=/workspace

Deployed image doestn`t react on clicks such as 'Generate', 'Refresh' etc

I just built (using docker buildx bake) and deployed an image of stable diffusion webui. Everything started fine, but webui doesn`t react to any of my clicks. I skipped the creation of runpod.yaml file, but just because I don't understand the purpose of it and how to fulfil it. I am quite new to this. Sorry if my problem is really silly. Would be happy for any help ^)

can't build stable-diffusion-auto docker

#16 317.6 RuntimeError: Couldn't install torch.
#16 317.6 Command: "/workspace/venv/bin/python3" -m pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

runpod/base needs a cuda 12.1.1 version

The 12.1.0 container is EOL (https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md) and will be deleted soon.

This is printed on boot:

2024-04-17T06:32:00.716216802Z *************************
2024-04-17T06:32:00.716263593Z ** DEPRECATION NOTICE! **
2024-04-17T06:32:00.716538501Z *************************
2024-04-17T06:32:00.716629960Z THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
2024-04-17T06:32:00.716661327Z     https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md

Automatic WEBUI - custom .ckpt or safe tensor not loaded

I just followed the tutorial on RunPod Automatic WEBUI. and the custom safe tensor specified is not loaded when running inference. Instead, what is loaded is the default SD "model".

the the modded to load probably, we need to specify on this endpoint sdapi/v1/options:

"sd_model_checkpoint": "Anything-V3.0-pruned.ckpt [2700c435]",

response = requests.post(url=f'{url}/sdapi/v1/options', json=option_payload)

Link to docker file source is dead

If I follow the link

https://github.com/runpod/containers/tree/main/gpt4all

from

https://hub.docker.com/r/runpod/gpt4all#!

I get a404 - page not founderror.

[Feature Request] Add an official template for H2O LLM Studio

This allows people to fine-tune LLMs and test them without any coding experience. It has become fairly popular and receives regular updates:

https://github.com/h2oai/h2o-llmstudio

New versions (>v6?) of A1111 fail to use the `--xformers` flag (`no module 'xformers'. Processing without...` in startup logs)

I noticed that image generation was significantly slower in new versions of the runpod official A1111 image. Looking into it, it seems like it's due to xformers not being installed, or not loading correctly for whatever reason.

To reproduce (giving my specific steps, but I think it'd occur on secure cloud, and non-3090 machines too):

Go to community cloud
Select a 3090
Deploy with official v10 "Runpod Stable Diffusion" template (runpod/stable-diffusion:web-ui-10.0.0)
Look at startup logs, and also observe that generating a single image with 20 steps takes about 1.3 seconds instead of 1.1 seconds with runpod/stable-diffusion:web-automatic-6.0.1, and also observe that in the A1111 UI at the bottom of the page, it says xformers: N/A instead of xformers: <version number>

Here's a snippet from the startup logs:

2023-07-29T06:28:19.671508453Z 
2023-07-29T06:28:19.671510697Z ---
2023-07-29T06:28:20.581231736Z Python 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
2023-07-29T06:28:20.581256173Z Version: v1.5.1
2023-07-29T06:28:20.581258698Z Commit hash: 68f336bd994bed5442ad95bad6b6ad5564a5409a
2023-07-29T06:28:20.581260371Z 
2023-07-29T06:28:20.581261924Z 
2023-07-29T06:28:20.581263447Z Launching Web UI with arguments: -f --port 3000 --xformers --skip-install --listen --enable-insecure-extension-access
2023-07-29T06:28:20.581282242Z no module 'xformers'. Processing without...
2023-07-29T06:28:20.581286390Z no module 'xformers'. Processing without...
2023-07-29T06:28:20.581287803Z No module 'xformers'. Proceeding without it.

Upgrade PyTorch version to 2.1.0 and CUDA 12.1.1

Hi Runpod, it would be great if you can either upgrade the current template for PyTorch and CUDA to the new version or create a new template with the newer version of PyTorch and CUDA since some libraries have a dependency on this.

Serverless Automatic needs --listen

The workers won't actually be able to start up. I fixed this in my own build and it worked. https://github.com/runpod/containers/blob/main/serverless-automatic/start.sh#L7

Unable to build comfyui image

I'm trying to build the img locally running:

docker build -t runpod/stable-diffusion-comfyui-custom -f official-templates/stable-diffusion-comfyui/Dockerfile .

From the root of the repository (all files have full permission as well).

Regardless I'm getting the following error:

[+] Building 1.1s (12/12) FINISHED                                                                                                                                                                                           docker:default
 => [internal] load .dockerignore                                                                                                                                                                                                      0.0s
 => => transferring context: 2B                                                                                                                                                                                                        0.0s
 => [internal] load build definition from Dockerfile                                                                                                                                                                                   0.0s
 => => transferring dockerfile: 2.93kB                                                                                                                                                                                                 0.0s
 => ERROR [internal] load metadata for docker.io/library/scripts:latest                                                                                                                                                                1.0s
 => CANCELED [internal] load metadata for docker.io/nvidia/cuda:11.8.0-base-ubuntu22.04                                                                                                                                                1.0s
 => ERROR [internal] load metadata for docker.io/library/proxy:latest                                                                                                                                                                  1.0s
 => CANCELED [internal] load metadata for docker.io/runpod/stable-diffusion:models-1.0.0                                                                                                                                               1.0s
 => CANCELED [internal] load metadata for docker.io/runpod/stable-diffusion-models:2.1                                                                                                                                                 1.0s
 => [auth] library/proxy:pull token for registry-1.docker.io                                                                                                                                                                           0.0s
 => [auth] nvidia/cuda:pull token for registry-1.docker.io                                                                                                                                                                             0.0s
 => [auth] library/scripts:pull token for registry-1.docker.io                                                                                                                                                                         0.0s
 => [auth] runpod/stable-diffusion-models:pull token for registry-1.docker.io                                                                                                                                                          0.0s
 => [auth] runpod/stable-diffusion:pull token for registry-1.docker.io                                                                                                                                                                 0.0s
------
 > [internal] load metadata for docker.io/library/scripts:latest:
------
------
 > [internal] load metadata for docker.io/library/proxy:latest:
------
Dockerfile:70
--------------------
  68 |     # Start Scripts
  69 |     COPY pre_start.sh /pre_start.sh
  70 | >>> COPY --from=scripts start.sh /
  71 |     RUN chmod +x /start.sh
  72 |     
--------------------
ERROR: failed to solve: scripts: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed

Can you please help me figure out what I'm doing wrong? Keep in mind that I have not modified anything from the Dockerfile yet :( (I also succesfully did ''docker login'')

Investigate Torch image and SXM5 comparability

Image appears to with with PCIe but not SXM5

Tensorflow container has been deleted?

I am trying to understand where this container comes from https://hub.docker.com/r/runpod/tensorflow
It links to this git repo but the folder has been deleted.

I would like to run a newer version of tensorflow but don't understand how I could update the container that currently exists in runpod for tensorflow

[Feature Request] jupyter lab start notebook from url

If we can start a notebook from a URL, this feature would become very helpful for both users and template creators.

In my case, I should instruct the user to enter the following URL: https://github.com/camenduru/stable-diffusion-webui-runpod, and then copy and paste the code into a new notebook. However, this manual process can be avoided by using JupyterLab's 'start notebook' feature.

Building wheel for pycairo (pyproject.toml): finished with status 'error' when installng ControlNet v1.1.142

This error appears when relaunching the webui process after installing ControlNet v1.1.142
running : runpod/stable-diffusion:web-automatic-5.0.0

2023-05-06T18:42:40.063374181Z   Building wheel for pycairo (pyproject.toml): finished with status 'error'
2023-05-06T18:42:40.063378201Z Failed to build pycairo
2023-05-06T18:42:40.063381781Z 
2023-05-06T18:42:40.063385181Z stderr:   error: subprocess-exited-with-error
2023-05-06T18:42:40.063388871Z   
2023-05-06T18:42:40.063392351Z   × Building wheel for pycairo (pyproject.toml) did not run successfully.
2023-05-06T18:42:40.063398041Z   │ exit code: 1
2023-05-06T18:42:40.063401791Z   ╰─> [12 lines of output]
2023-05-06T18:42:40.063405601Z       running bdist_wheel
2023-05-06T18:42:40.063409130Z       running build
2023-05-06T18:42:40.063412730Z       running build_py
2023-05-06T18:42:40.063416320Z       creating build
2023-05-06T18:42:40.063419860Z       creating build/lib.linux-x86_64-cpython-310
2023-05-06T18:42:40.063423460Z       creating build/lib.linux-x86_64-cpython-310/cairo
2023-05-06T18:42:40.063427140Z       copying cairo/__init__.py -> build/lib.linux-x86_64-cpython-310/cairo
2023-05-06T18:42:40.063430940Z       copying cairo/__init__.pyi -> build/lib.linux-x86_64-cpython-310/cairo
2023-05-06T18:42:40.063434690Z       copying cairo/py.typed -> build/lib.linux-x86_64-cpython-310/cairo
2023-05-06T18:42:40.063438410Z       running build_ext
2023-05-06T18:42:40.063441890Z       'pkg-config' not found.
2023-05-06T18:42:40.063445430Z       Command ['pkg-config', '--print-errors', '--exists', 'cairo >= 1.15.10']
2023-05-06T18:42:40.063449340Z       [end of output]
2023-05-06T18:42:40.063452820Z   
2023-05-06T18:42:40.063456250Z   note: This error originates from a subprocess, and is likely not a problem with pip.
2023-05-06T18:42:40.063460050Z   ERROR: Failed building wheel for pycairo
2023-05-06T18:42:40.063463640Z ERROR: Could not build wheels for pycairo, which is required to install pyproject.toml-based projects

it can be solved by installing the following before attempting to run the controlnet installer:

apt-get install libcairo2 libcairo2-dev

Container Cleanup

Single start.sh script that can be used for all of the containers
Implement NGINX proxy referencing README.md
Contains the NGINX config

start.sh

The script will add SSH key
Launch Jupyter
Launch NGINX for Proxy
Copy ENV variables to IP SSH

cuda error using official image

FROM runpod/pytorch:2.2.1-py3.10-cuda12.1.1-devel-ubuntu22.04

using this docker file and running

import inference.models.yolo_world.yolo_world
YOLO = inference.models.yolo_world.yolo_world.YOLOWorld(model_id="yolo_world/l")

causes the following error:

UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
Creating inference sessions
UserWarning: Specified provider 'OpenVINOExecutionProvider' is not in available provider names.Available providers: 'TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider'
EP Error /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 804: forward compatibility was attempted on non supported HW ; GPU=-593199125 ; hostname=0a84033fcf95 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=238 ; expr=cudaSetDevice(info_.device_id); 

 when using ['CUDAExecutionProvider', 'OpenVINOExecutionProvider', 'CPUExecutionProvider']
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 383, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 435, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 804: forward compatibility was attempted on non supported HW ; GPU=-593199125 ; hostname=0a84033fcf95 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=238 ; expr=cudaSetDevice(info_.device_id); 



The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/scripts/temp.py", line 4, in <module>
    YOLO = inference.models.yolo_world.yolo_world.YOLOWorld(model_id="yolo_world/l")
  File "/usr/local/lib/python3.10/dist-packages/inference/models/yolo_world/yolo_world.py", line 54, in __init__
    clip_model = Clip(model_id="clip/ViT-B-32")
  File "/usr/local/lib/python3.10/dist-packages/inference/models/clip/clip_model.py", line 65, in __init__
    self.visual_onnx_session = onnxruntime.InferenceSession(
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 394, in __init__
    raise fallback_error from e
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 389, in __init__
    self._create_inference_session(self._fallback_providers, None)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 435, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 804: forward compatibility was attempted on non supported HW ; GPU=-593199125 ; hostname=0a84033fcf95 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider.cc ; line=238 ; expr=cudaSetDevice(info_.device_id);

The same python script using

FROM pytorch/pytorch:2.2.2-cuda12.1-cudnn8-runtime

works as expected.

nvidia-smi 
Mon Jun  3 22:59:43 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   63C    P0    25W /  80W |   1538MiB /  8192MiB |     94%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

docker-compose.yaml

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [ gpu ]

TGI Docker file not able to access GPU?

I am creating a pod that uses HF's text-generation-interface (TGI) Docker container (see image_name below). I can create a pod successfully as long as I do not pass in the --quantize parameter within the docker_args. For example, if I pass in docker_args="--model-id "tiiuae/falcon-7b-instruct" --num-shard 1 --quantize bitsandbytes" The error in the container log has...2023-08-10T11:30:29.101220272-06:00 /opt/conda/lib/python3.9/site-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable. and ends with the message: 2023-08-10T11:30:29.101315592-06:00 ValueError: quantization is not available on CPU

HF support's comment when I asked on GitHub: it seems more to me that the GPU is not detected in the docker image, and that error message is bogus stemming from that. (I can run fine with 1.0.0 with bnb on a simple docker + gpu environement). Another comment just made by HF GitHub: Something about shm not being properly set or something.... If I try with the other quantization option gptq, the container throws a signal 4. Is the container seeing the GPU? What is going on with bitsandbytes? Why signal 4. I am hoping to minimize the amount of memory and inference time. Help very much appreciated.

Here is my call to create_pod: pod = runpod.create_pod( name=model_id, image_name="ghcr.io/huggingface/text-generation-inference:1.0.0", gpu_type_id=gpu_type, cloud_type=cloud_type, docker_args=f"--model-id {model_id} --num-shard {num_shard} -quantize {quantize}", gpu_count=gpu_count, volume_in_gb=volume_in_gb, container_disk_in_gb=5, ports="80/http", volume_mount_path="/data", # min_vcpu_count=2, # min_memory_in_gb=15, ) The specs on my community pod is 1 x RTX 3090 9 vCPU 37 GB RAM.

Thank you.

TLB training fails

Then trying to train on some models like Lykon/Dreamshape it fails.
There is an os error that it cannot find config.json.
Please check it

Adding --api support to oobabooga/text-generation-web-ui

It would be very nice to have an environment variable to launch oobabooga (https://github.com/runpod/containers/blob/main/oobabooga/start.sh) with the API. See https://github.com/oobabooga/text-generation-webui#api for more information on this.

Otherwise, the only way to launch text-generation-web-ui with the API is to build my own docker image from scratch.

An alternative would be an ARGS environment variable to let us pass whatever we need to the python app.

Thank you for your consideration :)

cuda/cudnn mismatch in TF instances

Hi RunPod!

I am experiencing issues with the performance of TensorFlow on your A100 80GB machines. The problems seem to originate from an apparent version mismatch between CUDA, cuDNN, and cuBLAS, which is not aligning properly with the version of TensorFlow currently utilized on your systems.

Additionally, I have noticed significantly slow training times on my setups that are beyond what is normally expected. This sluggish performance is particularly noticeable when compared with a 40GB Colab A100 machine which often even outperforms your 1 A100 80GB setup.

Here are the error messages I am receiving:

When initiating training on my single A100 80GB machine:

2023-07-15 00:38:25.585795: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8401
Could not load symbol cublasGetSmCountTarget from libcublas.so.11. Error: /usr/local/cuda/lib64/libcublas.so.11: undefined symbol: cublasGetSmCountTarget
2023-07-15 00:38:25.781595: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.

And also, when I prepared my data, model, and everything else on my 4xA100 80GB machine a while back:

2023-06-22 20:07:13.541513: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8401
Could not load symbol cublasGetSmCountTarget from libcublas.so.11. Error: /usr/local/cuda/lib64/libcublas.so.11: undefined symbol: cublasGetSmCountTarget
2023-06-22 20:07:13.881071: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2023-06-22 20:07:14.015923: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8401
2023-06-22 20:07:14.563243: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8401
2023-06-22 20:07:15.052808: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8401

I picked RunPod as my go-to choice when I decided to move on from Colab, thanks to the potential I saw in your platform. Despite the current, let's call them firmware challenges, I'm hopeful that you get your systems up do date and fixed.

All the best!

Pytorch cuda image nvidia driver not installed?

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Add thebloke's docker image

Will use his docker and then a FROM to build our own, adding our files on top.

How to install docker runtime in `runpod/base:0.5.1-cpu` container instance

run a container instance from runpod/base:0.5.1-cpu image
docker run --name base -it -d runpod/base:0.5.1-cpu

after,into the container
docker exec -it base /bin/bash

I want install docker-ce in the base container ,follow docker doc https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository

apt-get update; \
apt-get install -y sudo \
	ca-certificates \
	vim \
	curl; \
sudo install -m 0755 -d /etc/apt/keyrings; \
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc; \
sudo chmod a+r /etc/apt/keyrings/docker.asc; \
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null; \
sudo apt-get update; \
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

my question, docker is not running, please tell me what todo, thanks

Add Torch Vision to Comfy UI

Hello,

Template: https://github.com/runpod/containers/tree/main/official-templates/stable-diffusion-comfyui

User comment:

I'm trying to install ComfyUI Manager the standard way with git clone into the custom_nodes folder and it doesn't appear in the UI. I don't know of any other way. Am I missing something?

OK nevermind. I figured it out. I had to install torch vision. The extension is 1.5MB and it's the basic one that let's you download other extensions, so it would be convenient to include it.

Thanks!
JM