foldingathome / containers
Docker containers for easily launching the Folding@home client anywhere!
License: Creative Commons Zero v1.0 Universal
Some NAS systems by Synology, Inc. allow the deployment of custom images in their preconfigured, integrated Docker installation.
I have successfully deployed fah-gpu/v5.7.1 on a Synology DS218+ two-bay NAS running DSM 6.2.3-25426 Update 2 (kernel 4.4.59+ #25426 SMP PREEMPT x86_64 GNU/Linux, synology_apollolake_218+). Obstacles on the way:
- the --gpus switch from the docker run command
- the allow and web-allow config options must be correctly configured
- the port mapping -p 7396:7396
Both <web-allow>ADDRESSES</web-allow> and <web-allow v='ADDRESSES'> are, unexpectedly, equally valid methods to allow external access. The configuration <allow v='172.17.0.1/24'/><web-allow v='172.17.0.1/24'/> permits web access using the standard configuration of Docker's bridged network device.
00:29:18:WU02:FS00:Download complete
00:29:19:WU02:FS00:Received Unit: id:02 state:DOWNLOAD error:NO_ERROR project:12261 run:0 clone:236 gen:81 core:0x23 unit:0x000000ec0000005100002fe500000000
00:29:19:WU02:FS00:Starting
00:29:19:WU02:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /fah/cores/cores.foldingathome.org/openmm-core-23/centos-7.9.2009-64bit/release/0x23-8.0.3/Core_23.fah/FahCore_23 -dir 02 -suffix 01 -version 706 -lifeline 1 -checkpoint 15 -opencl-platform 0 -opencl-device 0 -cuda-device 0 -gpu-vendor nvidia -gpu 0 -gpu-usage 100
00:29:19:WU02:FS00:Started FahCore on PID 50
00:29:19:WU02:FS00:Core PID:54
00:29:19:WU02:FS00:FahCore 0x23 started
00:29:19:WARNING:WU02:FS00:FahCore returned: WU_STALLED (127 = 0x7f)
Core 0x23 seems to require OpenCL 3.0, but OpenCL 3.0 does not work properly with CUDA 11.2.2.
$ docker exec -it fah0 clinfo
Number of platforms 1
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 3.0 CUDA 12.2.148
Platform Profile FULL_PROFILE
(snip)
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.11
ICD loader Profile OpenCL 2.1
NOTE: your OpenCL library only supports OpenCL 2.1,
but some installed platforms support OpenCL 3.0.
Programs using 3.0 features may crash
or behave unexpectedly
According to the NVIDIA Technical Blog, NVIDIA has supported OpenCL 3.0 since Linux driver version 465.19.1. The matching CUDA version would be 11.3.1 according to the CUDA release notes.
Therefore, I guess that the CUDA version of the base image should be updated to at least 11.3.1.
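A minimal sketch of what the fix might look like, assuming the image is built from NVIDIA's official CUDA base images (the exact tag and the rest of the Dockerfile are assumptions, not the repo's actual contents):

```dockerfile
# Hypothetical: bump the CUDA base image so the bundled driver stack
# supports OpenCL 3.0 (CUDA >= 11.3.1 per the release notes)
FROM nvidia/cuda:11.3.1-base-ubuntu20.04
# ... remainder of the fah-gpu Dockerfile unchanged ...
```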
Also reported here: FoldingAtHome/fah-issues#1571
Hi,
This is failing for me on a Jetson Nano:
Steps:
# fah/config.xml
<config>
<!-- Set with your user, passkey, team-->
<user value="REDACTED"/>
<passkey value="REDACTED"/>
<team value="0"/>
<power value="full"/>
<exit-when-done v='true'/>
<web-enable v='false'/>
<disable-viz v='true'/>
<gui-enabled v='false'/>
<!-- 1 slot for GPUs -->
<slot id='0' type='GPU'> </slot>
<!-- 16-1 = 15 = 3*5 for decomposition -->
<slot id='1' type='SMP'> <cpus v='15'/> </slot>
</config>
$ docker run --gpus all --name fah0 -d --user "$(id -u):$(id -g)" --volume $HOME/fah:/fah foldingathome/fah-gpu:latest
08a13874aba9145efd6b729c2b543f90e6fd250c0a880eb6721dc577816a950b
This works.
Then I run:
$ docker start fah0
fah0
~$ docker logs fah0
standard_init_linux.go:211: exec user process caused "exec format error"
standard_init_linux.go:211: exec user process caused "exec format error"
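An "exec format error" from runc typically indicates a CPU-architecture mismatch: the Jetson Nano is arm64 (aarch64), while the image may have been built only for amd64. A quick way to check both sides (the image tag here is the one from the run command above):

```shell
# Host architecture (a Jetson Nano should report aarch64)
uname -m

# Compare with the architecture the pulled image was built for, e.g.:
#   docker image inspect --format '{{.Architecture}}' foldingathome/fah-gpu:latest
```

If the two do not match, the image cannot run on that host regardless of the Docker version.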
Prerequisites:
Docker > 19
$ docker -v
Docker version 19.03.6, build 369ce74a3c
$ nvidia-container-runtime -v
runc version spec: 1.0.1-dev
nvidia-container-runtime is already the newest version (3.1.0-1).
specs:
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
These containers require a --gpus flag for Docker that is not available in a fresh install on Ubuntu 20.04 amd64 with the amdgpu driver. Also, the README of the container explicitly mentions using an nvidia-container-runtime. This appears to be a too-strict dependency, which brings vendor lock-in and a decrease of diversity in computational ecosystems.
Does it appear valuable to also support the AMD ROCm platform, or are there any plans yet to do so?
A few resources become interesting in that light:
Could you please build a new version of the image, as a new fah-client version has been released?
Also, it may be worth setting up some sort of auto-triggered pipeline that builds a new image whenever a new version of the application is available.
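A minimal sketch of such a pipeline as a scheduled GitHub Actions workflow; the file name, trigger, and build step here are assumptions, not the repo's actual setup:

```yaml
# .github/workflows/build.yml (hypothetical)
name: build-image
on:
  schedule:
    - cron: '0 6 * * *'   # check daily for a new fah-client release
  workflow_dispatch: {}
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t foldingathome/fah-gpu:latest .
      # a real pipeline would compare the upstream fah-client version
      # against the last published tag before pushing to Docker Hub
```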
I'm using the fah-gpu-amd container to run the folding@home client on my desktop, as the OS it runs is not supported by ROCm userspace and having a consistent environment is much simpler.
I've noticed that if the folding@home client kills a subprocess for some reason (pausing, or if some bug is detected), the process ends up as a zombie and the folding@home client never cleans it up. As folding@home is PID 1, there is no other reaper, so the processes continue to exist and the client gets wedged, unable to respawn the work unit. Note: this happens to both GPU and CPU WUs on the same machine.
Could the folding@home client be updated to reap these processes? Otherwise, could the containers be updated with a different PID 1 to reap these dead children? If a new PID 1 is wanted, I could take a look at creating an appropriate PR.
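As a possible workaround on the container side, Docker's built-in --init flag runs a tiny init process (tini) as PID 1, which reaps orphaned children; a sketch, assuming the usual run command from the README:

```shell
# --init makes Docker run tini as PID 1, so zombie children left
# behind by the FAHClient process get reaped automatically
docker run --init --gpus all --name fah0 -d --user "$(id -u):$(id -g)" \
  --volume $HOME/fah:/fah foldingathome/fah-gpu:latest
```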
Hi, I am not sure if this is the right place to open a ticket, but if not, I am hoping you could help point me in the right direction.
I work for VMware and have worked for several other infrastructure companies, where we burn many thousands of hours of CPU cycles every year running some random sample application. The sample app itself has no point; it's just needed as some app to demonstrate our infrastructure software underneath.
I would like to see if it's possible to turn at least some demo use cases into demos that use folding@home as our sample workload. I am just starting to investigate this idea and don't yet know basic things, like the minimal hardware requirements for a container running folding@home, or how long a container needs to run to create at least a minimal benefit.
I realize this use case may not work well for all of our demos, but we have a lot of different types of demos and I am very hopeful we could find demo types that could make a meaningful contribution of cpu cycles. For example while most of our demo apps are short running, I am proposing some ideas for events where we may partner with an organization like folding@home to run larger scale demos that may run for a day or multiple days. Even during shorter demos, we often have our engineers access a temporary environment for their demo that continues to run for some period of time after they finish giving the demo, sometimes for multiple days or weeks.
I am actively trying to pursue exploring this idea within VMware and am not sure how quickly I will make progress, but if you have any advice or guidance I would be very grateful.
Thank you!
The size of the current fah-gpu-amd image is about 3.9 GB.
That is very large compared to the 92.1 MB of fah-gpu (NVIDIA CUDA).
Currently, fah-gpu-amd installs the rocm-dev package, which depends on the ROCm runtime and development libraries, and on LLVM compilers that rely on the g++ development environment.
I guess that this image's goal is to provide an OpenMM (more precisely, an OpenCL) runtime to run FAH core 22 efficiently.
If my guess is correct, only the runtimes related to OpenCL acceleration would be enough.
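A sketch of the idea, assuming an Ubuntu-based image and AMD's package naming at the time (rocm-opencl-runtime as the runtime-only counterpart of rocm-dev); both the package name and the install step are assumptions:

```dockerfile
# Hypothetical slimmer variant: install only the ROCm OpenCL runtime
# instead of the full rocm-dev development stack
RUN apt-get update && \
    apt-get install -y --no-install-recommends rocm-opencl-runtime && \
    rm -rf /var/lib/apt/lists/*
```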
The README.md outlines many benefits, but completely misses the topic of performance. I think that for the distributed computing community, performance is a very important topic to cover.
https://hackernoon.com/another-reason-why-your-docker-containers-may-be-slow-d37207dec27f
#7 was closed without a resolution on performance, and as I constantly come back to this, I decided to raise a dedicated issue. #8 also contains a question about performance on Google Cloud Platform.
The goal is to measure and compare the performance of F@H with and without containers on similar hardware, to see whether containerizing such a payload is efficient, and to provide ways to troubleshoot and improve it.
Storing this idea here, for eventual consideration when there is enough interest.
Most projects are licensed under the GPL, which can get automatic approval from many companies to allow their employees to work on them, because it is a known, long-standing license that is well understood.
This repo is licensed under CC0, which is a lesser-used license, and so will increase the work needed to allow some folks to work on the project.
Please consider altering the license to GPL or a similarly well-known license (e.g. CC-BY, Apache 2, MIT, BSD, LGPL).
Given that core 22 is moving to CUDA 11.x shall we do the same?
It is best practice to run containers as a non-root user. This could be added to the design rules and is easily achieved by adding the following to the Dockerfile:
# Add Folding user
RUN useradd -m folding
# Run as non-privileged user
USER folding
After doing this, users will need to change the ownership of their host fah volume, as it will still be owned by root. The new user will be uid 1000, gid 1000.
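For example, on the host (assuming the volume lives at $HOME/fah and the container user ends up as uid/gid 1000):

```shell
# Hand the bind-mounted volume to uid/gid 1000, matching the
# "folding" user created inside the image
sudo chown -R 1000:1000 "$HOME/fah"
```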
I have deployed fah-gpu container on Google Kubernetes Engine (Google Cloud Platform). Currently I'm using cluster with one node:
Machine type: n1-standard-1 (1 vCPU, 3.75 GB memory)
CPU: Intel(R) Xeon(R) CPU @ 2.20GHz, GenuineIntel Family 6 Model 79 Stepping 0
OS: Container-Optimized OS
GPU: NVIDIA Tesla T4
Preemptible VM
I'm running a single GPU folding slot, and it works well - the GPU yields ~800-850k points per day and is able to process very massive work units in a reasonable amount of time. However, I wonder if it is possible to increase my GPU performance, so I have 3 questions which are probably worth discussing and noting in the README file.
sgnsajgon@cloudshell:~$ kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
gke-folding-at-home-n1-std-1-tesla-t4-0eeec1f6-7rlh 1000m 106% 1758Mi 66%
sgnsajgon@cloudshell:~$ kubectl top pods
NAME CPU(cores) MEMORY(bytes)
fahclient-gpu-statefulset-0 955m 739Mi
We can see that the node CPU is fully utilized, almost entirely consumed by the fahclient Pod (there are also other kube-system Pods deployed on this node, but they do not consume many resources).
Is it possible that deploying fah-gpu on a faster or more modern CPU would give me better GPU performance? Which CPU (in terms of its parameters) is the optimal choice for my GPU and, in general, for other GPUs in the context of the fah-gpu container? How can we check whether we are getting maximum performance out of our GPUs?
Will I get any performance boost if I use a newer CUDA toolkit than 9.2, for example a newer base Docker image, i.e. 10.1-base-ubuntu18.04, or the latest?
I'm thinking about scaling up my cluster. Which option would be better in terms of performance - running 2 GPUs (with 2 vCPUs and more RAM) on a single node (vertical scaling), or running two separate nodes, each with 1 GPU and 1 vCPU (horizontal scaling)?
I think some information and tips on this subject would be greatly appreciated.
Thank you so much, great job, I'm looking forward to new features.
I just pushed foldingathome/fah-gpu-amd:22.01.0-rc1 to Docker Hub. Can one/both of you verify it's a good image on your systems, and also check that the README wasn't mangled by the slight differences between GitHub and Docker Hub markdown?
https://hub.docker.com/r/foldingathome/fah-gpu-amd
If all is well, I will push the 22.01.0 and latest tags.
Thanks.
Note: the docs should also update the command:
from: fah-gpu:VERSION
to: foldingathome/fah-gpu:VERSION
OLD:
# Run container with GPUs, name it "fah0", map user and /fah volume
docker run --gpus all --name fah0 -d --user "$(id -u):$(id -g)" \
--volume $HOME/fah:/fah fah-gpu:VERSION
NEW:
# Run container with GPUs, name it "fah0", map user and /fah volume
docker run --gpus all --name fah0 -d --user "$(id -u):$(id -g)" \
--volume $HOME/fah:/fah foldingathome/fah-gpu:VERSION
Hi,
I need to set a proxy server. While this is no problem in Docker, and Docker normally passes these settings into the container, the fah client seems to ignore them.
So the container is stuck at:
~/fah# docker logs fah0
13:18:19:Downloading GPUs.txt from assign1.foldingathome.org:80
13:18:19:Connecting to assign1.foldingathome.org:80
How can I set the proxy inside the container?
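If I remember correctly, FAHClient has its own proxy options that can be set in config.xml rather than inherited from the environment; a sketch (the option names should be verified against FAHClient --help, and the host/port are placeholders):

```xml
<config>
  <proxy-enable v='true'/>
  <proxy v='proxy.example.com:3128'/>
  <!-- only if the proxy requires authentication -->
  <proxy-user v='USER'/>
  <proxy-pass v='PASSWORD'/>
</config>
```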