First, thx for the open-source code! But when I run the example <code class="notransla


Very high CPU usage about acme HOT 5 CLOSED

google-deepmind commented on May 18, 2024

Very high CPU usage

from acme.

Comments (5)

LQNew commented on May 18, 2024

@fastturtle, thanks for your detailed and helpful reply! I'm sure the GPU is being used for training and inference: running nvidia-smi in the terminal shows the program using 1460 MiB of GPU memory at around 30% Volatile GPU-Util.

I have also run some extra experiments. In other environments, running other examples, the highest CPU usage I see is around 800%, which is much lower than the 1900%+ I see with DQN on the Atari games. I suspect that, with Acme's default settings, CPU usage on Atari games really is high. I will try tuning the Reverb parameters as you suggest and report the results. As for the parameter num_workers_per_iterator that you mentioned, I think the only place I can change it in the DQN agent is here: replay_client = reverb.TFClient(address) # initialize DQN replay client. However, the class TFClient does not seem to expose num_workers_per_iterator. Am I right, or is there another place to change this parameter?

BTW, I'm very excited about using snappy. Does Acme support snappy now, or will Acme release code for running with snappy? If it is not supported yet, I look forward to using it in the near future!


PPSantos commented on May 18, 2024

Hi all!
I did not experience such high CPU usage; however, I did observe high memory usage. To mitigate this somewhat, I tweaked the Reverb-related parameters in the acme.datasets.reverb module (around these lines):

https://github.com/deepmind/acme/blob/f17b91c03aed5ecf25df1b5c01970ebe68b2c733/acme/datasets/reverb.py#L105-L121

By doing this I was able to control the number of spawned threads. I had to consider this because I am running multiple copies of the framework in parallel and therefore need to limit the resources used by each instance. Note, however, that these changes will almost certainly affect performance.

Finally, note that I'm just a user, not a contributor.

(and I am not running the framework on Atari games)


fastturtle commented on May 18, 2024

Hi @LQNew, I'm a bit surprised to see your CPU usage is so high. Reverb's dataset does spawn multiple threads to parallelize data reads, so perhaps tuning the parameters @PPSantos mentioned, along with num_workers_per_iterator, could help. Are you certain that the GPU is being used for training and inference?

As for memory usage: without compression, and using the default parameters for run_dqn.py, the replay will take approximately 1e6 * 84 * 84 * 4 bytes = 28.224 GB (the size of one observation multiplied by the max replay size). However, compression should reduce this significantly. I ran a quick experiment with snappy and found a compression ratio of approximately 0.06, so the data should take up only ~1.7 GB.
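The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic quoted in the comment: the constant and function names are made up for illustration, and the 0.06 ratio is the figure from the quick snappy experiment mentioned above, not a guaranteed value.

```python
# Figures taken from the comment above; the names are hypothetical.
MAX_REPLAY_SIZE = 1_000_000       # max number of items kept in replay
OBSERVATION_BYTES = 84 * 84 * 4   # size of one observation, as quoted
COMPRESSION_RATIO = 0.06          # measured with snappy in a quick experiment


def replay_size_gb(num_items, bytes_per_item, compression_ratio=1.0):
    """Return the approximate replay size in GB (1 GB = 1e9 bytes)."""
    return num_items * bytes_per_item * compression_ratio / 1e9


raw_gb = replay_size_gb(MAX_REPLAY_SIZE, OBSERVATION_BYTES)
compressed_gb = replay_size_gb(
    MAX_REPLAY_SIZE, OBSERVATION_BYTES, COMPRESSION_RATIO
)

print(f"raw: {raw_gb:.3f} GB")                # raw: 28.224 GB
print(f"compressed: {compressed_gb:.2f} GB")  # compressed: 1.69 GB
```

Changing MAX_REPLAY_SIZE in this sketch shows how directly the footprint scales with the replay size, which may be the easiest knob to turn if memory is the limiting resource.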


LQNew commented on May 18, 2024

@PPSantos, thanks for your reply! I also ran some examples on other environments. For instance, running run_dqn.py from the examples/bsuite directory in the bsuite environment uses around 400% to 800% CPU on my PC, which is much lower than the 1900%+ CPU usage I see with DQN on the Atari games.
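Assuming the percentages above come from a tool such as top, where 100% corresponds to one fully busy core, 1900% means roughly 19 cores' worth of work. A minimal, standard-library-only sketch for measuring the same ratio from inside a Python process is shown below. The helper names are hypothetical and not part of Acme or Reverb. time.process_time() counts CPU time across all threads of the current process but excludes child processes.

```python
import time


def cpu_utilization(fn, *args, **kwargs):
    """Run fn and return (result, cpu_seconds / wall_seconds).

    A ratio of 1.0 corresponds to 100% in top, i.e. one busy core.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    ratio = cpu_used / wall_used if wall_used > 0 else 0.0
    return result, ratio


def busy_loop(n):
    """A stand-in workload; replace with the code being profiled."""
    total = 0
    for i in range(n):
        total += i * i
    return total


result, utilization = cpu_utilization(busy_loop, 200_000)
print(f"utilization: {utilization * 100:.0f}% of one core")
```

Wrapping a fixed number of learner steps in such a helper, before and after changing the dataset parameters, gives a like-for-like comparison that does not depend on watching top by eye.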


nikolamomchev commented on May 18, 2024

@LQNew thanks for the question and sorry if it was not sufficiently well answered.

I'll be closing the issue, as it's been a while since there was any activity on it and I assume it may no longer be relevant. Please submit a separate one if you still have problems and we'll try to answer to the best of our abilities.

