LQNew commented on May 18, 2024

@fastturtle, thanks for your detailed reply! I'm sure the GPU is being used for training and inference: running nvidia-smi in the terminal to check the GPU status shows that the program uses 1460 MiB of GPU memory at around 30% Volatile GPU-Util.

I have also run some extra experiments. When I run other examples in other environments, the highest CPU usage I see is around 800%, compared with over 1900% for DQN on the Atari games, so I suspect that Atari with Acme's default settings really is CPU-heavy. Later, I will try tuning the Reverb parameters as you suggested. As for the num_workers_per_iterator parameter you mentioned, I think the place to change it in the DQN agent is here: replay_client = reverb.TFClient(address) # initialize DQN replay client. However, the TFClient class does not seem to accept num_workers_per_iterator. Am I right, or is there another place where num_workers_per_iterator can be set?

BTW, I'm very excited about snappy compression. Does Acme support snappy now, or will Acme release code that uses it? If not, I look forward to using snappy in Acme in the near future!

PPSantos commented on May 18, 2024

Hi all!
I did not experience such high CPU usage; however, I did observe high memory usage. To mitigate this somewhat, I tweaked the Reverb-related parameters in the acme.datasets.reverb module (around these lines):

https://github.com/deepmind/acme/blob/f17b91c03aed5ecf25df1b5c01970ebe68b2c733/acme/datasets/reverb.py#L105-L121

By doing this I was able to control the number of spawned threads. I needed this because I run multiple copies of the framework in parallel and therefore have to limit the resources used by each instance. Note, however, that these changes will almost certainly affect performance.

Finally, note that I'm just a user, not a contributor.

(and I am not running the framework on Atari games)
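A minimal sketch of the budgeting idea described above, assuming the goal is to split a machine's cores evenly among several parallel copies of the framework. The helper name workers_per_instance and the even-split heuristic are hypothetical, not part of Acme or Reverb; the result is simply the kind of value one might feed into a thread-count setting such as Reverb's num_workers_per_iterator.

```python
import os
from typing import Optional


def workers_per_instance(num_instances: int, total_cores: Optional[int] = None) -> int:
    """Hypothetical heuristic: split available cores evenly across instances.

    Always returns at least 1 so that each instance keeps one reader thread.
    """
    if num_instances < 1:
        raise ValueError("num_instances must be >= 1")
    if total_cores is None:
        # os.cpu_count() can return None on some platforms; fall back to 1.
        total_cores = os.cpu_count() or 1
    return max(1, total_cores // num_instances)


if __name__ == "__main__":
    # 4 parallel copies on a 20-core machine -> 5 worker threads each.
    print(workers_per_instance(4, total_cores=20))
```

As noted above, fewer workers per instance lowers CPU contention at the cost of sampling throughput.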

fastturtle commented on May 18, 2024

Hi @LQNew, I'm a bit surprised that your CPU usage is so high. Reverb's dataset does spawn multiple threads to parallelize data reads, so tuning the parameters @PPSantos mentioned, along with num_workers_per_iterator, may help. Are you certain that the GPU is being used for training and inference?
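To illustrate why a per-iterator worker count bounds the number of threads doing reads, here is a stdlib-only analogy; it is not Reverb's implementation. The function read_sample and its sleep-based workload are hypothetical stand-ins for replay sample requests, and num_workers plays the role that a parameter like num_workers_per_iterator plays in Reverb.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def read_sample(i: int) -> Tuple[int, int]:
    """Hypothetical stand-in for one replay read; returns (item, thread id)."""
    time.sleep(0.01)  # simulate the latency of a read
    return i, threading.get_ident()


def parallel_reads(num_items: int, num_workers: int) -> Tuple[List[int], int]:
    """Read num_items samples using at most num_workers threads.

    Returns the items in their original order and the number of distinct
    threads that actually performed reads.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(read_sample, range(num_items)))
    items = [item for item, _ in results]
    threads_used = len({tid for _, tid in results})
    return items, threads_used


if __name__ == "__main__":
    items, threads_used = parallel_reads(num_items=20, num_workers=2)
    print(threads_used)  # never more than 2, however many items are read
```

Lowering num_workers caps how many reads are in flight at once, trading throughput for fewer concurrent threads.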

As for memory usage: without compression and with the default parameters of run_dqn.py, the replay will take approximately 1e6 * 84 * 84 * 4 bytes = 28.224 GB (the size of one observation multiplied by the maximum replay size). Compression should reduce this significantly, however. In a quick experiment with snappy I found a compression ratio of approximately 0.06, so the data should take up only ~1.7 GB.
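The estimate above can be reproduced with a few lines of arithmetic. The figures (1e6 items, 84 * 84 * 4 bytes per observation, a 0.06 compression ratio) come from the comment; using decimal gigabytes (1 GB = 1e9 bytes) is an assumption that matches the 28.224 GB figure. Like the original estimate, this counts only observation bytes and ignores any per-item overhead.

```python
def replay_size_gb(max_replay_size: int, obs_bytes: int,
                   compression_ratio: float = 1.0) -> float:
    """Approximate replay memory in decimal GB (1 GB = 1e9 bytes)."""
    return max_replay_size * obs_bytes * compression_ratio / 1e9


OBS_BYTES = 84 * 84 * 4        # bytes per observation, as in the estimate above
MAX_REPLAY_SIZE = 1_000_000    # 1e6 items

uncompressed = replay_size_gb(MAX_REPLAY_SIZE, OBS_BYTES)
compressed = replay_size_gb(MAX_REPLAY_SIZE, OBS_BYTES, compression_ratio=0.06)

print(f"uncompressed: {uncompressed:.3f} GB")  # uncompressed: 28.224 GB
print(f"compressed:   {compressed:.2f} GB")    # compressed:   1.69 GB
```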

LQNew commented on May 18, 2024

@PPSantos, thanks for your reply! I also ran some examples in other environments, such as run_dqn.py under examples/bsuite. In the bsuite environment I see around 400% to 800% CPU usage on my PC, far below the 1900%+ CPU usage of DQN on the Atari games.

nikolamomchev commented on May 18, 2024

@LQNew thanks for the question and sorry if it was not sufficiently well answered.

I'll be closing the issue as it's been a while since there was any action on it and I assume it may no longer be relevant. Please submit a separate one if you still have problems and we'll try to answer to the best of our abilities.
