Comments (16)
I used https://vast.ai/
Their servers aren't quite as reliable or secure as regular cloud providers but they are much cheaper.
Also, I'm working on a version of the training script that will use fewer resources! Stay tuned.
from pokemonredexperiments.
Joking Tone
100GB of RAM? Those be some Rookie numbers. :P
from pokemonredexperiments.
For everyone in this thread: the new script run_baseline_parallel_fast.py trains much faster and only uses 15-20 GB of memory!
from pokemonredexperiments.
Okay, I got it training with 8GB of RAM. Will see if it learns anything
from pokemonredexperiments.
Piggy-backing on this, are there any hardware requirements beyond a large chunk of RAM? I ask because this seems like a lot of fun to run through, and I run a rather beefy server architecture at home (I work in IT, and I'd rather run my own cloud than use someone else's, for labs and messing around with stuff).
My ESXi server has 28 physical cores (56 threads) and 192GB of RAM, but no discrete GPU. Is that workable, or do I need a dedicated GPU to run the training?
from pokemonredexperiments.
I could be completely wrong, but I don't think a GPU is currently required to run this training.
Those specs are better than what I'm running on, so you should be able to run it with no problem.
from pokemonredexperiments.
The more CPUs you have, the more training instances you can run, by adjusting num_cpu in run_baseline_parallel.py. I managed twice as many instances as I have threads, but your mileage may vary. Raising or lowering num_cpu makes RAM usage go up or down accordingly. I'm running fine with num_cpu at 24 on a 12-thread CPU with 32GB of RAM.
from pokemonredexperiments.
As the code stands, CPU and RAM are what it needs, and it doesn't touch the GPU. So you could set your num_cpu to 28 and be golden. Maxing out my cores at 24, I only use about 90GB of RAM when things are going. When I set num_cpu to 30 it maxes out my CPUs, but RAM only went to about 105GB. So you should be golden.
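For a ballpark at other num_cpu values, here is a rough sketch that draws a straight line through the two data points reported above (24 environments at about 90GB, 30 at about 105GB). The linear model and the helper name `estimate_ram_gb` are assumptions for illustration only; real usage depends on the script version and its settings, and other people in this thread report very different numbers.

```python
def estimate_ram_gb(num_cpu, p1=(24, 90.0), p2=(30, 105.0)):
    """Estimate RAM (GB) for num_cpu environments.

    Hypothetical helper: linearly interpolates/extrapolates from two
    observed (num_cpu, RAM in GB) points. Not a measurement.
    """
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)  # GB added per extra environment
    return y1 + slope * (num_cpu - x1)


if __name__ == "__main__":
    # 28 environments, matching the 28 physical cores asked about
    print(f"~{estimate_ram_gb(28):.0f} GB")
```

By this estimate 28 environments would land around 100GB, which fits comfortably in 192GB.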
from pokemonredexperiments.
Thank you!
from pokemonredexperiments.
Thanks for sharing! I'm wondering if you used Docker to train your model in the cloud or did you take another route?
from pokemonredexperiments.
I can almost guarantee that I can bring the RAM usage way down... porting it now, will need a bit of time. CPU cores will still be non-negotiable because the env is slow: roughly 50 steps/second/core.
from pokemonredexperiments.
Found it!
n_steps is the number of frames per environment that you are keeping in memory. So that's 2048*8 for each of 44 environments: 720,896 frames in total. Napkin math says 44 GB of observations without any optimizations. That batch size is not unheard of in RL, particularly for long games, but it can probably be made lower.
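The napkin math can be written out as a small sketch. The frame count comes straight from the numbers above; the per-observation size is not stated in this thread, so the code only derives what the 44 GB figure would imply. The function names are hypothetical.

```python
def rollout_frames(n_steps, n_envs):
    """Total frames held in memory: n_steps per environment, times n_envs."""
    return n_steps * n_envs


def implied_bytes_per_frame(total_bytes, total_frames):
    """Back out the average observation size from a total memory figure."""
    return total_bytes / total_frames


total = rollout_frames(n_steps=2048 * 8, n_envs=44)
print(total)  # 720896

# Assumption: "44 GB" is read as 44e9 bytes
per_frame = implied_bytes_per_frame(44e9, total)
print(f"~{per_frame / 1000:.0f} KB per stored observation")
```

Because memory scales with the product of n_steps and the number of environments, lowering either one shrinks the buffer proportionally.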
from pokemonredexperiments.
How long does a training session usually take? Or do you just stop it and try run_pretrained afterwards? (Sorry, I'm new to this, but I want to join in the fun!)
from pokemonredexperiments.
This is a bit of a trick question.
(Please note I'm training my AI differently than most.)
Running 24 cores (instances of the game) with ep_length at 8192 * 10, it takes just over an hour to finish a session. From there you can let the AI keep running another session, or press Ctrl+C to stop it. After that, to watch a run you have to edit the script so it uses your session folder and your step file.
Normally, starting a new training session with 2048 * 10, you should see your AI start to get the first gym badge. From there you let it keep training, and it gets better at it.
The current hangup is Mount Moon, and everyone is trying to find the missing key for it.
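To put the session numbers above in perspective, here is a small sketch of the arithmetic. It assumes one session means every environment runs ep_length steps, and it treats "just over an hour" as 3600 seconds, so the rate it prints is only a rough upper bound. The function names are hypothetical.

```python
def steps_per_session(ep_length, num_envs):
    """Total environment steps in one session (all environments combined)."""
    return ep_length * num_envs


def steps_per_second_per_env(ep_length, seconds):
    """Average stepping rate of a single environment over the session."""
    return ep_length / seconds


ep_length = 8192 * 10  # setting quoted above
num_envs = 24          # one game instance per core

print(steps_per_session(ep_length, num_envs))  # 1966080

# Assumption: session length taken as exactly one hour
rate = steps_per_second_per_env(ep_length, seconds=3600)
print(f"at most ~{rate:.0f} steps/second per environment")
```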
from pokemonredexperiments.
In the parallel_fast.py file, what are the tricks that speed up the training?
I noticed that batch_size shrank by 4x and num_cpu shrank by 3x. Are there any other tricks in use?
from pokemonredexperiments.
That's actually the mini-batch size. The full batch size is computed from the other settings, including ep_length.
With the smaller mini-batch size, it can get through the data faster than with the larger 512.
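To make the distinction concrete: in PPO implementations such as Stable-Baselines3, the data collected per update is n_steps times the number of environments, and batch_size only controls how that data is sliced into mini-batches. The sketch below shows the relationship; the n_steps and n_envs values are hypothetical placeholders, not the settings of either script.

```python
import math


def rollout_and_minibatches(n_steps, n_envs, batch_size):
    """Return (samples collected per update, mini-batches per pass over them)."""
    rollout = n_steps * n_envs
    return rollout, math.ceil(rollout / batch_size)


# Hypothetical values chosen only to illustrate the relationship
n_steps, n_envs = 2048, 16

for batch_size in (512, 128):  # 128 is the 4x smaller size mentioned above
    rollout, n_batches = rollout_and_minibatches(n_steps, n_envs, batch_size)
    print(f"batch_size={batch_size}: {rollout} samples, {n_batches} mini-batches")
```

Shrinking batch_size leaves the amount of collected data unchanged; it only means more, smaller gradient steps per pass.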
from pokemonredexperiments.