altmand / guandan_mcc
mcc_second_guandan
License: Apache License 2.0
I can now run the game with the wintest file. However, when I try to train a model by running learner.py in the learner directory, both Receiving FPS and Consuming FPS stay at 0.00. I start actor.py first, then learner.py. Is something wrong with these execution steps?
Environment:
python=3.8
tensorflow=1.15.5+cu113
numpy=1.18.5
ws4py=0.5.1
pyarrow=5.0.0
pyzmq=22.3.0
python actor_n/actor.py
python learner/learner.py
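A quick way to narrow down a 0.00 Receiving FPS is to check, from the machine or container where actor.py runs, whether the learner's data and parameter ports accept TCP connections at all. Below is a minimal standard-library sketch; port_is_open is a hypothetical helper, and 127.0.0.1, 5000 and 5001 are simply the defaults of --ip, --data_port and --param_port quoted in this thread. It only proves TCP reachability, not that messages actually flow. (If the actor/learner link is ZeroMQ, as the pyzmq dependency suggests, starting the actor before the learner is usually fine, because a ZeroMQ connect keeps retrying until the peer binds.)

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs a full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host and timeouts all end up here.
        return False


if __name__ == "__main__":
    # Defaults of --ip, --data_port and --param_port; adjust to the learner's real address.
    for port in (5000, 5001):
        state = "reachable" if port_is_open("127.0.0.1", port) else "NOT reachable"
        print(f"learner port {port}: {state}")
```

If either port reports NOT reachable from the actor side, the learner will never see data, regardless of its training settings.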
from argparse import ArgumentParser

# Actor-side arguments (the actor sends data to, and subscribes parameters from, the learner)
parser = ArgumentParser()
parser.add_argument('--ip', type=str, default='127.0.0.1',
                    help='IP address of learner server')
parser.add_argument('--data_port', type=int, default=5000,
                    help='Learner server port to send training data')
parser.add_argument('--param_port', type=int, default=5001,
                    help='Learner server port to subscribe model parameters')
parser.add_argument('--exp_path', type=str, default='/mnt/workspace/guandan_mcc/Clients',
                    help='Directory to save logging data, model parameters and config file')
parser.add_argument('--num_saved_ckpt', type=int, default=4,
                    help='Number of recent checkpoint files to be saved')
# note: type=int is applied only to values given on the command line; the tuple defaults are used as-is
parser.add_argument('--observation_space', type=int, default=(567,),
                    help='Shape of the observation space')
parser.add_argument('--action_space', type=int, default=(5, 216),
                    help='Shape of the action space')
parser.add_argument('--epsilon', type=float, default=0.01,
                    help='Epsilon')
from argparse import ArgumentParser

# Learner-side arguments
parser = ArgumentParser()
parser.add_argument('--alg', type=str, default='MC', help='The RL algorithm')
parser.add_argument('--env', type=str, default='GuanDan', help='The game environment')
parser.add_argument('--data_port', type=int, default=5000, help='Learner server port to receive training data')
parser.add_argument('--param_port', type=int, default=5001, help='Learner server port to publish model parameters')
parser.add_argument('--model', type=str, default='guandan_model', help='Training model')
parser.add_argument('--pool_size', type=int, default=65536, help='The max length of data pool')
parser.add_argument('--batch_size', type=int, default=32768, help='The batch size for training')
# note: the help text allows fractional values, but type=int rejects them on the command line
parser.add_argument('--training_freq', type=int, default=250,
                    help='How many receptions of new data are between each training, '
                         'which can be fractional to represent more than one training per reception')
# note: with type=bool, any non-empty string (including 'False') parses as True
parser.add_argument('--keep_training', type=bool, default=False,
                    help='No matter whether new data is received recently, keep training as long as the data is enough '
                         'and ignore --training_freq')
parser.add_argument('--config', type=str, default=None, help='Directory to config file')
parser.add_argument('--exp_path', type=str, default=None, help='Directory to save logging data and config file')
parser.add_argument('--record_throughput_interval', type=int, default=60,
                    help='The time interval between each throughput record')
parser.add_argument('--num_envs', type=int, default=1, help='The number of environment copies')
parser.add_argument('--ckpt_save_freq', type=int, default=3000, help='The number of updates between each weights saving')
parser.add_argument('--ckpt_save_type', type=str, default='weight',
                    help='Type of checkpoint file to record: weight (smaller) or checkpoint (bigger)')
parser.add_argument('--observation_space', type=int, default=(567,),
                    help='Shape of the observation space')
parser.add_argument('--action_space', type=int, default=(5, 216),
                    help='Shape of the action space')
parser.add_argument('--epsilon', type=float, default=0.01,
                    help='Epsilon')
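The help texts of --pool_size, --batch_size, --training_freq and --keep_training imply a rule for when the learner consumes data. The sketch below models that rule as read from the help strings alone; TrainingGate and its methods are hypothetical names, not the repository's implementation. Under this reading, with the defaults (batch_size=32768, training_freq=250) nothing is consumed until the pool holds at least 32768 samples and 250 receptions have accumulated, so a Consuming FPS of 0 early in a run is expected. A Receiving FPS of 0, however, cannot be caused by these settings.

```python
from collections import deque


class TrainingGate:
    """Hypothetical model of when a learner may run a training step.

    Follows the help texts of --pool_size, --batch_size, --training_freq and
    --keep_training; it is NOT the repository's actual code.
    """

    def __init__(self, pool_size, batch_size, training_freq, keep_training=False):
        self.pool = deque(maxlen=pool_size)  # oldest samples are dropped once full
        self.batch_size = batch_size
        self.training_freq = training_freq  # may be fractional, e.g. 0.5 = 2 trainings per reception
        self.keep_training = keep_training
        self.receptions = 0.0  # receptions not yet "used up" by training steps

    def receive(self, samples):
        """Record one reception of new data from an actor."""
        self.pool.extend(samples)
        self.receptions += 1

    def should_train(self):
        if len(self.pool) < self.batch_size:
            return False  # not enough data yet, so nothing is consumed
        if self.keep_training:
            return True  # ignore training_freq entirely
        return self.receptions >= self.training_freq

    def mark_trained(self):
        """Call after each training step to pay for it in receptions."""
        if not self.keep_training:
            self.receptions -= self.training_freq
```

Counting receptions as a float makes fractional training_freq values work naturally: with training_freq=0.5, each reception allows two training steps.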
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.1:5000 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:5001 0.0.0.0:* LISTEN
tcp 0 0 10.224.128.51:10250 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:59083 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:111 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:6000 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:6001 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:46609 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:34449 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:6002 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:8082 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:6003 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:35253 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:8086 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:46423 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:8088 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:8888 0.0.0.0:* LISTEN
tcp 0 0 10.224.128.51:8889 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:43713 0.0.0.0:* LISTEN
tcp6 0 0 ::1:111 :::* LISTEN
tcp6 0 0 :::3011 :::* LISTEN
Local ports 5000, 5001, 6000 and 6001 are all listening.
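The netstat output shows ports 5000 and 5001 bound to 127.0.0.1 rather than 0.0.0.0. A socket bound to 127.0.0.1 only accepts connections from the same host (in Docker, from the same container's network namespace unless host networking is used). That is fine when actor and learner share a machine, but it would block actors running elsewhere. The hypothetical helper below scans `netstat -tln`-style output and reports which of the given ports listen on loopback only; it is a diagnostic sketch, not part of the project.

```python
def loopback_only_listeners(netstat_output: str, ports=(5000, 5001)):
    """Return the ports from `ports` whose LISTEN sockets are bound to loopback addresses only.

    Expects lines shaped like: tcp 0 0 127.0.0.1:5000 0.0.0.0:* LISTEN
    """
    bound = {}  # port -> set of local addresses it listens on
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a listening TCP socket.
        if len(fields) < 6 or fields[-1] != "LISTEN" or not fields[0].startswith("tcp"):
            continue
        # rpartition keeps IPv6 forms intact, e.g. "::1:111" -> ("::1", "111").
        addr, _, port = fields[3].rpartition(":")
        if not port.isdigit():
            continue
        bound.setdefault(int(port), set()).add(addr)
    loopback = {"127.0.0.1", "::1"}
    return sorted(p for p in ports if p in bound and bound[p] <= loopback)
```

Applied to the output above, it would flag 5000 and 5001, while a port such as 8082 (bound to 0.0.0.0) would not be flagged.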
After starting learner.py, it prints:
Logging to LEARNER-2023-11-09-17-05-31/log
Data socket has been bound to port 5000
Receiving FPS: 0.00, Consuming FPS: 0.00
Receiving FPS: 0.00, Consuming FPS: 0.00
...
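All dependencies are installed at the versions specified by the project, and I have not changed the source structure. The arguments in learner.py are all at their defaults (after changing --pool_size and --batch_size, the printed throughput is still 0).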
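One plausible reading of the printout, assuming each FPS value is the number of frames counted since the previous report divided by the elapsed seconds (with --record_throughput_interval, default 60, as the reporting period): "Receiving FPS: 0.00" means the receive counter never moved, i.e. no data reached the learner's data socket at all. pool_size and batch_size only influence consumption, which is why changing them has no effect here. ThroughputMeter is a hypothetical sketch of that bookkeeping, not the project's code.

```python
import time


class ThroughputMeter:
    """Hypothetical sketch of a 'Receiving FPS / Consuming FPS' report.

    Assumes FPS = frames counted since the last report / seconds elapsed;
    the repository's exact bookkeeping may differ.
    """

    def __init__(self, clock=time.monotonic):
        self.clock = clock  # injectable so the sketch can be tested deterministically
        self.received = 0
        self.consumed = 0
        self.last = clock()

    def on_receive(self, n_frames):
        self.received += n_frames  # only moves when an actor message actually arrives

    def on_consume(self, n_frames):
        self.consumed += n_frames  # only moves when a training step uses data

    def report(self):
        now = self.clock()
        elapsed = max(now - self.last, 1e-9)  # guard against a zero-length interval
        line = (f"Receiving FPS: {self.received / elapsed:.2f}, "
                f"Consuming FPS: {self.consumed / elapsed:.2f}")
        self.received = self.consumed = 0  # start a fresh interval
        self.last = now
        return line
```

Under this model, the place to look for a persistent 0.00 Receiving FPS is upstream: the actors' connection to the learner, not the learner's training settings.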
1. How do I train the model? Is python learner/learner.py the command that trains it?
2. What does actor_n/start.sh do? How do I change the IP or port?
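The actor-side parser quoted earlier in this thread already exposes --ip, --data_port and --param_port, so instead of editing the defaults, the learner's address can be passed on the command line. The contents of start.sh are not shown here, so whether it forwards such flags to actor.py is an assumption. The sketch re-declares just those three flags (same names and defaults) and builds an argv list; actor_command is a hypothetical helper and 10.0.0.2 is a placeholder address.

```python
from argparse import ArgumentParser


def build_actor_parser():
    # Subset of the actor-side flags quoted in this thread (same names and defaults).
    parser = ArgumentParser()
    parser.add_argument('--ip', type=str, default='127.0.0.1',
                        help='IP address of learner server')
    parser.add_argument('--data_port', type=int, default=5000,
                        help='Learner server port to send training data')
    parser.add_argument('--param_port', type=int, default=5001,
                        help='Learner server port to subscribe model parameters')
    return parser


def actor_command(ip, data_port=5000, param_port=5001, script='actor_n/actor.py'):
    """Build an argv list that points the actor at a given learner (not executed here)."""
    return ['python', script, '--ip', ip,
            '--data_port', str(data_port), '--param_port', str(param_port)]
```

For example, actor_command('10.0.0.2', 6000, 6001) yields an argv list that could be handed to subprocess.run or written into a launch script. The learner's own --data_port and --param_port must match whatever the actors use.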
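I am training the DMC model, running from the actor_n and learner_n project directories.
When I run the start.sh file in the learner_n directory inside the learner's Docker container: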
sshpass ssh [email protected] "bash /yzm/Danzero_plus/actor_n/start.sh"
nohup /usr/bin/python -u /yzm/Danzero_plus/learner_n/learner.py > /yzm/Danzero_plus/learner_n/learner_out.log 2>&1 &
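This successfully starts game.py and actor.py in the actor container, but why does my learner.py keep reporting an FPS of 0?
Running game.py prints messages normally.
This is the log from running actor.py: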
2024-01-17 09:44:02.781942: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
start0
2024-01-17 09:44:03.506227: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2899995000 Hz
2024-01-17 09:44:03.506503: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x178e020 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2024-01-17 09:44:03.506527: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2024-01-17 09:44:03.507608: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2024-01-17 09:44:03.507621: E tensorflow/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: UNKNOWN ERROR (303)
2024-01-17 09:44:03.507638: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
Logging to /yzm/Client0/log
start1
2024-01-17 09:44:04.010809: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2899995000 Hz
2024-01-17 09:44:04.011124: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x178e020 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2024-01-17 09:44:04.011149: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2024-01-17 09:44:04.012251: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2024-01-17 09:44:04.012264: E tensorflow/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: UNKNOWN ERROR (303)
2024-01-17 09:44:04.012279: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
Logging to /yzm/Client1/log
/usr/lib/python3.8/multiprocessing/process.py:108: FutureWarning: 'pyarrow.deserialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
  self._target(*self._args, **self._kwargs)
/usr/lib/python3.8/multiprocessing/process.py:108: FutureWarning: 'pyarrow.serialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
  self._target(*self._args, **self._kwargs)
start2
2024-01-17 09:44:04.514308: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2899995000 Hz
2024-01-17 09:44:04.514629: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x178e020 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2024-01-17 09:44:04.514654: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2024-01-17 09:44:04.515743: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2024-01-17 09:44:04.515756: E tensorflow/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: UNKNOWN ERROR (303)
2024-01-17 09:44:04.515771: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
Logging to /yzm/Client2/log
/usr/lib/python3.8/multiprocessing/process.py:108: FutureWarning: 'pyarrow.deserialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
  self._target(*self._args, **self._kwargs)
/usr/lib/python3.8/multiprocessing/process.py:108: FutureWarning: 'pyarrow.serialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
  self._target(*self._args, **self._kwargs)
start3
2024-01-17 09:44:05.017970: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2899995000 Hz
2024-01-17 09:44:05.018274: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x178e020 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2024-01-17 09:44:05.018298: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2024-01-17 09:44:05.019441: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2024-01-17 09:44:05.019456: E tensorflow/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: UNKNOWN ERROR (303)
2024-01-17 09:44:05.019471: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
Logging to /yzm/Client3/log
/usr/lib/python3.8/multiprocessing/process.py:108: FutureWarning: 'pyarrow.deserialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
  self._target(*self._args, **self._kwargs)
/usr/lib/python3.8/multiprocessing/process.py:108: FutureWarning: 'pyarrow.serialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
  self._target(*self._args, **self._kwargs)
/usr/lib/python3.8/multiprocessing/process.py:108: FutureWarning: 'pyarrow.deserialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
  self._target(*self._args, **self._kwargs)
/usr/lib/python3.8/multiprocessing/process.py:108: FutureWarning: 'pyarrow.serialize' is deprecated as of 2.0.0 and will be removed in a future version. Use pickle or the pyarrow IPC functionality instead.
  self._target(*self._args, **self._kwargs)
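The FutureWarnings in this log are warnings, not errors: with pyarrow 5.0.0, pyarrow.serialize and pyarrow.deserialize still run, so these messages by themselves do not stop data from being sent. If you want to follow the warning's advice anyway, a pickle-based pair is a common replacement. This is a sketch assuming the code uses the usual `pyarrow.serialize(obj).to_buffer()` / `pyarrow.deserialize(buf)` pattern; the function names below are hypothetical, and both actor and learner must switch together because the two wire formats are incompatible.

```python
import pickle


def serialize(obj) -> bytes:
    """Replacement for pyarrow.serialize(obj).to_buffer(): returns raw bytes."""
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)


def deserialize(buf: bytes):
    """Counterpart of pyarrow.deserialize(buf). Only unpickle data from trusted peers."""
    return pickle.loads(buf)
```

With pyzmq, the bytes can be passed straight to socket.send and read back with socket.recv; pyzmq's send_pyobj and recv_pyobj do the same pickling internally.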
Do I need to train this dan.ckpt myself?
I am interested in this project and would like to watch four clients play Guandan. How do I start the game correctly? I am reading the files in showdown, but which one should be executed first?
It usually appears after about one day of training. The actor side checks for NaN values and filters them out, but it still occurs.
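Since the actor-side check is not shown, here is a hedged sketch of a stricter filter plus a learner-side guard. It uses math.isfinite, which rejects +inf and -inf as well as NaN (a pure isnan check lets infinities through, and they can turn into NaN later in the loss). The function names are hypothetical and the sample layout (nested dicts/lists of numbers) is an assumption; for NumPy arrays, `numpy.isfinite(arr).all()` plays the same role. The learner guard matters because NaN can also arise inside the learner itself, where an actor-side filter cannot catch it.

```python
import math
from typing import Iterable, List


def _all_finite(value) -> bool:
    """Recursively check that every number inside nested dicts/lists/tuples is finite."""
    if isinstance(value, dict):
        return all(_all_finite(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return all(_all_finite(v) for v in value)
    if isinstance(value, (int, float)):
        return math.isfinite(value)
    return True  # non-numeric fields (e.g. strings) are not checked


def filter_samples(samples: Iterable) -> List:
    """Actor side: keep only samples whose numbers are all finite (no NaN, +inf or -inf)."""
    return [s for s in samples if _all_finite(s)]


def safe_to_apply(loss: float) -> bool:
    """Learner side: skip a gradient update whose loss is not finite."""
    return math.isfinite(loss)
```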