Comments (17)
Which version of torch and which OS platform are you using?
from machin.
My torch version is 1.8.1+cu111 and my OS is Windows 10, but I have already tried several different torch versions.
from machin.
Yeah, torch on Windows does not support rpc_sync, nor any distributed algorithm that uses this function (IMPALA, A3C, etc.).
So far I don't have a Windows platform to test on, so there might be some import errors. Could you please post the full Python error traceback?
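To check this on a given machine before importing machin, something like the following sketch could be used. The helper name rpc_sync_available is made up for illustration; it only assumes that the RPC module is importable as torch.distributed.rpc and that the attribute in question is rpc_sync, as in the error discussed in this thread. It returns False when torch is not installed at all.

```python
import importlib


def rpc_sync_available() -> bool:
    """Return True if torch.distributed.rpc exposes rpc_sync on this install.

    Hypothetical helper for illustration; not part of machin or torch.
    """
    try:
        rpc = importlib.import_module("torch.distributed.rpc")
    except Exception:
        # torch is missing, or the distributed package fails to import here.
        return False
    return hasattr(rpc, "rpc_sync")


if __name__ == "__main__":
    print("rpc_sync available:", rpc_sync_available())
```

If this prints False, the distributed algorithms mentioned above should be expected to fail at import time.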
from machin.
Of course, see below:
Traceback (most recent call last):
  File "c:<file_path>.py", line 1, in <module>
    from machin.frame.algorithms import DQN
  File "C:...\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\__init__.py", line 1, in <module>
    from . import env, frame, model, parallel, utils
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\env\__init__.py", line 1, in <module>
    from . import utils, wrappers
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\env\wrappers\__init__.py", line 1, in <module>
    from . import base, openai_gym
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\env\wrappers\openai_gym.py", line 8, in <module>
    from machin.parallel.exception import ExceptionWithTraceback
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\__init__.py", line 2, in <module>
    from . import (
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\__init__.py", line 1, in <module>
    from .world import (
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\world.py", line 535, in <module>
    class RpcGroup:
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\world.py", line 550, in RpcGroup
    @_copy_doc(rpc.rpc_sync)
AttributeError: module 'torch.distributed.rpc' has no attribute 'rpc_sync'
from machin.
Oh, that error is easy to fix. As a temporary workaround, make the following changes in the file https://github.com/iffiX/machin/blob/master/machin/parallel/__init__.py
(C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\__init__.py on your local system):
- Remove from . import distributed
- Remove "distributed" from __all__
The wrapper you are using does not depend on rpc functions.
Please let me know if any other import errors persist.
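If it is unclear where the installed copy of that file lives, a sketch like the one below can print its path without importing the package (importing it is exactly what fails here). The helper locate_package_file is a made-up name; the only assumption is that the package is installed as regular files under site-packages. importlib.util.find_spec on a top-level package name does not execute the package.

```python
import importlib.util
from pathlib import Path


def locate_package_file(package, *relative_parts):
    """Return the path of a file inside an installed top-level package.

    Hypothetical helper for illustration. Returns None if the package
    cannot be found or has no file origin (e.g. a namespace package).
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at <package>/__init__.py for a regular package.
    return Path(spec.origin).parent.joinpath(*relative_parts)


if __name__ == "__main__":
    # For the fix described above one would pass "machin", "parallel", "__init__.py".
    print(locate_package_file("machin", "parallel", "__init__.py"))
```

The printed path is the file to edit; it prints None if machin is not installed in the active environment.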
from machin.
I did make these changes, but unfortunately I still run into the following:
Traceback (most recent call last):
  File "c:<file_path>.py", line 1, in <module>
    from machin.frame.algorithms import DQN
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\__init__.py", line 1, in <module>
    from . import env, frame, model, parallel, utils
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\env\__init__.py", line 1, in <module>
    from . import utils, wrappers
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\env\wrappers\__init__.py", line 1, in <module>
    from . import base, openai_gym
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\env\wrappers\openai_gym.py", line 8, in <module>
    from machin.parallel.exception import ExceptionWithTraceback
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\__init__.py", line 2, in <module>
    from . import (
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\server\__init__.py", line 1, in <module>
    from . import ordered_server
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\server\ordered_server.py", line 5, in <module>
    from ..distributed import RpcGroup
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\__init__.py", line 1, in <module>
    from .world import (
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\world.py", line 535, in <module>
    class RpcGroup:
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\world.py", line 550, in RpcGroup
    @_copy_doc(rpc.rpc_sync)
AttributeError: module 'torch.distributed.rpc' has no attribute 'rpc_sync'
from machin.
Oh, I forgot "server"; you also need to remove that. Sorry for the inconvenience.
from machin.
No worries. But still:
Traceback (most recent call last):
  File "c:<file-path>.py", line 1, in <module>
    from machin.frame.algorithms import DQN
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\__init__.py", line 1, in <module>
    from . import env, frame, model, parallel, utils
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\frame\__init__.py", line 1, in <module>
    from . import algorithms, buffers, noise, transition
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\frame\algorithms\__init__.py", line 3, in <module>
    from .dqn import DQN
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\frame\algorithms\dqn.py", line 8, in <module>
    from machin.frame.buffers.buffer import Transition, Buffer
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\frame\buffers\__init__.py", line 2, in <module>
    from .buffer_d import DistributedBuffer
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\frame\buffers\buffer_d.py", line 5, in <module>
    from machin.parallel.distributed import RpcGroup
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\__init__.py", line 1, in <module>
    from .world import (
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\world.py", line 535, in <module>
    class RpcGroup:
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\world.py", line 550, in RpcGroup
    @_copy_doc(rpc.rpc_sync)
AttributeError: module 'torch.distributed.rpc' has no attribute 'rpc_sync'
from machin.
OK, for these errors you need to change ImportError to Exception in these two files:
https://github.com/iffiX/machin/blob/master/machin/frame/algorithms/__init__.py
https://github.com/iffiX/machin/blob/master/machin/frame/buffers/__init__.py
This is needed because the AttributeError raised during the import is not caught by except ImportError.
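The difference can be reproduced in isolation with a small sketch. The module name broken_rpc_demo and its contents are invented for the demonstration; it simply fails at import time with an AttributeError, similar to what world.py does in the tracebacks above. The narrow handler lets the error escape, while the broad one falls back.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module body: accessing a missing attribute at import time.
BROKEN_SOURCE = "import types\nrpc = types.SimpleNamespace()\nrpc.rpc_sync\n"


def try_import(module_name, caught):
    """Import module_name, returning 'fallback' if `caught` is raised."""
    try:
        importlib.import_module(module_name)
        return "imported"
    except caught:
        return "fallback"


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "broken_rpc_demo.py").write_text(BROKEN_SOURCE)
    sys.path.insert(0, tmp)
    try:
        try:
            narrow_result = try_import("broken_rpc_demo", ImportError)
        except AttributeError as exc:
            # except ImportError does not catch this, so it propagates.
            narrow_result = f"uncaught: {type(exc).__name__}"
        broad_result = try_import("broken_rpc_demo", Exception)
    finally:
        sys.path.remove(tmp)

print(narrow_result)  # uncaught: AttributeError
print(broad_result)   # fallback
```

Catching Exception is a blunt instrument, since it also hides unrelated bugs in the imported module, so it is best seen as a temporary workaround.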
from machin.
Okay thanks, I will have a look into it and come back to you tomorrow.
from machin.
No problem, I will correct these problems in my code now and try to find a Windows testing environment.
from machin.
Hello again, see below:
Traceback (most recent call last):
  File "c:..\Desktop\Forschung\RL\Implementations\PyTorch Templates\machin\CartPole-DQN.py", line 1, in <module>
    from machin.frame.algorithms import DQN
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\__init__.py", line 1, in <module>
    from . import env, frame, model, parallel, utils
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\frame\__init__.py", line 1, in <module>
    from . import algorithms, buffers, helpers, noise, transition
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\frame\algorithms\__init__.py", line 14, in <module>
    from .a3c import A3C
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\frame\algorithms\a3c.py", line 2, in <module>
    from machin.parallel.server import PushPullGradServer
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\server\__init__.py", line 1, in <module>
    from . import ordered_server
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\server\ordered_server.py", line 5, in <module>
    from ..distributed import RpcGroup, debug_with_process
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\__init__.py", line 1, in <module>
    from .world import (
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\world.py", line 585, in <module>
    class RpcGroup:
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\world.py", line 600, in RpcGroup
    @_copy_doc(rpc.rpc_sync)
AttributeError: module 'torch.distributed.rpc' has no attribute 'rpc_sync'
from machin.
OK, now move from .a3c import A3C into that try/except block, like this:
try:
    from .a3c import A3C
    from .apex import DQNApex, DDPGApex
    from .impala import IMPALA
    from .ars import ARS
except Exception as _:
    # Any failure here (including the AttributeError above) disables these algorithms.
    warnings.warn(
        "Failed to import algorithms relying on torch.distributed." " Set them to None."
    )
    A3C = None
    DQNApex = None
    DDPGApex = None
    IMPALA = None
    ARS = None
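One consequence of this pattern is that the distributed algorithms become None on platforms where the import fails, so calling one of them would raise a rather opaque TypeError. A small guard can give a clearer message; the sketch below uses a made-up helper name, require_algorithm, and local stand-ins for the real classes, so it is an illustration rather than anything machin provides.

```python
def require_algorithm(algorithm, name):
    """Return `algorithm`, or raise a clear error if it is a None placeholder.

    Hypothetical helper for illustration; not part of machin.
    """
    if algorithm is None:
        raise RuntimeError(
            f"{name} is unavailable: it relies on torch.distributed, "
            "which failed to import on this platform."
        )
    return algorithm


# Stand-ins: A3C mimics a failed import, DQN mimics a successful one.
A3C = None


class DQN:
    pass


print(require_algorithm(DQN, "DQN").__name__)
try:
    require_algorithm(A3C, "A3C")
except RuntimeError as exc:
    print(exc)
```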
from machin.
Great job, this example works fine now!
I will close this issue and open a new one if any further problems should occur.
Thanks again.
from machin.
OK, in the meantime I will add a quick fix for this once I get CircleCI working. :)
from machin.
After searching for a while, I could not find a platform that offers reasonable time for my automated testing, and since a hybrid Jenkins/Windows VM setup is too difficult to maintain, I will not consider Windows CI in the near future.
As a substitute, I will do one-time manual testing of future versions on request.
from machin.
Can you help me with the error below?
@rpc.functions.async_execution
AttributeError: module 'torch.distributed.rpc' has no attribute 'functions'
from machin.