Comments (17)

iffiX avatar iffiX commented on June 9, 2024

Which version of torch and which OS are you using?

from machin.

MarWaltz avatar MarWaltz commented on June 9, 2024

My torch version is 1.8.1+cu111 and my OS is Windows 10, but I have already tried several different torch versions.

iffiX avatar iffiX commented on June 9, 2024

Yeah, torch on Windows does not support rpc_sync, so any distributed model that uses this function (IMPALA, A3C, etc.) is affected.
I don't have a Windows machine to test on so far, so there might be some import errors. Could you please post the full Python error stack?
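
A quick way to confirm whether an installed torch build exposes rpc_sync is to probe for the attribute before importing machin. The sketch below is only an illustration: the helper name has_attr_path is hypothetical and not part of torch or machin, and it reports False both when the module cannot be imported and when the attribute is missing.

```python
import importlib


def has_attr_path(module_name, attr_path):
    """Return True if the module imports and the dotted attribute path resolves.

    Hypothetical helper for illustration; not part of torch or machin.
    """
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing or cannot be imported.
        return False
    for name in attr_path.split("."):
        if not hasattr(obj, name):
            return False
        obj = getattr(obj, name)
    return True


# Prints False on builds where torch.distributed.rpc lacks rpc_sync
# (and also when torch is not installed at all).
print(has_attr_path("torch.distributed.rpc", "rpc_sync"))
```

If this prints False, the distributed algorithms cannot be imported on that build, while the non-distributed ones do not need rpc_sync.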

MarWaltz avatar MarWaltz commented on June 9, 2024

Of course, see below:

Traceback (most recent call last):
  File "c:<file_path>.py", line 1, in <module>
    from machin.frame.algorithms import DQN
  File "C:...\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\__init__.py", line 1, in <module>
    from . import env, frame, model, parallel, utils
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\env\__init__.py", line 1, in <module>
    from . import utils, wrappers
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\env\wrappers\__init__.py", line 1, in <module>
    from . import base, openai_gym
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\env\wrappers\openai_gym.py", line 8, in <module>
    from machin.parallel.exception import ExceptionWithTraceback
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\__init__.py", line 2, in <module>
    from . import (
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\__init__.py", line 1, in <module>
    from .world import (
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\world.py", line 535, in <module>
    class RpcGroup:
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\world.py", line 550, in RpcGroup
    @_copy_doc(rpc.rpc_sync)
AttributeError: module 'torch.distributed.rpc' has no attribute 'rpc_sync'

iffiX avatar iffiX commented on June 9, 2024

Oh, that error is easy to fix. As a temporary fix, make the following changes
in the file https://github.com/iffiX/machin/blob/master/machin/parallel/__init__.py
(C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\__init__.py on your local system):

  1. Remove the import "from . import distributed".
  2. Remove "distributed" from __all__.

The wrapper you are using does not depend on rpc functions.

Please let me know if any other import errors persist.

MarWaltz avatar MarWaltz commented on June 9, 2024

I did make these changes, but unfortunately I still run into the following:

Traceback (most recent call last):
  File "c:<file_path>.py", line 1, in <module>
    from machin.frame.algorithms import DQN
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\__init__.py", line 1, in <module>
    from . import env, frame, model, parallel, utils
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\env\__init__.py", line 1, in <module>
    from . import utils, wrappers
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\env\wrappers\__init__.py", line 1, in <module>
    from . import base, openai_gym
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\env\wrappers\openai_gym.py", line 8, in <module>
    from machin.parallel.exception import ExceptionWithTraceback
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\__init__.py", line 2, in <module>
    from . import (
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\server\__init__.py", line 1, in <module>
    from . import ordered_server
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\server\ordered_server.py", line 5, in <module>
    from ..distributed import RpcGroup
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\__init__.py", line 1, in <module>
    from .world import (
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\world.py", line 535, in <module>
    class RpcGroup:
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\world.py", line 550, in RpcGroup
    @_copy_doc(rpc.rpc_sync)
AttributeError: module 'torch.distributed.rpc' has no attribute 'rpc_sync'

iffiX avatar iffiX commented on June 9, 2024

Oh, I forgot "server"; you also need to remove that. Sorry for the inconvenience.

MarWaltz avatar MarWaltz commented on June 9, 2024

No worries. But still:

Traceback (most recent call last):
  File "c:<file-path>.py", line 1, in <module>
    from machin.frame.algorithms import DQN
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\__init__.py", line 1, in <module>
    from . import env, frame, model, parallel, utils
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\frame\__init__.py", line 1, in <module>
    from . import algorithms, buffers, noise, transition
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\frame\algorithms\__init__.py", line 3, in <module>
    from .dqn import DQN
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\frame\algorithms\dqn.py", line 8, in <module>
    from machin.frame.buffers.buffer import Transition, Buffer
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\frame\buffers\__init__.py", line 2, in <module>
    from .buffer_d import DistributedBuffer
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\frame\buffers\buffer_d.py", line 5, in <module>
    from machin.parallel.distributed import RpcGroup
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\__init__.py", line 1, in <module>
    from .world import (
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\world.py", line 535, in <module>
    class RpcGroup:
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\world.py", line 550, in RpcGroup
    @_copy_doc(rpc.rpc_sync)
AttributeError: module 'torch.distributed.rpc' has no attribute 'rpc_sync'

iffiX avatar iffiX commented on June 9, 2024

OK, for these errors you need to change ImportError to Exception in these two files:
https://github.com/iffiX/machin/blob/master/machin/frame/algorithms/__init__.py
https://github.com/iffiX/machin/blob/master/machin/frame/buffers/__init__.py

The AttributeError is not caught there, because the except clause only handles ImportError.
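
To see why the exception type matters, here is a minimal, self-contained sketch. It does not use torch or machin: fake_rpc, load_distributed_feature and guarded are hypothetical stand-ins that mimic a module touching a missing attribute at import time.

```python
import types
import warnings

# Hypothetical stand-in for a torch.distributed.rpc build without rpc_sync.
fake_rpc = types.SimpleNamespace()


def load_distributed_feature():
    # Mimics module-level code that accesses the missing attribute.
    return fake_rpc.rpc_sync


def guarded(exc_type):
    """Return the feature, or None if exc_type is raised while loading it."""
    try:
        return load_distributed_feature()
    except exc_type:
        warnings.warn("Distributed feature unavailable. Set it to None.")
        return None


# A broad clause catches the AttributeError and falls back to None.
print(guarded(Exception))

# A clause limited to ImportError lets the AttributeError propagate.
try:
    guarded(ImportError)
except AttributeError as err:
    print("not caught by 'except ImportError':", err)
```

AttributeError is not a subclass of ImportError, so only the broader clause triggers the fallback.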

MarWaltz avatar MarWaltz commented on June 9, 2024

Okay, thanks. I will look into it and get back to you tomorrow.

iffiX avatar iffiX commented on June 9, 2024

No problem, I will correct these problems in my code now and try to find a Windows testing environment.

MarWaltz avatar MarWaltz commented on June 9, 2024

Hello again, see below:

Traceback (most recent call last):
  File "c:..\Desktop\Forschung\RL\Implementations\PyTorch Templates\machin\CartPole-DQN.py", line 1, in <module>
    from machin.frame.algorithms import DQN
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\__init__.py", line 1, in <module>
    from . import env, frame, model, parallel, utils
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\frame\__init__.py", line 1, in <module>
    from . import algorithms, buffers, helpers, noise, transition
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\frame\algorithms\__init__.py", line 14, in <module>
    from .a3c import A3C
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\frame\algorithms\a3c.py", line 2, in <module>
    from machin.parallel.server import PushPullGradServer
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\server\__init__.py", line 1, in <module>
    from . import ordered_server
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\server\ordered_server.py", line 5, in <module>
    from ..distributed import RpcGroup, debug_with_process
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\__init__.py", line 1, in <module>
    from .world import (
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\world.py", line 585, in <module>
    class RpcGroup:
  File "C:..\AppData\Local\Programs\Python\Python38\lib\site-packages\machin\parallel\distributed\world.py", line 600, in RpcGroup
    @_copy_doc(rpc.rpc_sync)
AttributeError: module 'torch.distributed.rpc' has no attribute 'rpc_sync'

iffiX avatar iffiX commented on June 9, 2024

OK, now move "from .a3c import A3C" into that try/except block, like this:

import warnings

# Algorithms that rely on torch.distributed; on platforms where they
# cannot be imported, fall back to None instead of failing the whole package.
try:
    from .a3c import A3C
    from .apex import DQNApex, DDPGApex
    from .impala import IMPALA
    from .ars import ARS
except Exception:
    warnings.warn(
        "Failed to import algorithms relying on torch.distributed. Set them to None."
    )
    A3C = None
    DQNApex = None
    DDPGApex = None
    IMPALA = None
    ARS = None
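
With this fallback in place, the distributed names still exist but may be None, so calling one of them would fail with an unhelpful "'NoneType' object is not callable". The sketch below shows one possible guard. It is an assumption for illustration only: require_available is a hypothetical helper, not part of machin, and the A3C and DQN names here are local stand-ins rather than the real classes.

```python
def require_available(name, obj):
    """Return obj, or raise a descriptive error if the fallback set it to None.

    Hypothetical helper for illustration; not part of machin.
    """
    if obj is None:
        raise RuntimeError(
            f"{name} is unavailable: it relies on torch.distributed features "
            "that failed to import on this platform."
        )
    return obj


# Stand-ins for the package names on an affected platform.
A3C = None  # distributed algorithm replaced by None in the except branch


class DQN:  # non-distributed algorithm, imported normally
    pass


algorithm_cls = require_available("DQN", DQN)  # returns the class unchanged

try:
    require_available("A3C", A3C)
except RuntimeError as err:
    print(err)
```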

MarWaltz avatar MarWaltz commented on June 9, 2024

Great job, this example works fine now!
I will close this issue and open a new one if any further problems should occur.

Thanks again.

iffiX avatar iffiX commented on June 9, 2024

OK, in the meantime I will add a quick fix for this once I get CircleCI working. :)

iffiX avatar iffiX commented on June 9, 2024

After searching for a while, I cannot find a platform that offers a reasonable running time for my automated testing, and since a hybrid Jenkins/Windows VM setup is too difficult to maintain, I will not consider Windows CI in the near future.

Instead, I will do one-time manual testing for future versions on request.

oneoneonecy avatar oneoneonecy commented on June 9, 2024

Can you help me with the error below?
@rpc.functions.async_execution
AttributeError: module 'torch.distributed.rpc' has no attribute 'functions'
