Code Monkey home page Code Monkey logo

Comments (7)

Construct705 avatar Construct705 commented on July 19, 2024 1

cool. many thanks. works

from super-gradients.

dagshub avatar dagshub commented on July 19, 2024

Join the discussion on DagsHub!

from super-gradients.

ofrimasad avatar ofrimasad commented on July 19, 2024

Hi @ngocnd2402.
This is a general error of Python on Windows.
We do not support Windows officially, but if you provide some more information, perhaps we could help.
Could you send information on the OS, relevant HW (GPU + CUDA), and Python environment?
Can you share the full error stack trace?

from super-gradients.

Construct705 avatar Construct705 commented on July 19, 2024

I have the same problem:
Windows Server 2022
Torch 1.13.0
Quadro P4000 GPU
Python 3.8
CUDA 12.0
AttributeError Traceback (most recent call last)
Cell In[1], line 1
----> 1 from super_gradients.training import models
3 yolo_nas_l = models.get("yolo_nas_l", pretrained_weights="coco")

File ~\anaconda3\envs\yolo-nas\lib\site-packages\super_gradients_init_.py:2
1 from super_gradients.common import init_trainer, is_distributed, object_names
----> 2 from super_gradients.training import losses, utils, datasets_utils, DataAugmentation, Trainer, KDTrainer, QATTrainer
3 from super_gradients.common.registry.registry import ARCHITECTURES
4 from super_gradients.sanity_check import env_sanity_check

File ~\anaconda3\envs\yolo-nas\lib\site-packages\super_gradients\training_init_.py:2
1 # PACKAGE IMPORTS FOR EXTERNAL USAGE
----> 2 import super_gradients.training.utils.distributed_training_utils as distributed_training_utils
3 from super_gradients.training.datasets import datasets_utils, DataAugmentation
4 from super_gradients.training.sg_trainer import Trainer

File ~\anaconda3\envs\yolo-nas\lib\site-packages\super_gradients\training\utils\distributed_training_utils.py:13
11 from torch.distributed.elastic.multiprocessing import Std
12 from torch.distributed.elastic.multiprocessing.errors import record
---> 13 from torch.distributed.launcher.api import LaunchConfig, elastic_launch
15 from super_gradients.common.environment.ddp_utils import init_trainer
16 from super_gradients.common.data_types.enum import MultiGPUMode

File ~\anaconda3\envs\yolo-nas\lib\site-packages\torch\distributed\launcher_init_.py:10
1 #!/usr/bin/env/python3
2
3 # Copyright (c) Facebook, Inc. and its affiliates.
(...)
6 # This source code is licensed under the BSD-style license found in the
7 # LICENSE file in the root directory of this source tree.
---> 10 from torch.distributed.launcher.api import ( # noqa: F401
11 LaunchConfig,
12 elastic_launch,
13 launch_agent,
14 )

File ~\anaconda3\envs\yolo-nas\lib\site-packages\torch\distributed\launcher\api.py:15
13 import torch.distributed.elastic.rendezvous.registry as rdzv_registry
14 from torch.distributed.elastic import events, metrics
---> 15 from torch.distributed.elastic.agent.server.api import WorkerSpec
16 from torch.distributed.elastic.agent.server.local_elastic_agent import LocalElasticAgent
17 from torch.distributed.elastic.multiprocessing import SignalException, Std

File ~\anaconda3\envs\yolo-nas\lib\site-packages\torch\distributed\elastic\agent\server_init_.py:40
9 """
10 The elastic agent is the control plane of torchelastic. It is a process
11 that launches and manages underlying worker processes. The agent is
(...)
28 in the same job) to make a collective decision.
29 """
31 from .api import ( # noqa: F401
32 ElasticAgent,
33 RunResult,
(...)
38 WorkerState,
39 )
---> 40 from .local_elastic_agent import TORCHELASTIC_ENABLE_FILE_TIMER, TORCHELASTIC_TIMER_FILE

File ~\anaconda3\envs\yolo-nas\lib\site-packages\torch\distributed\elastic\agent\server\local_elastic_agent.py:19
16 import uuid
17 from typing import Any, Dict, Optional, Tuple
---> 19 import torch.distributed.elastic.timer as timer
20 from torch.distributed.elastic import events
22 from torch.distributed.elastic.agent.server.api import (
23 RunResult,
24 SimpleElasticAgent,
(...)
27 WorkerState,
28 )

File ~\anaconda3\envs\yolo-nas\lib\site-packages\torch\distributed\elastic\timer_init_.py:44
42 from .api import TimerClient, TimerRequest, TimerServer, configure, expires # noqa: F401
43 from .local_timer import LocalTimerClient, LocalTimerServer # noqa: F401
---> 44 from .file_based_local_timer import FileTimerClient, FileTimerServer, FileTimerRequest

File ~\anaconda3\envs\yolo-nas\lib\site-packages\torch\distributed\elastic\timer\file_based_local_timer.py:63
51 def to_json(self) -> str:
52 return json.dumps(
53 {
54 "version": self.version,
(...)
59 },
60 )
---> 63 class FileTimerClient(TimerClient):
64 """
65 Client side of FileTimerServer. This client is meant to be used
66 on the same host that the FileTimerServer is running on and uses
(...)
79 negative or zero signal will not kill the process.
80 """
81 def init(self, file_path: str, signal=signal.SIGKILL) -> None:

File ~\anaconda3\envs\yolo-nas\lib\site-packages\torch\distributed\elastic\timer\file_based_local_timer.py:81, in FileTimerClient()
63 class FileTimerClient(TimerClient):
64 """
65 Client side of FileTimerServer. This client is meant to be used
66 on the same host that the FileTimerServer is running on and uses
(...)
79 negative or zero signal will not kill the process.
80 """
---> 81 def init(self, file_path: str, signal=signal.SIGKILL) -> None:
82 super().init()
83 self._file_path = file_path

AttributeError: module 'signal' has no attribute 'SIGKILL'

from super-gradients.

Mohamed2519 avatar Mohamed2519 commented on July 19, 2024

I solve it by go to the file of the error and replace SIGKILL into SIGILL

from super-gradients.

Louis-Dupont avatar Louis-Dupont commented on July 19, 2024

@ngocnd2402 , does this solve your issue ?

from super-gradients.

ngocnd2402 avatar ngocnd2402 commented on July 19, 2024

yeah, it works. Thanks <3

from super-gradients.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.