
super-gradients's Introduction



Build, train, and fine-tune production-ready deep learning SOTA vision models

Version 3.5 is out! Notebooks have been updated!



Build with SuperGradients


Support various computer vision tasks

Have some questions about SuperGradients? Try our AI helper

Ready to deploy pre-trained SOTA models

YOLO-NAS and YOLO-NAS-POSE architectures are out! The new YOLO-NAS delivers state-of-the-art performance with an unparalleled accuracy-speed tradeoff, outperforming other models such as YOLOv5, YOLOv6, YOLOv7, and YOLOv8. A YOLO-NAS-POSE model for pose estimation is also available, delivering a state-of-the-art accuracy/performance tradeoff.

Check these out here: YOLO-NAS & YOLO-NAS-POSE.

# Load model with pretrained weights
from super_gradients.training import models
from super_gradients.common.object_names import Models

model = models.get(Models.YOLO_NAS_M, pretrained_weights="coco")

All Computer Vision Models - Pretrained Checkpoints can be found in the Model Zoo

Classification

Semantic Segmentation

Object Detection

Pose Estimation

Easy to train SOTA Models

Easily load and fine-tune production-ready, pre-trained SOTA models that incorporate best practices and validated hyper-parameters for achieving best-in-class accuracy. For more information on how to do this, go to Getting Started.
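As a rough sketch of what fine-tuning looks like in code (the class count and hyper-parameter values below are illustrative placeholders, not validated settings):

from super_gradients import Trainer
from super_gradients.training import models
from super_gradients.training.metrics import Accuracy

# Create a trainer and load a pretrained backbone, replacing the classification
# head for a hypothetical 10-class dataset (the class count is just an example)
trainer = Trainer(experiment_name="finetune_resnet18")
model = models.get("resnet18", pretrained_weights="imagenet", num_classes=10)

train_loader = ...   # your PyTorch DataLoader over the training set
valid_loader = ...   # your PyTorch DataLoader over the validation set

training_params = {          # illustrative values only
    "max_epochs": 10,
    "initial_lr": 0.001,
    "optimizer": "SGD",
    "loss": "cross_entropy",
    "train_metrics_list": [Accuracy()],
    "valid_metrics_list": [Accuracy()],
    "metric_to_watch": "Accuracy",
    "greater_metric_to_watch_is_better": True,
}

trainer.train(model=model, training_params=training_params,
              train_loader=train_loader, valid_loader=valid_loader)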

Plug and play recipes

python -m super_gradients.train_from_recipe --config-name=imagenet_regnetY architecture=regnetY800 dataset_interface.data_dir=<YOUR_Imagenet_LOCAL_PATH> ckpt_root_dir=<CHECKPOINT_DIRECTORY>

More examples on how and why to use recipes can be found in Recipes

Production readiness

All SuperGradients models are production ready in the sense that they are compatible with deployment tools such as TensorRT (Nvidia) and OpenVINO (Intel) and can be easily taken into production. With a few lines of code you can easily integrate the models into your codebase.

# Load model with pretrained weights
import torch

from super_gradients.training import models
from super_gradients.common.object_names import Models

model = models.get(Models.YOLO_NAS_M, pretrained_weights="coco")

# Prepare model for conversion
# Input size is in format of [Batch x Channels x Height x Width] where 640 is the standard COCO dataset dimension
model.eval()
model.prep_model_for_conversion(input_size=[1, 3, 640, 640])

# Create dummy input matching the conversion input size
dummy_input = torch.randn(1, 3, 640, 640)

# Convert model to ONNX
torch.onnx.export(model, dummy_input, "yolo_nas_m.onnx")
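
After exporting, a quick sanity check is to load the file with onnxruntime and run a dummy input through it (onnxruntime is assumed to be installed separately; it is not required by SuperGradients itself):

import numpy as np
import onnxruntime

# Load the exported model and run one forward pass with random data
session = onnxruntime.InferenceSession("yolo_nas_m.onnx")
input_name = session.get_inputs()[0].name
dummy = np.random.randn(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print([o.shape for o in outputs])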

More information on how to take your model to production can be found in Getting Started notebooks

Quick Installation


pip install super-gradients

What's New


Version 3.4.0 (November 6, 2023)

  • YoloNAS-Pose model released - a new frontier in pose estimation
  • Added option to export a recipe to a single YAML file or to a standalone train.py file
  • Other bugfixes & minor improvements. Full release notes available here

Version 3.1.3 (July 19, 2023)


30th of May

Version 3.1.1 (May 3rd)

Check out SG full release notes.

Table of Content


Getting Started


Start Training with Just 1 Command Line

The simplest and most straightforward way to start training SOTA models is with SuperGradients' reproducible recipes. Just define your dataset path and where you want your checkpoints to be saved, and you are good to go from your terminal!

Just make sure that you set up your dataset according to the data dir specified in the recipe.

python -m super_gradients.train_from_recipe --config-name=imagenet_regnetY architecture=regnetY800 dataset_interface.data_dir=<YOUR_Imagenet_LOCAL_PATH> ckpt_root_dir=<CHECKPOINT_DIRECTORY>

Quickly Load Pre-Trained Weights for Your Desired Model with SOTA Performance

Want to try our pre-trained models on your machine? Import SuperGradients, initialize your Trainer, and load your desired architecture and pre-trained weights from our SOTA model zoo.

# The pretrained_weights argument will load a pre-trained architecture on the provided dataset

import super_gradients
from super_gradients.training import models

model = models.get("model-name", pretrained_weights="pretrained-model-name")

Classification

Semantic Segmentation

Pose Estimation

Object Detection

How to Predict Using Pre-trained Model

Albumentations Integration

Advanced Features


Post Training Quantization and Quantization Aware Training

Quantization involves representing weights and biases in lower precision, resulting in reduced memory and computational requirements, which makes it useful for deploying models on devices with limited resources. The process can be done during training, called quantization-aware training (QAT), or after training, called post-training quantization (PTQ). A full tutorial can be found here.
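To illustrate the general idea, here is a minimal sketch of post-training dynamic quantization in plain PyTorch (this is generic torch.quantization usage, not SuperGradients' own quantization utilities; see the tutorial linked above for those):

import torch
import torch.nn as nn

# A toy float32 model
model_fp32 = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model_fp32.eval()

# Post-training dynamic quantization: weights are stored in int8,
# activations are quantized on the fly at inference time
model_int8 = torch.quantization.quantize_dynamic(model_fp32, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
print(model_int8(x).shape)  # same interface, smaller and faster linear layers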

Quantization Aware Training YoloNAS on Custom Dataset

This tutorial provides a comprehensive guide on how to fine-tune a YoloNAS model using a custom dataset. It also demonstrates how to utilize SG's QAT (Quantization-Aware Training) support. Additionally, it offers step-by-step instructions on deploying the model and performing benchmarking.

Knowledge Distillation Training

Knowledge Distillation is a training technique that uses a large model, the teacher, to improve the performance of a smaller model, the student. Learn more about SuperGradients' knowledge distillation training with our pre-trained BEiT base teacher model and ResNet18 student model on CIFAR10 in the example notebook on Google Colab, an easy-to-use tutorial running on free GPU hardware.
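The core idea can be sketched in plain PyTorch (a generic distillation loss, not SuperGradients' KD trainer; the temperature and weighting values are illustrative):

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    # Soft targets: match the teacher's softened output distribution (KL divergence)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: the usual cross-entropy against the ground-truth labels
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, targets))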

Recipes

To train a model, it is necessary to configure 4 main components. These components are aggregated into a single "main" recipe .yaml file that inherits the aforementioned dataset, architecture, training and checkpoint params. It is also possible (and recommended for flexibility) to override default settings with custom ones. All recipes can be found here.
Recipes support out of the box every model, metric or loss that is implemented in SuperGradients, but you can easily extend this to any custom object that you need by "registering" it. Check out this tutorial for more information.
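
As a minimal sketch of what registering a custom object looks like (this assumes a register_model decorator exposed under super_gradients.common.registry; the exact import path and decorator signature may differ between versions, so check the tutorial above):

import torch.nn as nn
from super_gradients.common.registry import register_model

@register_model("my_tiny_net")  # recipes can now refer to architecture: my_tiny_net
class MyTinyNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))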

Using Distributed Data Parallel (DDP)

Why use DDP?

Recent deep learning models are growing larger and larger, to the point that training on a single GPU can take weeks. In order to train models in a timely fashion, it is necessary to train them with multiple GPUs. Using hundreds of GPUs can reduce the training time of a model from a week to less than an hour.

How does it work?

Each GPU has its own process, which controls a copy of the model, loads its own mini-batches from disk, and sends them to its GPU during training. After the forward pass is completed on every GPU, the gradients are reduced across all GPUs, so that every GPU ends up with the same gradient locally. This keeps the model weights synchronized across all GPUs after the backward pass.
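Conceptually, the gradient reduction step looks like the sketch below; DistributedDataParallel does this for you automatically, so you never write it by hand:

import torch.distributed as dist

def average_gradients(model):
    # Assumes torch.distributed has already been initialized (DDP does this for you).
    # Sum each parameter's gradient across all processes, then divide by the world
    # size so every replica ends up with the same averaged gradient.
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size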

How to use it?

You can use SuperGradients to train your model with DDP in just a few lines.

main.py

from super_gradients import init_trainer, Trainer
from super_gradients.common import MultiGPUMode
from super_gradients.training.utils.distributed_training_utils import setup_device

# Initialize the environment
init_trainer()

# Launch DDP on 4 GPUs
setup_device(multi_gpu=MultiGPUMode.DISTRIBUTED_DATA_PARALLEL, num_gpus=4)

# Call the trainer
trainer = Trainer(experiment_name=...)

# Everything you do below will run on 4 GPUs

...

trainer.train(...)

Finally, you can launch your distributed training with a simple python call.

python main.py

Please note that if you work with torch<1.9.0 (deprecated), you will have to launch your training with either torch.distributed.launch or torchrun, in which case nproc_per_node will overwrite the value set with gpu_mode:

python -m torch.distributed.launch --nproc_per_node=4 main.py
torchrun --nproc_per_node=4 main.py

Calling functions on a single node

It is often the case in DDP training that we want to execute code on the master rank only (i.e. rank 0). In SG, users usually execute their own code by triggering "Phase Callbacks" (see the "Using phase callbacks" section below). One can make sure the desired code will only be run on rank 0 by using ddp_silent_mode or the multi_process_safe decorator. For example, consider the simple phase callback below, which uploads the first 3 images of every batch during training to TensorBoard:

from super_gradients.training.utils.callbacks import PhaseCallback, PhaseContext, Phase
from super_gradients.common.environment.env_helpers import multi_process_safe

class Upload3TrainImagesCallback(PhaseCallback):
    def __init__(
        self,
    ):
        super().__init__(phase=Phase.TRAIN_BATCH_END)
    
    @multi_process_safe
    def __call__(self, context: PhaseContext):
        batch_imgs = context.inputs.cpu().detach().numpy()
        tag = "batch_" + str(context.batch_idx) + "_images"
        context.sg_logger.add_images(tag=tag, images=batch_imgs[: 3], global_step=context.epoch)

The @multi_process_safe decorator ensures that the callback will only be triggered by rank 0. Alternatively, this can also be done through the SG trainer's boolean attribute (which the phase context has access to), ddp_silent_mode, which is set to False if and only if the current process rank is zero (even after the process group has been killed):

from super_gradients.training.utils.callbacks import PhaseCallback, PhaseContext, Phase

class Upload3TrainImagesCallback(PhaseCallback):
    def __init__(
        self,
    ):
        super().__init__(phase=Phase.TRAIN_BATCH_END)

    def __call__(self, context: PhaseContext):
        if not context.ddp_silent_mode:
            batch_imgs = context.inputs.cpu().detach().numpy()
            tag = "batch_" + str(context.batch_idx) + "_images"
            context.sg_logger.add_images(tag=tag, images=batch_imgs[: 3], global_step=context.epoch)

Note that ddp_silent_mode can be accessed through SgTrainer.ddp_silent_mode. Hence, it can be used in scripts after calling SgTrainer.train() when some part of them should be run on rank 0 only.
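For example, a small sketch (model, the dataloaders and train_params are placeholders, and trainer is the Trainer instance created in the DDP snippet above):

trainer.train(model=model, training_params=train_params,
              train_loader=train_loader, valid_loader=valid_loader)

# Only the rank-0 process should export or upload the final artifacts
if not trainer.ddp_silent_mode:
    print("rank 0: export the checkpoint / upload artifacts here")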

Good to know

Your total batch size will be (number of GPUs x batch size), so you might want to increase your learning rate. There is no clear rule, but a common rule of thumb is to increase the learning rate linearly with the number of GPUs.
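As a quick illustration of that heuristic (the numbers are arbitrary examples):

base_lr = 0.01            # learning rate tuned for a single GPU
per_gpu_batch_size = 64
num_gpus = 4

effective_batch_size = per_gpu_batch_size * num_gpus  # 256
scaled_lr = base_lr * num_gpus                        # linear scaling rule: 0.04
print(effective_batch_size, scaled_lr)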

Easily change architecture parameters

from super_gradients.training import models

# instantiate default pretrained resnet18
default_resnet18 = models.get(model_name="resnet18", num_classes=100, pretrained_weights="imagenet")

# instantiate pretrained resnet18, turning DropPath on with probability 0.5
droppath_resnet18 = models.get(model_name="resnet18", arch_params={"droppath_prob": 0.5}, num_classes=100, pretrained_weights="imagenet")

# instantiate pretrained resnet18, without classifier head. Output will be from the last stage before global pooling
backbone_resnet18 = models.get(model_name="resnet18", arch_params={"backbone_mode": True}, pretrained_weights="imagenet")

Using phase callbacks

from super_gradients import Trainer
from torch.optim.lr_scheduler import ReduceLROnPlateau
from super_gradients.training.utils.callbacks import Phase, LRSchedulerCallback
from super_gradients.training.metrics.classification_metrics import Accuracy

# define PyTorch train and validation loaders and optimizer

# define what to be called in the callback
rop_lr_scheduler = ReduceLROnPlateau(optimizer, mode="max", patience=10, verbose=True)

# define phase callbacks, they will fire as defined in Phase
phase_callbacks = [LRSchedulerCallback(scheduler=rop_lr_scheduler,
                                       phase=Phase.VALIDATION_EPOCH_END,
                                       metric_name="Accuracy")]

# create a trainer object; see the class declaration for more parameters
trainer = Trainer("experiment_name")

# define phase_callbacks as part of the training parameters
train_params = {"phase_callbacks": phase_callbacks}

Integration to DagsHub

Open In Colab

from super_gradients import Trainer

trainer = Trainer("experiment_name")
model = ...

training_params = { ...  # Your training params
                   "sg_logger": "dagshub_sg_logger",  # DagsHub Logger, see class super_gradients.common.sg_loggers.dagshub_sg_logger.DagsHubSGLogger for details
                   "sg_logger_params":  # Params that will be passes to __init__ of the logger super_gradients.common.sg_loggers.dagshub_sg_logger.DagsHubSGLogger
                     {
                       "dagshub_repository": "<REPO_OWNER>/<REPO_NAME>", # Optional: Your DagsHub project name, consisting of the owner name, followed by '/', and the repo name. If this is left empty, you'll be prompted in your run to fill it in manually.
                       "log_mlflow_only": False, # Optional: Change to true to bypass logging to DVC, and log all artifacts only to MLflow  
                       "save_checkpoints_remote": True,
                       "save_tensorboard_remote": True,
                       "save_logs_remote": True,
                     }
                   }

Integration to Weights and Biases

from super_gradients import Trainer

# create a trainer object; see the class declaration for more parameters
trainer = Trainer("experiment_name")

train_params = { ... # training parameters
                "sg_logger": "wandb_sg_logger", # Weights&Biases Logger, see class WandBSGLogger for details
                "sg_logger_params": # parameters that will be passed to __init__ of the logger
                  {
                    "project_name": "project_name", # W&B project name
                    "save_checkpoints_remote": True,
                    "save_tensorboard_remote": True,
                    "save_logs_remote": True,
                  }
               }

Integration to ClearML

from super_gradients import Trainer

# create a trainer object; see the class declaration for more parameters
trainer = Trainer("experiment_name")

train_params = { ... # training parameters
                "sg_logger": "clearml_sg_logger", # ClearML Logger, see class ClearMLSGLogger for details
                "sg_logger_params": # paramenters that will be passes to __init__ of the logger 
                  {
                    "project_name": "project_name", # ClearML project name
                    "save_checkpoints_remote": True,
                    "save_tensorboard_remote": True,
                    "save_logs_remote": True,
                  } 
               }

Integration to Voxel51

You can apply SuperGradients YOLO-NAS models directly to your FiftyOne dataset using the apply_model() method:

import fiftyone as fo
import fiftyone.zoo as foz

from super_gradients.training import models

dataset = foz.load_zoo_dataset("quickstart", max_samples=25)
dataset.select_fields().keep_fields()

model = models.get("yolo_nas_m", pretrained_weights="coco")

dataset.apply_model(model, label_field="yolo_nas", confidence_thresh=0.7)

session = fo.launch_app(dataset)

The SuperGradients YOLO-NAS model can be accessed directly from the FiftyOne Model Zoo:

import fiftyone as fo
import fiftyone.zoo as foz

model = foz.load_zoo_model("yolo-nas-torch")

dataset = foz.load_zoo_dataset("quickstart")
dataset.apply_model(model, label_field="yolo_nas")

session = fo.launch_app(dataset)

Installation Methods


Prerequisites

General requirements
To train on nvidia GPUs

Quick Installation

Install stable version using PyPi

See in PyPi

pip install super-gradients

That's it!

Install using GitHub
pip install git+https://github.com/Deci-AI/super-gradients.git@stable

Implemented Model Architectures


All Computer Vision Models - Pretrained Checkpoints can be found in the Model Zoo

Image Classification

Semantic Segmentation

Object Detection

Pose Estimation


Implemented Datasets


Deci provides implementations for various datasets. If you need to download any of the datasets, you can find instructions.

Image Classification

Semantic Segmentation

Object Detection

Pose Estimation


Documentation

Check SuperGradients Docs for full documentation, user guide, and examples.

Contributing

To learn about making a contribution to SuperGradients, please see our Contribution page.

Our awesome contributors:


Made with contrib.rocks.

Citation

If you are using SuperGradients library or benchmarks in your research, please cite SuperGradients deep learning training library.

Community

If you want to be a part of SuperGradients growing community, hear about all the exciting news and updates, need help, request for advanced features, or want to file a bug or issue report, we would love to welcome you aboard!

  • Discord is the place to ask questions about SuperGradients and get support. Click here to join our Discord Community

  • To report a bug, file an issue on GitHub.

  • Join the SG Newsletter to stay up to date with new features and models, important announcements, and upcoming events.

  • For a short meeting with us, use this link and choose your preferred time.

License

This project is released under the Apache 2.0 license.

Citing

BibTeX

@misc{supergradients,
  doi = {10.5281/ZENODO.7789328},
  url = {https://zenodo.org/record/7789328},
  author = {Aharon,  Shay and {Louis-Dupont} and {Ofri Masad} and Yurkova,  Kate and {Lotem Fridman} and {Lkdci} and Khvedchenya,  Eugene and Rubin,  Ran and Bagrov,  Natan and Tymchenko,  Borys and Keren,  Tomer and Zhilko,  Alexander and {Eran-Deci}},
  title = {Super-Gradients},
  publisher = {GitHub},
  journal = {GitHub repository},
  year = {2021},
}

Latest DOI

DOI


Deci Platform

Deci Platform is our end-to-end platform for building, optimizing, and deploying deep learning models to production.

Request a free trial to enjoy immediate improvements in throughput, latency, memory footprint and model size.

Features

  • Automatically compile and quantize your models with just a few clicks (TensorRT, OpenVINO).
  • Gain up to 10X improvement in throughput, latency, memory and model size.
  • Easily benchmark your models’ performance on different hardware and batch sizes.
  • Invite co-workers to collaborate on models and communicate your progress.
  • Deci supports all common frameworks and hardware, from Intel CPUs to Nvidia GPUs and Jetsons.

Request a free trial here

super-gradients's People

Contributors

avideci, bit-scientist, bloodaxe, danbochman, daniel-deci, danielafrimi, eran-deci, fcakyon, hakuryuu96, jacobmarks, jonathan-sha, lkdci, lotem-deci, louis-dupont, najeeb5, natanbagrov, oferbaratz, ofrimasad, ranrubin, roikoren755, shairoz-deci, shani-perl, shanibenbaruch, shaydeci, soumik12345, spsancti, strelok899, tomerkeren42, yonatan-kaplounov, yurkovak


super-gradients's Issues

How can I enable multigpu training

Is your feature request related to a problem? Please describe.

A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like

A clear and concise description of what you want to happen.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context or screenshots about the feature request here.

How to accelerate regseg on tensorRT

I tried to use TensorRT with the original RegSeg repository.
However, ONNX had trouble with torch.split, and torch2trt was also unable to work with the specific TensorRT version.
Please let me know how you used TensorRT with the RegSeg model when measuring the latency.

Default dataloader params have shuffle=False

Describe the bug

By default, shuffle=True is not passed to the dataloader, so a SequentialSampler gets instantiated. When training on ImageNet, this makes the network see only examples of a single class at a time, which quickly throws it out of the minimum.
Passing shuffle=True to the dataloader params solves the issue.

To Reproduce

Minimal example:

import super_gradients
from super_gradients import Trainer
from super_gradients.training import MultiGPUMode
from super_gradients.training import models
from super_gradients.training.dataloaders import imagenet_train, imagenet_val
from super_gradients.training.metrics import Accuracy

super_gradients.init_trainer()

dataloader_params = {"batch_size": 196}  # buggy params
# dataloader_params = {"batch_size": 196, "shuffle": True}  # non-buggy params


train_params = {"max_epochs": 1,
                "initial_lr": 0.001,
                "optimizer": "SGD",
                "optimizer_params": {"weight_decay": 0.0001, "momentum": 0.9, "nesterov": True},
                "loss": "cross_entropy",
                "train_metrics_list": [Accuracy()],
                "valid_metrics_list": [Accuracy()],
                "loss_logging_items_names": ["Loss"],
                "metric_to_watch": "Accuracy",
                "greater_metric_to_watch_is_better": True
                }

train_dataloader = imagenet_train(dataloader_params=dataloader_params)
val_dataloader = imagenet_val(dataloader_params=dataloader_params)

model = models.get("resnet50", pretrained_weights="imagenet", num_classes=1000)

trainer = Trainer(experiment_name="reproduce_shuffle_bug",
                  multi_gpu=MultiGPUMode.OFF,
                  device='cuda')

trainer.train(model=model,
              training_params=train_params,
              train_loader=train_dataloader,
              valid_loader=val_dataloader)

Expected behavior

Accuracy not dropping

Environment:

  • OS Linux 5.4.0-94-generic x86_64
  • Super Gradients version 3.0.0

Additional context

Add any other context about the problem here.

How to merge label classes without modifying the ground truth data

I want to take a set of model weights for semantic segmentation pre-trained on cityscapes and finetune it such that it ignores all classes other than road.
Is there currently a way to merge or ignore label classes, e.g. by passing an argument to the dataloader?
Else can you give a hint where best to modify the source code to achieve this?

My best guess would be to modify this function to set other to be ignored.

Cannot install supergradients

Hello,
I am facing this issue while trying to install supergradients.

[screenshot]

I also tried pip install super-gradients, and I am encountering this issue.

[screenshot]

How to implement transfer learning on an unseen dataset?

Say we add some unseen data: I do not want to retrain on the whole dataset by mixing the old and new data. I want to train only on the unseen data, and the final model must detect both the old labels and the new labels after training on the unseen labels. How can this be done?

MaskAttentionLoss in DiceCEEdgeLoss doesn't handle images without any edges

Describe the bug

Training models that use DiceCEEdgeLoss results in NaN loss on images that only contain one semantic class. The edge_target becomes a tensor filled with zeros because there are no edges in the image:

edge_target = target_to_binary_edge(
target, num_classes=self.num_classes, kernel_size=self.edge_kernel, ignore_index=self.ignore_index, flatten_channels=True
)

Then, when computing the MaskAttentionLoss, mask_loss is a tensor filled with zeros, gets reassigned to an empty tensor, and, finally, computing the mean of an empty tensor results in NaN.

mask_loss = mask_loss[mask == 1] # consider only mask samples for mask loss computing
mask_loss = apply_reduce(mask_loss, self.reduction)

To Reproduce

I've written a new test in tests/unit_tests/mask_loss_test.py that reproduces the problem.

def test_with_cross_entropy_loss_maskless(self):
    """
    Test case with mask filled with zeros, corresponding to a scenario without
    attention. It's expected that the mask doesn't contribute to the loss.

    This scenario may happen when using edge masks on an image without
    edges - there's only one semantic region in the whole image.

    Shapes: predict [BxCxHxW], target [BxHxW], mask [Bx1xHxW]
    """
    predict = torch.randn(self.batch, self.num_classes, self.img_size, self.img_size)
    target = self._get_default_target_tensor()
    # Create a mask filled with zeros to disable the attention component
    mask = self._get_default_mask_tensor() * 0.0

    loss_weigths = [1.0, 0.5]
    ce_crit = nn.CrossEntropyLoss(reduction="none")
    mask_ce_crit = MaskAttentionLoss(criterion=ce_crit, loss_weights=loss_weigths)

    # expected result - no contribution from mask
    ce_loss = ce_crit(predict, target)
    expected_loss = ce_loss.mean() * loss_weigths[0]

    # mask ce loss result
    loss = mask_ce_crit(predict, target, mask)

    self._assertion_torch_values(expected_loss, loss)

Running this test results in:

AssertionError: False is not true : Unequal torch tensors: excepted: 1.7192925214767456, found: nan

Expected behavior

A mask filled with zeros should "disable" attention. Thus, the mask should not contribute to the loss.

Environment:

  • 3.0.7

Additional context

Can be fixed by checking if mask_loss is NaN and setting it to 0 instead. Like this:

mask_loss = mask_loss if not mask_loss.isnan() else mask_loss.new_tensor(0.0)

Training YOLONAS from scratch for using it in a commercial application

As far as I understand from the license, if I use pre-trained weights for training YOLO-NAS (fine-tuning the model on my dataset), I cannot use it in commercial applications. Is that right?

If so, when training:

from super_gradients.training import models
model = models.get('yolo_nas_l', 
                   num_classes=len(dataset_params['classes']), 
                   pretrained_weights="coco"
                   )

how to change this and train it from scratch?

CoCoSegmentationDataSet._generate_samples_and_targets() doesn't call super (= no caching)

Describe the bug

CoCoSegmentationDataSet._generate_samples_and_targets() doesn't call the corresponding parent class method, and therefore image and label caching does not work for this class.
The solution is to add super()._generate_samples_and_targets() as the last line in CoCoSegmentationDataSet._generate_samples_and_targets().
Excuse me for not making this as a pull request this time.

python TFlite or onnx inference script

Is your feature request related to a problem?

No. I am able to convert .pth models to TFLite; I need a starter script which can be used for TFLite inference.

Describe the solution you'd like

A standalone python inference script

Describe alternatives you've considered

If you have an ONNX inference script as well, it would be helpful.

The mAP reported by SG is not consistent with COCO-API

Describe the bug

When training a detection model, e.g. SSD Lite MobileNet, the mAP shown in SG (TensorBoard) is much higher than the mAP returned via COCO-API

To Reproduce

Steps to reproduce the behavior:

  1. Branch https://github.com/Deci-AI/super-gradients/tree/ssd_mobilenet
  2. Run train_from_recipe.py with coco_ssd_lite_mobilenet_v2.yaml or coco_ssd_mobilenet_v1.yaml
  3. (The model should reach ~30 mAP)

However: running the COCO-API reference returns ~0.17 mAP.

Expected behavior

The mAP information should correspond to COCO-API's mAP.

Screenshots

[screenshot]

Support for 3D images

Is your feature request related to a problem? Please describe.

Medical image segmentation often relies on 3D scans (MRI, CT), but there are very few pretrained models to use for different medical image tasks (classification, segmentation)

Describe the solution you'd like

It would be great if there were some state of the art models available in super-gradients, and even better if they would be pre-trained. For example this model http://arxiv.org/abs/2208.09567, or use the synthesized 100k brains to train a 3d classification model like this one: http://arxiv.org/abs/2209.07162, https://arxiv.org/pdf/2303.08216.pdf

Select which gpu to train on

Is your feature request related to a problem? Please describe.

I am trying to train yolo-nas in an environment with multiple gpus (device=0 and 1), but I only want to use 1. Is there a way to train the model specifically in the 2nd gpu (device=1)

Describe the solution you'd like

Train the model by only using the 2nd gpu

Describe alternatives you've considered

Nothing really.

Integrating custom models into backbone

Hello there!

I loved your work, keep it up!

I'd like to integrate a custom attention model as backbone model for an object detection task and test it out. Is there any documentation or tutorial that you can provide so that I can follow? Or any help would be appreciated!

Kornia augmentations integration

Hi guys, very nice repository— congratulations!

We were wondering whether we can help you guys to integrate kornia.augmentations in your framework.

We have special containers to automate the case for augmenting for detection, segmentation, videos, etc.

An example of a similar integration can be found here from ms-torchgeo team after further collaboration

Let us know so that our augmentations team can assist you in case of missing features /cc @shijianjian @twsl

Integration with 🤗 Hub

Hi folks.

Thanks for providing the pre-trained models along with pre-training scripts.

At Hugging Face, the Hub is our house to serve models, datasets, spaces, etc. It facilitates easy artifact loading and usage, providing common and streamlined API access to your models, datasets, etc.

Hugging Face supports third-party integrations too and I was wondering if you'd be up to exploring the integration. The integration will primarily facilitate easy model sharing and model downloading which could be beneficial for the vision community in general.

Here's the main doc that'd be helpful for you for the integration: https://huggingface.co/docs/hub/models-adding-libraries. Let me know if you'd need any help.

Need inference code for YOLO-NAS, so that I can get the output as x1,y1,x2,y2 bounding boxes along with the confidence

I'm using OpenCV, and I need to be able to pass in my image frame from OpenCV, which of course is a numpy array.

But the abstracted code you provide doesn't seem to give an option to get the output in the form of x's and y's and a confidence score.

Or it might be hidden in some different class hierarchy; I'm not sure, because I could not find it. It would be very helpful if you could help with it.

Arigato

Bug in super-gradients in Linux ubuntu and google colab

You'll use yolo_nas_l throughout this notebook. Because you should always go big, or go home.

It's a good life philosophy.

But your fine-tuning notebook does not seem to be working; I tried to rerun it and the second cell is showing me an error.
Here is a screenshot of the error.
[screenshot]

Also, following your code in a Linux environment, I am facing certain errors while installing super-gradients.

Building wheels for collected packages: pycocotools, stringcase, termcolor, treelib, antlr4-python3-runtime, future
Building wheel for pycocotools (PEP 517) ... error
ERROR: Command errored out with exit status 1:
command: /home/soumyadeep/mmaction_custom/ava_custom_v2/Yolo_train/YOLO-NAS/yolonas_env/bin/python3 /tmp/tmpxprr4m03 build_wheel /tmp/tmp5_axaif9
cwd: /tmp/pip-install-n3xzz9vp/pycocotools
Complete output (67 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-38
creating build/lib.linux-x86_64-cpython-38/pycocotools
copying pycocotools/init.py -> build/lib.linux-x86_64-cpython-38/pycocotools
copying pycocotools/mask.py -> build/lib.linux-x86_64-cpython-38/pycocotools
copying pycocotools/cocoeval.py -> build/lib.linux-x86_64-cpython-38/pycocotools
copying pycocotools/coco.py -> build/lib.linux-x86_64-cpython-38/pycocotools
running build_ext
skipping 'pycocotools/_mask.c'

and more.

Is there any solution to it?

Upsample size mismatch in segmentation models

Describe the bug

Depending on the input image size, feature maps upsampled with nn.Upsample don't always match the size of the skip connection. This is a known issue; some reference links:

Replacing nn.Upsample with torch.nn.functional.interpolate seems to be the recommended solution.

To Reproduce

Here's a snippet using PP-LiteSeg. The dataset is cityscapes, but that's not important, the image size is the important factor. I imagine that the issue is in all models using nn.Upsample and concatenating with skip connections:

from super_gradients.training import models, dataloaders, Trainer
from super_gradients.common.object_names import Models
from super_gradients.training.metrics import IoU


trainer = Trainer(experiment_name="eval-pp-liteseg-b75")
val_loader = dataloaders.cityscapes_stdc_seg75_val(dataset_params={
    "transforms": [
            {
                "SegRescale": {
                    "long_size": 1025
                }
            }
        ]
    },
    dataloader_params={"batch_size": 1},
)
model = models.get(
    Models.PP_LITE_B_SEG75,
    pretrained_weights="cityscapes",
)
metric = IoU(num_classes=20, ignore_index=19)
miou = trainer.test(
    model=model,
    test_loader=val_loader,
    test_metrics_list=[metric],
    metrics_progress_verbose=False
)[0].cpu().item()
print(f"mIoU: {miou}")

Results in an error:

  File ".../src/super_gradients/training/models/segmentation_models/ppliteseg.py", line 52, in forward
    atten = torch.cat([*self._avg_max_spatial_reduce(x, use_concat=False), *self._avg_max_spatial_reduce(skip, use_concat=False)], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 66 but got size 65 for tensor number 2 in the list.

Expected behavior

Fully convolutional segmentation models should work for all input image sizes.

Environment:

  • Ubuntu
  • super-gradients v3.0.7
  • PyTorch 1.11

Dependency Issue during Installation

Describe the bug

C:\Users\Isaac>pip install super_gradients
Collecting super_gradients
  Using cached super_gradients-3.1.1-py3-none-any.whl (964 kB)
INFO: pip is looking at multiple versions of super-gradients to determine which version is compatible with other requirements. This could take a while.
  Using cached super_gradients-3.1.0-py3-none-any.whl (965 kB)
  Using cached super_gradients-3.0.9-py3-none-any.whl (938 kB)
  Using cached super_gradients-3.0.8-py3-none-any.whl (892 kB)
  Using cached super_gradients-3.0.7-py3-none-any.whl (794 kB)
  Using cached super_gradients-3.0.6-py3-none-any.whl (762 kB)
  Using cached super_gradients-3.0.5-py3-none-any.whl (748 kB)
  Using cached super_gradients-3.0.4-py3-none-any.whl (748 kB)
INFO: pip is looking at multiple versions of super-gradients to determine which version is compatible with other requirements. This could take a while.
  Using cached super_gradients-3.0.3-py3-none-any.whl (732 kB)
Requirement already satisfied: torch>=1.9.0 in c:\python311\lib\site-packages (from super_gradients) (2.0.0)
Requirement already satisfied: tqdm>=4.57.0 in c:\python311\lib\site-packages (from super_gradients) (4.65.0)
Collecting boto3>=1.17.15 (from super_gradients)
  Using cached boto3-1.26.126-py3-none-any.whl (135 kB)
Collecting jsonschema>=3.2.0 (from super_gradients)
  Using cached jsonschema-4.17.3-py3-none-any.whl (90 kB)
Collecting Deprecated>=1.2.11 (from super_gradients)
  Using cached Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)
Requirement already satisfied: opencv-python>=4.5.1 in c:\python311\lib\site-packages (from super_gradients) (4.7.0.72)
Requirement already satisfied: scipy>=1.6.1 in c:\python311\lib\site-packages (from super_gradients) (1.10.1)
Requirement already satisfied: matplotlib>=3.3.4 in c:\python311\lib\site-packages (from super_gradients) (3.7.1)
Requirement already satisfied: psutil>=5.8.0 in c:\python311\lib\site-packages (from super_gradients) (5.9.5)
Collecting tensorboard>=2.4.1 (from super_gradients)
  Using cached tensorboard-2.12.3-py3-none-any.whl (5.6 MB)
Requirement already satisfied: setuptools>=21.0.0 in c:\python311\lib\site-packages (from super_gradients) (65.5.0)
Collecting coverage~=5.3.1 (from super_gradients)
  Using cached coverage-5.3.1.tar.gz (684 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: torchvision>=0.10.0 in c:\python311\lib\site-packages (from super_gradients) (0.15.1)
Collecting sphinx~=4.0.2 (from super_gradients)
  Using cached Sphinx-4.0.3-py3-none-any.whl (2.9 MB)
Collecting sphinx-rtd-theme (from super_gradients)
  Using cached sphinx_rtd_theme-1.2.0-py2.py3-none-any.whl (2.8 MB)
Collecting torchmetrics==0.8 (from super_gradients)
  Using cached torchmetrics-0.8.0-py3-none-any.whl (408 kB)
Requirement already satisfied: pillow>=9.2.0 in c:\python311\lib\site-packages (from super_gradients) (9.5.0)
Collecting hydra-core>=1.2.0 (from super_gradients)
  Using cached hydra_core-1.3.2-py3-none-any.whl (154 kB)
Collecting omegaconf (from super_gradients)
  Using cached omegaconf-2.3.0-py3-none-any.whl (79 kB)
Collecting super_gradients
  Using cached super_gradients-3.0.2-py3-none-any.whl (664 kB)
  Using cached super_gradients-3.0.1-py3-none-any.whl (635 kB)
  Using cached super_gradients-3.0.0-py3-none-any.whl (615 kB)
  Using cached super_gradients-2.6.0-py3-none-any.whl (11.0 MB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
  Using cached super_gradients-2.5.0-py3-none-any.whl (11.0 MB)
  Using cached super_gradients-2.2.0-py3-none-any.whl (10.9 MB)
  Using cached super_gradients-2.1.0-py3-none-any.whl (23.0 MB)
Collecting elasticsearch==7.15.2 (from super_gradients)
  Using cached elasticsearch-7.15.2-py2.py3-none-any.whl (379 kB)
Collecting CMRESHandler>=1.0.0 (from super_gradients)
  Using cached CMRESHandler-1.0.0-py3-none-any.whl (15 kB)
Collecting super_gradients
  Using cached super_gradients-2.0.1-py3-none-any.whl (19.4 MB)
  Using cached super_gradients-2.0.0-py3-none-any.whl (19.4 MB)
  Using cached super_gradients-1.7.5-py3-none-any.whl (19.3 MB)
Collecting torchmetrics==0.7.3 (from super_gradients)
  Using cached torchmetrics-0.7.3-py3-none-any.whl (398 kB)
Collecting super_gradients
  Using cached super_gradients-1.7.4-py3-none-any.whl (19.3 MB)
  Using cached super_gradients-1.7.3-py3-none-any.whl (19.3 MB)
Collecting torchmetrics>=0.5.0 (from super_gradients)
  Using cached torchmetrics-0.11.4-py3-none-any.whl (519 kB)
Collecting super_gradients
  Using cached super_gradients-1.7.2-py3-none-any.whl (19.3 MB)
  Using cached super_gradients-1.7.1-py3-none-any.whl (15.1 MB)
  Using cached super_gradients-1.6.0-py3-none-any.whl (547 kB)
  Using cached super_gradients-1.5.2-py3-none-any.whl (540 kB)
  Using cached super_gradients-1.5.1-py3-none-any.whl (540 kB)
  Using cached super_gradients-1.5.0-py3-none-any.whl (497 kB)
  Using cached super_gradients-1.4.0-py3-none-any.whl (419 kB)
  Using cached super_gradients-1.3.1-py3-none-any.whl (416 kB)
  Using cached super_gradients-1.3.0-py3-none-any.whl (415 kB)
ERROR: Cannot install super-gradients==1.3.0, super-gradients==1.3.1, super-gradients==1.4.0, super-gradients==1.5.0, super-gradients==1.5.1, super-gradients==1.5.2, super-gradients==1.6.0, super-gradients==1.7.1, super-gradients==1.7.2, super-gradients==1.7.3, super-gradients==1.7.4, super-gradients==1.7.5, super-gradients==2.0.0, super-gradients==2.0.1, super-gradients==2.1.0, super-gradients==2.2.0, super-gradients==2.5.0, super-gradients==2.6.0, super-gradients==3.0.0, super-gradients==3.0.1, super-gradients==3.0.2, super-gradients==3.0.3, super-gradients==3.0.4, super-gradients==3.0.5, super-gradients==3.0.6, super-gradients==3.0.7, super-gradients==3.0.8, super-gradients==3.0.9, super-gradients==3.1.0 and super-gradients==3.1.1 because these package versions have conflicting dependencies.

The conflict is caused by:
    super-gradients 3.1.1 depends on torch<1.14 and >=1.9.0
    super-gradients 3.1.0 depends on torch<1.14 and >=1.9.0
    super-gradients 3.0.9 depends on torch<1.14 and >=1.9.0
    super-gradients 3.0.8 depends on torch<1.14 and >=1.9.0
    super-gradients 3.0.7 depends on torch<1.14 and >=1.9.0
    super-gradients 3.0.6 depends on torch<=1.12 and >=1.9.0
    super-gradients 3.0.5 depends on torch<=1.12 and >=1.9.0
    super-gradients 3.0.4 depends on torch<=1.12 and >=1.9.0
    super-gradients 3.0.3 depends on onnxruntime
    super-gradients 3.0.2 depends on onnxruntime
    super-gradients 3.0.1 depends on onnxruntime
    super-gradients 3.0.0 depends on onnxruntime
    super-gradients 2.6.0 depends on onnxruntime
    super-gradients 2.5.0 depends on onnxruntime
    super-gradients 2.2.0 depends on onnxruntime
    super-gradients 2.1.0 depends on onnxruntime
    super-gradients 2.0.1 depends on onnxruntime
    super-gradients 2.0.0 depends on onnxruntime
    super-gradients 1.7.5 depends on onnxruntime
    super-gradients 1.7.4 depends on onnxruntime
    super-gradients 1.7.3 depends on onnxruntime
    super-gradients 1.7.2 depends on onnxruntime
    super-gradients 1.7.1 depends on onnxruntime
    super-gradients 1.6.0 depends on onnxruntime
    super-gradients 1.5.2 depends on onnxruntime
    super-gradients 1.5.1 depends on onnxruntime
    super-gradients 1.5.0 depends on onnxruntime
    super-gradients 1.4.0 depends on onnxruntime
    super-gradients 1.3.1 depends on onnxruntime
    super-gradients 1.3.0 depends on onnxruntime

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

AttributeError: module 'signal' has no attribute 'SIGKILL'. Did you mean: 'SIGILL'?

Describe the bug

A clear and concise description of what the bug is.

To Reproduce

Steps to reproduce the behavior:

  1. Train recipe '...'
  2. Change param '...'
  3. See error

Expected behavior

A clear and concise description of what you expected to happen.

Screenshots

If applicable, add screenshots to help explain your problem.

Environment:

  • OS [e.g. uname -s -r -m]
  • Relevant HW info, GPU + CUDA [e.g. nvidia-smi]
  • Super Gradients version
  • Python environment [e.g. pip freeze]

Additional context

Add any other context about the problem here.

Getting started: YOLOX inference and image preprocessing

cc: @shaydeci
I want to start using YOLOX for Object Detection using the pre-trained coco weights.
Looking at the tutorial page I was unclear if the expected input to the model should be RGB or BGR.

I looked at COCODetectionDataset(DetectionDataset), which led me to the .get_resized_image() method. This method uses cv2.imread, which is BGR. So, is it correct to assume that the YOLOX pretrained model also had this kind of preprocessing (i.e., the same as in super_gradients.training.datasets.detection_datasets.detection_dataset.DetectionDataset.get_resized_image)?
While COCODetectionDataset seems to be using cv2 (-->BGR), the example in the tutorial above is using PIL.Image.open() which returns RGB.

Currently, I'm using super_gradients.training.transforms.transforms.rescale_and_pad_to_size to preprocess my images, but they are read by opencv (BGR), so I'm wondering if I need to skip the .swap() phase?

Thanks

YOLOv5 Tutorial Assistance

@shaydeci @oferbaratz @ofrimasad hi I'm working on our new Deci.ai + YOLOv5 partnership tutorial and need some help. The tutorial is at https://github.com/ultralytics/yolov5/wiki/YOLOv5-Deci-AI-Tutorial and is based on a Word document provided by Rachel that I've converted to Markdown.

I'd like you guys at Deci to review the content and help supply the 6 additional images (denoted by IMAGE_HERE). To streamline this I've pasted the markdown content directly here in this issue. If you simply edit this issue with the appropriate changes I can then transfer those over to the YOLOv5 repo.

For the images I've provided one example hyperlinked image myself. The guidelines are that they should be 1920 pixel wide JPG screenshots at <500kB each.

Thanks for the help and let me know if you have any questions!

Markdown content below

📚 This guide explains how to streamline the process of compiling and quantizing YOLOv5 🚀 to achieve better performance with the Deci platform. UPDATED 6 August 2022.

Content

  • About the Deci Platform
  • First-time setup
  • Runtime optimization and benchmarking of your model

About Deci Platform

The Deci platform includes free tools for easily managing, optimizing, and deploying models in any production environment. Deci supports all popular DL frameworks, such as TensorFlow, PyTorch, Keras and ONNX. All you need is our web-based platform or our Python client to run it from your code.

With Deci you can:

  • Improve Inference performance by up to 10X
    Automatically compile and quantize your models and evaluate different production settings to achieve better latency and throughput, and reduce model size and memory footprint on your hardware.

  • Find the best inference hardware for your application
    Benchmark your model's performance on various hardware (including edge) devices with a click of a button. Eliminate the need to manually setup and test various hardware and production settings.

  • Deploy with a Few Lines of Code
    Leverage Deci's python-based inference engine. Compatible with multiple frameworks and hardware types.

For more information about the Deci platform please visit Deci's website.

First-time setup

Step 1:

Go to https://console.deci.ai/sign-up and open your free account.

Deci AI signup page

Step 2:

In order to start optimizing your pre-trained YOLOv5 model, you will need to convert it into an ONNX format. Please follow these simple instructions on this link to convert your model to ONNX format.

Step 3:

Go to "Lab" tab and click the "New Model" button in the top right part of the screen to upload your model in the ONNX format.

Deci AI Lab page

Follow the steps of the model upload wizard to select your target hardware as well as desired batch size and quantization level for the model compilation.

Deci AI Lab page

After filling in the relevant information, click "Start". The Deci platform will automatically perform a runtime optimization of your YOLOv5 model for the hardware you selected as well as benchmark your model on various hardware types. This process takes approximately 10 minutes.

Once done, a new row will appear on your screen underneath the baseline model you previously uploaded. Here you can see the optimized version of your pre-trained YOLOv5 model.

Deci AI Lab page

What's next?

  1. You can then download your optimized model by clicking on "Deploy" button

Deci AI Lab page

You will then be prompted to download your model and receive the instructions on how to install and use Infery - Deci's runtime inference engine.

The use of Infery is optional. You can get the python raw files and use them with any other inference engine of your choice.

Deci AI Lab page

  1. Explore the optimization and benchmark results on the "Insights" tab.

Deci AI Lab page

Advanced training recipes for ddrnet

Thanks for your great work. Do you have plans to apply more advanced training recipes to ddrnet_23_slim and ddrnet_23? The PaddleSeg version of ddrnet_23 has achieved 79.85% mIoU.

Add gradient clipping

Is your feature request related to a problem? Please describe.

Some training recipes require gradient clipping, especially when transfer learning from pretrained models. Such an option should be added to the training params.

Describe the solution you'd like

add clip_grad_norm to training params

How to train object detection model or classification model on custom dataset?

Hi,
I have image data annotated for object detection as well as for a classification task. How can I train and build an object detection model and a classification model on my own dataset?

A formal guideline would help me understand the process, thanks. Basically, I want to know how we can load our custom data into the dataloader in SuperGradients.

STDC-seg fails to train

I am attempting to train STDC-seg model using super-gradients,

  • My dataset is in coco2017 format

Here is my train.py code
-----train.py----

from super_gradients.training.datasets.dataset_interfaces.dataset_interface import CoCoSegmentationDatasetInterface
from super_gradients.training.sg_model import SgModel
from super_gradients.training.metrics import BinaryIOU
from super_gradients.training.transforms.transforms import (ResizeSeg, RandomFlip, RandomRescale, CropImageAndMask,
                                                            PadShortToCropSize, ColorJitterSeg)
from super_gradients.training.utils.callbacks import BinarySegmentationVisualizationCallback, Phase
from torchvision import transforms

# DEFINE DATA TRANSFORMATIONS

dataset_params = {"dataset_dir": "/home/syed/work/vision_datasets/11apr22",
                  "batch_size": 8,
                  "val_batch_size": 8,
                  "num_classes": 2
                  }

dataset_interface = CoCoSegmentationDatasetInterface(dataset_params,
                                                     cache_labels=False, cache_images=False,
                                                     dataset_classes_inclusion_tuples_list=[(0, 'background'), (1, 'drivable-area"')])

model = SgModel("stdc2_seg50_scratch_50_epochs")

# CONNECTING THE DATASET INTERFACE WILL SET SGMODEL'S CLASSES ATTRIBUTE ACCORDING TO SUPERVISELY
# model.connect_dataset_interface(dataset_interface)

# THIS IS WHERE THE MAGIC HAPPENS - SINCE SGMODEL'S CLASSES ATTRIBUTE WAS SET TO BE DIFFERENT FROM CITYSCAPES'S, AFTER
# LOADING THE PRETRAINED REGSET, IT WILL CALL ITS REPLACE_HEAD METHOD AND CHANGE ITS SEGMENTATION HEAD LAYER ACCORDING
# TO OUR BINARY SEGMENTATION DATASET

model.build_model(architecture="stdc2_seg50", arch_params={"num_classes": 1})
# model.build_model("stdc2_seg50")

model.connect_dataset_interface(dataset_interface)

# DEFINE TRAINING PARAMS. SEE DOCS FOR THE FULL LIST.

train_params = {"max_epochs": 50,
                "lr_mode": "cosine",
                "initial_lr": 0.0064,  # for batch_size=16
                "optimizer_params": {"momentum": 0.843,
                                     "weight_decay": 0.00036,
                                     "nesterov": True},
                "criterion_params": {"num_classes": 1},
                "cosine_final_lr_ratio": 0.1,
                "multiply_head_lr": 10,
                "optimizer": "SGD",
                "loss": "stdc_loss",
                "ema": True,
                "zero_weight_decay_on_bias_and_bn": True,
                "average_best_models": True,
                "mixed_precision": False,
                "metric_to_watch": "mean_IOU",
                "greater_metric_to_watch_is_better": True,
                "train_metrics_list": [BinaryIOU()],
                "valid_metrics_list": [BinaryIOU()],
                "loss_logging_items_names": ["loss"],
                "phase_callbacks": [BinarySegmentationVisualizationCallback(phase=Phase.VALIDATION_BATCH_END,
                                                                            freq=1,
                                                                            last_img_idx_in_batch=4)],
                }

model.train(train_params)

The training stops with the following error message.

(super-gradients) gridai@session:~/work/super-gradients → python train.py
You did not mention an AWS environment.You can set the environment variable ENVIRONMENT_NAME with one of the values: development,staging,production
/home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/_distutils_hack/init.py:30: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
callbacks -WARNING- Failed to import deci_lab_client
loading annotations into memory...
Done (t=0.12s)
creating index...
index created!
loading annotations into memory...
Done (t=0.04s)
creating index...
index created!
/home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/utils/data/dataloader.py:487: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
/home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/deprecate/deprecation.py:115: FutureWarning: The IoU was deprecated since v0.7 in favor of torchmetrics.classification.jaccard.JaccardIndex. It will be removed in v0.8.
stream(template_mgs % msg_args)
sg_model -INFO- Using EMA with params {}
"events.out.tfevents.1649674474.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-ftddj.6391.0" will not be deleted
"events.out.tfevents.1649674209.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-ftddj.6318.0" will not be deleted
"events.out.tfevents.1649761776.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-7xfbb.1654.0" will not be deleted
"events.out.tfevents.1649761893.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-7xfbb.1800.0" will not be deleted
"events.out.tfevents.1649761085.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-7xfbb.1571.0" will not be deleted
"events.out.tfevents.1649676959.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-ftddj.6464.0" will not be deleted
"events.out.tfevents.1649761829.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-7xfbb.1727.0" will not be deleted
"events.out.tfevents.1649674011.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-ftddj.6245.0" will not be deleted
sg_model -INFO- Started training for 50 epochs (0/49)

Train epoch 0: 0%| | 0/210 [00:00<?, ?it/s]../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [464,0,0], thread: [24,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
[... the same ScatterGatherKernel.cu:276 "index out of bounds" assertion is repeated for many more block/thread indices; omitted for brevity ...]
Train epoch 0: 0%| | 0/210 [00:02<?, ?it/s]
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered
Exception raised from createEvent at ../aten/src/ATen/cuda/CUDAEvent.h:174 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f29ab3097d2 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: + 0x10cf22a (0x7f29ac8eb22a in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)
frame #2: + 0x2fff28 (0x7f29fe644f28 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #3: c10::TensorImpl::release_resources() + 0x175 (0x7f29ab2f2005 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #4: + 0x1ede49 (0x7f29fe532e49 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x4da268 (0x7f29fe81f268 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #6: THPVariable_subclass_dealloc(_object*) + 0x292 (0x7f29fe81f562 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libtorch_python.so)

frame #27: __libc_start_main + 0xf3 (0x7f2a010290b3 in /usr/lib/x86_64-linux-gnu/libc.so.6)

Aborted
(super-gradients) gridai@session:~/work/super-gradients →

Here are the system details

(super-gradients) gridai@session:~/work/super-gradients → uname -srm
Linux 5.4.129-63.229.amzn2.x86_64 x86_64
(super-gradients) gridai@session:~/work/super-gradients → nvidia-smi
Tue Apr 12 11:30:44 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 29C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
(super-gradients) gridai@session:~/work/super-gradients → nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

Also, would you please provide example code for training the different STDC variants on the coco128 dataset, along with some documentation on the important parameters to tweak?

Thanks
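
This particular ScatterGatherKernel assertion typically fires when a target index lies outside the valid class range, e.g. a segmentation mask value that is negative or >= num_classes (other than the configured ignore index). Below is a minimal sanity-check sketch, assuming a dataloader that yields (image, mask) pairs; the num_classes and ignore_index values are placeholders to adjust to your recipe.

import torch

# Hedged sketch: verify that mask labels fall inside [0, num_classes) before training.
# num_classes and ignore_index are assumptions -- set them to match your dataset/recipe.
num_classes = 21
ignore_index = 255

def check_mask_labels(loader, num_classes, ignore_index):
    for _, mask in loader:
        labels = torch.unique(mask)
        labels = labels[labels != ignore_index]
        bad = labels[(labels < 0) | (labels >= num_classes)]
        if bad.numel() > 0:
            raise ValueError(f"Mask contains out-of-range class ids: {bad.tolist()}")

# check_mask_labels(train_loader, num_classes, ignore_index)  # train_loader: your dataloader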

Add a `conda` install option for `super-gradients`

It would be helpful to have super-gradients added to the conda-forge channel. I have already started the work in the following PR.

But there seems to be a problem with one of its dependencies:

  • deci-lab-client PyPI

    It has neither a source distribution (*.tar.gz) on PyPI nor a release on a public GitHub repository.

Please provide the source distribution for deci-lab-client, preferably on PyPI.

🔥 CONSTRAINT: To add any package to the conda-forge channel, ALL of its dependencies must be on conda-forge as well.

Support for NLP models?

Hi Deci team!

Thanks for the great open-source code here! A quick question about NLP support.

Is your feature request related to a problem? Please describe.

I was wondering how extensible this library is to NLP tasks. In the blog post, BERT was mentioned, but it seems like all the models right now are for CV tasks.

Describe the solution you'd like

It would be great to hear what I would need to set up to get an NLP task running with super-gradients.

Describe alternatives you've considered

I've looked into the SgModule class and how the classification models are defined, but I didn't want to dive too deep before consulting your team, in case this is already supported.

Additional context

n/a
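
Not something the library documents, but since SgModule is in essence a torch nn.Module with a few extra training hooks, a speculative sketch of wrapping a small text model might look like the following. The import path and the assumption that a plain forward() is enough for the trainer are unverified; text tokenization and dataloading are out of scope here.

import torch
import torch.nn as nn

# Assumed import path -- verify against the installed super-gradients version.
from super_gradients.training.models.sg_module import SgModule


class TinyTextClassifier(SgModule):
    """Speculative sketch of a non-CV model wrapped as an SgModule."""

    def __init__(self, vocab_size: int, embed_dim: int, num_classes: int):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer token ids
        return self.head(self.embedding(token_ids))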

How to properly load a pretrained model and predict with it

Describe the bug

I have a COCO pretrained model; how do I load it properly and run prediction? Below is my code. I get an error if I don't use "model.set_dataset_processing_params".

from super_gradients.common.object_names import Models
from super_gradients.training import models

model = models.get(Models.YOLO_NAS_L,
                   checkpoint_path="./yolo_nas_l_coco.pth",
                   num_classes=80)

url = "https://previews.123rf.com/images/freeograph/freeograph2011/freeograph201100150/158301822-group-of-friends-gathering-around-table-at-home.jpg"
model.set_dataset_processing_params(
    class_names=[str(i) for i in range(80)],
    image_processor="NormalizeImage",
    mean=1.0,
    std=1.0,
)
model.predict(url, conf=0.25).save("output")

I am getting the following error:

File "/-/super_gradients/common/decorators/factory_decorator.py", line 27, in wrapper
    kwargs[param_name] = factory.get(kwargs[param_name])
  File "/-/super_gradients/common/factories/processing_factory.py", line 15, in get
    return super().get(conf)
  File "/-/super_gradients/common/factories/base_factory.py", line 47, in get
    return self.type_dict[conf]()
TypeError: __init__() missing 2 required positional arguments: 'mean' and 'std'

To Reproduce

Steps to reproduce the behavior:

  1. Train recipe '...'
  2. Change param '...'
  3. See error

Expected behavior

It should run prediction and save the output.

Environment:

  • OS - Ubuntu
  • Relevant HW - GPU + CUDA
  • SuperGradients version - 3.1.0
  • Python - 3.9
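
For context, the traceback above shows the processing factory receiving the bare string "NormalizeImage" and trying to instantiate it without its mean/std arguments. A simpler route, assuming the standard COCO weights are acceptable, is to request the pretrained weights directly so the bundled processing params are attached and set_dataset_processing_params is not needed:

from super_gradients.common.object_names import Models
from super_gradients.training import models

# Download the official COCO weights; class names and preprocessing come bundled,
# so predict() should work without calling set_dataset_processing_params.
model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco")

url = "https://previews.123rf.com/images/freeograph/freeograph2011/freeograph201100150/158301822-group-of-friends-gathering-around-table-at-home.jpg"
model.predict(url, conf=0.25).save("output")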

Error in building wheels

Getting errors while building wheels for collected packages:

> treelib, stringcase, pycocotools, termcolor, future, antlr4-python3-runtime.

I think setup.py needs to be revised.

Tried both `pip install super-gradients` and `pip install git+https://github.com/Deci-AI/super-gradients.git@stable`.


Filtering Classes

Can we filter the classes? Is there an argument we can use for this?

Cityscapes dataset structure

Hi,
I want to replicate some results; however, I'm having trouble loading the Cityscapes dataset, and I can't find documentation on how the dataset should be structured. I saw that list_file and labels_csv_path are needed, but I don't know how to obtain these.
