deci-ai / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.

Home Page: https://www.supergradients.com

License: Apache License 2.0

Shell 0.01% Python 6.91% Jupyter Notebook 93.07% Dockerfile 0.01% Makefile 0.01%

super-gradients's Issues

MaskAttentionLoss in DiceCEEdgeLoss doesn't handle images without any edges

Describe the bug

Training models that use DiceCEEdgeLoss results in NaN loss on images that only contain one semantic class. The edge_target becomes a tensor filled with zeros because there are no edges in the image:

super-gradients/src/super_gradients/training/losses/dice_ce_edge_loss.py

Lines 101 to 103 in aa27454

edge_target = target_to_binary_edge(
    target, num_classes=self.num_classes, kernel_size=self.edge_kernel, ignore_index=self.ignore_index, flatten_channels=True
)

Then, when computing the MaskAttentionLoss, the mask is all zeros, so indexing with mask == 1 selects no elements and mask_loss becomes an empty tensor. Finally, computing the mean of an empty tensor results in NaN.

super-gradients/src/super_gradients/training/losses/mask_loss.py

Lines 45 to 47 in aa27454

mask_loss = mask_loss[mask == 1]  # consider only mask samples for mask loss computing
mask_loss = apply_reduce(mask_loss, self.reduction)

To Reproduce

I've written a new test in tests/unit_tests/mask_loss_test.py that reproduces the problem.

def test_with_cross_entropy_loss_maskless(self):
    """
    Test case with mask filled with zeros, corresponding to a scenario without
    attention. It's expected that the mask doesn't contribute to the loss.

    This scenario may happen when using edge masks on an image without
    edges - there's only one semantic region in the whole image.

    Shapes: predict [BxCxHxW], target [BxHxW], mask [Bx1xHxW]
    """
    predict = torch.randn(self.batch, self.num_classes, self.img_size, self.img_size)
    target = self._get_default_target_tensor()
    # Create a mask filled with zeros to disable the attention component
    mask = self._get_default_mask_tensor() * 0.0

    loss_weights = [1.0, 0.5]
    ce_crit = nn.CrossEntropyLoss(reduction="none")
    mask_ce_crit = MaskAttentionLoss(criterion=ce_crit, loss_weights=loss_weights)

    # expected result - no contribution from mask
    ce_loss = ce_crit(predict, target)
    expected_loss = ce_loss.mean() * loss_weights[0]

    # mask ce loss result
    loss = mask_ce_crit(predict, target, mask)

    self._assertion_torch_values(expected_loss, loss)

Running this test results in:

AssertionError: False is not true : Unequal torch tensors: excepted: 1.7192925214767456, found: nan

Expected behavior

A mask filled with zeros should "disable" attention, so the mask should not contribute to the loss.

Environment:

super-gradients 3.0.7

Additional context

This can be fixed by checking whether mask_loss is NaN and setting it to 0 instead, like this:

mask_loss = mask_loss if not mask_loss.isnan() else mask_loss.new_tensor(0.0)
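A minimal sketch of the same guard as a standalone helper (not the library's actual MaskAttentionLoss code; the name masked_mean is hypothetical). It checks for an empty selection rather than for NaN, so NaNs coming from elsewhere are not silently hidden, and it returns a zero that stays attached to the autograd graph. It assumes the per-element loss and the mask have the same shape.

```python
import torch


def masked_mean(loss: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Mean of `loss` over positions where `mask == 1` (same shapes assumed).

    Hypothetical helper: returns 0 instead of NaN when the mask selects nothing,
    e.g. for an image with a single semantic class and therefore no edges.
    """
    selected = loss[mask == 1]
    if selected.numel() == 0:
        # Multiply by 0 so the result keeps a grad_fn (gradients are all zero)
        return loss.sum() * 0.0
    return selected.mean()


if __name__ == "__main__":
    per_pixel = torch.rand(2, 1, 4, 4, requires_grad=True)
    empty_mask = torch.zeros(2, 1, 4, 4)
    out = masked_mean(per_pixel, empty_mask)
    out.backward()  # works: zero gradients instead of NaN
    print(out.item())  # 0.0
```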

The mAP reported by SG is not consistent with COCO-API

Describe the bug

When training a detection model, e.g. SSD Lite MobileNet, the mAP shown in SG (TensorBoard) is much higher than the mAP returned by COCO-API.

To Reproduce

Steps to reproduce the behavior:

Branch https://github.com/Deci-AI/super-gradients/tree/ssd_mobilenet
Run train_from_recipe.py with coco_ssd_lite_mobilenet_v2.yaml or coco_ssd_mobilenet_v1.yaml
(The model should reach ~30 mAP)

However, running the COCO-API reference evaluation returns only ~0.17 mAP (i.e. ~17 on a 0-100 scale).

Expected behavior

The mAP information should correspond to COCO-API's mAP.

Screenshots

AttributeError: module 'signal' has no attribute 'SIGKILL'. Did you mean: 'SIGILL'?

Describe the bug

A clear and concise description of what the bug is.

To Reproduce

Steps to reproduce the behavior:

Train recipe '...'
Change param '...'
See error

Expected behavior

A clear and concise description of what you expected to happen.

Screenshots

If applicable, add screenshots to help explain your problem.

Environment:

OS [e.g. uname -s -r -m]
Relevant HW info, GPU + CUDA [e.g. nvidia-smi]
Super Gradients version
Python environment [e.g. pip freeze]

Additional context

Add any other context about the problem here.
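The report above is the unfilled template, so the trigger isn't stated. For context: Python's signal module only defines SIGKILL on Unix, so this AttributeError is what one would expect on Windows. A minimal portable guard, assuming the calling code only needs "the strongest termination signal available" (the helper name kill_signal is hypothetical):

```python
import signal


def kill_signal() -> int:
    """Return SIGKILL where it exists (Unix), else fall back to SIGTERM (e.g. Windows)."""
    # getattr avoids the AttributeError raised on platforms without SIGKILL
    return getattr(signal, "SIGKILL", signal.SIGTERM)


print(kill_signal())
```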

How to convert a .pth model into .h5 format, or how to serve .pth weights as an API?

How can we run inference with .pth or .onnx weights? I want to use these weights in Flask, Gradio, and similar apps.

Getting started: YOLOX inference and image preprocessing

cc: @shaydeci
I want to start using YOLOX for object detection with the pretrained COCO weights.
Looking at the tutorial page, it was unclear to me whether the model expects RGB or BGR input.

I looked at COCODetectionDataset(DetectionDataset), which led me to the .get_resized_image() method. This method uses cv2.imread, which returns BGR. So, is it correct to assume that the pretrained YOLOX model was trained with this kind of preprocessing (i.e., the same as in super_gradients.training.datasets.detection_datasets.detection_dataset.DetectionDataset.get_resized_image)?
While COCODetectionDataset seems to be using cv2 (-->BGR), the example in the tutorial above uses PIL.Image.open(), which returns RGB.

Currently, I'm using super_gradients.training.transforms.transforms.rescale_and_pad_to_size to preprocess my images, but they are read by OpenCV (BGR), so I'm wondering whether I need to skip the .swap() step.
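Two different operations are easy to conflate here. Reversing the last axis converts BGR to RGB (or back), while a transpose such as (2, 0, 1), which is what a swap argument means in the original YOLOX preprocessing, reorders axes from HWC to CHW and leaves the color order untouched. Whether rescale_and_pad_to_size's swap behaves the same way is an assumption to verify in its source; this numpy sketch only illustrates the difference:

```python
import numpy as np

# A 2x2 "image" in OpenCV's HWC, BGR layout; each channel holds a distinct value
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 10  # blue
bgr[..., 1] = 20  # green
bgr[..., 2] = 30  # red

# Color conversion: reverse the channel axis (BGR -> RGB); layout stays HWC
rgb = bgr[..., ::-1]

# Axis transpose: HWC -> CHW; the channel order is still BGR
chw_bgr = bgr.transpose((2, 0, 1))

print(rgb[0, 0])         # [30 20 10]
print(chw_bgr.shape)     # (3, 2, 2)
print(chw_bgr[:, 0, 0])  # [10 20 30]
```

Note that bgr[..., ::-1] is a negative-stride view; torch.from_numpy rejects negative strides, so wrap it in np.ascontiguousarray before converting to a tensor.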

Thanks

Add gradient clipping

Is your feature request related to a problem? Please describe.

Some training recipes require gradient clipping, especially when transfer learning from pretrained models. Such an option should be added to the training params.

Describe the solution you'd like

Add clip_grad_norm to the training params.
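For reference, the plain PyTorch pattern behind this request clips after backward() and before optimizer.step(). A minimal sketch with a toy model; the max_norm value is an arbitrary assumption, and how such an option would be exposed as a super-gradients training param is not specified here:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

x = torch.randn(16, 8) * 100.0  # large inputs -> large gradients
y = torch.randint(0, 2, (16,))

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()

# Rescale all gradients so their combined L2 norm is at most max_norm.
# The return value is the total norm measured *before* clipping.
max_norm = 1.0
pre_clip_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)

post_clip_norm = torch.norm(torch.stack([p.grad.norm() for p in model.parameters()]))
optimizer.step()

print(f"before: {pre_clip_norm:.3f}, after: {post_clip_norm:.3f}")
```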

Bug in super-gradients on Linux (Ubuntu) and Google Colab

You'll use yolo_nas_l throughout this notebook. Because you should always go big, or go home.

It's a good life philosophy.

But your fine-tuning notebook does not seem to work: I tried to rerun it, and the second cell shows an error.
Here is a screenshot of the error.

Also, following your code in a Linux environment, I am facing errors while installing super-gradients.

Building wheels for collected packages: pycocotools, stringcase, termcolor, treelib, antlr4-python3-runtime, future
Building wheel for pycocotools (PEP 517) ... error
ERROR: Command errored out with exit status 1:
command: /home/soumyadeep/mmaction_custom/ava_custom_v2/Yolo_train/YOLO-NAS/yolonas_env/bin/python3 /tmp/tmpxprr4m03 build_wheel /tmp/tmp5_axaif9
cwd: /tmp/pip-install-n3xzz9vp/pycocotools
Complete output (67 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-38
creating build/lib.linux-x86_64-cpython-38/pycocotools
copying pycocotools/__init__.py -> build/lib.linux-x86_64-cpython-38/pycocotools
copying pycocotools/mask.py -> build/lib.linux-x86_64-cpython-38/pycocotools
copying pycocotools/cocoeval.py -> build/lib.linux-x86_64-cpython-38/pycocotools
copying pycocotools/coco.py -> build/lib.linux-x86_64-cpython-38/pycocotools
running build_ext
skipping 'pycocotools/_mask.c'

and more.

Is there any solution to this?

Upsample size mismatch in segmentation models

Describe the bug

Depending on the input image size, feature maps upsampled with nn.Upsample don't always match the size of the skip connection. This is a known issue. Some reference links:

Replacing nn.Upsample with torch.nn.functional.interpolate seems to be the recommended solution.
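The idea behind that recommendation, as a minimal sketch: instead of upsampling by a fixed factor (a 65-pixel map downsampled by a stride-2 conv becomes 33, and 33 * 2 = 66 != 65), resize directly to the skip connection's spatial size. The shapes mirror the 65 vs 66 mismatch in the traceback below; the helper name upsample_to is hypothetical:

```python
import torch
import torch.nn.functional as F
from torch import nn

skip = torch.randn(1, 16, 65, 65)  # odd-sized skip-connection feature map

# A stride-2 conv rounds up: floor((65 + 2*1 - 3) / 2) + 1 = 33
down = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)(skip)

# Fixed-factor upsampling overshoots: 33 * 2 = 66 != 65
fixed = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)(down)


def upsample_to(x: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    """Resize x to ref's spatial size (H, W), whatever the input resolution."""
    return F.interpolate(x, size=ref.shape[-2:], mode="bilinear", align_corners=False)


matched = upsample_to(down, skip)
fused = torch.cat([matched, skip], dim=1)  # works: spatial sizes agree

print(down.shape[-2:], fixed.shape[-2:], fused.shape)
```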

To Reproduce

Here's a snippet using PP-LiteSeg. The dataset is Cityscapes, but that's not important; the image size is the important factor. I imagine the issue affects all models that use nn.Upsample and concatenate with skip connections:

from super_gradients.training import models, dataloaders, Trainer
from super_gradients.common.object_names import Models
from super_gradients.training.metrics import IoU


trainer = Trainer(experiment_name="eval-pp-liteseg-b75")
val_loader = dataloaders.cityscapes_stdc_seg75_val(
    dataset_params={
        "transforms": [
            {"SegRescale": {"long_size": 1025}},
        ]
    },
    dataloader_params={"batch_size": 1},
)
model = models.get(
    Models.PP_LITE_B_SEG75,
    pretrained_weights="cityscapes",
)
metric = IoU(num_classes=20, ignore_index=19)
miou = trainer.test(
    model=model,
    test_loader=val_loader,
    test_metrics_list=[metric],
    metrics_progress_verbose=False
)[0].cpu().item()
print(f"mIoU: {miou}")

Results in an error:

  File ".../src/super_gradients/training/models/segmentation_models/ppliteseg.py", line 52, in forward
    atten = torch.cat([*self._avg_max_spatial_reduce(x, use_concat=False), *self._avg_max_spatial_reduce(skip, use_concat=False)], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 66 but got size 65 for tensor number 2 in the list.

Expected behavior

Fully convolutional segmentation models should work for all input image sizes.

Environment:

Ubuntu
super-gradients v3.0.7
PyTorch 1.11

YOLOv5 Tutorial Assistance

@shaydeci @oferbaratz @ofrimasad hi, I'm working on our new Deci.ai + YOLOv5 partnership tutorial and need some help. The tutorial is at https://github.com/ultralytics/yolov5/wiki/YOLOv5-Deci-AI-Tutorial and is based on a Word document provided by Rachel that I've converted to Markdown.

I'd like you guys at Deci to review the content and help supply the 6 additional images (denoted by IMAGE_HERE). To streamline this I've pasted the markdown content directly here in this issue. If you simply edit this issue with the appropriate changes I can then transfer those over to the YOLOv5 repo.

For the images I've provided one example hyperlinked image myself. The guidelines are that they should be 1920 pixel wide JPG screenshots at <500kB each.

Thanks for the help and let me know if you have any questions!

Markdown content below

📚 This guide explains how to streamline the process of compiling and quantizing YOLOv5 🚀 to achieve better performance with the Deci platform. UPDATED 6 August 2022.

Content

About the Deci Platform
First-time setup
Runtime optimization and benchmarking of your model

About the Deci Platform

The Deci platform includes free tools for easily managing, optimizing, and deploying models in any production environment. Deci supports all popular DL frameworks, such as TensorFlow, PyTorch, Keras and ONNX. All you need is our web-based platform or our Python client to run it from your code.

With Deci you can:

Improve inference performance by up to 10X
Automatically compile and quantize your models and evaluate different production settings to achieve better latency and throughput, and to reduce model size and memory footprint on your hardware.
Find the best inference hardware for your application
Benchmark your model's performance on various hardware devices (including edge devices) with the click of a button. Eliminate the need to manually set up and test various hardware and production settings.
Deploy with a few lines of code
Leverage Deci's Python-based inference engine, compatible with multiple frameworks and hardware types.

For more information about the Deci platform please visit Deci's website.

First-time setup

Step 1:

Go to https://console.deci.ai/sign-up and open your free account.

Step 2:

To start optimizing your pretrained YOLOv5 model, you will need to convert it to ONNX format. Please follow the instructions at this link to convert your model to ONNX.

Step 3:

Go to the "Lab" tab and click the "New Model" button in the top right of the screen to upload your model in ONNX format.

Follow the steps of the model upload wizard to select your target hardware as well as desired batch size and quantization level for the model compilation.

After filling in the relevant information, click "Start". The Deci platform will automatically perform a runtime optimization of your YOLOv5 model for the hardware you selected as well as benchmark your model on various hardware types. This process takes approximately 10 minutes.

Once done, a new row will appear on your screen underneath the baseline model you previously uploaded. Here you can see the optimized version of your pre-trained YOLOv5 model.

What's next?

You can then download your optimized model by clicking the "Deploy" button.

You will then be prompted to download your model and will receive instructions on how to install and use Infery, Deci's runtime inference engine.

The use of Infery is optional. You can get the raw Python files and use them with any other inference engine of your choice.

Explore the optimization and benchmark results on the "Insights" tab.

Integrating custom models into backbone

Hello there!

I loved your work, keep it up!

I'd like to integrate a custom attention model as the backbone for an object detection task and test it out. Is there any documentation or tutorial I can follow? Any help would be appreciated!

How to train an object detection or classification model on a custom dataset?

Hi,
I have image data annotated for both object detection and classification. How can I train object detection and classification models on my own dataset?

A formal guideline would help me understand the process, thanks. Basically, I want to know how to load my custom data into a DataLoader in SuperGradients.
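One common route, sketched under assumptions: in the super-gradients 3.x examples, Trainer.train(...) receives standard PyTorch DataLoaders via train_loader= and valid_loader= (check this against your installed version), so a custom classification dataset can be any torch.utils.data.Dataset that returns (image_tensor, label). The in-memory tensors stand in for real image files, and InMemoryClassificationDataset is a hypothetical name. Detection datasets additionally need targets in the format the chosen model's loss expects, which this sketch does not cover.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class InMemoryClassificationDataset(Dataset):
    """Hypothetical custom dataset of (image, label) pairs held in memory.

    In practice __getitem__ would read an image file from disk and apply transforms.
    """

    def __init__(self, images: torch.Tensor, labels: torch.Tensor):
        assert len(images) == len(labels), "one label per image"
        self.images = images
        self.labels = labels

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, idx: int):
        # Return a CHW float image and an integer class index
        return self.images[idx], int(self.labels[idx])


images = torch.rand(10, 3, 32, 32)   # 10 fake RGB images, 32x32
labels = torch.randint(0, 4, (10,))  # 4 fake classes

train_loader = DataLoader(InMemoryClassificationDataset(images, labels), batch_size=4, shuffle=True)

batch_images, batch_labels = next(iter(train_loader))
print(batch_images.shape, batch_labels.shape)
```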

Select which GPU to train on

Is your feature request related to a problem? Please describe.

I am trying to train YOLO-NAS in an environment with multiple GPUs (device=0 and 1), but I only want to use one. Is there a way to train the model specifically on the second GPU (device=1)?

Describe the solution you'd like

Train the model using only the second GPU.

Describe alternatives you've considered

Nothing really.
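A generic CUDA-level approach, independent of any super-gradients option: restrict which GPUs the process can see with the CUDA_VISIBLE_DEVICES environment variable before CUDA is initialized (safest: before importing torch). The second physical GPU then appears to the process as cuda:0. The shell equivalent is to prefix the command that launches your training script with CUDA_VISIBLE_DEVICES=1. Minimal sketch:

```python
import os

# Must run before torch (or anything else) initializes CUDA in this process.
# Physical GPU 1 will then be exposed to the process as device index 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch  # noqa: E402  (imported after setting the variable on purpose)

if torch.cuda.is_available():
    print(torch.cuda.device_count(), torch.cuda.get_device_name(0))
else:
    print("CUDA not available in this environment")
```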

How to initialize and use YOLO-NAS quantized models as listed in docs?

For the YOLO-NAS FP16 model, is it simply:

yolo_nas_s = super_gradients.training.models.get("yolo_nas_s", pretrained_weights="coco").to(torch.half)

What's the procedure to load/initialize the YOLO-NAS INT8 quantized model?

Do we need to perform any image preprocessing to use them?

Is it possible to train from multiple data directories?

Hello, I am curious about the new YOLO-NAS model, and I am new to SuperGradients. Is it possible to train a model on a custom dataset prepared in YOLOv5/v8 format, specifically one split across multiple directories?
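super-gradients' own YOLO-format dataset options aren't covered here. As a generic sketch under assumptions: if each directory follows the usual YOLOv5/v8 layout (images/ and labels/ subfolders, one .txt label file per image with the same file stem), the image/label pairs from several roots can be collected into one list that a single dataset then reads. The helper collect_yolo_pairs and the exact layout are assumptions:

```python
from pathlib import Path
from typing import Iterable, List, Tuple

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def collect_yolo_pairs(roots: Iterable[str]) -> List[Tuple[Path, Path]]:
    """Collect (image, label) path pairs from several YOLO-style dataset roots.

    Hypothetical helper. Assumes root/images/<name>.<ext> is paired with
    root/labels/<name>.txt; images without a label file are skipped.
    """
    pairs = []
    for root in map(Path, roots):
        for image_path in sorted((root / "images").iterdir()):
            if image_path.suffix.lower() not in IMAGE_SUFFIXES:
                continue
            label_path = root / "labels" / f"{image_path.stem}.txt"
            if label_path.exists():
                pairs.append((image_path, label_path))
    return pairs
```

A dataset class can then index into this combined list; alternatively, one dataset per directory can be merged with torch.utils.data.ConcatDataset.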

Is there support for 2-channel and 4-channel training?

In YOLOv8 we can train on 2-channel or 4-channel images by adding a ch: 2 or ch: 4 parameter to the yolov8s.yaml file. Does YOLO-NAS have similar support, and if so, where should we change it?
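Whether YOLO-NAS exposes an input-channel parameter isn't stated here. A generic PyTorch technique, sketched under assumptions: replace the network's first nn.Conv2d with one that accepts the desired number of channels, copying the pretrained filters that still apply and initializing any extra channel with their mean. Where the stem conv lives inside YOLO-NAS is not shown; adapt_input_channels is a hypothetical helper operating on a single conv:

```python
import torch
from torch import nn


def adapt_input_channels(conv: nn.Conv2d, in_channels: int) -> nn.Conv2d:
    """Return a copy of `conv` that accepts `in_channels` input channels.

    The first min(in_channels, conv.in_channels) input filters are copied from
    `conv`; any extra input channels are initialized with the mean of the
    original filters. Only ungrouped convs (groups == 1) are handled.
    """
    assert conv.groups == 1, "this sketch only handles groups == 1"
    new_conv = nn.Conv2d(
        in_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        dilation=conv.dilation,
        bias=conv.bias is not None,
        padding_mode=conv.padding_mode,
    )
    with torch.no_grad():
        old = conv.weight  # shape: [out_channels, old_in_channels, kH, kW]
        n_keep = min(in_channels, old.shape[1])
        new_conv.weight[:, :n_keep] = old[:, :n_keep]
        if in_channels > n_keep:
            mean_filter = old.mean(dim=1, keepdim=True)  # [out, 1, kH, kW]
            new_conv.weight[:, n_keep:] = mean_filter.expand(-1, in_channels - n_keep, -1, -1)
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv


torch.manual_seed(0)
stem = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)  # stand-in for a pretrained RGB stem
stem4 = adapt_input_channels(stem, 4)  # e.g. RGB + one extra channel
stem2 = adapt_input_channels(stem, 2)  # e.g. a 2-channel input

print(stem4(torch.randn(1, 4, 64, 64)).shape)
print(stem2.weight.shape)
```

The data pipeline (image loading, normalization, augmentations) would also need to handle non-3-channel images; that part is not covered by this sketch.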

How can I see the prediction result of a frame/image as bbox coordinates?

I want to combine YOLO-NAS with DeepSORT for object tracking, so I need to get the bbox of each object.
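Once the boxes are available, converting them for a tracker is plain arithmetic. This sketch assumes boxes come as [x1, y1, x2, y2] corner coordinates with per-box confidences and class ids (extracting them from a YOLO-NAS prediction object is not covered here). The output tuple ([left, top, width, height], confidence, class_id) follows the detection format described by the deep_sort_realtime package for update_tracks; check it against the tracker you actually use. xyxy_to_deepsort is a hypothetical helper:

```python
from typing import List, Sequence, Tuple

Detection = Tuple[List[float], float, int]


def xyxy_to_deepsort(
    boxes_xyxy: Sequence[Sequence[float]],
    confidences: Sequence[float],
    class_ids: Sequence[int],
    min_conf: float = 0.0,
) -> List[Detection]:
    """Convert corner boxes [x1, y1, x2, y2] to ([left, top, w, h], conf, class)."""
    detections = []
    for (x1, y1, x2, y2), conf, cls in zip(boxes_xyxy, confidences, class_ids):
        if conf < min_conf:
            continue  # drop low-confidence boxes before tracking
        detections.append(([float(x1), float(y1), float(x2 - x1), float(y2 - y1)], float(conf), int(cls)))
    return detections


dets = xyxy_to_deepsort([[10, 20, 50, 80], [0, 0, 5, 5]], [0.9, 0.1], [0, 2], min_conf=0.25)
print(dets)  # [([10.0, 20.0, 40.0, 60.0], 0.9, 0)]
```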

OpenCV integration

Is it possible to use a model trained with YOLO-NAS in OpenCV, for example via the function cv::dnn::readNetFromONNX()?

I have a pretrained model; how do I load it properly and run prediction?

Describe the bug

I have a COCO-pretrained model; how do I load it properly and run prediction? Below is my code. I get an error if I don't call model.set_dataset_processing_params.

from super_gradients.common.object_names import Models
from super_gradients.training import models

model = models.get(Models.YOLO_NAS_L,
                   checkpoint_path="./yolo_nas_l_coco.pth",
                   num_classes=80)

url = "https://previews.123rf.com/images/freeograph/freeograph2011/freeograph201100150/158301822-group-of-friends-gathering-around-table-at-home.jpg"
model.set_dataset_processing_params(
    class_names=[str(i) for i in range(80)],  # "0", "1", ..., "79"
    image_processor="NormalizeImage",
    mean=1.0,
    std=1.0,
)
model.predict(url, conf=0.25).save("output")

I am getting following error

File "/-/super_gradients/common/decorators/factory_decorator.py", line 27, in wrapper
    kwargs[param_name] = factory.get(kwargs[param_name])
  File "/-/super_gradients/common/factories/processing_factory.py", line 15, in get
    return super().get(conf)
  File "/-/super_gradients/common/factories/base_factory.py", line 47, in get
    return self.type_dict[conf]()
TypeError: __init__() missing 2 required positional arguments: 'mean' and 'std'
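The last two frames suggest why the extra keyword arguments don't help: when image_processor is a plain string, the factory instantiates the class with no arguments (self.type_dict[conf]()), so mean and std never reach NormalizeImage. Below is a simplified, hypothetical re-creation of that dispatch (not the library's code) that illustrates the failure and the usual alternative of passing a one-key mapping {class_name: kwargs}. Whether super-gradients' factory accepts that mapping form for image_processor is an assumption to verify in base_factory.py, and the mean/std values are illustrative only, not YOLO-NAS's real preprocessing:

```python
from collections.abc import Mapping


class NormalizeImage:
    """Stand-in for a processor whose constructor requires mean and std."""

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std


TYPE_DICT = {"NormalizeImage": NormalizeImage}


def factory_get(conf):
    """Simplified dispatch mirroring the traceback: str -> no-arg call, mapping -> kwargs."""
    if isinstance(conf, str):
        return TYPE_DICT[conf]()  # TypeError if the constructor needs arguments
    if isinstance(conf, Mapping) and len(conf) == 1:
        name, params = next(iter(conf.items()))
        return TYPE_DICT[name](**params)
    return conf  # already an instance


try:
    factory_get("NormalizeImage")
except TypeError as err:
    print("string form fails:", err)

# Illustrative values only
proc = factory_get({"NormalizeImage": {"mean": [0.0, 0.0, 0.0], "std": [255.0, 255.0, 255.0]}})
print(type(proc).__name__, proc.std)
```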

To Reproduce

Steps to reproduce the behavior:

Train recipe '...'
Change param '...'
See error

Expected behavior

It should run prediction and save the output.

Environment:

OS - ubuntu
Relevant HW info, GPU + CUDA [e.g. nvidia-smi] - GPU + CUDA
Super Gradients version 3.1.0
Python environment [e.g. pip freeze] 3.9

Support for NLP models?

Hi Deci team!

Thanks for the great open-source code here! A quick question about NLP support.

Is your feature request related to a problem? Please describe.

I was wondering how extensible this library is to NLP tasks. BERT was mentioned in the blog post, but it seems that all the current models are for CV tasks.

Describe the solution you'd like

It would be great to hear what I might need to set up to get an NLP task running with super-gradients.

Describe alternatives you've considered

I've looked into the SgModule class and how the classification models are defined. But I didn't want to dive too deep before consulting your team first in case this was already supported.

Additional context

n/a

PP-LiteSeg Attention Refinement Module (ARM) missing

Hi,

Isn't there supposed to be an Attention Refinement Module as part of the STDC encoder? I don't see it in the STDC backbone

Error in building wheels

I'm getting errors while building wheels for the following packages:

> treelib, stringcase, pycocotools, termcolor, future, antlr4-python3-runtime.

I think setup.py needs to be revised.

I tried both pip install super-gradients and pip install git+https://github.com/Deci-AI/super-gradients.git@stable.

There seems to be a problem with the MobileNetV3 structure

Hello, the network structure of MobileNetV3 seems to differ from the original author's. The SE module in V3 is not the module used in SENet: the author changed nn.Linear to nn.Conv2d.
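For what it's worth, on the pooled 1x1 feature map a 1x1 nn.Conv2d computes exactly what an nn.Linear with the same weights computes, so that particular swap is an implementation choice rather than a change in the math. The sketch below checks this numerically. The block layout (reduce to a quarter of the channels, ReLU, expand, hard-sigmoid gate) follows the SE design described in the MobileNetV3 paper; the class name SqueezeExcite and its exact structure are assumptions, not super-gradients' code:

```python
import torch
from torch import nn


class SqueezeExcite(nn.Module):
    """SE block from 1x1 convs: pool -> reduce -> ReLU -> expand -> hard-sigmoid gate."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = channels // reduction
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Conv2d(channels, hidden, kernel_size=1)
        self.fc2 = nn.Conv2d(hidden, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)
        self.gate = nn.Hardsigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = self.gate(self.fc2(self.act(self.fc1(self.pool(x)))))
        return x * scale


torch.manual_seed(0)
se = SqueezeExcite(16)
x = torch.randn(2, 16, 8, 8)

# The same weights in nn.Linear form, applied to the flattened pooled vector
fc1 = nn.Linear(16, 4)
fc2 = nn.Linear(4, 16)
with torch.no_grad():
    fc1.weight.copy_(se.fc1.weight.flatten(1))
    fc1.bias.copy_(se.fc1.bias)
    fc2.weight.copy_(se.fc2.weight.flatten(1))
    fc2.bias.copy_(se.fc2.bias)

pooled = x.mean(dim=(2, 3))
scale_linear = nn.functional.hardsigmoid(fc2(torch.relu(fc1(pooled))))
out_linear = x * scale_linear[:, :, None, None]

print(torch.allclose(se(x), out_linear, atol=1e-5))
```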

Python TFLite or ONNX inference script

Is your feature request related to a problem?

No. I am able to convert .pth models to TFLite; I need a starter script that can be used for TFLite inference.

Describe the solution you'd like

A standalone Python inference script.

Describe alternatives you've considered

An ONNX inference script would also be helpful, if you have one.

How to accelerate RegSeg with TensorRT

I tried to use TensorRT with the original RegSeg repository.
However, ONNX had trouble with torch.split, and torch2trt could not be used with my specific TensorRT version.
Please let me know how you used TensorRT with the RegSeg model when measuring latency.

PIP package versions have conflicting dependencies.

Hey,

I get some dependency conflicts during pip install.

My requirements.txt:

aiohttp
sqlalchemy
pandas
aiosqlite
matplotlib
super_gradients

Support for 3D images

Is your feature request related to a problem? Please describe.

Medical image segmentation often relies on 3D scans (MRI, CT), but there are very few pretrained models for the various medical imaging tasks (classification, segmentation).

Describe the solution you'd like

It would be great if some state-of-the-art models were available in super-gradients, and even better if they were pretrained. For example, this model: http://arxiv.org/abs/2208.09567, or the synthesized 100k brains could be used to train a 3D classification model like this one: http://arxiv.org/abs/2209.07162, https://arxiv.org/pdf/2303.08216.pdf

Kernel Dead Issue

Hi @BloodAxe @ofrimasad @jonathan-sha @spsancti @yurkovak ,
Thanks for the cool work. I was trying the quick notebook from the 5-minute video, but the kernel kept dying, so I couldn't run inference. Any idea why this is happening?

Cityscapes data structure

Hi,
I want to replicate some results; however, I have issues loading the Cityscapes dataset, and I can't find anywhere what the dataset structure should be. I saw that list_file and labels_csv_path are needed, but I don't know how to obtain them.

Advanced training recipes for ddrnet

Thanks for your great work. Do you plan to apply more advanced training recipes to ddrnet_23_slim and ddrnet_23? The PaddleSeg version of ddrnet_23 has achieved 79.85% mIoU.

About YOLONAS License

Hello,

As far as I understand, the YOLO-NAS license prevents commercial use of any fork of the repo with custom architectural changes. It also prevents commercial use of any custom fine-tuned model. Am I right about that?

https://github.com/Deci-AI/super-gradients/blob/1c67ce369b53daf162a91faacb24f3501e7b84ad/LICENSE.YOLONAS.md

STDC-seg fails to train

I am attempting to train an STDC-seg model using super-gradients.

My dataset is in COCO 2017 format.

Here is my train.py code:
# ----- train.py -----
from super_gradients.training.datasets.dataset_interfaces.dataset_interface import CoCoSegmentationDatasetInterface
from super_gradients.training.sg_model import SgModel
from super_gradients.training.metrics import BinaryIOU
from super_gradients.training.transforms.transforms import (
    ResizeSeg,
    RandomFlip,
    RandomRescale,
    CropImageAndMask,
    PadShortToCropSize,
    ColorJitterSeg,
)
from super_gradients.training.utils.callbacks import BinarySegmentationVisualizationCallback, Phase
from torchvision import transforms

# DEFINE DATA TRANSFORMATIONS

dataset_params = {
    "dataset_dir": "/home/syed/work/vision_datasets/11apr22",
    "batch_size": 8,
    "val_batch_size": 8,
    "num_classes": 2,
}

dataset_interface = CoCoSegmentationDatasetInterface(
    dataset_params,
    cache_labels=False,
    cache_images=False,
    dataset_classes_inclusion_tuples_list=[(0, "background"), (1, "drivable-area")],
)

model = SgModel("stdc2_seg50_scratch_50_epochs")

# CONNECTING THE DATASET INTERFACE WILL SET SGMODEL'S CLASSES ATTRIBUTE ACCORDING TO SUPERVISELY
# model.connect_dataset_interface(dataset_interface)

# THIS IS WHERE THE MAGIC HAPPENS - SINCE SGMODEL'S CLASSES ATTRIBUTE WAS SET TO BE DIFFERENT FROM CITYSCAPES'S, AFTER
# LOADING THE PRETRAINED REGSEG, IT WILL CALL ITS REPLACE_HEAD METHOD AND CHANGE ITS SEGMENTATION HEAD LAYER ACCORDING
# TO OUR BINARY SEGMENTATION DATASET
model.build_model(architecture="stdc2_seg50", arch_params={"num_classes": 1})
# model.build_model("stdc2_seg50")

model.connect_dataset_interface(dataset_interface)

# DEFINE TRAINING PARAMS. SEE DOCS FOR THE FULL LIST.
train_params = {
    "max_epochs": 50,
    "lr_mode": "cosine",
    "initial_lr": 0.0064,  # for batch_size=16
    "optimizer_params": {"momentum": 0.843, "weight_decay": 0.00036, "nesterov": True},
    "criterion_params": {"num_classes": 1},
    "cosine_final_lr_ratio": 0.1,
    "multiply_head_lr": 10,
    "optimizer": "SGD",
    "loss": "stdc_loss",
    "ema": True,
    "zero_weight_decay_on_bias_and_bn": True,
    "average_best_models": True,
    "mixed_precision": False,
    "metric_to_watch": "mean_IOU",
    "greater_metric_to_watch_is_better": True,
    "train_metrics_list": [BinaryIOU()],
    "valid_metrics_list": [BinaryIOU()],
    "loss_logging_items_names": ["loss"],
    "phase_callbacks": [
        BinarySegmentationVisualizationCallback(phase=Phase.VALIDATION_BATCH_END, freq=1, last_img_idx_in_batch=4)
    ],
}

model.train(train_params)
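Not part of the original report, but relevant to the error that follows: one common cause of this CUDA "index out of bounds" assertion in ScatterGatherKernel is a target mask containing a class index outside [0, num_classes), for example a value of 1 when num_classes is 1, or an unmapped ignore value such as 255. Here the dataset defines two classes (0 and 1) while arch_params and criterion_params set num_classes to 1, which is worth ruling out. A quick check on mask arrays, as a numpy sketch (check_mask_labels and the example ignore_index=255 are hypothetical):

```python
import numpy as np


def check_mask_labels(mask: np.ndarray, num_classes: int, ignore_index=None) -> np.ndarray:
    """Return the label values in `mask` that fall outside [0, num_classes).

    Values equal to `ignore_index` (if given) are treated as valid.
    """
    values = np.unique(mask)
    valid = (values >= 0) & (values < num_classes)
    if ignore_index is not None:
        valid |= values == ignore_index
    return values[~valid]


# Binary mask with labels {0, 1} plus an ignore value, checked against two settings
mask = np.array([[0, 0, 1], [1, 255, 0]])
print(check_mask_labels(mask, num_classes=1, ignore_index=255))  # [1]
print(check_mask_labels(mask, num_classes=2, ignore_index=255))  # []
```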

The training stops with following error message.

(super-gradients) gridai@session:~/work/super-gradients → python train.py
You did not mention an AWS environment.You can set the environment variable ENVIRONMENT_NAME with one of the values: development,staging,production
/home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
callbacks -WARNING- Failed to import deci_lab_client
loading annotations into memory...
Done (t=0.12s)
creating index...
index created!
loading annotations into memory...
Done (t=0.04s)
creating index...
index created!
/home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/utils/data/dataloader.py:487: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
/home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/deprecate/deprecation.py:115: FutureWarning: The IoU was deprecated since v0.7 in favor of torchmetrics.classification.jaccard.JaccardIndex. It will be removed in v0.8.
stream(template_mgs % msg_args)
sg_model -INFO- Using EMA with params {}
"events.out.tfevents.1649674474.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-ftddj.6391.0" will not be deleted
"events.out.tfevents.1649674209.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-ftddj.6318.0" will not be deleted
"events.out.tfevents.1649761776.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-7xfbb.1654.0" will not be deleted
"events.out.tfevents.1649761893.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-7xfbb.1800.0" will not be deleted
"events.out.tfevents.1649761085.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-7xfbb.1571.0" will not be deleted
"events.out.tfevents.1649676959.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-ftddj.6464.0" will not be deleted
"events.out.tfevents.1649761829.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-7xfbb.1727.0" will not be deleted
"events.out.tfevents.1649674011.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-ftddj.6245.0" will not be deleted
sg_model -INFO- Started training for 50 epochs (0/49)

Train epoch 0: 0%| | 0/210 [00:00<?, ?it/s]
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [464,0,0], thread: [24,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [464,0,0], thread: [25,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [464,0,0], thread: [26,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [464,0,0], thread: [27,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [464,0,0], thread: [28,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [464,0,0], thread: [29,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [464,0,0], thread: [30,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [464,0,0], thread: [31,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [399,0,0], thread: [96,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [399,0,0], thread: [97,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [399,0,0], thread: [98,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [399,0,0], thread: [99,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [399,0,0], thread: [100,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [399,0,0], thread: [101,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [399,0,0], thread: [102,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [399,0,0], thread: [103,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [399,0,0], thread: [104,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [53,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [54,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [55,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [56,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [57,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [58,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [59,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [60,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [61,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [62,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [63,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [112,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [113,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [114,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [115,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [116,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [117,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [118,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [119,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [120,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [121,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [122,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [123,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [124,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [125,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [126,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [453,0,0], thread: [127,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [400,0,0], thread: [87,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [400,0,0], thread: [88,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [400,0,0], thread: [89,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [400,0,0], thread: [90,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [400,0,0], thread: [91,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [400,0,0], thread: [92,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [400,0,0], thread: [93,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [400,0,0], thread: [94,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [400,0,0], thread: [95,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
Train epoch 0: 0%| | 0/210 [00:02<?, ?it/s]
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered
Exception raised from createEvent at ../aten/src/ATen/cuda/CUDAEvent.h:174 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f29ab3097d2 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: + 0x10cf22a (0x7f29ac8eb22a in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)
frame #2: + 0x2fff28 (0x7f29fe644f28 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #3: c10::TensorImpl::release_resources() + 0x175 (0x7f29ab2f2005 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #4: + 0x1ede49 (0x7f29fe532e49 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x4da268 (0x7f29fe81f268 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #6: THPVariable_subclass_dealloc(_object*) + 0x292 (0x7f29fe81f562 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libtorch_python.so)

frame #27: __libc_start_main + 0xf3 (0x7f2a010290b3 in /usr/lib/x86_64-linux-gnu/libc.so.6)

Aborted
(super-gradients) gridai@session:~/work/super-gradients →

Here are the system details

(super-gradients) gridai@session:/work/super-gradients → uname -srm
Linux 5.4.129-63.229.amzn2.x86_64 x86_64
(super-gradients) gridai@session:/work/super-gradients → nvidia-smi
Tue Apr 12 11:30:44 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 29C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
(super-gradients) gridai@session:~/work/super-gradients → nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

Also, could you please provide example code for training the different STDC variants on the coco128 dataset, with some documentation on the important parameters to tweak?

Thanks
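The device-side assert in the log above fires when a scatter/gather kernel receives an index outside its valid range. In segmentation training, one frequent culprit is a label map containing values outside [0, num_classes) that are not the configured ignore index. That is an assumption here; the report does not confirm the cause. Below is a minimal pure-Python sketch for checking a label map before training. `find_invalid_labels`, `num_classes` and `ignore_index` are hypothetical names, not super-gradients APIs:

```python
def find_invalid_labels(label_map, num_classes, ignore_index=None):
    """Return the sorted label values that cannot be used as class indices.

    label_map is a 2-D list of integers (e.g. a mask converted with .tolist()).
    Valid values are 0..num_classes-1, plus ignore_index if one is given.
    """
    invalid = set()
    for row in label_map:
        for value in row:
            if value == ignore_index:
                continue  # explicitly ignored pixels are fine
            if not 0 <= value < num_classes:
                invalid.add(value)
    return sorted(invalid)


# Hypothetical binary mask stored as 0/255 instead of 0/1.
mask = [[0, 0, 255], [0, 255, 255]]
print(find_invalid_labels(mask, num_classes=2))                    # [255]
print(find_invalid_labels(mask, num_classes=2, ignore_index=255))  # []
```

Running such a check over a handful of dataset samples on the CPU gives a readable error instead of an asynchronous CUDA assert.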

CoCoSegmentationDataSet._generate_samples_and_targets() doesn't call super (= no caching)

Describe the bug

CoCoSegmentationDataSet._generate_samples_and_targets() doesn't call the corresponding parent class method, so image and label caching doesn't work for this class.
The solution is to add super()._generate_samples_and_targets() as the last line of CoCoSegmentationDataSet._generate_samples_and_targets().
Sorry for not opening a pull request for this one.
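The proposed fix follows the usual pattern of an overriding method delegating to its parent. The minimal sketch below uses hypothetical stand-in classes (`CachingDataSet`, `CocoLikeDataSet`), not the real super-gradients classes. It shows that the parent's caching step only runs when the override calls super():

```python
class CachingDataSet:
    """Hypothetical stand-in for a parent dataset class that caches samples."""

    def __init__(self):
        self.samples_targets_tuples_list = []
        self.cache = None
        self._generate_samples_and_targets()

    def _generate_samples_and_targets(self):
        # In this sketch the parent fills the cache once the subclass
        # has collected its (sample, target) pairs.
        self.cache = {path: f"loaded:{path}" for path, _ in self.samples_targets_tuples_list}


class CocoLikeDataSet(CachingDataSet):
    """Hypothetical subclass that collects its own sample/target pairs."""

    def _generate_samples_and_targets(self):
        self.samples_targets_tuples_list = [("img_0.jpg", "ann_0"), ("img_1.jpg", "ann_1")]
        # The fix: delegate to the parent as the last step so caching still happens.
        super()._generate_samples_and_targets()


dataset = CocoLikeDataSet()
print(sorted(dataset.cache))  # ['img_0.jpg', 'img_1.jpg']
```

Dropping the super() line leaves `cache` as None, which mirrors the reported behavior.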

How to merge label classes without modifying the ground truth data

I want to take semantic segmentation weights pre-trained on Cityscapes and fine-tune them so that the model ignores all classes other than road.
Is there currently a way to merge or ignore label classes, e.g. by passing an argument to the dataloader?
If not, could you give a hint about where best to modify the source code to achieve this?

My best guess would be to modify this function to set the other classes to be ignored.
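One way to do this without touching the ground-truth files is to remap label values on the fly, for example inside a target transform. The minimal pure-Python sketch below makes two assumptions: Cityscapes train IDs (road = 0, sidewalk = 1, car = 13) and an ignore index of 255. Check both against your dataset config. `merge_classes` is a hypothetical helper, not a super-gradients API:

```python
def merge_classes(label_map, mapping, default=0, ignore_index=255):
    """Remap class IDs in a 2-D label map.

    Values listed in `mapping` are translated, ignore_index is preserved as-is,
    and every other value is merged into `default` (e.g. one "background" class).
    """
    return [
        [value if value == ignore_index else mapping.get(value, default) for value in row]
        for row in label_map
    ]


# Assumed Cityscapes train IDs: 0 = road, 1 = sidewalk, 13 = car; 255 = ignore.
labels = [[0, 0, 1], [13, 255, 0]]
road_only = merge_classes(labels, mapping={0: 1})  # road -> 1, everything else -> 0
print(road_only)  # [[1, 1, 0], [0, 255, 1]]
```

This sketch merges the other classes into a single background class instead of mapping them to the ignore index. That keeps those pixels in the loss, so the fine-tuned model still learns what is not road.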

Dependency Issue during Installation

Describe the bug

C:\Users\Isaac>pip install super_gradients
Collecting super_gradients
  Using cached super_gradients-3.1.1-py3-none-any.whl (964 kB)
INFO: pip is looking at multiple versions of super-gradients to determine which version is compatible with other requirements. This could take a while.
  Using cached super_gradients-3.1.0-py3-none-any.whl (965 kB)
  Using cached super_gradients-3.0.9-py3-none-any.whl (938 kB)
  Using cached super_gradients-3.0.8-py3-none-any.whl (892 kB)
  Using cached super_gradients-3.0.7-py3-none-any.whl (794 kB)
  Using cached super_gradients-3.0.6-py3-none-any.whl (762 kB)
  Using cached super_gradients-3.0.5-py3-none-any.whl (748 kB)
  Using cached super_gradients-3.0.4-py3-none-any.whl (748 kB)
INFO: pip is looking at multiple versions of super-gradients to determine which version is compatible with other requirements. This could take a while.
  Using cached super_gradients-3.0.3-py3-none-any.whl (732 kB)
Requirement already satisfied: torch>=1.9.0 in c:\python311\lib\site-packages (from super_gradients) (2.0.0)
Requirement already satisfied: tqdm>=4.57.0 in c:\python311\lib\site-packages (from super_gradients) (4.65.0)
Collecting boto3>=1.17.15 (from super_gradients)
  Using cached boto3-1.26.126-py3-none-any.whl (135 kB)
Collecting jsonschema>=3.2.0 (from super_gradients)
  Using cached jsonschema-4.17.3-py3-none-any.whl (90 kB)
Collecting Deprecated>=1.2.11 (from super_gradients)
  Using cached Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)
Requirement already satisfied: opencv-python>=4.5.1 in c:\python311\lib\site-packages (from super_gradients) (4.7.0.72)
Requirement already satisfied: scipy>=1.6.1 in c:\python311\lib\site-packages (from super_gradients) (1.10.1)
Requirement already satisfied: matplotlib>=3.3.4 in c:\python311\lib\site-packages (from super_gradients) (3.7.1)
Requirement already satisfied: psutil>=5.8.0 in c:\python311\lib\site-packages (from super_gradients) (5.9.5)
Collecting tensorboard>=2.4.1 (from super_gradients)
  Using cached tensorboard-2.12.3-py3-none-any.whl (5.6 MB)
Requirement already satisfied: setuptools>=21.0.0 in c:\python311\lib\site-packages (from super_gradients) (65.5.0)
Collecting coverage~=5.3.1 (from super_gradients)
  Using cached coverage-5.3.1.tar.gz (684 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: torchvision>=0.10.0 in c:\python311\lib\site-packages (from super_gradients) (0.15.1)
Collecting sphinx~=4.0.2 (from super_gradients)
  Using cached Sphinx-4.0.3-py3-none-any.whl (2.9 MB)
Collecting sphinx-rtd-theme (from super_gradients)
  Using cached sphinx_rtd_theme-1.2.0-py2.py3-none-any.whl (2.8 MB)
Collecting torchmetrics==0.8 (from super_gradients)
  Using cached torchmetrics-0.8.0-py3-none-any.whl (408 kB)
Requirement already satisfied: pillow>=9.2.0 in c:\python311\lib\site-packages (from super_gradients) (9.5.0)
Collecting hydra-core>=1.2.0 (from super_gradients)
  Using cached hydra_core-1.3.2-py3-none-any.whl (154 kB)
Collecting omegaconf (from super_gradients)
  Using cached omegaconf-2.3.0-py3-none-any.whl (79 kB)
Collecting super_gradients
  Using cached super_gradients-3.0.2-py3-none-any.whl (664 kB)
  Using cached super_gradients-3.0.1-py3-none-any.whl (635 kB)
  Using cached super_gradients-3.0.0-py3-none-any.whl (615 kB)
  Using cached super_gradients-2.6.0-py3-none-any.whl (11.0 MB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
  Using cached super_gradients-2.5.0-py3-none-any.whl (11.0 MB)
  Using cached super_gradients-2.2.0-py3-none-any.whl (10.9 MB)
  Using cached super_gradients-2.1.0-py3-none-any.whl (23.0 MB)
Collecting elasticsearch==7.15.2 (from super_gradients)
  Using cached elasticsearch-7.15.2-py2.py3-none-any.whl (379 kB)
Collecting CMRESHandler>=1.0.0 (from super_gradients)
  Using cached CMRESHandler-1.0.0-py3-none-any.whl (15 kB)
Collecting super_gradients
  Using cached super_gradients-2.0.1-py3-none-any.whl (19.4 MB)
  Using cached super_gradients-2.0.0-py3-none-any.whl (19.4 MB)
  Using cached super_gradients-1.7.5-py3-none-any.whl (19.3 MB)
Collecting torchmetrics==0.7.3 (from super_gradients)
  Using cached torchmetrics-0.7.3-py3-none-any.whl (398 kB)
Collecting super_gradients
  Using cached super_gradients-1.7.4-py3-none-any.whl (19.3 MB)
  Using cached super_gradients-1.7.3-py3-none-any.whl (19.3 MB)
Collecting torchmetrics>=0.5.0 (from super_gradients)
  Using cached torchmetrics-0.11.4-py3-none-any.whl (519 kB)
Collecting super_gradients
  Using cached super_gradients-1.7.2-py3-none-any.whl (19.3 MB)
  Using cached super_gradients-1.7.1-py3-none-any.whl (15.1 MB)
  Using cached super_gradients-1.6.0-py3-none-any.whl (547 kB)
  Using cached super_gradients-1.5.2-py3-none-any.whl (540 kB)
  Using cached super_gradients-1.5.1-py3-none-any.whl (540 kB)
  Using cached super_gradients-1.5.0-py3-none-any.whl (497 kB)
  Using cached super_gradients-1.4.0-py3-none-any.whl (419 kB)
  Using cached super_gradients-1.3.1-py3-none-any.whl (416 kB)
  Using cached super_gradients-1.3.0-py3-none-any.whl (415 kB)
ERROR: Cannot install super-gradients==1.3.0, super-gradients==1.3.1, super-gradients==1.4.0, super-gradients==1.5.0, super-gradients==1.5.1, super-gradients==1.5.2, super-gradients==1.6.0, super-gradients==1.7.1, super-gradients==1.7.2, super-gradients==1.7.3, super-gradients==1.7.4, super-gradients==1.7.5, super-gradients==2.0.0, super-gradients==2.0.1, super-gradients==2.1.0, super-gradients==2.2.0, super-gradients==2.5.0, super-gradients==2.6.0, super-gradients==3.0.0, super-gradients==3.0.1, super-gradients==3.0.2, super-gradients==3.0.3, super-gradients==3.0.4, super-gradients==3.0.5, super-gradients==3.0.6, super-gradients==3.0.7, super-gradients==3.0.8, super-gradients==3.0.9, super-gradients==3.1.0 and super-gradients==3.1.1 because these package versions have conflicting dependencies.

The conflict is caused by:
    super-gradients 3.1.1 depends on torch<1.14 and >=1.9.0
    super-gradients 3.1.0 depends on torch<1.14 and >=1.9.0
    super-gradients 3.0.9 depends on torch<1.14 and >=1.9.0
    super-gradients 3.0.8 depends on torch<1.14 and >=1.9.0
    super-gradients 3.0.7 depends on torch<1.14 and >=1.9.0
    super-gradients 3.0.6 depends on torch<=1.12 and >=1.9.0
    super-gradients 3.0.5 depends on torch<=1.12 and >=1.9.0
    super-gradients 3.0.4 depends on torch<=1.12 and >=1.9.0
    super-gradients 3.0.3 depends on onnxruntime
    super-gradients 3.0.2 depends on onnxruntime
    super-gradients 3.0.1 depends on onnxruntime
    super-gradients 3.0.0 depends on onnxruntime
    super-gradients 2.6.0 depends on onnxruntime
    super-gradients 2.5.0 depends on onnxruntime
    super-gradients 2.2.0 depends on onnxruntime
    super-gradients 2.1.0 depends on onnxruntime
    super-gradients 2.0.1 depends on onnxruntime
    super-gradients 2.0.0 depends on onnxruntime
    super-gradients 1.7.5 depends on onnxruntime
    super-gradients 1.7.4 depends on onnxruntime
    super-gradients 1.7.3 depends on onnxruntime
    super-gradients 1.7.2 depends on onnxruntime
    super-gradients 1.7.1 depends on onnxruntime
    super-gradients 1.6.0 depends on onnxruntime
    super-gradients 1.5.2 depends on onnxruntime
    super-gradients 1.5.1 depends on onnxruntime
    super-gradients 1.5.0 depends on onnxruntime
    super-gradients 1.4.0 depends on onnxruntime
    super-gradients 1.3.1 depends on onnxruntime
    super-gradients 1.3.0 depends on onnxruntime

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
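The conflict report above shows the torch constraints for super-gradients 3.0.4 through 3.1.1:

* All of these releases require torch>=1.9.0.
* 3.0.7 through 3.1.1 cap torch at <1.14; 3.0.4 to 3.0.6 cap it at <=1.12.
* torch 2.0.0 is already installed, so none of them fits.

The older releases are rejected over their onnxruntime dependency instead. The small stdlib-only sketch below checks an installed version against the bounds quoted for 3.1.1. `version_tuple` and `satisfies` are illustrative helpers that only handle plain numeric versions; they are not a replacement for pip's resolver:

```python
def version_tuple(version):
    """Turn a plain numeric version such as '2.0.0' into (2, 0, 0)."""
    return tuple(int(part) for part in version.split("."))


def satisfies(installed, lower_inclusive, upper_exclusive):
    """True if lower_inclusive <= installed < upper_exclusive (numeric versions only)."""
    return version_tuple(lower_inclusive) <= version_tuple(installed) < version_tuple(upper_exclusive)


# Bounds quoted for super-gradients 3.1.1: torch>=1.9.0 and torch<1.14
print(satisfies("2.0.0", "1.9.0", "1.14"))   # False -> conflict with the installed torch
print(satisfies("1.13.1", "1.9.0", "1.14"))  # True
```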

Integration with 🤗 Hub

Hi folks.

Thanks for providing the pre-trained models along with pre-training scripts.

At Hugging Face, the Hub is where we serve models, datasets, Spaces, etc. It makes artifacts easy to load and use, providing common, streamlined API access to your models, datasets, etc.

Hugging Face supports third-party integrations too, and I was wondering if you'd be up for exploring an integration. It would primarily make model sharing and downloading easy, which could benefit the vision community in general.

Here's the main doc that should help with the integration: https://huggingface.co/docs/hub/models-adding-libraries. Let me know if you need any help.

Filtering Classes

Can we filter the classes? Is there an argument we can use?
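A generic way to filter classes, independent of any super-gradients argument, is to drop unwanted targets before they reach the model. The minimal sketch below assumes each detection target is a row of [class_id, x1, y1, x2, y2]. Both that row layout and `filter_targets` are assumptions for illustration, so check the target format your dataset actually produces:

```python
def filter_targets(targets, keep_class_ids):
    """Keep only rows whose class is in keep_class_ids and renumber classes 0..K-1.

    targets: list of [class_id, x1, y1, x2, y2] rows (assumed layout).
    keep_class_ids: ordered list of original class IDs to keep.
    """
    new_id = {old: new for new, old in enumerate(keep_class_ids)}
    return [[new_id[row[0]]] + list(row[1:]) for row in targets if row[0] in new_id]


targets = [[0, 10, 10, 50, 50], [2, 5, 5, 20, 20], [7, 0, 0, 8, 8]]
print(filter_targets(targets, keep_class_ids=[2, 7]))
# [[0, 5, 5, 20, 20], [1, 0, 0, 8, 8]]
```

Renumbering the kept classes to a contiguous range lets the model be built with `num_classes=len(keep_class_ids)`.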

Add a `conda` install option for `super-gradients`

It would be helpful to have super-gradients added to the conda-forge channel. I have already started the work in the following PR:

conda-forge/staged-recipes#20167

But there seems to be a problem with one of its dependencies:

deci-lab-client PyPI

It has neither a source distribution (*.tar.gz) on PyPI nor a release in a public GitHub repository.

Please provide (preferably) the source distribution for deci-lab-client on PyPI.

🔥 CONSTRAINT: To add any package to the conda-forge channel, ALL of its dependencies must be on conda-forge as well

test


Add YOLOv5x coco pretrained checkpoint

Is it possible to add a YOLOv5x coco pretrained checkpoint?

How do I implement transfer learning on an unseen dataset?

Let's say we add new, unseen data. I do not want to retrain on the whole dataset by mixing the old and new data. I want to train only on the unseen data, and after that training the final model must detect both the old labels and the new labels. How can I do this?

Are there any other losses that can be used? I want to do regression tasks.

The assert that self.lr_schedule_function is callable runs before the attribute is initialized

As the following link shows, the assert is executed before the attribute is initialized, which raises an AttributeError.

super-gradients/src/super_gradients/training/utils/callbacks/callbacks.py

Line 475 in 21fa8ae

assert callable(self.lr_schedule_function), "self.lr_function must be callable"
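The AttributeError arises because the assert reads an attribute that has not been assigned yet at that point. The minimal sketch below shows the ordering fix using a hypothetical `LRSchedulerCallback` class, not the actual callback in callbacks.py. It assigns the attribute first, then validates it. The check raises an explicit exception instead of using assert, because Python strips asserts when run with -O:

```python
class LRSchedulerCallback:
    """Hypothetical callback illustrating assign-then-validate ordering."""

    def __init__(self, lr_schedule_function):
        # Assign first, so the attribute exists before anything reads it.
        self.lr_schedule_function = lr_schedule_function
        # Then validate; an explicit raise still works when Python runs with -O.
        if not callable(self.lr_schedule_function):
            raise TypeError("lr_schedule_function must be callable")

    def lr_at(self, epoch):
        return self.lr_schedule_function(epoch)


callback = LRSchedulerCallback(lambda epoch: 0.1 * 0.5 ** epoch)
print(callback.lr_at(2))  # 0.025
```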

How can I enable multigpu training


How can I measure test accuracy on a test dataset?
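Whatever evaluation entry point is used, top-1 accuracy on a test set comes down to two steps: compare the arg-max prediction with the label for each sample, then average. Below is a minimal pure-Python sketch of that computation. `top1_accuracy` is an illustrative helper, not a super-gradients API. In practice the logits would come from running the trained model over a test dataloader with gradients disabled:

```python
def top1_accuracy(batches):
    """Compute top-1 accuracy over an iterable of (logits, labels) batches.

    logits: list of per-sample score lists; labels: list of integer class IDs.
    """
    correct = 0
    total = 0
    for logits, labels in batches:
        for scores, label in zip(logits, labels):
            predicted = max(range(len(scores)), key=scores.__getitem__)  # arg-max
            correct += int(predicted == label)
            total += 1
    return correct / total if total else 0.0


test_batches = [
    ([[0.1, 2.0, 0.3], [1.5, 0.2, 0.1]], [1, 0]),  # both correct
    ([[0.2, 0.1, 0.9], [0.3, 0.8, 0.1]], [2, 2]),  # second one wrong
]
print(top1_accuracy(test_batches))  # 0.75
```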

Cannot install supergradients

Hello,
I am facing this issue while trying to install super-gradients.

I also tried pip install super-gradients and I am encountering the same issue.

super-gradients examples for amazon sagemaker

Hello, I mainly work with Amazon SageMaker for building models. Do you have any examples or tutorials for AWS? Please let me know if you do.

Kornia augmentations integration

Hi guys, very nice repository. Congratulations!

We were wondering whether we could help you integrate kornia.augmentation into your framework.

We have special containers that automate augmentation for detection, segmentation, videos, etc.

https://kornia.readthedocs.io/en/latest/augmentation.container.html#kornia.augmentation.container.AugmentationSequential

An example of a similar integration, done with the ms-torchgeo team after further collaboration, can be found here:

https://github.com/microsoft/torchgeo/blob/main/torchgeo/transforms/transforms.py

Let us know, so that our augmentations team can assist you in case of missing features. /cc @shijianjian @twsl

Need inference code for YOLO-NAS that outputs bounding boxes as x1, y1, x2, y2 along with the confidence

I'm using OpenCV, and I need to be able to pass in my image frame from OpenCV, which of course is a NumPy array.

But the high-level code you provide doesn't seem to offer a way to get the output as x and y coordinates and a confidence score.

Or it might be hidden in a different class hierarchy; I'm not sure, because I could not find it. It would be very helpful if you could help with this.

Arigato
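The sketch below assumes the prediction object exposes per-image `bboxes_xyxy`, `confidence` and `labels` arrays. To our understanding the DetectionPrediction class does this in super-gradients 3.x, but verify against your installed version. Under that assumption, turning the arrays into plain (x1, y1, x2, y2, conf, class) tuples is a simple zip. The minimal sketch works on any sequences, so it accepts NumPy arrays or lists. `to_xyxy_detections` and the threshold are hypothetical:

```python
def to_xyxy_detections(bboxes_xyxy, confidence, labels, min_conf=0.5):
    """Combine parallel per-box sequences into (x1, y1, x2, y2, conf, class_id) tuples.

    Boxes with confidence below min_conf are dropped. Coordinates are truncated to
    int so they can be passed directly to OpenCV drawing functions such as cv2.rectangle.
    """
    detections = []
    for (x1, y1, x2, y2), conf, label in zip(bboxes_xyxy, confidence, labels):
        if conf >= min_conf:
            detections.append((int(x1), int(y1), int(x2), int(y2), float(conf), int(label)))
    return detections


boxes = [[10.4, 20.7, 110.2, 220.9], [5.0, 5.0, 15.0, 15.0]]
print(to_xyxy_detections(boxes, confidence=[0.91, 0.30], labels=[0, 2]))
# [(10, 20, 110, 220, 0.91, 0)]
```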

Training YOLONAS from scratch for using it in a commercial application

As far as I understand the license, if I use pre-trained weights for training YOLO-NAS (fine-tuning the model on my dataset), I cannot use it in commercial applications. Is that right?

If so, when training:

from super_gradients.training import models
model = models.get('yolo_nas_l', 
                   num_classes=len(dataset_params['classes']), 
                   pretrained_weights="coco"
                   )

How do I change this to train the model from scratch?

Default dataloader params have shuffle=False

Describe the bug

By default, shuffle=True is not passed to the dataloader, so a SequentialSampler is instantiated. When training on ImageNet, this makes the network see examples of only a single class at a time, which quickly throws it out of the minimum.
Passing shuffle=True in the dataloader params solves the issue.
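Sequential sampling hurts here when the samples are stored grouped by class, as assumed for a folder-per-class ImageNet layout. Iterating in order then produces batches that each contain a single class. The minimal stdlib sketch below compares in-order batching with shuffled batching on a toy class-sorted label list (toy data, not ImageNet):

```python
import random


def batches(labels, batch_size, shuffle, seed=0):
    """Yield label batches in order (like SequentialSampler) or after shuffling."""
    order = list(range(len(labels)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [labels[i] for i in order[start:start + batch_size]]


# Toy dataset sorted by class: 4 classes x 8 samples each.
labels = [c for c in range(4) for _ in range(8)]

sequential = [len(set(b)) for b in batches(labels, batch_size=8, shuffle=False)]
shuffled = [len(set(b)) for b in batches(labels, batch_size=8, shuffle=True)]
print(sequential)  # [1, 1, 1, 1] -> every batch holds a single class
print(shuffled)    # distinct classes per batch after shuffling
```

In PyTorch, passing shuffle=True to torch.utils.data.DataLoader makes it use a RandomSampler instead of a SequentialSampler, which corresponds to the shuffled case above.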

To Reproduce

Minimal example:

import super_gradients  # needed for super_gradients.init_trainer() below
from super_gradients import Trainer
from super_gradients.training import MultiGPUMode
from super_gradients.training import models
from super_gradients.training.dataloaders import imagenet_train, imagenet_val
from super_gradients.training.metrics import Accuracy

super_gradients.init_trainer()

dataloader_params = {"batch_size": 196}  # buggy params
# dataloader_params = {"batch_size": 196, "shuffle": True}  # non-buggy params


train_params = {"max_epochs": 1,
                "initial_lr": 0.001,
                "optimizer": "SGD",
                "optimizer_params": {"weight_decay": 0.0001, "momentum": 0.9, "nesterov": True},
                "loss": "cross_entropy",
                "train_metrics_list": [Accuracy()],
                "valid_metrics_list": [Accuracy()],
                "loss_logging_items_names": ["Loss"],
                "metric_to_watch": "Accuracy",
                "greater_metric_to_watch_is_better": True
                }

train_dataloader = imagenet_train(dataloader_params=dataloader_params)
val_dataloader = imagenet_val(dataloader_params=dataloader_params)

model = models.get("resnet50", pretrained_weights="imagenet", num_classes=1000)

trainer = Trainer(experiment_name="reproduce_shuffle_bug",
                  multi_gpu=MultiGPUMode.OFF,
                  device='cuda')

trainer.train(model=model,
              training_params=train_params,
              train_loader=train_dataloader,
              valid_loader=val_dataloader)

Expected behavior

Accuracy should not drop.

Environment:

OS Linux 5.4.0-94-generic x86_64
Super Gradients version 3.0.0


