
super-gradients Issues

Upsample size mismatch in segmentation models

Describe the bug

Depending on the input image size, feature maps upsampled with nn.Upsample don't always match the size of the skip connection. This is a known issue; reference links:

Replacing nn.Upsample with torch.nn.functional.interpolate seems to be the recommended solution.
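
As a rough illustration of that fix, here is a minimal, self-contained sketch; the helper name and tensor shapes are hypothetical, not the actual PP-LiteSeg code:

import torch
import torch.nn.functional as F

def upsample_to_skip(x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
    # Interpolate to the skip connection's exact spatial size instead of a fixed
    # scale factor, so odd input sizes cannot produce off-by-one mismatches.
    return F.interpolate(x, size=skip.shape[-2:], mode="bilinear", align_corners=False)

x = torch.randn(1, 64, 65, 65)       # low-resolution feature map
skip = torch.randn(1, 32, 130, 129)  # skip connection with an odd width
fused = torch.cat([upsample_to_skip(x, skip), skip], dim=1)
print(fused.shape)  # torch.Size([1, 96, 130, 129])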

To Reproduce

Here's a snippet using PP-LiteSeg. The dataset is Cityscapes, but that's not important; the input image size is the key factor. I imagine the issue affects all models that use nn.Upsample and concatenate with skip connections:

from super_gradients.training import models, dataloaders, Trainer
from super_gradients.common.object_names import Models
from super_gradients.training.metrics import IoU


trainer = Trainer(experiment_name="eval-pp-liteseg-b75")
val_loader = dataloaders.cityscapes_stdc_seg75_val(
    dataset_params={
        "transforms": [
            {
                "SegRescale": {
                    "long_size": 1025
                }
            }
        ]
    },
    dataloader_params={"batch_size": 1},
)
model = models.get(
    Models.PP_LITE_B_SEG75,
    pretrained_weights="cityscapes",
)
metric = IoU(num_classes=20, ignore_index=19)
miou = trainer.test(
    model=model,
    test_loader=val_loader,
    test_metrics_list=[metric],
    metrics_progress_verbose=False
)[0].cpu().item()
print(f"mIoU: {miou}")

Results in an error:

  File ".../src/super_gradients/training/models/segmentation_models/ppliteseg.py", line 52, in forward
    atten = torch.cat([*self._avg_max_spatial_reduce(x, use_concat=False), *self._avg_max_spatial_reduce(skip, use_concat=False)], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 66 but got size 65 for tensor number 2 in the list.

Expected behavior

Fully convolutional segmentation models should work for all input image sizes.

Environment:

  • Ubuntu
  • super-gradients v3.0.7
  • PyTorch 1.11

Integrating a custom model as a backbone

Hello there!

I loved your work, keep it up!

I'd like to integrate a custom attention model as the backbone for an object detection task and test it out. Is there any documentation or tutorial I can follow? Any help would be appreciated!

Need inference code for YOLO-NAS, so that I can get output as x1, y1, x2, y2 bounding boxes, along with the confidence

I'm using OpenCV, and I need to be able to pass in my image frame from OpenCV, which of course is a NumPy array.

But the abstracted code you provide doesn't seem to give an option to get the output in the form of x's and y's and a confidence score.

Or it might be hidden in some different class hierarchy; I'm not sure, because I could not find it. It would be very helpful if you could help with it.

Arigato
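
For what it's worth, a minimal sketch of how boxes and confidences are commonly extracted from the SG 3.x prediction API; the attribute names (prediction.bboxes_xyxy, confidence, labels) are an assumption based on recent versions, so verify them against yours:

import cv2
from super_gradients.training import models

model = models.get("yolo_nas_l", pretrained_weights="coco")

frame = cv2.imread("frame.jpg")  # BGR NumPy array straight from OpenCV
result = model.predict(frame, conf=0.25)

for image_prediction in result:
    pred = image_prediction.prediction
    # pred.bboxes_xyxy is assumed to be an (N, 4) array of [x1, y1, x2, y2] boxes
    for (x1, y1, x2, y2), conf, label in zip(pred.bboxes_xyxy, pred.confidence, pred.labels):
        print(f"box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}) conf={conf:.2f} class={int(label)}")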

Integration with 🤗 Hub

Hi folks.

Thanks for providing the pre-trained models along with pre-training scripts.

At Hugging Face, the Hub is our home for serving models, datasets, Spaces, etc. It facilitates easy artifact loading and usage, providing common and streamlined API access to your models, datasets, and more.

Hugging Face supports third-party integrations too, and I was wondering if you'd be up for exploring one. The integration would primarily facilitate easy model sharing and downloading, which could be beneficial for the vision community in general.

Here's the main doc that'd be helpful for you for the integration: https://huggingface.co/docs/hub/models-adding-libraries. Let me know if you'd need any help.

Filtering Classes

Can we filter the classes? Is there any argument we can use?

How can I enable multi-GPU training?

Is your feature request related to a problem? Please describe.

A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like

A clear and concise description of what you want to happen.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context or screenshots about the feature request here.
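
On multi-GPU specifically: a hedged sketch of how DDP is typically enabled in recent SG versions. The exact import path of setup_device has moved between releases, so treat it as an assumption to verify against your installed version:

from super_gradients import init_trainer
from super_gradients.training.utils.distributed_training_utils import setup_device  # path may differ per version

init_trainer()
# request DistributedDataParallel across 2 GPUs before building the Trainer
setup_device(multi_gpu="DDP", num_gpus=2)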

How to implement transfer learning on an unseen dataset?

Say we add some unseen data: I don't want to retrain on the whole dataset by mixing old and new data. I want to train only on the unseen data, and the final model must detect both the old and the new labels after training on the unseen labels.

MaskAttentionLoss in DiceCEEdgeLoss doesn't handle images without any edges

Describe the bug

Training models that use DiceCEEdgeLoss results in NaN loss on images that only contain one semantic class. The edge_target becomes a tensor filled with zeros because there are no edges in the image:

edge_target = target_to_binary_edge(
    target, num_classes=self.num_classes, kernel_size=self.edge_kernel, ignore_index=self.ignore_index, flatten_channels=True
)

Then, when computing the MaskAttentionLoss, mask_loss is a tensor filled with zeros, gets reassigned to an empty tensor, and, finally, computing the mean of an empty tensor results in NaN.

mask_loss = mask_loss[mask == 1] # consider only mask samples for mask loss computing
mask_loss = apply_reduce(mask_loss, self.reduction)

To Reproduce

I've written a new test in tests/unit_tests/mask_loss_test.py that reproduces the problem.

def test_with_cross_entropy_loss_maskless(self):
    """
    Test case with mask filled with zeros, corresponding to a scenario without
    attention. It's expected that the mask doesn't contribute to the loss.

    This scenario may happen when using edge masks on an image without
    edges - there's only one semantic region in the whole image.

    Shapes: predict [BxCxHxW], target [BxHxW], mask [Bx1xHxW]
    """
    predict = torch.randn(self.batch, self.num_classes, self.img_size, self.img_size)
    target = self._get_default_target_tensor()
    # Create a mask filled with zeros to disable the attention component
    mask = self._get_default_mask_tensor() * 0.0

    loss_weights = [1.0, 0.5]
    ce_crit = nn.CrossEntropyLoss(reduction="none")
    mask_ce_crit = MaskAttentionLoss(criterion=ce_crit, loss_weights=loss_weights)

    # expected result - no contribution from mask
    ce_loss = ce_crit(predict, target)
    expected_loss = ce_loss.mean() * loss_weights[0]

    # mask ce loss result
    loss = mask_ce_crit(predict, target, mask)

    self._assertion_torch_values(expected_loss, loss)

Running this test results in:

AssertionError: False is not true : Unequal torch tensors: excepted: 1.7192925214767456, found: nan

Expected behavior

A mask filled with zeros should "disable" attention; thus, the mask should not contribute to the loss.

Environment:

  • super-gradients 3.0.7

Additional context

Can be fixed by checking if mask_loss is NaN and setting it to 0 instead. Like this:

mask_loss = mask_loss if not mask_loss.isnan() else mask_loss.new_tensor(0.0)
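
An equivalent guard that avoids reducing an empty tensor in the first place, sketched against the snippet above (apply_reduce and self.reduction are the existing names from that code):

mask_loss = mask_loss[mask == 1]  # consider only mask samples for mask loss computing
if mask_loss.numel() == 0:
    # no edge pixels in the whole batch: contribute zero instead of mean-of-empty (NaN)
    mask_loss = mask_loss.new_tensor(0.0)
else:
    mask_loss = apply_reduce(mask_loss, self.reduction)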

Kornia augmentations integration

Hi guys, very nice repository, congratulations!

We were wondering whether we can help you guys to integrate kornia.augmentations in your framework.

We have special containers to automate the case for augmenting for detection, segmentation, videos, etc.

An example of a similar integration, after further collaboration, can be found in the ms-torchgeo project.

Let us know, so that our augmentations team can assist you in case of missing features. /cc @shijianjian @twsl

Python TFLite or ONNX inference script

Is your feature request related to a problem?

No. I am able to convert .pth models to TFLite; I need a starter script that can be used for TFLite inference.

Describe the solution you'd like

A standalone python inference script

Describe alternatives you've considered

If you have an ONNX inference script as well, that would be helpful.
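
As a generic starting point, a minimal onnxruntime sketch; the model path, input size, and preprocessing are placeholders to replace with your export settings:

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# dummy NCHW float32 input; swap in your real preprocessing (resize, normalize, channel order)
x = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {input_name: x})
print([o.shape for o in outputs])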

Add gradient clipping

Is your feature request related to a problem? Please describe.

Some training recipes require gradient clipping, especially when transfer learning from pretrained models. Such an option should be added to the training params.

Describe the solution you'd like

Add clip_grad_norm to the training params; a sketch of the intended usage follows.
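
A sketch of the requested usage, assuming the option would be wired into training_params like other scalar options (the key name clip_grad_norm is the one proposed above):

train_params = {
    "max_epochs": 50,
    "initial_lr": 0.001,
    "optimizer": "SGD",
    "loss": "cross_entropy",
    # proposed option: clip the global gradient norm to 1.0 before each optimizer step
    "clip_grad_norm": 1.0,
}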

Cannot install super-gradients

Hello,
I'm facing this issue while trying to install super-gradients.

(screenshot of the error)
I also tried pip install super-gradients and am encountering this issue.
(screenshot of the error)

Cityscapes datastructure

Hi,
I want to replicate some results; however, I have some issues loading the Cityscapes dataset, and I can't find anywhere how the dataset should be structured. I saw that list_file and labels_csv_path are needed, but I don't know how to obtain these.

Error in building wheels

Getting errors while building wheels for collected packages:

> treelib, stringcase, pycocotools, termcolor, future, antlr4-python3-runtime.

I think setup.py needs to be revised.

I tried both pip install super-gradients and pip install git+https://github.com/Deci-AI/super-gradients.git@stable.

(screenshot of the error)

Advanced training recipes for ddrnet

Thanks for your great work. Do you have plans to apply more advanced training recipes to ddrnet_23_slim and ddrnet_23? The PaddleSeg version of ddrnet_23 has achieved 79.85% mIoU.

How to accelerate RegSeg on TensorRT

I tried to use TensorRT with the original RegSeg repository.
However, ONNX had trouble with torch.split, and torch2trt could not be used with the specific TensorRT version.
Please let me know how you used TensorRT with the RegSeg model when measuring the latency.

CoCoSegmentationDataSet._generate_samples_and_targets() doesn't call super (= no caching)

Describe the bug

CoCoSegmentationDataSet._generate_samples_and_targets() doesn't call the corresponding parent class method, and therefore image and label caching doesn't work for this class.
The solution is to add super()._generate_samples_and_targets() as the last line of CoCoSegmentationDataSet._generate_samples_and_targets().
Excuse me for not making this a pull request this time.

Getting started: YOLOX inference and image preprocessing

cc: @shaydeci
I want to start using YOLOX for Object Detection using the pre-trained coco weights.
Looking at the tutorial page, I was unclear whether the expected input to the model should be RGB or BGR.

I looked at COCODetectionDataset(DetectionDataset), which led me to the .get_resized_image() method. This method uses cv2.imread, which returns BGR. So is it correct to assume that the YOLOX pretrained model also had this kind of preprocessing (i.e., the same as in super_gradients.training.datasets.detection_datasets.detection_dataset.DetectionDataset.get_resized_image)?
While COCODetectionDataset seems to use cv2 (BGR), the example in the tutorial above uses PIL.Image.open(), which returns RGB.

Currently, I'm using super_gradients.training.transforms.transforms.rescale_and_pad_to_size to preprocess my images, but they are read by OpenCV (BGR), so I'm wondering if I need to skip the .swap() phase?

Thanks

STDC-seg fails to train

I am attempting to train an STDC-seg model using super-gradients.

  • My dataset is in COCO 2017 format

Here is my train.py code:
-----train.py----
from super_gradients.training.datasets.dataset_interfaces.dataset_interface import CoCoSegmentationDatasetInterface
from super_gradients.training.sg_model import SgModel
from super_gradients.training.metrics import BinaryIOU
from super_gradients.training.transforms.transforms import ResizeSeg, RandomFlip, RandomRescale, CropImageAndMask, \
    PadShortToCropSize, ColorJitterSeg
from super_gradients.training.utils.callbacks import BinarySegmentationVisualizationCallback, Phase
from torchvision import transforms

# DEFINE DATA TRANSFORMATIONS
dataset_params = {"dataset_dir": "/home/syed/work/vision_datasets/11apr22",
                  "batch_size": 8,
                  "val_batch_size": 8,
                  "num_classes": 2}

dataset_interface = CoCoSegmentationDatasetInterface(
    dataset_params,
    cache_labels=False,
    cache_images=False,
    dataset_classes_inclusion_tuples_list=[(0, 'background'), (1, 'drivable-area')])

model = SgModel("stdc2_seg50_scratch_50_epochs")

# CONNECTING THE DATASET INTERFACE WILL SET SGMODEL'S CLASSES ATTRIBUTE ACCORDING TO SUPERVISELY
# model.connect_dataset_interface(dataset_interface)

# THIS IS WHERE THE MAGIC HAPPENS - SINCE SGMODEL'S CLASSES ATTRIBUTE WAS SET TO BE DIFFERENT FROM CITYSCAPES'S, AFTER
# LOADING THE PRETRAINED REGSEG, IT WILL CALL ITS REPLACE_HEAD METHOD AND CHANGE ITS SEGMENTATION HEAD LAYER ACCORDING
# TO OUR BINARY SEGMENTATION DATASET
model.build_model(architecture="stdc2_seg50", arch_params={"num_classes": 1})
# model.build_model("stdc2_seg50")

model.connect_dataset_interface(dataset_interface)

# DEFINE TRAINING PARAMS. SEE DOCS FOR THE FULL LIST.
train_params = {"max_epochs": 50,
                "lr_mode": "cosine",
                "initial_lr": 0.0064,  # for batch_size=16
                "optimizer_params": {"momentum": 0.843,
                                     "weight_decay": 0.00036,
                                     "nesterov": True},
                "criterion_params": {"num_classes": 1},
                "cosine_final_lr_ratio": 0.1,
                "multiply_head_lr": 10,
                "optimizer": "SGD",
                "loss": "stdc_loss",
                "ema": True,
                "zero_weight_decay_on_bias_and_bn": True,
                "average_best_models": True,
                "mixed_precision": False,
                "metric_to_watch": "mean_IOU",
                "greater_metric_to_watch_is_better": True,
                "train_metrics_list": [BinaryIOU()],
                "valid_metrics_list": [BinaryIOU()],
                "loss_logging_items_names": ["loss"],
                "phase_callbacks": [BinarySegmentationVisualizationCallback(phase=Phase.VALIDATION_BATCH_END,
                                                                            freq=1,
                                                                            last_img_idx_in_batch=4)],
                }

model.train(train_params)

The training stops with the following error message.

(super-gradients) gridai@session:~/work/super-gradients → python train.py
You did not mention an AWS environment.You can set the environment variable ENVIRONMENT_NAME with one of the values: development,staging,production
/home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/_distutils_hack/init.py:30: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
callbacks -WARNING- Failed to import deci_lab_client
loading annotations into memory...
Done (t=0.12s)
creating index...
index created!
loading annotations into memory...
Done (t=0.04s)
creating index...
index created!
/home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/utils/data/dataloader.py:487: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
/home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/deprecate/deprecation.py:115: FutureWarning: The IoU was deprecated since v0.7 in favor of torchmetrics.classification.jaccard.JaccardIndex. It will be removed in v0.8.
stream(template_mgs % msg_args)
sg_model -INFO- Using EMA with params {}
"events.out.tfevents.1649674474.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-ftddj.6391.0" will not be deleted
"events.out.tfevents.1649674209.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-ftddj.6318.0" will not be deleted
"events.out.tfevents.1649761776.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-7xfbb.1654.0" will not be deleted
"events.out.tfevents.1649761893.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-7xfbb.1800.0" will not be deleted
"events.out.tfevents.1649761085.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-7xfbb.1571.0" will not be deleted
"events.out.tfevents.1649676959.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-ftddj.6464.0" will not be deleted
"events.out.tfevents.1649761829.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-7xfbb.1727.0" will not be deleted
"events.out.tfevents.1649674011.ixnode-cce75236-32dc-42d6-90e9-a713878cc921-758f988669-ftddj.6245.0" will not be deleted
sg_model -INFO- Started training for 50 epochs (0/49)

Train epoch 0: 0%| | 0/210 [00:00<?, ?it/s]../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:276: operator(): block: [464,0,0], thread: [24,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
[the same ScatterGatherKernel.cu:276 assertion repeats for dozens more blocks/threads]
Train epoch 0: 0%| | 0/210 [00:02<?, ?it/s]
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered
Exception raised from createEvent at ../aten/src/ATen/cuda/CUDAEvent.h:174 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f29ab3097d2 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: + 0x10cf22a (0x7f29ac8eb22a in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so)
frame #2: + 0x2fff28 (0x7f29fe644f28 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #3: c10::TensorImpl::release_resources() + 0x175 (0x7f29ab2f2005 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #4: + 0x1ede49 (0x7f29fe532e49 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x4da268 (0x7f29fe81f268 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #6: THPVariable_subclass_dealloc(_object*) + 0x292 (0x7f29fe81f562 in /home/jovyan/conda/envs/super-gradients/lib/python3.9/site-packages/torch/lib/libtorch_python.so)

frame #27: __libc_start_main + 0xf3 (0x7f2a010290b3 in /usr/lib/x86_64-linux-gnu/libc.so.6)

Aborted
(super-gradients) gridai@session:~/work/super-gradients →

Here are the system details

(super-gradients) gridai@session:~/work/super-gradients → uname -srm
Linux 5.4.129-63.229.amzn2.x86_64 x86_64
(super-gradients) gridai@session:~/work/super-gradients → nvidia-smi
Tue Apr 12 11:30:44 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 29C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
(super-gradients) gridai@session:~/work/super-gradients → nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

Also, would you please provide example code for training the different STDC variants on the coco128 dataset, with some documentation regarding the important parameters to tweak?

Thanks

Support for 3D images

Is your feature request related to a problem? Please describe.

Medical image segmentation often relies on 3D scans (MRI, CT), but there are very few pretrained models available for the different medical imaging tasks (classification, segmentation).

Describe the solution you'd like

It would be great if some state-of-the-art models were available in super-gradients, and even better if they were pre-trained. For example, this model: http://arxiv.org/abs/2208.09567; or use the synthesized 100k brains to train a 3D classification model like this one: http://arxiv.org/abs/2209.07162, https://arxiv.org/pdf/2303.08216.pdf

Add a `conda` install option for `super-gradients`

It would be helpful to have super-gradients added to the conda-forge channel. I have already started the work in the following PR.

But there seems to be a problem with one of its dependencies:

  • deci-lab-client on PyPI

    Neither does it have any source file (*.tar.gz) on PyPI, nor any release on a public GitHub repository.

Please provide (preferably) the source file for deci-lab-client on PyPI.

🔥 CONSTRAINT: To add any package to the conda-forge channel, you need ALL its dependencies on conda-forge as well

How to merge label classes without modifying the ground truth data

I want to take a set of semantic segmentation model weights pre-trained on Cityscapes and finetune them so that all classes other than road are ignored.
Is there currently a way to merge or ignore label classes, e.g. by passing an argument to the dataloader?
Otherwise, can you give a hint where best to modify the source code to achieve this?

My best guess would be to modify this function to set the other classes to be ignored (see the sketch below).
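
A minimal sketch of one way to do this without touching the ground truth files, assuming labels arrive as NumPy arrays; the road train id and ignore index below are assumptions to adjust for your setup:

import numpy as np

ROAD_ID = 0        # assumption: the 'road' train id in Cityscapes
IGNORE_INDEX = 19  # assumption: the ignore index used by the loss/metric

def road_only(label: np.ndarray) -> np.ndarray:
    # map every non-road pixel to the ignore index, keep road pixels as-is
    out = np.full_like(label, IGNORE_INDEX)
    out[label == ROAD_ID] = ROAD_ID
    return out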

Bug in super-gradients on Linux (Ubuntu) and Google Colab

You'll use yolo_nas_l throughout this notebook. Because you should always go big, or go home.

It's a good life philosophy.

But your fine-tuning notebook doesn't seem to be working; I tried to rerun it, and the second cell shows me an error.
Here is a screenshot of the error: (Screenshot from 2023-05-04 11-02-37)

Also, following your code in a Linux env, I am facing certain errors while installing super-gradients.

Building wheels for collected packages: pycocotools, stringcase, termcolor, treelib, antlr4-python3-runtime, future
Building wheel for pycocotools (PEP 517) ... error
ERROR: Command errored out with exit status 1:
command: /home/soumyadeep/mmaction_custom/ava_custom_v2/Yolo_train/YOLO-NAS/yolonas_env/bin/python3 /tmp/tmpxprr4m03 build_wheel /tmp/tmp5_axaif9
cwd: /tmp/pip-install-n3xzz9vp/pycocotools
Complete output (67 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-38
creating build/lib.linux-x86_64-cpython-38/pycocotools
copying pycocotools/init.py -> build/lib.linux-x86_64-cpython-38/pycocotools
copying pycocotools/mask.py -> build/lib.linux-x86_64-cpython-38/pycocotools
copying pycocotools/cocoeval.py -> build/lib.linux-x86_64-cpython-38/pycocotools
copying pycocotools/coco.py -> build/lib.linux-x86_64-cpython-38/pycocotools
running build_ext
skipping 'pycocotools/_mask.c'

and more.

Is there any solution to this?

Training YOLONAS from scratch for using it in a commercial application

As far as I understand from the license, if I use pre-trained weights when training YOLO-NAS (fine-tuning the model on my dataset), I cannot use it in commercial applications. Is that right?

If so, when training:

from super_gradients.training import models
model = models.get('yolo_nas_l', 
                   num_classes=len(dataset_params['classes']), 
                   pretrained_weights="coco"
                   )

how do I change this to train it from scratch?
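
A minimal sketch of the from-scratch variant, under the assumption (consistent with how models.get behaves in recent SG versions) that omitting pretrained_weights, or passing None, builds the architecture with randomly initialized weights:

from super_gradients.training import models

# dataset_params as in the snippet above; no pretrained_weights means random
# initialization, i.e. training starts from scratch on your own dataset
model = models.get('yolo_nas_l',
                   num_classes=len(dataset_params['classes']),
                   pretrained_weights=None)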

Dependency Issue during Installation

Describe the bug

C:\Users\Isaac>pip install super_gradients
Collecting super_gradients
  Using cached super_gradients-3.1.1-py3-none-any.whl (964 kB)
INFO: pip is looking at multiple versions of super-gradients to determine which version is compatible with other requirements. This could take a while.
  Using cached super_gradients-3.1.0-py3-none-any.whl (965 kB)
  Using cached super_gradients-3.0.9-py3-none-any.whl (938 kB)
  Using cached super_gradients-3.0.8-py3-none-any.whl (892 kB)
  Using cached super_gradients-3.0.7-py3-none-any.whl (794 kB)
  Using cached super_gradients-3.0.6-py3-none-any.whl (762 kB)
  Using cached super_gradients-3.0.5-py3-none-any.whl (748 kB)
  Using cached super_gradients-3.0.4-py3-none-any.whl (748 kB)
INFO: pip is looking at multiple versions of super-gradients to determine which version is compatible with other requirements. This could take a while.
  Using cached super_gradients-3.0.3-py3-none-any.whl (732 kB)
Requirement already satisfied: torch>=1.9.0 in c:\python311\lib\site-packages (from super_gradients) (2.0.0)
Requirement already satisfied: tqdm>=4.57.0 in c:\python311\lib\site-packages (from super_gradients) (4.65.0)
Collecting boto3>=1.17.15 (from super_gradients)
  Using cached boto3-1.26.126-py3-none-any.whl (135 kB)
Collecting jsonschema>=3.2.0 (from super_gradients)
  Using cached jsonschema-4.17.3-py3-none-any.whl (90 kB)
Collecting Deprecated>=1.2.11 (from super_gradients)
  Using cached Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)
Requirement already satisfied: opencv-python>=4.5.1 in c:\python311\lib\site-packages (from super_gradients) (4.7.0.72)
Requirement already satisfied: scipy>=1.6.1 in c:\python311\lib\site-packages (from super_gradients) (1.10.1)
Requirement already satisfied: matplotlib>=3.3.4 in c:\python311\lib\site-packages (from super_gradients) (3.7.1)
Requirement already satisfied: psutil>=5.8.0 in c:\python311\lib\site-packages (from super_gradients) (5.9.5)
Collecting tensorboard>=2.4.1 (from super_gradients)
  Using cached tensorboard-2.12.3-py3-none-any.whl (5.6 MB)
Requirement already satisfied: setuptools>=21.0.0 in c:\python311\lib\site-packages (from super_gradients) (65.5.0)
Collecting coverage~=5.3.1 (from super_gradients)
  Using cached coverage-5.3.1.tar.gz (684 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: torchvision>=0.10.0 in c:\python311\lib\site-packages (from super_gradients) (0.15.1)
Collecting sphinx~=4.0.2 (from super_gradients)
  Using cached Sphinx-4.0.3-py3-none-any.whl (2.9 MB)
Collecting sphinx-rtd-theme (from super_gradients)
  Using cached sphinx_rtd_theme-1.2.0-py2.py3-none-any.whl (2.8 MB)
Collecting torchmetrics==0.8 (from super_gradients)
  Using cached torchmetrics-0.8.0-py3-none-any.whl (408 kB)
Requirement already satisfied: pillow>=9.2.0 in c:\python311\lib\site-packages (from super_gradients) (9.5.0)
Collecting hydra-core>=1.2.0 (from super_gradients)
  Using cached hydra_core-1.3.2-py3-none-any.whl (154 kB)
Collecting omegaconf (from super_gradients)
  Using cached omegaconf-2.3.0-py3-none-any.whl (79 kB)
Collecting super_gradients
  Using cached super_gradients-3.0.2-py3-none-any.whl (664 kB)
  Using cached super_gradients-3.0.1-py3-none-any.whl (635 kB)
  Using cached super_gradients-3.0.0-py3-none-any.whl (615 kB)
  Using cached super_gradients-2.6.0-py3-none-any.whl (11.0 MB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
  Using cached super_gradients-2.5.0-py3-none-any.whl (11.0 MB)
  Using cached super_gradients-2.2.0-py3-none-any.whl (10.9 MB)
  Using cached super_gradients-2.1.0-py3-none-any.whl (23.0 MB)
Collecting elasticsearch==7.15.2 (from super_gradients)
  Using cached elasticsearch-7.15.2-py2.py3-none-any.whl (379 kB)
Collecting CMRESHandler>=1.0.0 (from super_gradients)
  Using cached CMRESHandler-1.0.0-py3-none-any.whl (15 kB)
Collecting super_gradients
  Using cached super_gradients-2.0.1-py3-none-any.whl (19.4 MB)
  Using cached super_gradients-2.0.0-py3-none-any.whl (19.4 MB)
  Using cached super_gradients-1.7.5-py3-none-any.whl (19.3 MB)
Collecting torchmetrics==0.7.3 (from super_gradients)
  Using cached torchmetrics-0.7.3-py3-none-any.whl (398 kB)
Collecting super_gradients
  Using cached super_gradients-1.7.4-py3-none-any.whl (19.3 MB)
  Using cached super_gradients-1.7.3-py3-none-any.whl (19.3 MB)
Collecting torchmetrics>=0.5.0 (from super_gradients)
  Using cached torchmetrics-0.11.4-py3-none-any.whl (519 kB)
Collecting super_gradients
  Using cached super_gradients-1.7.2-py3-none-any.whl (19.3 MB)
  Using cached super_gradients-1.7.1-py3-none-any.whl (15.1 MB)
  Using cached super_gradients-1.6.0-py3-none-any.whl (547 kB)
  Using cached super_gradients-1.5.2-py3-none-any.whl (540 kB)
  Using cached super_gradients-1.5.1-py3-none-any.whl (540 kB)
  Using cached super_gradients-1.5.0-py3-none-any.whl (497 kB)
  Using cached super_gradients-1.4.0-py3-none-any.whl (419 kB)
  Using cached super_gradients-1.3.1-py3-none-any.whl (416 kB)
  Using cached super_gradients-1.3.0-py3-none-any.whl (415 kB)
ERROR: Cannot install super-gradients==1.3.0, super-gradients==1.3.1, super-gradients==1.4.0, super-gradients==1.5.0, super-gradients==1.5.1, super-gradients==1.5.2, super-gradients==1.6.0, super-gradients==1.7.1, super-gradients==1.7.2, super-gradients==1.7.3, super-gradients==1.7.4, super-gradients==1.7.5, super-gradients==2.0.0, super-gradients==2.0.1, super-gradients==2.1.0, super-gradients==2.2.0, super-gradients==2.5.0, super-gradients==2.6.0, super-gradients==3.0.0, super-gradients==3.0.1, super-gradients==3.0.2, super-gradients==3.0.3, super-gradients==3.0.4, super-gradients==3.0.5, super-gradients==3.0.6, super-gradients==3.0.7, super-gradients==3.0.8, super-gradients==3.0.9, super-gradients==3.1.0 and super-gradients==3.1.1 because these package versions have conflicting dependencies.

The conflict is caused by:
    super-gradients 3.1.1 depends on torch<1.14 and >=1.9.0
    super-gradients 3.1.0 depends on torch<1.14 and >=1.9.0
    super-gradients 3.0.9 depends on torch<1.14 and >=1.9.0
    super-gradients 3.0.8 depends on torch<1.14 and >=1.9.0
    super-gradients 3.0.7 depends on torch<1.14 and >=1.9.0
    super-gradients 3.0.6 depends on torch<=1.12 and >=1.9.0
    super-gradients 3.0.5 depends on torch<=1.12 and >=1.9.0
    super-gradients 3.0.4 depends on torch<=1.12 and >=1.9.0
    super-gradients 3.0.3 depends on onnxruntime
    super-gradients 3.0.2 depends on onnxruntime
    super-gradients 3.0.1 depends on onnxruntime
    super-gradients 3.0.0 depends on onnxruntime
    super-gradients 2.6.0 depends on onnxruntime
    super-gradients 2.5.0 depends on onnxruntime
    super-gradients 2.2.0 depends on onnxruntime
    super-gradients 2.1.0 depends on onnxruntime
    super-gradients 2.0.1 depends on onnxruntime
    super-gradients 2.0.0 depends on onnxruntime
    super-gradients 1.7.5 depends on onnxruntime
    super-gradients 1.7.4 depends on onnxruntime
    super-gradients 1.7.3 depends on onnxruntime
    super-gradients 1.7.2 depends on onnxruntime
    super-gradients 1.7.1 depends on onnxruntime
    super-gradients 1.6.0 depends on onnxruntime
    super-gradients 1.5.2 depends on onnxruntime
    super-gradients 1.5.1 depends on onnxruntime
    super-gradients 1.5.0 depends on onnxruntime
    super-gradients 1.4.0 depends on onnxruntime
    super-gradients 1.3.1 depends on onnxruntime
    super-gradients 1.3.0 depends on onnxruntime

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

test

Is your feature request related to a problem? Please describe.

A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like

A clear and concise description of what you want to happen.

Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

Additional context

Add any other context or screenshots about the feature request here.

I have a pretrained model; how do I load it properly and run prediction?

Describe the bug

I have a COCO pretrained model; how do I load it properly and run prediction? Below is my code. I get an error if I don't use "model.set_dataset_processing_params".

from super_gradients.common.object_names import Models
from super_gradients.training import models

model = models.get(Models.YOLO_NAS_L,
                   checkpoint_path="./yolo_nas_l_coco.pth",
                   num_classes=80)

url = "https://previews.123rf.com/images/freeograph/freeograph2011/freeograph201100150/158301822-group-of-friends-gathering-around-table-at-home.jpg"
model.set_dataset_processing_params( class_names=["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68", "69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79"],
                                    image_processor="NormalizeImage",
                                     mean=1.0, std=1.0)
model.predict(url, conf=0.25).save("output")

I am getting the following error:

File "/-/super_gradients/common/decorators/factory_decorator.py", line 27, in wrapper
    kwargs[param_name] = factory.get(kwargs[param_name])
  File "/-/super_gradients/common/factories/processing_factory.py", line 15, in get
    return super().get(conf)
  File "/-/super_gradients/common/factories/base_factory.py", line 47, in get
    return self.type_dict[conf]()
TypeError: __init__() missing 2 required positional arguments: 'mean' and 'std'

To Reproduce

Steps to reproduce the behavior:

  1. Train recipe '...'
  2. Change param '...'
  3. See error

Expected behavior

It should predict and save the output.
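
For comparison, a hedged sketch of loading the stock COCO weights by name; in recent SG versions pretrained_weights="coco" also registers the matching preprocessing, so no manual set_dataset_processing_params call should be needed (this assumes you want the standard released weights rather than a local .pth file):

from super_gradients.common.object_names import Models
from super_gradients.training import models

model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco")

url = "https://previews.123rf.com/images/freeograph/freeograph2011/freeograph201100150/158301822-group-of-friends-gathering-around-table-at-home.jpg"
model.predict(url, conf=0.25).save("output")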

Environment:

  • OS - Ubuntu
  • Relevant HW info: GPU + CUDA
  • Super Gradients version 3.1.0
  • Python 3.9

YOLOv5 Tutorial Assistance

@shaydeci @oferbaratz @ofrimasad hi, I'm working on our new Deci.ai + YOLOv5 partnership tutorial and need some help. The tutorial is at https://github.com/ultralytics/yolov5/wiki/YOLOv5-Deci-AI-Tutorial and is based on a Word document provided by Rachel that I've converted to Markdown.

I'd like you guys at Deci to review the content and help supply the 6 additional images (denoted by IMAGE_HERE). To streamline this I've pasted the markdown content directly here in this issue. If you simply edit this issue with the appropriate changes I can then transfer those over to the YOLOv5 repo.

For the images I've provided one example hyperlinked image myself. The guidelines are that they should be 1920 pixel wide JPG screenshots at <500kB each.

Thanks for the help and let me know if you have any questions!

Markdown content below

📚 This guide explains how to streamline the process of compiling and quantizing YOLOv5 🚀 to achieve better performance with the Deci platform. UPDATED 6 August 2022.

Content

  • About the Deci Platform
  • First-time setup
  • Runtime optimization and benchmarking of your model

About Deci Platform

The Deci platform includes free tools for easily managing, optimizing, and deploying models in any production environment. Deci supports all popular DL frameworks, such as TensorFlow, PyTorch, Keras and ONNX. All you need is our web-based platform or our Python client to run it from your code.

With Deci you can:

  • Improve inference performance by up to 10X
    Automatically compile and quantize your models and evaluate different production settings to achieve better latency and throughput, and to reduce model size and memory footprint on your hardware.

  • Find the best inference hardware for your application
    Benchmark your model's performance on various hardware (including edge devices) with a click of a button. Eliminate the need to manually set up and test various hardware and production settings.

  • Deploy with a few lines of code
    Leverage Deci's Python-based inference engine. Compatible with multiple frameworks and hardware types.

For more information about the Deci platform please visit Deci's website.

First-time setup

Step 1:

Go to https://console.deci.ai/sign-up and open your free account.

Deci AI signup page

Step 2:

In order to start optimizing your pre-trained YOLOv5 model, you will need to convert it into ONNX format. Please follow the simple instructions at this link to convert your model to ONNX format.

Step 3:

Go to "Lab" tab and click the "New Model" button in the top right part of the screen to upload your model in the ONNX format.

Deci AI Lab page

Follow the steps of the model upload wizard to select your target hardware as well as desired batch size and quantization level for the model compilation.

Deci AI Lab page

After filling in the relevant information, click "Start". The Deci platform will automatically perform a runtime optimization of your YOLOv5 model for the hardware you selected as well as benchmark your model on various hardware types. This process takes approximately 10 minutes.

Once done, a new row will appear on your screen underneath the baseline model you previously uploaded. Here you can see the optimized version of your pre-trained YOLOv5 model.

Deci AI Lab page

What's next?

  1. You can then download your optimized model by clicking the "Deploy" button.

Deci AI Lab page

You will then be prompted to download your model and receive the instructions on how to install and use Infery - Deci's runtime inference engine.

The use of Infery is optional. You can get the python raw files and use them with any other inference engine of your choice.

Deci AI Lab page

  2. Explore the optimization and benchmark results on the "Insights" tab.

Deci AI Lab page

AttributeError: module 'signal' has no attribute 'SIGKILL'. Did you mean: 'SIGILL'?

Describe the bug

A clear and concise description of what the bug is.

To Reproduce

Steps to reproduce the behavior:

  1. Train recipe '...'
  2. Change param '...'
  3. See error

Expected behavior

A clear and concise description of what you expected to happen.

Screenshots

If applicable, add screenshots to help explain your problem.

Environment:

  • OS [e.g. uname -s -r -m]
  • Relevant HW info, GPU + CUDA [e.g. nvidia-smi]
  • Super Gradients version
  • Python environment [e.g. pip freeze]

Additional context

Add any other context about the problem here.

Support for NLP models?

Hi Deci team!

Thanks for the great open-source code here! A quick question about NLP support.

Is your feature request related to a problem? Please describe.

I was wondering how extensible this library is to NLP tasks. In the blog post, BERT was mentioned, but it seems like all the models right now are for CV tasks.

Describe the solution you'd like

It would be great to hear what I might need to set up to get an NLP task running with super-gradients.

Describe alternatives you've considered

I've looked into the SgModule class and how the classification models are defined. But I didn't want to dive too deep before consulting your team first in case this was already supported.

Additional context

n/a

Default dataloader params have shuffle=False

Describe the bug

By default, shuffle=True is not passed to the dataloader, so a SequentialSampler gets instantiated. When training on ImageNet, this makes the network see only examples of a single class at a time, which quickly throws it out of the minimum.
Passing shuffle=True in the dataloader params solves the issue.

To Reproduce

Minimal example:

import super_gradients
from super_gradients import Trainer
from super_gradients.training import MultiGPUMode
from super_gradients.training import models
from super_gradients.training.dataloaders import imagenet_train, imagenet_val
from super_gradients.training.metrics import Accuracy

super_gradients.init_trainer()

dataloader_params = {"batch_size": 196}  # buggy params
# dataloader_params = {"batch_size": 196, "shuffle": True}  # non-buggy params


train_params = {"max_epochs": 1,
                "initial_lr": 0.001,
                "optimizer": "SGD",
                "optimizer_params": {"weight_decay": 0.0001, "momentum": 0.9, "nesterov": True},
                "loss": "cross_entropy",
                "train_metrics_list": [Accuracy()],
                "valid_metrics_list": [Accuracy()],
                "loss_logging_items_names": ["Loss"],
                "metric_to_watch": "Accuracy",
                "greater_metric_to_watch_is_better": True
                }

train_dataloader = imagenet_train(dataloader_params=dataloader_params)
val_dataloader = imagenet_val(dataloader_params=dataloader_params)

model = models.get("resnet50", pretrained_weights="imagenet", num_classes=1000)

trainer = Trainer(experiment_name="reproduce_shuffle_bug",
                  multi_gpu=MultiGPUMode.OFF,
                  device='cuda')

trainer.train(model=model,
              training_params=train_params,
              train_loader=train_dataloader,
              valid_loader=val_dataloader)

Expected behavior

Accuracy not dropping

Environment:

  • OS Linux 5.4.0-94-generic x86_64
  • Super Gradients version 3.0.0

Additional context

Add any other context about the problem here.

The mAP reported by SG is not consistent with COCO-API

Describe the bug

When training a detection model, e.g. SSD Lite MobileNet, the mAP shown in SG (TensorBoard) is much higher than the mAP returned by the COCO API.

To Reproduce

Steps to reproduce the behavior:

  1. Branch https://github.com/Deci-AI/super-gradients/tree/ssd_mobilenet
  2. Run train_from_recipe.py with coco_ssd_lite_mobilenet_v2.yaml or coco_ssd_mobilenet_v1.yaml
  3. (The model should reach ~30 mAP)

However, running the COCO-API reference evaluation returns ~0.17 mAP.

Expected behavior

The mAP information should correspond to COCO-API's mAP.

Screenshots

(screenshot)

How to train object detection model or classification model on custom dataset?

Hi,
I have image data annotated for object detection as well as for a classification task. How can I train and build an object detection model and a classification model on my own dataset?

A formal guideline would help me understand the process, thanks. Basically, I want to know how we can load our custom data into the dataloader in super-gradients (see the sketch below).
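
As a rough illustration for the detection case, a sketch using the YOLO-format dataloader factory present in recent SG versions; the paths and class names are placeholders, and the factory name should be verified against your installed version:

from super_gradients.training import dataloaders

train_loader = dataloaders.coco_detection_yolo_format_train(
    dataset_params={
        "data_dir": "/path/to/dataset",   # placeholder dataset root
        "images_dir": "train/images",     # images, relative to data_dir
        "labels_dir": "train/labels",     # YOLO-format .txt labels
        "classes": ["cat", "dog"],        # placeholder class names
    },
    dataloader_params={"batch_size": 16, "num_workers": 2},
)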

Select which GPU to train on

Is your feature request related to a problem? Please describe.

I am trying to train YOLO-NAS in an environment with multiple GPUs (devices 0 and 1), but I only want to use one. Is there a way to train the model specifically on the 2nd GPU (device=1)?

Describe the solution you'd like

Train the model using only the 2nd GPU.

Describe alternatives you've considered

Nothing really.
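
A common workaround, sketched here, is to restrict GPU visibility at the process level rather than through an SG-specific API; the environment variable must be set before anything initializes CUDA:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # expose only GPU 1; it appears as cuda:0 inside the process

from super_gradients.training import Trainer  # import frameworks only after setting the variable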
