
Comments (11)

Louis-Dupont commented on July 19, 2024

FYI, these are the YoloNAS image-processing params:

from super_gradients.training.processing import DetectionCenterPadding, StandardizeImage, NormalizeImage, ImagePermute, ComposeProcessing, DetectionLongestMaxSizeRescale

image_processor = ComposeProcessing(
    [
        DetectionLongestMaxSizeRescale(output_shape=(636, 636)),
        DetectionCenterPadding(output_shape=(640, 640), pad_value=114),
        StandardizeImage(max_value=255.0),
        ImagePermute(permutation=(2, 0, 1)),
    ]
)

The ImagePermute step, for instance, is mandatory for YoloNAS because it permutes the axes of the image from (H, W, C) to (C, H, W), and YoloNAS expects (C, H, W).
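As a quick illustration, here is a minimal numpy sketch of what that permutation does (not the SG class itself; the shapes are just example values):

import numpy as np

# A dummy HWC image, e.g. a 640x640 RGB image
image_hwc = np.zeros((640, 640, 3), dtype=np.uint8)

# Equivalent of ImagePermute(permutation=(2, 0, 1)): move the channel axis first
image_chw = np.transpose(image_hwc, (2, 0, 1))

print(image_hwc.shape)  # (640, 640, 3) -> (H, W, C)
print(image_chw.shape)  # (3, 640, 640) -> (C, H, W)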


Louis-Dupont commented on July 19, 2024

Great to hear it works. To give you some more info:

1. When you use predict on a model that was not trained using SG recipes, you get the following error:

Please call `model.set_dataset_processing_params(...)` first.

Why? Because if you trained it yourself on a custom dataset, the model has no way of knowing what processing functions you applied to the images before feeding them to the model, so you need to let it know using set_dataset_processing_params (see the sketch below).
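A minimal sketch, assuming a model fine-tuned on a hypothetical 3-class custom dataset (the checkpoint path, class names and processing steps are illustrative, not from any real recipe):

from super_gradients.common.object_names import Models
from super_gradients.training import models
from super_gradients.training.processing import ComposeProcessing, DetectionLongestMaxSizeRescale, DetectionCenterPadding, StandardizeImage, ImagePermute

# Hypothetical checkpoint of a model fine-tuned on a 3-class custom dataset
model = models.get(Models.YOLO_NAS_L, checkpoint_path="my_custom_checkpoint.pth", num_classes=3)

# Tell the model which preprocessing was used during training, so that predict() can reproduce it
model.set_dataset_processing_params(
    class_names=["cat", "dog", "bird"],  # illustrative class names
    image_processor=ComposeProcessing(
        [
            DetectionLongestMaxSizeRescale(output_shape=(636, 636)),
            DetectionCenterPadding(output_shape=(640, 640), pad_value=114),
            StandardizeImage(max_value=255.0),
            ImagePermute(permutation=(2, 0, 1)),
        ]
    ),
    iou=0.35,
    conf=0.25,
)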

2. In your case you said you are loading pretrained weights from SG, so why didn't it work?

Because you loaded using checkpoint_path, which is meant for loading a local checkpoint (i.e. usually one trained by yourself). The normal way of loading a pretrained model is to simply do: model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco")
This way you should not need to call model.set_dataset_processing_params(...).

Also note that this way you don't need to set num_classes, because you are using the model's default. If you set it to another number, the head of the model will be changed and your model won't be able to predict with it (you will need to fine-tune the new head first); a rough sketch of the difference is below.
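A rough sketch (num_classes=20 is just an illustrative value):

from super_gradients.common.object_names import Models
from super_gradients.training import models

# Recommended: load the default COCO-pretrained model (80 classes); predict() works out of the box
model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco")

# Passing a different num_classes re-initializes the detection head for that number of classes,
# so this model cannot produce meaningful predictions until the new head is fine-tuned
custom_model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco", num_classes=20)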

3. Tip/Fix

Besides, you are doing the following:

image = Image.open("./img.jpg")

# Prepare preprocess transformations
pre_proccess = transforms.Compose([
    transforms.Resize([640, 640]),
    transforms.ToTensor(),
    transforms.Normalize([.485, .456, .406], [.229, .224, .225])
])

# Run preprocess on image. unsqueeze for [Batch x Channels x Height x Width] format
transformed_image = pre_proccess(image).unsqueeze(0)

...

model.predict(transformed_image, conf=0.45).save("output")

But the simpler and more correct approach would be this:

model.predict("./img.jpg", conf=0.45).save("output")

With your approach, you apply transforms to the image before feeding it to model.predict(...), which preprocesses the images the right way anyway. So with your approach the image will be resized twice (not a problem), but it will also be normalized by you and then standardized again by the model (with StandardizeImage).

In the end, your code should simply look like this:

from super_gradients.common.object_names import Models
from super_gradients.training import models

model = models.get(Models.YOLO_NAS_L, pretrained_weights='coco')

model.predict("./img.jpg", conf=0.45).save("output")

Does that all make sense?


Louis-Dupont commented on July 19, 2024

ToTensor applies some transformations which currently break compatibility with our predict, but there are workarounds:

1.

model.predict(...) expects the input images to be "channel last". We will improve that in the near future, but currently you need to make sure it's channel last (this is why we recommend, when possible, to just pass the url/path of the image(s) so that the model handles it for you).

Solution:
add transforms.Lambda(lambda x: x.permute(1, 2, 0)) to the transform OR do it afterward:
transformed_image = pre_proccess(image).permute(1, 2, 0)

2.

The input image is expected to be scaled to [0-255], while ToTensor scales to [0-1].

Solution:
add transforms.Lambda(lambda x: x*255) OR do it afterward:
transformed_image = pre_proccess(image) * 255

2.bis

The image should be in [0-255], so you should either drop transforms.Normalize([.485, .456, .406], [.229, .224, .225]) or undo the transformation afterward (see the sketch below).
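A minimal sketch of undoing it (undo_normalize is just a hypothetical helper, using the same mean/std as in your snippet):

import torch

# Mean/std used in your Normalize call, reshaped to broadcast over a (C, H, W) tensor
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def undo_normalize(normalized_chw: torch.Tensor) -> torch.Tensor:
    # Invert transforms.Normalize(mean, std) to get back the [0-1] output of ToTensor
    return normalized_chw * std + mean

# Example with a dummy normalized CHW tensor
restored = undo_normalize(torch.randn(3, 640, 640))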

3.

The image should be of type uint8 (this is not desired behavior, and we need to fix it). We will very soon support this natively so you won't have to do the conversion yourself.

Solution:
model.predict(transformed_image.numpy().astype(np.uint8), conf=0.45)

4.

This is a bug on our side, but passing a batch is not supported (by that, I mean a tensor of shape [batch_size, C, H, W]).

Solution:
In your case, don't call .unsqueeze(0). If you do have a batch, you can simply do:
images = [image for image in images] # from [BS, C, H, W] to a list of BS tensors, each of shape [C, H, W]
Note that we process multiple images by batch anyway, so this will not affect performance; it is simply due to how we parse the input images. We will resolve it soon. A sketch is below.
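A rough sketch, assuming each image in the batch is already channel-last and uint8 as described in points 1-3 (the batch here is a dummy tensor):

import torch

# Hypothetical batch of 4 preprocessed images of shape [BS, H, W, C]
batch = torch.zeros((4, 640, 640, 3), dtype=torch.uint8)

# Split the batch into a list of BS individual images, each of shape [H, W, C]
images = [image.numpy() for image in batch]

# model.predict(images, conf=0.45).show()  # predict accepts a list of images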

All together

from super_gradients.common.object_names import Models
from super_gradients.training import models
import torchvision.transforms as transforms
from PIL import Image
import numpy as np

model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco")

image = Image.open("../../../../documentation/source/images/examples/countryside.jpg")

pre_proccess = transforms.Compose([
    transforms.Resize([640, 640]),                    # resize to the model input size
    transforms.ToTensor(),                            # PIL image -> float tensor in [0-1], (C, H, W)
    transforms.Lambda(lambda x: x.permute(1, 2, 0)),  # (C, H, W) -> (H, W, C), i.e. channel last
    transforms.Lambda(lambda x: x * 255),             # back to the [0-255] range
])

transformed_image = pre_proccess(image)

model.predict(transformed_image.numpy().astype(np.uint8), conf=0.45).show()

Of course, I encourage you to use model.predict(image_path) or model.predict(image_without_any_processing) whenever possible, but if for some reason you don't have access to the original image, this is how I would do it :)



ajithkumarmcw commented on July 19, 2024

I have tried the following image_processor steps, but I am still getting similar results. Even when I give the required parameters, it still tells me they are missing, e.g. mean and std in NormalizeImage:
['ComposeProcessing', 'ImagePermute', 'ReverseImageChannels', 'StandardizeImage', 'NormalizeImage', 'DetectionCenterPadding', 'DetectionBottomRightPadding', 'DetectionRescale', 'DetectionLongestMaxSizeRescale']


ajithkumarmcw commented on July 19, 2024

@Louis-Dupont could you kindly help me with this?


Louis-Dupont commented on July 19, 2024

Hi @ajithkumarmcw, the syntax would be the following:
image_processor={"NormalizeImage": {"mean": [1.0, 1.0, 1.0], "std": [1.0, 1.0, 1.0]}}
(the values are per-channel)
This syntax is meant to work with our config files (the recipes), but you can also do all of this with regular Python objects:

from super_gradients.training.processing import NormalizeImage
model.set_dataset_processing_params(image_processor=NormalizeImage(mean=[1.0, 1.0, 1.0], std=[1.0, 1.0, 1.0]))

That being said, this might not be required if you are using a model fine-tuned by SG. Unless you are fine-tuning the model without using SG recipes, you should just ignore this parameter, since it relates to "how to make sure the model gets the image in the right format", which is automatically handled by SG.
Besides, setting image_processor=NormalizeImage means that you won't do any image processing other than normalizing the image, which can be enough for some models but not for YoloNAS, and this would lead to an error.
What is your motivation for overriding image_processor? Is it out of curiosity or because you have a custom training?

Also, I saw you setting checkpoint_path="./yolo_nas_l_coco.pth". Are these the default pretrained weights or the weights of a model you fine-tuned?

Feel free to let me know more about your use case if this doesn't answer your question :)


ajithkumarmcw commented on July 19, 2024

Hi @Louis-Dupont, thanks for the detailed reply. My target is to run a fine-tuned model on a single image. As of now I am using "./yolo_nas_l_coco.pth", which is a pretrained weight provided by you.

from super_gradients.common.object_names import Models
from super_gradients.training import models
from super_gradients.training.processing import NormalizeImage
import torchvision.transforms as transforms
from PIL import Image

model = models.get(Models.YOLO_NAS_L,
                   checkpoint_path="./yolo_nas_l_coco.pth",
                   num_classes=80)

# Get PIL image
image = Image.open("./img.jpg")

# Prepare preprocess transformations
pre_proccess = transforms.Compose([
    transforms.Resize([640, 640]),
    transforms.ToTensor(),
    transforms.Normalize([.485, .456, .406], [.229, .224, .225])
])

# Run preprocess on image. unsqueeze for [Batch x Channels x Height x Width] format
transformed_image = pre_proccess(image).unsqueeze(0)


model.predict(transformed_image, conf=0.45).save("output")

Above is my script, but whenever I run it I get the error below:

Traceback (most recent call last):
  File "/~/yolo_nas_predict_1.py", line 29, in <module>
    model.predict(transformed_image, conf=0.45).save("output")
  File "/~/super_gradients/training/models/detection_models/customizable_detector.py", line 174, in predict
    pipeline = self._get_pipeline(iou=iou, conf=conf)
  File "/~/super_gradients/training/models/detection_models/customizable_detector.py", line 151, in _get_pipeline
    raise RuntimeError(
RuntimeError: You must set the dataset processing parameters before calling predict.
Please call `model.set_dataset_processing_params(...)` first.

That's why I added the `model.set_dataset_processing_params(...)` call.

Now, even after adding this, I am getting the following error:

model.set_dataset_processing_params( class_names=["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68", "69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79"],
                                    image_processor=NormalizeImage(mean=[0, 0, 0], std=[0, 0, 0]),
                                     iou=0.35,conf=0.25,
                                     )
Traceback (most recent call last):
  File "/~/yolo_nas/yolo_nas_predict_1.py", line 28, in <module>
    model.predict(transformed_image, conf=0.45).save("output")
  File "/~/yolonas/lib/python3.9/site-packages/super_gradients/training/models/prediction_results.py", line 196, in save
    for i, prediction in enumerate(self._images_prediction_lst):
  File "~/super_gradients/training/pipelines/pipelines.py", line 133, in _generate_prediction_result
    yield from self._generate_prediction_result_single_batch(batch_images)
  File "/~/super_gradients/training/pipelines/pipelines.py", line 151, in _generate_prediction_result_single_batch
    preprocessed_image, processing_metadata = self.image_processor.preprocess_image(image=image.copy())
  File "/~super_gradients/training/processing/processing.py", line 162, in preprocess_image
    return (image - self.mean) / self.std, None
ValueError: operands could not be broadcast together with shapes (1,3,640,640) (1,1,3) 


ajithkumarmcw commented on July 19, 2024

It worked! Now I am able to predict. This is the script which worked, thanks for your help:

from super_gradients.common.object_names import Models
from super_gradients.training import models
from PIL import Image
from super_gradients.training.processing import DetectionCenterPadding, StandardizeImage, ImagePermute, ComposeProcessing, DetectionLongestMaxSizeRescale

model = models.get(Models.YOLO_NAS_L,
                   checkpoint_path="./yolo_nas_l_coco.pth",
                   num_classes=80)

# Get PIL image
image = Image.open("./img.jpg")

model.set_dataset_processing_params( class_names=["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68", "69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79"],
                                    image_processor=ComposeProcessing(
                                            [
                                                DetectionLongestMaxSizeRescale(output_shape=(636, 636)),
                                                DetectionCenterPadding(output_shape=(640, 640), pad_value=114),
                                                StandardizeImage(max_value=255.0),
                                                ImagePermute(permutation=(2, 0, 1)),
                                            ]
                                        ),
                                     iou=0.35,conf=0.25,
                                     )
model.predict(image, conf=0.45).save("output")


ajithkumarmcw commented on July 19, 2024

Yes, thanks for the detailed explanation. Any reason for this error:
ValueError: operands could not be broadcast together with shapes (1,3,640,640) (1,1,3)
or how could it be fixed without removing the transforms()?


ajithkumarmcw commented on July 19, 2024

Thanks a lot!

