
Comments (14)

NatanBagrov avatar NatanBagrov commented on July 19, 2024 1

Hi @sivaji123256, regarding your original question about filtering classes.
This is not available out of the box, but you can achieve it by modifying the model a bit. As you can see in the implementation of the head, it has out_channels=num_classes, which basically means you have num_classes filters, each corresponding to a class. If, in the forward method, you then take only the relevant filters (e.g., classes (= indices) [1], [15], [78]), you are effectively performing the filtering you want. Let me know if that helps.
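To illustrate the channel-selection idea above with a toy example (pure numpy; the shapes and class indices are hypothetical placeholders, not the actual head implementation):

```python
import numpy as np

# Toy stand-in for a head output with one score map per class
# (out_channels == num_classes). Shapes are illustrative only.
num_classes = 80
scores = np.random.rand(num_classes, 20, 20)  # (classes, H, W)

keep = [1, 15, 78]        # the class indices you want to keep
filtered = scores[keep]   # shape (3, 20, 20): only the chosen classes remain
```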

from super-gradients.

BloodAxe avatar BloodAxe commented on July 19, 2024 1

I think this has been already asked a couple of times here and there:
#1078 (comment)



BloodAxe avatar BloodAxe commented on July 19, 2024

Could you please elaborate on the use case? Do you want to detect only certain classes using a pretrained model, or something different?


sivaji123256 avatar sivaji123256 commented on July 19, 2024

hi @BloodAxe, yes, I would like to filter a specific class from a pretrained model. I also have the following issue:
I tested on a video using Ultralytics YOLOv8 and YOLO-NAS on a Tesla T4. I'm not sure why, but YOLO-NAS Large runs at about 20 ms per frame, i.e., 17000 ms / 876 frames (17 s was printed during inference, as shown in the figure below), whereas the YOLOv8 large model runs at 11.5 ms per frame. I'm not sure why YOLO-NAS takes more time. Any idea?
[attached screenshot: YOLO-NAS inference timing output]
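For what it's worth, the arithmetic in the comment above checks out (pure Python, using the numbers quoted in the comment):

```python
# Per-frame latency and FPS from the quoted measurements.
total_ms = 17_000                  # 17 s printed during inference
frames = 876
ms_per_frame = total_ms / frames   # ~19.4 ms per frame, matching the ~20 ms claim
fps = 1000 / ms_per_frame          # ~51.5 frames per second
```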


BloodAxe avatar BloodAxe commented on July 19, 2024

Regarding filtering of classes - no, currently this is not supported.

Regarding inference time: the numbers we shared on the frontier plot were obtained using the TensorRT inference engine (batch size 1). This is consistent with the other models on that plot (YOLOv8/7/6/5, PP-YOLO), so we are comparing apples to apples.

Native PyTorch inference is expected to be much slower, for a number of reasons:

  1. Eager execution has its price.
  2. The PyTorch model uses non-fused RepVGG blocks by default, which is suitable for training; for inference you may want to fuse them. You can do this by calling model.prep_model_for_conversion((640, 640)), but bear in mind that you cannot train the model after prep_model_for_conversion has been called.
  3. Video decoding, moving images to the GPU, and visualizing results take additional time.
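For intuition on item 2: RepVGG fusion re-parameterizes a block's branches into a single convolution for inference. A minimal numpy sketch of the closely related batch-norm folding trick, which shows why the fused and unfused forms compute the same result (illustrative math only, not super-gradients code):

```python
import numpy as np

# Fold batch-norm statistics into a preceding linear layer's weights,
# the same idea behind fusing RepVGG blocks for inference.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))                   # layer weight
b = rng.normal(size=4)                        # layer bias
gamma, beta = rng.normal(size=4), rng.normal(size=4)   # BN affine params
mean = rng.normal(size=4)                     # BN running mean
var, eps = rng.uniform(0.5, 1.5, size=4), 1e-5

scale = gamma / np.sqrt(var + eps)
W_fused = W * scale[:, None]                  # absorb BN scale into weights
b_fused = (b - mean) * scale + beta           # absorb BN shift into bias

x = rng.normal(size=3)
y_unfused = (W @ x + b - mean) * scale + beta # layer followed by BN
y_fused = W_fused @ x + b_fused               # single fused layer
```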

I can't comment on YOLOv8, as I'm not familiar with the implementation of their inference.


sivaji123256 avatar sivaji123256 commented on July 19, 2024

@BloodAxe, thanks. But first of all, can you confirm whether I calculated the FPS value correctly, as per the image attached above?


Louis-Dupont avatar Louis-Dupont commented on July 19, 2024

The processing FPS can actually be seen directly next to the loading bar. In the following example, the FPS is 39.49 it/s (i.e., fps):
Predicting Video: 100%|███████████████████████| 306/306 [00:07<00:00, 39.49it/s]
If you calculate it manually, you may get a different (incorrect) value, because the time in seconds (7 here) is rounded; in this example, that would give 306/7 ≈ 43.71 fps.

Note that there are two FPS values: that of the original video and that at which we process the video.
To avoid confusion, we chose not to write the processing FPS on the video, because it could be mistaken for the video FPS. The two differ because the video is not streamed: we first process it, and then visualize/save at the original FPS.
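The rounding effect described above can be reproduced numerically (pure Python, values taken from the example):

```python
# Why manual FPS computed from the displayed seconds differs from tqdm's it/s.
frames = 306
true_seconds = frames / 39.49        # ~7.75 s, what the progress bar measured
shown_seconds = int(true_seconds)    # the bar displays 00:07 (truncated)
manual_fps = frames / shown_seconds  # ~43.71, an overestimate
true_fps = frames / true_seconds     # 39.49, the value shown next to the bar
```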


BloodAxe avatar BloodAxe commented on July 19, 2024

Closing as answered


mriamnobody avatar mriamnobody commented on July 19, 2024

@NatanBagrov. Thank you for providing the necessary references. The method you mentioned for filtering classes seems too complex for me to comprehend and implement as a beginner. I'd be grateful if you could elaborate on this topic. Thanks again.


mriamnobody avatar mriamnobody commented on July 19, 2024

@NatanBagrov @BloodAxe. I modified a reference script provided by @Louis-Dupont. It is only missing the feature to look for humans/persons in the frame.

import cv2
import time
import asyncio
import logging
from super_gradients.training import models
from super_gradients.common.object_names import Models
from telegram import Bot
from telegram.error import TelegramError

# Set up logging
logging.basicConfig(filename='app.log', 
                    filemode='w', 
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                    level=logging.INFO)
logger = logging.getLogger(__name__)

TOKEN = 'YOUR_BOT_TOKEN'
CHAT_ID = 'YOUR_CHAT_ID'

bot = Bot(token=TOKEN)

async def send_message(text):
    try:
        await bot.send_message(chat_id=CHAT_ID, text=text)
    except TelegramError as e:
        logger.error("Failed to send message through Telegram with error: %s", e)

# Create an event loop to run the async function in
# (note: asyncio.get_event_loop() is deprecated in Python 3.10+; prefer asyncio.new_event_loop())
loop = asyncio.get_event_loop()
loop.run_until_complete(send_message("Detection Program Started"))

def release_resources():
    logger.info("Releasing video capture objects and closing windows")
    cap1.release()
    cap2.release()
    cap3.release()
    cap4.release()
    cap5.release()
    cv2.destroyAllWindows()

start_time = time.time()
logger.info("Script start time: %s", start_time)

try:
    logger.info("Starting to load model")
    model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco").cuda()
    logger.info("Detection Model loaded successfully")
    loop.run_until_complete(send_message("Detection Model loaded successfully"))
except Exception as e:
    logger.error("Detection Model loading failed with error: %s", e)
    loop.run_until_complete(send_message("Model loading failed with error: " + str(e)))

try:
    cap1 = cv2.VideoCapture('camstream1')
    cap2 = cv2.VideoCapture('camstream2')
    cap3 = cv2.VideoCapture('camstream3')
    cap4 = cv2.VideoCapture('camstream4')
    cap5 = cv2.VideoCapture('camstream5')
except Exception as e:
    logger.error("VideoCapture initialization failed with error: %s", e)
    loop.run_until_complete(send_message("VideoCapture initialization failed with error: " + str(e)))

while True:
    try:
        captures = [(cap1, 'cam_name1'), (cap2, 'cam_name2'), (cap3, 'cam_name3'), (cap4, 'cam_name4'), (cap5, 'cam_name5')]
        frames = []
        camera_names = []
        logger.info("Reading frames from cameras")

        for cap, camera_name in captures:
            ret, frame = cap.read()
            if not ret:
                loop.run_until_complete(send_message(f"Failed to read frames from camera {camera_name}"))
            else:
                frames.append(frame)
                camera_names.append(camera_name)

        logger.info("Predicting frames")
        predictions = model.predict(frames)

        # Pair each prediction with the camera it came from; zipping with
        # `captures` would mislabel frames whenever a camera read fails.
        for predicted_frame, camera_name in zip(predictions, camera_names):
            cv2.imshow(camera_name, predicted_frame.draw())
            
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    except Exception as e:
        logger.error("An error occurred: %s", e)
        loop.run_until_complete(send_message("An error occurred: " + str(e)))
        time.sleep(5) # optional: wait before trying again
        continue

release_resources()

end_time = time.time()
logger.info("Script end time: %s", end_time)

execution_time = end_time - start_time
logger.info("Script executed in: %s seconds", execution_time)
print(f"Script executed in: {execution_time} seconds")
loop.run_until_complete(send_message(f"Script executed in: {execution_time} seconds"))
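A hedged sketch of the missing person check for the script above. Prediction-object attribute names vary across super-gradients versions, so treat `predicted_frame.prediction.labels` in the commented usage as an assumption to verify; the core check is plain Python:

```python
# COCO class index 0 is commonly "person"; verify against your model's class list.
def contains_person(class_ids, person_id=0):
    """Return True if any detected class id matches the person class."""
    return any(int(c) == person_id for c in class_ids)

# Possible usage inside the display loop (attribute names are assumptions):
# if contains_person(predicted_frame.prediction.labels):
#     cv2.imwrite(f"{camera_name}_{int(time.time())}.jpg", predicted_frame.draw())

print(contains_person([16, 0]))  # a person among the detections -> True
```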


NatanBagrov avatar NatanBagrov commented on July 19, 2024

@NatanBagrov. Thank you for providing the necessary references. The method you mentioned for filtering classes seems too complex for me to comprehend and implement as a beginner. I'd be grateful if you could elaborate on this topic. Thanks again.

Do you intend to do the filtering as a post-process (after you get the predictions from the network), or do you want the subset of classes output directly from the model itself?

With the first option, you don't need to modify any YOLO class; you just filter out the irrelevant classes.
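For the post-process option, a minimal sketch (pure Python; the tuple layout and the person class index 0 are assumptions to check against your model's output format and class list):

```python
PERSON = 0  # "person" in the usual COCO ordering; verify for your model

def filter_detections(dets, wanted=frozenset({PERSON})):
    """Keep only detections whose class id is in `wanted`.
    dets: iterable of (x1, y1, x2, y2, confidence, class_id)."""
    return [d for d in dets if int(d[5]) in wanted]

dets = [(0, 0, 10, 10, 0.9, 0),    # person
        (5, 5, 20, 20, 0.8, 16)]   # some other class
print(filter_detections(dets))     # only the class-0 detection remains
```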


mriamnobody avatar mriamnobody commented on July 19, 2024

Thank you @NatanBagrov for the quick response. Can you please explain what each scenario involves? In my project, I want to save the frame when a human/person is detected, with the person inside the bounding boxes and nothing other than the human in the bounding box. Which option is best for this scenario?


Alberto1404 avatar Alberto1404 commented on July 19, 2024

Regarding filtering of classes - no, currently this is not supported.

Regarding inference time: the numbers we shared on the frontier plot were obtained using the TensorRT inference engine (batch size 1). This is consistent with the other models on that plot (YOLOv8/7/6/5, PP-YOLO), so we are comparing apples to apples.

Native PyTorch inference is expected to be much slower, for a number of reasons:

1. Eager execution has its price.

2. The PyTorch model uses non-fused RepVGG blocks by default, which is suitable for training; for inference you may want to fuse them. You can do this by calling `model.prep_model_for_conversion((640, 640))`, but bear in mind that you cannot train the model after prep_model_for_conversion has been called.

3. Video decoding, moving images to the GPU, and visualizing results take additional time.

I can't comment on YOLOv8, as I'm not familiar with the implementation of their inference.

@BloodAxe What about running inference at a different image size, as in other YOLOs?

