Comments (14)
Hi @sivaji123256, regarding your original question about filtering classes.
This is not available out of the box, but you can achieve it by modifying the model a bit. As you can see in the implementation of the head, it has out_channels=num_classes, which basically means you have num_classes filters, each corresponding to a class. If, in the forward method, you then take only the relevant filters (e.g., classes (= indices) [1], [15], [78]), you are effectively doing the filtering you want. Let me know if that helps.
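To make that concrete, here is a minimal sketch. It uses a NumPy stand-in for the head's class output rather than the actual super-gradients head, so the tensor shape and index values are illustrative assumptions, not the real API:

```python
import numpy as np

# Hypothetical class indices to keep (the [1], [15], [78] from the comment above)
KEEP = [1, 15, 78]

# Stand-in for the head's class output: one channel per class,
# shape [batch, anchors, num_classes]
scores = np.random.rand(1, 100, 80)

# Taking only the relevant filters in forward() amounts to this slice
filtered = scores[..., KEEP]
print(filtered.shape)  # (1, 100, 3)
```

Keeping a subset of the class channels is all the "filtering" described above: the model still predicts boxes for everything, but only the selected class scores survive.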
from super-gradients.
I think this has already been asked a couple of times here and there:
#1078 (comment)
Could you please elaborate on the use case? Do you want to detect only certain classes using a pretrained model, or something different?
Hi @BloodAxe, yes, I would like to filter a specific class from the pretrained model. I also have the following issue:
I tested on a video using Ultralytics YOLOv8 and YOLO-NAS on a Tesla T4. I am not sure why, but YOLO-NAS large runs at about 20 ms per frame, i.e., 17000 ms / 876 frames (17 s, as printed during inference; see the figure below), whereas the YOLOv8 large model runs at 11.5 ms per frame. I am not sure why YOLO-NAS is taking more time. Any idea on this?
Regarding filtering of classes - no, currently this is not supported.
Regarding inference time - the numbers we shared on the frontier plot were obtained using the TensorRT inference engine (batch size 1). This is consistent with the other models on that plot (YOLO-V8/7/6/5, PP-YOLO), so we compare apples to apples.
Native PyTorch inference is expected to be much slower. There are a number of reasons:
- Eager execution has its price.
- The PyTorch model by default uses non-fused RepVGG blocks, which is suitable for training; for inference you may want to fuse them. You can achieve this by calling `model.prep_model_for_conversion((640, 640))`, but bear in mind that you cannot train the model after you've called prep_model_for_conversion.
- Video decoding, moving images to the GPU, and visualization of results take additional time.
I'm not ready to comment on YOLOv8, as I'm not familiar with the implementation of their inference.
@BloodAxe, thanks. But first of all, can you confirm whether I was calculating the FPS value correctly, as per the previous image attached?
The processing FPS can actually be seen directly next to the loading bar. In the following example, the FPS is 39.49 it/s (i.e. fps):
Predicting Video: 100%|███████████████████████| 306/306 [00:07<00:00, 39.49it/s]
If you calculate it manually, you might get a different (wrong) value because the time in seconds (7 here) is rounded. In this example, that would give 306/7 = 43.71 fps.
Note that there are 2 fps values: the fps of the original video and the fps at which we process the video.
To avoid confusion, we chose not to write the processing fps on the video, because it could be mistaken for the video fps. The two differ because the video is not streamed: we first process it, and then visualize/save it at the original fps.
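The rounding effect described above is easy to verify with a quick calculation, using the numbers from the example progress bar:

```python
frames = 306
displayed_seconds = 7  # the progress bar rounds the elapsed time down
rate = 39.49           # the it/s value, which is the accurate processing FPS

manual_fps = frames / displayed_seconds
true_elapsed = frames / rate

print(round(manual_fps, 2))    # 43.71 -- overestimates the real 39.49 fps
print(round(true_elapsed, 2))  # 7.75  -- actual elapsed seconds, shown as 7 in the bar
```

So the manual division inflates the FPS simply because almost a full second was dropped by the rounding; trust the it/s value on the bar.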
Closing as answered
@NatanBagrov, thank you for providing the necessary references. The method you mentioned for filtering classes seems too complex for me to comprehend and implement as a beginner. I'd be grateful if you could elaborate on this topic. Thanks again.
@NatanBagrov @BloodAxe, I modified a reference script provided by @Louis-Dupont. It is only missing the feature to look for humans/persons in the frame.
```python
import cv2
import time
import asyncio
import logging
from super_gradients.training import models
from super_gradients.common.object_names import Models
from telegram import Bot
from telegram.error import TelegramError

# Set up logging
logging.basicConfig(filename='app.log',
                    filemode='w',
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                    level=logging.INFO)
logger = logging.getLogger(__name__)

TOKEN = 'YOUR_BOT_TOKEN'
CHAT_ID = 'YOUR_CHAT_ID'
bot = Bot(token=TOKEN)


async def send_message(text):
    try:
        await bot.send_message(chat_id=CHAT_ID, text=text)
    except TelegramError as e:
        logger.error("Failed to send message through Telegram with error: %s", e)


# Create a loop to run the async function in
loop = asyncio.get_event_loop()
loop.run_until_complete(send_message("Detection Program Started"))


def release_resources():
    logger.info("Releasing video capture objects and closing windows")
    for cap in (cap1, cap2, cap3, cap4, cap5):
        cap.release()
    cv2.destroyAllWindows()


start_time = time.time()
logger.info("Script start time: %s", start_time)

try:
    logger.info("Starting to load model")
    model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco").cuda()
    logger.info("Detection Model loaded successfully")
    loop.run_until_complete(send_message("Detection Model loaded successfully"))
except Exception as e:
    logger.error("Detection Model loading failed with error: %s", e)
    loop.run_until_complete(send_message("Model loading failed with error: " + str(e)))

try:
    cap1 = cv2.VideoCapture('camstream1')
    cap2 = cv2.VideoCapture('camstream2')
    cap3 = cv2.VideoCapture('camstream3')
    cap4 = cv2.VideoCapture('camstream4')
    cap5 = cv2.VideoCapture('camstream5')
except Exception as e:
    logger.error("VideoCapture initialization failed with error: %s", e)
    loop.run_until_complete(send_message("VideoCapture initialization failed with error: " + str(e)))

captures = [(cap1, 'cam_name1'), (cap2, 'cam_name2'), (cap3, 'cam_name3'),
            (cap4, 'cam_name4'), (cap5, 'cam_name5')]

while True:
    try:
        frames = []
        frame_names = []  # keep camera names aligned with the frames that were actually read
        logger.info("Reading frames from cameras")
        for cap, camera_name in captures:
            ret, frame = cap.read()
            if not ret:
                loop.run_until_complete(send_message(f"Failed to read frames from camera {camera_name}"))
            else:
                frames.append(frame)
                frame_names.append(camera_name)
        logger.info("Predicting frames")
        predictions = model.predict(frames)
        for predicted_frame, camera_name in zip(predictions, frame_names):
            cv2.imshow(camera_name, predicted_frame.draw())
        # check the quit key once per iteration so it exits the while loop,
        # not just the inner display loop
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    except Exception as e:
        logger.error("An error occurred: %s", e)
        loop.run_until_complete(send_message("An error occurred: " + str(e)))
        time.sleep(5)  # optional: wait before trying again
        continue

release_resources()
end_time = time.time()
logger.info("Script end time: %s", end_time)
execution_time = end_time - start_time
logger.info("Script executed in: %s seconds", execution_time)
print(f"Script executed in: {execution_time} seconds")
loop.run_until_complete(send_message(f"Script executed in: {execution_time} seconds"))
```
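For the missing person check, here is a hedged sketch of how it could slot into the loop above. The attribute path `predicted_frame.prediction.labels` reflects the super-gradients detection-prediction API as I understand it; verify it against your installed version. The NumPy array below is a stand-in for those labels so the logic is runnable on its own:

```python
import numpy as np

PERSON_CLASS_ID = 0  # "person" is class index 0 in COCO

# Stand-in for predicted_frame.prediction.labels (per-detection class ids)
labels = np.array([0, 2, 0])

if (labels == PERSON_CLASS_ID).any():
    print("person detected")  # here you could cv2.imwrite(...) the frame and send the alert
```

In the real loop this check would go right after `model.predict(frames)`, once per predicted frame.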
Do you intend to do the filtering as a post-process (after you get the predictions from the network), or do you want the subset of classes output directly from the model itself?
With the first option, you don't need to modify any YOLO class; you just filter out the irrelevant classes.
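A sketch of the first option, assuming each detection is a (x1, y1, x2, y2, confidence, class_id) tuple; adapt the field access to the prediction object you actually get back from the library:

```python
PERSON_CLASS_ID = 0  # "person" is class index 0 in COCO

# Stand-in detections: (x1, y1, x2, y2, confidence, class_id)
detections = [
    (10, 10, 50, 80, 0.9, 0),   # a person
    (60, 20, 90, 60, 0.8, 16),  # some other class
]

# Post-process filtering: keep only the class you care about
persons = [d for d in detections if d[5] == PERSON_CLASS_ID]
print(len(persons))  # 1
```

The model still detects everything; you simply discard the detections whose class id is not in your set before drawing or saving.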
Thank you @NatanBagrov for the quick response. Can you please explain what each option means? In my project, I want to save the frame when a human/person is detected, with the person inside the bounding boxes and nothing other than humans in the bounding boxes. For this scenario, which would be the best option?
@BloodAxe What about doing inference with another image size, as in other YOLOs?