Comments (42)
Try capping the number of boxes by amending the second line in the non_max_suppression function to bboxes = sorted(bboxes, key=lambda x: x[1], reverse=True)[:max_boxes]. Also, evaluate only after a couple of epochs so that the model has had a chance to converge a little first. I was running evaluation every 20 epochs for the 100examples.csv file.
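A minimal sketch of that amendment, assuming the tutorial's list-based NMS with boxes as [class_pred, prob_score, x1, y1, x2, y2]; the iou helper here is a simplified stand-in, not the repo's tensor version:

```python
def iou_xyxy(a, b):
    # Intersection-over-union of two corner-format boxes (simplified helper,
    # assumed for this sketch; the repo works on tensors instead).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def non_max_suppression(bboxes, iou_threshold, prob_threshold, max_boxes=1024):
    # Boxes are [class_pred, prob_score, x1, y1, x2, y2].
    bboxes = [b for b in bboxes if b[1] > prob_threshold]
    # The amendment: keep only the max_boxes highest-scoring candidates,
    # which bounds the quadratic suppression loop below.
    bboxes = sorted(bboxes, key=lambda x: x[1], reverse=True)[:max_boxes]
    kept = []
    while bboxes:
        chosen = bboxes.pop(0)
        bboxes = [
            b for b in bboxes
            if b[0] != chosen[0] or iou_xyxy(chosen[2:], b[2:]) < iou_threshold
        ]
        kept.append(chosen)
    return kept
```

Capping to the max_boxes highest-scoring candidates is what makes early-epoch evaluation tractable, since an untrained model floods NMS with candidates.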
from machine-learning-collection.
I have the same issue. It progressively gets slower for some reason.
Yes, I am also experiencing this; stuck at batch 0:
100%|██████████████████████████████████████████████████████████████████████████| 26/26 [00:26<00:00, 1.03s/it, loss=55]
100%|████████████████████████████████████████████████████████████████████████| 26/26 [00:22<00:00, 1.14it/s, loss=51.8]
100%|████████████████████████████████████████████████████████████████████████| 26/26 [00:23<00:00, 1.13it/s, loss=49.9]
100%|████████████████████████████████████████████████████████████████████████| 26/26 [00:22<00:00, 1.16it/s, loss=49.2]
100%|███████████████████████████████████████████████████████████████████████████████████| 26/26 [00:07<00:00, 3.64it/s]
Class accuracy is: 9.126985%
No obj accuracy is: 0.085230%
Obj accuracy is: 99.735451%
0%| | 0/26 [00:00<?, ?it/s]eval batch : 0
The way I interpret this is that all candidate boxes are over the threshold, so the evaluation takes forever. This might happen because of a very low threshold, or because at the beginning the objectness scores are very high. If you look, the no-obj accuracy is very low, which means that nearly all boxes are passed as containing an object. I don't know whether proper bias/weight initialization would fix this, or whether increasing the threshold would. One thing I tried is to run the evaluation only after 10 epochs, by which point the values have stabilized and don't lead to so many positive boxes.
@ckyrkou Thank you for the reply. Currently:
- doing evaluation after 20 epochs
- increased NMS_IOU_THRESH to 0.75
I am still getting 10647 bounding boxes, as below:
Class accuracy is: 35.317459%
No obj accuracy is: 6.079705%
Obj accuracy is: 69.444443%
0%| | 0/26 [00:00<?, ?it/s]
nme 0
bboxes , 10647
Any thoughts?
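For context, 10647 is exactly the number of raw predictions YOLOv3 produces for a 416×416 input (3 anchors on each of the 13×13, 26×26 and 52×52 grids), so every candidate box is reaching NMS unfiltered:

```python
# 3 anchors per cell on each of YOLOv3's three prediction scales.
total = sum(3 * s * s for s in (13, 26, 52))
print(total)  # 10647
```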
The no-obj accuracy is still very low. You need to change CONF_THRESHOLD for that. In the original config it is set to 0.05; I used CONF_THRESHOLD = 0.4. You can try that.
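A toy illustration of why CONF_THRESHOLD matters here; the uniform random scores are a stand-in for an untrained model's objectness outputs, not real network output:

```python
import random

random.seed(0)
scores = [random.random() for _ in range(10647)]  # stand-in objectness scores

# The higher the cutoff, the fewer candidates survive into NMS.
for thresh in (0.05, 0.4, 0.6):
    kept = sum(s > thresh for s in scores)
    print(f"threshold {thresh}: {kept} boxes kept")
```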
@ckyrkou Thank you, I tried with CONF_THRESHOLD = 0.6 and it was working all right.
@beomgonyu can you please try this and see if that works :-)
@guruprasaad123 Good to hear! Did you manage to reproduce the accuracies reported in the repo for pascal_voc?
@ckyrkou I tried to reproduce the accuracy (> 78 mAP) for pascal_voc, but I couldn't get to that level as of now.
This is what I am getting after 20 epochs:
Class accuracy is: 54.754784%
No obj accuracy is: 100.000000%
Obj accuracy is: 0.000000%
MAP: 0.0
and I am still running the script; if I get any improvement in accuracy I will let you know for sure.
Thanks. I tried running it for 100 epochs, achieving up to 46 mAP. I was wondering if running for more would increase performance. I noticed that the parameters in the video are different from what is actually in the repo.
@ckyrkou Cool, I noticed too that the parameters are different. I was also wondering what the ideal parameters would be to get max mAP > 78. I am also running for more than 100 epochs; if I see an improvement I will let you know for sure.
@aningineer Thank you, I will surely try that out!
Thanks, that works @aningineer; I had to set max_boxes = 1024.
Did you eventually manage to get the reported over-70% mAP?
Nope, not yet; I am still trying, @ckyrkou.
Same here. I haven't been able to reproduce the 78% as described.
I solved this issue by skipping evaluation early in training.
It seems that there are so many predicted boxes at the start that evaluation takes a very long time.
Once training is steady, evaluation works well.
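A sketch of that fix: gate the evaluation behind a warm-up epoch count. The function names and epoch numbers here are placeholders, not the repo's exact loop:

```python
EVAL_START_EPOCH = 10  # skip evaluation while predictions are still noisy
EVAL_EVERY = 10
NUM_EPOCHS = 40

evaluated = []  # records which epochs actually ran evaluation

def train_one_epoch(epoch):
    pass  # stand-in for the real training step

def evaluate(epoch):
    evaluated.append(epoch)  # stand-in for the mAP/accuracy checks

for epoch in range(NUM_EPOCHS):
    train_one_epoch(epoch)
    if epoch >= EVAL_START_EPOCH and epoch % EVAL_EVERY == 0:
        evaluate(epoch)

print(evaluated)  # evaluation only runs at epochs 10, 20, 30
```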
> I solved this issue by removing evaluation at initial time. After training is steady, evaluation works well.
@beomgonyu What final mAP do you get for Pascal VOC?
Hey there, I'm trying to train on 100 examples. I was stuck on the same problem, and at least now it works. Still, none of the accuracy numbers seem to change, and mAP is fixed at 0.
Any idea why?
> hey there, i'm trying to train on 100 examples. [...] every accuracy seems to not change and MAP is fixed at 0. Any idea why?
Did you notice that there are differences between the code in the video and the repository? For example in the config file.
Since I read that there were problems, I trained for 50 epochs without checking the validation and saved the checkpoint file. Then I resumed training using that file as pre-trained weights, and now I can check accuracy (mAP).
For the moment, as long as my loss stays above 0 I keep training to try to reach mAP 0.78.
I also added:
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=3, verbose=True)
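For completeness, ReduceLROnPlateau needs the monitored metric passed into step() each epoch; a minimal, self-contained sketch (the tiny linear model and constant loss are placeholders):

```python
import torch

model = torch.nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.1, patience=3
)

for epoch in range(12):
    val_loss = 1.0  # placeholder: a validation loss that has plateaued
    scheduler.step(val_loss)  # unlike most schedulers, step() takes the metric

# After enough flat epochs the learning rate has been reduced.
print(optimizer.param_groups[0]["lr"])
```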
> Did you notice that there are differences between the code of the video and the repository? For example in the config file.
Yes, I've noticed, and I used the repository to check my code. But I didn't find any impactful difference aside from CONF_THRESHOLD: in the repository it is 0.05, but below 0.6 (even with max_boxes = 1024) the training is painfully slow. Even so, after 10 epochs mAP is something like 0.0, or 1e-5.
I'm rerunning on train.csv and test.csv, but I'm pretty sure that even on 100 examples or 8 examples, mAP should go up to 0.9 after a few epochs. The fact that it doesn't is driving me insane, because I don't know why the sum of TP is always a tensor of 0.
My loss is very often NaN, and I really can't understand why. I'm testing with a conf_threshold of 0.5.
I suspect there is some problem in the code, but if the same code is working for you, I don't understand what more to do.
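On the NaN losses: a hedged debugging sketch (a generic loop, not the tutorial's code) that catches the first non-finite loss and clips gradients, a common stabilizer:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)  # placeholder for the detector
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(5):
    x = torch.randn(8, 4)
    loss = (model(x) ** 2).mean()  # placeholder loss
    if not torch.isfinite(loss):
        raise RuntimeError(f"loss became NaN/Inf at step {step}")
    optimizer.zero_grad()
    loss.backward()
    # Clipping bounds the update size; exploding gradients are a frequent
    # cause of NaN in YOLO-style losses (exp() on the box dimensions).
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
    optimizer.step()

print("finished without NaN")
```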
> I suspect there is some problem in the code, but if the same code is working for you, i don't understand what to do more
@SimoDB90,
tonight I will upload a checkpoint where I have a loss around 1.50 and share it via Gmail, and you can start from there.
> Tonight I will upload a checkpoint where I have a loss around 1.50 and you can start from there.
Thank you a lot!
Just one thing: are the utils functions right, or is there something wrong in the repository?
> Just one thing. Are the utils functions right? Or is there something wrong in the repository?
I haven't checked them in detail, but I think they are fine; at least the model is converging, despite taking its time:
- 9 min per epoch
- batch_size = 8 (I only have 6 GB of VRAM; with more than that it fails)
> At least the model is converging despite taking its time: 9 min per epoch, batch_size = 8 (I only have 6 GB of VRAM, more than that fails).
I'm working on Google Colab Pro... my GTX 960 can only run enough to test whether the code works :-)
Just to know, how many epochs did you need before seeing a non-zero value of mAP?
> Just to know, how many epochs do you need to see some non-zero value of mAP?
I read in this thread that there were some problems at the beginning of training, so I trained without checking the accuracy, and after 50 epochs I started checking it.
I think I already started getting positive values when the loss was below 20; it's an estimate, because I didn't check at the beginning.
Fine, thank you. Maybe I stopped the training too early, then.
Well, I tried to train on 100examples.csv for 100 epochs. It never converged: always mAP 0.0 and obj_accuracy 0%, while noobj_accuracy is always 100%.
I don't know where, but I suppose there is a problem in the code.
I compared against the repository carefully and there are no differences.
I tried with conf_threshold 0.6, map_iou_threshold 0.5, nms_iou_threshold 0.45, learning rate 1e-5, and 0 weight decay.
> well, i tried to train the 100examples.csv for 100 epochs. Never converged. Always map 0.0 and obj_accuracy 0%, while noobj_accuracy is always 100%.
Good morning,
stay calm, everything will resolve itself; it must be one of those bugs that sometimes come up.
I'm on vacation and I couldn't upload the checkpoint; the hotel's WiFi is slow and the file is big. I will try to upload it later.
Update: the checkpoint file is 740 MB. I can't upload it over the hotel WiFi, too slow :(
At the end of next week I will be at home and I will upload it.
By any chance, did you try training with a smaller batch size and in full precision (fp32)?
> well, i tried to train the 100examples.csv for 100 epochs. Never converged. Always map 0.0 and obj_accuracy 0%.
@SimoDB90, try using these pre-trained weights (loss = 1.4) to continue your training (batch_size = 8, lr = 1e-6):
https://drive.google.com/file/d/1utjhWJ-KB11MsWNhWsE_J3xsh9QDMsLL/view?usp=sharing
I did some tests and got the best mAP with CONF_THRESHOLD = 0.05, as it is in the config file in the GitHub repository.
I'm wondering if it's worth continuing training until I have a smaller loss. How cool would it be to do this with a vision transformer.
> @SimoDB90, try to use these pre-trained weights (loss = 1.4) to continue your training (batch_size = 8, lr = 1e-6).
Hi! I'm back from holiday. I tried your weights, but I get this traceback:
RuntimeError: Error(s) in loading state_dict for YOLOv3:
Missing key(s) in state_dict: "layers.0.conv.bias", "layers.1.conv.bias", "layers.2.layers.0.0.conv.bias", "layers.2.layers.0.1.conv.bias", "layers.3.conv.bias", "layers.4.layers.0.0.conv.bias", "layers.4.layers.0.1.conv.bias", "layers.4.layers.1.0.conv.bias", "layers.4.layers.1.1.conv.bias", "layers.5.conv.bias", "layers.6.layers.0.0.conv.bias", "layers.6.layers.0.1.conv.bias", "layers.6.layers.1.0.conv.bias", "layers.6.layers.1.1.conv.bias", "layers.6.layers.2.0.conv.bias", "layers.6.layers.2.1.conv.bias", "layers.6.layers.3.0.conv.bias", "layers.6.layers.3.1.conv.bias", "layers.6.layers.4.0.conv.bias", "layers.6.layers.4.1.conv.bias", "layers.6.layers.5.0.conv.bias", "layers.6.layers.5.1.conv.bias", "layers.6.layers.6.0.conv.bias", "layers.6.layers.6.1.conv.bias", "layers.6.layers.7.0.conv.bias", "layers.6.layers.7.1.conv.bias", "layers.7.conv.bias", "layers.8.layers.0.0.conv.bias", "layers.8.layers.0.1.conv.bias", "layers.8.layers.1.0.conv.bias", "layers.8.layers.1.1.conv.bias", "layers.8.layers.2.0.conv.bias", "layers.8.layers.2.1.conv.bias", "layers.8.layers.3.0.conv.bias", "layers.8.layers.3.1.conv.bias", "layers.8.layers.4.0.conv.bias", "layers.8.layers.4.1.conv.bias", "layers.8.layers.5.0.conv.bias", "layers.8.layers.5.1.conv.bias", "layers.8.layers.6.0.conv.bias", "layers.8.layers.6.1.conv.bias", "layers.8.layers.7.0.conv.bias", "layers.8.layers.7.1.conv.bias", "layers.9.conv.bias", "layers.10.layers.0.0.conv.bias", "layers.10.layers.0.1.conv.bias", "layers.10.layers.1.0.conv.bias", "layers.10.layers.1.1.conv.bias", "layers.10.layers.2.0.conv.bias", "layers.10.layers.2.1.conv.bias", "layers.10.layers.3.0.conv.bias", "layers.10.layers.3.1.conv.bias", "layers.11.conv.bias", "layers.12.conv.bias", "layers.13.layers.0.0.conv.bias", "layers.13.layers.0.1.conv.bias", "layers.14.conv.bias", "layers.15.pred.0.conv.bias", "layers.16.conv.bias", "layers.18.conv.bias", "layers.19.conv.bias", "layers.20.layers.0.0.conv.bias", "layers.20.layers.0.1.conv.bias", 
"layers.21.conv.bias", "layers.22.pred.0.conv.bias", "layers.23.conv.bias", "layers.25.conv.bias", "layers.26.conv.bias", "layers.27.layers.0.0.conv.bias", "layers.27.layers.0.1.conv.bias", "layers.28.conv.bias", "layers.29.pred.0.conv.bias".
I suspect my network is different from yours, but I used the same one as in the YouTube video (the same as the repository; I double-checked).
Any idea?
> (...) I suspect my network is different from yours, but I used the same as in the YouTube video. Any idea?
I'm going to upload the code I used, for debugging purposes.
Maybe I can post the code of my network?
On Mon, 6 Sep 2021, 16:18, JoaoProductDev wrote:
(...)
Good afternoon,
this is the code I used. I think it's the same as the repository; maybe I have tweaked something, but nothing relevant, I would say.
Try using this code with my weights.
OK... I want to kill myself... after 2 hours, I found the problem: in the ConvBlock, I had set the batch-norm activation default argument to "False" instead of "True". Now your weights work, and at this point I think even the training will work well. You were so kind, and you can't imagine how much you helped me! Thank you soooooooooooooooo much!
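For anyone else who hits the missing "conv.bias" keys above: in the tutorial's CNNBlock the conv's bias is tied to the batch-norm flag (bias=not bn_act), so flipping the default from True to False creates extra bias parameters and the state_dict no longer matches. A sketch of the pattern:

```python
import torch
import torch.nn as nn

class CNNBlock(nn.Module):
    # bn_act=True (the intended default) means BatchNorm follows the conv,
    # so the conv is created WITHOUT a bias term (BN's beta replaces it).
    # Accidentally defaulting bn_act to False creates convs WITH bias, so
    # checkpoint keys like "layers.0.conv.bias" go missing on load.
    def __init__(self, in_channels, out_channels, bn_act=True, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=not bn_act, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels)
        self.leaky = nn.LeakyReLU(0.1)
        self.use_bn_act = bn_act

    def forward(self, x):
        if self.use_bn_act:
            return self.leaky(self.bn(self.conv(x)))
        return self.conv(x)
```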
No problem, it happens to everyone. Glad I could help.
Have you ever wondered whether CNNs are the way to go? If an architecture were robust enough, it wouldn't be sensitive to a one-pixel attack; CNNs lack context.
Is the Vision Transformer the future of computer vision?
Sorry for my musings, maybe they're not the subject of this thread.
I'm not very into transformers, to be honest. I started doing some serious DL study just a couple of months ago. In the future I'll go deeper, I hope. For now, I'm super happy that I understand the code and my net works.
best of luck
> @ckyrkou Thank you, i tried with CONF_THRESHOLD = 0.6, it was working alright. @beomgonyu you can please try this and see if that works :-)
Bro, can you please share the config parameter values for which you get good no-obj and mAP accuracy?
> Can you please share your config parameter values for which you get fine no-obj and mAP accuracy?
I tried to train on 100 examples yesterday, and the no_obj_acc was painfully 0-2.5% and mAP 0.02 or less, with a learning rate of 0.005. I don't know if it has to be changed to make the training do something; I tried different values of the other parameters, but it doesn't work.
The weights a couple of posts above do work, though. But if you find a way to train it on your own, please let me know. I have struggled with YOLO for several weeks.