
Comments (42)

aningineer avatar aningineer commented on May 25, 2024

Try capping the number of boxes by amending the second line of the non_max_suppression function to bboxes = sorted(bboxes, key=lambda x: x[1], reverse=True)[:max_boxes]. Also, evaluate only after a couple of epochs, so the model has had a chance to converge a little first. I was running evaluation every 20 epochs for the 100examples.csv file.
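For reference, a minimal self-contained sketch of this cap. The iou_midpoint helper and the list-based box format [class, confidence, x, y, w, h] are assumptions matching the repo's style, not its exact code:

```python
def iou_midpoint(a, b):
    """IoU of two boxes given as (x_center, y_center, w, h) floats."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / (union + 1e-6)

def non_max_suppression(bboxes, iou_threshold, threshold, max_boxes=1024):
    # each box: [class_pred, confidence, x, y, w, h]
    bboxes = [b for b in bboxes if b[1] > threshold]
    # the suggested cap: keep only the top-scoring candidates so an
    # untrained model cannot flood NMS with thousands of boxes
    bboxes = sorted(bboxes, key=lambda x: x[1], reverse=True)[:max_boxes]
    kept = []
    while bboxes:
        best = bboxes.pop(0)
        kept.append(best)
        bboxes = [
            b for b in bboxes
            if b[0] != best[0] or iou_midpoint(best[2:], b[2:]) < iou_threshold
        ]
    return kept
```

Without the `[:max_boxes]` slice, the O(n²) suppression loop is what stalls evaluation in the first epochs, when nearly every anchor passes the confidence threshold.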

from machine-learning-collection.

ckyrkou avatar ckyrkou commented on May 25, 2024

I have the same issue. It progressively gets slower for some reason.


guruprasaad123 avatar guruprasaad123 commented on May 25, 2024

Yes, I am also experiencing this; stuck at batch 0:

100%|██████████████████████████████████████████████████████████████████████████| 26/26 [00:26<00:00,  1.03s/it, loss=55]
100%|████████████████████████████████████████████████████████████████████████| 26/26 [00:22<00:00,  1.14it/s, loss=51.8]
100%|████████████████████████████████████████████████████████████████████████| 26/26 [00:23<00:00,  1.13it/s, loss=49.9]
100%|████████████████████████████████████████████████████████████████████████| 26/26 [00:22<00:00,  1.16it/s, loss=49.2]
100%|███████████████████████████████████████████████████████████████████████████████████| 26/26 [00:07<00:00,  3.64it/s]
Class accuracy is: 9.126985%
No obj accuracy is: 0.085230%
Obj accuracy is: 99.735451%
  0%|                                                                                            | 0/26 [00:00<?, ?it/s]eval batch :  0


ckyrkou avatar ckyrkou commented on May 25, 2024

The way I interpret this is that all candidate boxes are over the threshold, so the evaluation takes forever. This can happen because of a very low threshold, or because the objectness scores are very high at the beginning of training. If you look, the no-obj accuracy is very low, which means almost every box is passed along as containing an object. I do not know whether proper bias/weight initialization can fix this, or whether increasing the threshold is enough. One thing I tried is to run the evaluation only after 10 epochs, once the values have stabilized and no longer produce so many positive boxes.


guruprasaad123 avatar guruprasaad123 commented on May 25, 2024

@ckyrkou Thank you for the reply. Currently:

  • evaluation runs only after 20 epochs
  • NMS_IOU_THRESH is increased to 0.75

I am still getting 10647 bounding boxes, as below:

Class accuracy is: 35.317459%
No obj accuracy is: 6.079705%
Obj accuracy is: 69.444443%
  0%|                                                                                            | 0/26 [00:00<?, ?it/s]
nme  0
bboxes ,  10647

Any thoughts?


ckyrkou avatar ckyrkou commented on May 25, 2024

The no-obj accuracy is still very low. You need to raise CONF_THRESHOLD for that. In the original config it is set to 0.05; I used CONF_THRESHOLD = 0.4. You can try that.
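For concreteness, the relevant config.py constants might look like this. The names come from the repo's config file; the values are the ones suggested in this thread, not the repo defaults:

```python
# Values discussed in this thread; treat them as a starting point for
# debugging stalled evaluation, not as the repo's recommended settings.
CONF_THRESHOLD = 0.4    # repo default is 0.05; raising it prunes low-confidence boxes
NMS_IOU_THRESH = 0.45   # overlap above which same-class boxes are suppressed in NMS
MAP_IOU_THRESH = 0.5    # IoU used when matching predictions to ground truth for mAP
```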


guruprasaad123 avatar guruprasaad123 commented on May 25, 2024

@ckyrkou Thank you, I tried with CONF_THRESHOLD = 0.6 and it is working alright now.
@beomgonyu could you please try this and see if it works? :-)


ckyrkou avatar ckyrkou commented on May 25, 2024

@guruprasaad123 Good to hear! Did you manage to reproduce the accuracies reported in the repo for pascal_voc?


guruprasaad123 avatar guruprasaad123 commented on May 25, 2024

@ckyrkou I tried to reproduce the reported accuracy (mAP > 78) for pascal_voc, but I couldn't get to that level so far.
This is what I am getting after 20 epochs:

Class accuracy is: 54.754784%
No obj accuracy is: 100.000000%
Obj accuracy is: 0.000000%

MAP: 0.0

I am still running the script; if I see any improvement in accuracy I will let you know for sure.


ckyrkou avatar ckyrkou commented on May 25, 2024

Thanks. I ran it for 100 epochs and achieved up to 46 mAP. I was wondering whether running longer would increase performance. I noticed that the parameters in the video are different from what is actually in the repo.


guruprasaad123 avatar guruprasaad123 commented on May 25, 2024

@ckyrkou Cool, I noticed the parameters are different too. I was also wondering what the ideal parameters would be to reach mAP > 78. I am running for more than 100 epochs as well; if I see improvement I will let you know for sure.


guruprasaad123 avatar guruprasaad123 commented on May 25, 2024

@aningineer Thank you, I will surely try that out!


guruprasaad123 avatar guruprasaad123 commented on May 25, 2024

Thanks, that works @aningineer; I had to set max_boxes = 1024.


ckyrkou avatar ckyrkou commented on May 25, 2024

Did you eventually manage to get the reported mAP of over 70%?


guruprasaad123 avatar guruprasaad123 commented on May 25, 2024

Nope, not yet; I am still trying, @ckyrkou.


aningineer avatar aningineer commented on May 25, 2024

Same here. Haven't been able to reproduce 78% as described.


beomgonyu avatar beomgonyu commented on May 25, 2024

I solved this issue by skipping evaluation early in training.
There seem to be so many predicted boxes at the start that evaluation takes a very long time.
Once training is steady, evaluation works fine.
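This workaround can be sketched as a small gate in the training loop. The constant names and the helper below are hypothetical, not from the repo, which calls its evaluation helpers unconditionally:

```python
WARMUP_EPOCHS = 10  # skip evaluation while predictions are still noisy (assumption)
EVAL_EVERY = 10     # then evaluate only every N epochs (assumption)

def should_evaluate(epoch: int) -> bool:
    """Gate evaluation: wait for warm-up, then run it only periodically."""
    return epoch >= WARMUP_EPOCHS and epoch % EVAL_EVERY == 0
```

In the training script this would wrap the calls to the accuracy/mAP helpers, e.g. `if should_evaluate(epoch): check_class_accuracy(...)`.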


ckyrkou avatar ckyrkou commented on May 25, 2024

(...)

@beomgonyu What final mAP do you get for Pascal VOC?


SimoDB90 avatar SimoDB90 commented on May 25, 2024

Hey there, I'm trying to train on the 100 examples. I was stuck with the same problem and at least now it works. Still, none of the accuracies seem to change and mAP is fixed at 0.
Any idea why?


JoaoCH avatar JoaoCH commented on May 25, 2024

(...)

@SimoDB90,

Did you notice that there are differences between the code in the video and the repository? For example in the config file.

Since I read that there were problems, I trained for 50 epochs without checking the validation and saved a checkpoint. Then I resumed training from that checkpoint as pre-trained weights, and now I can check the accuracy (mAP).

For the moment, as long as my loss stays above 0, I keep training to try to reach mAP 0.78.

I also added:
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=3, verbose=True)
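If it helps, ReduceLROnPlateau has to be stepped with the monitored metric once per epoch; a minimal sketch, where the stand-in model and the stagnant-loss loop are illustrative only:

```python
import torch

# Stand-in model/optimizer; the real script would use the YOLOv3 model
# and its configured optimizer instead.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.1, patience=3
)

# step() takes the validation loss; after `patience` epochs with no
# improvement, the learning rate is multiplied by `factor`.
for val_loss in [5.0] * 10:   # a stagnant loss, to trigger the LR drop
    scheduler.step(val_loss)

print(optimizer.param_groups[0]["lr"])  # lower than the initial 1e-4
```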


SimoDB90 avatar SimoDB90 commented on May 25, 2024

(...)

Yes, I've noticed, and I used the repository to clean up my code. But I didn't find any impactful difference aside from CONF_THRESHOLD. In the repository it is 0.05, but below 0.6 (even with max_boxes = 1024) the training is painfully slow. And after 10 epochs mAP is something like 0.0, or on the order of 1e-5.
I'm trying to rerun on train.csv and test.csv, but I'm pretty sure that even on 100 examples or 8 examples mAP should go up to 0.9 after a few epochs. The fact that it doesn't is driving me insane, because I don't know why the sum of TP is always a tensor of zeros.
My loss is very often NaN, and I really can't understand why.
I'm testing with conf_threshold = 0.5.

I suspect there is some problem in the code, but if the same code is working for you, I don't know what else to do.


JoaoCH avatar JoaoCH commented on May 25, 2024

I suspect there is some problem in the code, but if the same code is working for you, i don't understand what to do more

@SimoDB90,
tonight I will upload to Gmail and share a checkpoint where I have a loss of around 1.50, and you can start from there.


SimoDB90 avatar SimoDB90 commented on May 25, 2024

(...)

Thank you a lot!
Just one thing: are the utils functions right, or is something in the repository wrong?


JoaoCH avatar JoaoCH commented on May 25, 2024

(...)

I haven't checked them in detail, but I think they are fine; at least the model is converging, despite taking its time:

  • ~9 min per epoch
  • batch_size = 8 (I only have 6 GB of VRAM; more than that fails)


SimoDB90 avatar SimoDB90 commented on May 25, 2024

(...)

I'm working on Google Colab Pro... my GTX 960 can only run enough to test whether the code works :-)
Just to know: how many epochs did you need before seeing a non-zero mAP?


JoaoCH avatar JoaoCH commented on May 25, 2024

(...)

I read in this thread that there were some problems at the beginning of training, so I trained without checking the accuracy, and after 50 epochs I started checking it.

I think I was already getting positive values by the time the loss was below 20; it's an estimate, because I didn't check at the beginning.


SimoDB90 avatar SimoDB90 commented on May 25, 2024

Fine, thank you. Maybe I stopped the training too early, then.


SimoDB90 avatar SimoDB90 commented on May 25, 2024

Well, I tried to train on 100examples.csv for 100 epochs. It never converged: always mAP 0.0 and obj accuracy 0%, while no-obj accuracy is always 100%.
I don't know where, but I suppose there is a problem in the code.
I went through the repository carefully and there are no differences.
I tried conf_threshold = 0.6, map_iou_threshold = 0.5, nms_iou_threshold = 0.45, a learning rate of 1e-5, and 0 weight decay.


JoaoCH avatar JoaoCH commented on May 25, 2024

(...)

Good morning,

stay calm, everything will resolve itself; it must be one of those bugs that come up sometimes.
I'm on vacation and I couldn't upload the checkpoint; the hotel's Wi-Fi is slow and the file is big.
I will try to upload it later.

Update:
the checkpoint file is 740 MB. I can't upload it over the hotel Wi-Fi, too slow :(
At the end of next week I will be home and will upload it.

By any chance, did you try training with a smaller batch size and in full precision (fp32)?


JoaoCH avatar JoaoCH commented on May 25, 2024

(...)

@SimoDB90, try using these pre-trained weights (loss = 1.4) to continue your training:
(batch_size=8, lr=1e-6)
https://drive.google.com/file/d/1utjhWJ-KB11MsWNhWsE_J3xsh9QDMsLL/view?usp=sharing

I did some tests and got the best mAP with CONF_THRESHOLD = 0.05, as it is in the config file in the GitHub repository.

I'm wondering if it's worth continuing training until the loss is smaller. How cool would it be to do this with a vision transformer?



SimoDB90 avatar SimoDB90 commented on May 25, 2024

(...)

Hi! I'm back from holiday. I tried your weights, but I get this traceback:

RuntimeError: Error(s) in loading state_dict for YOLOv3:
Missing key(s) in state_dict: "layers.0.conv.bias", "layers.1.conv.bias", "layers.2.layers.0.0.conv.bias", "layers.2.layers.0.1.conv.bias", "layers.3.conv.bias", "layers.4.layers.0.0.conv.bias", "layers.4.layers.0.1.conv.bias", "layers.4.layers.1.0.conv.bias", "layers.4.layers.1.1.conv.bias", "layers.5.conv.bias", "layers.6.layers.0.0.conv.bias", "layers.6.layers.0.1.conv.bias", "layers.6.layers.1.0.conv.bias", "layers.6.layers.1.1.conv.bias", "layers.6.layers.2.0.conv.bias", "layers.6.layers.2.1.conv.bias", "layers.6.layers.3.0.conv.bias", "layers.6.layers.3.1.conv.bias", "layers.6.layers.4.0.conv.bias", "layers.6.layers.4.1.conv.bias", "layers.6.layers.5.0.conv.bias", "layers.6.layers.5.1.conv.bias", "layers.6.layers.6.0.conv.bias", "layers.6.layers.6.1.conv.bias", "layers.6.layers.7.0.conv.bias", "layers.6.layers.7.1.conv.bias", "layers.7.conv.bias", "layers.8.layers.0.0.conv.bias", "layers.8.layers.0.1.conv.bias", "layers.8.layers.1.0.conv.bias", "layers.8.layers.1.1.conv.bias", "layers.8.layers.2.0.conv.bias", "layers.8.layers.2.1.conv.bias", "layers.8.layers.3.0.conv.bias", "layers.8.layers.3.1.conv.bias", "layers.8.layers.4.0.conv.bias", "layers.8.layers.4.1.conv.bias", "layers.8.layers.5.0.conv.bias", "layers.8.layers.5.1.conv.bias", "layers.8.layers.6.0.conv.bias", "layers.8.layers.6.1.conv.bias", "layers.8.layers.7.0.conv.bias", "layers.8.layers.7.1.conv.bias", "layers.9.conv.bias", "layers.10.layers.0.0.conv.bias", "layers.10.layers.0.1.conv.bias", "layers.10.layers.1.0.conv.bias", "layers.10.layers.1.1.conv.bias", "layers.10.layers.2.0.conv.bias", "layers.10.layers.2.1.conv.bias", "layers.10.layers.3.0.conv.bias", "layers.10.layers.3.1.conv.bias", "layers.11.conv.bias", "layers.12.conv.bias", "layers.13.layers.0.0.conv.bias", "layers.13.layers.0.1.conv.bias", "layers.14.conv.bias", "layers.15.pred.0.conv.bias", "layers.16.conv.bias", "layers.18.conv.bias", "layers.19.conv.bias", "layers.20.layers.0.0.conv.bias", "layers.20.layers.0.1.conv.bias", 
"layers.21.conv.bias", "layers.22.pred.0.conv.bias", "layers.23.conv.bias", "layers.25.conv.bias", "layers.26.conv.bias", "layers.27.layers.0.0.conv.bias", "layers.27.layers.0.1.conv.bias", "layers.28.conv.bias", "layers.29.pred.0.conv.bias".

I suspect my network is different from yours, but I used the same one as in the YouTube video (the same as the repository; I double-checked).
Any idea?


JoaoCH avatar JoaoCH commented on May 25, 2024

(...)

I'm going to upload the code I used, for debugging purposes.



JoaoCH avatar JoaoCH commented on May 25, 2024

I can write the code of my network, maybe?

(...)

Good afternoon,

this is the code I used. I think it's the same as the repository; maybe I have tweaked something, but nothing relevant, I would say.

Try to use this code with my weights.

yolov3.zip



JoaoCH avatar JoaoCH commented on May 25, 2024

Ok... I want to kill myself... after two hours I found the problem: in the ConvBlock, I had set the default argument for the batch-norm flag to False instead of True. Now your weights work, and at this point I think even the training will work. You were so kind, and you can't imagine how much you helped me! Thank you so much!
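For anyone hitting the same "Missing key(s) ... conv.bias" error: a sketch of the conv block in question (modeled on the repo's CNNBlock; details may differ slightly) shows why this default matters. With bn_act=True the conv is created without a bias, so checkpoints saved from a correctly configured model contain no conv.bias entries, and they fail to load into a model built with bn_act=False:

```python
import torch
import torch.nn as nn

class CNNBlock(nn.Module):
    # bn_act must default to True: when batch norm follows the conv, the
    # conv's own bias is redundant and is omitted (bias=False), so the
    # saved state_dict has no "conv.bias" keys at all.
    def __init__(self, in_channels, out_channels, bn_act=True, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=not bn_act, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels)
        self.leaky = nn.LeakyReLU(0.1)
        self.use_bn_act = bn_act

    def forward(self, x):
        if self.use_bn_act:
            return self.leaky(self.bn(self.conv(x)))
        return self.conv(x)
```

Flipping the default to False gives every conv a bias parameter, which is exactly the set of keys load_state_dict then reports as missing.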

No problem, it happens to everyone. Glad I could help.

Have you ever wondered whether CNNs are the way to go?

If an architecture were robust enough, it wouldn't be sensitive to a one-pixel attack; CNNs lack context.
Is the Vision Transformer the future of computer vision?

Sorry for my thoughts; maybe this isn't the subject of this thread.



JoaoCH avatar JoaoCH commented on May 25, 2024

I'm not very into transformers, to be honest. I only started seriously studying DL a couple of months ago. In the future I'll go deeper, I hope. For now, I'm super happy that I understand the code and my net works.

best of luck



mmuneeburahman avatar mmuneeburahman commented on May 25, 2024

(...)

Bro, can you please share the config parameter values with which you got a good number of objects and mAP accuracy?


SimoDB90 avatar SimoDB90 commented on May 25, 2024

(...)

I tried to train on 100 examples yesterday and the no-obj accuracy was a painful 0-2.5%, with mAP 0.02 or less, at a learning rate of 0.005. I don't know whether it has to be changed to make the training do anything. I tried different values for the other parameters, but it doesn't work.

The weights a couple of posts above do work, though. But if you find a way to train it on your own, please let me know; I have been struggling with YOLO for several weeks.

