Hello Sir. I am opening this issue as a continuation of Issue …

Enquiry regarding Performance of Training about ganav-offroad HOT 14 CLOSED

joeljosejjc commented on September 25, 2024

Enquiry regarding Performance of Training

Comments (14)

rayguan97 commented on September 25, 2024

A quick note: you can pull the latest code, since I just pushed an updated version. This version puts much less pressure on the GPU.

I have not seen the same behavior with GA-Nav, but it does happen sometimes in general. You can adjust the learning rate, or simply choose the best model among all checkpoints. I did sometimes see that the best-performing GA-Nav model on RELLIS-3D came from the second- or third-to-last checkpoint, with a 1 or 2 point difference, but never a huge degradation like this one. Usually, when you change the batch size, you should adjust the learning rate as well for the best performance.
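
A minimal sketch of one common way to adjust the learning rate when the batch size changes: the linear scaling heuristic, which keeps the ratio of learning rate to total batch size (samples_per_gpu × number of GPUs) constant. This is a generic rule of thumb, not necessarily how GA-Nav's learning rate was tuned; the reference values (lr=0.01 for 4 samples on one GPU) and the function names are hypothetical.

```python
def total_batch_size(samples_per_gpu: int, num_gpus: int) -> int:
    """Samples per optimizer step, assuming no gradient accumulation."""
    return samples_per_gpu * num_gpus


def scale_learning_rate(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Linear scaling heuristic: keep lr / total batch size constant."""
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


if __name__ == "__main__":
    # Hypothetical reference point: lr=0.01 tuned for 4 samples on 1 GPU.
    base_lr, base_bs = 0.01, total_batch_size(4, 1)
    for spg, gpus in [(2, 1), (8, 2), (32, 1)]:
        bs = total_batch_size(spg, gpus)
        lr = scale_learning_rate(base_lr, base_bs, bs)
        print(f"{spg} samples x {gpus} GPU(s) -> batch {bs}, lr {lr:.4g}")
```

Linear scaling is only a starting point: for large batches it is often paired with a short learning-rate warmup, and the scaled value may still need tuning.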

Correct: to the best of my recollection, I used 4 samples on one GPU, and in a previous version of the code I also had a training setup on two 2080 GPUs. Have you made sure nothing else is running on the GPU, or checked the memory cost of each sample (so you have a rough idea of whether it can almost fit 4 or is not even close)? With the latest code, the computation cost has been greatly reduced, so you may be able to run more samples on one GPU now.

Based on my experience, when the number of samples is large (either BN with many samples on one GPU, or SyncBN across multiple GPUs with a large total number of samples), the results are usually better, but not by a large margin (around 0.3% mIoU). With the latest code, I have tried 8 samples per GPU on two 2080 GPUs, 32 samples on a single 3090 GPU, and 32 samples on multiple A5000 GPUs (×2 or ×4; I've lost that information). Here are the detailed results from my records.

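GPU (setup) | aAcc (%) | mIoU (%) | mAcc (%)
--- | --- | --- | ---
2080 (2 GPUs, 8 samples/GPU) | 95.37 | 88.76 | 93.0
3090 (1 GPU, 32 samples) | 95.33 | 88.92 | 93.32
A5000 (×2 or ×4, 32 samples) | 95.66 | 89.08 | 93.55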
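
For reference, a minimal sketch of how the three metric columns are usually defined, assuming mmsegmentation-style definitions computed from a per-class confusion matrix (pixels with the ignore label are assumed to be excluded beforehand): aAcc is overall pixel accuracy, mAcc is the mean per-class accuracy, and mIoU is the mean per-class intersection over union. IoU is skipped only when a class is absent from both labels and predictions, and accuracy is skipped when a class has no labelled pixels, which roughly mirrors a NaN-ignoring mean. The function name is hypothetical.

```python
from typing import Dict, List


def segmentation_metrics(conf: List[List[int]]) -> Dict[str, float]:
    """conf[i][j] = number of pixels with ground-truth class i predicted as class j."""
    n = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(n))
    ious: List[float] = []
    accs: List[float] = []
    for c in range(n):
        tp = conf[c][c]                            # intersection for class c
        gt = sum(conf[c])                          # pixels labelled as c
        pred = sum(conf[r][c] for r in range(n))   # pixels predicted as c
        union = gt + pred - tp
        if union > 0:                              # IoU undefined if c never appears
            ious.append(tp / union)
        if gt > 0:                                 # accuracy undefined without labels
            accs.append(tp / gt)
    return {
        "aAcc": 100.0 * correct / total,
        "mIoU": 100.0 * sum(ious) / len(ious),
        "mAcc": 100.0 * sum(accs) / len(accs),
    }
```

Because mIoU averages over classes while aAcc counts pixels, a model can keep a high aAcc while mIoU drops if a few small classes are misclassified.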

joeljosejjc commented on September 25, 2024

Ok sir, I will try implementing the latest version of the repository and see if increasing samples per gpu improves the performance, and if that doesn't work, I'll try adjusting the learning rate as well to optimise the training phase.

Once again, thank you sir for your suggestions and inferences, which helped a lot in resolving the issues I faced in the training phase of the GANav model.

PaulDurai25 commented on September 25, 2024

Thanks in advance. I have a question about dataset processing: which data should be placed in the image/ folder, the ground-truth data or the same label_id data that is in the annotation folder?

rayguan97 commented on September 25, 2024

You don’t need to worry about that. You can just leave both in the annotation folder and follow the processing steps.

PaulDurai25 commented on September 25, 2024

Thank you very much. I have tested the latest updated files using python ./tools/test.py ./trained_models/rellis_group6/ganav_rellis_6.py ./trained_models/rellis_group6/ganav_rellis_6.pth --eval=mIoU with samples_per_gpu=3, but I couldn't get the results you obtained: the model misclassifies and does not produce proper results. Can you please guide me on where I am going wrong? The results I obtained during testing are shown below.

rayguan97 commented on September 25, 2024

Can you describe in more detail how you downloaded the .pth file? Which folder in the Google Drive did you use? It's possible that you are running the legacy model, in which case the weights cannot be loaded exactly.

It's also possible that you are not processing the data correctly. Can you make sure that you are following the steps in the README file correctly?

To double-check, I just went through the whole process from a fresh git pull: I downloaded the data and the .pth file and processed the data. Here is what I got.

PaulDurai25 commented on September 25, 2024

Thank you very much for your suggestion. May I know which files are used as annotation data? I have used the pylon_camera_node_label_id data. Is that correct?
I downloaded the GA-Nav-rellis folder for the .pth and .py files.

rayguan97 commented on September 25, 2024

That is correct.

Did you run python ./tools/convert_datasets/rellis_relabel[x].py?
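
Judging by the file names (rellis_group6, ganav_rellis_6), the relabel step appears to regroup the original RELLIS-3D label ids into a smaller set of classes. The following is a minimal sketch of that idea, plus a sanity check that a processed annotation contains only expected values. The mapping, the ignore value 255 and the function names are all hypothetical and are not taken from rellis_relabel[x].py.

```python
from typing import Dict, List

IGNORE_INDEX = 255  # hypothetical value for ids that have no group


def remap_labels(label: List[List[int]], mapping: Dict[int, int],
                 ignore_index: int = IGNORE_INDEX) -> List[List[int]]:
    """Map every original class id to its group id; unmapped ids become ignore_index."""
    return [[mapping.get(v, ignore_index) for v in row] for row in label]


def unexpected_values(label: List[List[int]], num_classes: int,
                      ignore_index: int = IGNORE_INDEX) -> List[int]:
    """Return values that are neither a valid group id nor the ignore value."""
    allowed = set(range(num_classes)) | {ignore_index}
    return sorted({v for row in label for v in row} - allowed)


# Made-up example: four original ids collapsed into three groups.
HYPOTHETICAL_MAPPING = {0: 0, 1: 1, 3: 1, 4: 2}
```

On a real annotation you would first load the PNG as a 2D array of ids (for example with Pillow); if unexpected_values returns anything for a converted file, the conversion step may not have run as intended.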

PaulDurai25 commented on September 25, 2024

Yes, I ran it.

PaulDurai25 commented on September 25, 2024

Sir, I created a new setup, freshly installed all packages, and ran the code following your steps, but I still couldn't get the results you achieved. Can you please suggest where I am going wrong? I used the ganav_rellis_6.py and ganav_rellis_6.pth files downloaded from your updated post. I am only getting the following result. Thanks in advance.

PaulDurai25 commented on September 25, 2024

As you suggested, the problem may be the data processing method. I converted the .png files in the image folder into .jpg, because test.py expects the image folder data to be .jpg. Could this be the reason? Your suggestion, please.

rayguan97 commented on September 25, 2024

A couple of things:

1. Have you made sure to pull the new code? In your previous comment, you mentioned that the weights could not be fully loaded because of a mismatch. That might be the cause of the degradation.
2. Can you try the RUGD dataset and see if you have the same issue? That would tell us whether the trouble is only with the RELLIS-3D dataset or with the environment setup.
3. One possibility is that some packages are not fully supported on Windows, which might cause degradation; I have never run this code on Windows. But this possibility is very low, and in that case errors are more likely than a performance degradation. As a last resort, do you have access to an Ubuntu machine, or perhaps a Linux virtual environment?
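
For the first point, a minimal sketch of how to see exactly which weights fail to load: compare parameter names and shapes between the model and the checkpoint. To stay self-contained, it works on plain dicts mapping names to shapes. With PyTorch you would typically build them as {k: tuple(v.shape) for k, v in model.state_dict().items()} and from the dict returned by torch.load(path, map_location="cpu"), where mmcv-style checkpoints usually keep the weights under a "state_dict" key (an assumption worth checking for the downloaded file). The function name and the parameter names in the example are hypothetical.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def compare_state_dicts(model: Dict[str, Shape],
                        ckpt: Dict[str, Shape]) -> Dict[str, List[str]]:
    """Report which parameters would not load cleanly from ckpt into model."""
    missing = sorted(k for k in model if k not in ckpt)      # model expects, file lacks
    unexpected = sorted(k for k in ckpt if k not in model)   # file has, model lacks
    shape_mismatch = sorted(
        k for k in model if k in ckpt and tuple(model[k]) != tuple(ckpt[k])
    )
    return {"missing": missing, "unexpected": unexpected,
            "shape_mismatch": shape_mismatch}
```

Empty lists for all three keys mean the names and shapes line up; many missing keys, or shape mismatches in the segmentation head, would be consistent with a legacy checkpoint being loaded into the new code.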

I still need more information to be able to help, including how exactly you set it up and the folder structure if possible.

@joeljosejjc Did you see a similar issue, or have trouble reproducing the reported results? I would really appreciate it if you could share some feedback, sir. I have no trouble reproducing the results on my end.

PaulDurai25 commented on September 25, 2024

Thank you for your suggestions, sir. I think your third point may be the issue: I am running on a Windows machine with the Conda prompt. I will try it on Linux and let you know how it goes. Thanks once again.

rayguan97 commented on September 25, 2024

No worry, let me know how it goes.

Regarding your question about the images: for RELLIS-3D, the original images are .jpg and the processed annotations should be .png; for RUGD, both are .png files.
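
A minimal sketch for checking that the dataset folders follow the file types described in this reply: .jpg images and .png annotations for RELLIS-3D, and .png for both in RUGD. The folder names image/ and annotation/ follow the ones mentioned earlier in this thread; the function name and dataset keys are hypothetical. The check is deliberately case-sensitive, since a suffix-based file lookup would typically treat .JPG and .jpg differently.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# (image suffix, annotation suffix) per dataset, following the reply above.
EXPECTED_SUFFIXES: Dict[str, Tuple[str, str]] = {
    "rellis": (".jpg", ".png"),
    "rugd": (".png", ".png"),
}


def find_suffix_mismatches(root: Path, dataset: str) -> List[Path]:
    """Return files under root/image and root/annotation with an unexpected suffix."""
    img_suffix, ann_suffix = EXPECTED_SUFFIXES[dataset]
    bad: List[Path] = []
    for sub, suffix in (("image", img_suffix), ("annotation", ann_suffix)):
        folder = root / sub  # hypothetical layout: <root>/image, <root>/annotation
        if not folder.is_dir():
            continue
        for path in sorted(folder.rglob("*")):
            # Case-sensitive on purpose: ".JPG" is reported as a mismatch.
            if path.is_file() and path.suffix != suffix:
                bad.append(path)
    return bad
```

For example, find_suffix_mismatches(Path("data/rellis"), "rellis") would list any .png files left in the image folder after a conversion like the one described above (the path here is only illustrative).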
