
Comments (10)

jwyang commented on September 18, 2024

Hi @achen46, I checked your log and the code and found that the model building did not account for the configs in the YAML file, which means the drop path rate was always set to 0.1. I have pushed a fix; could you try the training again on your side? Thanks again for raising the problem!
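For context, the usual Swin-style pattern is to read the stochastic-depth (drop path) rate from the YAML config and scale it linearly across blocks. A minimal sketch of that pattern, where the key names (`MODEL.DROP_PATH_RATE`) and helper names are assumptions for illustration, not this repo's actual schema:

```python
# Sketch of reading the stochastic-depth rate from a parsed YAML config
# instead of hardcoding it. Key names mirror common Swin-style configs
# but are assumptions here, not the repo's actual schema.

DEFAULT_DROP_PATH_RATE = 0.1  # the value that was silently used before the fix


def drop_path_rate_from_config(config: dict) -> float:
    """Return the configured drop path rate, falling back to the default."""
    return config.get("MODEL", {}).get("DROP_PATH_RATE", DEFAULT_DROP_PATH_RATE)


def per_block_rates(max_rate: float, depth: int) -> list:
    """Swin-style linear stochastic-depth schedule: the per-block rate
    grows from 0 at the first block to max_rate at the last block."""
    if depth == 1:
        return [max_rate]
    return [max_rate * i / (depth - 1) for i in range(depth)]
```

With this pattern, a config that sets `MODEL.DROP_PATH_RATE: 0.3` actually reaches the model instead of the hardcoded 0.1 default.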

from focalnet.

achen46 commented on September 18, 2024

Thank you @jwyang for the quick response. The config used can be found here:
https://drive.google.com/file/d/19CX-4jOWKtOyncW7b8WaQb5PR5Z6dRgY/view?usp=sharing


jwyang commented on September 18, 2024

Thanks! I will look into the details.


jwyang commented on September 18, 2024

Hi @achen46, thanks for sharing this issue. I never encountered this problem on my side, but I would love to investigate it based on your log. I will get back to you ASAP.


achen46 commented on September 18, 2024

Hi @jwyang, thank you so much for addressing this issue. I just started training, so I will let you know if anything goes wrong. Locally, though, while testing the code I noticed that memory usage is almost twice what it is when we use FocalNet in the original Swin Transformer repository. Using the same network (FocalNet) with the same batch size, one would expect similar VRAM usage. Do you know what might cause this high memory usage in your repository?

Thanks again
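One way to make that comparison concrete is to measure the peak allocated CUDA memory for one training step in each repo under identical settings. A small helper sketch: the `torch.cuda` calls are standard PyTorch APIs, while `step_fn` is a hypothetical closure wrapping one forward/backward pass on a batch:

```python
def peak_vram_mb(step_fn):
    """Run one training step and return peak allocated CUDA memory in MB.

    Returns None when torch or CUDA is unavailable, so the sketch stays
    runnable on CPU-only machines.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    step_fn()  # e.g. forward + backward on one batch
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / (1024 ** 2)
```

Running the same `step_fn` (same model, same batch size) in both repos should show whether the roughly 2x gap is real or a measurement artifact.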


jwyang commented on September 18, 2024

Hi @achen46, that's a really great observation! I did not compare them the way you are doing; I will inspect it. Thanks for raising this issue! Also, let me know if you still observe NaNs on your side when training FocalNet.


jwyang commented on September 18, 2024

Hi @achen46, I checked the memory usage issue on my side and found the memory cost of the two repos to be almost the same. Please provide more details if you think there is still an issue. Thanks!


jwyang commented on September 18, 2024

I am closing this issue, but feel free to reopen it if the problem persists.


achen46 commented on September 18, 2024

Hi @jwyang

I did more testing with your revised repository. The problem still exists: the gradient overflow still happens. I also tried your focalnet_small_lrf with the original Swin Transformer repository under the same setup, but could only achieve a top-1 accuracy of 83.03, which is lower than the paper reports.

In general, I am not sure what is different between this repository and Swin Transformer, but the reported performance is not reproduced. Could you comment on what is being done differently here (apart from the model, obviously)?


achen46 commented on September 18, 2024

Screenshots of training accuracy and loss (showing the gradient overflow) when using the updated code:

[accuracy plot]

[loss plot]

P.S.: I don't think drop path has anything to do with this; I use 0.3 here.
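For what it's worth, a generic way to make the overflow visible during training is to check the loss for non-finite values before stepping the optimizer. A plain-Python sketch of the idea (real mixed-precision runs usually leave this to the AMP loss scaler, which skips the update on overflow; the function name here is made up for illustration):

```python
import math


def finite_or_skip(loss_value: float, step: int) -> bool:
    """Return True when the loss is finite; report and signal a skipped
    update otherwise (NaN or inf indicates a loss/gradient overflow)."""
    if math.isfinite(loss_value):
        return True
    print(f"step {step}: non-finite loss ({loss_value}), skipping update")
    return False
```

Logging the first step at which this triggers helps narrow down whether the overflow appears early (a config/initialization issue) or late in training.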
