
Comments (10)

jwyang commented on September 18, 2024

Hi @achen46, I checked your log and the code and found that the model building did not account for the configs in the YAML file, which means the drop path rate was always set to 0.1. I have pushed a fix; could you try the training again on your side? Thanks again for raising the problem!
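For context, the usual Swin-style pattern is to read the stochastic-depth (drop path) rate from the YAML config and scale it linearly across blocks. A minimal sketch of that pattern, where the key names (`MODEL.DROP_PATH_RATE`) and helper names are assumptions for illustration, not this repo's actual schema:

```python
# Sketch of reading the stochastic-depth rate from a parsed YAML config
# instead of hardcoding it. Key names mirror common Swin-style configs
# but are assumptions here, not the repo's actual schema.

DEFAULT_DROP_PATH_RATE = 0.1  # the value that was silently used before the fix


def drop_path_rate_from_config(config: dict) -> float:
    """Return the configured drop path rate, falling back to the default."""
    return config.get("MODEL", {}).get("DROP_PATH_RATE", DEFAULT_DROP_PATH_RATE)


def per_block_rates(max_rate: float, depth: int) -> list:
    """Swin-style linear stochastic-depth schedule: the per-block rate
    grows from 0 at the first block to max_rate at the last block."""
    if depth == 1:
        return [max_rate]
    return [max_rate * i / (depth - 1) for i in range(depth)]
```

With this pattern, a config that sets `MODEL.DROP_PATH_RATE: 0.3` actually reaches the model instead of the hardcoded 0.1 default.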

from focalnet.

achen46 commented on September 18, 2024

Thank you @jwyang for the quick response. The config used can be found here:
https://drive.google.com/file/d/19CX-4jOWKtOyncW7b8WaQb5PR5Z6dRgY/view?usp=sharing


jwyang commented on September 18, 2024

Thanks! I will look into the details.


jwyang commented on September 18, 2024

Hi @achen46, thanks for sharing this issue. I never encountered this problem on my side, but I would love to investigate it based on your log. I will get back to you ASAP.


achen46 commented on September 18, 2024

Hi @jwyang, thank you so much for addressing this issue. I just started training, so I will let you know if anything goes wrong. Locally, though, while testing the code I noticed that memory usage is almost twice what it is when we use FocalNet in the original Swin Transformer repository. Using the same network (FocalNet) with the same batch size, one would expect similar VRAM usage. Do you know what might cause this high memory usage in your repository?

Thanks again
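One way to make that comparison concrete is to measure the peak allocated CUDA memory for one training step in each repo under identical settings. A small helper sketch: the `torch.cuda` calls are standard PyTorch APIs, while `step_fn` is a hypothetical closure wrapping one forward/backward pass on a batch:

```python
def peak_vram_mb(step_fn):
    """Run one training step and return peak allocated CUDA memory in MB.

    Returns None when torch or CUDA is unavailable, so the sketch stays
    runnable on CPU-only machines.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    step_fn()  # e.g. forward + backward on one batch
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / (1024 ** 2)
```

Running the same `step_fn` (same model, same batch size) in both repos should show whether the roughly 2x gap is real or a measurement artifact.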


jwyang commented on September 18, 2024

Hi @achen46, that's a really great observation! I did not compare them the way you are doing; I will inspect it. Thanks for raising this issue! Also, let me know if you still observe NaNs on your side when training FocalNet.


jwyang commented on September 18, 2024

Hi @achen46, I checked the memory usage issue on my side and found the memory cost of the two repos to be almost the same. Please provide more details if you think there is still an issue. Thanks!


jwyang commented on September 18, 2024

I am closing this issue, but feel free to reopen it if the problem persists.


achen46 commented on September 18, 2024

Hi @jwyang

I did more testing with your revised repository. The problem still exists: the gradient overflow still happens. I also tried your focalnet_small_lrf with the original Swin Transformer repository under the same setup, but could only achieve a top-1 accuracy of 83.03, which is lower than the paper reports.

In general, I am not sure what is different between this repository and Swin Transformer, but the reported performance is not reproduced. Could you comment on what is being done differently here (apart from the model, obviously)?


achen46 commented on September 18, 2024

Screenshots of training accuracy and loss (showing the gradient overflow) when using the updated code:

[accuracy plot]

[loss plot]

P.S.: I don't think drop path has anything to do with this; I use 0.3 here.
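For what it's worth, a generic way to make the overflow visible during training is to check the loss for non-finite values before stepping the optimizer. A plain-Python sketch of the idea (real mixed-precision runs usually leave this to the AMP loss scaler, which skips the update on overflow; the function name here is made up for illustration):

```python
import math


def finite_or_skip(loss_value: float, step: int) -> bool:
    """Return True when the loss is finite; report and signal a skipped
    update otherwise (NaN or inf indicates a loss/gradient overflow)."""
    if math.isfinite(loss_value):
        return True
    print(f"step {step}: non-finite loss ({loss_value}), skipping update")
    return False
```

Logging the first step at which this triggers helps narrow down whether the overflow appears early (a config/initialization issue) or late in training.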
