Comments (10)
Hi, @achen46, I checked your log and the code, and found that the model building did not account for the configs in the YAML file, which means the drop path rate was always set to 0.1. I have pushed a fix. Could you try the training again on your side? Thanks again for raising the problem!
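To illustrate the kind of bug described above, here is a minimal sketch of a model builder that reads the drop path rate from the parsed config instead of silently keeping a default. The names `build_model_kwargs`, `DEFAULT_DROP_PATH_RATE`, and the `MODEL.DROP_PATH_RATE` config layout are assumptions made for illustration; this is not the actual fix pushed to the repository.

```python
from types import SimpleNamespace

# Hypothetical default, matching the 0.1 value mentioned in the comment above.
DEFAULT_DROP_PATH_RATE = 0.1


def build_model_kwargs(config):
    """Collect model keyword arguments from a parsed config object.

    Falls back to the default only when the config does not define
    MODEL.DROP_PATH_RATE (an assumed layout, not the repository's API).
    """
    model_cfg = getattr(config, "MODEL", None)
    rate = getattr(model_cfg, "DROP_PATH_RATE", DEFAULT_DROP_PATH_RATE)
    return {"drop_path_rate": float(rate)}


# Stand-in for a config loaded from a YAML file.
config = SimpleNamespace(MODEL=SimpleNamespace(DROP_PATH_RATE=0.3))
kwargs = build_model_kwargs(config)
print(kwargs["drop_path_rate"])
```

Logging the resolved value at startup makes it easy to confirm that the setting from the YAML file actually reaches the model constructor.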
from focalnet.
Thank you @jwyang for the quick response. The config used can be found here:
https://drive.google.com/file/d/19CX-4jOWKtOyncW7b8WaQb5PR5Z6dRgY/view?usp=sharing
Thanks! I will look into the details.
Hi, @achen46, thanks for sharing this issue. I have never encountered this problem on my side, but would love to investigate it based on your log. I will get back to you ASAP.
Hi @jwyang, thank you so much for addressing this issue. I just started training, so I will let you know if anything goes wrong. But while testing the code locally, I noticed that the memory usage is almost twice what it is when we use FocalNet in the original Swin Transformer repository. Using the same network, FocalNet, with the same batch size, one would expect similar VRAM usage. Do you know what might cause this high memory usage in your repository?
Thanks again
Hi, @achen46, that's a really great observation! I have not compared them the way you are doing. I will inspect it. Thanks for raising this issue! Also, let me know if you still observe NaN on your side when training FocalNet.
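One simple way to catch the NaN problem mentioned above as soon as it appears is to check the scalar loss after every step. This is a minimal sketch in plain Python; the helper name `check_loss` is hypothetical and is not part of the FocalNet or Swin Transformer code.

```python
import math


def check_loss(loss_value, step):
    """Return the loss unchanged, or raise if it is NaN or infinite.

    `loss_value` is assumed to be a Python float, for example the
    result of calling `.item()` on a scalar loss tensor.
    """
    if not math.isfinite(loss_value):
        raise FloatingPointError(
            f"non-finite loss {loss_value!r} at step {step}"
        )
    return loss_value


# Usage: a finite loss passes through, a NaN loss stops the run.
check_loss(0.73, step=10)
try:
    check_loss(float("nan"), step=11)
except FloatingPointError as err:
    print(err)
```

Stopping at the first non-finite step records exactly where the run diverged, which is useful information to attach to a report like this one.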
Hi, @achen46, I checked the memory usage issue on my side and found that the memory cost of both repos is almost the same. Please provide more details if you think there is still an issue. Thanks!
I am closing this issue but feel free to reopen it if the issue still exists.
Hi @jwyang
I did more testing with your revised repository. The problem still exists: the overflow still happens. I also tried your focalnet_small_lrf with the original Swin Transformer repository using the same setup, but could only achieve a top-1 accuracy of 83.03, which is lower than the paper reports.
In general, I am not sure what is different between this repository and Swin Transformer, but the reported performance is not achieved. Could you comment more on what is being done differently here (except for the model, obviously)?
Screenshot of training accuracy and loss (due to gradient overflow) when using the updated code:
P.S.: I don't think the drop path rate has anything to do with this. I use 0.3 here.