Comments (5)
Hi, I believe that's the right way of doing gradient accumulation. Indeed, it is a bit cumbersome, but I can't think of a more elegant way to do it. Is it working for you?
from sam.
Thank you for your reply. I'm sorry, I haven't started the experiment yet. Once I finish it, I will post the results as soon as possible.
@davda54 Hi, I ran a toy experiment with 10,000 categories. With the same parameters, SAM performs better than SGD!
By the way, is it possible to use the SAM optimizer together with PyTorch AMP? Is there anything special about how to use it? Thanks again for your work~
Awesome! :) To be honest, I have no experience with AMP, but I don't see any major issues in combining it with SAM.
Hi @davda54, I saw a recent discussion about gradient accumulation with the SAM optimizer and got a bit confused, so I would like to check: is the current SAM implementation still compatible with gradient accumulation?
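For readers following the thread: the code being discussed is not shown on this page, so the sketch below is only one plausible reading of "gradient accumulation with SAM", not the author's confirmed method. It assumes two accumulation passes per update: one over all micro-batches at the current weights to get the perturbation direction, and one over the same micro-batches at the perturbed weights to get the gradient used for the actual step. It is written in plain Python on a toy linear model, and the names `grad_mse`, `accumulated_grad`, and `sam_step` are hypothetical helpers for illustration, not part of the SAM repository.

```python
import math

def grad_mse(w, batch):
    """Gradient of the mean squared error of a linear model w.x on one micro-batch."""
    g = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            g[i] += 2.0 * err * xi / len(batch)
    return g

def accumulated_grad(w, micro_batches):
    """Average the gradients of equally sized micro-batches (the accumulation step)."""
    total = [0.0] * len(w)
    for batch in micro_batches:
        g = grad_mse(w, batch)
        total = [t + gi / len(micro_batches) for t, gi in zip(total, g)]
    return total

def sam_step(w, micro_batches, lr=0.1, rho=0.05):
    """One SAM update with gradient accumulation (illustrative sketch only)."""
    # Pass 1: accumulate the gradient at the current weights.
    g = accumulated_grad(w, micro_batches)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # Move to the perturbed point w + rho * g / ||g||.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Pass 2: accumulate the gradient again, this time at the perturbed weights.
    g_adv = accumulated_grad(w_adv, micro_batches)
    # Apply a plain SGD step to the ORIGINAL weights using the perturbed gradient.
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

# Toy data generated by y = 2*x0 - 1*x1, split into two micro-batches.
micro_batches = [
    [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0)],
    [((1.0, 1.0), 1.0), ((1.0, -1.0), 3.0)],
]

w = [0.0, 0.0]
for _ in range(100):
    w = sam_step(w, micro_batches)
```

The cost of this scheme is that every micro-batch is processed twice per update. With the repository's optimizer, the two passes would roughly correspond to calling `first_step` after the first accumulation and `second_step` after the second, but that mapping is an assumption here and should be checked against the README.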