About gradient accumulation about sam HOT 5 CLOSED

davda54 commented on July 17, 2024 1

About gradient accumulation

from sam.

Comments (5)

davda54 commented on July 17, 2024

Hi, I believe that's the right way of doing gradient accumulation. Indeed, it is a bit cumbersome, but I can't think of a more elegant way to do it. Is it working for you?
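For readers following along, here is a minimal sketch of the idea being discussed, not the code from the original question. It assumes a toy one-parameter least-squares problem with hand-written gradients, and every name in it (`grad_sum`, `accumulated_grad`, `sam_step`) is hypothetical. Gradients are accumulated over all micro-batches at the current weight, the weight is perturbed by `rho * g / ||g||`, gradients are accumulated again over the same micro-batches at the perturbed weight, and the update is applied to the original weight. With the PyTorch optimizer in this repository, the two accumulation loops would presumably end in `first_step` and `second_step` respectively.

```python
def grad_sum(w, batch):
    """Summed gradient of 0.5 * (w * x - y)**2 over one micro-batch."""
    return sum((w * x - y) * x for x, y in batch)


def accumulated_grad(w, micro_batches, n_total):
    """Accumulate micro-batch gradients into the mean gradient of the full batch."""
    return sum(grad_sum(w, mb) for mb in micro_batches) / n_total


def sam_step(w, micro_batches, lr=0.1, rho=0.05):
    """One SAM update with gradient accumulation (toy, single scalar weight)."""
    n_total = sum(len(mb) for mb in micro_batches)

    # First pass: accumulate the gradient at w over every micro-batch.
    g = accumulated_grad(w, micro_batches, n_total)

    # Perturbation toward higher loss: e = rho * g / ||g|| (epsilon avoids 0/0).
    e = rho * g / (abs(g) + 1e-12)

    # Second pass: accumulate the gradient at w + e over the SAME micro-batches.
    g_sam = accumulated_grad(w + e, micro_batches, n_total)

    # Apply the update to the original (unperturbed) weight.
    return w - lr * g_sam


# Toy data generated by y = 2 * x, split into two micro-batches.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
micro_batches = [data[:2], data[2:]]

w = 0.0
for _ in range(50):
    w = sam_step(w, micro_batches)
```

The cumbersome part is that both passes have to see the same micro-batches, so each one is processed twice per update. In this toy setting the weight settles close to 2 rather than exactly on it, because the perturbation has fixed size `rho`.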

bobo0810 commented on July 17, 2024

> Hi, I believe that's the right way of doing gradient accumulation. Indeed, it is a bit cumbersome, but I can't think of a more elegant way to do it. Is it working for you?

Thank you for your reply. I'm sorry I haven't started the experiment yet. If I finish the experiment, I will post it as soon as possible.

bobo0810 commented on July 17, 2024

@davda54 Hi, I did a toy experiment with 10,000 categories. Under the same parameters, SAM performs better than SGD!

By the way, is it possible to use the SAM optimizer together with PyTorch AMP? Does it need any special handling? Thanks again for your work~

davda54 commented on July 17, 2024

Awesome! :) To be honest, I have no experience with AMP, but I don't see any major issues with combining it with SAM.

ODD2 commented on July 17, 2024

Hi @davda54, I saw a recent discussion about gradient accumulation with the SAM optimizer and got a bit confused, so I would like to confirm whether the current SAM is still compatible with gradient accumulation.
