Comments (7)
Could you try float32 first, without amp? I don't have an environment at hand right now; I'll take a look for you tomorrow.
Isn't main_grad only used with AMP?
inp = paddle.normal(mean=0, std=0.01, shape=[1, 32, 32]).astype('float32')
Using fp32 for the input is enough; amp will cast it to 16-bit itself. Changing it to fp32 makes the script run.
There are currently two ways to enable main_grad in the Paddle framework:
- wrap the optimizer with mix_precision_utils.MixPrecisionOptimizer
- call paddle.amp.decorate with master_grad=True
The two must not be enabled at the same time; in the current distributed setting the first one is recommended. The framework will unify the two usages later.
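For context, both mechanisms serve the same purpose: gradients produced in low precision are accumulated into a full-precision master buffer so repeated small updates are not lost to rounding. A minimal framework-free sketch of that idea (plain Python floats standing in for tensors; the names `Param`, `accumulate`, and `to_bf16` are illustrative, not Paddle API):

```python
import struct

def to_bf16(x: float) -> float:
    """Round a float to bfloat16 precision by truncating the mantissa.
    (Illustrative only; real frameworks use round-to-nearest.)"""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

class Param:
    """A parameter with a low-precision grad and an fp32 main_grad buffer."""
    def __init__(self, value: float):
        self.value = value
        self.main_grad = 0.0  # fp32 accumulator, the role main_grad plays

    def accumulate(self, grad_fp32: float):
        # The backward pass produces a bf16-precision gradient...
        g_bf16 = to_bf16(grad_fp32)
        # ...which is accumulated into the fp32 main_grad, so many small
        # contributions survive instead of being rounded away.
        self.main_grad += g_bf16

p = Param(1.0)
for _ in range(1000):
    p.accumulate(1e-3)
print(p.main_grad)  # close to 1.0 despite each step being bf16-rounded
```

This is only the accumulation concept; in Paddle the fp32 buffer is attached to each parameter by MixPrecisionLayer / master_grad and consumed by the optimizer.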
Following the suggestions, I used an FP32 input and wrapped the optimizer only with mix_precision_utils.MixPrecisionOptimizer, but the test still hits the same error.
Running the script below locally works fine for me; maybe our environments are not aligned?
(The script copies the nlp code directly, so a single file is enough to run it.)
```python
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
from paddle import _C_ops
from paddle.framework import core
from paddle.distributed.fleet.utils import mix_precision_utils


def is_fused_matmul_bias_supported():
    if (paddle.is_compiled_with_cuda() and not paddle.is_compiled_with_rocm()) or paddle.is_compiled_with_xpu():
        return hasattr(core.eager.ops.legacy, "fused_gemm_epilogue")
    else:
        return False


if is_fused_matmul_bias_supported():
    origin_linear = paddle.incubate.nn.functional.fused_linear
else:
    origin_linear = paddle.nn.functional.linear


class FusedLinearWithGradAdd(paddle.autograd.PyLayer):
    @staticmethod
    def forward(ctx, x, weight, bias=None, name=None):
        y = origin_linear(x, weight, bias)
        ctx.save_for_backward(x, weight, bias)
        return y

    @staticmethod
    def backward(ctx, y_grad):
        x, weight, bias = ctx.saved_tensor()
        x_grad = paddle.matmul(y_grad, weight, transpose_y=True)
        # _C_ops.fused_linear_param_grad_add(x, y_grad, dweight, dbias, multi_precision, has_bias)
        if bias is None:
            if hasattr(weight, "main_grad"):
                weight.main_grad, _ = _C_ops.fused_linear_param_grad_add(
                    x, y_grad, weight.main_grad, None, True, False
                )
                return x_grad, None
            else:
                if weight.grad is not None:
                    weight.grad, _ = _C_ops.fused_linear_param_grad_add(x, y_grad, weight.grad, None, False, False)
                    return x_grad, None
                else:
                    weight_grad, _ = _C_ops.fused_linear_param_grad_add(x, y_grad, None, None, False, False)
                    return x_grad, weight_grad

        if hasattr(weight, "main_grad") and hasattr(bias, "main_grad"):
            weight.main_grad, bias.main_grad = _C_ops.fused_linear_param_grad_add(
                x, y_grad, weight.main_grad, bias.main_grad, True, True
            )
            return x_grad, None, None
        else:
            if weight.grad is not None:
                assert bias.grad is not None
                weight.grad, bias.grad = _C_ops.fused_linear_param_grad_add(
                    x, y_grad, weight.grad, bias.grad, False, True
                )
                return x_grad, None, None
            else:
                weight_grad, bias_grad = _C_ops.fused_linear_param_grad_add(x, y_grad, None, None, False, True)
                return x_grad, weight_grad, bias_grad


def mock_layers():
    paddle.nn.functional.linear = FusedLinearWithGradAdd.apply
    if is_fused_matmul_bias_supported():
        paddle.incubate.nn.functional.fused_linear = FusedLinearWithGradAdd.apply


mock_layers()


def create_optimizer(model, use_pure_bf16, use_main_grad):
    if use_main_grad:
        assert use_pure_bf16
        model = mix_precision_utils.MixPrecisionLayer(model, dtype="bfloat16")
    optimizer = paddle.optimizer.AdamW(
        parameters=model.parameters(),
        learning_rate=0.0001,
        multi_precision=use_pure_bf16,
    )
    if use_main_grad:
        optimizer = mix_precision_utils.MixPrecisionOptimizer(optimizer)
    return optimizer


class Net(paddle.nn.Layer):
    """Network used for recompute testing."""

    def __init__(self):
        super().__init__()
        self.layer = paddle.nn.Linear(32, 32)

    def forward(self, inp):
        out = self.layer(inp)
        return out


def main():
    paddle.seed(10)
    model = Net()
    optimizer = create_optimizer(model, use_pure_bf16=True, use_main_grad=True)
    model = paddle.amp.decorate(models=model, dtype="bfloat16", level="O2", master_grad=True)
    model.train()
    for _ in range(10):
        inp = paddle.normal(mean=0, std=0.01, shape=[1, 32, 32]).astype("float32")
        inp.stop_gradient = False
        with paddle.amp.auto_cast(True, level="O2", dtype="bfloat16"):
            out = model(inp)
            loss = out.mean()
        loss.backward()
        optimizer.step()
        optimizer.clear_grad()
        print(loss)


if __name__ == "__main__":
    main()
```
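For reference, the backward pass above leans on the accumulate semantics of `fused_linear_param_grad_add`: rather than materializing a fresh gradient, it adds xᵀ·y_grad into an existing dweight buffer (and column sums of y_grad into dbias). A plain-Python sketch of those semantics, under assumed shapes and with an illustrative name (`param_grad_add` is not the actual kernel):

```python
def param_grad_add(x, y_grad, dweight=None, dbias=None, has_bias=True):
    """Plain-Python stand-in for the fused op's accumulate semantics:
    dweight += x^T @ y_grad, dbias += column sums of y_grad."""
    rows, in_f = len(x), len(x[0])
    out_f = len(y_grad[0])
    if dweight is None:
        dweight = [[0.0] * out_f for _ in range(in_f)]
    if dbias is None and has_bias:
        dbias = [0.0] * out_f
    for r in range(rows):
        for i in range(in_f):
            for j in range(out_f):
                dweight[i][j] += x[r][i] * y_grad[r][j]
    if has_bias:
        for r in range(rows):
            for j in range(out_f):
                dbias[j] += y_grad[r][j]
    return dweight, dbias

x = [[1.0, 2.0]]          # one sample, in_features=2
g = [[0.5, -1.0, 0.0]]    # upstream grad, out_features=3
dw, db = param_grad_add(x, g)
dw, db = param_grad_add(x, g, dw, db)  # second call accumulates into the same buffers
```

This is why the PyLayer can write the result straight into `weight.main_grad` / `weight.grad`: the fused op folds the gradient-add into the matmul instead of allocating a temporary.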
Resolved; closing this issue.