riscv / riscv-bfloat16
Home Page: https://jira.riscv.org/browse/RVG-122
License: Creative Commons Attribution 4.0 International
Page six of the specification talks about "encopding", probably a typo of "encoding".
There are no scalar operands on the instructions in Zvfbfmin. Could we shift the Zfbfmin dependency to Zvfbfwma, which does have scalar operands?
In line with the other issue (#51), there is a question mark over rounding-mode support. This one is less clear-cut: Google has not defined which rounding mode its hardware uses, Intel uses RNE, ARM uses RTO or makes it selectable, and NVIDIA uses RTZ.
My argument is that, given the original intent of trading precision for hardware efficiency, the natural choice would be RTZ rounding (as that is free: it is a pure truncation, as the sketch below shows). The current draft specifies a selectable rounding mode, which is consistent with the other floating-point extensions but would be quite costly in comparison to just RTZ.
Is there a way to enable a choice to do just RTZ (or even make it the default)? I guess further sub-extensions would work for options, but that becomes a bit unwieldy.
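For context, here is a minimal bit-level sketch of the two conversions in C (illustrative only, not from the spec; the helper names are mine). RTZ is a pure truncation of the FP32 encoding, while RNE needs an incrementer spanning the kept upper half:

#include <stdint.h>
#include <string.h>

/* RTZ: drop the low 16 bits of the FP32 encoding -- no arithmetic at all. */
static uint16_t fp32_to_bf16_rtz(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)(bits >> 16);
}

/* RNE: add a rounding bias before truncating; the carry can ripple through
 * the entire upper half, so this costs an incrementer that RTZ avoids.
 * (NaN inputs would need a separate check, omitted here.) */
static uint16_t fp32_to_bf16_rne(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits += 0x7FFFu + ((bits >> 16) & 1u); /* round to nearest, ties to even */
    return (uint16_t)(bits >> 16);
}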
The vfwmaccbf16 and vfwmulbf16 descriptions do not specify that SEW=16 is required, unlike the vector conversion instructions. Is this correct?
The v0.0.1 document does not mention NaN boxing. I can see cases where it should be done, and others where the upper bits should simply be ignored. It could be instruction-specific. Please clarify the behavior.
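For reference, NaN boxing in RISC-V means a narrower value held in a wider F register must have all upper bits set to one, and instructions that check boxing treat anything else as a canonical NaN. A minimal sketch for FLEN=32 (illustrative only; the helper names are mine):

#include <stdbool.h>
#include <stdint.h>

/* Box a BF16 value into a 32-bit F-register image: upper 16 bits all ones. */
static uint32_t nanbox_bf16(uint16_t x) { return 0xFFFF0000u | (uint32_t)x; }

/* A properly boxed 16-bit value has every upper bit set. */
static bool is_boxed_16(uint32_t reg) { return (reg >> 16) == 0xFFFFu; }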
Please can you confirm that BF16 operations are intended to be potentially mixed with FP16 operations without any CSR modifications?
I previously proposed #34, which addressed this. 94fcf6b added extra text on vector extension dependencies, however:
In specification section 2.2.5, the supported rounding modes are listed, with the exception of "DYN".
Is the dynamic rounding mode reserved for FCVT.BF16.S and FCVT.S.BF16?
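For reference, the rounding-mode encodings defined by the base F extension are:

/* RISC-V frm encodings from the unprivileged spec.  DYN (0b111) is only
 * meaningful in an instruction's rm field, where it means "use the frm CSR". */
enum frm { RNE = 0, RTZ = 1, RDN = 2, RUP = 3, RMM = 4, DYN = 7 };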
The PDF specification does not seem to specify full bit patterns to decode the described instructions - many fields just have names. Is this defined somewhere else, or are precise decodes still undefined?
Thanks.
https://github.com/riscv/riscv-bfloat16/blob/main/doc/insns/fcvt_BF16_S.adoc mentions an S.B16 field, but it seems the other mention has been commented out. Should this mention be changed to BF16.S?
The following says that a BF16 implementation must implement FP16:
The BFloat16 extensions depend on the half-precision floating-point extensions (Zfh and Zfhmin), which in turn rely on the single-precision floating-point extension (F).
Can you please clarify this requirement? Is the motivation to get loads and stores? This appears to be overkill.
The instructions that convert to BFLOAT16 (FCVT.BF16.S, vfncvtbf16.f.f.w) do not say that Underflow can be signalled. Is this correct?
In IEEE-754, Underflow should be signalled if the result of a floating-point operation is both tiny (nonzero and smaller in magnitude than the minimum normal) and inexact.
Should conversion of a subnormal FP32 argument which rounds away fraction bits and produces a subnormal or zero result therefore signal Underflow? (See the worked example below.)
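A concrete bit-level instance of the case in question (illustrative only):

#include <stdint.h>

/* 0x00008000 encodes a nonzero FP32 subnormal whose significand lies
 * entirely in the discarded low half.  Converting to BF16 gives +0
 * (under truncation and under RNE alike), so the result is both tiny
 * and inexact -- the IEEE-754 Underflow condition described above. */
uint32_t in  = 0x00008000u;          /* nonzero FP32 subnormal */
uint16_t out = (uint16_t)(in >> 16); /* == 0x0000, BF16 +zero   */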
The descriptions of the vector instructions here refer to the RM field, but vector instructions have no such field. The descriptions should say "the current rounding mode in fcsr", or something similar.
The current version of the spec draft states:
The BF16 extensions do not add any new load or store instructions, as the FLH and FSH 16-bit load and store instructions introduced by the half-precision extensions work just fine for BF16 values.
This is only true for implementations that use the standard IEEE encoding to store floating-point numbers in the RISC-V F registers. An implementation-specific internal encoding would most likely not encode BF16 and half-precision values the same way. Has this issue already been raised in the group discussion?
I'll start by apologising for letting this linger for so long, making it painful this late in the process.
I've recently finished looking into subnormal support in other ISAs (for a RISC-V Summit Europe abstract).
The summary is that other ISAs mostly flush subnormals, namely Google's TPU [1], Intel's AVX-512_BF16 [2], and ARM's v8.2-A [3]. ARM does have optional extended BF16 support, where subnormal handling becomes selectable, and NVIDIA also supports subnormals [4]. As far as I'm aware, subnormal flushing applies to conversions as well as arithmetic.
I'd argue Google's TPU is the closest thing to a standard for BF16. Also, the motivation for the BF16 format is to trade precision for hardware efficiency, as ML often does not seem to need that precision. Both points argue for flushing subnormals (hardware support for this extension would then be very cheap; see the sketch below).
Thoughts? I can highlight this issue on the FP SIG list for further input.
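As a minimal sketch of what flushing would mean on a BF16 operand (hypothetical helper, not from the draft):

#include <stdint.h>

/* Flush-to-zero on a BF16 encoding: a zero exponent field (bits 14..7)
 * marks a subnormal or zero; replace it with a like-signed zero before
 * use, as the flushing ISAs cited below do. */
static uint16_t bf16_ftz(uint16_t x) {
    return ((x & 0x7F80u) == 0) ? (uint16_t)(x & 0x8000u) : x;
}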
[1] S. Wang, P. Kanwar. “BFloat16: The secret to high performance on Cloud TPUs”.
[2] Intel. “BFLOAT16 Hardware Numerics Definitions”.
[3] Arm® Architecture Reference Manual, Armv8-A.
[4] Fasi et al. “Numerical behavior of NVIDIA tensor cores”.
The Zfa extension describes fround.h as "encoded like FCVT.H.S, but with rs2=4". This collides with the proposed fcvt.bf16.s encoding, which also uses 4 in the rs2 position:
field bits<32> Inst = { 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, rs1{4}, rs1{3}, rs1{2}, rs1{1}, rs1{0}, frm{2}, frm{1}, frm{0}, rd{4}, rd{3}, rd{2}, rd{1}, rd{0}, 1, 0, 1, 0, 0, 1, 1 };
The argument order for vfwmaccbf16 is not consistent with similar instructions in the base vector extension. The order is specified here as:
vfwmaccbf16.vv vd, vs2, vs1, vm
vfwmaccbf16.vf vd, vs2, rs1, vm
Whereas similar instructions in the base vector extension are like this:
vfwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vfwmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i]
Should these instructions be redefined to match the similar base instructions, as sketched below? Note that only the FMA-type instructions in the base extension use this operand order; most other binary operations use the order as defined here. I'm not sure why this is.
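Concretely, matching the base FMA ordering would mean redefining them as (hypothetical):
vfwmaccbf16.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vfwmaccbf16.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i]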
Thanks.
Hi, I have downloaded the PDF from https://github.com/riscv/riscv-bfloat16/releases/tag/20230322 and see that the vfwmaccbf16 encoding is 100011, but vfwmaccbf16 in https://github.com/riscv/riscv-bfloat16/blob/main/doc/insns/vfwmaccbf16.adoc is written as 111011. Which one should be followed?
I've started this PR on the psABI doc to look ahead to the (minor) modifications needed to account for BFloat16. Feel free to close this if there's an objection to having a tracking issue for something at another repo, but I thought it was worth advertising to those working on this spec, who may have additional insight.
All of the existing floating-point extensions have a *inx variant (zfinx, zdinx, zhinx, and zhinxmin). Do you plan to define one for zfbfinxmin?
Is there a plan for a full set of operations for BF16, e.g. like FP32 has in the F standard? Are we driving towards a BF16 FADD etc.?
Can anyone tell me the status of RISC-V BF16? BF16 is very important, arguably even more important than FP16. I am wondering when we will have a RISC-V extension for BF16. Will the scalar BF16 instructions be similar to Zfh? Thanks~
The table naming the FP formats in https://github.com/riscv/riscv-bfloat16/blob/main/doc/riscv-bfloat16-format.adoc is titled "Obligatory Floating Point Format Table" and lists some formats specified in IEEE-754 but also some which are not (e.g. BF16 and TF32). Is the term "obligatory" appropriate in that context?
The encoding for fcvt.bf16.s conflicts with fround.h in the Zfa extension:
" If the Zfh extension is implemented, FROUND.H and FROUNDNX.H instructions are analogously
defined to operate on half-precision numbers. They are encoded like FCVT.H.S, but with rs2=4
and 5, respectively,"
Could it make sense to add right away (in Zvfbfmin) a BF16 version of RVV 1.0's vfncvt.rod.f.f.w vd, vs2, vm? Round-to-odd might be quite useful for the use cases of this type of conversion, and I do not think it is available by default in https://github.com/riscv/riscv-bfloat16/blob/main/doc/insns/vfncvtbf16_f_f_w.adoc (since frm does not offer round-to-odd).
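For reference, round-to-odd on this conversion amounts to truncation plus ORing the sticky bit into the result LSB, which is what makes it attractive for avoiding double rounding. A minimal C sketch (illustrative only; the helper name is mine):

#include <stdint.h>
#include <string.h>

/* FP32 -> BF16 with round-to-odd: truncate, then set the result's LSB if
 * any discarded bit was nonzero (the "sticky" bit).  A subsequent rounding
 * of the result is then immune to double-rounding error. */
static uint16_t fp32_to_bf16_rod(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    uint16_t r = (uint16_t)(bits >> 16);
    if (bits & 0xFFFFu) r |= 1u;
    return r;
}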
Does the BF16 extension support the FNORM FPR encoding?
FP64 and FP32 have corresponding unique load and store operations to match the operand data size. Overloading the FP16 loads and stores to support BF16 matches the operand size but does not specify the end format.