riscv-bfloat16's Issues

typo on page 6

Page 6 contains "encopding"; probably a typo of "encoding".

Why does Zvfbfmin depend on Zfbfmin

There are no scalar operands on the instructions in Zvfbfmin. Could we shift the Zfbfmin dependency to Zvfbfwma which does have scalar operands?

Rounding mode

In line with the other issue (#51), there is an open question about rounding mode support. This one is less clear-cut: Google has not defined the rounding mode it uses, Intel uses RNE, ARM uses RTO or makes it selectable, and NVIDIA uses RTZ.

My argument is that with the original intent of trading precision for hardware efficiency the choice would be to do RTZ rounding (as that's free). The current draft specifies a selectable rounding mode which is consistent with other floating point extensions but would be quite costly in comparison to just RTZ.

Is there a way to enable a choice to just do RTZ (or even make it the default)? I guess further sub-extensions would work for options but it becomes a bit unwieldy.
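To make the cost argument concrete, here is an illustrative Python sketch (my own, not spec text; NaN and infinity handling is deliberately omitted) contrasting RTZ, which is a pure 16-bit truncation, with RNE, which needs an adder for the rounding bias:

```python
import struct

def f32_bits(x: float) -> int:
    """Bit pattern of x as an IEEE-754 binary32 value."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def f32_to_bf16_rtz(x: float) -> int:
    # RTZ: drop the low 16 mantissa bits -- no carry chain required.
    return f32_bits(x) >> 16

def f32_to_bf16_rne(x: float) -> int:
    # RNE: add a bias of 0x7FFF plus the LSB of the kept bits,
    # so exact ties round to even. NaN/infinity handling omitted.
    bits = f32_bits(x)
    lsb = (bits >> 16) & 1
    return (bits + 0x7FFF + lsb) >> 16
```

The RTZ path is wiring only, while the RNE path needs a 32-bit adder, which is the hardware cost being traded off here.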

NaN box handling in Bfloat16

The v0.0.1 document does not mention NaN boxing. I can see cases where it should be done, and others where the upper bits should simply be ignored. It could be instruction-specific. Please clarify the behavior.
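For reference, NaN boxing in the F/D extensions places a narrower value in the low bits of a wider FP register with all upper bits set to one; a value that is not properly boxed is treated as a canonical NaN. A hypothetical sketch of what boxing BF16 into a 32-bit register could look like (helper names are mine, not the spec's):

```python
def nan_box_bf16(bf16_bits: int, flen: int = 32) -> int:
    """Box a 16-bit value into an FLEN-bit register: upper bits all ones."""
    upper_ones = ((1 << flen) - 1) & ~0xFFFF
    return upper_ones | (bf16_bits & 0xFFFF)

def is_valid_bf16_box(reg: int, flen: int = 32) -> bool:
    """A value is properly boxed iff every bit above bit 15 is set."""
    return reg >> 16 == (1 << (flen - 16)) - 1
```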

Dependencies of vector bfloat16 extensions could be clearer

I previously proposed #34 which addressed this. 94fcf6b added extra text on vector extension dependencies, however:

  • It only added it to riscv-bfloat16-extensions.adoc, not to the individual extension descriptions (doc/riscv-bfloat16-{zvfbfmin,zvfbfwma}.adoc). Including the dependency information in the extension descriptions makes it more easily discoverable, and also matches the presentation used in the existing V spec.
  • Zve64f depends on Zve32f and Zve64d depends on Zve64f. Therefore, it's sufficient to say these extensions depend on Zve32f (which is the approach I took in #34). This matches what is done in the current V spec (this was adopted in riscv/riscv-v-spec#845).

Are full instruction encodings available somewhere?

The PDF specification does not seem to give full bit patterns for decoding the described instructions; many fields just have names. Is this defined somewhere else, or are the precise encodings still undefined?

Thanks.

Reliance on zfh and zfhmin

The following says that a BF16 implementation must implement FP16:

The BFloat16 extensions depend on the half-precision floating-point extensions (Zfh and Zfhmin), which in turn rely on the single-precision floating-point extension (F).

Can you please clarify this requirement? Is the motivation to get load and stores? This appears to be overkill.

Should conversions to BFLOAT16 signal Underflow?

The instructions that convert to BFLOAT16 (FCVT.BF16.S, vfncvtbf16.f.f.w) do not say that Underflow can be signalled. Is this correct?

In IEEE-754, Underflow should be signalled if, for the result of a floating point operation:

  1. The result is smaller than the smallest normal value representable in the type; and:
  2. The operation results in a loss of precision.

Should conversion of a subnormal FP32 argument which rounds fraction bits and produces a subnormal or zero result therefore signal Underflow?
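Since BF16 shares FP32's 8-bit exponent, the smallest BF16 normal is also 2^-126, so every subnormal FP32 input lands in the tiny range. A rough Python sketch (mine, using RTZ truncation for simplicity) of the default IEEE-754 underflow condition, tiny AND inexact:

```python
import struct

BF16_MIN_NORMAL = 2.0 ** -126  # BF16 shares binary32's 8-bit exponent

def bf16_round_rtz(x: float) -> float:
    """Truncate an FP32-representable value to BF16 precision (sketch)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def conversion_underflows(x: float) -> bool:
    """Default IEEE-754 underflow: the result is tiny AND inexact."""
    r = bf16_round_rtz(x)
    tiny = abs(r) < BF16_MIN_NORMAL   # subnormal or zero result
    inexact = r != x
    return tiny and inexact
```

Under this reading, a subnormal FP32 input that loses fraction bits would indeed signal Underflow, while one that converts exactly (e.g. 2^-130) would not.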

Vector instructions have no RM field

The descriptions of the vector instructions here refer to the RM field, but vector instructions have no such field. The descriptions should say "current rounding mode in fcsr", or something similar.

Use of FLH and FSH for BF16 memory load/store

The current version of the spec draft states:

The BF16 extensions do not add any new load or store instructions, as the FLH and FSH 16-bit load and store instructions introduced by the half-precision extensions work just fine for BF16 values.

This is only true for implementations that use the standard IEEE encoding to store floating-point numbers in the RISC-V F registers. An implementation-specific internal encoding would almost certainly not encode BF16 and half-precision values the same way. Has this issue already been raised in the group discussion?

https://github.com/riscv/riscv-bfloat16/blob/29ffa22aa440e549c4ded7abad6328d10f182f85/doc/riscv-bfloat16-extensions.adoc

Subnormal flushing

I'll start by apologising for letting this linger for so long; raising it this late in the process makes it painful.

I've recently finished looking into subnormal support by other ISAs (for a RISC-V summit europe abstract).
The summary is that other ISAs mostly flush subnormals, namely Google's TPU [1], Intel's AVX-512_BF16 [2], and ARM's v8.0-A [3]. ARM does have optional extended BF16 support, where subnormal support becomes selectable, and NVIDIA also supports subnormals [4]. As far as I'm aware, subnormal flushing applies to conversions as well as arithmetic.

I'd argue Google's TPU is the closest thing to a standard for BF16. Also the motivation for the BF16 format is to trade precision for hardware efficiency as ML often does not seem to need that precision. Both would argue for flushing subnormals (hardware support for this extension would be very cheap).
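For concreteness, flush-to-zero on a BF16 operand amounts to a test of the 8-bit exponent field, as in this illustrative sketch (mine, not spec text):

```python
def ftz_bf16(bits: int) -> int:
    """Flush a subnormal BF16 value (exp == 0, frac != 0) to signed zero."""
    exp = (bits >> 7) & 0xFF
    frac = bits & 0x7F
    if exp == 0 and frac != 0:
        return bits & 0x8000  # keep only the sign bit
    return bits
```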

Thoughts? I can highlight this issue on the FP SIG list for further input?

[1] S. Wang, P. Kanwar. "BFloat16: The secret to high performance on Cloud TPUs".
[2] Intel. "BFLOAT16 Hardware Numerics Definitions".
[3] Arm® Architecture Reference Manual, Armv8-A.
[4] Fasi et al. "Numerical behavior of NVIDIA tensor cores".

fcvt.bf16.s encoding collides with fround.h from zfa

The Zfa extension describes fround.h as "encoded like FCVT.H.S, but with rs2=4". This collides with the proposed fcvt.bf16.s encoding which also uses 4 in the rs2 position:

field bits<32> Inst = { 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, rs1{4}, rs1{3}, rs1{2}, rs1{1}, rs1{0}, frm{2}, frm{1}, frm{0}, rd{4}, rd{3}, rd{2}, rd{1}, rd{0}, 1, 0, 1, 0, 0, 1, 1 };
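The collision can be checked mechanically. Assembling both instructions from the standard OP-FP R-type fields (the register and rounding-mode values below are arbitrary examples, not spec-mandated) yields identical words:

```python
def encode_opfp(funct7: int, rs2: int, rs1: int, rm: int, rd: int) -> int:
    """Assemble a 32-bit OP-FP (major opcode 1010011) instruction word."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (rm << 12) | (rd << 7) | 0b1010011)

# Proposed FCVT.BF16.S: funct7 = 0100010 with rs2 = 4 (per the field list above).
fcvt_bf16_s = encode_opfp(0b0100010, 4, rs1=1, rm=0b111, rd=2)
# Zfa FROUND.H: "encoded like FCVT.H.S, but with rs2=4" -- same funct7, same rs2.
fround_h = encode_opfp(0b0100010, 4, rs1=1, rm=0b111, rd=2)
assert fcvt_bf16_s == fround_h  # the bit patterns are identical
```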

vfwmaccbf16 argument order inconsistent with similar base vector instructions

The argument order for vfwmaccbf16 is not consistent with similar instructions in the base vector extension. The order is specified here as:

vfwmaccbf16.vv vd, vs2, vs1, vm
vfwmaccbf16.vf vd, vs2, rs1, vm

Whereas similar instructions in the base vector extension are like this:

vfwmacc.vv vd, vs1, vs2, vm    # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vfwmacc.vf vd, rs1, vs2, vm    # vd[i] = +(f[rs1] * vs2[i]) + vd[i]

Should these instructions be redefined to match the similar base instructions? Note that only the FMA type instructions in the base use this order - most other binary operations use the order as defined here. I'm not sure why this is.

Thanks.

Adding BFloat16 to the psABI doc

I've started this PR on the psABI doc to look ahead to the (minor) modifications needed to account for BFloat16. Feel free to close this if there's an objection to having a tracking issue for something at another repo, but I thought it was worth advertising to those working on this spec, who may have additional insight.

Zfbfinxmin extension?

All of the existing floating-point extensions have a *inx variant (Zfinx, Zdinx, Zhinx, and Zhinxmin). Do you plan to define one for Zfbfinxmin?

Question about status of riscv bf16

Can anyone tell me the status of RISC-V BF16? BF16 is very important, arguably even more important than FP16. When will we get a RISC-V extension for BF16? Will the scalar BF16 instructions be similar to Zfh? Thanks~

Encoding conflict with Zfa extension

The encoding for fcvt.bf16.s conflicts with fround.h in the Zfa extension:
" If the Zfh extension is implemented, FROUND.H and FROUNDNX.H instructions are analogously
defined to operate on half-precision numbers. They are encoded like FCVT.H.S, but with rs2=4
and 5, respectively,"

Support for FNORM FPR encoding

Does the BF16 extension support the FNORM FPR encoding?

FP64 and FP32 have their own load and store operations matching the operand data size. Overloading the FP16 loads and stores for BF16 matches the operand size but does not specify the end format.
