dawsonjon / fpu

Synthesisable IEEE 754 floating point library in Verilog

License: MIT License

Verilog 75.90% C++ 1.19% Python 22.91%

fpu's Issues

missing resp_z

File "./run_test.py", line 60, in run_test
stim_z = open("resp_z");
IOError: [Errno 2] No such file or directory: 'resp_z'

Managing simulation time for divider circuit

This divider code works for us, but its output only appears at 1165 ns, and we need the result in under 500 ns.
Can you suggest ways to reduce the simulation time to below 500 ns?

Divider Explanation

Hi! I am a student, and I came across your work while trying to understand how a floating point divider works. My module looks something like this:

where the exception code is:

So I was wondering how your divider.v relates to this module. I just want to understand how your code works and the meaning of signals such as input_a_stb and input_a_ack.

Looking forward to a reply!
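
As a rough illustration of what the _stb and _ack signals do, here is a minimal Python model of a strobe/acknowledge handshake. It assumes the usual convention that a word is transferred in any clock cycle where both the sender's strobe and the receiver's acknowledge are high; `simulate_handshake` and the traces are hypothetical and are not taken from divider.v.

```python
def simulate_handshake(stb_trace, ack_trace, data_trace):
    """Return the words transferred, one list entry per clock cycle in which
    both strobe (sender has valid data) and ack (receiver is ready) are high."""
    transferred = []
    for stb, ack, data in zip(stb_trace, ack_trace, data_trace):
        if stb and ack:
            transferred.append(data)
    return transferred

# The sender holds the word 7 valid for three cycles; the receiver is only
# ready in the third cycle, so exactly one transfer takes place.
stb = [1, 1, 1, 0]
ack = [0, 0, 1, 1]
data = [7, 7, 7, 9]
print(simulate_handshake(stb, ack, data))
```

Under this convention, a testbench has to hold input_a_stb high until it sees input_a_ack, and has to raise output_z_ack to collect the result.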

double_multiplier.v Line185

Why is the *4 needed (product <= a_m * b_m * 4;)?
1.a_m * 1.b_m is 106 bits wide, so why are 108 bits needed?

Regarding Multiplication and exponent

Sir, I was going through your code for the IEEE 754 floating point multiplier. I noticed that you use z_e <= a_e + b_e + 1; product <= a_m * b_m * 4; which confused me, as I think the code should be z_e <= a_e + b_e + 127; product <= a_m * b_m;. However, your version gives the correct answer and mine does not. Can you explain the logic behind it?
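
One way to see why the two forms can differ is to model the arithmetic in software. The sketch below rests on several assumptions that are not confirmed by the lines quoted above: exponents are held unbiased internally (the bias of 127 is removed on unpacking and restored on packing), mantissas are 24 bits with the hidden 1 restored, and the result mantissa is taken from the top 24 bits of a 50-bit product. Under those assumptions, the *4 aligns the 48-bit product to the top of the 50-bit register, and the +1 compensates for the resulting mantissa being read as half its true value. The function names are hypothetical, and rounding and special cases are left out.

```python
import struct

def unpack(x):
    """Split a normal float32 into (sign, unbiased exponent, 24-bit mantissa)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = ((bits >> 23) & 0xFF) - 127   # remove the bias (assumed)
    mantissa = (bits & 0x7FFFFF) | 0x800000  # restore the hidden 1
    return sign, exponent, mantissa

def multiply(a, b):
    """Truncating multiply of two normal float32 values, no special cases."""
    a_s, a_e, a_m = unpack(a)
    b_s, b_e, b_m = unpack(b)
    z_e = a_e + b_e + 1
    product = a_m * b_m * 4          # 48-bit product moved up to 50 bits
    z_m = product >> 26              # top 24 bits of the 50-bit product
    if z_m & 0x800000 == 0:          # leading 1 missing: normalise by one place
        z_m = (product >> 25) & 0xFFFFFF
        z_e -= 1
    value = z_m / 2**23 * 2.0**z_e   # interpret z_m as 1.xxx with 23 fraction bits
    return -value if a_s ^ b_s else value

print(multiply(1.5, 2.5), multiply(3.0, 3.0), multiply(-2.0, 4.0))
```

The same width argument would carry over to double precision: a 53-bit by 53-bit product occupies 106 bits, and multiplying by 4 extends it to 108.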

multiplier slow converge

Hi, Jon

I just found another interesting trace in which the multiplier is stuck in state normalize_2 for a long time.

input:
a : 32'h5e_c72e (denormal)
b : 32'h1f_ddeb (denormal)

I have uploaded the testbench here.

Double multiplier rounding error

double_multiplier.v:228
"if (z_m == 53'hffffff) begin"
should be
"if (z_m == 53'h1fffffffffffff) begin"
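
The reasoning behind this fix can be checked numerically: the comparison is meant to detect a mantissa that is all ones, so that rounding up carries into the exponent. A minimal sketch, assuming that intent; `round_mantissa` is a hypothetical helper, not code from double_multiplier.v.

```python
def round_mantissa(z_m, z_e, round_up, width):
    """Round a width-bit mantissa up by one unit, carrying into the exponent
    when the mantissa is all ones."""
    all_ones = (1 << width) - 1
    if not round_up:
        return z_m, z_e
    if z_m == all_ones:
        # 1.111...1 + 1 ulp = 10.000...0, renormalised to 1.000...0 * 2
        return 1 << (width - 1), z_e + 1
    return z_m + 1, z_e

print(hex((1 << 24) - 1))  # all-ones pattern for a 24-bit (single) mantissa
print(hex((1 << 53) - 1))  # all-ones pattern for a 53-bit (double) mantissa
```

The 24-bit constant 0xffffff is the all-ones pattern for single precision only, so a normalised 53-bit mantissa can never equal it.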

int_to_float returning 0 on certain inputs

When the module is given this input 32'b00010101010101100011010111001010
it outputs 0 or something close to zero (eg 10^-9)
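
For reference, the expected result for this input can be computed in software. The sketch below assumes the input word is read as a signed 32-bit integer and converted with round-to-nearest; `int_to_float_bits` is a hypothetical helper name. The input is roughly 3.58e8 as an integer, so the converted value should be nowhere near zero.

```python
import struct

def int_to_float_bits(word):
    """Return the float32 bit pattern for a 32-bit word read as a signed integer."""
    value = word - (1 << 32) if word & 0x80000000 else word
    return struct.unpack(">I", struct.pack(">f", float(value)))[0]

stimulus = 0b00010101010101100011010111001010
expected = int_to_float_bits(stimulus)
print(f"{stimulus} -> 0x{expected:08x}")
```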

multiplier mismatch

Hi, dawsonjon

I ran C-to-RTL formal verification on the multiplier and found a mismatch.

input:
a : 32'h7f80_0000 (infinity)
b : 32'h0

output:
C implementation : 32'hFFC00000 (NAN)
RTL implementation : 32'h7f80_0000 (infinity)

I googled the IEEE 754 standard and found this table :

Table 4. Multiplication of operands.

[ref: http://techdocs.altium.com/display/FPGA/IEEE+754+Standard+-+Overview]

It seems that multiplying infinity by zero should give NaN.
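
The expected behaviour can be reproduced with any IEEE 754 compliant software implementation. A minimal sketch using Python floats, with the two input bit patterns from this report; `bits_to_float` is a hypothetical helper. Only the fact that the result is a NaN is checked, since the sign and payload of the NaN may vary between implementations.

```python
import math
import struct

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a float32 value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

a = bits_to_float(0x7F800000)  # +infinity
b = bits_to_float(0x00000000)  # +0.0
product = a * b
print(math.isinf(a), math.isnan(product))
```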

Add extra operations

min
max
copysign

Merge `get_a` and `put_z` states

It looks like the get_a and put_z states could be merged into a single state without much hassle, saving some cycles: the conditions for accepting a new request could be checked in the same cycle in which a result is presented, letting requests overlap at that stage. What do you think?

Implement `fpu` module

Create a simple fpu module that hosts all the other components and can be used as a black box. It can be just a wrapper over the other components, routing the a, b and z data wires and their handshake signals according to an op selector, almost like a "kitchen sink" example of how to use the components.

In a future iteration, it might be nice to create a more advanced module that allows different operations to execute at the same time to increase performance, with a FIFO or similar to guarantee the order of execution, but that could also be done in an independent project.

Add comparison operators

Add 32-bit and 64-bit comparison operators, more exactly:

Add unary operators

Square root would be more difficult, although I've found several (naïve?) implementations that we could use.

a question about multiplication and division algorithms

Thanks for sharing the code.
Can you tell me exactly which algorithms are used for multiplication and division? I was curious.
Single-precision division takes more than 110 cycles, so I guess these may not be the algorithms used in real processors?
(I could not find a contact email, so I thought of raising an issue.)
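
A cycle count of this order would be consistent with a bit-serial divider that produces one quotient bit per iteration. Whether the repository actually uses this scheme is an assumption; the sketch below only illustrates classic restoring division on unsigned integers, and `restoring_divide` is a hypothetical name.

```python
def restoring_divide(dividend, divisor, bits):
    """Unsigned restoring division producing one quotient bit per iteration."""
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:
            # The trial subtraction succeeds: keep it and set the quotient bit.
            remainder -= divisor
            quotient |= 1
    return quotient, remainder

print(restoring_divide(100, 7, 8))
```

Faster hardware dividers typically retire several quotient bits per step or use multiplicative iteration, at the cost of more logic.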

g++ command update

g++ -o test test.cpp

Error(xxxxxxxxxxxxxxxxxxxxxxx) in Division output using the following testbench

testbench:-
`timescale 1ns / 1ps

module Divider_tb;

reg clk, rst;
reg [31:0] input_a;
reg input_a_stb;
reg [31:0] input_b;
reg input_b_stb;
reg output_z_ack;
//reg s_input_a_ack,s_input_b_ack;

wire input_a_ack;
wire input_b_ack;
wire [31:0] output_z;
wire output_z_stb;

Flaoting_32_Divider uut(

    .input_a(input_a),
    .input_b(input_b),
    .input_a_stb(input_a_stb),
    .input_b_stb(input_b_stb),
    //.s_input_a_ack(s_input_a_ack),
    //.s_input_b_ack(s_input_b_ack),
    .output_z_ack(output_z_ack),
    .clk(clk),
    .rst(rst),
   
    .output_z(output_z),
    .output_z_stb(output_z_stb),
    .input_a_ack(input_a_ack),
    .input_b_ack(input_b_ack)
    );

 always #5 clk=~clk;

 initial begin
 
         clk= 1'b0;
       
        end

initial
begin

rst=1'b1;
input_a_stb=1'b1;
input_b_stb=1'b1;
output_z_ack=1'b0;
//s_input_a_ack=1'b1;
//s_input_b_ack=1'b1;

#1 rst=1'b0;
#2 input_a=32'b01000010101101101011000000000000;

#1 input_b=32'b00111110000101000000000000000000;
//#2 s_input_a_ack=1'b1;
//#5 s_input_b_ack=1'b0;
end

initial
begin
$monitor("time=",$time,"input_a =%b,input_b=%b,output_z=%b",input_a,input_b,output_z);
end

endmodule

output:-
time= 0
input_a=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,
input_b=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,
output_z=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
time= 3
input_a =01000010101101101011000000000000,
input_b=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,
output_z=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
time= 4
input_a=01000010101101101011000000000000,
input_b=00111110000101000000000000000000,
output_z=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

64 bits?

Would it be possible to expand the data size to 64 bits, maybe by using a flag? I'm doing a Verilog implementation of WebAssembly, and the spec requires 64-bit floating point operations...

License?

What's the license of the project? The text in copying.txt seems very similar to MIT; could it be?

How much time does it require to divide two numbers?

I ask because I am using Quartus II and cannot analyze the output: waveform observation is limited to 1 microsecond, so I cannot see anything beyond that point. Should a division take longer than 1 microsecond?

A + -A = +=0 bug

Although fixed in the single fp codebase, this bug still exists in the double codebase. The fix that is in place ensures that A + -A works when A is positive, but not when A is negative (e.g. 0xff80000000000000 + 0x7f80000000000000).
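
Under the default round-to-nearest mode, IEEE 754 specifies that the sum of two operands of equal magnitude and opposite sign is +0, whichever operand is negative. A small software check of that rule, using Python floats (which are doubles); `sign_of` is a hypothetical helper.

```python
import math

def sign_of(x):
    """Return +1.0 or -1.0 according to the sign bit, which also distinguishes +0 from -0."""
    return math.copysign(1.0, x)

a = 1.5
pos_first = a + (-a)   # A + -A with A positive
neg_first = (-a) + a   # A + -A with A negative
print(sign_of(pos_first), sign_of(neg_first))
```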

$fopenr update

$fopenr("stim_a"); can be changed to $fopen("stim_a", "r")

Unsigned integer to real

It seems that the conversions between integers and real numbers only support signed integers. How could unsigned integers and long numbers be converted to and from floating point?
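
As a reference for what an unsigned conversion would have to compute, here is a minimal Python sketch of unsigned 32-bit integer to float32 conversion with round to nearest, ties to even. It is an illustration only, not code from the repository; `uint32_to_float_bits` and `reference_bits` are hypothetical names.

```python
import struct

def uint32_to_float_bits(n):
    """Convert an unsigned 32-bit integer to a float32 bit pattern."""
    if n == 0:
        return 0
    msb = n.bit_length() - 1              # position of the leading 1
    exponent = msb + 127                  # biased exponent
    if msb <= 23:
        mantissa = n << (23 - msb)        # exact: the value fits in 24 bits
    else:
        shift = msb - 23
        mantissa = n >> shift
        remainder = n & ((1 << shift) - 1)
        half = 1 << (shift - 1)
        if remainder > half or (remainder == half and mantissa & 1):
            mantissa += 1
            if mantissa == 1 << 24:       # rounding carried out of the mantissa
                mantissa >>= 1
                exponent += 1
    return (exponent << 23) | (mantissa & 0x7FFFFF)

def reference_bits(n):
    """Same conversion done by the platform's double-to-float rounding."""
    return struct.unpack(">I", struct.pack(">f", float(n)))[0]

print(hex(uint32_to_float_bits(0xFFFFFFFF)), hex(reference_bits(0xFFFFFFFF)))
```

The only difference from a signed conversion is that bit 31 is treated as magnitude rather than as a sign, so no negation step is needed.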

Pipelined Design?

Hi, Jon.
Thanks for your open-source FPU design. Amazing!
Having read the Verilog code of the computation units, I was impressed by how the computation flows through a finite state machine.
However, I am wondering how to pipeline your FPU design in order to improve its throughput. This problem has puzzled me for almost half a month. I want to figure out how to insert pipeline stages into a finite state machine. Is that feasible? I hope you can help. Thank you.

ModuleNotFoundError: No module named 'streams'

:~$ python3
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import chips.api.api
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/naga/.local/lib/python3.6/site-packages/chips/__init__.py", line 3, in <module>
    import streams, sinks, process, instruction
ModuleNotFoundError: No module named 'streams'
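
The failing line, `import streams, sinks, process, instruction`, sits inside a package, and Python 3 treats such a statement as an absolute import, so sibling modules are no longer found the way they were under Python 2. A minimal sketch of the explicit relative form, using a throwaway package named `demo_pkg` as a hypothetical stand-in for `chips`:

```python
import importlib
import os
import sys
import tempfile

def build_demo_package(root):
    """Create demo_pkg/ with a sibling module and an explicit relative import."""
    pkg = os.path.join(root, "demo_pkg")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "streams.py"), "w") as f:
        f.write("NAME = 'streams'\n")
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        # Python 3 form; a bare "import streams" here would raise
        # ModuleNotFoundError unless streams were on sys.path itself.
        f.write("from . import streams\n")
    return pkg

with tempfile.TemporaryDirectory() as root:
    build_demo_package(root)
    sys.path.insert(0, root)
    demo_pkg = importlib.import_module("demo_pkg")
    loaded_name = demo_pkg.streams.NAME
    sys.path.remove(root)

print(loaded_name)
```

Running the original package under Python 2, or porting its imports to the relative form, are the two obvious ways forward.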