dawsonjon / fpu
Synthesisable IEEE 754 floating-point library in Verilog
License: MIT License
File "./run_test.py", line 60, in run_test
stim_z = open("resp_z");
IOError: [Errno 2] No such file or directory: 'resp_z'
This divider code works for us, but its output only appears at 1165 ns, and our requirement is below 500 ns.
Can you suggest ways to reduce the divider's latency to under 500 ns?
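A rough way to reason about this: for a state machine that finishes after a fixed number of clock cycles, latency is cycles × clock period, so there are two levers: a faster clock (if timing closes) or fewer cycles (for example, producing more quotient bits per cycle). The Python sketch below assumes a 10 ns clock and about 116 cycles purely as an example; the function names are hypothetical and your actual clock and cycle count may differ.

```python
import math


def latency_ns(cycles: int, clock_period_ns: float) -> float:
    """Time from start to result for a fixed number of clock cycles."""
    return cycles * clock_period_ns


def max_cycles(budget_ns: float, clock_period_ns: float) -> int:
    """Most cycles that still fit in the time budget at a given clock."""
    return math.floor(budget_ns / clock_period_ns)


def max_clock_period_ns(budget_ns: float, cycles: int) -> float:
    """Slowest clock that still meets the budget for a given cycle count."""
    return budget_ns / cycles


# assumed example: a 10 ns (100 MHz) clock and a divide of about 116 cycles
print(latency_ns(116, 10.0))                    # 1160.0
print(max_cycles(500, 10.0))                    # 50
print(round(max_clock_period_ns(500, 116), 2))  # 4.31
```

In other words, at 10 ns per cycle the whole divide would have to fit in 50 cycles; keeping the cycle count instead would need a clock period of about 4.3 ns or less.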
Hi! I am a student, and I came across your work while trying to understand how a floating-point divider works. My module looks something like this:
So I was wondering how your divider.v relates to this module. I would like to understand how your code works and the meaning behind signals such as input_a_stb, input_a_ack, etc.
Looking forward to a reply!
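In this stb/ack naming, a _stb (strobe) signal is driven by the side that has data and means "the value on this bus is valid"; the matching _ack is driven by the receiver and means "I am taking it". A word moves only on a rising clock edge where both are high, so the sender must hold stb and the data stable until then. input_a and input_b use this with the divider as receiver; output_z uses it the other way round (the divider drives output_z_stb, your logic answers with output_z_ack). The Python below is a toy cycle-level model of one input port; the class and function names are hypothetical, and the register behaviour is my reading of the get_a state, not a copy of divider.v.

```python
class InputPort:
    """Toy model of one registered input port using a stb/ack handshake.

    Assumed behaviour: while waiting, the port raises ack; a word is taken
    on a clock edge where ack (from the port) and stb (from the sender) are
    both high.
    """

    def __init__(self) -> None:
        self.ack = False
        self.value = None

    def clock_edge(self, stb: bool, data) -> bool:
        """Advance one rising edge; return True if a word was transferred."""
        if self.ack and stb:
            self.value = data   # both high: the transfer happens on this edge
            self.ack = False    # ack drops after the operand is taken
            return True
        self.ack = True         # ready; the sender sees ack from the next cycle
        return False


def first_transfer(stb_from_cycle: int, max_cycles: int = 10):
    """Sender raises stb (with stable data) from a given cycle onwards."""
    port = InputPort()
    for cycle in range(max_cycles):
        stb = cycle >= stb_from_cycle
        if port.clock_edge(stb, 42 if stb else None):
            return cycle, port.value
    return None


print(first_transfer(0))  # (1, 42): ack is only high from the second edge
print(first_transfer(3))  # (3, 42): the port waits until stb arrives
```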
Why is the *4 needed in product <= a_m * b_m * 4;?
1.a_m * 1.b_m gives 106 bits, so why are 108 bits needed?
Sir, I was going through your code for the IEEE 754 floating-point multiplier. I noticed that you used z_e <= a_e + b_e + 1; and product <= a_m * b_m * 4;, but I was a bit confused, as I think the code should be z_e <= a_e + b_e + 127; and product <= a_m * b_m;. Your code gives the correct answer but mine does not. Can you explain the logic behind it?
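A bit-level model helps with both questions. Assuming the unpack step stores exponents unbiased (a_e = exponent field - 127), adding them gives an unbiased result and the bias is only added back when packing, which is why +127 is not needed. The *4 appears to be just a 2-bit left shift: the 48-bit product of two 24-bit significands lands in bits [49:2] of a 50-bit register, so product[49:26] holds the top 24 bits, product[25] the guard bit and product[24] the round bit. The same 2-bit shift would account for 108 = 106 + 2 bits in double precision. Bit 49 then has weight 2^1 relative to 1.0, which is where the +1 in z_e comes from; if the product is below 2.0, a normalise step shifts left once and takes the 1 back off. The sketch below (hypothetical names, normal operands only, round to nearest-even) mirrors those steps; the bit slices are my assumption about the RTL.

```python
import struct


def f32_to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]


def mul_normal(a: int, b: int) -> int:
    """Sketch of a float32 multiply for normal, finite operands whose product
    is also normal (no NaN/inf/denormal/overflow handling)."""
    sign = ((a ^ b) >> 31) & 1
    # unpack: exponents are kept unbiased, so no +127 is needed below
    a_e = ((a >> 23) & 0xFF) - 127
    b_e = ((b >> 23) & 0xFF) - 127
    a_m = (a & 0x7FFFFF) | 0x800000          # 24-bit significand 1.f
    b_m = (b & 0x7FFFFF) | 0x800000
    # 24x24 -> 48-bit product; *4 places it in bits [49:2] of a 50-bit value
    product = a_m * b_m * 4
    z_e = a_e + b_e + 1                      # bit 49 has weight 2**1
    z_m = (product >> 26) & 0xFFFFFF         # product[49:26]
    guard = (product >> 25) & 1
    round_bit = (product >> 24) & 1
    sticky = int((product & 0xFFFFFF) != 0)  # product[23:0]
    # normalise: product below 2.0, shift left once and pull in the guard bit
    if not (z_m >> 23) & 1:
        z_m = ((z_m << 1) | guard) & 0xFFFFFF
        z_e -= 1
        guard, round_bit = round_bit, 0
    # round to nearest, ties to even
    if guard and (round_bit or sticky or (z_m & 1)):
        z_m += 1
        if z_m == 0x1000000:                 # significand rounded up to 2.0
            z_m >>= 1
            z_e += 1
    # pack: the bias is added back only here
    return (sign << 31) | ((z_e + 127) << 23) | (z_m & 0x7FFFFF)


print(hex(mul_normal(f32_to_bits(1.5), f32_to_bits(2.0))))  # 0x40400000 (3.0)
```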
Hi, Jon
I just found another interesting trace in which the multiplier is stuck in state normalize_2 for a long time.
input:
a : 32'h5e_c72e (denormal)
b : 32'h1f_ddeb (denormal)
I have uploaded the testbench here.
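Rough arithmetic for this input, assuming (from my reading of the multiplier, not verified against the source) that denormal operands are first normalised one bit per clock, that the product exponent is a_e + b_e + 1, and that normalize_2 shifts the result right by one bit per clock while the exponent is below -126: a needs one left shift (exponent -127) and b needs three (exponent -129), so the product starts at exponent -255 and normalize_2 runs 129 times. The exact product is around 2^-254, far below the smallest float32 denormal (2^-149), so the correct result is zero; an early exit once the exponent is that small would avoid the long loop. Python model with hypothetical names:

```python
def unpack(bits: int):
    """float32 bits -> (24-bit significand, unbiased exponent), normalising
    denormals the way a one-shift-per-cycle normalise state would."""
    e = (bits >> 23) & 0xFF
    m = bits & 0x7FFFFF
    if e == 0:                    # denormal: no hidden 1, exponent fixed at -126
        exp = -126
    else:
        exp = e - 127
        m |= 0x800000
    assert m != 0, "zero is handled by the special-case states"
    while not (m >> 23) & 1:      # shift until the leading 1 reaches bit 23
        m <<= 1
        exp -= 1
    return m, exp


def normalize_2_cycles(a_bits: int, b_bits: int) -> int:
    a_m, a_e = unpack(a_bits)
    b_m, b_e = unpack(b_bits)
    product = a_m * b_m * 4
    z_e = a_e + b_e + 1
    if not (product >> 49) & 1:   # normalise_1: at most one left shift here
        z_e -= 1
    cycles = 0
    while z_e < -126:             # normalize_2: one right shift per clock
        z_e += 1
        cycles += 1
    return cycles


print(normalize_2_cycles(0x005EC72E, 0x001FDDEB))  # 129
```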
double_multiplier.v:228
"if (z_m == 53'hffffff) begin"
should be
"if (z_m == 53'h1fffffffffffff) begin"
When the module is given the input 32'b00010101010101100011010111001010, it outputs 0 or something close to zero (e.g. 10^-9).
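The report doesn't say which module was used, but decoding the input helps: its exponent field is 42, i.e. an unbiased exponent of -85, so the input itself is only about 4.3e-26. Whether an output of 0 or about 10^-9 is wrong therefore depends on the operation (a conversion to integer, for instance, should give 0). A quick decode in Python:

```python
import struct

bits = 0b00010101010101100011010111001010   # the input from the report
sign = bits >> 31
exp_field = (bits >> 23) & 0xFF             # biased exponent
frac = bits & 0x7FFFFF                      # 23 stored fraction bits
value = struct.unpack("<f", struct.pack("<I", bits))[0]

# prints the hex pattern, sign 0, unbiased exponent -85, value about 4.3e-26
print(hex(bits), sign, exp_field - 127, value)
```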
Hi, dawsonjon
I ran C-to-RTL formal verification on the multiplier and found a mismatch.
input:
a : 32'h7f80_0000 (infinity)
b : 32'h0
output:
C implementation : 32'hFFC00000 (NaN)
RTL implementation : 32'h7f80_0000 (infinity)
I googled the IEEE 754 standard and found this table:
Table 4. Multiplication of operands.
[ref: http://techdocs.altium.com/display/FPGA/IEEE+754+Standard+-+Overview]
It seems that multiplying infinity by zero should give NaN.
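IEEE 754 treats infinity × zero as an invalid operation whose default result is a quiet NaN. A sketch of the special-case check on float32 bit patterns follows; it returns 0xFFC00000 only to match the C model in this report (IEEE 754 does not mandate a particular NaN sign or payload), and the helper names are hypothetical. The key point is that the inf × 0 test must come before the generic "either operand is infinite" test.

```python
NAN_BITS = 0xFFC00000  # the NaN pattern the C model produced in this report
INF_BITS = 0x7F800000


def classify(bits: int) -> str:
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0xFF:
        return "nan" if frac else "inf"
    if exp == 0 and frac == 0:
        return "zero"
    return "finite"  # normal or denormal


def mul_special_case(a: int, b: int):
    """Return the float32 result bits for special operands, or None when the
    normal multiply path should run."""
    ka, kb = classify(a), classify(b)
    sign = ((a ^ b) >> 31) & 1
    if "nan" in (ka, kb):
        return NAN_BITS
    if {ka, kb} == {"inf", "zero"}:
        return NAN_BITS                  # inf * 0 is invalid: result is NaN
    if "inf" in (ka, kb):
        return (sign << 31) | INF_BITS   # signed infinity
    if "zero" in (ka, kb):
        return sign << 31                # signed zero
    return None


print(hex(mul_special_case(0x7F800000, 0x00000000)))  # 0xffc00000
```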
min
max
copysign
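min, max and copysign can all be expressed on the raw bit patterns. Two policy choices need pinning down: what to return for NaN (the sketch below propagates a NaN, as WebAssembly's min/max and IEEE 754-2019 minimum/maximum do, whereas IEEE 754-2008 minNum/maxNum return the non-NaN operand) and the sign of zero (min(-0, +0) = -0, max(-0, +0) = +0). Python sketch for binary32 with hypothetical helper names; the returned NaN pattern is an assumption.

```python
import struct

CANONICAL_NAN = 0x7FC00000  # assumed NaN result; any quiet NaN would do


def f32_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def is_nan(b: int) -> bool:
    return (b & 0x7F800000) == 0x7F800000 and (b & 0x007FFFFF) != 0


def order_key(b: int) -> int:
    """Sign-magnitude float32 bits -> int with the same numeric order."""
    mag = b & 0x7FFFFFFF
    return -mag if b & 0x80000000 else mag


def copysign(a: int, b: int) -> int:
    return (a & 0x7FFFFFFF) | (b & 0x80000000)  # magnitude of a, sign of b


def f_min(a: int, b: int) -> int:
    if is_nan(a) or is_nan(b):
        return CANONICAL_NAN
    if order_key(a) == order_key(b):
        return a | b          # equal values; for +0/-0 this picks -0
    return a if order_key(a) < order_key(b) else b


def f_max(a: int, b: int) -> int:
    if is_nan(a) or is_nan(b):
        return CANONICAL_NAN
    if order_key(a) == order_key(b):
        return a & b          # equal values; for +0/-0 this picks +0
    return a if order_key(a) > order_key(b) else b


print(hex(f_min(0x80000000, 0x00000000)))  # 0x80000000 (-0)
```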
I've seen that the get_a and put_z states could be merged into a single state without too much hassle, saving some cycles: the condition for accepting a new request could be checked at the same time a result is given, allowing requests to overlap at that stage. What do you think?
Create a simple fpu module that hosts all the other components and can be used as a black box. It could be just a wrapper over the other components, routing the a, b and z data wires and their handshake signals according to an op selector; almost a "kitchen sink" example of how to use the components.
In a future iteration, it might be nice to create a more advanced version that allows different operations to execute at the same time to increase performance, with some control (a FIFO or similar) to guarantee the order of execution, but maybe that would be better done as an independent project.
Add 32- and 64-bit comparison operators, more specifically:
eq
ne
lt
gt
le
ge
abs
neg
ceil
floor
trunc
nearest
sqrt
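The comparison operators and abs/neg can work directly on the bit patterns, because IEEE 754 uses a sign-magnitude encoding: ordering by magnitude, then negating for negative values, gives numeric order, while ±0 and NaN need special care. A Python sketch for binary32 (helper names are hypothetical; a 64-bit version only changes the masks):

```python
import struct


def f32_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]


def is_nan(b: int) -> bool:
    return (b & 0x7F800000) == 0x7F800000 and (b & 0x007FFFFF) != 0


def order_key(b: int) -> int:
    """Map float32 bits to an int with the same numeric order;
    +0 and -0 both map to 0, so they compare equal."""
    mag = b & 0x7FFFFFFF
    return -mag if b & 0x80000000 else mag


def unordered(a: int, b: int) -> bool:
    return is_nan(a) or is_nan(b)   # comparisons with NaN are false...


def f_eq(a, b): return not unordered(a, b) and order_key(a) == order_key(b)
def f_ne(a, b): return not f_eq(a, b)   # ...except ne, which is true
def f_lt(a, b): return not unordered(a, b) and order_key(a) < order_key(b)
def f_le(a, b): return not unordered(a, b) and order_key(a) <= order_key(b)
def f_gt(a, b): return f_lt(b, a)
def f_ge(a, b): return f_le(b, a)


def f_abs(a: int) -> int: return a & 0x7FFFFFFF   # clear the sign bit
def f_neg(a: int) -> int: return a ^ 0x80000000   # flip the sign bit


x, y = f32_bits(-0.0), f32_bits(0.0)
print(f_eq(x, y), f_lt(x, y))  # True False
```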
Square root would be more difficult, although I've found there are several (naïve?) implementations that we could use.
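One hardware-friendly option is the digit-by-digit (restoring) square root, the square-root analogue of long division: it needs only shifts, a comparison and a subtraction, and yields one result bit per iteration. The Python sketch below is the integer version (hypothetical name isqrt_bitwise). Applied to a 24-bit significand shifted left by 23 (or by 24 when the exponent is odd, so the exponent can be halved exactly), it gives a 24-bit root, and the leftover remainder could serve as a sticky bit for rounding. This is an illustration, not code from the repository.

```python
import math


def isqrt_bitwise(n: int) -> int:
    """Digit-by-digit integer square root: floor(sqrt(n)), one result bit per loop."""
    assert n >= 0
    res = 0
    # highest power of four that is <= n (0 when n == 0)
    bit = 1 << (2 * ((n.bit_length() - 1) // 2)) if n else 0
    while bit:
        if n >= res + bit:      # trial subtraction succeeds: result bit is 1
            n -= res + bit
            res = (res >> 1) + bit
        else:                   # result bit is 0
            res >>= 1
        bit >>= 2
    return res


# sanity check against the standard library (math.isqrt needs Python 3.8+)
assert all(isqrt_bitwise(n) == math.isqrt(n) for n in range(10_000))

# significand of 1.0 (0x800000) with an even exponent: the root is 1.0 again
print(hex(isqrt_bitwise(0x800000 << 23)))  # 0x800000
```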
Thanks for sharing the code.
Can you tell me exactly which algorithms are used for multiplication and division in the code? I was curious.
Single-precision division takes more than 110 cycles, so I guess these algorithms may not be the ones used in real processors?
(I could not find any contact email, so I thought I'd raise an issue.)
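For multiplication, the line product <= a_m * b_m * 4; quoted in another issue suggests the significand product is left to the synthesis tool's multiplier inference. For division, my reading of divider.v (treat this as an assumption) is a restoring, shift-and-subtract loop that produces one quotient bit per iteration and spends more than one clock per iteration; about 50 iterations at two cycles each would explain most of the 110+ cycles. Performance-oriented hardware usually retires several quotient bits per cycle (e.g. radix-4 SRT division) or uses multiplicative methods such as Newton-Raphson or Goldschmidt iteration. The Python sketch below shows the basic restoring algorithm on unsigned integers; the names, the shift of 27 and the 51-bit width in the example are illustrative.

```python
def restoring_divide(dividend: int, divisor: int, n_bits: int):
    """Bit-serial restoring division: one quotient bit per iteration, MSB first.

    Returns (quotient, remainder) for an n_bits-wide unsigned dividend.
    """
    assert divisor > 0 and 0 <= dividend < (1 << n_bits)
    quotient = 0
    remainder = 0
    for i in range(n_bits - 1, -1, -1):
        # shift the next dividend bit into the partial remainder
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:     # trial subtraction succeeds...
            remainder -= divisor
            quotient |= 1            # ...so this quotient bit is 1
    return quotient, remainder


# significand example: 1.5 / 1.0 with 24-bit significands, dividend pre-shifted
a_m, b_m = 0xC00000, 0x800000
q, r = restoring_divide(a_m << 27, b_m, n_bits=51)
print(hex(q))  # 0xc000000, i.e. 1.5 scaled by 2**27
```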
g++ -o test test.cpp
Testbench:
`timescale 1ns / 1ps
module Divider_tb;
reg clk, rst;
reg [31:0] input_a;
reg input_a_stb;
reg [31:0] input_b;
reg input_b_stb;
reg output_z_ack;
wire input_a_ack;
wire input_b_ack;
wire [31:0] output_z;
wire output_z_stb;
Flaoting_32_Divider uut(
.input_a(input_a),
.input_b(input_b),
.input_a_stb(input_a_stb),
.input_b_stb(input_b_stb),
.output_z_ack(output_z_ack),
.clk(clk),
.rst(rst),
.output_z(output_z),
.output_z_stb(output_z_stb),
.input_a_ack(input_a_ack),
.input_b_ack(input_b_ack)
);
// 10 ns clock; the first rising edge is at 5 ns
always #5 clk=~clk;
initial begin
clk= 1'b0;
end
initial
begin
rst=1'b1;
input_a_stb=1'b1;
input_b_stb=1'b1;
output_z_ack=1'b0; // never raised, so the output handshake can never complete
// NOTE: rst is released at 1 ns, before the first rising edge at 5 ns. If the
// divider uses a synchronous reset, it never sees rst=1 and its state register
// stays x, which matches output_z staying x in the output below. Holding rst
// high across at least one rising edge (e.g. #20 rst=1'b0;) avoids this.
#1 rst=1'b0;
#2 input_a=32'b01000010101101101011000000000000; // 91.34375, applied at 3 ns
#1 input_b=32'b00111110000101000000000000000000; // 0.14453125, applied at 4 ns
// expected quotient: 632.0 = 32'h441e0000
// NOTE: there is no $finish; a divide reportedly takes more than 110 clock
// cycles, so the simulation must run for well over 1 us to show a result.
end
initial
begin
$monitor("time=",$time,"input_a =%b,input_b=%b,output_z=%b",input_a,input_b,output_z);
end
endmodule
Output:
time= 0
input_a=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,
input_b=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,
output_z=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
time= 3
input_a =01000010101101101011000000000000,
input_b=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,
output_z=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
time= 4
input_a=01000010101101101011000000000000,
input_b=00111110000101000000000000000000,
output_z=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Could it be possible to expand the data size to 64 bits, maybe by using a flag? I'm doing a Verilog implementation of WebAssembly, and the spec requires 64-bit floating-point operations...
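Most of the difference between the 32-bit and 64-bit formats is field widths and bias (8 vs 11 exponent bits, 23 vs 52 fraction bits, bias 127 vs 1023), so a width choice, which in Verilog could be a module parameter, covers the unpack and pack steps; the repository also already ships separate double-precision modules such as double_multiplier.v. The Python sketch below shows unpacking driven by a width setting; the names are hypothetical, and infinities and NaNs are not handled.

```python
import math
import struct
from typing import NamedTuple

# (exponent bits, stored fraction bits) per width: the "flag" becomes a width choice
FORMATS = {32: (8, 23), 64: (11, 52)}


class Fields(NamedTuple):
    sign: int
    exponent: int   # unbiased
    mantissa: int   # significand, hidden bit included for normal numbers


def unpack(bits: int, width: int) -> Fields:
    """Split finite IEEE 754 binary32/binary64 bits into fields."""
    exp_bits, frac_bits = FORMATS[width]
    bias = (1 << (exp_bits - 1)) - 1            # 127 or 1023
    sign = (bits >> (width - 1)) & 1
    exp_field = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    frac = bits & ((1 << frac_bits) - 1)
    if exp_field == 0:                          # zero or denormal: no hidden 1
        return Fields(sign, 1 - bias, frac)
    return Fields(sign, exp_field - bias, frac | (1 << frac_bits))


def to_value(f: Fields, width: int) -> float:
    frac_bits = FORMATS[width][1]
    return (-1) ** f.sign * math.ldexp(f.mantissa, f.exponent - frac_bits)


def bits32(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits64(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]


print(unpack(bits32(1.5), 32))  # Fields(sign=0, exponent=0, mantissa=12582912)
print(unpack(bits64(1.5), 64))
```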
What's the license of the project? The text in copying.txt seems very similar to MIT; could that be it?
I am asking this because I am using Quartus II, and I can't analyze the output there: waveform observation is limited to 1 microsecond, so I can't see anything beyond that. Should a division take longer than 1 microsecond?
Although it is fixed in the single-precision codebase, this bug still exists in the double-precision codebase. The fix in place ensures that A + -A works when A is positive, but not when A is negative (e.g. 0xff80000000000000 + 0x7f80000000000000).
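A reference check for this example: in double precision these patterns are not infinities. The exponent field is 11 bits, so 0x7F80000000000000 has exponent field 0x7F8 and is +2^1017 (double infinity is 0x7FF0000000000000). Under the default round-to-nearest mode, IEEE 754 says an exactly zero sum of opposite-signed operands is +0, so both operand orders should produce 0x0000000000000000. The snippet below uses the host's IEEE doubles as the reference model; the helper names are hypothetical.

```python
import struct


def f64(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def bits64(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]


a = f64(0xFF80000000000000)   # sign 1, exponent field 0x7F8: -2**1017, finite
b = f64(0x7F80000000000000)   # +2**1017
print(a == -(2.0 ** 1017), hex(bits64(a + b)), hex(bits64(b + a)))  # True 0x0 0x0
```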
$fopenr("stim_a"); can be changed to the standard Verilog-2001 form $fopen("stim_a", "r");
It seems the conversions between integers and floating-point numbers only support signed integers. How could it be possible to convert from/to unsigned integers and long (64-bit) integers?
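One common way to add unsigned int-to-float support on top of a signed converter: values below 2^31 go straight through; larger values are halved with the shifted-out bit ORed back in as a sticky bit (so rounding stays correct), converted as signed, and then doubled, which is exact. The same idea applies to 64-bit "long" values with 63/64-bit widths. This is a known compiler technique, shown here as a Python model with hypothetical helper names; struct's float rounding stands in for the existing signed converter.

```python
import struct


def f32_bits(x: float) -> int:
    """Round a Python float to float32 (nearest-even) and return its bits."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]


def i32_to_f32(i: int) -> int:
    """Stand-in for the existing signed int -> float converter."""
    assert -2**31 <= i < 2**31
    return f32_bits(float(i))


def u32_to_f32(u: int) -> int:
    """Unsigned int -> float32 bits, built only on the signed converter."""
    assert 0 <= u < 2**32
    if u < 2**31:
        return i32_to_f32(u)       # fits in a signed int: convert directly
    half = (u >> 1) | (u & 1)      # halve, keeping the lost bit as a sticky bit
    return f32_bits(bits_f32(i32_to_f32(half)) * 2.0)  # doubling is exact


print(hex(u32_to_f32(0xFFFFFFFF)))  # 0x4f800000 (2**32 after rounding)
```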
Hi, Jon.
Thanks for your open-source FPU design. Amazing!
Having read the Verilog code of the computation units, I was impressed by how the computation flow is organised as a finite state machine.
However, I am wondering how to insert pipelining into your FPU design to improve its throughput. This problem has puzzled me for almost half a month: I want to figure out how to pipeline a finite state machine. Is it feasible? I'd appreciate your help. Thank you.
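A common approach is not to pipeline the state machine itself but to replace it: each state (or each iteration of a loop such as the divider's) becomes its own hardware stage with a register after it, so several operations are in flight at once. Latency stays roughly the same, but throughput becomes one result per clock; in RTL, the stb/ack handshake would typically become a valid bit travelling alongside the data, plus some stall or backpressure mechanism. The toy Python model below (hypothetical names, arbitrary stage functions) shows that timing behaviour only.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Stage = Callable[[int], int]


def run_pipeline(stages: Sequence[Stage], inputs: Sequence[int],
                 cycles: int) -> List[Tuple[int, int]]:
    """Cycle-level model of a linear pipeline with one register after each
    stage. A new input may enter every cycle; returns (cycle, value) for each
    result leaving the last register."""
    regs: List[Optional[int]] = [None] * len(stages)
    outputs = []
    feed = iter(inputs)
    for cycle in range(cycles):
        if regs[-1] is not None:
            outputs.append((cycle, regs[-1]))   # result available this cycle
        # rising edge: each stage processes what the previous register held
        prev = [next(feed, None)] + regs[:-1]
        regs = [None if v is None else f(v) for f, v in zip(stages, prev)]
    return outputs


stages = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
print(run_pipeline(stages, [1, 2, 3], cycles=6))
# [(3, 1), (4, 3), (5, 5)]: 3-cycle latency, then one result per cycle
```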
:~$ python3
Python 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import chips.api.api
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/naga/.local/lib/python3.6/site-packages/chips/__init__.py", line 3, in <module>
    import streams, sinks, process, instruction
ModuleNotFoundError: No module named 'streams'