
Kôika: A Core Language for Rule-Based Hardware Design

This is the home of Kôika, an experimental hardware design language inspired by BlueSpec SystemVerilog. Kôika programs are built from rules, small bits of hardware that operate concurrently to compute state updates but provide the illusion of serializable (atomic) updates. Kôika has simple, precise semantics that give you strong guarantees about the behavior of your designs.

Our distribution includes an executable reference implementation of the language written using the Coq proof assistant, machine-checked proofs ensuring that Kôika's semantics are compatible with one-rule-at-a-time execution, and a formally-verified compiler that generates circuits guaranteed to correctly implement your designs.

Kôika programs are typically written inside of Coq using an embedded DSL (this lets you leverage Coq's powerful metaprogramming features and modular abstractions), though we also have a more limited standalone front-end that accepts programs in serialized (s-expressions) form. For simulation, debugging, and testing purposes, Kôika programs can be compiled into readable, cycle-accurate C++ models, and for synthesis the Kôika compiler targets a minimal subset of synthesizable Verilog supported by all major downstream tools.

Kôika is a research prototype: the circuits that our compiler generates typically have reasonable-to-good performance and area usage, but our RTL optimizer is very simple and can miss obvious improvements. Because our simulator can take advantage of high-level information, Kôika designs typically run reasonably fast in C++ simulation.

Our largest example at the moment is a simple RISCV (RV32I) 4-stage pipelined core.

Kôika is currently developed as a joint research effort at MIT, involving members of CSG (the Computation Structure Group) and PLV (the Programming Languages & Verification group). Our latest draft is a good place to get details about the research that powers it. The name “Kôika” (甲イカ) is Japanese for “cuttlefish”; we chose it because cuttlefish have blue blood (a tribute to the name Bluespec), and because their eight arms are equipped with independent neurons that allow them to operate semi-independently towards a shared purpose, much like rules in Kôika designs.

This README provides practical information to get started with Kôika. For details about Kôika's semantics, its compilation algorithm, and the work it draws inspiration from, please read our PLDI 2020 paper: The Essence of Bluespec, A Core Language for Rule-Based Hardware Design.

Getting started

Installing dependencies and building from source

  • OCaml 4.07 through 4.09, opam 2.0 or later, GNU make.

  • Coq 8.11 through 8.13:

    opam install coq=8.12.2
    
  • Dune 2.5 or later:

    opam upgrade dune
    
  • Some OCaml packages:

    opam install base core stdio parsexp hashcons zarith
    
  • To run the tests of our RISCV core, a RISCV compilation toolchain.

  • To run C++ simulations: a recent C++ compiler (clang or gcc), libboost-dev, and optionally clang-format.

You can compile the full distribution, including examples, tests, and proofs, by running make in the top-level directory of this repo. Generated files are placed in _build, examples/_objects/, tests/_objects/, and examples/rv/_objects/.

Each directory in _objects contains a Makefile to ease further experimentation (including RTL simulation, profiling, trace generation, etc.).

For reproducibility, here is one set of versions known to work:

  • OCaml 4.09 with opam install base=v0.13.1 coq=8.11.1 core=v0.13.0 dune=2.5.1 hashcons=1.3 parsexp=v0.13.0 stdio=v0.13.0 zarith=1.9.1

Browsing examples

The examples/ directory of this repo demonstrates many of Kôika's features. The examples can be compiled using make examples, and browsed in either CoqIDE or Proof General. There is basic Emacs support for .lv files (the “Lispy Verilog” alternative frontend for Kôika) in etc/lv-mode.el.

See browsing the sources below for information about the repository's organization.

Compiling your own programs

Our compiler (cuttlec) supports multiple targets:

  • verilog (an RTL implementation of your design expressed in the synthesizable subset of Verilog 2000)
  • hpp (a cuttlesim model of your design, i.e. a cycle-accurate C++ implementation of it, useful for debugging)
  • cpp (a driver for the cuttlesim model of your design)
  • verilator (a simple C++ driver to simulate the Verilog using Verilator)
  • makefile (an auto-generated Makefile including convenient targets to debug, profile, trace, or visualize the outputs of your design)
  • dot (a basic representation of the RTL generated from your design)

For quick experimentation, you can just drop your files in examples/ or tests/ and run make examples/_objects/<your-file.v>/.

Programs written in the Coq EDSL

On the Coq side, after implementing your design, use the following to define a “package” (see the examples/ directory for more information, or read the syntax section below):

Definition package :=
  Interop.Backends.register
    {| ip_koika := …;
       ip_sim := …;
       ip_verilog := … |}.
Extraction "xyz.ml" package.

Compile your Coq sources using coqc or dune to generate xyz.ml, then compile that file using cuttlec xyz.ml -T ….

Among other things, a package contains instances of the Show typeclass used to print register names. These instances are typically derived automatically, but customizing them makes it possible to control the names given to signals in the generated Verilog and C++ code (for example, instead of x0, x1, …, x31, we use zero, ra, sp, gp, etc. in the RISCV core).

Programs written in serialized syntax (“Lispy Verilog”)

Use cuttlec your-program.lv -T verilog, or any other output option as described by cuttlec --help.

Technical overview

Details about Kôika's design and implementation can be found in our research paper.

Execution model

Kôika programs are made of rules, orchestrated by a scheduler. Each rule is a program that runs once per cycle, as long as it does not conflict with other rules. When conflicts arise (for example, when two rules attempt to write to the same register), the priority order specified by the scheduler determines which rule gets to fire (i.e. execute). Concretely, a rule might look like this (this is a rule that takes one step towards computing the GCD of the numbers in registers gcd_a and gcd_b):

Definition gcd_compute := {{
  let a := read0(gcd_a) in
  let b := read0(gcd_b) in
  if a != |16`d0| then
    if a < b then
      write0(gcd_b, a);
      write0(gcd_a, b)
    else
      write0(gcd_a, a - b)
  else
    fail
}}
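To make the rule's behavior concrete, here is a rough Python rendering of one firing of gcd_compute (an illustration only, not Kôika or its generated code): the rule either swaps the registers or performs one subtraction step, and fails once gcd_a reaches 0.

```python
# Hypothetical Python model of one firing of gcd_compute (not Koika itself).
def gcd_step(a: int, b: int):
    """Return the next (gcd_a, gcd_b) state, or None if the rule fails."""
    if a == 0:
        return None          # `fail`: the rule aborts, leaving no state changes
    if a < b:
        return (b, a)        # swap so the larger value ends up in gcd_a
    return (a - b, b)        # one subtraction step of Euclid's algorithm

# Run the rule once per "cycle" until it fails:
state, final = (105, 30), None
while state is not None:
    final = state
    state = gcd_step(*state)
print(final)                 # gcd_b now holds gcd(105, 30) = 15
```

Once the rule fails permanently, the result sits in gcd_b; in hardware, the rule would simply stop firing on subsequent cycles.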

The semantics of Kôika guarantee that each rule executes atomically, and that generated circuits behave one-rule-at-a-time — that is, even when multiple rules fire in the same cycle, the updates that they compute are as if only one rule had run per cycle (previous work used this property to define the language; in contrast, our semantics are more precise, and this atomicity property is proven in coq/OneRuleAtATime.v).

As an example, consider a simple two-stage pipeline with two single-element input FIFOs and one single-element output FIFO:

We implement these FIFOs using three single-bit registers (…_empty) indicating whether each FIFO is empty, and three data registers (…_data) holding the contents of these FIFOs. We have three rules: two to dequeue from the input FIFOs into a middle FIFO (deq0 and deq1), and one to dequeue from the middle FIFO and write a result (the input plus 412) into an output FIFO (process). The code looks like this (guard(condition) is short for if !condition then fail):

(* This is a compact way to define deq0, deq1, and process: *)
Definition rules (rl: rule_name_t) :=
  match rl with
  | deq0 =>
    {{ guard(!read0(in0_empty) && read0(fifo_empty));
       write0(fifo_data, read0(in0_data));
       write0(fifo_empty, Ob~0);
       write0(in0_empty, Ob~1) }}
  | deq1 =>
    {{ guard(!read0(in1_empty) && read0(fifo_empty));
       write0(fifo_data, read0(in1_data));
       write0(fifo_empty, Ob~0);
       write0(in1_empty, Ob~1) }}
  | process =>
    {{ guard(!read1(fifo_empty) && read0(out_empty));
       write0(out_data, read1(fifo_data) + |32`d412|);
       write1(fifo_empty, Ob~1);
       write0(out_empty, Ob~0) }}
  end.

A conflict arises when both inputs are available; what should happen in this case? The ambiguity is resolved by the scheduler:

Definition pipeline : scheduler :=
  deq0 |> deq1 |> process |> done.

This sequence indicates that deq0 has priority, so in0_data is processed first. When both inputs are available and the middle FIFO is empty, deq0 fires, and deq1 dynamically fails when it attempts to write into fifo_data.

This example includes a simple form of backpressure: if the middle FIFO is full, the first two rules will not run; if the output FIFO is full, the last rule will not run. This is made explicit by the guard statements (those would be hidden inside the implementation of the dequeue and enqueue methods of the FIFO in a larger example, as demonstrated below).

Looking carefully, you'll notice that reads and writes are annotated with 0s and 1s. These are forwarding specifications, or “ports”. Values written at port 0 are visible in the same cycle at port 1, and values written at port 1 overwrite values written at port 0. Hence, this example defines a bypassing FIFO: values written by deq0 and deq1 are processed by process in the same cycle as they are written, assuming that there is space in the output FIFO. If process had used read0 instead of read1, we would have had a pipelined FIFO.

In this example, starting with the following values:

in0_empty  ⇒ false
in0_data   ⇒ 42
in1_empty  ⇒ false
in1_data   ⇒ 73
fifo_empty ⇒ true
fifo_data  ⇒ 0
out_empty  ⇒ true
out_data   ⇒ 0

we get the following output:

in0_empty  ⇒ true
in0_data   ⇒ 42
in1_empty  ⇒ false
in1_data   ⇒ 73
fifo_empty ⇒ true
fifo_data  ⇒ 42
out_empty  ⇒ false
out_data   ⇒ 454
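The cycle above can be replayed with a small hand-rolled Python model (not Kôika's actual interpreter; it captures just enough of the port semantics for this example and ignores some of Kôika's real ordering constraints): deq0 fires, deq1 dynamically fails on the double write to fifo_data, and process forwards the value through port 1.

```python
# Hand-rolled model of one cycle of the two-stage pipeline (illustration only).
class Fail(Exception):
    pass

def guard(cond):
    if not cond:
        raise Fail

def run_cycle(regs, rules):
    w0, w1 = {}, {}                            # committed writes, per port
    for rule in rules:                         # scheduler order = priority order
        local0, local1 = {}, {}
        read0 = lambda r: regs[r]              # start-of-cycle value
        read1 = lambda r: local0.get(r, w0.get(r, regs[r]))  # sees port-0 writes
        def write0(r, v):
            if r in w0 or r in w1:             # double write: dynamic failure
                raise Fail
            local0[r] = v
        def write1(r, v):
            if r in w1:
                raise Fail
            local1[r] = v
        try:
            rule(read0, read1, write0, write1)
        except Fail:
            continue                           # a failed rule leaves no trace
        w0.update(local0); w1.update(local1)   # rule succeeded: commit its log
    regs.update(w0); regs.update(w1)           # port-1 writes take precedence

def deq0(r0, r1, wr0, wr1):
    guard(not r0("in0_empty") and r0("fifo_empty"))
    wr0("fifo_data", r0("in0_data")); wr0("fifo_empty", False); wr0("in0_empty", True)

def deq1(r0, r1, wr0, wr1):
    guard(not r0("in1_empty") and r0("fifo_empty"))
    wr0("fifo_data", r0("in1_data")); wr0("fifo_empty", False); wr0("in1_empty", True)

def process(r0, r1, wr0, wr1):
    guard(not r1("fifo_empty") and r0("out_empty"))
    wr0("out_data", r1("fifo_data") + 412); wr1("fifo_empty", True); wr0("out_empty", False)

regs = {"in0_empty": False, "in0_data": 42, "in1_empty": False, "in1_data": 73,
        "fifo_empty": True, "fifo_data": 0, "out_empty": True, "out_data": 0}
run_cycle(regs, [deq0, deq1, process])   # deq1 fails; out_data ends at 42 + 412
```

Running this reproduces the register table above, including the discarded deq1 and the bypassed fifo_data value.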

Syntax

Kôika programs are written using an embedded DSL inside of the Coq proof assistant. After compiling the distribution, begin your file with Require Import Koika.Frontend.

Preamble and types

Start by defining the following types:

  • reg_t: An enumerated type describing the state of your machine. For example,

    Inductive reg_t :=
    (* These bypassing FIFOs are used to communicate with the memory *)
    | to_memory (state: MemReqFIFO.reg_t)
    | from_memory (state: MemRespFIFO.reg_t)
    (* These FIFOs are used to connect pipeline stages *)
    | d2e (state: fromDecodeFIFO.reg_t)
    | e2w (state: fromExecuteFIFO.reg_t)
    (* The register file and the scoreboard track and record reads and writes *)
    | register_file (state: Rf.reg_t)
    | scoreboard (state: Scoreboard.reg_t)
    (* These are plain registers, not module instances *)
    | pc
    | epoch.
  • ext_fn_t: An enumerated type describing custom combinational primitives (custom IP) that your program should have access to (custom sequential IP is implemented using external rules, which are currently a work in progress; see examples/rv/RVCore.v for a concrete example). Use empty_ext_fn_t if you don't use external IP in your design. For example,

    Inductive ext_fn_t :=
    | custom_adder (size: nat).

Then, declare the types of the data held in each part of your state and the signatures of your external (combinational) IP (we usually name these functions R and Sigma). (In addition to bitsets, registers can contain structures, enums, or arrays of values; examples of these are given below.)

Definition R (reg: reg_t) :=
  match reg with
  (* The type of the other modules is opaque; it's defined by the Rf module *)
  | to_memory st => MemReqFIFO.R st
  | register_file st => Rf.R st
  …
  (* Our own state is described explicitly: *)
  | pc => bits_t 32
  | epoch => bits_t 1
  end.
Definition Sigma (fn: ext_fn_t): ExternalSignature :=
  match fn with
  | custom_adder sz => {$ bits_t sz ~> bits_t sz ~> bits_t sz $}
  end.

As needed, you can define your own custom types; here are a few examples:

Definition proto :=
  {| enum_name := "protocol";
     enum_members :=
       vect_of_list ["ICMP"; "IGMP"; "TCP"; "UDP"];
     enum_bitpatterns :=
       vect_of_list [Ob~0~0~0~0~0~0~0~1; Ob~0~0~0~0~0~0~1~0;
                     Ob~0~0~0~0~0~1~1~0; Ob~0~0~0~1~0~0~0~1] |}.
Definition flag :=
  {| enum_name := "flag";
     enum_members := vect_of_list ["set"; "unset"];
     enum_bitpatterns := vect_of_list [Ob~1; Ob~0] |}.
Definition ipv4_address :=
  {| array_len := 4;
     array_type := bits_t 8 |}.
Definition ipv4_header :=
  {| struct_name := "ipv4_header";
     struct_fields :=
       [("version", bits_t 4);
        ("ihl", bits_t 4);
        ("dscp", bits_t 6);
        ("ecn", bits_t 2);
        ("len", bits_t 16);
        ("id", bits_t 16);
        ("reserved", enum_t flag);
        ("df", enum_t flag);
        ("mf", enum_t flag);
        ("fragment_offset", bits_t 13);
        ("ttl", bits_t 8);
        ("protocol", enum_t proto);
        ("checksum", bits_t 16);
        ("src", array_t ipv4_address);
        ("dst", array_t ipv4_address)] |}.
Definition result (a: type) :=
  {| struct_name := "result";
     struct_fields := [("valid", bits_t 1); ("value", a)] |}.
Definition response := result (struct_t ipv4_header).

Rules

The main part of your program is rules. You have access to the following syntax (there is no distinction between expressions and statements; statements are just expressions returning unit):

  • pass
    Do nothing

  • fail
    Abort the current rule, reverting all state changes

  • let var := val in body
    Let bindings

  • set var := val
    Assignments

  • stmt1; stmt2
    Sequences

  • if val then val1 else val2
    Conditionals

  • match val with | pattern => body… return default: body
    Switches (case analysis)

  • read0(reg), read1(reg), write0(reg, val), write1(reg, val)
    Read or write a register at port 0 or 1

  • pack(val), unpack(type, val)
    Pack a value (go from structs, enums, or arrays to bits) or unpack a bitset

  • get(struct, field), subst(struct, field, value)
    Get a field of a struct value, or replace a field in a struct value (without mutating the original)

  • getbits(struct, field), substbits(struct, field, value)
    Like get and subst, but on packed bitsets

  • !x, x && y, x || y, x ^ y
    Logical operators (not, and, or, xor)

  • x + y, x - y, x << y, x >> y, x >>> y, zeroExtend(x, width), sext(x, width)
    Arithmetic operators (plus, minus, logical shifts, arithmetic shift right, left zero-extension, sign extension)

  • x < y, x <s y, x > y, x >s y, x <= y, x <s= y, x >= y, x >s= y, x == y, x != y
    Comparison operators, unsigned and signed (the s variants)

  • x ++ y, x[y], x[y :+ z]
    Bitset operators (concatenation, bit selection, indexed part-select)

  • instance.(method)(arg, …)
    Call a method of a module

  • function(args…)
    Call an internal function

  • extcall function(args…)
    Call an external function (combinational IP)

  • Ob~1~0~1~0, |4`d10|
    Bitset constants (here, the number 10 on 4 bits)

  • struct name { field_n := val_n; … }
    Struct constants

  • enum name { member }
    Enum constants

  • #val
    Lift a Coq value (for example, a Coq definition)

For example, the following rule decreases the ttl field of an ICMP packet:

Definition _decr_icmp_ttl := {{
  let hdr := unpack(struct_t ipv4_header, read0(input)) in
  let valid := Ob~1 in
  match get(hdr, protocol) with
  | enum proto { ICMP } =>
    let t := get(hdr, ttl) in
    if t == |8`d0| then set valid := Ob~0
    else set hdr := subst(hdr, ttl, t - |8`d1|)
  return default: fail
  end;
  write0(output, pack(struct response { valid := valid; value := hdr }))
}}.
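A hypothetical Python analogue of this rule (for illustration only; names and types are simplified) makes the control flow explicit: non-ICMP packets hit the default branch and the rule aborts, expired packets are flagged invalid, and subst produces an updated copy of the header.

```python
# Hypothetical Python analogue of _decr_icmp_ttl (illustration, not Koika).
def decr_icmp_ttl(hdr: dict):
    """Return (valid, new_header), or None when the rule fails."""
    if hdr["protocol"] != "ICMP":
        return None                          # default branch: `fail`
    valid = True
    if hdr["ttl"] == 0:
        valid = False                        # expired packet: mark result invalid
    else:
        hdr = {**hdr, "ttl": hdr["ttl"] - 1} # subst: non-mutating field update
    return valid, hdr

res = decr_icmp_ttl({"protocol": "ICMP", "ttl": 5})
```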

This rule fetches the next instruction in our RV32I core:

Definition fetch := {{
  let pc := read1(pc) in
  write1(pc, pc + |32`d4|);
  toIMem.(MemReq.enq)(struct mem_req {
       byte_en := |4`d0|; (* Load *)
       addr := pc;
       data := |32`d0|
     });
  f2d.(fromFetch.enq)(struct fetch_bookkeeping {
       pc := pc;
       ppc := pc + |32`d4|;
       epoch := read1(epoch)
    })
}}.

Rules are written in an untyped surface language; to typecheck a rule, use tc_action R Sigma rule_body, or use tc_rules as shown below.

Schedulers

A scheduler defines a priority order on rules: in each cycle rules appear to execute sequentially, and later rules that conflict with earlier ones do not execute (of course, all this is about semantics; the circuits generated by the compiler are (almost entirely) parallel).

A scheduler refers to rules by name, so you need three things:

  • A rule name type:

    Inductive rule_name_t :=
      start | step_compute | get_result.
  • A scheduler definition:

    Definition scheduler :=
      start |> step_compute |> get_result |> done.
  • A mapping from rule names to (typechecked) rules:

    Definition rules :=
      tc_rules R Sigma
        (fun rl =>
         match rl with
         | start => {{ … rule body … }}
         | step_compute => gcd_compute
         | get_result => {{ … rule body … }}
         end).

Formal semantics

The semantics of Kôika programs are given by a reference interpreter written in Coq. The results computed by this interpreter are the specification of the meaning of each program.

The reference interpreter takes three inputs:

  • A program, using the syntax described above

  • The initial value of each state element, r

    Definition r (reg: reg_t): R reg :=
      match reg with
      | to_memory st => MemReqFIFO.r st
      | register_file st => Rf.r st
      …
      | pc => Bits.zero
      | epoch => Bits.zero
      end.
  • A Coq model of the external IP that you use, if any:

    Definition sigma (fn: ext_fn_t): Sig_denote (Sigma fn) :=
      match fn with
      | custom_adder sz => fun (bs1 bs2: bits sz) => Bits.plus bs1 bs2
      end.

Then you can run your code:

Definition cr := ContextEnv.(create) r.

(* This computes a log of reads and writes *)
Definition event_log :=
  tc_compute (interp_scheduler cr sigma rules scheduler).

(* This computes the new value of each register *)
Definition interp_result :=
  tc_compute (commit_update cr event_log).

The interp_scheduler function implements the executable reference semantics of Kôika; it can be used to prove properties about programs, to guarantee that program transformations are correct, or to verify a compiler.
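The log-then-commit structure of the reference semantics can be sketched in plain Python (a rough analogy, not the Coq development; the real log also constrains which reads and writes later rules may perform): interpretation records read and write events, and a commit_update analogue folds the writes back into the register state, with port-1 writes taking precedence.

```python
# Rough sketch (plain Python) of an event log and a commit_update analogue.
log = [
    ("read",  0, "gcd_a"),
    ("read",  0, "gcd_b"),
    ("write", 0, "gcd_a", 75),
]

def commit_update(regs, log):
    out = dict(regs)
    for port in (0, 1):           # apply port-0 writes, then port-1 overrides
        for entry in log:
            if entry[0] == "write" and entry[1] == port:
                out[entry[2]] = entry[3]
    return out

new_regs = commit_update({"gcd_a": 105, "gcd_b": 30}, log)
```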

Compiler verification

In addition to the reference interpreter, we have a verified compiler that targets RTL. “Verified”, in this context, means that we have a machine-checked proof that the circuits produced by the compiler compute the exact same results as the original programs they were compiled from (the theorem is compiler_correct in coq/CircuitCorrectness.v).

For instance, in the following example, our theorem guarantees that circuits_result matches interp_result above:

Definition is_external (r: rule_name_t) :=
  false.

Definition circuits :=
  compile_scheduler rules is_external collatz.

Definition circuits_result :=
  tc_compute (interp_circuits empty_sigma circuits (lower_r (ContextEnv.(create) r))).

C++ Simulation

For simulation, debugging, and testing purposes, we have a separate compiler, cuttlesim, which generates C++ models from Kôika designs. The models are reasonably readable, suitable for debugging with GDB or LLDB, and typically run significantly faster than RTL simulation. Here is a concrete example, generated from examples/gcd_machine.v:

bool rule_step_compute() noexcept {
  {
    bits<16> a;
    READ0(step_compute, gcd_a, &a);
    {
      bits<16> b;
      READ0(step_compute, gcd_b, &b);
      if ((a != 16'0_b)) {
        if ((a < b)) {
          WRITE0(step_compute, gcd_b, a);
          WRITE0(step_compute, gcd_a, b);
        } else {
          WRITE0(step_compute, gcd_a, (a - b));
        }
      } else {
        FAIL_FAST(step_compute);
      }
    }
  }

  COMMIT(step_compute);
  return true;
}

The Makefile generated by cuttlec contains multiple useful targets that can be used in connection with cuttlesim; for example, coverage statistics (using gcov) can be used to get a detailed picture of which rules of a design tend to fail, and for what reasons, which makes it easy to diagnose e.g. back-pressure due to incorrect pipelining setups. Additionally, cuttlesim models can be used to generate value change dumps that can be visualized with GTKWave.

We wrote a paper about cuttlesim.

Compilation

The usual compilation process for programs defined using our Coq EDSL is as follows:

  1. Write your program as shown above.
  2. Write a package, gathering all pieces of your program together; packages are documented in coq/Interop.v.
  3. Export that package using extraction to OCaml.
  4. Compile this package to Verilog, C++, etc. using cuttlec; this invokes the verified compiler to circuits and a thin unverified layer to produce RTL, or separate (unverified) code to produce C++ models and graphs.

Additional topics

RTL Simulation, tracing, profiling, etc.

Running cuttlec with the -T all flag generates all supported output formats, and a Makefile with a number of useful targets, including the following (replace collatz with the name of your design):

  • make obj_dir/Vcollatz

    Compile the generated RTL with Verilator.

  • make gdb

    Compile the C++ model of your design in debug mode, then run it under GDB.

  • make collatz.hpp.gcov

    Generate coverage statistics for the C++ model of your design (this shows which rules fire, how often they fire, and why they fail when they do).

  • make NCYCLES=25 gtkwave.verilator

    Compile the generated RTL with Verilator in --trace mode, then generate a VCD trace over 25 cycles and open it in GTKWave.

Use make help in the generated directory to learn more.

Function definitions

It is often convenient to define reusable combinational functions separately, as in this example:

Definition alu32: UInternalFunction reg_t empty_ext_fn_t := {{
  fun (funct3: bits_t 3) (inst_30: bits_t 1)
      (a: bits_t 32) (b: bits_t 32): bits_t 32 =>
    let shamt := b[Ob~0~0~0~0~0 :+ 5] in
    match funct3 with
    | #funct3_ADD  => if (inst_30 == Ob~1) then a - b else a + b
    | #funct3_SLL  => a << shamt
    | #funct3_SLT  => zeroExtend(a <s b, 32)
    | #funct3_SLTU => zeroExtend(a < b, 32)
    | #funct3_XOR  => a ^ b
    | #funct3_SRL  => if (inst_30 == Ob~1) then a >>> shamt else a >> shamt
    | #funct3_OR   => a || b
    | #funct3_AND  => a && b
    return default: |32`d0|
    end
}}.

That function would be called by writing alu32(fn3, i30, a, b).
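The dispatch above can be mimicked in plain Python (a hypothetical analogue, not the generated code): the #funct3_* constants become string tags, values are 32-bit two's-complement integers, and inst_30 selects subtraction vs. addition and arithmetic vs. logical right shift.

```python
# Hypothetical Python analogue of alu32 (illustration only).
MASK = 0xFFFFFFFF

def to_signed(x):
    return x - (1 << 32) if x & (1 << 31) else x

def alu32(funct3, inst_30, a, b):
    shamt = b & 0x1F                                  # b[0 :+ 5]
    if funct3 == "ADD":
        return (a - b if inst_30 else a + b) & MASK
    if funct3 == "SLL":
        return (a << shamt) & MASK
    if funct3 == "SLT":
        return int(to_signed(a) < to_signed(b))       # zero-extended to 32 bits
    if funct3 == "SLTU":
        return int(a < b)
    if funct3 == "XOR":
        return a ^ b
    if funct3 == "SRL":                               # inst_30 picks sra vs srl
        return (to_signed(a) >> shamt) & MASK if inst_30 else a >> shamt
    if funct3 == "OR":
        return a | b
    if funct3 == "AND":
        return a & b
    return 0                                          # the default branch
```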

Modularity

Function definitions are best for stateless (combinational) programs. For stateful code fragments, Kôika has a limited form of method calls.

The following (excerpted from examples/conflicts_modular.v) defines a Queue32 module implementing a bypassing FIFO, with methods to dequeue at port 0 and 1 and a method to enqueue at port 0.

Module Import Queue32.
  Inductive reg_t := empty | data.

  Definition R reg :=
    match reg with
    | empty => bits_t 1
    | data => bits_t 32
    end.

  Definition dequeue0: UInternalFunction reg_t empty_ext_fn_t :=
    {{ fun dequeue0 () : bits_t 32 =>
         guard(!read0(empty));
         write0(empty, Ob~1);
         read0(data) }}.

  Definition enqueue0: UInternalFunction reg_t empty_ext_fn_t :=
    {{ fun enqueue0 (val: bits_t 32) : unit_t =>
         guard(read0(empty));
         write0(empty, Ob~0);
         write0(data, val) }}.

  Definition dequeue1: UInternalFunction reg_t empty_ext_fn_t :=
    {{ fun dequeue1 () : bits_t 32 =>
         guard(!read1(empty));
         write1(empty, Ob~1);
         read1(data) }}.
End Queue32.

Our earlier example of conflicts can then be written thus:

Inductive reg_t :=
| in0: Queue32.reg_t -> reg_t
| in1: Queue32.reg_t -> reg_t
| fifo: Queue32.reg_t -> reg_t
| out: Queue32.reg_t -> reg_t.

Inductive rule_name_t := deq0 | deq1 | process.

Definition R (reg: reg_t) : type :=
  match reg with
  | in0 st => Queue32.R st
  | in1 st => Queue32.R st
  | fifo st => Queue32.R st
  | out st => Queue32.R st
  end.

Definition urules (rl: rule_name_t) :=
  match rl with
  | deq0 =>
    {{ fifo.(enqueue0)(in0.(dequeue0)()) }}
  | deq1 =>
    {{ fifo.(enqueue0)(in1.(dequeue0)()) }}
  | process =>
    {{ out.(enqueue0)(fifo.(dequeue1)() + |32`d412|) }}
  end.

Machine-friendly input

When generating Kôika code from another language, it can be easier to target a format with a simpler syntax than our Coq EDSL. In that case you can use Lispy Verilog, an alternative syntax for Kôika based on s-expressions. See the examples/ and tests/ directories for more information; here is one example (the Coq version of the same program is in examples/collatz.v):

;;; Computing terms of the Collatz sequence (Lispy Verilog version)

(defun times_three ((v (bits 16))) (bits 16)
  (+ (<< v 1'1) v))

(module collatz
  (register r0 16'19)

  (rule divide
    (let ((v (read.0 r0))
          (odd (sel v 4'0)))
      (when (not odd)
        (write.0 r0 (lsr v 1'1)))))

  (rule multiply
    (let ((v (read.1 r0))
          (odd (sel v 4'0)))
      (when odd
        (write.1 r0 (+ (times_three v) 16'1)))))

  (scheduler main
    (sequence divide multiply)))
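To see how the two rules compose within a cycle, here is a plain-Python rendering (an illustration, not generated code): divide acts at port 0 and multiply at port 1, so a halving and a 3n+1 step can happen in the same cycle.

```python
# Plain-Python rendering of one cycle of the Collatz module (illustration only).
def cycle(r0):
    v = r0                          # read.0
    if v % 2 == 0:                  # rule divide
        r0 = v >> 1                 # write.0
    v = r0                          # read.1 sees this cycle's port-0 write
    if v % 2 == 1:                  # rule multiply
        r0 = (3 * v + 1) & 0xFFFF   # write.1 (16-bit register)
    return r0

r0, cycles = 19, 0
while r0 != 2:                      # from 2 on, the register loops 2 -> 4 -> 2
    r0 = cycle(r0)
    cycles += 1
```

Starting from 19, the register walks the Collatz sequence two steps at a time whenever a halving exposes an odd value, reaching 2 after 14 cycles and then oscillating forever (real hardware has no notion of stopping).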

Running on FPGA

The Makefiles that cuttlec generates include targets for generating ECP5 and ICE40 bitstreams. The default ECP5 target is set up for the ULX3S-85k FPGA. The default ICE40 target is set up for the TinyFPGA BX. Both are reasonably affordable FPGAs (but note that right now the RV32i code does not fit on the TinyFPGA BX).

To run the RISCV core on the ULX3S on Ubuntu 20:

  • Download a prebuilt ECP5 toolchain from https://github.com/YosysHQ/fpga-toolchain/releases.
  • Make sure that the trivial example at https://github.com/ulx3s/blink works.
  • Run make core in examples/rv to compile the RISCV core (other designs should work too, but you'll need to create a custom wrapper in Verilog to map inputs and outputs to your FPGA's pins).
  • Run make top_ulx3s.bit in examples/rv/_objects/rv32i.v/ to generate a bitstream. You can prefix this command with MEM_NAME=integ/morse (or any other test program) to load a different memory image into the bitstream.
  • Run fujprog top_ulx3s.bit to flash the FPGA.
  • To see the output of putchar(), use a TTY application like tio: tio /dev/ttyUSB0 (the default baud rate is 115200). Alternatively, use stty -F /dev/ttyUSB0 115200 igncr to set up the terminal and then use cat /dev/ttyUSB0.

Browsing the sources

The following list shows the current state of the repo:

coq/
CompilerCorrectness/
(Circuits)
(Frontend)
(Interop)
  • Compiler.v: Top-level compilation function and helpers
  • ExtractionSetup.v: Custom extraction settings (also used by external Kôika programs)
  • Interop.v: Exporting Kôika programs for use with the cuttlec command-line tool
(Language)
(ORAAT)
(Stdlib)
  • Std.v: Standard library
(Tools)
(Utilities)
etc/
vagrant/
  • configure: Generate dune files for examples/ and tests/
examples/
cosimulation.v.etc/
  • blackbox.v: Blackbox verilog module (a simple one-cycle delay) used to demonstrate Cuttlesim+Verilator cosimulation
  • cosimulation.cpp: Custom Cuttlesim driver that implements a Kôika extfun using a Verilator model
fft.v.etc/
  • fft.bsv: A Bluespec implementation of the fft.v example
fir.v.etc/
  • extfuns.hpp: C++ implementation of external functions for the fir example
  • fir.bsv: A Bluespec implementation of the fir.v example
  • mod19.v: Verilog implementation of external functions for the fir example
function_call.v.etc/
  • extfuns.hpp: C++ implementation of external functions for the function_call example
  • fetch_instr.v: Verilog implementation of external functions for the function_call example
rv/
etc/
nangate45/
  • synth.sh: Yosys synthesis script for Nangate Open Cell Library (45nm)
sv/
  • ext_mem.v: Wrapper used to connect the BRAM model with Kôika
  • memory.v: Verilog model of a BRAM
  • pll.v: PLL configuration for the TinyFPGA BX board
  • top.v: Verilog wrapper for the Kôika core (for use in simulation)
  • top_ice40_uart.v: Verilog wrapper for the Kôika core (for use in FPGA synthesis, with a UART interface)
  • top_ice40_usb.v: Verilog wrapper for the Kôika core (for use in FPGA synthesis, with a USB interface)
  • top_uart.v: Verilog wrapper for the Kôika core with a UART interface
uart.v.etc/
ocaml/
backends/
resources/
  • cuttlesim.cpp: Default driver for Kôika programs compiled to C++ using Cuttlesim
  • cuttlesim.hpp: Preamble shared by all Kôika programs compiled to C++
  • verilator.cpp: Default driver for Kôika programs compiled to C++ using Verilator
  • verilator.hpp: Preamble shared by all Kôika programs compiled to C++ using Verilator
  • coq.ml: Coq backend (from Lispy Verilog sources)
  • cpp.ml: C++ backend
  • dot.ml: Graphviz backend
  • gen.ml: Embed resources/* into resources.ml at build time
  • makefile.ml: Makefile backend (to make it easier to generate traces, statistics, models, etc.)
  • rtl.ml: Generic RTL backend
  • verilator.ml: Verilator backend exporting a simple C++ driver
  • verilog.ml: Verilog backend
common/
cuttlebone/
(Interop)
  • Extraction.v: Extraction to OCaml (compiler and utilities)
  • cuttlebone.ml: OCaml wrappers around functionality provided by the library extracted from Coq
frontends/
  • coq.ml: Simple frontend to compile and load OCaml files extracted from Coq
  • lv.ml: Lispy Verilog frontend
  • cuttlec.ml: Command line interface to the compilers
  • interop.ml: Functions to use if compiling Kôika programs straight from Coq, without going through cuttlec
  • koika.ml: Top-level library definition
  • registry.ml: Stub used to load Kôika programs extracted from Coq into cuttlec
tests/
trivial_state_machine.etc/
  • stm.v: Cleaned-up state machine example

koika's People

Contributors

cpitclaudel, ju-sh, math-fehr, namin, stellamplau, threonorm


koika's Issues

About the upcoming module system and Kôika's semantics

I'm interested in learning more about the upcoming module system. What are its main design objectives and how far along is it in its development? I also have one use case in mind that Kôika doesn't quite handle yet, and I wonder how the module system will impact it.

Here is a simplified, abstract example of it. Assume the following:

  • In rule r1, there is a call to function f1 guarded by an if statement. The result of f1 is written to register g.
  • In a separate rule, if g holds the value 1 at the beginning of the cycle, the external rule e is called.
  • The only call to f1 that can possibly occur is the one in r1.

We want to prove the following property: "given schedule s and for all possible external semantics, whenever a call to f1 returns 1, e is called during the next cycle". This is complicated as of today, as all available semantics are register-centric, yet registers do not make an appearance here. More precisely, external calls are not considered for the reasons you detailed in issue #2 and both "TypedSemantics" and "CompactSemantics" erase the causality information that could be used to prove "when x is true, f1 is called and returns 1 and 1 is stored in g". It is of course possible to extract x, that is, the conditions under which f1 is called (remember that it is guarded by an if) and returns 1 (depends on the definition of r1), however in proving that when these are verified, 1 is stored in g, the link to the call to f1 is lost.

A more concrete example is as follows.

The rv example defines an isLegalInstruction function that is used to check whether a given word corresponds to an instruction according to the RISC-V specification. Currently, when an instruction fetched from memory turns out to be invalid, the program counter is set to zero and execution continues. If we wanted the processor to shut down on an invalid instruction instead, we would have to modify RVCore to some extent. As the notion of shutdown is external to Kôika, we would have to rely on an external call to handle it (in fact ext_finish is already available).

We may be interested in proving the following property: "on the first cycle following one on which an invalid instruction was decoded (that is, one on which isLegalInstruction returned false), exactly one external call is issued; it is a call to the shutdown external rule, and no other external calls can happen from this point on".

I understand that the module system will probably have an impact on how external rules (including both those from other Kôika modules and "real" external calls such as ext_finish) are handled, with consequences for the semantics. Will the notion of calls to external rules indeed appear in the semantics? That should solve part of my problem. I guess this might not be the case for calls to rules of Kôika modules, as those could be inlined as far as the logs are concerned.

However, the more general problem of the lack of traceability of the causes of the actions tracked in the logs will likely persist, as it is not really related to the notion of modules, and is somewhat niche besides. The most obvious solution would be to develop yet another semantics with more detailed trace information, together with a proof of equivalence to the existing ones in the situations they both cover. I guess this is the route I will have to follow for the property I want to prove. Does this sound like the way to go, or is there something simpler (maybe simply waiting for the module system)?

Context unknown during type checking

This is a minor issue, but the error message asks for it to be reported :) The following action correctly fails type checking, because the input to the external function has the wrong type. However, Koika fails to discover/print the context in which the type-checking failure occurs. Note that removing the if in the _simple action makes the right context appear.

  Inductive reg_t := foo | bar.

  Definition R r :=
    match r with
    | foo => bits_t 32
    | bar => bits_t 4
    end.

  Definition r idx : R idx :=
    match idx with
    | foo => Bits.zero
    | bar => Bits.zero
    end.

  Inductive ext_fn_t := f1.

  Definition Sigma (fn: ext_fn_t) :=
    match fn with
    | f1 => {$ bits_t 1 ~> bits_t 32 $}
    end.

  Inductive rule_name_t := simple.

  Definition _simple : uaction reg_t ext_fn_t :=
    {{ if |4`d1| == |4`d0| then pass
       else (
        let y := extcall f1 (read0(foo)) in
        write0(foo, y)
        )
    }}.

  Definition simple_device : scheduler :=
    simple |> done.

  Definition rules :=
    tc_rules R Sigma
             (fun r => match r with
                    | simple => _simple
                    end).

Koika status:

## Type error:
(TypeMismatch (bits_t 32) (bits_t 1))

## Context unknown; please report this bug.

Compilation error with the cosimulation example when verilator >= 5.004

I get the following error when I attempt to compile examples/cosimulation.v:

/usr/bin/ld: blackbox.obj_dir.opt/verilated.o: in function `VerilatedContext::threadPoolp()':
verilated.cpp:(.text+0x3212): undefined reference to `VlThreadPool::VlThreadPool(VerilatedContext*, unsigned int)'
collect2: error: ld returned 1 exit status

It is caused by the changes made to Verilator in commit d6126c4b3.
From this version on, examples/_objects/cosimulation.v/blackbox.obj_dir.opt/verilated.o references the VlThreadPool constructor mentioned in the error message. The VlThreadPool class is defined in verilated_threads.cpp, and linking this file manually fixes the issue.

There is no mention of verilated_threads.cpp in any of the dependency files. Maybe this is a Verilator issue? I did not find signs of other people having similar issues.

Potential optimization in the RISC-V example

I noticed that the getFields function of the RISC-V example often gets called just to read a single field (with calls along the lines of let funct3 := get(getFields(inst), funct3)). Since computing all the fields just to extract one of them seemed overkill, I tried replacing getFields with individual getXxx functions (e.g. getFunct3). This provides a decent boost to simulation performance:

-- Running tests with Cuttlesim --
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/rvbench_qsort.rv32 -1
[before] real: 0m0.018s	user: 0m0.017s	sys: 0m0.001s
[after ] real: 0m0.013s	user: 0m0.013s	sys: 0m0.000s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/rvbench_median.rv32 -1
[before] real: 0m0.012s	user: 0m0.012s	sys: 0m0.000s
[after ] real: 0m0.008s	user: 0m0.007s	sys: 0m0.001s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/img.rv32 -1
[before] real: 0m0.216s	user: 0m0.195s	sys: 0m0.020s
[after ] real: 0m0.160s	user: 0m0.142s	sys: 0m0.018s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/primes.rv32 -1
[before] real: 0m4.421s	user: 0m4.417s	sys: 0m0.001s
[after ] real: 0m3.090s	user: 0m3.084s	sys: 0m0.001s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/morse.rv32 -1
[before] real: 0m1.584s	user: 0m1.583s	sys: 0m0.000s
[after ] real: 0m1.086s	user: 0m1.085s	sys: 0m0.000s
-- Running tests with Verilator --
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/rvbench_qsort.vmh -1
[before] real: 0m0.033s	user: 0m0.032s	sys: 0m0.001s
[after ] real: 0m0.030s	user: 0m0.030s	sys: 0m0.000s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/morse.vmh -1
[before] real: 0m2.689s	user: 0m2.687s	sys: 0m0.000s
[after ] real: 0m2.186s	user: 0m2.184s	sys: 0m0.000s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/rvbench_median.vmh -1
[before] real: 0m0.022s	user: 0m0.022s	sys: 0m0.000s
[after ] real: 0m0.018s	user: 0m0.017s	sys: 0m0.001s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/primes.vmh -1
[before] real: 0m8.358s	user: 0m8.349s	sys: 0m0.001s
[after ] real: 0m6.778s	user: 0m6.767s	sys: 0m0.001s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/img.vmh -1
[before] real: 0m0.399s	user: 0m0.385s	sys: 0m0.014s
[after ] real: 0m0.327s	user: 0m0.310s	sys: 0m0.017s

This amounts to a speedup of roughly 1.43× for Cuttlesim and 1.23× for Verilator compared to master. I did not yet check the effects on synthesis, but I might do so soon. I assume that this kind of optimization is applied automatically during synthesis (isn't it?), so I don't expect changes on that side (alas). See commit 04cfdf8 for a demonstration.

External function incorrectly pruned if rule is otherwise uneffectful

In the following example, an external function is called, but its result is bound to drop and never used. The generated Verilog drops the call to total_o completely, even though the call is effectful (we write a value to it). I would guess the logic optimisation pass does not correctly account for external functions.

Full source example. Relevant sections:

  Definition Sigma (fn: ext_fn_t) :=
    match fn with
    | f1 => {$ bits_t 1 ~> bits_t 32 $}
    | do_zero_i => {$ bits_t 1 ~> bits_t 1 $}
    | total_o => {$ bits_t 32 ~> bits_t 1 $} (* <-- This external function *)
    end.

  Definition _write_back : uaction reg_t ext_fn_t :=
    {{ let previous_total := read0(total) in
       let drop := extcall total_o ( previous_total ) in pass
    }}.

  Definition simple_device : scheduler :=
    write_back |> ...

  Definition ext_fn_specs (fn: ext_fn_t) :=
    match fn with
    | f1 => {| efr_name := "f1"; efr_internal := true |}
    | do_zero_i => {| efr_name := "f2"; efr_internal := true |}
    | total_o => {| efr_name := "f3"; efr_internal := true |} (* <--- *)
    end.

Note that the external call is absent from the generated Verilog (this also occurs with efr_internal := false):

// -*- mode: verilog -*-
// Generated by cuttlec from a Kôika module
module simple_device(input wire[0:0] CLK, input wire[0:0] RST_N);
	reg[31:0] total /* verilator public */ = 32'b0;

	wire[0:0] mod_f20_out;
	f2 mod_f20(.CLK(CLK), .RST_N(RST_N), .arg(1'b1), .out(mod_f20_out));
	wire[31:0] mod_f10_out;
	f1 mod_f10(.CLK(CLK), .RST_N(RST_N), .arg(1'b1), .out(mod_f10_out));

	always @(posedge CLK) begin
		if (!RST_N) begin
			total <= 32'b0;
		end else begin
			total <= ~mod_f20_out && ~mod_f20_out ? mod_f10_out + total : (mod_f20_out ? 32'b0 : total);
		end
	end
endmodule

Adding a dummy register and writing a nonconstant value to it generates correct Verilog:

  Definition _write_back : uaction reg_t ext_fn_t :=
    {{ let previous_total := read0(total) in
       let still_dropped := extcall total_o ( previous_total ) in (* <- still dropping value *)
       write0 (dummy, previous_total)                             (* <- but preventing rule being optimised away *) 
    }}.

Translating Koika's bits_t (Vect) to MIT's bbv

Hi,

Is there a particular reason why Koika's bitvectors are designed around the non-inductive type vect? Why not use MIT's bbv?

Context: the Koika implementations I'm working on have to be proven correct against a spec that works with bbv. I've written a few functions to translate between the types (like the one below), but ideally I'd need to define some sort of rewritable equality, I suppose.

Fixpoint bbvword_to_bits_sized {sz} (wd : Word.word sz) : bits sz :=
    match wd with
    | Word.WO => vect_nil
    | @Word.WS hd n tl => {| vhd := hd; vtl := bbvword_to_bits_sized tl |}
    end.

Warning message about Boost version triggers spuriously

In the file ocaml/backends/resources/cuttlesim.hpp (around line 85), the following code:

#define MULTIPRECISION_THRESHOLD 64

#ifdef NEEDS_BOOST_MULTIPRECISION

#if BOOST_VERSION < 106800
// https://github.com/boostorg/multiprecision/commit/bbe819f8034a3c854deffc6191410b91ac27b3d6
// Before 1.68, static_cast<uint16_t>(uint128_t{1 << 16}) gives 65535 instead of 0
#pragma message("Bignum truncation is broken in Boost < 1.68; if you run into issues, try upgrading.")
#endif

#include <boost/multiprecision/cpp_int.hpp>

should be:

#define MULTIPRECISION_THRESHOLD 64

#include <boost/multiprecision/cpp_int.hpp>

#ifdef NEEDS_BOOST_MULTIPRECISION
#if BOOST_VERSION < 106800
// https://github.com/boostorg/multiprecision/commit/bbe819f8034a3c854deffc6191410b91ac27b3d6
// Before 1.68, static_cast<uint16_t>(uint128_t{1 << 16}) gives 65535 instead of 0
#pragma message("Bignum truncation is broken in Boost < 1.68; if you run into issues, try upgrading.")
#endif

Otherwise, the BOOST_VERSION macro is not defined at the time of the check, so the message triggers regardless of Boost's actual version.

Infinite loop

Dear @cpitclaudel ,

We are trying out Koika to run part of an operating system on an FPGA.
In a test case (a.k.a. Example) we are using tc_compute and interpret_action to evaluate an action.
When we do so, we enter an infinite loop.

So far I could only identify two sources where this can happen:

  1. In type class resolution (https://issuehint.com/issue/coq/coq/16216)
  2. In tactics such as repeat (https://stackoverflow.com/questions/53750792/can-composing-try-and-repeat-lead-to-an-infinite-loop-in-coq)

I tried to get some debug output by following the instructions shared in the link above, but to no avail.
There is quite some code in Koika that uses repeat, but I was not sure how to debug this properly.

It is hard to boil this down to a small example that would fit in a bug report.
But if you could possibly find the time, I could share the corresponding repository with you.

Any help would be very much appreciated.

Multiplier correctness example

Hi, I'm new to Coq and I am facing a few issues understanding the code in MultiplierCorrectness.v. I could follow it up to the lemma mul_to_partial_mul, but I don't understand what the rest of the code does. @math-fehr It would be really helpful to know what we are trying to prove in the lemmas enq_preserves_invariant, deq_preserves_invariant, step_preserves_result_invariant, and step_preserves_result_finished_invariant. It would also help beginners like me if a few comments were added describing each lemma.

I am currently working on proving the correctness of different modules of a processor based on the RISC-V architecture, and I would also like to start a discussion (if you are interested) on how to prove the functional and timing correctness of different modules. For instance, I recently wrote Kôika code for the ALU module of the processor, and I was wondering how the functional correctness of this module could be proved with the help of Coq (i.e., what lemmas would be good to prove, since it's not a mathematical problem).

Any sort of idea or suggestion would be really helpful. Thank you :)

Avoidable stalling in the RISC-V example

valid_rs1 and valid_rs2 are never read in the RISC-V example. Checking that the scoreboard entries really correspond to rs1 or rs2, instead of stalling whenever at least one entry is associated with something other than 0, yields better performance:

-- Running tests with Cuttlesim --
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/rvbench_qsort.rv32 -1
  [before] real: 0m0.018s	user: 0m0.017s	sys: 0m0.001s
  [after ] real: 0m0.013s	user: 0m0.013s	sys: 0m0.000s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/rvbench_median.rv32 -1
  [before] real: 0m0.012s	user: 0m0.012s	sys: 0m0.000s
  [after ] real: 0m0.009s	user: 0m0.009s	sys: 0m0.000s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/img.rv32 -1
  [before] real: 0m0.213s	user: 0m0.197s	sys: 0m0.016s
  [after ] real: 0m0.157s	user: 0m0.137s	sys: 0m0.019s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/primes.rv32 -1
  [before] real: 0m4.413s	user: 0m4.407s	sys: 0m0.003s
  [after ] real: 0m3.084s	user: 0m3.080s	sys: 0m0.001s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/morse.rv32 -1
  [before] real: 0m1.583s	user: 0m1.581s	sys: 0m0.000s
  [after ] real: 0m1.142s	user: 0m1.141s	sys: 0m0.000s
-- Running tests with Verilator --
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/rvbench_qsort.vmh -1
  [before] real: 0m0.033s	user: 0m0.033s	sys: 0m0.000s
  [after ] real: 0m0.031s	user: 0m0.030s	sys: 0m0.001s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/morse.vmh -1
  [before] real: 0m2.692s	user: 0m2.689s	sys: 0m0.001s
  [after ] real: 0m2.547s	user: 0m2.544s	sys: 0m0.000s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/rvbench_median.vmh -1
  [before] real: 0m0.022s	user: 0m0.022s	sys: 0m0.000s
  [after ] real: 0m0.021s	user: 0m0.021s	sys: 0m0.000s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/primes.vmh -1
  [before] real: 0m8.383s	user: 0m8.371s	sys: 0m0.002s
  [after ] real: 0m7.892s	user: 0m7.885s	sys: 0m0.001s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/img.vmh -1
  [before] real: 0m0.401s	user: 0m0.388s	sys: 0m0.013s
  [after ] real: 0m0.378s	user: 0m0.361s	sys: 0m0.017s

I kept only the tests that run for more than 2ms to keep this short. Tests take roughly 25% less time with Cuttlesim and 5% less time with Verilator. Of course, your mileage may vary. I did not check the effects on synthesis.

See related commit 66b59e9.

Some circuits that include extcalls behave unexpectedly when simulated through Verilator

The underlying problem seems to be that cuttlec treats extcalls like normal Kôika code when targeting Verilator, ignoring side effects.

For instance, unconditional calls to external modules with side effects are optimized away when the result of the extcall is left unused. For a minimal example of this behavior, see this commit on my fork: I modified the RISC-V example, adding an external module that prints a value to the terminal for both Cuttlesim and Verilator and calling it from the tick function in RVCore.v. make cuttlesim-tests quickly fills the output with messages, while the output of the tests run by make verilator-tests is left unchanged. Commit 9638936 stores the result in a debug register, and in this case the behaviors of Cuttlesim and Verilator are equivalent.

Another instance of this problem can be seen by nesting an extcall inside an if block, in which case the call is sometimes triggered without regard to the condition. See commit 9f1853a for a minimal example of this behavior (once again, the core of the example is in the tick function). Commit 65621bd is a slightly more complex example showing that the effects of calls to write<n> present in the same blocks as the triggered extcalls are ignored. Commit 0e34888 shows that binding the result of the extcall to another debug register somehow fixes that behavior; maybe in that case the optimizer manages to remove the whole block? Commit 65621bd is quite similar to 0e34888, but in this case the condition starts out true and the write inside the if block makes it false for subsequent cycles. Although the value of the extcall is still stored in a register unrelated to the condition, this time the situation seems to be too complex for the optimizer, and Verilator ends up printing a message on each tick. It looks like whenever the extcall can't be entirely removed through optimization, it is simply triggered.

Is this why Maybes are used as arguments to extcalls, with some repetition between the valid field and the conditions already checked in the context of the calls? Couldn't values be passed directly to extcalls, with the Maybes generated automatically for Verilator at compile time based on the context of each call? I guess this is a feature that was never fully integrated and that works well enough for its current purpose, but I am interested in getting it to work in what I feel would be a cleaner way, and would happily contribute to that end.

`unit_t` in ext_fn_t input generates invalid Verilog

Perhaps not conventional usage, but a unit_t (bits_t 0) sized input to an external function currently generates invalid Verilog, e.g.

    Definition Sigma (fn: ext_fn_t) :=
      match fn with
      | f1 => {$ unit_t ~> bits_t 32 $}
      | f2 => {$ unit_t ~> bits_t 1 $}
      end.

Generated Verilog with efr_internal := false:

  // -*- mode: verilog -*-
  // Generated by cuttlec from a Kôika module
  module simple_device(input wire[0:0] CLK, input wire[0:0] RST_N, output wire[-1:0] f1_arg, output wire[-1:0] f2_arg, input wire[31:0] f1_out, input wire[0:0] f2_out);
   reg[31:0] total /* verilator public */ = 32'b0;

   assign f2_arg = 0'b0;
   assign f1_arg = 0'b0;

   always @(posedge CLK) begin
    if (!RST_N) begin
     total <= 32'b0;
    end else begin
     total <= ~f2_out && ~f2_out ? f1_out + total : (f2_out ? 32'b0 : total);
    end
   end
  endmodule

Note that the declarations output wire[-1:0] f1_arg and output wire[-1:0] f2_arg, as well as assign f1_arg = 0'b0; and assign f2_arg = 0'b0;, are invalid, as constants cannot be zero length (IEEE 1800-2012, §5.7 Numbers).

Generated Verilog with efr_internal := true:

// -*- mode: verilog -*-
// Generated by cuttlec from a Kôika module
module simple_device(input wire[0:0] CLK, input wire[0:0] RST_N);
  reg[31:0] total /* verilator public */ = 32'b0;

  wire[0:0] mod_f20_out;
  f2 mod_f20(.CLK(CLK), .RST_N(RST_N), .arg(0'b0), .out(mod_f20_out));
  wire[31:0] mod_f10_out;
  f1 mod_f10(.CLK(CLK), .RST_N(RST_N), .arg(0'b0), .out(mod_f10_out));

  always @(posedge CLK) begin
    if (!RST_N) begin
      total <= 32'b0;
    end else begin
      total <= ~mod_f20_out && ~mod_f20_out ? mod_f10_out + total : (mod_f20_out ? 32'b0 : total);
    end
  end
endmodule

Note that .arg(0'b0) is again a zero-sized constant.

Build issues on macOS

I encountered the following build error:

== Building OCaml library and executables ==
dune build ocaml/cuttlec.exe @install
ocamlopt ocaml/cuttlec.exe
ld: warning: directory not found for option '-L/opt/local/lib'
make: *** No rule to make target examples/_objects/collatz.lv', needed by examples'. Stop.

Did I miss some configuration in my _CoqProject file?
Any advice is welcome. Thanks!

Errors in examples/rv/MultiplierCorrectness.v

Most of the proofs of the lemmas defined in the file examples/rv/MultiplierCorrectness.v do not go through. I did not check them all individually, but most of them seem to be outdated. At least enq_preserves_invariant (no matching clauses for match), deq_preserves_invariant (attempt to save an incomplete proof), step_preserves_finished_invariant, step_preserves_step_invariant, and step_preserves_result_finished_invariant are affected.

This is easy to miss, as no target in the Makefile checks these proofs. It might be useful to add a correctness target to give an easy way to test them, especially since this file contains the most complex examples of proofs on Kôika designs in the examples folder. Not including it by default makes sense, considering that it takes some time to check and is not always directly relevant.

For the record, my version of Coq is 8.11.0.

Outdated information in examples/rv/README.rst

An "FPGA" section has been (relatively) recently added to the main README of the repository, including the information that the RV32I model does not fit on the TinyFPGA BX at the time. However, the README in examples/rv contains some older information that gives the impression that the TinyFPGA BX is supposed to work. Maybe replacing this duplication with a redirection to the "FPGA" section of the main README would be clearer.

I am in fact interested in demonstrating some properties of a slightly modified version of the RV32I example, and I recently got my hands on this FPGA model. Looking for a way to test my version on the device as a change from simulation, I tried to synthesize Kôika's RV32I as a first step, and stumbled upon an error unrelated to the size issue mentioned in the main README (which I was not aware of at the time):

[...]
3.2.1. Analyzing design hierarchy..
ERROR: Module `\usb_uart_i40' referenced in module `\top_ice40_usb' in cell `\uart' is not part of the design.
make: *** [Makefile:269: top_ice40_usb.json] Error 1

I noticed that this file exists in the tinyfpga_bx_usbserial repository and spotted an issue seemingly related to Kôika there, which leads me to assume there is some undocumented dependency on this project, but I did not dig further.

If the breakage did not happen too far back, I would be interested in trying to make my version run on the FPGA. Could you possibly recommend an older Kôika commit with which synthesis should work? Also, do you think that removing the multiplication module on my end would be enough to fix the size issue, or is this hard to say?

Different behavior in simulation and FPGA

When synthesizing the RV32E example with the unit/led program on my TinyFPGA BX, the LED stays off.
On the other hand, Cuttlesim and Verilator both behave as I expect them to following the definition of the test.

I tried the following things independently from each other:

  • replacing
if (led_wr_valid)
  led <= led_wr_data;

with

if (!led_wr_valid)
  led <= !led_wr_data;

in top_uart.v results in the LED turning (and staying) on (FPGA only; simulation is not impacted, although making the same change to top.v does invert Verilator's behavior);

  • replacing both may_run and on by Ob~1 in the extcall to ext_led in RVCore.v results in the LED turning (and staying) on, as should be expected (FPGA and simulation);
  • replacing the contents of led.c with a simpler infinite loop containing only a call to putled(1) does not fix anything: the LED stays off, and simulation spams the output with the "☀" symbol.

Do you observe the same thing on your side? I don't have access to an ULX3S-85k at the moment; does this test behave any differently there? Are other tests featuring extcalls impacted?

I'm quite surprised by Verilator and synthesis disagreeing.

Vectors and name collisions

In Kôika projects that include vectors, obtaining a name collision is trivial: simply add a register named <vector_name>_0. This is due to the way names are built through concatenation in coq/DerivingShow.v: the generated names are themselves valid Coq identifiers. For a demonstration, see commit 6e2010a on my fork, where a register named rData_0 has been added to examples/vector.v and g++ doesn't appreciate that.

There are several possible workarounds; the simplest I can think of is to use the $ character instead of _ in the concatenation phase. This works because the C++, Verilog, and DOT backends all accept this character in register names, whereas Coq doesn't allow it in identifiers: $ is accepted by common C++ compilers and by the Verilog standard's definition of an identifier, and register names are printed between quotes in .dot files. See commit c11c463 for a demonstration of this possible fix (examples/vector.v has not changed since the previous commit, yet it now compiles).

If support for $ in identifiers is to be added in some future Coq release, then this character should be considered reserved by Kôika. If another backend without support for $ in identifiers is to be added to Kôika in a future release, then there wouldn't be much of a problem. What matters is that Kôika outputs unique names, and these can always be adapted from cuttlec to suit the specifics of given backends.

Some other related but less problematic issues exist. For instance, there is no upper bound on the length of Coq identifiers, yet the same cannot be said of Verilog, where the lowest maximum length tolerated by the standard is 1024. On a similar note, Coq is in general much more tolerant than Verilog when it comes to the characters allowed in identifiers: accents (let alone non-breaking spaces) are not accepted in a Verilog identifier. Writing Kôika code that typechecks in Coq yet results in an error further down the line is therefore possible, and even easy. As Kôika outputs unique names, this kind of problem can always be managed from cuttlec; in practice it is easy to deal with, and the solution is always some register renamings away.

The context of this question is that I am looking for a way to generate Verilog through Kôika such that BRAM blocks are used through inference for vectors past a certain size. Assuming that registers stored in BRAM don't behave differently from registers stored in flip-flops, this shouldn't violate any of Kôika's expectations. It could for instance be interesting to model the memory of the RISC-V example directly in Kôika instead of relying on the current workarounds.

However, it seems to be possible to define a vector of registers of varying sizes through Vect as follows:

Definition R r :=
  match r with
  | rData n => bits_t (index_to_nat n)
  end.

This does not correspond to a vector in the traditional sense, and it would in particular not be suitable for BRAM storage. Is this behavior desired, or just a natural consequence of the way registers are managed? Why is reg_t defined directly through Inductive rather than through a custom construct?

There is a link between the previous question and the "related but less problematic issues" mentioned above: a way of solving them would be to restrict the accepted register names by avoiding direct reliance on Inductive for their definition (and maybe likewise on Definition for R and such). Furthermore, not treating registers and vectors uniformly may prove practical for my purpose. In fact, limiting the definition of reg_t to raw registers and vectors (in the traditional sense of the word) covers all the use cases of Kôika I can think of (registers alone would suffice, but vectors make life much easier). Am I missing situations where a full-blown inductive type is useful?

I think this digression matters as quite a lot of systems that would be interesting to model using Kôika have some notion of memory, which is best represented using vectors, which are in turn best stored in BRAM when targeting FPGAs.

Building error (Ocaml 4.11.1)

Hello, I'm trying to build Koika with OCaml 4.11.1 and I get the following error log when running make:

== Building Coq library ==
dune build @@coq/all
File "./coq/FiniteType.v", line 44, characters 4-49:
Warning: omega is deprecated since 8.12; use “lia” instead.
[omega-is-deprecated,deprecated]
File "./coq/FiniteType.v", line 44, characters 4-49:
Warning: omega is deprecated since 8.12; use “lia” instead.
[omega-is-deprecated,deprecated]
File "./coq/FiniteType.v", line 45, characters 4-58:
Warning: omega is deprecated since 8.12; use “lia” instead.
[omega-is-deprecated,deprecated]
File "./coq/FiniteType.v", line 52, characters 4-10:
Warning: omega is deprecated since 8.12; use “lia” instead.
[omega-is-deprecated,deprecated]
File "./coq/FiniteType.v", line 158, characters 0-67:
Warning: The default value for hint locality is currently "local" in a
section and "global" otherwise, but is scheduled to change in a future
release. For the time being, adding hints outside of sections without
specifying an explicit locality is therefore deprecated. It is recommended to
use "export" whenever possible. [deprecated-hint-without-locality,deprecated]
File "./coq/DeriveShow.v", line 106, characters 2-73:
Warning: The default value for hint locality is currently "local" in a
section and "global" otherwise, but is scheduled to change in a future
release. For the time being, adding hints outside of sections without
specifying an explicit locality is therefore deprecated. It is recommended to
use "export" whenever possible. [deprecated-hint-without-locality,deprecated]
File "./coq/EqDec.v", line 24, characters 2-3:
Error: [Focus] Wrong bullet -: Current bullet - is not finished.

make: *** [Makefile:17: coq] Error 1

The README says that the OCaml version should lie between 4.07 and 4.09. Is that a strict requirement, and is this error a consequence of violating it?

Proof failures with (unsupported) Coq 8.12

Some changes in Coq 8.12 tactics break proofs in a few places.

  • firstorder is now weaker (related Coq change coq/coq#11760); firstorder with bool works in the failing cases here
  • lia seems to be weaker, failing at coq/PrimitiveProperties.v:354 and 357, the first being:
  353         * f_equal.
  354           rewrite N.mul_mod_distr_l by (destruct sz; cbn; lia).  
  355           reflexivity.

This can be fixed by an intermediate remember:

  353         * f_equal.               
  354           rewrite N.mul_mod_distr_l ; (destruct sz; cbn; try lia).  
  355           remember (2^(Pos.of_succ_nat sz))%positive; lia.

This seems odd to me (I can't find a relevant changelog entry for lia); does the Koika team think this is a Coq issue?
