cr1901 / sentinel

Another size-optimized RISC-V CPU for your consideration.

License: BSD 2-Clause "Simplified" License

Python 76.88% Assembly 10.99% SystemVerilog 2.46% C 6.29% Rust 3.25% RPC 0.13%

sentinel's Introduction

`sentinel`

Sentinel is a small RISC-V CPU (RV32I_Zicsr) written in Amaranth. It implements the Machine Mode privileged spec, and is designed to fit into ~1000 4-input LUTs or less on an FPGA. It is a good candidate for control tasks where a programmable state machine or custom size-tailored core would otherwise be used.

Unlike most RISC-V implementations, Sentinel is microcoded, not pipelined. Instructions require multiple clock cycles to execute. Sentinel is therefore not necessarily a good fit for applications where high throughput/IPC is required. See below.

Sentinel has been tested against RISC-V Formal and the RISCOF frameworks, and passes both. Once I have added a few extra tests, the core can be considered correct with respect to the RISC-V Formal model. The core is also probably correct with respect to the SAIL golden model.

Why The Name `sentinel`?

I've liked the way the word "sentinel" sounds ever since I first learned of the word, either from the title of a book on NJ lighthouses, or from an enemy in an old Sega Genesis RPG. The term has always stuck with me since then, albeit in a much more positive light than "the soldier golems of the forces of Darkness" :). Since "sentinel" means "one who stands watch", I think it's an apt name for a CPU intended to watch over the rest of your silicon, but otherwise stay out of the way. Also, since lighthouses are indeed "Sentinels Of The Shore", I wanted to shoehorn a lighthouse into the logo :).

Getting Started

Sentinel uses:

PDM as its package/dependency manager, and to orchestrate all things you can do with this repo.
m5meta microcode assembler, without which Sentinel would optimize down to ~0 LUTs :).
yosys and nextpnr for size-benchmarking. The user must provide these.
pytest for basic/regression testing.
DoIt as a lower-level dependency-graph aware task orchestrator (called from pdm).
riscv-tests, an older set of unit tests for RISC-V processors. Running these is done via pytest.
RISC-V Formal to verify that desirable properties of Sentinel (such as "instructions write the correct destination") hold for all possible inputs over a bounded number of clock cycles after reset.
RISCOF, the unit test framework that is maintained by RISC-V International themselves. This appears to have originally been derived from the riscv-tests, but is much more comprehensive.

The latter five are only required for development. Additionally for development, a user must provide:

riscv64-unknown-elf-gcc to compile tests from riscv-tests (I'm not sure what the correct way to install the compiler is nowadays; I use 8.3.0).
SymbiYosys, a driver program for RISC-V Formal.
Boolector, the SMT Solver that RISC-V Formal uses.

RISCOF also requires the SAIL RISC-V emulator. This is a pain to compile, so I provide a Linux binary (and eventually Windows if I can get OCaml to behave long enough. I used to be able to install it just fine :'D!).

A user must first run the following before anything else:

pdm install -G dev -G examples

Use In Amaranth Code

I expect most users to only need to import from sentinel.top. The top-level module of the Sentinel CPU is appropriately named Top:

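from amaranth import Elaboratable, Module
from sentinel.top import Top

class MySoC(Elaboratable):
    def __init__(self):
        # Instantiate the Sentinel CPU; memory and peripherals would follow.
        self.cpu = Top()
        ...

    def elaborate(self, plat):
        m = Module()
        # Register the CPU as a submodule so it is included in the design.
        m.submodules.cpu = self.cpu
        ...
        return m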

Top exposes a Wishbone Classic bus and an irq input pin as the interface to all other modules in an FPGA design. Of course, Top also has clk and rst lines, which belong to the sync clock domain rather than being directly exposed in Top's Signature. sync is the only clock domain that Sentinel uses.

See the AttoSoC class in examples/attosoc.py for a full working example. A working demo can be generated from this example, as explained below.

Generate A Verilog Core

Using `pdm`/`pyproject.toml` From This Package

This command will generate a core with a Wishbone Classic bus, and clk, rst, and irq input pins (as mentioned above, Sentinel uses a single clock domain):

pdm gen > sentinel.v

On reset, Sentinel begins execution at address `0`. See the CSR section for information on exception handling (including interrupts).

The Wishbone bus uses a block xfer to do a back-to-back memory write and instruction fetch. Otherwise, the Wishbone bus will deassert CYC/STB the cycle after receipt of ACK. I may need to interface to IP that can't handle block cycles, so I will probably relax the block cycle requirement in the future via an option.

For help, run:

pdm gen -h

From An Installed Package/As A Dependency

If using Sentinel as an installed package, the previous section still applies, except the command is now:

[pdm run] python -m sentinel.gen

If you're using pdm to handle Python dependencies in e.g. a mixed Python/Verilog project, and Sentinel is one of those Python dependencies, you may wish to use scripts to provide a shortcut for Verilog generation in your pyproject.toml (call = "python -m sentinel.gen" does not work!):

[tool.pdm.scripts]
gen = { call = "sentinel.gen:generate", help="generate Sentinel Verilog file" }

Generate A Demo Bitstream For Lattice iCEstick

pdm demo

For help, run:

pdm demo -h

Run Tests

pdm test

pdm test-quick

The above will invoke pytest and test Sentinel against handcrafted examples, as well as the riscv-tests repo binaries. See the README.md in tests/upstream for information on how to refresh the binaries.

Right now (11/5/2023), the difference between test and test-quick is minimal.

Run RISC-V Formal Flow

pdm rvformal-all [-n num_cores]

pdm rvformal test-name

See README.md in tests/formal for more information, including valid/available test names.

Run RISCOF Flow

pdm riscof-all

pdm riscof-override /path/to/test_list.yaml

See README.md in tests/riscof for more information.

List `pdm` Scripts And `doit` Tasks

pdm run --list

pdm doit list [--all] [task]

doit tasks are documented as a courtesy, and to make sure developers/users don't get stuck. I am unsure about doit tasks' stability, so prefer running pdm as a wrapper to doit rather than running doit directly.

Block Diagram

Instruction Cycle Counts

TODO. I need to create a test that gets latency and throughput for each instruction type of the core. Some general observations (as of 11/18/2023), from examining the microcode:

There is room for improvement, even without making the core bigger.
Fetch/Decode takes a minimum of two cycles thanks to Wishbone Classic's REQ/ACK handshake taking two cycles.
- When Wishbone ACK is asserted, Decode is taking place.
- The GP register file has a single synchronous read port and a single write port. Sentinel loads RS1 out of the register file during Decode.
All instructions share the same operation the cycle after ACK/Decode:
- Check for exceptions/interrupts, go to exception handler if so.
- Latch RS1 into the ALU.
- Load RS2 out of the register file, in anticipation of a "simple" instruction.
- Jump to the instruction-specific microcode block.
At minimum, an instruction (addi, or, etc) takes 3 cycles to retire after the initial shared cycles. This means Sentinel instructions have a minimum latency of 6 cycles per instruction (CPI).
Sentinel instructions have a maximum throughput of 4 CPI by overlapping the 2 Fetch/Decode cycles of the next instruction after the initial 3 shared cycles of the current instruction when possible ("pipelining").
- Some instructions overlap one of the Fetch/Decode cycles, some don't overlap either of them. In particular, shift instructions with a nonzero shift count don't pipeline Fetch/Decode. It may be possible to always overlap at least one cycle, but I haven't tweaked the core yet to ensure this is a sound optimization.
Shift instructions need work:
- For a shift of zero, shift-immediate latency is 10 CPI, throughput 9 CPI. Shift-register latency is 11 CPI, throughput 10 CPI.
- For a shift of nonzero n, shift-immediate and shift-register latency and throughput are both 7 + 2*n CPI.
Branch-not-taken latency and throughput is 7 CPI. Branch-taken latency and throughput is 8 CPI.
JAL/JALR latency is 9 CPI, throughput is 7 CPI.
Store latency and throughput is 8 CPI minimum. 2 cycles minimum are spent waiting for Wishbone ACK.
- The core will not release STB/CYC between the store and fetch of the next instruction.
Load latency is 10 CPI minimum, and throughput is 9 CPI. 2 cycles minimum are spent waiting for Wishbone ACK.
- The core will release STB/CYC before fetch of the next instruction.
CSR instructions require an extra Decode cycle compared to all other instructions (to check for legality).
- At minimum, a read of a read-only zero CSR register has a latency of 7 CPI, and a throughput of 6 CPI.
- At maximum, csrrc has a latency of 11 CPI, and a throughput of 10 CPI.
Entering an exception handler requires 5 clocks from the cycle at which the exception condition is detected.
- mret has a latency and throughput of 8 CPI.
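
The figures listed above can be collected into a rough cycle estimator. This is only a sketch built from the observations in this section, not a measurement of the core: the table keys, `shift_cycles`, and `estimate_cycles` are hypothetical names, the "simple" entry assumes the best-case overlap of Fetch/Decode, and the load/store entries assume Wishbone ACK arrives without extra wait states.

```python
# (latency, throughput) in clock cycles, transcribed from the notes above.
# Keys are illustrative names, not identifiers used by Sentinel itself.
FIXED_CYCLES = {
    "simple": (6, 4),            # addi, or, etc.; throughput is the best case
    "branch_not_taken": (7, 7),
    "branch_taken": (8, 8),
    "jump": (9, 7),              # JAL/JALR
    "store": (8, 8),             # minimum
    "load": (10, 9),             # minimum
    "mret": (8, 8),
}


def shift_cycles(n, register=False):
    """Return (latency, throughput) for a shift by n bits (0..31 on RV32)."""
    if not 0 <= n <= 31:
        raise ValueError("shift count must be in 0..31")
    if n == 0:
        return (11, 10) if register else (10, 9)
    cycles = 7 + 2 * n           # nonzero shifts do not overlap Fetch/Decode
    return (cycles, cycles)


def estimate_cycles(trace):
    """Sum throughput cycles over a trace of (kind, *args) tuples."""
    total = 0
    for kind, *args in trace:
        if kind == "shift":
            total += shift_cycles(*args)[1]
        else:
            total += FIXED_CYCLES[kind][1]
    return total


# One simple ALU op, a shift by 4, and a taken branch: 4 + 15 + 8 cycles.
print(estimate_cycles([("simple",), ("shift", 4), ("branch_taken",)]))  # 27
```

Because the simple-instruction and memory figures are best cases, the result is closer to a lower bound than a prediction.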

CSRs

Sentinel physically implements the following CSRs:

mscratch
mcause
- The core can only physically trigger a subset of defined exceptions:
  - Machine external interrupt
  - Instruction access misaligned
  - Illegal instruction
  - Breakpoint
  - Load address misaligned
  - Store address misaligned
  - Environment call from M-mode
  In particular worth noting:
  - Misaligned accesses are not implemented in hardware.
  - There is no machine timer (a 64-bit counter is a bit too much to ask for right now :(...).
mip
- Only the MEIP bit is implemented. The RISC-V Privileged Spec says:
  
  MEIP is read-only in mip, and is set and cleared by a platform-specific interrupt controller.
  
  The user must provide their own interrupt controller. One simple implementation is to OR all external interrupt sources together, and query each peripheral when MEIP is pending to find which peripherals need attention. This is implemented for the serial and timer peripherals in the attosoc example.
  
  In the future, I may implement the high (platform-specific) 16 bits of mip/mie to make interrupt handling quicker.
mie
- Only the MEIE bit is implemented.
mstatus
- Only the MPP, MPIE, and MIE bits are implemented.
mtvec
- The BASE is writeable; only the Direct MODE setting is implemented.
mepc
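
The "OR all sources together, then poll" scheme described under mip can be modelled in plain Python. This is a software sketch of the idea, not the attosoc implementation: `Peripheral`, `meip`, and `handle_trap` are hypothetical names. The only facts taken from outside this README are the standard RV32 mcause encoding for a machine external interrupt (interrupt bit 31 set, exception code 11).

```python
MCAUSE_INTERRUPT = 1 << 31       # top bit of mcause on RV32 marks an interrupt
MCAUSE_MACHINE_EXTERNAL = 11     # exception code: machine external interrupt


class Peripheral:
    """Hypothetical peripheral exposing a single pending-interrupt flag."""

    def __init__(self, name):
        self.name = name
        self.pending = False

    def service(self):
        # A real handler would read status registers here; we just clear it.
        self.pending = False


def meip(peripherals):
    """Value the core would see on MEIP: the OR of all pending flags."""
    return any(p.pending for p in peripherals)


def handle_trap(mcause, peripherals):
    """Poll every peripheral if the trap is a machine external interrupt."""
    if mcause != (MCAUSE_INTERRUPT | MCAUSE_MACHINE_EXTERNAL):
        return []                # some other exception; not handled here
    serviced = []
    for p in peripherals:
        if p.pending:
            p.service()
            serviced.append(p.name)
    return serviced


serial, timer = Peripheral("serial"), Peripheral("timer")
timer.pending = True
print(meip([serial, timer]))                     # True
print(handle_trap(0x8000000B, [serial, timer]))  # ['timer']
print(meip([serial, timer]))                     # False
```

Polling costs a few loads per peripheral on every interrupt, which is the overhead that implementing the platform-specific mip/mie bits would avoid.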

Additionally, the following CSRs are implemented as read-only zero (only the first 5 of the below registers trigger an exception on an attempt to write):

mvendorid
marchid
mimpid
mhartid
mconfigptr
misa
mstatush
mcountinhibit
mtval
mcycle
minstret
mhpmcounter3-31
mhpmevent3-31

All remaining machine-mode CSRs are unimplemented and trigger an exception on any access:

medeleg
mideleg
mcounteren
mtinst
mtval2
menvcfg
menvcfgh
mseccfg
mseccfgh
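
The three CSR groups above can be summarised as a small lookup. This is a sketch for reasoning about expected behaviour (for example when writing tests), not code from Sentinel: the set names and `csr_access` are hypothetical, and the "ignored" result for writes to the non-trapping read-only-zero registers is my reading of the text above rather than something it states outright.

```python
# CSRs that are physically implemented (possibly with only some bits).
IMPLEMENTED = {"mscratch", "mcause", "mip", "mie", "mstatus", "mtvec", "mepc"}

# Read-only zero; an attempt to write triggers an exception.
RO_ZERO_WRITE_TRAPS = {"mvendorid", "marchid", "mimpid", "mhartid",
                       "mconfigptr"}

# Read-only zero; a write does not trigger an exception.
RO_ZERO_NO_TRAP = {"misa", "mstatush", "mcountinhibit", "mtval",
                   "mcycle", "minstret"}
RO_ZERO_NO_TRAP |= {f"mhpmcounter{i}" for i in range(3, 32)}
RO_ZERO_NO_TRAP |= {f"mhpmevent{i}" for i in range(3, 32)}


def csr_access(name, write=False):
    """Classify an access as 'ok', 'zero', 'ignored', or 'exception'."""
    if name in IMPLEMENTED:
        return "ok"
    if name in RO_ZERO_WRITE_TRAPS:
        return "exception" if write else "zero"
    if name in RO_ZERO_NO_TRAP:
        return "ignored" if write else "zero"
    # Every other machine-mode CSR is unimplemented and traps on any access.
    return "exception"


print(csr_access("mhartid"))              # zero
print(csr_access("mhartid", write=True))  # exception
print(csr_access("mcycle", write=True))   # ignored
print(csr_access("medeleg"))              # exception
```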

sentinel's Issues

Obscure `ResourceWarning` in `pytest` tests

William@DESKTOP-3H1DSBV MINGW64 ~/Projects/FPGA/amaranth/sentinel
$ pdm test -k "top or ucode or witness" 2>&1 | grep -v UnusedElab
============================= test session starts =============================
platform win32 -- Python 3.11.8, pytest-8.1.1, pluggy-1.4.0
rootdir: C:/msys64/home/William/Projects/FPGA/amaranth/sentinel
configfile: pyproject.toml
collected 63 items / 53 deselected / 10 selected

tests/sim/test_top.py .....s                                             [ 60%]
tests/sim/test_ucode.py .ss                                              [ 90%]
tests/sim/test_witness.py .                                              [100%]

============================== warnings summary ===============================
tests/sim/test_witness.py::test_csrrc_bad_rd
    m = Module()
  Enable tracemalloc to get traceback where the object was allocated.
  See https://docs.pytest.org/en/stable/how-to/capture-warnings.html#resource-warnings for more info.

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========== 7 passed, 3 skipped, 53 deselected, 1 warning in 16.89s ===========

William@DESKTOP-3H1DSBV MINGW64 ~/Projects/FPGA/amaranth/sentinel
$ git rev-parse --short HEAD
80eb813

Minor inconvenience at worst. Open an issue and see if I can figure out what causes it later.

The `Zcsr` in the document should be `Zicsr`

https://five-embeddev.com/riscv-isa-manual/latest/csr.html

Sentinel Demo does not consistently fit into 1280 LUTs, depending on toolchain

"Works On My Machine" Is Not Good Enough

On a Windows and Linux machine I own, I compile yosys from source using gcc:

Yosys 0.35+39 (git sha1 031ad38b5, sccache x86_64-w64-mingw32-g++ 13.2.0 -Os)

Going by Makefile, ABC revision is default for this commit: ABCREV = 896e5e7

Final statistics from yosys/abc look something like this:

=== top ===

   Number of wires:               1402
   Number of wire bits:           7955
   Number of public wires:        1402
   Number of public wire bits:    7955
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:               1719
     SB_CARRY                      138
     SB_DFF                          1
     SB_DFFE                        44
     SB_DFFESR                     290
     SB_DFFSR                       84
     SB_GB_IO                        1
     SB_IO                          16
     SB_LUT4                      1131
     SB_RAM40_4K                    14

Packing stats from nextpnr-ice40 are:

Info: Device utilisation:
Info: 	         ICESTORM_LC:  1276/ 1280    99%
Info: 	        ICESTORM_RAM:    14/   16    87%
Info: 	               SB_IO:    17/  112    15%
Info: 	               SB_GB:     7/    8    87%
Info: 	        ICESTORM_PLL:     0/    1     0%
Info: 	         SB_WARMBOOT:     0/    1     0%

CI Demo Breaks

Yosys 0.35+39 (git sha1 031ad38b5, clang 10.0.0-4ubuntu1 -fPIC -Os)

Going by CI build, abc revision is default for this commit: 896e5e7dedf9b9b1459fa019f1fa8aa8101fdf43. This means that yosys and abc are the same revision as on my machine.

Final statistics from CI emulated inside a container look like this:

=== top ===

   Number of wires:               1419
   Number of wire bits:           8021
   Number of public wires:        1419
   Number of public wire bits:    8021
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:               1729
     SB_CARRY                      138
     SB_DFF                          1
     SB_DFFE                        44
     SB_DFFESR                     290
     SB_DFFSR                       84
     SB_GB_IO                        1
     SB_IO                          16
     SB_LUT4                      1141
     SB_RAM40_4K                    14

nextpnr-ice40 then fails to pack the design:

Info: Device utilisation:
Info:            ICESTORM_LC:  1287/ 1280   100%
Info:           ICESTORM_RAM:    14/   16    87%
Info:                  SB_IO:    17/  112    15%
Info:                  SB_GB:     7/    8    87%
Info:           ICESTORM_PLL:     0/    1     0%
Info:            SB_WARMBOOT:     0/    1     0%

Info: Placed 18 cells based on constraints.
Info: Creating initial analytic placement for 1161 cells, random placement wirelen = 17758.
Info:     at initial placer iter 0, wirelen = 433
Info:     at initial placer iter 1, wirelen = 394
Info:     at initial placer iter 2, wirelen = 394
Info:     at initial placer iter 3, wirelen = 397
Info: Running main analytical placer, max placement attempts per cell = 219453.
ERROR: Failed to expand region (0, 0) |_> (13, 17) of 1287 ICESTORM_LCs
0 warnings, 1 error

Clearly, it is possible to make my demo as-is fit into 1280 LUTs on an iCEstick, and working at that. However, the result is not consistent; it depends on the toolchain used to compile yosys and abc. It looks like I'm stress-testing the tools' ability to optimize, and running too close to the ragged edge of disaster.
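
To track how close a build is to the limit, the "Device utilisation" lines from the nextpnr-ice40 logs can be parsed. This is a sketch based only on the log format quoted in this issue; `parse_utilisation` and `headroom` are hypothetical helpers, not part of Sentinel or nextpnr.

```python
import re

# Matches lines such as "Info:   ICESTORM_LC:  1276/ 1280    99%".
UTIL_RE = re.compile(r"Info:\s+(\w+):\s*(\d+)/\s*(\d+)\s+\d+%")


def parse_utilisation(log_text):
    """Return {resource: (used, available)} for each utilisation line."""
    return {m.group(1): (int(m.group(2)), int(m.group(3)))
            for m in UTIL_RE.finditer(log_text)}


def headroom(log_text, resource="ICESTORM_LC"):
    """Cells left over for a resource; negative means the design overflows."""
    used, available = parse_utilisation(log_text)[resource]
    return available - used


# The two ICESTORM_LC lines quoted above (local build, then CI build).
LOCAL_LOG = "Info: \t         ICESTORM_LC:  1276/ 1280    99%\n"
CI_LOG = "Info:            ICESTORM_LC:  1287/ 1280   100%\n"

print(headroom(LOCAL_LOG))  # 4
print(headroom(CI_LOG))     # -7
```

A CI step could fail (or merely warn) when the headroom drops below some chosen margin, instead of waiting for the placer to error out.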

Hints

Log from my Linux machine: top-linux.txt
Log from container with OSS Cad Suite: top-container.txt

The logs (aside from paths and the compiler string) are pretty much identical until the abc9 step, which creates 10 more LUTs with the OSS Cad Suite yosys/abc than the outside-of-container yosys/abc I compiled on my machine yesterday.

TODO

Figure out why clang and gcc abc are diverging, if possible.
Look for low-hanging fruit left in my Sentinel design to further reduce LUT usage to make the design consistently fit. I need a bit more breathing room it seems.

For now, allow continue-on-error in CI, as long as other tests pass.

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Pending Approval

These branches will be created by Renovate only once you click their checkbox below.

Update all dependencies (main) (critical-section, paramiko, portable-atomic)
Update all dependencies (next) (critical-section, paramiko, portable-atomic)
🔐 Create all pending approval PRs at once 🔐

Detected dependencies

Branch main

cargo

sentinel-rt/Cargo.toml

riscv 0.11.1

riscv-rt 0.12.2

critical-section 1.1.2

heapless 0.8.0

once_cell 1.19.0

panic-halt 0.2.0

portable-atomic 1.6.0

github-actions

.github/workflows/ci.yml

actions/checkout v4

actions/setup-python v5

pdm-project/setup-pdm v4

actions/cache v4

actions/checkout v4

actions/setup-python v5

pdm-project/setup-pdm v4

actions/cache v4

actions/checkout v4

actions/setup-python v5

pdm-project/setup-pdm v4

actions/cache v4

.github/workflows/update-yosys.yml

actions/checkout v4

peter-evans/create-pull-request v6

pep621

pyproject.toml

m5meta >=1.0.4

m5pre >=1.0.3

bronzebeard >=0.2.1

flake8 >=6.1.0

pytest >=7.4.2

doit >=0.36.0

verilog-vcd >=1.11

click >=8.1.7

bronzebeard >=0.2.1

tabulate >=0.9.0

pyelftools >=0.26

riscof >=1.25.3

paramiko ~=2.7

Branch next

cargo

sentinel-rt/Cargo.toml

riscv 0.11.1

riscv-rt 0.12.2

critical-section 1.1.2

heapless 0.8.0

once_cell 1.19.0

panic-halt 0.2.0

portable-atomic 1.6.0

github-actions

.github/workflows/ci.yml

actions/checkout v4

actions/setup-python v5

pdm-project/setup-pdm v4

actions/cache v4

actions/checkout v4

actions/setup-python v5

pdm-project/setup-pdm v4

actions/cache v4

actions/checkout v4

actions/setup-python v5

pdm-project/setup-pdm v4

actions/cache v4

.github/workflows/update-yosys.yml

actions/checkout v4

peter-evans/create-pull-request v6

pep621

pyproject.toml

m5meta >=1.0.4

m5pre >=1.0.3

bronzebeard >=0.2.1

pytest >=7.4.2

doit >=0.36.0

verilog-vcd >=1.11

click >=8.1.7

bronzebeard >=0.2.1

tabulate >=0.9.0

pyelftools >=0.26

riscof >=1.25.3

paramiko ~=2.7

ruff >=0.3.2

flake8 >=7.0.0

pydoclint >=0.4.1

