
xcrypto's Introduction

XCrypto: a cryptographic ISE for RISC-V


Acting as a component part of the wider SCARV project, XCrypto is a general-purpose Instruction Set Extension (ISE) for RISC-V that supports software-based cryptographic workloads.

Overview

A given cryptographic workload is commonly expected to satisfy a challenging and diverse range of traditional design metrics, including some combination of high-throughput, low-latency, low-footprint, power-efficiency, and high-assurance, while executing in what is potentially an adversarial environment. A large design space of options can be drawn from when developing a concrete implementation: these options span a spectrum, between those entirely based on hardware (e.g., a dedicated IP core) and those entirely based on software. ISEs can be viewed as representing a hybrid option, in the sense they alter a general-purpose processor core with special-purpose hardware and associated instructions; such targeted alterations then help to improve a software-based implementation wrt. some design metric (e.g., latency).

As an ISE, we pitch XCrypto as a solution (vs. the solution) within the wider design space of options. For example, it offers an alternative to the solution being proposed by the RISC-V cryptography extensions group (see, e.g., their presentation: the design extends the RISC-V vector ISE). The idea is to leverage extensive existing literature and hence experience wrt. cryptographic ISEs (see, e.g., published work at the CHES conference), translating and applying it to RISC-V. Although potentially less performant than alternatives, we expect implementations using XCrypto to be more lightweight and flexible; as a result, we view it as representing an attractive solution in the context of micro-controller class cores.

Organisation

├── bin                    - scripts (e.g., environment configuration)
├── build                  - working directory for build
├── doc                    - documentation
├── extern                 - external resources (e.g., submodules)
│   ├── libscarv             - submodule: scarv/libscarv
│   ├── riscv-opcodes        - submodule: scarv/riscv-opcodes
│   ├── texmf                - submodule: scarv/texmf
│   └── wiki                 - submodule: scarv/xcrypto.wiki
├── pdf                    - PDFs, e.g., presentation slides
├── rtl                    - source code for re-usable hardware modules
└── src
    ├── docker             - source code for containers
    ├── helloworld         - source code for example program
    ├── test               - source code for test    program(s)
    └── toolchain          - source code for tool-chain

Note that:

  • ${REPO_HOME}/doc houses the XCrypto specification: this document captures the ISE itself, acting as both a) a definition of additional architectural state (e.g., register file and CSRs) and instructions (i.e., their semantics and encoding), and b) a design document. Pre-built versions accompany each release of XCrypto.

  • ${REPO_HOME}/rtl houses a library of re-usable hardware components (e.g., for arithmetic operations), which could be used in an implementation of XCrypto.

  • Per the above, the content of this repository is non-specific to an implementation of XCrypto within any given processor core. That said, the associated repository scarv/scarv specifically houses such an implementation: the SCARV processor core (and associated SoC) offer an integrated implementation of components from the entire SCARV project, XCrypto included.

Quickstart (with more detail in the wiki)

  1. Execute

    git clone https://github.com/scarv/xcrypto.git ./xcrypto
    cd ./xcrypto
    git submodule update --init --recursive
    source ./bin/conf.sh

    to clone and initialise the repository, then configure the environment; for example, you should find that the REPO_HOME environment variable is set appropriately.

  2. Use targets in the top-level Makefile to drive a set of common tasks, e.g.,

    Command                 Description
    make build-doc          build the LaTeX-based documentation
    make clone-toolchain    clone the tool-chain
    make build-toolchain    build the tool-chain
    make doxygen            build the Doxygen-based documentation
    make spotless           remove everything built in ${REPO_HOME}/build

Questions?

Publications and presentations

Acknowledgements

This work has been supported in part by EPSRC via grant EP/R012288/1 (under the RISE programme).

xcrypto's People

Contributors

ben-marshall, danpage, flaviens, justincormack, phthinh, ubfx


xcrypto's Issues

Equivalent RISC-V instruction listing

This might be a lot of work, but might it be illustrative to include the equivalent "standard" RISC-V instruction listing next to the XCrypto one?

Spec: Split Crypto CSR register (mccr) into two registers.

  • Currently the register contains read-only and read-write bits.
  • Split it into two with a note about how it could be re-combined again in the future.
  • Makes life easier for software and mirrors what the rest of the ISA does wrt splitting RW/RO CSRs.

Need to coordinate with @danpage so the spec changes happen gracefully.

COP Interface Update

The COP interface as implemented adds one cycle of latency unnecessarily per instruction. This should be fixed in order to make better performance comparisons!

Performance: Re-write top level instruction accept/execute/retire FSM

  • There are some opportunities to drastically increase instruction throughput in the reference implementation.
  • There are several edges in the FSM graph where we can retire and accept a new instruction in the same cycle.
  • Currently new instructions are accepted only once an old one is retired.

Instruction Proposal: xc.sbox.4

An instruction implementing an in-place 4-bit sbox

The concatenation of crs1 and crs2 describes the sbox, where the i'th nibble corresponds to sbox[i].

xc.sbox.4 crd, crs1, crs2
    let t = crs1 || crs2
    for i in 0..15
        let sbox[i] = t[4*i+3 : 4*i]
    for i in 0..7
        let crd[4*i+3 : 4*i] = sbox[crd[4*i+3 : 4*i]]
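
A minimal Python sketch of the proposed semantics may help when checking a model against it. It is not taken from the specification and rests on several assumptions: registers are 32 bits wide, crs1 forms the high word of the concatenation, the index into the sbox is the i'th nibble of crd, and only the eight nibbles that fit in crd are substituted. The function name `xc_sbox_4` is made up for this illustration.

```python
MASK32 = 0xFFFFFFFF


def xc_sbox_4(crd: int, crs1: int, crs2: int) -> int:
    """Return the new value of crd after an in-place 4-bit sbox lookup."""
    # Assumption: crs1 supplies the high word of the 64-bit table t.
    t = ((crs1 & MASK32) << 32) | (crs2 & MASK32)
    # sbox[i] is the i'th nibble of t (16 entries of 4 bits each).
    sbox = [(t >> (4 * i)) & 0xF for i in range(16)]
    result = 0
    # Assumption: crd is 32 bits wide, so it holds 8 nibbles to substitute.
    for i in range(8):
        nibble = (crd >> (4 * i)) & 0xF
        result |= sbox[nibble] << (4 * i)
    return result
```

With an identity table (nibble i holds the value i) the register is unchanged, which makes a convenient sanity check.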

mclmul.1

  • Implement RTL
  • Implement Model
  • Implement Formal

xc.ins; xc.ext encoded wrong

There are bugs in the instructions xc.ins and xc.ext:
assembly               compiled opcode    disassembly
xc.ins c1, c4, 0, 8    401a28ab           xc.ins c1, c4, 0, 4
xc.ext c2, c5, 0, 8    403aa92b           xc.ext c2, c5, 0, 4

RTL: MP Instructions

Implement multi-precision instructions RTL.

  • ADD2
  • ADD3
  • SUB2
  • SUB3
  • ACC1
  • ACC2
  • MAC
  • SLL
  • SLLI
  • SRL
  • SRLI
  • EQU
  • LTU
  • GTU

Verif: Packed Arithmetic

Model the packed arithmetic instructions. Ideally there will be a lot of code re-use here because only the inner operators of the instructions change; operand and pack-width selection can be abstracted to some degree.

  • Checking for correct pack-width support

  • ADD

  • SUB

  • MUL

  • SLL[i]

  • SRL[i]

  • ROT[i]
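
The code re-use described above could look roughly like the following Python sketch, in which one helper handles lane extraction and re-packing while the inner operator is passed in. The helper names (`packed_op`, `padd`, `psub`, `pmul`) are hypothetical, and the assumption that any width dividing 32 is a legal pack width is mine; the specification defines the widths actually supported.

```python
import operator

def packed_op(a: int, b: int, width: int, op) -> int:
    """Apply op lane by lane to two 32-bit words split into width-bit lanes."""
    # Assumption: any positive width that divides 32 is accepted.
    if width <= 0 or 32 % width != 0:
        raise ValueError("pack width must divide 32")
    lane_mask = (1 << width) - 1
    result = 0
    for i in range(32 // width):
        x = (a >> (i * width)) & lane_mask
        y = (b >> (i * width)) & lane_mask
        # Truncate each lane result so nothing spills into its neighbour.
        result |= (op(x, y) & lane_mask) << (i * width)
    return result


def padd(a: int, b: int, width: int) -> int:
    return packed_op(a, b, width, operator.add)


def psub(a: int, b: int, width: int) -> int:
    return packed_op(a, b, width, operator.sub)


def pmul(a: int, b: int, width: int) -> int:
    return packed_op(a, b, width, operator.mul)
```

Shifts and rotates would fit the same shape by passing a different inner operator, with the shift amount taking the place of the second lane operand.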

Icarus Build Broken

Icarus build target does not work due to differences between Verilator and Icarus.

Spec: Finalise behaviour of MP instructions

We haven't quite finalised how some of the MP instructions should behave, mainly wrt which bits some treat as carry in/carry out.

sub2/sub3

  • These don't seem to make sense in their current form. Would they be better formulated as a "deaccumulate"?

equ/ltu/gtu

  • I don't know exactly what sort of comparison these instructions are performing. I think they just need the individual carry bits stating.
  • I suspect these will need renaming.

Shift Instructions

  • Change spec so that the high word of the 64 bit result goes to destination register 2. Not only is this more sensible, it makes implementation easier as well.
  • See issue #24 for more information.

UArch Spec: Document control flow FSM

  • Add some documentation for the instruction control flow FSM, which is responsible for controlling the acceptance, execution and retiring of instructions.

  • Pay attention to how we handle instruction idempotency during stalled retires.

Verif: New Formal Testbench

Issue to keep track of creating the new formal testbench.

Checks Implemented:

  • mv2gpr
  • mv2cop
  • add_px
  • sub_px
  • mul_px
  • sll_px
  • srl_px
  • rot_px
  • slli_px
  • srli_px
  • roti_px
  • rseed_cr
  • rsamp_cr
  • cmov_cr
  • cmovn_cr
  • scatter_b
  • gather_b
  • scatter_h
  • gather_h
  • lmix_cr
  • hmix_cr
  • bop_cr
  • equ_mp
  • ltu_mp
  • gtu_mp
  • add3_mp
  • add2_mp
  • sub3_mp
  • sub2_mp
  • slli_mp
  • sll_mp
  • srli_mp
  • srl_mp
  • acc2_mp
  • acc1_mp
  • mac_mp
  • lbu_cr
  • lhu_cr
  • lw_cr
  • lui_cr
  • lli_cr
  • twid_b
  • twid_n0
  • twid_n1
  • twid_c0
  • twid_c1
  • twid_c2
  • twid_c3
  • ins_cr
  • ext_cr
  • sb_cr
  • sh_cr
  • sw_cr

Spec+Implementation: residual inconsistencies

From a proof-read, two issues popped up:

  1. The mnemonic vs. condition of xc.cmov.{t,f} are flipped; we agreed to flip them back.
    • specification
    • implementation
  2. Vs. xc.macc.2, which uses crd2 and crd1 as the MSBs and LSBs mirroring the assembly language syntax, all the shifts, i.e., xc.msll, xc.msll.i, xc.msrl, and xc.msrl.i, do the opposite; we agreed to flip the shifts to match.
    • specification
    • implementation

Formal checks integration with simulation flows.

  • The old simulation model of the COP has been removed in favour of purely using the formal checkers.
  • This work will add the formal checkers back into the simulation testbench.
  • The aim is to remove the duplication of effort. Where there used to be a model of each instruction in both the formal flow and the simulation flow, now there is just the formal checks, re-used in both the formal flow and simulation flow.

Todo:

  • Move formal interface abstraction logic from formal testbench into its own module.
  • Integrate formal checkers into icarus simulation flow.

Integration Example

In order to use the co-processor, it will need to be integrated with an existing core. We can use one of

  • Picorv32
  • Rocket Core

The example integration should be a "drop-in" component which can be used as part of an SoC design. It will then be used to test the performance / energy effects of the new instructions.

Enough of the co-processor is implemented and stable to make starting this work a worthwhile exercise.

Verif: Formal testbench

Use the existing ISE model and checker to implement a formal testbench using Yosys and Z3.

Spec: MP Shift destination register order.

Change spec and implementation / model so that the 64 bit result is made up of:
{crs2, crs1} << shamt rather than {crs1,crs2} << shamt.

Not only is this more sensible, it makes implementation easier as well.

  • Specification
  • Model
  • Implementation
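
As a rough illustration of the proposed ordering, the Python sketch below shifts the 64-bit value {crs2, crs1} and splits the result into two 32-bit words. The function name `msll` is hypothetical, the placement of the high word in destination register 2 follows the note in the "Shift Instructions" item earlier in this listing, and the assumption that shamt lies in 0..63 is mine.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def msll(crs1: int, crs2: int, shamt: int):
    """Return (crd2, crd1) for {crs2, crs1} << shamt, truncated to 64 bits."""
    # crs2 is the high word and crs1 the low word of the 64-bit source.
    source = ((crs2 & MASK32) << 32) | (crs1 & MASK32)
    # Assumption: shamt is in the range 0..63.
    t = (source << shamt) & MASK64
    crd1 = t & MASK32          # low word of the result
    crd2 = (t >> 32) & MASK32  # high word of the result
    return crd2, crd1
```

For example, shifting crs1 = 0x80000001 with crs2 = 0 left by one moves the top bit of crs1 into the high word, giving crd2 = 1 and crd1 = 2.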

Initial Public Release Checklist

Checklist for first public release of XCrypto

Repository Admin:

  • Pick a license (MIT)
  • Pick a version number (0.9.0)
  • Repository rename: hw-crypto-cop -> xcrypto
  • Rename environment variables COP_ -> XC_
  • Create a release tag to pinpoint the initial public release
    • Attach copies of the documentation as PDFs to the release tag.
    • Add some release notes documenting what features are (not) present.
    • See draft release notice
  • Make sure README is up to date.
  • Clean up (i.e., remove) unnecessary branches
  • Make repository public (and/or reboot)

Documentation:

  • Specification document proof read
    • make sure commented-out encodings are valid and included
    • solve encoding table in appendix
    • remove changelog
    • residual changes for ld.bu -> ld.b etc.
    • immediate distance fields for xc.psrl.i and xc.psll.i, cshamt => pshamt ?
    • address any remaining marked TODOs
  • Implementation document proof read

RTL:

  • #45 Stop un-implemented AES instructions from hanging the cop. Treat as NOP for now.

Verification:

  • All BMC proofs passing / explained.

Open Issues:

  • Issue #28 closed / fixed (scatter/gather data checks)
  • Issue #42 closed (final renaming pass)
  • Issue #46 closed (residual inconsistencies with names and fields)

Spec: encoding update

We agreed to rename lut4 to rtamt in MIXL.cr and MIXH.cr; it's the same field, but a different name is sane given it's a totally different use case.

Spec: multi-precision comparisons

As written, there's a bug with the multi-precision comparisons. These are meant to capture "chained" or digit-wise comparison steps, so accept and produce a flag; at the moment the flag is produced in GPR[rd], but it should also be accepted from say GPR[rs] (or even GPR[rd]) vs. XCR[crs3].
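
To illustrate why each step needs a flag input as well as a flag output, here is a small Python sketch of one common way to chain a digit-wise "less than" over multi-word operands. It is an assumption about the intended behaviour rather than the specified semantics, and the names `ltu_step` and `mp_ltu` are invented for the example.

```python
def ltu_step(a_digit: int, b_digit: int, flag_in: int) -> int:
    """One chained comparison step: returns the updated 'a < b' flag."""
    if a_digit != b_digit:
        # This digit is more significant than those seen so far, so it decides.
        return int(a_digit < b_digit)
    # Equal digits leave the verdict from the less-significant digits intact.
    return flag_in


def mp_ltu(a_digits, b_digits) -> int:
    """Compare two equal-length digit lists, least-significant digit first."""
    flag = 0
    for a_digit, b_digit in zip(a_digits, b_digits):
        flag = ltu_step(a_digit, b_digit, flag)
    return flag
```

Without the flag input, a step that sees equal digits has no way to carry forward the result of the earlier steps.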

rngtest

  • Implement RTL
  • Implement Model
  • Implement Formal

Spec: exception behaviour

Currently the definition of exception behaviour for instructions is sort of scattered around; it seems better to capture this precisely in each semantics, or using some shared section.

xc.msub.3 - crs3 not sourced properly

Expected Behaviour

xc.msub.3 (crd2, crd1), crs1, crs2, crs3
   t   <- (crs1 - crs2) - crs3[0]
  crd1 <- t[31:0]
  crd2 <- t[63:32]

Actual Behaviour

All of CRS3 is subtracted, rather than just the LSB.

xc.msub.3 (crd2, crd1), crs1, crs2, crs3
   t   <- (crs1 - crs2) - crs3
  crd1 <- t[31:0]
  crd2 <- t[63:32]

Bug appears consistently in implementation and verification environment (as might happen when one person does both 👎 )

Fix:

xc.msub.3 should sample only the LSB of crs3 operand.
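
The difference between the two behaviours can be reproduced with a short Python sketch. It assumes 32-bit unsigned operands and a 64-bit wrap-around intermediate t, as suggested by the t[63:32] slice above; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def _split(t: int):
    """Wrap t to 64 bits and return (crd2, crd1) = (t[63:32], t[31:0])."""
    t &= MASK64
    return (t >> 32) & MASK32, t & MASK32


def msub3_expected(crs1: int, crs2: int, crs3: int):
    # Only the least-significant bit of crs3 acts as the borrow-in.
    return _split((crs1 - crs2) - (crs3 & 1))


def msub3_buggy(crs1: int, crs2: int, crs3: int):
    # The reported behaviour: all of crs3 is subtracted.
    return _split((crs1 - crs2) - crs3)
```

The two agree whenever crs3 is 0 or 1, so a test only exposes the bug if it drives crs3 with a value that has higher bits set, for instance 0xFFFFFFFF.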

Verification: ALU Coprocessor

Formal checks for verifying the ALU coprocessor

General

  • Make sure memory ops take at least two cycles to complete.

XOR.B

  • Correct results

AND.B

  • Correct results

OR.B

  • Correct results

LB.B

  • Correct loaded data
  • Write enable always clear for LB
  • Correct memory address

SB.B

  • Correct memory address.
  • Correct byte enable.
  • Write enable always set for SB
  • Correct data in relevant byte.

Flow: New formal verification flow using fvm-tool

The current formal flow is not scaling. Some instructions are taking >40 hours to prove. This is because multiple assertions are active at once, or single assertions are trying to express too much.

I've created a basic tool which makes generating parts of the formal environment easier, and lets one split groups of assertions into individual ones. We can then run more, smaller proof jobs which will hopefully complete quicker.

The tasks here are to:

  • Copy the existing assertions into the new fvm-tool format.
  • Add a new flow/fvm/ directory and make flow which will automate everything.
  • Add a flow section which uses Yosys to generate the N required testbenches, one for each assertion, in SMT2 format.
  • Add a flow section which can run one or many of the proofs in parallel, initially using the make -j flag.

xc.ld.[h|b]

These instructions are versions of xc.ld.hu and xc.ld.bu which blank the rest of the register rather than updating a specific part of it.

The "u" in "hu" and "bu" stands for "update" rather than "unsigned", which might be worth pointing out explicitly in each of the instructions which use it.

  • Specification updates
  • Encodings
  • Binutils
  • Formal models
  • ISE Models
  • RTL
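
The distinction between the "update" and the blanking variants can be sketched in Python as below. This is an illustration only: the `half` parameter is a hypothetical stand-in for however the instruction selects the target halfword, and placing the loaded halfword at the same selected position in the blanking variant is an assumption, not something the issue states.

```python
MASK32 = 0xFFFFFFFF


def ld_hu(old: int, data: int, half: int) -> int:
    """'Update' load: replace one halfword of old and keep the other."""
    shift = 16 * half  # half is 0 (low) or 1 (high); hypothetical selector
    kept = old & ~(0xFFFF << shift) & MASK32
    return kept | ((data & 0xFFFF) << shift)


def ld_h(data: int, half: int) -> int:
    """Blanking load: the rest of the register is cleared."""
    # Assumption: the halfword lands in the selected position.
    return (data & 0xFFFF) << (16 * half)
```

Starting from 0xAABBCCDD and loading 0x1234 into the high half, the update form yields 0x1234CCDD while the blanking form yields 0x12340000.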

RTL: Memory Instructions

Finish testing memory instruction implementations

  • Memory bus testbench driver

Instruction Tests:

  • LW
  • SW
  • LH
  • SH
  • LB
  • SB
  • Gather.h
  • Gather.b
  • Scatter.h
  • Scatter.b

Instruction Implementation:

  • LW
  • SW
  • LH
  • SH
  • LB
  • SB
  • Gather.h
  • Gather.b
  • Scatter.h
  • Scatter.b

Spec: Future Instruction Proposals

There are some instructions we should add:

  • RTEST.cr, an instruction, or a second output from RSAMP.cr, so the RNG can signal an issue with entropy
  • carry-less, i.e., F_2, version of any integer multiplication available

There are some instructions we could add:

  • a packed MULH.px, meaning MUL.px -> MULL.px, to capture MSBs and LSBs if need be
  • overwriting vs. updating versions of LBU.cr and LHU.cr
  • standard (i.e., not multi-precision) ADD.cr, SUB.cr, MUL.cr, or is this packed width = 32?
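
For readers unfamiliar with the carry-less multiplication mentioned above, a minimal Python sketch follows. It multiplies two 32-bit words as polynomials over F_2, so partial products are combined with XOR instead of addition. The name `clmul32` is hypothetical and this is not a model of any particular XCrypto instruction.

```python
MASK32 = 0xFFFFFFFF


def clmul32(a: int, b: int) -> int:
    """Carry-less product of two 32-bit words (result has up to 63 bits)."""
    a &= MASK32
    b &= MASK32
    result = 0
    for i in range(32):
        if (b >> i) & 1:
            # XOR in the shifted multiplicand: addition without carries.
            result ^= a << i
    return result
```

For instance, clmul32(3, 3) is 5 rather than 9, since (x + 1)^2 = x^2 + 1 over F_2.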

Verif: Multi-precision arithmetic instructions

Implement modelling of the MP instructions

  • ADD2
  • ADD3
  • SUB2
  • SUB3
  • ACC1
  • ACC2
  • MAC
  • SLL[i]
  • SRL[i]
  • EQU
  • LTU
  • GTU

Implement Unit tests for:

  • ADD2
  • ADD3
  • SUB2
  • SUB3
  • ACC1
  • ACC2
  • MAC
  • SLL[i]
  • SRL[i]
  • EQU
  • LTU
  • GTU

pmul.h

  • Implement RTL
  • Implement Model
  • Implement Formal

RTL: Packed multiplier broken for lack of packed shifter.

  • The packed multiplier implements the dead basic "shift and add" algorithm for multiplication.

  • Currently it will always get the lowest pack element correct, but the nature of the shift currently means that low
    partial products can "carry into" the next partial product.

  • Solution is to implement and share the packed shifter with the multiplier in the same way that it shares the
    packed adder.

  • This blocks issue #11

  • Implement and integrate packed shifter.
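
The failure mode described above can be reproduced with a small Python model of a packed shift-and-add multiplier whose shifter is swappable. This is a sketch under the assumption of a 32-bit word split into equal lanes; all function names are made up for the illustration and do not correspond to the RTL.

```python
MASK32 = 0xFFFFFFFF


def lanes(x, w):
    """Split a 32-bit word into w-bit lanes, lowest lane first."""
    return [(x >> (i * w)) & ((1 << w) - 1) for i in range(32 // w)]


def pack(values, w):
    """Re-pack lane values into a word, truncating each to w bits."""
    out = 0
    for i, v in enumerate(values):
        out |= (v & ((1 << w) - 1)) << (i * w)
    return out


def packed_add(a, b, w):
    return pack([x + y for x, y in zip(lanes(a, w), lanes(b, w))], w)


def plain_sll(x, sh, w):
    # Ordinary shift: bits leaving one lane enter the lane above it.
    return (x << sh) & MASK32


def packed_sll(x, sh, w):
    # Packed shift: bits leaving a lane are discarded.
    return pack([v << sh for v in lanes(x, w)], w)


def packed_mul(a, b, w, shift):
    """Shift-and-add multiply per lane, using the supplied shifter."""
    acc = 0
    for j in range(w):
        # Keep a's lane only where bit j of the matching b lane is set.
        selected = pack([x if (y >> j) & 1 else 0
                         for x, y in zip(lanes(a, w), lanes(b, w))], w)
        acc = packed_add(acc, shift(selected, j, w), w)
    return acc
```

With 8-bit lanes, multiplying 0x010203FF by 0x02020202 gives 0x020406FE using `packed_sll`, but 0x020407FE using `plain_sll`: the lowest lane is correct in both cases, while the bit shifted out of it corrupts the lane above.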

Spec: Instruction naming changes

Somewhere to keep track of all of the naming changes to the instructions.

  1. Specification Updates.
    • Changes made to the specification document
  2. Propagate changes to:
    • Encoding specification (docs/ise-opcodes.txt)
    • ISE Model
    • RTL design
    • Binutils

Spec: final renaming pass

Finalising the specification document threw up some renaming cases that seem to make sense:

  1. xc.cmov and xc.cmovn: the latter is a variant of the former, so either

    • xc.cmov => xc.cmov.t = "true" variant, and xc.cmovn => xc.cmov.f = "false" variant, or
    • xc.cmov => xc.cmov, and xc.cmovn => xc.cmov.n
  2. xc.ld.li and xc.ld.hi: both should have a u variant code to match non-immediate analogues, so

    • xc.ld.li => xc.ld.liu, and xc.ld.hi => xc.ld.hiu
  3. xc.mmul.1 and xc.mclmul.1: the variant number is inconsistent with the others, so either

    • xc.mmul.1 => xc.mmul.3, and xc.mclmul.1 => xc.mclmul.3, or
    • xc.mmul.1 => xc.mmul, and xc.mclmul.1 => xc.mclmul since there is only one variant of each
  4. Given we need to do 3. anyway, I find clmul a bit awkward; I'd prefer a TLA, e.g., clm, so

    • xc.mclmul.1 => xc.mclm.1, xc.pclmul.h => xc.pclm.h, xc.pclmul.l => xc.pclm.l
