RISC-32 ISA based CPU on FPGAs using Verilog

The three main objectives of this project are:

  • To learn Verilog.
  • To gain knowledge of the RISC-32 Instruction Set Architecture, in order to develop a softcore CPU on FPGAs that executes machine code.
  • To understand parallelism concepts, such as carrying out multiple instances of the same operation in parallel on different data sets.

1. RISC-32 Instruction Set

The RiSC-32 is a 32-bit VLIW (very long instruction word) design that encapsulates two atomic instructions in a single instruction word, so that the hardware can execute both operations at once.

1.1. Very Long Instruction Word (VLIW)

1.1.1. What is it?

Very Long Instruction Word (VLIW) architecture employs instructions that encode multiple operations within a single word.

Each instruction in a VLIW architecture typically consists of fixed-size fields, with each field dedicated to a specific operation or functional unit.

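The VLIW instruction format allows these operations to execute in parallel. The parallelism is scheduled statically by the compiler, which gives high performance without the dynamic scheduling hardware that superscalar processors need.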

1.1.2. Why use it? Pros & Cons

The key to higher performance in microprocessors for a broad range of applications is the ability to exploit fine-grain, instruction-level parallelism. Some methods for exploiting fine-grain parallelism include:

  • Pipelining
  • Multiple processors
  • Superscalar implementation
  • Specifying multiple independent operations per instruction

Advantages :

  • Reduces hardware complexity.
  • Reduces power consumption because the hardware is less complex.
  • Because the compiler takes care of data-dependency checking and instruction scheduling, instruction decoding and issue become much simpler.
  • Increases the potential clock rate. The compiler places each operation in the slot of the instruction packet that corresponds to its functional unit.

Disadvantages :

  • Complex compilers are required, and they are hard to design.
  • Increased program code size.
  • Requires larger memory bandwidth and register-file bandwidth.
  • Unscheduled events, such as a cache miss, can cause a stall that stalls the entire processor.
  • In case of un-filled opcodes in a VLIW, there is waste of memory space and instruction bandwidth.

1.1.3. Architecture comparison: CISC, RISC, and VLIW

The differences between RISC, CISC, and VLIW are in the formats and semantics of the instructions:

CISC instructions vary in size, frequently specify a sequence of operations, and may require serial (slow) decoding techniques. CISC architectures commonly have a small number of registers, some of which may be special-purpose registers with restricted uses. Memory references are usually combined with other operations (add memory to register, for example). CISC instruction sets are designed to take advantage of microcode.

RISC instructions are straightforward (fast) to decode, describe simple actions, and have a fixed size. There are a lot of general-purpose registers in RISC architectures. Only basic load-register-from-memory and store-register-to-memory operations allow instructions to access main memory. RISC instruction sets are made to make pipelining easier and do not require microcode.

VLIW instructions are lengthier than RISC instructions in order to indicate numerous independent simple operations. You can think of a VLIW instruction as multiple RISC instructions combined into one. Most characteristics of VLIW designs are RISC-like.

1.2. RISC-32 VLIW ISA

1.2.1. VLIW Instruction Format

Both data and instructions are 32 bits long. Every 32-bit instruction is split into two atoms. Depending on its opcode, an atom operates on one of two register files: the 16-entry scalar register file or the 16-entry vector register file.

In specific situations an atom may also operate on both register files. Each of the 16 registers in the scalar register file is 32 bits wide. The vector register file also has 16 registers, but each is 128 bits wide and holds one vector made up of four 32-bit words.
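As a minimal sketch of the relationship between a 128-bit vector register and its four 32-bit words, the Python helpers below pack and unpack a vector. The function names are hypothetical, and placing word 0 in the least-significant bits is an assumption; the text does not fix the lane order.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit word

def pack_vector(words):
    """Pack four 32-bit words into one 128-bit integer.

    Assumption: word 0 occupies the least-significant 32 bits.
    """
    if len(words) != 4:
        raise ValueError("a vector register holds exactly four 32-bit words")
    value = 0
    for lane, word in enumerate(words):
        value |= (word & MASK32) << (32 * lane)
    return value

def unpack_vector(value):
    """Split a 128-bit integer back into its four 32-bit words."""
    return [(value >> (32 * lane)) & MASK32 for lane in range(4)]

v = pack_vector([1, 2, 3, 4])
print(unpack_vector(v))  # [1, 2, 3, 4]
```

Modelling a vector as a plain list of four words is usually more convenient in a simulator; the packed form only matters when the register is viewed as a single 128-bit value.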

1.2.2. Conventions and Terminology

  • XLEN: bit length of a register in the machine architecture. EX: 32, 64, 128.
  • sxN(val): sign-extend val to the left by repeating the sign bit to get N bits of data. EX: sx16(0x8a) --> 0xff8a
  • zxN(val): zero-extend val to the left by repeating 0 to get N bits of data. EX: zx16(0x8a) --> 0x008a
  • zrN(val): zero-extend val to the right by repeating 0 to get N bits of data. EX: zr16(0x8a) --> 0x8a00
  • mN(addr): N bits of data (little-endian) in memory starting at address addr. EX: m8(addr) <- source, dest <- m16(addr)
  • pc: current value of the program counter.
  • rX: register X
  • immN: N-bit immediate numeric operand, the data stored within the instruction.
  • rX[h:l]: bits h down to l of register X. EX: rA[15:3]
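The three extension helpers can be sketched in Python as follows. The notation above leaves the width of the input implicit, so the explicit `width` parameter (8 bits in the examples) is an assumption added for illustration.

```python
def sx(n, val, width):
    """sxN(val): sign-extend a `width`-bit value to n bits."""
    low_mask = (1 << width) - 1
    val &= low_mask
    if val >> (width - 1):                    # sign bit is set
        val |= ((1 << n) - 1) & ~low_mask     # fill the upper bits with 1s
    return val

def zx(n, val, width):
    """zxN(val): zero-extend a `width`-bit value to n bits (upper bits are 0)."""
    return val & ((1 << width) - 1)

def zr(n, val, width):
    """zrN(val): pad a `width`-bit value with zeros on the right to n bits."""
    return (val & ((1 << width) - 1)) << (n - width)

print(hex(sx(16, 0x8A, 8)))  # 0xff8a
print(hex(zx(16, 0x8A, 8)))  # 0x8a
print(hex(zr(16, 0x8A, 8)))  # 0x8a00
```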

1.2.3. Opcodes

Opcodes are 4-bit values, so they can encode sixteen different operations. Furthermore, the meaning of some opcodes depends on whether they appear on the right side (atom 1, the low-order bits) or the left side (atom 0, the high-order bits) of the instruction.

A read or write to the vector register file is indicated by the use of a "vX" register identifier in each case, while a read or write to the scalar register file is indicated by the use of a "rX" register identifier.

More details are in the table below. Scalar operations use opcodes 0xxx plus 1111; vector operations use opcodes 1xxx except 1111.

| Opcode | Assembly | Action | Description |
|--------|----------|--------|-------------|
| 0000 | add rA, rB, rC | rA <= rB + rC | Add the data in register B to the data in register C, then store the result in register A. |
| 0001 | addi rA, rB, imm | rA <= rB + sx32(imm4) | Add the data in register B to the immediate data, then store the result in register A. |
| 0010 | nand rA, rB, rC | rA <= rB nand rC | Nand the data in register B with the data in register C, then store the result in register A. |
| 0011 | mul rA, rB, rC | rA <= rB * rC | Multiply the data in register B by the data in register C, then store the result in register A. |
| 0100 | sub rA, rB, rC | rA <= rB - rC | Subtract the data in register C from the data in register B, then store the result in register A. |
| 0101 | lw rA, rB, imm | rA <= m32[rB + sx32(imm4)] | Load 32 bits of data from memory into register A. The memory address is formed by adding the data in register B to the immediate data. |
| 0110 | sw rA, rB, imm | rA => m32[rB + sx32(imm4)] | Store 32 bits of data from register A into memory. The memory address is formed by adding the data in register B to the immediate data. |
| 0111 | bne rA, rB, imm (atom left only) | PC <= (rA != rB) ? (PC + sx32(imm4)) : (PC + 1) | If the data in register A and the data in register B are not the same, branch to the address PC + imm, where PC is the address of this bne instruction. |
| 0111 | blz rA, imm (atom right only) | PC <= (rA < 0) ? (PC + sx32(imm8)) : (PC + 1) | If the data in register A is less than zero, branch to the address PC + imm, where PC is the address of this blz instruction. |
| 1000 | vadd vA, vB, vC | vA.i <= vB.i + vC.i | Add the contents of vector B to vector C, then store the result in vector A. |
| 1001 | vsum rA1, vB (atom right only) | rA1 <= sum(vB.i with i in 0 to 3) | Sum all four 32-bit data values in vector B, then store the result in scalar register A1. |
| 1010 | vnand vA, vB, vC | vA.i <= vB.i nand vC.i | Nand the contents of vector B with vector C, then store the result in vector A. |
| 1011 | vmul vA, vB, vC | vA <= vB * vC | Multiply the contents of vector B by vector C, then store the result in vector A. |
| 1100 | vxor vA, vB, vC | vA <= vB xor vC | Xor the contents of vector B with vector C, then store the result in vector A. |
| 1101 | vlw vA, rB, imm | vA <= m128[rB + sx32(imm4)] | Load 128 bits of data from memory into vector A. The memory address is formed by adding the data in register B to the immediate data. |
| 1101 | vsw vA, rB, imm (atom right only) | vA => m128[rB + sx32(imm4)] | Store 128 bits of data from vector A into memory. The memory address is formed by adding the data in register B to the immediate data. |
| 1110 | vec vA, rB0, rC0, rB1, rC1 (atom left only) | vA.0 <= rB0, vA.1 <= rC0, vA.2 <= rB1, vA.3 <= rC1 | Read four 32-bit data values from the scalar register file (rB0, rC0, rB1, rC1) and write them into vector register A. |
| 1110 | vlo rA0, rA1, vB (atom left only) | rA0 <= vB.0, rA1 <= vB.1 | Read the two low 32-bit scalars of vector B and write them into scalar registers A0 and A1. |
| 1110 | vhi rA0, rA1, vB (atom left only) | rA0 <= vB.2, rA1 <= vB.3 | Read the two high 32-bit scalars of vector B and write them into scalar registers A0 and A1. |
| 1111 | jalr rA, rB | PC <= rB, rA <= PC + 1 | Branch to the address in register B. Store PC + 1 into register A, where PC is the address of this jalr instruction. |
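The sketch below splits a 32-bit instruction word into its two atoms and classifies each opcode as scalar or vector. The field layout inside an atom (opcode, then three 4-bit fields) is an assumption; it agrees with the sample encoding `0789 cabc` given in section 2.2 for `add r7, r8, r9 | vxor v10, v11, v12`, but the text does not spell it out. All function names are hypothetical.

```python
def split_atoms(word):
    """Return (atom 0, atom 1): the high-order and low-order 16 bits."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF

def decode_atom(atom):
    """Split a 16-bit atom into four 4-bit fields (assumed layout)."""
    return {
        "opcode": (atom >> 12) & 0xF,
        "a": (atom >> 8) & 0xF,   # rA / vA
        "b": (atom >> 4) & 0xF,   # rB / vB
        "c": atom & 0xF,          # rC / vC or imm4
    }

def is_vector_opcode(opcode):
    """Vector operations are 1xxx, except 1111 (jalr), which is scalar."""
    return bool(opcode & 0b1000) and opcode != 0b1111

left, right = split_atoms(0x0789CABC)
print(decode_atom(left))    # {'opcode': 0, 'a': 7, 'b': 8, 'c': 9}
print(decode_atom(right))   # {'opcode': 12, 'a': 10, 'b': 11, 'c': 12}
print(is_vector_opcode(decode_atom(right)["opcode"]))  # True
```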

2. RISC-32 Memory

2.1. Word Addressing & Memory Access

All scalar addresses in the RiSC-32 architecture are word-based (i.e., memory address 0 corresponds to the first 32 bits, or four bytes, of main memory, address 1 corresponds to the second four bytes of main memory, etc.). The machine can perform two scalar memory operations per cycle.

All vector offsets (the immediate values of VLW and VSW instructions) in the RiSC-32 architecture are also counted in 32-bit words, even though each vector access transfers a much larger 128-bit value (i.e., an address offset of 1 corresponds to 4 bytes beyond the address, an address offset of 2 corresponds to 8 bytes beyond the address, etc.). The machine can perform two vector loads but only one vector store per cycle: the VSW instruction is valid only on the right side, so two VLW operations can be issued in the same instruction but two VSW operations cannot.
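A minimal Python model of word addressing, assuming main memory is a flat little-endian byte array. The function names are hypothetical, and treating the vector base address as a word address like the scalar ones is an assumption.

```python
WORD_BYTES = 4  # one 32-bit word

def load_word(mem, word_addr):
    """Read the 32-bit word at a word address (word n = bytes 4n .. 4n+3)."""
    start = word_addr * WORD_BYTES
    return int.from_bytes(mem[start:start + WORD_BYTES], "little")

def store_word(mem, word_addr, value):
    """Write a 32-bit word at a word address."""
    start = word_addr * WORD_BYTES
    mem[start:start + WORD_BYTES] = (value & 0xFFFFFFFF).to_bytes(WORD_BYTES, "little")

def load_vector(mem, base, offset):
    """Read four consecutive words starting at word address base + offset.

    The offset is counted in 32-bit words, so an offset of 1 moves the
    start of the 128-bit access by 4 bytes.
    """
    return [load_word(mem, base + offset + lane) for lane in range(4)]

mem = bytearray(64)
for n in range(8):
    store_word(mem, n, 0x100 + n)
print([hex(w) for w in load_vector(mem, 0, 1)])  # ['0x101', '0x102', '0x103', '0x104']
```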

2.2. Large Immediate Values

The architecture’s 4-bit immediate values can represent numbers in the range [-8 .. 7]. Because this range is quite limited, the instruction set allows larger values for ADDI/VADDI and BRANCH instructions (but not for LW/SW instructions). An immediate outside this range is requested by placing a 0 in the instruction’s immediate field. The value 0 is chosen because it is otherwise redundant: an ADDI instruction that wants to add zero to a register could simply use the ADD instruction and reference register 0, which is always zero. When the immediate field of an ADDI is 0, the following 32-bit word is not an instruction but a full 32-bit immediate value. This is signaled to the assembler by appending “.l” (dot el) to an ADDI, BNE, or BLZ instruction.

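For example, the following two instructions (atom 0 on the left, atom 1 on the right):

addi.l r1, r2, 0x00c0ffee   |   nand r4, r5, r6
add    r7, r8, r9           |   vxor v10, v11, v12

are encoded as three 32-bit words, shown in hexadecimal. The second word is the long immediate of the addi.l, whose 4-bit immediate field is 0:

1120 2456
00c0 ffee
0789 cabc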
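The three hex words above can be walked with a small Python sketch that treats the word following a zero-immediate ADDI or branch as data rather than as an instruction. The field positions, the function names, and the rule that two long immediates would follow in atom order are all assumptions; the text only describes the single-immediate case.

```python
ADDI, BRANCH = 0b0001, 0b0111

def wants_long_immediate(atom, is_left):
    """True if this atom's immediate field is 0 and it may carry a long immediate."""
    opcode = (atom >> 12) & 0xF
    if opcode == ADDI:
        return (atom & 0xF) == 0
    if opcode == BRANCH:
        # bne (left atom) has a 4-bit immediate, blz (right atom) an 8-bit one
        return (atom & (0xF if is_left else 0xFF)) == 0
    return False

def walk(words):
    """Return a list of (instruction word, [long immediates]) pairs."""
    result = []
    i = 0
    while i < len(words):
        word = words[i]
        i += 1
        longs = []
        for is_left, atom in ((True, (word >> 16) & 0xFFFF), (False, word & 0xFFFF)):
            if wants_long_immediate(atom, is_left):
                longs.append(words[i])   # next word is data, not an instruction
                i += 1
        result.append((word, longs))
    return result

program = [0x11202456, 0x00C0FFEE, 0x0789CABC]
for word, longs in walk(program):
    print(f"{word:08x}", [f"{imm:08x}" for imm in longs])
# 11202456 ['00c0ffee']
# 0789cabc []
```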

3. Verilog Implementation

The heart of the project is a Verilog model of a CPU that implements the RiSC-32 instruction set. The model uses single-cycle, sequential (non-pipelined) execution: in every cycle the CPU executes a single instruction, and it does not move on until that instruction has completed and the program counter has been redirected to the next instruction.
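To make the single-cycle behaviour concrete, here is a small Python reference model, not the project's Verilog. It covers only add, addi, nand, sub, and bne, and rests on several assumptions: the atom field layout (opcode, rA, rB, rC/imm4), sign-extended 4-bit immediates, both atoms reading their operands before either writes, and halting when the PC leaves the program. Long immediates are not handled.

```python
MASK32 = 0xFFFFFFFF

def sx4(imm):
    """Sign-extend a 4-bit immediate to a Python int in [-8, 7]."""
    return imm - 16 if imm & 0x8 else imm

def execute_atom(atom, regs, is_left):
    """Return (register writes, branch offset or None) for one 16-bit atom."""
    op, a, b, c = (atom >> 12) & 0xF, (atom >> 8) & 0xF, (atom >> 4) & 0xF, atom & 0xF
    if op == 0b0000:                                  # add
        return {a: (regs[b] + regs[c]) & MASK32}, None
    if op == 0b0001:                                  # addi
        return {a: (regs[b] + sx4(c)) & MASK32}, None
    if op == 0b0010:                                  # nand
        return {a: ~(regs[b] & regs[c]) & MASK32}, None
    if op == 0b0100:                                  # sub
        return {a: (regs[b] - regs[c]) & MASK32}, None
    if op == 0b0111 and is_left:                      # bne (left atom only)
        return {}, (sx4(c) if regs[a] != regs[b] else None)
    raise NotImplementedError(f"opcode {op:04b} is outside this sketch")

def run(program, max_cycles=1000):
    """Execute one 32-bit instruction (two atoms) per cycle."""
    regs = [0] * 16
    pc = 0
    for _ in range(max_cycles):
        if not 0 <= pc < len(program):
            break                                     # hypothetical halt condition
        word = program[pc]
        writes0, branch = execute_atom((word >> 16) & 0xFFFF, regs, True)
        writes1, _ = execute_atom(word & 0xFFFF, regs, False)
        for reg, value in {**writes0, **writes1}.items():
            if reg != 0:                              # register 0 is always zero
                regs[reg] = value
        pc = pc + branch if branch is not None else pc + 1
    return regs

program = [
    0x11030200,  # addi r1, r0, 3   | add  r2, r0, r0
    0x111F1222,  # addi r1, r1, -1  | addi r2, r2, 2
    0x710F0320,  # bne  r1, r0, -1  | add  r3, r2, r0
]
regs = run(program)
print(regs[1], regs[2], regs[3])  # 0 6 6
```

The loop body runs three times, so r2 ends at 6 and r3 holds the copy made in the final pass.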
