The RiSC-32 is a 32-bit VLIW (very long instruction word) design that packs two atomic instructions (atoms) into a single instruction word, so that the hardware can execute both in the same cycle.
A Very Long Instruction Word (VLIW) architecture uses instructions that encode multiple operations within a single word.
Each instruction in a VLIW architecture typically consists of fixed-size fields, with each field dedicated to a specific operation or functional unit.
The VLIW instruction format allows operations to execute in parallel, with the parallelism scheduled statically by the compiler rather than discovered at run time by the hardware, as it is in superscalar processors.
The key to higher performance in microprocessors for a broad range of applications is the ability to exploit fine-grain, instruction-level parallelism. Some methods for exploiting fine-grain parallelism include:
- Pipelining
- Multiple processors
- Superscalar implementation
- Specifying multiple independent operations per instruction
Advantages:
- Reduces hardware complexity.
- Reduces power consumption as a result of the simpler hardware.
- Because the compiler takes care of data-dependency checking and instruction scheduling, instruction decoding and issue in hardware become much simpler.
- Increases the potential clock rate. The compiler places each operation in the slot of the instruction packet that corresponds to its functional unit.
Disadvantages:
- Requires complex compilers, which are hard to design.
- Increases program code size.
- Requires larger memory bandwidth and register-file bandwidth.
- Unscheduled events, such as a cache miss, can stall the entire processor.
- Unfilled operation slots in a VLIW instruction waste memory space and instruction bandwidth.
The differences between RISC, CISC, and VLIW are in the formats and semantics of the instructions:
CISC instructions vary in size, frequently specify a sequence of operations, and may require serial (slow) decoding techniques. CISCs commonly have a small number of registers, some of which may be special-purpose registers with limited uses. Memory references are usually combined with other operations (add memory to register, for example). CISC instruction sets are designed to take advantage of microcode.
RISC instructions have a fixed size, describe simple operations, and are straightforward (fast) to decode. RISC architectures have a large number of general-purpose registers. Instructions can access main memory only through basic load-register-from-memory and store-register-to-memory operations. RISC instruction sets are designed to simplify pipelining and do not require microcode.
VLIW instructions are longer than RISC instructions because each one specifies several independent simple operations. You can think of a VLIW instruction as multiple RISC instructions combined into one. Most characteristics of VLIW designs are RISC-like.
Both data and instructions are 32 bits long. Every 32-bit instruction is split into two atoms. Depending on its opcode, an atom operates on one of two register files: the 16-entry scalar register file or the 16-entry vector register file.
In specific situations, an atom may also operate on both register files. Each of the 16 registers in the scalar register file is 32 bits wide. The vector register file also has 16 registers, but each is 128 bits wide, holding one vector of four 32-bit words.
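The two register files described above can be modeled in a few lines of Python. This is only an illustrative sketch: the class and method names are hypothetical, and masking written values to 32 bits is an assumption about how out-of-range values would be handled.

```python
MASK32 = 0xFFFFFFFF  # one 32-bit word


class RegisterFiles:
    """Hypothetical model of the RiSC-32 scalar and vector register files."""

    def __init__(self):
        # 16 scalar registers, each 32 bits wide
        self.scalar = [0] * 16
        # 16 vector registers, each 128 bits wide = four 32-bit words
        self.vector = [[0, 0, 0, 0] for _ in range(16)]

    def write_scalar(self, idx, value):
        # Assumption: values are truncated to 32 bits on write
        self.scalar[idx] = value & MASK32

    def write_vector(self, idx, words):
        if len(words) != 4:
            raise ValueError("a vector holds exactly four 32-bit words")
        self.vector[idx] = [w & MASK32 for w in words]
```

Keeping each vector as a list of four words, rather than one 128-bit integer, mirrors the "one vector made up of four 32-bit words" description and avoids committing to a particular word order within the 128 bits.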
- XLEN: bit length of a register in the machine architecture. EX: 32, 64, 128.
- sxN(val): sign-extend val to the left by repeating the sign bit to get N bits of data. EX: sx16(0x8a) --> 0xff8a
- zxN(val): zero-extend val to the left by repeating 0 to get N bits of data. EX: zx16(0x8a) --> 0x008a
- zrN(val): zero-extend val to the right by repeating 0 to get N bits of data. EX: zr16(0x8a) --> 0x8a00
- mN(addr): N bits of data (little-endian) in memory starting at address addr. EX: m8(addr) <- source, dest <- m16(addr)
- pc: current value of the program counter.
- rX: register X.
- immN: N-bit immediate numeric operand, i.e., data stored within the instruction.
- rX[h:l]: bits h down to l of register X. EX: rA[15:3]
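The extension operators in the list above can be sketched in Python as follows. This is a minimal illustration rather than part of the RiSC-32 definition: the function names `sx`, `zx`, and `zr` and the explicit `from_bits` parameter are assumptions introduced here, because the notation leaves the width of `val` implicit (8 bits in the examples).

```python
def sx(val, from_bits, to_bits):
    """Sign-extend a from_bits-wide value to to_bits bits."""
    val &= (1 << from_bits) - 1
    if val >> (from_bits - 1):  # sign bit is set
        # Fill every bit above the original width with 1s
        val |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return val


def zx(val, from_bits, to_bits):
    """Zero-extend to the left: the upper bits stay 0."""
    return val & ((1 << from_bits) - 1) & ((1 << to_bits) - 1)


def zr(val, from_bits, to_bits):
    """Zero-extend to the right: shift the value up and fill with 0s."""
    return (val & ((1 << from_bits) - 1)) << (to_bits - from_bits)
```

With these helpers, `sx(0x8a, 8, 16)`, `zx(0x8a, 8, 16)`, and `zr(0x8a, 8, 16)` reproduce the three examples in the list.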
Opcodes are 4-bit values and can therefore encode sixteen distinct operations. Furthermore, the meaning of some opcodes depends on whether they appear on the right side (atom 1, the low-order bits) or the left side (atom 0, the high-order bits) of the instruction.
A read or write to the vector register file is indicated by the use of a "vX" register identifier in each case, while a read or write to the scalar register file is indicated by the use of a "rX" register identifier.
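Splitting an instruction word into its two atoms and their fields might look like the sketch below. The field layout is an assumption: it places the opcode in the top four bits of each 16-bit atom, followed by three 4-bit fields, which is consistent with sample encodings in this document such as 0789 for `add r7, r8, r9`. The names `Atom`, `split_atoms`, `decode_atom`, and `is_vector_op` are hypothetical.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    opcode: int  # bits 15:12 of the atom (assumed layout)
    a: int       # bits 11:8, typically register A
    b: int       # bits 7:4, typically register B
    c: int       # bits 3:0, register C or a 4-bit immediate


def split_atoms(word):
    """Return (atom0, atom1): atom 0 is the high 16 bits, atom 1 the low 16."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF


def decode_atom(atom):
    """Break a 16-bit atom into its four 4-bit fields."""
    return Atom((atom >> 12) & 0xF, (atom >> 8) & 0xF,
                (atom >> 4) & 0xF, atom & 0xF)


def is_vector_op(opcode):
    """Opcodes 1xxx are vector operations, except 1111 (jalr), which is scalar."""
    return (opcode & 0b1000) != 0 and opcode != 0b1111
```

For instance, `split_atoms(0x0789CABC)` yields the atoms 0x0789 and 0xCABC, which decode to opcode 0000 with fields 7, 8, 9 and opcode 1100 with fields 10, 11, 12.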
More details are in the table below. Scalar operations use opcodes 0xxx plus 1111; vector operations use opcodes 1xxx except 1111.
Opcode | Assembly | Action | Description |
---|---|---|---|
0000 | add rA, rB, rC | rA <= rB + rC | Add the data in register B to the data in register C, then store the result in register A. |
0001 | addi rA, rB, imm | rA <= rB + sx32(imm4) | Add the immediate to the data in register B, then store the result in register A. |
0010 | nand rA, rB, rC | rA <= rB nand rC | Nand the data in register B with the data in register C, then store the result in register A. |
0011 | mul rA, rB, rC | rA <= rB * rC | Multiply the data in register B by the data in register C, then store the result in register A. |
0100 | sub rA, rB, rC | rA <= rB - rC | Subtract the data in register C from the data in register B, then store the result in register A. |
0101 | lw rA, rB, imm | rA <= m32[rB + sx32(imm4)] | Load 32 bits of data from memory into register A. The memory address is formed by adding the immediate to the data in register B. |
0110 | sw rA, rB, imm | rA => m32[rB + sx32(imm4)] | Store 32 bits of data from register A into memory. The memory address is formed by adding the immediate to the data in register B. |
0111 | bne rA, rB, imm (atom left only) | PC <= (rA != rB) ? (PC + sx32(imm4)) : (PC + 1) | If the data in register A and the data in register B are not the same, branch to the address PC + imm, where PC is the address of this bne instruction. |
0111 | blz rA, imm (atom right only) | PC <= (rA < 0) ? (PC + sx32(imm8)) : (PC + 1) | If the data in register A is less than zero, branch to the address PC + imm, where PC is the address of this blz instruction. |
1000 | vadd vA, vB, vC | vA <= vB + vC | Add contents of vector B with vector C, then store result in vector A. |
1001 | vsum rA1, vB (atom right only) | rA1 <= sum(vB.i with i in 0 to 3) | Sum all four 32-bit data values in vector B, then store the result in scalar register A1. |
1010 | vnand vA, vB, vC | vA <= vB nand vC | Nand contents of vector B with vector C, then store result in vector A. |
1011 | vmul vA, vB, vC | vA <= vB * vC | Multiply contents of vector B with vector C, then store result in vector A. |
1100 | vxor vA, vB, vC | vA <= vB xor vC | Xor contents of vector B with vector C, then store result in vector A. |
1101 | vlw vA, rB, imm | vA <= m128[rB + sx32(imm4)] | Load 128 bits of data from memory into vector A. The memory address is formed by adding the immediate to the data in scalar register B. |
1101 | vsw vA, rB, imm (atom right only) | vA => m128[rB + sx32(imm4)] | Store 128 bits of data from vector A into memory. The memory address is formed by adding the immediate to the data in scalar register B. |
1110 | vec vA, rB0, rC0, rB1, rC1 (atom left only) | vA.0 <= rB0, vA.1 <= rC0, vA.2 <= rB1, vA.3 <= rC1 | Read four 32-bit data values from the scalar register file (rB0, rC0, rB1, rC1) and write them into vector register A. |
1110 | vlo rA0, rA1, vB (atom left only) | rA0 <= vB.0, rA1 <= vB.1 | Read 2 32-bit data values from low scalars in vector B, write into A0, A1 scalar registers. |
1110 | vhi rA0, rA1, vB (atom left only) | rA0 <= vB.2, rA1 <= vB.3 | Read 2 32-bit data values from high scalars in vector B, write into A0, A1 scalar registers. |
1111 | jalr rA, rB | PC <= rB, rA <= PC + 1 | Branch to the address in register B. Store PC + 1 into register A, where PC is the address of this jalr instruction. |
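The Action column for the scalar arithmetic rows can be read as executable semantics. The following sketch covers only add, addi, nand, mul, and sub. It rests on several assumptions that the table does not state: results wrap around to 32 bits, writes to register 0 are discarded (the text only says register 0 is always zero), and the long-immediate form of addi is not handled. The names `sx4` and `exec_scalar` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sx4(imm):
    """Interpret a 4-bit field as a signed value in [-8, 7]."""
    imm &= 0xF
    return imm - 16 if imm & 0x8 else imm


def exec_scalar(opcode, regs, a, b, c):
    """Apply one scalar ALU atom to regs (a list of 16 ints).

    c is register C for three-register forms, or imm4 for addi.
    """
    if opcode == 0b0000:    # add
        result = regs[b] + regs[c]
    elif opcode == 0b0001:  # addi (short immediate only)
        result = regs[b] + sx4(c)
    elif opcode == 0b0010:  # nand
        result = ~(regs[b] & regs[c])
    elif opcode == 0b0011:  # mul
        result = regs[b] * regs[c]
    elif opcode == 0b0100:  # sub
        result = regs[b] - regs[c]
    else:
        raise ValueError("opcode not covered by this sketch")
    if a != 0:              # assumption: r0 stays zero
        regs[a] = result & MASK32  # assumption: wrap to 32 bits
```

Loads, stores, branches, and the vector operations are left out to keep the sketch short; they would follow the same pattern of reading operands, computing the Action expression, and writing the destination.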
All scalar addresses in the RiSC-32 architecture are word-based (i.e., memory address 0 corresponds to the first 32 bits, or four bytes, of main memory, address 1 corresponds to the second four bytes of main memory, etc.). The machine can perform two scalar memory operations per cycle.
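Word-based scalar addressing can be illustrated with a byte-array memory model. This is a sketch under assumptions: the helper names `load_word` and `store_word` are hypothetical, and the little-endian byte order within a word follows the mN(addr) notation defined earlier.

```python
def load_word(mem, word_addr):
    """Read the 32-bit word at a word address from a bytearray."""
    byte_addr = word_addr * 4  # word address 1 -> bytes 4..7
    return int.from_bytes(mem[byte_addr:byte_addr + 4], "little")


def store_word(mem, word_addr, value):
    """Write a 32-bit word at a word address into a bytearray."""
    byte_addr = word_addr * 4
    mem[byte_addr:byte_addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")
```

Storing a value at word address 1 touches only bytes 4 through 7 of the underlying array, leaving word address 0 (bytes 0 through 3) unchanged.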
All vector offsets (immediate values for VLW and VSW instructions) in the RiSC-32 architecture are also word-based, but they operate on a much larger word (i.e., an address offset of 1 corresponds to 4 words beyond the address, an address offset of 2 corresponds to 8 words beyond the address, etc.). The machine can perform two vector loads but only one vector store per cycle, so the VSW instruction is only valid on the right side: two VLW operations can be issued in the same instruction, but two VSW operations cannot.
The architecture’s 4-bit immediate values can represent numbers in the range [-8 .. 7]. Because this range is limited, the instruction set allows larger values for ADDI/VADDI and BRANCH instructions (but not for LW/SW instructions). To use an immediate outside this range in an ADDI/VADDI or BRANCH instruction, place a 0 in the instruction’s immediate field. The value 0 is chosen as the marker because it is otherwise redundant: for example, an ADDI that adds zero to a register could simply use the ADD instruction and reference register 0, which is always zero. When the immediate field of such an instruction is 0, the following 32-bit word is not an instruction but a full 32-bit immediate value. This is signaled to the assembler by appending “.l” (dot el) to an ADDI, BNE, or BLZ instruction.
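For example, the following two instructions, the first of which uses a long immediate:
addi.l r1, r2, 0x00c0ffee | nand r4, r5, r6
add r7, r8, r9 | vxor v10, v11, v12
assemble to three 32-bit words, with the long immediate occupying the word after the first instruction:
1120 2456
00c0 ffee
0789 cabc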
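A decoder has to recognize the long-immediate marker so that it does not treat the following word as an instruction. The sketch below walks a list of 32-bit words and pairs each instruction with its trailing immediate, if any. It is deliberately limited and rests on assumptions: it checks only ADDI (opcode 0001) with a zero in the low four bits of the atom, it ignores the BNE, BLZ, and VADDI cases, and it assumes at most one long immediate per instruction word. The function names are hypothetical.

```python
ADDI = 0b0001


def uses_long_immediate(word):
    """True if either atom is an ADDI whose 4-bit immediate field is 0."""
    for atom in ((word >> 16) & 0xFFFF, word & 0xFFFF):
        if (atom >> 12) == ADDI and (atom & 0xF) == 0:
            return True
    return False


def split_program(words):
    """Return a list of (instruction_word, long_immediate_or_None) pairs."""
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        imm = None
        if uses_long_immediate(word):
            imm = words[i + 1]  # the next word is data, not an instruction
            i += 1
        out.append((word, imm))
        i += 1
    return out
```

Applied to the three words above, `split_program([0x11202456, 0x00C0FFEE, 0x0789CABC])` groups the first two words into one instruction with its immediate and leaves the third as an ordinary instruction.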
The heart of the project is to create a Verilog CPU model of the RiSC-32 instruction set. The model uses single-cycle, sequential (non-pipelined) execution: in every cycle the CPU executes a single instruction, and it does not move on to the next instruction until the current instruction has completed and the program counter has been redirected to the next instruction.
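The single-cycle, non-pipelined behavior can be pictured as a simple loop in which each iteration stands for one clock cycle. The sketch below is a behavioral illustration in Python, not the Verilog model the project calls for. The `execute` callback is a hypothetical placeholder that would perform both atoms of the instruction and return the next program counter, and `max_cycles` is an added safety limit.

```python
def run(program, execute, max_cycles=1000):
    """Step a single-cycle model: one instruction word per cycle.

    execute(word, pc) is a hypothetical callback that carries out the
    instruction and returns the next PC (pc + 1 unless redirected).
    """
    pc = 0
    cycles = 0
    while 0 <= pc < len(program) and cycles < max_cycles:
        # The whole instruction completes before the PC moves on
        pc = execute(program[pc], pc)
        cycles += 1
    return pc, cycles
```

Because nothing overlaps between iterations, there are no hazards to model; the cost is that every instruction takes one full cycle regardless of how simple it is.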