mrisc32 / mc1 Goto Github PK


A computer (FPGA SoC) based on the MRISC32-A1 CPU

Home Page: https://gitlab.com/mrisc32/mc1

License: zlib License

VHDL 85.53% Verilog 1.49% Python 1.95% Assembly 1.42% Makefile 1.57% C 7.37% Tcl 0.67%

computer cpu fpga mrisc32 vhdl video-logic

mc1's Introduction

This repo has moved to: https://gitlab.com/mrisc32/mrisc32

This is an open and free 32-bit RISC/Vector instruction set architecture (ISA), primarily inspired by the Cray-1 and MIPS architectures. The focus is to create a clean, modern ISA that is equally attractive to software, hardware and compiler developers.

This repository contains LaTeX documentation and databases of architectural information (e.g. instructions and system registers).

Documentation

The latest MRISC32 Instruction Set Manual (PDF) describes the MRISC32 ISA in detail.

Overview documents:

Features

Unified scalar/vector/integer/floating-point ISA.
There are two register files:
- R0-R31: 32 scalar registers, each 32 bits wide.
  - Three registers have special meaning in hardware: Z, LR, VL.
  - 29 registers are general purpose (of which three are reserved by the ABI: SP, FP, TP).
  - All registers can be used for all types (integers, addresses and floating-point).
- V0-V31: 32 vector registers, each with at least 16 32-bit elements.
  - All registers can be used for all types (integers, addresses and floating-point).
All instructions are 32 bits wide and easy to decode.
Most instructions are non-destructive 3-operand (two sources, one destination).
All conditionals are based on register content.
- There are no condition code flags (carry, overflow, ...).
- Compare instructions generate bit masks.
- Branch instructions can act on bit masks (all bits set, all bits zero, etc) as well as signed quantities (less than zero, etc).
- Bit masks are suitable for masking in conditional operations (for scalars, vectors and packed data types).
Powerful addressing modes:
- Scaled indexed load/store (x1, x2, x4, x8).
- Gather-scatter and stride-based vector load/store.
- PC-relative and absolute load/store:
  - ±4 MiB range with one instruction.
  - Full 32-bit range with two instructions.
- PC-relative and absolute branch:
  - ±4 MiB range with one instruction.
  - Full 32-bit range with two instructions.
Many traditional floating-point operations can be handled wholly or partially by integer operations, reducing the number of necessary instructions:
- Load/store.
- Branch.
- Sign and bit manipulation (e.g. neg, abs).
Vector operations use a Cray-like model:
- Vector operations are variable length (1-N elements).
- Most integer and floating-point instructions come in both scalar and vector variants.
- Vector instructions can use both vector and scalar operands (including immediate values), which removes the overhead of transferring scalar data into vector registers.
In addition to vector operations, there are also packed operations that operate on small data types (byte and half-word).
Fixed point operations are supported:
- Single instruction multiplication of Q31, Q15 and Q7 fixed point numbers.
- Single instruction conversion between floating-point and fixed point.
- Saturating and halving addition and subtraction.
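
The flag-less conditional model in the list above (compare instructions produce bit masks, and the masks drive selection) can be sketched in Python. This is only an illustration of the idea: the function names are hypothetical and are not MRISC32 mnemonics.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def cmp_lt_signed(a: int, b: int) -> int:
    """Hypothetical compare: all bits set if a < b (signed), else all bits zero."""
    return MASK32 if to_signed32(a) < to_signed32(b) else 0

def select(mask: int, if_set: int, if_clear: int) -> int:
    """Bitwise select: take bits from if_set where mask is 1, else from if_clear."""
    return ((if_set & mask) | (if_clear & ~mask)) & MASK32

def min_signed(a: int, b: int) -> int:
    """Branch-free minimum built from a compare mask and a select."""
    return select(cmp_lt_signed(a, b), a, b)
```

Because the mask is an ordinary register value, the same select step works per element on vectors and per lane on packed data.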

Note: There is no support for 64-bit floating-point operations (that is left for a 64-bit version of the ISA).
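
As a sketch of what a single-instruction fixed-point multiplication computes, here is Q15 multiplication as it is commonly defined, written in Python. The rounding (truncation) and the saturation of the result are assumptions made for this example; the exact behavior of the MRISC32 instructions is specified in the Instruction Set Manual, not here.

```python
Q15_ONE = 1 << 15  # a Q15 value n represents n / 2**15

def to_q15(x: float) -> int:
    """Convert a float in [-1, 1) to Q15, saturating at the ends of the range."""
    return max(-32768, min(32767, int(round(x * Q15_ONE))))

def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values: full product, shift back by 15, saturate."""
    product = (a * b) >> 15
    return max(-32768, min(32767, product))
```

For example, `q15_mul(to_q15(0.5), to_q15(0.5))` yields 8192, which is 0.25 in Q15. The only product that needs saturation is -1 times -1.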

mc1's Issues

video: Fix incorrect timing (video_tb.vhd vs mc1.vhd)

There is a difference in how video_tb.vhd works compared to how mc1.vhd works. The memory read timing is likely incorrect in video_tb.vhd.

Optimize block RAM (BRAM) usage

Investigate if BRAM in a typical device can be better utilized.

For instance, the VCPP stacks only use 384 bits each, hardly filling up a single memory block (usually 9+ Kbit / block), meaning that lots of block RAM bits go to waste. Could we use MLAB instead of BRAM in Intel devices, for instance?
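
To put a number on the waste, here is a quick calculation in Python. The block size of 9 Kbit (9216 bits) is an assumption taken from the lower end of "9+ Kbit"; actual block sizes depend on the device family.

```python
STACK_BITS = 384        # from the issue text
BLOCK_BITS = 9 * 1024   # assumption: a 9 Kbit block

def bram_utilization(used_bits: int, block_bits: int) -> float:
    """Fraction of a memory block that is actually used."""
    return used_bits / block_bits

def stacks_per_block(used_bits: int, block_bits: int) -> int:
    """How many such stacks would fit into one block if they could share it."""
    return block_bits // used_bits

print(f"{bram_utilization(STACK_BITS, BLOCK_BITS):.1%} of a block used")
print(f"{stacks_per_block(STACK_BITS, BLOCK_BITS)} stacks would fit in one block")
```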

DE0-CV: Add SDRAM support and hook in as external memory (XRAM)

MMIO: Add more I/O registers

At least cover the most important I/O ports for DE0-CV:

microSD
PS/2
GPIO

vcr: Add a BGCOL register

The BGCOL register would be a 24-bit RGB color that is displayed when no pixels are showing (i.e. outside HSTRT/HSTOP).

As an extension the pixels could be alpha blended with the background color.
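
A minimal Python sketch of the blending extension, assuming 8-bit channels and a plain "source over background" blend. The function names and the tuple representation are hypothetical; nothing here reflects an existing MC1 register layout.

```python
def blend_channel(fg: int, bg: int, alpha: int) -> int:
    """Blend one 8-bit channel; alpha=255 shows only fg, alpha=0 only bg."""
    return (fg * alpha + bg * (255 - alpha) + 127) // 255

def blend_over_bgcol(pixel_rgba, bgcol_rgb):
    """Blend an (r, g, b, a) pixel over a 24-bit (r, g, b) background color."""
    r, g, b, a = pixel_rgba
    return tuple(blend_channel(fg, bg, a) for fg, bg in zip((r, g, b), bgcol_rgb))
```

In hardware the division by 255 would typically be replaced by a cheaper approximation, which is a design choice left open here.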

MMIO: Make output regs r/w

ROM: Infer block ROM

According to Altera: Recommended HDL Coding Styles (page 6-30), this should infer synchronous ROM:

LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY sync_rom IS
  PORT (
    clock: IN STD_LOGIC;
    address: IN STD_LOGIC_VECTOR(7 downto 0);
    data_out: OUT STD_LOGIC_VECTOR(5 downto 0)
  );
END sync_rom;

ARCHITECTURE rtl OF sync_rom IS
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge (clock) THEN
      CASE address IS
        WHEN "00000000" => data_out <= "101111";
        WHEN "00000001" => data_out <= "110110";
        ...
        WHEN "11111110" => data_out <= "000001";
        WHEN "11111111" => data_out <= "101010";
        WHEN OTHERS => data_out <= "101111";
      END CASE;
    END IF;
  END PROCESS;
END rtl;

vid_pix_prefetch: Properly prefetch start of row and negative x-increments

Create a VCP assembler

The VCP assembler should accept a simple assembly syntax, e.g. similar to:

    ; Video registers
    .set    ADDR, 0
    .set    XOFFS, 1
    .set    XINCR, 2
    .set    HSTRT, 3
    .set    HSTOP, 4
    .set    CMODE, 5

    ; CMODE constants
    .set    CM_RGBA8888, 0
    .set    CM_RGBA5551, 1
    .set    CM_PAL8, 2
    .set    CM_PAL4, 3
    .set    CM_PAL2, 4
    .set    CM_PAL1, 5

    ; Set the program start address
    .org    0x000100

main:
    ; Display nothing
    setreg  HSTRT, 0
    setreg  HSTOP, 0

    ; Set the video mode
    setreg  XOFFS, 0x000000
    setreg  XINCR, 0x008000   ; 640 pixels/row
    setreg  CMODE, CM_PAL8

    ; Set the palette
    jsr     load_palette_a

    ; Activate video output starting at row 0.
    wait    0
    setreg  HSTOP, 1280

    ; Generate video addresses for all rows.
    .set    row, 0
    .set    row_addr, 0x001000
    .rept   360
      wait    row
      setreg  ADDR, row_addr
      .add    row, 2
      .add    row_addr, 160   ; Row stride
    .endr

    ; End of program
    wait    32767

load_palette_a:
    ; Load a palette with 256 colors.
    setpal  0, 255
    .word   0xff00ff00
    .lerp   0x01010101, 0xffffffff, 255
    rts

Syntax rules should be obvious from the above example (anything fancier should be unnecessary, to start with).
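
As an illustration of what the .rept block in the example is expected to expand to, here is a small Python sketch. It models only that one loop (with the symbols row and row_addr updated by .add) and is not a description of how an assembler would implement .rept in general.

```python
def expand_rows(rows=360, row_step=2, base_addr=0x001000, stride=160):
    """Emit the wait/setreg pairs that the .rept block in the example stands for."""
    lines = []
    row, row_addr = 0, base_addr
    for _ in range(rows):
        lines.append(f"wait    {row}")
        lines.append(f"setreg  ADDR, 0x{row_addr:06x}")
        row += row_step        # .add row, 2
        row_addr += stride     # .add row_addr, 160
    return lines

for line in expand_rows()[:4]:
    print(line)
```

With 360 repetitions and a row step of 2, the last wait lands on row 718.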

The following directives would be useful:

.org - set the program location
.rept/.endr - repeat section
.set - define symbol
.add - add a value to a symbol (redundant if .set supports mathematical expressions)
.lerp - generate a linear RGBA8888 gradient (linear interpolation between two end-values)
.word - insert one or more 32-bit data elements into the stream
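
A possible reading of the .lerp directive, sketched in Python: interpolate each 8-bit channel independently and emit the requested number of 32-bit words. Whether both end values are included in the output is an assumption of this sketch (here they are).

```python
def lerp_rgba8888(first: int, last: int, count: int) -> list:
    """Generate `count` RGBA8888 words going linearly from `first` to `last`."""
    steps = max(count - 1, 1)
    words = []
    for i in range(count):
        word = 0
        for shift in (0, 8, 16, 24):
            a = (first >> shift) & 0xFF
            b = (last >> shift) & 0xFF
            # Round to nearest; floor division also handles descending gradients.
            c = a + ((b - a) * i + steps // 2) // steps
            word |= c << shift
        words.append(word)
    return words

palette_tail = lerp_rgba8888(0x01010101, 0xFFFFFFFF, 255)
```

Under that assumption, the .word plus .lerp lines in the example add up to the 256 entries that setpal 0, 255 expects.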

The output from the assembler should be one of the following:

A GNU assembler compatible source file, suitable for inclusion (.inc).
A raw binary file.

It would be useful for a CPU (MRISC32) program to get access to some of the VCP-symbols, e.g. to determine the location of palette data. This could be handled by a .global directive that adds a GNU assembler symbol with the value of the given VCP symbol, for instance.

libc: Add a simple version of printf()

In its simplest form, printf(const char* str, ...) will:

Support up to N variable arguments (e.g. N=7, which fits into registers).
Iterate str to find %d, %h and %s:
- Copy the substring up to the % and call vcon_print() (use a static buffer, or malloc?).
  - Better yet: Introduce vcon_printn() that takes a length parameter.
- Depending on the type (d, h or s), call vcon_print_dec(), vcon_printf_hex() or vcon_print() with the correct argument.

It does not have to be a full printf() implementation, and we do not need to support sprintf() etc. Just have it around for convenience in C land.
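
The control flow described above can be modelled in a few lines of Python before writing the C version. The `emit` callback stands in for the vcon_print family of functions, and the 8-digit hexadecimal format for %h is an assumption of this sketch.

```python
MAX_ARGS = 7  # example limit from the text (arguments that fit into registers)

def mini_printf(emit, fmt: str, *args) -> None:
    """Scan fmt for %d, %h and %s; emit literal runs and formatted arguments."""
    if len(args) > MAX_ARGS:
        raise ValueError("too many arguments")
    arg_iter = iter(args)
    start = i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt) and fmt[i + 1] in "dhs":
            if i > start:
                emit(fmt[start:i])          # literal text: the vcon_printn() case
            value = next(arg_iter)
            kind = fmt[i + 1]
            if kind == "d":
                emit(str(value))            # decimal
            elif kind == "h":
                emit(format(value & 0xFFFFFFFF, "08x"))  # hex (assumed width)
            else:
                emit(value)                 # string
            i += 2
            start = i
        else:
            i += 1
    if start < len(fmt):
        emit(fmt[start:])                   # trailing literal text

chunks = []
mini_printf(chunks.append, "x=%d y=%h %s!", 42, 255, "ok")
print("".join(chunks))
```

Emitting substrings by start index and length, rather than copying them, is what makes a length-taking print function attractive: no static buffer or malloc is needed.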

ROM: Add a minimal video console

...for text output (and possibly input in the future). Printf debugging, here we go!

Improve the MMIO registers

Set up R/W MMIO register and map them to useful I/O, both external board I/O and internal machine I/O. These will be different for different platforms, so make it possible to map them to different things but try to keep the register names/locations as portable as possible.

Example board I/O:

LED:s [out] - One 32-bit register?
7-16 segment display [out] - One register per char x 8 chars?
Switches [in] - One 32-bit register?
Buttons [in] - One 32-bit register?
PS/2 [in] - One 32-bit register?
microSD [in/out] - Two 32-bit registers?
GPIO [in/out] - Many 32-bit registers?

Example internal I/O:

CPU clock frequency in Hz [in] - One 32-bit register.
Memory size [in] - One 32-bit register per memory type (VRAM, DRAM?).
Native video resolution [in] - Three registers: width, height, screen refresh rate.
Video frame number (free running counter) [in] - One 32-bit register.
Clock counter (free running counter, e.g. µs resolution or CPU clock ticks) [in] - One or two 32-bit registers.

video: Add simple 1-word caches to reduce VRAM traffic

The pixel pipeline often reads the same word over and over, especially in palette mode and low resolutions (e.g. 320x180 8bpp => 16 reads of each word, 1280x720 1bpp => 32 reads). Add a simple 1-word cache, or better yet a 2-word cache with prefetch.

Similarly the VCPP reads the same word over and over during WAIT commands. Add a 1-word cache.
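
The read counts quoted above follow from the word size and the horizontal scaling, and a 1-word cache collapses each run of identical addresses into a single VRAM read. A small Python model, assuming a native width of 1280 pixels and 32-bit VRAM words:

```python
def reads_per_word(native_width: int, fb_width: int, bits_per_pixel: int,
                   word_bits: int = 32) -> int:
    """Number of consecutive output pixels that are fetched from the same word."""
    pixels_per_word = word_bits // bits_per_pixel
    horizontal_repeat = native_width // fb_width
    return pixels_per_word * horizontal_repeat

def vram_reads_with_cache(word_addresses) -> int:
    """Count VRAM reads when a 1-word cache remembers the last address read."""
    reads, cached = 0, None
    for addr in word_addresses:
        if addr != cached:
            reads += 1
            cached = addr
    return reads

# One row of 320x180 at 8 bpp: 1280 output pixels, 16 per word.
row = [x // reads_per_word(1280, 320, 8) for x in range(1280)]
print(len(row), "reads without cache,", vram_reads_with_cache(row), "with cache")
```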

The goal is to free up VRAM bandwidth to enable more things to be connected to the VRAM read port, in particular:

A second video pipeline for two-layer graphics.
Audio DMA.

VCP: Fix the example program (it's no longer correct)

PS/2: Don't forward device->host commands as keyboard scan codes

vcpp: Glitch when coming out from WAIT while VRAM is busy

The VCPP can feed incorrect data to the ID/EX stage (only for SETPAL commands?) when it comes out of a WAITX instruction and competes with the pixel pipeline for the VRAM.

Testcase

dual-gradients.vcp

Expected

The "lines" should be blue:ish.

Actual

After a certain row the lines become pink (i.e. the incorrect palette index gets set to the pink color).

PS/2: Translate PS/2 codes to MC1-internal (more sane) scan codes

The PS/2 key codes do not really make any sense from a software perspective (e.g. alphanumeric characters are not encoded continuously in the key code space). Also, if we want to support other input devices (e.g. USB), we need a common, virtual encoding anyway.

For instance, consider using the GLFW key code encoding.
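
A sketch of such a translation layer in Python, assuming PS/2 scan code set 2 on the input side and GLFW key code values on the output side. Only a handful of keys are listed, and extended (0xE0-prefixed) codes are deliberately not handled.

```python
KEY_UNKNOWN = -1  # same value as GLFW_KEY_UNKNOWN

# PS/2 set 2 make code -> GLFW key code (partial table, for illustration)
PS2_TO_KEY = {
    0x1C: 65,   # A
    0x32: 66,   # B
    0x21: 67,   # C
    0x29: 32,   # Space
    0x5A: 257,  # Enter
    0x76: 256,  # Escape
}

def decode(stream):
    """Turn a PS/2 byte stream into (key_code, pressed) events."""
    events = []
    released = False
    for byte in stream:
        if byte == 0xF0:          # break prefix: the next code is a key release
            released = True
            continue
        events.append((PS2_TO_KEY.get(byte, KEY_UNKNOWN), not released))
        released = False
    return events
```

Note that the letters A, B and C are scattered in the PS/2 code space but contiguous after translation.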

vcpp: One instruction may be lost after a WAIT instruction

We currently have to insert a single NOP after each WAIT instruction, since the instruction directly after the WAIT instruction may be lost (depending on the pipeline/scheduling situation):

    wait    123
    nop             ; <- SHOULD NOT BE NEEDED!
    setreg  0, 0x004567

VCPP: Add a jump-to-subroutine instruction

A jump-to-subroutine instruction could be useful for several things, but two very important use cases are:

Allow the first program instruction, which is located at a fixed memory address, to be a branch to the real VCP location, anywhere in the video RAM.
Re-use large:ish program parts. This is most useful for alternating between two or more palettes, for instance (e.g. every second line is a darker version of the main palette).

We'd need an internal call stack, but it should be sufficient with a very small stack (e.g. 8 entries). The return instruction can be encoded as a jump to address zero (which is something you'd never want to do).
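
The proposed semantics can be tried out with a toy interpreter in Python. The instruction tuples, the "emit" and "halt" operations and the error handling are all made up for this sketch; only the 8-entry stack and the "jump to address zero means return" encoding come from the text above.

```python
STACK_DEPTH = 8

def run(program, pc=0, max_steps=1000):
    """Execute (op, arg) tuples; 'jsr 0' is treated as return-from-subroutine."""
    stack, trace = [], []
    for _ in range(max_steps):
        op, arg = program[pc]
        if op == "jsr":
            if arg == 0:                      # return: pop the saved address
                if not stack:
                    raise RuntimeError("return with empty call stack")
                pc = stack.pop()
            else:                             # call: save the return address
                if len(stack) == STACK_DEPTH:
                    raise OverflowError("call stack full")
                stack.append(pc + 1)
                pc = arg
        elif op == "emit":
            trace.append(arg)
            pc += 1
        elif op == "halt":
            return trace
        else:
            raise ValueError(f"unknown op {op!r}")
    raise RuntimeError("step limit reached")

program = [
    ("jsr", 2),        # 0: fixed entry point, branch to the real program
    ("halt", None),    # 1
    ("emit", "main"),  # 2
    ("jsr", 6),        # 3: call the shared part
    ("emit", "back"),  # 4
    ("jsr", 0),        # 5: return to address 1
    ("emit", "sub"),   # 6
    ("jsr", 0),        # 7: return to address 4
]
print(run(program))
```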

vcpas: Implement expression evaluation

Much more interesting test VCP:s could be written if vcpas supported mathematical expressions.

video: Missing pixels (first 32-bit word) at the start of a framebuffer

Testcase

test-image-640x360-pal8.vcp

Expected

All framebuffer pixels should be shown.

Actual

The first four pixels (i.e. an 8x1 rectangle in 1280x720 resolution) are missing. This is the first 32-bit word that is read by the pixel pipeline.

synchronizer (CDC): Add protection against partial bus updates

See: https://github.com/damofthemoon/cdc/blob/master/doc/data_bus_synchronizer.md

Idea: Add a value change detector, and hold on to the old bus value until a few cycles have passed after a value change event.
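
A behavioral model of this idea in Python (the real implementation would be VHDL). The class name and the choice of "number of consecutive stable samples" as the hold criterion are assumptions of this sketch, not taken from the linked document.

```python
class StableBusSync:
    """Hold the old bus value until the sampled input has been stable for N cycles."""

    def __init__(self, stable_cycles: int = 2, initial: int = 0):
        self.stable_cycles = stable_cycles
        self.prev = initial           # last sampled input value
        self.count = stable_cycles    # cycles since the last change (saturating)
        self.out = initial            # value presented to the destination domain

    def clock(self, sampled: int) -> int:
        """Advance one destination clock cycle with the given sampled bus value."""
        if sampled != self.prev:      # change detected: restart the hold timer
            self.prev = sampled
            self.count = 0
        elif self.count < self.stable_cycles:
            self.count += 1
        if self.count >= self.stable_cycles:
            self.out = self.prev      # input has settled, accept the new value
        return self.out

sync = StableBusSync(stable_cycles=2)
# 0x0F models a partially updated bus on its way from 0x00 to 0xFF.
outputs = [sync.clock(v) for v in (0x00, 0x0F, 0xFF, 0xFF, 0xFF, 0xFF)]
print([hex(v) for v in outputs])
```

The partially updated value never reaches the output, at the cost of a few cycles of extra latency.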

MMIO: Add raster x and y position input regs

video: Improve dithering

After the final color stage, add dithering with a configurable (compile time) target bit resolution.

E.g. if the hardware has 12-bit RGB output (4 bits per component), the lower 4 bits of each component will be dropped. Therefore we'll add dithering in the range [0, 2^4-1] (i.e. [0, 15]).

Design ideas:

It should be possible to disable dithering completely at compile time (e.g. if full 24-bit video output is available).
It should be possible to configure the dithering method at run time via a VCR.
Implement a white noise ditherer (mostly for reference).
Implement a simple error diffusion algorithm, e.g. one-dimensional or two-dimensional error diffusion (the latter would require a dual-ported line buffer).
Add a noisy start value per line (may be useful for reducing one-dimensional error diffusion artifacts).
Add a noisy start value per frame (should give temporal dithering which may hide dithering patterns, but care must be taken not to introduce flickering).
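
Two of the ideas above, sketched in Python for a single 8-bit color component with the lower 4 bits dropped. The function names are hypothetical, and clamping to 255 before truncation is one possible way of avoiding overflow.

```python
import random

def dither_white_noise(value: int, drop_bits: int = 4, rng=random) -> int:
    """Add uniform noise in [0, 2**drop_bits - 1], then drop the low bits."""
    noise = rng.randrange(1 << drop_bits)
    return min(255, value + noise) >> drop_bits

def dither_row_error_diffusion(row, drop_bits: int = 4, start_error: int = 0):
    """One-dimensional error diffusion: carry the dropped bits to the next pixel."""
    mask = (1 << drop_bits) - 1
    out, err = [], start_error
    for value in row:
        total = min(255, value + err)
        out.append(total >> drop_bits)
        err = total & mask            # the part lost by truncation
    return out

print(dither_row_error_diffusion([8, 8, 8, 8]))
```

The `start_error` parameter is where a per-line or per-frame noisy start value would be injected.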

Introduce I/O buffer memory areas

Many I/O tasks require buffers (e.g. DMA, microSD I/O, keyboard FIFO, etc). These can be mapped into the MMIO memory area, but should use a RAM model rather than the current MMIO registers.

Decide whether to use a common RAM (e.g. BRAM) for all I/O areas, or independent RAM areas for each I/O device (the latter would eliminate bus contention, but requires more MUX logic on the CPU memory bus side, and possibly more BRAM blocks).

video: Add a second video layer

The second video layer will be a copy of the current video pipeline (so we get two VCPP:s, two sets of registers, two palettes and two pixel pipelines).

Use alpha blending (configurable) to mix the two layers.

Depends on #3

vid_create_fb_vcp() - Create a VCP for a regular framebuffer (width, height, cmode, ptr). Return VCP base address and address to the palette.
vid_active_vcp() - Set the active VCP
vid_wait_vblank() - Wait for vertical blank
