fgpu's Introduction

FGPU is a soft GPU-like architecture for FPGAs. It is described in VHDL, fully customizable, and can be programmed using OpenCL.

FGPU is currently being developed and maintained by the Chair of Computer Engineering at the Brandenburg University of Technology Cottbus-Senftenberg, in Germany. It was originally developed by Muhammed Al Kadi from the Ruhr University Bochum, in Germany. Also, a key collaborator has been the Electronic Systems Research Group at the University of Pisa.

Contents and Structure of the FGPU Repository

This repository contains the following resources:

  • An RTL description of the FGPU architecture, in VHDL, which can be used for behavioral simulation and FPGA-targeted implementation -- see the RTL folder.
  • An LLVM-based FGPU compiler -- see the compiler folder.
  • Files for running behavioral simulation in Mentor ModelSim -- see the project_modelsim folder.
  • Files for setting up simulation and implementation projects in Xilinx Vivado.
  • Pre-generated bitstreams to quickly test FGPU applications in the Zynq-7000 ZC706, skipping the HW generation step -- see the bitstreams folder.
  • Examples of OpenCL kernels for execution in the FGPU -- see the kernels folder.
  • Examples of complete benchmarks for execution in an ARM+FGPU processing system configured using the Vivado SDK -- see the benchmark folder.
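
As a rough illustration of the kind of kernel the kernels folder is meant to hold, the sketch below shows an element-wise vector addition written in plain C, so it can be compiled and run on a host without FGPU hardware. The names `vec_add_kernel` and `run_ndrange` are hypothetical and not taken from the repository; the `global_id` parameter stands in for what OpenCL's `get_global_id(0)` would return inside a real kernel.

```c
#include <stddef.h>

/* Hypothetical kernel body: one work-item adds one pair of elements.
 * In OpenCL C this would be a __kernel function, and global_id would
 * be obtained from get_global_id(0) instead of being a parameter. */
static void vec_add_kernel(size_t global_id,
                           const int *a, const int *b, int *out)
{
    out[global_id] = a[global_id] + b[global_id];
}

/* Hypothetical host-side stand-in for launching a 1D index space:
 * it calls the kernel once per work-item, sequentially. */
static void run_ndrange(size_t global_size,
                        const int *a, const int *b, int *out)
{
    for (size_t id = 0; id < global_size; ++id) {
        vec_add_kernel(id, a, b, out);
    }
}
```

On real hardware the work-items would run in parallel across the compute units; the sequential loop here only reproduces the per-index behavior.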

FGPU Quick Start

The following figure illustrates the HW and SW flow for setting up the FGPU hardware and compiling applications.

Overview of the FGPU Framework.

Setting up the FGPU LLVM-based compiler

The LLVM-based compiler generates, from an OpenCL kernel description, the binaries containing the FGPU instructions that implement the kernel. To ensure portability, the FGPU compiler is built inside a Docker container. See the instructions in the compiler README.

Setting up the Vivado SW/HW Development Environment

The Xilinx Vivado platform is used to generate the hardware implementation from the set of RTL files, as well as the libraries that the main processing system (an ARM core, in the case of the Zynq-7000 ZC706 board) needs to communicate with the FGPU. These libraries must be linked with the binaries generated in the compilation stage to enable the processing system to send commands to the FGPU and receive results from it. Detailed instructions for this process can be found in the README for the Vivado flow.

Limitations

Supported Platforms / Tools

  • The current version of the repository has been tested in both Windows and Linux.
  • The implementation was tested in Xilinx Vivado v2017.2, v2019.2, v2020.2, and v2021.1.
  • The simulation was tested in Mentor ModelSim 2020.1.
  • The VHDL code uses some VHDL-2008 constructs, which may be unsupported in some tools.
  • The design is easily portable to other Zynq devices. For the UltraScale+ architecture, check the ultrascale branch.

Known Bugs

Hardware

  • A branch inside a conditionally called procedure does not work. In other words, if a branch has been taken and a function is then called whose code also branches, a problem will occur.
    • Reason: A branch writes the current top entry of the divergence stack (in the CU scheduler) and creates a new entry on top. However, the overwritten entry is needed when the function returns.
    • Solution: When a function is about to be called, the wavefront active record (which refers to the top of the divergence stack) needs to be incremented. Have a look at the defined but commented-out PC_stack_dummy_entry in CU_scheduler.vhd.
    • How to reproduce: Use the LUdecompose kernel. Perform a soft floating-point operation inside an if statement. You need data without NaN or inf.
  • When the bus width for read data from the cache to the CUs is smaller than #CUs * 32 bits, there may be problems. The BRAM that stores read data from the cache in each CU may overfill and be overwritten.
  • Changing the address of an atomic operation within the same kernel has never been tested. A test application is needed.
  • Setting the number of cache banks less than or equal to the number of AXI interface banks (CACHE_N_BANKS_W <= GMEM_N_BANKS_W) may lead to a bottleneck.
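
The two configuration-related items in the list above can be expressed as simple checks. The following is a minimal sketch in C, assuming the relevant values are available as plain integers. `CACHE_N_BANKS_W` and `GMEM_N_BANKS_W` are named in this README; the struct, the function names, and the fields `n_cus` and `rd_bus_width_bits` are hypothetical and do not correspond to identifiers in the RTL.

```c
#include <stdbool.h>

/* Hypothetical container for the configuration values discussed above. */
struct fgpu_config {
    unsigned n_cus;             /* number of compute units (assumed name) */
    unsigned rd_bus_width_bits; /* cache-to-CU read data width in bits (assumed name) */
    unsigned cache_n_banks_w;   /* value of CACHE_N_BANKS_W */
    unsigned gmem_n_banks_w;    /* value of GMEM_N_BANKS_W */
};

/* True when the read bus is narrower than #CUs * 32 bits, the case in
 * which the per-CU read-data BRAM is reported to risk being overwritten. */
static bool read_bus_may_overfill(const struct fgpu_config *c)
{
    return c->rd_bus_width_bits < c->n_cus * 32u;
}

/* True when CACHE_N_BANKS_W <= GMEM_N_BANKS_W, the case reported as a
 * possible bottleneck. */
static bool cache_banks_may_bottleneck(const struct fgpu_config *c)
{
    return c->cache_n_banks_w <= c->gmem_n_banks_w;
}
```

Such checks would only flag configurations that match the conditions listed here; they say nothing about whether a given configuration actually fails.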

Software / Compiler

  • Using sections other than .text is not yet supported.
    • How to reproduce: Define the coefficients of a 5x5 image filter as a constant 2D array. This data will be stored in the .rodata section.
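
The pattern described above looks roughly like the following in plain C. A file-scope constant 2D array is typically placed by the compiler in a read-only data section such as .rodata, which is the case reported as unsupported. The second function shows one possible way to sidestep it, passing the coefficients in through a pointer argument; this workaround is an assumption and is not documented in the repository. All names are hypothetical.

```c
/* Pattern that triggers the limitation: a constant 5x5 coefficient table
 * at file scope, which compilers typically emit into .rodata. */
static const int FILTER_5X5[5][5] = {
    {1, 1, 1, 1, 1},
    {1, 1, 1, 1, 1},
    {1, 1, 1, 1, 1},
    {1, 1, 1, 1, 1},
    {1, 1, 1, 1, 1},
};

/* Computes one output pixel from the constant table. The caller must
 * keep (x, y) at least 2 pixels away from every image border. */
static int filter_pixel_const(const int *img, int width, int x, int y)
{
    int acc = 0;
    for (int dy = -2; dy <= 2; ++dy) {
        for (int dx = -2; dx <= 2; ++dx) {
            acc += FILTER_5X5[dy + 2][dx + 2] * img[(y + dy) * width + (x + dx)];
        }
    }
    return acc;
}

/* Possible workaround (assumption): the 25 coefficients arrive through a
 * pointer argument in row-major order, so no constant table is needed. */
static int filter_pixel_arg(const int *img, int width, int x, int y,
                            const int *coeffs)
{
    int acc = 0;
    for (int dy = -2; dy <= 2; ++dy) {
        for (int dx = -2; dx <= 2; ++dx) {
            acc += coeffs[(dy + 2) * 5 + (dx + 2)] * img[(y + dy) * width + (x + dx)];
        }
    }
    return acc;
}
```

Both functions compute the same result when `coeffs` holds the same values as the table; the difference is only in where the coefficients are stored.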

Collaborators

Special thanks to the Department of Information Engineering, University of Pisa for collaborating in making the FGPU more robust.

fgpu's Issues

any Zynq board

Will this project run on any Zynq chip board without changing anything?

FGPU Status control register never changed after executing the kernel

Hello @mbrandalero @beja65536 ,

I am trying to run FGPU on a KC705 through the XDMA PCIe interface. I have successfully downloaded the bitstream to the board, and I correctly send the FGPU instructions to the cram section and the kernel parameters to the lram section. However, when I start the kernel, execution never ends, so the status register always stays at 0. Could anyone give me some guidance on how to find the source of the problem?

Thanks in advance for your time.

How to exclude FPU units in the project

Hello,

I am trying to port the FGPU core to a KC705 platform and exclude the floating point units (FPU). Could you please help me figure out how to exclude the FPU?

Thanks in advance for your time.

Soft-core GPU micro architecture

Hello,

I was wondering about the microarchitecture of the GPU that is implemented in the repo. I cannot find any reference about that in the README file. Is it little-endian or big-endian?

Thank you for your time.
