
halide-to-hardware's People

Contributors

aam, aankit-ca, abadams, abestephensg, alexreinking, chtsao8, connellybarnes, dillonhuff, dsharlet-intel, dsharletg, gednyengs, jeffsetter, jingpu, joyliu37, jrk, kernhanda, kgnk, kongty, kree-colemcalughlin, matthiaskramm, nickchornay, norabarlow, pranavb-ca, psuriana, rodrigob, ronen, shoaibkamil, steven-johnson, tdenniston, vksnk

halide-to-hardware's Issues

Accumulation support

  • Add multiple input streams to unified buffer
  • Extract multiple input streams for accumulation
  • Connect unified buffer correctly for accumulation
  • Generate UNet with accumulation
  • Test UNet with CoreIR

Camera pipeline compute is malformed?

@jeffsetter I cannot get the coreir backend to load the current camera pipeline compute:

When I check the file at the command line I get:

./coreir/bin/coreir --load_libs commonlib --input ./coreir_compute/camera_pipeline_compute.json --output camera_pipeline_compute.v --passes rungenerators;flattentypes;verilog 
ERROR: {hcompute_curved_stencil}.curve$1.clk Is not fully connected (N)
{hcompute_curved_stencil}.curve$1 Is not fully connected (R)


ERROR: {hcompute_curved_stencil}.curve$1.clk Is not fully connected (N)
{hcompute_curved_stencil}.curve$1 Is not fully connected (R)


I AM DYING!

Any idea what's going wrong here?

Can't build examples in branch cleanup_codegen: link failure with llvm::zlib?

@jeffsetter I pulled and built cleanup_codegen. When I try to build the code in apps/hardware_benchmarks/apps/harris, I get this:

bash-3.2$ make design-vhls
c++ -std=c++11 -I ../../../../include/ -I ../../../../tools/ -fvisibility=hidden -I ../../../../../coreir/include -L../../../../../coreir/lib -Wl,-rpath,../../../../../coreir/lib -g -fno-rtti harris_generator.cpp ../../../../lib/libHalide.a ../../../../tools/GenGen.cpp -o bin/harris.generator  -ldl -lpthread -lz -lcurses -L../../../../../coreir/lib -lcoreir-commonlib -lcoreir -lcoreirsim -lcoreir-float 
Undefined symbols for architecture x86_64:
  "llvm::zlib::uncompress(llvm::StringRef, llvm::SmallVectorImpl<char>&, unsigned long)", referenced from:
      llvm::readPGOFuncNameStrings(llvm::StringRef, llvm::InstrProfSymtab&) in libHalide.a(llvm_377_InstrProf.cpp.o)
  "llvm::zlib::isAvailable()", referenced from:
      llvm::collectPGOFuncNameStrings(llvm::ArrayRef<llvm::GlobalVariable*>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, bool) in libHalide.a(llvm_377_InstrProf.cpp.o)
      llvm::readPGOFuncNameStrings(llvm::StringRef, llvm::InstrProfSymtab&) in libHalide.a(llvm_377_InstrProf.cpp.o)
  "llvm::zlib::compress(llvm::StringRef, llvm::SmallVectorImpl<char>&, llvm::zlib::CompressionLevel)", referenced from:
      llvm::collectPGOFuncNameStrings(llvm::ArrayRef<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&) in libHalide.a(llvm_377_InstrProf.cpp.o)
      (anonymous namespace)::ELFWriter::writeObject(llvm::MCAssembler&, llvm::MCAsmLayout const&) in libHalide.a(llvm_1089_ELFObjectWriter.cpp.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [bin/harris.generator] Error 1

Any idea what is wrong?

Negative const syntax error in harris compute

When I try to generate test Verilog for harris, I get the following error:

cmd: ${COREIR_PATH}/bin/coreir --load_libs commonlib --input harris.json --output harris.v
cmd: verilator -Wall --cc harris.v --exe --build harris_verilog_tb.cpp --top-module harris -Wno-lint
%Error: harris.v:45542:8: syntax error, unexpected const, expecting IDENTIFIER
45542 | ) const-255__383 (
      |        ^
%Error: harris.v:45701:8: syntax error, unexpected const, expecting IDENTIFIER
45701 | ) const-255__283 (
      |        ^
%Error: Exiting due to 2 error(s)

This seems to come from the harris compute file: https://github.com/dillonhuff/clockwork/blob/b4dc132aa141f610421159bccf8ae21b8ae353ce/coreir_compute/harris_compute.json#L239
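
This looks like a naming problem rather than a logic problem. A Verilog simple identifier may contain only letters, digits, underscores and $, so const-255__383 appears to be lexed as the token const (a SystemVerilog keyword that Verilator recognizes) followed by -255__383. Two standard ways around this are an escaped identifier (\const-255__383 followed by whitespace) or rewriting the name before emitting Verilog. The sketch below shows the second option; sanitize_verilog_identifier is a hypothetical helper, not an existing CoreIR function.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not part of CoreIR): map an arbitrary instance name
// onto a legal Verilog-2001 simple identifier. Legal characters are
// [A-Za-z0-9_$], and the first character may not be a digit or '$'.
std::string sanitize_verilog_identifier(const std::string& name) {
  std::string out;
  out.reserve(name.size() + 5);
  for (char ch : name) {
    unsigned char c = static_cast<unsigned char>(ch);
    if (std::isalnum(c) || ch == '_' || ch == '$') {
      out += ch;
    } else if (ch == '-') {
      out += "_neg_";  // keep the sign visible instead of silently dropping it
    } else {
      out += '_';      // any other illegal character becomes '_'
    }
  }
  if (out.empty() || std::isdigit(static_cast<unsigned char>(out[0])) || out[0] == '$') {
    out.insert(out.begin(), '_');  // identifiers must start with a letter or '_'
  }
  return out;
}
```

With this mapping, const-255__383 becomes const_neg_255__383, which the lexer reads as a single identifier. A real fix would also need to keep renamed instances unique.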

Building Unified Buffer Library in Vivado HLS

  • Improve the Unified Buffer Library with template support up to 3D, in Jing's line buffer style
  • Add a multi-channel convolution test case that uses a circular line buffer as a real unified buffer
  • Line buffer library: support address streams
  • Address Generator
  • Bank selector: support different I/O ports
  • Psum Buffer with update state
  • VGGnet Pass
  • Mobilenet Pass
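
The circular line buffer in the test-case item can be modeled in a few lines of C++. This is a hypothetical software model (CircularLineBuffer is an illustrative name, not taken from the HLS library). It keeps only the last K rows of a W-wide image and maps logical row r to physical slot r mod K, the same storage-folding idea used for a circular unified buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical software model of a circular line buffer (illustration only).
// Keeps the last K rows of a width-W image; logical row r lives in physical
// slot r % K, so storage is K*W values instead of H*W.
template <typename T, std::size_t K>
class CircularLineBuffer {
 public:
  using Window = std::array<std::array<T, K>, K>;

  explicit CircularLineBuffer(std::size_t width) : width_(width), mem_(K * width) {}

  // Push one pixel in raster order (x fastest, then y).
  void push(T v) {
    mem_[(row_ % K) * width_ + col_] = v;
    if (++col_ == width_) {
      col_ = 0;
      ++row_;
    }
  }

  // True once at least K complete rows have been pushed.
  bool rows_ready() const { return row_ >= K; }

  // KxK window over the K most recent complete rows, starting at column x.
  // Only valid at a row boundary, because the next row overwrites the
  // oldest slot as its pixels arrive.
  Window window(std::size_t x) const {
    assert(rows_ready() && col_ == 0 && x + K <= width_);
    Window w{};
    for (std::size_t dy = 0; dy < K; ++dy) {
      std::size_t slot = (row_ - K + dy) % K;  // logical row -> physical slot
      for (std::size_t dx = 0; dx < K; ++dx) w[dy][dx] = mem_[slot * width_ + x + dx];
    }
    return w;
  }

 private:
  std::size_t width_;
  std::vector<T> mem_;
  std::size_t row_ = 0;  // number of complete rows pushed so far
  std::size_t col_ = 0;  // column of the next pixel
};
```

A hardware line buffer would typically emit windows as pixels stream in rather than only at row boundaries. This model trades that for a simpler read path.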

cascade_compute.h is out of date?

@jeffsetter when I run C++ simulation for cascade (from example_progs) I get the following compile error on the generated C++ code:

cmd: g++ -fstack-protector-all -std=c++11 regression_tb_unoptimized_cascade.cpp unoptimized_cascade.cpp
unoptimized_cascade.cpp: In function ‘void op_hcompute_hw_input_global_wrapper_stencil(HWStream<hw_uint<16> >&, hw_input_global_wrapper_stencil_cache&, int, int, int)’:
unoptimized_cascade.cpp:1397:24: error: ‘hcompute_hw_input_global_wrapper_stencil’ was not declared in this scope
  auto compute_result = hcompute_hw_input_global_wrapper_stencil(hw_input_stencil_hw_input_global_wrapper_s0_y_c__hw_input_global_wrapper_s0_x_value);
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
unoptimized_cascade.cpp:1397:24: note: suggested alternative: ‘op_hcompute_hw_input_global_wrapper_stencil’
  auto compute_result = hcompute_hw_input_global_wrapper_stencil(hw_input_stencil_hw_input_global_wrapper_s0_y_c__hw_input_global_wrapper_s0_x_value);
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                        op_hcompute_hw_input_global_wrapper_stencil
unoptimized_cascade.cpp: In function ‘void op_hcompute_hw_output_stencil(conv2_stencil_cache&, HWStream<hw_uint<16> >&, int, int, int)’:
unoptimized_cascade.cpp:1489:40: error: no matching function for call to ‘HWStream<hw_uint<16> >::write(hw_uint<8>&)’
  hw_output_stencil.write(compute_result);
                                        ^
In file included from cascade_compute.h:2,
                 from unoptimized_cascade.cpp:10:
hw_classes.h:392:10: note: candidate: ‘void HWStream<T>::write(const T&) [with T = hw_uint<16>]’
     void write(const T& v) {
          ^~~~~
hw_classes.h:392:10: note:   no known conversion for argument 1 from ‘hw_uint<8>’ to ‘const hw_uint<16>&’
clockwork: prog.cpp:2842: std::vector<std::__cxx11::basic_string<char> > run_regression_tb(const string&): Assertion `res == 0' failed.

It looks as though the function hcompute_hw_input_global_wrapper_stencil does not exist in cascade_compute.h. Could you check whether the cascade compute file is up to date?

Error when doing make run-clockwork on new max pooling example

@jeffsetter @thenextged I've gotten the CPU code compiling and running for my max-pooling example (https://github.com/StanfordAHA/Halide-to-Hardware/tree/maxpool_example/apps/hardware_benchmarks/apps/max_pool_2x2), but when I run make run-clockwork I get the following error:

dhuff@kiwi:~/h2h2clockwork/Halide-to-Hardware/apps/hardware_benchmarks/apps/max_pool_2x2$ make run-clockwork
g++-7 -std=c++17 -I../../../../../clockwork -I../../../../../clockwork/include -I/home/dhuff/h2h2clockwork/Halide-to-Hardware/clockwork/barvinok-0.41/isl -fPIC -I/home/dhuff/h2h2clockwork/clockwork/barvinok-0.41/isl/ -c bin/clockwork_codegen.cpp -o bin/clockwork_codegen.o
In file included from bin/clockwork_codegen.cpp:2:0:
bin/maxpool_memory.cpp: In function ‘prog maxpool()’:
bin/maxpool_memory.cpp:11:27: error: ‘arg_0’ was not declared in this scope
 int32_t &hw_output_s0_c = arg_0;
                           ^~~~~
../../hw_support/hardware_targets.mk:138: recipe for target 'bin/clockwork_codegen.o' failed
make: *** [bin/clockwork_codegen.o] Error 1

Any idea what is going on here? Thanks!

Unified Buffer Functional Model

  • Update the functional model with the new stencil valid parameter
  • Use pybinding for functional model, single source of truth
  • Add reset for functional model (simulation purpose)

CoreIR Rewrite

Image Processing Applications

  • 3x3 conv
  • harris
  • strided conv
  • camera pipeline

NN Processing Applications

  • UNet layer
  • UNet psum buffer
  • MobileNet with line buffer and double buffer
  • HDRNet with slicing layer

Integrate into System Flow

  • Ensure applications run properly (conv_3_3, cascade, harris, kitchen_sink)
  • Update CoreIR linking using new libraries
  • Create global buffer metadata from Halide
  • Run from Halide to CGRA and test using push-button
  • Run from Halide to SoC and test with global buffer using push-button

Does the clockwork / CPU comparison script support 3D buffers?

@jeffsetter @thenextged I'm trying to add max-pooling, which needs a 3D input and output (app here: https://github.com/StanfordAHA/Halide-to-Hardware/tree/maxpool_example/apps/hardware_benchmarks/apps/max_pool_2x2 ). When I run make run-cpu I get:

dhuff@kiwi:~/h2h2clockwork/Halide-to-Hardware/apps/hardware_benchmarks/apps/max_pool_2x2$ make run-cpu
./bin/process run cpu input.png 
Error: Input buffer input requires a buffer of exactly 3 dimensions, but the buffer passed in has 2 dimensions
../../hw_support/hardware_targets.mk:249: recipe for target 'run-cpu' failed
make: *** [run-cpu] Aborted (core dumped)

I've tried to modify process.cpp to use a 3D buffer here:

processor.input = Buffer<uint8_t>(64, 64, 3);
processor.output = Buffer<uint8_t>(31, 31, 3);

What am I doing wrong here?
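
To check the clockwork output against something independent of the Halide CPU path, a plain C++ reference for a 2x2, stride-2 max pool over a 3D (x, y, c) buffer may help. This sketch assumes a dense x-fastest planar layout (Halide's default Buffer layout) and floor division for the output extent. Under this model, a 64x64x3 input gives a 32x32x3 output. That differs from the 31x31x3 output allocated in process.cpp, which may reflect a different boundary choice in the app. max_pool_2x2 is a hypothetical name, not part of the app.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative CPU golden model: 2x2, stride-2 max pool over a planar
// (x, y, c) image stored x fastest, then y, then c.
// Output extent is floor(width/2) x floor(height/2) x channels.
std::vector<std::uint8_t> max_pool_2x2(const std::vector<std::uint8_t>& in,
                                       int width, int height, int channels) {
  assert(in.size() == static_cast<std::size_t>(width) * height * channels);
  const int ow = width / 2, oh = height / 2;
  std::vector<std::uint8_t> out(static_cast<std::size_t>(ow) * oh * channels);
  // Read input pixel (x, y, c) from the flat buffer.
  auto at = [&](int x, int y, int c) {
    return in[(static_cast<std::size_t>(c) * height + y) * width + x];
  };
  for (int c = 0; c < channels; ++c)
    for (int y = 0; y < oh; ++y)
      for (int x = 0; x < ow; ++x)
        out[(static_cast<std::size_t>(c) * oh + y) * ow + x] =
            std::max({at(2 * x, 2 * y, c), at(2 * x + 1, 2 * y, c),
                      at(2 * x, 2 * y + 1, c), at(2 * x + 1, 2 * y + 1, c)});
  return out;
}
```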

Codegen CoreIR for unified buffer

  • Remove extraneous hardware generated by unified loop nest
  • Convert access loopnest into (stride, range) pairs
  • Create CoreIR codegen from Halide for unified buffer
  • Create CoreIR generator for physical unified buffer
  • Abstract CoreIR generator for abstract unified buffer
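
The (stride, range) item can be illustrated with a small sketch, assuming the access is affine in the loop variables. Each loop level contributes stride * iteration to the address, and range is that level's trip count. AccessDim and address_stream are hypothetical names, not the existing codegen API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical representation of one loop level of an affine access pattern.
struct AccessDim {
  std::int64_t stride;  // address increment per iteration of this level
  std::int64_t range;   // trip count of this level
};

// Expand (offset, [(stride, range)...]) into the address stream.
// dims are listed innermost loop first.
std::vector<std::int64_t> address_stream(std::int64_t offset, const std::vector<AccessDim>& dims) {
  std::int64_t total = 1;
  for (const auto& d : dims) {
    assert(d.range > 0);
    total *= d.range;
  }
  std::vector<std::int64_t> iters(dims.size(), 0);
  std::vector<std::int64_t> out;
  out.reserve(static_cast<std::size_t>(total));
  for (std::int64_t n = 0; n < total; ++n) {
    std::int64_t addr = offset;
    for (std::size_t i = 0; i < dims.size(); ++i) addr += dims[i].stride * iters[i];
    out.push_back(addr);
    // Advance the iteration vector like an odometer, innermost level first.
    for (std::size_t i = 0; i < dims.size(); ++i) {
      if (++iters[i] < dims[i].range) break;
      iters[i] = 0;
    }
  }
  return out;
}
```

For example, a 3x3 window over a 64-wide row-major buffer is {{1, 3}, {64, 3}} with offset 0, which yields 0, 1, 2, 64, 65, 66, 128, 129, 130. A stride-2 downsample of four pixels is {{2, 4}}.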

Run some applications on Garnet with Global Buffer

  • conv_3_3
  • harris
  • camera pipeline
  • unsharp
  • strided conv
  • stereo
  • upsample
  • downsample
  • unet single channel layer, rxry unroll
  • unet multichannel layer, rx unroll
  • unet multichannel layer, z unroll
  • unet tiled multichannel layer

Generated Designs are Huge

A lot of the designs coming out of the master branch are extremely large: the resulting bitstreams have on the order of 10x more configuration than before whatever codegen change happened. Much of the extra logic seems to be unnecessary control logic that doesn't need to be synthesized. Is there some set of optimization passes that can be run to shrink the resulting designs?

old.txt
new.txt

Halide compile CoreIR error

I have installed Halide and CoreIR according to this tutorial.
I want to compile the "apps/hardware_benchmarks/apps/pointwise" application to get its CoreIR. I used the "make design" command, but got the following error:
/home/linuxbrew/.linuxbrew/bin/ld: CodeGen_PTX_Dev.cpp:(.text+0x5f35): undefined reference to `llvm::legacy::PassManager::~PassManager()'
/home/linuxbrew/.linuxbrew/bin/ld: CodeGen_PTX_Dev.cpp:(.text+0x5f41): undefined reference to `llvm::legacy::FunctionPassManager::~FunctionPassManager()'
/home/linuxbrew/.linuxbrew/bin/ld: CodeGen_PTX_Dev.cpp:(.text+0x5f48): undefined reference to `vtable for llvm::raw_pwrite_stream'
/home/linuxbrew/.linuxbrew/bin/ld: CodeGen_PTX_Dev.cpp:(.text+0x5f5f): undefined reference to `llvm::raw_ostream::~raw_ostream()'
/home/linuxbrew/.linuxbrew/bin/ld: CodeGen_PTX_Dev.cpp:(.text+0x62c6): undefined reference to `llvm::DataLayout::~DataLayout()'
/home/linuxbrew/.linuxbrew/bin/ld: ../../../../distrib/lib/libHalide.a(CodeGen_PTX_Dev.o): in function `Halide::Internal::CodeGen_PTX_Dev::dump() [clone .localalias.208]':
CodeGen_PTX_Dev.cpp:(.text+0xe9): undefined reference to `llvm::Module::print(llvm::raw_ostream&, llvm::AssemblyAnnotationWriter*, bool, bool) const'
/home/linuxbrew/.linuxbrew/bin/ld: ../../../../distrib/lib/libHalide.a(CodeGen_PTX_Dev.o): in function `_GLOBAL__sub_I_CodeGen_PTX_Dev.cpp':
CodeGen_PTX_Dev.cpp:(.text.startup+0x40): undefined reference to `LLVMLinkInMCJIT'
/home/linuxbrew/.linuxbrew/bin/ld: ../../../../distrib/lib/libHalide.a(CodeGen_PTX_Dev.o):(.data.rel+0x0): undefined reference to `llvm::DisableABIBreakingChecks'
collect2: error: ld returned 1 exit status
make: *** [bin/pointwise.generator] Error 1

Generate Harris result numbers

  • fix ranges using loop substitutions for successive kernels
  • combine loop access using sliding window
  • use logical size to update for loops
  • create compiler unit tests for access pattern ranges
  • verify that Harris runs on the CoreIR simulator
  • verify that Harris runs through GarnetFlow and shale
  • get numbers for Harris

Discrepancy in casting between pointwise compute .h file and coreir .json

@jeffsetter I was comparing the CGRA pointwise output to the output I get with CoreIR, and noticed that one of the pointwise compute units casts its output to uint8:

https://github.com/dillonhuff/clockwork/blob/b4dc132aa141f610421159bccf8ae21b8ae353ce/pointwise_compute.h#L26

However, the corresponding CoreIR compute uses 16-bit arithmetic in all compute units.

Can this be fixed or am I misunderstanding something about the outputs?
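
The two models should disagree whenever a value needs more than 8 bits. Casting to uint8 keeps only the low 8 bits, while 16-bit arithmetic keeps up to 16. The sketch below uses a hypothetical pointwise stage computing x * 2 (the actual pointwise kernel may differ) to show where the models diverge.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pointwise stage: x * 2 evaluated in 16-bit arithmetic, as in
// the CoreIR compute. Widen first so the multiply cannot overflow int.
std::uint16_t pointwise_16bit(std::uint16_t x) {
  return static_cast<std::uint16_t>(static_cast<std::uint32_t>(x) * 2u);  // wraps mod 2^16
}

// The same stage followed by a cast to uint8, as in the C++ compute header.
std::uint8_t pointwise_cast_u8(std::uint16_t x) {
  return static_cast<std::uint8_t>(pointwise_16bit(x));  // keeps the low 8 bits (mod 2^8)
}
```

Under this assumed kernel, the outputs agree for inputs up to 127 and differ from 128 upward. For example, an input of 200 gives 400 in 16 bits versus 144 after the cast. Which side needs fixing depends on whether the Halide source intends the cast.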

Adding Rewrite Rules

  • Backend optimization, banking, chaining...
  • Port optimization for basic (2D, 3D) linebuffer
  • Port optimization for strided linebuffer
  • Layout transfer buffer for downsample rate matching
  • Storage folding for circular buffer (input bank size ≠ output bank size)

Improve testing of unified buffer

  • Add assertions to extraction of unified buffers
  • Add assertions to insertion of unified buffers
  • Check strides and ranges of resulting output access pattern
  • Check that compute, store, and streaming loops are correct
  • Check that the correct consumer buffers are connected
  • Check that the store and compute level is correct
  • Check that the stencil offset (output_min_pos) is correct
  • unit tests for downsample
  • unit tests for upsample
  • unit tests for bifurcating kernel graphs
  • unit tests for differing compute levels
  • unit tests using reorder to ensure unrolled loops are innermost

Multiple producer synchronization

When multiple producers connect to a single consumer, their valid signals need to be synchronized. Problems arise when the producers are out of sync, for example when they have different valid signatures.
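
One common way to handle producers whose valid signals are out of phase is to buffer each producer's stream in its own FIFO. The consumer then fires only when every FIFO holds data (a join). The C++ model below illustrates that idea at the transaction level. JoinSync is a hypothetical name, and the FIFOs here are unbounded, whereas hardware would need them sized to the worst-case skew between the valid signals.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Transaction-level sketch of a join: one FIFO per producer; the consumer
// fires only when every FIFO is non-empty, consuming one value from each.
class JoinSync {
 public:
  explicit JoinSync(std::size_t num_producers) : fifos_(num_producers) {}

  // Producer p asserted valid with value v.
  void push(std::size_t p, int v) { fifos_.at(p).push_back(v); }

  // If every producer has data, pop one value from each into out and
  // return true (the consumer's valid); otherwise leave out unchanged.
  bool try_fire(std::vector<int>& out) {
    for (const auto& f : fifos_)
      if (f.empty()) return false;
    out.clear();
    for (auto& f : fifos_) {
      out.push_back(f.front());
      f.pop_front();
    }
    return true;
  }

 private:
  std::vector<std::deque<int>> fifos_;
};
```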

CoreIR Simulator for Unified Buffer

Framework

  • Merge the simulator plugin into the H2H repo and link CoreIR during compilation

Application test

  • conv33
  • Harris
  • Downsample
  • Stride Conv
  • Camera Pipe
  • UNet Layer

Properly use compute and store level

  • Extract compute level of a func
  • Determine streaming loops as loops between compute and store level
  • Unified buffer logical size is based on store level (especially for tiled accelerator)
  • Test that MobileNet can use a different compute level
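
The streaming-loop item can be sketched as a slice of the loop nest. In Halide, a func's compute level must be at or inside its store level. The convention below is an assumption, not the project's definition: streaming loops are those strictly inside the store loop, down to and including the compute loop. streaming_loops is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper. loops: loop names, outermost first.
// store_level / compute_level: index of the loop at which the func is
// stored / computed, with -1 meaning "at root" (outside every loop).
// Returns the loops strictly inside the store loop, down to and including
// the compute loop (assumed convention).
std::vector<std::string> streaming_loops(const std::vector<std::string>& loops,
                                         int store_level, int compute_level) {
  assert(store_level >= -1 && store_level <= compute_level);
  assert(compute_level < static_cast<int>(loops.size()));
  return std::vector<std::string>(loops.begin() + (store_level + 1),
                                  loops.begin() + (compute_level + 1));
}
```

With a tiled nest {y_outer, x_outer, y_inner, x_inner}, storing at x_outer and computing at x_inner gives streaming loops {y_inner, x_inner}. In that case, the buffer's logical size would follow one tile rather than the whole image.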

Unsharp cpp compute is out of date?

@jeffsetter I'm trying to build a vanilla CPU unsharp and I'm getting the following error:

cmd: g++ -fstack-protector-all -std=c++11 regression_tb_unoptimized_unsharp.cpp unoptimized_unsharp.cpp
unoptimized_unsharp.cpp: In function ‘void op_hcompute_hw_input_stencil(HWStream<hw_uint<16> >&, hw_input_stencil_cache&, int, int, int, int)’:
unoptimized_unsharp.cpp:2142:24: error: ‘hcompute_hw_input_stencil’ was not declared in this scope
  auto compute_result = hcompute_hw_input_stencil(input_copy_stencil_hw_input_s0_x_c__hw_input_s0_y_c__hw_input_s0_c_value);
                        ^~~~~~~~~~~~~~~~~~~~~~~~~
unoptimized_unsharp.cpp:2142:24: note: suggested alternative: ‘op_hcompute_hw_input_stencil’
  auto compute_result = hcompute_hw_input_stencil(input_copy_stencil_hw_input_s0_x_c__hw_input_s0_y_c__hw_input_s0_c_value);
                        ^~~~~~~~~~~~~~~~~~~~~~~~~
                        op_hcompute_hw_input_stencil
clockwork: prog.cpp:2848: std::vector<std::__cxx11::basic_string<char> > run_regression_tb(const string&): Assertion `res == 0' failed.

Also I don't see unsharp_compute.json in coreir_compute.

HLS Backend Configuration Generation

Application list

  • 3x3 conv
  • camera pipeline
  • VGG with basic double buffer, psum buffer
  • MobileNet with line buffer and layout transfer buffer (downsample buffer)
  • HDRNet with slicing layer

Get Pytorch->Onnx->Halide working

To do:

  • Check weights of .onnx file
  • Get human readable format of .onnx file
  • Test with small 1-2 layer pytorch network
  • Follow issue opened on pytorch github
  • Figure out how to schedule functions in Halide representation
  • Integrate more operations into the Onnx->Halide code (e.g. ConvTranspose, other)
  • Meet with Jeff, get code working end to end on CGRAFlow, fix compat issues with CoreIR

Remove extraneous muxes

Most of these muxes are due to incorrectly indexed buffers. The most problematic of these are muxes with more than one input. The following are the applications with this issue.

  • remove muxes in strided conv
  • remove muxes in downsample
  • remove muxes in unet conv

CoreIR Rewrite Rule + Application Checklist

General Issue

  • Strided conv with an odd row size gets an incorrect line buffer size
  • Using PyCoreIR instead of pure json generation

Image Processing Applications

  • 3x3 conv
  • harris
  • strided conv
  • camera pipeline

NN Processing Applications

  • UNet layer
  • UNet psum buffer ??
  • MobileNet with line buffer and double buffer ??

Multiple consumer streams of a unified buffer

  • Change parameters of unified buffer as a vector of consumers
  • Extract each output stream from HalideIR
  • Merge consumer buffers if the access pattern is similar
  • test case for multiple consumers
  • CoreIR simulator support for multiple consumers
  • unsharp application working
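
The merge step can be sketched as grouping consumers by access pattern. This sketch takes "similar" to mean identical (same start offset and the same (stride, range) per loop level); a real pass might also merge patterns that differ only by a constant offset. AccessPattern and group_consumers are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical access pattern of one consumer of a unified buffer.
struct AccessPattern {
  std::int64_t offset;
  std::vector<std::pair<std::int64_t, std::int64_t>> stride_range;  // innermost first
  bool operator<(const AccessPattern& o) const {
    return std::tie(offset, stride_range) < std::tie(o.offset, o.stride_range);
  }
};

// Group consumer indices whose access patterns are identical, so that each
// group can share one output stream of the unified buffer.
std::vector<std::vector<std::size_t>> group_consumers(const std::vector<AccessPattern>& pats) {
  std::map<AccessPattern, std::vector<std::size_t>> groups;
  for (std::size_t i = 0; i < pats.size(); ++i) groups[pats[i]].push_back(i);
  std::vector<std::vector<std::size_t>> out;
  for (auto& kv : groups) out.push_back(kv.second);
  return out;  // ordered by pattern, then by consumer index within a group
}
```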

Issues with CoreIR code generation, specifically regarding unified buffer

  • Extra muxes (for example, between data output of unified buffer and input of compute units). Present in multichannel conv, strided conv, avg pool.
  • Harris -- ranges are incorrect for padded16, lxx, lxy, lyy unified buffers
  • Tiling the image, and defining which region of the image is brought onto the accelerator/CGRA, are not reflected in the generated CoreIR
