
fpga_caffe's Issues

Error when training LeNet

When I was running ./examples/mnist/train_lenet_ocl_hwcn.sh, it seems the xclbin files cannot be found. I did copy the bitstream into .build_release/opencl/src/caffe/layers/. The error message is as follows:

ERROR: No program executable for device
*** Aborted at 1521319025 (unix time) try "date -d @1521319025" if you are using GNU date ***
PC: @ 0x7fd621207954 xocl::detail::event::validOrError()
*** SIGSEGV (@0x20) received by PID 21575 (TID 0x7fd62487db00) from PID 32; stack trace: ***
@ 0x7fd622e5c4b0 (unknown)
@ 0x7fd621207954 xocl::detail::event::validOrError()
@ 0x7fd62120717c clWaitForEvents
@ 0x7fd624335256 caffe::OCLPoolingHWCNLayer<>::launchKernel()
@ 0x7fd62433545e caffe::OCLPoolingHWCNLayer<>::Forward_ocl()
@ 0x7fd6243dd35c caffe::Net<>::ForwardFromTo()
@ 0x7fd6243dd707 caffe::Net<>::Forward()
@ 0x7fd6244345b2 caffe::Solver<>::Test()
@ 0x7fd624434f7e caffe::Solver<>::TestAll()
@ 0x7fd624437607 caffe::Solver<>::Step()
@ 0x7fd62443784a caffe::Solver<>::Solve()
@ 0x40ada9 train()
@ 0x40719e main
@ 0x7fd622e47830 __libc_start_main
@ 0x407a29 _start
@ 0x0 (unknown)
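Since "No program executable for device" usually means the runtime never loaded an xclbin, a quick sanity check is to verify the file actually exists where the layer looks for it. This is only a sketch: the kernel filename below is an assumption, and the search directory is just the one mentioned above.

```python
import os

def find_xclbin(name, search_dirs):
    """Return the first existing path to the named xclbin, or None."""
    for d in search_dirs:
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate):
            return candidate
    return None

# Directory the poster copied the bitstream into; the kernel filename is an
# assumption -- use whatever xclbin your prototxt/layer actually requests.
search_dirs = [".build_release/opencl/src/caffe/layers/", os.getcwd()]
print(find_xclbin("crp_layer_hwcn_cpfp.xclbin", search_dirs))
```

If this prints None from the directory you launch the script in, the relative path is the likely culprit.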

What changes are needed to map two compute units?

Please use the caffe-users list for usage, installation, or modeling questions, or other requests for help.
Do not post such requests to Issues. Doing so interferes with the development of Caffe.

Please read the guidelines for contributing before submitting this issue.

Issue summary

Steps to reproduce

If you are having difficulty building Caffe or training a model, please ask the caffe-users mailing list. If you are reporting a build error that seems to be due to a bug in Caffe, please attach your build configuration (either Makefile.config or CMakeCache.txt) and the output of the make (or cmake) command.

Your system configuration

Operating system:
Compiler:
CUDA version (if applicable):
CUDNN version (if applicable):
BLAS:
Python or MATLAB version (for pycaffe and matcaffe respectively):

Questions about optimal parameters

hi, @dicecco1,
If I use the kernel crp_layer_hwcn_cpfp and set OCFACT to 4, what corresponding changes should I make to the prototxt? Everything works fine if I keep the original parameters, such as pad_to=4, num_cu=16 and num_pe=4. I wonder whether we could improve performance by changing these parameters.
Thanks!
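As a rough illustration of why OCFACT might matter for the prototxt (the exact semantics of OCFACT in crp_layer_hwcn_cpfp are my assumption here, not confirmed by the repo): if OCFACT output-channel engines run in parallel, a layer is processed in ceil(out_channels / OCFACT) groups, so output channel counts that are multiples of OCFACT keep every engine busy.

```python
import math

# Hypothetical sketch: number of output-channel groups a layer needs
# when OCFACT engines process output channels in parallel.
def output_channel_groups(out_channels, ocfact):
    return math.ceil(out_channels / ocfact)

for oc in (96, 256, 384):  # AlexNet-style conv output channel counts
    print(oc, output_channel_groups(oc, 4))
```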

Performance difference between Winograd algorithm and direct convolution

In the paper related to this work, you mention a Winograd algorithm, but I can't find any information about it in this project except one file, which isn't used and is noted as supporting only the forward pass.

So I was wondering about the performance difference between Winograd and direct convolution. Did you remove the Winograd code because it doesn't outperform direct convolution? Could you share more details about the performance comparison?
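For context (standard Winograd arithmetic from Lavin & Gray, not numbers from this repo): F(2x2, 3x3) computes a 2x2 output tile with 16 multiplications versus 36 for direct convolution, a 2.25x reduction in multiplies. On FPGAs the transform adds and less regular datapath can erode that advantage, which may explain the removal.

```python
# Multiplication counts per output tile: direct convolution vs. Winograd
# F(m x m, r x r). Winograd needs (m + r - 1)^2 multiplies per tile,
# direct convolution needs m^2 * r^2.
def winograd_multiply_reduction(m, r):
    direct = (m * m) * (r * r)
    winograd = (m + r - 1) ** 2
    return direct / winograd

print(winograd_multiply_reduction(2, 3))  # F(2x2, 3x3): prints 2.25
```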

error: cannot find -lxilinxopencl and -llmx6.0

Hi, thanks for open-sourcing. I tried to compile and run this; after "make all", I got the error below:

LD -o .build_release/lib/libcaffe.so.1.0.0
/usr/bin/ld: cannot find -lxilinxopencl
/usr/bin/ld: cannot find -llmx6.0
collect2: error: ld returned 1 exit status
Makefile:608: recipe for target '.build_release/lib/libcaffe.so.1.0.0' failed
make: *** [.build_release/lib/libcaffe.so.1.0.0] Error 1

According to your reply in issue #5, I sourced the settings64.sh file, but the error persists.
By the way, I'm using SDAccel 2017.1 on Ubuntu 16.04.5 LTS.
Any suggestions?
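One quick check after sourcing settings64.sh is whether the linker search paths actually contain the Xilinx runtime libraries named in the error. A sketch (library filenames inferred from the -l flags above; your SDAccel install may place them elsewhere):

```python
import os

def find_library(libname, dirs):
    """Return every directory in dirs that contains the given shared library file."""
    return [d for d in dirs if d and os.path.isfile(os.path.join(d, libname))]

# settings64.sh is supposed to put the SDAccel runtime directories on these
# paths; if both lists print empty, ld has no way to resolve the -l flags.
paths = (os.environ.get("LIBRARY_PATH", "") + ":" +
         os.environ.get("LD_LIBRARY_PATH", "")).split(":")
for lib in ("libxilinxopencl.so", "liblmx6.0.so"):
    print(lib, "found in", find_library(lib, paths))
```

If the libraries live somewhere settings64.sh does not export, adding that directory to LIBRARY_PATH (or to LDFLAGS in Makefile.config) is the usual fix.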

The images predicted by 8pe and 16pe were wrong

I want to use the kernel crp_layer_hwcn_cpfp_16pegrp to predict outputs. I have already changed the deploy.txt, but the output is the same for all 256 images. What's wrong?
The output:
ILSVRC2012_val_00000001,847,76,103,50,51,65
ILSVRC2012_val_00000002,847,76,103,50,51,970
ILSVRC2012_val_00000003,847,76,103,50,51,230
ILSVRC2012_val_00000004,847,76,103,50,51,809
ILSVRC2012_val_00000005,847,76,103,50,51,516
ILSVRC2012_val_00000006,847,76,103,50,51,57
ILSVRC2012_val_00000007,847,76,103,50,51,334
ILSVRC2012_val_00000008,847,76,103,50,51,415
ILSVRC2012_val_00000009,847,76,103,50,51,674
ILSVRC2012_val_00000010,847,76,103,50,51,332
ILSVRC2012_val_00000011,847,76,103,50,51,109
ILSVRC2012_val_00000012,847,76,103,50,51,286
ILSVRC2012_val_00000013,847,76,103,50,51,370
ILSVRC2012_val_00000014,847,76,103,50,51,757
ILSVRC2012_val_00000015,847,76,103,50,51,595
ILSVRC2012_val_00000016,847,76,103,50,51,147
ILSVRC2012_val_00000017,847,76,103,50,51,108
ILSVRC2012_val_00000018,847,76,103,50,51,23
ILSVRC2012_val_00000019,847,76,103,50,51,478
ILSVRC2012_val_00000020,847,76,103,50,51,517
ILSVRC2012_val_00000021,847,76,103,50,51,334
ILSVRC2012_val_00000022,847,76,103,50,51,173
ILSVRC2012_val_00000023,847,76,103,50,51,948
The last column is the ground-truth label.

make runtest failing

Hi @dicecco1 ,

Came across this work on the SDAccel forum and read your paper, thanks for open-sourcing!
After tweaking the Makefile I was able to finish building the code.
make runtest passed the tests below in sw_emu mode of SDAccel, using the sw xclbins:
oclConvolutionLayerTest/1
oclConvolutionLayerTest/3
OCLPoolingLayerTest/1
OCLPoolingLayerTest/3

but it also failed some tests, like:
[----------] 6 tests from BlobSimpleTest/1, where TypeParam = double
[ RUN ] BlobSimpleTest/1.TestInitialization
[ OK ] BlobSimpleTest/1.TestInitialization (0 ms)
[ RUN ] BlobSimpleTest/1.TestPointersCPUOCL
src/caffe/test/test_blob.cpp:47: Failure
Value of: this->blob_preshaped_->ocl_data()
Actual: false
Expected: true
src/caffe/test/test_blob.cpp:49: Failure
Value of: this->blob_preshaped_->mutable_ocl_data()
Actual: false
Expected: true
[ FAILED ] BlobSimpleTest/1.TestPointersCPUOCL, where TypeParam = double (0 ms)

Another one, with the data-mismatch lines repeating many times:
[----------] 3 tests from OCLLRNLayerTest/2, where TypeParam = caffe::GPUDevice<float>
[ RUN ] OCLLRNLayerTest/2.TestForwardAcrossChannelsLRN2
src/caffe/test/test_lrn_layer.cpp:614: Failure
The difference between this->blob_top_->cpu_data()[i] and top_reference.cpu_data()[i] is 0.10926561057567596, which exceeds this->epsilon_, where
this->blob_top_->cpu_data()[i] evaluates to 0,
top_reference.cpu_data()[i] evaluates to -0.10926561057567596, and
this->epsilon_ evaluates to 9.9999997473787516e-06.
...
[ FAILED ] OCLLRNLayerTest/2.TestForwardAcrossChannelsLRN2, where TypeParam = caffe::GPUDevice<float> (11173 ms)
[ RUN ] OCLLRNLayerTest/2.TestForwardAcrossChannelsLRN1
*** Aborted at 1501926253 (unix time) try "date -d @1501926253" if you are using GNU date ***
PC: @ 0x7f762f241f54 clWaitForEvents
*** SIGSEGV (@0x0) received by PID 31881 (TID 0x7f7621f11720) from PID 0; stack trace: ***
@ 0x3d0dc0f7e0 (unknown)
@ 0x7f762f241f54 clWaitForEvents
@ 0x7f762e1e23d1 caffe::OCLLRNLayer<>::Call_ocl()
...
and then the test run ended with a segmentation fault.

Any ideas? SDAccel version was 2016.1.

compile error: "/bin/ld: cannot find -lxilinxopencl..."

I'm using CentOS 7 and SDAccel 2017.1.
I tried to compile following your instructions in #3 (comment).

After running "make all", I got a problem like this:

...
CXX src/gtest/gtest-all.cpp
LD -o .build_release/lib/libcaffe.so.1.0.0
/bin/ld: cannot find -lxilinxopencl
/bin/ld: cannot find -llmx6.0
/bin/ld: cannot find -lcblas
/bin/ld: cannot find -latlas
collect2: error: ld returned 1 exit status
make: *** [.build_release/lib/libcaffe.so.1.0.0] Error 1

So, where can I find libxilinxopencl?

Or could you please give a more detailed installation tutorial, such as how you modified your Makefile.config?

[testfpga] ERROR: entry in lengths[i] is zero

I ultimately couldn't get the build working in the last issue.

But after the following changes:

  1. Changed the OS from CentOS 7 to CentOS 6.5
  2. Changed SDAccel from 2017.1 to 2016.3
  3. Changed protobuf to 2.4.1

I compiled the fpga_caffe project successfully.

Then I ran "make testfpga", and it completed fine.

Then I tried to check whether the testfpga binaries actually work, so I went to ./build_release/testfpga and ran the command below. It seems only the PCIeBandwidthTest passes; the others all fail with "ERROR: entry in lengths[i] is zero" and end with "Segmentation fault (core dumped)".

[root@localhost testfpga]# XCL_EMULATION_MODE=true ./test_all.bin 
[==========] Running 25 tests from 9 test cases.
[----------] Global test environment set-up.
[----------] 6 tests from PCIeBandwidthTest/0, where TypeParam = OCLDevice<float>
[ RUN      ] PCIeBandwidthTest/0.TestSetup
[       OK ] PCIeBandwidthTest/0.TestSetup (5 ms)
[ RUN      ] PCIeBandwidthTest/0.TestBurst
[       OK ] PCIeBandwidthTest/0.TestBurst (60 ms)
[ RUN      ] PCIeBandwidthTest/0.TestBurst2
[       OK ] PCIeBandwidthTest/0.TestBurst2 (71 ms)
[ RUN      ] PCIeBandwidthTest/0.TestPadBurst
[       OK ] PCIeBandwidthTest/0.TestPadBurst (138 ms)
[ RUN      ] PCIeBandwidthTest/0.TestByChannel
[       OK ] PCIeBandwidthTest/0.TestByChannel (85 ms)
[ RUN      ] PCIeBandwidthTest/0.TestLocalBandwidth
ERROR: entry in lengths[i] is zero
Segmentation fault (core dumped)

After that, I ran "make test" to build the common tests and then ran "make runtest".

Things seemed all right until it reached "BlobSimpleTest":

[----------] 5 tests from BlobSimpleTest/0, where TypeParam = float
[ RUN      ] BlobSimpleTest/0.TestPointersCPUOCL
ERROR: clCreateBuffer
*** Aborted at 1511440095 (unix time) try "date -d @1511440095" if you are using GNU date ***
PC: @     0x7fd979c913ef clEnqueueReadWriteBuffer()
*** SIGSEGV (@0x20) received by PID 26980 (TID 0x7fd97c893940) from PID 32; stack trace: ***
    @     0x7fd9784497e0 (unknown)
    @     0x7fd979c913ef clEnqueueReadWriteBuffer()
    @     0x7fd979c9397f clEnqueueWriteBuffer
    @     0x7fd978cb7eaa caffe::SyncedMemory::ocl_data()
    @     0x7fd978ca1c51 caffe::Blob<>::ocl_data()
    @           0x5ea1c5 caffe::BlobSimpleTest_TestPointersCPUOCL_Test<>::TestBody()
    @           0x7dc1ed testing::internal::HandleExceptionsInMethodIfSupported<>()
    @           0x7d4791 testing::Test::Run()
    @           0x7d4876 testing::TestInfo::Run()
    @           0x7d49b7 testing::TestCase::Run()
    @           0x7d4d1e testing::internal::UnitTestImpl::RunAllTests()
    @           0x7dbd6d testing::internal::HandleExceptionsInMethodIfSupported<>()
    @           0x7d3dee testing::UnitTest::Run()
    @           0x4d5cc5 main
    @     0x7fd9780c4d1d __libc_start_main
    @           0x4d59bd (unknown)
make: *** [runtest] Segmentation fault (core dumped)

So, can you provide some suggestion? :-)

Some questions about kernel code

hi, @dicecco1
Recently, while studying your kernel code crp_layer_hwcn_cpfp.cpp, I've run into some questions.
First, how do you choose the buffer sizes, and based on what? For example, in

cpfp16 inBuf[4][2 * 256 * 16];

what do the numbers 4, 2, 256, and 16 mean? And if my input blob shape is 32 32 4 256, how should I relate it to the inBuf dimensions?

Second, what do the *Fact variables mean? For example:

ap_uint<8> imgFact = numImages >> 4;

short burstFact = burstChannels >> 2;

I also see that you multiply burstFact and imgFact to compute inSize:
short inSize = burstFact * imgFact;
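One plausible reading of those shifts (my assumption, not confirmed by the author): a cpfp16 word packs 16 cpfp values, so ">> 4" converts an image count into cpfp16 words, and ">> 2" divides burstChannels across a factor-4 channel vectorization; inSize is then the burst length in cpfp16 words.

```python
# Mirrors the shift arithmetic quoted above from crp_layer_hwcn_cpfp.cpp.
def burst_factors(num_images, burst_channels):
    img_fact = num_images >> 4        # numImages / 16 (values per cpfp16 word)
    burst_fact = burst_channels >> 2  # burstChannels / 4 (channel vector factor)
    in_size = burst_fact * img_fact   # burst length in cpfp16 words
    return img_fact, burst_fact, in_size

# HWCN blob of shape 32 x 32 x 4 x 256: N = 256 images, C = 4 channels.
print(burst_factors(256, 4))  # prints (16, 1, 16)
```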

no winograd_pe.cl file

Hi @dicecco1,

I want to rebuild winograd_pe.xclbin, but I can't find winograd_pe.cl in src/caffe/ocl_caffe/convolution/winograd/. Can you help update it?

about alexnet

Hello, I'm trying to use Python to run the AlexNet model. My code looks like this:

import caffe
MODEL_FILE = 'alexnet_four_channel_model.caffemodel'
DEPLOY_FILE = 'config/deploy.prototxt'
TEST_ROOT = 'datas/'

caffe.set_mode_cpu()
net = caffe.Net(DEPLOY_FILE, MODEL_FILE, caffe.TEST)

transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_raw_scale('data', 255)
transformer.set_channel_swap('data', (2, 1, 0))
net.blobs['data'].reshape(1, 3, 227, 227)

img = caffe.io.load_image('temp.jpg')

net.blobs['data'].data[...] = transformer.preprocess('data', img)

out = net.forward()
predictions = out['prob']
predicted_label = predictions.argmax()
print(predicted_label)

Then I found that the deploy.prototxt doesn't have a softmax layer, and that the first dim of input_shape must be 256. Does that mean the number of pictures? When I tried changing it to 1, it failed with the check
CHECK(num_ % 16 == 0);
What do I need to do to run the fpga_alexnet model? I have already compiled a crp_layer_hwcn_cpfp.xclbin file and put it into the folder .build_release/opencl/src/caffe/layers/.
What else should I do?

Thanks~
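Since CHECK(num_ % 16 == 0) rejects a batch of 1, one workaround (a sketch, not confirmed by the maintainers) is to keep the batch dimension at a multiple of 16, replicate the single preprocessed image across the batch, and read back only the first output row:

```python
import numpy as np

BATCH = 16  # smallest batch satisfying CHECK(num_ % 16 == 0)

def make_batch(preprocessed, batch=BATCH):
    """Tile one preprocessed CHW image into a (batch, C, H, W) array."""
    return np.tile(preprocessed[np.newaxis, ...], (batch, 1, 1, 1))

# Stand-in for transformer.preprocess('data', img) output.
image = np.zeros((3, 227, 227), dtype=np.float32)
batch = make_batch(image)
print(batch.shape)  # prints (16, 3, 227, 227)
```

You would then reshape the data blob to (16, 3, 227, 227), assign the tiled batch, and use out['prob'][0] for the prediction.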
