leiwang1999 / zynq-nvdla

NVDLA (An Opensource DL Accelerator Framework) implementation on FPGA.

TeX 0.80% Batchfile 0.01% Shell 0.01% Makefile 0.09% C 13.03% C++ 11.33% Python 0.01% BitBake 0.01% Verilog 74.66% SystemVerilog 0.05% Tcl 0.01%

fpga zynq nvdla verilog yolox-nano

zynq-nvdla's Introduction

ZYNQ-NVDLA

NVDLA Xilinx FPGA Mapping!

File Tree of WorkSpace

IP/ Vivado IP package for NVDLA small
include/ Tengine backend headers
RTL/ NVDLA small RTL (including wrapper.v)
kmd/ kernel-mode driver for PetaLinux (covers Zynq-7000 and Zynq MPSoC)
paper/ LaTeX paper for the bachelor's degree
prebuilt/ prebuilt aarch64 libraries
reports/ timing, power, resource, and execution reports
sdk_sanity/ SDK sanity test for NVDLA
umd/ compiler and runtime source code

Talk

Tengine Open Talk : New Backend OpenDLA. [slides] [recording]

Test

3.1 Classification

Resnet18-Cifar10

$ cd <tengine-lite-root-dir>/build
$ cmake --build . --target tm_classification_opendla
$ cd examples
$ ./tm_classification_opendla -m /root/Tengine/models/resnet18-cifar10-nosoftmax-relu_int8.tmfile -i /root/Tengine/images/cat.jpg -g 32,32 -s 1,1,1
Mean value not specified, use default   104.0, 116.7, 122.7
tengine-lite library version: 1.4-dev
NVDLA time: 0.012502 seconds

model file : /root/Tengine/models/resnet18-cifar10-nosoftmax-relu_int8.tmfile
image file : /root/Tengine/images/cat.jpg
img_h, img_w, scale[3], mean[3] : 32 32 , 1.000 1.000 1.000, 104.0 116.7 122.7
Repeat 1 times, thread 1, avg time 12.62 ms, max_time 12.62 ms, min_time 12.62 ms
--------------------------------------
10.087049, 3
3.833079, 2
3.026115, 5
2.420892, 4
-0.403482, 0
--------------------------------------
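
The five lines between the dashed rules are score and class-index pairs. As a minimal sketch, assuming the model's output indices follow the standard CIFAR-10 class order (the README does not list the labels, so this is an assumption), the indices can be turned into names as follows. The helper name `parse_top5` is made up for illustration.

```python
# Standard CIFAR-10 class order (assumed to match the model's output indices).
CIFAR10_LABELS = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def parse_top5(text):
    """Parse 'score, index' lines printed by tm_classification_opendla."""
    results = []
    for line in text.splitlines():
        parts = line.split(",")
        if len(parts) != 2:
            continue  # skip separators and unrelated log lines
        try:
            score, index = float(parts[0]), int(parts[1])
        except ValueError:
            continue  # not a score/index pair
        results.append((CIFAR10_LABELS[index], score))
    return results

sample = """\
--------------------------------------
10.087049, 3
3.833079, 2
3.026115, 5
2.420892, 4
-0.403482, 0
--------------------------------------
"""

for label, score in parse_top5(sample):
    print(f"{label:10s} {score:9.6f}")
```

Under that assumption the top entry, index 3, is "cat", which agrees with the cat.jpg input used in the command.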

3.2 Detection

Yolox-nano

$ cd <tengine-lite-root-dir>/build
$ cmake --build . --target tm_classification_opendla tm_yolox_opendla
$ cd examples
$ ./tm_yolox_opendla -m /root/Tengine/models/yolox_nano_relu_int8.tmfile -i /root/Tengine/images/dog.jpg -r 1
tengine-lite library version: 1.4-dev
Repeat 1 times, thread 1, avg time 1138.80 ms, max_time 1138.80 ms, min_time 1138.80 ms
--------------------------------------
detection num: 3
 2:  70%, [ 463,   80,  676,  163], car
16:  52%, [ 122,  220,  315,  517], dog
 1:  48%, [ 180,  181,  564,  430], bicycle


zynq-nvdla's Issues

zynqMP nvdla umd test error

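Hardware:
ZYNQMP-XCZU9EG, on an ALINX board
Vivado version: 2020.1
PetaLinux version: 2020.1
Vivado and PetaLinux are both installed in a virtual machine; VM version: 17 Player

PetaLinux
The rootfs is Ubuntu 16
insmod of nvdla and the interrupt both look normal: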

opendla: loading out-of-tree module taints kernel.
[ 188.136367] Probe NVDLA config nvidia,NV-nvdla-wrapper-1.0
[ 188.142103] 0 . 12 . 5
[ 188.142116] reset engine done
[ 188.142916] [drm] Initialized nvdla 0.0.0 20171017 for 80000000.NV_nvdla_wrapper on minor 1

root@arm:/home/ubuntu/ZYNQ-NVDLA/umd# cat /proc/interrupts |grep nvdla
53: 0 0 0 0 GICv2 121 Level 80000000.NV_nvdla_wrapper

Problem
Running ./out/apps/runtime/nvdla_runtime/nvdla_runtime --loadable /home/ubuntu/nvdla_loadables/lenet-mnist-caffe/fast-math.nvdla fails with the following error:

./out/apps/runtime/nvdla_runtime/nvdla_runtime --loadable /home/ubuntu/nvdla_loadables/lenet-mnist-caffe/fast-math.nvdla
Hello Runtime Debug
ch test:go to launchTestcreating new runtime context...
libnvdla<3> runtime sees loadable gave back 4 reloc entries
libnvdla<3> load memory list entries=17
CH TEST: hMem=eaa658, size = 8
CH TEST: create_args=ffaac4c8, size = 16
pData=ffaac544, size=4096,map_args.offset=15375840,hDlaDev->fd=4096
CH TEST: pData=ffaac544, size=4096,map_args.offset=0,hDlaDev->fd=1
CH TEST: size=4096,flags=3,MAP_SHARD=1,fd=3,offset=0,ptr=1
Failed to map memory errno=22
CH TEST: map offset 0x0,size 1,err=4096
CH TEST: pData=ffaac544, size=4096,map_args.offset=0, hDlaDev->fd==1
(DLA_RUNTIME) Error 0xffffffff: (propagating from Runtime.cpp, function loadMemory(), line 794)
(DLA_RUNTIME) Error 0xffffffff: (propagating from Runtime.cpp, function load(), line 325)
(DLA_TEST) Error 0x00000004: runtime->load failed (in RuntimeTest.cpp, function loadLoadable(), line 353)
(DLA_TEST) Error 0x00000004: (propagating from RuntimeTest.cpp, function run(), line 443)
(DLA_TEST) Error 0x00000004: (propagating from main.cpp, function launchTest(), line 88)
root@arm:/home/ubuntu/ZYNQ-NVDLA/umd#

I added some printf calls and suspect that the NVDLA memory allocation is failing, but I do not know how to dig further. Do you have any suggestions?
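
The log line "Failed to map memory errno=22" can be decoded before digging into the driver. On Linux, errno 22 is EINVAL, which for mmap generally points at an argument the kernel rejected (length, offset, or flags) rather than at a shortage of memory. The sketch below only translates the number; it does not tell which argument is at fault here, and the helper name `describe_errno` is illustrative.

```python
import errno
import os

def describe_errno(code):
    """Return the symbolic name and message for a numeric errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

name, message = describe_errno(22)
print(name, "-", message)
```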

The system-user.dtsi file is as follows:
/include/ "system-conf.dtsi"
/ {
    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;

        nvdla_reserved: buffer@0 {
            compatible = "shared-dma-pool";
            no-map;
            reg = <0x0 0x40000000 0x0 0x30000000>;
        };
    };
    memory {
        device_type = "memory";
        reg = <0x0 0x0 0x0 0x7ff00000>, <0x00000008 0x00000000 0x0 0x80000000>;
    };
};

&NV_nvdla_wrapper_0 {
    compatible = "nvidia,NV-nvdla-wrapper-1.0";
    memory-regin = <&nvdla_reserved>;
};

/* SD */
&sdhci1 { /* FIXME - on CC - MIO 39 - 51 */
    status = "okay";
    no-1-8-v;
    disable-wp;
};

/* usb */
&dwc3_0 {
    status = "okay";
    dr_mode = "host";
};

/* eth0 */
&gem3 {
    status = "okay";
};
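
One detail in the posted file may be worth a second look: the standard device tree property for pointing a device at a reserved-memory node is spelled `memory-region`, while the NVDLA node above uses `memory-regin`. Whether that is a slip made while writing the issue or the real file content is unknown, and it may not be the cause of the mmap failure. The sketch below flags property names that nearly match a small known list; the list and the function name are illustrative, not part of any device tree tooling.

```python
import difflib
import re

# Property names the check knows about (a deliberately tiny, illustrative list).
KNOWN_PROPERTIES = ["compatible", "memory-region", "reg", "status", "ranges"]

def find_suspicious_properties(dts_text):
    """Return (found, suggestion) pairs for names that nearly match a known one."""
    suspicious = []
    # Match 'name = ...;' and bare 'name;' property statements at line start.
    for name in re.findall(r"^\s*([#\w,.+-]+)\s*(?:=|;)", dts_text, flags=re.M):
        if name in KNOWN_PROPERTIES:
            continue
        close = difflib.get_close_matches(name, KNOWN_PROPERTIES, n=1, cutoff=0.8)
        if close:
            suspicious.append((name, close[0]))
    return suspicious

sample = """\
&NV_nvdla_wrapper_0 {
compatible = "nvidia,NV-nvdla-wrapper-1.0";
memory-regin = <&nvdla_reserved>;
};
"""
print(find_suspicious_properties(sample))
```

The device tree compiler accepts unknown property names without complaint, so a check of this kind has to happen outside the build.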

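Vivado auto-inference of AXI and APB connections

Hello, after reading your FPGA mapping blog post I have a few questions:
1. When packaging the IP and wrapping the AXI and APB buses, does "Vivado auto-inference" mean it is enough that the bus mappings Vivado recognized automatically show up on the ports page?
2. After wrapping the buses, do the AXI and APB interfaces need to be associated with the clock signals?
3. Is the Constant IP you provided there to reset the Ethernet? Can the design work without this IP?
4. Is the address mapping in the Address Editor related to the device tree address used in the kmd port?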

Error when running classification under Tengine

tengine-lite library version: 1.5-dev
(DLA) Error 0x00000004: (in /root/umd/utils/BuddyAlloc.c, function construct(), line 180)
(DLA) Error 0x00000004: (propagating from Memory.cpp, function init(), line 95)
(DLA) Error 0x00000004: (propagating from engine-ast/EngineGraph.cpp, function initGraphResources(), line 107)
terminate called after throwing an instance of 'nvdla::priv::NvErrorException'
Aborted
Do you know what causes this error?

zc706 runtime error because of vmap

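Porting NVDLA to the ZC706

Checking the driver shows the following:

The size requested from vmalloc is too large and exceeds the limit.
The board's chip is the Zynq-7000 series zc045ffg900-2. The kmd is used unmodified from the repository, and 256 MB is reserved for NVDLA DMA. Why would the dynamic memory go out of range? Have you run into a similar problem, and in which direction should I adjust things?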
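
A possible angle, offered as an assumption rather than a diagnosis: on 32-bit ARM kernels the vmalloc address range is small (a 240 MB default is common, and it can usually be changed with the `vmalloc=` boot argument), so mapping a whole 256 MB pool in one piece may simply not fit. The sketch compares a request against `VmallocTotal` from `/proc/meminfo`; the sample numbers are illustrative, not measurements from this board.

```python
import re

def vmalloc_total_kb(meminfo_text):
    """Extract VmallocTotal (in kB) from the contents of /proc/meminfo."""
    match = re.search(r"^VmallocTotal:\s+(\d+)\s+kB", meminfo_text, flags=re.M)
    if match is None:
        raise ValueError("VmallocTotal not found")
    return int(match.group(1))

def fits_in_vmalloc(request_bytes, meminfo_text):
    """True if one mapping of request_bytes could fit in the vmalloc area at all."""
    return request_bytes <= vmalloc_total_kb(meminfo_text) * 1024

# Illustrative values only: a 240 MB vmalloc area and a 256 MB mapping request.
sample_meminfo = "MemTotal:  1028000 kB\nVmallocTotal:  245760 kB\nVmallocUsed:  0 kB\n"
print(fits_in_vmalloc(256 * 1024 * 1024, sample_meminfo))
```

On the board itself, the text would come from reading `/proc/meminfo` instead of the sample string.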

implementation error [DRC INBB-3]

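Hello @LeiWang1999, I have recently been learning NVDLA porting from your work. My board is an XC7Z045FFG900-2I, and the following error occurred during implementation:

Implementation ran fine at first, but after I changed some ZYNQ IP settings according to the board manual it no longer completes.
Suspecting a software compatibility problem, I reproduced the flow on Ubuntu 18.04, but the problem is the same.
I hope you can offer some suggestions. Thank you.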

libjpeg.a version

Hi,
I have a question about the libjpeg.a library. Did you use the prebuilt one from the nvdla/sw repository, or did you build a different one? I ask because the library sizes are different.
Many thanks in advance.

Build the Tengine with RiscV+NVDLA Project

Hi @LeiWang1999,

Thank you for your wonderful work.

Could you please help me with the steps to build Tengine for a RiscV+NVDLA project? What changes need to be made? It would be a great help.

Best Regards,
Darshan C G

Issue building UMD for ZynqMP MPSoC

Hi @LeiWang1999, I followed your instructions to build the UMD for ZynqMP. Unfortunately, when I moved it to the ZCU102 board, the following error appeared:
root@arm:/umd/out/apps/runtime/nvdla_runtime# ./nvdla_runtime
bash: ./nvdla_runtime: cannot execute binary file: Exec format error
root@arm:/umd/out/apps/runtime/nvdla_runtime#
If you have any suggestions, please share them with me.
Thank you so much!
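
"Exec format error" usually means the kernel does not recognize the binary as runnable on this machine, and a common cause is a build for a different architecture (for example a host x86-64 build copied to the board). As a sketch of how to check, the ELF header records the word size and target machine in its first 20 bytes. The function names are illustrative; the `file` command reports the same information.

```python
import struct

# e_machine values from the ELF specification.
MACHINES = {3: "x86", 40: "ARM (32-bit)", 62: "x86-64", 183: "AArch64", 243: "RISC-V"}

def describe_elf(header):
    """Describe the target of an ELF file from its first 20 bytes."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return "not an ELF file"
    bits = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown class")
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return f"{bits} {MACHINES.get(machine, f'machine {machine}')}"

def describe_file(path):
    """Read the header of the file at path and describe it."""
    with open(path, "rb") as handle:
        return describe_elf(handle.read(20))

# Synthetic header for demonstration: 64-bit, little-endian, e_machine = 183.
demo = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, 183)
print(describe_elf(demo))
```

Running `describe_file` on the `nvdla_runtime` binary and comparing the result with the board's `uname -m` would show whether the two match.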

segmentation fault when running model in zynq fpga

@LeiWang1999, thanks for sharing. I managed to build and run zynq-nvdla on the ZCU102. However, when I run the LeNet model on the board, a segmentation fault occurs, as shown below:
root@nvdla_fpga:/umd/out/apps/runtime/nvdla_runtime# ./nvdla_runtime --loadable fast-math.nvdla --image ./Images/0_7.jpg --rawdump
creating new runtime context...
Emulator starting
dlaimg height: 28 x 28 x 1: LS: 224 SS: 0 Size: 6272
submitting tasks...
Segmentation fault
root@nvdla_fpga:/umd/out/apps/runtime/nvdla_runtime#

This is the insmod output from loading the kmd:
root@nvdla_fpga:# insmod /lib/modules/4.19.0-xilinx-v2019.1/extra/opendla.ko
[ 38.075678] Probe NVDLA config nvidia,nv_small
[ 38.080405] 0 . 12 . 5
[ 38.082769] reset engine done
[ 38.086171] [drm] Initialized nvdla 0.0.0 20171017 for a0000000.NV_nvdla_wrapper on minor 1
[ 38.094544] ------------[ cut here ]------------
[ 38.099160] memremap attempted on ram 0x0000000040000000 size: 0x40000000
[ 38.105961] WARNING: CPU: 1 PID: 2266 at kernel/iomem.c:108 memremap+0x178/0x1e8
[ 38.113341] Modules linked in: opendla(O+) mali(O) uio_pdrv_genirq
[ 38.119517] CPU: 1 PID: 2266 Comm: insmod Tainted: G O 4.19.0-xilinx-v2019.1 #1
[ 38.128116] Hardware name: xlnx,zynqmp (DT)
[ 38.132285] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 38.137059] pc : memremap+0x178/0x1e8
[ 38.140705] lr : memremap+0x178/0x1e8
[ 38.144349] sp : ffffff800cfb38c0
[ 38.147648] x29: ffffff800cfb38c0 x28: ffffff800cfb5000
[ 38.152952] x27: 0000000000000100 x26: ffffff8000b85a80
[ 38.158256] x25: ffffff800cfb3980 x24: 0000000040000000
[ 38.163551] x23: 0000000000000001 x22: ffffffc87985fa18
[ 38.168846] x21: 0000000040000000 x20: 0000000000000004
[ 38.174141] x19: 0000000040000000 x18: 0000000000000010
[ 38.179437] x17: 0000000000000000 x16: 0000000000000000
[ 38.184732] x15: ffffffffffffffff x14: ffffff8009138648
[ 38.190028] x13: ffffff80891d7ecf x12: ffffff80091d7ed8
[ 38.195323] x11: ffffff800914a000 x10: ffffff800cfb35a0
[ 38.200618] x9 : 00000000ffffffd0 x8 : ffffff8008563fe0
[ 38.205914] x7 : 30203a657a697320 x6 : 000000000000015c
[ 38.211209] x5 : 0000000000000007 x4 : 0000000000000000
[ 38.216504] x3 : 0000000000000000 x2 : ffffffffffffffff
[ 38.221800] x1 : c29b4d7389105500 x0 : 0000000000000000
[ 38.227095] Call trace:
[ 38.229528] memremap+0x178/0x1e8
[ 38.232829] dma_init_coherent_memory+0x50/0x110
[ 38.237436] dma_declare_coherent_memory+0x44/0xb8
[ 38.242223] nvdla_drm_probe+0x74/0x98 [opendla]
[ 38.246836] nvdla_probe+0x18c/0x1b0 [opendla]
[ 38.251266] platform_drv_probe+0x50/0xa0
[ 38.255266] really_probe+0x1c8/0x280
[ 38.258912] driver_probe_device+0x54/0xe8
[ 38.262992] __driver_attach+0xe4/0xe8
[ 38.266727] bus_for_each_dev+0x70/0xc0
[ 38.270553] driver_attach+0x20/0x28
[ 38.274112] bus_add_driver+0x1dc/0x208
[ 38.277932] driver_register+0x60/0x110
[ 38.281752] __platform_driver_register+0x44/0x50
[ 38.286448] nvdla_driver_init+0x1c/0x1000 [opendla]
[ 38.291397] do_one_initcall+0x74/0x178
[ 38.295216] do_init_module+0x54/0x1c8
[ 38.298948] load_module+0x1b5c/0x20e0
[ 38.302681] __se_sys_finit_module+0xb8/0xc8
[ 38.306934] __arm64_sys_finit_module+0x18/0x20
[ 38.311450] el0_svc_common+0x84/0xd8
[ 38.315095] el0_svc_handler+0x68/0x80
[ 38.318827] el0_svc+0x8/0xc
[ 38.321691] ---[ end trace bf498404d6f095c3 ]---
root@nvdla_fpga:#

Based on your expertise, do you know of any approach to solve this issue?
Thank you so much,
Tony Do
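
The kernel warning "memremap attempted on ram 0x0000000040000000 size: 0x40000000" indicates that the range handed to the driver's coherent DMA pool intersects memory the kernel treats as System RAM. One common reason is that the range was not carved out with a reserved-memory node marked no-map; whether that is what happened here, and whether it is related to the segmentation fault, is not established by the log. The sketch below computes the overlap between the range from the warning and a RAM range; the RAM range is a hypothetical value and should be replaced with what `/proc/iomem` reports on the board.

```python
def overlap(a_start, a_size, b_start, b_size):
    """Return the overlapping (start, size) of two address ranges, or None."""
    start = max(a_start, b_start)
    end = min(a_start + a_size, b_start + b_size)
    return (start, end - start) if start < end else None

# Range from the warning: memremap on 0x40000000, size 0x40000000.
dma_pool = (0x40000000, 0x40000000)
# Assumed System RAM range (hypothetical; check /proc/iomem on the board).
system_ram = (0x00000000, 0x7FF00000)

hit = overlap(*dma_pool, *system_ram)
if hit:
    print(f"overlaps System RAM at {hit[0]:#x}, size {hit[1]:#x}")
```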

Linking CXX shared library error

/usr/bin/ld: /root/Tengine/source/device/opendla/lib/libprotobuf.a(descriptor.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against external symbol `__stack_chk_guard@@GLIBC_2.17' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /root/Tengine/source/device/opendla/lib/libprotobuf.a(descriptor.o)(.text+0x50c): unresolvable R_AARCH64_ADR_PREL_PG_HI21 relocation against symbol `__stack_chk_guard@@GLIBC_2.17'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
source/CMakeFiles/tengine-lite.dir/build.make:7192: recipe for target 'source/libtengine-lite.so' failed
make[3]: *** [source/libtengine-lite.so] Error 1
CMakeFiles/Makefile2:325: recipe for target 'source/CMakeFiles/tengine-lite.dir/all' failed
make[2]: *** [source/CMakeFiles/tengine-lite.dir/all] Error 2
CMakeFiles/Makefile2:618: recipe for target 'examples/CMakeFiles/tm_classification_opendla.dir/rule' failed
make[1]: *** [examples/CMakeFiles/tm_classification_opendla.dir/rule] Error 2
Makefile:322: recipe for target 'tm_classification_opendla' failed
make: *** [tm_classification_opendla] Error 2

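The final link against libprotobuf.a fails. Have you run into this?

Implementation fails after block design

Hello, I am using a ZC706 board. Following your blog I have finished the block design and now want to generate the bitstream, but running implementation on the bd fails with an error saying there are not enough IO pins. In your Zhihu answer you said that exceeding the IO count at synthesis does not affect the later design. How should this problem be solved? Or did I make a mistake in the block design that caused it?

How to port to the ZCU102 board?

Hello, I am trying to reproduce your project on a ZCU102 board, but I ran into problems connecting the IPs in the block design. I noticed that you have also ported this project to the ZCU102. Could you share your block design? Thanks!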

wrong output.dimg when execute nvdla_runtime

Hi Lei,
I followed your blog and finally reached the step of running nvdla_runtime. The log says "test pass", but when I look at the file output.dimg, the result is "12 12 12 12 12 12 12 12 12 12 ", like:

Can you help me, please?

Vivado implementation failed

[Place 30-58] IO placement is infeasible. Number of unplaced terminals (431) is greater than number of available sites (400).
The following are banks with available pins:

[Place 30-374] IO placer failed to find a solution
Below is the partial placement that can be analyzed to see if any constraint modifications will make the IO placement problem easier to solve.

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IO Placement : Bank Stats |
+----+-------+-------+------------------------------------------------------------------------+------------------------------------------+--------+--------+--------+-----+
| Id | Pins | Terms | Standards | IDelayCtrls | VREF | VCCO | VR | DCI |
+----+-------+-------+------------------------------------------------------------------------+------------------------------------------+--------+--------+--------+-----+
| 0 | 0 | 0 | | | | | | |
| 9 | 50 | 0 | | | | | | |
| 10 | 50 | 0 | | | | | | |
| 11 | 50 | 0 | | | | | | |
| 12 | 50 | 0 | | | | | | |
| 13 | 50 | 0 | | | | | | |
| 33 | 50 | 0 | | | | | | |
| 34 | 50 | 0 | | | | | | |
| 35 | 50 | 0 | | | | | | |
+----+-------+-------+------------------------------------------------------------------------+------------------------------------------+--------+--------+--------+-----+
| | 400 | 0 | | | | | | |
+----+-------+-------+------------------------------------------------------------------------+------------------------------------------+--------+--------+--------+-----+

[Place 30-99] Placer failed with error: 'IO Clock Placer failed'
Please review all ERROR, CRITICAL WARNING, and WARNING messages during placement to understand the cause for failure.

[Common 17-69] Command failed: Placer could not place all instances

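I would like to ask: while following your blog notes for the hardware part, implementation failed as shown above. How can I fix this? Should I do manual placement and routing and turn off some unused IOs?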

Module not found when generating bitstream

Hi, I am facing the following problem when generating bitstream for ZC706:

It shows that [Synth 8-439] module 'CKLNQD12' not found.
Please help me solve these errors. Thanks a lot.

[IP_Flow 19-3252] Bus Interface 'm_axi': The master AXIMM interface must have required port maps to the read or write channel signals.

I ran into the same problem. How did you solve it? I could not work out the solution from the Q&A.

After automatically inferring the port:

Originally posted by @AlenLqx in #6 (comment)

Convert SD card boot to QSPI boot

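@LeiWang1999 What changes need to be made to the PetaLinux design? The QSPI flash is only 256mb; will that be too small?
Have you tried this? The third-party board I chose cannot boot from an SD card, so it can only boot from QSPI. How should I proceed from here? Could you give me some advice?
So far I have used the SDK to flash the .bin and .elf to the QSPI flash, but how do I go on to port the Linux system?

kmd/zynq7000 compatibility with 32-bit processors

1. The kmd in the nvdla/sw repository runs on 64-bit processors. Which functions were modified in kmd/zynq7000 to adapt it to a 32-bit processor? Which other functions still need changes to run correctly on a 32-bit machine?
2. Can a .nvdla file compiled in the vp be used on the Zynq-7000?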

Mapping the NVDLA accelerator to zynqMP platform

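Thank you very much for sharing the detailed steps for deploying NVDLA on the Zynq-7000 series.
May I ask whether you have also deployed it on the ZynqMP platform?
I do not know how to wire up the connections between the IPs. Could you share your experience?
Sincere thanks.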

opendla.ko

Where is the kernel driver used in the tengine-opendla build?

NVDLA Runtime error

Hi @LeiWang1999, I am trying to run NVDLA on an FPGA and am having some issues. I was hoping you could guide me in this regard.

I have implemented nvdla small on an Intel FPGA (Arria 10) with a 32-bit HPS (ARM A9). I have modified the kmd and umd to make them compatible with the 32-bit ARM processor.
I can insert the module successfully, but while running the PDP runtime test I get the following error messages.

For the other test, the kernel got stuck partway through and became unresponsive.

Note:
I am using kernel 4.14, and 512 MB is reserved for the KMD.
Could you kindly look into the issue and let me know the root cause of the errors?

Thanks in advance.

PACKAGE IP

Generate the RTL code using tmake.
Generate the Vivado project (ZCU102 board).

--> Even when the hand-written RTL file NV_nvdla_wrapper.v is included,
the total number is 268, while yours is 270.

Set NV_nvdla_wrapper.v as top.
Create and package a new IP,
using the first option as the default.

An error occurs:

[IP_Flow 19-3252] Bus Interface 'm_axi': The master AXIMM interface must have required port maps to the read or write channel signals.

Error: nvdla_interface.h: No such file or directory

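Thank you for your generous sharing.
I have only just started studying this area,
so for now I am following your steps one at a time.
At the PetaLinux step, however,
I hit "Error: nvdla_interface.h: No such file or directory".
Do you have any experience resolving this?
I first built the KMD on an ordinary PC,
and that worked without problems, but the error appears once it is moved to PetaLinux.
The environment is PetaLinux 2020.2.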

NVDLA stuck during inference

Hi @LeiWang1999

I am running the same implementation on ZCU104 with your code files, I insmoded the .ko file generated for MPSoC. I verified the device in "/dev/drm" and interrupt in "/proc/interrupts". They are perfect.

The problem is that it is getting stuck in the middle of model execution. I am attaching the debug log of KMD

submitting tasks...
[ 319.138013] Enter: dla_initiate_processors
[ 319.146859] Enter: dla_submit_operation
[ 319.150684] Prepare Convolution operation index 0 ROI 0 dep_count 1
[ 319.156939] Enter: dla_prepare_operation
[ 319.160857] processor:Convolution group:0, rdma_group:0 available
[ 319.166939] Enter: dla_read_config
[ 319.170342] Exit: dla_read_config
[ 319.173649] Exit: dla_prepare_operation status=0
[ 319.178259] Enter: dla_program_operation
[ 319.182173] Program Convolution operation index 0 ROI 0 Group[0]
[ 319.188205] no desc get due to index==-1
[ 319.192122] no desc get due to index==-1
[ 319.196035] no desc get due to index==-1
[ 319.199948] no desc get due to index==-1
[ 319.203864] no desc get due to index==-1
[ 319.207778] Enter: dla_op_programmed
[ 319.211348] Update dependency operation index 3 ROI 0 DEP_COUNT=3
[ 319.217431] Update dependency operation index 1 ROI 0 DEP_COUNT=1
[ 319.223516] enable SDP in dla_update_dependency as depdency are resolved
[ 319.230207] Enter: dla_enable_operation
[ 319.234036] exit dla_enable_operation without actual enable due to processor hasn't been programmed
[ 319.243071] Exit: dla_enable_operation status=0
[ 319.247594] Exit: dla_op_programmed
[ 319.251074] Exit: dla_program_operation status=0
[ 319.255684] Exit: dla_submit_operation
[ 319.259424] Enter: dla_dequeue_operation
[ 319.263341] Dequeue op from Convolution processor, index=3 ROI=0
[ 319.269337] Enter: dla_submit_operation
[ 319.273166] Prepare Convolution operation index 3 ROI 0 dep_count 2
[ 319.279423] Enter: dla_prepare_operation
[ 319.283340] processor:Convolution group:1, rdma_group:0 available
[ 319.289422] Enter: dla_read_config
[ 319.292824] Exit: dla_read_config
[ 319.296132] Exit: dla_prepare_operation status=0
[ 319.300742] Enter: dla_program_operation
[ 319.304656] Program Convolution operation index 3 ROI 0 Group[1]
[ 319.310685] no desc get due to index==-1
[ 319.314601] no desc get due to index==-1
[ 319.318519] no desc get due to index==-1
[ 319.322432] no desc get due to index==-1
[ 319.326347] no desc get due to index==-1
[ 319.330261] Enter: dla_op_programmed
[ 319.333831] Update dependency operation index 6 ROI 0 DEP_COUNT=3
[ 319.339914] Update dependency operation index 4 ROI 0 DEP_COUNT=2
[ 319.346000] Exit: dla_op_programmed
[ 319.349474] Exit: dla_program_operation status=0
[ 319.354080] Exit: dla_submit_operation
[ 319.357820] Exit: dla_dequeue_operation
[ 319.361649] Enter: dla_submit_operation
[ 319.365472] Prepare SDP operation index 1 ROI 0 dep_count 0
[ 319.371033] Enter: dla_prepare_operation
[ 319.374951] processor:SDP group:0, rdma_group:0 available
[ 319.380338] Enter: dla_read_config
[ 319.383738] Exit: dla_read_config
[ 319.387048] Exit: dla_prepare_operation status=0
[ 319.391655] Enter: dla_program_operation
[ 319.395571] Program SDP operation index 1 ROI 0 Group[0]
[ 319.400888] no desc get due to index==-1
[ 319.404806] no desc get due to index==-1
[ 319.408722] no desc get due to index==-1
[ 319.412635] no desc get due to index==-1
[ 319.416549] Enter: dla_op_programmed
[ 319.420119] Update dependency operation index 4 ROI 0 DEP_COUNT=1
[ 319.426202] enable SDP in dla_update_dependency as depdency are resolved
[ 319.432895] Enter: dla_enable_operation
[ 319.436722] exit dla_enable_operation without actual enable due to processor hasn't been programmed
[ 319.445759] Exit: dla_enable_operation status=0
[ 319.450280] Exit: dla_op_programmed
[ 319.453762] Exit: dla_program_operation status=0
[ 319.458369] Enter: dla_enable_operation
[ 319.462199] Enable SDP operation index 1 ROI 0
[ 319.466634] Enter: dla_op_enabled
[ 319.469942] Update dependency operation index 0 ROI 0 DEP_COUNT=1
[ 319.476025] enable Convolution in dla_update_dependency as depdency are resolved
[ 319.483412] Enter: dla_enable_operation
[ 319.487240] Enable Convolution operation index 0 ROI 0
[ 319.492376] Enter: dla_op_enabled
[ 319.495685] Exit: dla_op_enabled
[ 319.498906] Exit: dla_enable_operation status=0
[ 319.503427] Exit: dla_op_enabled
[ 319.506649] Exit: dla_enable_operation status=0
[ 319.511170] Exit: dla_submit_operation
[ 319.514912] Enter: dla_dequeue_operation
[ 319.518826] Dequeue op from SDP processor, index=4 ROI=0
[ 319.524130] Enter: dla_submit_operation
[ 319.527958] Prepare SDP operation index 4 ROI 0 dep_count 0
[ 319.533522] Enter: dla_prepare_operation
[ 319.537437] processor:SDP group:1, rdma_group:1 available
[ 319.542827] Enter: dla_read_config
[ 319.546227] Exit: dla_read_config
[ 319.549531] Exit: dla_prepare_operation status=0
[ 319.554137] Enter: dla_program_operation
[ 319.558051] Program SDP operation index 4 ROI 0 Group[1]
[ 319.563370] no desc get due to index==-1
[ 319.567286] no desc get due to index==-1
[ 319.571207] no desc get due to index==-1
[ 319.575124] no desc get due to index==-1
[ 319.579040] Enter: dla_op_programmed
[ 319.582607] Update dependency operation index 7 ROI 0 DEP_COUNT=2
[ 319.588692] Exit: dla_op_programmed
[ 319.592172] Exit: dla_program_operation status=0
[ 319.596782] Enter: dla_enable_operation
[ 319.600609] Enable SDP operation index 4 ROI 0
[ 319.605046] Enter: dla_op_enabled
[ 319.608352] Update dependency operation index 3 ROI 0 DEP_COUNT=2
[ 319.614437] Exit: dla_op_enabled
[ 319.617656] Exit: dla_enable_operation status=0
[ 319.622179] Exit: dla_submit_operation
[ 319.625919] Exit: dla_dequeue_operation
[ 319.629754] Enter: dla_submit_operation
[ 319.633585] Prepare PDP operation index 2 ROI 0 dep_count 1
[ 319.639149] Enter: dla_prepare_operation
[ 319.643065] processor:PDP group:0, rdma_group:0 available
[ 319.648454] Enter: dla_read_config
[ 319.651854] Exit: dla_read_config
[ 319.655164] Exit: dla_prepare_operation status=0
[ 319.659771] Enter: dla_program_operation
[ 319.663688] Program PDP operation index 2 ROI 0 Group[0]
[ 319.668991] group id 0 rdma id 0
[ 319.672229] no desc get due to index==-1
[ 319.676142] no desc get due to index==-1
[ 319.680058] no desc get due to index==-1
[ 319.683971] no desc get due to index==-1
[ 319.687887] no desc get due to index==-1
[ 319.691800] Enter: dla_op_programmed
[ 319.695370] Update dependency operation index 5 ROI 0 DEP_COUNT=2
[ 319.701453] Exit: dla_op_programmed
[ 319.704935] Exit: dla_program_operation status=0
[ 319.709543] Exit: dla_submit_operation
[ 319.713285] Enter: dla_dequeue_operation
[ 319.717199] Dequeue op from PDP processor, index=5 ROI=0
[ 319.722503] Enter: dla_submit_operation
[ 319.726331] Prepare PDP operation index 5 ROI 0 dep_count 1
[ 319.731895] Enter: dla_prepare_operation
[ 319.735810] processor:PDP group:1, rdma_group:1 available
[ 319.741200] Enter: dla_read_config
[ 319.744600] Exit: dla_read_config
[ 319.747910] Exit: dla_prepare_operation status=0
[ 319.752517] Enter: dla_program_operation
[ 319.756434] Program PDP operation index 5 ROI 0 Group[1]
[ 319.761736] group id 1 rdma id 1
[ 319.764968] no desc get due to index==-1
[ 319.768880] no desc get due to index==-1
[ 319.772793] no desc get due to index==-1
[ 319.776709] no desc get due to index==-1
[ 319.780623] no desc get due to index==-1
[ 319.784538] no desc get due to index==-1
[ 319.788452] Enter: dla_op_programmed
[ 319.792021] Exit: dla_op_programmed
[ 319.795501] Exit: dla_program_operation status=0
[ 319.800110] Exit: dla_submit_operation
[ 319.803851] Exit: dla_dequeue_operation
[ 319.807680] Exit: dla_initiate_processors status=0

After this, PetaLinux also stops responding, and I had to reboot it.

I think the OS is waiting for the NVDLA interrupt and is not getting it. The output of cat /proc/interrupts is attached here:

root@dlampsoc:/usr/nvdla# cat /proc/interrupts
        CPU0   CPU1   CPU2   CPU3
   3:  16164   2068   2317   4121  GICv2  30 Level  arch_timer
   6:      0      0      0      0  GICv2  67 Level  zynqmp_ipi
   7:      0      0      0      0  GICv2 175 Level  arm-pmu
   8:      0      0      0      0  GICv2 176 Level  arm-pmu
   9:      0      0      0      0  GICv2 177 Level  arm-pmu
  10:      0      0      0      0  GICv2 178 Level  arm-pmu
  12:      0      0      0      0  GICv2 156 Level  zynqmp-dma
  13:      0      0      0      0  GICv2 157 Level  zynqmp-dma
  14:      0      0      0      0  GICv2 158 Level  zynqmp-dma
  15:      0      0      0      0  GICv2 159 Level  zynqmp-dma
  16:      0      0      0      0  GICv2 160 Level  zynqmp-dma
  17:      0      0      0      0  GICv2 161 Level  zynqmp-dma
  18:      0      0      0      0  GICv2 162 Level  zynqmp-dma
  19:      0      0      0      0  GICv2 163 Level  zynqmp-dma
  21:      0      0      0      0  GICv2 109 Level  zynqmp-dma
  22:      0      0      0      0  GICv2 110 Level  zynqmp-dma
  23:      0      0      0      0  GICv2 111 Level  zynqmp-dma
  24:      0      0      0      0  GICv2 112 Level  zynqmp-dma
  25:      0      0      0      0  GICv2 113 Level  zynqmp-dma
  26:      0      0      0      0  GICv2 114 Level  zynqmp-dma
  27:      0      0      0      0  GICv2 115 Level  zynqmp-dma
  28:      0      0      0      0  GICv2 116 Level  zynqmp-dma
  30:      0      0      0      0  GICv2  95 Level  eth0, eth0
  32:     15      0      0      0  GICv2  50 Level  cdns-i2c
  33:      0      0      0      0  GICv2  42 Level  ff960000.memory-controller
  34:      0      0      0      0  GICv2  57 Level  axi-pmon, axi-pmon
  35:      0      0      0      0  GICv2 155 Level  axi-pmon, axi-pmon
  36:     45      0      0      0  GICv2  47 Level  ff0f0000.spi
  37:      0      0      0      0  GICv2  58 Level  ffa60000.rtc
  38:      0      0      0      0  GICv2  59 Level  ffa60000.rtc
  39:      0      0      0      0  GICv2 165 Level  ahci-ceva[fd0c0000.ahci]
  40:   1319      0      0      0  GICv2  81 Level  mmc0
  41:    146      0      0      0  GICv2  53 Level  xuartps
  44:      0      0      0      0  GICv2  84 Edge   ff150000.watchdog
  45:      0      0      0      0  GICv2  88 Level  ams-irq
  46:      0      0      0      0  GICv2 154 Level  fd4c0000.dma
  47:      0      0      0      0  GICv2 151 Level  fd4a0000.zynqmp-display
  48:      0      0      0      0  GICv2 121 Level  a0000000.dla_small
  49:      0      0      0      0  GICv2  97 Level  xhci-hcd:usb1
IPI0:   1454   1082   1707   1483  Rescheduling interrupts
IPI1:     17     98    102    136  Function call interrupts
IPI2:      0      0      0      0  CPU stop interrupts
IPI3:      0      0      0      0  CPU stop (for crash dump) interrupts
IPI4:    665   2481   2467   2177  Timer broadcast interrupts
IPI5:      0      0      0      0  IRQ work interrupts
IPI6:      0      0      0      0  CPU wake-up interrupts

What do you think the issue is?
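
The a0000000.dla_small row in the dump shows a count of zero, which fits the suspicion that no NVDLA interrupt ever reaches the CPUs. A small sketch for pulling that figure out of `/proc/interrupts` before and after a run is shown below; the sample rows are modeled on the output above, and the function name is illustrative. A count that stays at zero would be a reason to re-check the interrupt number and trigger type in the device tree against the block design, though that is a guess rather than a confirmed cause.

```python
def irq_counts(interrupts_text, keyword):
    """Sum per-CPU counts for /proc/interrupts rows whose text contains keyword."""
    totals = {}
    lines = interrupts_text.splitlines()
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        if not fields or keyword not in line:
            continue
        irq = fields[0].rstrip(":")
        counts = [int(value) for value in fields[1:1 + cpu_count]]
        totals[irq] = sum(counts)
    return totals

# Two rows in the usual /proc/interrupts layout, for demonstration.
sample = """\
           CPU0       CPU1       CPU2       CPU3
  3:      16164       2068       2317       4121     GICv2  30 Level     arch_timer
 48:          0          0          0          0     GICv2 121 Level     a0000000.dla_small
"""
print(irq_counts(sample, "dla_small"))
```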

Tengine run result: Segmentation fault

I am using the resnet18-cifar10_int8.tmfile file from the model_zoo provided by Tengine.

After running ./tm_classification_opendla -m /Tengine/models/resnet18-cifar10-int8.tmfile -i /Tengine/images/cat.jpg -g 32,32 -s 1,1,1

the result is:

Floating point exception when run yolo

hello,
when I use the following command to run the yolo example with nvdla small config, there is a Floating point exception and the example failed:
./tm_yolox_opendla -m /home/ubuntu/tftproot/yolox_nano_relu_int8.tmfile -i /home/ubuntu/tftproot/dog.jpg -r 1
the last printf messages are:

libnvdla<3> tb-25 for tsd-25 for e-29 with NVDLA_FEATURE_DATA_INT8
libnvdla<3> tb-28 for tsd-28 for e-8 with NVDLA_FEATURE_DATA_INT8
libnvdla<3> tb-29 for tsd-29 for e-9 with NVDLA_FEATURE_DATA_INT8
libnvdla<3> tb-30 for tsd-30 for e-11 with NVDLA_FEATURE_DATA_INT8
libnvdla<3> tb-31 for tsd-31 for e-10 with NVDLA_FEATURE_DATA_INT8
./run_yolo.sh: line 1:  1670 Floating point exception ./tm_yolox_opendla -m /home/ubuntu/tftproot/yolox_nano_relu_int8.tmfile -i /home/ubuntu/tftproot/dog.jpg -r 1
root@arm:/home/ubuntu/wrk/Tengine/build2/examples#

The other example shown in the README seems to be OK.

gcc: internal compiler error: Killed (program cc1plus)

gcc: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See file:///usr/share/doc/gcc-5/README.Bugs for instructions.
/ZYNQ-NVDLA-master/umd/make/compile.mk:61: recipe for target '/ZYNQ-NVDLA-master/umd/out/core/src/compiler/libnvdla_compiler/engine-ast/FullyConnectedOp.o' failed
make[1]: *** [/ZYNQ-NVDLA-master/umd/out/core/src/compiler/libnvdla_compiler/engine-ast/FullyConnectedOp.o] Error 4
[ 2068.232790] Out of memory: Kill process 14355 (cc1plus) score 52 or sacrifice child
[ 2068.240476] Killed process 14355 (cc1plus) total-vm:78112kB, anon-rss:52848kB, file-rss:0kB, shmem-rss:0kB
I ran the following command when building the compiler:
$ make -j nproc TOP=${PWD} TOOLCHAIN_PREFIX=/usr/bin/ compile
and got this error. Does it have to be solved with a swap partition? Have you run into it?
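
The kernel log shows cc1plus being killed by the out-of-memory killer, so besides adding swap, lowering the number of parallel jobs is another option. Note that GNU make treats `-j` without a numeric argument as "no limit", so the command as printed above may start far more compilers than the board can hold. The sketch below picks a job count from the CPU count and available memory; the 512 MB per-job budget is a guess, not a measured figure, and the function names are illustrative.

```python
import os

def pick_jobs(cpu_count, mem_available_kb, per_job_kb):
    """Choose a make -j value limited by both CPU count and available memory."""
    by_memory = mem_available_kb // per_job_kb
    return max(1, min(cpu_count, by_memory))

def mem_available_kb(meminfo_text):
    """Read MemAvailable (kB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")

# Assumed budget of 512 MB per cc1plus process (a guess, not a measured figure).
PER_JOB_KB = 512 * 1024
sample = "MemTotal: 1028000 kB\nMemAvailable: 700000 kB\n"
print(pick_jobs(os.cpu_count() or 1, mem_available_kb(sample), PER_JOB_KB))
```

The resulting number would be passed as `make -j<N>` in place of the unbounded `-j`.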

Package IP Error

This appears when packaging the IP:
[IP_Flow 19-3252] Bus Interface 'm_axi': The master AXIMM interface must have required port maps to the read or write channel signals.
Does anyone know how to solve it?