
Comments (20)

r0l1 commented on May 10, 2024

Looking at the jevoisinc camera module, which has a very similar GPU, I can't understand how they manage to get this insane framerate ^^

from darknet.

sowson commented on May 10, 2024

The file "yolo_layer.c" contains the following code at lines 364-367:

if(!net.train || l.onlyforward){
    opencl_pull_array(l.output_gpu, l.output, l.batch*l.outputs);
    return;
}

You may comment out the pull:

if(!net.train || l.onlyforward){
    //opencl_pull_array(l.output_gpu, l.output, l.batch*l.outputs);
    return;
}

The pull is done anyway in network.c, so the functionality is unaffected.


PeterQuinn925 commented on May 10, 2024

Jevois uses darknet-NNPACK. I think it's https://github.com/digitalbrain79/darknet-nnpack.

I get about 1 fps with darknet/yolo, which I don't think is insane.


sowson commented on May 10, 2024

@r0l1 You may try setting BENCHMARK=1 in the Makefile, then rebuild and test. It produces a lot of stats, but they show what is slow, so you/we may fix it. You may attach the output as a txt file here for analysis. Note that if you build with RPI=1, I use a naive gemm_gpu version instead of the one from clBLAS. To change that, look in the code for the RPI pre-processor definition: disable it in "blas_kernels.c" (at the end of the file) and enable it in "gemm.c" (in the middle of the file; there should be about 3 RPI "ifdef"s). Thanks!
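For readers unfamiliar with the term, a "naive" GEMM is the plain triple loop computing C = alpha * A * B + beta * C, with no tiling or vectorization. The following is a minimal CPU-side sketch for illustration only. The name `naive_gemm` and the row-major layout are assumptions, and this is not the gemm_gpu kernel from the repository.

```c
#include <stddef.h>

/* Hypothetical illustration: C = alpha * A * B + beta * C, row-major.
 * A is M x K, B is K x N, C is M x N.
 * This is not the repository's gemm kernel. */
void naive_gemm(size_t M, size_t N, size_t K, float alpha,
                const float *A, const float *B, float beta, float *C)
{
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            /* Dot product of row i of A with column j of B. */
            for (size_t k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```

A tuned library such as clBLAS typically blocks these loops to reuse data in fast memory, which is why swapping between the two variants can change the timings.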


r0l1 commented on May 10, 2024

Thanks for the fast response. I didn't build with RPI enabled. I patched the Makefile and removed the -mfpmath=sse.

Makefile

GPU=1
GPU_FAST=1
GPU_MULTI=0
OPENCV=0
OPENMP=0
RPI=0
BENCHMARK=1
DEBUG=0

Benchmark Results

First Iteration

opencl_push_array	a35f4008	1363
fill_kernel	5044
fill_kernel	4786
im2col_gpu_kernel	11953
copy_kernel	5059
normalize_kernel	247
scale_bias_kernel	272
add_bias_kernel	212
activate_array_kernel	145
FW CONVOLUTIONAL	592493
fill_kernel	1279
forward_maxpool_layer_kernel	8148
FW MAXPOOL	8224
fill_kernel	8181
fill_kernel	8313
im2col_gpu_kernel	160
copy_kernel	8565
normalize_kernel	172
scale_bias_kernel	116
add_bias_kernel	99
activate_array_kernel	143
FW CONVOLUTIONAL	5079465
fill_kernel	2352
forward_maxpool_layer_kernel	4620
FW MAXPOOL	4696
fill_kernel	4579
fill_kernel	4485
im2col_gpu_kernel	165
copy_kernel	4444
normalize_kernel	111
scale_bias_kernel	99
add_bias_kernel	103
activate_array_kernel	138
FW CONVOLUTIONAL	470649
fill_kernel	1213
forward_maxpool_layer_kernel	2392
FW MAXPOOL	2463
fill_kernel	2311
fill_kernel	2265
im2col_gpu_kernel	140
copy_kernel	2210
normalize_kernel	113
scale_bias_kernel	97
add_bias_kernel	89
activate_array_kernel	148
FW CONVOLUTIONAL	158572
fill_kernel	653
forward_maxpool_layer_kernel	1299
FW MAXPOOL	1367
fill_kernel	1248
fill_kernel	1177
im2col_gpu_kernel	158
copy_kernel	530
normalize_kernel	96
scale_bias_kernel	97
add_bias_kernel	125
activate_array_kernel	106
FW CONVOLUTIONAL	5000
fill_kernel	305
forward_maxpool_layer_kernel	542
FW MAXPOOL	605
fill_kernel	337
fill_kernel	321
im2col_gpu_kernel	134
copy_kernel	317
normalize_kernel	102
scale_bias_kernel	147
add_bias_kernel	89
activate_array_kernel	129
FW CONVOLUTIONAL	5614
fill_kernel	322
forward_maxpool_layer_kernel	633
FW MAXPOOL	697
fill_kernel	527
fill_kernel	559
im2col_gpu_kernel	99
copy_kernel	559
normalize_kernel	136
scale_bias_kernel	87
add_bias_kernel	128
activate_array_kernel	96
FW CONVOLUTIONAL	14786
fill_kernel	302
fill_kernel	300
copy_kernel	266
normalize_kernel	131
scale_bias_kernel	92
add_bias_kernel	89
activate_array_kernel	128
FW CONVOLUTIONAL	3114
fill_kernel	335
fill_kernel	372
im2col_gpu_kernel	104
copy_kernel	351
normalize_kernel	124
scale_bias_kernel	85
add_bias_kernel	110
activate_array_kernel	99
FW CONVOLUTIONAL	5635
fill_kernel	123
fill_kernel	76
add_bias_kernel	71
activate_array_kernel	126
FW CONVOLUTIONAL	1611
fill_kernel	77
copy_kernel	83
activate_array_kernel	122
activate_array_kernel	125
activate_array_kernel	127
activate_array_kernel	121
activate_array_kernel	129
activate_array_kernel	123
opencl_pull_array	14037e8	283816
FW YOLO	284938
fill_kernel	381
copy_kernel	315
FW ROUTE	375
fill_kernel	77
fill_kernel	80
copy_kernel	79
normalize_kernel	102
scale_bias_kernel	77
add_bias_kernel	95
activate_array_kernel	93
FW CONVOLUTIONAL	898
fill_kernel	383
fill_kernel	325
upsample_kernel	83
FW UPSAMPLE	511
fill_kernel	857
copy_kernel	851
copy_kernel	77
FW ROUTE	1056
fill_kernel	630
fill_kernel	574
im2col_gpu_kernel	97
copy_kernel	628
normalize_kernel	89
scale_bias_kernel	87
add_bias_kernel	79
activate_array_kernel	93
FW CONVOLUTIONAL	4327
fill_kernel	91
fill_kernel	70
add_bias_kernel	75
activate_array_kernel	86
FW CONVOLUTIONAL	444
fill_kernel	73
copy_kernel	81
activate_array_kernel	90
activate_array_kernel	88
activate_array_kernel	86
activate_array_kernel	83
activate_array_kernel	86
activate_array_kernel	84
opencl_pull_array	152b2b0	22436
FW YOLO	23370
opencl_pull_array	152b2b0	291
../db-dnn2/img.jpg: Predicted in 13.945973 seconds.
Object: 100%

Second Iteration

opencl_push_array	a35f4008	2445
fill_kernel	114
fill_kernel	90
im2col_gpu_kernel	102
copy_kernel	71
normalize_kernel	88
scale_bias_kernel	89
add_bias_kernel	81
activate_array_kernel	93
FW CONVOLUTIONAL	1123
fill_kernel	72
forward_maxpool_layer_kernel	111
FW MAXPOOL	172
fill_kernel	72
fill_kernel	81
im2col_gpu_kernel	90
copy_kernel	66
normalize_kernel	83
scale_bias_kernel	83
add_bias_kernel	78
activate_array_kernel	98
FW CONVOLUTIONAL	1041
fill_kernel	72
forward_maxpool_layer_kernel	109
FW MAXPOOL	169
fill_kernel	71
fill_kernel	93
im2col_gpu_kernel	100
copy_kernel	68
normalize_kernel	82
scale_bias_kernel	76
add_bias_kernel	85
activate_array_kernel	89
FW CONVOLUTIONAL	1006
fill_kernel	72
forward_maxpool_layer_kernel	100
FW MAXPOOL	158
fill_kernel	65
fill_kernel	62
im2col_gpu_kernel	90
copy_kernel	77
normalize_kernel	85
scale_bias_kernel	75
add_bias_kernel	77
activate_array_kernel	89
FW CONVOLUTIONAL	923
fill_kernel	65
forward_maxpool_layer_kernel	92
FW MAXPOOL	154
fill_kernel	64
fill_kernel	63
im2col_gpu_kernel	95
copy_kernel	70
normalize_kernel	82
scale_bias_kernel	74
add_bias_kernel	75
activate_array_kernel	91
FW CONVOLUTIONAL	909
fill_kernel	65
forward_maxpool_layer_kernel	95
FW MAXPOOL	158
fill_kernel	64
fill_kernel	64
im2col_gpu_kernel	86
copy_kernel	67
normalize_kernel	82
scale_bias_kernel	90
add_bias_kernel	82
activate_array_kernel	93
FW CONVOLUTIONAL	932
fill_kernel	65
forward_maxpool_layer_kernel	94
FW MAXPOOL	154
fill_kernel	63
fill_kernel	63
im2col_gpu_kernel	91
copy_kernel	76
normalize_kernel	83
scale_bias_kernel	76
add_bias_kernel	74
activate_array_kernel	91
FW CONVOLUTIONAL	918
fill_kernel	66
fill_kernel	63
copy_kernel	77
normalize_kernel	82
scale_bias_kernel	76
add_bias_kernel	77
activate_array_kernel	88
FW CONVOLUTIONAL	785
fill_kernel	63
fill_kernel	61
im2col_gpu_kernel	93
copy_kernel	73
normalize_kernel	80
scale_bias_kernel	70
add_bias_kernel	71
activate_array_kernel	86
FW CONVOLUTIONAL	889
fill_kernel	77
fill_kernel	61
add_bias_kernel	64
activate_array_kernel	91
FW CONVOLUTIONAL	429
fill_kernel	75
copy_kernel	82
activate_array_kernel	91
activate_array_kernel	93
activate_array_kernel	91
activate_array_kernel	97
activate_array_kernel	91
activate_array_kernel	91
opencl_pull_array	14037e8	24525
FW YOLO	25489
fill_kernel	105
copy_kernel	104
FW ROUTE	184
fill_kernel	77
fill_kernel	70
copy_kernel	80
normalize_kernel	79
scale_bias_kernel	88
add_bias_kernel	79
activate_array_kernel	99
FW CONVOLUTIONAL	847
fill_kernel	70
fill_kernel	80
upsample_kernel	84
FW UPSAMPLE	279
fill_kernel	72
copy_kernel	86
copy_kernel	77
FW ROUTE	271
fill_kernel	71
fill_kernel	84
im2col_gpu_kernel	97
copy_kernel	74
normalize_kernel	77
scale_bias_kernel	79
add_bias_kernel	73
activate_array_kernel	98
FW CONVOLUTIONAL	954
fill_kernel	73
fill_kernel	76
add_bias_kernel	63
activate_array_kernel	85
FW CONVOLUTIONAL	441
fill_kernel	70
copy_kernel	86
activate_array_kernel	84
activate_array_kernel	98
activate_array_kernel	83
activate_array_kernel	84
activate_array_kernel	100
activate_array_kernel	89
opencl_pull_array	152b2b0	7621
FW YOLO	8599
opencl_pull_array	152b2b0	311
../db-dnn2/img.jpg: Predicted in 7.252359 seconds.
Object: 100%
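One way to compare runs like the two iterations above is to add up the per-layer `FW ...` totals. The following is a minimal sketch, assuming each log line is tab-separated with the timing as the last field. `parse_fw_line` and `sum_fw_ticks` are hypothetical helpers, not part of darknet, and whether these values cover the full GPU execution time is not established in this thread.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 and stores the trailing number in *ticks
 * when `line` is a per-layer total such as "FW CONVOLUTIONAL\t1123";
 * returns 0 for kernel lines, transfers, and anything else. */
int parse_fw_line(const char *line, long *ticks)
{
    if (strncmp(line, "FW ", 3) != 0) {
        return 0;
    }
    const char *tab = strrchr(line, '\t');
    if (tab == NULL) {
        return 0;
    }
    char *end = NULL;
    long value = strtol(tab + 1, &end, 10);
    if (end == tab + 1) {
        return 0; /* no digits after the last tab */
    }
    *ticks = value;
    return 1;
}

/* Sums the FW totals over an array of log lines. */
long sum_fw_ticks(const char *const *lines, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; ++i) {
        long ticks = 0;
        if (parse_fw_line(lines[i], &ticks)) {
            total += ticks;
        }
    }
    return total;
}
```

Note that in these logs the `FW YOLO` total includes the `opencl_pull_array` call that precedes it, so the pull is counted once through the layer total.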


r0l1 commented on May 10, 2024

I just enabled RPI in the Makefile and it is slightly faster:

opencl_push_array	a4ae8008	2446
fill_kernel	120
fill_kernel	77
im2col_gpu_kernel	105
gemm_kernel	93
copy_kernel	88
normalize_kernel	85
scale_bias_kernel	87
add_bias_kernel	74
activate_array_kernel	103
FW CONVOLUTIONAL	1108
fill_kernel	77
forward_maxpool_layer_kernel	100
FW MAXPOOL	173
fill_kernel	72
fill_kernel	104
im2col_gpu_kernel	94
gemm_kernel	87
copy_kernel	85
normalize_kernel	90
scale_bias_kernel	78
add_bias_kernel	84
activate_array_kernel	102
FW CONVOLUTIONAL	1099
fill_kernel	76
forward_maxpool_layer_kernel	95
FW MAXPOOL	151
fill_kernel	78
fill_kernel	74
im2col_gpu_kernel	93
gemm_kernel	93
copy_kernel	83
normalize_kernel	84
scale_bias_kernel	69
add_bias_kernel	82
activate_array_kernel	105
FW CONVOLUTIONAL	1027
fill_kernel	90
forward_maxpool_layer_kernel	104
FW MAXPOOL	171
fill_kernel	73
fill_kernel	72
im2col_gpu_kernel	92
gemm_kernel	92
copy_kernel	86
normalize_kernel	96
scale_bias_kernel	76
add_bias_kernel	87
activate_array_kernel	90
FW CONVOLUTIONAL	1042
fill_kernel	74
forward_maxpool_layer_kernel	107
FW MAXPOOL	168
fill_kernel	74
fill_kernel	76
im2col_gpu_kernel	87
gemm_kernel	92
copy_kernel	87
normalize_kernel	83
scale_bias_kernel	76
add_bias_kernel	75
activate_array_kernel	100
FW CONVOLUTIONAL	1036
fill_kernel	70
forward_maxpool_layer_kernel	107
FW MAXPOOL	167
fill_kernel	77
fill_kernel	72
im2col_gpu_kernel	92
gemm_kernel	103
copy_kernel	92
normalize_kernel	95
scale_bias_kernel	75
add_bias_kernel	79
activate_array_kernel	100
FW CONVOLUTIONAL	1074
fill_kernel	66
forward_maxpool_layer_kernel	92
FW MAXPOOL	156
fill_kernel	62
fill_kernel	62
im2col_gpu_kernel	89
gemm_kernel	80
copy_kernel	83
normalize_kernel	88
scale_bias_kernel	75
add_bias_kernel	77
activate_array_kernel	91
FW CONVOLUTIONAL	1002
fill_kernel	65
fill_kernel	61
gemm_kernel	85
copy_kernel	80
normalize_kernel	81
scale_bias_kernel	77
add_bias_kernel	86
activate_array_kernel	89
FW CONVOLUTIONAL	857
fill_kernel	64
fill_kernel	62
im2col_gpu_kernel	91
gemm_kernel	86
copy_kernel	82
normalize_kernel	83
scale_bias_kernel	75
add_bias_kernel	75
activate_array_kernel	89
FW CONVOLUTIONAL	973
fill_kernel	63
fill_kernel	61
gemm_kernel	100
add_bias_kernel	77
activate_array_kernel	88
FW CONVOLUTIONAL	512
fill_kernel	66
copy_kernel	69
activate_array_kernel	93
activate_array_kernel	96
activate_array_kernel	94
activate_array_kernel	92
activate_array_kernel	93
activate_array_kernel	89
opencl_pull_array	13d05e8	23169
FW YOLO	24129
fill_kernel	93
copy_kernel	80
FW ROUTE	165
fill_kernel	85
fill_kernel	73
gemm_kernel	86
copy_kernel	80
normalize_kernel	83
scale_bias_kernel	79
add_bias_kernel	69
activate_array_kernel	90
FW CONVOLUTIONAL	860
fill_kernel	64
fill_kernel	62
upsample_kernel	77
FW UPSAMPLE	250
fill_kernel	64
copy_kernel	70
copy_kernel	83
FW ROUTE	258
fill_kernel	64
fill_kernel	65
im2col_gpu_kernel	93
gemm_kernel	88
copy_kernel	82
normalize_kernel	87
scale_bias_kernel	93
add_bias_kernel	96
activate_array_kernel	86
FW CONVOLUTIONAL	1070
fill_kernel	69
fill_kernel	68
gemm_kernel	91
add_bias_kernel	73
activate_array_kernel	90
FW CONVOLUTIONAL	506
fill_kernel	68
copy_kernel	81
activate_array_kernel	91
activate_array_kernel	89
activate_array_kernel	92
activate_array_kernel	88
activate_array_kernel	85
activate_array_kernel	85
opencl_pull_array	14f8098	7321
FW YOLO	8267
opencl_pull_array	14f8098	271
../db-dnn2/img.jpg: Predicted in 6.395759 seconds.
Object: 100%


r0l1 commented on May 10, 2024

Hmm, there is not a big difference. If I understand it right, the bottleneck is the link between CPU and GPU?

opencl_push_array	a353d008	2429
fill_kernel	108
fill_kernel	75
im2col_gpu_kernel	100
copy_kernel	70
normalize_kernel	88
scale_bias_kernel	87
add_bias_kernel	86
activate_array_kernel	98
FW CONVOLUTIONAL	1100
fill_kernel	97
forward_maxpool_layer_kernel	117
FW MAXPOOL	179
fill_kernel	75
fill_kernel	89
im2col_gpu_kernel	93
copy_kernel	65
normalize_kernel	80
scale_bias_kernel	82
add_bias_kernel	87
activate_array_kernel	98
FW CONVOLUTIONAL	1047
fill_kernel	69
forward_maxpool_layer_kernel	109
FW MAXPOOL	170
fill_kernel	74
fill_kernel	87
im2col_gpu_kernel	102
copy_kernel	68
normalize_kernel	82
scale_bias_kernel	77
add_bias_kernel	88
activate_array_kernel	91
FW CONVOLUTIONAL	1020
fill_kernel	64
forward_maxpool_layer_kernel	92
FW MAXPOOL	154
fill_kernel	64
fill_kernel	65
im2col_gpu_kernel	92
copy_kernel	75
normalize_kernel	92
scale_bias_kernel	75
add_bias_kernel	78
activate_array_kernel	94
FW CONVOLUTIONAL	935
fill_kernel	63
forward_maxpool_layer_kernel	91
FW MAXPOOL	154
fill_kernel	65
fill_kernel	65
im2col_gpu_kernel	92
copy_kernel	74
normalize_kernel	80
scale_bias_kernel	75
add_bias_kernel	83
activate_array_kernel	90
FW CONVOLUTIONAL	923
fill_kernel	67
forward_maxpool_layer_kernel	92
FW MAXPOOL	149
fill_kernel	64
fill_kernel	65
im2col_gpu_kernel	90
copy_kernel	72
normalize_kernel	83
scale_bias_kernel	90
add_bias_kernel	80
activate_array_kernel	93
FW CONVOLUTIONAL	928
fill_kernel	66
forward_maxpool_layer_kernel	92
FW MAXPOOL	153
fill_kernel	64
fill_kernel	63
im2col_gpu_kernel	92
copy_kernel	78
normalize_kernel	83
scale_bias_kernel	77
add_bias_kernel	76
activate_array_kernel	90
FW CONVOLUTIONAL	917
fill_kernel	64
fill_kernel	59
copy_kernel	75
normalize_kernel	81
scale_bias_kernel	76
add_bias_kernel	75
activate_array_kernel	91
FW CONVOLUTIONAL	787
fill_kernel	66
fill_kernel	63
im2col_gpu_kernel	90
copy_kernel	73
normalize_kernel	83
scale_bias_kernel	75
add_bias_kernel	79
activate_array_kernel	87
FW CONVOLUTIONAL	910
fill_kernel	78
fill_kernel	61
add_bias_kernel	65
activate_array_kernel	91
FW CONVOLUTIONAL	424
fill_kernel	64
copy_kernel	66
activate_array_kernel	90
activate_array_kernel	90
activate_array_kernel	92
activate_array_kernel	90
activate_array_kernel	106
activate_array_kernel	92
FW YOLO	911
fill_kernel	63
copy_kernel	74
FW ROUTE	142
fill_kernel	65
fill_kernel	64
copy_kernel	78
normalize_kernel	82
scale_bias_kernel	78
add_bias_kernel	75
activate_array_kernel	88
FW CONVOLUTIONAL	769
fill_kernel	65
fill_kernel	64
upsample_kernel	77
FW UPSAMPLE	253
fill_kernel	63
copy_kernel	71
copy_kernel	83
FW ROUTE	254
fill_kernel	63
fill_kernel	69
im2col_gpu_kernel	89
copy_kernel	75
normalize_kernel	90
scale_bias_kernel	77
add_bias_kernel	76
activate_array_kernel	87
FW CONVOLUTIONAL	939
fill_kernel	65
fill_kernel	63
add_bias_kernel	65
activate_array_kernel	91
FW CONVOLUTIONAL	424
fill_kernel	65
copy_kernel	82
activate_array_kernel	94
activate_array_kernel	88
activate_array_kernel	91
activate_array_kernel	89
activate_array_kernel	90
activate_array_kernel	88
FW YOLO	917
opencl_pull_array	152cf00	32873
../db-dnn2/img.jpg: Predicted in 7.266204 seconds.
Object: 100%


sowson commented on May 10, 2024

@r0l1 The last value in each line is the time. The issue is the copy between VRAM and RAM. Take a look at the last line: on a good, fast PC it can be 0, and you have 32873 for pulling memory from VRAM to RAM. :)

opencl_pull_array 152cf00 32873

Thanks!
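If the benchmark values are raw `clock()` ticks (an assumption; the thread does not state the unit explicitly), they can be converted to milliseconds with the standard `CLOCKS_PER_SEC` constant. `ticks_to_ms` is a hypothetical helper used only for this illustration.

```c
#include <time.h>

/* Hypothetical helper: converts a clock() tick count to milliseconds.
 * Assumes the logged value came from clock(), which the thread
 * does not confirm. */
double ticks_to_ms(long ticks)
{
    return (double)ticks * 1000.0 / (double)CLOCKS_PER_SEC;
}
```

On systems where `CLOCKS_PER_SEC` is 1000000, a value of 32873 would correspond to roughly 33 ms under this assumption.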


sowson commented on May 10, 2024

@r0l1 And how about RPI=1 ?


r0l1 commented on May 10, 2024

@sowson Understood. 32873 clock ticks is quite a lot... I had hoped to reach at least 1 fps for video stream analysis... The folks at the ODroid forum reached 3 fps with another approach.

RPI=1

opencl_push_array	a4a3d008	2453
fill_kernel	121
fill_kernel	77
im2col_gpu_kernel	101
gemm_kernel	103
copy_kernel	94
normalize_kernel	88
scale_bias_kernel	81
add_bias_kernel	81
activate_array_kernel	105
FW CONVOLUTIONAL	1109
fill_kernel	79
forward_maxpool_layer_kernel	116
FW MAXPOOL	176
fill_kernel	79
fill_kernel	71
im2col_gpu_kernel	107
gemm_kernel	76
copy_kernel	68
normalize_kernel	80
scale_bias_kernel	70
add_bias_kernel	69
activate_array_kernel	84
FW CONVOLUTIONAL	961
fill_kernel	64
forward_maxpool_layer_kernel	115
FW MAXPOOL	165
fill_kernel	68
fill_kernel	63
im2col_gpu_kernel	83
gemm_kernel	86
copy_kernel	70
normalize_kernel	71
scale_bias_kernel	67
add_bias_kernel	68
activate_array_kernel	84
FW CONVOLUTIONAL	861
fill_kernel	78
forward_maxpool_layer_kernel	92
FW MAXPOOL	143
fill_kernel	64
fill_kernel	63
im2col_gpu_kernel	80
gemm_kernel	84
copy_kernel	69
normalize_kernel	73
scale_bias_kernel	66
add_bias_kernel	72
activate_array_kernel	82
FW CONVOLUTIONAL	855
fill_kernel	67
forward_maxpool_layer_kernel	89
FW MAXPOOL	139
fill_kernel	66
fill_kernel	67
im2col_gpu_kernel	84
gemm_kernel	78
copy_kernel	72
normalize_kernel	72
scale_bias_kernel	71
add_bias_kernel	69
activate_array_kernel	84
FW CONVOLUTIONAL	870
fill_kernel	64
forward_maxpool_layer_kernel	90
FW MAXPOOL	139
fill_kernel	64
fill_kernel	63
im2col_gpu_kernel	79
gemm_kernel	77
copy_kernel	71
normalize_kernel	71
scale_bias_kernel	66
add_bias_kernel	69
activate_array_kernel	93
FW CONVOLUTIONAL	861
fill_kernel	69
forward_maxpool_layer_kernel	90
FW MAXPOOL	138
fill_kernel	65
fill_kernel	62
im2col_gpu_kernel	81
gemm_kernel	77
copy_kernel	82
normalize_kernel	72
scale_bias_kernel	68
add_bias_kernel	68
activate_array_kernel	81
FW CONVOLUTIONAL	857
fill_kernel	65
fill_kernel	61
gemm_kernel	78
copy_kernel	77
normalize_kernel	70
scale_bias_kernel	78
add_bias_kernel	80
activate_array_kernel	92
FW CONVOLUTIONAL	788
fill_kernel	73
fill_kernel	70
im2col_gpu_kernel	88
gemm_kernel	89
copy_kernel	80
normalize_kernel	80
scale_bias_kernel	89
add_bias_kernel	75
activate_array_kernel	87
FW CONVOLUTIONAL	928
fill_kernel	75
fill_kernel	60
gemm_kernel	92
add_bias_kernel	68
activate_array_kernel	81
FW CONVOLUTIONAL	442
fill_kernel	69
copy_kernel	66
activate_array_kernel	83
activate_array_kernel	81
activate_array_kernel	84
activate_array_kernel	82
activate_array_kernel	83
activate_array_kernel	82
FW YOLO	786
fill_kernel	65
copy_kernel	71
FW ROUTE	119
fill_kernel	64
fill_kernel	63
gemm_kernel	78
copy_kernel	71
normalize_kernel	73
scale_bias_kernel	67
add_bias_kernel	67
activate_array_kernel	87
FW CONVOLUTIONAL	748
fill_kernel	65
fill_kernel	62
upsample_kernel	71
FW UPSAMPLE	219
fill_kernel	66
copy_kernel	69
copy_kernel	71
FW ROUTE	217
fill_kernel	63
fill_kernel	61
im2col_gpu_kernel	83
gemm_kernel	77
copy_kernel	73
normalize_kernel	81
scale_bias_kernel	74
add_bias_kernel	68
activate_array_kernel	79
FW CONVOLUTIONAL	875
fill_kernel	65
fill_kernel	64
gemm_kernel	83
add_bias_kernel	67
activate_array_kernel	80
FW CONVOLUTIONAL	433
fill_kernel	66
copy_kernel	69
activate_array_kernel	80
activate_array_kernel	81
activate_array_kernel	79
activate_array_kernel	84
activate_array_kernel	77
activate_array_kernel	80
FW YOLO	774
opencl_pull_array	14f98e0	27580
../db-dnn2/img.jpg: Predicted in 6.348443 seconds.
Object: 100%
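The reported per-image prediction time translates directly into a frame rate, since fps = 1 / seconds per frame. A minimal sketch follows; `fps_from_seconds` is a hypothetical helper introduced only for this illustration.

```c
/* Hypothetical helper: frames per second for a given time per frame.
 * Returns 0.0 for non-positive input to avoid dividing by zero. */
double fps_from_seconds(double seconds_per_frame)
{
    if (seconds_per_frame <= 0.0) {
        return 0.0;
    }
    return 1.0 / seconds_per_frame;
}
```

For the 6.348443 seconds reported above, this gives roughly 0.16 fps, well short of the 1 fps target.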


sowson commented on May 10, 2024

@r0l1 I am now studying https://rocm-documentation.readthedocs.io/en/latest/Programming_Guides/Opencl-optimization.html and the "1.3.4.1 Zero Copy Memory Objects" part is interesting. I will see what I can do with it.


r0l1 commented on May 10, 2024

Sounds great! I'll have a detailed look at the document this weekend. Maybe I can help...


sowson commented on May 10, 2024

Maybe a good starting point for you..?
MemMap.patch.txt
;-).

Thx!


r0l1 commented on May 10, 2024

That was fast! Thanks! Sadly this didn't make any difference. It was worth a try.
I'll test one more thing and if this doesn't succeed, then this device might not be suitable for yolo...

opencl_push_array	a353d008	28
could not push array to device. error: CL_INVALID_OPERATION
fill_kernel	173
fill_kernel	85
im2col_gpu_kernel	111
copy_kernel	72
normalize_kernel	82
scale_bias_kernel	87
add_bias_kernel	76
activate_array_kernel	97
FW CONVOLUTIONAL	1103
fill_kernel	75
forward_maxpool_layer_kernel	105
FW MAXPOOL	174
fill_kernel	76
fill_kernel	76
im2col_gpu_kernel	90
copy_kernel	66
normalize_kernel	79
scale_bias_kernel	82
add_bias_kernel	73
activate_array_kernel	91
FW CONVOLUTIONAL	999
fill_kernel	65
forward_maxpool_layer_kernel	96
FW MAXPOOL	159
fill_kernel	64
fill_kernel	81
im2col_gpu_kernel	88
copy_kernel	77
normalize_kernel	83
scale_bias_kernel	79
add_bias_kernel	88
activate_array_kernel	96
FW CONVOLUTIONAL	966
fill_kernel	64
forward_maxpool_layer_kernel	91
FW MAXPOOL	156
fill_kernel	64
fill_kernel	62
im2col_gpu_kernel	104
copy_kernel	83
normalize_kernel	90
scale_bias_kernel	77
add_bias_kernel	76
activate_array_kernel	92
FW CONVOLUTIONAL	979
fill_kernel	65
forward_maxpool_layer_kernel	90
FW MAXPOOL	151
fill_kernel	64
fill_kernel	62
im2col_gpu_kernel	106
copy_kernel	69
normalize_kernel	82
scale_bias_kernel	74
add_bias_kernel	77
activate_array_kernel	93
FW CONVOLUTIONAL	942
fill_kernel	65
forward_maxpool_layer_kernel	94
FW MAXPOOL	157
fill_kernel	65
fill_kernel	62
im2col_gpu_kernel	91
copy_kernel	70
normalize_kernel	83
scale_bias_kernel	87
add_bias_kernel	81
activate_array_kernel	89
FW CONVOLUTIONAL	928
fill_kernel	65
forward_maxpool_layer_kernel	99
FW MAXPOOL	161
fill_kernel	64
fill_kernel	63
im2col_gpu_kernel	90
copy_kernel	73
normalize_kernel	85
scale_bias_kernel	75
add_bias_kernel	75
activate_array_kernel	89
FW CONVOLUTIONAL	915
fill_kernel	64
fill_kernel	64
copy_kernel	70
normalize_kernel	86
scale_bias_kernel	68
add_bias_kernel	68
activate_array_kernel	82
FW CONVOLUTIONAL	741
fill_kernel	64
fill_kernel	64
im2col_gpu_kernel	82
copy_kernel	66
normalize_kernel	70
scale_bias_kernel	68
add_bias_kernel	67
activate_array_kernel	81
FW CONVOLUTIONAL	813
fill_kernel	78
fill_kernel	65
add_bias_kernel	59
activate_array_kernel	83
FW CONVOLUTIONAL	402
fill_kernel	64
copy_kernel	67
activate_array_kernel	81
activate_array_kernel	84
activate_array_kernel	83
activate_array_kernel	82
activate_array_kernel	85
activate_array_kernel	82
FW YOLO	790
fill_kernel	64
copy_kernel	72
FW ROUTE	122
fill_kernel	64
fill_kernel	62
copy_kernel	65
normalize_kernel	73
scale_bias_kernel	72
add_bias_kernel	67
activate_array_kernel	80
FW CONVOLUTIONAL	695
fill_kernel	65
fill_kernel	66
upsample_kernel	78
FW UPSAMPLE	250
fill_kernel	64
copy_kernel	72
copy_kernel	81
FW ROUTE	257
fill_kernel	62
fill_kernel	63
im2col_gpu_kernel	91
copy_kernel	77
normalize_kernel	83
scale_bias_kernel	74
add_bias_kernel	82
activate_array_kernel	96
FW CONVOLUTIONAL	947
fill_kernel	65
fill_kernel	62
add_bias_kernel	63
activate_array_kernel	87
FW CONVOLUTIONAL	424
fill_kernel	76
copy_kernel	80
activate_array_kernel	95
activate_array_kernel	87
activate_array_kernel	90
activate_array_kernel	90
activate_array_kernel	89
activate_array_kernel	91
FW YOLO	897
opencl_pull_array	152aee0	29886
../db-dnn2/img.jpg: Predicted in 7.195652 seconds.
Object: 100%


sowson commented on May 10, 2024

That is the best I can do. I am testing that it works now before committing to the repo. It is not as fast as I expected.
MemMapTest.patch.txt
Thx!


sowson commented on May 10, 2024

After careful testing, I pushed a commit based on the patch to the repo. It turns out that training stability is now much better. I think this is all I can do on this issue, at least for now.


r0l1 commented on May 10, 2024

@PeterQuinn925 They made all sorts of statements on their website:

Visual Attention detects interesting things in the world, running at 73 frames/s on the JeVois smart camera's quad-core processor.

TensorFlow: recognize 1000 different types of objects at up to 83 frames/second using deep learning

Darknet and Darknet YOLO: detect and recognize up to 1000 different types of objects using deep neural networks

I expected that YOLO would run with similarly good performance. However, after digging into the source code, I found the following statement (Source):

The YOLO network is currently quite slow, hence it is only run once in a while.

@sowson Thank you for all your efforts! Right now I don't have access to my testing device (ODroid), but next week I'll give you feedback on the performance with the latest source.

Why did you set the compiler optimization flag to -O0? It could be removed now, if you agree.

Edit: I'll close this issue after the benchmark.


prnvjb commented on May 10, 2024

Hi everyone,
How about implementing an OpenCL + NNPACK variant for this? I think it would, hopefully, increase fps at least on OpenCL-supported SoCs.


r0l1 commented on May 10, 2024

@sowson I just tested the new source on the ODroid. Sadly, there are no improvements. Thanks for your support! Please feel free to close this issue.


rajhlinux commented on May 10, 2024

r0l1 "Looking at the jevoisinc camera module, which has a very similar GPU, I can't understand, how they manage to get this insane framerate"

After looking at the video (around 1:02):
https://youtu.be/aJp-mIBytno

... jevoisinc mentions that it uses built-in NPU (Tensor Core) hardware. This is proper AI hardware (a matrix multiplier) that boosts AI computation performance, similar to how Nvidia RTX GPUs have built-in Tensor Cores. This means the jevoisinc is not relying heavily on the Mali GPU for AI computation. Regular GPU cores are not ideal for AI, but they do help offload the CPU. If your SBC does not have an NPU (tensor cores), do not expect great outcomes. I hope this answers your question.

That was a good find about jevoisinc; they are using Allwinner SoCs. I highly recommend MediaTek SoCs, which also have built-in Tensor Cores that they call APUs. I believe there are some dev boards out there for MediaTek.

