
Comments (5)

mdberryh commented on August 24, 2024

I have the AMD Radeon 6600 Pro, and I got much worse results. I wondered if that was from using ROCm, but I'm not using it. I also see that the benchmarks you're talking about are using ROCm and PlaidML. When I was trying to get ROCm working, it sounded like it didn't support consumer GPUs, but I'm now reading there is some support... I dunno.

@-desktop:~$ plaidbench --examples 2048 --batch-size 16 keras --no-fp16 --no-train mobilenet
Running 2048 examples with mobilenet, batch size 16, on backend plaid
INFO:plaidml:Opening device "opencl_amd_gfx1032.0"
Compiling network... Warming up... Running...
Example finished, elapsed: 8.574s (compile), 10.313s (execution)

-----------------------------------------------------------------------------------------
Network Name         Inference Latency         Time / FPS          
-----------------------------------------------------------------------------------------
mobilenet            5.04 ms                   3.96 ms / 252.74 fps
Correctness: untested. Could not find golden data to compare against.
@desktop:~$ plaidbench --examples 2048 --batch-size 16 keras --no-fp16 --no-train resnet50
Running 2048 examples with resnet50, batch size 16, on backend plaid
INFO:plaidml:Opening device "opencl_amd_gfx1032.0"
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels.h5
102858752/102853048 [==============================] - 59s 1us/step
Compiling network... Warming up... Running...
Example finished, elapsed: 8.874s (compile), 43.838s (execution)

-----------------------------------------------------------------------------------------
Network Name         Inference Latency         Time / FPS          
-----------------------------------------------------------------------------------------
resnet50             21.41 ms                  19.53 ms / 51.21 fps
Correctness: untested. Could not find golden data to compare against.
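
Side note on the logs above: the INFO:plaidml:Opening device "opencl_amd_gfx1032.0" line means PlaidML is going through its OpenCL backend here rather than ROCm. If more than one device is available, the bundled plaidml-setup tool can list them and let you pick one; the output below is only a rough sketch of what that typically looks like, not taken from this machine:

$ plaidml-setup
# interactively lists every device PlaidML detects, for example
#   1 : opencl_amd_gfx1032.0
#   2 : llvm_cpu.0
# and saves the chosen device to PlaidML's user settings file (typically ~/.plaidml)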


mdberryh commented on August 24, 2024

I've actually checked another site and noticed they didn't mention batch sizes, but with a batch size of 1 the performance increased a lot, so they might have left it at the defaults: https://openbenchmarking.org/test/pts/plaidml&eval=31492f06de09eca1672491c6b1484ffae4f2df19#metrics

@-desktop:~$ plaidbench  keras --no-fp16 --no-train mobilenet
Running 1024 examples with mobilenet, batch size 1, on backend plaid
INFO:plaidml:Opening device "opencl_amd_gfx1032.0"
Compiling network... Warming up... Running...
Example finished, elapsed: 9.348s (compile), 17.516s (execution)

-----------------------------------------------------------------------------------------
Network Name         Inference Latency         Time / FPS          
-----------------------------------------------------------------------------------------
mobilenet            17.11 ms                  2.81 ms / 356.46 fps
Correctness: PASS, max_error: 7.314303729799576e-06, max_abs_error: 6.407499313354492e-07, fail_ratio: 0.0
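
If it helps, a quick way to see where the slowdown actually kicks in (rather than only comparing batch size 1 against 16) is to sweep the batch size; this is just a sketch reusing the same flags as above, not something that was run for this thread:

for bs in 1 2 4 8 16; do
    plaidbench --examples 2048 --batch-size "$bs" keras --no-fp16 --no-train mobilenet
done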


tedliosu commented on August 24, 2024

I have the AMD Radeon 6600 Pro, and I got much worse results. I wondered if that was from using ROCm, but I'm not using it. I also see that the benchmarks you're talking about are using ROCm and PlaidML. When I was trying to get ROCm working, it sounded like it didn't support consumer GPUs, but I'm now reading there is some support... I dunno.

Here's the thing @mdberryh - since (I think) around mid-December of last year (2021), the non-ROCm pro drivers (which have been posted to AMD's support website every quarter or so) have been replaced by, or completely merged with, the rocm-dkms drivers from AMD's ROCm stack. So unless my observations and conclusions here are wrong, it's no longer technically possible to install a set of Radeon pro drivers onto a Linux distro without also pulling in at least one or two packages from AMD's ROCm repositories. Hopefully that makes sense.
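
If you want to confirm that on a given install, a rough check on a Debian/Ubuntu-based distro is just to list what the driver install pulled in (the package-name patterns here are an assumption; adjust for other distros and package managers):

$ dpkg -l | grep -i -E 'rocm|hip|hsa'
$ dpkg -l | grep -i amdgpu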

Also, before the apparent merge I mentioned above, I had been using a Vega 56 and was running ROCm and the other pro drivers just fine on that card, as ROCm has supported at least some consumer GPUs for quite a while now (see this documentation for more details). What you might have been referring to is that AMD ROCm doesn't officially support the iGPUs inside their APUs, and that AMD has been excruciatingly slow in getting Navi 1X/Navi 2X support implemented in ROCm (in fact, for some reason I still don't see Navi 1X support officially listed in the documentation I've linked above).
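
For checking what the ROCm and OpenCL stacks actually see on a given card, something like the following should work - rocminfo ships with the ROCm stack and clinfo with the usual OpenCL tooling; the grep patterns are only guesses at the relevant output lines:

$ rocminfo | grep -i gfx           # ROCm agents by gfx target, e.g. the gfx1032 seen in the logs above
$ clinfo | grep -i "device name"   # OpenCL devices, which is the path PlaidML is using here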


tedliosu commented on August 24, 2024

I've actually checked another site and noticed they didn't mention batch sizes, but with a batch size of 1 the performance increased a lot, so they might have left it at the defaults: https://openbenchmarking.org/test/pts/plaidml&eval=31492f06de09eca1672491c6b1484ffae4f2df19#metrics

Actually @mdberryh, if you go to that same website you linked and click on "View Source" for any of the listed test definitions (for example this latest definition), you'll see that Phoronix's benchmarks do in fact all use a batch size of 16 (scroll down to the "--examples 2048 --batch-size..." line under "test-definition.xml" to see what I mean). So unless you're talking about a different benchmarking-results site other than Phoronix's openbenchmarking.org, or unless the vast majority of people running the plaidml portion of the Phoronix Test Suite manually edited their test-definition files (which I highly doubt), my original complaint about the performance issues we're facing when the batch size is set to 16 in these plaidml benchmarks is still valid.

Also, have you tried running batch size = 1 for all of the other neural network benchmarks in plaidml (e.g. plaidbench keras --no-fp16 --no-train resnet50)? After doing that on my machine, I noticed that mobilenet was the only network where decreasing the batch size increased performance. So you might want to double-check that decreasing the batch size also increases performance for the other networks on your machine, to make sure there isn't something wonky going on in the underlying plaidml software stack - which, judging from the benchmarking results I got, I strongly suspect there is.
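
For what it's worth, here is a sketch of that comparison restricted to the two networks that appear in this thread, with each run's output tee'd to a log file so the batch sizes can be compared side by side (the filenames are arbitrary):

for net in mobilenet resnet50; do
    for bs in 1 16; do
        plaidbench --examples 2048 --batch-size "$bs" keras --no-fp16 --no-train "$net" | tee "${net}_bs${bs}.log"
    done
done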


tedliosu commented on August 24, 2024

Since I broke the system containing my RX 6800 while attempting to upgrade its memory, and I no longer have the time or energy to maintain my own desktop system, I just sold my RX 6800 (my only AMD GPU). Since I therefore won't be able to reproduce any potential fix for this issue anymore, I'm closing it for the time being. I'll be more than willing to reopen it if anyone else runs into the same issue I did.
