plaidml / plaidml

PlaidML is a framework for making deep learning work everywhere.

Home Page: https://ai.intel.com/plaidml

License: Apache License 2.0

Python 16.89% C++ 69.73% C 0.39% CMake 2.60% Makefile 0.02% MLIR 10.30% TeX 0.07%

plaidml's Issues

No matching distribution error from pip

Hello, I'm not able to install this using pip and am having even less luck compiling bazel to do it from source.

I've tried python version 2.7 and 3.6.

How can I install this?

How do I get PlaidML to run without the GPU?

I'm trying to benchmark PlaidML on the CPU, without the GPU. I tried answering "n" in plaidml-setup, but it goes ahead and uses the GPU anyhow:

PlaidML Setup (0.0.0.dev0)

Thanks for using PlaidML!

Some Notes:
  * Bugs and other issues: https://github.com/plaidml/plaidml
  * Questions: https://stackoverflow.com/questions/tagged/plaidml
  * Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
  * PlaidML is licensed under the GNU AGPLv3
 
Default Config Devices:
   No devices.

Experimental Config Devices:
   bonaire.0 : Advanced Micro Devices, Inc. Bonaire

Using experimental devices can cause poor performance, crashes, and other nastiness.

Enable experimental device support? (y,n)[n]:n

No devices available in chosen config. Rerun plaidml-setup.
Using PlaidML backend.

Current network being run : inception_v3
Running 256 examples with inception_v3, batch size 1
INFO:plaidml:b'Opening device "bonaire.0"'

Python 3 missing in setup.cfg

Is Python 3 supported? It is not listed in setup.cfg, but Python 2.7 is.

[RNN] Doesn't work with the following examples from the Keras repository

lstm_text_generation.py
babi_memnn.py

This is the error I get for the LSTM language model:

plaidml.keras.backend.PlaidMLKerasException: Applying function, tensor with mismatching dimensionality: X, expected=2, got=3
Traceback:
_Op(dot)
outputs: R
inputs:
  X: _Op(rnn_lo)
  outputs: B
  inputs:
    A: _Op(concatenate)
    outputs: O
    inputs:
      I1: _Op(reshape)
      outputs: O
      inputs:
        I: _Op(+)
        outputs: A
        inputs:
          ...truncated...
      I0: _Op(reshape)
      outputs: O
      inputs:
        I: _Op(+)
        outputs: A
        inputs:
          ...truncated...
      I3: _Op(reshape)
      outputs: O
      inputs:
        I: _Op(+)
        outputs: A
        inputs:
          ...truncated...
      I2: _Op(reshape)
      outputs: O
      inputs:
        I: _Op(+)
        outputs: A
        inputs:
          ...truncated...
  Y: dense_1/kernel[128, 59]
  dense_1/kernel[128, 59]

The output is similar for the memory network example.

Overall thanks for the effort though.
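
The error says that `dot` received a rank-3 `X` where rank 2 was expected, with `Y` being the `dense_1/kernel[128, 59]` weight. For reference, the NumPy sketch below shows the contraction a dense layer applied to a sequence would need: the last axis of a rank-3 input against the first axis of the kernel. The batch and timestep sizes (4 and 7) and the helper name are made up for illustration, and this is not PlaidML's implementation.

```python
import numpy as np

def dot_last_axis(x, kernel):
    """Contract the last axis of x with the first axis of a 2-D kernel."""
    if x.shape[-1] != kernel.shape[0]:
        raise ValueError("inner dimensions do not match")
    # Collapse all leading axes so the product is an ordinary 2-D matmul,
    # then restore them on the result.
    flat = x.reshape(-1, x.shape[-1])
    out = flat @ kernel
    return out.reshape(x.shape[:-1] + (kernel.shape[1],))

# Hypothetical sizes: batch=4, timesteps=7.
# 128 and 59 come from the kernel shape in the traceback.
x = np.random.rand(4, 7, 128)
kernel = np.random.rand(128, 59)
y = dot_last_axis(x, kernel)
print(y.shape)  # (4, 7, 59)
```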

Error Building on OS X 10.11.6

I'm trying to build 'plaidml' on a Mac running OS X 10.11.6.
I'm getting the following error:

ERROR: /Users/davidlaxer/plaidml/tile/lang/BUILD:141:1: Executing genrule //tile/lang:grammar failed (Exit 1).
/bin/bash: /usr/local/opt/bison/bin/bison: No such file or directory

$ bazel version
Build label: 0.7.0
Build target: bazel-out/darwin_x86_64-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Wed Oct 18 14:25:46 2017 (1508336746)
Build timestamp: 1508336746
Build timestamp as int: 1508336746

$ bazel build -c opt plaidml:wheel plaidml/keras:wheel

...............
WARNING: The lower priority option '-c opt' does not override the previous value '-c opt'.
INFO: Found 2 targets...
INFO: From BuildWheel plaidml/keras/wheel.pkg/dist/plaidml_keras-0.0.0.dev0-py2.py3-none-any.whl:
running bdist_wheel
running build
running build_py
creating build
creating build/lib
creating build/lib/plaidml
creating build/lib/plaidml/keras
copying pkg/plaidml/keras/__init__.py -> build/lib/plaidml/keras
copying pkg/plaidml/keras/backend.py -> build/lib/plaidml/keras
installing to build/bdist.macosx-10.7-x86_64/wheel
running install
running install_lib
creating build/bdist.macosx-10.7-x86_64
creating build/bdist.macosx-10.7-x86_64/wheel
creating build/bdist.macosx-10.7-x86_64/wheel/plaidml
creating build/bdist.macosx-10.7-x86_64/wheel/plaidml/keras
copying build/lib/plaidml/keras/__init__.py -> build/bdist.macosx-10.7-x86_64/wheel/plaidml/keras
copying build/lib/plaidml/keras/backend.py -> build/bdist.macosx-10.7-x86_64/wheel/plaidml/keras
running install_egg_info
running egg_info
creating pkg/plaidml_keras.egg-info
writing requirements to pkg/plaidml_keras.egg-info/requires.txt
writing pkg/plaidml_keras.egg-info/PKG-INFO
writing top-level names to pkg/plaidml_keras.egg-info/top_level.txt
writing dependency_links to pkg/plaidml_keras.egg-info/dependency_links.txt
writing entry points to pkg/plaidml_keras.egg-info/entry_points.txt
writing manifest file 'pkg/plaidml_keras.egg-info/SOURCES.txt'
reading manifest file 'pkg/plaidml_keras.egg-info/SOURCES.txt'
writing manifest file 'pkg/plaidml_keras.egg-info/SOURCES.txt'
Copying pkg/plaidml_keras.egg-info to build/bdist.macosx-10.7-x86_64/wheel/plaidml_keras-0.0.0.dev0-py2.7.egg-info
running install_scripts
creating build/bdist.macosx-10.7-x86_64/wheel/plaidml_keras-0.0.0.dev0.dist-info/WHEEL
INFO: From Compiling external/protobuf/src/google/protobuf/compiler/js/embed.cc [for host]:
external/protobuf/src/google/protobuf/compiler/js/embed.cc:37:12: warning: unused variable 'output_file' [-Wunused-const-variable]
const char output_file[] = "well_known_types_embed.cc";
^
1 warning generated.
ERROR: /Users/davidlaxer/plaidml/tile/lang/BUILD:141:1: Executing genrule //tile/lang:grammar failed (Exit 1).
/bin/bash: /usr/local/opt/bison/bin/bison: No such file or directory
INFO: Elapsed time: 359.894s, Critical Path: 10.72s
David-Laxers-MacBook-Pro:plaidml davidlaxer$
David-Laxers-MacBook-Pro:plaidml davidlaxer$ sudo port install bison
Password:
---> Computing dependencies for bison
---> Cleaning bison
---> Scanning binaries for linking errors
---> No broken files found.
David-Laxers-MacBook-Pro:plaidml davidlaxer$ bison
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/bison: missing operand after `/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/bison'
Try `/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/bison --help' for more information.
David-Laxers-MacBook-Pro:plaidml davidlaxer$ bazel build -c opt plaidml:wheel plaidml/keras:wheel
WARNING: The lower priority option '-c opt' does not override the previous value '-c opt'.
INFO: Found 2 targets...
ERROR: /Users/davidlaxer/plaidml/tile/lang/BUILD:141:1: Executing genrule //tile/lang:grammar failed (Exit 1).
/bin/bash: /usr/local/opt/bison/bin/bison: No such file or directory
INFO: Elapsed time: 28.168s, Critical Path: 8.71s

$ which bison
/usr/bin/bison

$ bison --version
bison (GNU Bison) 2.3
Written by Robert Corbett and Richard Stallman.
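
The genrule invokes `/usr/local/opt/bison/bin/bison`, while the only bison shown on this machine is `/usr/bin/bison` (version 2.3). A small check like the following reports whether the path from the error message exists and which `bison` comes first on `PATH`. It is a sketch, not part of the PlaidML build, and the helper name `locate_bison` is made up here.

```python
import os
import shutil

# Path taken from the error message above.
EXPECTED_BISON = "/usr/local/opt/bison/bin/bison"

def locate_bison(expected=EXPECTED_BISON):
    """Report whether the expected bison exists and which one PATH resolves to."""
    return {
        "expected_path": expected,
        "expected_exists": os.path.isfile(expected) and os.access(expected, os.X_OK),
        "on_path": shutil.which("bison"),  # None if no bison is on PATH
    }

if __name__ == "__main__":
    for key, value in locate_bison().items():
        print(f"{key}: {value}")
```

Note that `/usr/local/opt/...` is where Homebrew links its formulae on Intel Macs, whereas MacPorts installs under `/opt/local` by default, which may be why `sudo port install bison` did not change the outcome.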

Poor Performance with PlaidML Backend on Mac?

I realize it is experimental, but the backend seems to work (in Python 2.7 and 3.5): different parts of my computer get warm compared to TF, and CPU usage stays under 100%. However, it runs at about half the speed for a basic CNN on MNIST (https://gist.github.com/kmader/ebd1c2a07815e1f227c6e77ec996e928). Is there something I am doing wrong here?

Model

Layer (type)                 Output Shape              Param #   
=================================================================
reshape_1 (Reshape)          (None, 28, 28, 1)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 28, 28, 16)        160       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 26, 26, 16)        2320      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 16)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 13, 13, 16)        2320      
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 11, 11, 16)        2320      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 5, 5, 16)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 400)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                4010      
=================================================================
Total params: 11,130
Trainable params: 11,130
Non-trainable params: 0

Results

Tensorflow 1.3 CPU

CPU times: user 1min 41s, sys: 17.1 s, total: 1min 58s
Wall time: 32.5 s

plaidml backend

INFO:plaidml:b'Opening device "intel(r)_iris(tm)_graphics_550.0'

Slowest

CPU times: user 44.2 s, sys: 5.76 s, total: 49.9 s
Wall time: 1min 9s

Fastest

CPU times: user 43.8 s, sys: 6.03 s, total: 49.8 s
Wall time: 49 s
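
To put the "half the speed" observation in numbers, the timings above can be turned into ratios. This sketch only does arithmetic on the figures already reported; the helper name `seconds` is made up.

```python
def seconds(minutes=0, secs=0.0):
    """Convert a 'Xmin Ys' timing into seconds."""
    return minutes * 60 + secs

# Figures copied from the results above.
runs = {
    "TensorFlow 1.3 CPU": {"cpu": seconds(1, 58), "wall": seconds(0, 32.5)},
    "PlaidML slowest": {"cpu": seconds(0, 49.9), "wall": seconds(1, 9)},
    "PlaidML fastest": {"cpu": seconds(0, 49.8), "wall": seconds(0, 49)},
}

baseline = runs["TensorFlow 1.3 CPU"]["wall"]
for name, t in runs.items():
    slowdown = t["wall"] / baseline      # wall time relative to TensorFlow
    cpu_per_wall = t["cpu"] / t["wall"]  # average number of busy CPU cores
    print(f"{name}: {slowdown:.2f}x wall time, {cpu_per_wall:.2f} cores busy on average")
```

By these numbers the PlaidML runs take roughly 1.5x to 2.1x the TensorFlow wall time, while consuming about one core's worth of CPU time or less, versus between three and four cores for TensorFlow.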

Plaidbench train vgg16 - Abort trap 6

Getting an "Abort trap: 6" error every time I run the vgg16 benchmark with training.

clinfo here: #17 (comment)

on latest plaidml/plaidml-keras (0.1.2 from pip) and plaidbench.

Andys-iMac-2:plaidbench andy$ python plaidbench.py --plaid --batch-size 16 --train vgg16
Using PlaidML backend.

Current network being run : vgg16
Loading the data
INFO:plaidml:Opening device "amd_radeon_pro_580_compute_engine.0"
Model loaded.
Compiling and running initial batch, batch_size=16
INFO:plaidml:Analyzing Ops: 57 of 483 operations complete
INFO:plaidml:Analyzing Ops: 108 of 483 operations complete
INFO:plaidml:Analyzing Ops: 204 of 483 operations complete
INFO:plaidml:Analyzing Ops: 217 of 483 operations complete
INFO:plaidml:Analyzing Ops: 230 of 483 operations complete
INFO:plaidml:Analyzing Ops: 242 of 483 operations complete
INFO:plaidml:Analyzing Ops: 249 of 483 operations complete
INFO:plaidml:Analyzing Ops: 262 of 483 operations complete
Abort trap: 6

Add contributors file

It would be really great if you added a contributors file that describes things like:

What packages are required to contribute
What steps should be executed after new features are added:
- Running the tests and the expected results
- Building the source code. Yes, I know such instructions already exist in the installation docs, but maybe some flags for enabling debug features should be added.
General requirements to follow when contributing:
- Test coverage should not decrease
- Tests should not fail
- Follow PEP8 or any other guide
- Use a linter or anything else

I think with such a file it will be easier to join your community and boost development.

Mac+AMD: AMD not detected and Intel uses too high of a work group

iMac 2017 with a Radeon Pro 580 and a Core i5-7600K. Compiled and installed PlaidML from source. Installed via the pip wheel.

Ran plaidml-setup:

PlaidML Setup (0.0.0.dev0)

Thanks for using PlaidML!

Some Notes:
  * Bugs and other issues: https://github.com/plaidml/plaidml
  * Questions: https://stackoverflow.com/questions/tagged/plaidml
  * Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
  * PlaidML is licensed under the GNU AGPLv3

Default Config Devices:
No devices.

Experimental Config Devices:
intel(r)_core(tm)i5-7600k_cpu@_3.80ghz.0 : Intel Intel(R) Core(TM) i5-7600K CPU @ 3.80GHz

Using experimental devices can cause poor performance, crashes, and other nastiness.
Enable experimental device support? (y,n)[n]:y

PlaidML sends anonymous usage statistics to help guide improvements.
We'd love your help making it better.

Enable telemetry reporting? (y,n)[y]:y

Almost done. Multiplying some matrices...
Tile code:
function (B[X,Z], C[Z,Y]) -> (A) { A[x,y : X,Y] = +(B[x,z] * C[z,y]); }
ERROR:plaidml:OpenCL: [CL_INVALID_WORK_GROUP_SIZE] : OpenCL Error : clEnqueueNDRangeKernel failed: total work group size (32) is greater than the device can support (1) (cb=12)
Whew. That worked.

Save settings to /Users/andy/.plaidml? (y,n)[y]:y
Success!

Should a GPU be detected at this point? Is there somewhere I can lower the total work group size manually?

New to submitting git issues. Sorry if I'm missing anything.
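
For readers unfamiliar with the notation, the Tile function printed by `plaidml-setup` above is a matrix multiplication: for each output index `(x, y)` it sums `B[x,z] * C[z,y]` over `z`. A NumPy sketch of that contraction follows. It illustrates the math only, not how PlaidML executes it, and the function name is made up.

```python
import numpy as np

def tile_matmul(B, C):
    """A[x, y] = sum over z of B[x, z] * C[z, y], written out as explicit loops."""
    X, Z = B.shape
    Z2, Y = C.shape
    assert Z == Z2, "inner dimensions must agree"
    A = np.zeros((X, Y), dtype=np.result_type(B, C))
    for x in range(X):
        for y in range(Y):
            for z in range(Z):
                A[x, y] += B[x, z] * C[z, y]
    return A

B = np.arange(6, dtype=float).reshape(2, 3)
C = np.arange(12, dtype=float).reshape(3, 4)
print(np.allclose(tile_matmul(B, C), B @ C))  # True
```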

Autoencoder output inconsistent with TensorFlow

I would like to use PlaidML because it supports AMD GPUs. Unfortunately, I have had issues trying to replicate the results for an autoencoder model that I implemented using the TensorFlow backend in Keras. The following is a simplified version of that model, which still demonstrates the issue:

import cv2
import numpy

import plaidml.keras
plaidml.keras.install_backend()

from keras.models import Model
from keras.layers import Input
from keras.layers.convolutional import Conv2D, MaxPooling2D, UpSampling2D
from keras.optimizers import Adam
from keras.datasets import cifar10

if __name__ == "__main__":
    input_ = Input(shape=(32, 32, 3))

    x = input_
    x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)

    x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(3, (3, 3), padding='same', activation='sigmoid')(x)

    autoencoder = Model(input_, x)
    optimizer = Adam(lr=5e-5, beta_1=0.5, beta_2=0.999)
    autoencoder.compile(optimizer=optimizer, loss='mean_absolute_error')

    (x_train, y_train), (x_test, y_test) = cifar10.load_data()
    x_train = x_train.astype('float32') 
    x_test = x_test.astype('float32')
    x_train /= numpy.max(x_train)
    x_test /= numpy.max(x_test)

    for epoch in range(50):
        index = epoch * 32
        batch = x_train[index:index + 32]
        loss = autoencoder.train_on_batch(batch, batch)
        print(loss)

    test_batch = x_test[0:14]
    figure = numpy.stack([
        test_batch,
        autoencoder.predict(test_batch),
    ], axis=1)

    figure = figure.reshape((2, 7) + figure.shape[1:])
    images_shape = numpy.array(figure.shape)
    n = len(images_shape)
    if n % 2 == 0:
        y_axes = list(range(1, n - 1, 2))
        x_axes = list(range(0, n - 1, 2))
    else:
        y_axes = list(range(0, n - 1, 2))
        x_axes = list(range(1, n - 1, 2))
    new_axes = y_axes, x_axes, [n - 1]
    new_shape = [numpy.prod(images_shape[x]) for x in new_axes]
    figure = numpy.transpose(
        figure,
        axes=numpy.concatenate(new_axes)
        ).reshape(new_shape)

    figure = numpy.clip(figure * 255, 0, 255).astype('uint8')

    cv2.imwrite('output.jpg', figure)

When I run this code using the PlaidML backend in Keras, I get the following output:

The images on the left are the input and the images on the right are the output. As you can see, the output images are almost completely black.

When I run the same code using the TensorFlow backend in Keras, I get the following output:

This output is more what I would expect from the autoencoder.

These results were produced on a Windows 10 machine with an AMD RX 580 GPU.

Any help would be appreciated to help me understand why I am getting different results between PlaidML and TensorFlow. I truly appreciate the work that has been done to develop PlaidML.
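
To compare the two backends numerically rather than by eye, one option is to save `autoencoder.predict(test_batch)` from each run (for example with `numpy.save`) and summarize the arrays afterwards. The sketch below assumes two such arrays are already loaded. The function name and the 0.05 darkness threshold are arbitrary choices for illustration, and the arrays used here are synthetic stand-ins, not real results.

```python
import numpy

def summarize_outputs(pred_a, pred_b, dark_threshold=0.05):
    """Summarize two prediction arrays with values expected in [0, 1]."""
    pred_a = numpy.asarray(pred_a, dtype="float64")
    pred_b = numpy.asarray(pred_b, dtype="float64")
    return {
        "mean_a": float(pred_a.mean()),
        "mean_b": float(pred_b.mean()),
        # Fraction of values that would render as nearly black.
        "dark_fraction_a": float((pred_a < dark_threshold).mean()),
        "dark_fraction_b": float((pred_b < dark_threshold).mean()),
        "max_abs_diff": float(numpy.abs(pred_a - pred_b).max()),
    }

# Synthetic stand-ins for the two backends' outputs (not real results).
rng = numpy.random.RandomState(0)
mostly_black = numpy.zeros((14, 32, 32, 3))
plausible = rng.uniform(0.2, 0.8, size=(14, 32, 32, 3))
print(summarize_outputs(mostly_black, plausible))
```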

Arm aarch64 processor with Mali GPU support

Does PlaidML support Arm A53/72 processor with Mali GPU?

Thanks,

install_backend() fails with AssertionError

When executing the example code found in the README file, I get the following error:

import plaidml.keras
plaidml.keras.install_backend()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/dev/plaidml/plaidml/keras/__init__.py", line 75, in install_backend
    from keras.utils import conv_utils
  File "~/.virtualenvs/plaidml/lib/python3.6/site-packages/keras/__init__.py", line 3, in <module>
    from . import utils
  File "~/.virtualenvs/plaidml/lib/python3.6/site-packages/keras/utils/__init__.py", line 6, in <module>
    from . import conv_utils
  File "~/.virtualenvs/plaidml/lib/python3.6/site-packages/keras/utils/conv_utils.py", line 3, in <module>
    from .. import backend as K
  File "~/.virtualenvs/plaidml/lib/python3.6/site-packages/keras/backend/__init__.py", line 71, in <module>
    assert _backend in {'theano', 'tensorflow', 'cntk'}
AssertionError

I am building PlaidML using Bazel on macOS and installing it into a virtual environment (which also has Keras 2.0.8 installed).

Any ideas?
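
The assertion in the traceback fires when the configured backend name is not one of `theano`, `tensorflow`, or `cntk`. Assuming this Keras version takes the name from `~/.keras/keras.json` and from the `KERAS_BACKEND` environment variable, a quick diagnostic is to print both. The function below is a sketch with a made-up name, not part of Keras or PlaidML.

```python
import json
import os

# The set from the assertion in the traceback.
ALLOWED = {"theano", "tensorflow", "cntk"}

def configured_backends(keras_dir=None, environ=None):
    """Return the backend named in keras.json and in KERAS_BACKEND (None if unset)."""
    keras_dir = keras_dir or os.path.join(os.path.expanduser("~"), ".keras")
    environ = os.environ if environ is None else environ
    from_file = None
    config_path = os.path.join(keras_dir, "keras.json")
    if os.path.exists(config_path):
        with open(config_path) as f:
            from_file = json.load(f).get("backend")
    return {"keras.json": from_file, "KERAS_BACKEND": environ.get("KERAS_BACKEND")}

if __name__ == "__main__":
    for source, value in configured_backends().items():
        if value is None:
            status = "unset"
        elif value in ALLOWED:
            status = "ok"
        else:
            status = "would fail the assertion"
        print(f"{source}: {value!r} ({status})")
```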

plaidml-setup hangs on Clover with clpeak working

Jan Vesely has created patches for supporting clpeak using Mesa Clover. Yet, even with these patches it hangs as reported in #48

The following was done on an Ubuntu 16.04.3 LTS 64-bit desktop after installing the padoka PPA and OpenCL through mesa-ocl-id and libclc, and then upgrading libclc manually.
There is a simple recipe for getting libclc working with clpeak, Theano, Caffe, etc. on Ubuntu 16.04, assuming that the latest padoka PPA has been installed appropriately:

1. git clone http://llvm.org/git/libclc.git
2. cd libclc
3. Download fp16 patch, https://github.com/llvm-mirror/libclc/compare/master...jvesely:cl_khr_fp16.patch
4. apply, patch -p1 < downloaded patch
5. Build, ./configure.py --prefix=/usr
6. make
7. sudo make install

This will give Clover a working libclc companion and produce the following example clpeak log

8. run ./clpeak

Platform: Clover
Device: AMD KABINI (DRM 3.23.0 / 4.15.0-041500rc4-generic, LLVM 6.0.0)
Driver version  : 17.4.0-devel - padoka PPA (Linux x64)
Compute units   : 2
Clock frequency : 600 MHz
WAITING. COMPILING KERNELS. TAKES A LONG TIME. PATIENCE....

Global memory bandwidth (GBPS)
  float   : 10.47
  float2  : 10.92
  float4  : 10.98
  float8  : 10.57
  float16 : 6.64

Single-precision compute (GFLOPS)
  float   : 150.22
  float2  : 149.95
  float4  : 149.47
  float8  : 148.93
  float16 : 147.05

half-precision compute (GFLOPS)
  half   : 149.66
  half2  : 149.58
  half4  : 149.40
  half8  : 148.71
  half16 : 147.51

Double-precision compute (GFLOPS)
  double   : 9.57
  double2  : 9.56
  double4  : 9.55
  double8  : 9.53
  double16 : 9.50

Integer compute (GIOPS)
  int   : 30.53
  int2  : 30.53
  int4  : 30.53
  int8  : 30.52
  int16 : 30.50

Transfer bandwidth (GBPS)
  enqueueWriteBuffer         : 3.44
  enqueueReadBuffer          : 0.77
  enqueueMapBuffer(for read) : 705.85
    memcpy from mapped ptr   : 0.78
  enqueueUnmap(after write)  : 778.07
    memcpy to mapped ptr     : 2.26

Kernel launch latency : 648.14 us

9. Test PlaidML

INFO:plaidml:Opening device "amd_kabini_(drm_3.23.0_/_4.15.0-041500rc4-generic,_llvm_6.0.0).0"
Almost done. Multiplying some matrices...
Tile code:
function (B[X,Z], C[Z,Y]) -> (A) { A[x,y : X,Y] = +(B[x,z] * C[z,y]); }
[hangs] - no activity on GPU or CPU with radeontop

10. Testing a bit more patching of Mesa and libclc to support parts of OCL 1.2
http://rthas.today/index.php/Main_Page

clinfo

    Number of platforms                               1
    Platform Name                                   Clover 
    Platform Vendor                                 Mesa
    Platform Version                                OpenCL 1.1 Mesa 17.4.0-devel (git-85789831b4)
    Platform Profile                                FULL_PROFILE
    Platform Extensions                             cl_khr_icd
    Platform Extensions function suffix             MESA   
    Platform Name                                   Clover
    Number of devices                                 1
    Device Name                                     AMD KABINI (DRM 3.23.0 / 4.15.0-041500rc4-generic, LLVM 6.0.0)
    Device Vendor                                   AMD
    Device Vendor ID                                0x1002
    Device Version                                  OpenCL 1.2 Mesa 17.4.0-devel (git-85789831b4)
    Driver Version                                  17.4.0-devel
    Device OpenCL C Version                         OpenCL C 1.2 
    Device Type                                     GPU
    Device Profile                                  FULL_PROFILE
    Max compute units                               2
    Max clock frequency                             600MHz
    Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
    Max work item dimensions                        3
    Max work item sizes                             256x256x256
    Max work group size                             256
    Preferred work group size multiple              64
    Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
    Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
    Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
    Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
    Address bits                                    64, Little-Endian
    Global memory size                              1917045964 (1.785GiB)
    Error Correction support                        No
    Max memory allocation                           479261491 (457.1MiB)
    Unified memory for Host and Device              No
    Minimum alignment for any data type             128 bytes
    Alignment of base address                       32768 bits (4096 bytes)
    Global Memory cache type                        None
    Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            479261491 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             32768x32768 pixels
    Max 3D image size                             4096x4096x4096 pixels
    Max number of read image args                 32
    Max number of write image args                32
    Local memory type                               Local
    Local memory size                               32768 (32KiB)
    Max constant buffer size                        479261491 (457.1MiB)
    Max number of constant args                     16
    Max size of kernel argument                     1024
    Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
    Prefer user sync for interop                    Yes
    Profiling timer resolution                      0ns
    Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    printf() buffer size                            0
    Built-in kernels                                
    Device Available                                Yes
    Compiler Available                              Yes
    Linker Available                                Yes
    Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16

  NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [MESA]
  clCreateContext(NULL, ...) [default]            Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD KABINI (DRM 3.23.0 / 4.15.0-041500rc4-generic, LLVM 6.0.0)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD KABINI (DRM 3.23.0 / 4.15.0-041500rc4-generic, LLVM 6.0.0)
  ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.8
  ICD loader Profile                              OpenCL 1.2
        NOTE:   your OpenCL library declares to support OpenCL 1.2,
                but it seems to support up to OpenCL 2.1 too.

11. Trying PlaidML again with OCL 1.2 support, volatile atomics...
gives the same result as 9 above. HANG.

Would be great to get help on how to debug this.

Implement Data Parallelism

PlaidML currently aborts when multiple devices are selected. Initial support for multiple devices will be implemented through a data parallel approach, using something similar to MXNet's strategy.

Possible Implementation Strategy:

- Add a setting for the parallelism strategy with two options, [None, Data]. None matches current behavior.
- Modify program.cc in local_platform to take a list of devices and the parallelism strategy.
- If data parallelism is enabled, do the work necessary to build multiple kernels and split work among them.
- Figure out how to shoehorn this into Keras. Splitting batches into parts and rejoining them seems like the most reasonable approach.
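
The batch-splitting idea in the last item can be sketched with NumPy. This is only an illustration of the split-and-rejoin pattern: `run_data_parallel` and `fake_kernel` are hypothetical names, the shards are processed sequentially rather than on real devices, and nothing here reflects PlaidML internals.

```python
import numpy as np

def run_data_parallel(batch, device_fns):
    """Split a batch along axis 0, run one shard per device function, rejoin in order."""
    shards = np.array_split(batch, len(device_fns), axis=0)
    outputs = [fn(shard) for fn, shard in zip(device_fns, shards)]
    return np.concatenate(outputs, axis=0)

# Hypothetical stand-in for a per-device compiled kernel.
def fake_kernel(x):
    return x * 2.0 + 1.0

batch = np.arange(10, dtype=float).reshape(5, 2)
result = run_data_parallel(batch, [fake_kernel, fake_kernel])
print(np.array_equal(result, fake_kernel(batch)))  # True
```

This covers only the forward pass. Training would additionally need the per-shard gradients to be combined before the weight update, which the sketch does not attempt.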

NotImplementedError: The Keras backend function 'flatten' is not yet implemented in Plaid. You can help us prioritize by letting us know if this function is important to you, and as always, contributions are welcome!

I have implemented a simple U-Net segmentation network and am using Dice loss as its metric; the implementation is provided below.
I am getting an error saying that PlaidML has not implemented flatten.

System OS: Ubuntu 14.04
GPU - AMD Hainan (Experimental)
PlaidML - 0.1.2
plaidml-keras - 0.1.2

from keras import backend as K


def dice_loss(y_true, y_pred):
    # Smoothing term avoids division by zero when both masks are empty.
    smooth = 1.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = y_true_f * y_pred_f
    score = (2. * K.sum(intersection) + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    return 1. - score

Kindly suggest if there is any alternative available for this issue.
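
Since this loss only takes sums over all elements, flattening may not be needed at all: the sum over a tensor equals the sum over its flattened copy. The NumPy sketch below checks that equivalence on made-up data. Replacing `np.sum` with `K.sum` would give a flatten-free variant of `dice_loss`, though whether that runs on PlaidML 0.1.2 is untested here.

```python
import numpy as np

def dice_loss_flat(y_true, y_pred, smooth=1.0):
    """Reference version that flattens first, mirroring the original code."""
    t, p = y_true.ravel(), y_pred.ravel()
    return 1.0 - (2.0 * np.sum(t * p) + smooth) / (np.sum(t) + np.sum(p) + smooth)

def dice_loss_noflat(y_true, y_pred, smooth=1.0):
    """Same loss without flattening: sums run over every axis of the tensors."""
    intersection = np.sum(y_true * y_pred)
    return 1.0 - (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

rng = np.random.RandomState(0)
y_true = (rng.rand(2, 8, 8, 1) > 0.5).astype("float64")  # made-up mask
y_pred = rng.rand(2, 8, 8, 1)                            # made-up prediction
print(np.isclose(dice_loss_flat(y_true, y_pred), dice_loss_noflat(y_true, y_pred)))  # True
```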

NameError: name 'reduce' is not defined

Great project — thanks for making it!

While trying to use the PlaidML backend in my code, I encountered a problem when trying to flatten my input.

Traceback (most recent call last):
  File "ddpq_dreams.py", line 73, in <module>
    actor.add(Flatten(input_shape=(1, ) + env.observation_space.shape))
  File "/usr/local/anaconda3/lib/python3.6/site-packages/keras/models.py", line 442, in add
    layer(x)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/keras/engine/topology.py", line 602, in __call__
    output = self.call(inputs, **kwargs)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/keras/layers/core.py", line 484, in call
    return K.batch_flatten(inputs)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/plaidml/keras/backend.py", line 1008, in batch_flatten
    return x.batch_flatten()
  File "/usr/local/anaconda3/lib/python3.6/site-packages/plaidml/keras/backend.py", line 509, in batch_flatten
    new_shape = (self.shape[0], reduce(operator.mul, self.shape[1:]))
NameError: name 'reduce' is not defined

The same error also happens when running the plaidvision and plaidbench examples with a few of the models, e.g. python plaidvision.py resnet50 and python plaidbench.py resnet50 --print-stacktraces:

Using PlaidML backend.
INFO:plaidml:b'Opening device "amd_radeon_r9_m370x_compute_engine.0'
Traceback (most recent call last):
  File "plaidvision.py", line 310, in <module>
    main()
  File "plaidvision.py", line 260, in main
    model = Model(args.model, args.weights)
  File "plaidvision.py", line 206, in __init__
    self.model = ModelClass(weights=weights)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/keras/applications/resnet50.py", line 235, in ResNet50
    x = Flatten()(x)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/keras/engine/topology.py", line 602, in __call__
    output = self.call(inputs, **kwargs)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/keras/layers/core.py", line 484, in call
    return K.batch_flatten(inputs)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/plaidml/keras/backend.py", line 1008, in batch_flatten
    return x.batch_flatten()
  File "/usr/local/anaconda3/lib/python3.6/site-packages/plaidml/keras/backend.py", line 509, in batch_flatten
    new_shape = (self.shape[0], reduce(operator.mul, self.shape[1:]))
NameError: name 'reduce' is not defined

System:

MacBook Pro (Retina, 15-inch, Mid 2015)
AMD Radeon R9 M370X 2 GB
macOS 10.13
Python 3.6.2
Keras (2.0.8)
plaidml (0.1.1)
plaidml-keras (0.1.1)

Please let me know if you need more information.
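For context on the NameError above: `reduce` is a builtin in Python 2 but lives in `functools` in Python 3, so the expression on line 509 of `backend.py` only works on Python 3 if it is imported. A minimal sketch of that shape computation with the import added, assuming a plain tuple stands in for the tensor shape (the `flattened_shape` helper is hypothetical and is not PlaidML code):

```python
import operator
from functools import reduce  # Python 3: reduce is no longer a builtin


def flattened_shape(shape):
    """Keep the first dimension and multiply all remaining ones together."""
    return (shape[0], reduce(operator.mul, shape[1:]))


# Illustrative shape only; the batch dimension is left as None.
print(flattened_shape((None, 7, 7, 512)))
```

The same import also works on Python 2.6 and later, so it should be safe for a code base that targets both interpreters.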

Exceptions while using model.fit

I was training an LSTM on 0.3.0rc1 when this error occurred:

---------------------------------------------------------------------------
Unknown                                   Traceback (most recent call last)
~\Anaconda3\lib\site-packages\plaidml\tile.py in bind(self, bindings)
    253             try:
--> 254                 outputs[output_name] = applier.add_output(output_name)
    255             except BaseException as e:

~\Anaconda3\lib\site-packages\plaidml\__init__.py in add_output(self, name)
   1367     def add_output(self, name):
-> 1368         return Var(_lib().plaidml_apply_alloc_output(self, name.encode()))
   1369 

~\Anaconda3\lib\site-packages\plaidml\__init__.py in _check_err(self, result, func, args)
    717             return None
--> 718         self.raise_last_status()
    719 

~\Anaconda3\lib\site-packages\plaidml\library.py in raise_last_status(self)
    130     def raise_last_status(self):
--> 131         raise self.last_status()
    132 

Unknown: b'Missing output binding "N"'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-19-c50a2dc352a6> in <module>()
----> 1 model.fit(Xtrain, Ytrain, epochs = 100)

~\Anaconda3\lib\site-packages\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1573         else:
   1574             ins = x + y + sample_weights
-> 1575         self._make_train_function()
   1576         f = self.train_function
   1577 

~\Anaconda3\lib\site-packages\keras\engine\training.py in _make_train_function(self)
    965                                                  updates=updates,
    966                                                  name='train_function',
--> 967                                                  **self._function_kwargs)
    968 
    969     def _make_test_function(self):

~\Anaconda3\lib\site-packages\plaidml\keras\backend.py in function(inputs, outputs, updates, name)
    718     if name == None:
    719         name = ''
--> 720     return _Function(inputs, outputs, updates, name)
    721 
    722 

~\Anaconda3\lib\site-packages\plaidml\keras\backend.py in __init__(self, inputs, outputs, updates, name)
    121                                    _device(),
    122                                    list(zip(self._input_names, inputs)),
--> 123                                    list(zip(self._output_names, outputs)), updates)
    124         self._invoker = plaidml.Invoker(_ctx, self._func)
    125 

~\Anaconda3\lib\site-packages\plaidml\tile.py in compose(ctx, dev, inputs, outputs, updates)
   1003             to_be_bound.extend(reqs)
   1004             continue
-> 1005         bindings.insert(op, op.bind(bindings))
   1006         updates.extend(op.side_effects)
   1007 

~\Anaconda3\lib\site-packages\plaidml\tile.py in bind(self, bindings)
    255             except BaseException as e:
    256                 raise Exception('Failed to add output \'{}\' in op {}: {}; code={}'.format(
--> 257                     output_name, self.name, e.message, self.code))
    258         return outputs
    259 

AttributeError: 'Unknown' object has no attribute 'message'
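The second exception hides the first one: on Python 3, exception objects have no `.message` attribute, so the error-formatting code in `tile.py` fails before it can report `Missing output binding "N"`. A minimal sketch of formatting that works on both Python 2 and 3, assuming a stand-in exception class (`Unknown` and `format_bind_error` below are hypothetical illustrations, not the PlaidML implementation):

```python
class Unknown(Exception):
    """Stand-in for the PlaidML status exception (hypothetical)."""


def format_bind_error(output_name, op_name, err):
    # str(err) is available on Python 2 and 3; err.message is Python 2 only.
    return "Failed to add output '{}' in op {}: {}".format(output_name, op_name, str(err))


try:
    raise Unknown('Missing output binding "N"')
except Unknown as e:
    print(hasattr(e, "message"))
    print(format_bind_error("N", "some_op", e))
```

This only addresses the masking AttributeError; the underlying missing output binding would still need to be investigated separately.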

Word Embeddings support

Hi, is support for word embeddings planned?

import numpy as np
import pandas as pd
import plaidml.keras
plaidml.keras.install_backend()
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.advanced_activations import ELU
from keras.layers.recurrent import LSTM, GRU
from keras.layers.core import Flatten
from keras.layers.embeddings import Embedding
from sklearn.metrics import mean_squared_error
import random, time

# Read in data

from keras.regularizers import l2
from keras.layers import Input, merge
from keras.models import Model

def saveModel(model):
    modelCopy = {}
    modelCopy['layers'] = []
    for i in range(len(model.layers)+1):
        modelCopy['layers'].append(model.get_layer(index=i))
    modelCopy['weights'] = (model.get_weights())
    return modelCopy

ratings = pd.read_csv('u.data', sep='\t')
ratings.columns=['userId', 'movieId', 'rating', 'ts']
nusers, nmovies = ratings.userId.nunique(), ratings.movieId.nunique()
movies = np.unique(np.array(ratings.movieId))
dictMovies = {movies[i]:i+1 for i in range(nmovies)}
ratings.movieId = ratings.movieId.replace(dictMovies)

np.random.seed(42)
msk = np.random.rand(len(ratings))<0.8

trn = ratings[msk]
val = ratings[~msk]
k= 70

user_in = Input(shape=(1,), dtype='int64', name='user_in')
u = Embedding(nusers+1, int(k*3/2), input_length=1)(user_in)
movie_in = Input(shape=(1,), dtype='int64', name='movie_in')
m = Embedding(nmovies+1, int(k*1/2), input_length=1)(movie_in)

x= merge([u, m], mode='concat')
x= Flatten()(x)
x = Dropout(0.5)(x)

x = Dense(int(k/2), activation='elu')(x)
x = Dropout(0.5)(x)
x = Dense(1)(x)
model = Model([user_in, movie_in], x)
model.compile(loss='mean_squared_error', optimizer="nadam")

patience, nbEpoch, bestError, begin = 0, 0, 999999.9, time.time()

# Manual Earlystopping: Patience < nb allowed fails

while((time.time()-begin<240)&(patience<10)):
    X_train, Y_train = [trn.userId, trn.movieId], trn.rating
    patience += 1
    nbEpoch += 1
    model.fit(X_train, Y_train, batch_size=512, nb_epoch=1, verbose=1)
    yhat= model.predict( [val.userId, val.movieId])
    error = mean_squared_error(val.rating, yhat)
    if(error<bestError):
        duration, patience = time.time()-begin, 0
        bestEpoch, bestError = nbEpoch, error
print(bestError)
INFO:plaidml:b'Opening device "intel(r)_iris(tm)_graphics_6100.0"'
__main__:47: UserWarning: The merge function is deprecated and will be removed after 08/2017. Use instead layers from keras.layers.merge, e.g. add, concatenate, etc.
/Users/edmondjacoupeau/anaconda/lib/python3.6/site-packages/keras/legacy/layers.py:458: UserWarning: The Merge layer is deprecated and will be removed after 08/2017. Use instead layers from keras.layers.merge, e.g. add, concatenate, etc.
name=name)
Traceback (most recent call last):

File "", line 49, in
x = Dropout(0.5)(x)

File "/Users/edmondjacoupeau/anaconda/lib/python3.6/site-packages/keras/engine/topology.py", line 602, in call
output = self.call(inputs, **kwargs)

File "/Users/edmondjacoupeau/anaconda/lib/python3.6/site-packages/keras/layers/core.py", line 111, in call
training=training)

File "/Users/edmondjacoupeau/anaconda/lib/python3.6/site-packages/plaidml/keras/backend.py", line 1855, in in_train_phase
cx = x()

File "/Users/edmondjacoupeau/anaconda/lib/python3.6/site-packages/keras/layers/core.py", line 109, in dropped_inputs
seed=self.seed)

File "/Users/edmondjacoupeau/anaconda/lib/python3.6/site-packages/plaidml/keras/backend.py", line 1691, in dropout
side_effects = {_plaidml_val(rng_state): _plaidml_val(n)}

File "/Users/edmondjacoupeau/anaconda/lib/python3.6/site-packages/plaidml/keras/backend.py", line 181, in _plaidml_val
return x._plaidml_val(indent)

File "/Users/edmondjacoupeau/anaconda/lib/python3.6/site-packages/plaidml/keras/backend.py", line 852, in _plaidml_val
a.add_input(k, _plaidml_val(v, indent + 1))

File "/Users/edmondjacoupeau/anaconda/lib/python3.6/site-packages/plaidml/keras/backend.py", line 181, in _plaidml_val
return x._plaidml_val(indent)

File "/Users/edmondjacoupeau/anaconda/lib/python3.6/site-packages/plaidml/keras/backend.py", line 852, in _plaidml_val
a.add_input(k, _plaidml_val(v, indent + 1))

File "/Users/edmondjacoupeau/anaconda/lib/python3.6/site-packages/plaidml/keras/backend.py", line 181, in _plaidml_val
return x._plaidml_val(indent)

File "/Users/edmondjacoupeau/anaconda/lib/python3.6/site-packages/plaidml/keras/backend.py", line 852, in _plaidml_val
a.add_input(k, _plaidml_val(v, indent + 1))

File "/Users/edmondjacoupeau/anaconda/lib/python3.6/site-packages/plaidml/keras/backend.py", line 181, in _plaidml_val
return x._plaidml_val(indent)

File "/Users/edmondjacoupeau/anaconda/lib/python3.6/site-packages/plaidml/keras/backend.py", line 856, in _plaidml_val
str(e), self, self.traceback()))

PlaidMLKerasException: b'Applying function, tensor with mismatching dimensionality: I0, expected=3, got=2'
Traceback:
_Op(concatenate)
outputs: O
inputs:
I0: _Op(gather)
outputs: OUT
inputs:
V: embedding_1/embeddings[944, 105]
embedding_1/embeddings[944, 105]
I: _Op(cast_as_int32)
outputs: O
inputs:
I: user_in[None, 1]
user_in[None, 1]
I1: _Op(gather)
outputs: OUT
inputs:
V: embedding_2/embeddings[1683, 35]
embedding_2/embeddings[1683, 35]
I: _Op(cast_as_int32)
outputs: O
inputs:
I: movie_in[None, 1]
movie_in[None, 1]

(Same error with the sequential API and with other use cases)
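The traceback says that `concatenate` expected a rank-3 input but received a rank-2 one from `gather`. Under the usual gather semantics, looking up integer indices of shape `(batch, 1)` in a table of shape `(vocab, dim)` yields `(batch, 1, dim)`, which is what Keras expects from an Embedding layer with `input_length=1`. A minimal sketch of that shape rule, assuming standard gather behaviour along the first axis (`gather_output_shape` is a hypothetical helper and does not show where PlaidML drops the dimension):

```python
def gather_output_shape(table_shape, index_shape):
    """Shape of table[indices]: the index shape followed by the table's trailing dims."""
    return tuple(index_shape) + tuple(table_shape[1:])


# Shapes taken from the traceback above.
print(gather_output_shape((944, 105), (None, 1)))   # user embedding
print(gather_output_shape((1683, 35), (None, 1)))   # movie embedding
```

Both results have rank 3, so a rank-2 result from the backend's gather would explain the reported mismatch.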

Run Fails with CL_MEM_OBJECT_ALLOCATION_FAILURE error

I tried the Keras VGG16 implementation (keras.applications.vgg16) and got the following error. This might not be related to PlaidML, but can you tell me what I need to change in the GPU configuration, and where, to get rid of this issue?

Error:

Unknown Traceback (most recent call last)
in ()
----> 1 preds=myAngleCV(X_train, X_angle, X_test)

in myAngleCV(X_train, X_angle, X_test)
21 callbacks = get_callbacks(filepath=file_path, patience=5)
22 gen_flow = gen_flow_for_two_inputs(X_train_cv, X_angle_cv, y_train_cv)
---> 23 galaxyModel= getVggAngleModel()
24 galaxyModel.fit_generator(
25 gen_flow,

in getVggAngleModel()
34 angle_layer = Dense(1, )(input_2)
35 base_model = VGG16(weights='imagenet', include_top=False,
---> 36 input_shape=X_train.shape[1:], classes=1)
37 x = base_model.get_layer('block5_pool').output
38

/home/sayantan/anaconda2/lib/python2.7/site-packages/keras/applications/vgg16.pyc in VGG16(include_top, weights, input_tensor, input_shape, pooling, classes)
134 x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
135 x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
--> 136 x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)
137 x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)
138

/home/sayantan/anaconda2/lib/python2.7/site-packages/keras/engine/topology.pyc in call(self, inputs, **kwargs)
573 'layer.build(batch_input_shape)')
574 if len(input_shapes) == 1:
--> 575 self.build(input_shapes[0])
576 else:
577 self.build(input_shapes)

/home/sayantan/anaconda2/lib/python2.7/site-packages/keras/layers/convolutional.pyc in build(self, input_shape)
132 name='kernel',
133 regularizer=self.kernel_regularizer,
--> 134 constraint=self.kernel_constraint)
135 if self.use_bias:
136 self.bias = self.add_weight(shape=(self.filters,),

/home/sayantan/anaconda2/lib/python2.7/site-packages/keras/legacy/interfaces.pyc in wrapper(*args, **kwargs)
85 warnings.warn('Update your ' + object_name + 86 ' call to the Keras 2 API: ' + signature, stacklevel=2)
---> 87 return func(*args, **kwargs)
88 wrapper._original_function = func
89 return wrapper

/home/sayantan/anaconda2/lib/python2.7/site-packages/keras/engine/topology.pyc in add_weight(self, name, shape, dtype, initializer, regularizer, trainable, constraint)
397 dtype=dtype,
398 name=name,
--> 399 constraint=constraint)
400 if regularizer is not None:
401 self.add_loss(regularizer(weight))

/home/sayantan/anaconda2/lib/python2.7/site-packages/plaidml/keras/backend.pyc in variable(value, dtype, name, constraint)
2536 else:
2537 if isinstance(value, _Op):
-> 2538 value = value.eval()
2539 tensor = plaidml.Tensor(_device(), plaidml.Shape(_ctx, _dtypes[dtype], *value.shape))
2540 with tensor.mmap_discard(_ctx) as view:

/home/sayantan/anaconda2/lib/python2.7/site-packages/plaidml/keras/backend.pyc in eval(self)
900
901 # Copy the data
--> 902 with tensor.mmap_current() as view:
903 view.copy_to_ndarray(out)
904

/home/sayantan/anaconda2/lib/python2.7/contextlib.pyc in enter(self)
15 def enter(self):
16 try:
---> 17 return self.gen.next()
18 except StopIteration:
19 raise RuntimeError("generator didn't yield")

/home/sayantan/anaconda2/lib/python2.7/site-packages/plaidml/init.pyc in mmap_current(self)
1127 def mmap_current(self):
1128 mapping = _lib().plaidml_map_buffer_current(self.buffer,
-> 1129 ctypes.cast(None, _MAP_BUFFER_FUNCTYPE), None)
1130 yield _View(self.buffer._ctx, mapping, self.shape.dtype, self.shape.ctype,
1131 _lib().plaidml_get_shape_element_count(self.shape), self.shape, None)

/home/sayantan/anaconda2/lib/python2.7/site-packages/plaidml/init.pyc in _check_err(self, result, func, args)
672 if func == self.plaidml_writeback_mapping and args[2]:
673 return None
--> 674 self.raise_last_status()
675
676

/home/sayantan/anaconda2/lib/python2.7/site-packages/plaidml/library.pyc in raise_last_status(self)
134
135 def raise_last_status(self):
--> 136 raise self.last_status()
137
138 def _logger_callback(self, unused_arg, level, msg):

Unknown: unable to run OpenCL kernel: CL_MEM_OBJECT_ALLOCATION_FAILURE

Inconsistent computations

The network computes different numbers with the PlaidML backend than with the TensorFlow backend. Comparing the outputs of both backends on VGG19, the mean squared error is of order 10^-11 after the first convolutional layer, 10^-9 after the second, and already 36(!) after pooling, which ruins the prediction.
This lion picture is incorrectly predicted as an ocarina, while the TensorFlow backend correctly predicts a lion.

import numpy as np
from urllib.request import urlopen

import plaidml.keras
plaidml.keras.install_backend()

import keras
from keras.applications import VGG19
from keras.applications.vgg19 import preprocess_input, decode_predictions
from keras.models import Model
from keras.preprocessing import image

url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Cecil_the_lion_at_Hwange_National_Park_%284516560206%29.jpg/330px-Cecil_the_lion_at_Hwange_National_Park_%284516560206%29.jpg'

img = image.pil_image.open(urlopen(url)).resize((224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

model = VGG19()
decode_predictions(model.predict(x, batch_size=1))
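A layer-by-layer comparison like the one described above can be sketched as follows. It assumes the per-layer outputs of each backend have already been collected and flattened into lists of floats, for example by running one Keras model per intermediate layer under each backend. The helper names and the `tolerance` value are illustrative assumptions:

```python
def mean_squared_error(a, b):
    """MSE between two equally sized flat sequences of floats."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same number of elements")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def first_divergent_layer(outputs_a, outputs_b, tolerance=1e-6):
    """Return (layer name, mse) for the first layer whose MSE exceeds tolerance.

    outputs_a and outputs_b are lists of (layer_name, flat_values) in network
    order, one list per backend. Returns None if every layer agrees.
    """
    for (name, a), (_, b) in zip(outputs_a, outputs_b):
        err = mean_squared_error(a, b)
        if err > tolerance:
            return name, err
    return None


# Toy values only; real inputs would be the flattened activations.
backend_a = [("block1_conv1", [1.0, 2.0]), ("block1_pool", [3.0])]
backend_b = [("block1_conv1", [1.0, 2.0]), ("block1_pool", [9.0])]
print(first_divergent_layer(backend_a, backend_b))
```

Reporting the first layer that exceeds the tolerance narrows the search to a single operation, here the pooling step.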

plaidml-setup hangs on Clover (OpenCL 1.1)

My Ubuntu 16.04 has Mesa, LLVM and more from the X-Swat PPA. clinfo reports

$ clinfo
[...]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
Platform Name                                 Clover
Device Name                                   AMD TAHITI (DRM 2.49.0 / 4.10.0-37-generic, LLVM 5.0.0)
[...]
Number of platforms                               1
[...]
Platform Version                                OpenCL 1.1 Mesa 17.2.3
Device Version                                  OpenCL 1.1 Mesa 17.2.3
Device OpenCL C Version                         OpenCL C 1.1

When I run plaidml-setup, I have to "Enable experimental device support" and then it just hangs. radeontop shows no activity on the GPU. I cannot stop plaidml-setup using CTRL+C.

Termination would have been nice and an error message telling me that OpenCL 1.1 is not supported would have been helpful.

TypeError: unsupported operand type(s) for //: '_Op' and 'int'

I tried to train a GAN model with PlaidML (to test conv2d_transpose).
But an error occurred in backend.py:

$ python mnist_cond_gan_plaidml.py 
Discriminator model:
INFO:plaidml:Opening device "intel(r)_iris(tm)_plus_graphics_640.0"
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_1 (InputLayer)             (None, 28, 28, 1)     0                                            
____________________________________________________________________________________________________
sequential_1 (Sequential)        (None, 12544)         387840      input_1[0][0]                    
____________________________________________________________________________________________________
generation (Dense)               (None, 1)             12545       sequential_1[1][0]               
____________________________________________________________________________________________________
auxiliary (Dense)                (None, 10)            125450      sequential_1[1][0]               
====================================================================================================
Total params: 525,835
Trainable params: 525,835
Non-trainable params: 0
____________________________________________________________________________________________________
Traceback (most recent call last):
  File "mnist_cond_gan_plaidml.py", line 147, in <module>
    generator = build_generator(latent_size)
  File "mnist_cond_gan_plaidml.py", line 60, in build_generator
    kernel_initializer='glorot_normal'))
  File "/Users/shimizuryou/.pyenv/versions/anaconda-2.4.0/lib/python2.7/site-packages/keras/models.py", line 475, in add
    output_tensor = layer(self.outputs[0])
  File "/Users/shimizuryou/.pyenv/versions/anaconda-2.4.0/lib/python2.7/site-packages/keras/engine/topology.py", line 602, in __call__
    output = self.call(inputs, **kwargs)
  File "/Users/shimizuryou/.pyenv/versions/anaconda-2.4.0/lib/python2.7/site-packages/keras/layers/convolutional.py", line 774, in call
    data_format=self.data_format)
  File "/Users/shimizuryou/.pyenv/versions/anaconda-2.4.0/lib/python2.7/site-packages/plaidml/keras/backend.py", line 1669, in conv2d_transpose
    return conv_transpose(x, kernel, output_shape, strides, padding, data_format)
  File "/Users/shimizuryou/.pyenv/versions/anaconda-2.4.0/lib/python2.7/site-packages/plaidml/keras/backend.py", line 1552, in conv_transpose
    x.shape)
  File "/Users/shimizuryou/.pyenv/versions/anaconda-2.4.0/lib/python2.7/site-packages/plaidml/keras/backend.py", line 1393, in _format_conv_strings
    strides[i], padding)
  File "/Users/shimizuryou/.pyenv/versions/anaconda-2.4.0/lib/python2.7/site-packages/plaidml/keras/backend.py", line 1297, in pad_compute
    num_out_size = int((input_size - filter_size + stride) // stride)
TypeError: unsupported operand type(s) for //: '_Op' and 'int'
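The failing line does integer arithmetic on `input_size`, which here is a symbolic `_Op` rather than a Python int. The sketch below reproduces the same class of error with a toy symbolic type that supports `-` and `+` but not `//`. `SymbolicDim` and `pad_out_size` are hypothetical stand-ins that mirror only the arithmetic shown in the traceback:

```python
class SymbolicDim(object):
    """Toy symbolic size (hypothetical; not PlaidML's _Op)."""

    def __init__(self, name):
        self.name = name

    def __sub__(self, other):
        return SymbolicDim("({} - {})".format(self.name, other))

    def __add__(self, other):
        return SymbolicDim("({} + {})".format(self.name, other))

    # No __floordiv__ is defined, so `//` raises TypeError.


def pad_out_size(input_size, filter_size, stride):
    # Same expression as backend.py line 1297 in the traceback.
    return int((input_size - filter_size + stride) // stride)


print(pad_out_size(28, 3, 2))  # concrete sizes work
try:
    pad_out_size(SymbolicDim("N"), 3, 2)
except TypeError as e:
    print(e)
```

With concrete integer sizes the expression evaluates normally, which suggests the error appears only when a dimension passed to `conv2d_transpose` is not yet known as a plain integer.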

`No module named plaidml.plaidml_setup`

I am trying to build and install plaidml on macOS 10.12.

When I try to run plaidml-setup, I get No module named plaidml.plaidml_setup.

What I did was

virtualenv --system-site-packages ~/plaidml
source ~/plaidml/bin/activate
git clone https://github.com/plaidml/plaidml.git
bazel build -c opt plaidml:wheel plaidml/keras:wheel
pip install -U bazel-bin/plaidml/*whl bazel-bin/plaidml/keras/*whl

I am using bazel-0.7.0.

Any advice would be appreciated.

Neural Style Transfer from the Keras Examples

Not sure if this has already been fixed. Figured it'd be worth opening an issue in case no one's encountered this bug / error:

$ python2.7 neural_style_transfer.py ~/Downloads/cat.jpg ~/Downloads/starry_night.jpg ../output/style_transfer_
INFO:plaidml:Initializing device [...]"
INFO:plaidml:Opening device "[...]"
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5
79781888/80134624 [============================>.] - ETA: 0sModel loaded.
Start of iteration 0
INFO:plaidml:Analyzing Ops: 40 of 487 operations complete
INFO:plaidml:Analyzing Ops: 74 of 487 operations complete
INFO:plaidml:Analyzing Ops: 108 of 487 operations complete
INFO:plaidml:Analyzing Ops: 147 of 487 operations complete
Traceback (most recent call last):
  File "neural_style_transfer.py", line 285, in <module>
    fprime=evaluator.grads, maxfun=20)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/lbfgsb.py", line 193, in fmin_l_bfgs_b
    **opts)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/lbfgsb.py", line 328, in _minimize_lbfgsb
    f, g = func_and_grad(x)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/lbfgsb.py", line 278, in func_and_grad
    f = fun(x, *args)
  File "/usr/local/lib/python2.7/dist-packages/scipy/optimize/optimize.py", line 292, in function_wrapper
    return function(*(wrapper_args + args))
  File "neural_style_transfer.py", line 263, in loss
    loss_value, grad_values = eval_loss_and_grads(x)
  File "neural_style_transfer.py", line 239, in eval_loss_and_grads
    outs = f_outputs([x])
  File "/usr/local/lib/python2.7/dist-packages/plaidml/keras/backend.py", line 913, in __call__
    self._invoker.invoke()
  File "/usr/local/lib/python2.7/dist-packages/plaidml/__init__.py", line 1317, in invoke
    return Invocation(self._ctx, self)
  File "/usr/local/lib/python2.7/dist-packages/plaidml/__init__.py", line 1323, in __init__
    self._as_parameter_ = _lib().plaidml_schedule_invocation(ctx, invoker)
  File "/usr/local/lib/python2.7/dist-packages/plaidml/__init__.py", line 667, in _check_err
    self.raise_last_status()
  File "/usr/local/lib/python2.7/dist-packages/plaidml/library.py", line 136, in raise_last_status
    raise self.last_status()
plaidml.exceptions.Unknown: Invalid reshape

Incompatible shapes plaidml.keras.backend.PlaidMLKerasException: Broadcast failure: (1, 1, 1, 48) and (1, 48, 48, 32) cannot be unified

Hi everyone!

I am trying to run a model built in keras with plaidml:

self.model_emotion = load_model(EMOTION_DETECTOR)

This code works fine with the default Keras backend, but with PlaidML I get this error:

INFO:plaidml:b'Opening device "ellesmere.0
Traceback (most recent call last):
  File "/home/erehzio/Projetos/seb-poc/teste_emotion.py", line 126, in <module>
    FACE_EMOTION = G.blocks.FaceEmotion()
  File "/usr/local/lib/python3.5/dist-packages/gryfo-1.0_81_g3634b08-py3.5.egg/gryfo/blocks/face_emotion_analyser.py", line 25, in __init__
  File "/usr/local/lib/python3.5/dist-packages/keras/models.py", line 239, in load_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "/usr/local/lib/python3.5/dist-packages/keras/models.py", line 313, in model_from_config
    return layer_module.deserialize(config, custom_objects=custom_objects)
  File "/usr/local/lib/python3.5/dist-packages/keras/layers/__init__.py", line 54, in deserialize
    printable_module_name='layer')
  File "/usr/local/lib/python3.5/dist-packages/keras/utils/generic_utils.py", line 139, in deserialize_keras_object
    list(custom_objects.items())))
  File "/usr/local/lib/python3.5/dist-packages/keras/models.py", line 1214, in from_config
    model.add(layer)
  File "/usr/local/lib/python3.5/dist-packages/keras/models.py", line 475, in add
    output_tensor = layer(self.outputs[0])
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/topology.py", line 602, in __call__
    output = self.call(inputs, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/keras/layers/normalization.py", line 177, in call
    epsilon=self.epsilon)
  File "/usr/local/lib/python3.5/dist-packages/plaidml/keras/backend.py", line 2187, in normalize_batch_in_training
    x=x, mean=mean, var=var, beta=beta, gamma=gamma, epsilon=epsilon)
  File "/usr/local/lib/python3.5/dist-packages/plaidml/keras/backend.py", line 1108, in batch_normalization
    return ((x - mean) * gamma / denom) + beta
  File "/usr/local/lib/python3.5/dist-packages/plaidml/keras/backend.py", line 477, in __mul__
    _broadcast_shape(self, other), 'function (B, C) -> (A) { A = B * C; }',
  File "/usr/local/lib/python3.5/dist-packages/plaidml/keras/backend.py", line 126, in _broadcast_shape
    return tuple([broad_dim(x, y) for (x, y) in zip(x_shape, y_shape)])
  File "/usr/local/lib/python3.5/dist-packages/plaidml/keras/backend.py", line 126, in <listcomp>
    return tuple([broad_dim(x, y) for (x, y) in zip(x_shape, y_shape)])
  File "/usr/local/lib/python3.5/dist-packages/plaidml/keras/backend.py", line 124, in broad_dim
    raise e  # Incompatible shapes
plaidml.keras.backend.PlaidMLKerasException: Broadcast failure: (1, 1, 1, 48) and (1, 48, 48, 32) cannot be unified

Can you help me?
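The message refers to the usual broadcasting rule: two dimensions can be unified only if they are equal or one of them is 1. In `(1, 1, 1, 48)` versus `(1, 48, 48, 32)` the last dimensions are 48 and 32, so no unification is possible. A minimal sketch of that rule for equal-rank shapes (`broadcast_shape` here is an illustrative reimplementation, not the function in `backend.py`):

```python
def broadcast_shape(x_shape, y_shape):
    """Unify two equal-rank shapes: each pair of dims must match or one must be 1."""
    if len(x_shape) != len(y_shape):
        raise ValueError("this sketch only handles equal ranks")
    out = []
    for x, y in zip(x_shape, y_shape):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError("cannot unify {} and {}".format(x_shape, y_shape))
    return tuple(out)


print(broadcast_shape((1, 1, 1, 32), (1, 48, 48, 32)))  # per-channel statistics
try:
    broadcast_shape((1, 1, 1, 48), (1, 48, 48, 32))      # shapes from the error
except ValueError as e:
    print(e)
```

One possible but unconfirmed reading is that the batch-normalization statistics have length 48 because they were computed over an axis of size 48 rather than over the 32 channels, which may point to a mismatch in the normalization axis or the data format of the saved model.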

Bidirectional RNNs not working

Since this limitation is no longer mentioned in the readme, I assumed that PlaidML supports bidirectional RNNs. It would be better to mention it in the readme. Is there any way to track the progress of bidirectional support?
The error I was greeted with:
NotImplementedError: rnn is not implemented for go_backwards=True

BatchNormalization after Flatten gives "Duplicate binding" error

Hi,

Thanks for all the work on PlaidML!

I'm running into an issue using a BatchNorm layer after flatten.

Sample code:

import numpy as np

# Install the plaidml backend
import plaidml.keras
plaidml.keras.install_backend()

import keras
from keras.layers.normalization import BatchNormalization
from keras.layers import Input, Dense, Flatten
from keras.models import Model

# This returns a tensor
inputs = Input(shape=(2,2))

# Removing the Flatten makes the model run,
# Changing the order of BatchNorm and Flatten also runs
x = Flatten()(inputs)
x = BatchNormalization()(x)
predictions = Dense(2, activation='softmax')(x)

model = Model(inputs=inputs, outputs=predictions)
model.summary()

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

x_train = np.array([[[1,1],[1,1]]])

y = model.predict(x=x_train, batch_size=1)
model.fit(x_train, np.array([[0., 1.]]), batch_size=1, epochs=1, verbose=1)

Error message:

INFO:plaidml:b'Opening device "amd_radeon_r9_m370x_compute_engine.0"'
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 2, 2)              0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 4)                 0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 4)                 16        
_________________________________________________________________
dense_1 (Dense)              (None, 2)                 10        
=================================================================
Total params: 26
Trainable params: 18
Non-trainable params: 8
_________________________________________________________________
Epoch 1/1
---------------------------------------------------------------------------
Unknown                                   Traceback (most recent call last)
<ipython-input-1-bed6b5bd5b5f> in <module>()
     27 
     28 y = model.predict(x=x_train, batch_size=1)
---> 29 model.fit(x_train, np.array([[0., 1.]]), batch_size=1, epochs=1, verbose=1)

~/test/lib/python3.5/site-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1596                               initial_epoch=initial_epoch,
   1597                               steps_per_epoch=steps_per_epoch,
-> 1598                               validation_steps=validation_steps)
   1599 
   1600     def evaluate(self, x, y,

~/test/lib/python3.5/site-packages/keras/engine/training.py in _fit_loop(self, f, ins, out_labels, batch_size, epochs, verbose, callbacks, val_f, val_ins, shuffle, callback_metrics, initial_epoch, steps_per_epoch, validation_steps)
   1181                     batch_logs['size'] = len(batch_ids)
   1182                     callbacks.on_batch_begin(batch_index, batch_logs)
-> 1183                     outs = f(ins_batch)
   1184                     if not isinstance(outs, list):
   1185                         outs = [outs]

~/test/lib/python3.5/site-packages/plaidml/keras/backend.py in __call__(self, inputs)
    957             self._invoker.set_output(name, t)
    958 
--> 959         self._invoker.invoke()
    960 
    961         return [t.as_ndarray(_ctx) for t in tensors]

~/test/lib/python3.5/site-packages/plaidml/__init__.py in invoke(self)
   1324 
   1325     def invoke(self):
-> 1326         return Invocation(self._ctx, self)
   1327 
   1328 

~/test/lib/python3.5/site-packages/plaidml/__init__.py in __init__(self, ctx, invoker)
   1330 
   1331     def __init__(self, ctx, invoker):
-> 1332         self._as_parameter_ = _lib().plaidml_schedule_invocation(ctx, invoker)
   1333         self._free = _lib().plaidml_free_invocation
   1334 

~/test/lib/python3.5/site-packages/plaidml/__init__.py in _check_err(self, result, func, args)
    672         if func == self.plaidml_writeback_mapping and args[2]:
    673             return None
--> 674         self.raise_last_status()
    675 
    676 

~/test/lib/python3.5/site-packages/plaidml/library.py in raise_last_status(self)
    134 
    135     def raise_last_status(self):
--> 136         raise self.last_status()
    137 
    138     def _logger_callback(self, unused_arg, level, msg):

Unknown: b'Duplicate binding discovered: X_I_2'

I'm on a MacBook Pro with an AMD Radeon R9 M370X and PlaidML 0.1.2. Running the same code on CPU with the TensorFlow backend doesn't produce this error.

Build failure with bazel 0.8.0

I tried to build it on my Ubuntu x86_64 machine, after downloading/installing bazel 0.8.0, and it failed:

[09:04] jwatte@ripper:/usr/local/src/plaidml$ bazel build plaidml:wheel plaidml/keras:wheel
........
ERROR: /usr/local/src/plaidml/plaidml/BUILD:21:1: Traceback (most recent call last):
        File "/usr/local/src/plaidml/plaidml/BUILD", line 21
                plaidml_proto_library(name = "proto", srcs = ["plaidml.p..."], ..."])
        File "/usr/local/src/plaidml/bzl/plaidml.bzl", line 81, in plaidml_proto_library
                plaidml_cc_proto_library(name = name, srcs = srcs, deps = dep..., <5 more arguments>)
        File "/usr/local/src/plaidml/bzl/plaidml.bzl", line 116, in plaidml_cc_proto_library
                cc_proto_library(name = (name + "_cc"), srcs = srcs, <7 more arguments>)
        File "/usr/local/src/plaidml/bzl/protobuf.bzl", line 193, in cc_proto_library
                cc_libs += [default_runtime]
trying to mutate a frozen object
ERROR: error loading package 'plaidml': Package 'plaidml' contains errors
INFO: Elapsed time: 0.491s
FAILED: Build did NOT complete successfully (2 packages loaded)

Using a git checkout from today.

Note that I'm running Ubuntu 17.04, so not one of the "officially supported" versions, but 16.04 doesn't support some of my hardware.
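The failing statement, `cc_libs += [default_runtime]`, extends a list in place, and the error indicates that Bazel treats that list as frozen. A commonly used way around this class of error is to build a new list instead, i.e. `cc_libs = cc_libs + [default_runtime]`. Whether that is the right fix for `bzl/protobuf.bzl` is an assumption. The difference between the two forms can be sketched in plain Python, which has no freezing but shows the in-place mutation (function names and labels are illustrative):

```python
def extend_in_place(libs, extra):
    libs += [extra]        # mutates the list object that was passed in
    return libs


def extend_by_copy(libs, extra):
    libs = libs + [extra]  # builds a new list; the argument is left untouched
    return libs


defaults = ["//lib:a"]
copied = extend_by_copy(defaults, "//lib:runtime")
print(defaults, copied)    # defaults is unchanged

mutated = extend_in_place(defaults, "//lib:runtime")
print(mutated is defaults) # same object, now modified
```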

Update about Keras Embedding Implementation

Dear developer team,
any update about the implementation of Keras embedding in PlaidML?
Thanks
Best regards
Jerome

Keras Embedding Layer is not working

I tried the cGAN from https://github.com/eriklindernoren/Keras-GAN

It works well without PlaidML, but after adding

import plaidml.keras
plaidml.keras.install_backend()

I get this error:

PlaidMLKerasException: b'Applying function, tensor with mismatching dimensionality: I, expected=3, got=2'
Traceback:
_Op(batch_flatten)
outputs: O
inputs:
I: _Op(gather)
outputs: OUT
inputs:
V: sequential_1/dropout_1/sequential_2/dropout_3/embedding_3/embeddings[10, 784]
sequential_1/dropout_1/sequential_2/dropout_3/embedding_3/embeddings[10, 784]
I: sequential_1/dropout_1/sequential_2/dropout_3/input_6[None, 1]
sequential_1/dropout_1/sequential_2/dropout_3/input_6[None, 1]

URL fetch failure in plaidvision example on MacOS

Hi, I am getting the following output when I try the plaidvision example as described on the website:

(plaidml) MacBook-Pro:plaidvision user$ python ./plaidvision.py mobilenet
Using PlaidML backend.
/Users/user/plaidml/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
INFO:plaidml:Opening device "intel(r)_iris(tm)_graphics_650.0"
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.6/mobilenet_1_0_224_tf.h5
Traceback (most recent call last):
File "./plaidvision.py", line 314, in <module>
main()
File "./plaidvision.py", line 264, in main
model = Model(args.model, args.weights)
File "./plaidvision.py", line 206, in __init__
self.model = ModelClass(weights=weights)
File "/Users/user/plaidml/lib/python2.7/site-packages/keras/applications/mobilenet.py", line 520, in MobileNet
cache_subdir='models')
File "/Users/user/plaidml/lib/python2.7/site-packages/keras/utils/data_utils.py", line 222, in get_file
raise Exception(error_msg.format(origin, e.errno, e.reason))
Exception: URL fetch failure on https://github.com/fchollet/deep-learning-models/releases/download/v0.6/mobilenet_1_0_224_tf.h5: None -- [SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:661)

My configuration is as follows:

MacBook Pro (13-inch, 2017, Four Thunderbolt 3 Ports)

Python 2.7.14

(plaidml) MacBook-Pro:plaidvision user$ openssl version
LibreSSL 2.2.7

I installed PlaidML as described in the instructions and configured it to use the built-in Intel GPU. It seems to me that the problem is an incompatibility between the server that hosts the URL and the Python library the framework uses to fetch it. However, I don't know how to fix this. Could anyone help?

Best wishes,

Leif
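
A quick diagnostic, offered as a guess rather than a confirmed fix: a `TLSV1_ALERT_PROTOCOL_VERSION` alert generally means the server rejected the TLS version the client offered, which is likely when Python is linked against an old SSL library. Note that the `openssl` binary on the PATH is not necessarily the library Python was built against. The snippet below only inspects the local `ssl` module.

```python
import ssl

# Which SSL library this interpreter is actually linked against.
print(ssl.OPENSSL_VERSION)

# TLS 1.2 support is exposed as a protocol constant when the linked library has it.
supports_tls12 = hasattr(ssl, "PROTOCOL_TLSv1_2")
print("TLS 1.2 available:", supports_tls12)
```

If TLS 1.2 is not available, downloading the weights file by other means and passing its path to the example may be a workaround, assuming the script accepts a local weights file.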

ValueError: Dilated convolutions not yet supported

Dilated convolutions would be helpful for wavenet and similar architectures

INFO:plaidml:b'Opening device "intel(r)_iris(tm)_graphics_550.0'
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-aa075ef71a9c> in <module>()
----> 1 full_seg_model = load_model('dilate_full_lesion_model.h5', compile = False)

/Users/mader/anaconda/lib/python3.5/site-packages/keras/models.py in load_model(filepath, custom_objects, compile)
    237             raise ValueError('No model found in config file.')
    238         model_config = json.loads(model_config.decode('utf-8'))
--> 239         model = model_from_config(model_config, custom_objects=custom_objects)
    240 
    241         # set weights

/Users/mader/anaconda/lib/python3.5/site-packages/keras/models.py in model_from_config(config, custom_objects)
    311                         'Maybe you meant to use '
    312                         '`Sequential.from_config(config)`?')
--> 313     return layer_module.deserialize(config, custom_objects=custom_objects)
    314 
    315 

/Users/mader/anaconda/lib/python3.5/site-packages/keras/layers/__init__.py in deserialize(config, custom_objects)
     52                                     module_objects=globs,
     53                                     custom_objects=custom_objects,
---> 54                                     printable_module_name='layer')

/Users/mader/anaconda/lib/python3.5/site-packages/keras/utils/generic_utils.py in deserialize_keras_object(identifier, module_objects, custom_objects, printable_module_name)
    137                 return cls.from_config(config['config'],
    138                                        custom_objects=dict(list(_GLOBAL_CUSTOM_OBJECTS.items()) +
--> 139                                                            list(custom_objects.items())))
    140             with CustomObjectScope(custom_objects):
    141                 return cls.from_config(config['config'])

/Users/mader/anaconda/lib/python3.5/site-packages/keras/engine/topology.py in from_config(cls, config, custom_objects)
   2495                 if layer in unprocessed_nodes:
   2496                     for node_data in unprocessed_nodes.pop(layer):
-> 2497                         process_node(layer, node_data)
   2498 
   2499         name = config.get('name')

/Users/mader/anaconda/lib/python3.5/site-packages/keras/engine/topology.py in process_node(layer, node_data)
   2452             if input_tensors:
   2453                 if len(input_tensors) == 1:
-> 2454                     layer(input_tensors[0], **kwargs)
   2455                 else:
   2456                     layer(input_tensors, **kwargs)

/Users/mader/anaconda/lib/python3.5/site-packages/keras/engine/topology.py in __call__(self, inputs, **kwargs)
    600 
    601             # Actually call the layer, collecting output(s), mask(s), and shape(s).
--> 602             output = self.call(inputs, **kwargs)
    603             output_mask = self.compute_mask(inputs, previous_mask)
    604 

/Users/mader/anaconda/lib/python3.5/site-packages/keras/layers/convolutional.py in call(self, inputs)
    162                 padding=self.padding,
    163                 data_format=self.data_format,
--> 164                 dilation_rate=self.dilation_rate)
    165         if self.rank == 3:
    166             outputs = K.conv3d(

/Users/mader/anaconda/lib/python3.5/site-packages/plaidml/keras/backend.py in conv2d(x, kernel, strides, padding, dilation_rate, data_format, force_winograd)
   1509                            kernel.shape[3] > 4)):
   1510         return _winograd(x, kernel, padding=padding)
-> 1511     return conv(x, kernel, strides, padding, data_format, dilation_rate)
   1512 
   1513 

/Users/mader/anaconda/lib/python3.5/site-packages/plaidml/keras/backend.py in conv(x, kernel, strides, padding, data_format, dilation_rate, channelwise)
   1296     for entry in dilation_rate:
   1297         if entry != 1:
-> 1298             raise ValueError("Dilated convolutions not yet supported")
   1299     if kernel.ndim != rank + 2:
   1300         raise ValueError("Convolution kernel shape inconsistent with input shape: " +

ValueError: Dilated convolutions not yet supported
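
For reference, a dilated convolution spaces the kernel taps `dilation` elements apart, so a kernel of size k covers (k - 1) * dilation + 1 input positions. The pure-Python 1D sketch below illustrates that arithmetic with no padding and stride 1; it is not PlaidML or Keras code, and `dilated_conv1d` is a made-up name.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Unpadded, stride-1 1D cross-correlation with dilated kernel taps."""
    span = (len(kernel) - 1) * dilation + 1      # input positions covered by the kernel
    out_len = len(signal) - span + 1
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(max(out_len, 0))
    ]


x = [1, 2, 3, 4, 5, 6, 7]
k = [1, 0, -1]

print(dilated_conv1d(x, k, dilation=1))  # taps at i, i+1, i+2
print(dilated_conv1d(x, k, dilation=2))  # taps at i, i+2, i+4
```

With the same three-tap kernel, a dilation of 2 widens the receptive field from 3 to 5 inputs without adding weights, which is why WaveNet-style models rely on it.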

Support Multiple Devices

PlaidML currently requires exactly one selected device.

We've considered multiple approaches to multi-device support:

Data Parallelism (a copy of the function graph on each device, with coordinated weight updates)
Model Parallelism (splitting the graph across devices)
Kernel Parallelism (effectively splitting the Tile Op index space)

Alternatively, we could implement backends for frameworks that have solid multi-device support, such as:

MXNet
TensorFlow

This is a parent issue tracking implementation of all 3. We'll start with data parallelism and move on to others as warranted, with separate issues for each.
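
As a toy illustration of the data-parallel option (a sketch only, not a proposed PlaidML design): each device holds a copy of the weights and a shard of the batch, computes a local gradient, and the averaged gradient is applied to every copy. The model is a one-parameter least-squares fit, and all names are illustrative.

```python
def local_gradient(w, shard):
    """Gradient of mean squared error for y = w * x over one device's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, batch, n_devices, lr=0.01):
    """Split the batch across devices, average the gradients, update once."""
    shards = [batch[i::n_devices] for i in range(n_devices)]
    grads = [local_gradient(w, s) for s in shards if s]
    return w - lr * sum(grads) / len(grads)


batch = [(x, 3.0 * x) for x in range(1, 9)]   # samples of y = 3x
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, batch, n_devices=2)
print(round(w, 3))
```

Because the shards are the same size here, averaging the per-device gradients gives the same update as a single device processing the whole batch.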

Feature Request: ELU activation function Implementation

I will implement the ELU function from the Keras backend as introduced here: http://image-net.org/challenges/posters/JKU_EN_RGB_Schwarz_poster.pdf
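
For reference, ELU is the identity for positive inputs and `alpha * (exp(x) - 1)` otherwise. A scalar sketch in plain Python follows; the eventual backend implementation would operate on tensors and is not shown here.

```python
import math


def elu(x, alpha=1.0):
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * math.expm1(x)


print([round(elu(v), 4) for v in (-2.0, -0.5, 0.0, 1.5)])
```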

Support/Replacement for 'ellipsis' in plaidml.keras.backend._parse_slice

Hello, and thank you for releasing such a wonderful tool for us AMD developers!

I'm trying to run an SSD model with the PlaidML backend, but I can't load the model properly.

# Instance network

model = ssd_512(image_size=(img_height, img_width, 3),
                n_classes=20,
                mode='inference',
                l2_regularization=0.0005,
                scales=[0.07, 0.15, 0.3, 0.45, 0.6, 0.75, 0.9, 1.05], # The scales for MS COCO are [0.04, 0.1, 0.26, 0.42, 0.58, 0.74, 0.9, 1.06]
                aspect_ratios_per_layer=[[1.0, 2.0, 0.5],
                                         [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                                         [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                                         [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                                         [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                                         [1.0, 2.0, 0.5],
                                         [1.0, 2.0, 0.5]],
                two_boxes_for_ar1=True,
                steps=[8, 16, 32, 64, 128, 256, 512],
                offsets=[0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
                limit_boxes=False,
                variances=[0.1, 0.1, 0.2, 0.2],
                coords='centroids',
                normalize_coords=True,
                subtract_mean=[123, 117, 104],
                swap_channels=True,
                confidence_thresh=0.5,
                iou_threshold=0.45,
                top_k=200,
                nms_max_output_size=400)

The problem here is in a single part of the model definition, where BGR turns to RGB:

    # ssd_512 class
    # ...
    # The following identity layer is only needed so that the subsequent lambda layers can be optional.
    x1 = Lambda(lambda z: z, output_shape=(img_height, img_width, img_channels), name='identity_layer')(x)
    if not (subtract_mean is None):
        x1 = Lambda(lambda z: z - np.array(subtract_mean), output_shape=(img_height, img_width, img_channels), name='input_mean_normalization')(x1)
    if not (divide_by_stddev is None):
        x1 = Lambda(lambda z: z / np.array(divide_by_stddev), output_shape=(img_height, img_width, img_channels), name='input_stddev_normalization')(x1)
    if swap_channels and (img_channels == 3):
        x1 = Lambda(lambda z: z[...,::-1], output_shape=(img_height, img_width, img_channels), name='input_channel_swap')(x1)
    # ...

More specifically, the z[...,::-1] part. This is an easy way to convert to and from BGR, but the backend does not understand the ellipsis in the array definition. A workaround is just using cv2.cvtColor before inserting anything in the network and assuming RGB, but I feel this is something that should be understood by the backend!

The complete stacktrace is here:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-0ccfea24f9b2> in <module>()
     27                iou_threshold=0.45,
     28                top_k=200,
---> 29                nms_max_output_size=400)
     30 
     31 # 2: Load the trained weights into the model.

~/Projekts/ssd_keras/models/keras_ssd512.py in ssd_512(image_size, n_classes, mode, l2_regularization, min_scale, max_scale, scales, aspect_ratios_global, aspect_ratios_per_layer, two_boxes_for_ar1, steps, offsets, limit_boxes, variances, coords, normalize_coords, subtract_mean, divide_by_stddev, swap_channels, confidence_thresh, iou_threshold, top_k, nms_max_output_size, return_predictor_sizes)
    267         x1 = Lambda(lambda z: z / np.array(divide_by_stddev), output_shape=(img_height, img_width, img_channels), name='input_stddev_normalization')(x1)
    268     if swap_channels and (img_channels == 3):
--> 269         x1 = Lambda(lambda z: z[...,::-1], output_shape=(img_height, img_width, img_channels), name='input_channel_swap')(x1)
    270 
    271     conv1_1 = Conv2D(64, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv1_1')(x1)

~/.virtualenvs/cv/lib/python3.5/site-packages/keras/engine/topology.py in __call__(self, inputs, **kwargs)
    615 
    616             # Actually call the layer, collecting output(s), mask(s), and shape(s).
--> 617             output = self.call(inputs, **kwargs)
    618             output_mask = self.compute_mask(inputs, previous_mask)
    619 

~/.virtualenvs/cv/lib/python3.5/site-packages/keras/layers/core.py in call(self, inputs, mask)
    661         if has_arg(self.function, 'mask'):
    662             arguments['mask'] = mask
--> 663         return self.function(inputs, **arguments)
    664 
    665     def compute_mask(self, inputs, mask=None):

~/Projekts/ssd_keras/models/keras_ssd512.py in <lambda>(z)
    267         x1 = Lambda(lambda z: z / np.array(divide_by_stddev), output_shape=(img_height, img_width, img_channels), name='input_stddev_normalization')(x1)
    268     if swap_channels and (img_channels == 3):
--> 269         x1 = Lambda(lambda z: z[...,::-1], output_shape=(img_height, img_width, img_channels), name='input_channel_swap')(x1)
    270 
    271     conv1_1 = Conv2D(64, (3, 3), activation='relu', padding='same', kernel_initializer='he_normal', kernel_regularizer=l2(l2_reg), name='conv1_1')(x1)

~/.virtualenvs/cv/lib/python3.5/site-packages/plaidml/keras/backend.py in __getitem__(self, key)
    381 
    382     def __getitem__(self, key):
--> 383         (var_list, dim_list, formula_list, offset_list, shape) = self._gen_slice(key)
    384 
    385         if len(shape) == 0:

~/.virtualenvs/cv/lib/python3.5/site-packages/plaidml/keras/backend.py in _gen_slice(self, key)
    346         inner_idx = 0
    347         for idx in range(len(key)):
--> 348             length_numerator, step, offset = self._parse_slice(key, idx)
    349             if step == None:
    350                 # In this case offset is an int

~/.virtualenvs/cv/lib/python3.5/site-packages/plaidml/keras/backend.py in _parse_slice(self, key, idx)
    269         if isinstance(key[idx], int):
    270             return 1, None, key[idx]
--> 271         if ((not isinstance(key[idx].start, int) and not isinstance(key[idx].start, type(None))) or
    272             (not isinstance(key[idx].stop, int) and not isinstance(key[idx].stop, type(None))) or
    273             (not isinstance(key[idx].step, int) and not isinstance(key[idx].step, type(None)))):

AttributeError: 'ellipsis' object has no attribute 'start'
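
One way a backend could accept `z[...,::-1]` is to expand the `Ellipsis` into explicit full slices before parsing each index. The sketch below shows that normalization step in isolation; `expand_ellipsis` is a hypothetical helper, not part of `plaidml.keras.backend`, and it assumes the tensor rank is known.

```python
def expand_ellipsis(key, ndim):
    """Replace a single Ellipsis in an index key with enough slice(None) entries."""
    if not isinstance(key, tuple):
        key = (key,)
    n_ellipsis = sum(1 for k in key if k is Ellipsis)
    if n_ellipsis > 1:
        raise IndexError("an index can only have a single ellipsis")
    if n_ellipsis == 0:
        # Pad missing trailing dimensions with full slices.
        return key + (slice(None),) * (ndim - len(key))
    pos = next(i for i, k in enumerate(key) if k is Ellipsis)
    fill = ndim - (len(key) - 1)
    if fill < 0:
        raise IndexError("too many indices for tensor")
    return key[:pos] + (slice(None),) * fill + key[pos + 1:]


# z[..., ::-1] on a 4-D (batch, height, width, channels) tensor
print(expand_ellipsis((Ellipsis, slice(None, None, -1)), ndim=4))
```

After this step every entry is an `int` or a `slice`, which is what the `_parse_slice` code in the traceback appears to expect. Whether that parser handles a negative step is a separate question.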

Feature request: Half Float (FP16) support

Are there plans to optimize around 16-bit floats for training models? AMD's Vega supports two 16-bit mul-add per clock which means an RX Vega 56 for under $450 could provide nearly 25 TFLOPS of training performance. And Nvidia's upcoming Volta architecture shows big performance gains in caffe2 when training with 16-bit floats.

There don't seem to be many other ML frameworks that support this.
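
As a small illustration of what half precision trades away (independent of any framework), Python's `struct` module can round-trip a value through IEEE 754 binary16 using the `e` format:

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


print(to_fp16(0.1))      # nearest representable half-precision value to 0.1
print(to_fp16(2049.0))   # above 2048, not every integer is representable
```

The roughly three decimal digits of precision are why FP16 training is usually paired with techniques such as keeping a higher-precision copy of the weights.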

plaidml:b'syntax error, unexpected ), expecting }' b'Out of memory' (Keras 2.1.2 with mobilenet)

Hello!

I am trying to run plaidbench with PlaidML on Keras 2.1.2, but I get this error:

INFO:plaidml:b'Opening device "ellesmere.0"'
ERROR:plaidml:b'syntax error, unexpected ), expecting ( : function (I[N0, N1]) -> (O) {\n O = reshape(I, _Op(slice), 1, 1, 1024);\n}'
b'Out of memory'
Traceback:
_Op(reshape)
outputs: O
inputs:
I: _Op(mean)
outputs: O
inputs:
I: _Op(relu)
outputs: R
inputs:
X: _Op(switch)
outputs: O
inputs:
C: conv1_bn/placeholder_1[]
conv1_bn/placeholder_1[]
T: _Op(+)
outputs: A
inputs:
...truncated...
E: _Op(+)
outputs: A
inputs:
...truncated...
Alpha: 0.0
0.0
MaxValue: 6
6

It works with Keras 2.0.8, but I need to run an application that requires Keras 2.1.2 together with PlaidML.

Thanks!

Where are the kernels?

Hi, I can't find any OpenCL kernels in the source directory and was wondering where they are (greps for "gemm" and "__kernel" come up empty). Perhaps I don't understand where PlaidML fits in; I would have thought the bulk of the work would be writing performant OpenCL kernels. Thanks for any clarification.

Memory error on Vega 10

Hi, I am trying PlaidML on an AMD Vega 10 (gfx900).

I get the following error:

prj47-rack-06@PRJ47-RACK-06:~/biswa/plaidbench$ python plaidbench.py mobilenet
Using PlaidML backend.
INFO:plaidml:Initializing device gfx900.0: "gfx900", vendor "Advanced Micro Devices, Inc."
INFO:plaidml:Initializing device gfx900.1: "gfx900", vendor "Advanced Micro Devices, Inc."
INFO:plaidml:Initializing device gfx900.2: "gfx900", vendor "Advanced Micro Devices, Inc."
INFO:plaidml:Initializing device gfx900.3: "gfx900", vendor "Advanced Micro Devices, Inc."
INFO:plaidml:Opening device "gfx900.3": "Advanced Micro Devices, Inc. gfx900"

Model loaded.
Compiling and running initial batch, batch_size=1
Warmup
Memory access fault by GPU node-7 on address 0x4408bd6000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)

Any idea how to resolve this?

Thanks,
Biswa

Unable to compile Tile program

Hi!
I'm trying the latest version of PlaidML.

I modified one of the Keras sample scripts and ran it, and the following error occurred:

WARNING:plaidml:Failed to build program: CL_BUILD_PROGRAM_FAILURE

The previous version worked well.

Best

Training networks with large embedding layers is very slow

Currently PlaidML doesn't support derivatives involving sparse tensors very well, leading to serious performance issues while training large embedding layers.
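
To make the cost concrete: the gradient of an embedding lookup is nonzero only in the rows that were looked up, so a dense gradient the size of the whole table is mostly zeros. The sketch below contrasts the two representations in plain Python; it illustrates the general idea and says nothing about how PlaidML computes these derivatives.

```python
def dense_embedding_grad(vocab_size, dim, indices, upstream):
    """Full-table gradient: every row is materialized, most stay zero."""
    grad = [[0.0] * dim for _ in range(vocab_size)]
    for idx, g in zip(indices, upstream):
        for j in range(dim):
            grad[idx][j] += g[j]
    return grad


def sparse_embedding_grad(indices, upstream):
    """Row-indexed gradient: only rows that were looked up are stored."""
    grad = {}
    for idx, g in zip(indices, upstream):
        row = grad.setdefault(idx, [0.0] * len(g))
        for j, v in enumerate(g):
            row[j] += v
    return grad


indices = [2, 7, 2]                       # token 2 appears twice in the batch
upstream = [[1.0, 1.0], [0.5, 0.5], [2.0, 2.0]]

dense = dense_embedding_grad(10000, 2, indices, upstream)
sparse = sparse_embedding_grad(indices, upstream)
print(len(dense), "dense rows vs", len(sparse), "sparse rows")
```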

When will Conv2DTranspose be supported?

Hi, this is a very impressive project.

I tried PlaidML on an AMD Radeon RX570.
When I ran keras/examples/mnist_cond_gan.py with PlaidML, the following error occurred:

NotImplementedError: The Keras backend function 'conv2d_transpose' is not yet implemented in Plaid. You can help us prioritize by letting us know if this function is important to you, and as always, contributions are welcome!

conv2d_transpose is quite important for implementing Generative Adversarial Networks.

If you have a plan for this, I would like to know about it.
I think that once PlaidML supports conv2d_transpose, almost all users will be able to switch completely from CUDA to an OpenCL-based system.
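
For context, a transposed convolution scatters each input element across `kernel` output positions spaced `stride` apart, so without padding the output length is (n - 1) * stride + kernel. A 1D pure-Python sketch follows; it is not the Keras or PlaidML implementation, and `conv1d_transpose` is an illustrative name.

```python
def conv1d_transpose(signal, kernel, stride=1):
    """Unpadded 1D transposed convolution: scatter-add kernel copies at stride offsets."""
    out = [0.0] * ((len(signal) - 1) * stride + len(kernel))
    for i, x in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i * stride + j] += x * w
    return out


y = conv1d_transpose([1.0, 2.0, 3.0], [1.0, 1.0], stride=2)
print(len(y), y)
```

This upsampling behaviour is what GAN generators use to grow a small latent feature map into a full-size image.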

MaxPooling2D in Keras fails due to TypeError in backend

I do not claim to even slightly understand the inner workings of Keras or PlaidML, but I was determined to hunt this issue down. Whenever I tried to add a keras MaxPooling2D layer to a Sequential, I received the following error:

TypeError: can only concatenate tuple (not "list") to tuple

Attributed to this line in plaidml/keras/backend.py:

outshape = x.shape[:2] + num_out_size

Without understanding much of the surrounding code, other than that its purpose was to generate Tile code, I found that x.shape resulted in a tuple (None, 1), yet num_out_size was an array of integers. After some educated guessing, I changed the line to this:

outshape = list(x.shape[:2]) + num_out_size

and no longer received the error. I am running plaidml 0.1.2, keras 2.0.8, plaidml-keras 0.1.2, on Python 3.6.3 installed through Homebrew (brew install python3) on a Mac running macOS 10.13. I produced the following example code to reproduce the error.

import plaidml.keras
plaidml.keras.install_backend()

import keras
from keras.models import Sequential
from keras.layers import Reshape, MaxPooling2D, Conv2D

model = Sequential()

model.add(
    Reshape(
        (1, 64, 64),
        input_shape=(1, 64*64)
    )
)

model.add(
    Conv2D(
        filters=3,
        kernel_size=(5, 5),
        padding='same',
        activation='relu'
    )
)

model.add(
    MaxPooling2D(pool_size=(2, 2), data_format='channels_first')
)

I cannot guarantee that this is a minimal example, it is just a very faithful recreation of the original code that triggered the error, with hardcoded values instead of external variables.

If I am misguided in submitting this issue, which I do not doubt, please point me in the right direction. I just wanted to see if this problem could be fixed through a little tinkering.
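
The failure described in this report can be reproduced without Keras: Python refuses to concatenate a tuple with a list, and converting one side fixes it. The shapes below mirror the report, except for the contents of `num_out_size`, which are assumed for illustration.

```python
shape = (None, 1, 64, 64)      # x.shape is a tuple
num_out_size = [32, 32]        # a list in the report; these values are assumed

try:
    outshape = shape[:2] + num_out_size
except TypeError as err:
    print("fails:", err)

# Either conversion works; pick the one that matches what downstream code expects.
outshape_list = list(shape[:2]) + num_out_size
outshape_tuple = shape[:2] + tuple(num_out_size)
print(outshape_list, outshape_tuple)
```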

Support for PyTorch and dynamic computation graphs

Hi,

I'd like to see support for PyTorch and for DL frameworks based on dynamic computation graphs. I think this is fairly important, as a number of the newest techniques in DL are more easily implemented in PyTorch than in Keras or TF.

Zao

CL_OUT_OF_HOST_MEMORY

I am running PlaidML with Python 2.7 on Ubuntu 16.04. This is the output I get when running plaidml-setup:

PlaidML Setup (0.1.3)

Thanks for using PlaidML!

Some Notes:
  * Bugs and other issues: https://github.com/plaidml/plaidml
  * Questions: https://stackoverflow.com/questions/tagged/plaidml
  * Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
  * PlaidML is licensed under the GNU AGPLv3
 
Traceback (most recent call last):
  File "/usr/local/bin/plaidml-setup", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/plaidml/plaidml_setup.py", line 44, in main
    devices, _ = plaidml.devices(ctx, limit=100, return_all=True)
  File "/usr/local/lib/python2.7/dist-packages/plaidml/__init__.py", line 997, in devices
    enumerator = _Enumerator(ctx)
  File "/usr/local/lib/python2.7/dist-packages/plaidml/__init__.py", line 963, in __init__
    ctx, ctypes.cast(None, _ENUM_DEVICES_FUNCTYPE), None)
  File "/usr/local/lib/python2.7/dist-packages/plaidml/__init__.py", line 684, in _check_err
    self.raise_last_status()
  File "/usr/local/lib/python2.7/dist-packages/plaidml/library.py", line 132, in raise_last_status
    raise self.last_status()
plaidml.exceptions.Unknown: creating a command queue for an OpenCL device: CL_OUT_OF_HOST_MEMORY

Feature request - port to Python 3.6

I've got PlaidML running on my AMD Bonaire on Arch Linux with Python 2.7 in a Conda environment. Every other Python package I have runs with 3.6 and my goal is to keep it that way. ;-)

There doesn't seem to even be a pip package for 3.6, so the pip install -U plaidml-keras fails with Python 3.6. If you can post build-from-GitHub-source instructions, I can make a local package and install it.

P.S.: Let me know if you want Arch setup instructions for AMD GPUs. Most of it is on the Arch User Repository wiki but I've got some scripts that do the work.

P.P.S.: Benchmark results

Using PlaidML backend.
INFO:plaidml:Initializing device bonaire.0: "Bonaire", vendor "Advanced Micro Devices, Inc."
INFO:plaidml:Opening device "bonaire.0": "Advanced Micro Devices, Inc. Bonaire"
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.6/mobilenet_1_0_224_tf.h5
16793600/17225924 [============================>.] - ETA: 0s 
Model loaded.
Compiling and running initial batch, batch_size=1
Warmup
Doing the main timing
Example finished, elapsed: 6.821215868 (compile), 15.0223557949 (execution)

plaidml.exceptions.Unknown: Duplicate updates

Setup:

sudo apt-get install clinfo
clinfo [sees 1080ti]
sudo pip install -U plaidml-keras
plaidml-setup
[insert before keras import:]
import plaidml.keras
plaidml.keras.install_backend()

But, intermediate problem:

 ImportError: No module named plaidml.keras
$ which python
/home/phobrain/anaconda2/bin//python

Fix:

import sys
sys.path.append('/usr/local/lib/python2.7/dist-packages/')
import plaidml.keras
plaidml.keras.install_backend()
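
The `ImportError` here is consistent with `pip` having installed into the system Python while the script runs under Anaconda. A less invasive check than editing `sys.path` is to see which interpreter is running and whether it can locate the package. The sketch uses Python 3's `importlib`, whereas the report uses Python 2.7, and `locate` is a helper written for this example.

```python
import importlib.util
import sys


def locate(package):
    """Return the install location of a top-level package, or None if not importable."""
    spec = importlib.util.find_spec(package)
    return None if spec is None else spec.origin


print("interpreter:", sys.executable)
print("plaidml found at:", locate("plaidml"))
```

If the package is not found, installing it with that interpreter's own pip (`python -m pip install -U plaidml-keras`) usually avoids the mismatch.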

'Real' issue being reported:

File "siaconv.py", line 919, in doit
epochs=epochs)
File "/home/phobrain/anaconda2/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
return func(*args, **kwargs)
File "/home/phobrain/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1926, in fit_generator
self._make_train_function()
File "/home/phobrain/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 967, in _make_train_function
**self._function_kwargs)
File "/usr/local/lib/python2.7/dist-packages/plaidml/keras/backend.py", line 1718, in function
return _Function(inputs, outputs, updates, name)
File "/usr/local/lib/python2.7/dist-packages/plaidml/keras/backend.py", line 931, in __init__
c.add_update(_plaidml_val(var), _plaidml_val(newval))
File "/usr/local/lib/python2.7/dist-packages/plaidml/__init__.py", line 1289, in add_update
_lib().plaidml_add_composer_update(self, dest, src)
File "/usr/local/lib/python2.7/dist-packages/plaidml/__init__.py", line 674, in _check_err
self.raise_last_status()
File "/usr/local/lib/python2.7/dist-packages/plaidml/library.py", line 136, in raise_last_status
raise self.last_status()
plaidml.exceptions.Unknown: Duplicate updates

model.fit_generator(
        myGen('data', tr_pairs, tr_y, batch_size, True),
        (len(tr_pairs)-1) / batch_size,
        validation_data=myGen('valid', te_pairs, te_y, batch_size, True),
        validation_steps=1,
        max_queue_size=2,
        workers=1,
        epochs=epochs)

Net:

KERNEL_INIT = 'glorot_normal'

    seq.add(Dense(dense_size, input_shape=input_shape,
                activation='relu', kernel_initializer=KERNEL_INIT))
    seq.add(BatchNormalization())
    seq.add(Dense((dense_size*2)/3,
            activation='relu',
            kernel_initializer=KERNEL_INIT))
    seq.add(Dropout(0.1, seed=SEED))
    seq.add(Dense(dense_size/4,
            activation='relu',
            kernel_initializer=KERNEL_INIT))
    seq.add(Dense((dense_size*2)/3,
            activation='relu',
            kernel_initializer=KERNEL_INIT))
    seq.add(Dense(dense_size,
            activation='relu',
            kernel_initializer=KERNEL_INIT))
    seq.add(Dense(512,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
    seq.add(Dense(256,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
    seq.add(Dense(128,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
    seq.add(Dense(256,
                activation='relu',
                kernel_initializer=KERNEL_INIT))
    seq.add(Dense(128,
                activation='relu',
                kernel_initializer=KERNEL_INIT))

RNNs require static sequence length

Recurrent networks currently require the sequence length to be known when the Tile code is constructed and bound. This requires the shape of the input layer for RNNs to have a specific size for their time dimension (i.e., not a None in the time dimension of their shape). This excludes a few use cases, and also breaks Keras networks that infer the sequence length automatically. The latter case has a workaround -- explicitly specify the sequence length in the input fed into the RNN (e.g. explicitly specifying the shape in an Input layer); but it would be better to not require a workaround.
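
When the data itself has variable-length sequences, one common way to satisfy a fixed time dimension is to pad or truncate every sequence to a chosen length before building the model. Keras ships `pad_sequences` for this; the plain-Python `pad_to_length` below is a stand-in written for illustration and does not reproduce that function's defaults.

```python
def pad_to_length(sequences, length, pad_value=0):
    """Right-pad or truncate each sequence so all have exactly `length` steps."""
    return [
        list(seq[:length]) + [pad_value] * (length - len(seq[:length]))
        for seq in sequences
    ]


batch = pad_to_length([[5, 3], [7, 1, 4, 9, 2], [8]], length=4)
print(batch)
```

The padded batch has a static time dimension of 4, which can then be declared explicitly on the input layer as the workaround describes.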

[macOS] model.fit() loss: nan

I ran mnist_cnn.py from keras/examples after adding PlaidML as the backend. The issue affects many other scripts, but this is the simplest example.

Training runs fine for a while, then the loss becomes nan and acc plummets to 0, where it stays.

Andys-iMac-2:examples andy$ python mnist_cnn.py
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
INFO:plaidml:Opening device "amd_radeon_pro_580_compute_engine.0
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
59776/60000 [============================>.] - ETA: 0s - loss: 0.3177 - acc: 0.9025INFO:plaidml:Analyzing Ops: 85 of 285 operations complete
60000/60000 [==============================] - 27s - loss: 0.3172 - acc: 0.9026 - val_loss: 0.2699 - val_acc: 0.9217
Epoch 2/12
60000/60000 [==============================] - 18s - loss: 0.1104 - acc: 0.9666 - val_loss: 0.2247 - val_acc: 0.9308
Epoch 3/12
60000/60000 [==============================] - 19s - loss: nan - acc: 0.5408 - val_loss: nan - val_acc: 0.0000e+00
Epoch 4/12
60000/60000 [==============================] - 19s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
Epoch 5/12
60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
Epoch 6/12
60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
Epoch 7/12
60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
Epoch 8/12
60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
Epoch 9/12
60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
Epoch 10/12
60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
Epoch 11/12
60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
Epoch 12/12
60000/60000 [==============================] - 18s - loss: nan - acc: 0.0000e+00 - val_loss: nan - val_acc: 0.0000e+00
Test loss: nan
Test accuracy: 0.0

CLA for contributors

You guys might want to consider setting up a contributor license agreement for third party contributions.

For example:

CLAHub

