
redisai's Introduction


Caution

RedisAI is no longer actively maintained or supported.

We are grateful to the RedisAI community for their interest and support.

RedisAI

RedisAI is a Redis module for executing Deep Learning/Machine Learning models and managing their data. Its purpose is to be a "workhorse" for model serving, providing out-of-the-box support for popular DL/ML frameworks and unparalleled performance. RedisAI both maximizes computation throughput and reduces latency by adhering to the principle of data locality, and it simplifies the deployment and serving of graphs by leveraging Redis' production-proven infrastructure.

To read RedisAI docs, visit redisai.io. To see RedisAI in action, visit the demos page.

Quickstart

RedisAI is a Redis module. To run it you'll need a Redis server (v6.0.0 or greater), the module's shared library, and its dependencies.

The following sections describe how to get started with RedisAI.

Docker

The quickest way to try RedisAI is by launching its official Docker container images.

On a CPU-only machine

docker run -p 6379:6379 redislabs/redisai:1.2.7-cpu-bionic

On a GPU machine

For GPU support you'll need a machine that has the Nvidia driver (CUDA 11.3 and cuDNN 8.1), nvidia-container-toolkit and Docker 19.03+ installed. For detailed information, check out the nvidia-docker documentation.

docker run -p 6379:6379 --gpus all -it --rm redislabs/redisai:1.2.7-gpu-bionic
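
Whichever image you run, you can sanity-check the module from another terminal. A minimal check, assuming the default port mapping above (the key name mytensor is arbitrary; AI.TENSORSET/AI.TENSORGET are shown in more detail later on this page):

# Confirm the server is up and the module is loaded
redis-cli PING
redis-cli MODULE LIST

# Set and read back a small tensor
redis-cli AI.TENSORSET mytensor FLOAT 2 VALUES 2 3
redis-cli AI.TENSORGET mytensor VALUES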

Building

You can compile and build the module from its source code. The Developer page has more information about the design and implementation of the RedisAI module and how to contribute.

Prerequisites

  • Packages: git, python3, make, wget, g++/clang, and unzip
  • CMake 3.0 or higher
  • CUDA 11.3 and cuDNN 8.1 or higher, if GPU support is required
  • Redis v6.0.0 or greater

Get the Source Code

You can obtain the module's source code by cloning the project's repository using git like so:

git clone --recursive https://github.com/RedisAI/RedisAI

Switch to the project's directory with:

cd RedisAI

Building the Dependencies

Use the following script to download and build the libraries of the various RedisAI backends (TensorFlow, PyTorch, ONNXRuntime) for CPU only:

bash get_deps.sh

Alternatively, you can run the following to fetch the backends with GPU support.

bash get_deps.sh gpu

Building the Module

Once the dependencies have been built, you can build the RedisAI module with:

make -C opt clean ALL=1
make -C opt

Alternatively, run the following to build RedisAI with GPU support:

make -C opt clean ALL=1
make -C opt GPU=1

Backend Dependency

RedisAI currently supports PyTorch (libtorch), TensorFlow (libtensorflow), TensorFlow Lite, and ONNXRuntime as backends. This section shows the version map between RedisAI and the supported backends. This is extremely important, since the serialization mechanism of one version might not match that of another. To make sure your model will work with a given RedisAI version, check the backend documentation for incompatible features between the version of your backend and the version RedisAI is built with.

RedisAI   PyTorch   TensorFlow   TFLite   ONNXRuntime
1.0.3     1.5.0     1.15.0       2.0.0    1.2.0
1.2.7     1.11.0    2.8.0        2.0.0    1.11.1
master    1.11.0    2.8.0        2.0.0    1.11.1

Note: Keras and TensorFlow 2.x are supported through graph freezing. See this script to see how to export a frozen graph from Keras and TensorFlow 2.x.

Loading the Module

To load the module upon starting the Redis server, simply use the --loadmodule command line switch, the loadmodule configuration directive, or the Redis MODULE LOAD command with the path to the module's library.

For example, to load the module from the project's path with a server command line switch use the following:

redis-server --loadmodule ./install-cpu/redisai.so
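
The other two mechanisms mentioned above work the same way; a sketch, where /path/to is a placeholder for wherever the library was installed (use an absolute path in redis.conf):

# In redis.conf
loadmodule /path/to/install-cpu/redisai.so

# At runtime, from redis-cli
MODULE LOAD /path/to/install-cpu/redisai.so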

Give it a try

Once loaded, you can interact with RedisAI using redis-cli. Basic information and examples for using the module are described here.
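
For example, a minimal redis-cli session (the key name is arbitrary; the reply shape matches the tensor examples that appear later on this page):

127.0.0.1:6379> AI.TENSORSET mytensor FLOAT 2 VALUES 2 3
OK
127.0.0.1:6379> AI.TENSORGET mytensor VALUES
1) FLOAT
2) 1) (integer) 2
3) 1) "2"
   2) "3"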

Client libraries

Some languages already have client libraries that provide support for RedisAI's commands. The following table lists the known ones:

Project             Language               License       Author         URL
JRedisAI            Java                   BSD-3         RedisLabs      Github
redisai-py          Python                 BSD-3         RedisLabs      Github
redisai-go          Go                     BSD-3         RedisLabs      Github
redisai-js          Typescript/Javascript  BSD-3         RedisLabs      Github
redis-modules-sdk   TypeScript             BSD-3-Clause  Dani Tseitlin  Github
redis-modules-java  Java                   Apache-2.0    dengliming     Github
smartredis          C++                    BSD-2-Clause  Cray Labs      Github
smartredis          C                      BSD-2-Clause  Cray Labs      Github
smartredis          Fortran                BSD-2-Clause  Cray Labs      Github
smartredis          Python                 BSD-2-Clause  Cray Labs      Github

The full documentation for RedisAI's API can be found at the Commands page.

Documentation

Read the docs at redisai.io.

Contact Us

If you have questions, want to provide feedback, or perhaps want to report an issue or contribute some code, you can reach us on the RedisAI forum or Discord.

License

RedisAI is licensed under your choice of the Redis Source Available License 2.0 (RSALv2) or the Server Side Public License v1 (SSPLv1).

redisai's People

Contributors

adobrzhansky, alonre24, andresrinivasan, ashao, boat-builder, chayim, dengliming, dvirdukhan, filipecosta90, germangh, gkorland, guyav46, guyroyse, itamarhaber, jackhallam, jenhaoyang, joecianflone, jonaskuiler, k-jo, lantiga, leibale, meirshpilraien, mnunberg, rafie, snyk-bot, spartee, stockholmux, thomascaudron, tladd, tomerhekmati

redisai's Issues

Valgrind memory leak test

Most of the leaks are in corner cases (disconnects, bad input, etc.).

  • Common Commands (TENSORSET, TENSORGET)
    make -C opt test GEN=0 SLAVES=0 AOF=0 VALGRIND=1 TEST="tests_common.py"

  • TF
    make -C opt test GEN=0 SLAVES=0 AOF=0 VALGRIND=1 TEST="tests_tensorflow.py"

    • test_run_tf_model
      make -C opt test GEN=0 SLAVES=0 AOF=0 VALGRIND=1 TEST="tests_tensorflow.py\:test_run_tf_model"

    • test_run_tf_model_errors (several leaks)
      make -C opt test GEN=0 SLAVES=0 AOF=0 VALGRIND=1 TEST="tests_tensorflow.py\:test_run_tf_model_errors"

    • test_tensorflow_modelinfo #298
      this is a leak that appears multiple times in our tests; it can be either a misuse on our side or a real leak in Redis. We should investigate further before creating an issue on Redis itself.
      make -C opt test GEN=0 SLAVES=0 AOF=0 VALGRIND=1 TEST="tests_tensorflow.py\:test_tensorflow_modelinfo"

    • test_tensorflow_modelrun_disconnect (larger on TF)
      make -C opt test GEN=0 SLAVES=0 AOF=0 VALGRIND=1 TEST="tests_tensorflow.py\:test_tensorflow_modelrun_disconnect"

  • TFLite (see #288)
    make -C opt test GEN=0 SLAVES=0 AOF=0 VALGRIND=1 TEST="tests_tflite.py"

    • test_run_tflite_model
      make -C opt test GEN=0 SLAVES=0 AOF=0 VALGRIND=1 TEST="tests_tflite.py\:test_run_tflite_model"

    • test_tflite_modelinfo
      make -C opt test GEN=0 SLAVES=0 AOF=0 VALGRIND=1 TEST="tests_tflite.py\:test_tflite_modelinfo"

    • test_tflite_modelrun_disconnect (larger on TFLite)
      make -C opt test GEN=0 SLAVES=0 AOF=0 VALGRIND=1 TEST="tests_tflite.py\:test_tflite_modelrun_disconnect"

  • ONNX
    make -C opt test GEN=0 SLAVES=0 AOF=0 VALGRIND=1 TEST="tests_onnx.py"

    • test_onnx_modelinfo (same as test_tensorflow_modelinfo) #298
      make -C opt test GEN=0 SLAVES=0 AOF=0 VALGRIND=1 TEST="tests_onnx.py\:test_onnx_modelinfo"

    • test_onnx_modelrun_disconnect (same as test_tensorflow_modelinfo) #298
      make -C opt test GEN=0 SLAVES=0 AOF=0 VALGRIND=1 TEST="tests_onnx.py\:test_onnx_modelrun_disconnect"

  • PYTORCH
    make -C opt test GEN=0 SLAVES=0 AOF=0 VALGRIND=1 TEST="tests_pytorch.py"
    leaks on:

    • test_pytorch_modelrun_disconnect
      make -C opt test GEN=0 SLAVES=0 AOF=0 VALGRIND=1 TEST="tests_pytorch.py\:test_pytorch_modelrun_disconnect"
    • test_pytorch_scriptrun_disconnect
      make -C opt test GEN=0 SLAVES=0 AOF=0 VALGRIND=1 TEST="tests_pytorch.py\:test_pytorch_scriptrun_disconnect"
    • test_pytorch_scriptset
      make -C opt test GEN=0 SLAVES=0 AOF=0 VALGRIND=1 TEST="tests_pytorch.py\:test_pytorch_scriptset"

Replicate results instead of commands in `MODELRUN`

Instead of replicating MODELRUN verbatim, we should just replicate the result.

This amounts to using RedisModule_Replicate and sending AI.TENSORSET instead of AI.MODELRUN, with the serialized output tensors as arguments.

This needs to happen in RedisAI_Run_Reply, once the computation has finished, the client has been unblocked, and the response is being sent (since that is the first opportunity to have the outputs available in the main thread).

https://github.com/RedisAI/RedisAI/blob/master/src/redisai.c#L637
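
To illustrate the intended replication effect (a sketch; the tensor payload below is a placeholder):

# What the master executes:
AI.MODELRUN foo INPUTS a b OUTPUTS c

# What replicas and the AOF would receive instead, e.g.:
AI.TENSORSET c FLOAT 2 BLOB <serialized output tensor>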

Quick start guide is incomplete

I understand it's under development. Certain things I found would be good to have in the document: mentioning the command-line arguments to get_deps.sh, and the prerequisites for get_deps.sh to work, such as unzip, cmake, g++, a Redis server, GPU/CPU, etc. Perhaps the shell script could give a warning before it starts execution. I could open a PR if these are not written yet.

Implement serialization and replication

So far we haven't really taken care of serialization and replication for RedisAI data types.
It should be straightforward, with one caveat: in MODELSET we take the protobuf, deserialize it, and throw it away (we only keep the model in memory, not the protobuf, as that would be wasteful).
However, we will need to serialize the protobuf as-is (pre-deserialization) for persistence and replication within the call to ModelSet.

Related to #63.

Crash on AI.SCRIPTSET

redis-cli AI.SCRIPTSET ket CPU "a"
ERR: expected def but found 'ident' here:
a
~ <--- HERE

*** Error in `redis-server *:6379': free(): invalid pointer: 0x00007f555de37310 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7f555e2a2bfb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7f555e2a8fc6]
/lib/x86_64-linux-gnu/libc.so.6(+0x7780e)[0x7f555e2a980e]
/usr/lib/redis/modules/redisai.so(RAI_ClearError+0x29)[0x7f555bfd4fc6]
/usr/lib/redis/modules/redisai.so(RedisAI_ScriptSet_RedisCommand+0x16a)[0x7f555bfd197c]
redis-server *:6379(RedisModuleCommandDispatcher+0x43)[0x56123cb02b93]
redis-server *:6379(call+0xa7)[0x56123ca935a7]
redis-server *:6379(processCommand+0x35f)[0x56123ca93c5f]
redis-server *:6379(processInputBuffer+0x185)[0x56123caa3d15]
redis-server *:6379(aeProcessEvents+0x2a0)[0x56123ca8d770]
redis-server *:6379(aeMain+0x2b)[0x56123ca8da0b]
redis-server *:6379(main+0x4d3)[0x56123ca8a703]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f555e2522e1]
redis-server *:6379(_start+0x2a)[0x56123ca8a96a]

TENSORGET returns only zeros on INT32

127.0.0.1:6379> AI.TensorSET source FLOAT 2 2 VALUES 1 2 3 4
OK
127.0.0.1:6379>  AI.TENSORGET source VALUES
1) FLOAT
2) 1) (integer) 2
   2) (integer) 2
3) 1) "1"
   2) "2"
   3) "3"
   4) "4"

127.0.0.1:6379> AI.TensorSET source INT32 2 2 VALUES 1 2 3 4
OK
127.0.0.1:6379>  AI.TENSORGET source VALUES
1) INT32
2) 1) (integer) 2
   2) (integer) 2
3) 1) (integer) 0
   2) (integer) 0
   3) (integer) 0
   4) (integer) 0

Crash on AI.MODELSET

Running the following:

redis-cli AI.MODELSET resnet18 TF CPU INPUTS input OUTPUT target < graph.pb

Causes:

ERR: Invalid GraphDef
*** Error in `redis-server *:6379': free(): invalid pointer: 0x00007f149b228b58 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7f149b852bfb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7f149b858fc6]
/lib/x86_64-linux-gnu/libc.so.6(+0x7780e)[0x7f149b85980e]
/usr/lib/redis/modules/redisai.so(RAI_ClearError+0x29)[0x7f14993d4fc6]
/usr/lib/redis/modules/redisai.so(RedisAI_ModelSet_RedisCommand+0x58b)[0x7f14993d02e0]
redis-server *:6379(RedisModuleCommandDispatcher+0x43)[0x55fc4a029b93]
redis-server *:6379(call+0xa7)[0x55fc49fba5a7]
redis-server *:6379(processCommand+0x35f)[0x55fc49fbac5f]
redis-server *:6379(processInputBuffer+0x185)[0x55fc49fcad15]
redis-server *:6379(aeProcessEvents+0x2a0)[0x55fc49fb4770]
redis-server *:6379(aeMain+0x2b)[0x55fc49fb4a0b]
redis-server *:6379(main+0x4d3)[0x55fc49fb1703]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f149b8022e1]
redis-server *:6379(_start+0x2a)[0x55fc49fb196a]

Implement auto-batching

At some point we should introduce automatic batching of run requests. Models, especially on the GPU, run more efficiently when inputs are batched.

One possible use case is that multiple run requests to the same model that are sitting in the queue are batched together and invoked once. This could work as follows:

  • analyze the queue, see if there are other calls to the same model (with inputs of the same shape) queued up
  • take a (configurable) number of requests and assemble the input tensors into a single tensor along the 0-th dimension
  • call the model
  • unpack the 0-th dimension over the output keys for each request
  • unblock the clients

This would allow requests from multiple clients to the same model to be batched.

A run could be triggered when a) enough requests have been queued up (aka the batch is large enough) OR b) some time has expired.

We could configure this when calling MODELSET, or with a separate command (like MODELCONFIG BATCH). Or both.
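
For illustration, here is what each option might look like (BATCHSIZE, MINBATCHSIZE and MODELCONFIG BATCH are illustrative names sketching the proposal, not a committed API):

# Option A: configure batching when the model is set
redis-cli -x AI.MODELSET mymodel TF CPU BATCHSIZE 8 MINBATCHSIZE 2 INPUTS input OUTPUTS output < graph.pb

# Option B: a separate configuration command
redis-cli AI.MODELCONFIG mymodel BATCH 8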

Test unhappy paths

We need to thoroughly test (in RLTest) as many unhappy paths as possible, like

  • incomplete or mis-crafted commands
  • corrupted protobufs
  • incorrect scripts

Add integration with Redis data structures

Two non-mutually-exclusive possibilities:

  • AI.TENSORSET key FROM otherkey, with possible other options to guide the conversion
  • Have input and output keys to AI.MODELRUN and AI.SCRIPTRUN as streams (thanks to @antirez for the idea), to enqueue requests and read corresponding outputs using the stream index

The RDB file contains module data for the module type 'DL_TENSOR', that the responsible module is not able to load

RedisAI

Steps:

  1. Create a database without replication
  2. Put some AI keys into the database
  3. Enable replication
    Slave will fail to sync from master:
20664:S 22 Feb 2019 11:45:42.219 * Connecting to MASTER 172.31.1.242:25727
20664:S 22 Feb 2019 11:45:42.219 * MASTER <-> REPLICA sync started
20664:S 22 Feb 2019 11:45:42.220 * Non blocking connect for SYNC fired the event.
20664:S 22 Feb 2019 11:45:42.221 * Master replied to PING, replication can continue...
20664:S 22 Feb 2019 11:45:42.222 * Partial resynchronization not possible (no cached master)
20664:S 22 Feb 2019 11:45:48.364 * Full resync from master: cdbb5bc4e7c49589646c974370c0a49c58d1c665:14
20664:S 22 Feb 2019 11:45:48.365 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to parser
20664:S 22 Feb 2019 11:45:48.365 * MASTER <-> REPLICA sync: Flushing old data
20664:S 22 Feb 2019 11:45:48.365 * MASTER <-> REPLICA sync: Loading DB in memory
20664:S 22 Feb 2019 11:45:48.366 # The RDB file contains module data for the module type 'DL_TENSOR', that the responsible module is not able to load. Check for modules log above for additional clues.
20664:S 22 Feb 2019 11:45:48.366 # Short read error when parsing object
20664:S 22 Feb 2019 11:45:48.366 # Failed trying to load the MASTER synchronization DB from disk

Add synchronous MODELRUN execution for AOF loading, MULTI, Lua

AOF loading does not support commands that block the client.

In AI.MODELRUN we should detect MULTI and Lua contexts (AOF loading falls into MULTI) and in that case avoid the queue and just execute synchronously.

More generally, the use of MULTI for long-running jobs like executing models should be discouraged. Transactional guarantees are tricky in the presence of asynchronous, long-running commands.
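
For illustration, this is the kind of transaction that hits the blocking-client limitation and would therefore have to execute synchronously (keys and model name reuse the examples elsewhere on this page):

MULTI
AI.TENSORSET a FLOAT 2 VALUES 2 3
AI.MODELRUN mymodel INPUTS a b OUTPUTS c
EXEC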

make run crashed on Ubuntu

I'm getting the following error when trying to run

make run
=== REDIS BUG REPORT START: Cut & paste starting from here ===
20321:M 13 Sep 2018 12:18:05.167 # Redis 999.999.999 crashed by signal: 11
20321:M 13 Sep 2018 12:18:05.167 # Crashed running the instruction at: 0x7f11649ae244
20321:M 13 Sep 2018 12:18:05.167 # Accessing address: 0x8058
20321:M 13 Sep 2018 12:18:05.167 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/lib/x86_64-linux-gnu/libpthread.so.0(__pthread_mutex_trylock+0x14)[0x7f11649ae244]

Backtrace:
../install//redis-server *:6379(logStackTrace+0x5a)[0x562915eb1caa]
../install//redis-server *:6379(sigsegvHandler+0xb1)[0x562915eb2461]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f11649b6890]
/lib/x86_64-linux-gnu/libpthread.so.0(__pthread_mutex_trylock+0x14)[0x7f11649ae244]
../install//redis-server *:6379(je_base_alloc+0x51)[0x562915f2dfe1]
../install/libtensorflow_framework.so(+0x3db2b8)[0x7f115db3a2b8]
../install/libtensorflow_framework.so(+0x3e1cbc)[0x7f115db40cbc]
/lib64/ld-linux-x86-64.so.2(+0x10733)[0x7f116537d733]
/lib64/ld-linux-x86-64.so.2(+0x151ff)[0x7f11653821ff]
/lib/x86_64-linux-gnu/libc.so.6(_dl_catch_exception+0x6f)[0x7f116471a2df]
/lib64/ld-linux-x86-64.so.2(+0x147ca)[0x7f11653817ca]
/lib/x86_64-linux-gnu/libdl.so.2(+0xf96)[0x7f1164dcbf96]
/lib/x86_64-linux-gnu/libc.so.6(_dl_catch_exception+0x6f)[0x7f116471a2df]
/lib/x86_64-linux-gnu/libc.so.6(_dl_catch_error+0x2f)[0x7f116471a36f]
/lib/x86_64-linux-gnu/libdl.so.2(+0x1735)[0x7f1164dcc735]
/lib/x86_64-linux-gnu/libdl.so.2(dlopen+0x71)[0x7f1164dcc051]
../install//redis-server *:6379(moduleLoad+0x4a)[0x562915ee0a3a]
../install//redis-server *:6379(moduleLoadFromQueue+0x43)[0x562915ee0bb3]
../install//redis-server *:6379(main+0x470)[0x562915e62bc0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f11645d4b97]
../install//redis-server *:6379(_start+0x2a)[0x562915e62e9a]

Add `DAGRUN` (or `PIPERUN`) command

A common pattern is enqueuing multiple SCRIPTRUN and MODELRUN commands. Setting input tensors, storing intermediates of *RUN commands and storing outputs all in keyspace means that all written keys will be AOF'd and replicated even if they will be disposed of shortly after.

DAGRUN

A general solution to this problem is to have a DAGRUN (or PIPERUN, we'll see about the name - it is a DAG rather than a pipe, so it should probably be named correctly) command that allows running a sequence of AI.* commands, e.g.:

AI.DAGRUN TENSORSET ~a~ FLOAT 2 VALUES 2 3 => \
          TENSORSET ~b~ FLOAT 2 VALUES 2 3 => \
          MODELRUN foo INPUTS ~a~ ~b~ OUTPUTS ~c~ => \
          SCRIPTRUN bar baz INPUTS ~c~ OUTPUTS d

Note the key names between ~ ~: these are volatile keys (~a~ is volatile because it has small wings :-) ). A volatile key is not set into the keyspace; it just lives in memory for the duration of the command, after which it is deallocated. In the command above, ~a~, ~b~ and ~c~ are volatile, so they don't touch the keyspace (and are not replicated). Only the output key d is stored in the keyspace, for later retrieval.

Relationship to MODELRUN that returns results

This design supersedes #85, which proposed a MODELRUN variant that returns the output. This can now be obtained by:

AI.DAGRUN MODELRUN foo INPUTS a b OUTPUTS ~c~ => TENSORGET ~c~

or, without touching keyspace at all

AI.DAGRUN TENSORSET ~a~ FLOAT 2 VALUES 2 3 => \
          TENSORSET ~b~ FLOAT 2 VALUES 2 3 => \
          MODELRUN foo INPUTS ~a~ ~b~ OUTPUTS ~c~ => \
          TENSORGET ~c~

Advantages

Apart from the obvious convenience, there are several advantages to this design:

  • once a DAGRUN command is sent, we can apply DAG optimization strategies; for instance, in the case of ensembles, we can execute different MODELRUN subcalls in parallel (this was not possible before, because each call was blocking on the client) and then join on the results to execute a further SCRIPTRUN that computes the ensembled outputs;
  • when a DAGRUN call executes or fails, all volatile keys are deallocated at once; volatile keys are never seen by other clients, they only exist in the context of the call; therefore, we just need a small hash object in the call context that gets naturally deallocated

Additional commands

  • DAGRUNRO: read-only variant that is amenable to execution on replicas; if an attempt is made to write a non-volatile key, it errors out (see the sketch after this list)
  • DAGRUNASYNC: fully async variant that just returns OK or, probably better, an id (for eventually querying the status of the run or cancelling it, as future commands). The user can then listen to keyspace notifications on the output keys (or check that the key has been written, or, in the future, query the status of the computation). This is relevant for use in web services in which handlers shouldn't block.
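
For instance, a read-only DAG that could safely run on a replica, since only volatile keys are written (a sketch in the proposed syntax):

AI.DAGRUNRO TENSORSET ~a~ FLOAT 2 VALUES 2 3 => \
            TENSORSET ~b~ FLOAT 2 VALUES 2 3 => \
            MODELRUN foo INPUTS ~a~ ~b~ OUTPUTS ~c~ => \
            TENSORGET ~c~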

Crash on MODELRUN with pt-minimal.pt

The Redis server crashes on MODELRUN with the pt-minimal.pt model from the example. I am able to run the same model with PyTorch from a Python terminal. I am using Ubuntu 18.04.2. Also, I have the issue only with the libtorch backend. The TensorFlow backend with graph.pb ran successfully.

> redis-cli -x AI.MODELSET mymodel TORCH CPU < pt-minimal.pt
OK
> redis-cli
127.0.0.1:6379> AI.TENSORSET b FLOAT 2 VALUES 2 3
OK
127.0.0.1:6379> AI.TENSORSET a FLOAT 2 VALUES 2 3
OK
127.0.0.1:6379> AI.MODELRUN mymodel INPUTS a b OUTPUTS c
Could not connect to Redis at 127.0.0.1:6379: Connection refused not connected>

Installation failed on both Mac OS and Ubuntu 16.04

I haven't tried debugging further, but right off the bat I'm seeing the following issues on macOS and Ubuntu 16.04.

Any help would be appreciated. Preparing for a demo with the Redis SF folks at the end of this month here: https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/254846318/

Thanks!!

MacOS:

Hint: It's a good idea to run 'make test' ;)

Downloading libtensorflow
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 21.5M  100 21.5M    0     0  14.2M      0  0:00:01  0:00:01 --:--:-- 14.2M
dyld: Library not loaded: @rpath/libtensorflow.so
  Referenced from: /Users/cfregly/workspace-fluxcapacitor/PipelineAI/RedisTF/test/./a.out
  Reason: image not found
get_deps.sh: line 32: 15563 Abort trap: 6           ./a.out
Done

Linux:

Hint: It's a good idea to run 'make test' ;)

make[1]: Leaving directory '/root/RedisTF/deps/redis/src'
Downloading libtensorflow
get_deps.sh: 14: get_deps.sh: [[: not found
get_deps.sh: 16: get_deps.sh: [[: not found
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   226  100   226    0     0   1170      0 --:--:-- --:--:-- --:--:--  1170

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
get_deps.sh: 29: get_deps.sh: [[: not found
tf_api_test.c:2:32: fatal error: tensorflow/c/c_api.h: No such file or directory
compilation terminated.
Done

Create a queue per device, both for models and script

We should allow models and scripts to be placed on the available devices (CPU, GPU0, GPU1, etc) and execute concurrently.

To do so, instead of a single execution queue, we should have as many queues as we have devices.
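
For example, the same graph set on two devices could then run concurrently, each served by its own queue (the GPU:0/GPU:1 device naming is illustrative of the proposal):

redis-cli -x AI.MODELSET mymodel_gpu0 TF GPU:0 INPUTS input OUTPUTS output < graph.pb
redis-cli -x AI.MODELSET mymodel_gpu1 TF GPU:1 INPUTS input OUTPUTS output < graph.pb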

Add CONFIG command for configuring/loading backends

Supporting multiple backends (and multiple versions of a backend, e.g. CPU-only, CUDA etc) will have to happen on a dynamic configuration basis, using a CONFIG command.

For instance, if we're not using TF, there's no point in loading the libraries or the CUDA kernels.

One tricky part could be unloading, so for the first iteration we wouldn't go back after a load.

TorchScript support could be loaded by default, but we'll need to decide what happens if the default is CPU-only and we then request CUDA libtorch support.

So at first we would require a CONFIG command even for basic script support (i.e. no backend would be loaded by default).
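
A possible shape for such a command, consistent with the proposal (command name, backend identifiers and paths are illustrative):

AI.CONFIG LOADBACKEND TF /path/to/redisai_tensorflow.so
AI.CONFIG LOADBACKEND TORCH /path/to/redisai_torch.so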

Crash on $REDIS_CLI -x AI.SET GRAPH yolo TF GPU < tiny-yolo-voc.pb

Running load_yolo.sh on macOS Mojave with TensorFlow 1.12, but encountering "ERR failed creating the graph".

./load_yolo.sh 
SET GRAPH
(error) ERR failed creating the graph
SET TENSOR
(error) ERR data length does not match tensor shape and type
GET TENSOR
(error) WRONGTYPE Operation against a key holding the wrong kind of value
RUN GRAPH
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
GET OUTPUT TENSOR
(error) WRONGTYPE Operation against a key holding the wrong kind of value

There does not seem to be a tf-minimal.py equivalent for tiny-yolo-voc.pb.

Also, I am having trouble finding a stack trace or logs from redis-cli.

"did you mean 'libtensorflow_cc.so'?" when running build_deps.sh

ERROR: Skipping '//tensorflow:libtensorflow_c.so': no such target '//tensorflow:libtensorflow_c.so': target 'libtensorflow_c.so' not declared in package 'tensorflow' (did you mean 'libtensorflow_cc.so'?) defined by /home/guy/redislabsmodules/RedisTF/deps/tensorflow/tensorflow/BUILD

Add AI.INFO command

We should write the equivalent of the Redis INFO command, to make it easy to write plugins for Prometheus, Telegraf, etc. for monitoring (time to compute, number of requests, memory, state of CUDA allocations, etc.).
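
A sketch of what the output might include for a model key (field names and values are illustrative, not a committed format):

127.0.0.1:6379> AI.INFO mymodel
 1) "key"
 2) "mymodel"
 3) "type"
 4) "MODEL"
 5) "backend"
 6) "TF"
 7) "device"
 8) "CPU"
 9) "duration"
10) (integer) 431
11) "calls"
12) (integer) 10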

Refactor RUN SET GET commands

The suggested idea is to concentrate all the backend-specific stuff (NAMES included) in the AI.SET GRAPH command.
This way, a client that calls RUN on a TensorFlow model can have the model swapped underneath for a PyTorch model without having to update its code: the model can be updated by calling AI.SET, and all the AI.RUN commands remain unmodified.

RUN will be defined as:
AI.RUN GRAPH key INPUTS key1 key2 OUTPUTS key2 key2
As for keeping GRAPH or SCRIPT in there, we could drop it, because we can know the underlying type at the key.
However, the rest of the API has GRAPH and SCRIPT set explicitly.
Also, the SCRIPT signature has the name of the function as an extra argument, so technically it would be better to have the subcommand set explicitly.

AI.SET GRAPH key backend device [INPUTS name1 name2] [OUTPUTS name1 name2] BLOB graphBlob
AI.SET TENSOR tensor_key data_type DIMS dim1..dimN (BLOB data | VALUES ..)
AI.GET TENSOR source VALUES

1) INT32
2) (integer) 3           # remove this
3) 1) (integer) 2
   2) (integer) 2
   3) (integer) 1
4) (integer) 16         # remove this
5) 1) (integer) 721487365
   2) (integer) 168643407
   3) (integer) 658688
   4) (integer) 32554

Crash on AI.SCRIPTSET

redis-cli AI.SCRIPTSET ket CPU "return 1"
ERR: expected def but found 'return' here:
return 1
~~~~~~ <--- HERE

*** Error in `redis-server *:6379': free(): invalid pointer: 0x00007f7dfd03e410 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7f7dfd496bfb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7f7dfd49cfc6]
/lib/x86_64-linux-gnu/libc.so.6(+0x7780e)[0x7f7dfd49d80e]
/usr/lib/redis/modules/redisai.so(RAI_ClearError+0x29)[0x7f7dfb1d4fc6]
/usr/lib/redis/modules/redisai.so(RedisAI_ScriptSet_RedisCommand+0x16a)[0x7f7dfb1d197c]
redis-server *:6379(RedisModuleCommandDispatcher+0x43)[0x55dbd52e0b93]
redis-server *:6379(call+0xa7)[0x55dbd52715a7]
redis-server *:6379(processCommand+0x35f)[0x55dbd5271c5f]
redis-server *:6379(processInputBuffer+0x185)[0x55dbd5281d15]
redis-server *:6379(aeProcessEvents+0x2a0)[0x55dbd526b770]
redis-server *:6379(aeMain+0x2b)[0x55dbd526ba0b]
redis-server *:6379(main+0x4d3)[0x55dbd5268703]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f7dfd4462e1]
redis-server *:6379(_start+0x2a)[0x55dbd526896a]

Error on input with more than one dimension

127.0.0.1:6379> TF.TENSOR gg3 FLOAT 3 1 2 2 VALUES 2 2 2 2
(error) ERR wrong number of arguments for 'TF.TENSOR' command

It seems the bug is in

398:  if (hasdata && datafmt == REDISTF_DATA_VALUES && argc != len + 6) {

Should be:

  if (hasdata && datafmt == REDISTF_DATA_VALUES && argc != len + 5 + ndims) {

[bug] RedisTF > models > tf-minimal.py

before

with tf.Session() as sess:
    a = tf.Variable(tf.convert_to_tensor(5, dtype=tf.uint8), name='a')
    b = tf.Variable(tf.convert_to_tensor(6, dtype=tf.uint8), name='b')
 
    c = tf.mul(a, b, name="c")

after (tf.mul was removed in TensorFlow 1.0; tf.multiply is its replacement)

with tf.Session() as sess:
    a = tf.Variable(tf.convert_to_tensor(5, dtype=tf.uint8), name='a')
    b = tf.Variable(tf.convert_to_tensor(6, dtype=tf.uint8), name='b')
 
    c = tf.multiply(a, b, name="c")

Please create a new branch -> TensorFlow r1.2

The code in the current master branch is only compatible with TensorFlow versions below 1.0.
Could you please allow me to try porting the contents of the tf_r1.2 branch, currently being written, to TensorFlow 1.0 or a later version?

Crash on values with more than one dimension

127.0.0.1:6379> TF.TENSOR gg3 FLOAT 3 1 2 2 VALUES 2 2 2 2
Error: Server closed the connection
=== REDIS BUG REPORT START: Cut & paste starting from here ===
9649:M 04 Oct 2018 18:49:35.029 # Redis 999.999.999 crashed by signal: 11
9649:M 04 Oct 2018 18:49:35.029 # Crashed running the instruction at: 0x7f4ceabf6f9a
9649:M 04 Oct 2018 18:49:35.029 # Accessing address: (nil)
9649:M 04 Oct 2018 18:49:35.029 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/home/guy/redislabsmodules/RedisTF/src/redistf.so(RedisTF_SetValueFromDouble+0x4c)[0x7f4ceabf6f9a]

Backtrace:
/home/guy/redislabsmodules/redis/src/redis-server *:6379(logStackTrace+0x5a)[0x557179345d9a]
/home/guy/redislabsmodules/redis/src/redis-server *:6379(sigsegvHandler+0xb1)[0x557179346551]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f4ced3fc890]
/home/guy/redislabsmodules/RedisTF/src/redistf.so(RedisTF_SetValueFromDouble+0x4c)[0x7f4ceabf6f9a]

Crash on TF.GRAPH foo graph.pb

Using graph sample - https://github.com/tensorflow/models/blob/master/samples/languages/java/training/model/graph.pb

redis-cli -x TF.GRAPH foo < graph.pb

causes

22990:M 07 Sep 2018 03:36:17.123 # Redis 999.999.999 crashed by signal: 11
22990:M 07 Sep 2018 03:36:17.123 # Crashed running the instruction at: 0x7f9b2ab313a5
22990:M 07 Sep 2018 03:36:17.123 # Accessing address: 0x68
22990:M 07 Sep 2018 03:36:17.123 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/usr/local/lib/libtensorflow_framework.so(_ZN10tensorflow5Graph12AllocateNodeESt10shared_ptrINS_14NodePropertiesEEPKNS_4NodeE+0x55)[0x7f9b2ab313a5]

Backtrace:
/home/guy/redislabsmodules/redis/src/redis-server *:6379(logStackTrace+0x5a)[0x5591347721da]
/home/guy/redislabsmodules/redis/src/redis-server *:6379(sigsegvHandler+0xb1)[0x559134772991]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f9b3153a890]
/usr/local/lib/libtensorflow_framework.so(_ZN10tensorflow5Graph12AllocateNodeESt10shared_ptrINS_14NodePropertiesEEPKNS_4NodeE+0x55)[0x7f9b2ab313a5]
/usr/local/lib/libtensorflow_framework.so(_ZN10tensorflow5Graph7AddNodeERKNS_7NodeDefEPNS_6StatusE+0x365)[0x7f9b2ab318d5]
/usr/local/lib/libtensorflow_framework.so(_ZN10tensorflow5GraphC1EPKNS_19OpRegistryInterfaceE+0x323)[0x7f9b2ab33a93]
/usr/local/lib/libtensorflow.so(_ZN8TF_GraphC2Ev+0x23)[0x7f9b2b8914c3]
/usr/local/lib/libtensorflow.so(TF_NewGraph+0x1e)[0x7f9b2b8915ee]
/home/guy/redislabsmodules/RedisTF/src/redistf.so(RedisTF_Graph_RedisCommand+0x119)[0x7f9b2edf970f]

"python tf-minimal.py" not working

$ python tf-minimal.py 
Traceback (most recent call last):
  File "tf-minimal.py", line 1, in <module>
    import tensorflow as tf
ImportError: No module named tensorflow

Create 0.1.0 branch

This will be useful to support testing and backporting changes, while master moves on towards 0.2.0.
