redisai / redisai Goto Github PK

View Code? Open in Web Editor NEW

815.0 29.0 107.0 7.88 MB

A Redis module for serving tensors and executing deep learning graphs

Home Page: https://redisai.io

License: Other

Shell 2.13% C 62.44% C++ 5.74% Python 27.61% CMake 1.20% Makefile 0.88%

redisai pytorch tensorflow onnxruntime serving-tensors machine-learning

redisai's Introduction

Caution

RedisAI is no longer actively maintained or supported.

We are grateful to the RedisAI community for their interest and support.

RedisAI

RedisAI is a Redis module for executing Deep Learning/Machine Learning models and managing their data. Its purpose is being a "workhorse" for model serving, by providing out-of-the-box support for popular DL/ML frameworks and unparalleled performance. RedisAI both maximizes computation throughput and reduces latency by adhering to the principle of data locality, as well as simplifies the deployment and serving of graphs by leveraging on Redis' production-proven infrastructure.

To read RedisAI docs, visit redisai.io. To see RedisAI in action, visit the demos page.

Quickstart

RedisAI is a Redis module. To run it you'll need a Redis server (v6.0.0 or greater), the module's shared library, and its dependencies.

The following sections describe how to get started with RedisAI.

Docker

The quickest way to try RedisAI is by launching its official Docker container images.

On a CPU only machine

docker run -p 6379:6379 redislabs/redisai:1.2.7-cpu-bionic

On a GPU machine

For GPU support you will need a machine you'll need a machine that has Nvidia driver (CUDA 11.3 and cuDNN 8.1), nvidia-container-toolkit and Docker 19.03+ installed. For detailed information, checkout nvidia-docker documentation

docker run -p 6379:6379 --gpus all -it --rm redislabs/redisai:1.2.7-gpu-bionic

Building

You can compile and build the module from its source code. The Developer page has more information about the design and implementation of the RedisAI module and how to contribute.

Prerequisites

Packages: git, python3, make, wget, g++/clang, & unzip
CMake 3.0 or higher needs to be installed.
CUDA 11.3 and cuDNN 8.1 or higher needs to be installed if GPU support is required.
Redis v6.0.0 or greater.

Get the Source Code

You can obtain the module's source code by cloning the project's repository using git like so:

git clone --recursive https://github.com/RedisAI/RedisAI

Switch to the project's directory with:

cd RedisAI

Building the Dependencies

Use the following script to download and build the libraries of the various RedisAI backends (TensorFlow, PyTorch, ONNXRuntime) for CPU only:

bash get_deps.sh

Alternatively, you can run the following to fetch the backends with GPU support.

bash get_deps.sh gpu

Building the Module

Once the dependencies have been built, you can build the RedisAI module with:

make -C opt clean ALL=1
make -C opt

Alternatively, run the following to build RedisAI with GPU support:

make -C opt clean ALL=1
make -C opt GPU=1

Backend Dependancy

RedisAI currently supports PyTorch (libtorch), Tensorflow (libtensorflow), TensorFlow Lite, and ONNXRuntime as backends. This section shows the version map between RedisAI and supported backends. This extremely important since the serialization mechanism of one version might not match with another. For making sure your model will work with a given RedisAI version, check with the backend documentation about incompatible features between the version of your backend and the version RedisAI is built with.

RedisAI	PyTorch	TensorFlow	TFLite	ONNXRuntime
1.0.3	1.5.0	1.15.0	2.0.0	1.2.0
1.2.7	1.11.0	2.8.0	2.0.0	1.11.1
master	1.11.0	2.8.0	2.0.0	1.11.1

Note: Keras and TensorFlow 2.x are supported through graph freezing. See this script to see how to export a frozen graph from Keras and TensorFlow 2.x.

Loading the Module

To load the module upon starting the Redis server, simply use the --loadmodule command line switch, the loadmodule configuration directive or the Redis MODULE LOAD command with the path to module's library.

For example, to load the module from the project's path with a server command line switch use the following:

redis-server --loadmodule ./install-cpu/redisai.so

Give it a try

Once loaded, you can interact with RedisAI using redis-cli. Basic information and examples for using the module is described here.

Client libraries

Some languages already have client libraries that provide support for RedisAI's commands. The following table lists the known ones:

Project	Language	License	Author	URL
JRedisAI	Java	BSD-3	RedisLabs	Github
redisai-py	Python	BSD-3	RedisLabs	Github
redisai-go	Go	BSD-3	RedisLabs	Github
redisai-js	Typescript/Javascript	BSD-3	RedisLabs	Github
redis-modules-sdk	TypeScript	BSD-3-Clause	Dani Tseitlin	Github
redis-modules-java	Java	Apache-2.0	dengliming	Github
smartredis	C++	BSD-2-Clause	Cray Labs	Github
smartredis	C	BSD-2-Clause	Cray Labs	Github
smartredis	Fortran	BSD-2-Clause	Cray Labs	Github
smartredis	Python	BSD-2-Clause	Cray Labs	Github

The full documentation for RedisAI's API can be found at the Commands page.

Documentation

Read the docs at redisai.io.

Contact Us

If you have questions, want to provide feedback or perhaps report an issue or contribute some code, here's where we're listening to you:

License

RedisAI is licensed under your choice of the Redis Source Available License 2.0 (RSALv2) or the Server Side Public License v1 (SSPLv1).

redisai's People

Contributors

Stargazers

Watchers

Forkers

poseidon1214 mindis itamarhaber gkorland leobravorain meirshpilraien stockholmux k-jo tladd parekhabhishekn hhsecond codeaudit rlizzo rafie joecianflone zilbermanor ccf19881030 nlgrf sonfire186 endymecy jonaskuiler zofuthan filipecosta90 andresrinivasan vikash-kumar-1 tatsuiman june9666 gavinljj m1losz lanwan harriteja itterrefcm2791 roeelupo diotro xiaming9880 robbiejvmw iask stjordanis dudaka raymondzhouyang yypurdi pynchmeister f168 kiminh rlnbharadwaj apsis-inc germangh shobhit-agarwal guyroyse jimmyjimmy94 collabnix keithsu0215 aindrila2412 mmmika nishantgeorge koleini imaginary-person mludv shooterit dengliming jenhaoyang tomerhekmati spartee wangxianghust hakim-ou chiaxinliang nvhuong viettel-solutions beamsies ricardozv opendocs-editor matipan hj-j11 jaclit adofsauron ttvuo ashao jordanresearch vedant-gandhi shibinmak annacode747 reptilefury lgtm-migrator chayim justcherie songnous tanguofu al-rigazzi elm8116 greydoubt sandy4321 jamestiotio weedge xiaozhi308 wh-gd spetrescu yueyedeai saeid-a hellobrobro bpd1069

redisai's Issues

Valgrind memory leak test

Most of leaks are within corner cases ( disconnects, bad input, etc... ).

get_deps.sh should by default clone a specific stable versions of Redis

git clone https://github.com/antirez/redis.git

Add variant of MODELRUN that returns outputs instead of storing them at keys

This variant could be triggered by simply omitting the OUTPUTS arguments in the command.

Replicate results instead of commands in `MODELRUN`

Instead of replicating MODELRUN verbatim, we should just replicate the result.

This amounts to using RedisModule_Replicate and sending AI.TENSORSET instead of AI.MODELRUN, with the serialized output tensors as arguments.

This needs to happen in RedisAI_Run_Reply, once the computation has finished and the client has been unblocked and the response is being sent (since that is the first opportunity to have the outputs available in the main thread)

https://github.com/RedisAI/RedisAI/blob/master/src/redisai.c#L637

Quick start guide is incomplete

I understand it's under development. Certain things I found good-to-have in the document is mentioning about the command line arguments to get_deps.sh, pre-requisites for get_deps.sh to work such as - unzip, cmake, g++, redis server, GPU/CPU etc. Perhaps our shell script could give a warning before it starts the execution. I could give a PR, if these are not written yet

Implement serialization and replication

So far we didn't really take care of serialization and replication for RedisAI data types.
It should be straightforward, with one caveat: in MODELSET we take the protobuf, deserialize it and throw it away (we only keep the model in memory, not the protobuf as it would be wasteful).
However, we will need to serialize the protobuf as-is (pre deserialization) for persistence and replication within the call to ModelSet.

Related to #63.

Crash on AI.SCRIPTSET

redis-cli AI.SCRIPTSET ket CPU "a"

ERR: expected def but found 'ident' here:
a
~ <--- HERE

*** Error in `redis-server *:6379': free(): invalid pointer: 0x00007f555de37310 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7f555e2a2bfb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7f555e2a8fc6]
/lib/x86_64-linux-gnu/libc.so.6(+0x7780e)[0x7f555e2a980e]
/usr/lib/redis/modules/redisai.so(RAI_ClearError+0x29)[0x7f555bfd4fc6]
/usr/lib/redis/modules/redisai.so(RedisAI_ScriptSet_RedisCommand+0x16a)[0x7f555bfd197c]
redis-server *:6379(RedisModuleCommandDispatcher+0x43)[0x56123cb02b93]
redis-server *:6379(call+0xa7)[0x56123ca935a7]
redis-server *:6379(processCommand+0x35f)[0x56123ca93c5f]
redis-server *:6379(processInputBuffer+0x185)[0x56123caa3d15]
redis-server *:6379(aeProcessEvents+0x2a0)[0x56123ca8d770]
redis-server *:6379(aeMain+0x2b)[0x56123ca8da0b]
redis-server *:6379(main+0x4d3)[0x56123ca8a703]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f555e2522e1]
redis-server *:6379(_start+0x2a)[0x56123ca8a96a]

TENSORGET returns only zeros on INT32

127.0.0.1:6379> AI.TensorSET source FLOAT 2 2 VALUES 1 2 3 4
OK
127.0.0.1:6379>  AI.TENSORGET source VALUES
1) FLOAT
2) 1) (integer) 2
   2) (integer) 2
3) 1) "1"
   2) "2"
   3) "3"
   4) "4"

127.0.0.1:6379> AI.TensorSET source INT32 2 2 VALUES 1 2 3 4
OK
127.0.0.1:6379>  AI.TENSORGET source VALUES
1) INT32
2) 1) (integer) 2
   2) (integer) 2
3) 1) (integer) 0
   2) (integer) 0
   3) (integer) 0
   4) (integer) 0

Crash on AI.MODELSET

Running the following:

redis-cli AI.MODELSET resnet18 TF CPU INPUTS input OUTPUT target < graph.pb

Causes:

ERR: Invalid GraphDef
*** Error in `redis-server *:6379': free(): invalid pointer: 0x00007f149b228b58 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7f149b852bfb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7f149b858fc6]
/lib/x86_64-linux-gnu/libc.so.6(+0x7780e)[0x7f149b85980e]
/usr/lib/redis/modules/redisai.so(RAI_ClearError+0x29)[0x7f14993d4fc6]
/usr/lib/redis/modules/redisai.so(RedisAI_ModelSet_RedisCommand+0x58b)[0x7f14993d02e0]
redis-server *:6379(RedisModuleCommandDispatcher+0x43)[0x55fc4a029b93]
redis-server *:6379(call+0xa7)[0x55fc49fba5a7]
redis-server *:6379(processCommand+0x35f)[0x55fc49fbac5f]
redis-server *:6379(processInputBuffer+0x185)[0x55fc49fcad15]
redis-server *:6379(aeProcessEvents+0x2a0)[0x55fc49fb4770]
redis-server *:6379(aeMain+0x2b)[0x55fc49fb4a0b]
redis-server *:6379(main+0x4d3)[0x55fc49fb1703]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f149b8022e1]
redis-server *:6379(_start+0x2a)[0x55fc49fb196a]

Use redis memory allocator for each backend

Implement auto-batching

At some point we should introduce automatic batching of run requests. Models, especially on the GPU, run more efficiently when inputs are batched.

One possible use case is that multiple run requests to the same model that are sitting in the queue are batched together and invoked once. This could work:

analyze the queue, see if there are other calls to the same model (with inputs of the same shape) queued up
take a (configurable) number of requests and assemble the inputs tensors into a single tensor along the 0-th dimension
call the model
unpack the 0-th dimension over the output keys for each request
unblock the clients

This would allow requests from multiple clients to the same model to be batched.

A run could be triggered when a) enough requests have been queued up (aka the batch is large enough) OR b) some time has expired.

We could configure this when calling MODELSET, or with a separate command (like MODELCONFIG BATCH). Or both.

Test unhappy paths

We need to thoroughly test (in RLTest) as many unhappy paths as possible, like

incomplete or mis-crafted commands
corrupted protobufs
incorrect scripts

Tensor values shouldn't be returned strings but as numbers

127.0.0.1:6379>  AI.TENSORGET source VALUES
1) FLOAT
2) 1) (integer) 2
   2) (integer) 2
3) 1) "1.1000000238418579"
   2) "2.0999999046325684"
   3) "3.0999999046325684"
   4) "4.0999999046325684"

Add integration with Redis data structures

Two, non mutually exclusive, possibilities:

AI.TENSORSET key FROM otherkey, with eventual other options to guide the conversion
Have input and output keys to AI.MODELRUN and AI.SCRIPTRUN as streams (thanks to @antirez for the idea), to enqueue requests are read corresponding outputs using the stream index

The RDB file contains module data for the module type 'DL_TENSOR', that the responsible module is not able to load

RedisAI

Steps:

Create a database without replication
Put some AI keys to the database
Enable replication
Slave will fail to sync from master:

20664:S 22 Feb 2019 11:45:42.219 * Connecting to MASTER 172.31.1.242:25727
20664:S 22 Feb 2019 11:45:42.219 * MASTER <-> REPLICA sync started
20664:S 22 Feb 2019 11:45:42.220 * Non blocking connect for SYNC fired the event.
20664:S 22 Feb 2019 11:45:42.221 * Master replied to PING, replication can continue...
20664:S 22 Feb 2019 11:45:42.222 * Partial resynchronization not possible (no cached master)
20664:S 22 Feb 2019 11:45:48.364 * Full resync from master: cdbb5bc4e7c49589646c974370c0a49c58d1c665:14
20664:S 22 Feb 2019 11:45:48.365 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to parser
20664:S 22 Feb 2019 11:45:48.365 * MASTER <-> REPLICA sync: Flushing old data
20664:S 22 Feb 2019 11:45:48.365 * MASTER <-> REPLICA sync: Loading DB in memory
20664:S 22 Feb 2019 11:45:48.366 # The RDB file contains module data for the module type 'DL_TENSOR', that the responsible module is not able to load. Check for modules log above for additional clues.
20664:S 22 Feb 2019 11:45:48.366 # Short read error when parsing object
20664:S 22 Feb 2019 11:45:48.366 # Failed trying to load the MASTER synchronization DB from disk

Fix Docker

Add synchronous MODELRUN execution for AOF loading, MULTI, Lua

AOF loading does not support commands that block the client.

In AI.MODELRUN we should detect MULTI and Lua contexts (AOF loading falls into MULTI) and in that case avoid the queue and just execute synchronously.

More in general, the use of MULTI for long-running jobs like executing models should be discouraged. Transactional guarantees are tricky in the presence of asynchronous, long-running commands.

[FR] Decrease crypticness of 'model fail to run' error message

make run crashed on Ubuntu

I'm getting the following error when trying to run

make run

=== REDIS BUG REPORT START: Cut & paste starting from here ===
20321:M 13 Sep 2018 12:18:05.167 # Redis 999.999.999 crashed by signal: 11
20321:M 13 Sep 2018 12:18:05.167 # Crashed running the instruction at: 0x7f11649ae244
20321:M 13 Sep 2018 12:18:05.167 # Accessing address: 0x8058
20321:M 13 Sep 2018 12:18:05.167 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/lib/x86_64-linux-gnu/libpthread.so.0(__pthread_mutex_trylock+0x14)[0x7f11649ae244]

Backtrace:
../install//redis-server *:6379(logStackTrace+0x5a)[0x562915eb1caa]
../install//redis-server *:6379(sigsegvHandler+0xb1)[0x562915eb2461]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f11649b6890]
/lib/x86_64-linux-gnu/libpthread.so.0(__pthread_mutex_trylock+0x14)[0x7f11649ae244]
../install//redis-server *:6379(je_base_alloc+0x51)[0x562915f2dfe1]
../install/libtensorflow_framework.so(+0x3db2b8)[0x7f115db3a2b8]
../install/libtensorflow_framework.so(+0x3e1cbc)[0x7f115db40cbc]
/lib64/ld-linux-x86-64.so.2(+0x10733)[0x7f116537d733]
/lib64/ld-linux-x86-64.so.2(+0x151ff)[0x7f11653821ff]
/lib/x86_64-linux-gnu/libc.so.6(_dl_catch_exception+0x6f)[0x7f116471a2df]
/lib64/ld-linux-x86-64.so.2(+0x147ca)[0x7f11653817ca]
/lib/x86_64-linux-gnu/libdl.so.2(+0xf96)[0x7f1164dcbf96]
/lib/x86_64-linux-gnu/libc.so.6(_dl_catch_exception+0x6f)[0x7f116471a2df]
/lib/x86_64-linux-gnu/libc.so.6(_dl_catch_error+0x2f)[0x7f116471a36f]
/lib/x86_64-linux-gnu/libdl.so.2(+0x1735)[0x7f1164dcc735]
/lib/x86_64-linux-gnu/libdl.so.2(dlopen+0x71)[0x7f1164dcc051]
../install//redis-server *:6379(moduleLoad+0x4a)[0x562915ee0a3a]
../install//redis-server *:6379(moduleLoadFromQueue+0x43)[0x562915ee0bb3]
../install//redis-server *:6379(main+0x470)[0x562915e62bc0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f11645d4b97]
../install//redis-server *:6379(_start+0x2a)[0x562915e62e9a]

Add `DAGRUN` (or `PIPERUN`) command

A common pattern is enqueuing multiple SCRIPTRUN and MODELRUN commands. Setting input tensors, storing intermediates of *RUN commands and storing outputs all in keyspace means that all written keys will be AOF'd and replicated even if they will be disposed of shortly after.

DAGRUN

A general solution to this problem is to have a DAGRUN (or PIPERUN, we'll see about the name - it is a DAG rather than a pipe, so it should probably be named correctly) command that allows to run a sequence of AI.* commands, e.g.:

AI.DAGRUN TENSORSET ~a~ FLOAT 2 VALUES 2 3 => \
          TENSORSET ~b~ FLOAT 2 VALUES 2 3 => \
          MODELRUN foo INPUTS ~a~ ~b~ OUTPUTS ~c~ => \
          SCRIPTRUN bar baz INPUTS ~c~ OUTPUTS d

Note the key names between ~ ~: these are volatile keys (~a~ is volatile because it has small wings :-) ). A volatile key is not set into keyspace, but it just lives in-memory for the duration of the command, after which is deallocated. In the command above, ~a~, ~b~, ~c~ are volatile, so they don't touch keyspace (and are not replicated). Only the output key d is stored in keyspace, for later retrieval.

Relationship to MODELRUN that returns results

This design supersedes #85, which proposed a MODELRUN variant that returns the output. This can be now obtained by:

AI.DAGRUN MODELRUN foo INPUTS a b OUTPUTS ~c~ => TENSORGET ~c~

or, without touching keyspace at all

AI.DAGRUN TENSORSET ~a~ FLOAT 2 VALUES 2 3 => \
          TENSORSET ~b~ FLOAT 2 VALUES 2 3 => \
          MODELRUN foo INPUTS ~a~ ~b~ OUTPUTS ~c~ => \
          TENSORGET ~c~

Advantages

Apart from the obvious convenience, there are several advantages with this design:

once a DAGRUN command is sent, we can apply DAG optimization strategies; for instance, in case of ensambles, we can execute different MODELRUN subcalls in parallel (this was not possible before because each call was blocking on the client) and then join on the results to execute a further SCRIPTRUN that computes the ensembled outputs;
when a DAGRUN call executes or if it fails, all volatile keys are deallocated at once; volatile keys are never seen by other clients, they only exist in the context of the call; therefore, we just need a small hash object in the call context that gets naturally deallocated

Additional commands

DAGRUNRO: read-only variant that is amenable for execution on replicas; if a non volatile key is attempted to be written, it errors out
DAGRUNASYNC: fully async variant, that just returns OK, or, probably better, an id (for eventually querying the status of the run or cancel it, as future commands). The user can then listen to keyspace notifications on the output keys (or check that the key has been written, or in the future query the status of the computation). This is relevant for use in webservices in which handlers shouldn't block.

Crash on MODELRUN with pt-minimal.pt

Redis server is crashing on the MODELRUN with pt-minimal.pt model from example. I am able to run the same model with pytorch from python terminal. I am using ubuntu 18.04.2. Also, I have issue only with libtorch backend. Tensorflow backend with graph.pb ran succesfully.

> redis-cli -x AI.MODELSET mymodel TORCH CPU < pt-minimal.pt
OK
> redis-cli
127.0.0.1:6379> AI.TENSORSET b FLOAT 2 VALUES 2 3
OK
127.0.0.1:6379> AI.TENSORSET a FLOAT 2 VALUES 2 3
OK
127.0.0.1:6379> AI.MODELRUN mymodel INPUTS a b OUTPUTS c
Could not connect to Redis at 127.0.0.1:6379: Connection refused not connected>

Installation failed on both Mac OS and Ubuntu 16.04

I haven't tried debugging further, but right off the bat, i'm seeing the following issues on MacOS and Ubuntu 16.04

Any help would be appreciated. Preparing for a demo with the Redis SF folks at the end of this month here: https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/254846318/

Thanks!!

MacOS:

Hint: It's a good idea to run 'make test' ;)

Downloading libtensorflow
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 21.5M  100 21.5M    0     0  14.2M      0  0:00:01  0:00:01 --:--:-- 14.2M
dyld: Library not loaded: @rpath/libtensorflow.so
  Referenced from: /Users/cfregly/workspace-fluxcapacitor/PipelineAI/RedisTF/test/./a.out
  Reason: image not found
get_deps.sh: line 32: 15563 Abort trap: 6           ./a.out
Done

Linux:

Hint: It's a good idea to run 'make test' ;)

make[1]: Leaving directory '/root/RedisTF/deps/redis/src'
Downloading libtensorflow
get_deps.sh: 14: get_deps.sh: [[: not found
get_deps.sh: 16: get_deps.sh: [[: not found
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   226  100   226    0     0   1170      0 --:--:-- --:--:-- --:--:--  1170

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
get_deps.sh: 29: get_deps.sh: [[: not found
tf_api_test.c:2:32: fatal error: tensorflow/c/c_api.h: No such file or directory
compilation terminated.
Done

Add RLTests for PyTorch backend

GET TENSOR on missing key wrong error

current error

(error) WRONGTYPE Operation against a key holding the wrong kind of value

Should be more about missing key

Unmatched functions declarations between API and actual signatures

Errors are not handled when calling scripts and models via API.

/cc @MeirShpilraien

Create a queue per device, both for models and script

We should allow models and scripts to be placed on the available devices (CPU, GPU0, GPU1, etc) and execute concurrently.

To do so, instead of a single execution queue, we should have as many queues as we have devices.

Publish docker on docker hub

Refactor multi-backend support to use function pointers

See #45 (comment)

Add CONFIG command for configuring/loading backends

Supporting multiple backends (and multiple versions of a backend, e.g. CPU-only, CUDA etc) will have to happen on a dynamic configuration basis, using a CONFIG command.

For instance, if we're not using TF, there's no point in loading the libraries or the CUDA kernels.

One tricky part could be unloading, so for the first iteration we wouldn't go back after a load.

TorchScript support could be loaded by default, but we'll need to decide what happens if the default is CPU-only and we then request CUDA libtorch support.

So at first we would require a CONFIG command even for basic script support (i.e. no backend would be loaded by default).

Add libtorch backend

libtorch is a C++ library behind PyTorch.

It provides running models traced from PyTorch (from Python or the C++ frontend): https://pytorch.org/tutorials/advanced/cpp_export.html

It also provides TorchScript, which is a key feature in RedisAI.

Optimized TF compilation to use local CPU supported instructions

Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA"

Crash on $REDIS_CLI -x AI.SET GRAPH yolo TF GPU < tiny-yolo-voc.pb

Running load_yolo.sh on MacOS Mojave with Tensorflow 1.12 but encounter "ERR failed creating the graph."

./load_yolo.sh 
SET GRAPH
(error) ERR failed creating the graph
SET TENSOR
(error) ERR data length does not match tensor shape and type
GET TENSOR
(error) WRONGTYPE Operation against a key holding the wrong kind of value
RUN GRAPH
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
(error) WRONGTYPE Operation against a key holding the wrong kind of value
GET OUTPUT TENSOR
(error) WRONGTYPE Operation against a key holding the wrong kind of value

There does not seem to be a tf-minimal.py equivalent for tiny-yolo-voc.pb.

Also having trouble finding stack trace or logs from redis-cli.

Add ONNXRuntime backend

For more info on ONNXRuntime see https://github.com/Microsoft/onnxruntime

It has very extensive support for the ONNX standard.
It also supports ONNX-ML, allowing ML models created, say, with sklearn, to be executed in RedisAI.

"did you mean 'libtensorflow_cc.so'?" when running build_deps.sh

ERROR: Skipping '//tensorflow:libtensorflow_c.so': no such target '//tensorflow:libtensorflow_c.so': target 'libtensorflow_c.so' not declared in package 'tensorflow' (did you mean 'libtensorflow_cc.so'?) defined by /home/guy/redislabsmodules/RedisTF/deps/tensorflow/tensorflow/BUILD

Add AI.INFO command

We should write the equivalent of the Redis INFO command, to make it easy to write plugin for Prometheus, Telegraf, etc for monitoring (time to compute, number of requests, memory, state of cuda allocations, etc)

Set module version to 0.1.0

Refactor RUN SET GET commands

The suggested idea is to concentrate all the backend-specific stuff (NAMES included) in the AI.SET GRAPH command;
This way if a client or RedisAI's codebase that calls RUN into a TensorFlow model can change the model underneath into a PyTorch model instead without having to update the code. The model can be updated by calling AI.SET, and all the AI.RUN commands will be unmodified.

RUN will be defined as
AI.RUN GRAPH key INPUTS key1 key2 OUTPUTS key2 key2
As for keeping GRAPH or SCRIPT in there, we could drop it because we can know the underlying type at the key
However, the rest of the API has GRAPH and SCRIPT set explicitly.
Also, the SCRIPT signature has the name of the function as an extra argument, so technically it would be better to have the subcommand set explicitly

AI.SET GRAPH key backend device [INPUTS name1 name2] [OUTPUTS name1 name2] BLOB graphBlob

AI.SET TENSOR tensor_key data_type DIMS dim1..dimN (BLOB data | VALUES ..)

AI.GET TENSOR source VALUES

1) INT32
2) (integer) 3           # remove this
3) 1) (integer) 2
   2) (integer) 2
   3) (integer) 1
4) (integer) 16         # remove this
5) 1) (integer) 721487365
   2) (integer) 168643407
   3) (integer) 658688
   4) (integer) 32554

Crash on AI.SCRIPTSET

redis-cli AI.SCRIPTSET ket CPU "return 1"

ERR: expected def but found 'return' here:
return 1
~~~~~~ <--- HERE

*** Error in `redis-server *:6379': free(): invalid pointer: 0x00007f7dfd03e410 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7f7dfd496bfb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7f7dfd49cfc6]
/lib/x86_64-linux-gnu/libc.so.6(+0x7780e)[0x7f7dfd49d80e]
/usr/lib/redis/modules/redisai.so(RAI_ClearError+0x29)[0x7f7dfb1d4fc6]
/usr/lib/redis/modules/redisai.so(RedisAI_ScriptSet_RedisCommand+0x16a)[0x7f7dfb1d197c]
redis-server *:6379(RedisModuleCommandDispatcher+0x43)[0x55dbd52e0b93]
redis-server *:6379(call+0xa7)[0x55dbd52715a7]
redis-server *:6379(processCommand+0x35f)[0x55dbd5271c5f]
redis-server *:6379(processInputBuffer+0x185)[0x55dbd5281d15]
redis-server *:6379(aeProcessEvents+0x2a0)[0x55dbd526b770]
redis-server *:6379(aeMain+0x2b)[0x55dbd526ba0b]
redis-server *:6379(main+0x4d3)[0x55dbd5268703]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f7dfd4462e1]
redis-server *:6379(_start+0x2a)[0x55dbd526896a]

Optimized TF compilation to use GPU/CUDA

Error on more than on dimension input

127.0.0.1:6379> TF.TENSOR gg3 FLOAT 3 1 2 2 VALUES 2 2 2 2
(error) ERR wrong number of arguments for 'TF.TENSOR' command

It seems the bug is in

398:  if (hasdata && datafmt == REDISTF_DATA_VALUES && argc != len + 6) {

Should be:

  if (hasdata && datafmt == REDISTF_DATA_VALUES && argc != len + 5 + ndims) {

[bug] RedisTF > models > tf-minimal.py

before

with tf.Session() as sess:
    a = tf.Variable(tf.convert_to_tensor(5, dtype=tf.uint8), name='a')
    b = tf.Variable(tf.convert_to_tensor(6, dtype=tf.uint8), name='b')
 
    c = tf.mul(a, b, name="c")

after

with tf.Session() as sess:
    a = tf.Variable(tf.convert_to_tensor(5, dtype=tf.uint8), name='a')
    b = tf.Variable(tf.convert_to_tensor(6, dtype=tf.uint8), name='b')
 
    c = tf.multiply(a, b, name="c")

setting error

https://github.com/lantiga/RedisTF > Testing
bulid_deps.sh file

bazel build -c opt //tensorflow:libtensorflow_c.so

Error is checked in this part.

Please create a new branch. -> tensorFlow r1.2

The codes in the current master branch are compatible with versions below tensorflow 1.0.
Could you please allow me to try to write the contents of tf_r1.2 branch currently being written to tensorflow 1.0 or later version?

Crash on more than single dimension values

127.0.0.1:6379> TF.TENSOR gg3 FLOAT 3 1 2 2 VALUES 2 2 2 2
Error: Server closed the connection

=== REDIS BUG REPORT START: Cut & paste starting from here ===
9649:M 04 Oct 2018 18:49:35.029 # Redis 999.999.999 crashed by signal: 11
9649:M 04 Oct 2018 18:49:35.029 # Crashed running the instruction at: 0x7f4ceabf6f9a
9649:M 04 Oct 2018 18:49:35.029 # Accessing address: (nil)
9649:M 04 Oct 2018 18:49:35.029 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/home/guy/redislabsmodules/RedisTF/src/redistf.so(RedisTF_SetValueFromDouble+0x4c)[0x7f4ceabf6f9a]

Backtrace:
/home/guy/redislabsmodules/redis/src/redis-server *:6379(logStackTrace+0x5a)[0x557179345d9a]
/home/guy/redislabsmodules/redis/src/redis-server *:6379(sigsegvHandler+0xb1)[0x557179346551]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f4ced3fc890]
/home/guy/redislabsmodules/RedisTF/src/redistf.so(RedisTF_SetValueFromDouble+0x4c)[0x7f4ceabf6f9a]

Crash on TF.GRAPH foo graph.pb

Using graph sample - https://github.com/tensorflow/models/blob/master/samples/languages/java/training/model/graph.pb

redis-cli -x TF.GRAPH foo < graph.pb

causes

22990:M 07 Sep 2018 03:36:17.123 # Redis 999.999.999 crashed by signal: 11
22990:M 07 Sep 2018 03:36:17.123 # Crashed running the instruction at: 0x7f9b2ab313a5
22990:M 07 Sep 2018 03:36:17.123 # Accessing address: 0x68
22990:M 07 Sep 2018 03:36:17.123 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/usr/local/lib/libtensorflow_framework.so(_ZN10tensorflow5Graph12AllocateNodeESt10shared_ptrINS_14NodePropertiesEEPKNS_4NodeE+0x55)[0x7f9b2ab313a5]

Backtrace:
/home/guy/redislabsmodules/redis/src/redis-server *:6379(logStackTrace+0x5a)[0x5591347721da]
/home/guy/redislabsmodules/redis/src/redis-server *:6379(sigsegvHandler+0xb1)[0x559134772991]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f9b3153a890]
/usr/local/lib/libtensorflow_framework.so(_ZN10tensorflow5Graph12AllocateNodeESt10shared_ptrINS_14NodePropertiesEEPKNS_4NodeE+0x55)[0x7f9b2ab313a5]
/usr/local/lib/libtensorflow_framework.so(_ZN10tensorflow5Graph7AddNodeERKNS_7NodeDefEPNS_6StatusE+0x365)[0x7f9b2ab318d5]
/usr/local/lib/libtensorflow_framework.so(_ZN10tensorflow5GraphC1EPKNS_19OpRegistryInterfaceE+0x323)[0x7f9b2ab33a93]
/usr/local/lib/libtensorflow.so(_ZN8TF_GraphC2Ev+0x23)[0x7f9b2b8914c3]
/usr/local/lib/libtensorflow.so(TF_NewGraph+0x1e)[0x7f9b2b8915ee]
/home/guy/redislabsmodules/RedisTF/src/redistf.so(RedisTF_Graph_RedisCommand+0x119)[0x7f9b2edf970f]

$ python tf-minimal.py 
Traceback (most recent call last):
  File "tf-minimal.py", line 1, in <module>
    import tensorflow as tf
ImportError: No module named tensorflow

Create 0.1.0 branch

This will be useful to support testing and backport changes, while master moves on towards 0.2.0

redisai / redisai Goto Github PK

redisai's Introduction

RedisAI

Quickstart

Docker

On a CPU only machine

On a GPU machine

Building

Prerequisites

Get the Source Code

Building the Dependencies

Building the Module

Backend Dependancy

Loading the Module

Give it a try

Client libraries

Documentation

Contact Us

License

redisai's People

Contributors

Stargazers

Watchers

Forkers

redisai's Issues

DAGRUN

Relationship to MODELRUN that returns results

Advantages

Additional commands

Recommend Projects

Recommend Topics

Recommend Org