
multi-model-server's Introduction

Multi Model Server


Multi Model Server (MMS) is a flexible and easy-to-use tool for serving deep learning models trained using any ML/DL framework.

Use the MMS Server CLI, or the pre-configured Docker images, to start a service that sets up HTTP endpoints to handle model inference requests.

A quick overview and examples for both serving and packaging are provided below. Detailed documentation and examples are provided in the docs folder.

Join our Slack channel to get in touch with the development team, ask questions, find out what's cooking, and more!

Contents of this Document

Other Relevant Documents

Quick Start

Prerequisites

Before proceeding further with this document, make sure you have the following prerequisites.

  1. Ubuntu, CentOS, or macOS. Windows support is experimental. The following instructions will focus on Linux and macOS only.

  2. Python - Multi Model Server requires Python to run the workers.

  3. pip - pip is the Python package management system.

  4. Java 8 - Multi Model Server requires Java 8 to start. You have the following options for installing Java 8:

    For Ubuntu:

    sudo apt-get install openjdk-8-jre-headless

    For CentOS:

    sudo yum install java-1.8.0-openjdk

    For macOS:

    brew tap homebrew/cask-versions
    brew update
    brew cask install adoptopenjdk8
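
Whichever option you use, you can confirm that a Java 8 runtime is visible on your PATH before moving on (a quick sanity check, not an MMS command):

# Print the active Java version; it should report 1.8.x
java -version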

Installing Multi Model Server with pip

Setup

Step 1: Setup a Virtual Environment

We recommend installing and running Multi Model Server in a virtual environment. It's good practice to install and run all of the Python dependencies in virtual environments; this isolates the dependencies and eases dependency management.

One option is Virtualenv, which is used to create virtual Python environments. You can install it and activate a virtualenv for Python 2.7 as follows:

pip install virtualenv

Then create a virtual environment:

# Assuming we want to run python2.7 in /usr/local/bin/python2.7
virtualenv -p /usr/local/bin/python2.7 /tmp/pyenv2
# Enter this virtual environment as follows
source /tmp/pyenv2/bin/activate

Refer to the Virtualenv documentation for further information.
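
If you are working with Python 3, the built-in venv module is an equivalent option; a minimal sketch (the environment path /tmp/pyenv3 is just an example):

# Create and activate a Python 3 virtual environment using the built-in venv module
python3 -m venv /tmp/pyenv3
source /tmp/pyenv3/bin/activate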

Step 2: Install MXNet

MMS won't install the MXNet engine by default. If it isn't already installed in your virtual environment, you must install one of the MXNet pip packages.

For CPU inference, mxnet-mkl is recommended. Install it as follows:

# Recommended for running Multi Model Server on CPU hosts
pip install mxnet-mkl

For GPU inference, mxnet-cu92mkl is recommended. Install it as follows:

# Recommended for running Multi Model Server on GPU hosts
pip install mxnet-cu92mkl
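
Either way, you can verify that MXNet is importable from the active environment before installing MMS (a quick sanity check):

# Print the installed MXNet version to confirm the package is importable
python -c "import mxnet; print(mxnet.__version__)"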

Step 3: Install or Upgrade MMS as follows:

# Install latest released version of multi-model-server 
pip install multi-model-server

To upgrade from a previous version of multi-model-server, please refer to the migration reference document.
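
If you prefer to upgrade in place with pip (after reviewing the migration notes), the usual command is:

# Upgrade an existing installation to the latest released version
pip install --upgrade multi-model-server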

Notes:

  • A minimal version of model-archiver will be installed with MMS as a dependency. See model-archiver for more options and details.
  • See the advanced installation page for more options and troubleshooting.

Serve a Model

Once installed, you can get the MMS model server up and running very quickly. Try out --help to see all the CLI options available.

multi-model-server --help

For this quick start, we'll skip over most of the features, but be sure to take a look at the full server docs when you're ready.

Here is an easy example for serving an object classification model:

multi-model-server --start --models squeezenet=https://s3.amazonaws.com/model-server/model_archive_1.0/squeezenet_v1.1.mar

With the command above executed, you have MMS running on your host, listening for inference requests. Note that if you specify model(s) during MMS startup, it automatically scales backend workers to the number of available vCPUs (on a CPU instance) or available GPUs (on a GPU instance). On powerful hosts with many compute resources (vCPUs or GPUs), this startup and autoscaling process can take considerable time. To minimize MMS startup time, you can avoid registering and scaling models at startup and instead do it later via the corresponding Management API calls, which also gives you finer-grained control over how many resources are allocated to any particular model.
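
As an illustration of that approach, the sketch below starts MMS with no models and then registers and scales SqueezeNet afterwards. It assumes the default management port (8081) and the register/scale endpoints described in the Management API docs; adjust to your configuration:

# Start the server without registering any models
multi-model-server --start

# Register the SqueezeNet archive without creating any workers yet
curl -X POST "http://127.0.0.1:8081/models?url=https://s3.amazonaws.com/model-server/model_archive_1.0/squeezenet_v1.1.mar&initial_workers=0"

# Scale the model to exactly two workers when you are ready to serve traffic
curl -X PUT "http://127.0.0.1:8081/models/squeezenet?min_worker=2&synchronous=true"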

To test it out, you can open a new terminal window next to the one running MMS. Then you can use curl to download one of these cute pictures of a kitten; curl's -O flag will save it as kitten.jpg for you. Then you will curl a POST to the MMS predict endpoint with the kitten's image.


In the example below, we provide a shortcut for these steps.

curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg
curl -X POST http://127.0.0.1:8080/predictions/squeezenet -T kitten.jpg

The predict endpoint will return a prediction response in JSON. It will look something like the following result:

[
  {
    "probability": 0.8582232594490051,
    "class": "n02124075 Egyptian cat"
  },
  {
    "probability": 0.09159987419843674,
    "class": "n02123045 tabby, tabby cat"
  },
  {
    "probability": 0.0374876894056797,
    "class": "n02123159 tiger cat"
  },
  {
    "probability": 0.006165083032101393,
    "class": "n02128385 leopard, Panthera pardus"
  },
  {
    "probability": 0.0031716004014015198,
    "class": "n02127052 lynx, catamount"
  }
]

You will see this result in the response to your curl call to the predict endpoint, and in the server logs in the terminal window running MMS. The result is also logged locally, along with metrics.
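
While the server is up you can also hit the health endpoint on the same port (assuming the default /ping route):

# Liveness check; a healthy server responds with a status message
curl http://127.0.0.1:8080/ping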

Other models can be downloaded from the model zoo, so try out some of those as well.

Now you've seen how easy it can be to serve a deep learning model with MMS! Would you like to know more?

Stopping the running model server

To stop the current running model-server instance, run the following command:

$ multi-model-server --stop

You will see output confirming that multi-model-server has stopped.

Create a Model Archive

MMS enables you to package all of your model artifacts into a single model archive, which makes it easy to share and deploy your models. To package a model, check out the model archiver documentation.
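
As a rough illustration (not a substitute for the model archiver docs), an invocation typically looks like the following; the model name, path, and handler here are placeholders:

# Package local model artifacts into a .mar archive (illustrative arguments)
model-archiver --model-name my_model --model-path /path/to/model/artifacts --handler my_service:handle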

Recommended production deployments

  • MMS doesn't provide authentication. You have to run your own authentication proxy in front of MMS.
  • MMS doesn't provide request throttling and is therefore vulnerable to DDoS attacks. It is recommended to run MMS behind a firewall.
  • MMS only allows localhost access by default; see Network configuration for details.
  • SSL is not enabled by default; see Enable SSL for details.
  • MMS uses a config.properties file to configure its behavior; see the Manage MMS page for details on how to configure MMS.
  • For better security, we recommend running MMS inside a Docker container. This project includes Dockerfiles to build containers recommended for production deployments; these containers demonstrate how to customize your own production MMS deployment. Basic usage can be found in the Docker readme, and a minimal run command is sketched below.
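
As a minimal sketch of the container route (assuming the published awsdeeplearningteam/multi-model-server CPU image; image names and tags may differ for your deployment):

# Run the prebuilt CPU container, exposing the inference (8080) and management (8081) ports
docker run -itd --name mms -p 8080:8080 -p 8081:8081 awsdeeplearningteam/multi-model-server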

Other Features

Browse over to the Docs readme for the full index of documentation. This includes more examples, how to customize the API service, API endpoint details, and more.

External demos powered by MMS

Here are some example demos of deep learning applications, powered by MMS:

  • Product Review Classification
  • Visual Search
  • Facial Emotion Recognition
  • Neural Style Transfer

Contributing

We welcome all contributions!

To file a bug or request a feature, please file a GitHub issue. Pull requests are welcome.

multi-model-server's People

Contributors

aaronmarkham, abhinavs95, alexgl-github, alexwong, ankkhedia, c2zwdjnlcg, ddavydenko, dhanainme, ericangelokim, frankfliu, goswamig, jamesewoo, jesterhazy, jiajiechen, kevinthesun, knjcode, lupesko, lxning, maaquib, nskool, nswamy, photoszzt, piyushghai, sandeep-krishnamurthy, saravsak, thomasdelteil, vdantu, vrakesh, yuruofeifei, zachgk


multi-model-server's Issues

model export failure - consumes all disk space

Env:
Windows 10 / Conda / Python 2.7

Attempted a caffenet export. Used files from the model zoo. First time failed because I had several symbol files in the same directory, but it made a caffenet.model file anyway. I tried serving this file but it said it was not a zip file. (this is probably a separate bug)
The second time, I ran export after moving the other symbol files away, and then I see the error below and...

It will eat up all available disk space! It made a massive ~3 GB file. Then I cleared some space and now have a 6 GB caffenet.model, but this model should only be about 238 MB.

Repeatable by just dropping a 0-byte x.model file and running export on the x model's params/symbol/signature files. It breaks when there's already an x.model file there.

Note that you will get this broken warning message (which should also be fixed):

  warnings.warn("%s.model already in %s and will be overwritten." % (model_name, model_path))

In the middle of this error:

(dms_p27) C:\Users\Aaron\Source\Repos\dms\examples\models\resnet-18>deep-model-export --model-name resnet-18 --model-path .
c:\users\aaron\appdata\local\conda\conda\envs\dms_p27\lib\site-packages\urllib3\contrib\pyopenssl.py:46: DeprecationWarning: OpenSSL.rand is deprecated - you should use os.urandom instead
  import OpenSSL.SSL
c:\users\aaron\appdata\local\conda\conda\envs\dms_p27\lib\site-packages\dms\export_model.py:90: UserWarning: resnet-18.model already in . and will be overwritten.
  warnings.warn("%s.model already in %s and will be overwritten." % (model_name, model_path))
Traceback (most recent call last):
  File "c:\users\aaron\appdata\local\conda\conda\envs\dms_p27\lib\runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "c:\users\aaron\appdata\local\conda\conda\envs\dms_p27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Users\Aaron\AppData\Local\conda\conda\envs\dms_p27\Scripts\deep-model-export.exe\__main__.py", line 9, in <module>
  File "c:\users\aaron\appdata\local\conda\conda\envs\dms_p27\lib\site-packages\dms\export_model.py", line 214, in export
    _export_model(args)
  File "c:\users\aaron\appdata\local\conda\conda\envs\dms_p27\lib\site-packages\dms\export_model.py", line 94, in _export_model
    zip_file.write(item, os.path.basename(item))
  File "c:\users\aaron\appdata\local\conda\conda\envs\dms_p27\lib\zipfile.py", line 801, in __exit__
    self.close()
  File "c:\users\aaron\appdata\local\conda\conda\envs\dms_p27\lib\zipfile.py", line 1347, in close
    " would require ZIP64 extensions")
zipfile.LargeZipFile: Central directory offset would require ZIP64 extensions

(dms_p27) C:\Users\Aaron\Source\Repos\dms\examples\models\resnet-18>

Failed download not handled

If the initial download fails, a partial file is created. Retrying just gives an error that the file isn't a zip file rather than checking the source, etc. Maybe it should be downloaded as a temp file and then swapped into place? Or check with the server that we have the most recent file (date, size, etc.)?

Model name overlap not supported?

After the initial download, changing the name or URL (anything but the actual model file name) results in no download attempt for the new item. I would expect the model_name= to be the "unique" identifier for another download, or something combined with the URL, since "resnet-18.model" may be a common name.

Provide details on dms_app.config settings

Several questions on this:

  1. When would you change the Gunicorn arguments; for what purpose/effect?
  2. I noticed the config changed from 1 worker to 4, and OMP_NUM_THREADS from 4 to 1. What's up with that? Why? Are they linked such that on a bigger instance I could go to 8 workers with two threads? Why not 4 workers and 4 threads? Or 64 workers and 4 threads?
  3. What is worker-class? What options are there?
  4. What is limit-request-line? What's the max? What impact does this have when changed?

params file inclusion (over/under)

dms requires a prefix-0000.params file.
dme does not.

Maybe if dms requires it, so should dme, or dms should be made to support non-0000 params files.

Example:
When exporting Inception-BN from the model zoo, the checkpoint is Inception-BN-0126.params. The export is successful, but when serving you get an error that it can't find 0000.params.
I copied the file and renamed the copy to 0000; now the export doubles the size of the model file because it includes both. Changing the name to Inception-BN-0126.params.bak doesn't matter - it's still included.
Also, if you already ran the server in that folder, you now have a subfolder with a .params file in it. If you run export with just the params file in the parent, it finds that other params file in the subfolder and adds it too. Very greedy. :)
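
For reference, the workaround described above boils down to something like the following (file names follow the Inception-BN example; adjust for your checkpoint):

# Give the checkpoint the prefix-0000 name the server expects; note that export then bundles both copies
cp Inception-BN-0126.params Inception-BN-0000.params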

So two things here:

  1. If it is so picky about the file being specifically prefix-0000.params at the server CLI, what is going on with the export CLI!?
  2. Why is it that I can just rename the checkpoint to 0000 and eventually get it to serve properly? Why not just pick prefix-####.params and rename it internally?

2.5 (and warn people before all params are rolled up into the .model file or don't let that happen)

Installing on python3 throws import error

Installing collected packages: itsdangerous, click, Flask, flask-cors, deep-model-server
Successfully installed Flask-0.12.2 click-6.7 deep-model-server-0.1 flask-cors-3.0.3 itsdangerous-0.24
[hadoop@ip-10-0-0-121 dms]$ deep-model-export
Traceback (most recent call last):
File "/usr/local/bin/deep-model-export", line 7, in
from mms.export_model import export
File "/usr/local/lib/python3.4/site-packages/mms/export_model.py", line 7, in
from arg_parser import ArgParser
ImportError: No module named 'arg_parser'

Export CLI model params confusing

Current export CLI --model parameters are designed with a multiple key-value pair, e.g. dms --model <model_name>=<model_path> which has a few issues:

  1. It allows multiple models in a single package, which is not supported by DMS
  2. It expects a single JSON+Weights files pair in the model_path, but there can be cases where there are multiple pairs in the target path, and it is not clear which one to package

We can consider a few alternatives:

  1. dms --model <model_prefix_path> --output <model_archive_name>
  2. dms --model <model_archive_name>=<model_prefix_path> and return an error if more than one model_archive_name is specified.

model_prefix_path is the path plus the file name prefix of the JSON and Weights (so file names minus the extensions)
model_archive_name is the generated model archive file name, not including the prefix that gets added by the tool
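
To make alternative 1 concrete, the proposed invocation would look roughly like this (a hypothetical interface sketched from the proposal above, not an implemented command):

# Hypothetical form of alternative 1: explicit prefix path in, explicit archive name out
dms --model ./models/resnet-18/resnet-18 --output resnet-18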

server cannot bind to public IP

Trying to run:

deep-model-server --models squeezenet=https://s3.amazonaws.com/model-server/models/squeezenet_v1.1/squeezenet_v1.1.model --host 34.228.254.167 --port 8080

Gives the error Cannot assign requested address:

[ERROR 2017-11-09 20:50:18,592 PID:6760 /home/ec2-user/anaconda3/envs/mxnet_p27/lib/python2.7/site-packages/dms/deep_model_server.py:start_model_serving:97] Failed to start model serving host: Flask handler failed to start: [Errno 99] Cannot assign requested address

Tried different ports. Tried tinkering with inbound rules, but same error.
Binding to 0.0.0.0 doesn't throw an error, but trying to access the API gives a timeout.

Binding to an internal IP seems to work, but that's inaccessible.

Error in example signature.json

In the readme, I tried to copy and modify the example signature.json for my application. We need to fix the following issues:

  1. Single quotes do not work; I get a JSON parser error. Double quotes work fine.
  2. input => inputs, output => outputs.

arg_parser dependency missing

I assume that the pip install should handle all of the project dependencies, but I'm getting an error that I'm missing arg_parser.

(mxnet3.6) 8c8590217d26:Development markhama$ deep-model-server
Traceback (most recent call last):
  File "/Users/markhama/Development/mxnet3.6/bin/deep-model-server", line 7, in <module>
    from mms.mxnet_model_server import start_serving
  File "/Users/markhama/Development/mxnet3.6/lib/python3.6/site-packages/mms/mxnet_model_server.py", line 1, in <module>
    from arg_parser import ArgParser
ModuleNotFoundError: No module named 'arg_parser'

how do you make sure your signature.json is correct?

When all you have is a symbol.json and a params file?

I think I'm seeing this fail with the caffenet conversion. The model exports just fine - no errors - but it can't be served. I tried messing with the outputs in the signature, and that leads me to believe there's a problem there, but when I try other models like nin or inception, I can just use the resnet-18 signature and there's no problem. I want to make sure, however, that the signature.json is really correct before adding a .model file to the model zoo.

dms error output:

[21:27:10] C:\projects\mxnet-distro-win\mxnet-build\src\nnvm\legacy_json_util.cc:190: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[21:27:10] C:\projects\mxnet-distro-win\mxnet-build\src\nnvm\legacy_json_util.cc:198: Symbol successfully upgraded!
[21:27:10] C:\projects\mxnet-distro-win\mxnet-build\dmlc-core\include\dmlc/logging.h:308: [21:27:10] c:\projects\mxnet-distro-win\mxnet-build\src\operator\tensor\../elemwise_op_common.h:122: Check failed: assign(&dattr, (*vec)[i]) Incompatible attr in node  at 0-th output: expected (4096,9216), got (4096,57600)
E1026 21:27:10 14580 c:\users\aaron\appdata\local\conda\conda\envs\dms_p27\lib\site-packages\dms\mxnet_model_server.py:_arg_process:140] Failed to process arguments: [21:27:10] c:\projects\mxnet-distro-win\mxnet-build\src\operator\tensor\../elemwise_op_common.h:122: Check failed: assign(&dattr, (*vec)[i]) Incompatible attr in node  at 0-th output: expected (4096,9216), got (4096,57600)

docker docs updates needed

  1. Clone the repo to get the files.
  2. Add a mention of the GPU docs below.
  3. Remove the mention that GPU is not supported.
  4. Add a reminder about opening ports.

Usage of relative imports is generally not preferred

In many parts of the code, relative imports are used. Example: from ..log import logger.
Relative imports are generally discouraged. We should revisit this and consider using full paths or absolute_import from __future__.

For example, running unit tests from pytest fails with

from ..log import get_logger
22:07:28 E ValueError: Attempted relative import beyond toplevel package

help output incorrect

When you don't provide the minimum required inputs, you get this response:

usage: mxnet-model-serving

It should say usage: mxnet-model-server or deep-model-server.

Also, when you use the -h flag, it mentions MXNet Model Serving; it should probably say Deep Model Server instead.

swagger instructions needed

The guide tells you to use swagger_client for generated client code, but doesn't tell you how to install it or where to get it.

python export example doesn't work

I tried a couple of variations.
It seems like mms.export_model still exists, but I would have thought it would have been renamed from mms to dms. Even the original example still doesn't work.

>>> import mxnet as mx
>>> from dms.export_model import export_serving
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named dms.export_model
>>> from mms.export_model import export_serving
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name export_serving
>>> 

Update name to "Model Server for Apache MXNet" in code

We've decided to brand the product as "Model Server for Apache MXNet".
More specifically:

Product documentation should use the full name "Model Server for Apache MXNet" or the shorthand "Model Server"
PIP package name will be "mxnet-model-server"
PIP commands will be "mxnet-model-server" and "mxnet-model-export"
Folders and files should be changed accordingly, e.g. the "/dms" folder will be changed to "model_server"

The task is to perform all updates in the code.
Issue #73 is to update the docs.

Server start output

It would be great if, when the server starts, it gave you a valid URL rather than this one, which returns a 404 and makes you think it's broken:
Service started at 127.0.0.1:8080/

Why not:

Service started. 
DMS API Description: http://127.0.0.1:8080/api-description
DMS API Health: http://127.0.0.1:8080/ping

Input shape in signature enforces batch shape and replace with 1 during loading

https://github.com/awslabs/deep-model-server/blob/master/dms/model_service/mxnet_model_service.py#L94

For MXNet Module binding, we assume the 1st dimension of the input shape is the batch dimension and replace it with 1.

For example, if my input is a color image of 512*512 and I specify input shape (3, 512, 512), then at load time MMS changes it to (1, 512, 512), assuming 3 is the batch_size.

So the input shape in the signature is basically the input shape along with the batch size used in training. We need to revisit this part.

Not all logs are written to the log file with the --log-file option.

I started the server with --log-file=/tmp/x.log. Only the metrics and the responses are written to this file; everything else is dumped to the console.
The logs I find in the log file:

Initialized model serving.
Adding endpoint: squeezenet_predict to Flask
Adding endpoint: ping to Flask
Adding endpoint: api-description to Flask
Metric error_number for last 300 seconds is 0.000000
Metric requests_number for last 300 seconds is 0.000000
Metric cpu for last 300 seconds is 0.222000
Metric memory for last 300 seconds is 0.005583
Metric disk for last 300 seconds is 0.955000
Metric overall_latency for last 300 seconds is 0.000000
Metric inference_latency for last 300 seconds is 0.000000
Metric preprocess_latency for last 300 seconds is 0.000000
Service started successfully.
Service description endpoint: 127.0.0.1:8080/api-description
Service health endpoint: 127.0.0.1:8080/ping

  • Running on http://127.0.0.1:8080/ (Press CTRL+C to quit)
    Request input: input0 should be image with jpeg format.
    Getting file data from request.
    Response is text.
    Jsonifying the response: {'prediction': [[{'class': 'n02123394 Persian cat', 'probability': 0.8297115564346313}, {'class': 'n02086079 Pekinese, Pekingese, Peke', 'probability': 0.04721757397055626}, {'class': 'n02098413 Lhasa, Lhasa apso', 'probability': 0.019571054726839066}, {'class': 'n02113624 toy poodle', 'probability': 0.018856260925531387}, {'class': 'n02085936 Maltese dog, Maltese terrier, Maltese', 'probability': 0.016827790066599846}]]}
    127.0.0.1 - - [09/Nov/2017 12:47:05] "POST /squeezenet/predict HTTP/1.1" 200

warning message not populating

Env:
Windows 10 / Conda / Python 2.7

Ran an export successfully:

('Successfully exported %s model. Model file is located at %s.', 'resnet-18', 'C:\\Users\\Aaron\\Source\\Repos\\dms\\examples\\models\\resnet-18\\resnet-18.model')

Unable to predict on large image

I tested with an image of size ~2 MB and it returns this HTML response:

<html>
<head><title>413 Request Entity Too Large</title></head>
<body bgcolor="white">
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx/1.4.6 (Ubuntu)</center>
</body>
</html>

Readme's "Start Serving" and other CLI examples include text that creates errors

Example: https://github.com/deep-learning-tools/deep-model-server#start-serving
The CLI includes params in squared brackets ([]) which fails the CLI, e.g. "deep-model-server --models resnet-18=https://s3.amazonaws.com/mms-models/resnet-18.model [--service mxnet_vision_service] [--gen-api python] [--port 8080] [--host 127.0.0.1]"

These need to be removed; the CLI examples need to work as-is, and optional parameters can be described later.
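
For reference, the working form of that example, with the optional flags simply dropped, would be:

# Optional parameters removed; this form should run as-is
deep-model-server --models resnet-18=https://s3.amazonaws.com/mms-models/resnet-18.model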

numpy version warning

Should the pip package include the appropriate version of numpy?

8c8590217d26:Development markhama$ deep-model-server
RuntimeError: module compiled against API version 0xb but this version of numpy is 0x9
RuntimeError: module compiled against API version 0xb but this version of numpy is 0x9

I think it's "fixed" it via (I don't see the warning anymore):

8c8590217d26:Development markhama$ pip install -U numpy
Collecting numpy
  Downloading numpy-1.13.3-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (4.6MB)
    100% |████████████████████████████████| 4.6MB 271kB/s 
Installing collected packages: numpy
  Found existing installation: numpy 1.8.0rc1
    Not uninstalling numpy at /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python, outside environment /Users/markhama/Development/caffe2-2.7
Successfully installed numpy-1.13.3

Scalable behavior?

Are concurrent requests handled serially? When sending many concurrent requests, the handling pattern appears to be serial (i.e. throughput seems similar to serial calls of the same count).
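
One rough way to observe this (a crude probe, not a benchmark; the /squeezenet/predict endpoint and input0 form field follow the request format shown in the log output earlier in these issues) is to fire several requests in parallel and compare the wall-clock time with running them one at a time:

# Time 8 parallel predictions; if handling is serial, this takes roughly 8x a single request
time ( for i in $(seq 1 8); do
  curl -s -o /dev/null -X POST http://127.0.0.1:8080/squeezenet/predict -F "input0=@kitten.jpg" &
done; wait )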

Exception Handling

We need to revisit exception handling - the "error messages" and "error codes" returned to users in various use cases such as invalid input, timeouts, unknown exceptions, and more.

I sent a wrong input and received a 500 error without an error message.

SSD integration test downloads 125 MB of model binaries

The integration test for SSD in dms/tests/integration_tests/ downloads MXNet model files (~125 MB).
We should probably have an integration test that needs a smaller model file. A developer running these tests remotely would not be able to complete the integration test quickly.

Ideally, we should create dms/tests/nightly_tests and move the SSD test to that folder; this test can then run nightly on Jenkins CI.

Revisit effect of logging pre-process, post-process and inference time

Currently we log pre-process, post-process, and inference time in "Debug Mode".
We need to revisit to answer the following:

  1. Writing to the log file is costly; in a performance-sensitive service use case this can become a significant bottleneck for users. What is the optimal level of log detail that gives enough information without becoming a bottleneck? Can we give users an option?
  2. [Imp] Probably keep an average inference time and log it once every 5 minutes, to reduce the effect of logging every request? We can do this at log.INFO level.

README grammar

README: "Data shape is a list of integer" should read "integers". It "should contain" or "contains" ("would contain" if the user creates it manually).

  • Ok, various grammar changes in README

Update name to "Model Server for Apache MXNet" in docs

We've decided to brand the product as "Model Server for Apache MXNet".
More specifically:

  • Product documentation should use the full name "Model Server for Apache MXNet" or the shorthand "Model Server"
  • PIP package name will be "mxnet-model-server"
  • PIP commands will be "mxnet-model-server" and "mxnet-model-export"
  • Folders and files should be changed accordingly, e.g. the "/dms" folder will be changed to "model_server"

The task is to perform all updates in the docs.
Issue #74 handles the required code updates.

Extending DMS code samples needs update

There has been a lot of restructuring of the code related to utils and the model service. Our documentation examples, mainly the extended-service Python code samples, need to be fixed.

For example:

  1. No module named mxnet_utils. This is used in the overriding example.
    It should be: from dms.utils.mxnet import image
  2. No module named mxnet_model_service.
    It should be: from dms.model_service import mxnet_model_service
  3. In preprocess, we should use data[0]; data is a list of inputs.
  4. "Input Data must be a list": _preprocess must return a list.

Define output shape in signature file

We take an output shape in the signature file. This is the output shape of the NDArray that we get after the forward pass (inference).
This shape is not, and should not be, enforced on the output from the service.
We need to clearly document how we use this output_shape.

custom service docs update

Now that the custom service file is inside the model file, all of the references and instructions around export and custom services need updating.
