sagemaker-pytorch-inference-toolkit's Introduction

SageMaker PyTorch Inference Toolkit

SageMaker PyTorch Inference Toolkit is an open-source library for serving PyTorch models on Amazon SageMaker. This library provides default pre-processing, prediction, and post-processing for certain PyTorch model types, and is responsible for starting up the TorchServe model server on SageMaker, which handles inference requests.

For training, see SageMaker PyTorch Training Toolkit.

For the Dockerfiles used for building SageMaker PyTorch Containers, see AWS Deep Learning Containers.

For information on running PyTorch jobs on Amazon SageMaker, please refer to the SageMaker Python SDK documentation.

For notebook examples: SageMaker Notebook Examples.
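For orientation, a minimal sketch (not taken from the official docs; the S3 path, role ARN, and versions are placeholders) of deploying a model that this toolkit will serve, using the SageMaker Python SDK:

from sagemaker.pytorch import PyTorchModel

# Placeholder values: substitute your own model artifact, IAM role, and versions.
model = PyTorchModel(
    model_data="s3://my-bucket/path/to/model.tar.gz",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    entry_point="inference.py",   # script that may define model_fn/input_fn/predict_fn/output_fn
    framework_version="1.12",
    py_version="py38",
)

# The resulting endpoint runs this toolkit's handler service on top of TorchServe.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
print(predictor.predict([[1.0, 2.0, 3.0]]))  # payload format depends on your input_fn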

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

License

SageMaker PyTorch Serving Container is licensed under the Apache 2.0 License. It is copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. The license is available at: http://aws.amazon.com/apache2.0/

sagemaker-pytorch-inference-toolkit's People

Contributors

abhinavs95, ajaykarpur, akartsky, arjkesh, bveeramani, chen3933, chuyang-deng, dfan, dhanainme, giuseppeporcelli, humanzz, jpeddicord, laurenyu, lxning, mseth10, mvsusp, nadiaya, namannandan, nskool, sachanub, saimidu, saravsak, tusharkanekidey, yystreet


sagemaker-pytorch-inference-toolkit's Issues

Documentation for inference.py `transform_fn`

What did you find confusing? Please describe.
Hugging Face has documented how to use the SageMaker PyTorch inference API in order to host their models. They make it quite clear that you must supply model_fn and then either transform_fn or (input_fn, predict_fn and output_fn). By using transform_fn you can have fine control over batch size, for example, allowing you to handle large requests (in particular, I have an issue where my batch transform jobs continuously die because the minimum payload of 1MB is way too large for my model, due to the large intermediate matrices, i.e. probabilities = batch_size x num_labels).

I cannot find any mention of transform_fn in the documentation - https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html

It is mentioned in passing in one of the examples - https://sagemaker-examples.readthedocs.io/en/latest/frameworks/pytorch/get_started_mnist_deploy.html

Describe how documentation can be improved
Document the use of transform_fn as an alternative to input_fn, predict_fn and output_fn
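For concreteness, a minimal sketch of such a transform_fn (assuming a JSON list payload and a TorchScript model; the names, batch size, and serialization are illustrative, not taken from official docs):

import json
import os

import torch

def model_fn(model_dir):
    # Illustrative: load a TorchScript model saved as model.pt in the archive.
    return torch.jit.load(os.path.join(model_dir, "model.pt"), map_location="cpu")

def transform_fn(model, request_body, content_type, accept):
    # Replaces input_fn/predict_fn/output_fn and gives explicit control over
    # micro-batching, independent of the MaxPayloadInMB-sized request.
    records = json.loads(request_body)   # expects a JSON list of feature rows
    batch_size = 32                       # chosen by the user, not by SageMaker
    outputs = []
    with torch.no_grad():
        for i in range(0, len(records), batch_size):
            batch = torch.tensor(records[i:i + batch_size])
            outputs.extend(model(batch).tolist())
    return json.dumps(outputs)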

Additional context

This is how I was aware of transform_fn:

https://aws.amazon.com/blogs/machine-learning/run-computer-vision-inference-on-large-videos-with-amazon-sagemaker-asynchronous-endpoints/

Then I found this:

https://huggingface.co/docs/sagemaker/inference

[Question] Using model.mar with built-in handler script

What did you find confusing? Please describe.
Hi, I've recently used a torchserve export utility provided by the MMDetection library. This uses the torch model archiver to package the following files into a model.mar:

Archive:  model.mar
  bbox_mAP_epoch_1.pth
  config.py
  mmdet_handler.py
  MAR-INF/MANIFEST.json

Is it possible to use this model.mar directly with SM TorchServe without needing to pull out the handler script and reformat it?

Additional context
I've added mmdet into my requirements.txt so the dependencies are not a problem.

The mmdet_handler.py script looks like this:

# Copyright (c) OpenMMLab. All rights reserved.
import base64
import os

import mmcv
import numpy as np
import torch
from ts.torch_handler.base_handler import BaseHandler

from mmdet.apis import inference_detector, init_detector
from mmdet.utils import register_all_modules

register_all_modules(True)


class MMdetHandler(BaseHandler):
    threshold = 0.5

    def initialize(self, context):
        properties = context.system_properties
        self.map_location = 'cuda' if torch.cuda.is_available() else 'cpu'
        self.device = torch.device(self.map_location + ':' +
                                   str(properties.get('gpu_id')) if torch.cuda.
                                   is_available() else self.map_location)
        self.manifest = context.manifest

        model_dir = properties.get('model_dir')
        serialized_file = self.manifest['model']['serializedFile']
        checkpoint = os.path.join(model_dir, serialized_file)
        self.config_file = os.path.join(model_dir, 'config.py')

        self.model = init_detector(self.config_file, checkpoint, self.device)
        self.initialized = True

    def preprocess(self, data):
        images = []

        for row in data:
            image = row.get('data') or row.get('body')
            if isinstance(image, str):
                image = base64.b64decode(image)
            image = mmcv.imfrombytes(image)
            images.append(image)

        return images

    def inference(self, data, *args, **kwargs):
        results = inference_detector(self.model, data)
        return results

    def postprocess(self, data):
        # Format output following the example ObjectDetectionHandler format
        output = []
        for data_sample in data:
            pred_instances = data_sample.pred_instances
            bboxes = pred_instances.bboxes.cpu().numpy().astype(
                np.float32).tolist()
            labels = pred_instances.labels.cpu().numpy().astype(
                np.int32).tolist()
            scores = pred_instances.scores.cpu().numpy().astype(
                np.float32).tolist()
            preds = []
            for idx in range(len(labels)):
                cls_score, bbox, cls_label = scores[idx], bboxes[idx], labels[
                    idx]
                if cls_score >= self.threshold:
                    class_name = self.model.dataset_meta['CLASSES'][cls_label]
                    result = dict(
                        class_label=cls_label,
                        class_name=class_name,
                        bbox=bbox,
                        score=cls_score)
                    preds.append(result)
            output.append(preds)
        return output

No model logs from PyTorch 1.10 SageMaker endpoint

Describe the bug

No model logs show up in endpoints created with PyTorch 1.10. Everything works fine when going back to PyTorch 1.9 / 1.9.1, but not with 1.10.

Dependencies install correctly, and the model loads up, but there's no logs from the container that get forwarded to CloudWatch.

I tried updating to 2.8.0 within the container but that doesn't work because the properties file is different and it fails trying to find log4j.properties.

Expected behavior

Logs should be forwarded to CloudWatch.

Screenshots or logs

[Screenshot omitted]

System information
A description of your system. Please provide:

  • Toolkit version: 2.7.0
  • Framework version: 1.10
  • Python version: 3.8
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N

Incorrect reporting of memory utilisation

Describe the bug
I'm running into issues with batch transform due to what I assume is an OOM condition. The main problem appears to be that, as far as I can see, there's no way to explicitly configure the batch_size for a batch transform.

Instead the batch_size appears to be controlled by MaxPayloadInMB, which has a minimum of 1. I added logging in my predict_fn and observe that I'm receiving a mix of batches containing 1000 examples and some that contain 10k+ examples. The huge batches are pretty much 1MB in size; I have no idea where the batches of 1000 come from (I'm wondering if it's splitting the last batch that is less than the 1MB payload).

The issue is that the large batches seem to occasionally cause the worker to crash; I suspect it's an out-of-memory condition (the obvious workaround is to pick a machine with more memory). When I look at the logs, the maximum utilisation appears to be around 50%, but looking closer that metric appears wrong: the example below has MemoryUsed=3537.828125 and MemoryAvailable=3843.3515625 (roughly 92%), yet MemoryUtilization is reported as 50%.

Expected behavior
MemoryUtilization = 100.0 * MemoryUsed / MemoryAvailable

Screenshots or logs

2023-03-22T12:53:27.708+11:00 | 2023-03-22T01:53:26,857 [INFO ] pool-3-thread-2 TS_METRICS - MemoryAvailable.Megabytes:3843.3515625|#Level:Host|#hostname:4a73e96743e7,timestamp:1679450006
2023-03-22T12:53:27.708+11:00 | 2023-03-22T01:53:26,857 [INFO ] pool-3-thread-2 TS_METRICS - MemoryUsed.Megabytes:3537.828125|#Level:Host|#hostname:4a73e96743e7,timestamp:1679450006
2023-03-22T12:53:27.708+11:00 | 2023-03-22T01:53:26,857 [INFO ] pool-3-thread-2 TS_METRICS - MemoryUtilization.Percent:50.0|#Level:Host|#hostname:4a73e96743e7,timestamp:1679450006
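Plugging the logged values into the expected formula gives roughly 92 percent, not the 50 percent that TS_METRICS reports:

memory_used = 3537.828125        # MemoryUsed.Megabytes from the log above
memory_available = 3843.3515625  # MemoryAvailable.Megabytes from the log above

utilization = 100.0 * memory_used / memory_available
print(round(utilization, 1))     # prints 92.0, yet MemoryUtilization.Percent is 50.0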

System information
A description of your system. Please provide:

  • Toolkit version: pytorch
  • Framework version: 1.13.1
  • Python version: 3.9
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): No


Support Elastic Inference for PyTorch 1.5.1

Hello,

The inference toolkit hasn't yet adopted the change required to use Elastic Inference with PyTorch 1.5.1 (following this guide). It looks like the default_model_fn would have to attach the Elastic Inference accelerator to the model before returning, similar to this.
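For illustration only, a sketch of what such a default_model_fn/predict_fn pair might look like; the eia:0 target device and the optimized_execution signature are taken from the Elastic Inference guide and only exist in the EI-enabled PyTorch binary (an assumption on my part, not code from this toolkit):

import os

import torch

def model_fn(model_dir):
    # Load a TorchScript model on CPU; the EI-enabled runtime routes execution
    # to the attached accelerator at inference time.
    model = torch.jit.load(os.path.join(model_dir, "model.pt"), map_location=torch.device("cpu"))
    model.eval()
    return model

def predict_fn(data, model):
    with torch.no_grad():
        if os.getenv("SAGEMAKER_INFERENCE_ACCELERATOR_PRESENT") == "true":
            # Only valid with the Elastic Inference enabled PyTorch build.
            with torch.jit.optimized_execution(True, {"target_device": "eia:0"}):
                return model(data)
        return model(data)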

Any feedback is appreciated.

Thank you,
Theiss

Serving a model using a custom container, instance runs out of disk

Describe the bug
Using a custom container to serve a PyTorch model, defined as below, throws "No space left on device":

container = {"Image": image, "ModelDataUrl": model_artifact}

create_model_response = sm.create_model(
    ModelName=model_name, ExecutionRoleArn=role, PrimaryContainer=container
)

create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "InstanceType": "ml.g4dn.8xlarge",
            "InitialVariantWeight": 1,
            "InitialInstanceCount": 1,
            "ModelName": model_name,
            "VariantName": "AllTraffic",
        }
    ],
)

The Docker image size is 17 GB and the TorchServe .mar file is 8 GB. I was wondering if there is any way to increase the storage for the instances that are serving the model. Going through the docs for endpoint configuration, there seems to be no setting for instance storage.

CloudWatch log:

[CloudWatch log screenshot omitted]

Expected behavior

Having knobs to set the storage for the serving instances.

How do I access Custom Attributes from the model during inference?

What did you find confusing? Please describe.
It is clear how to invoke an endpoint with Custom Attributes, but it is completely unclear to me how one would access this information from the PyTorch inference code. I can see the TensorFlow framework has documentation for it, implying that the second argument of the input handler function provides access to it, but in the PyTorch framework that argument is just a string indicating the input content MIME type.

It would seem from the source that the context object is not passed to the transformation functions, so not accessible at all to the user.

Describe how documentation can be improved
Explain how to access provided custom attributes using the PyTorch Inference Toolkit.

Additional context
I want to achieve what is being done here https://aws.amazon.com/marketplace/pp/prodview-5jlvp43tsn3ny?sr=0-1&ref_=beagle&applicationId=AWSMPContessa, namely the ability to provide a confidence threshold with an inference request. I would like to use a binary format for the body of the payload (image/jpeg), so I cannot pass parameters as part of the body itself as I could if the body were JSON.

renaming of mxnet-model-server in sagemaker-inference package 1.5.3 causing entrypoint with command `serve` to fail

Describe the bug
sagemaker-inference recently (10/15) released v1.5.3, which included this commit updating the name of the model server artifact and command from mxnet-model-server to multi-model-server.

All containers defined in this repository install sagemaker-inference as a dependency of this repo itself, via lines like:

RUN pip install --no-cache-dir "sagemaker-pytorch-inference<2"

and this repo's setup.py has an install_requires which includes sagemaker-inference>=1.3.1. As a result, sagemaker-inference==1.5.3 gets installed.

So while the Dockerfile's CMD value (which calls mxnet-model-server directly) will succeed, attempts to use the ENTRYPOINT with serve as the container command will fail with this message:

Traceback (most recent call last):
  File "/usr/local/bin/dockerd-entrypoint.py", line 22, in <module>
    serving.main()
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_pytorch_serving_container/serving.py", line 39, in main
    _start_model_server()
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/opt/conda/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_pytorch_serving_container/serving.py", line 35, in _start_model_server
    model_server.start_model_server(handler_service=HANDLER_SERVICE)
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/model_server.py", line 94, in start_model_server
    subprocess.Popen(multi_model_server_cmd)
  File "/opt/conda/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/opt/conda/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'multi-model-server': 'multi-model-server'
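A possible interim workaround (my assumption, not a confirmed fix) is to pin sagemaker-inference below 1.5.3 when building the image, so the mxnet-model-server command still exists:

RUN pip install --no-cache-dir "sagemaker-inference<1.5.3" "sagemaker-pytorch-inference<2"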

To reproduce

  1. build any container
  2. mount a model and inference.py (e.g. half_plus_three) into /opt/ml/model
  3. docker run [tag name] serve

Expected behavior
The model server serves the mounted model / inference.py.

System information
A description of your system. Please provide:

  • Toolkit version: 2.0.5, but should apply to all versions
  • Framework version: 1.4, but should apply to all versions
  • Python version: 3.7
  • CPU or GPU: cpu, but should apply to both
  • Custom Docker image (Y/N): N

unable to build

I am unable to build an image with the Dockerfile in this repo. I'm using the GPU version, and below is the error. How can I fix it?

Environment: SageMaker. I cloned this repository.

pytorch 1.4.0

Step 18/27 : COPY mms-entrypoint.py /usr/local/bin/dockerd-entrypoint.py
COPY failed: stat /var/lib/docker/tmp/docker-builder661779586/mms-entrypoint.py: no such file or directory

pytorch 1.2.0, 1.3.1

Step 15/26 : RUN curl -o ~/miniconda.sh -O https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && chmod +x ~/miniconda.sh && ~/miniconda.sh -b -p /opt/conda && rm ~/miniconda.sh && /opt/conda/bin/conda update conda && /opt/conda/bin/conda install -y python=$PYTHON_VERSION cython==0.29.12 ipython==7.7.0 mkl-include==2019.4 mkl==2019.4 numpy==1.16.4 scipy==1.3.0 typing==3.6.4 && /opt/conda/bin/conda clean -ya
 ---> Running in 1733b9e3bc06
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
/bin/sh: 1: /opt/conda/bin/conda: not found
The command '/bin/sh -c curl -o ~/miniconda.sh -O https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && chmod +x ~/miniconda.sh && ~/miniconda.sh -b -p /opt/conda && rm ~/miniconda.sh && /opt/conda/bin/conda update conda && /opt/conda/bin/conda install -y python=$PYTHON_VERSION cython==0.29.12 ipython==7.7.0 mkl-include==2019.4 mkl==2019.4 numpy==1.16.4 scipy==1.3.0 typing==3.6.4 && /opt/conda/bin/conda clean -ya' returned a non-zero code: 127

add environment variable "OMP_NUM_THREADS"

Describe the feature you'd like
Add the environment variable "OMP_NUM_THREADS" (default value: 1) on CPU instances, and write this value into the TorchServe config.properties.

How would this feature be used? Please describe.
Here is a related ticket on the TorchServe side.


How to extend the container with custom inference code?

I'm looking to extend the 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference-eia:1.3.1-cpu-py36-ubuntu16.04 container for inference with Fairseq and BART. I've built my container and pushed it to ECR but I'm getting this error when running inference:

Please provide a model_fn implementation.
See documentation for model_fn at https://github.com/aws/sagemaker-python-sdk 

My Dockerfile is as follows:

ARG REGION=us-east-1

FROM 763104351884.dkr.ecr.$REGION.amazonaws.com/pytorch-inference-eia:1.3.1-cpu-py36-ubuntu16.04

ENV PATH="/opt/ml/code:${PATH}"

COPY /bart /opt/ml/code

RUN pip3 install -r /opt/ml/code/requirements.txt

ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/ml/code

ENV SAGEMAKER_PROGRAM bart.py

bart.py contains the implementation of model_fn but apparently it isn't getting picked up. I've based my Dockerfile off of this one:

https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/pytorch_extending_our_containers/container/Dockerfile

I assume things have changed and I have to provide my inference code in a different way. How do I do that please?
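One approach that may work here (a sketch under the assumption that bart.py defines model_fn and lives in the bart/ directory; the image URI, artifact path, and role are placeholders) is to let the SageMaker Python SDK package and upload the inference code rather than baking SAGEMAKER_PROGRAM into the image:

from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data="s3://my-bucket/bart/model.tar.gz",                                       # placeholder
    role="arn:aws:iam::123456789012:role/MySageMakerRole",                               # placeholder
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-bart-inference:latest",   # placeholder
    entry_point="bart.py",    # defines model_fn (and optionally input_fn/predict_fn/output_fn)
    source_dir="bart",        # uploaded and mounted into the container by the SDK
)

predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")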

ModuleNotFoundError: SageMaker only copies the entry_point file to /opt/ml/code/ instead of the whole cloned source code

I am using the SageMaker PyTorch Estimator with a custom Docker image stored in AWS ECR.

from sagemaker.pytorch.estimator import PyTorch

    role = "arn:..."

    estimator = PyTorch(
        image_uri="1...ecr...amazonaws.com/...:prototype",
        git_config={"repo": "https://github.com/celsofranssa/LightningPrototype.git", "branch": "sagemaker"},
        entry_point="main.py",
        role=role,
        region="us-...",
        instance_type="local", # ml.g4dn.2xlarge
        instance_count=1,
        volume_size=225,
        hyperparameters=hparams
    )
    estimator.fit()

Sagemaker correctly clones the sources from GitHub and performs the checkout into the specified branch.

The bug:
However, it only copies main.py to /opt/ml/code inside the container instead of the whole cloned source code, which causes ModuleNotFoundError: No module named 'source':

Traceback (most recent call last):
2y9byzwyxr-algo-1-reuoy  |   File "/opt/ml/code/main.py", line 15, in <module>
2y9byzwyxr-algo-1-reuoy  |     from source.helper.EvalHelper import EvalHelper
2y9byzwyxr-algo-1-reuoy  | ModuleNotFoundError: No module named 'source'

Logging the /opt/ml/code content only shows the main.py:

print(f"Content: {os.listdir(os.getcwd())}")
['main.py']
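A sketch of a possible workaround (my assumption, not a confirmed fix): pass source_dir so the whole cloned repository, including the source/ package, is copied into /opt/ml/code rather than only the entry point.

from sagemaker.pytorch.estimator import PyTorch

# Same configuration as above (role and hparams as defined earlier in this issue),
# with source_dir added; "." is relative to the cloned repository root.
estimator = PyTorch(
    image_uri="1...ecr...amazonaws.com/...:prototype",
    git_config={"repo": "https://github.com/celsofranssa/LightningPrototype.git", "branch": "sagemaker"},
    source_dir=".",           # copy the whole repository, not just main.py
    entry_point="main.py",
    role=role,
    instance_type="local",    # ml.g4dn.2xlarge
    instance_count=1,
    volume_size=225,
    hyperparameters=hparams,
)
estimator.fit()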

model_fn ignored on PyTorch v1.6 serving container?

Describe the bug

PyTorch inference container at v1.6 seems to ignore the provided model_fn() and attempt to load a model.pth file (non-existent in my case), resulting in an error for code which worked fine on the v1.4 container.

Maybe this is just a doc issue? I couldn't see any indication of how this override should be provided differently in v1.6.

To reproduce

  • Train a model in PyTorch framework container v1.6 and save it as a non-standard artifact (e.g. put it in a zip file inside the model.tar.gz, or something)
  • Create a PyTorchModel with e.g. source_dir="src/", entry_point="src/inference.py", where the entry point script defines a model_fn(model_dir: str)
  • Try to run the model e.g. as a batch transform.

(Will see if there's any public example I can link to to accelerate reproduction)

Expected behavior

The container calls the model_fn per the SageMaker SDK docs and loads the model successfully.

Screenshots or logs

My job appears to have generated A LOT of duplicate log entries repeating the below, before eventually going quiet/hanging. It still showed as in-progress with 0 CPU utilization many minutes later, much longer than is typical for the same dataset on previous PyTorch versions, so I forcibly stopped it.

FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/model/model.pth'
Traceback (most recent call last):
  File "/opt/conda/bin/torch-model-archiver", line 10, in <module>
    sys.exit(generate_model_archive())
  File "/opt/conda/lib/python3.6/site-packages/model_archiver/model_packaging.py", line 60, in generate_model_archive
    package_model(args, manifest=manifest)
  File "/opt/conda/lib/python3.6/site-packages/model_archiver/model_packaging.py", line 37, in package_model
    model_path = ModelExportUtils.copy_artifacts(model_name, **artifact_files)
  File "/opt/conda/lib/python3.6/site-packages/model_archiver/model_packaging_utils.py", line 150, in copy_artifacts
    shutil.copy(path, model_path)
  File "/opt/conda/lib/python3.6/shutil.py", line 245, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/opt/conda/lib/python3.6/shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.15.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch
  • Framework version: 1.6
  • Python version: 3
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N

Additional context

N/A

BERT model loading not working with pytorch 1.3.1-eia container

I have a custom python file for inference in which I have implemented the functions model_fn, input_fn, predict_fn and output_fn. I have saved the model as a torchscript using torch.jit.trace, torch.jit.save and loading it using torch.jit.load. The model_fn implementation is as follows:

import torch
import os
import logging

logger = logging.getLogger()
is_ei = os.getenv("SAGEMAKER_INFERENCE_ACCELERATOR_PRESENT") == "true"
logger.warn(f"Elastic Inference enabled: {is_ei}")

def model_fn(model_dir):
    model_path = os.path.join(model_dir, "model_best.pt")
    try:
        loaded_model = torch.jit.load(model_path, map_location=torch.device('cpu'))
        loaded_model.eval()
        return loaded_model
    except Exception as e:
        logger.exception(f"Exception in model fn {e}")
        return None

This implementation works perfectly for the container with PyTorch 1.5, but for the container with PyTorch 1.3.1 it exits abruptly when loading the pretrained model, without any logs. The only lines I see in the logs are:

algo-1-nvqf7_1  | 2020-11-30 07:17:15,392 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.BatchAggregator - Load model failed: model, error: Worker died.
algo-1-nvqf7_1  | 2020-11-30 07:17:15,393 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Retry worker: 9000-44f1cd64 in 1 seconds.

The worker dies and tries to restart, and the process repeats till I stop the container.

The model I am using was trained with PyTorch 1.5, but since Elastic Inference is only supported up to 1.3.1, I am using this container.

Things I have tried:

  1. The same code with same model works outside the container with pytorch version 1.3.1. So, I don't think pytorch version compatibility is the issue.
  2. Tried using debug and notset levels for logs. Didn't get any more info as to why model loading fails
  3. Tried loading the original model instead of the traced one. Again this works in 1.5 but not in 1.3.1. Fails at the same point, while loading the BERT pretrained model.
  4. Tried this setup on sagemaker notebook instance with gpu accelerator and sagemaker PytorchModel's deploy() function with framework_version as 1.3.1. Also tried it using the 1.3.1 container without eia. Has same behaviour everywhere.

Am I doing something wrong or missing something crucial from the documentation? Any help would be much appreciated.

Logs for container with torch 1.3.1-eia:

algo-1-nvqf7_1  | 2020-11-30 07:17:14,333 [INFO ] main com.amazonaws.ml.mms.ModelServer - 
algo-1-nvqf7_1  | MMS Home: /opt/conda/lib/python3.6/site-packages
algo-1-nvqf7_1  | Current directory: /
algo-1-nvqf7_1  | Temp directory: /home/model-server/tmp
algo-1-nvqf7_1  | Number of GPUs: 0
algo-1-nvqf7_1  | Number of CPUs: 8
algo-1-nvqf7_1  | Max heap size: 6972 M
algo-1-nvqf7_1  | Python executable: /opt/conda/bin/python
algo-1-nvqf7_1  | Config file: /etc/sagemaker-mms.properties
algo-1-nvqf7_1  | Inference address: http://0.0.0.0:8080
algo-1-nvqf7_1  | Management address: http://0.0.0.0:8080
algo-1-nvqf7_1  | Model Store: /.sagemaker/mms/models
algo-1-nvqf7_1  | Initial Models: ALL
algo-1-nvqf7_1  | Log dir: /logs
algo-1-nvqf7_1  | Metrics dir: /logs
algo-1-nvqf7_1  | Netty threads: 0
algo-1-nvqf7_1  | Netty client threads: 0
algo-1-nvqf7_1  | Default workers per model: 1
algo-1-nvqf7_1  | Blacklist Regex: N/A
algo-1-nvqf7_1  | Maximum Response Size: 6553500
algo-1-nvqf7_1  | Maximum Request Size: 6553500
algo-1-nvqf7_1  | Preload model: false
algo-1-nvqf7_1  | Prefer direct buffer: false
algo-1-nvqf7_1  | 2020-11-30 07:17:14,391 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-model
algo-1-nvqf7_1  | 2020-11-30 07:17:14,481 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /home/model-server/tmp/.mms.sock.9000 --handler sagemaker_pytorch_serving_container.handler_service --model-path /.sagemaker/mms/models/model --model-name model --preload-model false --tmp-dir /home/model-server/tmp
algo-1-nvqf7_1  | 2020-11-30 07:17:14,482 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.mms.sock.9000
algo-1-nvqf7_1  | 2020-11-30 07:17:14,482 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 51
algo-1-nvqf7_1  | 2020-11-30 07:17:14,482 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started.
algo-1-nvqf7_1  | 2020-11-30 07:17:14,483 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.6.6
algo-1-nvqf7_1  | 2020-11-30 07:17:14,483 [INFO ] main com.amazonaws.ml.mms.wlm.ModelManager - Model model loaded.
algo-1-nvqf7_1  | 2020-11-30 07:17:14,487 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
algo-1-nvqf7_1  | 2020-11-30 07:17:14,496 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000
algo-1-nvqf7_1  | 2020-11-30 07:17:14,544 [INFO ] main com.amazonaws.ml.mms.ModelServer - Inference API bind to: http://0.0.0.0:8080
algo-1-nvqf7_1  | 2020-11-30 07:17:14,545 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.mms.sock.9000.
algo-1-nvqf7_1  | Model server started.
algo-1-nvqf7_1  | 2020-11-30 07:17:14,547 [WARN ] pool-2-thread-1 com.amazonaws.ml.mms.metrics.MetricCollector - worker pid is not available yet.
algo-1-nvqf7_1  | 2020-11-30 07:17:14,962 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - PyTorch version 1.3.1 available.
algo-1-nvqf7_1  | 2020-11-30 07:17:15,314 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Lock 140580224398952 acquired on /root/.cache/torch/transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084.lock
algo-1-nvqf7_1  | 2020-11-30 07:17:15,315 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/torch/transformers/tmpcln39mxo
algo-1-nvqf7_1  | 2020-11-30 07:17:15,344 [WARN ] W-9000-model-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - 
algo-1-nvqf7_1  | 2020-11-30 07:17:15,349 [WARN ] W-9000-model-stderr com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Downloading:   0%|          | 0.00/232k [00:00<?, ?B/s]
algo-1-nvqf7_1  | 2020-11-30 07:17:15,349 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - storing https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt in cache at /root/.cache/torch/transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
algo-1-nvqf7_1  | 2020-11-30 07:17:15,349 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - creating metadata file for /root/.cache/torch/transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
algo-1-nvqf7_1  | 2020-11-30 07:17:15,349 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Lock 140580224398952 released on /root/.cache/torch/transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084.lock
algo-1-nvqf7_1  | 2020-11-30 07:17:15,350 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /root/.cache/torch/transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
algo-1-nvqf7_1  | 2020-11-30 07:17:15,378 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Created tokenizer
algo-1-nvqf7_1  | 2020-11-30 07:17:15,378 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Elastic Inference enabled: True
algo-1-nvqf7_1  | 2020-11-30 07:17:15,378 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - inside model fn
algo-1-nvqf7_1  | 2020-11-30 07:17:15,379 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - /.sagemaker/mms/models/model
algo-1-nvqf7_1  | 2020-11-30 07:17:15,379 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - /.sagemaker/mms/models/model/model.pt
algo-1-nvqf7_1  | 2020-11-30 07:17:15,379 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ['model.pt', 'model.tar.gz', 'code', 'model_tn_best.pth', 'MAR-INF']
algo-1-nvqf7_1  | 2020-11-30 07:17:15,379 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Loading torch script
algo-1-nvqf7_1  | 2020-11-30 07:17:15,392 [INFO ] epollEventLoopGroup-4-1 com.amazonaws.ml.mms.wlm.WorkerThread - 9000-44f1cd64 Worker disconnected. WORKER_STARTED
algo-1-nvqf7_1  | 2020-11-30 07:17:15,392 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.BatchAggregator - Load model failed: model, error: Worker died.
algo-1-nvqf7_1  | 2020-11-30 07:17:15,393 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Retry worker: 9000-44f1cd64 in 1 seconds.
algo-1-nvqf7_1  | 2020-11-30 07:17:16,065 [INFO ] W-9000-model ACCESS_LOG - /172.18.0.1:45110 "GET /ping HTTP/1.1" 200 8

Documentation link broken

Is this Dockerfile compatible with sagemaker elastic inference

What did you find confusing? Please describe.
This is Dockerfile link:
https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/master/docker/1.5.0/py3/Dockerfile.cpu
This link https://docs.aws.amazon.com/sagemaker/latest/dg/ei-endpoints.html#ei-endpoints-pytorch
states:
You can download the Elastic Inference enabled binary for PyTorch from the public Amazon S3 bucket at console.aws.amazon.com/s3/buckets/amazonei-pytorch. For information about building a container that uses the Elastic Inference enabled version of PyTorch, see Building your image.
I am confused. If I use the Dockerfile above, do I still need to download and install the binary from https://console.aws.amazon.com/s3/buckets/amazonei-pytorch to build the Docker container image?
If I want to use a custom Docker image for SageMaker Elastic Inference, do I need to convert the PyTorch code into TorchScript? This part is not covered.
Can I use it with Python >= 3.7 and PyTorch >= 1.12?
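On the TorchScript question specifically: converting a model is a one-time torch.jit.trace / torch.jit.save step, roughly as sketched below (the model and input shape are placeholders); whether the EI-enabled PyTorch binary is still required on top of this is exactly the documentation gap this issue asks about.

import torch

class TinyNet(torch.nn.Module):
    # Placeholder model; substitute the real network.
    def forward(self, x):
        return torch.relu(x)

model = TinyNet().eval()
example_input = torch.rand(1, 3, 224, 224)

traced = torch.jit.trace(model, example_input)   # convert to TorchScript
torch.jit.save(traced, "model.pt")               # package this file into model.tar.gz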


using cuda enabled pytorch image

What did you find confusing? Please describe.
Kindly tell me what the Dockerfile should look like in order to use CUDA/GPU on a SageMaker instance. I heard we have to use the sagemaker-inference-toolkit; I tried using a plain CUDA image, but it has problems serving.
Describe how documentation can be improved
There were no examples/samples related to building an image with CUDA support.

Specify batch size for MME

What did you find confusing? Please describe.
How do you specify batch size for MME models?

Describe how documentation can be improved
This blog describes using env vars to set the batch size and other parameters for a single-model endpoint; however, I haven't found any documentation on setting the batch size for individual models within an MME.

Additional context
Each model in my MME has a MAR-INF/MANIFEST.json within its model.tar.gz, so I tried to specify batchSize in these files, but I don't think it's being applied.
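For reference, for a single-model endpoint the blog relies on environment variables that the toolkit writes into the TorchServe config; the variable names below are my reading of that blog (verify against it), and whether they can be scoped to individual models inside an MME is exactly the open question here:

from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",               # placeholder
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder
    entry_point="inference.py",
    framework_version="1.12",
    py_version="py38",
    env={
        # Assumed names from the dynamic-batching blog; applied endpoint-wide,
        # not per target model.
        "SAGEMAKER_TS_BATCH_SIZE": "8",
        "SAGEMAKER_TS_MAX_BATCH_DELAY": "100",
    },
)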

Endpoint dependencies error

Describe the bug
I am using a package called Farm to perform the training and the inference. The training works fine: I pass the requirements.txt and all the packages are correctly installed. When I try to deploy it, there is also no problem. But at inference time, when farm needs to be imported, I get an error showing that the package doesn't exist.

To reproduce

estimator = PyTorch(
    base_job_name="imdb",
    max_run= 60*30,
    entry_point="train_aws.py",
    source_dir="source",  
    framework_version="1.5",
    py_version = 'py3',
    instance_count=1,
    role=role, 
    hyperparameters=hyperparameters,
    instance_type= "ml.p3.2xlarge", 
    output_path="s3://<bucket>/opt/ml/model", # the S3 path where model is outputted 
)
estimator.fit(
    {'training': '/'.join(training_input_path.split('/')[:-1])}, 
    wait=True
)
model = estimator.create_model(role=role, entry_point='inference.py')
mdm = multidatamodel.MultiDataModel(name="Models",
                                    model_data_prefix="s3://<bucket>/opt/ml/model/", 
                                    model = model,
                                    sagemaker_session=sess
                                   )
predictor = mdm.deploy(initial_instance_count=1,instance_type='ml.t2.medium', endpoint_name="inference")

Note: when I create the model.tar.gz, I include requirements.txt and inference.py in the code folder.

Expected behavior

Inference should work, but the packages are not being installed. I tried to brute-force it by including these lines in inference.py:

import subprocess
import sys
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'farm'])

But then it says that another package is missing: ImportError: cannot import name 'rng_integers'

Is there a better way to install the packages I want inside the inference container?
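One workaround to consider (my assumption, not an officially documented behaviour for MultiDataModel) is to bake the dependencies into a derived image instead of relying on requirements.txt being installed at model-load time, e.g.:

# Placeholder base image; use the pytorch-inference DLC tag that matches your framework version and region.
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.5.1-cpu-py36-ubuntu16.04
# Install the inference dependencies at build time so every model loaded by the
# multi-model endpoint finds them already present.
RUN pip install --no-cache-dir farm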

JVM detects the CPU count as 1 when more CPUs are available to the container

Describe the bug
This issue is related to the issue JVM bug 82 in sagemaker-inference-toolkit

To reproduce

  1. Clone the SageMaker example.
  2. Deploy the model using the same endpoint.
  3. Check CloudWatch logs; the number of CPU cores detected will show as Number of CPUs: 1.

The JVM detects the CPU count as 1 even when more CPUs are available to the container.

Expected behavior
The CPU count from CloudWatch should match the CPU count for the used instance. For example, 4 if the instance is ml.m4.xlarge

System information
Container: pytorch-inference:1.7-cpu-py3 and pytorch-inference:1.7-gpu-py3
SageMaker inference v1.1.2

Additional context
This prevents SageMaker inference from using all the CPUs on the instance.

MultiDataModel error during prediction: Please provide a model_fn implementation.

Describe the bug
When deploying a packaged PyTorch model using the PyTorchModel class, I can successfully deploy and call the predict function. But as soon as I pass the same model to a MultiDataModel class, the deployment goes through, yet when I call predictor.predict(data=data, target_model='model.tar.gz') I get the following error:

An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from model with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.".

I'm not sure if the error is related to the 'Please provide a model_fn implementation.' error I see in CloudWatch, but the model_fn function is actually implemented, and MultiDataModel somehow doesn't load it.

To reproduce

  1. create a sample PyTorch model, train and package it.
  2. deploy the model using PyTorchModel (this successfully deploys the model, and calling predictor.predict() successfully returns the inference results):
class InvoiceExtraction(RealTimePredictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super().__init__(endpoint_name, sagemaker_session=sagemaker_session, serializer=json_serializer, 
                         deserializer=json_deserializer, content_type='application/json')
        
model = PyTorchModel(model_data=str('/home/ec2-user/SageMaker/model.tar.gz'),
                   name=name_from_base(MODEL_NAME),
                   role=role, 
                   entry_point='predictor.py',
                   framework_version='1.5.0', # Breaks for 1.6.0
                   py_version='py3',
                   predictor_cls=InvoiceExtraction)

predictor = model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge', endpoint_name=ENDPOINT_NAME)
predicted_value = predictor.predict(data=data)
  3. if you deploy the model using a MultiDataModel instead, it will get deployed, but the predict function returns the error mentioned above:
model_data_prefix = 's3://multi-model-endpoint-models/'
model.sagemaker_session = sagemaker_session # not setting this results in the model's session not being initialized
mme = MultiDataModel(name=MODEL_NAME,
                     model_data_prefix=model_data_prefix,
                     model=model,# passing our pytorch model
                     sagemaker_session=sagemaker_session)

ENDPOINT_INSTANCE_TYPE = 'ml.m4.xlarge'
ENDPOINT_NAME = 'test-endpoint'

predictor = mme.deploy(initial_instance_count=1,
                       instance_type=ENDPOINT_INSTANCE_TYPE,
                       endpoint_name=ENDPOINT_NAME)

mme.add_model(model_data_source='/home/ec2-user/SageMaker/model.tar.gz', model_data_path='model.tar.gz')
list(mme.list_models())

predicted_value = predictor.predict(data=data, target_model='model.tar.gz')

Expected behavior
MultiDataModel should deploy and work without any errors.

Screenshots or logs
This is what's included in the CloudWatch logs:

2021-02-02 20:24:54,652 [INFO ] W-9000-2093075ac497ff81bd6238817 com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 1
2021-02-02 20:24:54,653 [WARN ] W-9000-2093075ac497ff81bd6238817 com.amazonaws.ml.mms.wlm.WorkerThread - Backend worker thread exception.
java.lang.IllegalArgumentException: reasonPhrase contains one of the following prohibited characters: \r\n:
Please provide a model_fn implementation.
See documentation for model_fn at https://github.com/aws/sagemaker-python-sdk
    at io.netty.handler.codec.http.HttpResponseStatus.<init>(HttpResponseStatus.java:555)
    at io.netty.handler.codec.http.HttpResponseStatus.<init>(HttpResponseStatus.java:537)
    at io.netty.handler.codec.http.HttpResponseStatus.valueOf(HttpResponseStatus.java:465)
    at com.amazonaws.ml.mms.wlm.Job.response(Job.java:85)
    at com.amazonaws.ml.mms.wlm.BatchAggregator.sendResponse(BatchAggregator.java:85)
    at com.amazonaws.ml.mms.wlm.WorkerThread.run(WorkerThread.java:146)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2021-02-02 20:24:54,655 [ERROR] W-9000-2093075ac497ff81bd6238817 com.amazonaws.ml.mms.wlm.BatchAggregator - Unexpected job: 99ef86e2-aedf-47d7-8f6c-950fde1bec88

System information
A description of your system. Please provide:

  • Toolkit version: Latest
  • Framework version: tried both on 1.5.0 and 1.6.0
  • Python version: 3.6
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N


requirements.txt not honored in 1.2.0

Hi,

I have a requirements.txt file in the same directory as my generate.py. When I deploy this model, if I use the container for PyTorch 1.3.1, the packages in requirements.txt get installed (expected behavior).
However, when using PyTorch 1.2.0, the requirements.txt file is ignored and none of the specified packages are installed. Where in the code is the requirements.txt file used to install the packages? Can someone point me to that?

Thanks
Akhil

Cannot change the CloudWatch log level

Describe the bug
The PyTorch SageMaker endpoint CloudWatch log level is INFO only, and it cannot be changed without building your own container.

Hence all access logs, /ping as well as /invocations, clutter the CloudWatch log stream, making it difficult to go directly to the errors when troubleshooting. In my understanding, this incurs additional CloudWatch cost as well.

2020-08-28 00:31:16,598 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:54100 "GET /ping HTTP/1.1" 200 0
2020-08-28 00:31:21,598 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:54100 "GET /ping HTTP/1.1" 200 0
... repetition of the INFO on /ping.

AWS support case 7309023801 was opened, and the response indicated that the log level cannot be changed, or that we need to build our own container to control it. Quoting the relevant parts of the support response:

The SageMaker PyTorch serving container is built on the sagemaker-inference module, so to change the log4j configuration we need to change the log4j settings coming from the sagemaker-inference toolkit. So I have built two wheel files for my custom Docker container and installed both of them in the Dockerfile:

=========
#RUN pip install --no-cache-dir "sagemaker-pytorch-inference<2"
COPY sagemaker_inference-1.5.3.dev0-py2.py3-none-any.whl /tmp
RUN pip install /tmp/sagemaker_inference-1.5.3.dev0-py2.py3-none-any.whl

COPY sagemaker_pytorch_inference-1.5.1-py2.py3-none-any.whl /tmp
RUN pip install /tmp/sagemaker_pytorch_inference-1.5.1-py2.py3-none-any.whl
=========

Then I built and pushed the docker to my ECR and used that docker image to deploy the model to an endpoint. I was only seeing the logs related to the model server, for example:

=========
EVENTS	1598507953058	2020-08-27 05:59:07,670 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 533	1598507947905
EVENTS	1598508601335	2020-08-27 06:09:55,987 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 62	1598508596097
EVENTS	1598508607396	2020-08-27 06:10:01,440 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 292	1598508602099
=========

The ping health check logs are all gone. Please note that the logs for invocation requests handled by the model server are logged at INFO level, so I would suggest keeping the log4j settings as below:

=========
log4j.logger.com.amazonaws.ml.mms = INFO, mms_log
log4j.logger.ACCESS_LOG = ERROR, access_log
log4j.logger.MMS_METRICS = WARN, mms_metrics
log4j.logger.MODEL_METRICS = WARN, model_metrics
log4j.logger.MODEL_LOG = WARN, model_log
=========
Now let's come back to the issue of changing the logging level. The environment variable you used, 'SAGEMAKER_CONTAINER_LOG_LEVEL', is the correct one. From our SageMaker SDK source code [2][3], you can see that the service uses the same value when it is set in the Python SDK. I used our example notebook and set the parameter in the PyTorchModel call as below:

----
sagemaker_model = PyTorchModel(
                                  model_data = 's3://<mybucket>/pytorch-training-2020-08-26-00-44-33-303/output/model.tar.gz',
                                  role = role,
                                  container_log_level=40,
                                  py_version='py3',
                                  entry_point = 'mnist.py')
----

I can confirm the environment variable set for the model is SAGEMAKER_CONTAINER_LOG_LEVEL = 40.

However, this value only applies to logs generated by Python logging (the entrypoint script). I have tested this with a training job and can see the logs only show DEBUG or ERROR. I modified the entrypoint script to print the environment variables, and also used logger.info and logger.error to add test logs to the model_fn, as below:

----
def model_fn(model_dir):
    logger.info("Run model function")
    print(os.environ)
    logger.error("display error message")
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.DataParallel(Net())
    with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
        model.load_state_dict(torch.load(f))
    return model.to(device)
----

I can confirm that the logger.info message does not show up in CloudWatch, while the 'display error message' logged by logger.error does show up in the CloudWatch log. This means the container log level is applied correctly.

In the PyTorch serving containers we are using 'mxnet-model-server', which handles the /ping and /invocations HTTP calls for you [8]. The model server is not controlled by the container logging level. The endpoint logs that show as [INFO] are defined in the log4j.properties here [5]. For example, in the endpoint logs you may see the entry below:

2020-08-26 03:27:10,540 [INFO ] pool-1-thread-2 ACCESS_LOG - /10.32.0.2:52222 "GET /ping HTTP/1.1" 200 22

This is defined in the "log4j.logger.ACCESS_LOG = INFO, access_log" section in [5]. 


Therefore, to change the log levels for the serving module you need to build your own PyTorch image based on our GitHub repo [6]. This will require you to make the necessary code changes to our Dockerfile as well as install additional modules.

To reproduce
Deploy a PyTorch model where the Python log level is set to logging.ERROR via the SageMaker SDK, and refer to the CloudWatch log group /aws/sagemaker/Endpoints/<endpoint_name>.

import logging

logging.basicConfig()
logging.getLogger().setLevel(logging.ERROR)

Expected behavior
The log level configuration is respected and only ERROR-level messages are logged in CloudWatch.

System information
SageMaker endpoint in us-east-1.

  • Toolkit version:
    Not sure

  • Framework version:
    PyTorch 1.4.0, 1.5.1

  • Python version:
    Python 3.6

  • CPU or GPU:
    GPU

  • Custom Docker image (Y/N):
    N

Additional context

Endpoint startup message in the cloudwatch.

Warning: Calling MMS with mxnet-model-server. Please move to multi-model-server.
2020-08-28 00:29:08,131 [INFO ] main com.amazonaws.ml.mms.ModelServer -
MMS Home: /opt/conda/lib/python3.6/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of CPUs: 1
Max heap size: 3806 M
Python executable: /opt/conda/bin/python
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8080
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: /logs
Metrics dir: /logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Preload model: false
Prefer direct buffer: false
2020-08-28 00:29:08,222 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-model
2020-08-28 00:29:08,320 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /home/model-server/tmp/.mms.sock.9000 --handler sagemaker_pytorch_serving_container.handler_service --model-path /.sagemaker/mms/models/model --model-name model --preload-model false --tmp-dir /home/model-server/tmp
2020-08-28 00:29:08,321 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.mms.sock.9000
2020-08-28 00:29:08,321 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 43
2020-08-28 00:29:08,321 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started.
2020-08-28 00:29:08,321 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.6.10
2020-08-28 00:29:08,322 [INFO ] main com.amazonaws.ml.mms.wlm.ModelManager - Model model loaded.
2020-08-28 00:29:08,328 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2020-08-28 00:29:08,341 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000
2020-08-28 00:29:08,406 [INFO ] main com.amazonaws.ml.mms.ModelServer - Inference API bind to: http://0.0.0.0:8080
Model server started.
2020-08-28 00:29:08,408 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.mms.sock.9000.
2020-08-28 00:29:08,409 [WARN ] pool-2-thread-1 com.amazonaws.ml.mms.metrics.MetricCollector - worker pid is not available yet.
2020-08-28 00:29:08,972 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - PyTorch version 1.5.1 available.
2020-08-28 00:29:42,837 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 34388
2020-08-28 00:29:42,839 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-model-1
2020-08-28 00:29:55,821 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:34704 "GET /ping HTTP/1.1" 200 19
2020-08-28 00:30:00,606 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:34704 "GET /ping HTTP/1.1" 200 1
2020-08-28 00:30:05,605 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:34704 "GET /ping HTTP/1.1" 200 0
2020-08-28 00:30:10,605 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:34704 "GET /ping HTTP/1.1" 200 0
2020-08-28 00:30:15,605 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:34704 "GET /ping HTTP/1.1" 200 0
2020-08-28 00:30:20,605 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:34704 "GET /ping HTTP/1.1" 200 0
2020-08-28 00:30:25,605 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:34704 "GET /ping HTTP/1.1" 200 0
2020-08-28 00:30:30,605 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:34704 "GET /ping HTTP/1.1" 200 0
...

Launch TorchServe without repackaging model contents

Currently, the startup code will repackage the model contents in environment.model_dir into TS format using the TS model archiver: https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/master/src/sagemaker_pytorch_serving_container/torchserve.py#L78

This causes the model contents to be read and rewritten to disk on container startup, which increases container startup time. For SageMaker Serverless Inference, this causes cold starts to be longer (even longer for larger models).

TorchServe is making a change to support loading models from a directory without the need to repackage the model as a .mar file: pytorch/serve#1498

This issue is a request for this inference toolkit to use this new feature and avoid repackaging the model contents. I /think/ this should be as simple as removing the model archiver command execution and setting --models in the TorchServe command to model=environment.model_dir.
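A rough sketch of what that could look like in the startup code (not the actual toolkit implementation; the config path is illustrative and the behaviour depends on the pytorch/serve#1498 feature being available):

import subprocess

from sagemaker_inference import environment

env = environment.Environment()

# Skip torch-model-archiver entirely and hand TorchServe the unpacked model
# directory; env.model_dir is normally /opt/ml/model.
ts_command = [
    "torchserve",
    "--start",
    "--ts-config", "/etc/sagemaker-ts.properties",   # illustrative config location
    "--models", "model=" + env.model_dir,
]
subprocess.Popen(ts_command)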

Reuse the requirements.txt installation logic from sagemaker-inference-toolkit

Describe the feature you'd like

sagemaker-inference-toolkit is in the middle of introducing support for installing requirements.txt dependencies from CodeArtifact in aws/sagemaker-inference-toolkit#130

how to use gpu in sagemaker instance

Hi, I am going with a custom Docker image with all the CUDA/cuDNN libraries installed, and I have tested locally that the GPU is being utilized. But when I upload it to ECR and create an endpoint, the endpoint is not created and SageMaker says to make sure the docker serve command is valid. From debugging I found out that the inference toolkit is needed inside the image for it to detect whether the SageMaker GPU is available, but there is no sample Dockerfile from which I can understand this. Kindly tell me:
1) how to enable CUDA support in custom-built Docker images for SageMaker;
2) will using prebuilt images, e.g. accountnum.aws.amazon.com/pytorch:1.10-cuda113-py3, directly use the CUDA/GPU of the SageMaker instance? (See the sketch below.)
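On question 2: the prebuilt pytorch-inference GPU images already ship with CUDA and this inference toolkit, so extending one of them is usually enough; a sketch (account ID, region, and tag are placeholders to check against the available Deep Learning Container tags):

# Placeholder URI; pick the pytorch-inference GPU tag for your region and framework version.
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.10.2-gpu-py38-cu113-ubuntu20.04-sagemaker

# Add only what the model needs on top of the CUDA-enabled, toolkit-enabled base image.
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt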

MMS (multi-model) mode in inference is not supported on GPU instances

I created the image using 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.12.1-gpu-py38-cu113-ubuntu20.04-sagemaker, but I cannot deploy in MultiModel mode on a GPU instance: ClientError: An error occurred (ValidationException) when calling the CreateEndpointConfig operation: MultiModel mode is not supported for instance type ml.g4dn.xlarge.
According to aws/sagemaker-python-sdk#1323, GPU instances are not supported.

So why does the PyTorch GPU prebuilt image use MMS as the model server, while the inference endpoint does not support it?

response = sagemaker_client.create_endpoint_config(
                EndpointConfigName = 'MultiModelConfig',
                ProductionVariants=[
                     {
                        'InstanceType':        'ml.g4dn.xlarge',
                        'InitialInstanceCount': 1,
                        'InitialVariantWeight': 1,
                        'ModelName':            'MultiModel',
                        'VariantName':          'AllTraffic'
                      }
                ]
           )
print(response)

MultiDataModel fails at installing requirements.txt

Describe the bug
The packages in requirements.txt are not installed when deploying a MultiDataModel with SageMaker.

To reproduce
Deploy a model to sagemaker that depends on requirements.txt installing specific packages, e.g. pandas.

My architecture looks like this:

model.tar.gz/
| - ....
| - code/
  | - inference.py
  | - requirements.txt

Expected behavior
It is expected that the requirements are installed.

Additional context
I first had the same issues as here, which is solved by the inference.py being included in the model.tar.gz - but it also indicates that all files inside /opt/ml/model are removed when the image is being spun up. I suspect that requirements.txt is therefore also removed and only "recovered" later, when the model.tar.gz is unpacked. This would explain why the requirements are never installed.

Zombie process exception

Describe the bug
Getting a zombie process exception, as already reported for the sagemaker-inference-toolkit.

To reproduce
Using 763104351884.dkr.ecr.eu-central-1.amazonaws.com/pytorch-inference:2.2.0-gpu-py310-cu118-ubuntu20.04-sagemaker with a custom inference script in a batch transform triggers this error. Even a simple initial time.sleep(60) in the inference.py script is enough to trigger it.
A custom requirements.txt file also needs to be provided along with the custom inference script.

Here the full traceback:

Traceback (most recent call last):
  File "/usr/local/bin/dockerd-entrypoint.py", line 23, in <module>
    serving.main()
  File "/opt/conda/lib/python3.10/site-packages/sagemaker_pytorch_serving_container/serving.py", line 38, in main
    _start_torchserve()
  File "/opt/conda/lib/python3.10/site-packages/retrying.py", line 56, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/opt/conda/lib/python3.10/site-packages/retrying.py", line 257, in call
    return attempt.get(self._wrap_exception)
  File "/opt/conda/lib/python3.10/site-packages/retrying.py", line 301, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/opt/conda/lib/python3.10/site-packages/six.py", line 719, in reraise
    raise value
  File "/opt/conda/lib/python3.10/site-packages/retrying.py", line 251, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/opt/conda/lib/python3.10/site-packages/sagemaker_pytorch_serving_container/serving.py", line 34, in _start_torchserve
    torchserve.start_torchserve(handler_service=HANDLER_SERVICE)
  File "/opt/conda/lib/python3.10/site-packages/sagemaker_pytorch_serving_container/torchserve.py", line 102, in start_torchserve
    ts_process = _retrieve_ts_server_process()
  File "/opt/conda/lib/python3.10/site-packages/retrying.py", line 56, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/opt/conda/lib/python3.10/site-packages/retrying.py", line 266, in call
    raise attempt.get()
  File "/opt/conda/lib/python3.10/site-packages/retrying.py", line 301, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/opt/conda/lib/python3.10/site-packages/six.py", line 719, in reraise
    raise value
  File "/opt/conda/lib/python3.10/site-packages/retrying.py", line 251, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/opt/conda/lib/python3.10/site-packages/sagemaker_pytorch_serving_container/torchserve.py", line 187, in _retrieve_ts_server_process
    if TS_NAMESPACE in process.cmdline():
  File "/opt/conda/lib/python3.10/site-packages/psutil/__init__.py", line 719, in cmdline
    return self._proc.cmdline()
  File "/opt/conda/lib/python3.10/site-packages/psutil/_pslinux.py", line 1714, in wrapper
    return fun(self, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/psutil/_pslinux.py", line 1853, in cmdline
    self._raise_if_zombie()
  File "/opt/conda/lib/python3.10/site-packages/psutil/_pslinux.py", line 1758, in _raise_if_zombie
    raise ZombieProcess(self.pid, self._name, self._ppid)
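
A possible mitigation, sketched below, would be to skip processes that disappear or become zombies while the toolkit scans for the TorchServe server process; this is just an illustration, not the project's actual fix:

# Sketch: tolerate zombie/vanished processes when searching for TorchServe.
import psutil

TS_NAMESPACE = "org.pytorch.serve.ModelServer"  # namespace the toolkit searches for

def retrieve_ts_server_process():
    for process in psutil.process_iter():
        try:
            if TS_NAMESPACE in process.cmdline():
                return process
        except (psutil.ZombieProcess, psutil.NoSuchProcess, psutil.AccessDenied):
            # Processes can exit or become zombies mid-scan; skip them
            # instead of aborting container startup.
            continue
    return None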

System information
A description of your system. Please provide:

  • Sagemaker model image: 763104351884.dkr.ecr.eu-central-1.amazonaws.com/pytorch-inference:2.2.0-gpu-py310-cu118-ubuntu20.04-sagemaker
  • Sagemaker model mode: single-mode
  • Batch-transform instance type: ml.g4dn.2xlarge
  • Batch-transform Invocation timeout in seconds: 600

Prepend `code_dir` to `sys.path` rather than `append`

Describe the bug
The bug is that the semantics of running python scripts prioritize the directory containing the script by prepending to sys.path. To have similar semantics, we should prepend code_dir to sys.path rather than append here:

if (not self._initialized) and ENABLE_MULTI_MODEL:
    code_dir = os.path.join(context.system_properties.get("model_dir"), 'code')
    sys.path.append(code_dir)
    self._initialized = True
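
A minimal sketch of the proposed change to the excerpt above (same code, with insert instead of append; not runnable on its own):

if (not self._initialized) and ENABLE_MULTI_MODEL:
    code_dir = os.path.join(context.system_properties.get("model_dir"), 'code')
    # Prepend so user code in code_dir wins over installed packages,
    # matching the semantics of running a script directly.
    sys.path.insert(0, code_dir)
    self._initialized = True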

Here's a quick example showing the prepend semantics when running a script from the command line.

  1. First, put this in a file './code/myscript.py' in a local shell environment:
import sys
print(sys.path)
  2. Then run it:
$ pwd
/home/ubuntu
$ python3 ./code/myscript.py
['/home/ubuntu/code', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '/usr/local/lib/python3.8/dist-packages',  '/usr/lib/python3/dist-packages']

The current appending behavior would cause an issue for a customer who put a filename in code_dir that clashed with an installed package. If the customer ran their inference script locally, it would load their file due to prepend semantics, but when deploying to MME with this toolkit's handler, it would prioritize the installed package instead.

The single-model endpoint case already prepends:

def _set_python_path():
    # Torchserve handles code execution by appending the export path, provided
    # to the model archiver, to the PYTHONPATH env var.
    # The code_dir has to be added to the PYTHONPATH otherwise the
    # user provided module can not be imported properly.
    if PYTHON_PATH_ENV in os.environ:
        os.environ[PYTHON_PATH_ENV] = "{}:{}".format(environment.code_dir, os.environ[PYTHON_PATH_ENV])
    else:
        os.environ[PYTHON_PATH_ENV] = environment.code_dir

Other sagemaker inference toolkits already prepend, as well. See how sagemaker-huggingface-inference-toolkit handles this (https://github.com/aws/sagemaker-huggingface-inference-toolkit/blob/2f1fae5cbb3b68299e73cc591c0a912b7cccee29/src/sagemaker_huggingface_inference_toolkit/handler_service.py#L72-L73), as well as how sagemaker-inference-toolkit and sagemaker-mxnet-inference-toolkit try to handle this (though they have their own bug in this part of the code--see aws/sagemaker-mxnet-inference-toolkit#135).

Improve debuggability during model load and inference failures

Describe the feature you'd like
Enable logging errors with traceback during model load and inference to help with debugging.

Current implementation:

except Exception as e:  # pylint: disable=broad-except
    trace = traceback.format_exc()
    if isinstance(e, BaseInferenceToolkitError):
        return self.handle_error(context, e, trace)
    else:
        return self.handle_error(
            context,
            GenericInferenceToolkitError(http_client.INTERNAL_SERVER_ERROR, str(e)),
            trace,
        )
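
A self-contained sketch of the requested behavior (the helper name and wiring are hypothetical, not the toolkit's API):

# Log the traceback before turning the failure into an error response, so it
# shows up in CloudWatch even when the endpoint itself cannot be reached.
import logging
import traceback

logger = logging.getLogger(__name__)

def call_with_error_logging(fn, *args, **kwargs):
    try:
        return fn(*args, **kwargs)
    except Exception as e:  # pylint: disable=broad-except
        trace = traceback.format_exc()
        # The key addition requested in this issue: surface the traceback.
        logger.error("Model load/inference failed: %s\n%s", e, trace)
        raise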

How would this feature be used? Please describe.
Errors during model loading and inference will be logged.

Describe alternatives you've considered
N/A

Additional context
This is useful in scenarios where there's no direct access to an endpoint where a model is deployed, for ex: Sagemaker Endpoint, where we only have access to logs.

Batch Inference does not work when using the default handler

Describe the bug

  1. In batch inference, the model-server (in this case, torchserve) will return a 'batch' i.e list of requests to the handler. The handler is expected to process them and send back the responses. This would be a list of 'batch-size' responses.
  2. Currently, the PyTorch toolkit uses the transform() function from the base inference-toolkit to receive requests from the model server and process them by calling _transform_fn() (which in turn calls _input_fn, _predict_fn, and _output_fn).
  3. However, it seems to only process the 'first' request in the batch: https://github.com/aws/sagemaker-inference-toolkit/blob/master/src/sagemaker_inference/transformer.py#L114
  4. When using the default handler, all but the first request get dropped.
  5. This restricts using the default handler for batch inference.

To reproduce
A clear, step-by-step set of instructions to reproduce the bug:

  1. A run-through of this notebook: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/pytorch_batch_inference/sagemaker_batch_inference_torchserve.ipynb results in the failure (log attached in respective section).
    [This notebook has a workaround PR using a custom container: https://github.com/aws/amazon-sagemaker-examples/pull/3395]

Expected behavior

  1. The transform() function should process all requests in a batch and return a list of responses equal in size to the input list (see the sketch after this list).
  2. Attaching an untested suggestion in the form of a PR: link
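
A rough, untested sketch of what a batch-aware transform() could look like (names follow the toolkit loosely and are assumptions):

# Handle every request in the TorchServe batch and return one response per
# request, so that len(responses) == len(data).
def transform(self, data, context):
    responses = []
    for record in data:
        # Each record in a TorchServe batch carries its payload under "body".
        input_data = record.get("body")
        # Content types are hard-coded for brevity; the real toolkit reads
        # them from the request headers via the context object.
        model_input = self._input_fn(input_data, "application/json")
        prediction = self._predict_fn(model_input, self._model)
        responses.append(self._output_fn(prediction, "application/json"))
    return responses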

Screenshots or logs

  1. A run of the notebook results in:
2022-04-27T20:51:36,026 [INFO ] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1651092696026
2022-04-27T20:51:36,028 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Backend received inference at: 1651092696
2022-04-27T20:51:36,028 [WARN ] W-9000-model_1.0-stderr MODEL_LOG - Downloading: 100%|██████████| 28.0/28.0 [00:00<00:00, 42.8kB/s]
2022-04-27T20:51:36,028 [WARN ] W-9000-model_1.0-stderr MODEL_LOG - Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
2022-04-27T20:51:36,119 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - INPUT1
2022-04-27T20:51:36,120 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - INPUT2
2022-04-27T20:51:36,120 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Got input Data: {Bloomberg has decided to publish a new report on global economic situation.}
2022-04-27T20:51:36,120 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - PRED SequenceClassifierOutput(loss=None, logits=tensor([[ 0.1999, -0.2964]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
2022-04-27T20:51:36,120 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - PREDICTION ['Not Accepted']
2022-04-27T20:51:36,120 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - model: model, number of batch response mismatched, expect: 3, got: 1.
2022-04-27T20:51:36,121 [INFO ] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 94

System information
A description of your system. Please provide:

  • Toolkit version:
  • Framework version:
  • Python version: py38
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N, DLC:


Document how to locally run the container

What did you find confusing? Please describe.

I tried to extend the image by adding my code and running it locally. However, the server does not start, and it doesn't publish any logs from our scripts.

Dockerfile:

FROM 763104351884.dkr.ecr.eu-west-1.amazonaws.com/pytorch-inference:1.10.0-cpu-py38

ENV SAGEMAKER_PROGRAM "my_amazing_entrypoint.py"
ENV SAGEMAKER_REGION "eu-west-1"
ENV SAGEMAKER_SUBMIT_DIRECTORY "/opt/ml/model/code"

WORKDIR "/opt/ml/model/"
COPY model_new.tar.gz "/opt/ml/model/model.tar.gz"
RUN tar -xf model.tar.gz

model.tar.gz:

.
| - code/
  | - my_amazing_entrypoint.py
  | - more_packages/
| - pytorch_model.pth

Commands executed:

docker build -t pytorch-test .
docker run -ti pytorch-test

Output:

Warning: TorchServe is using non-default JVM parameters: -XX:+UseContainerSupport -XX:InitialRAMPercentage=8.0 -XX:MaxRAMPercentage=10.0 -XX:-UseLargePages -XX:+UseG1GC -XX:+ExitOnOutOfMemoryError
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2022-05-25T10:26:35,124 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2022-05-25T10:26:35,219 [INFO ] main org.pytorch.serve.ModelServer - 
Torchserve version: 0.5.2
TS Home: /opt/conda/lib/python3.8/site-packages
Current directory: /opt/ml/model
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 8
Max heap size: 3166 M
Python executable: /opt/conda/bin/python3.8
Config file: /home/model-server/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://127.0.0.1:8082
Model Store: /home/model-server
Initial Models: ALL
Log dir: /opt/ml/model/logs
Metrics dir: /opt/ml/model/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 8
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /home/model-server
Model config: N/A
2022-05-25T10:26:35,225 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2022-05-25T10:26:35,247 [DEBUG] main org.pytorch.serve.ModelServer - Loading models from model store: tmp
2022-05-25T10:26:35,249 [WARN ] main org.pytorch.serve.ModelServer - Failed to load model: /home/model-server/tmp
org.pytorch.serve.archive.model.ModelNotFoundException: Model not found at: tmp
        at org.pytorch.serve.archive.model.ModelArchive.downloadModel(ModelArchive.java:75) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.ModelManager.createModelArchive(ModelManager.java:167) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.ModelManager.registerModel(ModelManager.java:133) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.ModelManager.registerModel(ModelManager.java:69) ~[model-server.jar:?]
        at org.pytorch.serve.ModelServer.initModelStore(ModelServer.java:194) [model-server.jar:?]
        at org.pytorch.serve.ModelServer.startRESTserver(ModelServer.java:356) [model-server.jar:?]
        at org.pytorch.serve.ModelServer.startAndWait(ModelServer.java:117) [model-server.jar:?]
        at org.pytorch.serve.ModelServer.main(ModelServer.java:98) [model-server.jar:?]
2022-05-25T10:26:35,264 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2022-05-25T10:26:35,325 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2022-05-25T10:26:35,325 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2022-05-25T10:26:35,327 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://0.0.0.0:8081
2022-05-25T10:26:35,327 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2022-05-25T10:26:35,328 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2022-05-25T10:26:35,588 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:d90f998d682a,timestamp:1653474395
2022-05-25T10:26:35,589 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:334.71700286865234|#Level:Host|#hostname:d90f998d682a,timestamp:1653474395
2022-05-25T10:26:35,590 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:157.27763748168945|#Level:Host|#hostname:d90f998d682a,timestamp:1653474395
2022-05-25T10:26:35,590 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:32.0|#Level:Host|#hostname:d90f998d682a,timestamp:1653474395
2022-05-25T10:26:35,590 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:27624.3125|#Level:Host|#hostname:d90f998d682a,timestamp:1653474395
2022-05-25T10:26:35,591 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:3569.87109375|#Level:Host|#hostname:d90f998d682a,timestamp:1653474395
2022-05-25T10:26:35,591 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:12.7|#Level:Host|#hostname:d90f998d682a,timestamp:1653474395

Describe how documentation can be improved

It would be nice to add a section in the README file (or similar) with an example on how to run the image / container in a local docker installation.

Additional context

  • This would improve developer experience by reducing trial-and-error turnaround time (it takes a while to deploy to SageMaker).
  • We can easily debug problems step by step.
