
Comments (10)

ivan-khvostishkov commented on May 23, 2024

Hi, @KraftZzz. Thank you for raising this issue.

I have a few questions to clarify:

1/ You cannot use SSH/SSM, but do you get any prediction results?

2/ If you don't see any logs in CloudWatch, you probably have misconfigured permissions (no access to CloudWatch). Can you try one of the SageMaker examples, e.g. Deploy a pretrained PyTorch BERT model from Hugging Face, and confirm that it works in your environment and that you can see its logs? (A quick way to check CloudWatch access is sketched below.)

If you have the same issue even without SageMaker SSH Helper, you might need to reach out to AWS Support.
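For example, to quickly verify CloudWatch access from your notebook, you can list the endpoint's log streams directly. A minimal sketch, assuming the default /aws/sagemaker/Endpoints/<endpoint-name> log group that SageMaker uses for endpoints (endpoint_name is a placeholder for your endpoint's name):

import boto3

logs = boto3.client("logs")
endpoint_name = "your-endpoint-name"  # placeholder: use the name from the SageMaker console

# SageMaker endpoints write their container logs to this log group by default
streams = logs.describe_log_streams(
    logGroupName=f"/aws/sagemaker/Endpoints/{endpoint_name}"
)
for stream in streams["logStreams"]:
    print(stream["logStreamName"])

If this call fails with an access-denied error, the role you are using is missing CloudWatch Logs permissions.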


KraftZzz commented on May 23, 2024
  1. I can get the prediction results, and the endpoint is in InService status.
  2. I am sure the permissions are fine: I can see the MMS (multi-model-server) launch log, but there is no SSM info or anything from ssh-helper. I also checked the example you provided (Deploy a pretrained PyTorch BERT model from Hugging Face); it uses a PyTorch model and is single-model, not multi-model.
  3. I am following this example: https://github.com/huggingface/notebooks/tree/main/sagemaker/17_custom_inference_script. Could you please help me confirm whether ssh-helper works with this example?


ivan-khvostishkov commented on May 23, 2024

OK, got it. Instead of the dependencies parameter, could you try adding a requirements.txt to code/?

In the requirements, add SageMaker SSH Helper:

sagemaker-ssh-helper

I've tried your example, and it works for me with this approach.
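For reference, the resulting model.tar.gz would look roughly like this (a sketch assuming the standard code/ layout that the Hugging Face inference toolkit expects):

model.tar.gz
├── pytorch_model.bin (and the other model files)
└── code/
    ├── inference.py
    └── requirements.txt   (a single line: sagemaker-ssh-helper)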

As a side comment, all the examples, including your code, are single-model endpoints; the "multi-model-server" name is somewhat confusing. If you really want to deploy a multi-model endpoint, you will need to use MultiDataModel and SSHMultiModelWrapper; see the FAQ for more details and the sketch below.
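A rough sketch of what that would look like, mirroring the single-model pattern from my steps below (the name, S3 prefix, and exact parameters here are illustrative assumptions; the FAQ has the authoritative usage):

from sagemaker.multidatamodel import MultiDataModel
from sagemaker_ssh_helper.wrapper import SSHMultiModelWrapper

# Group the model archives under a common S3 prefix (hypothetical bucket and prefix)
mdm = MultiDataModel(
    name="my-multi-model-endpoint",              # hypothetical model name
    model_data_prefix="s3://my-bucket/models/",  # hypothetical prefix holding model.tar.gz files
    model=huggingface_model,
)

# Wrap the multi-model container instead of the single model
ssh_wrapper = SSHMultiModelWrapper.create(mdm, connection_wait_time_seconds=0)

predictor = mdm.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)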


KraftZzz commented on May 23, 2024

You mean adding this code to inference.py, right?

import os
import sys
sys.path.append(os.path.join(os.path.dirname(__file__), "lib"))

import sagemaker_ssh_helper
sagemaker_ssh_helper.setup_and_start_ssh()
I mention MMS because I see the following information in the endpoint log:

Warning: MMS is using non-default JVM parameters: -XX:-UseContainerSupport

2023-04-26T04:35:25,060 [INFO ] main com.amazonaws.ml.mms.ModelServer -
MMS Home: /opt/conda/lib/python3.8/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 1
Number of CPUs: 4
Max heap size: 3500 M
Python executable: /opt/conda/bin/python3.8
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8080
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: null
Metrics dir: null
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Preload model: false
Prefer direct buffer: false
2023-04-26T04:35:25,118 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-model
2023-04-26T04:35:25,179 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /home/model-server/tmp/.mms.sock.9000 --handler sagemaker_huggingface_inference_toolkit.handler_service --model-path /.sagemaker/mms/models/model --model-name model --preload-model false --tmp-dir /home/model-server/tmp
2023-04-26T04:35:25,180 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.mms.sock.9000
2023-04-26T04:35:25,180 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 72
2023-04-26T04:35:25,180 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started.
2023-04-26T04:35:25,180 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.8.10
2023-04-26T04:35:25,181 [INFO ] main com.amazonaws.ml.mms.wlm.ModelManager - Model model loaded.
2023-04-26T04:35:25,187 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-04-26T04:35:25,199 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000
2023-04-26T04:35:25,256 [INFO ] main com.amazonaws.ml.mms.ModelServer - Inference API bind to: http://0.0.0.0:8080
Model server started.


KraftZzz commented on May 23, 2024

There are no sagemaker-ssh-helper logs in the endpoint's CloudWatch log, so I ran:

instance_ids = ssh_wrapper.get_instance_ids()
print(f'To connect over SSM run: aws ssm start-session --target {instance_ids[0]} --region {sess.boto_region_name}')

and there was no output.


KraftZzz commented on May 23, 2024

Could you please share your steps?


ivan-khvostishkov commented on May 23, 2024

My steps are the following:

1/ Added to inference.py the following lines:

+import os
+import sys
+sys.path.append(os.path.join(os.path.dirname(__file__), "lib"))
+
+import sagemaker_ssh_helper
+sagemaker_ssh_helper.setup_and_start_ssh()
+
+
 from transformers import AutoTokenizer, AutoModel
 import torch
 import torch.nn.functional as F

2/ Modified sagemaker/17_custom_inference_script/sagemaker-notebook.ipynb and executed the following cell:

from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker_ssh_helper.wrapper import SSHModelWrapper  # <--NEW--


# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data=s3_location,       # path to your model and script
   role=role,                    # IAM role with permissions to create an endpoint
   transformers_version="4.26",  # transformers version used
   pytorch_version="1.13",       # pytorch version used
   py_version="py39",            # python version used
)

ssh_wrapper = SSHModelWrapper.create(huggingface_model, connection_wait_time_seconds=0)  # <--NEW--


# deploy the endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge"
    )

After the endpoint had been deployed, I was able to fetch the instance IDs:

ssh_wrapper.get_instance_ids()
INFO:sagemaker-ssh-helper:Querying SSM instance IDs for endpoint huggingface-pytorch-inference-2023-04-24-17-00-23-155
INFO:sagemaker-ssh-helper:Got preliminary SSM instance IDs: ['mi-01234567890abcd00']
INFO:sagemaker-ssh-helper:Got final SSM instance IDs: ['mi-01234567890abcd00']

['mi-01234567890abcd00']
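
With that ID you can then open the session with the command printed earlier, e.g.:

aws ssm start-session --target mi-01234567890abcd00 --region <your-region>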


KraftZzz commented on May 23, 2024

Oh, to my confusion, I managed to get the mi-xxxx ID in one of my experiments yesterday, but I didn't modify any code...


KraftZzz commented on May 23, 2024

Thanks for sharing!


ivan-khvostishkov commented on May 23, 2024

You're welcome! Let me know if you managed to make your code work, so we can close this issue.

