Comments (10)
Hi, @KraftZzz. Thank you for raising this issue.
I have a few questions to clarify:
1/ You cannot connect over SSH / SSM, but do you get any prediction results?
2/ If you don't see any logs in CloudWatch, you probably have misconfigured permissions (no access to CloudWatch). Can you try any of the SageMaker examples, e.g., Deploy a pretrained PyTorch BERT model from Hugging Face, and confirm that this example works in your environment and that you can see its logs?
If you have the same issue even when you don't use SageMaker SSH Helper, you might need to reach out to AWS Support.
from sagemaker-ssh-helper.
- I can get the prediction results, and the endpoint is in InService status.
- I am sure the permissions are fine: I can see the MMS (multi-model-server) launch log, but no SSM info or any output from SSH Helper. I also checked the example you provided (Deploy a pretrained PyTorch BERT model from Hugging Face); it uses a PyTorch model and is a single model, not multi-model.
- I am following this example: https://github.com/huggingface/notebooks/tree/main/sagemaker/17_custom_inference_script. Could you please help me confirm whether SSH Helper works with this example?
OK, got it. Instead of the dependencies parameter, could you try adding a requirements.txt to the code/ directory?
In the requirements, add SageMaker SSH Helper:
sagemaker-ssh-helper
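For reference, the packaging step above can be sketched like this (all paths are illustrative, assuming the standard Hugging Face inference container layout, where code/requirements.txt inside model.tar.gz is installed at model startup):

```shell
# Sketch: place requirements.txt under code/ inside model.tar.gz,
# next to inference.py, instead of passing the dependencies parameter.
mkdir -p model/code
printf 'sagemaker-ssh-helper\n' > model/code/requirements.txt
touch model/code/inference.py    # your existing inference script goes here
tar -czf model.tar.gz -C model .
tar -tzf model.tar.gz            # should list code/requirements.txt
```

You would then upload this model.tar.gz to S3 and pass its location as model_data.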
I've tried your example, and it works for me with this approach.
As a side comment, all examples, including your code, are single-model endpoints. The "multi-model-server" name is somewhat confusing. If you really want to deploy a multi-model endpoint, you will need to use MultiDataModel and SSHMultiModelWrapper. See the FAQ for more details.
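As a rough, hedged sketch of what that might look like (the bucket prefix, endpoint name, and instance type here are illustrative, not taken from this thread; requires an AWS environment to actually run):

```python
from sagemaker.multidatamodel import MultiDataModel
from sagemaker_ssh_helper.wrapper import SSHMultiModelWrapper

# Wrap the single-model object in a MultiDataModel; model_data_prefix
# is an S3 prefix holding one model .tar.gz per hosted model (illustrative).
mdm = MultiDataModel(
    name="my-multi-model",                       # illustrative name
    model_data_prefix="s3://my-bucket/models/",  # illustrative S3 prefix
    model=huggingface_model,                     # the HuggingFaceModel from above
)

# Attach SSH Helper to the multi-model endpoint instead of SSHModelWrapper
ssh_wrapper = SSHMultiModelWrapper.create(mdm, connection_wait_time_seconds=0)

predictor = mdm.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)
```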
You mean adding this code:
import os
import sys
sys.path.append(os.path.join(os.path.dirname(__file__), "lib"))
import sagemaker_ssh_helper
sagemaker_ssh_helper.setup_and_start_ssh()
to inference.py, right?
I mention MMS because I see the following information in the endpoint log:
Warning: MMS is using non-default JVM parameters: -XX:-UseContainerSupport
2023-04-26T04:35:25,060 [INFO ] main com.amazonaws.ml.mms.ModelServer -
MMS Home: /opt/conda/lib/python3.8/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 1
Number of CPUs: 4
Max heap size: 3500 M
Python executable: /opt/conda/bin/python3.8
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8080
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: null
Metrics dir: null
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Preload model: false
Prefer direct buffer: false
2023-04-26T04:35:25,118 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-model
2023-04-26T04:35:25,179 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /home/model-server/tmp/.mms.sock.9000 --handler sagemaker_huggingface_inference_toolkit.handler_service --model-path /.sagemaker/mms/models/model --model-name model --preload-model false --tmp-dir /home/model-server/tmp
2023-04-26T04:35:25,180 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.mms.sock.9000
2023-04-26T04:35:25,180 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 72
2023-04-26T04:35:25,180 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started.
2023-04-26T04:35:25,180 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.8.10
2023-04-26T04:35:25,181 [INFO ] main com.amazonaws.ml.mms.wlm.ModelManager - Model model loaded.
2023-04-26T04:35:25,187 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-04-26T04:35:25,199 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000
2023-04-26T04:35:25,256 [INFO ] main com.amazonaws.ml.mms.ModelServer - Inference API bind to: http://0.0.0.0:8080
Model server started.
There is no sagemaker-ssh-helper output at all in the endpoint CloudWatch log, so I ran:
instance_ids = ssh_wrapper.get_instance_ids()
print(f'To connect over SSM run: aws ssm start-session --target {instance_ids[0]} --region {sess.boto_region_name}')
but got no output.
Could you please share your steps?
My steps are the following:
1/ Added to inference.py the following lines:
+import os
+import sys
+sys.path.append(os.path.join(os.path.dirname(__file__), "lib"))
+
+import sagemaker_ssh_helper
+sagemaker_ssh_helper.setup_and_start_ssh()
+
+
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F
2/ Modified sagemaker/17_custom_inference_script/sagemaker-notebook.ipynb and executed the following cell:
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker_ssh_helper.wrapper import SSHModelWrapper  # <--NEW--

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=s3_location,       # path to your model and script
    role=role,                    # iam role with permissions to create an Endpoint
    transformers_version="4.26",  # transformers version used
    pytorch_version="1.13",       # pytorch version used
    py_version='py39',            # python version used
)

ssh_wrapper = SSHModelWrapper.create(huggingface_model, connection_wait_time_seconds=0)  # <--NEW--

# deploy the endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge"
)
After the endpoint had been deployed, I was able to fetch the instance IDs:
ssh_wrapper.get_instance_ids()
INFO:sagemaker-ssh-helper:Querying SSM instance IDs for endpoint huggingface-pytorch-inference-2023-04-24-17-00-23-155
INFO:sagemaker-ssh-helper:Got preliminary SSM instance IDs: ['mi-01234567890abcd00']
INFO:sagemaker-ssh-helper:Got final SSM instance IDs: ['mi-01234567890abcd00']
['mi-01234567890abcd00']
Oh, confusingly, I managed to get the mi-xxxx instance ID in one of my experiments yesterday, but I didn't modify any code...
Thanks for sharing.
You're welcome! Let me know if you managed to make your code work, so we can close this issue.