Comments (5)
Looking at the exception stack trace, I see that it's again something related to SageMaker itself rather than to SSH Helper. It's downloading the code from S3, most likely from the default bucket, which looks like s3://sagemaker-eu-west-1-555555555555/. Could you check that this bucket exists, that you can access it from your notebook instance (e.g. by running the aws s3 cp command from the Terminal), and that it's located in the same region as your notebook?
If the above steps don't help, please raise a support case:
https://docs.aws.amazon.com/awssupport/latest/user/case-management.html
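The bucket checks suggested above (existence, access, region) could also be scripted from the notebook. The following is only a minimal sketch, assuming boto3 is available on the instance and the role is allowed to call HeadBucket and GetBucketLocation; the function names are illustrative and not part of SageMaker or SSH Helper.

```python
from typing import Optional, Tuple


def default_bucket_name(region: str, account_id: str) -> str:
    """Build the SageMaker default bucket name for a region and account."""
    return f"sagemaker-{region}-{account_id}"


def normalize_location(location_constraint: Optional[str]) -> str:
    """Map the LocationConstraint value returned by S3 to a region name."""
    if location_constraint is None:
        return "us-east-1"  # S3 reports no constraint for us-east-1
    if location_constraint == "EU":
        return "eu-west-1"  # legacy alias that S3 may still return
    return location_constraint


def check_bucket(bucket: str, expected_region: str) -> Tuple[bool, str]:
    """Probe the bucket with the current credentials; return (ok, message)."""
    # Imported lazily so the pure helpers above work without boto3 installed.
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3", region_name=expected_region)
    try:
        s3.head_bucket(Bucket=bucket)
        location = s3.get_bucket_location(Bucket=bucket)["LocationConstraint"]
    except ClientError as err:
        code = err.response["Error"]["Code"]
        return False, f"{bucket}: request failed with error code {code}"
    region = normalize_location(location)
    if region != expected_region:
        return False, f"{bucket} is in {region}, expected {expected_region}"
    return True, f"{bucket} exists and is reachable in {region}"
```

Calling check_bucket(default_bucket_name("eu-west-1", "555555555555"), "eu-west-1") from a notebook cell would then report which of the three checks fails, if any.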
from sagemaker-ssh-helper.
Hi, @djmarti, thanks for bringing up this important observation. The issue is probably rooted in the recent changes of docker-compose: docker/compose#10797.
Please downgrade the version as a workaround:
sudo curl -L "https://github.com/docker/compose/releases/download/v2.18.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
There's nothing that we can do on the SageMaker SSH Helper side. I'll keep this issue open until SageMaker Notebooks get an update.
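To confirm that the downgrade took effect, the version reported by the binary can be compared with the pinned one. This is a rough sketch, assuming docker-compose is on the PATH and prints a line such as "Docker Compose version v2.18.1"; the helper names are hypothetical.

```python
import re
import subprocess
from typing import Tuple

PINNED = (2, 18, 1)  # version installed by the workaround above


def parse_compose_version(output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the output of `docker-compose -v`."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version found in: {output!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def installed_compose_version(binary: str = "docker-compose") -> Tuple[int, int, int]:
    """Run the binary and parse the version it reports."""
    result = subprocess.run(
        [binary, "-v"], capture_output=True, text=True, check=True
    )
    return parse_compose_version(result.stdout)
```

A check such as installed_compose_version() == PINNED then tells whether the notebook is actually picking up the downgraded binary.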
Thanks, Ivan, for your prompt response and for the workaround. I think I gave a misleading hint. I am still unable to run the notebook after downgrading docker-compose to version 2.18.1. I checked that the version of docker-compose is the expected one:
$ whereis docker-compose
docker-compose: /usr/local/bin/docker-compose
$ docker-compose -v
Docker Compose version v2.18.1
But now I get an error that smells like a permission error:
e13eeylbz4-algo-1-c64ep | 2023-11-17 01:34:50,818 sagemaker_pytorch_container.training INFO Invoking user training script.
e13eeylbz4-algo-1-c64ep | 2023-11-17 01:34:50,875 botocore.credentials INFO Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole
e13eeylbz4-algo-1-c64ep | 2023-11-17 01:34:51,010 sagemaker-training-toolkit ERROR Reporting training FAILURE
e13eeylbz4-algo-1-c64ep | 2023-11-17 01:34:51,010 sagemaker-training-toolkit ERROR Framework Error:
e13eeylbz4-algo-1-c64ep | Traceback (most recent call last):
e13eeylbz4-algo-1-c64ep | File "/opt/conda/lib/python3.9/site-packages/sagemaker_training/trainer.py", line 88, in train
e13eeylbz4-algo-1-c64ep | entrypoint()
e13eeylbz4-algo-1-c64ep | File "/opt/conda/lib/python3.9/site-packages/sagemaker_pytorch_container/training.py", line 153, in main
e13eeylbz4-algo-1-c64ep | train(environment.Environment())
e13eeylbz4-algo-1-c64ep | File "/opt/conda/lib/python3.9/site-packages/sagemaker_pytorch_container/training.py", line 100, in train
e13eeylbz4-algo-1-c64ep | entry_point.run(uri=training_environment.module_dir,
e13eeylbz4-algo-1-c64ep | File "/opt/conda/lib/python3.9/site-packages/sagemaker_training/entry_point.py", line 92, in run
e13eeylbz4-algo-1-c64ep | files.download_and_extract(uri=uri, path=environment.code_dir)
e13eeylbz4-algo-1-c64ep | File "/opt/conda/lib/python3.9/site-packages/sagemaker_training/files.py", line 138, in download_and_extract
e13eeylbz4-algo-1-c64ep | s3_download(uri, dst)
e13eeylbz4-algo-1-c64ep | File "/opt/conda/lib/python3.9/site-packages/sagemaker_training/files.py", line 174, in s3_download
e13eeylbz4-algo-1-c64ep | s3.Bucket(bucket).download_file(key, dst)
e13eeylbz4-algo-1-c64ep | File "/opt/conda/lib/python3.9/site-packages/boto3/s3/inject.py", line 277, in bucket_download_file
e13eeylbz4-algo-1-c64ep | return self.meta.client.download_file(
e13eeylbz4-algo-1-c64ep | File "/opt/conda/lib/python3.9/site-packages/boto3/s3/inject.py", line 190, in download_file
e13eeylbz4-algo-1-c64ep | return transfer.download_file(
e13eeylbz4-algo-1-c64ep | File "/opt/conda/lib/python3.9/site-packages/boto3/s3/transfer.py", line 326, in download_file
e13eeylbz4-algo-1-c64ep | future.result()
e13eeylbz4-algo-1-c64ep | File "/opt/conda/lib/python3.9/site-packages/s3transfer/futures.py", line 103, in result
e13eeylbz4-algo-1-c64ep | return self._coordinator.result()
e13eeylbz4-algo-1-c64ep | File "/opt/conda/lib/python3.9/site-packages/s3transfer/futures.py", line 266, in result
e13eeylbz4-algo-1-c64ep | raise self._exception
e13eeylbz4-algo-1-c64ep | File "/opt/conda/lib/python3.9/site-packages/s3transfer/tasks.py", line 269, in _main
e13eeylbz4-algo-1-c64ep | self._submit(transfer_future=transfer_future, **kwargs)
e13eeylbz4-algo-1-c64ep | File "/opt/conda/lib/python3.9/site-packages/s3transfer/download.py", line 354, in _submit
e13eeylbz4-algo-1-c64ep | response = client.head_object(
e13eeylbz4-algo-1-c64ep | File "/opt/conda/lib/python3.9/site-packages/botocore/client.py", line 530, in _api_call
e13eeylbz4-algo-1-c64ep | return self._make_api_call(operation_name, kwargs)
e13eeylbz4-algo-1-c64ep | File "/opt/conda/lib/python3.9/site-packages/botocore/client.py", line 960, in _make_api_call
e13eeylbz4-algo-1-c64ep | raise error_class(parsed_response, operation_name)
e13eeylbz4-algo-1-c64ep | botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
e13eeylbz4-algo-1-c64ep |
e13eeylbz4-algo-1-c64ep | An error occurred (403) when calling the HeadObject operation: Forbidden
e13eeylbz4-algo-1-c64ep | 2023-11-17 01:34:51,011 sagemaker-training-toolkit ERROR Encountered exit_code 1
A permission error is surprising because I didn't have issues before and because there haven't been any changes in my setup.
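The traceback shows that the failing call is HeadObject on the code archive. One way to narrow this down is to repeat the same call from the notebook. The sketch below assumes boto3 is installed; the helper names are made up for illustration, and the S3 URI has to be taken from the actual job. S3 answers HeadObject with 403 both when the caller has no access to the object and when the object is missing and the caller lacks s3:ListBucket, so the two cases look the same in the log.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split s3://bucket/key into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def explain_head_object_error(code: str) -> str:
    """Give a likely reading of a HeadObject error code."""
    if code == "403":
        return ("Forbidden: the role lacks access to the object, or the object "
                "is missing and the role lacks s3:ListBucket")
    if code == "404":
        return "Not found: the key does not exist in this bucket"
    return f"Unexpected error code {code}"


def probe_object(uri: str) -> str:
    """Repeat the HeadObject call with the current credentials."""
    # Imported lazily so the pure helpers above work without boto3 installed.
    import boto3
    from botocore.exceptions import ClientError

    bucket, key = split_s3_uri(uri)
    try:
        boto3.client("s3").head_object(Bucket=bucket, Key=key)
    except ClientError as err:
        return explain_head_object_error(err.response["Error"]["Code"])
    return "OK: the object is readable with the current credentials"
```

Note that the log above shows the container picking up credentials from the IAM role BaseNotebookInstanceEc2InstanceRole, so a probe run under a different role may not reproduce the container's result.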
Apologies for the long delay. I retried with the exact same code and the problem is gone, which is consistent with your suggestion that this was something related to SageMaker. Everything works as expected, closing the ticket.
I've faced a similar message with HeadObject, but it looks like the notebook instance had been running for a very long time. I stopped and started the instance again and the issue was gone.