Comments (6)
Hi @harish-kamath, very nice that you like this library! I have a few questions:
1/ When you say "InternalServerError: We encountered an internal error. Please try again", in which log exactly do you see this message? Is it in CloudWatch?
2/ Do you know which process generates this message, e.g., is there any prefix in front of this line?
3/ Which SageMaker component are you connecting to, e.g., SageMaker Training, Studio, or Inference?
from sagemaker-ssh-helper.
Upon further digging, I'm not sure the credentials are actually the (only) cause.
I noticed that the SSM agent first pulls AWS credentials from the environment variables, so I tried including an explicit AWS access key and secret key in my training job. The logs still show the credentials being refreshed, but the machine no longer crashes ~1 minute after the refresh. However, it now crashes at other times (even when nothing is running, so there is no chance that it's a resource issue).
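For anyone reproducing this, a minimal sketch for checking which credential-related environment variables are visible inside the training container. The variable names are the standard AWS ones; whether the SSM agent reads exactly this set, and in which order, is an assumption here. `credential_sources` is a hypothetical helper name, not part of sagemaker-ssh-helper. It reports only presence and never prints the secret values.

```python
import os

# Standard AWS credential variables. Which of these the SSM agent
# actually consults is an assumption, not something confirmed in this thread.
CREDENTIAL_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI",
)


def credential_sources(env=None):
    """Return {name: bool} for each variable: True if set and non-empty.

    Only presence is reported, so the output is safe to paste into logs.
    """
    env = os.environ if env is None else env
    return {name: bool(env.get(name)) for name in CREDENTIAL_VARS}


if __name__ == "__main__":
    for name, present in credential_sources().items():
        print(f"{name}: {'set' if present else 'not set'}")
```

Running this at the start of the training script, once with and once without the explicit keys, would show whether the container sees static keys, the container credentials URI, or both.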
And here is the last CloudWatch log:
(Note that in this case I did not run sm-wait stop, but it doesn't actually matter for this error. It occurs even if I do.)
- There's no prefix or process, unfortunately. Since it just crashes the machine and there's no persistent storage, I'm not sure how I can debug after a crash either.
- SageMaker Training jobs
On the bright side, it no longer always crashes after 30 minutes of being connected. However, it is still crashing within an hour.
Never mind, I just got another crash in under 30 minutes.
I'm pretty sure the cause is still this package, because connecting over plain SSH is still fine and never causes a crash.