
sagemaker-studio-apps-lifecycle-config-examples's Introduction

SageMaker Studio Lifecycle Configuration examples

Overview

A collection of sample scripts customizing SageMaker Studio applications using lifecycle configurations.

Lifecycle Configurations (LCCs) provide a mechanism to customize SageMaker Studio applications via shell scripts that are executed at application bootstrap. For further information on how to use lifecycle configurations with SageMaker Studio applications, refer to the AWS documentation.

Warning: The sample scripts in this repository are designed to work with SageMaker Studio JupyterLab and Code Editor applications. If you are using SageMaker Studio Classic, please refer to https://github.com/aws-samples/sagemaker-studio-lifecycle-config-examples
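Each LCC is registered for one app type, so a script intended for both JupyterLab and Code Editor is registered once per app type. The sketch below builds the request parameters for the SageMaker `CreateStudioLifecycleConfig` API, which expects the script body base64-encoded. It is a minimal sketch, assuming boto3 is used to send the requests; the `build_lcc_requests` helper and the `<prefix>-<apptype>` naming scheme are hypothetical. After creation, each config still has to be attached to the domain or user profile settings of the matching app.

```python
import base64

# The two app types targeted by the scripts in this repository.
APP_TYPES = ("JupyterLab", "CodeEditor")


def build_lcc_requests(script: str, name_prefix: str) -> list[dict]:
    """Build one CreateStudioLifecycleConfig request per app type.

    Hypothetical helper: the naming scheme is for illustration only.
    """
    # The API expects the shell script content base64-encoded.
    content = base64.b64encode(script.encode("utf-8")).decode("ascii")
    return [
        {
            "StudioLifecycleConfigName": f"{name_prefix}-{app_type.lower()}",
            "StudioLifecycleConfigContent": content,
            "StudioLifecycleConfigAppType": app_type,
        }
        for app_type in APP_TYPES
    ]


if __name__ == "__main__":
    script = "#!/bin/bash\nset -eux\necho 'hello from LCC'\n"
    for request in build_lcc_requests(script, "ebs-s3-backup-restore"):
        print(request["StudioLifecycleConfigName"], request["StudioLifecycleConfigAppType"])
        # With AWS credentials configured, each request could be sent with:
        #   boto3.client("sagemaker").create_studio_lifecycle_config(**request)
```

Building the requests separately from sending them keeps the encoding logic testable without AWS credentials.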

Sample Scripts

  • auto-stop-idle (JupyterLab) - Automatically shuts down JupyterLab applications that have been idle for a configurable time.
  • auto-stop-idle (Code Editor) - Automatically shuts down Code Editor applications that have been idle for a configurable time.
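The core decision made by an idle checker can be sketched as below. This is a minimal sketch, assuming the checker knows the time of the last user activity and an idle threshold in seconds (the issue logs further down pass `--idle-time 300` and `--time 300`); `should_shut_down` is a hypothetical name and not the repository's implementation, which may track activity through sessions, terminals, or a state file.

```python
from datetime import datetime, timedelta, timezone


def should_shut_down(last_activity: datetime, now: datetime, idle_seconds: int) -> bool:
    """Return True if the app has been idle for at least `idle_seconds`.

    Hypothetical helper illustrating the threshold check only.
    """
    if idle_seconds <= 0:
        raise ValueError("idle_seconds must be positive")
    return now - last_activity >= timedelta(seconds=idle_seconds)


if __name__ == "__main__":
    now = datetime(2024, 3, 28, 15, 20, tzinfo=timezone.utc)
    # Last activity 6 minutes ago with a 300-second threshold: shut down.
    print(should_shut_down(now - timedelta(minutes=6), now, 300))  # True
    # Last activity 2 minutes ago: keep running.
    print(should_shut_down(now - timedelta(minutes=2), now, 300))  # False
```

Because the checks are scheduled with cron (the logs below show a `*/2 * * * *` entry), an idle app can keep running for up to one extra check interval after the threshold is crossed.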

Common scripts

These scripts work with both SageMaker JupyterLab and SageMaker Code Editor apps. Note that to make a script available in both apps, you need to attach it as an LCC script to each app separately.

  • ebs-s3-backup-restore - This script backs up the content of a user space's EBS volume (the user's home directory under /home/sagemaker-user) to an S3 bucket specified in the script, optionally on a schedule. If the user profile is tagged with an SM_EBS_RESTORE_TIMESTAMP tag, the script also restores the backed-up files into the user's home directory, in addition to performing backups.
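The restore-then-backup flow described above can be sketched as a planner that reads the user profile's tags and produces `aws s3 sync` commands. This is a minimal sketch, assuming tags arrive in the `[{"Key": ..., "Value": ...}]` shape returned by the SageMaker `ListTags` API; the `s3://<bucket>/<timestamp>/` layout, the `backup_id` parameter, and both helper names are hypothetical and may differ from what the actual script does.

```python
from typing import Optional

RESTORE_TAG = "SM_EBS_RESTORE_TIMESTAMP"
HOME_DIR = "/home/sagemaker-user"


def restore_timestamp(tags: list[dict]) -> Optional[str]:
    """Return the restore timestamp from ListTags-style tags, if present."""
    for tag in tags:
        if tag.get("Key") == RESTORE_TAG:
            return tag.get("Value")
    return None


def plan_commands(tags: list[dict], bucket: str, backup_id: str) -> list[list[str]]:
    """Plan `aws s3 sync` commands: restore first (if tagged), then back up.

    Hypothetical layout: each backup lives under s3://<bucket>/<id>/.
    """
    commands = []
    ts = restore_timestamp(tags)
    if ts:
        # Copy a previous backup back into the home directory.
        commands.append(["aws", "s3", "sync", f"s3://{bucket}/{ts}/", HOME_DIR])
    # Always back up the current home directory.
    commands.append(["aws", "s3", "sync", HOME_DIR, f"s3://{bucket}/{backup_id}/"])
    return commands


if __name__ == "__main__":
    tags = [{"Key": RESTORE_TAG, "Value": "2024-03-28-15-00"}]
    for cmd in plan_commands(tags, "my-backup-bucket", "2024-03-29-10-00"):
        print(" ".join(cmd))
```

Restoring before backing up avoids overwriting the requested snapshot with the (possibly empty) current home directory.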

Developing LCCs for SageMaker Studio applications

For best practices, please check DEVELOPMENT.

License

This project is licensed under the MIT-0 License.

Authors

Giuseppe A. Porcelli - Principal, ML Specialist Solutions Architect - Amazon SageMaker
Spencer Ng - Software Development Engineer - Amazon SageMaker
Durga Sury - Senior ML Specialist Solutions Architect - Amazon SageMaker

sagemaker-studio-apps-lifecycle-config-examples's People

Contributors

amazon-auto, aws-spenceng, durgasury, giuseppeporcelli, lavaraja


sagemaker-studio-apps-lifecycle-config-examples's Issues

auto-stop-idle scripts sometimes not working.

I've tried these scripts on both Code Editor and JupyterLab apps. Yesterday they were working. Today, the scripts appear to start, as indicated by the last entry in each LifecycleConfigOnStart log:

JupyterLab/default/LifecycleConfigOnStart

+ echo */2 * * * * /bin/bash -ic '/opt/conda/bin/python /var/tmp/auto-stop-idle/sagemaker_studio_jlab_auto_stop_idle/auto_stop_idle.py --idle-time 300 --hostname 0.0.0.0 --port 8888 --base-url /jupyterlab/default/ --ignore-connections True --skip-terminals False --state-file-path /var/tmp/auto-stop-idle/auto_stop_idle.st >> /var/log/apps/app_container.log'

/CodeEditor/default/LifecycleConfigOnStart

echo */2 * * * * /bin/bash -ic '/opt/conda/bin/python /var/tmp/auto-stop-idle/sagemaker_code_editor_auto_shut_down/auto_stop_idle.py --time 300 --region eu-central-1 >> /var/log/apps/app_container.log'

However, the last few JupyterLab/default log entries only show:

[I 2024-03-28 15:13:50.675 ServerApp] Client connected. ID: b897bf3a043b44db9b39988623623653
[I 2024-03-28 15:13:54.440 ServerApp] Client disconnected. ID: b897bf3a043b44db9b39988623623653

These lines repeat several times with different IDs, instead of the usual entries that track the time of the last activity in JupyterLab.

The /CodeEditor/default logs weren't shown at all yesterday. Funnily enough, the script was working as intended.
Today, however, they are showing, just not outputting anything useful. And the script is also not working.

2024-03-28 15:05:53,228 INFO success: codeeditorserver entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
[15:05:53] Extension host agent started.
[15:06:26] [169.255.255.1][ad61f395][ManagementConnection] New connection established.
[15:06:26] [169.255.255.1][dabd5c36][ExtensionHostConnection] New connection established.
[15:06:27] [169.255.255.1][dabd5c36][ExtensionHostConnection] <310> Launched Extension Host Process.

Have you experienced something like this? I find it odd that it's suddenly behaving like this for no apparent reason.


UPDATE

Apparently this might have to do with a new SageMaker Distribution image. Yesterday, the latest SageMaker Distribution image was 1.5; today, 1.6 seems to be the newest one. I tried the 1.5 version and it worked again, then the 1.6 version and it didn't. This might be worth investigating to make sure the LCC works with the newest SageMaker Distribution images.

Remove state file on shutdown?

I ran into a situation where I tried to relaunch a space after the idle checker shut down the app. Because the state file already existed, the LCC kept triggering SIGTERM before the app could even start. Should we remove the state file on shutdown so that every launch starts fresh?
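The proposal in this issue can be sketched as a small cleanup step that deletes a leftover state file. This is a minimal sketch, assuming the state file lives at the path shown in the JupyterLab cron entry above (`/var/tmp/auto-stop-idle/auto_stop_idle.st`) and that deleting it is enough to reset the idle timer; `reset_state_file` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Path taken from the --state-file-path argument in the JupyterLab log above.
STATE_FILE = Path("/var/tmp/auto-stop-idle/auto_stop_idle.st")


def reset_state_file(path: Path = STATE_FILE) -> bool:
    """Delete a leftover state file so a relaunched app starts with a clean idle timer.

    Returns True if a stale file was removed, False if none existed.
    Hypothetical helper for illustration.
    """
    try:
        path.unlink()
        return True
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    removed = reset_state_file()
    print("removed stale state file" if removed else "no state file to remove")
```

Running this at launch, before the cron job is installed, may be more robust than cleaning up on shutdown, because a shutdown hook is not guaranteed to run if the app is terminated abruptly.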

Monitor SageMaker Studio Apps with CloudWatch Agent

Is it possible to install the CloudWatch agent on SageMaker Studio apps to monitor their usage over time by exporting metrics to a custom namespace, similar to the approach used here for SageMaker Notebook Instances? I have run into an issue where the agent does not appear to run on Studio apps, even when I install and configure it manually from the terminal.


CodeEditor Auto Shutdown

I assume this script does not work on Code Editor space apps?

So my question / feature request is: could you provide an equivalent example to auto shut down Code Editor apps, or point me to an example of how to do this?

Thanks

Laurence

Studio Misconfigured After trying the LCC

Hello,

After I tried this LCC, my Studio began misbehaving. I cannot tell for certain whether this is an issue with the LCC or another issue with the new Studio experience.

After I tried to apply the LCC, I can no longer see the instance types or images in the new Studio Spaces.


I see in CloudTrail an error that says:

ValidationException - "1 validation error detected: Value null at 'spaceSettings.codeEditorAppSettings.defaultResourceSpec.ec2InstanceType' failed to satisfy constraint: Member must not be null"

If I describe the space, I see these settings, which do not contain the ec2InstanceType parameter:

"SpaceSettings": { "CodeEditorAppSettings": { "DefaultResourceSpec": { "SageMakerImageArn": "arn:aws:sagemaker:us-east-1:885854791233:image/sagemaker-distribution-cpu", "SageMakerImageVersionAlias": "1.3.0", "InstanceType": "ml.t3.medium" } }

Any ideas on how to fix this problem without deleting the domain?

This issue is also posted in this AWS forum

Auto-stop doesn't work for shared spaces in SageMaker Studio

Hello,

I use your auto-stop script in private spaces in SageMaker Studio and it works fine. However, using the same script in a shared space with the same custom image produces the following error in CloudWatch after deploying the LCC:

[W 240307 14:44:02 log:49] 404 GET /jupyterlab/default/api/sessions (127.0.0.1) 4.18ms referer=None
[W 240307 14:46:01 log:49] 404 GET /jupyterlab/default/api/terminals (127.0.0.1) 4.21ms referer=None
2024-03-07T14:46:01.743687z - [auto-stop-idle] - An error accurred while checking idle state. Exception: string indices must be integers

Does anyone know what the problem is and how to fix it? Thanks in advance 😄
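One plausible explanation, not confirmed in this thread, is that the checker receives an error response (the 404s above) instead of the JSON list of sessions it expects, then iterates over it and indexes each item like a session dictionary. Iterating over a string yields single characters, and indexing a character with a string key raises exactly this `TypeError`. The sketch below reproduces that failure and shows a defensive parser that checks the status and type first; `parse_sessions` and `reproduce_error` are hypothetical helpers, not the repository's code.

```python
import json


def parse_sessions(status: int, body: str) -> list[dict]:
    """Return the session list from a Jupyter /api/sessions response, or raise a clear error."""
    if status != 200:
        # A 404 here suggests the checker's URL does not match this server.
        raise RuntimeError(f"/api/sessions returned HTTP {status}; check the base URL")
    data = json.loads(body)
    if not isinstance(data, list):
        raise RuntimeError(f"expected a list of sessions, got {type(data).__name__}")
    return data


def reproduce_error() -> str:
    """Show how treating an error page as a session list fails."""
    error_body = "404: Not Found"
    try:
        for session in error_body:  # iterates over characters, not sessions
            session["kernel"]  # indexing a str with a str raises TypeError
    except TypeError as exc:
        return str(exc)
    return ""


if __name__ == "__main__":
    print(reproduce_error())
```

Failing early with the HTTP status makes a URL mismatch visible in the logs, instead of surfacing it as an unrelated-looking `TypeError`.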

App [default] associated with space [space-name] and userprofile [null] is not found or in unusable state. Please try again after making sure that the app is created and InService

I'm getting this error sporadically since I started using the LCCs for the new SageMaker Studio:

App [default] associated with space [space-name] and userprofile [null] is not found or in unusable state. Please try again after making sure that the app is created and InService

I have tried to solve it by deselecting the LCC, running the app (updating the space), shutting it down, and finally selecting the LCC again. This seems to work, but the error still appears periodically.
