
kedro-sagemaker's Introduction

Kedro SageMaker Pipelines plugin


We help companies turn their data into assets

About

This plugin enables you to run Kedro projects on Amazon SageMaker. Simply install the package and use the provided kedro sagemaker commands to build, push, and run your project on SageMaker.

Kedro SageMaker plugin

Documentation

For detailed documentation refer to https://kedro-sagemaker.readthedocs.io/

Usage guide

Usage: kedro sagemaker [OPTIONS] COMMAND [ARGS]...

Options:
  -e, --env TEXT  Environment to use.
  -h, --help      Show this message and exit.

Commands:
  compile  Compiles the pipeline to a JSON file
  init     Creates basic configuration for Kedro SageMaker plugin
  run      Runs the pipeline on SageMaker Pipelines
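
The commands above chain into a typical build-and-run workflow. A sketch only, assuming an already-initialized Kedro project: the `init` arguments and the image name below are placeholders, not the exact CLI contract, so check the documentation for the real invocation.

```shell
# Sketch: argument values are placeholders.
kedro sagemaker init <bucket> <execution-role> <docker-image>  # generate plugin configuration
kedro sagemaker compile                                        # compile the pipeline to JSON
kedro sagemaker run -i <docker-image>                          # run on SageMaker Pipelines
```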

Quickstart

Follow the quickstart section on kedro-sagemaker.readthedocs.io to see how to run your Kedro project on AWS SageMaker, or watch the video below:

Kedro SageMaker video tutorial

kedro-sagemaker's People

Contributors

marrrcin, noklam, szczeles, wmikolajczyk-fandom


kedro-sagemaker's Issues

kedro-sagemaker plugin fails to execute if a kedro project has more than 100 parameters

I'm currently using the Kedro-SageMaker plugin to run a Kedro pipeline on SageMaker. My Kedro project has multiple parameters (using several parameters YAML files).

While executing kedro sagemaker run -i <image name>, I encountered this error:

ClientError: ValidationError for UpdatePipeline operation: Unable to parse pipeline definition. Model validation failed - container ProcessingEnvironmentMap length (132) exceeds maximum limit (100).

Here are the steps to reproduce the issue.

  • Add a dummy_parameter.yml file under conf/base/parameters/ in the Kedro spaceflights example code (see the dummy_parameter.yml file content below).
  • Simply run kedro sagemaker run --auto-build

dummy_parameter.yml file content:

dummy_params:
  parameter1: 1
  parameter2: 2
  parameter3: 3
  parameter4: 4
  parameter5: 5
  ...
  parameter100: 100
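
To reproduce this without typing a hundred entries by hand, the file's contents can be generated with a small script (a convenience sketch for the repro, not part of the plugin):

```python
def dummy_params_yaml(n: int = 100) -> str:
    """Build the contents of dummy_parameter.yml with n numbered parameters."""
    lines = ["dummy_params:"]
    lines += [f"  parameter{i}: {i}" for i in range(1, n + 1)]
    return "\n".join(lines) + "\n"

# Write it into the spaceflights project, e.g.:
# pathlib.Path("conf/base/parameters/dummy_parameter.yml").write_text(dummy_params_yaml())
```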

Validation Error for KedroSageMakerPluginConfig

While trying to implement the spaceflights tutorial, I am getting an error when running the "kedro sagemaker run --auto-build" command.

Using the following package versions:
kedro == 0.18.13
kedro-datasets == 1.7.0
kedro-sagemaker == 0.3.0
kedro-viz == 6.5.0

Error:
ValidationError: 1 validation error for KedroSageMakerPluginConfig
root
KedroSageMakerPluginConfig expected dict not NoneType (type=type_error)
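
The "expected dict not NoneType" wording suggests the plugin's configuration file parsed to None rather than a mapping; an empty YAML file loads as None, not as {}. A simplified stand-in for that validation (not the plugin's actual code, which uses pydantic):

```python
def load_plugin_config(parsed_yaml):
    """Mimic the root validation that rejects an empty sagemaker.yml.

    An empty YAML document parses to None, so a dict check fails with an
    error like the one reported above.
    """
    if not isinstance(parsed_yaml, dict):
        raise TypeError(
            f"KedroSageMakerPluginConfig expected dict not {type(parsed_yaml).__name__}"
        )
    return parsed_yaml
```

If that is the cause, regenerating the configuration with `kedro sagemaker init` (listed under Commands above) is a plausible fix.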

Processing Jobs tagging and multiple nodes running on the same instance.

Hello and thanks for this plugin. I have three questions:

  1. I want to assign a tag (in addition to those assigned automatically, like 'sagemaker:pipeline-execution-arn' and 'sagemaker:pipeline-step-name') to the Processing Jobs launched through this plugin. Is there a way to do this automatically, without going through the AWS Management Console?
  2. I want all the nodes to run on the same instance. The default behaviour is that when a step of the pipeline is executed, a new instance is started up, and when it finishes the instance is shut down, which increases the total execution time. Is there a way to execute everything on the same instance to optimize execution time?
  3. Moreover, I have some nodes that can be executed in parallel; can I achieve this by configuring the plugin in a certain way?
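
On question 1: SageMaker resources can be tagged through the AddTags API (boto3's `sagemaker` client exposes `add_tags`), though whether the plugin forwards custom tags is a separate matter. A sketch of building the tag payload; the resource ARN and tag names below are hypothetical:

```python
def make_sagemaker_tags(tag_map):
    """Convert a plain dict into the Tags list shape the SageMaker API expects."""
    return [{"Key": k, "Value": v} for k, v in sorted(tag_map.items())]

# Hypothetical usage with boto3 (not executed here):
# boto3.client("sagemaker").add_tags(
#     ResourceArn=pipeline_arn,  # placeholder ARN
#     Tags=make_sagemaker_tags({"env": "dev", "team": "ml"}),
# )
```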

Deploying a model after training with kedro-sagemaker

Hi,

Thanks for this cool project! I would like to integrate Kedro with SageMaker. If I understand things correctly, using kedro-sagemaker I can run a Kedro pipeline in SageMaker Pipelines. This would result in a trained model that I can deploy as a SageMaker endpoint for inference.

When the deployed model receives a request, the data still needs to be transformed (e.g. scaling, one-hot encoding, ...). Is there a way to run the Kedro pipeline as part of the SageMaker endpoint, using transformations that were fitted during training?

Kedro-flask-sagemaker

I want to provide a Flask app.py as the main file when building the Docker image. An endpoint in this Flask app would call the kedro CLI, build the pipeline, and deploy it to SageMaker. Does this seem feasible?
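
Shelling out to the CLI from a web endpoint is technically feasible. A sketch that assembles the invocation shown in the usage guide above (the Flask wiring itself is omitted, and the image name is a placeholder):

```python
import subprocess


def build_run_command(image_name: str) -> list[str]:
    """Assemble the CLI call from the usage guide: kedro sagemaker run -i <image>."""
    return ["kedro", "sagemaker", "run", "-i", image_name]


# A Flask view could then shell out (hypothetical wiring, not executed here):
# subprocess.run(build_run_command("my-image"), check=True, cwd=project_dir)
```

Since a pipeline build-and-deploy can run for minutes, doing it inline in a request handler would likely hit HTTP timeouts; a background worker or task queue is the more common design.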
