Kedro AWS Batch

Run a Kedro Project on AWS Batch

Prerequisites

  • Docker
  • Kedro 0.16.6
  • ECR, S3 & AWS Batch
  • scikit-learn 0.23.0
  • pickle5 0.0.11

Build

  • Build image

    example$ ./scripts/build.sh
    
  • ECR Login

    aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <ecr_id>.dkr.ecr.<region>.amazonaws.com
    
  • Log in to the AWS Console and create an ECR repository

  • Push image into ECR

    docker tag [local image] [repository uri]
    docker push [repository uri]
    
  • Create IAM role

    Name the newly created IAM role `batchJobRole`.
    When choosing the policy (step 3), attach `AmazonS3FullAccess`.
    
  • Create AWS Batch compute environment

    Create a managed, on-demand compute environment named `spaceflights_env` and
    let it create new service and instance roles.
    
  • Create AWS Batch job queue

    Create a queue named `spaceflights_queue`,
    connected to your newly created compute environment `spaceflights_env`, and give it `Priority` 1.
    
  • Create AWS Batch job definition

    Create a job definition named `kedro_run`, assign it the newly created `batchJobRole` IAM role,
    the container image you packaged above, an execution timeout of 300s and 2000MB of memory.
    
    Note: in my case the execution timeout had to be raised to 900s.
    This prevents the main Batch job from failing due to an execution timeout.
    
  • Run a Kedro node (Run Jobs)

    Command: kedro run --node preprocessing_companies
    
  • Submit AWS Batch jobs (Run Jobs)

    Command: kedro run --env aws_batch --runner example.runner.AWSBatchRunner
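
The console steps above can also be sketched as boto3 request payloads. This is a minimal sketch, not the project's actual runner: the function names are hypothetical, the vCPU count is an assumption (the README only gives memory and timeout), and the per-node command simply mirrors `kedro run --node ...` from the step above. The image URI and role ARN are passed in by the caller because their values depend on your account.

```python
"""Sketch: boto3 request payloads for the AWS Batch resources named above."""

QUEUE = "spaceflights_queue"
COMPUTE_ENV = "spaceflights_env"
JOB_DEFINITION = "kedro_run"


def job_queue_request():
    """Keyword arguments for batch.create_job_queue."""
    return {
        "jobQueueName": QUEUE,
        "state": "ENABLED",
        "priority": 1,
        "computeEnvironmentOrder": [
            {"order": 1, "computeEnvironment": COMPUTE_ENV}
        ],
    }


def job_definition_request(image_uri, job_role_arn, timeout_seconds=900):
    """Keyword arguments for batch.register_job_definition."""
    return {
        "jobDefinitionName": JOB_DEFINITION,
        "type": "container",
        "containerProperties": {
            "image": image_uri,
            "jobRoleArn": job_role_arn,
            "resourceRequirements": [
                # Assumption: the README does not state a vCPU count.
                {"type": "VCPU", "value": "1"},
                # README says 2000MB; Batch reads this value in MiB.
                {"type": "MEMORY", "value": "2000"},
            ],
        },
        # 900s is the timeout the README recommends over the default 300s.
        "timeout": {"attemptDurationSeconds": timeout_seconds},
    }


def submit_job_request(node_name):
    """Keyword arguments for batch.submit_job, running one Kedro node."""
    return {
        "jobName": node_name,
        "jobQueue": QUEUE,
        "jobDefinition": JOB_DEFINITION,
        "containerOverrides": {
            "command": ["kedro", "run", "--node", node_name]
        },
    }


def submit(node_name, region):
    """Submit one node as a Batch job and return its job ID."""
    import boto3  # imported lazily so the payload builders work without it

    batch = boto3.client("batch", region_name=region)
    return batch.submit_job(**submit_job_request(node_name))["jobId"]
```

For example, `submit("preprocessing_companies", "<region>")` would queue the node used in the "Run a Kedro node" step, provided the queue and job definition already exist. The other two payloads can be passed to `create_job_queue` and `register_job_definition` in the same way.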
    

Issues

  • Error: ECR registry auth

    ResourceInitializationError: unable to pull secrets or registry auth:
    execution resource retrieval failed: unable to retrieve ecr registry auth:
    service call has been retried 1 time(s):
    AccessDeniedException: User: arn:aws:sts::783560535431:assumed-rol...
    
    ResourceInitializationError: unable to pull secrets or registry auth:
    execution resource retrieval failed:
    unable to retrieve ecr registry auth: service call has been retried 1 time(s):
    RequestError: send request failed caused by: Post https://api.ecr....
    

    Fix: add ECS permissions to `batchJobRole`.

  • Error: Cloudwatch log stream

    ResourceInitializationError: failed to validate logger args:
    create stream has been retried 1 times: failed to create Cloudwatch log stream:
    AccessDeniedException: User: arn:aws:sts::783560535431:assumed-role/batchJobRole/986cca09ac1748c08b77360b92e314...
    

    Fix: add CloudWatch Logs permissions to `batchJobRole`.

    ECS-CloudWatchLogs
    https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_cloudwatch_logs.html
    
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                    "logs:DescribeLogStreams"
                ],
                "Resource": [
                    "arn:aws:logs:*:*:*"
                ]
            }
        ]
    }
    
  • Error: SubmitJob

    botocore.exceptions.ClientError: An error occurred (AccessDeniedException) when
    calling the SubmitJob operation:
    User: arn:aws:sts::783560535431:assumed-role/batchJobRole/27a6c01f23ec42dfac5d20d539eb48bf
    is not authorized to perform:
    batch:SubmitJob on resource: arn:aws:batch:ap-southeast-1:783560535431:job-definition/kedro_run
    

    Fix: attach the `AWSBatchFullAccess` policy to `batchJobRole`.

  • BatchJobRole
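
The permission fixes above could be applied in one pass with boto3, roughly as follows. This is a hedged sketch: `attach_policies` is a hypothetical helper, the inline policy name `ECS-CloudWatchLogs` is taken from the label above, and the ECS permission from the first issue is left out because the README does not name a specific policy for it.

```python
"""Sketch: attach the policies from the Issues section to batchJobRole."""
import json

ROLE_NAME = "batchJobRole"

# AWS managed policies named in this README.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/AWSBatchFullAccess",
]

# Same document as the CloudWatch Logs policy shown above.
CLOUDWATCH_LOGS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:DescribeLogStreams",
            ],
            "Resource": ["arn:aws:logs:*:*:*"],
        }
    ],
}


def attach_policies(iam_client, role_name=ROLE_NAME):
    """Attach the managed policies, then add the logs policy inline."""
    for arn in MANAGED_POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName="ECS-CloudWatchLogs",
        PolicyDocument=json.dumps(CLOUDWATCH_LOGS_POLICY),
    )
```

Usage would be `attach_policies(boto3.client("iam"))`, run with credentials that are allowed to modify the role. Passing the client in keeps the helper easy to exercise with a stub.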

Results

  • Kedro Visualise Pipelines (Kedro Viz)

  • Kedro run Node (NodeRun)

  • Kedro Example Job (ExampleJob)

  • Kedro Example Job Log (ExampleJobLog)
