aws-athena-partition-autoloader

Automatically adds new partitions detected in S3 to an existing Athena table

Purpose

Athena is fantastic for querying data in S3 and works especially well when the data is partitioned. The problem arises when you have a lot of partitions and need to run the MSCK REPAIR TABLE command to load them, as it can take a long time.

This solution subscribes to S3 events on a bucket, detects when a new partition is created, and loads only that partition into Athena. It keeps a cache of the existing partitions to minimize the number of calls made to Athena to query the partition list.
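The flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes Hive-style object keys (key=value path segments), an in-memory set as the partition cache, and hypothetical helper names (parse_partition, build_add_partition_sql, handle_key). Partition values are not escaped here, so treat it as a sketch only.

```python
from typing import Dict, List, Optional, Set, Tuple
from urllib.parse import unquote_plus


def parse_partition(key: str, partition_keys: List[str]) -> Optional[Tuple[Dict[str, str], str]]:
    """Return (partition values, S3 prefix) for an object key, or None if it has no partition."""
    segments = key.split("/")[:-1]  # drop the object file name
    width = len(partition_keys)
    for start in range(len(segments) - width + 1):
        pairs = [seg.split("=", 1) for seg in segments[start:start + width]]
        # The segment names must match the table's partition keys, in order.
        if all(len(p) == 2 for p in pairs) and [p[0] for p in pairs] == partition_keys:
            values = {name: value for name, value in pairs}
            prefix = "/".join(segments[:start + width]) + "/"
            return values, prefix
    return None


def build_add_partition_sql(table: str, values: Dict[str, str], bucket: str, prefix: str) -> str:
    """Build the DDL that registers a single partition (no value escaping in this sketch)."""
    spec = ", ".join(f"`{name}`='{value}'" for name, value in values.items())
    return (f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({spec}) "
            f"LOCATION 's3://{bucket}/{prefix}'")


def handle_key(raw_key: str, bucket: str, table: str,
               partition_keys: List[str], cache: Set[Tuple[str, ...]]) -> Optional[str]:
    """Return DDL for a partition not seen before, or None if there is nothing to do."""
    key = unquote_plus(raw_key)  # object keys in S3 event records are URL-encoded
    parsed = parse_partition(key, partition_keys)
    if parsed is None:
        return None
    values, prefix = parsed
    cache_key = tuple(values[name] for name in partition_keys)
    if cache_key in cache:
        return None  # already known, so no Athena call is needed
    cache.add(cache_key)
    return build_add_partition_sql(table, values, bucket, prefix)
```

In a Lambda handler, the returned statement would then be submitted to Athena (for example with the boto3 Athena client's start_query_execution), and the cache would be seeded from the table's existing partition list.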

Installing and Configuring

AWS Setup

Deploying to AWS

Before starting, you will need:

  • The AWS CLI installed and default credentials configured
  • The AWS SAM CLI installed
  • An existing S3 bucket where the AWS Lambda code will be deployed to by SAM
  • An existing Athena table backed by content in S3 with at least 1 partition key
  • This repo cloned

  1. Run the deploy.sh script with the following arguments:

./deploy.sh <function_name> <S3 bucket region> <Athena region> <action> <S3 bucket to store Lambda code in> <S3 bucket containing Athena data> <S3 bucket for Athena results> <Athena database> <Athena table> <comma-separated list of Athena partition names> <AWS profile>

For example:

./deploy.sh athena_loader_mytable eu-west-1 us-east-1 ALL lambda-sam-staging stage-audit-log aws-athena-query-results-123456789-us-east-1 audit_log_db api_audit_log 'destination_platform_id,date' staging

The list of partition keys must exactly match the partition keys defined on the table.

deploy.sh uses AWS SAM to package the AWS Lambda functions and then deploys them to AWS. Everything is deployed as a CloudFormation stack in the specified region.

NOTE: If you don't have SAM installed, you can replace the SAM commands in the deploy script with the equivalent aws cloudformation package and aws cloudformation deploy commands.

Contributors

dnorth98, jackric

aws-athena-partition-autoloader's Issues

Use aws cloudformation package | deploy instead of SAM

Hey, great tool, love it. I use SAM and know it's super useful, however just wanted to let you know that the aws cloudformation package|deploy commands work just as well as the equivalent sam package|deploy commands, removing the deploy time dependency.
