serverless-cf-analysis

Description

Creates partitions in Athena for files added to S3 under a /year/month/day/hour/ key prefix.
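As a rough illustration of the idea, the sketch below extracts the /year/month/day/hour/ segment from an S3 object key and builds an Athena ADD PARTITION statement from it. It is not the project's actual code: the class name PartitionSketch, the bucket and key, and the partition column names (year, month, day, hour) are all assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartitionSketch {
    // Matches a /year/month/day/hour/ run in the key, e.g. "logs/2017/03/14/09/file.gz".
    private static final Pattern DATE_PREFIX =
            Pattern.compile("(?:^|/)(\\d{4})/(\\d{2})/(\\d{2})/(\\d{2})/");

    /** Returns an ADD PARTITION statement for the key, or null if the key has no date prefix. */
    static String addPartitionDdl(String table, String bucket, String key) {
        Matcher m = DATE_PREFIX.matcher(key);
        if (!m.find()) {
            return null;
        }
        // Everything up to and including the hour segment becomes the partition location.
        String location = "s3://" + bucket + "/" + key.substring(0, m.end());
        // The column names below are hypothetical; a real table defines its own.
        return String.format(
                "ALTER TABLE %s ADD IF NOT EXISTS PARTITION "
                        + "(year='%s', month='%s', day='%s', hour='%s') LOCATION '%s'",
                table, m.group(1), m.group(2), m.group(3), m.group(4), location);
    }

    public static void main(String[] args) {
        // Hypothetical bucket and key, for illustration only.
        System.out.println(addPartitionDdl(
                "sampledb.vpc_flow_logs", "my-bucket", "logs/2017/03/14/09/part-0001.gz"));
    }
}
```

Using IF NOT EXISTS keeps the statement safe to repeat when several objects arrive under the same hour prefix.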

Build

As a one-off operation, you'll need to install the Athena JDBC driver into a lib folder, and then add it to your local Maven repository so that it can be incorporated into the final jar:

mkdir lib
aws s3 cp s3://athena-downloads/drivers/AthenaJDBC41-1.0.1.jar lib/
mvn install:install-file -Dfile=lib/AthenaJDBC41-1.0.1.jar -DgroupId=com.amazonaws -DartifactId=athena.jdbc41 -Dversion=1.0.0 -Dpackaging=jar -DgeneratePom=true

And then, to build:

mvn clean compile verify

Create an IAM Role

Before you create a Lambda function, you will need to create an IAM role that allows Lambda to execute queries in Athena. Create a role named lambda_athena_exec_role and attach the following managed policies to the role: AmazonS3FullAccess, AmazonAthenaFullAccess.

Add this inline access policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        }
    ]
}

And attach the following trust relationship to enable Lambda to assume the role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Create a Lambda Function to Add Partitions to Athena

Create a Lambda function that can be associated with S3 new object event notifications. When creating the function, you'll need to set several environment variables:

  • PARTITION_TYPE Supply one of the following values: Month, Day or Hour. This environment variable is optional: if you omit it, the function will default to Day.
  • TABLE_NAME Use the format <database>.<table_name>. For example, sampledb.vpc_flow_logs.
  • S3_STAGING_DIR An Amazon S3 location to which your query output will be written. (Although the Lambda function is only executing DDL statements, Athena still writes an output file to S3.)
  • ATHENA_REGION The region in which Athena is located (e.g. us-east-1).
  • DDB_TABLE_NAME The name of the DynamoDB table holding partition information.
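One way a function might read these settings is sketched below. The class LambdaConfigSketch is hypothetical and not part of the project. The sketch assumes that every variable except PARTITION_TYPE is required, which the list above implies but does not state. It takes a Map so it can be exercised without a Lambda environment; inside a function you would pass System.getenv().

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LambdaConfigSketch {
    private static final List<String> PARTITION_TYPES = Arrays.asList("Month", "Day", "Hour");

    final String partitionType;
    final String tableName;
    final String s3StagingDir;
    final String athenaRegion;
    final String ddbTableName;

    LambdaConfigSketch(Map<String, String> env) {
        // PARTITION_TYPE is optional and falls back to Day.
        String type = env.getOrDefault("PARTITION_TYPE", "Day");
        if (!PARTITION_TYPES.contains(type)) {
            throw new IllegalArgumentException(
                    "PARTITION_TYPE must be Month, Day or Hour, got: " + type);
        }
        partitionType = type;
        // Assumption: the remaining variables are all mandatory.
        tableName = required(env, "TABLE_NAME");
        s3StagingDir = required(env, "S3_STAGING_DIR");
        athenaRegion = required(env, "ATHENA_REGION");
        ddbTableName = required(env, "DDB_TABLE_NAME");
    }

    private static String required(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        // Example values only; in a Lambda function use new LambdaConfigSketch(System.getenv()).
        Map<String, String> env = new HashMap<>();
        env.put("TABLE_NAME", "sampledb.vpc_flow_logs");
        env.put("S3_STAGING_DIR", "s3://example-bucket/staging/");
        env.put("ATHENA_REGION", "us-east-1");
        env.put("DDB_TABLE_NAME", "example-partitions");
        LambdaConfigSketch config = new LambdaConfigSketch(env);
        System.out.println(config.partitionType + " partitions for " + config.tableName);
    }
}
```

Failing fast on a missing or misspelled value surfaces configuration mistakes on the first invocation rather than as a later Athena query error.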

Specify the handler and an existing role:

  • Handler: com.amazonaws.services.lambda.CreateAthenaPartitionsBasedOnS3Event::handleRequest
  • Existing role: lambda_athena_exec_role

Set the timeout to one minute.
