Serverless Data Pipeline - Powered by AWS SAM

Serverless data pipeline built with Amazon API Gateway, AWS Lambda, Amazon Kinesis Firehose, Amazon S3, and Amazon Athena.

How to deploy the stack

See scripts/deploy.sh (customize your deployment bucket and stack name).

How to ingest new records via API

See scripts/track.sh (customize your stack name).
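For reference, here is a minimal Python sketch of the same call without the shell script. It assumes the endpoint accepts a JSON body sent with `Content-Type: application/json`; the function names `build_track_request` and `track` are hypothetical, and the URL you pass in should be the stack's TrackURL output.

```python
import json
import urllib.request


def build_track_request(track_url: str, record: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one JSON record."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        track_url,
        data=body,
        # Assumption: a JSON content type; the API accepts any JSON payload.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def track(track_url: str, record: dict) -> int:
    """Send one record to the tracking endpoint and return the HTTP status code."""
    request = build_track_request(track_url, record)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

For example, `track(track_url, {"name": "John", "action": "charge", "value": 100})` returns the HTTP status code. Keeping request construction separate from sending makes it easy to inspect the payload without network access.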

What kind of queries can I run on the dataset?

It depends on the data you collect and on the virtual tables you define in Athena and AWS Glue.

The file queries.sql contains a few sample queries that you can run with the default schema (e.g. {"name": "John", "action": "charge", "value": 100}).
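If you prefer to run such queries programmatically, the sketch below drives Athena through a boto3-style client (`start_query_execution`, `get_query_execution`, `get_query_results`). The helper name `run_athena_query`, the polling loop, and the S3 output location are assumptions for illustration, not part of this project, and the sketch reads only the first page of results.

```python
import time
from typing import Any, List


def run_athena_query(athena: Any, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0) -> List[List[str]]:
    """Run a query and return its rows as lists of strings.

    `athena` is expected to behave like boto3.client("athena").
    """
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # First page of results only; the first row holds the column names.
    # NULL cells have no "VarCharValue" key, so they become empty strings.
    result = athena.get_query_results(QueryExecutionId=query_id)
    return [[cell.get("VarCharValue", "") for cell in row["Data"]]
            for row in result["ResultSet"]["Rows"]]
```

For example, `run_athena_query(boto3.client("athena"), "SELECT action, SUM(value) AS total FROM my_table GROUP BY action", "my_database", "s3://my-results-bucket/athena/")`, where `my_table`, `my_database`, and the results bucket are placeholders for your AthenaTableName, AthenaDatabaseName, and a bucket you own. Passing the client in keeps the function testable without AWS credentials.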

Resources list

This stack will create the following resources:

  • An API Gateway endpoint that you can use to track events by submitting any JSON data via the HTTP POST method
  • A Kinesis Firehose Delivery Stream that will buffer, optionally compress, and write each record into S3
  • A Lambda Function to process, transform, clean, or skip records before they are written to S3
  • An S3 Bucket that will contain all the collected data
  • Three Athena Named Queries to get started quickly with serverless queries
  • An IAM Role and Policy for API Gateway
  • An IAM Role and Policy for Kinesis Firehose
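The Lambda Function in this list plugs into Firehose's data-transformation hook, which passes a batch of base64-encoded records and expects each one back with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. The sketch below is not the project's function: it assumes the default `name`/`action`/`value` schema, drops anything that does not match, and appends a newline so each S3 object holds one JSON document per line (assumed to suit a JSON SerDe table in Athena).

```python
import base64
import json

# Assumption: the default schema from queries.sql, e.g.
# {"name": "John", "action": "charge", "value": 100}
REQUIRED_FIELDS = ("name", "action", "value")


def lambda_handler(event, context):
    """Firehose transformation: keep records matching the schema, drop the rest."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except (ValueError, TypeError):
            # Covers bad base64, invalid UTF-8, and invalid JSON.
            payload = None

        if isinstance(payload, dict) and all(f in payload for f in REQUIRED_FIELDS):
            # Newline-delimited JSON: one object per line in the S3 output.
            line = json.dumps(payload) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line.encode("utf-8")).decode("ascii"),
            })
        else:
            # "Dropped" skips the record without counting it as a failure.
            output.append({
                "recordId": record["recordId"],
                "result": "Dropped",
                "data": record["data"],
            })
    return {"records": output}
```

Every input `recordId` must appear in the response, which is why dropped records are still returned rather than omitted.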

Parameters

  • ApiStageName: The API Gateway Stage name (e.g. dev, prod, etc.)
  • FirehoseS3Prefix: The S3 Key prefix for Kinesis Firehose
  • FirehoseCompressionFormat: The compression format used by Kinesis Firehose
  • FirehoseBufferingInterval: How long (in seconds) Firehose waits before writing a new batch to S3
  • FirehoseBufferingSize: The maximum batch size in MB
  • LambdaTimeout: Lambda's max execution time in seconds
  • LambdaMemorySize: The amount of memory allocated to the Lambda function (in MB)
  • AthenaDatabaseName: The Athena database name
  • AthenaTableName: The Athena table name
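As an illustration of overriding these parameters at deploy time, this sketch builds an `aws cloudformation deploy` command line. It assumes the template has already been packaged (the file name `packaged.yaml` used below is hypothetical), and the values shown are examples rather than the template's defaults; `build_deploy_command` is a hypothetical helper, not part of scripts/deploy.sh.

```python
from typing import Dict, List


def build_deploy_command(template: str, stack_name: str,
                         params: Dict[str, str]) -> List[str]:
    """Build an `aws cloudformation deploy` argv that overrides template parameters."""
    cmd = [
        "aws", "cloudformation", "deploy",
        "--template-file", template,
        "--stack-name", stack_name,
        # The stack creates IAM roles and policies, so this capability is needed.
        "--capabilities", "CAPABILITY_IAM",
    ]
    if params:
        # Each override is a separate Key=Value argument; sorted for stable output.
        cmd.append("--parameter-overrides")
        cmd.extend(f"{key}={value}" for key, value in sorted(params.items()))
    return cmd
```

Run it with `subprocess.run(build_deploy_command("packaged.yaml", "my-pipeline", {"ApiStageName": "dev"}), check=True)`. If the template gives its IAM resources explicit names, CloudFormation requires `CAPABILITY_NAMED_IAM` instead of `CAPABILITY_IAM`.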

Outputs

  • TrackURL: The public URL to submit new records
  • BucketName: The bucket that will store your data
  • FunctionName: The Lambda Function that will process/validate records
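To read these outputs from code (for example, to find TrackURL without opening the console), a minimal sketch using a boto3-style CloudFormation client follows; `get_stack_outputs` is a hypothetical helper name.

```python
from typing import Any, Dict


def get_stack_outputs(cloudformation: Any, stack_name: str) -> Dict[str, str]:
    """Return a stack's outputs as {OutputKey: OutputValue}.

    `cloudformation` is expected to behave like boto3.client("cloudformation").
    """
    stack = cloudformation.describe_stacks(StackName=stack_name)["Stacks"][0]
    # A stack with no outputs has no "Outputs" key at all.
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}
```

For example, `get_stack_outputs(boto3.client("cloudformation"), "my-pipeline")["TrackURL"]`, where `my-pipeline` stands in for your stack name.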

Gotchas

  • The architecture is 100% serverless (no hourly costs, no servers to manage)
  • The API Gateway endpoint is publicly accessible (i.e. any browser or anonymous website user can potentially submit new records/events)
  • You can customize the template to enable encryption at rest for Kinesis Firehose
  • You can configure Kinesis Firehose's buffering (see Parameters above)
  • Athena's Named Queries cannot be updated (you need to create a new query with a different logical name)
  • Empty the S3 bucket before you delete the stack (CloudFormation cannot delete a bucket that still contains objects)
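One way to empty the bucket before deleting the stack is sketched below with a boto3-style S3 client (a `list_objects_v2` paginator plus `delete_objects`). The helper `empty_bucket` is hypothetical; it does not inspect the `Errors` field that `delete_objects` can return, and it does not remove object versions, so a versioned bucket needs extra handling.

```python
from typing import Any


def empty_bucket(s3: Any, bucket: str) -> int:
    """Request deletion of every current object; return how many were requested.

    `s3` is expected to behave like boto3.client("s3").
    """
    requested = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # Pages without objects have no "Contents" key.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # A list_objects_v2 page holds at most 1000 keys, which matches
            # the 1000-key limit of a single delete_objects call.
            s3.delete_objects(Bucket=bucket, Delete={"Objects": keys, "Quiet": True})
            requested += len(keys)
    return requested
```

From the shell, `aws s3 rm s3://BUCKET --recursive` does the same for an unversioned bucket, where BUCKET is the BucketName output.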

Contributors

  • alexcasalboni