Code Monkey home page Code Monkey logo

pydantic-glue's Introduction

JSON Schema to AWS Glue schema converter

Installation

pip install pydantic-glue

What?

Converts pydantic schemas to json schema and then to AWS glue schema, so in theory anything that can be converted to JSON Schema could also work.

Why?

When using AWS Kinesis Firehose in a configuration that receives JSONs and writes parquet files on S3, one needs to define a AWS Glue table so Firehose knows what schema to use when creating the parquet files.

AWS Glue lets you define a schema using Avro or JSON Schema and then to create a table from that schema, but as of *May 2022` there are limitations on AWS that tables that are created that way can't be used with Kinesis Firehose.

https://stackoverflow.com/questions/68125501/invalid-schema-error-in-aws-glue-created-via-terraform

This is also confirmed by AWS support.

What one could do is create a table set the columns manually, but this means you now have two sources of truth to maintain.

This tool allows you to define a table in pydantic and generate a JSON with column types that can be used with terraform to create a Glue table.

Example

Take the following pydantic class

from pydantic import BaseModel
from typing import List


class Bar(BaseModel):
    name: str
    age: int


class Foo(BaseModel):
    nums: List[int]
    bars: List[Bar]
    other: str

Running pydantic-glue

pydantic-glue -f example.py -c Foo

you get this JSON in the terminal:

{
  "//": "Generated by pydantic-glue at 2022-05-25 12:35:55.333570. DO NOT EDIT",
  "columns": {
    "nums": "array<int>",
    "bars": "array<struct<name:string,age:int>>",
    "other": "string"
  }
}

and can be used in terraform like that

locals {
  columns = jsondecode(file("${path.module}/glue_schema.json")).columns
}

resource "aws_glue_catalog_table" "table" {
  name          = "table_name"
  database_name = "db_name"

  storage_descriptor {
    dynamic "columns" {
      for_each = local.columns

      content {
        name = columns.key
        type = columns.value
      }
    }
  }
}

Alternatively you can run CLI with -o flag to set output file location:

pydantic-glue -f example.py -c Foo -o example.json -l

How it works?

  • pydantic gets converted to JSON Schema
  • the JSON Schema types get mapped to Glue types recursively

Future work

  • Not all types are supported, I just add types as I need them, but adding types is very easy, feel free to open issues or send a PR if you stumbled upon a non-supported use case
  • the tool could be easily extended to working with JSON Schema directly
  • thus, anything that can be converted to a JSON Schema should also work.

pydantic-glue's People

Contributors

svdimchenko avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.