Code Monkey home page Code Monkey logo

ro-create-table-from-sheet's Introduction

Import Semi-Structured Data from Google Sheets to Snowflake

Requirements

Background

Suppose you have a Google Sheet, accessible via a service account, that looks something like this:

id code date cost
1 abc 03/01/2019 100.00
2 xyz 04/01/2019 200.00

And you want to import this data into your data warehouse on a regular basis. Further suppose that this sheet's structure is likely to change frequently, with fields being added and removed, so you don't want to use a rigid schema.

On Snowflake, one possibility is to take advantage of variant, a semi-structured data type data type.

Using this module, the result of importing the sheet above would look like this:

source imported_at data
[worksheet name] [timestamp] {"id": 1, "code": "abc", "date": "2019-03-01", "cost": 100.00}
[worksheet name] [timestamp] {"id": 2, "code": "xyz", "date": "2019-04-01", "cost": 200.00}

Usage

To get set up:

  • Copy db.json.example to db.json
  • Edit db.json to contain your Snowflake connection information and credentials
  • Download your Google service account file
  • Find the ID of a sheet you'd like to import (and to which your service account has access)

You can then invoke this script:

python create_table_from_sheet.py
    --schema [destination_schema] --table [destination_table]
    --sheet [sheet_id]
    --service-account-file [path_to_service_file]
    --db-config [path_to_db_config_file]

If omitted, ./service-account.json and ./db.json are used as the default values for the service account file and DB config file respectively.

However, this will import id, date, and cost as strings containing the contents reflected in the sheet. You can use --coercions to specify that they should be interpreted specially:

python create_table_from_sheet.py
    # ... same as above ...
    --coercions '{"id": "int", "date": "date", "cost": "float"}'

This says that the column id should be interpreted as an integer, date as a date, and cost as a float.

By default, the first worksheet is imported, but you can specify a worksheet by name with the --worksheet argument.

There are also options --verbose (which will print the SQL generated) and --dry-run (which will read the sheet and generate the SQL, but not execute it).

Limitations

This script replaces the full table in the database every time it is run, so if historical information is removed from the sheet, it will be removed from the database too.

ro-create-table-from-sheet's People

Contributors

johnmastro avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.