siemactk

Toolkit to support operations on sie-mac.org

Workflow

  1. Get dataset URLs and codelists from Google Spreadsheets.
  2. Scrape these datasets from https://ec.europa.eu/eurostat.
  3. Filter, clean & recode datasets.
  4. Generate translated output files in both .json and .tsv formats.
  5. Upload output files to a bucket in Google Cloud Storage (GCS).
  6. Send an email notification with the result of the automation process.

The sie-mac.org site can then render tables with the wpDataTables plugin, using the data stored on GCS as its source.
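
Steps 3 and 4 of the workflow can be illustrated with a minimal sketch that uses only the standard library. It is not siemactk's actual code: the function names, the sample rows and the codelist are hypothetical, and it assumes the Eurostat convention of marking unavailable values with ":".

```python
import csv
import io
import json


def recode_rows(rows, keep, codelist):
    """Drop rows rejected by `keep` and replace coded values with their labels."""
    recoded = []
    for row in rows:
        if not keep(row):
            continue
        # Values without an entry in the codelist are left unchanged.
        recoded.append({key: codelist.get(value, value) for key, value in row.items()})
    return recoded


def to_json(rows):
    """Serialize rows as a JSON array of objects, keeping non-ASCII labels readable."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data, for illustration only.
rows = [
    {"geo": "ES70", "year": "2020", "value": "12.5"},
    {"geo": "ES70", "year": "2021", "value": ":"},
    {"geo": "PT20", "year": "2021", "value": "8.1"},
]
codelist = {"ES70": "Canarias", "PT20": "Açores"}

clean = recode_rows(rows, keep=lambda r: r["value"] != ":", codelist=codelist)
json_output = to_json(clean)
tsv_output = to_tsv(clean)
```

Both outputs are built from the same cleaned rows, so the .json and .tsv files stay consistent with each other.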

Auth

This project uses the Google Drive API and Google Cloud Storage behind the scenes. Both services require authentication.

Google Drive API

Because siemactk uses yagdrive as a wrapper for the Google Drive API, follow the yagdrive authentication instructions to make it work properly.

At the end of the authentication process, two files must be saved in the current working directory:

  • gdrive-credentials.json
  • gdrive-secrets.json

Google Cloud Storage

Follow these steps to obtain credentials for Google Cloud Storage:

  1. Go to the APIs Console and create a new project (or reuse an existing one).
  2. Go to IAM Administration inside your project and click on Service Accounts on the left-hand side.
  3. Click on Create Service Account at the top of the page.
  4. Three steps must be completed:
    • Step 1: Give this service account a name.
    • Step 2: Assign the "Owner" role to grant full access to the project.
    • Step 3: You can leave it blank.
  5. Click on the newly created service account.
  6. Go to the "Keys" menu at the top.
  7. Click on the "Add Key" button and then select "Create new key".
  8. Leave "JSON" as the key type and click on "Create".
  9. A file called <project>-<long_id>.json is downloaded.
  10. Rename this file to gcs-credentials.json and place it in the current working directory.

At the end of the authentication process, one file must be saved in the current working directory:

  • gcs-credentials.json
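
Before launching the toolkit, it can help to confirm that all three credential files are in place. The following is a minimal sketch of such a check, not part of siemactk itself: the function names are hypothetical, and the field list assumes the usual layout of a Google service account key downloaded as JSON.

```python
import json
from pathlib import Path

# File names taken from the Auth section of this README.
REQUIRED_FILES = (
    "gdrive-credentials.json",
    "gdrive-secrets.json",
    "gcs-credentials.json",
)

# Fields normally present in a service account key (assumption, not exhaustive).
SERVICE_ACCOUNT_FIELDS = ("type", "project_id", "private_key", "client_email")


def missing_credentials(workdir="."):
    """Return the names of the required credential files not found in `workdir`."""
    base = Path(workdir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


def check_service_account_key(path):
    """Return a list of problems found in a service account key file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    problems = [
        f"missing field: {field}"
        for field in SERVICE_ACCOUNT_FIELDS
        if field not in data
    ]
    if "type" in data and data["type"] != "service_account":
        problems.append("type is not 'service_account'")
    return problems


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credential files:", ", ".join(missing))
    else:
        for problem in check_service_account_key("gcs-credentials.json"):
            print("gcs-credentials.json:", problem)
```

The check only inspects the files locally. It does not contact Google, so a passing result does not guarantee that the credentials are still valid.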

Setup

  1. Install dependencies:

    $ pip install -r requirements.txt

  2. Generate the credentials for authentication as described in the Auth section.

  3. Create a .env file that sets every parameter that has no default value in settings.py.
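
To illustrate what "no default value" means in practice, here is a minimal sketch of reading KEY=VALUE pairs from a .env file and failing early when a required parameter is absent. It is an assumption about the general pattern, not a copy of settings.py: the project may rely on a configuration library instead, and the parameter names used below (GCS_BUCKET_NAME, NOTIFICATION_EMAIL) are invented for the example.

```python
def parse_dotenv(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def require(values, name):
    """Return a parameter that has no default, or fail with a clear message."""
    try:
        return values[name]
    except KeyError:
        raise RuntimeError(f"{name} must be defined in the .env file") from None


# Hypothetical .env content; the real parameter names are those in settings.py.
sample = """
# Parameters without defaults
GCS_BUCKET_NAME=my-bucket
NOTIFICATION_EMAIL="someone@example.com"
"""

settings = parse_dotenv(sample)
bucket = require(settings, "GCS_BUCKET_NAME")
```

Failing at startup when a parameter is missing is usually preferable to discovering the problem halfway through a scraping run.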

Usage

$ python main.py

GitHub Workflow

To automate the scraping run, a GitHub workflow has been set up and scheduled to launch it every month.

Changelog

Consult the Changelog page for bugfixes and features in each version.

