
clive's Introduction

#Technical mapping generation

Known issues

  • When both "oneOf" and "properties" define the same property, two or more target columns are generated with the same name. The generated SQL then contains an unnecessary GET_JSON_OBJECT call with an erroneous $. JSON path.

  • TYPE - All target types are string

  • ISO timestamp strings to TIMESTAMP conversion - ISO timestamp strings are not converted into TIMESTAMP. The TIMESTAMP type is not handled properly because the export from Mongo implicitly converts the value to a struct, and a further transformation sanitises the JSON so that Hive can ingest it: $ -> d_data

  • INT into DATE conversion - Not supported.

  • Mongo _id - The schema doesn't provide the Mongo id; ids are handled by Mongo.

  • COLUMN NAMES - Column names are generated automatically in camel case; they have to be changed manually if necessary.

  • Implicit conversions - {"d_date":"2017-07-10T10:52:12.192Z"} during import to HDFS; {"d_numberlong":"40846"} during mongoexport.
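Two of the issues above lend themselves to a quick check. The sketch below is an illustration only and is not part of the toolkit: `duplicate_properties` reports property names that appear both in a schema's top-level "properties" and in any "oneOf" branch, and `unwrap_implicit` turns the `d_date` and `d_numberlong` wrappers into plain Python values. Both function names are hypothetical, as is the assumption that each "oneOf" branch carries its own inline "properties" object.

```python
from datetime import datetime, timezone


def duplicate_properties(schema):
    """Return sorted names defined in both "properties" and a "oneOf" branch."""
    top_level = set(schema.get("properties", {}))
    duplicates = set()
    for branch in schema.get("oneOf", []):
        # Assumption: branches are inline objects, not unresolved $ref pointers.
        duplicates |= top_level & set(branch.get("properties", {}))
    return sorted(duplicates)


def unwrap_implicit(value):
    """Convert the wrapper objects listed above into plain Python values."""
    if isinstance(value, dict) and len(value) == 1:
        if "d_date" in value:
            # ISO 8601 string with fractional seconds and a trailing "Z" (UTC).
            parsed = datetime.strptime(value["d_date"], "%Y-%m-%dT%H:%M:%S.%fZ")
            return parsed.replace(tzinfo=timezone.utc)
        if "d_numberlong" in value:
            return int(value["d_numberlong"])
    # Anything else is passed through unchanged.
    return value
```

Running `duplicate_properties` over a schema before generating the mapping would flag the columns likely to collide, so they can be renamed by hand.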

Testing

N/A

Many tables derived from single schema - Accepted-data database

Accepted-data collections all use the same schema, urn:jsonschema:uk:gov:dwp:universe:accepted:data:AcceptedData. The only difference between them is the oneOf found in urn:jsonschema:uk:gov:dwp:universe:claim:ClaimElement. The collection_to_schema_mapping configuration maps each collection to a schema, for example collection 'address' to schema 'uk.gov.dwp.universe.claim.address.Address.json'.
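As a rough illustration, such a configuration could be modelled as a plain dictionary. Only the 'address' entry comes from the text above; the dictionary layout and the `schema_for` helper are assumptions, since the actual configuration format is not shown here.

```python
# Hypothetical in-memory form of collection_to_schema_mapping.
COLLECTION_TO_SCHEMA_MAPPING = {
    "address": "uk.gov.dwp.universe.claim.address.Address.json",
}


def schema_for(collection, mapping=COLLECTION_TO_SCHEMA_MAPPING):
    """Return the schema file mapped to a collection, or raise a clear error."""
    try:
        return mapping[collection]
    except KeyError:
        raise KeyError(f"no schema mapped for collection {collection!r}") from None
```

Failing loudly on an unmapped collection keeps a missing entry from silently producing a table built from the wrong schema.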

Complex schema converted into many tables from different locations

Databases that use the steps below to generate SQL and CSV files

accepted-data

advances

agent-core

core

matchingservice

Penalties-and-deductions

jive accepts one location per mapping file, hence we split the mappings into individual files.

I Generate a splittable mapping

$ mongo -quiet DATABASE_NAME.js > DATABASE_NAME.csv

II Split the mapping from DATABASE_NAME.csv into a folder

$ cd temp
$ ../toolkit.py -p ../DATABASE_NAME.csv --split-mappings

III Generate SQL from the mappings in the folder

$ cd ..
$ ./toolkit.py --generate-sql -o DATABASE_NAME.sql -p temp

IV Combine all mappings in the folder into one mapping

$ ./toolkit.py --combine-mapping-files -p temp -o DATABASE_NAME.csv
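The four steps above can be collected into one script. The sketch below only builds the command lines for a given database name and does not run them. The function name `pipeline_commands` and the `folder` parameter are illustrative; the commands themselves are copied from the steps above.

```python
import shlex


def pipeline_commands(database, folder="temp"):
    """Return the shell command lines for steps I to IV, in order."""
    js = shlex.quote(f"{database}.js")
    csv = shlex.quote(f"{database}.csv")
    parent_csv = shlex.quote(f"../{database}.csv")
    sql = shlex.quote(f"{database}.sql")
    tmp = shlex.quote(folder)
    return [
        # I: export a splittable mapping from Mongo
        f"mongo -quiet {js} > {csv}",
        # II: split the mapping into individual files, working inside the folder
        f"cd {tmp} && ../toolkit.py -p {parent_csv} --split-mappings",
        # III: generate SQL from the split mappings
        f"./toolkit.py --generate-sql -o {sql} -p {tmp}",
        # IV: combine the split mappings back into a single file
        f"./toolkit.py --combine-mapping-files -p {tmp} -o {csv}",
    ]
```

Each string could be passed to `subprocess.run(cmd, shell=True, check=True)`. Because every call starts its own shell, the `cd` in step II does not carry over, so steps III and IV run from the original directory without a separate `cd ..`.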

$ jive -tm appointments.csv -l /etl/uc/mongo/${hiveconf:BATCH_DATE} -orc > appointments.sql

#Databases that don't require any additional modeling

appointments

Complex schema converted into many tables from the same data location

##agenttodo

##agenttotoarchive

##core-todo

##core-journal

Steps

I Generate schema

II Generate where clauses

III Generate SQL

#CLIVE list of commented tables

journal_AnnualVerificationJournalEntry

todo_REPORT_SELF_EMPLOYMENT_EARNINGS_PROPERTIES

claimantCommitment

agent_todo_archive_VERIFY_SOCIAL_HOUSING_PROPERTIES

CLIVE list of commented views

claimantCommitment_no_pii
