theovankraay / azure-cosmos-db-graph-npm-bom-sample

This project forked from azure-samples/azure-cosmos-db-graph-npm-bom-sample

Sample application describing a Bill of Materials scenario through an NPM (Node package manager) dependency explorer solution. Code by Chris Joakim (https://github.com/cjoakim/).

License: MIT License

azure-cosmos-db-graph-npm-bom-sample

An example Bill-of-Materials application using npm data and an Azure CosmosDB Graph (Gremlin API) database.

Created By

  • Chris Joakim, Microsoft, Azure Cloud Solution Architect, Charlotte
  • Luis Bosquez, Microsoft, Azure CosmosDB Program Manager, Redmond

NPM as the Data Source

In our work with Azure CosmosDB we've seen that Bill-of-Materials (BOM) is a common use-case for companies, especially in the manufacturing sector. The graph of their manufactured products, and their many nested components, is perfectly suited for CosmosDB with the Gremlin Graph API.

Industry- and company-specific product and component data, however, is both proprietary and not immediately relatable to most readers. We wanted to create a BOM sample application with data that most Information Technology audiences could relate to immediately. We therefore chose the domain of software: software end-products are typically composed of a nested graph of software libraries (the equivalent of manufactured components), and IT audiences innately understand this.

We considered using NuGet (.NET), Maven Central (Java), and PyPI (Python) as the data source, but chose npm (Node Package Manager) from the Node.js and JavaScript ecosystem, because Node.js is fast-growing, appeals to a wide audience, has great CLI tooling, and CosmosDB itself is JavaScript- and JSON-oriented.

Given the npm orientation of this project, Node.js and JavaScript were chosen as the implementation language. JavaScript is widely supported in Azure PaaS services such as Azure App Service and Azure Functions, and even within CosmosDB for server-side stored procedures, triggers, and UDFs. The free, cross-platform Visual Studio Code editor was used to develop this project; Visual Studio Code is itself built with Electron and Node.js.

Architecture

This application uses an Azure CosmosDB account, with the Gremlin API, as its sole datastore. A batch process "spiders" npm for information about npm libraries, wrangles this JSON data, and then loads it into CosmosDB. A web application in the webapp/ directory, built with Node.js and Express, queries and displays the CosmosDB data. The open-source JavaScript library D3.js is used to visualize the graph data.

The database design uses two containers (collections). The first holds the graph data: the vertices are the npm libraries and their maintainers, edges connect each library to the libraries it depends on, and further edges connect the maintainers to the libraries they maintain.

The second container implements the concept of materialized views: data that is pre-aggregated and pre-processed to enable faster queries at runtime. For example, some of the pre-aggregated data answers the questions "Where else is this library/component used?" and "What other packages does this maintainer work on?". The materialized views also contain pre-calculated library ages, in days and years, based on the original and current version dates. Pre-aggregating data like this can significantly reduce Request Unit (RU) consumption for CosmosDB users, which is why this sample BOM project features materialized views.
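To make the pre-aggregation concrete, here is a minimal sketch of how the "where else is this library used?" list and the age fields could be derived from wrangled npm metadata. The input shape, the `buildLibraryViews` and `ageInDays` names, the `library_age_years` field name, and the use of whole elapsed days are all assumptions for illustration; this is not the sample's own wrangling code.

```javascript
// Hypothetical sketch: derive used_in and age fields for library view documents.
// Input shape (assumed):
//   { [name]: { version, dependencies: { dep: range }, time: { created, [version]: iso } } }
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days elapsed between an ISO-8601 timestamp and a reference time (ms).
function ageInDays(isoDate, nowMs) {
  return Math.floor((nowMs - Date.parse(isoDate)) / MS_PER_DAY);
}

function buildLibraryViews(libs, nowMs = Date.now()) {
  // Invert the dependency edges once, so "where is X used?" becomes a plain read.
  const usedIn = {};
  for (const [name, info] of Object.entries(libs)) {
    for (const dep of Object.keys(info.dependencies || {})) {
      (usedIn[dep] = usedIn[dep] || []).push(name);
    }
  }

  return Object.entries(libs).map(([name, info]) => {
    const created = info.time.created;
    const versionDate = info.time[info.version];
    const libraryAgeDays = ageInDays(created, nowMs);
    return {
      ...info,
      used_in: (usedIn[name] || []).sort(),
      created_date: created,
      version_date: versionDate,
      library_age_days: libraryAgeDays,
      // Hypothetical field name for the "age in years" mentioned above.
      library_age_years: Number((libraryAgeDays / 365.25).toFixed(1)),
      version_age_days: ageInDays(versionDate, nowMs),
      pk: name, // partition key: one logical partition per library
      key: name,
      doctype: "library",
    };
  });
}

module.exports = { ageInDays, buildLibraryViews };
```

Computing these values once at load time means the web application reads a finished document instead of traversing the graph or doing date arithmetic per request.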

What's very interesting about the materialized views in this project is that they are accessed via the CosmosDB SQL API rather than the Gremlin API. They are queried efficiently via their partition key attribute, which is simply named 'pk'. We consider this a best practice: give your partition key attributes a generic name like 'pk' or 'partition_key' rather than a business-oriented attribute name.

At the time of writing, this is the only case where a single Azure CosmosDB account can be accessed via two programmatic APIs: here, a Gremlin account is accessed via both the Gremlin and SQL APIs.

The advantage of this approach is that your BOM data lives in one database, in independent and independently scalable graph and views collections. It enables expressive graph traversals via the Gremlin API as well as very efficient queries via the SQL API.

See file webapp/dao/cosmosdb_dao.js, which implements the Data Access Object (DAO) design pattern for both the Gremlin and SQL APIs.
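The dual-API idea can be illustrated with a small DAO sketch. This is not the contents of webapp/dao/cosmosdb_dao.js; it assumes a Gremlin client exposing `submit(query)` (as `driver.Client` in the `gremlin` npm package does) and an `@azure/cosmos` `Container` for the views collection. The class and method names are hypothetical.

```javascript
// Hypothetical DAO sketch: one object, two Cosmos DB APIs on the same account.

// Double-quoted Gremlin string literal with backslashes and quotes escaped.
function gremlinString(s) {
  return '"' + String(s).replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

class NpmBomDao {
  constructor(gremlinClient, viewsContainer) {
    this.gremlin = gremlinClient; // assumed: has submit(query)
    this.views = viewsContainer; // assumed: an @azure/cosmos Container
  }

  // Gremlin API: dependency paths below a library (same shape as the queries in this README).
  async bomPaths(libName, depth = 16) {
    const id = gremlinString(libName);
    const times = Math.max(1, Math.trunc(Number(depth)));
    const query =
      `g.V([${id}, ${id}]).emit().repeat(outE("uses_lib").inV())` +
      `.times(${times}).path().by("id")`;
    return this.gremlin.submit(query);
  }

  // SQL API: a library's materialized view, scoped to a single logical partition.
  async libraryView(libName) {
    const querySpec = {
      query: "SELECT * FROM c WHERE c.pk = @pk AND c.doctype = @doctype",
      parameters: [
        { name: "@pk", value: libName },
        { name: "@doctype", value: "library" },
      ],
    };
    const { resources } = await this.views.items
      .query(querySpec, { partitionKey: libName })
      .fetchAll();
    return resources.length > 0 ? resources[0] : null;
  }
}

module.exports = { NpmBomDao, gremlinString };
```

Injecting both clients keeps route handlers unaware of which API serves a request, and it lets the DAO be exercised with stub objects in tests.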


Azure Setup

Provision an Azure CosmosDB account that uses the Gremlin API in your Azure subscription.

Then create a new collection in your CosmosDB Graph database, as shown below. A database named dev with a collection named npm is recommended. Specify a partition key of /pk and a throughput of 10,000 RU/s.

[Screenshot: provision-gremlin-collection]

Then go to the Keys panel, as shown below, and set the following environment variables on your computer based on the values you see in the Azure Portal.

[Screenshot: gremlin-keys-panel]

Note: the values shown below are only examples; your values will be different.

AZURE_COSMOSDB_GRAPHDB_ACCT=cjoakimcosmosdbgremlin
AZURE_COSMOSDB_GRAPHDB_COLNAME=npm
AZURE_COSMOSDB_GRAPHDB_CONN_STRING= ...secret...
AZURE_COSMOSDB_GRAPHDB_DBNAME=dev
AZURE_COSMOSDB_GRAPHDB_GRAPH=npm
AZURE_COSMOSDB_GRAPHDB_VIEWS=views
AZURE_COSMOSDB_GRAPHDB_KEY= ...secret...
AZURE_COSMOSDB_GRAPHDB_URI=https://cjoakimcosmosdbgremlin.documents.azure.com:443/

PORT=3000  (Also add this environment variable for the localhost webserver port)
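A small helper that gathers these variables and fails fast when one is missing can save debugging time. This is a hypothetical sketch; the variable names come from the list above, but `loadConfig` and the returned field names are assumptions, and the sample may read its settings differently. The `/dbs/<db>/colls/<graph>` resource path is the user name format that Cosmos DB's Gremlin endpoint expects for key-based authentication.

```javascript
// Hypothetical helper: read the Cosmos DB settings above from the environment.
const REQUIRED_VARS = [
  "AZURE_COSMOSDB_GRAPHDB_ACCT",
  "AZURE_COSMOSDB_GRAPHDB_DBNAME",
  "AZURE_COSMOSDB_GRAPHDB_GRAPH",
  "AZURE_COSMOSDB_GRAPHDB_VIEWS",
  "AZURE_COSMOSDB_GRAPHDB_KEY",
  "AZURE_COSMOSDB_GRAPHDB_URI",
];

function loadConfig(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const db = env.AZURE_COSMOSDB_GRAPHDB_DBNAME;
  const graph = env.AZURE_COSMOSDB_GRAPHDB_GRAPH;
  return {
    account: env.AZURE_COSMOSDB_GRAPHDB_ACCT,
    sqlEndpoint: env.AZURE_COSMOSDB_GRAPHDB_URI, // used by the SQL API client
    key: env.AZURE_COSMOSDB_GRAPHDB_KEY,
    database: db,
    graphCollection: graph,
    viewsCollection: env.AZURE_COSMOSDB_GRAPHDB_VIEWS,
    // Gremlin key auth uses the collection's resource path as the user name.
    gremlinUser: `/dbs/${db}/colls/${graph}`,
    port: Number.parseInt(env.PORT || "3000", 10), // localhost web server port
  };
}

module.exports = { loadConfig };
```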

Batch Processing Overview

The batch processing does the following:

  1. Start with a hand-edited list of seed npm libraries that are interesting to you.

  2. Programmatically invoke the npm CLI to recursively "spider" npm for information about each library.

    • The spider process starts with your hand-edited list of seed npm libraries
    • The spider iterates n times to get the dependencies of those seed libraries
    • Then the dependencies of those libraries, and their dependencies, and so on
    • The command npm view <library> --json is executed for each library and the JSON response is captured
  3. Wrangle the JSON files for each library that are captured in the Spidering process.

  4. Generate Gremlin load statements, from the Wrangled data, to insert the Vertices and Edges for the npm graph.

    • The Vertices are the npm libraries as well as their Maintainers
    • Edges connect one library to another in a uses or used_by relationship
    • Edges also connect the Maintainers to each Library they maintain
    • Currently there isn't a knows Edge from one Maintainer to another within a Library.
  5. Load the Azure CosmosDB/Graph database from the generated Gremlin statements
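The spidering in step 2 amounts to a breadth-first walk with one dependency level per iteration. The sketch below is hypothetical (the sample drives this from spider_npm.sh); `spider` and `npmViewJson` are illustrative names, and the fetcher is injectable so the walk can be tested offline. On Windows, `execFileSync` may need `npm.cmd` instead of `npm`.

```javascript
// Hypothetical spider sketch: breadth-first expansion from the seed libraries.
const { execFileSync } = require("child_process");

// Runs `npm view <lib> --json` and parses the JSON printed to stdout.
function npmViewJson(lib) {
  const out = execFileSync("npm", ["view", lib, "--json"], { encoding: "utf8" });
  return JSON.parse(out);
}

// Returns { [libraryName]: npmViewJson } for everything reached in `iterations` levels.
function spider(seeds, iterations, fetchInfo = npmViewJson) {
  const results = {};
  let frontier = [...new Set(seeds)];
  for (let i = 0; i < iterations && frontier.length > 0; i++) {
    const next = new Set();
    for (const lib of frontier) {
      if (results[lib]) continue; // already captured in an earlier level
      const info = fetchInfo(lib);
      results[lib] = info;
      for (const dep of Object.keys(info.dependencies || {})) {
        if (!results[dep]) next.add(dep);
      }
    }
    frontier = [...next];
  }
  return results;
}

module.exports = { spider, npmViewJson };
```

Bounding the walk by iteration count, rather than running to a fixed point, keeps the spidering time predictable even when a seed library has a deep dependency tree.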

Batch Processing Detail

Since npm, and thus JavaScript, is the subject of this graph, the implementation code is Node.js, which is portable to Windows, Linux, and macOS. This repo provides both bash shell scripts (.sh) for Linux and macOS and PowerShell scripts (.ps1) for Windows.

First clone this repository and install the npm libraries necessary for this project in the project root directory.

$ git clone git@github.com:Azure-Samples/azure-cosmos-db-graph-npm-bom-sample.git

$ cd azure-cosmos-db-graph-npm-bom-sample

$ mkdir tmp

$ npm install 

Edit file seeds.txt, then execute the following:

$ node main.js seed2json

This creates file data/seed_libraries.json
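For orientation, a seed2json-style step might look like the following sketch. It is hypothetical: the `#` comment convention and the plain JSON array output are assumptions, not the documented format of data/seed_libraries.json.

```javascript
// Hypothetical seed2json sketch: one npm library name per line in, JSON array out.
const fs = require("fs");

// Trim lines, drop blanks and "#" comments (assumed convention), and de-duplicate.
function parseSeeds(text) {
  const names = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith("#"));
  return [...new Set(names)];
}

function seedsToJson(inFile, outFile) {
  const seeds = parseSeeds(fs.readFileSync(inFile, "utf8"));
  fs.writeFileSync(outFile, JSON.stringify(seeds, null, 2));
  return seeds;
}

module.exports = { parseSeeds, seedsToJson };
```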

Then execute the npm "Spidering" process, with 10 iterations.

$ ./spider_npm.sh

The above spidering process takes roughly 10 minutes to execute, depending on the number of seed libraries and your network bandwidth.

Then execute the data-wrangling and gremlin-statement-generation process:

$ ./wrangle_npm_data.sh

Note that the spidering process is intentionally decoupled from the wrangling process, and that the wrangling process writes intermediate files so that each step is easier to inspect and understand.
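The Gremlin-statement generation can be sketched as a function that emits one statement per line. Only the `uses_lib` edge label and the `MAINT-` id prefix come from this README; the vertex labels (`library`, `maintainer`), the `maintains` edge label, and the function names are hypothetical. The sketch relies on Cosmos DB's `g.V([partitionKey, id])` lookup form and sets `pk` on every vertex because the collection is partitioned on `/pk`.

```javascript
// Hypothetical sketch: generate Gremlin load statements from wrangled library data.

// Double-quoted Gremlin string literal with backslashes and quotes escaped.
function q(s) {
  return '"' + String(s).replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

// Address a vertex by [partition key, id]; both are the same value in this graph.
function vref(id) {
  return `g.V([${q(id)}, ${q(id)}])`;
}

function gremlinStatements(libs) {
  const known = new Set(Object.keys(libs));
  const vertices = [];
  const edges = [];
  const maintainers = new Set();
  for (const [name, info] of Object.entries(libs)) {
    vertices.push(`g.addV("library").property("id", ${q(name)}).property("pk", ${q(name)})`);
    for (const dep of Object.keys(info.dependencies || {})) {
      // Skip dependencies outside the spidered set: their vertices will not exist.
      if (known.has(dep)) edges.push(`${vref(name)}.addE("uses_lib").to(${vref(dep)})`);
    }
    for (const m of info.maintainers || []) {
      const id = `MAINT-${m.trim().split(" ")[0]}`; // "dougwilson <...>" -> "MAINT-dougwilson"
      if (!maintainers.has(id)) {
        maintainers.add(id);
        vertices.push(`g.addV("maintainer").property("id", ${q(id)}).property("pk", ${q(id)})`);
      }
      edges.push(`${vref(id)}.addE("maintains").to(${vref(name)})`);
    }
  }
  // All vertices first, so each edge statement finds both endpoints when loaded in order.
  return [...vertices, ...edges];
}

module.exports = { gremlinStatements };
```

Writing the vertex statements before the edge statements lets a simple line-by-line loader run the file in order without a second pass.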

Finally, load the generated file data/gremlin/gremlin_load_file.txt into the npm collection of your Azure CosmosDB Graph database.

$ ./load_gremlin_graph.sh
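Conceptually, the load step submits each statement in the file in order. Below is a hypothetical sketch of such a loader, not the one behind load_gremlin_graph.sh. `submit` is injected; with the `gremlin` npm package it could wrap `driver.Client#submit`. Failures are counted and logged rather than aborting the run.

```javascript
// Hypothetical loader sketch: submit each non-blank line of a Gremlin load file in order.
const fs = require("fs");

async function loadGremlinFile(path, submit, log = console.log) {
  const statements = fs
    .readFileSync(path, "utf8")
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  let ok = 0;
  let failed = 0;
  for (const stmt of statements) {
    try {
      await submit(stmt); // sequential, so file order (e.g. vertices before edges) is preserved
      ok++;
    } catch (err) {
      failed++;
      log(`failed: ${stmt} -> ${err.message}`);
    }
  }
  return { ok, failed };
}

module.exports = { loadGremlinFile };
```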

Also load the materialized views into the views collection of the same Azure CosmosDB Graph database.

$ ./load_materialized_views.sh

What the materialized view documents look like:

For Libraries:

  {
    "name": "express",
    "desc": "Fast, unopinionated, minimalist web framework",
    "keywords": [
      "express",
      "framework",
      "sinatra",
      "web",
      "rest",
      "restful",
      "router",
      "app",
      "api"
    ],
    "dependencies": {
      "accepts": "~1.3.7",
      "array-flatten": "1.1.1",
      "body-parser": "1.19.0",
      "content-disposition": "0.5.3",
      "content-type": "~1.0.4",
      "cookie": "0.4.0",
      "cookie-signature": "1.0.6",
      "debug": "2.6.9",
      "depd": "~1.1.2",
      "encodeurl": "~1.0.2",
      "escape-html": "~1.0.3",
      "etag": "~1.8.1",
      "finalhandler": "~1.1.2",
      "fresh": "0.5.2",
      "merge-descriptors": "1.0.1",
      "methods": "~1.1.2",
      "on-finished": "~2.3.0",
      "parseurl": "~1.3.3",
      "path-to-regexp": "0.1.7",
      "proxy-addr": "~2.0.5",
      "qs": "6.7.0",
      "range-parser": "~1.2.1",
      "safe-buffer": "5.1.2",
      "send": "0.17.1",
      "serve-static": "1.14.1",
      "setprototypeof": "1.1.1",
      "statuses": "~1.5.0",
      "type-is": "~1.6.18",
      "utils-merge": "1.0.1",
      "vary": "~1.1.2"
    },
    "devDependencies": {
      "after": "0.8.2",
      "connect-redis": "3.4.1",
      "cookie-parser": "~1.4.4",
      "cookie-session": "1.3.3",
      "ejs": "2.6.1",
      "eslint": "2.13.1",
      "express-session": "1.16.1",
      "hbs": "4.0.4",
      "istanbul": "0.4.5",
      "marked": "0.6.2",
      "method-override": "3.0.0",
      "mocha": "5.2.0",
      "morgan": "1.9.1",
      "multiparty": "4.2.1",
      "pbkdf2-password": "1.2.1",
      "should": "13.2.3",
      "supertest": "3.3.0",
      "vhost": "~3.0.2"
    },
    "author": "TJ Holowaychuk <[email protected]>",
    "users": {
      "422303771": true,
      "coverslide": true,
      "gevorg": true,
       ... many users ...
      "payaamemami": true,
      "pvoronin": true,
      "spaceface777": true
    },
    "contributors": [
      "Aaron Heckmann <[email protected]>",
      "Ciaran Jessup <[email protected]>",
      "Douglas Christopher Wilson <[email protected]>",
      "Guillermo Rauch <[email protected]>",
      "Jonathan Ong <[email protected]>",
      "Roman Shtylman <[email protected]>",
      "Young Jae Sim <[email protected]>"
    ],
    "maintainers": [
      "dougwilson <[email protected]>",
      "jasnell <[email protected]>",
      "mikeal <[email protected]>"
    ],
    "version": "4.17.1",
    "versions": [
      "0.14.0",
      "0.14.1",
      "1.0.0",
      "1.0.1",
      "1.0.2",
      "1.0.3",
      ... many versions...
      "3.21.0",
      "3.21.1",
      "3.21.2",
    ],
    "time": {
      "modified": "2019-05-28T18:15:26.253Z",
      "created": "2010-12-29T19:38:25.450Z",
      "0.14.0": "2010-12-29T19:38:25.450Z",
      "0.14.1": "2010-12-29T19:38:25.450Z",
      ... many versions ...
      "4.17.1": "2019-05-26T04:25:34.606Z"
    },
    "homepage": "http://expressjs.com/",
    "user_count": 2556,
    "dependencies_count": 30,
    "maintainers_count": 3,
    "versions_count": 263,
    "usage_count": 1,
    "used_in": [],
    "version_date": "2019-05-26T04:25:34.606Z",
    "created_date": "2010-12-29T19:38:25.450Z",
    "created_epoch": 1293651505450,
    "version_epoch": 1558844734606,
    "library_age_days": 3090,
    "version_age_days": 20,
    "pk": "express",
    "key": "express",
    "doctype": "library"
  }

For Maintainers:

  {
    "email": "<[email protected]>",
    "libs": [
      "basic-auth",
      "better-assert",
      "bytes",
      "callsite",
      "commander",
      "component-emitter",
      "cookie-signature",
      "debug",
      "delegates",
      "escape-html",
      "growl",
      "indexof",
      "merge-descriptors",
      "methods",
      "object-component",
      "range-parser",
      "statuses",
      "throttleit"
    ],
    "pk": "tjholowaychuk",
    "key": "tjholowaychuk",
    "doctype": "maintainer"
  }
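A maintainer document like the one above is essentially an inversion of the per-library `maintainers` lists. Here is a hypothetical sketch of that aggregation. It assumes entries look like `"name <email>"`, as in the library document, and `buildMaintainerViews` is an illustrative name rather than a function in this repo.

```javascript
// Hypothetical sketch: one view document per maintainer, listing the libraries they maintain.
function buildMaintainerViews(libs) {
  const byMaintainer = new Map();
  for (const [libName, info] of Object.entries(libs)) {
    for (const entry of info.maintainers || []) {
      // "dougwilson <addr>" -> name "dougwilson", email "<addr>" (email part optional)
      const match = /^(\S+)\s*(<[^>]*>)?/.exec(entry.trim());
      if (!match) continue;
      const [, name, email = ""] = match;
      if (!byMaintainer.has(name)) byMaintainer.set(name, { email, libs: new Set() });
      byMaintainer.get(name).libs.add(libName);
    }
  }
  return [...byMaintainer].map(([name, m]) => ({
    email: m.email,
    libs: [...m.libs].sort(),
    pk: name, // same partition key convention as the library views
    key: name,
    doctype: "maintainer",
  }));
}

module.exports = { buildMaintainerViews };
```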

Web Application

The Web Application for this project is implemented with Node.js and the Express web framework. D3.js is used in the client-side browser code for Graph Visualization.

$ cd webapp

$ npm install

$ ./webserver.sh
    ...
    Express server listening on port 3000
    ...

Then visit http://localhost:3000/ with your browser.

Web App Screen Shots

Splash Screen

[Screenshot: splash-screen]


Bill-of-Material View

[Screenshot: bom-view]


Library View

[Screenshot: library-view]


Maintainer View

[Screenshot: maintainer-view]


Gremlin Queries

g.V().count()

g.V(["tcx-js","tcx-js"])
g.V(["tedious","tedious"])
g.V(["express","express"])

g.V(["tcx-js", "tcx-js"]).emit().repeat(outE("uses_lib").inV()).times(16).path().by("id")
g.V(["express", "express"]).emit().repeat(outE("uses_lib").inV()).times(16).path().by("id")

g.V(["MAINT-cjoakim","MAINT-cjoakim"])
g.V(["MAINT-luisbosquez","MAINT-luisbosquez"])
g.V(["MAINT-tjholowaychuk","MAINT-tjholowaychuk"])
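In these queries, the two-element array passed to `g.V()` is Cosmos DB's [partition key, id] lookup; in this graph both values are the vertex id. Because `path().by("id")` runs over `outE().inV()` steps, each path alternates vertex ids and edge ids. The hypothetical sketch below converts such results into the `{ nodes, links }` shape commonly used with a D3 force layout. It assumes each path result arrives as an object with an `objects` array.

```javascript
// Hypothetical sketch: BOM path results -> de-duplicated D3-style nodes and links.
function pathsToGraph(paths) {
  const nodes = new Map();
  const links = new Map();
  for (const p of paths) {
    const objs = p.objects || [];
    // Even positions hold vertex ids; odd positions hold edge ids.
    for (let i = 0; i < objs.length; i += 2) {
      if (!nodes.has(objs[i])) nodes.set(objs[i], { id: objs[i] });
      if (i >= 2) {
        const source = objs[i - 2];
        const target = objs[i];
        links.set(`${source}->${target}`, { source, target });
      }
    }
  }
  return { nodes: [...nodes.values()], links: [...links.values()] };
}

module.exports = { pathsToGraph };
```

Because `emit()` returns every prefix of every path, many results overlap; keying nodes by id and links by `source->target` collapses them. In D3, such links can be matched to nodes with `d3.forceLink(links).id(d => d.id)`.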
