ShiftLeft TinkerGraph

This is a fork of Apache TinkerGraph that uses 70% less memory (for our use case, YMMV) and implements strict schema validation. Related blog article on the ShiftLeft Blog.

Usage

  1. add a dependency on the latest published artifact on Maven Central
  2. extend SpecializedTinkerVertex for vertices and SpecializedTinkerEdge for edges
  3. create instances of SpecializedElementFactory.ForVertex and SpecializedElementFactory.ForEdge and pass them to TinkerGraph.open

The repository contains examples for the Grateful Dead graph, and there is a full test setup that uses them. Steps 2 and 3 are basically boilerplate and therefore good candidates for code generation.

Other than that, it's a minimally invasive operation, because all other graph and traversal APIs remain the same, i.e., you won't need to change any of your queries. We didn't encounter a single issue when we deployed this into production.

Motivation and context

The main difference is that instead of generic HashMaps we use specific structures tailored to your domain. To make this clearer, let's look at the main use cases for HashMaps in TinkerGraph:

  1. allow any vertex and any edge to have any property (basically a key/value pair, e.g., foo=42). To achieve this, each element in the graph has a Map<String, Property>, and each property is wrapped inside a HashMap$Node, see TinkerVertex and TinkerEdge.
  2. TinkerGraph allows any two vertices to be connected by any edge. Therefore each vertex holds two Map<String, Set<Edge>> instances (one for incoming and one for outgoing edges), where the String refers to the edge label.
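
The two use cases above can be sketched with simplified stand-in classes. This is only an illustration of the generic layout: the class names (GenericLayoutSketch, GenericVertex, GenericEdge) are hypothetical and not the real TinkerVertex and TinkerEdge, and the labels are borrowed from the Grateful Dead example graph.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified stand-ins for illustration only; NOT the real TinkerGraph classes.
final class GenericLayoutSketch {

    static final class GenericEdge {
        final String label;
        final GenericVertex out;
        final GenericVertex in;

        GenericEdge(String label, GenericVertex out, GenericVertex in) {
            this.label = label;
            this.out = out;
            this.in = in;
        }
    }

    static final class GenericVertex {
        final String label;
        // Use case 1: any property key may be set on any vertex.
        final Map<String, Object> properties = new HashMap<>();
        // Use case 2: any edge label may connect this vertex to any other vertex.
        final Map<String, Set<GenericEdge>> outEdges = new HashMap<>();
        final Map<String, Set<GenericEdge>> inEdges = new HashMap<>();

        GenericVertex(String label) {
            this.label = label;
        }

        GenericEdge addEdge(String edgeLabel, GenericVertex target) {
            GenericEdge edge = new GenericEdge(edgeLabel, this, target);
            outEdges.computeIfAbsent(edgeLabel, k -> new HashSet<>()).add(edge);
            target.inEdges.computeIfAbsent(edgeLabel, k -> new HashSet<>()).add(edge);
            return edge;
        }
    }

    public static void main(String[] args) {
        GenericVertex song = new GenericVertex("song");
        GenericVertex artist = new GenericVertex("artist");
        song.properties.put("name", "Dark Star");
        song.properties.put("foo", 42);      // nothing stops an unrelated key
        artist.addEdge("followedBy", song);  // nothing stops a nonsensical edge
        System.out.println(song.properties.keySet());
    }
}
```

Every vertex pays for three maps plus their internal nodes, even if it only ever has a couple of properties and one kind of edge.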

Being generic and not enforcing a schema makes complete sense for the default TinkerGraph - it allows users to play without restrictions and build prototypes. Once a project is more mature though, chances are you have a good understanding of your domain and can define a schema, so that you don't need the generic structure any more and can save a lot of memory.

Using less memory is not the only benefit, though: knowing exactly which properties a given element can have, which types they are, and which edges are allowed on a specific vertex helps catch errors very early in the development cycle. Your IDE can help you build valid (i.e., schema-conforming) graphs and traversals. If you use a statically-checked language, your compiler can find errors that would otherwise only occur at runtime. Even if you are using a dynamic language you are better off, because you'll get an error when you load the graph, e.g., when setting a property on the wrong vertex type. This is far better than getting invalid results at query time, when you need to debug all the way back to a potentially very simple mistake. Since we already had a loosely-defined schema for our code property graph, this exercise helped to complete and strengthen it.

What does this mean in practice?

'Enforcing a strict schema' actually translates to something very simple: we just replaced the generic HashMaps with specific members:

  1. Element properties: vertices and edges contain a generic HashMap<String, Object> that holds all the element's properties. We just replaced it with specific class members, e.g., String name and String return_type

  2. Edges on a vertex: the generic TinkerVertex contains two HashMap<String, Set<Edge>> in|outEdges which can reference any edge. We replaced these by specific Set<SomeSpecificEdgeType> for each edge type that is allowed to connect this vertex with another vertex.

This means that we can throw an error if the schema is violated, e.g., if the user tries to set a property that is not defined for a specific vertex, or tries to connect a vertex via an edge that's not supposed to be connected to it. It is important to note, though, that it's up to you whether to make this validation strict: you can choose to tolerate schema violations in your domain classes.
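
As a minimal sketch of that replacement, the following uses hypothetical domain classes (Song, Artist, SungBy, loosely modelled on the Grateful Dead graph). It does not extend SpecializedTinkerVertex or use any of the fork's real API; the method names setProperty and addEdge here are assumptions chosen for illustration, and the strict behaviour shown is just one of the two options mentioned above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical domain classes for illustration only; NOT the fork's actual API.
final class SpecializedLayoutSketch {

    static final class Artist {
        String name;
    }

    // The only edge type that may leave a Song in this sketch.
    static final class SungBy {
        final Song out;
        final Artist in;

        SungBy(Song out, Artist in) {
            this.out = out;
            this.in = in;
        }
    }

    static final class Song {
        // Properties are plain fields instead of entries in a HashMap.
        String name;
        String songType;
        // One typed set per allowed edge type instead of Map<String, Set<Edge>>.
        final Set<SungBy> sungByOut = new HashSet<>();

        // Strict variant: unknown keys are rejected while the graph is built.
        void setProperty(String key, Object value) {
            switch (key) {
                case "name":
                    name = (String) value;
                    break;
                case "songType":
                    songType = (String) value;
                    break;
                default:
                    throw new IllegalArgumentException(
                        "property '" + key + "' is not defined for vertex type 'song'");
            }
        }

        // Strict variant: only a sungBy edge to an Artist is accepted.
        SungBy addEdge(String label, Object target) {
            if ("sungBy".equals(label) && target instanceof Artist) {
                SungBy edge = new SungBy(this, (Artist) target);
                sungByOut.add(edge);
                return edge;
            }
            throw new IllegalArgumentException(
                "edge '" + label + "' to " + target.getClass().getSimpleName()
                    + " is not allowed on vertex type 'song'");
        }
    }

    public static void main(String[] args) {
        Song song = new Song();
        song.setProperty("name", "Dark Star");
        song.addEdge("sungBy", new Artist());
        try {
            song.setProperty("foo", 42); // not part of the schema
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A tolerant variant would replace the throw statements with a fallback of your choice, such as ignoring the value.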

Limitations

  • indices aren't updated automatically when you mutate or add elements to the graph. This would be easy to do I guess, but we haven't had the need yet. Workaround: drop and recreate the index.
  • an OLAP (GraphComputer) implementation is available, but we haven't really tested it yet
  • you cannot (yet) mix generic and specialized Elements: it's all or nothing, and you'll get an error if you accidentally try
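
The first limitation and its workaround can be illustrated with a toy index. Everything in this sketch (ToyGraph, createNameIndex, dropNameIndex, lookup) is hypothetical and only mimics the behaviour described above. In Apache TinkerGraph the corresponding calls are createIndex and dropIndex on the graph instance; that this fork keeps them unchanged is an assumption based on the statement that the other graph APIs remain the same.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of an index that is built once and never updated; NOT the real TinkerGraph index.
final class StaleIndexSketch {

    static final class Vertex {
        final String name;

        Vertex(String name) {
            this.name = name;
        }
    }

    static final class ToyGraph {
        final List<Vertex> vertices = new ArrayList<>();
        // null while no index exists
        private Map<String, Set<Vertex>> nameIndex;

        // Deliberately does not touch the index, mirroring the limitation.
        Vertex addVertex(String name) {
            Vertex v = new Vertex(name);
            vertices.add(v);
            return v;
        }

        // Builds the index from the vertices that exist right now.
        void createNameIndex() {
            nameIndex = new HashMap<>();
            for (Vertex v : vertices) {
                nameIndex.computeIfAbsent(v.name, k -> new HashSet<>()).add(v);
            }
        }

        void dropNameIndex() {
            nameIndex = null;
        }

        Set<Vertex> lookup(String name) {
            if (nameIndex == null) {
                throw new IllegalStateException("no index on 'name'");
            }
            return nameIndex.getOrDefault(name, Collections.emptySet());
        }
    }

    public static void main(String[] args) {
        ToyGraph graph = new ToyGraph();
        graph.addVertex("Dark Star");
        graph.createNameIndex();
        graph.addVertex("Bertha");                            // added after the index was built
        System.out.println(graph.lookup("Bertha").size());    // stale index misses it

        // Workaround: drop and recreate the index.
        graph.dropNameIndex();
        graph.createNameIndex();
        System.out.println(graph.lookup("Bertha").size());    // now found
    }
}
```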

Bring in changes from upstream TinkerGraph

When a new Apache TinkerGraph version is released, here are the steps to bring its changes into this fork:

# view diff
cd ~/Projects/tinkerpop/tinkerpop3
git diff 3.3.2..3.3.3 tinkergraph-gremlin/src > ~/tp-upgrade.patch
# apply patch (-p2 strips the base directory, which is different in our fork)
cd ~/Projects/shiftleft/tinkergraph-gremlin
git apply -p2 ~/tp-upgrade.patch
# manually fix all conflicts (*.orig / *.rej files)
# update all versions in pom.xml
mvn clean test

Release instructions

  • change the version in pom.xml to a non-snapshot (e.g. 3.3.0.3)
  • commit and tag it (e.g. v3.3.0.3), push everything (including the tag!)
  • wait for Travis to automatically deploy the tagged version to Sonatype and stage it so that it'll be synchronized to Maven Central within a few hours. Note: check the log output of the last Travis step ($ ./travis/deploy.sh) to be sure. You should see something like the following at the very end:
[INFO] Remote staged 1 repositories, finished with success.
[INFO] Remote staging repositories are being released...
Waiting for operation to complete...
............
[INFO] Remote staging repositories released.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------

  • change the version to the next snapshot (e.g. 3.3.0.4-SNAPSHOT)

shiftleft-tinkergraph-gremlin's People

Contributors

davyhua, fabsx00, hubertp, jeremysears, ml86, mpollmeier, robertdale
