segment.io-avro's Introduction

Segment Spec Avro Implementation

This project provides a full implementation of the data types defined in the Segment Spec in Avro (1.8.x).

Building

To build the Avro classes, run:

$ mvn compile

Testing

To execute all tests, run:

$ mvn clean test

Adding Additional Test Cases

By default, the test framework tries to find a valid Avro schema for each event (a .json file) placed in ./testCases. It then checks that serializing the given JSON to Avro and deserializing the Avro-encoded data again happens without data loss.

If you wish to add a new test event, simply put it into a new file in ./testCases with a name ending in .json, and the integration tests will pick it up automatically.
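
As a rough illustration of the pickup convention described above, the sketch below lists every .json file directly inside a directory. It is not the project's actual test harness: the class name TestCaseDiscovery and both method names are hypothetical, and the schema lookup and Avro round trip are left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the project: finds candidate test events.
final class TestCaseDiscovery {

    /** True if the file name follows the ".json" convention used for test events. */
    public static boolean isTestCaseFile(String fileName) {
        return fileName.endsWith(".json");
    }

    /** Returns all regular .json files directly inside dir, sorted by path. */
    public static List<Path> findTestCases(Path dir) throws IOException {
        // Files.list keeps a directory handle open, so close the stream when done.
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(p -> isTestCaseFile(p.getFileName().toString()))
                .sorted()
                .collect(Collectors.toList());
        }
    }
}
```

Calling TestCaseDiscovery.findTestCases(Paths.get("./testCases")) would return one path per event file; subdirectories are not searched in this sketch.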

segment.io-avro's People

Contributors

original-brownbear, acmeguy

segment.io-avro's Issues

Dry up Common Fields

The common fields cause a lot of duplication across the schemata. Since Avro has no inheritance, their definitions are currently repeated in every schema.

  • Question 1: Do we want to fix this?
  • Question 2: If yes, what kind of fix do we want?
    • I see two possible approaches here:
      • If we want to keep the 1:1 correspondence between the JSON and Avro field hierarchies, we could generate the Avro sources before compiling them with the Avro compiler. Concretely, we could put a placeholder such as {{common_fields}} in an avsc file and have Maven replace it with the common field definitions before compilation.
      • Better option: if we can live with the Avro and JSON hierarchies not corresponding 1:1, we could simply create a record schema CommonFields that holds all the common fields and put it at the top level of every other schema under a key such as common_fields.
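
To make the first approach more concrete, here is a minimal sketch of the placeholder substitution written as plain Java string replacement. It only illustrates the idea. The class name CommonFieldsTemplate and the method expand are hypothetical, and in the actual build the replacement would presumably be done by a Maven step rather than by code like this.

```java
// Hypothetical helper illustrating the {{common_fields}} substitution idea.
final class CommonFieldsTemplate {

    /** Placeholder text that an avsc template would contain. */
    static final String PLACEHOLDER = "{{common_fields}}";

    /**
     * Replaces every occurrence of the placeholder with the given field
     * definitions (a comma-separated list of Avro field objects, without
     * surrounding brackets). Templates without the placeholder are
     * returned unchanged.
     */
    public static String expand(String template, String commonFieldsJson) {
        return template.replace(PLACEHOLDER, commonFieldsJson);
    }
}
```

For example, expanding a template whose fields array is [{{common_fields}}, {"name": "event", "type": "string"}] with a single field definition {"name": "messageId", "type": "string"} yields an array holding both fields. The field shown is illustrative only; this does no JSON validation, so a malformed snippet would surface only when the Avro compiler runs.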

How to Handle Spaces in Field Names?

Avro can't handle spaces in field names; JSON can.
So far I have found only one case where this is a problem: the Group example contains a field "total billed", which is valid JSON but not a valid Avro field name.

For now, I handled this by encoding every space as __escape_space__ in Avro to make the tests pass.
The other approach would be to keep a dictionary of these fields and replace from that. The upside is nicer Avro field names. The downside is that it's not entirely trivial, because the dictionary would have to be path aware, which takes away a lot of the simple logic that Jackson gives us for converting JSON to Avro.
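
The current workaround boils down to a reversible string substitution. The sketch below shows that idea in isolation. The class and method names are hypothetical, and it is not the code used in the tests.

```java
// Hypothetical helper showing the space-escaping workaround in isolation.
final class FieldNameEscaper {

    /** Token that stands in for a single space in an Avro field name. */
    static final String SPACE_TOKEN = "__escape_space__";

    /** Converts a JSON field name into a name without spaces. */
    public static String toAvroName(String jsonName) {
        return jsonName.replace(" ", SPACE_TOKEN);
    }

    /** Restores the original JSON field name from the escaped Avro name. */
    public static String toJsonName(String avroName) {
        return avroName.replace(SPACE_TOKEN, " ");
    }
}
```

With this, "total billed" becomes total__escape_space__billed and converts back unchanged. Two limitations of the sketch: a JSON field name that already contains the literal token would not round-trip, and only spaces are handled, not other characters that Avro names disallow.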

Heterogeneous Style for Playback Event Ok?

There is a difference between how VideoPlaybackEvent and, e.g., the various mobile events work.

While the mobile events each have different properties, the VideoPlaybackEvents all share the same properties across their various event types.
This initially prompted me to just implement a single event for all of them with an enum to describe the different kinds:

    {
      "name": "event",
      "type": {
        "type": "enum",
        "name": "PlaybackEventType",
        "symbols": [
          "VIDEO_PLAYBACK_STARTED",
          "VIDEO_PLAYBACK_PAUSED",
          "VIDEO_PLAYBACK_INTERRUPTED",
          "VIDEO_PLAYBACK_BUFFER_COMPLETED",
          "VIDEO_PLAYBACK_SEEK_STARTED",
          "VIDEO_PLAYBACK_SEEK_COMPLETED",
          "VIDEO_PLAYBACK_RESUMED",
          "VIDEO_PLAYBACK_COMPLETED"
        ]
      }
    }

Question here:

Is this the desired solution, or should I split this up into a separate class for each event type?

  • I think it's the DRYest we can get here, but it also requires custom logic for translating the JSON into Avro, both in the tests and in future production code.
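
The custom translation logic mentioned above could look roughly like the sketch below. It assumes that the JSON carries the event name as a human-readable string such as "Video Playback Started", which this page does not confirm. The hand-written enum is a stand-in for the class the Avro compiler would generate from the schema, and fromEventName is a hypothetical helper.

```java
import java.util.Locale;

// Stand-in for the enum that would be generated from the PlaybackEventType schema.
enum PlaybackEventType {
    VIDEO_PLAYBACK_STARTED,
    VIDEO_PLAYBACK_PAUSED,
    VIDEO_PLAYBACK_INTERRUPTED,
    VIDEO_PLAYBACK_BUFFER_COMPLETED,
    VIDEO_PLAYBACK_SEEK_STARTED,
    VIDEO_PLAYBACK_SEEK_COMPLETED,
    VIDEO_PLAYBACK_RESUMED,
    VIDEO_PLAYBACK_COMPLETED;

    /**
     * Maps an event name (assumed form: "Video Playback Started") to its
     * enum symbol by replacing whitespace runs with underscores and
     * upper-casing. Throws IllegalArgumentException for unknown events.
     */
    public static PlaybackEventType fromEventName(String eventName) {
        String symbol = eventName.trim()
            .replaceAll("\\s+", "_")
            .toUpperCase(Locale.ROOT);
        return valueOf(symbol);
    }
}
```

The trade-off is visible here: one schema covers all eight events, but every producer and consumer has to agree on this name-to-symbol mapping.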
