
opensnowcat-enrich's Introduction

OpenSnowcat Enrich


Welcome

OpenSnowcat Enrich is an open-source fork of Snowplow Enrich, created after Snowplow's license changes in 2023 and early 2024. Our goal is to sustain an analytics platform for the many businesses that depend on the rights granted by the original Apache v2.0 License.

OpenSnowcat Enrich provides record-level enrichment only: feeding in 1 raw Snowplow event will yield exactly 1 record out, where a record may be an enriched Snowplow event or a reported bad record.
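To make the one-in, one-out contract concrete, here is a minimal Python sketch. The types `EnrichedEvent` and `BadRecord`, the function names, and the toy `key=value` input format are hypothetical illustrations, not OpenSnowcat's actual (Scala) API. The point is only that a failure becomes a bad record instead of being dropped or split into several outputs.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical types for illustration; not OpenSnowcat's actual classes.
@dataclass
class EnrichedEvent:
    event_id: str
    fields: dict


@dataclass
class BadRecord:
    raw: str
    error: str


Record = Union[EnrichedEvent, BadRecord]


def enrich_one(raw: str) -> Record:
    """Enrich a single raw event; any failure yields a bad record, never a drop."""
    try:
        # Toy raw format (assumption): comma-separated key=value pairs.
        fields = dict(pair.split("=", 1) for pair in raw.split(","))
        return EnrichedEvent(event_id=fields["eid"], fields=fields)
    except (ValueError, KeyError) as exc:
        return BadRecord(raw=raw, error=repr(exc))


def enrich_batch(raws: List[str]) -> List[Record]:
    # Record-level: each input maps to exactly one output, in the same order.
    return [enrich_one(r) for r in raws]
```

Because `enrich_batch` is a plain map, the output list always has the same length as the input: for example, `enrich_batch(["eid=1,e=pv", "garbage", "e=pv"])` yields one enriched event followed by two bad records.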


Security

If you discover a potential security issue in this project, we ask that you notify OpenSnowcat Security directly via email to [email protected]. Please do not create a public GitHub issue.

License

This project is licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Copyright

Copyright OpenSnowcat Contributors. See NOTICE for details.

Trademark

OpenSnowcat includes certain Apache-licensed Snowplow code from Snowplow Ltd. and other source code. Snowplow Ltd. is not the source of that other source code. SNOWPLOW is a registered trademark of Snowplow Ltd.

opensnowcat-enrich's People

Contributors

aalekh, aldemirenes, alexanderdean, alexitc, benfradet, benjben, bogaert, chuwy, dilyand, fblundun, istreeter, jbeemster, joaolcorreia, knservis, lmath, lukeindykiewicz, matus-tomlein, miike, misterpig, ninjabear, oguzhanunlu, peel, pondzix, ronnyml, rzats, spenes, stanch, szareiangm, voropaevp, wiringbits-bot


opensnowcat-enrich's Issues

Check maxRecordSize when producing a JSON output by eventbridge and kinesis

Following up on #1 and #4: I have noticed a problem with the JSON output produced by the eventbridge and kinesis modules. The JSON output can exceed the maxRecordSize value because it is encoded without any size validation.

A potential alternative is to add a feature flag that defines whether the output is encoded as TSV or JSON; this approach could then be applied to all outputs. Similarly, another feature flag could specify whether the JSON output is flattened (or a single flag could specify the format).

Ideally, this important functionality should be covered by tests.

See Enrich#serializeEnriched
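The two ideas above, validating the size of the bytes that are actually emitted and choosing the encoding through a flag, can be sketched together. This is a hypothetical Python illustration, not the code in Enrich#serializeEnriched: `OutputFormat`, `serialize`, `serialize_checked`, and the bad-record shape are invented names, and `max_record_size` stands in for the configured maxRecordSize.

```python
import json
from enum import Enum
from typing import Dict, List, Optional, Tuple


class OutputFormat(Enum):
    # Hypothetical feature-flag values, as proposed in the issue.
    TSV = "tsv"
    JSON = "json"


def serialize(event: Dict[str, Optional[str]], fmt: OutputFormat,
              field_order: List[str]) -> bytes:
    if fmt is OutputFormat.TSV:
        # Missing or null fields become empty columns.
        cols = ["" if event.get(f) is None else str(event[f]) for f in field_order]
        return "\t".join(cols).encode("utf-8")
    return json.dumps(event, separators=(",", ":")).encode("utf-8")


def serialize_checked(event: Dict[str, Optional[str]], fmt: OutputFormat,
                      field_order: List[str],
                      max_record_size: int) -> Tuple[str, bytes]:
    """Return ("good", payload) or ("bad", payload).

    The limit is checked on the encoded bytes, whatever the format,
    so JSON output gets the same protection as TSV.
    """
    payload = serialize(event, fmt, field_order)
    if len(payload) <= max_record_size:
        return "good", payload
    # Oversized: emit a small bad record instead (hypothetical shape).
    bad = {"error": "size_violation", "actual_bytes": len(payload),
           "max_bytes": max_record_size}
    return "bad", json.dumps(bad).encode("utf-8")
```

The same event can fit as TSV and still exceed the limit as JSON, since JSON repeats every field name; that is why the check belongs after encoding rather than before it, and it is the behavior worth pinning down in tests.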

Fix release workflow attachments

When the release workflow runs, the jar files must be attached to the GitHub release.

We have tried running this workflow (https://github.com/opensnowcat/opensnowcat-enricher/actions/runs/7487179270/job/20379062445) and got a few warnings saying that the jar files can't be found:

🤔 Pattern 'modules/pubsub/target/scala-2.12/opensnowcat-enrich-pubsub-v0.0.1.jar' does not match any files.
🤔 Pattern 'modules/kinesis/target/scala-2.12/opensnowcat-enrich-kinesis-v0.0.1.jar' does not match any files.
🤔 Pattern 'modules/kafka/target/scala-2.12/opensnowcat-enrich-kafka-v0.0.1.jar' does not match any files.
🤔 Pattern 'modules/nsq/target/scala-2.12/opensnowcat-enrich-nsq-v0.0.1.jar' does not match any files.
🤔 modules/pubsub/target/scala-2.12/opensnowcat-enrich-pubsub-v0.0.1.jar,modules/kinesis/target/scala-2.12/opensnowcat-enrich-kinesis-v0.0.1.jar,modules/kafka/target/scala-2.12/opensnowcat-enrich-kafka-v0.0.1.jar,modules/nsq/target/scala-2.12/opensnowcat-enrich-nsq-v0.0.1.jar not include valid file.

This is likely caused by the partial renames introduced in #5.

Although the workflow succeeded, the GitHub release has no jars attached.
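One way to stop the release from silently succeeding is to check the artifact paths before the upload step and fail if any module has no jar. The sketch below is hypothetical and is not part of the repository's workflow: the module names and the `modules/<name>/target/scala-2.12/` layout come from the warnings above, while matching `*.jar` instead of a hard-coded file name is an assumption meant to tolerate renames like those in #5.

```python
import glob
import os
from typing import List

# Module names taken from the warning paths in this issue.
MODULES = ["pubsub", "kinesis", "kafka", "nsq"]


def expected_patterns(root: str, scala_version: str = "2.12") -> List[str]:
    # Wildcard on the jar name so the check survives artifact renames
    # (assumption: each module's target directory holds its release jar).
    return [
        os.path.join(root, "modules", m, "target", f"scala-{scala_version}", "*.jar")
        for m in MODULES
    ]


def missing_artifacts(root: str) -> List[str]:
    """Return every expected pattern that matches no file under root."""
    return [p for p in expected_patterns(root) if not glob.glob(p)]
```

In CI, a step could call `missing_artifacts(".")` after the build and exit with a non-zero status when the list is non-empty, so a missing jar fails the workflow instead of producing a release without attachments.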

Fix EventBridge module integration tests

Following up on #3: #4 introduced a few important test cases (EnrichEventbridgeSpec.scala) that need more work before they can be enabled. Let's fix and enable these tests.

Hint: the problem seems to be that we encode the JSON with flattened fields, which can't be parsed directly as an Event. We need to find a way around this.

Fix lacework.yml workflow

We tried running the workflow (see https://github.com/opensnowcat/opensnowcat-enricher/actions/runs/7487073724/job/20378732392?pr=7) but it failed with this error: ERROR: Authentication details are not configured.

Everything indicates that we need to create an account and then add the necessary values to the GitHub secrets.

Let's fix this workflow.

Experimental json output is not compatible with BigQuery

Hi, first of all, thanks for your efforts in providing this repo and the new features.

I deployed 1.1.0 using this config:

{
  ...
  "experimental": {
    "customOutputFormat": {
      "type": "FlattenedJson"
    }
  }
  ...
}

I am using a Pub/Sub BigQuery subscription, but writing to BigQuery fails because date/time/timestamp values need to be integers in the JSON, according to https://cloud.google.com/pubsub/docs/bigquery#date_time_int

Could we plan a fix for this? I am also open to contributing or opening a PR if needed.
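A possible fix is to rewrite timestamp fields as integers before publishing. The sketch below assumes the target BigQuery columns are TIMESTAMP and that the integer form they accept is microseconds since the Unix epoch (an assumption based on the linked Pub/Sub page; DATE and TIME columns would use different units). Selecting fields by the `_tstamp` suffix, treating values without an offset as UTC, and the helper names `to_epoch_micros` and `bigquery_friendly` are all hypothetical, not part of OpenSnowcat.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(value: str) -> int:
    """Convert an ISO-8601 timestamp string to integer microseconds since epoch."""
    # Accepts e.g. "2024-01-10T12:00:00.123Z" or "2024-01-10 12:00:00.123".
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    # Dividing timedeltas keeps the result an exact integer (no float rounding).
    return (dt - EPOCH) // timedelta(microseconds=1)


def bigquery_friendly(event: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical rule: convert every string field whose name ends in "_tstamp".
    return {
        k: to_epoch_micros(v) if k.endswith("_tstamp") and isinstance(v, str) else v
        for k, v in event.items()
    }
```

Null timestamps (for example an absent `true_tstamp`) pass through unchanged, so BigQuery can still write them as NULL.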
