
dbt-snowplow-mobile's Introduction




As of January 8, 2024, Snowplow is introducing the Snowplow Limited Use License Agreement, and we will be releasing new versions of our core behavioral data pipeline technology under this license.

Our mission to empower everyone to own their first-party customer behavioral data remains the same. We value all of our users and remain dedicated to helping our community use Snowplow in the optimal capacity that fits their business goals and needs.

We reflect on our Snowplow origins and provide more information about these changes in our blog post here → https://eu1.hubs.ly/H06QJZw0


Overview

Snowplow is a developer-first engine for collecting behavioral data.

Thousands of organizations like Burberry, Strava, and Auto Trader rely on Snowplow to collect, manage, and operationalize real-time event data from their central data platform to uncover deeper customer journey insights, predict customer behaviors, deliver differentiated customer experiences, and detect fraudulent activities.


Why Snowplow?

  • 🏔️ “Glass-box” technical architecture capable of processing billions of events per day.
  • 🛠️ Over 20 SDKs to collect data from web, mobile, server-side, and other sources.
  • ✅ A unique approach based on schemas and validation ensures your data is as clean as possible.
  • 🪄 Over 15 enrichments to get the most out of your data.
  • 🏭 Stream data to your data warehouse/lakehouse or SaaS destinations of choice — Snowplow fits nicely within the Modern Data Stack.

➡ Where to start? ⬅️

Snowplow Community Edition: equips you with everything you need to start creating behavioral data in a high-fidelity, machine-readable way. Head over to the Quick Start Guide to set things up.

Snowplow Behavioral Data Platform: looking for an enterprise solution with a console, APIs, data governance, and workflow tooling? The Behavioral Data Platform is our managed service that runs in your AWS, Azure, or GCP cloud. Book a demo.

The documentation is a great place to learn more.

Would rather dive into the code? Then you are already in the right place!


Snowplow technology 101

Snowplow architecture

The repository structure follows the conceptual architecture of Snowplow, which consists of six loosely-coupled sub-systems connected by five standardized data protocols/formats.

To briefly explain these six sub-systems:

  • Trackers fire Snowplow events. Currently we have 15 trackers, covering web, mobile, desktop, server, and IoT.
  • Collector receives Snowplow events from trackers. Currently we have one official collector implementation with different sinks: Amazon Kinesis, Google PubSub, Amazon SQS, Apache Kafka, and NSQ.
  • Enrich cleans up the raw Snowplow events, enriches them, and puts them into storage. Currently we have several implementations, built for different environments (GCP, AWS, Apache Kafka), and one core library.
  • Storage is where the Snowplow events live. Currently we store the Snowplow events in a flat file structure on S3, and in the Redshift, Postgres, Snowflake, and BigQuery databases.
  • Data modeling is where event-level data is joined with other data sets and aggregated into smaller data sets, and business logic is applied. This produces a clean set of tables which make it easier to perform analysis on the data. We officially support data models for Redshift, Snowflake, and BigQuery.
  • Analytics are performed on the Snowplow events or on the aggregate tables.

For more information on the current Snowplow architecture, please see the Technical architecture.


About this repository

This repository is an umbrella repository for all loosely-coupled Snowplow components and is updated on each component release.

Since June 2020, all components have been extracted into their dedicated repositories (more info here) and this repository serves as an entry point for Snowplow users and as a historical artifact.

Components that have been extracted to their own repository are still here as git submodules.

Trackers

A full list of supported trackers can be found on our documentation site. Popular trackers and use cases include:

  • Web: JavaScript, AMP
  • Mobile: Android, iOS, React Native, Flutter
  • Gaming: Unity, C++, Lua
  • TV: Roku, iOS, Android, React Native
  • Desktop & Server: Command line, .NET, Go, Java, Node.js, PHP, Python, Ruby, Scala, C++, Rust, Lua

Loaders

Iglu

Data modeling

  • Web
  • Mobile
  • Media
  • Retail

Testing

Parsing enriched event


Community

We want to make it super easy for Snowplow users and contributors to talk to us and connect with one another, to share ideas, solve problems and help make Snowplow awesome. Join the conversation:

  • Meetups. Don’t miss your chance to talk to us in person. We are often on the move with meetups in Amsterdam, Berlin, Boston, London, and more.
  • Discourse. Our forum for all Snowplow users: engineers setting up Snowplow, data modelers structuring the data, and data consumers building insights. You can find guides, recipes, questions and answers from Snowplow users and the Snowplow team. All questions and contributions are welcome!
  • Twitter. Follow @Snowplow for official news and @SnowplowLabs for engineering-heavy conversations and release announcements.
  • GitHub. If you spot a bug, please raise an issue in the GitHub repository of the component in question. Likewise, if you have developed a cool new feature or an improvement, please open a pull request; we'll be glad to integrate it into the codebase! For brainstorming a potential new feature, Discourse is the best place to start.
  • Email. If you want to talk to Snowplow directly, email is the easiest way. Get in touch at [email protected].

Copyright and license

Snowplow is copyright 2012-2023 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

dbt-snowplow-mobile's People

Contributors

agnessnowplow, bill-warner, emielver, georgewoodhead, paulboocock, rahul-snowplow, rlh1994


dbt-snowplow-mobile's Issues

Clarify that some tables are Redshift & Postgres only

Describe the feature

Some users are confused by the wording in the docs regarding the context tables for page views. Can we clarify the wording to make it clear that these tables are processed/present in Redshift and Postgres only?

Tests sometimes fail on previous_session_id unique column

Describe the bug

Described in a customer ticket: somehow, in the tracking, previous_session_id is sometimes not unique at the session- and user-table level (where we run the tests). We think this happens when someone has ported files to a new phone but then continued to use both devices.

Steps to reproduce

N/A

Expected results

Tests pass

Actual results

Tests fail

Are you interested in contributing towards the fix?

Yes, will reduce the test severity to warn so that the models can still pass, while alerting people if the test fails.
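A minimal sketch of that severity change in the model's schema .yml (the exact model name and file layout here are assumptions):

models:
  - name: snowplow_mobile_sessions
    columns:
      - name: previous_session_id
        tests:
          - unique:
              config:
                severity: warn  # models still build; failures surface as warnings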

Integration tests do not run on Redshift

Describe the bug

The type used for the context/unstruct columns does not work on Redshift; the lengths need specifying.

Steps to reproduce

Run the integration tests on Redshift.

Expected results

They work

Actual results

The seed fails


Additional context

The column types just need changing to varchar(65535) instead.
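A sketch of pinning the seed columns to the wider type via dbt's column_types seed config in dbt_project.yml (the seed and column names here are hypothetical):

seeds:
  snowplow_mobile_integration_tests:
    +column_types:
      # any context/unstruct JSON column needs Redshift's max varchar width
      contexts_com_snowplowanalytics_mobile_screen_1: varchar(65535)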

Are you interested in contributing towards the fix?

Will do with the next suitable release.

Bump copyright to 2022

For the next release we should bump the copyright notices to state 2022 rather than 2021.

Clean up the incremental_manifest table varchar logic

Describe the feature

Take advantage of the type_string functionality in the snowplow_utils package to clean up the incremental manifest table logic:

{# Redshift produces varchar(1) column. Fixing char limit #}
{% set type_string = dbt_utils.type_string() %}
{% set type_string = 'varchar(4096)' if type_string == 'varchar' else type_string %}

with prep as (
  select
    cast(null as {{ type_string }}) model,

to

with prep as (
  select
    cast(null as {{ snowplow_utils.type_string(4096) }}) model,

Add `query_tag` to Snowflake queries

Describe the feature

In order to tag Snowplow models as being run by Snowplow, we need to add a query tag in Snowflake. Can we have some functionality that generates this tag when running the dbt models in Snowflake, but not in the other databases?
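A minimal sketch of one way to do this: a macro that no-ops everywhere except Snowflake (the macro name and tag value are hypothetical):

{% macro set_snowplow_query_tag() %}
  {# Only Snowflake supports session query tags; do nothing on other warehouses #}
  {% if target.type == 'snowflake' %}
    {% do run_query("alter session set query_tag = 'snowplow_dbt'") %}
  {% endif %}
{% endmacro %}

This could then be attached to the package's models via a pre-hook or an on-run-start hook.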

Fix user_stitching bug with differing column lengths

Describe the bug

The stitched_user_id column is being created as a varchar column of the same length as device_user_id. If the user_id is longer than the device_user_id then this leads to errors since we later update the value of stitched_user_id to that of user_id.

Steps to reproduce

Create dummy data where the device_user_id is a VARCHAR(6) and the user_id is VARCHAR(12).
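A sketch of the kind of fix, assuming stitched_user_id is initialized from device_user_id (the model name and explicit width are illustrative):

select
    s.session_id,
    -- cast up front so the column is wide enough to hold user_id values written later
    cast(s.device_user_id as varchar(255)) as stitched_user_id
from {{ ref('snowplow_mobile_sessions_this_run') }} s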

Upgrade snowplow_utils package version

Describe the bug

This package pins an older version of snowplow_utils, which will cause dependency conflicts if you use other Snowplow packages in your dbt project that depend on different versions of snowplow_utils.

Steps to reproduce

  1. Use both the latest mobile and web packages in your dbt project packages.yml file:
packages:
  - package: snowplow/snowplow_web
    version: 0.9.2
  - package: snowplow/snowplow_mobile
    version: 0.5.3
  2. Run dbt deps.

Expected results

No error, all packages install successfully.

Actual results

Version error for package snowplow/snowplow_utils: Could not find a satisfactory version from options: ['>=0.12.0', '<0.13.0', '>=0.11.0', '<0.12.0']

Additional context

Issue first raised on Discourse.

Fix unstruct_events

Describe the bug

Currently the unstruct_event_com_snowplowanalytics_mobile_screen_view_1 column is treated as an array, just like genuine context fields such as contexts_com_snowplowanalytics_mobile_screen_1. This, however, produces a Field name should be String Literal, but it's 0 error when snowplow_mobile_base_events_this_run is executed. To resolve this, the [0] needs to be removed during field extraction. Example of the failing extraction: a.unstruct_event_com_snowplowanalytics_mobile_screen_view_1[0].id::STRING AS screen_view_id
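As a before/after sketch (the surrounding select and source name are hypothetical):

select
    -- before (errors): indexes the unstruct column as if it were an array of contexts
    -- a.unstruct_event_com_snowplowanalytics_mobile_screen_view_1[0].id::string as screen_view_id
    -- after: the unstruct column is a single record, so access its fields directly
    a.unstruct_event_com_snowplowanalytics_mobile_screen_view_1.id::string as screen_view_id
from {{ source('atomic', 'events') }} a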

Drop support for dbt versions below 1.3

Describe the feature

dbt-core v1.3.x is a decent shift away from previous versions of dbt in terms of functionality and macros, so we need to prepare our mobile package to support this version and drop support for lower versions.
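The version gate itself would be a one-line change in dbt_project.yml (the exact upper bound is an assumption):

# dbt_project.yml
require-dbt-version: [">=1.3.0", "<2.0.0"]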

Dependency

We need to wait for dbt-utils v1.0 to be released, then update snowplow-utils to support that version of dbt-utils, and then build this package on top of that newer snowplow-utils release.

Update docs

Describe the feature

Update docs ahead of the 0.3.0 release

Typo in models/base/scratch/base_scratch.yml, line 46

Describe the bug

Typo in file models/base/scratch/base_scratch.yml, line 46: screescreen_top_view_controllern_type


Drop support for dbt versions before 1.0.0

Describe the feature

In order to remove deprecation warnings and stop worrying about backwards compatibility, we need to drop support for older versions of dbt and support dbt v1.0.0 onwards.

Add automated testing to mobile model

Describe the feature

We should add automated tests to the repository such that on every PR the tests run to ensure that we are not making breaking changes.

Who will this benefit?

This will benefit all developers of future versions of the mobile package, as well as all reviewers, as there will be checks in place to ensure that changes don't deviate from expected behaviour.

Update documentation

For the release of snowplow-mobile v0.2.0 we will need to:

  • Update the READMEs
  • Update the docsite
  • Regenerate the docsite

Add unique indexes to derived tables

Describe the feature

We use the unique_key parameter in the config of many derived tables (see here), but this seems to have no actual effect outside of dbt's snapshot functionality (see here). We should use the indexes config to add a real unique index in Postgres and improve performance (see here).

So in short, we should remove the unique_key parameter from these tables and introduce a unique index on our derived tables.
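A sketch of a derived model using dbt-postgres's indexes config (the model and column names are illustrative):

{# 'indexes' builds a real unique index in Postgres, unlike unique_key alone #}
{{ config(
    materialized='table',
    indexes=[{'columns': ['session_id'], 'unique': True}]
) }}

select session_id, device_user_id
from {{ ref('snowplow_mobile_sessions_this_run') }}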

Sessions depends on user_mapping even when not enabled

Describe the bug

If you have user stitching disabled, you still have to keep the user_mapping model enabled, as the sessions table depends on it directly via the ref call in its arguments.
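A sketch of how the dependency could be made conditional (the variable and model names are assumptions):

{# Only create the DAG edge when stitching is enabled, so both the var and
   the model can be disabled together without breaking compilation #}
{% if var('snowplow__user_stitching', true) %}
    {% set user_mapping_relation = ref('snowplow_mobile_user_mapping') %}
{% else %}
    {% set user_mapping_relation = none %}
{% endif %}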

Steps to reproduce

  1. Create a project with the mobile package as a dependency.
  2. Disable user stitching.
  3. Disable the user_mapping model.
  4. Run dbt compile (or docs, etc.).

Expected results

Compiles/runs with no dependency from the sessions model on user_mapping.

Actual results

Does not compile; the dependency is still present.


Are you interested in contributing towards the fix?

Sure, why not

Optimize performance in Databricks for incremental models

Describe the feature

We need to optimize performance in Databricks. We can leverage the optimizeWrite and optimizeCompact table properties in Databricks to achieve this, and we should focus on all incremental tables (Snowplow or otherwise) that are being generated from the Snowplow models.
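A sketch using dbt-databricks's tblproperties model config; the specific Delta property names here are an assumption about what the issue refers to:

{# delta.autoOptimize.* covers the optimizeWrite / auto-compaction behaviour #}
{{ config(
    materialized='incremental',
    tblproperties={
        'delta.autoOptimize.optimizeWrite': 'true',
        'delta.autoOptimize.autoCompact': 'true'
    }
) }}

select * from {{ ref('snowplow_mobile_base_events_this_run') }}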

Fix dbt-spark incompatibility for Databricks support

Describe the bug

Since dbt Cloud uses the dbt-spark adapter for Databricks, which does not support the Unity Catalog, while dbt-databricks 1.1.1+ does, we need to release a fix for the web package to enable both versions to work in tandem.

Update datediff based filters to timestamp based

Datediff-based filters, which calculate the number of days between two timestamps and filter accordingly, produce counterintuitive results. An example of such a filter:

where {{ snowplow_utils.timestamp_diff('e.collector_tstamp', 'str.start_tstamp', 'day')}} <= {{ var("snowplow__max_session_days", 3) }}
and {{ snowplow_utils.timestamp_diff('e.dvce_created_tstamp', 'e.dvce_sent_tstamp', 'day') }} <= {{ var("snowplow__days_late_allowed", 3) }}

When a timestamp is passed into the datediff function with a date-based datepart (year, month, week, day), the time component of the timestamp is effectively ignored. An example of the problem:

with prep as (
select cast('2021-01-03 13:00:00' as timestamp) as dvce_created_tstamp, cast('2021-01-06 13:00:10' as timestamp) as dvce_sent_tstamp 
)

select 
    dvce_created_tstamp,
    dvce_sent_tstamp,
    datediff(day, dvce_created_tstamp, dvce_sent_tstamp) as date_diff_calc,
    case when datediff(day, dvce_created_tstamp, dvce_sent_tstamp) > 3 then true else false end late_arriving_day_comparison,
    case when dvce_sent_tstamp > dateadd(day, 3, dvce_created_tstamp) then true else false end late_arriving_tstamp_comparison
    
from prep

The data point should be classified as late arriving, since the dvce_sent_tstamp is more than 3 days after the dvce_created_tstamp. datediff misclassifies this, whereas comparing the two timestamps directly (using dateadd) gives the intended result.
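A sketch of the same filters rewritten as direct timestamp comparisons (plain dateadd shown here; the package would likely wrap this in a snowplow_utils macro):

-- keep a row only if it falls within the allowed window, comparing timestamps directly
where e.collector_tstamp <= dateadd(day, {{ var("snowplow__max_session_days", 3) }}, str.start_tstamp)
  and e.dvce_sent_tstamp <= dateadd(day, {{ var("snowplow__days_late_allowed", 3) }}, e.dvce_created_tstamp)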

Models including such filters:

  • snowplow_mobile_base_events_this_run

Add support for Redshift

Describe the feature

Add support for the dbt mobile model on Redshift.

Additional context

This is only to support Redshift for the dbt-snowplow-mobile model

Who will this benefit?

Beta testers of the Redshift dbt-snowplow-mobile model
