sojourn's Introduction

Sojourn

Simple source & event tracking for Rails. This gem automatically tracks sojourners (i.e. unique visitors) based on:

  • Referer
  • UTM parameters
  • Browser (User Agent)
  • The currently logged-in user (i.e. current_user)
  • Various other request data

How It Works

Whenever a new visitor ("sojourner") arrives at the site, an event is tracked containing basic data about their browser and where they came from. Similar events are also tracked whenever a user logs in, logs out, or visits again from an external site. In addition, you can track a custom event anytime a visitor does something of interest to you.

Ultimately, rather than storing parts of the data in separate tables, all data is tracked in the form of events. Yep, events all the way down. (See 'Why Events?' below for the reasoning behind this.)

Sojourn assigns each "sojourner" a UUID, which is tracked across requests. All events are associated with this UUID and with the current user's ID (if logged in). The current request is also assigned a UUID (which defaults to the value of the X-Request-ID header).
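
The UUID handling described above can be sketched in plain Ruby. This is only an illustration, not the gem's actual code: the cookie key `sojourner_uuid`, the helper names, and the use of a plain Hash in place of Rails' cookie jar and request headers are all assumptions.

```ruby
require 'securerandom'

# Hypothetical cookie key; the name the gem really uses is not stated here.
COOKIE_KEY = 'sojourner_uuid'

# Returns the visitor's existing UUID, or assigns a new one on first sight.
# `cookies` is any Hash-like store (a plain Hash stands in for Rails cookies).
def sojourner_uuid_for(cookies)
  cookies[COOKIE_KEY] ||= SecureRandom.uuid
end

# Uses the X-Request-ID header when present, otherwise generates a UUID.
def request_uuid_for(headers)
  id = headers['X-Request-ID']
  id.nil? || id.empty? ? SecureRandom.uuid : id
end

cookies = {}
first  = sojourner_uuid_for(cookies)
second = sojourner_uuid_for(cookies)
puts first == second # the same visitor keeps the same UUID across requests
puts request_uuid_for('X-Request-ID' => 'abc123')
```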

Events consist of an event name (defining a collection of events), a session UUID, and a set of properties (key-value data) which includes information about the request. In the PostgreSQL implementation, we use a JSONB column to store the key-value data.
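
As a rough picture of that shape, the sketch below models an event as a plain Ruby Struct and prints the properties as the JSON document a JSONB column might hold. The `Event` struct and `build_event` helper are hypothetical stand-ins for illustration, not the gem's ActiveRecord model.

```ruby
require 'json'
require 'securerandom'

# Illustrative stand-in for an event record; not the gem's actual model.
Event = Struct.new(:name, :sojourner_uuid, :user_id, :properties, :created_at)

# Builds an event; user_id stays nil when nobody is logged in.
def build_event(name:, sojourner_uuid:, properties: {}, user_id: nil)
  Event.new(name, sojourner_uuid, user_id, properties, Time.now.utc)
end

event = build_event(
  name: 'clicked call-to-action',
  sojourner_uuid: SecureRandom.uuid,
  properties: { plan_choice: 'enterprise', request: { path: '/pricing' } }
)

# A JSONB column would hold the properties as a JSON document like this one.
puts JSON.generate(event.properties)
```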

Usage

# Track a custom event (highly encouraged!):
sojourn.track! 'clicked call-to-action', plan_choice: 'enterprise'

# Read events using ActiveRecord
e = Sojourn::Event.last
e.name               # event name (e.g. 'clicked call-to-action')
e.sojourner_uuid     # uuid tracked across requests, stored in cookie
e.user               # User or nil
e.properties         # key-value hash (e.g. "{ plan_choice: 'enterprise' }")

# If you don't have access to a controller context (i.e. the event is not occurring during a web
# request), you can still track a raw event like this:
Sojourn.track_raw_event! 'subscription expired', plan: 'enterprise', customer_id: 'xyb123'

Default Events

The three built-in events are as follows:

'!sojourning' # The sojourner has arrived from an external source.
'!logged_in'  # The sojourner has logged in.
'!logged_out' # The sojourner has logged out.

A '!sojourning' event takes place whenever any of the following conditions is met:

  • The sojourner has never been seen before (i.e. direct traffic of some kind)
  • The referer is from an external source (i.e. not the current request.host)
  • The request contains tracked (utm-style) parameters. (These can be configured in the sojourn.rb initializer.)
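
The three conditions above amount to a simple predicate. The following is a minimal sketch under stated assumptions: the method names, the default list of tracked parameters, and the decision to treat an unparseable referer as external are all illustrative rather than taken from the gem.

```ruby
require 'uri'

# Assumed default list; the real one is configured in the sojourn.rb initializer.
TRACKED_PARAMS = %w[utm_source utm_medium utm_term utm_content utm_campaign].freeze

# True when the referer's host differs from the host serving the request.
def external_referer?(referer, host)
  return false if referer.nil? || referer.empty?
  URI.parse(referer).host != host
rescue URI::InvalidURIError
  # Treating an unparseable referer as external is a choice made for this sketch.
  true
end

# True when any one of the three conditions holds.
def sojourning?(new_visitor:, referer:, host:, params:)
  new_visitor ||
    external_referer?(referer, host) ||
    params.keys.any? { |key| TRACKED_PARAMS.include?(key.to_s) }
end

puts sojourning?(new_visitor: false, referer: 'https://mail.google.com',
                 host: 'example.com', params: {})
puts sojourning?(new_visitor: false, referer: 'https://example.com/my-news',
                 host: 'example.com', params: { 'page' => '1' })
```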

Properties

In addition to properties that you manually add, events will automatically include data about the current web request. An example looks like this:

{
  "custom_property":"value",
  "request":{
    "uuid":"5e698f6ca74a016c49ca6b91a79cada7",
    "host":"example.com",
    "path":"/my-news",
    "controller":"news",
    "action":"index",
    "method":"get",
    "params":{
      "utm_campaign":"daily_updates",
      "page":"1"
    },
    "referer":"https://mail.google.com",
    "ip_address":"42.42.42.42",
    "user_agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.48 Safari/537.36"
  },
  "browser":{
    "bot":false,
    "name":"Chrome",
    "known":true,
    "version":"48",
    "platform":"mac"
  },
  "campaign":{
    "utm_campaign":"daily_updates"
  }
}
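
Because the properties are nested, reading a value safely takes a little care. The sketch below parses a trimmed copy of the example payload and reads nested keys with `Hash#dig` (Ruby 2.3 or later); in an application the hash would presumably come from `e.properties` rather than from a parsed string.

```ruby
require 'json'

# A trimmed copy of the example payload above.
raw = <<~PAYLOAD
  {
    "custom_property": "value",
    "request": { "path": "/my-news", "referer": "https://mail.google.com" },
    "browser": { "bot": false, "name": "Chrome", "version": "48" },
    "campaign": { "utm_campaign": "daily_updates" }
  }
PAYLOAD

properties = JSON.parse(raw)

# Hash#dig returns nil instead of raising when an intermediate key is missing.
campaign = properties.dig('campaign', 'utm_campaign')
from_bot = properties.dig('browser', 'bot')
missing  = properties.dig('campaign', 'utm_source')

puts campaign
puts from_bot.inspect
puts missing.inspect
```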

Installation

Add this line to your application's Gemfile:

gem 'sojourn'

And then execute:

$ bundle

To install migrations and the sojourn.rb initializer, execute:

$ rails g sojourn:install

Why Events? Why not track visits/visitors as their own objects?

The idea is that, at a certain scale, this kind of tracking should be dumped directly into append-only logs (or an event bus / messaging queue) for asynchronous processing.

This is made easier when everything can be represented, at a basic level, as a set of discrete events. In theory, it works with just about any data store, and makes for easy time series and funnel analysis. I'd like to move away from ActiveRecord at some point and open up the door for other, more horizontally scalable data backends, ideally with a focus on streaming data (e.g. Kafka combined with Samza or Storm).

An added benefit of storing the start of each visit as its own event in the series (i.e. the built-in !sojourning event) is that you can change the length of your visit window after the fact and re-run your analysis. The more traditional approach is to tag each event with some kind of incrementing visit ID, which forces you into defining what a "unique visit" means for your product before you've even collected any data.
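
To make the "change the window after the fact" idea concrete, here is a small sketch that regroups the same event timestamps under two different windows. It assumes a visit is defined by inactivity (a new visit starts when the gap since the previous event exceeds the window), which is one common definition and not necessarily the one you would pick; `group_into_visits` is a hypothetical helper, not part of the gem.

```ruby
# Splits one sojourner's event timestamps into visits: a new visit starts
# whenever the gap since the previous event exceeds `window` seconds.
def group_into_visits(timestamps, window)
  timestamps.sort.each_with_object([]) do |time, visits|
    if visits.empty? || time - visits.last.last > window
      visits << [time]
    else
      visits.last << time
    end
  end
end

base = Time.utc(2016, 1, 1, 9, 0, 0)
events = [base, base + 10 * 60, base + 50 * 60, base + 3 * 3600]

puts group_into_visits(events, 30 * 60).length  # 30-minute window
puts group_into_visits(events, 2 * 3600).length # 2-hour window
```

The stored events never change between the two calls; only the window passed to the analysis does.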

Current Limitations (i.e. the 'todo' list)

  • Tested only on Rails 3.2.18 and Ruby 2.0.0 with ActiveRecord and PostgreSQL.
  • Assumes User and current_user convention for user tracking.
  • Assumes that if request.referer does not match request.host, the referer is external to your website.
  • Relies solely on cookies to track visitor UUID across requests (no JS, fingerprinting, etc.).
  • Relies on ActiveRecord for storage. (At a bigger scale, append-only logs are preferred.)

sojourn's People

Contributors

smudge

sojourn's Issues

Defer '!sojourning' event to subsequent request.

Use case: a tracking pixel external to the app.

New visitors viewing this pixel should not fire off a '!sojourning' event. (We can track a custom event instead.) It is possible to skip the '!sojourning' event by calling skip_before_filter :track_sojourning at the top of the controller, so this is fine.

However, when the user finally does visit the site, they have already been assigned a UUID and are therefore recognized as returning, so no '!sojourning' event is fired. This is not fine.
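
One possible way to defer the event, sketched in plain Ruby: remember in the cookie that the arrival event is still owed, and fire it on the first request that does not skip tracking. The cookie keys, the `events_for_request` helper, and the `track_sojourning:` flag (standing in for a controller that skips the filter) are all assumptions made for this sketch, and it covers only the new-visitor condition.

```ruby
require 'securerandom'

# Hypothetical cookie keys, chosen for this sketch only.
UUID_KEY    = 'sojourner_uuid'
PENDING_KEY = 'sojourning_pending'

# Returns the list of built-in event names this request should track.
# `track_sojourning: false` models a controller that skips the built-in
# event, such as the tracking pixel endpoint.
def events_for_request(cookies, track_sojourning: true)
  is_new = !cookies.key?(UUID_KEY)
  cookies[UUID_KEY] ||= SecureRandom.uuid
  cookies[PENDING_KEY] = true if is_new

  return [] unless track_sojourning && cookies[PENDING_KEY]

  cookies.delete(PENDING_KEY)
  ['!sojourning']
end

cookies = {}
p events_for_request(cookies, track_sojourning: false) # pixel request
p events_for_request(cookies)                          # first real visit
p events_for_request(cookies)                          # later request
```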

Specs

Need to write specs so that I can more confidently make changes to the gem. Right now, I must very carefully test each change by hand, which is not going to work as the requirements get more complex.

Better "visit" tracking

There are two ways to track visits:

  1. Assign a visit ID to each visit, and tag all requests/events with that visit ID. This requires defining what a "unique visit" means for your product in advance, but allows your data to be more compact, since you can discard data about uninteresting requests if they are within the visit window.
  2. Don't assign a visit ID in advance, but track interesting things as they happen and then group them into visits after the fact. This allows you to choose a different visit window and rerun your analysis, but requires that you store more data up front in case you decide to change your visit window later.

Sojourn takes the 2nd approach. Within that approach, there are two ways of collecting data:

  1. Store data about every web request and/or page view. Lots of data, but lots of flexibility.
  2. Only track requests if they meet a set of conditions that make them candidates for the start of a new visit.

Sojourn takes the 2nd approach here as well, because the first was too much data for the early implementation to handle. Right now this means that requests are marked as potential new visits when A) the visitor is totally new to the site, B) the visitor's referer is that of a different site, or C) the visitor's request has UTM parameters.

But what this doesn't capture is what happens when a visitor comes to the site, leaves for a long time, and comes back again.

Ideally we'd just start storing data about every web request, but if we can't do that, there should be a timeout between requests where any new requests should be marked as a potential new visit. This could be configurable but should default to something like a day or a week. The downside is that there is still potential to lose valuable data if you decide to change your visit window to be shorter than that timeout, but at least there is some flexibility beyond just having to choose one visit window and stick with it.

TL;DR: Add a timeout between requests which would cause a new !sojourning event to be fired, even if there is no UTM or referer data that would otherwise prompt a new !sojourning.
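
The proposed check could look something like the sketch below. It is a suggestion, not existing gem behavior: the `visit_timed_out?` helper and the one-day default are assumptions, and it presumes a last-seen timestamp is stored somewhere per sojourner (for example in the cookie), which the gem does not necessarily do today.

```ruby
# One day, following the "a day or a week" suggestion above; meant to be configurable.
DEFAULT_VISIT_TIMEOUT = 24 * 60 * 60

# True when the visitor has been idle for longer than `timeout` seconds.
# `last_seen_at` is nil for a visitor with no recorded activity, a case the
# existing new-visitor condition already covers.
def visit_timed_out?(last_seen_at, now, timeout = DEFAULT_VISIT_TIMEOUT)
  return false if last_seen_at.nil?
  now - last_seen_at > timeout
end

now = Time.utc(2016, 1, 10, 12, 0, 0)
puts visit_timed_out?(now - 2 * 60 * 60, now)      # idle for two hours
puts visit_timed_out?(now - 3 * 24 * 60 * 60, now) # idle for three days
```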
