save-analytics-from-content-blockers's Introduction

Google Tag Manager (Google Analytics) Proxy

A proxy back end for Google Analytics / Google Tag Manager that prevents ad blockers from blocking client-side analytics tools.

docker pull zitros/analytics-saviour

Note: this repository is an open-sourced alternative to (and former PoC of) dataunlocker.com. DataUnlocker is a fully-managed solution (SaaS) for fixing ad blockers' impact on client-side analytics tools such as Google Analytics, Google Tag Manager, Segment, Facebook Pixel and any other client-side tools, without code changes in your web application. DataUnlocker uses a very different (and better) approach from the one this repository offers.

Originally introduced as an example in the article "How to prevent your analytics data from being blocked by ad blockers", this open-sourced application is now a complete stand-alone solution for Google Tag Manager and Google Analytics.

How It Works

Google Tag Manager (or plain Google Analytics) is a set of scripts used on the front end to track user actions (button clicks, page hits, device analytics, etc.). Google's out-of-the-box solution works well; however, almost all ad-blocking software blocks Google Tag Manager / Google Analytics by default. Hence, companies that are just starting out may lose a big portion of valuable information about their customers: how do they use the product? What do they like or dislike? Where do they get stuck? Per-user precision in analytics is crucial to understanding user behavior.

To solve ad-blocking issues, we have to introduce a proxy which forwards front-end requests to Google's domains through our own domain. We also have to modify Google Tag Manager scripts on the fly to request our own domain instead of Google's, because ad-blocking software blocks requests to domains (and some particular URLs!) listed in its filters. Furthermore, some requests require additional data modifications which can't be done using standard proxying.

The next diagram demonstrates the problem with Google Tag Manager / Google Analytics being blocked by ad blockers.

Google Tag Manager Proxy - Without Proxy

In general, all ad blockers work the same way: they block requests to Google Analytics servers and to URLs which match their blacklists. To keep Google Analytics from being blocked, all such requests must be proxied through URLs that aren't blacklisted. Furthermore, some URLs have to be masked so that the ad blocker doesn't recognize them.

Thus, this proxy service:

  1. Works as a proxy for configured domains (see below).
  2. Modifies the response when proxying scripts to replace Google domains with custom ones.
  3. Modifies the response and replaces URLs containing blacklisted paths like /google-analytics.
  4. Modifies proxied requests to the Google Measurement Protocol to pass the user's IP address, so that the address of the proxy is not recorded instead.
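
The domain replacement in the second item can be pictured with a small sketch. This is an illustration rather than the code from this repository: `rewriteDomains`, `proxyHost` and `strippedPath` are hypothetical names, and the real proxy additionally masks the rewritten URLs.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: prefix every known analytics domain found in a proxied
// script with your own host (and the path prefix your router strips), so that
// follow-up requests from the browser go through the proxy again.
function rewriteDomains(body, domains, proxyHost, strippedPath = "") {
    if (domains.length === 0) return body;
    // Longest domains first, so a longer domain wins over a shorter one.
    const alternatives = [...domains]
        .sort((a, b) => b.length - a.length)
        .map(escapeRegExp)
        .join("|");
    // A single pass avoids rewriting text that was already rewritten.
    return body.replace(
        new RegExp(alternatives, "g"),
        (domain) => `${proxyHost}${strippedPath}/${domain}`
    );
}

const script = 'var u="https://www.google-analytics.com/collect";';
console.log(rewriteDomains(
    script,
    ["www.google.com", "www.google-analytics.com"],
    "example.com",
    "/gtm-proxy"
));
```

Replacing all domains in one pass is a deliberate choice here: applying the replacements one after another could rewrite a domain that an earlier replacement had just inserted.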

Google Tag Manager Proxy - With Proxy

This repository contains a NodeJS-based proxy server which does the smart proxying magic for you. All you need is to run this proxy server on your end and figure out how to combine it with your application. Read more on this below.

Technically, NodeJS proxy API works as follows:

  1. Request to / returns sample application (see src/static-test/index.html) if enabled (see config).
  2. Request to /domain-name-or-masked-name/* proxies requests to domain-name-or-masked-name with path *.
  3. You can run the application using npm install && npm run start and request http://localhost/www.googletagmanager.com/gtag/js?id=GTM-1234567 (replace GTM-1234567 with your GTM tag). That's it!
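
The routing rule in the second item can be sketched as follows. This is a minimal illustration under the assumption that the first path segment is an unmasked domain name. `parseProxyPath` is a hypothetical helper, not a function from this repository.

```javascript
// Hypothetical helper: split "/<domain>/<path>?<query>" into the target
// domain and the upstream URL the request would be forwarded to.
function parseProxyPath(requestUrl) {
    const match = /^\/([^/?]+)(\/[^?]*)?(\?.*)?$/.exec(requestUrl);
    if (!match) return null; // e.g. "/" carries no domain segment
    const [, domain, path = "/", query = ""] = match;
    return { domain, target: `https://${domain}${path}${query}` };
}

console.log(parseProxyPath("/www.googletagmanager.com/gtag/js?id=GTM-1234567"));
```

A real proxy would also check the parsed domain against its list of configured domains before forwarding anything.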

Prerequisites

In order to enable analytics proxying, you have to perform some DevOps in your infrastructure. Assuming you're using microservices:

  1. Run a dedicated back end (container) with proxy (NodeJS application / container in this repository) - see setup instructions below.
  2. Create forwarding rule from your front end to hit this back end.
    1. For instance, proxy all calls requesting /gtm-proxy/* to this back end. In this case you must also specify env variable APP__STRIPPED_PATH=/gtm-proxy. Ultimately, the request path https://your-domain.com/gtm-proxy/www.google-analytics.com/analytics.js should land as /www.google-analytics.com/analytics.js at the NodeJS proxy application/container (this repository), stripping /gtm-proxy from the URL.
    2. It is important to use your own domain, as centralized domains might one day end up in ad-blocking databases.
  3. Modify your initial Google Tag Manager / Google Analytics script to request the proxied file
    1. Replace https://www.googletagmanager.com/gtag/js?id=UA-123456-7 there with https://your-domain.com/gtm-proxy/www.googletagmanager.com/gtag/js?id=UA-123456-7 (or whatever path you've set up). Also, mask the URL by running npm run mask <YOUR_URL> in this repository so that ad blockers won't block it right away.
    2. For instance, if you run npm run mask www.google-analytics.com/analytics.js, you get this masked URL: *(d3d3Lmdvb2dsZS1hbmFseXRpY3MuY29t)*/*(YW5hbHl0aWNzLmpz)*. Use it in your script tag now: <script src="/gtm-proxy/*(d3d3Lmdvb2dsZS1hbmFseXRpY3MuY29t)*/*(YW5hbHl0aWNzLmpz)*" async></script>.
    3. The example in this repository uses unmasked /www.googletagmanager.com/gtm.js (which is equivalent to http://localhost/www.googletagmanager.com/gtm.js).
  4. Test the thing!
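
The masked URL shown above is consistent with Base64-encoding each path segment and wrapping it in `*(` and `)*`. The sketch below reproduces that example. It is an assumption about the format based only on that example, not the implementation behind npm run mask, and it does not treat query strings specially.

```javascript
// Hypothetical helpers: maskUrl / unmaskUrl are illustrative names.
function maskSegment(segment) {
    return `*(${Buffer.from(segment, "utf8").toString("base64")})*`;
}

// Mask every "/"-separated segment of a URL given without a protocol.
function maskUrl(url) {
    return url.split("/").map(maskSegment).join("/");
}

// Reverse operation: decode every *(...)* group back to plain text.
function unmaskUrl(masked) {
    return masked.replace(/\*\(([^)]*)\)\*/g, (_, encoded) =>
        Buffer.from(encoded, "base64").toString("utf8"));
}

const masked = maskUrl("www.google-analytics.com/analytics.js");
console.log(masked);
console.log(unmaskUrl(masked));
```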

Things to consider before implementing the solution:

  1. Your third parties in Google Tag Manager can rate-limit your requests if you have many users, as all requests now come from the same IP address (your back end). If you've faced rate-limiting, please let me know by creating an issue in this repository! So far, we haven't.
  2. Some third parties like owox.com do not (yet) support IP overriding like Google Analytics does, meaning that all the users in your reports may appear on a map near your office/server. That's apparently their fault, but you still have to deal with it somehow.
  3. Not all the third-parties are covered by the current solution. This repository is open for your PRs if you've found more third-parties that require proxying!

Setup

In Docker

The light Docker container of 41.5MB is available and ready to be run in your infrastructure.

docker pull zitros/analytics-saviour
docker run -p 80:80 zitros/analytics-saviour
# Now open http://localhost and check the proxy.

Available environment variables:

# Below are the environment variables that can configure this proxy.
# The proxy URL requested by the browser is expected to be
# protocol://$APP__PROXY_DOMAIN$APP__STRIPPED_PATH/*(masked-url)*.

APP__STRIPPED_PATH=/gtm-proxy
# A prefix which has been stripped in the request path reaching analytics-saviour.
# If your ingress/router/etc strips the prefix you are required to set this variable.
#
# On your website, most likely you'll decide to route analytics using e.g. the `/gtm-proxy`
# prefix. Your "entry URL" in the Google Analytics case will be
# example.com/gtm-proxy/*(d3d3Lmdvb2dsZS1hbmFseXRpY3MuY29t)*/*(YW5hbHl0aWNzLmpz)*
# (masked example.com/gtm-proxy/www.google-analytics.com/analytics.js).
# Your ingress/router/etc must strip the `/gtm-proxy` path and thus analytics-saviour
# gets localhost/*(d3d3Lmdvb2dsZS1hbmFseXRpY3MuY29t)*/*(YW5hbHl0aWNzLmpz)* hit.
# However, many scripts which are proxied reference external domains. Normally, these
# domains are blocked by adblockers, but luckily analytics-saviour finds and replaces
# those domains with your (request) domain and the appropriate path to handle again later.
# THE ONLY THING it cannot figure out is which part of the URL has been stripped before
# reaching analytics-saviour so that next front end requests land to the same prefixed path
# on your domain e.g. example.com/gtm-proxy/*(d3d3Lmdvb2dsZS1hbmFseXRpY3MuY29t)*/collect?..
# Because of this, the path you strip must be explicitly provided.

APP__PROXY_DOMAIN=
# The domain name used as a proxy for analytics scripts (optional). 
# When set, the traffic will be proxied via this domain.
# When not set, the current request domain (host) is used as a proxy domain.
# This is useful to proxy traffic via e.g. a subdomain or another domain, so that you don't need to
# strip the prefix path.

APP__ENV_NAME=local
# APP__ENV_NAME=local or APP__ENV_NAME=test (default for local NodeJS app)
# will display static content from `static-test`.
# APP__ENV_NAME=prod is used inside the Docker container unless overwritten.

APP__HOSTS_WHITELIST_REGEX="^(example\\.com|mysecondwebsite\\.com)$"
# A JavaScript regular expression that the host must match. By default, it matches ANY HOST, MAKING
# YOUR PROXY AVAILABLE TO ANYONE. Make sure you escape all special regexp characters here. Examples:
# APP__HOSTS_WHITELIST_REGEX="^example\\.com$" (only the domain example.com is allowed to access the proxy)
# APP__HOSTS_WHITELIST_REGEX="\\.example\\.com$" (only subdomains of example.com are allowed)
# APP__HOSTS_WHITELIST_REGEX="(^|\\.)example\\.com$" (example.com and all its subdomains are allowed)
# APP__HOSTS_WHITELIST_REGEX="^(example\\.com|mysecondwebsite\\.com)$" (multiple specified domains are allowed)
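
To see how such a whitelist pattern behaves, here is a minimal sketch of a host check. `createHostFilter` is a hypothetical helper and not the repository's code. It assumes the double backslashes in the examples above end up as single backslashes in the final regular expression.

```javascript
// Hypothetical helper: build a predicate from a whitelist pattern.
// An empty pattern falls back to matching any host, mirroring the
// "matches ANY HOST" default described above.
function createHostFilter(pattern) {
    const regex = pattern ? new RegExp(pattern) : /.*/;
    return (host) => regex.test(host);
}

// example.com and all of its subdomains
const isAllowed = createHostFilter("(^|\\.)example\\.com$");

console.log(isAllowed("example.com"));          // true
console.log(isAllowed("shop.example.com"));     // true
console.log(isAllowed("notexample.com"));       // false
console.log(isAllowed("example.com.evil.org")); // false
```

The last two calls show why the anchors and escaped dots matter: without them, look-alike hosts would pass the check.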

NodeJS Application

To run the NodeJS application, simply clone the repository, navigate to its directory and run:

npm install && npm run start

By default, this will run a proxy with a test front end on http://localhost. You can get there and check how the request http://localhost/www.google-analytics.com/collect?v=1&_v=j73&a=... was proxied and that the ad-blocker didn't block the request. If the start is successful, after visiting http://localhost you'll see this:

Web server is listening on port 80
Proxied: www.google-analytics.com/analytics.js
Proxied: www.google-analytics.com/collect?v=1&_v=j73&a=531530768&t=pageview&_s=1&dl=http%3A%2F%2Flocalhost%2F&ul=ru&de=UTF-8&dt=Test&sd=24-bit&sr=1500x1000&vp=744x880&je=0&_u=AACAAEAB~&jid=&gjid=&cid=2E31579F-EE30-482F-9888-554A248A9495&tid=UA-98253329-1&_gid=1276054211.1554658225&z=1680756830&uip=1

Check the static-test/index.html file's code to see how to bind the proxied analytics to your front end.

Proxy in Front of the Proxy

Before the request hits this NodeJS app / container, your own proxy has to pass some headers along with it (host and x-real-ip or x-forwarded-for). Below is an example of a minimal Nginx proxy configuration.

location /gtm-proxy/ {
    proxy_set_header Host $host;
    proxy_set_header x-real-ip $remote_addr;
    proxy_set_header x-forwarded-for $proxy_add_x_forwarded_for;
    proxy_pass http://app-address-running-in-your-infrastructure;
}
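
On the receiving side, a Node.js application could read the client address from these headers roughly as follows. This is a sketch under assumptions: `getClientIp` is a hypothetical helper, and the precedence (x-real-ip first, then the first x-forwarded-for entry) is a choice made for this example, not necessarily what this repository does.

```javascript
// Hypothetical helper: pick the client IP from forwarded headers.
// Node.js exposes incoming header names in lower case.
function getClientIp(headers, socketAddress = "") {
    // x-forwarded-for may hold a comma-separated chain: client, proxy1, ...
    const forwarded = (headers["x-forwarded-for"] || "").split(",")[0].trim();
    return headers["x-real-ip"] || forwarded || socketAddress;
}

console.log(getClientIp({ "x-forwarded-for": "203.0.113.7, 10.0.0.1" }));
console.log(getClientIp({}, "127.0.0.1"));
```

These headers can be trusted only when the application is reachable exclusively through your own proxy, because a client can set them to any value.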

Configuration

You can configure which third parties to proxy/replace and how to do it in the config file. Find the actual configuration in the config.js file:

    proxy: {
        domains: [ // These domains are replaced in any proxied response (they are prefixed with your domain)
            "adservice.google.com",
            "www.google-analytics.com",
            "www.googleadservices.com",
            "www.googletagmanager.com",
            "google-analytics.bi.owox.com",
            "stats.g.doubleclick.net",
            "ampcid.google.com",
            "www.google.%",
            "www.google.com"
        ],
        ipOverrides: { // IP override rules for domains (which query parameter to add overriding IP with X-Forwarded-For header)
            "www.google-analytics.com": {
                urlMatch: /\/collect/,
                queryParameterName: "uip"
            }
        },
        maskPaths: [ // Which paths to mask in URLs. Can be regular expressions as strings
            "/google-analytics",
            "/r/collect",
            "/j/collect",
            "/pageread/conversion",
            "/pagead/conversion"
        ]
    }
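
As an illustration of how an ipOverrides rule like the one above could be applied, the sketch below appends the configured query parameter to matching requests. `applyIpOverride` is a hypothetical helper written for this example. Only the rule shape (urlMatch, queryParameterName) is taken from the configuration.

```javascript
// Rule shape copied from the configuration above.
const ipOverrides = {
    "www.google-analytics.com": {
        urlMatch: /\/collect/,
        queryParameterName: "uip"
    }
};

// Hypothetical helper: add the client's IP as a query parameter when the
// proxied domain has an override rule and the path matches it.
function applyIpOverride(domain, pathWithQuery, clientIp, overrides = ipOverrides) {
    const rule = overrides[domain];
    if (!rule || !clientIp || !rule.urlMatch.test(pathWithQuery)) {
        return pathWithQuery; // nothing to override
    }
    const separator = pathWithQuery.includes("?") ? "&" : "?";
    return pathWithQuery + separator + rule.queryParameterName +
        "=" + encodeURIComponent(clientIp);
}

console.log(applyIpOverride("www.google-analytics.com", "/collect?v=1", "203.0.113.7"));
```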

License

Contributions

Any contributions are very welcome!

save-analytics-from-content-blockers's People

Contributors

dependabot[bot], julesgoullee, mdmower, nikitaeverywhere, sakojpa, stegano

save-analytics-from-content-blockers's Issues

Is it still working?

Hi @ZitRos,

Thanks for this excellent and unique tutorial.
I tried to implement this solution with the only difference that I used NGINX for the proxy.
Everything looks fine except that I don't see any visits in the Google Analytics console.

I can see that the "collect" request and the 200 response in the Developer Tools, but the Tag Assistant says "No HTTP response detected."

I'm pretty sure I did everything as told, and I don't believe the proxy would make any difference.
So, I'm wondering if this technique still works according to your own experience?

Regards,
Benoit

When will the update come out?

Thank you for the current solution. According to the readme, there will be a "more robust solution". I'm really looking forward to it. May I ask is there an estimated arrival time?

Help needed with Step 3: Implementing the Simplest Proxy Server

Hey there,

I tried to follow your tutorial on https://www.freecodecamp.org/news/save-your-analytics-from-content-blockers-7ee08c6ec7ee/

And i managed to get things running but i'm stuck at Step 3: Implementing the Simplest Proxy Server

When i browse to http://my-ip/index.html

I get:

Proxied: www.google-analytics.com/r/collect?v=1&_v=j80&a=1503182168&t=pageview&_s=1&dl=http%3A%2F%2Fipremoved%2F&ul=en-us&de=UTF-8&dt=Google%20Analytics%20Test%20Proxy&sd=24-bit&sr=1920x1080&vp=1034x898&je=0&_u=AACAAEAB~&jid=1401297639&gjid=114978947&cid=88200760.1581435787&tid=UA-removed-1&_gid=1603874559.1581435787&_r=1&z=2065047711&uip=maskedip
Proxied: www.google-analytics.com/analytics.js

So it seems to be running.

Where I get stuck is this:
Gathering all the files in the example together (see GitHub), we should end up with the following JavaScript server code:

var express = require("express"), 
    proxy = require("express-http-proxy"), app = express();

app.use(express.static(__dirname)); // serve static files from cwd

function getIpFromReq (req) { // get the client's IP address
    var bareIP = ":" + ((req.connection.socket && req.connection.socket.remoteAddress)
        || req.headers["x-forwarded-for"] || req.connection.remoteAddress || "");
    return (bareIP.match(/:([^:]+)$/) || [])[1] || "127.0.0.1";
}

// proxying requests from /analytics to www.google-analytics.com.
app.use("/analytics", proxy("www.google-analytics.com", {
    proxyReqPathResolver: function (req) {
        return req.url + (req.url.indexOf("?") === -1 ? "?" : "&")
            + "uip=" + encodeURIComponent(getIpFromReq(req));
    }
}));

app.listen(1280);
console.log("Web application ready on http://localhost:1280");

Where should I place this code, or what should I do with it?

How to use Dataunlocker in nextjs13?

I tried adding the script as a client component and importing it into the layout file, and it doesn't seem to be working. On the dashboard, I can see the count for api.dataunlocker.com increasing, but that's it. Google Analytics and Tag Manager aren't being proxied.

On the other hand if I inject the script via tampermonkey on the same site it seems to be working normally.

Confused on what to do

doubt: shared webhost

is there any way that we can run the setup on a shared webhost that the most '' advanced '' access is PHP / SQL / ???

I have a serious adblock problem holding our statistics, we use customized Wordpress by ourselves, but we have this limitation of the webhost

It won't work with Google Tag Manager

In your article Google Analytics is described.
Google Tag Manager now replaces Google Analytics and is different. GTM first loads a js file. That js file then loads "google-analytics.com/analytics.js".

However, there's no direct link to "google-analytics.com/collect" in analytics.js anymore. Instead there are different kinds of other links to:

Which ones should be proxied? Which ones not?

How to work with all that now?

"Proxy error: domain \"gas\" is not proxied. Requested URL: /gas/"

I've been trying this out but keep coming up against the above error. When running via NPM I see:

Environment variables:
APP__STRIPPED_PATH=/gas (a prefix path added to the original host, which will be removed in the proxy request)
APP__PROXY_DOMAIN= (an optional proxy domain which will be used for client-side requests instead of the current domain)
APP__ENV_NAME=local (should not be local nor test in production)
APP__HOSTS_WHITELIST_REGEX=/.*/ (YAY!! Anyone can use your proxy!)
2021-10-26T18:30:07.999Z Analytics proxy web server is listening on port 80

As the proxy domain is optional, I've not changed that.

Any clues that you could provide as to where I am going wrong, or what i've missed?

/ga-audiences

It appears when Signals data is turned on, the browser makes a call to a google page for /ga-audiences.

This does not appear mapped so some browsers will try and connect to a Google server.

In our instance, the URL in the JavaScript seems half encoded and half not.

https://www.%/*(YWRz)*/ga-audiences

Using path instead domain for multidomain webpages

Hello,

I'm trying to configure this proxy to prevent the block of google scripts on my site, and finally I've made it work. All scripts are loaded from my site and url are replaced inside that files without problem.
My problem is that proxyDomain is mandatory, and I just want to use the domain where I'm making the request.
In some files I can just leave it empty or just add the path, but when the source scripts have a double slash (//) then it fails, for example //www.googleadservices.com

I don't know JS or NodeJS and I was not able to fix it. Is it possible?

Thanks!

uBlock Origin Blocking */gtag/js*

I use uBlock Origin, and after testing my implementation, it looks like this won't work for any adblocker that uses https://easylist.to/easylist/easylist.txt because it has a filter for /gtag/js?

I masked my googletagmanager URL using npm run mask www.googletagmanager.com/gtag/js?id=XYZ
and I have a reference to that in my app's javascript

adblock_network_calls

In the network calls, it redirects from the mask to http://MY-DOMAIN/www.googletagmanager.com/gtag/js?id=XYZ&l=dataLayer&cx=c which then gets blocked by uBlock Origin.

Not sure if this is anything that can be addressed on the implementation side, but just wanted to report this since I saw from #14 (comment) that @ZitRos you use AdGuard (tracking works when I enable AdGuard instead of uBlock Origin)

Which third party scripts do you need help with?

I'm pretty motivated to get this working for a GTM set up. Where can I see the scripts you've already got this working with?

Outside of setting X-Real-IP, X-Forwarded-For and "uip," is there anything else that needs to be done for IP address forwarding?

GA Changes on 17th Feb

It looks like GA made some changes to the JS on 17th Feb and this script in the current form no longer works.

It alters one GA with an incomplete record and makes it unusable so maybe worth considering for anyone yet to roll out.

Proxy doesn't start in Plesk

Hello,

I'm trying to configure the proxy in Plesk according to this guide: https://docs.plesk.com/en-US/onyx/customer-guide/nodejs-support.76652/, but it tells me that there is no app.js file. Here I tried to put it pointing to src/api.js, but it doesn't work.

If I'm going to execute script and run "start", I get the following error:

> [email protected] start /var/www/vhosts/domain.com/gtm.domain.com
> node -r esm src/api.js

npm WARN lifecycle The node binary used for scripts is /usr/bin/node but npm is using /opt/plesk/node/8/bin/node itself. Use the `--scripts-prepend-node-path` option to include the path for the node binary npm was executed with.
/var/www/vhosts/domain.com/gtm.domain.com/src/api.js:18
export async function init () {

SyntaxError: Unexpected token function
    at Object.Module._extensions..js (module.js:586:10)
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] start: `node -r esm src/api.js`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the [email protected] start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     [/var/www/vhosts/domain.com/.npm/_logs/2019-09-05T10_03_59_393Z-debug.log]

I attach the generated log: 2019-09-05T10_03_59_393Z-debug.log

Regards,
Jordi

Weird behaviour: Google Analytics reports visits but realtime counter stays at zero

Hi! I am testing this solution to try and fix GA's stats. The idea is brilliant! :)

I am seeing something weird though. The script is proxied etc and in the realtime view of GA I can see something in "Top Pages" as I visit pages, but the "active users" counter stays at zero as if something is missing in the tracking. Any idea of how to fix? Thanks!

ps. when will the service be available?

[Question]: Github static gh-pages

Is there a way to prevent Google analytics being blocked by ad blockers for static gh-pages without requiring a server-side solution?

Change cache headers to public?

Files like www.googletagmanager.com/gtag/js return cache-control: private, max-age=900.

I suggest modifying it to public as it should be safe and will save a lot of load on the servers, what do you think?

Facebook Pixel event not fired

Thank you for the very excellent repo.
I installed it successfully, and it works fine with GTM and GA.
However, when adding Facebook Pixel in GTM. The Facebook Pixel script is fully loaded through the proxy, but no Pixel Event is fired.
Have you tried FB Pixel yet? Please give me suggestions to get FB Pixel working.
Thank you very much.
