Privacy-Aware Design: Replacing Google Analytics with a decentralized alternative

In late 2005, Google started to provide free access to a web analytics product based on the previously expensive Urchin software suite. In the seven years since, this strategy has succeeded in getting Google Analytics tracking code included in a stunning share of websites by providing access to a powerful tool at (seemingly) no cost to everyone from big corporations to hobbyist bloggers.

“Oh, and we’ll of course add Google Analytics to the site” is a common phrase in the context of a web project, by large agencies and teenage family webmasters alike: Google has managed to define their product as an implicit standard for visitor analysis on the web. Adding the tracking code is easy and the data the service provides is of unquestionable quality.

Yet, privacy advocates have long pointed out the serious implications of one corporation being able to track users around such a massive slice of the internet.

This post is part of a series on my research on “Privacy-Aware Design” in a UI/UX context.

All posts in the series:

  1. Making the case for “Privacy-Aware Design”
  2. Privacy-Aware Design: Replacing Google Analytics with a decentralized alternative (this post)
  3. Privacy-Aware Design: Opt-in alternatives for social media sharing

Connecting the dots

In April 2012, Google stated that its analytics suite was used by over 10 million websites, and external sources estimate that the tool currently runs on about half of the world’s 1 million most popular websites.

Image caption: A few lines of code, embedded in millions of websites, allow Google to track users as they browse around the internet.

While the browser cookies set by Google Analytics are so-called “first-party cookies”, i.e. their use is restricted to the domain of the tracked website, the sheer fact that every page load is registered on a Google server provides at least the hypothetical possibility to “observe” individual users. With one piece of code included in about half of the most popular websites, Google is in a position to follow any particular user as they browse around the internet.

Whether and to what degree such concatenation of data is really carried out is not public knowledge. The terms of service probably leave plenty of room to cover such activities, e.g. under the goal of “improving user experience”, and the technical feasibility was identified as a privacy threat (in German) by the ULD, the Independent Center for Privacy Protection in Germany, as early as 2008. Yet the mere fact that the data would enable such tracking is concerning enough.

In combination with the huge share of people constantly logged in to the Google platform due to their use of Gmail (another product given away for free that has turned into an almost undisputed “default” for e-mail) or other Google products, it is even technically possible for the company to connect the data trail with identified individuals.

Self-defense or privacy-aware design?

The most technology-literate internet users, people who are not only concerned about their privacy but also have the skills to take action, have long found ways to protect themselves, e.g. by enabling the “Do Not Track” (DNT) setting in their browser:

Do Not Track is a technology and policy proposal that enables users to opt out of tracking by websites they do not visit, including analytics services, advertising networks, and social platforms.

Many services have committed to respecting DNT, though Google apparently is not among them. An alternative is to install browser extensions that block the submission of any tracking data, for instance Google’s own opt-out browser plugin or general privacy suites like Ghostery (see my earlier post on a self-experiment with it).
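
On the technical level, a browser with the setting enabled sends the HTTP request header `DNT: 1` along with every request. As a rough illustration of how a website could read that signal on the server side, here is a minimal Python sketch. The function name `dnt_enabled` and the plain dictionary of headers are assumptions made for this example and do not belong to any particular framework.

```python
def dnt_enabled(headers: dict[str, str]) -> bool:
    """Return True if the request headers carry `DNT: 1`."""
    # HTTP header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    return normalised.get("dnt") == "1"


# Hypothetical usage: decide per request whether the visitor has opted out.
request_headers = {"User-Agent": "ExampleBrowser/1.0", "DNT": "1"}
if dnt_enabled(request_headers):
    print("Visitor has asked not to be tracked")
```

What a site does once it sees the header is a policy decision; the sketch only shows that detecting the preference itself is trivial.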

While both activating the “Do Not Track” flag and installing a privacy plugin are means for users to influence who gets to know which sites they visit, it appears that the majority of users are not even aware of the data collection in the background.

Image caption: Only browser plugins like Ghostery help to visualize and understand how many services a web user is tracked by (screenshot from a visit on one of Finland’s Top 10 media websites).

The fact that an assumed majority of users lack the understanding and/or the technical ability to identify the issue and do something about it does not justify handing such data over to a third party without their consent.

The centralized collection of usage data by Google is something users have no control over if Analytics is installed on a website. It is therefore worth considering alternatives that still give a service provider the required insight but protect their users’ privacy: applying the principles of “Privacy-Aware Design” so that a site’s users are not subjected to such privacy risks in the first place.

Alternatives are out there

Adhering to the principle of not sharing user data with third parties and implementing an analytics solution that respects privacy is not overly complicated, though it does mean giving up the “free” Google Analytics and either running separate tracking software or buying a solution from a vendor offering decentralized alternatives.

Contrary to popular belief, the “Do Not Track” setting, which I believe to be a good guideline for the privacy-aware designer in this regard, does not imply that visits have to be excluded from every form of usage statistics (see e.g. the EFF’s interpretation). It implies that the analysis needs to happen in a privacy-considerate way: e.g. by limiting the data logged and its retention time, by not identifying users beyond website boundaries, and most importantly by implementing it in a way that gives only the owner of the tracked website access to the data.
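
To make “limiting the data logged and its retention time” a little more concrete, here is a minimal Python sketch of two common measures: truncating IP addresses before they are stored, and discarding log entries after a fixed period. The function names, the prefix lengths (/24 for IPv4, /48 for IPv6) and the 180-day default are assumptions chosen for illustration, not recommendations taken from the EFF or from any specific analytics package.

```python
import ipaddress
from datetime import datetime, timedelta, timezone


def anonymize_ip(address: str) -> str:
    """Zero the host part of an IP address so it no longer points at one machine."""
    ip = ipaddress.ip_address(address)
    # Assumed prefix lengths: keep the first 24 bits of IPv4, the first 48 of IPv6.
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{address}/{prefix}", strict=False)
    return str(network.network_address)


def is_expired(logged_at: datetime, now: datetime, retention_days: int = 180) -> bool:
    """Return True if a log entry is older than the retention period."""
    return now - logged_at > timedelta(days=retention_days)


# Hypothetical usage with a documentation-range address.
print(anonymize_ip("192.0.2.123"))
now = datetime(2012, 12, 1, tzinfo=timezone.utc)
print(is_expired(datetime(2012, 1, 1, tzinfo=timezone.utc), now))
```

Both steps would run on the site owner’s own server, before anything is written to permanent storage.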

The “Do Not Track Cookbook” serves as a good primer on such strategies:

A few simple steps would significantly mitigate the privacy concerns raised by outsourced analytics. First, an analytics service should technologically limit user identifiers to each customer website. […] Second, an analytics service should separately store and handle the data from each customer website using technical and business protections. Last, an analytics service should be contractually prohibited from using the data it collects.
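
The first of these steps, limiting user identifiers to each customer website, can be sketched in a few lines of Python. The idea shown here is to derive the stored visitor ID from a keyed hash (HMAC) that uses a different secret for every website, so that the same visitor ends up with unrelated IDs on different sites. This is one possible approach and my own illustration, not the method prescribed by the Cookbook; `site_scoped_id`, the secrets and the visitor key are hypothetical.

```python
import hashlib
import hmac


def site_scoped_id(site_secret: bytes, visitor_key: str) -> str:
    """Derive a visitor ID that is only meaningful within one website."""
    # The per-site secret makes IDs from different sites impossible to match
    # without knowing both secrets.
    return hmac.new(site_secret, visitor_key.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical usage: one visitor, two customer websites with separate secrets.
visitor = "cookie-value-or-similar"
id_on_site_a = site_scoped_id(b"secret-of-site-a", visitor)
id_on_site_b = site_scoped_id(b"secret-of-site-b", visitor)
print(id_on_site_a != id_on_site_b)
```

Within one site the ID stays stable, so returning visitors can still be counted, while the data sets of two sites cannot be joined on it.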

Decentralized alternatives to Google Analytics are plentiful – analytics packages that adhere to a strict privacy policy, ranging from software for on-site installation to hosted cloud services:

  • On a small scale (e.g. for blogs), lightweight self-hosted software packages like Shaun Inman’s Mint provide an easy solution to gather visitor insight without any data leaving the server.
  • Other on-site solutions are designed to provide a rich feature set comparable to that of Google Analytics and can scale even to the requirements of multiple large websites, like the open source analytics package Piwik and its hosted instances.
  • Specialized commercial packages are also available, for example the feature-rich “Secure Web Analytics Software” Angelfish.

In Germany, where privacy laws are particularly strict (and actively enforced), Piwik has already gained good traction as a Google Analytics alternative. In a recent audit, the French data protection authority CNIL also found Piwik to be the “only web analytics tool that provides full compliance with data protection laws”.

Towards privacy as a default in web analytics

Naturally, all this leads to questions on the practicability of replacing Google Analytics for commercial operators – technology on the UI level is often only the manifestation of business level strategies and requirements, and building on-site analytics capabilities is a question of both financial and technical resources.

This issue is deeply interwoven with current practices of online business and change cannot happen in an instant; yet, it is about time to start questioning the status quo and evaluating alternatives. While in Germany few web designers would light-heartedly plug a default install of Google Analytics into a website, for fear of legal consequences from violating privacy laws, critical reflection on the implications of “just using the industry standard” still seems surprisingly rare on the web at large.

On a small scale, for experimental and evaluation purposes, I have recently migrated the tracking of all my own and two client websites to on-site instances of Piwik. Installation and integration with mainstream CMSs are easy, and for less tech-savvy users or more resource-hungry sites hosted options are also available.

Image caption: The Piwik Dashboard features data similar to what Google Analytics provides (screenshot from demo.piwik.org)

The data has been of reliable quality and, despite certain limitations (Piwik’s current lack of proper event tracking is a major drawback), it already offers more detail than most websites not reliant on advertising would need. I strongly believe it will gradually evolve into software as powerful as Google Analytics, especially if it receives broad support from a growing community of users, in particular from the commercial side.

This work in progress is part of a blog post series based on my ongoing research on restoring privacy on the web. Any commentary is highly encouraged and you may subscribe here to follow the upcoming posts on Privacy-Aware Design.
