Is your Google Analytics GDPR-compliant?

It’s now been 6 months since GDPR came into effect, and the initial panic appears to have settled. However, it’s not over yet! It turns out GDPR isn’t a problem you can solve overnight and then tick off your to-do list.

In the past few months, we’ve seen some businesses struggle with Personally Identifiable Information (PII) being captured within their Google Analytics. So here’s a short how-to guide for ensuring you aren’t picking up that pesky personal data along with all your valuable customer insights.

How can GDPR be breached in Google Analytics?

In Google Analytics, there are some obvious ways GDPR can be breached, for instance:

  • Collecting customer information via form fills, for instance postcode or address
  • Having an on-site privacy policy that doesn’t match your data retention settings in GA
  • Loading data in via Data Import which includes customer PII
  • Capturing PII-related custom dimensions that haven’t been hashed and salted.

Most of these areas will have been assessed by any pre-GDPR audits that took place across data-collecting platforms, as a breach of GDPR through these means is pretty obvious (if you haven’t had a GDPR audit, get in touch with us here). Despite that, there are still a few ways that Google Analytics can mistakenly collect PII if you’re not careful. There are two ways we commonly see PII breaches in GA:

  1. Page URLs. The basic Universal Analytics tag collects the URL of every page viewed, and passes this unfiltered through to Google Analytics. Occasionally PII can make its way into a URL, particularly during redirects from an email service provider.
  2. Search Terms. Users seem to love accidentally searching for their own email addresses, and of course, this ends up straight in your Search Terms report.

To identify whether you are collecting PII through either of the above, navigate to the relevant report (for Page URLs: Behavior > Site Content > All Pages; for Search Terms: Behavior > Site Search > Search Terms) and search for the @ character using the in-report search function. This will show all instances of email collection in the time period you have selected.
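If you would rather check an export of either report than search in the interface, the same check can be scripted. The sketch below is illustrative only: it assumes you already have the report rows as an array of strings (page paths or search terms), and the names findEmailLikeValues and safeDecode are made up for this example. It decodes each value first, because an @ in a URL is often stored as %40, which a plain @ search will not match.

```javascript
// Loose pattern: something@something.tld anywhere in the string.
var EMAIL_LIKE = /[^\s@\/?&=]+@[^\s@\/?&=]+\.[a-zA-Z]{2,}/;

function safeDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch (e) {
    return value; // malformed percent-encoding: fall back to the raw value
  }
}

// Returns only the rows that appear to contain an email address.
function findEmailLikeValues(rows) {
  return rows.filter(function (row) {
    return EMAIL_LIKE.test(safeDecode(row));
  });
}

// Made-up sample rows, standing in for an exported report.
var sample = [
  '/thank-you?customer_email=jane%40example.com',
  '/products?colour=blue',
  'jane@example.com',
  'blue shoes'
];
console.log(findEmailLikeValues(sample));
```

With the sample rows above, only the first and third entries are flagged.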

I’ve found some PII in my GA reports – how do I fix this?

If that’s the case, do not fear. There are a few ways the issue can be eliminated.

There is PII data showing in my All Pages report

Issue: The captured URL contains a query parameter such as ‘customer_email’, and its value is included in reporting

Report: Behavior > Site Content > All Pages

How to Find: Search for the @ character using the in-report search function

Solution:

> If you use GTM

This is the preferred solution, as it ensures no PII data is passed from the URL from the outset.

You need to create a Custom JavaScript variable in GTM, which will perform a PII check on each URL collected. If the URL contains an email address, you can replace the email address with a placeholder word or phrase such as ‘null’ or ‘undefined’.

In the example code below, we have chosen to replace the email address with the word ‘REDACTED’. The {{Page Path}} variable can be replaced with whichever URL variable you have set up.

function () {
  // Swap {{Page Path}} for whichever URL variable you have set up.
  var pagePath = {{Page Path}};
  var query = location.search.substring(1);
  // Matches a parameter value that is, in its entirety, an email address.
  var em = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
  var result = [];

  if (!query) {
    return pagePath;
  }

  query.split('&').forEach(function (part) {
    var eq = part.indexOf('=');
    var key = eq === -1 ? part : part.substring(0, eq);
    var value = eq === -1 ? '' : part.substring(eq + 1);
    var decoded;
    try {
      decoded = decodeURIComponent(value);
    } catch (e) {
      decoded = value; // malformed encoding: test the raw value instead
    }
    // Keep the parameter name, but swap an email value for a placeholder.
    // Parameters without an email are passed through unchanged.
    result.push(em.test(decoded) ? key + '=REDACTED' : part);
  });

  return pagePath + '?' + result.join('&');
}

Then you need to update your Universal Analytics tag so that the ‘page’ field (under More Settings > Fields to Set) uses the Custom JavaScript variable.
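Before publishing the variable, it can help to check the redaction logic outside GTM. The following is a minimal sketch of the same idea as a plain function: pagePath and search are passed in as arguments instead of being read from {{Page Path}} and location.search. The name redactEmailParams and the simplified email pattern are assumptions made for this example; neither is part of GTM.

```javascript
// Simplified whole-value email pattern (an assumption for this sketch).
var EMAIL_VALUE = /^[^\s@]+@[^\s@]+\.[a-zA-Z]{2,}$/;

function redactEmailParams(pagePath, search) {
  var query = search.charAt(0) === '?' ? search.substring(1) : search;
  if (!query) {
    return pagePath;
  }
  var parts = query.split('&').map(function (part) {
    var eq = part.indexOf('=');
    if (eq === -1) {
      return part; // parameter without a value: nothing to redact
    }
    var key = part.substring(0, eq);
    var value = part.substring(eq + 1);
    try {
      value = decodeURIComponent(value);
    } catch (e) {
      // malformed encoding: test the raw value instead
    }
    return EMAIL_VALUE.test(value) ? key + '=REDACTED' : part;
  });
  return pagePath + '?' + parts.join('&');
}

console.log(redactEmailParams('/thank-you', '?customer_email=jane%40example.com&ref=news'));
// /thank-you?customer_email=REDACTED&ref=news
console.log(redactEmailParams('/products', ''));
// /products
```

Running a handful of real URLs from your own site through a function like this is a quick way to confirm that only the email values change.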

> If you want to make the change in Google Analytics itself

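If you don’t use GTM, the PII issue can be addressed via the GA View Settings in the Admin section. Within the View Settings, add the relevant query parameter to the ‘Exclude URL Query Parameters’ box, for instance customer_email, or whichever parameter precedes the PII in your URL. Bear in mind that this setting only strips the parameter from that view’s reports; the full URL is still sent to Google, which is why the GTM route is preferred.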

> If your Universal Analytics tag is hard-coded on-site

If your tagging is hard-coded on-site, you can amend your Universal Analytics script so that URLs are cleaned before any data is sent to GA for reporting. Replace the default pageview call with the line below, where ‘new page value’ is the altered page path, which will need to be built by your site developers. The amended script should fire on all pages on the site.

ga('send', 'pageview', 'new page value');

Your updated hard-coded script on the site should look something like the following:

<!-- Google Analytics -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-XXXXX-Y', 'auto');
// Send the cleaned page path in place of the default ga('send', 'pageview') call,
// so each page is counted once and the original URL is not reported.
ga('send', 'pageview', 'new page value');
</script>
<!-- End Google Analytics -->
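
How the ‘new page value’ is built is left to your developers. One possible approach, sketched below, is to drop known PII parameters by name before the pageview is sent. The function name buildCleanPagePath and the blocked-parameter list are hypothetical, and customer_email is simply the example parameter used earlier in this guide.

```javascript
// Parameters to strip before the page path is sent (example list, adjust to your site).
var BLOCKED_PARAMS = ['customer_email'];

function buildCleanPagePath(pathname, search, blockedParams) {
  var query = search.charAt(0) === '?' ? search.substring(1) : search;
  var kept = query.split('&').filter(function (part) {
    if (!part) {
      return false; // ignore empty segments such as a trailing '&'
    }
    var key = part.split('=')[0].toLowerCase();
    return blockedParams.indexOf(key) === -1;
  });
  return kept.length ? pathname + '?' + kept.join('&') : pathname;
}

// In the browser this would be called with location.pathname and location.search:
// ga('send', 'pageview', buildCleanPagePath(location.pathname, location.search, BLOCKED_PARAMS));
console.log(buildCleanPagePath('/thank-you', '?customer_email=jane%40example.com&ref=news', BLOCKED_PARAMS));
// /thank-you?ref=news
```

Unlike the GTM variable, this version removes the whole parameter instead of replacing its value, so it only catches parameters you already know about.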

There is PII data showing in my Search Terms report

Issue: Users perform a search on site using their email address (it happens more than you think!)

Report: Behavior > Site Search > Search Terms

How to Find: Search for the @ character using the in-report search function

Solution:

The solution is much like the approach described in the previous section (‘There is PII data showing in my All Pages report’). However, the change can’t be made within Google Analytics itself by excluding URL query parameters, as this would remove all on-site search tracking.

The same GTM solution can be used, as detailed already, with the same results. The hard-coded option can also be used, but your on-site developers will need to ensure that the ‘new page value’ passed to Google Analytics only alters search queries where an email address is present; otherwise all on-site search tracking will be excluded from reporting.
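
For the hard-coded route, the key point is that only searches containing an email address are altered. As a rough sketch, assuming the site search term sits in a query parameter called q (an assumption, so check your own Site Search settings), the hypothetical cleanSearchPath function below replaces just the email portion of the term and leaves every other search untouched.

```javascript
// Loose pattern for an email address anywhere inside a search phrase.
var EMAIL_IN_TEXT = /[^\s@]+@[^\s@]+\.[a-zA-Z]{2,}/g;

function cleanSearchPath(pathname, search, searchParam) {
  var query = search.charAt(0) === '?' ? search.substring(1) : search;
  if (!query) {
    return pathname;
  }
  var parts = query.split('&').map(function (part) {
    var eq = part.indexOf('=');
    if (eq === -1 || part.substring(0, eq) !== searchParam) {
      return part; // not the search parameter: leave as-is
    }
    // Search forms usually encode spaces as '+'.
    var term = part.substring(eq + 1).replace(/\+/g, ' ');
    try {
      term = decodeURIComponent(term);
    } catch (e) {
      // malformed encoding: check the undecoded term instead
    }
    var cleaned = term.replace(EMAIL_IN_TEXT, 'REDACTED');
    // Only rewrite the parameter when an email was actually found.
    return cleaned === term ? part : searchParam + '=' + encodeURIComponent(cleaned);
  });
  return pathname + '?' + parts.join('&');
}

console.log(cleanSearchPath('/search', '?q=jane%40example.com', 'q'));
// /search?q=REDACTED
console.log(cleanSearchPath('/search', '?q=blue+shoes&page=2', 'q'));
// /search?q=blue+shoes&page=2
```

Because ordinary searches pass through unchanged, the Search Terms report keeps working for everything except the redacted entries.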

Summary

PII can find its way into Google Analytics, typically via URLs, Page Titles and Search Terms. However, these instances can be prevented easily enough using the above tips! The best option is to implement the fixes in GTM, as it means PII is excluded from the very start of the data collection journey, but there are also options available in GA itself or through hard-coding.

If you need any help with the above, or are concerned about GDPR compliance in your Google Analytics, give us a shout.