Charities are a cornerstone of society in the UK. They save, protect, sustain, and enrich life in too many ways to count. We trust them to look after our loved ones in their hour of need; we trust them to care for our veterans, our environment, our pets, our mental well-being... the list goes on.
However, our research suggests that some of this trust is being misplaced. By analysing 82,804 charity websites in the UK, we discovered that global for-profit advertising companies could be profiling users of charity websites, often while those users are visiting pages related to highly sensitive topics such as mental health, sexual violence, and disability.
Charities play a unique role and hold a special place in our hearts. They are required, by law, to serve the public interest, but at the same time, they feel many of the commercial pressures of a regular business. Donations remain the most high-profile form of income generation and those with the means are quickly adopting new technologies to maximise that income. Chief amongst these technologies is programmatic advertising. Put simply, programmatic advertising uses machines and algorithms to purchase ad space on websites and to target specific users based on detailed user profiles. This ability to target individuals has presented new opportunities for the charity sector.
But these opportunities have an altogether more sinister side. The AdTech industry relies on highly sophisticated mass-profiling of citizens. From basic information like age, gender, and income to highly personal data like religion, political affiliation, and sexuality, this user data is gathered by dozens of companies and shared with thousands more. The ecosystem has grown so complex that it is almost impossible to say where user data goes and what it might be used for in the future.
Our study also found that the vast majority of charities are failing to meet their obligations under European data protection and privacy laws. All website providers have a responsibility to protect the privacy of their users and comply with existing laws, but this is particularly important for websites that share potentially granular or sensitive data with third parties.
It’s perfectly understandable that charities would use every weapon at their disposal to generate income, but allowing companies to collect data about users, often in their darkest hour, without their knowledge or consent is intrusive and, according to the Information Commissioner’s Office, ethically questionable.
- We used open datasets provided by the Charity Commission to extract all domains registered on the charity database (July 2020). This data was cleaned to remove subdomains belonging to larger sites (wordpress.com, wix.com etc.) and education sites (.edu and .ac.uk). We also attempted to remove domains we knew to belong to for-profit organisations, even if they had a charitable arm of the business.
- We then analysed the remaining 82,804 domains to detect third-party HTTP requests and cookies. Similar to previous investigations, our hope was to use the open-source tool WebXray to analyse the URLs; however, the tool was removed from GitHub prior to analysis. To continue the investigation, we developed an in-house tool.
- The tool, which we plan to make publicly available, inspects the loading process of the given URL. During the inspection, it looks for external links and other references that possibly belong to user tracking. It cross-references this with datasets provided by disconnect.me and whotracks.me. The tool runs in an AWS Lambda environment and is scalable for large analyses such as this.
- Once we had a list of third-party elements loading on each page, the domains they contacted, and the owner of each domain, we categorised them as follows:
  - Data brokers: Lotame, LiveRamp, Quantcast, Eyeota, BlueKai (Oracle), MediaMath
  - Real-time bidding (RTB) platforms: ADYOULIKE, The Trade Desk, Avocet, AppNexus, AdRoll, TripleLift, ONE by AOL, Bidswitch, OpenX, PubMatic, Rubicon Project
  - DoubleClick: Google/Alphabet's programmatic advertising platform, which is so pervasive that we felt it best to separate it from other services
  - Social: LinkedIn Ads, Twitter Advertising, Facebook Custom Audience, LinkedIn Analytics, Facebook Connect, Twitter Analytics, Twitter Conversion Tracking, Twitter Button, Twitter Syndication, AddThis, Facebook Social Graph, ShareThis, AddToAny
  - Other advertising: everything else that was categorised as advertising by open datasets but did not fit into the categories above (see appendix for complete list)
- Using YouGov’s index of popular UK charities, we selected the top 100 websites for manual analysis. Because our tool was unable to accept cookie consent, this final stage of analysis involved visiting each website in order to carry out manual verification of third-party trackers loading on the page. To expedite this analysis we used HTTP Toolkit to intercept HTTP requests and Ghostery Insights Beta to quickly identify domains.
- We manually accepted cookies on each page to understand exactly which trackers were loading for the Top 100.
- We conducted a thorough analysis of cookie consent processes, noting whether elements loaded before or after consent was given and whether revoking consent had any technical impact on trackers.
- Finally, we explored the content of each website. We logged those covering sensitive subjects that could be used for granular profiling and once again verified the presence of trackers on those pages.
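The first and fourth steps above (cleaning the domain list and sorting detected trackers into groups) can be illustrated with a minimal Python sketch. This is not the in-house tool described above. The function names, the category keys, and the `.example` domains are assumptions made for illustration, and the tracker names are simply those listed in the categorisation step.

```python
from urllib.parse import urlparse

# Assumption: suffixes standing in for "subdomains belonging to larger sites"
# and "education sites" as described in the cleaning step.
HOSTED_SUFFIXES = (".wordpress.com", ".wix.com")
EDUCATION_SUFFIXES = (".edu", ".ac.uk")

# Tracker names as listed in the article; the category keys are our own labels.
TRACKER_CATEGORIES = {
    "data_broker": {"Lotame", "LiveRamp", "Quantcast", "Eyeota", "BlueKai",
                    "MediaMath"},
    "rtb_platform": {"ADYOULIKE", "The Trade Desk", "Avocet", "AppNexus",
                     "AdRoll", "TripleLift", "ONE by AOL", "Bidswitch",
                     "OpenX", "PubMatic", "Rubicon Project"},
    "social": {"LinkedIn Ads", "Twitter Advertising", "Facebook Custom Audience",
               "LinkedIn Analytics", "Facebook Connect", "Twitter Analytics",
               "Twitter Conversion Tracking", "Twitter Button",
               "Twitter Syndication", "AddThis", "Facebook Social Graph",
               "ShareThis", "AddToAny"},
}


def clean_domains(entries):
    """Return unique hostnames, dropping hosted subdomains and education sites."""
    kept, seen = [], set()
    for entry in entries:
        # Accept bare domains as well as full URLs.
        host = urlparse(entry if "//" in entry else "//" + entry).hostname
        if host is None:
            continue
        host = host.lower()
        if host.startswith("www."):
            host = host[4:]
        if host.endswith(HOSTED_SUFFIXES + EDUCATION_SUFFIXES):
            continue
        if host not in seen:
            seen.add(host)
            kept.append(host)
    return kept


def categorise(tracker_name):
    """Map a tracker name to one of the groups used in the analysis."""
    if tracker_name == "DoubleClick":
        return "doubleclick"  # kept separate because it is so pervasive
    for category, names in TRACKER_CATEGORIES.items():
        if tracker_name in names:
            return category
    return "other_advertising"
```

A real pipeline would also need the manual step of removing for-profit organisations, which cannot be expressed as a simple suffix rule.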
Back to basics: What is a tracker?
When a user visits a website, they are, in fact, downloading an HTML file from a server. This is a ‘first-party’ request because only two parties are involved in this process: the user and the company that owns the website.
However, webpages almost always include additional elements that are not served up by the owner of the site, such as pictures, videos, social buttons, or other types of code not necessarily visible to the user. As soon as this new content is included from other domains, the number of parties involved in the transaction increases from two to three or more, which is why we refer to these as ‘third-party’ requests.
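The distinction between first-party and third-party requests can be sketched in a few lines of Python. This is a deliberately naive illustration: it compares hostnames by suffix, whereas a production tool would normally compare registrable domains using the Public Suffix List. The function name and the `.example` URLs are hypothetical.

```python
from urllib.parse import urljoin, urlparse


def is_third_party(page_url, request_url):
    """Return True if request_url is served from a different site than page_url.

    Naive assumption: two hosts belong to the same site when the request host
    equals the page host (minus a leading "www.") or is a subdomain of it.
    """
    # Resolve relative references such as "/img/logo.png" against the page.
    absolute = urljoin(page_url, request_url)
    page_host = (urlparse(page_url).hostname or "").lower()
    request_host = (urlparse(absolute).hostname or "").lower()
    site = page_host[4:] if page_host.startswith("www.") else page_host
    same_site = request_host == site or request_host.endswith("." + site)
    return not same_site
```

For a page at `https://www.charity.example/help`, an image loaded from `cdn.charity.example` counts as first-party under this rule, while a pixel loaded from `pixel.tracker.example` counts as third-party.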
Each time one of these resources or elements is requested from the third-party server, certain pieces of user data are transmitted to the third-party company. At a minimum this includes:
- The IP address of the device that made the request
- The type of device used
- The web browser
- The date and time of the request
- The 'referrer' - the URL that the request was made from
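As a rough illustration of the list above, the sketch below shows the kind of record a third-party server could keep for each incoming request. The function name, the field names, and the example values are assumptions. The device type and browser are not sent as separate fields but are typically inferred from the `User-Agent` header, and the referrer arrives in the HTTP `Referer` header (spelled that way in the standard).

```python
from datetime import datetime, timezone


def log_record(client_ip, headers, received_at=None):
    """Build the minimal record a third-party server can store per request."""
    if received_at is None:
        received_at = datetime.now(timezone.utc)
    return {
        "ip": client_ip,                               # address of the device
        "user_agent": headers.get("User-Agent", ""),   # device type and browser
        "referrer": headers.get("Referer", ""),        # page that made the request
        "timestamp": received_at.isoformat(),          # date and time of request
    }
```

Nothing in this record requires a cookie or any script running in the browser. It is simply what arrives with the request.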
There is nothing sinister about any of this; it is simply a function of how the internet works. What is arguably more concerning is what these third parties do with that data. Simply by storing this data across thousands or millions of websites, almost any company can begin to analyse it and see patterns in behaviour. This is tracking and user profiling at its most basic.
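To make the idea of "tracking at its most basic" concrete, here is a minimal sketch of how stored request data could be grouped into per-visitor histories. It assumes, purely for illustration, that a visitor is identified by the combination of IP address and user agent. Real systems use far more robust identifiers. All names and values are hypothetical.

```python
from collections import defaultdict


def build_profiles(records):
    """Group logged requests by (IP, user agent) and collect the pages seen."""
    profiles = defaultdict(set)
    for record in records:
        visitor = (record["ip"], record["user_agent"])
        profiles[visitor].add(record["referrer"])
    return dict(profiles)


# Hypothetical log entries gathered from two unrelated websites.
records = [
    {"ip": "203.0.113.7", "user_agent": "ExampleBrowser/1.0",
     "referrer": "https://charity-a.example/help/memory-problems"},
    {"ip": "203.0.113.7", "user_agent": "ExampleBrowser/1.0",
     "referrer": "https://charity-b.example/benefits/disability"},
    {"ip": "198.51.100.2", "user_agent": "OtherBrowser/2.0",
     "referrer": "https://charity-a.example/donate"},
]
profiles = build_profiles(records)
```

Even in this toy example, the first visitor's history links two sensitive pages on two different sites to the same person.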
However, an entire industry has evolved around the tracking and profiling of internet users. While most users are at least vaguely familiar with the notion of cookies, much of this tracking is now done server-side and so is almost impossible for the average user to understand or stop. It is complex, poorly understood, escapes many of the existing regulatory frameworks, and poses significant threats to the privacy of internet users worldwide.
Privacy Death Stars: The most complex digital ecosystem in existence
Using the basic examples above, an average user might not see a huge reason for concern. They visit a webpage, and that information is recorded, tracked, and stored for advertising purposes later on. However, there are several other factors that make AdTech altogether more disturbing when viewed through the lens of privacy. First is the sheer complexity of the ecosystem. There are companies that most people will never have heard of that profit by scooping up vast amounts of personal data.
Data brokers like Oracle (BlueKai) and LiveRamp (formerly Acxiom) have been described as Privacy Death Stars. They probably know more than Google, Facebook, or any other single entity that gathers human-specific trackable intelligence. They are aggregators of data, not just online, but offline too. Everything from credit card transactions to criminal records is gathered from thousands of partners in order to build detailed dossiers of as many citizens around the world as possible. LiveRamp (which we found on 10 of the top 100 charity sites) boasts on its website that it holds data on more than 45 million UK citizens. That's just shy of 80% of the adult UK population.
What does a data broker know about you?
Credit: Cracked Labs. Available under CC BY 4.0
So how many of these granular data points could be inferred simply by visiting a charity webpage? The Internet Advertising Bureau provides a ‘content taxonomy’ which is used across the digital advertising industry for the categorisation of website content. It contains fields for the likes of ‘Heart and Cardiovascular Diseases’, ‘Mental Health’, ‘Sexual Health’ and ‘Infectious Diseases’ whilst Google’s publisher verticals include ‘Reproductive Health’, ‘Substance Abuse’, ‘Health Conditions’, ‘Politics’ and ‘Ethnic & Identity Groups’.
Essentially, these are boxes that exist within the industry and the companies involved are listening in wherever possible in order to tick them. The level of detail that these companies strive to obtain leaves absolutely no scope for privacy.
Social buttons: Watching everything you do
Social buttons allow users to quickly share content and have become commonplace on many private, public, and charity websites. However, their primary purpose is almost always to track and profile users for advertising purposes.
"AddThis Data is collected online and may indirectly identify you. It includes, for example:
- Unique IDs such as a cookie ID on your browser;
- IP addresses and information derived from IP addresses, such as geographic location;
It goes on to say that this data can be used "to enable AddThis Publishers and Oracle Marketing & Data Cloud customers and partners to market products and services to you".
Disturbing examples of data brokers tracking users on UK charity sites
Marie Curie: Recently diagnosed with a terminal illness
Here we can see a tracking pixel belonging to Quantcast loading on Marie Curie's page for those recently diagnosed with a terminal illness. This pixel allows Quantcast to model visitors to the site, so that the charity can later use this data to target the internet users most likely to 'convert' when shown an ad on a different site.
However, we can clearly see in the HTTP header below that Quantcast, a well-known aggregator of data across 100 million web domains, not only has access to the user agent data, which can be used to identify users, but also has access to the complete URL as well as sensitive keywords.
Alzheimer's Society: Worried about memory problems
This page loads a number of different trackers related to AdTech including:
- DoubleClick (Alphabet)
- BlueKai (Oracle)
- Index Exchange (formerly Casale Media)
- LiveRamp (formerly Acxiom)
Even for a charity engaging in programmatic advertising, the number of data aggregators and other AdTech companies seems over-reaching and disproportionate.
While not all pixels seem to be firing, they are present on the page.
Mencap: Benefits for people with a learning disability
To its credit, Mencap takes a more technically sophisticated and privacy-focused approach. While there are a number of AdTech trackers on its homepage (DoubleClick, AppNexus, PubMatic, MediaMath, and The Trade Desk), these are not included on deeper help pages.
However, it does include AddThis, which, as mentioned above, is explicitly used to provide data to Oracle's marketing platform.
In short, Mencap is allowing the same data broker to profile users on even its most sensitive pages.
Other high profile examples of data broker trackers:
British Heart Foundation: Heart Conditions
Brokers: LiveRamp, DoubleClick
NSPCC: Spotting signs of child abuse
Brokers: Quantcast, DoubleClick
Stroke Association: Are you at risk of stroke?
Brokers: LiveRamp, DoubleClick, AddThis (Oracle)
Real-time bidding: Why is it a privacy nightmare?
Real-time bidding (RTB) is a practice that is facing scrutiny by regulators and privacy advocates and is particularly concerning when trackers belonging to RTB platforms are found on web pages that are sensitive in nature.
RTB is an automated process that takes place in the fractions of a second that it takes to load a webpage. When a user visits a page that has advertising space available, the user's data is broadcast to hundreds or even thousands of bidders via an ad exchange. This broadcast includes information about the ad space, such as the page it is on, the topic and category, as well as the user's IP address and precise location if available. This all constitutes personal data under GDPR. It's impossible to know who these bidders are, but every single one of them receives the data - whether or not they win the bid.
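The key privacy property described above, that every bidder receives the data whether or not it wins, can be shown with a toy auction in Python. This is a simplified sketch and not the OpenRTB protocol. The field names, bidder names, prices, and `.example` URL are all hypothetical.

```python
def run_auction(bid_request, bidders):
    """Broadcast bid_request to every bidder and return (winner, received).

    `bidders` maps a bidder name to a function that takes the request and
    returns a bid price. `received` records what each bidder was sent.
    """
    received = {}
    bids = {}
    for name, bid_function in bidders.items():
        # Every bidder gets its own copy of the user's data before bidding.
        received[name] = dict(bid_request)
        bids[name] = bid_function(received[name])
    winner = max(bids, key=bids.get)  # highest price wins the ad slot
    return winner, received


# Hypothetical request describing the page and the visitor.
request = {
    "page_url": "https://charity.example/support/living-with-cancer",
    "category": "Health Conditions",
    "ip": "203.0.113.7",
    "user_agent": "ExampleBrowser/1.0",
}
bidders = {
    "bidder_a": lambda req: 0.10,
    "bidder_b": lambda req: 0.25,
    "bidder_c": lambda req: 0.00,  # declines to pay, but still saw the data
}
winner, received = run_auction(request, bidders)
```

Only `bidder_b` wins the impression, yet all three bidders now hold the page URL, category, IP address, and user agent.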
This is happening billions of times per day. It is, as The Economist puts it, a ‘data-protection free zone’.
Credit: The Economist, recreated by ProPrivacy
It's not just the companies involved in an RTB transaction that have access to this data. The rise of RTB has forced AdTech companies to collaborate closely with one another because without exchanging user data, companies cannot participate in RTB auctions. This ecosystem has become so big and complex that it is impossible to untangle and has been described in the past as the "UK's worst data breach".
Charities seem to be using RTB to retarget their previous visitors elsewhere on the internet. In doing so, they are exposing every one of those visitors to this opaque ecosystem.
Again, these elements have access to data that is unnecessary for charities to take part in programmatic advertising. As an example, we can see here on Cancer Research UK's support page, not only is the full referrer URL included in this AppNexus request, but a cookie is set with a unique ID as well as time stamps.
Even if these platforms are being used on a charity's behalf, every visitor to a charity site with one of these trackers installed is giving away, at a minimum, basic user data such as IP and browser type, as well as the full URL they were visiting. This referrer URL often contains potentially sensitive information.
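To show why a full referrer URL can be sensitive on its own, the following sketch pulls words out of a URL's path and query string and checks them against a small list of terms. The term list, the function name, and the example URL are assumptions chosen for illustration. A real classifier, such as one built on an industry content taxonomy, would be far more extensive.

```python
import re
from urllib.parse import urlparse

# Hypothetical list of terms that hint at a sensitive topic.
SENSITIVE_TERMS = {
    "abuse", "diagnosed", "disability", "illness",
    "memory", "stroke", "terminal",
}


def sensitive_terms_in(url):
    """Return the sensitive terms found in the path or query of a URL."""
    parsed = urlparse(url)
    text = (parsed.path + " " + parsed.query).lower()
    words = set(re.findall(r"[a-z0-9]+", text))
    return sorted(words & SENSITIVE_TERMS)
```

For a hypothetical address such as `https://charity.example/support/recently-diagnosed-terminal-illness`, the URL alone reveals the visitor's likely circumstances without the page content ever being read.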
We identified 344 charity websites across the UK that appear to engage with real-time bidding (RTB) based on the elements we found on the page. Within the top 100, there were 36 charities with trackers belonging to platforms within the RTB ecosystem.