In our articles on Flash cookies and browser fingerprinting we looked at how commercial internet companies, particularly third party analytics and advertising domains, are using increasingly sneaky and sophisticated methods to evade public awareness of the dangers of HTTP cookies, so they can continue to uniquely identify and track our movements across the web.
While Flash cookies (including so-called zombie cookies) and, increasingly, browser fingerprinting, are the most commonly used methods used to do this, there are others. In this article we will look at some of these, and discuss how they may be foiled.
HTML5 Web Storage
A feature of HTML5 (the much vaunted replacement to Flash) is Web storage(also known as DOM (Document Object Model) storage). Even creepier and much more powerful than cookies, web storage is a way analogous to cookies of storing data in a web browser, but which is much more persistent, has a much greater storage capacity, and which cannot normally be monitored, read, or selectively removed from your web browser.
Unlike regular HTTP cookies which contain 4 kB of data, web storage allows 5MB per origin in Chrome,Firefox, and Opera, and 10 MB in Internet Explorer. Websites have a much greater level of control over web storage and, unlike cookies, web storage does not automatically expire after a certain length of time (i.e. it is permanent by default).
When Ashkan Soltani and a team of researchers at UC Berkeley conducted a study of web tracking in in 2011, they found that of the top 100 websites surveyed, 17 used web storage, including twitter.com, tmz.com, squidoo.com, nytimes.com, hulu.com, foxnews.com, and cnn.com. Most of these connected to a third party analytics service such as Meebo, KISSanalytics, or Pollydaddy.
How do I stop it?
Web storage is quite easy to turn off, but many sites (e.g. CNN) will not work properly if you do so.
In Firefox:
- Launch Firefox and type about:config in the address bar
- In Click ‘I'll be careful, I promise!’
- Scroll down until you reach dom.storage.enabled or copy/paste ‘dom.storage.enabled’ into the search bar
- Double-click ‘dom.storage.enabled’, and it will change from its default value ‘true’ to ‘false’
Firefox users can also configure the BetterPrivacy addon to remove web storage automatically on a regular basis, or use the Click&Clean addon.
In Internet Explorer:
- Launch Internet Explorer and open the Tools Menu
- Select ‘Internet Options’
- Click the ‘Advanced’ tab
- Scroll down until you reach ‘Security’
- Uncheck the box for ‘Enable DOM Storage’
- Click ‘Ok’
In Chrome:
Chrome users can use the Click&Clean extension or, alternatively, the versatile Google NotScripts extension, but this requires a high degree of configuring.
In Safari and Opera:
These browsers do use web storage, but as far as we are aware there is no way to turn it off.
Do note that the use of any browser extension increases the chance of making the browser’s fingerprint unique.
HTTP ETags
ETags (or entity tags, sometimes referred to as ‘cookieless cookies’) are ‘part of HTTP, the protocol for the World Wide Web’ whose purpose is to identify a specific resource at a URL, and track any changes made to it.
The method by which these resources are compared allows them to be used as fingerprints, as the server simply gives each browser a unique ETag, and when it connects again it can look the ETag up in its database.
The first discovered use of ETags ‘in the wild’ as a tracking mechanism was made by Ashkan Soltani and his team, who found that media streaming website Hulu was using a service hosted by web analytics company KISSmetrics to respawn (Zombie style) HTTP and HTML5 cookies using ‘the cache to mirror values, specifically ETags.’
The report notes that ‘ETag tracking and respawning is particularly problematic because the technique generates unique tracking values even where the consumer blocks HTTP, Flash, and HTML5 cookies,’ and ‘even in private browsing mode, ETags can track the user during a browser session.’
Perhaps even worse, ‘the ETag respawning we observed set a first party cookie on hulu.com. This means that other sites subscribing to the kissmetrics.com service could synchronize these identifiers across their domains.’
How can I stop them?
Unfortunately this kind of cache tracking is virtually undetectable, so reliable prevention is very hard. Clearing your cache between each website you visit should work, as should turning off your cache altogether. Unfortunately these methods are arduous, and will negatively impact your browsing experience.
The Firefox add-on Secret Agent prevents tracking by ETags, but will likely increase your browser fingerprint (or because of the way it works, maybe not).
History stealing
Now we start to get really scary. History stealing (also known as history snooping) exploits the way in which the Web is designed, allowing a website you visit to discover your past browsing history.
The simplest method, which has been known about for a decade, relies on the fact that web links change color when you click on them (traditionally from blue to purple). When you connect to a website it can query your browser through a series of yes/no questions to which your browser will faithfully respond, allowing the attacker to discover which links have changed color, and therefore to track your browsing history.
Despite a reluctance to tackle this security flaw because it affects the way in which the World Wide works, most modern browsers now provide protection against this basic history stealing attack, but other more sophisticated attacks that rely on CSS page layouts and image attributes remain in use.
Using History stealing to uniquely identify you
Ok, so a website can track your browsing history using history stealing, but it couldn’t possibly identify who you are, right? Wrong. Using a process of identity fingerprinting, whereby a website matches the web pages you have visited to social networking groups, it has a high chance of identifying you as a unique individual.
Consider: Almost all social networks (e.g. Facebook) allow you to join interest groups, and most of us join dozens of these groups, many of which are likely to be public. The list of public groups you join is often enough to give you an individual fingerprint, which can be matched to your social network profile. If you regularly visit websites which correlate to your social network group interests, then it is a fairly easy matter to match your stolen web history to your social network profile.
Like we said, scary! Also, unfortunately, there is not much you can do about it.* Even more than ‘regular’ supercookies, the web industry considers history stealing to be unethical, but attempts to establish voluntarily self-imposed industry guidelines have so far come to nothing.
*Note: As reader Annie has observed, deleting your browsing history and cookies should also reset link colors. Which will prevent history stealing. Of course, unless you delete you browsing history etc. after every single page you visit, this technique will still likely be able to determine a great deal about your browsing history.
Conclusion
There is effectively an ongoing arms war between commercial internet advertising interests and the ordinary internet using public, and it has to be said that commercial interests are winning. Many attacks are now so sophisticated and subtle (most notable browser finger printing and history stealing) that reliable prevention is almost impossible, and certainly takes a degree of effort, inconvenience and technical knowhow to make even the most concerned of us shrug our shoulders and give up. We hate to say it, but perhaps the only answer lies in legislation, or at least a robust industry-recognised voluntary code of conduct that will discourage more respectable websites from indulging in this kind of behaviour.
The one good thing about the situation is that although they track you, most methods do not individually identify you (even social network fingerprinting, while scarily effective, is not reliable), and if you mask your IP address with a VPN (or Tor) then you will be going a long way to disassociate your real identity from your tracked web behaviour.