More things that go bump in the night: HTTP ETags, Web Storage, and ‘history stealing’

In our articles on Flash cookies and browser fingerprinting we looked at how commercial internet companies, particularly third-party analytics and advertising domains, are using increasingly sneaky and sophisticated methods to sidestep growing public awareness of the dangers of HTTP cookies, so that they can continue to uniquely identify us and track our movements across the web.

While Flash cookies (including so-called zombie cookies) and, increasingly, browser fingerprinting are the most commonly used methods of doing this, there are others. In this article we will look at some of them, and discuss how they may be foiled.

HTML5 Web Storage

A feature of HTML5 (the much-vaunted replacement for Flash) is web storage, also known as DOM (Document Object Model) storage. Even creepier and much more powerful than cookies, web storage is a way of storing data in a web browser that is analogous to cookies, but which is much more persistent, has a much greater storage capacity, and cannot normally be monitored, read, or selectively removed from your web browser.

Unlike regular HTTP cookies, which are limited to around 4 kB of data each, web storage allows 5 MB per origin in Chrome, Firefox, and Opera, and 10 MB in Internet Explorer. Websites have a much greater level of control over web storage and, unlike cookies, web storage does not automatically expire after a certain length of time (i.e. it is permanent by default).

When Ashkan Soltani and a team of researchers at UC Berkeley conducted a study of web tracking in 2011, they found that of the top 100 websites surveyed, 17 used web storage, including twitter.com, tmz.com, squidoo.com, nytimes.com, hulu.com, foxnews.com, and cnn.com. Most of these connected to a third-party analytics service such as Meebo, KISSanalytics, or Polldaddy.

How do I stop it?

Web storage is quite easy to turn off, but many sites (e.g. CNN) will not work properly if you do so.

In Firefox:

Launch Firefox and type about:config in the address bar
Click ‘I'll be careful, I promise!’
Scroll down until you reach dom.storage.enabled or copy/paste ‘dom.storage.enabled’ into the search bar
Double-click ‘dom.storage.enabled’, and it will change from its default value ‘true’ to ‘false’

Firefox users can also configure the BetterPrivacy addon to remove web storage automatically on a regular basis, or use the Click&Clean addon.

In Internet Explorer:

Launch Internet Explorer and open the Tools Menu
Select ‘Internet Options’
Click the ‘Advanced’ tab
Scroll down until you reach ‘Security’
Uncheck the box for ‘Enable DOM Storage’
Click ‘Ok’

In Chrome:

Chrome users can use the Click&Clean extension or, alternatively, the versatile Google NotScripts extension, although the latter requires a good deal of configuration.

In Safari and Opera:

These browsers do use web storage, but as far as we are aware there is no way to turn it off.

Do note that the use of any browser extension increases the chance of making the browser’s fingerprint unique.

HTTP ETags

ETags (or entity tags, sometimes referred to as ‘cookieless cookies’) are ‘part of HTTP, the protocol for the World Wide Web’ whose purpose is to identify a specific resource at a URL, and track any changes made to it.

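The way in which ETags are compared allows them to be used as identifiers. The server simply gives each browser a unique ETag, the browser stores it in its cache alongside the resource, and when the browser requests the resource again it sends the ETag back (in the If-None-Match request header), allowing the server to look it up in its database.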
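
The exchange can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how any particular tracker is implemented: handle_request and visits_by_etag are hypothetical names, and the request is reduced to a plain dictionary of headers. It shows a server minting a unique ETag for an unknown browser and recognizing that browser when the ETag is replayed.

```python
import uuid

# Hypothetical server-side store: ETag value -> number of visits seen for it.
visits_by_etag = {}


def handle_request(headers):
    """Return (status, response_headers) for a request to a tracked resource."""
    etag = headers.get("If-None-Match")
    if etag in visits_by_etag:
        # Returning browser: the cached ETag works like a cookie value.
        visits_by_etag[etag] += 1
        return 304, {"ETag": etag}
    # Unknown (or cache-cleared) browser: mint a fresh unique identifier.
    etag = '"' + uuid.uuid4().hex + '"'
    visits_by_etag[etag] = 1
    return 200, {"ETag": etag}


# First visit: no cached copy, so no If-None-Match header is sent.
first_status, first_headers = handle_request({})

# Second visit: the browser replays the ETag it cached with the resource.
second_status, second_headers = handle_request(
    {"If-None-Match": first_headers["ETag"]}
)
```

Note that no cookie is involved at any point. The identifier lives in the browser cache, which is why clearing cookies alone does not remove it.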

The first discovered use of ETags ‘in the wild’ as a tracking mechanism was made by Ashkan Soltani and his team, who found that media streaming website Hulu was using a service hosted by web analytics company KISSmetrics to respawn (Zombie style) HTTP and HTML5 cookies using ‘the cache to mirror values, specifically ETags.’

The report notes that ‘ETag tracking and respawning is particularly problematic because the technique generates unique tracking values even where the consumer blocks HTTP, Flash, and HTML5 cookies,’ and ‘even in private browsing mode, ETags can track the user during a browser session.’

Perhaps even worse, ‘the ETag respawning we observed set a first party cookie on hulu.com. This means that other sites subscribing to the kissmetrics.com service could synchronize these identifiers across their domains.’

How can I stop them?

Unfortunately this kind of cache tracking is virtually undetectable, so reliable prevention is very hard. Clearing your cache between each website you visit should work, as should turning off your cache altogether. Unfortunately these methods are arduous, and will negatively impact your browsing experience.

The Firefox add-on Secret Agent prevents tracking by ETags, but will likely make your browser fingerprint more unique (or, because of the way it works, maybe not).

History stealing

Now we start to get really scary. History stealing (also known as history snooping) exploits the way in which the Web is designed, allowing a website you visit to discover your past browsing history.

The simplest method, which has been known about for a decade, relies on the fact that web links change color when you click on them (traditionally from blue to purple). When you connect to a website it can query your browser through a series of yes/no questions to which your browser will faithfully respond, allowing the attacker to discover which links have changed color, and therefore to track your browsing history.
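
As a rough illustration of the yes/no questioning, here is a toy model in Python. Real attacks of this kind ran as scripts inside the web page. ToyBrowser and probe_history are hypothetical names invented for this sketch, and the URLs are placeholders.

```python
class ToyBrowser:
    """Stand-in for a browser that styles visited links differently."""

    def __init__(self, history):
        self._history = set(history)

    def link_color(self, url):
        # Traditional defaults: purple for visited links, blue for unvisited.
        return "purple" if url in self._history else "blue"


def probe_history(browser, candidate_urls):
    """Ask one yes/no question per candidate URL and keep the 'yes' answers."""
    return [url for url in candidate_urls if browser.link_color(url) == "purple"]


browser = ToyBrowser(["https://example.com/news", "https://example.org/forum"])
candidates = [
    "https://example.com/news",
    "https://example.net/shop",
    "https://example.org/forum",
]
leaked = probe_history(browser, candidates)
```

The sketch also shows the attack's main limitation: the website never sees the history itself, only answers about URLs it already thought to ask about.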

Despite a reluctance to tackle this security flaw because it affects the way in which the World Wide Web works, most modern browsers now provide protection against this basic history stealing attack. Other, more sophisticated attacks that rely on CSS page layouts and image attributes nevertheless remain in use.

Using history stealing to uniquely identify you

Ok, so a website can track your browsing history using history stealing, but it couldn’t possibly identify who you are, right? Wrong. Using a process of identity fingerprinting, whereby a website matches the web pages you have visited to social networking groups, it has a high chance of identifying you as a unique individual.

Consider: Almost all social networks (e.g. Facebook) allow you to join interest groups, and most of us join dozens of these groups, many of which are likely to be public. The list of public groups you join is often enough to give you an individual fingerprint, which can be matched to your social network profile. If you regularly visit websites which correlate to your social network group interests, then it is a fairly easy matter to match your stolen web history to your social network profile.
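
The matching step can be sketched as a simple set comparison. This is a minimal Python illustration under stated assumptions: the profile names, group names, and the candidate_profiles helper are all made up for the example, and a real attack would have to cope with noisy and incomplete data.

```python
def candidate_profiles(stolen_groups, public_memberships):
    """Return the profiles whose public groups include every stolen group."""
    stolen = set(stolen_groups)
    return sorted(
        name
        for name, groups in public_memberships.items()
        if stolen <= set(groups)  # subset test: all stolen groups are present
    )


# Hypothetical public group memberships scraped from a social network.
memberships = {
    "alice": {"knitting", "vintage-synths", "trail-running"},
    "bob": {"knitting", "trail-running", "chess"},
    "carol": {"vintage-synths", "chess"},
}

# One group page found in the stolen history still leaves two candidates.
one_group = candidate_profiles({"knitting"}, memberships)

# A second group narrows the match down to a single profile.
two_groups = candidate_profiles({"knitting", "vintage-synths"}, memberships)
```

Each additional group found in the stolen history shrinks the pool of matching profiles, which is why a list of dozens of groups is so often unique to one person.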

Like we said, scary! Also, unfortunately, there is not much you can do about it.* The web industry considers history stealing to be even more unethical than ‘regular’ supercookies, but attempts to establish voluntary, self-imposed industry guidelines have so far come to nothing.

*Note: As reader Annie has observed, deleting your browsing history and cookies should also reset link colors, which will prevent history stealing. Of course, unless you delete your browsing history etc. after every single page you visit, this technique will still likely be able to determine a great deal about your browsing history.

Conclusion

There is effectively an ongoing arms race between commercial internet advertising interests and the ordinary internet-using public, and it has to be said that commercial interests are winning. Many attacks are now so sophisticated and subtle (most notably browser fingerprinting and history stealing) that reliable prevention is almost impossible, and certainly takes enough effort, inconvenience, and technical know-how to make even the most concerned of us shrug our shoulders and give up. We hate to say it, but perhaps the only answer lies in legislation, or at least a robust industry-recognised voluntary code of conduct that will discourage more respectable websites from indulging in this kind of behaviour.

The one good thing about the situation is that although they track you, most methods do not individually identify you (even social network fingerprinting, while scarily effective, is not reliable), and if you mask your IP address with a VPN (or Tor) then you will be going a long way to disassociate your real identity from your tracked web behaviour.

Jay Tob

on June 12, 2017

Douglas, Very huge thanks for your work, your writing, and your sanity. Two Q's 1) Question: If I make a button that automatically deletes the browser from a MacOS at the end of every day, then installs fresh copy, will that "re-set" the browser history, cookies, ETags, and confuse the signature seeking software? Yes, ideal would be to do so after every website, but at least daily, should substantially cut the window of information to near zero. No? 2) Question: What is the best browser for security now?

Douglas Crawford replied to Jay Tob

on June 13, 2017

Hi Jay, Aw, thanks! 1) That might help some, although it seems rather extreme on a practical level. Note that if using the same browser on the same OS with the same screen resolution, etc., some fingerprinting is still possible. Some people run their browser inside a Virtual Machine (or use the Qubes OS, which basically runs almost everything in its own unique VM) in order to help counter this. 2) The first thing to note is that security and privacy are not the same thing. Something like Chrome is very secure, but it is also Google spyware (as Safari is Apple spyware). For privacy I recommend using open source Firefox with some privacy extensions (Privacy Badger + uBlock Origin or uMatrix, Cookie Monster, Better Privacy, HTTPS Everywhere, and Canvas Defender). The Privacy Settings is also handy. An alternative approach is to use the plain vanilla Tor browser (which is based on Firefox) with Tor itself disabled. You don't benefit this way from all the anti-tracking extensions just mentioned, but your browser will look just like every other Tor browser. This makes it resistant to browser fingerprinting.

Jani "robsku" Saksa

on March 2, 2017

EASY SOLUTION - But am I right? First, I'm thankful for this information, especially about the history stealing. As a coder and a person with a significant interest in prevention of tracking&profiling and protecting my privacy I already knew something about possibility of web browsers leaking browsing history, but I had totally wrong idea about it - partly because it's rarely even mentioned in blogs of privacy tools like noscript, content blockers (ABP, µBlock Origin, etc.) and such, and partly because I've assumed too much and have never needed to learn more than fraction of all that Javascript is and is not able to do. In the past there has been a number of browser specific extensions to web standards that haven't always been so well thought from the security POV, and for some reason I thought that accessing history was actually an ill-thought feature of JS! And I also thought that it's since been disabled, but that newer methods used some security holes in browsers. I never thought it was something as simple as link colours, even though I've written a couple userscripts that read and modify CSS styles of matched tags, id's and classes. The solution is simple and also seems to exist as setting in at least couple of mobile browsers, and I've always wondered why: disable changing visited link colour. Unfortunately it seems to not have the desired effect with UC Browser, which I'm using right now to write this message, but such protection shouldn't be hard to implement in any browser that has sufficient add-on or userscripts (Greasemonkey Firefox add-on, for Dolphin Browser on Android Tampermonkey, etc.). I've actually used an add-on, with desktop Firefox, that implements option to override link colours of websites, both globally and on per-site basis. I'm sure that there are ways to override the override, so likely an add-on should be written to do this for specifically preventing history stealing (the add-on I'm talking of was for improving readability). 
The method however is obvious, and should be implemented in the browsers as configuration item so that links ignored any colour or style to be applied specifically on visited links. What do you think? As for Etags, I'm not sure I understand how and why do they work, which makes them really scary for me in comparison.

Douglas Crawford replied to Jani "robsku" Saksa

on March 3, 2017

Hi Jani, Disabling changing visited link colour in your browser would foil this form of history stealing, but a) many people find this feature very useful, and b) by changing your browser's settings in this way (or running a browser extension to do it) you increase the uniqueness of your browser, and therefore make it more susceptible to browser fingerprinting. Re. Etags, this article explains how they work in detail. But, yes, they are scary!

Annie

on January 10, 2017

Hi Douglas I feel a bit silly for having to ask this, but deleting private data like cookies and such from your browser is enough to keep sites from "history stealing" the sites afterward, right? Even with zombie cookies possibly repopulating the cookie file, it wouldn't re-colorize the links, would it? (This is all assuming that clearing private data is what turns the links back from purple to blue) Thanks, and great articles!

Douglas Crawford replied to Annie

Hi Annie, You are right. Deleting your browsing history should also reset link colors, which will prevent history stealing. Of course, unless you delete your browsing history after every single page you visit, this technique will still likely be able to determine a great deal about your browsing history. I have added a note to the article flagging up this point. Thanks.

bs-sshhhh

on May 10, 2015

I tried Random Agent Spoofer. It has a MAJOR problem; if you use Firefox, Cyberfox, Waterfox... most any mozilla based browser other than Pale Moon - you need to disable WebRTC tracking (about:config / media.peerconnection.enabled / false ) to prevent having your ip address tracked, even when using most reputable VPN services. Every time Random Agent Switcher is enabled, or auto-changes the browser agent, it re-enables WebRTC tracking. Something you should keep in mind if you wish to have any hope of some privacy on the web

Douglas Crawford replied to bs-sshhhh

on May 11, 2015

Hi bs-sshhhh, Thanks for the info! That is very good to know.