The CDN Trap: Hazards of a Multi-Origin Web

Back in 2016, when Europe's General Data Protection Regulation (GDPR) was first communicated to businesses, a warning shot echoed across the realm of online tracking. The forthcoming law came with the prospect of breathtaking fines, and it declared that tracking cookies could no longer be used as a default part of the open Web.

The tech industry came up with a notoriously annoying short-term solution that didn't actually comply, but looked, it was hoped, enough like an attempt to comply to keep the regulators at bay. Namely, the cookie consent popup. This hare-brained scheme was and is non-compliant for a range of reasons - not least because it needs to set a cookie on the devices of users who refuse cookies, in order to recognise that they've refused. Pure farce.

GDPR itself gave no advice on how to achieve cookie compliance. It just said explicit, per-person consent was essential before any cookies could be used. That proved so unrealistic a demand that the UK regulator itself did not even manage to comply. The entire concept of consent popups came from a tech industry desperate to keep tracking cookies on life support, and no authority actually accepted that the system was legal. The popup banners and option mechanisms were, nevertheless, widely adopted after the new law went live in May 2018.

Knowing that this tenuous-bordering-on-ridiculous solution could go tits-up at any time, elite cybertech has been working tirelessly on a Plan B ever since: a raft of backup schemes that will kick in and allow the preds to reliably 'cookie' the public in the event that European regulators finally decide to bother enforcing the law.

And the quest to develop these alternative tracking methods has intensified further due to the awareness-raising effect of cookie consent popups. Over time, more and more people have realised they can blanket-block all cookies - especially third-party cookies, which can be disabled outright in the browser with only minor adverse consequences. Even if blocking third-party cookies creates too much disruption, users can still set their browser to clear cookies automatically at the end of each session. This renders cookies in themselves close to useless as a tracking mechanism.

LADIES AND GENTLEMEN - PLEASE WELCOME TO THE STAGE, STRONG-ARM SURVEILLANCE!

So what is Plan B, exactly? Well, the basic strategy has two main tenets...

Distribute the technology across a range of schemes so that if a new law bans one scheme, others are not necessarily affected.
Make the technology indispensable so it becomes impossible to ban.

This is a new form of surveillance. Strong-arm surveillance. A system of online tracking and behaviour-monitoring whose gameplan is to achieve what third-party cookies never could, and become totally unavoidable.

How to do that? In true Dr. Evil fashion... Take over the actual display of all Web pages. Become a source of those pages. I mean, visitors want that content, right? That's why they're there. So if a tech provider delivers the content, visitors will no longer want to block that tech provider. Inevitably, this has triggered a page-dependency gold-rush among elite cybertech firms. Dependencies such as remote fonts, widgets and script-loading tools like Google Tag Manager rank among the many options in the strong-arm surveillance toolkit. Anything that breaks the page when it can't load. But the ground almost every former behemoth in the third-party cookie stakes has been chasing over the past five years is the Content Delivery Network - the CDN.

CDN WARS

Yahoo/Verizon/Apollo has Edgecast, Amazon has Cloudfront, Facebook has fbCDN, Microsoft has Azure CDN, Google has Google Cloud CDN, and so on. And they are joined by the subterranean providers whose brand names are still unfamiliar to the average person in the street. Cloudflare, Fastly, jsDelivr, etc.

CDNs are a very old idea, and originally, their main goal really was to make ye olde Interwebz a better place. Akamai did the early groundwork, offering the first commercial content delivery network.

But it was Cloudflare (then styled as CloudFlare) who began the era of CDN-based surveillance, launching a free, simple-to-use, almost consumer-orientated CDN in 2010. The product came out of beta late that September, but had already been in use on publicly accessible sites through the summer. The intention behind this new breed of CDN, from the start, was for Cloudflare to get its claws into as many e-commerce sites as possible, implement its now-renowned man-in-the-middle gatekeeping racket as a necessary part of the CDN offer, gain executive control of the commercial Web, and forcibly collect an arterial flow of highly exploitable data in the process.

Pre-GDPR, and outside of Cloudflare, which had a broader agenda than the ad companies, the economics of the CDN remained fairly pedestrian. But post-GDPR, and especially now that Google has vowed to deprecate third-party cookies in Chrome, CDN providers are offering the biggest sites much more attractive propositions. Indeed, according to this survey, pricing dropped by around 70% between 2017 and 2020, and some providers were offering the services at a loss. If that doesn't tell you that the CDN is the new third-party cookie, I don't know what will.

WEBMASTER BEWARE

But this is good for site owners, right? If you run a website, an ever-helpful band of megatech providers now offer to reliably push various elements of content onto your pages - either super-cheap or gratis. You get a logistical turbocharge at negligible expense or free as in beer. I mean, what's not to like?

Well, we can start with the inconvenient fact that when you allow tech monopolists to collect visitor data from your site (which is the real role of CDNs in the 2020s), they can ultimately use the data they collect to put you out of business. And that's exactly what they will do. Big Tech sucks up information from small sites, and uses it to destroy them. That's how monopoly works.

How many of the sites you now find in the top web search results are anything other than household names, based in Silicon Valley and essentially mouthpieces for Big Tech?...

Wikipedia - Based in Silicon Valley, cash-gagged by Google, spectacularly anticompetitive.
YouTube - Owned by Google.
eBay - Duh.
Amazon - Duh.
Google Places - Duh.
Quora - Based in Silicon Valley, login deals with Google and Facebook.
IMDb - Owned by Amazon.
Medium - Based in Silicon Valley, funded by Google. Login deals with Google, Facebook and Twitter.
TechCrunch - Based in Silicon Valley, owned by Yahoo.
Techdirt - Based in Silicon Valley. Another cash-gagged Google mouthpiece.
GitHub - Based in Silicon Valley, owned by Microsoft.
Flickr - Based in Silicon Valley, part of Google's anti-copyright cartel and sees creators as mugs.

This is not an accident. It's monopoly at work. Silicon Valley does not even see a world outside Silicon Valley. Here's an absolutely classic story in which a Silicon Valley company decides it "wAnTs To ReWaRd CoNtEnT cReAtOrS". Here's the rationale, as expressed by the company - search startup Neeva, notably run by Google dropouts...

"When creators aren’t rewarded for creating great content, they are not motivated to create it, and we all suffer..."

Quite. So how does Neeva go about rewarding creators for creating great content? Wait for it... You know what's coming... By giving money to Quora and Medium. Hey, Neeva. QUORA AND MEDIUM ARE NOT CONTENT CREATORS. THEY'RE BILLIONAIRE-ASS SILICON VALLEY TECH BROS. Jeez!

So look, even when Silicon Valley thinks it's helping the little people, it is in fact just bunging more cash at its next-door neighbours, whilst giving them the usual automatic priority in reach and visibility. How about, maybe, I dunno... JUST MATCH PEOPLE'S SEARCH QUERIES WITH THE MOST RELEVANT POST ON THE WEB?... No. They don't wanna do that. They wanna do what every other Silicon Valley search engine does, and run a shop window for their mates. This is how Silicon Valley elitism works, and how it will always work. Anything you allow a Silicon Valley enterprise to harvest about your site visitors will be used to further the interests of Silicon Valley, at your expense.

BROKEN DREAMS

It gets worse. Multi-origin web pages are a rabbit-hole. You start with a remote font, or two. Then you farm out your image loads to a CDN provider. Then... why not take a bit of strain off the script delivery?... Before you know it, your site's pages have become extensively multi-origin, and that can cause all sorts of unforeseen problems. For example...

Can all countries of the world even receive every source on the page? The Web has become increasingly political, and some countries block some services. Increase the number of sources required to display your pages, and - especially if the sources are American tech giants - you increase the likelihood of your pages being broken or outright blocked in various parts of the world. This could even start to happen widely across Europe.
If a CDN provider is running their service at a loss in order to access data, what happens when they no longer need that data access? The price rockets up. Then you, the site admin, either endure the nightmare of migrating to another provider, or absorb a higher cost of running your site.
Some CDN provisions (e.g. Cloudflare) double as "bot protection services". Essentially, the CDN serves as a loss-leader for the real 'product', which is a total diversion of all your site traffic via the provider's servers. This scheme will not only extract more data from your traffic than you do - it will also lose you human visitors. Especially in genres such as privacy, security and alt tech, where the incidence of content/script blocking is high, and a substantial proportion of visitors use Tor. And when "bot protection services" block real people from accessing your site, they'll either insult them by calling them "bots", or blame YOU, claiming that your site doesn't work.
Many of the companies contesting the CDN market are advertisers, locked in a battle for public attention. That attention race is by nature monopolistic. People can't give their attention to two advertisers at once. So the various powers are now starting to block each other's access to users and their data. In this scenario, when websites are heavily multi-origin, the likelihood of broken pages rises. Already, broken sites have ceased to even raise an eyebrow. And looking ahead, we'll see this problem worsen markedly, as rising forces like Brave - whose business model revolves around blocking rival ad-tech - gain more clout in the landscape and become bolder in their blocking regimes. The more third-party dependencies your Web pages have, the more risk there is of a rival ad-tech firm blocking something that breaks your site.
Google has yet to play its hand on an official third-party cookie replacement. And if we know Google, the final mechanism will be grossly anticompetitive. Will it accommodate massively multi-origin pages that serve Google's rivals? Why should it? And if it doesn't, it won't be the first time Google has told 80% of site admins to piss off and spend the next month rewiring their sites.
Last but not least, there's also the prospect of more aggressive blocking policies from a growing number of individual privacy obsessives - as the surveillance industry rolls out full-on Nineteen Eighty-Four. After Google deprecates Manifest V2, which supports content-blocking browser extensions, it's only a matter of time before the classic content blocker uBlock Origin dies a death. In Google's Internet fairytale, privacy vigilantes will give up at that point. But in the real world they won't. They'll start chucking hardcore block lists into their hosts files, and flat-out firewalling the most noxious preds. This crude, DIY alternative will not have the sophistication and discretion of browser extensions, and that means a HUGE volume of heavily multi-origin Web pages will break.

And if your site does break for any of the above reasons, visitors may use a browser-based reading tool such as Firefox Reader View to tidy up the mess. This may in turn strip out even more page content - even when it's not technically blocked.

PLAN AHEAD

Now is the time to start shifting back towards a single-origin Web. Privacy matters a lot more to the public than most site admins think it does, and public understanding of privacy strategies is growing quickly. Today, I'm seeing advice that in 2018 would have been strictly specialist ground being tweeted out by general tech influencers and getting substantial engagement. People are waking up. How many years will it take them to latch onto dependency-based tracking, and begin selectively deciding which elements of your pages they can do without? Do they really need the widgets? The fonts? Do they even need the pictures? And if the answer is no, how will that affect the prosperity of your site, versus other, single-origin sites, that display perfectly without the need for all those blocked dependencies?

Multi-origin page assembly is out of control. There are some Web pages that depend on so many different sources that even now, the odds of them displaying properly are genuinely remote. It's insane. I use very strict content-blocking, which allows first-party loads only. And to me, half the Web looks like something Stallman posted in 1993, whilst simultaneously trying to move some furniture, drunk. That's how out-of-control multi-sourcing has run. They're not even loading the basic CSS from their own domains. Some of them might as well just change the name on their WHOIS ownership record to Big Tech, because that is who owns the site.
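To get a rough idea of how multi-origin one of your own pages has become, you can list the origins its HTML asks the browser to fetch from. The sketch below is an illustrative assumption, not a tool this article relies on. It uses Python's standard html.parser, looks only at a handful of tags (script, img, iframe, audio/video/source, embed, object, and link elements whose rel actually triggers a fetch), and ignores srcset, CSS url()/@import and anything scripts request at runtime, so it will under-count. The names third_party_origins, origin_of and the example URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Tag -> attributes that make the browser fetch something. Illustrative only:
# srcset, CSS url()/@import and script-initiated requests are NOT covered.
LOAD_ATTRS = {
    "script": ("src",),
    "img": ("src",),
    "iframe": ("src",),
    "audio": ("src",),
    "video": ("src", "poster"),
    "source": ("src",),
    "embed": ("src",),
    "object": ("data",),
    "link": ("href",),  # only counted when rel is one of FETCHING_RELS
}
# <link> rel values that cause a fetch; canonical/alternate links do not.
FETCHING_RELS = {"stylesheet", "icon", "preload", "modulepreload"}


def origin_of(url):
    """Return 'scheme://host:port', filling in the default port for http/https."""
    parts = urlsplit(url)
    port = parts.port or {"http": 80, "https": 443}.get(parts.scheme)
    return f"{parts.scheme}://{parts.hostname}:{port}"


class OriginAudit(HTMLParser):
    """Collects the origin of every subresource found in the parsed HTML."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.origins = set()

    def handle_starttag(self, tag, attrs):
        wanted = LOAD_ATTRS.get(tag)
        if not wanted:
            return
        attrs = dict(attrs)
        if tag == "link":
            rels = set((attrs.get("rel") or "").lower().split())
            if not rels & FETCHING_RELS:
                return
        for name in wanted:
            value = attrs.get(name)
            if not value:
                continue
            # Resolve relative and protocol-relative URLs against the page.
            absolute = urljoin(self.page_url, value.strip())
            if urlsplit(absolute).scheme in ("http", "https"):
                self.origins.add(origin_of(absolute))


def third_party_origins(html, page_url):
    """Sorted origins, other than the page's own, that this HTML loads from."""
    audit = OriginAudit(page_url)
    audit.feed(html)
    audit.close()
    return sorted(audit.origins - {origin_of(page_url)})


if __name__ == "__main__":
    page = """
    <link rel="stylesheet" href="/style.css">
    <link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=Roboto">
    <link rel="canonical" href="https://elsewhere.example/post">
    <script src="https://www.googletagmanager.com/gtm.js"></script>
    <img src="https://cdn.example.net/hero.jpg"><img src="logo.png">
    """
    print(third_party_origins(page, "https://mysite.example/post"))
    # ['https://cdn.example.net:443', 'https://fonts.googleapis.com:443',
    #  'https://www.googletagmanager.com:443']
```

Every origin this returns is a dependency that a strict first-party-only blocker, like the one described above, would refuse to load.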

If you care about presenting a professional image to your site visitors, aim to move back to single-origin pages, and if the load is too heavy, consider making it inherently lighter rather than farming out the heavy lifting to a bunch of data-mining yobs who want you to pay for their stalking campaign. Ten years down the line, I think you'll be glad you did.
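Once your pages are single-origin, one way to keep them that way (an assumption about your setup, not something this article prescribes) is to send a Content-Security-Policy header of default-src 'self'. That tells the browser to refuse scripts, styles, images, fonts, frames and fetches from any origin other than your own. The sketch below assumes a Python site served through WSGI; single_origin_csp and demo_app are hypothetical names. Be aware that this policy also blocks inline script and style blocks unless you loosen it, so test before deploying.

```python
# Hypothetical middleware: adds a same-origin-only Content-Security-Policy
# header to every response from a WSGI application.
SINGLE_ORIGIN_CSP = "default-src 'self'"


def single_origin_csp(app):
    """Wrap a WSGI app so each response tells the browser to load from 'self' only."""
    def wrapped(environ, start_response):
        def start_with_csp(status, headers, exc_info=None):
            # Drop any existing CSP header so ours is the only one sent.
            headers = [(k, v) for k, v in headers
                       if k.lower() != "content-security-policy"]
            headers.append(("Content-Security-Policy", SINGLE_ORIGIN_CSP))
            return start_response(status, headers, exc_info)
        return app(environ, start_with_csp)
    return wrapped


def demo_app(environ, start_response):
    """Hypothetical stand-in for your real site."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<link rel='stylesheet' href='/style.css'><p>Hello</p>"]
```

To try it locally, serve the wrapped app with wsgiref.simple_server.make_server("127.0.0.1", 8000, single_origin_csp(demo_app)).serve_forever(); the browser's developer console will report any third-party load the policy blocks.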