Why the Subscription Model is not the Solution to Dysfunctional Websearch

How do you run a search engine that disclaims its own fitness for purpose, on Google servers, drawing 'Bingle' results, and somehow convince a paying userbase that they're not wasting their money? Ask Kagi…

We've recently entered the first ever period in which the progress of the World Wide Web is widely perceived to have dropped into retrograde. Yep, it's now obvious to more than just a fringe of tinfoil hatters that cybertech is going backwards, and no one really has a solution.

Search engines, it's roundly believed, are deliberately piss-pumping their results for profit. In a recent post on Wired, the former DuckDuckGo exec Megan Gray - also a seasoned lawyer attending the Google antitrust proceedings - suggested that Google was background-switching search terms for similar but more profitable alternatives.

The post [archived version] was to a large extent speculative, and has since been removed by Wired.

But Google's denial comprised the usual amalgam of meaninglessly vague statements and self-promotional links. And let's face it, they're hardly gonna admit to fraud - which is the offence of which they'd be guilty if they were trousering fees from companies who advertised against non-existent searches for each other's brand names.

Wired, meanwhile, have only said that "the story does not meet our editorial standards" - not that it was untrue.

But most notably, all of the people I saw sharing the story in the Fediverse were totally convinced by it. Whether or not Google is background-switching search queries, if the Fediverse could remark, in unison, "Ah, so THAT'S why it no longer finds what I ask for", there's clearly been a seismic decline in the quality of search.

"THE SERVICES ARE PROVIDED 'AS IS' WITH ALL FAULTS. TO THE EXTENT PERMITTED BY LAW, KAGI AND THE INDEMNIFIED PARTIES HEREBY DISCLAIM ALL WARRANTIES, WHETHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES THAT THE SERVICES ARE FREE OF DEFECTS, MERCHANTABLE, FIT FOR A PARTICULAR PURPOSE, AND NON-INFRINGING." - Kagi's legal paints a very different picture from its marketing.

NEW OPPORTUNITIES

This very real decline has provided a soapbox for a new wave of companies whose goal is to turn websearch into a recurring debit. These slick-talkin' search operations began to hatch from the almost inevitable home base of Silicon Valley around the end of the twenty-tens - launching to coincide precisely with Google's full-scale dive into blatant dysfunction.

The initial front-runner - Neeva - has already performed the classic Silicon Valley startup trick of selling itself (and no doubt the vault of data it pretended it didn't collect) to a self-confessed data-mining company.

MARKETING MISDIRECTION

Subscription search's copiously shilled mantra claims that advertising has ruined websearch, and that without advertising, websearch once again becomes useful.

It's a convincing pitch, because we can see real-life examples of advertising companies whose search quality has gone so far down the lav that even a noted lawyer believes they're flat-out subbing in fake queries. But we forget, as we read subscription search's emotive marketing, that so far, the primary sources for subscription-based search results are the same advertising companies they're telling us we should avoid.

The only quality third-party source Kagi has is Wikipedia, and you can search that directly, with no ads, for free.

This is only part of a highly disingenuous marketing regime that subscription search uses to gloss over its fundamental flaws.

It claims to be "private". In truth it's anything but. Kagi - a premium search service whose terms specifically disclaim its fitness for purpose - runs on Google servers and sends search queries to Google's central brain. But regardless of who's getting their grubby hands on your data, imprisoning yourself in an aggressively logged enclosure, with verified identity, is the polar opposite of privacy. What part of "NEVER LOG INTO A SEARCH ENGINE" are we failing to grasp?

Subscription search also claims to provide better results. But it's predominantly just the same junk in a different order. Most of it is coming from the same sources. How could it not be the same junk?

Whilst it's arguably true that advertising has corrupted websearch, the greater corruptive influences lie outside of the search engine's control. First and foremost, capitalism, which drives an unthinkable volume of spam. More than 99% of the Web has an agenda beyond simply giving you information. That makes it biased. And subscription search prioritises biased content just like 'Bingle' does. Don't be surprised to see advertorial boards like TechRadar coming up as top results on Kagi. A revolution in integrity this is definitely not.

DON'T WATCH THAT, WATCH THIS! THIS IS THE HEAVY HEAVY RESEARCH PAPER!

Some of the least biased information on the Web comes in the form of student and scholarly research papers. They're one of the only forms of content whose sole motivation, typically at least, is accuracy. Broadly, if students write inaccurate papers they don't get their qualifications. So they write accurate papers.

The bar on search is now so low that subscription-based options don't have to be good. They only have to be marginally better than a wall of spam, and many of their users can't even conclusively decide if they meet that brief.

But how often do you see research papers coming up in your search results? Regardless of your search provider, it virtually never happens. And the reason behind that brings us to our next corruptive influence: user retention.

Research papers are usually tiring to read, and rich in what the public consider to be "spurious waffle". Reams of attributions, citations and method. So blogs skim the research papers for information, and present it in a much more digestible format, which the public likes. But blogs also spin the research to appease their funders. So you'll only get one side of the story, or cherry-picked lines, quoted out of context. The accuracy is gone, and a bias is introduced.

Search engines then rank the blogs, because they know the public won't like being presented with heavy research papers. The way the search engine itself is monetised doesn't come into it. You're gonna see compromised results because most users will not tolerate the format of the accurate results.

There are many other parallels, in which search engines sacrifice content integrity for user experience.

ALL FOR ONE - NOT ONE FOR ALL

We've become so reliant on the convenience of one search engine which acts as a personal assistant that we've allowed the convenience to subordinate the usefulness of the machine.

We're now at a point where individual sites' search routines can provide much better results than any general websearch engine. For example, if you want high-integrity research, ResearchGate's search engine affords access to well over a hundred million papers. Is downloading and trawling through PDFs as easy as dolloping a query into Kagi and then reading whatever [insert brand here] paid TechRadar to publish about it? No. But if you care about the truth, you'll end up wasting a lot more time trying to find it via general websearch than if you go to sources of truth.

And it's not just scholars who deal in truth. Communities deal in truth too. Or at least, they intend to. Going directly to community platforms like Hacker News, Mastodon, or smaller forums, can bypass websearch's single point of failure.

True, people within communities may be brainwashed by the falsehoods pushed at them by websearch, but the difference is that they can be, and will be, contradicted by peers. Even when companies start astroturfing, there will still be disagreement from genuinely impartial members of the public. So you get to see different sides of the story. With a search engine, there is no contradictor, and we're only just starting to see how far that absence of dissenting voices can be pushed in the name of exploitation…

CENTRAL CONTROL

Websearch's decision-making is totally centralised. We don't see any conversation or debate. We only see "the results". And just one party gets to prioritise them, in secret. It's a black box. AI bots make this problem even worse. In the context of search, AI assistants are black boxes within a black box. The more convenient a means of finding things, the more susceptible its output is to corruption.

In a bid to illustrate the above problem, I tried playing devil's advocate with DuckDuckGo. After reading an article containing the phrase "don't log into search engines", I searched the phrase on DuckDuckGo - using quotes - and I got no results. No results at all. Then I searched another line from the same article, on the same search engine. The article magically appeared in the second search. That encapsulates the current state of websearch. Blatant censorship of "the wrong kind of query".

And that has nothing to do with advertising. Censorship is a property of any centralised source with vested interests. The solution to it is to 'travel'. To cross-reference. To query many sources. Not to pay a new single source to encage you in a "personalised" bubble where censorship becomes even easier.

Spreading the load of discovery across a wide range of original sources is more labour-intensive, but it results in a much better grip on reality.

PARTYING LIKE IT'S 1999

Another resort I've found useful in the age of websearch dysfunction is a return to 1990s-style discovery. That is, selecting a twenty-odd-year-old page on the Internet Archive, and then clicking links. In the 'nineties and early two-thousands, people linked out to third parties in a way they no longer do in this age of eternal transaction.

Shock horror, when used in good faith the hyperlink does the job it was originally intended to do, and lets you navigate the archived Web without a search engine. Within an archive, the method is only of use for historical study - it's no good for contemporary fact-finding. But it's a great illustration of how a return to outlinking in today's world would help us heavily reduce our dependency on websearch.

As writers, let's pledge to link to a few genuinely interesting, useful (and ideally non-commercial) things in each post. Not as a transaction with other writers - transactional linking is yet another of websearch's polluting factors - but as a service to the people who read our work. Let's make ourselves part of a collective, community search engine, which is fully decentralised.

Ultimately, with paid search you're LOGGED IN. You can't not be logged in. They know it's you. Every search. That's not privacy. It's the opposite of privacy. And if you believe a Surveillance Valley startup when it says your data isn't being collected and exploited, you must have spent the past two decades asleep. By and large, the results have no more integrity than those you'll find on other websearch engines - and many will be the exact same sites.

So pay a sub to a search engine if you wish, but when the boss decides to sell you off to a cyber giant (and Kagi's Vladimir Prelovac already has form in this department), leaving you no wiser than you would have been with Google or Bing, don't say I didn't warn you.

There's no shortcut to high quality information. You get what you work for. If you're not working to find your facts, what you're getting is almost certainly a low-grade substitute.