The Wayback Machine: Why Big Tech Fears the Internet Archive
Brewster Kahle built a library to save the entire internet from disappearance. The work still matters, and the opposition proves it.
In 1996, Brewster Kahle made a simple observation: the internet is ephemeral. Web pages vanish. Services shut down. History evaporates. The medium that was supposed to be a permanent record was actually a massively distributed amnesia machine, where anything that was not actively maintained would disappear as surely as if it had never existed.
So he built a library.
The Internet Archive, founded in 1996, undertook one of the most ambitious archival projects in history: to preserve the entire web. Crawl every site, save every version, keep multiple copies in multiple locations, and make it all searchable and accessible to anyone, for free, forever. The Wayback Machine, launched in 2001, made this archive public. You can look up a web page and see what it looked like each time it was captured over the past thirty years.
This should be uncontroversial. A library preserving history for public benefit is one of civilization's oldest institutions. That the Internet Archive faces legal threats, cease-and-desist letters, and constant pressure to remove content proves how radical archival is in an age where the powerful do not want history to exist.
What Disappears, What Remains
The first thing you notice when you use the Wayback Machine is how much of the live web is actually gone. Visit a major news site's URL from 2005 and you will likely see nothing. The domain still exists, but the specific URLs are gone, rewritten, redirected, or deleted entirely. On the publisher's own site, the articles from that year are unreachable, not because they were never published, but because the publisher changed its URL structure and broke everything.
News organizations do not think in terms of permanent history. They think in terms of traffic and SEO. When Google changed its algorithm to favor fresh content, news sites responded by deprecating old articles, hiding them behind paywalls, or restructuring URLs to optimize for current engagement instead of archive value.
This is where the Wayback Machine becomes essential. An article that no longer exists on the publisher's site might still be findable on archive.org. A press release that was scrubbed from the internet when it became inconvenient can be retrieved. A corporate website from ten years ago that revealed business practices no longer acknowledged is still there, a permanent record that somebody wanted to hide.
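For readers who want to try this, here is a minimal sketch of how an archived copy can be addressed. It assumes the commonly documented Wayback Machine URL pattern (a 14-digit timestamp followed by the original address) and the public availability endpoint at archive.org. The function names and the example.com address are hypothetical, chosen only for illustration, and the code builds addresses without making any network request.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed public address patterns; check the Archive's own documentation before relying on them.
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def wayback_timestamp(moment: datetime) -> str:
    """Format a datetime as a 14-digit YYYYMMDDhhmmss stamp."""
    return moment.strftime("%Y%m%d%H%M%S")


def snapshot_url(original_url: str, moment: datetime) -> str:
    """Build an address asking for the capture closest to `moment`."""
    return f"{WAYBACK_PREFIX}/{wayback_timestamp(moment)}/{original_url}"


def availability_query(original_url: str, moment: datetime) -> str:
    """Build a query asking whether any capture exists near `moment`."""
    params = {"url": original_url, "timestamp": wayback_timestamp(moment)}
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical press release that has since been scrubbed from the live site.
    page = "http://example.com/press/2005-release.html"
    when = datetime(2005, 6, 1)
    print(snapshot_url(page, when))
    print(availability_query(page, when))
```

Opening the first address in a browser should, if a capture exists, lead to the snapshot nearest the requested date. Whether one exists depends entirely on whether the page was crawled at the time.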
The politicians understand this. The corporations understand this. The wealthy understand this. The Wayback Machine is a threat precisely because it operates on the principle that history should be accessible, that old information should not be erased simply because someone with power decided it should be.
The Legal Wars
The Internet Archive has faced legal challenges for much of its existence. Book publishers have gone to court over its lending of digitized books. Publishers have also repeatedly claimed that preserving their websites constitutes copyright infringement, despite the fact that the Archive is doing exactly what libraries have always done: preserving published works for future generations.
The most aggressive attacks come from the companies that have the most to lose from accessible history. The Wayback Machine has been used to reveal how technology companies changed their terms of service over time, what they claimed before they changed their claims, how they characterized their business models before they changed those models. It is a source of embarrassment for any company that wants to pretend that today's version of themselves is the only version that ever existed.
In 2020, the Internet Archive faced a major challenge when four major publishers sued over its lending of digitized books. In 2023, a federal district court ruled that the lending program infringed copyright, and an appeals court upheld that ruling in 2024. The decision did not stop the Archive, but it illustrated the fundamental conflict: copyright law was written in an era when physical scarcity was the limiting factor. Digital materials can be preserved at almost zero cost, yet copyright law treats preservation as a threat instead of a public good.
The Archive also receives a steady stream of cease-and-desist letters demanding that archived websites be removed. Requests that do not come from the site's owner are generally denied, and most companies never bother to ask. But the pressure is constant. Big Tech does not want you to see what it looked like before it became what it is now.
What Gets Preserved
The Wayback Machine has preserved billions of web pages. What makes this preservation meaningful is not its completeness but its accessibility. The Archive does not just save pages; it makes them searchable, puts them in context, and curates them. You can see a timeline of how a website evolved. You can compare snapshots from different years. You can follow a URL through transformations that reveal the shifting priorities of the people who maintained it.
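Comparing snapshots does not require special tooling. As a rough sketch, assuming you have already saved the text of two captures of the same page, Python's standard difflib module can show what changed between them. The function name, the labels, and the sample policy text below are all hypothetical and are not drawn from any real site.

```python
import difflib


def diff_snapshots(old_text: str, new_text: str,
                   old_label: str, new_label: str) -> list[str]:
    """Return unified-diff lines describing how one capture differs from another."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",  # inputs carry no trailing newlines, so keep headers bare too
    ))


if __name__ == "__main__":
    # Invented example text standing in for two saved captures of a policy page.
    snapshot_2010 = "We never sell your data.\nContact us by email."
    snapshot_2020 = "We may share your data with partners.\nContact us by email."
    for line in diff_snapshots(snapshot_2010, snapshot_2020,
                               "2010 snapshot", "2020 snapshot"):
        print(line)
```

Lines starting with a minus sign appear only in the older capture and lines starting with a plus sign only in the newer one, which is usually enough to spot a quietly rewritten promise.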
The Archive also preserves more than text. It saves images, styles, scripts, and structure. A website from 2003 is not just text; it is a complete artifact of how the web worked in 2003. The design choices, the bandwidth constraints, the aesthetic sensibilities, all of it is preserved as a historical record.
Beyond websites, the Archive preserves books, texts, audio, and video. The Open Library project has digitized millions of books. The Audio Archive has preserved decades of radio broadcasts, podcasts, and recorded performances. The Video Archive contains television news, historical footage, and cultural artifacts that would otherwise be lost to time.
This is not nostalgic preservation. This is essential infrastructure. When a journalist needs to verify a fact about what a politician said in 2010, the Archive might be the only source. When a researcher needs to understand how an industry changed, the Archive provides the timeline. When a whistleblower needs to prove that a company lied about its history, the Archive provides evidence.
The Censorship Problem
Governments also take an interest in the Wayback Machine, not to preserve it but to erase parts of it. The Internet Archive has been pressured to remove archived content by governments in Turkey, Russia, China, and others. The requests are often framed as removal of "misinformation" or "harmful content," which is another way of saying "stuff we do not like."
The Archive has largely refused these requests, defending the principle that the historical record should not be subjected to political whims. This makes the Archive a geopolitical actor, not by intention but by consequence. Preserving history in an age of information control is inherently political.
The Chinese government has been particularly aggressive. After the Hong Kong protests, the Archive was pressured to remove content related to the protests. When COVID emerged, governments wanted archived content about early pandemic statements removed. The Archive has become a battlefield in the larger war over what should be forgotten and what should remain.
This is why Brewster Kahle has always emphasized that the Archive needs redundancy. Information wants to be free, sure, but only if the infrastructure preserving it survives. The Archive maintains multiple copies on multiple servers in multiple countries specifically to ensure that no single actor can unilaterally delete history.
The Larger Principle
The Internet Archive operates on a principle that has become radical: that preserved cultural artifacts should belong to everyone, that history should not require permission, that the past should be accessible to anyone who cares to look.
This contradicts the assumption that built the internet as a commercial platform. In the attention economy, the past is overhead. If an old article contradicts today's narrative, it is better if that article is hard to find. If a website changed dramatically over time, it is better if visitors only see the current version. Deletability and revisability are features, not bugs.
The Wayback Machine insists otherwise. It says: the past matters. It is permanent. You cannot erase what happened by stopping publication. The record remains.
In an era of deepfakes, AI-generated content, and deliberate disinformation, this principle has become urgent. The only way to distinguish what actually happened from what someone claims happened is to have an accessible, tamper-resistant record. The Internet Archive is not a relic of web culture. It is essential infrastructure for truth.
Keeping It Alive
The Internet Archive is a nonprofit, and nonprofits are fragile. Brewster Kahle has funded it personally for years, but institutions this important cannot depend on the charity of individual billionaires. The Archive needs stable funding, institutional support, and protection from the legal and political attacks that will only intensify as the companies and governments it archives become more desperate to control their histories.
There is a reason to care. The Wayback Machine is not just a curiosity. It is a permanent record that the powerful cannot erase. It is a library that proves something different is possible. It is proof that we can build institutions designed for the public good instead of shareholder value.
The internet does not have to be ephemeral. The past does not have to disappear. Someone is saving it. They have been saving it for thirty years. The only question is whether we choose to protect that work or allow power to erase it.
The archive is there. The record is accessible. The history remains.