InternetArchive.NET

MIT licensed, open source, and in support of the Experimental Television Center, a 501(c)(3) non-profit established in 1971.


InternetArchive.NET is a type-safe library providing access to all features of the Internet Archive.

The Internet Archive’s API evolved over nearly two decades, using a variety of technical conventions, each fashionable at a particular moment in time. XML? S3-like? REST-ish? JSON Patch-sorta? They’ve got it. Many APIs remain in beta. Documentation is sparse. No fewer than seven different date formats are in use.

And it’s a fantastic resource.

The data source itself is an evolving schema, implemented in PHP on the server and typically consumed by curl or a Python client.

Using scripting languages on both the client and the server is perhaps not the best approach to enforcing consistency across 700 billion documents.

That said, I watched Microsoft slog through years of overengineered distributed systems technology and sat through endless meetings arguing about namespaces and identifiers. I’m not surprised that informal solutions like “if that doesn’t work, try two underscores” scale just fine to nearly a trillion objects.

Still, I couldn’t get anything done with the toolset as provided.

I reverse-engineered the archive.org backend, found the expected quirks and security glitches, and, through careful engineering and custom retry logic, added the robustness and reliability needed to treat archive.org as a true platform.
