Thread

Here's a left-side-of-the-bell-curve way to do the Internet Archive "right":

- Create browser extension
- User loads page
- User clicks "archive" button
- Whatever is in the user's browser gets signed & published to relays
- Archival event contains URL, timestamp, etc.
- Do OpenTimestamps attestation via NIP-03
- ???
- Profit

I'm sure there are a hundred details I'm glossing over, but because this is user-driven and does all the archiving "on the edge", it would just work, not only in theory but very much so in practice. The reason the Internet Archive can be blocked is that it is a central thing: when users make an archival request they don't do the archiving themselves, they send the request to a central server that does the archiving. And that central server can be blocked.
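To make the flow concrete, here is a rough sketch of what the "archive" button could do, assuming a NIP-07 signer (window.nostr) is injected into the page. The kind number is a placeholder I made up, not a published spec; a real extension would follow up with the separate NIP-03 (kind 1040) attestation of the signed event's id.

// Rough sketch of the "archive" button handler, assuming a NIP-07 signer
// (window.nostr). Kind 1234 is a made-up placeholder; the "u" tag,
// created_at, and content correspond to the list above.

type UnsignedEvent = {
  kind: number;
  created_at: number;
  tags: string[][];
  content: string;
};

type SignedEvent = UnsignedEvent & { id: string; pubkey: string; sig: string };

// NIP-07 exposes window.nostr.signEvent in pages where a signer extension runs
const nostr = (window as unknown as {
  nostr: { signEvent(e: UnsignedEvent): Promise<SignedEvent> };
}).nostr;

async function archiveCurrentPage(relayUrl = "wss://relay.example.com") {
  const unsigned: UnsignedEvent = {
    kind: 1234, // hypothetical "web archive" kind
    created_at: Math.floor(Date.now() / 1000), // the timestamp
    tags: [["u", location.href]], // the URL
    content: document.documentElement.outerHTML, // whatever is in the user's browser
  };

  const signed = await nostr.signEvent(unsigned);

  // publish to a relay; a real extension would use a whole relay pool
  const ws = new WebSocket(relayUrl);
  ws.onopen = () => ws.send(JSON.stringify(["EVENT", signed]));
}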

Replies (73)

I hate to bring the bad news, but without some added mechanism this would only work for truly static pages (i.e. those where the same .html file is served over and over again), whereas most of the content served is tainted with server-generated "fluff", which could range from some ads on the sidebar to the very text of an article being changed. My point is: even if we both visit the same "page" at the same time, it's more likely than not that we'll get different versions served, even if they differ only by a few lines of code and would look identical as far as the actual content is concerned.
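You can see this for yourself with a throwaway sketch: fetch the same URL twice and compare content hashes. On most dynamic sites the digests differ even though the visible article is "the same" (CORS permitting, if you run it in a browser).

// hash a string with SHA-256 and return it as hex
async function sha256Hex(text: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(text));
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

async function compareFetches(url: string) {
  const [a, b] = await Promise.all([fetch(url), fetch(url)]);
  const [ha, hb] = await Promise.all([a.text().then(sha256Hex), b.text().then(sha256Hex)]);
  console.log(ha === hb ? "byte-identical responses" : "same URL, different HTML");
}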
Consider using kind 31. We put some thought into the tags, to meet academic citation standards.

{
  "kind": 31,
  "pubkey": "<citation-writer-pubkey>",
  "tags": [
    // mandatory tags
    ["u", "<URL where citation was accessed>"],
    ["accessed_on", "<date-time in ISO 8601 format>"],
    ["title", "<title to display for citation>"],
    ["author", "<author to display for citation>"],
    // additional, optional tags
    ["published_on", "<date-time in ISO 8601 format>"],
    ["published_by", "<who published the citation>"],
    ["version", "<version or edition of the publication>"],
    ["location", "<where it was written or published>"],
    ["g", "<geohash of the precise location>"],
    ["open_timestamp", "<`e` tag of kind 1040 event>"],
    ["summary", "<short explanation of which topics the citation covers>"]
  ],
  "content": "<text cited>"
}
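For completeness, the "open_timestamp" tag above points at a NIP-03 attestation. If I'm reading NIP-03 correctly, that is a kind 1040 event shaped roughly like this (placeholders throughout):

// rough shape of the NIP-03 OpenTimestamps attestation event
const otsAttestation = {
  kind: 1040,
  tags: [
    ["e", "<id of the kind 31 citation event>"], // the event being attested
    ["alt", "opentimestamps attestation"],
  ],
  // base64-encoded .ots file proving the cited event's id existed
  // at (or before) the Bitcoin block it is anchored to
  content: "<base64 OTS proof>",
};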
It depends on the server. Some servers can require the user to be subscribed, but there is also an option to pay-per-request. I built an example server using it.
Whenever the HTML that’s rendered in your browser contains some personal information (e.g. an email, your legal name, whatever), it would be included in the archived page and signed by you. If you are not really, really careful about what the extension includes in the page, you could leak information that you don’t want to share. The same goes for stuff that might not even be visible to you. Imagine a newspaper that has a profile page modal for logged-in users. The modal is part of the HTML, but hidden via CSS until you open it. HTML scrapers would still include all the data that is part of the hidden modal, without it ever being visible on the user's visit.
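As a very rough illustration of the kind of scrubbing pass an extension would need before serializing anything: drop nodes the user never actually saw and clear form values. The marker attribute name is made up, and this is nowhere near sufficient on its own (e.g. personal data embedded in script tags would still leak).

function scrubForArchive(): string {
  const MARK = "data-archive-hidden"; // temporary marker attribute, illustrative name
  // mark everything the user cannot actually see right now
  for (const el of Array.from(document.querySelectorAll<HTMLElement>("body *"))) {
    const cs = getComputedStyle(el);
    if (cs.display === "none" || cs.visibility === "hidden") el.setAttribute(MARK, "");
  }
  // copy the DOM, then clean the live page back up
  const clone = document.documentElement.cloneNode(true) as HTMLElement;
  document.querySelectorAll(`[${MARK}]`).forEach((el) => el.removeAttribute(MARK));
  // drop hidden nodes and typed-in form values from the copy that gets archived
  clone.querySelectorAll(`[${MARK}]`).forEach((el) => el.remove());
  clone.querySelectorAll("input").forEach((el) => el.removeAttribute("value"));
  clone.querySelectorAll("textarea").forEach((el) => (el.textContent = ""));
  return clone.outerHTML;
}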
i would have for sure read that years ago, but a great reminder, ty gigi. i knew they were trying to use this on music but i didn't realise they were embedding gotcha code so they can police how the user uses their computer, and heaven forbid, copies something. fucking hilarious. hard to imagine why they're dying such a quick death 😂 they're suiciding themselves, making their product shit all because they can't come to terms with the characteristics of water. i guess we should thank them
That's a good thought. I have an extension I'm working on that bridges the web over to nostr, allowing users to create discussions anywhere on the web using nostr. It seems like an archive function would be a solid addition. If I can get the universal grill box idea solid, I will work on the archival concept as well.
I've been casually vibe coding this since Wednesday. I think it's quite a powerful idea. I have zero experience with making an extension, but it's the first time AI called a project 'seriously impressive' when I threw Gigi's idea in there. So far I have come up with a few additional features, but the spec would be this at a minimum:

- OTS via NIP-03
- Blossom for media
- 3 different archiving modes:
  - Forensic Mode: clean server fetch, zero browser involvement = no tampering
  - Verified Mode: dual capture (server + local) + automatic comparison = manipulation detection
  - Personal Mode: exact browser view including logged-in content = your evidence

Still debugging Blossom integration, and NIP-07 signing from within extensions seems tricky. The only caveat is that you would need a proxy to run the verified + forensic modes, as CORS will block the requests otherwise; not sure how that would be handled other than hosting a proxy (a minimal sketch of one is below). Once I have a somewhat working version I may just throw all the source code out there, I dunno. Some test archives I've done on a burner account are browsable in this custom Nostr archive explorer here.
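For what it's worth, the proxy for the forensic/verified modes doesn't have to be much. A minimal illustration, not code from this project (the port and the ?url= parameter are arbitrary choices):

import { createServer } from "node:http";

// tiny pass-through fetcher that adds the CORS header the browser needs
createServer(async (req, res) => {
  const target = new URL(req.url ?? "/", "http://localhost").searchParams.get("url");
  if (!target) {
    res.writeHead(400).end("missing ?url=");
    return;
  }
  try {
    const upstream = await fetch(target); // clean fetch, no browser state involved
    const body = Buffer.from(await upstream.arrayBuffer());
    res.writeHead(upstream.status, {
      "content-type": upstream.headers.get("content-type") ?? "text/html",
      "access-control-allow-origin": "*", // what makes the extension's CORS check pass
    });
    res.end(body);
  } catch {
    res.writeHead(502).end("upstream fetch failed");
  }
}).listen(8787);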
I made this extension: https://github.com/fiatjaf/nostr-web-archiver/releases/tag/whatever, which is heavily modified from that other one. Damn, this "Lit" framework for making webgarbage is truly horrible, and this codebase is a mess worse than mine, but I'm glad they have the dirty parts of actually archiving the pages working pretty well. There is also a way to browse archives from others. Please someone test this. If I have to test it again myself I'll cry. I must wait some days now to see if Google approves this extension on their store; meanwhile you can install it manually from the link above.