Here's a left-side-of-the-bell-curve way to do the Internet Archive "right":
- Create browser extension
- User loads page
- User clicks "archive" button
- Whatever is in user's browser gets signed & published to relays
- Archival event contains URL, timestamp, etc.
- Do OpenTimestamps attestation via NIP-03
- ???
- Profit
I'm sure there are 100 details I'm glossing over, but because this is user-driven and does all the archiving "on the edge" it would just work, not only in theory but very much so in practice.
The Internet Archive can be blocked because it is a central thing: when users make an archival request they don't do the archiving themselves, they send the request to a central server that does the archiving. And that central server can be blocked.
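For illustration, a minimal sketch (TypeScript, using nostr-tools) of what that "sign & publish to relays" step could look like. The kind number, the archived_at tag, and the relay URL are placeholders I made up, not part of any existing NIP:

import { finalizeEvent } from "nostr-tools/pure";
import { Relay } from "nostr-tools/relay";

// Sketch only: kind 9876 and the "archived_at" tag are hypothetical placeholders.
async function archiveCurrentPage(url: string, html: string, secretKey: Uint8Array) {
  const now = Math.floor(Date.now() / 1000);
  const event = finalizeEvent(
    {
      kind: 9876,                     // placeholder "web archive" kind
      created_at: now,
      tags: [
        ["u", url],                   // the page that was archived
        ["archived_at", String(now)], // when the capture happened
      ],
      content: html,                  // whatever was rendered in the user's browser
    },
    secretKey
  );

  const relay = await Relay.connect("wss://relay.example.com"); // placeholder relay
  await relay.publish(event);
  relay.close();
  return event; // its id can later be attested via NIP-03 / OpenTimestamps
}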
Replies (73)
I'm not saying this isn't an issue with Internet Archive, but would it be easy to spoof page contents?
If so, it would be neat to allow other npubs to sign and verify content is accurate
That is a very interesting suggestion. E.g. the hash of the archived page should match the hash of the original page.
Can be done implicitly. If my archived version is the same as your archived version I automatically vouch for your version.
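To make the implicit vouching concrete, a sketch of the comparison, assuming we simply hash the raw archived HTML with SHA-256 (the next reply explains why this naive version struggles with dynamic pages):

// Sketch: two archives of the same URL implicitly vouch for each other
// if their content hashes match. Uses the browser's Web Crypto API.
async function sha256Hex(text: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(text));
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

async function implicitlyVouches(myArchiveHtml: string, theirArchiveHtml: string): Promise<boolean> {
  // Identical hashes => my archive automatically vouches for theirs.
  return (await sha256Hex(myArchiveHtml)) === (await sha256Hex(theirArchiveHtml));
}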
I hate to bring bad news, but without some added mechanism this would only work for truly static pages (i.e. those where the same .html file is served over and over again), whereas most of the content served is tainted with server-generated "fluff", which could range from some ads in the sidebar to the very text of an article being changed.
My point is: even if we both visit the same "page" at the same time, it's more likely than not that we'll get different versions served, even if they differ only by a few lines of code and would look identical as far as the actual content is concerned.
Just need to nostrify (add social login) this:

Webrecorder
ArchiveWeb.page • Webrecorder
Archive websites as you browse with the ArchiveWeb.page Chrome extension or standalone desktop app.
Interesting! This extension will archive a complete website?
I would say a single page.
It records a session, so a whole website if you click everything
IMO recording the session is better than trying to crawl the whole site, since it captures exactly what you're interested in and doesn't get confused by the way websites are built nowadays
That's like the Playwright snapshots.
Consider using kind 31. We put some thought into the tags, to meet academic citation standards.
{
"kind": 31,
"pubkey": "<citation-writer-pubkey>",
"tags": [
// mandatory tags
["u", "<URL where citation was accessed>"]
["accessed_on", "<date-time in ISO 8601 format>"],
["title", "<title to display for citation>"],
["author", "<author to display for citation>"],
// additional, optional tags
["published_on", "<date-time in ISO 8601 format>"],
["published_by", "<who published the citation>"],
["version", "<version or edition of the publication>"],
["location", "<where was it written or published>"],
["g", "<geohash of the precise location>"],
["open_timestamp", "<`e` tag of kind 1040 event>"],
["summary", "<short explanation of which topics the citation covers>"],
],
"content": "<text cited>"
}
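For what it's worth, a sketch of how a client could build and sign such a kind 31 citation with nostr-tools, filling in only the mandatory tags from the proposal above:

import { finalizeEvent } from "nostr-tools/pure";

// Sketch: build and sign a kind 31 citation with the mandatory tags above.
function buildCitation(
  secretKey: Uint8Array,
  url: string,
  title: string,
  author: string,
  citedText: string
) {
  return finalizeEvent(
    {
      kind: 31,
      created_at: Math.floor(Date.now() / 1000),
      tags: [
        ["u", url],
        ["accessed_on", new Date().toISOString()],
        ["title", title],
        ["author", author],
      ],
      content: citedText,
    },
    secretKey
  );
}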
That's very much right-side-of-the-bell-curve from the looks of it.
If we wanna use something existing as the base and add some nostr magic to it I'd probably go with something like ArchiveBox
https://github.com/ArchiveBox/ArchiveBox
Yuge. Immediate fork, with upstream nostr PR, would be 🔥
Or better yet: https://github.com/gildas-lormeau/SingleFile - very simple, and is a browser extension already.
@fiatjaf @Terry Yiu can y'all combine this with a nostr browser extension?
Push output HTML file to blossom & create archive event accordingly.
SingleFile can even be set up to push things to an arbitrary API. So it should be possible to jerryrig this quite quickly with some spit and some duct tape.
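Roughly what the "push to Blossom" half could look like, assuming a BUD-02-style PUT /upload endpoint with a kind 24242 authorization event; server details vary, so double-check against the actual Blossom spec before relying on this:

import { finalizeEvent } from "nostr-tools/pure";

// Sketch: upload a SingleFile HTML snapshot to a Blossom server via a
// BUD-02-style PUT /upload with a kind 24242 authorization event.
// Treat this as duct tape, not a spec; servers may differ.
async function uploadToBlossom(serverUrl: string, html: string, secretKey: Uint8Array) {
  const blob = new TextEncoder().encode(html);
  const digest = await crypto.subtle.digest("SHA-256", blob);
  const sha256 = Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");

  const auth = finalizeEvent(
    {
      kind: 24242,
      created_at: Math.floor(Date.now() / 1000),
      tags: [
        ["t", "upload"],
        ["x", sha256],
        ["expiration", String(Math.floor(Date.now() / 1000) + 600)],
      ],
      content: "Upload archived page",
    },
    secretKey
  );

  const res = await fetch(`${serverUrl}/upload`, {
    method: "PUT",
    headers: { Authorization: `Nostr ${btoa(JSON.stringify(auth))}` },
    body: blob,
  });
  return res.json(); // blob descriptor (url, sha256, size, ...) if the server accepts it
}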
@hzrd149 does blossom have some micro-payment mechanism to allow payment for server costs of hosting a webpage?
Maybe per file/webpage?
duct tape is stupid and short-sighted, you need a decent standard
true
we need this
@utxo the webmaster 🧑‍💻
It depends on the server. Some servers can require the user to be subscribed. But there is also an option to pay-per-request. I built an example server using it

GitHub
GitHub - hzrd149/blob-drop: A blossom server that only stores blobs for a day
A blossom server that only stores blobs for a day. Contribute to hzrd149/blob-drop development by creating an account on GitHub.
Hmmm, we could nostrify it so that all the data & assets are stored on relays and blossom servers.
I have used Linkwarden (self-hosted) for a while for my bookmarks.

Linkwarden - Bookmarks, Evolved
Linkwarden helps you collect, read, annotate, and fully preserve what matters, all in one place.
Linkwarden looks great! What is a blossom server?
A place where media presented on the nostr protocol is stored, compressed so it can be opened by any client. When you upload a video or image via a nostr app you can choose a blossom server that stores it for you. It can be one that you run yourself forever; this way you own your data. You don't want to store everything on the relays, for many reasons...
Very interesting. Thank you. Do you have a good link for beginners on running their own relay?
Linkwarden has one big problem with cookie-consent dialogs. Your PDF will likely contain only that fcking popup.
wasn't the p2p YaCy search engine doing archival and browsing and spidering stuff in the old days?
web archiving is child's play.
what we want to do is decentralize web crawling data. nobody uses YaCy, which implies it failed.
what is the total size of the latest Common Crawl? (estimate)
Compressed size (gzip'ed WARC): 250–350 TB
solve this.
I'm sure you're not glossing over any details
academics might disagree but they're dead inside
I fear this would lead to soooo much personal information being doxxed by accident. I would never risk clicking such a button.
Maybe we can make sure the extension scans for PII first, but still…
?
Whenever the HTML that's rendered in your browser contains some personal information (e.g. an email, your legal name, whatever), it would be included in the archived page and signed by you. If you are not really, really careful about what the extension includes in the page, you could leak information that you don't want to share. The same goes for stuff that might not even be visible to you.
Imagine a newspaper that has a profile modal for logged-in users. The modal is part of the HTML, but hidden via CSS until you open it. HTML scrapers would still include all the data that is part of the hidden modal, without it ever being visible during the user's visit.
Oh! That's a great point. One of the aforementioned 100 things I didn't think about. Solvable issue, but still an issue.
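One partial mitigation, sketched here purely for illustration: scrub a copy of the DOM before anything gets signed, dropping elements the user never saw and flagging obvious PII. This would certainly not catch everything:

// Sketch: scrub a cloned DOM before archiving. Removes elements hidden via
// CSS (like the profile modal example above) and flags strings that look
// like email addresses. Illustrative only, far from exhaustive.
function scrubForArchive(doc: Document): { html: string; warnings: string[] } {
  const clone = doc.documentElement.cloneNode(true) as HTMLElement;

  // getComputedStyle only works against the live document, so check the live
  // nodes and remove the corresponding clones by index (same document order).
  const liveNodes = Array.from(doc.documentElement.querySelectorAll<HTMLElement>("*"));
  const cloneNodes = Array.from(clone.querySelectorAll<HTMLElement>("*"));
  liveNodes.forEach((el, i) => {
    const style = getComputedStyle(el);
    if (style.display === "none" || style.visibility === "hidden") {
      cloneNodes[i]?.remove(); // the user never saw this, don't archive it
    }
  });

  const html = clone.outerHTML;
  const warnings: string[] = [];
  if (/[\w.+-]+@[\w-]+\.[\w.]+/.test(html)) {
    warnings.push("Possible email address in page; review before publishing.");
  }
  return { html, warnings };
}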
User-driven archiving on #Nostr makes history harder to erase and impossible to fake.
The issue with user-driven archiving is that not all users have good intentions. One could try to upload malicious "copies" of a website, and would probably succeed.
This is why we have the web of trust
It's signed by the user and the reputation becomes king.
Sounds interesting.
To anyone complaining that things can be faked or spoofed etc.:
Yes, yes indeed, everything that is published unhashed and unsigned is fake and gay to begin with. Kill the web, all hail the new web.
Preach!
We absolutely need this
@Marty Bent is the web extension guy now. Get to work.
Interesting idea. Would it be enough to make a screenshot of the website, hash it and timestamp it?
DECENTRALIZE EVERYTHING.
This is an urgent case as we live in the last days of "truth". Everything is being manipulated and erased in REAL TIME. Decentralized Internet Archive!
Love the idea. LFG!
Read about it before. Crazy how digital archives are targeted too...but this here is outrageous!
There really need to be swift, vigilante-esque public hangings and pikings for the destruction of knowledge. The total control of access to knowledge is on the same demonic wish list as total surveillance.


Ars Technica
Anthropic destroyed millions of print books to build its AI models
Company hired Google's book-scanning chief to cut up and digitize "all the books in the world."
I'm very left side. Can you just get a bot to auto-archive everything?
"Whatever is in user's browser gets signed & published to relays ".
This is the problem. For paywalled content, how can we be sure that there is no beacon stored somewhere in the page (DOM, js, html) that identifies the subscriber?
Let's focus on regular content and cross that bridge if we get there.
The main issue is that the big services are centralized and the self-hosted stuff isn't syndicated.
are paywalling services doing that - and punishing the user for screenshotting etc?
hahahahahahahshahahaha
that's so crazy if so. wow
They're trying everything in their power to make water not wet. 

i would have for sure read that years ago, but a great reminder ty gigi
i knew they were trying to use this on music but i didn't realise they were embedding gotcha code so they can police how the user uses their computer and, heaven forbid, copies something. fucking hilarious.
hard to imagine why they're dying such a quick death
they're suiciding themselves, making their product shit all because they can't come to terms with the characteristics of water.
i guess we should thank them
talk about failing the ego test
How does the Archive do it?
I don't think the goal has ever been to make data impossible to copy. The goal is most likely to make copying certain data more difficult. DRM has done that, whether you like it or not. The industry wouldn't do it if it didn't work to some degree.
But I also hate DRM and how it works. Totally agree on that.
Futile action by a dying industry.
not a dying industry… the players will just change and the rules of engagement will evolve

That's a good thought. I have an extension I'm working on that bridges the web over to nostr, allowing users to create discussions anywhere on the web using nostr. It seems like an archive function would be a solid addition. If I can get the universal grill box idea solid I will work on the archival concept as well.
All the JavaScript getting ingested? Worried about the privacy part but very interesting.
Calling all vibe coders!
I've been casually vibe coding this since Wednesday. I think it's quite a powerful idea. I have zero experience with making an extension, but it's the first time AI called a project 'seriously impressive' when I threw Gigi's idea in there.
So far I have come up with a few additional features but the spec would be this at a minimum:
OTS via NIP-03
Blossom for media
3 different types of archiving modes:
Forensic Mode: Clean server fetch, zero browser involvement = no tampering
Verified Mode: Dual capture (server + local) + automatic comparison = manipulation detection
Personal Mode: Exact browser view including logged-in content = your evidence
Still debugging the Blossom integration, and NIP-07 signing from within an extension seems tricky. The only caveat is you would need a proxy to run verified + forensic modes, as CORS will block the requests otherwise. Not sure how that would be handled other than hosting a proxy. Once I have a somewhat working version I may just throw all the source code out there, I dunno.
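For the NIP-03 piece, the OpenTimestamps attestation is its own event pointing at the archive event. A sketch of how it could be built, assuming the kind 1040 layout with the base64-encoded .ots proof in the content; double-check against NIP-03 before shipping:

import { finalizeEvent } from "nostr-tools/pure";

// Sketch: wrap an OpenTimestamps proof of an archive event in a kind 1040
// attestation. How the .ots proof is obtained (stamping the archive event id)
// is out of scope here.
function buildOtsAttestation(secretKey: Uint8Array, archiveEventId: string, otsFileBytes: Uint8Array) {
  return finalizeEvent(
    {
      kind: 1040,
      created_at: Math.floor(Date.now() / 1000),
      tags: [
        ["e", archiveEventId],
        ["alt", "opentimestamps attestation"],
      ],
      content: btoa(String.fromCharCode(...Array.from(otsFileBytes))), // base64-encoded .ots file
    },
    secretKey
  );
}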
Some test archives I've done on a burner account using this custom Nostr archive explorer here.
Nostrie - Nostr Web Archive Explorer
You say this is left-side but there is nothing on the right-side of the curve since what you describe here is already at maximum complexity. And that archiver extension is a mess.
But sure, it's a good idea, so it must be done.
I made this extension: https://github.com/fiatjaf/nostr-web-archiver/releases/tag/whatever, which is heavily modified from that other one.
Damn, this "Lit" framework for making webgarbage is truly horrible, and this codebase is a mess worse than mine, but I'm glad they have the dirty parts of actually archiving the pages working pretty well.
Then there is for browsing archives from others.
Please someone test this. If I have to test it again myself I'll cry. I must wait some days now to see if Google approves this extension on their store, meanwhile you can install it manually from the link above.
websitestr
Please keep things uploaded to non-Google sites.
Google is sold, Google is finished, and should be heavily boycotted for what they're doing to us.
It works. I'm not sure how to view my own, but my Amber log shows what I think is all the right activities.
I'm not sure what the crying is about. This extension is more cooperative than the scrobbler one.
