Here's a left-side-of-the-bell-curve way to do the Internet Archive "right":
- Create browser extension
- User loads page
- User clicks "archive" button
- Whatever is in user's browser gets signed & published to relays
- Archival event contains URL, timestamp, etc.
- Do OpenTimestamps attestation via NIP-03
- ???
- Profit
I'm sure there are 100 details I'm glossing over, but because this is user-driven and does all the archiving "on the edge" it would just work, not only in theory but very much so in practice.
The Internet Archive can be blocked because it is a central thing: when users make an archival request they don't do the archiving themselves, they send the request to a central server that does the archiving. And that central server can be blocked.
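For illustration, a minimal sketch (TypeScript, using nostr-tools) of what that "sign & publish to relays" step could look like. The kind number, the archived_at tag, and the relay URL are placeholders I made up, not part of any existing NIP:

import { finalizeEvent } from "nostr-tools/pure";
import { Relay } from "nostr-tools/relay";

// Sketch only: kind 9876 and the "archived_at" tag are hypothetical placeholders.
async function archiveCurrentPage(url: string, html: string, secretKey: Uint8Array) {
  const now = Math.floor(Date.now() / 1000);
  const event = finalizeEvent(
    {
      kind: 9876,                     // placeholder "web archive" kind
      created_at: now,
      tags: [
        ["u", url],                   // the page that was archived
        ["archived_at", String(now)], // when the capture happened
      ],
      content: html,                  // whatever was rendered in the user's browser
    },
    secretKey
  );

  const relay = await Relay.connect("wss://relay.example.com"); // placeholder relay
  await relay.publish(event);
  relay.close();
  return event; // its id can later be attested via NIP-03 / OpenTimestamps
}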
Replies (73)
I'm not saying this isn't an issue with Internet Archive, but would it be easy to spoof page contents?
If so, it would be neat to allow other npubs to sign and verify content is accurate
That is a very interesting suggestion. E.g. the hash of the archived page should match the hash of the original page.
Can be done implicitly. If my archived version is the same as your archived version I automatically vouch for your version.
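To make the implicit vouching concrete, a sketch of the comparison, assuming we simply hash the raw archived HTML with SHA-256 (the next reply explains why this naive version struggles with dynamic pages):

// Sketch: two archives of the same URL implicitly vouch for each other
// if their content hashes match. Uses the browser's Web Crypto API.
async function sha256Hex(text: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(text));
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

async function implicitlyVouches(myArchiveHtml: string, theirArchiveHtml: string): Promise<boolean> {
  // Identical hashes => my archive automatically vouches for theirs.
  return (await sha256Hex(myArchiveHtml)) === (await sha256Hex(theirArchiveHtml));
}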
I hate to bring bad news, but without some added mechanism this would only work for truly static pages (i.e. those where the same .html file is served over and over again), whereas most of the content served is tainted with server-generated "fluff", which could range from some ads in the sidebar to the very text of an article being changed.
My point is: even if we both visit the same "page" at the same time, it's more likely than not that we'll get different versions served, even if they differ only by a few lines of code and would look identical as far as the actual content is concerned.
Just need to nostrify (add social login) this:

Webrecorder
ArchiveWeb.page • Webrecorder
Archive websites as you browse with the ArchiveWeb.page Chrome extension or standalone desktop app.
Interesting! This extension will archive a complete website?
I would say a single page.
It records a session, so a whole website if you click everything
IMO recording the session is better than trying to crawl the whole site, since it captures exactly what you're interested in and doesn't get confused by the way websites are built nowadays
That's like the Playwright snapshots.
Consider using kind 31. We put some thought into the tags, to meet academic citation standards.
{
"kind": 31,
"pubkey": "<citation-writer-pubkey>",
"tags": [
// mandatory tags
["u", "<URL where citation was accessed>"]
["accessed_on", "<date-time in ISO 8601 format>"],
["title", "<title to display for citation>"],
["author", "<author to display for citation>"],
// additional, optional tags
["published_on", "<date-time in ISO 8601 format>"],
["published_by", "<who published the citation>"],
["version", "<version or edition of the publication>"],
["location", "<where was it written or published>"],
["g", "<geohash of the precise location>"],
["open_timestamp", "<`e` tag of kind 1040 event>"],
["summary", "<short explanation of which topics the citation covers>"],
],
"content": "<text cited>"
}
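For what it's worth, a sketch of how a client could build and sign such a kind 31 citation with nostr-tools, filling in only the mandatory tags from the proposal above:

import { finalizeEvent } from "nostr-tools/pure";

// Sketch: build and sign a kind 31 citation with the mandatory tags above.
function buildCitation(
  secretKey: Uint8Array,
  url: string,
  title: string,
  author: string,
  citedText: string
) {
  return finalizeEvent(
    {
      kind: 31,
      created_at: Math.floor(Date.now() / 1000),
      tags: [
        ["u", url],
        ["accessed_on", new Date().toISOString()],
        ["title", title],
        ["author", author],
      ],
      content: citedText,
    },
    secretKey
  );
}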
That's very much right-side-of-the-bell-curve from the looks of it.
If we wanna use something existing as the base and add some nostr magic to it I'd probably go with something like ArchiveBox
https://github.com/ArchiveBox/ArchiveBox
Yuge. Immediate fork, with upstream nostr PR, would be 🔥
Or better yet: https://github.com/gildas-lormeau/SingleFile - very simple, and is a browser extension already.
@fiatjaf @Terry Yiu can y'all combine this with a nostr browser extension?
Push output HTML file to blossom & create archive event accordingly.
SingleFile can even be set up to push things to an arbitrary API. So it should be possible to jerryrig this quite quickly with some spit and some duct tape.
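Roughly what the "push to Blossom" half could look like, assuming a BUD-02-style PUT /upload endpoint with a kind 24242 authorization event; server details vary, so double-check against the actual Blossom spec before relying on this:

import { finalizeEvent } from "nostr-tools/pure";

// Sketch: upload a SingleFile HTML snapshot to a Blossom server via a
// BUD-02-style PUT /upload with a kind 24242 authorization event.
// Treat this as duct tape, not a spec; servers may differ.
async function uploadToBlossom(serverUrl: string, html: string, secretKey: Uint8Array) {
  const blob = new TextEncoder().encode(html);
  const digest = await crypto.subtle.digest("SHA-256", blob);
  const sha256 = Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");

  const auth = finalizeEvent(
    {
      kind: 24242,
      created_at: Math.floor(Date.now() / 1000),
      tags: [
        ["t", "upload"],
        ["x", sha256],
        ["expiration", String(Math.floor(Date.now() / 1000) + 600)],
      ],
      content: "Upload archived page",
    },
    secretKey
  );

  const res = await fetch(`${serverUrl}/upload`, {
    method: "PUT",
    headers: { Authorization: `Nostr ${btoa(JSON.stringify(auth))}` },
    body: blob,
  });
  return res.json(); // blob descriptor (url, sha256, size, ...) if the server accepts it
}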
@hzrd149 does blossom have some micro-payment mechanism to allow payment for server costs of hosting a webpage?
Maybe per file/webpage?
duct tape is stupid and short-sighted, you need a decent standard
true
we need this
@utxo the webmaster 🧑‍💻
It depends on the server. Some servers can require the user to be subscribed. But there is also an option to pay-per-request. I built an example server using it

GitHub
GitHub - hzrd149/blob-drop: A blossom server that only stores blobs for a day
A blossom server that only stores blobs for a day. Contribute to hzrd149/blob-drop development by creating an account on GitHub.
Hmmm, we could nostrify it so that all the data & assets are stored on relays and blossom servers.
I have used Linkwarden (self-hosted) for a while for my bookmarks.

Linkwarden - Bookmarks, Evolved
Linkwarden helps you collect, read, annotate, and fully preserve what matters, all in one place.
Linkwarden looks great! What is a blossom server?
A place where media presented on the nostr protocol is stored, compressed so it can be opened by any client. When you upload a video or image via a nostr app you can choose a blossom server that stores it for you. It can be one that you run yourself forever; this way you own your data. You don't want to store everything on the relays, for many reasons...
Very interesting. Thank you. Do you have a good link for beginners on running their own relay?
Linkwarden has one big problem with cookie-consent dialogs. Your PDF will likely contain only that fcking popup.
wasn't the p2p YaCy search engine doing archival and browsing and spidering stuff in the old days?
web archiving is child's play.
what we want to do is decentralize web crawling data. nobody uses YaCy, which implies it failed.
what is the total size of the latest Common Crawl? (estimate)
Compressed size (gzip'ed WARC): 250–350 TB
solve this.
I'm sure you're not glossing over any details
academics might disagree but they're dead inside
I fear this would lead to soooo much personal information being doxxed by accident. I would never risk clicking such a button.
Maybe we can make sure the extension scans for PII first, but still…
?
Whenever the HTML that's rendered in your browser contains some personal information (e.g. an email, your legal name, whatever), it would be included in the archived page and signed by you. If you are not really, really careful about what the extension includes in the page, you could leak information that you don't want to share. The same goes for stuff that might not even be visible to you.
Imagine a newspaper that has a profile modal for logged-in users. The modal is part of the HTML, but hidden via CSS until you open it. HTML scrapers would still include all the data that is part of the hidden modal, without it ever being visible during the user's visit.
Oh! That's a great point. One of the aforementioned 100 things I didn't think about. Solvable issue, but still an issue.
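One partial mitigation, sketched here purely for illustration: scrub a copy of the DOM before anything gets signed, dropping elements the user never saw and flagging obvious PII. This would certainly not catch everything:

// Sketch: scrub a cloned DOM before archiving. Removes elements hidden via
// CSS (like the profile modal example above) and flags strings that look
// like email addresses. Illustrative only, far from exhaustive.
function scrubForArchive(doc: Document): { html: string; warnings: string[] } {
  const clone = doc.documentElement.cloneNode(true) as HTMLElement;

  // getComputedStyle only works against the live document, so check the live
  // nodes and remove the corresponding clones by index (same document order).
  const liveNodes = Array.from(doc.documentElement.querySelectorAll<HTMLElement>("*"));
  const cloneNodes = Array.from(clone.querySelectorAll<HTMLElement>("*"));
  liveNodes.forEach((el, i) => {
    const style = getComputedStyle(el);
    if (style.display === "none" || style.visibility === "hidden") {
      cloneNodes[i]?.remove(); // the user never saw this, don't archive it
    }
  });

  const html = clone.outerHTML;
  const warnings: string[] = [];
  if (/[\w.+-]+@[\w-]+\.[\w.]+/.test(html)) {
    warnings.push("Possible email address in page; review before publishing.");
  }
  return { html, warnings };
}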
User-driven archiving on #Nostr makes history harder to erase and impossible to fake.
The issue with user-driven archiving is that not all users have good intentions. One could try to upload malicious "copies" of a website, and would probably succeed.
This is why we have the web of trust
It's signed by the user and the reputation becomes king.
Sounds interesting.
To anyone complaining that things can be faked or spoofed etc.:
Yes, yes indeed, everything that is published unhashed and unsigned is fake and gay to begin with. Kill the web, all hail the new web.
Preach!
We absolutely need this
@Marty Bent is the web extension guy now. Get to work.
Interesting idea. Would it be enough to make a screenshot of the website, hash it and timestamp it?
DECENTRALIZE EVERYTHING.
This is an urgent case as we live in the last days of "truth". Everything is being manipulated and erased in REAL TIME. Decentralized Internet Archive!
Love the idea. LFG!
Read about it before. Crazy how digital archives are targeted too...but this here is outrageous!
There really need to be swift, vigilante-esque public hangings and pikings for the destruction of knowledge. The total control of access to knowledge is on the same demonic wish list as total surveillance.


Ars Technica
Anthropic destroyed millions of print books to build its AI models
Company hired Google's book-scanning chief to cut up and digitize "all the books in the world."
I'm very left side. Can you just get a bot to auto-archive everything?
"Whatever is in user's browser gets signed & published to relays ".
This is the problem. For paywalled content, how can we be sure that there is no beacon stored somewhere in the page (DOM, js, html) that identifies the subscriber?
Let's focus on regular content and cross that bridge if we get there.
The main issue is that the big services are centralized and the self-hosted stuff isn't syndicated.
are paywalling services doing that - and punishing the user for screenshotting etc?
hahahahahahahshahahaha
that's so crazy if so. wow
They're trying everything in their power to make water not wet. 

i would have for sure read that years ago, but a great reminder ty gigi
i knew they were trying to use this on music but i didn't realise they were embedding gotcha code so they can police how the user uses their computer and, heaven forbid, copies something. fucking hilarious.
hard to imagine why they're dying such a quick death
they're suiciding themselves, making their product shit all because they can't come to terms with the characteristics of water.
i guess we should thank them
talk about failing the ego test
How does the Archive do it?
I don't think the goal has ever been to make data impossible to copy. The goal is most likely to make copying certain data more difficult. DRM has done that, whether you like it or not. The industry wouldn't do it if it didn't work to some degree.
But I also hate DRM and how it works. Totally agree on that.
Futile action by a dying industry.
not a dying industry… the players will just change and the rules of engagement will evolve

That's a good thought. I have an extension I'm working on that bridges the web over to nostr, allowing users to create discussions anywhere on the web using nostr. It seems like an archive function would be a solid addition. If I can get the universal grill box idea solid I will work on the archival concept as well.
All the JavaScript getting ingested? Worried about the privacy part but very interesting.
Calling all vibe coders!
I've been casually vibe coding this since Wednesday. I think it's quite a powerful idea. I have zero experience with making an extension, but it's the first time AI called a project 'seriously impressive' when I threw Gigi's idea in there.
So far I have come up with a few additional features but the spec would be this at a minimum:
OTS via NIP-03
Blossom for media
3 different types of archiving modes:
Forensic Mode: Clean server fetch, zero browser involvement = no tampering
Verified Mode: Dual capture (server + local) + automatic comparison = manipulation detection
Personal Mode: Exact browser view including logged-in content = your evidence
Still debugging the Blossom integration, and NIP-07 signing from within an extension seems tricky. The only caveat is you would need a proxy to run verified + forensic modes, as CORS will block the requests otherwise. Not sure how that would be handled other than hosting a proxy. Once I have a somewhat working version I may just throw all the source code out there, I dunno.
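For the NIP-03 piece, the OpenTimestamps attestation is its own event pointing at the archive event. A sketch of how it could be built, assuming the kind 1040 layout with the base64-encoded .ots proof in the content; double-check against NIP-03 before shipping:

import { finalizeEvent } from "nostr-tools/pure";

// Sketch: wrap an OpenTimestamps proof of an archive event in a kind 1040
// attestation. How the .ots proof is obtained (stamping the archive event id)
// is out of scope here.
function buildOtsAttestation(secretKey: Uint8Array, archiveEventId: string, otsFileBytes: Uint8Array) {
  return finalizeEvent(
    {
      kind: 1040,
      created_at: Math.floor(Date.now() / 1000),
      tags: [
        ["e", archiveEventId],
        ["alt", "opentimestamps attestation"],
      ],
      content: btoa(String.fromCharCode(...Array.from(otsFileBytes))), // base64-encoded .ots file
    },
    secretKey
  );
}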
Some test archives I've done on a burner account using this custom Nostr archive explorer here.
Nostrie - Nostr Web Archive Explorer
You say this is left-side but there is nothing on the right-side of the curve since what you describe here is already at maximum complexity. And that archiver extension is a mess.
But sure, it's a good idea, so it must be done.
I made this extension: https://github.com/fiatjaf/nostr-web-archiver/releases/tag/whatever, which is heavily modified from that other one.
Damn, this "Lit" framework for making webgarbage is truly horrible, and this codebase is a mess worse than mine, but I'm glad they have the dirty parts of actually archiving the pages working pretty well.
Then there is for browsing archives from others.
Please someone test this. If I have to test it again myself I'll cry. I must wait some days now to see if Google approves this extension on their store, meanwhile you can install it manually from the link above.
websitestr
Please keep things uploaded to non-Google sites.
Google is sold, Google is finished, and should be heavily boycotted for what they're doing to us.
It works. I'm not sure how to view my own, but my Amber log shows what I think is all the right activities.
I'm not sure what the crying is about. This extension is more cooperative than the scrobbler one.
