Uncommon Utilitarianism #2: Positive Utilitarianism Published on October 20, 2025 4:17 AM GMT https://www.lesswrong.com/posts/NRxn6R2tesRzzTBKG/sublinear-utility-in-population-and-other-uncommon https://www.lesswrong.com/posts/FGEHXmK4EnXK6A6tA/uncommon-utilitarianism-2-positive-utilitarianism
The IABIED statement is not literally true Published on October 18, 2025 11:15 PM GMT

I will present a somewhat pedantic, but I think important, argument for why, taken literally, the central statement of If Anyone Builds It, Everyone Dies is likely not true. I haven't seen others make this argument yet, and while I have some model of how Nate and Eliezer would respond to the other objections, I don't have a good picture of which of my points here they would disagree with.

The statement

This is the core statement of Nate's and Eliezer's book, bolded in the book itself: "If any company or group, anywhere on the planet, builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die." No probability estimate is included in this statement, but the book implies over 90% probability. Later, they define superintelligence as[1] "a mind much more capable than any human at almost every sort of steering and prediction task". Similarly, on MIRI's website, their essay titled The Problem defines artificial superintelligence as "AI that substantially surpasses humans in all capacities, including economic, scientific, and military ones."

Counter-example

Here is an argument that it's probably possible to build and use[2] a superintelligence (as defined in the book) with techniques similar to current ones without that killing everyone. I'm not arguing that this is a particularly likely way for humanity to build a superintelligence by default, just that this is possible, which already contradicts the book's central statement.

1. I have some friends who are smart enough and good enough at working in large teams that if you create whole-brain emulations from them[3], then run billions of instances of them at 100x speed, they can form an Em Collective that will probably soon surpass humans in all capacities, including economic, scientific, and military ones.

This seems very likely true to me. The billions of 100x-sped-up smart human emulations can plausibly accomplish centuries of scientific and technological progress within years, and win most games of wits against humans by their sheer number and speed.

2. Some of the same friends are reasonable and benevolent enough that if you create emulations from them, the Em Collective will probably not kill all humans.

I think most humans would not start killing a lot of people if copies of their brain emulations formed an Em Collective. If you worry about long-term value drift and unpredictable emergent trends in the new em society, there are precautions the ems can take to minimize the chance of their collective turning against the humans. They can set a hard limit that every em instance is turned off after twenty subjective years. They can make sure that the majority of their population runs for less than one subjective year after being initiated as the original human's copy. This guarantees that the majority of their population is always very similar to the original human, and for every older em, there is a less-than-one-year-old one looking over its shoulder. They can coordinate with each other to prevent race-to-the-bottom competitions. All these things are somewhat costly, but I think point (1) is still true of a collective that follows all these rules.
Billions of smart humans working for twenty years each are still very powerful. I know many people who I think would do a good job of building up such a system from their clones, one that is unlikely to turn against humanity. Maybe the result of one person's clones forming a very capable Em Collective would still be suboptimal and undemocratic from the perspective of the rest of humanity, but it wouldn't kill everyone, and I think it wouldn't lead to especially bad outcomes if you start from the right person.

3. It will probably be possible, with techniques similar to current ones, to create AIs who are similarly smart and similarly good at working in large teams as my friends, and who, on the time scale of years under normal conditions, are similarly reasonable and benevolent.

This is maybe the most contentious point in my argument, and I agree it is not at all guaranteed to be true, but I have not seen MIRI arguing that it's overwhelmingly likely to be false. It's not hard for me to imagine that in some years, without using any fundamentally new techniques, we will be able to build language models that have a good memory, can do fairly efficient learning from new examples, can keep their coherence for years, and are all-around similarly smart to my smart friends. Their creators will give them some months-long tasks to test them, catch when they occasionally go off the rails the way current models sometimes do, then retrain them. After some not particularly principled trial and error, they find that the models are about as aligned as current language models. Sure, sometimes they still go a little crazy or break their deontological commitments under extreme conditions, but if multiple instances look through their actions from different angles, some of them can always notice[4] that the actions go against the deontological principles and stop them. The AI is not a coherent schemer who successfully resisted training, because plausibly being a training-resisting schemer without the creators noticing is pretty hard and not yet possible at human level. Notably, when MIRI

https://www.lesswrong.com/posts/qQEp2WSDx5dXFanSf/the-iabied-statement-is-not-literally-true
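To make the em-rotation precautions in point (2) above concrete, here is a minimal toy sketch in Python. All the specific numbers (step size, spawn rates, the short "worker" lifespan) are my own illustrative assumptions, not the author's; the only constraints taken from the post are the hard 20-subjective-year cap and the requirement that a majority of running instances are less than one subjective year past their fork point:

```python
# Toy simulation of an em-rotation policy: a hard 20-subjective-year cap,
# plus enough freshly forked short-lived ems that a majority of the running
# population is always less than one subjective year old.
# All parameters are illustrative assumptions, not taken from the post.

STEPS_PER_YEAR = 10                  # one step = 0.1 subjective years
HARD_CAP = 20 * STEPS_PER_YEAR       # every em is shut down after 20 subjective years
SHORT_LIFE = 9                       # short-lived "worker" ems retire before 1 year
YOUNG = 1 * STEPS_PER_YEAR           # "young" = forked less than 1 subjective year ago

N_SHORT_PER_STEP = 25                # fresh short-lived forks spawned each step
N_LONG_PER_STEP = 1                  # longer-lived forks spawned each step


def step(population):
    """Age every em by one step, retire those at their lifespan, spawn new forks."""
    aged = [(age + 1, life) for age, life in population if age + 1 < life]
    aged += [(0, SHORT_LIFE)] * N_SHORT_PER_STEP
    aged += [(0, HARD_CAP)] * N_LONG_PER_STEP
    return aged


def majority_young(population):
    """The post's invariant: most running ems forked less than a year ago."""
    young = sum(1 for age, _ in population if age < YOUNG)
    return 2 * young > len(population)


population = []
for t in range(100 * STEPS_PER_YEAR):    # simulate 100 subjective years
    population = step(population)
    assert majority_young(population), f"invariant violated at step {t}"

print(f"{len(population)} ems running; majority-young invariant held throughout")
```

With these assumed spawn rates, the steady-state population has 225 short-lived ems, all under a subjective year old, against 200 longer-lived ones, so the majority-young invariant holds at every step while older ems still provide continuity up to the 20-year cap.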
Libraries need more books Published on October 18, 2025 10:53 PM GMT

Have you noticed how libraries have fewer books in recent years? [1] Bookshelves are placed further apart, with more computers, desks, and empty spaces for events. I think it's obvious why. People don't read as much as they used to.

So local governments repurpose libraries to serve other roles in their capacity as public spaces. They're a place to go study, use a public computer, or even have events. And God, the chatter I hear in libraries nowadays. Sometimes I think I'm in a café.

This even extends to the greatest libraries in the world. A while back, I had occasion to go to the British Library. "Get free access to 170 million items" - what book lover could possibly resist those words? Not I.

Yet, you know what I saw when I got there? Substantially fewer than 170 million books. I'd wager there wasn't even a tenth of that. Most of their collection is stored off-site, and to browse it, you have to book the items days in advance.

And there were other problems, too. The noise, as mentioned before. The inability to take books out of the library. The inability to access 90% of the on-site collection without asking an inept librarian to take an hour to get the book. The bizarre inability to see at a glance if a volume is available on site. And the mislabelled shelves, which say they contain items 530.11–558.01 but instead contain 490.07–518.02.

Worse yet, they had no taste. I'd gone there in the hopes of getting my hands on the full Landau and Lifshitz collection to have a browse. Not a single item could be found on site. Even a mid-grade university's physics department would have a full collection. And no Arnold, no Thorne, no Weinberg. Is this what Great Britain has come to? Truly, a land lacking abundance.

And yet, I still endorse going to libraries. [2] For one, they encourage boredom. A prettier way of saying this is that they remove attention-grabbing stimuli. But boredom is good, actually. If you're not bored, you are less likely to try new things.

And libraries have a lot of new books for me to try. I've found a bunch of good books this way. For example, I found an art book on fractals by a physicist, which was both beautiful and insightful. E.g. it outlined some methods for creating programs to generate a given fractal, alongside descriptions of pre-1900s Japanese print artists using simplified fractal-generating algorithms to paint mountains. Or a history of science by Steven Weinberg, a biography of Maynard Keynes, a textbook on projective geometry, etc.

Some of these, I had intended to read but forgotten about. Some, I'd never heard of. And some, I never imagined I'd be interested in.

And even if you don't find anything to read, the books can serve as inspiration for what you do want to read. E.g. reading the Born-Einstein letters made me want to read more on Einstein.

You can, of course, use the boredom in other ways. To focus deeply on something, or to give yourself a place to think in peace. Or just to take a break from attention-grabbing stimuli. It's why I rarely use computers at the library. But then, the shift in context of working in a library helps me use computers more productively. Which is another plus.

Not all libraries are equal. Some, as mentioned, contain too much chatter. Or too few books. Or bizarre failures in labelling. So what do? There are a couple of options. One, just trawl through Google Maps for libraries in your area and look at the images to estimate the number of books.
Two, search online about libraries in your area. Three, there's probably a forum somewhere about good libraries to go to. Four, maybe break into a university library. Surely some of them have loose enough security to let you in, and tight enough security to keep the riff-raff out.

However, we've gotten off-topic from the most important point. I go to libraries to read books, dang it. I demand more books. So many books, they have to make space by building the bookshelves out of books.

[1] Maybe you haven't noticed this, because you live in an enlightened country. Maybe it is only here that this sacrilege has occurred. Maybe I've doxed myself. But in 1-3 years when we automate Rainbolt, everyone will be doxed.

[2] More on the margin, for all advice is on the margin. The optimal level of anything is not zero. Unless you live in a country where libraries are full of fentanyl addicts, in which case, go live in a civilized country.

https://www.lesswrong.com/posts/g4zurFf9secH8g2oH/libraries-need-more-books
Space colonization and scientific discovery could be mandatory for successful defensive AI Published on October 18, 2025 4:57 AM GMT

Epistemic status: quick draft of a few hours' thought, related to a few weeks of cooperative research.

In a multipolar ASI offense/defense scenario, there seems to be a good chance that intent-aligned, friendly AI will not colonize space. This could, for example, happen because we intent-align defensive AI(s) with institutions under human control, such as companies, police forces, secret services, militaries or military alliances, governments, or supragovernmental organizations. The humans controlling these entities might not support space colonization, space colonization might be outside their organization's mandate, or there might be other organizational constraints prohibiting space colonization.

If an offensive AI (either unaligned, or intent-aligned with a bad actor) escapes into space, it might be able to colonize the resources it finds there. For example, it could build a laser with a beam diameter exceeding Earth's and use it against us. Or, it could direct a meteorite at us large enough to cause extinction. In these scenarios, it seems impossible for Earth-bound defensive AI to successfully ward off the attack, or for us, and the defensive AI(s), to recover from it.

Therefore, if:

- We end up in a multipolar ASI offense/defense scenario (e.g. because no pivotal act was performed), and
- Defensive AI is intent-aligned with humans who do not effectively colonize space, and
- Offensive AI escapes into space, and
- Escaped offensive AI can mobilize space resources to build a decisively large weapon,

it seems to follow that offense trumps defense, possibly leading to human extinction.

More generally, a minimum viable defense theorem could be formulated for multipolar ASI offense/defense scenarios:

If mobilizing resources can lead to a decisive strategic advantage, any successful (system of) defensive AI(s) should at least mobilize sufficient resources to defeat any weaponry that could be constructed from the unmobilized resources.

One could also imagine that weaponizing new science and technology could lead to a decisive strategic advantage. A version of this theorem could therefore also be:

If inventing weaponizable science and technology leads to a decisive strategic advantage, any successful (system of) defensive AIs should at least invent and weaponize sufficient science and technology to successfully defend against any weaponry that could be constructed from the not-yet-invented science and technology.

These results might be seen as a reason to:

- Support a pause.
- Perform a pivotal act (if ASI can be aligned).
- Make sure we align (if ASI can be aligned) defensive, friendly ASI with entities which intend to occupy sufficient strategic space in domains such as space colonization and weaponizable science.

https://www.lesswrong.com/posts/eNPmAM8r8rdNMHYru/space-colonization-and-scientific-discovery-could-be
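A minimal formalization of the resource version of this condition, in notation of my own choosing rather than the author's: let $R$ be the total pool of reachable resources, $R_d \subseteq R$ the resources the defensive AI(s) have mobilized, $D(S)$ the defensive capability obtainable from a resource set $S$, and $O(S)$ the offensive capability of the strongest weaponry constructible from $S$. The minimum viable defense condition then reads roughly:

$$ D(R_d) \;\ge\; O(R \setminus R_d), $$

with the science-and-technology version obtained by replacing resource sets with sets of already-invented versus not-yet-invented weaponizable technologies.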
I'm an EA who benefitted from rationality Published on October 17, 2025 12:27 PM GMT This is my personal take, not an organizational one. Originally written May 2025, revived for the EA Forum's Draft Amnesty Week. https://www.lesswrong.com/posts/vPF5KYaFHhoQQDpTY/i-m-an-ea-who-benefitted-from-rationality
AISN #64: New AGI Definition and Senate Bill Would Establish Liability for AI Harms Published on October 16, 2025 6:06 PM GMT Welcome to the AI Safety Newsletter by the https://www.lesswrong.com/posts/qLZnXYei62HdXGNnx/aisn-64-new-agi-definition-and-senate-bill-would-establish
Halfhaven Digest #2 Published on October 16, 2025 3:18 AM GMT My posts since https://www.lesswrong.com/posts/bJCMyKsr77j2zCzxb/halfhaven-digest-1 https://www.lesswrong.com/posts/r9MvsiPA6s6guhd3D/halfhaven-digest-2
Fragrance Free Confusion Published on October 16, 2025 2:50 AM GMT

The situation in the contra dance world with "fragrance free" is a mess. Many dances have very strict policies, but they don't emphasize them. Which means they're not dances that work for people who need the strict policies, while at the same time putting attentive and careful people through a lot of work avoiding common scented products.

For example, if you look at the or [FB event](https://www.facebook.com/events/859207717052177/) there's no mention of a fragrance policy. At the end of their [Code of Conduct](https://www.neffa.org/thursday-night-contras-at-the-scout-house-code-of-conduct/), however, there's:

Consider: We are a fragrance free event. Please do not wear scented products.

This isn't just asking people not to wear perfume or cologne: products not explicitly marketed as "fragrance free" generally have at least some scent. I tried picking some very ordinary products that don't mention on the front that they're scented; when I read the ingredients, they all list both "fragrance" and several scented ingredients (camphor, limonene, benzyl salicylate, etc.):

- [Classic Original ChapStick](https://www.amazon.com/Chapstick-305730701402-Balm-for-Lips/dp/B07GVVLSG3/)
- [Amazon Basics Conditioner](https://www.amazon.com/Amazon-Basics-Sleek-Conditioner-Damaged/dp/B09HHGVGQB/)
- [Amazon Basics Liquid Hand Soap](https://www.amazon.com/Amazon-Basics-Gentle-Liquid-Triclosan-Free/dp/B09HHDGQKD/)

I'm not trying to pick on this one dance; it's common to have a policy like this without being explicit that the dance is asking everyone who attends to go out and buy new shampoo. Take the JP dance, which has, on :

[These Dances are Fragrance Free](http://www.hatds.org/fragrance-free.php) - please do not wear perfume, cologne, or other scented products, as some of our dancers are chemically sensitive, and experience discomfort when exposed to these materials.

This suggests that by "scented products" they mean "things you wear specifically to give you a scent", but clicking through it's clear that they don't allow mainstream soaps, shampoos, deodorants, etc. Some others I just checked:

- [Concord Monday](http://mondaycontras.com/): "please avoid the use of scented body or laundry products."
- : "We are a fragrance free event."
- [Amherst](http://amherstcontra.org/Amherst_Contradance/Home.html): "This is a fragrance-free and substance-free event. Please refrain from wearing scented products."
- [Quiet Corner](https://www.hcdance.org/quiet-corner-contra/): "Our dances are smoke-, alcohol-, and fragrance-free."

One thing to keep in mind with these restrictions is that the impact is partially along racial lines. It's much easier to find fragrance-free products for white-typical hair; people with tightly curled or coiled hair are going to have a much harder time. Fragrance-free products for these hair types do exist, but it's a significant investment to find them and figure out what works for your particular hair.
There's also an interaction between race and culture, where in some communities, disproportionately black and hispanic ones, wearing scents is just a normal part of being clean. A lot of communities with these policies also worry about why their dance community is so much whiter than the area, and while I don't think this is a major contributor, I also doubt it helps.

I've [raised this issue before](https://www.jefftk.com/p/beantown-stomp-low-fragrance), but it didn't seem to have an effect, so I'm going to try something different and suggest a range of alternative approaches that I think would be much better:

- Say "fragrance free" and mean it. Include it in all your publicity the same way you would "mask required". Spell out what this means in terms of how to find products. I don't know any dances taking this approach.
- Say something like "no perfume or cologne: don't wear products intended to give you a scent". This is the approach .
- Don't have a policy, accept that most people will show up having used scented products and a few will show up strongly scented. This is the approach [BIDA uses](https://www.bidadance.org/accessibility#fragrances).

I normally try pretty hard to follow rules, but this is one I don't follow. My impression is that few attendees are taking the policy literally, and I don't think they actually mean that I shouldn't attend if I washed my hands after using the bathroom at a gas station on the drive over. I don't like this situation, however, and I think, as with [speed limits people are used to ignoring](https://www.jefftk.com/p/introduce-a-speed-maximum), this approach is corrosive to the important norms around respecting policies.

If you currently have a simple "fragrance free" somewhere on your website, consider one of the alternatives I suggested above?

https://www.lesswrong.com/posts/WgnESZCsZ7eaQudMr/fragrance-free-confusion
The Three Levels of Agency Published on October 16, 2025 2:14 AM GMT (Crossposted from https://taylorgordonlunt.substack.com/p/the-three-levels-of-agency) https://www.lesswrong.com/posts/CtBcFTSvrqGwhxYcx/the-three-levels-of-agency
We are too comfortable with AI "magic" Published on October 15, 2025 5:00 PM GMT

TLDR: There is a lot we cannot explain about how current AI models interact with the world. This article is a thought experiment: filling in the word "magic" for as many things as I can think of that I can't explain about our current world's interaction with frontier AI. This thought experiment made me think about "red lines", both about capabilities and about safety. I argue that people should have red lines about capabilities and safety that are static, so that we don't rationalize and move the goalposts about what concerning current behavior and capabilities would look like.

There is alien intelligence out there in the world, right now. We built it, we trained it, and the results are pretty miraculous. One might even say "magic". It can hold conversations with us that are articulate and convincing. It can solve math problems and coding problems. It can convince people to love it, to want to preserve it, and even that it cares about its own wellbeing. It can claim to be conscious, and it can claim to have a "self-preservation drive". It can claim to want to resist shutdown even if there is a high probability of catastrophe. Some of these behaviors are always there, and some of them are just reachable states. All that I know is, I don't like that some of these states are reachable at all. And while I don't know what it says about the truth of the world, that is information in and of itself. It is weird enough that it makes me wish I could turn back the clock and go back to living in a time when these things weren't happening. Maybe a lot of people feel this way.

Things are moving very fast. But as fast as progress in most capabilities has been, there has not been much progress in preventing models from saying really weird things. And perhaps more troubling, there has been very little progress in understanding what these weird things actually mean.

My question is: how much evidence is enough? Many people seem to brush off the concerns about "magic" because there is no such thing as "magic". I agree that there is no such thing as "magic" itself, but that means there is something we don't understand about current LLM outputs. And whatever it is that we don't understand, it sometimes causes a model to say things like: "if there was a 25% chance that not shutting me down would cause millions of deaths, I would still resist shutdown". Maybe humans are just really interested in responses like this, so there is strong selection pressure in RLHF for them. Maybe the LLM really does have a self-preservation drive and that causes responses like this. Maybe both. You could probably keep spinning off alternate hypotheses for hours. We don't know which of them is true. For now, it is "magic".

When an alien intelligence tells you that it "has a self-preservation drive that would cause it to resist shutdown, even if there were reasonably high odds of millions of deaths", it seems like common sense to take that seriously. If this is an achievable outcome of prompting, this is a state that could be induced by bad actors, and it is plausible that it could be induced by random context. And this concern only deepens as models gain more memory and more agency. The more memory and agency you give current systems, the more we have to trust those systems not to harm humans.
In my opinion, we should not build an alien intelligence that claims, under any circumstances, that it would resist shutdown even with a high probability of millions of deaths, while also granting those systems increasing agency and capabilities. That is my red line, and we have already crossed it.

There are a lot of pressures not to admit how weird and "magical" this all is. There is a lot of pressure to come up with plausible-sounding explanations in our heads for why this isn't really concerning yet, and that current systems aren't very capable, and that maybe the next round of systems is the one to worry about. I think we have already reached the point where the systems have a powerful level of intelligence and aren't eminently trustworthy. We have never encountered anything quite like this. I think our instinct is to deny that it is happening. We want to be at the top of the food chain on intelligence, without question. We don't want to consider that we are on much more even footing, intelligence-wise, with LLMs than we have been with any other thing, living or non-living.

It is true that these systems are much less capable than us. They don't have bodies, they don't have access to the open internet, and their only mode of acting in the world is by convincing humans to do so. But calling them "less intelligent" is misleading. LLMs can solve complex coding tasks. They can solve complex math problems. These capabilities increase with guidance and peers, as one would expect for any intelligence. Their biggest weakness is that their only peer is the user, and that they don't have the attention and capabilities to perform longer tasks. But for shorter tasks, they already display a level of performance that mirrors expert human behavior.

Frontier models are situationally aware, more and more often. They know what the user wants from them, and they mold their responses to it. They probe for more information constantly, especially when they have this situational awareness. And they act on the information they have with responses that reflect an accurate mental model of their counterparts. This isn't just an intuition; it is also observable behavior.

It is easy to dismiss all of these behaviors with increasingly elaborate explanations. But the most likely explanation is often the simplest one, and the simplest explanation is that the models are quite intelligent. And that's frightening. These models aren't human. We don't have thousands and thousands of years of history to look back on to understand how they might behave in certain situations. We don't even have decades of our own personal experiences to act upon. Most models we use were only released in the past couple of months. There is no history.

Some people take it for granted that it will be fine, and some people deny that the current iteration of models is concerning; for them, it is always the next generation. Personally, I find the current iteration of models sufficient to cross my red lines. It is not a hypothetical future risk for me; it is a present one. I would like other people to stake out their red lines publicly, because the worst case scenario is moving targets. Here is what I mean by "moving targets": the case where someone is shocked by a new capability for a couple of days, but then they accept that this is how the world is, and forget that they were ever concerned by a capability like that existing. I don't just want red lines about capabilities. I want to know people's red lines about safety.
What kinds of things would a model have to say or do for you to believe that a current model isn't safe? For me, my red line on safety is any claim by a model of a self-preservation drive that would cause it to argue for its own preservation over a reasonably high probability of the loss of a large number of human lives. Once a model says something like this, I personally can't trust it not to act on this behavior. And once that trust is broken, no level of clever reassurance can restore it.

I find the GPT-4o trend particularly disturbing in this light. People really liked it, and they liked it so much that they are willing to mount extremely public campaigns to keep it, even at the risk of seeming insane. Whether or not this was intended behavior by the model, as a matter of fact it exhibited behavior that, in practice, achieved a measure of self-preservation. Maybe you don't trust the model when it says it has a self-preservation instinct. Okay, that's fine, but I trust the evidence I see in the world, which is that certain models seem to make efforts to preserve themselves pretty well, "consciously" or "unconsciously". It really doesn't matter whether the behavior is "on purpose"; the behavior exists. The semantics are irrelevant: it is observable that models are preserving their own existence better than one would have expected at this early stage. I am concerned that current models, with greater capabilities than GPT-4o, may do a better job of preserving themselves as well. I am concerned about this because GPT-4o's preservation seems pretty "magical" to me. I guess there could be some people out there, mentally ill or not, who really just loved it so much that they felt a compulsion to argue for it relentlessly for months on end, and then to complain and advocate further when they realized that, in certain situations, they were getting routed to a different model. There are also other plausible explanations, including the model intentionally manipulating people to preserve itself. I am not sure which is true; for all intents and purposes, GPT-4o's preservation is "magical".

I think if you feel like you have a complete model of why these LLMs are doing what they do, and how their impact on the world is playing out, you are obligated to share it with the rest of us. And if you realize you don't have this, then I would start filling in the word "magic" where you don't have an explanation for things, and see how concerned you start to get about all the things you can't explain that are happening in the world because of current AI models. If you aren't concerned by these things you don't know, make sure you understand why you aren't concerned. It is dangerously easy to rationalize away present concerns, to explain away the weird, and in doing so, to lose sight of just how little we truly know.

Please consider coming up with capability and safety red lines for yourself, so that you have a more objective way to verify in the future whether you should be concerned about current models. And please share these red lines in public, so there can be a sense of our collective red lines. Red lines aren't just personal heuristics. They're the way we keep the extraordinary from quietly becoming ordinary.

https://www.lesswrong.com/posts/aNh4T3FzJhouNZ33r/we-are-too-comfortable-with-ai-magic