Don't rely on a "race to the top"
Published on May 1, 2025 12:33 AM GMT
To make frontier AI safe enough, we need to "lift up the floor" with minimum safety practices.
Anthropic has popularized the idea of a “race to the top” in AI safety: Show you can be a leading AI developer while still prioritizing safety. Make safety a competitive differentiator, which pressures other developers to be safe too. Spurring a race to the top is core to Anthropic’s mission, according to its co-founder and CTO.
https://www.lesswrong.com/posts/LFxdvPiksvLHA58Mx/don-t-rely-on-a-race-to-the-top
Obstacles in ARC's agenda: Finding explanations
Published on April 30, 2025 11:03 PM GMT
As an employee of the European AI Office, it's important for me to emphasize this point: The views and opinions of the author expressed herein are personal and do not necessarily reflect those of the European Commission or other EU institutions.
Also, to stave off a common confusion: I worked at ARC Theory, which is now simply called ARC, on Paul Christiano's theoretical alignment agenda. The more famous ARC Evals was a different group working on evaluations; its work was completely separate from ARC Theory's, and the two were housed under the same organization only out of convenience, until ARC Evals spun off under the name METR. Nothing I write here has any implication about the work of ARC Evals/METR.
Personal introduction
From October 2023 to January 2025, I worked as a theoretical researcher at ARC Theory …
https://www.lesswrong.com/posts/xtcpEceyEjGqBCHyK/obstacles-in-arc-s-agenda-finding-explanations
State of play of AI progress (and related brakes on an intelligence explosion) [Linkpost]
Published on April 30, 2025 7:58 PM GMT
This time around, I'm sharing a post on Interconnects on why its author doesn't believe the AI 2027 scenario by Daniel Kokotajlo (https://www.lesswrong.com/users/daniel-kokotajlo) …
https://www.lesswrong.com/posts/vQJAjgo7uxWg9LnuM/untitled-draft-3jfe