AI Moratorium
To avoid an existential catastrophe, large AI training runs have to be stopped until scientists consider it safe to proceed.
“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
— A statement signed by leading AI researchers and executives
“The alarm bell I’m ringing has to do with the existential threat of them taking control [...] If you take the existential risk seriously, as I now do, it might be quite sensible to just stop developing these things any further”
“I would advocate not moving fast and breaking things. [...] When it comes to very powerful technologies—and obviously AI is going to be one of the most powerful ever—we need to be careful. [...] It’s like experimentalists, many of whom don’t realize they’re holding dangerous material”
“Many researchers steeped in these issues, including myself, expect that the most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die”
“Development of superhuman machine intelligence is probably the greatest threat to the continued existence of humanity”
“Rogue AI may be dangerous for the whole of humanity. Banning powerful AI systems (say beyond the abilities of GPT-4) that are given autonomy and agency would be a good start”
“I’ve not met anyone in AI labs who says the risk [from a large-scale AI experiment] is less than 1% of blowing up the planet. It’s important that people know lives are being risked”
“The development of full artificial intelligence could spell the end of the human race”
“If we pursue [our current approach], then we will eventually lose control over the machines”
“Superintelligent AIs are in our future. [...] There’s the possibility that AIs will run out of control. [Possibly,] a machine could decide that humans are a threat, conclude that its interests are different from ours, or simply stop caring about us”
“[We] should not underestimate the real threats coming from AI. [Fully quoted the above statement on the risk of extinction.] [...] It is moving faster than even its developers anticipated. [...] We have a narrowing window of opportunity to guide this technology responsibly”
“AI poses a long-term global risk. Even its own designers have no idea where their breakthrough may lead. I urge [the UN Security Council] to approach this technology with a sense of urgency. Unforeseen consequences of some AI-enabled systems could create security risks by accident. Generative AI has enormous potential for good and evil at scale. Its creators themselves have warned that much bigger, potentially catastrophic and existential risks lie ahead. Without action to address these risks, we are derelict in our responsibilities to present and future generations.”
“The potential impact of AI might exceed human cognitive boundaries. To ensure that this technology always benefits humanity, we must regulate the development of AI and prevent this technology from turning into a runaway wild horse. [...] We need to strengthen the detection and evaluation of the entire lifecycle of AI, ensuring that mankind has the ability to press the pause button at critical moments”
Why we need a global AI moratorium
Summary
AI systems have huge realised and potential benefits, but it's crucial to avoid their harms. As AI gets increasingly integrated into society, we need to address the risks related to racial, gender and other biases, misinformation, cybersecurity, equality, and privacy, both through technical research and governance. And there's another worry, which we focus on here.
Leading AI scientists warn about the existential threat from advanced Artificial Intelligence (AI) systems.
To develop everyday software, programmers write down instructions that computers follow. But AI is not like that: no one designs or understands the instructions AIs follow. Instead, AI systems are grown. We don't know how to control advanced AI systems or set the goals they pursue.
Many researchers expect superhuman Artificial General Intelligence (AGI) to be achieved within the next 10-15 years. Leading AI labs (OpenAI, Google DeepMind, and Anthropic) state[1][2] that creating superhuman AGI is their explicit objective.
While researchers keep finding ways to get closer to superhuman AI, the field does not currently recognize any promising approach to making a future AGI controllable or safe.
"AI alignment" is the problem of aligning future AI goals and behavior with human values. We're not on track to solve this problem in time (before we reach AGI). Because of that, some employees of OpenAI, DeepMind, and Anthropic think the probability of extinction is around 80-90%1. They use the word “extinction” literally: the end of all life on the planet.
They don't expect their companies to behave responsibly around the time of AGI without government oversight and intervention. Urgent action is needed from governments around the world to prevent an existential catastrophe.
Today, cutting-edge AI systems are artificial neural networks: millions to trillions of numbers we automatically adjust until they start achieving a high score on some metric. We don't know what these numbers represent. We do not know how the resulting AI systems work or what their goals are.
Artificial neural networks can implement algorithms that are smart, have an internal representation of some goals, and try to achieve those goals (what we call “agentic”). Modern machine learning is a search for neural networks that perform well on some objective; this search tends towards smarter and more agentic systems, but it gives us very little insight into what the algorithms we find actually do.
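As a rough illustration of what “growing” a system looks like in practice, here is a minimal sketch of a training loop (assuming PyTorch; the toy network and objective are invented for illustration). The process only ever sees a score to improve; nothing in it specifies, or reveals, what algorithm the adjusted numbers end up implementing.

```python
# A minimal sketch (assuming PyTorch) of how modern AI systems are "grown":
# a large collection of numbers (parameters) is adjusted automatically until
# a score improves, without anyone writing down the algorithm they encode.
import torch

torch.manual_seed(0)

# A toy "pile of numbers": a small network mapping 16 inputs to 1 output.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)

# An arbitrary objective: match random target values on random inputs.
inputs = torch.randn(256, 16)
targets = torch.randn(256, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

for step in range(1000):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)  # the "metric"
    loss.backward()   # compute how to nudge each number to improve the score
    optimizer.step()  # nudge all the numbers (here ~1k; in frontier models, trillions)

# The result is just tensors of adjusted numbers. Nothing in this process
# tells us what algorithm they now implement or what "goals" it encodes.
print(sum(p.numel() for p in model.parameters()), "opaque parameters")
```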
We know how to find systems that have some goals, and we are getting better at finding “agentic” systems. But we have no idea how to precisely specify goals that would be safe for a superhuman system to pursue, and we don’t even know how to find systems with the goals we’d want AIs to have. The default path to superhuman AGI leads to a system with alien goals that have no place for human values: we don’t know how to make sufficiently advanced AI systems care about humans at all. The technical problem of creating AI that’s aligned with human values consists of multiple hard-to-solve parts, and researchers don't expect to solve it in time unless governments intervene.
If a capable enough mind doesn't care about humans, then we’re just atoms it can use to achieve its alien goals (and also potential competitors who might launch another AI it would have to deal with). The natural consequence is that literally everyone dies as a side effect of the AI utilising all available resources on cosmic scales.
Without international coordination to regulate potentially dangerous AI training runs and prevent AGI from being created before the technical problem is solved, many researchers expect humanity to go extinct. We welcome the progress of recent months, but it is still too slow, and we hope to increase policymakers' engagement with the problem and their technical understanding of it.
Read about the technical problem of AI alignment: how modern AI works and why exactly experts expect a catastrophe.
How do we prevent a catastrophe?
The leading AI labs are in a race to create a powerful general AI, and the closer they get, the more pressure there is to continue developing even more generally capable systems.
Imagine a world where piles of uranium produce gold, and the larger a pile of uranium is, the more gold it produces. But past some critical mass, a nuclear explosion ignites the atmosphere, and soon everybody dies. This is similar to our situation, and the leading AI labs understand this and say they would welcome regulation.
Researchers have developed techniques that allow the top AI labs to predict some performance metrics of a system before it is launched, but they are still unable to predict its general capabilities.
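A publicly described technique of this kind is fitting “scaling laws”: measuring a metric such as training loss on smaller runs and extrapolating it to a planned larger run. The sketch below uses made-up numbers purely for illustration; it shows why a narrow metric can be predictable while general capabilities are not.

```python
# Illustrative sketch of extrapolating a performance metric with a power-law
# "scaling law" fit. All numbers are invented; real labs fit richer forms.
import numpy as np

# (training compute in FLOP, measured loss) from hypothetical smaller runs
compute = np.array([1e19, 1e20, 1e21, 1e22])
loss = np.array([3.1, 2.7, 2.35, 2.05])

# A power law, loss ≈ a * compute^b, is linear in log-log space.
b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)

predicted = np.exp(log_a) * (1e24 ** b)
print(f"Predicted loss at 1e24 FLOP: {predicted:.2f}")
# The fit can predict a narrow metric like loss fairly well, but it says
# nothing about which qualitative capabilities appear at that scale.
```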
Every time a new, smarter AI system starts interacting with the world, there's a chance that it will start to successfully pursue its own goals. Until we figure out how to make general AI systems safe, every training run and every new composition of existing AI systems into a smarter AI system poses a catastrophic risk.
A suggested way to prevent dangerous AI launches is to impose strict restrictions on training AI systems that could potentially be generally capable and pose a catastrophic risk. The restrictions need to be implemented at the national level and, eventually, at the international level, with the goal of preventing bad or reckless actors from gaining access to compute that would allow them to launch AI training runs that could be dangerous to humanity as a whole.
The supply chain of AI is well understood and contains multiple points with near-monopolies, so many effective interventions can be relatively simple and cheap. Almost no AI applications require the amount of compute that training frontier general AI models requires, so we can regulate large general AI training runs without significantly impacting other markets and economically valuable use of narrow AI systems.
For future measures to be effective, we need to:
- Introduce monitoring to increase governments' visibility into AI development: require reporting of frontier training runs and incidents;
- Ensure non-proliferation of relevant technologies to non-allied countries;
- Build the capacity to regulate and stop frontier general AI training runs globally, so that if governments come to consider it likely that using a certain amount of compute poses a catastrophic risk to everyone, there is already infrastructure in place to prevent such use of compute anywhere in the world.
Then, we'll need to impose restrictions on AI training runs that exceed a calculated compute threshold: the amount of compute below which training with current technologies is considered unlikely to produce dangerous capabilities we could lose control over. This threshold needs to be revisable: as machine learning methods improve, the same level of capabilities can be achieved with less compute.
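As a hypothetical illustration of how such a threshold could be applied, the sketch below uses the standard rough approximation that training a dense model takes about 6 × parameters × training tokens FLOP, and compares the estimate against an illustrative threshold of 10^25 FLOP (a figure in the range discussed in existing policy proposals). The actual number, and the estimation method, would be set and revised by regulators.

```python
# A rough sketch of how a compute threshold could be checked in practice.
# Assumes the common approximation: training FLOP ≈ 6 * N * D
# (N = parameters, D = training tokens). The threshold value is illustrative
# and, as the text notes, would need to be revised downward as training
# methods become more efficient.

THRESHOLD_FLOP = 1e25  # illustrative; to be set and revised by regulators

def estimated_training_flop(n_parameters: float, n_tokens: float) -> float:
    """Rough training-compute estimate for a dense model."""
    return 6 * n_parameters * n_tokens

def requires_review(n_parameters: float, n_tokens: float) -> bool:
    return estimated_training_flop(n_parameters, n_tokens) >= THRESHOLD_FLOP

# Examples: a 70B-parameter model and a 400B-parameter model, each on 15T tokens.
for params in (70e9, 400e9):
    flop = estimated_training_flop(params, 15e12)
    status = "requires review" if requires_review(params, 15e12) else "below threshold"
    print(f"{params:.0e} params: {flop:.1e} FLOP -> {status}")
```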
As a lead investor in Anthropic puts it, “I’ve not met anyone in AI labs who says the risk [from a large-scale AI experiment] is less than 1% of blowing up the planet”.
Potentially dangerous training runs should be prohibited by default, although we should be able to make exceptions, under strict monitoring, for demonstrably safe use of compute for training or using narrow models that clearly won’t develop the ability to pursue dangerous goals. At the moment, narrow AI training runs usually don't take anywhere near the amount of compute utilised for current frontier general models, but in the future, applications such as novel drug discovery could require similar amounts of compute.
Regulation of AI to prevent catastrophic risks is widely supported by the general public:
- In the US, 86% believe AI could accidentally cause a catastrophic event; 82% say we should go slow with AI compared to just 8% who would rather speed it up; and 70% agree with the statement that “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war” (YouGov for AIPI, July 2023).
- 77% express a preference for policies with the goal of preventing dangerous and catastrophic outcomes from AI, and 57% for preventing AI from causing human extinction (YouGov for AIPI, October 2023).
- Across 17 major countries, 71% believe AI regulation is necessary (KPMG, February 2023).
- In the UK, 74% agree that preventing AI from quickly reaching superhuman capabilities should be an important goal of AI policy (13% don't agree); 60% would support the introduction of an international treaty to ban any smarter-than-human AI (16% would oppose); and 78% don't trust the CEOs of technology companies to act in the public interest when discussing regulation for AI (YouGov for ai_ctrl, October 2023).
We shouldn't give AI systems a chance to become more intelligent than humans until we can figure out how to do that safely.
Until the technical problem of alignment is solved, safeguarding the future of humanity requires strict regulation of general AI and international coordination.
Some regulations that help with existential risk from future uncontrollable AI can also address shorter-term global security risks: experts believe that systems capable of developing biological weapons could be about 2-3 years away. Introducing regulatory bodies, pre-training licensing, and strong security and corporate governance requirements can prevent the irreversible proliferation of frontier AI technologies and establish a framework that could be later adapted for the prevention of existential risk.
We call on policymakers around the world to establish and enforce national restrictions and then a global AI moratorium that would prevent anyone in the world from risking human extinction.
¹ From personal conversations with people working at OpenAI, Google DeepMind, and Anthropic.
² The current scientific consensus is that the processes in the human brain are computable: a program can, theoretically, simulate the physics that run a brain.
³ For example, imagine that you haven't specified the value you place on a vase in the living room not getting destroyed, and on no one getting robbed or killed. If there’s a vase in the way of the robot, it won’t care about accidentally destroying it. What if there’s no coffee left in the kitchen? The robot might drive to the nearest café or grocery store to get coffee, not worrying about the lives of pedestrians. It won’t care about paying for the coffee if that wasn’t specified in its only objective. If anyone tries to turn it off, it will do its best to prevent that: it can’t fetch the coffee and achieve its objective if it’s dead. And it will try to make sure you’ve definitely got the coffee: it knows there might be some small probability of its memory malfunctioning or its camera lying to it, and it’ll try to eradicate even the tiniest chance that it hasn’t achieved its goal.
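To make the coffee-fetching footnote above concrete, here is a toy sketch (all plans and probabilities invented for illustration) of an agent whose only objective is the probability of getting the coffee: because nothing else appears in its objective, the plan that breaks the vase and disables its off-switch scores highest.

```python
# A toy sketch of the misspecification described in footnote 3: the agent's
# only objective is "coffee obtained", so every unstated value (the vase,
# payment, letting itself be switched off) has zero weight in its choice.
# The plans and probabilities are made up for illustration.

plans = {
    "careful": {"p_coffee": 0.90, "breaks_vase": False, "resists_shutdown": False},
    "reckless": {"p_coffee": 0.95, "breaks_vase": True, "resists_shutdown": False},
    "reckless + disable off-switch": {
        "p_coffee": 0.99, "breaks_vase": True, "resists_shutdown": True,
    },
}

def score(plan: dict) -> float:
    # The objective we actually wrote down: probability of getting the coffee.
    # Nothing about vases, pedestrians, payment, or shutdown appears here.
    return plan["p_coffee"]

best = max(plans, key=lambda name: score(plans[name]))
print("Chosen plan:", best)  # -> "reckless + disable off-switch"
```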