The story so far
In parts I and II, I made the case for why widgets (if this post is the first you’re reading, widgets = AGI) could be concerning. I’ve used the term widget to remove any preconceptions people might have about AGI and whether it’s possible. To summarize, a widget is an AI system with the following properties:
The ability to plan over long time horizons, on the order of months to years.
Human-level or above fluency with language and language-based tasks, including coding.
Competence in interacting directly with the world digitally, rather than having interactions mediated through humans. This competence includes a general understanding of how the world works, but may not include skills like physical manipulation of objects.
To recap, we’re building systems whose operation we don’t understand, but which are increasingly capable of acting in the digital world over long time horizons to achieve arbitrary goals. Constructing such systems poses serious risks: they could be misused by governments and individuals to cause catastrophic damage, they may act in out-of-control ways, and their mere use may cause things to spiral out of control even if nobody does anything clearly bad with them.
The story so far has tried to build up some intuitions for why we should not be going full steam ahead on widget construction.
There are a bunch of unknowns about how widgets will behave. E.g., will they pursue convergent instrumental goals to extreme ends? If we don’t know how likely extremely bad things are, it seems prudent to slow down so that we can find out.
Our regulatory capacities have not kept pace with technological development. Efforts like the proposed EU AI Act are a start, but don’t seem strong enough to manage the risks I’ve outlined. In the meantime, widget development is still going full steam ahead.
Economic incentives to deploy systems could outstrip our technical capacity to control them. Technical work aims to reduce out-of-control behaviour, and some of it is promising, but the economic incentives at play may limit its reach. OpenAI is certainly aware that its control and safety techniques are not yet up to par, yet Microsoft and OpenAI still deployed the unhinged Bing chatbot.
Regardless of whether we should at some point in the future build widgets or not, I think there is a strong case for significantly slowing down widget development now.
What slowing down could look like
Given the intuitions above, let’s spell out what significantly slowing down widget development looks like. The following is not the only possible direction, but gives a broad outline of what would be a better scenario.
International control and/or oversight over widget development and deployment, such that we substantially reduce economic and strategic incentives from private and national competition to develop widgets.
Regulations on the capabilities of developed and deployed systems, such as limits based on the amount of compute used in training runs (see the sketch after this list). We should only be deploying systems when we are confident that their capabilities will remain within our collective control.
In carefully controlled settings, continued study of the possible harms of widgets and how to mitigate them. This work will likely involve some research that looks something like developing widgets, but with a different focus and with vastly more controls. It would also involve research to make widgets safer, such as technical alignment and cooperation research. We could have something similar to biosafety levels to reduce the risk that any systems developed cause harm in the real world.
Research into developing systems for socially beneficial applications could continue. This research would have to be vetted, preferably by a diverse panel of experts (as in bioethics), for the possibility that it could contribute significantly to widget development. More AlphaFolds seem great, but not if they are given arbitrary control over a computer.
We continue to develop policy ideas and implement legislation.
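To make the compute-based regulation idea above more concrete, here is a minimal sketch. It assumes the common rule of thumb that training a dense transformer takes roughly 6 FLOPs per parameter per training token; the threshold value and function names are hypothetical, chosen for illustration rather than drawn from any actual regulation.

```python
# A rough, hypothetical illustration of compute-based regulation: estimate the
# training compute of a planned run and flag it for oversight if it crosses a
# threshold. The ~6 FLOPs per parameter per token figure is a common
# approximation; the threshold below is an assumption for illustration only.

def estimated_training_flops(num_parameters: float, num_tokens: float) -> float:
    """Approximate total training compute: ~6 FLOPs per parameter per token."""
    return 6.0 * num_parameters * num_tokens

# Hypothetical oversight threshold, in FLOPs.
COMPUTE_THRESHOLD_FLOPS = 1e25

def requires_oversight(num_parameters: float, num_tokens: float) -> bool:
    """Return True if the planned run's estimated compute exceeds the threshold."""
    return estimated_training_flops(num_parameters, num_tokens) > COMPUTE_THRESHOLD_FLOPS

if __name__ == "__main__":
    # Example: a 70-billion-parameter model trained on 2 trillion tokens.
    flops = estimated_training_flops(70e9, 2e12)  # ~8.4e23 FLOPs
    print(f"{flops:.2e} FLOPs, oversight required: {requires_oversight(70e9, 2e12)}")
```

The point is not the specific numbers but the shape of the rule: a simple, auditable quantity that regulators could use to decide which training runs warrant extra scrutiny.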
There are some issues with the pause I’ve outlined, some of which make it quite difficult to achieve.
It requires trust amongst actors that have reason for mutual distrust. Pausing widget development is a collective action problem: if China pauses, the US has an incentive to forge ahead to gain a strategic advantage, and vice versa. The same holds for AI companies like OpenAI and DeepMind.
It requires convincing a lot of people that the risk from widgets is serious. The benefits of widgets could be enormous, and it would be difficult to give that up for possible harms that have not yet materialized.
It might be extremely difficult to separate widget development research from research into developing systems for socially beneficial applications. Even if some of the latter research proceeds, we could be giving up a lot of potential upside by carefully controlling widget development.
It remains to be seen how to ensure that the benefits of research are widely distributed, especially if widget development becomes centralized.
Still, the problems above are tractable. Already, more and more people are getting the sense that the current pace and trajectory of AI development are concerning. There do seem to be some historical precedents for actively controlling the developmental pace of emerging technologies:
The voluntary moratorium on recombinant DNA research in advance of the Asilomar conference, which led to risk-based regulations around such research that still hold up today.
The partial and threshold nuclear test ban treaties, whose enforcement was made possible by advances in monitoring technology.
The alternative to controlling the pace is to barrel ahead in the dark, toward potential catastrophe.
The potential scale of harm from widgets is enormous. The possibilities of enabling billions of people to develop catastrophic tools like bioweapons, permitting permanent totalitarianism, threatening human existence through the extreme pursuit of convergent instrumental goals, and encouraging mass societal unrest deserve sober thought.
That we haven’t ruled out these catastrophic risks means that widget developers are imposing massive negative externalities on the rest of society. Given the rate of technological progress, a handful of companies could decide the future of society while governments fail to respond quickly enough. As I talk about elsewhere, it is a democratic imperative to allow time for society to consider the risks before moving ahead.
Failing to proceed cautiously has created unacceptable risks in the past. While nuclear non-proliferation has been relatively successful in that only 9 countries have atomic bombs rather than 193, we lived through half a century under the threat of mutually assured destruction during the Cold War, avoiding at least one nuclear disaster by sheer luck. We have lost nuclear bombs, some of which barely avoided exploding. Above-ground nuclear tests in the 1950s contaminated people worldwide with strontium-90, which binds to bone like calcium and can cause bone cancer; no follow-up study was ever performed to determine how much disease incidence increased.
Objections to slowing down
There are some serious objections to slowing down that we should address.
Throwing away upside
AI has the potential to bring about a wealth of benefits. This upside is particularly evident when considering the range of technical problems AI could help address. AI-driven research could accelerate the development of clean energy, enabling a faster transition to a sustainable future and mitigating the impacts of climate change. AI’s ability to analyze large amounts of data and recognize complex patterns could likewise revolutionize medical research, leading to breakthroughs in the prevention, diagnosis, and treatment of disease. We could alleviate considerable suffering and improve quality of life for millions of people around the world.
These potential benefits are quite compelling to me, but they do not seem worth risking catastrophe, given how much uncertainty surrounds AI’s catastrophic risks. It is crucial that we proceed with caution and do not throw everything away in pursuit of AI’s upside. We can still pursue some of these benefits by focusing on applications and narrow systems, while slowing down the development of generally capable, agentic systems.
Capability overhangs
Capability overhangs present an interesting objection to slowing down AI development. Suppose we succeed in decelerating progress. At some later point, responsible actors or malicious ones could suddenly train AI systems far more capable than what we have become accustomed to, simply because available computing power will have kept growing exponentially. Deploying such a system could be destabilizing and leave no time for effective policy responses.
Continuing AI research at its current pace might also help raise public and governmental awareness of AI risk and allow time for policy responses. For example, the release of AI systems like ChatGPT and GPT-4 has spurred widespread discussion about the benefits and risks of AI.
Furthermore, if computing power becomes less of a bottleneck to developing capable systems, regulating AI development may become increasingly difficult. If algorithmic progress becomes the main bottleneck instead, a widget could come from anywhere, which is worrying from the perspective of misuse, misalignment, and systemic risks.
Despite these objections, it still seems to me that slowing down can be net positive. Even while slowing AI progress, we can strengthen institutions against misuse, for example by using AI to improve cyber-defense systems. Decelerating AI development also doesn’t mean ignoring public awareness: we could support formal education campaigns and initiatives that help people understand AI and its potential risks, creating a more informed public that is better equipped to navigate AI’s integration into society.
The most significant remaining issue is that compute would no longer be a bottleneck for AI development. As development becomes more accessible and widespread, preventing misuse or misaligned widgets could become increasingly challenging. I’m not sure how to manage this. My hope is that, by slowing down AI development now, we can use the additional time to improve our institutions and regulatory frameworks so that the consequences of misalignment or misuse are less severe.
What comes after?
If we were able to significantly slow down widget development, what would we do then? I think there are at least two ideal endgames.
We find convincing evidence that widgets are too risky and decide not to develop them any further. I think this evidence is most likely to come from out-of-control risks; for example, we might find that widgets are essentially guaranteed to end up out of our control. We then decide to deploy only systems that are not widgets (e.g., systems without long-horizon planning, systems that do not interact directly with the world, or systems designed for a specific scientific purpose like protein folding).
We become extremely confident in our ability to radically reduce misuse, out-of-control, and systemic risks. Some of this confidence could come from developing AI systems that improve our institutions and defenses. We then deploy widgets, perhaps slowly at first and maybe not the most advanced versions, so as to acclimatize society to the new technology. Introducing them too quickly could cause serious societal instability.
It may be that we eventually find conclusive evidence that widgets will be dangerous no matter how we build them, and that we conclude never to build them. If we knew we were in a world in which widgets were unavoidably dangerous, we would be foolish to race ahead and build them anyway. Since we do not know whether we are in such a world, acting pragmatically under uncertainty means we should still slow down. Better to slow down and make sure of things, even if it costs us some technological progress for a time, than to lose everything in case widgets are unavoidably dangerous.