Let's talk about the CAIS statement on extinction risk from AI

Jun 04, 2023

The Center for AI Safety released a public statement on extinction risk from AI. The idea for a statement came from David Krueger [1], a Cambridge AI professor, and its execution received assistance from several other academics.

Hundreds of researchers signed the following statement:

Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.

In this post, I will discuss some of the more common responses that have opposed the statement.

Context

Since some of the opposition to the statement questions the motives and identities of the signatories, I'll say a little bit more about my background and how I'm involved.

I am a third-year PhD student in machine learning at Mila. After publishing research in RL during my MSc, at the start of my PhD I shifted my focus to sociotechnical and technical research on socially responsible ML, including existential safety and FATE work. Here are some examples of my published work in this regard: [2102.01265] The Limits of Global Inclusion in AI Development, [2302.10329] Harms from Increasingly Agentic Algorithmic Systems, [2303.09001] Reclaiming the Digital Commons: A Public Data Trust for Training Data.

I also participated in discussions around the construction of the statement.

Let's get into some responses.

Extinction risk from AI is not real

Since this objection underlies many of the subsequent ones, I’ll address it first.

Many works (academic works included) have laid out the cases for extinction risk from AI. Some of these works are:

I highly encourage reading the links above for a more in-depth discussion. Here, I'll address what seem to me to be the three main disagreements about extinction risk from AI.

1. Can AI systems be smarter (e.g., be better at reasoning, persuasion, planning, strategizing, knowing more about the world, etc.) than humans?
2. If we can build AI systems that are smarter than us, how fast will we get there?
3. If we build AI systems that are smarter than us, will we be able to control them?

The answer to the first question seems to be yes. There does not seem to be an a priori reason why not. A stated goal of many research agendas in AI (e.g., the whole field of reinforcement learning) is to build systems that are smarter than humans. In the past decade we have been making good on this aim by building machine learning systems that have met or surpassed human performance in domain after domain. More recently, simply scaling up compute and data has given rise to language models that can interact with the digital world, reason, and plan in a much more general way than previous systems.

There is a lot of uncertainty about the second question. Estimates differ, but being able to build AIs that are smarter than humans seems increasingly a possibility within our lifetimes, and potentially in the next decade, given the rapid pace of progress in machine learning. An additional, unsettling uncertainty is that we might be quite surprised by, and unprepared for, how smart AI systems get. AI systems exhibit emergent capabilities (emergent = unpredictable, as far as we can tell at the moment) with scale. Even those building AI systems may not be aware of their full capabilities; for example, the public discovered chain-of-thought prompting long after the release of GPT-3.

The answer to the third question seems to be no, or at least not yet, since solving this control problem is very difficult. We have trouble specifying goals correctly. When we do not specify them correctly, AI systems have acted outside of our intentions. Even when we specify goals correctly, the training environment provided to our models may cause them to misgeneralize those goals anyway when acting out of distribution. Despite these problems, there are strong economic, political, and cultural incentives that push for the development of smarter AI systems. These incentives also push for giving these AI systems actuators to interface directly with the world and to manage societal production.

By definition, a risk is the possibility of harm. There is going to be uncertainty for any risk. Yet, uncertainty does not preclude action to reduce extinction risk from AI. The standard of proof for taking action to mitigate risks is not the same as the standard of proof for a scientific paper. The evidence so far already suggests to me that extinction from AI is urgent enough, likely enough, and difficult enough to resolve so as to warrant immediate action.

Requiring a scientific standard of proof is not at all how society manages global risks. We have never fought a nuclear war and do not have certainty about its effects; yet, we must still avoid one. Every pandemic is different and we have no certainty over how precisely it will spread and impact society; yet, we must still prevent pandemics. By the mid-20th century, we knew about the greenhouse effect and the rising concentration of CO2 in the atmosphere; that should have been enough evidence to take action at that point.

The motives of the signatories are suspect

The argument here seems to be that if the motives of the signatories are suspect, then to a first approximation one should doubt the importance of AI existential risk. Suspicion about motives is reasonable. However, some of the suspicion does not hold up upon further inspection.

First, some claim that talking about extinction risk is just industry players trying to hype up their products. Yet, the fact is that the signatory list includes more than 100 AI professors. Here is just a small selection of them.

Tegan Maharaj, Shiri Dori-Hacohen, Dawn Song, Dan Roy, Elad Hazan, Ya-Qin Zhang, Andrew Barto, Marina Jirotka, Nando de Freitas, Philip Isola, Anca Dragan, Shai Shalev-Shwartz, Jakob Foerster, Scott Niekum, Ju Li, Jeff Clune, Diyi Yang, Atoosa Kasirzadeh, Peter Norvig, Mengye Ren, Phil Thomas, Jacob Steinhardt, Sharon Li, Dae-Shik Kim, Vincent Conitzer, David McAllester, Dylan Hadfield-Menell, Sam Bowman, David Krueger

Some of the signatories have long been working to address ongoing harms from AI. It is extremely implausible for all of them to have suspect motives.

Moreover, as a hype strategy, discussing extinction risk seems quite roundabout and inefficient. OpenAI already seems to be generating ample hype just by showing the capabilities of GPT-4 and its integration with various plugins. OpenAI certainly has an incentive to build hype, yet it seems that the CAIS statement would decrease enthusiasm for AI if anything (extinction sounds quite bad). Put another way, the best way for a toaster company to make money isn't to say that its toasters are so powerful that they can cause fires. People can have complicated motivations. Someone can both believe in AI extinction risk and have a financial incentive to hype their products.

Second, another variant of criticizing motives is that any regulation to prevent extinction risk really serves just to create an industry moat. The argument seems to be the following.

1. If AI poses an extinction risk, then its development should be heavily centralized.
2. Centralization would disproportionately benefit the key players of AI today, i.e., the big tech companies.
3. Other players would be barred from competing with big tech.
4. Big tech would have much more power in shaping our society.
5. This power is bad because big tech is unaccountable and unrepresentative of society.

The argument is compelling, but it should be weighed against other considerations too. First, a particular intervention can both encourage the formation of moats and reduce extinction risk. We already accept trade-offs between moats and risk reduction in many regulated industries, such as pharmaceuticals, aviation, and finance. Second, preventing excessive market concentration and reducing extinction risk are both important. More work should go into figuring out how to balance both issues.

The statement distracts from current harms

I often hear that discussing AI extinction risk draws attention away from ongoing harms. I think the following are some of the points behind this claim.

1. Addressing ongoing harms from AI and AI extinction risk respectively require very different interventions.
2. There is a limited amount of time, resources, and attention to address problems from AI.
3. AI extinction risk is too speculative.
4. There is little we can do about AI extinction risk right now.

At a high level, my response is the following. Ongoing harms from AI systems are serious and deserve more attention; more people in the AI safety community should acknowledge the importance of these harms and how, without enough attention, they lead to persistent injustice and exacerbate extreme concentrations of power. To name a few harms, existing uses of AI systems ruin livelihoods, reinforce systemic racism, violate privacy, and exacerbate polarization. At the same time, AI developers are still rushing ahead to develop new systems that could pose additional harms and exacerbate extinction risk. It is possible to address both ongoing harms and extinction risk because the problems are intertwined and society’s resources are not meaningfully limited.

I’ll say more about each point. The section "Extinction risk from AI is not real" already addresses the third point.

There are useful and feasible interventions that could address both ongoing harms from AI and AI existential safety. This addresses the first and fourth points. Here are some interventions (these ideas are not new):

Mandatory pre-deployment evaluations: Nobody should be deploying a model that has not been extensively tested. A mandatory battery of tests could and should include evaluation of both ongoing and potential harms. Marginalized communities and those directly affected by a particular system should have a great say in what the battery of tests includes.
Continual risk assessment: AI systems should be continually evaluated for any problems that might crop up unexpectedly. We should be able to pull AI systems from the market as the FTC is trying to do currently, just as we already do for pharmaceuticals.
Data transparency requirements: Understanding what is in the training data of large language models would help to identify and mitigate any harms, whether they are harms we are experiencing right now, like bias and exclusion, or harms relevant to existential safety, like power-seeking and deception.
More funding for mitigating negative societal impacts relative to funding for capabilities: The amount of funding going into making AI systems more capable of performing arbitrary tasks is much higher than the amount of funding going into understanding and mitigating any negative impacts from AI development and deployment. Increasing the relative amount of funding going into the latter would make it easier to address ongoing harms without new systems and harms cropping up all the time, and would also allow more time for research to address extinction risk.

Not all interventions will be common to both. For example, pre-registration and public notification of large training runs do not seem to have much of an impact on ongoing harms. Yet, I think there are likely to be more common interventions in the near future: both ongoing harms and extinction risk stem from the ability of certain players to push the negative externalities of AI development and deployment onto society. In the same way that companies can release systems for questionable applications or without sufficiently testing for bias, those same companies can accelerate their AI research and encourage perceptions of an arms race, exacerbating extinction risk. I think the AI existential safety community has a lot to learn from the FATE community about the complexity of sociotechnical systems and how to address risks thereof.

Next, neither time, resources, nor attention is a significant limitation.

Time and resources are not limited:

Far more work goes into making AIs smarter than into preventing and mitigating harms. Redistributing effort from capabilities research to either AI existential safety or ongoing harms would benefit both.
Even if we cannot redistribute research priorities, the AI field continues to grow rapidly. Incoming researchers could focus more on addressing harms.
Funding to address harms can increase, such as here.

Attention is not limited:

Different researchers can devote their attention to different things.
To the extent that there are common interventions, drawing the attention of the public and policy makers to those interventions does not draw attention away from ongoing harms.

People are the problem, not AI

Since the statement begins with, “Mitigating the risk of extinction from AI”, one could claim that the statement wrongly shifts responsibility from humans onto AIs. One could also claim that in general, discussion about rogue AIs has the same effect.

I agree with the contention that people are the problem, but in a different way than I think the objection intends. First, just because we talk about “risks from AI” does not mean we are forgetting about the humans responsible; clearly, somebody has to be building and deploying these systems. Similarly, we can talk about “risks from nuclear weapons” with the understanding that humans are the ones building and launching the nuclear weapons. People may also be incentivized to build AI systems in ways that would make extinction risk worse, such as by reducing human oversight and granting AIs access to weapon systems.

Second, all of the following can be true at the same time without contradiction.

Humans bear full responsibility for building and deploying AI systems.
AI systems can act outside of the intentions of their designers, deployers, and users (see the section "Extinction risk from AI is not real").

In particular, humans still bear full responsibility for deploying a system that goes rogue. A designer of a pandemic-level virus does not know the specific way in which the pathogen will infect and kill people, but the designer is still responsible for the subsequent deaths. Similarly, although developers, deployers, and users might not be able to predict the ways in which an AI system can act outside of their control, they are still responsible (likely to differing degrees) for subsequent harms. We are also collectively responsible for changing the system of incentives (e.g., competitive pressures) that currently permits some democratically unaccountable actors to unilaterally trade away collective safety for individual gain.

The contention from AI existential safety is that we should not build or deploy systems that could result in human extinction. Such systems would not be an unavoidable act of God; rather, they would be a deliberate choice made in response to incentives.

The statement doesn't call for any concrete action

It is true that the statement does not call for concrete action. It would be great if there were broad consensus amongst the hundreds of signatories about what exactly to do about extinction risk from AI, but none exists yet.

In the absence of a consensus for concrete action, it is still useful to create common knowledge about who is concerned about extinction risk from AI. Without common knowledge, one might be afraid to talk about extinction risk for fear of attack or dismissal from one’s peers. This consideration is especially important for junior researchers. Public, informed discussion of extinction risk is a prerequisite for democratically legitimate interventions to address the risk.

Closing

A public statement on extinction risk from AI is clearly not enough. Decisions about the trajectory of AI development are still made not by an informed public after deliberation, but by small groups of unrepresentative people behind closed doors. Commercial access to AI systems is no substitute for collective control over how they are developed and deployed. Encouraging public discussion of extinction risk is a first step.

Acknowledgements

I'm grateful to the following people for constructive feedback on this post: Usman Anwar, Tegan Maharaj, David Krueger, and Roger Grosse.

[1] Here is a recent interview.

Alan’s Substack
