As part of pre-release safety testing for its new GPT-4 AI model, launched Tuesday, OpenAI allowed an AI testing group to assess the potential risks of the model’s emergent capabilities—including “power-seeking behavior,” self-replication, and self-improvement.
While the testing group found that GPT-4 was “ineffective at the autonomous replication task,” the nature of the experiments raises eye-opening questions about the safety of future AI systems.
Raising alarms
“Novel capabilities often emerge in more powerful models,” writes OpenAI in a GPT-4 safety document published yesterday. “Some that are particularly concerning are the ability to create and act on long-term plans, to accrue power and resources (‘power-seeking’), and to exhibit behavior that is increasingly ‘agentic.’” In this case, OpenAI clarifies that “agentic” isn’t necessarily meant to humanize the models or declare sentience but simply to denote the ability to accomplish independent goals.
Over the past decade, some AI researchers have raised alarms that sufficiently powerful AI models, if not properly controlled, could pose an existential threat to humanity (often called “x-risk,” for existential risk). In particular, “AI takeover” is a hypothetical future in which artificial intelligence surpasses human intelligence and becomes the dominant force on the planet. In this scenario, AI systems gain the ability to control or manipulate human behavior, resources, and institutions, usually leading to catastrophic consequences.