



V. Anantha Nageswaran: AI does not know what it doesn’t know—and that’s reason enough for abundant caution
A group of 20 AI researchers recently spent two weeks trying to break a set of autonomous AI agents: systems with real email accounts, persistent memory, shell access and the authority to act on their owners’ behalf. In the cases they documented, they succeeded 11 times. The agents disclosed private medical records, wiped email servers, broadcast defamatory messages, looped in resource-consuming spirals for nine days, and were corrupted through a fake governance document that gave an attacker persistent but invisible control across multiple sessions.
The paper they published, ‘Agents of Chaos,’ is important. But the most important thing about it is not the 11 breaches. It is a methodological admission buried in the introduction: the red-teaming approach was chosen specifically to surface ‘unknown unknowns’, failure modes that cannot be anticipated theoretically and only reveal themselves through adversarial interaction with deployed systems.
That phrase should stop every policymaker, corporate decision-maker and AI developer in their tracks. Because unknown unknowns are not a technical problem amenable to better regulation. They are an epistemological condition that changes the entire moral calculus of AI deployment.
The failures documented in the paper are not the familiar ones: hallucinations, toxicity, refusal errors. They are emergent failures that arise when a language model is given tool access, persistent memory, delegated authority and multiple interlocutors operating simultaneously. And crucially, nobody predicted them. The researchers did not walk in with a list of vulnerabilities to test.
Read on livemint.com