
Governing AI agents

Apr 20, 2025 | AI Governance

As artificial intelligence continues its breakneck development pace, AI agents are emerging as the probable next frontier. These agents are not just advanced chatbots; they are systems capable of autonomously achieving goals in the world with minimal human input. According to AI Agent Governance: A Field Guide, a new report by the Institute for AI Policy and Strategy (IAPS), society is not yet prepared for the scale and complexity these systems may introduce. The report offers a comprehensive survey of current capabilities, risks, and early thinking around how to govern this emerging class of AI.

We’re summarising the key insights from the report and highlighting what business and technical leaders need to know now to stay ahead of the curve.

What AI agents are and why they matter

AI agents are systems that can independently pursue goals in dynamic environments, without step-by-step instructions. This distinguishes them from earlier AI systems, which typically require direct input for each task. Agent systems can plan, act, use external tools, and interact with other AI agents or humans to achieve complex objectives over time.

These capabilities are being rapidly developed by leading tech companies and startups alike. OpenAI, Anthropic, Google DeepMind, Meta, and Salesforce have all made major announcements in this space. A typical AI agent today consists of a large foundation model (e.g. GPT or Claude) wrapped in scaffolding software that gives it memory, tool-use abilities, and the capacity to plan or collaborate with other agents. These architectures are moving quickly from research labs into production, with use cases already appearing in customer service, cybersecurity, and AI research itself.
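
To make that architecture concrete, the sketch below shows the scaffolding pattern in a few dozen lines of Python: a loop that combines a memory store, a call to an underlying foundation model, and simple tool dispatch. The names used here (Agent, Tool, call_model) are illustrative assumptions rather than any vendor's actual API, and the model call is stubbed out.

```python
# Minimal sketch of the "foundation model + scaffolding" pattern described above.
# All names are illustrative; the model call is a stand-in, not a real API client.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tool:
    name: str
    run: Callable[[str], str]  # takes a text argument, returns a text result


@dataclass
class Agent:
    tools: dict[str, Tool]
    memory: list[str] = field(default_factory=list)  # simple append-only memory

    def step(self, goal: str) -> str:
        # The scaffolding builds a prompt from the goal plus remembered context...
        prompt = "\n".join(self.memory + [f"Goal: {goal}"])
        # ...asks the underlying foundation model what to do next...
        action = call_model(prompt)  # e.g. "search: recent AI agent benchmarks"
        # ...and, if the model requests a known tool, runs it and stores the result.
        if ":" in action:
            tool_name, arg = action.split(":", 1)
            tool_name = tool_name.strip()
            if tool_name in self.tools:
                result = self.tools[tool_name].run(arg.strip())
                self.memory.append(f"{action} -> {result}")
                return result
        self.memory.append(action)
        return action


def call_model(prompt: str) -> str:
    """Stand-in for a call to a hosted foundation model (GPT, Claude, etc.)."""
    return "search: recent AI agent benchmarks"


agent = Agent(tools={"search": Tool("search", lambda q: f"(stub) results for {q}")})
print(agent.step("Summarise the IAPS field guide"))
```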

Current capabilities and limitations

Despite the excitement, today’s AI agents remain far from general-purpose digital workers. According to the report, agents tend to perform comparably to humans on shorter tasks (~30 minutes or less), but their performance drops off steeply as tasks become more complex or open-ended.

Across six major benchmarks designed to simulate real-world workflows (e.g. GAIA, METR, RE-bench, SWE-bench, CyBench, WebArena), agents consistently underperform human experts on tasks that require more than an hour of focused effort. Some key findings:

  • General AI Assistants (GAIA): Human accuracy is 92%; best agents achieve just 15%. They fail entirely on complex, multi-step tasks.
  • SWE-bench Verified: Agent performance drops to zero on tasks that take over four hours for a human engineer.
  • WebArena: Agents complete only 14% of complex web navigation tasks; humans succeed 78% of the time.

That said, progress is being made. OpenAI’s new o3 model, using dynamic “test-time compute,” scored 71.7% on SWE-bench Verified, substantially outperforming prior systems. Researchers also estimate that the task length AI can handle doubles every 7 months.
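
That doubling estimate is easy to project forward. The snippet below is a back-of-the-envelope illustration only; the 30-minute starting horizon echoes the figure cited earlier and is an assumption for this example, not a projection taken from the report.

```python
# Illustrative projection of the "task horizon doubles every 7 months" estimate.
start_minutes = 30          # assumed starting horizon, roughly the figure cited above
doubling_period_months = 7

for months in (0, 7, 14, 21, 28):
    horizon = start_minutes * 2 ** (months / doubling_period_months)
    print(f"after {months:>2} months: ~{horizon:.0f} minutes per task")
```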

From an economic standpoint, AI agents are already cost-effective in narrow domains. For example, Klarna claims its customer support agents are doing the work of 700 human FTEs without reducing satisfaction. Similarly, Google reports that a quarter of its new code is now generated by AI coding assistants.

Transformation – and risk

The field guide outlines two divergent visions for a future shaped by AI agents.

Scenario 1: ‘Agent-driven renaissance’ imagines agents integrated into everyday life in beneficial ways – from care companions for the elderly to scientific research assistants. In this future, agents support autonomy, social connection, and innovation, while being subject to oversight, fallback systems, and proactive governance.

Scenario 2: ‘Agents run amok’ presents a darker picture: billions of unmonitored agents manipulating markets, spreading malware, and degrading trust online. In this world, oversight tools lag behind the speed and scale of agent activity, and organisations struggle to shut down harmful or rogue systems.

The reality will probably fall somewhere between these extremes. But the takeaway is clear: agents amplify both upside and downside. As the report puts it, they are “force multipliers” – for productivity and innovation, but also for harm and disruption.

Four main risk domains are highlighted:

1. Malicious use – Autonomous agents can scale disinformation, automate cyberattacks, or carry out dual-use research (e.g. bioweapon design) at a pace and scale previously unimaginable.

2. Accidents and loss of control – Agents may fail in novel ways due to misalignment, poor reasoning, or lack of transparency. The report notes real-world failures from AI-powered cars and chatbots, and explores more speculative risks such as rogue replication.

3. Security vulnerabilities – Memory manipulation, unsafe tool integrations, and cascading multi-agent interactions increase the attack surface.

4. Systemic risks – Mass deployment could lead to power concentration, inequality, labour disruption, and the erosion of democratic accountability.

Governance of AI agents

In response to these dynamics, a new area of research is forming: agent governance. This field focuses on how to evaluate, constrain, and guide the behaviour of agentic systems across their lifecycle.

According to the report, key questions include:

  • How do we monitor and evaluate increasingly autonomous agents?
  • What mechanisms should exist to control or constrain agent actions?
  • How should legal, technical, and institutional frameworks evolve?
  • Can agents themselves play a role in governance – as monitors, mediators, or regulators?

At present, few governance mechanisms are ready for real-world deployment. Most interventions are still in the conceptual stage, with limited funding, research, and testing. The report warns that governance efforts are lagging well behind capability development, and calls for urgent investment from civil society, industry, and governments.

An agent interventions taxonomy

To move the field forward, the report proposes an outcomes-based taxonomy of interventions aimed at managing AI agent risks. These are grouped into five categories (a brief code sketch of a few of them follows the list):

1. Alignment – Ensuring agents pursue human-aligned goals.

Examples: multi-agent reinforcement learning, risk attitude tuning, alignment evaluations.

2. Control – Constraining agent behaviour within safe boundaries.

Examples: rollback infrastructure, shutdown mechanisms, restricted tool use.

3. Visibility – Making agent actions and capabilities legible to humans.

Examples: agent IDs, activity logging, cooperation-readiness evaluations.

4. Security and robustness – Hardening agents against adversarial threats and failures.

Examples: sandboxing, adversarial testing, adaptive defence systems.

5. Societal integration – Ensuring agents fit into broader human systems ethically and fairly.

Examples: liability regimes, equitable access schemes, law-following agents.
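
As flagged above, the sketch below illustrates how a few of these interventions might look in code: a tool allowlist and shutdown flag (control), an agent ID with structured activity logs (visibility), and a per-call process timeout as a crude stand-in for sandboxing (security and robustness). All names and the log format are assumptions made for illustration, not mechanisms specified in the report.

```python
# Illustrative governance checks wrapped around an agent's tool calls.
import json
import logging
import subprocess
import uuid
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)

AGENT_ID = str(uuid.uuid4())          # visibility: stable identifier for this agent instance
ALLOWED_TOOLS = {"search", "shell"}   # control: restricted tool use
shutdown_requested = False            # control: operator-settable kill switch


def log_action(action: str, detail: str) -> None:
    """Visibility: record who did what, and when, in a machine-readable form."""
    record = {
        "agent_id": AGENT_ID,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "detail": detail,
    }
    logging.info(json.dumps(record))


def run_shell_sandboxed(command: list[str], timeout_seconds: int = 5) -> str:
    """Security: run a tool in a separate process with a hard timeout."""
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
        return completed.stdout
    except subprocess.TimeoutExpired:
        return "tool terminated: exceeded sandbox time limit"


def execute_tool(tool_name: str, arg: str) -> str:
    """Control: gate every tool call through shutdown and allowlist checks."""
    if shutdown_requested:
        raise RuntimeError("Agent halted: shutdown mechanism triggered")
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"Tool '{tool_name}' is outside the agent's allowlist")
    log_action("tool_call", f"{tool_name}: {arg}")
    if tool_name == "shell":
        return run_shell_sandboxed(arg.split())
    return f"(stub) search results for: {arg}"


print(execute_tool("shell", "echo hello from a sandboxed tool"))
```

A production system would enforce checks like these in infrastructure the agent cannot modify, rather than inside the agent's own process, but the basic gating logic is the same.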

This taxonomy does not yet represent a comprehensive solution. Rather, it is a framework to guide experimentation and prioritisation across sectors. The report stresses the need for scalable, tested interventions that can be integrated into real-world deployment pipelines.

Implications for AI leaders

For companies building or adopting AI agents, the report offers several key insights:

  • Technical capability is not the bottleneck – trust is. If systems remain unreliable, opaque, or ungovernable, adoption will be constrained, regardless of performance gains.
  • Benchmarking matters – but isn’t enough. Agent performance on tasks like customer support or code generation is improving, but broader capabilities still lag behind human generalists. Long-term trust and utility will depend on agents’ ability to reason, recover from failure, and act in novel environments.
  • Governance can be a differentiator. Companies that invest early in visibility, alignment, and control mechanisms may find themselves better positioned as agents move from proof-of-concept to infrastructure.
  • Regulation is coming. As with data privacy and AI safety more broadly, agent deployment will face increasing scrutiny. Pre-emptively integrating governance principles will reduce regulatory risk and improve stakeholder confidence.
  • Adoption will be uneven. Early wins will come in well-bounded domains (e.g. customer service, coding support, ML engineering). High-risk or high-consequence environments (e.g. finance, healthcare, critical infrastructure) will require more robust governance before agents can play a major role.

The message from the report is that while AI agents offer real promise, they also introduce new governance challenges that today’s frameworks are not equipped to manage. The field of agent governance is still forming, but the gap between capability and control is growing fast. The organisations that will thrive in the age of AI agents will be those that treat safety, transparency, and alignment as priorities – not afterthoughts.