Menu

Advanced AI agents – opportunities and governance imperatives

by | Nov 1, 2024 | AI Risk

The rapid progression of artificial intelligence (AI) has introduced a new generation of “advanced AI agents,” defined as systems leveraging large language models (LLMs) to execute complex, multistep tasks across diverse environments autonomously. Unlike basic chatbots, these agents are capable of handling multiple types of input, such as text and images, and can execute straightforward tasks autonomously once given initial instructions. For instance, after receiving a clear goal—like sorting files or retrieving specific information from a database—they can carry out the necessary steps without continuous guidance (Woodside & Toner, 2024). However, they still lack the nuanced decision-making required to adapt independently to unpredictable or complex real-world situations and need specific instructions and human oversight to operate effectively. As advanced AI systems evolve, their applications extend far beyond conventional tasks, bringing both opportunities for innovation and complex regulatory challenges that necessitate informed governance. This article explores the implications of autonomous AI capabilities, from task execution to safety and oversight.

Autonomous planning and decision-making

The ability of advanced AI agents to break down tasks into sequential actions, or “planning,” is crucial for their autonomous functioning. This involves responding to user commands and initiating and carrying out tasks independently. Autonomous planning and decision-making in advanced AI agents involve a process where the agent independently decomposes complex tasks into smaller, sequential actions. This planning capability means that, rather than merely responding to direct commands from a user, the AI can initiate steps on its own, making decisions along the way based on the task’s evolving requirements (Woodside & Toner, 2024). For example, when assigned a goal like “organise data files,” an AI agent with planning capabilities might autonomously analyse file categories, organise them according to specific rules, and adjust its actions if it encounters anomalies or new patterns in the data.

The planning process relies on the agent’s ability to recognise overarching goals and divide them into actionable subtasks, often involving real-time adjustments and adaptability. This approach is essential for the AI to handle open-ended, complex tasks without constant user intervention, although it still faces challenges in managing unpredictability over longer sequences. Applications have shown promising outcomes in executing straightforward instructions but reveal limitations in complex multi-step plans, especially where unforeseen obstacles arise (Woodside & Toner, 2024).

Capabilities and challenges

One of the most significant capabilities of these agents is their integration with external tools, enabling them to perform real-time interactions with third-party services. For instance, the ChatGPT plugin for Instacart enables users to generate shopping lists and add ingredients to their cart directly through Instacart based on food-related prompts. This integration allows ChatGPT to interpret user requests, like asking for recipes or meal ideas, and convert them into actionable shopping items within Instacart’s app. For example, a user could input something like, “What do I need to make spaghetti Bolognese?” and ChatGPT would compile a list of necessary ingredients, automatically adding them to the Instacart cart for easy checkout and fast delivery options (Instacart, 2023).

Additionally, researchers are expanding the use of these agents to control physical systems. For example, in Microsoft’s exploration of using ChatGPT for robotics, one notable test involved giving ChatGPT control over a drone within the Microsoft AirSim simulator. This test aimed to gauge the model’s ability to interpret natural language commands and execute specific tasks autonomously. ChatGPT completed several commands: identifying a drink based on a description, suggesting a “healthy option,” and even taking a selfie in front of a reflective surface. In another scenario, researchers tested ChatGPT’s manipulation skills by having it stack blocks to recreate Microsoft’s four-coloured logo. This task showcased ChatGPT’s ability to draw upon its internal knowledge and apply it practically, interpreting the task requirements and using virtual manipulation to complete it (Microsoft, 2023). Although ChatGPT demonstrated strong capability in following high-level commands, its performance still relied on pre-defined prompts and specific situational guidance. For tasks requiring flexibility and error correction, human oversight remains critical. Microsoft emphasises that, while promising, this level of control is not yet ready for critical, real-world applications due to risks associated with unsupervised errors in complex environments (Microsoft. n.d.).

As another example, in gaming, advanced AI agents like Google DeepMind’s Scalable, Instructable, Multiworld Agent (SIMA) enhance player interaction by interpreting user language inputs and autonomously navigating complex virtual environments. For example, users can instruct the AI to “find items” or “avoid obstacles,” and it will execute these tasks with minimal guidance, making gameplay more immersive. However, these agents are not yet fully reliable in all gaming scenarios, as they sometimes struggle with ambiguous instructions or complex, multi-step actions without further human input, highlighting the need for continued development in AI adaptability (SIMA Team et al., 2024).

Beyond gaming and consumer services, advanced AI agents also show promise in personalised education, healthcare assistance, and creative industries, where they offer new methods of engagement and efficiency. However, each of these fields introduces unique governance challenges, necessitating tailored safeguards to address privacy, dependency, and ethical considerations specific to these sectors. While the applications of this are promising—current limitations restrict their utility in complex scenarios. Challenges arise when agents face unexpected obstacles or need to perform long sequences of actions, where even a single misstep could disrupt the entire task. The planning capabilities of LLMs, while effective in controlled environments, often struggle with extended or intricate tasks, as each step carries the risk of compounding errors (Woodside & Toner, 2024).

Governance challenges and recommendations

The evolution of advanced AI agents raises critical questions about governance and regulation. These agents’ autonomy and their ability to interact with external systems increase their “attack surface”—the range of ways malicious actors could exploit them. This makes secure operation and oversight essential. To ensure secure operation and oversight, cross-sectoral collaboration is essential, where industry standards and regulatory bodies work together to create ‘sandboxed’ environments for testing advanced AI agents. This testing approach allows stakeholders to identify vulnerabilities before these agents are deployed in critical or sensitive applications, thereby reducing risks associated with unforeseen interactions or misuse.

As highlighted by Thomas Woodside and Helen Toner (2024) from the Center for Security and Emerging Technology, effective governance mechanisms must address the following concerns:

Attack surfaces

Recent studies highlight the complex vulnerabilities of large language models (LLMs), especially regarding prompt injection attacks and context contamination, where adversarial prompts or external data sources are used to manipulate the AI’s output. For instance, attackers can exploit model prompts or integrate malicious content through web tools, causing LLMs to leak private information or follow harmful instructions (Wei et al., 2024; Liu et al., 2023)  Another recent work details backdoor attacks, where malicious patterns are inserted into the model during training, allowing future malicious activations. This method is increasingly seen in commercial LLM applications, making information flow control and secure training crucial to protect sensitive data (Cui et al., 2023)​.

Liability and accountability

The complexity of assigning accountability in AI-driven systems is growing as autonomous systems become more embedded in critical sectors. Scholars from NIST have underscored the need for robust frameworks in the U.S. and internationally, especially where AI is used in sensitive areas like finance and healthcare. They advocate for liability models that assign responsibility to developers and third-party providers as part of a holistic risk management approach (NIST, 2023)​. The European Union’s AI Act is another reference point, setting clear accountability standards that require AI developers and vendors to address risks posed by their systems, making it one of the first regulatory efforts to outline liability measures in AI usage across sectors (EU, 2023).

Disclosure and transparency

Transparency is essential in AI to foster user trust and responsible interaction. Emerging regulatory frameworks, such as the European Union’s AI Act, advocate for mandatory disclosures on AI’s decision-making processes, especially in high-stakes applications, to enhance transparency and accountability (European Union, 2023). Studies in AI governance emphasise that systems should clearly indicate when users are interacting with an AI agent (OECD AI Policy Observatory, 2024)​.

Monitoring and oversight

Monitoring frameworks are crucial in detecting and mitigating risks in real time as autonomous agents perform tasks across various domains. NIST’s AI Risk Management Framework recommends continuous oversight, especially for high-stakes applications, advocating for regular audits and adaptive monitoring to detect misuse early (NIST, 2023). Recent empirical research also supports integrating machine-learning-based monitoring tools capable of identifying unusual behaviours in AI systems, such as real-time flagging of potential harmful outputs in customer-facing applications (IEEE Xplore, 2023).

Tool restrictions and human-in-the-loop approaches

Human oversight is increasingly advocated in high-risk AI tasks, such as those involving sensitive data or high-impact actions. Studies have demonstrated that maintaining a human-in-the-loop for critical functions minimises errors and ensures ethical safeguards. Microsoft’s experiments with AI-controlled physical systems emphasise the importance of restricting autonomous agents’ access to critical tools without human verification steps, especially in financial or healthcare applications (Microsoft, 2022)​.

Restricting AI’s access to potentially harmful tools—such as automated transaction processing or medical decision-making software—helps prevent unverified, autonomous actions that could lead to significant harm. Tool restrictions also mitigate risks from emerging threats like adversarial attacks, where malicious actors might attempt to exploit an AI system’s decision-making. Research shows that establishing “guardrails,” where specific functions are gated by human intervention, is vital for maintaining control in environments where small errors can lead to severe consequences (Brundage et al., 2023). This layered approach of tool restrictions combined with human oversight serves as a protective barrier, ensuring that AI agents remain within ethical and operational boundaries, particularly in fields where high-stakes outcomes demand accountability.

Emergent risks

The potential for emergent risks from interactions between AI agents, humans, and other systems is a key concern as AI integrates into complex environments. Emerging research underscores the unpredictable behaviours that can result from these interactions, such as the unforeseen consequences of LLMs engaging with external data sources or performing complex, multi-step processes. Such emergent risks have highlighted the need for comprehensive, system-level analyses that anticipate inter-agent interactions and develop preemptive mitigations (Cui et al., 2023).

The potential of advanced AI agents is substantial, yet so are the governance challenges they present. As the capabilities of these systems continue to evolve, thoughtful regulation and adaptive governance will be essential to maximise their benefits while mitigating their risks. Incorporating proactive and adaptive governance approaches can help policymakers balance innovation and safety, enabling the responsible development of advanced AI agents. By doing so, we can ensure these transformative technologies contribute positively to society while managing the inherent risks they bring. As AI agents become more embedded in daily life, the role of policymakers in monitoring and managing these impacts will grow. Ongoing research into AI ethics and its societal impacts will be essential to building adaptive and robust governance models that protect the public interest.

References

Brundage, M., Avin, S., Wang, J., Belfield, H., Krueger, G., Hadfield, G., Khlaaf, H., Yang, J., Toner, H., Fong, R., Maharaj, T., Koh, P. W., & Leung, J. (2018). The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation. Centre for the Governance of AI, Future of Humanity Institute, University of Oxford. Retrieved from https://www.governance.ai/research-paper/the-malicious-use-of-artificial-intelligence-forecasting-prevention-and-mitigation

Cui, L., Zhao, Y., & Li, W. (2023). Backdoor attacks in commercial LLM applications: Security threats and countermeasures. Journal of Applied Machine Learning, 34(6), 345-361.

European Union. (2023). AI Act: A regulatory framework on AI. Retrieved from https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai

IEEE Xplore. (2023). Toward Intelligent Monitoring in IoT: AI Applications for Real-Time Analysis and Prediction.

Instacart. (2023). Instacart’s ChatGPT plugin: Enhancing the grocery shopping experience with AI. Retrieved from https://www.instacart.com/company/updates/instacart-chatgpt/

Liu, Y., Deng, G., Li, Y., Wang, K., Wang, Z., Wang, X., Zhang, T., Liu, Y., Wang, H., Zheng, Y., & Liu, Y. (2023). Prompt Injection attack against LLM-integrated Applications. ArXiv. https://arxiv.org/abs/2306.05499

Microsoft. (n.d.). AirSim: High-fidelity visual and physical simulation for autonomous vehicles. Retrieved from https://microsoft.github.io/AirSim/

Microsoft Research. (2023). ChatGPT for robotics: Enabling natural language interaction with robots. Retrieved from https://www.microsoft.com/en-us/research/articles/chatgpt-for-robotics/

MultiOn. (2024). Autonomous digital assistant supported by Amazon’s Alexa Fund. MultiOn White Paper. Retrieved from https://multion.ai/

National Institute of Standards and Technology (NIST). (2023). AI risk management framework. Retrieved from https://www.nist.gov/itl/ai-risk-management-framework

OECD AI Policy Observatory. (2024). OECD AI principles: Fostering trust and transparency in AI systems. OECD AI Policy Observatory. Retrieved from https://oecd.ai/en/dashboards/ai-principles/P7

Team, S., Raad, M. A., Ahuja, A., Barros, C., Besse, F., Bolt, A., Bolton, A., Brownfield, B., Buttimore, G., Cant, M., Chakera, S., Chan, S. C., Clune, J., Collister, A., Copeman, V., Cullum, A., Dasgupta, I., De Cesare, D., Di Trapani, J., . . . Young, N. (2024). Scaling Instructable Agents Across Many Simulated Worlds. ArXiv. https://arxiv.org/abs/2404.10179

Wei, Cheng’an & Chen, Kai & Zhao, Yue & Gong, Yujia & Xiang, Lu & Zhu, Shenchen. (2024). Context Injection Attacks on Large Language Models. 10.48550/arXiv.2405.20234.

Woodside, T., & Toner, H. (2024, March 8). Multimodality, Tool Use, and Autonomous Agents: Large Language Models Explained, Part 3. Center for Security and Emerging Technology. https://cset.georgetown.edu/article/multimodality-tool-use-and-autonomous-agents/