Menu

Frontier AI’s safety failures

by | Dec 5, 2025 | AI Risk, AI Safety

The latest AI Safety Index from the Future of Life Institute (FIL) warns that while Frontier AI capabilities continue to advance the safety practices designed to govern and contain them are falling behind. The report paints a picture of an industry in which leading developers articulate high-level commitments to responsible AI but fall far short of the measurable safeguards, independent oversight, and risk-governance structures that would normally anchor safety-critical technologies.

The Index’s findings, based on evidence gathered through November 2025, suggest that the commercial environment around frontier AI remains structurally underprepared for both near-term harm and long-term controllability risks. Understanding these weaknesses is essential for leaders considering partnerships, procurement, or in-house development.

Safety maturity is not keeping pace with capability ambition

The central message of the Index is that the strongest performers are far from meeting expectations that are now being established through frameworks such as the EU AI Code of Practice, the G7 Hiroshima Process, and California’s SB 53. Most companies have improved disclosure and governance structures in some areas, yet their overall progress is incremental against a backdrop of accelerating capability ambitions.

Anthropic, OpenAI and Google DeepMind again occupy the top tier, with Anthropic leading every domain. Yet even these firms receive only a C+ overall, and all score poorly on existential safety, the domain that examines preparedness for catastrophic or loss-of-control risks. No company scored above a D in this category for the second consecutive edition

The report argues that this structural gap between capability and safety is widening.

“companies are doing poorly, and even the best are making questionable assumptions in their safety strategies”

Risk assessment progress is uneven and lacks credible independence

Risk assessment is one of the few areas in which the leading companies show relatively strong performance. Anthropic and OpenAI in particular demonstrate more rigorous internal assessment processes, with documented elicitation strategies and established bug bounty programmes. Google DeepMind has also increased transparency by completing the Index’s survey and publishing more about its internal evaluations. Z.ai stands out among the next tier for permitting external evaluators to publish results without censorship and for allowing risk assessments before internal deployment.

However, even among the leaders, the report identifies critical gaps that have direct relevance for enterprise buyers. None of the companies have conducted Human Uplift Trials, a method for understanding how a model may amplify a user’s ability to cause harm. More importantly, none have secured genuinely independent reviews of their safety evaluations. External reviewers typically face restrictions on what they may publish, and several companies compensate the evaluators they select, limiting independence from the outset.

Risk-assessment scope also remains narrow. Climate-related and environmental risks are not included, despite ongoing controversy about the impact of data centres. Companies also have not published quantitative estimates of the likelihood that the AGI or superintelligence systems they intend to build could become misaligned or escape control. Even in high-attention domains such as biorisk, the report finds an over-reliance on task-specific tests that do little to uncover latent dangerous capabilities or performance under adversarial conditions.

For procurement, this means that internal risk assessments shared by vendors may appear comprehensive while omitting key categories that materially affect organisational exposure.

Current-harm performance is inconsistent and often weak

When it comes to demonstrated safety outcomes the results are sobering. Most models struggle with trustworthiness, truthfulness, fairness and robustness to adversarial prompts. While none failed the benchmarks outright, performance across HELM Safety, HELM AIR-Bench, TrustLLM and CAIS safety evaluations is consistently weaker than the maturity of the technology might suggest. Anthropic again leads, and xAI performs the worst, particularly on HELM AIR.

The FIL notes that poor benchmark performance may understate real-world risk. Safety benchmarks are narrow and can be gamed; therefore, a model that performs poorly in controlled tests is likely to perform worse in deployment. This presents a concern for enterprises deploying models into complex environments where prompts, user behaviour and system integrations all introduce additional variables.

A noteworthy trend is the industry-wide shift towards training on user interaction data by default. Reviewers express concern that this exposes organisations to privacy and confidentiality risks, as sensitive information can later be reproduced by models. Anthropic’s decision to adopt this practice in August 2025 reduced its score in this domain and removes what had been a meaningful differentiator in privacy-preserving design.

Safety frameworks are often published, but still lack rigour

A growing number of companies now publish safety frameworks describing thresholds, risk categories and procedures. Anthropic, Google DeepMind, OpenAI, Meta and xAI all have structured frameworks of varying depth. Google DeepMind’s latest iteration is recognised for expanding its definition of harmful manipulation risks and for establishing early-warning evaluations. Meta’s framework stands out for including outcome-based thresholds, although the trigger for mitigation appears too high and decision-making authority remains opaque. xAI provides quantitative thresholds yet these do not clearly link to deployment decisions.

The most meaningful limitations across frameworks are that thresholds are typically qualitative, loosely defined, or disconnected from measurable risk. As a result, it is unclear when safeguards would activate or what specific actions they would trigger. Few companies describe implementation details for deployment controls, incident response or security measures, and none include external oversight mechanisms that would independently validate performance or enforce mitigation.

Three companies – Alibaba Cloud, DeepSeek and Z.ai – lack published frameworks, earning failing grades. Reviewers note that Z.ai despite the absence of a public framework is investing in its safety team and discloses system prompts to regulators under certain conditions, suggesting there are foundations for future progress.

Existential safety is the industry’s structural failure

The Index makes clear that existential safety is the weakest domain across all companies including those that publicly acknowledge the stakes. The highest score recorded is a D, and several companies receive an F. The gap between aspiration and readiness is wide. Companies are accelerating AGI and superintelligence development while lacking credible strategies for preventing catastrophic misuse or loss of control – an inconsistency the report calls a ‘foundational hypocrisy’ in the sector.

Anthropic, Google DeepMind and OpenAI publish more alignment and control research than others and operate at a higher level of conceptual readiness. Yet their approaches are still grounded in assumptions that reviewers find optimistic. For example, Anthropic’s trigger requiring a model to automate the work of junior researchers before certain mitigations activate may rely on overly confident beliefs about our ability to detect dangerous capabilities in advance. Reviewers warn that if elicitation techniques are insufficient, these thresholds may activate far later than intended.

Xai, Meta, DeepSeek and Alibaba Cloud provide minimal evidence of meaningful investment in extreme-risk mitigation. Z.ai is developing an existential-risk plan, and its willingness to defer to external authorities during emergencies is encouraging, but the overall picture remains inadequate for the scale of risk these companies may create.

Governance and information-sharing practices reflect widening divergence

Governance and accountability show the clearest divergence between companies. Anthropic and OpenAI, both public benefit corporations, lead the domain. Anthropic discloses more information about its whistleblowing processes and intends to publish a full policy. OpenAI is the only company with a public whistleblowing policy and retains a governance structure in which the nonprofit parent maintains significant oversight of the for-profit arm. Google DeepMind provides strong disclosures in its survey response, though it lacks a public whistleblowing policy.

Other companies lag behind. Meta, DeepSeek and Alibaba Cloud provide either limited or no information on whistleblowing protections. Z.ai discloses none. Yet the report highlights that Chinese regulatory requirements, such as mandatory content labelling and incident reporting, sometimes offer stronger baseline accountability than voluntary standards in the US and UK.

Information sharing follows a similar pattern. Anthropic and OpenAI are the most transparent, regularly releasing detailed safety documentation and participating in voluntary international commitments. Meta performs poorest, citing dismissive public messaging about existential risk and lobbying against key safety regulations. Google DeepMind and xAI show a mixture of transparency and resistance to regulatory initiatives. DeepSeek and Alibaba Cloud provide relatively little public communication but contribute to national safety standards in China.

Governance and transparency are useful indicators of whether a vendor is structurally oriented toward safe operation over time. They also matter for due diligence, companies with clear escalation pathways, whistleblower protections and transparent reporting are more likely to surface problems before they escalate.

What this means for organisations preparing for AI deployment

The findings of the Index carry direct implications for organisations intending to adopt frontier AI technologies. Although the Index doesn’t offer prescriptive guidance for buyers, its analysis reflects a set of practical considerations that organisations can’t ignore. The safest-seeming vendors still operate with limited oversight, insufficient testing depth and incomplete safety thresholds, while the lower tier lacks basic frameworks.

One practical takeaway is the need for robust, domain-specific internal AI governance. Enterprises deploying high-capability models should not rely solely on vendor-provided assurances, regardless of the provider’s reputation. Independent evaluation and assurance – technical, ethical and operational – should become a routine requirement before integrating frontier systems into business-critical processes.

A second takeaway is the importance of aligning deployment decisions with an organisation’s own risk tolerance – not the vendor’s. Since existing frameworks often describe thresholds that are neither measurable nor enforceable, companies may find themselves assuming risks the developer has not fully assessed. Asking vendors to provide detail on evaluation scope, governance triggers and incident-response mechanisms can help clarify these gaps.

Finally, the Index indicates the value of building long-term relationships with those that demonstrate structural commitments to safety, rather than episodic or market-driven disclosures. Transparency, accountability mechanisms and adherence to voluntary or regulatory standards are not only compliance features – they’re are signals of a development culture that is more likely to support safe adoption.

The Frontier AI ecosystem is clearly becoming more complex and more capable but its safety infrastructure remains immature. The Winter 2025 AI Safety Index underscores this imbalance starkly – even the companies most invested in safety research do not yet meet emerging governance expectations, and the rest of the sector is significantly further behind. This is a call for engaging cautiously – adoption of frontier AI can unlock transformational value but only when it comes with disciplined governance, rigorous evaluation and assurance and a clear-eyed understanding of the limitations of current industry safety practices.