AI Gets Answers Wrong. Courts, Regulators and Clients Are Starting to Notice. Here Is the Governance Response.

Stanford's 2026 AI Index found hallucination rates of 22% to 94% across top AI models. Courts are imposing sanctions on organisations whose AI produced wrong outputs. The AI Incident Database recorded 362 incidents in 2025, up 55% from 2024. Most enterprise AI governance frameworks approved systems at deployment and then stopped. Here is what the governance response to AI output accuracy risk actually requires.

By Mark Miller

In April 2026, Stanford University's Institute for Human-Centered Artificial Intelligence published its annual AI Index. Among the findings: hallucination rates across 26 top AI models range from 22% to 94% depending on the benchmark. GPT-4o's accuracy on one task-specific benchmark dropped from 98.2% to 64.4%. DeepSeek R1 fell from above 90% to 14.4% on certain evaluations.

Those figures are from controlled testing. The AI systems your organisation is running in production have, in all likelihood, never been evaluated to that standard. Most were approved at deployment based on a demonstration and a handful of test cases. After that, they were released into production with no systematic programme for monitoring whether their outputs are accurate.

That was a tolerable design choice when the consequences of getting something wrong were contained. It is a less tolerable design choice now that courts are imposing financial sanctions, regulators are conducting reviews and clients are presenting remediation invoices.

What the Incident Record Actually Shows

The AI Incident Database recorded 362 documented AI-related incidents in 2025, up from 233 in 2024. That is a 55% increase year on year, and the rate is accelerating as AI adoption deepens and organisations move from pilots into full production.

The pattern across those incidents is consistent. An AI system produced an output that was wrong. That output was relied upon. The cost of correction fell on the organisation that deployed the system, not the one that built it.

Air Canada's experience with its AI chatbot is the most widely cited example, and it remains instructive. The chatbot told a customer that a bereavement travel discount could be claimed retroactively. That policy did not exist. When Air Canada sought to disavow the chatbot's statement on the grounds that the bot was a "separate legal entity" responsible for its own representations, British Columbia's Civil Resolution Tribunal rejected that argument. Air Canada was held accountable for what its AI said. The liability attached to the deploying organisation.

In the legal profession, the pattern has been sharper and the consequences more public. US courts have imposed financial sanctions against attorneys who filed AI-generated briefs containing case citations that did not exist. The lawyers argued they had relied on the AI and had not independently verified the citations. Courts found that reliance on AI output without verification does not satisfy a practitioner's professional obligations. The first quarter of 2026 produced the highest quarterly total of such sanctions on record.

In Australian professional services, a widely reported incident involved a major consulting firm's client report containing fabricated citations and a non-existent court quotation. The firm returned fees and incurred substantial remediation costs. The reputational exposure was documented across the professional services press.

Two things connect every one of these cases. First, the deploying organisation was held accountable for the AI's output regardless of which vendor's model produced it. Second, the governance failure was the same in each: no systematic review of AI outputs before they left the organisation.

Why Most Enterprise Governance Frameworks Miss This

There is a structural reason for this gap, and it is not negligence. It is design.

Enterprise AI governance frameworks were built to answer one question: should we deploy this AI system? They include risk classification, use-case approval, bias assessment, privacy review and compliance mapping. Those are the right questions to ask before deployment.

What they do not ask is the question that matters after: is what this AI system is producing accurate enough, often enough, to meet our obligations? That is a different question. It requires different processes, a different cadence of oversight and different accountability.

What tends to happen instead is that post-deployment oversight is left either to the business unit using the system, which monitors business KPIs but not output accuracy, or to the technology team, which monitors system availability but not whether the content the system produces is factually correct. Governance teams see the use-case register entry showing the system was approved and assume the system is performing as intended.

The Stanford data removes the basis for that assumption. Hallucination rates of 22% to 94% across top models are not marginal edge cases. At any meaningful scale of deployment, a material number of outputs are wrong. The question is whether your organisation has any mechanism to catch them.
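To make the scale point concrete, here is a minimal arithmetic sketch. The monthly volume and the error rates are illustrative assumptions, not figures from the Stanford report, and benchmark hallucination rates do not translate directly into production error rates. For that reason the example includes an assumed rate far below the benchmark range.

```python
def expected_wrong_outputs(monthly_outputs: int, error_rate: float) -> int:
    """Expected number of wrong outputs per month at a given error rate."""
    if monthly_outputs < 0:
        raise ValueError("monthly_outputs must be non-negative")
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return round(monthly_outputs * error_rate)


# Hypothetical deployment: 10,000 AI-generated outputs per month.
MONTHLY_OUTPUTS = 10_000

# Illustrative rates only: 1% is an assumed production rate chosen to be
# optimistic; 22% is the low end of the benchmark range quoted above.
for rate in (0.01, 0.22):
    wrong = expected_wrong_outputs(MONTHLY_OUTPUTS, rate)
    print(f"error rate {rate:.0%}: about {wrong:,} wrong outputs per month")
```

Even under the optimistic 1% assumption, this hypothetical deployment produces about 100 wrong outputs a month that someone may rely on.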

The Regulatory Dimension Is Already Live

Several obligations that apply to Australian organisations, either now or from a fixed commencement date, carry implicit accuracy requirements for AI outputs.

The Privacy Act's automated decision-making transparency obligations, taking effect December 10, 2026, imply that decisions made significantly by computer programs affecting individuals must be based on accurate information. An AI system that generates inaccurate outputs and then drives decisions about people creates a compliance exposure under that obligation, and with the commencement date already set it is a near-term exposure, not a distant one.

ASIC's enforcement framework under the Corporations Act requires financial services entities to act "efficiently, honestly and fairly." For organisations using AI to generate advice, produce client communications or assist in decisions that affect clients' financial positions, that standard applies to AI outputs in the same way it applies to human outputs. ASIC has made this explicit.

The EU AI Act's Article 9, which covers risk management systems for high-risk AI, requires ongoing monitoring and evaluation of AI system performance including accuracy against defined objectives. For Australian organisations with EU-facing high-risk AI deployments, that obligation is already live.

None of these frameworks require AI to be perfect. All of them require organisations to have a demonstrable programme for managing the risk that it is not.

What Good Output Accuracy Governance Actually Looks Like

The governance response to AI hallucination risk is not to remove AI from the enterprise. It is to design oversight around the outputs, proportional to the consequences of getting them wrong.

Human review checkpoints for externally used outputs. For any AI output that is sent to a client, filed with a court, submitted to a regulator or used to make a decision affecting an individual, a human review step should be mandatory, not optional. The Air Canada case and the legal citation sanctions both involved situations where verification was left to individual discretion. Making it mandatory and documented is the governance change that creates a different outcome.
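One way to make the review step mandatory instead of discretionary is to enforce it at the point of release. The sketch below is a minimal illustration under stated assumptions: the `Destination` categories mirror the four cases named in the paragraph above, and every class, field and function name is hypothetical, not taken from any particular product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Destination(Enum):
    INTERNAL_DRAFT = "internal_draft"
    CLIENT = "client"
    COURT = "court"
    REGULATOR = "regulator"
    INDIVIDUAL_DECISION = "individual_decision"


# Destinations for which human review is mandatory (assumed policy).
REVIEW_REQUIRED = {
    Destination.CLIENT,
    Destination.COURT,
    Destination.REGULATOR,
    Destination.INDIVIDUAL_DECISION,
}


@dataclass
class AIOutput:
    output_id: str
    destination: Destination
    reviewed_by: Optional[str] = None  # name or ID of the human reviewer


class ReviewRequiredError(Exception):
    """Raised when an output is released without the mandatory review."""


def release(output: AIOutput, audit_log: list) -> None:
    """Release an output only if any mandatory review has been recorded."""
    if output.destination in REVIEW_REQUIRED and not output.reviewed_by:
        raise ReviewRequiredError(
            f"{output.output_id}: human review is mandatory for "
            f"{output.destination.value} outputs"
        )
    # Record every release so there is documentary evidence of oversight.
    audit_log.append(
        (output.output_id, output.destination.value, output.reviewed_by)
    )
```

The design choice here is that the check sits in the release path itself, so skipping review produces an error and a reviewed release produces a log entry, instead of relying on each individual to remember the step.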

Statistical output auditing for high-volume deployments. For AI systems operating at scale where individual review of every output is impractical, sampling and auditing provides the necessary oversight. A programme that reviews a representative sample of outputs against ground truth, tracks accuracy metrics over time and reports findings to governance leadership creates both a risk management control and a documentary record of oversight.
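A sampling audit of this kind can be sketched in a few lines. The example below assumes a simple random sample and a human reviewer who marks each sampled output correct or incorrect (the `is_correct` callable stands in for that judgement). It reports observed accuracy with a Wilson score interval, one common choice for proportions; the function names, the sample size and the 95% confidence level are illustrative assumptions.

```python
import math
import random
from typing import Callable, Sequence


def draw_sample(output_ids: Sequence[str], sample_size: int, seed: int) -> list:
    """Draw a simple random sample of outputs for manual review."""
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    return rng.sample(list(output_ids), min(sample_size, len(output_ids)))


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple:
    """Wilson score interval for an accuracy rate (z=1.96 is roughly 95%)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return (max(0.0, centre - half), min(1.0, centre + half))


def audit(
    output_ids: Sequence[str],
    is_correct: Callable[[str], bool],
    sample_size: int,
    seed: int,
) -> dict:
    """Review a sample against ground truth and summarise the result."""
    sample = draw_sample(output_ids, sample_size, seed)
    correct = sum(1 for oid in sample if is_correct(oid))
    low, high = wilson_interval(correct, len(sample))
    return {
        "sample_size": len(sample),
        "correct": correct,
        "accuracy": correct / len(sample),
        "ci_low": low,
        "ci_high": high,
    }
```

Running `audit` on a regular cadence and storing each returned dictionary gives governance leadership a trend line and a dated record of what was checked. The interval matters because a small sample can look accurate by chance: 90 correct out of 100 reviewed gives an interval of roughly 83% to 94%, not a firm 90%.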

Verification protocols embedded in professional workflows. For AI systems used in legal, advisory, research or analytical contexts, verification of citations, data sources and factual claims needs to be built into the workflow, not added as an optional step. The professional who signs the document is accountable for its accuracy regardless of which tool assisted in drafting it.
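As a rough illustration of verification built into the workflow, the sketch below blocks sign-off while any citation remains unconfirmed. It assumes the citations have already been extracted from the draft and that `exists_in_source` is a caller-supplied lookup against an authoritative source; both the lookup and the citation strings are placeholders, not real references or a real API.

```python
from typing import Callable, Iterable, List


class UnverifiedCitationError(Exception):
    """Raised when a draft still contains citations nobody has confirmed."""


def unverified_citations(
    citations: Iterable[str], exists_in_source: Callable[[str], bool]
) -> List[str]:
    """Return every citation the lookup could not confirm."""
    return [c for c in citations if not exists_in_source(c)]


def sign_off(
    citations: Iterable[str],
    exists_in_source: Callable[[str], bool],
    signer: str,
) -> str:
    """Allow sign-off only when every citation has been confirmed."""
    missing = unverified_citations(citations, exists_in_source)
    if missing:
        raise UnverifiedCitationError(f"cannot sign off, unverified: {missing}")
    return f"signed off by {signer}"


# Placeholder data standing in for a real authoritative database.
KNOWN = {"Placeholder v Example [2020] XX 1"}
draft = [
    "Placeholder v Example [2020] XX 1",
    "Invented v Nonexistent [2021] XX 2",
]
print(unverified_citations(draft, lambda c: c in KNOWN))
```

The point of the structure is that the signer cannot complete the workflow until the unverified list is empty, which matches the accountability described above.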

Clear accountability assignment. In most organisations, nobody owns the question of whether AI outputs are accurate on an ongoing basis after deployment. Assigning that accountability explicitly, whether to a use-case owner, a risk function or a quality assurance process, is the foundational step toward managing the risk.
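Accountability gaps are easy to surface once the use-case register records an owner for output accuracy. The following is a minimal sketch assuming a register held as simple records; the field names, the risk levels and the sample entries are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UseCase:
    name: str
    risk_level: str                       # assumed scheme: "high", "medium", "low"
    accuracy_owner: Optional[str] = None  # who answers for ongoing output accuracy


def without_accuracy_owner(register: List[UseCase]) -> List[str]:
    """List the use cases where nobody owns post-deployment accuracy."""
    return [uc.name for uc in register if not uc.accuracy_owner]


# Hypothetical register entries for illustration only.
register = [
    UseCase("client-report-drafting", "high", accuracy_owner="Head of Quality"),
    UseCase("customer-chatbot", "high"),
    UseCase("internal-search", "low"),
]
print(without_accuracy_owner(register))
```

A report like this, run against the existing register, turns "nobody owns it" from an assumption into a list that can be worked through, starting with the high-risk entries.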

What This Means for Your Organisation

Most AI governance programmes in Australian enterprises were designed for a world where the primary risk was deploying the wrong AI system. The governance controls at the gate were the main defence.

The cases accumulating around AI hallucination liability describe a different world: one where the risk is not the deployment decision but what the deployed system produces afterward, at scale, in contexts where being wrong has consequences.

The organisations that handle the next AI-related incident well will be those that can point to documented output accuracy protocols for high-risk use cases, evidence of ongoing human oversight for externally facing outputs, and a clear chain of accountability from AI output to responsible person. Those without that record face a considerably harder conversation with a regulator, a client or a court.

Building that capability is not a large undertaking relative to the AI governance infrastructure most organisations already have in place. It adds one more dimension: post-deployment oversight of output accuracy, sitting alongside the pre-deployment assessment. The question is not whether to add it. It is how quickly.

Key Takeaways

  • Stanford's 2026 AI Index found hallucination rates of 22% to 94% across 26 top models: at enterprise scale, a material number of AI outputs will be wrong
  • Courts in multiple jurisdictions are holding deploying organisations accountable for AI outputs regardless of which vendor's model produced them
  • The AI Incident Database recorded 362 AI-related incidents in 2025, up 55% from 2024: the incident rate is accelerating as deployment deepens
  • Most enterprise AI governance frameworks have no systematic post-deployment mechanism for monitoring output accuracy
  • Australia's Privacy Act ADM obligations, ASIC's enforcement framework and the EU AI Act's Article 9 all carry implicit accuracy requirements for AI outputs in scope

How Trusenta Can Help

AI Governance captures each AI deployment with the risk classification and output profile needed to determine what level of accuracy oversight is proportionate, making it possible to flag which use cases require mandatory human review and which can be managed through periodic statistical auditing.

Risk Management enables organisations to track AI output accuracy as an ongoing risk dimension with assigned accountability, treatment controls and the documentary record of oversight that regulatory review and client accountability require.

AI Governance Foundations establishes the governance infrastructure organisations need to build output accuracy protocols into their AI oversight programme from the outset, before the first post-deployment incident forces a reactive response rather than a designed one.

The Question Most Frameworks Do Not Ask

The cases accumulating around AI hallucination liability share a common element: the organisations involved had governance processes that stopped at deployment. What came out of the system afterward was assumed to be adequate.

That assumption is no longer tenable. Stanford's data confirms that AI systems produce wrong outputs at rates that matter at enterprise scale. The regulatory and legal environment confirms that deploying organisations bear the consequences. The governance framework that most enterprises are running was not designed to bridge that gap.

Closing it requires asking the question most frameworks do not: not "should we run this AI" but "is what this AI is producing accurate enough, often enough, to meet our obligations to the people and institutions that rely on it."

Written by

Mark Miller

Mark brings a rare blend of C-suite leadership and hands-on consulting experience to Trusenta. A former SVP of Services, SVP of Business Operations, Managing Director and CIO, he specialises in guiding organisations through AI strategy, governance and adoption, bridging ambition with practical execution. His focus is on helping clients embed AI responsibly, at scale and in service of real business outcomes.