How to build an AI agent incident response plan for your SME
When an AI agent goes wrong in production, the first thirty minutes decide how bad the outcome gets. This guide gives you a four-part response structure, a kill switch specification, the logging minimums, and a clear path through customer communication, regulatory notification, and insurance documentation.
Key takeaways
- An AI agent incident is not the same as a software bug. The liability exposure, the regulatory obligations, and the customer impact are different in kind, not just in degree.
- Every production agent needs a kill switch operable by an on-call engineer in under five minutes, with no deployment required.
- Moffatt v. Air Canada (2024) confirmed that organisations cannot disclaim liability for outputs their AI agents produce. Your agent speaks for your business.
- GDPR Article 33 requires supervisory authority notification within 72 hours when a personal data breach is involved. EU AI Act Article 26(5) adds a parallel obligation for deployers of high-risk systems.
- Insurers including Munich Re (through its aiSure product) and AIUC require specific audit telemetry and incident logs before settling claims. If the documentation does not exist at incident time, it cannot be reconstructed afterwards.
What an AI agent incident actually is
A software bug is a defect in code that produces an incorrect output. The fix is a patch, and the liability is usually covered by the software vendor's terms. An AI agent incident is different in three important ways.
First, the output came from your deployed system making decisions, not from a coding error in a third-party library. Under Moffatt v. Air Canada (2024), the British Columbia Civil Resolution Tribunal was explicit: Air Canada could not disclaim responsibility for misinformation provided by its chatbot by calling the chatbot a separate legal entity. The tribunal held that Air Canada was bound by the fare information its agent gave to the customer, and ordered compensation. The deployer is responsible for what the agent says and does.
Second, the failure mode is often a plausible-sounding wrong answer rather than an obvious crash. A hallucinated policy, a wrong price, or an invented commitment is harder to catch than a 500 error, and can run for hours or days before anyone notices. By that point, multiple customers may have relied on the output.
Third, the regulatory exposure is distinct. When an AI agent failure also involves personal data, you are looking at a potential GDPR Article 33 notification. When the failure involves a high-risk AI system under the EU AI Act, Article 26(5) adds a parallel notification obligation to the provider and potentially the market surveillance authority. Neither of those applies to a standard software bug.
The EU AI Act defines a serious incident in Article 3(49) as one causing death or serious harm to a person's health, serious and irreversible disruption of critical infrastructure, infringement of Union-law obligations intended to protect fundamental rights, or serious harm to property or the environment. For most SME agents, that threshold is unlikely to be met. But the incident response discipline that applies to serious incidents is worth building even when the stakes are lower, because the documentation it produces is exactly what insurers require at claim time.
The four-part incident response structure
Good incident response for AI agents follows the same four phases used in cybersecurity: detect, contain, recover, report. The key difference is that an AI agent incident must also produce a fifth deliverable: the insurance file.
Detect
Detection depends entirely on what you have set up before the incident. The minimum viable monitoring stack for a customer-facing agent includes automated output flagging (pattern matching on known failure modes, profanity filters, price thresholds), a human review cadence for a sample of outputs, and a clear channel for customer-reported issues that routes to the named agent owner rather than a generic support queue.
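The automated output flagging described above can be sketched as a small rule engine. This is a minimal illustration, not a complete rule set: the patterns, rule names, and price threshold are assumptions you would tune to your own agent's known failure modes.

```python
import re

# Illustrative flagging rules for a customer-facing agent. The patterns and
# threshold below are assumptions, not a recommended production rule set.
POLICY_CLAIM = re.compile(r"\b(full refund|bereavement fare|guaranteed)\b", re.IGNORECASE)
PRICE_PATTERN = re.compile(r"[€£$]\s?(\d+(?:\.\d{2})?)")
MAX_QUOTED_PRICE = 500.00  # any quoted price above this goes to human review

def flag_output(text: str) -> list[str]:
    """Return the names of the rules an agent output trips."""
    flags = []
    if POLICY_CLAIM.search(text):
        flags.append("policy_claim")          # agent is asserting a policy or commitment
    for match in PRICE_PATTERN.finditer(text):
        if float(match.group(1)) > MAX_QUOTED_PRICE:
            flags.append("price_threshold")   # quoted price exceeds the set limit
            break
    return flags
```

Any non-empty result routes the conversation to the human review queue and, if the rule is severe enough, opens an incident record automatically.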
The moment you have a credible report that the agent produced a wrong output affecting a customer, open an incident record. Write down the time, the source of the report, and the initial description of what happened. Do not wait until you have confirmed the full scope. The clock for regulatory notification starts when you become aware, not when you finish the investigation.
Contain
Containment means stopping the agent from producing further harmful outputs while you understand what happened. This is where the kill switch matters. A kill switch is a technical mechanism that pauses the agent completely, without requiring a code deployment, without needing the model provider's dashboard, and without requiring the approval of a senior engineer who may be unavailable at 2 AM on a Sunday. Any on-call engineer should be able to use it in under five minutes.
For most SME deployments, the kill switch is a feature flag or an environment variable checked at the point where requests reach the agent. If the flag is set to inactive, requests return a static fallback message and are logged for manual review. The flag must be changeable without a deployment. Store it in your configuration management system, not in code.
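The flag check at the request boundary can be sketched as follows. The flag name, fallback text, and use of an environment variable are illustrative assumptions; in production the flag would live in your configuration management system, as noted above.

```python
import os

# Minimal sketch of a kill-switch check at the request boundary.
# AGENT_KILL_SWITCH and the fallback text are assumed names for illustration.
FALLBACK_MESSAGE = (
    "Our assistant is temporarily unavailable. "
    "Please contact support@example.com and we will respond shortly."
)

def agent_enabled() -> bool:
    # Re-read on every request so a flag change takes effect immediately,
    # with no deployment.
    return os.environ.get("AGENT_KILL_SWITCH", "inactive") != "active"

def handle_request(user_message: str, call_agent) -> dict:
    if not agent_enabled():
        # Suppressed requests return the static fallback and are kept
        # for manual review.
        return {"reply": FALLBACK_MESSAGE, "suppressed": True, "input": user_message}
    return {"reply": call_agent(user_message), "suppressed": False, "input": user_message}
```

The important property is that nothing in this path requires a deployment: flipping the flag value is the entire containment action.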
A partial containment option is also worth building: a mode where the agent continues to run but routes every output to a human reviewer before it reaches the customer. This is slower than a full kill, but it allows you to keep a degraded service running during lower-severity incidents where a complete shutdown would cause more disruption than the original failure.
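The review-mode routing can be sketched with a simple held-output queue. Function names and the holding message are illustrative assumptions; a real deployment would persist the queue and notify the reviewer.

```python
import queue

# Sketch of the degraded "review mode": outputs are generated but held for
# human approval before release. All names here are illustrative.
review_queue: "queue.Queue[tuple[str, str]]" = queue.Queue()

def handle_in_review_mode(user_message: str, call_agent) -> str:
    draft = call_agent(user_message)
    review_queue.put((user_message, draft))   # held until a human approves it
    return "Thanks — a team member is reviewing your request and will reply shortly."

def approve_next() -> str:
    # Human reviewer releases the oldest held draft to the customer.
    _message, draft = review_queue.get_nowait()
    return draft
```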
Recover
Recovery is the work of understanding root cause, building a fix, testing it, and restoring normal service. For AI agents, root cause analysis often requires reviewing the conversation logs in detail to understand exactly what inputs triggered the wrong output and whether the failure is repeatable. This is the phase where your logging quality determines how long recovery takes.
Before you restore the agent to production after a serious incident, document the change you made and why it addresses the root cause. If you cannot explain why the fix works, the fix may not hold. A restart without a root cause explanation is a repeat incident waiting to happen.
Report
Reporting has three audiences: your customers, your regulator, and your insurer. Each one needs something different, and the order matters. Customers come first, within hours of containment. Regulators come next, within the statutory window. Insurers come last, once you have the documentation in order. We cover all three in the sections below.
The kill switch: what it is and why every production agent needs one
A kill switch is not a feature. It is a safety constraint built into the architecture before the agent goes to production. The test of whether you have a real kill switch is simple: can any on-call engineer pause the agent completely, right now, without waking anyone up, without touching the model provider's system, and without a deployment? If the answer is no, you do not have a kill switch. You have a deployment-dependent off button that will not be available when you need it.
The specification for a minimum viable kill switch is: a configuration flag checked on every request, changeable without a code deployment, accessible to any on-call engineer via a read-write permission in your secrets or config management system, documented in the runbook that on-call engineers read, and tested at least quarterly to confirm it actually works. Testing means setting the flag, sending a request, and confirming the fallback message is returned. Most teams that say they have a kill switch have never tested it.
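The quarterly test can itself be automated as a drill: activate the switch, send a probe, confirm the fallback, restore service. The function below is a sketch under the assumption that you can inject a flag setter and a request sender for your own stack.

```python
# Sketch of the quarterly kill-switch drill as an automated check.
# set_flag and send_request are assumed hooks into your own deployment.
def run_kill_switch_drill(set_flag, send_request, fallback_text: str) -> dict:
    """Activate the switch, send a probe request, confirm the fallback."""
    set_flag("active")
    reply = send_request("kill-switch drill probe")
    passed = reply == fallback_text
    set_flag("inactive")                      # always restore normal service
    return {"passed": passed, "observed": reply}
```

Record each drill's result with a timestamp; a log of passed drills is itself evidence for the insurance file.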
A more complete implementation adds alerting: if the kill switch is activated, a notification goes to the named agent owner and the on-call lead within five minutes, with a timestamp and the engineer who activated it. This creates the audit trail that your insurer will want to see.
Logging for incident response: what to keep and for how long
Log retention is the part of incident response that operators consistently get wrong, usually by keeping too little. The standard guidance for customer-facing AI agents is to retain full conversation logs, tool call records, and system prompt versions for a minimum of two years. That covers the statute of limitations for most consumer protection claims in EU jurisdictions, the GDPR data subject access request window, and the evidence period most insurers require.
What a good incident log contains: a unique session or conversation ID, timestamps on every message and tool call (including model response latency), the system prompt and version in effect at the time, the full input and output for every turn, any tool calls made and their responses, and any guardrail or filter events triggered. If you are not storing the system prompt version alongside the conversation, you will not be able to reconstruct what the agent was instructed to do at the time of the incident. Prompt changes are often the root cause of unexpected behaviour, and without version logging they are invisible.
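One record per conversation turn, carrying the fields listed above, might look like the sketch below. The field names are illustrative, not a standard schema; the point is that the system prompt version travels with every turn.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

# Sketch of a per-turn log record covering the fields listed above.
# Field names are illustrative assumptions, not an insurer-mandated schema.
@dataclass
class TurnRecord:
    session_id: str
    timestamp: str                 # ISO 8601, UTC
    system_prompt_version: str     # ties the turn to the instructions in force
    user_input: str
    agent_output: str
    tool_calls: list = field(default_factory=list)       # name, args, response
    guardrail_events: list = field(default_factory=list)  # filters triggered

def make_record(session_id, prompt_version, user_input, agent_output) -> dict:
    return asdict(TurnRecord(
        session_id=session_id,
        timestamp=datetime.now(timezone.utc).isoformat(),
        system_prompt_version=prompt_version,
        user_input=user_input,
        agent_output=agent_output,
    ))
```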
Store logs in an append-only system. Logs that can be modified after the fact are not useful for defence and are not accepted by insurers as audit telemetry. Append-only cloud storage with immutable object locks is the standard approach. The cost is low. The alternative, needing logs that do not exist during a live claim, is not.
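With S3 as the example store, an immutable write uses the Object Lock parameters of `put_object`. The bucket name and key layout below are assumptions, and the bucket must be created with Object Lock enabled for the retention arguments to take effect.

```python
from datetime import datetime, timedelta, timezone
import json

# Two-year minimum retention, per the guidance above.
RETENTION = timedelta(days=730)

def locked_put_kwargs(bucket: str, key: str, record: dict) -> dict:
    """Build the arguments for an immutable S3 write (boto3 s3.put_object)."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": json.dumps(record).encode(),
        "ObjectLockMode": "COMPLIANCE",       # retention cannot be shortened or removed
        "ObjectLockRetainUntilDate": datetime.now(timezone.utc) + RETENTION,
    }

# Usage (requires boto3 and AWS credentials; bucket/key names are assumptions):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_object(**locked_put_kwargs("agent-logs", "2024/06/sess-42/turn-1.json", rec))
```

COMPLIANCE mode, unlike GOVERNANCE mode, cannot be overridden even by the account administrator, which is the property that makes the logs acceptable as audit telemetry.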
Customer communication after an AI agent mistake
The Air Canada case is the clearest precedent for what happens when an organisation communicates badly after an AI agent mistake. Air Canada's defence relied in part on its website disclaimer stating that the chatbot's information should not be relied upon. The tribunal rejected this, noting that the chatbot itself had not directed the customer to the correct policy page and had instead provided inaccurate information directly. The disclaimer did not protect the business. The quality and speed of communication after the incident was then used as evidence in the customer's favour.
The principle for SME operators is straightforward: treat an AI agent mistake as you would treat a mistake by a member of staff. Acknowledge it promptly, correct the record, and make the customer whole where they have suffered a loss. A template for the initial customer message should be prepared before any incident occurs, because writing one under pressure, with a frustrated customer on the phone, is where organisations create additional liability.
A good template covers: acknowledgement that the agent provided incorrect information, a clear statement of what the correct information is, the action the business is taking to address any loss the customer suffered, and a contact point for follow-up. The message should be sent within two hours of containment for customer-facing incidents. It should not include any admission of fault on specific legal grounds, speculate about cause, or offer compensation before you have assessed the loss. That language belongs in a follow-up message drafted with legal input.
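The four elements above can be held as a pre-written template. The placeholder names and wording below are illustrative only; pair the final text with legal review before any incident occurs.

```python
from string import Template

# Illustrative first-contact template covering the four elements above.
# Placeholder names are assumptions; final wording needs legal review.
INITIAL_MESSAGE = Template(
    "Dear $customer_name,\n\n"
    "Our automated assistant gave you incorrect information about $topic. "
    "The correct information is: $correct_information.\n\n"
    "We are reviewing any impact this may have had on you and will follow up "
    "by $follow_up_date. You can reach us directly at $contact_point.\n"
)

def render_initial_message(**fields) -> str:
    # Template.substitute raises KeyError if any placeholder is missing,
    # which prevents a half-filled message going out under pressure.
    return INITIAL_MESSAGE.substitute(**fields)
```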
Document every customer communication that goes out during and after an incident, with timestamps. This forms part of the insurance claim file.
Regulatory notification obligations
Two regulatory frameworks create notification obligations that may apply to SME operators after an AI agent incident.
GDPR Article 33 requires that a personal data breach be notified to the supervisory authority within 72 hours of the controller becoming aware of it. An AI agent incident that involves the agent accessing, disclosing, or mishandling personal data is likely to qualify as a personal data breach. The 72-hour clock starts when you become aware of the breach, which in practice should be the moment your incident record is opened, not when the investigation concludes. If you miss the window, you can still notify late and explain the delay, but a missed notification is an aggravating factor in any subsequent regulatory action.
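The deadline arithmetic is simple but worth automating against the incident record's opening timestamp, so nobody computes it under pressure. The example timestamp below is illustrative.

```python
from datetime import datetime, timedelta, timezone

# Article 33 window: 72 hours from the moment of awareness, which should be
# the incident record's opening time.
def gdpr_notification_deadline(awareness: datetime) -> datetime:
    return awareness + timedelta(hours=72)

# Illustrative example: record opened at 14:30 UTC on 7 June 2024.
opened = datetime(2024, 6, 7, 14, 30, tzinfo=timezone.utc)
deadline = gdpr_notification_deadline(opened)
# deadline falls at 14:30 UTC three days later, on 2024-06-10
```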
EU AI Act Article 26(5) requires deployers of high-risk AI systems to report serious incidents to the provider and, where applicable, to market surveillance authorities without undue delay. The definition of serious incident in Article 3(49) covers death or serious harm to a person's health, serious and irreversible disruption of critical infrastructure, infringement of Union-law obligations protecting fundamental rights, or serious harm to property or the environment. Most SME AI agents do not fall into the high-risk category under the Act, and a hallucinated refund policy is unlikely to meet the serious incident threshold. But if you are deploying an agent in a high-risk category (credit, employment, education, essential services), the notification obligation applies and you should have a legal review of your specific deployment before go-live, not after an incident.
For non-high-risk deployments, the same incident response discipline is best practice even without a regulatory mandate. The documentation you produce will be required by your insurer regardless of whether a regulator asks for it. See our Why It Matters page for the full regulatory context and timeline.
Preparing the insurance claim: what documents you will need
Dedicated AI agent liability policies are being built by specialty carriers including Munich Re aiSure and AIUC (AI Underwriting Consortium). Both require specific documentation before settling claims. Operators who do not have this documentation at incident time cannot reconstruct it afterwards, and claims face delays or partial settlement as a result.
The documentation set that a well-prepared claim file contains is: the scope document for the agent (version and effective date), the incident record (time opened, reported cause, detection source), the full conversation or action log covering the period of the incident, the system prompt version in effect at the time, the kill switch activation record (timestamp and activating engineer), the root cause analysis, a record of every customer communication sent after the incident with timestamps, and the regulatory notification record if applicable.
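A simple completeness check over that document set catches gaps before the claim is filed. The keys below are illustrative labels for the documents listed above, not insurer-mandated field names.

```python
# Sketch of a pre-claim completeness check over the document set listed above.
# Keys are illustrative labels, not insurer-mandated field names.
REQUIRED_CLAIM_DOCS = [
    "scope_document",
    "incident_record",
    "conversation_log",
    "system_prompt_version",
    "kill_switch_activation_record",
    "root_cause_analysis",
    "customer_communications",
    "regulatory_notification_record",   # "not applicable" is itself a valid entry
]

def missing_claim_docs(claim_file: dict) -> list[str]:
    """Return the documents still absent from the claim file."""
    return [doc for doc in REQUIRED_CLAIM_DOCS if not claim_file.get(doc)]
```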
If your insurer's policy predates dedicated AI coverage, the same documentation set is what you will need to establish that the incident falls within (or outside) the existing policy's scope. A written request to your broker confirming how each policy responds to AI agent claims should be sent before you need to make one. The process for doing that is set out on our Get Covered page.
The certification pathway at agentcertified.eu is designed to produce exactly this documentation set as a byproduct of the certification process. An operator who completes certification has, at the same time, built the evidentiary foundation for a well-documented insurance claim. That is by design, because the certification methodology was built in consultation with the reinsurance market.
Mata v. Avianca (2023) is instructive here in a different way. In that case, attorneys used ChatGPT to research case law and cited six non-existent cases in court filings. Judge Castel of the US District Court for the Southern District of New York imposed sanctions on the attorneys involved. The case established that professionals cannot outsource due diligence to AI tools and then disclaim responsibility for the outputs. The parallel for operators is that the decision to deploy an agent, and the governance around it, is not something you can outsource to the vendor. The documentation of that governance is the operator's responsibility.
Frequently asked questions
What counts as an AI agent incident for regulatory purposes?
Under EU AI Act Article 3(49), a serious incident involves death or serious harm to a person's health, serious and irreversible disruption of critical infrastructure, infringement of Union-law obligations protecting fundamental rights, or serious harm to property or the environment. For non-high-risk agents there is no statutory definition, but any output causing customer loss, a contractual dispute, or a GDPR-notifiable data breach should be treated as an incident and documented accordingly.
How quickly must I notify the regulator after an AI agent incident?
If the incident involves a personal data breach, GDPR Article 33 requires supervisory authority notification within 72 hours of becoming aware. For high-risk AI systems under EU AI Act Article 26(5), deployers must notify the provider and, where applicable, the market surveillance authority without undue delay. For non-high-risk deployments, regulatory notification may not be mandatory, but your incident log should record the decision and the reasoning either way.
Can I be held liable for what my AI agent says to a customer?
Yes. Moffatt v. Air Canada (2024) confirmed that an organisation is bound by misinformation provided by its chatbot and cannot disclaim responsibility by calling the agent a separate entity. Statements your agent makes carry the same weight as statements from a human employee.
What documentation do insurers require before settling an AI agent claim?
Munich Re aiSure and AIUC require audit telemetry and incident logs before settling claims. Expect requests for the full conversation or action log, the scope document with version and date, evidence of a named owner, the incident timeline, and records of customer communication sent after the incident. Documentation that does not exist at claim time cannot be reconstructed.
Does every AI agent need a kill switch?
Every production agent that can take external actions, communicate with customers, or modify records needs a kill switch. It should be operable by any on-call engineer without a deployment or senior approval. Test it at least quarterly. An agent without a tested kill switch is not ready for production.