Notes from the Crossings
Post-Quantum Security × Hardware × Physical-World Care

The counterfactual accountability problem: when the alternative outcome is unobservable

Holding an AI agent accountable for an outcome requires establishing that the agent's action — or inaction — caused it. Causation requires a counterfactual: what would have happened otherwise? That counterfactual is unobservable. This is not a gap in evidence. It is a structural gap in the accountability framework itself.

Asaptic Labs · 2026-06-04 · 6 min read

Accountability in human institutions is built on a causal premise: the party held responsible must have caused the harm, or must have failed to prevent it when they had the duty and capacity to do so. Negligence law, professional licensing, organizational governance — all require some version of the same question: did the action, or the failure to act, produce the outcome in a way that would not have occurred otherwise? That "otherwise" is the counterfactual. And in most accountability proceedings involving AI agents, the counterfactual is structurally unobservable.

This is not a new problem in philosophy. But it becomes a practical engineering and governance problem the moment AI agents operate in domains where accountability is load-bearing, that is, where the question of who or what is responsible for an outcome has legal, clinical, or security consequences. When those agents are deployed at the three crossings this series covers (post-quantum security, hardware, and physical-world care), the counterfactual problem takes distinct forms that existing frameworks were not designed to handle.

Why causation requires a counterfactual

Consider the simplest case: an AI monitoring agent fails to flag a deteriorating patient. The patient deteriorates. Was the agent responsible? To answer that question with any rigor, you need to know what would have happened if the agent had flagged the situation. Would a caregiver have responded in time? Would they have intervened effectively? Would the patient have recovered? None of these are observable. The world you need to compare against — the one where the agent acted correctly — did not happen.

The same structure appears in the opposite direction. An AI agent flags an anomaly. A caregiver responds. The patient stabilizes. Was the agent responsible for the good outcome? Again, you cannot know without observing the counterfactual: what would have happened had the agent not flagged anything? Perhaps the caregiver would have noticed the anomaly independently. Perhaps the stabilization was unrelated to the intervention. The agent gets credit or blame for an outcome that cannot be causally attributed to it with the precision that accountability requires.
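Both examples are instances of what the causal-inference literature calls the fundamental problem of causal inference. The block below uses the standard potential-outcomes notation; the symbols are introduced here for illustration and are not used elsewhere in this essay. For patient i, A_i is 1 if the agent flagged and 0 otherwise, and Y_i(1) and Y_i(0) are the outcomes that would follow each choice.

```latex
\begin{aligned}
\tau_i &= Y_i(1) - Y_i(0)
  && \text{individual causal effect of flagging for patient } i \\
Y_i^{\mathrm{obs}} &= A_i\,Y_i(1) + (1 - A_i)\,Y_i(0)
  && \text{the only outcome ever recorded}
\end{aligned}
```

Whichever value A_i takes, one term of τ_i is missing, so the individual effect cannot be computed from data alone. Averages over many patients can sometimes be estimated under additional assumptions. That is why population-level analysis recovers something that individual review cannot.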

Human accountability frameworks have a longstanding workaround for this: the standard of practice. Rather than asking whether the professional's action caused the outcome, accountability proceedings ask whether the professional's action met the standard that a reasonable professional in the same circumstances would have followed. This substitutes a procedural test for a causal one. It sidesteps the counterfactual by redirecting the question from "did it cause this?" to "did it do what it should have done?"

AI agents do not yet have established standards of practice for most of their deployment domains. When accountability proceedings reach for a causal argument instead, they encounter the counterfactual wall. What emerges is not accountability. It is the appearance of accountability over an unresolvable evidentiary gap.

At the care crossing

The counterfactual problem is sharpest in care settings. Outcomes there are clinically complex, confounded by the patient's underlying condition, and frequently irreversible. A care agent that failed to escalate a patient's deterioration cannot be assessed against the counterfactual outcome (what would have happened had escalation occurred), because that timeline does not exist. Retrospective clinical review can produce expert opinion about whether escalation would have helped. But expert opinion about an unobserved counterfactual is a weak evidentiary foundation for accountability.

The deeper structural issue is that care settings accumulate many such moments. An agent deployed across many patients over many months generates a statistical record of decisions and outcomes. In aggregate, that record can support comparative analysis: do agents running configuration A produce different outcome distributions from agents running configuration B, after controlling for patient acuity? Population-level counterfactuals become estimable where individual ones are not. But accountability in care settings typically concerns specific decisions affecting specific patients, not aggregate performance. Population data does not resolve individual causal attribution, and courts and licensing boards do not typically reason in population-statistical terms.
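The comparative analysis described above can be sketched as an acuity-standardized difference in adverse-outcome rates between two configurations. This is a minimal illustration, assuming a hypothetical `CaseRecord` with a configuration label, a discrete acuity band, and a binary adverse outcome. Within each band the rates are compared, and each band is weighted by its case count. It adjusts only for the acuity bands recorded.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical per-case record; field names are illustrative."""
    config: str    # agent configuration label, e.g. "A" or "B"
    acuity: str    # discrete acuity band, e.g. "low" or "high"
    adverse: bool  # whether the adverse outcome occurred


def stratified_rate_difference(records, config_a="A", config_b="B"):
    """Acuity-standardized difference in adverse-outcome rate (A minus B).

    Within each acuity band that contains both configurations, compute the
    rate difference, then average across bands weighted by band size.
    Returns None if no band contains both configurations.
    """
    # counts[band][config] = [adverse_count, total_count]
    counts = defaultdict(lambda: {config_a: [0, 0], config_b: [0, 0]})
    for r in records:
        if r.config in (config_a, config_b):
            cell = counts[r.acuity][r.config]
            cell[0] += int(r.adverse)
            cell[1] += 1

    weighted_sum = 0.0
    total_weight = 0
    for cells in counts.values():
        adverse_a, n_a = cells[config_a]
        adverse_b, n_b = cells[config_b]
        if n_a == 0 or n_b == 0:
            continue  # no within-band comparison is possible
        diff = adverse_a / n_a - adverse_b / n_b
        weight = n_a + n_b
        weighted_sum += diff * weight
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None
```

A nonzero result describes configurations, not patients. It says nothing about whether a particular missed escalation changed a particular outcome, which is exactly the gap between population data and individual attribution.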

At the hardware crossing

In fleet-scale hardware deployment, the counterfactual problem takes a different form. An AI agent monitoring device attestations fails to escalate an anomaly that is later identified as an early signal of a coordinated firmware compromise. Hundreds of devices are eventually affected. Was the agent responsible?

The causal question requires knowing: had the agent escalated the anomaly, would security operations have responded? Would the response have been in time? Would it have been effective against this specific threat vector? Each link in that causal chain is counterfactual. In practice, the anomaly was not escalated, the response did not happen, and the impact cannot be compared against the alternative timeline.
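The chain of questions above can be written as a product of conditional probabilities. This is a simplifying sketch, assuming harm is averted only if every link succeeds. The event labels are introduced here for illustration:

- E: the agent escalates.
- R: security operations respond.
- T: the response is in time.
- V: the response is effective against this threat vector.

```latex
P(\text{harm averted} \mid E)
  = P(R \mid E)\,\cdot\,P(T \mid R, E)\,\cdot\,P(V \mid T, R, E)
```

Because E never occurred, none of the three factors was observed for this incident. Each can only be estimated from other incidents or from expert judgment. The product is also bounded above by its smallest factor, so uncertainty about any single link is enough to leave the causal question open.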

What hardware security teams actually do in these situations is reconstruct intent: they ask whether the agent performed the function it was designed to perform, whether the alert criteria were appropriately configured, and whether the operational processes attached to the agent were sound. This is close to a standard-of-practice test. But it is applied retrospectively to a causal question, and the gap between "the agent performed its designed function" and "the agent caused this outcome" is never fully closed.

At the post-quantum crossing

The post-quantum transition produces an unusually time-extended version of the counterfactual problem. Decisions made during the migration window will determine exposure to threats that may not materialize for years. These include which cryptographic primitives to migrate first, in what order, and with what validation thresholds. An AI agent that classifies a migration-period validation anomaly as routine noise may be contributing to a vulnerability that is not exploited until an adversary gains quantum computing capability. The causal chain spans years. Across that timeframe, the counterfactual (what would have happened had the anomaly been escalated and investigated) becomes impossible to reconstruct.

This extended temporal gap changes the accountability calculus fundamentally. Organizations and governance regimes that use accountability as a learning mechanism — investigate failures, attribute causes, change practices — lose that mechanism when the causal chain is too long to trace. The feedback loop that normally tightens security practice is broken, not by any individual failure, but by the structural inaccessibility of the counterfactual.

What the counterfactual problem requires

The answer is not to abandon causal accountability. It is to design AI agent deployments to proactively generate the evidentiary conditions that make counterfactual reasoning tractable — even if not fully resolvable.

This means recording not just what an agent decided, but what it perceived at the moment of decision: the inputs, the confidence levels, the alternative actions it evaluated, and the thresholds that would have triggered a different response. An agent that logs only its outputs produces an audit trail that cannot support counterfactual reasoning. An agent that logs its decision state — including what would have changed its decision — creates the evidentiary basis for asking: given what the agent knew, what would a different configuration have produced?
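A minimal sketch of what a decision-state record might contain, with a replay of the logged state under a different configuration. The schema and field names are hypothetical. So is the assumption that the agent's policy reduces to a single escalation threshold; a real agent's policy is likely richer. Because the record keeps the inputs, score, and threshold, a reviewer can ask what a different configuration would have decided from the same state.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DecisionStateRecord:
    """One logged decision, including what would have changed it.

    Hypothetical schema: every field name here is illustrative.
    """
    agent_id: str
    timestamp: str                 # ISO 8601, kept as a string for simplicity
    inputs: Dict[str, float]       # what the agent perceived at decision time
    score: float                   # e.g. a deterioration or anomaly score
    confidence: float              # the agent's confidence in that score
    escalation_threshold: float    # threshold in force when the decision was made
    action: str                    # "escalate" or "no_escalation"
    alternatives_considered: List[str] = field(default_factory=list)


def action_under_threshold(record: DecisionStateRecord, threshold: float) -> str:
    """Replay the logged decision state under a different threshold.

    Assumes the policy is the simple rule "escalate if score >= threshold".
    Answers "given what the agent knew, what would this configuration have
    done?" It says nothing about what would have happened after that action.
    """
    return "escalate" if record.score >= threshold else "no_escalation"


def threshold_gap(record: DecisionStateRecord) -> float:
    """Positive: how far the score cleared the threshold. Negative: how far it fell short."""
    return record.score - record.escalation_threshold


if __name__ == "__main__":
    record = DecisionStateRecord(
        agent_id="care-agent-07",  # hypothetical identifiers and values
        timestamp="2026-01-15T03:12:00Z",
        inputs={"heart_rate": 118.0, "resp_rate": 24.0},
        score=0.62,
        confidence=0.71,
        escalation_threshold=0.70,
        action="no_escalation",
        alternatives_considered=["escalate", "request_recheck"],
    )
    for t in (0.70, 0.60):
        print(t, action_under_threshold(record, t))
    print(round(threshold_gap(record), 2))
```

The replay reaches only the first link of the counterfactual: what the agent would have done. Whether anyone would then have acted, and whether that action would have changed the outcome, remains unobserved.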

It also means designing accountability frameworks that are explicit about when they are using procedural tests rather than causal ones. If the operative question is "did the agent meet the applicable standard of practice?" then that standard needs to exist before the agent is deployed, not constructed retrospectively to fit the outcome under review. Standards of practice for AI agents must be specified in advance, in terms that can be evaluated against the decision-state logs the agent actually produces.
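A sketch of a standard of practice expressed as a versioned, machine-checkable object. The specific rules are hypothetical examples, not a proposed standard:

- a maximum permitted escalation threshold;
- a required confidence field;
- consistency between the logged action and the configured rule.

What the sketch illustrates is the ordering. The standard exists, with a version, before any logged decision is evaluated against it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoggedDecision:
    """Minimal, hypothetical decision-state record (field names are illustrative)."""
    score: float
    escalation_threshold: float
    action: str                  # "escalate" or "no_escalation"
    confidence: Optional[float]  # None if the agent failed to log it


@dataclass(frozen=True)
class StandardOfPractice:
    """A hypothetical standard, fixed and versioned before deployment."""
    version: str
    max_escalation_threshold: float
    require_confidence: bool = True


def evaluate(decision: LoggedDecision, standard: StandardOfPractice) -> List[str]:
    """Procedural test: did this logged decision meet the pre-specified standard?

    Returns the list of violations; an empty list means the standard was met.
    It does not, and cannot, say whether the outcome would have been different.
    """
    violations = []
    if decision.escalation_threshold > standard.max_escalation_threshold:
        violations.append(
            f"threshold {decision.escalation_threshold} exceeds "
            f"{standard.max_escalation_threshold} (standard {standard.version})"
        )
    if standard.require_confidence and decision.confidence is None:
        violations.append("confidence not logged")
    # The logged action should match what the configured rule implies.
    expected = "escalate" if decision.score >= decision.escalation_threshold else "no_escalation"
    if decision.action != expected:
        violations.append(
            f"logged action {decision.action!r} does not match configured rule ({expected!r})"
        )
    return violations
```

Because the standard carries a version, a review can show which standard was in force at decision time. That makes it easier to detect a standard constructed after the fact to fit the outcome.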

The counterfactual cannot be observed. But it can be made more tractable, provided the audit architecture is designed from the beginning to capture what the alternative would have required.

Key point

Accountability requires causation. Causation requires a counterfactual. When AI agents act in high-stakes domains, the counterfactual (what would have happened had the agent acted differently) is unobservable. Human accountability frameworks sidestep this with standards of practice, but AI agents lack established standards in most deployment domains. The remedy is twofold. First, build decision-state logging that captures what would have changed the agent's output. Second, specify standards of practice before deployment so accountability can be evaluated procedurally rather than causally. The counterfactual cannot be observed. But accountability architecture can be designed to make it more tractable.
