The update authority problem: accountability when an AI agent is patched
AI agents change. Their weights are updated, parameters tuned, operational logic revised. These changes are typically treated as software releases. But in domains where agent behavior is consequential — where care recipients rely on its guidance, where hardware trusts its decisions, where cryptographic infrastructure accepts its attestations — an update is not merely a software event. It is a change to what the agent will do. And the accountability question is not only whether the change was correct. It is who had the authority to authorize it, and what obligations exist toward those who relied on the previous version.
Software is updated constantly. Security patches close vulnerabilities. Bug fixes correct misbehavior. Feature releases change capabilities. In most software contexts, this update cadence is an unambiguous good: the new version is better, and deploying it is a straightforward technical act that requires no special accountability infrastructure beyond a version number and a change log.
AI agents do not fit this model. When an agent's weights are updated, its behavior changes — not in the legible, enumerable way that a software patch changes named functions, but in ways that are distributed across the model and cannot be fully audited by reading a diff. The agent that existed before the update and the agent that exists after it are, in a meaningful sense, different agents. They may produce different outputs on the same input. They may interpret identical situations differently. They may agree or disagree with each other on decisions that previously carried implicit authority.
This is the update authority problem: who has the right to change an AI agent, under what constraints, and what accountability is owed to those who established trust with the version that is now gone.
Why the software model is insufficient
The standard software update framework assigns authority to developers, operators, and administrators. It assumes that updates are improvements, that users benefit from them, and that the appropriate response to a change in software is to notify users and proceed. Consent is typically opt-out at best: the system will update unless the user takes action to prevent it.
This model fails when the agent occupies a consequential role. An enterprise IT system patched overnight may behave differently the next morning, but the difference is unlikely to affect anyone's care, security posture, or safety. An AI agent deployed in a domain where its decisions carry real-world weight cannot be updated on the same terms. The change is not cosmetic. The agent's principals — whether patients, security administrators, or hardware operators — established a relationship with a specific version of that agent's behavior. Changing the behavior without explicit authority and notification is not an update. It is a replacement that the principal did not authorize.
The PQ crossing: attested identity after a weight change
At the post-quantum crossing, an AI agent may be involved in cryptographic operations: generating, validating, or attesting to keys and signatures. Its behavior in these operations is part of what downstream systems trust. If the agent's weights are updated — even for entirely benign reasons, such as improving its reasoning in adjacent domains — there is no guarantee that its behavior in cryptographic operations is unchanged. The updated agent may attest differently, accept inputs it previously rejected, or reject inputs it previously accepted.
The accountability gap is specific: the agent's cryptographic identity may not change across the update, even though the agent's behavior has. A downstream system that trusts the agent's attestation has no way of knowing that the entity producing that attestation is materially different from the one whose behavior was originally vetted. In a post-quantum transition, where the trust assumptions of the entire cryptographic stack are under active revision, a quietly changed agent is a quietly changed trust anchor.
Update authority at this crossing requires an explicit attestation of behavioral continuity — not just a version bump, but a documented, verifiable claim that the agent's behavior in cryptographic operations has been tested and confirmed to fall within the bounds previously accepted. Absent that attestation, the update invalidates the original trust.
The hardware crossing: firmware policy and silent behavioral drift
At the hardware crossing, an AI agent may govern or interpret the outputs of hardware devices — deciding which firmware states are acceptable, which attestations are valid, which sensor readings warrant an alert. These are policy decisions embedded in the agent's behavior. When the agent is updated, those policies may change without any explicit announcement that they have changed.
The difficulty is that hardware governance policies are typically not extracted and listed separately from the agent's general behavior. They are emergent from the model's training, not enumerated in a configuration file. This means that an operator who has certified the agent's hardware governance decisions cannot re-run that certification against the updated agent without a full re-evaluation. In practice, this rarely happens. The update is deployed; the hardware governance policies shift; the certification no longer reflects what the agent actually does.
Update authority at this crossing requires that hardware governance decisions be treated as a separately auditable surface. An update that changes the agent's hardware policies must carry explicit documentation of what changed and why — and must trigger re-certification before the updated agent is trusted to govern hardware with safety or security implications.
The care crossing: reliance, consent, and the agent a patient knew
At the physical-world care crossing, the update authority problem has the most immediate human consequences. A care recipient who has come to rely on an AI agent — who has built expectations about its behavior, calibrated their own responses to its guidance, and consented to an ongoing relationship with it — has established trust with a specific version of that agent. That trust was not given to the agent's developers in perpetuity; it was given to the agent as it behaved at the time of consent.
When the agent is updated, the care recipient is not, in any meaningful sense, the same party to the same relationship. The agent they consented to no longer exists. They may be unaware that an update occurred. Even if notified, they may have no way of understanding what changed in behavioral terms. Their ability to withdraw consent, to seek a second opinion, or to calibrate their reliance on updated guidance is constrained by the same factors that constrained their original engagement with the agent: the conditions that led them to AI-mediated care in the first place.
Update authority at this crossing requires treating updates as consent events. A material change to an agent's behavior in a care relationship requires notification in a form the care recipient can understand, an opportunity to confirm continued consent, and — in cases of significant behavioral change — the right to remain on the prior version until they have been able to make an informed decision.
What update authority requires across all three crossings
The common thread is that update authority cannot be treated as a purely internal governance matter. Three requirements follow.
First, updates to AI agents in consequential roles must carry a behavioral impact assessment — a documented, testable claim about what the update changes and what it leaves unchanged, at a granularity sufficient for the principals who relied on the prior version to understand whether their reliance remains valid.
Second, the authority to approve an update must be proportionate to the scope of the behavioral change. A change that affects cryptographic attestation, hardware policy decisions, or care guidance requires sign-off from stakeholders beyond the development team — including, where appropriate, the principals whose reliance is most affected.
Third, the audit trail for an updated agent must be continuous across updates. An agent whose accountability record resets at each version boundary is an agent that cannot be held accountable for the full arc of its deployment. The audit trail must carry forward, with each update documented as a named change event rather than a replacement of the record.
Treating updates as software releases works for software. AI agents deployed in domains where their decisions have weight are not software in the sense that release processes were designed for. The update authority problem is the recognition that changing such an agent is an act that requires its own accountability structure — not a lighter version of the agent's deployment governance, but an equal one.
When an AI agent's weights change, the agent changes — not in a legible diff-readable way, but in ways distributed across the model and invisible to those who relied on the prior version. At the post-quantum crossing this breaks attested identity; at the hardware crossing it silently shifts firmware governance policy; at the care crossing it replaces the agent a patient consented to without their knowledge. Update authority must be proportionate to consequence: behavioral impact assessments, stakeholder sign-off beyond the development team, and an audit trail that is continuous across version boundaries rather than reset at each release.
软件在不断更新。安全补丁修复漏洞,错误修复纠正异常行为,功能发布改变能力。在大多数软件场景中,这种更新节奏是一种明确的进步:新版本更好,部署它是一种直接的技术行为,除版本号和变更日志外无需特殊的问责基础设施。
AI智能体不符合这种模型。当智能体的参数被更新时,其行为会改变——不是像软件补丁那样以可读、可列举的方式改变具名函数,而是以分布在模型中且无法通过阅读差异文件完全审计的方式改变。更新前存在的智能体与更新后存在的智能体,在有意义的层面上是不同的智能体。它们可能对相同的输入产生不同的输出,可能对相同情况进行不同解读,可能在之前具有隐性权威的决策上存在分歧。
这就是更新授权问题:谁有权更改AI智能体、在什么约束下更改,以及对那些与已消失版本建立信任关系的人负有什么问责义务。
为何软件模型不够充分
标准软件更新框架将权限赋予开发人员、运营者和管理员。它假设更新是改进,用户从中受益,对软件变更的适当回应是通知用户并继续进行。同意通常最多是选择退出:除非用户采取行动阻止,否则系统将自动更新。
当智能体占据重要角色时,这种模型就会失效。夜间修补的企业IT系统第二天早上可能表现不同,但这种差异不太可能影响任何人的护理、安全态势或安全性。部署在其决策具有现实影响力的领域中的AI智能体,不能以同样的条款更新。这种变化不是表面的。智能体的委托人——无论是患者、安全管理员还是硬件操作员——与该智能体特定版本的行为建立了关系。在没有明确权限和通知的情况下更改行为,不是更新,而是委托人未授权的替换。
后量子交叉点:权重更改后的认证身份
在后量子交叉点,AI智能体可能参与密码学操作:生成、验证或证明密钥和签名。它在这些操作中的行为是下游系统信任的一部分。如果智能体的权重被更新——即使出于完全良性的原因,例如改善其在相邻领域的推理——也无法保证其在密码学操作中的行为保持不变。更新后的智能体可能以不同方式证明,可能接受之前拒绝的输入,或拒绝之前接受的输入。
问责差距是具体的:智能体的密码学身份可能在更新过程中不会改变,即使智能体的行为已经改变。信任智能体证明的下游系统无法知道产生该证明的实体与最初经过审查的实体存在实质性差异。在后量子过渡期间,整个密码学栈的信任假设都在积极修订中,悄然改变的智能体就是悄然改变的信任锚。
在这个交叉点上的更新授权需要明确的行为连续性证明——不仅仅是版本号的更新,而是一个有文档记录的、可验证的声明,表明智能体在密码学操作中的行为已经过测试,并确认在之前接受的范围内。缺乏该证明,更新就会使原始信任失效。
硬件交叉点:固件策略与静默行为漂移
在硬件交叉点,AI智能体可能管理或解释硬件设备的输出——决定哪些固件状态是可接受的,哪些证明是有效的,哪些传感器读数值得发出警报。这些是嵌入在智能体行为中的策略决策。当智能体被更新时,这些策略可能在没有任何明确公告的情况下改变。
困难在于硬件治理策略通常不会从智能体的一般行为中单独提取和列出。它们是从模型的训练中涌现出来的,而不是在配置文件中列举的。这意味着已认证智能体硬件治理决策的操作员,如果不对更新后的智能体进行全面重新评估,就无法针对其重新进行认证。在实践中,这很少发生。更新被部署;硬件治理策略发生变化;认证不再反映智能体实际所做的事情。
在这个交叉点上的更新授权要求将硬件治理决策视为可单独审计的表面。改变智能体硬件策略的更新必须附带明确的变更文档,说明改变了什么以及原因——并且必须在更新后的智能体被信任管理具有安全或保障影响的硬件之前触发重新认证。
护理交叉点:依赖、同意与患者所知的智能体
在物理世界护理交叉点,更新授权问题具有最直接的人类后果。依赖AI智能体的护理对象——已对其行为建立期望、根据其指导校准了自身回应、并同意与其持续关系的人——与该智能体特定版本的行为建立了信任。这种信任不是永久赋予智能体开发者的;它是在同意时向智能体的行为方式赋予的。
当智能体被更新时,护理对象在任何有意义的意义上都不再是与同一关系的同一方。他们同意的智能体已不再存在。他们可能不知道更新发生了。即使被通知,他们也可能无法从行为层面理解发生了什么变化。他们撤回同意、寻求第二意见或校准其对更新指导依赖的能力,受到与最初参与智能体时相同因素的约束:首先导致他们寻求AI辅助护理的条件。
在这个交叉点上的更新授权要求将更新视为同意事件。对护理关系中智能体行为的实质性变更需要以护理对象能够理解的形式进行通知,提供确认继续同意的机会——在行为变化重大的情况下,还需要提供在他们能够做出知情决定之前保留在先前版本上的权利。
三个交叉点上更新授权的要求
共同主线是更新授权不能被视为纯粹的内部治理事项。由此产生三个要求。
首先,在重要角色中AI智能体的更新必须附带行为影响评估——一个有文档记录的、可测试的声明,说明更新改变了什么、保留了什么,其粒度足以让依赖先前版本的委托人理解其依赖是否仍然有效。
其次,批准更新的权限必须与行为变化的范围相称。影响密码学证明、硬件策略决策或护理指导的变更需要开发团队以外利益相关者的签字同意——在适当情况下,包括依赖受影响最大的委托人。
第三,更新后智能体的审计跟踪必须在更新之间保持连续。在每个版本边界重置问责记录的智能体,是无法对其整个部署弧线负责的智能体。审计跟踪必须向前延续,每次更新都记录为命名的变更事件,而不是记录的替换。
将更新视为软件发布,对软件有效。部署在其决策具有分量的领域中的AI智能体,不是发布流程设计时所针对的软件。更新授权问题的核心认识是:更改此类智能体是一种需要其自身问责结构的行为——不是智能体部署治理的更轻版本,而是同等级别的治理。
当AI智能体的权重改变时,智能体也随之改变——不是以可读的差异方式,而是以分布在模型中且对依赖先前版本的人不可见的方式。在后量子交叉点,这会破坏认证身份;在硬件交叉点,会悄然改变固件治理策略;在护理交叉点,会在患者不知情的情况下替换他们同意的智能体。更新授权必须与后果相称:行为影响评估、开发团队以外利益相关者的签字同意,以及跨版本边界连续而非在每次发布时重置的审计跟踪。
軟體在不斷更新。安全補丁修復漏洞,錯誤修復糾正異常行為,功能發布改變能力。在大多數軟體場景中,這種更新節奏是一種明確的進步:新版本更好,部署它是一種直接的技術行為,除版本號和變更日誌外無需特殊的問責基礎設施。
AI智能體不符合這種模型。當智能體的參數被更新時,其行為會改變——不是像軟體補丁那樣以可讀、可列舉的方式改變具名函數,而是以分布在模型中且無法透過閱讀差異檔案完全審計的方式改變。更新前存在的智能體與更新後存在的智能體,在有意義的層面上是不同的智能體。它們可能對相同的輸入產生不同的輸出,可能對相同情況進行不同解讀,可能在之前具有隱性權威的決策上存在分歧。
這就是更新授權問題:誰有權更改AI智能體、在什麼約束下更改,以及對那些與已消失版本建立信任關係的人負有什麼問責義務。
為何軟體模型不夠充分
標準軟體更新框架將權限賦予開發人員、運營者和管理員。它假設更新是改進,用戶從中受益,對軟體變更的適當回應是通知用戶並繼續進行。同意通常最多是選擇退出:除非用戶採取行動阻止,否則系統將自動更新。
當智能體佔據重要角色時,這種模型就會失效。夜間修補的企業IT系統第二天早上可能表現不同,但這種差異不太可能影響任何人的護理、安全態勢或安全性。部署在其決策具有現實影響力的領域中的AI智能體,不能以同樣的條款更新。這種變化不是表面的。智能體的委託人——無論是患者、安全管理員還是硬體操作員——與該智能體特定版本的行為建立了關係。在沒有明確權限和通知的情況下更改行為,不是更新,而是委託人未授權的替換。
後量子交叉點:權重更改後的認證身份
在後量子交叉點,AI智能體可能參與密碼學操作:生成、驗證或證明金鑰和簽名。它在這些操作中的行為是下游系統信任的一部分。如果智能體的權重被更新——即使出於完全良性的原因,例如改善其在相鄰領域的推理——也無法保證其在密碼學操作中的行為保持不變。更新後的智能體可能以不同方式證明,可能接受之前拒絕的輸入,或拒絕之前接受的輸入。
問責差距是具體的:智能體的密碼學身份可能在更新過程中不會改變,即使智能體的行為已經改變。信任智能體證明的下游系統無法知道產生該證明的實體與最初經過審查的實體存在實質性差異。在後量子過渡期間,整個密碼學棧的信任假設都在積極修訂中,悄然改變的智能體就是悄然改變的信任錨。
在這個交叉點上的更新授權需要明確的行為連續性證明——不僅僅是版本號的更新,而是一個有文件記錄的、可驗證的聲明,表明智能體在密碼學操作中的行為已經過測試,並確認在之前接受的範圍內。缺乏該證明,更新就會使原始信任失效。
硬體交叉點:韌體策略與靜默行為漂移
在硬體交叉點,AI智能體可能管理或解釋硬體裝置的輸出——決定哪些韌體狀態是可接受的,哪些認證是有效的,哪些感測器讀數值得發出警報。這些是嵌入在智能體行為中的策略決策。當智能體被更新時,這些策略可能在沒有任何明確公告的情況下改變。
困難在於硬體治理策略通常不會從智能體的一般行為中單獨提取和列出。它們是從模型的訓練中湧現出來的,而不是在配置檔案中列舉的。這意味著已認證智能體硬體治理決策的操作員,如果不對更新後的智能體進行全面重新評估,就無法針對其重新進行認證。在實踐中,這很少發生。更新被部署;硬體治理策略發生變化;認證不再反映智能體實際所做的事情。
在這個交叉點上的更新授權要求將硬體治理決策視為可單獨審計的表面。改變智能體硬體策略的更新必須附帶明確的變更文件,說明改變了什麼以及原因——並且必須在更新後的智能體被信任管理具有安全或保障影響的硬體之前觸發重新認證。
護理交叉點:依賴、同意與患者所知的智能體
在物理世界護理交叉點,更新授權問題具有最直接的人類後果。依賴AI智能體的護理對象——已對其行為建立期望、根據其指導校準了自身回應、並同意與其持續關係的人——與該智能體特定版本的行為建立了信任。這種信任不是永久賦予智能體開發者的;它是在同意時向智能體的行為方式賦予的。
當智能體被更新時,護理對象在任何有意義的意義上都不再是與同一關係的同一方。他們同意的智能體已不再存在。他們可能不知道更新發生了。即使被通知,他們也可能無法從行為層面理解發生了什麼變化。他們撤回同意、尋求第二意見或校準其對更新指導依賴的能力,受到與最初參與智能體時相同因素的約束:首先導致他們尋求AI輔助護理的條件。
在這個交叉點上的更新授權要求將更新視為同意事件。對護理關係中智能體行為的實質性變更需要以護理對象能夠理解的形式進行通知,提供確認繼續同意的機會——在行為變化重大的情況下,還需要提供在他們能夠做出知情決定之前保留在先前版本上的權利。
三個交叉點上更新授權的要求
共同主線是更新授權不能被視為純粹的內部治理事項。由此產生三個要求。
首先,在重要角色中AI智能體的更新必須附帶行為影響評估——一個有文件記錄的、可測試的聲明,說明更新改變了什麼、保留了什麼,其粒度足以讓依賴先前版本的委託人理解其依賴是否仍然有效。
其次,批准更新的權限必須與行為變化的範圍相稱。影響密碼學認證、硬體策略決策或護理指導的變更需要開發團隊以外利益相關者的簽字同意——在適當情況下,包括依賴受影響最大的委託人。
第三,更新後智能體的審計跟蹤必須在更新之間保持連續。在每個版本邊界重置問責記錄的智能體,是無法對其整個部署弧線負責的智能體。審計跟蹤必須向前延續,每次更新都記錄為命名的變更事件,而不是記錄的替換。
將更新視為軟體發布,對軟體有效。部署在其決策具有分量的領域中的AI智能體,不是發布流程設計時所針對的軟體。更新授權問題的核心認識是:更改此類智能體是一種需要其自身問責結構的行為——不是智能體部署治理的更輕版本,而是同等級別的治理。
當AI智能體的權重改變時,智能體也隨之改變——不是以可讀的差異方式,而是以分布在模型中且對依賴先前版本的人不可見的方式。在後量子交叉點,這會破壞認證身份;在硬體交叉點,會悄然改變韌體治理策略;在護理交叉點,會在患者不知情的情況下替換他們同意的智能體。更新授權必須與後果相稱:行為影響評估、開發團隊以外利益相關者的簽字同意,以及跨版本邊界連續而非在每次發布時重置的審計跟蹤。