Notes from the Crossings
SYSTEMS ENGINEERING

Building Fault Tolerance into Unmanned Rotorcraft

2026-06-10 6 min read

A helicopter with a failed swashplate servo offers no graceful degradation window. The physics of rotary-wing flight — continuous high-frequency control authority over collective and cyclic pitch — means that the interval between a single-point failure and an unrecoverable attitude is measured in milliseconds, not seconds. For unmanned rotorcraft operating beyond visual line of sight, where no pilot reflex is available, redundancy is not a design enhancement. It is the minimum viable architecture.

Understanding what redundancy actually means in this context requires distinguishing three separate layers: the computational layer where flight control logic executes, the sensing layer where state is measured, and the actuation and power layer where commands become physical forces. A well-designed system is redundant at all three. A system that is redundant at only one layer gives a false sense of robustness: it has eliminated one category of failure while leaving the others as uncovered single points.

At the computational layer, the dominant pattern is a dual- or triple-redundant flight controller arrangement. In a dual-redundant configuration, two independent processors execute the same flight control algorithm in parallel and continuously compare their outputs. A two-channel comparison can detect a disagreement but cannot by itself show which channel is wrong, so on divergence above a defined threshold the monitoring controller either takes over as primary, typically when the primary has also failed its own self-checks, or commands a predefined safe state. In a triple-redundant architecture, the system applies a majority voting rule: if two controllers agree and one dissents, the dissenting output is masked. Triple redundancy allows the system to remain fully operational through a single controller failure rather than merely degrading gracefully, a distinction that matters when the aircraft is mid-mission over an area where an immediate landing would be dangerous or impractical.
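The comparison and voting rules described above can be sketched in a few lines. This is a minimal illustration for a single scalar control axis, not a flight-ready voter: the names `compare_duplex`, `vote_triplex`, and `VoteResult`, the use of the mid value as the voted command, and the threshold values are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence


@dataclass
class VoteResult:
    command: float                 # value passed on to the actuators
    masked_channel: Optional[int]  # index of the dissenting controller, if any
    fault: bool                    # True when no two channels agree


def compare_duplex(primary: float, monitor: float, threshold: float) -> bool:
    """Return True if the two channels agree within the threshold.

    A duplex pair can detect a divergence but cannot tell which side is wrong.
    """
    return abs(primary - monitor) <= threshold


def vote_triplex(outputs: Sequence[float], threshold: float) -> VoteResult:
    """Majority-vote three controller outputs for one control axis (hypothetical helper)."""
    if len(outputs) != 3:
        raise ValueError("triplex voter expects exactly three outputs")
    mid = median(outputs)  # with three inputs this is always one of the inputs
    # A channel dissents if it is further than the threshold from the mid value.
    dissenters = [i for i, v in enumerate(outputs) if abs(v - mid) > threshold]
    if not dissenters:
        return VoteResult(mid, None, False)
    if len(dissenters) == 1:
        # Two channels agree: mask the third and keep flying on the mid value.
        return VoteResult(mid, dissenters[0], False)
    # Both outer channels disagree with the mid value, so no pair agrees.
    return VoteResult(mid, None, True)
```

With outputs `[0.10, 0.11, 0.50]` and a threshold of `0.05`, the third channel is masked and the voted command is `0.11`. When all three outputs are far apart, the `fault` flag is the cue to fall back to the predefined safe state.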

The key implementation challenge is ensuring the processors are genuinely independent. Independence means separate power rails, separate oscillators, separate firmware images on separate non-volatile memory, and, critically, no shared software library that could carry a common-cause bug. A common-cause failure, where a single defect triggers simultaneous failure across all redundant channels, destroys the statistical value of the redundancy entirely. For heavy-lift unmanned platforms, software verification practices are increasingly being adopted from manned aviation, including modified condition/decision coverage (MC/DC), rather than derived from ground robotics.
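A short calculation shows why common-cause failures matter so much. The sketch below uses the simple beta-factor model from reliability engineering, in which a fraction `beta` of each channel's failure probability is assumed to hit all channels at once. The function name `p_fail_2oo3` and the numbers in the usage note are illustrative assumptions, not figures from any real platform.

```python
def p_fail_2oo3(p_channel: float, beta: float = 0.0) -> float:
    """Failure probability of a two-out-of-three voted system (beta-factor sketch).

    p_channel: probability that a single channel fails during the mission.
    beta:      assumed fraction of channel failures that are common-cause.
    """
    if not (0.0 <= p_channel <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    p_common = beta * p_channel        # one defect takes out all three channels
    q = (1.0 - beta) * p_channel       # independent part of each channel's failures
    # The voter is defeated when at least two channels fail independently.
    p_independent = 3 * q**2 * (1 - q) + q**3
    # The system fails on a common-cause event or on two or more independent failures.
    return 1.0 - (1.0 - p_common) * (1.0 - p_independent)
```

With an assumed `p_channel` of `1e-3`, fully independent channels give a system failure probability of about `3e-6`. If 10% of failures are assumed to be common-cause (`beta=0.1`), it rises to roughly `1e-4`, dominated almost entirely by the shared defect.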

The sensing layer presents a different set of challenges. Flight controllers depend on a continuous, accurate stream of inertial, barometric, and positional data. Redundant inertial measurement units (IMUs) are standard on any serious rotorcraft platform, but raw redundancy is not enough. The system must adjudicate between sensor readings in real time, and the adjudication logic itself must be robust. The most common approach is a voting or consistency-checking scheme: if three IMUs are present and one produces readings that diverge from the other two beyond a statistical threshold, the outlier is removed from the fusion estimate and flagged for ground monitoring. For barometric altitude, which is subject to transient errors from rotor downwash or temperature gradients, sensor fusion with GPS or radar altimetry provides cross-validation.
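The consistency check for three IMUs might look like the following sketch. It assumes each IMU reports a three-axis rate vector and uses a fixed distance threshold from the per-axis median; a real system would more likely scale the threshold by the sensors' noise statistics. The function name `fuse_imus` and the sensor labels are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def fuse_imus(readings: Dict[str, Vec3], threshold: float) -> Tuple[Vec3, List[str]]:
    """Drop IMUs that diverge from the per-axis median, then average the rest.

    Returns the fused vector and the names of the rejected sensors so they can
    be flagged for ground monitoring.
    """
    if len(readings) < 3:
        raise ValueError("outlier rejection needs at least three IMUs")
    # Per-axis median is the reference that a single faulty sensor cannot drag far.
    ref = tuple(median(r[axis] for r in readings.values()) for axis in range(3))
    healthy: List[Vec3] = []
    outliers: List[str] = []
    for name, reading in readings.items():
        if math.dist(reading, ref) > threshold:
            outliers.append(name)
        else:
            healthy.append(reading)
    if len(healthy) < 2:
        raise RuntimeError("no consistent IMU majority")
    fused = tuple(sum(r[axis] for r in healthy) / len(healthy) for axis in range(3))
    return fused, outliers
```

Raising an error when fewer than two sensors agree is a deliberate choice in this sketch: the caller, not the fusion routine, decides which safe-state response applies.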

Magnetometers used for heading reference are particularly susceptible to interference from electric motors, battery packs, and structural ferromagnetic elements. Good practice isolates the magnetometer on an extended boom away from the main power distribution, and dual magnetometers mounted with deliberate angular offset allow the system to detect heading divergence before it propagates into attitude error. On a coaxial unmanned helicopter — a configuration common in the approximately 300 kg payload class due to its compact footprint and favourable torque cancellation — the tight packaging of counter-rotating drive systems makes this isolation engineering non-trivial.
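Detecting heading divergence between two magnetometers comes down to comparing the headings after removing the known mounting offset, with care taken at the 0/360 degree wrap. The sketch below assumes headings in degrees and a known, calibrated offset for the second sensor; the function names and the 5 degree threshold are illustrative.

```python
def wrap_deg(angle: float) -> float:
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def heading_divergence(heading_a: float, heading_b: float, mount_offset_b: float) -> float:
    """Absolute disagreement in degrees once sensor B's mounting offset is removed."""
    return abs(wrap_deg((heading_b - mount_offset_b) - heading_a))


def headings_consistent(heading_a: float, heading_b: float,
                        mount_offset_b: float, threshold_deg: float = 5.0) -> bool:
    """True if the two magnetometers agree within the (assumed) threshold."""
    return heading_divergence(heading_a, heading_b, mount_offset_b) <= threshold_deg
```

For example, sensor A reading 359 degrees and sensor B reading 46 degrees with a 45 degree mounting offset disagree by only 2 degrees, even though the raw values sit on opposite sides of north.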

Actuator and power redundancy is the third layer. The swashplate is driven by multiple servos acting in concert, and the consequence of losing one depends on how it fails: with no output, jammed, or stuck at an intermediate position. A fail-safe analysis must enumerate all three cases for every actuator and verify that the remaining servos can maintain controlled flight or achieve a safe landing. Power architecture has moved toward isolated bus designs: a primary bus powers propulsion and primary flight control, while a secondary (avionics) bus maintains the flight computer and fail-safe logic even if the primary collapses. The cross-connect between the buses defaults to open, preventing a short on the primary bus from cascading into the avionics bus.
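The enumeration step of the actuator fail-safe analysis is mechanical and easy to automate. The sketch below only generates the case list and reports the cases an analysis has not shown to be recoverable. The servo names and the `is_recoverable` callback are placeholders for whatever simulation or test evidence a team actually has.

```python
from enum import Enum
from itertools import product
from typing import Callable, List, NamedTuple, Sequence


class ServoFailure(Enum):
    NO_OUTPUT = "no output"
    JAMMED = "jammed"
    INTERMEDIATE = "stuck at intermediate position"


class FailureCase(NamedTuple):
    servo: str
    mode: ServoFailure


def enumerate_cases(servos: Sequence[str]) -> List[FailureCase]:
    """Every (servo, failure mode) pair that the analysis has to cover."""
    return [FailureCase(s, m) for s, m in product(servos, ServoFailure)]


def uncovered_cases(servos: Sequence[str],
                    is_recoverable: Callable[[FailureCase], bool]) -> List[FailureCase]:
    """Cases for which controlled flight or a safe landing has not been demonstrated."""
    return [case for case in enumerate_cases(servos) if not is_recoverable(case)]
```

For an assumed three-servo swashplate this yields nine cases. Any entry returned by `uncovered_cases` is a remaining single point of failure at the actuation layer.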

The distinction between fail-operational and fail-safe behaviour sits at the top of the design hierarchy. A fail-safe system responds to a detected fault by transitioning to a known safe state — typically an immediate controlled descent and landing at the nearest safe point. A fail-operational system can continue the mission through a detected fault, because the redundant architecture absorbs the failure without performance degradation. Which posture is appropriate depends on the operational context. For a mission over an uninhabited area with a designated emergency landing zone beneath the flight path, fail-safe is acceptable. For a mission over a populated area or a time-critical logistics corridor, fail-operational for at least one redundancy layer may be a regulatory or operational requirement. The architecture choice cannot be made at the component level; it has to be driven from the top-level operational requirements.

Beyond line of sight, the command-and-control link itself becomes a redundancy problem. A single-frequency, single-path datalink is a single point of failure in the mission-critical chain. Robust BLOS architectures layer at least two independent communication paths: a primary high-bandwidth link for telemetry and payload data, and a secondary low-bandwidth but highly reliable link reserved for command-and-control and emergency uplink. The two links should operate on different frequency bands — a cellular or satellite primary and a licensed sub-GHz secondary, for example — so that interference or congestion on one does not affect the other. The fail-safe logic must detect link loss independently on each path and respond appropriately, entering a defined holding pattern if both fail simultaneously rather than continuing on a stale command set.
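Independent link-loss detection can be sketched as one watchdog per path plus a small decision function. The timeout values, the action names, and the responses to losing only one link are assumptions for illustration; only the both-links-lost holding behaviour comes from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class LinkAction(Enum):
    NOMINAL = "both links healthy"
    PRIMARY_LOST = "continue on secondary C2 link, telemetry degraded"
    SECONDARY_LOST = "C2 backup unavailable, alert operator"
    HOLD = "both links lost, enter holding pattern"


@dataclass
class LinkMonitor:
    """Watchdog for one communication path (hypothetical helper)."""
    timeout_s: float
    last_rx_s: float = float("-inf")  # counts as lost until a first packet arrives

    def on_packet(self, now_s: float) -> None:
        self.last_rx_s = now_s

    def is_lost(self, now_s: float) -> bool:
        return now_s - self.last_rx_s > self.timeout_s


def link_response(primary: LinkMonitor, secondary: LinkMonitor, now_s: float) -> LinkAction:
    """Decide the response from each path's own loss detection."""
    primary_lost = primary.is_lost(now_s)
    secondary_lost = secondary.is_lost(now_s)
    if primary_lost and secondary_lost:
        # Do not keep flying on a stale command set.
        return LinkAction.HOLD
    if primary_lost:
        return LinkAction.PRIMARY_LOST
    if secondary_lost:
        return LinkAction.SECONDARY_LOST
    return LinkAction.NOMINAL
```

Passing the time in as `now_s` rather than reading a clock inside the monitor keeps the logic deterministic and easy to test on the ground.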

The convergence of these layers (redundant compute, sensor voting, isolated power, actuator failure analysis, and layered BLOS links) is what defines the difference between an unmanned rotorcraft that is genuinely certifiable and one that is merely operational under benign conditions. For teams working on heavy-lift unmanned helicopter platforms, the redundancy architecture is not a feature checklist to complete after the flight dynamics are solved. It is an integral part of the system design from the first requirements pass, and the decisions made at that stage determine whether the eventual platform can be approved for the missions that make it commercially meaningful.

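Summary

The fault-tolerance architecture of an unmanned helicopter must cover three separate layers: the flight control computing layer (dual- or triple-redundant controllers with majority voting), the sensor layer (redundant IMUs with consistency voting), and the actuator and power layer (isolated buses and fail-safe analysis). Beyond-line-of-sight operation additionally requires a layered BLOS link design, so that the failure of a single communication path does not break the command-and-control chain. The choice between fail-operational and fail-safe strategies is driven by top-level mission requirements, not made at the component design stage. These redundancy requirements are not add-on features but core constraints embedded in the system architecture from the first requirements review.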


