Building Edge AI That Survives the Field

2026-06-10 6 min read

The first generation of ground robots equipped with on-board AI compute shared a common assumption: that inference hardware designed for controlled indoor environments would simply work outside. It did not. What failed was not the model or the software stack. What failed was the hardware underneath — boards, connectors, and cooling systems built to survive an air-conditioned server room, not a construction site in July or a border patrol route in January.

The problem is architectural. The AI accelerator market matured inside data centres, where the thermal envelope is managed, vibration is negligible, humidity is controlled, and power rails are clean. An unmanned ground vehicle (UGV) operating on uneven terrain violates every one of those assumptions simultaneously. Before a team can deploy a physical AI system that reasons about its environment in real time, it has to solve a set of hardware problems that have nothing to do with machine learning and everything to do with mechanical and electrical engineering.

Shock and vibration are the first killers. Off-road platforms routinely generate sustained vibration profiles across a wide frequency band, with occasional sharp transients from obstacles, drops, or track engagement. Consumer and commercial compute modules are not designed for these loads. The failure modes are predictable: solder joint fatigue on BGA packages, connector intermittency, and delamination of PCB layers at vias. Military and aerospace standards — MIL-STD-810, DO-160 — define test profiles for exactly these environments. The engineering response involves underfill on critical ball-grid packages, vibration-damped mounting systems that decouple the compute enclosure from the chassis, and locking connector formats rather than friction-fit headers. None of this is exotic; all of it adds mass, volume, and cost that system architects have to account for before the first prototype.
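
To make the decoupling idea concrete, here is a minimal sketch of the textbook single-degree-of-freedom isolator model: a mass on a spring and viscous damper, excited through its base. It is an illustration under stated assumptions, not a substitute for a qualification test profile. The enclosure mass, mount stiffness, and damping ratio are hypothetical numbers chosen for the example.

```python
import math


def natural_frequency_hz(stiffness_n_per_m: float, mass_kg: float) -> float:
    """Undamped natural frequency of a mass sitting on a spring mount."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)


def transmissibility(freq_hz: float, natural_hz: float, damping_ratio: float) -> float:
    """Ratio of enclosure motion to chassis motion for a base-excited,
    viscously damped single-degree-of-freedom mount."""
    r = freq_hz / natural_hz
    damping_term = (2.0 * damping_ratio * r) ** 2
    return math.sqrt((1.0 + damping_term) / ((1.0 - r * r) ** 2 + damping_term))


# Hypothetical values: a 2 kg compute enclosure on mounts with a combined
# stiffness of 40 kN/m and a damping ratio of 0.1.
f_n = natural_frequency_hz(40_000.0, 2.0)
for f in (5.0, f_n, 50.0, 200.0):
    t = transmissibility(f, f_n, 0.1)
    label = "amplified" if t > 1.0 else "attenuated"
    print(f"{f:7.1f} Hz  T = {t:5.2f}  ({label})")
```

In this model the mount only attenuates vibration above roughly 1.4 times its natural frequency and amplifies it near resonance, which is why the mount has to be tuned against the vehicle's actual vibration spectrum rather than chosen generically.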

Thermal range is the second constraint. A UGV may sit idle overnight at sub-zero temperatures and then ramp to full inference load within minutes of mission start. It may operate in direct sun on an exposed surface where the ambient temperature around the chassis climbs well above the rated operating range of commercial components, which for commercial-grade parts is typically 0 to 70 °C. The standard datacenter answer, active cooling via fans, introduces its own failure modes in field conditions: fan bearings ingest dust and fail, fan apertures require filtration that clogs, and fan-cooled enclosures cannot be sealed. The preferred answer for sealed, ruggedized designs is fanless thermal management: large thermal mass in the chassis itself, heat pipes or vapour chambers routing heat from the compute die to an external heatsink surface, and phase-change thermal interface materials that maintain conductivity across the temperature range. This places real constraints on the compute budget. A module that dissipates significant power without active airflow must be underclocked, paired with a chassis that has adequate thermal surface area, or replaced with a lower-TDP part. These are design tradeoffs that have to be made before layout, not after integration testing.
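
The link between ambient temperature and compute budget can be sketched with the usual steady-state thermal resistance model, in which junction temperature equals ambient plus dissipated power times the total junction-to-ambient resistance. The sketch below is a first-pass estimate only. The 95 °C junction limit and the 1.6 °C/W resistance are hypothetical values, not figures for any particular module or enclosure.

```python
def junction_temp_c(ambient_c: float, power_w: float, theta_c_per_w: float) -> float:
    """Steady-state junction temperature for a given dissipation."""
    return ambient_c + power_w * theta_c_per_w


def max_sustained_power_w(junction_max_c: float, ambient_c: float,
                          theta_c_per_w: float) -> float:
    """Largest dissipation that keeps the junction at or below its limit.

    Returns 0.0 when the ambient already meets or exceeds the limit.
    """
    headroom_c = junction_max_c - ambient_c
    return max(headroom_c, 0.0) / theta_c_per_w


# Hypothetical fanless enclosure: 1.6 C/W from die to outside air,
# junction limit of 95 C.
for ambient in (-20.0, 25.0, 55.0, 70.0):
    budget = max_sustained_power_w(95.0, ambient, 1.6)
    print(f"ambient {ambient:6.1f} C -> about {budget:5.1f} W sustained")
```

With these assumed numbers the sustainable power at 55 °C ambient is about 25 W, a little over half of what the same enclosure supports at 25 °C. That is the kind of gap that forces the underclock, bigger chassis, or lower-TDP decision described above.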

Ingress protection, expressed through the IP rating system, addresses dust and water. A ground robot operating outdoors needs to survive rain, mud, water crossings, and the fine dust that penetrates any unsealed enclosure over thousands of hours of operation. IP65 excludes dust entirely and resists low-pressure water jets from any direction. IP67 keeps the same dust protection and certifies survival of temporary immersion, but an immersion rating does not by itself imply the jet rating, so an enclosure that must handle both is dual-marked IP65/IP67. Achieving these ratings on a compute enclosure is straightforward in principle, using sealed enclosures with compressed O-rings and gland connectors, but it adds weight and makes thermal management harder by removing convective pathways. Every connector penetration is a potential ingress point, which pushes designers toward sealed circular connectors and away from the rectangular PCIe and USB formats that make prototyping fast. For teams sourcing components, this means filtering for IP-rated compute platforms from the start of the design process, rather than attempting to seal consumer hardware after the fact.
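
Filtering candidate platforms by IP rating is easy to get subtly wrong, because the water digits for jets (5 and 6) and for immersion (7 and 8) come from separate test families. The sketch below shows one way to encode that when screening a parts list. The function names are hypothetical, it handles only plain two-digit codes, and it deliberately takes the conservative view that an immersion-only mark does not count toward any spray or jet requirement.

```python
def parse_ip(code: str) -> tuple[int, int]:
    """Split a two-digit code such as 'IP67' into (solids, liquids) digits."""
    text = code.strip().upper()
    digits = text[2:]
    if not text.startswith("IP") or len(digits) != 2 or not digits.isdigit():
        raise ValueError(f"unsupported IP code: {code!r}")
    solids, liquids = int(digits[0]), int(digits[1])
    if solids > 6 or liquids > 8:
        raise ValueError(f"digit outside the range handled here: {code!r}")
    return solids, liquids


def covers(mark: str, required: str) -> bool:
    """True if a single marked rating satisfies a single required rating."""
    mark_solids, mark_liquids = parse_ip(mark)
    req_solids, req_liquids = parse_ip(required)
    if mark_solids < req_solids:
        return False
    if req_liquids == 0:
        return True
    if req_liquids <= 6:
        # Conservative assumption: an immersion-only mark (7 or 8) is not
        # counted toward drip, spray, or jet requirements.
        return req_liquids <= mark_liquids <= 6
    return mark_liquids >= req_liquids


def meets(required: list[str], marked: list[str]) -> bool:
    """Every requirement must be covered by at least one marked rating."""
    return all(any(covers(m, r) for m in marked) for r in required)


print(meets(["IP65", "IP67"], ["IP67"]))          # immersion-only mark
print(meets(["IP65", "IP67"], ["IP65", "IP67"]))  # dual-marked enclosure
```

A platform marked only IP67 fails a requirement for both jets and immersion in this check, while a dual-marked IP65/IP67 platform passes.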

Power transients are less visible but equally destructive. A mobile robot's power bus is shared between motors, servos, sensors, and compute. Motor start-up and braking generate voltage spikes and sag events that can exceed the absolute maximum ratings of compute silicon or corrupt memory contents mid-inference. Proper isolation requires bulk capacitance on the compute power rail, voltage regulation with adequate transient response, and in some cases dedicated DC-DC conversion for the AI accelerator to isolate it from the rest of the load. The cost of not doing this is not a clean shutdown — it is silent corruption, intermittent resets, and inference results that are wrong in ways that are difficult to attribute.
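
As a rough illustration of how the bulk capacitance might be sized, the sketch below uses an energy balance: a capacitor feeding a constant-power load, such as the input of a DC-DC converter, gives up the energy difference between its starting voltage and the minimum voltage the converter tolerates. It assumes the worst case in which the bus supplies nothing during the sag. All numbers are hypothetical, and capacitor ESR, tolerance, and ageing are ignored.

```python
def holdup_capacitance_f(load_power_w: float, sag_duration_s: float,
                         v_start: float, v_min: float) -> float:
    """Capacitance needed to ride through a bus sag at constant power.

    Energy balance: P * dt = 0.5 * C * (v_start**2 - v_min**2).
    """
    if not 0.0 < v_min < v_start:
        raise ValueError("expected 0 < v_min < v_start")
    return 2.0 * load_power_w * sag_duration_s / (v_start ** 2 - v_min ** 2)


# Hypothetical case: a 30 W accelerator rail fed from a 24 V bus, a 5 ms
# sag during motor start-up, and a converter that needs at least 18 V.
c_farads = holdup_capacitance_f(30.0, 5e-3, 24.0, 18.0)
print(f"minimum bulk capacitance: {c_farads * 1e6:.0f} uF")
```

Under these assumptions the answer is on the order of a millifarad, which is a physically large component on a small board. That is one reason a dedicated DC-DC stage with a wide input range is often the more practical form of isolation.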

Conformal coating is the last line of hardware defence. A thin layer of acrylic, polyurethane, or silicone coating over the assembled PCB resists moisture ingress, salt fog, and conductive contamination. A coated board cannot be reworked or repaired without stripping the coating first, which creates a production discipline: conformal coating happens after final test and before sealing. The choice of coating chemistry matters: silicone coatings offer the widest thermal range but are difficult to inspect, while acrylic coatings are easier to inspect and rework but less flexible and prone to cracking under sustained vibration. These are not decisions that can be made at the last stage of a product design cycle.

All of these constraints converge on the compute and power budget in a way that is qualitatively different from cloud or embedded IoT design. The on-board inference platform for a UGV has to deliver enough throughput for the perception and planning workloads the mission requires — and it has to do it within a power envelope set by the battery, at a weight that the chassis can carry, inside a thermal envelope the fanless enclosure can manage, across the full operational temperature and vibration range the vehicle will see. Every dimension of that budget is tighter than its datacenter equivalent. The path forward is not to transplant datacenter compute into a ruggedized enclosure; it is to source hardware designed from the outset for the physical environment it will inhabit.
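
One way to keep all of these limits in view at once is to screen candidate modules against every budget line together, rather than one constraint at a time. The sketch below is a hypothetical illustration of that idea. The field names, the two candidate modules, and every number are invented for the example and do not describe real products.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    power_w: float          # sustained draw at the required workload
    mass_kg: float          # module plus enclosure and heatsink
    throughput_tops: float
    temp_min_c: float       # rated operating range
    temp_max_c: float


@dataclass(frozen=True)
class Budget:
    power_w: float                 # allocation from the battery
    mass_kg: float                 # allocation from the chassis
    required_tops: float           # perception and planning workload
    fanless_dissipation_w: float   # what the sealed enclosure can shed
    ambient_min_c: float
    ambient_max_c: float


def violations(m: Module, b: Budget) -> list[str]:
    """Return every budget line the module breaks (empty list if none)."""
    checks = [
        (m.throughput_tops >= b.required_tops, "throughput below mission requirement"),
        (m.power_w <= b.power_w, "exceeds battery power allocation"),
        (m.power_w <= b.fanless_dissipation_w, "exceeds fanless dissipation limit"),
        (m.mass_kg <= b.mass_kg, "exceeds mass allocation"),
        (m.temp_min_c <= b.ambient_min_c, "not rated for cold soak"),
        (m.temp_max_c >= b.ambient_max_c, "not rated for hot ambient"),
    ]
    return [reason for ok, reason in checks if not ok]


# Hypothetical budget and candidates.
budget = Budget(power_w=40.0, mass_kg=3.0, required_tops=50.0,
                fanless_dissipation_w=30.0, ambient_min_c=-20.0, ambient_max_c=55.0)
candidates = [
    Module("module-a", 60.0, 2.5, 200.0, 0.0, 70.0),
    Module("module-b", 25.0, 1.8, 60.0, -40.0, 85.0),
]
for module in candidates:
    problems = violations(module, budget)
    print(module.name, "->", problems or "fits every budget line")
```

In this made-up comparison the faster module fails on power, heat, and cold rating, while the slower one fits. A single throughput benchmark would hide that outcome.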

The teams making progress on this are not necessarily the ones with the most sophisticated models. They are the ones who treated the hardware integration problem with the same rigour they applied to the software stack — and who resolved the mechanical, thermal, and electrical constraints before they wrote a line of inference code. In physical AI, the substrate is never neutral. What the hardware can survive defines what the system can do.

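Summary

Datacenter-grade AI compute hardware is not designed for field conditions. Edge AI deployment on unmanned ground vehicles (UGVs) faces five core challenges: shock and vibration, which cause solder joint fatigue and connector failure; wide-temperature operation, which calls for fanless thermal design; the tension between dust and water ingress ratings (IP65/67) and heat-dissipation paths; power transients from motor start-up and braking, which disrupt inference; and conformal coating processes for protection against moisture and contamination. Together, these constraints squeeze the compute and power budget of the on-board inference platform. The teams that actually deploy in the field are the ones that treat hardware integration as a first-class engineering problem before writing any inference code.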
