Deployment-Aligned Low-Precision Neural Architecture Search for Spaceborne Edge AI
Deployment-aligned low-precision NAS narrows the optimization-to-deployment gap for spaceborne edge AI, achieving 0.826 mIoU on-device for vessel segmentation.
Key Findings
Methodology
This paper proposes a method that integrates deployment-aligned low-precision training directly into hardware-aware Neural Architecture Search (NAS). Candidate architectures are exposed to FP16 numerical constraints during fine-tuning and evaluation, allowing joint optimization of architectural efficiency and numerical robustness without modifying the search space or evolutionary strategy. This approach is particularly suited for vessel segmentation in Earth Observation tasks, targeting the Intel Movidius Myriad X Visual Processing Unit (VPU).
Key Results
- Result 1: In vessel segmentation tasks, deployment-aligned low-precision training achieved 0.826 mIoU on the Intel Movidius Myriad X VPU, whereas post-training precision conversion reduced on-device performance from 0.85 to 0.78 mIoU.
- Result 2: For the same architecture (95,791 parameters), deployment-aligned low-precision training recovered approximately two-thirds of the accuracy loss without increasing model complexity.
- Result 3: Compared to GPU optimization, deployment-consistent numerical constraints significantly reduced the performance gap between optimization and deployment while maintaining compact model size and real-time execution capability.
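The "two-thirds" figure in Result 2 can be checked directly from the reported mIoU values:

```python
# Reported on-device mIoU values (taken from the paper's results).
fp32_reference = 0.85      # optimization-time mIoU before precision conversion
ptq_on_device = 0.78       # after post-training FP16 conversion
aligned_on_device = 0.826  # with deployment-aligned low-precision training

gap = fp32_reference - ptq_on_device           # deployment-induced accuracy gap
recovered = aligned_on_device - ptq_on_device  # accuracy regained by aligned training

fraction = recovered / gap
print(f"recovered {fraction:.0%} of the gap")  # ~66%, i.e. about two-thirds
```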
Significance
This research significantly enhances the robustness and alignment between optimization and deployment for resource-constrained edge AI by incorporating deployment-consistent numerical constraints into hardware-aware NAS. This approach is particularly significant for Earth Observation tasks, where rapid and autonomous decision-making is crucial in spaceborne systems. By reducing accuracy loss and maintaining model compactness, this method offers new possibilities for deploying deep learning models on low-power, low-memory edge devices.
Technical Contribution
The key technical contribution is the first integration of deployment-aligned low-precision training into hardware-aware NAS, jointly optimizing architectural efficiency and numerical robustness without modifying the search space or evolutionary strategy. Unlike existing methods, this approach applies FP16 numerical constraints during optimization itself, substantially narrowing the performance gap between optimization and deployment.
Novelty
This study is the first to introduce deployment-aligned low-precision training directly into the NAS process rather than treating it as a post-processing step. This innovation allows the consideration of numerical constraints during optimization, enhancing the robustness and performance of models in real deployment scenarios.
Limitations
- Limitation 1: This method primarily optimizes for FP16 numerical constraints and may not be applicable to other low-precision formats or more complex numerical constraint scenarios.
- Limitation 2: Although the method performs well on specific tasks and hardware, its generality and applicability to other tasks need further validation.
- Limitation 3: The computational cost might be high due to the need for low-precision training during NAS, especially in large-scale search spaces.
Future Work
Future research directions include exploring the applicability of this method to other low-precision formats and more complex numerical constraints, as well as its generality across different tasks and hardware platforms. Additionally, research can focus on reducing computational costs in larger search spaces while maintaining optimization-deployment consistency.
AI Executive Summary
In modern space missions, rapid and autonomous decision-making has become crucial, especially in Earth Observation tasks. Traditional neural network architectures often suffer from performance degradation when deployed on edge devices due to numerical precision conversion. Existing hardware-aware Neural Architecture Search (NAS) methods typically optimize under full precision and then convert to low precision during deployment, failing to effectively address the performance gap between optimization and deployment.
This paper proposes a novel method that integrates deployment-aligned low-precision training directly into hardware-aware NAS. By introducing FP16 numerical constraints during fine-tuning and evaluation, candidate architectures can optimize both architectural efficiency and numerical robustness without modifying the search space or evolutionary strategy. This approach is particularly suited for vessel segmentation in Earth Observation tasks, targeting the Intel Movidius Myriad X Visual Processing Unit (VPU).
In experiments, the proposed method achieved significant performance improvements in vessel segmentation tasks. Compared to traditional post-training precision conversion, deployment-aligned low-precision training recovered approximately two-thirds of the accuracy loss for the same architecture without increasing model complexity. This demonstrates that considering deployment-time numerical constraints during optimization can significantly enhance the robustness and performance of models in real deployment scenarios.
The significance of this method lies in its potential to offer new possibilities for deploying deep learning models on low-power, low-memory edge devices. By reducing accuracy loss and maintaining model compactness, this approach has important applications in space systems where rapid and autonomous decision-making is required.
However, the method also has some limitations. It primarily optimizes for FP16 numerical constraints and may not be applicable to other low-precision formats or more complex numerical constraint scenarios. Additionally, while the method performs well on specific tasks and hardware, its generality and applicability to other tasks need further validation.
Future research directions include exploring the applicability of this method to other low-precision formats and more complex numerical constraints, as well as its generality across different tasks and hardware platforms. Additionally, research can focus on reducing computational costs in larger search spaces while maintaining optimization-deployment consistency.
Deep Analysis
Background
With the rapid development of deep learning, Neural Architecture Search (NAS) has become an important method for automatically designing deep neural networks. Hardware-aware NAS extends this idea by incorporating device-level performance metrics such as latency, throughput, or memory footprint directly into the optimization loop, enabling architectures to be selected with deployment constraints in mind. However, despite explicitly accounting for hardware characteristics, most hardware-aware NAS pipelines still optimize candidate architectures under full-precision floating-point (FP32) training assumptions and apply low-precision adaptation only after the search is complete. This decoupling introduces a systematic mismatch between optimization-time behavior and deployment-time execution on low-precision edge accelerators, often resulting in substantial accuracy degradation once models are deployed.
Core Problem
In Earth Observation tasks, rapid and autonomous decision-making has become crucial. Traditional Earth Observation pipelines rely on downlinking raw or minimally processed imagery to ground stations, introducing delays due to limited visibility windows, bandwidth constraints, and the need for multiple ground station passes before actionable information can be generated. Recent missions and demonstrators have shown that performing inference directly on-board can significantly reduce these delays. However, this shift toward on-board intelligence places stringent constraints on the computational resources available for data processing. Spaceborne platforms, particularly small satellites and CubeSats, operate under tight limits on power consumption, memory footprint, and processing throughput, requiring deep learning models that are both compact and efficient.
Innovation
The core innovation of this paper lies in integrating deployment-aligned low-precision training directly into hardware-aware NAS rather than treating it as a post-processing step. This innovation allows the consideration of numerical constraints during optimization, enhancing the robustness and performance of models in real deployment scenarios. Specifically, candidate architectures are exposed to FP16 numerical constraints during fine-tuning and evaluation, allowing joint optimization of architectural efficiency and numerical robustness without modifying the search space or evolutionary strategy.
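One common way to expose a network to FP16 constraints while keeping FP32 master weights is to round-trip weights and activations through half precision in the forward pass. The sketch below illustrates this idea in PyTorch; it is a generic, minimal illustration of FP16-constrained fine-tuning, not the paper's exact recipe (the `FP16Linear` layer and toy model are assumptions for demonstration):

```python
import torch
import torch.nn as nn


def fp16_round(t: torch.Tensor) -> torch.Tensor:
    """Round-trip a tensor through half precision so downstream ops see
    FP16-quantized values while autograd keeps an FP32 gradient path."""
    return t.half().float()


class FP16Linear(nn.Linear):
    """Linear layer whose weights and activations are rounded to FP16 in
    the forward pass (illustrative; the paper's scheme may differ)."""

    def forward(self, x):
        w = fp16_round(self.weight)
        b = fp16_round(self.bias) if self.bias is not None else None
        return nn.functional.linear(fp16_round(x), w, b)


# Fine-tuning proceeds as usual: the loss is computed on FP16-constrained
# outputs, so the optimizer steers the FP32 master weights toward values
# that remain accurate after deployment-time precision conversion.
model = nn.Sequential(FP16Linear(8, 16), nn.ReLU(), FP16Linear(16, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(4, 8), torch.randn(4, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
```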
Methodology
- Search Space: a discrete space of single-path networks with up to n=6 configurable blocks, each sampled from a library of convolutional primitives and macro-blocks.
- Evolutionary Algorithm Settings: an evolutionary NAS strategy with population size s=16, run for G=10 generations.
- Device-in-the-Loop Evaluation: each candidate architecture is exported to an FP16 OpenVINO intermediate representation (IR) and executed on the Intel Movidius Myriad X VPU, directly measuring throughput and latency.
- Deployment-Aligned Low-Precision Training: each candidate architecture is first trained for 10 epochs in FP32, then fine-tuned for 10 further epochs under FP16 numerical constraints.
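The methodology above can be summarized as an evolutionary loop with device-in-the-loop fitness evaluation. The skeleton below sketches that loop; the primitive library and the helpers `evaluate` and `mutate` are hypothetical stand-ins, since real fitness evaluation would train each candidate (10 FP32 epochs + 10 FP16-aware epochs), export an FP16 OpenVINO IR, and measure mIoU and latency on the Myriad X:

```python
import random

POPULATION = 16   # s = 16
GENERATIONS = 10  # G = 10
N_BLOCKS = 6      # up to n = 6 configurable blocks per single-path network
PRIMITIVES = ["conv3x3", "conv5x5", "dwconv3x3", "macro_block"]  # example library


def random_architecture():
    """Sample a single-path network as an ordered list of block choices."""
    return [random.choice(PRIMITIVES) for _ in range(random.randint(1, N_BLOCKS))]


def mutate(arch):
    """Swap one block for a randomly chosen primitive."""
    child = list(arch)
    child[random.randrange(len(child))] = random.choice(PRIMITIVES)
    return child


def evaluate(arch):
    # Placeholder fitness. In the real pipeline this step would run
    # FP32 training, FP16-aware fine-tuning, IR export, and on-device
    # measurement, combining accuracy and latency into one score.
    return random.random()


population = [random_architecture() for _ in range(POPULATION)]
for gen in range(GENERATIONS):
    ranked = sorted(population, key=evaluate, reverse=True)
    parents = ranked[: POPULATION // 2]
    offspring = [mutate(random.choice(parents)) for _ in range(POPULATION - len(parents))]
    population = parents + offspring
print("best candidate after search:", population[0])
```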
Experiments
All experiments are conducted using the HRSC2016 dataset, a publicly available benchmark for high-resolution ship detection and segmentation. The dataset consists of images primarily collected from Google Earth, with ground sampling distances ranging from approximately 0.4 m to 2 m. In the experiments, all candidate architectures are trained on an AMD Radeon GPU using full-precision (FP32) arithmetic. Deployment-time evaluation is performed on the Intel Movidius Myriad X, a low-power edge accelerator representative of onboard computing platforms.
Results
In the experiments, deployment-aligned low-precision training achieved significant performance improvements in vessel segmentation tasks. Compared to traditional post-training precision conversion, deployment-aligned low-precision training recovered approximately two-thirds of the accuracy loss for the same architecture without increasing model complexity. This demonstrates that considering deployment-time numerical constraints during optimization can significantly enhance the robustness and performance of models in real deployment scenarios.
Applications
This method is particularly suited for vessel segmentation in Earth Observation tasks, targeting the Intel Movidius Myriad X Visual Processing Unit (VPU). By reducing accuracy loss and maintaining model compactness, this approach has important applications in space systems where rapid and autonomous decision-making is required.
Limitations & Outlook
This method primarily optimizes for FP16 numerical constraints and may not be applicable to other low-precision formats or more complex numerical constraint scenarios. Additionally, while the method performs well on specific tasks and hardware, its generality and applicability to other tasks need further validation. The computational cost might be high due to the need for low-precision training during NAS, especially in large-scale search spaces.
Plain Language (Accessible to non-experts)
Imagine you're in a kitchen cooking a meal. You need to make a delicious dish with limited ingredients and time. Now, imagine you have a smart assistant that can automatically design the best recipe for you based on the ingredients and time you have. That's what Neural Architecture Search (NAS) does; it helps us automatically design optimal neural network architectures.
However, in practice, we often need to cook in different kitchens (hardware), and each kitchen has different equipment and conditions. Some kitchens might only have a small stove (low-precision hardware), while we assume a large stove (full-precision hardware) when designing the recipe. This can lead to the dish not tasting as good when actually cooked.
The method in this paper is like considering the conditions of different kitchens when designing the recipe. This way, whether cooking on a large stove or a small stove, the dish tastes good. This method is particularly useful in space missions where rapid and autonomous decision-making is needed, such as monitoring vessels from satellites.
With this approach, we can make tastier dishes (improve model accuracy and efficiency) without adding more ingredients (model complexity). This offers new possibilities for deploying deep learning models in resource-constrained environments.
ELI14 (Explained like you're 14)
Hey there! Imagine you're playing a super cool game where you need to defeat enemies with limited time and resources. You have a smart assistant that can automatically design the best strategy for you based on the resources and time you have. That's what Neural Architecture Search (NAS) does; it helps us automatically design optimal neural network architectures.
But here's the problem! In the actual game, we often need to play on different devices, and each device has different performance. Some devices might only have low graphics (low-precision hardware), while we assume high graphics (full-precision hardware) when designing the strategy. This can lead to lag or frame drops during the actual game.
The method in this paper is like considering the performance of different devices when designing the strategy. This way, whether playing on high graphics or low graphics devices, the gaming experience is smooth. This method is particularly useful in tasks where rapid and autonomous decision-making is needed, like monitoring vessels from satellites.
With this approach, we can score higher (improve model accuracy and efficiency) without adding more resources (model complexity). This offers new possibilities for deploying deep learning models in resource-constrained environments.
Glossary
Neural Architecture Search (NAS)
A method for automatically designing neural network architectures by optimizing architectures in a predefined search space to meet specific task requirements.
Used in this paper to design efficient neural networks that meet deployment constraints.
Hardware-Aware Optimization
Considers hardware characteristics and limitations during optimization to select architectures suitable for specific devices.
Used in NAS to account for device-level performance metrics.
Low-Precision Training
Uses low-precision numerical formats (e.g., FP16) during training to reduce computational cost and memory usage.
Integrated into NAS in this paper to enhance deployment-time numerical robustness.
FP16 (Half-Precision Floating Point)
A 16-bit floating-point format used to reduce computation and storage demands, especially on resource-constrained hardware.
Used in this paper to simulate deployment-time numerical constraints.
Evolutionary Algorithm
An optimization algorithm based on natural selection and genetic mechanisms, used to find optimal solutions in a search space.
Used in NAS to generate and evaluate candidate architectures.
Device-in-the-Loop Evaluation
Directly evaluates candidate architectures' performance on the target hardware to capture hardware-specific behavior.
Used in NAS to measure throughput and latency of candidate architectures.
Post-Training Quantization (PTQ)
Converts a trained model to a low-precision format after training to reduce computational cost.
Compared with deployment-aligned low-precision training in this paper.
Vessel Segmentation
An image processing task aimed at identifying and delineating ships (vessels) in imagery.
Used as an experimental task to evaluate the effectiveness of the method.
Earth Observation
The process of acquiring information about Earth's surface from space using satellites or other devices.
One of the application scenarios in this paper.
Intel Movidius Myriad X
A low-power visual processing unit (VPU) used for neural network inference on edge devices.
Used as the target hardware for experimental evaluation in this paper.
Open Questions (Unanswered questions from this research)
- Question 1: How well does the method generalize beyond the tasks and hardware tested here? Experiments on additional platforms and applications are needed to assess its robustness.
- Question 2: Low-precision training inside large search spaces can be computationally expensive. How can search cost be reduced while preserving optimization-deployment consistency?
- Question 3: The method targets FP16 specifically. How do numerical constraints under other low-precision formats affect model performance and search outcomes?
- Question 4: Edge-device hardware characteristics beyond numerical precision also affect deployed performance. How can NAS simulate or measure these characteristics more faithfully during the search?
- Question 5: The method is demonstrated on vessel segmentation only; its applicability and performance on other tasks require further experimental validation.
Applications
Immediate Applications
Vessel Monitoring in Space Missions
Improve vessel segmentation model performance on edge devices using deployment-aligned low-precision training, suitable for space missions requiring rapid decision-making.
Image Processing on Low-Power Edge Devices
Deploy efficient image processing models on low-power, low-memory edge devices to improve processing speed and accuracy for various real-time applications.
Object Detection in Intelligent Transportation Systems
Apply deployment-aligned low-precision training in intelligent transportation systems to improve object detection model performance on edge devices for more efficient traffic management.
Long-term Vision
Autonomous Space Exploration
Apply efficient edge AI in space exploration missions to achieve more autonomous decision-making and operations, reducing reliance on ground stations.
Edge Computing in Smart Cities
Apply efficient edge AI in smart cities to achieve more intelligent city management and services, such as real-time monitoring and emergency response.
Abstract
Designing deep networks that meet strict latency and accuracy constraints on edge accelerators increasingly relies on hardware-aware optimization, including neural architecture search (NAS) guided by device-level metrics. Yet most hardware-aware NAS pipelines still optimize architectures under full-precision assumptions and apply low-precision adaptation only after the search, leading to a mismatch between optimization-time behavior and deployment-time execution on low-precision hardware that can substantially degrade accuracy. We address this limitation by integrating deployment-aligned low-precision training directly into hardware-aware NAS. Candidate architectures are exposed to FP16 numerical constraints during fine-tuning and evaluation, enabling joint optimization of architectural efficiency and numerical robustness without modifying the search space or evolutionary strategy. We evaluate the proposed framework on vessel segmentation for spaceborne maritime monitoring, targeting the Intel Movidius Myriad X Visual Processing Unit (VPU). While post-training precision conversion reduces on-device performance from 0.85 to 0.78 mIoU, deployment-aligned low-precision training achieves 0.826 mIoU on-device for the same architecture (95,791 parameters), recovering approximately two-thirds of the deployment-induced accuracy gap without increasing model complexity. These results demonstrate that incorporating deployment-consistent numerical constraints into hardware-aware NAS substantially improves robustness and alignment between optimization and deployment for resource-constrained edge Artificial Intelligence (AI).