Deployment-Aligned Low-Precision Neural Architecture Search for Spaceborne Edge AI
Deployment-aligned low-precision NAS narrows the optimization-to-deployment gap for spaceborne edge AI, achieving 0.826 mIoU on-device for vessel segmentation.
Key Findings
Methodology
This paper proposes a method that integrates deployment-aligned low-precision training directly into hardware-aware Neural Architecture Search (NAS). Candidate architectures are exposed to FP16 numerical constraints during fine-tuning and evaluation, allowing joint optimization of architectural efficiency and numerical robustness without modifying the search space or evolutionary strategy. This approach is particularly suited for vessel segmentation in Earth Observation tasks, targeting the Intel Movidius Myriad X Visual Processing Unit (VPU).
Key Results
- Result 1: In vessel segmentation tasks, deployment-aligned low-precision training achieved 0.826 mIoU on the Intel Movidius Myriad X VPU, whereas post-training precision conversion reduced on-device performance from 0.85 to 0.78 mIoU.
- Result 2: For the same architecture (95,791 parameters), deployment-aligned low-precision training recovered approximately two-thirds of the accuracy loss without increasing model complexity.
- Result 3: Compared to GPU optimization, deployment-consistent numerical constraints significantly reduced the performance gap between optimization and deployment while maintaining compact model size and real-time execution capability.
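The "two-thirds" figure in Result 2 can be checked directly from the reported mIoU values:

```python
# Reported on-device mIoU values (taken from the paper's results).
fp32_reference = 0.85      # optimization-time mIoU before precision conversion
ptq_on_device = 0.78       # after post-training FP16 conversion
aligned_on_device = 0.826  # with deployment-aligned low-precision training

gap = fp32_reference - ptq_on_device           # deployment-induced accuracy gap
recovered = aligned_on_device - ptq_on_device  # accuracy regained by aligned training

fraction = recovered / gap
print(f"recovered {fraction:.0%} of the gap")  # ~66%, i.e. about two-thirds
```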
Significance
This research significantly enhances the robustness and alignment between optimization and deployment for resource-constrained edge AI by incorporating deployment-consistent numerical constraints into hardware-aware NAS. This approach is particularly significant for Earth Observation tasks, where rapid and autonomous decision-making is crucial in spaceborne systems. By reducing accuracy loss and maintaining model compactness, this method offers new possibilities for deploying deep learning models on low-power, low-memory edge devices.
Technical Contribution
The key technical contribution is the first integration of deployment-aligned low-precision training into hardware-aware NAS, jointly optimizing architectural efficiency and numerical robustness without modifying the search space or evolutionary strategy. Unlike existing methods, this approach applies FP16 numerical constraints during optimization itself, substantially narrowing the performance gap between optimization and deployment.
Novelty
This study is the first to introduce deployment-aligned low-precision training directly into the NAS process rather than treating it as a post-processing step. This innovation allows the consideration of numerical constraints during optimization, enhancing the robustness and performance of models in real deployment scenarios.
Limitations
- Limitation 1: This method primarily optimizes for FP16 numerical constraints and may not be applicable to other low-precision formats or more complex numerical constraint scenarios.
- Limitation 2: Although the method performs well on specific tasks and hardware, its generality and applicability to other tasks need further validation.
- Limitation 3: The computational cost might be high due to the need for low-precision training during NAS, especially in large-scale search spaces.
Future Work
Future research directions include exploring the applicability of this method to other low-precision formats and more complex numerical constraints, as well as its generality across different tasks and hardware platforms. Additionally, research can focus on reducing computational costs in larger search spaces while maintaining optimization-deployment consistency.
AI Executive Summary
In modern space missions, rapid and autonomous decision-making has become crucial, especially in Earth Observation tasks. Traditional neural network architectures often suffer from performance degradation when deployed on edge devices due to numerical precision conversion. Existing hardware-aware Neural Architecture Search (NAS) methods typically optimize under full precision and then convert to low precision during deployment, failing to effectively address the performance gap between optimization and deployment.
This paper proposes a novel method that integrates deployment-aligned low-precision training directly into hardware-aware NAS. By introducing FP16 numerical constraints during fine-tuning and evaluation, candidate architectures can optimize both architectural efficiency and numerical robustness without modifying the search space or evolutionary strategy. This approach is particularly suited for vessel segmentation in Earth Observation tasks, targeting the Intel Movidius Myriad X Visual Processing Unit (VPU).
In experiments, the proposed method achieved significant performance improvements in vessel segmentation tasks. Compared to traditional post-training precision conversion, deployment-aligned low-precision training recovered approximately two-thirds of the accuracy loss for the same architecture without increasing model complexity. This demonstrates that considering deployment-time numerical constraints during optimization can significantly enhance the robustness and performance of models in real deployment scenarios.
The significance of this method lies in its potential to offer new possibilities for deploying deep learning models on low-power, low-memory edge devices. By reducing accuracy loss and maintaining model compactness, this approach has important applications in space systems where rapid and autonomous decision-making is required.
However, the method also has some limitations. It primarily optimizes for FP16 numerical constraints and may not be applicable to other low-precision formats or more complex numerical constraint scenarios. Additionally, while the method performs well on specific tasks and hardware, its generality and applicability to other tasks need further validation.
Future research directions include exploring the applicability of this method to other low-precision formats and more complex numerical constraints, as well as its generality across different tasks and hardware platforms. Additionally, research can focus on reducing computational costs in larger search spaces while maintaining optimization-deployment consistency.
Deep Analysis
Background
With the rapid development of deep learning, Neural Architecture Search (NAS) has become an important method for automatically designing deep neural networks. Hardware-aware NAS extends this idea by incorporating device-level performance metrics such as latency, throughput, or memory footprint directly into the optimization loop, enabling architectures to be selected with deployment constraints in mind. However, despite explicitly accounting for hardware characteristics, most hardware-aware NAS pipelines still optimize candidate architectures under full-precision floating-point (FP32) training assumptions and apply low-precision adaptation only after the search is complete. This decoupling introduces a systematic mismatch between optimization-time behavior and deployment-time execution on low-precision edge accelerators, often resulting in substantial accuracy degradation once models are deployed.
Core Problem
In Earth Observation tasks, rapid and autonomous decision-making has become crucial. Traditional Earth Observation pipelines rely on downlinking raw or minimally processed imagery to ground stations, introducing delays due to limited visibility windows, bandwidth constraints, and the need for multiple ground station passes before actionable information can be generated. Recent missions and demonstrators have shown that performing inference directly on-board can significantly reduce these delays. However, this shift toward on-board intelligence places stringent constraints on the computational resources available for data processing. Spaceborne platforms, particularly small satellites and CubeSats, operate under tight limits on power consumption, memory footprint, and processing throughput, requiring deep learning models that are both compact and efficient.
Innovation
The core innovation of this paper lies in integrating deployment-aligned low-precision training directly into hardware-aware NAS rather than treating it as a post-processing step. This innovation allows the consideration of numerical constraints during optimization, enhancing the robustness and performance of models in real deployment scenarios. Specifically, candidate architectures are exposed to FP16 numerical constraints during fine-tuning and evaluation, allowing joint optimization of architectural efficiency and numerical robustness without modifying the search space or evolutionary strategy.
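One common way to expose a network to FP16 constraints while keeping FP32 master weights is to round-trip weights and activations through half precision in the forward pass. The sketch below illustrates this idea in PyTorch; it is a generic, minimal illustration of FP16-constrained fine-tuning, not the paper's exact recipe (the `FP16Linear` layer and toy model are assumptions for demonstration):

```python
import torch
import torch.nn as nn


def fp16_round(t: torch.Tensor) -> torch.Tensor:
    """Round-trip a tensor through half precision so downstream ops see
    FP16-quantized values while autograd keeps an FP32 gradient path."""
    return t.half().float()


class FP16Linear(nn.Linear):
    """Linear layer whose weights and activations are rounded to FP16 in
    the forward pass (illustrative; the paper's scheme may differ)."""

    def forward(self, x):
        w = fp16_round(self.weight)
        b = fp16_round(self.bias) if self.bias is not None else None
        return nn.functional.linear(fp16_round(x), w, b)


# Fine-tuning proceeds as usual: the loss is computed on FP16-constrained
# outputs, so the optimizer steers the FP32 master weights toward values
# that remain accurate after deployment-time precision conversion.
model = nn.Sequential(FP16Linear(8, 16), nn.ReLU(), FP16Linear(16, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(4, 8), torch.randn(4, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
```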
Methodology
- Search Space: a discrete space of single-path networks with up to n=6 configurable blocks, each sampled from a library of convolutional primitives and macro-blocks.
- Evolutionary Algorithm Settings: an evolutionary NAS strategy with population size s=16, run for G=10 generations.
- Device-in-the-Loop Evaluation: each candidate architecture is exported to an FP16 OpenVINO intermediate representation (IR) and executed on the Intel Movidius Myriad X VPU, directly measuring throughput and latency.
- Deployment-Aligned Low-Precision Training: each candidate architecture is first trained for 10 epochs in FP32, then fine-tuned for 10 further epochs under FP16 numerical constraints.
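The methodology above can be summarized as an evolutionary loop with device-in-the-loop fitness evaluation. The skeleton below sketches that loop; the primitive library and the helpers `evaluate` and `mutate` are hypothetical stand-ins, since real fitness evaluation would train each candidate (10 FP32 epochs + 10 FP16-aware epochs), export an FP16 OpenVINO IR, and measure mIoU and latency on the Myriad X:

```python
import random

POPULATION = 16   # s = 16
GENERATIONS = 10  # G = 10
N_BLOCKS = 6      # up to n = 6 configurable blocks per single-path network
PRIMITIVES = ["conv3x3", "conv5x5", "dwconv3x3", "macro_block"]  # example library


def random_architecture():
    """Sample a single-path network as an ordered list of block choices."""
    return [random.choice(PRIMITIVES) for _ in range(random.randint(1, N_BLOCKS))]


def mutate(arch):
    """Swap one block for a randomly chosen primitive."""
    child = list(arch)
    child[random.randrange(len(child))] = random.choice(PRIMITIVES)
    return child


def evaluate(arch):
    # Placeholder fitness. In the real pipeline this step would run
    # FP32 training, FP16-aware fine-tuning, IR export, and on-device
    # measurement, combining accuracy and latency into one score.
    return random.random()


population = [random_architecture() for _ in range(POPULATION)]
for gen in range(GENERATIONS):
    ranked = sorted(population, key=evaluate, reverse=True)
    parents = ranked[: POPULATION // 2]
    offspring = [mutate(random.choice(parents)) for _ in range(POPULATION - len(parents))]
    population = parents + offspring
print("best candidate after search:", population[0])
```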
Experiments
All experiments are conducted using the HRSC2016 dataset, a publicly available benchmark for high-resolution ship detection and segmentation. The dataset consists of images primarily collected from Google Earth, with ground sampling distances ranging from approximately 0.4 m to 2 m. In the experiments, all candidate architectures are trained on an AMD Radeon GPU using full-precision (FP32) arithmetic. Deployment-time evaluation is performed on the Intel Movidius Myriad X, a low-power edge accelerator representative of onboard computing platforms.
Results
In the experiments, deployment-aligned low-precision training achieved significant performance improvements in vessel segmentation tasks. Compared to traditional post-training precision conversion, deployment-aligned low-precision training recovered approximately two-thirds of the accuracy loss for the same architecture without increasing model complexity. This demonstrates that considering deployment-time numerical constraints during optimization can significantly enhance the robustness and performance of models in real deployment scenarios.
Applications
This method is particularly suited for vessel segmentation in Earth Observation tasks, targeting the Intel Movidius Myriad X Visual Processing Unit (VPU). By reducing accuracy loss and maintaining model compactness, this approach has important applications in space systems where rapid and autonomous decision-making is required.
Limitations & Outlook
This method primarily optimizes for FP16 numerical constraints and may not be applicable to other low-precision formats or more complex numerical constraint scenarios. Additionally, while the method performs well on specific tasks and hardware, its generality and applicability to other tasks need further validation. The computational cost might be high due to the need for low-precision training during NAS, especially in large-scale search spaces.
Plain Language (Accessible to non-experts)
Imagine you're in a kitchen cooking a meal. You need to make a delicious dish with limited ingredients and time. Now, imagine you have a smart assistant that can automatically design the best recipe for you based on the ingredients and time you have. That's what Neural Architecture Search (NAS) does; it helps us automatically design optimal neural network architectures.
However, in practice, we often need to cook in different kitchens (hardware), and each kitchen has different equipment and conditions. Some kitchens might only have a small stove (low-precision hardware), while we assume a large stove (full-precision hardware) when designing the recipe. This can lead to the dish not tasting as good when actually cooked.
The method in this paper is like considering the conditions of different kitchens when designing the recipe. This way, whether cooking on a large stove or a small stove, the dish tastes good. This method is particularly useful in space missions where rapid and autonomous decision-making is needed, such as monitoring vessels from satellites.
With this approach, we can make tastier dishes (improve model accuracy and efficiency) without adding more ingredients (model complexity). This offers new possibilities for deploying deep learning models in resource-constrained environments.
ELI14 (Explained like you're 14)
Hey there! Imagine you're playing a super cool game where you need to defeat enemies with limited time and resources. You have a smart assistant that can automatically design the best strategy for you based on the resources and time you have. That's what Neural Architecture Search (NAS) does; it helps us automatically design optimal neural network architectures.
But here's the problem! In the actual game, we often need to play on different devices, and each device has different performance. Some devices might only have low graphics (low-precision hardware), while we assume high graphics (full-precision hardware) when designing the strategy. This can lead to lag or frame drops during the actual game.
The method in this paper is like considering the performance of different devices when designing the strategy. This way, whether playing on high graphics or low graphics devices, the gaming experience is smooth. This method is particularly useful in tasks where rapid and autonomous decision-making is needed, like monitoring vessels from satellites.
With this approach, we can score higher (improve model accuracy and efficiency) without adding more resources (model complexity). This offers new possibilities for deploying deep learning models in resource-constrained environments.
Glossary
Neural Architecture Search (NAS)
A method for automatically designing neural network architectures by optimizing architectures in a predefined search space to meet specific task requirements.
Used in this paper to design efficient neural networks that meet deployment constraints.
Hardware-Aware Optimization
Considers hardware characteristics and limitations during optimization to select architectures suitable for specific devices.
Used in NAS to account for device-level performance metrics.
Low-Precision Training
Uses low-precision numerical formats (e.g., FP16) during training to reduce computational cost and memory usage.
Integrated into NAS in this paper to enhance deployment-time numerical robustness.
FP16 (Half-Precision Floating Point)
A 16-bit floating-point format used to reduce computation and storage demands, especially on resource-constrained hardware.
Used in this paper to simulate deployment-time numerical constraints.
Evolutionary Algorithm
An optimization algorithm based on natural selection and genetic mechanisms, used to find optimal solutions in a search space.
Used in NAS to generate and evaluate candidate architectures.
Device-in-the-Loop Evaluation
Directly evaluates candidate architectures' performance on the target hardware to capture hardware-specific behavior.
Used in NAS to measure throughput and latency of candidate architectures.
Post-Training Quantization (PTQ)
Converts a trained model to a low-precision format after training to reduce computational cost.
Compared with deployment-aligned low-precision training in this paper.
Vessel Segmentation
An image processing task aimed at identifying and delineating ships (vessels) in imagery.
Used as an experimental task to evaluate the effectiveness of the method.
Earth Observation
The process of acquiring information about Earth's surface from space using satellites or other devices.
One of the application scenarios in this paper.
Intel Movidius Myriad X
A low-power visual processing unit (VPU) used for neural network inference on edge devices.
Used as the target hardware for experimental evaluation in this paper.
Open Questions (Unanswered questions from this research)
- Question 1: How well does the method generalize beyond the tasks and hardware tested here? Experiments on additional platforms and applications are needed to assess its robustness.
- Question 2: Low-precision training inside large search spaces can be computationally expensive. How can search cost be reduced while preserving optimization-deployment consistency?
- Question 3: The method targets FP16 specifically. How do numerical constraints under other low-precision formats affect model performance and search outcomes?
- Question 4: Edge-device hardware characteristics beyond numerical precision also affect deployed performance. How can NAS simulate or measure these characteristics more faithfully during the search?
- Question 5: The method is demonstrated on vessel segmentation only; its applicability and performance on other tasks require further experimental validation.
Applications
Immediate Applications
Vessel Monitoring in Space Missions
Improve vessel segmentation model performance on edge devices using deployment-aligned low-precision training, suitable for space missions requiring rapid decision-making.
Image Processing on Low-Power Edge Devices
Deploy efficient image processing models on low-power, low-memory edge devices to improve processing speed and accuracy for various real-time applications.
Object Detection in Intelligent Transportation Systems
Apply deployment-aligned low-precision training in intelligent transportation systems to improve object detection model performance on edge devices for more efficient traffic management.
Long-term Vision
Autonomous Space Exploration
Apply efficient edge AI in space exploration missions to achieve more autonomous decision-making and operations, reducing reliance on ground stations.
Edge Computing in Smart Cities
Apply efficient edge AI in smart cities to achieve more intelligent city management and services, such as real-time monitoring and emergency response.
Abstract
Designing deep networks that meet strict latency and accuracy constraints on edge accelerators increasingly relies on hardware-aware optimization, including neural architecture search (NAS) guided by device-level metrics. Yet most hardware-aware NAS pipelines still optimize architectures under full-precision assumptions and apply low-precision adaptation only after the search, leading to a mismatch between optimization-time behavior and deployment-time execution on low-precision hardware that can substantially degrade accuracy. We address this limitation by integrating deployment-aligned low-precision training directly into hardware-aware NAS. Candidate architectures are exposed to FP16 numerical constraints during fine-tuning and evaluation, enabling joint optimization of architectural efficiency and numerical robustness without modifying the search space or evolutionary strategy. We evaluate the proposed framework on vessel segmentation for spaceborne maritime monitoring, targeting the Intel Movidius Myriad X Visual Processing Unit (VPU). While post-training precision conversion reduces on-device performance from 0.85 to 0.78 mIoU, deployment-aligned low-precision training achieves 0.826 mIoU on-device for the same architecture (95,791 parameters), recovering approximately two-thirds of the deployment-induced accuracy gap without increasing model complexity. These results demonstrate that incorporating deployment-consistent numerical constraints into hardware-aware NAS substantially improves robustness and alignment between optimization and deployment for resource-constrained edge Artificial Intelligence (AI).