Dual Pose-Graph Semantic Localization for Vision-Based Autonomous Drone Racing
The proposed dual pose-graph semantic localization reduces ATE by 56% to 74% on the TII-RATM dataset.
Key Findings
Methodology
This study introduces a dual pose-graph architecture that fuses odometry with semantic detections for robust localization. A temporary graph accumulates multiple gate observations between keyframes and optimizes them into a single refined constraint per landmark, which is then promoted to a persistent main graph. This design preserves the information richness of frequent detections while preventing graph growth from degrading real-time performance. The system is designed to be sensor-agnostic, and in this work, it is validated using monocular visual-inertial odometry and visual gate detections.
Key Results
- Experimental evaluation on the TII-RATM dataset shows a 56% to 74% reduction in Absolute Trajectory Error (ATE) compared to standalone VIO.
- An ablation study confirms that the dual-graph architecture achieves 10% to 12% higher accuracy than a single-graph baseline at identical computational cost.
- Deployment in the A2RL competition demonstrated that the system performs real-time onboard localization during flight, reducing the drift of the odometry baseline by up to 4.2 m per lap.
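ATE, the headline metric above, is conventionally computed as the root-mean-square of per-pose position errors between an estimated and a ground-truth trajectory. A minimal sketch, assuming the trajectories are already time-associated and expressed in a common frame (a full evaluation would first apply a rigid alignment, e.g. Umeyama); the sample trajectories are made up for illustration:

```python
import math

def ate_rmse(estimated, ground_truth):
    """Absolute Trajectory Error as RMSE of per-pose position errors.

    Assumes the two trajectories are time-associated and expressed in
    the same frame; a complete pipeline would align them first.
    """
    assert len(estimated) == len(ground_truth)
    sq_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Toy trajectories: the estimate drifts 0.3 m along x at every pose.
gt  = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
est = [(0.3, 0.0, 1.0), (1.3, 0.0, 1.0), (2.3, 0.0, 1.0)]
print(f"ATE: {ate_rmse(est, gt):.2f} m")  # constant 0.3 m offset -> ATE 0.30 m
```

A 56% to 74% reduction means this number shrinks to roughly a quarter to a half of the standalone-VIO value on the same sequences.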
Significance
This research holds significant implications for both academia and industry. It addresses the limitations of existing visual SLAM systems under high-speed flight and aggressive maneuvers, particularly in drone racing. The dual pose-graph architecture enhances localization accuracy without increasing computational cost, opening new possibilities for autonomous drone racing and advancing the application of drone technology in complex dynamic environments.
Technical Contribution
The technical contributions include the introduction of a novel dual pose-graph architecture that improves localization accuracy without increasing computational cost compared to existing single-graph methods. By compressing multiple observations into a single constraint, the method addresses the issue of graph growth leading to degraded real-time performance. Additionally, the sensor-agnostic design of the system allows for broad applicability.
Novelty
This study is the first to introduce a dual pose-graph architecture in drone racing, tightly coupling semantic detections with odometry for drift-corrected localization. Compared to existing methods, this approach significantly improves localization accuracy without increasing computational cost, demonstrating advantages in highly dynamic environments.
Limitations
- Because detections are integrated only at keyframes, the main graph is optimized less frequently than the detection rate, which may reduce correction responsiveness when very high update rates are required.
- Performance may degrade under extreme lighting conditions when the underlying odometry relies on direct methods, which are sensitive to photometric changes.
- While the system is designed to be sensor-agnostic, its performance under different sensor configurations still needs further validation.
Future Work
Future research directions include validating the sensor-agnostic capabilities of the framework and exploring integration with additional odometry and detection sources. Additionally, the study can be extended to other robotics applications, such as autonomous driving and indoor navigation, to verify its applicability and robustness in different environments.
AI Executive Summary
In drone racing, precise real-time localization is crucial for achieving autonomous flight. However, existing visual SLAM systems often perform poorly under high-speed flight and aggressive maneuvers, primarily due to motion blur and feature instability. To address this issue, researchers have proposed a dual pose-graph architecture that fuses odometry with semantic detections for robust localization. This method accumulates multiple gate observations between keyframes and optimizes them into a single refined constraint per landmark, which is then promoted to a persistent main graph. This approach preserves the information richness of frequent detections while preventing graph growth from degrading real-time performance.
In experiments, the researchers validated the effectiveness of this method using the TII-RATM dataset. The results show a 56% to 74% reduction in Absolute Trajectory Error (ATE) compared to standalone visual-inertial odometry (VIO). Furthermore, an ablation study indicates that the dual-graph architecture achieves 10% to 12% higher accuracy than a single-graph baseline at identical computational cost. These findings highlight the potential of this method in autonomous drone racing.
The technical contributions of this study include the introduction of a novel dual pose-graph architecture that improves localization accuracy without increasing computational cost compared to existing single-graph methods. By compressing multiple observations into a single constraint, the method addresses the issue of graph growth leading to degraded real-time performance. Additionally, the sensor-agnostic design of the system allows for broad applicability.
However, the method also has some limitations. Because detections are integrated only at keyframes, the dual-graph architecture optimizes less frequently, which may reduce correction responsiveness under very high update rates. Additionally, direct odometry methods are sensitive to photometric changes, which may affect the system's performance under extreme lighting conditions. Nevertheless, this study opens new possibilities for the application of drone technology in complex dynamic environments.
Future research directions include validating the sensor-agnostic capabilities of the framework and exploring integration with additional odometry and detection sources. The study can also be extended to other robotics applications, such as autonomous driving and indoor navigation, to verify its applicability and robustness in different environments. Overall, this research advances autonomous drone racing and the broader application of drone technology in complex dynamic environments.
Deep Analysis
Background
Autonomous drone racing has emerged as a challenging benchmark, pushing the limits of onboard sensing and computation. Competitions require drones to navigate sequences of racing gates at high speed, often relying on a single monocular camera for perception. Under these conditions, accurate and robust localization is critical for trajectory planning and gate traversal, yet existing visual SLAM systems often struggle due to motion blur and feature instability. Feature-based systems like ORB-SLAM3 can mitigate drift through loop closure, but they rely on stable visual features that degrade during fast motion. Visual-inertial odometry (VIO) approaches like VINS-Mono offer improved robustness through IMU fusion, but IMU data may not always be available or reliable under the extreme dynamics encountered in racing. Direct methods, in turn, are sensitive to photometric changes, which hurts performance under varying lighting conditions.
Core Problem
The core problem in drone racing is achieving precise real-time localization under high-speed and aggressive maneuvering conditions. Existing visual SLAM systems often perform poorly under these conditions due to motion blur and feature instability. Additionally, these systems do not exploit the structured nature of racing environments. Racing tracks provide a strong semantic prior: gates are distinctive, repeated landmarks whose positions define the track layout. Incorporating gate detections into the localization pipeline can provide drift-correcting constraints analogous to loop closures. However, naively adding every gate observation as a new edge in a pose graph rapidly inflates the graph, increasing optimization time and undermining real-time performance.
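The graph-growth argument can be made concrete with simple edge counting: naively, every detection becomes a graph edge, while the compressed scheme adds one refined constraint per landmark per keyframe interval. The rates below are hypothetical, chosen only for illustration; they are not figures from the paper:

```python
# Illustrative comparison of pose-graph edge growth.
# All rates below are hypothetical, not taken from the paper.
detection_rate_hz = 30          # gate detections arriving per second
keyframe_rate_hz = 2            # main-graph keyframes per second
lap_time_s = 60
gates_visible_per_interval = 1  # landmarks seen between two keyframes

# Naive: every detection becomes a new edge in the pose graph.
naive_edges = detection_rate_hz * lap_time_s

# Compressed: one refined constraint per landmark per keyframe interval.
compressed_edges = keyframe_rate_hz * lap_time_s * gates_visible_per_interval

print(naive_edges, compressed_edges)  # 1800 vs 120 edges added per lap
```

Since pose-graph optimization time grows with the number of edges, this order-of-magnitude reduction is what keeps optimization tractable in real time over multiple laps.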
Innovation
The core innovations of this study center on a dual pose-graph architecture that addresses the limitations of existing methods in drone racing:
- The architecture accumulates multiple gate observations between keyframes and optimizes them into a single refined constraint per landmark, which is then promoted to a persistent main graph, preserving the information richness of frequent detections.
- Compressing observations this way improves localization accuracy without increasing computational cost.
- The sensor-agnostic design enables broad applicability, allowing integration with different odometry and detection sources.
Methodology
Method details:
- Input: odometry estimates from any source (visual odometry, VIO, or other) providing relative pose constraints, plus detections of semantic objects providing bearing and range measurements to landmarks.
- Graph representation: the method is formulated as a factor graph composed of two types of nodes and two types of edges.
- Temporary graph: between main-graph keyframes, a temporary graph accumulates detection edges at high frequency.
- Main graph: the main graph maintains a compact, long-lived representation suitable for incremental optimization and loop closure.
- Optimization: the optimal trajectory and landmark estimates are obtained by minimizing the sum of squared Mahalanobis-distance errors over all edges.
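The paper does not publish its solver, but the key idea of refining many detections into one constraint can be illustrated with an information-weighted (inverse-variance) fusion of repeated landmark measurements. This is a sketch under an independent, isotropic Gaussian-noise assumption, a stand-in for the temporary-graph optimization rather than the authors' implementation:

```python
# Minimal sketch of measurement compression: fuse N noisy observations
# of one landmark into a single constraint with combined information.
# Assumes independent isotropic Gaussian noise; illustrative only, not
# the paper's actual temporary-graph optimizer.

def fuse_observations(observations):
    """observations: list of ((x, y), variance) pairs for one landmark.

    Returns the information-weighted mean position and the fused
    variance, which would back a single refined edge promoted to the
    main graph.
    """
    total_info = sum(1.0 / var for _, var in observations)
    fused_x = sum(x / var for (x, _), var in observations) / total_info
    fused_y = sum(y / var for (_, y), var in observations) / total_info
    return (fused_x, fused_y), 1.0 / total_info

# Three sightings of the same gate; the last one is the most certain.
obs = [((4.9, 2.1), 0.04), ((5.1, 1.9), 0.04), ((5.0, 2.0), 0.01)]
mean, fused_var = fuse_observations(obs)
print(mean, fused_var)
```

Note that the fused variance is smaller than any single observation's, so the one promoted edge carries more information than any raw detection while adding only a single edge to the main graph.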
Experiments
The researchers validated the method on the TII-RATM dataset, which provides high-resolution images, IMU data, and motion-capture ground-truth poses. The experiments followed a standard protocol, using a downsampled image stream and the built-in online calibration system for camera calibration. The system was also validated on sequences collected during the A2RL drone racing competition, each consisting of two laps through 11 gates. The experiments evaluated Absolute Trajectory Error (ATE) and graph optimization time to assess the trade-off between accuracy and computational cost.
Results
The experimental results show that the method reduces Absolute Trajectory Error (ATE) by 56% to 74% compared to standalone visual-inertial odometry (VIO) on the TII-RATM dataset. An ablation study indicates that the dual-graph architecture achieves 10% to 12% higher accuracy than a single-graph baseline at identical computational cost. Deployment in the A2RL competition demonstrated that the system performs real-time onboard localization during flight, reducing the drift of the odometry baseline by up to 4.2 m per lap. These findings highlight the potential of this method in autonomous drone racing.
Applications
The method can be directly applied in autonomous drone racing to improve localization accuracy and reduce drift. Additionally, the sensor-agnostic design of the system allows for broad applicability, enabling integration with different odometry and detection sources. In the future, the method can also be extended to other robotics applications, such as autonomous driving and indoor navigation, to verify its applicability and robustness in different environments.
Limitations & Outlook
Despite the method's strong performance in drone racing, its keyframe-based integration means the main graph is optimized less frequently, which may reduce correction responsiveness under very high update rates. Additionally, direct odometry methods are sensitive to photometric changes, which may affect the system's performance under extreme lighting conditions. Future research directions include validating the sensor-agnostic capabilities of the framework and exploring integration with additional odometry and detection sources.
Plain Language (accessible to non-experts)
Imagine you're cooking a big meal against the clock. To serve every dish on time you need a plan, and you check your progress between steps; that plan and those checks are like the localization a racing drone performs as it flies through gates on a track. Now imagine you have two assistants: one keeps gathering and prepping ingredients, while the other periodically takes what has been prepped and turns it into finished dishes. That is the dual pose-graph architecture: one graph rapidly accumulates gate observations, and the other distills them into a compact, optimized map. With this division of labor, the drone can localize itself accurately and pass through each gate smoothly, without being slowed down by the flood of raw observations.
ELI14 (explained like you're 14)
Hey there! Imagine you're flying a racing drone through a track of colorful gates. To win, the drone has to know exactly where it is at every moment, even while moving super fast. The researchers built a system that works like a smart assistant keeping two notebooks. In a scratch notebook it quickly jots down every glimpse of a gate; every so often it condenses those scribbles into one neat entry in a permanent notebook. The permanent notebook stays small and fast to read, so the drone always knows where it is and where the next gate is, without drifting off track. That's the dual pose-graph idea, and it's what lets the drone fly faster and more accurately!
Glossary
Dual Pose-Graph
An architecture that maintains two pose graphs: a temporary graph accumulating semantic detections at high rate, and a persistent main graph receiving compressed, refined constraints, fusing odometry with semantic landmarks for drift-corrected localization.
Used in the paper to describe the localization method for drone racing.
Absolute Trajectory Error (ATE)
A metric that measures the difference between estimated and true trajectories, often used to evaluate localization accuracy.
Used in experiments to assess localization accuracy.
Visual-Inertial Odometry (VIO)
A localization method that combines visual and inertial measurement unit (IMU) data to provide more robust state estimation.
Used as a baseline method for comparison in experiments.
Keyframe
A selected frame marking a representative pose in a SLAM system, used as an anchor for optimization and loop closure.
Used in the dual pose-graph architecture to trigger optimization of the temporary graph.
Ablation Study
An evaluation method that assesses the impact of removing or modifying certain components of a system on overall performance.
Used to validate the effectiveness of the dual-graph architecture.
Sensor-Agnostic
A system design that is independent of specific sensor types, allowing compatibility with various sensor configurations.
Describes the applicability of the system in the methodology.
Motion Blur
Image blurring caused by rapid camera movement during exposure, affecting the performance of visual SLAM.
Described as a limitation of existing methods in the problem statement.
Semantic Detection
The process of identifying and locating specific objects (e.g., racing gates) to provide additional localization constraints.
Used in the dual pose-graph architecture to enhance localization accuracy.
Graph Optimization
The process of optimizing nodes by minimizing errors in graph edges, commonly used in SLAM systems.
Used in the dual pose-graph architecture to obtain optimal trajectory and landmark estimates.
Drift Correction
The process of reducing accumulated errors in a localization system by introducing additional constraints.
Describes the advantage of the dual pose-graph architecture in the methodology.
Open Questions (unanswered questions from this research)
1. How can system robustness be improved under extreme lighting conditions? Direct methods are sensitive to photometric changes, which may degrade performance under varying lighting; new methods are needed to enhance adaptability in diverse environments.
2. How does the system perform under different sensor configurations? While it is designed to be sensor-agnostic, its performance with other sensors still needs validation, which will require further experiments and data.
3. How can the optimization frequency of the dual-graph architecture be increased? Because detections are integrated only at keyframes, optimization runs less frequently, which may reduce correction responsiveness at very high update rates.
4. How effective is the method in applications beyond drone racing? Its applicability and robustness in other robotics applications, such as autonomous driving and indoor navigation, remain to be verified.
5. Can computational cost be reduced further? The method improves accuracy without increasing cost, but further reductions would benefit resource-constrained platforms.
Applications
Immediate Applications
Drone Racing
The method can be directly applied in autonomous drone racing to improve localization accuracy and reduce drift, helping drones fly accurately on the track.
Autonomous Driving
By integrating semantic detection and odometry, the method can be used for vehicle localization in autonomous driving, especially in complex urban environments.
Indoor Navigation
The method can be used for indoor robot navigation, utilizing structured features of the environment (e.g., doors and walls) to enhance localization accuracy.
Long-term Vision
Smart Cities
In smart cities, the method can be used for localization and navigation of drones and autonomous vehicles, supporting intelligent management of urban infrastructure.
Disaster Response
In disaster response, the method can be used for drone localization and navigation, aiding search and rescue teams in complex environments.
Abstract
Autonomous drone racing demands robust real-time localization under extreme conditions: high-speed flight, aggressive maneuvers, and payload-constrained platforms that often rely on a single camera for perception. Existing visual SLAM systems, while effective in general scenarios, struggle with motion blur and feature instability inherent to racing dynamics, and do not exploit the structured nature of racing environments. In this work, we present a dual pose-graph architecture that fuses odometry with semantic detections for robust localization. A temporary graph accumulates multiple gate observations between keyframes and optimizes them into a single refined constraint per landmark, which is then promoted to a persistent main graph. This design preserves the information richness of frequent detections while preventing graph growth from degrading real-time performance. The system is designed to be sensor-agnostic, although in this work we validate it using monocular visual-inertial odometry and visual gate detections. Experimental evaluation on the TII-RATM dataset shows a 56% to 74% reduction in ATE compared to standalone VIO, while an ablation study confirms that the dual-graph architecture achieves 10% to 12% higher accuracy than a single-graph baseline at identical computational cost. Deployment in the A2RL competition demonstrated that the system performs real-time onboard localization during flight, reducing the drift of the odometry baseline by up to 4.2 m per lap.
References (17)
Situational Graphs for Robot Navigation in Structured Indoor Environments
Hriday Bavle, Jose Luis Sanchez-Lopez, Muhammad Shaheer et al.
AlphaPilot: autonomous drone racing
Philipp Foehn, Dario Brescianini, Elia Kaufmann et al.
A General Optimization-based Framework for Local Odometry Estimation with Multiple Sensors
Tong Qin, Jie Pan, Shaozu Cao et al.
OpenVINS: A Research Platform for Visual-Inertial Estimation
Patrick Geneva, Kevin Eckenhoff, Woosik Lee et al.
Drift-Corrected Monocular VIO and Perception-Aware Planning for Autonomous Drone Racing
Maulana Bisyir Azhari, Donghun Han, Jeongbin You et al.
VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator
Tong Qin, Peiliang Li, S. Shen
ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras
Raul Mur-Artal, J. D. Tardós
Champion-level drone racing using deep reinforcement learning
Elia Kaufmann, L. Bauersfeld, Antonio Loquercio et al.
ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM
C. Campos, Richard Elvira, J. Rodríguez et al.
SVO: Fast semi-direct monocular visual odometry
Christian Forster, Matia Pizzoli, D. Scaramuzza
SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
Renato F. Salas-Moreno, Richard A. Newcombe, Hauke Strasdat et al.
Race Against the Machine: A Fully-Annotated, Open-Design Dataset of Autonomous and Piloted High-Speed Flight
Michael Bosello, Davide Aguiari, Yvo Keuter et al.
DM-VIO: Delayed Marginalization Visual-Inertial Odometry
L. Stumberg, D. Cremers
Aerostack2: A Software Framework for Developing Multi-robot Aerial Systems
Miguel Fernández-Cortizas, Martin Molina, Pedro Arias-Perez et al.
G2o: A general framework for graph optimization
R. Kümmerle, G. Grisetti, Hauke Strasdat et al.
Iterated extended Kalman filter based visual-inertial odometry using direct photometric feedback
Michael Bloesch, M. Burri, Sammy Omari et al.
Hydra: A Real-time Spatial Perception System for 3D Scene Graph Construction and Optimization
Nathan Hughes, Yun Chang, L. Carlone