Deployment-Aligned Low-Precision Neural Architecture Search for Spaceborne Edge AI
Deployment-aligned low-precision NAS enhances spaceborne edge AI performance, achieving 0.826 mIoU.
Parampuneet Kaur Thind, Vaibhav Katturu, Giacomo Zema et al.
DeepTaxon: An interpretable retrieval-augmented multimodal framework significantly improves species identification and discovery accuracy.
Jiawei Wang, Ming Lei, Yaning Yang et al.
Learn&Drop accelerates CNN training by layer dropping, reducing ResNet-152 forward propagation FLOPs by 83.74%.
Giorgio Cruciata, Luca Cruciata, Liliana Lo Presti et al.
SS3D achieves end-to-end self-supervised 3D estimation from monocular video using the YouTube-8M dataset.
Marwane Hariat, Gianni Franchi, David Filliat et al.
PASR achieves 81.59% Top-1 retrieval accuracy on Pix3D and 76.43% on Pascal3D datasets.
Jiaxin Shi, Guofeng Zhang, Wufei Ma et al.
The TARA framework enables non-invasive 3D identification of group-housed livestock, achieving 100% accuracy.
Shiva Paudel, TsungCheng Tsai, Dongyi Wang
EV-CLIP efficiently adapts CLIP for few-shot action recognition under visual challenges using visual prompts.
Hyo Jin Jon, Longbin Jin, Eun Yi Kim
FlowAnchor stabilizes video editing signals using spatial attention and adaptive modulation for efficient multi-object scene editing.
Ze Chen, Lan Chen, Yuanhang Li et al.
This study evaluates a VPP dispatch algorithm in smart distribution systems using a co-simulation framework, revealing significant impacts of communication delays.
Houchao Gan
The MUA method achieves up to 2000× lower computational cost using Wavelet-guided Multi-level Spatial Factorized Blendshapes.
Heming Zhu, Guoxing Sun, Marc Habermann
SynAgent leverages Solo-to-Cooperative Agent Synergy for generalizable humanoid manipulation, significantly enhancing generalization across diverse object geometries.
Wei Yao, Haohan Ma, Hongwen Zhang et al.
MetaCloak-JPEG enhances JPEG robustness of adversarial perturbations for DreamBooth deepfake prevention, achieving 32.7 dB PSNR.
Tanjim Rahaman Fardin, S M Zunaid Alam, Mahadi Hasan Fahim et al.
OneVL achieves one-step latent reasoning and planning with vision-language explanations, surpassing explicit CoT at answer-only latency.
Jinghui Lu, Jiayi Guan, Zhijian Huang et al.
XEmbodied model enhances VLA models with 3D geometric and physical cues, improving performance across benchmarks.
Kangan Qian, ChuChu Xie, Yang Zhong et al.
The LaviGen framework repurposes 3D generative models for autoregressive layout generation, achieving 19% higher physical plausibility on the LayoutVLM benchmark.
Haoran Feng, Yifan Niu, Zehuan Huang et al.
This study systematically evaluates various vision-language models for country-level image geolocalization, revealing their limitations in capturing fine-grained geographic cues.
Siddhant Bharadwaj, Ashish Vashist, Fahimul Aleem et al.
CollideNet enhances time-to-collision forecasting precision by disentangling temporal patterns in multi-scale video representation learning.
Nishq Poorav Desai, Ali Etemad, Michael Greenspan
A two-stage deep learning framework using YOLOv8n and RexNet-150 achieves 95% accuracy in cheating detection.
Van-Truong Le, Le-Khanh Nguyen, Trong-Doanh Nguyen
SENSE leverages stereo vision and vision-language models to enhance open-vocabulary semantic segmentation, achieving a 2.9% precision improvement on PhraseStereo.
Thomas Campagnolo, Ezio Malis, Philippe Martinet et al.
Bi-CMPStereo framework significantly improves accuracy and generalization in event-frame asymmetric stereo matching.
Ninghui Xu, Fabio Tosi, Lihui Wang et al.