NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval
NanoVDR distills a 2B vision-language retriever into a 70M text-only encoder for visual document retrieval, retaining 95.1% of teacher quality.
Zhuchenyang Liu, Yao Zhang, Yu Xiao