Paper Insights - AI Arxiv Paper Analysis

cs.CL 2604.24720

Sentiment and Emotion Classification of Indonesian E-Commerce Reviews via Multi-Task BiLSTM and AutoML Benchmarking

Classifies sentiment and emotion in Indonesian e-commerce reviews with a Multi-Task BiLSTM and benchmarks AutoML approaches, achieving high accuracy.

Hermawan Manurung, Ibrahim Al-Kahfi, Ahmad Rizqi et al.

2026-04-28 28

cs.CL 2604.24372

SeaEvo: Advancing Algorithm Discovery with Strategy Space Evolution

SeaEvo enhances algorithm discovery through strategy space evolution, achieving a 21% improvement on system optimization tasks.

Sichun Luo, Yi Huang, Haochen Luo et al.

2026-04-27 25

cs.CL 2604.24040

Improving Robustness of Tabular Retrieval via Representational Stability

Improves the robustness of tabular retrieval through representational stability, using centroid averaging to reduce format-specific variance.

Kushal Raj Bhandari, Adarsh Singh, Jianxi Gao et al.

2026-04-27 23

cs.CL 2604.22749

Representational Harms in LLM-Generated Narratives Against Global Majority Nationalities

A study of 500,000 LLM-generated stories, analyzed with a QA model, reveals representational harms against Global Majority nationalities.

Ilana Nguyen, Harini Suresh, Thema Monroe-White et al.

2026-04-25 27

cs.CL 2604.22693

CRAFT: Clustered Regression for Adaptive Filtering of Training data

CRAFT improves BLEU by 2.13 points on English-Hindi translation by using clustered regression to adaptively filter training data.

Parthasarathi Panda, Asheswari Swain, Subhrakanta Panda

2026-04-25 35

cs.CL 2604.22678

BERAG: Bayesian Ensemble Retrieval-Augmented Generation for Knowledge-based Visual Question Answering

BERAG improves retrieval-augmented generation with a Bayesian ensemble, significantly enhancing knowledge-based visual question answering performance.

Jinghong Chen, Jingbiao Mei, Guangyu Yang et al.

2026-04-25 27

cs.CL 2604.21890

EVENT5Ws: A Large Dataset for Open-Domain Event Extraction from Documents

EVENT5Ws: A large dataset for open-domain event extraction, manually annotated and statistically verified.

Praval Sharma, Ashok Samal, Leen-Kiat Soh et al.

2026-04-24 30

cs.CL 2604.19716

Discovering a Shared Logical Subspace: Steering LLM Logical Reasoning via Alignment of Natural-Language and Symbolic Views

Discovering a shared logical subspace in LLMs improves logical reasoning accuracy by up to 11% via alignment of natural-language and symbolic views.

Feihao Fang, My T. Thai, Yuanyuan Lei

2026-04-22 33

cs.CL 2604.19685

An Answer is just the Start: Related Insight Generation for Open-Ended Document-Grounded QA

InsightGen generates diverse and relevant insights to enhance open-ended document QA.

Saransh Sharma, Pritika Ramu, Aparna Garimella et al.

2026-04-22 37

cs.CL 2604.19645

The signal is the ceiling: Measurement limits of LLM-predicted experience ratings from open-ended survey text

GPT models predict experience ratings from open-ended survey text; prompt optimization improves accuracy by 2%.

Andrew Hong, Jason Potteiger, Luis E. Zapata

2026-04-22 31

cs.CL 2604.19642

Micro Language Models Enable Instant Responses

Micro Language Models (μLMs) enable instant responses by generating the first 4-8 words on-device, with cloud models completing the response.

Wen Cheng, Tuochao Chen, Karim Helwani et al.

2026-04-22 32

cs.CL 2604.19578

Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI

A study shows that large language models affect AI conference peer reviews, especially their linguistic complexity and evaluative focus.

Wenqing Wu, Chengzhi Zhang, Yi Zhao et al.

2026-04-21 48

cs.CL 2604.18563

Dual Alignment Between Language Model Layers and Human Sentence Processing

The study reveals a dual alignment between language model layers and human sentence processing: earlier layers better match natural reading, while later layers better model complex syntactic processing.

Tatsuki Kuribayashi, Alex Warstadt, Yohei Oseki et al.

2026-04-21 33

cs.CL 2604.18556

GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling

GSQ achieves high-accuracy low-bit quantization using Gumbel-Softmax sampling, narrowing the accuracy gap with QTIP methods.

Alireza Dadgarnia, Soroush Tabesh, Mahdi Nikdan et al.

2026-04-21 33

cs.CL 2604.18539

Transition-Matrix Regularization for Next Dialogue Act Prediction in Counselling Conversations

Transition-matrix regularization improves next dialogue act prediction in counseling conversations, boosting macro-F1 by 9-42%.

Eric Rudolph, Philipp Steigerwald, Jens Albrecht

2026-04-21 26

cs.CL 2604.18362

ArbGraph: Conflict-Aware Evidence Arbitration for Reliable Long-Form Retrieval-Augmented Generation

ArbGraph enhances long-form RAG reliability through conflict-aware evidence arbitration, reducing hallucinations.

Qingying Niu, Yuhao Wang, Ruiyang Ren et al.

2026-04-20 28

cs.CL 2604.16270

From Benchmarking to Reasoning: A Dual-Aspect, Large-Scale Evaluation of LLMs on Vietnamese Legal Text

A dual-aspect evaluation framework analyzes LLMs on Vietnamese legal text, revealing readability-accuracy trade-offs.

Van-Truong Le

2026-04-18 29

cs.CL 2604.16241

BAGEL: Benchmarking Animal Knowledge Expertise in Language Models

The BAGEL benchmark evaluates language models' animal knowledge using closed-book questions on taxonomy, morphology, and other topics.

Jiacheng Shen, Masato Hagiwara, Milad Alizadeh et al.

2026-04-18 27

cs.CL 2604.15574

Why Fine-Tuning Encourages Hallucinations and How to Fix It

Self-distillation reduces fine-tuning-induced hallucinations, lowering factual forgetting from 15% to 3%.

Guy Kaplan, Zorik Gekhman, Zhen Zhu et al.

2026-04-17 30

cs.CL 2604.15244

From Tokens to Steps: Verification-Aware Speculative Decoding for Efficient Multi-Step Reasoning

SpecGuard improves the efficiency and accuracy of multi-step reasoning by using internal signals for step-level verification.

Kiran Purohit, Ramasuri Narayanam, Soumyabrata Pal

2026-04-17 31