LoopCTR: Unlocking the Loop Scaling Power for Click-Through Rate Prediction

TL;DR

LoopCTR enhances CTR prediction through loop scaling, significantly reducing computational costs.

cs.IR Β· 2026-04-21
Jiakai Tang Runfeng Zhang Weiqiu Wang Yifei Liu Chuan Wang Xu Chen Yeqiu Yang Jian Wu Yuning Jiang Bo Zheng
CTR prediction loop scaling Transformer parameter sharing industrial application

Key Findings

Methodology

LoopCTR introduces a loop scaling paradigm that increases training-time computation through recursive reuse of shared model layers, decoupling computation from parameter growth. The method employs an enhanced sandwich architecture with Hyper-Connected Residuals and Mixture-of-Experts, and process supervision at every loop depth to encode multi-loop benefits into shared parameters. This enables a train-multi-loop, infer-zero-loop strategy.
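The loop-scaling idea can be sketched in a few lines. Everything below is a toy stand-in (NumPy, one weight matrix instead of the paper's Transformer block; sizes are illustrative), meant only to show how computation grows with loop depth while the parameter count stays fixed:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8          # hidden width (illustrative; not the paper's size)
n_loops = 3    # training-time loop depth

# A single shared weight matrix reused at every depth:
# computation grows with n_loops, the parameter count does not.
W_shared = rng.normal(size=(d, d)) * 0.1

def shared_block(h):
    """Stand-in for LoopCTR's shared Transformer block."""
    return np.maximum(h @ W_shared, 0.0) + h  # ReLU sub-layer + residual

def forward(x, loops):
    h = x
    for _ in range(loops):  # recursive reuse of the same parameters
        h = shared_block(h)
    return h

x = rng.normal(size=(4, d))
h_zero = forward(x, loops=0)        # "infer-zero-loop" path
h_deep = forward(x, loops=n_loops)  # extra computation, same weights
```

Note that `W_shared` is the only parameter tensor regardless of how many loops are run, which is the decoupling of computation from parameter growth described above.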

Key Results

  • LoopCTR achieved state-of-the-art performance on three public benchmarks and one industrial dataset. On the Amazon dataset, LoopCTR(1/3) achieved an AUC of 0.8728, surpassing OneTrans's 0.8689.
  • On the KuaiVideo dataset, LoopCTR(1/3) achieved an AUC of 0.7450, outperforming DIN by 0.0020.
  • Oracle analysis revealed that models trained with fewer loops exhibit higher oracle ceilings, indicating significant potential for adaptive inference.

Significance

LoopCTR is significant for both academia and industry as it addresses the computational and storage overhead issues of traditional CTR models by introducing a loop scaling paradigm. This method not only improves prediction accuracy but also significantly reduces inference costs, making it more feasible for industrial deployment. Its innovative architecture opens a new scaling dimension for CTR prediction, with broad application prospects.

Technical Contribution

LoopCTR's technical contributions include its unique loop scaling paradigm, which differs from existing methods that scale models by adding parameters. By recursively reusing shared layers, LoopCTR achieves computational scaling without increasing parameter count. Additionally, it introduces Hyper-Connected Residuals and Mixture-of-Experts to enhance model expressiveness and internalizes multi-loop benefits during training through process supervision.

Novelty

LoopCTR is the first to introduce a loop scaling paradigm in CTR prediction, offering a more efficient computational scaling method compared to existing parameter stacking approaches. Its core innovation lies in decoupling computation from parameter growth through shared parameters, significantly reducing inference costs.

Limitations

  • LoopCTR may still require multi-loop inference in complex scenarios to achieve optimal performance, potentially increasing inference time.
  • The benefits of loop scaling may not be as significant on certain datasets.
  • For extremely large datasets, the training time may still be considerable.

Future Work

Future research directions include developing adaptive inference strategies to dynamically allocate loop depth per sample. Additionally, integrating system-level optimizations such as FlashAttention and mixed-precision training/inference to further improve training and inference efficiency is worth exploring.
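One plausible shape for such an adaptive strategy is confidence-based early exit. The sketch below is our own illustration, not a mechanism from the paper: the `logit_fn(x, loops)` interface and the stopping rule (distance of the predicted probability from 0.5) are both assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adaptive_infer(logit_fn, x, max_loops=3, margin=0.4):
    """Spend extra loops only on uncertain samples.

    `logit_fn(x, loops)` is a hypothetical model interface; the
    confidence threshold `margin` is an illustrative choice.
    """
    for loops in range(max_loops + 1):
        p = sigmoid(logit_fn(x, loops))
        if abs(p - 0.5) >= margin:  # confident enough: exit early
            return p, loops
    return p, max_loops

# Toy models: one confident at depth 0, one never confident.
p1, used1 = adaptive_infer(lambda x, k: 3.0, x=None)  # exits immediately
p2, used2 = adaptive_infer(lambda x, k: 0.1, x=None)  # runs all loops
```

A real deployment would have to weigh the per-sample latency variance such a rule introduces against the accuracy gains the oracle analysis suggests are available.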

AI Executive Summary

In modern recommendation systems, click-through rate (CTR) prediction is a crucial task, and following the success of Transformer architectures in natural language processing, CTR models have increasingly adopted them. However, the traditional approach of scaling these models by adding parameters incurs significant computational and storage overhead, limiting their deployment in industrial environments.

To address this issue, this paper proposes LoopCTR, a novel loop scaling paradigm. LoopCTR increases training-time computation through recursive reuse of shared model layers, achieving computational scaling without increasing parameters. The method employs an enhanced sandwich architecture with Hyper-Connected Residuals and Mixture-of-Experts, and process supervision at every loop depth to encode multi-loop benefits into shared parameters.

The core technical principle of LoopCTR lies in its loop scaling paradigm. By sharing parameters, LoopCTR decouples computation from parameter growth. This method not only improves model prediction accuracy but also significantly reduces inference costs, making it more feasible for industrial deployment. Its innovative architecture opens a new scaling dimension for CTR prediction.

Experimental results show that LoopCTR achieves state-of-the-art performance on three public benchmarks and one industrial dataset. On the Amazon dataset, LoopCTR(1/3) achieved an AUC of 0.8728, surpassing OneTrans's 0.8689. On the KuaiVideo dataset, LoopCTR(1/3) achieved an AUC of 0.7450, outperforming DIN by 0.0020. Oracle analysis revealed that models trained with fewer loops exhibit higher oracle ceilings, indicating significant potential for adaptive inference.

The broad application prospects of LoopCTR lie in its ability to achieve computational scaling without increasing parameters, which is particularly important for industrial applications requiring efficient inference. However, the method may still require multi-loop inference in complex scenarios to achieve optimal performance, potentially increasing inference time. Future research directions include developing adaptive inference strategies to dynamically allocate loop depth per sample. Additionally, integrating system-level optimizations such as FlashAttention and mixed-precision training/inference to further improve training and inference efficiency is worth exploring.

Deep Analysis

Background

Click-through rate (CTR) prediction is a vital task in recommendation systems, and with the success of Transformer architectures in natural language processing, CTR models have increasingly adopted them. Traditional CTR models typically improve performance by adding parameters, but this incurs significant computational and storage overhead, limiting their deployment in industrial environments. Recent work has begun exploring scaling phenomena in the recommendation domain, hoping to replicate the remarkable scaling laws observed in large language models, but these methods usually require more parameters, more data, or more computation.

Core Problem

The core problem in CTR prediction is how to improve model performance without increasing parameters. Traditional methods of scaling models by adding parameters have resulted in significant computational and storage overhead, limiting their deployment in industrial environments. Additionally, CTR prediction models need to ensure high accuracy while meeting real-time requirements in industrial applications, making the problem more complex and challenging.

Innovation

The core innovations of LoopCTR include:

  β€’ A loop scaling paradigm that increases training-time computation through recursive reuse of shared model layers, achieving computational scaling without increasing parameters.
  β€’ An enhanced sandwich architecture with Hyper-Connected Residuals and Mixture-of-Experts to improve model expressiveness.
  β€’ Process supervision at every loop depth that encodes multi-loop benefits into shared parameters, enabling a train-multi-loop, infer-zero-loop strategy.
  β€’ An architecture that opens a new scaling dimension for CTR prediction and significantly reduces inference costs.

Methodology

The methodology of LoopCTR proceeds as follows:

  β€’ Sandwich architecture: an enhanced design with Hyper-Connected Residuals and Mixture-of-Experts.
  β€’ Loop scaling: recursive reuse of shared model layers scales computation without adding parameters.
  β€’ Process supervision: a loss at every loop depth encodes multi-loop benefits into the shared parameters.
  β€’ Zero-loop inference: at serving time, a single forward pass already outperforms all baseline models.
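The process-supervision idea of applying a loss at every loop depth can be sketched as below. All names, sizes, the linear prediction head, and the ReLU loop block are illustrative stand-ins, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 8, 16
W = rng.normal(size=(d, d)) * 0.1   # shared loop-block weights
w_head = rng.normal(size=d) * 0.1   # prediction head (illustrative)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    """Binary cross-entropy, the standard CTR loss."""
    eps = 1e-9
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)).mean()

x = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n).astype(float)

# Collect the hidden state after every loop depth 0..L.
states = [x]
for _ in range(3):
    states.append(np.maximum(states[-1] @ W, 0.0) + states[-1])

# Process supervision: one CTR loss per depth, summed, so gradients
# from every depth flow into the same shared parameters.
total_loss = sum(bce(sigmoid(h @ w_head), y) for h in states)
```

Because every depth's loss backpropagates into the same `W`, the shallow (zero-loop) path is trained to absorb what the deeper loops learn, which is what enables the train-multi-loop, infer-zero-loop strategy.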

Experiments

The experiments cover three public benchmark datasets and one industrial dataset: Amazon, TaobaoAds, KuaiVideo, and InHouse.

  β€’ Baselines include traditional methods (DLRM, DIN, DCNv2, Wukong) and Transformer-based methods (OneTrans, HSTU, MTGR).
  β€’ Evaluation metrics are AUC and NE; ablation studies analyze the contribution of each component.

Results

Experimental results show that LoopCTR achieves state-of-the-art performance on all datasets.

  β€’ On the Amazon dataset, LoopCTR(1/3) achieved an AUC of 0.8728, surpassing OneTrans's 0.8689.
  β€’ On the KuaiVideo dataset, LoopCTR(1/3) achieved an AUC of 0.7450, outperforming DIN by 0.0020.
  β€’ Oracle analysis revealed that models trained with fewer loops exhibit higher oracle ceilings, indicating significant potential for adaptive inference.

Applications

Application scenarios of LoopCTR include:

  β€’ Industrial recommendation systems: by reducing inference costs, LoopCTR improves the real-time performance and accuracy of recommendation, suitable for e-commerce platforms and content recommendation.
  β€’ Online advertising: LoopCTR can improve the accuracy of ad click-through rate prediction without additional computational resources, enhancing ad delivery effectiveness.
  β€’ Personalized recommendation: LoopCTR enables efficient personalized recommendation on large-scale datasets, applicable to music, video, and other content platforms.

Limitations & Outlook

The limitations of LoopCTR include:

  β€’ It may still require multi-loop inference in complex scenarios to achieve optimal performance, potentially increasing inference time.
  β€’ For extremely large datasets, training time may still be considerable.

Looking ahead, future research includes developing adaptive inference strategies to dynamically allocate loop depth per sample.

Plain Language (Accessible to non-experts)

Imagine you're cooking in a kitchen. The traditional way is to use a different pot or tool for each dish, much like traditional CTR models that improve performance by adding parameters. LoopCTR is more like one multifunctional pot: you cook different dishes by running the same pot through extra rounds with adjusted settings. Because the same "pot" (the shared layers) is reused, the model can do more computation without needing more equipment, which keeps predictions accurate while staying cheap enough to run in real products.

ELI14 (Explained like you're 14)

Hey there! Did you know that when you shop online, websites recommend products you might like based on your browsing history? That's called click-through rate prediction! Traditional methods are like using a new tool for every task, which isn't very efficient. But LoopCTR is like a super-smart toolbox where you only need one tool to do everything! This not only saves time but also improves accuracy. Imagine using one tool to finish all your homework, isn't that cool? That's the power of LoopCTR! It makes recommendation systems smarter and more efficient, helping us find what we like faster when shopping online.

Glossary

Loop Scaling

A method that increases training-time computation through recursive reuse of shared model layers, decoupling computation from parameter growth.

In LoopCTR, loop scaling achieves computational scaling without increasing parameter count.

Sandwich Architecture

An enhanced architectural design that combines Hyper-Connected Residuals and Mixture-of-Experts to improve model expressiveness.

LoopCTR employs a sandwich architecture to enhance model performance.

Hyper-Connected Residuals

An enhanced residual connection mechanism that improves computational flow through input-dependent adaptive fusion.

In LoopCTR, Hyper-Connected Residuals are used to enhance the expressiveness of loop blocks.
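A minimal reading of "input-dependent adaptive fusion" is a gated residual, where a gate computed from the input weights the input path against the block's output path. This is only a sketch of the idea; the exact formulation of Hyper-Connected Residuals in LoopCTR (and in mHC) may differ, and `W_gate` is a name we introduce here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hyper_connected_residual(h, f_out, W_gate):
    """Fuse a block's output with its input via a gate computed from
    the input itself, instead of a fixed h + f_out residual.
    Illustrative sketch only."""
    g = sigmoid(h @ W_gate)           # per-feature gate in (0, 1)
    return g * h + (1.0 - g) * f_out  # convex mix of the two paths

rng = np.random.default_rng(2)
h = rng.normal(size=(4, 8))        # block input
f_out = rng.normal(size=(4, 8))    # block output
out = hyper_connected_residual(h, f_out, rng.normal(size=(8, 8)))
```

Because the gate depends on `h`, each input can choose how much of the block's transformation to keep, rather than always adding it with weight one.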

Mixture-of-Experts

A method that expands parameter capacity by routing each token to a subset of experts.

LoopCTR uses Mixture-of-Experts to enhance model expressiveness.
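The routing step behind Mixture-of-Experts can be sketched as follows. The expert count, `top_k`, and the plain linear experts are made-up illustration values; LoopCTR's actual routing details are not specified in this summary:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_experts, top_k = 8, 4, 2  # illustrative sizes

W_router = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) * 0.1 for _ in range(n_experts)]

def moe_layer(x):
    """Route each token to its top-k experts and mix their outputs
    by renormalized router scores."""
    logits = x @ W_router                  # (n_tokens, n_experts)
    out = np.zeros_like(x)
    for i, row in enumerate(logits):
        top = np.argsort(row)[-top_k:]     # indices of the top-k experts
        w = np.exp(row[top] - row[top].max())
        w /= w.sum()                       # softmax over the top-k only
        for e_idx, w_i in zip(top, w):
            out[i] += w_i * (x[i] @ experts[e_idx])
    return out

tokens = rng.normal(size=(5, d))
mixed = moe_layer(tokens)
```

Only `top_k` of the `n_experts` matrices run per token, which is why MoE expands parameter capacity faster than it expands per-token computation.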

Process Supervision

Supervision at every loop depth that encodes multi-loop benefits into shared parameters.

LoopCTR uses process supervision to enable a train-multi-loop, infer-zero-loop strategy.

AUC (Area Under Curve)

A metric used to evaluate the performance of binary classification models, representing the model's performance at different thresholds.

AUC is used as a primary evaluation metric in LoopCTR experiments.
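The pairwise-ranking definition of AUC (the probability that a random positive is scored above a random negative) can be computed directly for small examples; this O(nΒ²) version is for illustration, not production use:

```python
def auc(scores, labels):
    """AUC as the probability that a random positive sample is
    ranked above a random negative one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 3 of the 4 positive/negative pairs are ranked correctly:
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # β†’ 0.75
```

This is why small absolute AUC gains (such as LoopCTR's +0.0039 over OneTrans on Amazon) matter: each increment reflects a higher fraction of correctly ordered user/item pairs across the whole dataset.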

Zero-Loop Inference

An inference strategy that uses a single forward pass with no extra loops; in LoopCTR, this zero-loop pass already outperforms all baseline models.

LoopCTR significantly reduces inference costs through zero-loop inference.

Oracle Analysis

A method for estimating a model's performance ceiling: for each sample, an oracle picks the best prediction across loop depths, and the gap between this oracle result and the realized result measures untapped headroom.

Oracle analysis in LoopCTR experiments reveals the model's potential performance ceiling.
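The per-sample best-depth idea behind the oracle ceiling can be sketched as below. The paper's oracle is defined on AUC; this simplified version scores per-sample absolute error instead, and the function name is ours:

```python
import numpy as np

def oracle_error(probs_by_depth, labels):
    """For each sample, keep the prediction from whichever loop
    depth lands closest to the label, then average that best-case
    error. Simplified sketch of the oracle-ceiling idea."""
    probs = np.stack(probs_by_depth)             # (n_depths, n_samples)
    errs = np.abs(probs - np.asarray(labels)[None, :])
    return errs.min(axis=0).mean()               # oracle picks best depth

labels = [1.0, 0.0, 1.0]
depth0 = [0.6, 0.4, 0.3]   # predictions from the zero-loop pass
depth1 = [0.8, 0.7, 0.9]   # predictions from one extra loop
best = oracle_error([depth0, depth1], labels)
```

By construction the oracle error is never worse than any single fixed depth, which is what makes the reported 0.02β€“0.04 AUC of headroom a ceiling rather than an achieved result.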

Parameter Sharing

A method that reduces parameter count by sharing model layers, achieving more efficient computation.

LoopCTR achieves computational scaling by parameter sharing.

Recommender System

A system that recommends personalized content based on users' historical behavior and preferences.

LoopCTR is applied in recommender systems to improve click-through rate prediction accuracy.

Open Questions (Unanswered questions from this research)

  • 1 LoopCTR may still require multi-loop inference in complex scenarios to achieve optimal performance, potentially increasing inference time. How to improve performance in complex scenarios without increasing inference time remains an open question.
  • 2 For extremely large datasets, the training time may still be considerable. How to shorten training time while ensuring model performance is a direction worth exploring.
  • 3 On certain datasets, the benefits of loop scaling may not be as significant as expected. How to optimize the effects of loop scaling on different datasets is a question that needs further research.
  • 4 The development of adaptive inference strategies remains an open question. How to dynamically allocate loop depth per sample for more efficient inference is a direction worth exploring.
  • 5 Although LoopCTR performs well on multiple datasets, its application potential in other fields still needs further verification. How to apply LoopCTR to other fields to verify its generality is an open question.

Applications

Immediate Applications

Industrial Recommendation Systems

LoopCTR can improve the real-time performance and accuracy of recommendation systems by reducing inference costs, suitable for e-commerce platforms and content recommendation.

Online Advertising

LoopCTR can improve the accuracy of ad click-through rate prediction without increasing computational resources, enhancing ad delivery effectiveness.

Personalized Recommendation

LoopCTR can achieve efficient personalized recommendation on large-scale datasets, applicable to music, video, and other content platforms.

Long-term Vision

Adaptive Inference Strategies

Develop adaptive inference strategies to dynamically allocate loop depth per sample for more efficient inference.

System-Level Optimization

Integrate system-level optimizations such as FlashAttention and mixed-precision training/inference to further improve training and inference efficiency.

Abstract

Scaling Transformer-based click-through rate (CTR) models by stacking more parameters brings growing computational and storage overhead, creating a widening gap between scaling ambitions and the stringent industrial deployment constraints. We propose LoopCTR, which introduces a loop scaling paradigm that increases training-time computation through recursive reuse of shared model layers, decoupling computation from parameter growth. LoopCTR adopts a sandwich architecture enhanced with Hyper-Connected Residuals and Mixture-of-Experts, and employs process supervision at every loop depth to encode multi-loop benefits into the shared parameters. This enables a train-multi-loop, infer-zero-loop strategy where a single forward pass without any loop already outperforms all baselines. Experiments on three public benchmarks and one industrial dataset demonstrate state-of-the-art performance. Oracle analysis further reveals 0.02--0.04 AUC of untapped headroom, with models trained with fewer loops exhibiting higher oracle ceilings, pointing to a promising frontier for adaptive inference.


References (20)

  β€’ Ruoxi Wang, Rakesh Shivanna, D. Cheng et al. DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems. 2020.
  β€’ Weiping Song, Chence Shi, Zhiping Xiao et al. AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks. 2018.
  β€’ Qiwei Chen, Huan Zhao, Wei Li et al. Behavior Sequence Transformer for E-commerce Recommendation in Alibaba. 2019.
  β€’ Hao Li, Zheng Xu, Gavin Taylor et al. Visualizing the Loss Landscape of Neural Nets. 2017.
  β€’ Tianli Zhang, Mengqi Xue, Jiangtao Zhang et al. Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation. 2023.
  β€’ Huan Gui, Ruoxi Wang, Ke Yin et al. Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems. 2023.
  β€’ Jiaqi Zhai, Lucy Liao, Xing Liu et al. Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations. 2024.
  β€’ Guorui Zhou, Cheng-Ning Song, Xiaoqiang Zhu et al. Deep Interest Network for Click-Through Rate Prediction. 2017.
  β€’ V. Lai, Huiyuan Chen, Chin-Chia Michael Yeh et al. Enhancing Transformers without Self-supervised Learning: A Loss Landscape Perspective in Sequential Recommendation. 2023.
  β€’ I. Loshchilov, F. Hutter. Decoupled Weight Decay Regularization. 2017.
  β€’ Liren Yu, Wenming Zhang, Silu Zhou et al. HHFT: Hierarchical Heterogeneous Feature Transformer for Recommendation Systems. 2025.
  β€’ Clara Na, Sanket Vaibhav Mehta, Emma Strubell. Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models. 2022.
  β€’ Youngwan Lee, Jeffrey Willette, Jonghee Kim et al. Visualizing the Loss Landscape of Self-supervised Vision Transformer. 2024.
  β€’ Jiakai Tang, Sunhao Dai, Teng Shi et al. Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation. 2025.
  β€’ OpenAI: Josh Achiam, Steven Adler, S. Agarwal et al. GPT-4 Technical Report. 2023.
  β€’ Zhaoqi Zhang, Haolei Pei, Jun Guo et al. OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender. 2025.
  β€’ Yeskendir Koishekenov, Aldo Lipani, Nicola Cancedda. Encode, Think, Decode: Scaling Test-time Reasoning with Recursive Latent Thoughts. 2025.
  β€’ Junnan Li, Dongxu Li, S. Savarese et al. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. 2023.
  β€’ Yuchen Jiang, Jie Zhu, Xintian Han et al. TokenMixer-Large: Scaling Up Large Ranking Models in Industrial Recommenders. 2026.
  β€’ Zhenda Xie, Yixuan Wei, Huan Cao et al. mHC: Manifold-Constrained Hyper-Connections. 2025.