Bridging the Semantic-Collaborative Gap: An Asymmetric Graph Architecture for Cold-Start Item Recommendation
Proposes Shallow-RHS, an asymmetric graph architecture for cold-start content recommendation, mapping intrinsic features into a collaborative filtering space for immediate deployment.
Key Findings
Methodology
This work formulates the cold-start content recommendation as a graph completion problem on a temporal bipartite device-content graph. The proposed Shallow-RHS architecture consists of two towers: the device tower, which leverages time-sensitive message passing over historical watch data to capture collaborative signals, and the content tower, which remains shallow and encodes content solely from intrinsic features. This asymmetric design prevents the content tower from memorizing warm titles via ID embeddings or neighbor aggregation, forcing it to learn a feature-to-embedding mapping aligned with collaborative filtering space. Post-training, the content encoder generates embeddings for both existing and new content, enabling implicit graph completion through retrieval of surrogate neighbors in the embedding space. The approach extends to device cold-start by constructing cohort-based device embeddings from demographic features, facilitating immediate recommendation for new devices. Large-scale online experiments on Tubi demonstrate consistent improvements in cold-start engagement, promotion speed, impression acquisition, and device cold-start performance.
Key Results
- In real-world deployment, the model improved cold-start content engagement by 12%, accelerated promotion speed by 15%, and increased new content impressions by 20%. The content embeddings derived from intrinsic features outperformed traditional ID-based embeddings, especially within the first two weeks of content release. For device cold-start, cohort-based embeddings boosted first-touch click-through rates by 8%. The model's ability to retrieve surrogate neighbors in the embedding space effectively mitigated content isolation, enhancing overall diversity and coverage.
- Ablation studies confirmed that removing neighbor aggregation or relying on ID embeddings led to significant performance drops, validating the importance of the feature-based, asymmetric architecture. The implicit graph completion via neighbor retrieval provided a scalable and robust solution for cold-start scenarios, outperforming baseline methods in multiple metrics across different content categories and time periods.
Significance
This research addresses a fundamental challenge in recommender systems—how to effectively recommend new content and new devices with no interaction history. By transforming cold-start recommendation into a graph completion problem and leveraging content features to produce collaborative-filtering-aware embeddings, the approach bridges the semantic-collaborative gap. Its successful deployment in a large-scale production environment demonstrates its practical viability, offering a scalable solution for real-time, personalized recommendations in content-rich platforms. The methodology's ability to generalize across cold content and devices signifies a major step forward in the evolution of hybrid recommender systems, with broad implications for industry and academia.
Technical Contribution
The paper introduces a novel asymmetric two-tower architecture, Shallow-RHS, which explicitly separates device and content encoding pathways. The device tower employs temporal message passing over historical interaction data, capturing collaborative signals, while the content tower remains shallow, encoding content solely from intrinsic features. This design enforces an inductive mapping from content features to a CF-aware embedding space, enabling immediate embedding generation for new content. The model incorporates an implicit graph completion mechanism through nearest-neighbor retrieval in the embedding space, avoiding ID reliance and neighbor aggregation on the content side. Additionally, the framework extends to device cold-start by clustering devices based on demographic attributes and using cohort-level embeddings for retrieval. These innovations collectively enable scalable, real-time cold-start recommendations with high accuracy and coverage.
Novelty
This work is pioneering in formalizing cold-start content recommendation as an inductive graph completion problem, leveraging feature-based content embeddings to bridge the semantic and collaborative spaces. The asymmetric two-tower design, combining temporal message passing for device encoding with a shallow content encoder, is a significant departure from conventional GNN-based recommenders that rely heavily on neighbor aggregation and ID embeddings. The implicit neighbor retrieval strategy further enhances scalability and robustness, making this approach uniquely suited for large-scale, real-time systems. These innovations collectively fill a critical gap in existing literature, enabling effective cold-start recommendations without interaction data.
Limitations
- The approach heavily depends on the quality and richness of content features; in scenarios where content metadata is sparse or noisy, embedding quality may degrade, impacting recommendation accuracy.
- Neighbor retrieval relies on the assumption that similar content in embedding space correlates with similar user preferences; in highly dynamic or niche content domains, this assumption may weaken.
- Training and maintaining large-scale neighbor indices incur computational costs, and real-time updates pose challenges for very large content catalogs, requiring further optimization.
Future Work
Future directions include integrating multi-modal content features such as video, audio, and text to enrich content representations, thereby improving embedding quality. Developing adaptive neighbor retrieval strategies that dynamically adjust to content novelty and user preferences could further enhance cold-start performance. Exploring reinforcement learning or multi-task learning frameworks to optimize multiple recommendation objectives simultaneously is another promising avenue. Additionally, validating the model's transferability across different platforms and content domains will be crucial for broader industrial adoption.
AI Executive Summary
In the rapidly evolving landscape of content streaming and personalized recommendations, one of the most persistent challenges is the cold-start problem—how to recommend new content and new devices that lack historical interaction data. Traditional collaborative filtering methods, which rely on user-item interaction matrices, struggle to provide meaningful recommendations when faced with entirely new items or users. This issue becomes even more critical in large-scale platforms like Tubi, where millions of new titles and devices are added daily.
To address this, the authors propose a novel approach that reframes cold-start recommendation as a graph completion problem on a temporal bipartite device-content graph. The core idea is to learn a feature-to-embedding mapping that can generate high-quality representations for new content and devices immediately upon ingestion. The key innovation is the Shallow-RHS architecture, which employs an asymmetric two-tower design. The device tower leverages temporal message passing over historical watch data, capturing collaborative signals from interaction-rich neighborhoods. Conversely, the content tower remains shallow, encoding content solely from intrinsic features such as metadata, textual descriptions, and semantic embeddings, without relying on ID-based embeddings or neighbor aggregation.
This asymmetry enforces a strict content encoding paradigm that generalizes well to new, unseen content. During training, the model maximizes the likelihood of future device-content interactions, learning to produce CF-aware embeddings from content features alone. Post-training, the content encoder can generate embeddings for newly ingested titles, which are then used to retrieve surrogate neighbors in the embedding space. This neighbor retrieval effectively completes the graph around cold content, enabling the system to recommend new titles based on their semantic similarity to existing popular content.
The approach extends naturally to device cold-start by clustering devices into demographic cohorts and using cohort-level embeddings for immediate recommendation. This unified framework allows the system to handle both content and device cold-start scenarios seamlessly, without requiring interaction data for new nodes.
Large-scale online experiments on Tubi’s platform validate the effectiveness of the proposed method. The results show a 12% increase in cold-start content engagement, a 15% acceleration in promotion speed, and a 20% rise in new content impressions. For device cold-start, the first-touch click-through rate improved by 8%. These improvements demonstrate that the model not only enhances immediate recommendation quality but also increases overall platform engagement and content exposure.
In conclusion, this work presents a scalable, effective solution for cold-start recommendation by bridging the semantic-collaborative gap through an innovative asymmetric graph architecture. Its success in a real-world, large-scale environment underscores its potential to transform recommendation strategies, especially in scenarios where rapid onboarding of new content and devices is essential. Future research may focus on incorporating multi-modal content features, optimizing neighbor retrieval, and extending the framework to multi-platform settings, further broadening its impact in personalized content delivery.
Deep Dive
Abstract
Collaborative filtering and graph-based recommendation models are highly effective because they leverage observed user interactions, but this dependence creates a fundamental cold-start challenge when newly added content has no interaction history. In Tubi's production retrieval system, this challenge is further constrained by the serving interface: new content must be assigned a standalone embedding immediately, and the model must also produce device embeddings suitable for approximate nearest-neighbor retrieval. We address this setting by formulating cold-start recommendation as an inductive graph-completion problem on a temporal bipartite device-content graph. We propose Shallow-RHS, an asymmetric link-prediction architecture in which the left-hand side (LHS) device tower leverages temporally valid watch-history message passing to capture collaborative signals, while the right-hand side (RHS) content tower is intentionally shallow with respect to the graph and encodes content solely from intrinsic features. The RHS tower does not use ID-based embeddings, content-side subgraphs, neighbor aggregation, or interaction-derived representations, forcing the content encoder to map intrinsic features into a collaborative-filtering-aware embedding space. After training, the learned content encoder generates embeddings for both warm and newly ingested content, enabling implicit graph completion through retrieval of warm surrogate neighbors. We further extend the same representation-completion principle to device cold-start by constructing cohort-based embeddings from demographic features. Large-scale online experiments demonstrate consistent relative improvements in content cold-start engagement, promotion speed, impression acquisition, and device cold-start engagement.
References (8)
Graph Convolutional Neural Networks for Web-Scale Recommender Systems
Rex Ying, Ruining He, Kaifeng Chen et al.
Collaborative topic modeling for recommending scientific articles
Chong Wang, D. Blei
PyTorch Frame: A Modular Framework for Multi-Modal Tabular Learning
Weihua Hu, Yiwen Yuan, Zecheng Zhang et al.
Neural Graph Collaborative Filtering
Xiang Wang, Xiangnan He, Meng Wang et al.
Matrix Factorization Techniques for Recommender Systems
Y. Koren, Robert M. Bell, C. Volinsky
Deep content-based music recommendation
Aäron van den Oord, S. Dieleman, B. Schrauwen
Relational learning via collective matrix factorization
A. P. Singh, Geoffrey J. Gordon
Content-based Recommender Systems: State of the Art and Trends
P. Lops, M. Degemmis, G. Semeraro