Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation
Nemotron-Cascade 2 achieves top-tier reasoning with Cascade RL and multi-domain distillation in a 30B MoE model.
Key Findings
Methodology
Nemotron-Cascade 2 is post-trained with Cascade Reinforcement Learning (Cascade RL) and Multi-Domain On-Policy Distillation (MOPD). The model first undergoes Supervised Fine-Tuning (SFT) on a meticulously curated dataset; Cascade RL is then expanded to cover a broader spectrum of reasoning and agentic domains. Throughout the Cascade RL process, multi-domain on-policy distillation from the strongest intermediate teacher model in each domain efficiently recovers benchmark regressions and sustains performance gains.
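The staged pipeline described above can be sketched as follows. This is a hypothetical, heavily simplified outline: the function records the order of stages instead of performing real parameter updates, and the domain and teacher names are illustrative, not from the paper.

```python
def post_train(domains, teachers):
    """Toy stand-in for the pipeline: SFT first, then domain-by-domain
    Cascade RL, with on-policy distillation interleaved. Returns the
    ordered log of stages instead of updating real model weights."""
    log = ["sft"]                               # stage 1: SFT
    seen = []
    for domain in domains:                      # stage 2: Cascade RL,
        log.append(f"rl:{domain}")              # one domain at a time
        seen.append(domain)
        # stage 3: distill from the strongest teacher of every domain
        # covered so far, to recover regressions the latest RL stage
        # may have introduced in earlier domains
        for d in seen:
            log.append(f"distill:{teachers[d]}:{d}")
    return log

log = post_train(["math", "code"], {"math": "t_math", "code": "t_code"})
```

The key design point this sketch captures is that distillation is not a one-off final step: it is re-applied after each RL stage so earlier domains are not silently degraded.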
Key Results
- Nemotron-Cascade 2 achieved gold medal-level performance in the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI), despite being a 30B MoE model with 3B activated parameters.
- In mathematical reasoning, Nemotron-Cascade 2 scored 79.3 on the IMO AnswerBench and achieved 92.4 on AIME 2025.
- In code reasoning, Nemotron-Cascade 2 scored 439.28/600 in IOI 2025 and achieved 87.2 on LiveCodeBench v6.
Significance
The introduction of Nemotron-Cascade 2 marks a breakthrough in achieving high reasoning capabilities in compact models. By leveraging Cascade RL and multi-domain on-policy distillation, the model excels in multiple international competitions, demonstrating the potential for high intelligence density. This not only provides new research directions for academia but also offers insights for industry to achieve efficient AI under resource constraints.
Technical Contribution
Nemotron-Cascade 2 tames the complexity of multi-domain RL: Cascade RL simplifies its engineering challenges, and multi-domain on-policy distillation recovers benchmark performance along the way. The model showcases the possibility of achieving high performance with limited parameters, offering new perspectives for future AI model design.
Novelty
Nemotron-Cascade 2 is the first compact model to achieve gold medal-level performance in these international competitions. Its core innovation lies in the combination of Cascade RL and multi-domain on-policy distillation, a pairing not fully explored in prior work.
Limitations
- Nemotron-Cascade 2 underperforms in knowledge-intensive tasks compared to Qwen3.5-35B-A3B, indicating a need for improvements in knowledge pretraining and agentic RL.
- The model may experience performance degradation in complex environments, particularly when cross-domain interference is significant.
- Despite excelling in multiple benchmarks, fine-grained optimization in specific domains remains to be strengthened.
Future Work
Future research could focus on enhancing the model's knowledge-intensive pretraining and agentic RL capabilities. Exploring more efficient multi-domain on-policy distillation methods to further improve performance is another promising direction.
AI Executive Summary
Nemotron-Cascade 2 is an open 30B Mixture-of-Experts (MoE) model with 3B activated parameters, showcasing exceptional reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. Nemotron-Cascade 2 is the second open-weight LLM, after DeepSeek-V3.2-Speciale-671B-A37B, to achieve gold medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals, demonstrating remarkably high intelligence density with 20x fewer parameters. The key technical advancements are as follows. After Supervised Fine-Tuning (SFT) on a meticulously curated dataset, we substantially expand Cascade RL to cover a much broader spectrum of reasoning and agentic domains. Furthermore, we introduce multi-domain on-policy distillation from the strongest intermediate teacher models for each domain throughout the Cascade RL process, allowing us to efficiently recover benchmark regressions and sustain strong performance gains along the way. We release the collection of model checkpoints and training data.
Deep Analysis
Background
In recent years, large language models (LLMs) have made significant strides in natural language processing. As model sizes continue to grow, however, achieving efficient reasoning capabilities under limited resources has become an important research direction. Nemotron-Cascade 2 addresses this by combining Cascade RL with multi-domain on-policy distillation, reaching high intelligence density in a compact model and excelling in multiple international competitions.
Core Problem
Achieving efficient reasoning capabilities under limited parameters is a significant challenge in current LLM research. Traditional large-scale models, while performing well, incur high computational and storage costs, making them difficult to deploy in resource-constrained environments. Nemotron-Cascade 2 addresses this issue by achieving exceptional reasoning capabilities in a compact parameter model through Cascade RL and multi-domain on-policy distillation, providing a new approach to solving this problem.
Innovation
The core innovation of Nemotron-Cascade 2 lies in the combination of Cascade RL and multi-domain on-policy distillation. Cascade RL simplifies the engineering complexity of multi-domain RL through staged domain-specific training, achieving state-of-the-art performance across multiple benchmarks. Multi-domain on-policy distillation effectively recovers benchmark performance by extracting knowledge from the strongest intermediate teacher models in each domain. This method, not fully explored in prior work, offers new perspectives for future AI model design.
Methodology
- Conduct Supervised Fine-Tuning (SFT) on a meticulously curated dataset to equip the model with foundational capabilities.
- Employ Cascade RL to simplify the engineering complexity of multi-domain RL through staged domain-specific training.
- Introduce multi-domain on-policy distillation throughout the Cascade RL process to extract knowledge from the strongest intermediate teacher models in each domain.
- Combine Cascade RL and multi-domain on-policy distillation to recover benchmark performance and achieve state-of-the-art results across multiple benchmarks.
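The distillation step above can be illustrated with the generic on-policy objective: the student samples its own outputs, and training minimizes the KL divergence from the teacher's next-token distribution at those sampled positions. This is a minimal sketch of that generic formulation, not the paper's exact loss; the list-of-logits interface is an assumption for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def onpolicy_distill_loss(student_logits, teacher_logits):
    """Mean reverse KL(student || teacher) over positions that the
    student itself sampled (the 'on-policy' part)."""
    total = 0.0
    for s_log, t_log in zip(student_logits, teacher_logits):
        p, q = softmax(s_log), softmax(t_log)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(student_logits)

# When student and teacher agree exactly, the loss is zero
loss = onpolicy_distill_loss([[1.0, 2.0, 0.5]], [[1.0, 2.0, 0.5]])
```

Scoring the student's own rollouts (rather than teacher-generated text) is what lets this objective correct the regressions the student actually exhibits after an RL stage.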
Experiments
The experimental design includes performance evaluations in multiple international competitions, such as the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI). By comparing with other models, Nemotron-Cascade 2's exceptional performance in mathematical and code reasoning is validated. Additionally, the effectiveness of multi-domain on-policy distillation is verified, demonstrating its advantage in recovering benchmark performance.
Results
Nemotron-Cascade 2 achieved gold medal-level performance in the 2025 IMO and IOI despite having only 3B activated parameters. In mathematical reasoning it scored 79.3 on the IMO AnswerBench and 92.4 on AIME 2025; in code reasoning it scored 439.28/600 in IOI 2025 and 87.2 on LiveCodeBench v6. Multi-domain on-policy distillation also proved effective at recovering benchmark regressions during Cascade RL.
Applications
The application scenarios of Nemotron-Cascade 2 center on efficient AI reasoning in resource-constrained environments: with only 3B activated parameters, it can be deployed where larger frontier models are impractical, including education, automated programming, and intelligent assistants. This not only provides new research directions for academia but also offers insights for industry to achieve efficient AI under resource constraints.
Limitations & Outlook
Despite Nemotron-Cascade 2's exceptional performance across multiple benchmarks, it underperforms in knowledge-intensive tasks compared to Qwen3.5-35B-A3B, indicating a need for improvements in knowledge pretraining and agentic RL. Additionally, the model may experience performance degradation in complex environments, particularly when cross-domain interference is significant. Future research could focus on enhancing the model's knowledge-intensive pretraining and agentic RL capabilities.
Plain Language (Accessible to non-experts)
Imagine you're in a kitchen preparing a large meal. Nemotron-Cascade 2 is like a smart sous chef that helps you make delicious dishes with limited ingredients and time. First, it prepares by learning your menu (dataset) through Supervised Fine-Tuning, ensuring it knows how to handle each ingredient. Next, it cooks in stages based on different cuisines (domains) using Cascade RL, ensuring each dish reaches its best flavor. Finally, it learns from the best chefs (multi-domain on-policy distillation), ensuring it can make tasty dishes even in complex cooking environments. In this way, Nemotron-Cascade 2 not only achieves efficient reasoning capabilities under limited resources but also excels in multiple international competitions, demonstrating the potential for high intelligence density.
ELI14 (Explained like you're 14)
Hey there, buddy! Did you know that Nemotron-Cascade 2 is like a super-smart robot that can win gold medals in math and coding competitions? Imagine you're playing a complex game, and this robot is like your game assistant, helping you solve all kinds of puzzles. First, it learns all the game rules (dataset), then it tackles different levels (Cascade RL). The coolest part is that it learns tricks from the best players (multi-domain on-policy distillation), ensuring it performs well in every level. In this way, Nemotron-Cascade 2 not only achieves efficient reasoning capabilities under limited resources but also excels in multiple international competitions, demonstrating the potential for high intelligence density. Isn't that awesome?
Glossary
Cascade Reinforcement Learning
A staged domain-specific training method that simplifies the engineering complexity of multi-domain RL and achieves state-of-the-art performance across multiple benchmarks.
Used in the training framework of Nemotron-Cascade 2 to help the model achieve optimal performance in different domains.
Multi-Domain On-Policy Distillation
Effectively recovers benchmark performance by extracting knowledge from the strongest intermediate teacher models in each domain.
Introduced throughout the Cascade RL process to recover benchmark regressions and sustain performance gains.
Supervised Fine-Tuning
Training conducted on a meticulously curated dataset to equip the model with foundational capabilities.
The initial training stage of Nemotron-Cascade 2, ensuring the model has basic reasoning capabilities.
Activated Parameters
The number of parameters actually used during model inference, affecting computational efficiency and performance.
Nemotron-Cascade 2 has 3B activated parameters, despite having a total parameter count of 30B.
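A back-of-the-envelope calculation shows how a top-k MoE activates only a fraction of its total parameters per token. The shared-parameter size, expert count, and expert size below are made up to reproduce the 30B-total / 3B-activated ratio; they are not the model's real architecture.

```python
def activated_params(shared, n_experts, expert_size, top_k):
    """Total vs per-token activated parameters for a top-k MoE.
    `shared` covers weights every token uses (attention, embeddings);
    each token additionally routes through `top_k` of `n_experts`."""
    total = shared + n_experts * expert_size
    active = shared + top_k * expert_size
    return total, active

total, active = activated_params(
    shared=1_000_000_000,        # hypothetical shared weights
    n_experts=145,               # hypothetical expert count
    expert_size=200_000_000,     # hypothetical per-expert size
    top_k=10)                    # hypothetical experts routed per token
# -> total of 30B parameters, but only 3B active per token
```

This 10x gap between total and activated parameters is why the model's inference cost tracks the 3B figure rather than the 30B one.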
International Mathematical Olympiad (IMO)
A global mathematics competition attracting top students from around the world.
Nemotron-Cascade 2 achieved gold medal-level performance in the 2025 IMO.
International Olympiad in Informatics (IOI)
A global programming competition testing participants' algorithmic and coding skills.
Nemotron-Cascade 2 achieved gold medal-level performance in the 2025 IOI.
ICPC World Finals
The highest-level event of the International Collegiate Programming Contest, attracting top university students worldwide.
Nemotron-Cascade 2 performed exceptionally well in the 2025 ICPC World Finals.
High Intelligence Density
The amount of capability a model delivers relative to its parameter count.
Nemotron-Cascade 2 demonstrates the potential for high intelligence density, excelling in multiple international competitions despite its compact size.
Knowledge-Intensive Tasks
Tasks requiring extensive background knowledge and reasoning capabilities, typically demanding high knowledge pretraining from models.
Nemotron-Cascade 2 underperforms in knowledge-intensive tasks compared to some other models.
Agentic Capabilities
The ability of a model to autonomously make decisions and execute tasks in complex environments.
Nemotron-Cascade 2 demonstrates strong agentic capabilities, excelling across multiple benchmarks.
Open Questions (Unanswered questions from this research)
1. How can the model's performance in knowledge-intensive tasks be further improved under limited parameters? Current methods still have shortcomings in knowledge pretraining and agentic RL, and future research could focus on enhancing these capabilities.
2. How can performance degradation be avoided in complex environments with significant cross-domain interference? Nemotron-Cascade 2 may experience performance degradation in certain complex environments, and exploring more efficient multi-domain on-policy distillation methods could be a solution.
3. How can more efficient AI reasoning capabilities be achieved in resource-constrained environments? Nemotron-Cascade 2 demonstrates the possibility of achieving high performance under limited parameters, but fine-grained optimization in specific domains remains to be strengthened.
4. How can the engineering complexity of multi-domain RL be further simplified? Nemotron-Cascade 2 simplifies the engineering complexity of multi-domain training through Cascade RL, but there is still room for improvement.
5. How can the model's reasoning capabilities be improved without increasing computational and storage costs? Nemotron-Cascade 2 excels in multiple international competitions, but there is still room for improvement in certain specific tasks.
Applications
Immediate Applications
Education Sector
Nemotron-Cascade 2 can be used in mathematics and programming education to help students improve learning efficiency under limited resources.
Automated Programming
With Nemotron-Cascade 2's code reasoning capabilities, automated programming tasks can be achieved, improving software development efficiency.
Intelligent Assistants
Nemotron-Cascade 2 can serve as an intelligent assistant, providing decision support and task execution in complex environments.
Long-term Vision
AI Applications in Resource-Constrained Environments
Nemotron-Cascade 2 demonstrates the possibility of achieving efficient AI under limited parameters, providing insights for future AI applications in resource-constrained environments.
Exploration of High Intelligence Density
Nemotron-Cascade 2 demonstrates the potential for high intelligence density, offering new perspectives for future AI model design.
Abstract
We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeek-V3.2-Speciale-671B-A37B, to achieve gold medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals, demonstrating remarkably high intelligence density with 20x fewer parameters. In contrast to Nemotron-Cascade 1, the key technical advancements are as follows. After SFT on a meticulously curated dataset, we substantially expand Cascade RL to cover a much broader spectrum of reasoning and agentic domains. Furthermore, we introduce multi-domain on-policy distillation from the strongest intermediate teacher models for each domain throughout the Cascade RL process, allowing us to efficiently recover benchmark regressions and sustain strong performance gains along the way. We release the collection of model checkpoints and training data.