Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation
Nemotron-Cascade 2 achieves top-tier reasoning with Cascade RL and multi-domain distillation in a 30B MoE model.
Key Findings
Methodology
Nemotron-Cascade 2 is post-trained with Cascade Reinforcement Learning (Cascade RL) and Multi-Domain On-Policy Distillation (MOPD). The model first undergoes Supervised Fine-Tuning (SFT) on a meticulously curated dataset; Cascade RL is then expanded to cover a broader spectrum of reasoning and agentic domains. Throughout the Cascade RL process, multi-domain on-policy distillation from the strongest intermediate teacher model in each domain efficiently recovers benchmark regressions and sustains performance gains.
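The staged pipeline described above can be sketched as follows. This is a hypothetical, heavily simplified outline: the function records the order of stages instead of performing real parameter updates, and the domain and teacher names are illustrative, not from the paper.

```python
def post_train(domains, teachers):
    """Toy stand-in for the pipeline: SFT first, then domain-by-domain
    Cascade RL, with on-policy distillation interleaved. Returns the
    ordered log of stages instead of updating real model weights."""
    log = ["sft"]                               # stage 1: SFT
    seen = []
    for domain in domains:                      # stage 2: Cascade RL,
        log.append(f"rl:{domain}")              # one domain at a time
        seen.append(domain)
        # stage 3: distill from the strongest teacher of every domain
        # covered so far, to recover regressions the latest RL stage
        # may have introduced in earlier domains
        for d in seen:
            log.append(f"distill:{teachers[d]}:{d}")
    return log

log = post_train(["math", "code"], {"math": "t_math", "code": "t_code"})
```

The key design point this sketch captures is that distillation is not a one-off final step: it is re-applied after each RL stage so earlier domains are not silently degraded.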
Key Results
- Nemotron-Cascade 2 achieved gold medal-level performance in the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI), despite being a 30B MoE model with 3B activated parameters.
- In mathematical reasoning, Nemotron-Cascade 2 scored 79.3 on the IMO AnswerBench and achieved 92.4 on AIME 2025.
- In code reasoning, Nemotron-Cascade 2 scored 439.28/600 in IOI 2025 and achieved 87.2 on LiveCodeBench v6.
Significance
The introduction of Nemotron-Cascade 2 marks a breakthrough in achieving high reasoning capabilities in compact models. By leveraging Cascade RL and multi-domain on-policy distillation, the model excels in multiple international competitions, demonstrating the potential for high intelligence density. This not only provides new research directions for academia but also offers insights for industry to achieve efficient AI under resource constraints.
Technical Contribution
Nemotron-Cascade 2 tames the complexity of multi-domain RL: Cascade RL simplifies its engineering challenges, and multi-domain on-policy distillation recovers benchmark performance along the way. The model showcases the possibility of achieving high performance with limited parameters, offering new perspectives for future AI model design.
Novelty
Nemotron-Cascade 2 is the first compact model to achieve gold medal-level performance in these international competitions. Its core innovation lies in the combination of Cascade RL and multi-domain on-policy distillation, a pairing not fully explored in prior work.
Limitations
- Nemotron-Cascade 2 underperforms in knowledge-intensive tasks compared to Qwen3.5-35B-A3B, indicating a need for improvements in knowledge pretraining and agentic RL.
- The model may experience performance degradation in complex environments, particularly when cross-domain interference is significant.
- Despite excelling in multiple benchmarks, fine-grained optimization in specific domains remains to be strengthened.
Future Work
Future research could focus on enhancing the model's knowledge-intensive pretraining and agentic RL capabilities. Exploring more efficient multi-domain on-policy distillation methods to further improve performance is another promising direction.
AI Executive Summary
Nemotron-Cascade 2 is an open 30B Mixture-of-Experts (MoE) model with 3B activated parameters, showcasing exceptional reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. Nemotron-Cascade 2 is the second open-weight LLM, after DeepSeek-V3.2-Speciale-671B-A37B, to achieve gold medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals, demonstrating remarkably high intelligence density with 20x fewer parameters. The key technical advancements are as follows. After Supervised Fine-Tuning (SFT) on a meticulously curated dataset, we substantially expand Cascade RL to cover a much broader spectrum of reasoning and agentic domains. Furthermore, we introduce multi-domain on-policy distillation from the strongest intermediate teacher models for each domain throughout the Cascade RL process, allowing us to efficiently recover benchmark regressions and sustain strong performance gains along the way. We release the collection of model checkpoints and training data.
Deep Analysis
Background
In recent years, large language models (LLMs) have made significant strides in natural language processing. As model sizes continue to grow, however, achieving efficient reasoning capabilities under limited resources has become an important research direction. Nemotron-Cascade 2 addresses this by combining Cascade RL with multi-domain on-policy distillation, reaching high intelligence density in a compact model and excelling in multiple international competitions.
Core Problem
Achieving efficient reasoning capabilities under limited parameters is a significant challenge in current LLM research. Traditional large-scale models, while performing well, incur high computational and storage costs, making them difficult to deploy in resource-constrained environments. Nemotron-Cascade 2 addresses this issue by achieving exceptional reasoning capabilities in a compact parameter model through Cascade RL and multi-domain on-policy distillation, providing a new approach to solving this problem.
Innovation
The core innovation of Nemotron-Cascade 2 lies in the combination of Cascade RL and multi-domain on-policy distillation. Cascade RL simplifies the engineering complexity of multi-domain RL through staged domain-specific training, achieving state-of-the-art performance across multiple benchmarks. Multi-domain on-policy distillation effectively recovers benchmark performance by extracting knowledge from the strongest intermediate teacher models in each domain. This method, not fully explored in prior work, offers new perspectives for future AI model design.
Methodology
- Conduct Supervised Fine-Tuning (SFT) on a meticulously curated dataset to equip the model with foundational capabilities.
- Employ Cascade RL to simplify the engineering complexity of multi-domain RL through staged domain-specific training.
- Introduce multi-domain on-policy distillation throughout the Cascade RL process to extract knowledge from the strongest intermediate teacher models in each domain.
- Combine Cascade RL and multi-domain on-policy distillation to recover benchmark performance and achieve state-of-the-art results across multiple benchmarks.
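The distillation step above can be illustrated with the generic on-policy objective: the student samples its own outputs, and training minimizes the KL divergence from the teacher's next-token distribution at those sampled positions. This is a minimal sketch of that generic formulation, not the paper's exact loss; the list-of-logits interface is an assumption for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def onpolicy_distill_loss(student_logits, teacher_logits):
    """Mean reverse KL(student || teacher) over positions that the
    student itself sampled (the 'on-policy' part)."""
    total = 0.0
    for s_log, t_log in zip(student_logits, teacher_logits):
        p, q = softmax(s_log), softmax(t_log)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(student_logits)

# When student and teacher agree exactly, the loss is zero
loss = onpolicy_distill_loss([[1.0, 2.0, 0.5]], [[1.0, 2.0, 0.5]])
```

Scoring the student's own rollouts (rather than teacher-generated text) is what lets this objective correct the regressions the student actually exhibits after an RL stage.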
Experiments
The experimental design includes performance evaluations in multiple international competitions, such as the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI). By comparing with other models, Nemotron-Cascade 2's exceptional performance in mathematical and code reasoning is validated. Additionally, the effectiveness of multi-domain on-policy distillation is verified, demonstrating its advantage in recovering benchmark performance.
Results
Nemotron-Cascade 2 achieved gold medal-level performance in the 2025 IMO and IOI despite having only 3B activated parameters. In mathematical reasoning it scored 79.3 on the IMO AnswerBench and 92.4 on AIME 2025; in code reasoning it scored 439.28/600 in IOI 2025 and 87.2 on LiveCodeBench v6. Multi-domain on-policy distillation also proved effective at recovering benchmark regressions during Cascade RL.
Applications
The application scenarios of Nemotron-Cascade 2 center on efficient AI reasoning in resource-constrained environments: with only 3B activated parameters, it can be deployed where larger frontier models are impractical, including education, automated programming, and intelligent assistants. This not only provides new research directions for academia but also offers insights for industry to achieve efficient AI under resource constraints.
Limitations & Outlook
Despite Nemotron-Cascade 2's exceptional performance across multiple benchmarks, it underperforms in knowledge-intensive tasks compared to Qwen3.5-35B-A3B, indicating a need for improvements in knowledge pretraining and agentic RL. Additionally, the model may experience performance degradation in complex environments, particularly when cross-domain interference is significant. Future research could focus on enhancing the model's knowledge-intensive pretraining and agentic RL capabilities.
Plain Language (Accessible to non-experts)
Imagine you're in a kitchen preparing a large meal. Nemotron-Cascade 2 is like a smart sous chef that helps you make delicious dishes with limited ingredients and time. First, it prepares by learning your menu (dataset) through Supervised Fine-Tuning, ensuring it knows how to handle each ingredient. Next, it cooks in stages based on different cuisines (domains) using Cascade RL, ensuring each dish reaches its best flavor. Finally, it learns from the best chefs (multi-domain on-policy distillation), ensuring it can make tasty dishes even in complex cooking environments. In this way, Nemotron-Cascade 2 not only achieves efficient reasoning capabilities under limited resources but also excels in multiple international competitions, demonstrating the potential for high intelligence density.
ELI14 (Explained like you're 14)
Hey there, buddy! Did you know that Nemotron-Cascade 2 is like a super-smart robot that can win gold medals in math and coding competitions? Imagine you're playing a complex game, and this robot is like your game assistant, helping you solve all kinds of puzzles. First, it learns all the game rules (dataset), then it tackles different levels (Cascade RL). The coolest part is that it learns tricks from the best players (multi-domain on-policy distillation), ensuring it performs well in every level. In this way, Nemotron-Cascade 2 not only achieves efficient reasoning capabilities under limited resources but also excels in multiple international competitions, demonstrating the potential for high intelligence density. Isn't that awesome?
Glossary
Cascade Reinforcement Learning
A staged domain-specific training method that simplifies the engineering complexity of multi-domain RL and achieves state-of-the-art performance across multiple benchmarks.
Used in the training framework of Nemotron-Cascade 2 to help the model achieve optimal performance in different domains.
Multi-Domain On-Policy Distillation
Effectively recovers benchmark performance by extracting knowledge from the strongest intermediate teacher models in each domain.
Introduced throughout the Cascade RL process to recover benchmark regressions and sustain performance gains.
Supervised Fine-Tuning
Training conducted on a meticulously curated dataset to equip the model with foundational capabilities.
The initial training stage of Nemotron-Cascade 2, ensuring the model has basic reasoning capabilities.
Activated Parameters
The number of parameters actually used during model inference, affecting computational efficiency and performance.
Nemotron-Cascade 2 has 3B activated parameters, despite having a total parameter count of 30B.
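A back-of-the-envelope calculation shows how a top-k MoE activates only a fraction of its total parameters per token. The shared-parameter size, expert count, and expert size below are made up to reproduce the 30B-total / 3B-activated ratio; they are not the model's real architecture.

```python
def activated_params(shared, n_experts, expert_size, top_k):
    """Total vs per-token activated parameters for a top-k MoE.
    `shared` covers weights every token uses (attention, embeddings);
    each token additionally routes through `top_k` of `n_experts`."""
    total = shared + n_experts * expert_size
    active = shared + top_k * expert_size
    return total, active

total, active = activated_params(
    shared=1_000_000_000,        # hypothetical shared weights
    n_experts=145,               # hypothetical expert count
    expert_size=200_000_000,     # hypothetical per-expert size
    top_k=10)                    # hypothetical experts routed per token
# -> total of 30B parameters, but only 3B active per token
```

This 10x gap between total and activated parameters is why the model's inference cost tracks the 3B figure rather than the 30B one.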
International Mathematical Olympiad (IMO)
A global mathematics competition attracting top students from around the world.
Nemotron-Cascade 2 achieved gold medal-level performance in the 2025 IMO.
International Olympiad in Informatics (IOI)
A global programming competition testing participants' algorithmic and coding skills.
Nemotron-Cascade 2 achieved gold medal-level performance in the 2025 IOI.
ICPC World Finals
The highest-level event of the International Collegiate Programming Contest, attracting top university students worldwide.
Nemotron-Cascade 2 performed exceptionally well in the 2025 ICPC World Finals.
High Intelligence Density
The amount of capability a model delivers relative to its parameter count.
Nemotron-Cascade 2 demonstrates the potential for high intelligence density, excelling in multiple international competitions despite its compact size.
Knowledge-Intensive Tasks
Tasks requiring extensive background knowledge and reasoning capabilities, typically demanding high knowledge pretraining from models.
Nemotron-Cascade 2 underperforms in knowledge-intensive tasks compared to some other models.
Agentic Capabilities
The ability of a model to autonomously make decisions and execute tasks in complex environments.
Nemotron-Cascade 2 demonstrates strong agentic capabilities, excelling across multiple benchmarks.
Open Questions (Unanswered questions from this research)
1. How can the model's performance in knowledge-intensive tasks be further improved under limited parameters? Current methods still have shortcomings in knowledge pretraining and agentic RL, and future research could focus on enhancing these capabilities.
2. How can performance degradation be avoided in complex environments with significant cross-domain interference? Nemotron-Cascade 2 may experience performance degradation in certain complex environments, and exploring more efficient multi-domain on-policy distillation methods could be a solution.
3. How can more efficient AI reasoning capabilities be achieved in resource-constrained environments? Nemotron-Cascade 2 demonstrates the possibility of achieving high performance under limited parameters, but fine-grained optimization in specific domains remains to be strengthened.
4. How can the engineering complexity of multi-domain RL be further simplified? Nemotron-Cascade 2 simplifies the engineering complexity of multi-domain training through Cascade RL, but there is still room for improvement.
5. How can the model's reasoning capabilities be improved without increasing computational and storage costs? Nemotron-Cascade 2 excels in multiple international competitions, but there is still room for improvement in certain specific tasks.
Applications
Immediate Applications
Education Sector
Nemotron-Cascade 2 can be used in mathematics and programming education to help students improve learning efficiency under limited resources.
Automated Programming
With Nemotron-Cascade 2's code reasoning capabilities, automated programming tasks can be achieved, improving software development efficiency.
Intelligent Assistants
Nemotron-Cascade 2 can serve as an intelligent assistant, providing decision support and task execution in complex environments.
Long-term Vision
AI Applications in Resource-Constrained Environments
Nemotron-Cascade 2 demonstrates the possibility of achieving efficient AI under limited parameters, providing insights for future AI applications in resource-constrained environments.
Exploration of High Intelligence Density
Nemotron-Cascade 2 demonstrates the potential for high intelligence density, offering new perspectives for future AI model design.
Abstract
We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical and coding reasoning performance approaches that of frontier open models. It is the second open-weight LLM, after DeepSeek-V3.2-Speciale-671B-A37B, to achieve gold medal-level performance in the 2025 International Mathematical Olympiad (IMO), the International Olympiad in Informatics (IOI), and the ICPC World Finals, demonstrating remarkably high intelligence density with 20x fewer parameters. In contrast to Nemotron-Cascade 1, the key technical advancements are as follows. After SFT on a meticulously curated dataset, we substantially expand Cascade RL to cover a much broader spectrum of reasoning and agentic domains. Furthermore, we introduce multi-domain on-policy distillation from the strongest intermediate teacher models for each domain throughout the Cascade RL process, allowing us to efficiently recover benchmark regressions and sustain strong performance gains along the way. We release the collection of model checkpoints and training data.