Nemotron-Cascade | NLP.COM.AI

Parameters

8B / 14B (3 variants: 8B unified, 8B-Thinking, 14B-Thinking; dense, post-trained from Qwen3)

License

NVIDIA Open Model License

Key Features

General-purpose reasoning models trained with Cascade RL, a sequential, domain-wise reinforcement learning scheme. The 14B-Thinking model outperforms DeepSeek-R1-0528 (671B) on LiveCodeBench v5/v6/Pro and achieves silver-medal performance at the 2025 IOI (International Olympiad in Informatics). The 8B models match DeepSeek-R1-0528 on LiveCodeBench despite being roughly 80× smaller (8B vs. 671B parameters). The family is also reported to beat Gemini 2.5 Pro, o4-mini, and Qwen3-235B on coding benchmarks. The unified 8B model operates in both thinking and instruct modes, selected with the /think and /no_think tags. Cascade RL trains on one domain at a time, in sequence, rather than blending all domains into a single RL run as DeepSeek-R1 and Qwen3 do, which reduces engineering complexity. Applying RLHF as a pre-step improves reasoning ability well beyond what preference optimization alone would suggest, and the subsequent domain-wise RLVR stages maintain or improve the performance reached in earlier stages. Training recipes and data are released openly. Overall, the results indicate that smaller models can reach frontier-level performance through stronger post-training techniques.
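The difference between sequential (cascade) and blended domain scheduling described above can be illustrated with a toy sketch. This is not the authors' training code: the function names, the domain labels ("rlhf", "math", "code"), and the step counts are hypothetical, and the round-robin interleave is only one simple stand-in for "blending all domains".

```python
from typing import Dict, Iterator, List, Tuple


def cascade_schedule(
    domains: List[str], steps_per_stage: Dict[str, int]
) -> Iterator[Tuple[int, str]]:
    """Yield (global_step, domain), finishing one domain before the next starts."""
    step = 0
    for domain in domains:  # stages run strictly in the given order
        for _ in range(steps_per_stage[domain]):
            yield step, domain
            step += 1


def blended_schedule(
    domains: List[str], steps_per_stage: Dict[str, int]
) -> Iterator[Tuple[int, str]]:
    """Yield (global_step, domain), interleaving domains round-robin for contrast."""
    remaining = dict(steps_per_stage)  # copy so the caller's dict is not mutated
    step = 0
    while any(remaining[d] > 0 for d in domains):
        for domain in domains:
            if remaining[domain] > 0:
                yield step, domain
                remaining[domain] -= 1
                step += 1


if __name__ == "__main__":
    # Hypothetical stage order and step budgets, chosen only for illustration.
    stages = ["rlhf", "math", "code"]
    budget = {"rlhf": 2, "math": 2, "code": 1}
    print("cascade:", [d for _, d in cascade_schedule(stages, budget)])
    print("blended:", [d for _, d in blended_schedule(stages, budget)])
```

In the cascade variant each stage needs only its own domain's reward and data pipeline to be active at a time, which is one way to read the claim of reduced engineering complexity.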

Paper / Source

https://arxiv.org/abs/2512.13607