36B (3 variants: Base w/ synthetic, Base w/o synthetic, Instruct; dense)
Apache 2.0
Native 512K context window (4x longer than mainstream models); "Thinking Budget" mechanism for flexible reasoning depth control (512 to 16K tokens); trained on only 12T tokens yet achieves SOTA on multiple benchmarks; SOTA open-source on AIME24 (91.7%), LiveCodeBench v6 (67.4), and RULER 128K (94.6); research-friendly dual base release (with and without synthetic instruction data) for cleaner experimentation; strong math (GSM8K 90.8%, MATH 81.7), code (HumanEval 76.8), and reasoning (BBH 87.7); optimized for international use; serves as the base model for Hermes 4.3.
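The "Thinking Budget" is a per-request control rather than a separate checkpoint: the caller caps how many reasoning tokens the model may spend before answering. Below is a minimal sketch of how such a budget could be passed through a Hugging Face chat template; the checkpoint id and the `thinking_budget` template argument are assumptions for illustration, not the vendor's documented API, so consult the model card for the exact names and supported budget values.

```python
# Sketch only: querying the Instruct variant with a capped reasoning budget.
# "org/model-36B-Instruct" and `thinking_budget` are hypothetical placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/model-36B-Instruct"  # hypothetical checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [{"role": "user", "content": "How many prime numbers are below 50?"}]

# Extra keyword arguments to apply_chat_template are forwarded to the chat
# template, which is where a per-request reasoning budget would be injected.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    thinking_budget=512,  # assumed template argument; budget in tokens (512-16K per the text above)
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

A small budget (e.g. 512) trades reasoning depth for latency on easy queries, while a large one (up to 16K) lets the model think longer on hard math or code problems; the useful property is that this is chosen at inference time, per request, without switching models.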