456B total / 45.9B active (MoE)
Open (permissive)
Hybrid-attention reasoning model; Lightning attention for efficient scaling; 1M-token context; RL with CISPO; outperforms DeepSeek-R1 on SWE-bench and GPQA; function calling and agentic tool use (sketched below).
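The function-calling support is usually exercised through an OpenAI-compatible chat API. Below is a minimal sketch under that assumption; the `base_url`, the `MiniMax-M1` model identifier, and the `get_weather` tool schema are illustrative placeholders, not documented values.

```python
# Minimal function-calling sketch against an assumed OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-endpoint/v1",  # assumed gateway URL, replace with your provider's
    api_key="YOUR_API_KEY",
)

# Hypothetical tool definition purely for illustration.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="MiniMax-M1",  # assumed model identifier for this entry
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model chooses to call a tool, the arguments arrive as a JSON string.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```

In an agentic loop, the caller would execute the returned tool call, append the result as a `tool` message, and re-invoke the model until it produces a final answer.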