MiMo-V2-Flash

Xiaomi (China) · December 16, 2025

Parameters

309B total / 15B active (MoE with 256 experts, 8 active per token)

License

MIT

Key Features

- Frontier-class foundation model excelling in reasoning, coding, and agentic workflows
- #1 open-source model on SWE-Bench Verified (73.4%) and SWE-Bench Multilingual (71.7%)
- 94.1% on AIME 2025 (top two among open-source models)
- Hybrid attention architecture with a 5:1 SWA:GA layer ratio and an aggressive 128-token sliding window, cutting the KV cache roughly 6× versus full attention (see the cache-size sketch after this list)
- 256K context; trained on 27T tokens with FP8 mixed precision
- Introduces MOPD (Multi-Teacher Online Policy Distillation), reported to use about 1/50th the compute of traditional RL
- Self-speculative decoding via Multi-Token Prediction delivers a 2.0-2.6× speedup (~150 tokens/sec)
- Hybrid thinking mode: reasoning can be toggled on or off
- Ultra-low cost at $0.1/$0.3 per million tokens
- Competitive with GPT-5 High and Claude Sonnet 4.5 on agentic tasks
- Day-0 inference support contributed to SGLang (a minimal serving sketch follows the source link below)
- Marks Xiaomi's leap from its 7B debut (April 2025) to frontier-class performance in eight months
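
As a rough illustration of the 6× KV-cache claim, the Python sketch below counts cached tokens per layer for a repeating block of five 128-token sliding-window layers and one global-attention layer; the 5:1 ratio, 128-token window, and 256K context come from this entry, while the layer count is a placeholder rather than a published spec.

```python
# Back-of-the-envelope KV-cache comparison for a hybrid 5:1 SWA:GA stack.
# Only the 5:1 ratio, 128-token window, and 256K context are from the model card;
# the layer count below is a placeholder.

def kv_cache_tokens(context_len: int, num_layers: int,
                    swa_per_block: int = 5, ga_per_block: int = 1,
                    window: int = 128) -> tuple[int, int]:
    """Return (hybrid_tokens, full_attention_tokens) cached across all layers."""
    block = swa_per_block + ga_per_block
    ga_layers = num_layers // block * ga_per_block
    swa_layers = num_layers - ga_layers
    # Sliding-window layers keep only the last `window` tokens;
    # global-attention layers keep the whole context.
    hybrid = swa_layers * min(window, context_len) + ga_layers * context_len
    full = num_layers * context_len
    return hybrid, full

if __name__ == "__main__":
    context = 256_000   # 256K context from the model card
    layers = 48         # placeholder depth
    hybrid, full = kv_cache_tokens(context, layers)
    print(f"hybrid/full cached tokens: {hybrid / full:.3f}")  # ~0.167, i.e. ~6x smaller
```

At long contexts the sliding-window layers contribute a negligible, fixed 128 tokens each, so the ratio approaches the share of global layers (1 in 6), which is where the ~6× reduction comes from.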

Paper / Source

https://github.com/XiaomiMiMo/MiMo-V2-Flash
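
Because day-0 inference support was contributed to SGLang, the model can in principle be served with sglang.launch_server and queried through SGLang's OpenAI-compatible API. The sketch below is illustrative only: the model path mirrors the GitHub repo name and is an assumption, as are the port and sampling settings.

```python
# Minimal client sketch against an SGLang OpenAI-compatible server.
# Assumed launch command (model path mirrors the GitHub repo and is not confirmed):
#   python -m sglang.launch_server --model-path XiaomiMiMo/MiMo-V2-Flash --port 30000
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="XiaomiMiMo/MiMo-V2-Flash",  # placeholder; match the name the server reports
    messages=[{"role": "user", "content": "Summarize the hybrid SWA/GA attention design."}],
    max_tokens=512,
)
print(resp.choices[0].message.content)
```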