16B total / 1.4B active (mini) and 100B total / 6.1B active (flash); 2 MoE variants
Apache 2.0
First diffusion language model (dLLM) scaled to 100B parameters; uses an iterative refinement approach instead of autoregressive generation, starting from a fully masked sequence and unmasking tokens in parallel over multiple rounds; 2.1x faster inference than comparable AR models (535 tokens/s); trained on ~20T tokens; excels at code generation, complex reasoning, and tool calling; mini variant (16B total / 1.4B active) targets efficiency, flash variant (100B total / 6.1B active) targets maximum performance; sparse MoE activation sharply reduces compute cost; trained with the dFactory framework using FSDP2; a fundamentally different generation paradigm from traditional autoregressive LLMs.
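To make the "start fully masked, unmask in parallel" idea concrete, here is a minimal toy sketch of a dLLM-style decoding loop. It is not the model's actual inference code: `toy_predict`, `MASK`, `VOCAB`, `seq_len`, and `num_rounds` are all hypothetical names, and the stub denoiser just returns random guesses so the structure of the loop stays visible.

```python
# Illustrative sketch only: a toy parallel-unmasking loop in the spirit of a
# masked-diffusion decoder. toy_predict is a made-up stub standing in for the
# real denoiser network; all names here are assumptions, not the model's API.
import random

MASK = "<mask>"
VOCAB = ["def", "return", "x", "+", "1", ":", "(", ")"]

def toy_predict(sequence):
    """Stub denoiser: propose a (token, confidence) guess for each masked slot."""
    return {
        i: (random.choice(VOCAB), random.random())
        for i, tok in enumerate(sequence)
        if tok == MASK
    }

def diffusion_decode(seq_len=8, num_rounds=4):
    # Start from a fully masked sequence, as in dLLM-style generation.
    sequence = [MASK] * seq_len
    per_round = seq_len // num_rounds  # how many positions to commit each round
    for _ in range(num_rounds):
        guesses = toy_predict(sequence)
        # Commit the most confident predictions in parallel this round,
        # then re-run the denoiser on the partially unmasked sequence.
        best = sorted(guesses.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (token, _conf) in best[:per_round]:
            sequence[pos] = token
    return sequence

print(diffusion_decode())
```

The key contrast with autoregressive decoding is that each round fills in several positions at once across the whole sequence, which is where the reported parallel-generation speedup comes from.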