Parameters: 321M (dense)
License: Apache 2.0
Summary: Small reasoning model with an ultra-deep, 80-layer "baguette" architecture; trained on 200B tokens of the fully synthetic SYNTH dataset; native thinking traces in stenographic notation; best-in-class for its size on MMLU, GSM8K, and HotPotQA; multilingual (French, German, Italian, Spanish, Polish); trained on only 16 H100 GPUs; RAG-optimized with source grounding.
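To make the deep-narrow trade-off concrete, here is a rough parameter-count sketch in Python. Only the 80-layer depth and the ~321M total come from the description above; the hidden width, FFN size, attention layout, and vocabulary size are assumptions chosen purely for illustration and may not match the model's published config. The point is that an 80-layer stack can stay around 321M parameters by keeping each layer narrow.

```python
# Back-of-the-envelope parameter count for a deep-narrow ("baguette") transformer.
# All dimensions except n_layers are ASSUMPTIONS for illustration; norm weights
# and other small terms are ignored.

def approx_params(
    n_layers: int = 80,        # ultra-deep stack (from the description)
    d_model: int = 512,        # assumed narrow hidden width
    d_ff: int = 1792,          # assumed gated-MLP intermediate size
    n_heads: int = 8,          # assumed attention heads
    n_kv_heads: int = 4,       # assumed grouped-query attention
    vocab_size: int = 65_536,  # assumed vocabulary, tied embeddings
) -> int:
    head_dim = d_model // n_heads
    # Attention: Q and output projections are d_model x d_model;
    # K and V are shrunk by grouped-query attention.
    attn = 2 * d_model * d_model + 2 * d_model * (n_kv_heads * head_dim)
    # Gated MLP (SwiGLU-style): up, gate, and down projections.
    mlp = 3 * d_model * d_ff
    # Tied input/output embedding matrix, counted once.
    embed = vocab_size * d_model
    return n_layers * (attn + mlp) + embed

print(f"{approx_params() / 1e6:.0f}M parameters")  # ~317M with these assumed dims
```

With these illustrative dimensions the estimate lands near the quoted 321M, showing how depth can be pushed to 80 layers at this scale only by keeping the per-layer width small.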