3B (dense)
Apache 2.0
Compact multilingual reasoning model with dual-mode inference (think/no_think); pretrained on 11.2T tokens; supports 6 languages and up to 128K context; outperforms Llama 3.2 3B and Qwen2.5 3B, and is competitive with 4B models; architecture uses GQA and NoPE.
https://huggingface.co/blog/smollm3
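A minimal sketch of the dual-mode (think/no_think) usage via the Hugging Face transformers chat-template API, assuming the `enable_thinking` template flag and `/no_think` system-prompt switch described for SmolLM3; the prompt text and generation settings are illustrative, not from the source.

```python
# Sketch: toggling SmolLM3's reasoning mode through the chat template.
# The enable_thinking flag (or a "/no_think" system prompt) is assumed
# from the model documentation; prompt and decoding settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "Explain grouped-query attention in two sentences."},
]

# Thinking mode is on by default; set enable_thinking=False to request a
# direct answer without an emitted reasoning trace.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```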