Parameters: 321M (dense)
License: Apache 2.0
Summary: Small reasoning model with an ultra-deep, 80-layer "baguette" architecture; trained on 200B tokens of the fully synthetic SYNTH dataset; native thinking traces in stenographic notation; best-in-class for its size on MMLU, GSM8K, and HotPotQA; multilingual (French, German, Italian, Spanish, Polish); trained on only 16 H100 GPUs; RAG-optimized with source grounding.
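To make the deep-narrow trade-off concrete, here is a rough parameter-count sketch in Python. Only the 80-layer depth and the ~321M total come from the description above; the hidden width, FFN size, attention layout, and vocabulary size are assumptions chosen purely for illustration and may not match the model's published config. The point is that an 80-layer stack can stay around 321M parameters by keeping each layer narrow.

```python
# Back-of-the-envelope parameter count for a deep-narrow ("baguette") transformer.
# All dimensions except n_layers are ASSUMPTIONS for illustration; norm weights
# and other small terms are ignored.

def approx_params(
    n_layers: int = 80,        # ultra-deep stack (from the description)
    d_model: int = 512,        # assumed narrow hidden width
    d_ff: int = 1792,          # assumed gated-MLP intermediate size
    n_heads: int = 8,          # assumed attention heads
    n_kv_heads: int = 4,       # assumed grouped-query attention
    vocab_size: int = 65_536,  # assumed vocabulary, tied embeddings
) -> int:
    head_dim = d_model // n_heads
    # Attention: Q and output projections are d_model x d_model;
    # K and V are shrunk by grouped-query attention.
    attn = 2 * d_model * d_model + 2 * d_model * (n_kv_heads * head_dim)
    # Gated MLP (SwiGLU-style): up, gate, and down projections.
    mlp = 3 * d_model * d_ff
    # Tied input/output embedding matrix, counted once.
    embed = vocab_size * d_model
    return n_layers * (attn + mlp) + embed

print(f"{approx_params() / 1e6:.0f}M parameters")  # ~317M with these assumed dims
```

With these illustrative dimensions the estimate lands near the quoted 321M, showing how depth can be pushed to 80 layers at this scale only by keeping the per-layer width small.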