3.8B mini and 5.6B multimodal (2 variants; dense)
MIT
Small model family: mini (3.8B) offers compact reasoning for math/coding on mobile devices, using grouped-query attention (GQA); multimodal (5.6B) supports text, image, audio, and video via a Mixture of LoRAs, outperforming Gemini 2.0 Flash on combined audio+visual benchmarks; 200K-token vocabulary covering 20+ languages; tops the Hugging Face OpenASR leaderboard (6.14% WER); 128K context.
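The Mixture-of-LoRAs approach mentioned above can be sketched as a frozen base layer augmented with per-modality low-rank adapters. This is a minimal, hypothetical illustration only — the class name, routing, and layer placement are assumptions, not the model's actual implementation:

```python
import torch
import torch.nn as nn
from typing import Optional

class MixtureOfLoRALinear(nn.Module):
    """Frozen base linear layer plus one LoRA adapter per modality.

    Illustrative sketch: only the low-rank A/B matrices would be trained,
    leaving the base (text) weights untouched.
    """

    def __init__(self, d_in: int, d_out: int, rank: int = 8,
                 modalities=("vision", "audio")):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        # One low-rank A/B pair per modality; B is zero-initialized,
        # so each adapter starts as an identity delta (standard LoRA init).
        self.lora_A = nn.ParameterDict(
            {m: nn.Parameter(torch.randn(rank, d_in) * 0.01) for m in modalities})
        self.lora_B = nn.ParameterDict(
            {m: nn.Parameter(torch.zeros(d_out, rank)) for m in modalities})

    def forward(self, x: torch.Tensor, modality: Optional[str] = None) -> torch.Tensor:
        y = self.base(x)
        if modality is not None:  # text-only inputs skip the adapters
            y = y + x @ self.lora_A[modality].T @ self.lora_B[modality].T
        return y

layer = MixtureOfLoRALinear(64, 64)
x = torch.randn(2, 64)
text_out = layer(x)              # base path only
vision_out = layer(x, "vision")  # base + vision LoRA delta
```

Because the adapters are additive and the base is frozen, adding a new modality only requires training a small A/B pair rather than retraining the shared backbone.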