24B (2 variants: Base and Instruct; dense),Apache 2.0,Adds state-of-the-art vision understanding to Small 3; 128K context window; multimodal (text + vision); first open-source model to surpass leading proprietary models across text, vision, and multilingual capabilities in its weight class; 150 tokens/s; runs on a single RTX 4090 or a Mac with 32GB RAM; multilingual (dozens of languages); both Base and Instruct checkpoints available for fine-tuning.