4B / 8B (2 sizes; dense; Instruct and Thinking variants)
Apache 2.0
Vision-language family with 256K context, extensible to 1M; OCR in 32 languages; 2D/3D spatial grounding; visual coding; GUI agents; FP8-optimized for low VRAM; Thinking variants strengthen multimodal and STEM reasoning; strong long-document and video comprehension.
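To make the low-VRAM point concrete, here is a rough back-of-the-envelope sketch of weight-only memory for the two listed sizes. It assumes nominal parameter counts of exactly 4 and 8 billion, 2 bytes per parameter for BF16 and 1 byte for FP8; the helper name `weight_memory_gb` is hypothetical, and the estimate ignores the KV cache, activations, and runtime overhead, all of which grow with context length.

```python
# Assumed bytes per parameter for each weight format (illustrative only).
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_memory_gb(params_billions: float, dtype: str) -> float:
    """Approximate weight-only memory in GB (1 GB = 10^9 bytes).

    Hypothetical helper: excludes KV cache, activations, and overhead.
    """
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / 1e9


if __name__ == "__main__":
    # Nominal sizes from the listing; real parameter counts may differ.
    for size in (4, 8):
        for dtype in ("bf16", "fp8"):
            print(f"{size}B {dtype}: ~{weight_memory_gb(size, dtype):.0f} GB")
```

Under these assumptions, FP8 halves the weight footprint relative to BF16; actual VRAM use will be higher, especially at long context.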