235B / 22B (2 variants: Instruct and Thinking; MoE),Apache 2.0,Flagship vision-language model; Instruct variant outperforms Gemini 2.5 Pro on visual perception, GUI navigation, screenshot-to-code; Thinking variant SOTA on multimodal reasoning/STEM with deep causal analysis; 256K+ context for videos/PDFs; 32-lang OCR and 2D/3D spatial reasoning.