32B (dense vision-language model)
Apache 2.0
Advanced vision-language reasoning model that scales beyond SEED 14B Think; unified Transformer architecture processing text tokens and visual patches in a shared embedding space; multimodal across text, images, and video with a 128K context window; optional thinking mode for deeper, controllable reasoning; knowledge cutoff May 2025; strengthens Korean-centric reasoning and agentic capabilities beyond simple parameter scaling; achieved Grade 1 (top-tier) scores on the Korean college entrance examination (KCSAT) in Korean language, mathematics, English, and Korean history, with perfect scores in English and Korean history; answers exam questions from direct image input, demonstrating advanced visual reasoning; vision encoder: SigLIP-2 at 512×512 pixels; LLaVA-1.5-HD framework with a C-Abstractor connector; supports up to 1.57M visual tokens; excels on KCSAT STEM subjects (206 items across mathematics, physics, chemistry, earth science, and biology); scores on the Artificial Analysis composite index (10 benchmarks) fall within the range of global AI models; released under the Ministry of Science and ICT's Independent AI Foundation Model program.
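Since the entry describes a LLaVA-1.5-HD-style pipeline (SigLIP-2 patch encoder, C-Abstractor connector, shared text/visual embedding space), a minimal PyTorch sketch of that data flow may help. All module layouts, hidden sizes, the patch grid, and the output token count below are illustrative assumptions, not the released 32B architecture.

```python
# Hedged sketch of a LLaVA-1.5-HD-style forward pass: vision patches ->
# C-Abstractor connector -> shared embedding space with the language model.
# Dimensions and module structure are assumptions for illustration only.
import torch
import torch.nn as nn

VIT_DIM, LLM_DIM = 1152, 5120   # assumed encoder / LM hidden sizes
GRID = 32                       # assumed 512x512 image, 16px patches -> 32x32 grid

class CAbstractor(nn.Module):
    """Simplified C-Abstractor: local convolutional mixing plus adaptive
    pooling compresses the visual token grid, then projects into LM space."""
    def __init__(self, out_tokens_per_side: int = 16):
        super().__init__()
        self.conv = nn.Sequential(                       # local spatial mixing
            nn.Conv2d(VIT_DIM, VIT_DIM, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(VIT_DIM, VIT_DIM, 3, padding=1),
        )
        self.pool = nn.AdaptiveAvgPool2d(out_tokens_per_side)
        self.proj = nn.Linear(VIT_DIM, LLM_DIM)          # into the shared space

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, GRID*GRID, VIT_DIM) from the vision encoder
        b, n, d = patch_feats.shape
        x = patch_feats.transpose(1, 2).reshape(b, d, GRID, GRID)
        x = self.pool(self.conv(x))                      # (b, d, 16, 16)
        x = x.flatten(2).transpose(1, 2)                 # (b, 256, d)
        return self.proj(x)                              # (b, 256, LLM_DIM)

# Text tokens and compressed visual tokens share one embedding space, so they
# can simply be concatenated into a single sequence for the decoder stack.
embed = nn.Embedding(32000, LLM_DIM)                     # assumed vocab size
text_ids = torch.randint(0, 32000, (1, 8))
vision_feats = torch.randn(1, GRID * GRID, VIT_DIM)      # stand-in encoder output
seq = torch.cat([CAbstractor()(vision_feats), embed(text_ids)], dim=1)
print(seq.shape)  # torch.Size([1, 264, 5120])
```

The connector is the design lever here: pooling the 32×32 patch grid down to 16×16 trims the per-image token cost fourfold, which is what makes large multi-image budgets (such as the cited 1.57M visual tokens) tractable within a 128K context.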
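For the optional thinking mode, a hedged usage sketch following the chat-template convention several open models use: the repository id, the image filename, and the `enable_thinking` kwarg are assumptions, not a documented API for this model.

```python
# Hypothetical invocation: toggle thinking mode through a chat-template flag.
# "org/model-32b" is a placeholder repo id; `enable_thinking` is an assumed
# template kwarg modeled on common open-model conventions.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("org/model-32b")  # placeholder
messages = [{"role": "user", "content": [
    {"type": "image", "url": "kcsat_question.png"},          # assumed input
    {"type": "text", "text": "Solve the problem shown in the image."},
]}]
prompt = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=True,  # assumed flag: emit reasoning before the answer
)
```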