Open Weight Text-to-Text Models (2025)

96 models tracked

Date Model Developer Parameters License Key Features Paper
December 29 HyperCLOVA X SEED 8B Omni Naver Cloud (South Korea) 8B (dense any-to-any model) Apache 2.0 First native omnimodal architecture from Naver; Korean-centered any-to-any model processing and generating across text, images, and audio in unified architecture; eliminates modality barriers through integrated reasoning workflows; seamlessly handles explanations, conversations, visual analysis, and voice interactions; text-based image generation and editing functions; built on deep Korean language understanding with exceptional … Link
December 29 HyperCLOVA X SEED 32B Think Naver Cloud (South Korea) 32B (dense vision-language model) Apache 2.0 Advanced vision-language reasoning model scaling beyond SEED 14B Think; unified Transformer architecture processing text tokens and visual patches in shared embedding space; multimodal capabilities across text, images, and video with 128K context window; optional thinking mode for deep controllable reasoning; knowledge cutoff May 2025; strengthens Korean-centric reasoning and agentic capabilities … Link
December 29 Llama 3.3 8B Instruct Meta (USA) 8B (dense) Llama 3.3 Community License The "lost" Llama 3.3 8B extracted from Meta's Llama API via finetuning workaround; model existed behind API since April 2025 but weights never officially released; extracted by downloading finetuned model and subtracting adapter to recover base; significant improvements over Llama 3.1 8B: 81.95% IFEval (vs 78.2%), 37.0% GPQA Diamond (vs … Link
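The adapter-subtraction workaround above comes down to simple weight arithmetic: if the provider returns merged weights W' = W + (α/r)·B·A and you also hold the trained LoRA factors, the base weights fall out by subtraction. A minimal sketch, assuming illustrative file names, key naming, and LoRA hyperparameters (not the actual extraction scripts):

```python
# Sketch of recovering base weights by subtracting a LoRA delta: given the merged
# checkpoint W' = W + (alpha/r) * B @ A and the adapter factors A, B, compute
# W = W' - (alpha/r) * B @ A. File names, key naming, and the scaling convention
# below are illustrative assumptions.
from safetensors.torch import load_file, save_file

merged = load_file("finetuned_merged.safetensors")   # hypothetical merged download
adapter = load_file("lora_adapter.safetensors")      # hypothetical trained adapter
alpha, rank = 16.0, 8                                # assumed LoRA hyperparameters

recovered = {}
for name, w_merged in merged.items():
    a_key = name.replace(".weight", ".lora_A.weight")
    b_key = name.replace(".weight", ".lora_B.weight")
    if a_key in adapter and b_key in adapter:
        delta = (alpha / rank) * (adapter[b_key] @ adapter[a_key])
        recovered[name] = (w_merged.float() - delta.float()).to(w_merged.dtype)
    else:
        recovered[name] = w_merged                   # layers the adapter never touched

save_file(recovered, "recovered_base.safetensors")
```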
December 29 WeDLM-8B Tencent (China) 8B (2 variants: Base, Instruct; dense) Apache 2.0 First production diffusion language model with standard causal attention; initialized from Qwen3-8B; introduces Topological Reordering for parallel mask recovery under causal attention + Streaming Parallel Decoding for continuous prefix commitment; 3-6× faster than vLLM-optimized Qwen3-8B on math reasoning (GSM8K), 2-3× faster on code generation, 1.5-2× faster on open-ended QA; outperforms … Link
December 25 MiniMax M2.1 MiniMax (China) 230B / 10B (MoE with 23:1 sparsity ratio) Apache 2.0 Enhanced successor to M2 focused on multi-language programming and real-world complex tasks; 74% SWE-bench Verified, 72.5% SWE-bench Multilingual, 88.6% VIBE benchmark (Visual & Interactive Benchmark for Execution); outperforms Claude Sonnet 4.5 and approaches Claude Opus 4.5 on multilingual coding; exceptional multi-language capabilities across Rust, Java, Go, C++, Kotlin, Objective-C, TypeScript, … Link
December 25 LFM2-2.6B-Exp Liquid AI (USA) 2.6B (dense) Apache 2.0 Experimental checkpoint built on LFM2-2.6B using pure reinforcement learning; hybrid architecture with 10 double-gated short-range convolution blocks + 6 Grouped Query Attention (GQA) blocks; specifically trained on instruction following, knowledge, and math; achieves 82.41% GSM8K, 79.56% IFEval, 42% GPQA; IFBench score surpasses DeepSeek R1-0528 (a model 263× larger); 3× faster … Link
December 22 GLM-4.7 Zhipu AI (China) 358B / ~32B (MoE) MIT Latest flagship model with major improvements in coding and creative writing; Core Coding: 73.8% SWE-bench (+5.8%), 66.7% SWE-bench Multilingual (+12.9%), 41% Terminal Bench 2.0 (+10%), 84.9% LiveCodeBench v6; Vibe Coding: cleaner/more modern webpages and better-looking slides with accurate layout/sizing; Complex Reasoning: 42.8% HLE with tools (+12.4%), 95.7% AIME 2025, 97.1% … Link
December 18 Hearthfire-24B LatitudeGames (USA) 24B (dense) Apache 2.0 Narrative longform writing model designed to embrace quiet moments and atmosphere; based on Mistral Small 3.2 Instruct; philosophy of 'vibes over velocity' prioritizing introspection and slow burn over constant action; deliberately slower-paced with cooperative and atmospheric tone (vs Wayfarer's grit and consequence); trained with SFT on single dataset of thousands … Link
December 18 FunctionGemma Google (USA) 270M (dense) Gemma License Specialized Gemma 3 270M fine-tuned for unified chat and function calling; translates natural language into structured API calls while maintaining conversational ability; achieves 85% accuracy on Mobile Actions benchmark after fine-tuning (vs 58% baseline); designed for edge deployment on mobile phones and devices like NVIDIA Jetson Nano; runs fully offline … Link
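For readers unfamiliar with the unified chat-plus-function-calling pattern this entry describes, the sketch below shows its generic shape: expose a tool schema, let the model either answer in prose or emit a structured call, and route on the output. The schema, prompt wiring, and `route` helper are illustrative assumptions, not FunctionGemma's actual template:

```python
import json

# One hypothetical tool schema; the real FunctionGemma schema format may differ.
TOOLS = [{
    "name": "set_alarm",
    "description": "Set an alarm on the device",
    "parameters": {"time": "HH:MM 24-hour string", "label": "short description"},
}]

def build_prompt(user_msg: str) -> str:
    # Present the tools and the user turn to the model as plain text.
    return (
        "You may answer directly, or call a tool by replying with JSON "
        '{"name": ..., "arguments": ...}.\n'
        f"Tools: {json.dumps(TOOLS)}\nUser: {user_msg}"
    )

def route(model_output: str):
    # Structured API call if the reply parses as JSON, otherwise a chat reply.
    try:
        call = json.loads(model_output)
        return ("call", call["name"], call["arguments"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return ("chat", model_output)
```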
December 16 T5Gemma 2 Google (USA) 270M-270M, 1B-1B, 4B-4B (3 encoder-decoder sizes) Apache 2.0 Next generation of T5Gemma with multimodal and long-context capabilities; extends T5Gemma's adaptation recipe (UL2) from text-only to multimodal based on Gemma 3; processes text and vision inputs; introduces tied word embeddings (shares all embeddings across encoder and decoder for efficiency) and merged attention (unifies decoder self-attention and cross-attention into single … Link
December 16 Nemotron-Cascade NVIDIA (USA) 8B / 14B (3 variants: 8B unified, 8B-Thinking, 14B-Thinking; dense, post-trained from Qwen3) NVIDIA Open Model License General-purpose reasoning models trained with novel Cascade RL (sequential domain-wise reinforcement learning); 14B-Thinking outperforms DeepSeek-R1-0528 (671B) on LiveCodeBench v5/v6/Pro; achieves silver-medal performance on 2025 IOI (International Olympiad in Informatics); 8B models match DeepSeek-R1-0528 on LiveCodeBench despite being 80× smaller; beats Gemini 2.5 Pro, o4-mini, Qwen3-235B on coding benchmarks; unified 8B … Link
December 16 MiMo-V2-Flash Xiaomi (China) 309B / 15B (MoE with 256 experts; 8 active) MIT Frontier-class foundation model excelling in reasoning, coding, and agentic workflows; #1 open-source on SWE-Bench Verified (73.4%) and SWE-Bench Multilingual (71.7%); 94.1% AIME 2025 (top 2 open-source); hybrid attention architecture with 5:1 SWA:GA ratio using aggressive 128-token sliding window; 6× reduction in KV-cache vs full attention; 256K context; trained on 27T … Link
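The quoted ~6× KV-cache saving follows directly from the 5:1 layer ratio and the 128-token window: only the global-attention layers cache the full context, while sliding-window layers cap out at 128 entries. A back-of-the-envelope check (the per-group layer counts are illustrative; only the ratio matters):

```python
# Rough check of the ~6x KV-cache reduction claimed above, assuming 5 of every
# 6 layers use a 128-token sliding window and 1 of 6 uses global attention.
context_len, window = 256_000, 128
full_attention_cache = 6 * context_len           # every layer caches every token
hybrid_cache = 1 * context_len + 5 * window      # one global + five windowed layers
print(f"reduction ≈ {full_attention_cache / hybrid_cache:.1f}x")  # ~6.0x at 256K
```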
December 15 QwenLong-L1.5 Alibaba (Qwen Team) 30B / 3B (MoE) Apache 2.0 Long-context reasoning model based on Qwen3-30B-A3B-Thinking with memory management for ultra-long contexts (1M-4M tokens); three core innovations: (1) Multi-hop reasoning data synthesis pipeline that moves beyond needle-in-haystack tasks to generate complex reasoning requiring globally distributed evidence, (2) Adaptive Entropy-Controlled Policy Optimization (AEPO) algorithm for stable long-context RL training with task-balanced … Link
December 15 Nemotron 3 Nano NVIDIA (USA) 31.6B / 3.6B (2 variants: Base, Instruct; hybrid Mamba-Transformer MoE) NVIDIA Open Model License Breakthrough agentic AI model with hybrid Mamba-2 + Transformer + MoE architecture (activates 6 of 128 experts per pass); 1M-token context window natively; both Base and Instruct (post-trained) variants released; 4x faster throughput than Nemotron 2 Nano; 3.3x higher throughput than Qwen3-30B-A3B and 2.2x vs GPT-OSS-20B on … Link
December 12 OLMo 3.1 Allen Institute for AI (USA) 32B Think / 32B Instruct / 7B RL-Zero variants (3 model types; dense) Apache 2.0 Extended training of OLMo 3 with additional 21 days on 224 GPUs; Think 32B outperforms Qwen3-32B on AIME 2025 and performs close to Gemma 27B; Instruct 32B is strongest fully open 32B-scale instruct model; substantial improvements: +5 points AIME, +4 points ZebraLogic, +4 points IFEval, +20 points IFBench; beats Gemma … Link
December 11 LLaDA 2.0 inclusionAI / Ant Group (China) 16B / 1.4B mini and 100B / 6.1B flash (2 variants; MoE) Apache 2.0 First diffusion language model (dLLM) scaled to 100B parameters; uses iterative refinement approach instead of autoregressive generation (starts with fully masked sequence and unmasks tokens in parallel across multiple rounds); 2.1x faster inference than comparable AR models (535 tokens/s); trained on ~20T tokens; excels at code generation, complex reasoning, and … Link
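Since most readers will only know autoregressive decoding, here is a toy version of the parallel-unmasking loop that masked-diffusion LMs such as LLaDA use: start from a fully masked block, predict every position at once, commit the most confident tokens, and repeat. The model interface, mask id, and fixed schedule are stand-ins, not the LLaDA 2.0 API:

```python
# Toy parallel-unmasking loop in the spirit of masked-diffusion decoding.
import torch

def diffusion_decode(model, prompt_ids, gen_len=64, rounds=8, mask_id=0):
    seq = torch.cat([prompt_ids, torch.full((gen_len,), mask_id, dtype=torch.long)])
    per_round = max(1, gen_len // rounds)
    for _ in range(rounds):
        masked = (seq == mask_id).nonzero().squeeze(-1)
        if masked.numel() == 0:
            break                                        # everything is revealed
        logits = model(seq.unsqueeze(0)).squeeze(0)      # [seq_len, vocab]
        conf, preds = logits.softmax(-1).max(-1)         # per-position confidence
        k = min(per_round, masked.numel())
        commit = masked[conf[masked].topk(k).indices]    # most confident masked slots
        seq[commit] = preds[commit]                      # unmask them in parallel
    return seq[len(prompt_ids):]
```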
December 9 Nomos 1 Nous Research (USA) 31B (fine-tune of Qwen3-30B-A3B-Thinking-2507; MoE) Apache 2.0 Specialized mathematical reasoning model for problem-solving and proof-writing in natural language; developed in collaboration with Hillclimb AI; scores 87/120 on Putnam 2025 (base model only achieves 24/120 - 3.6x improvement); designed to work with Nomos Reasoning Harness (open-sourced concurrently); significant advancement in domain-specific mathematical capabilities; demonstrates power of targeted fine-tuning … Link
December 9 Devstral 2 Mistral AI (France) 123B Devstral 2 and 24B Small 2 (2 variants; dense) Modified MIT (Devstral 2) / Apache 2.0 (Small 2) Next-generation agentic coding model family; 256K context; SOTA open-weight on SWE-bench Verified (72.2%, huge jump from original Devstral's 46.8%); 7x more cost-efficient than Claude Sonnet for real-world coding tasks; business context awareness similar to Le Chat's conversational memory; ships with Mistral Vibe CLI for natural language code automation and vibe … Link
December 8 GLM-4.6V Zhipu AI (Z.ai China) 106B / 12B (MoE) and 9B Flash (dense) MIT First GLM with native Function Calling integration; multimodal vision-language model based on GLM-4.5-Air; 128K context; SOTA on 42 public vision-language benchmarks; multimodal document understanding (processes up to 128K tokens of multi-document input as images); frontend replication with pixel-accurate HTML/CSS from UI screenshots; visual … Link
December 5 Rnj-1 Essential AI (USA) 8.3B (dense) Apache 2.0 First model from Essential AI (founded by Ashish Vaswani); exceptional code generation and agentic capabilities; leads 8B class on SWE-bench Verified (20.8%, beating Gemini 2.0 Flash and Qwen2.5-Coder 32B); SOTA tool use on BFCL; strong math (AIME) and STEM (GPQA); 32K context with YaRN extension; trained on 8.4T tokens using … Link
December 3 Hermes 4.3 Nous Research (USA) 36B (based on ByteDance Seed-OSS-36B-Base; dense) Apache 2.0 First production model trained entirely on Psyche distributed network; matches/exceeds Hermes 4 70B performance at half parameter cost; 512K context (extended from 128K); hybrid reasoning with <think> tags; SOTA on RefusalBench; trained twice (centralized vs distributed) with Psyche version outperforming; uses DisTrO optimizer for internet-scale distributed training secured by Solana … Link
December 2 Ministral 3 Mistral AI (France) 3B / 8B / 14B (3 sizes × 3 variants: Base, Instruct, Reasoning; dense) Apache 2.0 Multimodal edge-optimized family (text + vision); 128K-256K context; single GPU deployment; Base for foundation tasks, Instruct for chat/assistants, Reasoning for complex logic; 14B Reasoning achieves 85% on AIME 2025; can run on laptops/phones/drones; efficient token generation. Link
December 2 Mistral Large 3 Mistral AI (France) 675B / 41B (MoE) Apache 2.0 First open-weight frontier model with unified multimodal (text + image) and multilingual capabilities; granular MoE architecture; 256K context window; excels in long-document understanding, agentic workflows, coding, and multilingual processing; trained on 3000 H200 GPUs; ranked #2 in OSS non-reasoning on LMArena. Link
December 1 DeepSeek V3.2 DeepSeek AI (China) 671B / 37B (2 variants: standard, Speciale; MoE) MIT First DeepSeek to integrate thinking into tool-use; hybrid thinking/non-thinking modes; standard version reaches GPT-5 level (93.1% AIME, 92.5% HMMT); Speciale variant for extreme reasoning with gold medals in IMO/CMO/ICPC/IOI 2025 (99.2% HMMT, 35/42 IMO); combines theorem-proving from Math-V2; massive agent training (1,800+ environments). Link
December 1 Trinity Arcee AI (USA) 6B / 1B Nano (MoE) and 26B / 3B Mini (MoE) Apache 2.0 U.S.-trained MoE family with AFMoE architecture; 128K context; trained on 10T tokens; Nano (6B/1B) for chat with personality and on-device AI; Mini (26B/3B) for high-throughput reasoning, function calling, and agent workflows; strong on MMLU and BFCL V3. Link
November 27 DeepSeek-Math-V2 DeepSeek AI (China) 685B (built on DeepSeek-V3.2-Exp-Base) Apache 2.0 Self-verifying mathematical reasoning model with verifier-generator dual architecture; gold medal IMO 2025 (5/6 problems, 83.3%) and CMO 2024 (73.8%); near-perfect Putnam 2024 (118/120 points); IMO-ProofBench: 99% basic, 61.9% advanced; combines theorem-proving with self-verification; scales verification compute. Link
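The self-verification idea above is essentially a generate/grade/refine loop in which the verifier's critique is fed back to the generator, and extra attempts buy more verification compute. A minimal sketch, where `generate` and `verify` are placeholder inference calls rather than DeepSeek-Math-V2's actual interface:

```python
# Schematic generator/verifier loop: draft a proof, have the verifier score and
# critique it, and retry with the critique as feedback until it passes or the
# attempt budget runs out. All callables and the threshold are illustrative.
def prove_with_self_verification(problem, generate, verify,
                                 max_attempts=8, threshold=0.9):
    best_proof, best_score = None, -1.0
    feedback = ""
    for _ in range(max_attempts):
        proof = generate(problem, feedback)          # draft a full proof
        score, critique = verify(problem, proof)     # verifier grades + critiques
        if score > best_score:
            best_proof, best_score = proof, score
        if score >= threshold:                       # accept once verified
            break
        feedback = critique                          # feed the critique back in
    return best_proof, best_score
```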
November 26 INTELLECT-3 Prime Intellect (USA) 106B / 12B (MoE) MIT Post-trained on GLM-4.5-Air-Base using SFT and RL; trained on 512 H200 GPUs with prime-rl framework; SOTA performance for size on math (90.8% AIME 2024), code, and reasoning; fully open-sourced with complete RL stack and environments. Link
November 21 Nanbeige4-3B-Thinking-2511 Nanbeige LLM Lab / BOSS Zhipin (China) 3B (dense) Apache 2.0 Small reasoning model with exceptional performance-to-size ratio; outperforms Qwen3-32B on AIME 2024 (90.4 vs 81.4) and GPQA-Diamond (82.2 vs 68.7); trained on 23T tokens with novel Fine-Grained Warmup-Stable-Decay (FG-WSD) technique; ranks #11 on WritingBench and #15 on EQBench3; scores 60 on Arena-Hard V2; SOTA open-source under 32B parameters on multiple … Link
November 20 OLMo 3 Allen Institute for AI (USA) 7B / 32B (multiple variants: Base, Think, Instruct, RL Zero; dense) Apache 2.0 Fully open model family trained on Dolma 3 (6T tokens); 65K context; Base for foundation tasks; Think for explicit reasoning (matches Qwen 3 on MATH); Instruct for chat/tool use; RL Zero for research; competitive with Qwen 2.5/Gemma 3; complete transparency from data to deployment; first fully open 32B thinking model. Link
November 12 Baguettotron PleIAs (France) 321M (dense) Apache 2.0 Small reasoning model with ultra-deep 80-layer "baguette" architecture; trained on 200B tokens of fully synthetic SYNTH dataset; native thinking traces with stenographic notation; best-in-class for size on MMLU, GSM8K, HotPotQA; multilingual (French, German, Italian, Spanish, Polish); trained on only 16 H100s; RAG-optimized with source grounding. Link
November 6 Kimi K2 Thinking Moonshot AI (China) 1T / 32B (MoE) Modified MIT Thinking agent with step-by-step reasoning and dynamic tool use; 256K context; SOTA on HLE (44.9% w/ tools) and BrowseComp (60.2%); 200-300 sequential tool calls; native INT4 quantization for 2x speed; excels at agentic coding/workflows; tops SWE-Bench Verified (71.3%).
October 31 Kimi Linear Moonshot AI (China) 48B / 3B (MoE) MIT Hybrid linear attention architecture with Kimi Delta Attention (KDA); 3:1 KDA-to-global MLA ratio; outperforms full attention across short/long-context and RL tasks; 75% KV cache reduction; 6x faster decoding at 1M context; trained on 5.7T tokens. Link
October 27 Ming Omni Inclusion (AntLingAGI China) 103B / 9B Flash (MoE) and 19B Lite (dense) MIT Omni-modal family: Flash-preview for any-to-any (text, image gen, audio/video) with sparse MoE on Ling-Flash-2.0, high-fidelity text rendering; Lite (v1.5) lightweight full-modal for edge deployment with fast inference.
October 27 MiniMax-M2 MiniMax AI (China) 230B / 10B (MoE) Open (permissive) Compact MoE for coding/agentic workflows; multi-file edits, coding-run-fix loops, toolchains; low latency/high throughput; supports <think> format; outperforms peers on SWE-bench/Terminal-Bench.
October 21 Qwen3-VL Alibaba (Qwen Team) 2B / 32B (2 sizes; dense; Instruct only) Apache 2.0 Additional VL sizes: 2B ultra-compact for edge devices with minimal VRAM; 32B mid-large excels in long-doc/video, screenshot-to-code; same 256K→1M context and multimodal capabilities as earlier releases.
October 15 Qwen3-VL Alibaba (Qwen Team) 4B / 8B (2 sizes; dense; Instruct and Thinking variants) Apache 2.0 Vision-language family with 256K→1M context; OCR, spatial grounding (2D/3D), visual coding, GUI agents; 32-language OCR; FP8 optimized for low VRAM; Thinking variants enhance multimodal reasoning/STEM; strong in long-doc/video comprehension.
October 13 Ring-1T Inclusion (AntLingAGI China) 1T (MoE) MIT Full release of trillion-param thinking model on Ling 2.0 arch; silver-level IMO (solved Problem 3); tops AIME '25 (92.6%), CodeForces, ARC-AGI; RLVR/IcePop tuning for stable multi-step reasoning/agents.
October 9 Ling-1T Ant Group (Inclusion/AntLingAGI China) 1T (MoE) MIT Flagship trillion-parameter general-purpose model; hybrid Syntax–Function–Aesthetics reward for code gen; strong in maths/coding; base for Ling family; pretrained on massive data for broad capabilities.
October 8 Qwen3 Omni Alibaba (Qwen Team) 30B (2 variants: standard, Realtime; MoE) Apache 2.0 End-to-end omni-modal (text/image/audio/video); unified architecture with Thinker/Talker MoEs for reasoning/speech gen; 58% Big Bench Audio; 119 text langs, 19 speech in/10 out; 17 voice options; Realtime variant for low-latency speech-to-speech (0.9s first audio).
September 30 GLM-4.6 Zhipu AI (Z.ai China) 355B / 32B (MoE) MIT Flagship upgrade to GLM-4.5; 200K context; ties Sonnet 4.5 on agentic/reasoning/coding benchmarks (tops AIME '25, LiveCodeBench v6); enhanced tool-use, search workflows, writing, translation; 30%+ token efficiency gains.
September 29 Ring-1T-preview Inclusion (AntLingAGI China) 1T (MoE) MIT World's first open-source 1T-param reasoning model; pretrained on 20T tokens, tuned with RLVR/IcePop for stable multi-step thinking; tops AIME 2025 (92.6), CodeForces, ARC-AGI; solved IMO 2025 Problem 3 in one shot via AWorld agents; hybrid MoE from Ling 2.0 lineage.
September 29 DeepSeek-V3.2-Exp DeepSeek AI (China) 671B / 37B (MoE) MIT Experimental update to V3.1-Terminus; introduces DeepSeek Sparse Attention (DSA) for fine-grained sparse processing; major efficiency gains in long-context training/inference (e.g., adaptive expert routing, better memory); maintains near-identical quality to prior versions.
September 23 Qwen3-VL-235B-A22B Alibaba (Qwen Team) 235B / 22B (2 variants: Instruct, Thinking; MoE) Apache 2.0 Flagship vision-language model; Instruct variant outperforms Gemini 2.5 Pro on visual perception, GUI navigation, screenshot-to-code; Thinking variant SOTA on multimodal reasoning/STEM with deep causal analysis; 256K+ context for videos/PDFs; 32-lang OCR and 2D/3D spatial reasoning.
September 22 DeepSeek-V3.1-Terminus DeepSeek AI (China) 671B / 37B (MoE) MIT Update to V3.1; improved language consistency (fewer CN/EN mix-ups); enhanced Code/Search Agent performance; hybrid modes for reasoning (up to 64K tokens); stronger benchmarks in agentic tasks (e.g., SimpleQA: 96.8).
September 10 Qwen3-Next-80B-A3B Alibaba (Qwen Team) 80B / 3B (MoE) Apache 2.0 Next-generation sparse MoE that activates only ~3B of its 80B parameters per token; strong complex reasoning; matches or beats larger Qwen3 models at a fraction of the training and inference cost.
September 5 Kimi K2-Instruct-0905 Moonshot AI (China) 1T / 32B (MoE) Apache 2.0 Update to K2; enhanced agentic coding, front-end dev, and tool-calling; 256K context; improved integration with agents.
September 3 Nova-70B-Llama-3.3 LatitudeGames (USA) 70B (dense) Llama 3.3 License Narrative-focused 70B roleplay model trained on Llama 3.3 70B Instruct; built with same techniques as Muse-12B emphasizing relationships and character development; trained on multiple datasets combining text adventures (Wayfarer-style), long emotional narratives, detailed worldbuilding, and general roleplay; all data rewritten to eliminate common AI clichés; small single-turn instruct dataset included; … Link
September 3 Wayfarer-2-12B LatitudeGames (USA) 12B (dense) Apache 2.0 Sequel to original Wayfarer based on Mistral Nemo Base; refined formula with slower pacing and increased response length/detail; death is now possible for ALL characters (not just user); SFT training with three-ingredient recipe: Wayfarer 2 dataset, sentiment-balanced roleplay transcripts, and small instruct core to retain instructional capabilities; maintains pessimistic emotional … Link
September 1 Wayfarer-Large-70B-Llama-3.3 LatitudeGames (USA) 70B (dense) Llama 3.3 License Flagship 70B adventure roleplay model trained on Llama 3.3 70B Instruct; trained with 33/33/33 mixture of 8K text adventure data, 4K roleplay data, and SlimOrca Sonnet subset; SlimOrca instruct subset critical for emphasizing difference between instruct and fiction while amplifying Wayfarer's negative sentiment; regenerated training data from ground up to … Link
August 28 Command A Translate Cohere Labs (Canada) 111B (dense) CC-BY-NC First dedicated machine translation model from Cohere; achieves SOTA translation quality across 23 languages; introduces Deep Translation agentic workflow for iterative refinement; 16K context (8K in + 8K out); outperforms GPT-5, DeepSeek V3, DeepL Pro; enterprise-focused with private deployment options. Link
August 25 Hermes 4 Nous Research (USA) 14B / 70B / 405B (3 sizes; dense) Apache 2.0 Hybrid reasoning family (multi-step CoT + instruction-following); based on Llama 3.1; neutral alignment, uncensored; excels in math, coding, roleplay, and long-context retention; agentic function-calling; 405B flagship offers frontier-level performance with 40K+ context.
August 23 Grok-2 xAI (USA) 270B / 115B (MoE) Apache 2.0 Open-sourced weights from the 2024 model; advanced reasoning and humor-infused responses; multimodal capabilities added in updates.
August 21 Command A Reasoning Cohere Labs (Canada) 111B (dense) CC-BY-NC First Cohere reasoning model with controllable token-budget thinking; excels at complex agentic tasks, tool use, and multilingual reasoning (23 languages); 256K context; hybrid mode (reasoning on/off); outperforms DeepSeek R1 and gpt-oss on enterprise benchmarks; powers North platform. Link
August 21 DeepSeek V3.1 DeepSeek AI (China) 671B / 37B (MoE) MIT Hybrid modes (thinking/non-thinking); pricing optimizations; improved multilingual and safety performance.
August 20 Seed-OSS-36B ByteDance Seed Team (China) 36B (3 variants: Base w/ synthetic, Base w/o synthetic, Instruct; dense) Apache 2.0 Native 512K context window (4x mainstream models); "Thinking Budget" mechanism for flexible reasoning depth control (512 to 16K tokens); trained on only 12T tokens yet achieves SOTA on multiple benchmarks; SOTA open-source on AIME24 (91.7%), LiveCodeBench v6 (67.4), RULER 128K (94.6); research-friendly dual base release (with/without … Link
August 14 Gemma 3 (270M) Google DeepMind 270M (dense) Open (permissive) Ultra-compact text-only model for task-specific fine-tuning; 256K-token vocabulary; roughly 170M embedding parameters + 100M transformer parameters; extreme energy efficiency (0.75% battery per 25 conversations on Pixel 9 Pro); strong instruction-following; QAT INT4 checkpoints; designed for on-device deployment, text classification, and entity extraction; can run in browsers.
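The 170M/100M split quoted above is easy to sanity-check: almost all of the parameter budget sits in the token embeddings because of the very large vocabulary. A back-of-the-envelope calculation, assuming a ~262K vocabulary and a 640-wide embedding dimension (the width is an assumption taken from the published 270M config):

```python
# Rough check of the embedding vs. transformer parameter split for a 270M model
# with a very large vocabulary; vocab size and embedding width are assumptions.
vocab_size, hidden_dim, total_params = 262_144, 640, 270e6
embedding_params = vocab_size * hidden_dim            # ≈ 167.8M, i.e. the "170M"
transformer_params = total_params - embedding_params  # ≈ 100M left for the blocks
print(f"{embedding_params / 1e6:.1f}M embeddings, {transformer_params / 1e6:.1f}M transformer")
```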
August 6 Qwen3-8B Alibaba (Qwen Team) 8B (dense) Apache 2.0 Compact dense model in Qwen3 series; suitable for on-device inference; strong in coding and multilingual support.
August 5 GPT-OSS OpenAI (USA) ~120B / 5.1B (MoE) and ~21B / 3.6B (MoE) Apache 2.0 First OpenAI open-weight release since GPT-2 (2019); both models use MoE architecture with MXFP4 quantization; 120B for reasoning and complex tasks (fits single H100); 21B for lightweight applications and on-device deployment (runs on 16GB); 128K context; adjustable reasoning effort levels; strong on code generation and structured reasoning. Link
July 31 Command A Vision Cohere (Canada) ~111B (est.) Commercial First commercial Cohere model with vision capabilities (text + image); 128K context; enterprise-focused for document analysis, chart interpretation, OCR; supports up to 20 images per request; multilingual support (English, French, German, Italian, Portuguese, Spanish). Link
July 25 GLM-4.5 Zhipu AI (Z.ai China) 355B / 32B (MoE) and 106B / 12B Air (2 variants: standard, Air; MoE) MIT Hybrid reasoning family (thinking/non-thinking modes); standard version excels in agentic coding, tool use, and complex tasks; Air variant for efficient deployment with lower resource needs; 128K context; strong in reasoning and multilingual support.
July 23 HyperCLOVA X SEED 14B Think Naver Cloud (South Korea) 14B (dense) Apache 2.0 First open-source HyperCLOVA X reasoning model with advanced AI agent capabilities; trained at 1% cost of comparable global models (52.6× lower than Qwen2.5-14B, 91.38× lower than Qwen3-14B) through parameter pruning and knowledge distillation; multi-stage RL pipeline: SFT → RLVR (Reinforcement Learning with Verifiable Rewards) → Length Controllability → RLHF; solves … Link
July 22 Qwen3-Coder-480B-A35B-Instruct Alibaba (Qwen Team) 480B / 35B (MoE with 160 experts; 8 active) Apache 2.0 Advanced agentic coding model with 256K native context (extends to 1M); trained on 7.5T tokens (70% code); long-horizon RL with 20K parallel environments; SOTA on SWE-Bench Verified; supports 100+ languages; includes Qwen Code CLI tool. Link
July 22 Qwen3-235B-A22B-Instruct-2507 Alibaba (Qwen Team) 235B / 22B (MoE) Apache 2.0 Major update to the flagship instruct model; non-thinking mode only; 256K native context; broad gains in instruction following, knowledge, reasoning, coding, and multilingual coverage over the April release.
July 19 OpenReasoning-Nemotron NVIDIA (USA) 1.5B / 8B / 32B (3 sizes; dense) Apache 2.0 Distilled reasoning suite from DeepSeek R1-0528; SOTA in math/science/code (GPQA, MMLU-PRO, AIME 2025); tops LiveCodeBench/SciCode; supports TensorRT-LLM/NeMo integration; optimized for Hugging Face Transformers and ONNX deployment; commercially permissive.
July 16 Voxtral Mistral AI (France) 24B Small and 3B Mini (2 sizes; dense) Apache 2.0 Audio LLM family; Small (24B) transcribes 30-min audio, understands 40-min with Q&A/summarization; Mini (3B) lightweight for edge ASR tasks with automatic lang detection; multilingual ASR + LLM backbone based on Small 3.1; optimized for European languages.
July 16 Kimi K2 Moonshot AI (China) 1T / 32B (MoE) Apache 2.0 Agentic intelligence focus; state-of-the-art in creative writing and long-context tasks; open-source for experimentation.
July 10 LFM2 Liquid AI (USA) 350M, 700M, 1.2B, 2.6B (4 dense) + 8B-A1B MoE (8.3B total / 1.5B active) Apache 2.0-based (free for <$10M revenue) Hybrid architecture (10 double-gated short-range convolution blocks + 6 GQA blocks); 3× faster training than previous LFM generation; 2× faster decode/prefill on CPU vs Qwen3; edge/on-device deployment focus (smartphones, laptops, vehicles); outperforms Qwen3, Gemma 3, Phi-4-Mini in size classes; pre-trained on 10-12T tokens with 32K-context mid-training; supports creative writing, agentic … Link
July 8 T5Gemma Google (USA) 2B-2B, 9B-2B, 9B-9B (Gemma 2 Series) and Small/Base/Large/XL/ML (T5-compatible Series; encoder-decoder) Apache 2.0 First encoder-decoder models adapted from Gemma 2 via novel adaptation technique; converts pretrained decoder-only models into encoder-decoder architecture using UL2 or PrefixLM training; achieves comparable/better performance than Gemma 2 counterparts while dominating quality-efficiency frontier; T5Gemma 2B-2B IT gains +12 points MMLU and +12.7% GSM8K over Gemma 2 2B; flexible unbalanced … Link
July 8 SmolLM3-3B Hugging Face (USA) 3B (dense) Apache 2.0 Compact multilingual reasoning model with dual-mode (think/no_think); trained on 11.2T tokens; supports 128K context and 6 languages; outperforms Llama 3.2 3B and Qwen2.5 3B; competitive with 4B models; GQA and NoPE architecture. Link
June 20 Mistral Small 3.2 Mistral AI (France) 24B (dense) Apache 2.0 Maintenance release focused on targeted refinements; enhanced instruction-following (84.78% accuracy vs 82.75% in v3.1); reduced infinite/repetitive generations by ~50% (1.29% vs 2.11%); improved function calling template for robust tool-use scenarios; major gains on Wildbench v2 (65.33% vs 55.6%) and Arena Hard v2 (43.1% vs 19.56%); enhanced STEM performance (HumanEval Pass@5: … Link
June 16 MiniMax-M1 MiniMax AI (China) 456B / 45.9B (MoE) Open (permissive) Hybrid-attention reasoning model; Lightning attention for efficient scaling; 1M token context; RL with CISPO; outperforms DeepSeek-R1 on SWE-bench/GPQA; function calling/agentic tools.
June 10 Magistral Small Mistral AI (France) 24B (dense) Apache 2.0 Mistral's first reasoning model, with transparent step-by-step chain-of-thought; multilingual reasoning across expert domains; outperforms comparable non-reasoning LLMs in accuracy.
May 28 DeepSeek R1-0528 DeepSeek AI (China) 671B / 37B (MoE) MIT Update to R1; reduced hallucinations; improved math/code benchmarks; enhanced frontend integration and agentic capabilities.
May 21 Falcon Arabic TII (UAE) 7B (dense) Apache 2.0 (TII Falcon License) First Arabic-focused Falcon model; trained on native Modern Standard Arabic and regional dialects; best-performing Arabic model in its class; matches performance of 70B models; built on Falcon 3-7B. Link
May 21 Falcon-H1 TII (UAE) 500M-34B (6 sizes: 500M, 1.5B, 1.5B-deep, 3B, 7B, 34B) Apache 2.0 (TII Falcon License) Hybrid Transformer-Mamba architecture; 256K context; multilingual (100+ languages); outperforms Llama and Qwen in its class; optimized for edge deployment; available as NVIDIA NIM microservice. Link
May 21 Devstral Mistral AI (France) ~30B (dense, est.) Apache 2.0 Agentic coding model for software engineering; tops SWE-Bench Verified (46.8%); handles multi-file repos and complex workflows; collab with All Hands AI.
May 16 Harbinger-24B LatitudeGames (USA) 24B (dense) Apache 2.0 Premium adventure roleplay model for immersive stories with real consequences; trained on Mistral Small 3.1 Instruct using two-stage approach (SFT on multi-turn Wayfarer-style text adventures and general roleplay + DPO for narrative coherence); applies same DPO techniques as Muse to reduce clichés and repetitive patterns; focuses on enhancing instruction following, … Link
May 13 Muse-12B LatitudeGames (USA) 12B (dense) Apache 2.0 Narrative-focused roleplay model emphasizing polish and coherence; uses DPO (Direct Preference Optimization) to reduce AI clichés and repetitive patterns; designed for immersive storytelling with refined outputs; less punishing than Wayfarer line while maintaining narrative quality; free to use on AI Dungeon; trained to produce more sophisticated and varied narrative responses. Link
May 13 Wayfarer-12B LatitudeGames (USA) 12B (dense) Apache 2.0 Adventure role-play model trained for challenging and dangerous text-based experiences; counters positivity bias in modern AI by embracing conflict, failure states, and character death; trained on Mistral Nemo Base using two-stage SFT (180K instruct data + 50/50 mix of synthetic 8K context text adventures and roleplay); data generated by simulating … Link
April 28 Qwen3 Alibaba (Qwen Team) 235B / 22B and 30B / 3B (MoE), plus 0.6B-32B dense variants Apache 2.0 Flagship hybrid reasoning family (thinking/non-thinking modes); 235B MoE flagship with support for 119 languages; 30B / 3B MoE and dense variants for efficient, lower-resource deployment; strong in coding and creative tasks.
April 23 HyperCLOVA X SEED Naver Cloud (South Korea) 3B (multimodal), 1.5B (text), 0.5B (text) Apache 2.0 First open-source HyperCLOVA X models released for commercial use under Korea's sovereign AI ecosystem initiative; SEED 3B is multimodal (text+image) designed for Korean linguistic and cultural context understanding with visual data comprehension; outperforms competing models in image/video understanding within Korean contexts; trained on high-quality Korean-centric data with years of accumulated … Link
April 10 Kimi-VL Moonshot AI (China) 16B / 2.8B (MoE) MIT Efficient multimodal vision-language model; 128K context; native-resolution MoonViT encoder for ultra-high-res images; strong on long video (64.5 LongVideoBench) and document understanding (35.1 MMLongBench-Doc); excels in OCR (83.2 InfoVQA), agent tasks (OSWorld), and multi-image reasoning; competes with GPT-4o-mini and Qwen2.5-VL-7B; includes Kimi-VL-Thinking variant with long CoT for enhanced multimodal reasoning (61.7 … Link
April 8 Llama Nemotron Ultra NVIDIA (USA) 253B (dense; derived from Llama 3.1 405B) Apache 2.0 Advanced reasoning model; leads open-weight models on GPQA (76%), AIME math, and LiveCodeBench coding; optimized for inference on NVIDIA hardware.
April 5 Llama 4 Meta AI 109B / 17B Scout (16 experts, MoE) and 400B / 17B Maverick (128 experts, MoE) Llama License (open-weight) Natively multimodal (text, image, video); Scout (109B/17B) with 10M context for accessibility and real-world AI; Maverick (400B/17B) with 1M context for high-performance tasks with scalable inference; both use early fusion architecture.
March 25 DeepSeek-V3-0324 DeepSeek AI (China) 671B / 37B (MoE) MIT Update to V3 base; major boost in reasoning and front-end development; improved multilingual benchmarks over predecessor; AIME improved from 39.6% to 59.4%, LiveCodeBench from 39.2% to 49.2%; first open-weights model to lead non-reasoning models on Artificial Analysis Intelligence Index; MIT License. Link
March 18 Llama Nemotron NVIDIA (USA) 8B Nano, 49B Super, 253B Ultra (3 sizes; based on Llama 3.1/3.3) Apache 2.0 Open reasoning family built on Llama 3.1/3.3; Nano (8B) for PC/edge, Super (49B) for single-GPU deployment with best throughput, Ultra (253B) for maximum agentic accuracy; 20% improved accuracy vs base models and 5x faster inference; excels in multi-agent collaboration, workflow automation, and domain-specific fine-tuning; compute-efficient for enterprise agents. Link
March 17 Mistral Small 3.1 Mistral AI (France) 24B (2 variants: Base, Instruct; dense) Apache 2.0 Adds state-of-the-art vision understanding to Small 3; 128K context window; multimodal (text + vision); first open-source model to surpass leading proprietary models across text, vision, and multilingual capabilities in its weight class; 150 tokens/s; runs on single RTX 4090 or 32GB RAM Mac; multilingual (dozens of languages); both Base … Link
March 13 Command A Cohere Labs (Canada) 111B (dense) CC-BY-NC Enterprise-optimized model excelling at tool use, RAG, and agentic tasks; 256K context; 150% higher throughput than Command R+; competitive with GPT-4o and DeepSeek V3; requires only 2 GPUs; strong multilingual support (23 languages). Link
March 12 Gemma 3 Google DeepMind 1B / 4B / 12B / 27B (4 sizes; dense) Open (permissive) Multimodal (text + vision) family; optimized for lightweight to enterprise-grade deployment; includes ShieldGemma for content moderation; strong in safety alignments and instruction-following; excels in reasoning and long-context handling; 128K context; supports 140+ languages. Link
March 4 Aya Vision Cohere Labs (Canada) 8B and 32B (2 sizes; dense) Research License State-of-the-art multimodal research model (text + image); excels across multiple languages and modalities; outperforms leading open-weight models on language, text, and image benchmarks; supports 23 languages; introduces AyaVisionBench evaluation suite; research use only. Link
February 26 Phi-4 Mini / Multimodal Microsoft (USA) 3.8B mini and 5.6B multimodal (2 variants; dense) MIT Small model family: mini (3.8B) compact reasoning for math/coding on mobile devices with GQA; multimodal (5.6B) supports text, image, audio, and video via a Mixture of LoRAs; outperforms Gemini 2.0 Flash on audio+visual benchmarks; 200K vocab (20+ languages); tops the Hugging Face OpenASR leaderboard (6.14% WER); 128K context. Link
January 30 Mistral Small 3 Mistral AI (France) 24B (dense) Apache 2.0 Efficient base model for low-latency tasks; outperforms Llama 3.3 70B in internal evals; ideal for fine-tuning in automation/agent workflows; no RL/synthetic data used. Link
January 20 DeepSeek R1 DeepSeek AI (China) 671B / 37B (MoE) MIT Advanced reasoning with cold-start RL training; excels in math, code, and complex problem-solving; supports JSON output and function calling. Link
January 15 MiniMax 01 MiniMax AI (China) 456B / 45.9B (2 variants: Text-01, VL-01; MoE) Open (permissive) Foundational 01 series with Lightning Attention for linear complexity; 4M token context; 100% Needle-In-A-Haystack retrieval; Text-01 for language tasks, VL-01 for multimodal (text/image) with visual reasoning/OCR. Link
January 10 Sky-T1-32B-Preview UC Berkeley Sky Computing Lab (USA) 32B (dense) MIT-style (fully open) Open reasoning model trained for <$450 in 19 hours on 8 H100s; competitive with OpenAI o1-preview on MATH500 and AIME; trained using QwQ-32B-Preview synthetic data with rejection sampling and GPT-4o-mini reformatting; fully open with training code and data. Link
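The training recipe above is a classic rejection-sampling distillation loop: sample candidate traces from a teacher, keep only those whose final answer verifies, and rewrite the keepers into a clean SFT format. A schematic sketch with placeholder function names (not the actual Sky-T1 scripts):

```python
# Rejection-sampling data pipeline: sample from a teacher model, keep traces whose
# final answer checks out, then reformat the keepers into training examples.
def build_sft_dataset(problems, sample_teacher, check_answer, reformat,
                      samples_per_problem=4):
    dataset = []
    for prob in problems:
        for _ in range(samples_per_problem):
            trace = sample_teacher(prob["question"])      # e.g. a QwQ-style teacher
            if check_answer(trace, prob["answer"]):       # rejection step
                dataset.append({
                    "question": prob["question"],
                    "response": reformat(trace),          # e.g. rewrite into clean CoT
                })
                break                                     # one verified sample suffices
    return dataset
```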
January 8 Phi-4 Microsoft (USA) 14B (dense) MIT Small language model optimized for math and coding; trained on 9.8T tokens with synthetic data; outperforms Llama 3.3 70B on MATH and GPQA despite 5x fewer parameters; decoder-only transformer with 16K context. Link