Open Weight Text-to-Text Models (2025)

96 models tracked

Date Model Developer Parameters License Key Features Paper
December 29 HyperCLOVA X SEED 8B Omni Naver Cloud (South Korea) 8B (dense any-to-any model) Apache 2.0 First native omnimodal architecture from Naver; Korean-centered any-to-any model processing and generating across text, images, and audio in unified architecture; eliminates modality barriers through integrated reasoning workflows; seamlessly handles explanations, conversations, visual analysis, and voice interactions; text-based image generation and editing functions; built on deep Korean language understanding with exceptional … Link
December 29 HyperCLOVA X SEED 32B Think Naver Cloud (South Korea) 32B (dense vision-language model) Apache 2.0 Advanced vision-language reasoning model scaling beyond SEED 14B Think; unified Transformer architecture processing text tokens and visual patches in shared embedding space; multimodal capabilities across text, images, and video with 128K context window; optional thinking mode for deep controllable reasoning; knowledge cutoff May 2025; strengthens Korean-centric reasoning and agentic capabilities … Link
December 29 Llama 3.3 8B Instruct Meta (USA) 8B (dense) Llama 3.3 Community License The "lost" Llama 3.3 8B extracted from Meta's Llama API via finetuning workaround; model existed behind API since April 2025 but weights never officially released; extracted by downloading finetuned model and subtracting adapter to recover base; significant improvements over Llama 3.1 8B: 81.95% IFEval (vs 78.2%), 37.0% GPQA Diamond (vs … Link
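The adapter-subtraction workaround above comes down to simple weight arithmetic: if the provider returns merged weights W' = W + (α/r)·B·A and you also hold the trained LoRA factors, the base weights fall out by subtraction. A minimal sketch, assuming illustrative file names, key naming, and LoRA hyperparameters (not the actual extraction scripts):

```python
# Sketch of recovering base weights by subtracting a LoRA delta: given the merged
# checkpoint W' = W + (alpha/r) * B @ A and the adapter factors A, B, compute
# W = W' - (alpha/r) * B @ A. File names, key naming, and the scaling convention
# below are illustrative assumptions.
from safetensors.torch import load_file, save_file

merged = load_file("finetuned_merged.safetensors")   # hypothetical merged download
adapter = load_file("lora_adapter.safetensors")      # hypothetical trained adapter
alpha, rank = 16.0, 8                                # assumed LoRA hyperparameters

recovered = {}
for name, w_merged in merged.items():
    a_key = name.replace(".weight", ".lora_A.weight")
    b_key = name.replace(".weight", ".lora_B.weight")
    if a_key in adapter and b_key in adapter:
        delta = (alpha / rank) * (adapter[b_key] @ adapter[a_key])
        recovered[name] = (w_merged.float() - delta.float()).to(w_merged.dtype)
    else:
        recovered[name] = w_merged                   # layers the adapter never touched

save_file(recovered, "recovered_base.safetensors")
```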
December 29 WeDLM-8B Tencent (China) 8B (2 variants: Base, Instruct; dense) Apache 2.0 First production diffusion language model with standard causal attention; initialized from Qwen3-8B; introduces Topological Reordering for parallel mask recovery under causal attention + Streaming Parallel Decoding for continuous prefix commitment; 3-6× faster than vLLM-optimized Qwen3-8B on math reasoning (GSM8K), 2-3× faster on code generation, 1.5-2× faster on open-ended QA; outperforms … Link
December 25 MiniMax M2.1 MiniMax (China) 230B / 10B (MoE with 23:1 sparsity ratio) Apache 2.0 Enhanced successor to M2 focused on multi-language programming and real-world complex tasks; 74% SWE-bench Verified, 72.5% SWE-bench Multilingual, 88.6% VIBE benchmark (Visual & Interactive Benchmark for Execution); outperforms Claude Sonnet 4.5 and approaches Claude Opus 4.5 on multilingual coding; exceptional multi-language capabilities across Rust, Java, Go, C++, Kotlin, Objective-C, TypeScript, … Link
December 25 LFM2-2.6B-Exp Liquid AI (USA) 2.6B (dense) Apache 2.0 Experimental checkpoint built on LFM2-2.6B using pure reinforcement learning; hybrid architecture with 10 double-gated short-range convolution blocks + 6 Grouped Query Attention (GQA) blocks; specifically trained on instruction following, knowledge, and math; achieves 82.41% GSM8K, 79.56% IFEval, 42% GPQA; IFBench score surpasses DeepSeek R1-0528 (a model 263× larger); 3× faster … Link
December 22 GLM-4.7 Zhipu AI (China) 358B / ~32B (MoE) MIT Latest flagship model with major improvements in coding and creative writing; Core Coding: 73.8% SWE-bench (+5.8%), 66.7% SWE-bench Multilingual (+12.9%), 41% Terminal Bench 2.0 (+10%), 84.9% LiveCodeBench v6; Vibe Coding: cleaner/more modern webpages and better-looking slides with accurate layout/sizing; Complex Reasoning: 42.8% HLE with tools (+12.4%), 95.7% AIME 2025, 97.1% … Link
December 18 Hearthfire-24B LatitudeGames (USA) 24B (dense) Apache 2.0 Narrative longform writing model designed to embrace quiet moments and atmosphere; based on Mistral Small 3.2 Instruct; philosophy of 'vibes over velocity' prioritizing introspection and slow burn over constant action; deliberately slower-paced with cooperative and atmospheric tone (vs Wayfarer's grit and consequence); trained with SFT on single dataset of thousands … Link
December 18 FunctionGemma Google (USA) 270M (dense) Gemma License Specialized Gemma 3 270M fine-tuned for unified chat and function calling; translates natural language into structured API calls while maintaining conversational ability; achieves 85% accuracy on Mobile Actions benchmark after fine-tuning (vs 58% baseline); designed for edge deployment on mobile phones and devices like NVIDIA Jetson Nano; runs fully offline … Link
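For readers unfamiliar with the unified chat-plus-function-calling pattern this entry describes, the sketch below shows its generic shape: expose a tool schema, let the model either answer in prose or emit a structured call, and route on the output. The schema, prompt wiring, and `route` helper are illustrative assumptions, not FunctionGemma's actual template:

```python
import json

# One hypothetical tool schema; the real FunctionGemma schema format may differ.
TOOLS = [{
    "name": "set_alarm",
    "description": "Set an alarm on the device",
    "parameters": {"time": "HH:MM 24-hour string", "label": "short description"},
}]

def build_prompt(user_msg: str) -> str:
    # Present the tools and the user turn to the model as plain text.
    return (
        "You may answer directly, or call a tool by replying with JSON "
        '{"name": ..., "arguments": ...}.\n'
        f"Tools: {json.dumps(TOOLS)}\nUser: {user_msg}"
    )

def route(model_output: str):
    # Structured API call if the reply parses as JSON, otherwise a chat reply.
    try:
        call = json.loads(model_output)
        return ("call", call["name"], call["arguments"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return ("chat", model_output)
```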
December 16 T5Gemma 2 Google (USA) 270M-270M, 1B-1B, 4B-4B (3 encoder-decoder sizes) Apache 2.0 Next generation of T5Gemma with multimodal and long-context capabilities; extends T5Gemma's adaptation recipe (UL2) from text-only to multimodal based on Gemma 3; processes text and vision inputs; introduces tied word embeddings (shares all embeddings across encoder and decoder for efficiency) and merged attention (unifies decoder self-attention and cross-attention into single … Link
December 16 Nemotron-Cascade NVIDIA (USA) 8B / 14B (3 variants: 8B unified, 8B-Thinking, 14B-Thinking; dense, post-trained from Qwen3) NVIDIA Open Model License General-purpose reasoning models trained with novel Cascade RL (sequential domain-wise reinforcement learning); 14B-Thinking outperforms DeepSeek-R1-0528 (671B) on LiveCodeBench v5/v6/Pro; achieves silver-medal performance on 2025 IOI (International Olympiad in Informatics); 8B models match DeepSeek-R1-0528 on LiveCodeBench despite being 80× smaller; beats Gemini 2.5 Pro, o4-mini, Qwen3-235B on coding benchmarks; unified 8B … Link
December 16 MiMo-V2-Flash Xiaomi (China) 309B / 15B (MoE with 256 experts; 8 active) MIT Frontier-class foundation model excelling in reasoning, coding, and agentic workflows; #1 open-source on SWE-Bench Verified (73.4%) and SWE-Bench Multilingual (71.7%); 94.1% AIME 2025 (top 2 open-source); hybrid attention architecture with 5:1 SWA:GA ratio using aggressive 128-token sliding window; 6× reduction in KV-cache vs full attention; 256K context; trained on 27T … Link
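The quoted ~6× KV-cache saving follows directly from the 5:1 layer ratio and the 128-token window: only the global-attention layers cache the full context, while sliding-window layers cap out at 128 entries. A back-of-the-envelope check (the per-group layer counts are illustrative; only the ratio matters):

```python
# Rough check of the ~6x KV-cache reduction claimed above, assuming 5 of every
# 6 layers use a 128-token sliding window and 1 of 6 uses global attention.
context_len, window = 256_000, 128
full_attention_cache = 6 * context_len           # every layer caches every token
hybrid_cache = 1 * context_len + 5 * window      # one global + five windowed layers
print(f"reduction ≈ {full_attention_cache / hybrid_cache:.1f}x")  # ~6.0x at 256K
```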
December 15 QwenLong-L1.5 Alibaba (Qwen Team) 30B / 3B (MoE) Apache 2.0 Long-context reasoning model based on Qwen3-30B-A3B-Thinking with memory management for ultra-long contexts (1M-4M tokens); three core innovations: (1) Multi-hop reasoning data synthesis pipeline that moves beyond needle-in-haystack tasks to generate complex reasoning requiring globally distributed evidence, (2) Adaptive Entropy-Controlled Policy Optimization (AEPO) algorithm for stable long-context RL training with task-balanced … Link
December 15 Nemotron 3 Nano NVIDIA (USA) 31.6B / 3.6B (2 variants: Base, Instruct; hybrid Mamba-Transformer MoE) NVIDIA Open Model License Breakthrough agentic AI model with hybrid Mamba-2 + Transformer + MoE architecture (activates 6 of 128 experts per pass); 1M-token context window natively; both Base and Instruct (post-trained) variants released; 4x faster throughput than Nemotron 2 Nano; 3.3x higher throughput than Qwen3-30B-A3B and 2.2x vs GPT-OSS-20B on … Link
December 12 OLMo 3.1 Allen Institute for AI (USA) 32B Think / 32B Instruct / 7B RL-Zero variants (3 model types; dense) Apache 2.0 Extended training of OLMo 3 with additional 21 days on 224 GPUs; Think 32B outperforms Qwen3-32B on AIME 2025 and performs close to Gemma 27B; Instruct 32B is strongest fully open 32B-scale instruct model; substantial improvements: +5 points AIME, +4 points ZebraLogic, +4 points IFEval, +20 points IFBench; beats Gemma … Link
December 11 LLaDA 2.0 inclusionAI / Ant Group (China) 16B / 1.4B mini and 100B / 6.1B flash (2 variants; MoE) Apache 2.0 First diffusion language model (dLLM) scaled to 100B parameters; uses iterative refinement approach instead of autoregressive generation (starts with fully masked sequence and unmasks tokens in parallel across multiple rounds); 2.1x faster inference than comparable AR models (535 tokens/s); trained on ~20T tokens; excels at code generation, complex reasoning, and … Link
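Since most readers will only know autoregressive decoding, here is a toy version of the parallel-unmasking loop that masked-diffusion LMs such as LLaDA use: start from a fully masked block, predict every position at once, commit the most confident tokens, and repeat. The model interface, mask id, and fixed schedule are stand-ins, not the LLaDA 2.0 API:

```python
# Toy parallel-unmasking loop in the spirit of masked-diffusion decoding.
import torch

def diffusion_decode(model, prompt_ids, gen_len=64, rounds=8, mask_id=0):
    seq = torch.cat([prompt_ids, torch.full((gen_len,), mask_id, dtype=torch.long)])
    per_round = max(1, gen_len // rounds)
    for _ in range(rounds):
        masked = (seq == mask_id).nonzero().squeeze(-1)
        if masked.numel() == 0:
            break                                        # everything is revealed
        logits = model(seq.unsqueeze(0)).squeeze(0)      # [seq_len, vocab]
        conf, preds = logits.softmax(-1).max(-1)         # per-position confidence
        k = min(per_round, masked.numel())
        commit = masked[conf[masked].topk(k).indices]    # most confident masked slots
        seq[commit] = preds[commit]                      # unmask them in parallel
    return seq[len(prompt_ids):]
```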
December 9 Nomos 1 Nous Research (USA) 31B (fine-tune of Qwen3-30B-A3B-Thinking-2507; MoE) Apache 2.0 Specialized mathematical reasoning model for problem-solving and proof-writing in natural language; developed in collaboration with Hillclimb AI; scores 87/120 on Putnam 2025 (base model only achieves 24/120 - 3.6x improvement); designed to work with Nomos Reasoning Harness (open-sourced concurrently); significant advancement in domain-specific mathematical capabilities; demonstrates power of targeted fine-tuning … Link
December 9 Devstral 2 Mistral AI (France) 123B Devstral 2 and 24B Small 2 (2 variants; dense) Modified MIT (Devstral 2) / Apache 2.0 (Small 2) Next-generation agentic coding model family; 256K context; SOTA open-weight on SWE-bench Verified (72.2%, huge jump from original Devstral's 46.8%); 7x more cost-efficient than Claude Sonnet for real-world coding tasks; business context awareness similar to Le Chat's conversational memory; ships with Mistral Vibe CLI for natural language code automation and vibe … Link
December 8 GLM-4.6V Zhipu AI (Z.ai China) 106B / 12B (MoE) and 9B Flash (dense) MIT First GLM with native Function Calling integration; multimodal vision-language model based on GLM-4.5-Air; 128K context; SOTA on 42 public vision-language benchmarks; multimodal document understanding (processes up to 128K tokens of multi-document input as images); frontend replication with pixel-accurate HTML/CSS from UI screenshots; visual … Link
December 5 Rnj-1 Essential AI (USA) 8.3B (dense) Apache 2.0 First model from Essential AI (founded by Ashish Vaswani); exceptional code generation and agentic capabilities; leads 8B class on SWE-bench Verified (20.8%, beating Gemini 2.0 Flash and Qwen2.5-Coder 32B); SOTA tool use on BFCL; strong math (AIME) and STEM (GPQA); 32K context with YaRN extension; trained on 8.4T tokens using … Link
December 3 Hermes 4.3 Nous Research (USA) 36B (based on ByteDance Seed-OSS-36B-Base; dense) Apache 2.0 First production model trained entirely on Psyche distributed network; matches/exceeds Hermes 4 70B performance at half parameter cost; 512K context (extended from 128K); hybrid reasoning with <think> tags; SOTA on RefusalBench; trained twice (centralized vs distributed) with Psyche version outperforming; uses DisTrO optimizer for internet-scale distributed training secured by Solana … Link
December 2 Ministral 3 Mistral AI (France) 3B / 8B / 14B (3 sizes × 3 variants: Base, Instruct, Reasoning; dense) Apache 2.0 Multimodal edge-optimized family (text + vision); 128K-256K context; single GPU deployment; Base for foundation tasks, Instruct for chat/assistants, Reasoning for complex logic; 14B Reasoning achieves 85% on AIME 2025; can run on laptops/phones/drones; efficient token generation. Link
December 2 Mistral Large 3 Mistral AI (France) 675B / 41B (MoE) Apache 2.0 First open-weight frontier model with unified multimodal (text + image) and multilingual capabilities; granular MoE architecture; 256K context window; excels in long-document understanding, agentic workflows, coding, and multilingual processing; trained on 3000 H200 GPUs; ranked #2 in OSS non-reasoning on LMArena. Link
December 1 DeepSeek V3.2 DeepSeek AI (China) 671B / 37B (2 variants: standard, Speciale; MoE) MIT First DeepSeek to integrate thinking into tool-use; hybrid thinking/non-thinking modes; standard version reaches GPT-5 level (93.1% AIME, 92.5% HMMT); Speciale variant for extreme reasoning with gold medals in IMO/CMO/ICPC/IOI 2025 (99.2% HMMT, 35/42 IMO); combines theorem-proving from Math-V2; massive agent training (1,800+ environments). Link
December 1 Trinity Arcee AI (USA) 6B / 1B Nano (MoE) and 26B / 3B Mini (MoE) Apache 2.0 U.S.-trained MoE family with AFMoE architecture; 128K context; trained on 10T tokens; Nano (6B/1B) for chat with personality and on-device AI; Mini (26B/3B) for high-throughput reasoning, function calling, and agent workflows; strong on MMLU and BFCL V3. Link
November 27 DeepSeek-Math-V2 DeepSeek AI (China) 685B (built on DeepSeek-V3.2-Exp-Base) Apache 2.0 Self-verifying mathematical reasoning model with verifier-generator dual architecture; gold medal IMO 2025 (5/6 problems, 83.3%) and CMO 2024 (73.8%); near-perfect Putnam 2024 (118/120 points); IMO-ProofBench: 99% basic, 61.9% advanced; combines theorem-proving with self-verification; scales verification compute. Link
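The self-verification idea above is essentially a generate/grade/refine loop in which the verifier's critique is fed back to the generator, and extra attempts buy more verification compute. A minimal sketch, where `generate` and `verify` are placeholder inference calls rather than DeepSeek-Math-V2's actual interface:

```python
# Schematic generator/verifier loop: draft a proof, have the verifier score and
# critique it, and retry with the critique as feedback until it passes or the
# attempt budget runs out. All callables and the threshold are illustrative.
def prove_with_self_verification(problem, generate, verify,
                                 max_attempts=8, threshold=0.9):
    best_proof, best_score = None, -1.0
    feedback = ""
    for _ in range(max_attempts):
        proof = generate(problem, feedback)          # draft a full proof
        score, critique = verify(problem, proof)     # verifier grades + critiques
        if score > best_score:
            best_proof, best_score = proof, score
        if score >= threshold:                       # accept once verified
            break
        feedback = critique                          # feed the critique back in
    return best_proof, best_score
```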
November 26 INTELLECT-3 Prime Intellect (USA) 106B / 12B (MoE) MIT Post-trained on GLM-4.5-Air-Base using SFT and RL; trained on 512 H200 GPUs with prime-rl framework; SOTA performance for size on math (90.8% AIME 2024), code, and reasoning; fully open-sourced with complete RL stack and environments. Link
November 21 Nanbeige4-3B-Thinking-2511 Nanbeige LLM Lab / BOSS Zhipin (China) 3B (dense) Apache 2.0 Small reasoning model with exceptional performance-to-size ratio; outperforms Qwen3-32B on AIME 2024 (90.4 vs 81.4) and GPQA-Diamond (82.2 vs 68.7); trained on 23T tokens with novel Fine-Grained Warmup-Stable-Decay (FG-WSD) technique; ranks #11 on WritingBench and #15 on EQBench3; scores 60 on Arena-Hard V2; SOTA open-source under 32B parameters on multiple … Link
November 20 OLMo 3 Allen Institute for AI (USA) 7B / 32B (multiple variants: Base, Think, Instruct, RL Zero; dense) Apache 2.0 Fully open model family trained on Dolma 3 (6T tokens); 65K context; Base for foundation tasks; Think for explicit reasoning (matches Qwen 3 on MATH); Instruct for chat/tool use; RL Zero for research; competitive with Qwen 2.5/Gemma 3; complete transparency from data to deployment; first fully open 32B thinking model. Link
November 12 Baguettotron PleIAs (France) 321M (dense) Apache 2.0 Small reasoning model with ultra-deep 80-layer "baguette" architecture; trained on 200B tokens of fully synthetic SYNTH dataset; native thinking traces with stenographic notation; best-in-class for size on MMLU, GSM8K, HotPotQA; multilingual (French, German, Italian, Spanish, Polish); trained on only 16 H100s; RAG-optimized with source grounding. Link
November 6 Kimi K2 Thinking Moonshot AI (China) 1T / 32B (MoE) Modified MIT Thinking agent with step-by-step reasoning and dynamic tool use; 256K context; SOTA on HLE (44.9% w/ tools) and BrowseComp (60.2%); 200-300 sequential tool calls; native INT4 quantization for 2x speed; excels at agentic coding/workflows; tops SWE-Bench Verified (71.3%).
October 31 Kimi Linear Moonshot AI (China) 48B / 3B (MoE) MIT Hybrid linear attention architecture with Kimi Delta Attention (KDA); 3:1 KDA-to-global MLA ratio; outperforms full attention across short/long-context and RL tasks; 75% KV cache reduction; 6x faster decoding at 1M context; trained on 5.7T tokens. Link
October 27 Ming Omni Inclusion (AntLingAGI China) 103B / 9B Flash (MoE) and 19B Lite (dense) MIT Omni-modal family: Flash-preview for any-to-any (text, image gen, audio/video) with sparse MoE on Ling-Flash-2.0, high-fidelity text rendering; Lite (v1.5) lightweight full-modal for edge deployment with fast inference.
October 27 MiniMax-M2 MiniMax AI (China) 230B / 10B (MoE) Open (permissive) Compact MoE for coding/agentic workflows; multi-file edits, coding-run-fix loops, toolchains; low latency/high throughput; supports <think> format; outperforms peers on SWE-bench/Terminal-Bench.
October 21 Qwen3-VL Alibaba (Qwen Team) 2B / 32B (2 sizes; dense; Instruct only) Apache 2.0 Additional VL sizes: 2B ultra-compact for edge devices with minimal VRAM; 32B mid-large excels in long-doc/video, screenshot-to-code; same 256K→1M context and multimodal capabilities as earlier releases.
October 15 Qwen3-VL Alibaba (Qwen Team) 4B / 8B (2 sizes; dense; Instruct and Thinking variants) Apache 2.0 Vision-language family with 256K→1M context; OCR, spatial grounding (2D/3D), visual coding, GUI agents; 32-language OCR; FP8 optimized for low VRAM; Thinking variants enhance multimodal reasoning/STEM; strong in long-doc/video comprehension.
October 13 Ring-1T Inclusion (AntLingAGI China) 1T (MoE) MIT Full release of trillion-param thinking model on Ling 2.0 arch; silver-level IMO (solved Problem 3); tops AIME '25 (92.6%), CodeForces, ARC-AGI; RLVR/IcePop tuning for stable multi-step reasoning/agents.
October 9 Ling-1T Ant Group (Inclusion/AntLingAGI China) 1T (MoE) MIT Flagship trillion-parameter general-purpose model; hybrid Syntax–Function–Aesthetics reward for code gen; strong in maths/coding; base for Ling family; pretrained on massive data for broad capabilities.
October 8 Qwen3 Omni Alibaba (Qwen Team) 30B (2 variants: standard, Realtime; MoE) Apache 2.0 End-to-end omni-modal (text/image/audio/video); unified architecture with Thinker/Talker MoEs for reasoning/speech gen; 58% Big Bench Audio; 119 text langs, 19 speech in/10 out; 17 voice options; Realtime variant for low-latency speech-to-speech (0.9s first audio).
September 30 GLM-4.6 Zhipu AI (Z.ai China) 355B / 32B (MoE) MIT Flagship upgrade to GLM-4.5; 200K context; ties Sonnet 4.5 on agentic/reasoning/coding benchmarks (tops AIME '25, LiveCodeBench v6); enhanced tool-use, search workflows, writing, translation; 30%+ token efficiency gains.
September 29 Ring-1T-preview Inclusion (AntLingAGI China) 1T (MoE) MIT World's first open-source 1T-param reasoning model; pretrained on 20T tokens, tuned with RLVR/IcePop for stable multi-step thinking; tops AIME 2025 (92.6), CodeForces, ARC-AGI; solved IMO 2025 Problem 3 in one shot via AWorld agents; hybrid MoE from Ling 2.0 lineage.
September 29 DeepSeek-V3.2-Exp DeepSeek AI (China) 671B / 37B (MoE) MIT Experimental update to V3.1-Terminus; introduces DeepSeek Sparse Attention (DSA) for fine-grained sparse processing; major efficiency gains in long-context training/inference (e.g., adaptive expert routing, better memory); maintains near-identical quality to prior versions.
September 23 Qwen3-VL-235B-A22B Alibaba (Qwen Team) 235B / 22B (2 variants: Instruct, Thinking; MoE) Apache 2.0 Flagship vision-language model; Instruct variant outperforms Gemini 2.5 Pro on visual perception, GUI navigation, screenshot-to-code; Thinking variant SOTA on multimodal reasoning/STEM with deep causal analysis; 256K+ context for videos/PDFs; 32-lang OCR and 2D/3D spatial reasoning.
September 22 DeepSeek-V3.1-Terminus DeepSeek AI (China) 671B / 37B (MoE) MIT Update to V3.1; improved language consistency (fewer CN/EN mix-ups); enhanced Code/Search Agent performance; hybrid modes for reasoning (up to 64K tokens); stronger benchmarks in agentic tasks (e.g., SimpleQA: 96.8).
September 10 Qwen3-Next-80B-A3B Alibaba (Qwen Team) 80B / 3B (MoE) Apache 2.0 Next-generation sparse MoE that activates only ~3B of its 80B parameters per token; strong complex reasoning; matches or beats larger Qwen3 models at a fraction of the training and inference cost.
September 5 Kimi K2-Instruct-0905 Moonshot AI (China) 1T / 32B (MoE) Apache 2.0 Update to K2; enhanced agentic coding, front-end dev, and tool-calling; 256K context; improved integration with agents.
September 3 Nova-70B-Llama-3.3 LatitudeGames (USA) 70B (dense) Llama 3.3 License Narrative-focused 70B roleplay model trained on Llama 3.3 70B Instruct; built with same techniques as Muse-12B emphasizing relationships and character development; trained on multiple datasets combining text adventures (Wayfarer-style), long emotional narratives, detailed worldbuilding, and general roleplay; all data rewritten to eliminate common AI clichés; small single-turn instruct dataset included; … Link
September 3 Wayfarer-2-12B LatitudeGames (USA) 12B (dense) Apache 2.0 Sequel to original Wayfarer based on Mistral Nemo Base; refined formula with slower pacing and increased response length/detail; death is now possible for ALL characters (not just user); SFT training with three-ingredient recipe: Wayfarer 2 dataset, sentiment-balanced roleplay transcripts, and small instruct core to retain instructional capabilities; maintains pessimistic emotional … Link
September 1 Wayfarer-Large-70B-Llama-3.3 LatitudeGames (USA) 70B (dense) Llama 3.3 License Flagship 70B adventure roleplay model trained on Llama 3.3 70B Instruct; trained with 33/33/33 mixture of 8K text adventure data, 4K roleplay data, and SlimOrca Sonnet subset; SlimOrca instruct subset critical for emphasizing difference between instruct and fiction while amplifying Wayfarer's negative sentiment; regenerated training data from ground up to … Link
August 28 Command A Translate Cohere Labs (Canada) 111B (dense) CC-BY-NC First dedicated machine translation model from Cohere; achieves SOTA translation quality across 23 languages; introduces Deep Translation agentic workflow for iterative refinement; 16K context (8K in + 8K out); outperforms GPT-5, DeepSeek V3, DeepL Pro; enterprise-focused with private deployment options. Link
August 25 Hermes 4 Nous Research (USA) 14B / 70B / 405B (3 sizes; dense) Apache 2.0 Hybrid reasoning family (multi-step CoT + instruction-following); based on Llama 3.1; neutral alignment, uncensored; excels in math, coding, roleplay, and long-context retention; agentic function-calling; 405B flagship offers frontier-level performance with 40K+ context.
August 23 Grok-2 xAI (USA) 270B / 115B (MoE) Apache 2.0 Open-sourced weights from the 2024 model; advanced reasoning and humor-infused responses; multimodal capabilities added in updates.
August 21 Command A Reasoning Cohere Labs (Canada) 111B (dense) CC-BY-NC First Cohere reasoning model with controllable token-budget thinking; excels at complex agentic tasks, tool use, and multilingual reasoning (23 languages); 256K context; hybrid mode (reasoning on/off); outperforms DeepSeek R1 and gpt-oss on enterprise benchmarks; powers North platform. Link
August 21 DeepSeek V3.1 DeepSeek AI (China) 671B / 37B (MoE) MIT Hybrid modes (thinking/non-thinking); pricing optimizations; improved multilingual and safety performance.
August 20 Seed-OSS-36B ByteDance Seed Team (China) 36B (3 variants: Base w/ synthetic, Base w/o synthetic, Instruct; dense) Apache 2.0 Native 512K context window (4x mainstream models); "Thinking Budget" mechanism for flexible reasoning depth control (512 to 16K tokens); trained on only 12T tokens yet achieves SOTA on multiple benchmarks; SOTA open-source on AIME24 (91.7%), LiveCodeBench v6 (67.4), RULER 128K (94.6); research-friendly dual base release (with/without … Link
August 14 Gemma 3 (270M) Google DeepMind 270M (dense) Open (permissive) Ultra-compact text-only model for task-specific fine-tuning; 256K-token vocabulary; roughly 170M embedding parameters + 100M transformer parameters; extreme energy efficiency (0.75% battery per 25 conversations on Pixel 9 Pro); strong instruction-following; QAT INT4 checkpoints; designed for on-device deployment, text classification, and entity extraction; can run in browsers.
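The 170M/100M split quoted above is easy to sanity-check: almost all of the parameter budget sits in the token embeddings because of the very large vocabulary. A back-of-the-envelope calculation, assuming a ~262K vocabulary and a 640-wide embedding dimension (the width is an assumption taken from the published 270M config):

```python
# Rough check of the embedding vs. transformer parameter split for a 270M model
# with a very large vocabulary; vocab size and embedding width are assumptions.
vocab_size, hidden_dim, total_params = 262_144, 640, 270e6
embedding_params = vocab_size * hidden_dim            # ≈ 167.8M, i.e. the "170M"
transformer_params = total_params - embedding_params  # ≈ 100M left for the blocks
print(f"{embedding_params / 1e6:.1f}M embeddings, {transformer_params / 1e6:.1f}M transformer")
```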
August 6 Qwen3-8B Alibaba (Qwen Team) 8B (dense) Apache 2.0 Compact dense model in Qwen3 series; suitable for on-device inference; strong in coding and multilingual support.
August 5 GPT-OSS OpenAI (USA) ~120B / 5.1B (MoE) and ~21B / 3.6B (MoE) Apache 2.0 First OpenAI open-weight release since GPT-2 (2019); both models use MoE architecture with MXFP4 quantization; 120B for reasoning and complex tasks (fits single H100); 21B for lightweight applications and on-device deployment (runs on 16GB); 128K context; adjustable reasoning effort levels; strong on code generation and structured reasoning. Link
July 31 Command A Vision Cohere (Canada) ~111B (est.) Commercial First commercial Cohere model with vision capabilities (text + image); 128K context; enterprise-focused for document analysis, chart interpretation, OCR; supports up to 20 images per request; multilingual support (English, French, German, Italian, Portuguese, Spanish). Link
July 25 GLM-4.5 Zhipu AI (Z.ai China) 355B / 32B (MoE) and 106B / 12B Air (2 variants: standard, Air; MoE) MIT Hybrid reasoning family (thinking/non-thinking modes); standard version excels in agentic coding, tool use, and complex tasks; Air variant for efficient deployment with lower resource needs; 128K context; strong in reasoning and multilingual support.
July 23 HyperCLOVA X SEED 14B Think Naver Cloud (South Korea) 14B (dense) Apache 2.0 First open-source HyperCLOVA X reasoning model with advanced AI agent capabilities; trained at 1% cost of comparable global models (52.6× lower than Qwen2.5-14B, 91.38× lower than Qwen3-14B) through parameter pruning and knowledge distillation; multi-stage RL pipeline: SFT → RLVR (Reinforcement Learning with Verifiable Rewards) → Length Controllability → RLHF; solves … Link
July 22 Qwen3-Coder-480B-A35B-Instruct Alibaba (Qwen Team) 480B / 35B (MoE with 160 experts; 8 active) Apache 2.0 Advanced agentic coding model with 256K native context (extends to 1M); trained on 7.5T tokens (70% code); long-horizon RL with 20K parallel environments; SOTA on SWE-Bench Verified; supports 100+ languages; includes Qwen Code CLI tool. Link
July 22 Qwen3-235B-A22B-Instruct-2507 Alibaba (Qwen Team) 235B / 22B (MoE) Apache 2.0 Major update to the flagship instruct model; non-thinking mode only; 256K native context; broad gains in instruction following, knowledge, reasoning, coding, and multilingual coverage over the April release.
July 19 OpenReasoning-Nemotron NVIDIA (USA) 1.5B / 8B / 32B (3 sizes; dense) Apache 2.0 Distilled reasoning suite from DeepSeek R1-0528; SOTA in math/science/code (GPQA, MMLU-PRO, AIME 2025); tops LiveCodeBench/SciCode; supports TensorRT-LLM/NeMo integration; optimized for Hugging Face Transformers and ONNX deployment; commercially permissive.
July 16 Voxtral Mistral AI (France) 24B Small and 3B Mini (2 sizes; dense) Apache 2.0 Audio LLM family; Small (24B) transcribes 30-min audio, understands 40-min with Q&A/summarization; Mini (3B) lightweight for edge ASR tasks with automatic lang detection; multilingual ASR + LLM backbone based on Small 3.1; optimized for European languages.
July 16 Kimi K2 Moonshot AI (China) 1T / 32B (MoE) Apache 2.0 Agentic intelligence focus; state-of-the-art in creative writing and long-context tasks; open-source for experimentation.
July 10 LFM2 Liquid AI (USA) 350M, 700M, 1.2B, 2.6B (4 dense) + 8B-A1B MoE (8.3B total / 1.5B active) Apache 2.0-based (free for <$10M revenue) Hybrid architecture (10 double-gated short-range convolution blocks + 6 GQA blocks); 3× faster training than previous LFM generation; 2× faster decode/prefill on CPU vs Qwen3; edge/on-device deployment focus (smartphones, laptops, vehicles); outperforms Qwen3, Gemma 3, Phi-4-Mini in size classes; pre-trained on 10-12T tokens with 32K-context mid-training; supports creative writing, agentic … Link
July 8 T5Gemma Google (USA) 2B-2B, 9B-2B, 9B-9B (Gemma 2 Series) and Small/Base/Large/XL/ML (T5-compatible Series; encoder-decoder) Apache 2.0 First encoder-decoder models adapted from Gemma 2 via novel adaptation technique; converts pretrained decoder-only models into encoder-decoder architecture using UL2 or PrefixLM training; achieves comparable/better performance than Gemma 2 counterparts while dominating quality-efficiency frontier; T5Gemma 2B-2B IT gains +12 points MMLU and +12.7% GSM8K over Gemma 2 2B; flexible unbalanced … Link
July 8 SmolLM3-3B Hugging Face (USA) 3B (dense) Apache 2.0 Compact multilingual reasoning model with dual-mode (think/no_think); trained on 11.2T tokens; supports 128K context and 6 languages; outperforms Llama 3.2 3B and Qwen2.5 3B; competitive with 4B models; GQA and NoPE architecture. Link
June 20 Mistral Small 3.2 Mistral AI (France) 24B (dense) Apache 2.0 Maintenance release focused on targeted refinements; enhanced instruction-following (84.78% accuracy vs 82.75% in v3.1); reduced infinite/repetitive generations by ~50% (1.29% vs 2.11%); improved function calling template for robust tool-use scenarios; major gains on Wildbench v2 (65.33% vs 55.6%) and Arena Hard v2 (43.1% vs 19.56%); enhanced STEM performance (HumanEval Pass@5: … Link
June 16 MiniMax-M1 MiniMax AI (China) 456B / 45.9B (MoE) Open (permissive) Hybrid-attention reasoning model; Lightning attention for efficient scaling; 1M token context; RL with CISPO; outperforms DeepSeek-R1 on SWE-bench/GPQA; function calling/agentic tools.
June 10 Magistral Small Mistral AI (France) 24B (dense) Apache 2.0 Mistral's first reasoning model, with transparent step-by-step chain-of-thought; multilingual reasoning across expert domains; outperforms comparable non-reasoning LLMs in accuracy.
May 28 DeepSeek R1-0528 DeepSeek AI (China) 671B / 37B (MoE) MIT Update to R1; reduced hallucinations; improved math/code benchmarks; enhanced frontend integration and agentic capabilities.
May 21 Falcon Arabic TII (UAE) 7B (dense) Apache 2.0 (TII Falcon License) First Arabic-focused Falcon model; trained on native Modern Standard Arabic and regional dialects; best-performing Arabic model in its class; matches performance of 70B models; built on Falcon 3-7B. Link
May 21 Falcon-H1 TII (UAE) 500M-34B (6 sizes: 500M, 1.5B, 1.5B-deep, 3B, 7B, 34B) Apache 2.0 (TII Falcon License) Hybrid Transformer-Mamba architecture; 256K context; multilingual (100+ languages); outperforms Llama and Qwen in its class; optimized for edge deployment; available as NVIDIA NIM microservice. Link
May 21 Devstral Mistral AI (France) ~30B (dense, est.) Apache 2.0 Agentic coding model for software engineering; tops SWE-Bench Verified (46.8%); handles multi-file repos and complex workflows; collab with All Hands AI.
May 16 Harbinger-24B LatitudeGames (USA) 24B (dense) Apache 2.0 Premium adventure roleplay model for immersive stories with real consequences; trained on Mistral Small 3.1 Instruct using two-stage approach (SFT on multi-turn Wayfarer-style text adventures and general roleplay + DPO for narrative coherence); applies same DPO techniques as Muse to reduce clichés and repetitive patterns; focuses on enhancing instruction following, … Link
May 13 Muse-12B LatitudeGames (USA) 12B (dense) Apache 2.0 Narrative-focused roleplay model emphasizing polish and coherence; uses DPO (Direct Preference Optimization) to reduce AI clichés and repetitive patterns; designed for immersive storytelling with refined outputs; less punishing than Wayfarer line while maintaining narrative quality; free to use on AI Dungeon; trained to produce more sophisticated and varied narrative responses. Link
May 13 Wayfarer-12B LatitudeGames (USA) 12B (dense) Apache 2.0 Adventure role-play model trained for challenging and dangerous text-based experiences; counters positivity bias in modern AI by embracing conflict, failure states, and character death; trained on Mistral Nemo Base using two-stage SFT (180K instruct data + 50/50 mix of synthetic 8K context text adventures and roleplay); data generated by simulating … Link
April 28 Qwen3 Alibaba (Qwen Team) 235B / 22B and 30B / 3B (MoE), plus 0.6B-32B dense variants Apache 2.0 Flagship hybrid reasoning family (thinking/non-thinking modes); 235B MoE flagship with support for 119 languages; 30B / 3B MoE and dense variants for efficient, lower-resource deployment; strong in coding and creative tasks.
April 23 HyperCLOVA X SEED Naver Cloud (South Korea) 3B (multimodal), 1.5B (text), 0.5B (text) Apache 2.0 First open-source HyperCLOVA X models released for commercial use under Korea's sovereign AI ecosystem initiative; SEED 3B is multimodal (text+image) designed for Korean linguistic and cultural context understanding with visual data comprehension; outperforms competing models in image/video understanding within Korean contexts; trained on high-quality Korean-centric data with years of accumulated … Link
April 10 Kimi-VL Moonshot AI (China) 16B / 2.8B (MoE) MIT Efficient multimodal vision-language model; 128K context; native-resolution MoonViT encoder for ultra-high-res images; strong on long video (64.5 LongVideoBench) and document understanding (35.1 MMLongBench-Doc); excels in OCR (83.2 InfoVQA), agent tasks (OSWorld), and multi-image reasoning; competes with GPT-4o-mini and Qwen2.5-VL-7B; includes Kimi-VL-Thinking variant with long CoT for enhanced multimodal reasoning (61.7 … Link
April 8 Llama Nemotron Ultra NVIDIA (USA) 253B (dense; derived from Llama 3.1 405B) Apache 2.0 Advanced reasoning model; leads open-weight models on GPQA (76%), AIME math, and LiveCodeBench coding; optimized for inference on NVIDIA hardware.
April 5 Llama 4 Meta AI 109B / 17B Scout (16 experts, MoE) and 400B / 17B Maverick (128 experts, MoE) Llama License (open-weight) Natively multimodal (text, image, video); Scout (109B/17B) with 10M context for accessibility and real-world AI; Maverick (400B/17B) with 1M context for high-performance tasks with scalable inference; both use early fusion architecture.
March 25 DeepSeek-V3-0324 DeepSeek AI (China) 671B / 37B (MoE) MIT Update to V3 base; major boost in reasoning and front-end development; improved multilingual benchmarks over predecessor; AIME improved from 39.6% to 59.4%, LiveCodeBench from 39.2% to 49.2%; first open-weights model to lead non-reasoning models on Artificial Analysis Intelligence Index; MIT License. Link
March 18 Llama Nemotron NVIDIA (USA) 8B Nano, 49B Super, 253B Ultra (3 sizes; based on Llama 3.1/3.3) Apache 2.0 Open reasoning family built on Llama 3.1/3.3; Nano (8B) for PC/edge, Super (49B) for single-GPU deployment with best throughput, Ultra (253B) for maximum agentic accuracy; 20% improved accuracy vs base models and 5x faster inference; excels in multi-agent collaboration, workflow automation, and domain-specific fine-tuning; compute-efficient for enterprise agents. Link
March 17 Mistral Small 3.1 Mistral AI (France) 24B (2 variants: Base, Instruct; dense) Apache 2.0 Adds state-of-the-art vision understanding to Small 3; 128K context window; multimodal (text + vision); first open-source model to surpass leading proprietary models across text, vision, and multilingual capabilities in its weight class; 150 tokens/s; runs on single RTX 4090 or 32GB RAM Mac; multilingual (dozens of languages); both Base … Link
March 13 Command A Cohere Labs (Canada) 111B (dense) CC-BY-NC Enterprise-optimized model excelling at tool use, RAG, and agentic tasks; 256K context; 150% higher throughput than Command R+; competitive with GPT-4o and DeepSeek V3; requires only 2 GPUs; strong multilingual support (23 languages). Link
March 12 Gemma 3 Google DeepMind 1B / 4B / 12B / 27B (4 sizes; dense) Open (permissive) Multimodal (text + vision) family; optimized for lightweight to enterprise-grade deployment; includes ShieldGemma for content moderation; strong in safety alignments and instruction-following; excels in reasoning and long-context handling; 128K context; supports 140+ languages. Link
March 4 Aya Vision Cohere Labs (Canada) 8B and 32B (2 sizes; dense) Research License State-of-the-art multimodal research model (text + image); excels across multiple languages and modalities; outperforms leading open-weight models on language, text, and image benchmarks; supports 23 languages; introduces AyaVisionBench evaluation suite; research use only. Link
February 26 Phi-4 Mini / Multimodal Microsoft (USA) 3.8B mini and 5.6B multimodal (2 variants; dense) MIT Small model family: mini (3.8B) compact reasoning for math/coding on mobile devices with GQA; multimodal (5.6B) supports text, image, audio, and video via a Mixture of LoRAs; outperforms Gemini 2.0 Flash on audio+visual benchmarks; 200K vocab (20+ languages); tops the Hugging Face OpenASR leaderboard (6.14% WER); 128K context. Link
January 30 Mistral Small 3 Mistral AI (France) 24B (dense) Apache 2.0 Efficient base model for low-latency tasks; outperforms Llama 3.3 70B in internal evals; ideal for fine-tuning in automation/agent workflows; no RL/synthetic data used. Link
January 20 DeepSeek R1 DeepSeek AI (China) 671B / 37B (MoE) MIT Advanced reasoning with cold-start RL training; excels in math, code, and complex problem-solving; supports JSON output and function calling. Link
January 15 MiniMax 01 MiniMax AI (China) 456B / 45.9B (2 variants: Text-01, VL-01; MoE) Open (permissive) Foundational 01 series with Lightning Attention for linear complexity; 4M token context; 100% Needle-In-A-Haystack retrieval; Text-01 for language tasks, VL-01 for multimodal (text/image) with visual reasoning/OCR. Link
January 10 Sky-T1-32B-Preview UC Berkeley Sky Computing Lab (USA) 32B (dense) MIT-style (fully open) Open reasoning model trained for <$450 in 19 hours on 8 H100s; competitive with OpenAI o1-preview on MATH500 and AIME; trained using QwQ-32B-Preview synthetic data with rejection sampling and GPT-4o-mini reformatting; fully open with training code and data. Link
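The training recipe above is a classic rejection-sampling distillation loop: sample candidate traces from a teacher, keep only those whose final answer verifies, and rewrite the keepers into a clean SFT format. A schematic sketch with placeholder function names (not the actual Sky-T1 scripts):

```python
# Rejection-sampling data pipeline: sample from a teacher model, keep traces whose
# final answer checks out, then reformat the keepers into training examples.
def build_sft_dataset(problems, sample_teacher, check_answer, reformat,
                      samples_per_problem=4):
    dataset = []
    for prob in problems:
        for _ in range(samples_per_problem):
            trace = sample_teacher(prob["question"])      # e.g. a QwQ-style teacher
            if check_answer(trace, prob["answer"]):       # rejection step
                dataset.append({
                    "question": prob["question"],
                    "response": reformat(trace),          # e.g. rewrite into clean CoT
                })
                break                                     # one verified sample suffices
    return dataset
```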
January 8 Phi-4 Microsoft (USA) 14B (dense) MIT Small language model optimized for math and coding; trained on 9.8T tokens with synthetic data; outperforms Llama 3.3 70B on MATH and GPQA despite 5x fewer parameters; decoder-only transformer with 16K context. Link