QwenLong-L1.5

Alibaba (Qwen Team), December 15, 2025

Parameters

30B / 3B (MoE)

License

Apache 2.0

Key Features

Long-context reasoning model based on Qwen3-30B-A3B-Thinking, with memory management for ultra-long contexts (1M–4M tokens). Three core innovations:

1. A multi-hop reasoning data synthesis pipeline that moves beyond needle-in-a-haystack tasks, generating complex reasoning problems that require globally distributed evidence.
2. Adaptive Entropy-Controlled Policy Optimization (AEPO), an algorithm for stable long-context RL training with task-balanced sampling and task-specific advantage estimation.
3. A memory-augmented architecture with multi-stage fusion RL training that seamlessly integrates single-pass reasoning with iterative memory-based processing beyond the physical context window.

Results: +9.90 points over the Qwen3-30B-A3B-Thinking baseline; performance comparable to GPT-5 and Gemini-2.5-Pro on long-context benchmarks; +9.48 points on ultra-long tasks (1M–4M tokens). The improvements also carry over to math, tool use, and extended dialogue. Designed for processing entire books and multi-document reasoning.
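The model card does not spell out the memory loop, but the general idea of iterative memory-based processing can be sketched as follows. This is a minimal, simplified illustration, not the model's actual mechanism: in QwenLong-L1.5 the memory behavior is learned via RL, whereas here `summarize` is a hypothetical stand-in that would be a model call in practice, and all function names are invented for illustration.

```python
# Sketch: process an input longer than the context window by iterating over
# chunks while carrying a bounded "memory", then answer in one final pass.

def chunk(text: str, window: int) -> list[str]:
    """Split text into pieces that each fit the context window."""
    return [text[i:i + window] for i in range(0, len(text), window)]

def summarize(memory: str, piece: str, budget: int) -> str:
    """Hypothetical model call: fold a new chunk into the running memory.
    As a placeholder, keep only the most recent `budget` characters."""
    combined = memory + " " + piece if memory else piece
    return combined[-budget:]

def answer_over_long_input(text: str, question: str, window: int = 1000,
                           memory_budget: int = 500) -> str:
    """Iterate over chunks with a bounded memory, then answer from it."""
    memory = ""
    for piece in chunk(text, window):
        memory = summarize(memory, piece, memory_budget)
    # Final single-pass step: reason over the compressed memory + question.
    return f"answer({question!r}) from {len(memory)} chars of memory"
```

The key property this sketch shows is that the per-step working set stays bounded (one chunk plus the memory budget), so total input length is limited only by iteration count, not by the physical context window.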

Paper / Source

https://huggingface.co/Tongyi-Zhiwen/QwenLong-L1.5-30B-A3B