671B / 37B (MoE)
MIT
Experimental update to V3.1-Terminus; introduces DeepSeek Sparse Attention (DSA) for fine-grained sparse attention computation; delivers major efficiency gains in long-context training and inference (e.g., adaptive expert routing, lower memory use); maintains near-identical output quality to prior versions.
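The core idea behind sparse attention is that each query attends to only a small subset of keys rather than the full sequence, cutting the cost of long-context processing. The sketch below is a generic top-k illustration of this idea, not DeepSeek's actual DSA algorithm; the function name and top-k selection rule are assumptions for demonstration.

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    """Illustrative top-k sparse attention: each query attends only to
    its k highest-scoring keys instead of all keys (a generic sketch,
    not DeepSeek's DSA implementation)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n_q, n_k) scaled dot products
    # Threshold = each query's k-th largest score; mask everything below it.
    kth = np.partition(scores, -k, axis=-1)[:, -k:].min(axis=-1, keepdims=True)
    masked = np.where(scores >= kth, scores, -np.inf)
    # Softmax over the surviving (top-k) entries only; -inf entries get weight 0.
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

With k equal to the full key count this reduces exactly to dense softmax attention; the efficiency gain comes from skipping the masked positions, which a real kernel would never materialize at all.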