48B total / 3B active (MoE)
MIT
Hybrid linear attention architecture with Kimi Delta Attention (KDA); 3:1 ratio of KDA to global Multi-head Latent Attention (MLA) layers; outperforms full attention across short-context, long-context, and RL tasks; 75% KV cache reduction; 6x faster decoding at 1M context; trained on 5.7T tokens. A minimal sketch of the 3:1 interleaving is shown below.
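
A minimal sketch of the 3:1 hybrid layering described above, assuming a simple "every fourth layer is global attention" pattern. `KDALayer`, `MLALayer`, and `HybridStack` are hypothetical placeholders, not the actual Kimi implementation; standard PyTorch attention stands in for both KDA and MLA to keep the example runnable.

```python
# Sketch only: placeholder layers illustrating the 3:1 KDA-to-global ratio.
import torch
import torch.nn as nn


class KDALayer(nn.Module):
    """Placeholder for a Kimi Delta Attention (linear-attention) block.

    A real KDA layer maintains a fixed-size recurrent state rather than a
    per-token KV cache; here a linear projection stands in for it."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.proj(x)


class MLALayer(nn.Module):
    """Placeholder for a global attention block (MLA in the real model)."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x, need_weights=False)
        return x + out


class HybridStack(nn.Module):
    """Interleaves layers at a 3:1 ratio: every fourth layer is global
    attention, the rest are KDA. Since only 1 in 4 layers keeps a growing
    KV cache, the cache shrinks by roughly 75%, matching the figure above."""
    def __init__(self, n_layers: int, d_model: int):
        super().__init__()
        self.layers = nn.ModuleList(
            MLALayer(d_model) if (i + 1) % 4 == 0 else KDALayer(d_model)
            for i in range(n_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    model = HybridStack(n_layers=8, d_model=64)
    x = torch.randn(2, 16, 64)   # (batch, seq_len, d_model)
    print(model(x).shape)        # torch.Size([2, 16, 64])
```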