T5Gemma 2

Google (USA), December 16, 2025

Parameters

270M-270M, 1B-1B, and 4B-4B (three encoder-decoder sizes)

License

Apache 2.0

Key Features

Next generation of T5Gemma with multimodal and long-context capabilities. Extends T5Gemma's adaptation recipe (UL2) from text-only to multimodal models built on Gemma 3, processing both text and vision inputs. Introduces tied word embeddings, which share a single embedding matrix across the encoder and decoder for efficiency, and merged attention, which unifies decoder self-attention and cross-attention into a single module. Demonstrates the generality of the adaptation strategy across architectures and modalities, and shows the encoder-decoder architecture's particular strength on long-context modeling. Yields comparable or better pretraining performance and significantly improved post-training performance relative to the corresponding Gemma 3 models. A lightweight, open encoder-decoder family with strong multilingual capabilities; pretrained models are released for community research.
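The two architectural changes named above can be sketched in a few lines. This is a hypothetical minimal illustration in numpy, not the released implementation: tied word embeddings mean one matrix `E` serves as the encoder input embedding, the decoder input embedding, and (transposed) the output projection; merged attention means the decoder attends over the concatenation of its own states and the encoder states in a single pass instead of running separate self-attention and cross-attention modules. Heads, masking, and normalization are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 100, 16  # toy sizes, chosen for illustration only

# Tied word embeddings: a single matrix shared by encoder input,
# decoder input, and (via its transpose) the LM output projection.
E = rng.normal(size=(vocab, d)) / np.sqrt(d)

def embed(token_ids):
    return E[token_ids]          # lookup used by both encoder and decoder

def lm_logits(h):
    return h @ E.T               # output projection reuses the same E

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def merged_attention(dec_h, enc_h, W_q, W_k, W_v):
    # Merged attention: queries come from the decoder, but keys/values
    # are computed over decoder AND encoder states together, so one
    # attention module covers both self- and cross-attention.
    q = dec_h @ W_q
    ctx = np.concatenate([dec_h, enc_h], axis=0)   # self + cross context
    k, v = ctx @ W_k, ctx @ W_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

enc_h = embed(np.array([1, 2, 3]))   # 3 "encoder" states
dec_h = embed(np.array([4, 5]))      # 2 "decoder" states
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

out = merged_attention(dec_h, enc_h, W_q, W_k, W_v)
print(out.shape)             # (2, 16)  — one output per decoder position
print(lm_logits(out).shape)  # (2, 100) — logits over the shared vocab
```

Note that weight tying also shrinks the parameter count, which matters most at the 270M scale where embeddings dominate.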

Paper / Source

https://arxiv.org/abs/2512.14856