
Phi-4

Microsoft (USA), January 8, 2025

Parameters

14B (dense)

License

MIT

Key Features

Small language model optimized for math and coding. Trained on 9.8T tokens, including synthetic data. Outperforms Llama 3.3 70B on the MATH and GPQA benchmarks with one-fifth the parameters (14B vs. 70B). Decoder-only transformer with a 4K context window in pretraining, extended to 16K during midtraining.

Paper / Source

https://arxiv.org/abs/2412.08905