30B (2 variants: standard and Realtime; dense),Apache 2.0,End-to-end omni-modal (text/image/audio/video); unified architecture with Thinker/Talker MoEs for reasoning/speech gen; 58% Big Bench Audio; 119 text langs, 19 speech in/10 out; 17 voice options; Realtime variant for low-latency speech-to-speech (0.9s first audio).