32B (dense)
MIT-style (fully open)
Open reasoning model trained for under $450 in 19 hours on 8 H100 GPUs; competitive with OpenAI o1-preview on MATH500 and AIME; fine-tuned on synthetic reasoning traces sampled from QwQ-32B-Preview, filtered via rejection sampling and reformatted with GPT-4o-mini; fully open, with training code and data released. A sketch of this data pipeline follows below.
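To make the data-curation step concrete, here is a minimal sketch of a rejection-sampling pipeline of the kind described above: draw several candidate traces from a teacher model, keep only those whose final answer matches the known ground truth, then pass accepted traces through a reformatting model. All names here (`sample_teacher`, `extract_answer`, `reformat`, `answers_match`) are hypothetical placeholders, not the project's actual code, and the exact-match check is a simplification of real answer verification.

```python
def answers_match(candidate: str, reference: str) -> bool:
    """Crude exact-match check; real pipelines normalize math answers
    (strip LaTeX, compare numerically, etc.) before comparing."""
    return candidate.strip() == reference.strip()


def build_dataset(problems, sample_teacher, extract_answer, reformat, k=8):
    """Assemble SFT data via rejection sampling.

    problems:       list of {"question": str, "answer": str} dicts
    sample_teacher: callable that returns one reasoning trace for a question
                    (stands in for querying QwQ-32B-Preview)
    extract_answer: callable that pulls the final answer out of a trace
    reformat:       callable that rewrites a trace into a consistent style
                    (stands in for the GPT-4o-mini reformatting pass)
    k:              number of samples drawn per problem before giving up
    """
    dataset = []
    for prob in problems:
        # Rejection sampling: draw up to k traces and keep the first one
        # whose final answer agrees with the ground truth.
        for _ in range(k):
            trace = sample_teacher(prob["question"])
            if answers_match(extract_answer(trace), prob["answer"]):
                # Reformatting pass cleans the accepted trace before it
                # enters the fine-tuning data.
                dataset.append({
                    "question": prob["question"],
                    "response": reformat(trace),
                })
                break  # one accepted trace per problem
    return dataset
```

Problems with no accepted trace within `k` attempts are simply dropped, which is the usual trade-off of rejection sampling: data quality is enforced at the cost of coverage.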