We’re releasing OpenAI o3-mini, the newest, most cost-efficient model in our reasoning series, available in both ChatGPT and the API today. Previewed in December 2024, this powerful and fast model advances the boundaries of what small models can achieve, delivering exceptional STEM capabilities, with particular strength in science, math, and coding, all while maintaining the low cost and reduced latency of OpenAI o1-mini.

OpenAI o3-mini is our first small reasoning model that supports highly requested developer features, including function calling, Structured Outputs, and developer messages, making it production-ready out of the gate. Like OpenAI o1-mini and OpenAI o1-preview, o3-mini will support streaming. Developers can also choose between three reasoning effort options (low, medium, and high) to optimize for their specific use cases. This flexibility allows o3-mini to “think harder” when tackling complex challenges, or to prioritize speed when latency is a concern.

o3-mini does not support vision capabilities, so developers should continue using OpenAI o1 for visual reasoning tasks. o3-mini is rolling out in the Chat Completions API, Assistants API, and Batch API starting today to select developers in API usage tiers 3-5.
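As a minimal sketch of the developer features above, assuming the official openai Python SDK and an OPENAI_API_KEY set in the environment (the prompt contents are placeholders):

```python
# Minimal sketch: calling o3-mini through the Chat Completions API with
# a developer message and an explicit reasoning effort level.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # "low", "medium", or "high"
    messages=[
        # o-series models use developer messages in place of system messages.
        {"role": "developer", "content": "Answer concisely, showing your work."},
        {"role": "user", "content": "How many primes are there below 100?"},
    ],
)

print(response.choices[0].message.content)
```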
Notes: We can’t make a precise estimate, but it seems unlikely to exceed 10^25 FLOP. We think the active parameter count is 10-30B. Reaching 10^25 FLOP at the upper end of that range (30B active parameters) would require >55T training tokens, i.e. well beyond 10x overtraining relative to Chinchilla.
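The arithmetic behind that token-count claim, using the standard training-compute approximation C ≈ 6ND (a back-of-envelope check under the note's assumptions, not anything reported by OpenAI):

```python
# Back-of-envelope check of the >55T token claim above.
C = 1e25    # hypothesized upper bound on training compute, FLOP
N = 30e9    # upper end of the assumed active-parameter range

# C ~= 6 * N * D  =>  D = C / (6 * N)
D = C / (6 * N)
print(f"tokens to reach 1e25 FLOP at 30B active params: {D:.1e}")  # ~5.6e13, i.e. >55T

# Chinchilla-optimal is roughly 20 tokens per parameter.
chinchilla_tokens = 20 * N                                   # ~6e11
print(f"overtraining factor: {D / chinchilla_tokens:.0f}x")  # ~90x, well beyond 10x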
Notes: Can't get an exact estimate. Given that these models are served at 150-200 tok/s at $4.40/Mtok output, inference economics (https://epoch.ai/blog/inference-economics-of-language-models) suggests a total parameter count around 60-120B, with mixture-of-experts active parameters around 10-30B. An MoE is roughly comparable to a ~50% smaller dense model (https://epoch.ai/gradient-updates/moe-vs-dense-models-inference), which lines up decently with Magistral Small's pricing (24B dense, served at a similar speed for the cheaper $1.50/Mtok).
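A rough consistency check of that pricing comparison, using only the note's own assumed numbers:

```python
# All inputs below are the note's assumptions, not measured values.
moe_total_low, moe_total_high = 60e9, 120e9   # assumed total parameter range

# Per the MoE-vs-dense comparison linked above, an MoE is roughly
# comparable to a dense model ~50% of its total size.
dense_low, dense_high = 0.5 * moe_total_low, 0.5 * moe_total_high  # 30B-60B

# Magistral Small: 24B dense, similar serving speed, $1.50/Mtok output
# vs. $4.40/Mtok here.
price_ratio = 4.40 / 1.50          # ~2.9x
size_ratio_low = dense_low / 24e9  # ~1.3x
size_ratio_high = dense_high / 24e9  # ~2.5x

print(f"dense-equivalent size: {dense_low/1e9:.0f}B-{dense_high/1e9:.0f}B")
print(f"price ratio {price_ratio:.1f}x vs size ratio "
      f"{size_ratio_low:.1f}x-{size_ratio_high:.1f}x")
```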