Today, we’re releasing OpenAI o3 and o4-mini, the latest in our o-series of models trained to think for longer before responding. These are the smartest models we’ve released to date, representing a step change in ChatGPT’s capabilities for everyone from curious users to advanced researchers. For the first time, our reasoning models can agentically use and combine every tool within ChatGPT—this includes searching the web, analyzing uploaded files and other data with Python, reasoning deeply about visual inputs, and even generating images. Critically, these models are trained to reason about when and how to use tools to produce detailed and thoughtful answers in the right output formats, typically in under a minute, to solve more complex problems. This allows them to tackle multi-faceted questions more effectively, a step toward a more agentic ChatGPT that can independently execute tasks on your behalf. The combined power of state-of-the-art reasoning with full tool access translates into significantly stronger performance across academic benchmarks and real-world tasks, setting a new standard in both intelligence and usefulness.

<..>

OpenAI o4-mini is a smaller model optimized for fast, cost-efficient reasoning—it achieves remarkable performance for its size and cost, particularly in math, coding, and visual tasks. It is the best-performing benchmarked model on AIME 2024 and 2025. Although access to a computer meaningfully reduces the difficulty of the AIME exam, we also found it notable that o4-mini achieves 99.5% pass@1 (100% consensus@8) on AIME 2025 when given access to a Python interpreter. While these results should not be compared to the performance of models without tool access, they are one example of how effectively o4-mini leverages available tools; o3 shows similar improvements on AIME 2025 from tool use (98.4% pass@1, 100% consensus@8).
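For reference, the two metrics quoted above can be computed roughly as in the sketch below; this is an illustrative example with toy data, not OpenAI's evaluation harness. pass@1 is the expected correctness of a single sampled answer (in practice estimated by averaging over many samples), while consensus@k takes a majority vote over k sampled answers, which is why consensus@8 can sit above pass@1 as in the 99.5%/100% figures.

```python
from collections import Counter

def pass_at_1(samples_per_problem, answers):
    """Expected correctness of one sampled answer, averaged over problems."""
    return sum(
        sum(s == a for s in samples) / len(samples)
        for samples, a in zip(samples_per_problem, answers)
    ) / len(answers)

def consensus_at_k(samples_per_problem, answers, k=8):
    """Fraction of problems where the majority vote over k samples is correct."""
    correct = 0
    for samples, a in zip(samples_per_problem, answers):
        majority, _ = Counter(samples[:k]).most_common(1)[0]
        correct += majority == a
    return correct / len(answers)

# Toy data: 3 problems, 8 sampled integer answers each (AIME-style).
samples = [[42] * 8, [17] * 7 + [3], [900] + [101] * 7]
truth = [42, 17, 101]
print(pass_at_1(samples, truth))       # ~0.92: some individual samples are wrong
print(consensus_at_k(samples, truth))  # 1.0: majority voting fixes the stragglers
```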
Notes: We can’t make a precise estimate, but it seems unlikely to exceed 10^25 FLOP. We think the active parameter count is 10-30B. Using the standard approximation of training compute ≈ 6 × active parameters × tokens, reaching 10^25 FLOP at the upper end of that range (30B active parameters) would require >55T training tokens, i.e. well beyond 10x overtraining relative to Chinchilla-optimal.
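A quick back-of-envelope check of that token figure, assuming the common compute ≈ 6ND approximation and a Chinchilla-optimal ratio of ~20 tokens per parameter (both assumptions, not measured values):

```python
# Sanity check of the note above under the 6*N*D approximation and a
# ~20 tokens-per-parameter Chinchilla-optimal ratio (assumptions).
compute_budget = 1e25          # FLOP ceiling considered in the note
n_active = 30e9                # upper end of the assumed 10-30B active params

tokens_needed = compute_budget / (6 * n_active)
print(f"Tokens to hit 1e25 FLOP: {tokens_needed / 1e12:.1f}T")  # ~55.6T

chinchilla_tokens = 20 * n_active                               # ~0.6T
overtraining = tokens_needed / chinchilla_tokens
print(f"Overtraining factor vs Chinchilla: {overtraining:.0f}x")  # ~93x
```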
Notes: Can't get an exact estimate. Given these models are served at 150-200 tok/s at $4.40/Mtok output, inference economics (https://epoch.ai/blog/inference-economics-of-language-models) suggests a total parameter count of around 60-120B, with mixture-of-experts active parameters around 10-30B. An MoE is roughly comparable to a ~50% smaller dense model (https://epoch.ai/gradient-updates/moe-vs-dense-models-inference), which lines up decently with Magistral Small's pricing (24B dense, served at a similar speed for a cheaper $1.50/Mtok).
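The arithmetic behind that Magistral Small cross-check, as a sketch; the 50% dense-equivalence factor is the rule of thumb from the linked gradient update, and the parameter ranges are our estimates above, not confirmed figures:

```python
# Cross-check of the pricing comparison above. The 0.5 dense-equivalence
# factor is the rule of thumb from the linked post; the o4-mini parameter
# range is our estimate, not a confirmed figure.
moe_total_low, moe_total_high = 60e9, 120e9    # assumed o4-mini total params
dense_equiv_low = 0.5 * moe_total_low          # ~30B dense-equivalent
dense_equiv_high = 0.5 * moe_total_high        # ~60B dense-equivalent

magistral_dense = 24e9                         # Magistral Small, dense
price_o4_mini, price_magistral = 4.40, 1.50    # $/Mtok output, similar tok/s

# If price scales roughly with dense-equivalent size at a fixed serving
# speed, the price ratio should fall near the size ratio:
print(price_o4_mini / price_magistral)         # ~2.9x price gap
print(dense_equiv_low / magistral_dense,
      dense_equiv_high / magistral_dense)      # ~1.2x to 2.5x size gap
```

The ~2.9x price ratio sits just above the ~1.2-2.5x dense-equivalent size ratio, which is what "lines up decently" refers to.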