
Aug 06 2025

OpenAI GPT OSS 120B Runs Fastest on Cerebras

OpenAI’s GPT OSS 120B model is now available on Cerebras. The first open-weight reasoning model from OpenAI, OSS 120B delivers accuracy that rivals o4-mini while running at up to 3,000 tokens per second on the Cerebras Inference Cloud. Reasoning tasks that take up to a minute on GPUs finish in just one second on Cerebras. OSS 120B is available today with 131K context at $0.25 per million input tokens and $0.69 per million output tokens.
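At those rates, per-request costs are easy to estimate. A minimal sketch (the token counts in the example are illustrative, not measured):

```python
# Rough cost of a single request at the quoted rates:
# $0.25 per million input tokens, $0.69 per million output tokens.
INPUT_RATE = 0.25 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.69 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request given its input and output token counts."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 2,000-token prompt that produces 10,000 tokens of reasoning + answer:
print(f"${request_cost(2_000, 10_000):.4f}")  # $0.0074
```

Reasoning models spend most of their tokens on output, so the $0.69 output rate dominates the bill for chain-of-thought workloads.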

GPT OSS 120B is a 120-billion-parameter mixture-of-experts model that delivers near-parity performance with OpenAI’s popular o4-mini on core reasoning benchmarks. It excels at chain-of-thought tasks, tackling coding, mathematical reasoning, and health-related queries with class-leading accuracy and efficiency. With its public weights released under Apache 2.0, it offers transparency, fine-tuning flexibility, and the ability to run on the Cerebras Wafer Scale Engine in the cloud and on-prem.

Cerebras is proud to offer launch-day support for OSS 120B. On OpenRouter, Cerebras was measured at 3,045 tokens/s, 15x faster than the leading GPU cloud. Artificial Analysis found that Cerebras offered the best combination of speed and latency, with a time to first token of just 280 milliseconds and an output speed of 2,700 tokens/s.

Inference speed is critical for agentic and coding applications. Reasoning models are still used sparingly in production workloads because they can take up to a minute to produce a final answer. Cerebras runs OSS 120B so fast that it returns answers as quickly as non-reasoning models. Artificial Analysis found that Cerebras was the only provider to return the first answer token within a single second, comparable to popular instruct models like GPT-4.1 and Claude 4 Sonnet.

Speed isn’t the only factor when choosing a model provider. Differences in numerics and quantization can greatly affect output accuracy. Artificial Analysis tested all GPT OSS 120B providers and found that Cerebras tied for first in accuracy on AIME 2025, a challenging math eval.

Most flagship products—from cars to computers—command steep price premiums for only modest performance gains. Cerebras’s GPT OSS 120B endpoint flips that script, delivering 16x the speed of the median GPU cloud for less than twice the cost. That’s an 8.4x price-performance advantage as measured in tokens per second per dollar, delivering not just incredible performance but also exceptional value.
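The 8.4x figure follows directly from those two multiples. A quick sanity check, assuming a cost multiple of roughly 1.9x (consistent with "less than twice the cost"):

```python
# Price-performance relative to the median GPU cloud, measured as
# tokens per second per dollar. The 1.9x cost multiple is an assumption
# consistent with "less than twice the cost"; 16x speed is as quoted.
speed_multiple = 16.0
cost_multiple = 1.9
price_performance = speed_multiple / cost_multiple
print(round(price_performance, 1))  # 8.4
```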

GPT OSS 120B is the most capable U.S.-trained open-weight reasoning model available today, combining exceptional instruction-following, advanced tool-calling, and state-of-the-art accuracy across math, coding, and complex reasoning tasks. Paired with Cerebras, it runs at record-breaking speed, with single-second latency, best-in-class accuracy, and exceptional price-performance. Try it today on the Cerebras Cloud and through our partners HuggingFace, OpenRouter, and Vercel.
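Getting started is a standard chat-completions call. A minimal sketch of the request body, assuming an OpenAI-compatible endpoint; the model identifier and URL below are assumptions, so check the Cerebras docs for the exact values:

```python
import json

# Assumed OpenAI-compatible chat completions endpoint on Cerebras Cloud.
BASE_URL = "https://api.cerebras.ai/v1/chat/completions"

def build_request(prompt: str, max_tokens: int = 4096) -> dict:
    """Assemble the JSON body for a chat completions call."""
    return {
        "model": "gpt-oss-120b",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = build_request("Prove that the square root of 2 is irrational.")
print(json.dumps(body, indent=2))
```

POST this body with an `Authorization: Bearer <API key>` header; because the format is OpenAI-compatible, existing OpenAI client libraries can be pointed at the Cerebras base URL without code changes.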

Try GPT OSS 120B on Cerebras →