Alibaba's Qwen3 235B 2507 Instruct model is now available on Cerebras. The world’s leading non-reasoning model – Qwen3 235B Instruct runs at over 1,400 tokens per second – 11x faster than the leading GPU cloud. We serve the model with 131K context and FP8 weights from our US based data centers. Priced at $0.60 per million input tokens and $1.20 per million output tokens, Qwen3 235B 2507 on Cerebras delivers best-in-class intelligence, speed, and price-performance.
Qwen3 235B2507 Instruct
Following developer feedback, the Qwen team developed two separate models based on Qwen3 235B – a thinking and non-thinking version. Qwen3-235B-A22B-Instruct-2507 is the non-thinking model, achieving state-of-the-art results among non-reasoning models. It outperforms GPT-4.1, Claude Opus 4, DeepSeek V3, and Kimi K2 in the Artificial Analysis Intelligence Index – a blended score across seven benchmarks representing general knowledge, reasoning, coding, and STEM.
Qwen3 235B 2507 improves upon the previous Qwen3 256B hybrid model in several ways:
- Significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage.
- Substantial gains in long-tail knowledge coverage across multiple languages.
- Markedly better alignment with user preferences in subjective and open-ended tasks, enabling more helpful responses and higher-quality text generation.
- Enhanced capabilities with up to 256K long-context understanding.
Qwen3 235B on Cerebras– Frontier Intelligence at 1,400 tokens/s
Qwen3 235B Instruct on Cerebras runs at an unprecedented 1,400 tokens/s – making it the world’s only frontier model that can generate code, run agents, and carry conversations at instant speed.
With 131K context for paying users (64K for free tier), you can use the model on large codebases and lengthy documents without losing coherence. Qwen3 235B Instruct on Cerebras is priced at $0.60 per million input tokens and $1.20 per million output tokens. Compared to GPT-4.1, Qwen3 235b Instruct on Cerebras offers higher model intelligence, 12x faster token output, and reduces cost per token by 70%.
Getting Started
To use Qwen3 235B Instruct in your app, generate an API key from Cerebras Inference Cloud and use the model name: qwen-3-235b-a22b-instruct-2507. As usual, we offer a generous free-tier with 1M tokens per day and pay-as-you-go via OpenRouter and Hugging Face. We will be adding support for Qwen3 2507 Thinking in the near future.