GPT-5
Graduated: OpenAI's most capable model – unified reasoning, vision, and tool use
The new frontier. Native chain-of-thought, vision, audio, and tool use in one model. Dramatically better at complex reasoning and multi-step tasks. The benchmark all others are measured against.
GPT-4o
Graduated: OpenAI's flagship multimodal model – text, vision, audio
Still the workhorse for production RAG and agentic pipelines. JSON mode, function calling, and vision are battle-tested. Best cost/quality ratio for most enterprise deployments.
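The JSON mode and function calling called out above reduce to a couple of request fields; a minimal payload sketch (the `get_weather` tool and its schema are illustrative, not from the source):

```python
# Sketch of an OpenAI Chat Completions request using function calling
# plus JSON mode. The "get_weather" tool and its schema are assumptions
# for illustration; only the payload shape matters here.
def build_function_call_request(user_message: str) -> dict:
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Look up current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        # JSON mode: constrains the model to emit syntactically valid JSON.
        "response_format": {"type": "json_object"},
    }

request = build_function_call_request("Weather in Oslo?")
```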
GPT-4o mini
Graduated: OpenAI's cost-efficient model – 90% of GPT-4o quality at 5% of the cost
Best value in AI. Faster and cheaper than GPT-3.5 while being significantly smarter. Ideal for classification, extraction, summarisation, and simple chat. First choice for cost-sensitive apps.
GPT-OSS 120B
Incubating: OpenAI's first open-weight model – 120B params, fully downloadable
Historic moment – OpenAI releasing open weights. Competitive with Llama 4 and Mistral Large. Fine-tunable and self-hostable. Great for teams wanting OpenAI quality with full control.
Claude 4 Sonnet
Graduated: Anthropic's latest – best-in-class coding, reasoning, and safety
The coding and reasoning champion. Extended thinking mode tackles PhD-level problems. MCP tool ecosystem is mature. Best for agentic workflows, code generation, and long-context analysis.
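Claude's tool use follows the same pattern as the agentic workflows noted above: declare tools with a JSON Schema under `input_schema`. A sketch of the request shape (model id, tool name, and schema are illustrative assumptions):

```python
# Sketch of an Anthropic Messages API request with a tool definition.
# The model id, "run_tests" tool, and schema are illustrative, not
# taken from the source; only the payload structure is the point.
def build_claude_tool_request(prompt: str) -> dict:
    return {
        "model": "claude-sonnet-4",  # placeholder model id
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [
            {
                "name": "run_tests",
                "description": "Run the project's test suite and report failures",
                "input_schema": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            }
        ],
    }

req = build_claude_tool_request("Fix the failing test in src/")
```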
Claude 3.5 Sonnet
Graduated: Anthropic's proven workhorse with 200K context and computer use
Exceptional at long-context reasoning and code generation. Computer use API powers automated workflows. Strong safety via Constitutional AI. Still widely used in production.
Gemini 2.5 Pro
Graduated: Google's most capable model with native multimodality and 2M context
First model with a 2M-token context window that actually works. Native audio, video, and image understanding. Deep Google ecosystem integration. Excellent for enterprise multimodal use cases.
Gemini 2.0 Flash
Graduated: Google's speed-optimised model with 1M context at low cost
Fastest inference at near-frontier quality. 1M-token context window for long-document analysis. Best for latency-sensitive applications and high-volume processing.
Llama 4 Maverick
Incubating: Meta's latest open model – 400B MoE with 128 experts
Massive leap for open source. The 400B MoE architecture with 128 experts runs efficiently on 8× H100s. Matches frontier closed models on most benchmarks. Best open model for enterprise self-hosting.
Llama 3.3 70B
Graduated: Meta's proven open model – battle-tested at 70B params
Proven in thousands of production deployments. Runs on a single A100 80GB. Excellent for self-hosted RAG, fine-tuning, and cost-sensitive pipelines. Huge ecosystem of fine-tunes.
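The single-A100 claim is easiest to sanity-check with weight-memory arithmetic; a sketch that counts weights only (KV cache and activations add more on top):

```python
# Back-of-envelope weight-memory math for a 70B-parameter model.
# Shows why "runs on a single A100 80GB" implies quantisation:
# fp16 weights alone exceed 80 GB, while 4-bit weights fit with
# headroom for the KV cache.
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9  # decimal GB

fp16_gb = weight_memory_gb(70, 16)  # ~140 GB: does not fit on one A100
int4_gb = weight_memory_gb(70, 4)   # ~35 GB: fits comfortably
```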
DeepSeek R1
Graduated: Open-weight chain-of-thought reasoning rivalling OpenAI o1
Made frontier reasoning accessible to everyone. Open weights, chain-of-thought at o1-level quality, at a fraction of the cost. Self-hostable for full data control. Essential for reasoning-heavy tasks.
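R1's published chat template wraps its chain of thought in `<think>` tags; a minimal sketch for separating the reasoning from the final answer (tag format assumed from that template):

```python
import re

# R1-style outputs interleave reasoning and answer; the open-weight
# chat template wraps reasoning in <think>...</think>. This helper
# splits the two, returning empty reasoning if no tags are present.
def split_reasoning(text: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>2+2: add the units.</think>The answer is 4."
)
```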
DeepSeek V3
Graduated: 685B MoE model trained for $5.5M – remarkable efficiency
Proved you can train frontier models affordably. 685B MoE with FP8 training on 2048 H800s. Strong general capabilities. Best for teams wanting frontier quality with efficient self-hosting.
Qwen 3.5
Incubating: Alibaba's latest multilingual model – 9B to 72B variants
Best open model for multilingual applications (CJK especially). Multiple sizes from 9B to 72B. Strong code and math. Available on Hugging Face with permissive licensing.
Gemma 4
Incubating: Google's open model family – 26B and 31B instruction-tuned
Best small-to-medium open model from Google. 26B A4B variant uses mixture of experts for efficiency. Strong at instruction following and reasoning. Good JAX and Keras ecosystem.
Mistral Large 2
Incubating: Mistral AI's top-tier model with EU data residency
Best European option with data-residency guarantees via La Plateforme. Function calling and JSON mode are reliable. Good for regulated industries needing EU hosting and GDPR compliance.
Phi-4
Incubating: Microsoft's 14B model punching above its weight on reasoning
Remarkable reasoning for its size – beats many 70B models on math and logic benchmarks. Runs on consumer GPUs. Ideal for edge deployment and latency-sensitive applications.
Grok 3
Incubating: xAI's frontier model trained on Colossus – a 100K-GPU cluster
Strong reasoning and real-time knowledge via X integration. Massive compute budget produces competitive frontier quality. API available. Good alternative for teams wanting model diversity.
GLM-5
Sandbox: Zhipu AI's 754B frontier model from one of China's leading AI labs
One of the largest dense models available. Strong Chinese language capabilities and general reasoning. Open weights on Hugging Face. Interesting for multilingual and research applications.
Command R+
Incubating: Cohere's enterprise model optimised for RAG and tool use
Purpose-built for enterprise RAG. Excellent citation generation and grounding. Multilingual across 10 languages. Cohere Coral SDK simplifies integration. Good for accuracy-critical search applications.
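The grounding workflow amounts to passing source snippets alongside the question; a sketch of a documents-grounded chat request (field names follow Cohere's `documents` parameter; the ids and snippet text are illustrative):

```python
# Sketch of a Cohere Chat request grounded on caller-supplied
# documents, the pattern behind the citation generation noted above.
# The document ids and snippet content are illustrative.
def build_grounded_request(question: str, snippets: list[str]) -> dict:
    return {
        "model": "command-r-plus",
        "message": question,
        "documents": [
            {"id": f"doc_{i}", "snippet": text}
            for i, text in enumerate(snippets)
        ],
    }

req = build_grounded_request(
    "What is our refund window?",
    ["Refunds are accepted within 30 days of purchase."],
)
```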
GPT-3.5 Turbo
Archived: OpenAI's original cost-efficient chat model – now superseded
Fully superseded by GPT-4o mini at similar cost and much higher quality. Avoid for new projects. Migrate to gpt-4o-mini or a modern open-weight alternative.
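For most codebases the migration is a one-line model swap; a sketch of rewriting stored request configs (the helper name is illustrative):

```python
# Migrating from GPT-3.5 Turbo usually means changing only the model
# field; messages and most parameters carry over unchanged.
def migrate_request(request: dict) -> dict:
    updated = dict(request)
    if updated.get("model", "").startswith("gpt-3.5-turbo"):
        updated["model"] = "gpt-4o-mini"
    return updated

legacy = {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}
migrated = migrate_request(legacy)
```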