The Large Language Model Landscape

Figure: The 2026 LLM landscape by company and capability, with closed frontier models on the left and open-weight models on the right.

What is a large language model?

A large language model is a neural network trained on vast quantities of text to predict the next token in a sequence. That deceptively simple training objective produces systems capable of answering questions, summarizing documents, writing code, translating languages, and reasoning through multi-step problems. "Large" refers to the number of parameters — weights in the network that encode learned associations — which in frontier models now runs into the hundreds of billions. The models powering ChatGPT, Claude, Gemini, and Perplexity are all LLMs, as are the open-weight models businesses deploy on their own infrastructure.
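
The next-token objective can be illustrated with a deliberately tiny sketch: a bigram counter that "predicts" the next word from counts and then generates text by feeding each prediction back in. This is an assumption-laden toy, not how any LLM is built. Real models condition on the whole preceding context rather than only the last token, use subword tokens, and learn billions of parameters by gradient descent instead of counting. The function names here are hypothetical.

```python
from __future__ import annotations
from collections import Counter, defaultdict

def train_bigram(corpus: str) -> dict[str, Counter]:
    """'Training': count which token follows each token (whitespace tokenization)."""
    tokens = corpus.split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(model: dict[str, Counter], token: str) -> str | None:
    """Predict the most frequent follower of `token`; None if it was never seen."""
    counts = model.get(token)
    if not counts:
        return None
    return counts.most_common(1)[0][0]  # ties go to the follower seen first

def generate(model: dict[str, Counter], start: str, max_tokens: int = 5) -> list[str]:
    """Greedy autoregressive loop: each prediction becomes the next input."""
    out = [start]
    for _ in range(max_tokens):
        nxt = predict_next(model, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out

model = train_bigram("the model reads the text and the model predicts the next token")
print(generate(model, "the", max_tokens=4))  # prints: ['the', 'model', 'reads', 'the', 'model']
```

Scaling this idea up, with a neural network in place of the count table and far longer context, is what produces the behaviours described above.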

The closed frontier: OpenAI, Anthropic, Google, xAI

The closed frontier models are developed by well-capitalised labs, deployed via API or proprietary product, and kept private — you can use them but not inspect or modify the weights.

OpenAI's GPT-5 family was unified in early 2026 into a single flagship line (currently GPT-5.4) that combines reasoning and general capability, with a context window exceeding one million tokens. GPT-5 added native computer use and is the model powering ChatGPT. OpenAI claims around 900 million weekly active ChatGPT users as of early 2026, giving it by far the largest consumer reach of any AI lab.

Anthropic's Claude Opus 4 family (alongside Claude Sonnet 4) sits at the frontier of long-context reasoning. The current Claude Opus 4.6 scores 91.1% on advanced reasoning evaluations and has a 1-million-token context window. Claude is used most heavily in enterprise productivity, developer tooling (including Claude Code), and agentic workflows where careful instruction-following behaviour matters. Anthropic's B2B referral share grew 14% in Q1 2026, the fastest quarterly growth of any major AI provider.

Google's Gemini 3 Pro leads several benchmarks as of mid-2026, scoring 91.8% on advanced reasoning evaluations. Gemini integrates tightly with Google's infrastructure (Search, Workspace, Vertex AI) and is the model behind Google AI Overviews. It leads competing models on multimodal tasks, particularly image understanding and generation.

xAI's Grok 4 is a newer entrant with deep integration into the X platform (formerly Twitter) and access to its real-time data stream. It benchmarks competitively with other frontier models and is growing in deployment, particularly for applications where recency of information matters.

The performance gap has narrowed significantly

As of mid-2026, the top three or four closed models are separated by only a few percentage points on standard benchmarks — and leading open-weight models are within 7 points of them. Choosing a model on benchmark performance alone is largely a distraction; use case fit, pricing, and deployment constraints matter more for most applications.

The open-weight tier: Meta, DeepSeek, Mistral, Qwen

Open-weight models release their trained weights publicly, allowing anyone to download, run, and modify them. In 2025–2026 this tier closed most of its performance gap with the closed frontier, and on several specific benchmarks open models have pulled ahead.

Meta's Llama 4, released in April 2025, introduced a Mixture-of-Experts architecture and native multimodality. The Scout variant has a 10-million-token context window — the largest of any model, open or closed. Llama 4 benefits from Meta's massive infrastructure investment and has the largest developer ecosystem of any open-weight model family.
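
Mixture-of-Experts means that, for each token, a small router picks a few "expert" sub-networks to run and mixes their outputs, so only a fraction of the model's total parameters is active per token. A minimal sketch of top-k routing follows, assuming toy experts that simply scale their input and router scores passed in directly; in a real model both the router and the experts are learned layers, and this is not Meta's implementation. The names `moe_forward` and `softmax` are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_logits, experts, k=2):
    """Send x to the k highest-scoring experts and mix their outputs.

    router_logits: one score per expert (a learned layer would compute these from x).
    experts: callables; only the k selected ones are evaluated.
    Returns (mixed_output, indices_of_chosen_experts).
    """
    chosen = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in chosen])  # renormalise over chosen experts only
    output = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return output, chosen

# Four toy "experts" that scale their input; real experts are feed-forward layers.
experts = [lambda x, s=s: s * x for s in (1, 2, 3, 4)]
out, chosen = moe_forward(10.0, [0.1, 2.0, 0.5, 1.0], experts, k=2)
print(chosen, round(out, 2))  # prints: [1, 3] 25.38
```

Only two of the four experts run here, which is the property that lets MoE models grow total parameter count without a proportional rise in per-token compute.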

DeepSeek (a Chinese AI lab) produced what many consider the most significant open-weight moment of 2025: DeepSeek R1, released in January 2025 under an MIT licence, achieved frontier-level reasoning largely through large-scale reinforcement learning. It was built on the DeepSeek V3 base model, whose final training run DeepSeek reported at approximately $5.6 million in GPU time (a figure that excludes earlier research and ablation runs), orders of magnitude less than comparable closed models are believed to cost. DeepSeek V4 continued that cost-efficiency story. For businesses with high query volumes, self-hosting DeepSeek can reduce inference costs by 60–80% compared to GPT-5 API pricing.

Mistral Large 3, from the French AI lab Mistral AI, ships under Apache 2.0 and is a popular option for European enterprises concerned about data residency and GDPR compliance. It is particularly strong in RAG (retrieval-augmented generation) pipelines, the architecture behind most AI search systems, which makes it a common choice for enterprises building internal AI search over their own documents.
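
A RAG pipeline works in two stages: retrieve the passages most relevant to a query, then hand them to the model as context alongside the question. The sketch below is a minimal illustration under stated assumptions: it scores documents by simple word overlap instead of the vector embeddings production systems typically use, and it stops at assembling the prompt rather than calling any particular model API. The document ids, texts, and function names are hypothetical.

```python
from __future__ import annotations
import re

def tokenize(text: str) -> set[str]:
    """Lower-case word set; a crude stand-in for real embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Return ids of the k documents sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda doc_id: len(q & tokenize(docs[doc_id])), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: dict[str, str], k: int = 2) -> str:
    """Assemble retrieved passages plus the question into one prompt for an LLM."""
    context = "\n".join(f"[{doc_id}] {docs[doc_id]}" for doc_id in retrieve(query, docs, k))
    return f"Answer using only the sources below.\n{context}\n\nQuestion: {query}"

docs = {
    "hr-01": "Employees accrue 25 days of annual leave per year.",
    "it-07": "Reset your VPN password from the IT self-service portal.",
    "hr-04": "Parental leave requests go to the HR team.",
}
query = "How many days of annual leave do employees get?"
print(retrieve(query, docs))  # prints: ['hr-01', 'hr-04']
print(build_prompt(query, docs))
```

Because the model only sees what retrieval returns, retrieval quality largely determines answer quality, which is why model strength on RAG tasks matters for internal document search.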

Qwen 3 (from Alibaba), which also ships under Apache 2.0, is the leading multilingual open-weight model and has emerged as the best all-round performer among open-weight models on general benchmarks.

Open vs closed: what the divide means practically

The open vs closed choice is no longer primarily about capability; it's about data control, cost, and customisation. Closed models are easier to start with (an API call, no infrastructure) and are kept up to date by the provider. Open models can run entirely on your own infrastructure, which matters for data privacy, enterprise compliance, and cost at scale. A business running millions of AI queries per day can find the API costs of a closed model prohibitive: self-hosting a quantized Llama 4 Scout on a single A100 costs roughly $0.15 per million tokens, versus $1.50 or more for the GPT-5 API.
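
The per-token price gap compounds with volume. The back-of-envelope sketch below uses the $0.15 and $1.50 per-million-token figures above; the query volume and tokens per query are illustrative assumptions, and real API pricing usually differs for input and output tokens.

```python
def daily_cost(queries_per_day: int, tokens_per_query: int, usd_per_million_tokens: float) -> float:
    """Daily spend in USD for a given volume and per-million-token price."""
    tokens = queries_per_day * tokens_per_query
    return tokens / 1_000_000 * usd_per_million_tokens

# Illustrative assumptions: 2 million queries/day at roughly 1,000 tokens each.
QUERIES, TOKENS = 2_000_000, 1_000
self_hosted = daily_cost(QUERIES, TOKENS, 0.15)
api = daily_cost(QUERIES, TOKENS, 1.50)
print(f"self-hosted: ${self_hosted:,.0f}/day, API: ${api:,.0f}/day, "
      f"annual difference: ${(api - self_hosted) * 365:,.0f}")
# prints: self-hosted: $300/day, API: $3,000/day, annual difference: $985,500
```

The self-hosted figure only holds if the GPU stays well utilised; at low volumes, idle hardware can make the API the cheaper option.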

How LLMs relate to AI search engines

The AI search engines that matter for GEO (generative engine optimization), namely ChatGPT, Claude, Perplexity, and Google AI Overviews, are built on top of these models. Perplexity uses a combination of proprietary and licensed models for its retrieval and synthesis pipeline. ChatGPT runs on GPT-5, Google AI Overviews runs on Gemini, and Claude.ai runs on Claude. Because a model's training data, retrieval architecture, and synthesis behaviour all affect which content it cites, the LLM landscape is directly relevant to anyone optimizing for AI visibility.

The short version

Frontier models in mid-2026: GPT-5 (OpenAI), Claude Opus 4 (Anthropic), Gemini 3 Pro (Google), Grok 4 (xAI). Leading open-weight models: Llama 4 (Meta), DeepSeek V3/V4, Mistral Large 3, Qwen 3. The performance gap between the tiers has largely closed. Choice is now driven by deployment model, cost, and data policy rather than raw capability.