LLM (Large Language Model)
A large language model (LLM) is a deep neural network trained on trillions of tokens of text that predicts the next token in a sequence — enabling it to write, summarize, translate, and reason at near-human quality on many tasks.
Updated: — · Words: 844 · Category: AI / GenAI
A large language model (LLM) is a transformer-based neural network with billions to trillions of parameters, trained on massive text corpora to predict the next token in a sequence. By learning the statistical structure of language at scale, LLMs acquire emergent capabilities — reasoning, code generation, translation, summarization — without being explicitly programmed for any of them.
LLMs are the engine behind virtually every consumer-facing AI product launched since 2023, including ChatGPT, Claude, Gemini, Perplexity, and Copilot. The global LLM market was valued at $9.98B in 2026 and is forecast to grow at a 33.7% CAGR through 2033, reaching $82.1B (Coherent Market Insights).
How LLMs work
An LLM is trained in three phases:
- Pretraining — The model ingests trillions of tokens (web pages, books, code, papers) and learns to predict the next token. This phase is self-supervised and produces a "base model" with broad knowledge but no instruction-following ability.
- Supervised fine-tuning (SFT) — Human labelers write thousands of high-quality prompt/response pairs. The model learns to follow instructions in that style.
- Reinforcement learning from human feedback (RLHF) — Humans rank pairs of model outputs; a reward model is trained on those rankings, and the LLM is then optimized to score highly against it. This is what makes ChatGPT feel "helpful."
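The pretraining objective in step 1 can be illustrated with a toy stand-in: a bigram counter that predicts the next token from observed frequencies. Real LLMs use transformers over subword tokens, but the training signal — learn the distribution of what comes next — is the same. The corpus and function names here are purely illustrative.

```python
from collections import Counter, defaultdict

def train_bigram(corpus: str):
    """Count next-token frequencies for each token (the 'pretraining' signal)."""
    tokens = corpus.split()
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return counts

def predict_next(counts, token: str) -> str:
    """Greedy decoding: return the most frequent continuation."""
    return counts[token].most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" only once
```

A transformer replaces the frequency table with a learned function of the entire preceding context, which is what lets scale produce the emergent capabilities described above.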
Modern LLMs add reasoning steps (chain-of-thought), tool use (search, code execution), and longer context windows (1M+ tokens). The frontier models in 2026 — GPT-5, Claude Opus 4.7, Gemini 2.5 Pro — all support multimodal inputs and structured outputs natively.
Capabilities and limits
LLMs excel at tasks where pattern-matching against language data is sufficient: drafting, summarizing, translating, coding common patterns, answering general-knowledge questions, and following specifications. They struggle with:
- Real-time information — Without RAG or web search, knowledge is frozen at training cutoff.
- Exact arithmetic and counting — Tokenization breaks numbers awkwardly; tools like calculators help.
- Faithfulness — LLMs hallucinate, inventing plausible-sounding facts.
- Long-horizon planning — Multi-step tasks degrade without agent scaffolding.
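The "real-time information" gap above is typically closed with retrieval-augmented generation (RAG): fetch relevant documents first, then paste them into the prompt so the model answers from current data rather than frozen training knowledge. A minimal sketch, using word overlap as a stand-in for the embedding-similarity search real systems use; the documents and prompt template are invented for illustration.

```python
def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (stand-in for embedding search)."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Ground the model by pasting retrieved context above the question."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

docs = [
    "PostKit pricing starts at $19 per month.",
    "LLMs predict the next token in a sequence.",
]
print(build_prompt("What does PostKit pricing cost?", docs))
```

The same pattern addresses the faithfulness problem: instructing the model to answer only from supplied context gives it something concrete to be faithful to.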
A Stanford 2026 evaluation found that frontier LLMs match or exceed expert humans on 47 of 100 benchmarked tasks, including legal contract review and medical triage — but underperform on tasks requiring real-world judgment or carrying real-world stakes.
Examples of leading LLMs (2026)
- GPT-5 (OpenAI) — General-purpose; native multimodal; strongest at creative writing.
- Claude Opus 4.7 (Anthropic) — Strong reasoning; 1M-token context; favored for code and long-document analysis.
- Gemini 2.5 Pro (Google) — Deep Google integration; massive context; excellent multimodal grounding.
- Llama 4 (Meta) — Open-weights; the leading model you can run on your own hardware.
- Mistral Large 2 — European frontier model; efficient inference; favored for on-prem deployment.
How PostKit uses LLMs
PostKit uses Gemini Flash 3 for two of three pipeline steps. Step 1 takes a brand profile, platform rules, and chosen marketing pipeline (PAS, AIDA, POV Hook, etc.) and emits structured JSON: a week of posts, each with platform-appropriate captions, slide texts, hashtags, and image briefs.
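Structured JSON output is only as useful as the checks behind it. A sketch of the kind of validation such a pipeline might run on a model response before anything downstream consumes it — the field names are invented for illustration, not PostKit's actual schema.

```python
import json

# Hypothetical schema: the fields each generated post must carry.
REQUIRED_POST_FIELDS = {"platform", "caption", "hashtags", "image_brief"}

def parse_week(raw: str) -> list[dict]:
    """Parse the model's JSON and reject any post missing required fields."""
    posts = json.loads(raw)
    for i, post in enumerate(posts):
        missing = REQUIRED_POST_FIELDS - post.keys()
        if missing:
            raise ValueError(f"post {i} missing fields: {sorted(missing)}")
    return posts

raw = json.dumps([{"platform": "instagram", "caption": "Launch day!",
                   "hashtags": ["#launch"], "image_brief": "confetti on a desk"}])
print(len(parse_week(raw)))  # 1
```

Failing fast on malformed output is what makes a small, fast model dependable for schema-bound tasks: a bad generation is caught and retried rather than silently scheduled.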
The choice of Gemini Flash over a slower frontier model is deliberate. For structured output following a tight schema, a smaller fast model with carefully tuned prompts beats a larger model on cost and latency, and the quality gap is negligible when the task is well-bounded. PostKit reserves heavier reasoning for ambiguous tasks like writing a brand voice from a few examples.
Step 2 reuses Gemini Flash 3 to convert image briefs into prompt-engineered inputs for Imagen 3. Chaining smaller calls instead of asking one giant model to do everything yields more reliable, debuggable, and cheaper output.
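The chaining pattern in steps 1 and 2 can be sketched as two small functions where each step's output is the next step's input; `call_model` is a hypothetical stand-in for any LLM client, not a real API.

```python
def call_model(prompt: str) -> str:
    """Hypothetical LLM client; a real one would call the Gemini API."""
    return f"[model output for: {prompt[:40]}]"

def step1_brief(brand_profile: str) -> str:
    """Step 1: brand profile -> image brief for one post."""
    return call_model(f"Write an image brief for {brand_profile}")

def step2_image_prompt(brief: str) -> str:
    """Step 2: image brief -> prompt-engineered input for the image model."""
    return call_model(f"Rewrite as an image-generation prompt: {brief}")

image_prompt = step2_image_prompt(step1_brief("a cozy coffee brand"))
```

Because each step has one narrow job, a bad output is easy to localize, log, and retry — the reliability and debuggability argument above in miniature.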
Frequently asked questions
What does "large" actually mean for an LLM? "Large" is a moving target. In 2018, BERT-Large at 340M parameters was huge; in 2026, frontier models exceed 1 trillion parameters. The threshold for "large" tracks frontier scale, not a fixed number.
How do LLMs differ from search engines? A search engine retrieves existing pages ranked by relevance; an LLM generates a new response by sampling from a learned distribution. Hybrid systems (RAG, AI Overviews) combine both.
Can I train my own LLM? Training a frontier model from scratch costs $50M+. But you can fine-tune an existing open-weights model (Llama, Mistral) on your data for $1k–$50k, or use few-shot learning prompts for free.
Are LLMs deterministic? No, by default. They sample from probability distributions, controlled by a "temperature" parameter. Setting temperature to 0 makes outputs nearly deterministic but reduces creativity.
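Temperature scales the model's logits before the softmax: as it approaches 0 the distribution collapses onto the single most likely token, which is why temperature-0 decoding is nearly deterministic. A self-contained sketch over a toy three-token vocabulary:

```python
import math
import random

def sample(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Softmax over temperature-scaled logits, then sample one token."""
    if temperature <= 1e-6:  # T -> 0: greedy decoding, always the argmax
        return max(logits, key=logits.get)
    scaled = {tok: l / temperature for tok, l in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    probs = {tok: math.exp(v) / z for tok, v in scaled.items()}
    r, acc = rng.random(), 0.0
    for tok, p in probs.items():
        acc += p
        if r < acc:
            return tok
    return tok  # guard against floating-point rounding

logits = {"cat": 2.0, "dog": 1.0, "eel": 0.1}
print(sample(logits, 0.0, random.Random(0)))  # always "cat" at temperature 0
```

Higher temperatures flatten the distribution, so lower-probability tokens like "eel" get sampled more often — the "creativity" the answer above refers to.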
Do LLMs understand language? Hot debate. Functionally, they pass many comprehension tests; mechanistically, they're statistical pattern matchers. Most researchers settle on "they exhibit understanding-like behavior," which is what matters for product use.
What's the context window? The maximum number of tokens an LLM can read in one request. GPT-3 had 4k tokens (~3k words); Claude 4.7 has 1M tokens (~750k words). Longer context enables analyzing whole books, codebases, or RAG document sets in one shot.
What is "tokens per second" and why does it matter? The speed at which an LLM emits output, typically 30–500 tokens/second. Faster models feel more responsive and cost less per request; slower models sometimes deliver higher quality.
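The two figures above combine into a back-of-the-envelope latency estimate: English text runs roughly 0.75 words per token, and streaming time is output tokens divided by throughput. A sketch using that rule of thumb:

```python
def words_to_tokens(words: int) -> int:
    """Rough rule of thumb: ~0.75 words per token for English text."""
    return round(words / 0.75)

def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a full response at a given throughput."""
    return output_tokens / tokens_per_second

tokens = words_to_tokens(300)             # a ~300-word answer
print(tokens)                             # 400
print(generation_seconds(tokens, 100.0))  # 4.0 seconds at 100 tok/s
```

At the 30–500 tok/s range cited above, the same 300-word answer takes anywhere from under a second to over ten — which is why throughput dominates perceived responsiveness.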
Related terms
- Generative AI
- GPT-4 / GPT-5
- Claude (Anthropic)
- Gemini (Google)
- Prompt engineering
- Fine-tuning
- RAG (Retrieval-Augmented Generation)
- Hallucination (AI)
- Multimodal AI
Sources
- Coherent Market Insights — Large Language Model Market Report 2026
- Stanford HAI — AI Index Report 2026
- Anthropic, OpenAI, Google DeepMind — Model documentation, 2025–2026
Related comparisons
- PostKit vs Anyword: 2026 Comparison & Best Choice for Performance Marketers. PostKit vs Anyword compared: end-to-end social and ad generator vs predictive copywriting platform. See pricing, features, real reviews.
- PostKit vs Brandwatch: 2026 Comparison & Best Choice for Different Buyers. PostKit vs Brandwatch compared: solopreneur AI content generator vs enterprise consumer intelligence platform. See pricing, features, real reviews.
- PostKit vs Buffer: 2026 Comparison & Best Choice for Solo Creators. PostKit vs Buffer compared: native AI image + caption generation in your browser vs per-channel scheduling. See pricing, features, real reviews.
- PostKit vs Canva: 2026 Comparison & Best Choice for Social Content. PostKit vs Canva compared: AI-native end-to-end generator vs design-first manual workflow with scheduling. See pricing, features, real reviews.
- PostKit vs ContentStudio: 2026 Comparison & Best Choice for Multi-Platform Creators. PostKit vs ContentStudio compared: focused browser AI generator vs broad SMM suite with content discovery. See pricing, features, real reviews.
- PostKit vs Copy.ai: 2026 Comparison & Best Choice for Social Content. PostKit vs Copy.ai compared: end-to-end social and ad generator vs GTM AI workflows for sales and marketing copy. See pricing, features, real reviews.
- PostKit vs CoSchedule: 2026 Comparison & Best Choice for Content Calendar Workflows. PostKit vs CoSchedule compared: web AI generator vs marketing project management calendar. See pricing, features, real reviews.
- PostKit vs Crowdfire: 2026 Comparison & Best Choice for Modern Creators. PostKit vs Crowdfire compared: AI-native end-to-end content generator vs legacy Twitter follow/unfollow tool with light scheduling. See pricing, features, real reviews.
- PostKit vs FeedHive: 2026 Comparison & Best Choice for Indie Creators. PostKit vs FeedHive compared: web AI content generator vs web-based scheduler with AI writing + recycling. See pricing, features, real reviews.
- PostKit vs Flick: 2026 Comparison & Best Choice for Instagram Creators. PostKit vs Flick compared: web AI carousel generator vs Instagram-first hashtag tool with light AI. See pricing, features, real reviews.
- PostKit vs Hootsuite: 2026 Comparison & Best Choice for Solopreneurs. PostKit vs Hootsuite compared: native AI generation in your browser for $19-79 vs enterprise-grade dashboards from $99/mo. See pricing, real reviews.
- PostKit vs Hypefury: 2026 Comparison & Best Choice for Multi-Platform Creators. PostKit vs Hypefury compared: 5-platform AI content generator vs X/Twitter-first automation and recycling. See pricing, features, real reviews.