Fine-tuning
Fine-tuning is the process of taking a pretrained large language model (or other neural network) and continuing its training on a smaller, curated dataset to specialize its behavior for a specific task, domain, or style. Unlike prompt engineering — which changes only the inputs the model sees — fine-tuning changes the model's weights, baking new patterns directly into its parameters.
Fine-tuning is the highest-impact lever after prompt engineering, but also the most expensive in time, data, and operational complexity. Most production AI products in 2026 reach for fine-tuning only after exhausting cheaper alternatives (few-shot prompting, RAG, structured output) — which solve 80%+ of use cases.
Types of fine-tuning
Several distinct techniques fall under the "fine-tuning" umbrella:
- Supervised fine-tuning (SFT) — Train on input/output pairs. Most common; used when you have a few thousand high-quality examples of the desired behavior.
- Reinforcement learning from human feedback (RLHF) — Humans rank model outputs; the model learns to prefer the higher-ranked style. The technique behind ChatGPT's "helpful assistant" personality.
- Direct preference optimization (DPO) — A simpler alternative to RLHF that achieves similar quality without the reward model step.
- LoRA (Low-Rank Adaptation) — Trains small "adapter" matrices instead of all model weights. 100–1000x cheaper to train and deploy; the dominant technique for open-weights fine-tuning in 2026.
- Full fine-tuning — Updates all parameters. Expensive but maximizes quality.
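The LoRA idea can be sketched in a few lines of numpy: the pretrained weight matrix stays frozen, and only two small low-rank matrices receive updates. This is a toy illustration of the math, not a training loop; real implementations live in libraries such as Hugging Face PEFT.

```python
import numpy as np

# LoRA sketch: instead of updating the full weight matrix W, train two small
# matrices A (r x d_in) and B (d_out x r) with rank r much smaller than d.
# The effective weight is W + B @ A; only A and B would receive gradients.
rng = np.random.default_rng(0)
d_in, d_out, r = 512, 512, 8                # rank-8 adapter (toy sizes)

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weights
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init: no change at start

def forward(x):
    # Base path plus the low-rank adapter path
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapted model matches the base model exactly
assert np.allclose(forward(x), W @ x)

full_params = W.size            # 262,144 parameters in the full matrix
lora_params = A.size + B.size   # 8,192 parameters in the adapter (~3% here)
print(full_params, lora_params)
```

At LLM scale the ratio is far more lopsided, which is why adapters can be swapped in and out cheaply at inference time.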
OpenAI, Anthropic, and Google all offer hosted fine-tuning APIs that wrap these techniques. Open-weights models (Llama, Mistral, Gemma) can be fine-tuned freely on your own infrastructure.
When to fine-tune (and when not to)
A 2026 a16z survey of AI engineering teams found ~70% of production AI features use no fine-tuning — prompt engineering plus RAG cover the workload. Fine-tuning becomes the right answer when:
- You need consistent style or format at scale (a brand voice, a JSON schema, a tone).
- You're hitting token costs because prompts have grown long with examples and instructions.
- You need to reduce latency by encoding behavior into the model rather than relying on long in-context examples on every call.
- You're operating in a specialized domain (legal, medical, scientific) where the base model lacks vocabulary or reasoning patterns.
Don't fine-tune when:
- You haven't first optimized the prompt.
- Your task changes frequently — fine-tuning on a moving target is expensive.
- You have under 100 high-quality examples — prompts beat fine-tuning at low data volume.
- The base model already does the task well — you'll gain little.
Examples of fine-tuning in production
- Harvey (legal AI) — Fine-tuned GPT-4 on legal contracts and case law for law-firm-grade output.
- GitHub Copilot — Fine-tuned for code completion; weights shaped by billions of lines of public code.
- Klarna AI assistant — Fine-tuned on customer service transcripts; handled the workload of roughly 700 human agents.
- Replit Code LLM — Fine-tuned for live code generation in the Replit IDE.
- Custom GPTs (OpenAI) — Custom instructions plus retrieval for personalized assistants; despite the branding, no weight updates are involved.
How PostKit relates to fine-tuning
PostKit deliberately does not fine-tune any models in 2026. The reasoning is strategic: the social-content task changes frequently (new platforms, new algorithm rules, new viral patterns), and fine-tuning is the wrong tool for fast-moving targets. Instead, PostKit invests in:
- Aggressive prompt engineering — Versioned prompts per platform and pipeline.
- Structured output — JSON schemas the model must conform to, with validation and retry.
- Brand voice as few-shot examples — Each user's brand profile becomes runtime examples in the prompt, not training data.
This keeps PostKit model-agnostic: when Gemini ships a better Flash version, Claude Haiku gets cheaper, or a new frontier model arrives, PostKit can switch without retraining.
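A structured-output validation-and-retry loop of the kind described above can be sketched as follows. The schema keys and the `call_model` stub are hypothetical illustrations, not PostKit's actual code.

```python
import json

REQUIRED_KEYS = {"caption", "hashtags"}  # hypothetical schema for a social post

def validate(raw: str) -> dict:
    """Parse model output and check it against the expected schema."""
    data = json.loads(raw)                  # raises JSONDecodeError on invalid JSON
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

def generate_with_retry(call_model, max_attempts=3):
    """Call the model, validate the output, and retry with the error fed back."""
    error = None
    for _ in range(max_attempts):
        raw = call_model(error)             # a real prompt would include `error`
        try:
            return validate(raw)
        except (json.JSONDecodeError, ValueError) as e:
            error = str(e)                  # surface the failure to the next attempt
    raise RuntimeError("model never produced valid JSON")

# Stub model: fails schema validation once, then returns a valid object
responses = iter(['{"caption": "hi"}',
                  '{"caption": "hi", "hashtags": ["#ai"]}'])
result = generate_with_retry(lambda err: next(responses))
assert result["hashtags"] == ["#ai"]
```

The key design choice is feeding the validation error back into the retry prompt, which usually converges in one or two attempts.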
That said, fine-tuning may make sense in PostKit's future for specific high-volume use cases — for example, a fine-tuned hashtag-generation model trained on millions of high-engagement posts could outperform a general LLM at lower cost. The decision will be data-driven: when prompt iteration plateaus, fine-tuning becomes the next lever.
Frequently asked questions
How much data do I need to fine-tune? SFT typically wants 500–10,000 high-quality examples. LoRA can work with as few as 50–200 well-chosen examples. More data helps if it's diverse and high-quality; noisy data hurts.
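For a concrete sense of what one SFT example looks like: hosted fine-tuning APIs such as OpenAI's accept chat-style JSONL, one JSON object per line. The content below is invented for illustration; check your provider's docs for exact field names.

```python
import json

# One SFT training example in chat-style JSONL format (OpenAI-style fields;
# the system/user/assistant content here is a made-up illustration).
example = {
    "messages": [
        {"role": "system", "content": "You write concise LinkedIn posts in our brand voice."},
        {"role": "user", "content": "Announce our v2 launch."},
        {"role": "assistant", "content": "v2 is live. Faster pipelines, same price. Try it today."},
    ]
}
line = json.dumps(example)   # one object per line in the .jsonl training file
assert json.loads(line)["messages"][2]["role"] == "assistant"
```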
How much does fine-tuning cost? OpenAI fine-tuning of GPT-4o-mini: ~$25 per million training tokens. Llama 3 LoRA on a single A100: ~$2–10 in compute for a small dataset. Full fine-tuning of a frontier model: $50k–$500k.
What's the difference between fine-tuning and pretraining? Pretraining trains a model from random initialization on trillions of tokens to learn general language patterns ($10M–$1B+). Fine-tuning starts from a pretrained model and adds task-specific behavior on a much smaller dataset.
Can I fine-tune a closed model like GPT-5 or Claude? GPT-5 supports hosted fine-tuning. Claude does not currently offer fine-tuning (Anthropic's stance favors prompt engineering). Gemini offers fine-tuning via Vertex AI.
What is LoRA and why is it everywhere? Low-Rank Adaptation trains tiny adapter matrices (~1% of full model size). It's 100–1000x cheaper, you can swap LoRAs in and out at inference, and you can host hundreds of LoRAs against a single base model — making it the go-to technique for production fine-tuning.
Does fine-tuning cause hallucinations? It can. Fine-tuning on small or biased datasets can amplify existing failure modes or introduce new ones. Always evaluate fine-tuned models on held-out test sets before production.
What's "instruction tuning"? A specific kind of fine-tuning where you train the base model on (instruction, response) pairs to make it follow natural-language instructions. The first step in turning a "raw" pretrained model into a useful assistant.
Related terms
- LLM (Large Language Model)
- Prompt engineering
- Few-shot learning
- RAG (Retrieval-Augmented Generation)
- Generative AI
- Hallucination (AI)
Sources
- Hugging Face — Parameter-Efficient Fine-Tuning Guide (2025)
- a16z — State of AI Engineering Survey (2026)
- Anthropic — Why We Don't Offer Fine-Tuning (blog post, 2024)