DeepSeek V4 Flash - Fast DeepSeek V4 Chat

The speed lane of the DeepSeek V4 lineup: it keeps the 1M context window while running fewer active parameters and charging lower token prices, which suits everyday product traffic.


DeepSeek-V4-Flash is the default model for this page: a fast DeepSeek V4 model for daily chat, tool-assisted answers, and high-throughput workflows.

Pick Flash or Pro, enable web search or thinking when needed, then start with a real prompt.
Fast
Thinking
Tool Use
Low Cost


Context
1M
Scale
284B / 13B active
Output
32K
Best Fit
Fast workflows

Use Flash when you want DeepSeek V4 quality with lower cost and faster replies.

Overview

Where DeepSeek V4 Flash fits

DeepSeek V4 Flash is the efficient V4 path: 284B total parameters, 13B active parameters, and a 1M context window through the DeepSeek API. It is built for high-volume chat, summaries, routing, and quick iteration.

1M context window

Long conversations and large documents still fit without moving up to Pro first.

Lower token cost

Flash has lower cache-hit input, cache-miss input, and output prices than Pro.

Fast default path

Use Flash for frequent requests, draft generation, routing, summaries, and first-pass analysis.

DeepSeek V4 Flash in the Frontier Benchmark Context

DeepSeek V4 Flash is shown alongside DeepSeek V4 Pro and leading frontier models so you can see where the faster route stays close to Pro and where escalating to Pro is worth it.

DeepSeek V4 Flash

Max

Efficient DeepSeek V4 route that stays close to Pro on coding and software tasks.

Use Flash as the default when throughput and cost matter.

MMLU-Pro
86.2
SimpleQA
34.1
GPQA
88.1
LiveCodeBench
91.6
Terminal Bench
56.9
SWE Verified
79.0
SWE Pro
52.6
BrowseComp
73.2
MCPAtlas
69.0
Toolathlon
47.8

DeepSeek V4 Pro

Max

Flagship V4 path with strong coding, agentic, browsing, and tool-use scores.

Escalate to Pro when a wrong final answer is expensive.

MMLU-Pro
87.5
SimpleQA
57.9
GPQA
90.1
LiveCodeBench
93.5
Terminal Bench
67.9
SWE Verified
80.6
SWE Pro
55.4
BrowseComp
83.4
MCPAtlas
73.6
Toolathlon
51.8

Gemini 3.1 Pro

High

Strong general-reasoning competitor with high SimpleQA and GPQA scores.

External frontier baseline.

MMLU-Pro
91.0
SimpleQA
75.6
GPQA
94.3
LiveCodeBench
91.7
Terminal Bench
68.5
SWE Verified
80.6
SWE Pro
54.2
BrowseComp
85.9
MCPAtlas
69.2
Toolathlon
48.8

Claude Opus 4.6

Max

Strong coding and software-engineering baseline.

External frontier baseline.

MMLU-Pro
89.1
SimpleQA
46.2
GPQA
91.3
LiveCodeBench
88.8
Terminal Bench
65.4
SWE Verified
80.8
SWE Pro
57.3
BrowseComp
83.7
MCPAtlas
73.8
Toolathlon
47.2

GPT-5.4

xHigh

Reasoning-heavy baseline with strong terminal, browsing, and tool-use results.

- means the source table did not report the score.

MMLU-Pro
87.5
SimpleQA
45.3
GPQA
93.0
LiveCodeBench
-
Terminal Bench
75.1
SWE Verified
-
SWE Pro
57.7
BrowseComp
82.7
MCPAtlas
67.2
Toolathlon
54.6

Kimi K2.6

Thinking

Competitive coding and agentic-task comparison point.

External reasoning baseline.

MMLU-Pro
87.1
SimpleQA
36.9
GPQA
90.5
LiveCodeBench
89.6
Terminal Bench
66.7
SWE Verified
80.2
SWE Pro
58.6
BrowseComp
83.2
MCPAtlas
66.6
Toolathlon
50.0

GLM-5.1

Thinking

China-frontier baseline for reasoning, browsing, and tool tasks.

- means the source table did not report the score.

MMLU-Pro
86.0
SimpleQA
38.1
GPQA
86.2
LiveCodeBench
-
Terminal Bench
63.5
SWE Verified
-
SWE Pro
58.4
BrowseComp
79.3
MCPAtlas
71.8
Toolathlon
40.7

Values follow the official DeepSeek V4 model-card tables. Use them as routing hints, not a substitute for your own production evals.

Updated 2026-04-24
Use Cases

What DeepSeek V4 Flash is good at

Best for tasks where getting a useful answer quickly matters more than squeezing out the deepest reasoning.

Quick Q&A

Answer common questions, explain errors, and handle lightweight support chat without lag.

Summaries

Condense release notes, docs, tickets, emails, and chat history into short outputs.

Classification

Route requests, tag content, extract fields, and prepare inputs for downstream workflows.

Search-assisted answers

Use web search only when freshness matters, then let Flash draft the answer quickly.

Prompt iteration

Try prompts, compare outputs, and refine instructions without waiting on a slower model.

Low-cost long context

Keep large context available while still controlling per-token spend.

FAQ

DeepSeek V4 Flash FAQ

Quick answers about DeepSeek V4 Flash.

1. What is the DeepSeek V4 Flash API model ID?

Use deepseek-v4-flash.
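As a sketch of how that ID is used, here is a minimal request body in the OpenAI-compatible chat-completions format that DeepSeek's API follows. Only the model ID comes from this page; the message contents are illustrative, and nothing is sent over the network here.

```python
import json

# Minimal chat-completions request body (OpenAI-compatible shape).
# "deepseek-v4-flash" is the model ID listed on this page; the
# system/user messages below are placeholder examples.
payload = {
    "model": "deepseek-v4-flash",
    "messages": [
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": "Summarize these release notes in three bullets."},
    ],
    "stream": False,
}

# Serialize exactly as it would be POSTed to the chat completions endpoint.
body = json.dumps(payload)
```

The same payload works for Pro by swapping the model field, which keeps Flash-to-Pro escalation a one-line change.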

2. How large is DeepSeek V4 Flash?

The official materials list 284B total parameters and 13B active parameters.

3. What context length does Flash support?

The DeepSeek API pricing table lists a 1M context window for DeepSeek V4 Flash.

4. How is Flash priced?

The current pricing page lists cache-hit input at $0.028, cache-miss input at $0.14, and output at $0.28 per 1M tokens.
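To make the arithmetic concrete, here is a small estimator using the three per-1M-token prices quoted above. The token counts in the example call are made up; only the rates come from this page.

```python
def flash_cost_usd(cache_hit_in: int, cache_miss_in: int, out_tokens: int) -> float:
    """Estimated USD cost of one request at the listed Flash rates:
    $0.028 cache-hit input, $0.14 cache-miss input, $0.28 output,
    each per 1M tokens."""
    return (cache_hit_in * 0.028 + cache_miss_in * 0.14 + out_tokens * 0.28) / 1_000_000

# Hypothetical request: 80K cached prompt tokens, 20K fresh, 2K output.
cost = flash_cost_usd(80_000, 20_000, 2_000)  # 0.0056 USD
```

Note how heavily cache hits dominate the savings: cached input is 5x cheaper than cache-miss input, so stable system prompts and shared document prefixes pay off quickly at volume.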

5. When should I use Pro instead?

Use Pro when the task is complex, user-visible, or expensive to get wrong.
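One way to apply that rule in a router is sketched below. The field names, thresholds, and the `deepseek-v4-pro` model ID are assumptions for illustration, not official guidance; adapt them to your own task metadata.

```python
def pick_model(task: dict) -> str:
    """Illustrative Flash/Pro routing heuristic. Escalate to Pro when a
    wrong final answer is expensive or user-visible; default to Flash
    for high-volume work. All keys and the Pro model ID are hypothetical."""
    if task.get("user_visible") and task.get("error_cost") == "high":
        return "deepseek-v4-pro"  # hypothetical Pro model ID
    if task.get("kind") in {"routing", "summary", "draft", "classification"}:
        return "deepseek-v4-flash"
    # Unknown tasks stay on the cheap path until evals say otherwise.
    return "deepseek-v4-flash"
```

Starting everything on Flash and escalating on explicit signals keeps the cost floor low while still covering the cases the page flags for Pro.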

6. Does Flash support long context?

Yes. Flash keeps the same listed 1M context window while using a lower-cost model path.