
# DeepSeek V4 Size: Parameters, Active Parameters, and Context
DeepSeek V4 size is easiest to understand by separating total parameters, active parameters, and context length.

The useful distinction is total capacity versus active inference cost: MoE scale lets a model be large without activating every parameter for every token.
## Official model sizes
| Model | Total parameters | Active parameters | Context |
|---|---|---|---|
| DeepSeek V4 Flash | 284B | 13B | 1M tokens |
| DeepSeek V4 Pro | 1.6T | 49B | 1M tokens |
Sources: DeepSeek-V4-Pro model card and DeepSeek API pricing.
## What active parameters mean
DeepSeek V4 is a mixture-of-experts (MoE) family, so total and active parameter counts differ. Total parameters measure the full capacity of the model. Active parameters are the approximate share of weights actually used for each token during inference, because the router sends each token through only a small number of experts rather than the whole network.
That gap is why Flash can be much cheaper while remaining useful: fewer active parameters per token mean less inference compute, which shows up directly in its lower token prices.
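The total-versus-active arithmetic can be sketched in a few lines. All numbers below (expert count, top-k, layer sizes) are illustrative assumptions, not DeepSeek's published architecture; the point is only that active parameters grow with top-k, not with the expert count.

```python
# Toy MoE parameter accounting. The shared (dense) weights run for every
# token; of the routed experts, only top_k run per token.
def moe_param_counts(shared: int, n_experts: int, expert_size: int, top_k: int) -> tuple[int, int]:
    """Return (total, active) parameter counts for a toy MoE stack."""
    total = shared + n_experts * expert_size       # full model capacity
    active = shared + top_k * expert_size          # per-token inference cost
    return total, active

# Illustrative numbers only (NOT DeepSeek's real configuration):
total, active = moe_param_counts(
    shared=10_000_000_000,       # dense layers shared by all tokens
    n_experts=256,               # experts that exist in the checkpoint
    expert_size=5_000_000_000,   # parameters per expert
    top_k=2,                     # experts actually routed per token
)
print(f"total={total / 1e9:.0f}B  active={active / 1e9:.0f}B")
# → total=1290B  active=20B
```

Doubling the number of experts here would double total parameters while leaving active parameters, and thus per-token cost, unchanged.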
## Why 1M context matters
A 1M context window changes product design. Instead of sending only the last few messages, you can include large documents, long project histories, logs, or source files. The tradeoff is cost and latency, so context should still be curated rather than dumped blindly.
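Curating rather than dumping can be as simple as a greedy selection under a token budget. This is a minimal sketch under assumptions of my own: the chunk format and the relevance scores are hypothetical inputs you would supply (e.g. from a retriever), not part of any DeepSeek API.

```python
# Hypothetical context curator: keep the highest-relevance chunks that fit
# a token budget instead of sending everything to a 1M-token window.
def curate_context(chunks: list[tuple[float, int, str]], budget_tokens: int) -> tuple[list[str], int]:
    """chunks: (relevance_score, token_count, text) tuples.

    Greedily selects chunks by descending relevance, skipping any chunk
    that would exceed the budget. Returns (selected_texts, tokens_used).
    """
    selected: list[str] = []
    used = 0
    for score, tokens, text in sorted(chunks, key=lambda c: -c[0]):
        if used + tokens <= budget_tokens:
            selected.append(text)
            used += tokens
    return selected, used

# Usage: three scored chunks, budget of 100 tokens.
docs = [(0.9, 50, "design doc"), (0.5, 60, "old changelog"), (0.8, 40, "error log")]
texts, used = curate_context(docs, budget_tokens=100)
print(texts, used)
# → ['design doc', 'error log'] 90
```

Even with a 1M-token window, a budget well below the maximum keeps latency and cost predictable; the window is headroom, not a target to fill.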

