DeepSeek V4 Size: Parameters, Active Parameters, and Context

DeepSeek V4 size is easiest to understand by separating total parameters, active parameters, and context length.

[Illustration: DeepSeek V4 model size and context]

The useful distinction is total capacity versus active inference cost: MoE scale lets a model be large without activating every parameter for every token.

Official model sizes

Model               Total parameters   Active parameters   Context
DeepSeek V4 Flash   284B               13B                 1M tokens
DeepSeek V4 Pro     1.6T               49B                 1M tokens

Sources: DeepSeek-V4-Pro model card and DeepSeek API pricing.

What active parameters mean

DeepSeek V4 is an MoE family, so total parameters and active parameters are different. Total parameters describe the full model capacity. Active parameters describe the approximate amount used per token during inference.

This is why Flash can be much cheaper while remaining capable: it activates far fewer parameters per token, which translates into lower token prices.
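As a rough intuition for how the two numbers relate, here is a toy sketch of top-k expert routing. The shared/per-expert split and expert counts below are illustrative assumptions chosen to land near the published Flash sizes, not DeepSeek's actual architecture:

```python
# Toy MoE parameter accounting (illustrative numbers, not DeepSeek's real
# architecture): only k experts out of n run for each token, so the active
# parameter count sits far below the total.

def active_params(shared: float, per_expert: float, n_experts: int, k: int) -> tuple[float, float]:
    """Return (total, active) parameter counts in billions."""
    total = shared + per_expert * n_experts
    active = shared + per_expert * k  # only the k routed experts fire per token
    return total, active

# Hypothetical split chosen so the totals land near the published Flash sizes.
total, active = active_params(shared=6.0, per_expert=1.0, n_experts=278, k=7)
print(f"total ≈ {total:.0f}B, active ≈ {active:.0f}B")  # total ≈ 284B, active ≈ 13B
```

The same accounting scales to Pro: a larger expert pool raises total capacity, while the per-token active count grows much more slowly.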

Why 1M context matters

A 1M context window changes product design. Instead of sending only the last few messages, you can include large documents, long project histories, logs, or source files. The tradeoff is cost and latency, so context should still be curated rather than dumped blindly.
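One simple way to curate rather than dump is greedy packing under a token budget. This is a minimal sketch with assumed names (`pack_context`, `estimate_tokens`) and a crude 4-characters-per-token heuristic in place of a real tokenizer:

```python
# Minimal context-curation sketch: rank candidate chunks by a relevance
# score and pack the best ones into a token budget instead of sending
# everything. The helpers and the chars-per-token estimate are illustrative.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

def pack_context(chunks: list[tuple[float, str]], budget_tokens: int) -> list[str]:
    """Greedily keep the highest-scoring chunks that still fit the budget."""
    selected, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget_tokens:
            selected.append(text)
            used += cost
    return selected

chunks = [
    (0.9, "design doc " * 50),   # highly relevant, small
    (0.2, "old logs " * 500),    # barely relevant, huge
    (0.7, "recent diff " * 30),  # relevant, small
]
picked = pack_context(chunks, budget_tokens=400)  # keeps the two relevant chunks
```

Even with a 1M window, the same budget logic applies; the budget just becomes a cost and latency dial rather than a hard ceiling.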

D-Chat Team
