
DeepSeek’s new model cuts inference costs by 50% using sparse attention for long-context tasks.
DeepSeek has released V3.2-Exp, an experimental model that cuts inference API prices by roughly 50% relative to its predecessor, V3.1-Terminus. The savings come from DeepSeek Sparse Attention (DSA), a mechanism whose lightweight ‘lightning indexer’ scores earlier tokens so that each query attends only to the most relevant ones rather than the entire sequence. By avoiding full quadratic attention, DSA flattens the cost curve for long-context tasks, making it feasible to process hundreds of pages of text at a fraction of the usual cost.
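To make the idea concrete, here is a minimal PyTorch sketch of top-k sparse attention driven by a lightweight indexer. The tensor shapes, indexer dimensions, and selection details are illustrative assumptions, not DeepSeek’s published implementation:

```python
import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, idx_q, idx_k, top_k=64):
    # Indexer scores: a cheap low-dimensional dot product estimates which
    # past tokens matter for each query (a stand-in for the lightning indexer).
    scores = idx_q @ idx_k.transpose(-2, -1)                  # (T, T)
    T = scores.size(-1)
    # Causal mask: a query may only select itself and earlier tokens.
    future = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))
    # Keep only the top-k highest-scoring tokens per query.
    k_eff = min(top_k, T)
    sel_scores, sel_idx = scores.topk(k_eff, dim=-1)          # (T, k_eff)
    # Gather the selected keys/values and attend densely over just those,
    # so cost scales with T * k_eff instead of T * T.
    k_sel, v_sel = k[sel_idx], v[sel_idx]                     # (T, k_eff, d)
    attn = torch.einsum("td,tkd->tk", q, k_sel) / q.size(-1) ** 0.5
    # Early queries have fewer than k_eff valid tokens; re-mask those slots.
    attn = attn.masked_fill(sel_scores.isinf(), float("-inf"))
    w = F.softmax(attn, dim=-1)
    return torch.einsum("tk,tkd->td", w, v_sel)               # (T, d)

# Toy usage: 1,024 tokens, 64-dim attention heads, 16-dim indexer.
T, d, d_idx = 1024, 64, 16
q, k, v = (torch.randn(T, d) for _ in range(3))
idx_q, idx_k = torch.randn(T, d_idx), torch.randn(T, d_idx)
out = sparse_attention(q, k, v, idx_q, idx_k)
print(out.shape)  # torch.Size([1024, 64])
```

Because each query attends to at most `top_k` tokens, the attention cost grows linearly rather than quadratically with sequence length, which is where the long-context savings come from.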
The model’s efficiency is particularly valuable for enterprise applications where long-context processing is essential, such as legal document analysis, scientific research, and enterprise knowledge management. With input processing priced as low as $0.028 per million tokens (for cached inputs), DeepSeek is positioning itself as a cost-effective alternative to proprietary models. The release underscores China’s growing competitiveness in the global LLM race, with DeepSeek’s models now rivaling offerings from OpenAI and Google, and it could accelerate LLM adoption in cost-sensitive industries and regions.
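For a rough sense of scale, a back-of-the-envelope calculation shows what that rate implies for a book-length input (the ~500 tokens per page figure is an assumption for illustration):

```python
# Input cost at the quoted $0.028 per million tokens (cached-input rate).
pages = 300
tokens = pages * 500                          # ~150,000 input tokens
cost = tokens / 1_000_000 * 0.028
print(f"~{tokens:,} tokens -> ${cost:.4f}")   # ~150,000 tokens -> $0.0042
```

At that rate, ingesting a 300-page document costs well under a cent, which is what makes long-context workloads economically practical.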
Source: Macaron