Tech Wavo

DeepSeek releases ‘sparse attention’ model that cuts API costs in half

by Tech Wavo
September 29, 2025
in Computers

Researchers at DeepSeek on Monday released a new experimental model called V3.2-exp, designed to dramatically lower inference costs in long-context operations. DeepSeek announced the model in a post on Hugging Face and published a linked academic paper on GitHub.

The most important feature of the new model is called DeepSeek Sparse Attention, an intricate system described in detail in the diagram below. In essence, the system uses a module called a “lightning indexer” to prioritize specific excerpts from the context window. After that, a separate system called a “fine-grained token selection system” chooses specific tokens from within those excerpts to load into the module’s limited attention window. Taken together, they allow the Sparse Attention models to operate over long portions of context with comparatively small server loads.

[Figure: diagram of the DeepSeek Sparse Attention system]
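
The two-stage idea can be illustrated with a toy example. What follows is a minimal sketch, not DeepSeek's implementation: the function name, the chunk-based notion of an "excerpt," the low-dimensional indexer features (`idx_q`, `idx_K`) and every parameter value are assumptions made for illustration. A cheap scoring pass stands in for the lightning indexer, a top-k pick stands in for fine-grained token selection, and ordinary softmax attention then runs only over the tokens that survive.

```python
import numpy as np

def sparse_attention_sketch(q, K, V, idx_q, idx_K,
                            chunk=4, top_chunks=2, top_tokens=4):
    """Attend from one query to a small, selected subset of context tokens.

    q: (d,) query; K, V: (n, d) keys and values for n context tokens.
    idx_q: (r,) and idx_K: (n, r) are cheap, low-dimensional features used
    only for selection (a hypothetical stand-in for an indexer).
    """
    n, d = K.shape

    # Stage 1: cheap relevance score per token, then score each excerpt
    # (a fixed-size chunk here) by its best-scoring token.
    token_scores = idx_K @ idx_q
    chunk_ids = np.arange(n) // chunk
    n_chunks = -(-n // chunk)  # ceiling division
    chunk_scores = np.full(n_chunks, -np.inf)
    np.maximum.at(chunk_scores, chunk_ids, token_scores)
    keep_chunks = np.argsort(chunk_scores)[-top_chunks:]

    # Stage 2: within the kept excerpts, keep only the top-scoring tokens.
    candidates = np.flatnonzero(np.isin(chunk_ids, keep_chunks))
    best = np.argsort(token_scores[candidates])[-top_tokens:]
    selected = np.sort(candidates[best])

    # Stage 3: ordinary softmax attention, restricted to the selected tokens.
    logits = K[selected] @ q / np.sqrt(d)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ V[selected], selected

# Toy usage with random data: 16 context tokens, 4 of which are attended to.
rng = np.random.default_rng(0)
n, d, r = 16, 8, 2
K, V, q = rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=d)
idx_K, idx_q = rng.normal(size=(n, r)), rng.normal(size=r)
out, selected = sparse_attention_sketch(q, K, V, idx_q, idx_K)
print(out.shape, selected)
```

In this toy version the expensive step touches only `top_tokens` keys per query rather than all `n`, which is the kind of saving the article describes. Whether a real system's selection pass is cheap enough to pay for itself depends on details the sketch does not model.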

For long-context operations, the benefits of the system are significant. Preliminary testing by DeepSeek found that the price of a simple API call could be reduced by as much as half in long-context situations. Further testing will be required to build a more robust assessment, but because the model is open-weight and freely available on Hugging Face, it won’t be long before third-party tests can assess the claims made in the paper.

DeepSeek’s new model is one of a string of recent breakthroughs tackling the problem of inference costs — essentially, the server costs of operating a pre-trained AI model, as distinct from the cost of training it. In DeepSeek’s case, the researchers were looking for ways to make the fundamental transformer architecture operate more efficiently — and finding that there are significant improvements to be made.

Based in China, DeepSeek has been an unusual figure in the AI boom, particularly for those who view AI research as a nationalist struggle between the U.S. and China. The company made waves at the beginning of the year with its R1 model, trained primarily using reinforcement learning at a far lower cost than its American competitors. But the model has not sparked a wholesale revolution in AI training, as some predicted, and the company has receded from the spotlight in the months since.

The new “sparse attention” approach is unlikely to produce the same uproar as R1, but it could still teach U.S. providers some much-needed tricks to help keep inference costs low.
