AI is ‘an energy hog,’ but DeepSeek could change that
Source: The Verge
DeepSeek startled everyone last month with the claim that its AI model uses roughly one-tenth the computing power of Meta’s Llama 3.1 model, upending prevailing assumptions about how much energy and how many resources it will take to develop artificial intelligence.
Taken at face value, that claim could have tremendous implications for the environmental impact of AI. Tech giants are rushing to build out massive AI data centers, with plans for some to use as much electricity as small cities. Generating that much electricity creates pollution, raising fears about how the physical infrastructure undergirding new generative AI tools could exacerbate climate change and worsen air quality.
Reducing how much energy it takes to train and run generative AI models could alleviate much of that stress. But it’s still too early to gauge whether DeepSeek will be a game-changer when it comes to AI’s environmental footprint. Much will depend on how other major players respond to the Chinese startup’s breakthroughs, especially considering plans to build new data centers.
“It just shows that AI doesn’t have to be an energy hog,” says Madalsa Singh, a postdoctoral research fellow at the University of California, Santa Barbara who studies energy systems. “There’s a choice in the matter.”
The fuss around DeepSeek began with the release of its V3 model in December, whose final training run cost just $5.6 million and took 2.78 million GPU hours on Nvidia’s older H800 chips, according to a technical report from the company. For comparison, Meta’s Llama 3.1 405B model — despite using newer, more efficient H100 chips — took about 30.8 million GPU hours to train. (We don’t know exact costs, but estimates for Llama 3.1 405B have been around $60 million, and between $100 million and $1 billion for comparable models.)
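Those reported GPU-hour figures line up with the “roughly one-tenth” claim. A quick back-of-envelope check, using only the numbers cited above:

```python
# Back-of-envelope check of the training-compute gap described above.
# Both figures are public claims from the companies' reports, not
# independently verified measurements.
llama_31_405b_gpu_hours = 30.8e6  # Meta's Llama 3.1 405B, on H100 chips
deepseek_v3_gpu_hours = 2.78e6    # DeepSeek V3, on older H800 chips

ratio = llama_31_405b_gpu_hours / deepseek_v3_gpu_hours
print(f"Llama 3.1 405B used about {ratio:.1f}x the GPU hours of DeepSeek V3")
```

That works out to roughly an 11x gap in training hours — though GPU hours on different chip generations aren’t a perfect proxy for electricity consumed.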
Then DeepSeek released its R1 model last week, which venture capitalist Marc Andreessen called “a profound gift to the world.” The company’s AI assistant quickly shot to the top of Apple’s and Google’s app stores. And on Monday, it sent competitors’ stock prices into a nosedive on the assumption DeepSeek was able to create an alternative to Llama, Gemini, and ChatGPT for a fraction of the budget. Nvidia, whose chips enable all these technologies, saw its stock price plummet on news that DeepSeek’s V3 needed only about 2,000 chips to train, compared to the 16,000 or more used by its competitors.
DeepSeek says it was able to cut down on how much electricity it consumes by using more efficient training methods. In technical terms, it uses an auxiliary-loss-free load-balancing strategy. Singh says it boils down to being more selective with which parts of the model are trained; you don’t have to train the entire model at the same time. If you think of the AI model as a big customer service firm with many experts, Singh says, it’s more selective in choosing which experts to tap.
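The “customer service firm” analogy describes what’s known as a mixture-of-experts architecture. A toy sketch of the idea — a router picks a few experts per input, so most of the model’s parameters do no work (and burn no compute) on any given token. This is an illustrative simplification, not DeepSeek’s actual code; all names and sizes here are made up:

```python
import numpy as np

# Toy mixture-of-experts layer: route each token to only TOP_K of
# NUM_EXPERTS experts, leaving the rest idle for that token.
rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K, DIM = 8, 2, 16

# Each "expert" is just a small weight matrix in this sketch.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_forward(x):
    scores = x @ router                # router scores every expert for this token
    top = np.argsort(scores)[-TOP_K:]  # keep only the TOP_K best-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()           # normalize over the chosen experts only
    # Only TOP_K experts actually run; the other NUM_EXPERTS - TOP_K cost nothing.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(DIM)
out = moe_forward(token)
print(out.shape)
```

In this sketch, only 2 of 8 experts compute anything per token — the kind of selectivity Singh describes, though real systems add machinery (like DeepSeek’s auxiliary-loss-free balancing) to keep the workload spread evenly across experts.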
The model also saves energy when it comes to inference, which is when the model is actually tasked to do something, through what’s called key-value caching and compression. If you’re writing a story that requires research, you can think of this method as similar to being able to reference index cards with high-level summaries as you’re writing, rather than having to read the entire report that’s been summarized, Singh explains.
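The index-card analogy can be made concrete. In a minimal sketch of key-value caching — again a toy under simplified assumptions, not DeepSeek’s implementation — the model files one “index card” (a key/value pair) per token once, then consults the cards on every later step instead of recomputing them. DeepSeek’s reported method additionally compresses these cached entries:

```python
import numpy as np

# Toy key-value cache for autoregressive generation: each token's key and
# value are computed once and stored, then reused at every later step.
rng = np.random.default_rng(1)
DIM = 8
Wq = rng.standard_normal((DIM, DIM))  # query projection
Wk = rng.standard_normal((DIM, DIM))  # key projection
Wv = rng.standard_normal((DIM, DIM))  # value projection

k_cache, v_cache = [], []  # grows by one entry per generated token

def attend(token):
    # File this token's index card exactly once...
    k_cache.append(token @ Wk)
    v_cache.append(token @ Wv)
    K, V = np.stack(k_cache), np.stack(v_cache)
    # ...then attend over all cached cards; old entries are reused, not redone.
    scores = K @ (token @ Wq)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

for _ in range(5):  # five decoding steps
    out = attend(rng.standard_normal(DIM))

print(len(k_cache))  # one cached key per step, each computed only once
```

Without the cache, step *n* would redo the key/value work for all *n* previous tokens — the equivalent of rereading the whole report every time you write a sentence.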
What Singh is especially optimistic about is that DeepSeek’s models are mostly open source, minus the training data. With this approach, researchers can learn from each other faster, and it opens the door for smaller players to enter the industry. It also sets a precedent for more transparency and accountability so that investors and consumers can be more critical of what resources go into developing a model.
“If we’ve demonstrated that these advanced AI capabilities don’t require such massive resource consumption, it will open up a little bit more breathing room for more sustainable infrastructure planning,” Singh says. “This can also incentivize these established AI labs today, like OpenAI, Anthropic, Google Gemini, towards developing more efficient algorithms and techniques and move beyond sort of a brute force approach of simply adding more data and computing power onto these models.”
To be sure, there’s still skepticism around DeepSeek. “We’ve done some digging on DeepSeek, but it’s hard to find any concrete facts about the program’s energy consumption,” Carlos Torres Diaz, head of power research at Rystad Energy, said in an email.
If what the company claims about its energy use is true, that could slash a data center’s total energy consumption, Torres Diaz writes. And while big tech companies have signed a flurry of deals to procure renewable energy, soaring electricity demand from data centers still risks siphoning limited solar and wind resources from power grids. Reducing AI’s electricity consumption “would in turn make more renewable energy available for other sectors, helping displace faster the use of fossil fuels,” according to Torres Diaz. “Overall, less power demand from any sector is beneficial for the global energy transition as less fossil-fueled power generation would be needed in the long-term.”
There is a double-edged sword to consider with more energy-efficient AI models. Microsoft CEO Satya Nadella wrote on X about Jevons paradox, in which the more efficient a technology becomes, the more it tends to be used. Total resource consumption — and the environmental damage that comes with it — can therefore grow despite efficiency gains.
“The question is, gee, if we could drop the energy use of AI by a factor of 100 does that mean that there’d be 1,000 data providers coming in and saying, ‘Wow, this is great. We’re going to build, build, build 1,000 times as much even as we planned’?” says Philip Krein, research professor of electrical and computer engineering at the University of Illinois Urbana-Champaign. “It’ll be a really interesting thing over the next 10 years to watch.” Torres Diaz also said that this issue makes it too early to revise power consumption forecasts “significantly down.”
No matter how much electricity a data center uses, it’s important to look at where that electricity is coming from to understand how much pollution it creates. China still gets more than 60 percent of its electricity from coal, and another 3 percent comes from gas. The US also gets about 60 percent of its electricity from fossil fuels, but a majority of that comes from gas — which creates less carbon dioxide pollution when burned than coal.
To make things worse, energy companies are delaying the retirement of fossil fuel power plants in the US in part to meet skyrocketing demand from data centers. Some are even planning to build out new gas plants. Burning more fossil fuels inevitably leads to more of the pollution that causes climate change, as well as local air pollutants that raise health risks to nearby communities. Data centers also guzzle up a lot of water to keep hardware from overheating, which can lead to more stress in drought-prone regions.
Those are all problems that AI developers can minimize by limiting energy use overall. Traditional data centers have been able to do so in the past. Despite workloads almost tripling between 2015 and 2019, power demand managed to stay relatively flat during that time period, according to Goldman Sachs Research. Data centers then grew much more power-hungry around 2020 with advances in AI. They consumed more than 4 percent of electricity in the US in 2023, and that could nearly triple to around 12 percent by 2028, according to a December report from the Lawrence Berkeley National Laboratory. There’s more uncertainty about those kinds of projections now, but calling any shots based on DeepSeek at this point is still a shot in the dark.