Less is more: How ‘chain of draft’ could cut AI costs by 90% while improving performance

Source: Venture Beat



A team of researchers at Zoom Communications has developed a breakthrough technique that could dramatically reduce the cost and computational resources needed for AI systems to tackle complex reasoning problems, potentially transforming how enterprises deploy AI at scale.

The method, called chain of draft (CoD), enables large language models (LLMs) to solve problems with minimal words — using as little as 7.6% of the text required by current methods while maintaining or even improving accuracy. The findings were published in a paper last week on the research repository arXiv.

“By reducing verbosity and focusing on critical insights, CoD matches or surpasses CoT (chain-of-thought) in accuracy while using as little as only 7.6% of the tokens, significantly reducing cost and latency across various reasoning tasks,” write the authors, led by Silei Xu, a researcher at Zoom.

Chain of draft (red) maintains or exceeds the accuracy of chain-of-thought (yellow) while using dramatically fewer tokens across four reasoning tasks, demonstrating how concise AI reasoning can cut costs without sacrificing performance. (Credit: arxiv.org)

How ‘less is more’ transforms AI reasoning without sacrificing accuracy

CoD draws inspiration from how humans solve complex problems. Rather than articulating every detail when working through a math problem or logical puzzle, people typically jot down only essential information in abbreviated form.

“When solving complex tasks — whether mathematical problems, drafting essays or coding — we often jot down only the critical pieces of information that help us progress,” the researchers explain. “By emulating this behavior, LLMs can focus on advancing toward solutions without the overhead of verbose reasoning.”
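The contrast the researchers describe can be illustrated with a toy arithmetic problem. The two reasoning traces below are illustrative examples written for this article, not taken verbatim from the paper, and word count is used as a rough stand-in for token count:

```python
# Illustrative comparison of a verbose CoT-style trace vs. a terse
# CoD-style draft for a simple word problem. Strings are hypothetical.

cot_trace = (
    "Jason starts with 20 lollipops. He gives some to Denny, "
    "after which he has 12 lollipops left. To find how many he "
    "gave away, we subtract the remaining amount from the starting "
    "amount: 20 minus 12 equals 8. Therefore, Jason gave Denny 8 lollipops."
)

cod_trace = "20 - x = 12; x = 8. #### 8"

def word_count(text: str) -> int:
    """Rough proxy for token count: whitespace-separated words."""
    return len(text.split())

print(f"CoT words: {word_count(cot_trace)}")
print(f"CoD words: {word_count(cod_trace)}")
print(f"Reduction: {1 - word_count(cod_trace) / word_count(cot_trace):.0%}")
```

Both traces reach the same answer; the draft simply skips the connective prose that a model would otherwise generate and bill for.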

The team tested their approach on numerous benchmarks, including arithmetic reasoning (GSM8k), commonsense reasoning (date understanding and sports understanding) and symbolic reasoning (coin flip tasks).

In one striking example in which Claude 3.5 Sonnet processed sports-related questions, the CoD approach reduced the average output from 189.4 tokens to just 14.3 tokens — a 92.4% reduction — while simultaneously improving accuracy from 93.2% to 97.3%.

Slashing enterprise AI costs: The business case for concise machine reasoning

“For an enterprise processing 1 million reasoning queries monthly, CoD could cut costs from $3,800 (CoT) to $760, saving over $3,000 per month,” AI researcher Ajith Vallath Prabhakar writes in an analysis of the paper.
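Prabhakar's back-of-the-envelope figures can be checked with simple arithmetic; the per-query prices below are implied by his quoted monthly totals rather than stated independently:

```python
# Sanity-check the quoted enterprise cost figures from Prabhakar's analysis.
monthly_queries = 1_000_000
cot_monthly_cost = 3800.0   # USD per month with chain-of-thought
cod_monthly_cost = 760.0    # USD per month with chain of draft

savings = cot_monthly_cost - cod_monthly_cost            # 3040.0 -> "over $3,000"
cost_per_query_cot = cot_monthly_cost / monthly_queries  # $0.0038 per query
cost_per_query_cod = cod_monthly_cost / monthly_queries  # $0.00076 per query

# The sports-benchmark token reduction quoted in the article:
reduction = 1 - 14.3 / 189.4  # ~92.4%

print(f"Monthly savings: ${savings:,.0f}")
print(f"Token reduction: {reduction:.1%}")
```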

The research comes at a critical time for enterprise AI deployment. As companies increasingly integrate sophisticated AI systems into their operations, computational costs and response times have emerged as significant barriers to widespread adoption.

Current state-of-the-art reasoning techniques like chain-of-thought (CoT) prompting, introduced in 2022, have dramatically improved AI’s ability to solve complex problems by breaking them down into step-by-step reasoning. But this approach generates lengthy explanations that consume substantial computational resources and increase response latency.

“The verbose nature of CoT prompting results in substantial computational overhead, increased latency and higher operational expenses,” writes Prabhakar.

What makes CoD particularly noteworthy for enterprises is its simplicity of implementation. Unlike many AI advancements that require expensive model retraining or architectural changes, CoD can be deployed immediately with existing models through a simple prompt modification.

“Organizations already using CoT can switch to CoD with a simple prompt modification,” Prabhakar explains.
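The paper's exact prompts are not reproduced here, but the switch can be sketched as swapping one system instruction for another. The instruction wording below paraphrases the paper's description of CoD, and the message-building helper is a hypothetical stand-in for whatever chat-style LLM client an organization already uses:

```python
# Sketch of switching from CoT to CoD via the system prompt alone.
# Instruction text is a paraphrase, not the paper's verbatim prompt.

COT_INSTRUCTION = (
    "Think step by step to answer the question. "
    "Return the final answer after the separator ####."
)

COD_INSTRUCTION = (
    "Think step by step, but keep only a minimum draft for each "
    "thinking step, with at most five words per step. "
    "Return the final answer after the separator ####."
)

def build_messages(question: str, concise: bool = True) -> list:
    """Assemble a chat-style request; only the system message changes."""
    system = COD_INSTRUCTION if concise else COT_INSTRUCTION
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

# An organization already using CoT only flips `concise=True`; the model,
# the user prompt and the "####" answer parsing all stay the same.
messages = build_messages("Q: Jason had 20 lollipops and now has 12. "
                          "How many did he give away?", concise=True)
```

Because nothing changes except the system instruction, the swap can be A/B tested against an existing CoT pipeline before being rolled out.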

The technique could prove especially valuable for latency-sensitive applications like real-time customer support, mobile AI, educational tools and financial services, where even small delays can significantly impact user experience.

Industry experts suggest that the implications extend beyond cost savings, however. By making advanced AI reasoning more accessible and affordable, CoD could democratize access to sophisticated AI capabilities for smaller organizations and resource-constrained environments.

As AI systems continue to evolve, techniques like CoD highlight a growing emphasis on efficiency alongside raw capability. For enterprises navigating the rapidly changing AI landscape, such optimizations could prove as valuable as improvements in the underlying models themselves.

“As AI models continue to evolve, optimizing reasoning efficiency will be as critical as improving their raw capabilities,” Prabhakar concludes.

The research code and data have been made publicly available on GitHub, allowing organizations to implement and test the approach with their own AI systems.
