Anthropic releases new “hybrid reasoning” AI model

Source: The Verge

Anthropic is releasing Claude 3.7 Sonnet, its first “hybrid reasoning model” that can solve more complex problems and outperforms previous models in areas like math and coding.

In addition to a new model, Anthropic is also releasing a “limited research preview” of its “agentic” coding tool called Claude Code. While Anthropic already powers AI coding tools like Cursor, it’s pitching Claude Code as “an active collaborator that can search and read code, edit files, write and run tests, commit and push code to GitHub, and use command line tools.”

Claude 3.7 Sonnet is available starting Monday in the Claude app and for developers through Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI. The model costs the same to run as its predecessor, Claude 3.5 Sonnet, at $3 per million input tokens and $15 per million output tokens.
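To put those prices in concrete terms, here is a rough sketch of how a per-request cost could be estimated. Only the two per-million-token prices come from the article. The function name `estimate_cost` and the example token counts are hypothetical, chosen purely for illustration, and actual billing may differ.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_MILLION = 3.00
OUTPUT_PRICE_PER_MILLION = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Token counts below are made up for illustration only.
    print(f"${estimate_cost(20_000, 2_000):.2f}")
```

Under these assumptions, a request with 20,000 input tokens and 2,000 output tokens would come to about nine cents, with output accounting for a third of the total despite being a tenth of the token count.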

While OpenAI and others offer separate so-called reasoning models, Anthropic product research lead Dianne Penn tells The Verge that the company wanted to simplify the experience of using a model. “We fundamentally believe that reasoning is a feature of the AI rather than a completely separate thing,” she says, noting that Claude shouldn’t take long to answer the question “What time is it?” versus responding to a more complex prompt like, “plan a two-week trip to Italy while considering the weather in late March.”

Penn says that Claude 3.7 Sonnet performs noticeably better on “agentic coding,” finance, and legal tasks. Unlike some other models, Claude still lacks real-time web search, but version 3.7’s knowledge cut-off date of October 2024 is more up to date. Anthropic is also allowing developers to help steer how the model “thinks” via its scratchpad and even dictate exactly how long it takes to respond. “Sometimes the developer just needs to say it shouldn’t take more than 200 milliseconds to answer this question,” says Anthropic’s VP of product, Michael Gerstenhaber. “And that’s a product decision.”

Inside Anthropic, employees have used the new model to build front-end website designs and interactive games, and even to spend up to 45 minutes on coding work by “building test sets and editing test cases back and forth iteratively,” according to Penn.

She says that the company also tests its models on their ability to advance through an old-school Pokémon video game by mapping the model’s API to a controller scheme. Claude 3.5 Sonnet couldn’t get out of Pallet Town at the beginning of the game while version 3.7 was able to defeat multiple gym leaders.

As Elon Musk showed with Grok-3 last week, the AI model race is moving incredibly fast. For now, Anthropic appears to be in the lead again thanks to Claude 3.7 Sonnet’s performance gains. Its release also suggests that, rather than offer standalone reasoning models, the industry is moving toward a future where one model can do everything.


