Microsoft’s Differential Transformer cancels attention noise in LLMs
Source: VentureBeat
A simple change to the attention mechanism can make LLMs much more effective at finding relevant information in their context window.
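The "simple change," per the underlying Microsoft Research paper, is to compute attention as the difference between two separate softmax attention maps, so that attention weight both maps place on irrelevant context tokens cancels out. Below is a minimal single-head sketch of that idea in NumPy; the names are illustrative, and λ is fixed here for simplicity (the paper learns it per layer and also uses headwise normalization and causal masking), so treat this as a sketch of the technique rather than the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def differential_attention(X, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5):
    # Standard attention:     softmax(Q K^T / sqrt(d)) V
    # Differential attention: (softmax(Q1 K1^T / sqrt(d)) - lam * softmax(Q2 K2^T / sqrt(d))) V
    # Subtracting the second map cancels attention mass that both maps
    # assign to irrelevant tokens (common-mode noise).
    d = Wq1.shape[1]
    Q1, K1 = X @ Wq1, X @ Wk1  # first query/key projection pair
    Q2, K2 = X @ Wq2, X @ Wk2  # second query/key projection pair
    V = X @ Wv
    A1 = softmax(Q1 @ K1.T / np.sqrt(d))
    A2 = softmax(Q2 @ K2.T / np.sqrt(d))
    return (A1 - lam * A2) @ V  # noise-canceled attention output

# Toy usage: 4 tokens, model dim 8, head dim 4 (all sizes arbitrary).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq1, Wk1, Wq2, Wk2, Wv = [rng.normal(size=(8, 4)) for _ in range(5)]
out = differential_attention(X, Wq1, Wk1, Wq2, Wk2, Wv)
print(out.shape)  # (4, 4)
```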