Residual Streams
Information pathways in transformer models
Overview
In a [Transformer] model, residual streams are the pathways that carry information through the model's layers. Each layer reads from the residual stream and adds its output back to it, so information accumulates and is passed along to the next layer. The stream can be accessed at different points within a transformer block, such as before or after the attention and multilayer perceptron (MLP) layers. In this sense it acts as a kind of memory for the model, capturing its current representation of the input sequence as it is processed.
What are Residual Streams?
The "Information Highway" for a model's inner state.
- Pathways that carry information between different layers of a [Transformer] model
- Store and accumulate information as it flows through the model
- Can be accessed at different points within transformer blocks
How do they Work?
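- Each layer reads the current stream and adds its output to it, rather than overwriting it
- The stream therefore accumulates the contributions of all earlier layers as it flows through the model
- It can be accessed and modified at various points, such as before or after a layer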
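
The additive update described above can be sketched in a few lines of plain Python. This is a toy illustration for a single token position, not a real transformer: `toy_attention`, `toy_mlp`, `run_block`, and the three-dimensional `embedding` are hypothetical stand-ins, and normalization is left out. It shows only that each sublayer adds to the stream and that the stream can be read at several points in a block.

```python
# Toy sketch of a residual stream for one token position.
# The "layers" below are hypothetical stand-ins, not real attention or MLP layers.

def toy_attention(stream):
    # Stand-in sublayer: writes half of the stream's mean into every dimension.
    mean = sum(stream) / len(stream)
    return [0.5 * mean for _ in stream]


def toy_mlp(stream):
    # Stand-in sublayer: a scaled ReLU applied to each dimension.
    return [0.1 * max(0.0, v) for v in stream]


def run_block(stream, snapshots):
    # Each sublayer reads the stream and ADDS its output; nothing is overwritten.
    snapshots.append(("pre_attn", list(stream)))
    stream = [s + d for s, d in zip(stream, toy_attention(stream))]
    snapshots.append(("post_attn", list(stream)))
    stream = [s + d for s, d in zip(stream, toy_mlp(stream))]
    snapshots.append(("post_mlp", list(stream)))
    return stream


embedding = [1.0, -2.0, 4.0]   # assumed input embedding
snapshots = []                 # copies of the stream at each access point
stream = list(embedding)
for _ in range(2):             # two toy blocks
    stream = run_block(stream, snapshots)

for name, values in snapshots:
    print(name, values)
```

Each entry in `snapshots` corresponds to one access point: the stream before attention, after attention, and after the MLP of a block.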
Practical Uses
- Help understand how information flows through the model
- Can be modified to alter the model's behavior
- Provide a way to intervene in the model's decision making
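
As a rough illustration of such an intervention, the sketch below adds a vector to the stream partway through a toy model and compares the result with an unmodified run. Everything here is assumed for illustration: `toy_layer`, `forward`, and the `edit_at` and `edit_vector` parameters are hypothetical names, and a real model would use tensors and framework-specific mechanisms rather than Python lists.

```python
# Toy sketch of intervening on a residual stream.
# All names are hypothetical; this is not a real model or library API.

def toy_layer(stream):
    # Stand-in sublayer: a scaled ReLU applied to each dimension.
    return [0.1 * max(0.0, v) for v in stream]


def forward(embedding, n_layers, edit_at=None, edit_vector=None):
    """Run the toy model, optionally adding edit_vector before layer edit_at."""
    stream = list(embedding)
    for layer_idx in range(n_layers):
        if layer_idx == edit_at:
            # The intervention: modify the stream before this layer reads it.
            stream = [s + e for s, e in zip(stream, edit_vector)]
        # The layer adds its output to whatever is currently in the stream.
        stream = [s + d for s, d in zip(stream, toy_layer(stream))]
    return stream


embedding = [1.0, -2.0, 4.0]   # assumed input embedding
baseline = forward(embedding, n_layers=3)
steered = forward(embedding, n_layers=3, edit_at=1, edit_vector=[0.0, 5.0, 0.0])

print("baseline:", baseline)
print("steered: ", steered)
```

Because later layers read the edited stream, the change made before layer 1 carries through to the final output. In this toy setup only the second dimension differs between the two runs, since `toy_layer` treats each dimension independently.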