Residual Streams

Information pathways in transformer models

Overview

In a [Transformer] model, the residual stream is the pathway that carries information through the model's layers. Each layer reads from the stream, and its output is added back to the stream rather than replacing it, so information accumulates and is passed along to the next layer. The stream can be read at several points within a transformer block, for example before or after the attention and multilayer perceptron (MLP) sublayers. In this sense the residual stream acts as a kind of working memory for the model, holding its current representation of the input sequence as it is being processed.

What are Residual Streams?

The "Information Highway" for a model's inner state.

Pathways that carry information between different layers of a [Transformer] model
Store and accumulate information as it flows through the model
Can be accessed at different points within transformer blocks

How do they Work?

Information from each layer is added to the residual stream
The stream accumulates information as it flows through the model
Can be accessed and modified at various points
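
The additive update described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a real transformer: `toy_attention` and `toy_mlp` are hypothetical stand-ins for the attention and MLP sublayers, layer normalization is left out, and the snapshot names (`resid_mid`, `resid_post`) are labels chosen here for the points where the stream is read.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, seq_len, n_layers = 8, 4, 3  # toy sizes, chosen arbitrarily

def toy_attention(x, w):
    # Hypothetical stand-in for attention: a softmax over position
    # similarities mixes information across positions, then projects with w.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x @ w

def toy_mlp(x, w_in, w_out):
    # Hypothetical stand-in for the MLP: expand, apply ReLU, project back.
    return np.maximum(x @ w_in, 0.0) @ w_out

def run_model(x, params):
    stream = x.copy()                      # the residual stream starts as the embeddings
    snapshots = {"embed": stream.copy()}
    for i, (w_attn, w_in, w_out) in enumerate(params):
        # Each sublayer reads the stream and ADDS its output back to it.
        stream = stream + toy_attention(stream, w_attn)
        snapshots[f"layer{i}.resid_mid"] = stream.copy()   # after attention
        stream = stream + toy_mlp(stream, w_in, w_out)
        snapshots[f"layer{i}.resid_post"] = stream.copy()  # after the MLP
    return stream, snapshots

params = [
    (
        rng.normal(scale=0.1, size=(d_model, d_model)),
        rng.normal(scale=0.1, size=(d_model, 4 * d_model)),
        rng.normal(scale=0.1, size=(4 * d_model, d_model)),
    )
    for _ in range(n_layers)
]
x = rng.normal(size=(seq_len, d_model))
final, snapshots = run_model(x, params)
```

Because every update is a sum, the difference between two consecutive snapshots is exactly what the sublayer in between wrote to the stream.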

Practical Uses

Help understand how information flows through the model
Can be modified to alter the model's behavior
Provide a way to intervene in the model's decision-making
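
Modifying the stream to alter behavior can be illustrated with a toy intervention: a vector is added to the residual stream at one layer, and the output is compared with an unmodified run. Everything here is an assumption made for illustration, including the random layer weights, the `readout` matrix, the `forward` helper with its `edit_layer` and `edit_vector` parameters, and the choice of edit direction.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, n_layers, n_classes = 6, 3, 2  # toy sizes, chosen arbitrarily

# Hypothetical toy model: each layer adds tanh(stream @ w) to the stream,
# and a final readout matrix maps the stream to class scores.
layer_weights = [rng.normal(scale=0.3, size=(d_model, d_model)) for _ in range(n_layers)]
readout = rng.normal(size=(d_model, n_classes))

def forward(x, edit_layer=None, edit_vector=None):
    stream = x.copy()
    for i, w in enumerate(layer_weights):
        if i == edit_layer:
            # Intervention: write an extra vector into the stream
            # just before layer i reads from it.
            stream = stream + edit_vector
        stream = stream + np.tanh(stream @ w)
    return stream @ readout

x = rng.normal(size=d_model)
# Illustrative edit direction: the difference between the two readout columns.
steer = readout[:, 1] - readout[:, 0]

baseline = forward(x)
steered = forward(x, edit_layer=1, edit_vector=4.0 * steer)
print("baseline scores:", baseline)
print("steered scores: ", steered)
```

Since later layers read the edited stream, the change propagates through the rest of the model, which is why the two sets of scores differ. In a real model the edit vector would usually be derived from the model's own activations rather than picked by hand.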
