Token
The fundamental unit of text processed by language models.
Overview
A token is the smallest unit of text that an AI model processes. Think of tokens as the building blocks of language that AI models understand: they can be whole words, parts of words, or even single characters. For example, the word "understanding" might be broken into the tokens "under", "stand", and "ing", while "cat" might be a single token.
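The subword splitting described above can be sketched as a greedy longest-match over a tiny, made-up vocabulary. This is only an illustration: real tokenizers (such as byte-pair encoding) learn their vocabularies from large corpora, and the words in VOCAB here are chosen just to reproduce the example.

```python
# Toy greedy longest-match tokenizer; the vocabulary below is a made-up
# illustration, not any real model's vocabulary.
VOCAB = {"under", "stand", "ing", "cat", "dog", "the"}

def tokenize(text: str) -> list[str]:
    """Split text into the longest vocabulary matches, falling back to
    single characters for anything not in the vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible substring starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character becomes its own token
            i += 1
    return tokens

print(tokenize("understanding"))  # ['under', 'stand', 'ing']
print(tokenize("cat"))            # ['cat']
```

Note that greedy matching is a simplification; production tokenizers use merge rules or probabilistic models to pick between overlapping candidates.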
How Tokens Work
Tokens help AI models process text efficiently by:
- Breaking text into manageable pieces
- Maintaining consistent input sizes
- Capturing common patterns in language
- Handling multiple languages effectively
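One way models keep input sizes consistent, as the list above mentions, is by padding short token sequences and truncating long ones to a fixed length. The sketch below assumes a padding id of 0 and a maximum length of 8; both values are arbitrary choices for illustration.

```python
# Hypothetical sketch of fixed-length inputs: PAD_ID and MAX_LEN are
# assumed values, not constants from any particular model.
PAD_ID = 0
MAX_LEN = 8

def pad_or_truncate(token_ids: list[int]) -> list[int]:
    """Force a sequence of token ids to exactly MAX_LEN entries."""
    ids = token_ids[:MAX_LEN]                     # truncate if too long
    return ids + [PAD_ID] * (MAX_LEN - len(ids))  # pad if too short

print(pad_or_truncate([5, 9, 2]))        # [5, 9, 2, 0, 0, 0, 0, 0]
print(pad_or_truncate(list(range(12))))  # [0, 1, 2, 3, 4, 5, 6, 7]
```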
Common Examples
- Full words: "cat", "dog", "the"
- Word pieces: "under" + "stand" + "ing"
- Special tokens: [START], [END], [MASK]
- Numbers and punctuation: "123", "!", "?"
- Common sequences: "ing", "'s", "pre"
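Tokens like those listed above are ultimately mapped to integer ids before a model sees them, with special tokens such as [START] and [END] wrapping the sequence. The vocabulary and id numbers below are invented for illustration; real vocabularies contain tens of thousands of entries.

```python
# Made-up token-to-id table, including the special tokens from the list
# above; the specific ids are arbitrary.
TOKEN_TO_ID = {
    "[START]": 1, "[END]": 2, "[MASK]": 3,
    "under": 10, "stand": 11, "ing": 12, "cat": 13,
}

def encode(tokens: list[str]) -> list[int]:
    """Wrap a token sequence in [START]/[END] and map each token to its id."""
    ids = [TOKEN_TO_ID[t] for t in tokens]
    return [TOKEN_TO_ID["[START]"]] + ids + [TOKEN_TO_ID["[END]"]]

print(encode(["under", "stand", "ing"]))  # [1, 10, 11, 12, 2]
```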
Practical Impact
Understanding tokens is important because they:
- Affect model performance
- Influence processing costs
- Impact text length limits
- Determine model behavior
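The cost and length-limit points above can be made concrete with a back-of-the-envelope estimate. The price and the 4-characters-per-token rule of thumb below are assumptions for illustration, not any provider's actual figures.

```python
# Illustrative cost estimate; both constants are assumed, not real prices.
PRICE_PER_1K_TOKENS = 0.002  # hypothetical price in dollars
CHARS_PER_TOKEN = 4          # rough average for English text

def estimate_cost(text: str) -> float:
    """Approximate the processing cost of a piece of text in dollars."""
    approx_tokens = max(1, len(text) // CHARS_PER_TOKEN)
    return approx_tokens * PRICE_PER_1K_TOKENS / 1000

print(estimate_cost("a" * 4000))  # ~1000 tokens -> 0.002 dollars
```

The same token estimate is what you would check against a model's context-window limit before sending a long document.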