Token

The fundamental unit of text processed by language models.

Overview

A token is the smallest unit of text that an AI model processes. Tokens are the building blocks of language that models operate on: they can be whole words, parts of words, or even single characters, depending on the tokenizer. For example, the word "understanding" might be split into the tokens "under" and "standing", while a common word like "cat" is typically a single token.
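The splitting described above can be sketched as a greedy longest-match tokenizer. This is a toy illustration, not how production tokenizers work (real tokenizers such as BPE learn their vocabularies from data), and the vocabulary here is invented for the example:

```python
# Toy greedy longest-match tokenizer over a tiny, hand-picked vocabulary.
# Purely illustrative; real vocabularies are learned, not hand-written.
VOCAB = {"under", "standing", "stand", "ing", "cat",
         "c", "a", "t", "u", "n", "d", "e", "r", "s", "i", "g"}

def tokenize(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Take the longest substring starting at i that is in the vocabulary.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens

print(tokenize("understanding"))  # ['under', 'standing']
print(tokenize("cat"))            # ['cat']
```

Because single characters are in the vocabulary as a fallback, any string built from those characters can still be tokenized, just less compactly.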

How Tokens Work

Tokens help AI models process text efficiently by:

  • Breaking text into manageable pieces
  • Mapping arbitrary text onto a fixed vocabulary
  • Capturing common patterns in language, such as frequent prefixes and suffixes
  • Handling multiple languages and rare words without unknown-word failures

Common Examples

  • Full words: "cat", "dog", "the"
  • Word pieces: "under" + "stand" + "ing"
  • Special tokens: [START], [END], [MASK]
  • Numbers and punctuation: "123", "!", "?"
  • Common sequences: "ing", "'s", "pre"
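Inside a model, every token (including special tokens like [START] and [END]) is mapped to an integer ID from a fixed vocabulary. The mapping below is invented for illustration; real vocabularies contain tens of thousands of entries:

```python
# Hypothetical token-to-ID vocabulary, including special tokens.
VOCAB_IDS = {"[START]": 0, "[END]": 1, "the": 2, "cat": 3,
             "'s": 4, "pre": 5, "ing": 6, "!": 7, "123": 8}

def encode(tokens: list[str]) -> list[int]:
    """Wrap a token sequence with special tokens and map each to its ID."""
    return ([VOCAB_IDS["[START]"]]
            + [VOCAB_IDS[t] for t in tokens]
            + [VOCAB_IDS["[END]"]])

print(encode(["the", "cat", "!"]))  # [0, 2, 3, 7, 1]
```

The model never sees the raw strings, only these integer IDs, which is why special tokens can mark structure (start, end, masked positions) without appearing as literal text.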

Practical Impact

Understanding tokens is important because they:

  • Affect model performance, especially on character-level tasks like spelling and arithmetic
  • Influence processing costs, since API usage is typically billed per token
  • Set text length limits, because context windows are measured in tokens rather than characters
  • Shape model behavior, since the model reasons over token boundaries rather than raw text
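The cost and length points above can be sketched with a back-of-envelope estimate. The 4-characters-per-token ratio is a common rule of thumb for English text, not an exact value, and the price is hypothetical:

```python
# Rough token and cost estimate. Both constants are assumptions:
# ~4 characters per token is a rule of thumb for English, and the
# price per 1,000 tokens is invented for illustration.
AVG_CHARS_PER_TOKEN = 4
PRICE_PER_1K_TOKENS = 0.002  # hypothetical, in dollars

def estimate(text: str) -> tuple[int, float]:
    n_tokens = max(1, len(text) // AVG_CHARS_PER_TOKEN)
    cost = n_tokens / 1000 * PRICE_PER_1K_TOKENS
    return n_tokens, cost

tokens, cost = estimate("The quick brown fox jumps over the lazy dog.")
print(tokens)  # 11 (44 characters // 4)
```

For an accurate count, use the actual tokenizer of the model in question; estimates like this only help with rough capacity and budget planning.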