Token
The smallest unit of text processed by a language model. Roughly 1 token equals 4 characters in English or about 3/4 of a word.
What Is a Token
Tokens are the fragments that a tokenizer splits input text into before the model processes it. The model doesn’t “read” text character by character; it works with tokens.
Examples
| Text | Token Count |
|---|---|
| "Hello" | 1 |
| "Hello, world!" | 4 |
| "artificial intelligence" | 2 |
| "The quick brown fox" | 4 |
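To make the idea of splitting text into fragments concrete, here is a toy sketch in Python. It is not a real tokenizer: production models use learned subword vocabularies (for example byte-pair encoding), and the `toy_split` function and its regular expression are assumptions made purely for illustration. It happens to reproduce the counts in the table for these short inputs, but a real tokenizer may split longer or rarer words into several tokens.

```python
import re

# Toy rule (an assumption, not any model's real algorithm):
# a piece is either a word with an optional leading space,
# or a single punctuation character.
_PIECE = re.compile(r" ?\w+|[^\w\s]")


def toy_split(text: str) -> list[str]:
    """Split text into word-like and punctuation pieces."""
    return _PIECE.findall(text)


for sample in ["Hello", "Hello, world!", "artificial intelligence", "The quick brown fox"]:
    pieces = toy_split(sample)
    print(f"{sample!r}: {len(pieces)} pieces {pieces}")
# 'Hello, world!' splits into ['Hello', ',', ' world', '!'], i.e. 4 pieces
```

Note that the space is attached to the following word rather than being a piece of its own, which is one reason token counts are lower than a naive count of words plus spaces would suggest.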
Why It Matters
- Cost — APIs charge per token (input + output)
- Context window — maximum number of tokens per request
- Speed — more tokens = longer generation time
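The cost and context-window points above can be sketched as simple arithmetic. The prices and window size below are hypothetical placeholders, not any provider's real figures, and the helper names `estimate_cost` and `fits_context` are invented for this example. The sketch also assumes the context window covers input and output tokens combined, which is common but provider-specific.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical prices in USD per one million tokens."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Return the estimated cost of one request in USD."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


def fits_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Check that the prompt plus the reserved output budget fits in the window."""
    return input_tokens + max_output_tokens <= context_window


# Hypothetical numbers, for illustration only.
pricing = Pricing(input_per_million=3.0, output_per_million=15.0)
print(estimate_cost(1_000, 500, pricing))   # about 0.0105 USD
print(fits_context(7_000, 1_000, 8_192))    # True
print(fits_context(8_000, 1_000, 8_192))    # False
```

Input and output are priced separately here because many APIs charge different rates for each; check the provider's pricing page for actual values.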
How to Count Tokens
- OpenAI: `tiktoken` library
- Claude: roughly 1 token = 3.5 characters
- Most APIs return token usage in the response
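The three approaches above can be sketched in Python as follows. The character ratios come from the rules of thumb in this entry and give estimates only. The `tiktoken` calls (`get_encoding`, `encode`) are part of that library, but the choice of the `cl100k_base` encoding is an assumption; the right encoding depends on the model. The usage field names follow the shapes commonly returned by OpenAI-style (`prompt_tokens`, `completion_tokens`) and Anthropic-style (`input_tokens`, `output_tokens`) APIs, and the helper names are invented for this example.

```python
import math

# Rule-of-thumb ratios from this entry; estimates, not exact counts.
CHARS_PER_TOKEN = {"openai": 4.0, "claude": 3.5}


def estimate_tokens(text: str, provider: str = "openai") -> int:
    """Estimate the token count from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN[provider])


def count_with_tiktoken(text: str, encoding_name: str = "cl100k_base") -> int:
    """Exact count for OpenAI-style encodings (requires `pip install tiktoken`)."""
    import tiktoken  # third-party; imported lazily so the rest runs without it
    return len(tiktoken.get_encoding(encoding_name).encode(text))


def total_usage(usage: dict) -> int:
    """Sum input and output tokens from a response's usage object (as a dict)."""
    input_tokens = usage.get("prompt_tokens", usage.get("input_tokens", 0))
    output_tokens = usage.get("completion_tokens", usage.get("output_tokens", 0))
    return input_tokens + output_tokens


print(estimate_tokens("Hello, world!"))                            # 4
print(total_usage({"prompt_tokens": 12, "completion_tokens": 30}))  # 42
```

When the API reports usage, prefer that number for billing and logging; the heuristic is mainly useful for budgeting a prompt before sending it.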