Memory Limits: How AI Forgets (and Remembers)

Context Window: Understanding AI's Short-Term Memory

Introduction

Hook: Why Does AI "Forget"?
You ask an AI to summarize a 100-page legal contract. It nails the first 10 pages, then starts hallucinating clauses about "unicorn arbitration." Why? Because even the smartest AI has a memory limit — and understanding the context window is key to keeping it on-task.

Why This Matters:
A model's context window dictates how much information it can process at once. If the model runs out of "mental space," it forgets earlier points, potentially producing garbled outputs, lost details, and costly mistakes, whether you're drafting code, analyzing data, or chatting with customers. By managing context windows well, you keep dialogues intact and correctly interpreted.

What Is a Context Window?

Simple Definition:
A context window is the maximum number of tokens (text units) an AI model can consider in a single interaction. Exceed it, and the model "forgets" earlier content.

Analogy:
Imagine the context window as a briefcase filled with important documents. You can only fit so many files inside at once. If you have more files than it can hold, you'll swap out older documents to make room — meaning the AI may discard crucial details from the first sections of your credit risk report or a customer's financial history. Prioritize the essentials, or the AI will toss out old items to make room for new ones.

Key Components

Focus on three pillars:

1. Token Limits:
Fixed capacity (e.g., GPT-4 Turbo supports 128k tokens, roughly 300 pages of text). Think of analyzing a 500-page merger agreement: without chunking, you might lose track mid-analysis.
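That "roughly 300 pages" figure is easy to sanity-check with a back-of-envelope conversion (the ratios below are common rules of thumb for English text, not exact figures):

```python
# Rough conversion: tokens -> words -> pages.
# ~0.75 English words per token and ~300 words per page
# are approximations, not exact values.
TOKENS = 128_000
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 300

words = TOKENS * WORDS_PER_TOKEN   # 96,000 words
pages = words / WORDS_PER_PAGE     # 320 pages
print(f"~{words:,.0f} words, ~{pages:.0f} pages")
```

So a 128k-token window holds on the order of 300 pages, and a 500-page agreement clearly will not fit in one pass.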

2. Memory Management:
How models retain or discard text as new data flows in. If a conversation about interest rate hedging runs too long, the AI starts forgetting earlier points about collateral requirements.

3. Information Retention:
Striking a balance between previous context (like key regulatory guidelines) and new inputs (like client data).

How It Works

Step 1: Token Counting

Use a tokenizer to track usage — for example, Hugging Face's GPT-2 tokenizer (OpenAI's tiktoken works similarly for GPT-4-class models):

from transformers import GPT2Tokenizer  
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  
text = "Your input text here..."  
tokens = tokenizer.encode(text)  
print(f"Token count: {len(tokens)}")  # Keep an eye on that total!

Step 2: Chunk Long Inputs

Split large documents (like annual reports or compliance manuals) into smaller sections. This way, you can feed each chunk into the AI sequentially:

def chunk_text(text, max_tokens=4000):
    words = text.split()
    chunks = []
    current_chunk = []
    current_count = 0
    for word in words:
        word_tokens = len(tokenizer.encode(word))
        if current_count + word_tokens > max_tokens and current_chunk:
            chunks.append(" ".join(current_chunk))
            current_chunk = []
            current_count = 0
        current_chunk.append(word)
        current_count += word_tokens
    if current_chunk:  # don't drop the final partial chunk
        chunks.append(" ".join(current_chunk))
    return chunks
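If loading a tokenizer is impractical, a rough stdlib-only variant estimates tokens at about four characters each (a common approximation for English text, so treat the budget as approximate):

```python
def chunk_text_approx(text, max_tokens=4000, chars_per_token=4):
    """Split text into chunks of roughly max_tokens each,
    estimating token cost from character length (a heuristic,
    not a real tokenizer)."""
    max_chars = max_tokens * chars_per_token
    words = text.split()
    chunks, current, count = [], [], 0
    for word in words:
        if count + len(word) + 1 > max_chars and current:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(word)
        count += len(word) + 1  # +1 for the joining space
    if current:  # keep the final partial chunk
        chunks.append(" ".join(current))
    return chunks

report = "word " * 10_000  # stand-in for a long annual report
parts = chunk_text_approx(report, max_tokens=500)
print(len(parts), "chunks")
```

No text is lost: joining the chunks back together reproduces the original word sequence.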

Step 3: Prioritize Critical Context

Put key instructions at the beginning of prompts (content at the start of the window is less likely to be truncated, and models tend to follow instructions near the start more reliably than those buried in the middle). For example, if you're analyzing Basel III requirements, lead with "Focus on risk-weighted assets and Tier 1 ratios."

Use summaries to remind the AI of previous discussion. E.g., "Previously, we covered sections 1–3 of this credit risk policy."
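The two habits above — instructions first, a running summary second, fresh input last — can be combined in a simple prompt builder (the section labels and example text here are illustrative, not a fixed API):

```python
def build_prompt(instructions, summary, new_input):
    """Assemble a prompt with key instructions first, a running
    summary of prior discussion second, and the new input last."""
    parts = [f"INSTRUCTIONS: {instructions}"]
    if summary:
        parts.append(f"CONTEXT SO FAR: {summary}")
    parts.append(f"NEW INPUT: {new_input}")
    return "\n\n".join(parts)

prompt = build_prompt(
    "Focus on risk-weighted assets and Tier 1 ratios.",
    "Previously, we covered sections 1-3 of this credit risk policy.",
    "Section 4: capital buffers...",
)
print(prompt)
```

Keeping the instructions at a fixed position also makes it easy to regenerate the prompt as the summary grows.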

Real-World Applications

Regulatory Summaries
Summarize long compliance documents (e.g., Dodd-Frank, MiFID II) by breaking them into chunks: the AI handles 30 pages or so at a time, and you compile the partial summaries into a complete overview.
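This chunk-then-compile workflow is a map-reduce pattern. In outline it looks like the sketch below, where `summarize_chunk` is a stand-in for whatever model call you actually use (here it just takes each chunk's first sentence):

```python
def summarize_chunk(chunk):
    # Placeholder for a real LLM call via your API client of choice.
    # For illustration, take the first sentence as the "summary".
    return chunk.split(".")[0] + "."

def summarize_document(chunks):
    """Map: summarize each chunk independently.
    Reduce: join the partial summaries into one overview
    (for very long documents, summarize the result again)."""
    partials = [summarize_chunk(c) for c in chunks]
    return " ".join(partials)

sections = [
    "Title VII covers derivatives. It has many subsections.",
    "MiFID II adds reporting duties. Firms must comply.",
]
print(summarize_document(sections))
# -> "Title VII covers derivatives. MiFID II adds reporting duties."
```

Because each chunk fits comfortably inside the window, no single call ever overflows.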

Wealth Management Analytics
When analyzing a portfolio of hundreds of assets, the context window can max out quickly. Chunking prevents losing track of important details about sector diversification or interest rate exposure.

Trading Strategies
Traders might feed market research, technical indicators, and news articles into an AI for insights. By chunking data, the AI retains key information about recent market movements.

Mergers & Acquisitions
Large M&A deals involve vast legal documents and multiple financial statements. With context windows, the AI can only handle so much at once — so chunk sections like balance sheets, profit-and-loss statements, or due diligence reports.

Challenges & Best Practices

Pitfalls:
Token Overflow: Inputs exceeding the limit get truncated, usually from the earliest text, leading to incomplete analysis. You might lose crucial disclaimers or risk factors from the start of a document.

Context Drift: As new instructions and text flood in, the model loses track of the original task or query.
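Token overflow is easy to demonstrate. Many systems keep the most recent tokens and silently drop the oldest (exact truncation behavior varies by provider; the sketch below only illustrates the failure mode):

```python
def truncate_to_window(tokens, window=8):
    """Keep only the most recent `window` tokens, discarding the
    earliest ones -- the typical overflow behavior."""
    return tokens[-window:]

conversation = ["DISCLAIMER:", "not", "investment", "advice.",
                "Q1", "revenue", "rose", "5%.", "Q2", "fell", "2%."]
kept = truncate_to_window(conversation, window=8)
print(kept)  # the disclaimer at the start is silently gone
```

Nothing warns you that the disclaimer was dropped — which is exactly why monitoring token counts matters.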

Pro Tips:

1. Summarize Strategically: Use AI to condense prior interactions (e.g., "Summarize the user's main complaint in 50 tokens").

2. External Memory: Store conversation histories or past context in external databases (e.g., Redis) and pull only what's needed into context to stay within token limits.

3. Hybrid Approach: Combine context windows with RAG (Retrieval-Augmented Generation) for dynamic knowledge pulling (teaser for Article 10!).
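Tip 2 can be sketched with an in-memory store standing in for Redis. A real deployment would use a Redis client and smarter retrieval (e.g., embeddings); everything below, including the one-token-per-word cost estimate, is illustrative:

```python
from collections import deque

class ConversationStore:
    """Keeps the full history externally; pulls only the most
    recent turns that fit a token budget back into the prompt."""

    def __init__(self):
        self.history = deque()  # stand-in for an external DB like Redis

    def add_turn(self, role, text):
        self.history.append((role, text))

    def context_within_budget(self, max_tokens=50):
        # Rough cost estimate: one token per whitespace-separated word.
        picked, used = [], 0
        for role, text in reversed(self.history):
            cost = len(text.split())
            if used + cost > max_tokens:
                break  # older turns stay in storage, not in the prompt
            picked.append((role, text))
            used += cost
        return list(reversed(picked))

store = ConversationStore()
store.add_turn("user", "Summarize our collateral requirements discussion.")
store.add_turn("assistant", "We covered haircuts and margin calls.")
print(store.context_within_budget(max_tokens=50))
```

The full history is never lost — it simply lives outside the window, and only the freshest slice rides along with each request.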

Tools & Resources

LangChain: Splits and manages long texts for LLMs.

LlamaIndex: Optimizes context retrieval from large datasets.

OpenAI Tokenizer: Visualize how text converts to tokens — helpful for planning input size.

Conclusion

Context windows are AI's short-term memory: finite but manageable. By chunking large data, prioritizing crucial details, and summarizing strategically, you ensure AI models remain focused, accurate, and cost-effective.

Next Up:
"Supercharging AI with External Knowledge" (Week 10). Discover how RAG (Retrieval-Augmented Generation) breaks memory limits by tapping into vast databases!

Call-to-Action

What's the longest document or conversation you've tossed at an AI? Did it forget halfway? Share your stories below — let's troubleshoot together!