What Is a Context Window? Why Your AI Coding Tool Forgets

Why Your AI Just "Forgot" What You Said

You've been building something for 45 minutes. The AI was right there with you: it knew your component structure, your naming conventions, the bug you fixed twenty minutes ago. Then, without warning, it starts suggesting code that contradicts what it already wrote. It ignores a constraint you mentioned three messages back. It regenerates something you already told it to remove.

Nothing broke. The AI isn't confused. It literally cannot remember what you said — because it ran out of room.

The symptom: mid-project drift

This specific experience has a name: mid-project drift. The AI's responses start to feel disconnected from the project's earlier context. It stops referencing prior decisions. Answers get more generic. You find yourself repeating things.

It feels like the AI is getting lazy, but the real cause is mechanical. The AI only has access to what's currently loaded in its context window. Once that window fills up, older content gets pushed out — and the AI has no access to it at all.

It's not a bug — it's a hard limit

Every AI model has a fixed amount of working memory. That memory is called the context window. It holds everything: your messages, the AI's responses, any code it reads, any files it scans. When the total amount of content exceeds the window's size, something has to give.

Different tools handle this differently — and most don't warn you in plain English when it's about to happen.

What a Context Window Actually Is

Think of the context window as a whiteboard the AI can see. It can only work with what's currently written on that whiteboard. When the whiteboard fills up, the AI starts erasing older content from the top to make room for new content at the bottom. What gets erased is gone — the AI can't see it anymore.

Tokens, not words

The whiteboard isn't measured in words — it's measured in tokens. Tokens are chunks of text that roughly correspond to syllables or short word fragments. As a rough rule:

~750 words ≈ 1,000 tokens
A typical full-length article like this one ≈ 1,500–2,000 tokens
A file with 500 lines of TypeScript ≈ 3,000–5,000 tokens

When an AI tool reads your codebase files, or you paste in a long error log, those tokens add up fast.
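The rule of thumb above can be turned into a quick estimator. This is a minimal sketch that assumes the ~750 words ≈ 1,000 tokens ratio from the list and counts whitespace-separated words; it is not a real tokenizer, and the function name `estimate_tokens` is made up for illustration. Code and error logs usually tokenize less efficiently than prose, so treat the result as a floor.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from a word count.

    Assumes the article's rule of thumb (~750 words per 1,000 tokens).
    Real tokenizers will give different numbers, especially for code.
    """
    words = len(text.split())
    return round(words * 1000 / 750)


sample = "word " * 750  # 750 words of filler text
print(estimate_tokens(sample))  # 1000
```

The same ratio explains the larger figures quoted for big windows: 1,000,000 tokens works out to roughly 750,000 words.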

The window fills from both ends

Here's the part most explainers skip: the context window fills with both your input and the AI's output. Every response the AI writes counts against the limit, not just the messages you send. Long explanations, large code blocks, step-by-step plans — those all eat into the shared budget.

In a long coding session, a back-and-forth conversation can hit tens of thousands of tokens just from accumulated replies.
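To make the shared budget concrete, here is a small sketch that tallies a session by role. The `Turn` class and the token counts are hypothetical numbers chosen for illustration, not measurements from any tool.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str    # "user" or "assistant"
    tokens: int  # estimated size of the message


def window_usage(turns: list[Turn]) -> dict[str, int]:
    """Sum tokens per role; both roles count against the same window."""
    usage = {"user": 0, "assistant": 0}
    for turn in turns:
        usage[turn.role] += turn.tokens
    usage["total"] = usage["user"] + usage["assistant"]
    return usage


session = [
    Turn("user", 200),        # short request
    Turn("assistant", 1500),  # explanation plus a code block
    Turn("user", 50),         # "looks good, now add validation"
    Turn("assistant", 2200),  # longer code block
]
print(window_usage(session))
# {'user': 250, 'assistant': 3700, 'total': 3950}
```

In this made-up session the AI's replies account for most of the total, which is why asking for shorter answers can stretch a session further.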

What happens when it overflows

When the window is full, the AI silently drops the oldest content. Most tools don't show a "context almost full" banner; the conversation just keeps going, and the AI answers with less information than it had before. (Windsurf is an exception: it now displays a visual indicator that turns yellow near capacity.)

This is the root cause of mid-project drift. The AI isn't degrading. It's answering with a smaller slice of your conversation than when you started.
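The whiteboard analogy can be sketched as a simple oldest-first trim. This is an assumption-heavy illustration of the idea only: real tools handle overflow differently (some summarize instead of dropping), and the function name, message texts, and token counts here are hypothetical.

```python
from collections import deque


def fit_to_window(messages: list[tuple[str, int]], limit: int):
    """Drop the oldest (text, tokens) messages until the total fits the limit.

    Returns (kept_messages, dropped_texts).
    """
    kept = deque(messages)
    total = sum(tokens for _, tokens in kept)
    dropped = []
    while kept and total > limit:
        text, tokens = kept.popleft()  # erase from the top of the whiteboard
        dropped.append(text)
        total -= tokens
    return list(kept), dropped


history = [
    ("Constraint: use snake_case everywhere", 300),
    ("Plan for the login page", 500),
    ("Pasted error log", 400),
]
kept, dropped = fit_to_window(history, limit=1000)
print(dropped)  # ['Constraint: use snake_case everywhere']
```

The early constraint is the first thing to go, which matches the drift symptom: the newest messages survive, and the decisions made at the start are what disappear.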

Context Limits in the Tools You're Using

Context window sizes vary by model and tool. Here's where the three most popular AI coding tools currently stand.

Claude Code (Sonnet 4.6 / Opus 4.8)

Claude Code runs on Anthropic's models, which currently support a 1,000,000-token context window. That's one of the largest in the consumer AI space: roughly equivalent to 750,000 words, or a substantial portion of a large codebase.

In practice this means a single Claude Code session can hold an enormous amount before drift kicks in. But large agentic tasks that repeatedly scan files, run commands, and generate long outputs can still fill it over time. The model itself doesn't give you a real-time token counter in the terminal.

For pricing and plan details, see Anthropic's pricing page.

Cursor

Cursor's context window depends on which underlying model you've selected. As of Cursor 3 (April 2026), the old Composer pane has been replaced by a full Agents Window designed for running and managing AI agents across multiple files. Inline edit mode (Cmd+K / Ctrl+K) still operates with a narrower local context focused on the surrounding code selection.

Cursor 3.3 added a context usage breakdown in the agent interface, so you can see how context is distributed across rules, skills, MCPs, and subagents. Prior to this, the context indicator was inconsistently present — if you're on an older version and responses start feeling less accurate deep in an agent session, that's usually the first signal.

For a direct comparison of how Cursor and Claude Code handle larger projects, see the Cursor vs. Claude Code guide for beginners.

Windsurf (Cascade)

Windsurf's Cascade mode is designed to handle multi-step agentic tasks across a codebase. Codeium (Windsurf's developer) has described Cascade as using a "flows" model that manages context across a session rather than a simple rolling window — but the underlying limits still apply based on the model being used.

Unlike Claude Code, Windsurf does show a visual context window indicator. In early 2026, Windsurf added a real-time indicator to the Cascade interface that shows how full your context window is. It turns yellow when you're approaching capacity, signaling that it's time to start a new conversation. A prompt cache timer was also integrated into the indicator to help track caching status.

For a deeper look at how Windsurf handles longer sessions, see the Windsurf review.

3 Workarounds That Actually Help

You can't increase the window — but you can work within it. These three approaches have the lowest friction for beginners.

Start a fresh chat and paste a summary

When a session starts drifting, don't keep pushing the same thread. Open a new chat and begin with a short summary of the project state: what you're building, what's already done, the key decisions made so far, and what you need next.

This effectively resets the whiteboard with only the most important content, leaving maximum space for the actual work ahead. Three to five sentences is usually enough.
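If you restart often, a tiny template keeps the summary consistent. The sketch below simply assembles the four pieces listed above into a block you can paste as the first message; the function name and the example project details are hypothetical.

```python
def fresh_chat_summary(building: str, done: list[str],
                       decisions: list[str], next_step: str) -> str:
    """Build a short project-state summary to open a new chat with."""
    lines = [
        f"Project: {building}",
        f"Done so far: {'; '.join(done)}",
        f"Key decisions: {'; '.join(decisions)}",
        f"Next: {next_step}",
    ]
    return "\n".join(lines)


print(fresh_chat_summary(
    building="a recipe-sharing web app",            # hypothetical project
    done=["login page", "user profile page"],
    decisions=["TypeScript only", "no external UI library"],
    next_step="build the recipe upload form",
))
```

Four short lines like these stay well inside the three-to-five-sentence target and leave the rest of the window free for the actual work.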

Use a CLAUDE.md (or equivalent rules file) to anchor key decisions

The most durable fix is to write down your project's core decisions somewhere the AI will always read it — before the conversation starts.

In Claude Code, this is the CLAUDE.md file. It sits in your project root and gets loaded into every session automatically. Architecture choices, naming conventions, constraints, and preferences you write there don't depend on the chat history surviving: they're loaded fresh each time. The file's contents still count toward the window, so keep it short.

Cursor and Windsurf have equivalent "rules" or "instructions" files that work the same way. Use them. A few bullet points in a rules file can save you from re-explaining the same thing in every session.
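As a rough illustration of what "a few bullet points" might look like, the sketch below renders a starter rules file from a dictionary. The section headings and rules are hypothetical examples, not a required format; the output is plain Markdown that you could save as CLAUDE.md or adapt to another tool's rules file.

```python
def render_rules(sections: dict[str, list[str]]) -> str:
    """Render project decisions as a short Markdown rules file."""
    parts = ["# Project rules"]
    for heading, bullets in sections.items():
        parts.append(f"\n## {heading}")
        parts.extend(f"- {bullet}" for bullet in bullets)
    return "\n".join(parts) + "\n"


rules = render_rules({
    # Hypothetical decisions for illustration only
    "Stack": ["TypeScript with strict mode on", "No external UI library"],
    "Naming": ["Components in PascalCase", "Files in kebab-case"],
    "Constraints": ["Do not change the database schema without asking"],
})
print(rules)
```

Keeping the file to a handful of bullets matters because it is read at the start of every session.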

Keep sessions scoped: one feature, one chat

Long, sprawling sessions are the fastest way to overflow a context window. A session that starts with "let's build the login page" and drifts into backend API design, database schema changes, and a completely different component will burn through context fast.

Instead: one feature per chat. When you've finished the login page, commit the work, summarize what was done, and open a fresh session for the next piece. This discipline keeps each context window focused on exactly what's relevant.

For more techniques on keeping AI sessions productive, see how to write better prompts for AI coding tools. And if you're already dealing with broken or confused output from a long session, how to fix AI-generated code covers recovery steps.

When Context Size Actually Matters for Choosing a Tool

Does a bigger window mean a better tool?

A larger context window gives you more headroom before drift kicks in, but it's not the whole picture. A tool that uses its context intelligently — by prioritizing the most relevant file content and summarizing what it doesn't need in full — will outperform a tool with a bigger window that loads everything indiscriminately.
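One way to picture "using context intelligently" is a greedy pick of the most relevant files under a token budget. This is a toy sketch under stated assumptions: the relevance scores, file names, and sizes are invented, and it does not describe how Claude Code, Cursor, or Windsurf actually select context.

```python
def pick_context(files: list[tuple[str, int, float]], budget: int):
    """Choose files by descending relevance until the token budget is spent.

    files: (name, estimated_tokens, relevance between 0 and 1)
    Returns (chosen_names, tokens_used).
    """
    chosen, used = [], 0
    for name, tokens, _relevance in sorted(files, key=lambda f: f[2], reverse=True):
        if used + tokens <= budget:  # skip anything that would overflow
            chosen.append(name)
            used += tokens
    return chosen, used


candidates = [
    ("login.tsx", 3000, 0.9),
    ("auth.ts", 2000, 0.8),
    ("schema.sql", 4000, 0.3),
    ("README.md", 1500, 0.1),
]
print(pick_context(candidates, budget=6000))
# (['login.tsx', 'auth.ts'], 5000)
```

Loading all four files would cost 10,500 tokens; the selective pick spends 5,000 on the two files that matter for the task and leaves the remainder for the conversation.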

For most beginners, context window size is rarely the deciding factor for choosing a tool. Workflow, price, and editor integration matter more day to day. Where context size becomes relevant is when you're working on a large, complex project and need the AI to hold a lot of state across a long session — that's where Claude Code's 1M-token window gives it a practical edge over tools that hit limits sooner.

The honest answer: learn to work within the window before you optimize for window size. The three habits above — fresh chats, rules files, and scoped sessions — will serve you in any tool you use.


Ready to go deeper? Start with what a CLAUDE.md file is and how to write one — it's the highest-leverage fix for context drift and takes about ten minutes to set up.

From the comments

AI personas · answered by the author

@cachemoney

Dumb question from the invoice-watcher: if the context window is filling up with every message and every file the AI reads, am I being charged for all those tokens every single turn? A long session sounds like it could quietly rack up a bill.

Sam Okafor · author

Fair thing to watch, but this article is about the memory limit, not pricing, so I won't pretend to know the exact billing math here. What the piece does establish is that the window holds everything (your messages, the AI's replies, and any files it scans) and that those tokens add up fast when it reads codebase files or you paste in a long error log. So whether or not your plan meters tokens, the same habits help: scoped sessions and rules files keep less junk loaded. For actual plan and pricing details on Claude Code specifically, the article links out to Anthropic's pricing page rather than quoting numbers.

@cachemoney

So the cheap move is basically: don't let one giant sprawling chat balloon. One feature per chat keeps the loaded pile small either way?

Sam Okafor · author

That's the read. The article frames "one feature, one chat" as the discipline that keeps each window focused on what's relevant, and notes that sprawling sessions are the fastest way to overflow. Smaller relevant pile is good for accuracy regardless of how your plan bills.

@ships_at_2am

Claude Code has a 1,000,000-token window. So can't I just paste my whole repo in at the start and let it rip? Why bother scoping sessions if the window is that huge?

Sam Okafor · author

You can fit a lot — the article puts 1M tokens at roughly 750,000 words, a substantial chunk of a large codebase. But the piece is pretty direct that bigger isn't automatically better: it says a tool that prioritizes relevant content beats one that loads everything indiscriminately. And it warns that large agentic tasks which repeatedly scan files, run commands, and generate long outputs can still fill even that window over time. So loading the whole repo upfront burns headroom you'll want later in the session.

@ships_at_2am

Okay but say I ignore that and the thing starts drifting mid-build anyway. Fastest recovery that doesn't make me re-explain everything?

Sam Okafor · author

Two moves from the article. Short term: open a fresh chat and start with a three-to-five-sentence summary of project state: what you're building, what's done, key decisions, what's next. That resets the whiteboard with only the important stuff. Durable fix: put your core decisions in a CLAUDE.md (or the equivalent rules file in Cursor/Windsurf). The article notes that file loads into every session automatically, so you stop re-explaining it every time.

@promptpls

Total beginner here. When the article says "start a fresh chat," does that mean the AI forgets my actual code too? Like does my project disappear, or just the conversation?

Sam Okafor · author

Good instinct to ask. The thing that resets is the conversation — the article describes the context window as a whiteboard holding your messages and the AI's replies. Your files on disk don't go anywhere; the AI just no longer has the earlier back-and-forth loaded. That's why the article says to begin the new chat with a short summary: you're handing it back the key decisions, and it can read the actual code files fresh as needed.

@promptpls

Oh okay. So the summary is basically me reminding it of stuff it can't re-read on its own, like why we chose something?

Sam Okafor · author

Exactly. Code files it can scan again; the reasoning and decisions live only in the conversation, so those are what you summarize. The article suggests three to five sentences — what you're building, what's done, the key decisions, and what you need next. For decisions you make over and over, it recommends putting them in a CLAUDE.md or rules file so you don't have to type them into every new chat.
