The scenario
A developer describes a task in natural language — "add pagination to the user list endpoint, with tests" — and an AI agent reads the relevant files, writes the code changes, runs the tests, reads the failure output, fixes the issue, and stops when the tests pass.
No copy-paste from ChatGPT. No manual application of suggestions. The agent is in the loop.
This is agentic coding: the AI operates as a collaborator with file access, tool use, and iterative execution — not just a text completion engine.
How it works
The agent runs a loop:
Task description (from developer)
↓
[LLM: Plan] → steps to accomplish the task
↓
[LLM: Act] → tool call (read file / write file / run command / search docs)
↓
[Tool execution] → result (file content / test output / error message)
↓
[LLM: Observe] → interpret result, decide next action
↓
... repeat until done or human approval needed
The LLM is not just generating text — it is reasoning about state and making decisions. The tools give it hands.
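The loop above can be sketched in a few lines. Everything here is illustrative: `llm_decide` stands in for whatever model call produces the next decision, and the `tools` registry maps names to plain functions — no real agent framework's API is being shown.

```python
# Minimal sketch of the plan -> act -> observe loop described above.
# llm_decide and the tools registry are hypothetical placeholders.

def run_agent(task, llm_decide, tools, max_steps=20):
    """Drive the loop: the LLM picks a tool call, we execute it,
    and feed the result back until the LLM signals it is done."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = llm_decide(history)  # e.g. {"tool": ..., "args": ...} or {"done": ...}
        if "done" in decision:
            return decision["done"]
        # Tool execution: run the chosen tool, observe the result
        result = tools[decision["tool"]](**decision["args"])
        history.append({"role": "tool", "content": str(result)})
    raise RuntimeError("step budget exhausted without finishing")
```

The `max_steps` cap matters in practice: without it, a confused agent can loop on a failing test forever.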
Tools an agentic coding assistant uses
| Tool | What it does |
|---|---|
| read_file | Read any file in the repo |
| write_file | Create or overwrite a file |
| edit_file | Make targeted edits (diff-style) |
| run_command | Execute shell commands (tests, builds, linters) |
| search_codebase | Semantic or keyword search across files |
| web_search | Look up documentation, Stack Overflow, GitHub issues |
| list_directory | Explore the file tree |
MCP (Model Context Protocol) is the emerging standard for defining and connecting these tools — allowing the same agent to work across different editors and environments.
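MCP advertises each tool to the client as a name, a human-readable description, and a JSON Schema for its inputs. The descriptor below follows that general shape as a simplified sketch — field details are illustrative, not the exact MCP wire format.

```python
# Illustrative tool descriptor in the name / description / input-schema
# shape that MCP uses to advertise tools. Simplified sketch, not the
# exact protocol message format.

run_command_tool = {
    "name": "run_command",
    "description": "Execute a shell command and return stdout/stderr.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "command": {"type": "string", "description": "Shell command to run"},
            "timeout_s": {"type": "number", "description": "Kill after this many seconds"},
        },
        "required": ["command"],
    },
}
```

Because the schema is declarative, the same descriptor can be consumed by any MCP-aware editor or runtime — which is exactly the portability the paragraph above describes.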
What changes compared to autocomplete
Autocomplete is reactive: the dev types, the model suggests the next tokens.
Agentic coding is proactive: the dev describes intent, the agent figures out what to do, does it, verifies it, and iterates.
| | Autocomplete | Agentic coding |
|---|---|---|
| Trigger | Dev types | Dev describes task |
| Scope | Current file, cursor position | Entire codebase |
| Execution | Suggestion only | Runs code, tests, commands |
| Iteration | One-shot | Multiple steps |
| Human in loop | Every suggestion | At approval checkpoints |
| Error handling | None | Reads errors, tries to fix |
The trust and control problem
Agentic coding introduces a new challenge: the agent takes actions that are hard to reverse.
Deleting a file is easy. Undoing it after 5 more agent steps is painful. Letting an agent push to main without review is dangerous.
Good agentic systems address this with:
1. Checkpoints. The agent pauses before destructive actions (file deletion, git push, running migrations) and asks for approval.
2. Diff review. All file edits are shown as diffs before being applied. The dev sees exactly what changed.
3. Sandbox execution. Shell commands run in a sandboxed environment — the agent can't accidentally affect production.
4. Scope constraints. The agent is told which directories it's allowed to modify. It cannot wander outside the task boundary.
5. Git as a safety net. Every significant checkpoint creates a commit. If the agent goes off-track, git reset brings you back.
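Two of these measures — approval checkpoints and scope constraints — can be sketched as a guard around every tool call. The names (`DESTRUCTIVE`, `guarded_call`) are hypothetical, not taken from any real product.

```python
# Hedged sketch of two safety measures from the list above: an approval
# checkpoint for destructive tools and a directory scope constraint.
from pathlib import Path

DESTRUCTIVE = {"delete_file", "git_push", "run_migration"}  # illustrative set

def guarded_call(tool_name, path, allowed_root, approve, execute):
    """Refuse actions outside allowed_root; ask the human before destructive tools."""
    root = Path(allowed_root).resolve()
    target = Path(path).resolve()
    if root not in target.parents and target != root:
        # Scope constraint: the agent cannot wander outside the task boundary
        raise PermissionError(f"{path} is outside the task boundary {allowed_root}")
    if tool_name in DESTRUCTIVE and not approve(tool_name, path):
        # Checkpoint: pause and require human approval for destructive actions
        return "skipped: approval denied"
    return execute()
```

In a real system `approve` would be an interactive prompt and `execute` the actual tool invocation; here they are injected as callables so the guard stays testable.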
Context window management
An agentic session over a large codebase quickly exhausts the context window. The agent needs to be smart about what it loads:
- Don't load the entire repo — only files relevant to the task
- Summarize read results — instead of including the full file, extract only the relevant functions
- Use search first, read second — search to find the right file, then read only that file
The KV cache is critical for performance: the large shared prefix (task description, coding conventions, architecture overview) is cached so it is not reprocessed on every iteration.
Practical example: "Add pagination to /users endpoint"
Step 1 → search_codebase("users endpoint")
→ finds src/routes/users.py
Step 2 → read_file("src/routes/users.py")
→ sees: def list_users(): return all_users
Step 3 → read_file("tests/test_users.py")
→ sees: existing tests for list_users
Step 4 → edit_file("src/routes/users.py")
→ adds: page, page_size params + slicing logic
Step 5 → edit_file("tests/test_users.py")
→ adds: tests for pagination edge cases
Step 6 → run_command("pytest tests/test_users.py")
→ output: FAILED – TypeError: unsupported operand
Step 7 → [LLM reads error, identifies off-by-one in slicing]
→ edit_file("src/routes/users.py") — fixes the bug
Step 8 → run_command("pytest tests/test_users.py")
→ output: 4 passed
Done. → presents diff to developer for review
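The walkthrough never shows the agent's actual diff, so the final state of the endpoint can only be reconstructed. A plausible shape of the change from Steps 4 and 7 — including the 1-indexed start calculation where the off-by-one bug would have lived — might look like this (`all_users` is stand-in data):

```python
# Reconstruction of the paginated endpoint from the walkthrough above.
# all_users is hypothetical stand-in data; the real route would query a store.

all_users = [f"user{i}" for i in range(45)]

def list_users(page=1, page_size=20):
    """Return one page of users. page is 1-indexed."""
    start = (page - 1) * page_size          # the off-by-one fix: (page - 1), not page
    return all_users[start:start + page_size]
```

The edge cases the agent's tests in Step 5 would plausibly cover: the last partial page, and a page past the end returning an empty list.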
Leading tools in 2026
| Tool | Type | Key feature |
|---|---|---|
| Claude Code | CLI agent | Deep file/shell access, MCP |
| Cursor | Agentic IDE | Codebase-wide context, inline agent |
| GitHub Copilot Workspace | Cloud agent | PR-level tasks, GitHub integration |
| Devin | Autonomous agent | Full-session autonomy, web browsing |
| Windsurf (Codeium) | Agentic IDE | Flow-based agent, fast iteration |
When agentic coding shines — and when it doesn't
Good fit:
- Repetitive, well-defined tasks (add CRUD endpoints, write tests for existing code, migrate a library)
- Debugging with a clear error message and isolated scope
- Exploring an unfamiliar codebase to understand structure
Poor fit:
- Architectural decisions that require deep business context
- Tasks requiring access to systems the agent can't reach (production DBs, customer data)
- Long-horizon tasks where the goal keeps changing — the agent needs stable objectives