The scenario
A developer describes a task in natural language — "add pagination to the user list endpoint, with tests" — and an AI agent reads the relevant files, writes the code changes, runs the tests, reads the failure output, fixes the issue, and stops when the tests pass.
No copy-paste from ChatGPT. No manual application of suggestions. The agent is in the loop.
This is agentic coding: the AI operates as a collaborator with file access, tool use, and iterative execution — not just a text completion engine.
How it works
The agent runs a loop:
Task description (from developer)
↓
[LLM: Plan] → steps to accomplish the task
↓
[LLM: Act] → tool call (read file / write file / run command / search docs)
↓
[Tool execution] → result (file content / test output / error message)
↓
[LLM: Observe] → interpret result, decide next action
↓
... repeat until done or human approval needed
The LLM is not just generating text — it is reasoning about state and making decisions. The tools give it hands.
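The loop above can be sketched in a few lines. Everything here is illustrative: `llm_decide` stands in for whatever model call produces the next decision, and the `tools` registry maps names to plain functions — no real agent framework's API is being shown.

```python
# Minimal sketch of the plan -> act -> observe loop described above.
# llm_decide and the tools registry are hypothetical placeholders.

def run_agent(task, llm_decide, tools, max_steps=20):
    """Drive the loop: the LLM picks a tool call, we execute it,
    and feed the result back until the LLM signals it is done."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = llm_decide(history)  # e.g. {"tool": ..., "args": ...} or {"done": ...}
        if "done" in decision:
            return decision["done"]
        # Tool execution: run the chosen tool, observe the result
        result = tools[decision["tool"]](**decision["args"])
        history.append({"role": "tool", "content": str(result)})
    raise RuntimeError("step budget exhausted without finishing")
```

The `max_steps` cap matters in practice: without it, a confused agent can loop on a failing test forever.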
Tools an agentic coding assistant uses
| Tool | What it does |
|---|---|
| read_file | Read any file in the repo |
| write_file | Create or overwrite a file |
| edit_file | Make targeted edits (diff-style) |
| run_command | Execute shell commands (tests, builds, linters) |
| search_codebase | Semantic or keyword search across files |
| web_search | Look up documentation, Stack Overflow, GitHub issues |
| list_directory | Explore the file tree |
MCP (Model Context Protocol) is the emerging standard for defining and connecting these tools — allowing the same agent to work across different editors and environments.
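MCP advertises each tool to the client as a name, a human-readable description, and a JSON Schema for its inputs. The descriptor below follows that general shape as a simplified sketch — field details are illustrative, not the exact MCP wire format.

```python
# Illustrative tool descriptor in the name / description / input-schema
# shape that MCP uses to advertise tools. Simplified sketch, not the
# exact protocol message format.

run_command_tool = {
    "name": "run_command",
    "description": "Execute a shell command and return stdout/stderr.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "command": {"type": "string", "description": "Shell command to run"},
            "timeout_s": {"type": "number", "description": "Kill after this many seconds"},
        },
        "required": ["command"],
    },
}
```

Because the schema is declarative, the same descriptor can be consumed by any MCP-aware editor or runtime — which is exactly the portability the paragraph above describes.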
What changes compared to autocomplete
Autocomplete is reactive: the dev types, the model suggests the next tokens.
Agentic coding is proactive: the dev describes intent, the agent figures out what to do, does it, verifies it, and iterates.
| | Autocomplete | Agentic coding |
|---|---|---|
| Trigger | Dev types | Dev describes task |
| Scope | Current file, cursor position | Entire codebase |
| Execution | Suggestion only | Runs code, tests, commands |
| Iteration | One-shot | Multiple steps |
| Human in loop | Every suggestion | At approval checkpoints |
| Error handling | None | Reads errors, tries to fix |
The trust and control problem
Agentic coding introduces a new challenge: the agent takes actions that are hard to reverse.
Deleting a file is easy. Undoing it after 5 more agent steps is painful. Letting an agent push to main without review is dangerous.
Good agentic systems address this with:
1. Checkpoints. The agent pauses before destructive actions (file deletion, git push, running migrations) and asks for approval.
2. Diff review. All file edits are shown as diffs before being applied. The dev sees exactly what changed.
3. Sandbox execution. Shell commands run in a sandboxed environment — the agent can't accidentally affect production.
4. Scope constraints. The agent is told which directories it's allowed to modify. It cannot wander outside the task boundary.
5. Git as a safety net. Every significant checkpoint creates a commit. If the agent goes off-track, git reset brings you back.
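Two of these measures — approval checkpoints and scope constraints — can be sketched as a guard around every tool call. The names (`DESTRUCTIVE`, `guarded_call`) are hypothetical, not taken from any real product.

```python
# Hedged sketch of two safety measures from the list above: an approval
# checkpoint for destructive tools and a directory scope constraint.
from pathlib import Path

DESTRUCTIVE = {"delete_file", "git_push", "run_migration"}  # illustrative set

def guarded_call(tool_name, path, allowed_root, approve, execute):
    """Refuse actions outside allowed_root; ask the human before destructive tools."""
    root = Path(allowed_root).resolve()
    target = Path(path).resolve()
    if root not in target.parents and target != root:
        # Scope constraint: the agent cannot wander outside the task boundary
        raise PermissionError(f"{path} is outside the task boundary {allowed_root}")
    if tool_name in DESTRUCTIVE and not approve(tool_name, path):
        # Checkpoint: pause and require human approval for destructive actions
        return "skipped: approval denied"
    return execute()
```

In a real system `approve` would be an interactive prompt and `execute` the actual tool invocation; here they are injected as callables so the guard stays testable.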
Context window management
An agentic session over a large codebase quickly exhausts the context window. The agent needs to be smart about what it loads:
- Don't load the entire repo — only files relevant to the task
- Summarize read results — instead of including the full file, extract only the relevant functions
- Use search first, read second — search to find the right file, then read only that file
The KV cache is critical for performance: the large shared prefix (task description, coding conventions, architecture overview) is cached so it is not reprocessed on every iteration.
Practical example: "Add pagination to /users endpoint"
Step 1 → search_codebase("users endpoint")
→ finds src/routes/users.py
Step 2 → read_file("src/routes/users.py")
→ sees: def list_users(): return all_users
Step 3 → read_file("tests/test_users.py")
→ sees: existing tests for list_users
Step 4 → edit_file("src/routes/users.py")
→ adds: page, page_size params + slicing logic
Step 5 → edit_file("tests/test_users.py")
→ adds: tests for pagination edge cases
Step 6 → run_command("pytest tests/test_users.py")
→ output: FAILED – TypeError: unsupported operand
Step 7 → [LLM reads error, identifies off-by-one in slicing]
→ edit_file("src/routes/users.py") — fixes the bug
Step 8 → run_command("pytest tests/test_users.py")
→ output: 4 passed
Done. → presents diff to developer for review
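The walkthrough never shows the agent's actual diff, so the final state of the endpoint can only be reconstructed. A plausible shape of the change from Steps 4 and 7 — including the 1-indexed start calculation where the off-by-one bug would have lived — might look like this (`all_users` is stand-in data):

```python
# Reconstruction of the paginated endpoint from the walkthrough above.
# all_users is hypothetical stand-in data; the real route would query a store.

all_users = [f"user{i}" for i in range(45)]

def list_users(page=1, page_size=20):
    """Return one page of users. page is 1-indexed."""
    start = (page - 1) * page_size          # the off-by-one fix: (page - 1), not page
    return all_users[start:start + page_size]
```

The edge cases the agent's tests in Step 5 would plausibly cover: the last partial page, and a page past the end returning an empty list.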
Leading tools in 2026
| Tool | Type | Key feature |
|---|---|---|
| Claude Code | CLI agent | Deep file/shell access, MCP |
| Cursor | Agentic IDE | Codebase-wide context, inline agent |
| GitHub Copilot Workspace | Cloud agent | PR-level tasks, GitHub integration |
| Devin | Autonomous agent | Full-session autonomy, web browsing |
| Windsurf (Codeium) | Agentic IDE | Flow-based agent, fast iteration |
When agentic coding shines — and when it doesn't
Good fit:
- Repetitive, well-defined tasks (add CRUD endpoints, write tests for existing code, migrate a library)
- Debugging with a clear error message and isolated scope
- Exploring an unfamiliar codebase to understand structure
Poor fit:
- Architectural decisions that require deep business context
- Tasks requiring access to systems the agent can't reach (production DBs, customer data)
- Long-horizon tasks where the goal keeps changing — the agent needs stable objectives