The scenario

A developer describes a task in natural language — "add pagination to the user list endpoint, with tests" — and an AI agent reads the relevant files, writes the code changes, runs the tests, reads the failure output, fixes the issue, and stops when the tests pass.

No copy-paste from ChatGPT. No manual application of suggestions. The agent is in the loop.

This is agentic coding: the AI operates as a collaborator with file access, tool use, and iterative execution — not just a text completion engine.

How it works

The agent runs a loop:

Task description (from developer)
    ↓
[LLM: Plan]  →  steps to accomplish the task
    ↓
[LLM: Act]   →  tool call (read file / write file / run command / search docs)
    ↓
[Tool execution]  →  result (file content / test output / error message)
    ↓
[LLM: Observe]  →  interpret result, decide next action
    ↓
... repeat until done or human approval needed

The LLM is not just generating text — it is reasoning about state and making decisions. The tools give it hands.
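The loop above can be sketched in a few lines. This is a minimal illustration with a scripted stand-in for the model — a real agent would call an LLM API at the decision point, and the tool names here are the hypothetical ones from this article:

```python
# Minimal plan-act-observe loop. `stub_llm` is a scripted stand-in for a
# real model call; it decides the next action from the transcript so far.

def stub_llm(history):
    """Decide the next action from the (action, result) transcript."""
    if not history:
        return ("run_command", "pytest")                 # act: run the tests
    last_action, last_result = history[-1]
    if last_action == "run_command" and "FAILED" in last_result:
        return ("edit_file", "src/routes/users.py")      # observe failure, fix
    if last_action == "edit_file":
        return ("run_command", "pytest")                 # verify the fix
    return ("done", None)                                # tests pass: stop

def execute_tool(action, arg, attempt):
    """Fake tool execution: the first test run fails, the second passes."""
    if action == "run_command":
        return "FAILED: TypeError" if attempt == 0 else "4 passed"
    return f"edited {arg}"

def agent_loop(max_steps=10):
    history, test_runs = [], 0
    for _ in range(max_steps):               # bounded: agents need a step limit
        action, arg = stub_llm(history)
        if action == "done":
            return history
        result = execute_tool(action, arg, attempt=test_runs)
        if action == "run_command":
            test_runs += 1
        history.append((action, result))
    return history

transcript = agent_loop()
```

Note the step limit: without it, an agent that keeps failing would loop forever, which is why real systems cap iterations and escalate to the human.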

Tools an agentic coding assistant uses

Tool              What it does
read_file         Read any file in the repo
write_file        Create or overwrite a file
edit_file         Make targeted edits (diff-style)
run_command       Execute shell commands (tests, builds, linters)
search_codebase   Semantic or keyword search across files
web_search        Look up documentation, Stack Overflow, GitHub issues
list_directory    Explore the file tree

MCP (Model Context Protocol) is the emerging standard for defining and connecting these tools — allowing the same agent to work across different editors and environments.
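Concretely, each tool is described to the model as a name, a description, and a JSON Schema for its arguments — the shape MCP uses for tool definitions (exact wire-format details vary). A sketch, with a toy dispatcher and a stubbed handler:

```python
# A tool definition in the name / description / inputSchema shape,
# plus a dispatcher that routes model-issued tool calls to handlers.

read_file_tool = {
    "name": "read_file",
    "description": "Read any file in the repo",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

def dispatch(tool_call, handlers):
    """Route a tool call like {"name": ..., "arguments": {...}} to its handler."""
    name, args = tool_call["name"], tool_call["arguments"]
    return handlers[name](**args)

# Stubbed handler standing in for real file access.
handlers = {"read_file": lambda path: f"<contents of {path}>"}
result = dispatch({"name": "read_file", "arguments": {"path": "src/app.py"}}, handlers)
```

Because the definition is declarative, the same tool can be registered with any MCP-aware client — which is exactly what lets one agent work across editors and environments.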

What changes compared to autocomplete

Autocomplete is reactive: the dev types, the model suggests the next tokens.

Agentic coding is proactive: the dev describes intent, the agent figures out what to do, does it, verifies it, and iterates.

                 Autocomplete                    Agentic coding
Trigger          Dev types                       Dev describes task
Scope            Current file, cursor position   Entire codebase
Execution        Suggestion only                 Runs code, tests, commands
Iteration        One-shot                        Multiple steps
Human in loop    Every suggestion                At approval checkpoints
Error handling   None                            Reads errors, tries to fix

The trust and control problem

Agentic coding introduces a new challenge: the agent takes actions that are hard to reverse.

Deleting a file is easy. Undoing it after 5 more agent steps is painful. Letting an agent push to main without review is dangerous.

Good agentic systems address this with:

1. Checkpoints. The agent pauses before destructive actions (file deletion, git push, running migrations) and asks for approval.

2. Diff review. All file edits are shown as diffs before being applied. The dev sees exactly what changed.

3. Sandbox execution. Shell commands run in a sandboxed environment — the agent can't accidentally affect production.

4. Scope constraints. The agent is told which directories it's allowed to modify. It cannot wander outside the task boundary.

5. Git as a safety net. Every significant checkpoint creates a commit. If the agent goes off-track, git reset brings you back.
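The first mechanism — checkpoints — can be sketched as a gate in front of tool execution. The action names and callbacks here are illustrative, not any particular product's API:

```python
# A sketch of checkpoint gating: destructive actions require explicit
# approval before they run; everything else executes directly.

DESTRUCTIVE = {"delete_file", "git_push", "run_migration"}

def gated_execute(action, arg, approve, execute):
    """Run `action`, pausing for human approval first if it is destructive."""
    if action in DESTRUCTIVE and not approve(action, arg):
        return ("skipped", f"{action} denied by reviewer")
    return ("ok", execute(action, arg))

executed = []
result = gated_execute(
    "git_push", "origin/main",
    approve=lambda action, arg: False,              # reviewer says no
    execute=lambda action, arg: executed.append((action, arg)),
)
```

The key design choice is that the gate sits outside the model: the agent can propose `git_push`, but the runtime, not the LLM, decides whether it actually happens.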

Context window management

An agentic session over a large codebase quickly exhausts the context window. The agent needs to be smart about what it loads:

  • Don't load the entire repo — only files relevant to the task
  • Summarize read results — instead of including the full file, extract only the relevant functions
  • Use search first, read second — search to find the right file, then read only that file
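The "search first, read second" pattern looks like this in miniature. The in-memory repo dict stands in for real file-system tools:

```python
# Search first, read second: find candidate files by keyword, then load
# only the matching file instead of the whole repo.

REPO = {
    "src/routes/users.py": "def list_users(): return all_users",
    "src/routes/orders.py": "def list_orders(): return all_orders",
    "README.md": "Project readme",
}

def search_codebase(query):
    """Return paths whose contents mention the query (keyword search)."""
    return [path for path, text in REPO.items() if query in text]

def read_file(path):
    return REPO[path]

hits = search_codebase("list_users")          # narrow down first
context = {path: read_file(path) for path in hits}   # then read only the hits
```

Here one file enters the context instead of three — on a real repo with thousands of files, this is the difference between fitting in the window and not.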

KV caching is critical for performance: the large shared context (the task description, coding conventions, architecture overview) is cached by the inference provider rather than re-processed on every iteration.
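Getting cache hits requires structuring the prompt so the stable material is a byte-identical prefix, with per-iteration turns appended after it. A sketch (message shapes are illustrative):

```python
# Structuring prompts for KV-cache reuse: the stable context is an
# unchanging prefix; only the growing turn history is appended.

STABLE_PREFIX = [
    {"role": "system", "content": "Task: add pagination. Conventions: ..."},
]

def build_prompt(turns):
    """Stable prefix first (cacheable), then the per-iteration turns."""
    return STABLE_PREFIX + turns

prompt_1 = build_prompt([{"role": "tool", "content": "file contents"}])
prompt_2 = build_prompt([{"role": "tool", "content": "file contents"},
                         {"role": "tool", "content": "test output"}])
```

Because each prompt extends the previous one rather than reordering it, the provider can reuse the cached prefix on every iteration; inserting new material before the stable context would invalidate the cache.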

Practical example: "Add pagination to /users endpoint"

Step 1  → search_codebase("users endpoint") 
        → finds src/routes/users.py

Step 2  → read_file("src/routes/users.py")
        → sees: def list_users(): return all_users

Step 3  → read_file("tests/test_users.py")
        → sees: existing tests for list_users

Step 4  → edit_file("src/routes/users.py")
        → adds: page, page_size params + slicing logic

Step 5  → edit_file("tests/test_users.py")
        → adds: tests for pagination edge cases

Step 6  → run_command("pytest tests/test_users.py")
        → output: FAILED – TypeError: unsupported operand

Step 7  → [LLM reads error, identifies off-by-one in slicing]
        → edit_file("src/routes/users.py") — fixes the bug

Step 8  → run_command("pytest tests/test_users.py")
        → output: 4 passed

Done. → presents diff to developer for review
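For concreteness, here is roughly what the agent's Step 4 edit and Step 7 fix might produce. The `list_users` / `all_users` names follow the example above; everything else is an assumed sketch, not the actual diff:

```python
# Pagination with page / page_size params and slicing logic, after the
# off-by-one fix from Step 7.

all_users = [f"user{i}" for i in range(1, 11)]   # 10 sample users

def list_users(page=1, page_size=3):
    """Return one page of users; `page` is 1-indexed."""
    # Buggy Step 4 version used `page * page_size` as the start index,
    # skipping the first page entirely.
    start = (page - 1) * page_size
    return all_users[start:start + page_size]
```

The pagination edge cases the agent tested in Step 5 would include a partial final page and a page past the end, both handled naturally by slicing.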

Leading tools in 2026

Tool                       Type              Key feature
Claude Code                CLI agent         Deep file/shell access, MCP
Cursor                     Agentic IDE       Codebase-wide context, inline agent
GitHub Copilot Workspace   Cloud agent       PR-level tasks, GitHub integration
Devin                      Autonomous agent  Full-session autonomy, web browsing
Windsurf (Codeium)         Agentic IDE       Flow-based agent, fast iteration

When agentic coding shines — and when it doesn't

Good fit:

  • Repetitive, well-defined tasks (add CRUD endpoints, write tests for existing code, migrate a library)
  • Debugging with a clear error message and isolated scope
  • Exploring an unfamiliar codebase to understand structure

Poor fit:

  • Architectural decisions that require deep business context
  • Tasks requiring access to systems the agent can't reach (production DBs, customer data)
  • Long-horizon tasks where the goal keeps changing — the agent needs stable objectives