From Suggestions To Delegated Work
The word "agent" has become vague.
In software work, the useful definition starts with delegation.
An assistant gives the developer something to use: a completion, an explanation, a patch, a suggestion. The developer still carries the work.
An agent can participate in the work loop: read context, choose a tool action, edit files, run commands, observe results, and continue under constraints the developer can inspect.
The work surface is anything text: code, Markdown, JSON, YAML, issue text, command output. File type is not the line. The line is whether the system can operate across the relevant context, tools, artifacts, and feedback inside a controlled loop.
The first tools did not delegate the loop. They delegated small pieces around it.
Five Steps Into The Work Loop
Completion delegated text near the cursor. Around 2021, GitHub Copilot made inline code prediction a normal part of the editor. The delegated work was narrow: complete a function, suggest an API call, fill in boilerplate. The developer still chose the file, framed the function, accepted or rejected the suggestion, edited the result, ran the check, and owned the context.
Chat delegated explanation and draft artifacts. ChatGPT’s 2022 launch made natural-language software help visible at a new scale. A developer could describe a problem and get an explanation, example, or patch-like answer. The model could reason across a prompt, answer follow-up questions, and help untangle a bug from a pasted stack trace. The workflow still depended on manual transfer: context copied in, result copied out, mismatches resolved by hand, checks run by the developer.
IDE assistants delegated more editor context. In 2023 and 2024, AI moved deeper into the editor, and a wave of intelligent editors and IDE extensions formed around that shift. Cursor made the editor itself part of the AI surface, with codebase context and inline edits. Copilot Chat brought conversational assistance into widely used IDEs. Windsurf later pushed the same pattern toward an agentic IDE. They were visible markers of a broader category: tools that made open files, selected code, diagnostics, codebase indexes, terminal fragments, and explicit file mentions part of the conversation. The developer pasted less context, but still chose what to apply and verified the result.
Terminal agents delegated file edits, commands, and feedback. A terminal agent can read files, edit files, run commands, observe output, and continue the loop. The developer no longer has to copy a proposed patch into the repository or summarize a failing test back to the model. The work product changes from a suggestion to interpret into a process run and a diff to review.
Composable agents delegated repeatable workflows under project rules. The more recent shape is an extensible harness (the software around the model: context, tools, approvals, sandboxing, command execution, and stopping rules), not just a model with a terminal. Agents can follow project instructions, load custom skills, call approved commands, ask before risky steps, accept mid-flight steering, and delegate bounded side tasks to other agents. The developer defines the commands, constraints, review gates, and workflows that make the agent useful inside a real repository.
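Project instruction files such as AGENTS.md are plain Markdown that the harness loads into context. A hypothetical fragment might look like this (every rule below is an invented example for illustration, not a standard or a real project's file):

```markdown
# AGENTS.md (illustrative example)

- Use `pnpm` for all package operations; never run `npm install`.
- Do not edit files under `src/generated/`.
- Run `pnpm test` before proposing a diff.
- Ask before running any command that touches the network.
```

The value is durability: conventions that would otherwise be re-explained in every prompt become standing context for every session.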
Models Became Reliable Enough For The Loop
Delegated work needed model-side changes. Completion and chat helped, but they could not run a loop.
Instruction following improved. An agent must preserve constraints over several turns: do not touch generated files, keep the public API stable, use the repo’s package manager, stop when the next step needs unavailable infrastructure. GPT-4-era chat models made that kind of multi-turn constraint following more practical. Not perfect. Practical enough to build around.
Tool use became structured. Function calling and tool-use interfaces gave applications a way to describe actions as schemas. The model no longer had to write "run this command" as prose and wait for the user to do it. It could request a typed action such as reading a file, editing a file, or running a shell command. The harness could execute the action, capture the result, and feed that result into the next model turn.
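A minimal sketch of what this looks like on the harness side: a tool described as a schema, and a dispatcher that turns a typed action request into a result the next model turn can use. The tool name, schema shape, and result format here are illustrative assumptions, not any specific vendor's API:

```python
# Illustrative tool schema in the style of function-calling APIs.
# The name "read_file" and the result shape are assumptions for this
# sketch, not a particular provider's format.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": "Read a file from the repository.",
    "parameters": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

def execute_tool_call(call: dict) -> dict:
    """Harness-side dispatch: execute a typed action request and
    return a structured result instead of free-form prose."""
    if call["name"] == "read_file":
        try:
            with open(call["arguments"]["path"]) as f:
                return {"ok": True, "content": f.read()}
        except OSError as e:
            return {"ok": False, "error": str(e)}
    return {"ok": False, "error": f"unknown tool: {call['name']}"}
```

The key property is that both directions are structured: the model requests an action by name with typed arguments, and the harness returns a result object rather than text the developer has to relay by hand.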
Context windows grew. Software work needs project instructions, file excerpts, diffs, logs, previous decisions, and the recent transcript. GPT-4 Turbo’s 128K context window in 2023 was one public marker that task state could start to fit inside one session.
Repeated turns became cheaper and faster. Agents do not need one perfect answer. They need dependable next steps: observe, act, verify, adjust. That loop only becomes usable when the cost and latency of repeated turns are low enough that the harness can keep going without turning every test failure into a planning meeting.
The Harness Became Part Of The Product
Models alone did not create agents. The harness changed too.
The harness is the software around the model: tool schemas (structured descriptions of actions the model can request, including their input shapes and constraints), approval modes (policies for which agent actions run automatically and which require human approval), sandboxes (execution boundaries that limit which files, commands, and network resources the agent can reach), project instructions, command execution, file editing, transcript handling, and integrations. It is the trust boundary between suggestion and execution.
A harness decides which files are reachable, which commands require approval, which tools exist, how tool results are summarized, where project rules are loaded from, and when the agent must stop. This is where engineering judgment enters the system.
Tool schemas define the action surface. Approval modes define when the developer must intervene. Sandboxes define the reachable world. Project instruction files such as AGENTS.md and CLAUDE.md turn local conventions into durable context. CLIs and repo scripts let the agent use the same operational handles a developer already trusts. MCP gives tool vendors a more standard way to expose external systems.
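One way a harness can encode an approval mode is as a policy check that runs before every tool call. The categories, tool names, and allow-listed commands below are assumptions invented for this sketch, not any real tool's defaults:

```python
# Illustrative approval policy: which actions run automatically and
# which pause for the developer. All names here are hypothetical.
AUTO_APPROVE = {"read_file", "list_files"}      # read-only, low risk
REQUIRE_APPROVAL = {"write_file", "run_shell"}  # can mutate state

def needs_approval(tool_name: str, args: dict) -> bool:
    """Return True when the developer must confirm this action."""
    if tool_name in AUTO_APPROVE:
        return False
    if tool_name == "run_shell":
        # A shell tool can still allow-list known-safe commands.
        first_word = args.get("command", "").split(" ", 1)[0]
        return first_word not in {"git", "pytest", "ls"}
    return tool_name in REQUIRE_APPROVAL
```

The point of putting this in code is that the trust boundary becomes inspectable: the team can read, review, and change exactly which actions run unattended.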
The model may be impressive, but the harness determines whether impressive text becomes controlled action. Without that control layer, "agentic" becomes another way to ask a chat model for a shell command. With it, the model acts inside a bounded system that can be inspected, constrained, and reviewed.
Capability And Control Arrived Together
Earlier assistants could help, but the developer was still the execution layer. The developer selected files, pasted context, ran tests, summarized errors, copied patches, and kept task state in their head. The model could help with individual pieces of reasoning, but the workflow did not yet have a stable place to hold the loop.
Modern agents move more of that loop into a protocol. The session receives instructions and context. The model emits a tool call. The harness executes it under rules. The result returns to the model. The next turn uses that result. The developer steers, approves, interrupts, and reviews.
- session → instructions + context
- model → tool call
- harness → rules + execution
- result → observed output
- next turn → uses result
- developer → steers + reviews
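The protocol above can be sketched as a single loop. The `model` and `harness` objects here are stand-ins for a real model client and execution layer, and the message shapes and stop conditions are simplifying assumptions:

```python
# Minimal agent-loop sketch: session -> model -> harness -> result -> next turn.
# All interfaces are hypothetical placeholders, not a real framework's API.
def run_agent(model, harness, instructions, max_turns=20):
    transcript = [{"role": "system", "content": instructions}]
    for _ in range(max_turns):
        reply = model.next_turn(transcript)          # may contain a tool call
        transcript.append(reply)
        if reply.get("tool_call") is None:
            return transcript                        # model finished the task
        call = reply["tool_call"]
        if harness.needs_approval(call) and not harness.ask_developer(call):
            transcript.append({"role": "tool", "content": "denied by developer"})
            continue                                 # developer steers
        result = harness.execute(call)               # rules + execution
        transcript.append({"role": "tool", "content": result})
    return transcript                                # stopping rule: turn budget
```

Everything the section describes is visible in this shape: the harness executes under rules, the result feeds the next turn, the developer gates risky actions, and a turn budget is one explicit stopping rule.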
Capability made the model worth delegating to. Control made the delegation inspectable. Either half alone is not enough. A capable model without a harness is still mostly advice. A harness around an unreliable model becomes automation with poor judgment.
This is the practical framing for AI-first engineering. The unit of value is not the prompt. It is engineering judgment applied through a controlled loop: map the system, define the delegation, expose the right tools, constrain the risky actions, inspect the diff, and verify the result.
What The Word Agent Should Mean
An agent is not a chatbot with repository access. It is a controlled execution loop around a model.
That definition is testable. Can the system read relevant context, choose a tool action, execute it under constraints, observe the result, and continue without the developer manually carrying every step? If yes, it is agentic in the useful engineering sense. If not, it may still be a valuable assistant, but the work loop has not been delegated.
The developer does not disappear. The developer stops being the only execution layer. The work moves to task design, context selection, harness configuration, verification, and review.
The next question is mechanical: what actually happens inside the loop? The useful unit is not "AI writes code."
The useful unit is message, tool call, result, next turn. Once that mechanism is visible, delegated work becomes something a team can understand, standardize around, and control.
References
- GitHub, “Introducing GitHub Copilot: your AI pair programmer” (2021): https://github.blog/news-insights/product-news/introducing-github-copilot-ai-pair-programmer/
- OpenAI, “Introducing ChatGPT” (2022): https://openai.com/blog/chatgpt/
- GitHub, “GitHub Copilot X: The AI-powered developer experience” (2023): https://github.blog/news-insights/product-news/github-copilot-x-the-ai-powered-developer-experience/
- GitHub, “GitHub Copilot Chat now generally available for organizations and individuals” (2023): https://github.blog/news-insights/product-news/github-copilot-chat-now-generally-available-for-organizations-and-individuals/
- Cursor, “Codebase Context v1” changelog (2023): https://cursor.com/changelog/0-2-26
- Codeium, “Codeium Introduces the Windsurf Editor” (2024): https://www.webwire.com/ViewPressRel.asp?aId=329485
- OpenAI, “New models and developer products announced at DevDay” (2023): https://openai.com/index/new-models-and-developer-products-announced-at-devday/
- OpenAI, function calling guide: https://developers.openai.com/api/docs/guides/function-calling
- Anthropic, “Introducing the Model Context Protocol” (2024): https://www.anthropic.com/news/model-context-protocol