2026-01-19

Tools Return Data, Agents Make Decisions

After writing "Features Don't Compose", I had five primitives. Clean. Atomic. But the agent story was incomplete.

An agent could call openpkg diff old.json new.json and get a wall of changes. But what it really wanted was: "are there breaking changes?" That required parsing the output, counting items, deciding what mattered. Logic that belonged in the tool, not the prompt.

The gap

The primitives were right. The outputs were wrong.

diff returned everything. But CI pipelines don't want everything — they want a gate. Breaking changes? Exit 1. No breaking changes? Exit 0. Simple.

docs generated markdown. But what if you're using Fumadocs? Or Nextra? Or some custom thing? The agent had no way to say "generate docs in this format" without code changes.

The tool was doing too much and not enough at the same time.

The refactor

I broke diff into three commands:

openpkg breaking old.json new.json   # exit 1 if breaking, exit 0 if not
openpkg semver old.json new.json     # { bump: "major" | "minor" | "patch" }
openpkg changelog old.json new.json  # markdown for release notes

Same underlying diff. Three different answers to three different questions. The agent doesn't parse output and decide — it asks the right question and gets a direct answer.

For docs, I added an adapter registry:

openpkg docs spec.json --adapter fumadocs -o ./docs

Now adding a new output format is registering an adapter, not forking the codebase.

filter

The real primitive that emerged was filter. Given a spec and criteria, return a filtered spec:

# By kind
openpkg filter spec.json --kind function
openpkg filter spec.json --kind type,interface

# By name or search
openpkg filter spec.json --name "createDocs,loadSpec"
openpkg filter spec.json --search "parse"

# By metadata
openpkg filter spec.json --deprecated
openpkg filter spec.json --has-description
openpkg filter spec.json --missing-description

# Just counts
openpkg filter spec.json --kind function --summary

The agent can filter before rendering, filter before diffing, filter before anything. One primitive unlocks a dozen workflows I never designed for.

The philosophy

I kept coming back to a simple test: who decides?

If the tool decides, it's a feature - and features don't compose. If the agent decides based on data the tool returns, it's a primitive - and primitives compose.

breaking doesn't decide what counts as breaking — the spec package defines that. But it does give a clean yes/no that an agent can act on. diagnostics doesn't decide if missing descriptions are acceptable — it just reports them. The agent (or the human, or the CI config) decides what to do with that information.

Tools return data. Agents make decisions.