Tools Return Data, Agents Make Decisions
After writing "Features Don't Compose", I had five primitives. Clean. Atomic. But the agent story was incomplete.
An agent could call openpkg diff old.json new.json and get a wall of changes. But what it really wanted was: "are there breaking changes?" That required parsing the output, counting items, deciding what mattered. Logic that belonged in the tool, not the prompt.
The gap
The primitives were right. The outputs were wrong.
diff returned everything. But CI pipelines don't want everything — they want a gate. Breaking changes? Exit 1. No breaking changes? Exit 0. Simple.
docs generated markdown. But what if you're using Fumadocs? Or Nextra? Or some custom thing? The agent had no way to say "generate docs in this format" without code changes.
The tool was doing too much and not enough at the same time.
The refactor
I broke diff into three commands:
openpkg breaking old.json new.json # exit 1 if breaking, exit 0 if notopenpkg semver old.json new.json # { bump: "major" | "minor" | "patch" }openpkg changelog old.json new.json # markdown for release notes
Same underlying diff. Three different answers to three different questions. The agent doesn't parse output and decide — it asks the right question and gets a direct answer.
For docs, I added an adapter registry:
openpkg docs spec.json --adapter fumadocs -o ./docs
Now adding a new output format is registering an adapter, not forking the codebase.
filter
The real primitive that emerged was filter. Given a spec and criteria, return a filtered spec:
# By kindopenpkg filter spec.json --kind functionopenpkg filter spec.json --kind type,interface# By name or searchopenpkg filter spec.json --name "createDocs,loadSpec"openpkg filter spec.json --search "parse"# By metadataopenpkg filter spec.json --deprecatedopenpkg filter spec.json --has-descriptionopenpkg filter spec.json --missing-description# Just countsopenpkg filter spec.json --kind function --summary
The agent can filter before rendering, filter before diffing, filter before anything. One primitive unlocks a dozen workflows I never designed for.
The philosophy
I kept coming back to a simple test: who decides?
If the tool decides, it's a feature - and features don't compose. If the agent decides based on data the tool returns, it's a primitive - and primitives compose.
breaking doesn't decide what counts as breaking — the spec package defines that. But it does give a clean yes/no that an agent can act on. diagnostics doesn't decide if missing descriptions are acceptable — it just reports them. The agent (or the human, or the CI config) decides what to do with that information.
Tools return data. Agents make decisions.