Agent Checklist

Manifesto v1.0

Principles for building agents that are durable, inspectable, and safe to evolve.

Cite / Reference

If you use or adapt this checklist, a link back to the repo is appreciated.

Index

01

Declare Agent Identity

Define domain and boundaries

An agent is an actor with a defined domain. Give it a stable identity that establishes what it exists to do and the boundaries within which it operates. This includes a clear goal or charter, an explicit scope of responsibility, an owner, and an authority profile describing what actions are permitted without escalation.
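An identity like this can be made concrete as a small, immutable record. The sketch below is illustrative only: the field names (`charter`, `scope`, `owner`, `allowed_without_escalation`) and the example agent are assumptions, not a standard schema.

```python
from dataclasses import dataclass

# Illustrative agent identity record; all field names are assumptions.
@dataclass(frozen=True)
class AgentIdentity:
    name: str
    charter: str                               # what the agent exists to do
    scope: tuple[str, ...]                     # explicit responsibilities
    owner: str                                 # accountable human or team
    allowed_without_escalation: frozenset = frozenset()

    def may_act(self, action: str) -> bool:
        """True if the action is permitted without human escalation."""
        return action in self.allowed_without_escalation

# Hypothetical example agent.
triage = AgentIdentity(
    name="ticket-triage",
    charter="Label and route inbound support tickets.",
    scope=("classify", "route"),
    owner="support-platform-team",
    allowed_without_escalation=frozenset({"classify", "route"}),
)
```

Making the record frozen keeps the identity stable: changing the charter or authority profile means creating a new, reviewable version rather than mutating state in place.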

02

Version the Agent

Serialize, diff, and replay

Treat the agent as a versioned bundle: prompts, policies, tool contracts, memory schemas, planners, and evaluators. If you cannot serialize it, diff it, and replay it, you cannot debug it or claim meaningful progress.
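One minimal way to make a bundle serializable and diffable is canonical JSON plus a content hash. The bundle keys below are assumptions for illustration, not a prescribed layout.

```python
import hashlib
import json

# Sketch: serialize the bundle to canonical JSON so it can be hashed,
# diffed, and replayed. Bundle keys are illustrative assumptions.
def bundle_fingerprint(bundle: dict) -> str:
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

def bundle_diff(old: dict, new: dict) -> dict:
    """Keys whose values changed between two bundle versions."""
    keys = old.keys() | new.keys()
    return {k: (old.get(k), new.get(k))
            for k in keys if old.get(k) != new.get(k)}

v1 = {"prompt": "You triage tickets.", "model": "m-large", "planner": "react"}
v2 = {**v1, "model": "m-small"}
```

The fingerprint gives every run an attributable agent version; the diff makes "what changed between v1 and v2" a one-line question rather than an archaeology project.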

03

Make Hyperparameters Configurable

Control behavior with levers

Agent behavior should be controlled by configurable levers rather than code changes. These include model selection, decoding parameters, planning depth, reflection depth, retrieval settings, tool availability, and budget targets.
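In practice this means the levers live in a config object, and a variant is a new value rather than a code edit. The lever names and defaults below are assumptions for illustration.

```python
from dataclasses import dataclass, replace

# Assumed lever names; the point is that behavior changes via config,
# not code edits.
@dataclass(frozen=True)
class AgentConfig:
    model: str = "default-model"
    temperature: float = 0.2
    planning_depth: int = 2
    reflection_rounds: int = 1
    retrieval_top_k: int = 8
    max_cost_usd: float = 0.50

base = AgentConfig()
# A cheaper variant is a new config value, not a code change.
cheap = replace(base, model="small-model", planning_depth=1, max_cost_usd=0.05)
```

Because the config is a frozen value, it can be logged with every run and diffed across experiments, which pairs naturally with versioning the agent bundle.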

04

Run Under a Harness

Execute inside a standard harness

Agents must execute inside a standard harness that controls inputs, metrics, budgets, and tooling. The harness defines what counts as a run, how failures are handled, how retries work, and when execution terminates.
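A harness can be very small and still define what a run is. The sketch below assumes a step function that returns a result and a cost; the budget unit and retry policy are illustrative choices, not a standard.

```python
# Minimal harness sketch: it owns retries, budgets, and termination.
# The step signature and budget unit are illustrative assumptions.
def run_under_harness(step, max_attempts=3, budget=10):
    spent, attempts = 0, 0
    while attempts < max_attempts and spent < budget:
        attempts += 1
        try:
            result, cost = step()
            spent += cost
            return {"status": "ok", "result": result,
                    "attempts": attempts, "spent": spent}
        except RuntimeError:
            spent += 1  # failed attempts still consume budget
    return {"status": "exhausted", "attempts": attempts, "spent": spent}

# A simulated step that fails once, then succeeds.
calls = {"n": 0}
def flaky_step():
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient failure")
    return "answer", 3

outcome = run_under_harness(flaky_step)
```

Everything the paragraph names lives in one place: the attempt counter defines retries, the budget defines termination, and the returned dict defines what counts as a run.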

05

Externalize Context

Context lives outside the model

The context window is a materialized view of external context. Authoritative context lives outside the model in explicit, durable state such as files, databases, memories, plans, indexes, retrieved facts, and working sets.
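One way to read "materialized view" literally: the prompt is rebuilt from the durable store on each turn, never treated as the source of truth. The store shape and sections below are assumptions for illustration.

```python
# Sketch: authoritative state lives in an external store; the in-window
# view is rebuilt from it on demand. Store shape is an assumption.
store = {
    "plan": ["fetch logs", "summarize errors"],
    "facts": ["service X deployed at 09:00"],
    "working_set": ["error rate spiked after deploy"],
}

def materialize_context(store: dict, char_budget: int = 500) -> str:
    """Build the in-window view from durable state, trimming to budget."""
    lines = []
    for section, items in store.items():
        lines.append(f"## {section}")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)[:char_budget]

view = materialize_context(store)
```

Updating the store and re-materializing keeps the window disposable: a crashed or restarted agent rebuilds its context from durable state instead of losing it.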

06

Version Your Evals

Evaluators are artifacts, not opinions

Evaluators, rubrics, judge prompts, scoring functions, and datasets are artifacts, not opinions. Store and version them so changes in scores are attributable to agent changes rather than shifting measurement.
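A lightweight way to make measurement attributable is to hash the eval artifacts and stamp every score with that hash. The rubric and dataset shapes below are illustrative assumptions.

```python
import hashlib
import json

# Sketch: the eval is a versioned artifact; every score carries the
# artifact hash so measurement drift is detectable.
def eval_version(rubric: dict, dataset: list) -> str:
    blob = json.dumps({"rubric": rubric, "dataset": dataset}, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()[:8]

rubric = {"criteria": ["grounded", "concise"], "scale": 5}
dataset = [{"input": "q1", "expected": "a1"}]

def score(output_quality: float) -> dict:
    return {"score": output_quality,
            "eval_version": eval_version(rubric, dataset)}

r1 = score(4.0)
rubric["scale"] = 10  # changing the rubric changes the eval version
r2 = score(4.0)
```

Two identical scores with different `eval_version` values are not comparable, and the record now says so explicitly.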

07

Declare Guardrails

Define the non-negotiables

Guardrails are the agent's non-negotiables: termination rules, safety constraints, permission boundaries, schema validity requirements, grounding requirements, escalation rules, and data-handling constraints.
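Non-negotiables are easiest to audit when they are explicit, named predicates checked before any action. The rule names and action shape below are assumptions for illustration.

```python
# Sketch: guardrails as named, checkable rules evaluated before an
# action runs. Rule names and the action dict shape are assumptions.
GUARDRAILS = [
    ("max_steps", lambda a: a["step"] <= 20),
    ("no_write_outside_scope",
     lambda a: not (a["kind"] == "write" and not a["in_scope"])),
    ("schema_valid", lambda a: a["payload"] is not None),
]

def check_guardrails(action: dict) -> list:
    """Return the names of all violated guardrails (empty = allowed)."""
    return [name for name, ok in GUARDRAILS if not ok(action)]

violations = check_guardrails(
    {"step": 3, "kind": "write", "in_scope": False, "payload": {"x": 1}}
)
```

Because each rule has a name, a blocked action can be logged and escalated with the specific non-negotiable it violated, not just a generic refusal.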

08

Optimize With Budgets

Make tradeoffs visible

Budgets are part of the objective. Tokens, latency, tool calls, and cost must be measured and optimized alongside quality, or the system will drift toward behavior that is too slow or too expensive to run in production.
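Treating budgets as first-class means tracking them as counters during the run, not reconstructing them from bills afterward. The limits and categories below are illustrative assumptions.

```python
# Sketch: budgets as first-class counters checked during the run.
# The specific limits and categories are illustrative.
class Budget:
    def __init__(self, max_tokens=10_000, max_tool_calls=10, max_cost_usd=1.0):
        self.limits = {"tokens": max_tokens,
                       "tool_calls": max_tool_calls,
                       "cost_usd": max_cost_usd}
        self.spent = {k: 0 for k in self.limits}

    def charge(self, **amounts):
        for k, v in amounts.items():
            self.spent[k] += v

    def exceeded(self) -> list:
        return [k for k in self.limits if self.spent[k] > self.limits[k]]

b = Budget(max_tool_calls=2)
b.charge(tokens=1200, tool_calls=1, cost_usd=0.03)
b.charge(tokens=900, tool_calls=2, cost_usd=0.02)
```

A harness can poll `exceeded()` between steps and terminate or escalate, which is what makes the tradeoff visible at run time rather than in a postmortem.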

09

Make Decisions Traceable

Emit machine-readable decision records

Agents should emit a machine-readable decision record that makes outcomes inspectable without exposing private reasoning. This record should capture what was decided, what evidence was used, what external actions were taken, and what guardrails or approvals were involved.
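The four things the record should capture map directly onto fields. The sketch below uses assumed field names and a hypothetical example; note it carries evidence and actions, not private chain-of-thought.

```python
import json
from dataclasses import dataclass, field, asdict

# Sketch of a machine-readable decision record. Field names are
# assumptions; it deliberately excludes private reasoning.
@dataclass
class DecisionRecord:
    decision: str
    evidence: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    guardrails_checked: list = field(default_factory=list)
    approvals: list = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

# Hypothetical example record.
record = DecisionRecord(
    decision="route ticket #812 to billing",
    evidence=["ticket mentions invoice", "billing classifier score: 0.93"],
    actions=["router.assign(queue='billing')"],
    guardrails_checked=["scope", "schema_valid"],
)
parsed = json.loads(record.to_json())
```

Because the record is plain JSON, it can be indexed, queried, and diffed across runs, which is what "inspectable" means operationally.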

10

Package Tools as Components

Reusable with clear interfaces

Tools are reusable components. A packaged tool has a schema-defined interface, documented semantics, explicit permission scope, defined failure modes, timeouts, idempotency or retry behavior, and a harness-friendly mock or simulator.
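Each property in that list can be a field on the tool's package. The spec shape, tool name, and mock below are illustrative assumptions, not a standard tool-calling contract.

```python
from dataclasses import dataclass
from typing import Callable

# Sketch of a packaged tool: schema, permissions, timeout, idempotency,
# and a harness-friendly mock. All names are illustrative.
@dataclass
class ToolSpec:
    name: str
    input_schema: dict          # JSON-Schema-style contract
    permissions: frozenset
    timeout_s: float
    idempotent: bool
    run: Callable
    mock: Callable

def real_weather(args: dict) -> dict:
    raise NotImplementedError("network call; unavailable in the harness")

weather = ToolSpec(
    name="get_weather",
    input_schema={"type": "object", "required": ["city"]},
    permissions=frozenset({"network:read"}),
    timeout_s=5.0,
    idempotent=True,
    run=real_weather,
    mock=lambda args: {"city": args["city"], "temp_c": 21},
)

# In tests the harness swaps in the mock.
result = weather.mock({"city": "Oslo"})
```

Keeping `run` and `mock` side by side in the same spec is what makes the tool harness-friendly: evals exercise the agent's tool use without touching the real side effect.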

11

Enable Human Override

Interruptible, steerable, resumable

Agents must be operable in the real world. They should be interruptible, steerable, and resumable. Provide mechanisms to stop, pause, or approve steps, override tool choices, inject constraints, and hand off to a human with a clear audit trail.
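A minimal version of this is a run loop with an approval hook and an audit trail. The step shape and the "sensitive" flag are assumptions for illustration; the simulated human stands in for a real approval channel.

```python
# Sketch: an interruptible run loop with an approval hook and an audit
# trail. The step shape and approval policy are assumptions.
def run_steps(steps, approve, audit):
    """Execute steps; ask a human before any step marked sensitive."""
    for step in steps:
        if step.get("sensitive") and not approve(step):
            audit.append(("blocked", step["name"]))
            continue
        audit.append(("ran", step["name"]))
    return audit

steps = [
    {"name": "draft_reply"},
    {"name": "send_email", "sensitive": True},
]
audit = []
# A human (simulated here) rejects the sensitive step.
run_steps(steps, approve=lambda s: False, audit=audit)
```

The same hook generalizes to pausing, overriding tool choices, or injecting constraints: each is an external decision point the loop consults, with the outcome recorded in the audit trail.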

12

Treat Outputs as Training Data

Production feeds back into the system

Every run produces value beyond its immediate result. Persist outputs, intermediate artifacts, outcomes, and human edits so they can be reused for future training, preference learning, eval expansion, and regression prevention.
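Persisting runs can start as simply as appending a structured record per run and flagging the ones a human changed. The record shape and in-memory store below are assumptions for illustration.

```python
# Sketch: persist each run's artifacts so they can seed future evals
# and training sets. Record shape and in-memory storage are assumed.
run_log: list = []

def persist_run(inputs, output, human_edit=None):
    run_log.append({
        "inputs": inputs,
        "output": output,
        "human_edit": human_edit,
        "accepted": human_edit is None,
    })

persist_run({"ticket": "t1"}, "route to billing")
persist_run({"ticket": "t2"}, "route to sales", human_edit="route to billing")

# Human-edited runs become candidate eval or preference-learning cases.
eval_candidates = [r for r in run_log if not r["accepted"]]
```

The human-edited pairs are the highest-value artifact: each one is simultaneously a preference example, a new eval case, and a regression test for the behavior that was corrected.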