npm - @fateforge/archery-cli - Versions diffs - 1.0.3 - Mend

@fateforge/archery-cli 1.0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (30) hide show

package/.agent/AGENT.md +59 -0
package/.agent/AGENT_zh.md +59 -0
package/.agent/CLI-SPEC.md +760 -0
package/.agent/CLI-SPEC_zh.md +700 -0
package/.agent/SEC-SPEC.md +142 -0
package/.agent/SEC-SPEC_zh.md +126 -0
package/.agent/SKILL-SPEC.md +199 -0
package/.agent/SKILL-SPEC_zh.md +195 -0
package/AGENTS.md +9 -0
package/CHANGELOG.md +104 -0
package/CODE_OF_CONDUCT.md +35 -0
package/CODE_OF_CONDUCT_zh.md +35 -0
package/CONTRIBUTING.md +127 -0
package/LICENSE +21 -0
package/NOTICE.md +8 -0
package/README.md +134 -0
package/README_zh.md +134 -0
package/SECURITY.md +51 -0
package/docs/COMPATIBILITY.md +48 -0
package/docs/E2E.md +31 -0
package/docs/LIVE-SMOKE-EVIDENCE.md +160 -0
package/docs/OPEN_SOURCE_CHECKLIST.md +43 -0
package/docs/SECURITY-TIER.md +34 -0
package/package.json +55 -0
package/scripts/run.js +46 -0
package/skills/archery-cli/SKILL.md +302 -0
package/skills/archery-cli/reference/instance.md +226 -0
package/skills/archery-cli/reference/query.md +231 -0
package/skills/archery-cli/reference/workflow.md +194 -0
package/skills/archery-cli/test-prompts.json +27 -0

package/.agent/CLI-SPEC.md ADDED Viewed

@@ -0,0 +1,760 @@
+# Agent-Facing CLI Design Spec
+This document defines the machine contract a CLI must honor when called by an AI agent. The goal: agents can call it reliably, parse it reliably, recover and retry reliably, and never block or mis-write in a non-interactive setting.
+## 1. Core rules
+1. stdout is the contract: emit a single valid JSON document by default; no logs, progress, prompts, or color codes mixed in.
+2. stderr is the side channel: progress, warnings, debug, and error explanations all go to stderr.
+3. Machine-first: default `--format json`; `text` is for humans only; `raw` is for raw bytes, logs, diffs passed through verbatim.
+4. Non-interactive safe: write operations must not wait on keyboard input; use `--dry-run` + `--confirm <token>`.
+5. Deterministic: same input produces the same output structure; field names, field order, and schema version stay stable.
+6. Least surprise: queries don't change state; a write with no valid confirm token must fail rather than proceed.
+7. Recoverable: error codes, exit codes, and `retryable` must be stable enough for an agent to decide retry, back off, or ask the user.
+## 2. Global flags
+| Flag | Meaning |
+|------|---------|
+| `--format json/text/raw` | Output format, default `json` |
+| `--json` | Compatibility alias for `--format json`; not recommended for new calls |
+| `--fields <a,b,c>` | Return only selected fields, reduces tokens (query commands) |
+| `--compact` | Compact JSON output, strips redundant whitespace (query commands) |
+| `--dry-run` | Simulate a write, return a change preview and `confirm_token` |
+| `--confirm <token>` | Carry the dry-run token to actually execute the write |
+| `--quiet` | Suppress progress/prompts on stderr, never suppress errors |
+`update` commands may add tool-specific flags such as `--target-version` or
+`--channel`, but they must keep the common lifecycle flags: `--check`,
+`--dry-run`, and `--confirm <token>`.
+Format responsibilities:
+- `json`: structured machine output, the default, and the only format recommended for agents.
+- `text`: human-readable, may change, must not be parsed programmatically.
+- `raw`: unwrapped bytes / log / diff, passed through verbatim, no JSON envelope.
+## 3. Unified output envelope
+Success and failure share one shape. The agent only needs to check `ok` first.
+Success:
+```json
+{
+  "ok": true,
+  "schema_version": "1.0",
+  "data": {},
+  "meta": {
+    "duration_ms": 0
+  }
+}
+```
+Failure:
+```json
+{
+  "ok": false,
+  "schema_version": "1.0",
+  "error": {
+    "code": "E_NOT_FOUND",
+    "message": "human readable message",
+    "details": {},
+    "retryable": false
+  },
+  "meta": {
+    "duration_ms": 0
+  }
+}
+```
+Conventions:
+- Every JSON response must include `ok` and `schema_version`.
+- `data` is always the command's business payload; do not hoist business fields to the envelope top level.
+- `error.code` is a stable semantic enum, prefixed with `E_`.
+- `error.message` is for humans; agents should not parse it.
+- `error.details` holds structured context; must be redacted.
+- `error.retryable` tells the agent whether it may back off and retry automatically.
+- `meta.duration_ms` records command execution time. `meta` is always emitted on
+  every response (success and error); do not mark it `omitempty`, since
+  `duration_ms: 0` is a valid value the agent should always be able to read.
+- A breaking schema change must bump the `schema_version` major version.
+## 4. stdout / stderr rules
+- In `json` mode, stdout may contain only one JSON document, or NDJSON for explicitly streaming commands.
+- stderr may carry progress, warnings, and diagnostics.
+- On error in `json` mode, the failure envelope is that single JSON document on stdout — agents always parse stdout and check `ok`, never scrape stderr. stderr may add human-readable explanation.
+- `--quiet` may only suppress non-error info on stderr.
+- No banners, prompts, progress bars, or color codes before/after the JSON on stdout.
+- stdout / stderr are always **UTF-8 encoded, no BOM**, newline `\n`, so agents parse reliably across platforms (especially Windows).
+## 5. Streaming output (NDJSON)
+Large output, log streams, subscription streams, and per-item batch results use NDJSON. Each line must be an independent valid JSON object — easy to consume streaming, low memory, interruptible:
+```jsonl
+{"ok":true,"schema_version":"1.0","type":"item","data":{}}
+{"ok":true,"schema_version":"1.0","type":"item","data":{}}
+{"ok":true,"schema_version":"1.0","type":"summary","data":{"count":2}}
+```
+Conventions:
+- Normal queries use a single JSON envelope by default.
+- Use NDJSON only when the command is explicitly a log / stream / subscribe / batched-stream.
+- NDJSON lines must include `ok`, `schema_version`, `type`.
+- The final line should use `type: "summary"`.
+- True binary or plain-text passthrough goes through `--format raw`, not wrapped into one giant JSON.
+## 6. Exit code table
+| Code | Meaning | Agent behavior |
+|------|---------|----------------|
+| 0 | Success | continue |
+| 1 | Generic error | read the error envelope to decide |
+| 2 | Argument/usage error | don't retry, fix args |
+| 3 | Resource not found | don't retry |
+| 4 | Permission/auth/config failure | don't retry, surface credentials or permission |
+| 5 | Confirmation required but token missing | run dry-run for a token, then retry |
+| 6 | Precondition conflict or invalid token | re-read state, then retry |
+| 7 | Retryable transient error (network/rate-limit/server) | back off and retry |
+| 8 | Timeout | back off and retry |
+| 9 | Human action required (see §15.3, optional) | relay to the user, run `resume` once done |
+Error codes and exit codes must align:
+- `E_USAGE` / `E_VALIDATION` -> 2
+- `E_NOT_FOUND` -> 3
+- `E_AUTH` / `E_FORBIDDEN` / `E_CONFIG` -> 4
+- `E_CONFIRMATION_REQUIRED` -> 5
+- `E_CONFLICT` -> 6
+- `E_NETWORK` / `E_RATE_LIMITED` / `E_SERVER` -> 7
+- `E_TIMEOUT` -> 8
+- `E_HUMAN_REQUIRED` -> 9 (optional, only when §15.3 is enabled)
+When the failure comes from an upstream HTTP call, map the status onto the
+taxonomy so the agent can tell failure modes apart from `error.code` +
+`retryable` — do NOT collapse every 4xx into `E_NETWORK`:
+- `401` -> `E_AUTH`
+- `403` -> `E_FORBIDDEN`
+- `404` -> `E_NOT_FOUND`
+- `408` -> `E_TIMEOUT` (retryable)
+- `409` -> `E_CONFLICT`
+- `429` -> `E_RATE_LIMITED` (retryable)
+- `5xx` -> `E_SERVER` (retryable)
+- connection refused / DNS / reset -> `E_NETWORK` (retryable)
+Map by the upstream's own error TYPE/status where available, not by sniffing
+the human-readable message text (substring matching misclassifies messages that
+merely contain words like "not found"). Keep this mapping in ONE function so the
+status->code->exit contract cannot drift between the output layer and the
+command layer. Codes that are declared but never reachable should be annotated
+as reserved so an agent does not plan for a branch that cannot occur.
+## 7. Write flow (dry-run -> confirm)
+A write command must first support `--dry-run`, returning a preview and a token:
+```json
+{
+  "ok": true,
+  "schema_version": "1.0",
+  "data": {
+    "preview": {
+      "changes": [
+        {
+          "action": "delete",
+          "resource": "mail",
+          "id": "123",
+          "before": {},
+          "after": null
+        }
+      ]
+    },
+    "confirm_token": "ct_9f2a...",
+    "expires_at": "2026-06-05T12:00:00Z"
+  },
+  "meta": {
+    "duration_ms": 0
+  }
+}
+```
+The second step carries the token to execute:
+```bash
+tool resource delete --id 123 --confirm ct_9f2a...
+```
+Confirm-token conventions:
+- The token must bind a hash of the operation content: command path, args, target resource ID, calling account, permission context.
+- The hash must be keyed (HMAC) with a machine-local secret (e.g. `~/.<tool>/confirm.secret`, created on first use, `0600`), so a token cannot be fabricated by recomputing a public hash — it must come from a real `--dry-run` on the same machine.
+- When a resource version is available, also bind it (version, etag, changekey, or updated_at) to prevent state drift.
+- The token must expire; `expires_at` is ISO 8601 UTC.
+- On expiry, changed args, or changed target state, execution returns `E_CONFLICT`, exit code 6.
+- With no token, return `E_CONFIRMATION_REQUIRED`, exit code 5.
+- dry-run must not cause external side effects, but may read state to build the preview.
+## 8. Query, pagination, and field selection
+Query commands support, by default:
+- `--fields <a,b,c>`: return only selected fields; when dotted paths are supported, declare it in reference.
+- `--compact`: strip JSON whitespace.
+- `--limit`: cap the number of returned items.
+- `--cursor` or `--offset`: pagination cursor or offset.
+Suggested pagination shape:
+```json
+{
+  "items": [],
+  "count": 0,
+  "next_cursor": null,
+  "has_more": false
+}
+```
+For offset-based upstreams, echo `offset` and return an explicit `next_offset`
+(the value to pass next, present only while `has_more` is true) so the agent
+pages deterministically instead of re-deriving `offset + count`:
+```json
+{
+  "items": [],
+  "count": 0,
+  "offset": 0,
+  "next_offset": 20,
+  "has_more": true
+}
+```
+When a list is silently capped (e.g. an auto-paginate ceiling), surface
+`truncated: true` rather than returning a short list that looks complete.
+Conventions:
+- All IDs are strings, even if numeric underneath.
+- All times are ISO 8601 UTC.
+- List order must be stable; declare the default sort in reference.
+- Query commands must not fall into an interactive prompt just because an optional filter is missing.
+## 9. Idempotency and concurrency safety
+Write commands should support idempotent semantics where possible:
+- Create-type commands should support `--request-id` or `--idempotency-key`. Where the upstream
+  honors an idempotency header (e.g. GitLab's `Idempotency-Key`), forward it; bind the key into the
+  confirm scope so the token matches only that key.
+- Retrying the same idempotency key must not create duplicate resources.
+- Update/delete commands should record the target resource version during dry-run.
+- If a version change is detected at confirm time, return `E_CONFLICT`.
+- **Confirm tokens are single-use.** Once a token has been accepted to execute a write, record its
+  fingerprint (e.g. under `~/.<tool>/confirm-consumed.json`, pruned by expiry) and reject any replay
+  with `E_CONFLICT` ("token already used; re-run `--dry-run`"). This gives agents safe-retry: a
+  confirmed write that times out cannot be blindly re-sent — the retry is rejected and re-running
+  `--dry-run` reveals the now-current state. This is the universal safe-retry mechanism for upstreams
+  that expose no resource version to bind. Mark consumed BEFORE the write executes (a crash mid-write
+  conservatively blocks the replay rather than risking a duplicate). A storage failure must degrade
+  gracefully and never block the write.
+- Batch writes should return per-item results; don't hide other items' status because one failed.
+Suggested batch-write result:
+```json
+{
+  "results": [
+    {
+      "id": "1",
+      "ok": true,
+      "action": "deleted"
+    },
+    {
+      "id": "2",
+      "ok": false,
+      "error": {
+        "code": "E_NOT_FOUND"
+      }
+    }
+  ],
+  "summary": {
+    "ok_count": 1,
+    "error_count": 1
+  }
+}
+```
+## 10. Sensitive data and auditing
+- password, token, secret, authorization header, cookie must not appear in stdout, stderr, error.details, or the audit log.
+- dry-run previews must redact sensitive fields.
+- reference/context/doctor must not leak plaintext credentials.
+- context may report whether credentials exist, but only as a boolean or redacted summary.
+- The audit log should record command path, redacted args, calling account, time, exit code, duration.
+- `--quiet` must not disable auditing.
+## 11. Self-description commands (reference / context / doctor / changelog)
+### reference
+Declares the tool's capabilities, commands, params, output schema, error codes, and permission levels, so an agent understands the tool first.
+Each command's `output_schema` MUST be machine-usable, not a stub. Use a string label that
+resolves to an entry in a top-level `schemas` catalog: `{ "shape": "object"|"array", "fields":
+[...], "untrusted_fields": [...] }`, with the field list enumerated from the command's actual
+returned data (the flatten structs / `*ToMap` builders) and `untrusted_fields` listing the
+attacker-controllable keys. Each command SHOULD also carry `examples`: one runnable invocation
+(write commands show the `--dry-run` then `--confirm` pair, dangerous commands include
+`--dangerous`). A guard test SHOULD assert every leaf command resolves to a non-empty schema and has
+at least one example, so `reference` cannot silently regress to a stub.
+```json
+{
+  "ok": true,
+  "schema_version": "1.0",
+  "data": {
+    "tool": "tool-name",
+    "version": "1.0.0",
+    "release_readiness": {
+      "level": "beta",
+      "fcc_required": true,
+      "fcc_status": "verified",
+      "mock_upstream_required": true,
+      "mock_upstream_status": "verified",
+      "live_smoke_required_for_stable": true,
+      "live_smoke_status": "missing",
+      "reason": "Stable requires recorded live smoke/E2E evidence.",
+      "required_evidence": [
+        "functional_contract_coverage_100",
+        "mock_upstream_contract_tests",
+        "recorded_live_smoke_for_stable"
+      ]
+    },
+    "commands": [
+      {
+        "path": "resource delete",
+        "type": "write",
+        "description": "Delete a resource",
+        "params": [
+          {
+            "name": "id",
+            "type": "string",
+            "required": true,
+            "multiple": false
+          }
+        ],
+        "output_schema": "deleted_resource",
+        "examples": [
+          "<tool> resource delete <id> --dry-run --compact",
+          "<tool> resource delete <id> --confirm <confirm_token> --compact"
+        ]
+      }
+    ],
+    "schemas": {
+      "deleted_resource": {
+        "shape": "object",
+        "fields": ["id", "status"],
+        "untrusted_fields": []
+      }
+    },
+    "exit_codes": {}
+  },
+  "meta": {
+    "duration_ms": 0
+  }
+}
+```
+`release_readiness` is the machine-readable release gate. It must appear in
+`reference` for every AI-native CLI:
+- `level`: `stable`, `beta`, or `unpublishable`.
+- `stable`: FCC is 100%, mock upstream/contract tests cover external behavior,
+  and at least one recorded live smoke/E2E run exists for the release candidate.
+- `beta`: FCC is 100% and mock upstream/contract tests exist, but live
+  smoke/E2E evidence is missing or explicitly not available yet.
+- `unpublishable`: any public behavior lacks command-level coverage, or mock
+  upstream/contract tests cover only happy paths while failure/auth/pagination/
+  empty/rate-limit behavior remains untested.
+- `fcc_status`, `mock_upstream_status`, and `live_smoke_status` use
+  `verified`, `missing`, `not_applicable`, or `unknown`; `stable` may not use
+  `missing` or `unknown` for any required item.
+- `required_evidence[]` names the evidence an agent or release script should
+  inspect before trusting the level.
+### context
+Reports the current runtime, config, target, and credential status.
+```json
+{
+  "ok": true,
+  "schema_version": "1.0",
+  "data": {
+    "env": "prod",
+    "account": "user@example.com",
+    "config": {},
+    "credentials": {
+      "configured": true
+    }
+  },
+  "meta": {
+    "duration_ms": 0
+  }
+}
+```
+### doctor
+Environment and risk check-up; each item gives an actionable fix.
+```json
+{
+  "ok": true,
+  "schema_version": "1.0",
+  "data": {
+    "checks": [
+      {
+        "check": "auth",
+        "status": "pass",
+        "fix": null
+      },
+      {
+        "check": "network",
+        "status": "fail",
+        "fix": "set HTTP_PROXY or check VPN"
+      },
+      {
+        "check": "release_readiness",
+        "status": "warn",
+        "fix": "record live smoke/E2E evidence before declaring stable"
+      }
+    ]
+  },
+  "meta": {
+    "duration_ms": 0
+  }
+}
+```
+`doctor` must include `check: "release_readiness"` with the same release level
+reported by `reference`. Use `pass` for `stable`, `warn` for intentional `beta`,
+and `fail` for `unpublishable` or for a declared `stable` state with missing
+evidence. The check should include an actionable `fix` when the status is not
+`pass`.
+### changelog
+Reports **what changed between versions** so an agent that just self-updated can refresh its knowledge instead of reusing stale patterns. This is the time-axis complement to `reference` (which describes current capabilities).
+```bash
+tool changelog                    # all version changes
+tool changelog --since 1.0.3      # only versions newer than 1.0.3
+```
+```json
+{
+  "ok": true,
+  "schema_version": "1.0",
+  "data": {
+    "current_version": "1.1.0",
+    "since": "1.0.3",
+    "entries": [
+      {
+        "version": "1.1.0",
+        "date": "2026-06-07",
+        "changes": {
+          "added": [
+            "..."
+          ],
+          "changed": [
+            "..."
+          ],
+          "fixed": []
+        }
+      }
+    ]
+  },
+  "meta": {
+    "duration_ms": 0
+  }
+}
+```
+Conventions:
+- **Single source of truth**: `changelog` output is derived from `CHANGELOG.md` (embedded into the binary at build time by `## [version]` section); no separate data maintained. Same source as release notes.
+- `--since <version>` returns only entries strictly newer than that version, for an agent that "last saw version X" to pull the delta.
+- Change categories follow Keep a Changelog: `added` / `changed` / `fixed` / `deprecated` / `removed` / `security`.
+- After a successful self-update, the tool should hint the agent to run `changelog --since <old version>` (see §14).
+## 12. Command design conventions
+1. Use the shortest command that completes a clear task; reduce combinatorial complexity.
+2. Query commands support `--fields` and `--compact` by default to cut tokens.
+3. Write commands must support `--dry-run` and `--confirm`.
+4. Naming uses `<noun> <verb>` or `<verb> <noun>` style, consistent across the tool.
+5. Don't require agents to parse help text; `--help` is for humans, machine capability is exposed via `reference`.
+6. All times ISO 8601 UTC; all IDs strings.
+7. On failure, return a structured error rather than a half-finished success payload.
+8. Avoid ambiguous params; booleans are flags, enums are bounded choices.
+## 13. Functional contract coverage and release gate
+Functional Contract Coverage (FCC) is the release blocker: every public behavior
+an agent can rely on must have automated command-level test coverage. Numeric
+line or branch coverage is useful engineering telemetry, but it is secondary and
+must not be used as a substitute for missing functional contract tests.
+A public functional contract is anything declared in:
+- `README.md` / `README_zh.md`, `SKILL.md`, or Skill reference pages;
+- `tool reference`, `--help`, `context`, `doctor`, `changelog`, or `update`
+  output;
+- JSON envelope fields, command output schemas, global flags, error codes, exit
+  codes, retryability, and stdout/stderr behavior;
+- documented config/env variables, credential handling, write safety, update
+  verification, Skill sync, and `_untrusted` security guarantees.
+Required coverage for each public command or contract:
+- success path;
+- missing/invalid arguments;
+- missing config, missing auth, or permission failure when applicable;
+- upstream API failure, network failure, rate limit, or timeout when applicable;
+- JSON envelope shape, output schema, exit code, and stderr/stdout boundary;
+- non-interactive behavior: no prompts, no blocking, and write commands use
+  `--dry-run` -> `--confirm <token>`;
+- regression test for every bug fix that changes observable behavior.
+What `FCC = 100%` means:
+- every command/flag/output/error behavior listed in the public contract is
+  mapped to at least one automated test, or explicitly marked non-applicable;
+- command-level tests validate the CLI boundary, not just internal helpers;
+- broad generated code, version constants, build metadata, or unreachable
+  platform guards may be excluded from numeric coverage, but not from FCC if
+  they are documented public behavior;
+- a release cannot be tagged while known FCC gaps remain;
+- `fcc_status: "verified"` must be machine-backed by an enumeration guard
+  test: enumerate every leaf command from live `reference` output and assert
+  each one is invoked by at least one command-level test. The guard skips
+  while the status is honestly declared `missing`, and fails if the claim is
+  flipped to `verified` without the coverage (the template ships this guard).
+CI should run the unit and command-level suites for every PR. Numeric coverage
+thresholds may ratchet upward over time per repository, but the release standard
+is absolute: public functional contracts must be covered.
+### Release readiness levels
+Release readiness is deliberately stricter than "tests pass":
+- **Stable**: FCC is 100%; mock upstream/contract tests cover success,
+  validation, config/auth/permission failures, upstream/network/rate-limit/
+  timeout failures, empty results, pagination, output schema, exit codes, and
+  stdout/stderr boundaries; at least one live smoke/E2E run has been recorded
+  for the release candidate.
+- **Beta**: FCC is 100% and mock upstream/contract tests meet the same
+  behavioral breadth, but recorded live smoke/E2E evidence is missing or the
+  project explicitly declares that live E2E is not available yet.
+- **Unpublishable**: any public command/flag/output/error behavior lacks
+  command-level coverage, or mock upstream tests only cover happy paths.
+`reference.release_readiness` and `doctor.checks[]` are the machine-readable
+surface for this gate. A repository may choose not to publish `beta` artifacts,
+but it must not describe itself as `stable` without the live evidence above.
+## 14. Versioning and compatibility
+- `schema_version` is the output schema version, not the tool version.
+- A breaking schema change bumps the major version, e.g. `1.x` -> `2.0`.
+- Non-breaking added fields may keep the major version.
+- Deprecated fields should keep a compatibility window and be marked deprecated in reference.
+- Compatibility aliases may exist but should not be the recommended usage in new docs.
+- Agents should rely on `reference`, not `--help` or README.
+### Version negotiation (tool version ↔ Skill expectation)
+A Skill is a snapshot of the capabilities the day it was written; once the binary version drifts, things misalign: a Skill written for v1.1 against a v1.0 binary will silently call commands that don't exist.
+- The tool must report its own version: `tool --version` and `context.data.version`.
+- The Skill declares a minimum compatible version in frontmatter (see SKILL-SPEC `requires.min_version`).
+- `doctor` should include a check "does the current version meet the declared minimum"; if not, give a `fix` (upgrade command), status `fail`.
+### Self-update and Skill-sync loop
+For tools with `self-update`, after a successful update they **must close both
+refresh loops**:
+1. the binary/package is current;
+2. the bundled Agent Skill directory is current, with the same end state as
+   running `npx skills add <repo> -y -g`.
+The user-facing Skill install command stays `npx skills add ...`; the binary
+must not expose a separate `install-skill` command. During update, however, the
+tool owns the full lifecycle and must either sync the entire `skills/<tool>/`
+directory or return an explicit `skill_sync_status` and `skill_sync_command`
+that the agent can execute before using new behavior.
+Required update contract:
+- `update --check` is read-only. It reports current/target versions,
+  install method, whether a binary/package update is available, whether Skill
+  sync is needed or supported, and signature/checksum availability.
+- `update --dry-run` returns a preview with every local change: binary/package
+  update, Skill directory sync, signature/checksum verification, and the
+  `confirm_token`.
+- `update --confirm <token>` performs the update. It must verify release
+  integrity before replacing files or running a package manager update.
+- If update succeeds, return `previous_version`, `current_version`,
+  `skill_sync_status`, and enough verification metadata for the agent to audit
+  what happened.
+- If binary/package update succeeds but Skill sync fails, return a non-success
+  or partial-success status with `skill_sync_command`; the agent must not use
+  newly documented behavior until the Skill sync has completed.
+Version notification contract:
+- `update --check` actively checks the latest release and refreshes the local
+  update notice cache.
+- `doctor` may actively check with a short timeout; network failure must not
+  make `doctor` fail by itself.
+- `context` and `--help` only read the local cache and must not contact remote
+  registries or GitHub.
+- When an update is available, JSON command data includes `notices[]` with
+  `type: "update_available"`, current/latest versions, install method,
+  `recommended_command`, release URL when known, checked-at timestamp, and
+  machine-readable next steps. Text/help output may append one concise hint.
+Release verification baseline:
+- Verify the archive/package against `checksums.txt`; checksum mismatch, missing
+  checksum file, or missing archive entry fails closed.
+- Signed releases should sign `checksums.txt` with Sigstore/Cosign keyless
+  signing from the tagged GitHub Actions release workflow. Verifiers should
+  validate the bundle against the expected repository workflow identity and the
+  GitHub OIDC issuer.
+- Update results carry `signature_status` (a short string describing where
+  release integrity verification happened: e.g. `verified`, `not_checked`,
+  `handled_by_npm_installer`, `manual_release_verification_required`) and
+  `signature_verified` (true only when local Sigstore verification actually
+  ran and succeeded). Never imply checksum verification is a signature.
+- After `update --confirm <token>` succeeds, return `previous_version` and `current_version` in `data`.
+- Also hint in the result: `run "changelog --since <previous_version>" to see what changed`.
+- Agent convention: after self-update, before continuing, read `changelog --since <old version>` (see the SKILL-SPEC recipe).
+## 15. Optional patterns (enable as needed)
+These three patterns are **not for everyone**: implement them if your tool needs them, ignore them otherwise — zero overhead. They let the spec scale with tool complexity — a simple tool stays light, a complex tool need not reinvent the wheel. Each is marked "when applicable."
+### 15.1 Credential lifecycle (when tokens expire)
+**When applicable**: credentials are not static but expire / need refresh — OAuth access_token (WeChat Official Account ~2h), cookie / session (Xiaohongshu), temporary STS credentials, etc. Tools with static username/password skip this section.
+- Beyond "is it configured," `context.data.credentials` should report **validity and expiry** (redacted):
+  ```json
+  {
+    "credentials": {
+      "configured": true,
+      "valid": true,
+      "expires_at": "2026-06-07T12:00:00Z",
+      "refreshable": true
+    }
+  }
+  ```
+- When a token is expired and cannot auto-refresh, the operation returns `E_AUTH` (exit 4), with `details` indicating re-auth is needed.
+- Tools that can auto-refresh should do so **transparently**, not bothering the agent; degrade to `E_AUTH` only if refresh fails.
+- `doctor` adds a `check: "credentials"` item; for near-expiry give `warn` + a renew `fix`.
+- Refresh tokens and secrets are always redacted — never in stdout / stderr / details.
+### 15.2 Async job lifecycle (long jobs: submit -> poll -> fetch result)
+**When applicable**: the operation can't return a result synchronously — async SQL execution / approval (Archery), bulk send, scrape/crawl jobs, large exports. Commands that return results synchronously skip this section.
+- The submit command returns a `job_id` and status immediately, without blocking:
+  ```json
+  {
+    "ok": true,
+    "schema_version": "1.0",
+    "data": {
+      "job_id": "job_abc123",
+      "status": "pending",
+      "poll": "tool job status --id job_abc123",
+      "result": "tool job result --id job_abc123"
+    },
+    "meta": { "duration_ms": 12 }
+  }
+  ```
+- Status queries return a stable enum: `pending` / `running` / `succeeded` / `failed` / `cancelled`, with progress (e.g. `progress`, `eta_seconds`).
+- Result and status are fetched separately: after `succeeded`, use the `result` command to pull data (large results via NDJSON / `--format raw`).
+- A `failed` result uses the standard error envelope; `retryable` indicates whether the whole job can be retried.
+- Submission of a write-type long job still goes through `dry-run → confirm`; the `job_id` is created only after confirm.
+### 15.3 Human-in-the-loop checkpoints (when a human must scan / solve captcha / approve)
+**When applicable**: a step mid-flow must be completed by a human — QR login / captcha (Xiaohongshu), approver sign-off (Archery), secondary confirmation. Fully automated tools skip this section.
+- When stuck at a human step, **don't block, don't guess** — return a dedicated signal so the agent hands off to the user:
+  ```json
+  {
+    "ok": false,
+    "schema_version": "1.0",
+    "error": {
+      "code": "E_HUMAN_REQUIRED",
+      "message": "Scan the QR code to continue",
+      "details": { "action": "scan_qr", "resume": "tool login resume --id sess_1", "qr_path": "/tmp/qr.png" },
+      "retryable": false
+    },
+    "meta": { "duration_ms": 30 }
+  }
+  ```
+- `E_HUMAN_REQUIRED` uses exit code `9` (added beyond the existing 0–8; not reusing `4`, to distinguish "bad credentials" from "waiting on a human action").
+- `details.action` is a stable enum describing what the human must do; `details.resume` gives the command to continue after the human is done.
+- Agent convention: on `E_HUMAN_REQUIRED` → relay `message` and the required action to the user → wait for them → run `resume`; do not auto-retry.
+## 16. Design checklist
+> Items marked `(optional)` only apply when the corresponding optional pattern is enabled.
+- [ ] Default `--format json`
+- [ ] stdout contains only valid JSON / NDJSON, no pollution
+- [ ] Logs and progress all go to stderr
+- [ ] Success/failure share one envelope, with `ok` and `schema_version`
+- [ ] `error` has semantic `code`, `details`, `retryable`
+- [ ] Exit codes tiered and consistent with `retryable`
+- [ ] Write commands have the dry-run / confirm-token loop
+- [ ] Confirm token binds operation args, account, permission context, resource version
+- [ ] Provides `reference` / `context` / `doctor`
+- [ ] Provides `changelog [--since]`, same source as CHANGELOG/release-notes
+- [ ] Tool reports its own version (`--version` and `context.version`)
+- [ ] `reference` reports `release_readiness`, and `doctor` checks it
+- [ ] (with self-update) `update --check` / `--dry-run` / `--confirm` are implemented
+- [ ] (with self-update) release integrity is verified, and signature status is explicit
+- [ ] (with self-update) whole Skill directory sync is part of the update result
+- [ ] (with self-update) post-update returns previous/current version and hints to read changelog
+- [ ] Query commands support `fields` / `compact`
+- [ ] List commands support pagination or explicitly state none is needed
+- [ ] Functional Contract Coverage is 100% for public README / Skill / reference / help / context / doctor / changelog / update behavior
+- [ ] Stable releases have recorded live smoke/E2E evidence; otherwise the tool declares `beta`
+- [ ] All times ISO 8601 UTC
+- [ ] All IDs strings
+- [ ] Secrets redacted end to end
+- [ ] Schema changes have a versioning/compat policy
+- [ ] stdout/stderr are UTF-8 without BOM
+- [ ] (optional · expiring tokens) `context`/`doctor` report credential validity and expiry; refresh failure degrades to `E_AUTH`
+- [ ] (optional · long jobs) submit returns `job_id` + status enum, status/result separated
+- [ ] (optional · human needed) stuck human steps return `E_HUMAN_REQUIRED` (exit 9) + `resume`, no auto-retry