hatch3r 1.7.1 → 1.8.0

Files changed (189)
  1. package/README.md +38 -12
  2. package/agents/hatch3r-a11y-auditor.md +4 -0
  3. package/agents/hatch3r-architect.md +4 -0
  4. package/agents/hatch3r-ci-watcher.md +4 -0
  5. package/agents/hatch3r-context-rules.md +26 -6
  6. package/agents/hatch3r-creator.md +6 -1
  7. package/agents/hatch3r-dependency-auditor.md +4 -0
  8. package/agents/hatch3r-devops.md +4 -0
  9. package/agents/hatch3r-docs-writer.md +4 -0
  10. package/agents/hatch3r-fixer.md +4 -0
  11. package/agents/hatch3r-handoff-loader.md +243 -0
  12. package/agents/hatch3r-handoff-preparer.md +134 -0
  13. package/agents/hatch3r-implementer.md +12 -0
  14. package/agents/hatch3r-learnings-loader.md +5 -1
  15. package/agents/hatch3r-lint-fixer.md +4 -0
  16. package/agents/hatch3r-perf-profiler.md +8 -0
  17. package/agents/hatch3r-researcher.md +4 -0
  18. package/agents/hatch3r-reviewer.md +94 -0
  19. package/agents/hatch3r-security-auditor.md +24 -0
  20. package/agents/hatch3r-test-writer.md +4 -0
  21. package/agents/modes/requirements-elicitation.md +4 -1
  22. package/agents/modes/similar-implementation.md +6 -0
  23. package/agents/modes/user-flows.md +76 -0
  24. package/agents/shared/quality-charter.md +128 -0
  25. package/agents/shared/user-content-templates.md +31 -1
  26. package/commands/hatch3r-agent-customize.md +4 -0
  27. package/commands/hatch3r-api-spec.md +7 -0
  28. package/commands/hatch3r-benchmark.md +7 -0
  29. package/commands/hatch3r-board-fill.md +8 -0
  30. package/commands/hatch3r-board-groom.md +4 -0
  31. package/commands/hatch3r-board-init.md +51 -0
  32. package/commands/hatch3r-board-pickup.md +8 -0
  33. package/commands/hatch3r-board-refresh.md +4 -0
  34. package/commands/hatch3r-board-shared.md +6 -6
  35. package/commands/hatch3r-bug-plan.md +7 -0
  36. package/commands/hatch3r-codebase-map.md +8 -0
  37. package/commands/hatch3r-command-customize.md +4 -0
  38. package/commands/hatch3r-context-health.md +5 -0
  39. package/commands/hatch3r-create.md +59 -4
  40. package/commands/hatch3r-debug.md +7 -0
  41. package/commands/hatch3r-dep-audit.md +4 -0
  42. package/commands/hatch3r-feature-plan.md +7 -0
  43. package/commands/hatch3r-handoff.md +133 -0
  44. package/commands/hatch3r-healthcheck.md +4 -0
  45. package/commands/hatch3r-hooks.md +4 -0
  46. package/commands/hatch3r-learn.md +16 -0
  47. package/commands/hatch3r-migration-plan.md +7 -0
  48. package/commands/hatch3r-onboard.md +7 -0
  49. package/commands/hatch3r-pr-resolve.md +12 -1
  50. package/commands/hatch3r-project-spec.md +8 -0
  51. package/commands/hatch3r-quick-change.md +11 -2
  52. package/commands/hatch3r-recipe.md +4 -0
  53. package/commands/hatch3r-refactor-plan.md +7 -0
  54. package/commands/hatch3r-release.md +5 -0
  55. package/commands/hatch3r-revision.md +7 -0
  56. package/commands/hatch3r-roadmap.md +8 -0
  57. package/commands/hatch3r-rule-customize.md +4 -0
  58. package/commands/hatch3r-security-audit.md +4 -0
  59. package/commands/hatch3r-skill-customize.md +4 -0
  60. package/commands/hatch3r-test-plan.md +7 -0
  61. package/commands/hatch3r-workflow.md +11 -1
  62. package/dist/cli/index.js +4814 -1130
  63. package/dist/cli/index.js.map +1 -1
  64. package/package.json +10 -5
  65. package/rules/hatch3r-accessibility-standards.md +21 -0
  66. package/rules/hatch3r-accessibility-standards.mdc +21 -0
  67. package/rules/hatch3r-agent-orchestration-detail.md +3 -0
  68. package/rules/hatch3r-agent-orchestration-detail.mdc +3 -0
  69. package/rules/hatch3r-agent-orchestration.md +34 -3
  70. package/rules/hatch3r-agent-orchestration.mdc +34 -3
  71. package/rules/hatch3r-ai-evals.md +158 -0
  72. package/rules/hatch3r-ai-evals.mdc +154 -0
  73. package/rules/hatch3r-ai-ux-patterns.md +131 -0
  74. package/rules/hatch3r-ai-ux-patterns.mdc +127 -0
  75. package/rules/hatch3r-api-design.md +67 -9
  76. package/rules/hatch3r-api-design.mdc +67 -9
  77. package/rules/hatch3r-api-versioning.md +119 -0
  78. package/rules/hatch3r-api-versioning.mdc +115 -0
  79. package/rules/hatch3r-auth-patterns.md +170 -0
  80. package/rules/hatch3r-auth-patterns.mdc +166 -0
  81. package/rules/hatch3r-component-conventions.md +30 -0
  82. package/rules/hatch3r-component-conventions.mdc +30 -0
  83. package/rules/hatch3r-container-hardening.md +131 -0
  84. package/rules/hatch3r-container-hardening.mdc +127 -0
  85. package/rules/hatch3r-contract-testing.md +117 -0
  86. package/rules/hatch3r-contract-testing.mdc +113 -0
  87. package/rules/hatch3r-deep-context.md +2 -0
  88. package/rules/hatch3r-deep-context.mdc +2 -0
  89. package/rules/hatch3r-dependency-management.md +73 -1
  90. package/rules/hatch3r-dependency-management.mdc +72 -0
  91. package/rules/hatch3r-design-system-detection.md +142 -0
  92. package/rules/hatch3r-design-system-detection.mdc +138 -0
  93. package/rules/hatch3r-event-schema-evolution.md +90 -0
  94. package/rules/hatch3r-event-schema-evolution.mdc +86 -0
  95. package/rules/hatch3r-handoff-readiness.md +45 -0
  96. package/rules/hatch3r-handoff-readiness.mdc +40 -0
  97. package/rules/hatch3r-i18n.md +13 -0
  98. package/rules/hatch3r-i18n.mdc +13 -0
  99. package/rules/hatch3r-iteration-summary.md +2 -0
  100. package/rules/hatch3r-iteration-summary.mdc +2 -0
  101. package/rules/hatch3r-migrations.md +61 -16
  102. package/rules/hatch3r-migrations.mdc +61 -16
  103. package/rules/hatch3r-observability-logging.md +1 -1
  104. package/rules/hatch3r-observability-logging.mdc +1 -1
  105. package/rules/hatch3r-observability-metrics.md +1 -1
  106. package/rules/hatch3r-observability-metrics.mdc +1 -1
  107. package/rules/hatch3r-observability-tracing-detail.md +8 -149
  108. package/rules/hatch3r-observability-tracing-detail.mdc +7 -149
  109. package/rules/hatch3r-observability-tracing.md +154 -6
  110. package/rules/hatch3r-observability-tracing.mdc +154 -6
  111. package/rules/hatch3r-observability.md +1 -0
  112. package/rules/hatch3r-observability.mdc +1 -0
  113. package/rules/hatch3r-operability.md +149 -0
  114. package/rules/hatch3r-operability.mdc +145 -0
  115. package/rules/hatch3r-passkey-server.md +181 -0
  116. package/rules/hatch3r-passkey-server.mdc +177 -0
  117. package/rules/hatch3r-progressive-delivery.md +120 -0
  118. package/rules/hatch3r-progressive-delivery.mdc +116 -0
  119. package/rules/hatch3r-resilience-patterns.md +154 -0
  120. package/rules/hatch3r-resilience-patterns.mdc +150 -0
  121. package/rules/hatch3r-secrets-management.md +29 -0
  122. package/rules/hatch3r-secrets-management.mdc +29 -0
  123. package/rules/hatch3r-testing.md +139 -43
  124. package/rules/hatch3r-testing.mdc +139 -43
  125. package/rules/hatch3r-ux-states-and-flows.md +149 -0
  126. package/rules/hatch3r-ux-states-and-flows.mdc +145 -0
  127. package/skills/hatch3r-a11y-audit/SKILL.md +14 -0
  128. package/skills/hatch3r-agent-customize/SKILL.md +10 -0
  129. package/skills/hatch3r-ai-feature/SKILL.md +136 -0
  130. package/skills/hatch3r-api-spec/SKILL.md +73 -0
  131. package/skills/hatch3r-architecture-review/SKILL.md +14 -0
  132. package/skills/hatch3r-bug-fix/SKILL.md +5 -0
  133. package/skills/hatch3r-ci-pipeline/SKILL.md +14 -0
  134. package/skills/hatch3r-cli-aichat/SKILL.md +84 -0
  135. package/skills/hatch3r-cli-ast-grep/SKILL.md +85 -0
  136. package/skills/hatch3r-cli-az-devops/SKILL.md +89 -0
  137. package/skills/hatch3r-cli-bat/SKILL.md +85 -0
  138. package/skills/hatch3r-cli-comby/SKILL.md +85 -0
  139. package/skills/hatch3r-cli-csvkit/SKILL.md +84 -0
  140. package/skills/hatch3r-cli-delta/SKILL.md +86 -0
  141. package/skills/hatch3r-cli-difftastic/SKILL.md +84 -0
  142. package/skills/hatch3r-cli-docker/SKILL.md +89 -0
  143. package/skills/hatch3r-cli-duckdb/SKILL.md +84 -0
  144. package/skills/hatch3r-cli-fd/SKILL.md +85 -0
  145. package/skills/hatch3r-cli-fzf/SKILL.md +84 -0
  146. package/skills/hatch3r-cli-gh/SKILL.md +90 -0
  147. package/skills/hatch3r-cli-glab/SKILL.md +89 -0
  148. package/skills/hatch3r-cli-jq/SKILL.md +89 -0
  149. package/skills/hatch3r-cli-lazygit/SKILL.md +78 -0
  150. package/skills/hatch3r-cli-llm/SKILL.md +84 -0
  151. package/skills/hatch3r-cli-miller/SKILL.md +84 -0
  152. package/skills/hatch3r-cli-mods/SKILL.md +84 -0
  153. package/skills/hatch3r-cli-overview/SKILL.md +60 -0
  154. package/skills/hatch3r-cli-playwright/SKILL.md +89 -0
  155. package/skills/hatch3r-cli-podman/SKILL.md +84 -0
  156. package/skills/hatch3r-cli-qsv/SKILL.md +91 -0
  157. package/skills/hatch3r-cli-ripgrep/SKILL.md +85 -0
  158. package/skills/hatch3r-cli-rtk/SKILL.md +91 -0
  159. package/skills/hatch3r-cli-sd/SKILL.md +85 -0
  160. package/skills/hatch3r-cli-stagehand/SKILL.md +111 -0
  161. package/skills/hatch3r-cli-taplo/SKILL.md +84 -0
  162. package/skills/hatch3r-cli-yq/SKILL.md +85 -0
  163. package/skills/hatch3r-cli-zstd/SKILL.md +85 -0
  164. package/skills/hatch3r-command-customize/SKILL.md +10 -0
  165. package/skills/hatch3r-context-health/SKILL.md +14 -0
  166. package/skills/hatch3r-cost-tracking/SKILL.md +14 -0
  167. package/skills/hatch3r-customize/SKILL.md +17 -0
  168. package/skills/hatch3r-dep-audit/SKILL.md +14 -0
  169. package/skills/hatch3r-design-system-detect/SKILL.md +164 -0
  170. package/skills/hatch3r-feature/SKILL.md +2 -0
  171. package/skills/hatch3r-gh-agentic-workflows/SKILL.md +13 -0
  172. package/skills/hatch3r-handoff-prepare/SKILL.md +160 -0
  173. package/skills/hatch3r-handoff-resume/SKILL.md +171 -0
  174. package/skills/hatch3r-incident-response/SKILL.md +14 -0
  175. package/skills/hatch3r-issue-workflow/SKILL.md +5 -0
  176. package/skills/hatch3r-logical-refactor/SKILL.md +14 -0
  177. package/skills/hatch3r-migration/SKILL.md +14 -0
  178. package/skills/hatch3r-observability-verify/SKILL.md +134 -0
  179. package/skills/hatch3r-perf-audit/SKILL.md +14 -0
  180. package/skills/hatch3r-pr-creation/SKILL.md +14 -0
  181. package/skills/hatch3r-qa-validation/SKILL.md +18 -0
  182. package/skills/hatch3r-recipe/SKILL.md +14 -0
  183. package/skills/hatch3r-refactor/SKILL.md +14 -0
  184. package/skills/hatch3r-release/SKILL.md +14 -0
  185. package/skills/hatch3r-reliability-verify/SKILL.md +146 -0
  186. package/skills/hatch3r-rule-customize/SKILL.md +10 -0
  187. package/skills/hatch3r-skill-customize/SKILL.md +10 -0
  188. package/skills/hatch3r-ui-ux-verify/SKILL.md +138 -0
  189. package/skills/hatch3r-visual-refactor/SKILL.md +15 -1
@@ -5,9 +5,19 @@ tags: [customize]
5
5
  quality_charter: agents/shared/quality-charter.md
6
6
  efficiency_patterns: agents/shared/efficiency-patterns.md
7
7
  cache_friendly: true
8
+ redirect_to: hatch3r-customize
8
9
  ---
9
10
  # Agent Customization
10
11
 
11
12
  > **This skill has been consolidated.** Use the `hatch3r-customize` skill with `type: agent`.
12
13
 
13
14
  For agent-specific reference (model resolution, protected agents, YAML schema), see the `hatch3r-agent-customize` command.
15
+
16
+ ## Rejected Merge Alternative (D16.3 add-vs-remove bias)
17
+
18
+ Per `governance/audit/domains/D16-compound-system.md` SA 16.3, the default recommendation on functional overlap is MERGE rather than removal. Full deletion of this redirect file was rejected for two reasons:
19
+
20
+ 1. **Preserves UX entry points.** Users typed `/hatch3r-agent-customize` or referenced the id `hatch3r-agent-customize` (per CHANGELOG.md, `website/docs/reference/configuration.md:325`, `docs/model-selection.md:158`) before consolidation. Deleting the id breaks those entry points without a redirect target.
21
+ 2. **Signals umbrella canonicality.** The `redirect_to: hatch3r-customize` frontmatter field marks `hatch3r-customize` as the single source of truth — tooling, audit scans, and adapters can resolve any redirect to the canonical without re-reading body prose.
22
+
23
+ The 13-LOC redirect cost is paid once per type; the umbrella body lives in `skills/hatch3r-customize/SKILL.md`.
@@ -0,0 +1,136 @@
1
+ ---
2
+ id: hatch3r-ai-feature
3
+ type: skill
4
+ description: Eval-driven development workflow for shipping AI features — write eval before prompt, measure, iterate, ship with caching + cost telemetry + model fallback + hallucination SLI
5
+ tags: [implementation, ai]
6
+ quality_charter: agents/shared/quality-charter.md
7
+ efficiency_patterns: agents/shared/efficiency-patterns.md
8
+ cache_friendly: true
9
+ ---
10
+ # AI Feature Workflow (Eval-Driven)
11
+
12
+ ## Quick Start
13
+
14
+ Run this skill before shipping any LLM-driven feature. It defines the canonical eval-driven loop (write eval, write prompt, measure, iterate) and the production-readiness gates. Skipping any of the 9 steps = the feature is not done.
15
+
16
+ This skill is the implementation counterpart to `rules/hatch3r-ai-evals.md` (backend governance) and `rules/hatch3r-ai-ux-patterns.md` (UI governance). The rules define the bar; this skill defines the route to clearing the bar.
17
+
18
+ ## Step 0 — Detect Ambiguity (P8 B1)
19
+
20
+ Before any work, scan the invocation for unresolved questions in scope, intent, acceptance criteria, target environment, or irreversibility. If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md`. Do not proceed under silent assumption. Default path, not an exception. Triggers for THIS skill: task class (classification vs open-ended vs RAG vs agentic), model pin (Sonnet vs Opus vs Haiku), eval threshold values, budget per request (cost cap), and fallback policy (graceful degrade vs hard fail).
21
+
22
+ ## Fan-out Discipline (P8 B2)
23
+
24
+ This skill delegates per task size:
25
+ - Tier 1 (trivial single-file): inline execution acceptable.
26
+ - Tier 2 (multi-file or multi-concern): spawn parallel sub-agents per concern via the Task tool.
27
+ - Tier 3 (multi-module / high-risk): one fresh sub-agent per independent module or gate; orchestrator integrates only.
28
+
29
+ Never under-fan-out to save tokens. Token cost is dominated by quality and completeness gains. Emit `sub_agents_spawned: { count, rationale }` in your output.
30
+
31
+ ## Step 1: Define the task and success criteria
32
+
33
+ - Write down what "right" looks like in one paragraph — the user input class, the expected output shape, the failure modes you want to catch.
34
+ - Hand-author 20+ golden examples in `evals/<feature>/golden.jsonl` with `input` + `expected_output` (or a graded rubric when the task is open-ended).
35
+ - Save the threshold per metric in `evals/<feature>/thresholds.json`. Without an explicit threshold, "passing the eval" is undefined.
36
+ - Cross-reference `rules/hatch3r-ai-evals.md` Golden Dataset Versioning for filename and refresh policy.
37
+ - Source diversity matters more than count beyond 20 — include adversarial inputs, edge cases from prior incidents, and at least 3 examples per known input class.
38
+ - Label every example with the input class so per-class accuracy is computable in Step 4.
39
+
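The Step 1 checks on the golden set can be sketched as a small validator. This is a minimal illustration, not part of the package: the `input_class` field name is an assumption (the skill only says every example is labeled with its input class), and the sketch assumes every example carries `expected_output` rather than a graded rubric.

```python
import json
from collections import Counter

MIN_EXAMPLES = 20   # "20+ golden examples" from Step 1
MIN_PER_CLASS = 3   # "at least 3 examples per known input class"


def load_golden(lines):
    """Parse JSONL lines into a list of example dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def validate_golden(examples, class_field="input_class"):
    """Return a list of problems; an empty list means the set passes.

    `class_field` is a hypothetical label name, not one defined by the skill.
    """
    problems = []
    if len(examples) < MIN_EXAMPLES:
        problems.append(f"only {len(examples)} examples, need at least {MIN_EXAMPLES}")
    for i, ex in enumerate(examples):
        for key in ("input", "expected_output", class_field):
            if key not in ex:
                problems.append(f"example {i} is missing '{key}'")
    # Count labeled examples per class so thin classes are flagged.
    counts = Counter(ex[class_field] for ex in examples if class_field in ex)
    for cls, n in sorted(counts.items()):
        if n < MIN_PER_CLASS:
            problems.append(f"class '{cls}' has {n} examples, need at least {MIN_PER_CLASS}")
    return problems
```

In use, the lines would come from `evals/<feature>/golden.jsonl`, and a non-empty result would stop the workflow before any prompt is written.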
40
+ ## Step 2: Pick eval tool and metric
41
+
42
+ Match the task class to the tool:
43
+
44
+ - Classification → promptfoo with exact-match assertions.
45
+ - Open-ended generation → DeepEval or braintrust with LLM-as-judge + a 50-example human-labeled calibration set.
46
+ - Retrieval/RAG → RAGAS (context_precision, context_recall, faithfulness, answer_relevance).
47
+ - Tool-use / agentic → Inspect or BFCL-style harness.
48
+ - Safety/red-team → Garak or PyRIT scheduled weekly.
49
+
50
+ Pin the choice in `evals/README.md` so the next agent run picks the same tool.
51
+
52
+ ## Step 3: Write the prompt
53
+
54
+ - Author the prompt at `prompts/<feature>/v1.md` with frontmatter `{ id, version: 1, model_pinned, eval_set }`.
55
+ - Commit; record SHA-256 hash in `evals/<feature>/thresholds.json`.
56
+ - If the system prompt + tool definitions + RAG context exceed 1024 tokens, apply Anthropic `cache_control` breakpoints (or rely on OpenAI's automatic prefix cache for ≥1024-token deterministic prefixes). Longest-TTL block first.
57
+
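Recording the prompt hash from Step 3 might look like the sketch below. The `prompt_version` and `prompt_hash` keys inside the thresholds document are assumptions for illustration; the skill names the file but not its schema.

```python
import hashlib


def prompt_sha256(prompt_text: str) -> str:
    """Hex SHA-256 digest of the prompt's UTF-8 bytes."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def record_prompt_hash(thresholds: dict, version: int, prompt_text: str) -> dict:
    """Return a copy of the thresholds document with version and hash recorded.

    Key names are hypothetical; adapt them to the real thresholds.json layout.
    """
    updated = dict(thresholds)
    updated["prompt_version"] = version
    updated["prompt_hash"] = prompt_sha256(prompt_text)
    return updated
```

Hashing the committed file contents, rather than an in-memory template, keeps the recorded hash comparable with what CI later reads from `prompts/<feature>/v<N>.md`.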
58
+ ## Step 4: Run eval; iterate prompt
59
+
60
+ - Run `npx promptfoo eval` (or the chosen tool's CLI) against the golden set.
61
+ - Read the per-metric report. If below threshold, modify the prompt, bump to `v2.md`, re-hash, re-run.
62
+ - Treat each prompt revision like a code commit — small, named, testable.
63
+ - Stop iterating when every metric clears its threshold in `thresholds.json` and the pairwise win-rate vs the prior version is >=55%.
64
+ - Capture the eval report artifact in CI so the PR reviewer can read per-case pass/fail without re-running the suite locally.
65
+ - If iteration count exceeds 10 versions without convergence, escalate — the task may need decomposition (one sub-prompt per input class) or a retrieval-grounded approach.
66
+
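The Step 4 stopping rule can be expressed as a small decision function. This sketch assumes every metric is higher-is-better and that the pairwise comparison is summarized as a win count over a number of comparisons; neither detail is specified by the skill.

```python
WIN_RATE_FLOOR = 0.55   # ">=55%" pairwise win-rate from Step 4
MAX_VERSIONS = 10       # escalate past 10 versions without convergence


def failing_metrics(scores: dict, thresholds: dict) -> list:
    """Names of metrics that are missing or below threshold (higher is better)."""
    return sorted(
        name for name, floor in thresholds.items()
        if scores.get(name, float("-inf")) < floor
    )


def next_action(scores, thresholds, wins, comparisons, version):
    """Return 'stop', 'iterate', or 'escalate' for the current prompt version."""
    win_rate = wins / comparisons if comparisons else 0.0
    if not failing_metrics(scores, thresholds) and win_rate >= WIN_RATE_FLOOR:
        return "stop"
    if version > MAX_VERSIONS:
        return "escalate"
    return "iterate"
```

A missing metric counts as failing, so a report that silently drops a metric cannot end the loop early.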
67
+ ## Step 5: Wire production telemetry
68
+
69
+ - Per-request log line emits `model`, `tokens_in`, `tokens_out`, `cache_hit`, `cached_tokens`, `cost_usd`, `latency_ms`, `prompt_version`, `prompt_hash`, `cost_center`.
70
+ - Per-request OpenTelemetry span follows the OTel GenAI semantic conventions (`gen_ai.*` attributes).
71
+ - Aggregate dashboards: cost-per-request, hallucination_rate, citation_precision, refusal_rate, cache_hit_ratio.
72
+ - Cross-reference `skills/hatch3r-observability-verify` for the per-feature dashboard checklist.
73
+
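One way to keep the per-request log line from Step 5 complete is to build it from a typed record, so a missing field fails at construction time. The field names below are the ones listed in Step 5; the class name, the field types, and the JSON encoding are assumptions of this sketch.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AiRequestLog:
    """One record per LLM request; fields mirror the Step 5 list."""
    model: str
    tokens_in: int
    tokens_out: int
    cache_hit: bool
    cached_tokens: int
    cost_usd: float
    latency_ms: float
    prompt_version: int
    prompt_hash: str
    cost_center: str


def to_log_line(record: AiRequestLog) -> str:
    """Serialize the record as a single-line JSON object with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

Because the dataclass has no defaults, omitting any field raises `TypeError` before a partial line can reach the log.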
74
+ ## Step 6: Wire fallback chain
75
+
76
+ - Primary model (e.g. Sonnet 4.7) → secondary (cheaper/faster, e.g. Haiku 4.5) → static fallback (cached or canned).
77
+ - Wrap in circuit-breaker + retry-with-decorrelated-jitter — cross-reference `rules/hatch3r-resilience-patterns.md` (Slice 8) for the primitives.
78
+ - Run the eval suite against the secondary path too — a silent quality cliff between primary and secondary is a regression.
79
+ - Static fallback text names the failure mode in user-readable language ("AI is briefly unavailable — retry in a minute") rather than dumping a stack trace into the UI.
80
+
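The fallback chain in Step 6 can be sketched as follows. This is a deliberately simplified illustration, not the primitives from `rules/hatch3r-resilience-patterns.md`: the breaker only counts consecutive failures and omits the cool-down and half-open probe a production breaker needs, and all names, defaults, and the tier tuple shape are hypothetical.

```python
import random


def decorrelated_jitter(prev_delay, base=0.1, cap=5.0, rng=random):
    """Next retry delay: uniform in [base, prev_delay * 3], clamped to cap."""
    return min(cap, rng.uniform(base, max(base, prev_delay * 3)))


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; a success resets the count."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.threshold

    def record(self, ok):
        self.failures = 0 if ok else self.failures + 1


def call_with_fallback(prompt, tiers, static_text, retries=2, sleep=lambda s: None):
    """Try each (name, call, breaker) tier in order, then the static fallback.

    Returns (tier_name, response). `sleep` is injectable so tests run
    instantly; pass time.sleep in real use.
    """
    for name, call, breaker in tiers:
        if breaker.is_open:
            continue  # skip a tier whose breaker has tripped
        delay = 0.1
        for attempt in range(retries + 1):
            try:
                result = call(prompt)
                breaker.record(True)
                return name, result
            except Exception:
                breaker.record(False)
                if breaker.is_open or attempt == retries:
                    break  # give up on this tier and move to the next
                delay = decorrelated_jitter(delay)
                sleep(delay)
    return "static", static_text
```

Returning the tier name alongside the response lets the caller log which path served the request, which is what makes a quality gap between primary and secondary visible.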
81
+ ## Step 7: Add CI gate
82
+
83
+ - Eval runs on every PR that touches `**/prompts/**`, `**/rag/**`, `**/ai/**`, `**/llm/**`.
84
+ - PR blocks when any metric drops below the threshold in `evals/<feature>/thresholds.json`.
85
+ - Model-version upgrade (Sonnet to Opus, 4.6 to 4.7) triggers a full eval with a 5% accuracy budget; a regression of more than 5% requires a named-reviewer sign-off plus a 24-hour canary at 5% traffic.
86
+
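The model-upgrade rule in Step 7 reduces to a comparison against the accuracy budget. The sketch below assumes the 5% budget means absolute percentage points of accuracy, which the skill does not state, and the return labels are hypothetical.

```python
ACCURACY_BUDGET = 0.05  # "5% accuracy budget" from Step 7, read as absolute points


def upgrade_gate(baseline_accuracy: float, candidate_accuracy: float) -> str:
    """Classify a model-version upgrade by how much accuracy it gives up."""
    # Round to absorb floating-point noise at the boundary (e.g. 0.90 - 0.85).
    drop = round(baseline_accuracy - candidate_accuracy, 6)
    if drop <= ACCURACY_BUDGET:
        return "pass"
    return "needs-signoff-and-canary"
```

If the budget is meant as a relative drop instead, divide the difference by `baseline_accuracy` before comparing.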
87
+ ## Step 8: Production verification
88
+
89
+ First 24 hours after deploy, monitor:
90
+
91
+ - `ai.hallucination_rate` — SLO <5% on golden set; alert if 7-day rolling rate >5%.
92
+ - `ai.refusal_rate` — track false-positive refusal rate separately.
93
+ - `ai.cost_per_request_usd` — p50/p95/p99 vs feature budget; alert at 50%/75%/90% of monthly budget.
94
+ - `ai.latency_ms` — first-token-latency p95 + total-response-latency p99.
95
+ - `ai.cache_hit_ratio` — should match the dev-environment baseline within 10%; a drop indicates prefix drift.
96
+ - `ai.tokens_per_request` — p95 should be within 20% of the eval-time distribution; a spike signals retrieval growth or prompt drift.
97
+
98
+ Cross-reference `skills/hatch3r-observability-verify`.
99
+
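Two of the Step 8 checks are simple enough to sketch directly: the budget alert levels and the "within N% of baseline" comparisons. The tolerance is treated as relative to the baseline here, which is an assumption; function names are illustrative.

```python
BUDGET_ALERT_LEVELS = (0.50, 0.75, 0.90)  # alert points from Step 8


def budget_alerts(spend_usd: float, monthly_budget_usd: float) -> list:
    """Budget fractions that month-to-date spend has reached or passed."""
    used = spend_usd / monthly_budget_usd
    return [level for level in BUDGET_ALERT_LEVELS if used >= level]


def within_tolerance(observed: float, baseline: float, tolerance: float) -> bool:
    """True when observed is within +/- tolerance (relative) of baseline."""
    return abs(observed - baseline) <= tolerance * baseline
```

For example, `within_tolerance(cache_hit_ratio, dev_baseline, 0.10)` covers the cache-hit check and `within_tolerance(p95_tokens, eval_p95_tokens, 0.20)` covers the tokens-per-request check.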
100
+ ## Step 9: Feedback loop
101
+
102
+ - Wire user thumbs-down to a feedback queue per response.
103
+ - Monthly triage job promotes thumbs-down examples into regression fixtures in `evals/<feature>/edge.jsonl`.
104
+ - Promotion is a manual review step — raw user feedback contains noise and adversarial labels.
105
+ - Capture an optional free-text comment with each thumbs-down; the comment is the highest-signal feature for triage clustering.
106
+ - Track feedback volume per response surface — a sudden spike in thumbs-down rate signals an upstream prompt or retrieval regression and gates a rollback.
107
+
108
+ ## Verdict
109
+
110
+ All 9 steps complete = the AI feature is "done". Anything less = not done. The orchestrator running this skill emits a single-line verdict per step (`STEP_N: PASS|FAIL <evidence-path>`) and aggregates them. One FAIL on any step blocks release.
111
+
112
+ Evidence paths point at concrete artifacts: the golden set (`evals/<feature>/golden.jsonl`), the prompt version (`prompts/<feature>/v<N>.md`), the eval report (`evals/<feature>/report-<run-id>.json`), and the dashboard URL for production SLI verification. Verdicts without evidence paths are not accepted by the gate.
113
+
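Aggregating the per-step verdict lines could look like the sketch below. It assumes the nine gated steps are Steps 1 to 9 and that an evidence path contains no whitespace; the function name and problem messages are illustrative.

```python
import re

# Matches "STEP_N: PASS|FAIL <evidence-path>"; the evidence path is mandatory.
VERDICT_RE = re.compile(r"^STEP_(\d+): (PASS|FAIL) (\S+)$")


def aggregate_verdicts(lines, expected_steps=range(1, 10)):
    """Return (release_allowed, problems) for a list of per-step verdict lines."""
    seen, problems = {}, []
    for line in lines:
        m = VERDICT_RE.match(line.strip())
        if not m:
            problems.append(f"rejected (malformed or no evidence path): {line!r}")
            continue
        seen[int(m.group(1))] = m.group(2)
    for step in expected_steps:
        if step not in seen:
            problems.append(f"STEP_{step}: missing verdict")
        elif seen[step] == "FAIL":
            problems.append(f"STEP_{step}: FAIL")
    return (not problems, problems)
```

A verdict without an evidence path is rejected and then also reported as missing, so it can never count toward a release.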
114
+ ## When this skill runs
115
+
116
+ - After `hatch3r-implementer` finishes the surrounding non-AI feature code, before `hatch3r-qa-validation`.
117
+ - On every PR that introduces a new LLM call or modifies an existing prompt, model, or retrieval pipeline.
118
+ - Step 8 (production verification) executes against the post-deploy environment, not the PR branch.
119
+
120
+ ## Cross-References
121
+
122
+ - `rules/hatch3r-ai-evals.md` — backend governance (eval, cost, caching, fallback, SLI).
123
+ - `rules/hatch3r-ai-ux-patterns.md` — frontend UX patterns (streaming, tool-call cards, citations).
124
+ - `skills/hatch3r-ui-ux-verify/SKILL.md` — UI verification gate for AI surfaces.
125
+ - `skills/hatch3r-observability-verify` — observability wiring checklist.
126
+ - `rules/hatch3r-resilience-patterns.md` (Slice 8) — circuit-breaker + retry primitives reused in the fallback chain.
127
+
128
+ ## References
129
+
130
+ - promptfoo — `promptfoo.dev`
131
+ - DeepEval — `github.com/confident-ai/deepeval`
132
+ - RAGAS — `docs.ragas.io`
133
+ - Inspect (UK AISI) — `github.com/UKGovernmentBEIS/inspect_ai`
134
+ - Anthropic prompt caching guide — `docs.anthropic.com/en/docs/build-with-claude/prompt-caching`
135
+ - OpenTelemetry GenAI semantic conventions — `opentelemetry.io/docs/specs/semconv/gen-ai/`
136
+ - Berkeley Function Calling Leaderboard (BFCL v4) — `gorilla.cs.berkeley.edu/leaderboard.html`
@@ -14,13 +14,19 @@ cache_friendly: true
14
14
 
15
15
  ```
16
16
  Task Progress:
17
+ - [ ] Step 0: Detect ambiguity (P8 B1)
17
18
  - [ ] Step 1: Inventory existing endpoints
18
19
  - [ ] Step 2: Generate OpenAPI spec
19
20
  - [ ] Step 3: Validate schemas
20
21
  - [ ] Step 4: Generate documentation
21
22
  - [ ] Step 5: Verify spec accuracy
23
+ - [ ] Step 6: Wire oasdiff breaking-change CI gate
22
24
  ```
23
25
 
26
+ ## Step 0 — Detect Ambiguity (P8 B1)
27
+
28
+ Before any work, scan the invocation for unresolved questions in scope, intent, acceptance criteria, target environment, or irreversibility. If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md`. Do not proceed under silent assumption. Default path, not an exception. Triggers for THIS skill: OpenAPI version (3.0 vs 3.1), spec output path, auth scheme (Bearer vs OAuth2 vs API key), breaking-change policy (block vs version vs document), and target consumers (SDK clients vs human docs vs both).
29
+
24
30
  ## Step 1: Inventory Existing Endpoints
25
31
 
26
32
  - Scan route definitions across the codebase (controllers, handlers, route files).
@@ -61,6 +67,72 @@ Task Progress:
61
67
  - Check that path parameters, query parameters, and headers are documented with accurate types, required flags, and example values.
62
68
  - Validate against any existing API consumers (SDKs, frontend clients) for breaking changes.
63
69
 
70
+ ## Step 6: Wire `oasdiff` Breaking-Change CI Gate
71
+
72
+ Breaking changes on stable endpoints must trip CI before merge. This step enforces the CONSTITUTION §2 P5 lean-thresholds row "API breaking-change events on stable endpoints = 0 per release" (governance/CONSTITUTION.md:80, verified by `oasdiff / buf breaking / graphql-inspector CI gate`).
73
+
74
+ ### 6.1 Install `oasdiff`
75
+
76
+ Pick one of two install paths:
77
+
78
+ - npm global (CI runner with Node 22+): `npm i -g @tufin/oasdiff`
79
+ - Docker image (no Node dependency): `docker run --rm -t -v $(pwd):/specs tufin/oasdiff <subcommand>`
80
+
81
+ Pin the version in CI (e.g., `npm i -g @tufin/oasdiff@1.10.x` or `tufin/oasdiff:1.10`) so a new release of oasdiff does not change gate semantics mid-cycle.
82
+
83
+ ### 6.2 Compare current spec vs previous merged version
84
+
85
+ The gate compares the spec on the feature branch against the spec at the merge base on the default branch. Fail CI on any breaking change to a stable endpoint; report non-breaking diffs as informational.
86
+
87
+ - Fetch the base ref's spec into a temp path (e.g., `git show origin/main:openapi.yaml > /tmp/openapi.base.yaml`).
88
+ - Run `oasdiff breaking /tmp/openapi.base.yaml ./openapi.yaml --fail-on ERR` — exit code 1 when one or more `ERR`-level breaking changes are detected.
89
+ - Scope the gate to stable endpoints by excluding paths tagged `x-stability: experimental` via `--match-path` or by maintaining an `oasdiff-ignore.yaml` rules file for documented breaking changes already coordinated with consumers.
90
+
91
+ ### 6.3 Example GitHub Actions step
92
+
93
+ ```yaml
94
+ name: API Breaking-Change Gate
95
+ on:
96
+   pull_request:
97
+     paths:
98
+       - 'openapi.yaml'
99
+       - 'openapi.json'
100
+       - 'docs/api/**'
101
+
102
+ jobs:
103
+   oasdiff:
104
+     runs-on: ubuntu-latest
105
+     steps:
106
+       - uses: actions/checkout@v4
107
+         with:
108
+           fetch-depth: 0
109
+       - uses: actions/setup-node@v4
110
+         with:
111
+           node-version: '22'
112
+       - name: Install oasdiff
113
+         run: npm i -g @tufin/oasdiff@1.10.x
114
+       - name: Resolve base spec
115
+         run: |
116
+           git show origin/${{ github.base_ref }}:openapi.yaml > /tmp/openapi.base.yaml
117
+       - name: Run breaking-change diff
118
+         run: |
119
+           oasdiff breaking /tmp/openapi.base.yaml ./openapi.yaml \
120
+             --fail-on ERR \
121
+             --format githubactions
122
+ ```
123
+
124
+ The `--format githubactions` flag emits `::error::` annotations so each breaking change shows up inline on the PR diff.
125
+
126
+ ### 6.4 Handling an intentional breaking change
127
+
128
+ When a breaking change is deliberate (versioned endpoint cut, deprecated field removed after the documented sunset window):
129
+
130
+ 1. Add a row to `oasdiff-ignore.yaml` with the change ID, the affected operation, and a link to the consumer-coordination record.
131
+ 2. Bump the spec `info.version` in line with the project's API versioning policy (semver-major for breaking changes on stable endpoints).
132
+ 3. Document the change in CHANGELOG (or equivalent) with the migration path for downstream consumers.
133
+
134
+ The gate stays green only because the change is recorded — not because the breaking signal was silenced.
135
+
64
136
  ## Error Handling
65
137
 
66
138
  - **Route definitions use dynamic or meta-programmed patterns**: If endpoints are generated at runtime or via decorators that resist static analysis, document the gap and manually enumerate the missing endpoints.
@@ -74,3 +146,4 @@ Task Progress:
74
146
  - [ ] Spec passes linter validation
75
147
  - [ ] Example requests/responses included
76
148
  - [ ] No breaking changes to existing API consumers
149
+ - [ ] `oasdiff breaking` CI gate is wired and fails on any `ERR`-level breaking change on stable endpoints (CONSTITUTION §2 P5: 0 per release)
@@ -12,6 +12,7 @@ cache_friendly: true
12
12
 
13
13
  ```
14
14
  Task Progress:
15
+ - [ ] Step 0: Detect ambiguity (P8 B1)
15
16
  - [ ] Step 1: Read existing ADRs and the template
16
17
  - [ ] Step 2: Define the decision context — problem, constraints, options
17
18
  - [ ] Step 3: Evaluate options — pros/cons, prototype if needed, check ADR constraints
@@ -20,6 +21,19 @@ Task Progress:
20
21
  - [ ] Step 6: Update affected specs or docs to reference the new ADR
21
22
  ```
22
23
 
24
+ ## Step 0 — Detect Ambiguity (P8 B1)
25
+
26
+ Before any work, scan the invocation for unresolved questions in scope, intent, acceptance criteria, target environment, or irreversibility. If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md`. Do not proceed under silent assumption. Default path, not an exception. Triggers for THIS skill: problem framing (what decision needs to be made), constraint set (mandatory vs preferred), evaluation horizon (short-term vs long-term cost), supersedes which prior ADR, and ADR status target (PROPOSED for discussion vs ACCEPTED for binding decision).
27
+
28
+ ## Fan-out Discipline (P8 B2)
29
+
30
+ This skill delegates per task size:
31
+ - Tier 1 (trivial single-file): inline execution acceptable.
32
+ - Tier 2 (multi-file or multi-concern): spawn parallel sub-agents per concern via the Task tool.
33
+ - Tier 3 (multi-module / high-risk): one fresh sub-agent per independent module or gate; orchestrator integrates only.
34
+
35
+ Never under-fan-out to save tokens. Token cost is dominated by quality and completeness gains. Emit `sub_agents_spawned: { count, rationale }` in your output.
36
+
23
37
  ## Step 1: Read Existing ADRs and Template
24
38
 
25
39
  - Read all ADRs in project docs to understand current architecture and constraints.
@@ -14,6 +14,7 @@ cache_friendly: true
14
14
 
15
15
  ```
16
16
  Task Progress:
17
+ - [ ] Step 0: Detect ambiguity (P8 B1)
17
18
  - [ ] Step 1: Read the issue and relevant specs
18
19
  - [ ] Step 2: Produce a diagnosis plan
19
20
  - [ ] Step 2b: Browser reproduction (if UI bug)
@@ -25,6 +26,10 @@ Task Progress:
25
26
  - [ ] Step 6: Open PR
26
27
  ```
27
28
 
29
+ ## Step 0 — Detect Ambiguity (P8 B1)
30
+
31
+ Before any work, scan the invocation for unresolved questions in scope, intent, acceptance criteria, target environment, or irreversibility. If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md`. Do not proceed under silent assumption. Default path, not an exception. Triggers for THIS skill: reproduction steps incomplete, expected vs actual behavior unstated, severity unclear (P0/P1 vs P2/P3), affected environment unknown (staging vs prod), or fix may require schema/API change with downstream consumers.
32
+
28
33
  ## Step 1: Read Inputs
29
34
 
30
35
  - Parse the issue body: problem description, reproduction steps, expected/actual behavior, severity, affected area.
@@ -14,6 +14,7 @@ cache_friendly: true
14
14
 
15
15
  ```
16
16
  Task Progress:
17
+ - [ ] Step 0: Detect ambiguity (P8 B1)
17
18
  - [ ] Step 1: Audit existing pipeline
18
19
  - [ ] Step 2: Design stage structure
19
20
  - [ ] Step 3: Optimize test parallelization
@@ -21,6 +22,19 @@ Task Progress:
21
22
  - [ ] Step 5: Implement and validate
22
23
  ```
23
24
 
25
+ ## Step 0 — Detect Ambiguity (P8 B1)
26
+
27
+ Before any work, scan the invocation for unresolved questions in scope, intent, acceptance criteria, target environment, or irreversibility. If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md`. Do not proceed under silent assumption. Default path, not an exception. Triggers for THIS skill: CI platform (GitHub Actions vs GitLab vs CircleCI vs Azure Pipelines), pipeline duration target, runner sizing budget, deploy gate (auto vs manual approval for prod), and artifact retention policy.
28
+
29
+ ## Fan-out Discipline (P8 B2)
30
+
31
+ This skill delegates per task size:
32
+ - Tier 1 (trivial single-file): inline execution acceptable.
33
+ - Tier 2 (multi-file or multi-concern): spawn parallel sub-agents per concern via the Task tool.
34
+ - Tier 3 (multi-module / high-risk): one fresh sub-agent per independent module or gate; orchestrator integrates only.
35
+
36
+ Never under-fan-out to save tokens. Token cost is dominated by quality and completeness gains. Emit `sub_agents_spawned: { count, rationale }` in your output.
37
+
  ## Step 1: Audit Existing Pipeline
 
  - Map the current pipeline stages, their dependencies, and execution times.
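The fan-out tiers added in this hunk can be read as a small decision rule. The sketch below is only an illustration: the function names, the argument order (file count, module count, high-risk flag), and the exact output formatting are assumptions, not part of hatch3r, and the "multi-concern" trigger for Tier 2 is not modeled.

```shell
#!/bin/sh
# Hypothetical helper: map the shape of a task to a fan-out tier.
# Arguments (all assumed for illustration): file count, module count, high-risk flag (0 or 1).
fanout_tier() {
  files=$1; modules=$2; high_risk=$3
  if [ "$modules" -gt 1 ] || [ "$high_risk" -eq 1 ]; then
    echo 3   # multi-module or high-risk: one fresh sub-agent per module or gate
  elif [ "$files" -gt 1 ]; then
    echo 2   # multi-file: parallel sub-agents per concern
  else
    echo 1   # trivial single-file: inline execution is acceptable
  fi
}

# Hypothetical helper: print the sub_agents_spawned record the skill asks for.
report_fanout() {
  count=$1; rationale=$2
  printf 'sub_agents_spawned: { count: %s, rationale: "%s" }\n' "$count" "$rationale"
}
```

For example, `fanout_tier 4 1 0` selects Tier 2, and `report_fanout 3 'one per module'` prints a record in the shape the skill requests.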
@@ -0,0 +1,84 @@
+ ---
+ id: hatch3r-cli-aichat
+ description: "Multi-provider LLM chat CLI with RAG and session memory. Use when you need a RAG-enabled multi-provider conversational shell with saved session history; invoke `aichat`. Streams tokens to stdout so downstream `grep`/`tee` consumers see partial results."
+ tags: ["cli-tools", "ai", "opt-in"]
+ quality_charter: agents/shared/quality-charter.md
+ efficiency_patterns: agents/shared/efficiency-patterns.md
+ cache_friendly: true
+ cli_tool:
+   id: aichat
+   bin: aichat
+   tier: 3
+   category: ai
+   homepage: https://github.com/sigoden/aichat
+ ---
+ <!-- HATCH3R-CLI-SKILL-GENERATED v1 -->
+ # aichat
+
+ Multi-provider LLM chat CLI with RAG and session memory
+
+ ## When to Use
+
+ Reach for `aichat` when the task is in the **ai** category and the agent would otherwise call an MCP tool or read large outputs into context.
+
+ ## Token Cost
+
+ CLI tools return structured stdout that fits in <1KB for typical queries; equivalent MCP calls regularly exceed 10KB.
+ Reference: Anthropic engineering (Nov 4 2025), which reports a 98.7% token reduction for code execution over MCP in its worked example.
+
+ ## Recipes
+
+ ```bash
+ aichat 'explain this commit message' < commit.txt
+ ```
+ One-shot prompt with stdin as the input payload.
+
+ ```bash
+ aichat -r tech-writer 'rewrite as bullets' < draft.md
+ ```
+ Apply a saved role (`~/.config/aichat/roles/tech-writer.md`) as the system prompt; the role name must match the file name.
+
+ ```bash
+ aichat --model claude-3-5-sonnet -f README.md 'summarize'
+ ```
+ Pin the model and attach a file with `-f`. Avoid `-e` here: it asks aichat to generate and run a shell command rather than answer the prompt.
+
+ ```bash
+ aichat --rag mydocs 'how do we configure auth?'
+ ```
+ Query a pre-built RAG index over local documentation; the index is stored locally, while embeddings come from whichever embedding model is configured.
+
+ ```bash
+ aichat --session refactor-plan
+ ```
+ Resume a named session with persisted history; useful for multi-turn refinement loops.
+
+ ## Wrong Choice When
+
+ - **Scripted Unix-style pipelines with a rich plugin ecosystem:** `hatch3r-cli-llm` (tier 2) has plugin support for templates, embeddings, and provider adapters not in aichat.
+ - **Offline-only / fully local inference:** aichat supports Ollama backends but adds an unneeded abstraction; talk to Ollama's HTTP API directly via `curl`.
+ - **CI batch tasks that benefit from `mods` pipe semantics:** `hatch3r-cli-mods` reads a single piped payload then exits, which is simpler for one-shot transforms.
+
+ ## Alternatives
+
+ | Tool | When to prefer |
+ |------|----------------|
+ | `hatch3r-cli-llm` (tier 2) | Plugin ecosystem, templates, embeddings, structured CI use |
+ | `hatch3r-cli-mods` (tier 3) | Single-piped-payload transforms, Unix-pipe ergonomics |
+ | Raw `curl` against Ollama / provider HTTP API | Maximum control, no client-side caching or session state |
+
+ ## Detection / Install
+
+ Verify with:
+ ```bash
+ command -v aichat
+ ```
+
+ Install (mac):
+
+ ```bash
+ # brew
+ brew install aichat
+ ```
+
+ Homepage: https://github.com/sigoden/aichat
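The "Verify with" step above can be wrapped so an agent checks for a tool before reaching for it. This is a minimal sketch under stated assumptions: `have_tool` and `describe_tool` are hypothetical names, not part of hatch3r or aichat, and only the `command -v` check comes from the skill file.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a CLI is on PATH, using the same
# `command -v` check as the "Verify with" step.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print a one-line availability report for a tool.
describe_tool() {
  if have_tool "$1"; then
    printf '%s: available\n' "$1"
  else
    printf '%s: missing\n' "$1"
  fi
}
```

Calling `describe_tool aichat` before a recipe lets the caller fall back to one of the listed alternatives when the binary is absent.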
@@ -0,0 +1,85 @@
+ ---
+ id: hatch3r-cli-ast-grep
+ description: "Structural search and rewrite for code via AST patterns. Use when you need Tree-sitter AST pattern rewrites scoped to a single grammar; invoke `sg`. Grammar-aware: queries are written in the same syntax as the language being edited."
+ tags: ["cli-tools", "search", "core"]
+ quality_charter: agents/shared/quality-charter.md
+ efficiency_patterns: agents/shared/efficiency-patterns.md
+ cache_friendly: true
+ cli_tool:
+   id: ast-grep
+   bin: sg
+   tier: 1
+   category: search
+   homepage: https://ast-grep.github.io/
+ ---
+ <!-- HATCH3R-CLI-SKILL-GENERATED v1 -->
+ # ast-grep
+
+ Structural search and rewrite for code via AST patterns
+
+ ## When to Use
+
+ Reach for `sg` when the task is in the **search** category and the agent would otherwise call an MCP tool or read large outputs into context.
+
+ ## Token Cost
+
+ CLI tools return structured stdout that fits in <1KB for typical queries; equivalent MCP calls regularly exceed 10KB.
+ Reference: Anthropic engineering (Nov 4 2025), which reports a 98.7% token reduction for code execution over MCP in its worked example.
+
+ ## Recipes
+
+ ```bash
+ sg --pattern 'console.log($MSG)' --lang ts src/
+ ```
+ Pattern with a meta-variable (`$MSG`): matches any single-argument `console.log` call regardless of whitespace; use `$$$ARGS` to match any number of arguments.
+
+ ```bash
+ sg run -p 'await $FN()' -r 'await ($FN()).catch(e => log(e))' --update-all src/
+ ```
+ Structural rewrite: every bare `await $FN()` gains a `.catch` arm; `--update-all` writes in place.
+
+ ```bash
+ sg scan --config sgconfig.yml
+ ```
+ Runs the rule pack configured in `sgconfig.yml`: repo-pinned lints that match syntax structure, so they keep working through formatting changes that would break a regex.
+
+ ```bash
+ sg test --update-all
+ ```
+ Snapshot-style tests for rules; `--update-all` rewrites changed snapshots, which keeps rule packs honest as the codebase shifts.
+
+ ```bash
+ sg --pattern 'function $NAME($$$ARGS) { $$$BODY }' --lang ts --json src/
+ ```
+ Triple-`$` captures the rest of an argument list or body; JSON output feeds `jq` for downstream filtering.
+
+ ## Wrong Choice When
+
+ - Don't reach for `sg` when the target is plain literal text (a TODO marker, a string in CHANGELOG). Reach for `ripgrep` (`hatch3r-cli-ripgrep`), which is orders of magnitude faster on raw matching.
+ - Don't use `sg` for cross-language SAST policy work (e.g., taint analysis). Reach for `semgrep`, which has rule packs, CI integrations, and a security-audit lineage.
+ - Don't reach for `sg` on languages it does not parse (Makefile, INI). The pattern compiler will reject the request; fall back to `ripgrep` + `sd`.
+
+ ## Alternatives
+
+ | Tool | When to prefer |
+ |------|----------------|
+ | `ripgrep` (`hatch3r-cli-ripgrep`) | Literal or regex matching over text; ast-grep is overkill if you do not need structural matching. |
+ | `semgrep` | Security/policy rule packs, multi-language SAST, central rule registry. |
+ | `comby` | Multi-language structural rewrites with template syntax and no per-language plugin. |
+ | Editor refactor / language server | Authoritative rename or extract-method with full type information. |
+
+ ## Detection / Install
+
+ Verify with:
+ ```bash
+ command -v sg
+ ```
+
+ Install (mac):
+
+ ```bash
+ # brew
+ brew install ast-grep
+ ```
+
+ Homepage: https://ast-grep.github.io/
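The "Wrong Choice When" guidance for `sg` amounts to a routing rule between `sg`, `ripgrep`, and `semgrep`. The sketch below is one possible reading of it: `pick_search_tool`, its arguments, and the match-kind labels (`literal`, `structural`, `taint`) are hypothetical, and the unsupported-language list is an illustrative subset rather than ast-grep's actual grammar table.

```shell
#!/bin/sh
# Hypothetical chooser: pick a search tool from the kind of match and the language.
# Arguments (assumed for illustration): match kind, lowercase language name.
pick_search_tool() {
  kind=$1; lang=$2
  case "$kind" in
    literal) echo "ripgrep"; return 0 ;;   # plain text: raw matching is enough
    taint)   echo "semgrep"; return 0 ;;   # SAST policy work
  esac
  case "$lang" in
    makefile|ini) echo "ripgrep+sd" ;;     # illustrative subset of languages without a grammar
    *)            echo "sg" ;;             # structural match in a parsed language
  esac
}
```

With this reading, `pick_search_tool structural ts` routes to `sg`, while `pick_search_tool literal ts` routes to `ripgrep`.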
@@ -0,0 +1,89 @@
+ ---
+ id: hatch3r-cli-az-devops
+ description: "Azure DevOps work items, repos, pipelines via az CLI extension. Use when the task involves Azure DevOps work-item edits, repo pushes, and pipeline runs; invoke `az`. Authenticates via the platform's native token mechanism (OAuth / PAT)."
+ tags: ["cli-tools", "forge"]
+ quality_charter: agents/shared/quality-charter.md
+ efficiency_patterns: agents/shared/efficiency-patterns.md
+ cache_friendly: true
+ cli_tool:
+   id: az-devops
+   bin: az
+   tier: 2
+   category: forge
+   homepage: https://learn.microsoft.com/en-us/cli/azure/azure-devops
+ ---
+ <!-- HATCH3R-CLI-SKILL-GENERATED v1 -->
+ # az-devops
+
+ Azure DevOps work items, repos, pipelines via az CLI extension
+
+ ## When to Use
+
+ Reach for `az` when the task is in the **forge** category and the agent would otherwise call an MCP tool or read large outputs into context.
+
+ ## Token Cost
+
+ CLI tools return structured stdout that fits in <1KB for typical queries; equivalent MCP calls regularly exceed 10KB.
+ Reference: Anthropic engineering (Nov 4 2025), which reports a 98.7% token reduction for code execution over MCP in its worked example.
+
+ ## Recipes
+
+ ```bash
+ az repos pr list --status active --query '[].pullRequestId' --output tsv
+ ```
+ Print active PR IDs as a newline-separated list; `--query` (JMESPath) trims the payload before stdout.
+
+ ```bash
+ az repos pr show --id 42 --output json
+ ```
+ Fetch a single PR's metadata as JSON for downstream `jq` filters.
+
+ ```bash
+ az boards work-item show --id 4242 --output json
+ ```
+ Pull a work item (bug, task, user story) by numeric ID; one round-trip, structured output.
+
+ ```bash
+ az boards work-item create --type Bug --title 'flaky import test' --description 'Repro: ...'
+ ```
+ Open a work item from CI or an agent; the created item, including its new ID, is printed on stdout as JSON.
+
+ ```bash
+ az pipelines run --name CI --branch main
+ ```
+ Queue a pipeline run on a named definition; returns the build ID for polling.
+
+ ```bash
+ az artifacts universal download --feed myfeed --name pkg --version 1.0.0 --path .
+ ```
+ Fetch a Universal Package into the cwd, which avoids the larger Azure Artifacts MCP equivalents.
+
+ ## Wrong Choice When
+
+ - The repo is on GitHub: use `gh` (Tier 1); `az repos` will return 404s without a configured Azure project.
+ - The repo is on GitLab: use `glab` (Tier 2 sibling); same operations, native auth.
+ - You only need to download a public release asset: `curl` to the artifact URL is one hop.
+
+ ## Alternatives
+
+ | Tool | When to prefer |
+ |------|----------------|
+ | `gh` | GitHub-hosted code or issues. |
+ | `glab` | GitLab-hosted code or issues. |
+ | `curl` + `AZURE_DEVOPS_PAT` | Endpoint not surfaced by `az devops`; need raw header control. |
+
+ ## Detection / Install
+
+ Verify with:
+ ```bash
+ command -v az
+ ```
+
+ Install (mac):
+
+ ```bash
+ # brew
+ brew install azure-cli && az extension add --name azure-devops
+ ```
+
+ Homepage: https://learn.microsoft.com/en-us/cli/azure/azure-devops
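The first two recipes compose naturally: the TSV list of PR IDs can drive one `az repos pr show` call per ID. The sketch below is a dry run under stated assumptions: `pr_show_commands` is a hypothetical helper, and it only prints the follow-up commands rather than executing them, so it can be inspected without an Azure DevOps login.

```shell
#!/bin/sh
# Hypothetical dry-run helper: read PR IDs from stdin (one per line, as produced
# by `--output tsv`) and print the follow-up command for each ID.
# Nothing is executed; blank lines are skipped.
pr_show_commands() {
  while IFS= read -r id; do
    [ -n "$id" ] || continue
    printf 'az repos pr show --id %s --output json\n' "$id"
  done
}
```

A possible use is `az repos pr list --status active --query '[].pullRequestId' --output tsv | pr_show_commands`, reviewing the printed commands before running any of them.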