hatch3r 1.7.5 → 1.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (75)
  1. package/README.md +2 -2
  2. package/agents/hatch3r-context-rules.md +22 -6
  3. package/agents/hatch3r-creator.md +2 -1
  4. package/agents/hatch3r-handoff-loader.md +1 -1
  5. package/agents/hatch3r-implementer.md +8 -0
  6. package/agents/hatch3r-learnings-loader.md +1 -1
  7. package/agents/hatch3r-reviewer.md +2 -0
  8. package/agents/shared/user-content-templates.md +31 -1
  9. package/commands/hatch3r-agent-customize.md +4 -0
  10. package/commands/hatch3r-api-spec.md +7 -0
  11. package/commands/hatch3r-benchmark.md +7 -0
  12. package/commands/hatch3r-board-fill.md +7 -0
  13. package/commands/hatch3r-board-groom.md +4 -0
  14. package/commands/hatch3r-board-init.md +51 -0
  15. package/commands/hatch3r-board-pickup.md +8 -0
  16. package/commands/hatch3r-board-refresh.md +4 -0
  17. package/commands/hatch3r-board-shared.md +6 -6
  18. package/commands/hatch3r-bug-plan.md +7 -0
  19. package/commands/hatch3r-codebase-map.md +8 -0
  20. package/commands/hatch3r-command-customize.md +4 -0
  21. package/commands/hatch3r-context-health.md +5 -0
  22. package/commands/hatch3r-create.md +57 -4
  23. package/commands/hatch3r-debug.md +7 -0
  24. package/commands/hatch3r-dep-audit.md +4 -0
  25. package/commands/hatch3r-feature-plan.md +7 -0
  26. package/commands/hatch3r-handoff.md +7 -0
  27. package/commands/hatch3r-healthcheck.md +4 -0
  28. package/commands/hatch3r-hooks.md +4 -0
  29. package/commands/hatch3r-learn.md +16 -0
  30. package/commands/hatch3r-migration-plan.md +7 -0
  31. package/commands/hatch3r-onboard.md +7 -0
  32. package/commands/hatch3r-pr-resolve.md +8 -1
  33. package/commands/hatch3r-project-spec.md +8 -0
  34. package/commands/hatch3r-quick-change.md +7 -0
  35. package/commands/hatch3r-recipe.md +4 -0
  36. package/commands/hatch3r-refactor-plan.md +7 -0
  37. package/commands/hatch3r-release.md +5 -0
  38. package/commands/hatch3r-revision.md +7 -0
  39. package/commands/hatch3r-roadmap.md +8 -0
  40. package/commands/hatch3r-rule-customize.md +4 -0
  41. package/commands/hatch3r-security-audit.md +4 -0
  42. package/commands/hatch3r-skill-customize.md +4 -0
  43. package/commands/hatch3r-test-plan.md +7 -0
  44. package/commands/hatch3r-workflow.md +9 -1
  45. package/dist/cli/index.js +2600 -777
  46. package/dist/cli/index.js.map +1 -1
  47. package/package.json +8 -5
  48. package/rules/hatch3r-agent-orchestration-detail.md +3 -0
  49. package/rules/hatch3r-agent-orchestration-detail.mdc +3 -0
  50. package/rules/hatch3r-agent-orchestration.md +25 -2
  51. package/rules/hatch3r-agent-orchestration.mdc +25 -2
  52. package/rules/hatch3r-iteration-summary.md +2 -0
  53. package/rules/hatch3r-iteration-summary.mdc +2 -0
  54. package/rules/hatch3r-observability-tracing-detail.md +7 -148
  55. package/rules/hatch3r-observability-tracing-detail.mdc +6 -148
  56. package/rules/hatch3r-observability-tracing.md +154 -6
  57. package/rules/hatch3r-observability-tracing.mdc +154 -6
  58. package/skills/hatch3r-agent-customize/SKILL.md +10 -0
  59. package/skills/hatch3r-ai-feature/SKILL.md +2 -0
  60. package/skills/hatch3r-api-spec/SKILL.md +68 -0
  61. package/skills/hatch3r-cli-csvkit/SKILL.md +2 -2
  62. package/skills/hatch3r-cli-duckdb/SKILL.md +3 -3
  63. package/skills/hatch3r-cli-jq/SKILL.md +4 -0
  64. package/skills/hatch3r-cli-miller/SKILL.md +2 -2
  65. package/skills/hatch3r-cli-overview/SKILL.md +1 -1
  66. package/skills/{hatch3r-cli-xsv → hatch3r-cli-qsv}/SKILL.md +20 -18
  67. package/skills/hatch3r-cli-stagehand/SKILL.md +48 -16
  68. package/skills/hatch3r-command-customize/SKILL.md +10 -0
  69. package/skills/hatch3r-customize/SKILL.md +3 -0
  70. package/skills/hatch3r-design-system-detect/SKILL.md +2 -0
  71. package/skills/hatch3r-observability-verify/SKILL.md +4 -3
  72. package/skills/hatch3r-reliability-verify/SKILL.md +2 -0
  73. package/skills/hatch3r-rule-customize/SKILL.md +10 -0
  74. package/skills/hatch3r-skill-customize/SKILL.md +10 -0
  75. package/skills/hatch3r-ui-ux-verify/SKILL.md +2 -0
@@ -1,16 +1,16 @@
1
1
  ---
2
2
  id: hatch3r-observability-tracing
3
3
  type: rule
4
- description: Distributed tracing and OpenTelemetry core conventions for the project
4
+ description: Distributed tracing, OpenTelemetry conventions, and AI agent instrumentation for the project
5
5
  scope: conditional
6
- globs: "**/*trac*,**/*span*,**/*telemetry*,**/*otel*,**/observability/**,**/routes/**,**/handlers/**,**/services/**,**/api/**,**/middleware/**,**/controllers/**,**/lib/**"
6
+ globs: "**/*trac*,**/*span*,**/*telemetry*,**/*otel*,**/*agent*,**/observability/**,**/routes/**,**/handlers/**,**/services/**,**/api/**,**/middleware/**,**/controllers/**,**/lib/**"
7
7
  tags: [devops]
8
8
  quality_charter: agents/shared/quality-charter.md
9
9
  cache_friendly: true
10
10
  ---
11
11
  # Observability -- Distributed Tracing & OpenTelemetry
12
12
 
13
- Core distributed tracing and OpenTelemetry conventions. For structured logging see `hatch3r-observability-logging`. For metrics, SLOs, alerting, and dashboards see `hatch3r-observability-metrics`. For AI agent instrumentation, tool call audit trails, and correlation ID patterns see `hatch3r-observability-tracing-detail`.
13
+ Distributed tracing, OpenTelemetry semantic conventions, AI agent instrumentation, tool call audit trails, and correlation ID patterns. For structured logging see `hatch3r-observability-logging`. For metrics, SLOs, alerting, and dashboards see `hatch3r-observability-metrics`.
14
14
 
15
15
  ## Distributed Tracing
16
16
 
@@ -82,6 +82,154 @@ Every telemetry-producing service must declare resource attributes at startup:
82
82
  - Attribute values should be low-cardinality. Never use unbounded values (full URLs with query params, raw SQL) as attribute values.
83
83
  - Prefer semantic convention attributes over custom attributes. Prefix custom attributes with your project namespace (e.g., `myapp.feature.flag_key`).
84
84
 
85
- ### AI Agent Semantic Conventions (Summary)
86
-
87
- Follow the [OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) for AI/LLM agent instrumentation. Key attributes: `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`. For full attribute tables, code examples, tool call audit trails, and correlation ID patterns, see `hatch3r-observability-tracing-detail`.
85
+ ## AI Agent Instrumentation
86
+
87
+ Follow the [OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) for AI/LLM agent instrumentation.
88
+
89
+ ### GenAI Span Attributes
90
+
91
+ Use these attributes on all spans representing interactions with generative AI models:
92
+
93
+ | Attribute | Type | Description | Example |
94
+ |-----------|------|-------------|---------|
95
+ | `gen_ai.system` | string | GenAI provider system name | `openai`, `anthropic`, `azure_openai` |
96
+ | `gen_ai.request.model` | string | Model name as specified in the request | `gpt-4o`, `claude-sonnet-4-20250514` |
97
+ | `gen_ai.response.model` | string | Model name as returned in the response | `gpt-4o-2024-08-06` |
98
+ | `gen_ai.request.max_tokens` | int | Maximum tokens requested for generation | `4096` |
99
+ | `gen_ai.request.temperature` | float | Temperature parameter | `0.7` |
100
+ | `gen_ai.response.finish_reasons` | string[] | Reasons the model stopped generating | `["stop"]`, `["length"]` |
101
+ | `gen_ai.usage.input_tokens` | int | Tokens in the input/prompt | `1250` |
102
+ | `gen_ai.usage.output_tokens` | int | Tokens in the generated output | `530` |
103
+
104
+ - Always set `gen_ai.system` and `gen_ai.request.model` on every GenAI span.
105
+ - Record `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens` from the API response for cost dashboards.
106
+ - Use `gen_ai.response.finish_reasons` to detect truncated outputs (`length`) and trigger re-prompting.
107
+
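The attribute rules above can be sketched as a small helper that maps one LLM call result onto the GenAI span attributes. This is a minimal illustration, not part of the package: the `LlmCallResult` shape and the names `toGenAiAttributes` and `isTruncated` are hypothetical, and real provider SDK responses will need their own mapping.

```typescript
// Hypothetical, provider-neutral summary of one LLM call (not a real SDK type).
interface LlmCallResult {
  system: string;
  requestModel: string;
  responseModel?: string;
  finishReasons: string[];
  inputTokens: number;
  outputTokens: number;
}

type SpanAttributes = Record<string, string | number | string[]>;

// Build the attribute map; gen_ai.system and gen_ai.request.model are always set.
function toGenAiAttributes(result: LlmCallResult): SpanAttributes {
  const attrs: SpanAttributes = {
    'gen_ai.system': result.system,
    'gen_ai.request.model': result.requestModel,
    'gen_ai.response.finish_reasons': result.finishReasons,
    'gen_ai.usage.input_tokens': result.inputTokens,
    'gen_ai.usage.output_tokens': result.outputTokens,
  };
  // The response model is only recorded when the provider reported one.
  if (result.responseModel !== undefined) {
    attrs['gen_ai.response.model'] = result.responseModel;
  }
  return attrs;
}

// A "length" finish reason signals a truncated output that may need re-prompting.
function isTruncated(finishReasons: string[]): boolean {
  return finishReasons.includes('length');
}
```

The returned map could then be passed to a span's `setAttributes` call, and `isTruncated` could gate whatever re-prompting policy the project uses.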
108
+ ### Agent Invocation Spans
109
+
110
+ Instrument the full lifecycle of an agent invocation with a dedicated span. This span is the parent for all LLM calls, tool executions, and sub-agent delegations.
111
+
112
+ - **Span name pattern:** `agent.{agent_name}.invoke`
113
+ - **Required attributes:** `agent.id`, `agent.name`, `agent.parent_id`, `agent.task`, `agent.framework`
114
+ - **Span events for state transitions:** `agent.planning`, `agent.tool_selection`, `agent.awaiting_human`, `agent.delegating`, `agent.completed`, `agent.error`
115
+
116
+ ```typescript
117
+ const agentSpan = tracer.startSpan('agent.code_reviewer.invoke', {
118
+ attributes: {
119
+ 'agent.id': invocationId,
120
+ 'agent.name': 'code_reviewer',
121
+ 'agent.parent_id': parentAgentId ?? '',
122
+ 'agent.task': `review PR #${prNumber}`,
123
+ 'agent.framework': 'custom',
124
+ },
125
+ });
126
+ agentSpan.addEvent('agent.planning');
127
+ // ... agent reasoning and tool calls happen as child spans ...
128
+ agentSpan.addEvent('agent.completed');
129
+ agentSpan.end();
130
+ ```
131
+
132
+ ### Tool Call Spans
133
+
134
+ Every tool invocation by an agent creates a child span of the agent invocation span.
135
+
136
+ - **Span name pattern:** `tool.{tool_name}.execute`
137
+ - **Required attributes:** `tool.name`, `tool.input_hash` (SHA-256), `tool.output_status`, `tool.duration_ms`, `tool.parameters_count`
138
+ - Tool spans must be children of the invoking agent span. Set span status to `ERROR` when `tool.output_status` is `error` or `timeout`.
139
+ - For tools performing I/O, create nested child spans using appropriate semantic conventions (`http.*`, `db.*`).
140
+
141
+ ```typescript
142
+ const toolSpan = tracer.startSpan(
143
+ 'tool.git_diff.execute',
144
+ { attributes: { 'tool.name': 'git_diff' } },
145
+ trace.setSpan(context.active(), agentSpan),
146
+ );
147
+ try {
148
+ const result = await tools.gitDiff(params);
149
+ toolSpan.setAttributes({
150
+ 'tool.output_status': 'success',
151
+ 'tool.duration_ms': performance.now() - startTime,
152
+ 'tool.input_hash': hashInput(params),
153
+ });
154
+ } catch (err) {
155
+ toolSpan.setAttributes({ 'tool.output_status': 'error' });
156
+ toolSpan.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
157
+ toolSpan.recordException(err as Error);
158
+ throw err;
159
+ } finally {
160
+ toolSpan.end();
161
+ }
162
+ ```
163
+
164
+ ### LLM Request/Response Tracing
165
+
166
+ - **Span name pattern:** `gen_ai.{operation}` (e.g., `gen_ai.chat`, `gen_ai.completion`)
167
+ - **Token tracking:** Capture `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens`. Aggregate in metrics: Counter `gen_ai.tokens_total` with labels `{direction, model, agent_name}`, Histogram `gen_ai.request_duration_ms`.
168
+ - **Model version tracking:** Record both `gen_ai.request.model` and `gen_ai.response.model` for drift detection.
169
+ - **Retry spans:** Each retry attempt is a separate child span. Set `gen_ai.request.retries` on the final span. Record `http.response.status_code` on failed spans (429 vs 500+).
170
+ - Never log raw prompt content or full model responses as span attributes. Use token counts for cost tracking and correlated logs for prompt debugging in non-production environments.
171
+ - Sample GenAI spans at 50-100% in production (higher than general spans) because each call is expensive and low volume.
172
+
173
+ ### Tool Call Audit Trail
174
+
175
+ Maintain a structured audit log for every tool invocation in agentic workflows, separate from tracing spans.
176
+
177
+ | Field | Type | Description |
178
+ |-------|------|-------------|
179
+ | `tool.name` | string | Name of the tool invoked |
180
+ | `tool.input_hash` | string | SHA-256 hash of tool input (never log raw input) |
181
+ | `tool.output_status` | string | `success`, `error`, `timeout`, or `denied` |
182
+ | `tool.duration_ms` | float | Execution time in milliseconds |
183
+ | `agent.id` | string | ID of the invoking agent |
184
+ | `agent.name` | string | Human-readable agent name |
185
+ | `correlation.id` | string | Trace correlation ID |
186
+ | `timestamp` | string | ISO 8601 timestamp |
187
+ | `session.id` | string | Session identifier |
188
+
189
+ - Log tool invocations at `info` level, failures at `error` level with `error.type` and `error.message`.
190
+ - Aggregate tool call counts per agent per session for anomaly detection.
191
+ - Retain audit logs for a minimum of 90 days.
192
+
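One way to produce a record with the audit fields above is sketched below. It is an illustration under stated assumptions: `hashToolInput`, `buildAuditRecord`, and `auditLogLevel` are hypothetical names, the input is serialized with `JSON.stringify` (which is key-order sensitive), and treating every non-`success` status as a failure for log-level purposes is an assumption the rule text does not spell out.

```typescript
import { createHash } from 'node:crypto';

type ToolOutputStatus = 'success' | 'error' | 'timeout' | 'denied';

interface ToolAuditRecord {
  'tool.name': string;
  'tool.input_hash': string;
  'tool.output_status': ToolOutputStatus;
  'tool.duration_ms': number;
  'agent.id': string;
  'agent.name': string;
  'correlation.id': string;
  timestamp: string;
  'session.id': string;
}

// SHA-256 over the serialized input, so the raw input never reaches the log.
function hashToolInput(input: unknown): string {
  const serialized: string | undefined = JSON.stringify(input);
  return createHash('sha256').update(serialized ?? '').digest('hex');
}

// Assemble one audit record; the timestamp is ISO 8601 via Date#toISOString.
function buildAuditRecord(params: {
  toolName: string;
  input: unknown;
  status: ToolOutputStatus;
  durationMs: number;
  agentId: string;
  agentName: string;
  correlationId: string;
  sessionId: string;
}): ToolAuditRecord {
  return {
    'tool.name': params.toolName,
    'tool.input_hash': hashToolInput(params.input),
    'tool.output_status': params.status,
    'tool.duration_ms': params.durationMs,
    'agent.id': params.agentId,
    'agent.name': params.agentName,
    'correlation.id': params.correlationId,
    timestamp: new Date().toISOString(),
    'session.id': params.sessionId,
  };
}

// Assumption: anything other than success is logged as a failure.
function auditLogLevel(status: ToolOutputStatus): 'info' | 'error' {
  return status === 'success' ? 'info' : 'error';
}
```

Hashing the serialized input keeps the log free of raw arguments while still letting identical calls be grouped for anomaly detection.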
193
+ ### Correlation IDs for Agent Workflows
194
+
195
+ - Use UUIDv4 with workflow-type prefix: `{workflow-type}-{uuid}` (e.g., `agent-run-550e8400-...`).
196
+ - Generate at the workflow entry point. Propagate to all sub-agents and tool calls.
197
+ - Every log entry, span, and metric must include `correlation.id`.
198
+ - Cross-process: propagate via `X-Correlation-ID` header alongside W3C Trace Context.
199
+ - Use OpenTelemetry `SpanLink` for cross-workflow references (e.g., agent run triggered by CI event).
200
+
201
+ ```typescript
202
+ import { randomUUID } from 'node:crypto';
203
+ import { context, trace, SpanStatusCode } from '@opentelemetry/api';
204
+
205
+ function generateCorrelationId(workflowType: string): string {
206
+ return `${workflowType}-${randomUUID()}`;
207
+ }
208
+
209
+ async function runAgentWorkflow(task: string): Promise<void> {
210
+ const correlationId = generateCorrelationId('agent-run');
211
+ const tracer = trace.getTracer('agent-orchestrator');
212
+ const rootSpan = tracer.startSpan('agent.orchestrator.invoke', {
213
+ attributes: {
214
+ 'correlation.id': correlationId,
215
+ 'agent.name': 'orchestrator',
216
+ 'agent.task': task,
217
+ },
218
+ });
219
+ try {
220
+ await context.with(trace.setSpan(context.active(), rootSpan), async () => {
221
+ await delegateToSubAgent('code_reviewer', {
222
+ correlationId,
223
+ parentSpanId: rootSpan.spanContext().spanId,
224
+ task: 'review changes',
225
+ });
226
+ });
227
+ } catch (err) {
228
+ rootSpan.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
229
+ rootSpan.recordException(err as Error);
230
+ throw err;
231
+ } finally {
232
+ rootSpan.end();
233
+ }
234
+ }
235
+ ```
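For the cross-process rule above (`X-Correlation-ID` alongside W3C Trace Context), a receiving service might reuse a well-formed incoming ID and otherwise mint a new one. The sketch below is an assumption-laden illustration: `resolveCorrelationId`, `withCorrelationHeader`, and the validation pattern are hypothetical, and header names are assumed to arrive lowercased, as Node's `http` module delivers them.

```typescript
import { randomUUID } from 'node:crypto';

// Lowercase header key, assuming Node-style lowercased incoming headers.
const CORRELATION_HEADER = 'x-correlation-id';

// `{workflow-type}-{uuidv4}`; the prefix alphabet is an assumption for this sketch.
const CORRELATION_ID_PATTERN =
  /^[a-z][a-z0-9-]*-[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/;

// Reuse a valid incoming ID; otherwise generate one at this entry point.
function resolveCorrelationId(
  incoming: Record<string, string | undefined>,
  workflowType: string,
): string {
  const candidate = incoming[CORRELATION_HEADER];
  if (candidate !== undefined && CORRELATION_ID_PATTERN.test(candidate)) {
    return candidate;
  }
  return `${workflowType}-${randomUUID()}`;
}

// Copy outgoing headers and attach the correlation ID without mutating the input.
function withCorrelationHeader(
  headers: Record<string, string>,
  correlationId: string,
): Record<string, string> {
  return { ...headers, [CORRELATION_HEADER]: correlationId };
}
```

Validating the incoming value before reuse keeps malformed or oversized header content out of logs, spans, and metrics that carry `correlation.id`.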
@@ -1,11 +1,11 @@
1
1
  ---
2
- description: Distributed tracing and OpenTelemetry core conventions for the project
3
- globs: ["**/*trac*", "**/*span*", "**/*telemetry*", "**/*otel*", "**/observability/**", "**/routes/**", "**/handlers/**", "**/services/**", "**/api/**", "**/middleware/**", "**/controllers/**", "**/lib/**"]
2
+ description: Distributed tracing, OpenTelemetry conventions, and AI agent instrumentation for the project
3
+ globs: ["**/*trac*", "**/*span*", "**/*telemetry*", "**/*otel*", "**/*agent*", "**/observability/**", "**/routes/**", "**/handlers/**", "**/services/**", "**/api/**", "**/middleware/**", "**/controllers/**", "**/lib/**"]
4
4
  alwaysApply: false
5
5
  ---
6
6
  # Observability -- Distributed Tracing & OpenTelemetry
7
7
 
8
- Core distributed tracing and OpenTelemetry conventions. For structured logging see `hatch3r-observability-logging`. For metrics, SLOs, alerting, and dashboards see `hatch3r-observability-metrics`. For AI agent instrumentation, tool call audit trails, and correlation ID patterns see `hatch3r-observability-tracing-detail`.
8
+ Distributed tracing, OpenTelemetry semantic conventions, AI agent instrumentation, tool call audit trails, and correlation ID patterns. For structured logging see `hatch3r-observability-logging`. For metrics, SLOs, alerting, and dashboards see `hatch3r-observability-metrics`.
9
9
 
10
10
  ## Distributed Tracing
11
11
 
@@ -77,6 +77,154 @@ Every telemetry-producing service must declare resource attributes at startup:
77
77
  - Attribute values should be low-cardinality. Never use unbounded values (full URLs with query params, raw SQL) as attribute values.
78
78
  - Prefer semantic convention attributes over custom attributes. Prefix custom attributes with your project namespace (e.g., `myapp.feature.flag_key`).
79
79
 
80
- ### AI Agent Semantic Conventions (Summary)
81
-
82
- Follow the [OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) for AI/LLM agent instrumentation. Key attributes: `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`. For full attribute tables, code examples, tool call audit trails, and correlation ID patterns, see `hatch3r-observability-tracing-detail`.
80
+ ## AI Agent Instrumentation
81
+
82
+ Follow the [OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) for AI/LLM agent instrumentation.
83
+
84
+ ### GenAI Span Attributes
85
+
86
+ Use these attributes on all spans representing interactions with generative AI models:
87
+
88
+ | Attribute | Type | Description | Example |
89
+ |-----------|------|-------------|---------|
90
+ | `gen_ai.system` | string | GenAI provider system name | `openai`, `anthropic`, `azure_openai` |
91
+ | `gen_ai.request.model` | string | Model name as specified in the request | `gpt-4o`, `claude-sonnet-4-20250514` |
92
+ | `gen_ai.response.model` | string | Model name as returned in the response | `gpt-4o-2024-08-06` |
93
+ | `gen_ai.request.max_tokens` | int | Maximum tokens requested for generation | `4096` |
94
+ | `gen_ai.request.temperature` | float | Temperature parameter | `0.7` |
95
+ | `gen_ai.response.finish_reasons` | string[] | Reasons the model stopped generating | `["stop"]`, `["length"]` |
96
+ | `gen_ai.usage.input_tokens` | int | Tokens in the input/prompt | `1250` |
97
+ | `gen_ai.usage.output_tokens` | int | Tokens in the generated output | `530` |
98
+
99
+ - Always set `gen_ai.system` and `gen_ai.request.model` on every GenAI span.
100
+ - Record `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens` from the API response for cost dashboards.
101
+ - Use `gen_ai.response.finish_reasons` to detect truncated outputs (`length`) and trigger re-prompting.
102
+
103
+ ### Agent Invocation Spans
104
+
105
+ Instrument the full lifecycle of an agent invocation with a dedicated span. This span is the parent for all LLM calls, tool executions, and sub-agent delegations.
106
+
107
+ - **Span name pattern:** `agent.{agent_name}.invoke`
108
+ - **Required attributes:** `agent.id`, `agent.name`, `agent.parent_id`, `agent.task`, `agent.framework`
109
+ - **Span events for state transitions:** `agent.planning`, `agent.tool_selection`, `agent.awaiting_human`, `agent.delegating`, `agent.completed`, `agent.error`
110
+
111
+ ```typescript
112
+ const agentSpan = tracer.startSpan('agent.code_reviewer.invoke', {
113
+ attributes: {
114
+ 'agent.id': invocationId,
115
+ 'agent.name': 'code_reviewer',
116
+ 'agent.parent_id': parentAgentId ?? '',
117
+ 'agent.task': `review PR #${prNumber}`,
118
+ 'agent.framework': 'custom',
119
+ },
120
+ });
121
+ agentSpan.addEvent('agent.planning');
122
+ // ... agent reasoning and tool calls happen as child spans ...
123
+ agentSpan.addEvent('agent.completed');
124
+ agentSpan.end();
125
+ ```
126
+
127
+ ### Tool Call Spans
128
+
129
+ Every tool invocation by an agent creates a child span of the agent invocation span.
130
+
131
+ - **Span name pattern:** `tool.{tool_name}.execute`
132
+ - **Required attributes:** `tool.name`, `tool.input_hash` (SHA-256), `tool.output_status`, `tool.duration_ms`, `tool.parameters_count`
133
+ - Tool spans must be children of the invoking agent span. Set span status to `ERROR` when `tool.output_status` is `error` or `timeout`.
134
+ - For tools performing I/O, create nested child spans using appropriate semantic conventions (`http.*`, `db.*`).
135
+
136
+ ```typescript
137
+ const toolSpan = tracer.startSpan(
138
+ 'tool.git_diff.execute',
139
+ { attributes: { 'tool.name': 'git_diff' } },
140
+ trace.setSpan(context.active(), agentSpan),
141
+ );
142
+ try {
143
+ const result = await tools.gitDiff(params);
144
+ toolSpan.setAttributes({
145
+ 'tool.output_status': 'success',
146
+ 'tool.duration_ms': performance.now() - startTime,
147
+ 'tool.input_hash': hashInput(params),
148
+ });
149
+ } catch (err) {
150
+ toolSpan.setAttributes({ 'tool.output_status': 'error' });
151
+ toolSpan.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
152
+ toolSpan.recordException(err as Error);
153
+ throw err;
154
+ } finally {
155
+ toolSpan.end();
156
+ }
157
+ ```
158
+
159
+ ### LLM Request/Response Tracing
160
+
161
+ - **Span name pattern:** `gen_ai.{operation}` (e.g., `gen_ai.chat`, `gen_ai.completion`)
162
+ - **Token tracking:** Capture `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens`. Aggregate in metrics: Counter `gen_ai.tokens_total` with labels `{direction, model, agent_name}`, Histogram `gen_ai.request_duration_ms`.
163
+ - **Model version tracking:** Record both `gen_ai.request.model` and `gen_ai.response.model` for drift detection.
164
+ - **Retry spans:** Each retry attempt is a separate child span. Set `gen_ai.request.retries` on the final span. Record `http.response.status_code` on failed spans (429 vs 500+).
165
+ - Never log raw prompt content or full model responses as span attributes. Use token counts for cost tracking and correlated logs for prompt debugging in non-production environments.
166
+ - Sample GenAI spans at 50-100% in production (higher than general spans) because each call is expensive and low volume.
167
+
168
+ ### Tool Call Audit Trail
169
+
170
+ Maintain a structured audit log for every tool invocation in agentic workflows, separate from tracing spans.
171
+
172
+ | Field | Type | Description |
173
+ |-------|------|-------------|
174
+ | `tool.name` | string | Name of the tool invoked |
175
+ | `tool.input_hash` | string | SHA-256 hash of tool input (never log raw input) |
176
+ | `tool.output_status` | string | `success`, `error`, `timeout`, or `denied` |
177
+ | `tool.duration_ms` | float | Execution time in milliseconds |
178
+ | `agent.id` | string | ID of the invoking agent |
179
+ | `agent.name` | string | Human-readable agent name |
180
+ | `correlation.id` | string | Trace correlation ID |
181
+ | `timestamp` | string | ISO 8601 timestamp |
182
+ | `session.id` | string | Session identifier |
183
+
184
+ - Log tool invocations at `info` level, failures at `error` level with `error.type` and `error.message`.
185
+ - Aggregate tool call counts per agent per session for anomaly detection.
186
+ - Retain audit logs for a minimum of 90 days.
187
+
188
+ ### Correlation IDs for Agent Workflows
189
+
190
+ - Use UUIDv4 with workflow-type prefix: `{workflow-type}-{uuid}` (e.g., `agent-run-550e8400-...`).
191
+ - Generate at the workflow entry point. Propagate to all sub-agents and tool calls.
192
+ - Every log entry, span, and metric must include `correlation.id`.
193
+ - Cross-process: propagate via `X-Correlation-ID` header alongside W3C Trace Context.
194
+ - Use OpenTelemetry `SpanLink` for cross-workflow references (e.g., agent run triggered by CI event).
195
+
196
+ ```typescript
197
+ import { randomUUID } from 'node:crypto';
198
+ import { context, trace, SpanStatusCode } from '@opentelemetry/api';
199
+
200
+ function generateCorrelationId(workflowType: string): string {
201
+ return `${workflowType}-${randomUUID()}`;
202
+ }
203
+
204
+ async function runAgentWorkflow(task: string): Promise<void> {
205
+ const correlationId = generateCorrelationId('agent-run');
206
+ const tracer = trace.getTracer('agent-orchestrator');
207
+ const rootSpan = tracer.startSpan('agent.orchestrator.invoke', {
208
+ attributes: {
209
+ 'correlation.id': correlationId,
210
+ 'agent.name': 'orchestrator',
211
+ 'agent.task': task,
212
+ },
213
+ });
214
+ try {
215
+ await context.with(trace.setSpan(context.active(), rootSpan), async () => {
216
+ await delegateToSubAgent('code_reviewer', {
217
+ correlationId,
218
+ parentSpanId: rootSpan.spanContext().spanId,
219
+ task: 'review changes',
220
+ });
221
+ });
222
+ } catch (err) {
223
+ rootSpan.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
224
+ rootSpan.recordException(err as Error);
225
+ throw err;
226
+ } finally {
227
+ rootSpan.end();
228
+ }
229
+ }
230
+ ```
@@ -5,9 +5,19 @@ tags: [customize]
5
5
  quality_charter: agents/shared/quality-charter.md
6
6
  efficiency_patterns: agents/shared/efficiency-patterns.md
7
7
  cache_friendly: true
8
+ redirect_to: hatch3r-customize
8
9
  ---
9
10
  # Agent Customization
10
11
 
11
12
  > **This skill has been consolidated.** Use the `hatch3r-customize` skill with `type: agent`.
12
13
 
13
14
  For agent-specific reference (model resolution, protected agents, YAML schema), see the `hatch3r-agent-customize` command.
15
+
16
+ ## Rejected Merge Alternative (D16.3 add-vs-remove bias)
17
+
18
+ Per `governance/audit/domains/D16-compound-system.md` SA 16.3, the default recommendation on functional overlap is MERGE rather than removal. Full deletion of this redirect file was rejected for two reasons:
19
+
20
+ 1. **Preserves UX entry points.** Users typed `/hatch3r-agent-customize` or referenced the id `hatch3r-agent-customize` (per CHANGELOG.md, `website/docs/reference/configuration.md:325`, `docs/model-selection.md:158`) before consolidation. Deleting the id breaks those entry points without a redirect target.
21
+ 2. **Signals umbrella canonicality.** The `redirect_to: hatch3r-customize` frontmatter field marks `hatch3r-customize` as the single source of truth — tooling, audit scans, and adapters can resolve any redirect to the canonical without re-reading body prose.
22
+
23
+ The 13-LOC redirect cost is paid once per type; the umbrella body lives in `skills/hatch3r-customize/SKILL.md`.
@@ -4,6 +4,8 @@ type: skill
4
4
  description: Eval-driven development workflow for shipping AI features — write eval before prompt, measure, iterate, ship with caching + cost telemetry + model fallback + hallucination SLI
5
5
  tags: [implementation, ai]
6
6
  quality_charter: agents/shared/quality-charter.md
7
+ efficiency_patterns: agents/shared/efficiency-patterns.md
8
+ cache_friendly: true
7
9
  ---
8
10
  # AI Feature Workflow (Eval-Driven)
9
11
 
@@ -20,6 +20,7 @@ Task Progress:
20
20
  - [ ] Step 3: Validate schemas
21
21
  - [ ] Step 4: Generate documentation
22
22
  - [ ] Step 5: Verify spec accuracy
23
+ - [ ] Step 6: Wire oasdiff breaking-change CI gate
23
24
  ```
24
25
 
25
26
  ## Step 0 — Detect Ambiguity (P8 B1)
@@ -66,6 +67,72 @@ Before any work, scan the invocation for unresolved questions in scope, intent,
66
67
  - Check that path parameters, query parameters, and headers are documented with accurate types, required flags, and example values.
67
68
  - Validate against any existing API consumers (SDKs, frontend clients) for breaking changes.
68
69
 
70
+ ## Step 6: Wire `oasdiff` Breaking-Change CI Gate
71
+
72
+ Breaking changes on stable endpoints must trip CI before merge. This step enforces the CONSTITUTION §2 P5 lean-thresholds row "API breaking-change events on stable endpoints = 0 per release" (governance/CONSTITUTION.md:80, verified by `oasdiff / buf breaking / graphql-inspector CI gate`).
73
+
74
+ ### 6.1 Install `oasdiff`
75
+
76
+ Pick one of two install paths:
77
+
78
+ - npm global (CI runner with Node 22+): `npm i -g @tufin/oasdiff`
79
+ - Docker image (no Node dependency): `docker run --rm -t -v $(pwd):/specs tufin/oasdiff <subcommand>`
80
+
81
+ Pin the version in CI (e.g., `npm i -g @tufin/oasdiff@1.10.x` or `tufin/oasdiff:1.10`) so a new release of oasdiff does not change gate semantics mid-cycle.
82
+
83
+ ### 6.2 Compare current spec vs previous merged version
84
+
85
+ The gate compares the spec on the feature branch against the spec at the merge base on the default branch. Fail CI on any breaking change to a stable endpoint; report non-breaking diffs as informational.
86
+
87
+ - Fetch the base ref's spec into a temp path (e.g., `git show origin/main:openapi.yaml > /tmp/openapi.base.yaml`).
88
+ - Run `oasdiff breaking /tmp/openapi.base.yaml ./openapi.yaml --fail-on ERR` — exit code 1 when one or more `ERR`-level breaking changes are detected.
89
+ - Scope the gate to stable endpoints by excluding paths tagged `x-stability: experimental` via `--match-path` or by maintaining an `oasdiff-ignore.yaml` rules file for documented breaking changes already coordinated with consumers.
90
+
91
+ ### 6.3 Example GitHub Actions step
92
+
93
+ ```yaml
94
+ name: API Breaking-Change Gate
95
+ on:
96
+ pull_request:
97
+ paths:
98
+ - 'openapi.yaml'
99
+ - 'openapi.json'
100
+ - 'docs/api/**'
101
+
102
+ jobs:
103
+ oasdiff:
104
+ runs-on: ubuntu-latest
105
+ steps:
106
+ - uses: actions/checkout@v4
107
+ with:
108
+ fetch-depth: 0
109
+ - uses: actions/setup-node@v4
110
+ with:
111
+ node-version: '22'
112
+ - name: Install oasdiff
113
+ run: npm i -g @tufin/oasdiff@1.10.x
114
+ - name: Resolve base spec
115
+ run: |
116
+ git show origin/${{ github.base_ref }}:openapi.yaml > /tmp/openapi.base.yaml
117
+ - name: Run breaking-change diff
118
+ run: |
119
+ oasdiff breaking /tmp/openapi.base.yaml ./openapi.yaml \
120
+ --fail-on ERR \
121
+ --format githubactions
122
+ ```
123
+
124
+ The `--format githubactions` flag emits `::error::` annotations so each breaking change shows up inline on the PR diff.
125
+
126
+ ### 6.4 Handling an intentional breaking change
127
+
128
+ When a breaking change is deliberate (versioned endpoint cut, deprecated field removed after the documented sunset window):
129
+
130
+ 1. Add a row to `oasdiff-ignore.yaml` with the change ID, the affected operation, and a link to the consumer-coordination record.
131
+ 2. Bump the spec `info.version` in line with the project's API versioning policy (semver-major for breaking changes on stable endpoints).
132
+ 3. Document the change in CHANGELOG (or equivalent) with the migration path for downstream consumers.
133
+
134
+ The gate stays green only because the change is recorded — not because the breaking signal was silenced.
135
+
69
136
  ## Error Handling
70
137
 
71
138
  - **Route definitions use dynamic or meta-programmed patterns**: If endpoints are generated at runtime or via decorators that resist static analysis, document the gap and manually enumerate the missing endpoints.
@@ -79,3 +146,4 @@ Before any work, scan the invocation for unresolved questions in scope, intent,
79
146
  - [ ] Spec passes linter validation
80
147
  - [ ] Example requests/responses included
81
148
  - [ ] No breaking changes to existing API consumers
149
+ - [ ] `oasdiff breaking` CI gate is wired and fails on any `ERR`-level breaking change on stable endpoints (CONSTITUTION §2 P5: 0 per release)
@@ -57,14 +57,14 @@ Run SQL directly against a CSV using an in-memory SQLite — no schema file requ
57
57
 
58
58
  - **Files larger than ~1M rows:** csvkit is Python-startup-heavy; `hatch3r-cli-duckdb` (tier 2) loads and queries the same file in a fraction of the time.
59
59
  - **Production SQL workloads:** csvsql is convenient but evaluates against in-memory SQLite — use a real database for anything served.
60
- - **Single-column slice or count under a few hundred MB:** `hatch3r-cli-xsv` (tier 2) is faster with lower memory pressure.
60
+ - **Single-column slice or count under a few hundred MB:** `hatch3r-cli-qsv` (tier 2) is faster with lower memory pressure.
61
61
 
62
62
  ## Alternatives
63
63
 
64
64
  | Tool | When to prefer |
65
65
  |------|----------------|
66
66
  | `hatch3r-cli-duckdb` (tier 2) | Large files, analytical SQL, Parquet, multi-file joins |
67
- | `hatch3r-cli-xsv` (tier 2) | Fast column slicing, sampling, deduping |
67
+ | `hatch3r-cli-qsv` (tier 2) | Fast column slicing, sampling, deduping |
68
68
  | `hatch3r-cli-miller` (tier 3) | Streaming put/filter DSL, format conversion |
69
69
 
70
70
  ## Detection / Install
@@ -36,7 +36,7 @@ Count rows across a Parquet glob — no schema declaration, no import step.
36
36
  ```bash
37
37
  duckdb -c "COPY (SELECT * FROM 'in.csv' WHERE active) TO 'out.parquet' (FORMAT PARQUET)"
38
38
  ```
39
- Filter a CSV and emit columnar Parquet in one pass; ideal for downstream `xsv`/`jq` chains.
39
+ Filter a CSV and emit columnar Parquet in one pass; ideal for downstream `qsv`/`jq` chains.
40
40
 
41
41
  ```bash
42
42
  duckdb -c "ATTACH 'app.sqlite' AS sqlite; SELECT * FROM sqlite.users LIMIT 10"
@@ -55,7 +55,7 @@ Aggregate over a CSV directory; DuckDB streams the read so memory stays bounded.
55
55
 
56
56
  ## Wrong Choice When
57
57
 
58
- - The CSV has <10k rows and you only need to slice/select columns — `xsv` (Tier 2 sibling) starts faster and has no install dependency in many environments.
58
+ - The CSV has <10k rows and you only need to slice/select columns — `qsv` (Tier 2 sibling) starts faster and has no install dependency in many environments.
59
59
  - The workload is transactional (writes from multiple clients, ACID across rows) — use SQLite or Postgres; DuckDB is read-optimized OLAP.
60
60
  - A single `jq` filter would do the job (the data is already JSON, the operation is field extraction) — skip the SQL detour.
61
61
 
@@ -63,7 +63,7 @@ Aggregate over a CSV directory; DuckDB streams the read so memory stays bounded.
63
63
 
64
64
  | Tool | When to prefer |
65
65
  |------|----------------|
66
- | `xsv` | Single CSV file, <100MB, just need slice/select/sort. |
66
+ | `qsv` | Single CSV file, <100MB, just need slice/select/sort. |
67
67
  | `sqlite3` | Need OLTP writes or row-level updates rather than analytics. |
68
68
  | `python -m pandas` | Already in a Python script and the data fits in memory. |
69
69
 
@@ -68,6 +68,10 @@ Compact (`-c`) one-object-per-line projection — perfect input for `xargs -L1`
68
68
  | `dasel` | Single binary across JSON/YAML/TOML/XML with a path-query DSL — handy in CI where you do not want jq+yq. |
69
69
  | `fx` | Interactive JSON browsing in a TTY; jq is the right call in scripts. |
70
70
 
71
+ ## Known Issues
72
+
73
+ - **CVE-2026-32316 (active, no tagged fix as of 2026-05-18):** jq 1.8.1 ships with a heap buffer overflow in expression evaluation. Six additional CVEs were disclosed 2026-04-15; patches are committed on `jqlang/jq` `main` but no superseding tagged release exists yet. Do not invoke `jq` on JSON sourced from an untrusted producer (third-party API webhook, user-supplied upload) until a tagged release past 1.8.1 lands. Reference: https://github.com/jqlang/jq/security/advisories.
74
+
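One way to act on the advisory above is to gate untrusted input on the installed jq version. The sketch below assumes that `jq --version` prints a string of the form `jq-1.8.1` for tagged releases and that `sort -V` (a GNU/BSD extension, not POSIX) is available. The helper names `version_gt` and `safe_jq_untrusted` are hypothetical, and development builds with suffixed version strings are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only when version $1 is strictly newer than $2.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Hypothetical guard: forward arguments to jq only when the local jq
# is a release past 1.8.1; otherwise refuse and explain why.
safe_jq_untrusted() {
  local ver
  ver="$(jq --version 2>/dev/null)" || { echo "jq not found" >&2; return 127; }
  ver="${ver#jq-}"   # strip the assumed "jq-" prefix

  if version_gt "$ver" "1.8.1"; then
    jq "$@"
  else
    echo "refusing: jq $ver is not past 1.8.1; do not feed it untrusted JSON" >&2
    return 1
  fi
}
```

Trusted, locally produced JSON can still go through plain `jq`; only webhook payloads, uploads, and similar input would be routed through the guard, for example `safe_jq_untrusted '.id' < payload.json`.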
71
75
  ## Detection / Install
72
76
 
73
77
  Verify with:
@@ -56,7 +56,7 @@ SQL-style join on `id` between two CSVs, streamed.
56
56
  ## Wrong Choice When
57
57
 
58
58
  - **Multi-gigabyte analytical queries with joins:** `hatch3r-cli-duckdb` (tier 2) has a query planner and parallel scan; mlr is streaming-single-thread.
59
- - **One-column slice or count:** `hatch3r-cli-xsv` (tier 2) is faster for trivial slicing.
59
+ - **One-column slice or count:** `hatch3r-cli-qsv` (tier 2) is faster for trivial slicing.
60
60
  - **Production ETL with schema enforcement:** use a real database or dbt — mlr is a CLI-scratchpad tool.
61
61
 
62
62
  ## Alternatives
@@ -64,7 +64,7 @@ SQL-style join on `id` between two CSVs, streamed.
64
64
  | Tool | When to prefer |
65
65
  |------|----------------|
66
66
  | `hatch3r-cli-duckdb` (tier 2) | Multi-GB data, joins, analytical SQL, Parquet |
67
- | `hatch3r-cli-xsv` (tier 2) | Single-column slice, count, sample on plain CSV |
67
+ | `hatch3r-cli-qsv` (tier 2) | Single-column slice, count, sample on plain CSV |
68
68
  | `hatch3r-cli-csvkit` (tier 3) | SQL-over-CSV with `csvsql`, Python integration |
69
69
 
70
70
  ## Detection / Install
@@ -40,7 +40,7 @@ hatch3r recommends a small set of terminal-native CLI tools agents can call inst
40
40
  | `llm` | `hatch3r-cli-llm` | simonw/llm — invoke LLMs from the command line with prompt templates |
41
41
  | `playwright` | `hatch3r-cli-playwright` | Browser automation, web testing, and UI interaction |
42
42
  | `taplo` | `hatch3r-cli-taplo` | TOML toolkit (format, lint, query) for pyproject.toml / Cargo.toml |
43
- | `xsv` | `hatch3r-cli-xsv` | Fast CSV toolkit (slice, search, join, stats) |
43
+ | `qsv` | `hatch3r-cli-qsv` | Fast CSV toolkit (slice, search, join, stats, 80+ commands) — actively-maintained xsv successor |
44
44
 
45
45
  ## Tier 3 — opt-in advanced
46
46
 
@@ -1,25 +1,27 @@
1
1
  ---
2
- id: hatch3r-cli-xsv
3
- description: "Fast CSV toolkit (slice, search, join, stats). Use when slicing huge CSV documents by row range or column without materialising the dataset; invoke `xsv`. Streams records lazily; works on datasets that exceed available RAM."
2
+ id: hatch3r-cli-qsv
3
+ description: "Fast CSV toolkit (slice, search, join, stats, 80+ commands) — actively-maintained xsv successor. Use when slicing huge CSV documents by row range or column without materialising the dataset; invoke `qsv`. Streams records lazily; works on datasets that exceed available RAM."
4
4
  tags: ["cli-tools", "data"]
5
5
  quality_charter: agents/shared/quality-charter.md
6
6
  efficiency_patterns: agents/shared/efficiency-patterns.md
7
7
  cache_friendly: true
8
8
  cli_tool:
9
- id: xsv
10
- bin: xsv
9
+ id: qsv
10
+ bin: qsv
11
11
  tier: 2
12
12
  category: data
13
- homepage: https://github.com/BurntSushi/xsv
13
+ homepage: https://github.com/jqnatividad/qsv
14
14
  ---
15
15
  <!-- HATCH3R-CLI-SKILL-GENERATED v1 -->
16
- # xsv
16
+ # qsv
17
17
 
18
- Fast CSV toolkit (slice, search, join, stats)
18
+ Fast CSV toolkit (slice, search, join, stats, 80+ commands) — actively-maintained xsv successor
19
19
 
20
20
  ## When to Use
21
21
 
22
- Reach for `xsv` when the task is in the **data** category and the agent would otherwise call an MCP tool or read large outputs into context.
22
+ Reach for `qsv` when the task is in the **data** category and the agent would otherwise call an MCP tool or read large outputs into context.
23
+
24
+ `qsv` is a drop-in superset of `xsv` — every `xsv` sub-command name and flag works under `qsv`, plus 50+ additional commands (`apply`, `fetch`, `validate`, `tojsonl`, `sqlp`, etc.). The upstream `BurntSushi/xsv` repository was archived on 2025-04-24; `jqnatividad/qsv` is the active fork with regular releases.
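For scripts that must keep working on machines where only the older binary is present, a small selector can prefer `qsv` and fall back to `xsv`. This is a minimal sketch, assuming the recipes involved use only sub-commands and flags the two tools share; the function name `pick_csv_tool` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical selector: print the first available CSV toolkit,
# preferring qsv over the archived xsv. Returns 1 when neither is installed.
pick_csv_tool() {
  local tool
  for tool in qsv xsv; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      return 0
    fi
  done
  return 1
}
```

A caller might write `csv="$(pick_csv_tool)" && "$csv" select name,email,active records.csv`, which runs the same projection recipe under whichever binary was found.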
23
25
 
24
26
  ## Token Cost
25
27
 
@@ -29,39 +31,39 @@ Reference: Anthropic engineering (Nov 4 2025) — code-execution-over-MCP yields
29
31
  ## Recipes
30
32
 
31
33
  ```bash
32
- xsv stats huge.csv
34
+ qsv stats huge.csv
33
35
  ```
34
36
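  Per-column min/max/mean/stddev in a single streaming pass over the file; cardinality, median, and mode are opt-in (`--everything` or the individual flags) and buffer column values in memory.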
35
37
 
36
38
  ```bash
37
- xsv select name,email,active records.csv
39
+ qsv select name,email,active records.csv
38
40
  ```
39
41
  Project a subset of columns without rewriting; output stays CSV for downstream tools.
40
42
 
41
43
  ```bash
42
- xsv sort -s amount records.csv | xsv slice -e 100
44
+ qsv sort -s amount records.csv | qsv slice -e 100
43
45
  ```
44
46
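  Sort by `amount`, then take the first 100 rows. The pipe composes, but `sort` reads the whole input into memory before emitting (and compares lexicographically unless `-N` is passed); only `slice` streams.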
45
47
 
46
48
  ```bash
47
- xsv frequency -s status events.csv
49
+ qsv frequency -s status events.csv
48
50
  ```
49
51
  Tabulate value counts for a column; output is itself CSV, parsable by the next step.
50
52
 
51
53
  ```bash
52
- xsv search -s email '@example\.com$' users.csv
54
+ qsv search -s email '@example\.com$' users.csv
53
55
  ```
54
56
  Regex-filter a column — much cheaper than loading the whole file into a SQL engine.
55
57
 
56
58
  ```bash
57
- xsv join id orders.csv id customers.csv > joined.csv
59
+ qsv join id orders.csv id customers.csv > joined.csv
58
60
  ```
59
61
  Hash join two CSVs on a common column without spinning up DuckDB.
60
62
 
61
63
  ## Wrong Choice When
62
64
 
63
65
  - The query needs aggregation across millions of rows or multiple files — DuckDB (Tier 2 sibling) is built for that scan plan.
64
- - You need a multi-way join with type coercion or window functions — `xsv join` is hash-only and untyped; use DuckDB.
66
+ - You need a multi-way join with type coercion or window functions — `qsv join` is hash-only and untyped; use DuckDB.
65
67
  - The data is JSON or Parquet, not CSV — pipe through `jq`/DuckDB instead of CSV-converting first.
66
68
 
67
69
  ## Alternatives
@@ -76,14 +78,14 @@ Hash join two CSVs on a common column without spinning up DuckDB.
76
78
 
77
79
  Verify with:
78
80
  ```bash
79
- command -v xsv
81
+ command -v qsv
80
82
  ```
81
83
 
82
84
  Install (mac):
83
85
 
84
86
  ```bash
85
87
  # brew
86
- brew install xsv
88
+ brew install qsv
87
89
  ```
88
90
 
89
- Homepage: https://github.com/BurntSushi/xsv
91
+ Homepage: https://github.com/jqnatividad/qsv