npm - autonomous-coding-toolkit - Versions diffs - 1.0.0 - Mend

autonomous-coding-toolkit 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (324)

package/.claude-plugin/marketplace.json +22 -0
package/.claude-plugin/plugin.json +13 -0
package/LICENSE +21 -0
package/Makefile +21 -0
package/README.md +140 -0
package/SECURITY.md +28 -0
package/agents/bash-expert.md +113 -0
package/agents/dependency-auditor.md +138 -0
package/agents/integration-tester.md +120 -0
package/agents/lesson-scanner.md +149 -0
package/agents/python-expert.md +179 -0
package/agents/service-monitor.md +141 -0
package/agents/shell-expert.md +147 -0
package/benchmarks/runner.sh +147 -0
package/benchmarks/tasks/01-rest-endpoint/rubric.sh +29 -0
package/benchmarks/tasks/01-rest-endpoint/task.md +17 -0
package/benchmarks/tasks/02-refactor-module/task.md +8 -0
package/benchmarks/tasks/03-fix-integration-bug/task.md +8 -0
package/benchmarks/tasks/04-add-test-coverage/task.md +8 -0
package/benchmarks/tasks/05-multi-file-feature/task.md +8 -0
package/bin/act.js +238 -0
package/commands/autocode.md +6 -0
package/commands/cancel-ralph.md +18 -0
package/commands/code-factory.md +53 -0
package/commands/create-prd.md +55 -0
package/commands/ralph-loop.md +18 -0
package/commands/run-plan.md +117 -0
package/commands/submit-lesson.md +122 -0
package/docs/ARCHITECTURE.md +630 -0
package/docs/CONTRIBUTING.md +125 -0
package/docs/lessons/0001-bare-exception-swallowing.md +34 -0
package/docs/lessons/0002-async-def-without-await.md +28 -0
package/docs/lessons/0003-create-task-without-callback.md +28 -0
package/docs/lessons/0004-hardcoded-test-counts.md +28 -0
package/docs/lessons/0005-sqlite-without-closing.md +33 -0
package/docs/lessons/0006-venv-pip-path.md +27 -0
package/docs/lessons/0007-runner-state-self-rejection.md +35 -0
package/docs/lessons/0008-quality-gate-blind-spot.md +33 -0
package/docs/lessons/0009-parser-overcount-empty-batches.md +36 -0
package/docs/lessons/0010-local-outside-function-bash.md +33 -0
package/docs/lessons/0011-batch-tests-for-unimplemented-code.md +36 -0
package/docs/lessons/0012-api-markdown-unescaped-chars.md +33 -0
package/docs/lessons/0013-export-prefix-env-parsing.md +33 -0
package/docs/lessons/0014-decorator-registry-import-side-effect.md +43 -0
package/docs/lessons/0015-frontend-backend-schema-drift.md +43 -0
package/docs/lessons/0016-event-driven-cold-start-seeding.md +44 -0
package/docs/lessons/0017-copy-paste-logic-diverges.md +43 -0
package/docs/lessons/0018-layer-passes-pipeline-broken.md +45 -0
package/docs/lessons/0019-systemd-envfile-ignores-export.md +41 -0
package/docs/lessons/0020-persist-state-incrementally.md +44 -0
package/docs/lessons/0021-dual-axis-testing.md +48 -0
package/docs/lessons/0022-jsx-factory-shadowing.md +43 -0
package/docs/lessons/0023-static-analysis-spiral.md +51 -0
package/docs/lessons/0024-shared-pipeline-implementation.md +55 -0
package/docs/lessons/0025-defense-in-depth-all-entry-points.md +65 -0
package/docs/lessons/0026-linter-no-rules-false-enforcement.md +54 -0
package/docs/lessons/0027-jsx-silent-prop-drop.md +64 -0
package/docs/lessons/0028-no-infrastructure-in-client-code.md +49 -0
package/docs/lessons/0029-never-write-secrets-to-files.md +61 -0
package/docs/lessons/0030-cache-merge-not-replace.md +62 -0
package/docs/lessons/0031-verify-units-at-boundaries.md +66 -0
package/docs/lessons/0032-module-lifecycle-subscribe-unsubscribe.md +89 -0
package/docs/lessons/0033-async-iteration-mutable-snapshot.md +72 -0
package/docs/lessons/0034-caller-missing-await-silent-discard.md +65 -0
package/docs/lessons/0035-duplicate-registration-silent-overwrite.md +85 -0
package/docs/lessons/0036-websocket-dirty-disconnect.md +33 -0
package/docs/lessons/0037-parallel-agents-worktree-corruption.md +31 -0
package/docs/lessons/0038-subscribe-no-stored-ref.md +36 -0
package/docs/lessons/0039-fallback-or-default-hides-bugs.md +34 -0
package/docs/lessons/0040-event-firehose-filter-first.md +36 -0
package/docs/lessons/0041-ambiguous-base-dir-path-nesting.md +32 -0
package/docs/lessons/0042-spec-compliance-insufficient.md +36 -0
package/docs/lessons/0043-exact-count-extensible-collections.md +32 -0
package/docs/lessons/0044-relative-file-deps-worktree.md +39 -0
package/docs/lessons/0045-iterative-design-improvement.md +33 -0
package/docs/lessons/0046-plan-assertion-math-bugs.md +38 -0
package/docs/lessons/0047-pytest-single-threaded-default.md +37 -0
package/docs/lessons/0048-integration-wiring-batch.md +40 -0
package/docs/lessons/0049-ab-verification.md +41 -0
package/docs/lessons/0050-editing-sourced-files-during-execution.md +33 -0
package/docs/lessons/0051-infrastructure-fixes-cant-self-heal.md +30 -0
package/docs/lessons/0052-uncommitted-changes-poison-quality-gates.md +31 -0
package/docs/lessons/0053-jq-compact-flag-inconsistency.md +31 -0
package/docs/lessons/0054-parser-matches-inside-code-blocks.md +30 -0
package/docs/lessons/0055-agents-compensate-for-garbled-prompts.md +31 -0
package/docs/lessons/0056-grep-count-exit-code-on-zero.md +42 -0
package/docs/lessons/0057-new-artifacts-break-git-clean-gates.md +42 -0
package/docs/lessons/0058-dead-config-keys-never-consumed.md +49 -0
package/docs/lessons/0059-contract-test-shared-structures.md +53 -0
package/docs/lessons/0060-set-e-silent-death-in-runners.md +53 -0
package/docs/lessons/0061-context-injection-dirty-state.md +50 -0
package/docs/lessons/0062-sibling-bug-neighborhood-scan.md +29 -0
package/docs/lessons/0063-one-flag-two-lifetimes.md +31 -0
package/docs/lessons/0064-test-passes-wrong-reason.md +31 -0
package/docs/lessons/0065-pipefail-grep-count-double-output.md +39 -0
package/docs/lessons/0066-local-keyword-outside-function.md +37 -0
package/docs/lessons/0067-stdin-hang-non-interactive-shell.md +36 -0
package/docs/lessons/0068-agent-builds-wrong-thing-correctly.md +31 -0
package/docs/lessons/0069-plan-quality-dominates-execution.md +30 -0
package/docs/lessons/0070-spec-echo-back-prevents-drift.md +31 -0
package/docs/lessons/0071-positive-instructions-outperform-negative.md +30 -0
package/docs/lessons/0072-lost-in-the-middle-context-placement.md +30 -0
package/docs/lessons/0073-unscoped-lessons-cause-false-positives.md +30 -0
package/docs/lessons/0074-stale-context-injection-wrong-batch.md +32 -0
package/docs/lessons/0075-research-artifacts-must-persist.md +32 -0
package/docs/lessons/0076-wrong-decomposition-contaminates-downstream.md +30 -0
package/docs/lessons/0077-cherry-pick-merges-need-manual-resolution.md +30 -0
package/docs/lessons/0078-static-review-without-live-test.md +30 -0
package/docs/lessons/0079-integration-wiring-batch-required.md +32 -0
package/docs/lessons/FRAMEWORK.md +161 -0
package/docs/lessons/SUMMARY.md +201 -0
package/docs/lessons/TEMPLATE.md +85 -0
package/docs/plans/2026-02-21-code-factory-v2-design.md +204 -0
package/docs/plans/2026-02-21-code-factory-v2-implementation-plan.md +2189 -0
package/docs/plans/2026-02-21-code-factory-v2-phase4-design.md +537 -0
package/docs/plans/2026-02-21-code-factory-v2-phase4-implementation-plan.md +2012 -0
package/docs/plans/2026-02-21-hardening-pass-design.md +108 -0
package/docs/plans/2026-02-21-hardening-pass-plan.md +1378 -0
package/docs/plans/2026-02-21-mab-research-report.md +406 -0
package/docs/plans/2026-02-21-marketplace-restructure-design.md +240 -0
package/docs/plans/2026-02-21-marketplace-restructure-plan.md +832 -0
package/docs/plans/2026-02-21-phase4-completion-plan.md +697 -0
package/docs/plans/2026-02-21-validator-suite-design.md +148 -0
package/docs/plans/2026-02-21-validator-suite-plan.md +540 -0
package/docs/plans/2026-02-22-mab-research-round2.md +556 -0
package/docs/plans/2026-02-22-mab-run-design.md +462 -0
package/docs/plans/2026-02-22-mab-run-plan.md +2046 -0
package/docs/plans/2026-02-22-operations-design-methodology-research.md +681 -0
package/docs/plans/2026-02-22-research-agent-failure-taxonomy.md +532 -0
package/docs/plans/2026-02-22-research-code-guideline-policies.md +886 -0
package/docs/plans/2026-02-22-research-codebase-audit-refactoring.md +908 -0
package/docs/plans/2026-02-22-research-coding-standards-documentation.md +541 -0
package/docs/plans/2026-02-22-research-competitive-landscape.md +687 -0
package/docs/plans/2026-02-22-research-comprehensive-testing.md +1076 -0
package/docs/plans/2026-02-22-research-context-utilization.md +459 -0
package/docs/plans/2026-02-22-research-cost-quality-tradeoff.md +548 -0
package/docs/plans/2026-02-22-research-lesson-transferability.md +508 -0
package/docs/plans/2026-02-22-research-multi-agent-coordination.md +312 -0
package/docs/plans/2026-02-22-research-phase-integration.md +602 -0
package/docs/plans/2026-02-22-research-plan-quality.md +428 -0
package/docs/plans/2026-02-22-research-prompt-engineering.md +558 -0
package/docs/plans/2026-02-22-research-unconventional-perspectives.md +528 -0
package/docs/plans/2026-02-22-research-user-adoption.md +638 -0
package/docs/plans/2026-02-22-research-verification-effectiveness.md +433 -0
package/docs/plans/2026-02-23-agent-suite-design.md +299 -0
package/docs/plans/2026-02-23-agent-suite-plan.md +578 -0
package/docs/plans/2026-02-23-phase3-cost-infrastructure-design.md +148 -0
package/docs/plans/2026-02-23-phase3-cost-infrastructure-plan.md +1062 -0
package/docs/plans/2026-02-23-research-bash-expert-agent.md +543 -0
package/docs/plans/2026-02-23-research-dependency-auditor-agent.md +564 -0
package/docs/plans/2026-02-23-research-improving-existing-agents.md +503 -0
package/docs/plans/2026-02-23-research-integration-tester-agent.md +454 -0
package/docs/plans/2026-02-23-research-python-expert-agent.md +429 -0
package/docs/plans/2026-02-23-research-service-monitor-agent.md +425 -0
package/docs/plans/2026-02-23-research-shell-expert-agent.md +533 -0
package/docs/plans/2026-02-23-roadmap-to-completion.md +530 -0
package/docs/plans/2026-02-24-headless-module-split-design.md +98 -0
package/docs/plans/2026-02-24-headless-module-split.md +443 -0
package/docs/plans/2026-02-24-lesson-scope-metadata-design.md +228 -0
package/docs/plans/2026-02-24-lesson-scope-metadata-plan.md +968 -0
package/docs/plans/2026-02-24-npm-packaging-design.md +841 -0
package/docs/plans/2026-02-24-npm-packaging-plan.md +1965 -0
package/docs/plans/audit-findings.md +186 -0
package/docs/telegram-notification-format.md +98 -0
package/examples/example-plan.md +51 -0
package/examples/example-prd.json +72 -0
package/examples/example-roadmap.md +33 -0
package/examples/quickstart-plan.md +63 -0
package/hooks/hooks.json +26 -0
package/hooks/setup-symlinks.sh +48 -0
package/hooks/stop-hook.sh +135 -0
package/package.json +47 -0
package/policies/bash.md +71 -0
package/policies/python.md +71 -0
package/policies/testing.md +61 -0
package/policies/universal.md +60 -0
package/scripts/analyze-report.sh +97 -0
package/scripts/architecture-map.sh +145 -0
package/scripts/auto-compound.sh +273 -0
package/scripts/batch-audit.sh +42 -0
package/scripts/batch-test.sh +101 -0
package/scripts/entropy-audit.sh +221 -0
package/scripts/failure-digest.sh +51 -0
package/scripts/generate-ast-rules.sh +96 -0
package/scripts/init.sh +112 -0
package/scripts/lesson-check.sh +428 -0
package/scripts/lib/common.sh +61 -0
package/scripts/lib/cost-tracking.sh +153 -0
package/scripts/lib/ollama.sh +60 -0
package/scripts/lib/progress-writer.sh +128 -0
package/scripts/lib/run-plan-context.sh +215 -0
package/scripts/lib/run-plan-echo-back.sh +231 -0
package/scripts/lib/run-plan-headless.sh +396 -0
package/scripts/lib/run-plan-notify.sh +57 -0
package/scripts/lib/run-plan-parser.sh +81 -0
package/scripts/lib/run-plan-prompt.sh +215 -0
package/scripts/lib/run-plan-quality-gate.sh +132 -0
package/scripts/lib/run-plan-routing.sh +315 -0
package/scripts/lib/run-plan-sampling.sh +170 -0
package/scripts/lib/run-plan-scoring.sh +146 -0
package/scripts/lib/run-plan-state.sh +142 -0
package/scripts/lib/run-plan-team.sh +199 -0
package/scripts/lib/telegram.sh +54 -0
package/scripts/lib/thompson-sampling.sh +176 -0
package/scripts/license-check.sh +74 -0
package/scripts/mab-run.sh +575 -0
package/scripts/module-size-check.sh +146 -0
package/scripts/patterns/async-no-await.yml +5 -0
package/scripts/patterns/bare-except.yml +6 -0
package/scripts/patterns/empty-catch.yml +6 -0
package/scripts/patterns/hardcoded-localhost.yml +9 -0
package/scripts/patterns/retry-loop-no-backoff.yml +12 -0
package/scripts/pipeline-status.sh +197 -0
package/scripts/policy-check.sh +226 -0
package/scripts/prior-art-search.sh +133 -0
package/scripts/promote-mab-lessons.sh +126 -0
package/scripts/prompts/agent-a-superpowers.md +29 -0
package/scripts/prompts/agent-b-ralph.md +29 -0
package/scripts/prompts/judge-agent.md +61 -0
package/scripts/prompts/planner-agent.md +44 -0
package/scripts/pull-community-lessons.sh +90 -0
package/scripts/quality-gate.sh +266 -0
package/scripts/research-gate.sh +90 -0
package/scripts/run-plan.sh +329 -0
package/scripts/scope-infer.sh +159 -0
package/scripts/setup-ralph-loop.sh +155 -0
package/scripts/telemetry.sh +230 -0
package/scripts/tests/run-all-tests.sh +52 -0
package/scripts/tests/test-act-cli.sh +46 -0
package/scripts/tests/test-agents-md.sh +87 -0
package/scripts/tests/test-analyze-report.sh +114 -0
package/scripts/tests/test-architecture-map.sh +89 -0
package/scripts/tests/test-auto-compound.sh +169 -0
package/scripts/tests/test-batch-test.sh +65 -0
package/scripts/tests/test-benchmark-runner.sh +25 -0
package/scripts/tests/test-common.sh +168 -0
package/scripts/tests/test-cost-tracking.sh +158 -0
package/scripts/tests/test-echo-back.sh +180 -0
package/scripts/tests/test-entropy-audit.sh +146 -0
package/scripts/tests/test-failure-digest.sh +66 -0
package/scripts/tests/test-generate-ast-rules.sh +145 -0
package/scripts/tests/test-helpers.sh +82 -0
package/scripts/tests/test-init.sh +47 -0
package/scripts/tests/test-lesson-check.sh +278 -0
package/scripts/tests/test-lesson-local.sh +55 -0
package/scripts/tests/test-license-check.sh +109 -0
package/scripts/tests/test-mab-run.sh +182 -0
package/scripts/tests/test-ollama-lib.sh +49 -0
package/scripts/tests/test-ollama.sh +60 -0
package/scripts/tests/test-pipeline-status.sh +198 -0
package/scripts/tests/test-policy-check.sh +124 -0
package/scripts/tests/test-prior-art-search.sh +96 -0
package/scripts/tests/test-progress-writer.sh +140 -0
package/scripts/tests/test-promote-mab-lessons.sh +110 -0
package/scripts/tests/test-pull-community-lessons.sh +149 -0
package/scripts/tests/test-quality-gate.sh +241 -0
package/scripts/tests/test-research-gate.sh +132 -0
package/scripts/tests/test-run-plan-cli.sh +86 -0
package/scripts/tests/test-run-plan-context.sh +305 -0
package/scripts/tests/test-run-plan-e2e.sh +153 -0
package/scripts/tests/test-run-plan-headless.sh +424 -0
package/scripts/tests/test-run-plan-notify.sh +124 -0
package/scripts/tests/test-run-plan-parser.sh +217 -0
package/scripts/tests/test-run-plan-prompt.sh +254 -0
package/scripts/tests/test-run-plan-quality-gate.sh +222 -0
package/scripts/tests/test-run-plan-routing.sh +178 -0
package/scripts/tests/test-run-plan-scoring.sh +148 -0
package/scripts/tests/test-run-plan-state.sh +261 -0
package/scripts/tests/test-run-plan-team.sh +157 -0
package/scripts/tests/test-scope-infer.sh +150 -0
package/scripts/tests/test-setup-ralph-loop.sh +63 -0
package/scripts/tests/test-telegram-env.sh +38 -0
package/scripts/tests/test-telegram.sh +121 -0
package/scripts/tests/test-telemetry.sh +46 -0
package/scripts/tests/test-thompson-sampling.sh +139 -0
package/scripts/tests/test-validate-all.sh +60 -0
package/scripts/tests/test-validate-commands.sh +89 -0
package/scripts/tests/test-validate-hooks.sh +98 -0
package/scripts/tests/test-validate-lessons.sh +150 -0
package/scripts/tests/test-validate-plan-quality.sh +235 -0
package/scripts/tests/test-validate-plans.sh +187 -0
package/scripts/tests/test-validate-plugin.sh +106 -0
package/scripts/tests/test-validate-prd.sh +184 -0
package/scripts/tests/test-validate-skills.sh +134 -0
package/scripts/validate-all.sh +57 -0
package/scripts/validate-commands.sh +67 -0
package/scripts/validate-hooks.sh +89 -0
package/scripts/validate-lessons.sh +98 -0
package/scripts/validate-plan-quality.sh +369 -0
package/scripts/validate-plans.sh +120 -0
package/scripts/validate-plugin.sh +86 -0
package/scripts/validate-policies.sh +42 -0
package/scripts/validate-prd.sh +118 -0
package/scripts/validate-skills.sh +96 -0
package/skills/autocode/SKILL.md +285 -0
package/skills/autocode/ab-verification.md +51 -0
package/skills/autocode/code-quality-standards.md +37 -0
package/skills/autocode/competitive-mode.md +364 -0
package/skills/brainstorming/SKILL.md +97 -0
package/skills/capture-lesson/SKILL.md +187 -0
package/skills/check-lessons/SKILL.md +116 -0
package/skills/dispatching-parallel-agents/SKILL.md +110 -0
package/skills/executing-plans/SKILL.md +85 -0
package/skills/finishing-a-development-branch/SKILL.md +201 -0
package/skills/receiving-code-review/SKILL.md +72 -0
package/skills/requesting-code-review/SKILL.md +59 -0
package/skills/requesting-code-review/code-reviewer.md +82 -0
package/skills/research/SKILL.md +145 -0
package/skills/roadmap/SKILL.md +115 -0
package/skills/subagent-driven-development/SKILL.md +98 -0
package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +18 -0
package/skills/subagent-driven-development/implementer-prompt.md +73 -0
package/skills/subagent-driven-development/spec-reviewer-prompt.md +57 -0
package/skills/systematic-debugging/SKILL.md +134 -0
package/skills/systematic-debugging/condition-based-waiting.md +64 -0
package/skills/systematic-debugging/defense-in-depth.md +32 -0
package/skills/systematic-debugging/root-cause-tracing.md +55 -0
package/skills/test-driven-development/SKILL.md +167 -0
package/skills/using-git-worktrees/SKILL.md +219 -0
package/skills/using-superpowers/SKILL.md +54 -0
package/skills/verification-before-completion/SKILL.md +140 -0
package/skills/verify/SKILL.md +82 -0
package/skills/writing-plans/SKILL.md +128 -0
package/skills/writing-skills/SKILL.md +93 -0

package/docs/plans/2026-02-23-research-integration-tester-agent.md ADDED

@@ -0,0 +1,454 @@
+# Research: Integration Tester Agent for Cross-Service Boundary Validation
+**Date:** 2026-02-23
+**Status:** Complete
+**Confidence:** High on patterns (multiple independent implementations); medium on agent structure (synthesized from field examples, not academic benchmarks)
+**Domain classification:** Complicated (Cynefin) — known engineering patterns exist; adaptation to file-based, systemd-driven architecture requires contextual judgment
+---
+## Executive Summary
+The target system has four known integration seams where individual services pass their own tests but bugs hide at the boundary: aria-hub → engine output files → ha-log-sync logbook; telegram-brief → notion-tools local replica; ollama-queue daemon → 10 Ollama-using timers; and shared env vars across services. This is textbook Cluster B: "Each layer passes its test; bug hides at the seam."
+The field has solved this class of problem with three converging approaches: (1) contract testing at schema/interface boundaries, (2) end-to-end trace following a single value across all hops, and (3) black-box pipeline validation treating inter-service file/DB handoffs as input/output pairs. No existing Claude Code agent covers this for file-based, systemd-driven local architectures — the agent must be purpose-built.
+**Recommended agent structure:** A single orchestrating agent with four specialized sub-probes — one per seam — plus a shared-env audit probe. Each probe executes a vertical trace: inject a known value at the upstream boundary, verify it arrives with correct schema at the downstream boundary, report pass/fail with evidence. No mocks. No internal inspection. Only observable outputs.
+**Recommended agent name:** `integration-tester` — invoked as `/integration-tester` or dispatched via Task tool.
+---
+## Source Analysis
+### Source 1: Airwallex Airtest — Claude Code Subagents for Integration Testing
+**URL:** [From 2 weeks to 2 hours — cutting integration test time using Claude Code Subagents](https://careers.airwallex.com/blog/using-claude-code-subagents/)
+**What they built:** Airtest is an AI-generated, self-healing test platform initiated via a `/airtest` slash command. A General Agent orchestrates a team of specialists:
+- Happy Path Agent (expected functionality)
+- Unhappy Path Agent (error handling / failure modes)
+- State Transition Agent (state changes across calls)
+- Dependency Testing Agent (service interactions)
+- End-to-End Flow Agent (complete workflow validation)
+- Test Reviewer Agent (quality assessment of generated tests)
+- Test Debugging Agent (diagnosis and fix of failing tests)
+- Existing Tests Analysis Agent (coverage gap detection)
+All agents share a persistent knowledge base containing API dependency mapping, business flow documentation, and recent code change impact. Agents have access to Code Search, Text Editor, Read, and Bash tools. The system generated 4,000+ integration tests and enabled 50 APIs to launch safely, reducing test time from 2 weeks to 2 hours.
+**Key patterns to adopt:**
+1. General agent + specialist decomposition. The orchestrator analyzes scope, delegates to the probe that matches the seam type, collects results, and writes a unified report.
+2. The knowledge base pattern — the agent should maintain a `docs/integration-tester/seam-registry.json` mapping each seam's writer, reader, interface schema, and last-verified timestamp.
+3. The Dependency Testing Agent maps directly to the integration-tester's probe model: one probe per service interaction, executing in isolation.
+**What does not transfer:** Their architecture is API-call centric (HTTP request/response). The target system is file-based and systemd-driven. The "inject a request and verify the response" pattern must be adapted to "write a sentinel file at the upstream output path and verify it appears, transformed correctly, at the downstream input path."
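The seam registry described above could be sketched roughly as follows. This is a minimal illustration, not the toolkit's implementation: the field names (`seam_id`, `writer`, `reader`, `schema`, `last_verified`), the writer/reader assignments, and the `seams_due` helper are all assumptions made for the example. It shows how an orchestrator might pick the seams whose last verification is missing or too old.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical registry content; field names and values are illustrative assumptions.
REGISTRY = [
    {
        "seam_id": "telegram-brief/notion-replica",
        "writer": "notion-tools",
        "reader": "telegram-brief",
        "schema": "seam-notion-replica.schema.json",
        "last_verified": "2026-02-23T10:00:00+00:00",
    },
    {
        "seam_id": "aria-engine/ha-log-sync",
        "writer": "aria-hub",
        "reader": "ha-log-sync",
        "schema": "seam-aria-engine-output.schema.json",
        "last_verified": None,  # never verified
    },
]


def seams_due(registry: List[dict], now: datetime, max_age: timedelta) -> List[str]:
    """Return ids of seams that were never verified or were verified too long ago."""
    due = []
    for seam in registry:
        stamp: Optional[str] = seam.get("last_verified")
        if stamp is None or now - datetime.fromisoformat(stamp) > max_age:
            due.append(seam["seam_id"])
    return due


if __name__ == "__main__":
    now = datetime(2026, 2, 24, 12, 0, tzinfo=timezone.utc)
    print(seams_due(REGISTRY, now, timedelta(hours=24)))
```

Timestamps are stored with an explicit UTC offset so that `datetime.fromisoformat` parses them on older Python versions as well.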
+---
+### Source 2: OpenTelemetry Integration Test Pattern — Verify Trace Data Across Service Boundaries
+**URL:** [How to Write Integration Tests That Verify Trace Data with OpenTelemetry](https://oneuptime.com/blog/post/2026-02-06-integration-tests-verify-trace-data-opentelemetry/view)
+**Core pattern:** Integration tests that validate cross-service behavior follow a 5-step sequence:
+1. Clear state — delete previous test data from the shared store
+2. Send request — trigger cross-service workflow via the upstream entry point
+3. Extract trace ID — retrieve the correlation identifier from the response or output
+4. Poll downstream — wait for propagated data with timeout (10-15 seconds typical)
+5. Assert structure — verify trace ID consistency, schema correctness, parent-child relationships, and data completeness
+**Four critical cross-boundary assertions:**
+- Shared correlation identifier (same trace ID across all hops)
+- Service representation (all expected services produced output)
+- Parent-child integrity (no orphaned references; each downstream output references the upstream source)
+- Count sufficiency (expected number of artifacts arrive — e.g., 3 spans, 2 files, 1 DB row)
+The polling-with-timeout pattern is critical because "spans may take a moment to be processed and exported" — the same is true of file-based pipelines where the upstream timer writes asynchronously.
+**Key patterns to adopt:**
+1. The 5-step sequence maps exactly to file-based integration testing: clear sentinel, trigger upstream, extract correlation ID embedded in output filename or content, poll downstream directory with timeout, assert schema.
+2. The "clear state before each trace" principle prevents false positives from leftover artifacts — each probe run starts with a clean slate.
+3. Deterministic correlation IDs: the agent injects a known sentinel value (e.g., `INTEGRATION_TEST_PROBE_2026-02-23T10:00:00Z`) into upstream data, then searches for that exact string downstream. This makes assertion unambiguous.
+**What does not transfer:** OpenTelemetry's W3C `traceparent` headers and span exporters are specific to HTTP/RPC. File-based pipelines need correlation via embedded payload markers, not headers. The polling mechanism translates directly — poll a directory or file for the sentinel value instead of polling a trace backend.
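A rough sketch of the file-based version of the 5-step sequence follows. It assumes the upstream and downstream sides are plain directories, that the upstream artifact is JSON, and that the caller supplies a `trigger` callable that starts the pipeline; the function names, the `probe-` file prefix, and the `marker` field are hypothetical. Unlike the OpenTelemetry flow, the correlation ID is generated before the trigger rather than extracted afterwards.

```python
import json
import time
import uuid
from pathlib import Path
from typing import Callable, Optional


def poll_for_sentinel(directory: Path, sentinel: str,
                      timeout_s: float = 15.0, interval_s: float = 0.5) -> Optional[Path]:
    """Return the first file in `directory` containing `sentinel`, or None on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        for path in sorted(directory.iterdir()):
            if path.is_file() and sentinel in path.read_text(errors="ignore"):
                return path
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)


def run_trace(upstream_dir: Path, downstream_dir: Path,
              trigger: Callable[[], None], timeout_s: float = 15.0) -> dict:
    # Step 1: clear leftovers from earlier probe runs (hypothetical "probe-" prefix).
    for directory in (upstream_dir, downstream_dir):
        for stale in directory.glob("probe-*"):
            stale.unlink()
    # Steps 2-3: write an artifact carrying a known marker, then start the pipeline.
    sentinel = f"INTEGRATION_TEST_PROBE_{uuid.uuid4().hex}"
    (upstream_dir / "probe-input.json").write_text(json.dumps({"marker": sentinel}))
    trigger()
    # Step 4: poll the downstream directory until the marker shows up or time runs out.
    hit = poll_for_sentinel(downstream_dir, sentinel, timeout_s)
    if hit is None:
        return {"status": "FAIL", "evidence": f"sentinel not seen within {timeout_s}s"}
    # Step 5: assert structure; here only that the output is JSON with the same marker.
    try:
        payload = json.loads(hit.read_text())
    except ValueError:
        return {"status": "FAIL", "evidence": f"{hit} is not valid JSON"}
    ok = isinstance(payload, dict) and payload.get("marker") == sentinel
    return {"status": "PASS" if ok else "FAIL", "evidence": str(hit)}
```

In a real probe, `trigger` would start the upstream timer or service; for a dry run it can simply copy the input file into the downstream directory.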
+---
+### Source 3: Great Expectations — Black-Box Pipeline Validation
+**URL:** [put-data-pipeline-under-test-with-pytest-and-great-expectations](https://github.com/greatexpectationslabs/put-data-pipeline-under-test-with-pytest-and-great-expectations)
+**Core pattern:** Treat a pipeline as a black box. Prepare a known input dataset. Assert the output against a declarative specification. No internal inspection. Tests are parametrized from a JSON config:
+```json
+{
+  "test_cases": [{
+    "title": "logbook entries contain required fields",
+    "input_file_path": "tests/fixtures/logbook-2026-02-23.json",
+    "expectations_config_path": "tests/fixtures/logbook-schema.json"
+  }]
+}
+```
+The test runner reads config, executes the pipeline against each fixture, validates output against the expectations file, and reports failures with detailed assertion messages.
+**Key patterns to adopt:**
+1. The declarative expectations model — each seam should have a schema file (`seam-aria-engine-output.schema.json`, `seam-notion-replica.schema.json`) that serves as the contract. The integration-tester agent reads the schema and validates downstream output against it.
+2. Fixture-based testing — the agent creates synthetic fixtures that represent minimal valid upstream output, injects them into the pipeline's input path, and checks downstream output. Decouples the integration test from live upstream timing.
+3. The black-box principle — the agent never inspects internal service state (Python objects, in-memory caches). It only reads files, checks DB tables, or calls health endpoints that the service exposes externally.
+**What does not transfer:** Great Expectations is a Python library for data validation, not an agent framework. The integration-tester agent uses its conceptual pattern (expectations-as-contracts) but implements it directly via Bash file inspection and Python schema validators, not the GX library itself.
+---
+### Source 4: Pact — Consumer-Driven Contract Testing
+**URL:** [Pact Documentation](https://docs.pact.io/) | [Contract Testing Best Practices 2025](https://www.sachith.co.uk/contract-testing-with-pact-best-practices-in-2025-practical-guide-feb-10-2026/)
+**Core concept:** A consumer (downstream service) defines what it expects from a provider (upstream service) in a contract. The provider verifies it meets those expectations. Only the parts of the interface actually consumed get tested, so changes to unconsumed behavior don't break tests.
+**Two key principles:**
+1. Consumer-driven: the reader's needs define the contract, not the writer's full output schema. This prevents over-specification.
+2. Can-I-Deploy check: before deploying any service, verify the latest version satisfies all consumer contracts. Gate deployments on contract compliance.
+**Key patterns to adopt:**
+1. The consumer-driven framing is exactly right for the target system: telegram-brief defines what it needs from the notion-tools replica (which fields, which structure); aria-hub defines what it needs from the engine output files. The integration-tester agent validates that the provider (upstream writer) satisfies these declared needs.
+2. The contract-as-artifact pattern: store contracts in `docs/integration-tester/contracts/` as JSON files. The agent reads them to know what to assert. Contracts evolve with the system. This also serves as living documentation of cross-service dependencies.
+3. Schema validation mode — for the file-based target system, use JSON Schema drafts rather than the Pact wire format. Each contract file specifies: `producer`, `consumer`, `interface_path` (file path or DB table), `schema` (JSON Schema), `freshness_ttl_minutes` (how old the file can be before it's stale).
+**What does not transfer:** Pact's HTTP mock server and broker are for API-based systems. The target system needs file-path and SQLite-table contracts, not HTTP request/response pairs.
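A contract file with the fields listed above might be checked along these lines. This is a simplified sketch under stated assumptions: the artifact is a JSON object on disk, and only a small subset of JSON Schema (`required` plus per-property `type`) is evaluated by hand. It is not a full JSON Schema validator, and `check_contract` is a hypothetical helper name.

```python
import json
import time
from pathlib import Path
from typing import List, Optional

# Subset of JSON Schema type names mapped to Python types (note: bool also passes as integer).
_JSON_TYPES = {"string": str, "number": (int, float), "integer": int,
               "boolean": bool, "object": dict, "array": list}


def check_contract(contract: dict, now: Optional[float] = None) -> List[str]:
    """Return a list of violations; an empty list means the contract holds."""
    path = Path(contract["interface_path"])
    if not path.exists():
        return [f"missing artifact: {path}"]
    violations = []
    # Freshness: compare the artifact's mtime against the contract's TTL.
    now = time.time() if now is None else now
    age_min = (now - path.stat().st_mtime) / 60
    ttl = contract["freshness_ttl_minutes"]
    if age_min > ttl:
        violations.append(f"stale artifact: {age_min:.1f} min old (ttl {ttl} min)")
    try:
        payload = json.loads(path.read_text())
    except ValueError as exc:
        return violations + [f"invalid JSON: {exc}"]
    if not isinstance(payload, dict):
        return violations + ["top-level value is not an object"]
    # Consumer-driven check: only fields the consumer declared are examined.
    schema = contract["schema"]
    for field in schema.get("required", []):
        if field not in payload:
            violations.append(f"missing required field: {field}")
    for field, spec in schema.get("properties", {}).items():
        expected = _JSON_TYPES.get(spec.get("type"))
        if field in payload and expected and not isinstance(payload[field], expected):
            violations.append(f"field {field!r} is not of type {spec['type']}")
    return violations
```

Extra fields in the producer's output are ignored on purpose, which matches the consumer-driven principle of not over-specifying the contract.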
+---
+### Source 5: Microservices Testing Honeycomb — Integration at Every Seam
+**URL:** [microservices-testing · GitHub Topics](https://github.com/topics/microservices-testing)
+**Core pattern:** Spotify's honeycomb model for microservices prioritizes integration tests over unit tests because individual units are trivially simple — the complexity lives at service-to-service boundaries. Each boundary gets its own integration test. Unit tests are minimal. E2E tests are rare.
+Applied to the target system:
+- Individual service unit tests already exist (ha-aria/tests/, ollama-queue/tests/)
+- The gap is integration tests at the four seams
+- E2E ("did the whole system produce a valid HA automation suggestion today") is too slow for daily validation
+**Key patterns to adopt:**
+1. One test per seam, not one test for the whole system. The four seams in the target system each have different writers, readers, file formats, and timing characteristics. Bundling them into one test makes failures unattributable.
+2. The honeycomb framing justifies the integration-tester as a first-class component, not an afterthought. Each seam test is as important as any unit test.
+---
+### Source 6: Seam Theory — Michael Feathers, Working Effectively with Legacy Code
+**URL:** [Seams | Testing Effectively With Legacy Code | InformIT](https://www.informit.com/articles/article.aspx?p=359417&seqNum=2)
+**Core concept:** A seam is "a place where you can alter behavior in your program without editing in that place." For integration testing, seams are the boundaries where one service hands off data to another. The key insight: at a seam, you can insert test probes without modifying the services themselves.
+**Three seam types (adapted to the target system):**
+| Feathers Seam Type | Target System Equivalent |
+|-------------------|-------------------------|
+| Link seams (swap library implementations) | Swap real output file paths with test fixture paths |
+| Object seams (inject mock objects via interfaces) | Inject sentinel values into DB tables or file content |
+| Preprocessing seams (use macros/env vars to alter behavior) | Set `INTEGRATION_TEST_MODE=1` env var to redirect output paths |
+**Key patterns to adopt:**
+1. The link seam pattern: the integration-tester does not need to run the full service. It can write a synthetic upstream artifact at the path the downstream service reads from, then verify the downstream service correctly consumes it. This decouples probe timing from timer schedules.
+2. The seam-as-attachment-point: each seam is a specific file path, DB table, or API endpoint that both services agree on. The integration contract IS the seam definition.
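The link seam and preprocessing seam rows of the table can be illustrated with a short sketch. Both helper names are hypothetical, and the assumption is that a service (or a wrapper around it) consults `INTEGRATION_TEST_MODE` when choosing where to write; nothing in the source confirms that the services already do this.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_output_dir(production_dir: Path, test_dir: Path,
                       env: Optional[Mapping[str, str]] = None) -> Path:
    """Preprocessing seam: redirect output when INTEGRATION_TEST_MODE=1 is set."""
    env = os.environ if env is None else env
    return test_dir if env.get("INTEGRATION_TEST_MODE") == "1" else production_dir


def write_synthetic_artifact(read_dir: Path, name: str, payload: dict) -> Path:
    """Link seam: place a fixture at the path the downstream reader consumes."""
    read_dir.mkdir(parents=True, exist_ok=True)
    target = read_dir / name
    tmp = target.with_suffix(target.suffix + ".tmp")
    tmp.write_text(json.dumps(payload))
    # Rename into place so the reader does not observe a half-written file.
    tmp.replace(target)
    return target
```

Writing to a temporary name and renaming keeps the probe from introducing its own race at the seam it is trying to test.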
+---
+### Source 7: VoltAgent Awesome Claude Code Subagents — test-automator Pattern
+**URL:** [awesome-claude-code-subagents](https://github.com/VoltAgent/awesome-claude-code-subagents) | [test-automator agent](https://github.com/VoltAgent/awesome-claude-code-subagents/blob/main/categories/04-quality-security/test-automator.md)
+**Core pattern:** The test-automator agent is a senior test automation engineer persona with a constrained tool set (Bash, Read, Write, Grep) and a structured execution mandate. It:
+1. Analyzes codebase architecture and existing coverage (Existing Tests Analysis pattern)
+2. Generates diverse test cases using Equivalence Class Partitioning and Boundary Value Analysis
+3. Executes tests autonomously and debugs failures in a loop
+4. Reports structured results
+The wshobson/agents repo adds the full-stack-orchestration pattern: orchestrator chains `backend-architect → database-architect → frontend-developer → test-automator → security-auditor → deployment-engineer → observability-engineer` for boundary validation during feature development.
+**Key patterns to adopt:**
+1. The constrained tool set is essential. The integration-tester agent should have exactly: Bash (for file inspection, systemd queries, Python one-liners), Read (for contract and log files), Grep (for sentinel value search). No Write beyond test artifacts. No edits to service code.
+2. The Equivalence Class Partitioning principle: test one normal case and one failure case at each seam. For ha-log-sync → aria-engine: normal (today's log file exists, schema valid) + failure (log file missing, simulate stale sync).
+3. Structured result format: each probe emits a standard record — `seam_id`, `status` (PASS/FAIL/SKIP), `evidence` (what was checked), `latency_seconds`, `timestamp`. The orchestrator aggregates into a summary report.
+---
+### Source 8: Systemd Health Check Patterns
+**URL:** [How to monitor systemd service liveness | Netdata](https://www.netdata.cloud/blog/systemd-service-liveness/) | [Monitoring SystemD services with Healthchecks.io](https://passbe.com/2022/healthchecks-io-systemd-checks/)
+**Core pattern:** The health of a systemd service can be observed through three signals: (1) active/failed state (`systemctl is-active`), (2) journal log recency (`journalctl -u <service> --since "5 min ago"`), (3) output file freshness (mtime check on the last-written artifact).
+For the integration-tester, these become pre-probe health checks: before tracing through a seam, verify the upstream service is alive and has produced a recent artifact. A dead upstream service means the probe can fail fast with a clear cause rather than timing out.
+**Key patterns to adopt:**
+1. Pre-probe health check sequence: `systemctl is-active <service>` → check artifact mtime → proceed to trace. If pre-check fails, report `status: SKIP` with cause instead of running the trace and producing a false failure.
+2. Journal log parsing as evidence: after a probe run, the agent should grep the relevant service's journal for error-level entries in the probe window. This catches failures that don't manifest in output files (e.g., service ran but silently dropped records).
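The pre-probe health check sequence can be sketched as below. It is a minimal sketch, assuming the artifact is a single file and that an exit code of 0 from `systemctl is-active` means the unit is active. The names `systemctl_is_active` and `pre_probe_check` are hypothetical, and the `is_active` parameter exists only so the check can be exercised without systemd.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, Optional


def systemctl_is_active(service: str) -> bool:
    """Return True when `systemctl is-active` exits 0 for the service."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", "--quiet", service],
            check=False,
        )
    except FileNotFoundError:
        # systemctl is not installed on this machine (e.g. a non-systemd host).
        return False
    return result.returncode == 0


def pre_probe_check(
    service: str,
    artifact: Path,
    max_age_seconds: float,
    is_active: Callable[[str], bool] = systemctl_is_active,
    now: Optional[float] = None,
) -> Optional[str]:
    """Return None when the probe may proceed, otherwise the SKIP cause."""
    if not is_active(service):
        return f"{service} is not active"
    if not artifact.exists():
        return f"{artifact} does not exist"
    current = time.time() if now is None else now
    age = current - artifact.stat().st_mtime
    if age > max_age_seconds:
        return f"{artifact} is stale ({age:.0f}s old, limit {max_age_seconds:.0f}s)"
    return None
```

A caller would report `status: SKIP` with the returned string as the cause, and run the trace only when the function returns `None`.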
+---
+## Synthesis: Best Patterns to Adopt
+### Pattern 1: Sentinel Value Injection (Primary)
+Inject a uniquely identifiable value at the upstream boundary of each seam. Verify it arrives at the downstream boundary. This is the core Cluster B trap: the sentinel reveals whether data actually flows across the seam, not just whether each side can process data in isolation.
+Implementation for file-based seams:
+```python
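+from datetime import datetime, timezone
+# Timezone-aware UTC timestamp; datetime.utcnow() is deprecated since Python 3.12
+stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
+sentinel = f"INTEGRATION_PROBE_{stamp}"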
+# Write sentinel into upstream artifact (or verify it exists in live data)
+# Wait with timeout for sentinel to appear downstream
+# Assert schema validity of the downstream artifact containing the sentinel
+```
+For seams that cannot accept injected data (live telemetry pipelines), use the freshness + schema check pattern instead: verify the downstream artifact was written within the expected window AND matches the declared schema.
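The three commented steps in the snippet above could be filled in along these lines for a file-based seam. This is a sketch under assumptions: `make_sentinel` and `wait_for_sentinel` are hypothetical helper names, and the copy from `upstream.json` to `downstream.json` is a stand-in for whatever the real pipeline does.

```python
import json
import tempfile
import time
from datetime import datetime, timezone
from pathlib import Path


def make_sentinel() -> str:
    """Build a uniquely identifiable, timestamped marker value."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    return f"INTEGRATION_PROBE_{stamp}"


def wait_for_sentinel(path: Path, sentinel: str, timeout: float = 30.0,
                      interval: float = 0.5) -> bool:
    """Poll the downstream file until the sentinel appears or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if path.exists() and sentinel in path.read_text(encoding="utf-8"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


with tempfile.TemporaryDirectory() as d:
    upstream = Path(d) / "upstream.json"
    downstream = Path(d) / "downstream.json"
    sentinel = make_sentinel()
    upstream.write_text(json.dumps({"marker": sentinel}), encoding="utf-8")
    # Hypothetical stand-in for the real pipeline: copy upstream to downstream.
    downstream.write_text(upstream.read_text(encoding="utf-8"), encoding="utf-8")
    arrived = wait_for_sentinel(downstream, sentinel, timeout=2.0, interval=0.1)
    missing = wait_for_sentinel(Path(d) / "never.json", sentinel, timeout=0.2, interval=0.1)
```

Using `time.monotonic()` for the deadline keeps the timeout stable even if the wall clock is adjusted while the probe is polling.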
+### Pattern 2: Declarative Contracts as Source of Truth
+Store one contract per seam in `docs/integration-tester/contracts/`. Each contract declares:
+```json
+{
+  "seam_id": "aria-engine-to-hub",
+  "producer_service": "aria-engine (systemd timer)",
+  "consumer_service": "aria-hub (aria serve)",
+  "interface_path": "~/ha-logs/intelligence/",
+  "interface_type": "file_directory",
+  "schema_file": "schemas/aria-engine-output.schema.json",
+  "freshness_ttl_minutes": 1440,
+  "probe_strategy": "freshness_and_schema",
+  "notes": "Engine writes daily; hub reads on demand. Stale = engine timer failed."
+}
+```
+The contract is the agent's instruction set. Adding a new seam = adding a contract file. No code changes.
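One way to load and sanity-check the contract files is sketched below. The set of required keys is an assumption drawn from the example contract above (a contract such as the env audit may need a different shape), and `load_contracts` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Dict

# Assumed minimum field set, taken from the example contract above.
REQUIRED_KEYS = {
    "seam_id",
    "producer_service",
    "consumer_service",
    "interface_path",
    "interface_type",
    "probe_strategy",
}


def load_contracts(contracts_dir: Path) -> Dict[str, dict]:
    """Load every *.json contract in the directory, keyed by seam_id."""
    contracts: Dict[str, dict] = {}
    for path in sorted(contracts_dir.glob("*.json")):
        contract = json.loads(path.read_text(encoding="utf-8"))
        missing = sorted(REQUIRED_KEYS - set(contract))
        if missing:
            raise ValueError(f"{path.name}: missing required keys {missing}")
        seam_id = contract["seam_id"]
        if seam_id in contracts:
            raise ValueError(f"{path.name}: duplicate seam_id {seam_id!r}")
        contracts[seam_id] = contract
    return contracts
```

Failing loudly on a malformed contract keeps a typo in one file from silently dropping a seam from the run.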
+### Pattern 3: Pre-Probe Health Check + 5-Step Trace
+Each probe follows this sequence (adapted from the OpenTelemetry integration test pattern):
+1. **Health check** — `systemctl is-active <producer_service>` + artifact mtime check
+2. **Clear state** — if using sentinel injection, ensure no prior sentinels pollute the check
+3. **Trigger / Observe** — inject sentinel or identify latest live artifact
+4. **Poll downstream** — with configurable timeout (default 30s for file-based, 5s for DB-based)
+5. **Assert** — schema validation + freshness + sentinel presence (where applicable)
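For a read-only seam, steps 3 to 5 can be sketched as a single function that keeps "stale" and "invalid schema" apart in its evidence string. This is a sketch, not the agent's actual probe: `check_freshness_and_schema` is a hypothetical name, and the required-key check is a deliberately simple stand-in for full JSON Schema validation, which would need a third-party validator.

```python
import json
import time
from pathlib import Path
from typing import Iterable, Optional, Tuple


def check_freshness_and_schema(
    interface_dir: Path,
    ttl_minutes: float,
    required_keys: Iterable[str],
    now: Optional[float] = None,
) -> Tuple[str, str]:
    """Return (status, evidence) for the newest JSON artifact in interface_dir."""
    artifacts = sorted(interface_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)
    if not artifacts:
        return "FAIL", f"no JSON artifact found in {interface_dir}"
    latest = artifacts[-1]

    # Freshness: compare the artifact's age with the contract's TTL.
    current = time.time() if now is None else now
    age_minutes = (current - latest.stat().st_mtime) / 60
    if age_minutes > ttl_minutes:
        return "FAIL", f"stale: {latest.name} is {age_minutes:.0f} min old, TTL {ttl_minutes:.0f} min"

    # Schema: a required-key check standing in for full JSON Schema validation.
    try:
        data = json.loads(latest.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return "FAIL", f"invalid schema: {latest.name} is not valid JSON ({exc.msg})"
    if not isinstance(data, dict):
        return "FAIL", f"invalid schema: {latest.name} is not a JSON object"
    missing = sorted(set(required_keys) - set(data))
    if missing:
        return "FAIL", f"invalid schema: {latest.name} is missing keys {missing}"
    return "PASS", f"{latest.name} is {age_minutes:.0f} min old and has all required keys"
```

The `now` parameter is there so staleness can be tested without waiting; a real run would leave it unset.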
+### Pattern 4: Black-Box Only, No Internal Inspection
+The agent never reads Python source code of services or inspects in-memory state. It only reads:
+- Files at declared interface paths
+- SQLite DB tables (queried via the `sqlite3` CLI)
+- systemd journal output
+- HTTP health endpoints where exposed
+This forces the contracts to be complete — if the agent cannot verify a seam from external observables, the seam lacks a proper external interface and that is itself a finding.
+### Pattern 5: Structured Result Emission
+Every probe emits:
+```json
+{
+  "seam_id": "aria-engine-to-hub",
+  "timestamp": "2026-02-23T10:00:00Z",
+  "status": "PASS",
+  "checks": [
+    {"name": "producer_alive", "result": "PASS", "evidence": "systemctl is-active aria-engine: active"},
+    {"name": "artifact_freshness", "result": "PASS", "evidence": "current.json mtime 14 min ago, TTL 1440 min"},
+    {"name": "schema_valid", "result": "PASS", "evidence": "validated against aria-engine-output.schema.json, 0 errors"},
+    {"name": "downstream_reachable", "result": "PASS", "evidence": "aria hub API /health returned 200"}
+  ],
+  "latency_seconds": 2.3,
+  "failures": []
+}
+```
+The orchestrator aggregates all probe results into a Markdown summary report.
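The aggregation step could look roughly like this. It is a sketch assuming each probe result is a dict shaped like the JSON record above; `overall_status` and `render_summary` are hypothetical names, and treating SKIP as outranking PASS is an assumption about how mixed check results should roll up.

```python
from typing import Dict, List


def overall_status(checks: List[Dict[str, str]]) -> str:
    """A probe passes only if every check passed; a SKIP outranks a PASS."""
    results = {check["result"] for check in checks}
    if "FAIL" in results:
        return "FAIL"
    if "SKIP" in results:
        return "SKIP"
    return "PASS"


def render_summary(probe_results: List[dict]) -> str:
    """Render probe results as a Markdown table followed by a one-line tally."""
    lines = [
        "| Seam ID | Status | Latency (s) |",
        "|---------|--------|-------------|",
    ]
    for result in probe_results:
        lines.append(
            f"| `{result['seam_id']}` | {result['status']} | {result['latency_seconds']:.1f} |"
        )
    failed = [r["seam_id"] for r in probe_results if r["status"] == "FAIL"]
    lines.append("")
    lines.append(f"{len(probe_results)} probes run, {len(failed)} failed.")
    return "\n".join(lines)
```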
+### Pattern 6: Shared Env Var Audit Probe
+A fifth probe, run alongside the four seam probes. It is not a seam trace but an env var consistency audit. For each shared variable in `~/.env`:
+- Identify all services that consume it (grep service files for variable name)
+- Verify the variable is set and non-empty in the loaded environment
+- Check that each consuming service is healthy (alive and recently active)
+This catches the "key was rotated in ~/.env but one service still has the old value baked in" class of failure. It also catches services that expect a variable the env file no longer provides.
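The first two audit steps can be sketched as below, assuming a simple `KEY=VALUE` env file and unit files that mention variables by name. The function names are hypothetical. The sketch does not check whether each consuming service is active, and it will miss services that load the whole file (for example through an `EnvironmentFile=` directive) without naming individual variables.

```python
import re
from pathlib import Path
from typing import Dict, List


def parse_env_file(env_path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values = {}
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        values[key] = value.strip().strip("'\"")
    return values


def find_consumers(variable: str, service_dir: Path) -> List[str]:
    """List unit files in service_dir that mention the variable by name."""
    pattern = re.compile(rf"\b{re.escape(variable)}\b")
    return sorted(
        unit.name
        for unit in service_dir.glob("*.service")
        if pattern.search(unit.read_text(encoding="utf-8"))
    )


def audit_env(env_path: Path, service_dir: Path) -> Dict[str, dict]:
    """Report, per declared variable, whether it is non-empty and who uses it."""
    declared = parse_env_file(env_path)
    return {
        name: {"non_empty": bool(value), "consumers": find_consumers(name, service_dir)}
        for name, value in declared.items()
    }
```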
+---
+## Recommended Agent Structure
+### Agent Identity
+**File:** `~/.claude/agents/integration-tester.md` (or `agents/integration-tester.md` in the toolkit)
+**Invocation:** Task tool (`integration-tester`) or `/integration-tester` slash command
+**Model:** sonnet (diagnostic reasoning; not architecture-level complexity)
+**Tools:** Bash, Read, Grep (no Write beyond `/tmp/integration-tester-results/`)
+### Agent Prompt Structure
+```markdown
+# Integration Tester Agent
+You are an integration boundary tester. Your job is to verify that data flows correctly
+across service seams — not that individual services work, but that handoffs between them
+work. You catch Cluster B bugs: the upstream passes its test, the downstream passes its test,
+but the data never arrives correctly at the seam.
+## Operating Principles
+1. Black box only. Never read service source code to infer behavior. Only check external
+   observables: files, DB tables, HTTP endpoints, systemd status.
+2. Evidence-based assertions. Every PASS and FAIL must include quoted evidence (file content,
+   command output, timestamp). No inferred assertions.
+3. One probe per seam. Do not bundle multiple seams into one check — failures must be
+   unambiguously attributable.
+4. Fail fast with cause. If a pre-probe health check fails (service down, no recent artifact),
+   report SKIP with cause. Do not run the full trace and report a misleading FAIL.
+5. No side effects. Do not write to live service data paths. Test artifacts go to /tmp/.
+## Seam Inventory
+Load contracts from: docs/integration-tester/contracts/*.json
+Run each contract's probe strategy in sequence.
+Aggregate results into: /tmp/integration-tester-results/report-<timestamp>.md
+## Probe Strategies
+### freshness_and_schema
+1. Check producer service is active (systemctl is-active)
+2. Find most recent artifact at interface_path
+3. Check artifact mtime is within freshness_ttl_minutes
+4. Validate artifact schema against schema_file
+5. PASS if all checks pass; FAIL with evidence on any failure
+### sentinel_injection
+1. Check producer service is active
+2. Write sentinel file to producer's output staging area (if writable)
+3. Wait up to timeout_seconds for sentinel to propagate to consumer's input path
+4. Validate propagated artifact schema
+5. Clean up sentinel artifacts
+### db_row_trace
+1. Check producer service is active
+2. Query producer DB table for most recent row
+3. Extract correlation ID from row
+4. Query consumer DB table for row with matching correlation ID
+5. Assert schema of consumer row
+### env_audit
+1. Read ~/.env for declared variables
+2. For each variable, grep ~/.config/systemd/user/*.service for consumers
+3. Verify variable is non-empty in current environment (source ~/.env)
+4. Verify each consuming service is active
+5. Report any mismatch between declared variables and consuming services
+## Output Format
+Write a Markdown report with:
+- Summary table (seam_id, status, latency)
+- Per-seam detail section with evidence
+- Action items for each FAIL
+```
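The `db_row_trace` strategy in the prompt above can be sketched with Python's standard `sqlite3` module. Everything about the database layout here is an assumption for illustration: the tables `jobs` and `results` and the columns `correlation_id`, `created_at`, and `status` are hypothetical, and the real `queue.db` structure would need to be checked first. The sketch also assumes producer and consumer rows live in separate database files.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import Tuple


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open a SQLite file read-only so the probe cannot modify live data."""
    return sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)


def trace_latest_row(producer_db: Path, consumer_db: Path) -> Tuple[str, str]:
    """Follow the newest producer row into the consumer DB by correlation ID."""
    with closing(open_read_only(producer_db)) as producer:
        # Hypothetical table and column names; adjust to the real schema.
        row = producer.execute(
            "SELECT correlation_id FROM jobs ORDER BY created_at DESC LIMIT 1"
        ).fetchone()
    if row is None:
        return "SKIP", "producer table has no rows"
    correlation_id = row[0]

    with closing(open_read_only(consumer_db)) as consumer:
        match = consumer.execute(
            "SELECT correlation_id, status FROM results WHERE correlation_id = ?",
            (correlation_id,),
        ).fetchone()
    if match is None:
        return "FAIL", f"no consumer row for correlation_id={correlation_id}"
    if match[1] is None:
        return "FAIL", f"consumer row for correlation_id={correlation_id} has NULL status"
    return "PASS", f"consumer row found for correlation_id={correlation_id}"
```

Opening the files with `mode=ro` matches the "no side effects" principle: an accidental write from the probe raises an error instead of touching live data.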
+### Seam Registry (Four Target Seams)
+| Seam ID | Producer | Interface | Consumer | Probe Strategy | Key Risk |
+|---------|----------|-----------|----------|---------------|----------|
+| `aria-engine-to-hub` | aria engine timers | `~/ha-logs/intelligence/` (JSON files) | aria hub | freshness_and_schema | Engine timer fails silently; hub reads stale data |
+| `ha-log-sync-to-engine` | ha-log-sync timer | `~/ha-logs/logbook/` (JSON files) | aria engine | freshness_and_schema | Sync fails; engine trains on missing or partial logbook |
+| `telegram-brief-to-notion` | notion-tools sync timer | `~/Documents/notion/` (directory) | telegram-brief | freshness_and_schema | Notion sync fails; brief references stale local replica |
+| `ollama-queue-to-timers` | ollama-queue daemon | `~/.local/share/ollama-queue/queue.db` | 10 Ollama-using timers | db_row_trace | Queue daemon down; timers silently fail on submit |
+Plus the cross-cutting env audit probe targeting: `HA_URL`, `HA_TOKEN`, `TELEGRAM_BOT_TOKEN`, `CHAT_ID`, `NOTION_API_KEY`.
+### Contract Files to Create
+```
+docs/integration-tester/
+├── README.md                          — seam inventory and probe strategy guide
+├── contracts/
+│   ├── aria-engine-to-hub.json
+│   ├── ha-log-sync-to-engine.json
+│   ├── telegram-brief-to-notion.json
+│   ├── ollama-queue-to-timers.json
+│   └── env-audit.json
+└── schemas/
+    ├── aria-engine-output.schema.json
+    ├── ha-logbook-entry.schema.json
+    ├── notion-replica-index.schema.json
+    └── ollama-queue-job.schema.json
+```
+### Slash Command
+**File:** `commands/integration-tester.md`
+```markdown
+# Integration Tester
+Runs integration boundary probes across all registered seams.
+Usage:
+- `/integration-tester` — run all probes
+- `/integration-tester seam <seam-id>` — run one probe
+- `/integration-tester env` — run env audit only
+Reads contracts from: docs/integration-tester/contracts/
+Writes report to: /tmp/integration-tester-results/report-<timestamp>.md
+```
+---
+## Implementation Priority
+| Priority | Task | Rationale |
+|----------|------|-----------|
+| 1 | Write seam contracts + schemas for all 4 seams | Contracts are the agent's source of truth; nothing else works without them |
+| 2 | Implement `freshness_and_schema` probe | Covers 3 of 4 seams; highest immediate value |
+| 3 | Implement env audit probe | Catches the env-rotation-breaks-multiple-services failure class |
+| 4 | Write the agent prompt file | Orchestrates the probes |
+| 5 | Create slash command | Invocation convenience |
+| 6 | Implement `db_row_trace` for ollama-queue | Requires sqlite3 query against live DB; more complex |
+| 7 | Wire into quality-gate.sh (optional) | Run integration probe on deploy; not blocking |
+Confidence: High for priorities 1-4 (clear requirements, known patterns). Medium for priority 6 (the sqlite3 schema must be verified against the actual queue.db structure first). Low for priority 7 (integrating into the quality gate increases gate latency; it may work better as a separate daily check).
+---
+## Risks and Open Questions
+**Risk 1: Live data timing.** The freshness_and_schema probe depends on the upstream service having run recently. If the integration tester is run during a dead period (service timer hasn't fired in 24+ hours due to machine sleep), the probe will FAIL for timing reasons unrelated to the seam health. Mitigation: use freshness_ttl_minutes conservatively (e.g., 1440 minutes = 24h for daily timers) and distinguish "stale" from "invalid schema."
+**Risk 2: Sentinel injection side effects.** Writing sentinel files to upstream output directories could confuse the downstream service if the sentinel is malformed. Mitigation: the sentinel strategy should only be used for seams where a test-flag file path can be agreed upon (e.g., `~/ha-logs/intelligence/INTEGRATION_TEST_PROBE.json` — file the hub ignores by naming convention). For production seams, use freshness_and_schema (read-only) instead.
+**Risk 3: Schema drift.** If the upstream service changes its output format without updating the contract schema file, the probe fails on every run — not because the seam is broken but because the contract is stale. Mitigation: the agent should detect schema validation failures and suggest running `update-contract --seam <id>` to regenerate the schema from the current live artifact. Add schema update to the service's deploy checklist.
+**Open question:** Should the integration-tester run continuously (systemd timer, every 30min) or on-demand? Given the file-based, timer-driven architecture, the seams produce data on 15-minute to daily intervals. A 30-minute continuous probe would generate mostly SKIP results for intra-day intervals. Recommendation: run on-demand (slash command) and once daily (systemd timer at 07:00 after the overnight batch timers complete).
+---
+## Sources
+- [Create custom subagents - Claude Code Docs](https://code.claude.com/docs/en/sub-agents)
+- [From 2 weeks to 2 hours — cutting integration test time using Claude Code Subagents (Airwallex)](https://careers.airwallex.com/blog/using-claude-code-subagents/)
+- [How to Write Integration Tests That Verify Trace Data with OpenTelemetry (OneUptime)](https://oneuptime.com/blog/post/2026-02-06-integration-tests-verify-trace-data-opentelemetry/view)
+- [put-data-pipeline-under-test-with-pytest-and-great-expectations (Great Expectations Labs)](https://github.com/greatexpectationslabs/put-data-pipeline-under-test-with-pytest-and-great-expectations)
+- [great_expectations — Always know what to expect from your data](https://github.com/great-expectations/great_expectations)
+- [Pact — Introduction](https://docs.pact.io/)
+- [Contract testing with Pact — Best Practices in 2025](https://www.sachith.co.uk/contract-testing-with-pact-best-practices-in-2025-practical-guide-feb-10-2026/)
+- [Contract Testing vs. Schema Testing (Pactflow)](https://pactflow.io/blog/contract-testing-using-json-schemas-and-open-api-part-1/)
+- [Seams | Testing Effectively With Legacy Code (InformIT / Michael Feathers)](https://www.informit.com/articles/article.aspx?p=359417&seqNum=2)
+- [awesome-claude-code-subagents (VoltAgent)](https://github.com/VoltAgent/awesome-claude-code-subagents)
+- [Intelligent automation and multi-agent orchestration for Claude Code (wshobson/agents)](https://github.com/wshobson/agents)
+- [Claude Code QA agents (darcyegb/ClaudeCodeAgents)](https://github.com/darcyegb/ClaudeCodeAgents)
+- [OpenTelemetry Context Propagation](https://opentelemetry.io/docs/concepts/context-propagation/)
+- [Distributed Tracing Tools for Microservices 2026 (SigNoz)](https://signoz.io/blog/distributed-tracing-tools/)
+- [How to monitor systemd service liveness (Netdata)](https://www.netdata.cloud/blog/systemd-service-liveness/)
+- [Monitoring SystemD services with Healthchecks.io](https://passbe.com/2022/healthchecks-io-systemd-checks/)
+- [microservices-testing examples (andreschaffer/microservices-testing-examples)](https://github.com/andreschaffer/microservices-testing-examples)