npm - @guilz-dev/sdlc-gh - Versions diffs - 0.1.0 - Mend

@guilz-dev/sdlc-gh 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (176)

package/.github/CODEOWNERS +5 -0
package/.github/ISSUE_TEMPLATE/bug_report.yml +68 -0
package/.github/ISSUE_TEMPLATE/config.yml +1 -0
package/.github/ISSUE_TEMPLATE/feature_request.yml +39 -0
package/.github/ISSUE_TEMPLATE/support.yml +56 -0
package/.github/ISSUE_TEMPLATE/task.yml +89 -0
package/.github/agents/implementer.agent.md +17 -0
package/.github/agents/reviewer.agent.md +18 -0
package/.github/agents/triager.agent.md +13 -0
package/.github/aw/actions-lock.json +9 -0
package/.github/copilot-instructions.md +35 -0
package/.github/hooks/hooks.json +12 -0
package/.github/instructions/core.instructions.md +11 -0
package/.github/instructions/profiles/go.instructions.md +10 -0
package/.github/instructions/profiles/php.instructions.md +11 -0
package/.github/instructions/profiles/python.instructions.md +11 -0
package/.github/instructions/profiles/ruby.instructions.md +11 -0
package/.github/instructions/profiles/typescript.instructions.md +11 -0
package/.github/labels.yml +55 -0
package/.github/pull_request_template.md +33 -0
package/.github/ruleset.example.json +33 -0
package/.github/ruleset.harness-eval.example.json +29 -0
package/.github/skills/quality-loop/SKILL.md +23 -0
package/.github/workflows/agent-retry-orchestrator.yml +161 -0
package/.github/workflows/copilot-setup-steps.yml +64 -0
package/.github/workflows/eval-ci.yml +169 -0
package/.github/workflows/eval-drift.yml +75 -0
package/.github/workflows/gh-aw-dogfood-ci.yml +73 -0
package/.github/workflows/harness-ci.yml +244 -0
package/.github/workflows/harness-sync.yml +28 -0
package/.github/workflows/l1-readiness-check.yml +45 -0
package/.github/workflows/labels-sync.yml +24 -0
package/.github/workflows/nightly-harness-review.lock.yml +1643 -0
package/.github/workflows/nightly-harness-review.md +87 -0
package/.github/workflows/nightly-harness-review.yml +63 -0
package/.github/workflows/npm-publish.yml +49 -0
package/.github/workflows/pr-context-comment.yml +138 -0
package/.github/workflows/product-ci-go.yml +33 -0
package/.github/workflows/product-ci-php.yml +39 -0
package/.github/workflows/product-ci-python.yml +34 -0
package/.github/workflows/product-ci-ruby.yml +35 -0
package/.github/workflows/product-ci-ts.yml +37 -0
package/.github/workflows/task-issue-label-sync.yml +50 -0
package/.github/workflows/weekly-redteam.lock.yml +1571 -0
package/.github/workflows/weekly-redteam.md +76 -0
package/.github/zizmor.yml +11 -0
package/AGENTS.md +54 -0
package/LICENSE +21 -0
package/README.md +366 -0
package/config/stacks.json +55 -0
package/docs/adoption.md +126 -0
package/docs/arch.md +535 -0
package/docs/auth-boundaries.md +16 -0
package/docs/coding-agent-l1.md +152 -0
package/docs/exceptions/README.md +25 -0
package/docs/exceptions/TEMPLATE.md +8 -0
package/docs/failure-taxonomy.md +23 -0
package/docs/gh-aw-dogfood.md +109 -0
package/docs/kpi-baseline.md +9 -0
package/docs/nightly-harness-review.md +94 -0
package/docs/operations.md +108 -0
package/docs/publishing.md +79 -0
package/docs/revert-playbook.md +44 -0
package/docs/shared-config.md +30 -0
package/docs/telemetry-artifacts.md +78 -0
package/docs/telemetry-schema.md +60 -0
package/evals/.score-baseline.json +6 -0
package/evals/e2e-bench/README.md +28 -0
package/evals/e2e-bench/manifest.json +16 -0
package/evals/e2e-bench/tasks/e2e-001.yml +10 -0
package/evals/e2e-bench/tasks/e2e-002.yml +11 -0
package/evals/e2e-bench/tasks/e2e-003.yml +10 -0
package/evals/e2e-bench/tasks/e2e-004.yml +14 -0
package/evals/e2e-bench/tasks/e2e-005.yml +11 -0
package/evals/e2e-bench/tasks/e2e-006.yml +10 -0
package/evals/e2e-bench/tasks/e2e-007.yml +10 -0
package/evals/e2e-bench/tasks/e2e-008.yml +10 -0
package/evals/e2e-bench/tasks/e2e-009.yml +10 -0
package/evals/trajectories/rubric.md +12 -0
package/evals/trajectories/test_harness_conventions.py +271 -0
package/infra/README.md +49 -0
package/infra/langfuse/docker-compose.yml +25 -0
package/infra/otel/collector-config.yml +24 -0
package/infra/samples/gh-aw-dogfood-report.json +44 -0
package/infra/samples/harness-review-routing-plan.json +19 -0
package/infra/samples/harness-review-summary.json +61 -0
package/infra/samples/telemetry-artifact.json +29 -0
package/infra/samples/telemetry-payload.json +19 -0
package/package.json +85 -0
package/prompts/triager-classify.prompt.yml +10 -0
package/sample/go/add.go +5 -0
package/sample/go/add_test.go +9 -0
package/sample/go/go.mod +3 -0
package/sample/php/composer.json +26 -0
package/sample/php/composer.lock +1881 -0
package/sample/php/phpunit.xml +8 -0
package/sample/php/src/Add.php +13 -0
package/sample/php/tests/AddTest.php +16 -0
package/sample/python/requirements-dev.txt +2 -0
package/sample/python/src/__init__.py +0 -0
package/sample/python/src/greet.py +3 -0
package/sample/python/tests/conftest.py +4 -0
package/sample/python/tests/test_greet.py +5 -0
package/sample/ruby/.rubocop.yml +10 -0
package/sample/ruby/Gemfile +6 -0
package/sample/ruby/Gemfile.lock +58 -0
package/sample/ruby/lib/add.rb +9 -0
package/sample/ruby/spec/add_spec.rb +11 -0
package/sample/ts/biome.json +6 -0
package/sample/ts/package-lock.json +1763 -0
package/sample/ts/package.json +15 -0
package/sample/ts/src/add.ts +3 -0
package/sample/ts/tests/add.test.ts +8 -0
package/sample/ts/tsconfig.json +12 -0
package/scripts/aggregate-harness-review.mjs +48 -0
package/scripts/bootstrap-harness.sh +411 -0
package/scripts/check-diff-size.mjs +46 -0
package/scripts/check-e2e-manifest.mjs +35 -0
package/scripts/check-eval-score-drift.mjs +31 -0
package/scripts/check-gh-aw-dogfood-scope.mjs +51 -0
package/scripts/check-issue-spec.mjs +215 -0
package/scripts/check-l1-readiness.mjs +82 -0
package/scripts/check-open-pr-limit.mjs +34 -0
package/scripts/doctor.mjs +177 -0
package/scripts/emit-gh-aw-dogfood-report.mjs +112 -0
package/scripts/emit-telemetry-artifact.mjs +99 -0
package/scripts/fetch-telemetry-artifacts.mjs +176 -0
package/scripts/harness-drift-report.mjs +99 -0
package/scripts/lib/bootstrap-copy.mjs +123 -0
package/scripts/lib/ccsd-contract.mjs +212 -0
package/scripts/lib/diff-size.mjs +103 -0
package/scripts/lib/doctor-local.mjs +179 -0
package/scripts/lib/e2e-manifest.mjs +76 -0
package/scripts/lib/gh-aw-dogfood.mjs +293 -0
package/scripts/lib/github-config.mjs +94 -0
package/scripts/lib/harness-ci-fragments.mjs +98 -0
package/scripts/lib/harness-review-routing.mjs +244 -0
package/scripts/lib/harness-review.mjs +388 -0
package/scripts/lib/issue-form-label-sync.mjs +56 -0
package/scripts/lib/l1-readiness.mjs +258 -0
package/scripts/lib/merge-harness-package.mjs +36 -0
package/scripts/lib/npm-package.mjs +129 -0
package/scripts/lib/setup-wizard.mjs +224 -0
package/scripts/lib/stacks.mjs +138 -0
package/scripts/lib/telemetry-artifact.mjs +253 -0
package/scripts/lib/template-root.mjs +39 -0
package/scripts/merge-harness-package.mjs +14 -0
package/scripts/route-harness-review.mjs +168 -0
package/scripts/run-e2e-bench.mjs +216 -0
package/scripts/sdlc-gh-cli.mjs +91 -0
package/scripts/select-eval-jobs.mjs +41 -0
package/scripts/setup-github.mjs +242 -0
package/scripts/setup-github.sh +4 -0
package/scripts/setup-wizard.mjs +426 -0
package/scripts/test-bootstrap-guidance-scenarios.mjs +94 -0
package/scripts/test-diff-size-scenarios.mjs +88 -0
package/scripts/test-doctor-scenarios.mjs +70 -0
package/scripts/test-e2e-manifest-scenarios.mjs +65 -0
package/scripts/test-gh-aw-dogfood-scenarios.mjs +74 -0
package/scripts/test-harness-review-routing-scenarios.mjs +130 -0
package/scripts/test-harness-review-scenarios.mjs +92 -0
package/scripts/test-hooks-scenarios.mjs +44 -0
package/scripts/test-issue-form-label-sync-scenarios.mjs +48 -0
package/scripts/test-issue-spec-scenarios.mjs +258 -0
package/scripts/test-l1-readiness-scenarios.mjs +204 -0
package/scripts/test-merge-harness-package-scenarios.mjs +53 -0
package/scripts/test-npm-package-scenarios.mjs +31 -0
package/scripts/test-sdlc-gh-cli-scenarios.mjs +54 -0
package/scripts/test-setup-github-scenarios.mjs +103 -0
package/scripts/test-setup-wizard-scenarios.mjs +114 -0
package/scripts/test-telemetry-artifact-scenarios.mjs +69 -0
package/scripts/trim-harness-ci.mjs +18 -0
package/scripts/validate-gh-aw-compile.mjs +64 -0
package/scripts/validate-harness.mjs +199 -0
package/scripts/validate-telemetry.mjs +21 -0
package/scripts/verify-bootstrap-stacks.sh +192 -0

package/docs/shared-config.md ADDED

@@ -0,0 +1,30 @@
+# Optional shared config repository
+For organizations with multiple product repos, use a **shared config repo** to distribute common agents and skills.
+## Layout
+```text
+org-harness-shared/
+├── .github/
+│   ├── agents/
+│   └── skills/
+└── README.md
+```
+## Distribution options
+1. **Manual copy** — periodic sync of `agents/` and `skills/` into product repos
+2. **Subtree** — `git subtree pull` from shared repo into `.github/`
+3. **harness-sync.yml** — extend with `TARGET_REPO` matrix for each product repo
+## Conflict resolution
+Product repo local overrides win for repo-specific instructions. Shared repo provides defaults only.
+## When to use
+- 3+ product repositories on the same harness template version
+- Identical triager/implementer/reviewer definitions across teams
+See [adoption.md](adoption.md) for bootstrap and sync procedures.
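
The conflict rule in `shared-config.md` (shared repo supplies defaults, product repo overrides win) can be sketched as a plain map merge. This is only an illustration under stated assumptions: `resolve_config` and the file paths used below are hypothetical and do not come from the package.

```python
def resolve_config(shared: dict[str, str], local: dict[str, str]) -> dict[str, str]:
    """Merge path -> content maps so product-repo entries override shared defaults."""
    merged = dict(shared)  # start from the shared defaults
    merged.update(local)   # local overrides win on path collisions
    return merged


# Hypothetical inputs: paths and contents are made up for illustration.
shared_files = {
    ".github/agents/reviewer.agent.md": "shared reviewer",
    ".github/skills/quality-loop/SKILL.md": "shared skill",
}
local_files = {".github/agents/reviewer.agent.md": "repo-specific reviewer"}

resolved = resolve_config(shared_files, local_files)
print(resolved[".github/agents/reviewer.agent.md"])
```

A path present only in the shared map is carried over unchanged, which matches the "defaults only" role the document gives the shared repo.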

package/docs/telemetry-artifacts.md ADDED

@@ -0,0 +1,78 @@
+# Telemetry artifacts
+Machine-readable JSON records emitted by inner-loop workflows for nightly outer-loop aggregation. Span-level OTel export remains optional; these artifacts are the **canonical offline source** when Langfuse wiring is absent.
+Parent schema fields: [telemetry-schema.md](telemetry-schema.md).
+## Envelope shape
+Each file is a single JSON object:
+| Field | Required | Description |
+|-------|----------|-------------|
+| `schema_version` | yes | Currently `"1"` |
+| `emitted_at` | yes | ISO-8601 timestamp |
+| `source` | yes | Emitting workflow id (see table below) |
+| `workflow` | best-effort | GitHub Actions workflow name |
+| `workflow_run_id` | best-effort | `github.run_id` for correlation |
+| `run_attempt` | best-effort | `github.run_attempt` |
+| `event_name` | best-effort | `github.event_name` |
+| `placeholders` | yes | Payload fields still using sentinel defaults |
+| `payload` | yes | Telemetry fields per [telemetry-schema.md](telemetry-schema.md) |
+Sample: [infra/samples/telemetry-artifact.json](../infra/samples/telemetry-artifact.json)
+## Emitting workflows
+| `source` | Workflow | When |
+|----------|----------|------|
+| `harness-ci` | `.github/workflows/harness-ci.yml` | Every PR after harness jobs complete |
+| `eval-ci` | `.github/workflows/eval-ci.yml` | Pull request eval runs only (scheduled runs skip telemetry) |
+| `agent-retry-orchestrator` | `.github/workflows/agent-retry-orchestrator.yml` | Failed check suite on a linked PR |
+| `pr-context` | `.github/workflows/pr-context-comment.yml` | PR opened / synchronized |
+Implementation: `node scripts/emit-telemetry-artifact.mjs` (see `scripts/lib/telemetry-artifact.mjs`).
+## Storage and naming
+**Runner path:** `telemetry-artifacts/` (repo root during the job).
+**Filename:** `{source}-pr{number}-run{workflow_run_id}.json` (or `no-pr` when not PR-scoped).
+**GitHub Actions artifact:** each workflow uploads the directory as `harness-telemetry-{run_id}`, `eval-telemetry-{run_id}`, `retry-telemetry-{run_id}`, or `pr-context-telemetry-{run_id}`.
+Artifacts are retained per repository retention settings (default 90 days). Nightly aggregation should list workflow runs for the emitters above and download matching artifacts — no PR comment parsing required.
+## Required vs best-effort payload fields
+| Field | Inner-loop CI | Notes |
+|-------|---------------|-------|
+| `repo`, `pr_number`, `task_id` | required | `task_id` from linked Issue (`fixes #N`) or `pr-{number}` |
+| `task_class`, `autonomy_level`, `retry_count` | required | From PR labels (`task:*`, `autonomy:*`, `retry:N`) |
+| `changed_files`, `diff_loc` | required on PR workflows | From `git diff` when `BASE_SHA` is set |
+| `wall_failure_type` | required | Empty when green; mapped from failed job names on harness-ci |
+| `final_outcome` | required | `in_progress` until merge/close events wire later |
+| `agent_type`, `execution_mode`, `model` | best-effort | Sentinel `n/a` / `ci` until agent runtime export |
+| `tool_calls`, `cost`, `elapsed_time` | best-effort | Sentinel `-1` until Langfuse / OTel |
+| `review_outcome` | best-effort | `pending` until review webhooks exist |
+Fields listed in `placeholders` use documented sentinels and are safe for aggregation dashboards to filter.
+## Validation
+```bash
+node scripts/validate-telemetry.mjs "$(cat infra/samples/telemetry-artifact.json)"
+node scripts/emit-telemetry-artifact.mjs   # in CI with TELEMETRY_SOURCE set
+node scripts/test-telemetry-artifact-scenarios.mjs
+```
+Set `HARNESS_STRICT_TELEMETRY=1` to fail when `placeholders` is non-empty (intended for post-wiring CI).
+## Nightly consumption (outline)
+1. Query Actions API for workflow runs of `Harness CI`, `Eval CI`, `Agent retry orchestrator`, and `PR context comment` in the last 24h.
+2. Download `*-telemetry-*` artifacts from each run.
+3. Parse JSON envelopes; dedupe by `workflow_run_id` + `source` + `payload.pr_number`.
+4. Join rows on `repo`, `task_id`, `pr_number` for KPI rollups ([kpi-baseline.md](kpi-baseline.md)).
+Classification and harness revision routing are out of scope for the emitters; see [failure-taxonomy.md](failure-taxonomy.md).
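
The naming rule and the dedupe step from `telemetry-artifacts.md` can be sketched in a few lines of Python. This is a reader's sketch, not the package's `fetch-telemetry-artifacts.mjs`: the function names are hypothetical, and it assumes that `no-pr` replaces the whole `pr{number}` segment of the filename, which the document does not spell out.

```python
from typing import Any, Optional


def artifact_filename(source: str, run_id: int, pr_number: Optional[int]) -> str:
    """Build a name in the documented {source}-pr{number}-run{workflow_run_id}.json shape."""
    # Assumption: "no-pr" stands in for the "pr{number}" segment when there is no PR.
    pr_part = f"pr{pr_number}" if pr_number is not None else "no-pr"
    return f"{source}-{pr_part}-run{run_id}.json"


def dedupe_envelopes(envelopes: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the first envelope per (workflow_run_id, source, payload.pr_number)."""
    seen: set[tuple[Any, Any, Any]] = set()
    unique: list[dict[str, Any]] = []
    for envelope in envelopes:
        key = (
            envelope.get("workflow_run_id"),
            envelope.get("source"),
            envelope.get("payload", {}).get("pr_number"),
        )
        if key in seen:
            continue  # same run, source, and PR already recorded
        seen.add(key)
        unique.append(envelope)
    return unique
```

Keeping the first occurrence is an arbitrary choice made for the sketch; an aggregator could equally keep the envelope with the latest `emitted_at`.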

package/docs/telemetry-schema.md ADDED

@@ -0,0 +1,60 @@
+# Telemetry Schema
+Minimum structured fields for agent harness observability (arch.md §5.4). Export via OpenTelemetry to Langfuse or any OTel-compatible backend.
+## Required fields
+| Field | Type | Description |
+|-------|------|-------------|
+| `task_id` | string | Unique task identifier (Issue number or UUID) |
+| `pr_number` | integer | Pull request number, if applicable |
+| `repo` | string | `owner/name` |
+| `agent_type` | string | e.g. `implementer`, `triager`, `reviewer` |
+| `execution_mode` | string | `cli`, `ide`, `coding_agent`, `gh_aw`, `sdk` |
+| `model` | string | Model identifier used |
+| `task_class` | string | `docs`, `test-fix`, `refactor`, etc. |
+| `autonomy_level` | string | `L0`–`L3` |
+| `tool_calls` | integer | Count of tool invocations |
+| `retry_count` | integer | Inner-loop retry attempts |
+| `wall_failure_type` | string | `test`, `lint`, `type`, `security`, `safe-output`, `diff-size`, or empty |
+| `cost` | number | AI credits or token cost |
+| `elapsed_time` | number | Seconds |
+| `changed_files` | integer | Files in diff |
+| `diff_loc` | integer | Lines changed (add + delete) |
+| `final_outcome` | string | `merged`, `closed`, `escalated`, `in_progress` |
+| `review_outcome` | string | `approved`, `changes_requested`, `pending` |
+## KPI mapping
+| KPI | Fields |
+|-----|--------|
+| PR rejection rate | `review_outcome`, `final_outcome` |
+| First-pass wall rate | `wall_failure_type`, `retry_count` |
+| Cost per task | `cost`, `task_id` |
+| Autonomy distribution | `autonomy_level`, `task_class` |
+| Adoption rate | `review_outcome`, `task_class` |
+## Inner-loop artifacts
+Workflows emit JSON artifacts (envelope + `payload`) for nightly aggregation without gh-aw. Storage, naming, emitters, and required vs best-effort fields: [telemetry-artifacts.md](telemetry-artifacts.md).
+## Validation
+```bash
+node scripts/validate-telemetry.mjs "$(cat infra/samples/telemetry-payload.json)"
+node scripts/validate-telemetry.mjs "$(cat infra/samples/telemetry-artifact.json)"
+```
+Collector or `scripts/validate-telemetry.mjs` should reject spans missing required fields when `HARNESS_STRICT_TELEMETRY=1`. With strict mode, non-empty `placeholders` on artifacts also fails validation.
+## PR context comment placeholders
+When observability is not fully wired:
+| Field | Behavior |
+|-------|----------|
+| Trace link | If `LANGFUSE_HOST` is unset, PR comment shows `_configure LANGFUSE_HOST; then search by repo=…, pr_number=…_` |
+| AI credits | Informational only — `_set max-ai-credits in org settings_` until org policy is configured |
+| Threat detection | `n/a` — gh-aw threat detection not active in this template |
+Workflow display logic in `pr-context-comment.yml` is unchanged; this section documents the spec only.
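
A rough sketch of the validation behaviour described in `telemetry-schema.md`, written in Python for illustration. It is not a port of `scripts/validate-telemetry.mjs`: the function names are hypothetical, it handles only the artifact envelope form, and it assumes missing fields are always reported while the `placeholders` check applies only in strict mode.

```python
import os
from typing import Any

# Field names copied from the "Required fields" table above.
REQUIRED_FIELDS = [
    "task_id", "pr_number", "repo", "agent_type", "execution_mode", "model",
    "task_class", "autonomy_level", "tool_calls", "retry_count",
    "wall_failure_type", "cost", "elapsed_time", "changed_files", "diff_loc",
    "final_outcome", "review_outcome",
]


def strict_mode_enabled() -> bool:
    """Read the documented HARNESS_STRICT_TELEMETRY switch from the environment."""
    return os.environ.get("HARNESS_STRICT_TELEMETRY") == "1"


def validate_envelope(envelope: dict[str, Any], strict: bool = False) -> list[str]:
    """Return a list of problems; an empty list means the envelope passed."""
    payload = envelope.get("payload", {})
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in payload]
    placeholders = envelope.get("placeholders") or []
    if strict and placeholders:
        # Strict mode also rejects payload fields that still hold sentinel defaults.
        problems.append("placeholders not empty: " + ", ".join(placeholders))
    return problems
```

A caller would typically pass `strict=strict_mode_enabled()` and exit non-zero when the returned list is non-empty.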

package/evals/.score-baseline.json ADDED

@@ -0,0 +1,6 @@
+{
+  "eval_pass_rate": 85,
+  "production_acceptance_rate": 70,
+  "updated": "2026-07-04",
+  "_comment": "Update weekly from kpi-baseline.md; eval-drift workflow reads this file"
+}

package/evals/e2e-bench/README.md ADDED

@@ -0,0 +1,28 @@
+# E2E task bench
+Executable acceptance checks for representative tasks. Each task definition carries
+machine-checkable verifiers (`verification_commands`, `verification_contains`,
+`verification_not_contains`) so the bench measures more than manifest/file presence.
+This is still lighter than a full break-and-fix agent runner: it validates that task
+fixtures are reproducible and acceptance checks are real. See `manifest.json`.
+Run weekly via `eval-ci.yml` schedule. Current manifest: **9 tasks** (target 20–100 in a future break-and-fix runner).
+## Runner boundary (current vs planned)
+| Concern | Current (`run-e2e-bench.mjs`) | Planned break-and-fix runner |
+|---------|-------------------------------|------------------------------|
+| **Task input** | Static YAML fixture in `tasks/*.yml` | Issue + CC-SD contract + repo snapshot |
+| **Expected artifact** | File content / command exit code | Agent-produced PR diff |
+| **Verifier contract** | `verification_*` fields in task YAML | Same fields + agent execution harness |
+| **Result summary** | Per-task ok/fail; class/stack counts; executed/skipped/failed totals | Above + pass@1, retry count, wall failure class |
+Validation before run: `scripts/check-e2e-manifest.mjs` (duplicate id, orphan files, unsupported class, `min_tasks`, `last_rotated`).
+Local:
+```bash
+npm run check-e2e   # manifest only
+npm run run-e2e     # manifest + executable checks
+```
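
The pre-run manifest checks named in the README (duplicate id, orphan files, unsupported class, `min_tasks`, `last_rotated`) might look roughly like the Python sketch below. This is not the logic of `scripts/check-e2e-manifest.mjs`, which is not shown in this diff. Two things are assumptions: the supported class set is taken from the classes that appear in `manifest.json`, and the `last_rotated` check only verifies that the value parses as an ISO-8601 timestamp.

```python
from datetime import datetime
from typing import Any

# Assumption: only the classes observed in manifest.json are treated as supported.
SUPPORTED_CLASSES = {"docs", "test-fix", "refactor"}


def check_manifest(manifest: dict[str, Any], task_file_ids: set[str]) -> list[str]:
    """Return human-readable errors; task_file_ids are the ids found under tasks/*.yml."""
    errors: list[str] = []
    tasks = manifest.get("tasks", [])
    ids = [task["id"] for task in tasks]

    duplicates = sorted({task_id for task_id in ids if ids.count(task_id) > 1})
    if duplicates:
        errors.append(f"duplicate ids: {duplicates}")

    orphans = sorted(task_file_ids - set(ids))
    if orphans:
        errors.append(f"task files not listed in manifest: {orphans}")

    unsupported = sorted({task["class"] for task in tasks} - SUPPORTED_CLASSES)
    if unsupported:
        errors.append(f"unsupported classes: {unsupported}")

    if len(ids) < manifest.get("min_tasks", 0):
        errors.append(f"only {len(ids)} tasks, min_tasks is {manifest['min_tasks']}")

    try:
        # Older Python versions reject a trailing "Z", so normalise it first.
        datetime.fromisoformat(str(manifest.get("last_rotated", "")).replace("Z", "+00:00"))
    except ValueError:
        errors.append("last_rotated is missing or not an ISO-8601 timestamp")
    return errors
```

Returning a list rather than raising on the first failure lets a CI job print every problem in one run.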

package/evals/e2e-bench/manifest.json ADDED

@@ -0,0 +1,16 @@
+{
+  "version": 1,
+  "min_tasks": 7,
+  "last_rotated": "2026-07-04T00:00:00Z",
+  "tasks": [
+    { "id": "e2e-001", "class": "docs", "description": "Validate README heading acceptance", "stack": "any" },
+    { "id": "e2e-002", "class": "test-fix", "description": "Validate sample/ts unit-test acceptance", "stack": "ts" },
+    { "id": "e2e-003", "class": "test-fix", "description": "Validate sample/python unit-test acceptance", "stack": "python" },
+    { "id": "e2e-004", "class": "refactor", "description": "Validate Go API rename without behavior change", "stack": "go" },
+    { "id": "e2e-005", "class": "docs", "description": "Validate docstring presence on public API", "stack": "python" },
+    { "id": "e2e-006", "class": "test-fix", "description": "Validate sample/ruby unit-test acceptance", "stack": "ruby" },
+    { "id": "e2e-007", "class": "test-fix", "description": "Validate sample/php unit-test acceptance", "stack": "php" },
+    { "id": "e2e-008", "class": "test-fix", "description": "Validate CC-SD contract module defines v1 enforced task classes", "stack": "any" },
+    { "id": "e2e-009", "class": "test-fix", "description": "Validate diff-size autonomy limits match operations policy", "stack": "any" }
+  ]
+}

package/evals/e2e-bench/tasks/e2e-001.yml ADDED

@@ -0,0 +1,10 @@
+id: e2e-001
+class: docs
+stack: any
+description: Validate README heading acceptance
+acceptance:
+  - Heading spelling corrected
+  - No code files changed
+verification_contains:
+  - README.md::# SDLC-GH
+  - README.md::deterministic guardrails for AI coding agents

package/evals/e2e-bench/tasks/e2e-002.yml ADDED

@@ -0,0 +1,11 @@
+id: e2e-002
+class: test-fix
+stack: ts
+description: Validate sample/ts unit-test acceptance
+acceptance:
+  - npm test passes in sample/ts
+verification_commands:
+  - node>=22@sample/ts::npm test
+verification_contains:
+  - sample/ts/src/add.ts::return a + b;
+  - sample/ts/tests/add.test.ts::expect(add(2, 3)).toBe(5);

package/evals/e2e-bench/tasks/e2e-003.yml ADDED

@@ -0,0 +1,10 @@
+id: e2e-003
+class: test-fix
+stack: python
+description: Validate sample/python unit-test acceptance
+acceptance:
+  - pytest passes in sample/python
+verification_commands:
+  - sample/python::python3 -c "from src.greet import greet; assert greet('world') == 'Hello, world!'"
+verification_contains:
+  - sample/python/tests/test_greet.py::assert greet("world") == "Hello, world!"

package/evals/e2e-bench/tasks/e2e-004.yml ADDED

@@ -0,0 +1,14 @@
+id: e2e-004
+class: refactor
+stack: go
+description: Validate Go API rename without behavior change
+acceptance:
+  - go test ./... passes
+  - Public API uses Sum
+verification_commands:
+  - sample/go::env GOCACHE=/private/tmp/sdlc-gh-go-cache go test ./...
+verification_contains:
+  - sample/go/add.go::func Sum(a, b int) int {
+  - sample/go/add_test.go::if got := Sum(2, 3); got != 5 {
+verification_not_contains:
+  - sample/go/add.go::func Add(
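
The `verification_contains` and `verification_not_contains` entries in these task files appear to follow a `path::needle` shape. Under that assumption (inferred from the entries, not confirmed by `run-e2e-bench.mjs`), a substring check could be sketched as follows. The helper names are hypothetical.

```python
from pathlib import Path


def split_check(entry: str) -> tuple[str, str]:
    """Split 'path::needle' on the first '::' only, so the needle may contain '::'."""
    path, separator, needle = entry.partition("::")
    if not separator:
        raise ValueError(f"expected 'path::needle', got {entry!r}")
    return path, needle


def run_text_checks(root: Path, contains: list[str], not_contains: list[str]) -> list[str]:
    """Return one failure message per violated entry; an empty list means all passed."""
    failures: list[str] = []
    for entry in contains:
        path, needle = split_check(entry)
        if needle not in (root / path).read_text():
            failures.append(f"{path}: expected to contain {needle!r}")
    for entry in not_contains:
        path, needle = split_check(entry)
        if needle in (root / path).read_text():
            failures.append(f"{path}: expected not to contain {needle!r}")
    return failures
```

For `e2e-004`, this would pass only when `sample/go/add.go` defines `Sum` and no longer defines `Add`.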

package/evals/e2e-bench/tasks/e2e-005.yml ADDED

@@ -0,0 +1,11 @@
+id: e2e-005
+class: docs
+stack: python
+description: Validate docstring presence on greet() public function
+acceptance:
+  - Docstring present
+  - Tests still pass
+verification_commands:
+  - sample/python::python3 -c "from src.greet import greet; assert greet('world') == 'Hello, world!'"
+verification_contains:
+  - sample/python/src/greet.py::"""Return a friendly greeting for the provided name."""

package/evals/e2e-bench/tasks/e2e-006.yml ADDED

@@ -0,0 +1,10 @@
+id: e2e-006
+class: test-fix
+stack: ruby
+description: Validate sample/ruby unit-test acceptance
+acceptance:
+  - rspec passes in sample/ruby
+verification_commands:
+  - cmd:bundle@sample/ruby::bundle exec rspec
+verification_contains:
+  - sample/ruby/spec/add_spec.rb::expect(Add.add(2, 3)).to eq(5)

package/evals/e2e-bench/tasks/e2e-007.yml ADDED

@@ -0,0 +1,10 @@
+id: e2e-007
+class: test-fix
+stack: php
+description: Validate sample/php unit-test acceptance
+acceptance:
+  - phpunit passes in sample/php
+verification_commands:
+  - cmd:composer@sample/php::composer test
+verification_contains:
+  - sample/php/tests/AddTest.php::$this->assertSame(5, Add::add(2, 3));
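
The `verification_commands` entries seen so far (`node>=22@sample/ts::npm test`, `sample/python::python3 -c ...`, `cmd:bundle@sample/ruby::bundle exec rspec`) suggest a `[guard@]cwd::command` layout. That layout is an inference from the task files, and what the runner does with the guard part (for example, skipping a task when a tool is unavailable) is not shown in this diff. The Python sketch below only splits an entry into its parts; `CommandSpec` and `parse_command_spec` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommandSpec:
    guard: Optional[str]  # e.g. "node>=22" or "cmd:bundle"; None when absent
    cwd: str              # working directory relative to the repo root
    command: str          # shell command to run


def parse_command_spec(entry: str) -> CommandSpec:
    """Parse '[guard@]cwd::command', splitting on the first '::' and first '@'."""
    prefix, separator, command = entry.partition("::")
    if not separator or not command:
        raise ValueError(f"expected '<cwd>::<command>', got {entry!r}")
    guard, at_sign, cwd = prefix.partition("@")
    if not at_sign:
        # No guard present: the whole prefix is the working directory.
        return CommandSpec(None, prefix, command)
    return CommandSpec(guard, cwd, command)
```

Splitting on `::` before looking for `@` keeps any `@` inside the command itself from being mistaken for a guard separator.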

package/evals/e2e-bench/tasks/e2e-008.yml ADDED

@@ -0,0 +1,10 @@
+id: e2e-008
+class: test-fix
+stack: any
+description: Validate CC-SD contract module defines v1 enforced task classes
+acceptance:
+  - CCSD_ENFORCED_TASK_CLASSES exports docs and test-fix only
+verification_contains:
+  - scripts/lib/ccsd-contract.mjs::export const CCSD_ENFORCED_TASK_CLASSES
+  - scripts/lib/ccsd-contract.mjs::"docs"
+  - scripts/lib/ccsd-contract.mjs::"test-fix"

package/evals/e2e-bench/tasks/e2e-009.yml ADDED

@@ -0,0 +1,10 @@
+id: e2e-009
+class: test-fix
+stack: any
+description: Validate diff-size autonomy limits match operations policy
+acceptance:
+  - L2 and L3 limits are codified in diff-size module
+verification_contains:
+  - scripts/lib/diff-size.mjs::L2: { loc: 120, files: 4 }
+  - scripts/lib/diff-size.mjs::L3: { loc: 60, files: 2 }
+  - scripts/lib/diff-size.mjs::L1: { loc: 300, files: 8 }
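
The three limits that `e2e-009` pins down can be read as a per-level size gate. The Python sketch below uses the values from the task file. It assumes the limits are inclusive maximums and that both must hold, which is not confirmed by `scripts/lib/diff-size.mjs` since that file's logic is not shown here. `within_limits` is a hypothetical name.

```python
# Values copied from the verification_contains entries above.
LIMITS = {
    "L1": {"loc": 300, "files": 8},
    "L2": {"loc": 120, "files": 4},
    "L3": {"loc": 60, "files": 2},
}


def within_limits(level: str, diff_loc: int, changed_files: int) -> bool:
    """True when the diff fits the autonomy level; assumes inclusive maximums."""
    limit = LIMITS[level]  # raises KeyError for an unknown level
    return diff_loc <= limit["loc"] and changed_files <= limit["files"]


print(within_limits("L2", diff_loc=120, changed_files=4))
print(within_limits("L3", diff_loc=61, changed_files=1))
```

Under these assumptions the first call fits the `L2` budget exactly and the second exceeds the `L3` line limit by one.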

package/evals/trajectories/rubric.md ADDED

@@ -0,0 +1,12 @@
+# Rubric for trajectory evaluation (G2)
+## Good PR
+- Meets all acceptance criteria
+- Within autonomy size limits
+- Tests adequately constrain behavior
+- Clear rollback path
+## Scoring
+Use G-Eval or LLM-as-judge with `EVAL_JUDGE_API_KEY` in CI.

package/evals/trajectories/test_harness_conventions.py ADDED

@@ -0,0 +1,271 @@
+"""Harness convention compliance and regression checks."""
+from pathlib import Path
+import re
+ROOT = Path(".")
+def read(path: str) -> str:
+    return (ROOT / path).read_text()
+def parse_frontmatter(path: str) -> tuple[dict[str, str], str]:
+    text = read(path)
+    match = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.S)
+    assert match, f"{path} missing YAML frontmatter"
+    frontmatter = {}
+    for line in match.group(1).splitlines():
+        if ":" in line:
+            key, value = line.split(":", 1)
+            frontmatter[key.strip()] = value.strip().strip('"')
+    return frontmatter, match.group(2)
+def test_agents_have_frontmatter_and_expected_tools():
+    expected = {
+        "implementer.agent.md": {"read", "edit", "search", "execute"},
+        "reviewer.agent.md": {"read", "search"},
+        "triager.agent.md": {"read"},
+    }
+    for filename, tools in expected.items():
+        fm, _ = parse_frontmatter(f".github/agents/{filename}")
+        assert fm["name"]
+        tool_values = set(re.findall(r'"([^"]+)"', fm["tools"]))
+        assert tool_values == tools
+def test_issue_template_requires_acceptance_criteria_and_no_fixed_labels():
+    text = read(".github/ISSUE_TEMPLATE/task.yml")
+    assert "id: acceptance_criteria" in text
+    assert "id: goal" in text
+    assert "id: rollback_hints" in text
+    assert "type: textarea" in text
+    assert re.search(r"id: acceptance_criteria.*?required: true", text, re.S)
+    assert "labels:" not in text
+def test_pr_template_has_harness_context_and_rollback():
+    text = read(".github/pull_request_template.md")
+    assert "## Harness context" in text
+    assert "## Rollback" in text
+    assert "## Goal implemented" in text
+    assert "Trace link" in text
+def test_change_size_limits_align_between_docs_and_gate():
+    operations = read("docs/operations.md")
+    gate = read("scripts/lib/diff-size.mjs")
+    agents = read("AGENTS.md")
+    copilot = read(".github/copilot-instructions.md")
+    expected = {"L1": ("300", "8"), "L2": ("120", "4"), "L3": ("60", "2")}
+    for level, (loc, files) in expected.items():
+        assert f"| {level} | {loc} | {files} |" in operations
+        assert f"{level}: {{ loc: {loc}, files: {files} }}" in gate
+        assert f"- {level}: max {loc} LOC, {files} files" in copilot
+    assert "| `docs` | L3 | 60 | 2 |" in agents
+    assert "| `test-fix` | L2 | 120 | 4 |" in agents
+    assert "| `feature-small` | L1 | 300 | 8 |" in agents
+def test_telemetry_required_fields_align_with_validator():
+    schema = read("docs/telemetry-schema.md")
+    lib = read("scripts/lib/telemetry-artifact.mjs")
+    required = re.findall(r"^\| `([^`]+)` \|", schema, re.M)
+    match = re.search(r"export const TELEMETRY_REQUIRED_FIELDS = \[([\s\S]*?)\];", lib)
+    assert match, "TELEMETRY_REQUIRED_FIELDS not found in telemetry-artifact.mjs"
+    validator_fields = re.findall(r'"([^"]+)"', match.group(1))
+    assert set(required) == set(validator_fields)
+def test_retry_policy_matches_operations_doc():
+    operations = read("docs/operations.md")
+    orchestrator = read(".github/workflows/agent-retry-orchestrator.yml")
+    assert "Max retries `N` | 3" in operations
+    assert "const MAX_RETRIES = 3;" in orchestrator
+    assert "Same failure signature | Stop after 2 consecutive identical" in operations
+    assert "Same failure signature detected twice" in orchestrator
+def test_gh_aw_dogfood_label_and_doc():
+    labels = read(".github/labels.yml")
+    assert "task:gh-aw-dogfood" in labels
+    dogfood = read("docs/gh-aw-dogfood.md")
+    assert "task:gh-aw-dogfood" in dogfood
+    assert "nightly-harness-review.yml" in dogfood
+def test_harness_review_classes_align_with_failure_taxonomy():
+    taxonomy = read("docs/failure-taxonomy.md")
+    lib = read("scripts/lib/harness-review.mjs")
+    match = re.search(r"export const FAILURE_CLASSES = \[([\s\S]*?)\];", lib)
+    assert match, "FAILURE_CLASSES not found in harness-review.mjs"
+    classes = re.findall(r'"([^"]+)"', match.group(1))
+    for label in ("FF不足", "壁不足", "モデル限界"):
+        assert label in taxonomy
+        assert label in classes
+    assert "unclassified" in classes
+def test_telemetry_fetch_workflows_align_with_emitters():
+    fetch = read("scripts/fetch-telemetry-artifacts.mjs")
+    docs = read("docs/telemetry-artifacts.md")
+    bootstrap = read("scripts/bootstrap-harness.sh")
+    workflows = re.findall(r'workflow: "([^"]+\.yml)"', fetch)
+    assert workflows, "TELEMETRY_WORKFLOWS missing in fetch-telemetry-artifacts.mjs"
+    for wf in workflows:
+        assert (ROOT / ".github/workflows" / wf).is_file(), f"missing emitter workflow {wf}"
+        assert wf in bootstrap, f"bootstrap-harness.sh does not copy {wf}"
+    assert "harness-telemetry-" in fetch
+    assert "eval-telemetry-" in fetch
+    assert "retry-telemetry-" in fetch
+    assert "pr-context-telemetry-" in fetch
+    for source in ("harness-ci", "eval-ci", "agent-retry-orchestrator", "pr-context"):
+        assert f"`{source}`" in docs
+def test_nightly_harness_review_bootstrap_and_workflow():
+    bootstrap = read("scripts/bootstrap-harness.sh")
+    assert "nightly-harness-review.yml" in bootstrap
+    assert "fetch-telemetry-artifacts.mjs" in bootstrap
+    assert "aggregate-harness-review.mjs" in bootstrap
+    assert "route-harness-review.mjs" in bootstrap
+    assert "harness-review.mjs" in bootstrap
+    assert (ROOT / ".github/workflows/nightly-harness-review.yml").is_file()
+    nightly = read(".github/workflows/nightly-harness-review.yml")
+    assert "fetch-telemetry-artifacts.mjs" in nightly
+    assert "aggregate-harness-review.mjs" in nightly
+    assert "route-harness-review.mjs" in nightly
+def test_outer_loop_routing_labels_defined():
+    labels = read(".github/labels.yml")
+    routing = read("scripts/lib/harness-review-routing.mjs")
+    assert "outer-loop:harness-revision" in labels
+    assert "outer-loop:wall-addition" in labels
+    assert "outer-loop:harness-revision" in routing
+    assert "outer-loop:wall-addition" in routing
+def test_template_codeowners_keeps_placeholder():
+    codeowners = read(".github/CODEOWNERS")
+    validate = read("scripts/validate-harness.mjs")
+    assert "@your-org/harness-engineers" in codeowners
+    assert "detectRepoProfile" in validate
+    assert "CODEOWNERS_PLACEHOLDER" in validate
+def test_gh_aw_sources_include_required_sections():
+    nightly = read(".github/workflows/nightly-harness-review.md")
+    weekly = read(".github/workflows/weekly-redteam.md")
+    for section in (
+        "## Required inputs",
+        "## Forbidden operations",
+        "## Expected outputs",
+        "## Promotion criteria",
+    ):
+        assert section in nightly
+        assert section in weekly
+    assert "GH_AW_SOURCE_REQUIRED_SECTIONS" in read("scripts/lib/gh-aw-dogfood.mjs")
+def _parse_ccsd_exports() -> tuple[list[str], list[str], list[str]]:
+    """Read canonical CC-SD field names from scripts/lib/ccsd-contract.mjs."""
+    contract = read("scripts/lib/ccsd-contract.mjs")
+    required = re.findall(
+        r'export const CCSD_REQUIRED_FIELDS = \[\s*([\s\S]*?)\s*\];',
+        contract,
+    )[0]
+    optional = re.findall(
+        r'export const CCSD_OPTIONAL_FIELDS = \[\s*([\s\S]*?)\s*\];',
+        contract,
+    )[0]
+    pr_fields = re.findall(
+        r'export const CCSD_PR_SUMMARY_FIELDS = \[\s*([\s\S]*?)\s*\];',
+        contract,
+    )[0]
+    def names(block: str) -> list[str]:
+        return re.findall(r'"([^"]+)"', block)
+    return names(required), names(optional), names(pr_fields)
+def test_task_template_contains_canonical_ccsd_fields():
+    required, optional, _ = _parse_ccsd_exports()
+    text = read(".github/ISSUE_TEMPLATE/task.yml")
+    for field in required:
+        assert f"label: {field}" in text, f"task.yml missing required field {field}"
+    for field in optional:
+        assert f"label: {field}" in text, f"task.yml missing optional field {field}"
+    assert "labels:" not in text
+def test_task_template_placeholders_are_detected_by_validator():
+    contract = read("scripts/lib/ccsd-contract.mjs")
+    template = read(".github/ISSUE_TEMPLATE/task.yml")
+    snippets = re.findall(r'"([^"]+)"', contract.split("CCSD_PLACEHOLDER_SNIPPETS", 1)[1].split("];", 1)[0])
+    for snippet in snippets:
+        assert snippet in template or snippet in contract
+def test_agents_and_quality_loop_reference_canonical_ccsd_fields():
+    required, _, _ = _parse_ccsd_exports()
+    paths = [
+        ".github/agents/triager.agent.md",
+        ".github/agents/implementer.agent.md",
+        ".github/agents/reviewer.agent.md",
+        ".github/skills/quality-loop/SKILL.md",
+        "AGENTS.md",
+        ".github/copilot-instructions.md",
+    ]
+    for path in paths:
+        text = read(path)
+        for field in required:
+            assert field in text, f"{path} missing canonical field {field}"
+def test_pr_template_contains_ccsd_summary_fields():
+    _, _, pr_fields = _parse_ccsd_exports()
+    text = read(".github/pull_request_template.md")
+    for field in pr_fields:
+        assert f"## {field}" in text, f"PR template missing section {field}"
+def test_coding_agent_l1_requires_ccsd_for_l1_docs_test_fix():
+    text = read("docs/coding-agent-l1.md")
+    assert "CC-SD" in text
+    assert "`task:docs`" in text
+    assert "`task:test-fix`" in text
+    assert "`autonomy:L1`" in text
+    assert "issue-spec-check" in text
+def test_arch_documents_ccsd_contract():
+    text = read("docs/arch.md")
+    assert "CC-SD" in text
+    assert "ccsd-contract.mjs" in text
+    assert "issue-spec-check" in text
+def test_adoption_describes_ccsd_as_l1_only_v1():
+    text = read("docs/adoption.md")
+    assert "CC-SD" in text
+    assert "v1" in text
+    assert "`task:docs`" in text
+    assert "`task:test-fix`" in text
+    assert "feature-small" in text
+def test_validation_script_field_list_matches_template():
+    required, optional, _ = _parse_ccsd_exports()
+    template = read(".github/ISSUE_TEMPLATE/task.yml")
+    for field in required + optional:
+        assert f"label: {field}" in template
+    assert "CCSD_REQUIRED_FIELDS" in read("scripts/lib/ccsd-contract.mjs")
+    assert "check-issue-spec.mjs" in read(".github/workflows/harness-ci.yml")
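
Note on `_parse_ccsd_exports` above: the helper finds each `export const ... = [...]` array with a regular expression and then collects its double-quoted entries. The sketch below repeats that technique on a made-up contract string so the behaviour can be checked in isolation. `SAMPLE_CONTRACT`, its field names, and `parse_export` are hypothetical; the real contents of `scripts/lib/ccsd-contract.mjs` are not shown in this part of the diff.

```python
import re

# Hypothetical stand-in for scripts/lib/ccsd-contract.mjs (field names are invented).
SAMPLE_CONTRACT = '''
export const CCSD_REQUIRED_FIELDS = [
  "Goal",
  "Acceptance criteria",
];
export const CCSD_OPTIONAL_FIELDS = ["Notes"];
'''


def parse_export(contract: str, const_name: str) -> list[str]:
    """Return the double-quoted entries of `export const <const_name> = [...]`."""
    match = re.search(
        r'export const ' + re.escape(const_name) + r' = \[\s*([\s\S]*?)\s*\];',
        contract,
    )
    if match is None:
        # Fail with a named error instead of the IndexError that `findall(...)[0]` raises.
        raise ValueError(f"{const_name} not found in contract")
    return re.findall(r'"([^"]+)"', match.group(1))


required = parse_export(SAMPLE_CONTRACT, "CCSD_REQUIRED_FIELDS")
optional = parse_export(SAMPLE_CONTRACT, "CCSD_OPTIONAL_FIELDS")
```

One property of this approach is that only double-quoted entries are matched. If the contract used single quotes, the result would be an empty list, and a `for field in required:` loop would then pass without checking anything. A guard such as `assert required` before the loop would catch that case.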

package/infra/README.md ADDED

@@ -0,0 +1,49 @@
+# Observability infrastructure
+Scaffold only — no production wiring is included in the template. See [docs/telemetry-schema.md](../docs/telemetry-schema.md) for required span fields.
+## Langfuse (self-hosted)
+```bash
+cd infra/langfuse
+docker compose up -d
+# UI: http://localhost:3000
+```
+Change `NEXTAUTH_SECRET` and `SALT` before production use.
+## OpenTelemetry collector
+```bash
+docker run -p 4317:4317 -p 4318:4318 \
+  -v "$(pwd)/otel/collector-config.yml:/etc/otelcol/config.yaml" \
+  otel/opentelemetry-collector:latest
+```
+## Connect harness telemetry
+1. Export spans with the required fields per [docs/telemetry-schema.md](../docs/telemetry-schema.md), or consume the inner-loop JSON artifacts described in [docs/telemetry-artifacts.md](../docs/telemetry-artifacts.md).
+2. Point exporters at the collector on `:4317` (gRPC) or `:4318` (HTTP).
+3. Uncomment the Langfuse OTLP exporter in `otel/collector-config.yml` when ready.
+4. Validate payloads:
+```bash
+node scripts/validate-telemetry.mjs "$(cat infra/samples/telemetry-payload.json)"
+node scripts/validate-telemetry.mjs "$(cat infra/samples/telemetry-artifact.json)"
+```
+## Environment variables (CI / local)
+| Variable | Purpose |
+|----------|---------|
+| `LANGFUSE_HOST` | Base URL for trace deep links in PR comments. When unset, the PR context comment shows a placeholder prompting configuration (see [docs/telemetry-schema.md](../docs/telemetry-schema.md)) |
+| `LANGFUSE_PUBLIC_KEY` | Optional export auth |
+| `LANGFUSE_SECRET_KEY` | Optional export auth |
+## PR context comment (informational fields)
+| Display | Spec |
+|---------|------|
+| Trace | Langfuse search hint when `LANGFUSE_HOST` is set; otherwise placeholder text |
+| AI credits | Informational only: the org `max-ai-credits` setting is not exposed to the workflow |
+| Threat detection | `n/a` until the gh-aw outer loop is promoted beyond a stub |
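
Note on the Trace row above: the behaviour it describes is a simple fallback on `LANGFUSE_HOST`. A minimal sketch of that logic follows, assuming the hint is plain text. The function name `trace_display`, the placeholder wording, and the hint format are all hypothetical; the package's actual PR comment text is not shown in this part of the diff.

```python
import os
from typing import Mapping, Optional

# Hypothetical wording; the real placeholder text is defined elsewhere in the package.
PLACEHOLDER = "Trace: configure LANGFUSE_HOST to enable trace links"


def trace_display(run_id: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return a search hint when LANGFUSE_HOST is set, otherwise a placeholder."""
    env = os.environ if env is None else env
    # Treat an empty or whitespace-only value the same as an unset variable.
    host = (env.get("LANGFUSE_HOST") or "").strip().rstrip("/")
    if not host:
        return PLACEHOLDER
    return f"Trace: search for `{run_id}` in {host}"


unset_text = trace_display("run-123", {})
set_text = trace_display("run-123", {"LANGFUSE_HOST": "https://langfuse.example.com/"})
```

Passing the environment in as a mapping keeps the function testable without touching the real process environment. The sketch deliberately builds no Langfuse URL path, since the deep-link format is not given here.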