npm - @guilz-dev/sdlc-gh - Versions diffs - 0.1.0 - Mend

@guilz-dev/sdlc-gh 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (176)

package/.github/CODEOWNERS +5 -0
package/.github/ISSUE_TEMPLATE/bug_report.yml +68 -0
package/.github/ISSUE_TEMPLATE/config.yml +1 -0
package/.github/ISSUE_TEMPLATE/feature_request.yml +39 -0
package/.github/ISSUE_TEMPLATE/support.yml +56 -0
package/.github/ISSUE_TEMPLATE/task.yml +89 -0
package/.github/agents/implementer.agent.md +17 -0
package/.github/agents/reviewer.agent.md +18 -0
package/.github/agents/triager.agent.md +13 -0
package/.github/aw/actions-lock.json +9 -0
package/.github/copilot-instructions.md +35 -0
package/.github/hooks/hooks.json +12 -0
package/.github/instructions/core.instructions.md +11 -0
package/.github/instructions/profiles/go.instructions.md +10 -0
package/.github/instructions/profiles/php.instructions.md +11 -0
package/.github/instructions/profiles/python.instructions.md +11 -0
package/.github/instructions/profiles/ruby.instructions.md +11 -0
package/.github/instructions/profiles/typescript.instructions.md +11 -0
package/.github/labels.yml +55 -0
package/.github/pull_request_template.md +33 -0
package/.github/ruleset.example.json +33 -0
package/.github/ruleset.harness-eval.example.json +29 -0
package/.github/skills/quality-loop/SKILL.md +23 -0
package/.github/workflows/agent-retry-orchestrator.yml +161 -0
package/.github/workflows/copilot-setup-steps.yml +64 -0
package/.github/workflows/eval-ci.yml +169 -0
package/.github/workflows/eval-drift.yml +75 -0
package/.github/workflows/gh-aw-dogfood-ci.yml +73 -0
package/.github/workflows/harness-ci.yml +244 -0
package/.github/workflows/harness-sync.yml +28 -0
package/.github/workflows/l1-readiness-check.yml +45 -0
package/.github/workflows/labels-sync.yml +24 -0
package/.github/workflows/nightly-harness-review.lock.yml +1643 -0
package/.github/workflows/nightly-harness-review.md +87 -0
package/.github/workflows/nightly-harness-review.yml +63 -0
package/.github/workflows/npm-publish.yml +49 -0
package/.github/workflows/pr-context-comment.yml +138 -0
package/.github/workflows/product-ci-go.yml +33 -0
package/.github/workflows/product-ci-php.yml +39 -0
package/.github/workflows/product-ci-python.yml +34 -0
package/.github/workflows/product-ci-ruby.yml +35 -0
package/.github/workflows/product-ci-ts.yml +37 -0
package/.github/workflows/task-issue-label-sync.yml +50 -0
package/.github/workflows/weekly-redteam.lock.yml +1571 -0
package/.github/workflows/weekly-redteam.md +76 -0
package/.github/zizmor.yml +11 -0
package/AGENTS.md +54 -0
package/LICENSE +21 -0
package/README.md +366 -0
package/config/stacks.json +55 -0
package/docs/adoption.md +126 -0
package/docs/arch.md +535 -0
package/docs/auth-boundaries.md +16 -0
package/docs/coding-agent-l1.md +152 -0
package/docs/exceptions/README.md +25 -0
package/docs/exceptions/TEMPLATE.md +8 -0
package/docs/failure-taxonomy.md +23 -0
package/docs/gh-aw-dogfood.md +109 -0
package/docs/kpi-baseline.md +9 -0
package/docs/nightly-harness-review.md +94 -0
package/docs/operations.md +108 -0
package/docs/publishing.md +79 -0
package/docs/revert-playbook.md +44 -0
package/docs/shared-config.md +30 -0
package/docs/telemetry-artifacts.md +78 -0
package/docs/telemetry-schema.md +60 -0
package/evals/.score-baseline.json +6 -0
package/evals/e2e-bench/README.md +28 -0
package/evals/e2e-bench/manifest.json +16 -0
package/evals/e2e-bench/tasks/e2e-001.yml +10 -0
package/evals/e2e-bench/tasks/e2e-002.yml +11 -0
package/evals/e2e-bench/tasks/e2e-003.yml +10 -0
package/evals/e2e-bench/tasks/e2e-004.yml +14 -0
package/evals/e2e-bench/tasks/e2e-005.yml +11 -0
package/evals/e2e-bench/tasks/e2e-006.yml +10 -0
package/evals/e2e-bench/tasks/e2e-007.yml +10 -0
package/evals/e2e-bench/tasks/e2e-008.yml +10 -0
package/evals/e2e-bench/tasks/e2e-009.yml +10 -0
package/evals/trajectories/rubric.md +12 -0
package/evals/trajectories/test_harness_conventions.py +271 -0
package/infra/README.md +49 -0
package/infra/langfuse/docker-compose.yml +25 -0
package/infra/otel/collector-config.yml +24 -0
package/infra/samples/gh-aw-dogfood-report.json +44 -0
package/infra/samples/harness-review-routing-plan.json +19 -0
package/infra/samples/harness-review-summary.json +61 -0
package/infra/samples/telemetry-artifact.json +29 -0
package/infra/samples/telemetry-payload.json +19 -0
package/package.json +85 -0
package/prompts/triager-classify.prompt.yml +10 -0
package/sample/go/add.go +5 -0
package/sample/go/add_test.go +9 -0
package/sample/go/go.mod +3 -0
package/sample/php/composer.json +26 -0
package/sample/php/composer.lock +1881 -0
package/sample/php/phpunit.xml +8 -0
package/sample/php/src/Add.php +13 -0
package/sample/php/tests/AddTest.php +16 -0
package/sample/python/requirements-dev.txt +2 -0
package/sample/python/src/__init__.py +0 -0
package/sample/python/src/greet.py +3 -0
package/sample/python/tests/conftest.py +4 -0
package/sample/python/tests/test_greet.py +5 -0
package/sample/ruby/.rubocop.yml +10 -0
package/sample/ruby/Gemfile +6 -0
package/sample/ruby/Gemfile.lock +58 -0
package/sample/ruby/lib/add.rb +9 -0
package/sample/ruby/spec/add_spec.rb +11 -0
package/sample/ts/biome.json +6 -0
package/sample/ts/package-lock.json +1763 -0
package/sample/ts/package.json +15 -0
package/sample/ts/src/add.ts +3 -0
package/sample/ts/tests/add.test.ts +8 -0
package/sample/ts/tsconfig.json +12 -0
package/scripts/aggregate-harness-review.mjs +48 -0
package/scripts/bootstrap-harness.sh +411 -0
package/scripts/check-diff-size.mjs +46 -0
package/scripts/check-e2e-manifest.mjs +35 -0
package/scripts/check-eval-score-drift.mjs +31 -0
package/scripts/check-gh-aw-dogfood-scope.mjs +51 -0
package/scripts/check-issue-spec.mjs +215 -0
package/scripts/check-l1-readiness.mjs +82 -0
package/scripts/check-open-pr-limit.mjs +34 -0
package/scripts/doctor.mjs +177 -0
package/scripts/emit-gh-aw-dogfood-report.mjs +112 -0
package/scripts/emit-telemetry-artifact.mjs +99 -0
package/scripts/fetch-telemetry-artifacts.mjs +176 -0
package/scripts/harness-drift-report.mjs +99 -0
package/scripts/lib/bootstrap-copy.mjs +123 -0
package/scripts/lib/ccsd-contract.mjs +212 -0
package/scripts/lib/diff-size.mjs +103 -0
package/scripts/lib/doctor-local.mjs +179 -0
package/scripts/lib/e2e-manifest.mjs +76 -0
package/scripts/lib/gh-aw-dogfood.mjs +293 -0
package/scripts/lib/github-config.mjs +94 -0
package/scripts/lib/harness-ci-fragments.mjs +98 -0
package/scripts/lib/harness-review-routing.mjs +244 -0
package/scripts/lib/harness-review.mjs +388 -0
package/scripts/lib/issue-form-label-sync.mjs +56 -0
package/scripts/lib/l1-readiness.mjs +258 -0
package/scripts/lib/merge-harness-package.mjs +36 -0
package/scripts/lib/npm-package.mjs +129 -0
package/scripts/lib/setup-wizard.mjs +224 -0
package/scripts/lib/stacks.mjs +138 -0
package/scripts/lib/telemetry-artifact.mjs +253 -0
package/scripts/lib/template-root.mjs +39 -0
package/scripts/merge-harness-package.mjs +14 -0
package/scripts/route-harness-review.mjs +168 -0
package/scripts/run-e2e-bench.mjs +216 -0
package/scripts/sdlc-gh-cli.mjs +91 -0
package/scripts/select-eval-jobs.mjs +41 -0
package/scripts/setup-github.mjs +242 -0
package/scripts/setup-github.sh +4 -0
package/scripts/setup-wizard.mjs +426 -0
package/scripts/test-bootstrap-guidance-scenarios.mjs +94 -0
package/scripts/test-diff-size-scenarios.mjs +88 -0
package/scripts/test-doctor-scenarios.mjs +70 -0
package/scripts/test-e2e-manifest-scenarios.mjs +65 -0
package/scripts/test-gh-aw-dogfood-scenarios.mjs +74 -0
package/scripts/test-harness-review-routing-scenarios.mjs +130 -0
package/scripts/test-harness-review-scenarios.mjs +92 -0
package/scripts/test-hooks-scenarios.mjs +44 -0
package/scripts/test-issue-form-label-sync-scenarios.mjs +48 -0
package/scripts/test-issue-spec-scenarios.mjs +258 -0
package/scripts/test-l1-readiness-scenarios.mjs +204 -0
package/scripts/test-merge-harness-package-scenarios.mjs +53 -0
package/scripts/test-npm-package-scenarios.mjs +31 -0
package/scripts/test-sdlc-gh-cli-scenarios.mjs +54 -0
package/scripts/test-setup-github-scenarios.mjs +103 -0
package/scripts/test-setup-wizard-scenarios.mjs +114 -0
package/scripts/test-telemetry-artifact-scenarios.mjs +69 -0
package/scripts/trim-harness-ci.mjs +18 -0
package/scripts/validate-gh-aw-compile.mjs +64 -0
package/scripts/validate-harness.mjs +199 -0
package/scripts/validate-telemetry.mjs +21 -0
package/scripts/verify-bootstrap-stacks.sh +192 -0

package/docs/coding-agent-l1.md ADDED

@@ -0,0 +1,152 @@
+# Coding agent L1 trial guide
+Start Issue-driven delegation with low-risk task classes only.
+## Eligible tasks
+| Label | Description |
+|-------|-------------|
+| `task:docs` | Documentation, comments, README |
+| `task:test-fix` | Fix or add tests (single responsibility) |
+Always set `autonomy:L1` unless triager recommends lower.
+`task:feature-small`, `task:infra`, and `task:security-sensitive` are **out of scope** for CC-SD enforcement in v1.
+## CC-SD contract (required for L1 docs / test-fix)
+L1 delegation on `task:docs` and `task:test-fix` requires a complete Issue-embedded CC-SD contract:
+| Field | Required |
+|-------|----------|
+| `Goal` | yes |
+| `Non-goals` | yes |
+| `Constraints` | yes |
+| `Acceptance criteria` | yes |
+| `Rollback hints` | yes |
+| `Additional context` | optional |
+Use `.github/ISSUE_TEMPLATE/task.yml`. CI enforces completeness via the `issue-spec-check` job when the linked Issue has `autonomy:L1` and `task:docs` or `task:test-fix` **labels**. Task Issues sync those labels automatically from the dropdown selections; triager still verifies that the classification is correct before delegation.
+**No usable CC-SD contract, no L1 delegation** — triager must not apply `autonomy:L1` if fields are blank or placeholder-only.
+`issue-spec-check` fails on fetch errors only when PR or Issue labels indicate L1 `docs`/`test-fix`; otherwise it warns and skips.
+Task Issues created from `.github/ISSUE_TEMPLATE/task.yml` now sync `task:*` and `autonomy:*` labels automatically from the dropdown selections via [task-issue-label-sync.yml](../.github/workflows/task-issue-label-sync.yml). Triager still owns the classification decision, but no longer has to retype the same values into labels.
+## CC-SD examples
+### Good example: `task:docs`
+```md
+Goal
+Refresh the README readiness section so first-time adopters can find the L1 workflow and know when to use it.
+Non-goals
+- Do not change workflow behavior or required checks.
+- Do not rewrite unrelated setup sections.
+Constraints
+- Limit edits to README.md and docs/coding-agent-l1.md.
+- Keep terminology aligned with docs/operations.md and workflow names.
+Acceptance criteria
+- [ ] README links to the L1 readiness workflow from the readiness section.
+- [ ] docs/coding-agent-l1.md mentions the Actions-based fallback.
+- [ ] No other docs are required to understand how to start the check.
+Rollback hints
+Revert the README/docs commit if the wording causes confusion; no data migration or config rollback is required.
+```
+### Good example: `task:test-fix`
+```md
+Goal
+Fix the failing readiness scenario test so strict mode expectations match the current CLI output.
+Non-goals
+- Do not change production workflow permissions.
+- Do not refactor unrelated doctor or diff-size checks.
+Constraints
+- Edit only the readiness test and the minimal supporting helper if needed.
+- Preserve current status vocabulary unless a test proves it is wrong.
+Acceptance criteria
+- [ ] The targeted readiness scenario reproduces the original failure before the fix.
+- [ ] `node scripts/test-l1-readiness-scenarios.mjs` passes after the fix.
+- [ ] The fix does not weaken existing failure coverage.
+Rollback hints
+Revert the test/helper commit and re-run the scenario suite.
+```
+### Bad example: rejected placeholder-only contract
+```md
+Goal
+Update docs.
+Non-goals
+- None
+Constraints
+- Keep it simple
+Acceptance criteria
+- [ ] Works
+Rollback hints
+Revert if needed
+```
+This should be rejected for L1 because the contract does not bound scope or define testable acceptance criteria.
+## L1 flow
+1. Author fills in the CC-SD Issue
+2. Triager validates the contract and confirms or corrects the synced labels
+3. Implementer executes against the contract
+4. CI enforces spec completeness (`issue-spec-check`)
+5. Reviewer checks spec conformance (requirement fit + non-goal preservation)
+## Agent assignment
+1. Assign **triager** to classify and confirm the synced labels (or add labels manually if sync did not run)
+2. Assign **implementer** for L1 tasks with a complete contract
+3. Agent opens a draft PR; a human reviews it at the single gate
+## Prerequisites
+- GitHub Copilot with coding agent enabled
+- Copilot setup workflow green
+- Ruleset: `harness-static`, `diff-size`, `issue-spec-check` (for L1 repos), stack `product-ci-*`
+## Readiness check (recommended)
+Run before creating the first L1 Issue:
+```bash
+npm run check-l1-readiness
+npm run check-l1-readiness -- --strict
+```
+Direct script form is also available: `node scripts/check-l1-readiness.mjs --strict`.
+Without local Node/gh, use **Actions → L1 readiness check → Run workflow** (`.github/workflows/l1-readiness-check.yml`). The workflow uses `GITHUB_TOKEN` for remote checks and writes a markdown job summary.
+What it verifies:
+- local harness assets and doctor checks (stack inferred from `product-ci-*.yml` when `.harness-stack` is absent)
+- Issue template and agent files required for L1 delegation
+- labels/rulesets/workflow status on GitHub when `gh` is authenticated (or via the Actions workflow)
+- unresolved manual prerequisites (for example Copilot coding agent entitlement)
+Status vocabulary is `PASS` / `FAIL` / `SKIP` / `WARN` / `MANUAL`. `--strict` uses the `check-l1-readiness` CLI's strict gating semantics.
+## Success criteria
+- Draft PR passes all required checks
+- Harness context comment posted on PR
+- Human merges after review
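
The completeness rule in docs/coding-agent-l1.md above (no usable CC-SD contract, no L1 delegation) can be illustrated with a minimal JavaScript sketch. It is not the package's `scripts/check-issue-spec.mjs`, whose contents are not shown in this diff. The function names, the placeholder list, and the assumption that each field name sits on its own line (optionally as a Markdown heading) are hypothetical.

```javascript
// Hypothetical sketch; not the package's scripts/check-issue-spec.mjs.
const REQUIRED_FIELDS = [
  "Goal",
  "Non-goals",
  "Constraints",
  "Acceptance criteria",
  "Rollback hints",
];

// Assumed placeholder phrases; the real rejection rules are not shown in this diff.
const PLACEHOLDERS = new Set(["", "none", "n/a", "tbd", "todo", "_no response_"]);

// Split an Issue body into { fieldName: lines[] }, treating a line that is
// exactly a known field name (optionally prefixed with '#') as a section start.
function parseContract(body) {
  const known = new Set([...REQUIRED_FIELDS, "Additional context"]);
  const sections = {};
  let current = null;
  for (const rawLine of body.split(/\r?\n/)) {
    const line = rawLine.trim();
    const heading = line.replace(/^#+\s*/, "");
    if (known.has(heading)) {
      current = heading;
      sections[current] = [];
    } else if (current !== null) {
      sections[current].push(line);
    }
  }
  return sections;
}

// Strip list and checkbox markers, then require at least one line that is
// neither empty nor a literal placeholder.
function isUsable(lines) {
  return lines
    .map((l) => l.replace(/^[-*]\s*(\[[ xX]\]\s*)?/, "").trim())
    .filter((l) => l.length > 0)
    .some((l) => !PLACEHOLDERS.has(l.toLowerCase().replace(/\.$/, "")));
}

// Return the required fields that are missing, blank, or placeholder-only.
function missingFields(body) {
  const sections = parseContract(body);
  return REQUIRED_FIELDS.filter((field) => !isUsable(sections[field] ?? []));
}
```

Run against the rejected example above, this sketch flags only `Non-goals` (`- None`). Vague but non-empty entries such as `Works` or `Keep it simple` pass a mechanical check like this one, which is why the triager still has to judge whether the contract bounds scope.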

package/docs/exceptions/README.md ADDED

@@ -0,0 +1,25 @@
+# Exception ledger
+Record deviations from harness principles (arch.md §9). Every exception is time-boxed and must include a revert plan.
+## Required fields
+Every exception record **must** include all of the following:
+| Field | Requirement |
+|-------|-------------|
+| **Reason** | Why the deviation is needed |
+| **Target task / PR** | Issue or PR link |
+| **Principle deviated** | Which harness principle is waived (e.g. Principle 4 — no harness change without eval) |
+| **Approver** | Named human approver (not the agent) |
+| **Expiry** | Max **14 days** from approval; no permanent exceptions |
+| **Revert plan** | Concrete steps to undo if the exception causes regression |
+## Template
+Copy [TEMPLATE.md](TEMPLATE.md) when recording an exception. Exceptions appear in the next morning queue for post-review.
+## Related docs
+- Revert procedure: [revert-playbook.md](../revert-playbook.md)
+- Morning queue: [operations.md](../operations.md)

package/docs/exceptions/TEMPLATE.md ADDED

@@ -0,0 +1,8 @@
+# Exception record
+- **Reason**:
+- **Target task / PR**:
+- **Principle deviated**:
+- **Approver**:
+- **Expiry** (max 14 days from approval):
+- **Revert plan**:

package/docs/failure-taxonomy.md ADDED

@@ -0,0 +1,23 @@
+# Failure Taxonomy
+Classify failures for outer-loop routing (arch.md §5.3).
+## Categories
+| Class | Definition | Remediation |
+|-------|------------|-------------|
+| **FF不足** (feed-forward gap) | Repeated convention violations, missed steps | Update instructions / skills / agents |
+| **壁不足** (wall gap) | CI passes but human review rejects | Add tests, lint rules, contracts |
+| **モデル限界** (model limit) | Correct tools and context, still fails after N retries | Escalate, split task, or accept human-led |
+## Wall failure types
+`test` | `lint` | `type` | `security` | `safe-output` | `diff-size`
+## Routing
+1. Auto-retry inner loop (where allowed per `docs/operations.md`)
+2. Structured comment on PR with `wall_failure_type`
+3. Nightly GHA aggregate ([nightly-harness-review.md](nightly-harness-review.md))
+4. Repeated **FF不足** (feed-forward gap) → `outer-loop:harness-revision` issue (automated when thresholds met)
+5. Repeated **壁不足** (wall gap) → `outer-loop:wall-addition` issue (automated when thresholds met)

package/docs/gh-aw-dogfood.md ADDED

@@ -0,0 +1,109 @@
+# gh-aw dogfood track (sdlc-gh on sdlc-gh)
+Bounded validation path for [Agentic Workflows (`gh aw`)](https://github.github.com/gh-aw/introduction/overview/) on this repository. **Dogfooding gh-aw is not the same as depending on gh-aw** for core outer-loop operability — see [nightly-harness-review.md](nightly-harness-review.md) for the standard GitHub Actions fallback.
+## Purpose
+High-signal validation of:
+- source `.md` workflows → compiled `.lock.yml`
+- safe-output frontmatter boundaries
+- reviewable, narrow change scope
+## How to run a dogfood task
+1. Open an Issue/PR labeled **`task:gh-aw-dogfood`** (+ `autonomy:L0` recommended — proposal only).
+2. Limit changes to the allowed paths below.
+3. If `.md` sources change, run locally:
+   ```bash
+   gh aw compile nightly-harness-review.md
+   gh aw compile weekly-redteam.md
+   git diff .github/workflows/*.lock.yml
+   ```
+4. Push and let **gh-aw dogfood CI** (`.github/workflows/gh-aw-dogfood-ci.yml`) record pass/fail criteria.
+## Allowed path scope
+| Area | Paths |
+|------|-------|
+| gh-aw sources | `.github/workflows/nightly-harness-review.md`, `.github/workflows/weekly-redteam.md` |
+| Compiled locks | corresponding `*.lock.yml` |
+| Tooling | `scripts/lib/gh-aw-dogfood.mjs`, `scripts/check-gh-aw-dogfood-scope.mjs`, `scripts/validate-gh-aw-compile.mjs`, `scripts/emit-gh-aw-dogfood-report.mjs`, `scripts/test-gh-aw-dogfood-scenarios.mjs` |
+| CI | `.github/workflows/gh-aw-dogfood-ci.yml` |
+| Labels | `.github/labels.yml` (when adding or updating `task:gh-aw-dogfood`) |
+| AW pins | `.github/aw/actions-lock.json` |
+| Docs | `docs/gh-aw-dogfood.md`, `docs/nightly-harness-review.md` (gh-aw contract sections only) |
+PRs with `task:gh-aw-dogfood` **fail CI** if any changed file is outside this list.
+## Evaluation criteria
+Recorded in `dogfood-report/gh-aw-dogfood-report.json` (sample: [infra/samples/gh-aw-dogfood-report.json](../infra/samples/gh-aw-dogfood-report.json)):
+| Criterion | Pass condition |
+|-----------|----------------|
+| **scope** | All PR diffs stay within allowed prefixes when `task:gh-aw-dogfood` is set |
+| **safe_outputs** | `create-pull-request.max <= 1`; no forbidden auto-merge outputs |
+| **compile** | `gh aw compile` succeeds for each source workflow when CLI is present |
+| **lock_drift** | `.lock.yml` has `gh-aw-metadata` header; byte-level drift is caught by **compile** |
+| **reviewability** | Above gates pass; outputs remain PRs/summaries/issues only |
+Set `GH_AW_COMPILE_REQUIRED=1` in CI to hard-fail when `gh aw` is missing (default: skip with warning).
+## Explicit constraints (Issue #7)
+- Do **not** replace standard GHA nightly aggregation ([nightly-harness-review.yml](../.github/workflows/nightly-harness-review.yml))
+- No autonomous merge
+- No repo-wide refactors under this track
+- Outputs: reviewable artifacts only (PR proposals, compile/drift results, summaries, Issues)
+## Comparing runs over time
+Download `gh-aw-dogfood-{run_id}` artifacts from Actions and diff `criteria` blocks. Track:
+- compile pass rate
+- lock drift incidents
+- safe-output regressions
+- scope violations
+### Baseline run (2026-07-04)
+First green dogfood CI on `main` after track landing ([#7](https://github.com/guilz-dev/sdlc-gh/issues/7)):
+| Field | Value |
+|-------|-------|
+| Run | [workflow_dispatch `28712363476`](https://github.com/guilz-dev/sdlc-gh/actions/runs/28712363476) |
+| Compiler | `gh aw` v0.81.6 (pinned in dogfood CI) |
+| `criteria.compile.skipped` | `false` |
+| Artifact | `gh-aw-dogfood-28712363476` |
+Use this run as the reference point when diffing future dogfood reports.
+## Rollback
+Trigger rollback when:
+- `gh aw` preview/compiler regression breaks compile on unchanged `.md` sources
+- safe-output policy is violated in dogfood workflows
+- dogfood CI blocks unrelated harness work
+**Action:**
+1. Revert the `.md` / `.lock.yml` pair (see [revert-playbook.md](revert-playbook.md))
+2. Disable or skip `gh-aw-dogfood-ci` until upstream fix is confirmed
+3. Keep **GHA outer loop** (`nightly-harness-review.yml`) as the operational baseline
+## Tests
+```bash
+node scripts/test-gh-aw-dogfood-scenarios.mjs
+node scripts/validate-gh-aw-compile.mjs   # requires gh aw
+```
+## Related docs
+- [failure-taxonomy.md](failure-taxonomy.md) — outer-loop classification (GHA path)
+- [adoption.md](adoption.md) — when to promote beyond dogfood
+- [auth-boundaries.md](auth-boundaries.md) — execution mode credentials
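
The scope gate for `task:gh-aw-dogfood` PRs described in docs/gh-aw-dogfood.md above could look roughly like the JavaScript sketch below. It is an illustration under assumptions, not the package's `scripts/check-gh-aw-dogfood-scope.mjs`: the names `ALLOWED_PATHS` and `checkScope` are hypothetical, the lock file names are inferred from the file list, and exact-path matching is assumed even though the criteria table speaks of allowed prefixes. It does not model the rule that only the gh-aw contract sections of `docs/nightly-harness-review.md` may change.

```javascript
// Hypothetical sketch; the package's real check is not shown in this diff.
const ALLOWED_PATHS = [
  ".github/workflows/nightly-harness-review.md",
  ".github/workflows/weekly-redteam.md",
  ".github/workflows/nightly-harness-review.lock.yml",
  ".github/workflows/weekly-redteam.lock.yml",
  "scripts/lib/gh-aw-dogfood.mjs",
  "scripts/check-gh-aw-dogfood-scope.mjs",
  "scripts/validate-gh-aw-compile.mjs",
  "scripts/emit-gh-aw-dogfood-report.mjs",
  "scripts/test-gh-aw-dogfood-scenarios.mjs",
  ".github/workflows/gh-aw-dogfood-ci.yml",
  ".github/labels.yml",
  ".github/aw/actions-lock.json",
  "docs/gh-aw-dogfood.md",
  "docs/nightly-harness-review.md",
];

// The gate only applies when the PR carries the dogfood label; otherwise it passes.
function checkScope(labels, changedFiles) {
  if (!labels.includes("task:gh-aw-dogfood")) {
    return { applies: false, pass: true, violations: [] };
  }
  const allowed = new Set(ALLOWED_PATHS);
  const violations = changedFiles.filter((file) => !allowed.has(file));
  return { applies: true, pass: violations.length === 0, violations };
}
```

Returning the list of violations rather than a bare boolean keeps the CI message reviewable: the author sees exactly which files pushed the PR outside the track.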

package/docs/kpi-baseline.md ADDED

@@ -0,0 +1,9 @@
+# KPI baseline template
+Record weekly in a spreadsheet or issue until the Langfuse dashboard is connected.
+| Week | PR rejection rate | First-pass CI rate | Avg retry count | Notes |
+|------|-------------------|--------------------|-----------------|-------|
+| YYYY-WW | | | | |
+Fields map to [telemetry-schema.md](telemetry-schema.md).

package/docs/nightly-harness-review.md ADDED

@@ -0,0 +1,94 @@
+# Nightly harness review
+Standard GitHub Actions outer-loop job that aggregates inner-loop telemetry artifacts (#2) and classifies failures per [failure-taxonomy.md](failure-taxonomy.md). Runs **without gh-aw**.
+## Workflow
+| Item | Value |
+|------|-------|
+| File | `.github/workflows/nightly-harness-review.yml` |
+| Schedule | `0 2 * * *` (02:00 UTC daily) |
+| Manual | `workflow_dispatch` with optional `window_hours` (default 24); routing is **dry-run by default** unless `apply_routing=true` |
+gh-aw stub (`.github/workflows/nightly-harness-review.md` + `.lock.yml`) documents promotion criteria and safe-outputs; **GHA** [nightly-harness-review.yml](../.github/workflows/nightly-harness-review.yml) is the operational baseline.
+## Pipeline
+1. `scripts/fetch-telemetry-artifacts.mjs` — list recent runs for emitter workflows, download `*-telemetry-*` artifacts into `telemetry-collected/`
+2. `scripts/aggregate-harness-review.mjs` — dedupe, group by `repo` + `task_id` + `pr_number`, classify, write summaries
+Emitter workflows: see [telemetry-artifacts.md](telemetry-artifacts.md).
+## Output
+| Path | Format |
+|------|--------|
+| `harness-review/harness-review-summary.json` | Machine-readable rollup + per-task `classifications[]` |
+| `harness-review/harness-review-summary.md` | Human-readable tables for the morning queue |
+| GitHub Actions step summary | Same Markdown as above |
+| Artifact `nightly-harness-review-{run_id}` | Uploaded directory for downstream automation |
+Sample JSON: [infra/samples/harness-review-summary.json](../infra/samples/harness-review-summary.json)
+## Classification rules
+| Class | Signals |
+|-------|---------|
+| **壁不足** (wall gap) | Harness CI green (`wall_failure_type` empty) + `review_outcome: changes_requested` |
+| **モデル限界** (model limit) | `final_outcome: escalated`, `retry_count >= 3`, security wall, repeated `test`/`type`/etc., or same wall across multiple retry events |
+| **FF不足** (feed-forward gap) | Repeated `lint` / issue-spec failures (≥2 records) |
+| **unclassified** | Failure present but pattern does not match above |
+`rollup.repeated_failure_signatures` lists `wall_failure_type` values seen on ≥2 records or ≥2 tasks. `by_wall_failure_type` counts **task groups** per wall type (not raw telemetry rows).
+## Local dry-run
+With fixture telemetry JSON under `telemetry-collected/`:
+```bash
+node scripts/aggregate-harness-review.mjs
+cat harness-review/harness-review-summary.md
+```
+Fetch from GitHub (requires `gh` + token):
+```bash
+export GH_TOKEN=...
+export GITHUB_REPOSITORY=owner/repo
+node scripts/fetch-telemetry-artifacts.mjs
+node scripts/aggregate-harness-review.mjs
+```
+## Tests
+```bash
+node scripts/test-harness-review-scenarios.mjs
+node scripts/test-harness-review-routing-scenarios.mjs
+```
+## Follow-up automation
+The JSON summary feeds `scripts/route-harness-review.mjs` (#4), which opens or updates GitHub issues when thresholds are met.
+## Issue routing
+| Classification | Threshold | Issue kind | Labels |
+|----------------|-----------|------------|--------|
+| **FF不足** (feed-forward gap) | ≥2 task groups **or** repeated `lint` signature (`record_count >= 2`) | harness-revision | `outer-loop:harness-revision`, `autonomy:L0` |
+| **壁不足** (wall gap) | ≥2 task groups **or** CI-pass + review-reject proxy | wall-addition | `outer-loop:wall-addition`, `autonomy:L0` |
+Dedupe: HTML comment marker `<!-- harness-routing-key: {repo}:{kind}:{signature}:{scope} -->` in the issue body. `scope` is derived from task class and wall types where available, so unrelated findings do not collapse into one repo-wide issue. Existing open issues with the same key are **updated**, not duplicated.
+**Migration:** keys before scope suffix (`{repo}:{kind}:{signature}`) are not matched automatically. Close or relabel legacy routed issues after upgrading, or accept one duplicate cycle before the new keys stabilize.
+Dry-run locally:
+```bash
+node scripts/aggregate-harness-review.mjs
+HARNESS_ROUTING_DRY_RUN=1 node scripts/route-harness-review.mjs
+cat harness-review/harness-review-routing-plan.json
+```
+Results are also written to `harness-review/harness-review-routing-results.json` when routing runs live (not dry-run).
+Non-goals: automatic code changes, proposal PR creation (issues only), Langfuse dependency, weekly red-team.
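
To make the classification rules table in docs/nightly-harness-review.md above concrete, here is a minimal JavaScript sketch that classifies one task group (all telemetry records sharing `repo` + `task_id` + `pr_number`). The field names come from the table. Everything else is an assumption: the function name, the English result labels, the precedence order (model limit first, then feed-forward gap, then wall gap), and the reading of "repeated `test`/`type`/etc." as the same non-lint wall type on two or more records. Issue-spec failures are left out because their telemetry representation is not shown in this diff. The package's real logic lives in `scripts/lib/harness-review.mjs`.

```javascript
// Hypothetical sketch; precedence and labels are assumptions, not the package's code.
function classifyTaskGroup(records) {
  // Non-empty wall failure types across the group's records.
  const walls = records.map((r) => r.wall_failure_type).filter(Boolean);
  const count = (type) => walls.filter((w) => w === type).length;

  // Model limit: escalation, exhausted retries, a security wall,
  // or the same non-lint wall seen on two or more records.
  const escalated = records.some((r) => r.final_outcome === "escalated");
  const retriesExhausted = records.some((r) => (r.retry_count ?? 0) >= 3);
  const repeatedNonLintWall = [...new Set(walls)].some(
    (w) => w !== "lint" && count(w) >= 2
  );
  if (escalated || retriesExhausted || walls.includes("security") || repeatedNonLintWall) {
    return "model-limit";
  }

  // Feed-forward gap: repeated lint failures (two or more records).
  if (count("lint") >= 2) return "ff-gap";

  // Wall gap: CI was green on every record, but a human requested changes.
  const rejected = records.some((r) => r.review_outcome === "changes_requested");
  if (walls.length === 0 && rejected) return "wall-gap";

  // A failure is present but matches no pattern above; null means no failure at all.
  return walls.length > 0 ? "unclassified" : null;
}
```

Checking the model-limit signals first means a group with both repeated lint failures and an escalation is reported as a model limit. A different precedence would be equally defensible; the doc does not state one.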

package/docs/operations.md ADDED

@@ -0,0 +1,108 @@
+# Operations
+Canonical thresholds and policies. All CI gates read from this document.
+## Change size limits (arch.md §5.2.2)
+| Level | Max LOC | Max files |
+|-------|---------|-----------|
+| L1 | 300 | 8 |
+| L2 | 120 | 4 |
+| L3 | 60 | 2 |
+L2/L3 labeled PRs **hard fail** CI when exceeded. L1 **warns by default** (template default). Phase 4 allows opt-in L1 hard-fail via `DIFF_SIZE_L1_HARD_FAIL=1` in the `diff-size` CI job when your org is ready to enforce.
+To enable L1 hard-fail, uncomment or add in `.github/workflows/harness-ci.yml` under the `diff-size` job:
+```yaml
+env:
+  DIFF_SIZE_L1_HARD_FAIL: "1"
+```
+`autonomy:L0` is **proposal-only**: no LOC/file limits are enforced (human gate). This matches arch.md §5.2.2; older harness versions applied the L1 limits to L0 in warn-only mode.
+## Retry policy (Phase 3)
+| Parameter | Value |
+|-----------|-------|
+| Max retries `N` | 3 |
+| Same failure signature | Stop after 2 consecutive identical |
+| Cost cap per task | Configure per org (`max-ai-credits`) |
+
+| Failure type | Retry allowed |
+|--------------|---------------|
+| test | yes |
+| lint / type | yes |
+| security | no — escalate immediately |
+| safe-output / diff-size | conditional — request split |
+| same signature ×2 | no |
+## Forbidden operations (hooks + CI)
+- `git push --force` to protected branches
+- `rm -rf /` and destructive filesystem patterns
+- Production DB / secrets modification without `task:infra` + human approval
+## Single gate
+- Human judgment: **PR review only**
+- No self-approval where ruleset allows enforcement
+- Harness engineer owns `.github/**`, `evals/**`, `docs/telemetry-schema.md`
+## Morning queue (outer loop)
+Daily ~30 min checklist:
+1. Review nightly harness review summaries (GHA) and routed `outer-loop:*` issues
+2. Triage `harness:eval-drift` issues
+3. Classify failures per [failure-taxonomy.md](failure-taxonomy.md)
+4. Update [kpi-baseline.md](kpi-baseline.md) if metrics available
+## KPI baseline (Phase 1)
+Track weekly per [kpi-baseline.md](kpi-baseline.md). Schema fields in [telemetry-schema.md](telemetry-schema.md).
+## L2 autonomy promotion (Phase 4)
+Promote `task:docs` to L2 candidate when **all** hold:
+- Last 50 tasks: adoption rate > 90%
+- Zero major reverts in 90 days
+- E2E bench pass@1 stable or improving
+Document promotion in PR with evidence links.
+## Revert and exceptions (Phase 4)
+- Revert procedure: [revert-playbook.md](revert-playbook.md)
+- Policy exceptions: [exceptions/README.md](exceptions/README.md) — approver, expiry, revert plan, and principle deviated are required
+## Eval governance
+### Rubric updates
+Review for: validity, reproducibility, bias before merge.
+### Schedule
+- PR: subset eval by change type matrix
+- Weekly: full eval suite
+### Drift threshold
+If the eval pass rate exceeds the production acceptance rate by **more than 15 points**, a bench review Issue is opened automatically.
+## Secrets naming
+| Secret | Purpose |
+|--------|---------|
+| `EVAL_JUDGE_API_KEY` | DeepEval / LLM-as-judge |
+| `GITHUB_TOKEN` | gh models eval (default Actions token often sufficient) |
+| `LANGFUSE_PUBLIC_KEY` | Optional telemetry export |
+| `LANGFUSE_SECRET_KEY` | Optional telemetry export |
+## PR open limit (safe outputs substitute)
+Until gh-aw `safe_outputs.max_prs` is active, CI uses **open PR count per author** as a proxy (Phase 0–2: warn; Phase 3+: gh-aw enforces per workflow run).
+Warn when an author has more than **3** open PRs at once (`scripts/check-open-pr-limit.mjs`).
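
The change size policy in docs/operations.md above (hard fail for L2/L3, warn by default for L1, no limits for L0) can be expressed as a small decision function. The JavaScript below is a sketch under assumptions, not the package's `scripts/check-diff-size.mjs`: `LIMITS` and `evaluateDiffSize` are hypothetical names, and treating an unknown level like L0 is a choice made for this example. The limits and the `DIFF_SIZE_L1_HARD_FAIL` variable are taken from the document.

```javascript
// Limits copied from the "Change size limits" table; names are hypothetical.
const LIMITS = {
  L1: { loc: 300, files: 8 },
  L2: { loc: 120, files: 4 },
  L3: { loc: 60, files: 2 },
};

// Returns "pass", "warn", or "fail" for a PR at the given autonomy level.
function evaluateDiffSize(level, loc, files, env = {}) {
  const limit = LIMITS[level];
  // autonomy:L0 is proposal-only, so no limits apply; this sketch treats
  // unknown levels the same way (an assumption).
  if (!limit) return "pass";

  const exceeded = loc > limit.loc || files > limit.files;
  if (!exceeded) return "pass";

  // L1 only warns unless the opt-in hard-fail flag is set to "1".
  if (level === "L1" && env.DIFF_SIZE_L1_HARD_FAIL !== "1") return "warn";
  return "fail";
}
```

In a CI job the `env` argument would typically be `process.env`; passing it explicitly keeps the function easy to test.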

package/docs/publishing.md ADDED

@@ -0,0 +1,79 @@
+# npm publish
+The harness installer is published as **`@guilz-dev/sdlc-gh`**.
+## Install (adopters)
+```bash
+cd /path/to/your-product
+npx @guilz-dev/sdlc-gh
+```
+Pin a version:
+```bash
+npx @guilz-dev/sdlc-gh@0.1.0 init --yes --stack ts --codeowners @your-org/harness-engineers
+```
+Before the first npm release, use GitHub:
+```bash
+npx github:guilz-dev/sdlc-gh
+```
+## Prerequisites (maintainers)
+1. npm org **`@guilz-dev`** exists and your user can publish to it
+   - https://www.npmjs.com/org/create
+   - `npm org ls guilz-dev`
+2. Granular npm access token with **Publish** for `@guilz-dev/sdlc-gh`
+3. GitHub repository secret **`NPM_TOKEN`** (repo → Settings → Secrets → Actions)
+Enable **Trusted Publishing** (recommended) or use a classic token with publish scope.
+## Release flow
+1. Bump `version` in `package.json` (semver).
+2. Merge to `main`.
+3. Create a GitHub **Release** whose tag matches the version (`0.1.0` or `v0.1.0`; workflow strips optional `v`).
+4. Workflow **npm publish** runs on `release: published` and publishes with provenance.
+Manual dry run (no publish):
+```bash
+gh workflow run npm-publish.yml
+```
+Publish happens **only** on GitHub Release (`release: published`). The manual workflow runs validation and `npm publish --dry-run` only.
+Local dry run:
+```bash
+npm pack --dry-run
+npm publish --dry-run --access public
+```
+## prepack / prepublishOnly
+- **`prepack`**: `validate` + `test-sdlc-gh-cli` (runs on `npm pack` / `npm publish`)
+- **`NPM_PACKAGE_FILES`** in `scripts/lib/npm-package.mjs` must stay in sync with `package.json` `files` (enforced by `validate-harness.mjs`)
+- **`prepublishOnly`**: `check-e2e` (manifest checks only; full bench is CI)
+## Package contents
+Controlled by `files` in `package.json` and `.npmignore`. Sample stacks ship **source only** (no `node_modules` / `vendor`); adopters run `npm install` / `composer install` in the product repo after `--mode new` bootstrap.
+## Rollback
+Unpublishing is discouraged on npm. Ship a patch release instead:
+```bash
+npm version patch
+# release + publish 0.1.1
+```
+If a bad tarball must be deprecated:
+```bash
+npm deprecate @guilz-dev/sdlc-gh@0.1.0 "Use >=0.1.1"
+```
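
Step 3 of the release flow in docs/publishing.md above says the release tag must match the `package.json` version, with an optional leading `v` stripped. A minimal JavaScript sketch of that comparison follows. The helper names are hypothetical, and the real check lives in `.github/workflows/npm-publish.yml`, which is not shown in this diff.

```javascript
// Hypothetical helpers illustrating the tag/version match rule.
function normalizeReleaseTag(tag) {
  // Strip one optional leading "v": "v0.1.0" -> "0.1.0".
  return tag.startsWith("v") ? tag.slice(1) : tag;
}

// True when the release tag (with or without "v") equals the package version.
function tagMatchesPackageVersion(tag, packageVersion) {
  return normalizeReleaseTag(tag) === packageVersion;
}
```

A publish job could call `tagMatchesPackageVersion` with the release tag and the `version` field read from `package.json`, and stop before `npm publish` when it returns `false`.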

package/docs/revert-playbook.md ADDED

@@ -0,0 +1,44 @@
+# Revert playbook
+Minimal procedure for rolling back harness or product changes. Operational judgment stays in PR review; this document only defines the steps.
+## When to revert
+Choose revert (not forward-fix) when **any** of these hold:
+1. **Retry exhaustion** — agent-retry-orchestrator reached max retries with the same failure signature
+2. **Bad prompt / instruction change** — merged harness asset caused measurable quality regression (eval drift, spike in wall failures)
+3. **Eval drift** — bench pass rate exceeds production acceptance by more than 15 points (see [operations.md](operations.md))
+4. **Unapproved policy deviation** — change merged without a valid [exception record](exceptions/README.md)
+5. **Production incident** — merged PR is the clear cause of a hotfix or rollback in the product
+## Harness asset rollback vs product rollback
+| Change type | Revert target | Follow-up |
+|-------------|---------------|-----------|
+| Harness only (`.github/**`, `evals/**`, `scripts/**`, `docs/**`) | Revert the harness PR on `main` | Re-run `npm run check`; confirm ruleset checks still green |
+| Product code only | Revert the product PR | Product CI must pass on the revert PR |
+| Mixed harness + product | **Split**: revert harness commit(s) first, then product if needed | Never leave harness in a half-upgraded state |
+Bootstrap/sync PRs that touch both: revert the **entire** sync PR, then re-apply product changes without harness assets if required.
+## Retry exhaustion / bad prompt / eval drift
+1. Identify the merge commit or PR that introduced the regression
+2. Open a revert PR (`git revert <sha>` or GitHub UI **Revert**)
+3. Link the original PR, failure taxonomy class ([failure-taxonomy.md](failure-taxonomy.md)), and any eval-drift Issue
+4. If the root cause was a harness asset change, add a follow-up Issue to fix forward with eval evidence — do not re-merge without passing eval CI
+## Revert PR must include
+- [ ] Link to the original PR and Issue (if any)
+- [ ] Failure class (`feed-forward`, `wall`, `model`, or `eval-drift`)
+- [ ] Evidence: CI logs, eval scores, or trace search query
+- [ ] Rollback hints from the original Issue CC-SD contract (when applicable)
+- [ ] Confirmation that required status checks pass on the revert PR
+## After revert
+1. Post in the morning queue ([operations.md](operations.md)) if the incident affects KPI baseline
+2. If an [exception](exceptions/README.md) was involved, close or expire the exception record
+3. For harness changes, run `npm run drift-report` before the next sync attempt