@guilz-dev/sdlc-gh 0.1.0

Files changed (176)
  1. package/.github/CODEOWNERS +5 -0
  2. package/.github/ISSUE_TEMPLATE/bug_report.yml +68 -0
  3. package/.github/ISSUE_TEMPLATE/config.yml +1 -0
  4. package/.github/ISSUE_TEMPLATE/feature_request.yml +39 -0
  5. package/.github/ISSUE_TEMPLATE/support.yml +56 -0
  6. package/.github/ISSUE_TEMPLATE/task.yml +89 -0
  7. package/.github/agents/implementer.agent.md +17 -0
  8. package/.github/agents/reviewer.agent.md +18 -0
  9. package/.github/agents/triager.agent.md +13 -0
  10. package/.github/aw/actions-lock.json +9 -0
  11. package/.github/copilot-instructions.md +35 -0
  12. package/.github/hooks/hooks.json +12 -0
  13. package/.github/instructions/core.instructions.md +11 -0
  14. package/.github/instructions/profiles/go.instructions.md +10 -0
  15. package/.github/instructions/profiles/php.instructions.md +11 -0
  16. package/.github/instructions/profiles/python.instructions.md +11 -0
  17. package/.github/instructions/profiles/ruby.instructions.md +11 -0
  18. package/.github/instructions/profiles/typescript.instructions.md +11 -0
  19. package/.github/labels.yml +55 -0
  20. package/.github/pull_request_template.md +33 -0
  21. package/.github/ruleset.example.json +33 -0
  22. package/.github/ruleset.harness-eval.example.json +29 -0
  23. package/.github/skills/quality-loop/SKILL.md +23 -0
  24. package/.github/workflows/agent-retry-orchestrator.yml +161 -0
  25. package/.github/workflows/copilot-setup-steps.yml +64 -0
  26. package/.github/workflows/eval-ci.yml +169 -0
  27. package/.github/workflows/eval-drift.yml +75 -0
  28. package/.github/workflows/gh-aw-dogfood-ci.yml +73 -0
  29. package/.github/workflows/harness-ci.yml +244 -0
  30. package/.github/workflows/harness-sync.yml +28 -0
  31. package/.github/workflows/l1-readiness-check.yml +45 -0
  32. package/.github/workflows/labels-sync.yml +24 -0
  33. package/.github/workflows/nightly-harness-review.lock.yml +1643 -0
  34. package/.github/workflows/nightly-harness-review.md +87 -0
  35. package/.github/workflows/nightly-harness-review.yml +63 -0
  36. package/.github/workflows/npm-publish.yml +49 -0
  37. package/.github/workflows/pr-context-comment.yml +138 -0
  38. package/.github/workflows/product-ci-go.yml +33 -0
  39. package/.github/workflows/product-ci-php.yml +39 -0
  40. package/.github/workflows/product-ci-python.yml +34 -0
  41. package/.github/workflows/product-ci-ruby.yml +35 -0
  42. package/.github/workflows/product-ci-ts.yml +37 -0
  43. package/.github/workflows/task-issue-label-sync.yml +50 -0
  44. package/.github/workflows/weekly-redteam.lock.yml +1571 -0
  45. package/.github/workflows/weekly-redteam.md +76 -0
  46. package/.github/zizmor.yml +11 -0
  47. package/AGENTS.md +54 -0
  48. package/LICENSE +21 -0
  49. package/README.md +366 -0
  50. package/config/stacks.json +55 -0
  51. package/docs/adoption.md +126 -0
  52. package/docs/arch.md +535 -0
  53. package/docs/auth-boundaries.md +16 -0
  54. package/docs/coding-agent-l1.md +152 -0
  55. package/docs/exceptions/README.md +25 -0
  56. package/docs/exceptions/TEMPLATE.md +8 -0
  57. package/docs/failure-taxonomy.md +23 -0
  58. package/docs/gh-aw-dogfood.md +109 -0
  59. package/docs/kpi-baseline.md +9 -0
  60. package/docs/nightly-harness-review.md +94 -0
  61. package/docs/operations.md +108 -0
  62. package/docs/publishing.md +79 -0
  63. package/docs/revert-playbook.md +44 -0
  64. package/docs/shared-config.md +30 -0
  65. package/docs/telemetry-artifacts.md +78 -0
  66. package/docs/telemetry-schema.md +60 -0
  67. package/evals/.score-baseline.json +6 -0
  68. package/evals/e2e-bench/README.md +28 -0
  69. package/evals/e2e-bench/manifest.json +16 -0
  70. package/evals/e2e-bench/tasks/e2e-001.yml +10 -0
  71. package/evals/e2e-bench/tasks/e2e-002.yml +11 -0
  72. package/evals/e2e-bench/tasks/e2e-003.yml +10 -0
  73. package/evals/e2e-bench/tasks/e2e-004.yml +14 -0
  74. package/evals/e2e-bench/tasks/e2e-005.yml +11 -0
  75. package/evals/e2e-bench/tasks/e2e-006.yml +10 -0
  76. package/evals/e2e-bench/tasks/e2e-007.yml +10 -0
  77. package/evals/e2e-bench/tasks/e2e-008.yml +10 -0
  78. package/evals/e2e-bench/tasks/e2e-009.yml +10 -0
  79. package/evals/trajectories/rubric.md +12 -0
  80. package/evals/trajectories/test_harness_conventions.py +271 -0
  81. package/infra/README.md +49 -0
  82. package/infra/langfuse/docker-compose.yml +25 -0
  83. package/infra/otel/collector-config.yml +24 -0
  84. package/infra/samples/gh-aw-dogfood-report.json +44 -0
  85. package/infra/samples/harness-review-routing-plan.json +19 -0
  86. package/infra/samples/harness-review-summary.json +61 -0
  87. package/infra/samples/telemetry-artifact.json +29 -0
  88. package/infra/samples/telemetry-payload.json +19 -0
  89. package/package.json +85 -0
  90. package/prompts/triager-classify.prompt.yml +10 -0
  91. package/sample/go/add.go +5 -0
  92. package/sample/go/add_test.go +9 -0
  93. package/sample/go/go.mod +3 -0
  94. package/sample/php/composer.json +26 -0
  95. package/sample/php/composer.lock +1881 -0
  96. package/sample/php/phpunit.xml +8 -0
  97. package/sample/php/src/Add.php +13 -0
  98. package/sample/php/tests/AddTest.php +16 -0
  99. package/sample/python/requirements-dev.txt +2 -0
  100. package/sample/python/src/__init__.py +0 -0
  101. package/sample/python/src/greet.py +3 -0
  102. package/sample/python/tests/conftest.py +4 -0
  103. package/sample/python/tests/test_greet.py +5 -0
  104. package/sample/ruby/.rubocop.yml +10 -0
  105. package/sample/ruby/Gemfile +6 -0
  106. package/sample/ruby/Gemfile.lock +58 -0
  107. package/sample/ruby/lib/add.rb +9 -0
  108. package/sample/ruby/spec/add_spec.rb +11 -0
  109. package/sample/ts/biome.json +6 -0
  110. package/sample/ts/package-lock.json +1763 -0
  111. package/sample/ts/package.json +15 -0
  112. package/sample/ts/src/add.ts +3 -0
  113. package/sample/ts/tests/add.test.ts +8 -0
  114. package/sample/ts/tsconfig.json +12 -0
  115. package/scripts/aggregate-harness-review.mjs +48 -0
  116. package/scripts/bootstrap-harness.sh +411 -0
  117. package/scripts/check-diff-size.mjs +46 -0
  118. package/scripts/check-e2e-manifest.mjs +35 -0
  119. package/scripts/check-eval-score-drift.mjs +31 -0
  120. package/scripts/check-gh-aw-dogfood-scope.mjs +51 -0
  121. package/scripts/check-issue-spec.mjs +215 -0
  122. package/scripts/check-l1-readiness.mjs +82 -0
  123. package/scripts/check-open-pr-limit.mjs +34 -0
  124. package/scripts/doctor.mjs +177 -0
  125. package/scripts/emit-gh-aw-dogfood-report.mjs +112 -0
  126. package/scripts/emit-telemetry-artifact.mjs +99 -0
  127. package/scripts/fetch-telemetry-artifacts.mjs +176 -0
  128. package/scripts/harness-drift-report.mjs +99 -0
  129. package/scripts/lib/bootstrap-copy.mjs +123 -0
  130. package/scripts/lib/ccsd-contract.mjs +212 -0
  131. package/scripts/lib/diff-size.mjs +103 -0
  132. package/scripts/lib/doctor-local.mjs +179 -0
  133. package/scripts/lib/e2e-manifest.mjs +76 -0
  134. package/scripts/lib/gh-aw-dogfood.mjs +293 -0
  135. package/scripts/lib/github-config.mjs +94 -0
  136. package/scripts/lib/harness-ci-fragments.mjs +98 -0
  137. package/scripts/lib/harness-review-routing.mjs +244 -0
  138. package/scripts/lib/harness-review.mjs +388 -0
  139. package/scripts/lib/issue-form-label-sync.mjs +56 -0
  140. package/scripts/lib/l1-readiness.mjs +258 -0
  141. package/scripts/lib/merge-harness-package.mjs +36 -0
  142. package/scripts/lib/npm-package.mjs +129 -0
  143. package/scripts/lib/setup-wizard.mjs +224 -0
  144. package/scripts/lib/stacks.mjs +138 -0
  145. package/scripts/lib/telemetry-artifact.mjs +253 -0
  146. package/scripts/lib/template-root.mjs +39 -0
  147. package/scripts/merge-harness-package.mjs +14 -0
  148. package/scripts/route-harness-review.mjs +168 -0
  149. package/scripts/run-e2e-bench.mjs +216 -0
  150. package/scripts/sdlc-gh-cli.mjs +91 -0
  151. package/scripts/select-eval-jobs.mjs +41 -0
  152. package/scripts/setup-github.mjs +242 -0
  153. package/scripts/setup-github.sh +4 -0
  154. package/scripts/setup-wizard.mjs +426 -0
  155. package/scripts/test-bootstrap-guidance-scenarios.mjs +94 -0
  156. package/scripts/test-diff-size-scenarios.mjs +88 -0
  157. package/scripts/test-doctor-scenarios.mjs +70 -0
  158. package/scripts/test-e2e-manifest-scenarios.mjs +65 -0
  159. package/scripts/test-gh-aw-dogfood-scenarios.mjs +74 -0
  160. package/scripts/test-harness-review-routing-scenarios.mjs +130 -0
  161. package/scripts/test-harness-review-scenarios.mjs +92 -0
  162. package/scripts/test-hooks-scenarios.mjs +44 -0
  163. package/scripts/test-issue-form-label-sync-scenarios.mjs +48 -0
  164. package/scripts/test-issue-spec-scenarios.mjs +258 -0
  165. package/scripts/test-l1-readiness-scenarios.mjs +204 -0
  166. package/scripts/test-merge-harness-package-scenarios.mjs +53 -0
  167. package/scripts/test-npm-package-scenarios.mjs +31 -0
  168. package/scripts/test-sdlc-gh-cli-scenarios.mjs +54 -0
  169. package/scripts/test-setup-github-scenarios.mjs +103 -0
  170. package/scripts/test-setup-wizard-scenarios.mjs +114 -0
  171. package/scripts/test-telemetry-artifact-scenarios.mjs +69 -0
  172. package/scripts/trim-harness-ci.mjs +18 -0
  173. package/scripts/validate-gh-aw-compile.mjs +64 -0
  174. package/scripts/validate-harness.mjs +199 -0
  175. package/scripts/validate-telemetry.mjs +21 -0
  176. package/scripts/verify-bootstrap-stacks.sh +192 -0
@@ -0,0 +1,152 @@
1
+ # Coding agent L1 trial guide
2
+
3
+ Start Issue-driven delegation with low-risk task classes only.
4
+
5
+ ## Eligible tasks
6
+
7
+ | Label | Description |
8
+ |-------|-------------|
9
+ | `task:docs` | Documentation, comments, README |
10
+ | `task:test-fix` | Fix or add tests (single responsibility) |
11
+
12
+ Always set `autonomy:L1` unless triager recommends lower.
13
+
14
+ `task:feature-small`, `task:infra`, and `task:security-sensitive` are **out of scope** for CC-SD enforcement in v1.
15
+
16
+ ## CC-SD contract (required for L1 docs / test-fix)
17
+
18
+ L1 delegation on `task:docs` and `task:test-fix` requires a complete Issue-embedded CC-SD contract:
19
+
20
+ | Field | Required |
21
+ |-------|----------|
22
+ | `Goal` | yes |
23
+ | `Non-goals` | yes |
24
+ | `Constraints` | yes |
25
+ | `Acceptance criteria` | yes |
26
+ | `Rollback hints` | yes |
27
+ | `Additional context` | optional |
28
+
29
+ Use `.github/ISSUE_TEMPLATE/task.yml`. CI enforces completeness via the `issue-spec-check` job when the linked Issue has `autonomy:L1` and `task:docs` or `task:test-fix` **labels**. Task Issues sync those labels automatically from the dropdown selections; triager still verifies that the classification is correct before delegation.
30
+
31
+ **No usable CC-SD contract, no L1 delegation** — triager must not apply `autonomy:L1` if fields are blank or placeholder-only.
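The completeness rule above can be sketched as a small check. This is a hypothetical illustration, not the package's `scripts/lib/ccsd-contract.mjs`: the function names and the placeholder list are assumptions, and only the field names come from the table above.

```javascript
// Hypothetical sketch; not the package's actual CC-SD parser.
const REQUIRED_FIELDS = [
  "Goal",
  "Non-goals",
  "Constraints",
  "Acceptance criteria",
  "Rollback hints",
];

// Assumed placeholder values; the real check may use a different list.
const PLACEHOLDERS = new Set(["", "none", "n/a", "tbd", "todo", "_no response_"]);

// Split an Issue body into sections keyed by the known field headings.
function parseContract(body) {
  const known = new Set([...REQUIRED_FIELDS, "Additional context"]);
  const sections = {};
  let current = null;
  for (const rawLine of body.split(/\r?\n/)) {
    const heading = rawLine.replace(/^#+\s*/, "").trim();
    if (known.has(heading)) {
      current = heading;
      sections[current] = [];
    } else if (current) {
      sections[current].push(rawLine);
    }
  }
  return sections;
}

// A section is placeholder-only when, after stripping bullets and
// checkboxes, nothing but a known placeholder value remains.
function isPlaceholder(lines) {
  const text = lines
    .map((l) => l.replace(/^\s*[-*]\s*(\[[ xX]\]\s*)?/, "").trim())
    .filter(Boolean)
    .join(" ")
    .toLowerCase();
  return PLACEHOLDERS.has(text);
}

// Return the required fields that are absent, blank, or placeholder-only.
function findMissingFields(body) {
  const sections = parseContract(body);
  return REQUIRED_FIELDS.filter((f) => !sections[f] || isPlaceholder(sections[f]));
}
```

A check like this only catches blank or literal placeholder values. A vague but non-empty contract would pass it, so triager judgment is still needed.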
32
+
33
+ `issue-spec-check` fails on fetch errors only when PR or Issue labels indicate L1 `docs`/`test-fix`; otherwise it warns and skips.
34
+
35
+ Task Issues created from `.github/ISSUE_TEMPLATE/task.yml` now sync `task:*` and `autonomy:*` labels automatically from the dropdown selections via [task-issue-label-sync.yml](../.github/workflows/task-issue-label-sync.yml). Triager still owns the classification decision, but no longer has to retype the same values into labels.
36
+
37
+ ## CC-SD examples
38
+
39
+ ### Good example: `task:docs`
40
+
41
+ ```md
42
+ Goal
43
+ Refresh the README readiness section so first-time adopters can find the L1 workflow and know when to use it.
44
+
45
+ Non-goals
46
+ - Do not change workflow behavior or required checks.
47
+ - Do not rewrite unrelated setup sections.
48
+
49
+ Constraints
50
+ - Limit edits to README.md and docs/coding-agent-l1.md.
51
+ - Keep terminology aligned with docs/operations.md and workflow names.
52
+
53
+ Acceptance criteria
54
+ - [ ] README links to the L1 readiness workflow from the readiness section.
55
+ - [ ] docs/coding-agent-l1.md mentions the Actions-based fallback.
56
+ - [ ] No other docs are required to understand how to start the check.
57
+
58
+ Rollback hints
59
+ Revert the README/docs commit if the wording causes confusion; no data migration or config rollback is required.
60
+ ```
61
+
62
+ ### Good example: `task:test-fix`
63
+
64
+ ```md
65
+ Goal
66
+ Fix the failing readiness scenario test so strict mode expectations match the current CLI output.
67
+
68
+ Non-goals
69
+ - Do not change production workflow permissions.
70
+ - Do not refactor unrelated doctor or diff-size checks.
71
+
72
+ Constraints
73
+ - Edit only the readiness test and the minimal supporting helper if needed.
74
+ - Preserve current status vocabulary unless a test proves it is wrong.
75
+
76
+ Acceptance criteria
77
+ - [ ] The targeted readiness scenario reproduces the original failure before the fix.
78
+ - [ ] `node scripts/test-l1-readiness-scenarios.mjs` passes after the fix.
79
+ - [ ] The fix does not weaken existing failure coverage.
80
+
81
+ Rollback hints
82
+ Revert the test/helper commit and re-run the scenario suite.
83
+ ```
84
+
85
+ ### Bad example: rejected placeholder-only contract
86
+
87
+ ```md
88
+ Goal
89
+ Update docs.
90
+
91
+ Non-goals
92
+ - None
93
+
94
+ Constraints
95
+ - Keep it simple
96
+
97
+ Acceptance criteria
98
+ - [ ] Works
99
+
100
+ Rollback hints
101
+ Revert if needed
102
+ ```
103
+
104
+ This should be rejected for L1 because the contract does not bound scope or define testable acceptance criteria.
105
+
106
+ ## L1 flow
107
+
108
+ 1. Author fills in the CC-SD Issue
109
+ 2. Triager validates the contract and confirms or corrects the synced labels
110
+ 3. Implementer executes against the contract
111
+ 4. CI enforces spec completeness (`issue-spec-check`)
112
+ 5. Reviewer checks spec conformance (requirement fit + non-goal preservation)
113
+
114
+ ## Agent assignment
115
+
116
+ 1. Assign **triager** to classify and confirm the synced labels (or add labels manually if sync did not run)
117
+ 2. Assign **implementer** for L1 tasks with a complete contract
118
+ 3. The agent opens a draft PR; a human reviews it at the single gate
119
+
120
+ ## Prerequisites
121
+
122
+ - GitHub Copilot with coding agent enabled
123
+ - Copilot setup workflow green
124
+ - Ruleset: `harness-static`, `diff-size`, `issue-spec-check` (for L1 repos), stack `product-ci-*`
125
+
126
+ ## Readiness check (recommended)
127
+
128
+ Run before creating the first L1 Issue:
129
+
130
+ ```bash
131
+ npm run check-l1-readiness
132
+ npm run check-l1-readiness -- --strict
133
+ ```
134
+
135
+ Direct script form is also available: `node scripts/check-l1-readiness.mjs --strict`.
136
+
137
+ Without local Node/gh, use **Actions → L1 readiness check → Run workflow** (`.github/workflows/l1-readiness-check.yml`). The workflow uses `GITHUB_TOKEN` for remote checks and writes a markdown job summary.
138
+
139
+ What it verifies:
140
+
141
+ - local harness assets and doctor checks (stack inferred from `product-ci-*.yml` when `.harness-stack` is absent)
142
+ - Issue template and agent files required for L1 delegation
143
+ - labels/rulesets/workflow status on GitHub when `gh` is authenticated (or via the Actions workflow)
144
+ - unresolved manual prerequisites (for example Copilot coding agent entitlement)
145
+
146
+ Status vocabulary is `PASS` / `FAIL` / `SKIP` / `WARN` / `MANUAL`. `--strict` uses the `check-l1-readiness` CLI's strict gating semantics.
147
+
148
+ ## Success criteria
149
+
150
+ - Draft PR passes all required checks
151
+ - Harness context comment posted on PR
152
+ - Human merges after review
@@ -0,0 +1,25 @@
1
+ # Exception ledger
2
+
3
+ Record deviations from harness principles (arch.md §9). Every exception is time-boxed and must include a revert plan.
4
+
5
+ ## Required fields
6
+
7
+ Every exception record **must** include all of the following:
8
+
9
+ | Field | Requirement |
10
+ |-------|-------------|
11
+ | **Reason** | Why the deviation is needed |
12
+ | **Target task / PR** | Issue or PR link |
13
+ | **Principle deviated** | Which harness principle is waived (e.g. Principle 4 — no harness change without eval) |
14
+ | **Approver** | Named human approver (not the agent) |
15
+ | **Expiry** | Max **14 days** from approval; no permanent exceptions |
16
+ | **Revert plan** | Concrete steps to undo if the exception causes regression |
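The required-fields table could be checked mechanically along these lines. This is a sketch under assumptions: the record shape and key names (`reason`, `target`, `principle`, `approver`, `expiry`, `revertPlan`) are hypothetical, and dates are assumed to be ISO `YYYY-MM-DD` strings.

```javascript
// Hypothetical sketch; key names are illustrative, not from the package.
const MAX_EXCEPTION_DAYS = 14;
const REQUIRED_KEYS = ["reason", "target", "principle", "approver", "expiry", "revertPlan"];
const MS_PER_DAY = 86400000;

// Return a list of problems; an empty list means the record is acceptable.
function validateException(record, approvedOn) {
  const errors = REQUIRED_KEYS
    .filter((key) => !String(record[key] ?? "").trim())
    .map((key) => `missing ${key}`);

  // Expiry must fall after approval and no more than 14 days later.
  const days = (Date.parse(record.expiry) - Date.parse(approvedOn)) / MS_PER_DAY;
  if (!(days > 0 && days <= MAX_EXCEPTION_DAYS)) {
    errors.push("expiry must be within 14 days of approval");
  }
  return errors;
}
```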
17
+
18
+ ## Template
19
+
20
+ Copy [TEMPLATE.md](TEMPLATE.md) when recording an exception. Exceptions appear in the next morning queue for post-review.
21
+
22
+ ## Related docs
23
+
24
+ - Revert procedure: [revert-playbook.md](../revert-playbook.md)
25
+ - Morning queue: [operations.md](../operations.md)
@@ -0,0 +1,8 @@
1
+ # Exception record
2
+
3
+ - **Reason**:
4
+ - **Target task / PR**:
5
+ - **Principle deviated**:
6
+ - **Approver**:
7
+ - **Expiry** (max 14 days from approval):
8
+ - **Revert plan**:
@@ -0,0 +1,23 @@
1
+ # Failure Taxonomy
2
+
3
+ Classify failures for outer-loop routing (arch.md §5.3).
4
+
5
+ ## Categories
6
+
7
+ | Class | Definition | Remediation |
8
+ |-------|------------|-------------|
9
+ | **FF不足** (feed-forward gap) | Repeated convention violations, missed steps | Update instructions / skills / agents |
10
+ | **壁不足** (wall gap) | CI passes but human review rejects | Add tests, lint rules, contracts |
11
+ | **モデル限界** (model limit) | Correct tools and context, still fails after N retries | Escalate, split task, or accept human-led |
12
+
13
+ ## Wall failure types
14
+
15
+ `test` | `lint` | `type` | `security` | `safe-output` | `diff-size`
16
+
17
+ ## Routing
18
+
19
+ 1. Auto-retry inner loop (where allowed per `docs/operations.md`)
20
+ 2. Structured comment on PR with `wall_failure_type`
21
+ 3. Nightly GHA aggregate ([nightly-harness-review.md](nightly-harness-review.md))
22
+ 4. Repeated **FF不足** → `outer-loop:harness-revision` issue (automated when thresholds met)
23
+ 5. Repeated **壁不足** → `outer-loop:wall-addition` issue (automated when thresholds met)
@@ -0,0 +1,109 @@
1
+ # gh-aw dogfood track (sdlc-gh on sdlc-gh)
2
+
3
+ Bounded validation path for [Agentic Workflows (`gh aw`)](https://github.github.com/gh-aw/introduction/overview/) on this repository. **Dogfooding gh-aw is not the same as depending on gh-aw** for core outer-loop operability — see [nightly-harness-review.md](nightly-harness-review.md) for the standard GitHub Actions fallback.
4
+
5
+ ## Purpose
6
+
7
+ High-signal validation of:
8
+
9
+ - source `.md` workflows → compiled `.lock.yml`
10
+ - safe-output frontmatter boundaries
11
+ - reviewable, narrow change scope
12
+
13
+ ## How to run a dogfood task
14
+
15
+ 1. Open an Issue/PR labeled **`task:gh-aw-dogfood`** (+ `autonomy:L0` recommended — proposal only).
16
+ 2. Limit changes to the allowed paths below.
17
+ 3. If `.md` sources change, run locally:
18
+
19
+ ```bash
20
+ gh aw compile nightly-harness-review.md
21
+ gh aw compile weekly-redteam.md
22
+ git diff .github/workflows/*.lock.yml
23
+ ```
24
+
25
+ 4. Push and let **gh-aw dogfood CI** (`.github/workflows/gh-aw-dogfood-ci.yml`) record pass/fail criteria.
26
+
27
+ ## Allowed path scope
28
+
29
+ | Area | Paths |
30
+ |------|-------|
31
+ | gh-aw sources | `.github/workflows/nightly-harness-review.md`, `.github/workflows/weekly-redteam.md` |
32
+ | Compiled locks | corresponding `*.lock.yml` |
33
+ | Tooling | `scripts/lib/gh-aw-dogfood.mjs`, `scripts/check-gh-aw-dogfood-scope.mjs`, `scripts/validate-gh-aw-compile.mjs`, `scripts/emit-gh-aw-dogfood-report.mjs`, `scripts/test-gh-aw-dogfood-scenarios.mjs` |
34
+ | CI | `.github/workflows/gh-aw-dogfood-ci.yml` |
35
+ | Labels | `.github/labels.yml` (when adding or updating `task:gh-aw-dogfood`) |
36
+ | AW pins | `.github/aw/actions-lock.json` |
37
+ | Docs | `docs/gh-aw-dogfood.md`, `docs/nightly-harness-review.md` (gh-aw contract sections only) |
38
+
39
+ PRs with `task:gh-aw-dogfood` **fail CI** if any changed file is outside this list.
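A minimal sketch of the scope gate, assuming exact-path matching. The real `scripts/check-gh-aw-dogfood-scope.mjs` may match on prefixes instead, and the function name here is hypothetical. The path list is copied from the table above.

```javascript
// Hypothetical sketch of the dogfood scope gate; exact-match is an assumption.
const ALLOWED_PATHS = new Set([
  ".github/workflows/nightly-harness-review.md",
  ".github/workflows/weekly-redteam.md",
  ".github/workflows/nightly-harness-review.lock.yml",
  ".github/workflows/weekly-redteam.lock.yml",
  "scripts/lib/gh-aw-dogfood.mjs",
  "scripts/check-gh-aw-dogfood-scope.mjs",
  "scripts/validate-gh-aw-compile.mjs",
  "scripts/emit-gh-aw-dogfood-report.mjs",
  "scripts/test-gh-aw-dogfood-scenarios.mjs",
  ".github/workflows/gh-aw-dogfood-ci.yml",
  ".github/labels.yml",
  ".github/aw/actions-lock.json",
  "docs/gh-aw-dogfood.md",
  "docs/nightly-harness-review.md",
]);

// Return changed files outside the allowed scope.
// PRs without the dogfood label are not subject to this gate.
function findScopeViolations(changedFiles, labels) {
  if (!labels.includes("task:gh-aw-dogfood")) return [];
  return changedFiles.filter((file) => !ALLOWED_PATHS.has(file));
}
```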
40
+
41
+ ## Evaluation criteria
42
+
43
+ Recorded in `dogfood-report/gh-aw-dogfood-report.json` (sample: [infra/samples/gh-aw-dogfood-report.json](../infra/samples/gh-aw-dogfood-report.json)):
44
+
45
+ | Criterion | Pass condition |
46
+ |-----------|----------------|
47
+ | **scope** | All PR diffs stay within allowed prefixes when `task:gh-aw-dogfood` is set |
48
+ | **safe_outputs** | `create-pull-request.max <= 1`; no forbidden auto-merge outputs |
49
+ | **compile** | `gh aw compile` succeeds for each source workflow when CLI is present |
50
+ | **lock_drift** | `.lock.yml` has `gh-aw-metadata` header; byte-level drift is caught by **compile** |
51
+ | **reviewability** | Above gates pass; outputs remain PRs/summaries/issues only |
52
+
53
+ Set `GH_AW_COMPILE_REQUIRED=1` in CI to hard-fail when `gh aw` is missing (default: skip with warning).
54
+
55
+ ## Explicit constraints (Issue #7)
56
+
57
+ - Do **not** replace standard GHA nightly aggregation ([nightly-harness-review.yml](../.github/workflows/nightly-harness-review.yml))
58
+ - No autonomous merge
59
+ - No repo-wide refactors under this track
60
+ - Outputs: reviewable artifacts only (PR proposals, compile/drift results, summaries, Issues)
61
+
62
+ ## Comparing runs over time
63
+
64
+ Download `gh-aw-dogfood-{run_id}` artifacts from Actions and diff `criteria` blocks. Track:
65
+
66
+ - compile pass rate
67
+ - lock drift incidents
68
+ - safe-output regressions
69
+ - scope violations
70
+
71
+ ### Baseline run (2026-07-04)
72
+
73
+ First green dogfood CI on `main` after track landing ([#7](https://github.com/guilz-dev/sdlc-gh/issues/7)):
74
+
75
+ | Field | Value |
76
+ |-------|-------|
77
+ | Run | [workflow_dispatch `28712363476`](https://github.com/guilz-dev/sdlc-gh/actions/runs/28712363476) |
78
+ | Compiler | `gh aw` v0.81.6 (pinned in dogfood CI) |
79
+ | `criteria.compile.skipped` | `false` |
80
+ | Artifact | `gh-aw-dogfood-28712363476` |
81
+
82
+ Use this run as the reference point when diffing future dogfood reports.
83
+
84
+ ## Rollback
85
+
86
+ Trigger rollback when:
87
+
88
+ - `gh aw` preview/compiler regression breaks compile on unchanged `.md` sources
89
+ - safe-output policy is violated in dogfood workflows
90
+ - dogfood CI blocks unrelated harness work
91
+
92
+ **Action:**
93
+
94
+ 1. Revert the `.md` / `.lock.yml` pair (see [revert-playbook.md](revert-playbook.md))
95
+ 2. Disable or skip `gh-aw-dogfood-ci` until upstream fix is confirmed
96
+ 3. Keep **GHA outer loop** (`nightly-harness-review.yml`) as the operational baseline
97
+
98
+ ## Tests
99
+
100
+ ```bash
101
+ node scripts/test-gh-aw-dogfood-scenarios.mjs
102
+ node scripts/validate-gh-aw-compile.mjs # requires gh aw
103
+ ```
104
+
105
+ ## Related docs
106
+
107
+ - [failure-taxonomy.md](failure-taxonomy.md) — outer-loop classification (GHA path)
108
+ - [adoption.md](adoption.md) — when to promote beyond dogfood
109
+ - [auth-boundaries.md](auth-boundaries.md) — execution mode credentials
@@ -0,0 +1,9 @@
1
+ # KPI baseline template
2
+
3
+ Record weekly in a spreadsheet or an issue until the Langfuse dashboard is connected.
4
+
5
+ | Week | PR rejection rate | First-pass CI rate | Avg retry count | Notes |
6
+ |------|-------------------|--------------------|-----------------|-------|
7
+ | YYYY-WW | | | | |
8
+
9
+ Fields map to [telemetry-schema.md](telemetry-schema.md).
@@ -0,0 +1,94 @@
1
+ # Nightly harness review
2
+
3
+ Standard GitHub Actions outer-loop job that aggregates inner-loop telemetry artifacts (#2) and classifies failures per [failure-taxonomy.md](failure-taxonomy.md). Runs **without gh-aw**.
4
+
5
+ ## Workflow
6
+
7
+ | Item | Value |
8
+ |------|-------|
9
+ | File | `.github/workflows/nightly-harness-review.yml` |
10
+ | Schedule | `0 2 * * *` (02:00 UTC daily) |
11
+ | Manual | `workflow_dispatch` with optional `window_hours` (default 24); routing is **dry-run by default** unless `apply_routing=true` |
12
+
13
+ gh-aw stub (`.github/workflows/nightly-harness-review.md` + `.lock.yml`) documents promotion criteria and safe-outputs; **GHA** [nightly-harness-review.yml](../.github/workflows/nightly-harness-review.yml) is the operational baseline.
14
+
15
+ ## Pipeline
16
+
17
+ 1. `scripts/fetch-telemetry-artifacts.mjs` — list recent runs for emitter workflows, download `*-telemetry-*` artifacts into `telemetry-collected/`
18
+ 2. `scripts/aggregate-harness-review.mjs` — dedupe, group by `repo` + `task_id` + `pr_number`, classify, write summaries
19
+
20
+ Emitter workflows: see [telemetry-artifacts.md](telemetry-artifacts.md).
21
+
22
+ ## Output
23
+
24
+ | Path | Format |
25
+ |------|--------|
26
+ | `harness-review/harness-review-summary.json` | Machine-readable rollup + per-task `classifications[]` |
27
+ | `harness-review/harness-review-summary.md` | Human-readable tables for the morning queue |
28
+ | GitHub Actions step summary | Same Markdown as above |
29
+ | Artifact `nightly-harness-review-{run_id}` | Uploaded directory for downstream automation |
30
+
31
+ Sample JSON: [infra/samples/harness-review-summary.json](../infra/samples/harness-review-summary.json)
32
+
33
+ ## Classification rules
34
+
35
+ | Class | Signals |
36
+ |-------|---------|
37
+ | **壁不足** (wall gap) | Harness CI green (`wall_failure_type` empty) + `review_outcome: changes_requested` |
38
+ | **モデル限界** (model limit) | `final_outcome: escalated`, `retry_count >= 3`, security wall, repeated `test`/`type`/etc., or same wall across multiple retry events |
39
+ | **FF不足** (feed-forward gap) | Repeated `lint` / issue-spec failures (≥2 records) |
40
+ | **unclassified** | Failure present but pattern does not match above |
41
+
42
+ `rollup.repeated_failure_signatures` lists `wall_failure_type` values seen on ≥2 records or ≥2 tasks. `by_wall_failure_type` counts **task groups** per wall type (not raw telemetry rows).
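The classification table can be read as a decision function over one task group. The sketch below is illustrative, not the logic in `scripts/lib/harness-review.mjs`. The record field names come from the table. Three things are assumptions: the order in which rules are tried, the English return labels, and `issue-spec` as a `wall_failure_type` value. The "same wall across multiple retry events" signal is omitted.

```javascript
// Hypothetical sketch; rule precedence and return labels are assumptions.
function classifyTaskGroup(records) {
  const walls = records.map((r) => r.wall_failure_type).filter(Boolean);
  const count = (type) => walls.filter((w) => w === type).length;

  // Model limit: escalation, retry budget exhausted, security wall,
  // or repeated test/type failures.
  const escalated = records.some((r) => r.final_outcome === "escalated");
  const maxRetries = Math.max(0, ...records.map((r) => r.retry_count ?? 0));
  if (
    escalated ||
    maxRetries >= 3 ||
    count("security") > 0 ||
    count("test") >= 2 ||
    count("type") >= 2
  ) {
    return "model-limit";
  }

  // Feed-forward gap: repeated lint or issue-spec failures (2+ records).
  if (count("lint") >= 2 || count("issue-spec") >= 2) return "ff-gap";

  // Wall gap: CI was green but the human reviewer requested changes.
  const rejected = records.some((r) => r.review_outcome === "changes_requested");
  if (walls.length === 0 && rejected) return "wall-gap";

  // A failure is present but matches no pattern above.
  if (walls.length > 0) return "unclassified";
  return null; // no failure recorded for this group
}
```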
43
+
44
+ ## Local dry-run
45
+
46
+ With fixture telemetry JSON under `telemetry-collected/`:
47
+
48
+ ```bash
49
+ node scripts/aggregate-harness-review.mjs
50
+ cat harness-review/harness-review-summary.md
51
+ ```
52
+
53
+ Fetch from GitHub (requires `gh` + token):
54
+
55
+ ```bash
56
+ export GH_TOKEN=...
57
+ export GITHUB_REPOSITORY=owner/repo
58
+ node scripts/fetch-telemetry-artifacts.mjs
59
+ node scripts/aggregate-harness-review.mjs
60
+ ```
61
+
62
+ ## Tests
63
+
64
+ ```bash
65
+ node scripts/test-harness-review-scenarios.mjs
66
+ node scripts/test-harness-review-routing-scenarios.mjs
67
+ ```
68
+
69
+ ## Follow-up automation
70
+
71
+ The JSON summary feeds `scripts/route-harness-review.mjs` (#4), which opens or updates GitHub issues when thresholds are met.
72
+
73
+ ## Issue routing
74
+
75
+ | Classification | Threshold | Issue kind | Labels |
76
+ |----------------|-----------|------------|--------|
77
+ | **FF不足** | ≥2 task groups **or** repeated `lint` signature (`record_count >= 2`) | harness-revision | `outer-loop:harness-revision`, `autonomy:L0` |
78
+ | **壁不足** | ≥2 task groups **or** CI-pass + review-reject proxy | wall-addition | `outer-loop:wall-addition`, `autonomy:L0` |
79
+
80
+ Dedupe: HTML comment marker `<!-- harness-routing-key: {repo}:{kind}:{signature}:{scope} -->` in the issue body. `scope` is derived from task class and wall types where available, so unrelated findings do not collapse into one repo-wide issue. Existing open issues with the same key are **updated**, not duplicated.
81
+
82
+ **Migration:** keys before scope suffix (`{repo}:{kind}:{signature}`) are not matched automatically. Close or relabel legacy routed issues after upgrading, or accept one duplicate cycle before the new keys stabilize.
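The dedupe and migration behavior can be sketched as follows. The marker format is taken from the text above. The function names, the issue object shape, and the example `scope` value used in the usage note are hypothetical.

```javascript
// Hypothetical sketch of routing-key dedupe; not scripts/route-harness-review.mjs.
function routingKey({ repo, kind, signature, scope }) {
  return `${repo}:${kind}:${signature}:${scope}`;
}

// Marker embedded in the issue body so later runs can find the issue again.
function routingMarker(finding) {
  return `<!-- harness-routing-key: ${routingKey(finding)} -->`;
}

// Extract the key from an existing issue body, or null if none is present.
function findRoutingKey(issueBody) {
  const match = /<!-- harness-routing-key: (.+?) -->/.exec(issueBody);
  return match ? match[1] : null;
}

// Update the open issue that carries the same key, otherwise create one.
function planIssue(openIssues, finding) {
  const key = routingKey(finding);
  const existing = openIssues.find((issue) => findRoutingKey(issue.body) === key);
  return existing
    ? { action: "update", number: existing.number }
    : { action: "create", key };
}
```

Because matching is by exact key, an issue whose marker still uses the legacy three-part key is never found, which is why one duplicate cycle can occur after upgrading.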
83
+
84
+ Dry-run locally:
85
+
86
+ ```bash
87
+ node scripts/aggregate-harness-review.mjs
88
+ HARNESS_ROUTING_DRY_RUN=1 node scripts/route-harness-review.mjs
89
+ cat harness-review/harness-review-routing-plan.json
90
+ ```
91
+
92
+ Live (non-dry-run) runs also write results to `harness-review/harness-review-routing-results.json`.
93
+
94
+ Non-goals: automatic code changes, proposal PR creation (issues only), Langfuse dependency, weekly red-team.
@@ -0,0 +1,108 @@
1
+ # Operations
2
+
3
+ Canonical thresholds and policies. All CI gates read from this document.
4
+
5
+ ## Change size limits (arch.md §5.2.2)
6
+
7
+ | Level | Max LOC | Max files |
8
+ |-------|---------|-----------|
9
+ | L1 | 300 | 8 |
10
+ | L2 | 120 | 4 |
11
+ | L3 | 60 | 2 |
12
+
13
+ L2/L3 labeled PRs **hard fail** CI when exceeded. L1 **warns by default** (template default). Phase 4 allows opt-in L1 hard-fail via `DIFF_SIZE_L1_HARD_FAIL=1` in the `diff-size` CI job when your org is ready to enforce.
14
+
15
+ To enable L1 hard-fail, uncomment or add in `.github/workflows/harness-ci.yml` under the `diff-size` job:
16
+
17
+ ```yaml
18
+ env:
19
+   DIFF_SIZE_L1_HARD_FAIL: "1"
20
+ ```
21
+
22
+ `autonomy:L0` is **proposal-only** — no LOC/file limits are enforced (human gate). This matches arch.md §5.2.2; older harness versions treated L0 as L1 limits with warn-only.
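The size gate described in this section can be sketched as below. The limits are copied from the table. The function name, the argument shape, and the reading of "exceeded" as strictly greater than the limit are assumptions; the real logic lives in `scripts/lib/diff-size.mjs` and may differ.

```javascript
// Hypothetical sketch of the diff-size gate; limits are from the table above.
const LIMITS = {
  L1: { loc: 300, files: 8 },
  L2: { loc: 120, files: 4 },
  L3: { loc: 60, files: 2 },
};

// Returns "pass", "warn", or "fail" for a PR at the given autonomy level.
function checkDiffSize({ level, loc, files, l1HardFail = false }) {
  const limit = LIMITS[level];
  if (!limit) return "pass"; // L0 is proposal-only: no limits enforced

  const exceeded = loc > limit.loc || files > limit.files;
  if (!exceeded) return "pass";

  // L1 warns by default; DIFF_SIZE_L1_HARD_FAIL=1 would map to l1HardFail.
  if (level === "L1" && !l1HardFail) return "warn";
  return "fail";
}
```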
23
+
24
+ ## Retry policy (Phase 3)
25
+
26
+ | Parameter | Value |
27
+ |-----------|-------|
28
+ | Max retries `N` | 3 |
29
+ | Same failure signature | Stop after 2 consecutive identical |
30
+ | Cost cap per task | Configure per org (`max-ai-credits`) |
31
+
32
+ | Failure type | Retry allowed |
33
+ |--------------|---------------|
34
+ | test | yes |
35
+ | lint / type | yes |
36
+ | security | no — escalate immediately |
37
+ | safe-output / diff-size | conditional — request split |
38
+ | same signature ×2 | no |
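The two retry tables combine into a single decision, sketched here under assumptions. The function name, the input shape, and the returned action strings are hypothetical, as is the order in which the rules are checked. `signatures` is assumed to be the list of failure signatures seen so far, oldest first.

```javascript
// Hypothetical sketch of the retry policy; not the orchestrator's actual code.
const MAX_RETRIES = 3;

function decideRetry({ failureType, retryCount = 0, signatures = [] }) {
  // Security failures are never retried.
  if (failureType === "security") return "escalate";

  // Stop when the last two failure signatures are identical.
  const [prev, last] = signatures.slice(-2);
  if (signatures.length >= 2 && prev === last) return "stop";

  // Stop once the retry budget is used up.
  if (retryCount >= MAX_RETRIES) return "stop";

  // Size and safe-output walls call for splitting the task, not retrying as-is.
  if (failureType === "safe-output" || failureType === "diff-size") {
    return "request-split";
  }

  if (["test", "lint", "type"].includes(failureType)) return "retry";
  return "stop"; // unknown failure types are not retried in this sketch
}
```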
39
+
40
+ ## Forbidden operations (hooks + CI)
41
+
42
+ - `git push --force` to protected branches
43
+ - `rm -rf /` and destructive filesystem patterns
44
+ - Production DB / secrets modification without `task:infra` + human approval
45
+
46
+ ## Single gate
47
+
48
+ - Human judgment: **PR review only**
49
+ - No self-approval where ruleset allows enforcement
50
+ - Harness engineer owns `.github/**`, `evals/**`, `docs/telemetry-schema.md`
51
+
52
+ ## Morning queue (outer loop)
53
+
54
+ Daily ~30 min checklist:
55
+
56
+ 1. Review nightly harness review summaries (GHA) and routed `outer-loop:*` issues
57
+ 2. Triage `harness:eval-drift` issues
58
+ 3. Classify failures per [failure-taxonomy.md](failure-taxonomy.md)
59
+ 4. Update [kpi-baseline.md](kpi-baseline.md) if metrics available
60
+
61
+ ## KPI baseline (Phase 1)
62
+
63
+ Track weekly per [kpi-baseline.md](kpi-baseline.md). Schema fields in [telemetry-schema.md](telemetry-schema.md).
64
+
65
+ ## L2 autonomy promotion (Phase 4)
66
+
67
+ Promote `task:docs` to L2 candidate when **all** hold:
68
+
69
+ - Last 50 tasks: adoption rate > 90%
70
+ - Zero major reverts in 90 days
71
+ - E2E bench pass@1 stable or improving
72
+
73
+ Document promotion in PR with evidence links.
74
+
75
+ ## Revert and exceptions (Phase 4)
76
+
77
+ - Revert procedure: [revert-playbook.md](revert-playbook.md)
78
+ - Policy exceptions: [exceptions/README.md](exceptions/README.md) — approver, expiry, revert plan, and principle deviated are required
79
+
80
+ ## Eval governance
81
+
82
+ ### Rubric updates
83
+
84
+ Before merge, review rubric updates for validity, reproducibility, and bias.
85
+
86
+ ### Schedule
87
+
88
+ - PR: subset eval by change type matrix
89
+ - Weekly: full eval suite
90
+
91
+ ### Drift threshold
92
+
93
+ If eval pass rate exceeds production acceptance rate by **more than 15 points**, open bench review Issue automatically.
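As a worked illustration of the threshold, assuming both rates are percentages on a 0 to 100 scale (the function name is hypothetical):

```javascript
// Hypothetical sketch of the drift threshold check.
// Both rates are percentages; the gap is measured in percentage points.
function needsBenchReview(evalPassRate, productionAcceptanceRate, thresholdPoints = 15) {
  return evalPassRate - productionAcceptanceRate > thresholdPoints;
}
```

With an eval pass rate of 92 and a production acceptance rate of 75, the gap is 17 points and a bench review Issue would be opened. A gap of exactly 15 points would not trigger one, because the rule says "more than".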
94
+
95
+ ## Secrets naming
96
+
97
+ | Secret | Purpose |
98
+ |--------|---------|
99
+ | `EVAL_JUDGE_API_KEY` | DeepEval / LLM-as-judge |
100
+ | `GITHUB_TOKEN` | gh models eval (default Actions token often sufficient) |
101
+ | `LANGFUSE_PUBLIC_KEY` | Optional telemetry export |
102
+ | `LANGFUSE_SECRET_KEY` | Optional telemetry export |
103
+
104
+ ## PR open limit (safe outputs substitute)
105
+
106
+ Until gh-aw `safe_outputs.max_prs` is active, CI uses **open PR count per author** as a proxy (Phase 0–2: warn; Phase 3+: gh-aw enforces per workflow run).
107
+
108
+ Warn when an author has more than **3** open PRs at once (`scripts/check-open-pr-limit.mjs`).
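The proxy check amounts to counting open PRs per author and flagging anyone above the limit. The sketch below is an illustration only and is not the contents of `scripts/check-open-pr-limit.mjs`; the input shape (`{ author }` objects) and the function name are assumptions.

```javascript
// Limit from the rule above: warn when an author has more than 3 open PRs.
const MAX_OPEN_PRS = 3;

// openPrs: array of objects with an `author` string (hypothetical shape).
function authorsOverLimit(openPrs, limit = MAX_OPEN_PRS) {
  const counts = new Map();
  for (const pr of openPrs) {
    counts.set(pr.author, (counts.get(pr.author) ?? 0) + 1);
  }
  // Keep only authors strictly above the limit.
  return [...counts]
    .filter(([, openCount]) => openCount > limit)
    .map(([author, openCount]) => ({ author, openCount }));
}
```

In Phase 0 to 2 the result would only feed a warning; it is not meant to fail the build.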
@@ -0,0 +1,79 @@
1
+ # npm publish
2
+
3
+ The harness installer is published as **`@guilz-dev/sdlc-gh`**.
4
+
5
+ ## Install (adopters)
6
+
7
+ ```bash
8
+ cd /path/to/your-product
9
+ npx @guilz-dev/sdlc-gh
10
+ ```
11
+
12
+ Pin a version:
13
+
14
+ ```bash
15
+ npx @guilz-dev/sdlc-gh@0.1.0 init --yes --stack ts --codeowners @your-org/harness-engineers
16
+ ```
17
+
18
+ Before the first npm release, use GitHub:
19
+
20
+ ```bash
21
+ npx github:guilz-dev/sdlc-gh
22
+ ```
23
+
24
+ ## Prerequisites (maintainers)
25
+
26
+ 1. npm org **`@guilz-dev`** exists and your user can publish to it
27
+ - https://www.npmjs.com/org/create
28
+ - `npm org ls guilz-dev`
29
+ 2. Granular npm access token with **Publish** for `@guilz-dev/sdlc-gh`
30
+ 3. GitHub repository secret **`NPM_TOKEN`** (repo → Settings → Secrets → Actions)
31
+
32
+ Enable **Trusted Publishing** (recommended) or use a classic token with publish scope.
33
+
34
+ ## Release flow
35
+
36
+ 1. Bump `version` in `package.json` (semver).
37
+ 2. Merge to `main`.
38
+ 3. Create a GitHub **Release** whose tag matches the version (`0.1.0` or `v0.1.0`; workflow strips optional `v`).
39
+ 4. Workflow **npm publish** runs on `release: published` and publishes with provenance.
40
+
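Step 3 relies on the release tag matching `version` in `package.json` once an optional leading `v` is removed. A minimal sketch of that comparison follows; the function name is hypothetical, and the actual workflow may implement the check differently.

```javascript
// Compare a release tag with the package.json version, ignoring an optional "v" prefix.
// Illustrative only: the real npm publish workflow is not shown in this document.
function tagMatchesVersion(tag, packageVersion) {
  const normalized = tag.startsWith("v") ? tag.slice(1) : tag;
  return normalized === packageVersion;
}
```

Under this sketch, both `0.1.0` and `v0.1.0` match a package version of `0.1.0`, while `v0.1.1` does not.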
41
+ Manual dry run (no publish):
42
+
43
+ ```bash
44
+ gh workflow run npm-publish.yml
45
+ ```
46
+
47
+ Publish happens **only** on GitHub Release (`release: published`). The manual workflow runs validation and `npm publish --dry-run` only.
48
+
49
+ Local dry run:
50
+
51
+ ```bash
52
+ npm pack --dry-run
53
+ npm publish --dry-run --access public
54
+ ```
55
+
56
+ ## prepack / prepublishOnly
57
+
58
+ - **`prepack`**: `validate` + `test-sdlc-gh-cli` (runs on `npm pack` / `npm publish`)
59
+ - **`NPM_PACKAGE_FILES`** in `scripts/lib/npm-package.mjs` must stay in sync with `package.json` `files` (enforced by `validate-harness.mjs`)
60
+ - **`prepublishOnly`**: `check-e2e` (manifest checks only; the full bench runs in CI)
61
+
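Keeping `NPM_PACKAGE_FILES` in sync with the `files` array comes down to a set comparison in both directions. The sketch below shows one way such a check could look; it is an assumption for illustration and does not reproduce the logic in `validate-harness.mjs`.

```javascript
// Report entries present in one list but not the other.
// Function and property names are hypothetical.
function diffFileLists(npmPackageFiles, packageJsonFiles) {
  const fromConstant = new Set(npmPackageFiles);
  const fromPackageJson = new Set(packageJsonFiles);
  return {
    missingFromPackageJson: [...fromConstant].filter((f) => !fromPackageJson.has(f)).sort(),
    missingFromConstant: [...fromPackageJson].filter((f) => !fromConstant.has(f)).sort(),
  };
}
```

Both result arrays being empty would mean the two lists agree, regardless of ordering.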
62
+ ## Package contents
63
+
64
+ Controlled by `files` in `package.json` and `.npmignore`. Sample stacks ship **source only** (no `node_modules` / `vendor`); adopters run `npm install` / `composer install` in the product repo after `--mode new` bootstrap.
65
+
66
+ ## Rollback
67
+
68
+ Unpublish is discouraged on npm. Ship a patch release instead:
69
+
70
+ ```bash
71
+ npm version patch
72
+ # release + publish 0.1.1
73
+ ```
74
+
75
+ If a bad tarball must be deprecated:
76
+
77
+ ```bash
78
+ npm deprecate @guilz-dev/sdlc-gh@0.1.0 "Use >=0.1.1"
79
+ ```
@@ -0,0 +1,44 @@
1
+ # Revert playbook
2
+
3
+ Minimal procedure for rolling back harness or product changes. Operational judgment stays in PR review; this document only specifies the steps.
4
+
5
+ ## When to revert
6
+
7
+ Choose revert (not forward-fix) when **any** of these hold:
8
+
9
+ 1. **Retry exhaustion** — agent-retry-orchestrator reached max retries with the same failure signature
10
+ 2. **Bad prompt / instruction change** — merged harness asset caused measurable quality regression (eval drift, spike in wall failures)
11
+ 3. **Eval drift** — bench pass rate exceeds production acceptance by more than 15 points (see [operations.md](operations.md))
12
+ 4. **Unapproved policy deviation** — change merged without a valid [exception record](exceptions/README.md)
13
+ 5. **Production incident** — merged PR is the clear cause of a hotfix or rollback in the product
14
+
15
+ ## Harness asset rollback vs product rollback
16
+
17
+ | Change type | Revert target | Follow-up |
18
+ |-------------|---------------|-----------|
19
+ | Harness only (`.github/**`, `evals/**`, `scripts/**`, `docs/**`) | Revert the harness PR on `main` | Re-run `npm run check`; confirm ruleset checks are still green |
20
+ | Product code only | Revert the product PR | Product CI must pass on the revert PR |
21
+ | Mixed harness + product | **Split**: revert harness commit(s) first, then product if needed | Never leave harness in a half-upgraded state |
22
+
23
+ Bootstrap/sync PRs that touch both: revert the **entire** sync PR, then re-apply product changes without harness assets if required.
24
+
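The change-type column in the table above can be derived from the paths a PR touches. This is a minimal sketch, assuming the four harness globs in the table map to simple path prefixes; the function name and return values are hypothetical.

```javascript
// Path prefixes taken from the "Harness only" row of the table.
const HARNESS_PREFIXES = [".github/", "evals/", "scripts/", "docs/"];

// paths: repository-relative file paths changed by the PR.
// Returns "harness", "product", or "mixed" (labels are illustrative).
function classifyChange(paths) {
  const isHarness = (p) => HARNESS_PREFIXES.some((prefix) => p.startsWith(prefix));
  const touchesHarness = paths.some(isHarness);
  const touchesProduct = paths.some((p) => !isHarness(p));
  if (touchesHarness && touchesProduct) return "mixed";
  return touchesHarness ? "harness" : "product";
}
```

A `mixed` result is the signal to split the revert: harness commits first, then product if needed.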
25
+ ## Retry exhaustion / bad prompt / eval drift
26
+
27
+ 1. Identify the merge commit or PR that introduced the regression
28
+ 2. Open a revert PR (`git revert <sha>` or GitHub UI **Revert**)
29
+ 3. Link the original PR, failure taxonomy class ([failure-taxonomy.md](failure-taxonomy.md)), and any eval-drift Issue
30
+ 4. If the root cause was a harness asset change, add a follow-up Issue to fix forward with eval evidence — do not re-merge without passing eval CI
31
+
32
+ ## Revert PR must include
33
+
34
+ - [ ] Link to the original PR and Issue (if any)
35
+ - [ ] Failure class (`feed-forward`, `wall`, `model`, or `eval-drift`)
36
+ - [ ] Evidence: CI logs, eval scores, or trace search query
37
+ - [ ] Rollback hints from the original Issue CC-SD contract (when applicable)
38
+ - [ ] Confirmation that required status checks pass on the revert PR
39
+
40
+ ## After revert
41
+
42
+ 1. Post in the morning queue ([operations.md](operations.md)) if the incident affects KPI baseline
43
+ 2. If an [exception](exceptions/README.md) was involved, close or expire the exception record
44
+ 3. For harness changes, run `npm run drift-report` before the next sync attempt