@guilz-dev/sdlc-gh 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (176)
  1. package/.github/CODEOWNERS +5 -0
  2. package/.github/ISSUE_TEMPLATE/bug_report.yml +68 -0
  3. package/.github/ISSUE_TEMPLATE/config.yml +1 -0
  4. package/.github/ISSUE_TEMPLATE/feature_request.yml +39 -0
  5. package/.github/ISSUE_TEMPLATE/support.yml +56 -0
  6. package/.github/ISSUE_TEMPLATE/task.yml +89 -0
  7. package/.github/agents/implementer.agent.md +17 -0
  8. package/.github/agents/reviewer.agent.md +18 -0
  9. package/.github/agents/triager.agent.md +13 -0
  10. package/.github/aw/actions-lock.json +9 -0
  11. package/.github/copilot-instructions.md +35 -0
  12. package/.github/hooks/hooks.json +12 -0
  13. package/.github/instructions/core.instructions.md +11 -0
  14. package/.github/instructions/profiles/go.instructions.md +10 -0
  15. package/.github/instructions/profiles/php.instructions.md +11 -0
  16. package/.github/instructions/profiles/python.instructions.md +11 -0
  17. package/.github/instructions/profiles/ruby.instructions.md +11 -0
  18. package/.github/instructions/profiles/typescript.instructions.md +11 -0
  19. package/.github/labels.yml +55 -0
  20. package/.github/pull_request_template.md +33 -0
  21. package/.github/ruleset.example.json +33 -0
  22. package/.github/ruleset.harness-eval.example.json +29 -0
  23. package/.github/skills/quality-loop/SKILL.md +23 -0
  24. package/.github/workflows/agent-retry-orchestrator.yml +161 -0
  25. package/.github/workflows/copilot-setup-steps.yml +64 -0
  26. package/.github/workflows/eval-ci.yml +169 -0
  27. package/.github/workflows/eval-drift.yml +75 -0
  28. package/.github/workflows/gh-aw-dogfood-ci.yml +73 -0
  29. package/.github/workflows/harness-ci.yml +244 -0
  30. package/.github/workflows/harness-sync.yml +28 -0
  31. package/.github/workflows/l1-readiness-check.yml +45 -0
  32. package/.github/workflows/labels-sync.yml +24 -0
  33. package/.github/workflows/nightly-harness-review.lock.yml +1643 -0
  34. package/.github/workflows/nightly-harness-review.md +87 -0
  35. package/.github/workflows/nightly-harness-review.yml +63 -0
  36. package/.github/workflows/npm-publish.yml +49 -0
  37. package/.github/workflows/pr-context-comment.yml +138 -0
  38. package/.github/workflows/product-ci-go.yml +33 -0
  39. package/.github/workflows/product-ci-php.yml +39 -0
  40. package/.github/workflows/product-ci-python.yml +34 -0
  41. package/.github/workflows/product-ci-ruby.yml +35 -0
  42. package/.github/workflows/product-ci-ts.yml +37 -0
  43. package/.github/workflows/task-issue-label-sync.yml +50 -0
  44. package/.github/workflows/weekly-redteam.lock.yml +1571 -0
  45. package/.github/workflows/weekly-redteam.md +76 -0
  46. package/.github/zizmor.yml +11 -0
  47. package/AGENTS.md +54 -0
  48. package/LICENSE +21 -0
  49. package/README.md +366 -0
  50. package/config/stacks.json +55 -0
  51. package/docs/adoption.md +126 -0
  52. package/docs/arch.md +535 -0
  53. package/docs/auth-boundaries.md +16 -0
  54. package/docs/coding-agent-l1.md +152 -0
  55. package/docs/exceptions/README.md +25 -0
  56. package/docs/exceptions/TEMPLATE.md +8 -0
  57. package/docs/failure-taxonomy.md +23 -0
  58. package/docs/gh-aw-dogfood.md +109 -0
  59. package/docs/kpi-baseline.md +9 -0
  60. package/docs/nightly-harness-review.md +94 -0
  61. package/docs/operations.md +108 -0
  62. package/docs/publishing.md +79 -0
  63. package/docs/revert-playbook.md +44 -0
  64. package/docs/shared-config.md +30 -0
  65. package/docs/telemetry-artifacts.md +78 -0
  66. package/docs/telemetry-schema.md +60 -0
  67. package/evals/.score-baseline.json +6 -0
  68. package/evals/e2e-bench/README.md +28 -0
  69. package/evals/e2e-bench/manifest.json +16 -0
  70. package/evals/e2e-bench/tasks/e2e-001.yml +10 -0
  71. package/evals/e2e-bench/tasks/e2e-002.yml +11 -0
  72. package/evals/e2e-bench/tasks/e2e-003.yml +10 -0
  73. package/evals/e2e-bench/tasks/e2e-004.yml +14 -0
  74. package/evals/e2e-bench/tasks/e2e-005.yml +11 -0
  75. package/evals/e2e-bench/tasks/e2e-006.yml +10 -0
  76. package/evals/e2e-bench/tasks/e2e-007.yml +10 -0
  77. package/evals/e2e-bench/tasks/e2e-008.yml +10 -0
  78. package/evals/e2e-bench/tasks/e2e-009.yml +10 -0
  79. package/evals/trajectories/rubric.md +12 -0
  80. package/evals/trajectories/test_harness_conventions.py +271 -0
  81. package/infra/README.md +49 -0
  82. package/infra/langfuse/docker-compose.yml +25 -0
  83. package/infra/otel/collector-config.yml +24 -0
  84. package/infra/samples/gh-aw-dogfood-report.json +44 -0
  85. package/infra/samples/harness-review-routing-plan.json +19 -0
  86. package/infra/samples/harness-review-summary.json +61 -0
  87. package/infra/samples/telemetry-artifact.json +29 -0
  88. package/infra/samples/telemetry-payload.json +19 -0
  89. package/package.json +85 -0
  90. package/prompts/triager-classify.prompt.yml +10 -0
  91. package/sample/go/add.go +5 -0
  92. package/sample/go/add_test.go +9 -0
  93. package/sample/go/go.mod +3 -0
  94. package/sample/php/composer.json +26 -0
  95. package/sample/php/composer.lock +1881 -0
  96. package/sample/php/phpunit.xml +8 -0
  97. package/sample/php/src/Add.php +13 -0
  98. package/sample/php/tests/AddTest.php +16 -0
  99. package/sample/python/requirements-dev.txt +2 -0
  100. package/sample/python/src/__init__.py +0 -0
  101. package/sample/python/src/greet.py +3 -0
  102. package/sample/python/tests/conftest.py +4 -0
  103. package/sample/python/tests/test_greet.py +5 -0
  104. package/sample/ruby/.rubocop.yml +10 -0
  105. package/sample/ruby/Gemfile +6 -0
  106. package/sample/ruby/Gemfile.lock +58 -0
  107. package/sample/ruby/lib/add.rb +9 -0
  108. package/sample/ruby/spec/add_spec.rb +11 -0
  109. package/sample/ts/biome.json +6 -0
  110. package/sample/ts/package-lock.json +1763 -0
  111. package/sample/ts/package.json +15 -0
  112. package/sample/ts/src/add.ts +3 -0
  113. package/sample/ts/tests/add.test.ts +8 -0
  114. package/sample/ts/tsconfig.json +12 -0
  115. package/scripts/aggregate-harness-review.mjs +48 -0
  116. package/scripts/bootstrap-harness.sh +411 -0
  117. package/scripts/check-diff-size.mjs +46 -0
  118. package/scripts/check-e2e-manifest.mjs +35 -0
  119. package/scripts/check-eval-score-drift.mjs +31 -0
  120. package/scripts/check-gh-aw-dogfood-scope.mjs +51 -0
  121. package/scripts/check-issue-spec.mjs +215 -0
  122. package/scripts/check-l1-readiness.mjs +82 -0
  123. package/scripts/check-open-pr-limit.mjs +34 -0
  124. package/scripts/doctor.mjs +177 -0
  125. package/scripts/emit-gh-aw-dogfood-report.mjs +112 -0
  126. package/scripts/emit-telemetry-artifact.mjs +99 -0
  127. package/scripts/fetch-telemetry-artifacts.mjs +176 -0
  128. package/scripts/harness-drift-report.mjs +99 -0
  129. package/scripts/lib/bootstrap-copy.mjs +123 -0
  130. package/scripts/lib/ccsd-contract.mjs +212 -0
  131. package/scripts/lib/diff-size.mjs +103 -0
  132. package/scripts/lib/doctor-local.mjs +179 -0
  133. package/scripts/lib/e2e-manifest.mjs +76 -0
  134. package/scripts/lib/gh-aw-dogfood.mjs +293 -0
  135. package/scripts/lib/github-config.mjs +94 -0
  136. package/scripts/lib/harness-ci-fragments.mjs +98 -0
  137. package/scripts/lib/harness-review-routing.mjs +244 -0
  138. package/scripts/lib/harness-review.mjs +388 -0
  139. package/scripts/lib/issue-form-label-sync.mjs +56 -0
  140. package/scripts/lib/l1-readiness.mjs +258 -0
  141. package/scripts/lib/merge-harness-package.mjs +36 -0
  142. package/scripts/lib/npm-package.mjs +129 -0
  143. package/scripts/lib/setup-wizard.mjs +224 -0
  144. package/scripts/lib/stacks.mjs +138 -0
  145. package/scripts/lib/telemetry-artifact.mjs +253 -0
  146. package/scripts/lib/template-root.mjs +39 -0
  147. package/scripts/merge-harness-package.mjs +14 -0
  148. package/scripts/route-harness-review.mjs +168 -0
  149. package/scripts/run-e2e-bench.mjs +216 -0
  150. package/scripts/sdlc-gh-cli.mjs +91 -0
  151. package/scripts/select-eval-jobs.mjs +41 -0
  152. package/scripts/setup-github.mjs +242 -0
  153. package/scripts/setup-github.sh +4 -0
  154. package/scripts/setup-wizard.mjs +426 -0
  155. package/scripts/test-bootstrap-guidance-scenarios.mjs +94 -0
  156. package/scripts/test-diff-size-scenarios.mjs +88 -0
  157. package/scripts/test-doctor-scenarios.mjs +70 -0
  158. package/scripts/test-e2e-manifest-scenarios.mjs +65 -0
  159. package/scripts/test-gh-aw-dogfood-scenarios.mjs +74 -0
  160. package/scripts/test-harness-review-routing-scenarios.mjs +130 -0
  161. package/scripts/test-harness-review-scenarios.mjs +92 -0
  162. package/scripts/test-hooks-scenarios.mjs +44 -0
  163. package/scripts/test-issue-form-label-sync-scenarios.mjs +48 -0
  164. package/scripts/test-issue-spec-scenarios.mjs +258 -0
  165. package/scripts/test-l1-readiness-scenarios.mjs +204 -0
  166. package/scripts/test-merge-harness-package-scenarios.mjs +53 -0
  167. package/scripts/test-npm-package-scenarios.mjs +31 -0
  168. package/scripts/test-sdlc-gh-cli-scenarios.mjs +54 -0
  169. package/scripts/test-setup-github-scenarios.mjs +103 -0
  170. package/scripts/test-setup-wizard-scenarios.mjs +114 -0
  171. package/scripts/test-telemetry-artifact-scenarios.mjs +69 -0
  172. package/scripts/trim-harness-ci.mjs +18 -0
  173. package/scripts/validate-gh-aw-compile.mjs +64 -0
  174. package/scripts/validate-harness.mjs +199 -0
  175. package/scripts/validate-telemetry.mjs +21 -0
  176. package/scripts/verify-bootstrap-stacks.sh +192 -0
package/docs/adoption.md ADDED
@@ -0,0 +1,126 @@
1
+ # Adoption Guide
2
+
3
+ Apply this harness template to any repository.
4
+
5
+ ## Prerequisites
6
+
7
+ - GitHub repository with Actions enabled
8
+ - GitHub Copilot (Business or Enterprise) for coding agent features
9
+ - Optional: self-hosted Langfuse for telemetry
10
+
11
+ ## New repository
12
+
13
+ 1. Use **GitHub Template repository** → Create new repository from this template.
14
+ 2. Or run the wizard (recommended):
15
+
16
+ ```bash
17
+ cd /path/to/new-product
18
+ npx @guilz-dev/sdlc-gh
19
+ ```
20
+
21
+ 3. Or run bootstrap manually, then wizard:
22
+
23
+ ```bash
24
+ git clone <harness-template-url> /tmp/harness
25
+ /tmp/harness/scripts/bootstrap-harness.sh \
26
+ --repo /path/to/new-product \
27
+ --stack ts \
28
+ --mode new \
29
+ --codeowners-team @your-org/harness-engineers
30
+ cd /path/to/new-product && npx @guilz-dev/sdlc-gh --yes --stack ts --codeowners @your-org/harness-engineers
31
+ ```
32
+
33
+ 4. Run `./scripts/setup-github.sh` to sync labels and create/update the `main-protection` ruleset.
34
+ 5. *(Optional, Phase 3)* After eval CI is green in your org, run `./scripts/setup-github.sh --with-eval-ruleset` to create/update the `harness-pr-eval-required` ruleset. The template ruleset applies to all PRs targeting `main`; narrow conditions in GitHub Settings if you only want harness-asset PRs blocked. GitHub Models enablement is still required before `prompt-eval` can block merges.
35
+ 6. Run `./scripts/doctor.mjs --strict` and fix any remaining failures.
36
+ 7. Manual fallback: import `.github/ruleset.example.json` and apply `.github/labels.yml` if `gh` cannot be used.
37
+
38
+ ## GitHub setup order
39
+
40
+ Apply in this order (or run `./scripts/setup-wizard.mjs` to perform steps 1–2 and verify with doctor):
41
+
42
+ 1. **Labels sync** — `task:*` and `autonomy:*` from `.github/labels.yml`
43
+ 2. **Main protection** — `main-protection` ruleset with harness + product CI checks
44
+ 3. **Optional eval ruleset** — `--with-eval-ruleset` adds `harness-pr-eval-required` (eval CI checks only; does not enable GitHub Models). This ruleset targets `main` and requires `select` + `trajectory-conventions` on **all** PRs to that branch — enable only when your org accepts that cost, or narrow the ruleset conditions in GitHub Settings after creation.
45
+
46
+ ### Setup wizard
47
+
48
+ `./scripts/setup-wizard.mjs` orchestrates Phase 0–1 install settings:
49
+
50
+ - writes `.harness-stack` (primary stack for rulesets)
51
+ - replaces the `CODEOWNERS` placeholder on **product repos** (skipped by default with `--template`)
52
+ - runs `setup-github.sh` (labels + rulesets)
53
+ - runs `doctor --strict` (pass `--template` for multi-stack template repos)
54
+
55
+ Non-interactive flags: `--yes`, `--stack`, `--codeowners`, `--github-repo`, `--with-eval-ruleset`, `--skip-github`, `--dry-run`, `--patch-codeowners` (opt-in CODEOWNERS replacement in template mode), `--force-bootstrap` (destructive; never with `--yes`).
56
+
57
+ ## Behavior / spec corrections (template updates)
58
+
59
+ When pulling harness updates, review these intentional behavior alignments with [arch.md](arch.md):
60
+
61
+ | Area | Current spec | Legacy behavior |
62
+ |------|--------------|-----------------|
63
+ | `autonomy:L0` diff-size | Proposal only — no LOC/file gate | Some versions applied L1 limits with warn |
64
+ | L1 over-limit | Warn by default; opt-in hard-fail via `DIFF_SIZE_L1_HARD_FAIL=1` | — |
65
+
66
+ ## Existing repository (phased)
67
+
68
+ | Step | Assets | Risk |
69
+ |------|--------|------|
70
+ | 1 | Feed-forward (FF) only: instructions, agents, hooks, templates | Low |
71
+ | 2 | `harness-ci.yml` + stack `product-ci` | Medium |
72
+ | 3 | Eval CI + ruleset eval required | Medium |
73
+ | 4 | Coding agent L1 on `task:docs` / `task:test-fix` (CC-SD contract required) | Low-risk tasks first |
74
+
75
+ gh-aw outer loop: use the **gh-aw dogfood track** ([gh-aw-dogfood.md](gh-aw-dogfood.md)) for bounded `gh aw compile` validation on `sdlc-gh` itself. Standard GHA aggregation remains the operational baseline — see [nightly-harness-review.md](nightly-harness-review.md). Do not enable unrestricted gh-aw across the repo until dogfood criteria stay green over multiple runs.
76
+
77
+ ```bash
78
+ ./scripts/bootstrap-harness.sh \
79
+ --repo /path/to/existing \
80
+ --codeowners-team @your-org/harness-engineers
81
+ cd /path/to/existing
82
+ npx @guilz-dev/sdlc-gh --yes --stack ts --codeowners @your-org/harness-engineers
83
+ ```
84
+
85
+ Or skip manual bootstrap entirely:
86
+
87
+ ```bash
88
+ cd /path/to/existing
89
+ npx @guilz-dev/sdlc-gh
90
+ ```
91
+
92
+ ## Stack selection
93
+
94
+ | Stack | Profile | Sample | CI workflow |
95
+ |-------|---------|--------|-------------|
96
+ | `ts` | `typescript.instructions.md` | `sample/ts/` | `product-ci-ts.yml` |
97
+ | `python` | `python.instructions.md` | `sample/python/` | `product-ci-python.yml` |
98
+ | `go` | `go.instructions.md` | `sample/go/` | `product-ci-go.yml` |
99
+ | `ruby` | `ruby.instructions.md` | `sample/ruby/` | `product-ci-ruby.yml` |
100
+ | `php` | `php.instructions.md` | `sample/php/` | `product-ci-php.yml` |
101
+
102
+ Stack metadata is centralized in [`config/stacks.json`](../config/stacks.json). Bootstrap copies **only** the selected stack's profile and `product-ci-*` workflow, and replaces the `CODEOWNERS` team placeholder at install time.
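To illustrate the selection rule above, here is a minimal sketch that maps a stack key to the two assets bootstrap copies. The `STACKS` object and the `assetsForStack` function are hypothetical names; the real schema of `config/stacks.json` is not shown in this guide, and the directory prefixes are assumed from the package's file list.

```javascript
// Hypothetical in-memory copy of the stack table above.
// The real data lives in config/stacks.json, whose schema may differ.
const STACKS = {
  ts: { profile: "typescript.instructions.md", workflow: "product-ci-ts.yml" },
  python: { profile: "python.instructions.md", workflow: "product-ci-python.yml" },
  go: { profile: "go.instructions.md", workflow: "product-ci-go.yml" },
  ruby: { profile: "ruby.instructions.md", workflow: "product-ci-ruby.yml" },
  php: { profile: "php.instructions.md", workflow: "product-ci-php.yml" },
};

// Return only the selected stack's profile and product-ci workflow paths.
function assetsForStack(stack) {
  if (!Object.prototype.hasOwnProperty.call(STACKS, stack)) {
    throw new Error(`unknown stack: ${stack}`);
  }
  const { profile, workflow } = STACKS[stack];
  return [
    `.github/instructions/profiles/${profile}`,
    `.github/workflows/${workflow}`,
  ];
}

console.log(assetsForStack("ts"));
```

An unknown key throws instead of falling back to a default, so a mistyped `--stack` value would be caught before any files are copied.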
103
+
104
+ ## CC-SD contract (L1 only in v1)
105
+
106
+ Phase 4 L1 delegation uses a lightweight **Issue-embedded CC-SD contract** — not a separate spec file. v1 enforces the contract only for `task:docs` and `task:test-fix` at `autonomy:L1` via the `issue-spec-check` CI job. `feature-small`, `infra`, and `security-sensitive` are out of scope until a later version.
107
+
108
+ Required Issue fields: `Goal`, `Non-goals`, `Constraints`, `Acceptance criteria`, `Rollback hints`. See [coding-agent-l1.md](coding-agent-l1.md). Enforcement uses Issue **labels** (`task:*`, `autonomy:*`), not the form dropdown alone.
109
+
110
+ `issue-spec-check` is safe to keep always required: non-L1 and unlinked PRs exit successfully (warn/skip only).
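The "always required" property can be pictured as a small decision function. This is a sketch under stated assumptions, not the package's `check-issue-spec.mjs`: the function name `issueSpecDecision` and its return values are hypothetical, and only the label names come from this guide.

```javascript
// Task labels for which v1 enforces the CC-SD contract (per this guide).
const ENFORCED_TASKS = new Set(["task:docs", "task:test-fix"]);

// Hypothetical decision helper: "enforce" means validate the contract,
// "skip" means exit successfully (warn/skip only).
function issueSpecDecision(labels, hasLinkedIssue) {
  // Unlinked PRs warn and skip; they never fail the check.
  if (!hasLinkedIssue) return "skip";
  const isL1 = labels.includes("autonomy:L1");
  const isEnforcedTask = labels.some((label) => ENFORCED_TASKS.has(label));
  return isL1 && isEnforcedTask ? "enforce" : "skip";
}

console.log(issueSpecDecision(["task:docs", "autonomy:L1"], true));
console.log(issueSpecDecision(["task:refactor", "autonomy:L1"], true));
```

Because every path other than "L1 docs/test-fix with a linked Issue" returns `skip`, the job can stay in the required-checks list without blocking unrelated PRs.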
111
+
112
+ ## Sync from canonical template (Phase 4)
113
+
114
+ Use `harness-sync.yml` or a subtree merge to pull harness updates. Review the drift report before merging.
115
+
116
+ ## Rollback
117
+
118
+ See [revert-playbook.md](revert-playbook.md) for the canonical procedure. Quick steps:
119
+
120
+ 1. Revert the bootstrap commit or sync PR.
121
+ 2. Disable required status checks for `harness-ci` in the ruleset.
122
+ 3. Remove `.github/agents` if coding agent assignment causes issues.
123
+
124
+ ## Multi-project rollout
125
+
126
+ Target **3+ product repos** sharing the same template version. Pin the template ref in `harness-sync.yml`.
package/docs/arch.md ADDED
@@ -0,0 +1,535 @@
1
+ # Agent Harness Architecture — GitHub Copilot Core
2
+
3
+ **Version**: 1.1 (2026-07-04)
4
+ **Repository**: `sdlc-gh` — template harness for GitHub Copilot coding agents
5
+ **Canonical ops**: [operations.md](operations.md) (thresholds, retry policy, forbidden ops)
6
+
7
+ ---
8
+
9
+ ## 1. Executive summary
10
+
11
+ This document describes the architecture of **sdlc-gh**, a stack-agnostic agent harness template built on the GitHub Copilot ecosystem (coding agent, CLI/IDE, Agentic Workflows, GitHub Models) and complementary OSS (Langfuse, DeepEval, promptfoo, etc.).
12
+
13
+ A **harness** is the full mechanism for keeping AI agents aligned with intent — not just prompts. It combines:
14
+
15
+ - **Feed-forward**: instructions, agents, skills, tool limits, credential boundaries
16
+ - **Feedback**: deterministic walls (CI, hooks, diff-size gates), observability, evals, human PR review
17
+
18
+ Three design conclusions:
19
+
20
+ 1. **Use off-the-shelf enforcement.** Isolation, safe outputs, deterministic walls, observability, and eval runners come from GitHub platform + OSS — do not rebuild them.
21
+ 2. **Invest in intent definition.** Golden datasets, rubrics, wall content, and revision-cycle operations are the differentiation layer.
22
+ 3. **Converge human judgment on PR review.** No matter how autonomous agents become, add gates at review time — not mid-execution approval prompts.
23
+
24
+ ### Implementation status (this repo)
25
+
26
+ | Area | Status |
27
+ |------|--------|
28
+ | Bootstrap, stack catalog, harness/product CI | **Implemented** |
29
+ | Hooks, diff-size gate, CC-SD issue-spec check | **Implemented** |
30
+ | Custom agents (triager / implementer / reviewer) | **Implemented** |
31
+ | Eval CI with change-type job selection | **Implemented** |
32
+ | Retry orchestrator, PR context comments | **Implemented** |
33
+ | E2E bench (executable acceptance checks) | **Partial** — 9 tasks; not yet break-and-fix agent runner |
34
+ | `gh models eval` in CI | **Scaffolded** — runs when prompts exist; org must enable Models |
35
+ | gh-aw outer loop (`nightly-harness-review`, `weekly-redteam`) | **Partial** — GHA nightly review + gh-aw dogfood track (#7); `.md`/`.lock.yml` stubs remain |
36
+ | Langfuse / OTel export | **Scaffolded** — `infra/` + schema; inner-loop JSON artifacts wired |
37
+
38
+ Operational details and thresholds live in companion docs — see [Documentation index](#11-related-documentation).
39
+
40
+ ---
41
+
42
+ ## 2. Design principles
43
+
44
+ ### 2.1 Harness as a control system
45
+
46
+ Model the harness as a dual-loop control system:
47
+
48
+ ```mermaid
49
+ flowchart LR
50
+ subgraph OUTER["Outer loop (cross-task, daily–weekly)"]
51
+ EVAL[Eval platform<br/>eval / trace analysis]
52
+ REVISE[Harness revision<br/>instructions / skills / walls]
53
+ end
54
+ subgraph INNER["Inner loop (single task, seconds–minutes)"]
55
+ FF[Feed-forward<br/>instructions / skills / tool limits]
56
+ AGENT[Agent execution<br/>plan → act → observe]
57
+ WALL[Deterministic walls<br/>tests / lint / hooks / diff-size]
58
+ end
59
+ INTENT[Intent<br/>Issue / CC-SD contract] --> FF
60
+ FF --> AGENT
61
+ AGENT --> WALL
62
+ WALL -- fail → retry --> AGENT
63
+ WALL -- pass --> OUT[PR artifact]
64
+ AGENT -. trace .-> EVAL
65
+ WALL -. pass/fail log .-> EVAL
66
+ OUT -. review outcome .-> EVAL
67
+ EVAL --> REVISE
68
+ REVISE -- asset update --> FF
69
+ REVISE -- wall addition --> WALL
70
+ ```
71
+
72
+ The **inner loop** runs fast (seconds–minutes). The **outer loop** runs slower (daily–weekly). Without the outer loop, the harness degrades as models and task distributions shift.
73
+
74
+ ### 2.2 Five principles
75
+
76
+ **Principle 1: Walls are declarative and deterministic.** Constraints that depend on agent goodwill are not constraints. Prefer CI jobs, hooks, rulesets, and (when available) gh-aw safe outputs over prompt pleading.
77
+
78
+ **Principle 2: Structurally separate secrets from agents.** gh-aw's secretless design is the ideal; coding agent uses short-lived scoped tokens; CLI/IDE and SDK use delegated or proxied credentials. Never expose long-lived keys in agent-readable files.
79
+
80
+ **Principle 3: Log at every trust boundary.** Observability enables future control. Traces use OpenTelemetry; avoid vendor lock-in on the export path.
81
+
82
+ **Principle 4: No harness change without eval.** Changes to instructions, agents, skills, hooks, or eval assets must pass the change-type eval matrix before merge.
83
+
84
+ **Principle 5: One human gate — make it strong.** Collect decision inputs on the PR (scores, cost, trace links, harness asset SHAs). Do not scatter synchronous approval prompts.
85
+
86
+ > L2 auto-merge and L3 full auto-merge are **future promotions** with strict scope limits. PR review is never abolished — only its sync timing and scope change. See [operations.md](operations.md).
87
+
88
+ ---
89
+
90
+ ## 3. Platform and gap analysis
91
+
92
+ ### 3.1 GitHub platform maturity (mid-2026)
93
+
94
+ | Layer | Offering | Maturity | Notes |
95
+ |-------|----------|----------|-------|
96
+ | Instruction hierarchy | `copilot-instructions.md`, `AGENTS.md`, `.instructions.md` | GA | Priority: custom agent > path-specific > global |
97
+ | Custom agents | `.github/agents/*.agent.md` | GA | Tools, handoffs, org distribution via `.github-private` |
98
+ | Skills | Agent Skills (open spec) | Available | On-demand load; compatible with Copilot / Claude Code / Codex |
99
+ | Hooks | `hooks.json` (6 events) | Available | Deterministic block of destructive ops |
100
+ | Environment setup | `copilot-setup-steps.yml` | Available | Agent continues even if setup fails — monitor it |
101
+ | Isolated execution | coding agent (Actions VM, own branch) | GA | Requester cannot approve own PR |
102
+ | Automation | Agentic Workflows (gh-aw) | Public Preview | Markdown → lock.yml, AWF firewall, safe outputs, threat detection |
103
+ | Observability | gh aw logs/audit, OTel export | Available | |
104
+ | Eval | GitHub Models: `.prompt.yml` + `gh models eval` | Available | Single-prompt eval; not full agent trajectory |
105
+ | Embedding | Copilot SDK | Public Preview | Node/Python/Go/.NET/Java |
106
+
107
+ ### 3.2 OSS complement map
108
+
109
+ | Purpose | Primary | Alternative | Rationale |
110
+ |---------|---------|-------------|-----------|
111
+ | Trace / observability | Langfuse (self-host) | Phoenix, OpenLLMetry | OTel-compatible; de facto OSS choice |
112
+ | Trajectory / agent eval | DeepEval | promptfoo, Ragas | pytest-style CI integration; G-Eval |
113
+ | Red team | NVIDIA garak | AI-Infra-Guard | Periodic prompt-injection testing |
114
+ | Workflow static analysis | zizmor, actionlint | — | Integrated in `harness-ci` |
115
+ | Template assets | github/awesome-copilot | — | Official recipes |
116
+
117
+ ### 3.3 Gaps requiring custom work
118
+
119
+ | Gap | Description | sdlc-gh response |
120
+ |-----|-------------|------------------|
121
+ | **G1** Trajectory golden dataset | `gh models eval` is prompt-level, not E2E Issue→PR | `evals/e2e-bench/` — executable acceptance checks today; break-and-fix runner planned |
122
+ | **G2** Rubrics | "Good PR" definition is domain-specific | `evals/trajectories/rubric.md` + convention tests |
123
+ | **G3** Revision-cycle operations | Trace → classify → revise routing | Documented in [failure-taxonomy.md](failure-taxonomy.md); gh-aw stubs for automation |
124
+
125
+ #### G1 runner boundary (current vs planned)
126
+
127
+ | Concern | Current (`run-e2e-bench.mjs`) | Planned break-and-fix runner |
128
+ |---------|-------------------------------|------------------------------|
129
+ | **Task input** | Static YAML fixture in `tasks/*.yml` | Issue + CC-SD contract + repo snapshot |
130
+ | **Expected artifact** | File content / command exit code | Agent-produced PR diff |
131
+ | **Verifier contract** | `verification_*` fields in task YAML | Same fields + agent execution harness |
132
+ | **Result summary** | Per-task ok/fail; class/stack counts; executed/skipped totals | Above + pass@1, retry count, wall failure class |
133
+
134
+ Manifest validation: `scripts/check-e2e-manifest.mjs`. Details: [evals/e2e-bench/README.md](../evals/e2e-bench/README.md).
135
+
136
+ ---
137
+
138
+ ## 4. UX design
139
+
140
+ ### 4.1 Scope
141
+
142
+ | Category | Default | Examples |
143
+ |----------|---------|----------|
144
+ | In scope | L0–L2 candidates | App code, tests, refactors, dep bumps, docs |
145
+ | Always human | Production DB, prod secrets, billing/legal/PII | Never auto-delegate |
146
+ | L3 (conditional) | Typo/link/comment/docs only | Small, non-executable changes |
147
+
148
+ **v1 L1 enforcement** (CC-SD contract + `issue-spec-check`) applies only to `task:docs` and `task:test-fix`. See [coding-agent-l1.md](coding-agent-l1.md).
149
+
150
+ ### 4.2 Personas
151
+
152
+ | Persona | Responsibility | Primary touchpoints |
153
+ |---------|----------------|---------------------|
154
+ | **Developer** | Write Issue, delegate to agent, review PR | Issue, PR, IDE, CLI |
155
+ | **Reviewer** | Final gate on PRs with bundled context | PR |
156
+ | **Harness engineer** | Maintain harness assets, run outer loop | `.github/**`, `evals/**`, morning queue |
157
+ | **Org admin** | MCP allowlist, model policy, budget | Organization settings |
158
+
159
+ ### 4.3 UX principles
160
+
161
+ - **Single gate**: PR review only; hooks and CI replace mid-task approvals.
162
+ - **Bundled decision inputs**: PR context comment auto-posts diff stats, labels, retry count, instruction/skill SHAs, eval baseline, trace link placeholder.
163
+ - **Morning queue**: Outer-loop artifacts batched for daily review, not real-time interrupts.
164
+ - **Graduated autonomy**: L0→L3 with promotion evidence; see task-class matrix below.
165
+ - **Visible failures**: Retry exhaustion, security blocks, and drift warnings surface as structured PR comments and Issues.
166
+
167
+ ### 4.4 Task classification and autonomy matrix
168
+
169
+ Enforced by Issue labels (`task:*`, `autonomy:*`) and `scripts/check-diff-size.mjs`:
170
+
171
+ | Task class | Examples | Max autonomy (default) | Size limits (LOC / files) |
172
+ |------------|----------|------------------------|---------------------------|
173
+ | `docs` | README, design docs | L3 | 60 / 2 |
174
+ | `test-fix` | Fix or add tests | L2 | 120 / 4 |
175
+ | `refactor` | Rename, dedupe | L1 | 300 / 8 |
176
+ | `feature-small` | Small feature | L1 | 300 / 8 |
177
+ | `dependency-bump` | patch/minor deps | L1 | 300 / 8 |
178
+ | `infra` | CI, IaC, deploy | L0 | Human gate |
179
+ | `security-sensitive` | Auth, billing, secrets | L0 | Proposal only |
180
+
181
+ PRs labeled L2 or L3 **hard-fail** CI when limits are exceeded. L1 **warns by default** (template). Phase 4 supports opt-in L1 hard-fail via `DIFF_SIZE_L1_HARD_FAIL=1`. `autonomy:L0` is proposal-only (no size gate).
182
+
183
+ ### 4.5 CC-SD contract (L1 docs / test-fix)
184
+
185
+ Lightweight Issue-embedded spec — not a separate file:
186
+
187
+ | Field | Required |
188
+ |-------|----------|
189
+ | Goal | yes |
190
+ | Non-goals | yes |
191
+ | Constraints | yes |
192
+ | Acceptance criteria | yes |
193
+ | Rollback hints | yes |
194
+ | Additional context | optional |
195
+
196
+ Enforcement flow:
197
+
198
+ 1. Author fills `.github/ISSUE_TEMPLATE/task.yml`
199
+ 2. Triager validates contract and applies **labels** (form dropdown alone does not trigger CI)
200
+ 3. `issue-spec-check` resolves the linked Issue (`closingIssuesReferences` first, then `fixes/closes #N` in the PR body), validates CC-SD when labels are `task:docs` or `task:test-fix` + `autonomy:L1`
201
+ 4. Unlinked PRs warn and skip. Issue fetch failure **fails** only when PR or Issue proxy labels indicate L1 docs/test-fix; otherwise warn and skip
202
+ 5. Reviewer checks Issue → PR summary → diff alignment
203
+
204
+ Canonical field names: `scripts/lib/ccsd-contract.mjs`.
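As a rough illustration of the contract check, the sketch below finds required fields that are missing or empty in an Issue body. It assumes the body uses the `### <label>` headings that GitHub issue forms render, with `_No response_` for blank answers. The helper names are hypothetical and do not reflect the actual exports of `scripts/lib/ccsd-contract.mjs`.

```javascript
// Required CC-SD fields, copied from the table above.
const REQUIRED_FIELDS = [
  "Goal",
  "Non-goals",
  "Constraints",
  "Acceptance criteria",
  "Rollback hints",
];

// Split an issue-form body into { heading: text } using "### Heading" lines.
function parseIssueSections(body) {
  const sections = {};
  let current = null;
  for (const line of body.split(/\r?\n/)) {
    const heading = line.match(/^###\s+(.+?)\s*$/);
    if (heading) {
      current = heading[1];
      sections[current] = [];
    } else if (current !== null) {
      sections[current].push(line);
    }
  }
  for (const key of Object.keys(sections)) {
    sections[key] = sections[key].join("\n").trim();
  }
  return sections;
}

// Return the required fields that are absent or left blank.
function missingCcsdFields(body) {
  const sections = parseIssueSections(body);
  return REQUIRED_FIELDS.filter(
    (field) => !sections[field] || sections[field] === "_No response_",
  );
}

const sampleBody = [
  "### Goal", "", "Fix broken link in README", "",
  "### Non-goals", "", "No content rewrite", "",
  "### Constraints", "", "_No response_",
].join("\n");
console.log(missingCcsdFields(sampleBody));
```

An empty result would mean the contract is complete. A non-empty result lists exactly which fields the author still needs to fill in.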
205
+
206
+ ---
207
+
208
+ ## 5. Architecture overview
209
+
210
+ ### 5.1 Layer model
211
+
212
+ ```mermaid
213
+ flowchart TB
214
+ subgraph L6["L6 Outer loop (process + gh-aw stubs)"]
215
+ NIGHTLY[nightly-harness-review<br/>failure classify / revision PR]
216
+ DRIFT[eval-drift / harness-sync<br/>bench rotation / template drift]
217
+ end
218
+ subgraph L5["L5 Eval (gh models + pytest + e2e-bench)"]
219
+ PEVAL[prompt-eval<br/>prompts/*.prompt.yml]
220
+ TEVAL[trajectory tests<br/>evals/trajectories/]
221
+ E2E[e2e-bench<br/>executable acceptance checks]
222
+ end
223
+ subgraph L4["L4 Observability (OTel → Langfuse)"]
224
+ TRACE[telemetry-schema fields<br/>PR context comment links]
225
+ end
226
+ subgraph L3["L3 Deterministic walls"]
227
+ HARNESS[harness-ci<br/>static / hooks / diff-size / issue-spec]
228
+ PRODUCT[product-ci-*<br/>stack tests / lint]
229
+ HOOKS[hooks.json preToolUse]
230
+ end
231
+ subgraph L2["L2 Execution (GitHub platform)"]
232
+ CODING[coding agent]
233
+ GHAW[Agentic Workflows]
234
+ CLI[Copilot CLI / IDE]
235
+ end
236
+ subgraph L1["L1 Feed-forward assets"]
237
+ INS[instructions / AGENTS.md]
238
+ AGT[agents: triager / implementer / reviewer]
239
+ SKL[skills: quality-loop]
240
+ end
241
+ subgraph L0["L0 Governance"]
242
+ POL[rulesets / CODEOWNERS / labels / budget]
243
+ end
244
+ L0 --> L1 --> L2 --> L3
245
+ L2 -. OTel .-> L4
246
+ L3 -. outcomes .-> L4
247
+ L4 --> L5 --> L6
248
+ L6 -- revise --> L1
249
+ L6 -- add walls --> L3
250
+ ```
251
+
252
+ ### 5.2 Component map (as implemented in sdlc-gh)
253
+
254
+ | # | Component | Source | Implementation |
255
+ |---|-----------|--------|----------------|
256
+ | C1 | Instruction hierarchy | GitHub + custom | `.github/copilot-instructions.md`, `.github/instructions/` (core + stack profiles), `AGENTS.md` |
257
+ | C2 | Custom agents | GitHub + custom | `triager` (read), `implementer` (read/edit/search/execute), `reviewer` (read/search); handoffs triager→implementer |
258
+ | C3 | Skills | Open spec + custom | `.github/skills/quality-loop/SKILL.md` — verify against CC-SD before complete |
259
+ | C4 | Deterministic guards | GitHub + custom | `hooks/hooks.json` (force-push, rm -rf, DROP TABLE); `check-diff-size.mjs`; `check-open-pr-limit.mjs` (3 open PRs proxy for safe outputs) |
260
+ | C5 | Execution modes | GitHub | CLI/IDE, coding agent, gh-aw (stubs), SDK — see [auth-boundaries.md](auth-boundaries.md) |
261
+ | C6 | Walls (content) | Custom per repo | Stack `product-ci-*` workflows; sample apps under `sample/{stack}/` |
262
+ | C7 | Harness CI | Custom | `harness-ci.yml`: harness-static, issue-spec-check, open-pr-limit, diff-size, detect-projects → product-ci |
263
+ | C8 | Eval CI | Custom | `eval-ci.yml` + `select-eval-jobs.mjs` change-type matrix |
264
+ | C9 | Retry orchestrator | Custom | `agent-retry-orchestrator.yml` — max 3 retries, same-signature stop, security no-retry |
265
+ | C10 | PR context | Custom | `pr-context-comment.yml` — decision table on every PR |
266
+ | C11 | Bootstrap | Custom | `scripts/bootstrap-harness.sh` + `config/stacks.json` |
267
+ | C12 | Observability | OSS scaffold | `infra/langfuse/`, `infra/otel/`, [telemetry-schema.md](telemetry-schema.md) |
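The retry policy in row C9 (max 3 retries, same-signature stop, security no-retry) can be sketched as a pure function. This is only an interpretation: the function name, the input shape, and the reading of "same-signature stop" as two consecutive identical failure signatures are all assumptions, not the logic of `agent-retry-orchestrator.yml`.

```javascript
const MAX_RETRIES = 3;

// Hypothetical policy helper.
// retryCount: retries already attempted.
// failureSignatures: failure signatures in chronological order.
function nextRetryAction({ retryCount, failureSignatures, isSecurityFailure }) {
  // Security failures are never retried.
  if (isSecurityFailure) return "escalate";
  // Retry budget exhausted.
  if (retryCount >= MAX_RETRIES) return "stop";
  // Assumed meaning of "same-signature stop": the last two failures match.
  const n = failureSignatures.length;
  if (n >= 2 && failureSignatures[n - 1] === failureSignatures[n - 2]) {
    return "stop";
  }
  return "retry";
}

console.log(
  nextRetryAction({ retryCount: 1, failureSignatures: ["lint:E501"], isSecurityFailure: false }),
);
```

Checking the security condition first keeps a security failure from being retried even when retry budget remains.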
268
+
269
+ ### 5.2.1 Eval matrix by change type
270
+
271
+ Implemented in `scripts/select-eval-jobs.mjs`:
272
+
273
+ | Changed paths | Eval jobs triggered |
274
+ |---------------|---------------------|
275
+ | `prompts/*.prompt.yml` | `prompt-eval` |
276
+ | `.github/agents/**` | `prompt-eval`, `agent-policy` |
277
+ | `.github/instructions/**`, `AGENTS.md` | `trajectory-conventions` |
278
+ | `.github/skills/**` | `trajectory-task` |
279
+ | `evals/**` | `meta-eval` |
280
+ | Default (other harness paths) | `trajectory-conventions` |
281
+
282
+ The weekly schedule runs the full `e2e-bench` job regardless of PR paths.
283
+
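The path-to-job matrix above can be sketched as a small lookup. This is an illustrative sketch only, not the contents of `scripts/select-eval-jobs.mjs`: the `RULES` table, the `selectEvalJobs` name, and the choice to send every unmatched path to the default job are assumptions made for the example.

```javascript
// Hypothetical rule table mirroring the matrix above (first match wins).
// Names and structure are assumptions, not the real select-eval-jobs.mjs.
const RULES = [
  // `prompts/*.prompt.yml` is a single-level glob, so nested folders do not match.
  { test: (p) => /^prompts\/[^/]+\.prompt\.yml$/.test(p), jobs: ["prompt-eval"] },
  { test: (p) => p.startsWith(".github/agents/"), jobs: ["prompt-eval", "agent-policy"] },
  {
    test: (p) => p.startsWith(".github/instructions/") || p === "AGENTS.md",
    jobs: ["trajectory-conventions"],
  },
  { test: (p) => p.startsWith(".github/skills/"), jobs: ["trajectory-task"] },
  { test: (p) => p.startsWith("evals/"), jobs: ["meta-eval"] },
];

// Assumption: any path that matches no rule falls back to the default row.
const DEFAULT_JOBS = ["trajectory-conventions"];

// Returns the de-duplicated, sorted union of jobs for all changed paths.
function selectEvalJobs(changedPaths) {
  const selected = new Set();
  for (const path of changedPaths) {
    const rule = RULES.find((r) => r.test(path));
    for (const job of rule ? rule.jobs : DEFAULT_JOBS) {
      selected.add(job);
    }
  }
  return [...selected].sort();
}
```

For example, a PR touching `.github/agents/triager.agent.md` and `evals/e2e-bench/manifest.json` would select `agent-policy`, `meta-eval`, and `prompt-eval` under these assumptions.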
284
+ ### 5.2.2 Change size limits
285
+
286
+ Canonical values live in [operations.md](operations.md). `scripts/check-diff-size.mjs` enforces them based on the PR's `autonomy:*` label.
287
+
288
+ | Level | Max LOC | Max files | CI behavior |
289
+ |-------|---------|-----------|-------------|
290
+ | L0 | — | — | Proposal only |
291
+ | L1 | 300 | 8 | Warn (hard-fail opt-in via `DIFF_SIZE_L1_HARD_FAIL=1`) |
292
+ | L2 | 120 | 4 | Hard fail |
293
+ | L3 | 60 | 2 | Hard fail |
294
+
295
+ Over-limit changes should be split, not force-merged.
296
+
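The gate in the table above can be sketched as follows. This is a minimal sketch under stated assumptions, not the real `scripts/check-diff-size.mjs`: the `checkDiffSize` name, the `autonomy:L1` label spelling, the `skip` status for L0 or unlabeled PRs, and treating the limits as inclusive are all hypothetical choices for illustration.

```javascript
// Limits copied from the table above; L0 is proposal-only and has no size limits.
const LIMITS = {
  L1: { maxLoc: 300, maxFiles: 8, hardFail: false },
  L2: { maxLoc: 120, maxFiles: 4, hardFail: true },
  L3: { maxLoc: 60, maxFiles: 2, hardFail: true },
};

// Hypothetical gate. `labels` are PR label names, `loc` is changed lines,
// `files` is changed file count, `env` stands in for the CI environment.
function checkDiffSize({ labels, loc, files, env = {} }) {
  // Assumption: labels are spelled like "autonomy:L1".
  const label = labels.find((l) => l.startsWith("autonomy:"));
  const level = label ? label.slice("autonomy:".length) : null;
  const limit = level ? LIMITS[level] : undefined;
  if (!limit) {
    // Assumption: L0 and unlabeled PRs are not size-gated.
    return { level, status: "skip" };
  }
  // Assumption: limits are inclusive (exactly 300 LOC passes at L1).
  const over = loc > limit.maxLoc || files > limit.maxFiles;
  if (!over) {
    return { level, status: "pass" };
  }
  // L1 only warns unless the opt-in hard-fail switch is set.
  const hardFail = limit.hardFail || (level === "L1" && env.DIFF_SIZE_L1_HARD_FAIL === "1");
  return { level, status: hardFail ? "fail" : "warn" };
}
```

Passing `env` explicitly keeps the function pure; a CI wrapper would presumably pass `process.env` and exit non-zero on `fail`.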
297
+ ### 5.3 Task lifecycle (data flow)
298
+
299
+ ```mermaid
300
+ sequenceDiagram
301
+ participant Dev as Developer
302
+ participant GH as GitHub Issue/PR
303
+ participant TRI as triager agent
304
+ participant IMP as implementer agent
305
+ participant WALL as Walls (harness + product CI)
306
+ participant RET as Retry orchestrator
307
+ participant REV as Reviewer
308
+ Dev->>GH: Issue with CC-SD contract
309
+ GH->>TRI: Classify task:* / autonomy:*
310
+ TRI->>GH: Labels applied (L1 requires complete contract)
311
+ GH->>IMP: Delegate implementation
312
+ loop Inner loop
313
+ IMP->>IMP: Plan → edit → test
314
+ IMP->>GH: Draft PR commits
315
+ GH->>WALL: Required checks
316
+ alt Check failure
317
+ WALL->>RET: check_suite completed (failure)
318
+ RET->>GH: Structured comment + retry:N label
319
+ Note over RET: Max 3; same sig ×2 stops;<br/>security escalates immediately
320
+ else Check pass
321
+ WALL->>GH: PR context comment posted
322
+ GH->>REV: Single human gate
323
+ REV->>GH: Approve or request changes
324
+ end
325
+ end
326
+ ```
327
+
328
+ **Failure classification** (outer-loop routing): feed-forward gap, wall gap, model limit — see [failure-taxonomy.md](failure-taxonomy.md). Retry comments include `wall_failure_type` and `failure_sig`.
329
+
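The retry policy noted in the diagram (max 3 retries, stop on a repeated signature, escalate security failures immediately) can be sketched as a pure decision function. This is a hedged illustration, not the logic of `agent-retry-orchestrator.yml`: the `decideRetry` name, the `"security"` value for `wall_failure_type`, and reading "same sig ×2" as "the current signature equals the previous one" are assumptions.

```javascript
// Maximum retries, taken from the policy described above.
const MAX_RETRIES = 3;

// Hypothetical decision helper.
// retryCount: retries already attempted; failureSig: signature of the current
// failure; previousSigs: signatures of earlier failures, oldest first;
// wallFailureType: assumed to be "security" for security walls.
function decideRetry({ retryCount, failureSig, previousSigs, wallFailureType }) {
  // Security failures are never retried.
  if (wallFailureType === "security") {
    return { action: "escalate", reason: "security failure" };
  }
  if (retryCount >= MAX_RETRIES) {
    return { action: "stop", reason: "max retries reached" };
  }
  // Assumption: seeing the same signature twice in a row means no progress.
  const lastSig = previousSigs[previousSigs.length - 1];
  if (lastSig !== undefined && lastSig === failureSig) {
    return { action: "stop", reason: "same failure signature twice" };
  }
  return { action: "retry", nextLabel: `retry:${retryCount + 1}` };
}
```

Checking the security case first means a security failure escalates even on the first attempt, which matches the "no-retry" wording in the component map.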
330
+ ### 5.4 Telemetry minimum schema
331
+
332
+ Required span fields are defined in [telemetry-schema.md](telemetry-schema.md). Inner-loop workflows emit JSON artifacts per [telemetry-artifacts.md](telemetry-artifacts.md). When `LANGFUSE_HOST` is configured, the PR context comment surfaces the repo and PR number for Langfuse lookup.
333
+
334
+ ### 5.5 Reviewer checklist
335
+
336
+ From `reviewer.agent.md` and arch §5.5:
337
+
338
+ 1. **Requirement fit** — Goal and Acceptance criteria met?
339
+ 2. **Non-goal preservation** — Out-of-scope items untouched?
340
+ 3. **Boundary compliance** — Constraints respected?
341
+ 4. **Test adequacy** — Tests constrain the change?
342
+ 5. **Accountability** — Eval scores, cost, trace links present?
343
+ 6. **Rollback ease** — Rollback hints plausible?
344
+
345
+ Compare **Issue → PR summary → diff** in one pass.
346
+
347
+ ### 5.6 Repository layout
348
+
349
+ **Template repo (`sdlc-gh`)**:
350
+
351
+ ```text
352
+ sdlc-gh/
353
+ ├── AGENTS.md # Project instructions (task classes, roles)
354
+ ├── config/
355
+ │ └── stacks.json # Stack catalog (ts / python / go / ruby / php)
356
+ ├── .github/
357
+ │ ├── copilot-instructions.md # Global agent policy
358
+ │ ├── instructions/
359
+ │ │ ├── core.instructions.md
360
+ │ │ └── profiles/ # Per-stack conventions
361
+ │ ├── agents/ # triager / implementer / reviewer
362
+ │ ├── skills/quality-loop/SKILL.md
363
+ │ ├── hooks/hooks.json
364
+ │ ├── labels.yml # task:* / autonomy:* definitions
365
+ │ ├── CODEOWNERS
366
+ │ ├── ruleset.example.json
367
+ │ ├── ISSUE_TEMPLATE/task.yml # CC-SD contract template
368
+ │ ├── pull_request_template.md
369
+ │ ├── aw/actions-lock.json
370
+ │ └── workflows/
371
+ │ ├── harness-ci.yml # Walls + stack detection + product CI
372
+ │ ├── product-ci-{stack}.yml # Per-stack test/lint (5 stacks)
373
+ │ ├── eval-ci.yml # Change-type eval matrix
374
+ │ ├── eval-drift.yml # Bench rotation + score drift Issues
375
+ │ ├── agent-retry-orchestrator.yml
376
+ │ ├── pr-context-comment.yml
377
+ │ ├── copilot-setup-steps.yml
378
+ │ ├── labels-sync.yml
379
+ │ ├── harness-sync.yml # Weekly drift report
380
+ │ ├── nightly-harness-review.md # gh-aw stub (Phase 3)
381
+ │ ├── nightly-harness-review.lock.yml
382
+ │ ├── weekly-redteam.md # gh-aw stub (Phase 3)
383
+ │ └── weekly-redteam.lock.yml
384
+ ├── docs/ # Architecture and operations
385
+ ├── evals/
386
+ │ ├── trajectories/ # pytest convention tests + rubric.md
387
+ │ ├── e2e-bench/ # Task fixtures + manifest.json
388
+ │ └── .score-baseline.json
389
+ ├── prompts/ # .prompt.yml for gh models eval
390
+ ├── scripts/ # CI gate implementations, bootstrap
391
+ ├── sample/ # Minimal apps per stack (CI targets)
392
+ └── infra/ # Langfuse + OTel collector scaffolding
393
+ ```
394
+
395
+ **After bootstrap** (`scripts/bootstrap-harness.sh`):
396
+
397
+ - Only the selected stack's profile and `product-ci-*` workflow are copied
398
+ - `harness-ci.yml` is trimmed to a single product job
399
+ - Sample code is expanded into the repo root when `--mode new` is used
400
+
401
+ In the template repo, the `detect-projects` job uses marker-file detection to run product CI for every stack that is present.
402
+
403
+ Harness assets live in the product repo so Git history, PR review, and rollback apply directly. Org-wide shared assets can ship from a `.github-private` repository.
404
+
405
+ ---
406
+
407
+ ## 6. CI and automation reference
408
+
409
+ ### 6.1 Harness CI jobs
410
+
411
+ | Job | Trigger | Purpose |
412
+ |-----|---------|---------|
413
+ | `harness-static` | PR, push main | `validate-harness.mjs`, actionlint, zizmor, hooks + issue-spec scenario tests |
414
+ | `issue-spec-check` | PR | CC-SD completeness for L1 docs/test-fix (`check-issue-spec.mjs`; uses `PR_LABELS` + linked Issue labels) |
415
+ | `open-pr-limit` | PR | Warn when author has >3 open PRs (`check-open-pr-limit.mjs`) |
416
+ | `diff-size` | PR | Autonomy size gate (`check-diff-size.mjs`) |
417
+ | `product-ci-*` | PR, push main | Stack tests/lint (conditional on marker files) |
418
+
419
+ ### 6.2 Eval CI jobs
420
+
421
+ | Job | When | Purpose |
422
+ |-----|------|---------|
423
+ | `select` | PR / schedule | `select-eval-jobs.mjs` |
424
+ | `prompt-eval` | Selected | `gh models eval` on `prompts/*.prompt.yml` |
425
+ | `agent-policy` | Selected | Agent definition validation |
426
+ | `trajectory-conventions` | Selected | pytest harness convention tests |
427
+ | `trajectory-task` | Selected | Skill/task rubric tests |
428
+ | `meta-eval` | Selected | E2E manifest + bench runner + pytest |
429
+ | `e2e-bench` | Weekly schedule | Full bench run |
430
+
431
+ ### 6.3 Local validation
432
+
433
+ ```bash
434
+ npm run validate # Harness asset consistency
435
+ npm run test-hooks # Hook block/allow scenarios
436
+ npm run test-diff-size # Diff-size scenarios
437
+ npm run test-e2e-manifest # E2E manifest scenarios
438
+ npm run test-doctor # Doctor local check scenarios
439
+ npm run check-e2e # E2E manifest checks
440
+ npm run verify-bootstrap # Bootstrap integration (all stacks)
441
+ pytest evals/trajectories -q
442
+ ```
443
+
444
+ Node 22 is recommended for full E2E verifier parity with CI.
445
+
446
+ ---
447
+
448
+ ## 7. Phased rollout
449
+
450
+ Do not enable everything at once. Each phase uses prior metrics as promotion evidence.
451
+
452
+ | Phase | Timeline | Enable | sdlc-gh state |
453
+ |-------|----------|--------|---------------|
454
+ | **0** | ~2 weeks | CI walls, rulesets, optional Langfuse | harness-ci + product-ci |
455
+ | **1** | ~1 month | Instructions, agents, hooks, templates; record baseline KPIs | FF assets + labels sync |
456
+ | **2** | ~2 months | Eval CI + change-type matrix; E2E bench v1 | eval-ci, 9 e2e tasks |
457
+ | **3** | ~3 months | Retry orchestrator, PR context, gh-aw outer loop stubs | orchestrator + stubs (compile not guaranteed) |
458
+ | **4** | Stable ops | L2 promotion for docs; exception ledger; revert playbook | See [revert-playbook.md](revert-playbook.md), [exceptions/](exceptions/README.md) |
459
+
460
+ Detailed adoption steps: [adoption.md](adoption.md). L1 trial guide: [coding-agent-l1.md](coding-agent-l1.md).
461
+
462
+ ---
463
+
464
+ ## 8. Risks and mitigations
465
+
466
+ | Risk | Mitigation |
467
+ |------|------------|
468
+ | gh-aw Public Preview instability | Limit to outer loop; commit lock.yml; track release notes |
469
+ | Prompt injection | Input sanitization (gh-aw), secretless design, garak red-team stub; treat Issue body as untrusted |
470
+ | Auth boundary drift per mode | [auth-boundaries.md](auth-boundaries.md) matrix; no long-lived secrets in repo |
471
+ | Approval fatigue | Single PR gate; structured context comment; no mid-task prompts |
472
+ | Cost runaway | `max-ai-credits`; Langfuse cost fields; open-PR limit proxy |
473
+ | Eval overfitting | Quarterly 20% E2E rotation (`eval-drift.yml`); 15pt production gap threshold |
474
+ | Principle 4 hollowed out | Eval CI path filters + ruleset eval required |
475
+ | Retry loop runaway | Max 3 retries, same-signature stop, security immediate escalation |
476
+ | Oversized diffs | Autonomy size gates; split recommendation |
477
+ | Harness asset drift | `harness-sync.yml` weekly drift report; bootstrap re-run |
478
+
479
+ ---
480
+
481
+ ## 9. Custom investment areas
482
+
483
+ Quality ultimately depends on:
484
+
485
+ 1. **Wall content** — test suites, contract tests, business-rule checks in product CI
486
+ 2. **E2E bench (G1)** — expand from acceptance checks to break-and-fix agent runner; target 20–100 tasks
487
+ 3. **Rubrics (G2)** — `evals/trajectories/rubric.md` and G-Eval / LLM-as-judge specs
488
+ 4. **Revision cycle (G3)** — failure taxonomy routing; morning queue (~30 min/day)
489
+ 5. **Feed-forward content** — domain rules in instructions / agents / skills
490
+ 6. **Exception ledger** — [docs/exceptions/](exceptions/README.md)
491
+
492
+ ---
493
+
494
+ ## 10. KPIs
495
+
496
+ **Inner loop**: PR rejection rate, first-pass wall rate, AI credits per task, average retry count.
497
+
498
+ **Outer loop**: E2E pass rate trend, harness revision PR adoption rate, failure-class mix (wall-gap ratio declining = walls improving).
499
+
500
+ **Lagging quality**: 7-day revert rate, agent-caused hotfix rate, post-review fix rate.
501
+
502
+ **UX**: Review time per PR, morning queue processing time, autonomy level distribution.
503
+
504
+ Tracking template: [kpi-baseline.md](kpi-baseline.md).
505
+
506
+ ---
507
+
508
+ ## 11. Related documentation
509
+
510
+ | Document | Contents |
511
+ |----------|----------|
512
+ | [adoption.md](adoption.md) | Installation, bootstrap, rollback |
513
+ | [operations.md](operations.md) | **Canonical** thresholds, retry policy, forbidden ops |
514
+ | [coding-agent-l1.md](coding-agent-l1.md) | First L1 delegations (docs / test-fix) |
515
+ | [failure-taxonomy.md](failure-taxonomy.md) | Failure classification for outer loop |
516
+ | [telemetry-schema.md](telemetry-schema.md) | Required observability fields |
517
+ | [telemetry-artifacts.md](telemetry-artifacts.md) | Inner-loop JSON artifact format |
518
+ | [gh-aw-dogfood.md](gh-aw-dogfood.md) | Bounded gh-aw validation on sdlc-gh |
519
+ | [auth-boundaries.md](auth-boundaries.md) | Credential matrix per execution mode |
520
+ | [shared-config.md](shared-config.md) | Cross-repo harness distribution |
521
+ | [kpi-baseline.md](kpi-baseline.md) | Weekly KPI template |
522
+ | [exceptions/README.md](exceptions/README.md) | Policy exception records |
523
+ | [revert-playbook.md](revert-playbook.md) | Revert procedure |
524
+ | [infra/README.md](../infra/README.md) | Langfuse / OTel self-host |
525
+
526
+ ---
527
+
528
+ ## Appendix: Assumptions (July 2026)
529
+
530
+ - Agentic Workflows remain Public Preview with safe outputs, AWF firewall, and secretless defaults.
531
+ - Copilot SDK is Public Preview (MIT; JSON-RPC to CLI server).
532
+ - `gh models eval` supports `.prompt.yml` in CI (similarity, string match, LLM-as-judge).
533
+ - GitHub Copilot billing moved to AI credits (June 2026).
534
+ - Agent Skills are an open spec shared across Copilot, Claude Code, and Codex.
535
+ - Custom agents support tool restriction, handoffs, and org-level distribution.
@@ -0,0 +1,16 @@
1
+ # Auth Boundaries
2
+
3
+ Execution mode credential matrix (arch.md §4.2).
4
+
5
+ | Mode | Credentials | Scope | Audit |
6
+ |------|-------------|-------|-------|
7
+ | CLI / IDE | Developer local delegation | User's local permissions | Copilot audit logs |
8
+ | coding agent | Short-lived, repo-scoped token | Isolated VM, own branch only, cannot approve own PR | Actions + OTel |
9
+ | gh-aw | Secretless; proxy/gateway auth | AWF firewall, domain allowlist | gh aw audit, firewall logs |
10
+ | SDK | Proxy execution service | Limited operations only | Application audit log |
11
+
12
+ ## Invariants
13
+
14
+ - No long-lived secrets in prompt context or agent-readable files
15
+ - Production credentials never in harness assets committed to git
16
+ - Exceptions require documented approval in `docs/exceptions/`