opencastle 0.7.0 → 0.8.1
- package/README.md +30 -3
- package/bin/cli.mjs +2 -0
- package/dist/cli/adapters/claude-code.d.ts +2 -5
- package/dist/cli/adapters/claude-code.d.ts.map +1 -1
- package/dist/cli/adapters/claude-code.js +12 -251
- package/dist/cli/adapters/claude-code.js.map +1 -1
- package/dist/cli/adapters/cursor.d.ts.map +1 -1
- package/dist/cli/adapters/cursor.js +3 -17
- package/dist/cli/adapters/cursor.js.map +1 -1
- package/dist/cli/adapters/frontmatter.d.ts +26 -0
- package/dist/cli/adapters/frontmatter.d.ts.map +1 -0
- package/dist/cli/adapters/frontmatter.js +40 -0
- package/dist/cli/adapters/frontmatter.js.map +1 -0
- package/dist/cli/adapters/index.d.ts +5 -0
- package/dist/cli/adapters/index.d.ts.map +1 -0
- package/dist/cli/adapters/index.js +9 -0
- package/dist/cli/adapters/index.js.map +1 -0
- package/dist/cli/adapters/opencode.d.ts +2 -5
- package/dist/cli/adapters/opencode.d.ts.map +1 -1
- package/dist/cli/adapters/opencode.js +12 -250
- package/dist/cli/adapters/opencode.js.map +1 -1
- package/dist/cli/adapters/single-file-base.d.ts +40 -0
- package/dist/cli/adapters/single-file-base.d.ts.map +1 -0
- package/dist/cli/adapters/single-file-base.js +246 -0
- package/dist/cli/adapters/single-file-base.js.map +1 -0
- package/dist/cli/dashboard.d.ts.map +1 -1
- package/dist/cli/dashboard.js +3 -2
- package/dist/cli/dashboard.js.map +1 -1
- package/dist/cli/detect.d.ts.map +1 -1
- package/dist/cli/detect.js +13 -11
- package/dist/cli/detect.js.map +1 -1
- package/dist/cli/doctor.d.ts +3 -0
- package/dist/cli/doctor.d.ts.map +1 -0
- package/dist/cli/doctor.js +205 -0
- package/dist/cli/doctor.js.map +1 -0
- package/dist/cli/init.d.ts.map +1 -1
- package/dist/cli/init.js +31 -19
- package/dist/cli/init.js.map +1 -1
- package/dist/cli/run/schema.d.ts +1 -5
- package/dist/cli/run/schema.d.ts.map +1 -1
- package/dist/cli/run/schema.js +6 -330
- package/dist/cli/run/schema.js.map +1 -1
- package/dist/cli/run.d.ts.map +1 -1
- package/dist/cli/run.js +14 -1
- package/dist/cli/run.js.map +1 -1
- package/dist/cli/types.d.ts +0 -5
- package/dist/cli/types.d.ts.map +1 -1
- package/dist/cli/update.d.ts.map +1 -1
- package/dist/cli/update.js +4 -17
- package/dist/cli/update.js.map +1 -1
- package/package.json +7 -2
- package/src/cli/adapters/claude-code.ts +13 -304
- package/src/cli/adapters/cursor.ts +3 -23
- package/src/cli/adapters/frontmatter.ts +47 -0
- package/src/cli/adapters/index.ts +13 -0
- package/src/cli/adapters/opencode.ts +12 -301
- package/src/cli/adapters/single-file-base.ts +320 -0
- package/src/cli/dashboard.ts +3 -2
- package/src/cli/detect.ts +19 -15
- package/src/cli/doctor.ts +235 -0
- package/src/cli/init.ts +31 -24
- package/src/cli/run/schema.ts +7 -365
- package/src/cli/run.ts +17 -1
- package/src/cli/types.ts +0 -6
- package/src/cli/update.ts +5 -23
- package/src/dashboard/dist/_astro/{index.CWVzbF4T.css → index.Bnq19_1M.css} +1 -1
- package/src/dashboard/dist/index.html +170 -11
- package/src/dashboard/node_modules/.vite/deps/_metadata.json +6 -6
- package/src/dashboard/seed-data/reviews.ndjson +6 -0
- package/src/dashboard/src/pages/index.astro +213 -10
- package/src/dashboard/src/styles/dashboard.css +196 -0
- package/src/orchestrator/agent-workflows/bug-fix.md +2 -2
- package/src/orchestrator/agent-workflows/data-pipeline.md +8 -8
- package/src/orchestrator/agent-workflows/database-migration.md +2 -2
- package/src/orchestrator/agent-workflows/feature-implementation.md +12 -5
- package/src/orchestrator/agent-workflows/performance-optimization.md +2 -2
- package/src/orchestrator/agent-workflows/refactoring.md +2 -2
- package/src/orchestrator/agent-workflows/schema-changes.md +2 -2
- package/src/orchestrator/agent-workflows/security-audit.md +2 -2
- package/src/orchestrator/agents/data-expert.agent.md +2 -2
- package/src/orchestrator/agents/researcher.agent.md +0 -16
- package/src/orchestrator/agents/team-lead.agent.md +17 -6
- package/src/orchestrator/customizations/AGENT-PERFORMANCE.md +1 -3
- package/src/orchestrator/prompts/bootstrap-customizations.prompt.md +1 -1
- package/src/orchestrator/prompts/bug-fix.prompt.md +11 -6
- package/src/orchestrator/prompts/implement-feature.prompt.md +9 -4
- package/src/orchestrator/prompts/quick-refinement.prompt.md +9 -5
- package/src/orchestrator/prompts/resolve-pr-comments.prompt.md +18 -4
- package/src/orchestrator/skills/agent-hooks/SKILL.md +4 -2
- package/src/orchestrator/skills/fast-review/SKILL.md +15 -4
- package/src/orchestrator/skills/self-improvement/SKILL.md +1 -1
- package/src/orchestrator/skills/validation-gates/SKILL.md +152 -15
- package/src/orchestrator/prompts/metrics-report.prompt.md +0 -144
@@ -123,7 +123,9 @@ CONFIDENCE: low | medium | high
 **Auto-PASS conditions (skip reviewer):**
 - The delegation was pure research/exploration with no code changes
 - The delegation only modified documentation files (`.md`)
-- All deterministic gates already passed AND the change is ≤10 lines across ≤2 files
+- All deterministic gates already passed AND the change is ≤10 lines across ≤2 files AND **no sensitive files were touched** (see validation-gates Gate 3 sensitive file list)
+
+> **Sensitive file override:** Changes to auth/middleware files, database migrations, RLS policies, security headers, CSP configuration, environment variable schemas, or CI/CD configuration **always** require a reviewer — even for 1-line changes. Auto-PASS never applies to these files.
 
 ### Step 4: Handle Verdict
 
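The auto-PASS rule in the hunk above can be read as a small decision function. The following is a minimal TypeScript sketch, not code from the package: the `Delegation` shape, `isSensitive`, `isAutoPassEligible`, and every path pattern are hypothetical stand-ins for the sensitive categories named in the override, and RLS policies are left out because the text gives no path convention for them.

```typescript
// Hypothetical description of a finished delegation; not an OpenCastle type.
interface Delegation {
  changedFiles: string[]; // paths touched by the agent
  linesChanged: number; // total added plus removed lines
  deterministicGatesPassed: boolean;
}

// Illustrative patterns for the sensitive categories named in the override.
// These regexes are assumptions; the authoritative list is the Gate 3 text.
const SENSITIVE_PATTERNS: RegExp[] = [
  /(^|\/)middleware\.ts$/, // auth/middleware files
  /(^|\/)auth\.ts$/,
  /(^|\/)auth\//,
  /(^|\/)migrations\//, // database migrations (assumed directory name)
  /(^|\/)next\.config\.[^/]+$/, // security headers / CSP configuration
  /(^|\/)vercel\.json$/,
  /(^|\/)\.env\.example$/, // environment variable schemas
  /(^|\/)env\.ts$/,
  /^\.github\/workflows\//, // CI/CD configuration
];

function isSensitive(path: string): boolean {
  return SENSITIVE_PATTERNS.some((pattern) => pattern.test(path));
}

function isAutoPassEligible(d: Delegation): boolean {
  // Sensitive files always need a reviewer, even for 1-line changes.
  if (d.changedFiles.some((file) => isSensitive(file))) return false;
  // Pure research/exploration: nothing was changed.
  if (d.changedFiles.length === 0) return true;
  // Documentation-only change.
  if (d.changedFiles.every((file) => file.endsWith(".md"))) return true;
  // Small change that already passed the deterministic gates.
  return (
    d.deterministicGatesPassed &&
    d.linesChanged <= 10 &&
    d.changedFiles.length <= 2
  );
}
```

Checking the sensitive-file condition first mirrors the wording of the override: it wins over every other auto-PASS condition, including the size rule.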
@@ -247,14 +249,23 @@ Fast review sits between the agent's output and the Team Lead's acceptance:
 Agent completes work
 │
 ▼
-
+Secret Scanning ← validation-gates Gate 1
+│
+▼
+Deterministic checks (lint, test, build) ← validation-gates Gate 2
+│
+▼
+Blast Radius Check ← validation-gates Gate 3
+│
+▼
+Dependency Audit (if packages changed) ← validation-gates Gate 4
 │
 ▼
-Fast Review (this skill) ← validation-gates Gate
+Fast Review (this skill) ← validation-gates Gate 5
 │
 ├── PASS → Accept, move to next task
 ├── FAIL → Retry loop (up to 2x)
-└── 3x FAIL → Escalate to Panel (Gate
+└── 3x FAIL → Escalate to Panel (Gate 9)
 ```
 
 ### Relationship to on-post-delegate Hook
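The PASS / FAIL / escalate branch at the bottom of the diagram above amounts to a bounded retry loop. The sketch below illustrates that control flow in TypeScript under stated assumptions: `runFastReview`, the `review` and `redelegate` callbacks, and the `Outcome` shape are hypothetical names, and the real reviewer is a sub-agent rather than a synchronous function.

```typescript
type Verdict = "PASS" | "FAIL";

interface Outcome {
  status: "accepted" | "escalated";
  attempts: number; // how many reviews were run
}

// `review` stands in for the reviewer sub-agent and `redelegate` for sending
// the work back with feedback. Both callbacks are assumptions for illustration.
function runFastReview(
  review: (attempt: number) => Verdict,
  redelegate: (attempt: number) => void,
  maxRetries: number = 2,
): Outcome {
  const maxAttempts = maxRetries + 1; // first review plus up to 2 retries
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (review(attempt) === "PASS") {
      return { status: "accepted", attempts: attempt };
    }
    // Only re-delegate while retries remain; the third FAIL escalates instead.
    if (attempt < maxAttempts) redelegate(attempt);
  }
  return { status: "escalated", attempts: maxAttempts };
}
```

With the default of two retries, three consecutive FAIL verdicts produce the "3x FAIL → Escalate to Panel (Gate 9)" outcome shown in the diagram.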
@@ -50,7 +50,7 @@ A lesson MUST be written when **any** of these triggers occur:
 echo '{"timestamp":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","agent":"Agent Name","model":"model-id","task":"Short description","outcome":"success","files_changed":N,"retries":0}' >> .github/customizations/logs/sessions.ndjson
 ```
 
-This is **mandatory** — session logging fuels the
+This is **mandatory** — session logging fuels the observability dashboard (`npx opencastle dashboard`).
 
 ## How to Write a Lesson
 
@@ -1,6 +1,6 @@
 ---
 name: validation-gates
-description: "Shared validation gates for all orchestration workflows — deterministic checks, browser testing, cache management, regression checks. Referenced by prompt templates to maintain single source of truth."
+description: "Shared validation gates for all orchestration workflows — secret scanning, deterministic checks, blast radius analysis, dependency auditing, browser testing, cache management, regression checks, and final smoke tests. Referenced by prompt templates to maintain single source of truth."
 ---
 
 <!-- ⚠️ This file is managed by OpenCastle. Edits will be overwritten on update. Customize in the .github/customizations/ directory instead. -->
@@ -9,7 +9,57 @@ description: "Shared validation gates for all orchestration workflows — determ
 
 Canonical reference for validation gates shared across all orchestration workflows. Prompt templates reference this skill to avoid duplication.
 
-
+**Gate summary:**
+
+| Gate | Name | Runs When |
+|------|------|-----------|
+| 1 | Secret Scanning | Every delegation |
+| 2 | Deterministic Checks | Every delegation |
+| 3 | Blast Radius Check | Every delegation |
+| 4 | Dependency Audit | When `package.json` or lockfiles change |
+| 5 | Fast Review | Every delegation (with auto-PASS exceptions) |
+| 6 | Cache Clearing | Before browser testing |
+| 7 | Browser Testing | UI changes |
+| 8 | Regression Testing | Every delegation |
+| 9 | Panel Review | High-stakes changes only |
+| 10 | Final Smoke Test | Feature completion (after all tasks Done) |
+
+---
+
+## Gate 1: Secret Scanning
+
+> **HARD GATE — Constitution rule #1.** No tokens, keys, passwords, or connection strings in code, logs, commits, or terminal output.
+
+Scan every diff **before** any other gate. A secret leak caught after merge is exponentially more expensive than one caught at review time.
+
+### What to scan
+
+Run a regex scan of all changed files for patterns that match common secret formats:
+
+```bash
+# Scan staged/changed files for common secret patterns
+grep -rn -E '(AKIA[0-9A-Z]{16}|sk-[a-zA-Z0-9]{20,}|ghp_[a-zA-Z0-9]{36}|glpat-[a-zA-Z0-9\-]{20}|xox[bpors]-[a-zA-Z0-9\-]+|eyJ[a-zA-Z0-9]{10,}\.[a-zA-Z0-9]{10,}|-----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----|mongodb(\+srv)?://[^\s]+|postgres(ql)?://[^\s]+|mysql://[^\s]+|redis://[^\s]+)' <changed-files>
+```
+
+Also check for:
+- Hardcoded `password`, `secret`, `api_key`, `apiKey`, `token` assignments (not just references)
+- `.env` file contents copied into source files
+- Base64-encoded secrets (common obfuscation attempt)
+
+### On detection
+
+- **BLOCK immediately** — do not proceed to Gate 2
+- Flag the specific file and line number
+- Re-delegate to the agent with explicit instruction to use environment variables instead
+- If a secret was already committed, **rotate it immediately** — git history is permanent
+
+### Exceptions
+
+- Test fixtures with obviously fake values (e.g., `sk-test-1234567890`)
+- Documentation examples with placeholder values (e.g., `YOUR_API_KEY_HERE`)
+- Pattern matches inside comments that are clearly explanatory
+
+## Gate 2: Deterministic Checks
 
 Run for every affected project (resolve exact commands via the **codebase-tool** skill):
 
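The Gate 1 scan added in the hunk above asks for a file and line number on detection. As a rough illustration, the same patterns can be applied line by line in TypeScript. This is a sketch under assumptions, not the package's scanner: `scanText`, `Finding`, and the pattern labels are hypothetical names, and the four connection-string alternatives are folded into one expression that uses `\S+` where the `grep` command writes `[^\s]+`.

```typescript
interface Finding {
  file: string;
  line: number; // 1-based line number within the scanned text
  pattern: string; // label of the pattern that matched (labels are illustrative)
}

// Patterns transcribed from the grep expression in the Gate 1 text.
const SECRET_PATTERNS: Array<[string, RegExp]> = [
  ["aws-access-key", /AKIA[0-9A-Z]{16}/],
  ["sk-key", /sk-[a-zA-Z0-9]{20,}/],
  ["github-token", /ghp_[a-zA-Z0-9]{36}/],
  ["gitlab-token", /glpat-[a-zA-Z0-9-]{20}/],
  ["slack-token", /xox[bpors]-[a-zA-Z0-9-]+/],
  ["jwt-like", /eyJ[a-zA-Z0-9]{10,}\.[a-zA-Z0-9]{10,}/],
  ["private-key", /-----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----/],
  ["connection-string", /(mongodb(\+srv)?|postgres(ql)?|mysql|redis):\/\/\S+/],
];

// Scan one file's contents and report every (line, pattern) match.
function scanText(file: string, text: string): Finding[] {
  const findings: Finding[] = [];
  text.split("\n").forEach((content, index) => {
    for (const [name, regex] of SECRET_PATTERNS) {
      // No `g` flag is used, so `test` keeps no state between calls.
      if (regex.test(content)) {
        findings.push({ file, line: index + 1, pattern: name });
      }
    }
  });
  return findings;
}
```

A caller would block the delegation whenever `scanText` returns a non-empty list for any changed file. The exceptions listed in the gate (fake fixtures, placeholders, explanatory comments) still need a separate allowlist or a human decision; this sketch does not attempt them.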
@@ -19,31 +69,84 @@ Run for every affected project (resolve exact commands via the **codebase-tool**
 
 All must pass with zero errors. Run for **every** project that consumed modified files, not just the primary project.
 
-## Gate
+## Gate 3: Blast Radius Check
+
+Assess the scope of changes to catch scope creep and ensure reviewers can evaluate the diff effectively.
+
+### Thresholds
+
+| Metric | Normal | Warning | Escalate |
+|--------|--------|---------|----------|
+| Lines changed | ≤200 | 201–500 | >500 |
+| Files changed | ≤5 | 6–10 | >10 |
+| Projects affected | ≤1 | 2 | >2 |
+
+### Actions
+
+- **Normal** — proceed to Gate 4
+- **Warning** — log a note in the delegation record. Ask: *"Was this scope expected?"* If yes, proceed. If unexpected, investigate whether the agent drifted from the partition
+- **Escalate** — **STOP.** The Team Lead must review the diff before proceeding:
+1. Verify all changed files are within the agent's assigned partition
+2. Check whether the task should have been split into smaller subtasks
+3. If scope creep: revert extra changes, re-delegate with tighter scope
+4. If legitimately large: proceed, but **always run fast review** (no auto-PASS) and consider panel review
+
+### Sensitive files
+
+Changes to these file categories always trigger Warning regardless of line count:
+
+- Auth/middleware files (e.g., `middleware.ts`, `auth.ts`, `**/auth/**`)
+- Database migrations, RLS policies
+- Security headers, CSP configuration (`next.config.*`, `vercel.json`)
+- Environment variable schemas (`.env.example`, `env.ts`)
+- CI/CD configuration (`.github/workflows/**`)
+- Package manager configs (`package.json`, lockfiles) — also triggers Gate 4
+
+## Gate 4: Dependency Audit
+
+> Runs only when `package.json`, `yarn.lock`, `package-lock.json`, `pnpm-lock.yaml`, or similar lockfiles are modified.
+
+When agents add, remove, or update npm packages, verify:
+
+1. **Vulnerability scan** — Run `npm audit` (or the project's equivalent). No new `high` or `critical` vulnerabilities
+2. **License compatibility** — New packages must use MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause, or ISC licenses. Flag any copyleft (GPL, LGPL, AGPL) or proprietary licenses for human review
+3. **Bundle size impact** — For frontend packages, note the minified + gzipped size. Flag packages >50KB gzipped that have lighter alternatives
+4. **Duplicate functionality** — Check whether the new dependency overlaps with an existing one (e.g., adding `moment` when `date-fns` is already installed)
+5. **Maintenance health** — Flag packages with no updates in >2 years or <100 weekly downloads
+
+### On failure
+
+- **Vulnerability:** BLOCK. Re-delegate with instruction to use a patched version or alternative package
+- **License concern:** Flag for human review. Do not block, but document in the PR description
+- **Size/duplicate:** Flag as SHOULD-FIX in the fast review. Not blocking unless egregious (>200KB)
+
+## Gate 5: Fast Review (MANDATORY)
 
 > **HARD GATE:** Every agent delegation output must pass fast review before acceptance. This is non-negotiable — even for overnight/unattended runs. Load the **fast-review** skill for the full procedure.
 
-After
+After gates 1–4 pass:
 
 1. **Spawn a single reviewer sub-agent** with the review prompt from the fast-review skill
 2. **On PASS** — proceed to remaining gates
 3. **On FAIL** — re-delegate to the same agent with reviewer feedback (up to 2 retries)
-4. **On 3x FAIL** — escalate to panel review (Gate
+4. **On 3x FAIL** — escalate to panel review (Gate 9)
 
 The reviewer validates: acceptance criteria met, file partition respected, no regressions, type safety, error handling, security basics, and edge cases.
 
 **Auto-PASS conditions** (skip the reviewer sub-agent):
 - Pure research/exploration with no code changes
 - Only `.md` files were modified
-- All deterministic gates passed AND the change is ≤10 lines across ≤2 files
+- All deterministic gates passed AND the change is ≤10 lines across ≤2 files AND **no sensitive files were touched** (see Gate 3 sensitive file list)
 
-
+> **Sensitive file override:** If any changed file falls into the sensitive file categories listed in Gate 3 (auth, migrations, security headers, env schemas, CI/CD), auto-PASS is **never** applied — even for 1-line changes. These files always get a human-quality review.
+
+## Gate 6: Cache Clearing (BEFORE Browser Testing)
 
 **Always clear before testing.** Testing stale code wastes time and produces false results.
 
 Clear framework caches and task runner caches before starting the dev server for browser testing. See the **codebase-tool** skill for cache-clearing commands.
 
-## Gate
+## Gate 7: Browser Testing (MANDATORY for UI Changes)
 
 > **HARD GATE:** A task with UI changes is NOT done until you have screenshots in Chrome proving the feature works. "The code looks correct" is not proof. "Tests pass" is not proof. Only a screenshot of the working UI in Chrome is proof.
 
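The Gate 3 threshold table in the hunk above maps three metrics onto Normal, Warning, and Escalate. A minimal TypeScript sketch of that classification follows. Two points are assumptions rather than anything the text states: that the overall level is the worst of the individual metric levels, and that a sensitive file raises the result to at least Warning without capping it. `ChangeStats`, `grade`, and `blastRadius` are hypothetical names.

```typescript
type Level = "normal" | "warning" | "escalate";

// Hypothetical summary of a diff; not an OpenCastle type.
interface ChangeStats {
  linesChanged: number;
  filesChanged: number;
  projectsAffected: number;
  touchesSensitiveFile: boolean;
}

// Severity order, lowest first.
const SEVERITY: Level[] = ["normal", "warning", "escalate"];

// Grade one metric against its "Normal" and "Warning" upper bounds.
function grade(value: number, normalMax: number, warningMax: number): Level {
  if (value <= normalMax) return "normal";
  return value <= warningMax ? "warning" : "escalate";
}

function blastRadius(stats: ChangeStats): Level {
  const levels: Level[] = [
    grade(stats.linesChanged, 200, 500), // ≤200 | 201–500 | >500
    grade(stats.filesChanged, 5, 10), // ≤5 | 6–10 | >10
    grade(stats.projectsAffected, 1, 2), // ≤1 | 2 | >2
    stats.touchesSensitiveFile ? "warning" : "normal",
  ];
  // Assumption: the worst individual level decides the overall result.
  return levels.reduce<Level>(
    (worst, level) =>
      SEVERITY.indexOf(level) > SEVERITY.indexOf(worst) ? level : worst,
    "normal",
  );
}
```

For example, a 50-line change spread over 11 files grades as Escalate under this reading, because the file count alone crosses its threshold.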
@@ -59,7 +162,7 @@ Clear framework caches and task runner caches before starting the dev server for
 
 Load the **browser-testing** skill for Chrome MCP commands, breakpoint details, and reporting format.
 
-## Gate
+## Gate 8: Regression Testing
 
 New features must not break existing functionality:
 
@@ -68,7 +171,7 @@ New features must not break existing functionality:
 3. **Verify navigation** — Ensure routing, links, and back-button behavior still work
 4. **Check shared components** — If a component from a shared library was modified, test it in all apps that consume it
 
-## Gate
+## Gate 9: Panel Review (High-Stakes Only)
 
 Use the **panel-majority-vote** skill for:
 
@@ -79,16 +182,50 @@ Use the **panel-majority-vote** skill for:
 
 If the panel returns BLOCK, extract MUST-FIX items, re-delegate to the same agent, and re-run the panel. Never skip, never halt. Max 3 attempts, then escalate to Architect.
 
+## Gate 10: Final Smoke Test (Feature-Level)
+
+> Runs once after ALL tasks in a feature are Done — not per-task.
+
+Individual tasks pass gates 1–9 independently. But the combined result may have integration issues that per-task testing misses. This gate verifies the feature as a cohesive unit.
+
+### Steps
+
+1. **Full build** — Build all affected projects from clean state (not incremental)
+2. **Full test suite** — Run tests across all projects that consumed any changed files
+3. **End-to-end browser walkthrough** — Navigate the complete user flow from start to finish:
+- Verify all states: loading, empty, populated, error, partial
+- Test every state transition end-to-end (not just individual screens)
+- Confirm data flows correctly between pages/components
+- Test the happy path AND at least one error path
+4. **Cross-task integration check** — Verify that outputs from different tasks (e.g., DB migration + component + page) compose correctly
+5. **Smoke test at all breakpoints** — If the feature has UI, one final responsive sweep
+
+### When to skip
+
+- Non-UI features with comprehensive test coverage (e.g., pure backend/data pipeline work where tests verify integration)
+- Single-task features (Gate 8 already covers regression)
+
+### On failure
+
+Re-delegate the specific failing integration point to the agent responsible for that layer. Do NOT re-run the entire feature implementation.
+
+---
+
 ## Universal Completion Checklist
 
 Use this checklist for any orchestration workflow:
 
-- [ ]
-- [ ]
-- [ ]
-- [ ]
+- [ ] **No secrets in diff** (Gate 1)
+- [ ] Lint, test, and build pass for all affected projects (Gate 2)
+- [ ] Blast radius assessed — scope is expected (Gate 3)
+- [ ] Dependency audit passed if packages changed (Gate 4)
+- [ ] **Fast review passed** (mandatory — load **fast-review** skill) (Gate 5)
+- [ ] Dev server started with **clean cache** (Gate 6)
+- [ ] UI changes verified in Chrome with screenshots at all breakpoints (Gate 7)
 - [ ] Every acceptance criteria item visually confirmed — not just "page loads"
-- [ ] No regressions in adjacent functionality
+- [ ] No regressions in adjacent functionality (Gate 8)
+- [ ] Panel review passed for high-stakes changes (Gate 9)
+- [ ] **Final smoke test passed** for multi-task features (Gate 10)
 - [ ] Shared code changes tested across all consuming apps
 - [ ] No duplicated code — shared logic extracted to libraries
 - [ ] Lessons learned captured if any retries occurred
@@ -1,144 +0,0 @@
----
-description: 'Collect and report metrics from agent logs, GitHub PRs, tracker issues, and deployments'
-agent: Researcher
----
-
-<!-- ⚠️ This file is managed by OpenCastle. Edits will be overwritten on update. Customize in the .github/customizations/ directory instead. -->
-
-# Metrics Report
-
-Generate a comprehensive metrics dashboard from all project data sources.
-
-## Data Sources
-
-Collect data from ALL of these sources. Run collections in parallel where possible.
-
-### 1. Agent Session Logs (local)
-
-Read `.github/customizations/logs/sessions.ndjson` and `.github/customizations/logs/delegations.ndjson`.
-
-Compute:
-- **Total sessions** and **sessions per agent**
-- **Success rate** — `outcome` field breakdown (success / partial / failed)
-- **Retries per session** — average and total
-- **Lessons added** — count and which agents contribute most
-- **Delegation stats** — mechanism (sub-agent vs background), tier distribution, success rate per agent
-- **Model usage** — which models used how often
-- **Activity timeline** — sessions per day/week
-
-### 2. GitHub PRs and Commits
-
-Use `gh` CLI commands (always prefix with `GH_PAGER=cat`):
-
-```bash
-# All PRs (open + closed + merged)
-GH_PAGER=cat gh pr list --state all --limit 100 --json number,title,state,createdAt,mergedAt,closedAt,author,additions,deletions,changedFiles,labels,headRefName
-
-# Recent commits on main
-GH_PAGER=cat gh api repos/{owner}/{repo}/commits --paginate -q '.[0:50] | .[] | {sha: .sha[0:7], date: .commit.author.date, message: .commit.message}' 2>/dev/null || git --no-pager log main --oneline -50
-```
-
-Compute:
-- **PR count** — total, open, merged, closed-without-merge
-- **Merge rate** — merged / (merged + closed-without-merge)
-- **Time to merge** — median and average (createdAt → mergedAt)
-- **PR size** — average additions, deletions, changedFiles
-- **Commit frequency** — commits per day/week on main
-- **Bogus/closed PRs** — PRs closed without merge (potential failed agent work)
-
-### 3. Tracker Issues
-
-Use tracker MCP tools (`list_issues`, `search_issues`):
-
-```
-list_issues with status filter for each state: Backlog, Todo, In Progress, Done, Cancelled
-```
-
-Compute:
-- **Issue count by status** — Backlog, Todo, In Progress, Done, Cancelled
-- **Completion rate** — Done / (Done + Cancelled + In Progress + Todo)
-- **Issues by label** — which areas have the most work
-- **Issues by priority** — distribution across Urgent/High/Medium/Low
-- **Cycle time** — average time from In Progress → Done (if dates available)
-- **Stale issues** — In Progress for >7 days without updates
-
-### 4. Deployments
-
-Use deployment platform tools (if available via MCP or CLI):
-
-Query deployments for all configured apps (see `project.instructions.md` for the app inventory).
-
-Compute:
-- **Total deployments** — count over last 30 days
-- **Deployment success rate** — ready / (ready + error + cancelled)
-- **Failure rate** — error / total
-- **Build times** — average, median, p95
-- **Deployments per day** — activity timeline
-- **Failed deployment details** — which commits/branches failed and why
-
-### 5. Panel Reviews (local)
-
-Read `.github/customizations/logs/panels.ndjson`.
-
-Compute:
-- **Total reviews** — count of panel runs
-- **Pass rate** — pass / total
-- **Must-fix vs should-fix** — average counts per review
-- **Retry rate** — reviews with attempt > 1
-- **Model usage** — which reviewer models used
-- **Reviews by panel key** — what gets reviewed most
-
-### 6. Agent Failures (DLQ)
-
-Read `.github/customizations/AGENT-FAILURES.md`.
-
-Compute:
-- **Total failures** — count of DLQ entries
-- **Failures by agent** — which agents fail most
-- **Failure status** — pending vs resolved
-- **Common root causes** — categorize failure reasons
-
-## Report Format
-
-Present the report as a structured markdown summary with these sections:
-
-```markdown
-# Project Metrics Dashboard
-> Generated: {date} | Period: Last 30 days
-
-## Executive Summary
-- X agent sessions, Y% success rate
-- Z PRs merged, W% merge rate
-- N deployments, M% success rate
-- P tracker issues completed
-
-## Agent Activity
-{sessions table, success rates, model usage}
-
-## Delegation Performance
-{per-agent delegation stats, tier distribution}
-
-## GitHub
-{PR stats, merge rates, commit frequency}
-
-## Task Board
-{issue distribution, completion rate, stale issues}
-
-## Deployments
-{success rate, failure rate, build times}
-
-## Panel Reviews
-{pass rate, retry rate, must-fix/should-fix stats}
-
-## Agent Failures (DLQ)
-{failure count, pending items}
-
-## Trends & Recommendations
-{observations, areas for improvement}
-```
-
-## Usage
-
-Run this prompt periodically (weekly recommended) to track project health. Compare with previous reports to identify trends.
-
-If session logs are empty (no data yet), still collect GitHub/tracker/deployment data and note that agent logging has just been enabled.