@pieerry/harness-kit 3.1.2 → 3.3.1
- package/.claude/.hk-version +1 -0
- package/.claude/agents/product-manager.md +2 -2
- package/.claude/agents/staff-software-engineer.md +2 -2
- package/.claude/commands/pipeline/continue.md +8 -8
- package/.claude/commands/pipeline/reset.md +4 -4
- package/.claude/commands/product-manager/prd.md +4 -4
- package/.claude/commands/product-manager/prp.md +7 -7
- package/.claude/commands/product-manager/run.md +9 -5
- package/.claude/commands/sse/dev.md +18 -13
- package/.claude/commands/sse/plan.md +8 -8
- package/.claude/commands/sse/pr-monitor.md +56 -0
- package/.claude/commands/sse/pr.md +22 -12
- package/.claude/commands/sse/run.md +15 -8
- package/.claude/commands/sse/test.md +16 -10
- package/.claude/conventions/README.md +12 -0
- package/.claude/hooks/activity-pre-read.sh +18 -0
- package/.claude/hooks/pipeline-session-start.sh +26 -2
- package/.claude/hooks/status-line.sh +17 -1
- package/.claude/plugins/product-manager/evals/prd-quality.md +3 -3
- package/.claude/plugins/product-manager/evals/prd-readiness.md +1 -1
- package/.claude/plugins/product-manager/evals/prp-context-readiness.md +5 -5
- package/.claude/plugins/product-manager/evals/prp-quality.md +1 -1
- package/.claude/plugins/product-manager/guides/pipeline.md +14 -14
- package/.claude/plugins/product-manager/guides/prd-guidelines.md +4 -4
- package/.claude/plugins/product-manager/guides/product-guidelines.md +16 -16
- package/.claude/plugins/product-manager/guides/prp-guidelines.md +16 -16
- package/.claude/plugins/product-manager/guides/writing-style.md +9 -9
- package/.claude/plugins/product-manager/sensors/prd-acceptance-criteria.md +6 -6
- package/.claude/plugins/product-manager/sensors/prd-structure.md +2 -2
- package/.claude/plugins/product-manager/sensors/prp-context-quality.md +6 -6
- package/.claude/plugins/product-manager/sensors/prp-links.md +2 -2
- package/.claude/plugins/product-manager/sensors/prp-structure.md +1 -1
- package/.claude/plugins/staff-software-engineer/evals/dev-quality.md +48 -0
- package/.claude/plugins/staff-software-engineer/evals/plan-quality.md +6 -6
- package/.claude/plugins/staff-software-engineer/evals/pr-quality.md +48 -0
- package/.claude/plugins/staff-software-engineer/evals/test-quality.md +48 -0
- package/.claude/plugins/staff-software-engineer/guides/coding-style.md +7 -7
- package/.claude/plugins/staff-software-engineer/guides/commit-style.md +3 -3
- package/.claude/plugins/staff-software-engineer/guides/conventions-override.md +13 -13
- package/.claude/plugins/staff-software-engineer/guides/pipeline.md +19 -15
- package/.claude/plugins/staff-software-engineer/hooks/post-eval-sse.sh +65 -0
- package/.claude/plugins/staff-software-engineer/hooks/post-write-sse.sh +60 -0
- package/.claude/plugins/staff-software-engineer/sensors/code-conventions.md +4 -4
- package/.claude/plugins/staff-software-engineer/sensors/dev-structure.md +46 -0
- package/.claude/plugins/staff-software-engineer/sensors/pr-structure.md +49 -0
- package/.claude/plugins/staff-software-engineer/sensors/test-coverage.md +6 -6
- package/.claude/plugins/staff-software-engineer/sensors/test-structure.md +46 -0
- package/.claude/plugins/staff-software-engineer/skills/backend/SKILL.md +8 -8
- package/.claude/plugins/staff-software-engineer/skills/devops/SKILL.md +3 -3
- package/.claude/plugins/staff-software-engineer/skills/mobile/SKILL.md +3 -3
- package/.claude/plugins/staff-software-engineer/skills/web/SKILL.md +2 -2
- package/.claude/scripts/activity.py +68 -0
- package/.claude/scripts/pr-monitor.py +113 -0
- package/.claude/scripts/stage-card.md +37 -34
- package/.claude/settings.json +74 -0
- package/.claude/settings.local.json +13 -1
- package/CLAUDE.md +6 -0
- package/README.md +161 -61
- package/VERSION +1 -1
- package/bin/hk.js +6 -1
- package/package.json +1 -1
- package/setup/install.sh +16 -8
- package/.claude/plugins/staff-software-engineer/hooks/post-eval-plan.sh +0 -43
- package/.claude/plugins/staff-software-engineer/hooks/post-write-plan.sh +0 -49
@@ -9,21 +9,21 @@ Score each dimension 0-10. Cite line numbers when scoring below 7. Weighted tota
 ## Rubric

 ### Clarity (weight 20%)
-
+Problem stated in 1-2 sentences? Names who suffers, how often, what it costs?

 - 10: crisp, specific, evidence-backed
 - 5: present but generic
 - 0: missing or vague

 ### Hypothesis (weight 15%)
-
+"If we X, then Y will Z, because W" hypothesis with numeric target?

 - 10: falsifiable, numeric, evidence-tied
 - 5: directional but missing target or evidence
 - 0: aspirational, no measurable claim

 ### Customer specificity (weight 10%)
-Real customers named with reasons each
+Real customers named with reasons each matters?

 - 10: concrete, differentiated
 - 5: segments named but thin
@@ -3,7 +3,7 @@
 Type: rule-based stage gate
 Mode: advisory

-Decides whether
+Decides whether PRD has enough content for declared stage to advance. Run after prd-quality passes.

 ## Stage gates

@@ -22,11 +22,11 @@ If any fail, abort and return sensor feedback.

 ## Phase 2: executor simulation

-You are
+You are coding agent that has never seen this codebase. Handed only this PRP. Cannot ask PM follow-ups. Read end-to-end and answer:

-1. Could you ship with only what is in
-2. List every place
-3. List every claim
+1. Could you ship with only what is in PRP plus linked files? (yes / partial / no)
+2. List every place you would need to stop and ask a human.
+3. List every claim looking plausible but you cannot verify from PRP alone.
 4. Rate one-shot-success likelihood from 0 to 1.

 ## Output format
@@ -48,4 +48,4 @@ PM confirms or requests another iteration.

 Phase 1 fails: re-run failing sensor, regenerate failed sections.
 Phase 2 fails criteria: regenerate weakest section based on blocking_questions.
-Phase 3 rejects: collect PM feedback, retry
+Phase 3 rejects: collect PM feedback, retry loop.
@@ -30,7 +30,7 @@ Blueprint specific enough that executor never decides naming, file location, or
 - 0: prose handwaving

 ### Validation executability (weight 15%)
-Validation gates are real commands
+Validation gates are real commands executor copy-pastes? Manual checks specific?

 - 10: copy-paste ready
 - 5: commands with placeholders
@@ -1,13 +1,13 @@
 # Pipeline

-Shared rules for PRD and PRP generation. Edit retry, approval, publish,
+Shared rules for PRD and PRP generation. Edit retry, approval, publish, token accounting here.

 ## Stages per artifact

-1. Generate
+1. Generate artifact using template in skills/{prd|prp}/SKILL.md.
 2. Apply sensors listed in skills/{prd|prp}/SKILL.md.
 3. Apply evals listed in skills/{prd|prp}/SKILL.md.
-4. Append
+4. Append approval marker. Publish hook fires.

 ## Retry policy

@@ -18,7 +18,7 @@ Trigger on:
 - eval weighted total below threshold (default 8.0)

 Strategy:
-1. Read
+1. Read feedback.
 2. Regenerate only failed or low-scoring sections.
 3. Re-run sensors and eval.

@@ -42,7 +42,7 @@ Triggers hooks/post-eval-{prd|prp}.sh.

 Local: file stays in outputs/{prd|prp}/.

-Confluence: fires if JIRA_USERNAME and JIRA_API_TOKEN
+Confluence: fires if JIRA_USERNAME and JIRA_API_TOKEN set. Calls scripts/confluence-publish.py. Missing creds skips silently.

 After publish, hook appends:
 ```
@@ -51,34 +51,34 @@ After publish, hook appends:

 ## Token accounting

-Each phase brackets
+Each phase brackets measurable token window. Phases:
 - prd-generate: from skill invocation to first save in outputs/prd/
-- prd-validate: from first save to approval marker (
+- prd-validate: from first save to approval marker (sensors + evals)
 - prp-generate: from skill invocation to first save in outputs/prp/
 - prp-validate: from first save to approval marker

-Markers
+Markers in outputs/.markers/{feature_id}.{phase}.{start|end}, each `{"timestamp": ISO, "session_id": ""}`. Skill writes .start; hooks write .end.

-After
+After eval passes, publish hook runs scripts/token-phase.py for both phases. Script reads Claude transcript JSONL, sums usage tokens (input, output, cache_read, cache_creation) within each window, appends phase entry to outputs/tokens/{feature_id}.json. Markers deleted after consumption.

-
+Post-eval hook also appends inline summary to artifact:
 ```
 <!-- tokens: outputs/tokens/{feature_id}.json in={N} out={N} cache_r={N} -->
 ```

-Future phases (dev, code review, launch) append
+Future phases (dev, code review, launch) append entries to same outputs/tokens/{feature_id}.json by reusing feature_id slug.

-If
+If transcript not readable, script logs warning and exits 0. Token accounting never blocks publish.

 ## Orchestrator order

 1. PRD: stages 1-4.
-2. hooks/pre-prp-check.sh validates
+2. hooks/pre-prp-check.sh validates PRD approved marker.
 3. PRP: stages 1-4.
 4. Return summary.

 ## Stop conditions

-- 3 failed attempts on
+- 3 failed attempts on same artifact
 - missing required input after one clarification
 - pre-prp-check.sh denies PRP start
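The token-accounting hunk above describes summing usage tokens that fall inside a start/end marker window. The sketch below illustrates that windowed sum only; it is not the package's scripts/token-phase.py. The transcript entry shape (a top-level `timestamp` and a `usage` object keyed by `input`, `output`, `cache_read`, `cache_creation`) and the names `parse_ts` and `sum_usage` are assumptions made for illustration.

```python
import json
from datetime import datetime

# Hypothetical usage keys, named after the four counters listed in the diff.
FIELDS = ("input", "output", "cache_read", "cache_creation")


def parse_ts(value: str) -> datetime:
    # Map a trailing "Z" to an explicit offset so older Python versions parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def sum_usage(jsonl_lines, start_iso: str, end_iso: str) -> dict:
    """Sum usage counters for entries whose timestamp lies in [start, end]."""
    start, end = parse_ts(start_iso), parse_ts(end_iso)
    totals = dict.fromkeys(FIELDS, 0)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
            stamp = parse_ts(entry["timestamp"])
            in_window = start <= stamp <= end
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # unreadable entry: skip it, never fail the caller
        if not in_window:
            continue
        usage = entry.get("usage") or {}
        for field in FIELDS:
            totals[field] += int(usage.get(field) or 0)
    return totals
```

Skipping malformed lines instead of raising mirrors the stated rule that token accounting never blocks publish.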
@@ -2,11 +2,11 @@

 Rules specific to PRDs. General PM rules in product-guidelines.md.

-Business only. Zero technical detail. File paths, APIs, schemas belong in
+Business only. Zero technical detail. File paths, APIs, schemas belong in PRP.

 User stories: As a {persona}, I want {action}, so that {benefit}.

-Metrics need baseline, target, horizon
+Metrics need baseline, target, horizon per row. No empty cells.

 Mermaid only, never ASCII.

@@ -18,10 +18,10 @@ Word count by stage:
 - Solution Review: 800-1200
 - Launch Readiness: 1500-2000

-Required sections
+Required sections per stage:
 - Team Kickoff: 1, 2, 8
 - Planning Review: + 5
 - Solution Review: + 4, 7
 - Launch Readiness: all

-Stage progression: do not skip stages. Each gate adds requirements, never removes
+Stage progression: do not skip stages. Each gate adds requirements, never removes.
@@ -1,23 +1,23 @@
 # Product Guidelines

-How PMs think about product. Read before drafting
+How PMs think about product. Read before drafting PRD.

-##
+## Ship outcomes, not features

-Every PRD touches
-- Who pays for
+Every PRD touches real user or customer cohort. Always answer:
+- Who pays for pain today?
 - Whose workflow changes?
-- What metric in
+- What metric in customer's business or experience moves?

-If you cannot answer all three,
+If you cannot answer all three, PRD not ready.

 ## Squads

-Defined per org in `context-library/business-info.md` and `context-library/squads/`. Multi-squad PRDs: name
+Defined per org in `context-library/business-info.md` and `context-library/squads/`. Multi-squad PRDs: name primary squad, call out cross-squad interfaces.

 ## Customer hierarchy

-When prioritizing, make trade-offs between cohorts explicit, not invisible.
+When prioritizing, make trade-offs between cohorts explicit, not invisible. Common ranking template (override per org):

 1. Paying customers with high operational impact
 2. Reliability partners (SLAs, contracts)
@@ -26,9 +26,9 @@ When prioritizing, make trade-offs between cohorts explicit, not invisible. A co

 ## Metrics

-Prefer metrics already on existing dashboards. New metrics need
+Prefer metrics already on existing dashboards. New metrics need owner to instrument.

-Strong examples (
+Strong examples (shape of strong metric — specific, owned, instrumented):
 - error rate of X by segment
 - first-attempt success rate
 - latency P95 of Y action
@@ -40,17 +40,17 @@ Weak:
 - "Improved efficiency" with no number
 - "Reduced friction" with no baseline

-## Non-goals
+## Non-goals mandatory

-Every PRD lists at least one non-goal. If you cannot think of one, you have not scoped
+Every PRD lists at least one non-goal. If you cannot think of one, you have not scoped feature.

-## Rollout
+## Rollout not "ship it"

 Every PRD ends with phases, audience per phase, pass criteria per phase, rollback. Feature flags by default.

 ## Kill criteria

-Decide before launch what makes you turn
+Decide before launch what makes you turn feature off. Examples:
 - error rate above X%
 - latency P95 above Yms
 - adoption below Z% after N days
@@ -59,7 +59,7 @@ Decide before launch what makes you turn the feature off. Examples:

 In PRD:
 - why we are building
-- who feels
+- who feels pain
 - success metric and target
 - open questions about scope

@@ -68,7 +68,7 @@ In PRP:
 - API endpoints, validation commands
 - open questions about implementation

-If
+If question is "should we build this", PRD. If "how do we build this", PRP.

 ## Diagrams

@@ -1,32 +1,32 @@
 # PRP Guidelines

-
+PRP = PRD + curated codebase intelligence + agent runbook. If executor leaves document to figure out what to do, PRP incomplete.

 ## Three pillars

 ### Context

-Everything
+Everything executor needs, included or linked. Specific:
 - file paths with line numbers when localized
 - existing patterns to mimic, one concrete example each
-- external docs scoped to
+- external docs scoped to relevant section
 - known gotchas (concurrency, idempotency, retries, timezones)

-If
+If codebase has similar feature, point to it. Coding agents work best by analogy.

 ### Implementation detail

 Pseudocode or numbered steps concrete enough to translate one-to-one. Include:
 - class names, method signatures, return types where decided
-- schema changes with
+- schema changes with actual migration file or DDL
 - endpoint definitions: method, path, request/response shape
 - logging, metrics, alert hooks

-You do not write
+You do not write code. You remove every "what should I name this" decision.

 ### Validation gates

-Real commands
+Real commands executor runs and passes. Vague checks fail.

 Bad:
 - "Make sure tests pass"
@@ -36,29 +36,29 @@ Good:
 - `npm --prefix web run typecheck`
 - `./gradlew :billing-service:detekt`

-Manual gates count,
-- [ ] Open `https://staging.example.com/invoices/new` and verify
+Manual gates count, must be specific:
+- [ ] Open `https://staging.example.com/invoices/new` and verify deadline badge renders for region `EU-WEST`.

-##
+## Good PRP

 1. Executor reads top-to-bottom, never tabs out except to view linked files.
-2. No decisions left for
-3. Every TODO has
-4. Validation block
+2. No decisions left for executor that change scope.
+3. Every TODO has owner.
+4. Validation block copy-pasteable.

 ## Anti-patterns

 - Prose mush where bullets would do
-- Restating
+- Restating PRD instead of linking
 - Hidden scope ("also we should refactor X")
 - Optimism about state ("there is probably a helper for this already")
 - Fabricated file paths or class names

-## When to split
+## When to split PRP

 Split if:
 - scope crosses two repos with independent deploy cycles
 - one PR would exceed ~600 lines
 - independent validation gates pass or fail separately

-Otherwise keep
+Otherwise keep one. Context fragmentation costs more than slightly longer doc.
@@ -1,13 +1,13 @@
 # Writing Style

-How PRDs and PRPs read
+How PRDs and PRPs read.

 ## Voice

-- Direct. Lead with
+- Direct. Lead with point.
 - Specific. Real numbers, real names, real quotes. "47% abandon at step 3" beats "many users struggle".
 - Conversational, professional. Contractions ok. Use "we", not "I".
-- Action-oriented. Every section helps
+- Action-oriented. Every section helps reader decide or act.

 ## Length

@@ -32,21 +32,21 @@ Dead giveaways for AI fluff:

 ## Punctuation

-- No em-dashes. Use commas, periods, parentheses, or
+- No em-dashes. Use commas, periods, parentheses, or new sentence.
 - Oxford comma: yes.
-- One space after
+- One space after period.

 ## Negative parallelism

-Avoid. Lead with
+Avoid. Lead with positive:
 - bad: "Don't use X. Use Y."
 - good: "Use Y."

 ## AI-detector test

-If
+If paragraph sounds like LinkedIn post, rewrite. Real human writing:
 - occasional sentence starting with "And" or "But"
-- specific details
+- specific details generic writer would not know
 - slightly imperfect rhythm
 - mid-paragraph asides

@@ -64,7 +64,7 @@ Not:

 ## Tables

-Use when you have comparison across 3+ items, structured data with same fields per row, or decision matrices. Otherwise
+Use when you have comparison across 3+ items, structured data with same fields per row, or decision matrices. Otherwise bullets.

 ## Diagrams

@@ -16,23 +16,23 @@ Mode: hard gate

 ## Checks

-User story format. Every
+User story format. Every Solution Overview bullet starting with "As a" must follow:

 ```
 - As a {persona}, I want {action}, so that {benefit}.
 ```

-Metric completeness. Success Metrics table: every row
+Metric completeness. Success Metrics table: every row needs non-empty Baseline, Target, Horizon. Cells with only "TBD", "N/A", "?", or empty fail.

-Guardrails block.
+Guardrails block. "Guardrails" line must exist in Success Metrics.

-Kill criteria.
+Kill criteria. "Kill criteria:" line must exist in Success Metrics and contain at least one digit.

 Rollout phases. Rollout table needs at least 2 rows.

-Non-goals listed. Scope and Non-Goals
+Non-goals listed. Scope and Non-Goals needs Non-goals subsection with 1-3 bullets.

-Open gaps budget.
+Open gaps budget. `NOT FOUND` marker count must be 5 or fewer.

 ## On failure

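Three of the checks in the hunk above are mechanical enough to sketch: metric completeness, the kill-criteria digit rule, and the open-gaps budget. This is a minimal illustration, not the sensor's real implementation. It assumes rows arrive as dictionaries keyed by column name and that matching is case-insensitive; the function names are hypothetical.

```python
# Cell values that count as "not filled in", per the metric completeness check.
PLACEHOLDERS = {"", "tbd", "n/a", "?"}
REQUIRED_COLUMNS = ("Baseline", "Target", "Horizon")


def cell_is_filled(cell: str) -> bool:
    # Case-insensitive comparison is an assumption; the diff does not specify it.
    return cell.strip().lower() not in PLACEHOLDERS


def row_is_complete(row: dict) -> bool:
    """Every required column holds something other than a placeholder."""
    return all(cell_is_filled(row.get(col, "")) for col in REQUIRED_COLUMNS)


def kill_criteria_ok(section_text: str) -> bool:
    """A 'Kill criteria:' line exists and contains at least one digit."""
    for line in section_text.splitlines():
        if "kill criteria:" in line.lower():
            return any(ch.isdigit() for ch in line)
    return False


def open_gaps_ok(document: str, budget: int = 5) -> bool:
    """The NOT FOUND marker count stays within the budget (5 or fewer)."""
    return document.count("NOT FOUND") <= budget
```

Keeping each rule in its own function makes it possible to report the failing check by name, which fits the "return sensor feedback" step the pipeline describes.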
@@ -3,7 +3,7 @@
 Type: deterministic
 Mode: hard gate

-## Required sections (all
+## Required sections (all present, in order)

 - Problem and Hypothesis
 - Customers
@@ -36,4 +36,4 @@ Mode: hard gate

 ## On failure

-Block publish. Return
+Block publish. Return missing sections, forbidden tokens, rule violations. Agent regenerates failed parts only.
@@ -3,7 +3,7 @@
 Type: deterministic plus heuristic
 Mode: hard gate

-
+PRP must give executor enough context to ship without coming back to ask.

 ## Required sections

@@ -13,9 +13,9 @@ A PRP must give an executor enough context to ship without coming back to ask.

 ## Checks within Context

-File paths referenced. Repos and files touched table must reference at least 2 real file paths.
+File paths referenced. Repos and files touched table must reference at least 2 real file paths. File path has `/` and known extension (java, kt, ts, tsx, js, jsx, vue, py, sql, yaml, yml, md, sh, gradle).

-Patterns with concrete examples. Patterns to follow
+Patterns with concrete examples. Patterns to follow needs at least 1 pattern. Each pattern must include `Example in codebase:` line pointing to concrete path.

 Known gotchas. Subsection `### Known gotchas` must exist with at least 1 bullet.

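The file-path rule added above (a `/` plus a known extension, at least 2 paths) can be read as a small heuristic. The sketch below is one possible reading, not the package's sensor code. It assumes paths appear in inline code spans and that an optional `:line` or `:start-end` suffix should be ignored; both assumptions, and the function names, are illustrative.

```python
import re

# Extension list copied from the check text in the diff.
KNOWN_EXTENSIONS = {
    "java", "kt", "ts", "tsx", "js", "jsx", "vue",
    "py", "sql", "yaml", "yml", "md", "sh", "gradle",
}


def looks_like_file_path(token: str) -> bool:
    """True when the token has a '/' and ends in a known extension."""
    # Assumption: drop a trailing ":42" or ":10-20" line reference first.
    token = re.sub(r":\d+(-\d+)?$", "", token)
    if "/" not in token:
        return False
    name = token.rsplit("/", 1)[-1]
    if "." not in name:
        return False
    return name.rsplit(".", 1)[-1].lower() in KNOWN_EXTENSIONS


def count_file_paths(text: str) -> int:
    """Count distinct inline code spans that look like file paths."""
    spans = re.findall(r"`([^`\s]+)`", text)
    return len({span for span in spans if looks_like_file_path(span)})


def has_enough_paths(text: str, minimum: int = 2) -> bool:
    return count_file_paths(text) >= minimum
```

Note that this only tests the shape of a token. Whether the path exists in the repository is a separate check the heuristic cannot answer.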
@@ -23,9 +23,9 @@ External documentation. Subsection `### External documentation` must exist. Eith

 ## Checks within Implementation blueprint

-Numbered steps in
+Numbered steps in fenced code block.

-At least 1 numbered step references
+At least 1 numbered step references class, file, or command (uppercase or path-shaped token).

 ## Checks within What

@@ -35,7 +35,7 @@ Success criteria must include at least 1 line matching:

 ## Open gaps budget

-
+`NEEDS REVIEW`, `NOT FOUND`, or `TBD` marker count must be 5 or fewer.

 ## On failure

@@ -7,7 +7,7 @@ Implemented in scripts/link-validator.py. Same rules when agent self-checks.

 ## Hard checks (block)

-Source PRD resolvable.
+Source PRD resolvable. "Source PRD:" line must point to path existing under outputs/prd/, relative to PRP file, or relative to repo root.

 No localhost URLs. `http://localhost` or `http://127.0.0.1` not allowed as pinned references.

@@ -15,7 +15,7 @@ GitHub permalinks. Links like `github.com/.../blob/{main|master|develop}/...` no

 ## Soft checks (warn)

-Path format. Inline code spans
+Path format. Inline code spans looking like file paths (have `/` and end with extension) should end in recognized extension.

 URL syntax. http/https URLs must be syntactically valid.

@@ -0,0 +1,48 @@
+# Eval: Dev Quality
+
+Type: LLM-judge
+Mode: quality gate
+Threshold: weighted total >= 8.0
+
+Score each dimension 0-10. Cite file paths or commit SHAs when below 7. Weighted total = sum(score x weight%) / 100.
+
+## Rubric
+
+### Plan fidelity (weight 25%)
+Did implementation cover every step in source plan? Any scope creep or missed items?
+
+### Files touched specificity (weight 15%)
+Real, concrete paths listed (not placeholders)? Match what plan named?
+
+### Commit hygiene (weight 20%)
+Conventional Commits prefix correct. Small commits (1-4 files, < 100 lines ideal). One concern per commit. Messages explain why, not what.
+
+### Test coverage (weight 20%)
+Every new feature/bugfix has matching tests. Edge cases covered. Tests run before commit.
+
+### Convention adherence (weight 10%)
+Coding style, framework version, package layout match `coding-style.md` and `conventions/{area}.md`.
+
+### Blocker quality (weight 10%)
+If blockers exist, specific (file:line, error message). If none, state `none` explicitly.
+
+## On failure (total below 8.0)
+
+Retry. Regenerate weakest section only. Max 3 attempts.
+
+## Output
+
+```json
+{
+  "scores": {
+    "plan_fidelity": 0,
+    "files_touched": 0,
+    "commit_hygiene": 0,
+    "test_coverage": 0,
+    "convention_adherence": 0,
+    "blocker_quality": 0
+  },
+  "weighted_total": 0.0,
+  "feedback": ["dimension: specific issue with file or sha ref"]
+}
+```
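The Dev Quality rubric above defines its gate as weighted total = sum(score x weight%) / 100 with a threshold of 8.0. A minimal sketch of that arithmetic follows. The weights and score keys are taken from the rubric and its JSON output shape; the function names `weighted_total` and `passes` are hypothetical, and in the package itself the scores come from an LLM judge.

```python
# Weights from the Dev Quality rubric, in percent (they sum to 100).
DEV_QUALITY_WEIGHTS = {
    "plan_fidelity": 25,
    "files_touched": 15,
    "commit_hygiene": 20,
    "test_coverage": 20,
    "convention_adherence": 10,
    "blocker_quality": 10,
}

THRESHOLD = 8.0  # "Threshold: weighted total >= 8.0"


def weighted_total(scores: dict, weights: dict) -> float:
    """Return sum(score x weight%) / 100 for scores on a 0-10 scale."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must name the same dimensions")
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    for name, score in scores.items():
        if not 0 <= score <= 10:
            raise ValueError(f"{name} score out of range: {score}")
    return sum(scores[name] * weights[name] for name in weights) / 100


def passes(scores: dict, weights: dict = DEV_QUALITY_WEIGHTS,
           threshold: float = THRESHOLD) -> bool:
    """True when the weighted total meets the quality gate."""
    return weighted_total(scores, weights) >= threshold
```

For example, scores of 9, 8, 7, 8, 9, 10 in rubric order give (225 + 120 + 140 + 160 + 90 + 100) / 100 = 8.35, which clears the gate. A run scoring 7 on every dimension totals 7.0 and triggers the retry path.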
@@ -9,22 +9,22 @@ Score each dimension 0-10. Cite line numbers when below 7. Weighted total = sum(
 ## Rubric

 ### Scope clarity (weight 20%)
-
+What is being built clear? Tied to source PRP?

 ### Files touched specificity (weight 20%)
-
+Files and modules concrete? Real paths, not placeholders?

 ### Execution flow (weight 20%)
-
+Steps actionable in order? Could fresh engineer follow them?

 ### Risk awareness (weight 15%)
-
+Real risks called out with mitigations?

 ### Rollout (weight 15%)
-
+Phased plan or feature flag strategy?

 ### Tests (weight 10%)
-
+Plan names test cases to cover?

 ## On failure (total below 8.0)

@@ -0,0 +1,48 @@
+# Eval: PR Quality
+
+Type: LLM-judge
+Mode: quality gate
+Threshold: weighted total >= 8.0
+
+Score each dimension 0-10. Cite line refs from PR body when below 7. Weighted total = sum(score x weight%) / 100.
+
+## Rubric
+
+### Title quality (weight 20%)
+Conventional Commits prefix correct (`feat`, `fix`, `chore`, etc). Scope present if relevant. <= 70 chars. Concrete, not vague.
+
+### Summary clarity (weight 20%)
+Bullets explain what changed and why. Reviewer can grasp change without opening files. No marketing language.
+
+### Test plan completeness (weight 20%)
+Markdown checklist covers golden path and edge cases. Items are checkable actions, not vague ("test it works").
+
+### Links and refs (weight 15%)
+Source plan and dev paths linked. Ticket id (e.g. `PROJ-123`) referenced if branch carried one.
+
+### Risk callouts (weight 15%)
+Migration risks, feature flags, rollback plan named when applicable. Marked `none` if truly nothing.
+
+### Draft / readiness signal (weight 10%)
+Draft status matches stage of work. If `--ready`, CI gates passed and reviewers assignable. If draft, what's still pending is stated.
+
+## On failure (total below 8.0)
+
+Retry. Regenerate weakest section only. Max 3 attempts.
+
+## Output
+
+```json
+{
+  "scores": {
+    "title_quality": 0,
+    "summary_clarity": 0,
+    "test_plan_completeness": 0,
+    "links_and_refs": 0,
+    "risk_callouts": 0,
+    "draft_readiness": 0
+  },
+  "weighted_total": 0.0,
+  "feedback": ["dimension: specific issue with body line ref"]
+}
+```