@haposoft/cafekit 0.7.29 → 0.8.1
- package/README.md +21 -12
- package/package.json +5 -2
- package/src/claude/CLAUDE.md +81 -135
- package/src/claude/agents/brainstormer.md +24 -13
- package/src/claude/agents/code-auditor.md +1 -1
- package/src/claude/agents/spec-maker.md +2 -2
- package/src/claude/agents/test-runner.md +10 -8
- package/src/claude/rules/ai-dev-rules.md +36 -51
- package/src/claude/rules/hook-protocols.md +35 -0
- package/src/claude/rules/orchestrator.md +11 -0
- package/src/claude/rules/workflow.md +41 -45
- package/src/claude/skills/brainstorm/SKILL.md +123 -39
- package/src/claude/skills/chrome-devtools/scripts/package.json +3 -1
- package/src/claude/skills/code-review/references/spec-compliance-review.md +1 -1
- package/src/claude/skills/develop/SKILL.md +4 -4
- package/src/claude/skills/develop/references/quality-gate.md +2 -2
- package/src/claude/skills/git/SKILL.md +19 -2
- package/src/claude/skills/git/references/finish-branch.md +61 -0
- package/src/claude/skills/pdf/scripts/__pycache__/check_bounding_boxes.cpython-314.pyc +0 -0
- package/src/claude/skills/specs/SKILL.md +38 -16
- package/src/claude/skills/specs/references/codebase-analysis.md +33 -2
- package/src/claude/skills/specs/references/research-strategy.md +54 -7
- package/src/claude/skills/specs/references/review.md +1 -1
- package/src/claude/skills/specs/rules/tasks-generation.md +3 -3
- package/src/claude/skills/specs/templates/research.md +46 -0
- package/src/claude/skills/specs/templates/task.md +4 -2
- package/src/claude/skills/sync/SKILL.md +2 -2
- package/src/claude/skills/sync/references/sync-protocols.md +4 -4
- package/src/claude/skills/test/SKILL.md +4 -1
- package/src/claude/skills/test/references/execution-strategy.md +3 -1
- package/src/claude/skills/test/references/test-memory.md +2 -2
@@ -11,10 +11,10 @@ argument-hint: "<feature-description> | status | resume | --validate | archive"
 
 ## Overview
 
-This skill provides a 10-step workflow to transform ideas into specs:
+This skill provides a 10-step workflow to transform ideas into evidence-backed specs:
 
 ```
-Analyze → Dependency Scan → Complexity Assessment → Init → Requirements → Design → Tasks → Hydration → Review → Completion
+Analyze → Dependency Scan → Complexity Assessment → Init → Evidence Gate + Requirements → Design → Tasks → Hydration → Review → Completion
 ```
 
 **CRITICAL:** Before starting, the system MUST:
@@ -46,6 +46,7 @@ Analyze → Dependency Scan → Complexity Assessment → Init → Requirements
 - `task_files` in `spec.json` MUST exactly match the real files under `tasks/` after Step 7.
 - `task_registry` in `spec.json` MUST exist once task files are generated and MUST contain one entry per task file, keyed by relative path.
 - `ready_for_implementation` is a hard gate, not a convenience flag. Never set it before the finalization audit passes.
+- Non-trivial specs MUST have an evidence trail in `research.md` before requirements, design, or tasks are finalized. Evidence can be codebase scout findings, external/current research, or an explicit skip rationale.
 
 ### Output Criteria
 - Never implement code — only create spec documents
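The `task_files` / `task_registry` invariants in this hunk lend themselves to a mechanical check. The following is a minimal sketch, not part of the package: the function name `audit_task_files` is hypothetical, and it assumes `task_files` is a JSON list of paths relative to the spec directory and `task_registry` is a JSON object keyed by the same relative paths.

```python
import json
from pathlib import Path


def audit_task_files(spec_dir: Path) -> list[str]:
    """Return failure messages; an empty list means the invariants hold.

    Hypothetical helper: assumes spec.json sits in spec_dir and that
    task files live directly under spec_dir / "tasks".
    """
    spec = json.loads((spec_dir / "spec.json").read_text(encoding="utf-8"))
    listed = set(spec.get("task_files", []))
    registry = set(spec.get("task_registry", {}))  # keys are relative paths
    on_disk = {
        p.relative_to(spec_dir).as_posix()
        for p in (spec_dir / "tasks").glob("*.md")
    }

    failures = []
    for path in sorted(listed - on_disk):
        failures.append(f"task_files entry missing on disk: {path}")
    for path in sorted(on_disk - listed):
        failures.append(f"task file not listed in task_files: {path}")
    for path in sorted(on_disk - registry):
        failures.append(f"task file missing from task_registry: {path}")
    for path in sorted(registry - on_disk):
        failures.append(f"task_registry entry missing on disk: {path}")
    return failures
```

Under these assumptions, a caller would refuse to set `ready_for_implementation` while the returned list is non-empty.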
@@ -85,9 +86,11 @@ Display selection menu via `AskUserQuestion`:
 ### When called WITH a feature description
 
 System auto-analyzes the description:
-- If description is too short (< 20 words) or
+- If description is too short (< 20 words) or missing one concrete detail → stop and ask 1-2 clarifying questions
+- If the idea has unresolved architecture choices, unclear acceptance criteria, unclear scope boundaries, or multiple plausible approaches → stop and route to `/hapo:brainstorm <idea>` before creating spec artifacts
 - If task is simple (small bugfix, config change) → suggest "A spec may not be needed for this. Continue anyway?"
 - If task is complex (multi-module, security/migration related) → auto-activate deep research, ask user 3 scope questions
+- For non-trivial specs, execute the Step 5 Evidence Gate before writing final requirements. Do not design from memory when codebase or current external evidence can answer the question.
 
 ### When called WITH `--validate` argument
 
@@ -111,7 +114,9 @@ flowchart TD
 A["Call /hapo:specs"] --> B{Has description?}
 B -->|No| C["Menu: init / status / resume / --validate / archive"]
 B -->|Yes| D["Step 1: Analyze description"]
-D -->
+D --> DB{"Needs pre-spec brainstorm?"}
+DB -->|Yes| DB2["Stop: run /hapo:brainstorm with same idea"]
+DB -->|No| E{Clear enough?}
 E -->|No| F["Ask user 1-2 clarifying questions"]
 F --> D
 E -->|Yes| G["Step 2: Scan specs/ for related specs"]
@@ -125,11 +130,11 @@ flowchart TD
 H3 -->|No| K["Keep default scope"]
 J --> L["Step 4: Init — create specs/<feature>/"]
 K --> L
-L --> M["Step
-M --> N{
-N -->|
-O -->
-N -->|
+L --> M["Step 5A: Evidence Gate — scout + research"]
+M --> N{Evidence sufficient?}
+N -->|No| O["Ask user / run targeted scout / external research"]
+O --> M
+N -->|Yes| P["Step 5B: Requirements — write EARS"]
 P --> Q["Step 6: Design — pick discovery mode"]
 Q --> R["Write design.md"]
 R --> S["Step 7: Tasks — split into individual files"]
@@ -148,6 +153,12 @@ flowchart TD
 
 ### Step 1: Analyze Description
 - Assess clarity and complexity of the description
+- Route to `hapo:brainstorm` before creating files when:
+- the expected output or acceptance criteria are not concrete
+- the scope boundary is unknown
+- the request has 2-3 viable architectures and no user-approved direction
+- the feature spans 3+ independent subsystems and needs decomposition
+- the user is explicitly asking to explore, compare, debate, or decide
 - **Multimodal & Document Auto-Ingestion (MANDATORY)**: If the input includes file paths or URLs pointing to images, audio, video, or Office documents, you MUST spawn the matching subagent to extract content BEFORE proceeding:
 - `.mp3`, `.wav`, `.mp4`, `.mov`, `.jpg`, `.png`, `.webp` → `Task(subagent_type="hapo:ai-multimodal", prompt="Transcribe/Analyze [path]")`
 - `.pdf` → `Task(subagent_type="hapo:pdf", prompt="Extract text and tables from [path]")`
@@ -185,12 +196,19 @@ Load: `references/scope-inquiry.md`
 - `expansion_policy`: `requires-user-approval`
 - Do NOT generate requirements, design, or tasks at this step
 
-### Step 5: Requirements & Research
+### Step 5: Evidence Gate, Requirements & Research
 - Read `spec.json` — stop if init hasn't completed
 - Stop if requirements already exist, unless user wants to regenerate
 - Respect `scope_lock` — keep new requirements within `in_scope`
--
--
+- Load `references/research-strategy.md` and `references/codebase-analysis.md`
+- Classify evidence needs before writing requirements:
+- **Targeted codebase scout is mandatory** when the spec changes existing behavior, touches an API/CLI/package export/schema/auth/session/permission/config/hook/runtime contract, lacks exact file paths, may invalidate tests, resumes an older spec, or crosses monorepo/package/runtime boundaries.
+- **External/current research is mandatory** when the spec depends on third-party APIs, libraries, platform policies, AI providers/models/tooling, security/auth/payment/privacy/delete-data rules, performance/accessibility/SEO/security standards, or the user asks for "best", "optimal", "latest", "recommended", or equivalent.
+- **Skip evidence gathering only** for trivial one-file edits, internal text/docs changes, isolated new files with no integration points, or decisions already backed by a recent user-provided report. Record the skip rationale in `research.md`.
+- Codebase scout must be targeted, not a blind full-repo crawl. Identify relevant files/modules, current patterns, existing tests, contracts, and likely blast radius.
+- External/current research must prefer official docs, standards, primary sources, or maintained upstream references. Record source links and the date/context of the finding.
+- Write `research.md` before final requirements. It MUST include an Evidence Summary with: codebase scout result, external research result or skip rationale, selected decision, rejected alternatives, remaining gaps, and downstream task/test implications.
+- If evidence exposes unresolved architecture choices, unclear acceptance criteria, or multiple viable approaches with no obvious winner, stop and route to `/hapo:brainstorm` instead of forcing a spec.
 - Write requirements in **EARS** format (see `rules/ears-format.md`)
 - **Feasibility Check:** Cross-check each requirement against known technical constraints from `research.md`.
 - Each requirement gets a unique numeric ID
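The Step 5 classification rules above can be read as a small decision function. The sketch below is illustrative only: `SpecTraits` and `classify_evidence` are hypothetical names, and the boolean flags collapse the longer trigger lists from the hunk into a few representative fields.

```python
from dataclasses import dataclass


@dataclass
class SpecTraits:
    """Hypothetical flags describing a spec request (not a cafekit type)."""
    changes_existing_behavior: bool = False
    touches_contract: bool = False        # API/CLI/schema/auth/config/hook/...
    lacks_exact_file_paths: bool = False
    depends_on_third_party: bool = False
    asks_for_best_or_latest: bool = False
    trivial_one_file_edit: bool = False


def classify_evidence(traits: SpecTraits) -> dict:
    """Map spec traits to the evidence work the gate would require."""
    scout = (
        traits.changes_existing_behavior
        or traits.touches_contract
        or traits.lacks_exact_file_paths
    )
    external = traits.depends_on_third_party or traits.asks_for_best_or_latest
    # Skipping is allowed only when no mandatory trigger fired; the skip
    # rationale still has to be recorded in research.md.
    skip = traits.trivial_one_file_edit and not (scout or external)
    return {
        "codebase_scout": scout,
        "external_research": external,
        "skip_with_rationale": skip,
    }
```

Note that a mandatory trigger overrides the "trivial edit" flag in this sketch, which is one reading of "skip evidence gathering only for" in the rule text.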
@@ -208,6 +226,7 @@ Load: `references/scope-inquiry.md`
 - **full**: integration, security, schema, or performance
 - Record findings in `research.md` before finalizing design
 - Write `design.md` from template `templates/design.md` (see `rules/design-principles.md`)
+- Design decisions MUST trace back to `research.md` evidence. If a design choice lacks evidence and is not a user-approved constraint, gather more evidence or ask the user before finalizing.
 - Add diagrams only when design has multi-step or cross-boundary flows
 - For auth/session, transport/entrypoint, persistence/schema, generated-artifact, or runtime-sensitive work, the design MUST fill the `Canonical Contracts & Invariants` section and tasks MUST inherit the same decisions verbatim.
 - Update `spec.json` phase, timestamps, discovery mode
@@ -218,7 +237,8 @@ Load: `references/scope-inquiry.md`
 - Load `rules/tasks-generation.md` for core principles
 - Load `rules/tasks-parallel-analysis.md` for parallel markers (default: enabled)
 - Each task file follows template `templates/task.md`
--
+- `Related Files` and test plans must inherit paths, contracts, and test targets from the codebase scout. If exact files/tests cannot be named for an enhancement, run targeted inspect before generating tasks.
+- Each task file MUST include `Completion Criteria` and `Task Test Plan & Verification Evidence` sections detailed enough that a downstream quality gate can prove the task is truly done.
 - Build `spec.json.task_registry` alongside `task_files`. For each task file, register at minimum:
 - `id`
 - `title`
@@ -310,9 +330,10 @@ Load: `references/review.md` + `rules/design-review.md`
 - FAIL if any path in `task_files` does not exist on disk
 - FAIL if any task file exists on disk but is missing from `task_registry`
 - FAIL if any path in `task_registry` does not exist on disk
+- FAIL if a newly generated non-trivial spec lacks a `research.md` Evidence Summary with codebase scout result, external research result or skip rationale, selected decision, rejected alternatives, and downstream task/test implications.
 - FAIL if any requirement or NFR mapping uses non-numeric labels (`NFR-1`, `SEC-1`, etc.)
-- FAIL if a task lacks `Completion Criteria` or `Verification & Evidence`
-- FAIL if accepted validation decisions exist in reports but are not reflected in the implementation-facing sections of affected artifacts (`Objective`, `Constraints`, `Implementation Steps`, `Completion Criteria`, `
+- FAIL if a task lacks `Completion Criteria` or `Task Test Plan & Verification Evidence` (legacy `Verification & Evidence` is accepted only for pre-existing task files)
+- FAIL if accepted validation decisions exist in reports but are not reflected in the implementation-facing sections of affected artifacts (`Objective`, `Constraints`, `Implementation Steps`, `Completion Criteria`, `Task Test Plan & Verification Evidence`, canonical contracts, or requirements text).
 - FAIL if the spec scope/provider was switched away from Anthropic/Claude but `requirements.md`, `design.md`, or `tasks/*.md` still contain stale provider-specific strings such as `Claude API`, `Haiku`, or `haiku_reachable`. `research.md` is the only allowed place for historical cost comparisons.
 - FAIL if privacy/delete-data work lacks a single canonical deletion policy. The design MUST explicitly choose either:
 1. hard-delete with no re-registration lock, or
@@ -440,13 +461,14 @@ specs/
 ### Pre-Finalization Checklist
 Before finalizing any specification, assert all the following:
 - [ ] **scope_lock** initialized and respected throughout all phases
+- [ ] **Evidence Summary** exists in `research.md` with codebase scout, external research or skip rationale, selected decision, rejected alternatives, and task/test implications
 - [ ] **EARS format** applied to all acceptance criteria in requirements.md
 - [ ] **Numeric requirement IDs** assigned to every requirement
 - [ ] **Discovery mode** selected and recorded in spec.json.design_context
 - [ ] **Requirements traceability** matrix present in design.md
 - [ ] **Canonical Contracts & Invariants** filled for auth/transport/persistence/artifact-sensitive work
 - [ ] **Every task file** maps to at least 1 valid in-scope requirement ID
-- [ ] **Every task file** includes `
+- [ ] **Every task file** includes `Task Test Plan & Verification Evidence` with executable or inspectable proof
 - [ ] **State Machine Blueprint:** design.md contains Mermaid diagrams for non-trivial flows
 - [ ] **Dependency graph complete**: no task can start before its blockers are listed
 - [ ] **Risk matrix filled**: likelihood × impact, with mitigation for High items
@@ -2,11 +2,26 @@
 
 ## Purpose
 
-Understand the current codebase before designing solutions — ensure the new spec aligns with existing architecture, patterns, and
+Understand the current codebase before designing solutions — ensure the new spec aligns with existing architecture, patterns, contracts, tests, and runtime boundaries.
 
 ## Skip Conditions
 
 - Already provided with inspector reports → skip, use directly
+- Greenfield artifact that does not integrate with existing code → record skip rationale in `research.md`
+- Internal docs/text-only spec with no runtime behavior → record skip rationale in `research.md`
+
+## Targeted Codebase Scout Gate
+
+Run a targeted scout before requirements when any of these are true:
+
+- The feature modifies existing behavior, UI, API, CLI, data flow, runtime config, hooks, settings, generated artifacts, or package exports.
+- The feature touches database schemas, migrations, auth/session, permissions, external integrations, or shared contracts.
+- The task may break existing tests, snapshots, build scripts, type checks, e2e flows, or docs generation.
+- The spec crosses monorepo boundaries such as source package → installed `.claude/`, library package → docs app, package manifest → publish/install runtime.
+- `Related Files` cannot be named precisely yet.
+- A resumed or validated spec may be stale because files, tests, dependencies, or contracts changed since the spec was created.
+
+The scout must be narrow and question-driven. Do not scan the whole repo just because the repo is available.
 
 ## 4 Mandatory Files to Read First
 
@@ -24,6 +39,19 @@ Understand the current codebase before designing solutions — ensure the new sp
 - **HALT** the spec process immediately.
 - Ask the User: *"No codebase documentation found. Exploring blind will drain tokens and produce inaccurate specs. Shall I trigger `docs-keeper` or `/hapo:docs` to generate a baseline `codebase-summary.md` first?"*
 
+## Scout Output Contract
+
+Record the concise findings in `research.md`; if inspector agents are used, save detailed output to `reports/inspect-report.md`.
+
+Required output:
+- **Project surface:** project type, package/workspace boundaries, languages, frameworks, and relevant commands.
+- **Relevant files/modules:** exact paths likely to be created, modified, deleted, or read.
+- **Existing patterns:** naming, architecture, state/data flow, error handling, testing, and docs conventions that tasks must follow.
+- **Contracts:** API/CLI/schema/auth/config/runtime/package/export contracts affected by the spec.
+- **Tests and verification:** existing tests/checks likely to pass, fail, or require updates.
+- **Blast radius:** affected modules, consumers, generated artifacts, publish/install paths, and rollback considerations.
+- **Staleness check:** docs or prior specs that conflict with source code or manifests.
+
 ## Analysis Activities
 
 ### 1. Environment Analysis
@@ -38,7 +66,7 @@ Before designing any logic, you must identify and read the existing schemas:
 - Identify Global State setups (Redux stores, Zustand, React Context).
 - Output the relational impact: How will the new feature alter existing tables or state structures?
 
-###
+### 3. Pattern Recognition
 - Study existing patterns in codebase
 - Identify conventions and architectural decisions
 - Note consistency in implementation approaches
@@ -61,6 +89,7 @@ Write a "Collateral Damage" section in your `research.md`:
 - Each inspector targets a specific aspect of the task
 - Wait for all inspectors to report before analysis
 - Save results to `reports/inspect-report.md`
+- If the scout cannot name exact paths/tests after inspection, stop and ask a grounded question instead of generating vague tasks
 
 ## Best Practices
 
@@ -69,3 +98,5 @@ Write a "Collateral Damage" section in your `research.md`:
 - Document patterns found for consistency
 - Note any inconsistencies or technical debt
 - Consider impact on existing features
+- Use `rg`/targeted search terms or inspector agents before broad traversal
+- Pass exact file and test findings downstream into `design.md` and task `Related Files`
@@ -2,12 +2,34 @@
 
 ## Purpose
 
-Provide tools and methods to gather necessary information before writing requirements and design.
+Provide tools and methods to gather necessary information before writing requirements and design. The goal is evidence-backed decision-making: use the current codebase and current external knowledge before locking requirements, architecture, and tasks.
 
 ## Skip Conditions
 
-- Simple task
--
+- Simple one-file task with no integration point → record skip rationale in `research.md`
+- Internal text/docs-only change → record skip rationale in `research.md`
+- User already provided recent research reports → use directly, but record what was reused
+
+Skipping research does NOT mean skipping the evidence trail. Every non-trivial spec still needs an Evidence Summary in `research.md`.
+
+## Evidence Gate Triggers
+
+### Targeted Codebase Scout — Mandatory When
+
+- The spec changes existing behavior rather than creating an isolated new artifact.
+- The spec touches API routes, CLI commands, package exports, database schemas, migrations, auth/session, permissions, runtime config, hooks, generated artifacts, or settings.
+- Requirements or tasks cannot name exact affected files, modules, tests, or contracts.
+- The change may invalidate existing `.test.*`, `.spec.*`, e2e, build, or integration checks.
+- The spec is being resumed or validated after the codebase may have changed.
+- The work crosses monorepo, package source, installed runtime, docs site, or publish/install boundaries.
+
+### External / Current Research — Mandatory When
+
+- The spec depends on third-party APIs, libraries, SDKs, browser/platform policies, package manager behavior, cloud services, or external protocols.
+- The spec touches security, auth, payment, privacy, delete-data, compliance, performance, accessibility, SEO, or current framework best practices.
+- The spec involves AI providers, model behavior, agent tooling, browser automation, or fast-moving platform constraints.
+- The user asks for "best", "optimal", "latest", "recommended", "current", "modern", or equivalent.
+- Existing internal docs are stale, incomplete, or contradict package manifests/source code.
 
 ## 7 Research Tools
 
@@ -23,27 +45,50 @@ Provide tools and methods to gather necessary information before writing require
 
 ## Workflow
 
-### 1.
+### 1. Classify Evidence Needs
 Before detailing requirements, list unanswered questions:
 - Which technology is most suitable?
 - Is there an existing pattern/library that solves this?
 - How does the current codebase handle similar functionality?
 - Are there technical risks that need verification?
+- What evidence is needed from the repository?
+- What evidence requires current external sources?
+
+### 2. Run Targeted Codebase Scout
+- Read project docs first, then verify claims against source files such as `package.json`, `go.mod`, schemas, routes, tests, and runtime config.
+- Use inspector agents for large codebases or when multiple focused searches are needed.
+- Save scout details to `reports/inspect-report.md` when inspector agents are used.
+- Record the useful summary in `research.md`; do not dump raw search output.
 
-###
+### 3. Run External / Current Research
+- Prefer official documentation, standards, release notes, package repositories, or maintained upstream examples.
+- Use broader web search only when primary sources do not answer the question.
+- Record links and explain why each source matters.
+- If sources conflict, state which source wins and why.
+
+### 4. Pick the Right Tool
 - Framework/API questions → Docs seeker
 - Current codebase questions → Inspector agents
 - Architecture/approach questions → Researcher agents
 - Complex multi-step reasoning → Sequential thinking
 - Historical decision questions → GitHub analysis
 
-###
+### 5. Spawn Researchers (when needed)
 - Max 2 agents running in parallel
 - Each agent gets a specific aspect (e.g., agent 1 researches auth approach, agent 2 researches database schema)
 - Limit each agent to max 5 tool calls
 - Wait for all agents to complete before synthesizing
 
-###
+### 6. Synthesize Decisions
+- Convert raw findings into decisions before writing requirements:
+- selected approach
+- rejected alternatives
+- codebase fit
+- external/current constraints
+- downstream task and test implications
+- If evidence leaves multiple viable choices with no obvious winner, route to `/hapo:brainstorm` or ask the user for a decision.
+
+### 7. Record Findings
 - Write to `research.md` using template `templates/research.md`
 - Save researcher reports to `reports/researcher-{NN}.md`
 - Save inspector reports to `reports/inspect-report.md`
@@ -55,3 +100,5 @@ Before detailing requirements, list unanswered questions:
 - Identify multiple approaches for comparison
 - Consider edge cases during research
 - Flag security concerns early
+- Do not design from memory when repository or current external evidence can settle the decision
+- Keep external research concise: source, finding, implication, decision
@@ -42,7 +42,7 @@ These rules override any self-reasoning or optimization the system may attempt:
 4. **Apply YAGNI to fixes.** When user says "configure later" or "decide later", add a single note to the task file. Do NOT generate multiple concrete implementations (e.g., 4 provider files when user only asked for abstraction).
 5. **No false completion.** You MUST NOT set `validation.status = "completed"` or `ready_for_implementation = true` until a reconciliation audit proves the accepted findings and validation decisions are reflected in the physical spec artifacts.
 6. **Provider drift is a real defect.** If the scope changed away from Claude/Anthropic, stale strings like `Claude API`, `Haiku`, or `haiku_reachable` in `requirements.md`, `design.md`, or `tasks/*.md` are validation failures. `research.md` may mention them only as historical comparison.
-7. **Implementation-facing propagation is mandatory.** A decision that affects implementation is NOT considered applied if it only appears in `Risk Assessment`, `validate-log.md`, or `red-team-report.md`. It must update at least one of: `requirements.md`, `Canonical Contracts & Invariants`, `Objective`, `Constraints`, `Implementation Steps`, `Completion Criteria`, or `
+7. **Implementation-facing propagation is mandatory.** A decision that affects implementation is NOT considered applied if it only appears in `Risk Assessment`, `validate-log.md`, or `red-team-report.md`. It must update at least one of: `requirements.md`, `Canonical Contracts & Invariants`, `Objective`, `Constraints`, `Implementation Steps`, `Completion Criteria`, or `Task Test Plan & Verification Evidence`.
 
 ---
 
@@ -136,11 +136,11 @@ Every task file MUST contain the Risk Assessment table, even if no risks are ide
 - Never mark implementation work or integration-critical verification as optional—reserve `*` for auxiliary/deferrable test coverage that can be revisited post-MVP.
 - Never mark auth, permissions, privacy, data deletion, migration, schema, or contract verification work as optional.
 
-### Mandatory
+### Mandatory Task Test Plan & Verification Evidence
 
-Every task file MUST include a `##
+Every new task file MUST include a `## Task Test Plan & Verification Evidence` section. Existing specs may still use the legacy `## Verification & Evidence` heading; readers and sync tools must support both.
 
-That section MUST contain:
+That section is the task-level test plan and MUST contain:
 1. **Automated proof** — exact command(s) for typecheck, tests, build, or explicit `N/A`
 2. **Artifact/runtime proof** — exact files, routes, UI surfaces, generated outputs, or persisted state to inspect
 3. **Contract/negative-path proof** — at least one contract-preserving check for unauthorized, invalid, missing-permission, rollback, or failure-path behavior when relevant
@@ -17,6 +17,52 @@
 - Finding 2
 - Finding 3
 
+## Evidence Summary
+This section is mandatory for non-trivial specs. It must be written before finalizing requirements, design, or tasks.
+
+- **Codebase Scout**: Required / Skipped
+- Result or skip rationale:
+- Relevant files/modules:
+- Existing patterns/contracts:
+- Tests or checks affected:
+- **External / Current Research**: Required / Skipped
+- Result or skip rationale:
+- Primary sources:
+- Current constraints or best practices:
+- **Selected Decision**:
+- Decision:
+- Why it fits the current codebase:
+- Why it fits current external constraints:
+- **Rejected Alternatives**:
+- Alternative 1 — rejection reason
+- Alternative 2 — rejection reason
+- **Remaining Gaps / Questions**:
+- Gap 1
+- Gap 2
+- **Downstream Task & Test Implications**:
+- Task implication:
+- Test/verification implication:
+
+## Codebase Scout
+Capture only useful repo evidence, not raw file dumps.
+
+| Area | Finding | Evidence / Path | Implication |
+|------|---------|-----------------|-------------|
+| Project surface | | | |
+| Relevant files/modules | | | |
+| Existing patterns | | | |
+| Contracts | | | |
+| Tests and verification | | | |
+| Blast radius | | | |
+| Staleness / conflicts | | | |
+
+## External / Current Research
+Use official docs, standards, package repos, release notes, or maintained upstream references first.
+
+| Question | Source | Finding | Decision Impact |
+|----------|--------|---------|-----------------|
+| | | | |
+
 ## Research Log
 Document notable investigation steps and their outcomes. Group entries by topic for readability.
 
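The Evidence Summary template in this hunk pairs with the finalization rule that fails a non-trivial spec whose `research.md` lacks one. A rough sketch of such a presence check follows; `missing_evidence_fields` is a hypothetical name, and the check only looks for the bold field labels from the template. It does not verify that each field has actually been filled in.

```python
import re

# Labels taken from the template's bold list items; "Remaining Gaps" is
# omitted because the FAIL rule does not list it as required.
REQUIRED_FIELDS = (
    "Codebase Scout",
    "External / Current Research",
    "Selected Decision",
    "Rejected Alternatives",
    "Downstream Task & Test Implications",
)


def missing_evidence_fields(research_md: str) -> list[str]:
    """Return the required labels absent from the Evidence Summary section."""
    match = re.search(
        r"^## Evidence Summary[ \t]*\n(.*?)(?=^## |\Z)",
        research_md,
        flags=re.MULTILINE | re.DOTALL,
    )
    if match is None:
        return ["Evidence Summary"]  # the whole section is missing
    section = match.group(1)
    return [name for name in REQUIRED_FIELDS if f"**{name}**" not in section]
```

An empty return value means every required label is present inside the section; the body of the section stops at the next `## ` heading, so the later `## Codebase Scout` table does not count toward the summary.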
@@ -58,7 +58,9 @@
 - [ ] {{Criteria 2 — measurable behavior or negative-path outcome}}
 - [ ] {{Criteria 3 — maps directly to acceptance criteria from requirements.md and can be proven below}}
 
-##
+## Task Test Plan & Verification Evidence
+
+This section is the task-level test plan. It names the exact commands, observable runtime/artifact proof, and negative-path checks required before this task can be marked done.
 
 - [ ] Automated verification
 - Command(s): `{{TYPECHECK / TEST / BUILD COMMANDS OR N/A}}`
@@ -82,4 +84,4 @@
 > **Parallel marker**: Append `(P)` to the title if this task can run concurrently with another (usually when serving different requirements).
 > **Test note**: If a test coverage sub-task can be deferred post-MVP, mark it with `- [ ]*`.
 > **Requirement mapping**: Every sub-task MUST end with `_Requirements: X.X_`. No mapping = invalid task file.
-> **Verification rule**: No `##
+> **Verification rule**: No `## Task Test Plan & Verification Evidence` section = invalid task file. Existing specs may use legacy `## Verification & Evidence`; agents must support both headings.
@@ -34,8 +34,8 @@ Scans the `spec.json` against all physical `task-R*.md` files to detect mismatch
|
|
|
34
34
|
|
|
35
35
|
1. **Precision Edits:** Never overwrite the entire `spec.json` string blindly. Update only the required keys, while keeping JSON valid.
|
|
36
36
|
2. **Machine + Human Sync:** Every task status update MUST modify both `spec.json.task_registry[...]` and the matching markdown task file header/status section.
|
|
37
|
-
3. **Markdown Integrity:** When marking a task `done`, only then turn `[ ]` into `[x]` inside `## Implementation Steps` and relevant `Completion Criteria` / `
|
|
38
|
-
4. **Verification Receipt Rule:** `done` is illegal without a human-readable verification receipt already present in `## Verification & Evidence` (commands executed, artifact/runtime proof, or equivalent concrete evidence). If proof is missing, keep the task `in_progress` or `blocked`.
|
|
37
|
+
3. **Markdown Integrity:** When marking a task `done`, only then turn `[ ]` into `[x]` inside `## Implementation Steps` and relevant `Completion Criteria` / `Task Test Plan & Verification Evidence` checkboxes that have actual proof. Legacy `Verification & Evidence` sections are supported.
|
|
38
|
+
4. **Verification Receipt Rule:** `done` is illegal without a human-readable verification receipt already present in `## Task Test Plan & Verification Evidence` or legacy `## Verification & Evidence` (commands executed, artifact/runtime proof, or equivalent concrete evidence). If proof is missing, keep the task `in_progress` or `blocked`.
|
|
39
39
|
5. **Task Docs Hook:** Every time `hapo:sync` marks a task as `done`, it must flag that a task-level docs checkpoint is now due for that verified task.
|
|
40
40
|
6. **Phase Prompt Rule:** When `hapo:sync` marks the final pending task in the whole feature as `done`, it should automatically prompt the user if they'd like to advance the phase, but only after the docs checkpoint for that last completed task has been considered.
|
|
41
41
|
|
|
@@ -15,7 +15,7 @@ When requested to update a phase or change task configuration, `spec.json` must
|
|
|
15
15
|
- full relative path like `tasks/task-R0-02-extension-shell.md`
|
|
16
16
|
* **Status Update:** If a task changes to `blocked`, the matching `task_registry[path].status` must become `"blocked"`, `task_registry[path].blocker` must record the reason, and `spec.json.status` / `spec.json.blocker` must reflect the top-level block if work is globally blocked.
|
|
17
17
|
* **Timestamp Rule:** Update `task_registry[path].started_at`, `completed_at`, and `last_updated_at` consistently with the new state. Also refresh `spec.json.updated_at`.
|
|
18
|
-
* **Done-State Rule:** Never set `task_registry[path].status = "done"` unless the matching markdown task file already contains a verification receipt in `## Verification & Evidence`, or the caller explicitly provides proof that can be written there first.
|
|
18
|
+
* **Done-State Rule:** Never set `task_registry[path].status = "done"` unless the matching markdown task file already contains a verification receipt in `## Task Test Plan & Verification Evidence` or legacy `## Verification & Evidence`, or the caller explicitly provides proof that can be written there first.
|
|
19
19
|
* **Receipt Integrity Rule:** A valid verification receipt must include the exact commands run, their outcomes, and artifact/runtime proof. Receipts containing `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or explicit "placeholder / simplified for MVP / production later" contract deviations are not eligible for `done`.
|
|
20
20
|
* **Contract Fidelity Rule:** If the task file notes or evidence show that a named framework/auth/runtime choice from the spec was silently replaced, sync MUST refuse `done` until the spec is amended or the implementation is corrected.
|
|
21
21
|
* **Task Docs Rule:** After a task is moved to `done`, emit a short alert that a task-level docs checkpoint is due for this verified task.
|
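The status and timestamp rules above can be sketched as a precision edit on the parsed `spec.json`. This is a minimal illustration, not the package's implementation: the function name `mark_task_blocked`, the `globally_blocked` flag, and the ISO 8601 timestamp format are assumptions. The key names (`task_registry`, `status`, `blocker`, `last_updated_at`, `updated_at`) are taken from the rules themselves.

```python
import copy
from datetime import datetime, timezone


def mark_task_blocked(spec, task_path, reason, globally_blocked=False, now=None):
    """Return a copy of `spec` with one task marked blocked.

    Only the keys named in the rules are touched; every other key is
    carried over unchanged. `now` is a hypothetical override for tests.
    """
    stamp = now or datetime.now(timezone.utc).isoformat()  # assumed format
    updated = copy.deepcopy(spec)

    entry = updated["task_registry"][task_path]
    entry["status"] = "blocked"
    entry["blocker"] = reason
    entry["last_updated_at"] = stamp

    # Reflect the block at the top level only when work is globally blocked.
    if globally_blocked:
        updated["status"] = "blocked"
        updated["blocker"] = reason

    updated["updated_at"] = stamp
    return updated
```

Working on a deep copy keeps the original object intact, so a caller can validate the result (for example by serializing it with `json.dumps`) before writing anything back to disk.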
|
@@ -27,12 +27,12 @@ The structure of `tasks/task.md` relies heavily on exact keyword markers. Follow
|
|
|
27
27
|
### A. Completing a Task
|
|
28
28
|
When `/hapo:sync <feature> <task-id> done`:
|
|
29
29
|
1. Find: `**Status:** pending` (or `in_progress` / `blocked`).
|
|
30
|
-
2. Inspect `##
|
|
30
|
+
2. Inspect `## Task Test Plan & Verification Evidence` first. If the task uses legacy `## Verification & Evidence`, inspect that section instead. If it has no explicit proof lines (commands run, artifact proof, runtime proof, or blockers cleared), STOP and refuse to mark the task done.
|
|
31
31
|
3. Refuse completion if the receipt contains any non-passing marker such as `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or an explicit note that the implementation substituted a named contract with a placeholder/custom simplification.
|
|
32
32
|
4. Replace with: `**Status:** done`.
|
|
33
33
|
5. Locate block: `## Implementation Steps`.
|
|
34
34
|
6. Convert `- [ ]` into `- [x]` strictly within that section.
|
|
35
|
-
7. Update relevant checkboxes in `## Completion Criteria` and `##
|
|
35
|
+
7. Update relevant checkboxes in `## Completion Criteria` and `## Task Test Plan & Verification Evidence` only when the caller provides or the file already contains real proof. For legacy task files, update `## Verification & Evidence` instead.
|
|
36
36
|
8. Surface a note such as: `Docs checkpoint due: task Rn-mm just completed`.
|
|
37
37
|
|
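Steps 2 and 3 above can be sketched as a pre-check on the task file's text. This is an illustrative sketch under assumptions: the helper names `verification_section` and `refusal_reasons` are hypothetical, a section is assumed to end at the next `## ` heading, and judging whether the section holds real proof lines is left to the caller. Only the two heading strings and the three markers come from the rules.

```python
import re

# The new heading is checked first, then the legacy one.
HEADINGS = (
    "## Task Test Plan & Verification Evidence",
    "## Verification & Evidence",
)
# \b keeps FAIL from matching inside PRECHECK_FAIL.
NON_PASSING = re.compile(r"\b(PRECHECK_FAIL|FAIL|UNVERIFIED)\b")


def verification_section(markdown):
    """Return the body of the verification section, or None if absent."""
    lines = markdown.splitlines()
    for heading in HEADINGS:
        for i, line in enumerate(lines):
            if line.strip() != heading:
                continue
            body = []
            for nxt in lines[i + 1:]:
                if nxt.startswith("## "):  # assumed section boundary
                    break
                body.append(nxt)
            return "\n".join(body).strip()
    return None


def refusal_reasons(markdown):
    """List reasons to refuse `done`; an empty list means no objection here."""
    section = verification_section(markdown)
    if section is None:
        return ["no verification section"]
    reasons = []
    if not section:
        reasons.append("verification section is empty")
    markers = sorted(set(NON_PASSING.findall(section)))
    if markers:
        reasons.append("non-passing markers: " + ", ".join(markers))
    return reasons
```

An empty result only means the mechanical checks passed. An unfilled template with placeholder text would still need to be rejected by whoever inspects the proof lines.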
|
38
38
|
### B. Blocking a Task
|
|
@@ -59,7 +59,7 @@ When `/hapo:sync audit <feature>` is activated:
|
|
|
59
59
|
- Missing disk file referenced in registry → remove or flag it
|
|
60
60
|
- Markdown says `done` but registry not done → registry wins only if evidence already exists; otherwise downgrade markdown or flag conflict
|
|
61
61
|
- Registry says `done` but markdown still pending → update markdown only if evidence exists
|
|
62
|
-
- Either side says `done` but `## Verification & Evidence` has no concrete proof → downgrade to `in_progress` or flag conflict instead of preserving fake completion
|
|
62
|
+
- Either side says `done` but `## Task Test Plan & Verification Evidence` / legacy `## Verification & Evidence` has no concrete proof → downgrade to `in_progress` or flag conflict instead of preserving fake completion
|
|
63
63
|
- Either side says `done` but the receipt contains `PRECHECK_FAIL`, `FAIL`, `UNVERIFIED`, or explicit contract-substitution notes → downgrade to `in_progress` or flag conflict
|
|
64
64
|
5. **Correction Alert:** Output a brief markdown alert detailing mismatches fixed and any unresolved conflicts requiring manual review.
|
|
65
65
|
6. **Task Docs Alert:** If audit reveals tasks newly marked `done`, include whether task-level docs sync appears still due or already accounted for in the current run summary.
|
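The two evidence-related audit bullets above reduce to a small decision. The sketch below covers only those bullets; the function name `audit_done_claim` and its return labels are hypothetical, and the caller is assumed to have already determined whether proof exists and whether a non-passing marker is present.

```python
def audit_done_claim(registry_status, markdown_status, has_proof, has_non_passing_marker):
    """Decide whether a `done` claim on either side may be preserved.

    Returns one of three illustrative labels:
      "no_done_claim"     - neither side says done; nothing to audit here
      "downgrade_or_flag" - done is claimed without proof or with a failing marker
      "done_supported"    - done is claimed and the receipt holds up
    """
    if "done" not in (registry_status, markdown_status):
        return "no_done_claim"
    if not has_proof or has_non_passing_marker:
        return "downgrade_or_flag"
    return "done_supported"
```

How a remaining registry/markdown mismatch is then reconciled follows the earlier bullets and is deliberately left out of this sketch.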
|
@@ -27,6 +27,7 @@ Designed to work **after `hapo:develop`**. Standalone `/hapo:test` uses the same
|
|
|
27
27
|
NEVER claim tests pass when they were NOT actually executed.
|
|
28
28
|
NEVER mock, stub, or skip a failing test to produce a green result.
|
|
29
29
|
If no test command is detected, report NO_TESTS — do not fabricate results.
|
|
30
|
+
If a test command exits 0 but runs 0 tests, report NO_TESTS — this is a green lie, not a PASS.
|
|
30
31
|
If tests fail, list every failure explicitly — do not summarize failures away.
|
|
31
32
|
</HARD-GATE>
|
|
32
33
|
|
|
@@ -65,7 +66,8 @@ affected by recent file changes. See `references/execution-strategy.md` Phase A.
|
|
|
65
66
|
**Code testing (default):**
|
|
66
67
|
1. Pre-flight: run typecheck/lint to catch compile errors first
|
|
67
68
|
2. Execute test command with coverage flags
|
|
68
|
-
3. Collect
|
|
69
|
+
3. Collect test counts, coverage percentages, and fail stack traces
|
|
70
|
+
4. Treat 0 executed tests as `NO_TESTS`, even if the command exits 0
|
|
69
71
|
|
|
70
72
|
**UI verification (`--ui` / `--ui-auth` / `--ui-flow`):**
|
|
71
73
|
Execute multi-page discovery, then spawn **Parallel UI Subagents** (test-runner instances) to handle Smoke, Core-Vitals, Accessibility, SEO, Security, and User Flows simultaneously.
|
|
@@ -146,6 +148,7 @@ It merges the JSON data into `.hapo/test-memory.json` per `references/test-memor
|
|
|
146
148
|
|
|
147
149
|
- `references/execution-strategy.md` — Blast-radius algorithm, auto-detect logic, UI verification phases (A–E)
|
|
148
150
|
- `references/failure-triage.md` — Failure categories, triage decision tree, escalation rules
|
|
151
|
+
- `references/test-memory.md` — `.hapo/test-memory.json` schema and merge rules
|
|
149
152
|
|
|
150
153
|
## Related
|
|
151
154
|
|
|
@@ -84,6 +84,9 @@ cargo test
|
|
|
84
84
|
flutter test --coverage
|
|
85
85
|
```
|
|
86
86
|
|
|
87
|
+
After each command, parse the runner output for executed test count. A successful
|
|
88
|
+
exit with 0 executed tests is `NO_TESTS`, not `PASS`.
|
|
89
|
+
|
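The zero-test rule above can be expressed as a small classifier. This is a sketch under assumptions: the function name `classify_run` is hypothetical, the executed and failed counts are assumed to have been parsed from the runner output already, and treating any non-zero exit as `FAIL` is a simplification.

```python
def classify_run(exit_code, executed_tests, failed_tests=0):
    """Classify one test command result as PASS, FAIL, or NO_TESTS."""
    if exit_code != 0 or failed_tests > 0:
        return "FAIL"
    if executed_tests == 0:
        # Exit code 0 with nothing executed is not a pass.
        return "NO_TESTS"
    return "PASS"
```

Checking the executed count separately from the exit code matters because some runners can be configured to exit successfully when no tests are found.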
|
87
90
|
### Coverage Thresholds
|
|
88
91
|
|
|
89
92
|
| Metric | Minimum | Focus Areas |
|
|
@@ -360,4 +363,3 @@ Flag as `Security Warning` if:
|
|
|
360
363
|
- API keys, secrets, or JWT tokens visible in page HTML
|
|
361
364
|
- Mixed content (HTTP resources on HTTPS page) detected via network audit
|
|
362
365
|
- `autocomplete="off"` missing on password fields
|
|
363
|
-
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# Test Memory
|
|
2
2
|
|
|
3
|
-
The
|
|
3
|
+
The `.hapo/test-memory.json` file in the target project serves as the Long-Term Memory for the testing ecosystem.
|
|
4
4
|
|
|
5
5
|
## Schema
|
|
6
6
|
|
|
@@ -37,4 +37,4 @@ Example:
|
|
|
37
37
|
</lessons_learned>
|
|
38
38
|
```
|
|
39
39
|
|
|
40
|
-
The orchestrating `hapo:test` skill (Phase 4) then intercepts this block and automatically merges it into
|
|
40
|
+
The orchestrating `hapo:test` skill (Phase 4) then intercepts this block and automatically merges it into `.hapo/test-memory.json`.
|