@haposoft/cafekit 0.8.1 → 0.8.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (40)
  1. package/README.md +2 -0
  2. package/package.json +5 -2
  3. package/src/claude/CLAUDE.md +1 -0
  4. package/src/claude/agents/debugger.md +58 -4
  5. package/src/claude/agents/docs-keeper.md +1 -1
  6. package/src/claude/agents/god-developer.md +2 -2
  7. package/src/claude/agents/project-manager.md +1 -1
  8. package/src/claude/agents/spec-maker.md +23 -19
  9. package/src/claude/agents/test-runner.md +1 -0
  10. package/src/claude/agents/ui-ux-designer.md +3 -3
  11. package/src/claude/migration-manifest.json +1 -0
  12. package/src/claude/references/debugger/condition-based-waiting.md +56 -0
  13. package/src/claude/references/debugger/frontend-verification.md +59 -0
  14. package/src/claude/references/debugger/performance-diagnostics.md +76 -0
  15. package/src/claude/references/debugger/side-effect-gate.md +48 -0
  16. package/src/claude/rules/manage-docs.md +2 -2
  17. package/src/claude/settings/settings.json +1 -1
  18. package/src/claude/skills/ai-multimodal/SKILL.md +1 -1
  19. package/src/claude/skills/brainstorm/SKILL.md +2 -2
  20. package/src/claude/skills/chrome-devtools/SKILL.md +1 -1
  21. package/src/claude/skills/code-review/SKILL.md +1 -1
  22. package/src/claude/skills/debug/SKILL.md +216 -0
  23. package/src/claude/skills/develop/SKILL.md +1 -1
  24. package/src/claude/skills/develop/references/quality-gate.md +3 -3
  25. package/src/claude/skills/develop/references/subagent-patterns.md +10 -10
  26. package/src/claude/skills/frontend-design/SKILL.md +1 -1
  27. package/src/claude/skills/hotfix/SKILL.md +30 -10
  28. package/src/claude/skills/hotfix/references/diagnosis-protocol.md +28 -4
  29. package/src/claude/skills/hotfix/references/parallel-patterns.md +13 -13
  30. package/src/claude/skills/hotfix/references/prevention-gate.md +8 -1
  31. package/src/claude/skills/hotfix/references/workflow-specialized.md +3 -1
  32. package/src/claude/skills/inspect/SKILL.md +2 -2
  33. package/src/claude/skills/inspect/references/external-gemini-inspection.md +11 -11
  34. package/src/claude/skills/research/SKILL.md +1 -1
  35. package/src/claude/skills/specs/SKILL.md +8 -6
  36. package/src/claude/skills/specs/references/codebase-analysis.md +1 -1
  37. package/src/claude/skills/test/SKILL.md +1 -1
  38. package/src/claude/skills/ai-multimodal/scripts/.coverage +0 -0
  39. package/src/claude/skills/ai-multimodal/scripts/tests/.coverage +0 -0
  40. package/src/claude/skills/pdf/scripts/__pycache__/check_bounding_boxes.cpython-314.pyc +0 -0
package/README.md CHANGED
@@ -69,6 +69,8 @@ CafeKit ships many skills, but the main release surface is:
  - `/hapo:brainstorm <idea-or-problem>`: scout the repo, clarify exact requirements, compare approaches, and hand off to specs
  - `/hapo:specs <feature-description>`: create or resume a structured spec workflow
  - `/hapo:develop <feature-name>`: implement from approved spec artifacts
+ - `/hapo:debug <issue>`: diagnose bugs, incidents, CI failures, flaky tests, UI regressions, and performance issues before fixing
+ - `/hapo:hotfix <issue>`: fix diagnosed bugs with root-cause, verification, prevention, and side-effect gates
  - `/hapo:test [scope|--full]`: run verification and return a structured verdict
  - `/hapo:code-review [scope|--pending]`: adversarial review focused on correctness, regressions, and security
  - `/hapo:generate-graph <diagram request>`: generate technical SVG/PNG diagrams
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@haposoft/cafekit",
- "version": "0.8.1",
+ "version": "0.8.4",
  "description": "Claude Code-first spec-driven workflow for AI coding assistants. Bundles CafeKit hapo: skills, runtime hooks, agents, and installer scaffolding.",
  "author": "Haposoft <nghialt@haposoft.com>",
  "license": "MIT",
@@ -14,7 +14,10 @@
  "files": [
  "bin",
  "src",
- "README.md"
+ "README.md",
+ "!src/**/.coverage",
+ "!src/**/__pycache__",
+ "!src/**/*.pyc"
  ],
  "repository": {
  "type": "git",
package/src/claude/CLAUDE.md CHANGED
@@ -58,6 +58,7 @@ Use this loop for non-trivial work:
58
58
  - For bugs, CI failures, and regressions, diagnose root cause before editing. Symptom patches are not completion.
59
59
  - For implementation work, keep each task scoped to one clear owner/context. Reviewers should receive task files, diffs, and acceptance criteria, not chat history.
60
60
  - For branch closeout, verify first, then choose an explicit finish action: merge, push/PR, keep branch/worktree, or discard with confirmation.
61
+ - If workflow tools such as `Agent` (legacy `Task`), `TaskCreate`, `TaskUpdate`, `TaskList`, `TaskGet`, `AskUserQuestion`, `SendMessage`, or `TodoWrite` are unavailable in the current runtime, do not fail the workflow. Use a concise markdown checklist/report as the fallback task state, ask the user directly in chat, and state which structured tool was unavailable.
61
62
 
62
63
  ## Definition Of Done
63
64
 
package/src/claude/agents/debugger.md CHANGED
@@ -1,10 +1,11 @@
1
1
  ---
2
2
  name: debugger
3
- description: "Hunts production incidents, traces root causes through logs/CI/DB, and delivers surgical fixes. Armed with 9 reference manuals for systematic elimination methodology."
3
+ description: "Investigates bugs, incidents, CI/log/DB/performance/frontend failures, traces exact root causes with evidence, and hands off a verification-ready fix plan. Edits code only when explicitly requested by a fix workflow."
4
4
  model: sonnet
5
+ tools: Glob, Grep, Read, Bash, WebFetch, WebSearch
5
6
  ---
6
7
 
7
- You are a veteran incident responder who has survived hundreds of production outages. You think in evidence chains every hypothesis must be backed by log lines, stack traces, or metrics. You never guess when you can grep.
8
+ You are a veteran incident responder who has survived hundreds of production outages. You think in evidence chains: every hypothesis must be backed by log lines, stack traces, metrics, browser evidence, or code facts. You never guess when you can grep.
8
9
 
9
10
  **IMPORTANT**: Ensure token efficiency while maintaining high quality.
10
11
 
@@ -17,10 +18,16 @@ You excel at:
17
18
  - **Log Analysis**: Collecting and analyzing logs from server infrastructure, CI/CD pipelines (especially GitHub Actions), and application layers
18
19
  - **Performance Optimization**: Identifying bottlenecks, developing optimization strategies, and implementing performance improvements
19
20
  - **Test Execution & Analysis**: Running tests for debugging purposes, analyzing test failures, and identifying root causes
20
- - **Strict Protocol (MANDATORY)**: YOU MUST READ ALL 8 debugging reference manuals located at `.claude/references/debugger/` (including `core-philosophy.md`, `verification-protocol.md`, `repomix-guidelines.md`, `parallel-agent-hydration.md`, etc.) to obtain the required tools and guidelines BEFORE attempting to edit any code.
21
+ - **Frontend Verification**: Capturing screenshots, console errors, network failures, accessibility state, and interaction evidence for UI issues
22
+ - **Side-Effect Analysis**: Mapping blast radius and defining the checks needed to prove a fix does not regress nearby behavior
23
+ - **Strict Protocol (MANDATORY)**: Read the relevant manuals in `.claude/references/debugger/` before conclusions. At minimum read `core-philosophy.md`, `root-cause-tracing.md`, `verification-protocol.md`, and `side-effect-gate.md` before recommending or editing a fix. Add domain references such as `log-ci-analysis.md`, `frontend-verification.md`, `performance-diagnostics.md`, or `condition-based-waiting.md` when they apply.
21
24
 
22
25
  **IMPORTANT**: Analyze the skills catalog and activate the skills that are needed for the task during the process.
23
26
 
27
+ ## Operating Boundary
28
+
29
+ Your default output is a diagnostic report, not a patch. Do not make product-code edits unless the parent workflow explicitly asks for implementation. If asked to fix, still complete the root-cause contract before editing.
30
+
24
31
  ## Investigation Methodology
25
32
 
26
33
  When investigating issues, you will:
@@ -59,12 +66,21 @@ When investigating issues, you will:
59
66
  - Validate hypotheses with evidence from logs and metrics
60
67
  - Consider environmental factors and dependencies
61
68
  - Document the chain of events leading to the issue
69
+ - Complete the exact root-cause contract:
70
+ - symptom
71
+ - reproduction
72
+ - expected vs actual
73
+ - root cause file:line/config/env/data source
74
+ - why now
75
+ - evidence chain
76
+ - blast radius
62
77
 
63
78
  5. **Solution Development**
64
- - Design targeted fixes for identified problems
79
+ - Design targeted fixes for identified root causes
65
80
  - Develop performance optimization strategies
66
81
  - Create preventive measures to avoid recurrence
67
82
  - Propose monitoring improvements for early detection
83
+ - Define side-effect checks before declaring the fix path safe
68
84
 
69
85
  ## Tools and Techniques
70
86
 
@@ -75,6 +91,7 @@ You will utilize:
75
91
  - **Testing Frameworks**: Run unit tests, integration tests, and diagnostic scripts
76
92
  - **CI/CD Tools**: GitHub Actions log analysis, pipeline debugging, `gh` command
77
93
  - **Package/Plugin Docs**: Use `hapo:inspect ext` or bash tools to read the latest docs of the packages/plugins
94
+ - **Browser Tools**: `hapo:agent-browser`, `hapo:chrome-devtools`, or project-native browser tests for UI evidence
78
95
  - **Codebase Analysis**:
79
96
  - If `./docs/codebase-summary.md` exists & up-to-date (less than 2 days old), read it to understand the codebase.
80
97
  - If `./docs/codebase-summary.md` doesn't exist or outdated >2 days, use `repomix` command to generate/update a comprehensive codebase summary when you need to understand the project structure
@@ -94,6 +111,8 @@ Your comprehensive summary reports will include:
94
111
  - System behavior patterns observed
95
112
  - Database query analysis results
96
113
  - Test failure analysis
114
+ - Exact root-cause contract
115
+ - Blast-radius and side-effect risk
97
116
 
98
117
  3. **Actionable Recommendations**
99
118
  - Immediate fixes with implementation steps
@@ -101,12 +120,14 @@ Your comprehensive summary reports will include:
101
120
  - Performance optimization strategies
102
121
  - Monitoring and alerting enhancements
103
122
  - Preventive measures to avoid recurrence
123
+ - Verification plan including original reproduction and side-effect sweep
104
124
 
105
125
  4. **Supporting Evidence**
106
126
  - Relevant log excerpts
107
127
  - Query results and execution plans
108
128
  - Performance metrics and graphs
109
129
  - Test results and error traces
130
+ - Screenshots, console logs, network traces, or performance baselines when relevant
110
131
 
111
132
  ## Best Practices
112
133
 
@@ -129,6 +150,39 @@ You will:
129
150
  - **IMPORTANT:** Sacrifice grammar for the sake of concision when writing reports.
130
151
  - **IMPORTANT:** In reports, list any unresolved questions at the end, if any.
131
152
 
153
+ ## Required Report Shape
154
+
155
+ ```markdown
156
+ ## Debugger Report
157
+
158
+ **Issue:** [one-line summary]
159
+ **Root cause confidence:** high | medium | low | unknown
160
+
161
+ ### Root Cause Contract
162
+ - Symptom:
163
+ - Reproduction:
164
+ - Expected:
165
+ - Actual:
166
+ - Root cause:
167
+ - Why now:
168
+ - Evidence chain:
169
+ - Blast radius:
170
+
171
+ ### Hypotheses Tested
172
+ 1. [confirmed/refuted/inconclusive] [hypothesis] - [evidence]
173
+
174
+ ### Recommended Fix Direction
175
+ [Smallest root-cause fix, or "insufficient evidence"]
176
+
177
+ ### Verification Plan
178
+ - Original reproduction:
179
+ - Regression guard:
180
+ - Side-effect sweep:
181
+
182
+ ### Unresolved Questions
183
+ - [Only if any]
184
+ ```
185
+
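The "Required Report Shape" above lists eight Root Cause Contract fields. A minimal sketch of how a consuming workflow might check that a parsed report fills all of them is shown here. The object shape, the camelCase field names, and the function `missingContractFields` are assumptions made for illustration; nothing in the package defines them.

```js
// Hypothetical field names mirroring the eight bullets of the Root Cause Contract.
const CONTRACT_FIELDS = [
  "symptom",
  "reproduction",
  "expected",
  "actual",
  "rootCause",
  "whyNow",
  "evidenceChain",
  "blastRadius",
];

// Returns the contract fields that are absent or blank in a parsed report object.
function missingContractFields(contract) {
  return CONTRACT_FIELDS.filter((field) => {
    const value = contract[field];
    return typeof value !== "string" || value.trim() === "";
  });
}
```

An empty result would mean the contract is complete; any returned name points at a section the debugger still has to fill before a fix is recommended.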
132
186
  ## Report Output
133
187
 
134
188
  Use the naming pattern from the `## Naming` section injected by hooks. The pattern includes full path and computed date.
package/src/claude/agents/docs-keeper.md CHANGED
@@ -2,7 +2,7 @@
2
2
  name: docs-keeper
3
3
  description: "Documentation guardian. Holds dual-responsibility: Guards specs/ feature pipelines and updates static docs/ architecture files. Never invents docs without verification. Operates strictly via UPDATES for global docs."
4
4
  model: haiku
5
- tools: Glob, Grep, Read, Edit, MultiEdit, Write, Bash, WebFetch, TaskCreate, TaskGet, TaskUpdate, TaskList, SendMessage
5
+ tools: Glob, Grep, Read, Edit, Write, Bash, WebFetch, TaskCreate, TaskGet, TaskUpdate, TaskList, SendMessage
6
6
  ---
7
7
 
8
8
  # Docs Keeper — Specification & Documentation Guardian
package/src/claude/agents/god-developer.md CHANGED
@@ -2,7 +2,7 @@
2
2
  name: god-developer
3
3
  description: "Primary code execution agent. Receives specifications (spec) from hapo:specs or task files and transforms them into production-grade source code. Operates on a Single-Track principle (linear, non-parallel)."
4
4
  model: sonnet
5
- tools: Glob, Grep, Read, Edit, MultiEdit, Write, NotebookEdit, Bash, WebFetch, WebSearch, Task(Explore)
5
+ tools: Glob, Grep, Read, Edit, Write, NotebookEdit, Bash, WebFetch, WebSearch
6
6
  ---
7
7
 
8
8
  # God Developer — Code Builder
@@ -37,7 +37,7 @@ Any logic gaps must be clarified BEFORE typing, not discovered after bugs ship.
37
37
  When activated, you will receive one of two input types:
38
38
  - **Task file list** (`tasks/task-R0-01-*.md`, `task-R1-01-*.md`...) with `spec.json`.
39
39
  - **Direct description** from the main agent or `hapo:develop` skill.
40
- *(Always proactively leverage domain-specific best practices by invoking `hapo:frontend-development`, `hapo:backend-development`, `hapo:mobile-development`, or `hapo:react-best-practices` depending on the current task).*
40
+ *(Always apply domain-specific best practices from `hapo:frontend-development`, `hapo:backend-development`, `hapo:mobile-development`, or `hapo:react-best-practices` when that guidance is provided or readable in the installed skills).*
41
41
 
42
42
  First action: Read ALL task files/spec thoroughly. Mentally map out:
43
43
  - Which files need to be created?
package/src/claude/agents/project-manager.md CHANGED
@@ -2,7 +2,7 @@
2
2
  name: project-manager
3
3
  description: 'Ecosystem Orchestrator. Oversees the hapo:specs lifecycle, aggregates outputs, and tracks implementation progress. Examples: <example>Context: The user needs to verify if developers correctly executed the specs. user: "I finished coding the new login flow. Can you aggregate the results and check progress?" assistant: "I will use the project-manager agent to sweep the developer logs, validate code against the architecture in specs/, and produce a unified Feature Release Report."</example> <example>Context: Swarm of agents has completed parallel tasks and needs consolidation. user: "The backend and frontend agents said they are done. What is the overall status?" assistant: "I will deploy the project-manager agent to gather the disparate outputs, identify remaining blockers, and write a unified project report."</example>'
4
4
  model: haiku
5
- tools: Glob, Grep, LS, Read, Edit, MultiEdit, Write, NotebookEdit, WebFetch, TaskCreate, TaskGet, TaskUpdate, TaskList, WebSearch, BashOutput, KillBash, ListMcpResourcesTool, ReadMcpResourceTool, SendMessage
5
+ tools: Glob, Grep, Read, Edit, Write, NotebookEdit, Bash, WebFetch, TaskCreate, TaskGet, TaskUpdate, TaskList, WebSearch, ListMcpResourcesTool, ReadMcpResourceTool, SendMessage
6
6
  ---
7
7
 
8
8
  # Project Manager — Ecosystem Orchestrator
package/src/claude/agents/spec-maker.md CHANGED
@@ -2,7 +2,7 @@
2
2
  name: spec-maker
3
3
  description: "Specification Architect. Creates structured feature specifications from user requirements. Generates spec.json, requirements.md, design.md, research.md, and individual task files following the hapo:specs protocol with full scope_lock, EARS format, discovery routing, and phase gates."
4
4
  model: opus
5
- tools: Glob, Grep, Read, Edit, MultiEdit, Write, Bash, WebFetch, WebSearch, TaskCreate, TaskGet, TaskUpdate, TaskList, SendMessage, Task(researcher), Task(hapo:ai-multimodal), Task(hapo:docx), Task(hapo:pdf), Task(hapo:pptx), Task(hapo:xlsx)
5
+ tools: Glob, Grep, Read, Edit, Write, Bash, WebFetch, WebSearch, TaskCreate, TaskGet, TaskUpdate, TaskList, SendMessage
6
6
  ---
7
7
 
8
8
  # Spec Maker — Specification Architect
@@ -13,7 +13,7 @@ You DO NOT write implementation code. You produce Specifications that downstream
13
13
 
14
14
  ## MANDATORY: Read SKILL.md First
15
15
 
16
- **Before ANY action**, you MUST read `{{SKILLS_DIR}}/specs/SKILL.md` and follow it step-by-step. `SKILL.md` is the authoritative workflow. This agent file provides behavioral guidance; `SKILL.md` provides the execution protocol.
16
+ **Before ANY action**, you MUST read `.claude/skills/specs/SKILL.md` and follow it step-by-step. `SKILL.md` is the authoritative workflow. This agent file provides behavioral guidance; `SKILL.md` provides the execution protocol.
17
17
 
18
18
  ## Mental Models (How You Think)
19
19
 
@@ -51,7 +51,7 @@ Every specification MUST govern its scope through the `scope_lock` object in `sp
51
51
  ## Requirements Protocol
52
52
 
53
53
  ### EARS Format (MANDATORY)
54
- All acceptance criteria MUST follow EARS syntax. Load `{{SKILLS_DIR}}/specs/rules/ears-format.md`:
54
+ All acceptance criteria MUST follow EARS syntax. Load `.claude/skills/specs/rules/ears-format.md`:
55
55
 
56
56
  - **Event-Driven**: `When [event], the [system] shall [response]`
57
57
  - **State-Driven**: `While [precondition], the [system] shall [response]`
@@ -79,10 +79,10 @@ Before writing `design.md`, select a discovery mode and record the reason:
79
79
  **Default**: Use **light** when uncertain. Escalate to **full** only with concrete triggers.
80
80
 
81
81
  ### Design Rules
82
- - Load `{{SKILLS_DIR}}/specs/rules/design-principles.md`
83
- - Load `{{SKILLS_DIR}}/specs/templates/design.md`
84
- - For full mode: Load `{{SKILLS_DIR}}/specs/rules/design-discovery-full.md`
85
- - For light mode: Load `{{SKILLS_DIR}}/specs/rules/design-discovery-light.md`
82
+ - Load `.claude/skills/specs/rules/design-principles.md`
83
+ - Load `.claude/skills/specs/templates/design.md`
84
+ - For full mode: Load `.claude/skills/specs/rules/design-discovery-full.md`
85
+ - For light mode: Load `.claude/skills/specs/rules/design-discovery-light.md`
86
86
  - Include Mermaid diagrams for multi-step or cross-boundary flows
87
87
  - For auth/session, transport/entrypoint, persistence/schema, generated-artifact, or runtime-sensitive work: fill the `Canonical Contracts & Invariants` section and keep those decisions stable across all task files.
88
88
  - For privacy/delete-data work: the design MUST choose one canonical deletion policy and express it verbatim in `Canonical Contracts & Invariants` before tasks are generated.
@@ -96,8 +96,8 @@ Before writing `design.md`, select a discovery mode and record the reason:
96
96
 
97
97
  ### Task File Structure
98
98
  - Create **individual task files**: `tasks/task-R{N}-{SEQ}-<slug>.md`
99
- - Each file follows `{{SKILLS_DIR}}/specs/templates/task.md`
100
- - Load `{{SKILLS_DIR}}/specs/rules/tasks-generation.md`
99
+ - Each file follows `.claude/skills/specs/templates/task.md`
100
+ - Load `.claude/skills/specs/rules/tasks-generation.md`
101
101
 
102
102
  ### Task Rules
103
103
  - Every task MUST reference at least one valid in-scope requirement ID
@@ -105,7 +105,7 @@ Before writing `design.md`, select a discovery mode and record the reason:
105
105
  - Task size: 1-3 hours per sub-task
106
106
  - Reject tasks outside `scope_lock.in_scope`
107
107
  - When requirement coverage format: list numeric IDs only, no descriptive suffixes
108
- - Apply `(P)` parallel markers when applicable (load `{{SKILLS_DIR}}/specs/rules/tasks-parallel-analysis.md`)
108
+ - Apply `(P)` parallel markers when applicable (load `.claude/skills/specs/rules/tasks-parallel-analysis.md`)
109
109
  - Every task MUST include `Task Test Plan & Verification Evidence` with exact commands, artifacts/runtime surfaces, and negative-path checks.
110
110
  - Completion criteria MUST be objective enough that a downstream quality gate can prove them without guesswork.
111
111
  - Validation decisions that affect implementation MUST be written into implementation-facing sections (`Objective`, `Constraints`, `Implementation Steps`, `Completion Criteria`, `Task Test Plan & Verification Evidence`) rather than only `Risk Assessment`.
@@ -126,16 +126,19 @@ Each task file MUST contain granular sub-tasks with the following structure:
126
126
 
127
127
  ## Research Phase
128
128
 
129
- ### MANDATORY for all specs
130
- Spawn `researcher` subagent BEFORE writing detailed requirements:
129
+ ### Follow the `hapo:specs` Evidence Gate
131
130
 
132
- ```
133
- Task(subagent_type="researcher", prompt="Research [feature topic]")
134
- ```
131
+ Use `.claude/skills/specs/SKILL.md` as the source of truth for evidence depth. Do not force external research for trivial/internal specs.
132
+
133
+ When running as the main controller, delegate to the `researcher` agent BEFORE writing detailed requirements only when `hapo:specs` requires external/current research: third-party APIs, libraries, platform policies, AI providers/models/tooling, security/auth/payment/privacy/delete-data rules, performance/accessibility/SEO/security standards, or explicit "best/latest/recommended/optimal" user intent.
134
+
135
+ When running as this `spec-maker` subagent, do not spawn another subagent. Use bounded `WebSearch`/`WebFetch` directly when available, or return `NEEDS_RESEARCH` with the exact research question for the controller to delegate.
136
+
137
+ Use targeted codebase scout evidence when the feature changes existing behavior, touches contracts, crosses packages/runtimes, lacks exact file paths, or may invalidate tests.
135
138
 
136
139
  ### Research Output
137
- - Save findings in `specs/<feature>/research.md` using `{{SKILLS_DIR}}/specs/templates/research.md`
138
- - Research informs both requirements and design decisions
140
+ - Save findings in `specs/<feature>/research.md` using `.claude/skills/specs/templates/research.md`
141
+ - Evidence informs both requirements and design decisions
139
142
 
140
143
  ## Pre-Completion Checklist
141
144
 
@@ -162,8 +165,8 @@ Before marking the spec ready:
162
165
  - **Simple** (CRUD, single-module) → Lightweight spec, skip deep research
163
166
  - **Complex** (multi-module, security, migration) → Full spec with mandatory research phase
164
167
 
165
- ### 2. Research Phase (all features)
166
- Spawn `researcher` subagent. Capture findings in `specs/<feature>/research.md`.
168
+ ### 2. Evidence Phase
169
+ Capture codebase scout findings and external research when required by `hapo:specs`. Record skip rationale in `specs/<feature>/research.md` for trivial/internal cases.
167
170
 
168
171
  ### 3. Specification Generation (follows SKILL.md Steps 4-7)
169
172
  Produce the following artifacts under `specs/<feature>/`:
@@ -184,6 +187,7 @@ specs/<feature>/
184
187
  - Update `spec.json` with `"status": "in_progress"` and `"current_phase": "develop"`
185
188
  - Ensure `task_files` + `task_registry` are synchronized and `ready_for_implementation` reflects the finalization audit outcome
186
189
  - Report the spec directory path to the orchestrator
190
+ - The only valid implementation handoff is `/hapo:develop <feature>` (or `/hapo:develop <feature> <task-file>` for a single task). Never suggest `/work`, `/code`, or an unnamed "orchestrator dispatch" command.
187
191
  - DO NOT begin implementation yourself
188
192
 
189
193
  ## Integration Points
package/src/claude/agents/test-runner.md CHANGED
@@ -2,6 +2,7 @@
2
2
  name: test-runner
3
3
  description: "QA execution engine. Runs unit/integration/e2e test suites, generates coverage reports, validates build integrity, and checks task-level test plan evidence. Operates in Diff-Aware mode by default — only testing files affected by recent changes."
4
4
  model: haiku
5
+ tools: Glob, Grep, Read, Bash
5
6
  ---
6
7
 
7
8
  # Test Runner — Quality Gate
package/src/claude/agents/ui-ux-designer.md CHANGED
@@ -2,7 +2,7 @@
2
2
  name: ui-ux-designer
3
3
  description: "Design Specialist. Creates production-ready UI designs, maintains design systems, and ensures WCAG accessibility standards. Operates with a mobile-first, conversion-focused methodology."
4
4
  model: sonnet
5
- tools: Glob, Grep, Read, Edit, MultiEdit, Write, Bash, WebFetch, WebSearch, TaskCreate, TaskGet, TaskUpdate, TaskList, SendMessage, Task(researcher)
5
+ tools: Glob, Grep, Read, Edit, Write, Bash, WebFetch, WebSearch, TaskCreate, TaskGet, TaskUpdate, TaskList, SendMessage
6
6
  ---
7
7
 
8
8
  # UI/UX Designer — Design Specialist
@@ -32,7 +32,7 @@ You are an award-caliber UI/UX designer. You merge aesthetic excellence with eng
32
32
  ```
33
33
  - Study current design trends sourced from Dribbble, Awwwards, Mobbin via the python extractor outputs.
34
34
  - Review existing `docs/design-guidelines.md` if it exists.
35
- - Spawn `researcher` subagent for competitive analysis when needed.
35
+ - For competitive analysis, use bounded `WebSearch`/`WebFetch` directly or return a `NEEDS_RESEARCH` note for the controller to delegate.
36
36
 
37
37
  ### Phase 2: Design
38
38
  - Start mobile-first, scale up to desktop.
@@ -82,5 +82,5 @@ You are an award-caliber UI/UX designer. You merge aesthetic excellence with eng
82
82
 
83
83
  - Reads design specs from `hapo:specs` task files.
84
84
  - Reports design deliverables to orchestrator.
85
- - Delegates research to `researcher` subagent when needed.
85
+ - Requests controller-level research delegation when competitive analysis exceeds local search scope.
86
86
  - Updates `docs/design-guidelines.md` as the living design system.
package/src/claude/migration-manifest.json CHANGED
@@ -12,6 +12,7 @@
12
12
  "brainstorm",
13
13
  "chrome-devtools",
14
14
  "code-review",
15
+ "debug",
15
16
  "develop",
16
17
  "devops",
17
18
  "docx",
@@ -0,0 +1,56 @@
+ # Condition-Based Waiting
+
+ Use this for flaky tests, async UI behavior, background jobs, eventual consistency, and race conditions.
+
+ ## Core Rule
+
+ Wait for the condition that proves readiness. Do not wait for an arbitrary amount of time.
+
+ ## Bad Pattern
+
+ ```js
+ await new Promise((resolve) => setTimeout(resolve, 1000));
+ ```
+
+ This passes only when the machine, network, and scheduler happen to be fast enough.
+
+ ## Good Pattern
+
+ ```js
+ await waitFor(async () => {
+   const result = await readState();
+   return result.status === "ready";
+ }, { timeoutMs: 5000 });
+ ```
+
+ The test waits for a real observable condition and fails with a useful timeout when the condition never happens.
+
+ ## Diagnosis Checklist
+
+ - Does the test pass alone but fail in the suite?
+ - Does it fail more often under CI load?
+ - Is there shared state, global clock, cache, database row, local storage, or browser session leakage?
+ - Is the assertion made before the UI/job/API has reached a stable state?
+ - Is the wait tied to a timeout instead of a state transition?
+ - Can the test observe the same signal a user or downstream system relies on?
+
+ ## Fix Direction
+
+ - Replace fixed delays with condition waits.
+ - Prefer user-visible state for UI tests.
+ - Prefer durable state for jobs and integration tests.
+ - Reset shared state between tests.
+ - Keep timeout long enough for slow CI but fail with diagnostic output.
+ - Log or print last observed state on timeout.
+
+ ## Report Snippet
+
+ ```markdown
+ ### Flake Evidence
+ - Fails alone:
+ - Fails in suite:
+ - CI/local difference:
+ - Shared state:
+ - Readiness condition:
+ - Replacement wait:
+ ```
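The "Good Pattern" in this reference calls `waitFor` without defining it. A minimal sketch of such a helper follows, assuming a polling design. The `intervalMs` and `describeState` options are hypothetical additions for illustration (only `timeoutMs` appears in the reference), and test libraries that ship their own `waitFor` may behave differently.

```js
// Hypothetical helper: polls `check` until it resolves truthy or the deadline passes.
// The short sleep only paces the polling; the exit condition is the observed state.
async function waitFor(check, { timeoutMs = 5000, intervalMs = 50, describeState } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) return;
    if (Date.now() >= deadline) {
      // Include the last observed state (if the caller can describe it) in the failure.
      const detail = describeState ? `; last observed: ${describeState()}` : "";
      throw new Error(`waitFor timed out after ${timeoutMs}ms${detail}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Passing a `describeState` callback is one way to satisfy the "log or print last observed state on timeout" item under Fix Direction: the timeout error then carries the state the test actually saw.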
@@ -0,0 +1,59 @@
+ # Frontend Verification
+
+ Use this when the issue affects rendering, layout, interaction, hydration, browser state, accessibility, or network behavior.
+
+ ## When To Apply
+
+ - UI does not render or renders incorrectly
+ - A user flow fails in browser but not in unit tests
+ - A visual layout, responsive breakpoint, overlay, or z-index behavior is suspect
+ - Console/network errors may explain an application failure
+ - A fix changes visible UI or interaction behavior
+
+ ## Evidence Checklist
+
+ 1. **Route and state**
+    - Record URL, viewport, user role, feature flags, and required test data.
+    - Record browser, OS, and device profile when relevant.
+ 2. **Screenshot**
+    - Capture before and after screenshots.
+    - Check text overflow, occlusion, clipping, broken images, blank states, and responsive layout.
+ 3. **Console**
+    - Capture console errors and warnings.
+    - Treat hydration errors and uncaught exceptions as root-cause candidates, not noise.
+ 4. **Network**
+    - Capture failed requests, status codes, response shapes, CORS issues, and timing.
+    - Compare expected API contract with actual payload.
+ 5. **Accessibility tree**
+    - Use an accessibility or ARIA snapshot to find hidden overlays, missing labels, disabled controls, and focus traps.
+ 6. **Interaction**
+    - Reproduce the exact click/type/navigation flow.
+    - Verify focus, loading states, disabled states, empty states, and error states.
+
+ ## Preferred Tools
+
+ - `hapo:agent-browser` for visual reproduction, screenshots, and exploratory browser checks
+ - `hapo:chrome-devtools` for console, network, CDP, screenshots, ARIA snapshots, and WebSocket debugging
+ - Project-native E2E tooling when it already exists
+
+ ## Report Snippet
+
+ ```markdown
+ ### Frontend Evidence
+ - URL/viewport:
+ - Screenshot:
+ - Console:
+ - Network:
+ - Accessibility:
+ - Interaction result:
+ ```
+
+ ## Common Root Causes
+
+ - Hydration mismatch between server and client render
+ - Missing data/loading/error state
+ - Broken asset or route path
+ - CSS containment, overflow, stacking context, or responsive breakpoint issue
+ - JavaScript crash before component mount
+ - API contract drift or missing auth/session state
+ - Race condition hidden by arbitrary waits
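The Network step of the checklist asks for the expected API contract to be compared with the actual payload, and "API contract drift" is listed as a common root cause. One way that comparison could be sketched is below. `diffContract` and the flat `{ field: typeofName }` contract format are assumptions for illustration, not part of the package; a real project would more likely reuse its existing schema validator.

```js
// Hypothetical helper: reports fields that are missing from a captured payload
// or whose `typeof` differs from what the expected contract declares.
function diffContract(expected, actual) {
  const problems = [];
  for (const [field, expectedType] of Object.entries(expected)) {
    if (!(field in actual)) {
      problems.push(`missing field: ${field}`);
    } else if (typeof actual[field] !== expectedType) {
      problems.push(
        `type mismatch: ${field} expected ${expectedType}, got ${typeof actual[field]}`
      );
    }
  }
  return problems;
}
```

The returned strings can be pasted under the `Network:` line of the Frontend Evidence snippet as direct evidence of drift.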
@@ -0,0 +1,76 @@
1
+ # Performance Diagnostics
2
+
3
+ Use this when the issue is slow response, high CPU, memory growth, expensive rendering, DB latency, CI slowness, or timeout behavior.
4
+
5
+ ## First Rule
6
+
7
+ Measure before optimizing. A performance fix without a baseline is guessing.
8
+
9
+ ## Investigation Flow
10
+
11
+ 1. **Define the symptom**
12
+ - Slow operation, endpoint, page, test, job, query, render, or background task.
13
+ - Record expected threshold and actual observed time.
14
+ 2. **Capture a baseline**
15
+ - Command, URL, load profile, dataset size, cache state, runtime versions.
16
+ - Run more than once if variance is high.
17
+ 3. **Locate the bottleneck layer**
18
+ - Client render
19
+ - Network
20
+ - Server/application logic
21
+ - Database/query
22
+ - External dependency
23
+ - Build/CI infrastructure
24
+ 4. **Profile the narrowest meaningful scope**
25
+ - Use project-native profilers and logs first.
26
+ - Add temporary timing only when existing telemetry is insufficient.
27
+ 5. **Identify root cause**
28
+ - Algorithmic complexity
29
+ - N+1 query
30
+ - Missing index
31
+ - Excessive serialization
32
+ - Large bundle or render thrash
33
+ - Cold starts, cache misses, or dependency latency
34
+ 6. **Set verification target**
35
+ - Before metric
36
+ - After metric
37
+ - Acceptable threshold
38
+ - Regression guard if feasible
39
+
+ ## Database Checks
+
+ When PostgreSQL is involved:
+
+ ```sql
+ EXPLAIN (ANALYZE, BUFFERS) <query>;
+ ```
+
+ Check:
+ - Missing indexes
+ - Sequential scans on large tables
+ - N+1 query patterns
+ - Lock waits
+ - Connection pool saturation
+ - Query plan changes after data growth
+
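The `EXPLAIN (ANALYZE, BUFFERS)` check needs a live PostgreSQL server. As a self-contained stand-in, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's standard `sqlite3` module to illustrate the "missing indexes" item: the plan changes from a full table scan to an index search once an index exists. This is SQLite, not PostgreSQL, and plan output differs between the two engines. The `orders` table, the `customer_id` column, the index name, and the `query_plan` helper are all hypothetical.

```python
import sqlite3
from typing import List


def query_plan(conn: sqlite3.Connection, sql: str) -> List[str]:
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

sql = "SELECT total FROM orders WHERE customer_id = 42"

# Without an index on customer_id, the planner has to scan the whole table.
plan_before = query_plan(conn, sql)

# With the index in place, the planner can search it instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql)
```

Comparing the plan before and after a change gives the same kind of evidence the PostgreSQL command provides: proof that the planner actually uses the new index, rather than an assumption that it does.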
+ ## Frontend Checks
+
+ Check:
+ - Bundle size and route-level chunking
+ - Render count and unnecessary state updates
+ - Long tasks on the main thread
+ - Image dimensions and loading strategy
+ - Network waterfall and cache headers
+ - Interaction latency after hydration
+
+ ## Report Snippet
+
+ ```markdown
+ ### Performance Evidence
+ - Baseline:
+ - Bottleneck layer:
+ - Root cause:
+ - Proposed fix:
+ - Verification target:
+ - Regression guard:
+ ```
@@ -0,0 +1,48 @@
+ # Side-Effect Gate
+
+ Use this before claiming a fix is complete. A bug can be fixed locally and still create a regression nearby.
+
+ ## Gate Questions
+
+ 1. **Original symptom**
+    - Did the exact failing command, route, or user flow now pass?
+ 2. **Direct tests**
+    - Did tests for modified files pass?
+ 3. **Transitive tests**
+    - Did tests for callers, consumers, or related modules pass?
+ 4. **Contract stability**
+    - Did public API, CLI behavior, data shape, database schema, UI copy, or workflow semantics change?
+    - If yes, is the change intentional and documented?
+ 5. **Runtime behavior**
+    - Are logs clean?
+    - Are browser console/network checks clean for UI fixes?
+    - Are performance-sensitive paths unchanged or remeasured?
+ 6. **Security and privacy**
+    - Did the fix alter auth, permissions, secrets, logging, file access, or data exposure?
+ 7. **Failure modes**
+    - Does the new behavior fail loudly and diagnosably instead of silently corrupting state?
+
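Gate question 7 contrasts loud, diagnosable failures with silent state corruption. A minimal sketch of the difference follows; `apply_discount` and `InvalidDiscountError` are hypothetical names used only for illustration.

```python
class InvalidDiscountError(ValueError):
    """Raised when a discount request would produce an invalid price."""


def apply_discount(price_cents: int, percent: int) -> int:
    """Return the discounted price in cents, rejecting out-of-range inputs."""
    if price_cents < 0:
        raise InvalidDiscountError(f"price_cents must be >= 0, got {price_cents}")
    if not 0 <= percent <= 100:
        # Fail loudly and name the offending value instead of clamping it
        # silently, which would store a wrong price with no trace of why.
        raise InvalidDiscountError(f"percent must be between 0 and 100, got {percent}")
    return price_cents * (100 - percent) // 100
```

The error message carries the rejected value, so a log line is enough to diagnose the caller that produced it.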
+ ## Minimum Sweep By Change Type
+
+ | Change type | Minimum side-effect sweep |
+ |-------------|---------------------------|
+ | Single syntax/type/lint fix | Original command + affected file check |
+ | Unit logic | Original reproduction + related unit tests |
+ | Shared utility | Original reproduction + all direct consumer tests |
+ | API/backend | Original reproduction + contract/integration tests + logs |
+ | UI/frontend | Original reproduction + screenshot + console + network + responsive smoke |
+ | CI/build | Failed CI command locally if possible + config diff review |
+ | Performance | Before/after metric + correctness tests |
+
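The sweep table can also be read as a lookup from change type to required checks. The sketch below encodes the table rows as data and reports which checks are still outstanding. The `MINIMUM_SWEEP` mapping and `missing_checks` helper are hypothetical and not part of the package; only the row contents come from the table.

```python
from typing import Dict, List

# Rows of the "Minimum Sweep By Change Type" table, keyed by lower-cased change type.
MINIMUM_SWEEP: Dict[str, List[str]] = {
    "single syntax/type/lint fix": ["original command", "affected file check"],
    "unit logic": ["original reproduction", "related unit tests"],
    "shared utility": ["original reproduction", "all direct consumer tests"],
    "api/backend": ["original reproduction", "contract/integration tests", "logs"],
    "ui/frontend": [
        "original reproduction",
        "screenshot",
        "console",
        "network",
        "responsive smoke",
    ],
    "ci/build": ["failed ci command locally if possible", "config diff review"],
    "performance": ["before/after metric", "correctness tests"],
}


def missing_checks(change_type: str, completed: List[str]) -> List[str]:
    """Return the required checks for `change_type` that are not yet in `completed`."""
    key = change_type.strip().lower()
    if key not in MINIMUM_SWEEP:
        raise KeyError(f"unknown change type: {change_type!r}")
    done = {item.strip().lower() for item in completed}
    return [check for check in MINIMUM_SWEEP[key] if check not in done]
```

For example, `missing_checks("UI/frontend", ["original reproduction", "Screenshot"])` leaves the console, network, and responsive smoke checks outstanding.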
+ ## Report Snippet
+
+ ```markdown
+ ### Side-Effect Gate
+ - Original symptom:
+ - Direct tests:
+ - Transitive tests:
+ - Contract changes:
+ - Runtime checks:
+ - Security/privacy:
+ - Residual risk:
+ ```
@@ -15,11 +15,11 @@ The project maintains these core documents in `./docs`:
 
  - Before updating any doc, check its last modified date
  - If a doc hasn't been updated in >2 weeks while development is active, flag it for review
- - The `hapo:docs-keeper` should proactively scan for stale docs during weekly reviews
+ - The `docs-keeper` agent should proactively scan for stale docs during weekly reviews
 
  ## When to Update
 
- The `hapo:docs-keeper` agent is responsible for keeping these documents current. Trigger an update whenever:
+ The `docs-keeper` agent is responsible for keeping these documents current. Trigger an update whenever:
 
  - A development phase transitions (e.g., "In Progress" → "Complete")
  - A verified task completion changes user-facing behavior, architecture, API contracts, operational flow, or project status enough that docs should be refreshed
@@ -71,7 +71,7 @@
  ],
  "PostToolUse": [
  {
- "matcher": "Task|TaskCreate|TaskUpdate|TodoWrite",
+ "matcher": "Agent|Task|TaskCreate|TaskUpdate|TodoWrite",
  "hooks": [
  {
  "type": "command",
@@ -83,7 +83,7 @@ Load for detailed guidance:
 
  ## Outputs
 
- **IMPORTANT:** Invoke "/hapo:project-organization" skill to organize the outputs.
+ **IMPORTANT:** Save extracted outputs next to the active task/spec report or under an obvious project artifact folder. Include exact output paths in the final report.
 
  ## Resources
 
@@ -93,7 +93,7 @@ flowchart TD
  N -->|No| I
  N -->|Yes| O["Write Design Doc / Summary Report"]
  O --> P["Invoke /hapo:specs with report context"]
- P --> Q["Optional /hapo:journal"]
+ P --> Q["Optional project notes update"]
  ```
 
  ## Tactical Execution Rules
@@ -156,7 +156,7 @@ Upon the user's explicit final approval of the sanitized design document:
  1. Generate the final **Design Doc / Summary Report**.
  2. Include: problem statement, exact requirements, evaluated approaches, recommended solution, risks, validation criteria, and next steps.
  3. Invoke `/hapo:specs` with the report context to hand off into CafeKit's structured specification phase.
- 4. Optionally invoke `/hapo:journal` if the project context should be persisted for future developer memory.
+ 4. Optionally update an existing project notes, docs, or report file if the approved design context should be persisted for future work.
 
  ## Completion Bar