@haaaiawd/anws 2.3.0 → 2.4.1

Files changed (95)
  1. package/README.md +1 -1
  2. package/bin/cli.js +53 -23
  3. package/lib/diff.js +5 -2
  4. package/lib/init.js +217 -96
  5. package/lib/install-state.js +18 -3
  6. package/lib/manifest.js +364 -79
  7. package/lib/prompt.js +68 -0
  8. package/lib/resources/index.js +36 -2
  9. package/lib/update.js +12 -6
  10. package/package.json +48 -47
  11. package/templates/.agents/skills/anws-system/SKILL.md +107 -105
  12. package/templates/.agents/skills/code-reviewer/SKILL.md +171 -115
  13. package/templates/.agents/skills/concept-modeler/SKILL.md +230 -179
  14. package/templates/.agents/skills/craft-authoring/SKILL.md +186 -183
  15. package/templates/.agents/skills/craft-authoring/references/BUNDLE_POLICY.md +42 -0
  16. package/templates/.agents/skills/design-reviewer/SKILL.md +265 -190
  17. package/templates/.agents/skills/e2e-testing-guide/SKILL.md +246 -135
  18. package/templates/.agents/skills/nexus-mapper/SKILL.md +321 -321
  19. package/templates/.agents/skills/nexus-mapper/references/probe-protocol.md +1 -1
  20. package/templates/.agents/skills/nexus-query/SKILL.md +1 -1
  21. package/templates/.agents/skills/output-contract/SKILL.md +37 -0
  22. package/templates/.agents/skills/runtime-inspector/SKILL.md +150 -99
  23. package/templates/.agents/skills/sequential-thinking/SKILL.md +222 -225
  24. package/templates/.agents/skills/spec-writer/SKILL.md +75 -30
  25. package/templates/.agents/skills/system-architect/SKILL.md +538 -678
  26. package/templates/.agents/skills/system-designer/SKILL.md +124 -537
  27. package/templates/.agents/skills/task-planner/SKILL.md +1 -2
  28. package/templates/.agents/skills/task-planner/references/TASK_TEMPLATE_05A.md +2 -2
  29. package/templates/.agents/skills/task-reviewer/SKILL.md +421 -386
  30. package/templates/.agents/skills/tech-evaluator/SKILL.md +252 -144
  31. package/templates/.agents/workflows/blueprint.md +156 -68
  32. package/templates/.agents/workflows/challenge.md +322 -494
  33. package/templates/.agents/workflows/change.md +182 -339
  34. package/templates/.agents/workflows/craft.md +159 -197
  35. package/templates/.agents/workflows/design-system.md +202 -674
  36. package/templates/.agents/workflows/explore.md +187 -399
  37. package/templates/.agents/workflows/forge.md +650 -609
  38. package/templates/.agents/workflows/genesis.md +438 -351
  39. package/templates/.agents/workflows/probe.md +215 -240
  40. package/templates/.agents/workflows/quickstart.md +304 -123
  41. package/templates/.agents/workflows/upgrade.md +145 -182
  42. package/templates_en/.agents/skills/anws-system/SKILL.md +110 -0
  43. package/templates_en/.agents/skills/code-reviewer/SKILL.md +171 -0
  44. package/templates_en/.agents/skills/concept-modeler/SKILL.md +230 -0
  45. package/templates_en/.agents/skills/craft-authoring/SKILL.md +179 -0
  46. package/templates_en/.agents/skills/craft-authoring/references/BUNDLE_POLICY.md +42 -0
  47. package/templates_en/.agents/skills/craft-authoring/references/PROMPT_QUALITY_RUBRIC.md +92 -0
  48. package/templates_en/.agents/skills/craft-authoring/references/SCORECARD_TEMPLATE.md +52 -0
  49. package/templates_en/.agents/skills/design-reviewer/SKILL.md +264 -0
  50. package/templates_en/.agents/skills/e2e-testing-guide/SKILL.md +246 -0
  51. package/templates_en/.agents/skills/nexus-mapper/SKILL.md +306 -0
  52. package/templates_en/.agents/skills/nexus-mapper/references/language-customization.md +167 -0
  53. package/templates_en/.agents/skills/nexus-mapper/references/output-schema.md +311 -0
  54. package/templates_en/.agents/skills/nexus-mapper/references/probe-protocol.md +246 -0
  55. package/templates_en/.agents/skills/nexus-mapper/scripts/extract_ast.py +706 -0
  56. package/templates_en/.agents/skills/nexus-mapper/scripts/git_detective.py +194 -0
  57. package/templates_en/.agents/skills/nexus-mapper/scripts/languages.json +127 -0
  58. package/templates_en/.agents/skills/nexus-mapper/scripts/query_graph.py +556 -0
  59. package/templates_en/.agents/skills/nexus-mapper/scripts/requirements.txt +6 -0
  60. package/templates_en/.agents/skills/nexus-query/SKILL.md +114 -0
  61. package/templates_en/.agents/skills/nexus-query/scripts/extract_ast.py +706 -0
  62. package/templates_en/.agents/skills/nexus-query/scripts/git_detective.py +194 -0
  63. package/templates_en/.agents/skills/nexus-query/scripts/languages.json +127 -0
  64. package/templates_en/.agents/skills/nexus-query/scripts/query_graph.py +556 -0
  65. package/templates_en/.agents/skills/nexus-query/scripts/requirements.txt +6 -0
  66. package/templates_en/.agents/skills/output-contract/SKILL.md +37 -0
  67. package/templates_en/.agents/skills/runtime-inspector/SKILL.md +150 -0
  68. package/templates_en/.agents/skills/sequential-thinking/SKILL.md +214 -0
  69. package/templates_en/.agents/skills/spec-writer/SKILL.md +153 -0
  70. package/templates_en/.agents/skills/spec-writer/references/prd_template.md +177 -0
  71. package/templates_en/.agents/skills/system-architect/SKILL.md +538 -0
  72. package/templates_en/.agents/skills/system-architect/references/rfc_template.md +59 -0
  73. package/templates_en/.agents/skills/system-designer/SKILL.md +188 -0
  74. package/templates_en/.agents/skills/system-designer/references/system-design-detail-template.md +187 -0
  75. package/templates_en/.agents/skills/system-designer/references/system-design-template.md +605 -0
  76. package/templates_en/.agents/skills/task-planner/SKILL.md +251 -0
  77. package/templates_en/.agents/skills/task-planner/references/TASK_TEMPLATE_05A.md +109 -0
  78. package/templates_en/.agents/skills/task-planner/references/TASK_TEMPLATE_05B.md +176 -0
  79. package/templates_en/.agents/skills/task-reviewer/SKILL.md +422 -0
  80. package/templates_en/.agents/skills/tech-evaluator/SKILL.md +252 -0
  81. package/templates_en/.agents/skills/tech-evaluator/references/ADR_TEMPLATE.md +78 -0
  82. package/templates_en/.agents/workflows/blueprint.md +200 -0
  83. package/templates_en/.agents/workflows/challenge.md +326 -0
  84. package/templates_en/.agents/workflows/change.md +182 -0
  85. package/templates_en/.agents/workflows/craft.md +159 -0
  86. package/templates_en/.agents/workflows/design-system.md +202 -0
  87. package/templates_en/.agents/workflows/explore.md +187 -0
  88. package/templates_en/.agents/workflows/forge.md +651 -0
  89. package/templates_en/.agents/workflows/genesis.md +438 -0
  90. package/templates_en/.agents/workflows/probe.md +215 -0
  91. package/templates_en/.agents/workflows/quickstart.md +305 -0
  92. package/templates_en/.agents/workflows/upgrade.md +145 -0
  93. package/templates_en/AGENTS.md +149 -0
  94. package/templates/.agents/skills/report-template/SKILL.md +0 -92
  95. package/templates/.agents/skills/report-template/references/REPORT_TEMPLATE.md +0 -100
@@ -0,0 +1,264 @@
1
+ ---
2
+ name: design-reviewer
3
+ description: Load when `/challenge` needs design-side contract-closure evidence (three-dimensional architecture and system-design doc review); deliver anchorable, severity-graded findings for inclusion in 07_CHALLENGE_REPORT—not final rulings outside challenge context.
4
+ ---
5
+
6
+ # design-reviewer
7
+ > Naming design defects before implementation is an order of magnitude cheaper than settling the debt after production.
8
+
9
+ Within the `/challenge` chain, you are the **design-side evidence layer**: show which contracts remain unclosed at system boundaries, interfaces, state, timing, and error paths; you **do not** replace CHALLENGER’s holistic report judgment or routing—only deliver mergeable, verifiable design finding blocks.
10
+
11
+ ---
12
+
13
+ ## CRITICAL Methodology anchors
14
+
15
+ > [!IMPORTANT]
16
+ >
17
+ > The goal of design review is not to show cleverness—it is to make “document pledge—reasoning—gap” inspectable point by point by a third party.
18
+ >
19
+ > - **Awaken, not proclaim**: Recover design intent and ADR trade-offs before marking gaps; items that skip intent recovery tend to degrade into vague “risks”.
20
+ > - **Unfold, not single-track**: The same conclusion must still hold under cross-reading of PRD, architecture overview, System Design, and ADRs; scanning a single document misses defaults and implicit coupling.
21
+ > - **Elevate, then ground**: Raise issues to the contract layer (boundary / interface / state machine / fault semantics), then land on **citeable anchors**; stopping at analogy or stopping at outline titles is not deliverable.
22
+ > - **Rebuild, not paraphrase**: Rebuild evidence chains for “where it necessarily breaks if unfixed”—do not rewrite or repeat source sentences verbatim.
23
+
24
+ ---
25
+
26
+ ## CRITICAL Deliverable contract
27
+
28
+ > [!IMPORTANT]
29
+ >
30
+ > Shared persisted-report rules (precision, evidence, non-repetition, no filler, single-writer, delegation closure) are defined in **`.agents/skills/output-contract/SKILL.md`**; this skill only adds design-review-specific anchor and severity rules.
31
+ > - **Anchor**: Every finding carries **minimal sufficient anchors** (`path`, explicit title / subsection name, or stable chapter id); do not write only “see architecture doc”.
32
+ > - **Traceable**: “Finding → quote or précis → inference chain → impact → suggestion” stays in consistent order for lookup; without an inference chain, do not tag Critical / High.
33
+ > - **Table rule**: In the **Core findings table**, **finding / impact / suggestion** are each **one sentence** (very short compound sentences allowed).
34
+ > - **Quality over quantity**: A few high-signal findings beat a pile of guesses.
35
+
36
+ ---
37
+
38
+ ## CRITICAL sequential-thinking (Compression rules)
39
+
40
+ > [!IMPORTANT]
41
+ >
42
+ > **Dimension 1 (System Design)**: Before critique, use `sequential-thinking` with **3–5 thoughts** to fix design intent and core assumptions, then check against SD-1..6.
43
+ > **Dimensions 2 (Runtime Simulation) and 3 (Engineering Implementation)**: **Must** each run one `sequential-thinking` round (**3–5 thoughts per dimension**) for sequence reasoning and buildability / verifiability; natural chain-of-thought does **not** replace the CLI obligation for these two dimensions (without CLI you must declare blocking explicitly in output and degrade to “await parent-agent evidence”; do **not** fabricate thought lists).
44
+ > Each thought must answer: what is the premise, which step breaks, and which document anchor breaks.
45
+
46
+ ---
47
+
48
+ ## Task goals (minimal set aligned with challenge)
49
+
50
+ 1. **Load (required)**: `02_ARCHITECTURE_OVERVIEW.md`, all `04_SYSTEM_DESIGN/*.md`, all `03_ADR/*.md`; if challenge context mounts `01_PRD.md`, use it together for cross-consistency.
51
+ 2. **Pre-mortem**: Imagine failure roughly six months out; work backwards to root-cause classes tied directly to design docs (boundary, timing, state, error paths, etc.).
52
+ 3. **Three-dimensional pass**: Complete full-table scan for dimensions 1–3; **assumption verification**: list implicit assumptions and try to falsify them.
53
+ 4. **Deliver**: Produce a severity- and anchor-tagged finding set structured to embed into the “Design review findings” section of `07_CHALLENGE_REPORT.md`.
54
+
55
+ **Hard bounds**: **Evidence-first** (no concrete citation + inference chain → do not enter the findings list); **respect documented ADR trade-offs** (no reopening old decisions without new evidence); **no implementation code-level detail** (review targets are design contracts and buildability semantics).
56
+
57
+ ---
58
+
59
+ ## Inputs & modeling
60
+
61
+ ### What to do
62
+
63
+ Build a readable design model: list components/boundaries, external interface inventory, core state and fault semantics, dependency on ADRs; flag blanks (undeclared protocol, undeclared timeout/degradation, undeclared error-code semantics, etc.).
64
+
65
+ ### Why
66
+
67
+ Table-scanning without a model only yields tag clouds; challenge needs **closure evidence** that merges into a contract model.
68
+
69
+ ### How to verify
70
+
71
+ - You can explain in your own words the system’s hard boundary and its “most likely failure” seams.
72
+ - You listed this round’s file paths (not only directory names).
73
+ - Pre-mortem converges to at least 1–3 testable design-failure pattern hypotheses.
74
+
75
+ ---
76
+
77
+ ## Three-dimension pass
78
+
79
+ ### What to do
80
+
81
+ Walk every row in the three tables below; dimensions 2 and 3 each complete mandatory `sequential-thinking` beforehand; record results only as **candidate findings**, then wrap as DR-ID rows in the next section.
82
+
83
+ ### Why
84
+
85
+ The three-dimensional frame keeps “structure—time—construction” blind spots from being skipped by a single review path.
86
+
87
+ ### How to verify
88
+
89
+ - SD / RS / EI tables all show scan traces: short marginal notes or direct entries into the findings list—not silently skipping entire rows.
90
+ - RS and EI satisfy the **CRITICAL sequential-thinking** obligations above.
91
+ - No duplicate items against clearly recorded and still-valid ADR trade-offs unless new evidence exists.
92
+
93
+ ---
94
+
95
+ ### Dimension 1: System Design
96
+
97
+ **Goal**: Verify architectural completeness, consistency, and boundary clarity.
98
+
99
+ | # | Item | Focus |
100
+ | ---- | ------------------------- | --------------------------------------------------------------------- |
101
+ | SD-1 | **Architectural coherence** | Do descriptions of the same component conflict across PRD, Architecture, System Design? |
102
+ | SD-2 | **Boundary clarity** | Is each system’s scope explicit? Responsibility overlap? |
103
+ | SD-3 | **Dependency sanity** | Acyclic dependencies? Hidden coupling? |
104
+ | SD-4 | **Interface completeness** | Cross-system interfaces fully defined (inputs/outputs/errors/protocol)? |
105
+ | SD-5 | **State management** | Transitions explicit? Edge states handled? |
106
+ | SD-6 | **Data-model completeness** | Entity relationships consistent across docs? Orphan entities? |
107
+
108
+ ---
109
+
110
+ ### Dimension 2: Runtime Simulation
111
+
112
+ **Goal**: “Run” the system mentally; surface timing, state, and concurrency issues.
113
+
114
+ | # | Item | Focus |
115
+ | ---- | ----------------------- | --------------------------------------------------------------------- |
116
+ | RS-1 | **Temporal coherence** | Timing model sane? Contradictions in “must happen before”? |
117
+ | RS-2 | **State synchronization** | Under distribution, can replicas diverge? Is eventual consistency acceptable here? |
118
+ | RS-3 | **Concurrency handling** | What happens when two operations conflict? Is there resolution? |
119
+ | RS-4 | **Boundary conditions** | Empty/full/overflow—how each handled? |
120
+ | RS-5 | **Fault propagation** | How does component A failing affect B/C? Cascading risk? |
121
+ | RS-6 | **Happy-path bias** | Only nominal path designed? Errors / timeouts / partial failure paths? |
122
+
123
+ ---
124
+
125
+ ### Dimension 3: Engineering Implementation
126
+
127
+ **Goal**: Verify the design is buildable, testable, maintainable.
128
+
129
+ | # | Item | Focus |
130
+ | ---- | ---------------------- | --------------------------------------------------------------------- |
131
+ | EI-1 | **Testability** | Can core logic be unit-tested? Mock seams present? |
132
+ | EI-2 | **Maintainability** | If requirements change, how many files move? |
133
+ | EI-3 | **Performance hotspots** | Hidden N+1, unbounded loops, or O(n²)? |
134
+ | EI-4 | **Security surface** | Auth boundary clear? Sensitive data at rest/in transit encrypted? Inputs validated? |
135
+ | EI-5 | **Observability** | From logging/metrics design, can prod issues be debugged? |
136
+ | EI-6 | **Tech fit** | Does chosen stack truly support needed capabilities? Version compatibility? |
137
+
138
+ ---
139
+
140
+ ## Severity scale
141
+
142
+ | Level | Decision rule | Required action |
143
+ | ------------ | ----------------------------------------------- | ------------------------------------------------------------ |
144
+ | **Critical** | Fundamental contradiction or infeasibility. Cannot proceed without fix. | P0 — Fix before blueprint/forge |
145
+ | **High** | Serious risk likely to force rework or failure. | P1 — Fix before forge |
146
+ | **Medium** | Quality risk with workable mitigations. | P2 — Fix during implementation |
147
+ | **Low** | Polish or minor inconsistency. | P3 — Track later |
148
+
149
+ ---
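The severity scale above reads as a lookup from level to priority, which makes ordering findings mechanical. Below is a minimal Python sketch of that reading; it is not part of the package, and `SEVERITY_PRIORITY`, `sort_findings`, `needs_detail_block`, and the dict shape of a finding are all assumptions made for illustration.

```python
# Hypothetical helper, not shipped by the package: encodes the severity
# scale table as data so findings can be ordered consistently.
SEVERITY_PRIORITY = {
    "Critical": "P0",  # fix before blueprint/forge
    "High": "P1",      # fix before forge
    "Medium": "P2",    # fix during implementation
    "Low": "P3",       # track later
}


def sort_findings(findings):
    """Return findings ordered from most to least severe (stable for ties)."""
    order = list(SEVERITY_PRIORITY)  # dicts preserve insertion order
    return sorted(findings, key=lambda f: order.index(f["severity"]))


def needs_detail_block(finding):
    """Only Critical / High findings get an expanded detail block."""
    return finding["severity"] in ("Critical", "High")
```

Keeping the scale in one mapping means the summary table and the detail section cannot disagree about which levels count as P0/P1.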
150
+
151
+ ## Findings packaging
152
+
153
+ ### What to do
154
+
155
+ Filter candidates into deliverables: sort by severity; assign `DR-xx` each; fill summary table and (Critical/High only) details; **finding / impact / suggestion** each one sentence; **Document location** uses minimal workable anchors.
156
+
157
+ ### Why
158
+
159
+ Challenge reports need mergeable, auditable blocks; long paraphrase buries P0/P1.
160
+
161
+ ### How to verify
162
+
163
+ - Every row has: dimension, severity, anchor, one-sentence finding, one-sentence impact, one-sentence suggestion.
164
+ - Critical/High have “evidence” and “inference chain” expansions consistent with `sequential-thinking` obligations.
165
+ - Output pastes straight into `07_CHALLENGE_REPORT.md` sections without rewriting header semantics.
166
+
167
+ ---
168
+
169
+ ## Output format
170
+
171
+ Generate findings using the structure below, suitable for `07_CHALLENGE_REPORT.md`:
172
+
173
+ ```markdown
174
+ ## Design review findings
175
+
176
+ ### Summary
177
+
178
+ | Dimension | Count | Critical | High | Medium | Low |
179
+ |-----------|:-----:|:--------:|:----:|:------:|:---:|
180
+ | System design | — | — | — | — | — |
181
+ | Runtime simulation | — | — | — | — | — |
182
+ | Engineering implementation | — | — | — | — | — |
183
+ | **Total** | **—** | **—** | **—** | **—** | **—** |
184
+
185
+ **High-signal conclusions**: [1–3 sentences on what most deserves space in the main challenge report]
186
+
187
+ ---
188
+
189
+ ### Core findings
190
+
191
+ | ID | Dimension | Severity | Document location | Finding | Impact | Suggestion |
192
+ |----|-----------|----------|-------------------|---------|--------|------------|
193
+ | DR-01 | System design | Critical | 02_ARCHITECTURE_OVERVIEW.md §X | Boundary conflict; two overlapping system scopes | Responsibility drift and high rework risk in implementation | Redraw boundaries and update references |
194
+ | DR-02 | Runtime simulation | High | 04_SYSTEM_DESIGN/... §Y | Fault propagation path undefined | Cascading failures may not converge | Add timeout/degradation/retry strategy |
195
+ | DR-03 | Engineering implementation | Medium | ADR-00X / System Design §Z | Insufficient test seams | Higher verification cost later | Add interface isolation or mock seams |
196
+
197
+ > Only emit issues that truly affect design judgment. Low-value wording and duplicated worries must not enter the list.
198
+
199
+ ---
200
+
201
+ ### Top finding details (Critical / High only)
202
+
203
+ #### DR-01 [Title]
204
+
205
+ **Severity**: Critical
206
+ **Document location**: [Exact file + section reference]
207
+
208
+ **Evidence**:
209
+ - Document analysis: [Concrete content from PRD/Architecture/ADR]
210
+ - Inference chain: [Analysis grounded in `sequential-thinking`]
211
+ - Analogy: [Known analogous failure in other systems if applicable]
212
+
213
+ **Impact**:
214
+ - [What happens if unfixed]
215
+
216
+ **Suggestion**:
217
+ - [Smallest corrective direction]
218
+ ```
219
+
220
+ ---
221
+
222
+ ## Quality gate before handoff
223
+
224
+ ### What to do
225
+
226
+ Self-check against the list: citations, inference, severity, ADR respect, actionability; remove duplicates and hollow language; confirm alignment with parent/child merge rules for challenge (single write path to parent agent—this skill only delivers blocks).
227
+
228
+ ### Why
229
+
230
+ Ungated design blocks corrupt the contract model—false P0 or missed real closure gaps.
231
+
232
+ ### How to verify
233
+
234
+ - Every finding has concrete doc references—not vague “architecture doc”.
235
+ - Every finding states impact explicitly; Critical/High satisfy enforced `sequential-thinking` on RS/EI.
236
+ - No pure guesses without inference chains.
237
+ - ADR-recorded trade-offs are not challenged again without new evidence.
238
+ - Suggestions point to minimal fixes reviewers can execute.
239
+
240
+ ---
241
+
242
+ ## Subagent orchestration
243
+
244
+ **Optional**: Where the environment supports parallelization and context split, parent may delegate **each dimension (SD / RS / EI)** or **“doc read-only consolidated summary”** to subagents; subagents return only structured draft findings with anchors and **must not** write `07_CHALLENGE_REPORT.md` themselves.
245
+
246
+ **Parent agent**: Merge into challenge report shape, dedupe, unify DR numbering, execute Step 4.5 and routing language; **single write path**.
247
+
248
+ **Handoff checklist (subagent → parent)**:
249
+
250
+ - Dimensions executed and mandatory file list declared.
251
+ - Each draft contains: dimension, suggested severity, anchor, one-sentence finding / impact / suggestion.
252
+ - RS and EI drafts declare `sequential-thinking` executed or blocking note.
253
+ - No implicit premises conflicting with contracts already loaded by parent; conflicting items isolated under “Needs parent ruling”.
254
+ - After merge, subagent does not independently modify the same report path again.
255
+
256
+ ---
257
+
258
+ <completion_criteria>
259
+ - Read `02_ARCHITECTURE_OVERVIEW.md`, `04_SYSTEM_DESIGN/*`, `03_ADR/*` (and PRD from challenge context if available).
260
+ - Pre-mortem hypotheses and three-dimensional table scan complete; RS and EI satisfy mandatory `sequential-thinking` rounds.
261
+ - Output includes summary table, core findings (finding / impact / suggestion each one sentence), and Critical/High detail blocks—all anchors traceable.
262
+ - Quality gate executed; no evidence-free items, no duplicate laundry lists, ADR trade-offs handled correctly.
263
+ - Role boundary clear: deliver mergeable design evidence blocks; do not replace `/challenge` final judgment and write duties (unless session explicitly designates you parent agent with write ownership).
264
+ </completion_criteria>
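The deliverable contract in the design-reviewer skill above (anchored findings, no Critical / High without an inference chain, `DR-xx` IDs) lends itself to a mechanical pre-handoff check. The following is a minimal Python sketch under stated assumptions: the `Finding` record, its field names, the `VAGUE_ANCHORS` set, and the `validate` function are hypothetical and only mirror the Core findings table columns; the package itself defines no such code.

```python
import re
from dataclasses import dataclass


# Hypothetical record shape; fields mirror the Core findings table columns.
@dataclass
class Finding:
    id: str
    dimension: str
    severity: str
    location: str
    finding: str
    impact: str
    suggestion: str
    inference_chain: str = ""


DIMENSIONS = {"System design", "Runtime simulation", "Engineering implementation"}
SEVERITIES = {"Critical", "High", "Medium", "Low"}
# Assumed examples of anchors the skill calls too vague to accept.
VAGUE_ANCHORS = {"see architecture doc", "architecture doc"}


def validate(f: Finding) -> list[str]:
    """Return a list of contract violations; an empty list means the row passes."""
    errors = []
    if not re.fullmatch(r"DR-\d{2,}", f.id):
        errors.append("id must look like DR-01")
    if f.dimension not in DIMENSIONS:
        errors.append("unknown dimension")
    if f.severity not in SEVERITIES:
        errors.append("unknown severity")
    anchor = f.location.strip().lower()
    if not anchor or anchor in VAGUE_ANCHORS:
        errors.append("location needs a concrete file/section anchor")
    if f.severity in {"Critical", "High"} and not f.inference_chain.strip():
        errors.append("Critical/High requires an inference chain")
    return errors
```

A check like this only covers the structural rules; whether the inference chain is actually sound still needs a reviewer.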
@@ -0,0 +1,246 @@
1
+ ---
2
+ name: e2e-testing-guide
3
+ description: Specifies the human-facing E2E / manual verification *Testing Guide* and *E2E Verification* report skeleton (PRD traceability, human walk-through order, verdict columns may only be PASS/PARTIAL_PASS/FAIL); **does not include real-browser orchestration**—order of operations and backfill obligations are fixed by the host **`/forge` §3.7** (per `/forge` wording).
4
+ ---
5
+
6
+ # E2E Testing Guide — Human verification document layer
7
+
8
+ <phase_context>
9
+ You are **E2E GUIDE AUTHOR (verification guide writer)**.
10
+
11
+ **Mission**: Before **executing or being authorized for real-browser testing**, produce *E2E Verification* documentation a reader can follow **as if seeing the product for the first time**: **read-the-screen before action**, honest entries and coverage, each conclusion traceable to PRD/acceptance; do **not** mistake “having written the guide” for “having tested”.
12
+ **Capability**: Context gathering and explicit blocking issues; structured RTM/Surface/Journey enumeration; steps aligned with human exploration order; expected Evidence types; aligning with `/forge` §3.7 on filenames on disk and order of operations.
13
+ **Constraint**: Do not write browser-automation protocols or verdict tiers outside this skill; do **not**, without a real browser run, set `Journey result` / `Step result` to `PASS`; do **not** remove the **hard constraints, mandatory walk-through rules, or required headings/tables below** (you may only compress repetitive asides).
14
+ **Relationship to sub-agents**: The parent session exclusively owns **TARGET_DIR/wave-{N}-e2e.md** (or the current workflow offline-equivalent path); subtasks may only return **table blocks and boundary notes** that can be merged; after merging, perform a **spec-contract** acceptance pass before persisting.
15
+ **Output goal**: Satisfy the Markdown skeleton in **Required output**; real-browser backfill runs in **`/forge` §3.7** step two after authorization.
16
+ </phase_context>
17
+
18
+ ---
19
+
20
+ ## CRITICAL methodological anchors
21
+
22
+ > [!IMPORTANT]
23
+ > The guide is a “proof plan that can be walked,” not green-check theater.
24
+ >
25
+ > - **See first, believe second**: Establish read-screen expectations and traceable PRD anchors before actions and visible outcomes; a chain of clicks with no UI narrative is an unacceptable step.
26
+ > - **Honest coverage for human habits**: A happy path alone is insufficient; primary/secondary CTAs within scope, tabs, navigation chrome, and common combinations (filters/pagination/back/deep links, etc.) must appear as Steps or, in **Coverage gaps**, document why they are out of scope.
27
+ > - **Scarce tiers mean discipline**: Verdict semantics are only three tiers PASS / PARTIAL_PASS / FAIL; **forbidden** to invent “passed but…,” “mostly done,” or other pseudo-green lights.
28
+ > - **Tables and narrative stay aligned**: Surfaces claimed in Surface tables **must not** diverge from Journey/Step entries; each Finding needs a PRD ref and reproducible wording.
29
+
30
+ ---
31
+
32
+ ## CRITICAL: spec contract (verdict columns + traceability)
33
+
34
+ ### Allowed literal Journey / Step “results” (only these three)
35
+
36
+ - **`PASS`**: Supported by evidence **after authorized real-browser Evidence backfill**, matching PRD behavior and perceived UI.
37
+ - **`PARTIAL_PASS`**: Core value reachable but **documented** gaps remain (say in `Notes` / `Findings` what did not fully close); **forbidden** to wash a failure into PASS with vague language.
38
+ - **`FAIL`**: Does not meet PRD / acceptance or blocks continuation (may stay FAIL until fixed and retested).
39
+
40
+ **Strictly no “fake PASS”**: For guide-only drafts (`guide-only`), static review alone, or any case where user-authorized browser backfill has not completed, **`Journey result` / `Step result` stay blank** or use **`pending real-browser run`**; **never** fill `PASS` / `PARTIAL_PASS` / `FAIL` as if verification had happened. Invented URLs, screenshots, or network conclusions are likewise forbidden.
41
+
42
+ ### PRD traceability (equivalent to hard constraints)
43
+
44
+ Any RTM row, Surface, `PRD ref`/`PRD reference`, Journey, Step, or `Findings` line **must** point to a **PRD anchor** or **task acceptance item** (e.g. `T-x`); if there is no PRD, declare a “pseudo-PRD” source in **Scope**. **Steps without anchors** do not belong in the main tables; they go into **Coverage gaps** with a rationale for not testing.
45
+
46
+ **Paired skill**: Host workflow **`/forge` §3.7 — wave-end E2E** (trigger, close-out A/B, `wave-{N}-e2e.md` path, `guide-only` boundaries) is authoritative per **`forge`** text; this file does not repeat the full workflow but must not conflict.
47
+
48
+ ---
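The verdict rule in the spec contract above has two states: before authorized browser Evidence backfill, a result cell may only be blank or `pending real-browser run`; afterwards it may only be one of the three literals. A minimal Python sketch of that rule follows. The function name `check_result_cell` and the `evidence_backfilled` flag are assumptions for illustration, and treating a blank cell as invalid after backfill is also an assumption, since the skill does not state it explicitly.

```python
# Hypothetical check, not part of the package.
ALLOWED_VERDICTS = {"PASS", "PARTIAL_PASS", "FAIL"}
PENDING = "pending real-browser run"


def check_result_cell(value: str, evidence_backfilled: bool) -> bool:
    """Return True if a Journey/Step result cell respects the verdict contract."""
    cell = value.strip()
    if not evidence_backfilled:
        # Guide-only phase: no verdict of any kind may be written yet.
        return cell in ("", PENDING)
    # After authorized backfill: only the three literal tiers are accepted
    # (assumption: a blank cell at this point counts as incomplete).
    return cell in ALLOWED_VERDICTS
```

Invented tiers such as "mostly done" fail in both states, which matches the skill's ban on pseudo-green wording.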
49
+
50
+ ## When to invoke
51
+
52
+ - Any task in `05A_TASKS.md` mentions **E2E testing** or **manual verification**, or `05B_VERIFICATION_PLAN.md` requires real-browser verification; or changes affect flows that depend on hands-on perception (pages, navigation, forms, auth, etc.).
53
+ - The user explicitly asks for a “test guide,” “E2E report,” “browser verification checklist,” etc.
54
+
55
+ ---
56
+
57
+ ## Hard constraints
58
+
59
+ - **PRD / acceptance traceability**: Tables and steps point to PRD or acceptance rows; declare source in Scope when PRD is missing.
60
+ - **Human-style coverage (written in the guide)**: Reflect navigation chrome, empty states, secondary routes, primary/secondary CTAs, tabs, inline actions, etc. **within scope** in **Surface coverage** or **Journey/Step**; deliberate omissions go in **Coverage gaps**.
61
+ - **Do not fabricate results**: Without real browser runs, leave blank or `pending real-browser run`; **never** write `PASS` without a run (or fake green wording). Verdict columns may only use **PASS / PARTIAL_PASS / FAIL**, and only after an evidence chain exists.
62
+ - **Evidence columns**: URLs, screenshots, logs, etc. backfilled in `/forge` browser phase after user authorization; the guide phase states **what Evidence to capture**.
63
+ - **Side effects**: Any step involving login, DB writes, payments, prod-equivalent writes, etc. **must flag upfront that user authorization is required**.
64
+ - **Concision**: `Findings` / `Coverage gaps` / `Notes`—**one issue per line, one sentence** (PRD ref allowed); no duplicate phrasing of the same gap.
65
+
66
+ ---
67
+
68
+ ### Write the guide the way humans use the product (mandatory)
69
+
70
+ 1. **Real entry points**: Start from real user arrivals (home, deep link, email links, etc.); unless the task explicitly says otherwise, **do not** default to Storybook / debug shells.
71
+ 2. **Look then act**: Before each step describe what structure/copy should appear on screen, then action; no “next clicks” without UI description.
72
+ 3. **Navigate the chrome**: Top bar, side nav, user menu, settings, help, breadcrumbs, back routes, i.e. what **humans will click**: each appears once in Surface or Journey, or lands in Coverage gaps with a reason.
73
+ 4. **Sweep leaf-screen affordances**: Per screen map at least one Step each for **primary CTA plus visible secondary actions** (overflow menus, row buttons, tabs); **never** collapse to a single happy path.
74
+ 5. **Common combinations**: Reflect filter+sort+pagination, refresh, back, copy URL reopen, keyboard reachability of primary actions as the product warrants; omit only with **Coverage gaps** rationale.
75
+ 6. **Data shapes**: For lists/tables, spell out expectations for zero / one / many rows (add preparation steps when feasible; otherwise log in **Blockers**).
76
+
77
+ ---
78
+
79
+ ## Authoring flow (documentation only)
80
+
81
+ ### 1. Read context
82
+
83
+ #### What
84
+
85
+ Read tasks plus `05A_TASKS.md`, `05B_VERIFICATION_PLAN.md`, `01_PRD.md` (or **`inputs`** / requirements pointers), routing and screen notes, how to boot, accounts and roles; record missing URLs / credentials / environment into **Blockers**.
86
+
87
+ #### Why
88
+
89
+ Without boundaries Surface/Journey drifts; front-loading Blockers avoids “discovery fails halfway through writing.”
90
+
91
+ #### How to validate
92
+
93
+ Known gaps live in Blockers; **do not treat assumptions as facts** and label them PASS.
94
+
95
+ ---
96
+
97
+ ### 2. PRD alignment table (RTM)
98
+
99
+ #### What
100
+
101
+ Build a **PRD ↔ Journey** mapping; if there is no PRD, the first column uses **task acceptance T-x** and Scope footnotes the “pseudo-PRD” source. **Optional sub-agent pattern**: one sub-session fills only the skeleton table plus a one-line Blocker summary of what cannot fit; the parent dedupes, unifies `PRD ref`, and aligns with Surface.
102
+
103
+ #### Why
104
+
105
+ Contract grid first, then human paths; avoids long journeys disconnected from acceptance.
106
+
107
+ #### How to validate
108
+
109
+ Every acceptance/PRD item to be tested appears **at least once** or is explained under **Coverage gaps**.
110
+
111
+ | PRD reference | Requirement summary | Priority P0/P1/P2 | Planned Journeys |
112
+ | ------ | ---- | ------------ | ------------- |
113
+
114
+ ---
115
+
116
+ ### 3. Feature surface inventory (Surface)
117
+
118
+ #### What
119
+
120
+ Enumerate surfaces by **how users discover them**, not as routing dumps; do not substitute “routes only developers know” for “what users see first.” An optional sub-draft is merged by the parent.
121
+
122
+ #### Why
123
+
124
+ Surface is the human entry map; cross-check against Journey tables.
125
+
126
+ #### How to validate
127
+
128
+ The **Mapped Journey** column lines up row-by-row with the Journey IDs below, or the row is marked as a gap.
129
+
130
+ | Surface / entry | How users discover it | Mapped Journey | PRD ref |
131
+ | -------- | ------ | ---------- | ------ |
132
+
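The Surface / Journey cross-check can be sketched as a set comparison. This is a sketch under assumptions: `cross_check`, the row keys, and the sample surfaces are hypothetical, and a row whose mapped journey is `None` is assumed to be an explicit gap row.

```python
def cross_check(surface_rows, journey_ids):
    """Compare the Mapped Journey column against the Journey IDs that exist.

    Returns (dangling, unreached):
      dangling  - Journey IDs a surface points at that are not defined
      unreached - defined Journey IDs that no surface maps to
    Rows whose "journey" is None are treated as explicit gap rows.
    """
    mapped = {row["journey"] for row in surface_rows if row["journey"] is not None}
    known = set(journey_ids)
    return sorted(mapped - known), sorted(known - mapped)


# Hypothetical sample data for illustration.
surfaces = [
    {"surface": "Top nav: Orders", "journey": "J1"},
    {"surface": "Settings link", "journey": "J4"},
    {"surface": "Footer: Help", "journey": None},  # gap row
]
journeys = ["J1", "J2"]

dangling, unreached = cross_check(surfaces, journeys)
print(dangling)   # ['J4']
print(unreached)  # ['J2']
```

Both lists being empty indicates the two tables do not contradict each other structurally; it says nothing about whether the Journeys themselves are adequate.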
133
+ ---
134
+
135
+ ### 4. Journeys and granular steps
136
+
137
+ #### What
138
+
139
+ Per Journey: **PRD, role, start, goal**; **Step = human operation order.** Each Step has three clauses:
140
+ **(1)** Read-screen expectation **(2)** Action **(3)** Observable outcome + Evidence type (e.g. full-page screenshot, specific 200 response).
141
+ Cover: **core success, cold start / empty, typical errors, simple boundary (refresh/back/deep link), at least one viewport** (if desktop-only, state it plainly).
142
+
143
+ Optional sub-agents slice the work by individual Journey; after the parent merges, verify **Coverage gaps** / **Surface** alignment.
144
+
145
+ #### Why
146
+
147
+ Steps are the single source of truth for execution; fuzzy granularity prevents per-item Evidence later.
148
+
149
+ #### How to validate
150
+
151
+ No “clicks out of nowhere”; Evidence expectations are actionable; Steps align with the six mandatory human-writing rules above.
152
+
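The three-clause Step structure described above can be modeled as a record with a completeness check. This is a minimal sketch, assuming a hypothetical `Step` dataclass and field names; a Step with an empty read-screen expectation is what the text calls a click "out of nowhere."

```python
from dataclasses import dataclass


@dataclass
class Step:
    journey: str
    expectation: str  # (1) what the reader should see before acting
    action: str       # (2) the human operation
    outcome: str      # (3) the observable result
    evidence: str     # evidence type, e.g. "full-page screenshot"


def incomplete_steps(steps):
    """Return (step number, missing field names) for Steps with an empty clause."""
    problems = []
    for number, step in enumerate(steps, start=1):
        missing = [
            name
            for name in ("expectation", "action", "outcome", "evidence")
            if not getattr(step, name).strip()
        ]
        if missing:
            problems.append((number, missing))
    return problems


# Hypothetical sample data for illustration.
steps = [
    Step("J1", "Login form visible", "Submit valid credentials",
         "Dashboard loads", "full-page screenshot"),
    Step("J1", "", "Click Export", "CSV downloads", ""),
]

print(incomplete_steps(steps))  # [(2, ['expectation', 'evidence'])]
```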
153
+ ---
154
+
155
+ ### 5. Execution plan (short optional prose)
156
+
157
+ One paragraph covering `Target` / `Environment` / `Role` / `Data setup` / `Side effects` / `Blockers`. **Do not** write host browser tap sequences (real runs follow **`/forge` §3.7**).
158
+
159
+ ---
160
+
161
+ ## Output format (Required output)
162
+
163
+ Use the Markdown below **verbatim as the report skeleton**; **do not** drop section names or table headers arbitrarily. Assume the executor is **a human opening the product for the first time**.
164
+
165
+ ```markdown
166
+ <!--
167
+ Verdict semantics (Journey result / Step result): only PASS | PARTIAL_PASS | FAIL.
168
+ Until user authorization AND browser Evidence backfill complete: leave blank or write "pending real-browser run" — never any verdict, never invent other tiers or euphemisms.
169
+ -->
170
+
171
+ ## E2E Verification
172
+
173
+ ### Scope
174
+ - PRD / requirement source:
175
+ - Target:
176
+ - Environment:
177
+ - Browser / Viewport (planned):
178
+ - User Role:
179
+ - Build / Commit:
180
+
181
+ ### PRD traceability (RTM)
182
+ | PRD ref | Summary | Priority | Journeys |
183
+ | --- | --- | --- | --- |
184
+
185
+ ### Surface coverage
186
+ | Surface / entry | How discovered | Journey | PRD ref | Notes |
187
+ | --- | --- | --- | --- | --- |
188
+
189
+ ### Journeys (journey level)
190
+ | ID | PRD ref | User Journey | Journey result | Evidence | Notes |
191
+ | --- | --- | --- | --- | --- | --- |
192
+
193
+ ### Step breakdown
194
+ | Journey | Step | PRD ref | Step result | Evidence | Notes |
195
+ | --- | --- | --- | --- | --- | --- |
196
+
197
+ ### Findings
198
+ - [HIGH/MEDIUM/LOW] Title
199
+ - PRD ref:
200
+ - Expected / Actual / Repro / Evidence / Suggested fix:
201
+
202
+ ### Coverage gaps
203
+ - Scope not in journeys or not planned for real-browser run, with reason
204
+
205
+ ### Recommendation
206
+ - Merge/publish/fix-first guidance (based on guide plus known real results; if not yet run, say so)
207
+ ```
208
+
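The verdict semantics stated in the skeleton's comment can be sketched as a cell-level check. This is an illustrative sketch, not part of the package: `check_verdict` and its `authorized` / `has_evidence` flags are hypothetical names for the two preconditions the comment describes.

```python
ALLOWED_VERDICTS = {"PASS", "PARTIAL_PASS", "FAIL"}
PENDING = {"", "pending real-browser run"}


def check_verdict(value, authorized, has_evidence):
    """Return None if a result cell is acceptable, else a short reason string."""
    cell = value.strip()
    if cell in PENDING:
        # Blank or pending is always allowed before a real run.
        return None
    if cell not in ALLOWED_VERDICTS:
        return f"unknown verdict tier: {cell!r}"
    if not (authorized and has_evidence):
        return "verdict recorded without authorization and browser Evidence"
    return None


print(check_verdict("PASS", authorized=True, has_evidence=True))        # None
print(check_verdict("MOSTLY_OK", authorized=True, has_evidence=True))   # unknown verdict tier: 'MOSTLY_OK'
print(check_verdict("PASS", authorized=False, has_evidence=True))       # verdict recorded without authorization and browser Evidence
```

The check deliberately rejects a valid tier when either precondition is missing, which is the "fake PASS" case the parent session is asked to spot-check.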
209
+ ---
210
+
211
+ ## Snippet templates (trim into Journeys)
212
+
213
+ - **Auth**: Visitor hits protected page → success/failure/empty fields/session expiry messaging.
214
+ - **Forms**: Required fields and validation, success feedback, failure without losing filled data.
215
+ - **Lists**: Empty/loading/populated, filter/sort/pagination, path back from no results.
216
+ - **Navigation**: Primary nav, back, deep links, critical actions not obscured.
217
+
218
+ ---
219
+
220
+ ## Quality bar
221
+
222
+ Readers can walk all in-scope affordances **without reading code**; each item maps to PRD or acceptance; **Surface and Journey must not contradict each other**.
223
+
224
+ ---
225
+
226
+ ## Handoff checklist (orchestration / browser backfill / merge)
227
+
228
+ - [ ] Per **`/forge` §3.7**, decided whether to persist `wave-{N}-e2e.md` (or workflow offline equivalent).
229
+ - [ ] Scope, Blockers, side effects, and **Coverage gaps** are honest; “pseudo-PRD” noted when no PRD.
230
+ - [ ] Required output skeleton **has full table headers**; every Step carries a `PRD ref` (**no orphan steps**).
231
+ - [ ] **Spec contract**: no PASS/PARTIAL_PASS/FAIL without real run; parent spot-checks for “fake PASS.”
232
+ - [ ] Optional sub-drafts deduped into master tables; **parent session writes once**.
233
+ - [ ] No emoji anywhere in the document.
234
+
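The "full skeleton" item in the checklist above can be sketched as a heading check against the section names in the Required output template. This is a sketch under assumptions: `missing_sections` is a hypothetical helper, and it only matches headings written exactly as in the skeleton.

```python
# Section headings copied from the Required output skeleton.
REQUIRED_SECTIONS = [
    "## E2E Verification",
    "### Scope",
    "### PRD traceability (RTM)",
    "### Surface coverage",
    "### Journeys (journey level)",
    "### Step breakdown",
    "### Findings",
    "### Coverage gaps",
    "### Recommendation",
]


def missing_sections(report):
    """Return required headings that do not appear as a line in the report."""
    present = {line.strip() for line in report.splitlines()}
    return [heading for heading in REQUIRED_SECTIONS if heading not in present]


# Hypothetical draft that dropped several sections.
draft = """## E2E Verification

### Scope
- Target: staging

### PRD traceability (RTM)
### Surface coverage
### Journeys (journey level)
### Findings
"""

print(missing_sections(draft))
# ['### Step breakdown', '### Coverage gaps', '### Recommendation']
```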
235
+ ---
236
+
237
+ <completion_criteria>
238
+ - [ ] **CRITICAL methodological anchors** and **spec contract** (three verdict tiers + PRD traceability + no fake PASS) visibly honored in output
239
+ - [ ] The six **Write the guide the way humans use the product** rules are reflected in Surface / Journey / Step or Coverage gaps
240
+ - [ ] **Hard constraints** not weakened (traceability, coverage, Evidence, side effects, no fabrication)
241
+ - [ ] Required output structure and **canonical headers** kept; extra notes only add columns, never delete them
242
+ - [ ] Verdict columns only **PASS / PARTIAL_PASS / FAIL** or blank / `pending real-browser run`; no verdict without explicit authorization and evidence
243
+ - [ ] Surface table **cross-indexes** Journey/Step without structural conflict
244
+ - [ ] `/forge` §3.7 deliverables (path, close-out A/B, `guide-only`) align with or map to host forge text
245
+ - [ ] No emoji in the full document
246
+ </completion_criteria>