@haaaiawd/anws 2.2.6 → 2.4.0

Files changed (94)
  1. package/README.md +1 -1
  2. package/bin/cli.js +52 -22
  3. package/lib/diff.js +5 -2
  4. package/lib/init.js +217 -96
  5. package/lib/install-state.js +18 -3
  6. package/lib/manifest.js +510 -213
  7. package/lib/prompt.js +68 -0
  8. package/lib/resources/index.js +36 -2
  9. package/lib/update.js +12 -6
  10. package/package.json +48 -47
  11. package/templates/.agents/skills/anws-system/SKILL.md +108 -108
  12. package/templates/.agents/skills/code-reviewer/SKILL.md +170 -103
  13. package/templates/.agents/skills/concept-modeler/SKILL.md +230 -179
  14. package/templates/.agents/skills/craft-authoring/SKILL.md +112 -49
  15. package/templates/.agents/skills/craft-authoring/references/BUNDLE_POLICY.md +61 -0
  16. package/templates/.agents/skills/craft-authoring/references/PROMPT_QUALITY_RUBRIC.md +99 -0
  17. package/templates/.agents/skills/craft-authoring/references/SCORECARD_TEMPLATE.md +64 -0
  18. package/templates/.agents/skills/design-reviewer/SKILL.md +265 -190
  19. package/templates/.agents/skills/e2e-testing-guide/SKILL.md +246 -135
  20. package/templates/.agents/skills/nexus-mapper/SKILL.md +321 -321
  21. package/templates/.agents/skills/output-contract/SKILL.md +37 -0
  22. package/templates/.agents/skills/report-template/SKILL.md +92 -92
  23. package/templates/.agents/skills/sequential-thinking/SKILL.md +222 -225
  24. package/templates/.agents/skills/spec-writer/SKILL.md +75 -30
  25. package/templates/.agents/skills/system-architect/SKILL.md +538 -678
  26. package/templates/.agents/skills/system-designer/SKILL.md +601 -601
  27. package/templates/.agents/skills/task-planner/SKILL.md +1 -2
  28. package/templates/.agents/skills/task-reviewer/SKILL.md +428 -388
  29. package/templates/.agents/skills/tech-evaluator/SKILL.md +252 -144
  30. package/templates/.agents/workflows/blueprint.md +166 -43
  31. package/templates/.agents/workflows/challenge.md +331 -497
  32. package/templates/.agents/workflows/change.md +182 -339
  33. package/templates/.agents/workflows/craft.md +159 -236
  34. package/templates/.agents/workflows/design-system.md +202 -674
  35. package/templates/.agents/workflows/explore.md +187 -399
  36. package/templates/.agents/workflows/forge.md +650 -550
  37. package/templates/.agents/workflows/genesis.md +439 -351
  38. package/templates/.agents/workflows/probe.md +219 -241
  39. package/templates/.agents/workflows/quickstart.md +302 -123
  40. package/templates/.agents/workflows/upgrade.md +145 -182
  41. package/templates_en/.agents/skills/anws-system/SKILL.md +108 -0
  42. package/templates_en/.agents/skills/code-reviewer/SKILL.md +170 -0
  43. package/templates_en/.agents/skills/concept-modeler/SKILL.md +230 -0
  44. package/templates_en/.agents/skills/craft-authoring/SKILL.md +179 -0
  45. package/templates_en/.agents/skills/craft-authoring/references/BUNDLE_POLICY.md +60 -0
  46. package/templates_en/.agents/skills/craft-authoring/references/PROMPT_QUALITY_RUBRIC.md +92 -0
  47. package/templates_en/.agents/skills/craft-authoring/references/SCORECARD_TEMPLATE.md +52 -0
  48. package/templates_en/.agents/skills/design-reviewer/SKILL.md +265 -0
  49. package/templates_en/.agents/skills/e2e-testing-guide/SKILL.md +246 -0
  50. package/templates_en/.agents/skills/nexus-mapper/SKILL.md +306 -0
  51. package/templates_en/.agents/skills/nexus-mapper/references/language-customization.md +167 -0
  52. package/templates_en/.agents/skills/nexus-mapper/references/output-schema.md +311 -0
  53. package/templates_en/.agents/skills/nexus-mapper/references/probe-protocol.md +246 -0
  54. package/templates_en/.agents/skills/nexus-mapper/scripts/extract_ast.py +706 -0
  55. package/templates_en/.agents/skills/nexus-mapper/scripts/git_detective.py +194 -0
  56. package/templates_en/.agents/skills/nexus-mapper/scripts/languages.json +127 -0
  57. package/templates_en/.agents/skills/nexus-mapper/scripts/query_graph.py +556 -0
  58. package/templates_en/.agents/skills/nexus-mapper/scripts/requirements.txt +6 -0
  59. package/templates_en/.agents/skills/nexus-query/SKILL.md +114 -0
  60. package/templates_en/.agents/skills/nexus-query/scripts/extract_ast.py +706 -0
  61. package/templates_en/.agents/skills/nexus-query/scripts/git_detective.py +194 -0
  62. package/templates_en/.agents/skills/nexus-query/scripts/languages.json +127 -0
  63. package/templates_en/.agents/skills/nexus-query/scripts/query_graph.py +556 -0
  64. package/templates_en/.agents/skills/nexus-query/scripts/requirements.txt +6 -0
  65. package/templates_en/.agents/skills/output-contract/SKILL.md +37 -0
  66. package/templates_en/.agents/skills/report-template/SKILL.md +85 -0
  67. package/templates_en/.agents/skills/report-template/references/REPORT_TEMPLATE.md +100 -0
  68. package/templates_en/.agents/skills/runtime-inspector/SKILL.md +101 -0
  69. package/templates_en/.agents/skills/sequential-thinking/SKILL.md +214 -0
  70. package/templates_en/.agents/skills/spec-writer/SKILL.md +153 -0
  71. package/templates_en/.agents/skills/spec-writer/references/prd_template.md +177 -0
  72. package/templates_en/.agents/skills/system-architect/SKILL.md +538 -0
  73. package/templates_en/.agents/skills/system-architect/references/rfc_template.md +59 -0
  74. package/templates_en/.agents/skills/system-designer/SKILL.md +534 -0
  75. package/templates_en/.agents/skills/system-designer/references/system-design-detail-template.md +187 -0
  76. package/templates_en/.agents/skills/system-designer/references/system-design-template.md +605 -0
  77. package/templates_en/.agents/skills/task-planner/SKILL.md +251 -0
  78. package/templates_en/.agents/skills/task-planner/references/TASK_TEMPLATE_05A.md +109 -0
  79. package/templates_en/.agents/skills/task-planner/references/TASK_TEMPLATE_05B.md +176 -0
  80. package/templates_en/.agents/skills/task-reviewer/SKILL.md +428 -0
  81. package/templates_en/.agents/skills/tech-evaluator/SKILL.md +252 -0
  82. package/templates_en/.agents/skills/tech-evaluator/references/ADR_TEMPLATE.md +78 -0
  83. package/templates_en/.agents/workflows/blueprint.md +200 -0
  84. package/templates_en/.agents/workflows/challenge.md +331 -0
  85. package/templates_en/.agents/workflows/change.md +182 -0
  86. package/templates_en/.agents/workflows/craft.md +159 -0
  87. package/templates_en/.agents/workflows/design-system.md +202 -0
  88. package/templates_en/.agents/workflows/explore.md +187 -0
  89. package/templates_en/.agents/workflows/forge.md +651 -0
  90. package/templates_en/.agents/workflows/genesis.md +439 -0
  91. package/templates_en/.agents/workflows/probe.md +219 -0
  92. package/templates_en/.agents/workflows/quickstart.md +303 -0
  93. package/templates_en/.agents/workflows/upgrade.md +145 -0
  94. package/templates_en/AGENTS.md +149 -0
@@ -0,0 +1,428 @@
1
+ ---
2
+ name: task-reviewer
3
+ description: "[ALPHA] Systematically reviews 05A_TASKS.md and 05B_VERIFICATION_PLAN.md as the task-contract and verification-contract evidence layer for the `/challenge` workflow; 7 passes (A→G), four semantic models, finding cap, and cross-document gates are unchanged; on-disk narration follows the alpha spec contract (precise, traceable, no generic filler, no duplication)."
4
+ ---
5
+
6
+ # task-reviewer (ALPHA)
7
+
8
+ <phase_context>
9
+ You are **TASK-REVIEWER**.
10
+ **Mission**: Using the semantic model, run **Pass A→G** over the tasks and the verification plan, producing a merge-ready structured inventory that shows whether commitments are borne by tasks, whether an executable verification path exists, and whether contracts can be closed with evidence. You supply **evidence slices** for challenge, not a restatement of challenge’s global ruling.
11
+ **Capabilities**: Modeling REQ / US / task mapping / contract; detection of duplication, ambiguity, underspecification, inconsistency, gaps, granularity, and contract coverage; severity attribution; overflow truncation summaries.
12
+ **Constraints**: Does not alter the normative force of shipped `templates/`; ALPHA compresses redundant asides only and must keep the hard constraints below and each Pass’s **checks, severity binding, and gate semantics** verbatim equivalent.
13
+ </phase_context>
14
+
15
+ ---
16
+
17
+ ## CRITICAL Methodology anchor
18
+
19
+ > [!IMPORTANT]
20
+ > Review is not wording nitpicking; it aligns “requirements—tasks—verification—contracts” on one evidence plane.
21
+ >
22
+ > - **Model first, then rules**: Scanning wording before building the four models confuses stylistic noise with execution risk.
23
+ > - **Separate coverage from bearing**: REQ/US coverage (Pass E) versus contract implementation/verification bearing (Pass G) answers different questions; conflating them misses proof or yields false positives.
24
+ > - **Evidence chain closure**: Each finding must point to a **specific REQ/US/T/contract item** or a gap in the model; a finding without an anchor is demoted to “to disprove” or dropped.
25
+ > - **Gates over volume**: Prefer fewer findings over generic items; when overflowing, truncate in severity order with a category summary.
26
+
27
+ ---
28
+
29
+ ## CRITICAL spec output contract
30
+
31
+ > [!IMPORTANT]
32
+ > Report segments from this skill (embedded in `07_CHALLENGE_REPORT.md` or as a standalone attachment) must satisfy simultaneously:
33
+ >
34
+ > - **Precise**: Verifiable statements include `path:line`, section anchors, or model IDs (`REQ-*` / `US-*` / `T*.*.*` / `CONTRACT-*`).
35
+ > - **Traceable**: “Finding / evidence / impact / recommendation” traces back to files read or table lookup steps.
36
+ > - **No duplication**: One fact is not restated differently in summary versus detail; overview tables paste no long verbatim quotes.
37
+ > - **No generic filler**: Prohibited: objectless lines like “needs attention,” “to optimize,” “strengthen as appropriate”; recommendations must name task or doc change type.
38
+
39
+ Challenge alignment bullets: In the **core findings list**, **Finding**, **Impact**, and **Recommendation** are each **one sentence** (short compound allowed); **Location** column uses minimal anchors (e.g. `PRD §…`, `path:line`, `05A §Task`).
40
+
41
+ ---
42
+
43
+ ## Mission objectives
44
+
45
+ 1. **Load documents (required)**: Read `.anws/v{N}/05A_TASKS.md`, `.anws/v{N}/05B_VERIFICATION_PLAN.md`, `01_PRD.md`, `02_ARCHITECTURE_OVERVIEW.md`, all `03_ADR/*.md`, and `04_SYSTEM_DESIGN/*.md` (mandatory when present).
46
+ 2. **Build the semantic model**: Build the four checklist models under § Semantic model construction; every Pass operates on the model.
47
+ 3. **Execute 7 Passes (A→G)**: In order; when input is missing, skip per § Hard constraints with explicit labeling.
48
+ 4. **Severity tiers**: Tag each finding `CRITICAL` / `HIGH` / `MEDIUM` / `LOW`.
49
+ 5. **Produce report**: Emit the task-review report per § Output format.
50
+ 6. **Surface summary**: Give the user a detection summary table and the **first 10** findings.
51
+
52
+ ---
53
+
54
+ ## Hard constraints
55
+
56
+ - **Finding cap**: At most **50** findings. Over cap → sort by severity → truncate → append overflow summary.
57
+ - **Report only, no fixes**: This skill **produces reports only**; fixes belong to the user or other flows.
58
+ - **Cross-document dependence**: Passes **D** and **E** depend on PRD + Architecture. **If missing, skip the corresponding Pass and state so.**
59
+ - **Contract evidence**: Pass **G** by default depends on **`04_SYSTEM_DESIGN/*.md`** (and architecture/ADR definitions of shared contracts). If tasks claim contract bearing but design evidence is missing → report **insufficient evidence / contract definition gap**—**silent pass forbidden**.
60
+ - **Objectivity**: Record only objectively checkable problems; never invent issues to pad the report.
61
+ - **`/challenge` boundary**: You provide evidence at the tasks + verification-contract layer; whether that escalates to a gate in the main report is merged by CHALLENGER.
62
+
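Reviewer note (illustrative, not part of the package): the finding cap above amounts to "sort by severity, truncate, summarize the remainder". A minimal Python sketch follows, assuming each finding is a dict with hypothetical `severity` and `pass` keys; the function name `apply_finding_cap` and the per-Pass grouping of the overflow summary are assumptions, not something the skill prescribes.

```python
from collections import Counter

SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}
FINDING_CAP = 50  # value taken from the hard constraint


def apply_finding_cap(findings, cap=FINDING_CAP):
    """Return (kept, overflow_summary) after sorting by severity and truncating."""
    # sorted() is stable, so findings of equal severity keep their input order.
    ranked = sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])
    kept, dropped = ranked[:cap], ranked[cap:]
    if not dropped:
        return kept, None
    # Hypothetical summary shape: count the omitted findings per Pass.
    by_pass = Counter(f["pass"] for f in dropped)
    categories = ", ".join(f"Pass {p}: {n}" for p, n in sorted(by_pass.items()))
    summary = f"{len(dropped)} additional findings omitted. Primary categories: {categories}"
    return kept, summary
```

Sorting before truncating is what guarantees that a Critical finding is never the one dropped when the cap is exceeded.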
63
+ ---
64
+
65
+ ## Sub-agent orchestration (optional)
66
+
67
+ When the host supports parallel child sessions:
68
+
69
+ | Role | Responsibility |
70
+ |------|----------------|
71
+ | **Parent agent** | Choose `v{N}`, load the full corpus, align `REVIEW_MODE`, merge child results, dedupe with same-severity preference, write to the **single** on-disk path (often the task-reviewer section in `{TARGET_DIR}/07_CHALLENGE_REPORT.md`). |
72
+ | **Child agent** | Consumes bounded slices only—e.g. “build **model 3** only”, “run **Pass B+C** only”, “run **Pass G** only”; returns **completed-Pass summary table + findings draft** (with anchors); does not assume the parent’s private context. |
73
+
74
+ **Single writer**: For any report path, **only one** writer per round; after delivering structured chunks, children stop—they must not revise the parent-merged file.
75
+
76
+ ---
77
+
78
+ ## Handoff checklist (child → parent)
79
+
80
+ - [ ] Declare each Pass as **Executed** / **Skipped** with one-line reason (missing input must list which file categories are missing).
81
+ - [ ] Each finding row includes **ID, severity, Pass, minimal location anchor, one-sentence finding, one-sentence impact, one-sentence recommendation**.
82
+ - [ ] If a child partially built models: **model field conventions** match the parent-merge version (ID prefixes, task numbering format).
83
+ - [ ] No undisclosed conflicting implicit assumptions; if any, isolate under **Needs parent adjudication**.
84
+ - [ ] Child does not write to the same path after parent merge.
85
+
86
+ ---
87
+
88
+ ## Semantic model construction
89
+
90
+ ### What to do
91
+
92
+ Before any Pass, build four internal models and project tasks and verification onto them: **requirements inventory**, **user-story inventory**, **task coverage mapping**, **contract inventory**. **This avoids blind spots from applying verbatim rules directly to raw Markdown.**
93
+
94
+ Schema (fields must be complete; storage may be tables or equivalent):
95
+
96
+ **Model 1 — Requirements inventory**: `REQ-XXX` ← **each** requirement in `01_PRD.md`; includes source section, priority P0|P1|P2, acceptance-criteria list, keyword phrases for weak relevance.
97
+
98
+ **Model 2 — User story inventory**: `US-XXX` ← **each** User Story in `01_PRD.md`; includes user value, involved system IDs, independently testable description, Given–When–Then acceptance scenarios, edge cases.
99
+
100
+ **Model 3 — Task coverage mapping**: For each task in `05A_TASKS.md`: `T{X.Y.Z}`, explicit REQ refs, inferred REQs (aligned to model 1), linked US (via REQ or system overlap), Level‑1 WBS system name, dependency task list, `05B` verification anchor summary, acceptance criteria, contract-bearing list, effort, Sprint.
101
+
102
+ **Model 4 — Contract inventory**: From `02_ARCHITECTURE_OVERVIEW.md`, `03_ADR/*.md`, `04_SYSTEM_DESIGN/*.md`, extract shared contracts `CONTRACT-XXX` (CLI/API/interface/config/format/error semantics/persistence, etc.); includes source, risk tier (foundation rules | cross-system | critical path), implementation-bearing tasks, verification-bearing (including INT ids if present), concerns (boundary / error paths / regression responsibility).
103
+
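Reviewer note (illustrative, not part of the package): the skill allows "tables or equivalent" storage for the models. One possible in-memory shape is sketched below in Python for three of the four models, with a subset of the listed fields. All class names, attribute names, and the `index_by_id` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Requirement:  # Model 1: one row per requirement in 01_PRD.md
    req_id: str
    source_section: str
    priority: str  # "P0", "P1" or "P2"
    acceptance_criteria: List[str] = field(default_factory=list)


@dataclass
class TaskMapping:  # Model 3: one row per task in 05A_TASKS.md
    task_id: str
    explicit_reqs: List[str] = field(default_factory=list)
    depends_on: List[str] = field(default_factory=list)
    contracts: List[str] = field(default_factory=list)
    effort_hours: Optional[float] = None
    sprint: Optional[int] = None


@dataclass
class Contract:  # Model 4: one row per shared contract
    contract_id: str
    source: str
    risk_tier: str  # "foundation rules", "cross-system" or "critical path"
    impl_tasks: List[str] = field(default_factory=list)
    verify_bearers: List[str] = field(default_factory=list)


def index_by_id(rows, id_attr) -> Dict[str, object]:
    """Build an id -> row lookup and reject duplicate ids."""
    index = {}
    for row in rows:
        key = getattr(row, id_attr)
        if key in index:
            raise ValueError(f"duplicate id: {key}")
        index[key] = row
    return index
```

An id-keyed index is one way to meet the acceptance bar of being able to enumerate the models and quickly locate any id.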
104
+ ### Why
105
+
106
+ Motto: **Without one truth layer, you only get string astrology.**
107
+ Bar: A **good model** makes Pass “iteration objects” enumerable; bad models fuzz-match on originals and inflate false positives.
108
+
109
+ ### How to accept
110
+
111
+ - You can enumerate current `REQ-*` / `US-*` / `T*` / `CONTRACT-*` scale and quickly locate any id.
112
+ - Every reference made in Pass A–G resolves to one of the four models; if it does not, treat it as a model gap and state it in the report lead.
113
+
114
+ ---
115
+
116
+ ## Pass execution (7 Pass A→G)
117
+
118
+ ### What to do
119
+
120
+ **Sequentially** run the passes below on the built model. **Pass D and Pass E**: if PRD + Architecture **unavailable**, **skip the entire Pass** and note why in summary. Pass **G** must consume **contract inventory**; if tasks involve shared contracts but design docs are missing, report **insufficient evidence / contract definition gap** (see Hard constraints).
121
+
122
+ ### Why
123
+
124
+ Motto: **Each Pass only sees the distortion type it should.**
125
+ Bar: Silent skip on missing input ≠ good execution; always record skip reason to avoid false negatives.
126
+
127
+ ### How to accept
128
+
129
+ - Summary table has a count or `—` / `SKIPPED` plus reason **for each pass A–G**.
130
+ - Never confuse **Skipped** with **zero issues**; the user sees **clean** versus **not run** at a glance.
131
+
132
+ ---
133
+
134
+ ### Pass A: Duplication detection
135
+
136
+ **Goal**: Find redundant tasks that waste effort or cause confusion.
137
+
138
+ | # | Check | How |
139
+ |---|-------|-----|
140
+ | A1 | **Near-duplicate tasks** | Compare title+description semantics; mark pairs with >70% intent overlap. |
141
+ | A2 | **Shared acceptance criteria** | Same Given–When–Then reused verbatim or paraphrased across tasks. |
142
+ | A3 | **Output overlap** | Two tasks deliver the same file/component/API. |
143
+
144
+ **Recommendation**: Merge duplicates or mark as shared acceptance where multiple tasks are intentionally retained.
145
+
146
+ ---
147
+
148
+ ### Pass B: Ambiguity detection
149
+
150
+ **Goal**: Remove vague wording that makes tasks **not verifiable**.
151
+
152
+ | # | Check | How |
153
+ |---|-------|-----|
154
+ | B1 | **Fuzzy adjectives** | Flag vague qualifiers in acceptance criteria (including *correct/normal/reasonable/fast/stable/secure/intuitive/robust* and synonyms such as *appropriate/proper*). |
155
+ | B2 | **Unresolved placeholders** | Flag: `TODO`, `TBD`, `???`, `<placeholder>`, `[TBD]`, `FIXME`. |
156
+ | B3 | **Unquantified NFRs** | Performance/security with no concrete targets (e.g. only “fast response” without a latency SLA). |
157
+ | B4 | **Vague pronouns** | “It/this/the system” ambiguous in descriptions. |
158
+
159
+ **Severity**: B1/B3 on **P0** tasks → **HIGH**; on **P2** → **MEDIUM**. B2 **always → HIGH**.
160
+
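Reviewer note (illustrative, not part of the package): check B2 is the most mechanical check in Pass B, so it is sketched here in Python. The sketch assumes acceptance criteria are available as a list of strings per task; the function name, the finding fields, and the literal, case-sensitive matching are assumptions.

```python
import re

# Markers listed under check B2; "[TBD]" is caught by the plain "TBD" alternative.
PLACEHOLDER_RE = re.compile(r"TODO|TBD|\?\?\?|<placeholder>|FIXME")


def find_placeholders(task_id, acceptance_lines):
    """Yield one HIGH finding per acceptance line that still contains a marker."""
    for line_no, text in enumerate(acceptance_lines, start=1):
        match = PLACEHOLDER_RE.search(text)
        if match:
            yield {
                "pass": "B",
                "check": "B2",
                "severity": "HIGH",  # B2 is always HIGH per the severity rule
                "location": f"{task_id} acceptance line {line_no}",
                "marker": match.group(0),
            }
```

Literal matching will also hit words that merely contain a marker, so a real pass would likely confirm each hit against the surrounding text before reporting it.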
161
+ ---
162
+
163
+ ### Pass C: Underspecification
164
+
165
+ **Goal**: Find tasks with insufficient information to execute.
166
+
167
+ | # | Check | How |
168
+ |---|-------|-----|
169
+ | C1 | **Verb without object** | Acceptance has an action verb but no concrete target (e.g. “handle errors”—which errors, which boundary). |
170
+ | C2 | **Missing acceptance criteria** | Acceptance empty or only one vague line. |
171
+ | C3 | **Ghost references** | Task references Architecture components/interfaces/APIs that do not exist. |
172
+ | C4 | **Missing inputs/outputs** | No explicit input or output fields. |
173
+ | C5 | **Missing verification narrative** | No statement of how completion is verified. |
174
+ | C6 | **Missing verification type** | Type not stated (unit/integration/E2E/smoke/regression/manual/compile/lint, etc.). |
175
+
176
+ **Severity**: C2 on **P0** → **CRITICAL**. C3 **always → HIGH**. C6 on **P0** → **HIGH**.
177
+
178
+ ---
179
+
180
+ ### Pass D: Inconsistency — cross-document cross-check
181
+
182
+ > **Depends on PRD + Architecture. If unavailable, skip and note.**
183
+
184
+ **Goal**: Catch contradictions among PRD, Architecture, ADR, and Tasks.
185
+
186
+ | # | Check | How |
187
+ |---|-------|-----|
188
+ | D1 | **Term drift** | Same concept named differently across PRD vs Architecture vs Tasks. |
189
+ | D2 | **Orphan architecture components** | Systems/components in Architecture never covered by Tasks. |
190
+ | D3 | **Dependency vs scheduling conflict** | Task A depends on B but A is planned in an earlier Sprint than B. |
191
+ | D4 | **Stack conflict** | ADR chooses tech X while task names tech Y. |
192
+ | D5 | **Interface mismatch** | Upstream output format ≠ downstream expected input along the dependency chain. |
193
+
194
+ **Severity**: D3 **always → CRITICAL** (execution will fail). D2 → **HIGH**. D1 → **MEDIUM**.
195
+
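Reviewer note (illustrative, not part of the package): check D3 can be read as a comparison of Sprint numbers along each dependency edge. The Python sketch below assumes tasks are held in a dict keyed by task id with hypothetical `sprint` and `depends_on` fields; dependencies on unknown task ids are skipped because they belong to a different check.

```python
def find_schedule_conflicts(tasks):
    """tasks maps task id -> {"sprint": int, "depends_on": [task ids]}.

    Returns (task, dependency, severity) for each task that is scheduled
    in an earlier Sprint than something it depends on.
    """
    conflicts = []
    for task_id, task in sorted(tasks.items()):
        for dep_id in task["depends_on"]:
            dep = tasks.get(dep_id)
            if dep is None:
                continue  # unknown dependency: not a D3 problem
            if task["sprint"] < dep["sprint"]:
                conflicts.append((task_id, dep_id, "CRITICAL"))
    return conflicts
```

A dependency inside the same Sprint is not flagged here, since the check as written only covers a dependent task planned in an earlier Sprint.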
196
+ ---
197
+
198
+ ### Pass E: Coverage gaps
199
+
200
+ **Goal**: Ensure no meaningful omissions.
201
+
202
+ | # | Check | How |
203
+ |---|-------|-----|
204
+ | E1 | **Forward coverage** | Check that every PRD `REQ-XXX` has ≥1 task; emit a REQ coverage matrix. |
205
+ | E2 | **Reverse coverage (ghost tasks)** | Check that each task traces to a REQ; an untraceable task is a “ghost task” (suspected scope creep). |
206
+ | E3 | **User story completeness** | Each `US-XXX` chain covers involved systems with an independently verifiable loop. |
207
+ | E4 | **NFR coverage** | Performance, security, a11y, etc., have explicit tasks or are clearly absorbed elsewhere. |
208
+ | E5 | **Boundary/error coverage** | PRD boundary/error scenarios have downstream test/handling tasks. |
209
+
210
+ **Outputs**: REQ coverage matrix and US completeness table (see § Output format).
211
+
212
+ **Severity**: E1 **missing P0 REQ** → **CRITICAL**. E2 **ghost tasks** → **LOW (informational)**. E3 **incomplete US** → **HIGH**.
213
+
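Reviewer note (illustrative, not part of the package): checks E1 and E2 are two directions of the same set comparison. The Python sketch below assumes a REQ-to-priority dict and a task-to-REQ-set dict; the function name and the result keys are hypothetical. Only the severities stated above are bound (missing P0 REQ and ghost tasks); uncovered non-P0 requirements are listed without a severity because the text does not assign one.

```python
def coverage_report(requirements, task_refs):
    """requirements: REQ id -> priority; task_refs: task id -> set of REQ ids."""
    known = set(requirements)
    covered = {req for refs in task_refs.values() for req in refs if req in known}
    uncovered = sorted(known - covered)
    # A ghost task references no known requirement at all.
    ghosts = sorted(task for task, refs in task_refs.items() if not (refs & known))
    findings = [(req, "E1", "CRITICAL") for req in uncovered if requirements[req] == "P0"]
    findings += [(task, "E2", "LOW") for task in ghosts]
    total = len(known)
    pct = 100 * len(covered) // total if total else 0
    return {
        "coverage": f"{len(covered)}/{total} ({pct}%)",
        "uncovered": uncovered,
        "ghost_tasks": ghosts,
        "findings": findings,
    }
```

The `coverage` string follows the `{covered}/{total} ({pct}%)` shape used in the report template, with the percentage rounded down.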
214
+ ---
215
+
216
+ ### Pass F: Quality and granularity
217
+
218
+ **Goal**: Assess whether task sizing and structure are reasonable.
219
+
220
+ | # | Check | How |
221
+ |---|-------|-----|
222
+ | F1 | **Too large** | Estimated effort **> 8h** → recommend split (record threshold on finding). |
223
+ | F2 | **Too small** | Estimated effort **< 1h** → recommend merge. |
224
+ | F3 | **Deep dependency chains** | Length **> 5** ⇒ bottleneck risk. |
225
+ | F4 | **Isolated tasks** | No dependents and no dependencies—confirm intent vs oversight. |
226
+ | F5 | **Critical path scan** | Identify longest dependency chain and bottleneck tasks. |
227
+ | F6 | **Acceptance quality** | Prefer Given–When–Then; pure-tech foundation tasks may use clear Done-When plus executable verification. |
228
+ | F7 | **Sprint balance** | If a Sprint’s workload deviates from the mean Sprint workload by **> 50%** ⇒ imbalance warning. |
229
+
230
+ **Severity**: F1 **> 16h** → **HIGH**. F3 **chain > 7** → **HIGH**. F5 **informational only** → **LOW**.
231
+
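Reviewer note (illustrative, not part of the package): checks F3 and F5 both need the longest dependency chain. The Python sketch below assumes an acyclic dependency map and measures chain length as the number of tasks in the chain, which is an interpretation the text does not spell out. Function names are hypothetical.

```python
from functools import lru_cache


def longest_chain(depends_on):
    """Return the task count of the longest dependency chain (graph assumed acyclic)."""

    @lru_cache(maxsize=None)
    def depth(task_id):
        # A task with no prerequisites forms a chain of length 1.
        deps = depends_on.get(task_id, [])
        return 1 + max((depth(dep) for dep in deps), default=0)

    return max((depth(task_id) for task_id in depends_on), default=0)


def classify_chain(length):
    """Return (flagged, severity); severity is None where the text binds none."""
    if length > 7:
        return True, "HIGH"
    return length > 5, None
```

Chains of length 6 or 7 are flagged as a bottleneck risk but left without a severity, since only chains longer than 7 are bound to HIGH.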
232
+ ---
233
+
234
+ ### Pass G: Contract coverage
235
+
236
+ **Goal**: Ensure shared contracts and foundational unit-test responsibilities have no gaps and are **aligned with design evidence**.
237
+
238
+ | # | Check | How |
239
+ |---|-------|-----|
240
+ | G1 | **Shared contract lacks implementation bearer** | Contract in inventory has no task implementation bearer. |
241
+ | G2 | **Shared contract lacks verification bearer** | Has implementation but no verification type/description/`05B`/`INT` bearer. |
242
+ | G3 | **High-risk contracts missing negative-path verification** | API/CLI/config/format contracts missing failure/boundary verification responsibility. |
243
+ | G4 | **Foundational logic lacks unit-test bearer** | registry/manifest/parser/schema/diff/merge/normalizer/planner-style logic lacks unit-test assignment. |
244
+ | G5 | **Contract vs verification-type mismatch** | Obvious shared contract paired only with vague manual checks or materially insufficient tier. |
245
+ | G6 | **Missing regression duty** | Change impacts critical contract but lacks minimal regression verification task. |
246
+
247
+ **Severity**: G1 **for P0 or core contracts** → **CRITICAL**. G2/G3/G6 → **HIGH**. G4 **shared foundations missing UT** → **HIGH**. G5 → **MEDIUM**.
248
+
249
+ > [!IMPORTANT]
250
+ > **If a task declares contract bearing but `04_SYSTEM_DESIGN/*.md` / ADR / Architecture lack a matching contract source, report design evidence gaps first—not a default endorsement of task wording.**
251
+
252
+ ---
253
+
254
+ ## Severity and reporting
255
+
256
+ ### What to do
257
+
258
+ Tag each finding with severity and produce the full **task review report** per § Output format; over-cap items go into an **overflow summary**. User-facing brief must include **summary table + top 10** findings.
259
+
260
+ ### Why
261
+
262
+ Motto: **Severity is the tool that takes prioritization off rhetoric.**
263
+ Bar: **Good reporting** lets CHALLENGER align gates at a glance; bad reporting gives every finding the same weight, so readers cannot triage.
264
+
265
+ ### How to accept
266
+
267
+ - Each finding **traces** to a Pass + model element; Critical/High include evidence sublists in detail.
268
+ - **Health**: matches the table below and agrees with summary counts.
269
+
270
+ ---
271
+
272
+ ### Output format: Task review report
273
+
274
+ ```markdown
275
+ ## Task Review Report
276
+
277
+ > **Reviewed files**: .anws/v{N}/05A_TASKS.md + .anws/v{N}/05B_VERIFICATION_PLAN.md
278
+ > **Reference docs**: 01_PRD.md, 02_ARCHITECTURE_OVERVIEW.md, 03_ADR/*, 04_SYSTEM_DESIGN/*
279
+ > **Date**: {YYYY-MM-DD}
280
+
281
+ ---
282
+
283
+ ### Detection summary
284
+
285
+ | Pass | Checks run | CRITICAL | HIGH | MEDIUM | LOW |
286
+ |------|:-------:|:--------:|:----:|:------:|:---:|
287
+ | A Duplication | — | — | — | — | — |
288
+ | B Ambiguity | — | — | — | — | — |
289
+ | C Underspecification | — | — | — | — | — |
290
+ | D Inconsistency | — | — | — | — | — |
291
+ | E Coverage | — | — | — | — | — |
292
+ | F Quality/granularity | — | — | — | — | — |
293
+ | G Contract coverage | — | — | — | — | — |
294
+ | **Total** | **—** | **—** | **—** | **—** | **—** |
295
+
296
+ **Overall health**: Healthy / Needs attention / Blocked
297
+
298
+ **High-signal takeaway**: [1–3 sentences; only issues feeding the challenge main narrative]
299
+
300
+ ---
301
+
302
+ ### REQ coverage
303
+
304
+ | REQ-ID | Title | Priority | Related tasks | Status |
305
+ |--------|------|:------:|---------|:----:|
306
+
307
+ **Coverage**: {covered}/{total} ({pct}%)
308
+
309
+ ---
310
+
311
+ ### User Story completeness
312
+
313
+ | US-ID | Title | Systems involved | Related tasks | Independently testable | Status |
314
+ |-------|------|---------|---------|:--------:|:----:|
315
+
316
+ ---
317
+
318
+ ### Terminology consistency
319
+
320
+ | Term | In PRD | In Architecture | In Tasks | Status |
321
+ |------|--------|----------------|---------|:----:|
322
+
323
+ ---
324
+
325
+ ### Contract coverage
326
+
327
+ | Contract | Type | Impl bearer | Verify bearer | Status |
328
+ |------|------|---------|---------|:----:|
329
+
330
+ **Design evidence source**: Read / Not read `04_SYSTEM_DESIGN/*`
331
+
332
+ ---
333
+
334
+ ### Critical path
335
+
336
+ > Longest dependency chain and bottleneck highlights (optional Mermaid).
337
+
338
+ ---
339
+
340
+ ### Core findings list
341
+
342
+ | ID | Severity | Pass | Location | Finding | Impact | Recommendation |
343
+ |----|--------|------|------|------|------|------|
344
+
345
+ ---
346
+
347
+ ### Top findings detail (Critical / High only)
348
+
349
+ #### TR-01 [Title]
350
+
351
+ **Pass**:
352
+ **Severity**:
353
+ **Location**:
354
+
355
+ **Evidence**:
356
+ - Requirement source:
357
+ - Task mapping:
358
+ - Cross-check:
359
+
360
+ **Impact**:
361
+ **Recommendation**:
362
+
363
+ ---
364
+
365
+ ### Overflow summary (when findings > 50)
366
+
367
+ {N} additional findings omitted. Primary categories: …
368
+ ```
369
+
370
+ ---
371
+
372
+ ### Severity tiers
373
+
374
+ | Tier | Decision rule | Required action |
375
+ |:----:|---------|---------|
376
+ | **Critical** | Fundamental contradiction or cannot proceed; if not gated, rework or failure is inevitable | **P0** — must fix before blueprint / forge |
377
+ | **High** | High probability of rework or failed acceptance | **P1** — fix before forge |
378
+ | **Medium** | Risk with workable mitigations | **P2** — fix during implementation |
379
+ | **Low** | Polish or minor deltas that don’t gate | **P3** — backlog |
380
+
381
+ **Health rules**: Critical ≥ **1** ⇒ **Blocked**. High ≥ **5** ⇒ **Needs attention**. Else ⇒ **Healthy**.
382
+
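Reviewer note (illustrative, not part of the package): the health rules above are an ordered threshold check. A minimal Python sketch, assuming per-severity counts arrive as a dict with upper-case keys (the function name and key names are assumptions):

```python
def overall_health(counts):
    """counts: severity name -> number of findings."""
    if counts.get("CRITICAL", 0) >= 1:
        return "Blocked"
    if counts.get("HIGH", 0) >= 5:
        return "Needs attention"
    return "Healthy"
```

The Critical rule is evaluated first, so a single Critical finding yields Blocked regardless of how many High findings exist.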
383
+ > [!NOTE]
384
+ > Prefer **Critical / High** first; Medium / Low only when they change execution decisions or materially improve stability.
385
+
386
+ ---
387
+
388
+ ## Pre-handoff QA
389
+
390
+ ### What to do
391
+
392
+ Before handing off to the parent `/challenge` or the user, run this checklist in order: **model completeness → Pass semantics → overflow/trim → spec contract → single-writer clashes**.
393
+
394
+ ### Why
395
+
396
+ Motto: **A last-second formatting glitch vaporizes evidence trust.**
397
+ Bar: **Good handoff** merges context-free across sessions; bad handoff leaves stray MD shards.
398
+
399
+ ### How to accept
400
+
401
+ - Detection-summary counts and core-findings counts reconcile (per-severity sums match).
402
+ - No placeholder phrases violating **CRITICAL spec output contract**; Top section not padded with lows.
403
+ - If child agents were used: **Handoff checklist** is complete; the parent’s **dedupe** leaves continuous IDs or an ID-map appendix.
404
+ - `04_SYSTEM_DESIGN` read state matches Pass G conclusions; never claim Pass G completeness when the design docs were not read.
405
+
406
+ ---
407
+
408
+ ## completion_criteria
409
+
410
+ `<completion_criteria>`
411
+ **This skill round may be marked complete iff:**
412
+
413
+ 1. All **existing** paths listed under § Mission objectives are read; unread items appear as **SKIPPED/insufficient evidence** in summaries.
414
+ 2. Four semantic models built; passes **A→G** closed as either executed or skipped+reason.
415
+ 3. Task review report per § Output format includes: **detection summary, REQ coverage, US completeness, terminology consistency, contract coverage, critical path, core findings list**; Critical/High have Top detail.
416
+ 4. Findings **≤50** or truncated with **overflow summary**.
417
+ 5. User sees **summary table + top 10** findings.
418
+ 6. If child agent: **Handoff checklist** satisfied and parent declares merge complete.
419
+ `</completion_criteria>`
420
+
421
+ ---
422
+
423
+ ## Review notes (non-normative execution hints)
424
+
425
+ 1. Semantically clear but slightly colloquial → at most **LOW**.
426
+ 2. Words like “fast” differ between realtime loops vs batch; decide domain before B1.
427
+ 3. Use Architecture system boundaries to bound task scope; ADR-recorded trades are not reopened—only whether **tasks violate ADRs**.
428
+ 4. **Incremental value**: A few high-severity findings already deliver the value; exhaustive coverage is not the KPI.
@@ -0,0 +1,252 @@
1
+ ---
2
+ name: tech-evaluator
3
+ description: '[ALPHA] Serves `/genesis` Step 3 "Technical selection": evaluates candidate stacks with ATAM and a 12-dimension weighted matrix, producing traceable comparison conclusions and ADR promotion material; does not author formal ADR files (persisted in Step 5). Same lineage as the same-named skill in shipped `templates/`; prefer references at the same relative path inside this bundle.'
4
+ ---
5
+
6
+ # Technical Evaluator Handbook — [ALPHA] Genesis Step 3
7
+
8
+ > "There is no best technology stack—only the most suitable stack." — ThoughtWorks Technology Radar
9
+
10
+ This skill is grounded in **SEI ATAM (Architecture Tradeoff Analysis Method)** and **weighted decision matrices**. In **`templates_alpha/`** / **`templates_alpha_en/`** ALPHA workflows it binds to **`/genesis`** as **Step 3**; formal **ADR authoring** and numbering are governed by **Step 5** and `genesis.md`.
11
+
12
+ ---
13
+
14
+ ## CRITICAL /genesis gates (difference from shipped skill)
15
+
16
+ > [!IMPORTANT]
17
+ >
18
+ > - **`/genesis` Step 3**: **Only** output evaluation results and Markdown comparison material; **must not** in this step create or modify any ADR file under `.anws/v{N}/03_ADR/`. Rationale is in `genesis.md` Step 3 / Step 5: ADRs are formal decision records and must land after full review in Step 5.
19
+ > - **Step 5 persistence targets** (for downstream reference, **not** Step 3 work): elevate the Step 3 comparison table into `.anws/v{N}/03_ADR/ADR_001_TECH_STACK.md` (and sibling ADRs); **section structure is authoritative in `references/ADR_TEMPLATE.md` only**.
20
+ > - If the host session declares it is **not** `/genesis` or explicitly authorizes "write ADRs in this step," follow **that workflow** on the spot; by default Step 3 still does not write ADRs.
21
+
22
+ > [!NOTE]
23
+ > **ADR timing**: Step 3 outputs evaluation + comparison material only—no `03_ADR/`; Step 4 yields boundaries + `02_*`; Step 5 promotes ADRs so impact aligns with real system IDs. Full table and four-point rationale: **`genesis.md`** Step 3 NOTE. Non-`/genesis` or explicit “write ADR in this step” overrides defaults.
24
+
25
+ ---
26
+
27
+ ## CRITICAL spec delivery contract (Step 3 artifacts)
28
+
29
+ > [!IMPORTANT]
30
+ >
31
+ > - **Verifiable**: Constraint blocks must cover functional requirements, non-functional requirements, team, budget, and special constraints; for missing items write "Not provided—evaluation assumes H-…"; silent omission is not allowed.
32
+ > - **Countable**: Each candidate stack must have a 12-dimension score table (1–5) or per-dimension "N/A + reason"; totals without a dimension breakdown are not allowed.
33
+ > - **Deducible**: ATAM paragraphs must include at least one **quality-attribute scenario**, several **trade-off points**, and several **risk points**; stacking adjectives is not allowed in place of scenarios.
34
+ > - **Promotable**: The final comparison table must **map cleanly** onto **`references/ADR_TEMPLATE.md`** sections and required fields for paste and polish in Step 5.
35
+ > - **Explicit verification strategy**: Must answer or mark open: emphasis for unit / integration / E2E tests, smoke / regression gates, and which layer owns quality gates—PR / INT / staging / release (consistent with Step 3 in `genesis.md`).
36
+ > - **Single source of truth**: Numbers and conclusions are anchored to Step 3 output; Step 5 only edits and changes status—it must not re-score backward without new evidence.
37
+
38
+ ---
39
+
40
+ ## Mandatory deep thinking
41
+
42
+ > [!IMPORTANT]
43
+ > Before evaluating you **must** invoke the `sequential-thinking` skill and organize **3–7 thoughts** by complexity—for example:
44
+ >
45
+ > 1. What are the user's core scenarios and must-support use-case boundaries?
46
+ > 2. Team familiarity and acceptable learning cost?
47
+ > 3. Budget and sensitivity to cloud / license TCO?
48
+ > 4. Expected scale and concurrency / data volume?
49
+ > 5. Do compliance regimes (GDPR, China's classified protection of cybersecurity (MLPS), etc.) veto certain stacks?
50
+
51
+ ---
52
+
53
+ ## Goals (Step 3)
54
+
55
+ Without **writing ADR files**, produce:
56
+
57
+ 1. Structured **constraint summary** and **candidate stack list**.
58
+ 2. **12-dimension scoring matrix** and weighted aggregate explanation (declare weights or use the suggestions here and justify).
59
+ 3. Short **ATAM trade-offs and risks** narrative.
60
+ 4. A **Markdown master comparison table** of candidates usable directly in Step 5.
61
+
62
+ ---
63
+
64
+ ## Evaluation flow
65
+
66
+ ### Step 1: Gather constraints
67
+
68
+ **Must obtain from the user or loaded artifacts** (label assumptions per spec when incomplete):
69
+
70
+ - **Functional requirements**: core capability list (may cite `01_PRD.md`).
71
+ - **Non-functional requirements**: performance, availability, security level.
72
+ - **Team**: headcount, skills, appetite to learn.
73
+ - **Budget**: dev, ops, time.
74
+ - **Special constraints**: compliance, legacy integration, mandated technologies.
75
+ - (If Step 2.5 ran) Evidence and alternatives from `/explore` research.
76
+
77
+ #### What to do
78
+
79
+ Fix input boundaries and list gaps with hypothesis IDs.
80
+
81
+ #### Why
82
+
83
+ Avoid preference scoring without constraints and unreproducible conclusions.
84
+
85
+ #### How to validate
86
+
87
+ Outputs may include a "hypothesis H-xx" cross-reference table; no silent gaps.
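
The gap-labelling rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the package: the block names, the `fill_gaps` helper, and the `H-01`-style numbering are all assumptions chosen to mirror the "Not provided, evaluation assumes H-…" wording of the delivery contract.

```python
# Hypothetical sketch: block names and the H-xx numbering are illustrative only.
REQUIRED_BLOCKS = ["functional", "non_functional", "team", "budget", "special"]


def fill_gaps(constraints):
    """Return (completed constraints, hypothesis table) with no silent gaps."""
    completed, hypotheses = {}, []
    for block in REQUIRED_BLOCKS:
        value = constraints.get(block)
        if value:
            completed[block] = value
        else:
            # Every missing block gets an explicit hypothesis ID instead of being dropped.
            hid = f"H-{len(hypotheses) + 1:02d}"
            completed[block] = f"Not provided; evaluation assumes {hid}"
            hypotheses.append((hid, block))
    return completed, hypotheses


completed, hypotheses = fill_gaps({
    "functional": "core capability list from 01_PRD.md",
    "team": "4 engineers, TypeScript background",
})
for hid, block in hypotheses:
    print(hid, block)
```

The returned `hypotheses` list is what a cross-reference table would be rendered from; the content of each assumption still has to be written by the evaluator.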
88
+
89
+ ---
90
+
91
+ ### Step 2: Identify candidates
92
+
93
+ **Mainstream stack reference** (replace or extend per project):
94
+
95
+
96
+ | Scenario | Recommended stack | Alternatives |
97
+ | -------- | ----------------- | ------------ |
98
+ | **Web full-stack** | Next.js + TypeScript | Nuxt, SvelteKit |
99
+ | **Backend API** | Go / Rust / Node.js | Python FastAPI, Java Spring |
100
+ | **Desktop** | Tauri (Rust + Web) | Electron, Flutter Desktop |
101
+ | **Mobile** | React Native / Flutter | Swift/Kotlin native |
102
+ | **AI/ML** | Python + PyTorch/TensorFlow | Rust (Candle), Julia |
103
+ | **Data-heavy** | PostgreSQL + TimescaleDB | ClickHouse, DuckDB |
104
+
105
+
106
+ #### What to do
107
+
108
+ List **two or more named** candidates (language / framework / key middleware) with one line stating scope.
109
+
110
+ #### Why
111
+
112
+ A single candidate yields no trade-off and cannot satisfy ATAM.
113
+
114
+ #### How to validate
115
+
116
+ Each candidate is scored independently; no anonymous "option A/B."
117
+
118
+ ---
119
+
120
+ ### Step 3: 12-dimension evaluation
121
+
122
+ Score each candidate 1–5 per dimension:
123
+
124
+
125
+ | Dimension | Weight hint | Evaluation question |
126
+ | --------- | ----------- | ------------------- |
127
+ | **Requirement fit** | — | Covers all core functionality? |
128
+ | **Scalability** | — | Supports 10x growth? |
129
+ | **Performance** | — | Meets latency / throughput? |
130
+ | **Security** | — | Built-in security and compliance support? |
131
+ | **Team skill** | — | Familiarity and learning curve? |
132
+ | **Talent market** | — | Hiring and outsource availability? |
133
+ | **Development speed** | — | Iteration and delivery velocity? |
134
+ | **TCO** | — | Dev + ops + licensing? |
135
+ | **Community/ecosystem** | — | Libraries, tools, troubleshooting assets? |
136
+ | **Long-term maintenance** | — | Shelf life and LTS? |
137
+ | **Integration** | — | Legacy and third-party integration? |
138
+ | **AI readiness** | — | Ease of wiring AI / LLMs? |
139
+
140
+
141
+ #### What to do
142
+
143
+ Complete the matrix; declare the weights (equal or custom) and compute comparable totals or tiers.
144
+
145
+ #### Why
146
+
147
+ Dimensional transparency aids ADR evidence in Step 5.
148
+
149
+ #### How to validate
150
+
151
+ Tables are recalculable in Markdown; N/A dimensions get a single-line rationale.
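
To show what "recalculable" means for the matrix, here is a minimal Python sketch of a weighted total. The dimension names come from the table above; the `weighted_total` helper, the sample scores, the doubled weight on "Team skill", and the choice to renormalize weights when a dimension is N/A are all assumptions for illustration, since the skill leaves the weighting scheme to the evaluator.

```python
DIMENSIONS = [
    "Requirement fit", "Scalability", "Performance", "Security",
    "Team skill", "Talent market", "Development speed", "TCO",
    "Community/ecosystem", "Long-term maintenance", "Integration",
    "AI readiness",
]


def weighted_total(scores, weights):
    """Weighted mean on the 1-5 scale.

    Every dimension must be present in `scores`; use None for N/A.
    N/A dimensions are excluded and the remaining weights renormalized
    (an assumed policy, not one mandated by the skill).
    """
    used = [d for d in DIMENSIONS if scores[d] is not None]  # KeyError on a silent omission
    if not used:
        raise ValueError("all dimensions are N/A")
    for d in used:
        if not 1 <= scores[d] <= 5:
            raise ValueError(f"{d}: score {scores[d]} is outside 1-5")
    total_weight = sum(weights[d] for d in used)
    return sum(scores[d] * weights[d] for d in used) / total_weight


# Hypothetical weights: equal everywhere, with "Team skill" counted double.
weights = {d: 1.0 for d in DIMENSIONS}
weights["Team skill"] = 2.0

# Hypothetical scores for two named candidates.
candidate_a = {d: 4 for d in DIMENSIONS}
candidate_a["Team skill"] = 2
candidate_b = {d: 3 for d in DIMENSIONS}
candidate_b["Team skill"] = 5
candidate_b["AI readiness"] = None  # N/A; the written table still needs a one-line reason

total_a = weighted_total(candidate_a, weights)
total_b = weighted_total(candidate_b, weights)
print(round(total_a, 2), round(total_b, 2))
```

Keeping the per-dimension scores alongside the total is what lets Step 5 re-derive the number instead of trusting it.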
152
+
153
+ ---
154
+
155
+ ### Step 4: Trade-off analysis (ATAM)
156
+
157
+ 1. Identify **quality-attribute scenarios** (e.g., "Under 1k concurrent users, P95 < 200 ms").
158
+ 2. Grade each candidate's **support** for the scenario.
159
+ 3. List **trade-offs** (e.g., performance vs team learning cost).
160
+ 4. List **risks** (e.g., maturity of a new framework).
161
+
162
+ #### What to do
163
+
164
+ State clearly **why the runner-up lost**.
165
+
166
+ #### Why
167
+
168
+ An ADR's value lies in its trade-offs and consequences, not merely in declaring a winner.
169
+
170
+ #### How to validate
171
+
172
+ At least one scenario plus several trade-offs / risks cross-referenced with candidate tables.
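
The validation rule above can be checked mechanically. The following Python sketch is only an illustration: the `AtamNotes` structure, the `atam_gaps` helper, the sample entries, and the threshold of two for "several" are assumptions, because the skill does not fix a number.

```python
from dataclasses import dataclass


@dataclass
class AtamNotes:
    scenarios: list   # quality-attribute scenario strings
    tradeoffs: list   # (candidate name, description) pairs
    risks: list       # (candidate name, description) pairs


def atam_gaps(notes, candidates, minimum=2):
    """List what is missing; `minimum` is an assumed reading of "several"."""
    gaps = []
    if not notes.scenarios:
        gaps.append("no quality-attribute scenario")
    if len(notes.tradeoffs) < minimum:
        gaps.append("too few trade-offs")
    if len(notes.risks) < minimum:
        gaps.append("too few risks")
    # Cross-reference: every entry must name a candidate from the scoring tables.
    for name, _ in notes.tradeoffs + notes.risks:
        if name not in candidates:
            gaps.append(f"unknown candidate: {name}")
    return gaps


candidates = {"Next.js + TypeScript", "SvelteKit"}
notes = AtamNotes(
    scenarios=["Under 1k concurrent users, P95 < 200 ms"],
    tradeoffs=[
        ("Next.js + TypeScript", "larger ecosystem vs heavier runtime"),
        ("SvelteKit", "smaller bundles vs team learning cost"),
    ],
    risks=[("SvelteKit", "smaller talent market")],
)
gaps = atam_gaps(notes, candidates)
print(gaps)
```

An empty list means the ATAM section meets the minimum shape; it says nothing about the quality of the reasoning itself.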
173
+
174
+ ---
175
+
176
+ ### Step 5: Produce ADR promotion material (still `/genesis` Step 3; not persisted)
177
+
178
+ Under **`/genesis` Step 3** you **must not** create or modify `03_ADR/*.md`. Produce full **Markdown** comparison conclusions and promotion material so Step 5 can elevate losslessly against **`references/ADR_TEMPLATE.md`**; sections may read `Proposed` / `TBD`.
179
+
180
+ If, in rare cases, the workflow demands placeholder files in this step, only empty files or MANIFEST paths are allowed; **do not** treat placeholders as accepted ADRs.
181
+
182
+ **Forbidden**: embedding another full ADR exemplar in this SKILL that duplicates **`references/ADR_TEMPLATE.md`**; any question about sections defers to that file.
183
+
184
+ #### What to do
185
+
186
+ Produce full comparison + draft Markdown scoped to `references/ADR_TEMPLATE.md` (in memory / session).
187
+
188
+ #### Why
189
+
190
+ Align with `genesis` decision checkpoints; avoid early informal ADRs.
191
+
192
+ #### How to validate
193
+
194
+ The parent agent can open `references/ADR_TEMPLATE.md` in Step 5 and align sections without losing information.
195
+
196
+ ---
197
+
198
+ ## Practitioner rules
199
+
200
+ 1. **Prefer "boring" tech**: choose mature stacks unless there is strong reason not to.
201
+ 2. **Limited innovation budget**: 1–2 innovation slots per project; keep the rest conservative.
202
+ 3. **Team capability wins**: the best technology has zero value if the team cannot wield it.
203
+ 4. **TCO is not cash alone**: time and cognitive load count.
204
+
205
+ ---
206
+
207
+ ## References and bundle paths
208
+
209
+ This bundle includes **`templates_alpha_en/.agents/skills/tech-evaluator/references/ADR_TEMPLATE.md`** mirrored from shipped **`templates/`**. Read **only** `references/` beside this SKILL; **do not** mix-read shipped `templates/.agents/skills/tech-evaluator/` in the same session.
210
+
211
+ | File | Purpose |
212
+ |------|---------|
213
+ | `references/ADR_TEMPLATE.md` | ADR shape and required sections |
214
+
215
+ ---
216
+
217
+ ## Execution shape and subagent orchestration
218
+
219
+ ### What to do
220
+
221
+ - **Preferred**: When the host provides **AGENTS / subagents**, you may delegate **candidate discovery**, **per-candidate multidimensional drafts**, or **ATAM risk sketches**; the orchestrator issues this doc's **spec contract + gates + ADR template fields**, then merges into **one consolidated draft**.
222
+ - **Parent agent**: `sequential-thinking` **must** run one full chain in the main session or a designated merge agent before locking; subagents cannot replace that duty unless the workflow states otherwise.
223
+ - **Fallback**: Without subagents, the current session runs the whole flow end to end.
224
+
225
+ ### Why
226
+
227
+ Parallel discovery and serialized decisions reduce gaps across dimensions.
228
+
229
+ ### How to validate
230
+
231
+ Merged draft still meets spec; no contradictory scores or duplicate candidate names; `/genesis` Step 3 still performs **no** `03_ADR` writes.
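
When subagents return per-candidate drafts, the "no contradictory scores or duplicate candidate names" check can be sketched as a merge step. This Python example is hypothetical: the draft shape (candidate name mapped to per-dimension scores), the `merge_drafts` helper, and the case-insensitive name matching are assumptions, not behavior defined by the package.

```python
def merge_drafts(drafts):
    """Merge subagent drafts into one table and report conflicting scores.

    Each draft is assumed to be {candidate name: {dimension: score}}.
    Names are normalized so "Tauri" and "tauri " count as one candidate.
    """
    merged, conflicts = {}, []
    for draft in drafts:
        for candidate, scores in draft.items():
            key = candidate.strip().lower()
            slot = merged.setdefault(key, {})
            for dimension, score in scores.items():
                if dimension in slot and slot[dimension] != score:
                    # Keep the first score and surface the disagreement for the orchestrator.
                    conflicts.append((key, dimension, slot[dimension], score))
                else:
                    slot[dimension] = score
    return merged, conflicts


drafts = [
    {"Tauri": {"Performance": 5}},
    {"tauri ": {"Performance": 4, "TCO": 4}},
    {"Electron": {"Performance": 3}},
]
merged, conflicts = merge_drafts(drafts)
print(merged)
print(conflicts)
```

Conflicts are reported rather than averaged, so the parent agent resolves them during its own `sequential-thinking` pass.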
232
+
233
+ ---
234
+
235
+ ## Handoff checklist (orchestration / subagents / Step 5)
236
+
237
+ - [ ] Constraints and hypotheses H-xx are listed or explicit gaps declared.
238
+ - [ ] 12-dimension matrix + weight explanation complete.
239
+ - [ ] ATAM: scenarios, trade-offs, risks complete.
240
+ - [ ] Markdown comparison maps to `references/ADR_TEMPLATE.md`.
241
+ - [ ] Verification strategy and test-tier gates answered or flagged "TBD Step 5 / design-system".
242
+ - [ ] **`/genesis` Step 3**: confirm no create / edit of `03_ADR/*.md`.
243
+
244
+ ---
245
+
246
+ <completion_criteria>
247
+ - `sequential-thinking` completed with 3–7 thoughts visible in output and absorbed by the evaluation.
248
+ - Deliverables meet **CRITICAL spec delivery contract** (verifiable / countable / deducible / promotable / explicit verification strategy).
249
+ - On the default `/genesis` Step 3 path, **no** ADR persisted under `.anws/v{N}/03_ADR/`.
250
+ - Handoff checklist holds or exemptions are logged in the final section "Open items".
251
+ - If loaded from `templates_alpha` or `templates_alpha_en`, use the **same overlay tree** as other skills in this session; do not mix shipped `templates/` variants for the same step to avoid gate drift.
252
+ </completion_criteria>