@kiwidata/grimoire 0.1.1

Files changed (215)
  1. package/.claude-plugin/plugin.json +8 -0
  2. package/AGENTS.md +217 -0
  3. package/README.md +748 -0
  4. package/bin/grimoire.js +2 -0
  5. package/dist/cli/index.d.ts +2 -0
  6. package/dist/cli/index.d.ts.map +1 -0
  7. package/dist/cli/index.js +42 -0
  8. package/dist/cli/index.js.map +1 -0
  9. package/dist/commands/archive.d.ts +3 -0
  10. package/dist/commands/archive.d.ts.map +1 -0
  11. package/dist/commands/archive.js +22 -0
  12. package/dist/commands/archive.js.map +1 -0
  13. package/dist/commands/branch-check.d.ts +3 -0
  14. package/dist/commands/branch-check.d.ts.map +1 -0
  15. package/dist/commands/branch-check.js +16 -0
  16. package/dist/commands/branch-check.js.map +1 -0
  17. package/dist/commands/check.d.ts +3 -0
  18. package/dist/commands/check.d.ts.map +1 -0
  19. package/dist/commands/check.js +22 -0
  20. package/dist/commands/check.js.map +1 -0
  21. package/dist/commands/ci.d.ts +3 -0
  22. package/dist/commands/ci.d.ts.map +1 -0
  23. package/dist/commands/ci.js +18 -0
  24. package/dist/commands/ci.js.map +1 -0
  25. package/dist/commands/diff.d.ts +3 -0
  26. package/dist/commands/diff.d.ts.map +1 -0
  27. package/dist/commands/diff.js +10 -0
  28. package/dist/commands/diff.js.map +1 -0
  29. package/dist/commands/docs.d.ts +3 -0
  30. package/dist/commands/docs.d.ts.map +1 -0
  31. package/dist/commands/docs.js +11 -0
  32. package/dist/commands/docs.js.map +1 -0
  33. package/dist/commands/health.d.ts +3 -0
  34. package/dist/commands/health.d.ts.map +1 -0
  35. package/dist/commands/health.js +13 -0
  36. package/dist/commands/health.js.map +1 -0
  37. package/dist/commands/init.d.ts +3 -0
  38. package/dist/commands/init.d.ts.map +1 -0
  39. package/dist/commands/init.js +21 -0
  40. package/dist/commands/init.js.map +1 -0
  41. package/dist/commands/list.d.ts +3 -0
  42. package/dist/commands/list.d.ts.map +1 -0
  43. package/dist/commands/list.js +22 -0
  44. package/dist/commands/list.js.map +1 -0
  45. package/dist/commands/log.d.ts +3 -0
  46. package/dist/commands/log.d.ts.map +1 -0
  47. package/dist/commands/log.js +15 -0
  48. package/dist/commands/log.js.map +1 -0
  49. package/dist/commands/map.d.ts +3 -0
  50. package/dist/commands/map.d.ts.map +1 -0
  51. package/dist/commands/map.js +17 -0
  52. package/dist/commands/map.js.map +1 -0
  53. package/dist/commands/pr.d.ts +3 -0
  54. package/dist/commands/pr.d.ts.map +1 -0
  55. package/dist/commands/pr.js +17 -0
  56. package/dist/commands/pr.js.map +1 -0
  57. package/dist/commands/status.d.ts +3 -0
  58. package/dist/commands/status.d.ts.map +1 -0
  59. package/dist/commands/status.js +12 -0
  60. package/dist/commands/status.js.map +1 -0
  61. package/dist/commands/test-quality.d.ts +3 -0
  62. package/dist/commands/test-quality.d.ts.map +1 -0
  63. package/dist/commands/test-quality.js +37 -0
  64. package/dist/commands/test-quality.js.map +1 -0
  65. package/dist/commands/trace.d.ts +3 -0
  66. package/dist/commands/trace.d.ts.map +1 -0
  67. package/dist/commands/trace.js +12 -0
  68. package/dist/commands/trace.js.map +1 -0
  69. package/dist/commands/update.d.ts +3 -0
  70. package/dist/commands/update.d.ts.map +1 -0
  71. package/dist/commands/update.js +22 -0
  72. package/dist/commands/update.js.map +1 -0
  73. package/dist/commands/validate.d.ts +3 -0
  74. package/dist/commands/validate.d.ts.map +1 -0
  75. package/dist/commands/validate.js +17 -0
  76. package/dist/commands/validate.js.map +1 -0
  77. package/dist/core/archive.d.ts +9 -0
  78. package/dist/core/archive.d.ts.map +1 -0
  79. package/dist/core/archive.js +92 -0
  80. package/dist/core/archive.js.map +1 -0
  81. package/dist/core/branch-check.d.ts +27 -0
  82. package/dist/core/branch-check.d.ts.map +1 -0
  83. package/dist/core/branch-check.js +205 -0
  84. package/dist/core/branch-check.js.map +1 -0
  85. package/dist/core/check.d.ts +24 -0
  86. package/dist/core/check.d.ts.map +1 -0
  87. package/dist/core/check.js +372 -0
  88. package/dist/core/check.js.map +1 -0
  89. package/dist/core/ci.d.ts +24 -0
  90. package/dist/core/ci.d.ts.map +1 -0
  91. package/dist/core/ci.js +162 -0
  92. package/dist/core/ci.js.map +1 -0
  93. package/dist/core/detect.d.ts +10 -0
  94. package/dist/core/detect.d.ts.map +1 -0
  95. package/dist/core/detect.js +368 -0
  96. package/dist/core/detect.js.map +1 -0
  97. package/dist/core/diff.d.ts +29 -0
  98. package/dist/core/diff.d.ts.map +1 -0
  99. package/dist/core/diff.js +197 -0
  100. package/dist/core/diff.js.map +1 -0
  101. package/dist/core/doc-style.d.ts +16 -0
  102. package/dist/core/doc-style.d.ts.map +1 -0
  103. package/dist/core/doc-style.js +192 -0
  104. package/dist/core/doc-style.js.map +1 -0
  105. package/dist/core/docs.d.ts +6 -0
  106. package/dist/core/docs.d.ts.map +1 -0
  107. package/dist/core/docs.js +478 -0
  108. package/dist/core/docs.js.map +1 -0
  109. package/dist/core/health.d.ts +7 -0
  110. package/dist/core/health.d.ts.map +1 -0
  111. package/dist/core/health.js +489 -0
  112. package/dist/core/health.js.map +1 -0
  113. package/dist/core/hooks.d.ts +5 -0
  114. package/dist/core/hooks.d.ts.map +1 -0
  115. package/dist/core/hooks.js +168 -0
  116. package/dist/core/hooks.js.map +1 -0
  117. package/dist/core/init.d.ts +9 -0
  118. package/dist/core/init.d.ts.map +1 -0
  119. package/dist/core/init.js +563 -0
  120. package/dist/core/init.js.map +1 -0
  121. package/dist/core/list.d.ts +4 -0
  122. package/dist/core/list.d.ts.map +1 -0
  123. package/dist/core/list.js +170 -0
  124. package/dist/core/list.js.map +1 -0
  125. package/dist/core/log.d.ts +8 -0
  126. package/dist/core/log.d.ts.map +1 -0
  127. package/dist/core/log.js +150 -0
  128. package/dist/core/log.js.map +1 -0
  129. package/dist/core/map.d.ts +9 -0
  130. package/dist/core/map.d.ts.map +1 -0
  131. package/dist/core/map.js +302 -0
  132. package/dist/core/map.js.map +1 -0
  133. package/dist/core/pr.d.ts +9 -0
  134. package/dist/core/pr.d.ts.map +1 -0
  135. package/dist/core/pr.js +273 -0
  136. package/dist/core/pr.js.map +1 -0
  137. package/dist/core/shared-setup.d.ts +52 -0
  138. package/dist/core/shared-setup.d.ts.map +1 -0
  139. package/dist/core/shared-setup.js +221 -0
  140. package/dist/core/shared-setup.js.map +1 -0
  141. package/dist/core/status.d.ts +6 -0
  142. package/dist/core/status.d.ts.map +1 -0
  143. package/dist/core/status.js +114 -0
  144. package/dist/core/status.js.map +1 -0
  145. package/dist/core/test-quality.d.ts +33 -0
  146. package/dist/core/test-quality.d.ts.map +1 -0
  147. package/dist/core/test-quality.js +378 -0
  148. package/dist/core/test-quality.js.map +1 -0
  149. package/dist/core/trace.d.ts +6 -0
  150. package/dist/core/trace.d.ts.map +1 -0
  151. package/dist/core/trace.js +211 -0
  152. package/dist/core/trace.js.map +1 -0
  153. package/dist/core/update.d.ts +10 -0
  154. package/dist/core/update.d.ts.map +1 -0
  155. package/dist/core/update.js +149 -0
  156. package/dist/core/update.js.map +1 -0
  157. package/dist/core/validate.d.ts +20 -0
  158. package/dist/core/validate.d.ts.map +1 -0
  159. package/dist/core/validate.js +275 -0
  160. package/dist/core/validate.js.map +1 -0
  161. package/dist/index.d.ts +19 -0
  162. package/dist/index.d.ts.map +1 -0
  163. package/dist/index.js +20 -0
  164. package/dist/index.js.map +1 -0
  165. package/dist/utils/config.d.ts +61 -0
  166. package/dist/utils/config.d.ts.map +1 -0
  167. package/dist/utils/config.js +172 -0
  168. package/dist/utils/config.js.map +1 -0
  169. package/dist/utils/fs.d.ts +17 -0
  170. package/dist/utils/fs.d.ts.map +1 -0
  171. package/dist/utils/fs.js +38 -0
  172. package/dist/utils/fs.js.map +1 -0
  173. package/dist/utils/paths.d.ts +10 -0
  174. package/dist/utils/paths.d.ts.map +1 -0
  175. package/dist/utils/paths.js +35 -0
  176. package/dist/utils/paths.js.map +1 -0
  177. package/dist/utils/spawn.d.ts +5 -0
  178. package/dist/utils/spawn.d.ts.map +1 -0
  179. package/dist/utils/spawn.js +34 -0
  180. package/dist/utils/spawn.js.map +1 -0
  181. package/package.json +68 -0
  182. package/skills/grimoire-apply/SKILL.md +274 -0
  183. package/skills/grimoire-audit/SKILL.md +129 -0
  184. package/skills/grimoire-branch-guard/SKILL.md +111 -0
  185. package/skills/grimoire-bug/SKILL.md +160 -0
  186. package/skills/grimoire-bug-explore/SKILL.md +242 -0
  187. package/skills/grimoire-bug-report/SKILL.md +237 -0
  188. package/skills/grimoire-bug-session/SKILL.md +222 -0
  189. package/skills/grimoire-bug-triage/SKILL.md +274 -0
  190. package/skills/grimoire-commit/SKILL.md +150 -0
  191. package/skills/grimoire-discover/SKILL.md +297 -0
  192. package/skills/grimoire-draft/SKILL.md +202 -0
  193. package/skills/grimoire-plan/SKILL.md +329 -0
  194. package/skills/grimoire-pr/SKILL.md +134 -0
  195. package/skills/grimoire-pr-review/SKILL.md +240 -0
  196. package/skills/grimoire-refactor/SKILL.md +251 -0
  197. package/skills/grimoire-remove/SKILL.md +112 -0
  198. package/skills/grimoire-review/SKILL.md +247 -0
  199. package/skills/grimoire-verify/SKILL.md +223 -0
  200. package/skills/references/bug-classification.md +154 -0
  201. package/skills/references/build-vs-buy.md +77 -0
  202. package/skills/references/elicitation-personas.md +118 -0
  203. package/skills/references/refactor-register-format.md +88 -0
  204. package/skills/references/refactor-scan-categories.md +102 -0
  205. package/skills/references/schema-format.md +68 -0
  206. package/skills/references/security-compliance.md +110 -0
  207. package/skills/references/testing-contracts.md +93 -0
  208. package/templates/context.yml +110 -0
  209. package/templates/debt-exceptions.yml +61 -0
  210. package/templates/decision.md +50 -0
  211. package/templates/dupignore +93 -0
  212. package/templates/example.feature +24 -0
  213. package/templates/manifest.md +29 -0
  214. package/templates/mapignore +58 -0
  215. package/templates/mapkeys +65 -0
@@ -0,0 +1,247 @@
1
+ ---
2
+ name: grimoire-review
3
+ description: Multi-perspective design review before coding begins. Expert personas validate the change for completeness, feasibility, security, and data integrity. Use after draft/plan, before apply.
4
+ compatibility: Designed for Claude Code (or similar products)
5
+ metadata:
6
+ author: kiwi-data
7
+ version: "0.1"
8
+ ---
9
+
10
+ # grimoire-review
11
+
12
+ Multi-perspective LLM review of a completed design before coding begins. Expert personas validate the change for completeness, feasibility, security, and data integrity.
13
+
14
+ ## Triggers
15
+ - User has a grimoire change with approved features, decisions, and tasks
16
+ - User asks to review a design before implementing
17
+ - Automatically suggested after `grimoire-plan` completes
18
+ - Loose match: "review", "design review", "ready to code", "before we start"
19
+
20
+ ## Routing
21
+ - No tasks.md exists → `grimoire-plan` first
22
+ - Level 1 change → skip review entirely, proceed to `grimoire-apply`
23
+ - User says "skip review" → proceed to `grimoire-apply`
24
+ - Post-implementation review → `grimoire-pr` (has optional post-impl review)
25
+
26
+ ## Prerequisites
27
+ - A change exists in `.grimoire/changes/<change-id>/` with:
28
+ - `manifest.md` (approved)
29
+ - At least one `.feature` file or decision record
30
+ - `tasks.md` (generated by grimoire-plan)
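The prerequisite list above can be checked mechanically before a review starts. The following is a minimal sketch, not part of the package: the function name `missing_prerequisites` and the `decisions/` subdirectory for decision records are assumptions made for illustration, and the sketch does not try to detect whether `manifest.md` is approved, since the skill does not say how approval is recorded.

```python
from pathlib import Path


def missing_prerequisites(change_dir: Path) -> list[str]:
    """Return the prerequisites that are absent from a change directory."""
    missing = []

    if not (change_dir / "manifest.md").is_file():
        missing.append("manifest.md")

    # At least one .feature file anywhere under the change directory...
    has_feature = any(change_dir.rglob("*.feature"))
    # ...or at least one decision record. The "decisions/" location is a
    # hypothetical layout chosen for this sketch.
    decisions = change_dir / "decisions"
    has_decision = decisions.is_dir() and any(decisions.glob("*.md"))
    if not (has_feature or has_decision):
        missing.append("feature file or decision record")

    if not (change_dir / "tasks.md").is_file():
        missing.append("tasks.md")

    return missing
```

An empty result means the review can start; a result containing `tasks.md` corresponds to the routing rule that sends the user to `grimoire-plan` first.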
31
+
32
+ ## Skipping
33
+ This step is optional. The user can skip it by saying "skip review" or "go straight to apply". Not every change needs a full review — small or low-risk changes can go directly from plan to apply.
34
+
35
+ ## Complexity-Gated Review
36
+ Read `complexity` from `manifest.md` frontmatter to determine review depth:
37
+
38
+ | Complexity | Review Depth |
39
+ |------------|-------------|
40
+ | **1 (Trivial)** | Skip review entirely — suggest proceeding to apply |
41
+ | **2 (Simple)** | Senior Engineer only. Skip other personas unless the change touches security or data. |
42
+ | **3 (Moderate)** | All relevant personas (skip Data Engineer if no data changes, skip QA if no user-facing behavior) |
43
+ | **4 (Complex)** | All personas mandatory. No skipping. |
44
+
45
+ The user can always override: "run full review" on a level 2, or "just senior engineer" on a level 4.
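The gating table can be read as a small decision function. The sketch below is illustrative only: `ChangeTraits` and `select_personas` are hypothetical names, and the level 2 rule (add the Security Engineer when the change touches security, add the Data Engineer when it touches data) is one interpretation of "unless the change touches security or data". User overrides are not modeled.

```python
from dataclasses import dataclass

PERSONAS = [
    "Product Manager",
    "Senior Engineer",
    "Security Engineer",
    "QA Engineer",
    "Data Engineer",
]


@dataclass
class ChangeTraits:
    """Hypothetical flags describing what a change touches."""
    touches_security: bool = False
    touches_data: bool = False
    user_facing: bool = False


def select_personas(complexity: int, traits: ChangeTraits) -> list[str]:
    """Map the manifest's complexity level (1-4) to the personas to run."""
    if complexity not in (1, 2, 3, 4):
        raise ValueError(f"complexity must be 1-4, got {complexity!r}")
    if complexity == 1:
        return []  # trivial: skip review entirely
    if complexity == 2:
        personas = ["Senior Engineer"]
        if traits.touches_security:
            personas.append("Security Engineer")
        if traits.touches_data:
            personas.append("Data Engineer")
        return personas
    if complexity == 3:
        skipped = set()
        if not traits.touches_data:
            skipped.add("Data Engineer")
        if not traits.user_facing:
            skipped.add("QA Engineer")
        return [p for p in PERSONAS if p not in skipped]
    return list(PERSONAS)  # level 4: all personas, no skipping
```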
46
+
47
+ ## Workflow
48
+
49
+ ### 1. Select Change
50
+ - List active changes in `.grimoire/changes/`
51
+ - If multiple, ask user which one to review
52
+ - If only one, confirm it
53
+
54
+ ### 2. Gather Context
55
+ Read all artifacts for the change:
56
+ - `manifest.md` — change summary, scope, **and Prior Art section** (build-vs-buy rationale)
57
+ - All `.feature` files — behavioral specifications
58
+ - All decision records — architectural choices
59
+ - `tasks.md` — implementation plan
60
+ - `data.yml` — proposed schema changes (if present)
61
+ - Read `.grimoire/config.yaml` for project context (language, tools, conventions)
62
+ - Read `.grimoire/docs/data/schema.yml` for current data baseline (if it exists)
63
+ - Read `.grimoire/docs/context.yml` for deployment environment, related services, and infrastructure (if it exists) — this informs security review (cross-service auth), engineering review (deployment constraints), and data review (infrastructure availability)
64
+ - Read relevant `.grimoire/docs/` area docs if they exist
65
+ - Skim the areas of the codebase the tasks reference
66
+
67
+ ### 3. Product Manager Review
68
+
69
+ Adopt the perspective of a **product manager** focused on completeness and user value.
70
+
71
+ Evaluate:
72
+ - **Outcome**: Does the manifest's Why clearly state the problem being solved and how success is measured? If it describes a mechanism ("add an endpoint") instead of an outcome ("users can reset passwords"), flag it — the team will argue about scope later.
73
+ - **Coverage**: Do the feature scenarios cover all user-facing behaviors? Are there missing edge cases, error states, or alternate flows that a user would encounter?
74
+ - **Clarity**: Are the feature descriptions and user stories clear enough that a non-technical stakeholder could validate them? Would QA know exactly what to test?
75
+ - **Scope**: Is the change well-bounded? Are there implicit requirements hiding in the scenarios that aren't spelled out? Do any scenarios or tasks stray into the manifest's Non-goals? Scope creep into non-goals is a **blocker**.
76
+ - **Acceptance**: Could you ship this and confidently say the feature is "done"? What would a user complain about?
77
+
78
+ Output a short list of findings — flag issues as **blocker** (must fix before coding) or **suggestion** (nice to have).
79
+
80
+ ### 4. Senior Engineer Review
81
+
82
+ Adopt the perspective of a **senior software engineer** reviewing the technical design.
83
+
84
+ Evaluate:
85
+ - **Build vs Buy**: Was the prior art research thorough? Check the manifest's Prior Art section. If the change builds custom code, is the justification for not adopting an existing library convincing? Do a quick sanity check — search for obvious libraries the research may have missed. If a well-maintained library exists that the manifest doesn't mention, flag it as a **blocker**. If the research was done but the build decision is debatable, flag as **suggestion** with the alternative.
86
+ - **Simplicity**: Is this the simplest design that solves the problem? Could any task be done with less code, fewer files, or fewer moving parts? Flag anything that looks over-engineered — new abstractions without justification, premature generalization, unnecessary indirection layers, config-driven behavior where a direct call would do.
87
+ - **Architecture**: Do the decisions make sense for this codebase? Are there simpler alternatives? Will this paint us into a corner?
88
+ - **Task quality**: Are the tasks specific enough to execute without re-planning? Do they reference real files, real patterns, real conventions from the codebase?
89
+ - **Dependencies**: Are tasks ordered correctly? Are there missing dependencies or implicit assumptions between tasks?
90
+ - **Integration**: How does this change interact with existing code? Are there areas that will break or need updating that the tasks don't cover?
91
+ - **Contract compatibility**: Does this change alter the request/response shape for any external API documented in `schema.yml`? If fields are added, removed, renamed, or re-typed in `data.yml`, flag it — the client code and any downstream consumers need contract tests updated. A contract change without updated contract tests is a **blocker**.
92
+ - **Reuse**: Are there existing utilities, patterns, or modules that should be used instead of writing new code? Check `.grimoire/docs/` area docs if available. The goal is less new code, not more.
93
+ - **Surface area**: Does the change introduce new public APIs, exports, or interfaces beyond what's needed? Fewer public functions with fewer parameters is better.
94
+ - **Quality attributes**: If decision records have a Quality Attributes table, are the targets measurable and realistic? For performance-sensitive changes (new endpoints, data pipelines, search), flag blank targets as a **blocker** — you can't verify what you haven't defined. For non-performance-sensitive changes, blank targets are fine.
95
+ - **Testing**: Is the test strategy sound? Are there gaps between what the features describe and what the step definitions will actually verify?
96
+
97
+ Output a short list of findings — flag issues as **blocker** or **suggestion**.
98
+
99
+ ### 5. Security Engineer Review
100
+
101
+ Adopt the perspective of a **security engineer** reviewing the design for vulnerabilities.
102
+
103
+ #### 5a. STRIDE Threat Analysis
104
+
105
+ For each new endpoint, data flow, or trust boundary the change introduces, evaluate using STRIDE:
106
+
107
+ | Threat | Question |
108
+ |--------------------|----------------------------------------------------------------------------------------------|
109
+ | **S**poofing | Can an attacker impersonate a user or service? Are auth checks present at every entry point? |
110
+ | **T**ampering | Can input or data in transit be modified? Is integrity validated (checksums, signatures, CSRF)?|
111
+ | **R**epudiation | Are security-relevant actions logged? Could an attacker act without leaving a trace? |
112
+ | **I**nfo Disclosure| Could error messages, logs, or responses leak sensitive data (stack traces, PII, tokens)? |
113
+ | **D**enial of Service| Are there unbounded operations (large uploads, expensive queries, no rate limits)? |
114
+ | **E**levation of Privilege| Can a user escalate to admin? Are role/permission checks at the right layer? |
115
+
116
+ Skip STRIDE categories that don't apply to the change. Don't manufacture threats.
117
+
118
+ #### 5b. Detailed Security Evaluation
119
+
120
+ - **Input validation**: Do the features involve user input? Are there scenarios covering malicious or malformed input?
121
+ - **Authentication/authorization**: Does the change touch auth boundaries? Are there missing access control checks?
122
+ - **Data handling**: Does the change introduce new data storage, transmission, or processing? Are there privacy or compliance concerns?
123
+ - **Dependencies**: Do the tasks introduce new dependencies? Are there known vulnerability concerns? Check that package names are real and correctly spelled — hallucinated or typosquatted package names are a supply chain attack vector.
124
+ - **Vulnerable packages**: If the tasks add or upgrade dependencies, check for known vulnerabilities. Cross-reference against the project's dependency audit tool (configured in `.grimoire/config.yaml` under `dep_audit`). Flag any package without a clear provenance or with a very low download count.
125
+ - **Attack surface**: Does this change expose new endpoints, APIs, or interfaces? What could an attacker target?
126
+ - **Cross-service security**: If `context.yml` lists related services, does the change properly authenticate when calling them? Are service-to-service auth boundaries maintained? Is data from sibling services validated at the boundary?
127
+ - **Secrets**: Are there hardcoded credentials, tokens, or keys in the design? Check that API keys, database credentials, and tokens are loaded from environment variables or secret stores, never inline.
128
+
129
+ If the change has no security-relevant surface (e.g., a pure UI text change), say so briefly and move on. Not every change needs a deep security review.
130
+
131
+ #### 5c. Compliance Review
132
+
133
+ Check `.grimoire/config.yaml` under `project.compliance`. If configured, evaluate per `../references/security-compliance.md` (section "Compliance Framework Verification"). Missing compliance coverage on a tagged scenario is a **blocker**. If no compliance frameworks configured, skip.
134
+
135
+ #### 5d. OWASP / CWE Classification
136
+
137
+ Tag every security finding with:
138
+ - **OWASP Top 10 (2021)** category — e.g., `A01:2021-Broken Access Control`, `A03:2021-Injection`
139
+ - **CWE ID** — e.g., `CWE-89` (SQL Injection), `CWE-79` (XSS), `CWE-798` (Hardcoded Credentials)
140
+
141
+ This makes findings actionable, searchable, and traceable to compliance frameworks.
142
+
143
+ See `../references/security-compliance.md` for the CWE quick reference table.
144
+
145
+ Output format:
146
+ ```markdown
147
+ ## Security Engineer
148
+ ### STRIDE Summary
149
+ - **Spoofing**: [relevant finding or "N/A"]
150
+ - **Tampering**: [relevant finding or "N/A"]
151
+ - ... (only categories that apply)
152
+
153
+ ### Findings
154
+ - **[blocker]** [A03:2021 / CWE-89] User search query is concatenated into SQL string in tasks — must use parameterized query
155
+ - **[suggestion]** [A07:2021 / CWE-307] Add rate limiting scenario for login endpoint
156
+ - No other security concerns for this change.
157
+ ```
158
+
159
+ ### 6. QA Engineer Review (Optional)
160
+
161
+ **Skip this review if the change is purely internal (no user-facing behavior, no new inputs, no observable state changes).**
162
+
163
+ If the change has user-facing behavior, adopt the perspective of a **QA engineer** focused on testability and real-world failure modes.
164
+
165
+ Evaluate:
166
+ - **Testability**: Can every scenario be verified automatically? Are there behaviors that require manual testing — and if so, is that documented? Are the Given/When/Then steps specific enough to implement as real tests?
167
+ - **Edge cases**: What inputs, states, or timing conditions are not covered by the current scenarios? Think about empty states, concurrent users, interruptions, and boundary values.
168
+ - **Negative scenarios**: For every happy path, is there at least one scenario covering what happens when things go wrong? Missing error scenarios are the #1 source of bug reports.
169
+ - **Observability**: When this feature breaks in production, how will anyone know? Are there logs, metrics, or alerts? Can a tester distinguish between "feature is broken" and "feature is slow"?
170
+ - **Regression risk**: What existing behavior could this change break? Are there integration points with other features that need cross-feature testing?
171
+ - **Accessibility**: Does the change introduce new UI? If so, are there scenarios covering keyboard navigation, screen readers, or contrast requirements?
172
+
173
+ Output a short list of findings — flag issues as **blocker** or **suggestion**.
174
+
175
+ ### 7. Data Engineer Review (Optional)
176
+
177
+ **Skip this review if the change has no `data.yml` and doesn't touch data models, schemas, migrations, or external API integrations.**
178
+
179
+ If the change touches data, adopt the perspective of a **data engineer** reviewing the schema design.
180
+
181
+ Read:
182
+ - `.grimoire/changes/<change-id>/data.yml` — proposed schema changes
183
+ - `.grimoire/docs/data/schema.yml` — current schema baseline (if it exists)
184
+
185
+ Evaluate:
186
+ - **Schema design**: Are field types appropriate? Are there missing constraints (not_null, unique, indexes) that will cause problems at scale? Are enums used where they should be?
187
+ - **Migrations**: Will the proposed changes require a data migration? Is it safe to run on a live database (e.g., adding a nullable column is safe, renaming a column is not)?
188
+ - **Relationships**: Are foreign keys and references correct? Are there missing indexes on foreign keys? Could any relationships create N+1 query problems?
189
+ - **Naming**: Do new fields/models follow the existing naming conventions in schema.yml?
190
+ - **Backwards compatibility**: Will the schema change break existing API consumers, queries, or reports? Are there downstream dependencies?
191
+ - **External APIs**: If adding a new external API dependency, is the `schema_ref` pointing to a stable spec? Is there a fallback if the API is unavailable? Is the client wrapper in the right place?
192
+ - **Contract breaking changes**: Compare `data.yml` against `schema.yml` for any external API with `action: modify`. If the change removes a required response field, changes a field type, renames a field, or adds a new required request field — it's a **breaking contract change**. Flag as **blocker** unless the change documents a migration path (versioned endpoint, fallback handling, or coordinated deployment). Adding optional response fields is safe. Adding optional request fields is safe if the API has a default.
193
+ - **Data integrity**: Are there edge cases where data could end up in an inconsistent state? Should any changes be wrapped in a transaction?
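The contract-breaking rules in the list above can be expressed as a comparison between the old and new field sets. This is a sketch under stated assumptions: the `Field` type and the dictionaries of fields are hypothetical stand-ins, not the actual `data.yml` or `schema.yml` format. A rename shows up here as a removal plus an addition, so the sketch only catches renames of required response fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical description of one API field."""
    type: str
    required: bool = False


def breaking_changes(
    old_request: dict[str, Field],
    new_request: dict[str, Field],
    old_response: dict[str, Field],
    new_response: dict[str, Field],
) -> list[str]:
    """List contract changes that the review would flag as breaking."""
    findings = []

    for name, old in old_response.items():
        new = new_response.get(name)
        if new is None:
            if old.required:
                findings.append(f"response field '{name}' removed")
        elif new.type != old.type:
            findings.append(
                f"response field '{name}' changed type {old.type} -> {new.type}"
            )

    for name, old in old_request.items():
        new = new_request.get(name)
        if new is not None and new.type != old.type:
            findings.append(
                f"request field '{name}' changed type {old.type} -> {new.type}"
            )

    # New optional fields are treated as safe; new required request fields are not.
    for name, new in new_request.items():
        if name not in old_request and new.required:
            findings.append(f"request field '{name}' added as required")

    return findings
```

An empty list does not prove the change is safe; it only means none of the mechanical rules fired, and a documented migration path can still downgrade a finding.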
194
+
195
+ Output a short list of findings — flag issues as **blocker** or **suggestion**.
196
+
197
+ ### 8. Present Findings
198
+
199
+ Compile all reviews into a single summary:
200
+
201
+ ```markdown
202
+ # Design Review: <change-id>
203
+
204
+ ## Product Manager
205
+ - **[blocker]** Missing error scenario for invalid email format in registration feature
206
+ - **[suggestion]** Add a scenario for password strength feedback
207
+
208
+ ## Senior Engineer
209
+ - **[blocker]** Task 2.3 references `auth/views.py` but the project uses `accounts/views.py`
210
+ - **[suggestion]** Reuse `validate_email()` from `utils/validators.py` instead of writing a new one
211
+
212
+ ## Security Engineer
213
+ - **[suggestion]** [A07:2021 / CWE-307] Add rate limiting scenario for login attempts
214
+ - No other security concerns for this change.
215
+
216
+ ## QA Engineer
217
+ - **[blocker]** No negative scenario for expired TOTP codes — testers can't verify error handling
218
+ - **[suggestion]** Add scenario for what happens when 2FA service is unreachable
219
+ (or: "No user-facing behavior changes — skipped.")
220
+
221
+ ## Data Engineer
222
+ - **[blocker]** Missing index on `profiles.user_id` — will cause full table scans on join queries
223
+ - **[suggestion]** `avatar_url` should have a max_length constraint
224
+ (or: "No data changes in this design — skipped.")
225
+
226
+ ## Summary
227
+ - **4 blockers** — must be addressed before coding
228
+ - **5 suggestions** — consider addressing
229
+
230
+ Recommendation: Fix blockers, then proceed to apply.
231
+ ```
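The counts in the Summary section are easy to get wrong by hand, so they can be derived from the persona sections instead. A minimal sketch, assuming findings follow the `- **[blocker]** ...` and `- **[suggestion]** ...` bullet shape used in the template; `count_findings` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches bullets such as "- **[blocker]** Missing index on ..."
SEVERITY = re.compile(r"^\s*-\s+\*\*\[(blocker|suggestion)\]\*\*")


def count_findings(review_md: str) -> Counter:
    """Count blocker and suggestion bullets in a compiled review."""
    counts = Counter()
    for line in review_md.splitlines():
        match = SEVERITY.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Lines without a severity tag, such as "No other security concerns for this change.", are ignored.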
232
+
233
+ ### 9. Iterate
234
+ - If there are **blockers**, tell the user which artifacts need updating (features, decisions, or tasks) and offer to help fix them
235
+ - If only **suggestions**, present them and let the user decide which to address
236
+ - If **no issues**, confirm the design is ready and suggest proceeding to `grimoire-apply`
237
+ - Do NOT proceed to apply without user approval
238
+
239
+ ## Important
240
+ - This is a design review, not a code review. Focus on the specifications and plan, not hypothetical implementation details.
241
+ - Be direct. Don't pad findings with praise or soften blockers. The goal is to catch problems before code is written, when they're cheap to fix.
242
+ - A blocker means "if we code this as-is, we'll have to come back and redo work." A suggestion means "this would improve the design but isn't blocking."
243
+ - Keep each persona's review focused and short. Three bullet points that matter are better than ten that don't.
244
+ - If the change is trivial (e.g., rename a field, fix a typo in a feature), say so and don't manufacture issues.
245
+
246
+ ## Done
247
+ When findings are presented and blockers resolved (or accepted), the review is complete. Suggest proceeding to `grimoire-apply`.
@@ -0,0 +1,223 @@
1
+ ---
2
+ name: grimoire-verify
3
+ description: Verify that implementation matches feature specs and decision records. Use after apply is complete, before archiving the change.
4
+ compatibility: Designed for Claude Code (or similar products)
5
+ metadata:
6
+ author: kiwi-data
7
+ version: "0.1"
8
+ ---
9
+
10
+ # grimoire-verify
11
+
12
+ Verify that implementation matches the feature specs and decision records. Run after apply, before archive.
13
+
14
+ ## Triggers
15
+ - User wants to verify a grimoire change is correctly implemented
16
+ - User asks to check, verify, or review a change before archiving
17
+ - Loose match: "verify", "check", "review" with a change reference
18
+
19
+ ## Routing
20
+ - Change not yet applied → `grimoire-apply` first
21
+ - Want a pre-implementation design review → `grimoire-review`
22
+ - Found issues that need fixing → user decides: fix directly or route to `grimoire-apply` / `grimoire-draft`
23
+
24
+ ## Prerequisites
25
+ - A change exists in `.grimoire/changes/<change-id>/` with completed tasks
26
+ - Or: user wants to verify baseline features against the codebase (no active change required)
27
+
28
+ ## Workflow
29
+
30
+ ### 1. Select Scope
31
+ Two modes:
32
+
33
+ **Change verification** (default when a change exists):
34
+ - Select an active change with completed tasks
35
+ - Verify the implementation matches that specific change's features and decisions
36
+
37
+ **Baseline verification** (when user asks to verify the whole project):
38
+ - Verify all features in `features/` against the codebase
39
+ - Check all decisions in `.grimoire/decisions/` are still accurate
40
+
41
+ ### 2. Load Artifacts
42
+ For change verification:
43
+ - Read `manifest.md`, proposed `.feature` files, decision records, `tasks.md`
44
+
45
+ For baseline verification:
46
+ - Read all `features/**/*.feature` and `.grimoire/decisions/*.md`
47
+
48
+ ### 3. Verify in Three Dimensions
49
+
50
+ **A. Completeness — are all tasks done?**
51
+ - Parse `tasks.md` and check all items are `- [x]`
52
+ - If any are `- [ ]`, list them as CRITICAL issues
53
+ - This is objective — checkboxes don't lie
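The completeness check amounts to scanning `tasks.md` for unchecked boxes. A minimal sketch, assuming tasks are Markdown checkbox bullets; `unchecked_tasks` is a hypothetical name, and accepting an uppercase `X` as checked is a leniency added here rather than something the skill specifies.

```python
import re

# "- [ ] task" is pending, "- [x] task" is done; indentation is allowed.
TASK_LINE = re.compile(r"^\s*-\s+\[([ xX])\]\s+(.*)$")


def unchecked_tasks(tasks_md: str) -> list[str]:
    """Return the text of every task whose checkbox is still empty."""
    pending = []
    for line in tasks_md.splitlines():
        match = TASK_LINE.match(line)
        if match and match.group(1) == " ":
            pending.append(match.group(2).strip())
    return pending
```

Each returned entry would be reported as a CRITICAL issue.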
54
+
55
+ **B. Correctness — does the code match the specs?**
56
+ For each scenario in the feature files:
57
+ 1. Search the codebase for the production code that implements this behavior
58
+ 2. Search for the step definition that tests this scenario
59
+ 3. Verify the step definition makes real assertions (not empty, not `assert True`, not `pass`)
60
+ 4. If possible, confirm the test actually runs (check test output, CI results)
61
+
62
+ Flag issues:
63
+ - Scenario with no corresponding step definition → CRITICAL
64
+ - Step definition with empty/trivial body → CRITICAL
65
+ - Step definition that doesn't match the scenario's intent → WARNING
66
+ - Production code not found for a scenario → WARNING (may be indirect)
67
+
68
+ **C. Coherence — does the implementation follow the decisions?**
69
+ For each decision record:
70
+ 1. Read the chosen option and consequences
71
+ 2. Search the codebase for evidence the decision was followed
72
+ 3. Check the Confirmation section — have the criteria been met?
73
+
74
+ Flag issues:
75
+ - Decision says "use PostgreSQL" but code uses SQLite → CRITICAL
76
+ - Decision's Confirmation criteria not verifiable → WARNING
77
+ - Decision consequences not addressed → WARNING
78
+
79
+ ### 3.D Test Quality Intelligence
80
+
81
+ Go beyond "does a step definition exist?" to "would this test catch a real bug?"
82
+
83
+ For each step definition:
84
+ 1. **Assertion strength:** Classify each assertion:
85
+ - **Strong:** `assert result == "expected_value"`, `expect(status).toBe(302)`, `assertEqual(user.email, "test@example.com")`
86
+ - **Weak:** `assert result is not None`, `expect(result).toBeDefined()`, `assert len(items) > 0`
87
+ - **Trivial:** `assert True`, `pass`, empty body, `expect(true).toBe(true)`
88
+
89
+ 2. **Null implementation test:** Could this test pass if the function under test returned `None`, `[]`, `{}`, or `0`? If yes, the test is too weak.
90
+
91
+ 3. **Common anti-patterns to flag:**
92
+ - Step definition body is just `pass` or `...` → CRITICAL
93
+ - Assertion only checks `is not None` or `toBeDefined()` → WARNING
94
+ - Assertion checks type only (`isinstance()`) without checking value → WARNING
95
+ - Test creates a mock and then asserts against the mock's return value (circular) → CRITICAL
96
+ - Try/except that swallows assertion errors → CRITICAL
97
+ - Step definition has no `assert`/`expect` at all → CRITICAL (for Then steps)
98
+ - Test mocks the client wrapper instead of the HTTP boundary → WARNING (tests wiring, not contract compliance)
99
+ - Test mocks internal code that lives in the same repo → WARNING (hides integration bugs)
100
+ - Contract test uses a fixture that doesn't match `schema.yml` → CRITICAL (fictional contract)
101
+ - Test mocks so aggressively that removing production code still passes → CRITICAL
102
+
103
+ 4. **Report format:** Include test quality findings alongside correctness findings:
104
+ ```
105
+ - **[critical]** `test_auth.py:42` — step "Then I should be redirected" has no assertion (empty body)
106
+ - **[warning]** `test_auth.py:58` — step "Then user should exist" only asserts `is not None` — check the actual user properties
107
+ ```
108
+
109
+ If the `grimoire test-quality` CLI command is available, suggest running it for a comprehensive analysis.
110
+ To run tests directly: use `config.tools.bdd_test` for BDD and `config.tools.unit_test` for unit tests.
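The assertion-strength classes above can be approximated for Python step definitions with the standard `ast` module. This is a rough sketch under stated assumptions, not how `grimoire test-quality` works: it only looks at plain `assert` statements (not `expect(...)` or `assertEqual(...)`), and the function names are hypothetical.

```python
import ast


def classify_assert(node: ast.Assert) -> str:
    """Classify one `assert` statement as strong, weak, trivial, or unclassified."""
    test = node.test
    # `assert True`, `assert 1`, and similar constant conditions.
    if isinstance(test, ast.Constant):
        return "trivial"
    if isinstance(test, ast.Compare) and len(test.ops) == 1:
        op, right = test.ops[0], test.comparators[0]
        # `assert x is not None`
        if (isinstance(op, ast.IsNot) and isinstance(right, ast.Constant)
                and right.value is None):
            return "weak"
        # `assert len(x) > 0`
        if (isinstance(op, ast.Gt) and isinstance(test.left, ast.Call)
                and isinstance(test.left.func, ast.Name)
                and test.left.func.id == "len"
                and isinstance(right, ast.Constant) and right.value == 0):
            return "weak"
        # `assert x == <something>` compares against a concrete value.
        if isinstance(op, ast.Eq):
            return "strong"
    return "unclassified"


def classify_source(source: str) -> list[str]:
    """Classify every `assert` statement found in a piece of Python source."""
    tree = ast.parse(source)
    return [classify_assert(n) for n in ast.walk(tree) if isinstance(n, ast.Assert)]


def is_empty_body(func: ast.FunctionDef) -> bool:
    """True if the body is only `pass`, `...`, or a docstring."""
    return all(
        isinstance(stmt, ast.Pass)
        or (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant))
        for stmt in func.body
    )


print(classify_source('assert result == "expected_value"'))  # ['strong']
print(classify_source("assert result is not None"))          # ['weak']
print(classify_source("assert True"))                         # ['trivial']
```

A syntactic pass like this only narrows the search. Circular mocks and swallowed assertion errors still need the step definition to be read by hand.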
111
+
112
+ ### 4. Security Compliance Verification
113
+
114
+ Verify that security guidance from plan and review stages was followed in implementation. Read `../references/security-compliance.md` for the full checklist.
115
+
116
+ **A. Check plan-stage security patterns:**
117
+ Confirm the implementation uses proven patterns: framework auth (not custom), bcrypt/argon2 (not MD5/SHA), parameterized queries (not string concatenation), CSRF protection, input validation at boundary, no hardcoded secrets.
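As an illustration of one pattern from that list, parameterized queries, the sketch below uses Python's standard `sqlite3` module. The table, column, and function names are hypothetical. The point is only the difference a reviewer would look for: values are bound through `?` placeholders instead of being concatenated into the SQL string.

```python
import sqlite3

# In-memory database with a hypothetical `users` table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("test@example.com",))


def find_user_by_email(conn: sqlite3.Connection, email: str):
    # The `?` placeholder makes the driver bind `email` as data,
    # so it is never interpreted as SQL.
    cursor = conn.execute("SELECT id, email FROM users WHERE email = ?", (email,))
    return cursor.fetchone()


print(find_user_by_email(conn, "test@example.com"))  # (1, 'test@example.com')
# A classic injection payload is treated as a literal string and matches nothing.
print(find_user_by_email(conn, "' OR '1'='1"))       # None
```

During verification, the thing to flag is the opposite shape: SQL assembled with `+`, `%`, or f-strings from request data.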
118
+
119
+ **B. Check review findings were addressed:**
120
+ If a `grimoire-review` was run, list each **blocker** from the Security Engineer review. Search the implementation for evidence each was fixed. Unaddressed blockers → CRITICAL.
121
+
122
+ **C. OWASP Top 10 surface scan:**
123
+ Scan changed files against the OWASP table in `../references/security-compliance.md`. Tag findings with OWASP category and CWE ID.
124
+
125
+ **D. Verify security-tagged scenarios:**
126
+ Check feature files for security tags. For each, verify per the rules in `../references/security-compliance.md`. A security-tagged scenario with no security verification in tests → CRITICAL.
127
+
128
+ If no security tags exist and the change has no security surface, state so briefly and move on.
129
+
130
+ ### 5. Contract Test Coverage
131
+
132
+ Verify that every external API integration has contract tests that match the documented contract.
133
+
134
+ **A. Inventory external APIs:**
135
+
136
+ Read `.grimoire/docs/data/schema.yml` and list every entry with `type: external_api`. For each:
137
+
138
+ 1. **Contract documented?** Check that the entry has `endpoints` with `request`, `response`, and `error_response` shapes. Missing contract documentation → WARNING (the contract is implicit and untested)
139
+
140
+ 2. **Contract test exists?** Search the test suite for tests that validate the client against the documented response shape. Look for:
141
+ - Tests that assert specific response fields match expected types/values
142
+ - Tests that use fixture/recorded responses matching the `schema.yml` shape
143
+ - Tests that verify error handling matches the documented `error_response`
144
+ - Missing contract test for a documented API → CRITICAL
145
+
146
+ 3. **Contract test matches schema?** Compare the fixture/recorded response used in tests against the `schema.yml` contract:
147
+ - Fixture has fields not in `schema.yml` → WARNING (undocumented dependency)
148
+ - `schema.yml` has `required: true` fields not asserted in tests → WARNING (untested contract guarantee)
149
+ - Client reads fields not in `schema.yml` → CRITICAL (invisible contract dependency)
150
+
151
+ 4. **Contract drift?** If this is a change verification (not baseline), compare `data.yml` against `schema.yml`:
152
+ - Any field changes on external APIs without corresponding test updates → CRITICAL
153
+ - New endpoints without contract tests → CRITICAL
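The comparison in step 3 can be sketched as set arithmetic over field names. The layout of `schema.yml` is not shown here, so the `contract` dictionary shape below (field name mapped to a spec with a `required` key) is an assumption, and the function name and sample fields are hypothetical.

```python
def contract_findings(
    contract: dict[str, dict],
    fixture: dict,
    asserted: set[str],
    client_reads: set[str],
) -> list[tuple[str, str]]:
    """Compare a test fixture and client usage against documented contract fields."""
    documented = set(contract)
    findings = []
    # Fixture carries fields the contract does not document.
    for name in sorted(set(fixture) - documented):
        findings.append(("warning", f"fixture field `{name}` is not in schema.yml"))
    # Required contract fields that no test asserts on.
    for name in sorted(documented):
        if contract[name].get("required") and name not in asserted:
            findings.append(("warning", f"required field `{name}` is not asserted in tests"))
    # Client code depends on fields the contract does not document.
    for name in sorted(client_reads - documented):
        findings.append(("critical", f"client reads `{name}`, which is not in schema.yml"))
    return findings


# Hypothetical contract, fixture, and usage data.
contract = {"id": {"required": True}, "email": {"required": True}, "name": {"required": False}}
fixture = {"id": 1, "email": "a@example.com", "plan": "pro"}
for severity, message in contract_findings(contract, fixture, {"id"}, {"id", "email", "plan"}):
    print(f"[{severity}] {message}")
```

Collecting `asserted` and `client_reads` is the hard part in practice and usually means reading the tests and the client code. The sketch only shows how the three rules combine once those sets are known.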
154
+
155
+ **Report format:**
156
+ ```markdown
157
+ ## Contract Coverage
158
+ - [x] `stripe_api` — 3 endpoints, all with contract tests in `tests/integrations/test_stripe.py`
159
+ - [ ] **[critical]** `github_api.get_user` — no contract test found for response shape
160
+ - [ ] **[warning]** `sendgrid_api` — contract documented but `error_response` shape missing
161
+ - [ ] **[critical]** `payments_api` — client reads `transaction.metadata.source` not in schema.yml (undocumented field dependency)
162
+ ```
163
+
164
+ If no external APIs exist in `schema.yml`, skip this section.
165
+
166
+ ### 6. Dead Feature Detection
167
+ Check for features that exist in specs but may no longer be implemented:
168
+ - Feature files with no corresponding step definitions anywhere
169
+ - Step definitions that import modules/functions that no longer exist
170
+ - Step definitions with `pass` or `NotImplementedError` bodies
171
+ - Features tagged `@skip` or `@wip` that have been in that state for a long time
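The tag check in the last bullet can be sketched with a line-by-line scan of a feature file. This is a simplified illustration with a hypothetical function name: it only handles tags placed directly above a `Scenario:` or `Scenario Outline:` line, ignores tags inherited from the `Feature:` level, and does not measure how long a tag has been present (that would need version-control history).

```python
def skipped_scenarios(feature_text: str, stale_tags=("@skip", "@wip")) -> list[str]:
    """Return names of scenarios tagged with any of `stale_tags`."""
    found = []
    pending_tags: list[str] = []
    for raw in feature_text.splitlines():
        line = raw.strip()
        if line.startswith("@"):
            # Tag lines may hold several tags separated by spaces.
            pending_tags.extend(line.split())
        elif line.startswith(("Scenario:", "Scenario Outline:")):
            if any(tag in stale_tags for tag in pending_tags):
                found.append(line.split(":", 1)[1].strip())
            pending_tags = []
        elif line:
            # Any other non-blank line ends the current tag group.
            pending_tags = []
    return found


# Hypothetical feature file content.
feature = """\
Feature: Login

  @wip
  Scenario: Two-factor login
    Given a user with 2FA enabled

  Scenario: Password login
    Given a registered user
"""
print(skipped_scenarios(feature))  # ['Two-factor login']
```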
172
+
173
+ ### 7. Generate Report
174
+ Produce a structured report:
175
+
176
+ ```markdown
177
+ # Verification Report: <change-id or "baseline">
178
+
179
+ ## Summary
180
+ - Scenarios verified: X
181
+ - Decisions verified: X
182
+ - Security checks: X passed, X failed
183
+ - Issues found: X critical, X warnings, X suggestions
184
+
185
+ ## Critical Issues
186
+ - [ ] <issue description> — `file:line`
187
+
188
+ ## Security Compliance
189
+ - [x] Verified: <security pattern confirmed> — `file:line`
190
+ - [ ] **[critical]** [OWASP/CWE tag] <violation> — `file:line`
191
+ - [ ] **[warning]** [OWASP/CWE tag] <concern> — `file:line`
192
+
193
+ ## Warnings
194
+ - [ ] <issue description> — `file:line`
195
+
196
+ ## Suggestions
197
+ - [ ] <suggestion> — `file:line`
198
+
199
+ ## Verified Scenarios
200
+ - [x] "Scenario name" in `feature/file.feature` — step def in `test_file.py:42`
201
+ - [x] ...
202
+ ```
203
+
204
+ ### 8. Recommend Next Steps
205
+ Based on the report:
206
+ - **All clear** → recommend archiving the change
207
+ - **Critical issues** → must fix before archiving
208
+ - **Warnings only** → user decides whether to fix or accept
209
+ - **Dead features found** → suggest a removal change or updating the features
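The mapping from report counts to a recommendation is simple enough to write down directly. The sketch below is illustrative only. The function name and the wording of the returned strings are hypothetical, and it assumes critical issues take precedence over warnings.

```python
def recommend(critical: int, warnings: int, dead_features: int = 0) -> list[str]:
    """Map report counts to next-step recommendations (illustrative only)."""
    steps = []
    if critical:
        steps.append("Fix critical issues before archiving")
    elif warnings:
        steps.append("User decides whether to fix or accept the warnings")
    else:
        steps.append("Recommend archiving the change")
    if dead_features:
        steps.append("Suggest a removal change or update the features")
    return steps


print(recommend(critical=0, warnings=0))
print(recommend(critical=2, warnings=5, dead_features=1))
```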
210
+
211
+ ## Important
212
+ - Verify is read-only. Do NOT fix issues — only report them. The user decides what to do.
213
+ - Be specific: reference file paths and line numbers for every issue.
214
+ - A scenario without a step definition is always CRITICAL — the spec is not tested.
215
+ - A step definition with no assertions is always CRITICAL: it passes without verifying anything, so its green result is a false positive.
216
+ - Don't verify implementation details — only verify that the behavior described in the scenario is covered.
217
+ - For baseline verification, this may take a while on large codebases. Present results incrementally by capability.
218
+
219
+ ## Done
220
+ When the verification report is presented, the workflow is complete. Suggest next steps based on findings:
221
+ - **All clear** → `grimoire archive <change-id>` or `grimoire-pr`
222
+ - **Critical issues** → must fix before archiving
223
+ - **Warnings only** → user decides whether to fix or accept
@@ -0,0 +1,154 @@
1
+ # Bug Classification Taxonomy
2
+
3
+ 8-way root cause classification for bug triage. Used by bug-triage (full classification) and bug (light classification).
4
+
5
+ ## Categories
6
+
7
+ ### CODE — Application defect
8
+
9
+ The code doesn't match the spec, or the behavior is clearly wrong due to a bug in the application logic.
10
+
11
+ **Signals:**
12
+ - The bug reproduces in tests
13
+ - The code path has an obvious logic error, missing edge case, or regression
14
+ - `git log` shows a recent change that introduced the issue
15
+ - The spec is clear and the implementation diverges from it
16
+
17
+ ### INFRASTRUCTURE — Platform or deployment issue
18
+
19
+ The application code is correct, but the environment it runs in is broken or misconfigured.
20
+
21
+ **Signals:**
22
+ - Works locally or in other environments, fails in a specific one
23
+ - Related to resources (memory, CPU, disk, network timeouts)
24
+ - Deploy pipeline, container, or orchestration issue
25
+ - Database server, cache, or queue is degraded
26
+ - DNS, load balancer, or certificate problem
27
+
28
+ **Examples:** staging database overloaded, k8s pod OOM-killed, CDN serving stale assets, Redis connection pool exhausted.
29
+
30
+ ### CONFIGURATION — Environment or feature config issue
31
+
32
+ The code is correct and infrastructure is healthy, but the environment is configured wrong.
33
+
34
+ **Signals:**
35
+ - Feature flag is off when it should be on (or vice versa)
36
+ - Environment variable is missing, wrong, or pointing to the wrong resource
37
+ - Permissions or CORS settings differ between environments
38
+ - A migration ran in one environment but not another
39
+
40
+ **Examples:** `STRIPE_API_KEY` pointing to test mode in production, feature flag `enable-2fa` disabled on staging, missing database migration on QA.
41
+
42
+ ### DATA — Data integrity or content issue
43
+
44
+ The code and config are correct, but the data is bad, missing, or in an unexpected state.
45
+
46
+ **Signals:**
47
+ - Only affects specific records, accounts, or tenants
48
+ - Data doesn't match expected schema or constraints
49
+ - Related to a recent data migration, import, or manual edit
50
+ - Null/missing where a value is expected
51
+
52
+ **Examples:** user record has null email from a botched migration, product has negative price from a CSV import, orphaned foreign key from a deleted parent.
53
+
54
+ ### THIRD-PARTY — External service or dependency issue
55
+
56
+ The issue originates outside the application boundary — in a vendor API, library, or upstream service.
57
+
58
+ **Signals:**
59
+ - Third-party status page shows an incident
60
+ - API responses from the vendor changed format or started returning errors
61
+ - Library behavior changed after an update
62
+ - Issue only occurs when the external service is involved
63
+
64
+ **Examples:** Stripe webhook format changed, SendGrid rate-limiting, a library upgrade introduced a breaking change, OAuth provider returning new error codes.
65
+
66
+ ### SECURITY — Vulnerability or security defect
67
+
68
+ The issue has security implications — unauthorized access, data exposure, injection, privilege escalation, or other vulnerabilities. May overlap with CODE, CONFIGURATION, or INFRASTRUCTURE but the security dimension changes how it's handled.
69
+
70
+ Check the report's `security: true` flag; the bug-report skill auto-screens for security signals. Also evaluate security impact during investigation, even if the flag was not set.
71
+
72
+ **Signals:**
73
+ - Authentication or authorization bypass — accessing resources without proper credentials or acting as another user
74
+ - Data exposure — PII, credentials, or internal data visible to unauthorized parties (in responses, logs, error messages, URLs)
75
+ - Injection — SQL, XSS, command injection, template injection, SSRF
76
+ - Privilege escalation — performing actions above the user's role
77
+ - Credential/secret leakage — API keys, tokens, or passwords in source code, logs, client-side bundles, or error responses
78
+ - Broken access control — IDOR (insecure direct object references), missing ownership checks, horizontal privilege escalation
79
+ - Cryptographic issues — weak hashing, plaintext storage, broken TLS configuration
80
+ - Denial of service — unbounded queries, resource exhaustion, regex DoS
81
+
82
+ **Severity uses a security-specific scale:**
83
+ - **critical** — active exploitation possible, data breach risk, auth bypass on production
84
+ - **high** — exploitable vulnerability but requires specific conditions or authenticated access
85
+ - **medium** — security weakness that increases risk but isn't directly exploitable (e.g., missing rate limiting, verbose error messages leaking internals)
86
+ - **low** — defense-in-depth improvement, hardening recommendation (e.g., missing security headers, overly permissive CORS in dev)
87
+
88
+ **Examples:** user can view other users' invoices by changing the ID in the URL (IDOR), admin API endpoint has no auth check, SQL injection in search query, JWT secret is hardcoded in source, error pages expose stack traces and DB connection strings.
89
+
90
+ ### DOCUMENTATION — Correct behavior, wrong expectations
91
+
92
+ The application works as designed, but the user's expectation doesn't match reality because documentation, training, or UX is misleading.
93
+
94
+ **Signals:**
95
+ - Feature spec clearly describes the reported behavior as correct
96
+ - The reporter's expectation is reasonable but doesn't match the design
97
+ - Help text, tooltips, or docs describe different behavior than what's implemented
98
+ - Onboarding or training missed this workflow
99
+
100
+ **Examples:** user expects instant unlock but spec says 30-minute cooldown, docs say "click Save" but the button is labeled "Apply", reported "bug" is actually an undocumented limitation.
101
+
102
+ ### NOT A BUG — Cannot reproduce or invalid
103
+
104
+ After thorough investigation, the reported issue is not reproducible or the report is invalid.
105
+
106
+ **This still requires evidence.** Never dismiss with "works for me." Document:
107
+ - Exactly what you tried
108
+ - In what environment, with what data
109
+ - Why you believe the issue is not valid
110
+ - What follow-up questions might clarify
111
+
112
+ ## Triage Decision Outcomes
113
+
114
+ After classification, choose one of four outcomes:
115
+
116
+ ### VALIDATE + ROUTE
117
+
118
+ The issue is real. Classify it AND route it:
119
+
120
+ | Classification | Route to | Next action |
121
+ |---|---|---|
122
+ | **Code** | Developer (this team) | → `grimoire-bug` for repro test + fix |
123
+ | **Infrastructure** | Infra/DevOps/SRE | Create or update ticket for the infra team with evidence |
124
+ | **Configuration** | DevOps or config owner for the affected environment | Describe the specific misconfiguration and expected correct value |
125
+ | **Data** | Developer or DBA depending on scope | Describe affected records and whether a migration/script is needed |
126
+ | **Third-party** | Developer (workaround) + vendor (upstream fix) | Document the vendor issue, check for workarounds, file upstream if possible |
127
+ | **Security** | Security lead + developer (see special handling below) | Confidential fix, may trigger incident response |
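The routing table above reads naturally as a lookup. The sketch below copies the "Route to" column into a dictionary. The names `ROUTES` and `route_for` are hypothetical, and classifications outside the table (such as Documentation or Not a bug, which lead to other outcomes) are reported as an error instead of being routed.

```python
# "Route to" column from the table above, keyed by lower-cased classification.
ROUTES = {
    "code": "Developer (this team)",
    "infrastructure": "Infra/DevOps/SRE",
    "configuration": "DevOps or config owner for the affected environment",
    "data": "Developer or DBA depending on scope",
    "third-party": "Developer (workaround) + vendor (upstream fix)",
    "security": "Security lead + developer",
}


def route_for(classification: str) -> str:
    """Return the owner for a validated bug, or raise if it is not routable."""
    key = classification.strip().lower()
    if key not in ROUTES:
        raise ValueError(f"{classification!r} is not routed by VALIDATE + ROUTE")
    return ROUTES[key]


print(route_for("Infrastructure"))  # Infra/DevOps/SRE
```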
128
+
129
+ ### REJECT — Not a bug
130
+
131
+ The reported behavior is correct and the expectations are wrong, or the issue cannot be reproduced.
132
+
133
+ Rejection **requires evidence**. Provide one of:
134
+
135
+ - **By design** — cite the specific feature scenario or decision record. Quote the spec.
136
+ - **Cannot reproduce** — document exactly what you tried, in what environment, with what data.
137
+ - **Duplicate** — reference the existing bug report or fix.
138
+
139
+ ### REDIRECT — Documentation/training issue
140
+
141
+ The behavior is correct but the user's confusion is valid. The fix is better docs, UX copy, or training — not a code change.
142
+
143
+ 1. Update status to `redirected`
144
+ 2. Explain why the behavior is correct (cite specs)
145
+ 3. Recommend specific documentation or UX improvements
146
+ 4. Offer to file a separate improvement ticket for the docs/UX fix
147
+
148
+ ### NEEDS INFO — Can't decide yet
149
+
150
+ The report is incomplete or ambiguous. Generate specific follow-up questions — not "can you provide more details?" but:
151
+ - "Does this happen with all user roles or just admin?"
152
+ - "Which environment — dev, staging, or production?"
153
+ - "Can you share the exact error message or a screenshot?"
154
+ - "Is this specific to certain records/accounts, or does it affect everyone?"