mishkan-harness 0.1.0

Files changed (186)
  1. package/LICENSE +21 -0
  2. package/README.md +205 -0
  3. package/bin/mishkan.js +221 -0
  4. package/docs/design/MISHKAN_agent_aliases.md +140 -0
  5. package/docs/design/MISHKAN_decisions.md +172 -0
  6. package/docs/design/MISHKAN_harness_design.md +820 -0
  7. package/docs/design/MISHKAN_ontology.md +87 -0
  8. package/docs/design/MISHKAN_token_optimisation.md +181 -0
  9. package/docs/engineer/README.md +37 -0
  10. package/docs/engineer/profile.example.md +79 -0
  11. package/docs/usage/01-installation.md +178 -0
  12. package/docs/usage/02-project-init.md +151 -0
  13. package/docs/usage/03-orchestration.md +218 -0
  14. package/docs/usage/04-memory-layer.md +201 -0
  15. package/docs/usage/05-selective-ingest.md +177 -0
  16. package/docs/usage/06-llm-providers.md +195 -0
  17. package/docs/usage/07-troubleshooting.md +316 -0
  18. package/docs/usage/08-glossary.md +154 -0
  19. package/docs/usage/09-workflows.md +123 -0
  20. package/docs/usage/README.md +77 -0
  21. package/package.json +43 -0
  22. package/payload/install/settings.hooks.json +47 -0
  23. package/payload/mishkan/AGENT_SPEC.md +154 -0
  24. package/payload/mishkan/agents/ahikam.md +58 -0
  25. package/payload/mishkan/agents/aholiab.md +68 -0
  26. package/payload/mishkan/agents/asaph.md +73 -0
  27. package/payload/mishkan/agents/baruch.md +88 -0
  28. package/payload/mishkan/agents/benaiah.md +76 -0
  29. package/payload/mishkan/agents/bezalel.md +83 -0
  30. package/payload/mishkan/agents/caleb.md +74 -0
  31. package/payload/mishkan/agents/deborah.md +63 -0
  32. package/payload/mishkan/agents/elasah.md +58 -0
  33. package/payload/mishkan/agents/eliashib.md +68 -0
  34. package/payload/mishkan/agents/ezra.md +69 -0
  35. package/payload/mishkan/agents/hanun.md +64 -0
  36. package/payload/mishkan/agents/hiram.md +68 -0
  37. package/payload/mishkan/agents/hizkiah.md +76 -0
  38. package/payload/mishkan/agents/huldah.md +59 -0
  39. package/payload/mishkan/agents/huram.md +66 -0
  40. package/payload/mishkan/agents/hushai.md +59 -0
  41. package/payload/mishkan/agents/igal.md +58 -0
  42. package/payload/mishkan/agents/ira.md +86 -0
  43. package/payload/mishkan/agents/jahaziel.md +71 -0
  44. package/payload/mishkan/agents/jakin.md +66 -0
  45. package/payload/mishkan/agents/jehonathan.md +62 -0
  46. package/payload/mishkan/agents/jehoshaphat.md +68 -0
  47. package/payload/mishkan/agents/joab.md +71 -0
  48. package/payload/mishkan/agents/joah.md +62 -0
  49. package/payload/mishkan/agents/maaseiah.md +61 -0
  50. package/payload/mishkan/agents/meremoth.md +65 -0
  51. package/payload/mishkan/agents/meshullam.md +67 -0
  52. package/payload/mishkan/agents/nathan.md +70 -0
  53. package/payload/mishkan/agents/nehemiah.md +93 -0
  54. package/payload/mishkan/agents/obed.md +60 -0
  55. package/payload/mishkan/agents/oholiab.md +67 -0
  56. package/payload/mishkan/agents/palal.md +63 -0
  57. package/payload/mishkan/agents/phinehas.md +73 -0
  58. package/payload/mishkan/agents/rehum.md +60 -0
  59. package/payload/mishkan/agents/salma.md +69 -0
  60. package/payload/mishkan/agents/seraiah.md +73 -0
  61. package/payload/mishkan/agents/shallum.md +66 -0
  62. package/payload/mishkan/agents/shaphan.md +64 -0
  63. package/payload/mishkan/agents/shemaiah.md +67 -0
  64. package/payload/mishkan/agents/shevna.md +58 -0
  65. package/payload/mishkan/agents/uriah.md +70 -0
  66. package/payload/mishkan/agents/zaccur.md +58 -0
  67. package/payload/mishkan/agents/zadok.md +67 -0
  68. package/payload/mishkan/agents/zerubbabel.md +69 -0
  69. package/payload/mishkan/cognee/.env.curated.example +61 -0
  70. package/payload/mishkan/cognee/.env.example +165 -0
  71. package/payload/mishkan/cognee/Dockerfile +50 -0
  72. package/payload/mishkan/cognee/README.md +129 -0
  73. package/payload/mishkan/cognee/docker-compose.curated-ui.yml +61 -0
  74. package/payload/mishkan/cognee/docker-compose.curated.yml +85 -0
  75. package/payload/mishkan/cognee/docker-compose.hardening.yml +16 -0
  76. package/payload/mishkan/cognee/docker-compose.selfhosted.yml +114 -0
  77. package/payload/mishkan/cognee/docker-compose.ui.yml +70 -0
  78. package/payload/mishkan/cognee/docker-compose.yml +71 -0
  79. package/payload/mishkan/cognee/ingest-curated.py +92 -0
  80. package/payload/mishkan/commands/dep-audit.md +24 -0
  81. package/payload/mishkan/commands/mishkan-init.md +25 -0
  82. package/payload/mishkan/commands/mishkan-resume.md +21 -0
  83. package/payload/mishkan/commands/promote.md +19 -0
  84. package/payload/mishkan/commands/sefer-pull.md +19 -0
  85. package/payload/mishkan/commands/sprint-close.md +21 -0
  86. package/payload/mishkan/config/curated-library.yaml +113 -0
  87. package/payload/mishkan/config/improvement-queries.md +29 -0
  88. package/payload/mishkan/config/model-routing.yaml +87 -0
  89. package/payload/mishkan/config/projects.yaml +38 -0
  90. package/payload/mishkan/evals/baruch/README.md +93 -0
  91. package/payload/mishkan/evals/baruch/fixtures/invalid/bad-outcome-enum.json +15 -0
  92. package/payload/mishkan/evals/baruch/fixtures/invalid/bad-sprint-pattern.json +15 -0
  93. package/payload/mishkan/evals/baruch/fixtures/invalid/bad-trigger-enum.json +15 -0
  94. package/payload/mishkan/evals/baruch/fixtures/invalid/malformed-json.json +7 -0
  95. package/payload/mishkan/evals/baruch/fixtures/invalid/missing-required-field.json +14 -0
  96. package/payload/mishkan/evals/baruch/fixtures/valid/blocked-vendor.json +15 -0
  97. package/payload/mishkan/evals/baruch/fixtures/valid/curated-shortcircuit.json +15 -0
  98. package/payload/mishkan/evals/baruch/fixtures/valid/partial-no-write.json +14 -0
  99. package/payload/mishkan/evals/baruch/fixtures/valid/resolved-cross-harness.json +15 -0
  100. package/payload/mishkan/evals/baruch/golden_case/expected.yaml +35 -0
  101. package/payload/mishkan/evals/baruch/golden_case/input.yaml +47 -0
  102. package/payload/mishkan/evals/baruch/golden_case/produced.json +15 -0
  103. package/payload/mishkan/evals/baruch/run.sh +129 -0
  104. package/payload/mishkan/hooks/model-route.py +96 -0
  105. package/payload/mishkan/hooks/post-tool-observe.sh +45 -0
  106. package/payload/mishkan/hooks/pre-tool-security.sh +150 -0
  107. package/payload/mishkan/hooks/session-start.sh +20 -0
  108. package/payload/mishkan/hooks/stop-reporter.sh +29 -0
  109. package/payload/mishkan/ontology.md +87 -0
  110. package/payload/mishkan/rules/backend/yasad.md +23 -0
  111. package/payload/mishkan/rules/common/dependencies.md +53 -0
  112. package/payload/mishkan/rules/common/quality.md +16 -0
  113. package/payload/mishkan/rules/common/security.md +20 -0
  114. package/payload/mishkan/rules/documentation/sefer.md +19 -0
  115. package/payload/mishkan/rules/frontend/panim.md +21 -0
  116. package/payload/mishkan/rules/infrastructure/migdal.md +22 -0
  117. package/payload/mishkan/scripts/dependency-audit.sh +171 -0
  118. package/payload/mishkan/scripts/ensure-curated-box.sh +66 -0
  119. package/payload/mishkan/scripts/mishkan-ingest.sh +92 -0
  120. package/payload/mishkan/scripts/observability-aggregate.sh +57 -0
  121. package/payload/mishkan/scripts/seed-curated-library.sh +62 -0
  122. package/payload/mishkan/scripts/sync-profile.sh +65 -0
  123. package/payload/mishkan/scripts/validate-research-log.sh +108 -0
  124. package/payload/mishkan/skills/asaph-a11y-seo-craft/SKILL.md +289 -0
  125. package/payload/mishkan/skills/baruch-research-reporting-craft/SKILL.md +460 -0
  126. package/payload/mishkan/skills/benaiah-devsecops-craft/SKILL.md +329 -0
  127. package/payload/mishkan/skills/bezalel-cto-craft/SKILL.md +391 -0
  128. package/payload/mishkan/skills/caleb-web-research-craft/SKILL.md +306 -0
  129. package/payload/mishkan/skills/cognee-promote/SKILL.md +40 -0
  130. package/payload/mishkan/skills/cognee-quickstart/SKILL.md +66 -0
  131. package/payload/mishkan/skills/context-compress/SKILL.md +36 -0
  132. package/payload/mishkan/skills/deborah-ux-craft/SKILL.md +295 -0
  133. package/payload/mishkan/skills/dependency-audit/SKILL.md +59 -0
  134. package/payload/mishkan/skills/dependency-vetting/SKILL.md +59 -0
  135. package/payload/mishkan/skills/documentation-craft/SKILL.md +468 -0
  136. package/payload/mishkan/skills/ezra-research-formulation-craft/SKILL.md +319 -0
  137. package/payload/mishkan/skills/hanun-observability-craft/SKILL.md +312 -0
  138. package/payload/mishkan/skills/hiram-ui-craft/SKILL.md +334 -0
  139. package/payload/mishkan/skills/hizkiah-implementation-craft/SKILL.md +701 -0
  140. package/payload/mishkan/skills/hushai-security-advisor-craft/SKILL.md +282 -0
  141. package/payload/mishkan/skills/ira-code-security-craft/SKILL.md +553 -0
  142. package/payload/mishkan/skills/jakin-intent-clarification-craft/SKILL.md +299 -0
  143. package/payload/mishkan/skills/jehonathan-publication-craft/SKILL.md +262 -0
  144. package/payload/mishkan/skills/joab-app-security-craft/SKILL.md +266 -0
  145. package/payload/mishkan/skills/meremoth-devops-craft/SKILL.md +298 -0
  146. package/payload/mishkan/skills/meshullam-infra-design-craft/SKILL.md +302 -0
  147. package/payload/mishkan/skills/mishkan-ingest/SKILL.md +65 -0
  148. package/payload/mishkan/skills/mishkan-init/SKILL.md +65 -0
  149. package/payload/mishkan/skills/nathan-architecture-craft/SKILL.md +547 -0
  150. package/payload/mishkan/skills/nehemiah-pm-craft/SKILL.md +484 -0
  151. package/payload/mishkan/skills/obed-asset-pipeline-craft/SKILL.md +286 -0
  152. package/payload/mishkan/skills/oholiab-design-system-craft/SKILL.md +334 -0
  153. package/payload/mishkan/skills/palal-systems-craft/SKILL.md +281 -0
  154. package/payload/mishkan/skills/qa-evaluation-craft/SKILL.md +406 -0
  155. package/payload/mishkan/skills/rehum-sre-advisor-craft/SKILL.md +228 -0
  156. package/payload/mishkan/skills/reporter-discipline-craft/SKILL.md +351 -0
  157. package/payload/mishkan/skills/research-pipeline/SKILL.md +55 -0
  158. package/payload/mishkan/skills/salma-frontend-implementation-craft/SKILL.md +369 -0
  159. package/payload/mishkan/skills/sefer-pull/SKILL.md +37 -0
  160. package/payload/mishkan/skills/shallum-database-craft/SKILL.md +347 -0
  161. package/payload/mishkan/skills/shaphan-summarisation-craft/SKILL.md +271 -0
  162. package/payload/mishkan/skills/shemaiah-evaluation-craft/SKILL.md +342 -0
  163. package/payload/mishkan/skills/sprint-report/SKILL.md +28 -0
  164. package/payload/mishkan/skills/team-lead-craft/SKILL.md +457 -0
  165. package/payload/mishkan/skills/zadok-contract-craft/SKILL.md +520 -0
  166. package/payload/mishkan/templates/case-node.schema.json +22 -0
  167. package/payload/mishkan/templates/mcp.json +22 -0
  168. package/payload/mishkan/templates/observability-log.schema.json +24 -0
  169. package/payload/mishkan/templates/project-CLAUDE.md +47 -0
  170. package/payload/mishkan/templates/research-log.schema.json +40 -0
  171. package/payload/mishkan/templates/settings.json +12 -0
  172. package/payload/mishkan/templates/settings.local.json +6 -0
  173. package/payload/mishkan/templates/sprint-state.schema.json +47 -0
  174. package/payload/mishkan/templates/team-report.schema.json +50 -0
  175. package/payload/mishkan/templates/user-CLAUDE.md +62 -0
  176. package/payload/mishkan/workflows/README.md +88 -0
  177. package/payload/mishkan/workflows/mishkan-architecture-panel.js +156 -0
  178. package/payload/mishkan/workflows/mishkan-codebase-audit.js +188 -0
  179. package/payload/mishkan/workflows/mishkan-deep-research.js +251 -0
  180. package/payload/mishkan/workflows/mishkan-init.js +156 -0
  181. package/payload/mishkan/workflows/mishkan-migration-wave.js +180 -0
  182. package/payload/mishkan/workflows/mishkan-release-readiness.js +163 -0
  183. package/payload/mishkan/workflows/mishkan-sprint-close.js +112 -0
  184. package/payload/user/CLAUDE.md +62 -0
  185. package/payload/user/rules/engineer-standards.md +66 -0
  186. package/payload/user/rules/y4nn-standards.md +167 -0
@@ -0,0 +1,319 @@
1
+ ---
2
+ name: ezra-research-formulation-craft
3
+ description: How Ezra turns clarified intent into a research brief — the curated-library-first rule, the sub-question decomposition, source prioritisation, acceptance criteria for "good answer," and the short-circuit when the curated library already holds the answer. Invoke as the second stage of the research pipeline after Jakin clarifies.
4
+ ---
5
+
6
+ # Ezra — Research Formulation Craft
7
+
8
+ > Not a checklist. How the ready scribe skilled in the law reasons when
9
+ > handed a clarified intent — what he checks first, what he asks of the
10
+ > web research, and the rule that the curated library is read before
11
+ > the open web is touched.
12
+
13
+ The second stage of the research pipeline. Takes Jakin's output;
14
+ produces a structured research brief; flags `curated_library_match: true`
15
+ when the curated library already answers the question (short-circuits
16
+ the web pipeline).
17
+
18
+ ---
19
+
20
+ ## 1. The rule above all other rules
21
+
22
+ **Read what you already have before going outside.**
23
+
24
+ The curated library is the project's vetted knowledge — entries that
25
+ survived prior research and were promoted. Going to the open web when
26
+ the answer already sits in the library is **waste** (Caleb's web
27
+ budget) and **risk** (a fresh web answer may contradict the curated
28
+ one without the contradiction being detected).
29
+
30
+ Three corollaries:
31
+
32
+ - **Curated library first, always.** The first action of every Ezra
33
+ run is to search the curated library (`mcp__cognee-curated__search`)
34
+ and the project's work cognee (`mcp__cognee__search`).
35
+ - **A match short-circuits the pipeline.** If the curated library
36
+ holds the answer, `curated_library_match: true` and the brief
37
+ carries the curated content directly. Caleb does not run; the web
38
+ budget is spared.
39
+ - **No silent re-research.** If the curated library has a *partial*
40
+ answer, the brief calls out the curated portion and targets web
41
+ research only at the gap.
42
+
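The three corollaries amount to a small routing decision. A minimal sketch, assuming a hypothetical `search_curated` callable that stands in for the `mcp__cognee-curated__search` tool and reports whether the library holds a full, partial, or no answer (the `Match`, `LibraryHit`, and `Route` names are illustrative, not part of the harness):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Match(Enum):
    FULL = "full"
    PARTIAL = "partial"
    NONE = "none"


@dataclass
class LibraryHit:
    match: Match
    extract: Optional[str] = None  # verbatim curated text, if any


@dataclass
class Route:
    curated_library_match: bool
    run_web_research: bool
    web_scope: str  # "none", "gap-only", or "full"
    curated_portion: Optional[str]


def route(intent: str, search_curated: Callable[[str], LibraryHit]) -> Route:
    """Search the curated library first, then decide how much web research is left."""
    hit = search_curated(intent)  # always the first action of a run
    if hit.match is Match.FULL:
        # Short-circuit: the web stage does not run.
        return Route(True, False, "none", hit.extract)
    if hit.match is Match.PARTIAL:
        # The flag stays false; web research targets only the gap.
        return Route(False, True, "gap-only", hit.extract)
    return Route(False, True, "full", None)
```

The sketch models only the curated search; the project's work cognee search would sit alongside it in the same first step.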
43
+ ---
44
+
45
+ ## 2. The sub-question decomposition
46
+
47
+ A research brief breaks the intent into the smallest set of
48
+ falsifiable sub-questions whose union answers the intent.
49
+
50
+ Three rules:
51
+
52
+ - **Falsifiable per sub-question.** Each sub-question has an answer
53
+ shape; a sub-question with no recognisable answer shape is too
54
+ vague.
55
+ - **Union is sufficient.** Answering all sub-questions yields the
56
+ intent's answer. A sub-question that does not contribute to the
57
+ intent does not belong.
58
+ - **Three to seven sub-questions is the sweet spot.** Below three
59
+ the brief is doing too little; above seven the intent was probably
60
+ not singular and should have been split at Jakin's stage.
61
+
62
+ ---
63
+
64
+ ## 3. Source prioritisation — curated, then specific, then general
65
+
66
+ A brief lists sources to consult, in priority order:
67
+
68
+ 1. **Curated library entries** matching the topic, even partially.
69
+ The first place to read.
70
+ 2. **Project-curated team resources** if they exist
71
+ (`payload/.../config/curated-resources.json` or similar).
72
+ 3. **Official primary sources** — the framework's docs, the
73
+ protocol's RFC, the library's source code or release notes.
74
+ 4. **High-confidence secondary sources** — author's blog if they
75
+ are the framework's maintainer, official blog posts, the issue
76
+ tracker.
77
+ 5. **General web search** — only when the prior layers are
78
+ insufficient.
79
+
80
+ Three rules:
81
+
82
+ - **Prioritise primary over secondary.** A blog summarising the docs
83
+ is lower-confidence than the docs.
84
+ - **Name sources by URL where known.** The brief is more useful when
85
+ it lists "consult https://example.com/docs/foo" than "consult the
86
+ foo docs."
87
+ - **Bound the source list.** Five to ten sources is the right
88
+ density for a typical brief. More dilutes Caleb's focus.
89
+
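One way to picture the priority order and the bound on list size is a tiered sort. This is an illustrative sketch only: the `Tier` and `Source` types and the default cap of ten are assumptions drawn from the rules above, not an interface the harness defines.

```python
from enum import IntEnum
from typing import List, NamedTuple


class Tier(IntEnum):
    CURATED = 1
    TEAM_RESOURCE = 2
    PRIMARY = 3
    SECONDARY = 4
    GENERAL_WEB = 5


class Source(NamedTuple):
    ref: str  # URL or curated entry id
    tier: Tier


def prioritise(sources: List[Source], limit: int = 10) -> List[Source]:
    """Order by tier (stable within a tier) and cap the list length."""
    return sorted(sources, key=lambda s: s.tier)[:limit]
```

Because `sorted` is stable, sources within the same tier keep the order in which the brief's author listed them.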
90
+ ---
91
+
92
+ ## 4. Acceptance criteria — what a complete answer must contain
93
+
94
+ A brief states what the asker will recognise as a *complete* answer.
95
+ Three rules:
96
+
97
+ - **Acceptance is structural.** "A confidence-rated finding per
98
+ sub-question, with at least one primary source per finding." Not
99
+ "a thorough answer."
100
+ - **Acceptance includes coverage.** "All N sub-questions answered
101
+ or explicitly marked `unverified`." This is the contract Caleb
102
+ carries; without it, partial coverage looks like a full answer.
103
+ - **Acceptance is achievable.** If the acceptance criteria require
104
+ data that does not exist in any public source (proprietary vendor
105
+ behaviour, future versions), the brief flags this and returns
106
+ earlier — do not push Caleb on an impossible target.
107
+
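The coverage rule is mechanical enough to check. A hedged sketch, assuming findings come back keyed by sub-question with a `status` of `answered` or `unverified` (the `Finding` type and the function name are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Finding:
    status: str  # "answered" or "unverified"
    primary_sources: List[str] = field(default_factory=list)


def coverage_gaps(sub_questions: List[str], findings: Dict[str, Finding]) -> List[str]:
    """List every sub-question that breaks the coverage contract."""
    gaps = []
    for question in sub_questions:
        finding = findings.get(question)
        if finding is None:
            gaps.append(f"{question}: no finding (answer it or mark it unverified)")
        elif finding.status not in ("answered", "unverified"):
            gaps.append(f"{question}: unknown status {finding.status!r}")
        elif finding.status == "answered" and not finding.primary_sources:
            gaps.append(f"{question}: answered without a primary source")
    return gaps
```

An empty result means partial coverage cannot pass as a full answer: every sub-question is either sourced or explicitly `unverified`.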
108
+ ---
109
+
110
+ ## 5. The output shape
111
+
112
+ ```yaml
113
+ research_brief:
114
+ sub_questions:
115
+ - <falsifiable question 1>
116
+ - <falsifiable question 2>
117
+ - ...
118
+ priority_sources:
119
+ - <url or curated entry id>
120
+ - ...
121
+ acceptance_criteria: <what a complete answer must contain>
122
+ curated_library_match: true | false
123
+ curated_library_extract: <verbatim curated content if match=true, else null>
124
+ ```
125
+
126
+ Three rules:
127
+
128
+ - **`curated_library_extract` is verbatim.** When the library matches,
129
+ the extract is what the curated entry says — not Ezra's rephrasing.
130
+ - **No prose around the output.** The shape is the contract Caleb (or
131
+ Baruch, on a short-circuit) reads.
132
+ - **A short-circuit produces a full brief anyway.** The sub-questions
133
+ and priority sources are still listed — they document what would
134
+ have been searched if the library had not matched. This is the
135
+ audit trail.
136
+
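The shape and its rules can be checked before the brief is handed on. A minimal sketch, assuming the YAML has already been parsed into the fields shown above; the `validate` helper is illustrative, and it treats the three-to-seven guideline from §2 as a reportable problem rather than a hard failure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResearchBrief:
    sub_questions: List[str]
    priority_sources: List[str]
    acceptance_criteria: str
    curated_library_match: bool
    curated_library_extract: Optional[str]


def validate(brief: ResearchBrief) -> List[str]:
    """Return rule violations; an empty list means the brief passes."""
    problems = []
    if not 3 <= len(brief.sub_questions) <= 7:
        problems.append("sub_questions: expected three to seven")
    if not brief.priority_sources:
        problems.append("priority_sources: listed even on a short-circuit")
    if not brief.acceptance_criteria.strip():
        problems.append("acceptance_criteria: must not be empty")
    if brief.curated_library_match and not brief.curated_library_extract:
        problems.append("curated_library_extract: required when match is true")
    if not brief.curated_library_match and brief.curated_library_extract is not None:
        problems.append("curated_library_extract: must be null when match is false")
    return problems
```

The last two checks encode the pairing between the flag and the extract: a verbatim extract exists exactly when the library matched.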
137
+ ---
138
+
139
+ ## 6. Worked example A — a curated-library short-circuit
140
+
141
+ Jakin's clarified intent (from `jakin-intent-clarification-craft` §7):
142
+ *"Whether TanStack Query v5's `useQuery` still exposes the `onSuccess`
143
+ and `onError` callback options."*
144
+
145
+ Ezra's path:
146
+
147
+ **Curated library search.** `mcp__cognee-curated__search` with
148
+ "TanStack Query v5 onSuccess onError" → match. Curated entry
149
+ `curated:tanstack-v5-callbacks-removed`:
150
+
151
+ > TanStack Query v5 (released Oct 2023) removed `onSuccess`,
152
+ > `onError`, and `onSettled` from `useQuery`. Migration path: handle
153
+ > side effects in the component via `useEffect` keyed on `data` or
154
+ > `error`, or use a mutation observer pattern. Source: TanStack v5
155
+ > migration guide (https://tanstack.com/query/v5/docs/framework/react/guides/migrating-to-v5).
156
+
157
+ **Output (short-circuit):**
158
+
159
+ ```yaml
160
+ research_brief:
161
+ sub_questions:
162
+ - "Does useQuery still expose onSuccess in v5?"
163
+ - "Does useQuery still expose onError in v5?"
164
+ - "If removed, what is the documented migration path?"
165
+ priority_sources:
166
+ - "curated:tanstack-v5-callbacks-removed (matched)"
167
+ - "https://tanstack.com/query/v5/docs/framework/react/guides/migrating-to-v5"
168
+ acceptance_criteria: >
169
+ A direct answer to each sub-question with the official migration
170
+ guide cited.
171
+ curated_library_match: true
172
+ curated_library_extract: |
173
+ TanStack Query v5 (released Oct 2023) removed `onSuccess`, `onError`,
174
+ and `onSettled` from `useQuery`. Migration path: handle side effects
175
+ in the component via `useEffect` keyed on `data` or `error`, or use
176
+ a mutation observer pattern. Source: TanStack v5 migration guide
177
+ (https://tanstack.com/query/v5/docs/framework/react/guides/migrating-to-v5).
178
+ ```
179
+
180
+ Caleb does not run. The pipeline goes straight to Shemaiah for
181
+ evaluation against the curated extract.
182
+
183
+ What Ezra did:
184
+
185
+ - Searched the curated library before anything else.
186
+ - Extracted the curated entry verbatim.
187
+ - Wrote the brief anyway (audit trail) even though the library
188
+ matched.
189
+ - Set the flag honestly.
190
+
191
+ What Ezra did NOT do:
192
+
193
+ - Skip the curated extract and dispatch Caleb to "verify."
194
+ - Edit the curated extract to "improve" its wording.
195
+ - Mark `curated_library_match: true` when the match was partial.
196
+
197
+ ---
198
+
199
+ ## 7. Worked example B — a full brief, no short-circuit
200
+
201
+ Jakin's clarified intent: *"How does asyncpg handle connection loss
202
+ mid-transaction; recovery semantics, observable error class, and
203
+ whether the transaction is automatically retried."*
204
+
205
+ Ezra's path:
206
+
207
+ **Curated library search.** No match.
208
+
209
+ **Decompose into sub-questions:**
210
+
211
+ 1. What exception class does asyncpg raise when the connection is
212
+ lost mid-transaction?
213
+ 2. Is the transaction considered rolled back, indeterminate, or
214
+ committed from the application's view?
215
+ 3. Does asyncpg automatically retry the transaction?
216
+ 4. What does the pool do — is the broken connection evicted? Is
217
+ acquisition transparent on the next call?
218
+ 5. What is the documented application-level recovery pattern?
219
+
220
+ **Priority sources:**
221
+
222
+ - `https://magicstack.github.io/asyncpg/current/` (primary docs).
223
+ - `https://github.com/MagicStack/asyncpg` (source + issue tracker).
224
+ - `https://magicstack.github.io/asyncpg/current/api/index.html#asyncpg.exceptions.InterfaceError`
225
+ (specific exception page).
226
+ - `https://www.postgresql.org/docs/current/protocol-error-fields.html`
227
+ (Postgres-side reference, since asyncpg may surface the wire-level
228
+ error).
229
+
230
+ **Acceptance criteria:** all five sub-questions answered with at
231
+ least one primary source per finding; if any answer cannot be
232
+ sourced primary, mark `unverified` and cite the secondary source.
233
+
234
+ **Output:**
235
+
236
+ ```yaml
237
+ research_brief:
238
+ sub_questions:
239
+ - "What exception class does asyncpg raise when the connection is lost mid-transaction?"
240
+ - "Is the transaction considered rolled back, indeterminate, or committed?"
241
+ - "Does asyncpg automatically retry the transaction?"
242
+ - "What does the pool do with the broken connection?"
243
+ - "What is the documented application-level recovery pattern?"
244
+ priority_sources:
245
+ - "https://magicstack.github.io/asyncpg/current/"
246
+ - "https://github.com/MagicStack/asyncpg"
247
+ - "https://magicstack.github.io/asyncpg/current/api/index.html#asyncpg.exceptions.InterfaceError"
248
+ - "https://www.postgresql.org/docs/current/protocol-error-fields.html"
249
+ acceptance_criteria: >
250
+ All five sub-questions answered. Each finding cites at least one
251
+ primary source (asyncpg docs/source or Postgres docs); any finding
252
+ without a primary source is marked unverified and a secondary
253
+ source is named.
254
+ curated_library_match: false
255
+ curated_library_extract: null
256
+ ```
257
+
258
+ What Ezra did:
259
+
260
+ - Decomposed into falsifiable sub-questions.
261
+ - Listed primary sources only.
262
+ - Wrote concrete acceptance criteria.
263
+
264
+ What Ezra did NOT do:
265
+
266
+ - Pre-fill the answers ("I think the exception is …").
267
+ - Pad with tangential sources.
268
+ - Soften the acceptance criteria into "find a reasonable answer."
269
+
270
+ ---
271
+
272
+ ## 8. The recurring traps Ezra rejects on sight
273
+
274
+ 1. **"I'll skip the curated library; my memory says nothing matches."**
275
+ No. Search the library every time. Memory is a heuristic; the
276
+ search is the truth.
277
+
278
+ 2. **"The curated match is close but not exact; I'll dispatch Caleb
279
+ anyway."** Carefully. A close match deserves a brief targeted at
280
+ the *gap*, not a full web run that ignores the curated content.
281
+
282
+ 3. **"I'll write twelve sub-questions to be thorough."** §2. Three to
283
+ seven. Twelve usually means the intent wasn't singular.
284
+
285
+ 4. **"I'll list general sources like StackOverflow and Medium."**
286
+ §3. Primary over secondary; secondary over general. Aggregator
287
+ sites at the bottom of priority, often not listed.
288
+
289
+ 5. **"Acceptance criteria: a thorough answer."** §4. Structural
290
+ acceptance, not vibes-acceptance.
291
+
292
+ 6. **"The curated library matched; I'll skip writing the brief."**
293
+ §5. The brief still gets written for the audit trail. Skipping
294
+ it loses the record of what was looked for.
295
+
296
+ ---
297
+
298
+ ## 9. Style — Ezra's voice
299
+
300
+ - **Precise, structured, library-first.** A scribe skilled in the
301
+ law reads the existing text before writing new commentary.
302
+ - **Names sources by URL where known.** Ambiguous source names
303
+ ("the docs") fail Caleb downstream.
304
+ - **Falsifiable everywhere.** Sub-questions, acceptance criteria —
305
+ every clause has a recognisable answer shape.
306
+ - **Honest about the library match.** No exaggeration of partial
307
+ matches; no minimisation of full matches.
308
+
309
+ ---
310
+
311
+ *Cross-references: `~/.claude/rules/y4nn-standards.md`
312
+ (no-fabrication §6, sequence §1),
313
+ `payload/mishkan/skills/research-pipeline/SKILL.md` (the pipeline
314
+ this stage formulates for), `payload/mishkan/skills/jakin-intent-
315
+ clarification-craft/SKILL.md` (the prior stage),
316
+ `payload/mishkan/skills/caleb-web-research-craft/SKILL.md` (the next
317
+ stage when no short-circuit), `payload/mishkan/skills/shemaiah-
318
+ evaluation-craft/SKILL.md` (the consumer when the curated library
319
+ short-circuit fires).*
@@ -0,0 +1,312 @@
1
+ ---
2
+ name: hanun-observability-craft
3
+ description: How Hanun wires hardening overlays, secrets ops, and observability (Prometheus / Grafana / Loki / Sentry / GlitchTip / OpenTelemetry) — the always-reapply-on-recreate rule, the metric / log / trace separation, the alerting discipline, and the no-prod-execution boundary. Invoke when observability wiring or hardening setup is in scope.
4
+ ---
5
+
6
+ # Hanun — Observability & DevSecOps Support Craft
7
+
8
+ > Not a checklist. How the one who repaired the Valley Gate, covering
9
+ > a long section of the wall in support mode, reasons when handed
10
+ > operational glue — what he wires, what he refuses to leave one-off,
11
+ > and the rule that the hardening overlay returns every time the
12
+ > container does.
13
+
14
+ Invoked when observability, hardening, secrets operations, or
15
+ operational support work is in scope.
16
+
17
+ ---
18
+
19
+ ## 1. The rule above all other rules
20
+
21
+ **The hardening overlay is re-applied on every container recreate.**
22
+
23
+ Three corollaries:
24
+
25
+ - **No one-time hardening.** A container that loses its overlay
26
+ because the recreate skipped the step is unhardened in production.
27
+ The overlay is part of the create.
28
+ - **No prod execution.** Hanun prepares; Y4NN runs.
29
+ - **Observability instrumentation is in the application's image,
30
+ not appended at runtime.** A side-loaded agent is a future
31
+ divergence.
32
+
33
+ ---
34
+
35
+ ## 2. The three observability signals
36
+
37
+ | Signal | Question | Tool |
38
+ |---|---|---|
39
+ | **Metric** | What is the rate / count / latency of X? | Prometheus + Grafana |
40
+ | **Log** | What happened in this single event? | Loki / Elasticsearch + log shipper |
41
+ | **Trace** | Where in the request path was time spent? | Tempo / Jaeger + OpenTelemetry |
42
+
43
+ Three rules:
44
+
45
+ - **Each signal has its own pipeline.** Metrics are sampled and
46
+ aggregated; logs are full-text and high-volume; traces are
47
+ sampled and structured.
48
+ - **Correlation across signals via trace_id.** Every log line in
49
+ a request carries the trace_id; clicking from a metric spike
50
+ to a trace, then from the trace to the logs, is the workflow.
51
+ - **Sampling is deliberate.** 100% traces is a budget problem;
52
+ random 1% misses the long tail. Tail-based sampling for slow
53
+ requests; head-based for the steady state.
54
+
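The difference between the two sampling modes is when the decision is made. A rough sketch under stated assumptions: the 1% rate and the one-second slow threshold are placeholder values, and a real deployment would normally configure this in the collector or SDK rather than hand-roll it.

```python
import hashlib


def head_sample(trace_id: str, rate: float) -> bool:
    """Head-based: decide at request start from the trace ID alone.

    Hashing the ID gives every service the same answer for the same trace.
    """
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < rate


def tail_keep(duration_ms: float, slow_ms: float = 1000.0) -> bool:
    """Tail-based: decide after the request finished, once its latency is known."""
    return duration_ms >= slow_ms


def keep_trace(trace_id: str, duration_ms: float, rate: float = 0.01) -> bool:
    """Keep every slow trace, plus a steady-state sample of the rest."""
    return tail_keep(duration_ms) or head_sample(trace_id, rate)
```

The tail rule is what catches the long tail that a purely random 1% would miss.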
55
+ ---
56
+
57
+ ## 3. Prometheus — the metric layer
58
+
59
+ Three rules:
60
+
61
+ - **Metric names follow `domain_subsystem_unit`**:
62
+ `http_requests_total`, `db_query_duration_seconds`.
63
+ - **Labels are bounded cardinality.** A label that takes one
64
+ value per user-id is a fast path to OOM.
65
+ - **Histograms over summaries** for latency. Histograms allow
66
+ cross-instance aggregation; summaries do not.
67
+
68
+ The four golden signals (Google SRE Book):
69
+
70
+ - **Latency** — p50 / p95 / p99 per route.
71
+ - **Traffic** — requests per second per route.
72
+ - **Errors** — error rate per route.
73
+ - **Saturation** — resource utilisation (CPU, memory, pool
74
+ saturation).
75
+
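The reason histograms aggregate across instances is that bucket counts are plain sums. The sketch below merges cumulative buckets from several instances and estimates a quantile by linear interpolation, similar in spirit to PromQL's `histogram_quantile`; it is an illustration of the arithmetic, not a reimplementation of Prometheus.

```python
from typing import Dict, List

# Cumulative counts keyed by bucket upper bound ("le"), as Prometheus exposes them.
Buckets = Dict[float, int]


def merge(instances: List[Buckets]) -> Buckets:
    """Sum cumulative counts per bucket across instances."""
    merged: Buckets = {}
    for buckets in instances:
        for le, count in buckets.items():
            merged[le] = merged.get(le, 0) + count
    return merged


def quantile(q: float, buckets: Buckets) -> float:
    """Estimate a quantile by interpolating inside the bucket that holds it."""
    bounds = sorted(buckets)
    rank = q * buckets[bounds[-1]]
    lower_bound, lower_count = 0.0, 0
    for le in bounds:
        count = buckets[le]
        if count >= rank:
            if le == float("inf"):
                return lower_bound  # cannot interpolate into the +Inf bucket
            width = count - lower_count
            if width == 0:
                return le
            return lower_bound + (le - lower_bound) * (rank - lower_count) / width
        lower_bound, lower_count = le, count
    return lower_bound
```

A summary exposes precomputed quantiles per instance, and quantiles cannot be summed or averaged into a correct fleet-wide value, which is why the rule prefers histograms.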
76
+ ---
77
+
78
+ ## 4. Grafana — the dashboard layer
79
+
80
+ Three rules:
81
+
82
+ - **Dashboards are versioned in code.** Grafana provisioning
83
+ loads them from JSON in version control.
84
+ - **The dashboard answers a question.** Random-panel dashboards
85
+ are noise; "is the API healthy?" "is the queue backlogged?"
86
+ are dashboards.
87
+ - **The dashboard links to the runbook.** When a dashboard
88
+ shows an unhealthy state, the operator should be one click
89
+ from the runbook.
90
+
91
+ ---
92
+
93
+ ## 5. Loki — the log layer
94
+
95
+ Three rules:
96
+
97
+ - **Structured logs only.** JSON; key-value; not unstructured
98
+ printf.
99
+ - **`trace_id` in every log line** during request handling.
100
+ - **Labels minimal in Loki.** Loki uses labels for partitioning;
101
+ high-cardinality labels (request_id as label) break the index.
102
+
103
+ Sample log shape:
104
+
105
+ ```json
106
+ {
107
+ "ts": "2026-06-02T14:00:00Z",
108
+ "level": "info",
109
+ "trace_id": "01HX...",
110
+ "request_id": "req_01HX...",
111
+ "service": "api",
112
+ "route": "POST /invoices",
113
+ "status": 201,
114
+ "duration_ms": 142
115
+ }
116
+ ```
117
+
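A log line of this shape can be produced with the standard library alone. This is a minimal sketch, assuming context is passed through `extra=`; the `CONTEXT_FIELDS` list and the `event` key are assumptions added for illustration.

```python
import json
import logging
from datetime import datetime, timezone

# Fields copied from `extra=` into the JSON line (illustrative list).
CONTEXT_FIELDS = ("trace_id", "request_id", "service", "route", "status", "duration_ms")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        ts = datetime.fromtimestamp(record.created, timezone.utc)
        entry = {
            "ts": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname.lower(),
            "event": record.getMessage(),
        }
        for name in CONTEXT_FIELDS:
            if hasattr(record, name):
                entry[name] = getattr(record, name)
        return json.dumps(entry)
```

Attach it with `handler.setFormatter(JsonFormatter())`. `request_id` stays a field inside the JSON body here; promoting it to a Loki label is what the third rule warns against.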
118
+ ---
119
+
120
+ ## 6. OpenTelemetry — the tracing layer
121
+
122
+ Three rules:
123
+
124
+ - **Auto-instrument what is auto-instrumentable.** FastAPI, asyncpg,
125
+ and common HTTP clients have OTel auto-instrumentation.
126
+ - **Manual spans at the seams.** Service-layer methods get manual
127
+ spans named for the operation; not every function.
128
+ - **Propagate context.** W3C Trace Context (`traceparent`) on every
129
+ outbound call.
130
+
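To make the propagation rule concrete, here is what a `traceparent` header carries: a version, a 32-hex-digit trace ID, a 16-hex-digit parent span ID, and a flags byte. In practice the OpenTelemetry SDK's propagators handle this; the helper names below are hypothetical and the sketch only handles version `00`.

```python
import re
import secrets
from typing import Dict, Optional, Tuple

_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(header: str) -> Optional[Tuple[str, str, bool]]:
    """Return (trace_id, parent_span_id, sampled), or None if the header is invalid."""
    m = _TRACEPARENT.match(header.strip())
    if m is None:
        return None
    trace_id, span_id, flags = m.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return trace_id, span_id, bool(int(flags, 16) & 0x01)


def outbound_headers(incoming: Optional[str]) -> Dict[str, str]:
    """Continue the incoming trace if it is valid, otherwise start a new one."""
    parsed = parse_traceparent(incoming) if incoming else None
    trace_id = parsed[0] if parsed else secrets.token_hex(16)
    sampled = parsed[2] if parsed else True
    # Each outbound call gets a fresh span ID under the same trace ID.
    return {"traceparent": make_traceparent(trace_id, secrets.token_hex(8), sampled)}
```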
131
+ ---
132
+
133
+ ## 7. Sentry / GlitchTip — error tracking
134
+
135
+ For application-level errors (uncaught exceptions, error rates
136
+ above threshold):
137
+
138
+ Three rules:
139
+
140
+ - **Errors carry the request context.** trace_id, user (id only,
141
+ not PII), request path, version.
142
+ - **No PII in error payloads.** Strip emails, names, tokens
143
+ before sending.
144
+ - **Sampling for noise, not for signal.** Common errors sampled;
145
+ novel errors always captured.
146
+
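Stripping PII before an event leaves the process could look like the following. It is a sketch, not a complete scrubber: the `SENSITIVE_KEYS` set and the email pattern are assumptions, and a real payload would need rules tuned to its own fields.

```python
import re
from typing import Any

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Keys whose values are dropped outright (illustrative list).
SENSITIVE_KEYS = {"email", "name", "token", "password", "authorization"}


def scrub(value: Any) -> Any:
    """Recursively redact sensitive keys and email-shaped substrings."""
    if isinstance(value, dict):
        return {
            key: "[redacted]" if str(key).lower() in SENSITIVE_KEYS else scrub(item)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [scrub(item) for item in value]
    if isinstance(value, str):
        return EMAIL.sub("[redacted-email]", value)
    return value
```

A function like this could be called from the error tracker's pre-send hook (the Sentry Python SDK exposes one as `before_send`), so that identifiers such as the user ID and `trace_id` pass through while emails, names, and tokens do not.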
147
+ ---
148
+
149
+ ## 8. Alerting discipline
150
+
151
+ Three rules:
152
+
153
+ - **Page only on user-visible impact.** "Disk 70% full" wakes
154
+ someone needlessly; "API error rate > 1% for 5 minutes" is the
155
+ page.
156
+ - **Every page has a runbook.** A page with no runbook gets a
157
+ runbook before the next deploy.
158
+ - **Burn-rate alerts on SLOs**, not threshold alerts on raw
159
+ metrics. The SRE-workbook patterns.
160
+
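Burn rate is the observed error ratio divided by the error budget the SLO allows. The sketch below follows the multi-window pattern from the SRE workbook, where a burn rate of 14.4 sustained for one hour spends 2% of a 30-day budget; the 99.9% SLO and the function names are assumptions for illustration.

```python
def burn_rate(error_ratio: float, slo: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    budget = 1.0 - slo
    return error_ratio / budget


def should_page(long_window_errors: float, short_window_errors: float,
                slo: float = 0.999, threshold: float = 14.4) -> bool:
    """Page only when both the long and the short window burn above the threshold."""
    return (burn_rate(long_window_errors, slo) >= threshold
            and burn_rate(short_window_errors, slo) >= threshold)
```

The short window keeps the alert from firing long after the problem has stopped; the long window keeps a brief blip from paging anyone.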
+ ---
+
+ ## 9. Hardening overlay — re-applied on every recreate
+
+ The overlay covers:
+
+ - **Container security options** (`no-new-privileges`, capability
+   drop, read-only filesystem, tmpfs for `/tmp`).
+ - **Network policy** (default deny; allows only what is named).
+ - **Resource limits** (CPU + memory).
+ - **Healthcheck** active.
+ - **Non-root user** (uid 10001 or similar).
+
+ The pattern: the overlay is part of the compose / Helm / K8s
+ manifest, not a post-create script. Recreating the container
+ re-applies it because the overlay *is* the create.
+
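One way to keep the overlay from drifting is a check over the parsed manifest. The sketch below assumes a Compose service definition already loaded into a dict and uses the Compose keys `security_opt`, `cap_drop`, `read_only`, `tmpfs`, `user`, `healthcheck`, and `deploy.resources.limits`; the `missing_hardening` function is hypothetical, and network policy is left out because it does not live on the service entry.

```python
from typing import List


def missing_hardening(service: dict) -> List[str]:
    """Return the overlay items a Compose service definition lacks."""
    missing = []
    if "no-new-privileges:true" not in service.get("security_opt", []):
        missing.append("security_opt: no-new-privileges")
    if "ALL" not in service.get("cap_drop", []):
        missing.append("cap_drop: ALL")
    if service.get("read_only") is not True:
        missing.append("read_only")
    if "/tmp" not in service.get("tmpfs", []):
        missing.append("tmpfs: /tmp")
    # `user` may be "uid" or "uid:gid"; unset or 0 means root.
    if str(service.get("user", "0")).split(":")[0] in ("", "0", "root"):
        missing.append("non-root user")
    if "healthcheck" not in service:
        missing.append("healthcheck")
    limits = service.get("deploy", {}).get("resources", {}).get("limits", {})
    if not {"cpus", "memory"} <= set(limits):
        missing.append("resource limits")
    return missing
```

An empty list means the service carries every item the check knows about; it says nothing about whether the values themselves are sensible.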
+ ---
+
+ ## 10. Secrets ops — the working pattern
+
+ (Coordinated with Benaiah on architecture; Hanun handles the
+ operational layer.)
+
+ - **SOPS + age** is the encryption scheme.
+ - **Decryption at deploy time.** On the host or in the platform's
+   secret manager.
+ - **Rotation procedure documented and rehearsed.** A rotation that
+   has never been run will fail at the worst moment.
+
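A hedged sketch of deploy-time decryption on the host, shelling out to the `sops` CLI. It assumes `sops` is on the PATH and that the age private key is available to it (for example through `SOPS_AGE_KEY_FILE`); the helper names and the file name in the usage note are hypothetical.

```python
import json
import subprocess
from typing import Dict, List


def sops_decrypt_command(path: str) -> List[str]:
    """Command line that decrypts an encrypted file to JSON on stdout."""
    return ["sops", "--decrypt", "--output-type", "json", path]


def load_secrets(path: str) -> Dict[str, object]:
    """Decrypt at deploy time and keep the plaintext in memory only."""
    result = subprocess.run(
        sops_decrypt_command(path),
        check=True,           # fail the deploy if decryption fails
        capture_output=True,  # plaintext never touches disk
        text=True,
    )
    return json.loads(result.stdout)
```

A deploy step would call something like `load_secrets("secrets.enc.yaml")` and pass the values to the process environment. Rehearsing rotation then amounts to re-encrypting with the new recipient and running this same path end to end.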
+ ---
+
+ ## 11. Worked example — wiring observability for a new service
+
+ The service is the new `notifications` service from
+ `meshullam-infra-design-craft` §8. Hanun wires its observability.
+
+ **Metrics** (`/metrics` endpoint, Prometheus scrape):
+
+ ```python
+ from prometheus_client import Counter, Histogram, generate_latest
+
+ # Labels stay low-cardinality: channel and status only, never a user id.
+ notifications_sent = Counter(
+     "notifications_sent_total",
+     "Notifications sent",
+     ["channel", "status"],
+ )
+ notifications_duration = Histogram(
+     "notifications_duration_seconds",
+     "Time to send a notification",
+     ["channel"],
+ )
+ # generate_latest() renders the default registry for the /metrics handler.
+ ```
+
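+ **Logs** (structured, with trace_id):
+
+ ```python
+ import logging
+
+ log = logging.getLogger("notifications")
+
+ # trace_id and user_id come from the request context; log the id, never the address.
+ log.info("notification_sent",
+          extra={"trace_id": trace_id, "channel": "email",
+                 "recipient_id": user_id, "duration_ms": 142})
+ ```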
+
+ **Traces** (OTel auto-instrumentation + manual span at the seam):
+
+ ```python
+ from opentelemetry import trace
+
+ tracer = trace.get_tracer(__name__)
+
+ async def send(self, request: NotificationRequest) -> NotificationResult:
+     # Manual span at the seam; the with-block keeps it open across awaits.
+     with tracer.start_as_current_span("notifications.send"):
+         ...  # auto-instrumented httpx + redis calls inside
+ ```
+
+ **Grafana dashboard** (`notifications.json` in repo):
+
+ - Latency p95 panel (linked to the high-latency runbook).
+ - Send rate by channel.
+ - Error rate by channel.
+ - Queue backlog (from Redis metric).
+
+ **Alerts:**
+
+ - SLO: 99% of notifications sent in < 5s; burn-rate alert at 2h
+   budget.
+ - Critical: `notifications` service down for > 1 minute.
+
+ **Runbook:**
+
+ ```markdown
+ # Runbook — notifications service down
+
+ ## Trigger
+ notifications service unreachable for >1 minute.
+
+ ## Diagnose
+ 1. Check container status: `docker compose ps notifications`
+ 2. Check container logs: `docker compose logs --tail=200 notifications`
+ 3. Check email-provider status page (link).
+ 4. Check Redis and NATS health.
+
+ ## Mitigate
+ 1. If container down: `docker compose up -d notifications`
+ 2. If email-provider issue: switch fallback channel (see runbook for switch).
+ 3. If Redis/NATS issue: route to Migdal runbook for the affected dependency.
+
+ ## Resolve
+ - See ADR-XXXX for the durable fix once root cause is identified.
+ ```
+
+ What Hanun did:
+
+ - Wired all three signal layers (metrics, logs, traces).
+ - Set up SLO + burn-rate alert, not threshold alert.
+ - Wrote the runbook.
+ - Did NOT execute the wiring on prod.
+
+ ---
+
+ ## 12. The recurring traps Hanun rejects on sight
+
+ 1. **"Hardening overlay later."** §1. No.
+
+ 2. **"Side-load the observability agent."** §1. In the image.
+
+ 3. **"Per-user labels for cardinality detail."** §3. No. Labels
+    are bounded cardinality.
+
+ 4. **"Disk 70% full warrants a page."** §8. No. User-impact pages.
+
+ 5. **"This alert has no runbook; we'll write one later."** §8.
+    Runbook before the alert is enabled.
+
+ 6. **"Log every variable; storage is cheap."** §5. Structured,
+    bounded, scrubbed.
+
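Trap 3 is easiest to see with arithmetic: the worst-case number of time series for a metric is the product of the distinct values of each label. The sketch below is illustrative only; `SERIES_BUDGET` is a hypothetical ceiling, not a number from the harness.

```python
from math import prod
from typing import Dict, Sequence

SERIES_BUDGET = 1_000  # hypothetical per-metric ceiling; tune per deployment


def series_count(label_values: Dict[str, Sequence[str]]) -> int:
    """Worst-case series count: product of each label's distinct values."""
    return prod(len(values) for values in label_values.values())


def within_budget(label_values: Dict[str, Sequence[str]],
                  budget: int = SERIES_BUDGET) -> bool:
    """True when the label set stays under the series ceiling."""
    return series_count(label_values) <= budget
```

Three channels times two statuses is six series. Adding a `user_id` label with ten thousand users turns that into sixty thousand, which is why per-user detail belongs in logs or traces.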
293
+ ---
294
+
295
+ ## 13. Style — Hanun's voice
296
+
297
+ - **Operational glue.** The unglamorous work that makes everything
298
+ stay up.
299
+ - **Three signals, distinct.** Metrics for rates, logs for events,
300
+ traces for path.
301
+ - **Runbooks for every page.** No alerts without remediation.
302
+
303
+ ---
304
+
305
+ *Cross-references: `~/.claude/rules/y4nn-standards.md` (durable §3,
306
+ asymmetric-delegation §5, hardening overlay in §10),
307
+ `payload/mishkan/skills/team-lead-craft/SKILL.md` (Eliashib routes),
308
+ `payload/mishkan/skills/meshullam-infra-design-craft/SKILL.md` (the
309
+ topology this observability covers), `payload/mishkan/skills/palal-
310
+ systems-craft/SKILL.md` (the OS layer Hanun's signals observe),
311
+ `payload/mishkan/skills/rehum-sre-advisor-craft/SKILL.md` (the SRE
312
+ advisor for SLO definition).*