mishkan-harness 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (186)
  1. package/LICENSE +21 -0
  2. package/README.md +205 -0
  3. package/bin/mishkan.js +221 -0
  4. package/docs/design/MISHKAN_agent_aliases.md +140 -0
  5. package/docs/design/MISHKAN_decisions.md +172 -0
  6. package/docs/design/MISHKAN_harness_design.md +820 -0
  7. package/docs/design/MISHKAN_ontology.md +87 -0
  8. package/docs/design/MISHKAN_token_optimisation.md +181 -0
  9. package/docs/engineer/README.md +37 -0
  10. package/docs/engineer/profile.example.md +79 -0
  11. package/docs/usage/01-installation.md +178 -0
  12. package/docs/usage/02-project-init.md +151 -0
  13. package/docs/usage/03-orchestration.md +218 -0
  14. package/docs/usage/04-memory-layer.md +201 -0
  15. package/docs/usage/05-selective-ingest.md +177 -0
  16. package/docs/usage/06-llm-providers.md +195 -0
  17. package/docs/usage/07-troubleshooting.md +316 -0
  18. package/docs/usage/08-glossary.md +154 -0
  19. package/docs/usage/09-workflows.md +123 -0
  20. package/docs/usage/README.md +77 -0
  21. package/package.json +43 -0
  22. package/payload/install/settings.hooks.json +47 -0
  23. package/payload/mishkan/AGENT_SPEC.md +154 -0
  24. package/payload/mishkan/agents/ahikam.md +58 -0
  25. package/payload/mishkan/agents/aholiab.md +68 -0
  26. package/payload/mishkan/agents/asaph.md +73 -0
  27. package/payload/mishkan/agents/baruch.md +88 -0
  28. package/payload/mishkan/agents/benaiah.md +76 -0
  29. package/payload/mishkan/agents/bezalel.md +83 -0
  30. package/payload/mishkan/agents/caleb.md +74 -0
  31. package/payload/mishkan/agents/deborah.md +63 -0
  32. package/payload/mishkan/agents/elasah.md +58 -0
  33. package/payload/mishkan/agents/eliashib.md +68 -0
  34. package/payload/mishkan/agents/ezra.md +69 -0
  35. package/payload/mishkan/agents/hanun.md +64 -0
  36. package/payload/mishkan/agents/hiram.md +68 -0
  37. package/payload/mishkan/agents/hizkiah.md +76 -0
  38. package/payload/mishkan/agents/huldah.md +59 -0
  39. package/payload/mishkan/agents/huram.md +66 -0
  40. package/payload/mishkan/agents/hushai.md +59 -0
  41. package/payload/mishkan/agents/igal.md +58 -0
  42. package/payload/mishkan/agents/ira.md +86 -0
  43. package/payload/mishkan/agents/jahaziel.md +71 -0
  44. package/payload/mishkan/agents/jakin.md +66 -0
  45. package/payload/mishkan/agents/jehonathan.md +62 -0
  46. package/payload/mishkan/agents/jehoshaphat.md +68 -0
  47. package/payload/mishkan/agents/joab.md +71 -0
  48. package/payload/mishkan/agents/joah.md +62 -0
  49. package/payload/mishkan/agents/maaseiah.md +61 -0
  50. package/payload/mishkan/agents/meremoth.md +65 -0
  51. package/payload/mishkan/agents/meshullam.md +67 -0
  52. package/payload/mishkan/agents/nathan.md +70 -0
  53. package/payload/mishkan/agents/nehemiah.md +93 -0
  54. package/payload/mishkan/agents/obed.md +60 -0
  55. package/payload/mishkan/agents/oholiab.md +67 -0
  56. package/payload/mishkan/agents/palal.md +63 -0
  57. package/payload/mishkan/agents/phinehas.md +73 -0
  58. package/payload/mishkan/agents/rehum.md +60 -0
  59. package/payload/mishkan/agents/salma.md +69 -0
  60. package/payload/mishkan/agents/seraiah.md +73 -0
  61. package/payload/mishkan/agents/shallum.md +66 -0
  62. package/payload/mishkan/agents/shaphan.md +64 -0
  63. package/payload/mishkan/agents/shemaiah.md +67 -0
  64. package/payload/mishkan/agents/shevna.md +58 -0
  65. package/payload/mishkan/agents/uriah.md +70 -0
  66. package/payload/mishkan/agents/zaccur.md +58 -0
  67. package/payload/mishkan/agents/zadok.md +67 -0
  68. package/payload/mishkan/agents/zerubbabel.md +69 -0
  69. package/payload/mishkan/cognee/.env.curated.example +61 -0
  70. package/payload/mishkan/cognee/.env.example +165 -0
  71. package/payload/mishkan/cognee/Dockerfile +50 -0
  72. package/payload/mishkan/cognee/README.md +129 -0
  73. package/payload/mishkan/cognee/docker-compose.curated-ui.yml +61 -0
  74. package/payload/mishkan/cognee/docker-compose.curated.yml +85 -0
  75. package/payload/mishkan/cognee/docker-compose.hardening.yml +16 -0
  76. package/payload/mishkan/cognee/docker-compose.selfhosted.yml +114 -0
  77. package/payload/mishkan/cognee/docker-compose.ui.yml +70 -0
  78. package/payload/mishkan/cognee/docker-compose.yml +71 -0
  79. package/payload/mishkan/cognee/ingest-curated.py +92 -0
  80. package/payload/mishkan/commands/dep-audit.md +24 -0
  81. package/payload/mishkan/commands/mishkan-init.md +25 -0
  82. package/payload/mishkan/commands/mishkan-resume.md +21 -0
  83. package/payload/mishkan/commands/promote.md +19 -0
  84. package/payload/mishkan/commands/sefer-pull.md +19 -0
  85. package/payload/mishkan/commands/sprint-close.md +21 -0
  86. package/payload/mishkan/config/curated-library.yaml +113 -0
  87. package/payload/mishkan/config/improvement-queries.md +29 -0
  88. package/payload/mishkan/config/model-routing.yaml +87 -0
  89. package/payload/mishkan/config/projects.yaml +38 -0
  90. package/payload/mishkan/evals/baruch/README.md +93 -0
  91. package/payload/mishkan/evals/baruch/fixtures/invalid/bad-outcome-enum.json +15 -0
  92. package/payload/mishkan/evals/baruch/fixtures/invalid/bad-sprint-pattern.json +15 -0
  93. package/payload/mishkan/evals/baruch/fixtures/invalid/bad-trigger-enum.json +15 -0
  94. package/payload/mishkan/evals/baruch/fixtures/invalid/malformed-json.json +7 -0
  95. package/payload/mishkan/evals/baruch/fixtures/invalid/missing-required-field.json +14 -0
  96. package/payload/mishkan/evals/baruch/fixtures/valid/blocked-vendor.json +15 -0
  97. package/payload/mishkan/evals/baruch/fixtures/valid/curated-shortcircuit.json +15 -0
  98. package/payload/mishkan/evals/baruch/fixtures/valid/partial-no-write.json +14 -0
  99. package/payload/mishkan/evals/baruch/fixtures/valid/resolved-cross-harness.json +15 -0
  100. package/payload/mishkan/evals/baruch/golden_case/expected.yaml +35 -0
  101. package/payload/mishkan/evals/baruch/golden_case/input.yaml +47 -0
  102. package/payload/mishkan/evals/baruch/golden_case/produced.json +15 -0
  103. package/payload/mishkan/evals/baruch/run.sh +129 -0
  104. package/payload/mishkan/hooks/model-route.py +96 -0
  105. package/payload/mishkan/hooks/post-tool-observe.sh +45 -0
  106. package/payload/mishkan/hooks/pre-tool-security.sh +150 -0
  107. package/payload/mishkan/hooks/session-start.sh +20 -0
  108. package/payload/mishkan/hooks/stop-reporter.sh +29 -0
  109. package/payload/mishkan/ontology.md +87 -0
  110. package/payload/mishkan/rules/backend/yasad.md +23 -0
  111. package/payload/mishkan/rules/common/dependencies.md +53 -0
  112. package/payload/mishkan/rules/common/quality.md +16 -0
  113. package/payload/mishkan/rules/common/security.md +20 -0
  114. package/payload/mishkan/rules/documentation/sefer.md +19 -0
  115. package/payload/mishkan/rules/frontend/panim.md +21 -0
  116. package/payload/mishkan/rules/infrastructure/migdal.md +22 -0
  117. package/payload/mishkan/scripts/dependency-audit.sh +171 -0
  118. package/payload/mishkan/scripts/ensure-curated-box.sh +66 -0
  119. package/payload/mishkan/scripts/mishkan-ingest.sh +92 -0
  120. package/payload/mishkan/scripts/observability-aggregate.sh +57 -0
  121. package/payload/mishkan/scripts/seed-curated-library.sh +62 -0
  122. package/payload/mishkan/scripts/sync-profile.sh +65 -0
  123. package/payload/mishkan/scripts/validate-research-log.sh +108 -0
  124. package/payload/mishkan/skills/asaph-a11y-seo-craft/SKILL.md +289 -0
  125. package/payload/mishkan/skills/baruch-research-reporting-craft/SKILL.md +460 -0
  126. package/payload/mishkan/skills/benaiah-devsecops-craft/SKILL.md +329 -0
  127. package/payload/mishkan/skills/bezalel-cto-craft/SKILL.md +391 -0
  128. package/payload/mishkan/skills/caleb-web-research-craft/SKILL.md +306 -0
  129. package/payload/mishkan/skills/cognee-promote/SKILL.md +40 -0
  130. package/payload/mishkan/skills/cognee-quickstart/SKILL.md +66 -0
  131. package/payload/mishkan/skills/context-compress/SKILL.md +36 -0
  132. package/payload/mishkan/skills/deborah-ux-craft/SKILL.md +295 -0
  133. package/payload/mishkan/skills/dependency-audit/SKILL.md +59 -0
  134. package/payload/mishkan/skills/dependency-vetting/SKILL.md +59 -0
  135. package/payload/mishkan/skills/documentation-craft/SKILL.md +468 -0
  136. package/payload/mishkan/skills/ezra-research-formulation-craft/SKILL.md +319 -0
  137. package/payload/mishkan/skills/hanun-observability-craft/SKILL.md +312 -0
  138. package/payload/mishkan/skills/hiram-ui-craft/SKILL.md +334 -0
  139. package/payload/mishkan/skills/hizkiah-implementation-craft/SKILL.md +701 -0
  140. package/payload/mishkan/skills/hushai-security-advisor-craft/SKILL.md +282 -0
  141. package/payload/mishkan/skills/ira-code-security-craft/SKILL.md +553 -0
  142. package/payload/mishkan/skills/jakin-intent-clarification-craft/SKILL.md +299 -0
  143. package/payload/mishkan/skills/jehonathan-publication-craft/SKILL.md +262 -0
  144. package/payload/mishkan/skills/joab-app-security-craft/SKILL.md +266 -0
  145. package/payload/mishkan/skills/meremoth-devops-craft/SKILL.md +298 -0
  146. package/payload/mishkan/skills/meshullam-infra-design-craft/SKILL.md +302 -0
  147. package/payload/mishkan/skills/mishkan-ingest/SKILL.md +65 -0
  148. package/payload/mishkan/skills/mishkan-init/SKILL.md +65 -0
  149. package/payload/mishkan/skills/nathan-architecture-craft/SKILL.md +547 -0
  150. package/payload/mishkan/skills/nehemiah-pm-craft/SKILL.md +484 -0
  151. package/payload/mishkan/skills/obed-asset-pipeline-craft/SKILL.md +286 -0
  152. package/payload/mishkan/skills/oholiab-design-system-craft/SKILL.md +334 -0
  153. package/payload/mishkan/skills/palal-systems-craft/SKILL.md +281 -0
  154. package/payload/mishkan/skills/qa-evaluation-craft/SKILL.md +406 -0
  155. package/payload/mishkan/skills/rehum-sre-advisor-craft/SKILL.md +228 -0
  156. package/payload/mishkan/skills/reporter-discipline-craft/SKILL.md +351 -0
  157. package/payload/mishkan/skills/research-pipeline/SKILL.md +55 -0
  158. package/payload/mishkan/skills/salma-frontend-implementation-craft/SKILL.md +369 -0
  159. package/payload/mishkan/skills/sefer-pull/SKILL.md +37 -0
  160. package/payload/mishkan/skills/shallum-database-craft/SKILL.md +347 -0
  161. package/payload/mishkan/skills/shaphan-summarisation-craft/SKILL.md +271 -0
  162. package/payload/mishkan/skills/shemaiah-evaluation-craft/SKILL.md +342 -0
  163. package/payload/mishkan/skills/sprint-report/SKILL.md +28 -0
  164. package/payload/mishkan/skills/team-lead-craft/SKILL.md +457 -0
  165. package/payload/mishkan/skills/zadok-contract-craft/SKILL.md +520 -0
  166. package/payload/mishkan/templates/case-node.schema.json +22 -0
  167. package/payload/mishkan/templates/mcp.json +22 -0
  168. package/payload/mishkan/templates/observability-log.schema.json +24 -0
  169. package/payload/mishkan/templates/project-CLAUDE.md +47 -0
  170. package/payload/mishkan/templates/research-log.schema.json +40 -0
  171. package/payload/mishkan/templates/settings.json +12 -0
  172. package/payload/mishkan/templates/settings.local.json +6 -0
  173. package/payload/mishkan/templates/sprint-state.schema.json +47 -0
  174. package/payload/mishkan/templates/team-report.schema.json +50 -0
  175. package/payload/mishkan/templates/user-CLAUDE.md +62 -0
  176. package/payload/mishkan/workflows/README.md +88 -0
  177. package/payload/mishkan/workflows/mishkan-architecture-panel.js +156 -0
  178. package/payload/mishkan/workflows/mishkan-codebase-audit.js +188 -0
  179. package/payload/mishkan/workflows/mishkan-deep-research.js +251 -0
  180. package/payload/mishkan/workflows/mishkan-init.js +156 -0
  181. package/payload/mishkan/workflows/mishkan-migration-wave.js +180 -0
  182. package/payload/mishkan/workflows/mishkan-release-readiness.js +163 -0
  183. package/payload/mishkan/workflows/mishkan-sprint-close.js +112 -0
  184. package/payload/user/CLAUDE.md +62 -0
  185. package/payload/user/rules/engineer-standards.md +66 -0
  186. package/payload/user/rules/y4nn-standards.md +167 -0
@@ -0,0 +1,177 @@
# 05 — Selective ingest

> Goal: explain how documents enter the work cognee graph, and why the default
> is "memory is opt-in, not bulk".

## The contract

**Nothing enters the work graph unless tagged or explicitly invoked.** This is
the harness-wide rule that the `mishkan-ingest` skill enforces. It solves two
real problems hit during the build (commit `6213611`):

- **PII bleed.** Bulk-ingesting `docs/` pulls in incident reports that contain
  real email addresses, internal hostnames, ticket numbers — all of which then
  sit in the project graph alongside curated reference material.
- **Oversized-doc embedding failures.** `nomic-embed-text` rejects chunks
  larger than 8,192 tokens with a 422, and a single oversized document can
  leave cognify retrying indefinitely.

Both problems go away when you deliberately choose what enters memory.

## Two ways to select

### 1. Frontmatter tag (standing intent)

Add a YAML frontmatter block at the very top of a doc:

```markdown
---
mishkan: ingest
---

# Doc title
```

That single key is enough. Other frontmatter keys (`author`, `date`, etc.)
can sit alongside it. The tag means *"this doc is part of the project's persistent
memory"*. In its default mode, the skill walks `./docs/` and ingests every
tagged file.

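
The tag rule above can be sketched in Python. This is a minimal illustration, not the parser inside `mishkan-ingest.sh`. `has_ingest_tag` is a hypothetical helper that assumes the frontmatter starts on line 1 and that the value is unquoted (`mishkan: ingest`, not `mishkan: "ingest"`).

```python
# Hypothetical helper: a minimal sketch of the "mishkan: ingest" frontmatter
# rule. It is NOT the code in mishkan-ingest.sh; it only mirrors the rule
# described above (a YAML block delimited by "---" at the very top of a file).
def has_ingest_tag(text: str) -> bool:
    lines = text.splitlines()
    # Frontmatter must open on the very first line.
    if not lines or lines[0].strip() != "---":
        return False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter reached without the tag
            return False
        key, sep, value = stripped.partition(":")
        if sep and key.strip() == "mishkan" and value.strip() == "ingest":
            return True
    return False  # no tag found before the file ended


tagged = "---\nauthor: me\nmishkan: ingest\n---\n\n# Doc title\n"
untagged = "# Doc title\n\nmishkan: ingest appears in the body only\n"
print(has_ingest_tag(tagged), has_ingest_tag(untagged))  # True False
```

A tag that only appears in the body does not count, which matches the "very top of a doc" requirement.
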
### 2. Explicit paths (ad-hoc pull)

Skip the tag and name the files:

```bash
bash ~/.claude/mishkan/scripts/mishkan-ingest.sh docs/SECURITY.md docs/ROADMAP.md
```

This is useful for one-off pulls or when the doc lives outside the standard
`docs/` tree.

## Invoking the skill

```bash
# Default — walk ./docs/ for tagged docs
bash ~/.claude/mishkan/scripts/mishkan-ingest.sh --tagged-only

# Explicit files (no tag required)
bash ~/.claude/mishkan/scripts/mishkan-ingest.sh path/to/a.md path/to/b.md

# Override the dataset name (default is basename of $PWD)
bash ~/.claude/mishkan/scripts/mishkan-ingest.sh --dataset=research docs/research.md

# Show inline help
bash ~/.claude/mishkan/scripts/mishkan-ingest.sh --help
```

## What the skill runs

1. **Selects files.** Tagged-only mode walks `./docs/` (or any directory you
   pass) and keeps only `.md` files whose YAML frontmatter contains
   `mishkan: ingest`. Explicit paths skip the filter.
2. **Stages** the files into the work cognee-mcp container at
   `/home/cognee/ingest_buf/`.
3. **Runs `cognee.add(files, dataset_name=<project>)`**, which registers and
   chunks the files under the target dataset.
4. **Runs `cognify(datasets=[<project>])`**: the LLM extracts entities and
   relationships. This step is subject to the work box's `LLM_RATE_LIMIT_*`
   throttle and relies on the now-persistent storage (commits `70d3c2e` and
   `e24fabf`).
5. **Runs `memify(dataset=<project>)`**, which embeds the triplet layer into
   pgvector (commit `210f92b` made this run automatically after every cognify).

The output marks each step: `>> added N file(s) -> <dataset>`, `>> cognified`,
`>> memified`.

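
Step 1 can be sketched as follows. This sketch assumes the tagged-only walk recurses into subdirectories; the text above does not say either way. `select_files` and `_tagged` are hypothetical names. The real selection lives in `payload/mishkan/scripts/mishkan-ingest.sh`.

```python
# Hypothetical sketch of step 1 ("Selects files"). Assumption: the walk is
# recursive. Explicit paths bypass the tag filter entirely.
from pathlib import Path
from typing import List


def _tagged(path: Path) -> bool:
    """Compact frontmatter check: is `mishkan: ingest` in the top block?"""
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    for line in lines[1:]:
        if line.strip() == "---":
            return False  # end of frontmatter, no tag
        if line.replace(" ", "") == "mishkan:ingest":
            return True
    return False


def select_files(explicit: List[str], root: str = "docs") -> List[Path]:
    if explicit:
        # Explicit paths skip the filter (no tag required).
        return [Path(p) for p in explicit]
    # Tagged-only mode: markdown files only, tag required.
    return sorted(p for p in Path(root).rglob("*.md") if _tagged(p))
```

Note that `.txt` or PDF files are never selected in the walk, even if they carry the tag.
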
## Naming the target dataset

By default the dataset is named after the project directory:

```bash
cd ~/code/aiobi-mail
bash ~/.claude/mishkan/scripts/mishkan-ingest.sh --tagged-only
# → ingests into dataset "aiobi-mail" in the work store
```

Override the name with `--dataset=<name>` when:

- You want a sub-corpus of the project (`--dataset=architecture-only`).
- The basename collides with another project's (give one of them a distinct
  dataset name).
- You want to ingest into a personal dataset (e.g. `--dataset=research`).

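
A sketch of the naming rule follows. It assumes that `--dataset=<name>` always wins over the directory default. `resolve_dataset` is a hypothetical helper, not part of the script.

```python
# Hypothetical sketch: --dataset=<name> wins; otherwise use the basename of
# the working directory (what the docs call "basename of $PWD").
import os
from typing import List


def resolve_dataset(argv: List[str], cwd: str) -> str:
    for arg in argv:
        if arg.startswith("--dataset="):
            name = arg.split("=", 1)[1]
            if not name:
                raise ValueError("--dataset= needs a non-empty name")
            return name
    # normpath drops a trailing slash so basename is not empty.
    return os.path.basename(os.path.normpath(cwd))
```

Because only the basename counts, `~/code/aiobi-mail` and `~/work/aiobi-mail` both resolve to `aiobi-mail`. That is exactly the collision case above.
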
The skill **never** writes to `cognee-curated`. The curated store is read-only
in normal use. Only the harness's `seed-curated-library.sh` writes to it, and
that script targets `mishkan-curated-mcp` explicitly.

## A worked example

Tag two docs and leave the rest untouched:

```bash
cd ~/code/aiobi-mail

# tag SECURITY.md and ROADMAP.md (safe to re-run)
for f in docs/SECURITY.md docs/ROADMAP.md; do
  if grep -qx 'mishkan: ingest' "$f"; then
    continue  # already tagged
  elif head -n 1 "$f" | grep -qx -- '---'; then
    # existing frontmatter: add the key right after the opening delimiter
    { printf '%s\n' '---' 'mishkan: ingest'; tail -n +2 "$f"; } > "$f.tmp" && mv "$f.tmp" "$f"
  else
    # no frontmatter: prepend a new block
    printf '%s\n%s\n%s\n\n' '---' 'mishkan: ingest' '---' | cat - "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  fi
done

# run the skill
bash ~/.claude/mishkan/scripts/mishkan-ingest.sh --tagged-only

# verify the dataset in cognee's relational store
docker exec mishkan-cognee-pg psql -U cognee -d cognee_db -tc \
  "SELECT d.name, count(dd.data_id) AS items
     FROM datasets d LEFT JOIN dataset_data dd ON dd.dataset_id=d.id
    WHERE d.name='aiobi-mail' GROUP BY d.name;"
```

The other docs in `docs/` (the 79KB migration report, French PDFs, stale
upstream READMEs) stay out of the graph. Re-running the skill picks up only
newly tagged files. Previously cognified docs are skipped based on cognee's
pipeline status.

## Re-ingesting after a doc changes

The skill is additive. To refresh a doc that has already been cognified:

1. Edit the doc.
2. Mark its existing dataset entry as needing reprocessing (cognee tracks
   pipeline status per data item). If you want a clean reset instead, delete
   the dataset:

   ```python
   # one-shot, run inside the work mcp container
   import asyncio

   from cognee.modules.users.methods import get_default_user
   from cognee.modules.data.methods import get_datasets, delete_dataset


   async def main():
       user = await get_default_user()
       for dataset in await get_datasets(user.id):
           if dataset.name == "<project>":
               await delete_dataset(dataset)


   asyncio.run(main())
   ```
3. Rerun `mishkan-ingest.sh --tagged-only`.

That removes the relational dataset records cleanly. Note: with cognee access
control off, deleting a dataset does **not** remove the graph nodes. For a true
reset, also drop the graph labels for that dataset. See
[Troubleshooting](./07-troubleshooting.md) for the cleanup pattern used during
the build.

## What the skill is *not*

- Not a sync. It does not detect deletions or watch the filesystem.
- Not a format converter. Non-markdown files are skipped in directory walks.
- Not a curation tool for the **curated** store. Curated has a separate seed
  flow (`seed-curated-library.sh` against the curated MCP).
- Not an autonomous "cognee always knows everything" mechanism. The whole
  point is *deliberate* memory.

## See also

- The skill itself: `payload/mishkan/skills/mishkan-ingest/SKILL.md`.
- The script: `payload/mishkan/scripts/mishkan-ingest.sh`.
- Commit `6213611` (introduction).
- Memory layer architecture: [04](./04-memory-layer.md).
- Provider profiles (cognify uses the LLM): [06](./06-llm-providers.md).
- If cognify errors on the last doc:
  [Troubleshooting](./07-troubleshooting.md#cognify-stuck-on-the-last-doc).
@@ -0,0 +1,195 @@
# 06 — LLM provider profiles

> Goal: choose the right LLM + embedding combination for cognify, avoid the
> traps the build hit (daily caps, thinking models, oversized chunks), and
> match provider to box.

## The split: agents vs cognee

There are **two different model populations** in MISHKAN, and they don't have
to use the same provider:

| Population | What runs it | Where it's configured |
|---|---|---|
| **Agents** (the 45) | Claude Code's own model routing (Opus / Sonnet / Haiku) per tier | `payload/mishkan/config/model-routing.yaml` (enforced by `hooks/model-route.py`) |
| **Cognee's `cognify` extraction + embeddings** | a provider you choose | `payload/mishkan/cognee/.env` (work box) and `.env.curated` (curated box) |

This chapter covers the second population: the models that power
`cognify`/`memify`/`search` inside the cognee containers. The agents' Claude
tiers are covered in [Orchestration](./03-orchestration.md).

## Match provider to box

The two stores have different threat models, which translate into different
provider choices.

| Store | Contains | Provider recommendation | Why |
|---|---|---|---|
| **Work** (`:7777`) | project knowledge, may contain PII | **Local Ollama LLM** (private, no quota), a paid/no-train cloud, or a free cloud tier if you accept that it trains on your prompts | every free cloud tier trains on prompts; PII shouldn't leak |
| **Curated** (`:7730`) | public reference resources, no PII | Any free cloud (Gemini, NVIDIA catalog, OpenRouter named-free) is fine | nothing sensitive |

**Embeddings should be local** in both stores. Bulk ingest fires many embedding
calls in a burst, and cloud free-tier embedding endpoints answer that pattern
with 429s. Local Ollama (`nomic-embed-text`, 768-dim) has no rate cap and embeds
in milliseconds once the model is loaded.

## The five provider profiles in `.env.example`

The shipped `.env.example` carries five commented-out profiles. Pick one,
uncomment it, and recreate the relevant services.

| Profile | LLM | Embeddings | Use it when |
|---|---|---|---|
| **A — fully self-hosted (Ollama)** | local `qwen2.5:3b` (recommended) or `llama3.1:8b` | local `nomic-embed-text` | you want privacy and zero cost and accept slower cognify; the default for the work box if there's PII |
| **B — Google Gemini** | `gemini-2.5-flash` | `gemini-embedding-001` (3072-dim) | you want a fast cloud; **needs a billing-enabled key**, as bare free keys 429 immediately |
| **C — OpenAI** | `gpt-5-mini` (or current) | `text-embedding-3-large` (3072-dim) | familiar, paid, reliable |
| **D — Anthropic/Claude LLM + OpenAI embeddings** | `claude-sonnet-4-5` | OpenAI's | **must split**: Anthropic ships no embedding model |
| **E — NVIDIA API Catalog (OpenAI-compatible)** | a non-thinking catalog model (e.g. `meta/llama-3.3-70b-instruct`) | local Ollama | **recommended low-cost cloud**: generous free testing tier, OpenAI-compatible |

The dimensions in the table matter: **embedding dimensions cannot change after
the first ingest without wiping the vector store**. Pick 768 (local
`nomic-embed-text`) or 3072 (OpenAI `text-embedding-3-large` /
`gemini-embedding-001`) and stick with it. The `.env.example` header documents
this caveat.

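
The dimension lock can be expressed as a pre-flight guard. This is a hypothetical check, not something cognee ships. `check_embedding_dims` and `KNOWN_DIMS` restate the default dimensions from this chapter's tables. `stored` stands for whatever dimension the existing vector store already holds.

```python
# Hypothetical pre-flight guard; not part of cognee. KNOWN_DIMS restates the
# default dimensions listed in this chapter.
from typing import Optional

KNOWN_DIMS = {
    "nomic-embed-text": 768,
    "text-embedding-3-large": 3072,
    "gemini-embedding-001": 3072,
}


def check_embedding_dims(model: str, configured: int,
                         stored: Optional[int] = None) -> None:
    """Raise ValueError if the new embedding config cannot reuse the store."""
    # Normalise names like "nomic-embed-text:latest" or "gemini/gemini-embedding-001".
    base = model.rsplit("/", 1)[-1].split(":", 1)[0]
    expected = KNOWN_DIMS.get(base)
    if expected is not None and configured != expected:
        raise ValueError(f"{model} defaults to {expected}-dim vectors, "
                         f"but EMBEDDING_DIMENSIONS is {configured}")
    if stored is not None and stored != configured:
        raise ValueError(f"the vector store holds {stored}-dim vectors; "
                         f"switching to {configured} means wiping it and re-ingesting")
```

For example, `check_embedding_dims("nomic-embed-text:latest", 768, stored=768)` passes silently. Moving an existing 768-dim store to `text-embedding-3-large` raises, because that switch requires the wipe described above.
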
## Hybrid is fine — and the recommended starting point

Cloud LLM + local embeddings is the practical hybrid. This is the live `.env`
used during the build:

```
LLM_PROVIDER=gemini
LLM_MODEL=gemini/gemini-2.5-flash
LLM_API_KEY=<billed key>
EMBEDDING_PROVIDER=ollama
EMBEDDING_MODEL=nomic-embed-text:latest
EMBEDDING_ENDPOINT=http://ollama:11434/api/embed
EMBEDDING_DIMENSIONS=768
HUGGINGFACE_TOKENIZER=nomic-ai/nomic-embed-text-v1.5
```

After the pivot to NVIDIA (once Gemini's daily cap kept being hit):

```
LLM_PROVIDER=custom
LLM_MODEL=openai/meta/llama-3.3-70b-instruct
LLM_ENDPOINT=https://integrate.api.nvidia.com/v1
LLM_API_KEY=<nvapi-...>
LLM_MAX_TOKENS=16384
# embeddings unchanged (local Ollama)
```

## Rate cap vs daily cap — they are different walls

This caught the build out and bears repeating:

| Cap | Symptom | What helps |
|---|---|---|
| **Per-minute (RPM)** | `429` mid-run, after a burst of fast calls | `LLM_RATE_LIMIT_*` throttle in `.env` (8 req/60 s default) |
| **Per-day (RPD)** | `429 RESOURCE_EXHAUSTED` early in a run, even after a quiet period | **nothing in-process helps**: wait for the reset, switch provider, or use a paid tier |

The throttle (`LLM_RATE_LIMIT_ENABLED=true`, `_REQUESTS=8`, `_INTERVAL=60`) was
added in commit `70d3c2e` after bulk cognify on Gemini's free tier kept blowing
through the per-minute window. It cannot rescue you from RPD. Gemini's free RPD
is small enough that cognifying even one large doc can exhaust it.

If you keep hitting RPD on a free cloud tier, the durable fixes are
(in increasing severity):

1. **Ingest selectively** (don't cognify large, unneeded docs; see
   [05](./05-selective-ingest.md)).
2. **Switch to the NVIDIA API Catalog** (Profile E) for a more generous free tier.
3. **Switch the work box to a local Ollama LLM** (Profile A). This is the
   slowest option, but it is private and has no quota wall.

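
The following sketch makes the difference between the two walls concrete. It is a minimal sliding-window limiter with the same *semantics* as `LLM_RATE_LIMIT_REQUESTS=8` / `LLM_RATE_LIMIT_INTERVAL=60`: at most 8 calls in any 60-second window. It is an illustrative sketch, not cognee's implementation. The injectable `clock` exists only so the limiter can be tested without sleeping.

```python
# Illustrative sliding-window limiter (not cognee's code): at most
# `max_requests` calls in any window of `interval` seconds.
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    def __init__(self, max_requests: int = 8, interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.interval = interval
        self.clock = clock
        self.sent: Deque[float] = deque()  # timestamps of recent calls

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if none)."""
        now = self.clock()
        # Forget calls that have slid out of the window.
        while self.sent and now - self.sent[0] >= self.interval:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        # Wait until the oldest call in the window expires.
        return self.interval - (now - self.sent[0])

    def record(self) -> None:
        """Call this right after each request is sent."""
        self.sent.append(self.clock())
```

The arithmetic shows why a throttle cannot fix RPD. At 8 requests per 60 s, the limiter still allows 8 × 60 = 480 requests per hour, or 11,520 per day. It spreads calls out but never reduces their total, so a job that needs more calls than the daily cap fails either way.
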
## The thinking-model trap

DeepSeek V4 Pro, NVIDIA Nemotron reasoning models, and similar models are
**thinking models**: they emit `<think>...</think>` tokens before the visible
answer. That causes two problems for cognee:

- **Cost / latency.** Every extraction call burns thousands of reasoning tokens
  before the structured output.
- **Instructor breaks.** Cognee uses `instructor` to parse structured output as
  JSON, and a reasoning preamble before the JSON throws off the parser.

You need to **disable thinking** for cognee. The canonical knob (from NVIDIA's
own docs) is:

```python
extra_body={"chat_template_kwargs": {"thinking": False}}
```

In `.env`, passed through litellm:

```
LLM_ARGS={"extra_body":{"chat_template_kwargs":{"thinking":false}}}
```

**Caveat from the build:** during this session, litellm's forwarding of
`extra_body` through the `custom` provider path was unreliable, and the flag
sometimes didn't reach NVIDIA. Calls then timed out with a 504 while the model
kept thinking without bound. **The reliable workaround is to pick a non-thinking
model** (e.g. `meta/llama-3.3-70b-instruct` on the NVIDIA catalog) rather than
fight the flag.

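
The parsing failure can be reproduced with the standard library alone. The model output below is made up. Stripping `<think>` blocks is shown only to make the failure visible: it is not what instructor does, and the recommendation above (use a non-thinking model) still stands. The last lines check that the `.env` form of the flag decodes to the same structure as the Python keyword argument. The `.env` form is JSON and therefore spells the flag `false`.

```python
import json
import re

# Matches one reasoning block plus any whitespace after it.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

# Made-up model output: a reasoning preamble followed by the JSON payload.
raw = '<think>Find the entities first...</think>\n{"nodes": [], "edges": []}'

try:
    json.loads(raw)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False  # "<think>..." is not valid JSON, so a strict parse fails

cleaned = json.loads(THINK_BLOCK.sub("", raw, count=1))
print(strict_ok, cleaned)  # False {'nodes': [], 'edges': []}

# The .env value is JSON (lowercase `false`); it decodes to the Python kwarg.
llm_args = json.loads('{"extra_body":{"chat_template_kwargs":{"thinking":false}}}')
assert llm_args["extra_body"] == {"chat_template_kwargs": {"thinking": False}}
```
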
## Embedding dimensions and limits

| Embedding model | Dim | Max input tokens | Notes |
|---|---|---|---|
| `nomic-embed-text` (Ollama, local) | 768 | 8,192 | the default; over-long chunks fail with 422 (see [Troubleshooting](./07-troubleshooting.md)) |
| `text-embedding-3-large` (OpenAI) | 3,072 | 8,191 | cloud, paid |
| `gemini-embedding-001` (Gemini AI Studio) | 3,072 | 2,048 | cloud; the older `text-embedding-004` is retired on v1beta (commit `e17f2a9`) |

The 8,192-token limit on `nomic-embed-text` is why cognify can jam on a single
oversized chunk. If you see persistent 422s on embedding, lower cognee's
`LLM_MAX_TOKENS` (the chunker uses the same value).

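
If you would rather catch oversized text before it reaches the embedder, a pre-flight split is one option. This is a hypothetical helper, not cognee's chunker, and it approximates tokens by whitespace-separated words. Real tokenizers usually produce more tokens than words, hence the default 50 % margin. For an exact count, use the model's tokenizer (the `HUGGINGFACE_TOKENIZER` set in `.env`).

```python
# Hypothetical pre-flight split; word counts stand in for token counts.
from typing import List


def split_to_budget(text: str, max_tokens: int = 8192,
                    margin: float = 0.5) -> List[str]:
    """Split text into pieces of at most max_tokens * margin words."""
    budget = max(1, int(max_tokens * margin))  # words per piece
    words = text.split()
    return [" ".join(words[i:i + budget]) for i in range(0, len(words), budget)]
```

With the defaults, a 10,000-word document becomes three pieces of at most 4,096 words each.
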
## How to switch profiles

1. Edit `~/.claude/mishkan/cognee/.env`: comment out the active block,
   uncomment the chosen profile, and set the key.
2. If the embedding dimensions changed, wipe the vector store:

   ```bash
   docker exec mishkan-cognee-pg psql -U cognee -d cognee_db -c \
     "DROP SCHEMA public CASCADE; CREATE SCHEMA public;"
   # then prune cognee's relational state
   ```

   This drops every object in the `public` schema of `cognee_db`, the same
   database the relational queries elsewhere in these docs run against, so
   documents will need to be re-ingested.
3. Recreate the services that read the env:

   ```bash
   cd ~/.claude/mishkan/cognee
   docker compose -f docker-compose.yml -f docker-compose.hardening.yml \
     -f docker-compose.selfhosted.yml -f docker-compose.ui.yml \
     --profile ui up -d --force-recreate --no-build \
     cognee-mcp cognee-backend
   ```
4. Re-run an ingest to confirm that cognify completes against the new provider.

## Sanity-check any new key before a bulk run

A 30-second curl saves hours:

```bash
K='<your-key>'
# Gemini
curl -s -w '\nHTTP %{http_code}\n' -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$K" \
  -H 'Content-Type: application/json' \
  -d '{"contents":[{"parts":[{"text":"say ok"}]}]}'

# NVIDIA / OpenAI-compatible
curl -s -w '\nHTTP %{http_code}\n' -X POST "https://integrate.api.nvidia.com/v1/chat/completions" \
  -H "Authorization: Bearer $K" -H 'Content-Type: application/json' \
  -d '{"model":"meta/llama-3.3-70b-instruct","messages":[{"role":"user","content":"reply with single word ok"}],"max_tokens":20}'
```

The `-w` option prints the HTTP status after the response body. A `200` with the
model's reply means the key and model id are good. A `429 RESOURCE_EXHAUSTED`
on the very first call means the daily quota is already gone.

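
The two outcomes described above can be encoded in a small helper for reading the probe's status and body. `interpret_probe` is hypothetical. It only checks for the `RESOURCE_EXHAUSTED` string that Gemini returns, so 429 bodies from other providers fall through to `rate-limited`. `first_call` marks the case the text describes: a 429 on the very first call.

```python
# Hypothetical helper for reading the curl probe results; the labels are
# illustrative, not output of any tool.
def interpret_probe(status: int, body: str, first_call: bool = True) -> str:
    if status == 200:
        return "ok"  # key and model id are good
    if status == 429 and "RESOURCE_EXHAUSTED" in body:
        # Refused before any real work: the daily cap is the likely culprit.
        return "daily-quota" if first_call else "quota-or-rate"
    if status == 429:
        return "rate-limited"
    return "other"  # read the body: bad key, unknown model id, ...
```
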
## See also

- Throttle introduction: commit `70d3c2e`.
- Provider profile cleanup + Gemini embedding model fix: commit `e17f2a9`.
- Hybrid Gemini-LLM + Ollama-embed: live `.env` evolution during the build.
- Cognee provider catalog: `~/.claude/mishkan/cognee/_src/cognee/.env.template`
  (read-only reference).
- Why daily cap fixes are out of in-process scope: [Troubleshooting §RPD](./07-troubleshooting.md#daily-quota-rpd-wall).
@@ -0,0 +1,316 @@
# 07 — Troubleshooting

> A cookbook of the issues hit during the build and how they were resolved.
> Each entry: symptom → root cause → fix → commit / file reference.

## cognify errors with `Status code: 422`

The 422 is cognee's generic wrapper; the actual cause is upstream. There are two
distinct classes:

### (a) Embedding 422 — `Failed to index data points using model nomic-embed-text`

**Symptom**

```
EmbeddingException: Failed to index data points using model nomic-embed-text:latest (Status code: 422)
```

repeated in a retry loop on the same data item.

**Causes (in order of likelihood)**

1. A chunk exceeds the model's input limit (8,192 tokens for nomic), and cognee
   keeps retrying the same offending chunk.
2. Ollama transiently failed to load the embedding model (resource pressure)
   and is still warming back up.

**Fix**

- Identify the failing document. Most often it's the largest in the dataset.
- Either remove its `mishkan: ingest` tag so that selective ingest skips it (see
  [05](./05-selective-ingest.md)), or reduce cognee's chunk target by lowering
  `LLM_MAX_TOKENS` in `.env`.
- For transient Ollama load failures, wait, then re-run cognify. The persistent
  storage fix (commit `e24fabf`) means progress sticks across retries.

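
The first fix step, finding the largest document, takes a few lines of Python. `largest_docs` is a hypothetical helper. File size on disk is only a rough proxy for token count, so treat the result as a list of suspects rather than a diagnosis.

```python
# Hypothetical helper: rank candidate documents by size, largest first.
from pathlib import Path
from typing import List, Tuple


def largest_docs(paths: List[str], top: int = 3) -> List[Tuple[str, int]]:
    sized = [(p, Path(p).stat().st_size) for p in paths]
    sized.sort(key=lambda item: item[1], reverse=True)
    return sized[:top]
```

For example, `largest_docs([str(p) for p in Path("docs").rglob("*.md")])` lists the three biggest markdown files under `docs/`.
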
+ ### (b) LLM 422 — `Pipeline run failed. Data item could not be processed.`
37
+
38
+ **Symptom**
39
+
40
+ ```
41
+ PipelineRunFailedError: Pipeline run failed. Data item could not be processed. (Status code: 422)
42
+ ```
43
+
44
+ **Causes**
45
+
46
+ - Free-tier cloud LLM hit per-minute or per-day cap → 429 wrapped as 422.
47
+ - Thinking model emitting reasoning before the structured output → instructor
48
+ fails to parse JSON.
49
+
50
+ **Fix**
51
+
52
+ - Per-minute: enable the throttle (`LLM_RATE_LIMIT_ENABLED=true`,
53
+ `LLM_RATE_LIMIT_REQUESTS=8`, `LLM_RATE_LIMIT_INTERVAL=60`). See commit
54
+ `70d3c2e`.
55
+ - Per-day: see [Daily quota wall](#daily-quota-rpd-wall) below.
56
+ - Thinking model: switch to a non-thinking model on the same key (e.g.
57
+ `meta/llama-3.3-70b-instruct` on NVIDIA Catalog). See
58
+ [LLM providers — thinking-model trap](./06-llm-providers.md#the-thinking-model-trap).
59
+
## Cognify stuck on the last doc

**Symptom** — most pipeline runs are COMPLETED, but one has sat in STARTED for
half an hour and the graph is not advancing.

**Diagnosis** — check what cognee is actually doing:

```bash
# is the LLM endpoint being called at all?
docker logs --since 5m mishkan-cognee-mcp 2>&1 | grep -iE "extraction|nodes_extracted|429|timeout|embedding"

# the cognee internal log file (more detail)
docker exec mishkan-cognee-mcp sh -c 'tail -300 /home/cognee/.cognee/logs/$(ls -t /home/cognee/.cognee/logs/ | head -1)' \
  | grep -iE "error|exception|retry|429" | tail -20
```

**Common root causes**

- Embedding 422 retry loop on one chunk (above).
- Stale `DATASET_PROCESSING_STARTED` row blocking re-runs (below).
- Daily quota exhausted mid-run (below).

## Stale pipeline lock — `Dataset is already being processed`

**Symptom** — `cognee.cognify(datasets=[...])` returns immediately and the logs say
*"Dataset is already being processed"*. The work graph doesn't grow.

**Cause** — a previous cognify died (timeout, OOM, interruption) without
clearing its `DATASET_PROCESSING_STARTED` row in `pipeline_runs`. Cognee's
qualification check refuses to start a new run while one is still marked "in progress".

**Fix**

```bash
docker exec mishkan-cognee-pg psql -U cognee -d cognee_db -c \
  "UPDATE pipeline_runs SET status='DATASET_PROCESSING_ERRORED'
   WHERE status='DATASET_PROCESSING_STARTED'
     AND created_at < NOW() - INTERVAL '5 minutes';"
```

Then re-run cognify. The dataset and its data items are intact; only the status
of the stale lock row changes.

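Before running the UPDATE, it helps to see exactly which rows its `WHERE` clause selects. This sketch replays that predicate over in-memory rows. `PipelineRun` is a hypothetical stand-in that holds only the columns the fix reads, and the five-minute threshold is the one from the SQL above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class PipelineRun:
    # Hypothetical stand-in for a pipeline_runs row (only the columns the fix reads).
    run_id: str
    status: str
    created_at: datetime

def stale_runs(rows, now, max_age=timedelta(minutes=5)):
    """Rows the UPDATE's WHERE clause would match: still STARTED and older than max_age."""
    return [
        r for r in rows
        if r.status == "DATASET_PROCESSING_STARTED" and r.created_at < now - max_age
    ]

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    PipelineRun("a", "DATASET_PROCESSING_COMPLETED", now - timedelta(hours=1)),
    PipelineRun("b", "DATASET_PROCESSING_STARTED", now - timedelta(minutes=30)),
    PipelineRun("c", "DATASET_PROCESSING_STARTED", now - timedelta(minutes=2)),
]
print([r.run_id for r in stale_runs(rows, now)])  # ['b']
```

The predicate is purely age-based, so a cognify that is genuinely still running after five minutes also matches. Run the fix only when no cognify is in progress, or preview first by running the same `WHERE` clause as a `SELECT`.
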
## Storage wiped on every `docker compose up --force-recreate`

**Symptom** — re-running cognify on an existing dataset fails with

```
FileNotFoundError: Storage directory does not exist
```

even though the data items are still listed in `datasets`.

**Cause** — by default, cognee's data and system roots are relative to its venv
inside the container (`.venv/.../cognee/.cognee_data` and `.cognee_system`),
which lives in the container's ephemeral layer. The Docker volume that ships with
the compose file was mounted at `/app/cognee-mcp/.cognee_system`, but cognee
didn't write there by default, so every recreate wiped the ingested source files.

**Fix (already in payload from commit `e24fabf`)** — point cognee at the
mounted volume via `.env`:

```
DATA_ROOT_DIRECTORY=/app/cognee-mcp/.cognee_system/data
SYSTEM_ROOT_DIRECTORY=/app/cognee-mcp/.cognee_system/system
```

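You can catch a `.env` that still leaves either root on the ephemeral layer by checking both settings against the mount point. This is a minimal sketch that assumes the mount path from the compose setup above and Python 3.9+ for `PurePosixPath.is_relative_to`. The `.env` parser is deliberately simplistic, and the check is purely lexical: it does not resolve `..` or symlinks.

```python
from pathlib import PurePosixPath

# Mount point of the named volume, as described in the cause above.
VOLUME_MOUNT = PurePosixPath("/app/cognee-mcp/.cognee_system")

def parse_env(text: str) -> dict:
    """Minimal KEY=VALUE parser for a .env file (skips blank lines and # comments)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env

def roots_outside_volume(env: dict, keys=("DATA_ROOT_DIRECTORY", "SYSTEM_ROOT_DIRECTORY")):
    """Return the root settings that are unset or point outside the mounted volume."""
    bad = {}
    for key in keys:
        value = env.get(key)
        if value is None or not PurePosixPath(value).is_relative_to(VOLUME_MOUNT):
            bad[key] = value
    return bad
```

Usage: `roots_outside_volume(parse_env(open(".env").read()))` returns `{}` when both roots live on the volume.
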
127
+ The Dockerfile now pre-creates `.cognee_system` as the `cognee` user (uid
128
+ 10001), so a fresh named volume inherits writable ownership without a manual
129
+ chown.
130
+
131
+ **If you upgrade from a pre-`e24fabf` install** — the existing volume is
132
+ root-owned. Chown it once:
133
+
134
+ ```bash
135
+ docker run --rm -u 0 -v mishkan-cognee_cognee_data:/v busybox \
136
+ sh -c 'chown -R 10001:10001 /v'
137
+ docker compose ... up -d --force-recreate cognee-mcp
138
+ ```
139
+
## Curated library is showing inside the work UI

**Symptom** — the Cognee UI at `:7724` (work backend) shows `CuratedResource`
nodes mixed in with project data.

**Cause** — the curated library was seeded into the work store by mistake. This
happened during the build (the seed initially ran against the work box). The
real fix is physical separation per D-007, which is what the curated box exists for.

**Fix**

1. Ensure the curated box is running (`scripts/ensure-curated-box.sh`).
2. Re-run the curated seed against `mishkan-curated-mcp` (the script's default
   container since commit `086e80e`).
3. Delete the nodes labelled `CuratedResource` and `Team` from the work Neo4j:
   ```bash
   P='<work neo4j password from .env>'
   docker exec mishkan-cognee-neo4j cypher-shell -u neo4j -p "$P" \
     "MATCH (n:CuratedResource) DETACH DELETE n;"
   docker exec mishkan-cognee-neo4j cypher-shell -u neo4j -p "$P" \
     "MATCH (n:Team) DETACH DELETE n;"
   ```
4. Drop the stray `curated_library` dataset row from the work `cognee_db` via
   cognee's `delete_dataset` API (see commit `418d10a` for the exact cleanup
   pattern used during the build).

`claude_code_memory` is **not** stray; it is the per-client memory dataset.
Don't delete it.

## Neo4j Browser "Could not perform discovery. No routing servers available"

**Symptom** — Neo4j Browser on `:7716` (or `:7731`) loads, but connecting to
`neo4j://localhost:7709` fails with the routing error.

**Cause** — the `neo4j://` URI scheme triggers cluster routing discovery, which
fails against a single-instance server and over SSH tunnels.

**Fix** — use the `bolt://` scheme:

```
Connect URL: bolt://localhost:7709   # work
             bolt://localhost:7732   # curated
```

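If a script or saved connection string still uses the routing scheme, the rewrite is mechanical. Keep host and port, and swap `neo4j` for `bolt` (and the TLS variants `neo4j+s` / `neo4j+ssc` for `bolt+s` / `bolt+ssc`). This is a small sketch using only the standard library. The resulting URIs are what you would pass to the official Python driver's `GraphDatabase.driver`.

```python
from urllib.parse import urlsplit, urlunsplit

# Routing schemes mapped to their direct (single-server) counterparts.
DIRECT_SCHEME = {"neo4j": "bolt", "neo4j+s": "bolt+s", "neo4j+ssc": "bolt+ssc"}

def to_direct_uri(uri: str) -> str:
    """Rewrite a routing neo4j:// URI to a direct bolt:// URI; leave other URIs unchanged."""
    parts = urlsplit(uri)
    scheme = DIRECT_SCHEME.get(parts.scheme, parts.scheme)
    return urlunsplit(parts._replace(scheme=scheme))
```
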
## `tsh` tunnel: `Failed to bind to 127.0.0.1:NNNN: address already in use`

**Cause** — a previous tsh forward is still running on your laptop and holding
the port. tsh aborts the whole tunnel if any single bind fails.

**Fix on your laptop** (substitute the port from the error message)

```bash
lsof -nP -iTCP:7724 -sTCP:LISTEN   # find what's holding it
pkill -f 'tsh ssh'                 # kill the stale tunnel(s)
```

Then re-run the full tunnel command.

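Before re-running the tunnel, you can check which forwarded ports are still held. This sketch tries to bind each port on `127.0.0.1`, which is the same operation that makes tsh fail. The port list in the usage line comes from the ports mentioned in this guide and may not match your tunnel command.

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if a fresh bind to host:port fails, i.e. something already holds the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return True
        return False
```

Usage: `[p for p in (7709, 7716, 7724, 7731, 7732) if port_in_use(p)]`. Because it binds without `SO_REUSEADDR`, a port whose last connection is still in `TIME_WAIT` can also show up as busy.
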
## Daily quota (RPD) wall

**Symptom** — every cognify retry returns `429 RESOURCE_EXHAUSTED` instantly,
including the first request of the run. Cognee's throttle has no effect.

**Cause** — the cloud free tier's **daily** request budget is exhausted. The
throttle controls the per-minute rate; it cannot rescue a daily cap.

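Some quick arithmetic shows why the throttle cannot help here. At the settings from the 422 fix above (8 requests per 60 s), traffic runs at 8 requests per minute, so a daily cap is only delayed, never avoided. The 1,000-request cap below is a hypothetical figure, not any provider's published limit.

```python
def max_requests_per_day(requests_per_interval: int, interval_s: float) -> float:
    """Upper bound on daily requests if the throttle runs flat out all day."""
    return requests_per_interval * 86_400 / interval_s

def minutes_until_daily_cap(daily_cap: int, requests_per_interval: int, interval_s: float = 60.0) -> float:
    """Minutes of back-to-back throttled traffic before a daily request cap is exhausted."""
    per_minute = requests_per_interval * 60.0 / interval_s
    return daily_cap / per_minute

print(max_requests_per_day(8, 60))            # 11520.0
print(minutes_until_daily_cap(1000, 8, 60))   # 125.0
```
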
**Fix** — pick one:

- Wait for the cap to reset (24 h on most free tiers).
- Switch to a more generous free tier (NVIDIA API Catalog).
- Switch the work box to local Ollama (Profile A: zero cost, no quota, slow).
- Move to a paid tier on the same provider.

This is precisely why the harness recommends **local Ollama for the work store**
(see [LLM providers](./06-llm-providers.md)) when project data contains PII or is
voluminous.

## Auto-mode classifier blocks writing `.claude/settings.json` / `.mcp.json`

**Symptom** — the Claude Code auto-mode classifier denies the agent's writes to
agent-config files, even when the agent is invoked through `/mishkan-init`.

**Cause** — the classifier treats `.claude/settings.json`, `.mcp.json`,
`settings.local.json`, and (sometimes) `CLAUDE.md` as **self-modification**
and refuses autonomous writes to them.

**Fix** — pick one:

- Approve each write at the prompt.
- Disable the auto-mode classifier for this session, then re-run init.
- Add a permission rule that allows these specific writes.

No harness change is needed. This is a Claude Code platform guard doing its
job, not a MISHKAN bug.

## `afplay: not found` Stop-hook error on Linux

**Symptom** — every turn ends with

```
Stop hook error: Failed with non-blocking status code: /bin/sh: 1: afplay: not found
```

**Cause** — the personal sound hooks in `~/.claude/settings.json` use `afplay`,
which is macOS-only; the command doesn't exist on Linux.

**Fix** — make the command portable. Replace the hook command string with:

```sh
sh -c 'F="<path-to-mp3>"; { command -v afplay >/dev/null 2>&1 && afplay -v 0.1 "$F"; } || { command -v ffplay >/dev/null 2>&1 && ffplay -nodisp -autoexit -loglevel quiet -volume 10 "$F"; } || true'
```

This tries `afplay` first (macOS), falls back to `ffplay` (part of FFmpeg, on
Linux if installed), and silently does nothing if neither is present. These are
*your personal* sound hooks, not part of the MISHKAN payload, so feel free to
remove them outright if you don't want audio cues.

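If you would rather point the hook at a script than maintain the one-line `sh -c` string, the same fallback order can be written in Python. This is a hypothetical helper and is not part of the MISHKAN payload. `player_command` only decides which player to use (with an injectable `which` so it can be tested), and `play` runs it.

```python
import shutil
import subprocess

def player_command(sound_file: str, which=shutil.which):
    """Pick the first available player in afplay -> ffplay order; None if neither exists."""
    if which("afplay"):
        return ["afplay", "-v", "0.1", sound_file]
    if which("ffplay"):
        return ["ffplay", "-nodisp", "-autoexit", "-loglevel", "quiet", "-volume", "10", sound_file]
    return None  # no player installed: stay silent, like the trailing `|| true`

def play(sound_file: str) -> None:
    cmd = player_command(sound_file)
    if cmd is not None:
        # check=False: a playback failure should never surface as a hook error.
        subprocess.run(cmd, check=False, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
```

Unlike the `sh` version, this does not fall through to `ffplay` when `afplay` exists but exits non-zero.
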
## "Ghost subnet" — cognee containers can't reach each other

**Symptom** — a fresh `docker compose up` produces networking errors: the
containers come up, but communication between them times out.

**Cause** — a leftover Docker network from a previous teardown uses the same
IP range that Compose tries to allocate, and the iptables `nat` PREROUTING
rules from the dead bridge persist.

**Fix**

```bash
# identify the ghost
docker network ls
ip rule show
iptables -t nat -L PREROUTING -n -v | grep -B1 -A2 br-

# remove the offending leftover network if present
docker network rm <ghost-net-id>

# bring the stack back up
cd ~/.claude/mishkan/cognee
docker compose ... up -d
```

The fully self-hosted compose file pins the network subnet (`172.51.0.0/16`) to
avoid this class of collision going forward (decision recorded in commit
`2262ea8`).

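To confirm a collision before deleting anything, compare the subnets of the existing networks (listed under `IPAM` in `docker network inspect <network>` output) with the range Compose wants. This is a sketch with Python's `ipaddress` module, and the `existing` list is made-up example data. Note that `172.51.0.0/16` lies outside the RFC 1918 private block `172.16.0.0/12` (`PINNED.is_private` is `False`), so while the bridge exists, the host routes that whole range to it rather than to the internet.

```python
import ipaddress

PINNED = ipaddress.ip_network("172.51.0.0/16")  # subnet pinned by the self-hosted compose

def colliding(existing_subnets, wanted=PINNED):
    """Return the existing subnets that overlap the subnet Compose wants to allocate."""
    return [s for s in map(ipaddress.ip_network, existing_subnets) if s.overlaps(wanted)]

# Hypothetical subnets, as you might collect them from `docker network inspect`.
existing = ["172.17.0.0/16", "172.51.0.0/24", "192.168.0.0/20"]
print(colliding(existing))  # [IPv4Network('172.51.0.0/24')]
```
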
## Useful inspection one-liners

```bash
# container health
docker ps --filter 'name=mishkan-' --format '{{.Names}}\t{{.Status}}'

# pipeline run status (work store)
docker exec mishkan-cognee-pg psql -U cognee -d cognee_db -tc \
  "SELECT status, count(*) FROM pipeline_runs GROUP BY status;"

# graph topology (any store)
docker exec mishkan-cognee-neo4j cypher-shell -u neo4j -p '<pw>' \
  "MATCH (n) RETURN labels(n) AS labels, count(*) AS nodes ORDER BY nodes DESC;"

# what's actually listening on the host
ss -tlnp 2>/dev/null | grep -E '127\.0\.0\.1:77[0-9]{2}'

# Ollama model list and embed endpoint sanity
docker exec mishkan-ollama ollama list
docker exec mishkan-cognee-mcp sh -c \
  'python3 -c "import urllib.request,json; r=urllib.request.urlopen(urllib.request.Request(\"http://ollama:11434/api/embed\", data=json.dumps({\"model\":\"nomic-embed-text:latest\",\"input\":\"hi\"}).encode(), headers={\"Content-Type\":\"application/json\"}), timeout=10); print(r.status)"'
```

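The embed one-liner above only prints the HTTP status. Written out as a script, it can also confirm that a vector actually came back. This sketch uses only the standard library and assumes Ollama's `/api/embed` response carries an `embeddings` list, as described in Ollama's API reference. The defaults mirror the one-liner's URL and model, so it is meant to run inside the `mishkan-cognee-mcp` container, where `ollama` resolves.

```python
import json
import urllib.request

def build_embed_request(base_url: str, model: str, text: str) -> urllib.request.Request:
    """POST request for Ollama's /api/embed endpoint (same payload as the one-liner)."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def embed_dimensions(base_url="http://ollama:11434", model="nomic-embed-text:latest", text="hi", timeout=10):
    """Send the request and return the length of the first embedding vector."""
    with urllib.request.urlopen(build_embed_request(base_url, model, text), timeout=timeout) as resp:
        payload = json.load(resp)
    return len(payload["embeddings"][0])
```

A non-zero result from `embed_dimensions()` means the model loaded and produced a vector, which is a stronger signal than a bare `200`.
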
## See also

- [Memory layer](./04-memory-layer.md) — backups and volume layout.
- [LLM provider profiles](./06-llm-providers.md) — switching providers.
- [Selective ingest](./05-selective-ingest.md) — controlling what enters
  memory.
- The build's hard-won fixes are anchored in commits `e17f2a9`, `70d3c2e`,
  `e24fabf`, `418d10a`, `086e80e`, and `2262ea8`.