npm - mishkan-harness - Versions diffs - 0.1.0 - Mend

mishkan-harness 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (186)

package/LICENSE +21 -0
package/README.md +205 -0
package/bin/mishkan.js +221 -0
package/docs/design/MISHKAN_agent_aliases.md +140 -0
package/docs/design/MISHKAN_decisions.md +172 -0
package/docs/design/MISHKAN_harness_design.md +820 -0
package/docs/design/MISHKAN_ontology.md +87 -0
package/docs/design/MISHKAN_token_optimisation.md +181 -0
package/docs/engineer/README.md +37 -0
package/docs/engineer/profile.example.md +79 -0
package/docs/usage/01-installation.md +178 -0
package/docs/usage/02-project-init.md +151 -0
package/docs/usage/03-orchestration.md +218 -0
package/docs/usage/04-memory-layer.md +201 -0
package/docs/usage/05-selective-ingest.md +177 -0
package/docs/usage/06-llm-providers.md +195 -0
package/docs/usage/07-troubleshooting.md +316 -0
package/docs/usage/08-glossary.md +154 -0
package/docs/usage/09-workflows.md +123 -0
package/docs/usage/README.md +77 -0
package/package.json +43 -0
package/payload/install/settings.hooks.json +47 -0
package/payload/mishkan/AGENT_SPEC.md +154 -0
package/payload/mishkan/agents/ahikam.md +58 -0
package/payload/mishkan/agents/aholiab.md +68 -0
package/payload/mishkan/agents/asaph.md +73 -0
package/payload/mishkan/agents/baruch.md +88 -0
package/payload/mishkan/agents/benaiah.md +76 -0
package/payload/mishkan/agents/bezalel.md +83 -0
package/payload/mishkan/agents/caleb.md +74 -0
package/payload/mishkan/agents/deborah.md +63 -0
package/payload/mishkan/agents/elasah.md +58 -0
package/payload/mishkan/agents/eliashib.md +68 -0
package/payload/mishkan/agents/ezra.md +69 -0
package/payload/mishkan/agents/hanun.md +64 -0
package/payload/mishkan/agents/hiram.md +68 -0
package/payload/mishkan/agents/hizkiah.md +76 -0
package/payload/mishkan/agents/huldah.md +59 -0
package/payload/mishkan/agents/huram.md +66 -0
package/payload/mishkan/agents/hushai.md +59 -0
package/payload/mishkan/agents/igal.md +58 -0
package/payload/mishkan/agents/ira.md +86 -0
package/payload/mishkan/agents/jahaziel.md +71 -0
package/payload/mishkan/agents/jakin.md +66 -0
package/payload/mishkan/agents/jehonathan.md +62 -0
package/payload/mishkan/agents/jehoshaphat.md +68 -0
package/payload/mishkan/agents/joab.md +71 -0
package/payload/mishkan/agents/joah.md +62 -0
package/payload/mishkan/agents/maaseiah.md +61 -0
package/payload/mishkan/agents/meremoth.md +65 -0
package/payload/mishkan/agents/meshullam.md +67 -0
package/payload/mishkan/agents/nathan.md +70 -0
package/payload/mishkan/agents/nehemiah.md +93 -0
package/payload/mishkan/agents/obed.md +60 -0
package/payload/mishkan/agents/oholiab.md +67 -0
package/payload/mishkan/agents/palal.md +63 -0
package/payload/mishkan/agents/phinehas.md +73 -0
package/payload/mishkan/agents/rehum.md +60 -0
package/payload/mishkan/agents/salma.md +69 -0
package/payload/mishkan/agents/seraiah.md +73 -0
package/payload/mishkan/agents/shallum.md +66 -0
package/payload/mishkan/agents/shaphan.md +64 -0
package/payload/mishkan/agents/shemaiah.md +67 -0
package/payload/mishkan/agents/shevna.md +58 -0
package/payload/mishkan/agents/uriah.md +70 -0
package/payload/mishkan/agents/zaccur.md +58 -0
package/payload/mishkan/agents/zadok.md +67 -0
package/payload/mishkan/agents/zerubbabel.md +69 -0
package/payload/mishkan/cognee/.env.curated.example +61 -0
package/payload/mishkan/cognee/.env.example +165 -0
package/payload/mishkan/cognee/Dockerfile +50 -0
package/payload/mishkan/cognee/README.md +129 -0
package/payload/mishkan/cognee/docker-compose.curated-ui.yml +61 -0
package/payload/mishkan/cognee/docker-compose.curated.yml +85 -0
package/payload/mishkan/cognee/docker-compose.hardening.yml +16 -0
package/payload/mishkan/cognee/docker-compose.selfhosted.yml +114 -0
package/payload/mishkan/cognee/docker-compose.ui.yml +70 -0
package/payload/mishkan/cognee/docker-compose.yml +71 -0
package/payload/mishkan/cognee/ingest-curated.py +92 -0
package/payload/mishkan/commands/dep-audit.md +24 -0
package/payload/mishkan/commands/mishkan-init.md +25 -0
package/payload/mishkan/commands/mishkan-resume.md +21 -0
package/payload/mishkan/commands/promote.md +19 -0
package/payload/mishkan/commands/sefer-pull.md +19 -0
package/payload/mishkan/commands/sprint-close.md +21 -0
package/payload/mishkan/config/curated-library.yaml +113 -0
package/payload/mishkan/config/improvement-queries.md +29 -0
package/payload/mishkan/config/model-routing.yaml +87 -0
package/payload/mishkan/config/projects.yaml +38 -0
package/payload/mishkan/evals/baruch/README.md +93 -0
package/payload/mishkan/evals/baruch/fixtures/invalid/bad-outcome-enum.json +15 -0
package/payload/mishkan/evals/baruch/fixtures/invalid/bad-sprint-pattern.json +15 -0
package/payload/mishkan/evals/baruch/fixtures/invalid/bad-trigger-enum.json +15 -0
package/payload/mishkan/evals/baruch/fixtures/invalid/malformed-json.json +7 -0
package/payload/mishkan/evals/baruch/fixtures/invalid/missing-required-field.json +14 -0
package/payload/mishkan/evals/baruch/fixtures/valid/blocked-vendor.json +15 -0
package/payload/mishkan/evals/baruch/fixtures/valid/curated-shortcircuit.json +15 -0
package/payload/mishkan/evals/baruch/fixtures/valid/partial-no-write.json +14 -0
package/payload/mishkan/evals/baruch/fixtures/valid/resolved-cross-harness.json +15 -0
package/payload/mishkan/evals/baruch/golden_case/expected.yaml +35 -0
package/payload/mishkan/evals/baruch/golden_case/input.yaml +47 -0
package/payload/mishkan/evals/baruch/golden_case/produced.json +15 -0
package/payload/mishkan/evals/baruch/run.sh +129 -0
package/payload/mishkan/hooks/model-route.py +96 -0
package/payload/mishkan/hooks/post-tool-observe.sh +45 -0
package/payload/mishkan/hooks/pre-tool-security.sh +150 -0
package/payload/mishkan/hooks/session-start.sh +20 -0
package/payload/mishkan/hooks/stop-reporter.sh +29 -0
package/payload/mishkan/ontology.md +87 -0
package/payload/mishkan/rules/backend/yasad.md +23 -0
package/payload/mishkan/rules/common/dependencies.md +53 -0
package/payload/mishkan/rules/common/quality.md +16 -0
package/payload/mishkan/rules/common/security.md +20 -0
package/payload/mishkan/rules/documentation/sefer.md +19 -0
package/payload/mishkan/rules/frontend/panim.md +21 -0
package/payload/mishkan/rules/infrastructure/migdal.md +22 -0
package/payload/mishkan/scripts/dependency-audit.sh +171 -0
package/payload/mishkan/scripts/ensure-curated-box.sh +66 -0
package/payload/mishkan/scripts/mishkan-ingest.sh +92 -0
package/payload/mishkan/scripts/observability-aggregate.sh +57 -0
package/payload/mishkan/scripts/seed-curated-library.sh +62 -0
package/payload/mishkan/scripts/sync-profile.sh +65 -0
package/payload/mishkan/scripts/validate-research-log.sh +108 -0
package/payload/mishkan/skills/asaph-a11y-seo-craft/SKILL.md +289 -0
package/payload/mishkan/skills/baruch-research-reporting-craft/SKILL.md +460 -0
package/payload/mishkan/skills/benaiah-devsecops-craft/SKILL.md +329 -0
package/payload/mishkan/skills/bezalel-cto-craft/SKILL.md +391 -0
package/payload/mishkan/skills/caleb-web-research-craft/SKILL.md +306 -0
package/payload/mishkan/skills/cognee-promote/SKILL.md +40 -0
package/payload/mishkan/skills/cognee-quickstart/SKILL.md +66 -0
package/payload/mishkan/skills/context-compress/SKILL.md +36 -0
package/payload/mishkan/skills/deborah-ux-craft/SKILL.md +295 -0
package/payload/mishkan/skills/dependency-audit/SKILL.md +59 -0
package/payload/mishkan/skills/dependency-vetting/SKILL.md +59 -0
package/payload/mishkan/skills/documentation-craft/SKILL.md +468 -0
package/payload/mishkan/skills/ezra-research-formulation-craft/SKILL.md +319 -0
package/payload/mishkan/skills/hanun-observability-craft/SKILL.md +312 -0
package/payload/mishkan/skills/hiram-ui-craft/SKILL.md +334 -0
package/payload/mishkan/skills/hizkiah-implementation-craft/SKILL.md +701 -0
package/payload/mishkan/skills/hushai-security-advisor-craft/SKILL.md +282 -0
package/payload/mishkan/skills/ira-code-security-craft/SKILL.md +553 -0
package/payload/mishkan/skills/jakin-intent-clarification-craft/SKILL.md +299 -0
package/payload/mishkan/skills/jehonathan-publication-craft/SKILL.md +262 -0
package/payload/mishkan/skills/joab-app-security-craft/SKILL.md +266 -0
package/payload/mishkan/skills/meremoth-devops-craft/SKILL.md +298 -0
package/payload/mishkan/skills/meshullam-infra-design-craft/SKILL.md +302 -0
package/payload/mishkan/skills/mishkan-ingest/SKILL.md +65 -0
package/payload/mishkan/skills/mishkan-init/SKILL.md +65 -0
package/payload/mishkan/skills/nathan-architecture-craft/SKILL.md +547 -0
package/payload/mishkan/skills/nehemiah-pm-craft/SKILL.md +484 -0
package/payload/mishkan/skills/obed-asset-pipeline-craft/SKILL.md +286 -0
package/payload/mishkan/skills/oholiab-design-system-craft/SKILL.md +334 -0
package/payload/mishkan/skills/palal-systems-craft/SKILL.md +281 -0
package/payload/mishkan/skills/qa-evaluation-craft/SKILL.md +406 -0
package/payload/mishkan/skills/rehum-sre-advisor-craft/SKILL.md +228 -0
package/payload/mishkan/skills/reporter-discipline-craft/SKILL.md +351 -0
package/payload/mishkan/skills/research-pipeline/SKILL.md +55 -0
package/payload/mishkan/skills/salma-frontend-implementation-craft/SKILL.md +369 -0
package/payload/mishkan/skills/sefer-pull/SKILL.md +37 -0
package/payload/mishkan/skills/shallum-database-craft/SKILL.md +347 -0
package/payload/mishkan/skills/shaphan-summarisation-craft/SKILL.md +271 -0
package/payload/mishkan/skills/shemaiah-evaluation-craft/SKILL.md +342 -0
package/payload/mishkan/skills/sprint-report/SKILL.md +28 -0
package/payload/mishkan/skills/team-lead-craft/SKILL.md +457 -0
package/payload/mishkan/skills/zadok-contract-craft/SKILL.md +520 -0
package/payload/mishkan/templates/case-node.schema.json +22 -0
package/payload/mishkan/templates/mcp.json +22 -0
package/payload/mishkan/templates/observability-log.schema.json +24 -0
package/payload/mishkan/templates/project-CLAUDE.md +47 -0
package/payload/mishkan/templates/research-log.schema.json +40 -0
package/payload/mishkan/templates/settings.json +12 -0
package/payload/mishkan/templates/settings.local.json +6 -0
package/payload/mishkan/templates/sprint-state.schema.json +47 -0
package/payload/mishkan/templates/team-report.schema.json +50 -0
package/payload/mishkan/templates/user-CLAUDE.md +62 -0
package/payload/mishkan/workflows/README.md +88 -0
package/payload/mishkan/workflows/mishkan-architecture-panel.js +156 -0
package/payload/mishkan/workflows/mishkan-codebase-audit.js +188 -0
package/payload/mishkan/workflows/mishkan-deep-research.js +251 -0
package/payload/mishkan/workflows/mishkan-init.js +156 -0
package/payload/mishkan/workflows/mishkan-migration-wave.js +180 -0
package/payload/mishkan/workflows/mishkan-release-readiness.js +163 -0
package/payload/mishkan/workflows/mishkan-sprint-close.js +112 -0
package/payload/user/CLAUDE.md +62 -0
package/payload/user/rules/engineer-standards.md +66 -0
package/payload/user/rules/y4nn-standards.md +167 -0

package/docs/usage/05-selective-ingest.md ADDED

@@ -0,0 +1,177 @@
+# 05 — Selective ingest
+> Goal: explain how documents enter the work cognee graph, and why the default
+> is "memory is opt-in, not bulk".
+## The contract
+**Nothing enters the work graph unless tagged or explicitly invoked.** This is
+the harness-wide rule that the `mishkan-ingest` skill enforces. It solves two
+real problems hit during the build (commit `6213611`):
+- **PII bleed.** Bulk-ingesting `docs/` pulls in incident reports that contain
+  real email addresses, internal hostnames, ticket numbers — all of which then
+  sit in the project graph alongside curated reference material.
+- **Oversized-doc embedding failures.** `nomic-embed-text` rejects chunks
+  larger than 8,192 tokens with a 422; one too-large document jams cognify
+  retries indefinitely.
+Both go away when you choose what enters memory deliberately.
+## Two ways to select
+### 1. Frontmatter tag (standing intent)
+Add a YAML frontmatter block at the very top of a doc:
+```yaml
+---
+mishkan: ingest
+---
+# Doc title
+…
+```
+That single key is enough. Any other frontmatter (`author`, `date`, etc.)
+co-exists fine. The tag means *"this doc is part of the project's persistent
+memory"*. The skill's default mode walks `./docs/` and ingests every tagged file.
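The tag check described above can be sketched in Python. This is a minimal illustration and not the logic shipped in `mishkan-ingest.sh`: the names `has_ingest_tag` and `tagged_docs` are hypothetical, and the sketch scans lines instead of using a YAML parser, so it assumes the key is written exactly as `mishkan: ingest` on its own line.

```python
from pathlib import Path


def has_ingest_tag(text):
    """Return True if the text opens with frontmatter containing `mishkan: ingest`."""
    lines = text.splitlines()
    # Frontmatter only counts when it starts on the very first line.
    if not lines or lines[0].strip() != "---":
        return False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            # Closing delimiter reached without seeing the tag.
            return False
        if stripped == "mishkan: ingest":
            return True
    # Frontmatter never closed and no tag was seen: treat as untagged.
    return False


def tagged_docs(root="docs"):
    """Hypothetical helper: collect tagged `.md` files under `root`."""
    return sorted(
        p for p in Path(root).rglob("*.md")
        if has_ingest_tag(p.read_text(encoding="utf-8", errors="replace"))
    )
```

Other frontmatter keys do not affect the result, which mirrors the statement that `author`, `date`, and similar keys co-exist with the tag.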
+### 2. Explicit paths (ad-hoc pull)
+Skip the tag, name the files:
+```bash
+bash ~/.claude/mishkan/scripts/mishkan-ingest.sh docs/SECURITY.md docs/ROADMAP.md
+```
+Useful for one-off pulls or when the doc lives outside the standard `docs/`
+tree.
+## Invoking the skill
+```bash
+# Default — walk ./docs/ for tagged docs
+bash ~/.claude/mishkan/scripts/mishkan-ingest.sh --tagged-only
+# Explicit files (no tag required)
+bash ~/.claude/mishkan/scripts/mishkan-ingest.sh path/to/a.md path/to/b.md
+# Override the dataset name (default is basename of $PWD)
+bash ~/.claude/mishkan/scripts/mishkan-ingest.sh --dataset=research docs/research.md
+# Show inline help
+bash ~/.claude/mishkan/scripts/mishkan-ingest.sh --help
+```
+## What the skill runs
+1. **Selects files.** Tagged-only walks `./docs/` (or any directory you pass)
+   and keeps only `.md` files whose YAML frontmatter contains `mishkan: ingest`.
+   Explicit paths skip the filter.
+2. **Stages** the files into the work cognee-mcp container at
+   `/home/cognee/ingest_buf/`.
+3. **Runs `cognee.add(files, dataset_name=<project>)`** — registers and chunks
+   under the target dataset.
+4. **`cognify(datasets=[<project>])`** — LLM extracts entities + relationships.
+   Subject to the work box's `LLM_RATE_LIMIT_*` throttle and now-persistent
+   storage (commits `70d3c2e` + `e24fabf`).
+5. **`memify(dataset=<project>)`** — embeds the triplet layer into pgvector
+   (commit `210f92b` made this automatic after every cognify).
+Output marks each step: `>> added N file(s) -> <dataset>`, `>> cognified`,
+`>> memified`.
+## Naming the target dataset
+By default the dataset is named after the project directory:
+```bash
+cd ~/code/aiobi-mail
+bash ~/.claude/mishkan/scripts/mishkan-ingest.sh --tagged-only
+# → ingests into dataset "aiobi-mail" in the work store
+```
+Override with `--dataset=<name>` when:
+- You want a sub-corpus of the project (`--dataset=architecture-only`).
+- The basename collides with another project (rename one).
+- You want to ingest into a personal dataset (e.g. `--dataset=research`).
+The skill **never** writes to `cognee-curated`. The curated store is read-only
+in normal use; only the harness's `seed-curated-library.sh` writes to it, and
+that targets `mishkan-curated-mcp` explicitly.
+## A worked example
+Tag two docs and leave the rest untouched:
+```bash
+cd ~/code/aiobi-mail
+# tag SECURITY.md and ROADMAP.md
+for f in docs/SECURITY.md docs/ROADMAP.md; do
+  if ! head -1 "$f" | grep -qx '---'; then
+    printf '%s\n%s\n%s\n\n' '---' 'mishkan: ingest' '---' | cat - "$f" > "$f.tmp" && mv "$f.tmp" "$f"
+  fi
+done
+# run the skill
+bash ~/.claude/mishkan/scripts/mishkan-ingest.sh --tagged-only
+# verify in the graph
+docker exec mishkan-cognee-pg psql -U cognee -d cognee_db -tc \
+  "SELECT d.name, count(dd.data_id) AS items
+   FROM datasets d LEFT JOIN dataset_data dd ON dd.dataset_id=d.id
+   WHERE d.name='aiobi-mail' GROUP BY d.name;"
+```
+The other docs in `docs/` (the 79KB migration report, French PDFs, stale
+upstream READMEs) stay out of the graph. Re-running the skill picks up newly
+tagged files only; previously-cognified docs are skipped via cognee's pipeline
+status.
+## Re-ingesting after a doc changes
+The skill is additive. To refresh a doc that's already been cognified:
+1. Edit the doc.
+2. Mark its existing dataset entry as needing reprocessing (cognee tracks
+   per-data-item pipeline status). If you want a clean reset instead:
+   ```python
+   # one-shot, run inside the work mcp container
+   import asyncio, cognee
+   from cognee.modules.users.methods import get_default_user
+   from cognee.modules.data.methods import get_datasets, delete_dataset
+   async def m():
+       u = await get_default_user()
+       for d in await get_datasets(u.id):
+           if d.name == "<project>": await delete_dataset(d)
+   asyncio.run(m())
+   ```
+3. Rerun `mishkan-ingest.sh --tagged-only`.
+The reset snippet in step 2 removes the relational dataset records cleanly.
+Note: with cognee access control off, deleting a dataset does **not** remove
+the graph nodes. For a true reset, also drop the graph labels for that
+dataset. See [Troubleshooting](./07-troubleshooting.md) for the cleanup
+pattern used during the build.
+## What the skill is *not*
+- Not a sync. It does not detect deletions or watch the filesystem.
+- Not a translator. Non-markdown files are skipped in directory walks.
+- Not a curation tool for the **curated** store. Curated is a separate seed
+  flow (`seed-curated-library.sh` against the curated MCP).
+- Not an autonomous "cognee always knows everything" mechanism. The whole
+  point is *deliberate* memory.
+## See also
+- The skill itself: `payload/mishkan/skills/mishkan-ingest/SKILL.md`.
+- The script: `payload/mishkan/scripts/mishkan-ingest.sh`.
+- Commit `6213611` (introduction).
+- Memory layer architecture: [04](./04-memory-layer.md).
+- Provider profiles (cognify uses the LLM): [06](./06-llm-providers.md).
+- If cognify errors on the last doc:
+  [Troubleshooting](./07-troubleshooting.md#cognify-stuck-on-the-last-doc).

package/docs/usage/06-llm-providers.md ADDED

@@ -0,0 +1,195 @@
+# 06 — LLM provider profiles
+> Goal: choose the right LLM + embedding combination for cognify, avoid the
+> traps the build hit (daily caps, thinking models, oversized chunks), and
+> match provider to box.
+## The split: agents vs cognee
+There are **two different model populations** in MISHKAN and they don't have
+to use the same provider:
+| Population | What runs it | Where it's configured |
+|---|---|---|
+| **Agents** (the 45) | Claude Code's own model routing (Opus / Sonnet / Haiku) per tier | `payload/mishkan/config/model-routing.yaml` (enforced by `hooks/model-route.py`) |
+| **Cognee's `cognify` extraction + embeddings** | a provider you choose | `payload/mishkan/cognee/.env` (work box) and `.env.curated` (curated box) |
+This chapter is about the second — the model that powers `cognify`/`memify`/`search`
+inside the cognee containers. The agents' Claude tiers are covered in
+[Orchestration](./03-orchestration.md).
+## Match provider to box
+The two stores have different threat models, which translate into different
+provider choices.
+| Store | Contains | Provider recommendation | Why |
+|---|---|---|---|
+| **Work** (`:7777`) | project knowledge, may contain PII | **Local Ollama LLM** (private, no quota), or paid/no-train cloud, or a free cloud you accept training on | free cloud tiers commonly train on prompts; PII shouldn't leak |
+| **Curated** (`:7730`) | public reference resources, no PII | Any free cloud (Gemini, NVIDIA catalog, OpenRouter named-free) is fine | nothing sensitive |
+**Embeddings should be local** in both stores. Bulk ingest fires many embedding
+calls in a burst; cloud free-tier embedding endpoints 429 on that pattern. Local
+Ollama (`nomic-embed-text`, 768-dim) has no rate cap and embeds in milliseconds
+once the model is loaded.
+## The five provider profiles in `.env.example`
+The shipped `.env.example` carries five commented profiles. Pick one, uncomment,
+recreate the relevant services.
+| Profile | LLM | Embeddings | Use it when |
+|---|---|---|---|
+| **A — fully self-hosted (Ollama)** | local `qwen2.5:3b` (recommended) or `llama3.1:8b` | local `nomic-embed-text` | want privacy + zero cost, accept slower cognify; the default for work box if there's PII |
+| **B — Google Gemini** | `gemini-2.5-flash` | `gemini-embedding-001` (3072-dim) | fast cloud, **need a billing-enabled key** — bare free keys 429 immediately |
+| **C — OpenAI** | `gpt-5-mini` (or current) | `text-embedding-3-large` (3072-dim) | familiar, paid, reliable |
+| **D — Anthropic/Claude LLM + OpenAI embeddings** | `claude-sonnet-4-5` | OpenAI's | **must split** — Claude ships no embedding model |
+| **E — NVIDIA API Catalog (OpenAI-compatible)** | a non-thinking catalog model (e.g. `meta/llama-3.3-70b-instruct`) | local Ollama | **recommended low-cost cloud** — generous free testing tier, OpenAI-compatible |
+The dimensions noted in the Embeddings column matter: **embedding dimensions
+cannot change after first ingest without wiping the vector store**. Pick 768
+(Ollama / Gemini) or 3072 (OpenAI / Gemini-embedding-001) and stick with it.
+This caveat is documented in the `.env.example` header.
+## Hybrid is fine — and the recommended starting point
+Cloud LLM + local embeddings is the practical hybrid. Live `.env` example used
+during the build:
+```
+LLM_PROVIDER=gemini
+LLM_MODEL=gemini/gemini-2.5-flash
+LLM_API_KEY=<billed key>
+EMBEDDING_PROVIDER=ollama
+EMBEDDING_MODEL=nomic-embed-text:latest
+EMBEDDING_ENDPOINT=http://ollama:11434/api/embed
+EMBEDDING_DIMENSIONS=768
+HUGGINGFACE_TOKENIZER=nomic-ai/nomic-embed-text-v1.5
+```
+After the NVIDIA pivot (when Gemini's daily cap kept being hit):
+```
+LLM_PROVIDER=custom
+LLM_MODEL=openai/meta/llama-3.3-70b-instruct
+LLM_ENDPOINT=https://integrate.api.nvidia.com/v1
+LLM_API_KEY=<nvapi-...>
+LLM_MAX_TOKENS=16384
+# embeddings unchanged (local Ollama)
+```
+## Rate cap vs daily cap — they are different walls
+This caught the build out and bears repeating:
+| Cap | Symptom | What helps |
+|---|---|---|
+| **Per-minute (RPM)** | `429` mid-run, after a burst of fast calls | `LLM_RATE_LIMIT_*` throttle in `.env` (8 req/60s default) |
+| **Per-day (RPD)** | `429 RESOURCE_EXHAUSTED` early in a run that's been quiet | **nothing in-process helps** — wait for reset, switch provider, or use a paid tier |
+The throttle (`LLM_RATE_LIMIT_ENABLED=true`, `_REQUESTS=8`, `_INTERVAL=60`) was
+added in commit `70d3c2e` after Gemini free-tier bulk-cognify kept blowing the
+per-minute window. It cannot rescue you from RPD — Gemini's free RPD is small
+enough that even one large doc cognify can exhaust it.
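What a limit of 8 requests per 60 seconds means in practice can be sketched as a sliding-window limiter. This is an illustration only: cognee's own throttle implementation is not shown in this chapter, and the class name `SlidingWindowThrottle` and its parameters are hypothetical.

```python
import time
from collections import deque


class SlidingWindowThrottle:
    """Allow at most `max_requests` calls per `interval` seconds."""

    def __init__(self, max_requests=8, interval=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.interval = interval
        self.clock = clock
        self._stamps = deque()  # start times of calls still inside the window

    def wait_time(self):
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have left the window.
        while self._stamps and now - self._stamps[0] >= self.interval:
            self._stamps.popleft()
        if len(self._stamps) < self.max_requests:
            return 0.0
        return self.interval - (now - self._stamps[0])

    def acquire(self, sleep=time.sleep):
        """Block until a slot is free, then record the call."""
        delay = self.wait_time()
        if delay > 0:
            sleep(delay)
        # Prune again after sleeping so the oldest call is dropped.
        self.wait_time()
        self._stamps.append(self.clock())
```

A limiter like this only spaces calls out within the window. It has no view of the provider's daily budget, which is why no in-process setting can help once RPD is exhausted.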
+If you keep hitting RPD on a free cloud tier, the durable fixes are
+(in increasing severity):
+1. **Selectively ingest** (don't cognify large unneeded docs — see
+   [05](./05-selective-ingest.md)).
+2. **Switch to NVIDIA API Catalog** (Profile E) for a more generous free tier.
+3. **Switch the work box to local Ollama LLM** (Profile A) — slowest but
+   no quota wall and private.
+## The thinking-model trap
+DeepSeek V4 Pro, NVIDIA Nemotron reasoning models, and similar are **thinking
+models**: they emit `<think>...</think>` tokens before the visible answer. Two
+problems for cognee:
+- **Cost / latency.** Every extraction call burns thousands of reasoning tokens
+  before the structured output.
+- **Instructor breaks.** Cognee uses `instructor` for JSON parsing of structured
+  output. A reasoning preamble before the JSON throws off the parser.
+You need to **disable thinking** for cognee. The canonical knob (from NVIDIA's
+own docs) is:
+```python
+extra_body={"chat_template_kwargs":{"thinking": False}}
+```
+In `.env` via litellm:
+```
+LLM_ARGS={"extra_body":{"chat_template_kwargs":{"thinking":false}}}
+```
+**Caveat from the build:** litellm's `extra_body` forwarding through the
+`custom` provider path proved unreliable, and the flag sometimes didn't reach
+NVIDIA; calls then timed out with a 504 while the model kept reasoning without
+bound. **The reliable workaround is to pick a non-thinking model** (e.g.
+`meta/llama-3.3-70b-instruct` on the NVIDIA catalog) rather than fight the flag.
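Why a reasoning preamble breaks structured output can be shown with a strict JSON parse. The sketch below is not how cognee or `instructor` handle replies (their internals are not shown here). The sample reply and the helper `parse_structured` are hypothetical and only illustrate the failure mode.

```python
import json
import re

# Matches one <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def parse_structured(raw):
    """Parse a JSON object from a reply, tolerating <think> blocks around it."""
    cleaned = THINK_BLOCK.sub("", raw).strip()
    return json.loads(cleaned)


reply = '<think>The user wants entities...</think>\n{"nodes": ["Ollama"], "edges": []}'

try:
    json.loads(reply)  # what a strict JSON parser sees
except json.JSONDecodeError as exc:
    print("strict parse failed:", exc.msg)

print(parse_structured(reply))
```

Stripping the block after the fact does nothing for the cost and latency problem, and it cannot help when the model never finishes reasoning, so choosing a non-thinking model remains the simpler route.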
+## Embedding dimensions and limits
+| Embedding model | Dim | Max tokens / chunk | Notes |
+|---|---|---|---|
+| `nomic-embed-text` (Ollama, local) | 768 | 8,192 | the default; long chunks 422 — see [Troubleshooting](./07-troubleshooting.md) |
+| `text-embedding-3-large` (OpenAI) | 3,072 | 8,191 | cloud, paid |
+| `gemini-embedding-001` (Gemini AI Studio) | 3,072 | up to ~30K | cloud; older `text-embedding-004` retired on v1beta (commit `e17f2a9`) |
+The 8,192-token limit on `nomic-embed-text` is the reason cognify can jam on a
+single oversized chunk. Lower cognee's `LLM_MAX_TOKENS` if you see persistent
+422s on embedding (the chunker uses the same value).
+## How to switch profiles
+1. Edit `~/.claude/mishkan/cognee/.env` — comment the active block, uncomment
+   the chosen profile, set the key.
+2. If embedding dimensions changed, the vector store must be wiped:
+   ```bash
+   docker exec mishkan-cognee-pg psql -U cognee -d cognee_db -c \
+     "DROP SCHEMA public CASCADE; CREATE SCHEMA public;"
+   # then prune cognee's relational state
+   ```
+   (Documents will need to be re-ingested.)
+3. Recreate the services that read the env:
+   ```bash
+   cd ~/.claude/mishkan/cognee
+   docker compose -f docker-compose.yml -f docker-compose.hardening.yml \
+                  -f docker-compose.selfhosted.yml -f docker-compose.ui.yml \
+                  --profile ui up -d --force-recreate --no-build \
+                  cognee-mcp cognee-backend
+   ```
+4. Re-run an ingest to confirm cognify completes against the new provider.
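Step 2 above depends on whether the embedding dimensions actually changed. A minimal sketch of that comparison follows, assuming both the old and the new `.env` set `EMBEDDING_DIMENSIONS` explicitly. The helper names `parse_env` and `needs_vector_wipe` are hypothetical and not part of the package.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and `#` comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first `=` only, so values may contain `=` themselves.
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def needs_vector_wipe(old_env_text, new_env_text):
    """True when EMBEDDING_DIMENSIONS is set in both files and differs."""
    old = parse_env(old_env_text).get("EMBEDDING_DIMENSIONS")
    new = parse_env(new_env_text).get("EMBEDDING_DIMENSIONS")
    return old is not None and new is not None and old != new
```

The function returns `False` when either file omits the key, which here means "unknown" and not "safe". In that case, check the model's dimensions by hand before skipping the wipe.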
+## Sanity-check any new key before a bulk run
+A 30-second curl saves hours:
+```bash
+K='<your-key>'
+# Gemini
+curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$K" \
+  -H 'Content-Type: application/json' \
+  -d '{"contents":[{"parts":[{"text":"say ok"}]}]}'
+# NVIDIA / OpenAI-compatible
+curl -s -X POST "https://integrate.api.nvidia.com/v1/chat/completions" \
+  -H "Authorization: Bearer $K" -H 'Content-Type: application/json' \
+  -d '{"model":"meta/llama-3.3-70b-instruct","messages":[{"role":"user","content":"reply with single word ok"}],"max_tokens":20}'
+```
+A `200` with the model's reply means the key + model id are good. A `429
+RESOURCE_EXHAUSTED` on the very first call means the daily quota is
+already gone.
+## See also
+- Throttle introduction: commit `70d3c2e`.
+- Provider profile cleanup + Gemini embedding model fix: commit `e17f2a9`.
+- Hybrid Gemini-LLM + Ollama-embed: live `.env` evolution during the build.
+- Cognee provider catalog: `~/.claude/mishkan/cognee/_src/cognee/.env.template`
+  (read-only reference).
+- Why daily cap fixes are out of in-process scope: [Troubleshooting §RPD](./07-troubleshooting.md#daily-quota-rpd-wall).

package/docs/usage/07-troubleshooting.md ADDED

@@ -0,0 +1,316 @@
+# 07 — Troubleshooting
+> A cookbook of the issues hit during the build and how they were resolved.
+> Each entry: symptom → root cause → fix → commit / file reference.
+## cognify errors with `Status code: 422`
+The 422 is cognee's generic wrapper; the actual cause is upstream. Two distinct
+classes:
+### (a) Embedding 422 — `Failed to index data points using model nomic-embed-text`
+**Symptom**
+```
+EmbeddingException: Failed to index data points using model nomic-embed-text:latest (Status code: 422)
+```
+repeated in a retry loop on the same data item.
+**Causes (in order of likelihood)**
+1. A chunk exceeds the model's input limit (8,192 tokens for nomic). Cognee
+   keeps retrying the same offending chunk.
+2. Ollama transiently failed to load the embedding model (resource pressure)
+   and is still warming back up.
+**Fix**
+- Identify the failing document. Most often it's the largest in the dataset.
+- Either keep it out of selective ingest by removing its `mishkan: ingest` tag
+  (see [05](./05-selective-ingest.md)), or reduce cognee's chunk target by
+  lowering `LLM_MAX_TOKENS` in `.env`.
+- For transient Ollama load failures, wait, then re-run cognify — the persistent
+  storage fix (commit `e24fabf`) means progress sticks across retries.
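Finding the likely offender can start with a size ranking of the docs. The sketch below rests on stated assumptions: the four-characters-per-token ratio is a rough heuristic and not the real tokenizer, it measures whole files and not the chunks cognee produces, and the helper names are hypothetical.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough assumption, not a real tokenizer


def estimate_tokens(text):
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN


def largest_docs(root="docs", top=5):
    """Return (estimated_tokens, path) pairs for the largest `.md` files under root."""
    sized = [
        (estimate_tokens(p.read_text(encoding="utf-8", errors="replace")), p)
        for p in Path(root).rglob("*.md")
    ]
    # Sort on the estimate only, largest first.
    return sorted(sized, key=lambda pair: pair[0], reverse=True)[:top]
```

Calling `largest_docs("docs")` from the project root gives a shortlist to try first. A large file is only a hint, since the 422 is triggered by an individual chunk.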
+### (b) LLM 422 — `Pipeline run failed. Data item could not be processed.`
+**Symptom**
+```
+PipelineRunFailedError: Pipeline run failed. Data item could not be processed. (Status code: 422)
+```
+**Causes**
+- Free-tier cloud LLM hit per-minute or per-day cap → 429 wrapped as 422.
+- Thinking model emitting reasoning before the structured output → instructor
+  fails to parse JSON.
+**Fix**
+- Per-minute: enable the throttle (`LLM_RATE_LIMIT_ENABLED=true`,
+  `LLM_RATE_LIMIT_REQUESTS=8`, `LLM_RATE_LIMIT_INTERVAL=60`). See commit
+  `70d3c2e`.
+- Per-day: see [Daily quota wall](#daily-quota-rpd-wall) below.
+- Thinking model: switch to a non-thinking model on the same key (e.g.
+  `meta/llama-3.3-70b-instruct` on NVIDIA Catalog). See
+  [LLM providers — thinking-model trap](./06-llm-providers.md#the-thinking-model-trap).
+## Cognify stuck on the last doc
+**Symptom** — pipeline runs are mostly COMPLETED, one stuck as STARTED for
+half an hour, graph not advancing.
+**Diagnosis** — check what cognee is actually doing:
+```bash
+# is the LLM endpoint being called at all?
+docker logs --since 5m mishkan-cognee-mcp 2>&1 | grep -iE "extraction|nodes_extracted|429|timeout|embedding"
+# the cognee internal log file (more detail)
+docker exec mishkan-cognee-mcp sh -c 'tail -300 /home/cognee/.cognee/logs/$(ls -t /home/cognee/.cognee/logs/ | head -1)' \
+  | grep -iE "error|exception|retry|429" | tail -20
+```
+**Common root causes**
+- Embedding 422 retry loop on one chunk (above).
+- Stale `DATASET_PROCESSING_STARTED` row blocking re-runs (below).
+- Daily quota exhausted mid-run (below).
+## Stale pipeline lock — `Dataset is already being processed`
+**Symptom** — `cognee.cognify(datasets=[...])` returns immediately, logs say
+*"Dataset is already being processed"*. The work graph doesn't grow.
+**Cause** — a previous cognify died (timeout, OOM, interrupted) without
+clearing its `DATASET_PROCESSING_STARTED` row in `pipeline_runs`. Cognee's
+qualification check refuses to start a new run while one is "in progress".
+**Fix**
+```bash
+docker exec mishkan-cognee-pg psql -U cognee -d cognee_db -c \
+  "UPDATE pipeline_runs SET status='DATASET_PROCESSING_ERRORED'
+   WHERE status='DATASET_PROCESSING_STARTED'
+     AND created_at < NOW() - INTERVAL '5 minutes';"
+```
+Then re-run cognify. The dataset and its data items are intact; only the stale
+lock row is cleared.
+## Storage wiped on every `docker compose up --force-recreate`
+**Symptom** — re-running cognify on an existing dataset errors with
+```
+FileNotFoundError: Storage directory does not exist
+```
+even though the data items are still listed in `datasets`.
+**Cause** — cognee's default data + system root is venv-relative inside the
+container (`.venv/.../cognee/.cognee_data` and `.cognee_system`), which is the
+container's ephemeral layer. The Docker volume that ships with the compose was
+mounted at `/app/cognee-mcp/.cognee_system` but cognee didn't write there by
+default — so every recreate wiped the ingested source files.
+**Fix (already in payload from commit `e24fabf`)** — point cognee at the
+mounted volume via `.env`:
+```
+DATA_ROOT_DIRECTORY=/app/cognee-mcp/.cognee_system/data
+SYSTEM_ROOT_DIRECTORY=/app/cognee-mcp/.cognee_system/system
+```
+The Dockerfile now pre-creates `.cognee_system` as the `cognee` user (uid
+10001), so a fresh named volume inherits writable ownership without a manual
+chown.
+**If you upgrade from a pre-`e24fabf` install** — the existing volume is
+root-owned. Chown it once:
+```bash
+docker run --rm -u 0 -v mishkan-cognee_cognee_data:/v busybox \
+  sh -c 'chown -R 10001:10001 /v'
+docker compose ... up -d --force-recreate cognee-mcp
+```
+## Curated library is showing inside the work UI
+**Symptom** — the Cognee UI at `:7724` (work backend) shows `CuratedResource`
+nodes mixed with project data.
+**Cause** — the curated library got seeded into the work store (incorrect).
+The real fix is physical separation per D-007, which is what the curated box
+exists for. This was hit during the build, when the seed initially ran against
+the work box.
+**Fix**
+1. Ensure the curated box is running (`scripts/ensure-curated-box.sh`).
+2. Re-run the curated seed against `mishkan-curated-mcp` (the script's default
+   container since commit `086e80e`).
+3. Delete the `CuratedResource` and `Team` nodes from the work Neo4j:
+   ```bash
+   P='<work neo4j password from .env>'
+   docker exec mishkan-cognee-neo4j cypher-shell -u neo4j -p "$P" \
+     "MATCH (n:CuratedResource) DETACH DELETE n;"
+   docker exec mishkan-cognee-neo4j cypher-shell -u neo4j -p "$P" \
+     "MATCH (n:Team) DETACH DELETE n;"
+   ```
+4. Drop the stray `curated_library` dataset row from the work cognee_db via
+   cognee's `delete_dataset` API (see commit `418d10a` for the exact cleanup
+   pattern used during the build).
+`claude_code_memory` is **not** stray — it is the per-client memory dataset.
+Don't delete it.
+## Neo4j Browser "Could not perform discovery. No routing servers available"
+**Symptom** — Neo4j Browser on `:7716` (or `:7731`) loads, but connecting to
+`neo4j://localhost:7709` fails with the routing error.
+**Cause** — the `neo4j://` URI scheme triggers cluster routing discovery, which
+fails over a single-instance bolt connection and over SSH tunnels.
+**Fix** — use the `bolt://` scheme:
+```
+Connect URL: bolt://localhost:7709     # work
+             bolt://localhost:7732     # curated
+```
+## `tsh` tunnel: `Failed to bind to 127.0.0.1:NNNN: address already in use`
+**Cause** — a previous tsh forward is still alive on your laptop holding the
+port; tsh aborts the whole tunnel on any one bind failure.
+**Fix on your laptop**
+```bash
+lsof -nP -iTCP:7724 -sTCP:LISTEN     # find what's holding it
+pkill -f 'tsh ssh'                   # kill the stale tunnel(s)
+```
+Then re-run the full tunnel command.
+## Daily quota (RPD) wall
+**Symptom** — every cognify attempt returns `429 RESOURCE_EXHAUSTED` instantly,
+including the first request of a run. Cognee's throttle has no effect.
+**Cause** — the cloud free tier's **daily** request budget is exhausted. The
+throttle controls per-minute rate; it cannot rescue a daily cap.
+**Fix** — pick one:
+- Wait for the cap to reset (24 h on most free tiers).
+- Switch to a more generous free tier (NVIDIA API Catalog).
+- Switch the work box to local Ollama (Profile A — zero cost, no quota, slow).
+- Move to a paid tier on the same provider.
+This is precisely why the harness recommends **local Ollama for the work store**
+(see [LLM providers](./06-llm-providers.md)) when project data has PII or is
+voluminous.
+## Auto-mode classifier blocks writing `.claude/settings.json` / `.mcp.json`
+**Symptom** — the Claude Code auto-mode classifier denies the agent's write to
+agent-config files even when invoked by `/mishkan-init`.
+**Cause** — the classifier treats `.claude/settings.json`, `.mcp.json`,
+`settings.local.json`, and (sometimes) `CLAUDE.md` as **self-modification**
+and refuses autonomous writes.
+**Fix** — pick one:
+- Approve each write at the prompt.
+- Disable the auto-mode classifier for this session, then re-run init.
+- Add a permission rule that allows these specific writes.
+No harness change is needed; this is a Claude Code platform guard doing
+its job, not a MISHKAN bug.
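+A minimal sketch of the permission-rule option, assuming Claude Code's
+`Tool(path)` rule syntax under `permissions.allow` in a settings file. The
+paths shown and whether an allow rule satisfies the classifier in your version
+are assumptions; check the Claude Code permissions documentation first.
+```json
+{
+  "permissions": {
+    "allow": [
+      "Edit(.claude/settings.json)",
+      "Edit(.mcp.json)"
+    ]
+  }
+}
+```
+Keep the list as narrow as possible, and add it by hand, since the agent is
+the party being denied the write.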
+## `afplay: not found` Stop-hook error on Linux
+**Symptom** — every turn ends with
+```
+Stop hook error: Failed with non-blocking status code: /bin/sh: 1: afplay: not found
+```
+**Cause** — the personal sound hooks in `~/.claude/settings.json` use `afplay`
+(macOS-only). On Linux, that command doesn't exist.
+**Fix** — make the command portable. Replace the hook command string with:
+```sh
+sh -c 'F="<path-to-mp3>"; { command -v afplay >/dev/null 2>&1 && afplay -v 0.1 "$F"; } || { command -v ffplay >/dev/null 2>&1 && ffplay -nodisp -autoexit -loglevel quiet -volume 10 "$F"; } || true'
+```
+The command tries `afplay` first (macOS), falls back to `ffplay` (Linux), and
+silently no-ops if neither is present. These are *your personal* sound hooks,
+not part of the MISHKAN payload, so feel free to remove them outright if you
+don't want audio cues.
+## "Ghost subnet" — cognee containers can't reach each other
+**Symptom** — fresh `docker compose up` fails with networking errors; the
+containers come up but communication times out.
+**Cause** — a leftover Docker network from a previous teardown occupies the
+same IP range that Compose tries to allocate. The iptables `nat` `PREROUTING`
+rules from the dead bridge persist as well.
+**Fix**
+```bash
+# identify the ghost
+docker network ls
+ip rule show
+iptables -t nat -L PREROUTING -n -v | grep -B1 -A2 br-
+# remove the offending leftover network if present
+docker network rm <ghost-net-id>
+# bring the stack back up
+cd ~/.claude/mishkan/cognee
+docker compose ... up -d
+```
+The fully-self-hosted compose pins the network subnet (`172.51.0.0/16`) to
+avoid this collision class going forward (decision recorded in commit
+`2262ea8`).
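+For reference, pinning a subnet in a Compose file generally looks like the
+fragment below. This is a sketch rather than the harness's actual file: the
+network key `default` is an assumption, and only the subnet value comes from
+the text above.
+```yaml
+networks:
+  default:
+    ipam:
+      config:
+        - subnet: 172.51.0.0/16
+```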
+## Useful inspection one-liners
+```bash
+# container health
+docker ps --filter 'name=mishkan-' --format '{{.Names}}\t{{.Status}}'
+# pipeline run status (work store)
+docker exec mishkan-cognee-pg psql -U cognee -d cognee_db -tc \
+  "SELECT status, count(*) FROM pipeline_runs GROUP BY status;"
+# graph topology (any store)
+docker exec mishkan-cognee-neo4j cypher-shell -u neo4j -p '<pw>' \
+  "MATCH (n) RETURN labels(n) AS labels, count(*) AS cnt ORDER BY cnt DESC;"
+# what's actually listening on the host
+ss -tlnp 2>/dev/null | grep -E '127\.0\.0\.1:77[0-9][0-9]'
+# Ollama model list and embed endpoint sanity
+docker exec mishkan-ollama ollama list
+docker exec mishkan-cognee-mcp sh -c \
+  'python3 -c "import urllib.request,json; r=urllib.request.urlopen(urllib.request.Request(\"http://ollama:11434/api/embed\", data=json.dumps({\"model\":\"nomic-embed-text:latest\",\"input\":\"hi\"}).encode(), headers={\"Content-Type\":\"application/json\"}), timeout=10); print(r.status)"'
+```
+## See also
+- [Memory layer](./04-memory-layer.md) — backups and volume layout.
+- [LLM provider profiles](./06-llm-providers.md) — switching providers.
+- [Selective ingest](./05-selective-ingest.md) — controlling what enters
+  memory.
+- The build's hard-won fixes are anchored in commits: `e17f2a9`, `70d3c2e`,
+  `e24fabf`, `418d10a`, `086e80e`, `2262ea8`.