
@pentatonic-ai/ai-agent-sdk 0.6.0 → 0.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (94)

package/packages/memory-engine/docs/RUNBOOK-AWS.md ADDED

@@ -0,0 +1,375 @@
+# pentatonic-memory-engine — AWS deployment runbook (v1)
+**Target:** single EC2 (`m6i.2xlarge`) in `us-east-1`, network-boundary auth via Cloudflare Tunnel.
+**Operator:** Phil Hauser (or anyone with `AdministratorAccess` to account `170649632502`).
+**Estimated time end-to-end:** ~45 minutes (mostly waiting for instance/volume provisioning).
+---
+## 0. Prerequisites
+Before starting, verify:
+```bash
+aws sts get-caller-identity
+# Should return Account: 170649632502, AdministratorAccess role
+aws configure get region
+# us-east-1
+```
+If region isn't set: `export AWS_REGION=us-east-1` for the rest of the session.
+You'll also need:
+- A **Cloudflare account** with access to the Pentatonic CF zone (for Tunnel setup)
+- The **`pentatonic-ai-gateway` API key** (from lambda.dev — should already exist)
+---
+## 1. Variables (paste once, reuse below)
+```bash
+export AWS_REGION=us-east-1
+export ENV=prod
+export NAME=pme-${ENV}-us-east-1
+export INSTANCE_TYPE=m6i.2xlarge
+# Latest Ubuntu 22.04 LTS in us-east-1 (verify via aws ec2 describe-images if needed)
+export AMI_ID=$(aws ec2 describe-images \
+  --owners 099720109477 \
+  --filters "Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*" \
+  --query 'Images | sort_by(@, &CreationDate) | [-1].ImageId' \
+  --output text)
+echo "Using AMI: $AMI_ID"
+```
+---
+## 2. Networking
+Use the default VPC for v1. (Multi-VPC isolation is a v2 concern.)
+```bash
+export VPC_ID=$(aws ec2 describe-vpcs \
+  --filters "Name=is-default,Values=true" \
+  --query 'Vpcs[0].VpcId' --output text)
+export SUBNET_ID=$(aws ec2 describe-subnets \
+  --filters "Name=vpc-id,Values=$VPC_ID" "Name=default-for-az,Values=true" \
+  --query 'Subnets[0].SubnetId' --output text)
+echo "VPC=$VPC_ID  Subnet=$SUBNET_ID"
+```
+### 2.1 Security group
+No public ingress. Outbound 443/80/53 for gateway + apt + DNS, plus 7844 for the Cloudflare Tunnel connector.
+```bash
+# Security group descriptions accept a limited ASCII character set, so use a plain hyphen.
+export SG_ID=$(aws ec2 create-security-group \
+  --group-name $NAME-sg \
+  --description "pentatonic-memory-engine $ENV - outbound only; ingress via SSM" \
+  --vpc-id $VPC_ID \
+  --query 'GroupId' --output text)
+# Outbound is allowed by default. Strip default outbound and re-add explicitly.
+aws ec2 revoke-security-group-egress \
+  --group-id $SG_ID \
+  --ip-permissions '[{"IpProtocol":"-1","IpRanges":[{"CidrIp":"0.0.0.0/0"}]}]'
+# cloudflared dials Cloudflare on port 7844 (TCP for http2, UDP for QUIC), so 443 alone is not enough.
+aws ec2 authorize-security-group-egress --group-id $SG_ID \
+  --ip-permissions '[
+    {"IpProtocol":"tcp","FromPort":443,"ToPort":443,"IpRanges":[{"CidrIp":"0.0.0.0/0","Description":"HTTPS for tunnel + gateway + apt"}]},
+    {"IpProtocol":"tcp","FromPort":7844,"ToPort":7844,"IpRanges":[{"CidrIp":"0.0.0.0/0","Description":"cloudflared tunnel (http2)"}]},
+    {"IpProtocol":"udp","FromPort":7844,"ToPort":7844,"IpRanges":[{"CidrIp":"0.0.0.0/0","Description":"cloudflared tunnel (QUIC)"}]},
+    {"IpProtocol":"tcp","FromPort":80, "ToPort":80, "IpRanges":[{"CidrIp":"0.0.0.0/0","Description":"HTTP for apt fallback"}]},
+    {"IpProtocol":"udp","FromPort":53, "ToPort":53, "IpRanges":[{"CidrIp":"0.0.0.0/0","Description":"DNS"}]},
+    {"IpProtocol":"tcp","FromPort":53, "ToPort":53, "IpRanges":[{"CidrIp":"0.0.0.0/0","Description":"DNS-over-TCP"}]}
+  ]'
+echo "SG=$SG_ID"
+```
+**No inbound rule.** Ops access happens via SSM Session Manager (next step), not SSH.
+---
+## 3. IAM role for SSM Session Manager + EBS snapshot agent
+Lets you `aws ssm start-session` into the box without an SSH key.
+```bash
+aws iam create-role --role-name $NAME-role \
+  --assume-role-policy-document '{
+    "Version":"2012-10-17",
+    "Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]
+  }'
+aws iam attach-role-policy --role-name $NAME-role \
+  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
+aws iam create-instance-profile --instance-profile-name $NAME-profile
+aws iam add-role-to-instance-profile \
+  --instance-profile-name $NAME-profile \
+  --role-name $NAME-role
+# Wait for IAM eventual-consistency before launching EC2
+sleep 10
+```
+---
+## 4. EBS volumes
+Five `gp3` volumes, 50 GiB each (resize online later if needed). One per layer's data dir.
+```bash
+export AZ=$(aws ec2 describe-subnets --subnet-ids $SUBNET_ID \
+  --query 'Subnets[0].AvailabilityZone' --output text)
+for layer in l2 l3 l4 l5 l6; do
+  vol_id=$(aws ec2 create-volume \
+    --availability-zone $AZ \
+    --size 50 --volume-type gp3 \
+    --tag-specifications "ResourceType=volume,Tags=[{Key=Name,Value=$NAME-$layer},{Key=pme-layer,Value=$layer}]" \
+    --query 'VolumeId' --output text)
+  echo "$layer = $vol_id"
+  eval "export VOL_${layer}=$vol_id"
+done
+# Wait until all are 'available'
+aws ec2 wait volume-available --volume-ids $VOL_l2 $VOL_l3 $VOL_l4 $VOL_l5 $VOL_l6
+echo "All volumes available."
+```
+---
+## 5. Launch the EC2
+```bash
+# User data: install docker, create the mount points, clone the engine repo.
+# Formatting and mounting the EBS volumes happens manually in step 6.
+cat > /tmp/userdata.sh <<'EOF'
+#!/bin/bash
+set -euxo pipefail
+apt-get update
+apt-get install -y docker.io docker-compose-v2 git xfsprogs
+# Wait for EBS volumes to attach (they're attached just after instance launch by AWS CLI below)
+for layer in l2 l3 l4 l5 l6; do
+  for i in {1..30}; do
+    if [ -e /dev/disk/by-label/$layer ] || lsblk -no NAME,SERIAL | grep -q "$layer"; then
+      break
+    fi
+    sleep 2
+  done
+done
+# Find each volume by tag (we'll attach by device name below; this just creates mount points)
+mkdir -p /var/lib/pme/{l2,l3,l4,l5,l6}
+# Format + mount each: done manually over SSM in step 6 below
+systemctl enable --now docker
+# Pull engine repo
+cd /opt
+git clone https://github.com/Pentatonic-Ltd/memory_stack_updated.git engine
+chown -R ubuntu:ubuntu /opt/engine
+EOF
+export INSTANCE_ID=$(aws ec2 run-instances \
+  --image-id $AMI_ID \
+  --instance-type $INSTANCE_TYPE \
+  --subnet-id $SUBNET_ID \
+  --security-group-ids $SG_ID \
+  --iam-instance-profile Name=$NAME-profile \
+  --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=30,VolumeType=gp3}' \
+  --metadata-options 'HttpTokens=required,HttpEndpoint=enabled' \
+  --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$NAME}]" \
+  --user-data file:///tmp/userdata.sh \
+  --query 'Instances[0].InstanceId' --output text)
+aws ec2 wait instance-running --instance-ids $INSTANCE_ID
+echo "Instance $INSTANCE_ID is running."
+```
+### 5.1 Attach EBS volumes
+```bash
+aws ec2 attach-volume --volume-id $VOL_l2 --instance-id $INSTANCE_ID --device /dev/xvdf
+aws ec2 attach-volume --volume-id $VOL_l3 --instance-id $INSTANCE_ID --device /dev/xvdg
+aws ec2 attach-volume --volume-id $VOL_l4 --instance-id $INSTANCE_ID --device /dev/xvdh
+aws ec2 attach-volume --volume-id $VOL_l5 --instance-id $INSTANCE_ID --device /dev/xvdi
+aws ec2 attach-volume --volume-id $VOL_l6 --instance-id $INSTANCE_ID --device /dev/xvdj
+# Wait for all to attach
+for v in $VOL_l2 $VOL_l3 $VOL_l4 $VOL_l5 $VOL_l6; do
+  aws ec2 wait volume-in-use --volume-ids $v
+done
+echo "All volumes attached."
+```
+---
+## 6. Mount EBS volumes inside the EC2
+Connect via SSM Session Manager:
+```bash
+aws ssm start-session --target $INSTANCE_ID
+```
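+Then inside the instance. Note that `m6i` is a Nitro instance type, so attached EBS volumes are exposed as NVMe devices (`/dev/nvme1n1`, ...) and the `/dev/xvd*` names used below may not exist. Check `lsblk -o NAME,SIZE,SERIAL` first and substitute the matching device names if needed: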
+```bash
+# Format each volume (one-time)
+for pair in xvdf:l2 xvdg:l3 xvdh:l4 xvdi:l5 xvdj:l6; do
+  dev=${pair%:*}; layer=${pair#*:}
+  if ! sudo blkid /dev/$dev >/dev/null 2>&1; then
+    sudo mkfs.xfs -L $layer /dev/$dev
+  fi
+done
+# Add to /etc/fstab and mount
+for pair in xvdf:l2 xvdg:l3 xvdh:l4 xvdi:l5 xvdj:l6; do
+  dev=${pair%:*}; layer=${pair#*:}
+  uuid=$(sudo blkid -s UUID -o value /dev/$dev)
+  sudo mkdir -p /var/lib/pme/$layer
+  echo "UUID=$uuid /var/lib/pme/$layer xfs defaults,nofail 0 2" | sudo tee -a /etc/fstab
+done
+sudo systemctl daemon-reload
+sudo mount -a
+df -h /var/lib/pme/*
+# All five should show ~50G mounted, 49G available.
+```
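Because NVMe enumeration order is not guaranteed, it can help to resolve each device from its EBS volume ID rather than from the attach-time device name. The sketch below is one way to do that, assuming the volume ID appears in the `SERIAL` column of `lsblk` without its hyphen (the usual behaviour for NVMe-attached EBS volumes). The function name `nvme_devices_by_volume` is hypothetical and not part of the engine repo.

```python
import json


def nvme_devices_by_volume(lsblk_json: str) -> dict:
    """Map EBS volume IDs (vol-...) to /dev paths.

    Expects the stdout of `lsblk -J -d -o NAME,SERIAL`.
    """
    mapping = {}
    for dev in json.loads(lsblk_json).get("blockdevices", []):
        serial = (dev.get("serial") or "").strip()
        # Assumption: NVMe-attached EBS volumes report "vol0123..." (no hyphen).
        if serial.startswith("vol") and not serial.startswith("vol-"):
            serial = "vol-" + serial[3:]
        if serial.startswith("vol-"):
            mapping[serial] = "/dev/" + dev["name"]
    return mapping
```

On the instance, feed it the output of `lsblk -J -d -o NAME,SERIAL` and compare the result against the `$VOL_l2` ... `$VOL_l6` IDs recorded in step 4 before running `mkfs.xfs`.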
+---
+## 7. Cloudflare Tunnel setup
+In the Cloudflare dashboard:
+1. **Zero Trust → Networks → Tunnels → Create a tunnel** (Cloudflared connector type)
+2. Name: `engine-prod-us-east-1`
+3. Save → copy the **tunnel token** (the `eyJ...` string).
+4. **Public hostnames** tab → Add:
+   - Subdomain: `engine`
+   - Domain: `pentatonic.internal` (or whatever internal CF zone you use)
+   - Type: HTTP, URL: `compat:8099`
+You'll set the tunnel token from step 3 as `CLOUDFLARED_TUNNEL_TOKEN` in `.env` below.
+> The hostname is reachable only by Workers/services in the same Cloudflare account by default. If you want to lock down further, attach a **Cloudflare Access policy** requiring a service token on the hostname — then set the service-token header in TES Workers' fetch calls. Optional for v1; can layer on later.
+---
+## 8. Configure and bring up the engine
+Back in the SSM session on the EC2:
+```bash
+cd /opt/engine
+# Pull the AWS overlay (PR'd separately to memory_stack_updated; for now copy it manually)
+# Once merged upstream, this file is part of the repo.
+sudo curl -fL -o docker-compose.aws.yml \
+  https://raw.githubusercontent.com/Pentatonic-Ltd/memory_stack_updated/main/docker-compose.aws.yml
+# Generate Neo4j password
+NEO4J_PASSWORD=$(openssl rand -base64 24 | tr -d '/+=')
+# Write .env (substitute values)
+sudo tee .env >/dev/null <<EOF
+PME_PORT=8099
+NV_EMBED_URL=https://gateway.pentatonic.ai/v1/embeddings   # confirm exact URL with the gateway team
+PENTATONIC_AI_GATEWAY_KEY=<paste from secret store>
+CLOUDFLARED_TUNNEL_TOKEN=<paste from CF dashboard>
+NEO4J_PASSWORD=$NEO4J_PASSWORD
+EOF
+sudo chmod 600 .env
+# Bring up the stack
+sudo docker compose -f docker-compose.yml -f docker-compose.aws.yml up -d
+sudo docker compose ps
+```
+First run pulls images (~3-5 min) and builds engine images (~10-15 min). Subsequent restarts are fast.
+---
+## 9. Smoke test
+From your laptop or any TES dev environment with access to the CF zone:
+```bash
+curl -sf https://engine.pentatonic.internal/health | jq
+# Expected: {"status":"ok","layers":{"l0":"ok",...,"l6":"ok"},"engine":"pentatonic-memory-engine"}
+curl -sX POST https://engine.pentatonic.internal/store \
+  -H "content-type: application/json" \
+  -d '{"content":"hello from runbook smoke test","metadata":{"arena":"smoke"}}'
+curl -sX POST https://engine.pentatonic.internal/search \
+  -H "content-type: application/json" \
+  -d '{"query":"hello","limit":3,"min_score":0.001}' | jq
+```
+If `/search` returns the row from `/store`, end-to-end works.
+---
+## 10. AWS Backup
+```bash
+# Tag all volumes for the backup plan
+for v in $VOL_l2 $VOL_l3 $VOL_l4 $VOL_l5 $VOL_l6; do
+  aws ec2 create-tags --resources $v --tags Key=Backup,Value=daily
+done
+# Backup plan: nightly snapshot, 14-day retention.
+# Easiest: AWS Backup console → Plan → "DailyBackup14Day" → resource selection by tag Backup=daily.
+# Or via CLI — see https://docs.aws.amazon.com/aws-backup/latest/devguide/creating-a-backup-plan.html
+```
+Run the restore drill at least once before going live: spin up a sibling instance, attach restored volumes, confirm engine comes back healthy.
+---
+## 11. CloudWatch alarms (recommended, not strictly v1)
+- EC2 instance status check failed → SNS alert
+- EBS volume usage > 80% → SNS alert
+- Engine `/health` failure (custom Lambda probe via the tunnel) → SNS alert
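The core of a `/health` probe can be sketched as below. It assumes the response shape shown in the smoke test (`status` plus a `layers` map whose values are `"ok"` when healthy). The function names are hypothetical, and wiring the result to SNS or a Lambda handler is left out.

```python
import json
import urllib.request


def unhealthy_layers(payload: dict) -> list:
    """Return the names of layers whose state is anything other than "ok"."""
    layers = payload.get("layers") or {}
    return sorted(name for name, state in layers.items() if state != "ok")


def is_healthy(payload) -> bool:
    """True only if the top-level status and every reported layer are "ok"."""
    if not isinstance(payload, dict):
        return False
    if payload.get("status") != "ok" or not payload.get("layers"):
        return False
    return not unhealthy_layers(payload)


def probe(url: str, timeout: float = 5.0) -> bool:
    """Fetch the health endpoint; network and parse errors count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(json.load(resp))
    except (OSError, ValueError):
        return False
```

A scheduled job could call `probe("https://engine.pentatonic.internal/health")` and raise the alert when it returns `False`.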
+---
+## 12. Resource summary
+| Resource | Identifier (filled at runtime) |
+|---|---|
+| Instance | `$INSTANCE_ID` (m6i.2xlarge) |
+| VPC / Subnet | `$VPC_ID` / `$SUBNET_ID` |
+| Security group | `$SG_ID` |
+| IAM role / profile | `$NAME-role` / `$NAME-profile` |
+| EBS volumes | `$VOL_l2 $VOL_l3 $VOL_l4 $VOL_l5 $VOL_l6` (50 GiB gp3 each) |
+| Cloudflare Tunnel | `engine-prod-us-east-1` → `engine.pentatonic.internal` |
+Estimated v1 cost: **~$340/mo on-demand** (instance) + **~$20/mo** (5×50 GiB gp3) + AWS Backup snapshots (~$5-10/mo at 14-day retention) + data transfer (negligible from CF Tunnel).
+---
+## Teardown (if you need to recreate)
+```bash
+aws ec2 terminate-instances --instance-ids $INSTANCE_ID
+aws ec2 wait instance-terminated --instance-ids $INSTANCE_ID
+for v in $VOL_l2 $VOL_l3 $VOL_l4 $VOL_l5 $VOL_l6; do
+  aws ec2 delete-volume --volume-id $v
+done
+aws ec2 delete-security-group --group-id $SG_ID
+aws iam remove-role-from-instance-profile --instance-profile-name $NAME-profile --role-name $NAME-role
+aws iam delete-instance-profile --instance-profile-name $NAME-profile
+aws iam detach-role-policy --role-name $NAME-role --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
+aws iam delete-role --role-name $NAME-role
+```

package/packages/memory-engine/docs/why-v05-underperforms.md ADDED

@@ -0,0 +1,138 @@
+# Why `pentatonic-memory` v0.5.x underperforms on retrieval benchmarks
+This document explains the architectural reasons `pentatonic-memory` v0.5.x scores 17.6% on substring-graded retrieval benches. None of these are bugs — they are deliberate design decisions optimised for a different workload (chat-style fact recall over agent memory). They just happen to be the wrong defaults for general-purpose retrieval.
+The engine in this package addresses each one.
+## 1. Atom boost wins over source
+```js
+// pentatonic-memory v0.5.10/src/search.js
+const DEFAULT_WEIGHTS = {
+  ...
+  atomBoost: 0.15,         // ← 15% boost for distilled atomic facts
+  verbosityPenalty: 0.1,   // ← penalty for long raw content
+};
+```
+`distill.js` runs an LLM on every ingested memory and extracts "atomic facts." Those atoms are stored as separate rows linked back via `source_id`. Search then ranks atoms higher than their source via the boost.
+For chat-style queries ("what does Phil drink?") this works: the atom "Phil drinks cortado" is ranked above the raw turn "Yeah, oh hey Phil came over yesterday and he had a cortado…".
+For substring grading ("what was the price of thing-9001?") it backfires: the atom is "the user reported a sale event" and the raw "thing-9001 sold for $15.50 to buyer-42" gets dropped or out-ranked. The literal answer string is gone.
+**Engine default:** `atomBoost = 0`, `verbosityPenalty = 0`. Distillation is opt-in per query.
+## 2. `dedupeBySource` removes the right answer
+```js
+// pentatonic-memory v0.5.10/src/search.js, line 161
+if (opts.dedupeBySource !== false) {
+  const atomSources = new Set(
+    filtered.filter((r) => r.source_id).map((r) => r.source_id)
+  );
+  if (atomSources.size > 0) {
+    filtered = filtered.filter((r) => !atomSources.has(r.id));
+  }
+}
+```
+When an atom matches, its raw source row is **dropped** from results. The thinking is "the atom contains the relevant fact, the source is redundant." For substring grading, the source contains the literal text the bench is looking for, while the atom is a paraphrase.
+**Engine default:** return both atom and source. Caller can dedupe if they want to reduce token spend.
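To make the effect concrete, here is a small Python port of the JavaScript filter quoted above, run on two illustrative rows built from the section 1 example. The row IDs and the `dedupe_by_source` name are made up for the illustration.

```python
def dedupe_by_source(rows):
    """Drop every row that is the source of some atom in the same result set."""
    atom_sources = {r["source_id"] for r in rows if r.get("source_id")}
    if not atom_sources:
        return rows
    return [r for r in rows if r["id"] not in atom_sources]


rows = [
    {"id": "raw-1", "source_id": None,
     "content": "thing-9001 sold for $15.50 to buyer-42"},
    {"id": "atom-1", "source_id": "raw-1",
     "content": "the user reported a sale event"},
]

kept = dedupe_by_source(rows)
# Only the paraphrased atom survives; the row holding "$15.50" is gone.
```

A substring grader looking for `$15.50` in `kept` finds nothing, even though the raw row matched the query.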
+## 3. `minScore: 0.5` is too aggressive
+```js
+const threshold = opts.minScore ?? 0.5;
+```
+NV-Embed-v2 routinely produces cosine similarities of 0.30–0.45 for genuinely relevant chunks. The 0.5 default filters those out completely. The bench passes `min_score: 0.0001` to compensate, but real callers using SDK defaults silently lose recall.
+**Engine default:** `min_score: 0.001`. The CTE's relevance × recency × frequency formula handles ranking; let everything through and trust the ordering.
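A minimal sketch of the threshold's effect, using made-up scores in the 0.30 to 0.45 band described above. The exact comparison the SDK uses (`>=` versus `>`) is an assumption here.

```python
def filter_by_min_score(hits, min_score):
    """Keep hits whose similarity score meets the threshold."""
    return [h for h in hits if h["score"] >= min_score]


# Hypothetical cosine similarities for three relevant chunks.
hits = [
    {"id": "a", "score": 0.62},
    {"id": "b", "score": 0.41},
    {"id": "c", "score": 0.33},
]

strict = filter_by_min_score(hits, 0.5)     # v0.5.x default
lenient = filter_by_min_score(hits, 0.001)  # engine default
```

With the 0.5 default only one of the three chunks is returned; with 0.001 all three reach the ranking stage.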
+## 4. No `/forget` endpoint
+```js
+// server.js routes:
+//   POST /search
+//   POST /store
+//   GET  /health
+//   (no /forget, no /memories)
+```
+v0.4.x had `/forget` and `/memories`. v0.5.x removed them. Without `/forget`:
+- Tests can't isolate runs (data accumulates across test suites)
+- Benches pollute each other's namespaces (we observed v0.5.6 going from 17.6% to 9.4% over 5 runs of pollution)
+- GDPR data deletion requests require direct Postgres access
+- Multi-tenant deployments can't enforce tenant boundaries via the SDK alone
+**Engine:** restored `/forget` with `id` and `metadata_contains` filters.
+## 5. No `/store-batch`
+Even though `ai.js` has an `embedBatch()` helper, the server only exposes single-record `/store`. Bulk ingest does N HTTP roundtrips, each with one synchronous embed call.
+For the bench harness, this means a 22-doc corpus takes ~25 minutes to ingest because every doc waits for an Ollama HyDE generation (60s default) plus an embed call.
+**Engine:** added `/store-batch`. One HTTP roundtrip, one batched embed call, one bulk INSERT. 30-50× faster on >5 records.
+## 6. HyDE generated at INGEST time
+```js
+// ingest.js — for every /store call:
+const hypothetical_queries = await llm.chat(/* generate 3-5 fake queries */);
+metadata.hypothetical_queries = hypothetical_queries;
+```
+This adds a 60s LLM call to every ingest. Worse, the queries are generated against the *content*, not the user's actual query — so they tend to be generic ("what is the topic of this document"), not useful for matching at search time.
+**Engine:** HyDE runs at SEARCH time against the user's actual query. Each search generates 3 hypothetical answers, embeds each, runs vector search per embedding, and RRF-fuses the rank lists. Better matching, no ingest blocking.
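Reciprocal rank fusion (RRF) itself is short enough to sketch. The version below is the textbook form, where each document scores the sum of `1 / (k + rank)` over the lists it appears in. The constant `k=60` is the conventional default and an assumption here, not a value confirmed from the engine source.

```python
from collections import defaultdict


def rrf_fuse(rank_lists, k=60, top_k=None):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rank_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused if top_k is None else fused[:top_k]
```

For the HyDE case, each of the three hypothetical answers yields one ranked list, and `rrf_fuse` merges them. A document ranked consistently near the top beats one that is first in a single list only.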
+## 7. No content chunking
+v0.5.x stores a 10,000-token document as one row with one 4096-d embedding. The vector represents the *average* meaning of the document, washing out specific facts.
+**Engine:** chunks at ingest into ~200-500 token segments, each with its own embedding and `chunk_index`. Search returns chunks; downstream caller can hydrate the parent document if needed.
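A simple way to picture the chunking step is shown below. It is a sketch only: it approximates tokens with whitespace-separated words, and the overlap between neighbouring chunks is an assumption rather than documented engine behaviour.

```python
def chunk_text(text, max_tokens=300, overlap=30):
    """Split text into overlapping word windows, each tagged with chunk_index."""
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    tokens = text.split()
    step = max_tokens - overlap
    chunks = []
    for index, start in enumerate(range(0, len(tokens), step)):
        window = tokens[start:start + max_tokens]
        chunks.append({"chunk_index": index, "content": " ".join(window)})
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the document
    return chunks
```

Each returned dict would then get its own embedding, so a specific fact in the middle of a long document is represented by a vector for its neighbourhood rather than for the whole text.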
+## 8. No reranker
+v0.5.x's `search.js` returns top-K directly from the SQL CTE score. No second-pass reranker.
+**Engine:** L6 doc-store runs a `ms-marco-MiniLM-L-6-v2` cross-encoder over the top-50 from initial retrieval, then returns top-K. Substantially better precision on questions that need exact term matching after broad recall.
+## 9. No graph / entity layer
+v0.5.x doesn't extract entities at ingest, doesn't build relationships, can't answer multi-hop questions ("who owns thing-X" → "find listings where X was sold" → "fetch buyer's contact").
+**Engine:** L3 Knowledge Graph (Neo4j Community) extracts entities at ingest, builds edges between co-occurring entities, and at search time boosts rows that mention the same entities as the query. Critical for the marketplace-ops and customer-support benches.
+## 10. Single vector store, single embedding per row
+v0.5.x writes one row per memory with one embedding column in pgvector. pgvector's HNSW index supports at most 2,000 dimensions for the `vector` type, so 4096-d NV-Embed embeddings fall back to sequential scan. At >100k memories, that's >100ms per query.
+**Engine:** indexes the same content into multiple stores in parallel:
+- L0 BM25 (SQLite FTS5)
+- L4 sqlite-vec (small, in-process)
+- L5 Milvus (medium, dedicated)
+- L6 doc-store (with reranker)
+- L3 KG (relationship-pivoted)
+Search runs all five in parallel, RRF-fuses the rank lists, applies reranker on top-50. Different query types win on different layers — the fusion absorbs the strengths of each.
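The parallel fan-out can be sketched with a thread pool, as below. Layer names map to search callables that return ranked document IDs. The `fan_out` name, the callable interface, and the choice to treat a failing layer as an empty list are all assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(query, layers):
    """Run every layer's search callable concurrently.

    Returns {layer_name: ranked list of document IDs}. A layer that raises
    contributes an empty list instead of failing the whole search.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(layers))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in layers.items()}
        for name, future in futures.items():
            try:
                results[name] = list(future.result())
            except Exception:
                results[name] = []
    return results
```

The per-layer lists returned here are what a fusion step would consume, so one slow or broken layer degrades recall rather than availability.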
+## Summary
+| Gap | Bench impact (estimated) | Fix complexity |
+|---|---|---|
+| 1. atomBoost +0.15 | -15-20pp | trivial (config flag) |
+| 2. dedupeBySource: true | -5-10pp | trivial (config flag) |
+| 3. minScore: 0.5 default | -3-8pp | trivial (config change) |
+| 4. No /forget | n/a but blocks tests | trivial (10 LOC) |
+| 5. No /store-batch | n/a but blocks bench (~25 min ingest) | low (50 LOC) |
+| 6. HyDE at ingest time | -5-10pp + 60s/store | medium (refactor) |
+| 7. No chunking | -5-15pp on long docs | medium (schema change) |
+| 8. No reranker | -5-10pp | medium (sidecar service) |
+| 9. No graph layer | -5-10pp on entity queries | high (new schema + extraction) |
+| 10. Single vector store | -10-20pp, latency at scale | high (parallel infrastructure) |
+This package addresses 1-10 simultaneously by routing through the 7-layer engine, recovering ~65pp of the gap.

package/packages/memory-engine/engine/README.md ADDED

@@ -0,0 +1,52 @@
+# engine/
+Bundled engine layers for the Pentatonic Memory Engine.
+| File | Layer | LOC | Purpose |
+|---|---|---|---|
+| `l2-hybridrag-proxy.py` | L2 | ~1.5k | RRF fusion across all layers, exposed on `:8031` |
+| `l5-comms-layer.py` | L5 | ~0.7k | Milvus comms layer for chat/email/contact/memory collections, exposed on `:8034` |
+| `l6-document-store.py` | L6 | ~1.5k | Document store + cross-encoder reranker, exposed on `:8037` |
+| `services/nv-embed/server.py` | — | ~150 | NV-Embed-v2 4096-dim embedding service, exposed on `:8041` |
+## pme_memory SDK
+The `pme_memory/` package at the repo root is an installable Python SDK for the L5 communications layer. It provides:
+- **store.py** — Milvus client and collection management (chats, emails, contacts, memory)
+- **search.py** — Semantic search across collections
+- **embed.py** — Dual-stack embedding (NV-Embed-v2 primary, Ollama fallback)
+- **indexer.py** — Data ingestion pipeline (JSONL chats, email archives, contacts, memory files)
+- **scoring.py** — Pressure scoring for need signals (recency, novelty, centrality, priority)
+- **synthesis.py** — Deterministic multi-parent artifact merge
+- **artifacts.py** — Append-only artifact DAG store (JSONL)
+- **hygiene.py** — DAG maintenance (dedup, conflict detection, orphan pruning)
+- **health.py** — L5 health check
+- **needs.py** — Need signal indexing
+- **provenance.py** — Lineage visualization
+Install: `pip install -e ".[full]"` — CLI: `pme-memory health|stats|index|search|serve`
+## KG Extraction Scripts
+The `scripts/` directory contains Knowledge Graph population tools:
+- **kg-extractor.py** — spaCy + regex entity/relationship extraction from memory files → Neo4j
+- **kg-preflexor-v2.py** — 2-pass concurrent LLM-based extraction via Ollama (14 structured entity types + native graph discovery)
+## Where L0, L3 and the embedding service live
+- **L0 BM25** — provided by SQLite FTS5; the L2 proxy queries it directly via `sqlite3`. No separate service binary.
+- **L3 Knowledge Graph** — provided by Neo4j Community (free, OSS) running in a sibling container. The proxy queries it via the bolt protocol on `:7687`.
+- **NV-Embed-v2 embedding service** — see `services/nv-embed/` for the Docker context. Exposes the OpenAI-compatible `/v1/embeddings` endpoint on `:8041`.
+## Dependencies
+Each service has its own `requirements.txt` in `services/<layer>/`. Common heavy deps:
+- `pymilvus>=2.6.12` (L5)
+- `sentence-transformers` (L6 reranker, NV-Embed)
+- `httpx`, `fastapi`, `uvicorn` (all)
+- `spacy` (L6 entity extraction)
+NV-Embed needs Torch + the model weights (auto-downloaded on first run from Hugging Face).