npm - @adaptic/maestro - Versions diffs - 1.1.6 → 1.1.8 - Mend

@adaptic/maestro 1.1.6 → 1.1.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

package/.claude/commands/init-maestro.md +225 -279
package/README.md +19 -2
package/docs/guides/email-setup.md +399 -0
package/docs/guides/media-generation-setup.md +349 -0
package/docs/guides/outbound-governance-setup.md +438 -0
package/docs/guides/pdf-generation-setup.md +315 -0
package/docs/guides/poller-daemon-setup.md +550 -0
package/docs/guides/rag-context-setup.md +459 -0
package/docs/guides/slack-setup.md +348 -0
package/docs/guides/voice-sms-setup.md +698 -0
package/docs/guides/whatsapp-setup.md +282 -0
package/docs/runbooks/mac-mini-bootstrap.md +21 -0
package/package.json +1 -1
package/scaffold/config/caller-id-map.yaml +46 -0
package/scripts/media-generation/README.md +2 -0
package/scripts/pdf-generation/README.md +2 -0
package/scripts/poller/slack-poller.mjs +22 -7
package/scripts/poller/trigger.mjs +12 -1
package/scripts/setup/boot-claude-session.sh +4 -8
package/scripts/setup/configure-macos.sh +8 -4

package/docs/guides/rag-context-setup.md ADDED

@@ -0,0 +1,459 @@
+# RAG & Context Retrieval Setup Guide
+How the agent's memory and context retrieval system works: the SQLite FTS5 search index, per-user access-scoped search, pre-draft context enrichment, post-interaction entity extraction, and the integration points with every send script.
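+**Prerequisites**: Complete the [Mac Mini Bootstrap](../runbooks/mac-mini-bootstrap.md). The RAG system needs no vector database or embedding API. The search scripts run on the Python standard library alone, with PyYAML as an optional speed-up; the post-interaction indexer requires PyYAML (see Troubleshooting).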
+---
+## Architecture Overview
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│  INDEXING (write path)                                               │
+│                                                                      │
+│  ┌──────────────────────┐    ┌─────────────────────────────────┐    │
+│  │ rag-indexer.py        │───▶│ state/rag/search.db             │    │
+│  │ (incremental FTS5)    │    │ (SQLite FTS5 full-text search)  │    │
+│  └──────────────────────┘    └─────────────────────────────────┘    │
+│         ▲                                                            │
+│         │ sources:                                                   │
+│         ├── memory/interactions/**/*.jsonl  (Slack, email, calls)    │
+│         ├── knowledge/sources/*.yaml       (public knowledge)        │
+│         ├── knowledge/decisions/*.yaml     (leadership+)             │
+│         ├── knowledge/syntheses/*.yaml     (strategy = ceo-only)     │
+│         └── state/queues/*.yaml            (leadership+)             │
+│                                                                      │
+│  ┌──────────────────────┐    ┌─────────────────────────────────┐    │
+│  │ post-interaction-     │───▶│ memory/indexes/                 │    │
+│  │ indexer.py            │    │   entity-relationships.yaml     │    │
+│  │ (entity extraction)   │    │ (people, orgs, topics, facts)   │    │
+│  └──────────────────────┘    └─────────────────────────────────┘    │
+├─────────────────────────────────────────────────────────────────────┤
+│  RETRIEVAL (read path)                                               │
+│                                                                      │
+│  ┌──────────────────────┐    ┌─────────────────────────────────┐    │
+│  │ user-context-search   │◀──│ config/caller-id-map.yaml       │    │
+│  │ .py                   │    │ (user → access level mapping)   │    │
+│  │ (grep or sqlite)      │    └─────────────────────────────────┘    │
+│  └──────────────────────┘                                            │
+│                                                                      │
+│  ┌──────────────────────┐    ┌─────────────────────────────────┐    │
+│  │ pre-draft-context.py  │───▶│ Entity context for recipient    │    │
+│  │ (fact retrieval)      │    │ (< 500ms, file reads only)      │    │
+│  └──────────────────────┘    └─────────────────────────────────┘    │
+│         │                                                            │
+│         ▼                                                            │
+│  ┌──────────────────────┐    Integrated into every send script:     │
+│  │ pre_draft_lookup.py   │    send-email-threaded.py, slack-send.sh │
+│  │ (wrapper + audit log) │    send-whatsapp.sh, send-sms.sh         │
+│  └──────────────────────┘                                            │
+└─────────────────────────────────────────────────────────────────────┘
+```
+**Key design choice**: The RAG system uses **SQLite FTS5** (full-text search) rather than vector embeddings. This means no external services, low-latency local queries, no API calls during retrieval, and a database that is a single file you can back up with `cp` (copy it while no indexer is running, because WAL mode keeps recent writes in a separate `-wal` file). The trade-off is keyword matching instead of semantic search: a query finds only documents that contain its terms. For structured operational data (names, topics, dates), exact keyword matching is usually the more predictable choice.
+---
+## 1. The Search Index (`rag-indexer.py`)
+### 1.1 What It Indexes
+| Source | Path Pattern | User Scope | Content |
+|---|---|---|---|
+| Interactions | `memory/interactions/**/*.jsonl` | Per-user (DMs) or channel-scoped | Slack messages, emails, call transcripts |
+| Public knowledge | `knowledge/sources/*.yaml` | `public` | Company facts, product info |
+| Decisions | `knowledge/decisions/*.yaml` | `leadership` | Strategic decisions with rationale |
+| Syntheses | `knowledge/syntheses/*.yaml` | `public` (except strategy-state = `ceo`) | Analysis summaries |
+| Queues | `state/queues/*.yaml` | `leadership` | Action items, follow-ups, blockers |
+### 1.2 User Scope (Ring-Fencing)
+Every indexed document gets a `user_scope` that controls who can retrieve it:
+| Scope | Visible To | Example Content |
+|---|---|---|
+| `public` | All authenticated users | Company overview, public knowledge |
+| `leadership` | Leadership + CEO | Decisions, queue items, internal docs |
+| `ceo` | CEO only | Strategy state, all interactions, all logs |
+| `{user-slug}` | That specific user only | Their DMs, their emails |
+This ensures that when the agent is on a voice call with a partner, it can't accidentally surface confidential CEO-only information.
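The visibility rule in the table above can be sketched as a small set computation. This is a minimal illustration, not the shipped logic: `visible_scopes` and `can_read` are hypothetical helpers, and the sketch assumes per-user scopes are readable only by that user.

```python
def visible_scopes(access_level, user_slug):
    """Return the user_scope values a caller may read (hypothetical helper)."""
    scopes = {"public"}
    if access_level in ("leadership", "ceo"):
        scopes.add("leadership")
    if access_level == "ceo":
        scopes.add("ceo")
    if user_slug:
        # Assumption: a per-user scope is visible only to that user.
        scopes.add(user_slug)
    return scopes


def can_read(doc_scope, access_level, user_slug):
    """True when a document with doc_scope is visible to the caller."""
    return doc_scope in visible_scopes(access_level, user_slug)
```

Computing the allowed scopes once per request keeps the check itself a plain set lookup, which also maps directly onto an SQL `IN (...)` filter.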
+### 1.3 Database Schema
+SQLite database at `state/rag/search.db`:
+```sql
+-- Main document table
+CREATE TABLE documents (
+    id TEXT PRIMARY KEY,          -- SHA-256 hash of source_path + content
+    source_type TEXT NOT NULL,    -- "interaction", "knowledge", "decision", etc.
+    source_path TEXT NOT NULL,    -- Relative path to source file
+    user_scope TEXT NOT NULL,     -- Access scope (public, leadership, ceo, user-slug)
+    content TEXT NOT NULL,        -- Full text content
+    title TEXT,                   -- Document title (if available)
+    timestamp TEXT,               -- ISO 8601 timestamp
+    metadata_json TEXT            -- Additional metadata as JSON
+);
+-- FTS5 virtual table (full-text search)
+CREATE VIRTUAL TABLE documents_fts USING fts5(content, title);
+```
+Triggers keep the FTS index in sync with the main table automatically.
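To make the trigger-based sync concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The trigger names and bodies are assumptions (the guide does not show them), update handling is omitted, and the sample row is made up. It requires an SQLite build with FTS5 compiled in.

```python
import sqlite3

SCHEMA = """
CREATE TABLE documents (
    id TEXT PRIMARY KEY,
    source_type TEXT NOT NULL,
    source_path TEXT NOT NULL,
    user_scope TEXT NOT NULL,
    content TEXT NOT NULL,
    title TEXT,
    timestamp TEXT,
    metadata_json TEXT
);
CREATE VIRTUAL TABLE documents_fts USING fts5(content, title);

-- Assumed trigger shape: mirror inserts and deletes by rowid.
CREATE TRIGGER documents_ai AFTER INSERT ON documents BEGIN
    INSERT INTO documents_fts(rowid, content, title)
    VALUES (new.rowid, new.content, new.title);
END;
CREATE TRIGGER documents_ad AFTER DELETE ON documents BEGIN
    DELETE FROM documents_fts WHERE rowid = old.rowid;
END;
"""


def search(conn, query, scopes):
    """Full-text search restricted to the given user_scope values."""
    marks = ",".join("?" for _ in scopes)
    sql = (
        "SELECT documents.source_path FROM documents_fts "
        "JOIN documents ON documents.rowid = documents_fts.rowid "
        f"WHERE documents_fts MATCH ? AND documents.user_scope IN ({marks})"
    )
    return [row[0] for row in conn.execute(sql, [query, *scopes])]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO documents (id, source_type, source_path, user_scope, content, title) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("doc-1", "decision", "knowledge/decisions/sample.yaml", "leadership",
     "Sample decision about the portal launch", "Portal launch"),
)
print(search(conn, "portal", ["public"]))                # leadership doc is hidden
print(search(conn, "portal", ["public", "leadership"]))  # leadership doc is visible
```

Filtering on `user_scope` in the same query as the `MATCH` means out-of-scope rows never leave the database layer.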
+### 1.4 Running the Indexer
+```bash
+# Incremental index (only changed files since last run)
+python3 scripts/rag-indexer.py
+# Full re-index (drop and rebuild everything)
+python3 scripts/rag-indexer.py --full
+# Show index statistics
+python3 scripts/rag-indexer.py --stats
+# Custom database path
+python3 scripts/rag-indexer.py --db /path/to/custom.db
+```
+### 1.5 Incremental Indexing
+The indexer records each file's modification time in `state/rag/index-state.json`. On each run it re-indexes only files whose modification time has changed since the previous run. The `--full` flag forces a complete rebuild.
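One way to picture the modification-time tracking is the sketch below. It assumes the state file is a flat JSON object mapping path to mtime; the real layout of `index-state.json` is not documented here, and `changed_files` is a hypothetical name.

```python
import json
from pathlib import Path


def changed_files(paths, state_file):
    """Return paths whose mtime differs from the recorded one, then save the new state."""
    state_file = Path(state_file)
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    changed = []
    for path in paths:
        key = str(path)
        mtime = Path(path).stat().st_mtime
        if state.get(key) != mtime:
            changed.append(key)
            state[key] = mtime
    state_file.write_text(json.dumps(state, indent=2))
    return changed
```

Deleting the state file makes every path look new on the next call, which matches the recovery step described under Troubleshooting.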
+### 1.6 Performance
+- WAL mode enabled (`PRAGMA journal_mode=WAL`) so searches can keep reading while the indexer writes
+- Synchronous mode set to NORMAL (`PRAGMA synchronous=NORMAL`) for faster writes
+- Typical index time: < 5 seconds for incremental, < 30 seconds for full rebuild on ~10K documents
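The two pragmas above can be applied when the connection is opened. A minimal sketch, assuming a hypothetical `open_index` helper:

```python
import sqlite3


def open_index(db_path):
    """Open the index with WAL journaling and NORMAL synchronous mode."""
    conn = sqlite3.connect(db_path)
    # journal_mode returns the mode actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn, mode
```

WAL mode is persistent for a database file, while `synchronous` applies per connection, so each process that opens the file would need to set it.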
+---
+## 2. User Context Search (`user-context-search.py`)
+Ring-fenced search that respects user access levels.
+### 2.1 How It Works
+1. Looks up the user in `config/caller-id-map.yaml` to determine access level
+2. Maps access level to searchable paths/scopes
+3. Executes search using one of two backends:
+   - **grep** (default): File-based grep with scoped directory paths
+   - **sqlite**: FTS5 query against the pre-indexed database
+### 2.2 Usage
+```bash
+# Search as CEO (full access)
+python3 scripts/user-context-search.py --user mehran --query "DFSA submission"
+# Search as leadership (scoped access)
+python3 scripts/user-context-search.py --user hootan --query "portal" --max-results 5
+# Search as unknown user (public only)
+python3 scripts/user-context-search.py --user unknown --query "adaptic"
+# Use SQLite FTS5 backend
+python3 scripts/user-context-search.py --user mehran --query "flight" --backend sqlite
+# Text output format (vs JSON default)
+python3 scripts/user-context-search.py --user mehran --query "test" --format text
+```
+### 2.3 Access Level → Searchable Paths
+| Access Level | Directories Searched |
+|---|---|
+| `ceo` | Everything: `memory/`, `knowledge/`, `state/`, `outputs/`, `docs/`, `logs/` |
+| `leadership` | `knowledge/` (company), `docs/`, research/briefs, own interaction logs |
+| `partner` | `knowledge/sources/` (public), research, own interaction logs |
+| `default` | `knowledge/sources/` only |
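The grep backend's scoping can be illustrated with a simplified directory walk. This is a sketch under assumptions: `SEARCH_ROOTS` and `scoped_grep` are hypothetical names, and the research/briefs and "own interaction logs" entries are left out because their paths are not given in the table.

```python
from pathlib import Path

# Simplified from the table above; research and per-user log paths are omitted.
SEARCH_ROOTS = {
    "ceo": ["memory", "knowledge", "state", "outputs", "docs", "logs"],
    "leadership": ["knowledge", "docs"],
    "partner": ["knowledge/sources"],
    "default": ["knowledge/sources"],
}


def scoped_grep(root, access_level, query, max_results=5):
    """Case-insensitive substring search limited to the caller's directories."""
    roots = SEARCH_ROOTS.get(access_level, SEARCH_ROOTS["default"])
    needle = query.lower()
    hits = []
    for rel in roots:
        base = Path(root) / rel
        if not base.is_dir():
            continue
        for path in sorted(base.rglob("*")):
            if not path.is_file():
                continue
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            if needle in text.lower():
                hits.append(path.relative_to(root).as_posix())
                if len(hits) >= max_results:
                    return hits
    return hits
```

Unknown access levels fall back to the `default` entry, so a lookup failure narrows access instead of widening it.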
+### 2.4 User Resolution
+Users can be identified by:
+- Username slug (e.g. `mehran`)
+- Slack ID (e.g. `U099N1JGKRQ`)
+- Email address (e.g. `mehran@adaptic.ai`)
+- Phone number (e.g. `+971585291799`)
+All resolve to the same user entry in `caller-id-map.yaml`.
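As an illustration of multi-identifier resolution, the sketch below matches any of the four identifier types against one entry. The `USERS` dict, its field names, and its values are placeholders; the actual structure of `caller-id-map.yaml` may differ.

```python
# Hypothetical in-memory form of caller-id-map.yaml (field names are assumptions).
USERS = {
    "alice": {
        "access_level": "leadership",
        "slack_id": "U000EXAMPLE",
        "email": "alice@example.com",
        "phone": "+15550100",
    },
}


def resolve_user(identifier, users):
    """Map a slug, Slack ID, email, or phone number to (slug, access_level)."""
    needle = identifier.strip().lower()
    for slug, entry in users.items():
        candidates = [slug, entry.get("slack_id"), entry.get("email"), entry.get("phone")]
        if needle in {c.lower() for c in candidates if c}:
            return slug, entry.get("access_level", "default")
    # Unknown callers get the most restrictive level.
    return None, "default"
```

For example, `resolve_user("alice@example.com", USERS)` and `resolve_user("+15550100", USERS)` both return the same `("alice", "leadership")` pair.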
+### 2.5 No External Dependencies
+The search script uses only Python standard library. PyYAML is used if available (faster), but a built-in YAML parser handles the `caller-id-map.yaml` structure if PyYAML is missing.
+---
+## 3. Pre-Draft Context (`pre-draft-context.py`)
+Retrieves relevant context about a recipient **before** composing a message. This is the core of the "know before you write" principle.
+### 3.1 What It Retrieves
+1. **Entity index lookup** — Finds the recipient in `memory/indexes/entity-relationships.yaml`
+2. **User profile** — Loads `memory/profiles/users/{slug}.yaml` for standing instructions, preferences, tone
+3. **Recent interactions** — Checks `memory/interactions/` for recent conversation history
+4. **Queue context** — Checks `state/queues/` for related open items
+5. **Disclosure boundaries** — Generates per-recipient information boundaries
+### 3.2 Usage
+```bash
+# Full JSON context
+python3 scripts/pre-draft-context.py --recipient "Graham Syder"
+# With topic focus
+python3 scripts/pre-draft-context.py --recipient "Hootan" --topic "DFSA compliance"
+# Brief format (3-5 lines, suitable for prompt injection)
+python3 scripts/pre-draft-context.py --recipient "Nima" --topic "83b election" --format brief
+# By email address
+python3 scripts/pre-draft-context.py --recipient "hootan@adaptic.ai" --format brief
+```
+### 3.3 Performance
+Target: **< 500ms** per lookup. The script stays within this budget by using:
+- File reads only (no API calls)
+- A JSON cache of the entity index (`.entity-relationships.json` alongside the YAML)
+- The PyYAML C loader when available (roughly 10x faster than the pure-Python loader)
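The JSON-cache idea can be sketched as a freshness check on file modification times. This assumes the cache is considered valid when it is at least as new as the YAML source; `load_entity_index` is a hypothetical name and the YAML parser is passed in as a callable so the sketch does not depend on PyYAML.

```python
import json
from pathlib import Path


def load_entity_index(yaml_path, cache_path, parse_yaml):
    """Read the JSON cache when it is fresh; otherwise re-parse the YAML and refresh it."""
    yaml_path, cache_path = Path(yaml_path), Path(cache_path)
    if cache_path.exists() and cache_path.stat().st_mtime >= yaml_path.stat().st_mtime:
        return json.loads(cache_path.read_text())
    data = parse_yaml(yaml_path.read_text())
    cache_path.write_text(json.dumps(data))
    return data
```

With PyYAML installed, `parse_yaml` could be `yaml.safe_load`; any callable that turns YAML text into JSON-serializable data works.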
+### 3.4 Exit Codes
+| Code | Meaning |
+|---|---|
+| 0 | Context found |
+| 1 | Recipient not found |
+| 2 | Error |
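A caller that shells out to the script can branch on these exit codes. The sketch below uses a hypothetical `lookup_status` helper and treats any code other than 0 or 1 as an error.

```python
import subprocess
import sys


def lookup_status(cmd):
    """Run a lookup command and translate its exit code into a status string."""
    code = subprocess.run(cmd, capture_output=True, text=True).returncode
    return {0: "found", 1: "not_found"}.get(code, "error")


# Stand-in command that exits with 1, used here instead of the real script.
print(lookup_status([sys.executable, "-c", "import sys; sys.exit(1)"]))
```

In practice `cmd` would be something like `["python3", "scripts/pre-draft-context.py", "--recipient", name, "--format", "brief"]`, using the flags shown in section 3.2.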
+---
+## 4. Pre-Draft Lookup Wrapper (`pre_draft_lookup.py`)
+Thin wrapper around `pre-draft-context.py` designed for import into send scripts.
+### 4.1 What It Adds
+1. Calls `retrieve_context()` from `pre-draft-context.py`
+2. Generates disclosure boundaries for the recipient (via `disclosure_boundaries.py`)
+3. Logs the lookup result to `logs/audit/YYYY-MM-DD-pre-draft-lookups.jsonl`
+4. Returns context dict or `None` — **never blocks a send**
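The "never blocks a send" behaviour amounts to catching every failure and still writing an audit record. A minimal sketch of that pattern follows; `safe_lookup`, its parameters, and the record fields are assumptions and not the wrapper's actual interface.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def safe_lookup(recipient, message_type, retrieve, audit_dir="logs/audit"):
    """Call retrieve(recipient), log the outcome as JSONL, and never raise."""
    now = datetime.now(timezone.utc)
    record = {"ts": now.isoformat(), "recipient": recipient, "type": message_type}
    context = None
    try:
        context = retrieve(recipient)
        record["found"] = context is not None
    except Exception as exc:  # advisory lookup: swallow errors on purpose
        record["found"] = False
        record["error"] = str(exc)
    try:
        log_dir = Path(audit_dir)
        log_dir.mkdir(parents=True, exist_ok=True)
        log_file = log_dir / f"{now:%Y-%m-%d}-pre-draft-lookups.jsonl"
        with log_file.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
    except OSError:
        pass  # a logging failure must not block the send either
    return context
```

Because both the lookup and the logging are wrapped, the send script only ever sees a dict or `None`.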
+### 4.2 Python Import
+```python
+from pre_draft_lookup import pre_draft_lookup
+context = pre_draft_lookup("Mehran Granfar", message_type="slack")
+if context:
+    # context contains entity info, user profile, recent interactions
+    pass
+```
+### 4.3 CLI Usage (for Shell Scripts)
+```bash
+python3 scripts/pre_draft_lookup.py --recipient "Mehran Granfar" --type slack --channel C099ABC
+```
+### 4.4 Integration Points
+Called automatically by these send scripts:
+- `send-email-threaded.py` — imported as Python module
+- `send-email-with-attachment.py` — imported as Python module
+- `slack-send.sh` — called as subprocess before sending
+- `send-whatsapp.sh` — called via `validate-outbound.py` pipeline
+The lookup is **advisory only** — it enriches the agent's context but never prevents a send.
+---
+## 5. Post-Interaction Indexer (`post-interaction-indexer.py`)
+Automatically extracts entities, relationships, and facts from interactions and updates the entity index.
+### 5.1 What It Extracts
+From each interaction (email, Slack message, call transcript):
+- **People** mentioned (names, titles, roles)
+- **Organisations** referenced
+- **Topics** discussed
+- **Facts** stated (dates, decisions, commitments)
+- **Relationships** between entities
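For a feel of regex-style extraction without API calls, the sketch below pulls the sender and any ISO dates out of an email event. It is illustrative only: the `body` field and the `extract` function are assumptions, and the real indexer extracts far more than this.

```python
import re
from email.utils import parseaddr

# Matches ISO-style dates such as 2025-01-31.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract(interaction):
    """Extract the sender and ISO dates from an email-type interaction dict."""
    email = interaction.get("email", {})
    people = []
    name, address = parseaddr(email.get("from", ""))
    if address:
        people.append({"name": name, "email": address})
    # "body" is an assumed field name for the message text.
    dates = DATE_RE.findall(email.get("body", ""))
    return {"people": people, "dates": dates}
```

Given the event used in the stdin example of section 5.3, this returns one person (`Jane`, `jane@co.com`) and no dates.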
+### 5.2 When It Runs
+| Trigger | How |
+|---|---|
+| Session end | `session-end-log.sh` hook spawns it in background with `--scan-today` |
+| After inbox processing | Inbox processor calls it per-item |
+| After backlog execution | Backlog executor calls it for interaction-heavy tasks |
+| Manual backfill | Run with `--input` for specific files |
+### 5.3 Usage
+```bash
+# Process a specific inbox item
+python3 scripts/post-interaction-indexer.py --input state/inbox/slack/1775230363.978999-dm.json
+# Process all unprocessed interactions from today
+python3 scripts/post-interaction-indexer.py --scan-today
+# Dry run (show extractions without writing)
+python3 scripts/post-interaction-indexer.py --input <file> --dry-run
+# Process from stdin
+echo '{"event_type":"email","email":{"from":"Jane <jane@co.com>"}}' | \
+    python3 scripts/post-interaction-indexer.py --stdin
+```
+### 5.4 Output
+Updates `memory/indexes/entity-relationships.yaml` with new entities and relationships. Uses a JSON cache (`.entity-relationships.json`) for fast reads.
+### 5.5 Performance Target
+< 2 seconds per interaction (regex-based extraction, no API calls).
+---
+## 6. Directory Structure
+```
+state/rag/
+  search.db              # SQLite FTS5 database
+  index-state.json       # File modification tracking for incremental indexing
+memory/
+  interactions/          # Raw interaction logs (JSONL)
+    slack/
+      {channel}/{date}.jsonl
+    email/
+      {user}/{date}.jsonl
+  indexes/
+    entity-relationships.yaml   # Extracted entities and relationships
+    .entity-relationships.json  # JSON cache for fast reads
+  profiles/
+    users/{slug}.yaml    # Per-user preferences, standing instructions
+    channels/{name}.yaml # Per-channel tone, rules
+knowledge/
+  sources/               # Public company knowledge (yaml)
+  decisions/             # Strategic decisions (leadership+)
+  syntheses/             # Analysis summaries
+config/
+  caller-id-map.yaml     # User → access level mapping
+```
+---
+## 7. Testing
+### 7.1 RAG Search Tests
+```bash
+# Run the full RAG search test suite
+./scripts/test-rag-search.sh
+```
+Tests:
+1. CEO (mehran) gets broad results
+2. Leadership (hootan) gets scoped results (no private CEO data)
+3. Unknown user gets minimal access (knowledge/sources only)
+4. Slack ID resolution works
+5. Email resolution works
+### 7.2 Pre-Draft Integration Test
+```bash
+python3 scripts/test-pre-draft-integration.py
+```
+### 7.3 Manual Verification
+| # | Test | How to Verify |
+|---|---|---|
+| 1 | Index builds | `python3 scripts/rag-indexer.py --stats` — should show document counts |
+| 2 | Incremental works | Run indexer twice; second run should skip unchanged files |
+| 3 | Full rebuild | `python3 scripts/rag-indexer.py --full` — should recreate DB |
+| 4 | CEO search | `python3 scripts/user-context-search.py --user mehran --query "test"` |
+| 5 | Scoped search | Leadership user shouldn't see CEO-only docs |
+| 6 | Pre-draft context | `python3 scripts/pre-draft-context.py --recipient "Mehran"` |
+| 7 | Lookup wrapper | `python3 scripts/pre_draft_lookup.py --recipient "Mehran" --type slack` |
+| 8 | Post-interaction | `python3 scripts/post-interaction-indexer.py --scan-today` |
+| 9 | Audit logging | Check `logs/audit/YYYY-MM-DD-pre-draft-lookups.jsonl` |
+---
+## 8. Troubleshooting
+### "No results found" for a query you know exists
+1. Check the index is built: `python3 scripts/rag-indexer.py --stats`
+2. If stats show 0 documents, run a full index: `python3 scripts/rag-indexer.py --full`
+3. Check the content is in an indexed directory (see section 1.1)
+4. Try the grep backend: `--backend grep` may find files not yet indexed
+### Recipient not found in pre-draft context
+1. Check entity index exists: `ls memory/indexes/entity-relationships.yaml`
+2. Run the post-interaction indexer: `python3 scripts/post-interaction-indexer.py --scan-today`
+3. Check if the recipient has a user profile: `ls memory/profiles/users/`
+4. Names must match — try variations (full name, email, slug)
+### Search too slow
+1. Use the sqlite backend instead of grep: `--backend sqlite`
+2. Rebuild the FTS index: `python3 scripts/rag-indexer.py --full`
+3. Check DB size: `ls -lh state/rag/search.db`
+### PyYAML not available
+The search script works without PyYAML using a built-in minimal parser. However, the post-interaction indexer **requires** PyYAML:
+```bash
+pip3 install pyyaml
+```
+### Index state corrupted
+Delete and rebuild:
+```bash
+rm state/rag/index-state.json
+python3 scripts/rag-indexer.py --full
+```
+---
+## Key Files
+| File | Purpose |
+|---|---|
+| `scripts/rag-indexer.py` | SQLite FTS5 indexer (incremental + full rebuild) |
+| `scripts/user-context-search.py` | Per-user ring-fenced search (grep or sqlite) |
+| `scripts/pre-draft-context.py` | Pre-send recipient context retrieval |
+| `scripts/pre_draft_lookup.py` | Wrapper for send script integration + audit logging |
+| `scripts/post-interaction-indexer.py` | Entity/relationship extraction from interactions |
+| `scripts/test-rag-search.sh` | RAG search test suite |
+| `scripts/test-rag-phase2.sh` | Phase 2 (FTS5) test suite |
+| `scripts/test-pre-draft-integration.py` | Pre-draft integration tests |
+| `config/caller-id-map.yaml` | User → access level mapping |
+| `state/rag/search.db` | SQLite FTS5 database |
+| `state/rag/index-state.json` | Incremental indexing state |
+| `memory/indexes/entity-relationships.yaml` | Extracted entity index |
+---
+## Related Documents
+- [Outbound Governance Setup](outbound-governance-setup.md) — How pre-draft context feeds into disclosure assessment
+- [Email Setup](email-setup.md) — Email send scripts that consume pre-draft context
+- [Slack Setup](slack-setup.md) — Slack send script pre-draft integration
+- [Voice & SMS Setup](voice-sms-setup.md) — Caller ID mapping shared with RAG access control
+- [Poller & Daemon Setup](poller-daemon-setup.md) — Session-end hook triggers post-interaction indexer