PyPI - mneme-cli - Versions diffs - 0.4.0__tar.gz → 0.5.0__tar.gz - Mend

mneme-cli 0.4.0.tar.gz → 0.5.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (51)

{mneme_cli-0.4.0/mneme/templates/workspace → mneme_cli-0.5.0}/AGENTS.md RENAMED

@@ -69,7 +69,7 @@ A mneme workspace is a directory. Its shape is stable across versions:
     graph.json       relationship graph
     tags.json        tag registry
     traceability.json  trace links between pages
-  memvid/            optional .mv2 archives (semantic search)
+  search.db          SQLite FTS5 search index (rebuilt from wiki)
   profiles/          workspace-local profiles and CSV mappings
     mappings/        JSON column mappings for ingest-csv
   exports/           JSON / markdown exports
@@ -107,7 +107,7 @@ mneme tornado --client <client>          # batch from inbox/
 ```
 `ingest` is atomic: it writes the wiki page, updates the schema, and
-advances the Memvid archive in one operation. `ingest-csv` produces one
+indexes the page in SQLite FTS5 in one operation. `ingest-csv` produces one
 wiki page per row, with trace links derived from the mapping. `tornado`
 is a bulk inbox processor — it auto-detects page type and routes CSVs
 through `ingest-csv`, everything else through `ingest`.
@@ -207,6 +207,44 @@ baseline, your current wiki page, and a fresh ingest of the new source.
 If there are conflicts, the page is left with merge markers. Edit them
 out manually, then run `resync-resolve`.
+### 3.6 TAG — agent-driven tagging
+```bash
+mneme tags suggest <client>/<page>                     # build tag packet
+mneme tags suggest <client>/<page> --json              # raw dict
+mneme tags apply <client>/<page> --add t1,t2 --remove t3
+```
+`mneme tags suggest` builds a **tag packet**: the page content, current
+tags, the workspace tag taxonomy (every existing tag with usage counts),
+active profile guidance, and a ready-to-paste prompt instructing you to
+choose 3–7 tags. Mneme does **not** propose tags itself — content
+understanding is your job. The packet gives you all the context you need.
+Your contract when consuming a tag packet:
+1. **Prefer existing tags** from the taxonomy when they fit. Consistency
+   matters more than novelty — `iso-13485` should not become `iso13485`
+   on the next page.
+2. **Add new tags only** when no existing tag captures the concept.
+3. Follow the format: lowercase, hyphenated (`risk-management`, not
+   `Risk Management`).
+4. Do not propose generic tags (`summary`, `overview`, `report`).
+5. Do not add the client slug — it is auto-applied.
+6. Output JSON: `{"tags": ["existing-a", "existing-b"], "new_tags": ["proposed-c"]}`.
+`mneme tags apply` is **atomic**: it rewrites the wiki page frontmatter,
+updates `schema/tags.json`, re-syncs the page to the FTS5 index, and
+appends a log entry — all in one operation. Search picks up the new tags
+immediately. Use `--add` and/or `--remove`, comma-separated.
+Existing taxonomy ops:
+```bash
+mneme tags list                                        # all tags + counts
+mneme tags merge <old> <new>                           # rename across all pages
+```
 ---
 ## 4. Profiles and the writing-style contract
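
Note on the tag-packet contract added in section 3.6: the six rules can be checked mechanically before `mneme tags apply` is run. The following is a minimal sketch, assuming the JSON reply shape shown in rule 6. `validate_tag_response`, `TAG_PATTERN`, and `GENERIC_TAGS` are hypothetical names introduced for illustration and are not part of mneme.

```python
import json
import re

# Hypothetical helper, not part of mneme: checks an LLM reply against the
# tag-packet contract (lowercase, hyphenated, no generic tags, no client slug).
TAG_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
GENERIC_TAGS = {"summary", "overview", "report"}


def validate_tag_response(raw, client_slug, taxonomy):
    """Return (tags_to_add, problems) for a JSON reply shaped like the contract."""
    reply = json.loads(raw)
    existing = reply.get("tags", [])
    proposed = reply.get("new_tags", [])
    chosen = existing + proposed
    problems = []
    if not 3 <= len(chosen) <= 7:
        problems.append(f"expected 3 to 7 tags, got {len(chosen)}")
    for tag in chosen:
        if not TAG_PATTERN.match(tag):
            problems.append(f"{tag!r}: not lowercase-hyphenated")
        if tag in GENERIC_TAGS:
            problems.append(f"{tag!r}: generic tag")
        if tag == client_slug:
            problems.append(f"{tag!r}: client slug is auto-applied")
    for tag in existing:
        if tag not in taxonomy:
            problems.append(f"{tag!r}: listed as existing but not in taxonomy")
    for tag in proposed:
        if tag in taxonomy:
            problems.append(f"{tag!r}: listed as new but already in taxonomy")
    return chosen, problems
```

An empty `problems` list means the reply can be joined with commas and passed to `--add`. The `taxonomy` argument is assumed to be a mapping of tag to usage count, as a tag packet would supply it.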

{mneme_cli-0.4.0 → mneme_cli-0.5.0}/CHANGELOG.md RENAMED

@@ -4,6 +4,60 @@ All notable changes to this project are documented here.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.5.0] - 2026-04-13
+### Breaking Changes
+- **Replaced memvid-sdk with SQLite FTS5.** The `memvid/` directory and `.mv2`
+  archives are no longer used. Search is now powered by a local `search.db`
+  file using BM25 ranking with Porter stemming. **Zero external dependencies**
+  for search — `sqlite3` is in the Python stdlib.
+- `mneme repair` now rebuilds the FTS5 index instead of memvid archives.
+- `mneme drift` reports `unindexed` / `orphaned` / `stale` instead of
+  `missing_from_memvid` / `orphan_frames`.
+- `get_stats()` returns a `search` key (page_count, db_size_bytes,
+  search_latency_ms) instead of `memvid`.
+- `sync_page_to_memvid()` renamed to `sync_page_to_index()`. Returns
+  `bool` (indexed) instead of `int` (frame count).
+- Removed `chunk_body()`, `_sanitize_memvid_query()`, and all chunking
+  config (`MAX_CHUNK_SIZE`, `MIN_CHUNK_SIZE`, `MAX_CHUNKS_PER_INGEST`,
+  `CHUNK_COMMIT_BATCH`).
+- Removed `MEMVID_DIR`, `MASTER_MV2`, `PER_CLIENT_DIR` config constants.
+  Replaced by `SEARCH_DB`.
+### Added
+- **`mneme reindex`** command — rebuild search index from wiki pages.
+- **`ingest-dir --recursive` / `-r`** — recurse into subdirectories.
+- **`ingest-dir --preserve-structure`** — mirror source directory structure
+  in wiki subdirectories (avoids dedup collisions between same-basename files
+  in different directories).
+- **`ingest-csv --delimiter`** flag with auto-detection via `csv.Sniffer`.
+- **`.xlsx` ingest support** — install with `pip install "mneme-cli[xlsx]"`.
+  Sheets are rendered as markdown tables.
+- **`mneme trace matrix --csv [--out FILE]`** — export the trace matrix as
+  CSV for QMS audits and DHF inclusion.
+- **`graph.json` auto-populated** during ingest from wiki page wikilinks
+  and `related` frontmatter.
+- **`stats` relationship count** now includes traceability.json links, not
+  just graph.json edges.
+- **log.md rotation** — entries beyond `LOG_MAX_ENTRIES` (default 500) are
+  archived to `log-archive-YYYY-MM-DD.md`.
+### Fixed
+- `mneme status` crash (UnboundLocalError on `log_content`).
+- CSV ingest crash on `None` cells (`row.get()` returning None).
+- Duplicate ingest detection now uses full source path, not just filename
+  (two `INSTRUCTIONS.md` files in different directories now both ingest).
+### Removed
+- `memvid-sdk` dependency.
+- `MNEME_NO_MEMVID` env var (no longer needed — FTS5 is always available).
+- Chunking logic (`chunk_body`, `MAX_CHUNK_SIZE`, frame management).
+- Tantivy-reserved-word query sanitizer (FTS5 has different syntax).
 ## [Unreleased]
 ### Added
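
Note on the search change described in this changelog: the combination of FTS5, BM25 ranking, and Porter stemming can be reproduced with the standard `sqlite3` module alone. This is a minimal sketch and not mneme's actual schema. The table name, column names, and helper functions are assumptions for illustration. It also assumes the SQLite library that Python links against was compiled with FTS5, which is common but not guaranteed.

```python
import sqlite3


def build_index(pages):
    """Create an in-memory FTS5 index with Porter stemming and load pages.

    `pages` is an iterable of (path, title, body, tags) tuples.
    Column names are illustrative, not taken from mneme.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE pages USING fts5("
        "path UNINDEXED, title, body, tags, tokenize='porter unicode61')"
    )
    conn.executemany(
        "INSERT INTO pages (path, title, body, tags) VALUES (?, ?, ?, ?)", pages
    )
    return conn


def search(conn, query, limit=5):
    """Return (path, snippet, score) rows, best match first.

    bm25() yields smaller values for better matches, so ascending
    order puts the most relevant row first.
    """
    return conn.execute(
        "SELECT path, snippet(pages, 2, '<b>', '</b>', '...', 8), bm25(pages) "
        "FROM pages WHERE pages MATCH ? ORDER BY bm25(pages) LIMIT ?",
        (query, limit),
    ).fetchall()
```

With the Porter tokenizer, a query for `monitor` also matches pages containing `monitoring`, which is the stemming behavior the changelog refers to. The query string is passed through FTS5 query syntax, so user input with special characters would need quoting in a real implementation.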

{mneme_cli-0.4.0 → mneme_cli-0.5.0}/FEATURES.md RENAMED

@@ -17,13 +17,14 @@
 | `mneme new` | Scaffold a new workspace from the bundled template (preferred over `init`) |
 | `mneme init` | Scaffold a workspace in cwd (legacy) |
 | `mneme --workspace <dir>` / `MNEME_HOME=<dir>` | Run any command against a specific workspace |
-| `mneme ingest` | Atomic ingest: source -> wiki + Memvid + schema |
+| `mneme ingest` | Atomic ingest: source -> wiki + FTS5 index + schema |
 | `mneme resync` | Diff-aware re-ingest: 3-way merge (baseline / wiki / fresh ingest) via `git merge-file` |
 | `mneme resync-resolve` | Mark a conflicted resync page as resolved after editing out markers |
 | `mneme ingest-dir` | Batch ingest all files from a directory |
 | `mneme search` | Dual-layer search with `--client` scoping |
 | `mneme lint` | Health check: orphan pages, dead links, stale pages, citations, schema drift, coverage |
-| `mneme sync` | Sync wiki pages to Memvid |
+| `mneme sync` | Sync wiki pages to FTS5 search index |
+| `mneme reindex` | Rebuild search index from wiki pages |
 | `mneme drift` | Detect layer desynchronization |
 | `mneme stats` | Health overview |
 | `mneme repair` | Fix corrupted archives and schema |
@@ -31,6 +32,8 @@
 | `mneme recent` | Show last N activity log entries |
 | `mneme tags list` | List all tags with page counts |
 | `mneme tags merge` | Merge one tag into another across all pages |
+| `mneme tags suggest <page>` | Build a *tag packet* for an LLM agent (page content + taxonomy + prompt) |
+| `mneme tags apply <page> --add t1,t2 --remove t3` | Atomic tag update: rewrites frontmatter, updates schema/tags.json, re-syncs FTS5 index |
 | `mneme diff` | Git-aware diff for a wiki page |
 | `mneme snapshot` | Versioned zip archive of a client + git tag |
 | `mneme dedupe` | Detect near-duplicate wiki pages |
@@ -54,7 +57,7 @@
 | `mneme scan-repo` | Scan code repo, compare against QMS docs, find gaps |
 | `mneme tornado` | Inbox processor: auto-detect type/client, ingest, archive to sources |
 | `mneme ingest-csv` | CSV ingest: one row = one wiki page, with column-to-frontmatter mapping and auto trace links |
-| `mneme demo clean` | Remove all demo content: demo-retail client, demo/ folder, schema entries, memvid manifest, index/log entries |
+| `mneme demo clean` | Remove all demo content: demo-retail client, demo/ folder, schema entries, search index entries, index/log entries |
 ### Web UI (localhost:3141)

{mneme_cli-0.4.0 → mneme_cli-0.5.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: mneme-cli
-Version: 0.4.0
+Version: 0.5.0
 Summary: Mnemosyne - CLI tool that turns documents into a searchable second brain. Ingest once, query forever.
 Author-email: Tolis Moustaklis <apostolos.moustaklis@gmail.com>
 License-Expression: MIT
@@ -9,7 +9,7 @@ Project-URL: Repository, https://github.com/tolism/mneme
 Project-URL: Issues, https://github.com/tolism/mneme/issues
 Project-URL: Documentation, https://github.com/tolism/mneme#readme
 Project-URL: Changelog, https://github.com/tolism/mneme/blob/main/CHANGELOG.md
-Keywords: knowledge-management,second-brain,cli,wiki,memvid,llm,qms,obsidian,traceability
+Keywords: knowledge-management,second-brain,cli,wiki,sqlite,fts5,llm,qms,obsidian,traceability
 Classifier: Development Status :: 3 - Alpha
 Classifier: Environment :: Console
 Classifier: Intended Audience :: Developers
@@ -28,12 +28,14 @@ Classifier: Topic :: Text Processing :: Markup :: Markdown
 Requires-Python: >=3.9
 Description-Content-Type: text/markdown
 License-File: LICENSE
-Requires-Dist: memvid-sdk>=2.0.0
 Requires-Dist: portalocker>=2.0.0
 Provides-Extra: pdf
 Requires-Dist: pymupdf>=1.23.0; extra == "pdf"
+Provides-Extra: xlsx
+Requires-Dist: openpyxl>=3.1.0; extra == "xlsx"
 Provides-Extra: all
 Requires-Dist: pymupdf>=1.23.0; extra == "all"
+Requires-Dist: openpyxl>=3.1.0; extra == "all"
 Provides-Extra: release
 Requires-Dist: build>=1.0.0; extra == "release"
 Requires-Dist: twine>=5.0.0; extra == "release"
@@ -163,15 +165,18 @@ One installed CLI serves many projects — each workspace is just a directory.
 | `mneme search "<query>"` | Search across all layers |
 | `mneme draft --doc-type <t> --section <s> --client <c>` | Build a *write packet* for an LLM agent to produce one section |
 | `mneme validate writing-style <page>` | Build a *review packet* for an LLM agent to grade a page |
+| `mneme tags suggest <page>` | Build a *tag packet* for an LLM agent to choose tags |
+| `mneme tags apply <page> --add t1,t2 --remove t3` | Atomic tag update (frontmatter + schema + search index) |
 | `mneme agent plan --goal "..." --doc-type <t> --client <c>` | Generate a deterministic TODO plan from the active profile |
 | `mneme agent next-task` | Return the next ready task in the active plan |
 | `mneme agent task-done <id>` | Mark a task as done |
-| `mneme sync` | Sync wiki to Memvid memory |
+| `mneme sync` | Sync wiki pages to FTS5 search index |
+| `mneme reindex` | Rebuild search index from wiki pages |
 | `mneme drift` | Detect layer desynchronization |
 | `mneme stats` | Health overview |
 | `mneme repair` | Fix corrupted archives |
-**Formats:** `.md`, `.txt`, `.pdf`
+**Formats:** `.md`, `.txt`, `.pdf`, `.xlsx` (with `pip install "mneme-cli[xlsx]"`)
 ---
@@ -206,6 +211,121 @@ Mneme generates the plan deterministically from the active profile's section_not
 ---
+## End-to-end example: from raw documents to a tagged, searchable, validated knowledge base
+A realistic walkthrough showing how the human, the CLI, and the LLM agent collaborate. Suppose you're building a knowledge base for **Parkiwatch**, a medical device for Parkinson's monitoring.
+### Step 1 — Scaffold a workspace (human, one-time)
+```bash
+mneme new ~/projects/parkiwatch --name Parkiwatch --client parkiwatch --profile eu-mdr
+cd ~/projects/parkiwatch
+```
+Creates the workspace tree, sets the EU MDR writing-style profile, and initializes empty schema files.
+### Step 2 — Ingest source material (human)
+```bash
+# Drop a folder of source documents into inbox/, then bulk-process
+cp -r ~/Downloads/parkinson-research/* inbox/
+mneme tornado --client parkiwatch
+# Or ingest individual files
+mneme ingest research-paper.pdf parkiwatch
+mneme ingest-csv risk-register.csv parkiwatch --mapping risk-register
+mneme ingest spec-table.xlsx parkiwatch          # .xlsx renders sheets as markdown tables
+mneme ingest-dir docs/ parkiwatch --recursive    # walk subdirectories
+```
+What happens per ingest: source file → wiki page in `wiki/parkiwatch/` → frontmatter with auto-extracted entities → entry in `index.md` → row in the FTS5 search DB → log entry.
+### Step 3 — Tag the new pages (LLM agent)
+The new pages have only the auto-applied `parkiwatch` client tag. The agent now adds meaningful tags:
+```bash
+# For each new page, the agent runs:
+mneme tags suggest parkiwatch/research-paper > /tmp/packet.md
+```
+The packet contains the page body, the current tag taxonomy (every tag in the workspace + usage counts), and a ready-to-paste prompt. **The LLM reads the packet** — it understands the content and decides on tags, preferring existing taxonomy entries when they fit. The LLM's response is JSON:
+```json
+{"tags": ["clinical-trial", "iso-13485"], "new_tags": ["bradykinesia-detection"]}
+```
+The agent then runs:
+```bash
+mneme tags apply parkiwatch/research-paper \
+  --add clinical-trial,iso-13485,bradykinesia-detection
+```
+Atomic operation: rewrites the wiki page frontmatter, updates `schema/tags.json`, re-indexes the page in FTS5 (so search picks up the new tags immediately), appends a log entry. **Repeat for every page** — the taxonomy grows, and subsequent pages tend to reuse existing tags (consistency).
+### Step 4 — Search the knowledge base (anyone)
+```bash
+mneme search "bradykinesia"                              # BM25 + Porter stemming
+mneme search "clinical evaluation" --client parkiwatch   # client-scoped
+```
+Sub-millisecond. Returns the page title, snippet (with `<b>highlights</b>`), tags, and BM25 score.
+### Step 5 — Produce a regulatory deliverable (LLM agent driving the agent loop)
+```bash
+# Generate a deterministic plan from the active profile
+mneme agent plan --goal "produce a Design Validation Report" \
+                 --doc-type design-validation-report \
+                 --client parkiwatch
+# → 15 tasks: 11 section drafts + assemble + harmonize + review + submission-check
+# Walk the plan
+mneme agent next-task
+# → Task: section-purpose-and-scope
+#   next_command: mneme draft --doc-type design-validation-report \
+#                             --section purpose-and-scope --client parkiwatch
+mneme draft --doc-type design-validation-report \
+            --section purpose-and-scope --client parkiwatch \
+            --query "purpose scope intended use" \
+            --out /tmp/write-packet.md
+# The LLM reads /tmp/write-packet.md (which includes wiki search hits as evidence,
+# the profile's writing-style rules, and a write prompt) and produces the section.
+# The agent writes the section to wiki/parkiwatch/design-validation-report.md.
+mneme agent task-done section-purpose-and-scope
+# ... repeat for each section ...
+# After all sections drafted:
+mneme harmonize --client parkiwatch --fix       # mechanical vocabulary swap
+mneme validate writing-style parkiwatch/design-validation-report > /tmp/review.md
+# The LLM reads /tmp/review.md, critiques every section, applies fixes in place
+mneme agent task-done review-page
+# Submission readiness
+mneme validate consistency --client parkiwatch  # cross-doc version checks
+mneme trace gaps parkiwatch                     # find broken trace chains
+mneme trace matrix parkiwatch --csv --out trace-matrix.csv  # for the DHF
+mneme snapshot parkiwatch                       # versioned audit zip
+```
+### Who does what
+| Layer | Responsibility |
+|---|---|
+| **Human** | Drops sources, runs commands, reviews diffs, ships the deliverable |
+| **mneme CLI** | Deterministic infrastructure: parses files, builds packets, indexes, traces, harmonizes vocabulary, generates plans, atomic state updates |
+| **LLM agent** | All reasoning: classifying entities, choosing tags, drafting prose, grading writing style, deciding when a chain is complete |
+mneme never calls an LLM. The LLM never bypasses mneme's atomic operations. They meet at the packet boundary.
+---
 ## How It Works
 ```
@@ -218,9 +338,9 @@ Mneme generates the plan deterministically from the active profile's section_not
          |       Frontmatter, citations, [[wikilinks]]
          |       You read and browse here
          |
-         +---> Memory Layer (.mv2 archive)
-         |       Smart Frames, semantic embeddings
-         |       Machines query here (<5ms)
+         +---> Search Index (SQLite FTS5)
+         |       BM25 ranking, Porter stemming
+         |       Sub-millisecond queries, zero dependencies
          |
          +---> Schema Layer (JSON)
                  entities.json - people, companies, products
@@ -228,9 +348,9 @@ Mneme generates the plan deterministically from the active profile's section_not
                  tags.json    - taxonomy
 ```
-Every `mneme ingest` writes both layers atomically. `mneme drift` catches desync. `mneme repair` fixes it.
+Every `mneme ingest` writes the wiki page and updates the search index atomically. `mneme drift` catches desync. `mneme reindex` rebuilds the index from wiki pages.
-**Memvid is optional.** Without it, mneme runs as a wiki-only knowledge base with text search. Add `memvid-sdk` when you outgrow grep.
+**Zero external dependencies for search.** SQLite FTS5 is built into Python's stdlib — no install, no API key, no capacity limit.
 ---
@@ -405,14 +525,15 @@ See `EXAMPLES.md` Example 13 for a full walkthrough with a real Parkiwatch scena
 ## When You Need This
-| Scale | Wiki alone | Wiki + Memvid |
-|---|---|---|
-| 5 docs | Plenty | Overkill |
-| 50 docs | Fine | Starting to help |
-| 500 docs | Grep takes 2-3s, misses semantic matches | 2ms, cross-client connections |
-| 5,000 docs | Unusable | Still 2ms |
+| Scale | Search performance |
+|---|---|
+| 5 docs | Sub-millisecond |
+| 50 docs | Sub-millisecond |
+| 500 docs | Sub-millisecond, BM25 ranked |
+| 5,000 docs | A few ms, still ranked by relevance |
+| 50,000 docs | Tens of ms |
-Start wiki-only. Add the memory layer when search gets slow.
+SQLite FTS5 scales transparently. No tuning, no capacity limits.
 ---
@@ -423,7 +544,7 @@ mneme/
   sources/        Raw documents (immutable, never modified)
   wiki/           Markdown knowledge pages (Obsidian-compatible)
   schema/         entities.json, graph.json, tags.json
-  memvid/         .mv2 memory archives
+  search.db       SQLite FTS5 search index
   core.py         Engine (ingest, search, sync, drift, repair)
   config.py       Configuration
   server.py       Web dashboard
@@ -489,7 +610,7 @@ password = pypi-AgENd...          # from https://test.pypi.org/manage/account/to
 This project builds on two foundational ideas:
 - **LLM Wiki pattern** by [Andrej Karpathy](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) -- the insight that LLMs should build and maintain a persistent, compounding wiki instead of re-deriving answers from raw documents on every query
-- **Memvid** by [Olow304/memvid](https://github.com/Olow304/memvid) -- single-file AI memory with sub-millisecond retrieval, no vector DB required
+- **SQLite FTS5** -- the world's most-deployed embedded database, with built-in BM25 full-text search
 - **Original implementation** -- [tashisleepy/knowledge-engine](https://github.com/tashisleepy/knowledge-engine) -- the first version that fused both patterns into a dual-layer bridge
 ---
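
Note on `mneme drift`: the changelog lists three report categories (`unindexed`, `orphaned`, `stale`) without defining them. One plausible reading, sketched below, compares the set of wiki pages with the set of indexed pages. The definitions in the comments are assumptions, not confirmed by the diff, and `classify_drift` is a hypothetical name.

```python
def classify_drift(wiki_pages, indexed_pages):
    """Compare wiki files with index rows and bucket the differences.

    wiki_pages:    {path: last_modified_time} for pages on disk
    indexed_pages: {path: time_indexed} for rows in the search index

    Assumed meanings (not confirmed by the diff):
      unindexed - page exists on disk but has no index row
      orphaned  - index row exists but the page is gone
      stale     - page was modified after it was last indexed
    """
    wiki_paths = set(wiki_pages)
    index_paths = set(indexed_pages)
    return {
        "unindexed": sorted(wiki_paths - index_paths),
        "orphaned": sorted(index_paths - wiki_paths),
        "stale": sorted(
            path
            for path in wiki_paths & index_paths
            if wiki_pages[path] > indexed_pages[path]
        ),
    }
```

Under this reading, a rebuild from wiki pages would clear all three buckets, since the index is derived entirely from the wiki.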

{mneme_cli-0.4.0 → mneme_cli-0.5.0}/README.md RENAMED

@@ -122,15 +122,18 @@ One installed CLI serves many projects — each workspace is just a directory.
 | `mneme search "<query>"` | Search across all layers |
 | `mneme draft --doc-type <t> --section <s> --client <c>` | Build a *write packet* for an LLM agent to produce one section |
 | `mneme validate writing-style <page>` | Build a *review packet* for an LLM agent to grade a page |
+| `mneme tags suggest <page>` | Build a *tag packet* for an LLM agent to choose tags |
+| `mneme tags apply <page> --add t1,t2 --remove t3` | Atomic tag update (frontmatter + schema + search index) |
 | `mneme agent plan --goal "..." --doc-type <t> --client <c>` | Generate a deterministic TODO plan from the active profile |
 | `mneme agent next-task` | Return the next ready task in the active plan |
 | `mneme agent task-done <id>` | Mark a task as done |
-| `mneme sync` | Sync wiki to Memvid memory |
+| `mneme sync` | Sync wiki pages to FTS5 search index |
+| `mneme reindex` | Rebuild search index from wiki pages |
 | `mneme drift` | Detect layer desynchronization |
 | `mneme stats` | Health overview |
 | `mneme repair` | Fix corrupted archives |
-**Formats:** `.md`, `.txt`, `.pdf`
+**Formats:** `.md`, `.txt`, `.pdf`, `.xlsx` (with `pip install "mneme-cli[xlsx]"`)
 ---
@@ -165,6 +168,121 @@ Mneme generates the plan deterministically from the active profile's section_not
 ---
+## End-to-end example: from raw documents to a tagged, searchable, validated knowledge base
+A realistic walkthrough showing how the human, the CLI, and the LLM agent collaborate. Suppose you're building a knowledge base for **Parkiwatch**, a medical device for Parkinson's monitoring.
+### Step 1 — Scaffold a workspace (human, one-time)
+```bash
+mneme new ~/projects/parkiwatch --name Parkiwatch --client parkiwatch --profile eu-mdr
+cd ~/projects/parkiwatch
+```
+Creates the workspace tree, sets the EU MDR writing-style profile, and initializes empty schema files.
+### Step 2 — Ingest source material (human)
+```bash
+# Drop a folder of source documents into inbox/, then bulk-process
+cp -r ~/Downloads/parkinson-research/* inbox/
+mneme tornado --client parkiwatch
+# Or ingest individual files
+mneme ingest research-paper.pdf parkiwatch
+mneme ingest-csv risk-register.csv parkiwatch --mapping risk-register
+mneme ingest spec-table.xlsx parkiwatch          # .xlsx renders sheets as markdown tables
+mneme ingest-dir docs/ parkiwatch --recursive    # walk subdirectories
+```
+What happens per ingest: source file → wiki page in `wiki/parkiwatch/` → frontmatter with auto-extracted entities → entry in `index.md` → row in the FTS5 search DB → log entry.
+### Step 3 — Tag the new pages (LLM agent)
+The new pages have only the auto-applied `parkiwatch` client tag. The agent now adds meaningful tags:
+```bash
+# For each new page, the agent runs:
+mneme tags suggest parkiwatch/research-paper > /tmp/packet.md
+```
+The packet contains the page body, the current tag taxonomy (every tag in the workspace + usage counts), and a ready-to-paste prompt. **The LLM reads the packet** — it understands the content and decides on tags, preferring existing taxonomy entries when they fit. The LLM's response is JSON:
+```json
+{"tags": ["clinical-trial", "iso-13485"], "new_tags": ["bradykinesia-detection"]}
+```
+The agent then runs:
+```bash
+mneme tags apply parkiwatch/research-paper \
+  --add clinical-trial,iso-13485,bradykinesia-detection
+```
+Atomic operation: rewrites the wiki page frontmatter, updates `schema/tags.json`, re-indexes the page in FTS5 (so search picks up the new tags immediately), appends a log entry. **Repeat for every page** — the taxonomy grows, and subsequent pages tend to reuse existing tags (consistency).
+### Step 4 — Search the knowledge base (anyone)
+```bash
+mneme search "bradykinesia"                              # BM25 + Porter stemming
+mneme search "clinical evaluation" --client parkiwatch   # client-scoped
+```
+Sub-millisecond. Returns the page title, snippet (with `<b>highlights</b>`), tags, and BM25 score.
+### Step 5 — Produce a regulatory deliverable (LLM agent driving the agent loop)
+```bash
+# Generate a deterministic plan from the active profile
+mneme agent plan --goal "produce a Design Validation Report" \
+                 --doc-type design-validation-report \
+                 --client parkiwatch
+# → 15 tasks: 11 section drafts + assemble + harmonize + review + submission-check
+# Walk the plan
+mneme agent next-task
+# → Task: section-purpose-and-scope
+#   next_command: mneme draft --doc-type design-validation-report \
+#                             --section purpose-and-scope --client parkiwatch
+mneme draft --doc-type design-validation-report \
+            --section purpose-and-scope --client parkiwatch \
+            --query "purpose scope intended use" \
+            --out /tmp/write-packet.md
+# The LLM reads /tmp/write-packet.md (which includes wiki search hits as evidence,
+# the profile's writing-style rules, and a write prompt) and produces the section.
+# The agent writes the section to wiki/parkiwatch/design-validation-report.md.
+mneme agent task-done section-purpose-and-scope
+# ... repeat for each section ...
+# After all sections drafted:
+mneme harmonize --client parkiwatch --fix       # mechanical vocabulary swap
+mneme validate writing-style parkiwatch/design-validation-report > /tmp/review.md
+# The LLM reads /tmp/review.md, critiques every section, applies fixes in place
+mneme agent task-done review-page
+# Submission readiness
+mneme validate consistency --client parkiwatch  # cross-doc version checks
+mneme trace gaps parkiwatch                     # find broken trace chains
+mneme trace matrix parkiwatch --csv --out trace-matrix.csv  # for the DHF
+mneme snapshot parkiwatch                       # versioned audit zip
+```
+### Who does what
+| Layer | Responsibility |
+|---|---|
+| **Human** | Drops sources, runs commands, reviews diffs, ships the deliverable |
+| **mneme CLI** | Deterministic infrastructure: parses files, builds packets, indexes, traces, harmonizes vocabulary, generates plans, atomic state updates |
+| **LLM agent** | All reasoning: classifying entities, choosing tags, drafting prose, grading writing style, deciding when a chain is complete |
+mneme never calls an LLM. The LLM never bypasses mneme's atomic operations. They meet at the packet boundary.
+---
 ## How It Works
 ```
@@ -177,9 +295,9 @@ Mneme generates the plan deterministically from the active profile's section_not
          |       Frontmatter, citations, [[wikilinks]]
          |       You read and browse here
          |
-         +---> Memory Layer (.mv2 archive)
-         |       Smart Frames, semantic embeddings
-         |       Machines query here (<5ms)
+         +---> Search Index (SQLite FTS5)
+         |       BM25 ranking, Porter stemming
+         |       Sub-millisecond queries, zero dependencies
          |
          +---> Schema Layer (JSON)
                  entities.json - people, companies, products
@@ -187,9 +305,9 @@ Mneme generates the plan deterministically from the active profile's section_not
                  tags.json    - taxonomy
 ```
-Every `mneme ingest` writes both layers atomically. `mneme drift` catches desync. `mneme repair` fixes it.
+Every `mneme ingest` writes the wiki page and updates the search index atomically. `mneme drift` catches desync. `mneme reindex` rebuilds the index from wiki pages.
-**Memvid is optional.** Without it, mneme runs as a wiki-only knowledge base with text search. Add `memvid-sdk` when you outgrow grep.
+**Zero external dependencies for search.** SQLite FTS5 is built into Python's stdlib — no install, no API key, no capacity limit.
 ---
@@ -364,14 +482,15 @@ See `EXAMPLES.md` Example 13 for a full walkthrough with a real Parkiwatch scena
 ## When You Need This
-| Scale | Wiki alone | Wiki + Memvid |
-|---|---|---|
-| 5 docs | Plenty | Overkill |
-| 50 docs | Fine | Starting to help |
-| 500 docs | Grep takes 2-3s, misses semantic matches | 2ms, cross-client connections |
-| 5,000 docs | Unusable | Still 2ms |
+| Scale | Search performance |
+|---|---|
+| 5 docs | Sub-millisecond |
+| 50 docs | Sub-millisecond |
+| 500 docs | Sub-millisecond, BM25 ranked |
+| 5,000 docs | A few ms, still ranked by relevance |
+| 50,000 docs | Tens of ms |
-Start wiki-only. Add the memory layer when search gets slow.
+SQLite FTS5 scales transparently. No tuning, no capacity limits.
 ---
@@ -382,7 +501,7 @@ mneme/
   sources/        Raw documents (immutable, never modified)
   wiki/           Markdown knowledge pages (Obsidian-compatible)
   schema/         entities.json, graph.json, tags.json
-  memvid/         .mv2 memory archives
+  search.db       SQLite FTS5 search index
   core.py         Engine (ingest, search, sync, drift, repair)
   config.py       Configuration
   server.py       Web dashboard
@@ -448,7 +567,7 @@ password = pypi-AgENd...          # from https://test.pypi.org/manage/account/to
 This project builds on two foundational ideas:
 - **LLM Wiki pattern** by [Andrej Karpathy](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) -- the insight that LLMs should build and maintain a persistent, compounding wiki instead of re-deriving answers from raw documents on every query
-- **Memvid** by [Olow304/memvid](https://github.com/Olow304/memvid) -- single-file AI memory with sub-millisecond retrieval, no vector DB required
+- **SQLite FTS5** -- the world's most-deployed embedded database, with built-in BM25 full-text search
 - **Original implementation** -- [tashisleepy/knowledge-engine](https://github.com/tashisleepy/knowledge-engine) -- the first version that fused both patterns into a dual-layer bridge
 ---

{mneme_cli-0.4.0 → mneme_cli-0.5.0}/mneme/__init__.py RENAMED

@@ -5,4 +5,4 @@ Public API:
     from mneme.core import ingest_source_to_both, dual_search, ...
 """
-__version__ = "0.4.0"
+__version__ = "0.5.0"

{mneme_cli-0.4.0 → mneme_cli-0.5.0}/mneme/config.py RENAMED

@@ -8,7 +8,7 @@ Two distinct roots:
   from here.
 * WORKSPACE_DIR - where the user's data lives (wiki/, sources/, schema/,
-  memvid/, index.md, log.md). Resolved in this order:
+  search.db, index.md, log.md). Resolved in this order:
       1. The MNEME_HOME environment variable, if set.
       2. The current working directory.
@@ -54,9 +54,7 @@ BASE_DIR = WORKSPACE_DIR
 WIKI_DIR = os.path.join(WORKSPACE_DIR, 'wiki')
 SOURCES_DIR = os.path.join(WORKSPACE_DIR, 'sources')
 SCHEMA_DIR = os.path.join(WORKSPACE_DIR, 'schema')
-MEMVID_DIR = os.path.join(WORKSPACE_DIR, 'memvid')
-MASTER_MV2 = os.path.join(MEMVID_DIR, 'master.mv2')
-PER_CLIENT_DIR = os.path.join(MEMVID_DIR, 'per-client')
+SEARCH_DB = os.path.join(WORKSPACE_DIR, 'search.db')
 INDEX_FILE = os.path.join(WORKSPACE_DIR, 'index.md')
 LOG_FILE = os.path.join(WORKSPACE_DIR, 'log.md')
 TEMPLATES_DIR = os.path.join(WIKI_DIR, '_templates')
@@ -78,13 +76,8 @@ WORKSPACE_MAPPINGS_DIR = os.path.join(WORKSPACE_PROFILES_DIR, 'mappings')
 EXCLUDED_DIRS = ['_templates', '.baselines']
 EXCLUDED_FILES = ['_meta.yaml']
-# Chunk settings for memvid
-MAX_CHUNK_SIZE = 500  # characters per Smart Frame
-MIN_CHUNK_SIZE = 50   # don't create tiny frames
-# Ingest limits to prevent hangs on huge files
-MAX_CHUNKS_PER_INGEST = 200   # hard cap on chunks sent to memvid per page
-CHUNK_COMMIT_BATCH = 50       # commit to memvid every N chunks
+# Log rotation
+LOG_MAX_ENTRIES = 500  # archive after this many entries
 # Entity extraction stopwords
 ENTITY_STOPWORDS = {
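
Note on the `config.py` change: the module docstring describes how `WORKSPACE_DIR` is resolved, and `SEARCH_DB` is now derived from it. The sketch below covers only the two resolution steps visible in this hunk (the `MNEME_HOME` variable, then the current working directory). Any further steps in the real module are not shown here. `resolve_workspace` and `search_db_path` are hypothetical names for illustration.

```python
import os


def resolve_workspace(environ=None, cwd=None):
    """Return the workspace root: MNEME_HOME if set, else the working directory."""
    environ = os.environ if environ is None else environ
    home = environ.get("MNEME_HOME")
    if home:
        return os.path.abspath(home)
    return os.path.abspath(cwd or os.getcwd())


def search_db_path(workspace):
    """Location of the search index: a single file at the workspace root."""
    return os.path.join(workspace, "search.db")
```

Because the index is a single file under the workspace root, each workspace carries its own index, replacing the former `memvid/` directory with its master and per-client archives.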
