mempalace-code 1.6.0__tar.gz → 1.6.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (146)
  1. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/CHANGELOG.md +23 -0
  2. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/PKG-INFO +26 -18
  3. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/README.md +25 -17
  4. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/BACKLOG-archived.yaml +5 -0
  5. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/BACKLOG.yaml +30 -0
  6. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/COMPARISON_GRAPHIFY.md +9 -9
  7. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/HOW_SEARCH_WORKS.md +9 -6
  8. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/README.md +7 -6
  9. mempalace_code-1.6.2/mempalace/language_catalog.py +251 -0
  10. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/mcp_server.py +2 -7
  11. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/miner.py +181 -143
  12. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/searcher.py +9 -51
  13. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/storage.py +8 -0
  14. mempalace_code-1.6.2/mempalace/version.py +18 -0
  15. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/pyproject.toml +1 -1
  16. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_chunking.py +56 -0
  17. mempalace_code-1.6.2/tests/test_language_catalog.py +119 -0
  18. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_mcp_server.py +19 -0
  19. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_miner.py +68 -0
  20. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_searcher.py +21 -0
  21. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_watcher.py +8 -0
  22. mempalace_code-1.6.0/mempalace/version.py +0 -12
  23. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/prompts/codex-hardening-review.md +0 -0
  24. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/prompts/codex-plan-review.md +0 -0
  25. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/settings.json +0 -0
  26. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/_shared/commit-checkpoint.md +0 -0
  27. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/_shared/mode-classification.md +0 -0
  28. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/_shared/task-state.md +0 -0
  29. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/bench/SKILL.md +0 -0
  30. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/doc-refresh/INSTRUCTIONS.md +0 -0
  31. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/doc-refresh/SKILL.md +0 -0
  32. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/entropy-gc/INSTRUCTIONS.md +0 -0
  33. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/entropy-gc/SKILL.md +0 -0
  34. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/mine/SKILL.md +0 -0
  35. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/palace-health/SKILL.md +0 -0
  36. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/release/SKILL.md +0 -0
  37. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/release-prep/SKILL.md +0 -0
  38. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/ship/INSTRUCTIONS.md +0 -0
  39. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/ship/SKILL.md +0 -0
  40. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/start/INSTRUCTIONS.md +0 -0
  41. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/start/SKILL.md +0 -0
  42. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/status/SKILL.md +0 -0
  43. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/task-hardening/INSTRUCTIONS.md +0 -0
  44. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/task-hardening/SKILL.md +0 -0
  45. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/task-plan/INSTRUCTIONS.md +0 -0
  46. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/task-plan/SKILL.md +0 -0
  47. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/verify/INSTRUCTIONS.md +0 -0
  48. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/verify/SKILL.md +0 -0
  49. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.github/ISSUE_TEMPLATE/bug_report.md +0 -0
  50. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.github/ISSUE_TEMPLATE/feature_request.md +0 -0
  51. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.github/PULL_REQUEST_TEMPLATE.md +0 -0
  52. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.github/workflows/ci.yml +0 -0
  53. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.github/workflows/publish.yml +0 -0
  54. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.gitignore +0 -0
  55. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.pre-commit-config.yaml +0 -0
  56. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.verify-state +0 -0
  57. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/AGENTS.md +0 -0
  58. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/CLAUDE.md +0 -0
  59. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/CONTRIBUTING.md +0 -0
  60. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/LICENSE +0 -0
  61. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/NOTICE +0 -0
  62. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/assets/mempalace_banner.jpg +0 -0
  63. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/BENCHMARKS.md +0 -0
  64. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/HYBRID_MODE.md +0 -0
  65. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/README.md +0 -0
  66. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/code_retrieval_bench.py +0 -0
  67. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/convomem_bench.py +0 -0
  68. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/data/code_retrieval_queries.json +0 -0
  69. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/dotnet_bench.py +0 -0
  70. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/embed_ab_bench.py +0 -0
  71. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/locomo_bench.py +0 -0
  72. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/longmemeval_bench.py +0 -0
  73. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/membench_bench.py +0 -0
  74. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/results_embed_ab_2026-04-09.json +0 -0
  75. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/results_token_delta_mempalace.json +0 -0
  76. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/results_token_delta_wh40k.json +0 -0
  77. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/token_delta_bench.py +0 -0
  78. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/AGENT_INSTALL.md +0 -0
  79. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/BACKUP_RESTORE.md +0 -0
  80. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/BENCH_TOKEN_DELTA.md +0 -0
  81. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/LLM_USAGE_RULES.md +0 -0
  82. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/OFFLINE_USAGE.md +0 -0
  83. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/UPSTREAM_HARDENING.md +0 -0
  84. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/WHY_THIS_FORK.md +0 -0
  85. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/examples/HOOKS_TUTORIAL.md +0 -0
  86. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/examples/basic_mining.py +0 -0
  87. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/examples/convo_import.py +0 -0
  88. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/examples/gemini_cli_setup.md +0 -0
  89. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/examples/mcp_setup.md +0 -0
  90. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/hooks/README.md +0 -0
  91. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/hooks/mempal_precompact_hook.sh +0 -0
  92. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/hooks/mempal_save_hook.sh +0 -0
  93. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/__init__.py +0 -0
  94. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/__main__.py +0 -0
  95. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/_chroma_store.py +0 -0
  96. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/backup.py +0 -0
  97. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/cli.py +0 -0
  98. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/config.py +0 -0
  99. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/convo_miner.py +0 -0
  100. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/dialect.py +0 -0
  101. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/entity_detector.py +0 -0
  102. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/entity_registry.py +0 -0
  103. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/export.py +0 -0
  104. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/general_extractor.py +0 -0
  105. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/knowledge_graph.py +0 -0
  106. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/layers.py +0 -0
  107. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/migrate.py +0 -0
  108. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/normalize.py +0 -0
  109. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/onboarding.py +0 -0
  110. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/palace_graph.py +0 -0
  111. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/py.typed +0 -0
  112. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/room_detector_local.py +0 -0
  113. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/spellcheck.py +0 -0
  114. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/split_mega_files.py +0 -0
  115. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/treesitter.py +0 -0
  116. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/watcher.py +0 -0
  117. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/scripts/bootstrap.sh +0 -0
  118. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/scripts/codex-review.sh +0 -0
  119. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/scripts/nuke_wing.py +0 -0
  120. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/conftest.py +0 -0
  121. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_backup.py +0 -0
  122. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_chroma_compat.py +0 -0
  123. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_cli.py +0 -0
  124. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_code_retrieval_bench.py +0 -0
  125. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_config.py +0 -0
  126. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_convo_miner.py +0 -0
  127. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_dialect.py +0 -0
  128. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_dotnet_config.py +0 -0
  129. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_e2e.py +0 -0
  130. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_embed_ab_bench.py +0 -0
  131. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_entity_detector.py +0 -0
  132. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_export.py +0 -0
  133. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_general_extractor.py +0 -0
  134. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_kg_extract.py +0 -0
  135. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_knowledge_graph.py +0 -0
  136. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_lang_detect.py +0 -0
  137. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_migrate.py +0 -0
  138. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_normalize.py +0 -0
  139. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_offline.py +0 -0
  140. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_split_mega_files.py +0 -0
  141. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_storage.py +0 -0
  142. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_storage_lance.py +0 -0
  143. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_symbol_extract.py +0 -0
  144. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_treesitter.py +0 -0
  145. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_version_consistency.py +0 -0
  146. {mempalace_code-1.6.0 → mempalace_code-1.6.2}/uv.lock +0 -0
@@ -1,5 +1,28 @@
  # Changelog
 
+ ## v1.6.2 — 2026-05-01
+
+ ### Added
+
+ - Shared language catalog for miner detection, `code_search` validation, and MCP language hints.
+ - `code_search(language=...)` now accepts Kotlin, XML project files, and Perl shebang-detected files, matching mined language labels from the catalog.
+
+ ### Changed
+
+ - The `mempalace_code_search` MCP language description is generated from the same catalog used by search validation, reducing future drift when language support changes.
+ - PR #4's scan-exclude proposal is split into backlog item `MINE-APP-SCAN-EXCLUDES-PR4` instead of being merged with the catalog refactor.
+
+ ## v1.6.1 — 2026-04-30
+
+ ### Added
+
+ - Markdown section metadata in mined drawers: heading, heading level, heading path, document section type, and flags for Mermaid diagrams, fenced code blocks, and Markdown tables.
+ - `search_memories` now returns Markdown section context with each result when available.
+
+ ### Changed
+
+ - Markdown prose chunking treats `#` through `######` headings as section boundaries and preserves section metadata through small-section merges and oversized-section splits.
+
  ## v1.6.0 — 2026-04-27
 
  ### Added
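The v1.6.2 changelog entries describe a single catalog that feeds miner detection, `code_search` validation, and the MCP language hint. The sketch below illustrates that idea only; it is not the contents of `mempalace/language_catalog.py`. Every name in it (`LanguageEntry`, `CATALOG`, `detect_language`, `validate_language`, `language_hint`) and every extension mapping is a hypothetical stand-in.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional


@dataclass(frozen=True)
class LanguageEntry:
    """One catalog row: a language label plus the hints used to detect it."""
    label: str
    extensions: tuple = ()
    shebang_markers: tuple = ()


# Hypothetical catalog contents, for illustration only.
CATALOG = (
    LanguageEntry("python", extensions=(".py",)),
    LanguageEntry("kotlin", extensions=(".kt", ".kts")),
    LanguageEntry("xml", extensions=(".xml", ".csproj")),
    LanguageEntry("perl", extensions=(".pl",), shebang_markers=("perl",)),
)


def detect_language(path: str, first_line: str = "") -> Optional[str]:
    """Miner side: map a file to a label by extension, then by shebang."""
    suffix = PurePath(path).suffix.lower()
    for entry in CATALOG:
        if suffix and suffix in entry.extensions:
            return entry.label
    if first_line.startswith("#!"):
        for entry in CATALOG:
            if any(marker in first_line for marker in entry.shebang_markers):
                return entry.label
    return None


def supported_languages() -> list:
    """Single source of truth for every consumer of the catalog."""
    return sorted(entry.label for entry in CATALOG)


def language_hint() -> str:
    """Schema side: the description text is derived, never hand-written."""
    return "Supported languages: " + ", ".join(supported_languages())


def validate_language(language: str) -> str:
    """Search side: accept exactly the labels the miner can produce."""
    normalized = language.strip().lower()
    if normalized not in supported_languages():
        raise ValueError(f"Unsupported language {language!r}. " + language_hint())
    return normalized
```

Because detection, validation, and the hint all read the same tuple, adding a row is the only change needed to support a new label in all three places, which is the drift reduction the changelog refers to.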
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: mempalace-code
- Version: 1.6.0
+ Version: 1.6.2
  Summary: Developer memory tool — mine codebases and conversations into a LanceDB-backed searchable palace. No API key required.
  Project-URL: Homepage, https://github.com/rergards/mempalace-code
  Project-URL: Repository, https://github.com/rergards/mempalace-code
@@ -69,14 +69,14 @@ No cloud. No API keys. No subscription. Nothing leaves your machine.
 
  <table>
  <tr>
- <td align="center"><strong>Tree-sitter AST Parsing</strong><br><sub>Chunks at function boundaries<br>not arbitrary line counts</sub></td>
+ <td align="center"><strong>Language-Aware Mining</strong><br><sub>AST, regex, and adaptive chunking<br>matched to each file type</sub></td>
  <td align="center"><strong>27 MCP Tools</strong><br><sub>Native Claude Code integration<br>search, store, traverse</sub></td>
  <td align="center"><strong>Temporal Knowledge Graph</strong><br><sub>Facts that change over time<br>with validity windows</sub></td>
  </tr>
  <tr>
  <td align="center"><strong>595x Token Savings</strong><br><sub>measured peak · median 80x<br><a href="docs/BENCH_TOKEN_DELTA.md">scales with project size</a></sub></td>
  <td align="center"><strong>Cross-Project Tunnels</strong><br><sub>Search <code>auth</code> in one project<br>find it everywhere</sub></td>
- <td align="center"><strong>1312 Tests · $0 Cost</strong><br><sub>Every feature acceptance-gated<br>fully offline after install</sub></td>
+ <td align="center"><strong>1322 Tests · $0 Cost</strong><br><sub>Every feature acceptance-gated<br>fully offline after install</sub></td>
  </tr>
  </table>
 
@@ -150,7 +150,7 @@ You write code. You make decisions. You debug things. Between sessions, all that
  mempalace-code **indexes it once** into a local vector store, then your AI finds it in milliseconds — using [595x fewer tokens](docs/BENCH_TOKEN_DELTA.md) than grep + read at measured peak (median 80x on a 19k-chunk project, and it keeps scaling). Think of it as `git log` for everything that *isn't* in the code: the *why*, the discussions, the dead ends, the decisions.
 
  **What gets indexed:**
- - Code files — functions, classes, modules (Python, TypeScript/JS, Go, Rust, C/C++, C#, F#, VB.NET, XAML, Java, Kotlin, Scala, Swift, Dart, PHP, Markdown) plus Kubernetes manifests
+ - Code files — structural chunks for Python, TypeScript/JS/TSX/JSX, Go, Rust, Java, Kotlin, C#, F#, VB.NET, XAML, Swift, PHP, Scala, Dart, Terraform/HCL, Markdown, and Kubernetes manifests; adaptive chunks for C/C++, Ruby, shell, SQL, HTML/CSS, JSON/YAML/TOML, CSV, Dockerfile, Make, templates, and config files
  - .NET solutions — `.sln`/`.csproj` project graphs, cross-project symbol relationships, interface implementations
  - Conversation exports — Claude, ChatGPT, Slack
  - Architecture notes, decisions, anything you store manually
@@ -163,14 +163,14 @@ mempalace-code **indexes it once** into a local vector store, then your AI finds
 
  ### Language-Aware Code Mining
 
- `mempalace mine` walks your source tree and chunks at **structural boundaries** functions, classes, methods not arbitrary line counts. Leading comments and docstrings stay attached to their declarations.
+ `mempalace mine` walks your source tree and chooses the best chunker for each file type: AST boundaries where optional tree-sitter grammars are available, regex structural boundaries for supported languages, YAML-aware Kubernetes resource splits, Markdown/prose sections, or adaptive line-count chunks for formats without reliable declarations. Leading comments and docstrings stay attached to declarations where structural chunking is active; Markdown drawers keep heading path, section type, and Mermaid/code/table flags in search metadata.
 
  | Language | Strategy | AST Support |
  |----------|----------|:-----------:|
- | Python | Functions, classes, methods, decorators | Tree-sitter |
- | TypeScript / JavaScript / TSX / JSX | Functions, classes, exports, imports | Tree-sitter |
- | Go | Functions, types, methods, interfaces | Tree-sitter |
- | Rust | Functions, structs, enums, traits, impls | Tree-sitter |
+ | Python | Functions, classes, methods, decorators | Optional tree-sitter |
+ | TypeScript / JavaScript / TSX / JSX | Functions, classes, exports, imports | Optional tree-sitter |
+ | Go | Functions, types, methods, interfaces | Optional tree-sitter |
+ | Rust | Functions, structs, enums, traits, impls | Optional tree-sitter |
  | Java | Classes, interfaces, methods, annotations | Regex |
  | Kotlin | Classes, objects, functions, extensions | Regex |
  | Scala | Classes, case classes, objects, traits, enums, functions, implicits, type aliases, generics | Regex |
@@ -180,12 +180,20 @@ mempalace-code **indexes it once** into a local vector store, then your AI finds
  | C# | Classes, interfaces, records, methods, properties | Regex |
  | F# / VB.NET | Modules, types, functions | Regex |
  | XAML | Controls, resources, code-behind linking | Regex |
- | C / C++ | Functions, structs, enums, classes | Regex |
- | Kubernetes manifests | Deployments, Services, ConfigMaps, Secrets, Ingresses, CRDs (indexed by kind/namespace/labels) | YAML-aware |
- | Markdown / plain text | Heading sections, paragraphs | — |
- | YAML / JSON / TOML | Adaptive line-count | — |
+ | Terraform / HCL | Terraform/HCL top-level blocks (`resource`, `module`, `variable`, `moved`, `import`, `check`, etc.) | Regex |
+ | Kubernetes manifests | Deployments, Services, ConfigMaps, Secrets, Ingresses, CRDs (indexed by kind/name) | YAML-aware |
+ | Markdown / plain text | Heading sections (`#`-`######`), heading paths, section metadata, paragraphs | — |
+ | C / C++ | Indexed and searchable with best-effort symbol metadata; chunked adaptively today | — |
+ | Ruby / shell / SQL | Indexed and searchable; chunked adaptively today | — |
+ | HTML / CSS / CSV | Indexed and searchable; chunked adaptively today | — |
+ | YAML / JSON / TOML | Adaptive line-count; Kubernetes YAML auto-detected separately | — |
+ | Dockerfile / Make / templates / config | Dockerfile, Containerfile, Makefile, GNUmakefile, Vagrantfile, Go templates, Jinja2, `.conf`, `.cfg`, `.ini` | — |
 
- Tree-sitter is optional (`pip install "mempalace-code[treesitter]"`). Without it, all languages fall back to regex boundary detection — still structural, just less precise.
+ The `mempalace_code_search` language filter is generated from the same language
+ catalog as the miner. If a file type is mined with a language label, the MCP
+ schema and unsupported-language hints stay aligned with that catalog.
+
+ Tree-sitter is optional (`pip install "mempalace-code[treesitter]"`). When a grammar is missing, Python, TypeScript/JavaScript/TSX/JSX, Go, and Rust fall back to regex structural chunking. Other recognized formats use their regex, YAML-aware, prose, or adaptive chunker as listed above.
 
  ```bash
  mempalace mine ~/projects/myapp # all supported file types
@@ -307,7 +315,7 @@ claude mcp add mempalace -- python -m mempalace.mcp_server
  | `mempalace_list_wings` | All wings with drawer counts |
  | `mempalace_list_rooms` | Rooms within a wing |
  | `mempalace_get_taxonomy` | Full wing → room → count tree |
- | `mempalace_search` | Semantic search with optional wing/room filters |
+ | `mempalace_search` | Semantic search with optional wing/room filters; Markdown hits include heading path and section metadata |
  | `mempalace_code_search` | Filter by language, symbol name/type, file glob |
  | `mempalace_file_context` | All indexed chunks for a source file, ordered by chunk_index |
  | `mempalace_check_duplicate` | Similarity check before filing (0.9 threshold) |
@@ -504,7 +512,7 @@ This is a code-first fork of [milla-jovovich/mempalace](https://github.com/milla
  | No backup, no recovery | `backup` / `restore` / `export` / `import` |
  | No incremental mining | Content-hash incremental: only changed files re-chunked |
  | No code-search | `code_search` — filter by language, symbol, glob |
- | Line-count chunking | Tree-sitter AST + regex structural chunking |
+ | Line-count chunking | Language-aware mining: tree-sitter AST for supported grammars, regex structural chunking, YAML-aware Kubernetes splits, prose sections, and adaptive chunks for configs/data |
 
  Full audit: [`docs/UPSTREAM_HARDENING.md`](docs/UPSTREAM_HARDENING.md).
 
@@ -657,9 +665,9 @@ python -m pytest tests/ -x -q # full suite, all local, no network
  Apache 2.0 — see [LICENSE](LICENSE) and [NOTICE](NOTICE).
 
  <!-- Link Definitions -->
- [version-shield]: https://img.shields.io/badge/version-1.6.0-4dc9f6?style=flat-square&labelColor=0a0e14
+ [version-shield]: https://img.shields.io/badge/version-1.6.2-4dc9f6?style=flat-square&labelColor=0a0e14
  [release-link]: https://github.com/rergards/mempalace-code/releases
- [python-shield]: https://img.shields.io/badge/python-3.9+-7dd8f8?style=flat-square&labelColor=0a0e14&logo=python&logoColor=7dd8f8
+ [python-shield]: https://img.shields.io/badge/python-3.11+-7dd8f8?style=flat-square&labelColor=0a0e14&logo=python&logoColor=7dd8f8
  [python-link]: https://www.python.org/
  [license-shield]: https://img.shields.io/badge/license-Apache_2.0-b0e8ff?style=flat-square&labelColor=0a0e14
  [license-link]: https://github.com/rergards/mempalace-code/blob/main/LICENSE
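The updated "Language-Aware Code Mining" text in the hunks above describes a per-file choice of chunker: AST when a tree-sitter grammar is present, regex for the other structurally supported languages, YAML-aware splits for Kubernetes manifests, prose sections for Markdown and plain text, and adaptive line-count chunks otherwise. A minimal sketch of that decision follows. The function name `choose_chunker`, its parameters, and the language labels are assumptions made for illustration and are not taken from `mempalace/miner.py`; the regex group lists only a few of the languages the table names.

```python
# Groupings follow the README table; the label spellings are hypothetical.
AST_CAPABLE = {"python", "typescript", "javascript", "go", "rust"}
REGEX_ONLY = {"java", "kotlin", "scala", "csharp", "terraform"}
PROSE = {"markdown", "text"}


def choose_chunker(
    language: str,
    grammar_available: bool = False,
    is_k8s_manifest: bool = False,
) -> str:
    """Return the name of the chunking strategy for one file."""
    if is_k8s_manifest:
        # Kubernetes YAML is detected separately from generic YAML.
        return "yaml-aware"
    if language in AST_CAPABLE:
        # Tree-sitter is optional; without a grammar these fall back to regex.
        return "ast" if grammar_available else "regex"
    if language in REGEX_ONLY:
        return "regex"
    if language in PROSE:
        return "prose"
    # Formats without reliable declarations get adaptive line-count chunks.
    return "adaptive"


for lang, grammar in [("python", True), ("python", False), ("kotlin", True), ("c", False)]:
    print(lang, grammar, "->", choose_chunker(lang, grammar_available=grammar))
```

Passing `grammar_available` as a parameter keeps the sketch independent of how the real code probes for installed grammars, which this diff excerpt does not show.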
@@ -22,14 +22,14 @@ No cloud. No API keys. No subscription. Nothing leaves your machine.
 
  <table>
  <tr>
- <td align="center"><strong>Tree-sitter AST Parsing</strong><br><sub>Chunks at function boundaries<br>not arbitrary line counts</sub></td>
+ <td align="center"><strong>Language-Aware Mining</strong><br><sub>AST, regex, and adaptive chunking<br>matched to each file type</sub></td>
  <td align="center"><strong>27 MCP Tools</strong><br><sub>Native Claude Code integration<br>search, store, traverse</sub></td>
  <td align="center"><strong>Temporal Knowledge Graph</strong><br><sub>Facts that change over time<br>with validity windows</sub></td>
  </tr>
  <tr>
  <td align="center"><strong>595x Token Savings</strong><br><sub>measured peak · median 80x<br><a href="docs/BENCH_TOKEN_DELTA.md">scales with project size</a></sub></td>
  <td align="center"><strong>Cross-Project Tunnels</strong><br><sub>Search <code>auth</code> in one project<br>find it everywhere</sub></td>
- <td align="center"><strong>1312 Tests · $0 Cost</strong><br><sub>Every feature acceptance-gated<br>fully offline after install</sub></td>
+ <td align="center"><strong>1322 Tests · $0 Cost</strong><br><sub>Every feature acceptance-gated<br>fully offline after install</sub></td>
  </tr>
  </table>
 
@@ -103,7 +103,7 @@ You write code. You make decisions. You debug things. Between sessions, all that
  mempalace-code **indexes it once** into a local vector store, then your AI finds it in milliseconds — using [595x fewer tokens](docs/BENCH_TOKEN_DELTA.md) than grep + read at measured peak (median 80x on a 19k-chunk project, and it keeps scaling). Think of it as `git log` for everything that *isn't* in the code: the *why*, the discussions, the dead ends, the decisions.
 
  **What gets indexed:**
- - Code files — functions, classes, modules (Python, TypeScript/JS, Go, Rust, C/C++, C#, F#, VB.NET, XAML, Java, Kotlin, Scala, Swift, Dart, PHP, Markdown) plus Kubernetes manifests
+ - Code files — structural chunks for Python, TypeScript/JS/TSX/JSX, Go, Rust, Java, Kotlin, C#, F#, VB.NET, XAML, Swift, PHP, Scala, Dart, Terraform/HCL, Markdown, and Kubernetes manifests; adaptive chunks for C/C++, Ruby, shell, SQL, HTML/CSS, JSON/YAML/TOML, CSV, Dockerfile, Make, templates, and config files
  - .NET solutions — `.sln`/`.csproj` project graphs, cross-project symbol relationships, interface implementations
  - Conversation exports — Claude, ChatGPT, Slack
  - Architecture notes, decisions, anything you store manually
@@ -116,14 +116,14 @@ mempalace-code **indexes it once** into a local vector store, then your AI finds
 
  ### Language-Aware Code Mining
 
- `mempalace mine` walks your source tree and chunks at **structural boundaries** functions, classes, methods not arbitrary line counts. Leading comments and docstrings stay attached to their declarations.
+ `mempalace mine` walks your source tree and chooses the best chunker for each file type: AST boundaries where optional tree-sitter grammars are available, regex structural boundaries for supported languages, YAML-aware Kubernetes resource splits, Markdown/prose sections, or adaptive line-count chunks for formats without reliable declarations. Leading comments and docstrings stay attached to declarations where structural chunking is active; Markdown drawers keep heading path, section type, and Mermaid/code/table flags in search metadata.
 
  | Language | Strategy | AST Support |
  |----------|----------|:-----------:|
- | Python | Functions, classes, methods, decorators | Tree-sitter |
- | TypeScript / JavaScript / TSX / JSX | Functions, classes, exports, imports | Tree-sitter |
- | Go | Functions, types, methods, interfaces | Tree-sitter |
- | Rust | Functions, structs, enums, traits, impls | Tree-sitter |
+ | Python | Functions, classes, methods, decorators | Optional tree-sitter |
+ | TypeScript / JavaScript / TSX / JSX | Functions, classes, exports, imports | Optional tree-sitter |
+ | Go | Functions, types, methods, interfaces | Optional tree-sitter |
+ | Rust | Functions, structs, enums, traits, impls | Optional tree-sitter |
  | Java | Classes, interfaces, methods, annotations | Regex |
  | Kotlin | Classes, objects, functions, extensions | Regex |
  | Scala | Classes, case classes, objects, traits, enums, functions, implicits, type aliases, generics | Regex |
@@ -133,12 +133,20 @@ mempalace-code **indexes it once** into a local vector store, then your AI finds
  | C# | Classes, interfaces, records, methods, properties | Regex |
  | F# / VB.NET | Modules, types, functions | Regex |
  | XAML | Controls, resources, code-behind linking | Regex |
- | C / C++ | Functions, structs, enums, classes | Regex |
- | Kubernetes manifests | Deployments, Services, ConfigMaps, Secrets, Ingresses, CRDs (indexed by kind/namespace/labels) | YAML-aware |
- | Markdown / plain text | Heading sections, paragraphs | — |
- | YAML / JSON / TOML | Adaptive line-count | — |
+ | Terraform / HCL | Terraform/HCL top-level blocks (`resource`, `module`, `variable`, `moved`, `import`, `check`, etc.) | Regex |
+ | Kubernetes manifests | Deployments, Services, ConfigMaps, Secrets, Ingresses, CRDs (indexed by kind/name) | YAML-aware |
+ | Markdown / plain text | Heading sections (`#`-`######`), heading paths, section metadata, paragraphs | — |
+ | C / C++ | Indexed and searchable with best-effort symbol metadata; chunked adaptively today | — |
+ | Ruby / shell / SQL | Indexed and searchable; chunked adaptively today | — |
+ | HTML / CSS / CSV | Indexed and searchable; chunked adaptively today | — |
+ | YAML / JSON / TOML | Adaptive line-count; Kubernetes YAML auto-detected separately | — |
+ | Dockerfile / Make / templates / config | Dockerfile, Containerfile, Makefile, GNUmakefile, Vagrantfile, Go templates, Jinja2, `.conf`, `.cfg`, `.ini` | — |
 
- Tree-sitter is optional (`pip install "mempalace-code[treesitter]"`). Without it, all languages fall back to regex boundary detection — still structural, just less precise.
+ The `mempalace_code_search` language filter is generated from the same language
+ catalog as the miner. If a file type is mined with a language label, the MCP
+ schema and unsupported-language hints stay aligned with that catalog.
+
+ Tree-sitter is optional (`pip install "mempalace-code[treesitter]"`). When a grammar is missing, Python, TypeScript/JavaScript/TSX/JSX, Go, and Rust fall back to regex structural chunking. Other recognized formats use their regex, YAML-aware, prose, or adaptive chunker as listed above.
 
  ```bash
  mempalace mine ~/projects/myapp # all supported file types
@@ -260,7 +268,7 @@ claude mcp add mempalace -- python -m mempalace.mcp_server
  | `mempalace_list_wings` | All wings with drawer counts |
  | `mempalace_list_rooms` | Rooms within a wing |
  | `mempalace_get_taxonomy` | Full wing → room → count tree |
- | `mempalace_search` | Semantic search with optional wing/room filters |
+ | `mempalace_search` | Semantic search with optional wing/room filters; Markdown hits include heading path and section metadata |
  | `mempalace_code_search` | Filter by language, symbol name/type, file glob |
  | `mempalace_file_context` | All indexed chunks for a source file, ordered by chunk_index |
  | `mempalace_check_duplicate` | Similarity check before filing (0.9 threshold) |
@@ -457,7 +465,7 @@ This is a code-first fork of [milla-jovovich/mempalace](https://github.com/milla
  | No backup, no recovery | `backup` / `restore` / `export` / `import` |
  | No incremental mining | Content-hash incremental: only changed files re-chunked |
  | No code-search | `code_search` — filter by language, symbol, glob |
- | Line-count chunking | Tree-sitter AST + regex structural chunking |
+ | Line-count chunking | Language-aware mining: tree-sitter AST for supported grammars, regex structural chunking, YAML-aware Kubernetes splits, prose sections, and adaptive chunks for configs/data |
 
  Full audit: [`docs/UPSTREAM_HARDENING.md`](docs/UPSTREAM_HARDENING.md).
 
@@ -610,9 +618,9 @@ python -m pytest tests/ -x -q # full suite, all local, no network
610
618
  Apache 2.0 — see [LICENSE](LICENSE) and [NOTICE](NOTICE).
611
619
 
612
620
  <!-- Link Definitions -->
613
- [version-shield]: https://img.shields.io/badge/version-1.6.0-4dc9f6?style=flat-square&labelColor=0a0e14
621
+ [version-shield]: https://img.shields.io/badge/version-1.6.2-4dc9f6?style=flat-square&labelColor=0a0e14
614
622
  [release-link]: https://github.com/rergards/mempalace-code/releases
615
- [python-shield]: https://img.shields.io/badge/python-3.9+-7dd8f8?style=flat-square&labelColor=0a0e14&logo=python&logoColor=7dd8f8
623
+ [python-shield]: https://img.shields.io/badge/python-3.11+-7dd8f8?style=flat-square&labelColor=0a0e14&logo=python&logoColor=7dd8f8
616
624
  [python-link]: https://www.python.org/
617
625
  [license-shield]: https://img.shields.io/badge/license-Apache_2.0-b0e8ff?style=flat-square&labelColor=0a0e14
618
626
  [license-link]: https://github.com/rergards/mempalace-code/blob/main/LICENSE
@@ -308,3 +308,8 @@ items:
308
308
  resolution: '2026-04-26: completed in pipeline run'
309
309
  archived_date: "2026-04-26"
310
310
  done_summary: completed in pipeline run
311
+ - key: CODE-LANGUAGE-CATALOG-FRESH-PR4
312
+ summary: 'Rebuild PR #4 language catalog on top of current mining/search code'
313
+ resolution: '2026-05-01: completed in pipeline run'
314
+ archived_date: "2026-05-01"
315
+ done_summary: completed in pipeline run
@@ -43,6 +43,36 @@ sections:
43
43
  # OPEN ITEMS
44
44
  # ============================================================
45
45
  items:
46
+ - key: MINE-APP-SCAN-EXCLUDES-PR4
47
+ summary: "Evaluate PR #4 app-level scan excludes as a separate mining feature"
48
+ type: task
49
+ priority: medium
50
+ status: open
51
+ size: M
52
+ section_id: coding_first
53
+ labels: [mining, watcher, config, pr4]
54
+ description: |-
55
+ ## Problem
56
+ PR #4 includes app-level scan exclusions (`scan_skip_dirs`, `scan_skip_files`,
57
+ `scan_skip_globs`) alongside its language-catalog refactor. The catalog part was
58
+ rebuilt on current main, but global scan excludes are a separate behavior change and
59
+ should not be merged accidentally through the catalog task.
60
+
61
+ ## Scope
62
+ Re-evaluate the PR #4 scan-exclude idea on current main and implement it only if the
63
+ behavior is still desirable. The feature should exclude noisy generated artifacts such
64
+ as `.kotlin-lsp/` and `workspace.json` consistently from `scan_project()` and watcher
65
+ relevance checks, while preserving explicit `--include-ignored` overrides.
66
+
67
+ ## Acceptance criteria
68
+ - Global scan-exclude config keys are documented and loaded from
69
+ `~/.mempalace/config.json`.
70
+ - `scan_project()` and watcher relevance filtering use the same app-level rules.
71
+ - Explicit `--include-ignored` paths override app-level excludes.
72
+ - Defaults are conservative and do not hide common source files.
73
+ - Tests cover config loading, miner scanning, watcher filtering, and override behavior.
74
+ - README/docs describe how to remove previously indexed noise by re-mining.
75
+ resolution: null
46
76
  - key: CLEAN-ONBOARDING
47
77
  summary: "Replace interactive onboarding with config-file-first setup"
48
78
  type: task
@@ -19,7 +19,7 @@ If you want to answer "what did we decide about auth last quarter?" or "find the
19
19
  | Dimension | Graphify | mempalace-code |
20
20
  |-----------|----------|-----------|
21
21
  | Core data structure | NetworkX MultiDiGraph | LanceDB columnar vector store + SQLite KG |
22
- | Code understanding | tree-sitter AST, 20 languages | regex-based structural chunking (def/class/export) + language detection |
22
+ | Code understanding | tree-sitter AST, 20 languages | language-aware mining: optional tree-sitter chunks for Python/JS/TS/TSX/JSX/Go/Rust, regex structural chunks for supported languages, YAML-aware Kubernetes, adaptive config/prose chunks |
23
23
  | Semantic layer | Claude subagent extracts concepts into graph nodes | `all-MiniLM-L6-v2` embeddings (384d, local) |
24
24
  | Graph clustering | **Leiden community detection** (produces "god nodes" + clusters) | none — query-time ranked retrieval only |
25
25
  | Search primitive | graph traversal, BFS with hop limits | cosine distance over vectors, filtered by wing/room |
@@ -28,7 +28,7 @@ If you want to answer "what did we decide about auth last quarter?" or "find the
28
28
  | Conversation mining | none | `convo_miner.py` ingests Claude/ChatGPT/Slack exports |
29
29
  | Multimodal | **PDFs, images, videos, YouTube links** (via host LLM API) | text only |
30
30
  | Visualization | **interactive HTML graph** (pyvis) | none |
31
- | Incremental rebuild | **SHA256 file-level cache** | not yet (planned: CODE-INCREMENTAL) |
31
+ | Incremental rebuild | **SHA256 file-level cache** | content-hash incremental mining; only changed files are re-chunked |
32
32
  | Privacy on ingest | code stays local; **docs/PDFs/images sent to host LLM API** | **nothing leaves the host, ever** (fully offline) |
33
33
  | Embedding dependency | none | 80 MB `all-MiniLM-L6-v2` model downloaded once |
34
34
  | MCP surface | `/graphify query`, `/graphify path`, `/graphify explain` | 27 MCP tools (search, traverse, diary, KG, arch-retrieval, stats, …) |
@@ -85,11 +85,11 @@ Graphify is per-project — each repo has its own `graphify-out/` directory and
85
85
 
86
86
  mempalace-code has no visualization layer. Vector spaces do not visualize well; graph structures do.
87
87
 
88
- ### 2. Tree-sitter AST, 20 languages
88
+ ### 2. Full AST graph precision across more languages
89
89
 
90
90
  Graphify uses tree-sitter for parsing, covering 20 languages precisely. Function calls, imports, class references, and type usages are captured at AST fidelity.
91
91
 
92
- mempalace-code uses regex-based structural chunking. It handles Python, JS, TS, Go, and Rust reasonably well, but it is not an AST: it cannot track `foo()` → function definition of `foo` across files. Symbol metadata is per-chunk only, not cross-referenced.
92
+ mempalace-code uses tree-sitter for chunk boundaries when optional grammars are installed for Python, TypeScript/JavaScript/TSX/JSX, Go, and Rust. It also uses regex structural chunking for Java, Kotlin, .NET languages, XAML, Swift, PHP, Scala, Dart, and Terraform/HCL, YAML-aware splitting for Kubernetes manifests, and adaptive chunking for configs/data/prose. That is still not a call graph: it cannot track `foo()` → function definition of `foo` across files. Symbol metadata is per-chunk only, not cross-referenced.
93
93
 
94
94
  **Consequence**: for "find all call sites of this function" graphify is the right tool. mempalace-code will not answer that precisely.
95
95
 
@@ -105,11 +105,11 @@ For projects that include research papers, architecture diagrams as PNGs, or rec
105
105
 
106
106
  mempalace-code is text-only. No PDF parsing, no image captioning, no video transcription.
107
107
 
108
- ### 5. Shipped SHA256 incremental rebuild
108
+ ### 5. Incremental rebuild is no longer a Graphify-only win
109
109
 
110
110
  Graphify caches parsed AST by file SHA256. Re-running on an unchanged file is a cache hit; only changed files are re-processed.
111
111
 
112
- mempalace-code's incremental re-mine is on the pre_release backlog (`CODE-INCREMENTAL`) but not yet shipped. Today, `mempalace mine` against a large repo is full-rebuild.
112
+ mempalace-code now also mines incrementally by content hash: unchanged drawers are skipped and only changed files are re-chunked unless `--full` is passed. Graphify still wins on full structural graph analysis, but the basic "do not rebuild every unchanged file" capability is now table stakes for both tools.
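A minimal sketch of content-hash incremental mining as described above. It assumes SHA-256 as the hash and an in-memory mapping of stored hashes; the function names and the `full` parameter (mirroring `--full`) are hypothetical, not the package's API.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash file content (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(data).hexdigest()


def files_to_remine(
    files: dict[str, bytes],
    stored_hashes: dict[str, str],
    full: bool = False,
) -> list[str]:
    """Return the paths whose content changed since the last mine."""
    if full:
        # A full rebuild ignores stored hashes entirely.
        return sorted(files)
    return sorted(
        path
        for path, data in files.items()
        if stored_hashes.get(path) != content_hash(data)
    )
```

New files have no stored hash, so they are always selected; unchanged files are skipped.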
113
113
 
114
114
  ### 6. 10-platform reach via installer
115
115
 
@@ -175,12 +175,12 @@ These are genuinely good ideas from graphify that mempalace can incorporate with
175
175
 
176
176
  | Idea | Cost | Value | Status |
177
177
  |------|------|-------|--------|
178
- | **SHA256 file cache for incremental re-mine** | M | high | Already in pre_release as `CODE-INCREMENTAL` |
178
+ | **Broader AST coverage / call graph extraction** | L | high | Post-launch candidate; current tree-sitter support is chunk-boundary only for Python/JS/TS/TSX/JSX/Go/Rust |
179
179
  | **Explicit per-edge / per-drawer provenance label** | S | medium | New (not in backlog yet) — e.g. `confidence`, `extractor_version` |
180
180
  | **`benchmarks/TOKEN_DELTA.md` with one public number** | S | high | Filed as `LAUNCH-BENCH-TOKEN-DELTA` (owner task) |
181
181
  | **Minimal static HTML visualization** of palace structure (wings × rooms × drawer counts) | M | medium | New candidate for post-launch |
182
182
  | **Per-platform installer** (`mempalace install --platform codex\|cursor\|gemini`) | L | low | Not urgent — Claude Code + Codex both have native MCP; per-platform hooks are maintenance burden |
183
- | **Tree-sitter backend for structural chunking** | L | medium | Not urgent — current regex chunker scores R@5 = 0.95 on the internal bench |
183
+ | **Tree-sitter grammars beyond Python/JS/TS/Go/Rust** | M | medium | Not urgent — current regex/adaptive chunkers cover the launch languages, but not full AST semantics |
184
184
 
185
185
  Note: the always-on PreToolUse hook is intentionally absent from this list. See the preceding section for why.
186
186
 
@@ -194,7 +194,7 @@ Note: the always-on PreToolUse hook is intentionally absent from this list. See
194
194
  - crash-safe LanceDB (survives `Ctrl+C`)
195
195
 
196
196
  **Do not claim**:
197
- - AST precision — mempalace uses regex chunking
197
+ - full AST/code-graph precision — mempalace uses AST chunk boundaries for a subset, regex structural chunks for many languages, and adaptive chunks for configs/data, but does not build call graphs
198
198
  - multimodal ingest — mempalace is text-only
199
199
  - visualization — mempalace has none
200
200
  - community detection — different problem, different algorithm, not mempalace's game
@@ -4,7 +4,7 @@ mempalace-code does **semantic vector search** — it finds content by *meaning*
4
4
 
5
5
  ## The Algorithm in 5 Steps
6
6
 
7
- 1. **During mining** (`mempalace mine`), every source file is split into chunks. Each chunk is passed through the `all-MiniLM-L6-v2` model, which converts the text into a **384-dimensional vector** — a numeric fingerprint of its meaning. The vector is stored in LanceDB alongside metadata (`wing`, `room`, `source_file`, `language`, `symbol_name`, `symbol_type`).
7
+ 1. **During mining** (`mempalace mine`), every source file is split into chunks. Each chunk is passed through the `all-MiniLM-L6-v2` model, which converts the text into a **384-dimensional vector** — a numeric fingerprint of its meaning. The vector is stored in LanceDB alongside metadata (`wing`, `room`, `source_file`, `language`, `symbol_name`, `symbol_type`). Markdown drawers also store section metadata (`heading`, `heading_level`, `heading_path`, `doc_section_type`) and flags for Mermaid diagrams, fenced code blocks, and tables.
8
8
 
9
9
  2. **At query time**, the query string (e.g. `"detect language file extension"`) goes through the same model and produces another 384-dimensional vector in the same semantic space.
10
10
 
@@ -12,7 +12,7 @@ mempalace-code does **semantic vector search** — it finds content by *meaning*
12
12
 
13
13
  4. **Optional `wing` / `room` filters** are applied as standard SQL `WHERE` predicates. LanceDB decides whether to pre-filter before the vector search or post-filter after it.
14
14
 
15
- 5. **Top-N results are returned** with a `similarity = 1 - distance` score (1.0 = perfect match, 0.0 = unrelated).
15
+ 5. **Top-N results are returned** with a `similarity = 1 - distance` score (1.0 = perfect match, 0.0 = unrelated). Programmatic search returns the stored metadata with each hit so agents can cite the file, symbol, language, and Markdown section path when available.
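The scoring in step 5 can be illustrated with a small pure-Python sketch. It assumes cosine distance and uses exact brute-force ranking rather than LanceDB's index; `cosine_distance` and `top_n` are hypothetical helper names.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_n(query: list[float], rows: list[tuple[str, list[float]]], n: int = 5):
    """Rank rows by similarity = 1 - distance and keep the best n."""
    scored = [(label, 1.0 - cosine_distance(query, vec)) for label, vec in rows]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]
```

A vector pointing the same way as the query scores 1.0, and an orthogonal one scores 0.0, which matches the scale quoted in step 5.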
16
16
 
17
17
  ## ASCII Diagram
18
18
 
@@ -62,9 +62,10 @@ mempalace-code does **semantic vector search** — it finds content by *meaning*
62
62
  │ source: miner.py sim: 0.396 │
63
63
  │ def detect_language(path): ... │
64
64
  │ │
65
- │ [2] mempalace / miner
66
- │ source: miner.py sim: 0.351
67
- EXTENSION_LANG_MAP = { ... }
65
+ │ [2] mempalace / language_catalog
66
+ │ source: language_catalog.py
67
+ sim: 0.351
68
+ │ _EXTENSION_LANG_MAP = { ... } │
68
69
  │ │
69
70
  │ [3] ... │
70
71
  └──────────────────────────────────────┘
@@ -78,9 +79,11 @@ mempalace-code does **semantic vector search** — it finds content by *meaning*
78
79
  - **The ANN index is approximate.** LanceDB uses IVF-PQ, which trades a tiny amount of recall for a massive speedup. On palaces with ~20k rows, the difference between the ANN search and exact brute force is negligible.
79
80
  - **Similarity is not a probability.** A score of 0.396 does not mean "40% match". Scores are only comparable *within the same query* — 0.4 beats 0.3 for the same query, but a 0.4 on one query and a 0.4 on another are not the same thing.
80
81
  - **`wing` / `room` filters are cheap.** They are plain columns in LanceDB, evaluated as SQL predicates.
82
+ - **Language filters share the miner catalog.** `code_search(language=...)` validates against the same language labels the miner emits, and the MCP schema hint is generated from that catalog.
83
+ - **Markdown location survives retrieval.** For `.md` files, `search_memories()` results include `heading`, `heading_level`, `heading_path`, `doc_section_type`, `contains_mermaid`, `contains_code`, and `contains_table` when the drawer came from a headed section.
81
84
 
82
85
  ## Where the Code Lives
83
86
 
84
- - `mempalace/searcher.py:21-90` — high-level `search()` and `search_memories()` functions.
87
+ - `mempalace/searcher.py` — high-level `search()` and `search_memories()` functions.
85
88
  - `mempalace/storage.py` — `LanceStore.query()`, which owns the embedding model, the LanceDB handle, and the actual vector search call.
86
89
  - `mempalace/miner.py` — smart chunker, language detection, symbol extraction, and the batch embedding loop used during `mempalace mine`.
@@ -6,14 +6,15 @@ The Python package that powers mempalace-code. All modules, all logic.
6
6
 
7
7
  | Module | What it does |
8
8
  |--------|-------------|
9
- | `cli.py` | CLI entry point — routes to mine, search, init, compress, wake-up |
9
+ | `cli.py` | CLI entry point — routes to init, mine, search, watch, backup/restore, export/import, health, and wake-up |
10
10
  | `config.py` | Configuration loading — `~/.mempalace/config.json`, env vars, defaults |
11
+ | `language_catalog.py` | Shared language metadata for miner detection, `code_search` validation, and MCP language hints |
11
12
  | `normalize.py` | Converts 5 chat formats (Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack JSON, plain text) to standard transcript format |
12
- | `miner.py` | Project file ingest — scans directories, chunks by paragraph, stores to ChromaDB |
13
+ | `miner.py` | Project file ingest — scans directories, detects languages, chunks code/prose/config, stores drawers; Markdown chunks keep heading path and section metadata |
13
14
  | `convo_miner.py` | Conversation ingest — chunks by exchange pair (Q+A), detects rooms from content |
14
- | `searcher.py` | Semantic search via ChromaDB vectors — filters by wing/room, returns verbatim + scores |
15
+ | `searcher.py` | Semantic search via LanceDB vectors — filters by wing/room/language/symbol, returns verbatim text, scores, and stored metadata such as Markdown heading path |
15
16
  | `layers.py` | 4-layer memory stack: L0 (identity), L1 (critical facts), L2 (room recall), L3 (deep search) |
16
- | `dialect.py` | AAAK compression — entity codes, emotion markers, 30x lossless ratio |
17
+ | `dialect.py` | AAAK lossy summary dialect — entity codes, topic markers, and token-saving estimates |
17
18
  | `knowledge_graph.py` | Temporal entity-relationship graph — SQLite, time-filtered queries, fact invalidation |
18
19
  | `palace_graph.py` | Room-based navigation graph — BFS traversal, tunnel detection across wings |
19
20
  | `mcp_server.py` | MCP server — 27 tools, AAAK auto-teach, Palace Protocol, agent diary |
@@ -28,7 +29,7 @@ The Python package that powers mempalace-code. All modules, all logic.
28
29
  ## Architecture
29
30
 
30
31
  ```
31
- User → CLI → miner/convo_miner → ChromaDB (palace)
32
+ User → CLI → miner/convo_miner → LanceDB (palace)
32
33
 
33
34
  knowledge_graph (SQLite)
34
35
 
@@ -37,4 +38,4 @@ User → MCP Server → searcher → results
37
38
  → diary → agent journal
38
39
  ```
39
40
 
40
- The palace (ChromaDB) stores verbatim content. The knowledge graph (SQLite) stores structured relationships. The MCP server exposes both to any AI tool.
41
+ The palace (LanceDB) stores verbatim drawer content and vector metadata. The knowledge graph (SQLite) stores structured relationships. The MCP server exposes both to any AI tool. ChromaDB is a deprecated optional legacy backend only.