mempalace-code 1.6.0__tar.gz → 1.6.2__tar.gz
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/CHANGELOG.md +23 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/PKG-INFO +26 -18
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/README.md +25 -17
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/BACKLOG-archived.yaml +5 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/BACKLOG.yaml +30 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/COMPARISON_GRAPHIFY.md +9 -9
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/HOW_SEARCH_WORKS.md +9 -6
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/README.md +7 -6
- mempalace_code-1.6.2/mempalace/language_catalog.py +251 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/mcp_server.py +2 -7
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/miner.py +181 -143
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/searcher.py +9 -51
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/storage.py +8 -0
- mempalace_code-1.6.2/mempalace/version.py +18 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/pyproject.toml +1 -1
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_chunking.py +56 -0
- mempalace_code-1.6.2/tests/test_language_catalog.py +119 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_mcp_server.py +19 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_miner.py +68 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_searcher.py +21 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_watcher.py +8 -0
- mempalace_code-1.6.0/mempalace/version.py +0 -12
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/prompts/codex-hardening-review.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/prompts/codex-plan-review.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/settings.json +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/_shared/commit-checkpoint.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/_shared/mode-classification.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/_shared/task-state.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/bench/SKILL.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/doc-refresh/INSTRUCTIONS.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/doc-refresh/SKILL.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/entropy-gc/INSTRUCTIONS.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/entropy-gc/SKILL.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/mine/SKILL.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/palace-health/SKILL.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/release/SKILL.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/release-prep/SKILL.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/ship/INSTRUCTIONS.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/ship/SKILL.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/start/INSTRUCTIONS.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/start/SKILL.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/status/SKILL.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/task-hardening/INSTRUCTIONS.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/task-hardening/SKILL.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/task-plan/INSTRUCTIONS.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/task-plan/SKILL.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/verify/INSTRUCTIONS.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.claude/skills/verify/SKILL.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.github/ISSUE_TEMPLATE/bug_report.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.github/ISSUE_TEMPLATE/feature_request.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.github/PULL_REQUEST_TEMPLATE.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.github/workflows/ci.yml +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.github/workflows/publish.yml +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.gitignore +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.pre-commit-config.yaml +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/.verify-state +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/AGENTS.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/CLAUDE.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/CONTRIBUTING.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/LICENSE +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/NOTICE +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/assets/mempalace_banner.jpg +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/BENCHMARKS.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/HYBRID_MODE.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/README.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/code_retrieval_bench.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/convomem_bench.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/data/code_retrieval_queries.json +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/dotnet_bench.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/embed_ab_bench.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/locomo_bench.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/longmemeval_bench.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/membench_bench.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/results_embed_ab_2026-04-09.json +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/results_token_delta_mempalace.json +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/results_token_delta_wh40k.json +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/benchmarks/token_delta_bench.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/AGENT_INSTALL.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/BACKUP_RESTORE.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/BENCH_TOKEN_DELTA.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/LLM_USAGE_RULES.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/OFFLINE_USAGE.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/UPSTREAM_HARDENING.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/WHY_THIS_FORK.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/examples/HOOKS_TUTORIAL.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/examples/basic_mining.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/examples/convo_import.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/examples/gemini_cli_setup.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/examples/mcp_setup.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/hooks/README.md +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/hooks/mempal_precompact_hook.sh +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/hooks/mempal_save_hook.sh +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/__init__.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/__main__.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/_chroma_store.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/backup.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/cli.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/config.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/convo_miner.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/dialect.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/entity_detector.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/entity_registry.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/export.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/general_extractor.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/knowledge_graph.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/layers.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/migrate.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/normalize.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/onboarding.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/palace_graph.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/py.typed +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/room_detector_local.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/spellcheck.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/split_mega_files.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/treesitter.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/mempalace/watcher.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/scripts/bootstrap.sh +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/scripts/codex-review.sh +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/scripts/nuke_wing.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/conftest.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_backup.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_chroma_compat.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_cli.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_code_retrieval_bench.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_config.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_convo_miner.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_dialect.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_dotnet_config.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_e2e.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_embed_ab_bench.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_entity_detector.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_export.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_general_extractor.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_kg_extract.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_knowledge_graph.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_lang_detect.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_migrate.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_normalize.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_offline.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_split_mega_files.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_storage.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_storage_lance.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_symbol_extract.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_treesitter.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/tests/test_version_consistency.py +0 -0
- {mempalace_code-1.6.0 → mempalace_code-1.6.2}/uv.lock +0 -0
{mempalace_code-1.6.0 → mempalace_code-1.6.2}/CHANGELOG.md
@@ -1,5 +1,28 @@
 # Changelog
 
+## v1.6.2 — 2026-05-01
+
+### Added
+
+- Shared language catalog for miner detection, `code_search` validation, and MCP language hints.
+- `code_search(language=...)` now accepts Kotlin, XML project files, and Perl shebang-detected files, matching mined language labels from the catalog.
+
+### Changed
+
+- The `mempalace_code_search` MCP language description is generated from the same catalog used by search validation, reducing future drift when language support changes.
+- PR #4's scan-exclude proposal is split into backlog item `MINE-APP-SCAN-EXCLUDES-PR4` instead of being merged with the catalog refactor.
+
+## v1.6.1 — 2026-04-30
+
+### Added
+
+- Markdown section metadata in mined drawers: heading, heading level, heading path, document section type, and flags for Mermaid diagrams, fenced code blocks, and Markdown tables.
+- `search_memories` now returns Markdown section context with each result when available.
+
+### Changed
+
+- Markdown prose chunking treats `#` through `######` headings as section boundaries and preserves section metadata through small-section merges and oversized-section splits.
+
 ## v1.6.0 — 2026-04-27
 
 ### Added
{mempalace_code-1.6.0 → mempalace_code-1.6.2}/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: mempalace-code
-Version: 1.6.
+Version: 1.6.2
 Summary: Developer memory tool — mine codebases and conversations into a LanceDB-backed searchable palace. No API key required.
 Project-URL: Homepage, https://github.com/rergards/mempalace-code
 Project-URL: Repository, https://github.com/rergards/mempalace-code
@@ -69,14 +69,14 @@ No cloud. No API keys. No subscription. Nothing leaves your machine.
 
 <table>
 <tr>
-<td align="center"><strong>
+<td align="center"><strong>Language-Aware Mining</strong><br><sub>AST, regex, and adaptive chunking<br>matched to each file type</sub></td>
 <td align="center"><strong>27 MCP Tools</strong><br><sub>Native Claude Code integration<br>search, store, traverse</sub></td>
 <td align="center"><strong>Temporal Knowledge Graph</strong><br><sub>Facts that change over time<br>with validity windows</sub></td>
 </tr>
 <tr>
 <td align="center"><strong>595x Token Savings</strong><br><sub>measured peak · median 80x<br><a href="docs/BENCH_TOKEN_DELTA.md">scales with project size</a></sub></td>
 <td align="center"><strong>Cross-Project Tunnels</strong><br><sub>Search <code>auth</code> in one project<br>find it everywhere</sub></td>
-<td align="center"><strong>
+<td align="center"><strong>1322 Tests · $0 Cost</strong><br><sub>Every feature acceptance-gated<br>fully offline after install</sub></td>
 </tr>
 </table>
 
@@ -150,7 +150,7 @@ You write code. You make decisions. You debug things. Between sessions, all that
 mempalace-code **indexes it once** into a local vector store, then your AI finds it in milliseconds — using [595x fewer tokens](docs/BENCH_TOKEN_DELTA.md) than grep + read at measured peak (median 80x on a 19k-chunk project, and it keeps scaling). Think of it as `git log` for everything that *isn't* in the code: the *why*, the discussions, the dead ends, the decisions.
 
 **What gets indexed:**
-- Code files —
+- Code files — structural chunks for Python, TypeScript/JS/TSX/JSX, Go, Rust, Java, Kotlin, C#, F#, VB.NET, XAML, Swift, PHP, Scala, Dart, Terraform/HCL, Markdown, and Kubernetes manifests; adaptive chunks for C/C++, Ruby, shell, SQL, HTML/CSS, JSON/YAML/TOML, CSV, Dockerfile, Make, templates, and config files
 - .NET solutions — `.sln`/`.csproj` project graphs, cross-project symbol relationships, interface implementations
 - Conversation exports — Claude, ChatGPT, Slack
 - Architecture notes, decisions, anything you store manually
@@ -163,14 +163,14 @@ mempalace-code **indexes it once** into a local vector store, then your AI finds
 
 ### Language-Aware Code Mining
 
-`mempalace mine` walks your source tree and
+`mempalace mine` walks your source tree and chooses the best chunker for each file type: AST boundaries where optional tree-sitter grammars are available, regex structural boundaries for supported languages, YAML-aware Kubernetes resource splits, Markdown/prose sections, or adaptive line-count chunks for formats without reliable declarations. Leading comments and docstrings stay attached to declarations where structural chunking is active; Markdown drawers keep heading path, section type, and Mermaid/code/table flags in search metadata.
 
 | Language | Strategy | AST Support |
 |----------|----------|:-----------:|
-| Python | Functions, classes, methods, decorators |
-| TypeScript / JavaScript / TSX / JSX | Functions, classes, exports, imports |
-| Go | Functions, types, methods, interfaces |
-| Rust | Functions, structs, enums, traits, impls |
+| Python | Functions, classes, methods, decorators | Optional tree-sitter |
+| TypeScript / JavaScript / TSX / JSX | Functions, classes, exports, imports | Optional tree-sitter |
+| Go | Functions, types, methods, interfaces | Optional tree-sitter |
+| Rust | Functions, structs, enums, traits, impls | Optional tree-sitter |
 | Java | Classes, interfaces, methods, annotations | Regex |
 | Kotlin | Classes, objects, functions, extensions | Regex |
 | Scala | Classes, case classes, objects, traits, enums, functions, implicits, type aliases, generics | Regex |
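The hunk above says Markdown drawers keep heading path and section flags, and the changelog states that `#` through `######` headings act as section boundaries. The sketch below illustrates that kind of heading-based splitting under simplifying assumptions: it is not the code in `miner.py`, the function name and dictionary keys are hypothetical, fence tracking is a plain toggle, and the small-section merges and oversized-section splits mentioned in the changelog are left out.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
# Built with repetition so this example never contains a literal fence line.
FENCE_MARKERS = ("`" * 3, "~" * 3)


def _has_content(section: dict) -> bool:
    return section["heading"] is not None or any(ln.strip() for ln in section["lines"])


def split_markdown_sections(text: str) -> list[dict]:
    """Split Markdown into sections, recording each section's heading path."""
    sections: list[dict] = []
    path: list[tuple[int, str]] = []  # stack of (level, title) for open headings
    current = {"heading": None, "level": 0, "heading_path": [], "lines": []}
    in_fence = False

    for line in text.splitlines():
        if line.lstrip().startswith(FENCE_MARKERS):
            in_fence = not in_fence
        # A "# ..." line inside a fenced block is code, not a heading.
        match = None if in_fence else HEADING_RE.match(line)
        if match is None:
            current["lines"].append(line)
            continue
        if _has_content(current):
            sections.append(current)
        level, title = len(match.group(1)), match.group(2)
        # Close headings at the same or a deeper level before opening this one.
        while path and path[-1][0] >= level:
            path.pop()
        path.append((level, title))
        current = {
            "heading": title,
            "level": level,
            "heading_path": [t for _, t in path],
            "lines": [],
        }

    if _has_content(current):
        sections.append(current)

    for section in sections:
        body = "\n".join(section.pop("lines")).strip()
        section["text"] = body
        # Rough flag: any fence marker in the body counts as a code block.
        section["has_code_block"] = any(marker in body for marker in FENCE_MARKERS)
    return sections
```

Keeping the heading path as a stack means a section under `## B` inside `# A` reports `["A", "B"]`, which is the kind of context a search result can show next to a matching chunk.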
@@ -180,12 +180,20 @@ mempalace-code **indexes it once** into a local vector store, then your AI finds
 | C# | Classes, interfaces, records, methods, properties | Regex |
 | F# / VB.NET | Modules, types, functions | Regex |
 | XAML | Controls, resources, code-behind linking | Regex |
-|
-| Kubernetes manifests | Deployments, Services, ConfigMaps, Secrets, Ingresses, CRDs (indexed by kind/
-| Markdown / plain text | Heading sections, paragraphs | — |
-|
+| Terraform / HCL | Terraform/HCL top-level blocks (`resource`, `module`, `variable`, `moved`, `import`, `check`, etc.) | Regex |
+| Kubernetes manifests | Deployments, Services, ConfigMaps, Secrets, Ingresses, CRDs (indexed by kind/name) | YAML-aware |
+| Markdown / plain text | Heading sections (`#`-`######`), heading paths, section metadata, paragraphs | — |
+| C / C++ | Indexed and searchable with best-effort symbol metadata; chunked adaptively today | — |
+| Ruby / shell / SQL | Indexed and searchable; chunked adaptively today | — |
+| HTML / CSS / CSV | Indexed and searchable; chunked adaptively today | — |
+| YAML / JSON / TOML | Adaptive line-count; Kubernetes YAML auto-detected separately | — |
+| Dockerfile / Make / templates / config | Dockerfile, Containerfile, Makefile, GNUmakefile, Vagrantfile, Go templates, Jinja2, `.conf`, `.cfg`, `.ini` | — |
 
-
+The `mempalace_code_search` language filter is generated from the same language
+catalog as the miner. If a file type is mined with a language label, the MCP
+schema and unsupported-language hints stay aligned with that catalog.
+
+Tree-sitter is optional (`pip install "mempalace-code[treesitter]"`). When a grammar is missing, Python, TypeScript/JavaScript/TSX/JSX, Go, and Rust fall back to regex structural chunking. Other recognized formats use their regex, YAML-aware, prose, or adaptive chunker as listed above.
 
 ```bash
 mempalace mine ~/projects/myapp # all supported file types
@@ -307,7 +315,7 @@ claude mcp add mempalace -- python -m mempalace.mcp_server
 | `mempalace_list_wings` | All wings with drawer counts |
 | `mempalace_list_rooms` | Rooms within a wing |
 | `mempalace_get_taxonomy` | Full wing → room → count tree |
-| `mempalace_search` | Semantic search with optional wing/room filters |
+| `mempalace_search` | Semantic search with optional wing/room filters; Markdown hits include heading path and section metadata |
 | `mempalace_code_search` | Filter by language, symbol name/type, file glob |
 | `mempalace_file_context` | All indexed chunks for a source file, ordered by chunk_index |
 | `mempalace_check_duplicate` | Similarity check before filing (0.9 threshold) |
@@ -504,7 +512,7 @@ This is a code-first fork of [milla-jovovich/mempalace](https://github.com/milla
 | No backup, no recovery | `backup` / `restore` / `export` / `import` |
 | No incremental mining | Content-hash incremental: only changed files re-chunked |
 | No code-search | `code_search` — filter by language, symbol, glob |
-| Line-count chunking |
+| Line-count chunking | Language-aware mining: tree-sitter AST for supported grammars, regex structural chunking, YAML-aware Kubernetes splits, prose sections, and adaptive chunks for configs/data |
 
 Full audit: [`docs/UPSTREAM_HARDENING.md`](docs/UPSTREAM_HARDENING.md).
 
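The README changes in this file describe a fallback order: tree-sitter AST chunking when a grammar is available, regex structural chunking otherwise, and adaptive chunks for formats without reliable declarations. A rough sketch of that decision follows. The strategy names, the language sets, and the `tree_sitter_<language>` module-naming check are assumptions made for illustration, not the package's actual logic, and the YAML-aware and prose chunkers are left out for brevity.

```python
from importlib.util import find_spec

# Languages the README table marks as "Optional tree-sitter".
AST_CAPABLE = {"python", "typescript", "javascript", "go", "rust"}
# Hypothetical subset of labels that have a regex structural chunker.
REGEX_STRUCTURAL = AST_CAPABLE | {"java", "kotlin", "scala", "csharp", "terraform"}


def grammar_available(language: str) -> bool:
    """Best-effort check that a grammar module is importable (naming is assumed)."""
    try:
        return find_spec(f"tree_sitter_{language}") is not None
    except (ImportError, ValueError):
        return False


def pick_strategy(language: str, has_grammar=grammar_available) -> str:
    """Return "ast", "regex", or "adaptive" for a language label."""
    if language in AST_CAPABLE and has_grammar(language):
        return "ast"
    if language in REGEX_STRUCTURAL:
        return "regex"
    return "adaptive"
```

Passing the availability check in as a parameter keeps the decision testable without installing any grammar, for example `pick_strategy("python", has_grammar=lambda lang: False)` exercises the regex fallback.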
@@ -657,9 +665,9 @@ python -m pytest tests/ -x -q # full suite, all local, no network
 Apache 2.0 — see [LICENSE](LICENSE) and [NOTICE](NOTICE).
 
 <!-- Link Definitions -->
-[version-shield]: https://img.shields.io/badge/version-1.6.
+[version-shield]: https://img.shields.io/badge/version-1.6.2-4dc9f6?style=flat-square&labelColor=0a0e14
 [release-link]: https://github.com/rergards/mempalace-code/releases
-[python-shield]: https://img.shields.io/badge/python-3.
+[python-shield]: https://img.shields.io/badge/python-3.11+-7dd8f8?style=flat-square&labelColor=0a0e14&logo=python&logoColor=7dd8f8
 [python-link]: https://www.python.org/
 [license-shield]: https://img.shields.io/badge/license-Apache_2.0-b0e8ff?style=flat-square&labelColor=0a0e14
 [license-link]: https://github.com/rergards/mempalace-code/blob/main/LICENSE
{mempalace_code-1.6.0 → mempalace_code-1.6.2}/README.md
@@ -22,14 +22,14 @@ No cloud. No API keys. No subscription. Nothing leaves your machine.
 
 <table>
 <tr>
-<td align="center"><strong>
+<td align="center"><strong>Language-Aware Mining</strong><br><sub>AST, regex, and adaptive chunking<br>matched to each file type</sub></td>
 <td align="center"><strong>27 MCP Tools</strong><br><sub>Native Claude Code integration<br>search, store, traverse</sub></td>
 <td align="center"><strong>Temporal Knowledge Graph</strong><br><sub>Facts that change over time<br>with validity windows</sub></td>
 </tr>
 <tr>
 <td align="center"><strong>595x Token Savings</strong><br><sub>measured peak · median 80x<br><a href="docs/BENCH_TOKEN_DELTA.md">scales with project size</a></sub></td>
 <td align="center"><strong>Cross-Project Tunnels</strong><br><sub>Search <code>auth</code> in one project<br>find it everywhere</sub></td>
-<td align="center"><strong>
+<td align="center"><strong>1322 Tests · $0 Cost</strong><br><sub>Every feature acceptance-gated<br>fully offline after install</sub></td>
 </tr>
 </table>
 
@@ -103,7 +103,7 @@ You write code. You make decisions. You debug things. Between sessions, all that
 mempalace-code **indexes it once** into a local vector store, then your AI finds it in milliseconds — using [595x fewer tokens](docs/BENCH_TOKEN_DELTA.md) than grep + read at measured peak (median 80x on a 19k-chunk project, and it keeps scaling). Think of it as `git log` for everything that *isn't* in the code: the *why*, the discussions, the dead ends, the decisions.
 
 **What gets indexed:**
-- Code files —
+- Code files — structural chunks for Python, TypeScript/JS/TSX/JSX, Go, Rust, Java, Kotlin, C#, F#, VB.NET, XAML, Swift, PHP, Scala, Dart, Terraform/HCL, Markdown, and Kubernetes manifests; adaptive chunks for C/C++, Ruby, shell, SQL, HTML/CSS, JSON/YAML/TOML, CSV, Dockerfile, Make, templates, and config files
 - .NET solutions — `.sln`/`.csproj` project graphs, cross-project symbol relationships, interface implementations
 - Conversation exports — Claude, ChatGPT, Slack
 - Architecture notes, decisions, anything you store manually
@@ -116,14 +116,14 @@ mempalace-code **indexes it once** into a local vector store, then your AI finds
 
 ### Language-Aware Code Mining
 
-`mempalace mine` walks your source tree and
+`mempalace mine` walks your source tree and chooses the best chunker for each file type: AST boundaries where optional tree-sitter grammars are available, regex structural boundaries for supported languages, YAML-aware Kubernetes resource splits, Markdown/prose sections, or adaptive line-count chunks for formats without reliable declarations. Leading comments and docstrings stay attached to declarations where structural chunking is active; Markdown drawers keep heading path, section type, and Mermaid/code/table flags in search metadata.
 
 | Language | Strategy | AST Support |
 |----------|----------|:-----------:|
-| Python | Functions, classes, methods, decorators |
-| TypeScript / JavaScript / TSX / JSX | Functions, classes, exports, imports |
-| Go | Functions, types, methods, interfaces |
-| Rust | Functions, structs, enums, traits, impls |
+| Python | Functions, classes, methods, decorators | Optional tree-sitter |
+| TypeScript / JavaScript / TSX / JSX | Functions, classes, exports, imports | Optional tree-sitter |
+| Go | Functions, types, methods, interfaces | Optional tree-sitter |
+| Rust | Functions, structs, enums, traits, impls | Optional tree-sitter |
 | Java | Classes, interfaces, methods, annotations | Regex |
 | Kotlin | Classes, objects, functions, extensions | Regex |
 | Scala | Classes, case classes, objects, traits, enums, functions, implicits, type aliases, generics | Regex |
@@ -133,12 +133,20 @@ mempalace-code **indexes it once** into a local vector store, then your AI finds
 | C# | Classes, interfaces, records, methods, properties | Regex |
 | F# / VB.NET | Modules, types, functions | Regex |
 | XAML | Controls, resources, code-behind linking | Regex |
-|
-| Kubernetes manifests | Deployments, Services, ConfigMaps, Secrets, Ingresses, CRDs (indexed by kind/
-| Markdown / plain text | Heading sections, paragraphs | — |
-|
+| Terraform / HCL | Terraform/HCL top-level blocks (`resource`, `module`, `variable`, `moved`, `import`, `check`, etc.) | Regex |
+| Kubernetes manifests | Deployments, Services, ConfigMaps, Secrets, Ingresses, CRDs (indexed by kind/name) | YAML-aware |
+| Markdown / plain text | Heading sections (`#`-`######`), heading paths, section metadata, paragraphs | — |
+| C / C++ | Indexed and searchable with best-effort symbol metadata; chunked adaptively today | — |
+| Ruby / shell / SQL | Indexed and searchable; chunked adaptively today | — |
+| HTML / CSS / CSV | Indexed and searchable; chunked adaptively today | — |
+| YAML / JSON / TOML | Adaptive line-count; Kubernetes YAML auto-detected separately | — |
+| Dockerfile / Make / templates / config | Dockerfile, Containerfile, Makefile, GNUmakefile, Vagrantfile, Go templates, Jinja2, `.conf`, `.cfg`, `.ini` | — |
 
-
+The `mempalace_code_search` language filter is generated from the same language
+catalog as the miner. If a file type is mined with a language label, the MCP
+schema and unsupported-language hints stay aligned with that catalog.
+
+Tree-sitter is optional (`pip install "mempalace-code[treesitter]"`). When a grammar is missing, Python, TypeScript/JavaScript/TSX/JSX, Go, and Rust fall back to regex structural chunking. Other recognized formats use their regex, YAML-aware, prose, or adaptive chunker as listed above.
 
 ```bash
 mempalace mine ~/projects/myapp # all supported file types
@@ -260,7 +268,7 @@ claude mcp add mempalace -- python -m mempalace.mcp_server
 | `mempalace_list_wings` | All wings with drawer counts |
 | `mempalace_list_rooms` | Rooms within a wing |
 | `mempalace_get_taxonomy` | Full wing → room → count tree |
-| `mempalace_search` | Semantic search with optional wing/room filters |
+| `mempalace_search` | Semantic search with optional wing/room filters; Markdown hits include heading path and section metadata |
 | `mempalace_code_search` | Filter by language, symbol name/type, file glob |
 | `mempalace_file_context` | All indexed chunks for a source file, ordered by chunk_index |
 | `mempalace_check_duplicate` | Similarity check before filing (0.9 threshold) |
@@ -457,7 +465,7 @@ This is a code-first fork of [milla-jovovich/mempalace](https://github.com/milla
 | No backup, no recovery | `backup` / `restore` / `export` / `import` |
 | No incremental mining | Content-hash incremental: only changed files re-chunked |
 | No code-search | `code_search` — filter by language, symbol, glob |
-| Line-count chunking |
+| Line-count chunking | Language-aware mining: tree-sitter AST for supported grammars, regex structural chunking, YAML-aware Kubernetes splits, prose sections, and adaptive chunks for configs/data |
 
 Full audit: [`docs/UPSTREAM_HARDENING.md`](docs/UPSTREAM_HARDENING.md).
 
@@ -610,9 +618,9 @@ python -m pytest tests/ -x -q # full suite, all local, no network
 Apache 2.0 — see [LICENSE](LICENSE) and [NOTICE](NOTICE).
 
 <!-- Link Definitions -->
-[version-shield]: https://img.shields.io/badge/version-1.6.
+[version-shield]: https://img.shields.io/badge/version-1.6.2-4dc9f6?style=flat-square&labelColor=0a0e14
 [release-link]: https://github.com/rergards/mempalace-code/releases
-[python-shield]: https://img.shields.io/badge/python-3.
+[python-shield]: https://img.shields.io/badge/python-3.11+-7dd8f8?style=flat-square&labelColor=0a0e14&logo=python&logoColor=7dd8f8
 [python-link]: https://www.python.org/
 [license-shield]: https://img.shields.io/badge/license-Apache_2.0-b0e8ff?style=flat-square&labelColor=0a0e14
 [license-link]: https://github.com/rergards/mempalace-code/blob/main/LICENSE
{mempalace_code-1.6.0 → mempalace_code-1.6.2}/docs/BACKLOG-archived.yaml
@@ -308,3 +308,8 @@ items:
     resolution: '2026-04-26: completed in pipeline run'
     archived_date: "2026-04-26"
     done_summary: completed in pipeline run
+  - key: CODE-LANGUAGE-CATALOG-FRESH-PR4
+    summary: 'Rebuild PR #4 language catalog on top of current mining/search code'
+    resolution: '2026-05-01: completed in pipeline run'
+    archived_date: "2026-05-01"
+    done_summary: completed in pipeline run
@@ -43,6 +43,36 @@ sections:
|
|
|
43
43
|
# OPEN ITEMS
|
|
44
44
|
# ============================================================
|
|
45
45
|
items:
|
|
46
|
+
- key: MINE-APP-SCAN-EXCLUDES-PR4
|
|
47
|
+
summary: "Evaluate PR #4 app-level scan excludes as a separate mining feature"
|
|
48
|
+
type: task
|
|
49
|
+
priority: medium
|
|
50
|
+
status: open
|
|
51
|
+
size: M
|
|
52
|
+
section_id: coding_first
|
|
53
|
+
labels: [mining, watcher, config, pr4]
|
|
54
|
+
description: |-
|
|
55
|
+
## Problem
|
|
56
|
+
PR #4 includes app-level scan exclusions (`scan_skip_dirs`, `scan_skip_files`,
|
|
57
|
+
`scan_skip_globs`) alongside its language-catalog refactor. The catalog part was
|
|
58
|
+
rebuilt on current main, but global scan excludes are a separate behavior change and
|
|
59
|
+
should not be merged accidentally through the catalog task.
|
|
60
|
+
|
|
61
|
+
## Scope
|
|
62
|
+
Re-evaluate the PR #4 scan-exclude idea on current main and implement it only if the
|
|
63
|
+
behavior is still desirable. The feature should exclude noisy generated artifacts such
|
|
64
|
+
as `.kotlin-lsp/` and `workspace.json` consistently from `scan_project()` and watcher
|
|
65
|
+
relevance checks, while preserving explicit `--include-ignored` overrides.
|
|
66
|
+
|
|
67
|
+
## Acceptance criteria
|
|
68
|
+
- Global scan-exclude config keys are documented and loaded from
|
|
69
|
+
`~/.mempalace/config.json`.
|
|
70
|
+
- `scan_project()` and watcher relevance filtering use the same app-level rules.
|
|
71
|
+
- Explicit `--include-ignored` paths override app-level excludes.
|
|
72
|
+
- Defaults are conservative and do not hide common source files.
|
|
73
|
+
- Tests cover config loading, miner scanning, watcher filtering, and override behavior.
|
|
74
|
+
- README/docs describe how to remove previously indexed noise by re-mining.
|
|
75
|
+
resolution: null
|
|
46
76
|
- key: CLEAN-ONBOARDING
|
|
47
77
|
summary: "Replace interactive onboarding with config-file-first setup"
|
|
48
78
|
type: task
|
|
@@ -19,7 +19,7 @@ If you want to answer "what did we decide about auth last quarter?" or "find the
|
|
|
19
19
|
| Dimension | Graphify | mempalace-code |
|
|
20
20
|
|-----------|----------|-----------|
|
|
21
21
|
| Core data structure | NetworkX MultiDiGraph | LanceDB columnar vector store + SQLite KG |
|
|
22
|
-
| Code understanding | tree-sitter AST, 20 languages |
|
|
22
|
+
| Code understanding | tree-sitter AST, 20 languages | language-aware mining: optional tree-sitter chunks for Python/JS/TS/TSX/JSX/Go/Rust, regex structural chunks for supported languages, YAML-aware Kubernetes, adaptive config/prose chunks |
|
|
23
23
|
| Semantic layer | Claude subagent extracts concepts into graph nodes | `all-MiniLM-L6-v2` embeddings (384d, local) |
|
|
24
24
|
| Graph clustering | **Leiden community detection** (produces "god nodes" + clusters) | none — query-time ranked retrieval only |
|
|
25
25
|
| Search primitive | graph traversal, BFS with hop limits | cosine distance over vectors, filtered by wing/room |
|
|
@@ -28,7 +28,7 @@ If you want to answer "what did we decide about auth last quarter?" or "find the
|
|
|
28
28
|
| Conversation mining | none | `convo_miner.py` ingests Claude/ChatGPT/Slack exports |
|
|
29
29
|
| Multimodal | **PDFs, images, videos, YouTube links** (via host LLM API) | text only |
|
|
30
30
|
| Visualization | **interactive HTML graph** (pyvis) | none |
|
|
31
|
-
| Incremental rebuild | **SHA256 file-level cache** |
|
|
31
|
+
| Incremental rebuild | **SHA256 file-level cache** | content-hash incremental mining; only changed files are re-chunked |
|
|
32
32
|
| Privacy on ingest | code stays local; **docs/PDFs/images sent to host LLM API** | **nothing leaves the host, ever** (fully offline) |
|
|
33
33
|
| Embedding dependency | none | 80 MB `all-MiniLM-L6-v2` model downloaded once |
|
|
34
34
|
| MCP surface | `/graphify query`, `/graphify path`, `/graphify explain` | 27 MCP tools (search, traverse, diary, KG, arch-retrieval, stats, …) |
|
|
@@ -85,11 +85,11 @@ Graphify is per-project — each repo has its own `graphify-out/` directory and
|
|
|
85
85
|
|
|
86
86
|
mempalace-code has no visualization layer. Vector spaces do not visualize well; graph structures do.
|
|
87
87
|
|
|
88
|
-
### 2.
|
|
88
|
+
### 2. Full AST graph precision across more languages
|
|
89
89
|
|
|
90
90
|
Graphify uses tree-sitter for parsing, covering 20 languages precisely. Function calls, imports, class references, and type usages are captured at AST fidelity.
|
|
91
91
|
|
|
92
|
-
mempalace-code uses
|
|
92
|
+
mempalace-code uses tree-sitter for chunk boundaries when optional grammars are installed for Python, TypeScript/JavaScript/TSX/JSX, Go, and Rust. It also uses regex structural chunking for Java, Kotlin, .NET languages, XAML, Swift, PHP, Scala, Dart, and Terraform/HCL, YAML-aware splitting for Kubernetes manifests, and adaptive chunking for configs/data/prose. That is still not a call graph: it cannot track `foo()` → function definition of `foo` across files. Symbol metadata is per-chunk only, not cross-referenced.
|
|
93
93
|
|
|
94
94
|
**Consequence**: for "find all call sites of this function" graphify is the right tool. mempalace-code will not answer that precisely.
|
|
95
95
|
|
|
@@ -105,11 +105,11 @@ For projects that include research papers, architecture diagrams as PNGs, or rec
|
|
|
105
105
|
|
|
106
106
|
mempalace-code is text-only. No PDF parsing, no image captioning, no video transcription.
|
|
107
107
|
|
|
108
|
-
### 5.
|
|
108
|
+
### 5. Incremental rebuild is no longer a Graphify-only win
|
|
109
109
|
|
|
110
110
|
Graphify caches parsed AST by file SHA256. Re-running on an unchanged file is a cache hit; only changed files are re-processed.
|
|
111
111
|
|
|
112
|
-
mempalace-code
|
|
112
|
+
mempalace-code now also mines incrementally by content hash: unchanged drawers are skipped and only changed files are re-chunked unless `--full` is passed. Graphify still wins on full structural graph analysis, but the basic "do not rebuild every unchanged file" capability is now table stakes for both tools.
|
|
113
113
|
|
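The content-hash skip described above can be sketched in a few lines. This is a minimal illustration, not mempalace-code's miner: the function names, the in-memory `stored_hashes` mapping, and the choice of SHA-256 are assumptions made for the example, and the `full` parameter only mimics the documented `--full` flag.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a hex digest of the file content (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def files_to_rechunk(files: dict, stored_hashes: dict, full: bool = False) -> list:
    """Pick the files whose content hash differs from the stored one.

    files:         hypothetical mapping of path -> current file text
    stored_hashes: hypothetical mapping of path -> hash recorded at last mine
    full:          when True, re-chunk everything, like passing --full
    """
    if full:
        return sorted(files)
    return sorted(
        path
        for path, text in files.items()
        if stored_hashes.get(path) != content_hash(text)
    )


files = {"a.py": "print('a')\n", "b.py": "print('b')\n"}
stored = {
    "a.py": content_hash("print('a')\n"),      # unchanged since last run
    "b.py": content_hash("print('old b')\n"),  # edited since last run
}

changed = files_to_rechunk(files, stored)
everything = files_to_rechunk(files, stored, full=True)
```

Here only `b.py` is selected on the incremental run, while the `full=True` call returns both paths. A file that was never mined has no stored hash, so it is always selected.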
|
114
114
|
### 6. 10-platform reach via installer
|
|
115
115
|
|
|
@@ -175,12 +175,12 @@ These are genuinely good ideas from graphify that mempalace can incorporate with
|
|
|
175
175
|
|
|
176
176
|
| Idea | Cost | Value | Status |
|
|
177
177
|
|------|------|-------|--------|
|
|
178
|
-
| **
|
|
178
|
+
| **Broader AST coverage / call graph extraction** | L | high | Post-launch candidate — current tree-sitter support is chunk-boundary only for Python/JS/TS/TSX/JSX/Go/Rust |
|
|
179
179
|
| **Explicit per-edge / per-drawer provenance label** | S | medium | New (not in backlog yet) — e.g. `confidence`, `extractor_version` |
|
|
180
180
|
| **`benchmarks/TOKEN_DELTA.md` with one public number** | S | high | Filed as `LAUNCH-BENCH-TOKEN-DELTA` (owner task) |
|
|
181
181
|
| **Minimal static HTML visualization** of palace structure (wings × rooms × drawer counts) | M | medium | New candidate for post-launch |
|
|
182
182
|
| **Per-platform installer** (`mempalace install --platform codex\|cursor\|gemini`) | L | low | Not urgent — Claude Code + Codex both have native MCP; per-platform hooks are maintenance burden |
|
|
183
|
-
| **Tree-sitter
|
|
183
|
+
| **Tree-sitter grammars beyond Python/JS/TS/Go/Rust** | M | medium | Not urgent — current regex/adaptive chunkers cover the launch languages, but not full AST semantics |
|
|
184
184
|
|
|
185
185
|
Note: the always-on PreToolUse hook is intentionally absent from this list. See the preceding section for why.
|
|
186
186
|
|
|
@@ -194,7 +194,7 @@ Note: the always-on PreToolUse hook is intentionally absent from this list. See
|
|
|
194
194
|
- crash-safe LanceDB (survives `Ctrl+C`)
|
|
195
195
|
|
|
196
196
|
**Do not claim**:
|
|
197
|
-
- AST precision — mempalace uses regex
|
|
197
|
+
- full AST/code-graph precision — mempalace uses AST chunk boundaries for a subset, regex structural chunks for many languages, and adaptive chunks for configs/data, but does not build call graphs
|
|
198
198
|
- multimodal ingest — mempalace is text-only
|
|
199
199
|
- visualization — mempalace has none
|
|
200
200
|
- community detection — different problem, different algorithm, not mempalace's game
|
|
@@ -4,7 +4,7 @@ mempalace-code does **semantic vector search** — it finds content by *meaning*
|
|
|
4
4
|
|
|
5
5
|
## The Algorithm in 5 Steps
|
|
6
6
|
|
|
7
|
-
1. **During mining** (`mempalace mine`), every source file is split into chunks. Each chunk is passed through the `all-MiniLM-L6-v2` model, which converts the text into a **384-dimensional vector** — a numeric fingerprint of its meaning. The vector is stored in LanceDB alongside metadata (`wing`, `room`, `source_file`, `language`, `symbol_name`, `symbol_type`).
|
|
7
|
+
1. **During mining** (`mempalace mine`), every source file is split into chunks. Each chunk is passed through the `all-MiniLM-L6-v2` model, which converts the text into a **384-dimensional vector** — a numeric fingerprint of its meaning. The vector is stored in LanceDB alongside metadata (`wing`, `room`, `source_file`, `language`, `symbol_name`, `symbol_type`). Markdown drawers also store section metadata (`heading`, `heading_level`, `heading_path`, `doc_section_type`) and flags for Mermaid diagrams, fenced code blocks, and tables.
|
|
8
8
|
|
|
9
9
|
2. **At query time**, the query string (e.g. `"detect language file extension"`) goes through the same model and produces another 384-dimensional vector in the same semantic space.
|
|
10
10
|
|
|
@@ -12,7 +12,7 @@ mempalace-code does **semantic vector search** — it finds content by *meaning*
|
|
|
12
12
|
|
|
13
13
|
4. **Optional `wing` / `room` filters** are applied as standard SQL `WHERE` predicates. LanceDB decides whether to pre-filter before the vector search or post-filter after it.
|
|
14
14
|
|
|
15
|
-
5. **Top-N results are returned** with a `similarity = 1 - distance` score (1.0 = perfect match, 0.0 = unrelated).
|
|
15
|
+
5. **Top-N results are returned** with a `similarity = 1 - distance` score (1.0 = perfect match, 0.0 = unrelated). Programmatic search returns the stored metadata with each hit so agents can cite the file, symbol, language, and Markdown section path when available.
|
|
16
16
|
|
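The filter, rank, and score steps above can be sketched with plain Python. This is an illustration under stated assumptions, not the code in `searcher.py` or `storage.py`: it uses exact brute-force cosine distance over toy 3-dimensional vectors instead of 384-dimensional embeddings and an ANN index, and `cosine_distance`, `rank`, and the row layout are hypothetical names.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 - cos(angle between a and b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank(query_vec, rows, top_n=3, wing=None):
    """Filter by wing, score every remaining row, return the top_n hits."""
    candidates = [r for r in rows if wing is None or r["wing"] == wing]
    scored = [
        {**r, "similarity": 1.0 - cosine_distance(query_vec, r["vector"])}
        for r in candidates
    ]
    scored.sort(key=lambda r: r["similarity"], reverse=True)
    return scored[:top_n]


# Toy rows standing in for stored drawers (vectors are made up).
rows = [
    {"wing": "mempalace", "text": "def detect_language(path): ...", "vector": [1.0, 0.0, 0.0]},
    {"wing": "mempalace", "text": "_EXTENSION_LANG_MAP = { ... }", "vector": [1.0, 1.0, 0.0]},
    {"wing": "other", "text": "unrelated notes", "vector": [0.0, 0.0, 1.0]},
]

hits = rank([1.0, 0.0, 0.0], rows, top_n=2, wing="mempalace")
```

The first hit points in the same direction as the query, so its similarity is 1.0. The second sits at 45 degrees and scores about 0.707. The row from the other wing is removed by the filter before any scoring happens.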
|
17
17
|
## ASCII Diagram
|
|
18
18
|
|
|
@@ -62,9 +62,10 @@ mempalace-code does **semantic vector search** — it finds content by *meaning*
|
|
|
62
62
|
│ source: miner.py sim: 0.396 │
|
|
63
63
|
│ def detect_language(path): ... │
|
|
64
64
|
│ │
|
|
65
|
-
│ [2] mempalace /
|
|
66
|
-
│ source:
|
|
67
|
-
│
|
|
65
|
+
│ [2] mempalace / language_catalog │
|
|
66
|
+
│ source: language_catalog.py │
|
|
67
|
+
│ sim: 0.351 │
|
|
68
|
+
│ _EXTENSION_LANG_MAP = { ... } │
|
|
68
69
|
│ │
|
|
69
70
|
│ [3] ... │
|
|
70
71
|
└──────────────────────────────────────┘
|
|
@@ -78,9 +79,11 @@ mempalace-code does **semantic vector search** — it finds content by *meaning*
|
|
|
78
79
|
- **The ANN index is approximate.** LanceDB uses IVF-PQ, which trades a tiny amount of recall for a massive speedup. On palaces with ~20k rows, the difference between the ANN search and exact brute force is negligible.
|
|
79
80
|
- **Similarity is not a probability.** A score of 0.396 does not mean "40% match". Scores are only comparable *within the same query* — 0.4 beats 0.3 for the same query, but a 0.4 on one query and a 0.4 on another are not the same thing.
|
|
80
81
|
- **`wing` / `room` filters are cheap.** They are plain columns in LanceDB, evaluated as SQL predicates.
|
|
82
|
+
- **Language filters share the miner catalog.** `code_search(language=...)` validates against the same language labels the miner emits, and the MCP schema hint is generated from that catalog.
|
|
83
|
+
- **Markdown location survives retrieval.** For `.md` files, `search_memories()` results include `heading`, `heading_level`, `heading_path`, `doc_section_type`, `contains_mermaid`, `contains_code`, and `contains_table` when the drawer came from a headed section.
|
|
81
84
|
|
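The idea of one catalog feeding detection, filter validation, and the schema hint can be sketched as follows. This is a hypothetical miniature, not the contents of `language_catalog.py`: the mapping entries, function names, error message, and case normalization are all assumptions chosen for the example.

```python
from pathlib import Path

# Hypothetical catalog: a single source of truth for language labels.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".ts": "typescript",
    ".go": "go",
    ".rs": "rust",
}
SUPPORTED_LANGUAGES = sorted(set(EXTENSION_TO_LANGUAGE.values()))


def detect_language(path: str):
    """Miner side: map a file extension to a catalog label, or None."""
    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())


def validate_language(language: str) -> str:
    """Search side: accept only labels the miner can actually emit."""
    normalized = language.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"unknown language {language!r}; "
            f"expected one of: {', '.join(SUPPORTED_LANGUAGES)}"
        )
    return normalized


def language_schema_hint() -> str:
    """Schema side: build the hint text from the same catalog."""
    return "One of: " + ", ".join(SUPPORTED_LANGUAGES)
```

Because all three functions read the same mapping, adding an extension updates detection, validation, and the hint together. A filter value that the miner never emits fails early with a `ValueError` instead of silently returning no results.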
|
82
85
|
## Where the Code Lives
|
|
83
86
|
|
|
84
|
-
- `mempalace/searcher.py
|
|
87
|
+
- `mempalace/searcher.py` — high-level `search()` and `search_memories()` functions.
|
|
85
88
|
- `mempalace/storage.py` — `LanceStore.query()`, which owns the embedding model, the LanceDB handle, and the actual vector search call.
|
|
86
89
|
- `mempalace/miner.py` — smart chunker, language detection, symbol extraction, and the batch embedding loop used during `mempalace mine`.
|
|
@@ -6,14 +6,15 @@ The Python package that powers mempalace-code. All modules, all logic.
|
|
|
6
6
|
|
|
7
7
|
| Module | What it does |
|
|
8
8
|
|--------|-------------|
|
|
9
|
-
| `cli.py` | CLI entry point — routes to mine, search,
|
|
9
|
+
| `cli.py` | CLI entry point — routes to init, mine, search, watch, backup/restore, export/import, health, and wake-up |
|
|
10
10
|
| `config.py` | Configuration loading — `~/.mempalace/config.json`, env vars, defaults |
|
|
11
|
+
| `language_catalog.py` | Shared language metadata for miner detection, `code_search` validation, and MCP language hints |
|
|
11
12
|
| `normalize.py` | Converts 5 chat formats (Claude Code JSONL, Claude.ai JSON, ChatGPT JSON, Slack JSON, plain text) to standard transcript format |
|
|
12
|
-
| `miner.py` | Project file ingest — scans directories, chunks
|
|
13
|
+
| `miner.py` | Project file ingest — scans directories, detects languages, chunks code/prose/config, stores drawers; Markdown chunks keep heading path and section metadata |
|
|
13
14
|
| `convo_miner.py` | Conversation ingest — chunks by exchange pair (Q+A), detects rooms from content |
|
|
14
|
-
| `searcher.py` | Semantic search via
|
|
15
|
+
| `searcher.py` | Semantic search via LanceDB vectors — filters by wing/room/language/symbol, returns verbatim text, scores, and stored metadata such as Markdown heading path |
|
|
15
16
|
| `layers.py` | 4-layer memory stack: L0 (identity), L1 (critical facts), L2 (room recall), L3 (deep search) |
|
|
16
|
-
| `dialect.py` | AAAK
|
|
17
|
+
| `dialect.py` | AAAK lossy summary dialect — entity codes, topic markers, and token-saving estimates |
|
|
17
18
|
| `knowledge_graph.py` | Temporal entity-relationship graph — SQLite, time-filtered queries, fact invalidation |
|
|
18
19
|
| `palace_graph.py` | Room-based navigation graph — BFS traversal, tunnel detection across wings |
|
|
19
20
|
| `mcp_server.py` | MCP server — 27 tools, AAAK auto-teach, Palace Protocol, agent diary |
|
|
@@ -28,7 +29,7 @@ The Python package that powers mempalace-code. All modules, all logic.
|
|
|
28
29
|
## Architecture
|
|
29
30
|
|
|
30
31
|
```
|
|
31
|
-
User → CLI → miner/convo_miner →
|
|
32
|
+
User → CLI → miner/convo_miner → LanceDB (palace)
|
|
32
33
|
↕
|
|
33
34
|
knowledge_graph (SQLite)
|
|
34
35
|
↕
|
|
@@ -37,4 +38,4 @@ User → MCP Server → searcher → results
|
|
|
37
38
|
→ diary → agent journal
|
|
38
39
|
```
|
|
39
40
|
|
|
40
|
-
The palace (
|
|
41
|
+
The palace (LanceDB) stores verbatim drawer content and vector metadata. The knowledge graph (SQLite) stores structured relationships. The MCP server exposes both to any AI tool. ChromaDB is a deprecated optional legacy backend only.
|