kernel-lore-mcp 0.1.0__tar.gz

Files changed (209)
  1. kernel_lore_mcp-0.1.0/.github/workflows/ci.yml +77 -0
  2. kernel_lore_mcp-0.1.0/.gitignore +57 -0
  3. kernel_lore_mcp-0.1.0/.python-version +1 -0
  4. kernel_lore_mcp-0.1.0/CLAUDE.md +336 -0
  5. kernel_lore_mcp-0.1.0/Cargo.lock +4095 -0
  6. kernel_lore_mcp-0.1.0/Cargo.toml +118 -0
  7. kernel_lore_mcp-0.1.0/GOVERNANCE.md +47 -0
  8. kernel_lore_mcp-0.1.0/LEGAL.md +82 -0
  9. kernel_lore_mcp-0.1.0/LICENSE +21 -0
  10. kernel_lore_mcp-0.1.0/PKG-INFO +182 -0
  11. kernel_lore_mcp-0.1.0/README.md +146 -0
  12. kernel_lore_mcp-0.1.0/SECURITY.md +74 -0
  13. kernel_lore_mcp-0.1.0/TODO.md +352 -0
  14. kernel_lore_mcp-0.1.0/docs/README.md +29 -0
  15. kernel_lore_mcp-0.1.0/docs/architecture/data-flow.md +62 -0
  16. kernel_lore_mcp-0.1.0/docs/architecture/deployment-modes.md +101 -0
  17. kernel_lore_mcp-0.1.0/docs/architecture/overview.md +107 -0
  18. kernel_lore_mcp-0.1.0/docs/architecture/reciprocity.md +73 -0
  19. kernel_lore_mcp-0.1.0/docs/architecture/three-tier-index.md +69 -0
  20. kernel_lore_mcp-0.1.0/docs/architecture/trade-offs.md +72 -0
  21. kernel_lore_mcp-0.1.0/docs/demos/first-session.md +166 -0
  22. kernel_lore_mcp-0.1.0/docs/indexing/bm25-tier.md +86 -0
  23. kernel_lore_mcp-0.1.0/docs/indexing/compressed-store.md +64 -0
  24. kernel_lore_mcp-0.1.0/docs/indexing/metadata-tier.md +56 -0
  25. kernel_lore_mcp-0.1.0/docs/indexing/tokenizer-spec.md +135 -0
  26. kernel_lore_mcp-0.1.0/docs/indexing/trigram-tier.md +94 -0
  27. kernel_lore_mcp-0.1.0/docs/ingestion/grokmirror.md +61 -0
  28. kernel_lore_mcp-0.1.0/docs/ingestion/mbox-parsing.md +73 -0
  29. kernel_lore_mcp-0.1.0/docs/ingestion/patch-parsing.md +77 -0
  30. kernel_lore_mcp-0.1.0/docs/ingestion/shard-walking.md +89 -0
  31. kernel_lore_mcp-0.1.0/docs/mcp/client-config.md +210 -0
  32. kernel_lore_mcp-0.1.0/docs/mcp/query-routing.md +73 -0
  33. kernel_lore_mcp-0.1.0/docs/mcp/tools.md +127 -0
  34. kernel_lore_mcp-0.1.0/docs/mcp/transport-auth.md +81 -0
  35. kernel_lore_mcp-0.1.0/docs/ops/cost-model.md +50 -0
  36. kernel_lore_mcp-0.1.0/docs/ops/ec2-sizing.md +87 -0
  37. kernel_lore_mcp-0.1.0/docs/ops/monitoring.md +50 -0
  38. kernel_lore_mcp-0.1.0/docs/ops/runbook.md +308 -0
  39. kernel_lore_mcp-0.1.0/docs/ops/threat-model.md +153 -0
  40. kernel_lore_mcp-0.1.0/docs/ops/update-cadence.md +80 -0
  41. kernel_lore_mcp-0.1.0/docs/ops/update-frequency.md +198 -0
  42. kernel_lore_mcp-0.1.0/docs/plans/2026-04-14-best-in-class-kernel-mcp.md +345 -0
  43. kernel_lore_mcp-0.1.0/docs/plans/2026-04-15-mcp-spec-coverage-and-uplift.md +717 -0
  44. kernel_lore_mcp-0.1.0/docs/research/2026-04-14-agent-ergonomics.md +114 -0
  45. kernel_lore_mcp-0.1.0/docs/research/2026-04-14-best-in-class-mcp-survey.md +242 -0
  46. kernel_lore_mcp-0.1.0/docs/research/2026-04-14-external-data-sources.md +196 -0
  47. kernel_lore_mcp-0.1.0/docs/research/2026-04-14-gix-vs-git2.md +76 -0
  48. kernel_lore_mcp-0.1.0/docs/research/2026-04-14-mcp-python-sdk.md +55 -0
  49. kernel_lore_mcp-0.1.0/docs/research/2026-04-14-pyo3-maturin.md +80 -0
  50. kernel_lore_mcp-0.1.0/docs/research/2026-04-14-search-library-landscape.md +79 -0
  51. kernel_lore_mcp-0.1.0/docs/research/2026-04-14-storage-footprint.md +57 -0
  52. kernel_lore_mcp-0.1.0/docs/research/2026-04-14-tantivy.md +73 -0
  53. kernel_lore_mcp-0.1.0/docs/research/2026-04-14-workflow-gap-analysis.md +171 -0
  54. kernel_lore_mcp-0.1.0/docs/research/training-retriever.md +131 -0
  55. kernel_lore_mcp-0.1.0/docs/standards/README.md +43 -0
  56. kernel_lore_mcp-0.1.0/docs/standards/python/checklists/01-research.md +87 -0
  57. kernel_lore_mcp-0.1.0/docs/standards/python/checklists/02-design.md +99 -0
  58. kernel_lore_mcp-0.1.0/docs/standards/python/checklists/03-implement.md +123 -0
  59. kernel_lore_mcp-0.1.0/docs/standards/python/checklists/04-test.md +107 -0
  60. kernel_lore_mcp-0.1.0/docs/standards/python/checklists/05-quality.md +113 -0
  61. kernel_lore_mcp-0.1.0/docs/standards/python/checklists/06-review.md +95 -0
  62. kernel_lore_mcp-0.1.0/docs/standards/python/checklists/07-commit.md +87 -0
  63. kernel_lore_mcp-0.1.0/docs/standards/python/checklists/08-debug.md +93 -0
  64. kernel_lore_mcp-0.1.0/docs/standards/python/checklists/09-optimize.md +100 -0
  65. kernel_lore_mcp-0.1.0/docs/standards/python/checklists/10-document.md +76 -0
  66. kernel_lore_mcp-0.1.0/docs/standards/python/checklists/index.md +155 -0
  67. kernel_lore_mcp-0.1.0/docs/standards/python/code-quality.md +431 -0
  68. kernel_lore_mcp-0.1.0/docs/standards/python/data-structures.md +489 -0
  69. kernel_lore_mcp-0.1.0/docs/standards/python/design/boundaries.md +383 -0
  70. kernel_lore_mcp-0.1.0/docs/standards/python/design/concurrency.md +395 -0
  71. kernel_lore_mcp-0.1.0/docs/standards/python/design/dependencies.md +357 -0
  72. kernel_lore_mcp-0.1.0/docs/standards/python/design/errors.md +480 -0
  73. kernel_lore_mcp-0.1.0/docs/standards/python/design/modules.md +321 -0
  74. kernel_lore_mcp-0.1.0/docs/standards/python/git.md +374 -0
  75. kernel_lore_mcp-0.1.0/docs/standards/python/index.md +139 -0
  76. kernel_lore_mcp-0.1.0/docs/standards/python/language.md +369 -0
  77. kernel_lore_mcp-0.1.0/docs/standards/python/libraries/fastmcp.md +458 -0
  78. kernel_lore_mcp-0.1.0/docs/standards/python/libraries/httpx.md +340 -0
  79. kernel_lore_mcp-0.1.0/docs/standards/python/libraries/pydantic.md +510 -0
  80. kernel_lore_mcp-0.1.0/docs/standards/python/libraries/structlog.md +373 -0
  81. kernel_lore_mcp-0.1.0/docs/standards/python/naming.md +521 -0
  82. kernel_lore_mcp-0.1.0/docs/standards/python/pyo3-maturin.md +582 -0
  83. kernel_lore_mcp-0.1.0/docs/standards/python/testing.md +492 -0
  84. kernel_lore_mcp-0.1.0/docs/standards/python/uv.md +389 -0
  85. kernel_lore_mcp-0.1.0/docs/standards/rust/cargo.md +401 -0
  86. kernel_lore_mcp-0.1.0/docs/standards/rust/checklists/01-research.md +67 -0
  87. kernel_lore_mcp-0.1.0/docs/standards/rust/checklists/02-design.md +101 -0
  88. kernel_lore_mcp-0.1.0/docs/standards/rust/checklists/03-implement.md +119 -0
  89. kernel_lore_mcp-0.1.0/docs/standards/rust/checklists/04-test.md +131 -0
  90. kernel_lore_mcp-0.1.0/docs/standards/rust/checklists/05-quality.md +133 -0
  91. kernel_lore_mcp-0.1.0/docs/standards/rust/checklists/06-review.md +125 -0
  92. kernel_lore_mcp-0.1.0/docs/standards/rust/checklists/07-commit.md +135 -0
  93. kernel_lore_mcp-0.1.0/docs/standards/rust/checklists/08-debug.md +121 -0
  94. kernel_lore_mcp-0.1.0/docs/standards/rust/checklists/09-optimize.md +148 -0
  95. kernel_lore_mcp-0.1.0/docs/standards/rust/checklists/10-document.md +152 -0
  96. kernel_lore_mcp-0.1.0/docs/standards/rust/checklists/index.md +151 -0
  97. kernel_lore_mcp-0.1.0/docs/standards/rust/code-quality.md +349 -0
  98. kernel_lore_mcp-0.1.0/docs/standards/rust/design/boundaries.md +295 -0
  99. kernel_lore_mcp-0.1.0/docs/standards/rust/design/concurrency.md +276 -0
  100. kernel_lore_mcp-0.1.0/docs/standards/rust/design/data-structures.md +357 -0
  101. kernel_lore_mcp-0.1.0/docs/standards/rust/design/errors.md +379 -0
  102. kernel_lore_mcp-0.1.0/docs/standards/rust/design/modules.md +276 -0
  103. kernel_lore_mcp-0.1.0/docs/standards/rust/ffi.md +559 -0
  104. kernel_lore_mcp-0.1.0/docs/standards/rust/index.md +140 -0
  105. kernel_lore_mcp-0.1.0/docs/standards/rust/language.md +260 -0
  106. kernel_lore_mcp-0.1.0/docs/standards/rust/libraries/arrow-parquet.md +314 -0
  107. kernel_lore_mcp-0.1.0/docs/standards/rust/libraries/gix.md +269 -0
  108. kernel_lore_mcp-0.1.0/docs/standards/rust/libraries/pyo3.md +333 -0
  109. kernel_lore_mcp-0.1.0/docs/standards/rust/libraries/regex-automata.md +280 -0
  110. kernel_lore_mcp-0.1.0/docs/standards/rust/libraries/roaring-fst.md +353 -0
  111. kernel_lore_mcp-0.1.0/docs/standards/rust/libraries/tantivy.md +260 -0
  112. kernel_lore_mcp-0.1.0/docs/standards/rust/libraries/zstd.md +317 -0
  113. kernel_lore_mcp-0.1.0/docs/standards/rust/naming.md +385 -0
  114. kernel_lore_mcp-0.1.0/docs/standards/rust/testing.md +375 -0
  115. kernel_lore_mcp-0.1.0/docs/standards/rust/unsafe.md +280 -0
  116. kernel_lore_mcp-0.1.0/pyproject.toml +125 -0
  117. kernel_lore_mcp-0.1.0/rust-toolchain.toml +6 -0
  118. kernel_lore_mcp-0.1.0/scripts/agentic_smoke.sh +229 -0
  119. kernel_lore_mcp-0.1.0/scripts/grokmirror-personal.conf +56 -0
  120. kernel_lore_mcp-0.1.0/scripts/grokmirror.conf +65 -0
  121. kernel_lore_mcp-0.1.0/scripts/klmcp-doctor.sh +188 -0
  122. kernel_lore_mcp-0.1.0/scripts/klmcp-grok-pull.sh +87 -0
  123. kernel_lore_mcp-0.1.0/scripts/klmcp-ingest.sh +72 -0
  124. kernel_lore_mcp-0.1.0/scripts/post-pull-hook.sh +24 -0
  125. kernel_lore_mcp-0.1.0/scripts/systemd/etc-kernel-lore-mcp-env.sample +31 -0
  126. kernel_lore_mcp-0.1.0/scripts/systemd/klmcp-grokmirror.service +48 -0
  127. kernel_lore_mcp-0.1.0/scripts/systemd/klmcp-grokmirror.timer +27 -0
  128. kernel_lore_mcp-0.1.0/scripts/systemd/klmcp-ingest.path +14 -0
  129. kernel_lore_mcp-0.1.0/scripts/systemd/klmcp-ingest.service +42 -0
  130. kernel_lore_mcp-0.1.0/scripts/systemd/klmcp-mcp.service +45 -0
  131. kernel_lore_mcp-0.1.0/src/bin/ingest.rs +263 -0
  132. kernel_lore_mcp-0.1.0/src/bin/reindex.rs +15 -0
  133. kernel_lore_mcp-0.1.0/src/bm25.rs +430 -0
  134. kernel_lore_mcp-0.1.0/src/embedding.rs +397 -0
  135. kernel_lore_mcp-0.1.0/src/error.rs +60 -0
  136. kernel_lore_mcp-0.1.0/src/ingest.rs +565 -0
  137. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/__init__.py +34 -0
  138. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/__main__.py +156 -0
  139. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/_core.pyi +151 -0
  140. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/cli/__init__.py +0 -0
  141. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/cli/embed.py +200 -0
  142. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/cli/ingest.py +183 -0
  143. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/config.py +82 -0
  144. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/embedding.py +63 -0
  145. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/errors.py +124 -0
  146. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/freshness.py +52 -0
  147. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/kwic.py +129 -0
  148. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/logging_.py +44 -0
  149. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/mapping.py +145 -0
  150. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/models.py +362 -0
  151. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/prompts.py +339 -0
  152. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/resources/__init__.py +5 -0
  153. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/resources/blind_spots.py +45 -0
  154. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/resources/templates.py +195 -0
  155. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/routes/__init__.py +5 -0
  156. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/routes/metrics.py +98 -0
  157. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/routes/status.py +128 -0
  158. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/sampling.py +77 -0
  159. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/server.py +163 -0
  160. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/tools/__init__.py +6 -0
  161. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/tools/activity.py +87 -0
  162. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/tools/expand_citation.py +37 -0
  163. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/tools/explain_patch.py +69 -0
  164. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/tools/message.py +65 -0
  165. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/tools/nearest.py +183 -0
  166. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/tools/patch.py +70 -0
  167. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/tools/patch_diff.py +109 -0
  168. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/tools/patch_search.py +59 -0
  169. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/tools/primitives.py +419 -0
  170. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/tools/sampling_tools.py +411 -0
  171. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/tools/search.py +99 -0
  172. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/tools/series.py +31 -0
  173. kernel_lore_mcp-0.1.0/src/kernel_lore_mcp/tools/thread.py +77 -0
  174. kernel_lore_mcp-0.1.0/src/lib.rs +75 -0
  175. kernel_lore_mcp-0.1.0/src/metadata.rs +367 -0
  176. kernel_lore_mcp-0.1.0/src/parse.rs +578 -0
  177. kernel_lore_mcp-0.1.0/src/python.rs +577 -0
  178. kernel_lore_mcp-0.1.0/src/reader.rs +1730 -0
  179. kernel_lore_mcp-0.1.0/src/router.rs +456 -0
  180. kernel_lore_mcp-0.1.0/src/schema.rs +140 -0
  181. kernel_lore_mcp-0.1.0/src/state.rs +339 -0
  182. kernel_lore_mcp-0.1.0/src/store.rs +298 -0
  183. kernel_lore_mcp-0.1.0/src/tid.rs +404 -0
  184. kernel_lore_mcp-0.1.0/src/trigram.rs +424 -0
  185. kernel_lore_mcp-0.1.0/tests/__init__.py +0 -0
  186. kernel_lore_mcp-0.1.0/tests/bin_ingest.rs +156 -0
  187. kernel_lore_mcp-0.1.0/tests/python/__init__.py +0 -0
  188. kernel_lore_mcp-0.1.0/tests/python/conftest.py +27 -0
  189. kernel_lore_mcp-0.1.0/tests/python/fixtures/__init__.py +93 -0
  190. kernel_lore_mcp-0.1.0/tests/python/test_annotations.py +59 -0
  191. kernel_lore_mcp-0.1.0/tests/python/test_cost_hints.py +41 -0
  192. kernel_lore_mcp-0.1.0/tests/python/test_embedding_e2e.py +151 -0
  193. kernel_lore_mcp-0.1.0/tests/python/test_errors.py +87 -0
  194. kernel_lore_mcp-0.1.0/tests/python/test_freshness.py +136 -0
  195. kernel_lore_mcp-0.1.0/tests/python/test_http_transport.py +203 -0
  196. kernel_lore_mcp-0.1.0/tests/python/test_ingest_and_reader.py +195 -0
  197. kernel_lore_mcp-0.1.0/tests/python/test_kwic.py +138 -0
  198. kernel_lore_mcp-0.1.0/tests/python/test_mcp_tools_e2e.py +246 -0
  199. kernel_lore_mcp-0.1.0/tests/python/test_primitives_e2e.py +218 -0
  200. kernel_lore_mcp-0.1.0/tests/python/test_prompts.py +137 -0
  201. kernel_lore_mcp-0.1.0/tests/python/test_resource_templates.py +144 -0
  202. kernel_lore_mcp-0.1.0/tests/python/test_resources_routes.py +163 -0
  203. kernel_lore_mcp-0.1.0/tests/python/test_response_format.py +103 -0
  204. kernel_lore_mcp-0.1.0/tests/python/test_sampling_tools.py +191 -0
  205. kernel_lore_mcp-0.1.0/tests/python/test_smoke.py +49 -0
  206. kernel_lore_mcp-0.1.0/tests/python/test_status_subcommand.py +93 -0
  207. kernel_lore_mcp-0.1.0/tests/python/test_stdio_subprocess.py +240 -0
  208. kernel_lore_mcp-0.1.0/tests/python/test_thread_patch_explain.py +162 -0
  209. kernel_lore_mcp-0.1.0/uv.lock +2056 -0
@@ -0,0 +1,77 @@
+ name: ci
+
+ on:
+   push:
+     branches: [main, master]
+   pull_request:
+
+ permissions:
+   contents: read
+
+ jobs:
+   rust:
+     name: rust (fmt + clippy + test)
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - uses: dtolnay/rust-toolchain@stable
+         with:
+           components: rustfmt, clippy
+       - uses: Swatinem/rust-cache@v2
+       - run: cargo fmt --all -- --check
+       - run: cargo clippy --all-targets -- -D warnings
+       - run: cargo test --lib
+
+   python:
+     name: python (ruff + mypy)
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - uses: astral-sh/setup-uv@v5
+         with:
+           python-version: "3.12"
+       - run: uv sync --frozen --no-install-project
+         continue-on-error: true
+       - run: uv sync
+       - run: uv run ruff check src tests
+       - run: uv run ruff format --check src tests
+       # mypy requires a built _core module; run after build job.
+
+   build:
+     name: maturin build + pytest (smoke)
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+       - uses: astral-sh/setup-uv@v5
+         with:
+           python-version: "3.12"
+       - uses: dtolnay/rust-toolchain@stable
+       - uses: Swatinem/rust-cache@v2
+       - run: uv sync
+       - run: uv run maturin develop --release
+       - run: uv run pytest -v
+
+   wheels:
+     name: build abi3 wheels (${{ matrix.target }})
+     runs-on: ${{ matrix.os }}
+     strategy:
+       fail-fast: false
+       matrix:
+         include:
+           - os: ubuntu-latest
+             target: x86_64-unknown-linux-gnu
+           - os: ubuntu-latest
+             target: aarch64-unknown-linux-gnu
+     steps:
+       - uses: actions/checkout@v4
+       - uses: PyO3/maturin-action@v1
+         with:
+           target: ${{ matrix.target }}
+           command: build
+           args: --release --out dist
+           sccache: true
+           manylinux: auto
+       - uses: actions/upload-artifact@v4
+         with:
+           name: wheels-${{ matrix.target }}
+           path: dist
@@ -0,0 +1,57 @@
1
+ # Rust
2
+ /target/
3
+ # Cargo.lock is tracked — we build binaries (reindex) and a cdylib wheel
4
+ # where reproducible builds matter. Follow the Rust guidance: commit the
5
+ # lockfile for bin+cdylib crates.
6
+ **/*.rs.bk
7
+
8
+ # Python
9
+ __pycache__/
10
+ *.py[cod]
11
+ *$py.class
12
+ *.so
13
+ .Python
14
+ .venv/
15
+ venv/
16
+ env/
17
+ dist/
18
+ build/
19
+ *.egg-info/
20
+ .pytest_cache/
21
+ .ruff_cache/
22
+ .mypy_cache/
23
+ .coverage
24
+ htmlcov/
25
+
26
+ # uv
27
+ .uv/
28
+
29
+ # maturin
30
+ target/wheels/
31
+
32
+ # Editors
33
+ .vscode/
34
+ .idea/
35
+ *.swp
36
+ *.swo
37
+ .DS_Store
38
+
39
+ # Runtime data — never commit lore mirrors, indices, or compressed stores
40
+ /data/
41
+ /cache/
42
+ /var/
43
+ *.tantivy
44
+ *.zst
45
+ *.parquet
46
+ *.fst
47
+ *.roaring
48
+ manifest.js.gz
49
+
50
+ # Logs
51
+ *.log
52
+
53
+ # Secrets
54
+ .env
55
+ .env.local
56
+ *.pem
57
+ *.key
@@ -0,0 +1 @@
1
+ 3.12
@@ -0,0 +1,336 @@
1
+ # kernel-lore-mcp — project state
2
+
3
+ ## Pointers
4
+
5
+ - **Standards:** [`docs/standards/`](./docs/standards/).
6
+ - **Reciprocity:** [`docs/architecture/reciprocity.md`](./docs/architecture/reciprocity.md).
7
+ - **Threat model:** [`docs/ops/threat-model.md`](./docs/ops/threat-model.md).
8
+ - **Update frequency / cadence policy:** [`docs/ops/update-frequency.md`](./docs/ops/update-frequency.md).
9
+ - **Training retriever north star:** [`docs/research/training-retriever.md`](./docs/research/training-retriever.md).
10
+ - **Deployment modes:** [`docs/architecture/deployment-modes.md`](./docs/architecture/deployment-modes.md).
11
+ - **Legal posture:** [`LEGAL.md`](./LEGAL.md).
12
+ - **Security policy:** [`SECURITY.md`](./SECURITY.md).
13
+ - **Governance:** [`GOVERNANCE.md`](./GOVERNANCE.md).
14
+ - **Top-level 6-month roadmap (best-in-class kernel-research MCP, supersedes framing of the uplift plan):**
15
+ [`docs/plans/2026-04-14-best-in-class-kernel-mcp.md`](./docs/plans/2026-04-14-best-in-class-kernel-mcp.md).
16
+ Built on four research streams under [`docs/research/2026-04-14-*`](./docs/research/).
17
+ - **MCP-surface detail (Phases 10–17, still authoritative for code-level scope):**
18
+ [`docs/plans/2026-04-15-mcp-spec-coverage-and-uplift.md`](./docs/plans/2026-04-15-mcp-spec-coverage-and-uplift.md).
19
+ Includes the full lessons-learned retro at the bottom.
20
+
21
+ ## Standards
22
+
23
+ Before touching Python code: read [`docs/standards/python/index.md`](./docs/standards/python/index.md)
24
+ and the relevant guide. Before touching Rust code: read
25
+ [`docs/standards/rust/index.md`](./docs/standards/rust/index.md) and
26
+ the relevant guide. When these standards disagree with this
27
+ CLAUDE.md, CLAUDE.md wins — it's the project-specific contract.
28
+
29
+ ## Original goal
30
+
31
+ A public MCP server that makes lore.kernel.org (all kernel mailing
32
+ lists) searchable by LLM-backed developer tools — Claude Code,
33
+ Codex, Cursor, and anything else that speaks MCP. Target user is a
34
+ kernel contributor or security researcher who wants structured,
35
+ low-latency queries over every patch, review thread, and bug
36
+ report on every kernel list without living inside `lei`.
37
+
38
+ This is infrastructure, not a product. Be conservative. Be correct.
39
+ Do not over-engineer. Do not under-engineer. Full design rationale
40
+ lives in `docs/architecture/`. Execution contract in `TODO.md`.
41
+
42
+ ## Non-negotiable product constraints
43
+
44
+ 1. **No authentication, ever.** No API keys, no OAuth, no bearer
45
+ tokens, no login flow. Anonymous read-only is the posture on
46
+ every deployment — local, hosted, every instance in between.
47
+ This keeps the barrier to agent integration at zero. Any tool /
48
+ resource / prompt that would require the caller to hold a secret
49
+ is rejected at design time.
50
+ 2. **We reduce load on lore.kernel.org; we never add to it.** The
51
+ server ingests via `grokmirror` (the sanctioned upstream mirror
52
+ protocol) and serves the indexed corpus. Every agent pointed at
53
+ a kernel-lore-mcp instance is one fewer agent that would
54
+ otherwise scrape lore directly. Fanout-to-one is the value
55
+ proposition. Do not apologize for integrating — the hosted +
56
+ self-hosted instances together subtract traffic from lore.
57
+ 3. **Any upstream credential (e.g. KCIDB BigQuery, GitHub API for
58
+ data ingestion) lives in the server's deployment config and is
59
+ never exposed to callers.** Callers never need an upstream
60
+ account to use our MCP.
61
+
62
+ ## Stack (April 2026, pinned on purpose)
63
+
+ | Component | Version | Notes |
+ |---|---|---|
+ | Rust toolchain | stable 1.85 (edition 2024) | pinned in `rust-toolchain.toml` |
+ | PyO3 | 0.28.3 | `Python::detach` / `Python::attach` are the CURRENT names (renamed from `allow_threads` / `with_gil` in PRs #5209 #5221, shipped in 0.28). Do not write `allow_threads` in new code. |
+ | maturin | 1.13.1 | build backend |
+ | Python | 3.12 minimum (abi3 floor), 3.14 preferred | Free-threaded `python3.14t` requires `--no-default-features` (abi3 incompatible until PEP 803 "abi3t" lands). |
+ | tantivy | 0.26.0 | stemming gated behind `stemmer` feature; NEVER enabled |
+ | tantivy-py | NOT USED | we bind tantivy ourselves in the PyO3 module |
+ | gix (gitoxide) | 0.81.0 | features: `max-performance-safe`, `revision`, `parallel`. NO `blocking-network-client` (grokmirror fetches) |
+ | mail-parser | 0.11 | `full_encoding` feature (legacy charsets) |
+ | roaring | 0.11 | posting lists (trigram tier) |
+ | fst | 0.4 | term dictionary (trigram tier) |
+ | regex-automata | 0.4 | DFA-only regex; safe for untrusted input |
+ | zstd | 0.13 | compressed raw store (dictionary-trained per list) |
+ | arrow | 58 | metadata tier (Parquet on disk) |
+ | parquet | 58 | ditto; `zstd` + `arrow` + `async` features |
+ | fastmcp | 3.2.4 | MCP framework. Streamable HTTP only; NOT SSE. |
+ | mcp (low-level SDK) | 1.27 (explicit dep) | types only; serving via FastMCP |
82
+
83
+ Any bump to these pins is a project decision, not a casual
84
+ `cargo update` / `uv lock --upgrade`. Log the reason in a commit
85
+ message.
86
+
87
+ ## Canonical project layout (uv init --build-backend maturin)
88
+
+ ```
+ kernel-lore-mcp/
+ ├── CLAUDE.md                 # you are here: authoritative proscriptions
+ ├── TODO.md                   # review-driven execution contract
+ ├── README.md                 # public pitch, links to docs/
+ ├── LICENSE                   # MIT
+ ├── rust-toolchain.toml       # pinned Rust toolchain
+ ├── pyproject.toml            # maturin build backend, uv deps/groups
+ ├── Cargo.toml                # Rust crate (cdylib + rlib) + reindex bin
+ ├── src/                      # Rust + Python mixed (maturin convention)
+ │   ├── lib.rs                # #[pymodule] root
+ │   ├── error.rs              # Error + From<_> for PyErr
+ │   ├── state.rs              # last_indexed_oid, generation, writer lockfile
+ │   ├── schema.rs             # shared Arrow + tantivy schemas
+ │   ├── store.rs              # compressed raw store
+ │   ├── metadata.rs           # Arrow/Parquet columnar tier
+ │   ├── trigram.rs            # fst + roaring trigram tier
+ │   ├── bm25.rs               # tantivy tier
+ │   ├── ingest.rs             # gix shard walk + extract + dispatch
+ │   ├── router.rs             # query grammar + tier dispatch + merge
+ │   ├── bin/
+ │   │   └── reindex.rs        # rebuild indices from compressed store
+ │   └── kernel_lore_mcp/      # Python package
+ │       ├── __init__.py       # lazy _core import
+ │       ├── __main__.py       # entry point; default bind 127.0.0.1
+ │       ├── server.py         # FastMCP app; explicit tool registration
+ │       ├── config.py         # pydantic-settings
+ │       ├── models.py         # pydantic response models (outputSchema)
+ │       ├── logging_.py       # structlog; stdio mode -> stderr only
+ │       ├── tools/            # one file per MCP tool
+ │       ├── resources/        # blind_spots://coverage etc.
+ │       ├── routes/           # /status, /metrics via @mcp.custom_route
+ │       └── _core.pyi         # type stubs for the Rust extension
+ ├── tests/
+ │   └── python/               # pytest with in-process fastmcp.Client
+ ├── docs/
+ │   ├── architecture/         # design rationale
+ │   ├── ingestion/            # how data comes in
+ │   ├── indexing/             # the three tiers, tokenizer spec
+ │   ├── mcp/                  # tool schemas, routing, transport, clients
+ │   ├── ops/                  # EC2, cost, freshness, deploy, security
+ │   └── research/             # dated investigations
+ ├── scripts/                  # one-offs (grokmirror conf)
+ └── .github/workflows/        # CI
+ ```
134
+
135
+ **Keep this organized.** Every doc has a single home. If you can't
136
+ decide where something goes, update the taxonomy — don't scatter.
137
+
138
+ ## Three-tier index architecture
139
+
140
+ Corpus is heterogeneous. One index is wrong. See
141
+ `docs/architecture/three-tier-index.md`.
142
+
143
+ 1. **Metadata tier** — Arrow/Parquet. Structured fields: message_id,
144
+ list, from, subject (raw + normalized + tags), date, in_reply_to,
145
+ references[], tid (thread id, precomputed at ingest),
146
+ touched_files[], touched_functions[], series_version,
147
+ series_index, is_cover_letter, has_patch, patch_stats, trailers
148
+ (signed_off_by[], reviewed_by[], acked_by[], tested_by[],
149
+ co_developed_by[], reported_by[], fixes[], link[], closes[]),
150
+ cross_posted_to[], body_offset, body_length, body_sha256,
151
+ schema_version. ~3 GB for all of lore.
152
+ 2. **Trigram tier** — custom; `fst` + `roaring`. Indexes patch/diff
153
+ content. Regex + substring over code. Confirm-with-real-regex
154
+ by decompressing the patch body from the compressed store
155
+ (candidates capped; see `src/trigram.rs`). ~20 GB.
156
+ 3. **BM25 tier** — tantivy with our `kernel_prose` analyzer.
157
+ Indexes prose body (message minus patch) + subject. Positions
158
+ OFF (`IndexRecordOption::WithFreqs`). Phrase queries on prose
159
+ are REJECTED, not silently degraded. ~10 GB.
160
+
161
+ Rebuildability contract: the compressed raw store is the source
162
+ of truth. All three tiers can be rebuilt from it without
163
+ refetching lore. `reindex` binary does this.
164
+
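The trigram tier's candidate-then-confirm flow described above can be illustrated with a toy in-memory index. This is a minimal sketch under stated assumptions, not the contents of `src/trigram.rs`: plain Python sets stand in for `roaring` posting lists, a dict stands in for the `fst` term dictionary, substring matching stands in for the real regex confirmation, and the names `build_index`, `search_literal`, and `MAX_CANDIDATES` are hypothetical.

```python
from collections import defaultdict

MAX_CANDIDATES = 1000  # hypothetical cap; the real limit is defined in src/trigram.rs


def trigrams(text: str) -> set[str]:
    """Return every 3-character window of the text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each trigram to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, body in docs.items():
        for gram in trigrams(body):
            index[gram].add(doc_id)
    return index


def search_literal(index: dict[str, set[int]], docs: dict[int, str], needle: str) -> list[int]:
    """Intersect posting lists, then confirm each candidate against the stored text."""
    grams = trigrams(needle)
    if not grams:
        raise ValueError("needle must be at least 3 characters")
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    if len(candidates) > MAX_CANDIDATES:
        raise ValueError("too many candidates; narrow the query")
    # Sharing every trigram is necessary but not sufficient, so re-check the body.
    return sorted(doc_id for doc_id in candidates if needle in docs[doc_id])
```

A document can contain every trigram of the query without containing the query itself, which is why the confirmation pass reads the stored body before returning a hit.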
165
+ ## Tokenizer proscriptions
166
+
167
+ Non-negotiable. See `docs/indexing/tokenizer-spec.md`.
168
+
169
+ 1. **No stemming, no stopwords, no asciifolding, no typo tolerance.**
170
+ tantivy 0.26 puts stemming behind a feature flag — leave off.
171
+ 2. **Strip quoted reply prefixes (`^> `) and signature blocks
172
+ (after `-- \n`) before indexing.**
173
+ 3. **Split patch off at ingest.** First `^diff --git` line starts
174
+ the patch; prose goes to BM25, patch to trigram. **Never mix.**
175
+ 4. **Preserve kernel identifiers whole AND emit subtokens.** For
176
+ `vector_mmsg_rx`, emit the whole identifier plus `vector`,
177
+ `mmsg`, `rx` subtokens at `position_inc=0`. **Preserve the
178
+ leading-underscore signal** (`__skb_unlink` stays distinct from
179
+ `skb_unlink`).
180
+ 5. **Atomic tokens** for email addresses, Message-IDs, commit SHAs,
181
+ CVE IDs — dedicated `raw` analyzer or STRING field.
182
+ 6. **Subject-line prefixes (`[PATCH ...]`, `Re:`, `Fwd:`) stripped
183
+ before BM25**; raw subject stored for display.
184
+ 7. **Subject tags** — `[RFC]`, `[RFT]`, `[GIT PULL]`, `[ANNOUNCE]`,
185
+ `[RESEND]`, `[PATCH vN]`, `N/M` — extracted to `subject_tags[]`
186
+ column, not discarded.
187
+
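Rules 3 and 4 above can be sketched in a few lines of Python. This is an illustration only, not the project's `kernel_prose` analyzer: the function names `split_prose_and_patch` and `identifier_tokens` are hypothetical, and the choice to give the whole identifier a position increment of 1 while stacking subtokens at 0 is an assumption about how "`position_inc=0`" is applied.

```python
import re

# Rule 3: the first line starting with "diff --git" begins the patch.
_DIFF_START = re.compile(r"^diff --git ", re.MULTILINE)


def split_prose_and_patch(body: str) -> tuple[str, str]:
    """Return (prose, patch); patch is empty when the message carries no diff."""
    match = _DIFF_START.search(body)
    if match is None:
        return body, ""
    return body[:match.start()], body[match.start():]


def identifier_tokens(identifier: str) -> list[tuple[str, int]]:
    """Rule 4: return (token, position_increment) pairs for one identifier."""
    # The whole identifier keeps its leading underscores, so __skb_unlink
    # and skb_unlink stay distinct terms.
    tokens = [(identifier, 1)]
    parts = [part for part in identifier.split("_") if part]
    if parts != [identifier]:
        # Subtokens share the position of the whole identifier.
        tokens.extend((part, 0) for part in parts)
    return tokens
```

With these definitions, `identifier_tokens("vector_mmsg_rx")` returns `[("vector_mmsg_rx", 1), ("vector", 0), ("mmsg", 0), ("rx", 0)]`.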
188
+ ## Ingestion pipeline
189
+
190
+ - `grokmirror` pulls lore shards on a 10-minute cron.
191
+ - Ingestion runs as a **separate systemd unit** (`klmcp-ingest`),
192
+ NOT in-process with the MCP server. It holds the sole
193
+ `tantivy::IndexWriter` + trigram builder + store appender.
194
+ - Walk via `gix::ThreadSafeRepository` with one rayon task per
195
+ shard (never within a shard — packfile cache locality).
196
+ Incremental via `rev_walk([head]).with_hidden([last_oid])` with
197
+ full-rewalk fallback if `last_oid` is dangling (shard repack).
198
+ - Per commit: extract fields in `docs/ingestion/mbox-parsing.md`
199
+ and trailers listed above; propagate `touched_files` from
200
+ sibling `1..N/N` patches to cover-letter via `tid`.
201
+ - After shard done: atomic rename of state file.
202
+ - After all shards done: writer commit, Parquet finalize, trigram
203
+ segment rename, bump `state::generation` counter.
204
+
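The "atomic rename of state file" step above follows the usual write-to-temp-then-rename pattern. The sketch below shows that pattern in Python for illustration; the actual ingest path is Rust (`src/state.rs`), and the JSON encoding, the field name in the usage test, and the helper name `write_state_atomically` are assumptions, not the project's on-disk format.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


def write_state_atomically(path: Path, state: dict) -> None:
    """Write state so readers see either the old file or the new one, never a partial."""
    # The temp file must live in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make the bytes durable before the rename
        os.replace(tmp_name, path)  # atomic swap into place
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

A crash before `os.replace` leaves the previous state file untouched, so the next run resumes from the last fully recorded shard position.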
205
+ ## Reader reload discipline
206
+
207
+ - `tantivy::ReloadPolicy::Manual`.
208
+ - Every query-request entry `stat()`s the generation file; if the
209
+ u64 advanced, `reader.reload()?` runs before the query.
210
+ - The same generation file keeps multi-worker uvicorn deployments coherent.
211
+
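The reload discipline above amounts to a cheap check at the start of every query. The following is a minimal sketch, assuming the generation file holds the counter as decimal text and that the reader exposes a zero-argument reload callable; `GenerationGate` and `before_query` are hypothetical names. For simplicity it re-reads the small file instead of comparing `stat()` results.

```python
from pathlib import Path
from typing import Callable


class GenerationGate:
    """Call reload() before a query whenever the on-disk generation has advanced."""

    def __init__(self, generation_file: Path, reload: Callable[[], None]) -> None:
        self._file = generation_file
        self._reload = reload
        self._seen = self._read()

    def _read(self) -> int:
        # A missing file is treated as generation 0 (nothing ingested yet).
        try:
            return int(self._file.read_text().strip())
        except FileNotFoundError:
            return 0

    def before_query(self) -> bool:
        """Return True if a reload ran, False if the reader was already current."""
        current = self._read()
        if current > self._seen:
            self._reload()
            self._seen = current
            return True
        return False
```

Because each worker process holds its own gate and compares against the same file, every worker picks up a new generation on its next query without any cross-process signalling.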
212
+ ## MCP server contract
213
+
214
+ See `docs/mcp/` for full details.
215
+
216
+ - Transport: **Streamable HTTP only** (SSE deprecated Apr 1 2026).
217
+ stdio for local dev.
218
+ - Default bind `127.0.0.1`; set `KLMCP_BIND=0.0.0.0` explicitly for
219
+ public deploy.
220
+ - **stdio mode**: all logs to stderr. Never write a byte to stdout
221
+ outside the MCP framing — corrupts the protocol.
222
+ - Tools v1: `lore_search`, `lore_thread`, `lore_patch`,
223
+ `lore_activity`, `lore_message`, `lore_series_versions`,
224
+ `lore_patch_diff`. All read-only. All annotate `readOnlyHint: true`.
225
+ - **Tools return pydantic `BaseModel`**, not `dict`. FastMCP
226
+ auto-derives `outputSchema` + emits `structuredContent`.
227
+ - **Every hit carries**: `message_id`, `cite_key`, `from_addr`
228
+ (always), `lore_url`, `subject_tags[]`, `is_cover_letter`,
229
+ `series_version`, `series_index`, `patch_stats` (if `has_patch`),
230
+ `snippet{offset,length,sha256,text}`, `tier_provenance[]`,
231
+ `is_exact_match`, `cross_posted_to[]`.
232
+ - **Pagination**: opaque **HMAC-signed** cursor (not a plain b64
233
+ tuple — rejects tampered cursors). Key from env
234
+ `KLMCP_CURSOR_KEY`.
235
+ - **No phrase queries on prose body** in v1 (no positions). Router
236
+ returns `Error::QueryParse` with an actionable message, never
237
+ silent degradation.
238
+ - **Regex queries MUST compile to DFA** via `regex-automata`. No
239
+ backrefs, no catastrophic patterns. Rejected with
240
+ `Error::RegexComplexity`.
241
+ - **`blind_spots` is an MCP resource** (`blind_spots://coverage`),
242
+ NOT a per-response payload. Per-response tax is a token sink.
243
+ - **Default `rt:` is 5 years**; always echoed in
244
+ `default_applied: ["rt:5y"]` so LLMs know.
245
+ - **`/status`** via `@mcp.custom_route`; cached 30s. **`/metrics`**
246
+ via `prometheus_client` (localhost by default).
247
+
248
+ ## Query grammar (lei-compatible subset, expanded)
249
+
250
+ Full list in `docs/mcp/query-routing.md`. Key operators beyond the
251
+ v0 sketch:
252
+
253
+ - Metadata: `tc:` (to-or-cc combined), `reviewed-by:`, `acked-by:`,
254
+ `tested-by:`, `signed-off-by:`, `co-developed-by:`, `fixes:<sha>`
255
+ (reverse-lookup patches mentioning this SHA), `closes:`, `link:`,
256
+ `patchid:`, `applied:`, `cherry:`, `tag:<RFC|RFT|...>`,
257
+ `trailer:<name>:<value>`.
258
+ - Trigram: `dfpre:`, `dfpost:`, `dfa:` (either side), `dfb:` (body
259
+ incl. context), `dfctx:` (context only), `/<regex>/`.
260
+ - BM25: `b:`, `nq:` (body minus quoted reply).
261
+
262
+ ## What NOT to use
263
+
264
+ Evaluated and rejected; see `docs/research/`:
265
+
266
+ - **Sonic, Toshi, Meilisearch, Quickwit, Bluge/Bleve** — wrong shape
267
+ or wrong language for embedded Rust library use.
268
+ - **tantivy stemmer** — off by design. Never enable.
269
+ - **git2-rs / libgit2** — not `Sync`; gix wins on linear history.
270
+ - **vendored `mcp.server.fastmcp`** — diverged from standalone
271
+ `fastmcp`. Use standalone.
272
+ - **SSE transport** — deprecated Apr 1 2026.
273
+ - **Keeping full git shards** after ingest — compressed store is
274
+ source of truth.
275
+ - **`allow_threads` / `with_gil`** in new PyO3 code — renamed to
276
+ `detach` / `attach` in 0.28.
277
+ - **FastAPI REST surface** — deferred past v1 unless demand lands.
278
+ Don't mount it.
279
+
280
+ ## Operational contract
281
+
282
+ - EC2 single-box deploy, **`r7g.xlarge`** Graviton (32 GB RAM —
283
+ NOT `c7g.xlarge` 8 GB, which won't hold the hot set), or
284
+ `r7i.xlarge` on Intel.
285
+ - **gp3 16000 IOPS / 1000 MB/s** (6000/250 would queue cold BM25).
286
+ - Ingestion is a separate systemd unit from serving.
287
+ - Blue/green deploy via dual systemd units + nginx upstream swap.
288
+ - RPO = hours (re-grok-pull from lore). RTO = ~30 min
289
+ (snapshot-restore cold). State this explicitly in responses if
290
+ ever relevant.
291
+ - `robots.txt` + `LEGAL.md` posture for public re-hosting of
292
+ author names/emails. See `docs/ops/` and LEGAL doc.
293
+
294
+ ## Known blind spots
295
+
296
+ Surfaced via the `blind_spots://coverage` MCP resource (once, not
297
+ per-response):
298
+
299
+ - Private `security@kernel.org` queue.
300
+ - Distro vendor backports.
301
+ - Syzbot pre-public findings.
302
+ - ZDI / research-shop internal pipelines.
303
+ - CVE Project in-flight embargoes.
304
+ - Off-list discussion (IRC, private email, calls).
305
+ - Lore trails vger by 1–5 minutes; our ingest adds 10–20 more.
306
+
307
+ ## Research-novelty discipline (inherited from parent project)
308
+
309
+ This server makes novelty-checking *easier*, but the discipline is
310
+ unchanged: don't fish where everyone else is fishing. The
311
+ `lore_activity` tool must make it trivial to ask "who has touched
312
+ this file in the last 6 months, grouped by series, with trailers,
313
+ and mapped against MAINTAINERS membership." That's what the
314
+ metadata tier is for.
315
+
316
+ ## Session-specific guidance
317
+
318
+ - Prefer editing existing files over creating new ones.
319
+ - Never add speculative features. This project gets misused as a
320
+ sandbox because it touches many interesting topics.
321
+ - Do not run `grok-pull` from developer machines by default — the
322
+ deploy box does that. Ingestion tests use synthetic fixtures in
323
+ `tests/python/fixtures/`.
324
+ - Do not commit compressed stores, indices, or fetched lore data.
325
+ `.gitignore` catches `data/`, `*.tantivy`, `*.zst`, etc.
326
+ - Do not write comments explaining WHAT code does. Identifiers
327
+ already tell you that. Comments explain WHY — non-obvious
328
+ constraints, workarounds for specific bugs.
329
+ - Do not add a stemmer. Do not add SSE transport. Do not add
330
+ git2-rs. Do not add FastAPI for v1. Do not hold the GIL across
331
+ heavy Rust calls. Do not write logs to stdout in stdio mode.
332
+ Do not return bare dicts from MCP tools. Do not use the
333
+ side-effect-import tool registration pattern. Do not add
334
+ authentication of any kind (API keys, OAuth, bearer tokens) —
335
+ this is the same MCP server whether it's running on localhost
336
+ or the public instance. These are decided.