@ngao/search 0.1.0 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (259)
  1. package/.claude/settings.local.json +10 -0
  2. package/.env.example +7 -0
  3. package/.eslintrc.json +20 -0
  4. package/.github/workflows/build.yml +39 -0
  5. package/.github/workflows/release.yml +34 -0
  6. package/.github/workflows/test.yml +35 -0
  7. package/.mcp-config.json +14 -0
  8. package/.prettierrc.json +10 -0
  9. package/LICENSE +17 -0
  10. package/Makefile +26 -0
  11. package/README.md +57 -172
  12. package/config.example.json +8 -0
  13. package/dist/backend/api/search-engine.d.ts +40 -0
  14. package/dist/backend/api/search-engine.d.ts.map +1 -0
  15. package/dist/backend/api/search-engine.js +227 -0
  16. package/dist/backend/api/search-engine.js.map +1 -0
  17. package/dist/backend/core/block-impl.d.ts +32 -0
  18. package/dist/backend/core/block-impl.d.ts.map +1 -0
  19. package/dist/backend/core/block-impl.js +33 -0
  20. package/dist/backend/core/block-impl.js.map +1 -0
  21. package/dist/backend/core/config-loader.d.ts +68 -0
  22. package/dist/backend/core/config-loader.d.ts.map +1 -0
  23. package/dist/backend/core/config-loader.js +234 -0
  24. package/dist/backend/core/config-loader.js.map +1 -0
  25. package/dist/backend/core/constants.d.ts +39 -0
  26. package/dist/backend/core/constants.d.ts.map +1 -0
  27. package/dist/backend/core/constants.js +57 -0
  28. package/dist/backend/core/constants.js.map +1 -0
  29. package/dist/backend/core/enums.d.ts +54 -0
  30. package/dist/backend/core/enums.d.ts.map +1 -0
  31. package/dist/backend/core/enums.js +61 -0
  32. package/dist/backend/core/enums.js.map +1 -0
  33. package/dist/backend/core/errors.d.ts +83 -0
  34. package/dist/backend/core/errors.d.ts.map +1 -0
  35. package/dist/backend/core/errors.js +151 -0
  36. package/dist/backend/core/errors.js.map +1 -0
  37. package/dist/backend/core/logger.d.ts +68 -0
  38. package/dist/backend/core/logger.d.ts.map +1 -0
  39. package/dist/backend/core/logger.js +151 -0
  40. package/dist/backend/core/logger.js.map +1 -0
  41. package/dist/backend/core/models.d.ts +332 -0
  42. package/dist/backend/core/models.d.ts.map +1 -0
  43. package/dist/backend/core/models.js +6 -0
  44. package/dist/backend/core/models.js.map +1 -0
  45. package/dist/backend/core/service-types.d.ts +184 -0
  46. package/dist/backend/core/service-types.d.ts.map +1 -0
  47. package/dist/backend/core/service-types.js +7 -0
  48. package/dist/backend/core/service-types.js.map +1 -0
  49. package/dist/backend/core/types.d.ts +219 -0
  50. package/dist/backend/core/types.d.ts.map +1 -0
  51. package/dist/backend/core/types.js +109 -0
  52. package/dist/backend/core/types.js.map +1 -0
  53. package/dist/backend/index.d.ts +5 -0
  54. package/dist/backend/index.d.ts.map +1 -0
  55. package/dist/backend/index.js +13 -0
  56. package/dist/backend/index.js.map +1 -0
  57. package/dist/backend/indexing/block-extractor.d.ts +22 -0
  58. package/dist/backend/indexing/block-extractor.d.ts.map +1 -0
  59. package/dist/backend/indexing/block-extractor.js +52 -0
  60. package/dist/backend/indexing/block-extractor.js.map +1 -0
  61. package/dist/backend/indexing/index-builder.d.ts +26 -0
  62. package/dist/backend/indexing/index-builder.d.ts.map +1 -0
  63. package/dist/backend/indexing/index-builder.js +71 -0
  64. package/dist/backend/indexing/index-builder.js.map +1 -0
  65. package/dist/backend/parsers/base-file-parser.d.ts +134 -0
  66. package/dist/backend/parsers/base-file-parser.d.ts.map +1 -0
  67. package/dist/backend/parsers/base-file-parser.js +149 -0
  68. package/dist/backend/parsers/base-file-parser.js.map +1 -0
  69. package/dist/backend/parsers/javascript-parser.d.ts +36 -0
  70. package/dist/backend/parsers/javascript-parser.d.ts.map +1 -0
  71. package/dist/backend/parsers/javascript-parser.js +194 -0
  72. package/dist/backend/parsers/javascript-parser.js.map +1 -0
  73. package/dist/backend/parsers/json-parser.d.ts +15 -0
  74. package/dist/backend/parsers/json-parser.d.ts.map +1 -0
  75. package/dist/backend/parsers/json-parser.js +75 -0
  76. package/dist/backend/parsers/json-parser.js.map +1 -0
  77. package/dist/backend/parsers/markdown-parser.d.ts +17 -0
  78. package/dist/backend/parsers/markdown-parser.d.ts.map +1 -0
  79. package/dist/backend/parsers/markdown-parser.js +94 -0
  80. package/dist/backend/parsers/markdown-parser.js.map +1 -0
  81. package/dist/backend/parsers/parser-factory.d.ts +43 -0
  82. package/dist/backend/parsers/parser-factory.d.ts.map +1 -0
  83. package/dist/backend/parsers/parser-factory.js +149 -0
  84. package/dist/backend/parsers/parser-factory.js.map +1 -0
  85. package/dist/backend/parsers/python-parser.d.ts +21 -0
  86. package/dist/backend/parsers/python-parser.d.ts.map +1 -0
  87. package/dist/backend/parsers/python-parser.js +185 -0
  88. package/dist/backend/parsers/python-parser.js.map +1 -0
  89. package/dist/backend/parsers/yaml-parser.d.ts +16 -0
  90. package/dist/backend/parsers/yaml-parser.d.ts.map +1 -0
  91. package/dist/backend/parsers/yaml-parser.js +81 -0
  92. package/dist/backend/parsers/yaml-parser.js.map +1 -0
  93. package/dist/backend/repositories/implementations/lancedb-block-repository.d.ts +125 -0
  94. package/dist/backend/repositories/implementations/lancedb-block-repository.d.ts.map +1 -0
  95. package/dist/backend/repositories/implementations/lancedb-block-repository.js +505 -0
  96. package/dist/backend/repositories/implementations/lancedb-block-repository.js.map +1 -0
  97. package/dist/backend/repositories/implementations/lancedb-metadata-repository.d.ts +107 -0
  98. package/dist/backend/repositories/implementations/lancedb-metadata-repository.d.ts.map +1 -0
  99. package/dist/backend/repositories/implementations/lancedb-metadata-repository.js +275 -0
  100. package/dist/backend/repositories/implementations/lancedb-metadata-repository.js.map +1 -0
  101. package/dist/backend/repositories/implementations/memory-cache.d.ts +18 -0
  102. package/dist/backend/repositories/implementations/memory-cache.d.ts.map +1 -0
  103. package/dist/backend/repositories/implementations/memory-cache.js +53 -0
  104. package/dist/backend/repositories/implementations/memory-cache.js.map +1 -0
  105. package/dist/backend/repositories/repository.interface.d.ts +334 -0
  106. package/dist/backend/repositories/repository.interface.d.ts.map +1 -0
  107. package/dist/backend/repositories/repository.interface.js +7 -0
  108. package/dist/backend/repositories/repository.interface.js.map +1 -0
  109. package/dist/backend/search/context-extractor.d.ts +29 -0
  110. package/dist/backend/search/context-extractor.d.ts.map +1 -0
  111. package/dist/backend/search/context-extractor.js +106 -0
  112. package/dist/backend/search/context-extractor.js.map +1 -0
  113. package/dist/backend/search/multi-index-searcher.d.ts +28 -0
  114. package/dist/backend/search/multi-index-searcher.d.ts.map +1 -0
  115. package/dist/backend/search/multi-index-searcher.js +81 -0
  116. package/dist/backend/search/multi-index-searcher.js.map +1 -0
  117. package/dist/backend/search/query-parser.d.ts +37 -0
  118. package/dist/backend/search/query-parser.d.ts.map +1 -0
  119. package/dist/backend/search/query-parser.js +145 -0
  120. package/dist/backend/search/query-parser.js.map +1 -0
  121. package/dist/backend/search/ranking-engine.d.ts +31 -0
  122. package/dist/backend/search/ranking-engine.d.ts.map +1 -0
  123. package/dist/backend/search/ranking-engine.js +165 -0
  124. package/dist/backend/search/ranking-engine.js.map +1 -0
  125. package/dist/backend/search/result-formatter.d.ts +29 -0
  126. package/dist/backend/search/result-formatter.d.ts.map +1 -0
  127. package/dist/backend/search/result-formatter.js +70 -0
  128. package/dist/backend/search/result-formatter.js.map +1 -0
  129. package/dist/backend/service-types.d.ts +184 -0
  130. package/dist/backend/service-types.d.ts.map +1 -0
  131. package/dist/backend/service-types.js +7 -0
  132. package/dist/backend/service-types.js.map +1 -0
  133. package/dist/backend/services/embedding-service.d.ts +75 -0
  134. package/dist/backend/services/embedding-service.d.ts.map +1 -0
  135. package/dist/backend/services/embedding-service.js +298 -0
  136. package/dist/backend/services/embedding-service.js.map +1 -0
  137. package/dist/backend/services/file-watcher.d.ts +17 -0
  138. package/dist/backend/services/file-watcher.d.ts.map +1 -0
  139. package/dist/backend/services/file-watcher.js +92 -0
  140. package/dist/backend/services/file-watcher.js.map +1 -0
  141. package/dist/backend/services/index-information-service.d.ts +114 -0
  142. package/dist/backend/services/index-information-service.d.ts.map +1 -0
  143. package/dist/backend/services/index-information-service.js +104 -0
  144. package/dist/backend/services/index-information-service.js.map +1 -0
  145. package/dist/backend/services/ngao-search-service.d.ts +107 -0
  146. package/dist/backend/services/ngao-search-service.d.ts.map +1 -0
  147. package/dist/backend/services/ngao-search-service.js +384 -0
  148. package/dist/backend/services/ngao-search-service.js.map +1 -0
  149. package/dist/backend/services/quantization-service.d.ts +53 -0
  150. package/dist/backend/services/quantization-service.d.ts.map +1 -0
  151. package/dist/backend/services/quantization-service.js +84 -0
  152. package/dist/backend/services/quantization-service.js.map +1 -0
  153. package/dist/backend/services/reindex-manager.d.ts +25 -0
  154. package/dist/backend/services/reindex-manager.d.ts.map +1 -0
  155. package/dist/backend/services/reindex-manager.js +78 -0
  156. package/dist/backend/services/reindex-manager.js.map +1 -0
  157. package/dist/backend/services/session-manager.d.ts +115 -0
  158. package/dist/backend/services/session-manager.d.ts.map +1 -0
  159. package/dist/backend/services/session-manager.js +150 -0
  160. package/dist/backend/services/session-manager.js.map +1 -0
  161. package/dist/backend/services/vector-search-service.d.ts +81 -0
  162. package/dist/backend/services/vector-search-service.d.ts.map +1 -0
  163. package/dist/backend/services/vector-search-service.js +143 -0
  164. package/dist/backend/services/vector-search-service.js.map +1 -0
  165. package/dist/backend/utils/file-utils.d.ts +92 -0
  166. package/dist/backend/utils/file-utils.d.ts.map +1 -0
  167. package/dist/backend/utils/file-utils.js +247 -0
  168. package/dist/backend/utils/file-utils.js.map +1 -0
  169. package/dist/cli/setup.d.ts +4 -0
  170. package/dist/cli/setup.d.ts.map +1 -0
  171. package/dist/cli/setup.js +138 -0
  172. package/dist/cli/setup.js.map +1 -0
  173. package/dist/index.d.ts +6 -0
  174. package/dist/index.d.ts.map +1 -0
  175. package/dist/index.js +22 -0
  176. package/dist/index.js.map +1 -0
  177. package/dist/main.d.ts +14 -0
  178. package/dist/main.d.ts.map +1 -0
  179. package/dist/main.js +7 -67075
  180. package/dist/main.js.map +1 -0
  181. package/dist/mcp/tool-schemas.d.ts +205 -0
  182. package/dist/mcp/tool-schemas.d.ts.map +1 -0
  183. package/dist/mcp/tool-schemas.js +391 -0
  184. package/dist/mcp/tool-schemas.js.map +1 -0
  185. package/dist/server/logger.d.ts +50 -0
  186. package/dist/server/logger.d.ts.map +1 -0
  187. package/dist/server/logger.js +77 -0
  188. package/dist/server/logger.js.map +1 -0
  189. package/dist/server/tool-registry.d.ts +64 -0
  190. package/dist/server/tool-registry.d.ts.map +1 -0
  191. package/dist/server/tool-registry.js +93 -0
  192. package/dist/server/tool-registry.js.map +1 -0
  193. package/dist/server/transports/mcp-transport.d.ts +31 -0
  194. package/dist/server/transports/mcp-transport.d.ts.map +1 -0
  195. package/dist/server/transports/mcp-transport.js +331 -0
  196. package/dist/server/transports/mcp-transport.js.map +1 -0
  197. package/dist/server/transports/rest-transport.d.ts +36 -0
  198. package/dist/server/transports/rest-transport.d.ts.map +1 -0
  199. package/dist/server/transports/rest-transport.js +250 -0
  200. package/dist/server/transports/rest-transport.js.map +1 -0
  201. package/docs/API.md +116 -0
  202. package/docs/ARCHITECTURE.md +101 -0
  203. package/docs/FILE_WATCHING.md +120 -0
  204. package/docs/INSTALLATION.md +87 -0
  205. package/docs/MCP_INTEGRATION.md +108 -0
  206. package/docs/README.md +288 -0
  207. package/docs/USAGE.md +123 -0
  208. package/docs/architecture-design-standards/01_ARCHITECTURE.md +863 -0
  209. package/docs/architecture-design-standards/02_SEARCH_ENGINE_DESIGN.md +958 -0
  210. package/docs/architecture-design-standards/03_DATAFLOW.md +1000 -0
  211. package/docs/architecture-design-standards/04_VISUAL_GUIDE.md +922 -0
  212. package/docs/architecture-design-standards/05_REPOSITORY_PATTERN_GUIDE.md +503 -0
  213. package/docs/architecture-design-standards/06_IMPLEMENTATION_PATTERNS.md +1026 -0
  214. package/docs/architecture-design-standards/07_TYPESCRIPT_GUIDE.md +1027 -0
  215. package/docs/architecture-design-standards/08_CODING_STANDARDS.md +1274 -0
  216. package/docs/reference/01_START_HERE.md +108 -0
  217. package/docs/reference/02_QUICK_REFERENCE.md +363 -0
  218. package/docs/reference/03_DOCUMENTATION_INDEX.md +293 -0
  219. package/docs/reference/04_DELIVERY_SUMMARY.md +463 -0
  220. package/docs/reference/05_IMPLEMENTATION_OVERVIEW.md +319 -0
  221. package/docs/reference/06_RESEARCH_SUMMARY.md +519 -0
  222. package/docs/tracking/03_IMPLEMENTATION_ROADMAP.md +788 -0
  223. package/jest.config.json +12 -0
  224. package/package.json +46 -53
  225. package/prepend-shebang.js +18 -0
  226. package/scripts/setup-mcp.sh +66 -0
  227. package/src/backend/index.ts +5 -0
  228. package/src/backend/service-types.ts +219 -0
  229. package/src/backend/services/file-watcher.ts +79 -0
  230. package/src/backend/services/ngao-search-service.ts +430 -0
  231. package/src/backend/services/reindex-manager.ts +90 -0
  232. package/src/backend/services/session-manager.ts +214 -0
  233. package/src/cli/setup.ts +122 -0
  234. package/src/index.ts +6 -0
  235. package/src/main.ts +225 -0
  236. package/src/mcp/tool-schemas.ts +439 -0
  237. package/src/server/logger.ts +88 -0
  238. package/src/server/tool-registry.ts +117 -0
  239. package/src/server/transports/mcp-transport.ts +374 -0
  240. package/src/server/transports/rest-transport.ts +258 -0
  241. package/tests/unit/agent-tools.test.ts +454 -0
  242. package/tests/unit/file-watcher.test.d.ts +2 -0
  243. package/tests/unit/file-watcher.test.d.ts.map +1 -0
  244. package/tests/unit/file-watcher.test.js +9 -0
  245. package/tests/unit/file-watcher.test.js.map +1 -0
  246. package/tests/unit/file-watcher.test.ts +7 -0
  247. package/tests/unit/search-integration.test.ts +256 -0
  248. package/tests/unit/services.test.d.ts +2 -0
  249. package/tests/unit/services.test.d.ts.map +1 -0
  250. package/tests/unit/services.test.js +9 -0
  251. package/tests/unit/services.test.js.map +1 -0
  252. package/tests/unit/services.test.ts +7 -0
  253. package/tsconfig.json +23 -0
  254. package/webpack.backend.config.js +60 -0
  255. package/webpack.config.js +34 -0
  256. package/models/Xenova/all-MiniLM-L6-v2/config.json +0 -25
  257. package/models/Xenova/all-MiniLM-L6-v2/onnx/model_quantized.onnx +0 -0
  258. package/models/Xenova/all-MiniLM-L6-v2/tokenizer.json +0 -30686
  259. package/models/Xenova/all-MiniLM-L6-v2/tokenizer_config.json +0 -15
@@ -0,0 +1,922 @@
1
+ # Visual Architecture & Decision Trees
2
+
3
+ ## 1. System Architecture Visualization
4
+
5
+ ```
6
+ ┌────────────────────────────────────────────────────────────────────┐
7
+ │ NGAO SEARCH SYSTEM │
8
+ │ Multi-Format LLM-Friendly │
9
+ │ Code/Document Search │
10
+ └────────────────────────────────────────────────────────────────────┘
11
+
12
+ ╔════════════════════════════════════════════════════════════════════╗
13
+ ║ INPUT LAYER ║
14
+ ╚════════════════════════════════════════════════════════════════════╝
15
+
16
+ ┌─────────────┬─────────────┬──────────────┬──────────────┐
17
+ │ Python │ Markdown │ JavaScript │ JSON/YAML │
18
+ │ Files │ Files │ Files │ Config Files │
19
+ └──────┬──────┴──────┬──────┴───────┬──────┴──────┬───────┘
20
+ │ │ │ │
21
+ └─────────────┼──────────────┼─────────────┘
22
+
23
+
24
+ ╔════════════════════════════════════════════════════════════════════╗
25
+ ║ PARSING LAYER ║
26
+ ║ (Format-Specific AST Extraction) ║
27
+ ╚════════════════════════════════════════════════════════════════════╝
28
+
29
+ ┌─────────────────────────────────────────┐
30
+ │ FileType Router & Parser Selector │
31
+ └─────────────────────────────────────────┘
32
+
33
+ ┌──────────────┬──────────────┬──────────────┬──────────────┐
34
+ │ │ │ │ │
35
+ ▼ ▼ ▼ ▼ ▼
36
+ ┌────────────┐ ┌───────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐
37
+ │ Python │ │ Markdown │ │ Babel/ │ │ JSON │ │ Generic │
38
+ │ AST │ │ remark │ │ tree- │ │ Parser │ │ Text │
39
+ │ Traversal │ │ Unified │ │ sitter │ │ │ │ Parser │
40
+ └────────────┘ └───────────┘ └──────────┘ └──────────┘ └─────────┘
41
+
42
+ Extract: Extract: Extract: Extract: Extract:
43
+ ├─ Functions ├─ Headings ├─ Functions ├─ Keys ├─ Words
44
+ ├─ Classes ├─ Sections ├─ Classes ├─ Values └─ Lines
45
+ ├─ Methods ├─ Paragraphs ├─ Hooks └─ Nesting
46
+ ├─ Docstrings └─ Code └─ JSDoc
47
+ └─ Decorators blocks
48
+
49
+
50
+ ┌─────────────────────────────────────────┐
51
+ │ Block Extraction & Normalization │
52
+ │ (Compute scope, metadata, line ranges) │
53
+ └─────────────────────────────────────────┘
54
+
55
+ BLOCKS
56
+ ┌──────────────────┐
57
+ │ block_id: func_42│
58
+ │ file: src/auth.. │
59
+ │ type: method │
60
+ │ scope: [class] │
61
+ │ lines: 45-78 │
62
+ │ content: ... │
63
+ │ metadata: {...} │
64
+ └──────────────────┘
65
+
66
+ ╔════════════════════════════════════════════════════════════════════╗
67
+ ║ INDEXING LAYER ║
68
+ ║ (Build Specialized Search Indexes) ║
69
+ ╚════════════════════════════════════════════════════════════════════╝
70
+
71
+ Normalized Blocks
72
+
73
+ ┌──────────┴──────────┬─────────────┬─────────────┐
74
+ │ │ │ │
75
+ ▼ ▼ ▼ ▼
76
+ ┌───────────────┐ ┌──────────────┐ ┌──────────┐ ┌──────────┐
77
+ │ INVERTED │ │ SCOPE INDEX │ │ BLOCK │ │SEMANTIC │
78
+ │ INDEX │ │ (Hierarchy) │ │REGISTRY │ │(Embeddings)
79
+ ├───────────────┤ ├──────────────┤ ├──────────┤ ├──────────┤
80
+ │ keyword→pos │ │file→scope→ │ │block_id→ │ │block→ │
81
+ │ │ │children │ │metadata │ │embedding │
82
+ │auth: │ │ │ │ │ │ │
83
+ │├─files:[..] │ │src/auth.py: │ │func_42: {│ │[0.1, │
84
+ │└─pos:[45,50] │ │├─module │ │file:..., │ │0.2, │
85
+ │ │ │├─class:Auth │ │line:45-78│ │0.3,..] │
86
+ │(70% storage) │ │└─method:h │ │type:meth │ │ │
87
+ │ │ │ │ │} │ │(optional)
88
+ │(Search by │ │(20% storage) │ │ │ │ │
89
+ │ keywords) │ │ │ │(10%st) │ │(10-20% │
90
+ │ │ │(Navigate │ │ │ │ storage)│
91
+ │ │ │structure) │ │(Quick │ │ │
92
+ │ │ │ │ │lookup) │ │(Semantic│
93
+ │ │ │ │ │ │ │ search) │
94
+ └───────────────┘ └──────────────┘ └──────────┘ └──────────┘
95
+
96
+ ┌─────────────────────────────────────┐
97
+ │ Persist Indexes to Storage │
98
+ │ (SQLite + JSON or pure JSON) │
99
+ └─────────────────────────────────────┘
100
+
101
+ .ngao_search/
102
+ ├─ inverted_index.json
103
+ ├─ scope_index.json
104
+ ├─ block_registry.json
105
+ └─ index_metadata.json
106
+
107
+
108
+ ╔════════════════════════════════════════════════════════════════════╗
109
+ ║ QUERY LAYER ║
110
+ ╚════════════════════════════════════════════════════════════════════╝
111
+
112
+ User Query: "find auth handler with retry"
113
+
114
+ ┌───────────────────────────────┐
115
+ │ Query Parser & Analyzer │
116
+ ├───────────────────────────────┤
117
+ │ • Tokenize │
118
+ │ • Remove stopwords │
119
+ │ • Extract filters (type:...) │
120
+ │ • Plan search strategy │
121
+ └───────────────────────────────┘
122
+
123
+ Terms: ["auth", "handler", "retry"]
124
+ Filters: {}
125
+ Strategy: multi_index
126
+
127
+ ┌──────────────┬────────────┬──────────┐
128
+ │ │ │ │
129
+ ▼ ▼ ▼ ▼
130
+ KEYWORD SCOPE SEMANTIC REGEX
131
+ SEARCH SEARCH SEARCH (opt) SEARCH
132
+ │ │ │ │
133
+ └──────────────┴─────────────┴────────────┘
134
+
135
+ Aggregate & Deduplicate Results
136
+ └─→ [block_42, block_88, method_15, ...]
137
+
138
+ ┌─────────────────────────────────────┐
139
+ │ Context Extraction │
140
+ │ • Load source file │
141
+ │ • Extract snippet ±context │
142
+ │ • Preserve formatting │
143
+ │ • Highlight matches │
144
+ └─────────────────────────────────────┘
145
+
146
+ ┌─────────────────────────────────────┐
147
+ │ Relevance Ranking (Multi-Factor) │
148
+ │ │
149
+ │ Score = 0.35×keyword_match + │
150
+ │ 0.25×position + │
151
+ │ 0.15×scope_specificity + │
152
+ │ 0.15×recency + │
153
+ │ 0.10×frequency │
154
+ └─────────────────────────────────────┘
155
+
156
+ Sort by Score (DESC)
157
+ Truncate to max_results
158
+
159
+
160
+ ╔════════════════════════════════════════════════════════════════════╗
161
+ ║ OUTPUT LAYER ║
162
+ ║ (LLM-Friendly Structured Format) ║
163
+ ╚════════════════════════════════════════════════════════════════════╝
164
+
165
+ ┌─────────────────────────────────────┐
166
+ │ Transform to Type-Specific Format │
167
+ └─────────────────────────────────────┘
168
+
169
+ For Python:
170
+ {type: "python", signature: "...", decorators: [...]}
171
+
172
+ For Markdown:
173
+ {type: "markdown", heading_hierarchy: [...]}
174
+
175
+ For JavaScript:
176
+ {type: "javascript", jsdoc: "..."}
177
+
178
+ For JSON:
179
+ {type: "json", key_path: [...], value: "..."}
180
+
181
+ ┌──────────────────────────────────────┐
182
+ │ Wrap in LLM-Friendly Schema │
183
+ │ • Structured JSON │
184
+ │ • All metadata included │
185
+ │ • Scope hierarchy clear │
186
+ │ • Match positions highlighted │
187
+ └──────────────────────────────────────┘
188
+
189
+ ┌───────────────┬──────────┬──────────┬────────┐
190
+ │ │ │ │ │
191
+ ▼ ▼ ▼ ▼ ▼
192
+ JSON API CLI IDE Web UI LLM
193
+ (HTTP) (Terminal) Plugin Interface Integration
194
+
195
+
196
+ ┌─────────────────────────────────────────────────────────────────────┐
197
+ │ RESULT EXAMPLE (LLM-Ready JSON): │
198
+ ├─────────────────────────────────────────────────────────────────────┤
199
+ │ { │
200
+ │ "rank": 1, │
201
+ │ "relevance_score": 0.92, │
202
+ │ "file": {"path": "src/auth/handler.py", "type": "python"}, │
203
+ │ "location": { │
204
+ │ "start_line": 45, │
205
+ │ "end_line": 78, │
206
+ │ "scope_hierarchy": ["module", "class:AuthHandler"] │
207
+ │ }, │
208
+ │ "match": { │
209
+ │ "name": "handle", │
210
+ │ "signature": "def handle(self, request) -> Response:", │
211
+ │ "matched_terms": ["handle", "auth"], │
212
+ │ "match_positions": [{"line": 45, "text": "def handle"}, ...] │
213
+ │ }, │
214
+ │ "content": { │
215
+ │ "snippet": "[code with context and line numbers]", │
216
+ │ "context_lines": 15 │
217
+ │ }, │
218
+ │ "metadata": { │
219
+ │ "docstring": "Handle authentication...", │
220
+ │ "decorators": ["@validate_request"], │
221
+ │ "is_public": true │
222
+ │ } │
223
+ │ } │
224
+ └─────────────────────────────────────────────────────────────────────┘
225
+ ```
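
The inverted index in the diagram above maps each keyword to the files and line positions where it occurs. The following is a minimal sketch of that idea, not the package's implementation: the `Block` and `Posting` shapes and the names `buildInvertedIndex` and `lookup` are assumptions made for illustration, and tokenization is reduced to lowercase alphanumeric runs.

```typescript
// Hypothetical shapes for illustration; not taken from the package source.
interface Block {
  blockId: string;
  file: string;
  startLine: number; // line number of the block's first line in the file
  content: string;
}

interface Posting {
  blockId: string;
  file: string;
  lines: number[]; // absolute line numbers where the term occurs
}

type InvertedIndex = Map<string, Posting[]>;

function buildInvertedIndex(blocks: Block[]): InvertedIndex {
  const index: InvertedIndex = new Map();
  for (const block of blocks) {
    // Collect, per term, the lines of this block that contain it.
    const perTerm = new Map<string, number[]>();
    block.content.split("\n").forEach((text, offset) => {
      const terms: string[] = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
      terms
        .filter((term, i) => terms.indexOf(term) === i) // one entry per line
        .forEach((term) => {
          const lines = perTerm.get(term) ?? [];
          lines.push(block.startLine + offset);
          perTerm.set(term, lines);
        });
    });
    // Merge this block's postings into the global index.
    perTerm.forEach((lines, term) => {
      const postings = index.get(term) ?? [];
      postings.push({ blockId: block.blockId, file: block.file, lines });
      index.set(term, postings);
    });
  }
  return index;
}

function lookup(index: InvertedIndex, term: string): Posting[] {
  return index.get(term.toLowerCase()) ?? [];
}
```

A keyword search then reduces to one `lookup` per query term, with the resulting postings merged and deduplicated by `blockId` before ranking.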
226
+
227
+ ---
228
+
229
+ ## 2. File Type Decision Tree
230
+
231
+ ```
232
+ ┌─────────────────────────┐
233
+ │ Incoming File │
234
+ └────────────┬────────────┘
235
+
236
+ Check File Extension
237
+
238
+ ┌────────┼────────┬────────┬────────┐
239
+ │ │ │ │ │
240
+ *.py *.md *.js *.json Other
241
+ │ │ │ │ │
242
+ ▼ ▼ ▼ ▼ ▼
243
+
244
+ ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐ ┌─────────┐
245
+ │ Python │ │ Markdown │ │JavaScript│ │ JSON │ │ Generic │
246
+ │ Parser │ │ Parser │ │ Parser │ │ Parser │ │ Parser │
247
+ └────┬────┘ └────┬─────┘ └────┬─────┘ └────┬────┘ └────┬────┘
248
+ │ │ │ │ │
249
+ ▼ ▼ ▼ ▼ ▼
250
+
251
+ Parse with Parse with Parse with Parse with Simple
252
+ `ast` `remark` `babel` JSON API Regex
253
+ module unified parser matching
254
+
255
+ Extract: Extract: Extract: Extract: Extract:
256
+ ├─ Classes ├─ Headings ├─ Classes ├─ Keys └─ Words
257
+ ├─ Funcs ├─ Sections ├─ Funcs ├─ Values
258
+ ├─ Methods ├─ Paras ├─ Hooks └─ Nesting
259
+ ├─ Types └─ Code ├─ Exports
260
+ └─ Docstr blocks └─ JSDoc
261
+
262
+ Blocks: Blocks: Blocks: Blocks: Blocks:
263
+ scope: scope: scope: scope: scope:
264
+ module→ H1→H2→... module→ path→ (none)
265
+ class→ section class→ sub_key
266
+ method para method
267
+
268
+ ▼ ▼ ▼ ▼ ▼
269
+
270
+ ╔═════════════════════════════════════════════════════╗
271
+ ║ Normalized Block Objects (Common Format) ║
272
+ ╠═════════════════════════════════════════════════════╣
273
+ ║ • file_path ║
274
+ ║ • block_id ║
275
+ ║ • item_type ║
276
+ ║ • name ║
277
+ ║ • content ║
278
+ ║ • location (start_line, end_line) ║
279
+ ║ • scope_hierarchy ║
280
+ ║ • metadata (type-specific) ║
281
+ ║ • docstring ║
282
+ ╚═════════════════════════════════════════════════════╝
283
+
284
+ → To Indexing Layer
285
+ ```
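
The decision tree above amounts to a lookup from file extension to parser, with a generic fallback, followed by normalization into one common block shape. A minimal sketch under stated assumptions: `ParserKind`, `EXTENSION_TO_PARSER`, and `selectParser` are hypothetical names, while the `NormalizedBlock` fields mirror the list in the diagram.

```typescript
// Field names follow the "Normalized Block Objects" box in the diagram.
interface NormalizedBlock {
  file_path: string;
  block_id: string;
  item_type: string;
  name: string;
  content: string;
  location: { start_line: number; end_line: number };
  scope_hierarchy: string[];
  metadata: Record<string, unknown>; // type-specific
  docstring?: string;
}

// Hypothetical parser labels; only the extensions shown in the tree are mapped.
type ParserKind = "python" | "markdown" | "javascript" | "json" | "generic";

const EXTENSION_TO_PARSER: Record<string, ParserKind> = {
  ".py": "python",
  ".md": "markdown",
  ".js": "javascript",
  ".json": "json",
};

function selectParser(filePath: string): ParserKind {
  const fileName = filePath.split(/[\\/]/).pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return "generic"; // no extension, or a dotfile such as ".env"
  const ext = fileName.slice(dot).toLowerCase();
  return EXTENSION_TO_PARSER[ext] ?? "generic";
}
```

Keeping the mapping in a table rather than a chain of conditionals means supporting another extension is a one-line change.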
286
+
287
+ ---
288
+
289
+ ## 3. Query Processing Decision Tree
290
+
291
+ ```
292
+ ┌────────────────────────┐
293
+ │ Raw User Query │
294
+ └────────────┬───────────┘
295
+
296
+ ┌────────────────────────┐
297
+ │ Parse Query │
298
+ ├────────────────────────┤
299
+ │ Tokenize: split into │
300
+ │ individual terms │
301
+ └────────────┬───────────┘
302
+
303
+ ┌───────┴───────┐
304
+ │ │
305
+ Extract Filters Analyze Terms
306
+ │ │
307
+ type:*.py Remove
308
+ scope:auth.* stopwords
309
+ file:*.py │
310
+ │ └─→ Stemming
311
+ │ │
312
+ │ "authentication"
313
+ │ → "auth"
314
+ │ │
315
+ ├─────────────────┘
316
+
317
+
318
+ ┌──────────────────────────┐
319
+ │ Plan Search Strategy │
320
+ │ │
321
+ │ Single term? → Keywords │
322
+ │ Multi-term? → Multi-idx │
323
+ │ Regex? → Pattern search │
324
+ │ Semantic? → Embeddings │
325
+ └──────────┬───────────────┘
326
+
327
+ ┌───────────┴────────────┬──────────┬───────────┐
328
+ │ │ │ │
329
+ ▼ ▼ ▼ ▼
330
+
331
+ KEYWORD SEARCH SCOPE SEARCH SEMANTIC REGEX
332
+ (Inverted Index) (Hierarchy) SEARCH SEARCH
333
+
334
+ Search for each Filter by Find Match
335
+ term in inverted scope path similar pattern
336
+ index (scope:auth.*) embeddings
337
+
338
+ Find position Narrow down Find Get all
339
+ info in inverted results to similar matching
340
+ index certain parts code files
341
+ of structure
342
+
343
+ Match all │ │ │
344
+ results │ │ │
345
+ │ │ │ │
346
+ ├──────────────────┴───────────────┴───────────┴──→
347
+
348
+ ┌─────────────────────────────┐
349
+ │ Aggregate Results │
350
+ │ • Combine from all indexes │
351
+ │ • Deduplicate blocks │
352
+ │ • Prepare for ranking │
353
+ └──────────┬──────────────────┘
354
+
355
+
356
+ ┌─────────────────────────────┐
357
+ │ Extract Context │
358
+ │ • Load source files │
359
+ │ • Get snippet ±context │
360
+ │ • Preserve formatting │
361
+ └──────────┬──────────────────┘
362
+
363
+
364
+ ┌──────────────────────────────┐
365
+ │ Rank Results │
366
+ │ │
367
+ │ For each result compute: │
368
+ │ score = 0.35×kw_match + │
369
+ │ 0.25×position + │
370
+ │ 0.15×scope + │
371
+ │ 0.15×recency + │
372
+ │ 0.10×frequency │
373
+ └──────────┬───────────────────┘
374
+
375
+
376
+ ┌─────────────────────────────┐
377
+ │ Sort & Truncate │
378
+ │ • Sort by score (DESC) │
379
+ │ • Keep top N results │
380
+ │ • Return to formatter │
381
+ └──────────┬──────────────────┘
382
+
383
+
384
+ Formatted Results
385
+ ```
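
The first stages of the tree above (tokenize, extract filters, remove stopwords, plan a strategy) can be sketched as follows. This is an illustration only: the stopword list, the `/pattern/` convention for regex queries, and the names `parseQuery`, `STOPWORDS`, and `FILTER_KEYS` are assumptions, and stemming is omitted.

```typescript
type Strategy = "keyword" | "multi_index" | "regex";

interface ParsedQuery {
  terms: string[];
  filters: Record<string, string>;
  strategy: Strategy;
}

// Assumed stopword list, chosen so that the example query from the
// architecture diagram reduces to ["auth", "handler", "retry"].
const STOPWORDS = ["a", "an", "the", "in", "of", "for", "with", "find"];

// Filter prefixes shown in the decision tree.
const FILTER_KEYS = ["type", "scope", "file"];

function parseQuery(raw: string): ParsedQuery {
  const trimmed = raw.trim();
  // Assumption: a query wrapped in slashes is treated as a regex pattern.
  if (trimmed.length > 2 && trimmed.startsWith("/") && trimmed.endsWith("/")) {
    return { terms: [trimmed.slice(1, -1)], filters: {}, strategy: "regex" };
  }
  const terms: string[] = [];
  const filters: Record<string, string> = {};
  for (const token of trimmed.split(/\s+/)) {
    if (token === "") continue;
    const colon = token.indexOf(":");
    const key = colon > 0 ? token.slice(0, colon).toLowerCase() : "";
    if (FILTER_KEYS.indexOf(key) !== -1 && colon < token.length - 1) {
      filters[key] = token.slice(colon + 1); // e.g. scope:auth.*
      continue;
    }
    const term = token.toLowerCase();
    if (STOPWORDS.indexOf(term) === -1) terms.push(term);
  }
  return { terms, filters, strategy: terms.length > 1 ? "multi_index" : "keyword" };
}
```

For example, `parseQuery("retry scope:auth.*")` yields one term, one `scope` filter, and the single-term `keyword` strategy.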
386
+
387
+ ---
388
+
389
+ ## 4. Ranking Factor Contribution
390
+
391
+ ```
392
+ RELEVANCE SCORE BREAKDOWN
393
+ ═════════════════════════════════════════════════════════
394
+
395
+ Query: "handle auth retry" searching in codebase
396
+
397
+ For each matching block, compute:
398
+
399
+ FACTOR 1: KEYWORD MATCH (Weight: 0.35)
400
+ ───────────────────────────────────────
401
+
402
+ Block name: "handle_auth_retry"
403
+ Terms matched: ["handle", "auth", "retry"]
404
+ Match score: 3/3 terms = 1.0
405
+
406
+ Contribution: 1.0 × 0.35 = 0.35
407
+
408
+ [████████████████████████████████████] 35%
409
+
410
+
411
+ FACTOR 2: POSITION (Weight: 0.25)
412
+ ─────────────────────────────────
413
+
414
+ Match location:
415
+ - In function name: ✓ (score: 1.0)
416
+ - In docstring: ✓ (adds: 0.0)
417
+ - In code body: ✓ (adds: 0.0)
418
+
419
+ Position score: 1.0
420
+
421
+ Contribution: 1.0 × 0.25 = 0.25
422
+
423
+ [████████████████████████] 25%
424
+
425
+
426
+ FACTOR 3: SCOPE SPECIFICITY (Weight: 0.15)
427
+ ──────────────────────────────────────────
428
+
429
+ Scope: ["module", "class:AuthHandler", "method:handle"]
430
+ Depth: 3
431
+ Score: min(0.5 + (3 × 0.1), 1.0) = 0.8
432
+
433
+ Contribution: 0.8 × 0.15 = 0.12
434
+
435
+ [██████████████] 12%
436
+
437
+
438
+ FACTOR 4: RECENCY (Weight: 0.15)
439
+ ────────────────────────────────
440
+
441
+ Last modified: 2 hours ago
442
+ Score: 1.0 (within 24 hours)
443
+
444
+ Contribution: 1.0 × 0.15 = 0.15
445
+
446
+ [███████████████] 15%
447
+
448
+
449
+ FACTOR 5: FREQUENCY (Weight: 0.10)
450
+ ──────────────────────────────────
451
+
452
+ Term occurrences:
453
+ - "handle": 2 times
454
+ - "auth": 3 times
455
+ - "retry": 1 time
456
+ Total: 6 times in 50 lines
457
+ Normalized: min(6 / 50, 1.0) = 0.12
458
+
459
+ Contribution: 0.12 × 0.10 = 0.012
460
+
461
+ [█] ~1%
462
+
463
+
464
+ TOTAL RELEVANCE SCORE:
465
+ ═════════════════════════════════════════════════════════
466
+
467
+ 0.35 + 0.25 + 0.12 + 0.15 + 0.012 = 0.882
468
+
469
+ Final Score: 0.88 / 1.0
470
+ ┌──────────────────────────────────────────┐
471
+ │█████████████████████████████████████░░░░░│
472
+ └──────────────────────────────────────────┘
473
+ 88% relevance
474
+
475
+ Result Rank: Top results sorted by this score
476
+ ```
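
The breakdown above is a weighted sum of five factor scores, each in the range 0 to 1. A minimal sketch of the arithmetic, using the weights and the scope and frequency formulas from the breakdown; the names `RankingFactors`, `WEIGHTS`, and `relevanceScore` are hypothetical, and the keyword, position, and recency scores are taken as inputs rather than computed.

```typescript
interface RankingFactors {
  keywordMatch: number; // matched terms / query terms, 0..1
  position: number;     // 1.0 when the match is in the block name
  scopeDepth: number;   // length of scope_hierarchy
  recency: number;      // 1.0 when modified within 24 hours
  occurrences: number;  // total term occurrences in the block
  lineCount: number;    // number of lines in the block
}

// Weights from the breakdown; they sum to 1.0.
const WEIGHTS = {
  keywordMatch: 0.35,
  position: 0.25,
  scope: 0.15,
  recency: 0.15,
  frequency: 0.1,
};

function scopeSpecificity(depth: number): number {
  return Math.min(0.5 + depth * 0.1, 1.0);
}

function frequencyScore(occurrences: number, lineCount: number): number {
  return lineCount > 0 ? Math.min(occurrences / lineCount, 1.0) : 0;
}

function relevanceScore(f: RankingFactors): number {
  return (
    WEIGHTS.keywordMatch * f.keywordMatch +
    WEIGHTS.position * f.position +
    WEIGHTS.scope * scopeSpecificity(f.scopeDepth) +
    WEIGHTS.recency * f.recency +
    WEIGHTS.frequency * frequencyScore(f.occurrences, f.lineCount)
  );
}

// Factor values from the "handle auth retry" example.
const exampleScore = relevanceScore({
  keywordMatch: 1.0,
  position: 1.0,
  scopeDepth: 3,
  recency: 1.0,
  occurrences: 6,
  lineCount: 50,
});
console.log(exampleScore.toFixed(3));
```

With these inputs the five contributions are 0.35, 0.25, 0.12, 0.15, and 0.012, so the result is approximately 0.882 up to floating-point rounding.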
477
+
478
+ ---
479
+
480
+ ## 5. Block Extraction Flowchart
481
+
482
+ ```
483
+ INPUT: Raw source code file
484
+
485
+ ├─→ PYTHON FILE (*.py)
486
+ │ │
487
+ │ ├─ Parse with ast.parse()
488
+ │ │ │
489
+ │ │ ├─→ ModuleVisitor.visit()
490
+ │ │ │ │
491
+ │ │ │ ├─→ Module docstring
492
+ │ │ │ │
493
+ │ │ │ ├─→ Top-level functions
494
+ │ │ │ │ └─→ For each:
495
+ │ │ │ │ ├─ Extract params
496
+ │ │ │ │ ├─ Get docstring
497
+ │ │ │ │ ├─ Get decorators
498
+ │ │ │ │ └─ Line numbers
499
+ │ │ │ │
500
+ │ │ │ └─→ Class definitions
501
+ │ │ │ └─→ For each class:
502
+ │ │ │ ├─ Get docstring
503
+ │ │ │ ├─ Get decorators
504
+ │ │ │ └─→ Class methods
505
+ │ │ │ └─→ For each method:
506
+ │ │ │ ├─ Extract params
507
+ │ │ │ ├─ Get return type
508
+ │ │ │ ├─ Get docstring
509
+ │ │ │ └─ Line numbers
510
+ │ │ │
511
+ │ │ └─→ Normalize all blocks
512
+ │ │ ├─ Compute scope_hierarchy
513
+ │ │ ├─ Extract content
514
+ │ │ ├─ Get line ranges
515
+ │ │ └─ Create block_ids
516
+ │ │
517
+ │ └─→ Output: List[Block]
518
+
519
+ ├─→ MARKDOWN FILE (*.md)
520
+ │ │
521
+ │ ├─ Parse with remark/unified
522
+ │ │ │
523
+ │ │ ├─→ Heading traversal
524
+ │ │ │ ├─ H1 → Top-level section
525
+ │ │ │ ├─ H2 → Subsection
526
+ │ │ │ ├─ H3 → Sub-subsection
527
+ │ │ │ └─ Track hierarchy
528
+ │ │ │
529
+ │ │ ├─→ Content between headings
530
+ │ │ │ ├─ Paragraphs
531
+ │ │ │ ├─ Code blocks (with language)
532
+ │ │ │ ├─ Lists
533
+ │ │ │ └─ Tables
534
+ │ │ │
535
+ │ │ ├─→ Frontmatter (YAML)
536
+ │ │ │ └─ Extract metadata
537
+ │ │ │
538
+ │ │ └─→ Normalize sections
539
+ │ │ ├─ Group content by heading
540
+ │ │ ├─ Create hierarchy path
541
+ │ │ ├─ Line number tracking
542
+ │ │ └─ Create block_ids
543
+ │ │
544
+ │ └─→ Output: List[Block]
545
+
546
+ ├─→ JAVASCRIPT FILE (*.js/*.jsx)
547
+ │ │
548
+ │ ├─ Parse with @babel/parser
549
+ │ │ │
550
+ │ │ ├─→ Extract top-level:
551
+ │ │ │ ├─ Import statements
552
+ │ │ │ ├─ Export statements
553
+ │ │ │ └─ Variable declarations
554
+ │ │ │
555
+ │ │ ├─→ Function declarations
556
+ │ │ │ └─→ For each:
557
+ │ │ │ ├─ Get parameters
558
+ │ │ │ ├─ Get JSDoc
559
+ │ │ │ └─ Line range
560
+ │ │ │
561
+ │ │ ├─→ Arrow functions (const useHook = ...)
562
+ │ │ │ └─→ For each:
563
+ │ │ │ ├─ Identify hook pattern
564
+ │ │ │ ├─ Get JSDoc
565
+ │ │ │ └─ Line range
566
+ │ │ │
567
+ │ │ ├─→ Class definitions
568
+ │ │ │ └─→ For each:
569
+ │ │ │ ├─ Methods
570
+ │ │ │ ├─ Properties
571
+ │ │ │ └─ Constructor
572
+ │ │ │
573
+ │ │ ├─→ React Components (JSX)
574
+ │ │ │ └─→ For each:
575
+ │ │ │ ├─ Props
576
+ │ │ │ ├─ Return JSX
577
+ │ │ │ └─ JSDoc
578
+ │ │ │
579
+ │ │ └─→ Normalize all blocks
580
+ │ │ ├─ Compute scope
581
+ │ │ ├─ Extract signature
582
+ │ │ └─ Line ranges
583
+ │ │
584
+ │ └─→ Output: List[Block]
585
+
586
+ ├─→ JSON FILE (*.json)
587
+ │ │
588
+ │ ├─ Parse as JSON
589
+ │ │ │
590
+ │ │ ├─→ Flatten key hierarchy
591
+ │ │ │ └─ database.connection.pool_size
592
+ │ │ │
593
+ │ │ ├─→ For each key-value:
594
+ │ │ │ ├─ Key path
595
+ │ │ │ ├─ Value
596
+ │ │ │ ├─ Type (string|number|bool|object|array)
597
+ │ │ │ └─ Nesting depth
598
+ │ │ │
599
+ │ │ └─→ Normalize blocks
600
+ │ │ ├─ Create scope from path
601
+ │ │ └─ Line number tracking
602
+ │ │
603
+ │ └─→ Output: List[Block]
604
+
605
+ └─→ GENERIC TEXT FILE
606
+
607
+ ├─ Fallback parsing (last resort)
608
+ │ │
609
+ │ ├─→ Line-by-line analysis
610
+ │ ├─→ Regex pattern matching
611
+ │ └─→ Basic structure detection
612
+
613
+ └─→ Output: List[Block] (minimal metadata)
614
+ ```
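The Python branch of the pipeline above can be sketched with the standard-library `ast` module. This is a minimal illustration, not the package's implementation: the `Block` fields, the `block_id` format, and the `extract_blocks` helper are all hypothetical names chosen to mirror the labels in the diagram.

```python
import ast
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    # Hypothetical record; field names are assumptions, not the package's schema.
    block_id: str
    kind: str
    scope_hierarchy: List[str]
    params: List[str]
    docstring: Optional[str]
    decorators: List[str]
    start_line: int
    end_line: int


def _make_block(node, kind: str, scope: List[str], path: str) -> Block:
    # Classes have no parameter list; functions and methods do.
    params = [a.arg for a in node.args.args] if kind != "class" else []
    return Block(
        block_id=f"{path}:{'.'.join(scope + [node.name])}",
        kind=kind,
        scope_hierarchy=scope + [node.name],
        params=params,
        docstring=ast.get_docstring(node),
        decorators=[ast.unparse(d) for d in node.decorator_list],  # Python 3.9+
        start_line=node.lineno,
        end_line=node.end_lineno,
    )


def extract_blocks(source: str, path: str = "<memory>") -> List[Block]:
    """Walk top-level functions, classes, and class methods, as in the diagram."""
    tree = ast.parse(source)
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    blocks: List[Block] = []
    for node in tree.body:
        if isinstance(node, func_types):
            blocks.append(_make_block(node, "function", [], path))
        elif isinstance(node, ast.ClassDef):
            blocks.append(_make_block(node, "class", [], path))
            for item in node.body:
                if isinstance(item, func_types):
                    blocks.append(_make_block(item, "method", [node.name], path))
    return blocks
```

Nested functions and nested classes are deliberately ignored here; a fuller visitor would recurse and extend `scope_hierarchy` at each level.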
615
+
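The JSON branch ("Flatten key hierarchy") can be illustrated in a similar way. The sketch below assumes a dotted key path such as `database.connection.pool_size` and emits one entry per key; the entry field names and the decision to treat arrays as leaf values are assumptions, and line-number tracking is omitted because `json.loads` does not expose source positions.

```python
from typing import Any, Dict, Iterator


def json_type(value: Any) -> str:
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return "null"


def flatten(obj: Dict[str, Any], prefix: str = "", depth: int = 1) -> Iterator[Dict[str, Any]]:
    """Yield one entry per key, recursing into nested objects."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        yield {"key_path": path, "value": value, "type": json_type(value), "depth": depth}
        if isinstance(value, dict):
            yield from flatten(value, path, depth + 1)
```

Emitting container keys as well as leaves keeps every level of the hierarchy searchable, at the cost of some redundancy in the stored values.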
616
+ ---
617
+
618
+ ## 6. Context Extraction Window
619
+
620
+ ```
621
+ SOURCE FILE (Python)
622
+ ═════════════════════════════════════════════════════
623
+
624
+ 40: class AuthHandler:
625
+ 41: """Handles authentication"""
626
+ 42:
627
+ 43: def __init__(self, config):
628
+ 44: self.config = config
629
+ 45:
630
+ 46: def handle(self, request): ← BLOCK START (line 46)
631
+ 47: """Handle incoming request"""
632
+ 48:
633
+ 49: # Validate token
634
+ 50: token = request.headers.get('Authorization')
635
+ 51: if not token:
636
+ 52: return Response(status=401)
637
+ 53:
638
+ 54: # Process authentication
639
+ 55: result = self.validate_token(token)
640
+ 56: if not result:
641
+ 57: return Response(status=403)
642
+ 58:
643
+ 59: return Response(status=200) ← BLOCK END (line 59)
644
+ 60:
645
+ 61: def validate_token(self, token):
646
+ 62: """Validate token"""
647
+
648
+
649
+ CONTEXT EXTRACTION CONFIG
650
+ ═════════════════════════════════════════════════════
651
+
652
+ {
653
+ "file_type": "python",
654
+ "context_lines_before": 5,
655
+ "context_lines_after": 5,
656
+ "include_docstring": true,
657
+ "include_parent_class": true,
658
+ "max_context_lines": 30
659
+ }
660
+
661
+
662
+ EXTRACTED CONTEXT WINDOW
663
+ ═════════════════════════════════════════════════════
664
+
665
+ BEFORE:
666
+ 41: │ """Handles authentication"""
667
+ 42: │
668
+ 43: │ def __init__(self, config):
669
+ 44: │ self.config = config
670
+ 45: │
671
+
672
+ BLOCK CONTENT:
673
+ 46: │ def handle(self, request):
674
+ 47: │ """Handle incoming request"""
675
+ 48: │
676
+ 49: │ # Validate token
677
+ 50: │ token = request.headers.get('Authorization')
678
+ 51: │ if not token:
679
+ 52: │ return Response(status=401)
680
+ 53: │
681
+ 54: │ # Process authentication
682
+ 55: │ result = self.validate_token(token)
683
+ 56: │ if not result:
684
+ 57: │ return Response(status=403)
685
+ 58: │
686
+ 59: │ return Response(status=200)
687
+
688
+ AFTER:
689
+ 60: │
690
+ 61: │ def validate_token(self, token):
691
+ 62: │ """Validate token"""
692
+
693
+
694
+ FORMATTED OUTPUT FOR LLM
695
+ ═════════════════════════════════════════════════════
696
+
697
+ Scope: class AuthHandler → method handle
698
+ Lines: 46-59 (in file src/auth/handler.py)
699
+
700
+ Code:
701
+ ```python
702
+ 41 │ """Handles authentication"""
703
+ 42 │
704
+ 43 │ def __init__(self, config):
705
+ 44 │ self.config = config
706
+ 45 │
707
+ 46 │ def handle(self, request):
708
+ 47 │ """Handle incoming request"""
709
+ 48 │
710
+ 49 │ # Validate token
711
+ 50 │ token = request.headers.get('Authorization')
712
+ 51 │ if not token:
713
+ 52 │ return Response(status=401)
714
+ 53 │
715
+ 54 │ # Process authentication
716
+ 55 │ result = self.validate_token(token)
717
+ 56 │ if not result:
718
+ 57 │ return Response(status=403)
719
+ 58 │
720
+ 59 │ return Response(status=200)
721
+ 60 │
722
+ 61 │ def validate_token(self, token):
723
+ 62 │ """Validate token"""
724
+ ```
725
+
726
+ Docstring: "Handle incoming request"
727
+ Method signature: def handle(self, request) -> ?
728
+ Related method: validate_token() called on line 55
729
+ ```
730
+
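The window arithmetic in the example above (5 lines before, 5 after, capped at 30) can be sketched as follows. The function name and return shape are hypothetical, and the trimming order when the cap is exceeded (trailing context first, then leading context, never the block itself) is an assumption, since the guide does not say how `max_context_lines` is enforced.

```python
from typing import Dict, List, Tuple

NumberedLine = Tuple[int, str]


def extract_context(
    lines: List[str],
    block_start: int,
    block_end: int,
    before: int = 5,
    after: int = 5,
    max_context_lines: int = 30,
) -> Dict[str, List[NumberedLine]]:
    """Return numbered before/block/after lines; line numbers are 1-based, inclusive."""
    first = max(1, block_start - before)
    last = min(len(lines), block_end + after)

    # Assumed policy: if the window exceeds the cap, drop trailing context
    # first, then leading context. The block itself is never trimmed.
    excess = (last - first + 1) - max_context_lines
    if excess > 0:
        trim_after = min(excess, last - block_end)
        last -= trim_after
        excess -= trim_after
        first += min(excess, block_start - first)

    def numbered(start: int, end: int) -> List[NumberedLine]:
        return [(n, lines[n - 1]) for n in range(start, end + 1)]

    return {
        "before": numbered(first, block_start - 1),
        "block": numbered(block_start, block_end),
        "after": numbered(block_end + 1, last),
    }
```

For a 62-line file and a block at lines 46-59, this yields lines 41-45 before and 60-62 after, matching the window shown above (the file ends before five trailing lines are available).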
731
+ ---
732
+
733
+ ## 7. Performance Optimization Layers
734
+
735
+ ```
736
+ ┌────────────────────────────────────────────────────────┐
737
+ │ OPTIMIZATION LAYER 1: INPUT NORMALIZATION │
738
+ │ (Query Pre-processing) │
739
+ └────────────────────────────────────────────────────────┘
740
+
741
+ Query: "finding authentication token handler retry logic"
742
+
743
+ Tokenize:
744
+ ["finding", "authentication", "token", "handler", "retry", "logic"]
745
+
746
+ Remove stopwords:
747
+ ["authentication", "token", "handler", "retry", "logic"]
748
+ (removed: "finding")
749
+
750
+ Stem/Lemmatize:
751
+ ["auth", "token", "handle", "retry", "logic"]
752
+
753
+ Optimized query: 5 terms instead of 6
754
+ Query terms reduced by ~17% (1 of 6 removed)
755
+
756
+ Search cost reduction: 20-30%
757
+
758
+
759
+ ┌────────────────────────────────────────────────────────┐
760
+ │ OPTIMIZATION LAYER 2: INDEX STRUCTURE │
761
+ │ (Data Structure Optimization) │
762
+ └────────────────────────────────────────────────────────┘
763
+
764
+ Naive inverted index:
765
+ {
766
+ "auth": [
767
+ {"file": "f1", "line": 45, "block": "b1"},
768
+ {"file": "f1", "line": 50, "block": "b1"},
769
+ {"file": "f2", "line": 100, "block": "b5"},
770
+ ... (100 entries per keyword)
771
+ ]
772
+ }
773
+ Size: ~1 MB per keyword
774
+
775
+ Optimized inverted index:
776
+ {
777
+ "auth": {
778
+ "files": [compressed bitmap of file IDs],
779
+ "positions": [varint-encoded line numbers]
780
+ }
781
+ }
782
+ Size: ~200 KB per keyword
783
+
784
+ Size reduction: 80%
785
+ Lookup time: O(1) → O(1) ✓
786
+
787
+
788
+ ┌────────────────────────────────────────────────────────┐
789
+ │ OPTIMIZATION LAYER 3: QUERY EXECUTION STRATEGY │
790
+ │ (Smart Search Strategy) │
791
+ └────────────────────────────────────────────────────────┘
792
+
793
+ Multi-term query: ["auth", "handler", "retry"]
794
+
795
+ Strategy A (Naive): Search all terms, combine results
796
+ Time: O(log n × 3 terms) = O(3 log n)
797
+ Results: 1000 candidates
798
+
799
+ Strategy B (Optimized): Search smallest first, prune
800
+ 1. Count postings for each term:
801
+ - "retry": 45 results (rarest)
802
+ - "auth": 200 results
803
+ - "handler": 500 results
804
+
805
+ 2. Search "retry" first (smallest):
806
+ 45 candidates
807
+
808
+ 3. Intersect with "auth":
809
+ 45 × 200 positions checked
810
+ ~35 survivors
811
+
812
+ 4. Intersect with "handler":
813
+ ~25 final results
814
+
815
+ Time: O(log n + intersection cost)
816
+ Results: 25 candidates
817
+
818
+ Reduction: 40x fewer candidates (1000 → 25)
819
+
820
+
821
+ ┌────────────────────────────────────────────────────────┐
822
+ │ OPTIMIZATION LAYER 4: CACHING │
823
+ │ (Multi-Level Cache) │
824
+ └────────────────────────────────────────────────────────┘
825
+
826
+ Query Layers:
827
+ ┌─────────────────────────────────┐
828
+ │ Level 1: Query Result Cache │ Fast
829
+ │ (LRU, TTL: 1 hour) │ Hit rate: 30-40%
830
+ │ Size: 50 MB │
831
+ └─────────────────────────────────┘
832
+
833
+ Miss │ (70% of queries)
834
+
835
+ ┌─────────────────────────────────┐
836
+ │ Level 2: Block Content Cache │ Medium
837
+ │ (Recently accessed blocks) │ Hit rate: 60-70%
838
+ │ Size: 100 MB │
839
+ └─────────────────────────────────┘
840
+
841
+ Miss │ (30% of queries)
842
+
843
+ ┌─────────────────────────────────┐
844
+ │ Level 3: Disk (SQLite/JSON) │ Slow
845
+ │ Size: ~75 MB │ Read time: 50-200 ms
846
+ └─────────────────────────────────┘
847
+
848
+ Overall performance:
849
+ - Cache hit (L1): ~5 ms (~30x faster than a ~150 ms disk read)
850
+ - Cache hit (L2): ~20 ms (~7.5x faster than a ~150 ms disk read)
851
+ - Cache miss: ~150 ms (disk read)
852
+ - Average (with cache): ~30 ms
853
+
854
+ Cache speedup factor: 5-6x
855
+
856
+
857
+ ┌────────────────────────────────────────────────────────┐
858
+ │ OPTIMIZATION LAYER 5: INCREMENTAL INDEXING │
859
+ │ (Change-Based Updates) │
860
+ └────────────────────────────────────────────────────────┘
861
+
862
+ Full Reindex (without incremental):
863
+ - 1000 files × 100 blocks/file = 100k blocks
864
+ - Time: 10-15 minutes
865
+ - Cost: 100% reprocessing
866
+
867
+ Incremental Indexing:
868
+ File changed: handler.py (2 KB file)
869
+
870
+ 1. Compute hash: abc123def456...
871
+ 2. Compare with stored: xyz789...
872
+ 3. Hash mismatch: FILE CHANGED
873
+
874
+ 4. Load old blocks for handler.py (from registry)
875
+ - Remove 12 old blocks from indexes
876
+ - Time: 10 ms
877
+
878
+ 5. Parse new version: 50 ms
879
+ 6. Extract new blocks: 10 ms
880
+ 7. Add to indexes: 20 ms
881
+ 8. Update metadata: 5 ms
882
+
883
+ Total time: ~95 ms
884
+ Speedup: roughly 6000-9000x faster than a full reindex (10-15 minutes vs. ~95 ms)
885
+
886
+
887
+ ┌────────────────────────────────────────────────────────┐
888
+ │ OPTIMIZATION LAYER 6: PARALLEL PROCESSING │
889
+ │ (Multi-threaded Indexing) │
890
+ └────────────────────────────────────────────────────────┘
891
+
892
+ Sequential Processing:
893
+ File1 → Parse (100ms) → Extract (20ms) → Index (10ms)
894
+ File2 → Parse (100ms) → Extract (20ms) → Index (10ms)
895
+ File3 → Parse (100ms) → Extract (20ms) → Index (10ms)
896
+ Total: 390 ms (3 files × 130 ms each)
897
+
898
+ Parallel Processing (4 workers):
899
+ Worker1: File1 Parse (100ms)
900
+ Worker2: File2 Parse (100ms)
901
+ Worker3: File3 Parse (100ms)
902
+ Worker4: File4 Parse (100ms) ← All parallel
903
+ ───────────────────────────────────
904
+ 100 ms total (parse phase)
905
+
906
+ Worker1: File1 Extract (20ms)
907
+ Worker2: File2 Extract (20ms)
908
+ Worker3: File3 Extract (20ms)
909
+ Worker4: File4 Extract (20ms)
910
+ ───────────────────────────────────
911
+ 20 ms total (extract phase)
912
+
913
+ Worker1-4: Batch index (40ms)
914
+ ───────────────────────────────────
915
+ 40 ms total (index phase)
916
+
917
+ Parallel total: ~160 ms
918
+ Speedup: 2.4x vs. the 3-file sequential total (about 3.25x vs. 520 ms for the 4 files shown; limited by I/O and batch overhead)
919
+ ```
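Strategy B from Layer 3 (search the rarest term first, then prune) can be sketched with plain Python sets. The index shape used here, a mapping from term to a set of block IDs, is an assumption for illustration; the compressed bitmaps described in Layer 2 would follow the same ordering logic.

```python
from typing import Dict, List, Set


def search_all_terms(index: Dict[str, Set[str]], terms: List[str]) -> Set[str]:
    """Return the block IDs that contain every term, intersecting rarest-first."""
    if not terms:
        return set()
    # A term missing from the index has an empty posting set, so no block can match.
    postings = [index.get(term, set()) for term in terms]
    # Start from the smallest posting list so the candidate set can only shrink.
    postings.sort(key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
        if not result:
            break  # nothing left to prune, skip the remaining terms
    return result
```

Sorting by posting size bounds the working set by the rarest term (45 candidates for "retry" in the example above) rather than by the most common one.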
920
+
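The incremental update steps in Layer 5 (hash, compare, remove old blocks, re-add new ones) can be sketched as below. This is a simplified in-memory model: the class name, the `parse` callback signature, and the choice of SHA-256 are assumptions, as the guide does not name the hash function or the registry format.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical parser signature: (path, content) -> [(block_id, terms), ...]
Parser = Callable[[str, str], Iterable[Tuple[str, List[str]]]]


class IncrementalIndexer:
    def __init__(self, parse: Parser) -> None:
        self.parse = parse
        self.hashes: Dict[str, str] = {}                    # path -> content hash
        self.registry: Dict[str, Dict[str, Set[str]]] = {}  # path -> block_id -> terms
        self.index: Dict[str, Set[str]] = {}                # term -> block IDs

    def update(self, path: str, content: str) -> bool:
        """Reindex one file if its hash changed; return True when work was done."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self.hashes.get(path) == digest:
            return False  # unchanged file, nothing to reprocess

        # Remove the file's old blocks from the inverted index.
        for block_id, terms in self.registry.get(path, {}).items():
            for term in terms:
                self.index[term].discard(block_id)

        # Parse the new version and add its blocks.
        new_blocks = {bid: set(terms) for bid, terms in self.parse(path, content)}
        for block_id, terms in new_blocks.items():
            for term in terms:
                self.index.setdefault(term, set()).add(block_id)

        self.registry[path] = new_blocks
        self.hashes[path] = digest
        return True
```

Keeping the per-block term sets in the registry is what makes removal cheap: only the postings that the old blocks actually touched are visited, not the whole index.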
921
+ This visual guide provides a comprehensive reference for understanding the system architecture and decision-making processes.
922
+