@ngao/search 0.1.0 → 0.1.2
- package/.claude/settings.local.json +10 -0
- package/.env.example +7 -0
- package/.eslintrc.json +20 -0
- package/.github/workflows/build.yml +39 -0
- package/.github/workflows/release.yml +34 -0
- package/.github/workflows/test.yml +35 -0
- package/.mcp-config.json +14 -0
- package/.prettierrc.json +10 -0
- package/LICENSE +17 -0
- package/Makefile +26 -0
- package/README.md +57 -172
- package/config.example.json +8 -0
- package/dist/backend/api/search-engine.d.ts +40 -0
- package/dist/backend/api/search-engine.d.ts.map +1 -0
- package/dist/backend/api/search-engine.js +227 -0
- package/dist/backend/api/search-engine.js.map +1 -0
- package/dist/backend/core/block-impl.d.ts +32 -0
- package/dist/backend/core/block-impl.d.ts.map +1 -0
- package/dist/backend/core/block-impl.js +33 -0
- package/dist/backend/core/block-impl.js.map +1 -0
- package/dist/backend/core/config-loader.d.ts +68 -0
- package/dist/backend/core/config-loader.d.ts.map +1 -0
- package/dist/backend/core/config-loader.js +234 -0
- package/dist/backend/core/config-loader.js.map +1 -0
- package/dist/backend/core/constants.d.ts +39 -0
- package/dist/backend/core/constants.d.ts.map +1 -0
- package/dist/backend/core/constants.js +57 -0
- package/dist/backend/core/constants.js.map +1 -0
- package/dist/backend/core/enums.d.ts +54 -0
- package/dist/backend/core/enums.d.ts.map +1 -0
- package/dist/backend/core/enums.js +61 -0
- package/dist/backend/core/enums.js.map +1 -0
- package/dist/backend/core/errors.d.ts +83 -0
- package/dist/backend/core/errors.d.ts.map +1 -0
- package/dist/backend/core/errors.js +151 -0
- package/dist/backend/core/errors.js.map +1 -0
- package/dist/backend/core/logger.d.ts +68 -0
- package/dist/backend/core/logger.d.ts.map +1 -0
- package/dist/backend/core/logger.js +151 -0
- package/dist/backend/core/logger.js.map +1 -0
- package/dist/backend/core/models.d.ts +332 -0
- package/dist/backend/core/models.d.ts.map +1 -0
- package/dist/backend/core/models.js +6 -0
- package/dist/backend/core/models.js.map +1 -0
- package/dist/backend/core/service-types.d.ts +184 -0
- package/dist/backend/core/service-types.d.ts.map +1 -0
- package/dist/backend/core/service-types.js +7 -0
- package/dist/backend/core/service-types.js.map +1 -0
- package/dist/backend/core/types.d.ts +219 -0
- package/dist/backend/core/types.d.ts.map +1 -0
- package/dist/backend/core/types.js +109 -0
- package/dist/backend/core/types.js.map +1 -0
- package/dist/backend/index.d.ts +5 -0
- package/dist/backend/index.d.ts.map +1 -0
- package/dist/backend/index.js +13 -0
- package/dist/backend/index.js.map +1 -0
- package/dist/backend/indexing/block-extractor.d.ts +22 -0
- package/dist/backend/indexing/block-extractor.d.ts.map +1 -0
- package/dist/backend/indexing/block-extractor.js +52 -0
- package/dist/backend/indexing/block-extractor.js.map +1 -0
- package/dist/backend/indexing/index-builder.d.ts +26 -0
- package/dist/backend/indexing/index-builder.d.ts.map +1 -0
- package/dist/backend/indexing/index-builder.js +71 -0
- package/dist/backend/indexing/index-builder.js.map +1 -0
- package/dist/backend/parsers/base-file-parser.d.ts +134 -0
- package/dist/backend/parsers/base-file-parser.d.ts.map +1 -0
- package/dist/backend/parsers/base-file-parser.js +149 -0
- package/dist/backend/parsers/base-file-parser.js.map +1 -0
- package/dist/backend/parsers/javascript-parser.d.ts +36 -0
- package/dist/backend/parsers/javascript-parser.d.ts.map +1 -0
- package/dist/backend/parsers/javascript-parser.js +194 -0
- package/dist/backend/parsers/javascript-parser.js.map +1 -0
- package/dist/backend/parsers/json-parser.d.ts +15 -0
- package/dist/backend/parsers/json-parser.d.ts.map +1 -0
- package/dist/backend/parsers/json-parser.js +75 -0
- package/dist/backend/parsers/json-parser.js.map +1 -0
- package/dist/backend/parsers/markdown-parser.d.ts +17 -0
- package/dist/backend/parsers/markdown-parser.d.ts.map +1 -0
- package/dist/backend/parsers/markdown-parser.js +94 -0
- package/dist/backend/parsers/markdown-parser.js.map +1 -0
- package/dist/backend/parsers/parser-factory.d.ts +43 -0
- package/dist/backend/parsers/parser-factory.d.ts.map +1 -0
- package/dist/backend/parsers/parser-factory.js +149 -0
- package/dist/backend/parsers/parser-factory.js.map +1 -0
- package/dist/backend/parsers/python-parser.d.ts +21 -0
- package/dist/backend/parsers/python-parser.d.ts.map +1 -0
- package/dist/backend/parsers/python-parser.js +185 -0
- package/dist/backend/parsers/python-parser.js.map +1 -0
- package/dist/backend/parsers/yaml-parser.d.ts +16 -0
- package/dist/backend/parsers/yaml-parser.d.ts.map +1 -0
- package/dist/backend/parsers/yaml-parser.js +81 -0
- package/dist/backend/parsers/yaml-parser.js.map +1 -0
- package/dist/backend/repositories/implementations/lancedb-block-repository.d.ts +125 -0
- package/dist/backend/repositories/implementations/lancedb-block-repository.d.ts.map +1 -0
- package/dist/backend/repositories/implementations/lancedb-block-repository.js +505 -0
- package/dist/backend/repositories/implementations/lancedb-block-repository.js.map +1 -0
- package/dist/backend/repositories/implementations/lancedb-metadata-repository.d.ts +107 -0
- package/dist/backend/repositories/implementations/lancedb-metadata-repository.d.ts.map +1 -0
- package/dist/backend/repositories/implementations/lancedb-metadata-repository.js +275 -0
- package/dist/backend/repositories/implementations/lancedb-metadata-repository.js.map +1 -0
- package/dist/backend/repositories/implementations/memory-cache.d.ts +18 -0
- package/dist/backend/repositories/implementations/memory-cache.d.ts.map +1 -0
- package/dist/backend/repositories/implementations/memory-cache.js +53 -0
- package/dist/backend/repositories/implementations/memory-cache.js.map +1 -0
- package/dist/backend/repositories/repository.interface.d.ts +334 -0
- package/dist/backend/repositories/repository.interface.d.ts.map +1 -0
- package/dist/backend/repositories/repository.interface.js +7 -0
- package/dist/backend/repositories/repository.interface.js.map +1 -0
- package/dist/backend/search/context-extractor.d.ts +29 -0
- package/dist/backend/search/context-extractor.d.ts.map +1 -0
- package/dist/backend/search/context-extractor.js +106 -0
- package/dist/backend/search/context-extractor.js.map +1 -0
- package/dist/backend/search/multi-index-searcher.d.ts +28 -0
- package/dist/backend/search/multi-index-searcher.d.ts.map +1 -0
- package/dist/backend/search/multi-index-searcher.js +81 -0
- package/dist/backend/search/multi-index-searcher.js.map +1 -0
- package/dist/backend/search/query-parser.d.ts +37 -0
- package/dist/backend/search/query-parser.d.ts.map +1 -0
- package/dist/backend/search/query-parser.js +145 -0
- package/dist/backend/search/query-parser.js.map +1 -0
- package/dist/backend/search/ranking-engine.d.ts +31 -0
- package/dist/backend/search/ranking-engine.d.ts.map +1 -0
- package/dist/backend/search/ranking-engine.js +165 -0
- package/dist/backend/search/ranking-engine.js.map +1 -0
- package/dist/backend/search/result-formatter.d.ts +29 -0
- package/dist/backend/search/result-formatter.d.ts.map +1 -0
- package/dist/backend/search/result-formatter.js +70 -0
- package/dist/backend/search/result-formatter.js.map +1 -0
- package/dist/backend/service-types.d.ts +184 -0
- package/dist/backend/service-types.d.ts.map +1 -0
- package/dist/backend/service-types.js +7 -0
- package/dist/backend/service-types.js.map +1 -0
- package/dist/backend/services/embedding-service.d.ts +75 -0
- package/dist/backend/services/embedding-service.d.ts.map +1 -0
- package/dist/backend/services/embedding-service.js +298 -0
- package/dist/backend/services/embedding-service.js.map +1 -0
- package/dist/backend/services/file-watcher.d.ts +17 -0
- package/dist/backend/services/file-watcher.d.ts.map +1 -0
- package/dist/backend/services/file-watcher.js +92 -0
- package/dist/backend/services/file-watcher.js.map +1 -0
- package/dist/backend/services/index-information-service.d.ts +114 -0
- package/dist/backend/services/index-information-service.d.ts.map +1 -0
- package/dist/backend/services/index-information-service.js +104 -0
- package/dist/backend/services/index-information-service.js.map +1 -0
- package/dist/backend/services/ngao-search-service.d.ts +107 -0
- package/dist/backend/services/ngao-search-service.d.ts.map +1 -0
- package/dist/backend/services/ngao-search-service.js +384 -0
- package/dist/backend/services/ngao-search-service.js.map +1 -0
- package/dist/backend/services/quantization-service.d.ts +53 -0
- package/dist/backend/services/quantization-service.d.ts.map +1 -0
- package/dist/backend/services/quantization-service.js +84 -0
- package/dist/backend/services/quantization-service.js.map +1 -0
- package/dist/backend/services/reindex-manager.d.ts +25 -0
- package/dist/backend/services/reindex-manager.d.ts.map +1 -0
- package/dist/backend/services/reindex-manager.js +78 -0
- package/dist/backend/services/reindex-manager.js.map +1 -0
- package/dist/backend/services/session-manager.d.ts +115 -0
- package/dist/backend/services/session-manager.d.ts.map +1 -0
- package/dist/backend/services/session-manager.js +150 -0
- package/dist/backend/services/session-manager.js.map +1 -0
- package/dist/backend/services/vector-search-service.d.ts +81 -0
- package/dist/backend/services/vector-search-service.d.ts.map +1 -0
- package/dist/backend/services/vector-search-service.js +143 -0
- package/dist/backend/services/vector-search-service.js.map +1 -0
- package/dist/backend/utils/file-utils.d.ts +92 -0
- package/dist/backend/utils/file-utils.d.ts.map +1 -0
- package/dist/backend/utils/file-utils.js +247 -0
- package/dist/backend/utils/file-utils.js.map +1 -0
- package/dist/cli/setup.d.ts +4 -0
- package/dist/cli/setup.d.ts.map +1 -0
- package/dist/cli/setup.js +138 -0
- package/dist/cli/setup.js.map +1 -0
- package/dist/index.d.ts +6 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +22 -0
- package/dist/index.js.map +1 -0
- package/dist/main.d.ts +14 -0
- package/dist/main.d.ts.map +1 -0
- package/dist/main.js +7 -67075
- package/dist/main.js.map +1 -0
- package/dist/mcp/tool-schemas.d.ts +205 -0
- package/dist/mcp/tool-schemas.d.ts.map +1 -0
- package/dist/mcp/tool-schemas.js +391 -0
- package/dist/mcp/tool-schemas.js.map +1 -0
- package/dist/server/logger.d.ts +50 -0
- package/dist/server/logger.d.ts.map +1 -0
- package/dist/server/logger.js +77 -0
- package/dist/server/logger.js.map +1 -0
- package/dist/server/tool-registry.d.ts +64 -0
- package/dist/server/tool-registry.d.ts.map +1 -0
- package/dist/server/tool-registry.js +93 -0
- package/dist/server/tool-registry.js.map +1 -0
- package/dist/server/transports/mcp-transport.d.ts +31 -0
- package/dist/server/transports/mcp-transport.d.ts.map +1 -0
- package/dist/server/transports/mcp-transport.js +331 -0
- package/dist/server/transports/mcp-transport.js.map +1 -0
- package/dist/server/transports/rest-transport.d.ts +36 -0
- package/dist/server/transports/rest-transport.d.ts.map +1 -0
- package/dist/server/transports/rest-transport.js +250 -0
- package/dist/server/transports/rest-transport.js.map +1 -0
- package/docs/API.md +116 -0
- package/docs/ARCHITECTURE.md +101 -0
- package/docs/FILE_WATCHING.md +120 -0
- package/docs/INSTALLATION.md +87 -0
- package/docs/MCP_INTEGRATION.md +108 -0
- package/docs/README.md +288 -0
- package/docs/USAGE.md +123 -0
- package/docs/architecture-design-standards/01_ARCHITECTURE.md +863 -0
- package/docs/architecture-design-standards/02_SEARCH_ENGINE_DESIGN.md +958 -0
- package/docs/architecture-design-standards/03_DATAFLOW.md +1000 -0
- package/docs/architecture-design-standards/04_VISUAL_GUIDE.md +922 -0
- package/docs/architecture-design-standards/05_REPOSITORY_PATTERN_GUIDE.md +503 -0
- package/docs/architecture-design-standards/06_IMPLEMENTATION_PATTERNS.md +1026 -0
- package/docs/architecture-design-standards/07_TYPESCRIPT_GUIDE.md +1027 -0
- package/docs/architecture-design-standards/08_CODING_STANDARDS.md +1274 -0
- package/docs/reference/01_START_HERE.md +108 -0
- package/docs/reference/02_QUICK_REFERENCE.md +363 -0
- package/docs/reference/03_DOCUMENTATION_INDEX.md +293 -0
- package/docs/reference/04_DELIVERY_SUMMARY.md +463 -0
- package/docs/reference/05_IMPLEMENTATION_OVERVIEW.md +319 -0
- package/docs/reference/06_RESEARCH_SUMMARY.md +519 -0
- package/docs/tracking/03_IMPLEMENTATION_ROADMAP.md +788 -0
- package/jest.config.json +12 -0
- package/package.json +46 -53
- package/prepend-shebang.js +18 -0
- package/scripts/setup-mcp.sh +66 -0
- package/src/backend/index.ts +5 -0
- package/src/backend/service-types.ts +219 -0
- package/src/backend/services/file-watcher.ts +79 -0
- package/src/backend/services/ngao-search-service.ts +430 -0
- package/src/backend/services/reindex-manager.ts +90 -0
- package/src/backend/services/session-manager.ts +214 -0
- package/src/cli/setup.ts +122 -0
- package/src/index.ts +6 -0
- package/src/main.ts +225 -0
- package/src/mcp/tool-schemas.ts +439 -0
- package/src/server/logger.ts +88 -0
- package/src/server/tool-registry.ts +117 -0
- package/src/server/transports/mcp-transport.ts +374 -0
- package/src/server/transports/rest-transport.ts +258 -0
- package/tests/unit/agent-tools.test.ts +454 -0
- package/tests/unit/file-watcher.test.d.ts +2 -0
- package/tests/unit/file-watcher.test.d.ts.map +1 -0
- package/tests/unit/file-watcher.test.js +9 -0
- package/tests/unit/file-watcher.test.js.map +1 -0
- package/tests/unit/file-watcher.test.ts +7 -0
- package/tests/unit/search-integration.test.ts +256 -0
- package/tests/unit/services.test.d.ts +2 -0
- package/tests/unit/services.test.d.ts.map +1 -0
- package/tests/unit/services.test.js +9 -0
- package/tests/unit/services.test.js.map +1 -0
- package/tests/unit/services.test.ts +7 -0
- package/tsconfig.json +23 -0
- package/webpack.backend.config.js +60 -0
- package/webpack.config.js +34 -0
- package/models/Xenova/all-MiniLM-L6-v2/config.json +0 -25
- package/models/Xenova/all-MiniLM-L6-v2/onnx/model_quantized.onnx +0 -0
- package/models/Xenova/all-MiniLM-L6-v2/tokenizer.json +0 -30686
- package/models/Xenova/all-MiniLM-L6-v2/tokenizer_config.json +0 -15
|
@@ -0,0 +1,922 @@
|
|
|
1
|
+
# Visual Architecture & Decision Trees
|
|
2
|
+
|
|
3
|
+
## 1. System Architecture Visualization
|
|
4
|
+
|
|
5
|
+
```
|
|
6
|
+
┌────────────────────────────────────────────────────────────────────┐
|
|
7
|
+
│ NGAO SEARCH SYSTEM │
|
|
8
|
+
│ Multi-Format LLM-Friendly │
|
|
9
|
+
│ Code/Document Search │
|
|
10
|
+
└────────────────────────────────────────────────────────────────────┘
|
|
11
|
+
|
|
12
|
+
╔════════════════════════════════════════════════════════════════════╗
|
|
13
|
+
║ INPUT LAYER ║
|
|
14
|
+
╚════════════════════════════════════════════════════════════════════╝
|
|
15
|
+
|
|
16
|
+
┌─────────────┬─────────────┬──────────────┬──────────────┐
|
|
17
|
+
│ Python │ Markdown │ JavaScript │ JSON/YAML │
|
|
18
|
+
│ Files │ Files │ Files │ Config Files │
|
|
19
|
+
└──────┬──────┴──────┬──────┴───────┬──────┴──────┬───────┘
|
|
20
|
+
│ │ │ │
|
|
21
|
+
└─────────────┼──────────────┼─────────────┘
|
|
22
|
+
│
|
|
23
|
+
▼
|
|
24
|
+
╔════════════════════════════════════════════════════════════════════╗
|
|
25
|
+
║ PARSING LAYER ║
|
|
26
|
+
║ (Format-Specific AST Extraction) ║
|
|
27
|
+
╚════════════════════════════════════════════════════════════════════╝
|
|
28
|
+
|
|
29
|
+
┌─────────────────────────────────────────┐
|
|
30
|
+
│ FileType Router & Parser Selector │
|
|
31
|
+
└─────────────────────────────────────────┘
|
|
32
|
+
↓
|
|
33
|
+
┌──────────────┬──────────────┬──────────────┬──────────────┐
|
|
34
|
+
│ │ │ │ │
|
|
35
|
+
▼ ▼ ▼ ▼ ▼
|
|
36
|
+
┌────────────┐ ┌───────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐
|
|
37
|
+
│ Python │ │ Markdown │ │ Babel/ │ │ JSON │ │ Generic │
|
|
38
|
+
│ AST │ │ remark │ │ tree- │ │ Parser │ │ Text │
|
|
39
|
+
│ Traversal │ │ Unified │ │ sitter │ │ │ │ Parser │
|
|
40
|
+
└────────────┘ └───────────┘ └──────────┘ └──────────┘ └─────────┘
|
|
41
|
+
|
|
42
|
+
Extract: Extract: Extract: Extract: Extract:
|
|
43
|
+
├─ Functions ├─ Headings ├─ Functions ├─ Keys ├─ Words
|
|
44
|
+
├─ Classes ├─ Sections ├─ Classes ├─ Values └─ Lines
|
|
45
|
+
├─ Methods ├─ Paragraphs ├─ Hooks └─ Nesting
|
|
46
|
+
├─ Docstrings └─ Code └─ JSDoc
|
|
47
|
+
└─ Decorators blocks
|
|
48
|
+
|
|
49
|
+
|
|
50
|
+
┌─────────────────────────────────────────┐
|
|
51
|
+
│ Block Extraction & Normalization │
|
|
52
|
+
│ (Compute scope, metadata, line ranges) │
|
|
53
|
+
└─────────────────────────────────────────┘
|
|
54
|
+
↓
|
|
55
|
+
BLOCKS
|
|
56
|
+
┌──────────────────┐
|
|
57
|
+
│ block_id: func_42│
|
|
58
|
+
│ file: src/auth.. │
|
|
59
|
+
│ type: method │
|
|
60
|
+
│ scope: [class] │
|
|
61
|
+
│ lines: 45-78 │
|
|
62
|
+
│ content: ... │
|
|
63
|
+
│ metadata: {...} │
|
|
64
|
+
└──────────────────┘
|
|
65
|
+
|
|
66
|
+
╔════════════════════════════════════════════════════════════════════╗
|
|
67
|
+
║ INDEXING LAYER ║
|
|
68
|
+
║ (Build Specialized Search Indexes) ║
|
|
69
|
+
╚════════════════════════════════════════════════════════════════════╝
|
|
70
|
+
|
|
71
|
+
Normalized Blocks
|
|
72
|
+
↓
|
|
73
|
+
┌──────────┴──────────┬─────────────┬─────────────┐
|
|
74
|
+
│ │ │ │
|
|
75
|
+
▼ ▼ ▼ ▼
|
|
76
|
+
┌───────────────┐ ┌──────────────┐ ┌──────────┐ ┌──────────┐
|
|
77
|
+
│ INVERTED │ │ SCOPE INDEX │ │ BLOCK │ │SEMANTIC │
|
|
78
|
+
│ INDEX │ │ (Hierarchy) │ │REGISTRY │ │(Embeddings)
|
|
79
|
+
├───────────────┤ ├──────────────┤ ├──────────┤ ├──────────┤
|
|
80
|
+
│ keyword→pos │ │file→scope→ │ │block_id→ │ │block→ │
|
|
81
|
+
│ │ │children │ │metadata │ │embedding │
|
|
82
|
+
│auth: │ │ │ │ │ │ │
|
|
83
|
+
│├─files:[..] │ │src/auth.py: │ │func_42: {│ │[0.1, │
|
|
84
|
+
│└─pos:[45,50] │ │├─module │ │file:..., │ │0.2, │
|
|
85
|
+
│ │ │├─class:Auth │ │line:45-78│ │0.3,..] │
|
|
86
|
+
│(70% storage) │ │└─method:h │ │type:meth │ │ │
|
|
87
|
+
│ │ │ │ │} │ │(optional)
|
|
88
|
+
│(Search by │ │(20% storage) │ │ │ │ │
|
|
89
|
+
│ keywords) │ │ │ │(10%st) │ │(10-20% │
|
|
90
|
+
│ │ │(Navigate │ │ │ │ storage)│
|
|
91
|
+
│ │ │structure) │ │(Quick │ │ │
|
|
92
|
+
│ │ │ │ │lookup) │ │(Semantic│
|
|
93
|
+
│ │ │ │ │ │ │ search) │
|
|
94
|
+
└───────────────┘ └──────────────┘ └──────────┘ └──────────┘
|
|
95
|
+
|
|
96
|
+
┌─────────────────────────────────────┐
|
|
97
|
+
│ Persist Indexes to Storage │
|
|
98
|
+
│ (SQLite + JSON or pure JSON) │
|
|
99
|
+
└─────────────────────────────────────┘
|
|
100
|
+
↓
|
|
101
|
+
.ngao_search/
|
|
102
|
+
├─ inverted_index.json
|
|
103
|
+
├─ scope_index.json
|
|
104
|
+
├─ block_registry.json
|
|
105
|
+
└─ index_metadata.json
|
|
106
|
+
|
|
107
|
+
|
|
108
|
+
╔════════════════════════════════════════════════════════════════════╗
|
|
109
|
+
║ QUERY LAYER ║
|
|
110
|
+
╚════════════════════════════════════════════════════════════════════╝
|
|
111
|
+
|
|
112
|
+
User Query: "find auth handler with retry"
|
|
113
|
+
↓
|
|
114
|
+
┌───────────────────────────────┐
|
|
115
|
+
│ Query Parser & Analyzer │
|
|
116
|
+
├───────────────────────────────┤
|
|
117
|
+
│ • Tokenize │
|
|
118
|
+
│ • Remove stopwords │
|
|
119
|
+
│ • Extract filters (type:...) │
|
|
120
|
+
│ • Plan search strategy │
|
|
121
|
+
└───────────────────────────────┘
|
|
122
|
+
↓
|
|
123
|
+
Terms: ["auth", "handler", "retry"]
|
|
124
|
+
Filters: {}
|
|
125
|
+
Strategy: multi_index
|
|
126
|
+
↓
|
|
127
|
+
┌──────────────┬────────────┬──────────┐
|
|
128
|
+
│ │ │ │
|
|
129
|
+
▼ ▼ ▼ ▼
|
|
130
|
+
KEYWORD SCOPE SEMANTIC REGEX
|
|
131
|
+
SEARCH SEARCH SEARCH (opt) SEARCH
|
|
132
|
+
│ │ │ │
|
|
133
|
+
└──────────────┴─────────────┴────────────┘
|
|
134
|
+
↓
|
|
135
|
+
Aggregate & Deduplicate Results
|
|
136
|
+
└─→ [block_42, block_88, method_15, ...]
|
|
137
|
+
↓
|
|
138
|
+
┌─────────────────────────────────────┐
|
|
139
|
+
│ Context Extraction │
|
|
140
|
+
│ • Load source file │
|
|
141
|
+
│ • Extract snippet ±context │
|
|
142
|
+
│ • Preserve formatting │
|
|
143
|
+
│ • Highlight matches │
|
|
144
|
+
└─────────────────────────────────────┘
|
|
145
|
+
↓
|
|
146
|
+
┌─────────────────────────────────────┐
|
|
147
|
+
│ Relevance Ranking (Multi-Factor) │
|
|
148
|
+
│ │
|
|
149
|
+
│ Score = 0.35×keyword_match + │
|
|
150
|
+
│ 0.25×position + │
|
|
151
|
+
│ 0.15×scope_specificity + │
|
|
152
|
+
│ 0.15×recency + │
|
|
153
|
+
│ 0.10×frequency │
|
|
154
|
+
└─────────────────────────────────────┘
|
|
155
|
+
↓
|
|
156
|
+
Sort by Score (DESC)
|
|
157
|
+
Truncate to max_results
|
|
158
|
+
↓
|
|
159
|
+
|
|
160
|
+
╔════════════════════════════════════════════════════════════════════╗
|
|
161
|
+
║ OUTPUT LAYER ║
|
|
162
|
+
║ (LLM-Friendly Structured Format) ║
|
|
163
|
+
╚════════════════════════════════════════════════════════════════════╝
|
|
164
|
+
|
|
165
|
+
┌─────────────────────────────────────┐
|
|
166
|
+
│ Transform to Type-Specific Format │
|
|
167
|
+
└─────────────────────────────────────┘
|
|
168
|
+
↓
|
|
169
|
+
For Python:
|
|
170
|
+
{type: "python", signature: "...", decorators: [...]}
|
|
171
|
+
|
|
172
|
+
For Markdown:
|
|
173
|
+
{type: "markdown", heading_hierarchy: [...]}
|
|
174
|
+
|
|
175
|
+
For JavaScript:
|
|
176
|
+
{type: "javascript", jsdoc: "..."}
|
|
177
|
+
|
|
178
|
+
For JSON:
|
|
179
|
+
{type: "json", key_path: [...], value: "..."}
|
|
180
|
+
↓
|
|
181
|
+
┌──────────────────────────────────────┐
|
|
182
|
+
│ Wrap in LLM-Friendly Schema │
|
|
183
|
+
│ • Structured JSON │
|
|
184
|
+
│ • All metadata included │
|
|
185
|
+
│ • Scope hierarchy clear │
|
|
186
|
+
│ • Match positions highlighted │
|
|
187
|
+
└──────────────────────────────────────┘
|
|
188
|
+
↓
|
|
189
|
+
┌───────────────┬──────────┬──────────┬────────┐
|
|
190
|
+
│ │ │ │ │
|
|
191
|
+
▼ ▼ ▼ ▼ ▼
|
|
192
|
+
JSON API CLI IDE Web UI LLM
|
|
193
|
+
(HTTP) (Terminal) Plugin Interface Integration
|
|
194
|
+
|
|
195
|
+
|
|
196
|
+
┌─────────────────────────────────────────────────────────────────────┐
|
|
197
|
+
│ RESULT EXAMPLE (LLM-Ready JSON): │
|
|
198
|
+
├─────────────────────────────────────────────────────────────────────┤
|
|
199
|
+
│ { │
|
|
200
|
+
│ "rank": 1, │
|
|
201
|
+
│ "relevance_score": 0.92, │
|
|
202
|
+
│ "file": {"path": "src/auth/handler.py", "type": "python"}, │
|
|
203
|
+
│ "location": { │
|
|
204
|
+
│ "start_line": 45, │
|
|
205
|
+
│ "end_line": 78, │
|
|
206
|
+
│ "scope_hierarchy": ["module", "class:AuthHandler"] │
|
|
207
|
+
│ }, │
|
|
208
|
+
│ "match": { │
|
|
209
|
+
│ "name": "handle", │
|
|
210
|
+
│ "signature": "def handle(self, request) -> Response:", │
|
|
211
|
+
│ "matched_terms": ["handle", "auth"], │
|
|
212
|
+
│ "match_positions": [{"line": 45, "text": "def handle"}, ...] │
|
|
213
|
+
│ }, │
|
|
214
|
+
│ "content": { │
|
|
215
|
+
│ "snippet": "[code with context and line numbers]", │
|
|
216
|
+
│ "context_lines": 15 │
|
|
217
|
+
│ }, │
|
|
218
|
+
│ "metadata": { │
|
|
219
|
+
│ "docstring": "Handle authentication...", │
|
|
220
|
+
│ "decorators": ["@validate_request"], │
|
|
221
|
+
│ "is_public": true │
|
|
222
|
+
│ } │
|
|
223
|
+
│ } │
|
|
224
|
+
└─────────────────────────────────────────────────────────────────────┘
|
|
225
|
+
```
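The indexing layer in the diagram maps each keyword to the files and positions where it occurs. A minimal sketch of that inverted index is given here in TypeScript, the package's language. The `Block` and `Posting` shapes, the `tokenize` helper, and `buildInvertedIndex` are all hypothetical names chosen for illustration. Field names loosely follow the block box in the diagram, and nothing here is taken from the package's actual implementation.

```typescript
// Hypothetical block shape; fields mirror the BLOCKS box in the diagram.
interface Block {
  blockId: string;
  file: string;
  type: string;
  scope: string[];
  startLine: number;
  endLine: number;
  content: string;
}

// One entry of the inverted index: where a keyword occurs inside one block.
interface Posting {
  blockId: string;
  file: string;
  lines: number[]; // 1-based source lines containing the keyword
}

type InvertedIndex = Record<string, Posting[]>;

// Lowercase the text and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

function buildInvertedIndex(blocks: Block[]): InvertedIndex {
  // Object.create(null) avoids clashes with inherited keys such as "constructor".
  const index: InvertedIndex = Object.create(null);
  for (const block of blocks) {
    const linesByTerm: Record<string, number[]> = Object.create(null);
    block.content.split("\n").forEach((line, offset) => {
      const lineNo = block.startLine + offset;
      for (const term of tokenize(line)) {
        const lines = linesByTerm[term] ?? (linesByTerm[term] = []);
        // Record each line once per term, even if the term repeats on it.
        if (lines[lines.length - 1] !== lineNo) lines.push(lineNo);
      }
    });
    for (const term of Object.keys(linesByTerm)) {
      const postings = index[term] ?? (index[term] = []);
      postings.push({
        blockId: block.blockId,
        file: block.file,
        lines: linesByTerm[term] ?? [],
      });
    }
  }
  return index;
}

// Example block, invented for illustration only.
const blocks: Block[] = [
  {
    blockId: "func_42",
    file: "src/auth.py",
    type: "method",
    scope: ["module", "class:AuthHandler"],
    startLine: 45,
    endLine: 47,
    content:
      "def handle(self, request):\n    # auth check with retry\n    return self.auth(request)",
  },
];

const index = buildInvertedIndex(blocks);
console.log(index["auth"]);
```

A keyword lookup is then a single property access (`index["auth"]`), which is why the diagram treats this index as the primary search path.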
|
|
226
|
+
|
|
227
|
+
---
|
|
228
|
+
|
|
229
|
+
## 2. File Type Decision Tree
|
|
230
|
+
|
|
231
|
+
```
|
|
232
|
+
┌─────────────────────────┐
|
|
233
|
+
│ Incoming File │
|
|
234
|
+
└────────────┬────────────┘
|
|
235
|
+
│
|
|
236
|
+
Check File Extension
|
|
237
|
+
│
|
|
238
|
+
┌────────┼────────┬────────┬────────┐
|
|
239
|
+
│ │ │ │ │
|
|
240
|
+
*.py *.md *.js *.json Other
|
|
241
|
+
│ │ │ │ │
|
|
242
|
+
▼ ▼ ▼ ▼ ▼
|
|
243
|
+
|
|
244
|
+
┌─────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐ ┌─────────┐
|
|
245
|
+
│ Python │ │ Markdown │ │JavaScript│ │ JSON │ │ Generic │
|
|
246
|
+
│ Parser │ │ Parser │ │ Parser │ │ Parser │ │ Parser │
|
|
247
|
+
└────┬────┘ └────┬─────┘ └────┬─────┘ └────┬────┘ └────┬────┘
|
|
248
|
+
│ │ │ │ │
|
|
249
|
+
▼ ▼ ▼ ▼ ▼
|
|
250
|
+
|
|
251
|
+
Parse with Parse with Parse with Parse with Simple
|
|
252
|
+
`ast` `remark` `babel` JSON API Regex
|
|
253
|
+
module unified parser matching
|
|
254
|
+
|
|
255
|
+
Extract: Extract: Extract: Extract: Extract:
|
|
256
|
+
├─ Classes ├─ Headings ├─ Classes ├─ Keys └─ Words
|
|
257
|
+
├─ Funcs ├─ Sections ├─ Funcs ├─ Values
|
|
258
|
+
├─ Methods ├─ Params ├─ Hooks └─ Nesting
|
|
259
|
+
├─ Types └─ Code ├─ Exports
|
|
260
|
+
└─ Docstr blocks └─ JSDoc
|
|
261
|
+
|
|
262
|
+
Blocks: Blocks: Blocks: Blocks: Blocks:
|
|
263
|
+
scope: scope: scope: scope: scope:
|
|
264
|
+
module→ H1→H2→... module→ path→ (none)
|
|
265
|
+
class→ section class→ sub_key
|
|
266
|
+
method para method
|
|
267
|
+
|
|
268
|
+
▼ ▼ ▼ ▼ ▼
|
|
269
|
+
|
|
270
|
+
╔═════════════════════════════════════════════════════╗
|
|
271
|
+
║ Normalized Block Objects (Common Format) ║
|
|
272
|
+
╠═════════════════════════════════════════════════════╣
|
|
273
|
+
║ • file_path ║
|
|
274
|
+
║ • block_id ║
|
|
275
|
+
║ • item_type ║
|
|
276
|
+
║ • name ║
|
|
277
|
+
║ • content ║
|
|
278
|
+
║ • location (start_line, end_line) ║
|
|
279
|
+
║ • scope_hierarchy ║
|
|
280
|
+
║ • metadata (type-specific) ║
|
|
281
|
+
║ • docstring ║
|
|
282
|
+
╚═════════════════════════════════════════════════════╝
|
|
283
|
+
▼
|
|
284
|
+
→ To Indexing Layer
|
|
285
|
+
```
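The extension check at the top of the decision tree can be sketched as a lookup table with a generic fallback. The names `ParserKind`, `PARSER_BY_EXTENSION`, and `selectParser` are assumptions made for this example, and paths are assumed to use `/` as the separator. Only the four extensions shown in the tree are mapped.

```typescript
// Parser categories from the decision tree; the identifiers are illustrative.
type ParserKind = "python" | "markdown" | "javascript" | "json" | "generic";

// Only the extensions listed in the tree are mapped; everything else is "Other".
const PARSER_BY_EXTENSION: Record<string, ParserKind> = {
  ".py": "python",
  ".md": "markdown",
  ".js": "javascript",
  ".json": "json",
};

function selectParser(filePath: string): ParserKind {
  // Assumes "/"-separated paths; keep only the final path segment.
  const fileName = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = fileName.lastIndexOf(".");
  // No extension, or a dotfile such as ".env": use the generic parser.
  if (dot <= 0) return "generic";
  const ext = fileName.slice(dot).toLowerCase();
  return PARSER_BY_EXTENSION[ext] ?? "generic";
}

console.log(selectParser("src/auth/handler.py"));
console.log(selectParser("Makefile"));
```

Keeping the mapping in a table rather than a chain of conditionals means a new format only needs one extra entry plus its parser.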
|
|
286
|
+
|
|
287
|
+
---
|
|
288
|
+
|
|
289
|
+
## 3. Query Processing Decision Tree
|
|
290
|
+
|
|
291
|
+
```
|
|
292
|
+
┌────────────────────────┐
|
|
293
|
+
│ Raw User Query │
|
|
294
|
+
└────────────┬───────────┘
|
|
295
|
+
│
|
|
296
|
+
┌────────────────────────┐
|
|
297
|
+
│ Parse Query │
|
|
298
|
+
├────────────────────────┤
|
|
299
|
+
│ Tokenize: split into │
|
|
300
|
+
│ individual terms │
|
|
301
|
+
└────────────┬───────────┘
|
|
302
|
+
│
|
|
303
|
+
┌───────┴───────┐
|
|
304
|
+
│ │
|
|
305
|
+
Extract Filters Analyze Terms
|
|
306
|
+
│ │
|
|
307
|
+
type:method Remove
|
|
308
|
+
scope:auth.* stopwords
|
|
309
|
+
file:*.py │
|
|
310
|
+
│ └─→ Stemming
|
|
311
|
+
│ │
|
|
312
|
+
│ "authentication"
|
|
313
|
+
│ → "auth"
|
|
314
|
+
│ │
|
|
315
|
+
├─────────────────┘
|
|
316
|
+
│
|
|
317
|
+
▼
|
|
318
|
+
┌──────────────────────────┐
|
|
319
|
+
│ Plan Search Strategy │
|
|
320
|
+
│ │
|
|
321
|
+
│ Single term? → Keywords │
|
|
322
|
+
│ Multi-term? → Multi-idx │
|
|
323
|
+
│ Regex? → Pattern search │
|
|
324
|
+
│ Semantic? → Embeddings │
|
|
325
|
+
└──────────┬───────────────┘
|
|
326
|
+
│
|
|
327
|
+
┌───────────┴────────────┬──────────┬───────────┐
|
|
328
|
+
│ │ │ │
|
|
329
|
+
▼ ▼ ▼ ▼
|
|
330
|
+
|
|
331
|
+
KEYWORD SEARCH SCOPE SEARCH SEMANTIC REGEX
|
|
332
|
+
(Inverted Index) (Hierarchy) SEARCH SEARCH
|
|
333
|
+
|
|
334
|
+
Search for each Filter by Find Match
|
|
335
|
+
term in inverted scope path similar pattern
|
|
336
|
+
index (scope:auth.*) embeddings
|
|
337
|
+
|
|
338
|
+
Find position Narrow down Find Get all
|
|
339
|
+
info in inverted results to similar matching
|
|
340
|
+
index certain parts code files
|
|
341
|
+
of structure
|
|
342
|
+
|
|
343
|
+
Match all │ │ │
|
|
344
|
+
results │ │ │
|
|
345
|
+
│ │ │ │
|
|
346
|
+
├──────────────────┴───────────────┴───────────┴──→
|
|
347
|
+
|
|
348
|
+
┌─────────────────────────────┐
|
|
349
|
+
│ Aggregate Results │
|
|
350
|
+
│ • Combine from all indexes │
|
|
351
|
+
│ • Deduplicate blocks │
|
|
352
|
+
│ • Prepare for ranking │
|
|
353
|
+
└──────────┬──────────────────┘
|
|
354
|
+
│
|
|
355
|
+
▼
|
|
356
|
+
┌─────────────────────────────┐
|
|
357
|
+
│ Extract Context │
|
|
358
|
+
│ • Load source files │
|
|
359
|
+
│ • Get snippet ±context │
|
|
360
|
+
│ • Preserve formatting │
|
|
361
|
+
└──────────┬──────────────────┘
|
|
362
|
+
│
|
|
363
|
+
▼
|
|
364
|
+
┌──────────────────────────────┐
|
|
365
|
+
│ Rank Results │
|
|
366
|
+
│ │
|
|
367
|
+
│ For each result compute: │
|
|
368
|
+
│ score = 0.35×kw_match + │
|
|
369
|
+
│ 0.25×position + │
|
|
370
|
+
│ 0.15×scope + │
|
|
371
|
+
│ 0.15×recency + │
|
|
372
|
+
│ 0.10×frequency │
|
|
373
|
+
└──────────┬───────────────────┘
|
|
374
|
+
│
|
|
375
|
+
▼
|
|
376
|
+
┌─────────────────────────────┐
|
|
377
|
+
│ Sort & Truncate │
|
|
378
|
+
│ • Sort by score (DESC) │
|
|
379
|
+
│ • Keep top N results │
|
|
380
|
+
│ • Return to formatter │
|
|
381
|
+
└──────────┬──────────────────┘
|
|
382
|
+
│
|
|
383
|
+
▼
|
|
384
|
+
Formatted Results
|
|
385
|
+
```
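The first stages of the tree (tokenize, extract filters, remove stopwords, plan a strategy) can be sketched as one function. This is a simplified illustration: the stopword list, the `FILTER_KEYS` set, and the `parseQuery` name are assumptions, and stemming as well as regex and semantic strategy detection are deliberately left out.

```typescript
type Strategy = "keyword" | "multi_index";

interface ParsedQuery {
  terms: string[];
  filters: Record<string, string>;
  strategy: Strategy;
}

// Illustrative stopword list; the actual list is not given in the document.
const STOPWORDS = ["a", "an", "the", "with", "in", "of", "for", "to", "find"];

// Filter prefixes taken from the tree (type:, scope:, file:).
const FILTER_KEYS = ["type", "scope", "file"];

function parseQuery(raw: string): ParsedQuery {
  const terms: string[] = [];
  const filters: Record<string, string> = {};
  for (const token of raw.trim().split(/\s+/)) {
    if (token === "") continue;
    const colon = token.indexOf(":");
    const key = colon > 0 ? token.slice(0, colon).toLowerCase() : "";
    // "key:value" with a known key and a non-empty value is a filter.
    if (FILTER_KEYS.indexOf(key) !== -1 && colon < token.length - 1) {
      filters[key] = token.slice(colon + 1);
      continue;
    }
    // Everything else is a search term: normalize, then drop stopwords.
    const term = token.toLowerCase().replace(/[^a-z0-9_]/g, "");
    if (term !== "" && STOPWORDS.indexOf(term) === -1) terms.push(term);
  }
  // Single term: keyword search only. Several terms: query multiple indexes.
  const strategy: Strategy = terms.length > 1 ? "multi_index" : "keyword";
  return { terms, filters, strategy };
}

console.log(parseQuery("find auth handler with retry"));
console.log(parseQuery("retry scope:auth.* file:*.py"));
```

For the query used in the system diagram, this yields the terms `auth`, `handler`, and `retry`, no filters, and the `multi_index` strategy.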
|
|
386
|
+
|
|
387
|
+
---
|
|
388
|
+
|
|
389
|
+
## 4. Ranking Factor Contribution
|
|
390
|
+
|
|
391
|
+
```
|
|
392
|
+
RELEVANCE SCORE BREAKDOWN
|
|
393
|
+
═════════════════════════════════════════════════════════
|
|
394
|
+
|
|
395
|
+
Query: "handle auth retry" searching in codebase
|
|
396
|
+
|
|
397
|
+
For each matching block, compute:
|
|
398
|
+
|
|
399
|
+
FACTOR 1: KEYWORD MATCH (Weight: 0.35)
|
|
400
|
+
───────────────────────────────────────
|
|
401
|
+
|
|
402
|
+
Block name: "handle_auth_retry"
|
|
403
|
+
Terms matched: ["handle", "auth", "retry"]
|
|
404
|
+
Match score: 3/3 terms = 1.0
|
|
405
|
+
|
|
406
|
+
Contribution: 1.0 × 0.35 = 0.35
|
|
407
|
+
|
|
408
|
+
[████████████████████████████████████] 35%
|
|
409
|
+
|
|
410
|
+
|
|
411
|
+
FACTOR 2: POSITION (Weight: 0.25)
|
|
412
|
+
─────────────────────────────────
|
|
413
|
+
|
|
414
|
+
Match location:
|
|
415
|
+
- In function name: ✓ (score: 1.0)
|
|
416
|
+
- In docstring: ✓ (adds: 0.0)
|
|
417
|
+
- In code body: ✓ (adds: 0.0)
|
|
418
|
+
|
|
419
|
+
Position score: 1.0
|
|
420
|
+
|
|
421
|
+
Contribution: 1.0 × 0.25 = 0.25
|
|
422
|
+
|
|
423
|
+
[████████████████████████] 25%
|
|
424
|
+
|
|
425
|
+
|
|
426
|
+
FACTOR 3: SCOPE SPECIFICITY (Weight: 0.15)
|
|
427
|
+
──────────────────────────────────────────
|
|
428
|
+
|
|
429
|
+
Scope: ["module", "class:AuthHandler", "method:handle"]
|
|
430
|
+
Depth: 3
|
|
431
|
+
Score: min(0.5 + (3 × 0.1), 1.0) = 0.8
|
|
432
|
+
|
|
433
|
+
Contribution: 0.8 × 0.15 = 0.12
|
|
434
|
+
|
|
435
|
+
[██████████████] 12%
|
|
436
|
+
|
|
437
|
+
|
|
438
|
+
FACTOR 4: RECENCY (Weight: 0.15)
|
|
439
|
+
────────────────────────────────
|
|
440
|
+
|
|
441
|
+
Last modified: 2 hours ago
|
|
442
|
+
Score: 1.0 (within 24 hours)
|
|
443
|
+
|
|
444
|
+
Contribution: 1.0 × 0.15 = 0.15
|
|
445
|
+
|
|
446
|
+
[███████████████] 15%
|
|
447
|
+
|
|
448
|
+
|
|
449
|
+
FACTOR 5: FREQUENCY (Weight: 0.10)
|
|
450
|
+
──────────────────────────────────
|
|
451
|
+
|
|
452
|
+
Term occurrences:
|
|
453
|
+
- "handle": 2 times
|
|
454
|
+
- "auth": 3 times
|
|
455
|
+
- "retry": 1 time
|
|
456
|
+
Total: 6 times in 50 lines
|
|
457
|
+
Normalized: min(6 / 50, 1.0) = 0.12
|
|
458
|
+
|
|
459
|
+
Contribution: 0.12 × 0.10 = 0.012
|
|
460
|
+
|
|
461
|
+
[█] ~1%
|
|
462
|
+
|
|
463
|
+
|
|
464
|
+
TOTAL RELEVANCE SCORE:
|
|
465
|
+
═════════════════════════════════════════════════════════
|
|
466
|
+
|
|
467
|
+
0.35 + 0.25 + 0.12 + 0.15 + 0.012 = 0.882
|
|
468
|
+
|
|
469
|
+
Final Score: 0.88 / 1.0
|
|
470
|
+
┌──────────────────────────────────────────┐
|
|
471
|
+
│█████████████████████████████████████░░░░░│
|
|
472
|
+
└──────────────────────────────────────────┘
|
|
473
|
+
88% relevance
|
|
474
|
+
|
|
475
|
+
Result Rank: Top results sorted by this score
|
|
476
|
+
```
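The weighted sum in the breakdown can be checked with a short sketch. The weights and the two helper formulas (scope depth and term frequency) come from the breakdown itself. The names `RankingFactors`, `WEIGHTS`, `scopeSpecificity`, `frequencyScore`, and `relevanceScore` are hypothetical and not taken from the package.

```typescript
// Per-factor scores, each expected to lie in [0, 1].
interface RankingFactors {
  keywordMatch: number;
  position: number;
  scopeSpecificity: number;
  recency: number;
  frequency: number;
}

// Weights as listed in the breakdown; they sum to 1.0.
const WEIGHTS: RankingFactors = {
  keywordMatch: 0.35,
  position: 0.25,
  scopeSpecificity: 0.15,
  recency: 0.15,
  frequency: 0.10,
};

// Deeper scopes score higher, capped at 1.0: min(0.5 + depth × 0.1, 1.0).
function scopeSpecificity(depth: number): number {
  return Math.min(0.5 + depth * 0.1, 1.0);
}

// Term occurrences per line of the block, capped at 1.0.
function frequencyScore(occurrences: number, lineCount: number): number {
  return lineCount > 0 ? Math.min(occurrences / lineCount, 1.0) : 0;
}

function relevanceScore(f: RankingFactors): number {
  return (
    WEIGHTS.keywordMatch * f.keywordMatch +
    WEIGHTS.position * f.position +
    WEIGHTS.scopeSpecificity * f.scopeSpecificity +
    WEIGHTS.recency * f.recency +
    WEIGHTS.frequency * f.frequency
  );
}

// Inputs from the worked example: 3/3 terms matched, match in the function
// name, scope depth 3, modified within 24 hours, 6 occurrences in 50 lines.
const score = relevanceScore({
  keywordMatch: 3 / 3,
  position: 1.0,
  scopeSpecificity: scopeSpecificity(3),
  recency: 1.0,
  frequency: frequencyScore(6, 50),
});

console.log(score.toFixed(3)); // 0.882
```

Because the weights sum to 1.0 and every factor is capped at 1.0, the score stays within the 0 to 1 range and can be reported directly as a percentage.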
|
|
477
|
+
|
|
478
|
+
---
|
|
479
|
+
|
|
480
|
+
## 5. Block Extraction Flowchart
|
|
481
|
+
|
|
482
|
+
```
|
|
483
|
+
INPUT: Raw source code file
|
|
484
|
+
│
|
|
485
|
+
├─→ PYTHON FILE (*.py)
|
|
486
|
+
│ │
|
|
487
|
+
│ ├─ Parse with ast.parse()
|
|
488
|
+
│ │ │
|
|
489
|
+
│ │ ├─→ ModuleVisitor.visit()
|
|
490
|
+
│ │ │ │
|
|
491
|
+
│ │ │ ├─→ Module docstring
|
|
492
|
+
│ │ │ │
|
|
493
|
+
│ │ │ ├─→ Top-level functions
|
|
494
|
+
│ │ │ │ └─→ For each:
|
|
495
|
+
│ │ │ │ ├─ Extract params
|
|
496
|
+
│ │ │ │ ├─ Get docstring
|
|
497
|
+
│ │ │ │ ├─ Get decorators
|
|
498
|
+
│ │ │ │ └─ Line numbers
|
|
499
|
+
│ │ │ │
|
|
500
|
+
│ │ │ └─→ Class definitions
|
|
501
|
+
│ │ │ └─→ For each class:
|
|
502
|
+
│ │ │ ├─ Get docstring
|
|
503
|
+
│ │ │ ├─ Get decorators
|
|
504
|
+
│ │ │ └─→ Class methods
|
|
505
|
+
│ │ │ └─→ For each method:
|
|
506
|
+
│ │ │ ├─ Extract params
|
|
507
|
+
│ │ │ ├─ Get return type
|
|
508
|
+
│ │ │ ├─ Get docstring
|
|
509
|
+
│ │ │ └─ Line numbers
|
|
510
|
+
│ │ │
|
|
511
|
+
│ │ └─→ Normalize all blocks
|
|
512
|
+
│ │ ├─ Compute scope_hierarchy
|
|
513
|
+
│ │ ├─ Extract content
|
|
514
|
+
│ │ ├─ Get line ranges
|
|
515
|
+
│ │ └─ Create block_ids
|
|
516
|
+
│ │
|
|
517
|
+
│ └─→ Output: List[Block]
|
|
518
|
+
│
|
|
519
|
+
├─→ MARKDOWN FILE (*.md)
|
|
520
|
+
│ │
|
|
521
|
+
│ ├─ Parse with remark/unified
|
|
522
|
+
│ │ │
|
|
523
|
+
│ │ ├─→ Heading traversal
|
|
524
|
+
│ │ │ ├─ H1 → Top-level section
|
|
525
|
+
│ │ │ ├─ H2 → Subsection
|
|
526
|
+
│ │ │ ├─ H3 → Sub-subsection
|
|
527
|
+
│ │ │ └─ Track hierarchy
|
|
528
|
+
│ │ │
|
|
529
|
+
│ │ ├─→ Content between headings
|
|
530
|
+
│ │ │ ├─ Paragraphs
|
|
531
|
+
│ │ │ ├─ Code blocks (with language)
|
|
532
|
+
│ │ │ ├─ Lists
|
|
533
|
+
│ │ │ └─ Tables
|
|
534
|
+
│ │ │
|
|
535
|
+
│ │ ├─→ Frontmatter (YAML)
|
|
536
|
+
│ │ │ └─ Extract metadata
|
|
537
|
+
│ │ │
|
|
538
|
+
│ │ └─→ Normalize sections
|
|
539
|
+
│ │ ├─ Group content by heading
|
|
540
|
+
│ │ ├─ Create hierarchy path
|
|
541
|
+
│ │ ├─ Line number tracking
|
|
542
|
+
│ │ └─ Create block_ids
|
|
543
|
+
│ │
|
|
544
|
+
│ └─→ Output: List[Block]
|
|
545
|
+
│
|
|
546
|
+
├─→ JAVASCRIPT FILE (*.js/*.jsx)
|
|
547
|
+
│ │
|
|
548
|
+
│ ├─ Parse with @babel/parser
|
|
549
|
+
│ │ │
|
|
550
|
+
│ │ ├─→ Extract top-level:
|
|
551
|
+
│ │ │ ├─ Import statements
|
|
552
|
+
│ │ │ ├─ Export statements
|
|
553
|
+
│ │ │ └─ Variable declarations
|
|
554
|
+
│ │ │
|
|
555
|
+
│ │ ├─→ Function declarations
|
|
556
|
+
│ │ │ └─→ For each:
|
|
557
|
+
│ │ │ ├─ Get parameters
|
|
558
|
+
│ │ │ ├─ Get JSDoc
|
|
559
|
+
│ │ │ └─ Line range
|
|
560
|
+
│ │ │
|
|
561
|
+
│ │ ├─→ Arrow functions (const useHook = ...)
|
|
562
|
+
│ │ │ └─→ For each:
|
|
563
|
+
│ │ │ ├─ Identify hook pattern
|
|
564
|
+
│ │ │ ├─ Get JSDoc
|
|
565
|
+
│ │ │ └─ Line range
|
|
566
|
+
│ │ │
|
|
567
|
+
│ │ ├─→ Class definitions
|
|
568
|
+
│ │ │ └─→ For each:
|
|
569
|
+
│ │ │ ├─ Methods
|
|
570
|
+
│ │ │ ├─ Properties
|
|
571
|
+
│ │ │ └─ Constructor
|
|
572
|
+
│ │ │
|
|
573
|
+
│ │ ├─→ React Components (JSX)
|
|
574
|
+
│ │ │ └─→ For each:
|
|
575
|
+
│ │ │ ├─ Props
|
|
576
|
+
│ │ │ ├─ Return JSX
|
|
577
|
+
│ │ │ └─ JSDoc
|
|
578
|
+
│ │ │
|
|
579
|
+
│ │ └─→ Normalize all blocks
|
|
580
|
+
│ │ ├─ Compute scope
|
|
581
|
+
│ │ ├─ Extract signature
|
|
582
|
+
│ │ └─ Line ranges
|
|
583
|
+
│ │
|
|
584
|
+
│ └─→ Output: List[Block]
|
|
585
|
+
│
|
|
586
|
+
├─→ JSON FILE (*.json)
|
|
587
|
+
│ │
|
|
588
|
+
│ ├─ Parse as JSON
|
|
589
|
+
│ │ │
|
|
590
|
+
│ │ ├─→ Flatten key hierarchy
|
|
591
|
+
│ │ │ └─ database.connection.pool_size
|
|
592
|
+
│ │ │
|
|
593
|
+
│ │ ├─→ For each key-value:
|
|
594
|
+
│ │ │ ├─ Key path
|
|
595
|
+
│ │ │ ├─ Value
|
|
596
|
+
│ │ │ ├─ Type (string|number|bool|object|array)
|
|
597
|
+
│ │ │ └─ Nesting depth
|
|
598
|
+
│ │ │
|
|
599
|
+
│ │ └─→ Normalize blocks
|
|
600
|
+
│ │ ├─ Create scope from path
|
|
601
|
+
│ │ └─ Line number tracking
|
|
602
|
+
│ │
|
|
603
|
+
│ └─→ Output: List[Block]
|
|
604
|
+
│
|
|
605
|
+
└─→ GENERIC TEXT FILE
|
|
606
|
+
│
|
|
607
|
+
├─ Fallback parsing (last resort)
|
|
608
|
+
│ │
|
|
609
|
+
│ ├─→ Line-by-line analysis
|
|
610
|
+
│ ├─→ Regex pattern matching
|
|
611
|
+
│ └─→ Basic structure detection
|
|
612
|
+
│
|
|
613
|
+
└─→ Output: List[Block] (minimal metadata)
|
|
614
|
+
```
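
The Python branch of the flowchart can be sketched with the standard `ast` module. This is an illustrative sketch under assumptions, not the package's extractor: the `Block` fields, the `block_id` format, and the `extract_blocks` name are hypothetical, and decorators and return types are omitted for brevity.

```python
import ast
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    block_id: str               # hypothetical format: "<file>:<start line>"
    scope_hierarchy: List[str]
    params: List[str]
    start_line: int
    end_line: int
    docstring: Optional[str]


def extract_blocks(source: str, file_path: str = "example.py") -> List[Block]:
    """Collect top-level functions, classes, and class methods as blocks."""
    blocks: List[Block] = []

    def visit(node: ast.AST, scope: List[str]) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                kind, params = "class", []
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "method" if scope[-1].startswith("class:") else "function"
                params = [a.arg for a in child.args.args]
            else:
                continue
            child_scope = scope + [f"{kind}:{child.name}"]
            blocks.append(Block(
                block_id=f"{file_path}:{child.lineno}",
                scope_hierarchy=child_scope,
                params=params,
                start_line=child.lineno,
                end_line=child.end_lineno,
                docstring=ast.get_docstring(child),
            ))
            if kind == "class":
                visit(child, child_scope)  # descend into classes to find methods

    visit(ast.parse(source), ["module"])
    return blocks


SAMPLE = '''
class AuthHandler:
    """Handles authentication"""

    def handle(self, request):
        """Handle incoming request"""
        return request

def helper():
    pass
'''

for block in extract_blocks(SAMPLE):
    print(block.scope_hierarchy, block.start_line, block.end_line)
```

The sketch recurses only into class bodies, mirroring the flowchart, so functions nested inside other functions are not emitted as separate blocks.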
|
|
615
|
+
|
|
616
|
+
---
|
|
617
|
+
|
|
618
|
+
## 6. Context Extraction Window
|
|
619
|
+
|
|
620
|
+
```
|
|
621
|
+
SOURCE FILE (Python)
|
|
622
|
+
═════════════════════════════════════════════════════
|
|
623
|
+
|
|
624
|
+
40: class AuthHandler:
|
|
625
|
+
41: """Handles authentication"""
|
|
626
|
+
42:
|
|
627
|
+
43: def __init__(self, config):
|
|
628
|
+
44: self.config = config
|
|
629
|
+
45:
|
|
630
|
+
46: def handle(self, request): ← BLOCK START (line 46)
|
|
631
|
+
47: """Handle incoming request"""
|
|
632
|
+
48:
|
|
633
|
+
49: # Validate token
|
|
634
|
+
50: token = request.headers.get('Authorization')
|
|
635
|
+
51: if not token:
|
|
636
|
+
52: return Response(status=401)
|
|
637
|
+
53:
|
|
638
|
+
54: # Process authentication
|
|
639
|
+
55: result = self.validate_token(token)
|
|
640
|
+
56: if not result:
|
|
641
|
+
57: return Response(status=403)
|
|
642
|
+
58:
|
|
643
|
+
59: return Response(status=200) ← BLOCK END (line 59)
|
|
644
|
+
60:
|
|
645
|
+
61: def validate_token(self, token):
|
|
646
|
+
62: """Validate token"""
|
|
647
|
+
|
|
648
|
+
|
|
649
|
+
CONTEXT EXTRACTION CONFIG
|
|
650
|
+
═════════════════════════════════════════════════════
|
|
651
|
+
|
|
652
|
+
{
|
|
653
|
+
"file_type": "python",
|
|
654
|
+
"context_lines_before": 5,
|
|
655
|
+
"context_lines_after": 5,
|
|
656
|
+
"include_docstring": true,
|
|
657
|
+
"include_parent_class": true,
|
|
658
|
+
"max_context_lines": 30
|
|
659
|
+
}
|
|
660
|
+
|
|
661
|
+
|
|
662
|
+
EXTRACTED CONTEXT WINDOW
|
|
663
|
+
═════════════════════════════════════════════════════
|
|
664
|
+
|
|
665
|
+
BEFORE:
|
|
666
|
+
41: │ """Handles authentication"""
|
|
667
|
+
42: │
|
|
668
|
+
43: │ def __init__(self, config):
|
|
669
|
+
44: │ self.config = config
|
|
670
|
+
45: │
|
|
671
|
+
|
|
672
|
+
BLOCK CONTENT:
|
|
673
|
+
46: │ def handle(self, request):
|
|
674
|
+
47: │ """Handle incoming request"""
|
|
675
|
+
48: │
|
|
676
|
+
49: │ # Validate token
|
|
677
|
+
50: │ token = request.headers.get('Authorization')
|
|
678
|
+
51: │ if not token:
|
|
679
|
+
52: │ return Response(status=401)
|
|
680
|
+
53: │
|
|
681
|
+
54: │ # Process authentication
|
|
682
|
+
55: │ result = self.validate_token(token)
|
|
683
|
+
56: │ if not result:
|
|
684
|
+
57: │ return Response(status=403)
|
|
685
|
+
58: │
|
|
686
|
+
59: │ return Response(status=200)
|
|
687
|
+
|
|
688
|
+
AFTER:
|
|
689
|
+
60: │
|
|
690
|
+
61: │ def validate_token(self, token):
|
|
691
|
+
62: │ """Validate token"""
|
|
692
|
+
|
|
693
|
+
|
|
694
|
+
FORMATTED OUTPUT FOR LLM
|
|
695
|
+
═════════════════════════════════════════════════════
|
|
696
|
+
|
|
697
|
+
Scope: class AuthHandler → method handle
|
|
698
|
+
Lines: 46-59 (in file src/auth/handler.py)
|
|
699
|
+
|
|
700
|
+
Code:
|
|
701
|
+
```python
|
|
702
|
+
41 │ """Handles authentication"""
|
|
703
|
+
42 │
|
|
704
|
+
43 │ def __init__(self, config):
|
|
705
|
+
44 │ self.config = config
|
|
706
|
+
45 │
|
|
707
|
+
46 │ def handle(self, request):
|
|
708
|
+
47 │ """Handle incoming request"""
|
|
709
|
+
48 │
|
|
710
|
+
49 │ # Validate token
|
|
711
|
+
50 │ token = request.headers.get('Authorization')
|
|
712
|
+
51 │ if not token:
|
|
713
|
+
52 │ return Response(status=401)
|
|
714
|
+
53 │
|
|
715
|
+
54 │ # Process authentication
|
|
716
|
+
55 │ result = self.validate_token(token)
|
|
717
|
+
56 │ if not result:
|
|
718
|
+
57 │ return Response(status=403)
|
|
719
|
+
58 │
|
|
720
|
+
59 │ return Response(status=200)
|
|
721
|
+
60 │
|
|
722
|
+
61 │ def validate_token(self, token):
|
|
723
|
+
62 │ """Validate token"""
|
|
724
|
+
```
|
|
725
|
+
|
|
726
|
+
Docstring: "Handle incoming request"
|
|
727
|
+
Method signature: def handle(self, request) -> ?
|
|
728
|
+
Related method: validate_token() called on line 55
|
|
729
|
+
```
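
The window selection shown above can be sketched as a small function. This is a minimal Python sketch under assumptions: `extract_context` and its parameter names are hypothetical, only `context_lines_before` and `context_lines_after` are modeled, and the other config keys (`include_parent_class`, `max_context_lines`, and so on) are left out because their exact behavior is not described here.

```python
def extract_context(lines, block_start, block_end, before=5, after=5):
    """Return (line_number, text) pairs for the context and the block.

    Line numbers are 1-based and inclusive; the window is clamped to the file.
    """
    first = max(1, block_start - before)
    last = min(len(lines), block_end + after)

    def numbered(lo, hi):
        return [(n, lines[n - 1]) for n in range(lo, hi + 1)]

    return {
        "before": numbered(first, block_start - 1),
        "block": numbered(block_start, block_end),
        "after": numbered(block_end + 1, last),
    }


# Synthetic 62-line file standing in for the example source.
lines = [f"line {n}" for n in range(1, 63)]
window = extract_context(lines, block_start=46, block_end=59)
print([n for n, _ in window["before"]])
print([n for n, _ in window["after"]])
```

Clamping to the file bounds means a block near the start or end of a file simply gets a shorter context on that side.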
|
|
730
|
+
|
|
731
|
+
---
|
|
732
|
+
|
|
733
|
+
## 7. Performance Optimization Layers
|
|
734
|
+
|
|
735
|
+
```
|
|
736
|
+
┌────────────────────────────────────────────────────────┐
|
|
737
|
+
│ OPTIMIZATION LAYER 1: INPUT NORMALIZATION │
|
|
738
|
+
│ (Query Pre-processing) │
|
|
739
|
+
└────────────────────────────────────────────────────────┘
|
|
740
|
+
|
|
741
|
+
Query: "finding authentication token handler retry logic"
|
|
742
|
+
▼
|
|
743
|
+
Tokenize:
|
|
744
|
+
["finding", "authentication", "token", "handler", "retry", "logic"]
|
|
745
|
+
▼
|
|
746
|
+
Remove stopwords:
|
|
747
|
+
["authentication", "token", "handler", "retry", "logic"]
|
|
748
|
+
(removed: "finding")
|
|
749
|
+
▼
|
|
750
|
+
Stem/Lemmatize:
|
|
751
|
+
["auth", "token", "handle", "retry", "logic"]
|
|
752
|
+
▼
|
|
753
|
+
Optimized query: 5 terms instead of 6
|
|
754
|
+
Reduced search space by 17%
|
|
755
|
+
|
|
756
|
+
Search cost reduction: 20-30%
|
|
757
|
+
|
|
758
|
+
|
|
759
|
+
┌────────────────────────────────────────────────────────┐
|
|
760
|
+
│ OPTIMIZATION LAYER 2: INDEX STRUCTURE │
|
|
761
|
+
│ (Data Structure Optimization) │
|
|
762
|
+
└────────────────────────────────────────────────────────┘
|
|
763
|
+
|
|
764
|
+
Naive inverted index:
|
|
765
|
+
{
|
|
766
|
+
"auth": [
|
|
767
|
+
{"file": "f1", "line": 45, "block": "b1"},
|
|
768
|
+
{"file": "f1", "line": 50, "block": "b1"},
|
|
769
|
+
{"file": "f2", "line": 100, "block": "b5"},
|
|
770
|
+
... (100 entries per keyword)
|
|
771
|
+
]
|
|
772
|
+
}
|
|
773
|
+
Size: ~1 MB per keyword
|
|
774
|
+
|
|
775
|
+
Optimized inverted index:
|
|
776
|
+
{
|
|
777
|
+
"auth": {
|
|
778
|
+
"files": [compressed bitmap of file IDs],
|
|
779
|
+
"positions": [varint-encoded line numbers]
|
|
780
|
+
}
|
|
781
|
+
}
|
|
782
|
+
Size: ~200 KB per keyword
|
|
783
|
+
|
|
784
|
+
Size reduction: 80%
|
|
785
|
+
Lookup time: O(1) → O(1) ✓
|
|
786
|
+
|
|
787
|
+
|
|
788
|
+
┌────────────────────────────────────────────────────────┐
|
|
789
|
+
│ OPTIMIZATION LAYER 3: QUERY EXECUTION STRATEGY │
|
|
790
|
+
│ (Smart Search Strategy) │
|
|
791
|
+
└────────────────────────────────────────────────────────┘
|
|
792
|
+
|
|
793
|
+
Multi-term query: ["auth", "handler", "retry"]
|
|
794
|
+
|
|
795
|
+
Strategy A (Naive): Search all terms, combine results
|
|
796
|
+
Time: O(log n × 3 terms) = O(3 log n)
|
|
797
|
+
Results: 1000 candidates
|
|
798
|
+
|
|
799
|
+
Strategy B (Optimized): Search smallest first, prune
|
|
800
|
+
1. Count postings for each term:
|
|
801
|
+
- "retry": 45 results (rarest)
|
|
802
|
+
- "auth": 200 results
|
|
803
|
+
- "handler": 500 results
|
|
804
|
+
|
|
805
|
+
2. Search "retry" first (smallest):
|
|
806
|
+
45 candidates
|
|
807
|
+
|
|
808
|
+
3. Intersect with "auth":
|
|
809
|
+
45 × 200 positions checked
|
|
810
|
+
~35 survivors
|
|
811
|
+
|
|
812
|
+
4. Intersect with "handler":
|
|
813
|
+
~25 final results
|
|
814
|
+
|
|
815
|
+
Time: O(log n + intersection cost)
|
|
816
|
+
Results: 25 candidates
|
|
817
|
+
|
|
818
|
+
Speedup: 40x fewer candidates
|
|
819
|
+
|
|
820
|
+
|
|
821
|
+
┌────────────────────────────────────────────────────────┐
|
|
822
|
+
│ OPTIMIZATION LAYER 4: CACHING │
|
|
823
|
+
│ (Multi-Level Cache) │
|
|
824
|
+
└────────────────────────────────────────────────────────┘
|
|
825
|
+
|
|
826
|
+
Query Layers:
|
|
827
|
+
┌─────────────────────────────────┐
|
|
828
|
+
│ Level 1: Query Result Cache │ Fast
|
|
829
|
+
│ (LRU, TTL: 1 hour) │ Hit rate: 30-40%
|
|
830
|
+
│ Size: 50 MB │
|
|
831
|
+
└────────────────────────────────┐
|
|
832
|
+
│
|
|
833
|
+
Miss │ (70% of queries)
|
|
834
|
+
│
|
|
835
|
+
┌─────────────────────────────────┐
|
|
836
|
+
│ Level 2: Block Content Cache │ Medium
|
|
837
|
+
│ (Recently accessed blocks) │ Hit rate: 60-70%
|
|
838
|
+
│ Size: 100 MB │
|
|
839
|
+
└────────────────────────────────┐
|
|
840
|
+
│
|
|
841
|
+
Miss │ (30% of queries)
|
|
842
|
+
│
|
|
843
|
+
┌─────────────────────────────────┐
|
|
844
|
+
│ Level 3: Disk (SQLite/JSON) │ Slow
|
|
845
|
+
│ Size: ~75 MB │ Read time: 50-200 ms
|
|
846
|
+
└─────────────────────────────────┘
|
|
847
|
+
|
|
848
|
+
Overall performance:
|
|
849
|
+
- Cache hit (L1): ~5 ms (~30x faster than a ~150 ms disk read)
|
|
850
|
+
- Cache hit (L2): ~20 ms (~7.5x faster than a ~150 ms disk read)
|
|
851
|
+
- Cache miss: ~150 ms (disk read)
|
|
852
|
+
- Average (with cache): ~30 ms
|
|
853
|
+
|
|
854
|
+
Cache speedup factor: 5-6x
|
|
855
|
+
|
|
856
|
+
|
|
857
|
+
┌────────────────────────────────────────────────────────┐
|
|
858
|
+
│ OPTIMIZATION LAYER 5: INCREMENTAL INDEXING │
|
|
859
|
+
│ (Change-Based Updates) │
|
|
860
|
+
└────────────────────────────────────────────────────────┘
|
|
861
|
+
|
|
862
|
+
Full Reindex (without incremental):
|
|
863
|
+
- 1000 files × 100 blocks/file = 100k blocks
|
|
864
|
+
- Time: 10-15 minutes
|
|
865
|
+
- Cost: 100% reprocessing
|
|
866
|
+
|
|
867
|
+
Incremental Indexing:
|
|
868
|
+
File changed: handler.py (2 KB file)
|
|
869
|
+
▼
|
|
870
|
+
1. Compute hash: abc123def456...
|
|
871
|
+
2. Compare with stored: xyz789...
|
|
872
|
+
3. Hash mismatch: FILE CHANGED
|
|
873
|
+
▼
|
|
874
|
+
4. Load old blocks for handler.py (from registry)
|
|
875
|
+
- Remove 12 old blocks from indexes
|
|
876
|
+
- Time: 10 ms
|
|
877
|
+
▼
|
|
878
|
+
5. Parse new version: 50 ms
|
|
879
|
+
6. Extract new blocks: 10 ms
|
|
880
|
+
7. Add to indexes: 20 ms
|
|
881
|
+
8. Update metadata: 5 ms
|
|
882
|
+
▼
|
|
883
|
+
Total time: ~95 ms
|
|
884
|
+
Speedup: 6000x faster than full reindex!
|
|
885
|
+
|
|
886
|
+
|
|
887
|
+
┌────────────────────────────────────────────────────────┐
|
|
888
|
+
│ OPTIMIZATION LAYER 6: PARALLEL PROCESSING │
|
|
889
|
+
│ (Multi-threaded Indexing) │
|
|
890
|
+
└────────────────────────────────────────────────────────┘
|
|
891
|
+
|
|
892
|
+
Sequential Processing:
|
|
893
|
+
File1 → Parse (100ms) → Extract (20ms) → Index (10ms)
|
|
894
|
+
File2 → Parse (100ms) → Extract (20ms) → Index (10ms)
|
|
895
|
+
File3 → Parse (100ms) → Extract (20ms) → Index (10ms)
|
|
896
|
+
Total: 390 ms
|
|
897
|
+
|
|
898
|
+
Parallel Processing (4 workers):
|
|
899
|
+
Worker1: File1 Parse (100ms)
|
|
900
|
+
Worker2: File2 Parse (100ms)
|
|
901
|
+
Worker3: File3 Parse (100ms)
|
|
902
|
+
Worker4: idle (only 3 files) ← All parallel
|
|
903
|
+
───────────────────────────────────
|
|
904
|
+
100 ms total (parse phase)
|
|
905
|
+
|
|
906
|
+
Worker1: File1 Extract (20ms)
|
|
907
|
+
Worker2: File2 Extract (20ms)
|
|
908
|
+
Worker3: File3 Extract (20ms)
|
|
909
|
+
Worker4: idle (only 3 files)
|
|
910
|
+
───────────────────────────────────
|
|
911
|
+
20 ms total (extract phase)
|
|
912
|
+
|
|
913
|
+
Worker1-4: Batch index (40ms)
|
|
914
|
+
───────────────────────────────────
|
|
915
|
+
40 ms total (index phase)
|
|
916
|
+
|
|
917
|
+
Parallel total: ~160 ms
|
|
918
|
+
Speedup: 2.4x (limited by I/O and batch overhead)
|
|
919
|
+
```
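
The "search smallest first, prune" strategy from Layer 3 can be sketched with plain sets. This is a minimal Python sketch under assumptions: the index is modeled as a hypothetical dict mapping each term to a set of block IDs, which is simpler than the compressed structures described in Layer 2.

```python
def intersect_rarest_first(index, terms):
    """Intersect posting sets starting with the rarest term.

    `index` maps a term to a set of block IDs (a simplified stand-in
    for a real inverted index).
    """
    postings = sorted((index.get(term, set()) for term in terms), key=len)
    if not postings:
        return set()

    survivors = set(postings[0])      # start from the smallest posting set
    for posting in postings[1:]:
        survivors &= posting          # each step can only shrink the set
        if not survivors:
            break                     # nothing left to match: stop early
    return survivors


index = {
    "retry": {1, 2, 3},
    "auth": {2, 3, 4, 5},
    "handler": {3, 5, 6, 7, 8},
}
print(intersect_rarest_first(index, ["auth", "handler", "retry"]))
```

Starting from the rarest term bounds the candidate set by the smallest posting list, so later intersections touch fewer entries.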
|
|
920
|
+
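
The change-detection steps from Layer 5 (hash, compare, replace the file's blocks) can be sketched as follows. This is an illustrative Python sketch, not the package's indexer: SHA-256 is an assumed hash choice, and `update_file`, `stored_hashes`, `block_index`, and the `split_lines` extractor are hypothetical names.

```python
import hashlib


def content_hash(content: bytes) -> str:
    """Hash file content; SHA-256 is an assumption, any stable digest works."""
    return hashlib.sha256(content).hexdigest()


def update_file(path, content, stored_hashes, block_index, extract_blocks):
    """Reindex one file only if its content hash changed.

    Returns True when the file was reindexed, False when it was skipped.
    """
    new_hash = content_hash(content)
    if stored_hashes.get(path) == new_hash:
        return False  # unchanged: skip parsing entirely

    block_index.pop(path, None)                  # remove the file's old blocks
    block_index[path] = extract_blocks(content)  # parse and add new blocks
    stored_hashes[path] = new_hash               # update metadata
    return True


def split_lines(content: bytes):
    """Hypothetical stand-in extractor: one "block" per non-empty line."""
    return [line for line in content.decode().splitlines() if line.strip()]


hashes, index = {}, {}
print(update_file("handler.py", b"def handle(): pass\n", hashes, index, split_lines))
print(update_file("handler.py", b"def handle(): pass\n", hashes, index, split_lines))
```

Only the changed file is re-parsed, so the cost of an update scales with that file rather than with the whole repository.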
|
|
921
|
+
This visual guide provides a comprehensive reference for understanding the system architecture and decision-making processes.
|
|
922
|
+
|