kgmodule-utils 0.3.0__tar.gz → 0.3.1__tar.gz

Files changed (23)
  1. kgmodule_utils-0.3.1/PKG-INFO +270 -0
  2. kgmodule_utils-0.3.1/README.md +245 -0
  3. {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/pyproject.toml +1 -1
  4. {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/__init__.py +3 -3
  5. kgmodule_utils-0.3.0/PKG-INFO +0 -216
  6. kgmodule_utils-0.3.0/README.md +0 -191
  7. kgmodule_utils-0.3.0/src/kg_utils/types/__init__.py +0 -14
  8. kgmodule_utils-0.3.0/src/kg_utils/types/extractor.py +0 -68
  9. kgmodule_utils-0.3.0/src/kg_utils/types/module.py +0 -87
  10. kgmodule_utils-0.3.0/src/kg_utils/types/specs.py +0 -90
  11. {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/LICENSE +0 -0
  12. {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/embed.py +0 -0
  13. {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/embedder.py +0 -0
  14. {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/extractor.py +0 -0
  15. {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/module.py +0 -0
  16. {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/pipeline.py +0 -0
  17. {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/py.typed +0 -0
  18. {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/semantic.py +0 -0
  19. {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/snapshots/__init__.py +0 -0
  20. {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/snapshots/manager.py +0 -0
  21. {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/snapshots/models.py +0 -0
  22. {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/specs.py +0 -0
  23. {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/store.py +0 -0
@@ -0,0 +1,270 @@
1
+ Metadata-Version: 2.4
2
+ Name: kgmodule-utils
3
+ Version: 0.3.1
4
+ Summary: Shared types, graph store, semantic index, and pipeline base for the KGModule SDK
5
+ License: Elastic-2.0
6
+ License-File: LICENSE
7
+ Keywords: knowledge-graph,kgmodule,sdk,types,snapshots
8
+ Author: Eric G. Suchanek, PhD
9
+ Author-email: suchanek@flux-frontiers.com
10
+ Requires-Python: >=3.12,<3.14
11
+ Classifier: Development Status :: 4 - Beta
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: Programming Language :: Python :: 3
14
+ Classifier: Programming Language :: Python :: 3.12
15
+ Classifier: Programming Language :: Python :: 3.13
16
+ Provides-Extra: semantic
17
+ Requires-Dist: lancedb (>=0.19.0) ; extra == "semantic"
18
+ Requires-Dist: numpy (>=1.24.0) ; extra == "semantic"
19
+ Requires-Dist: sentence-transformers (>=5.4.1) ; extra == "semantic"
20
+ Requires-Dist: torch (>=2.5.1) ; extra == "semantic"
21
+ Requires-Dist: transformers (>=4.40.0,<4.57) ; extra == "semantic"
22
+ Project-URL: Repository, https://github.com/Flux-Frontiers/kg_utils
23
+ Description-Content-Type: text/markdown
24
+
25
+
26
+ [![Python](https://img.shields.io/badge/python-3.12%20%7C%203.13-blue.svg)](https://www.python.org/)
27
+ [![License: Elastic-2.0](https://img.shields.io/badge/License-Elastic%202.0-blue.svg)](https://www.elastic.co/licensing/elastic-license)
28
+ [![Version](https://img.shields.io/badge/version-0.3.1-blue.svg)](https://github.com/Flux-Frontiers/KG_utils/releases)
29
+ [![CI](https://github.com/Flux-Frontiers/KG_utils/actions/workflows/ci.yml/badge.svg)](https://github.com/Flux-Frontiers/KG_utils/actions/workflows/ci.yml)
30
+ [![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
31
+
32
+ # kgmodule-utils
33
+
34
+ **kgmodule-utils** — Shared graph store, semantic index, pipeline base, and snapshot infrastructure for the KGModule SDK.
35
+
36
+ *Author: Eric G. Suchanek, PhD*
37
+
38
+ *Flux-Frontiers, Liberty TWP, OH*
39
+
40
+ ---
41
+
42
+ ## Overview
43
+
44
+ kgmodule-utils is the **shared SDK layer** for the Flux-Frontiers knowledge-graph ecosystem. It provides everything a domain KG module needs — from type abstractions and SQLite graph storage through LanceDB vector indexing and a full build/query/pack pipeline — so domain authors implement only what is specific to their source domain.
45
+
46
+ Every KGModule implementation — [PyCodeKG](https://github.com/Flux-Frontiers/pycode_kg), [DocKG](https://github.com/Flux-Frontiers/doc_kg), and others — subclasses `KGModule` from here and implements exactly three methods: `make_extractor()`, `kind()`, and `analyze()`.
47
+
48
+ ---
49
+
50
+ ## Features
51
+
52
+ - **`kg_utils.specs`** — `NodeSpec`, `EdgeSpec`, `BuildStats`, `QueryResult`, `SnippetPack` dataclasses
53
+ - **`kg_utils.extractor`** — `KGExtractor` ABC: `extract()`, `node_kinds()`, `edge_kinds()`, `coverage_metric()`
54
+ - **`kg_utils.store`** — `GraphStore`: SQLite-backed node/edge store with BFS expansion, symbol resolution, caller lookup, and provenance recording
55
+ - **`kg_utils.semantic`** — `SemanticIndex` (LanceDB), `SentenceTransformerEmbedder`, `SeedHit`, model registry, `resolve_model_path()`
56
+ - **`kg_utils.pipeline`** — `KGModule`: full build → query → pack pipeline base with hybrid semantic + lexical reranking and snippet extraction
57
+ - **`kg_utils.embedder`** — `get_embedder()`, `wrap_embedder()`, `load_sentence_transformer()` factory functions
58
+ - **`kg_utils.embed`** — `Embedder` protocol, `DEFAULT_MODEL`, `KNOWN_MODELS`, `resolve_model_path()`
59
+ - **`kg_utils.snapshots`** — `Snapshot`, `SnapshotManager`, `SnapshotManifest` for temporal metric tracking
60
+
61
+ ---
62
+
63
+ ## Installation
64
+
65
+ **Requirements:** Python ≥ 3.12, < 3.14
66
+
67
+ ### Core only (stdlib, no optional deps)
68
+
69
+ ```bash
70
+ pip install kgmodule-utils
71
+ ```
72
+
73
+ ### With semantic search (LanceDB + sentence-transformers)
74
+
75
+ ```bash
76
+ pip install 'kgmodule-utils[semantic]'
77
+ ```
78
+
79
+ ### In a Poetry project
80
+
81
+ ```toml
82
+ [tool.poetry.dependencies]
83
+ kgmodule-utils = { version = ">=0.3.1", extras = ["semantic"] }
84
+ ```
85
+
86
+ ---
87
+
88
+ ## Quick Start
89
+
90
+ ### Build a domain KG module
91
+
92
+ ```python
93
+ from collections.abc import Iterator
94
+ from pathlib import Path
95
+
96
+ from kg_utils.extractor import KGExtractor
97
+ from kg_utils.pipeline import KGModule
98
+ from kg_utils.specs import EdgeSpec, NodeSpec
99
+
100
+
101
+ class MyExtractor(KGExtractor):
102
+ def node_kinds(self) -> list[str]:
103
+ return ["document", "section"]
104
+
105
+ def edge_kinds(self) -> list[str]:
106
+ return ["CONTAINS"]
107
+
108
+ def meaningful_node_kinds(self) -> list[str]:
109
+ return ["section"]
110
+
111
+ def extract(self) -> Iterator[NodeSpec | EdgeSpec]:
112
+ for doc in self.repo_path.glob("**/*.md"):
113
+ doc_id = f"document:{doc}"
114
+ yield NodeSpec(node_id=doc_id, kind="document",
115
+ name=doc.stem, qualname=doc.stem,
116
+ source_path=str(doc))
117
+ # … yield sections and CONTAINS edges
118
+
119
+
120
+ class MyKG(KGModule):
121
+ _default_dir = ".mykg"
122
+
123
+ def make_extractor(self) -> KGExtractor:
124
+ return MyExtractor(self.repo_root)
125
+
126
+ def kind(self) -> str:
127
+ return "my"
128
+
129
+ def analyze(self) -> str:
130
+ s = self.stats()
131
+ return f"# MyKG\nnodes={s['total_nodes']}"
132
+
133
+
134
+ # Build and query
135
+ kg = MyKG("/path/to/repo")
136
+ kg.build(wipe=True)
137
+
138
+ result = kg.query("authentication flow", k=8, hop=1)
139
+ pack = kg.pack("error handling", max_nodes=10)
140
+ print(pack.to_markdown())
141
+ ```
142
+
143
+ ### Track metrics over time
144
+
145
+ ```python
146
+ from kg_utils.snapshots import SnapshotManager
147
+
148
+ mgr = SnapshotManager(".mykg/snapshots", package_name="my-kg")
149
+
150
+ snapshot = mgr.capture(
151
+ version="1.0.0",
152
+ branch="main",
153
+ graph_stats_dict=kg.stats(),
154
+ )
155
+ mgr.save_snapshot(snapshot)
156
+
157
+ snaps = mgr.list_snapshots(limit=5)
158
+ delta = mgr.diff_snapshots(snaps[-1]["key"], snaps[0]["key"])
159
+ ```
160
+
161
+ ---
162
+
163
+ ## API Reference
164
+
165
+ ### `kg_utils.specs`
166
+
167
+ | Class | Description |
168
+ |---|---|
169
+ | `NodeSpec` | Graph node: `node_id`, `kind`, `name`, `qualname`, `source_path`, `lineno`, `end_lineno`, `docstring`, `metadata` |
170
+ | `EdgeSpec` | Graph edge: `source_id`, `target_id`, `relation`, `weight`, `metadata` |
171
+ | `BuildStats` | Build result: node/edge counts, indexed rows, embedding dim |
172
+ | `QueryResult` | Query result: nodes, edges, seeds, hop, relevance metadata |
173
+ | `SnippetPack` | Pack result: nodes with snippets, `to_markdown()`, `to_json()`, `save()` |
174
+
175
+ ### `kg_utils.extractor`
176
+
177
+ | Class | Description |
178
+ |---|---|
179
+ | `KGExtractor` | ABC — implement `node_kinds()`, `edge_kinds()`, `extract()` |
180
+
181
+ ### `kg_utils.store`
182
+
183
+ | Class | Description |
184
+ |---|---|
185
+ | `GraphStore` | SQLite persistence: `write()`, `expand()`, `query_nodes()`, `resolve_symbols()`, `callers_of()`, `stats()` |
186
+
187
+ ### `kg_utils.semantic`
188
+
189
+ | Class / function | Description |
190
+ |---|---|
191
+ | `SemanticIndex` | LanceDB vector index: `build()`, `search()` |
192
+ | `SentenceTransformerEmbedder` | Local embedding via sentence-transformers |
193
+ | `resolve_model_path()` | Resolve model name / alias to local cache path |
194
+ | `suppress_ingestion_logging()` | Silence verbose HF / tqdm output during ingestion |
195
+
196
+ ### `kg_utils.pipeline`
197
+
198
+ | Class | Description |
199
+ |---|---|
200
+ | `KGModule` | Concrete base — implement `make_extractor()`, `kind()`, `analyze()`; get `build()`, `query()`, `pack()`, `stats()` for free |
201
+
202
+ ### `kg_utils.snapshots`
203
+
204
+ | Class | Description |
205
+ |---|---|
206
+ | `Snapshot` | Temporal snapshot keyed by git tree hash with metrics and deltas |
207
+ | `SnapshotManager` | Capture, persist, load, list, diff, and prune snapshots |
208
+ | `SnapshotManifest` | Fast-lookup index with format versioning |
209
+
210
+ ---
211
+
212
+ ## Project Structure
213
+
214
+ ```
215
+ KG_utils/
216
+ ├── pyproject.toml
217
+ ├── src/
218
+ │ └── kg_utils/
219
+ │ ├── __init__.py
220
+ │ ├── specs.py # NodeSpec, EdgeSpec, BuildStats, QueryResult, SnippetPack
221
+ │ ├── extractor.py # KGExtractor ABC
222
+ │ ├── store.py # GraphStore (SQLite)
223
+ │ ├── semantic.py # SemanticIndex, SentenceTransformerEmbedder, SeedHit
224
+ │ ├── pipeline.py # KGModule concrete base class
225
+ │ ├── module.py # Re-export shim
226
+ │ ├── embed.py # Embedder protocol, model registry
227
+ │ ├── embedder.py # SentenceTransformerEmbedder factory functions
228
+ │ └── snapshots/
229
+ │ ├── __init__.py
230
+ │ ├── models.py # Snapshot, SnapshotManifest, PruneResult
231
+ │ └── manager.py # SnapshotManager
232
+ └── tests/
233
+ ├── test_store.py # GraphStore unit tests
234
+ ├── test_pipeline_utils.py # Pipeline utility function tests
235
+ ├── test_pipeline_module.py # End-to-end integration tests (--integration)
236
+ ├── test_types.py # Spec dataclass and KGExtractor tests
237
+ ├── test_snapshots.py # Snapshot lifecycle tests
238
+ └── test_integration.py # Cross-module integration tests
239
+ ```
240
+
241
+ ---
242
+
243
+ ## Development
244
+
245
+ ```bash
246
+ git clone https://github.com/Flux-Frontiers/KG_utils.git
247
+ cd KG_utils
248
+ poetry install --with dev
249
+ ```
250
+
251
+ Run the fast test suite (no model downloads):
252
+
253
+ ```bash
254
+ poetry run pytest -m "not integration"
255
+ ```
256
+
257
+ Run all tests including semantic/integration (requires `[semantic]` extra):
258
+
259
+ ```bash
260
+ poetry run pytest
261
+ ```
262
+
263
+ ---
264
+
265
+ ## License
266
+
267
+ [Elastic License 2.0](https://www.elastic.co/licensing/elastic-license) — see [LICENSE](LICENSE).
268
+
269
+ Free to use, modify, and distribute. You may not offer the software as a hosted or managed service to third parties. Commercial use internally is permitted.
270
+
@@ -0,0 +1,245 @@
1
+
2
+ [![Python](https://img.shields.io/badge/python-3.12%20%7C%203.13-blue.svg)](https://www.python.org/)
3
+ [![License: Elastic-2.0](https://img.shields.io/badge/License-Elastic%202.0-blue.svg)](https://www.elastic.co/licensing/elastic-license)
4
+ [![Version](https://img.shields.io/badge/version-0.3.1-blue.svg)](https://github.com/Flux-Frontiers/KG_utils/releases)
5
+ [![CI](https://github.com/Flux-Frontiers/KG_utils/actions/workflows/ci.yml/badge.svg)](https://github.com/Flux-Frontiers/KG_utils/actions/workflows/ci.yml)
6
+ [![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
7
+
8
+ # kgmodule-utils
9
+
10
+ **kgmodule-utils** — Shared graph store, semantic index, pipeline base, and snapshot infrastructure for the KGModule SDK.
11
+
12
+ *Author: Eric G. Suchanek, PhD*
13
+
14
+ *Flux-Frontiers, Liberty TWP, OH*
15
+
16
+ ---
17
+
18
+ ## Overview
19
+
20
+ kgmodule-utils is the **shared SDK layer** for the Flux-Frontiers knowledge-graph ecosystem. It provides everything a domain KG module needs — from type abstractions and SQLite graph storage through LanceDB vector indexing and a full build/query/pack pipeline — so domain authors implement only what is specific to their source domain.
21
+
22
+ Every KGModule implementation — [PyCodeKG](https://github.com/Flux-Frontiers/pycode_kg), [DocKG](https://github.com/Flux-Frontiers/doc_kg), and others — subclasses `KGModule` from here and implements exactly three methods: `make_extractor()`, `kind()`, and `analyze()`.
23
+
24
+ ---
25
+
26
+ ## Features
27
+
28
+ - **`kg_utils.specs`** — `NodeSpec`, `EdgeSpec`, `BuildStats`, `QueryResult`, `SnippetPack` dataclasses
29
+ - **`kg_utils.extractor`** — `KGExtractor` ABC: `extract()`, `node_kinds()`, `edge_kinds()`, `coverage_metric()`
30
+ - **`kg_utils.store`** — `GraphStore`: SQLite-backed node/edge store with BFS expansion, symbol resolution, caller lookup, and provenance recording
31
+ - **`kg_utils.semantic`** — `SemanticIndex` (LanceDB), `SentenceTransformerEmbedder`, `SeedHit`, model registry, `resolve_model_path()`
32
+ - **`kg_utils.pipeline`** — `KGModule`: full build → query → pack pipeline base with hybrid semantic + lexical reranking and snippet extraction
33
+ - **`kg_utils.embedder`** — `get_embedder()`, `wrap_embedder()`, `load_sentence_transformer()` factory functions
34
+ - **`kg_utils.embed`** — `Embedder` protocol, `DEFAULT_MODEL`, `KNOWN_MODELS`, `resolve_model_path()`
35
+ - **`kg_utils.snapshots`** — `Snapshot`, `SnapshotManager`, `SnapshotManifest` for temporal metric tracking
36
+
37
+ ---
38
+
39
+ ## Installation
40
+
41
+ **Requirements:** Python ≥ 3.12, < 3.14
42
+
43
+ ### Core only (stdlib, no optional deps)
44
+
45
+ ```bash
46
+ pip install kgmodule-utils
47
+ ```
48
+
49
+ ### With semantic search (LanceDB + sentence-transformers)
50
+
51
+ ```bash
52
+ pip install 'kgmodule-utils[semantic]'
53
+ ```
54
+
55
+ ### In a Poetry project
56
+
57
+ ```toml
58
+ [tool.poetry.dependencies]
59
+ kgmodule-utils = { version = ">=0.3.1", extras = ["semantic"] }
60
+ ```
61
+
62
+ ---
63
+
64
+ ## Quick Start
65
+
66
+ ### Build a domain KG module
67
+
68
+ ```python
69
+ from collections.abc import Iterator
70
+ from pathlib import Path
71
+
72
+ from kg_utils.extractor import KGExtractor
73
+ from kg_utils.pipeline import KGModule
74
+ from kg_utils.specs import EdgeSpec, NodeSpec
75
+
76
+
77
+ class MyExtractor(KGExtractor):
78
+ def node_kinds(self) -> list[str]:
79
+ return ["document", "section"]
80
+
81
+ def edge_kinds(self) -> list[str]:
82
+ return ["CONTAINS"]
83
+
84
+ def meaningful_node_kinds(self) -> list[str]:
85
+ return ["section"]
86
+
87
+ def extract(self) -> Iterator[NodeSpec | EdgeSpec]:
88
+ for doc in self.repo_path.glob("**/*.md"):
89
+ doc_id = f"document:{doc}"
90
+ yield NodeSpec(node_id=doc_id, kind="document",
91
+ name=doc.stem, qualname=doc.stem,
92
+ source_path=str(doc))
93
+ # … yield sections and CONTAINS edges
94
+
95
+
96
+ class MyKG(KGModule):
97
+ _default_dir = ".mykg"
98
+
99
+ def make_extractor(self) -> KGExtractor:
100
+ return MyExtractor(self.repo_root)
101
+
102
+ def kind(self) -> str:
103
+ return "my"
104
+
105
+ def analyze(self) -> str:
106
+ s = self.stats()
107
+ return f"# MyKG\nnodes={s['total_nodes']}"
108
+
109
+
110
+ # Build and query
111
+ kg = MyKG("/path/to/repo")
112
+ kg.build(wipe=True)
113
+
114
+ result = kg.query("authentication flow", k=8, hop=1)
115
+ pack = kg.pack("error handling", max_nodes=10)
116
+ print(pack.to_markdown())
117
+ ```
118
+
119
+ ### Track metrics over time
120
+
121
+ ```python
122
+ from kg_utils.snapshots import SnapshotManager
123
+
124
+ mgr = SnapshotManager(".mykg/snapshots", package_name="my-kg")
125
+
126
+ snapshot = mgr.capture(
127
+ version="1.0.0",
128
+ branch="main",
129
+ graph_stats_dict=kg.stats(),
130
+ )
131
+ mgr.save_snapshot(snapshot)
132
+
133
+ snaps = mgr.list_snapshots(limit=5)
134
+ delta = mgr.diff_snapshots(snaps[-1]["key"], snaps[0]["key"])
135
+ ```
136
+
137
+ ---
138
+
139
+ ## API Reference
140
+
141
+ ### `kg_utils.specs`
142
+
143
+ | Class | Description |
144
+ |---|---|
145
+ | `NodeSpec` | Graph node: `node_id`, `kind`, `name`, `qualname`, `source_path`, `lineno`, `end_lineno`, `docstring`, `metadata` |
146
+ | `EdgeSpec` | Graph edge: `source_id`, `target_id`, `relation`, `weight`, `metadata` |
147
+ | `BuildStats` | Build result: node/edge counts, indexed rows, embedding dim |
148
+ | `QueryResult` | Query result: nodes, edges, seeds, hop, relevance metadata |
149
+ | `SnippetPack` | Pack result: nodes with snippets, `to_markdown()`, `to_json()`, `save()` |
150
+
151
+ ### `kg_utils.extractor`
152
+
153
+ | Class | Description |
154
+ |---|---|
155
+ | `KGExtractor` | ABC — implement `node_kinds()`, `edge_kinds()`, `extract()` |
156
+
157
+ ### `kg_utils.store`
158
+
159
+ | Class | Description |
160
+ |---|---|
161
+ | `GraphStore` | SQLite persistence: `write()`, `expand()`, `query_nodes()`, `resolve_symbols()`, `callers_of()`, `stats()` |
162
+
163
+ ### `kg_utils.semantic`
164
+
165
+ | Class / function | Description |
166
+ |---|---|
167
+ | `SemanticIndex` | LanceDB vector index: `build()`, `search()` |
168
+ | `SentenceTransformerEmbedder` | Local embedding via sentence-transformers |
169
+ | `resolve_model_path()` | Resolve model name / alias to local cache path |
170
+ | `suppress_ingestion_logging()` | Silence verbose HF / tqdm output during ingestion |
171
+
172
+ ### `kg_utils.pipeline`
173
+
174
+ | Class | Description |
175
+ |---|---|
176
+ | `KGModule` | Concrete base — implement `make_extractor()`, `kind()`, `analyze()`; get `build()`, `query()`, `pack()`, `stats()` for free |
177
+
178
+ ### `kg_utils.snapshots`
179
+
180
+ | Class | Description |
181
+ |---|---|
182
+ | `Snapshot` | Temporal snapshot keyed by git tree hash with metrics and deltas |
183
+ | `SnapshotManager` | Capture, persist, load, list, diff, and prune snapshots |
184
+ | `SnapshotManifest` | Fast-lookup index with format versioning |
185
+
186
+ ---
187
+
188
+ ## Project Structure
189
+
190
+ ```
191
+ KG_utils/
192
+ ├── pyproject.toml
193
+ ├── src/
194
+ │ └── kg_utils/
195
+ │ ├── __init__.py
196
+ │ ├── specs.py # NodeSpec, EdgeSpec, BuildStats, QueryResult, SnippetPack
197
+ │ ├── extractor.py # KGExtractor ABC
198
+ │ ├── store.py # GraphStore (SQLite)
199
+ │ ├── semantic.py # SemanticIndex, SentenceTransformerEmbedder, SeedHit
200
+ │ ├── pipeline.py # KGModule concrete base class
201
+ │ ├── module.py # Re-export shim
202
+ │ ├── embed.py # Embedder protocol, model registry
203
+ │ ├── embedder.py # SentenceTransformerEmbedder factory functions
204
+ │ └── snapshots/
205
+ │ ├── __init__.py
206
+ │ ├── models.py # Snapshot, SnapshotManifest, PruneResult
207
+ │ └── manager.py # SnapshotManager
208
+ └── tests/
209
+ ├── test_store.py # GraphStore unit tests
210
+ ├── test_pipeline_utils.py # Pipeline utility function tests
211
+ ├── test_pipeline_module.py # End-to-end integration tests (--integration)
212
+ ├── test_types.py # Spec dataclass and KGExtractor tests
213
+ ├── test_snapshots.py # Snapshot lifecycle tests
214
+ └── test_integration.py # Cross-module integration tests
215
+ ```
216
+
217
+ ---
218
+
219
+ ## Development
220
+
221
+ ```bash
222
+ git clone https://github.com/Flux-Frontiers/KG_utils.git
223
+ cd KG_utils
224
+ poetry install --with dev
225
+ ```
226
+
227
+ Run the fast test suite (no model downloads):
228
+
229
+ ```bash
230
+ poetry run pytest -m "not integration"
231
+ ```
232
+
233
+ Run all tests including semantic/integration (requires `[semantic]` extra):
234
+
235
+ ```bash
236
+ poetry run pytest
237
+ ```
238
+
239
+ ---
240
+
241
+ ## License
242
+
243
+ [Elastic License 2.0](https://www.elastic.co/licensing/elastic-license) — see [LICENSE](LICENSE).
244
+
245
+ Free to use, modify, and distribute. You may not offer the software as a hosted or managed service to third parties. Commercial use internally is permitted.
@@ -10,7 +10,7 @@ build-backend = "poetry.core.masonry.api"
10
10
 
11
11
  [project]
12
12
  name = "kgmodule-utils"
13
- version = "0.3.0"
13
+ version = "0.3.1"
14
14
  description = "Shared types, graph store, semantic index, and pipeline base for the KGModule SDK"
15
15
  readme = "README.md"
16
16
  license = { text = "Elastic-2.0" }
@@ -1,8 +1,8 @@
1
1
  """kg_utils — Shared types, store, semantic index, and pipeline base for the KGModule SDK.
2
2
 
3
3
  Sub-packages / modules:
4
- kg_utils.types — NodeSpec, EdgeSpec, BuildStats, QueryResult, SnippetPack,
5
- KGExtractor (abstract), KGModule (abstract interface).
4
+ kg_utils.specs — NodeSpec, EdgeSpec, BuildStats, QueryResult, SnippetPack.
5
+ kg_utils.extractor — KGExtractor abstract base class.
6
6
  kg_utils.store — GraphStore: SQLite-backed authoritative node/edge store.
7
7
  kg_utils.semantic — Embedder, SentenceTransformerEmbedder, SemanticIndex, SeedHit.
8
8
  kg_utils.pipeline — KGModule: concrete base class with full build/query/pack pipeline.
@@ -17,4 +17,4 @@ Optional extras
17
17
  pip install 'kgmodule-utils[semantic]' # lancedb + sentence-transformers
18
18
  """
19
19
 
20
- __version__ = "0.3.0"
20
+ __version__ = "0.3.1"
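Note on the import paths touched by this hunk: the file list shows `src/kg_utils/types/` removed in 0.3.1, while `src/kg_utils/specs.py` and `src/kg_utils/extractor.py` are unchanged between the two versions, so code that still imports from `kg_utils.types` would likely need to switch to the flat modules. The sketch below is one hedged way for downstream code to tolerate both layouts. `import_first` is a hypothetical helper and not part of kgmodule-utils, and the idea that some older release exposes only `kg_utils.types` is an assumption this diff does not confirm.

```python
import importlib
from types import ModuleType


def import_first(*candidates: str) -> ModuleType:
    """Return the first module in `candidates` that can be imported.

    Hypothetical helper (not part of kgmodule-utils): tries each dotted
    module path in order and raises ImportError if none can be loaded.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # this path is unavailable; try the next candidate
    raise ImportError(f"none of these modules could be imported: {candidates}")


# Intended use, assuming kgmodule-utils is installed:
#   specs = import_first("kg_utils.specs", "kg_utils.types")
#   NodeSpec, EdgeSpec = specs.NodeSpec, specs.EdgeSpec
```

If only 0.3.0 and later need to be supported, importing directly from `kg_utils.specs` and `kg_utils.extractor` should be enough, since both files appear in both versions of the file list.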
@@ -1,216 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: kgmodule-utils
3
- Version: 0.3.0
4
- Summary: Shared types, graph store, semantic index, and pipeline base for the KGModule SDK
5
- License: Elastic-2.0
6
- License-File: LICENSE
7
- Keywords: knowledge-graph,kgmodule,sdk,types,snapshots
8
- Author: Eric G. Suchanek, PhD
9
- Author-email: suchanek@flux-frontiers.com
10
- Requires-Python: >=3.12,<3.14
11
- Classifier: Development Status :: 4 - Beta
12
- Classifier: Intended Audience :: Developers
13
- Classifier: Programming Language :: Python :: 3
14
- Classifier: Programming Language :: Python :: 3.12
15
- Classifier: Programming Language :: Python :: 3.13
16
- Provides-Extra: semantic
17
- Requires-Dist: lancedb (>=0.19.0) ; extra == "semantic"
18
- Requires-Dist: numpy (>=1.24.0) ; extra == "semantic"
19
- Requires-Dist: sentence-transformers (>=5.4.1) ; extra == "semantic"
20
- Requires-Dist: torch (>=2.5.1) ; extra == "semantic"
21
- Requires-Dist: transformers (>=4.40.0,<4.57) ; extra == "semantic"
22
- Project-URL: Repository, https://github.com/Flux-Frontiers/kg_utils
23
- Description-Content-Type: text/markdown
24
-
25
-
26
- [![Python](https://img.shields.io/badge/python-3.12%20%7C%203.13-blue.svg)](https://www.python.org/)
27
- [![License: Elastic-2.0](https://img.shields.io/badge/License-Elastic%202.0-blue.svg)](https://www.elastic.co/licensing/elastic-license)
28
- [![Version](https://img.shields.io/badge/version-0.2.0-blue.svg)](https://github.com/Flux-Frontiers/KG_utils/releases)
29
- [![CI](https://github.com/Flux-Frontiers/KG_utils/actions/workflows/ci.yml/badge.svg)](https://github.com/Flux-Frontiers/KG_utils/actions/workflows/ci.yml)
30
- [![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
31
-
32
- # kgmodule-utils
33
-
34
- **kgmodule-utils** — Shared types and snapshot infrastructure for the KGModule SDK.
35
-
36
- *Author: Eric G. Suchanek, PhD*
37
-
38
- *Flux-Frontiers, Liberty TWP, OH*
39
-
40
- ---
41
-
42
- ## Overview
43
-
44
- kgmodule-utils is the **zero-dependency foundation package** for the Flux-Frontiers knowledge-graph ecosystem. It provides the canonical type abstractions and temporal snapshot infrastructure that all KGModule implementations — [PyCodeKG](https://github.com/Flux-Frontiers/pycode_kg), [FTreeKG](https://github.com/Flux-Frontiers/ftree_kg), [DocKG](https://github.com/Flux-Frontiers/doc_kg), [AgentKG](https://github.com/Flux-Frontiers/agent_kg) — depend on.
45
-
46
- Every KGModule shares the same `NodeSpec`, `EdgeSpec`, `KGExtractor`, and `KGModule` base classes defined here, ensuring consistent interfaces across the ecosystem. The snapshot subsystem enables temporal metric tracking, delta comparison, and pruning across git commits.
47
-
48
- ---
49
-
50
- ## Features
51
-
52
- - **Core type abstractions** — `NodeSpec`, `EdgeSpec`, `QueryResult`, `SnippetPack` dataclasses for knowledge-graph nodes, edges, and query results
53
- - **KGExtractor base class** — Abstract interface for domain-specific extractors with `extract()`, `node_kinds()`, `edge_kinds()`, and `coverage_metric()`
54
- - **KGModule base class** — Abstract interface for knowledge-graph modules with `build()`, `query()`, `pack()`, `stats()`, and `analyze()`
55
- - **Snapshot models** — `Snapshot` dataclass keyed by git tree hash with free-form metrics, hotspots, issues, and delta tracking
56
- - **SnapshotManager** — Capture, persist, load, list, diff, and prune snapshots with automatic deduplication and delta computation
57
- - **SnapshotManifest** — Fast-lookup index of all snapshots with format versioning
58
- - **Zero dependencies** — Stdlib-only; no external packages required at runtime
59
-
60
- ---
61
-
62
- ## Installation
63
-
64
- **Requirements:** Python ≥ 3.12, < 3.14
65
-
66
- ### Standalone (pip)
67
-
68
- ```bash
69
- pip install kgmodule-utils
70
- ```
71
-
72
- ### Existing Poetry project
73
-
74
- ```bash
75
- poetry add kgmodule-utils
76
- ```
77
-
78
- Or declare it directly in your `pyproject.toml`:
79
-
80
- ```toml
81
- [tool.poetry.dependencies]
82
- kgmodule-utils = "^0.2.0"
83
- ```
84
-
85
- ---
86
-
87
- ## Quick Start
88
-
89
- ### Types — Define a KGModule
90
-
91
- ```python
92
- from kg_utils.types import NodeSpec, EdgeSpec, KGExtractor, KGModule
93
-
94
- class MyExtractor(KGExtractor):
95
- def node_kinds(self) -> list[str]:
96
- return ["module", "function", "class"]
97
-
98
- def edge_kinds(self) -> list[str]:
99
- return ["CONTAINS", "CALLS", "IMPORTS"]
100
-
101
- def extract(self, source_root: str):
102
- # Yield NodeSpec and EdgeSpec objects from your domain
103
- yield NodeSpec(
104
- node_id="fn:main:hello",
105
- kind="function",
106
- name="hello",
107
- qualname="main.hello",
108
- source_path="main.py",
109
- docstring="Greet the user.",
110
- )
111
- yield EdgeSpec(
112
- source_id="mod:main",
113
- target_id="fn:main:hello",
114
- relation="CONTAINS",
115
- )
116
- ```
117
-
118
- ### Snapshots — Track metrics over time
119
-
120
- ```python
121
- from kg_utils.snapshots import SnapshotManager
122
-
123
- mgr = SnapshotManager(snapshots_dir=".my_kg/snapshots", package_name="my-kg")
124
-
125
- # Capture a snapshot from current metrics
126
- snapshot = mgr.capture(metrics={
127
- "total_nodes": 142,
128
- "total_edges": 387,
129
- "coverage": 0.78,
130
- })
131
-
132
- # Save with automatic deduplication
133
- mgr.save_snapshot(snapshot)
134
-
135
- # List and compare
136
- snaps = mgr.list_snapshots(limit=5)
137
- delta = mgr.diff_snapshots(key_a=snaps[0].key, key_b=snaps[-1].key)
138
- ```
139
-
140
- ---
141
-
142
- ## API Reference
143
-
144
- ### `kg_utils.types`
145
-
146
- | Class | Description |
147
- |---|---|
148
- | `NodeSpec` | Dataclass for KG nodes: `node_id`, `kind`, `name`, `qualname`, `source_path`, `docstring` |
149
- | `EdgeSpec` | Dataclass for KG edges: `source_id`, `target_id`, `relation` |
150
- | `QueryResult` | Container for query responses with nodes, edges, and metadata |
151
- | `SnippetPack` | Extended result container with source-code snippets |
152
- | `KGExtractor` | Abstract base class for domain extractors |
153
- | `KGModule` | Abstract base class for knowledge-graph modules |
154
-
155
- ### `kg_utils.snapshots`
156
-
157
- | Class | Description |
158
- |---|---|
159
- | `Snapshot` | Temporal snapshot keyed by git tree hash with free-form metrics and deltas |
160
- | `SnapshotManager` | Capture, persist, load, list, diff, and prune snapshots |
161
- | `SnapshotManifest` | Index of all snapshots with format versioning and fast lookup |
162
- | `PruneResult` | Summary of pruning operations: removed, orphaned, broken entries |
163
-
164
- ---
165
-
166
- ## Project Structure
167
-
168
- ```
169
- KG_utils/
170
- ├── LICENSE
171
- ├── README.md
172
- ├── pyproject.toml
173
- ├── pytest.ini
174
- ├── src/
175
- │ └── kg_utils/
176
- │ ├── __init__.py
177
- │ ├── py.typed # PEP 561 marker
178
- │ ├── types/
179
- │ │ ├── __init__.py # Public re-exports
180
- │ │ ├── specs.py # NodeSpec, EdgeSpec, QueryResult, SnippetPack
181
- │ │ ├── extractor.py # KGExtractor ABC
182
- │ │ └── module.py # KGModule ABC
183
- │ └── snapshots/
184
- │ ├── __init__.py # Public re-exports
185
- │ ├── models.py # Snapshot, SnapshotManifest, PruneResult
186
- │ └── manager.py # SnapshotManager
187
- └── tests/
188
- ├── __init__.py
189
- ├── test_types.py
190
- └── test_snapshots.py
191
- ```
192
-
193
- ---
194
-
195
- ## Development
196
-
197
- ```bash
198
- git clone https://github.com/Flux-Frontiers/KG_utils.git
199
- cd KG_utils
200
- poetry install --with dev
201
- ```
202
-
203
- Run the test suite:
204
-
205
- ```bash
206
- poetry run pytest
207
- ```
208
-
209
- ---
210
-
211
- ## License
212
-
213
- [Elastic License 2.0](https://www.elastic.co/licensing/elastic-license) — see [LICENSE](LICENSE).
214
-
215
- Free to use, modify, and distribute. You may not offer the software as a hosted or managed service to third parties. Commercial use internally is permitted.
216
-
@@ -1,191 +0,0 @@
-
- [![Python](https://img.shields.io/badge/python-3.12%20%7C%203.13-blue.svg)](https://www.python.org/)
- [![License: Elastic-2.0](https://img.shields.io/badge/License-Elastic%202.0-blue.svg)](https://www.elastic.co/licensing/elastic-license)
- [![Version](https://img.shields.io/badge/version-0.2.0-blue.svg)](https://github.com/Flux-Frontiers/KG_utils/releases)
- [![CI](https://github.com/Flux-Frontiers/KG_utils/actions/workflows/ci.yml/badge.svg)](https://github.com/Flux-Frontiers/KG_utils/actions/workflows/ci.yml)
- [![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
-
- # kgmodule-utils
-
- **kgmodule-utils** — Shared types and snapshot infrastructure for the KGModule SDK.
-
- *Author: Eric G. Suchanek, PhD*
-
- *Flux-Frontiers, Liberty TWP, OH*
-
- ---
-
- ## Overview
-
- kgmodule-utils is the **zero-dependency foundation package** for the Flux-Frontiers knowledge-graph ecosystem. It provides the canonical type abstractions and temporal snapshot infrastructure that all KGModule implementations — [PyCodeKG](https://github.com/Flux-Frontiers/pycode_kg), [FTreeKG](https://github.com/Flux-Frontiers/ftree_kg), [DocKG](https://github.com/Flux-Frontiers/doc_kg), [AgentKG](https://github.com/Flux-Frontiers/agent_kg) — depend on.
-
- Every KGModule shares the same `NodeSpec`, `EdgeSpec`, `KGExtractor`, and `KGModule` base classes defined here, ensuring consistent interfaces across the ecosystem. The snapshot subsystem enables temporal metric tracking, delta comparison, and pruning across git commits.
-
- ---
-
- ## Features
-
- - **Core type abstractions** — `NodeSpec`, `EdgeSpec`, `QueryResult`, `SnippetPack` dataclasses for knowledge-graph nodes, edges, and query results
- - **KGExtractor base class** — Abstract interface for domain-specific extractors with `extract()`, `node_kinds()`, `edge_kinds()`, and `coverage_metric()`
- - **KGModule base class** — Abstract interface for knowledge-graph modules with `build()`, `query()`, `pack()`, `stats()`, and `analyze()`
- - **Snapshot models** — `Snapshot` dataclass keyed by git tree hash with free-form metrics, hotspots, issues, and delta tracking
- - **SnapshotManager** — Capture, persist, load, list, diff, and prune snapshots with automatic deduplication and delta computation
- - **SnapshotManifest** — Fast-lookup index of all snapshots with format versioning
- - **Zero dependencies** — Stdlib-only; no external packages required at runtime
-
- ---
-
- ## Installation
-
- **Requirements:** Python ≥ 3.12, < 3.14
-
- ### Standalone (pip)
-
- ```bash
- pip install kgmodule-utils
- ```
-
- ### Existing Poetry project
-
- ```bash
- poetry add kgmodule-utils
- ```
-
- Or declare it directly in your `pyproject.toml`:
-
- ```toml
- [tool.poetry.dependencies]
- kgmodule-utils = "^0.2.0"
- ```
-
- ---
-
- ## Quick Start
-
- ### Types — Define a KGModule
-
- ```python
- from kg_utils.types import NodeSpec, EdgeSpec, KGExtractor, KGModule
-
- class MyExtractor(KGExtractor):
-     def node_kinds(self) -> list[str]:
-         return ["module", "function", "class"]
-
-     def edge_kinds(self) -> list[str]:
-         return ["CONTAINS", "CALLS", "IMPORTS"]
-
-     def extract(self, source_root: str):
-         # Yield NodeSpec and EdgeSpec objects from your domain
-         yield NodeSpec(
-             node_id="fn:main:hello",
-             kind="function",
-             name="hello",
-             qualname="main.hello",
-             source_path="main.py",
-             docstring="Greet the user.",
-         )
-         yield EdgeSpec(
-             source_id="mod:main",
-             target_id="fn:main:hello",
-             relation="CONTAINS",
-         )
- ```
-
- ### Snapshots — Track metrics over time
-
- ```python
- from kg_utils.snapshots import SnapshotManager
-
- mgr = SnapshotManager(snapshots_dir=".my_kg/snapshots", package_name="my-kg")
-
- # Capture a snapshot from current metrics
- snapshot = mgr.capture(metrics={
-     "total_nodes": 142,
-     "total_edges": 387,
-     "coverage": 0.78,
- })
-
- # Save with automatic deduplication
- mgr.save_snapshot(snapshot)
-
- # List and compare
- snaps = mgr.list_snapshots(limit=5)
- delta = mgr.diff_snapshots(key_a=snaps[0].key, key_b=snaps[-1].key)
- ```
-
- ---
-
- ## API Reference
-
- ### `kg_utils.types`
-
- | Class | Description |
- |---|---|
- | `NodeSpec` | Dataclass for KG nodes: `node_id`, `kind`, `name`, `qualname`, `source_path`, `docstring` |
- | `EdgeSpec` | Dataclass for KG edges: `source_id`, `target_id`, `relation` |
- | `QueryResult` | Container for query responses with nodes, edges, and metadata |
- | `SnippetPack` | Extended result container with source-code snippets |
- | `KGExtractor` | Abstract base class for domain extractors |
- | `KGModule` | Abstract base class for knowledge-graph modules |
-
- ### `kg_utils.snapshots`
-
- | Class | Description |
- |---|---|
- | `Snapshot` | Temporal snapshot keyed by git tree hash with free-form metrics and deltas |
- | `SnapshotManager` | Capture, persist, load, list, diff, and prune snapshots |
- | `SnapshotManifest` | Index of all snapshots with format versioning and fast lookup |
- | `PruneResult` | Summary of pruning operations: removed, orphaned, broken entries |
-
- ---
-
- ## Project Structure
-
- ```
- KG_utils/
- ├── LICENSE
- ├── README.md
- ├── pyproject.toml
- ├── pytest.ini
- ├── src/
- │   └── kg_utils/
- │       ├── __init__.py
- │       ├── py.typed              # PEP 561 marker
- │       ├── types/
- │       │   ├── __init__.py       # Public re-exports
- │       │   ├── specs.py          # NodeSpec, EdgeSpec, QueryResult, SnippetPack
- │       │   ├── extractor.py      # KGExtractor ABC
- │       │   └── module.py         # KGModule ABC
- │       └── snapshots/
- │           ├── __init__.py       # Public re-exports
- │           ├── models.py         # Snapshot, SnapshotManifest, PruneResult
- │           └── manager.py        # SnapshotManager
- └── tests/
-     ├── __init__.py
-     ├── test_types.py
-     └── test_snapshots.py
- ```
-
- ---
-
- ## Development
-
- ```bash
- git clone https://github.com/Flux-Frontiers/KG_utils.git
- cd KG_utils
- poetry install --with dev
- ```
-
- Run the test suite:
-
- ```bash
- poetry run pytest
- ```
-
- ---
-
- ## License
-
- [Elastic License 2.0](https://www.elastic.co/licensing/elastic-license) — see [LICENSE](LICENSE).
-
- Free to use, modify, and distribute. You may not offer the software as a hosted or managed service to third parties. Commercial use internally is permitted.
@@ -1,14 +0,0 @@
- """kg_utils.types — Core dataclasses and base classes for the KGModule SDK."""
-
- from kg_utils.types.specs import EdgeSpec, NodeSpec, QueryResult, SnippetPack
- from kg_utils.types.extractor import KGExtractor
- from kg_utils.types.module import KGModule
-
- __all__ = [
-     "EdgeSpec",
-     "KGExtractor",
-     "KGModule",
-     "NodeSpec",
-     "QueryResult",
-     "SnippetPack",
- ]
@@ -1,68 +0,0 @@
- """kg_utils/types/extractor.py — Abstract base class for KG extractors."""
-
- from __future__ import annotations
-
- from collections.abc import Iterator
- from pathlib import Path
- from typing import Any
-
- from kg_utils.types.specs import EdgeSpec, NodeSpec
-
-
- class KGExtractor:
-     """Base class for knowledge-graph extractors.
-
-     Subclasses must implement :meth:`node_kinds`, :meth:`edge_kinds`,
-     and :meth:`extract`.
-
-     :param repo_path: Absolute path to the repository or corpus root.
-     :param config: Optional domain-specific configuration dict.
-     """
-
-     def __init__(self, repo_path: Path, config: dict[str, Any] | None = None) -> None:
-         self.repo_path = repo_path
-         self.config = config or {}
-
-     def node_kinds(self) -> list[str]:
-         """Return canonical node kind names.
-
-         :return: List of node kind strings.
-         """
-         raise NotImplementedError
-
-     def edge_kinds(self) -> list[str]:
-         """Return canonical edge relation types.
-
-         :return: List of edge relation strings.
-         """
-         raise NotImplementedError
-
-     def meaningful_node_kinds(self) -> list[str]:
-         """Return node kinds included in the vector index and coverage metrics.
-
-         Override to exclude structural stubs from the default (all node_kinds).
-
-         :return: Subset of node_kinds() to index semantically.
-         """
-         return self.node_kinds()
-
-     def coverage_metric(self, nodes: list[NodeSpec]) -> float:
-         """Compute a domain coverage quality metric.
-
-         Default: fraction of meaningful nodes with a non-empty docstring.
-
-         :param nodes: All extracted NodeSpec objects.
-         :return: Coverage score in [0.0, 1.0].
-         """
-         meaningful = [n for n in nodes if n.kind in self.meaningful_node_kinds()]
-         if not meaningful:
-             return 0.0
-         covered = sum(1 for n in meaningful if n.docstring.strip())
-         return covered / len(meaningful)
-
-     def extract(self) -> Iterator[NodeSpec | EdgeSpec]:
-         """Traverse the source and yield NodeSpec / EdgeSpec objects.
-
-         :return: Iterator of NodeSpec and EdgeSpec objects.
-         """
-         raise NotImplementedError
@@ -1,87 +0,0 @@
- """kg_utils/types/module.py — Abstract base class for KG modules."""
-
- from __future__ import annotations
-
- from pathlib import Path
- from typing import Any
-
- from kg_utils.types.extractor import KGExtractor
- from kg_utils.types.specs import QueryResult, SnippetPack
-
-
- class KGModule:
-     """Base class for knowledge-graph modules.
-
-     Subclasses must implement :meth:`make_extractor`, :meth:`kind`,
-     and should override :meth:`build`, :meth:`query`, :meth:`stats`,
-     :meth:`pack`, and :meth:`analyze` with domain-specific logic.
-
-     :param repo_root: Absolute path to the repository or corpus root.
-     :param db_path: Path for the SQLite graph database.
-     :param lancedb_dir: Path for the LanceDB vector index directory.
-     :param config: Optional domain-specific configuration dict.
-     """
-
-     def __init__(
-         self,
-         repo_root: Path,
-         db_path: Path | None = None,
-         lancedb_dir: Path | None = None,
-         config: dict[str, Any] | None = None,
-     ) -> None:
-         self.repo_root = repo_root
-         self.db_path = db_path
-         self.lancedb_dir = lancedb_dir
-         self.config = config or {}
-
-     def make_extractor(self) -> KGExtractor:
-         """Return the domain extractor for this module.
-
-         :return: KGExtractor subclass instance.
-         """
-         raise NotImplementedError
-
-     def kind(self) -> str:
-         """Return the KGKind string for this module.
-
-         :return: Kind string (e.g. "code", "meta", "doc").
-         """
-         raise NotImplementedError
-
-     def build(self, wipe: bool = False) -> None:
-         """Build the knowledge graph index.
-
-         :param wipe: If True, delete existing index before building.
-         """
-         raise NotImplementedError
-
-     def query(self, q: str, k: int = 8, **kwargs: Any) -> QueryResult:
-         """Query the knowledge graph.
-
-         :param q: Natural-language query string.
-         :param k: Number of results to return.
-         :return: QueryResult with matched nodes and edges.
-         """
-         raise NotImplementedError
-
-     def stats(self) -> dict[str, Any]:
-         """Return statistics about the knowledge graph.
-
-         :return: Dict with keys like total_nodes, total_edges, etc.
-         """
-         raise NotImplementedError
-
-     def pack(self, q: str, **kwargs: Any) -> SnippetPack:
-         """Pack query results with source context.
-
-         :param q: Natural-language query string.
-         :return: SnippetPack with nodes, edges, and snippets.
-         """
-         raise NotImplementedError
-
-     def analyze(self) -> str:
-         """Run full analysis and return a Markdown report.
-
-         :return: Markdown-formatted analysis report.
-         """
-         raise NotImplementedError
@@ -1,90 +0,0 @@
- """kg_utils/types/specs.py — Core dataclasses shared by all KG modules."""
-
- from __future__ import annotations
-
- from dataclasses import dataclass, field
- from typing import Any
-
-
- @dataclass
- class NodeSpec:
-     """Specification for a knowledge-graph node.
-
-     :param node_id: Unique identifier, typically ``<kind>:<path>:<qualname>``.
-     :param kind: Node kind (e.g. "file", "function", "class", "directory").
-     :param name: Short display name.
-     :param qualname: Fully-qualified name or relative path.
-     :param source_path: Path to the source file (relative to repo root).
-     :param docstring: Semantic content for vector indexing.
-     """
-
-     node_id: str
-     kind: str
-     name: str
-     qualname: str
-     source_path: str
-     docstring: str = ""
-
-
- @dataclass
- class EdgeSpec:
-     """Specification for a knowledge-graph edge.
-
-     :param source_id: Node ID of the edge source.
-     :param target_id: Node ID of the edge target.
-     :param relation: Relation type (e.g. "CONTAINS", "CALLS", "IMPORTS").
-     """
-
-     source_id: str
-     target_id: str
-     relation: str
-
-
- @dataclass
- class QueryResult:
-     """Result container returned by KGModule.query().
-
-     :param nodes: List of matched node dicts.
-     :param edges: List of matched edge dicts.
-     :param seeds: Number of seed nodes from vector search.
-     :param expanded_nodes: Number of nodes after graph expansion.
-     :param returned_nodes: Number of nodes actually returned.
-     :param hop: Number of hops used in graph expansion.
-     :param rels: Relation types used in expansion.
-     """
-
-     nodes: list[dict[str, Any]] = field(default_factory=list)
-     edges: list[dict[str, Any]] = field(default_factory=list)
-     seeds: int = 0
-     expanded_nodes: int = 0
-     returned_nodes: int = 0
-     hop: int = 0
-     rels: list[str] = field(default_factory=list)
-
-
- @dataclass
- class SnippetPack:
-     """Result container returned by KGModule.pack().
-
-     :param query: The original query string.
-     :param seeds: Number of seed nodes from vector search.
-     :param expanded_nodes: Number of nodes after graph expansion.
-     :param returned_nodes: Number of nodes actually returned.
-     :param hop: Number of hops used in expansion.
-     :param rels: Relation types used in expansion.
-     :param model: Embedding model identifier.
-     :param nodes: Node dicts included in the pack.
-     :param edges: Edge dicts included in the pack.
-     :param snippets: Source-code snippets (for code KGs).
-     """
-
-     query: str
-     seeds: int = 0
-     expanded_nodes: int = 0
-     returned_nodes: int = 0
-     hop: int = 0
-     rels: list[str] = field(default_factory=list)
-     model: str = ""
-     nodes: list[Any] = field(default_factory=list)
-     edges: list[Any] = field(default_factory=list)
-     snippets: list[Any] = field(default_factory=list)
File without changes