kgmodule-utils 0.3.0.tar.gz → 0.3.1.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- kgmodule_utils-0.3.1/PKG-INFO +270 -0
- kgmodule_utils-0.3.1/README.md +245 -0
- {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/pyproject.toml +1 -1
- {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/__init__.py +3 -3
- kgmodule_utils-0.3.0/PKG-INFO +0 -216
- kgmodule_utils-0.3.0/README.md +0 -191
- kgmodule_utils-0.3.0/src/kg_utils/types/__init__.py +0 -14
- kgmodule_utils-0.3.0/src/kg_utils/types/extractor.py +0 -68
- kgmodule_utils-0.3.0/src/kg_utils/types/module.py +0 -87
- kgmodule_utils-0.3.0/src/kg_utils/types/specs.py +0 -90
- {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/LICENSE +0 -0
- {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/embed.py +0 -0
- {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/embedder.py +0 -0
- {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/extractor.py +0 -0
- {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/module.py +0 -0
- {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/pipeline.py +0 -0
- {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/py.typed +0 -0
- {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/semantic.py +0 -0
- {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/snapshots/__init__.py +0 -0
- {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/snapshots/manager.py +0 -0
- {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/snapshots/models.py +0 -0
- {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/specs.py +0 -0
- {kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/store.py +0 -0
kgmodule_utils-0.3.1/PKG-INFO
@@ -0,0 +1,270 @@
+Metadata-Version: 2.4
+Name: kgmodule-utils
+Version: 0.3.1
+Summary: Shared types, graph store, semantic index, and pipeline base for the KGModule SDK
+License: Elastic-2.0
+License-File: LICENSE
+Keywords: knowledge-graph,kgmodule,sdk,types,snapshots
+Author: Eric G. Suchanek, PhD
+Author-email: suchanek@flux-frontiers.com
+Requires-Python: >=3.12,<3.14
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Provides-Extra: semantic
+Requires-Dist: lancedb (>=0.19.0) ; extra == "semantic"
+Requires-Dist: numpy (>=1.24.0) ; extra == "semantic"
+Requires-Dist: sentence-transformers (>=5.4.1) ; extra == "semantic"
+Requires-Dist: torch (>=2.5.1) ; extra == "semantic"
+Requires-Dist: transformers (>=4.40.0,<4.57) ; extra == "semantic"
+Project-URL: Repository, https://github.com/Flux-Frontiers/kg_utils
+Description-Content-Type: text/markdown
+
+
+[](https://www.python.org/)
+[](https://www.elastic.co/licensing/elastic-license)
+[](https://github.com/Flux-Frontiers/KG_utils/releases)
+[](https://github.com/Flux-Frontiers/KG_utils/actions/workflows/ci.yml)
+[](https://python-poetry.org/)
+
+# kgmodule-utils
+
+**kgmodule-utils** — Shared graph store, semantic index, pipeline base, and snapshot infrastructure for the KGModule SDK.
+
+*Author: Eric G. Suchanek, PhD*
+
+*Flux-Frontiers, Liberty TWP, OH*
+
+---
+
+## Overview
+
+kgmodule-utils is the **shared SDK layer** for the Flux-Frontiers knowledge-graph ecosystem. It provides everything a domain KG module needs — from type abstractions and SQLite graph storage through LanceDB vector indexing and a full build/query/pack pipeline — so domain authors implement only what is specific to their source domain.
+
+Every KGModule implementation — [PyCodeKG](https://github.com/Flux-Frontiers/pycode_kg), [DocKG](https://github.com/Flux-Frontiers/doc_kg), and others — subclasses `KGModule` from here and implements exactly three methods: `make_extractor()`, `kind()`, and `analyze()`.
+
+---
+
+## Features
+
+- **`kg_utils.specs`** — `NodeSpec`, `EdgeSpec`, `BuildStats`, `QueryResult`, `SnippetPack` dataclasses
+- **`kg_utils.extractor`** — `KGExtractor` ABC: `extract()`, `node_kinds()`, `edge_kinds()`, `coverage_metric()`
+- **`kg_utils.store`** — `GraphStore`: SQLite-backed node/edge store with BFS expansion, symbol resolution, caller lookup, and provenance recording
+- **`kg_utils.semantic`** — `SemanticIndex` (LanceDB), `SentenceTransformerEmbedder`, `SeedHit`, model registry, `resolve_model_path()`
+- **`kg_utils.pipeline`** — `KGModule`: full build → query → pack pipeline base with hybrid semantic + lexical reranking and snippet extraction
+- **`kg_utils.embedder`** — `get_embedder()`, `wrap_embedder()`, `load_sentence_transformer()` factory functions
+- **`kg_utils.embed`** — `Embedder` protocol, `DEFAULT_MODEL`, `KNOWN_MODELS`, `resolve_model_path()`
+- **`kg_utils.snapshots`** — `Snapshot`, `SnapshotManager`, `SnapshotManifest` for temporal metric tracking
+
+---
+
+## Installation
+
+**Requirements:** Python ≥ 3.12, < 3.14
+
+### Core only (stdlib, no optional deps)
+
+```bash
+pip install kgmodule-utils
+```
+
+### With semantic search (LanceDB + sentence-transformers)
+
+```bash
+pip install 'kgmodule-utils[semantic]'
+```
+
+### In a Poetry project
+
+```toml
+[tool.poetry.dependencies]
+kgmodule-utils = { version = ">=0.3.1", extras = ["semantic"] }
+```
+
+---
+
+## Quick Start
+
+### Build a domain KG module
+
+```python
+from collections.abc import Iterator
+from pathlib import Path
+
+from kg_utils.extractor import KGExtractor
+from kg_utils.pipeline import KGModule
+from kg_utils.specs import EdgeSpec, NodeSpec
+
+
+class MyExtractor(KGExtractor):
+    def node_kinds(self) -> list[str]:
+        return ["document", "section"]
+
+    def edge_kinds(self) -> list[str]:
+        return ["CONTAINS"]
+
+    def meaningful_node_kinds(self) -> list[str]:
+        return ["section"]
+
+    def extract(self) -> Iterator[NodeSpec | EdgeSpec]:
+        for doc in self.repo_path.glob("**/*.md"):
+            doc_id = f"document:{doc}"
+            yield NodeSpec(node_id=doc_id, kind="document",
+                           name=doc.stem, qualname=doc.stem,
+                           source_path=str(doc))
+            # … yield sections and CONTAINS edges
+
+
+class MyKG(KGModule):
+    _default_dir = ".mykg"
+
+    def make_extractor(self) -> KGExtractor:
+        return MyExtractor(self.repo_root)
+
+    def kind(self) -> str:
+        return "my"
+
+    def analyze(self) -> str:
+        s = self.stats()
+        return f"# MyKG\nnodes={s['total_nodes']}"
+
+
+# Build and query
+kg = MyKG("/path/to/repo")
+kg.build(wipe=True)
+
+result = kg.query("authentication flow", k=8, hop=1)
+pack = kg.pack("error handling", max_nodes=10)
+print(pack.to_markdown())
+```
+
+### Track metrics over time
+
+```python
+from kg_utils.snapshots import SnapshotManager
+
+mgr = SnapshotManager(".mykg/snapshots", package_name="my-kg")
+
+snapshot = mgr.capture(
+    version="1.0.0",
+    branch="main",
+    graph_stats_dict=kg.stats(),
+)
+mgr.save_snapshot(snapshot)
+
+snaps = mgr.list_snapshots(limit=5)
+delta = mgr.diff_snapshots(snaps[-1]["key"], snaps[0]["key"])
+```
+
+---
+
+## API Reference
+
+### `kg_utils.specs`
+
+| Class | Description |
+|---|---|
+| `NodeSpec` | Graph node: `node_id`, `kind`, `name`, `qualname`, `source_path`, `lineno`, `end_lineno`, `docstring`, `metadata` |
+| `EdgeSpec` | Graph edge: `source_id`, `target_id`, `relation`, `weight`, `metadata` |
+| `BuildStats` | Build result: node/edge counts, indexed rows, embedding dim |
+| `QueryResult` | Query result: nodes, edges, seeds, hop, relevance metadata |
+| `SnippetPack` | Pack result: nodes with snippets, `to_markdown()`, `to_json()`, `save()` |
+
+### `kg_utils.extractor`
+
+| Class | Description |
+|---|---|
+| `KGExtractor` | ABC — implement `node_kinds()`, `edge_kinds()`, `extract()` |
+
+### `kg_utils.store`
+
+| Class | Description |
+|---|---|
+| `GraphStore` | SQLite persistence: `write()`, `expand()`, `query_nodes()`, `resolve_symbols()`, `callers_of()`, `stats()` |
+
+### `kg_utils.semantic`
+
+| Class / function | Description |
+|---|---|
+| `SemanticIndex` | LanceDB vector index: `build()`, `search()` |
+| `SentenceTransformerEmbedder` | Local embedding via sentence-transformers |
+| `resolve_model_path()` | Resolve model name / alias to local cache path |
+| `suppress_ingestion_logging()` | Silence verbose HF / tqdm output during ingestion |
+
+### `kg_utils.pipeline`
+
+| Class | Description |
+|---|---|
+| `KGModule` | Concrete base — implement `make_extractor()`, `kind()`, `analyze()`; get `build()`, `query()`, `pack()`, `stats()` for free |
+
+### `kg_utils.snapshots`
+
+| Class | Description |
+|---|---|
+| `Snapshot` | Temporal snapshot keyed by git tree hash with metrics and deltas |
+| `SnapshotManager` | Capture, persist, load, list, diff, and prune snapshots |
+| `SnapshotManifest` | Fast-lookup index with format versioning |
+
+---
+
+## Project Structure
+
+```
+KG_utils/
+├── pyproject.toml
+├── src/
+│   └── kg_utils/
+│       ├── __init__.py
+│       ├── specs.py # NodeSpec, EdgeSpec, BuildStats, QueryResult, SnippetPack
+│       ├── extractor.py # KGExtractor ABC
+│       ├── store.py # GraphStore (SQLite)
+│       ├── semantic.py # SemanticIndex, SentenceTransformerEmbedder, SeedHit
+│       ├── pipeline.py # KGModule concrete base class
+│       ├── module.py # Re-export shim
+│       ├── embed.py # Embedder protocol, model registry
+│       ├── embedder.py # SentenceTransformerEmbedder factory functions
+│       └── snapshots/
+│           ├── __init__.py
+│           ├── models.py # Snapshot, SnapshotManifest, PruneResult
+│           └── manager.py # SnapshotManager
+└── tests/
+    ├── test_store.py # GraphStore unit tests
+    ├── test_pipeline_utils.py # Pipeline utility function tests
+    ├── test_pipeline_module.py # End-to-end integration tests (--integration)
+    ├── test_types.py # Spec dataclass and KGExtractor tests
+    ├── test_snapshots.py # Snapshot lifecycle tests
+    └── test_integration.py # Cross-module integration tests
+```
+
+---
+
+## Development
+
+```bash
+git clone https://github.com/Flux-Frontiers/KG_utils.git
+cd KG_utils
+poetry install --with dev
+```
+
+Run the fast test suite (no model downloads):
+
+```bash
+poetry run pytest -m "not integration"
+```
+
+Run all tests including semantic/integration (requires `[semantic]` extra):
+
+```bash
+poetry run pytest
+```
+
+---
+
+## License
+
+[Elastic License 2.0](https://www.elastic.co/licensing/elastic-license) — see [LICENSE](LICENSE).
+
+Free to use, modify, and distribute. You may not offer the software as a hosted or managed service to third parties. Commercial use internally is permitted.
+
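PKG-INFO is written in the core-metadata format, a block of RFC 822-style headers, so the standard-library email parser can read it without any packaging dependency. The sketch below is a minimal illustration, not something shipped with kgmodule-utils: it pulls `Name`, `Version`, and `Requires-Python` from a trimmed copy of the 0.3.1 header above and lists the requirements gated on the `semantic` extra. The helper `deps_for_extra` is hypothetical, and its marker check is a plain string comparison rather than a full PEP 508 marker evaluator.

```python
from email.parser import HeaderParser

# A trimmed copy of the 0.3.1 PKG-INFO header fields shown above.
PKG_INFO = """\
Metadata-Version: 2.4
Name: kgmodule-utils
Version: 0.3.1
Requires-Python: >=3.12,<3.14
Provides-Extra: semantic
Requires-Dist: lancedb (>=0.19.0) ; extra == "semantic"
Requires-Dist: numpy (>=1.24.0) ; extra == "semantic"
Requires-Dist: sentence-transformers (>=5.4.1) ; extra == "semantic"
Requires-Dist: torch (>=2.5.1) ; extra == "semantic"
Requires-Dist: transformers (>=4.40.0,<4.57) ; extra == "semantic"
"""


def deps_for_extra(pkg_info: str, extra: str) -> list[str]:
    """Return the Requires-Dist entries whose marker is exactly `extra == "<extra>"`."""
    msg = HeaderParser().parsestr(pkg_info)
    wanted = f'extra == "{extra}"'
    result = []
    for req in msg.get_all("Requires-Dist", []):
        # Split "name (spec) ; marker" into the requirement and its environment marker.
        requirement, _, marker = req.partition(";")
        if marker.strip() == wanted:
            result.append(requirement.strip())
    return result


msg = HeaderParser().parsestr(PKG_INFO)
print(msg["Name"], msg["Version"], msg["Requires-Python"])
print(deps_for_extra(PKG_INFO, "semantic"))
```

For an installed distribution, `importlib.metadata.metadata("kgmodule-utils")` exposes the same header fields, and `importlib.metadata.requires("kgmodule-utils")` returns the `Requires-Dist` strings directly.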
kgmodule_utils-0.3.1/README.md
@@ -0,0 +1,245 @@
+
+[](https://www.python.org/)
+[](https://www.elastic.co/licensing/elastic-license)
+[](https://github.com/Flux-Frontiers/KG_utils/releases)
+[](https://github.com/Flux-Frontiers/KG_utils/actions/workflows/ci.yml)
+[](https://python-poetry.org/)
+
+# kgmodule-utils
+
+**kgmodule-utils** — Shared graph store, semantic index, pipeline base, and snapshot infrastructure for the KGModule SDK.
+
+*Author: Eric G. Suchanek, PhD*
+
+*Flux-Frontiers, Liberty TWP, OH*
+
+---
+
+## Overview
+
+kgmodule-utils is the **shared SDK layer** for the Flux-Frontiers knowledge-graph ecosystem. It provides everything a domain KG module needs — from type abstractions and SQLite graph storage through LanceDB vector indexing and a full build/query/pack pipeline — so domain authors implement only what is specific to their source domain.
+
+Every KGModule implementation — [PyCodeKG](https://github.com/Flux-Frontiers/pycode_kg), [DocKG](https://github.com/Flux-Frontiers/doc_kg), and others — subclasses `KGModule` from here and implements exactly three methods: `make_extractor()`, `kind()`, and `analyze()`.
+
+---
+
+## Features
+
+- **`kg_utils.specs`** — `NodeSpec`, `EdgeSpec`, `BuildStats`, `QueryResult`, `SnippetPack` dataclasses
+- **`kg_utils.extractor`** — `KGExtractor` ABC: `extract()`, `node_kinds()`, `edge_kinds()`, `coverage_metric()`
+- **`kg_utils.store`** — `GraphStore`: SQLite-backed node/edge store with BFS expansion, symbol resolution, caller lookup, and provenance recording
+- **`kg_utils.semantic`** — `SemanticIndex` (LanceDB), `SentenceTransformerEmbedder`, `SeedHit`, model registry, `resolve_model_path()`
+- **`kg_utils.pipeline`** — `KGModule`: full build → query → pack pipeline base with hybrid semantic + lexical reranking and snippet extraction
+- **`kg_utils.embedder`** — `get_embedder()`, `wrap_embedder()`, `load_sentence_transformer()` factory functions
+- **`kg_utils.embed`** — `Embedder` protocol, `DEFAULT_MODEL`, `KNOWN_MODELS`, `resolve_model_path()`
+- **`kg_utils.snapshots`** — `Snapshot`, `SnapshotManager`, `SnapshotManifest` for temporal metric tracking
+
+---
+
+## Installation
+
+**Requirements:** Python ≥ 3.12, < 3.14
+
+### Core only (stdlib, no optional deps)
+
+```bash
+pip install kgmodule-utils
+```
+
+### With semantic search (LanceDB + sentence-transformers)
+
+```bash
+pip install 'kgmodule-utils[semantic]'
+```
+
+### In a Poetry project
+
+```toml
+[tool.poetry.dependencies]
+kgmodule-utils = { version = ">=0.3.1", extras = ["semantic"] }
+```
+
+---
+
+## Quick Start
+
+### Build a domain KG module
+
+```python
+from collections.abc import Iterator
+from pathlib import Path
+
+from kg_utils.extractor import KGExtractor
+from kg_utils.pipeline import KGModule
+from kg_utils.specs import EdgeSpec, NodeSpec
+
+
+class MyExtractor(KGExtractor):
+    def node_kinds(self) -> list[str]:
+        return ["document", "section"]
+
+    def edge_kinds(self) -> list[str]:
+        return ["CONTAINS"]
+
+    def meaningful_node_kinds(self) -> list[str]:
+        return ["section"]
+
+    def extract(self) -> Iterator[NodeSpec | EdgeSpec]:
+        for doc in self.repo_path.glob("**/*.md"):
+            doc_id = f"document:{doc}"
+            yield NodeSpec(node_id=doc_id, kind="document",
+                           name=doc.stem, qualname=doc.stem,
+                           source_path=str(doc))
+            # … yield sections and CONTAINS edges
+
+
+class MyKG(KGModule):
+    _default_dir = ".mykg"
+
+    def make_extractor(self) -> KGExtractor:
+        return MyExtractor(self.repo_root)
+
+    def kind(self) -> str:
+        return "my"
+
+    def analyze(self) -> str:
+        s = self.stats()
+        return f"# MyKG\nnodes={s['total_nodes']}"
+
+
+# Build and query
+kg = MyKG("/path/to/repo")
+kg.build(wipe=True)
+
+result = kg.query("authentication flow", k=8, hop=1)
+pack = kg.pack("error handling", max_nodes=10)
+print(pack.to_markdown())
+```
+
+### Track metrics over time
+
+```python
+from kg_utils.snapshots import SnapshotManager
+
+mgr = SnapshotManager(".mykg/snapshots", package_name="my-kg")
+
+snapshot = mgr.capture(
+    version="1.0.0",
+    branch="main",
+    graph_stats_dict=kg.stats(),
+)
+mgr.save_snapshot(snapshot)
+
+snaps = mgr.list_snapshots(limit=5)
+delta = mgr.diff_snapshots(snaps[-1]["key"], snaps[0]["key"])
+```
+
+---
+
+## API Reference
+
+### `kg_utils.specs`
+
+| Class | Description |
+|---|---|
+| `NodeSpec` | Graph node: `node_id`, `kind`, `name`, `qualname`, `source_path`, `lineno`, `end_lineno`, `docstring`, `metadata` |
+| `EdgeSpec` | Graph edge: `source_id`, `target_id`, `relation`, `weight`, `metadata` |
+| `BuildStats` | Build result: node/edge counts, indexed rows, embedding dim |
+| `QueryResult` | Query result: nodes, edges, seeds, hop, relevance metadata |
+| `SnippetPack` | Pack result: nodes with snippets, `to_markdown()`, `to_json()`, `save()` |
+
+### `kg_utils.extractor`
+
+| Class | Description |
+|---|---|
+| `KGExtractor` | ABC — implement `node_kinds()`, `edge_kinds()`, `extract()` |
+
+### `kg_utils.store`
+
+| Class | Description |
+|---|---|
+| `GraphStore` | SQLite persistence: `write()`, `expand()`, `query_nodes()`, `resolve_symbols()`, `callers_of()`, `stats()` |
+
+### `kg_utils.semantic`
+
+| Class / function | Description |
+|---|---|
+| `SemanticIndex` | LanceDB vector index: `build()`, `search()` |
+| `SentenceTransformerEmbedder` | Local embedding via sentence-transformers |
+| `resolve_model_path()` | Resolve model name / alias to local cache path |
+| `suppress_ingestion_logging()` | Silence verbose HF / tqdm output during ingestion |
+
+### `kg_utils.pipeline`
+
+| Class | Description |
+|---|---|
+| `KGModule` | Concrete base — implement `make_extractor()`, `kind()`, `analyze()`; get `build()`, `query()`, `pack()`, `stats()` for free |
+
+### `kg_utils.snapshots`
+
+| Class | Description |
+|---|---|
+| `Snapshot` | Temporal snapshot keyed by git tree hash with metrics and deltas |
+| `SnapshotManager` | Capture, persist, load, list, diff, and prune snapshots |
+| `SnapshotManifest` | Fast-lookup index with format versioning |
+
+---
+
+## Project Structure
+
+```
+KG_utils/
+├── pyproject.toml
+├── src/
+│   └── kg_utils/
+│       ├── __init__.py
+│       ├── specs.py # NodeSpec, EdgeSpec, BuildStats, QueryResult, SnippetPack
+│       ├── extractor.py # KGExtractor ABC
+│       ├── store.py # GraphStore (SQLite)
+│       ├── semantic.py # SemanticIndex, SentenceTransformerEmbedder, SeedHit
+│       ├── pipeline.py # KGModule concrete base class
+│       ├── module.py # Re-export shim
+│       ├── embed.py # Embedder protocol, model registry
+│       ├── embedder.py # SentenceTransformerEmbedder factory functions
+│       └── snapshots/
+│           ├── __init__.py
+│           ├── models.py # Snapshot, SnapshotManifest, PruneResult
+│           └── manager.py # SnapshotManager
+└── tests/
+    ├── test_store.py # GraphStore unit tests
+    ├── test_pipeline_utils.py # Pipeline utility function tests
+    ├── test_pipeline_module.py # End-to-end integration tests (--integration)
+    ├── test_types.py # Spec dataclass and KGExtractor tests
+    ├── test_snapshots.py # Snapshot lifecycle tests
+    └── test_integration.py # Cross-module integration tests
+```
+
+---
+
+## Development
+
+```bash
+git clone https://github.com/Flux-Frontiers/KG_utils.git
+cd KG_utils
+poetry install --with dev
+```
+
+Run the fast test suite (no model downloads):
+
+```bash
+poetry run pytest -m "not integration"
+```
+
+Run all tests including semantic/integration (requires `[semantic]` extra):
+
+```bash
+poetry run pytest
+```
+
+---
+
+## License
+
+[Elastic License 2.0](https://www.elastic.co/licensing/elastic-license) — see [LICENSE](LICENSE).
+
+Free to use, modify, and distribute. You may not offer the software as a hosted or managed service to third parties. Commercial use internally is permitted.
{kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/pyproject.toml
@@ -10,7 +10,7 @@ build-backend = "poetry.core.masonry.api"
 
 [project]
 name = "kgmodule-utils"
-version = "0.3.
+version = "0.3.1"
 description = "Shared types, graph store, semantic index, and pipeline base for the KGModule SDK"
 readme = "README.md"
 license = { text = "Elastic-2.0" }
{kgmodule_utils-0.3.0 → kgmodule_utils-0.3.1}/src/kg_utils/__init__.py
@@ -1,8 +1,8 @@
 """kg_utils — Shared types, store, semantic index, and pipeline base for the KGModule SDK.
 
 Sub-packages / modules:
-kg_utils.
-
+kg_utils.specs — NodeSpec, EdgeSpec, BuildStats, QueryResult, SnippetPack.
+kg_utils.extractor — KGExtractor abstract base class.
 kg_utils.store — GraphStore: SQLite-backed authoritative node/edge store.
 kg_utils.semantic — Embedder, SentenceTransformerEmbedder, SemanticIndex, SeedHit.
 kg_utils.pipeline — KGModule: concrete base class with full build/query/pack pipeline.
@@ -17,4 +17,4 @@ Optional extras
 pip install 'kgmodule-utils[semantic]' # lancedb + sentence-transformers
 """
 
-__version__ = "0.3.
+__version__ = "0.3.1"
kgmodule_utils-0.3.0/PKG-INFO
DELETED
@@ -1,216 +0,0 @@
-Metadata-Version: 2.4
-Name: kgmodule-utils
-Version: 0.3.0
-Summary: Shared types, graph store, semantic index, and pipeline base for the KGModule SDK
-License: Elastic-2.0
-License-File: LICENSE
-Keywords: knowledge-graph,kgmodule,sdk,types,snapshots
-Author: Eric G. Suchanek, PhD
-Author-email: suchanek@flux-frontiers.com
-Requires-Python: >=3.12,<3.14
-Classifier: Development Status :: 4 - Beta
-Classifier: Intended Audience :: Developers
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.12
-Classifier: Programming Language :: Python :: 3.13
-Provides-Extra: semantic
-Requires-Dist: lancedb (>=0.19.0) ; extra == "semantic"
-Requires-Dist: numpy (>=1.24.0) ; extra == "semantic"
-Requires-Dist: sentence-transformers (>=5.4.1) ; extra == "semantic"
-Requires-Dist: torch (>=2.5.1) ; extra == "semantic"
-Requires-Dist: transformers (>=4.40.0,<4.57) ; extra == "semantic"
-Project-URL: Repository, https://github.com/Flux-Frontiers/kg_utils
-Description-Content-Type: text/markdown
-
-
-[](https://www.python.org/)
-[](https://www.elastic.co/licensing/elastic-license)
-[](https://github.com/Flux-Frontiers/KG_utils/releases)
-[](https://github.com/Flux-Frontiers/KG_utils/actions/workflows/ci.yml)
-[](https://python-poetry.org/)
-
-# kgmodule-utils
-
-**kgmodule-utils** — Shared types and snapshot infrastructure for the KGModule SDK.
-
-*Author: Eric G. Suchanek, PhD*
-
-*Flux-Frontiers, Liberty TWP, OH*
-
----
-
-## Overview
-
-kgmodule-utils is the **zero-dependency foundation package** for the Flux-Frontiers knowledge-graph ecosystem. It provides the canonical type abstractions and temporal snapshot infrastructure that all KGModule implementations — [PyCodeKG](https://github.com/Flux-Frontiers/pycode_kg), [FTreeKG](https://github.com/Flux-Frontiers/ftree_kg), [DocKG](https://github.com/Flux-Frontiers/doc_kg), [AgentKG](https://github.com/Flux-Frontiers/agent_kg) — depend on.
-
-Every KGModule shares the same `NodeSpec`, `EdgeSpec`, `KGExtractor`, and `KGModule` base classes defined here, ensuring consistent interfaces across the ecosystem. The snapshot subsystem enables temporal metric tracking, delta comparison, and pruning across git commits.
-
----
-
-## Features
-
-- **Core type abstractions** — `NodeSpec`, `EdgeSpec`, `QueryResult`, `SnippetPack` dataclasses for knowledge-graph nodes, edges, and query results
-- **KGExtractor base class** — Abstract interface for domain-specific extractors with `extract()`, `node_kinds()`, `edge_kinds()`, and `coverage_metric()`
-- **KGModule base class** — Abstract interface for knowledge-graph modules with `build()`, `query()`, `pack()`, `stats()`, and `analyze()`
-- **Snapshot models** — `Snapshot` dataclass keyed by git tree hash with free-form metrics, hotspots, issues, and delta tracking
-- **SnapshotManager** — Capture, persist, load, list, diff, and prune snapshots with automatic deduplication and delta computation
-- **SnapshotManifest** — Fast-lookup index of all snapshots with format versioning
-- **Zero dependencies** — Stdlib-only; no external packages required at runtime
-
----
-
-## Installation
-
-**Requirements:** Python ≥ 3.12, < 3.14
-
-### Standalone (pip)
-
-```bash
-pip install kgmodule-utils
-```
-
-### Existing Poetry project
-
-```bash
-poetry add kgmodule-utils
-```
-
-Or declare it directly in your `pyproject.toml`:
-
-```toml
-[tool.poetry.dependencies]
-kgmodule-utils = "^0.2.0"
-```
-
----
-
-## Quick Start
-
-### Types — Define a KGModule
-
-```python
-from kg_utils.types import NodeSpec, EdgeSpec, KGExtractor, KGModule
-
-class MyExtractor(KGExtractor):
-    def node_kinds(self) -> list[str]:
-        return ["module", "function", "class"]
-
-    def edge_kinds(self) -> list[str]:
-        return ["CONTAINS", "CALLS", "IMPORTS"]
-
-    def extract(self, source_root: str):
-        # Yield NodeSpec and EdgeSpec objects from your domain
-        yield NodeSpec(
-            node_id="fn:main:hello",
-            kind="function",
-            name="hello",
-            qualname="main.hello",
-            source_path="main.py",
-            docstring="Greet the user.",
-        )
-        yield EdgeSpec(
-            source_id="mod:main",
-            target_id="fn:main:hello",
-            relation="CONTAINS",
-        )
-```
-
-### Snapshots — Track metrics over time
-
-```python
-from kg_utils.snapshots import SnapshotManager
-
-mgr = SnapshotManager(snapshots_dir=".my_kg/snapshots", package_name="my-kg")
-
-# Capture a snapshot from current metrics
-snapshot = mgr.capture(metrics={
-    "total_nodes": 142,
-    "total_edges": 387,
-    "coverage": 0.78,
-})
-
-# Save with automatic deduplication
-mgr.save_snapshot(snapshot)
-
-# List and compare
-snaps = mgr.list_snapshots(limit=5)
-delta = mgr.diff_snapshots(key_a=snaps[0].key, key_b=snaps[-1].key)
-```
-
----
-
-## API Reference
-
-### `kg_utils.types`
-
-| Class | Description |
-|---|---|
-| `NodeSpec` | Dataclass for KG nodes: `node_id`, `kind`, `name`, `qualname`, `source_path`, `docstring` |
-| `EdgeSpec` | Dataclass for KG edges: `source_id`, `target_id`, `relation` |
-| `QueryResult` | Container for query responses with nodes, edges, and metadata |
-| `SnippetPack` | Extended result container with source-code snippets |
-| `KGExtractor` | Abstract base class for domain extractors |
-| `KGModule` | Abstract base class for knowledge-graph modules |
-
-### `kg_utils.snapshots`
-
-| Class | Description |
-|---|---|
-| `Snapshot` | Temporal snapshot keyed by git tree hash with free-form metrics and deltas |
-| `SnapshotManager` | Capture, persist, load, list, diff, and prune snapshots |
-| `SnapshotManifest` | Index of all snapshots with format versioning and fast lookup |
-| `PruneResult` | Summary of pruning operations: removed, orphaned, broken entries |
-
----
-
-## Project Structure
-
-```
-KG_utils/
-├── LICENSE
-├── README.md
-├── pyproject.toml
-├── pytest.ini
-├── src/
-│   └── kg_utils/
-│       ├── __init__.py
-│       ├── py.typed # PEP 561 marker
-│       ├── types/
-│       │   ├── __init__.py # Public re-exports
-│       │   ├── specs.py # NodeSpec, EdgeSpec, QueryResult, SnippetPack
-│       │   ├── extractor.py # KGExtractor ABC
-│       │   └── module.py # KGModule ABC
-│       └── snapshots/
-│           ├── __init__.py # Public re-exports
-│           ├── models.py # Snapshot, SnapshotManifest, PruneResult
-│           └── manager.py # SnapshotManager
-└── tests/
-    ├── __init__.py
-    ├── test_types.py
-    └── test_snapshots.py
-```
-
----
-
-## Development
-
-```bash
-git clone https://github.com/Flux-Frontiers/KG_utils.git
-cd KG_utils
-poetry install --with dev
-```
-
-Run the test suite:
-
-```bash
-poetry run pytest
-```
-
----
-
-## License
-
-[Elastic License 2.0](https://www.elastic.co/licensing/elastic-license) — see [LICENSE](LICENSE).
-
-Free to use, modify, and distribute. You may not offer the software as a hosted or managed service to third parties. Commercial use internally is permitted.
-
kgmodule_utils-0.3.0/README.md
DELETED
@@ -1,191 +0,0 @@

[](https://www.python.org/)
[](https://www.elastic.co/licensing/elastic-license)
[](https://github.com/Flux-Frontiers/KG_utils/releases)
[](https://github.com/Flux-Frontiers/KG_utils/actions/workflows/ci.yml)
[](https://python-poetry.org/)

# kgmodule-utils

**kgmodule-utils** — Shared types and snapshot infrastructure for the KGModule SDK.

*Author: Eric G. Suchanek, PhD*

*Flux-Frontiers, Liberty TWP, OH*

---

## Overview

kgmodule-utils is the **zero-dependency foundation package** for the Flux-Frontiers knowledge-graph ecosystem. It provides the canonical type abstractions and temporal snapshot infrastructure that all KGModule implementations — [PyCodeKG](https://github.com/Flux-Frontiers/pycode_kg), [FTreeKG](https://github.com/Flux-Frontiers/ftree_kg), [DocKG](https://github.com/Flux-Frontiers/doc_kg), [AgentKG](https://github.com/Flux-Frontiers/agent_kg) — depend on.

Every KGModule shares the same `NodeSpec`, `EdgeSpec`, `KGExtractor`, and `KGModule` base classes defined here, ensuring consistent interfaces across the ecosystem. The snapshot subsystem enables temporal metric tracking, delta comparison, and pruning across git commits.

---

## Features

- **Core type abstractions** — `NodeSpec`, `EdgeSpec`, `QueryResult`, `SnippetPack` dataclasses for knowledge-graph nodes, edges, and query results
- **KGExtractor base class** — Abstract interface for domain-specific extractors with `extract()`, `node_kinds()`, `edge_kinds()`, and `coverage_metric()`
- **KGModule base class** — Abstract interface for knowledge-graph modules with `build()`, `query()`, `pack()`, `stats()`, and `analyze()`
- **Snapshot models** — `Snapshot` dataclass keyed by git tree hash with free-form metrics, hotspots, issues, and delta tracking
- **SnapshotManager** — Capture, persist, load, list, diff, and prune snapshots with automatic deduplication and delta computation
- **SnapshotManifest** — Fast-lookup index of all snapshots with format versioning
- **Zero dependencies** — Stdlib-only; no external packages required at runtime

---

## Installation

**Requirements:** Python ≥ 3.12, < 3.14

### Standalone (pip)

```bash
pip install kgmodule-utils
```

### Existing Poetry project

```bash
poetry add kgmodule-utils
```

Or declare it directly in your `pyproject.toml`:

```toml
[tool.poetry.dependencies]
kgmodule-utils = "^0.2.0"
```

---

## Quick Start

### Types — Define a KGModule

```python
from kg_utils.types import NodeSpec, EdgeSpec, KGExtractor, KGModule

class MyExtractor(KGExtractor):
    def node_kinds(self) -> list[str]:
        return ["module", "function", "class"]

    def edge_kinds(self) -> list[str]:
        return ["CONTAINS", "CALLS", "IMPORTS"]

    def extract(self, source_root: str):
        # Yield NodeSpec and EdgeSpec objects from your domain
        yield NodeSpec(
            node_id="fn:main:hello",
            kind="function",
            name="hello",
            qualname="main.hello",
            source_path="main.py",
            docstring="Greet the user.",
        )
        yield EdgeSpec(
            source_id="mod:main",
            target_id="fn:main:hello",
            relation="CONTAINS",
        )
```
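The Quick Start example above declares `extract(self, source_root: str)`, whereas the `KGExtractor` base class removed in this same diff (`kg_utils/types/extractor.py`) defines `extract(self)` and receives the root path through its constructor as `repo_path`. The following is a minimal sketch of an extractor that follows the base-class signature instead. Because the `kg_utils/types/` files are deleted in 0.3.1, it declares small inline stand-ins that mirror the 0.3.0 `NodeSpec`, `EdgeSpec`, and `KGExtractor`; treat them as illustrative copies, not the library itself.

```python
from __future__ import annotations

from collections.abc import Iterator
from dataclasses import dataclass
from pathlib import Path
from typing import Any


# Stand-ins mirroring kg_utils/types/specs.py and extractor.py from 0.3.0,
# declared inline so this sketch runs without the package installed.
@dataclass
class NodeSpec:
    node_id: str
    kind: str
    name: str
    qualname: str
    source_path: str
    docstring: str = ""


@dataclass
class EdgeSpec:
    source_id: str
    target_id: str
    relation: str


class KGExtractor:
    def __init__(self, repo_path: Path, config: dict[str, Any] | None = None) -> None:
        self.repo_path = repo_path
        self.config = config or {}


class MyExtractor(KGExtractor):
    def node_kinds(self) -> list[str]:
        return ["module", "function", "class"]

    def edge_kinds(self) -> list[str]:
        return ["CONTAINS", "CALLS", "IMPORTS"]

    def extract(self) -> Iterator[NodeSpec | EdgeSpec]:
        # The root comes from the constructor (self.repo_path), not from an argument.
        yield NodeSpec(
            node_id="fn:main:hello",
            kind="function",
            name="hello",
            qualname="main.hello",
            source_path="main.py",
            docstring="Greet the user.",
        )
        yield EdgeSpec(source_id="mod:main", target_id="fn:main:hello", relation="CONTAINS")


# Separate the mixed stream into nodes and edges.
items = list(MyExtractor(Path(".")).extract())
nodes = [i for i in items if isinstance(i, NodeSpec)]
edges = [i for i in items if isinstance(i, EdgeSpec)]
```

Keeping the path on the instance lets a module construct the extractor once and call `extract()` with no arguments, which matches how `KGModule.make_extractor()` returns a ready-to-use instance.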

### Snapshots — Track metrics over time

```python
from kg_utils.snapshots import SnapshotManager

mgr = SnapshotManager(snapshots_dir=".my_kg/snapshots", package_name="my-kg")

# Capture a snapshot from current metrics
snapshot = mgr.capture(metrics={
    "total_nodes": 142,
    "total_edges": 387,
    "coverage": 0.78,
})

# Save with automatic deduplication
mgr.save_snapshot(snapshot)

# List and compare
snaps = mgr.list_snapshots(limit=5)
delta = mgr.diff_snapshots(key_a=snaps[0].key, key_b=snaps[-1].key)
```
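The README does not say how `diff_snapshots` derives its delta. Purely as a hypothetical illustration, not the library's code, a per-metric difference over two free-form metrics dicts might look like the sketch below. `metric_delta` is an invented helper name, and skipping keys that are missing from either side or hold non-numeric values is an assumption.

```python
def metric_delta(old: dict[str, object], new: dict[str, object]) -> dict[str, float]:
    """Hypothetical helper: new minus old for each numeric metric present in both dicts."""
    delta: dict[str, float] = {}
    for key in sorted(old.keys() & new.keys()):
        a, b = old[key], new[key]
        # Free-form metrics may hold strings or lists; only numbers get a delta here.
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            delta[key] = b - a
    return delta


before = {"total_nodes": 142, "total_edges": 387, "coverage": 0.78}
after = {"total_nodes": 150, "total_edges": 401, "coverage": 0.81, "label": "v2"}
d = metric_delta(before, after)
```

Here `total_nodes` grows by 8 and `total_edges` by 14, while `label` is ignored because it only appears in the newer dict.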

---

## API Reference

### `kg_utils.types`

| Class | Description |
|---|---|
| `NodeSpec` | Dataclass for KG nodes: `node_id`, `kind`, `name`, `qualname`, `source_path`, `docstring` |
| `EdgeSpec` | Dataclass for KG edges: `source_id`, `target_id`, `relation` |
| `QueryResult` | Container for query responses with nodes, edges, and metadata |
| `SnippetPack` | Extended result container with source-code snippets |
| `KGExtractor` | Abstract base class for domain extractors |
| `KGModule` | Abstract base class for knowledge-graph modules |

### `kg_utils.snapshots`

| Class | Description |
|---|---|
| `Snapshot` | Temporal snapshot keyed by git tree hash with free-form metrics and deltas |
| `SnapshotManager` | Capture, persist, load, list, diff, and prune snapshots |
| `SnapshotManifest` | Index of all snapshots with format versioning and fast lookup |
| `PruneResult` | Summary of pruning operations: removed, orphaned, broken entries |

---

## Project Structure

```
KG_utils/
├── LICENSE
├── README.md
├── pyproject.toml
├── pytest.ini
├── src/
│   └── kg_utils/
│       ├── __init__.py
│       ├── py.typed          # PEP 561 marker
│       ├── types/
│       │   ├── __init__.py   # Public re-exports
│       │   ├── specs.py      # NodeSpec, EdgeSpec, QueryResult, SnippetPack
│       │   ├── extractor.py  # KGExtractor ABC
│       │   └── module.py     # KGModule ABC
│       └── snapshots/
│           ├── __init__.py   # Public re-exports
│           ├── models.py     # Snapshot, SnapshotManifest, PruneResult
│           └── manager.py    # SnapshotManager
└── tests/
    ├── __init__.py
    ├── test_types.py
    └── test_snapshots.py
```

---

## Development

```bash
git clone https://github.com/Flux-Frontiers/KG_utils.git
cd KG_utils
poetry install --with dev
```

Run the test suite:

```bash
poetry run pytest
```

---

## License

[Elastic License 2.0](https://www.elastic.co/licensing/elastic-license) — see [LICENSE](LICENSE).

Free to use, modify, and distribute. You may not offer the software as a hosted or managed service to third parties. Commercial use internally is permitted.

kgmodule_utils-0.3.0/src/kg_utils/types/__init__.py
DELETED
@@ -1,14 +0,0 @@
"""kg_utils.types — Core dataclasses and base classes for the KGModule SDK."""

from kg_utils.types.specs import EdgeSpec, NodeSpec, QueryResult, SnippetPack
from kg_utils.types.extractor import KGExtractor
from kg_utils.types.module import KGModule

__all__ = [
    "EdgeSpec",
    "KGExtractor",
    "KGModule",
    "NodeSpec",
    "QueryResult",
    "SnippetPack",
]

kgmodule_utils-0.3.0/src/kg_utils/types/extractor.py
DELETED
@@ -1,68 +0,0 @@
"""kg_utils/types/extractor.py — Abstract base class for KG extractors."""

from __future__ import annotations

from collections.abc import Iterator
from pathlib import Path
from typing import Any

from kg_utils.types.specs import EdgeSpec, NodeSpec


class KGExtractor:
    """Base class for knowledge-graph extractors.

    Subclasses must implement :meth:`node_kinds`, :meth:`edge_kinds`,
    and :meth:`extract`.

    :param repo_path: Absolute path to the repository or corpus root.
    :param config: Optional domain-specific configuration dict.
    """

    def __init__(self, repo_path: Path, config: dict[str, Any] | None = None) -> None:
        self.repo_path = repo_path
        self.config = config or {}

    def node_kinds(self) -> list[str]:
        """Return canonical node kind names.

        :return: List of node kind strings.
        """
        raise NotImplementedError

    def edge_kinds(self) -> list[str]:
        """Return canonical edge relation types.

        :return: List of edge relation strings.
        """
        raise NotImplementedError

    def meaningful_node_kinds(self) -> list[str]:
        """Return node kinds included in the vector index and coverage metrics.

        Override to exclude structural stubs from the default (all node_kinds).

        :return: Subset of node_kinds() to index semantically.
        """
        return self.node_kinds()

    def coverage_metric(self, nodes: list[NodeSpec]) -> float:
        """Compute a domain coverage quality metric.

        Default: fraction of meaningful nodes with a non-empty docstring.

        :param nodes: All extracted NodeSpec objects.
        :return: Coverage score in [0.0, 1.0].
        """
        meaningful = [n for n in nodes if n.kind in self.meaningful_node_kinds()]
        if not meaningful:
            return 0.0
        covered = sum(1 for n in meaningful if n.docstring.strip())
        return covered / len(meaningful)

    def extract(self) -> Iterator[NodeSpec | EdgeSpec]:
        """Traverse the source and yield NodeSpec / EdgeSpec objects.

        :return: Iterator of NodeSpec and EdgeSpec objects.
        """
        raise NotImplementedError
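As a worked example of the default `coverage_metric` rule above (documented meaningful nodes divided by all meaningful nodes, with whitespace-only docstrings counting as undocumented), the sketch below re-implements that rule as a free function over an inline stand-in for `NodeSpec`, so it runs without the package. Passing a narrower kind list mimics a subclass that overrides `meaningful_node_kinds()`; the sample nodes are made up.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:  # stand-in mirroring kg_utils/types/specs.py (0.3.0)
    node_id: str
    kind: str
    name: str
    qualname: str
    source_path: str
    docstring: str = ""


def coverage_metric(nodes: list[NodeSpec], meaningful_kinds: list[str]) -> float:
    # Same rule as KGExtractor.coverage_metric: documented / all, over meaningful kinds only.
    meaningful = [n for n in nodes if n.kind in meaningful_kinds]
    if not meaningful:
        return 0.0
    covered = sum(1 for n in meaningful if n.docstring.strip())
    return covered / len(meaningful)


nodes = [
    NodeSpec("mod:main", "module", "main", "main", "main.py"),  # no docstring
    NodeSpec("fn:main:hello", "function", "hello", "main.hello", "main.py", "Greet the user."),
    NodeSpec("fn:main:bye", "function", "bye", "main.bye", "main.py", "   "),  # whitespace only
    NodeSpec("cls:main:App", "class", "App", "main.App", "main.py", "Application."),
]

full = coverage_metric(nodes, ["module", "function", "class"])  # 2 of 4 documented → 0.5
narrowed = coverage_metric(nodes, ["function", "class"])        # 2 of 3 documented
```

Dropping `module` from the meaningful kinds shrinks the denominator, which is how excluding structural stubs raises the reported coverage without any docstrings changing.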

kgmodule_utils-0.3.0/src/kg_utils/types/module.py
DELETED
@@ -1,87 +0,0 @@
"""kg_utils/types/module.py — Abstract base class for KG modules."""

from __future__ import annotations

from pathlib import Path
from typing import Any

from kg_utils.types.extractor import KGExtractor
from kg_utils.types.specs import QueryResult, SnippetPack


class KGModule:
    """Base class for knowledge-graph modules.

    Subclasses must implement :meth:`make_extractor`, :meth:`kind`,
    and should override :meth:`build`, :meth:`query`, :meth:`stats`,
    :meth:`pack`, and :meth:`analyze` with domain-specific logic.

    :param repo_root: Absolute path to the repository or corpus root.
    :param db_path: Path for the SQLite graph database.
    :param lancedb_dir: Path for the LanceDB vector index directory.
    :param config: Optional domain-specific configuration dict.
    """

    def __init__(
        self,
        repo_root: Path,
        db_path: Path | None = None,
        lancedb_dir: Path | None = None,
        config: dict[str, Any] | None = None,
    ) -> None:
        self.repo_root = repo_root
        self.db_path = db_path
        self.lancedb_dir = lancedb_dir
        self.config = config or {}

    def make_extractor(self) -> KGExtractor:
        """Return the domain extractor for this module.

        :return: KGExtractor subclass instance.
        """
        raise NotImplementedError

    def kind(self) -> str:
        """Return the KGKind string for this module.

        :return: Kind string (e.g. "code", "meta", "doc").
        """
        raise NotImplementedError

    def build(self, wipe: bool = False) -> None:
        """Build the knowledge graph index.

        :param wipe: If True, delete existing index before building.
        """
        raise NotImplementedError

    def query(self, q: str, k: int = 8, **kwargs: Any) -> QueryResult:
        """Query the knowledge graph.

        :param q: Natural-language query string.
        :param k: Number of results to return.
        :return: QueryResult with matched nodes and edges.
        """
        raise NotImplementedError

    def stats(self) -> dict[str, Any]:
        """Return statistics about the knowledge graph.

        :return: Dict with keys like total_nodes, total_edges, etc.
        """
        raise NotImplementedError

    def pack(self, q: str, **kwargs: Any) -> SnippetPack:
        """Pack query results with source context.

        :param q: Natural-language query string.
        :return: SnippetPack with nodes, edges, and snippets.
        """
        raise NotImplementedError

    def analyze(self) -> str:
        """Run full analysis and return a Markdown report.

        :return: Markdown-formatted analysis report.
        """
        raise NotImplementedError
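`KGModule` leaves every method to subclasses. Below is a minimal, hypothetical in-memory sketch of part of that contract (`kind`, `build`, `query`, `stats`). `ToyModule`, its hard-coded nodes, and the case-insensitive substring match that stands in for vector search plus graph expansion are all invented for illustration. `QueryResult` is a trimmed stand-in for the dataclass in `specs.py`, and the class does not subclass the real `KGModule` so that it runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class QueryResult:  # trimmed stand-in for kg_utils/types/specs.py (subset of fields)
    nodes: list[dict[str, Any]] = field(default_factory=list)
    edges: list[dict[str, Any]] = field(default_factory=list)
    seeds: int = 0
    returned_nodes: int = 0


class ToyModule:
    """Hypothetical module using the KGModule method names, with an in-memory store."""

    def __init__(self, repo_root: Path, config: dict[str, Any] | None = None) -> None:
        self.repo_root = repo_root
        self.config = config or {}
        self._nodes: list[dict[str, Any]] = []

    def kind(self) -> str:
        return "code"

    def build(self, wipe: bool = False) -> None:
        if wipe:
            self._nodes.clear()
        # A real module would run its extractor here; two fixed nodes stand in for that.
        self._nodes.extend([
            {"node_id": "fn:main:hello", "name": "hello", "docstring": "Greet the user."},
            {"node_id": "fn:main:bye", "name": "bye", "docstring": "Say goodbye."},
        ])

    def query(self, q: str, k: int = 8, **kwargs: Any) -> QueryResult:
        # Substring match on docstrings, capped at k results.
        hits = [n for n in self._nodes if q.lower() in n["docstring"].lower()][:k]
        return QueryResult(nodes=hits, seeds=len(hits), returned_nodes=len(hits))

    def stats(self) -> dict[str, Any]:
        return {"total_nodes": len(self._nodes), "total_edges": 0}


mod = ToyModule(Path("."))
mod.build(wipe=True)
result = mod.query("greet")
```

Calling `build(wipe=True)` again leaves two nodes, while `build()` without `wipe` appends a second copy, which mirrors the documented meaning of the `wipe` flag.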

kgmodule_utils-0.3.0/src/kg_utils/types/specs.py
DELETED
@@ -1,90 +0,0 @@
"""kg_utils/types/specs.py — Core dataclasses shared by all KG modules."""

from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class NodeSpec:
    """Specification for a knowledge-graph node.

    :param node_id: Unique identifier, typically ``<kind>:<path>:<qualname>``.
    :param kind: Node kind (e.g. "file", "function", "class", "directory").
    :param name: Short display name.
    :param qualname: Fully-qualified name or relative path.
    :param source_path: Path to the source file (relative to repo root).
    :param docstring: Semantic content for vector indexing.
    """

    node_id: str
    kind: str
    name: str
    qualname: str
    source_path: str
    docstring: str = ""


@dataclass
class EdgeSpec:
    """Specification for a knowledge-graph edge.

    :param source_id: Node ID of the edge source.
    :param target_id: Node ID of the edge target.
    :param relation: Relation type (e.g. "CONTAINS", "CALLS", "IMPORTS").
    """

    source_id: str
    target_id: str
    relation: str


@dataclass
class QueryResult:
    """Result container returned by KGModule.query().

    :param nodes: List of matched node dicts.
    :param edges: List of matched edge dicts.
    :param seeds: Number of seed nodes from vector search.
    :param expanded_nodes: Number of nodes after graph expansion.
    :param returned_nodes: Number of nodes actually returned.
    :param hop: Number of hops used in graph expansion.
    :param rels: Relation types used in expansion.
    """

    nodes: list[dict[str, Any]] = field(default_factory=list)
    edges: list[dict[str, Any]] = field(default_factory=list)
    seeds: int = 0
    expanded_nodes: int = 0
    returned_nodes: int = 0
    hop: int = 0
    rels: list[str] = field(default_factory=list)


@dataclass
class SnippetPack:
    """Result container returned by KGModule.pack().

    :param query: The original query string.
    :param seeds: Number of seed nodes from vector search.
    :param expanded_nodes: Number of nodes after graph expansion.
    :param returned_nodes: Number of nodes actually returned.
    :param hop: Number of hops used in expansion.
    :param rels: Relation types used in expansion.
    :param model: Embedding model identifier.
    :param nodes: Node dicts included in the pack.
    :param edges: Edge dicts included in the pack.
    :param snippets: Source-code snippets (for code KGs).
    """

    query: str
    seeds: int = 0
    expanded_nodes: int = 0
    returned_nodes: int = 0
    hop: int = 0
    rels: list[str] = field(default_factory=list)
    model: str = ""
    nodes: list[Any] = field(default_factory=list)
    edges: list[Any] = field(default_factory=list)
    snippets: list[Any] = field(default_factory=list)
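Extractors yield `NodeSpec` objects, while `QueryResult.nodes` is typed as a list of dicts. One plausible bridge, shown here as an assumption rather than what the KG modules actually do, is the standard-library `dataclasses.asdict`. The sketch also shows why the list fields use `field(default_factory=list)`: every instance gets its own list. `NodeSpec` and `QueryResult` are re-declared inline as stand-ins mirroring the definitions above.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class NodeSpec:  # stand-in mirroring specs.py
    node_id: str
    kind: str
    name: str
    qualname: str
    source_path: str
    docstring: str = ""


@dataclass
class QueryResult:  # stand-in mirroring specs.py
    nodes: list[dict[str, Any]] = field(default_factory=list)
    edges: list[dict[str, Any]] = field(default_factory=list)
    seeds: int = 0
    expanded_nodes: int = 0
    returned_nodes: int = 0
    hop: int = 0
    rels: list[str] = field(default_factory=list)


# Convert an extracted spec into the dict shape QueryResult.nodes expects.
spec = NodeSpec("fn:main:hello", "function", "hello", "main.hello", "main.py", "Greet the user.")
res = QueryResult(nodes=[asdict(spec)], seeds=1, expanded_nodes=1, returned_nodes=1,
                  hop=1, rels=["CONTAINS"])

# default_factory gives each instance a fresh list, so mutating one result never leaks into another.
empty_a, empty_b = QueryResult(), QueryResult()
empty_a.rels.append("CALLS")
```

A plain `rels: list[str] = []` default would be rejected by `@dataclass` with a `ValueError`, which is why the mutable defaults in `specs.py` go through `field(default_factory=list)`.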