kgmodule-utils 0.3.0__tar.gz → 0.4.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (27)
  1. kgmodule_utils-0.4.0/PKG-INFO +280 -0
  2. kgmodule_utils-0.4.0/README.md +245 -0
  3. {kgmodule_utils-0.3.0 → kgmodule_utils-0.4.0}/pyproject.toml +14 -22
  4. {kgmodule_utils-0.3.0 → kgmodule_utils-0.4.0}/src/kg_utils/__init__.py +9 -4
  5. {kgmodule_utils-0.3.0 → kgmodule_utils-0.4.0}/src/kg_utils/pipeline.py +2 -2
  6. kgmodule_utils-0.4.0/src/kg_utils/synthesis/__init__.py +79 -0
  7. kgmodule_utils-0.4.0/src/kg_utils/synthesis/_config.py +155 -0
  8. kgmodule_utils-0.4.0/src/kg_utils/synthesis/_image.py +235 -0
  9. kgmodule_utils-0.4.0/src/kg_utils/synthesis/_text.py +178 -0
  10. kgmodule_utils-0.3.0/PKG-INFO +0 -216
  11. kgmodule_utils-0.3.0/README.md +0 -191
  12. kgmodule_utils-0.3.0/src/kg_utils/types/__init__.py +0 -14
  13. kgmodule_utils-0.3.0/src/kg_utils/types/extractor.py +0 -68
  14. kgmodule_utils-0.3.0/src/kg_utils/types/module.py +0 -87
  15. kgmodule_utils-0.3.0/src/kg_utils/types/specs.py +0 -90
  16. {kgmodule_utils-0.3.0 → kgmodule_utils-0.4.0}/LICENSE +0 -0
  17. {kgmodule_utils-0.3.0 → kgmodule_utils-0.4.0}/src/kg_utils/embed.py +0 -0
  18. {kgmodule_utils-0.3.0 → kgmodule_utils-0.4.0}/src/kg_utils/embedder.py +0 -0
  19. {kgmodule_utils-0.3.0 → kgmodule_utils-0.4.0}/src/kg_utils/extractor.py +0 -0
  20. {kgmodule_utils-0.3.0 → kgmodule_utils-0.4.0}/src/kg_utils/module.py +0 -0
  21. {kgmodule_utils-0.3.0 → kgmodule_utils-0.4.0}/src/kg_utils/py.typed +0 -0
  22. {kgmodule_utils-0.3.0 → kgmodule_utils-0.4.0}/src/kg_utils/semantic.py +0 -0
  23. {kgmodule_utils-0.3.0 → kgmodule_utils-0.4.0}/src/kg_utils/snapshots/__init__.py +0 -0
  24. {kgmodule_utils-0.3.0 → kgmodule_utils-0.4.0}/src/kg_utils/snapshots/manager.py +0 -0
  25. {kgmodule_utils-0.3.0 → kgmodule_utils-0.4.0}/src/kg_utils/snapshots/models.py +0 -0
  26. {kgmodule_utils-0.3.0 → kgmodule_utils-0.4.0}/src/kg_utils/specs.py +0 -0
  27. {kgmodule_utils-0.3.0 → kgmodule_utils-0.4.0}/src/kg_utils/store.py +0 -0
@@ -0,0 +1,280 @@
1
+ Metadata-Version: 2.4
2
+ Name: kgmodule-utils
3
+ Version: 0.4.0
4
+ Summary: Shared types, graph store, semantic index, and pipeline base for the KGModule SDK
5
+ License: Elastic-2.0
6
+ License-File: LICENSE
7
+ Keywords: knowledge-graph,kgmodule,sdk,types,snapshots
8
+ Author: Eric G. Suchanek, PhD
9
+ Author-email: suchanek@flux-frontiers.com
10
+ Requires-Python: >=3.12,<3.14
11
+ Classifier: Development Status :: 4 - Beta
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: Programming Language :: Python :: 3
14
+ Classifier: Programming Language :: Python :: 3.12
15
+ Classifier: Programming Language :: Python :: 3.13
16
+ Provides-Extra: semantic
17
+ Provides-Extra: synthesis
18
+ Provides-Extra: synthesis-mflux
19
+ Requires-Dist: httpx (>=0.27.0) ; extra == "synthesis"
20
+ Requires-Dist: httpx (>=0.27.0) ; extra == "synthesis-mflux"
21
+ Requires-Dist: lancedb (>=0.19.0) ; extra == "semantic"
22
+ Requires-Dist: mflux (>=0.9.0) ; extra == "synthesis-mflux"
23
+ Requires-Dist: numpy (>=1.24.0) ; extra == "semantic"
24
+ Requires-Dist: openai (>=1.30.0) ; extra == "synthesis"
25
+ Requires-Dist: openai (>=1.30.0) ; extra == "synthesis-mflux"
26
+ Requires-Dist: pillow (>=10.0.0) ; extra == "synthesis"
27
+ Requires-Dist: pillow (>=10.0.0) ; extra == "synthesis-mflux"
28
+ Requires-Dist: sentence-transformers (>=5.4.1) ; extra == "semantic"
29
+ Requires-Dist: torch (>=2.5.1) ; extra == "semantic"
30
+ Requires-Dist: transformers (>=4.40.0,<4.57) ; extra == "semantic"
31
+ Requires-Dist: ty (>=0.0.44,<0.0.45)
32
+ Project-URL: Repository, https://github.com/Flux-Frontiers/kg_utils
33
+ Description-Content-Type: text/markdown
34
+
35
+
36
+ [![Python](https://img.shields.io/badge/python-3.12%20%7C%203.13-blue.svg)](https://www.python.org/)
37
+ [![License: Elastic-2.0](https://img.shields.io/badge/License-Elastic%202.0-blue.svg)](https://www.elastic.co/licensing/elastic-license)
38
+ [![Version](https://img.shields.io/badge/version-0.3.1-blue.svg)](https://github.com/Flux-Frontiers/KG_utils/releases)
39
+ [![CI](https://github.com/Flux-Frontiers/KG_utils/actions/workflows/ci.yml/badge.svg)](https://github.com/Flux-Frontiers/KG_utils/actions/workflows/ci.yml)
40
+ [![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
41
+
42
+ # kgmodule-utils
43
+
44
+ **kgmodule-utils** — Shared graph store, semantic index, pipeline base, and snapshot infrastructure for the KGModule SDK.
45
+
46
+ *Author: Eric G. Suchanek, PhD*
47
+
48
+ *Flux-Frontiers, Liberty TWP, OH*
49
+
50
+ ---
51
+
52
+ ## Overview
53
+
54
+ kgmodule-utils is the **shared SDK layer** for the Flux-Frontiers knowledge-graph ecosystem. It provides everything a domain KG module needs — from type abstractions and SQLite graph storage through LanceDB vector indexing and a full build/query/pack pipeline — so domain authors implement only what is specific to their source domain.
55
+
56
+ Every KGModule implementation — [PyCodeKG](https://github.com/Flux-Frontiers/pycode_kg), [DocKG](https://github.com/Flux-Frontiers/doc_kg), and others — subclasses `KGModule` from here and implements exactly three methods: `make_extractor()`, `kind()`, and `analyze()`.
57
+
58
+ ---
59
+
60
+ ## Features
61
+
62
+ - **`kg_utils.specs`** — `NodeSpec`, `EdgeSpec`, `BuildStats`, `QueryResult`, `SnippetPack` dataclasses
63
+ - **`kg_utils.extractor`** — `KGExtractor` ABC: `extract()`, `node_kinds()`, `edge_kinds()`, `coverage_metric()`
64
+ - **`kg_utils.store`** — `GraphStore`: SQLite-backed node/edge store with BFS expansion, symbol resolution, caller lookup, and provenance recording
65
+ - **`kg_utils.semantic`** — `SemanticIndex` (LanceDB), `SentenceTransformerEmbedder`, `SeedHit`, model registry, `resolve_model_path()`
66
+ - **`kg_utils.pipeline`** — `KGModule`: full build → query → pack pipeline base with hybrid semantic + lexical reranking and snippet extraction
67
+ - **`kg_utils.embedder`** — `get_embedder()`, `wrap_embedder()`, `load_sentence_transformer()` factory functions
68
+ - **`kg_utils.embed`** — `Embedder` protocol, `DEFAULT_MODEL`, `KNOWN_MODELS`, `resolve_model_path()`
69
+ - **`kg_utils.snapshots`** — `Snapshot`, `SnapshotManager`, `SnapshotManifest` for temporal metric tracking
70
+
71
+ ---
72
+
73
+ ## Installation
74
+
75
+ **Requirements:** Python ≥ 3.12, < 3.14
76
+
77
+ ### Core only (stdlib, no optional deps)
78
+
79
+ ```bash
80
+ pip install kgmodule-utils
81
+ ```
82
+
83
+ ### With semantic search (LanceDB + sentence-transformers)
84
+
85
+ ```bash
86
+ pip install 'kgmodule-utils[semantic]'
87
+ ```
88
+
89
+ ### In a Poetry project
90
+
91
+ ```toml
92
+ [tool.poetry.dependencies]
93
+ kgmodule-utils = { version = ">=0.3.1", extras = ["semantic"] }
94
+ ```
95
+
96
+ ---
97
+
98
+ ## Quick Start
99
+
100
+ ### Build a domain KG module
101
+
102
+ ```python
103
+ from collections.abc import Iterator
104
+ from pathlib import Path
105
+
106
+ from kg_utils.extractor import KGExtractor
107
+ from kg_utils.pipeline import KGModule
108
+ from kg_utils.specs import EdgeSpec, NodeSpec
109
+
110
+
111
+ class MyExtractor(KGExtractor):
112
+     def node_kinds(self) -> list[str]:
113
+         return ["document", "section"]
114
+
115
+     def edge_kinds(self) -> list[str]:
116
+         return ["CONTAINS"]
117
+
118
+     def meaningful_node_kinds(self) -> list[str]:
119
+         return ["section"]
120
+
121
+     def extract(self) -> Iterator[NodeSpec | EdgeSpec]:
122
+         for doc in self.repo_path.glob("**/*.md"):
123
+             doc_id = f"document:{doc}"
124
+             yield NodeSpec(node_id=doc_id, kind="document",
125
+                            name=doc.stem, qualname=doc.stem,
126
+                            source_path=str(doc))
127
+             # … yield sections and CONTAINS edges
128
+
129
+
130
+ class MyKG(KGModule):
131
+     _default_dir = ".mykg"
132
+
133
+     def make_extractor(self) -> KGExtractor:
134
+         return MyExtractor(self.repo_root)
135
+
136
+     def kind(self) -> str:
137
+         return "my"
138
+
139
+     def analyze(self) -> str:
140
+         s = self.stats()
141
+         return f"# MyKG\nnodes={s['total_nodes']}"
142
+
143
+
144
+ # Build and query
145
+ kg = MyKG("/path/to/repo")
146
+ kg.build(wipe=True)
147
+
148
+ result = kg.query("authentication flow", k=8, hop=1)
149
+ pack = kg.pack("error handling", max_nodes=10)
150
+ print(pack.to_markdown())
151
+ ```
152
+
153
+ ### Track metrics over time
154
+
155
+ ```python
156
+ from kg_utils.snapshots import SnapshotManager
157
+
158
+ mgr = SnapshotManager(".mykg/snapshots", package_name="my-kg")
159
+
160
+ snapshot = mgr.capture(
161
+     version="1.0.0",
162
+     branch="main",
163
+     graph_stats_dict=kg.stats(),
164
+ )
165
+ mgr.save_snapshot(snapshot)
166
+
167
+ snaps = mgr.list_snapshots(limit=5)
168
+ delta = mgr.diff_snapshots(snaps[-1]["key"], snaps[0]["key"])
169
+ ```
170
+
171
+ ---
172
+
173
+ ## API Reference
174
+
175
+ ### `kg_utils.specs`
176
+
177
+ | Class | Description |
178
+ |---|---|
179
+ | `NodeSpec` | Graph node: `node_id`, `kind`, `name`, `qualname`, `source_path`, `lineno`, `end_lineno`, `docstring`, `metadata` |
180
+ | `EdgeSpec` | Graph edge: `source_id`, `target_id`, `relation`, `weight`, `metadata` |
181
+ | `BuildStats` | Build result: node/edge counts, indexed rows, embedding dim |
182
+ | `QueryResult` | Query result: nodes, edges, seeds, hop, relevance metadata |
183
+ | `SnippetPack` | Pack result: nodes with snippets, `to_markdown()`, `to_json()`, `save()` |
184
+
185
+ ### `kg_utils.extractor`
186
+
187
+ | Class | Description |
188
+ |---|---|
189
+ | `KGExtractor` | ABC — implement `node_kinds()`, `edge_kinds()`, `extract()` |
190
+
191
+ ### `kg_utils.store`
192
+
193
+ | Class | Description |
194
+ |---|---|
195
+ | `GraphStore` | SQLite persistence: `write()`, `expand()`, `query_nodes()`, `resolve_symbols()`, `callers_of()`, `stats()` |
196
+
197
+ ### `kg_utils.semantic`
198
+
199
+ | Class / function | Description |
200
+ |---|---|
201
+ | `SemanticIndex` | LanceDB vector index: `build()`, `search()` |
202
+ | `SentenceTransformerEmbedder` | Local embedding via sentence-transformers |
203
+ | `resolve_model_path()` | Resolve model name / alias to local cache path |
204
+ | `suppress_ingestion_logging()` | Silence verbose HF / tqdm output during ingestion |
205
+
206
+ ### `kg_utils.pipeline`
207
+
208
+ | Class | Description |
209
+ |---|---|
210
+ | `KGModule` | Concrete base — implement `make_extractor()`, `kind()`, `analyze()`; get `build()`, `query()`, `pack()`, `stats()` for free |
211
+
212
+ ### `kg_utils.snapshots`
213
+
214
+ | Class | Description |
215
+ |---|---|
216
+ | `Snapshot` | Temporal snapshot keyed by git tree hash with metrics and deltas |
217
+ | `SnapshotManager` | Capture, persist, load, list, diff, and prune snapshots |
218
+ | `SnapshotManifest` | Fast-lookup index with format versioning |
219
+
220
+ ---
221
+
222
+ ## Project Structure
223
+
224
+ ```
225
+ KG_utils/
226
+ ├── pyproject.toml
227
+ ├── src/
228
+ │   └── kg_utils/
229
+ │       ├── __init__.py
230
+ │       ├── specs.py # NodeSpec, EdgeSpec, BuildStats, QueryResult, SnippetPack
231
+ │       ├── extractor.py # KGExtractor ABC
232
+ │       ├── store.py # GraphStore (SQLite)
233
+ │       ├── semantic.py # SemanticIndex, SentenceTransformerEmbedder, SeedHit
234
+ │       ├── pipeline.py # KGModule concrete base class
235
+ │       ├── module.py # Re-export shim
236
+ │       ├── embed.py # Embedder protocol, model registry
237
+ │       ├── embedder.py # SentenceTransformerEmbedder factory functions
238
+ │       └── snapshots/
239
+ │           ├── __init__.py
240
+ │           ├── models.py # Snapshot, SnapshotManifest, PruneResult
241
+ │           └── manager.py # SnapshotManager
242
+ └── tests/
243
+     ├── test_store.py # GraphStore unit tests
244
+     ├── test_pipeline_utils.py # Pipeline utility function tests
245
+     ├── test_pipeline_module.py # End-to-end integration tests (--integration)
246
+     ├── test_types.py # Spec dataclass and KGExtractor tests
247
+     ├── test_snapshots.py # Snapshot lifecycle tests
248
+     └── test_integration.py # Cross-module integration tests
249
+ ```
250
+
251
+ ---
252
+
253
+ ## Development
254
+
255
+ ```bash
256
+ git clone https://github.com/Flux-Frontiers/KG_utils.git
257
+ cd KG_utils
258
+ poetry install --with dev
259
+ ```
260
+
261
+ Run the fast test suite (no model downloads):
262
+
263
+ ```bash
264
+ poetry run pytest -m "not integration"
265
+ ```
266
+
267
+ Run all tests including semantic/integration (requires `[semantic]` extra):
268
+
269
+ ```bash
270
+ poetry run pytest
271
+ ```
272
+
273
+ ---
274
+
275
+ ## License
276
+
277
+ [Elastic License 2.0](https://www.elastic.co/licensing/elastic-license) — see [LICENSE](LICENSE).
278
+
279
+ Free to use, modify, and distribute. You may not offer the software as a hosted or managed service to third parties. Commercial use internally is permitted.
280
+
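The `Requires-Dist` lines in the PKG-INFO hunk above tie each optional dependency to an extra through an `extra == "..."` marker. As a rough sketch, the excerpt below (copied from the metadata in this diff) can be grouped by extra using only the standard library. The helper name `requirements_by_extra` and the regular expression are illustrative assumptions, not part of kgmodule-utils, and a real tool would more likely rely on a full marker parser.

```python
import re
from collections import defaultdict
from email import message_from_string

# Excerpt of the 0.4.0 PKG-INFO shown in this diff (core metadata uses
# RFC 822 style headers, so the stdlib email parser can read it).
PKG_INFO_EXCERPT = """\
Metadata-Version: 2.4
Name: kgmodule-utils
Version: 0.4.0
Provides-Extra: semantic
Provides-Extra: synthesis
Provides-Extra: synthesis-mflux
Requires-Dist: httpx (>=0.27.0) ; extra == "synthesis"
Requires-Dist: mflux (>=0.9.0) ; extra == "synthesis-mflux"
Requires-Dist: openai (>=1.30.0) ; extra == "synthesis"
Requires-Dist: pillow (>=10.0.0) ; extra == "synthesis"
Requires-Dist: ty (>=0.0.44,<0.0.45)
"""

# Assumption: markers in this file only ever take the form extra == "name".
EXTRA_RE = re.compile(r'extra\s*==\s*"([^"]+)"')


def requirements_by_extra(pkg_info: str) -> dict[str, list[str]]:
    """Group requirement names by extra; "" holds unconditional requirements."""
    message = message_from_string(pkg_info)
    grouped: dict[str, list[str]] = defaultdict(list)
    for requirement in message.get_all("Requires-Dist") or []:
        spec, _, marker = requirement.partition(";")
        match = EXTRA_RE.search(marker)
        name = spec.split("(")[0].strip()  # drop the version constraint
        grouped[match.group(1) if match else ""].append(name)
    return dict(grouped)


grouped = requirements_by_extra(PKG_INFO_EXCERPT)
for extra, names in grouped.items():
    print(extra or "(always installed)", names)
```

In this excerpt the only requirement without an extra marker is `ty`, which matches the new `dependencies` entry in `pyproject.toml`.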
@@ -0,0 +1,245 @@
1
+
2
+ [![Python](https://img.shields.io/badge/python-3.12%20%7C%203.13-blue.svg)](https://www.python.org/)
3
+ [![License: Elastic-2.0](https://img.shields.io/badge/License-Elastic%202.0-blue.svg)](https://www.elastic.co/licensing/elastic-license)
4
+ [![Version](https://img.shields.io/badge/version-0.3.1-blue.svg)](https://github.com/Flux-Frontiers/KG_utils/releases)
5
+ [![CI](https://github.com/Flux-Frontiers/KG_utils/actions/workflows/ci.yml/badge.svg)](https://github.com/Flux-Frontiers/KG_utils/actions/workflows/ci.yml)
6
+ [![Poetry](https://img.shields.io/endpoint?url=https://python-poetry.org/badge/v0.json)](https://python-poetry.org/)
7
+
8
+ # kgmodule-utils
9
+
10
+ **kgmodule-utils** — Shared graph store, semantic index, pipeline base, and snapshot infrastructure for the KGModule SDK.
11
+
12
+ *Author: Eric G. Suchanek, PhD*
13
+
14
+ *Flux-Frontiers, Liberty TWP, OH*
15
+
16
+ ---
17
+
18
+ ## Overview
19
+
20
+ kgmodule-utils is the **shared SDK layer** for the Flux-Frontiers knowledge-graph ecosystem. It provides everything a domain KG module needs — from type abstractions and SQLite graph storage through LanceDB vector indexing and a full build/query/pack pipeline — so domain authors implement only what is specific to their source domain.
21
+
22
+ Every KGModule implementation — [PyCodeKG](https://github.com/Flux-Frontiers/pycode_kg), [DocKG](https://github.com/Flux-Frontiers/doc_kg), and others — subclasses `KGModule` from here and implements exactly three methods: `make_extractor()`, `kind()`, and `analyze()`.
23
+
24
+ ---
25
+
26
+ ## Features
27
+
28
+ - **`kg_utils.specs`** — `NodeSpec`, `EdgeSpec`, `BuildStats`, `QueryResult`, `SnippetPack` dataclasses
29
+ - **`kg_utils.extractor`** — `KGExtractor` ABC: `extract()`, `node_kinds()`, `edge_kinds()`, `coverage_metric()`
30
+ - **`kg_utils.store`** — `GraphStore`: SQLite-backed node/edge store with BFS expansion, symbol resolution, caller lookup, and provenance recording
31
+ - **`kg_utils.semantic`** — `SemanticIndex` (LanceDB), `SentenceTransformerEmbedder`, `SeedHit`, model registry, `resolve_model_path()`
32
+ - **`kg_utils.pipeline`** — `KGModule`: full build → query → pack pipeline base with hybrid semantic + lexical reranking and snippet extraction
33
+ - **`kg_utils.embedder`** — `get_embedder()`, `wrap_embedder()`, `load_sentence_transformer()` factory functions
34
+ - **`kg_utils.embed`** — `Embedder` protocol, `DEFAULT_MODEL`, `KNOWN_MODELS`, `resolve_model_path()`
35
+ - **`kg_utils.snapshots`** — `Snapshot`, `SnapshotManager`, `SnapshotManifest` for temporal metric tracking
36
+
37
+ ---
38
+
39
+ ## Installation
40
+
41
+ **Requirements:** Python ≥ 3.12, < 3.14
42
+
43
+ ### Core only (stdlib, no optional deps)
44
+
45
+ ```bash
46
+ pip install kgmodule-utils
47
+ ```
48
+
49
+ ### With semantic search (LanceDB + sentence-transformers)
50
+
51
+ ```bash
52
+ pip install 'kgmodule-utils[semantic]'
53
+ ```
54
+
55
+ ### In a Poetry project
56
+
57
+ ```toml
58
+ [tool.poetry.dependencies]
59
+ kgmodule-utils = { version = ">=0.3.1", extras = ["semantic"] }
60
+ ```
61
+
62
+ ---
63
+
64
+ ## Quick Start
65
+
66
+ ### Build a domain KG module
67
+
68
+ ```python
69
+ from collections.abc import Iterator
70
+ from pathlib import Path
71
+
72
+ from kg_utils.extractor import KGExtractor
73
+ from kg_utils.pipeline import KGModule
74
+ from kg_utils.specs import EdgeSpec, NodeSpec
75
+
76
+
77
+ class MyExtractor(KGExtractor):
78
+     def node_kinds(self) -> list[str]:
79
+         return ["document", "section"]
80
+
81
+     def edge_kinds(self) -> list[str]:
82
+         return ["CONTAINS"]
83
+
84
+     def meaningful_node_kinds(self) -> list[str]:
85
+         return ["section"]
86
+
87
+     def extract(self) -> Iterator[NodeSpec | EdgeSpec]:
88
+         for doc in self.repo_path.glob("**/*.md"):
89
+             doc_id = f"document:{doc}"
90
+             yield NodeSpec(node_id=doc_id, kind="document",
91
+                            name=doc.stem, qualname=doc.stem,
92
+                            source_path=str(doc))
93
+             # … yield sections and CONTAINS edges
94
+
95
+
96
+ class MyKG(KGModule):
97
+     _default_dir = ".mykg"
98
+
99
+     def make_extractor(self) -> KGExtractor:
100
+         return MyExtractor(self.repo_root)
101
+
102
+     def kind(self) -> str:
103
+         return "my"
104
+
105
+     def analyze(self) -> str:
106
+         s = self.stats()
107
+         return f"# MyKG\nnodes={s['total_nodes']}"
108
+
109
+
110
+ # Build and query
111
+ kg = MyKG("/path/to/repo")
112
+ kg.build(wipe=True)
113
+
114
+ result = kg.query("authentication flow", k=8, hop=1)
115
+ pack = kg.pack("error handling", max_nodes=10)
116
+ print(pack.to_markdown())
117
+ ```
118
+
119
+ ### Track metrics over time
120
+
121
+ ```python
122
+ from kg_utils.snapshots import SnapshotManager
123
+
124
+ mgr = SnapshotManager(".mykg/snapshots", package_name="my-kg")
125
+
126
+ snapshot = mgr.capture(
127
+     version="1.0.0",
128
+     branch="main",
129
+     graph_stats_dict=kg.stats(),
130
+ )
131
+ mgr.save_snapshot(snapshot)
132
+
133
+ snaps = mgr.list_snapshots(limit=5)
134
+ delta = mgr.diff_snapshots(snaps[-1]["key"], snaps[0]["key"])
135
+ ```
136
+
137
+ ---
138
+
139
+ ## API Reference
140
+
141
+ ### `kg_utils.specs`
142
+
143
+ | Class | Description |
144
+ |---|---|
145
+ | `NodeSpec` | Graph node: `node_id`, `kind`, `name`, `qualname`, `source_path`, `lineno`, `end_lineno`, `docstring`, `metadata` |
146
+ | `EdgeSpec` | Graph edge: `source_id`, `target_id`, `relation`, `weight`, `metadata` |
147
+ | `BuildStats` | Build result: node/edge counts, indexed rows, embedding dim |
148
+ | `QueryResult` | Query result: nodes, edges, seeds, hop, relevance metadata |
149
+ | `SnippetPack` | Pack result: nodes with snippets, `to_markdown()`, `to_json()`, `save()` |
150
+
151
+ ### `kg_utils.extractor`
152
+
153
+ | Class | Description |
154
+ |---|---|
155
+ | `KGExtractor` | ABC — implement `node_kinds()`, `edge_kinds()`, `extract()` |
156
+
157
+ ### `kg_utils.store`
158
+
159
+ | Class | Description |
160
+ |---|---|
161
+ | `GraphStore` | SQLite persistence: `write()`, `expand()`, `query_nodes()`, `resolve_symbols()`, `callers_of()`, `stats()` |
162
+
163
+ ### `kg_utils.semantic`
164
+
165
+ | Class / function | Description |
166
+ |---|---|
167
+ | `SemanticIndex` | LanceDB vector index: `build()`, `search()` |
168
+ | `SentenceTransformerEmbedder` | Local embedding via sentence-transformers |
169
+ | `resolve_model_path()` | Resolve model name / alias to local cache path |
170
+ | `suppress_ingestion_logging()` | Silence verbose HF / tqdm output during ingestion |
171
+
172
+ ### `kg_utils.pipeline`
173
+
174
+ | Class | Description |
175
+ |---|---|
176
+ | `KGModule` | Concrete base — implement `make_extractor()`, `kind()`, `analyze()`; get `build()`, `query()`, `pack()`, `stats()` for free |
177
+
178
+ ### `kg_utils.snapshots`
179
+
180
+ | Class | Description |
181
+ |---|---|
182
+ | `Snapshot` | Temporal snapshot keyed by git tree hash with metrics and deltas |
183
+ | `SnapshotManager` | Capture, persist, load, list, diff, and prune snapshots |
184
+ | `SnapshotManifest` | Fast-lookup index with format versioning |
185
+
186
+ ---
187
+
188
+ ## Project Structure
189
+
190
+ ```
191
+ KG_utils/
192
+ ├── pyproject.toml
193
+ ├── src/
194
+ │   └── kg_utils/
195
+ │       ├── __init__.py
196
+ │       ├── specs.py # NodeSpec, EdgeSpec, BuildStats, QueryResult, SnippetPack
197
+ │       ├── extractor.py # KGExtractor ABC
198
+ │       ├── store.py # GraphStore (SQLite)
199
+ │       ├── semantic.py # SemanticIndex, SentenceTransformerEmbedder, SeedHit
200
+ │       ├── pipeline.py # KGModule concrete base class
201
+ │       ├── module.py # Re-export shim
202
+ │       ├── embed.py # Embedder protocol, model registry
203
+ │       ├── embedder.py # SentenceTransformerEmbedder factory functions
204
+ │       └── snapshots/
205
+ │           ├── __init__.py
206
+ │           ├── models.py # Snapshot, SnapshotManifest, PruneResult
207
+ │           └── manager.py # SnapshotManager
208
+ └── tests/
209
+     ├── test_store.py # GraphStore unit tests
210
+     ├── test_pipeline_utils.py # Pipeline utility function tests
211
+     ├── test_pipeline_module.py # End-to-end integration tests (--integration)
212
+     ├── test_types.py # Spec dataclass and KGExtractor tests
213
+     ├── test_snapshots.py # Snapshot lifecycle tests
214
+     └── test_integration.py # Cross-module integration tests
215
+ ```
216
+
217
+ ---
218
+
219
+ ## Development
220
+
221
+ ```bash
222
+ git clone https://github.com/Flux-Frontiers/KG_utils.git
223
+ cd KG_utils
224
+ poetry install --with dev
225
+ ```
226
+
227
+ Run the fast test suite (no model downloads):
228
+
229
+ ```bash
230
+ poetry run pytest -m "not integration"
231
+ ```
232
+
233
+ Run all tests including semantic/integration (requires `[semantic]` extra):
234
+
235
+ ```bash
236
+ poetry run pytest
237
+ ```
238
+
239
+ ---
240
+
241
+ ## License
242
+
243
+ [Elastic License 2.0](https://www.elastic.co/licensing/elastic-license) — see [LICENSE](LICENSE).
244
+
245
+ Free to use, modify, and distribute. You may not offer the software as a hosted or managed service to third parties. Commercial use internally is permitted.
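The README describes `GraphStore` as providing BFS expansion, and the Quick Start calls `kg.query(..., hop=1)`. How the library performs that expansion is not shown in this diff, so the following is only a generic illustration of hop-limited breadth-first search. The function name `expand_hops`, the edge-tuple representation, and the choice to treat edges as undirected are all assumptions made for the example.

```python
from collections import deque


def expand_hops(
    seeds: list[str], edges: list[tuple[str, str]], hop: int
) -> dict[str, int]:
    """Return every node within `hop` edges of a seed, mapped to its distance."""
    # Build an adjacency map; edges are treated as undirected (an assumption).
    adjacency: dict[str, set[str]] = {}
    for source_id, target_id in edges:
        adjacency.setdefault(source_id, set()).add(target_id)
        adjacency.setdefault(target_id, set()).add(source_id)

    depth = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == hop:
            continue  # hop budget spent; do not expand further from here
        for neighbour in adjacency.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


# Hypothetical node IDs in the style of the Quick Start ("document:...").
EDGES = [
    ("document:guide", "section:guide#intro"),
    ("section:guide#intro", "section:guide#setup"),
]
one_hop = expand_hops(["document:guide"], EDGES, hop=1)
two_hop = expand_hops(["document:guide"], EDGES, hop=2)
print(sorted(one_hop))
print(sorted(two_hop))
```

With `hop=1` only the seed and its direct neighbour are returned; raising `hop` to 2 also reaches the second section.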
@@ -10,7 +10,7 @@ build-backend = "poetry.core.masonry.api"
10
10
 
11
11
  [project]
12
12
  name = "kgmodule-utils"
13
- version = "0.3.0"
13
+ version = "0.4.0"
14
14
  description = "Shared types, graph store, semantic index, and pipeline base for the KGModule SDK"
15
15
  readme = "README.md"
16
16
  license = { text = "Elastic-2.0" }
@@ -26,7 +26,7 @@ classifiers = [
26
26
  "Programming Language :: Python :: 3.13",
27
27
  ]
28
28
  requires-python = ">=3.12,<3.14"
29
- dependencies = []
29
+ dependencies = ["ty (>=0.0.44,<0.0.45)"]
30
30
 
31
31
  [project.optional-dependencies]
32
32
  semantic = [
@@ -36,6 +36,17 @@ semantic = [
36
36
  "torch>=2.5.1",
37
37
  "transformers>=4.40.0,<4.57",
38
38
  ]
39
+ synthesis = [
40
+ "httpx>=0.27.0",
41
+ "openai>=1.30.0",
42
+ "pillow>=10.0.0",
43
+ ]
44
+ synthesis-mflux = [
45
+ "httpx>=0.27.0",
46
+ "openai>=1.30.0",
47
+ "pillow>=10.0.0",
48
+ "mflux>=0.9.0",
49
+ ]
39
50
 
40
51
  [project.urls]
41
52
  Repository = "https://github.com/Flux-Frontiers/kg_utils"
@@ -58,7 +69,7 @@ pytest = "^8.0.0"
58
69
  pytest-cov = "^5.0.0"
59
70
  black = ">=22.0"
60
71
  ruff = ">=0.4.0"
61
- mypy = ">=1.10.0"
72
+ ty = ">=0.0.41"
62
73
 
63
74
  [tool.black]
64
75
  line-length = 100
@@ -76,22 +87,3 @@ init-hook = "import sys; sys.path.insert(0, 'src')"
76
87
  disable = [
77
88
  "missing-module-docstring",
78
89
  ]
79
-
80
- [tool.mypy]
81
- python_version = "3.12"
82
- strict = true
83
- warn_unused_ignores = true
84
- disallow_untyped_defs = true
85
-
86
- [[tool.mypy.overrides]]
87
- module = [
88
- "sentence_transformers.*",
89
- "transformers.*",
90
- "numpy.*",
91
- "lancedb",
92
- ]
93
- ignore_missing_imports = true
94
-
95
- [[tool.mypy.overrides]]
96
- module = "kg_utils.embedder"
97
- disallow_untyped_calls = false
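The `pyproject.toml` hunks above add two extras, `synthesis` (httpx, openai, pillow) and `synthesis-mflux` (the same three plus mflux). Code that depends on them may want to check what is importable before use. The sketch below uses only `importlib`. The import names are assumptions: pillow is imported as `PIL`, and the other distributions are assumed to use their distribution names. `missing_modules` is a hypothetical helper, not part of kgmodule-utils.

```python
from importlib.util import find_spec

# Assumed top-level import names for the distributions in the new extras.
SYNTHESIS_MODULES = ["httpx", "openai", "PIL"]
SYNTHESIS_MFLUX_MODULES = SYNTHESIS_MODULES + ["mflux"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(SYNTHESIS_MODULES)
if missing:
    print("Missing modules:", missing)
    print("Try: pip install 'kgmodule-utils[synthesis]'")
else:
    print("The synthesis extra appears to be installed.")
```

`find_spec` locates a top-level module without importing it, so the check stays cheap even for heavy optional dependencies.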
@@ -1,8 +1,8 @@
1
1
  """kg_utils — Shared types, store, semantic index, and pipeline base for the KGModule SDK.
2
2
 
3
3
  Sub-packages / modules:
4
- kg_utils.types — NodeSpec, EdgeSpec, BuildStats, QueryResult, SnippetPack,
5
- KGExtractor (abstract), KGModule (abstract interface).
4
+ kg_utils.specs — NodeSpec, EdgeSpec, BuildStats, QueryResult, SnippetPack.
5
+ kg_utils.extractor — KGExtractor abstract base class.
6
6
  kg_utils.store — GraphStore: SQLite-backed authoritative node/edge store.
7
7
  kg_utils.semantic — Embedder, SentenceTransformerEmbedder, SemanticIndex, SeedHit.
8
8
  kg_utils.pipeline — KGModule: concrete base class with full build/query/pack pipeline.
@@ -11,10 +11,15 @@ Sub-packages / modules:
11
11
  kg_model_cache_dir(), resolve_model_path().
12
12
  kg_utils.embedder — Concrete SentenceTransformerEmbedder, get_embedder(),
13
13
  wrap_embedder(), load_sentence_transformer().
14
+ kg_utils.synthesis — Unified text + image synthesis: TextSynthesizer, ImageSynthesizer.
15
+ Backends: omlx | ollama | openai (text);
16
+ mflux-local | mflux-serve | openai (image).
14
17
 
15
18
  Optional extras
16
19
  ---------------
17
- pip install 'kgmodule-utils[semantic]' # lancedb + sentence-transformers
20
+ pip install 'kgmodule-utils[semantic]' # lancedb + sentence-transformers
21
+ pip install 'kgmodule-utils[synthesis]' # httpx + openai + pillow
22
+ pip install 'kgmodule-utils[synthesis-mflux]' # + mflux (Apple Silicon local gen)
18
23
  """
19
24
 
20
- __version__ = "0.3.0"
25
+ __version__ = "0.4.0"
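The file list shows the `kg_utils/types/` package removed in 0.4.0, and the docstring hunks above now point to `kg_utils.specs` and `kg_utils.extractor` instead. Both of those modules are listed as unchanged, so switching imports to them should work on either version. As a migration aid, the sketch below scans Python source for imports that still reference `kg_utils.types`. The helper `find_removed_imports` is hypothetical, and it only inspects absolute imports.

```python
import ast

REMOVED_PREFIX = "kg_utils.types"


def _is_removed(name: str) -> bool:
    return name == REMOVED_PREFIX or name.startswith(REMOVED_PREFIX + ".")


def find_removed_imports(source: str) -> list[int]:
    """Return line numbers of import statements that reference kg_utils.types."""
    hits: list[int] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Covers "from kg_utils.types import X" and "from kg_utils import types".
            candidates = [node.module] + [
                f"{node.module}.{alias.name}" for alias in node.names
            ]
        elif isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        else:
            continue
        if any(_is_removed(candidate) for candidate in candidates):
            hits.append(node.lineno)
    return sorted(hits)


SAMPLE = (
    "from kg_utils.types import NodeSpec\n"
    "from kg_utils.specs import EdgeSpec\n"
    "import kg_utils.types.module\n"
    "from kg_utils import types\n"
)
stale_lines = find_removed_imports(SAMPLE)
print(stale_lines)
```

Running this over each file in a project gives a list of lines to update before upgrading.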
@@ -38,7 +38,7 @@ from __future__ import annotations
38
38
  import re
39
39
  from abc import ABC, abstractmethod
40
40
  from pathlib import Path
41
- from typing import Any
41
+ from typing import Any, cast
42
42
 
43
43
  from kg_utils.semantic import (
44
44
  DEFAULT_MODEL,
@@ -715,7 +715,7 @@ class KGModule(ABC):
715
715
  file_nlines=len(lines),
716
716
  )
717
717
  if n.get("qualname") and n.get("_span"):
718
- spans_by_qualname[(mp, n["qualname"])] = n["_span"]
718
+ spans_by_qualname[(mp, n["qualname"])] = cast(tuple[int, int], n["_span"])
719
719
 
720
720
  raw_nodes.sort(key=lambda x: x["_rank_key"])
721
721
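The one-line change in `pipeline.py` wraps `n["_span"]` in `typing.cast`, presumably to satisfy the `ty` type checker that replaces mypy in this release. `cast` returns its argument unchanged and performs no runtime check, which the sketch below demonstrates. The node dictionaries and the `checked_span` helper are hypothetical illustrations, not code from kg_utils.

```python
from typing import Any, cast


def span_of(node: dict[str, Any]) -> tuple[int, int]:
    # cast() only informs the type checker; the value is returned unchanged.
    return cast(tuple[int, int], node["_span"])


def checked_span(node: dict[str, Any]) -> tuple[int, int]:
    # Hypothetical stricter variant that validates the shape at runtime.
    span = node["_span"]
    if (
        isinstance(span, tuple)
        and len(span) == 2
        and all(isinstance(value, int) for value in span)
    ):
        return span
    raise TypeError(f"expected a (start, end) tuple of ints, got {span!r}")


good = {"qualname": "pkg.mod.func", "_span": (10, 42)}
bad = {"qualname": "pkg.mod.other", "_span": "10-42"}

print(span_of(good))
print(span_of(bad))  # cast() lets the malformed value through untouched
```

If upstream data can be malformed, a runtime check like `checked_span` would catch it, whereas `cast` alone only changes what the type checker believes.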