odin-engine 0.1.0__py3-none-any.whl → 0.2.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (63)
  1. benchmarks/__init__.py +17 -17
  2. benchmarks/datasets.py +284 -284
  3. benchmarks/metrics.py +275 -275
  4. benchmarks/run_ablation.py +279 -279
  5. benchmarks/run_npll_benchmark.py +270 -270
  6. npll/__init__.py +10 -10
  7. npll/bootstrap.py +474 -474
  8. npll/core/__init__.py +33 -33
  9. npll/core/knowledge_graph.py +308 -308
  10. npll/core/logical_rules.py +496 -496
  11. npll/core/mln.py +474 -474
  12. npll/inference/__init__.py +40 -40
  13. npll/inference/e_step.py +419 -419
  14. npll/inference/elbo.py +434 -434
  15. npll/inference/m_step.py +576 -576
  16. npll/npll_model.py +631 -631
  17. npll/scoring/__init__.py +42 -42
  18. npll/scoring/embeddings.py +441 -441
  19. npll/scoring/probability.py +402 -402
  20. npll/scoring/scoring_module.py +369 -369
  21. npll/training/__init__.py +24 -24
  22. npll/training/evaluation.py +496 -496
  23. npll/training/npll_trainer.py +520 -520
  24. npll/utils/__init__.py +47 -47
  25. npll/utils/batch_utils.py +492 -492
  26. npll/utils/config.py +144 -144
  27. npll/utils/math_utils.py +338 -338
  28. odin/__init__.py +21 -20
  29. odin/engine.py +264 -264
  30. odin/schema.py +210 -0
  31. {odin_engine-0.1.0.dist-info → odin_engine-0.2.0.dist-info}/METADATA +503 -456
  32. odin_engine-0.2.0.dist-info/RECORD +63 -0
  33. {odin_engine-0.1.0.dist-info → odin_engine-0.2.0.dist-info}/licenses/LICENSE +21 -21
  34. retrieval/__init__.py +50 -50
  35. retrieval/adapters.py +140 -140
  36. retrieval/adapters_arango.py +1418 -1418
  37. retrieval/aggregators.py +707 -707
  38. retrieval/beam.py +127 -127
  39. retrieval/budget.py +60 -60
  40. retrieval/cache.py +159 -159
  41. retrieval/confidence.py +88 -88
  42. retrieval/eval.py +49 -49
  43. retrieval/linker.py +87 -87
  44. retrieval/metrics.py +105 -105
  45. retrieval/metrics_motifs.py +36 -36
  46. retrieval/orchestrator.py +571 -571
  47. retrieval/ppr/__init__.py +12 -12
  48. retrieval/ppr/anchors.py +41 -41
  49. retrieval/ppr/bippr.py +61 -61
  50. retrieval/ppr/engines.py +257 -257
  51. retrieval/ppr/global_pr.py +76 -76
  52. retrieval/ppr/indexes.py +78 -78
  53. retrieval/ppr.py +156 -156
  54. retrieval/ppr_cache.py +25 -25
  55. retrieval/scoring.py +294 -294
  56. retrieval/utils/pii_redaction.py +36 -36
  57. retrieval/writers/__init__.py +9 -9
  58. retrieval/writers/arango_writer.py +28 -28
  59. retrieval/writers/base.py +21 -21
  60. retrieval/writers/janus_writer.py +36 -36
  61. odin_engine-0.1.0.dist-info/RECORD +0 -62
  62. {odin_engine-0.1.0.dist-info → odin_engine-0.2.0.dist-info}/WHEEL +0 -0
  63. {odin_engine-0.1.0.dist-info → odin_engine-0.2.0.dist-info}/top_level.txt +0 -0
@@ -1,456 +1,503 @@
1
- Metadata-Version: 2.4
2
- Name: odin-engine
3
- Version: 0.1.0
4
- Summary: Odin: Graph intelligence engine with PPR + NPLL for AI agent navigation
5
- Author: Prescott Data
6
- License: MIT
7
- Project-URL: Homepage, https://bitbucket.org/dromos/odin-kg-engine
8
- Keywords: knowledge-graph,graph-intelligence,pagerank,neural-logic,ai-agents,graph-exploration
9
- Classifier: Development Status :: 4 - Beta
10
- Classifier: Intended Audience :: Developers
11
- Classifier: Intended Audience :: Science/Research
12
- Classifier: License :: OSI Approved :: MIT License
13
- Classifier: Programming Language :: Python :: 3
14
- Classifier: Programming Language :: Python :: 3.9
15
- Classifier: Programming Language :: Python :: 3.10
16
- Classifier: Programming Language :: Python :: 3.11
17
- Classifier: Programming Language :: Python :: 3.12
18
- Classifier: Programming Language :: Python :: 3.13
19
- Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
20
- Classifier: Topic :: Database
21
- Requires-Python: >=3.9
22
- Description-Content-Type: text/markdown
23
- License-File: LICENSE
24
- Requires-Dist: torch>=2.0.0
25
- Requires-Dist: python-arango>=7.0
26
- Requires-Dist: numpy>=1.20.0
27
- Requires-Dist: networkx>=3.0
28
- Requires-Dist: scipy>=1.10.0
29
- Requires-Dist: scikit-learn>=1.3.0
30
- Requires-Dist: gremlinpython>=3.5.0
31
- Provides-Extra: dev
32
- Requires-Dist: pytest>=7.0; extra == "dev"
33
- Requires-Dist: pytest-cov; extra == "dev"
34
- Requires-Dist: flake8; extra == "dev"
35
- Requires-Dist: bandit; extra == "dev"
36
- Dynamic: license-file
37
-
38
- # Odin KG Engine
39
-
40
- **Graph Intelligence for Autonomous AI Agents**
41
-
42
- Odin is a production-ready Python library that transforms how AI agents navigate knowledge graphs. It combines structural graph algorithms (Personalized PageRank), semantic plausibility scoring (NPLL), and pattern detection to guide agents toward high-signal discoveries in complex, multi-domain graphs.
43
-
44
- **Built for:** Healthcare analytics, fraud detection, regulatory compliance, supply chain intelligence, and any domain where autonomous agents need to discover patterns in graphs with 10K-5M entities.
45
-
46
- [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
47
- [![Tests](https://img.shields.io/badge/tests-62%20passing-green.svg)](tests/)
48
- [![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
49
- [![PyPI](https://img.shields.io/badge/pypi-odin--kg--engine-blue)](https://pypi.org/project/odin-kg-engine/)
50
-
51
- ---
52
-
53
- ## The Problem
54
-
55
- AI agents exploring knowledge graphs face three critical challenges:
56
-
57
- 1. **Exponential Path Growth** - A 3-hop exploration from a single node in a densely connected graph can generate 100K+ paths, most of which are noise
58
- 2. **Semantic Invalidity** - Naive traversal follows edges that violate domain logic (e.g., `Patient → diagnosed_by → Medication`)
59
- 3. **No Prioritization** - Without ranking, agents waste turns analyzing low-value paths while missing critical patterns
60
-
61
- **Traditional approaches fail:**
62
- - **BFS/DFS**: Exponential explosion, no signal filtering
63
- - **Fixed Cypher Queries**: Only finds patterns you already know exist
64
- - **Random Walk**: No convergence guarantees, wasted compute
65
- - **LLM Prompting Alone**: Hallucinates relationships, can't verify graph structure
66
-
67
- ## The Solution
68
-
69
- Odin provides a **guided exploration framework** that acts as a navigation compass for agents:
70
-
71
- ```
72
- ┌─────────────┐ ┌──────────────────────────────────────┐ ┌─────────────────┐
73
- │ Seed │ -> │ PPR → Beam Search → NPLL → Aggregate │ -> │ Ranked Paths + │
74
- │ Entities │ │ (Odin Engine) │ │ Patterns + Score│
75
- └─────────────┘ └──────────────────────────────────────┘ └─────────────────┘
76
- ```
77
-
78
- **Key Components:**
79
-
80
- | Component | Purpose | Impact |
81
- |-----------|---------|--------|
82
- | **Personalized PageRank (PPR)** | Identifies structurally important nodes as starting points | Reduces search space by 80% |
83
- | **Beam Search** | Efficiently explores top-K paths at each hop | Prevents exponential explosion |
84
- | **NPLL (Neural Probabilistic Logic)** | Scores edge plausibility using learned rules from your graph | Filters 60-90% of semantically invalid paths |
85
- | **Motif Detection** | Surfaces recurring patterns (e.g., "A→B→C appears 47 times") | Automatic anomaly detection |
86
- | **Triage Scoring** | 0-100 importance ranking for agent prioritization | Focuses agent compute on high-signal areas |
87
-
88
- **Results:** 10x faster exploration, 5x more relevant discoveries, agents focus on high-signal regions.
89
-
90
- ---
91
-
92
- ## Quick Start
93
-
94
- ### Installation
95
-
96
- ```bash
97
- # From PyPI (recommended)
98
- pip install odin-kg-engine
99
-
100
- # From source
101
- git clone https://github.com/prescott-data/odin-kg-engine.git
102
- cd odin-kg-engine
103
- pip install -e .
104
- ```
105
-
106
- **Requirements:**
107
- - Python 3.9+
108
- - ArangoDB 3.10+ (or compatible graph database)
109
- - PyTorch 2.0+ (for NPLL)
110
-
111
- ### 5-Minute Integration
112
-
113
- ```python
114
- from arango import ArangoClient
115
- from odin import OdinEngine
116
-
117
- # 1. Connect to your knowledge graph
118
- client = ArangoClient(hosts="http://localhost:8529")
119
- db = client.db("my_database", username="user", password="pass")
120
-
121
- # 2. Initialize Odin (auto-trains NPLL from your graph on first run)
122
- engine = OdinEngine(db=db, community_id="my_community")
123
-
124
- # 3. Explore from seed entities
125
- result = engine.retrieve(
126
- seeds=["entity/claim_123", "entity/provider_456"],
127
- max_paths=50,
128
- hop_limit=3,
129
- )
130
-
131
- # 4. Access scored paths
132
- print(f"Found {len(result['paths'])} paths")
133
- print(f"Triage Score: {result['triage']['score']}/100")
134
-
135
- for path in result['paths'][:5]:
136
- nodes = " → ".join(path['nodes'])
137
- print(f" [{path['score']:.2f}] {nodes}")
138
- ```
139
-
140
- **Output Example:**
141
- ```
142
- Found 47 paths
143
- Triage Score: 87/100
144
- [0.94] claim_123 → billed_by → provider_456 → flagged_in → audit_07
145
- [0.89] claim_123 → has_diagnosis → sepsis_dx → rare_in → nursing_home_cluster
146
- [0.82] provider_456 → prescribed → medication_999 → contraindicated_with → patient_history
147
- ```
148
-
149
- ### Self-Managing Intelligence
150
-
151
- Odin automatically manages its NPLL model lifecycle:
152
-
153
- 1. **First Run**: Extracts edge patterns from your graph, trains NPLL model (~2-5 minutes)
154
- 2. **Stores Weights**: Saves learned parameters in ArangoDB collection (`NPLLWeights`)
155
- 3. **Subsequent Runs**: Loads weights from DB and rebuilds model in ~30 seconds
156
-
157
- **No separate ML pipeline, no .pt files, no DevOps overhead.** Just initialize `OdinEngine` and it handles everything.
158
-
159
- ---
160
-
161
- ## Core Features
162
-
163
- ### 1. Intelligent Path Finding
164
-
165
- ```python
166
- # Start from suspicious entities, let Odin find connections
167
- result = engine.retrieve(
168
- seeds=["claim/CLM_99285"],
169
- max_paths=100,
170
- hop_limit=4,
171
- )
172
-
173
- # Returns scored paths with:
174
- # - Structural importance (PPR)
175
- # - Semantic plausibility (NPLL)
176
- # - Pattern frequency (motif detection)
177
- ```
178
-
179
- ### 2. Edge Plausibility Scoring
180
-
181
- ```python
182
- # Validate specific relationships
183
- score = engine.score_edge(
184
- head="entity/patient_001",
185
- relation="treated_by",
186
- tail="entity/doctor_smith"
187
- )
188
- # Returns: 0.0 (impossible) to 1.0 (highly plausible)
189
-
190
- # Use in agent decision loops
191
- if score > 0.7:
192
- agent.investigate_further(path)
193
- ```
194
-
195
- ### 3. Anchor Node Discovery
196
-
197
- ```python
198
- # Find most important nodes in your graph
199
- anchors = engine.find_anchors(
200
- seeds=["community/insurance_claims"],
201
- topn=20
202
- )
203
- # Returns top-N nodes by PageRank for targeted exploration
204
- ```
205
-
206
- ### 4. Pattern Detection
207
-
208
- ```python
209
- # Automatic motif extraction
210
- motifs = result['aggregates']['motifs']
211
- for motif in motifs:
212
- print(f"{motif['pattern']} appears {motif['count']} times")
213
-
214
- # Example output:
215
- # Claim → has_policyholder → Person (47 instances)
216
- # Provider → billed_by → Claim → flagged_in → Audit (12 instances)
217
- ```
218
-
219
- ---
220
-
221
- ## Architecture
222
-
223
- Odin is composed of four layers working in concert:
224
-
225
- ```
226
- ┌─────────────────────────────────────────────────────────────────────────────┐
227
- │ ODIN ENGINE │
228
- │ │
229
- │ ┌───────────────────────────────────────────────────────────────────────┐ │
230
- │ │ RETRIEVAL ORCHESTRATOR │ │
231
- │ │ Coordinates PPR → Beam Search → NPLL → Aggregation pipeline │ │
232
- │ └───────────────────────────────────────────────────────────────────────┘ │
233
- │ │
234
- │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
235
- │ │ Graph │ │ PPR Engine │ │ NPLL Model │ │ Aggregators │ │
236
- │ │ Accessor │ │ (Anchors) │ │ (Confidence) │ │ (Motifs) │ │
237
- │ │ + Cache │ │ │ │ │ │ │ │
238
- │ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
239
- └─────────────────────────────────────────────────────────────────────────────┘
240
- ```
241
-
242
- **Component Details:**
243
-
244
- | Layer | Responsibility | Performance |
245
- |-------|----------------|-------------|
246
- | **Graph Accessor** | LRU-cached graph queries | <10ms avg per node fetch |
247
- | **PPR Engine** | Compute importance scores | ~200ms for 1K nodes |
248
- | **Beam Search** | Multi-hop path exploration | 50-500ms depending on beam width |
249
- | **NPLL Confidence** | Semantic edge filtering | ~5ms per edge (cached) |
250
- | **Aggregators** | Pattern extraction | ~100ms post-retrieval |
251
-
252
- **Typical End-to-End Latency:** 300-800ms for 50-path retrieval
253
-
254
- ---
255
-
256
- ## Use Cases
257
-
258
- ### 1. Healthcare Fraud Detection
259
- **Scenario:** Find providers billing unusual procedure combinations
260
-
261
- ```python
262
- engine = OdinEngine(db, community_id="medicare_claims")
263
- result = engine.retrieve(
264
- seeds=["provider/high_volume_clinic"],
265
- max_paths=100,
266
- )
267
- # Odin surfaces: "CPT_99285 + CPT_office_visit" pattern in 34 claims
268
- ```
269
-
270
- ### 2. Supply Chain Risk Analysis
271
- **Scenario:** Identify cascading supplier dependencies
272
-
273
- ```python
274
- result = engine.retrieve(
275
- seeds=["supplier/critical_vendor"],
276
- hop_limit=5, # Deep supply chain exploration
277
- )
278
- # Discovers: Tier-3 supplier affects 47 downstream products
279
- ```
280
-
281
- ### 3. Regulatory Compliance Checks
282
- **Scenario:** Validate entity relationships against compliance rules
283
-
284
- ```python
285
- score = engine.score_edge(
286
- head="entity/investment_fund",
287
- relation="managed_by",
288
- tail="entity/sanctioned_entity"
289
- )
290
- if score > 0.5:
291
- compliance_agent.flag_for_review()
292
- ```
293
-
294
- ---
295
-
296
- ## Documentation
297
-
298
- | Document | Description |
299
- |----------|-------------|
300
- | [**Architecture**](docs/ARCHITECTURE.md) | Complete technical design (873 lines) |
301
- | [**Agent Integration Guide**](docs/AGENT_INTEGRATION_GUIDE.md) | How to integrate with AI agents (189 lines) |
302
- | [**API Reference**](#api-reference) | Full method signatures and parameters |
303
-
304
- ---
305
-
306
- ## API Reference
307
-
308
- ### OdinEngine
309
-
310
- ```python
311
- from odin import OdinEngine
312
-
313
- engine = OdinEngine(
314
- db: StandardDatabase, # ArangoDB connection
315
- community_id: str = "global", # Scope for exploration
316
- cache_size: int = 5000, # LRU cache size
317
- auto_train: bool = True, # Auto-train NPLL if needed
318
- community_mode: str = "none" # "none" | "mapping"
319
- )
320
- ```
321
-
322
- **Methods:**
323
-
324
- #### `retrieve(seeds, max_paths, hop_limit, **kwargs)`
325
- Find and score paths from seed entities.
326
-
327
- **Parameters:**
328
- - `seeds: List[str]` - Starting entity IDs (e.g., `["entity/123"]`)
329
- - `max_paths: int = 50` - Maximum paths to return
330
- - `hop_limit: int = 3` - Maximum hops from seeds
331
- - `beam_width: int = 10` - Top-K paths to explore at each hop
332
-
333
- **Returns:**
334
- ```python
335
- {
336
- "paths": [
337
- {
338
- "nodes": ["entity/A", "entity/B", "entity/C"],
339
- "edges": [{"relation": "relates_to", "score": 0.89}],
340
- "score": 0.94
341
- }
342
- ],
343
- "triage": {"score": 87, "confidence": "high"},
344
- "aggregates": {
345
- "motifs": [{"pattern": "A→B→C", "count": 12}],
346
- "node_frequencies": {"entity/A": 47}
347
- }
348
- }
349
- ```
350
-
351
- #### `score_edge(head, relation, tail)`
352
- Score plausibility of a single edge.
353
-
354
- **Parameters:**
355
- - `head: str` - Source entity ID
356
- - `relation: str` - Edge type
357
- - `tail: str` - Target entity ID
358
-
359
- **Returns:** `float` (0.0-1.0)
360
-
361
- #### `find_anchors(seeds, topn)`
362
- Get top-N most important nodes by PageRank.
363
-
364
- **Parameters:**
365
- - `seeds: List[str]` - Seed entities for personalized PageRank
366
- - `topn: int = 20` - Number of top nodes to return
367
-
368
- **Returns:** `List[Tuple[str, float]]` - (entity_id, ppr_score)
369
-
370
- #### `retrain_model(force_retrain=True)`
371
- Force NPLL model retraining (use after major graph updates).
372
-
373
- ---
374
-
375
- ## Performance
376
-
377
- | Metric | Value | Notes |
378
- |--------|-------|-------|
379
- | **Graph Scale** | 10K-5M entities | Tested on healthcare, insurance, supply chain |
380
- | **Typical Latency** | 300-800ms | For 50-path retrieval with 3-hop limit |
381
- | **Cache Hit Rate** | >80% | After warm-up period |
382
- | **NPLL Training** | 2-5 minutes | First run, then cached |
383
- | **NPLL Loading** | ~30 seconds | Subsequent runs |
384
- | **Memory** | ~500MB-2GB | Depends on cache size and graph density |
385
-
386
- **Tested Deployments:**
387
- - Healthcare KG: 2.3M entities, 8.7M edges (avg 450ms retrieval)
388
- - Insurance Claims: 850K entities, 4.1M edges (avg 320ms retrieval)
389
- - Supply Chain: 450K entities, 2.3M edges (avg 280ms retrieval)
390
-
391
- ---
392
-
393
- ## Testing
394
-
395
- ```bash
396
- # Run all tests (62 passing)
397
- pytest tests/ -v
398
-
399
- # Unit tests only
400
- pytest tests/unit/ -v
401
-
402
- # Integration tests
403
- pytest tests/integration/ -v
404
-
405
- # With coverage report
406
- pytest tests/ --cov=odin --cov=npll --cov=retrieval --cov-report=html
407
- ```
408
-
409
- ---
410
-
411
- ## Contributing
412
-
413
- We welcome contributions! Areas of interest:
414
-
415
- - **Scalability**: Optimizations for graphs >10M entities
416
- - **Algorithms**: Alternative PPR implementations, new aggregators
417
- - **Database Support**: Neo4j, Neptune adapters
418
- - **Benchmarks**: Academic dataset comparisons
419
-
420
- ---
421
-
422
- ## License
423
-
424
- MIT License - see [LICENSE](LICENSE) for details.
425
-
426
- ---
427
-
428
- ## Authors
429
-
430
- **Prescott Data**
431
- - Muyukani Kizito - Lead Engineer
432
- - Elizabeth Nyambere - NPLL & GNN Research
433
-
434
- ---
435
-
436
- ## Citation
437
-
438
- If you use Odin in academic work, please cite:
439
-
440
- ```bibtex
441
- @software{odin_kg_engine,
442
- title={Odin: Graph Intelligence for Autonomous AI Agents},
443
- author={Prescott Data},
444
- year={2026},
445
- url={https://github.com/prescott-data/odin-kg-engine}
446
- }
447
- ```
448
-
449
- ---
450
-
451
- ## Links
452
-
453
- - [PyPI Package](https://pypi.org/project/odin-kg-engine/)
454
- - [Documentation](docs/)
455
- - [GitHub Repository](https://github.com/prescott-data/odin-kg-engine)
456
- - [Prescott Data](https://prescottdata.io)
1
+ Metadata-Version: 2.4
2
+ Name: odin-engine
3
+ Version: 0.2.0
4
+ Summary: Odin: Graph intelligence engine with PPR + NPLL for AI agent navigation
5
+ Author: Prescott Data
6
+ License: MIT
7
+ Project-URL: Homepage, https://bitbucket.org/dromos/odin-kg-engine
8
+ Keywords: knowledge-graph,graph-intelligence,pagerank,neural-logic,ai-agents,graph-exploration
9
+ Classifier: Development Status :: 4 - Beta
10
+ Classifier: Intended Audience :: Developers
11
+ Classifier: Intended Audience :: Science/Research
12
+ Classifier: License :: OSI Approved :: MIT License
13
+ Classifier: Programming Language :: Python :: 3
14
+ Classifier: Programming Language :: Python :: 3.9
15
+ Classifier: Programming Language :: Python :: 3.10
16
+ Classifier: Programming Language :: Python :: 3.11
17
+ Classifier: Programming Language :: Python :: 3.12
18
+ Classifier: Programming Language :: Python :: 3.13
19
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
20
+ Classifier: Topic :: Database
21
+ Requires-Python: >=3.9
22
+ Description-Content-Type: text/markdown
23
+ License-File: LICENSE
24
+ Requires-Dist: torch>=2.0.0
25
+ Requires-Dist: python-arango>=7.0
26
+ Requires-Dist: numpy>=1.20.0
27
+ Requires-Dist: networkx>=3.0
28
+ Requires-Dist: scipy>=1.10.0
29
+ Requires-Dist: scikit-learn>=1.3.0
30
+ Requires-Dist: gremlinpython>=3.5.0
31
+ Provides-Extra: dev
32
+ Requires-Dist: pytest>=7.0; extra == "dev"
33
+ Requires-Dist: pytest-cov; extra == "dev"
34
+ Requires-Dist: flake8; extra == "dev"
35
+ Requires-Dist: bandit; extra == "dev"
36
+ Requires-Dist: psutil; extra == "dev"
37
+ Dynamic: license-file
38
+
39
+ # Odin KG Engine
40
+
41
+ **Graph Intelligence for Autonomous AI Agents**
42
+
43
+ Odin is a production-ready Python library that transforms how AI agents navigate knowledge graphs. It combines structural graph algorithms (Personalized PageRank), semantic plausibility scoring (NPLL), and pattern detection to guide agents toward high-signal discoveries in complex, multi-domain graphs.
44
+
45
+ **Built for:** Healthcare analytics, fraud detection, regulatory compliance, supply chain intelligence, and any domain where autonomous agents need to discover patterns in graphs with 10K-5M entities.
46
+
47
+ [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
48
+ [![Tests](https://img.shields.io/badge/tests-62%20passing-green.svg)](tests/)
49
+ [![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)
50
+ [![PyPI](https://img.shields.io/badge/pypi-odin--kg--engine-blue)](https://pypi.org/project/odin-kg-engine/)
51
+
52
+ ---
53
+
54
+ ## The Problem
55
+
56
+ AI agents exploring knowledge graphs face three critical challenges:
57
+
58
+ 1. **Exponential Path Growth** - A 3-hop exploration from a single node in a densely connected graph can generate 100K+ paths, most of which are noise
59
+ 2. **Semantic Invalidity** - Naive traversal follows edges that violate domain logic (e.g., `Patient → diagnosed_by → Medication`)
60
+ 3. **No Prioritization** - Without ranking, agents waste turns analyzing low-value paths while missing critical patterns
61
+
62
+ **Traditional approaches fail:**
63
+ - **BFS/DFS**: Exponential explosion, no signal filtering
64
+ - **Fixed Cypher Queries**: Only finds patterns you already know exist
65
+ - **Random Walk**: No convergence guarantees, wasted compute
66
+ - **LLM Prompting Alone**: Hallucinates relationships, can't verify graph structure
67
+
68
+ ## The Solution
69
+
70
+ Odin provides a **guided exploration framework** that acts as a navigation compass for agents:
71
+
72
+ ```
73
+ ┌─────────────┐ ┌──────────────────────────────────────┐ ┌─────────────────┐
74
+ │ Seed │ -> │ PPR → Beam Search → NPLL → Aggregate │ -> │ Ranked Paths + │
75
+ │ Entities │ │ (Odin Engine) │ │ Patterns + Score│
76
+ └─────────────┘ └──────────────────────────────────────┘ └─────────────────┘
77
+ ```
78
+
79
+ **Key Components:**
80
+
81
+ | Component | Purpose | Impact |
82
+ |-----------|---------|--------|
83
+ | **Personalized PageRank (PPR)** | Identifies structurally important nodes as starting points | Reduces search space by 80% |
84
+ | **Beam Search** | Efficiently explores top-K paths at each hop | Prevents exponential explosion |
85
+ | **NPLL (Neural Probabilistic Logic)** | Scores edge plausibility using learned rules from your graph | Filters 60-90% of semantically invalid paths |
86
+ | **Motif Detection** | Surfaces recurring patterns (e.g., "A→B→C appears 47 times") | Automatic anomaly detection |
87
+ | **Triage Scoring** | 0-100 importance ranking for agent prioritization | Focuses agent compute on high-signal areas |
88
+
89
+ **Results:** 10x faster exploration, 5x more relevant discoveries, agents focus on high-signal regions.
90
+
91
+ ---
92
+
93
+ ## Quick Start
94
+
95
+ ### Installation
96
+
97
+ ```bash
98
+ # From PyPI (recommended)
99
+ pip install odin-kg-engine
100
+
101
+ # From source
102
+ git clone https://github.com/prescott-data/odin-kg-engine.git
103
+ cd odin-kg-engine
104
+ pip install -e .
105
+ ```
106
+
107
+ **Requirements:**
108
+ - Python 3.9+
109
+ - ArangoDB 3.10+ (or compatible graph database)
110
+ - PyTorch 2.0+ (for NPLL)
111
+
112
+ ### 5-Minute Integration
113
+
114
+ ```python
115
+ from arango import ArangoClient
116
+ from odin import OdinEngine
117
+
118
+ # 1. Connect to your knowledge graph
119
+ client = ArangoClient(hosts="http://localhost:8529")
120
+ db = client.db("my_database", username="user", password="pass")
121
+
122
+ # 2. Initialize Odin (auto-trains NPLL from your graph on first run)
123
+ engine = OdinEngine(db=db, community_id="my_community")
124
+
125
+ # 3. Explore from seed entities
126
+ result = engine.retrieve(
127
+ seeds=["entity/claim_123", "entity/provider_456"],
128
+ max_paths=50,
129
+ hop_limit=3,
130
+ )
131
+
132
+ # 4. Access scored paths
133
+ print(f"Found {len(result['paths'])} paths")
134
+ print(f"Triage Score: {result['triage']['score']}/100")
135
+
136
+ for path in result['paths'][:5]:
137
+ nodes = " → ".join(path['nodes'])
138
+ print(f" [{path['score']:.2f}] {nodes}")
139
+
140
+ # 5. NEW in v0.2.0: Inspect your database schema
141
+ from odin import SchemaInspector
142
+ inspector = SchemaInspector(db) # Uses same db connection from step 1
143
+ schema = inspector.get_schema_map()
144
+ print(f"Database: {schema['database_name']}")
145
+ print(f"Collections: {len(schema['collections'])}")
146
+ ```
147
+
148
+ **Output Example:**
149
+ ```
150
+ Found 47 paths
151
+ Triage Score: 87/100
152
+ [0.94] claim_123 → billed_by → provider_456 → flagged_in → audit_07
153
+ [0.89] claim_123 → has_diagnosis → sepsis_dx → rare_in → nursing_home_cluster
154
+ [0.82] provider_456 → prescribed → medication_999 → contraindicated_with → patient_history
155
+ ```
156
+
157
+ ### Self-Managing Intelligence
158
+
159
+ Odin automatically manages its NPLL model lifecycle:
160
+
161
+ 1. **First Run**: Extracts edge patterns from your graph, trains NPLL model (~2-5 minutes)
162
+ 2. **Stores Weights**: Saves learned parameters in ArangoDB collection (`NPLLWeights`)
163
+ 3. **Subsequent Runs**: Loads weights from DB and rebuilds model in ~30 seconds
164
+
165
+ **No separate ML pipeline, no .pt files, no DevOps overhead.** Just initialize `OdinEngine` and it handles everything.
166
+
167
+ ---
168
+
169
+ ## Core Features
170
+
171
+ ### 1. Intelligent Path Finding
172
+
173
+ ```python
174
+ # Start from suspicious entities, let Odin find connections
175
+ result = engine.retrieve(
176
+ seeds=["claim/CLM_99285"],
177
+ max_paths=100,
178
+ hop_limit=4,
179
+ )
180
+
181
+ # Returns scored paths with:
182
+ # - Structural importance (PPR)
183
+ # - Semantic plausibility (NPLL)
184
+ # - Pattern frequency (motif detection)
185
+ ```
186
+
187
+ ### 2. Edge Plausibility Scoring
188
+
189
+ ```python
190
+ # Validate specific relationships
191
+ score = engine.score_edge(
192
+ head="entity/patient_001",
193
+ relation="treated_by",
194
+ tail="entity/doctor_smith"
195
+ )
196
+ # Returns: 0.0 (impossible) to 1.0 (highly plausible)
197
+
198
+ # Use in agent decision loops
199
+ if score > 0.7:
200
+ agent.investigate_further(path)
201
+ ```
202
+
203
+ ### 3. Anchor Node Discovery
204
+
205
+ ```python
206
+ # Find most important nodes in your graph
207
+ anchors = engine.find_anchors(
208
+ seeds=["community/insurance_claims"],
209
+ topn=20
210
+ )
211
+ # Returns top-N nodes by PageRank for targeted exploration
212
+ ```
213
+
214
+ ### 4. Pattern Detection
215
+
216
+ ```python
217
+ # Automatic motif extraction
218
+ motifs = result['aggregates']['motifs']
219
+ for motif in motifs:
220
+ print(f"{motif['pattern']} appears {motif['count']} times")
221
+
222
+ # Example output:
223
+ # Claim → has_policyholder → Person (47 instances)
224
+ # Provider → billed_by → Claim → flagged_in → Audit (12 instances)
225
+ ```
226
+
227
+ ### 5. Schema Introspection (v0.2.0+)
228
+
229
+ ```python
230
+ from arango import ArangoClient
231
+ from odin import SchemaInspector
232
+
233
+ # Connect to ArangoDB (same as step 1 above)
234
+ client = ArangoClient(hosts="http://localhost:8529")
235
+ db = client.db("my_database", username="user", password="pass")
236
+
237
+ # Inspect your ArangoDB schema at runtime
238
+ inspector = SchemaInspector(db)
239
+ schema = inspector.get_schema_map()
240
+
241
+ # Get all collections and their fields
242
+ for collection in schema['collections']:
243
+ print(f"{collection['name']}: {collection['count']} docs")
244
+ print(f" Fields: {', '.join(collection['fields'][:5])}")
245
+
246
+ # Get specific collection info
247
+ entities_info = inspector.get_collection_info('ExtractedEntities')
248
+ print(f"Entity fields: {entities_info['fields']}")
249
+
250
+ # Get edge collection relationships
251
+ edge_info = inspector.get_edge_info('ExtractedRelationships')
252
+ print(f"From: {edge_info['from_collections']}")
253
+ print(f"To: {edge_info['to_collections']}")
254
+
255
+ # Save schema to file for documentation or agent context
256
+ from odin import inspect_arango_schema
257
+ inspect_arango_schema(db, output_file='schema.json')
258
+ ```
259
+
260
+ **Use Cases:**
261
+ - **AI Agents**: Provide schema context to agents for writing valid AQL queries
262
+ - **Documentation**: Auto-generate database schema documentation
263
+ - **Validation**: Verify collection structures across environments
264
+ - **Schema Evolution**: Track changes to your graph structure over time
265
+
266
+ ---
267
+
268
+ ## Architecture
269
+
270
+ Odin is composed of four layers working in concert:
271
+
272
+ ```
273
+ ┌─────────────────────────────────────────────────────────────────────────────┐
274
+ │ ODIN ENGINE │
275
+ │ │
276
+ │ ┌───────────────────────────────────────────────────────────────────────┐ │
277
+ │ │ RETRIEVAL ORCHESTRATOR │ │
278
+ │ │ Coordinates PPR → Beam Search → NPLL → Aggregation pipeline │ │
279
+ │ └───────────────────────────────────────────────────────────────────────┘ │
280
+ │ │
281
+ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
282
+ │ │ Graph │ │ PPR Engine │ │ NPLL Model │ │ Aggregators │ │
283
+ │ │ Accessor │ │ (Anchors) │ │ (Confidence) │ │ (Motifs) │ │
284
+ │ │ + Cache │ │ │ │ │ │ │ │
285
+ │ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
286
+ └─────────────────────────────────────────────────────────────────────────────┘
287
+ ```
288
+
289
+ **Component Details:**
290
+
291
+ | Layer | Responsibility | Performance |
292
+ |-------|----------------|-------------|
293
+ | **Graph Accessor** | LRU-cached graph queries | <10ms avg per node fetch |
294
+ | **PPR Engine** | Compute importance scores | ~200ms for 1K nodes |
295
+ | **Beam Search** | Multi-hop path exploration | 50-500ms depending on beam width |
296
+ | **NPLL Confidence** | Semantic edge filtering | ~5ms per edge (cached) |
297
+ | **Aggregators** | Pattern extraction | ~100ms post-retrieval |
298
+
299
+ **Typical End-to-End Latency:** 300-800ms for 50-path retrieval
300
+
301
+ ---
302
+
303
+ ## Use Cases
304
+
305
+ ### 1. Healthcare Fraud Detection
306
+ **Scenario:** Find providers billing unusual procedure combinations
307
+
308
+ ```python
309
+ engine = OdinEngine(db, community_id="medicare_claims")
310
+ result = engine.retrieve(
311
+ seeds=["provider/high_volume_clinic"],
312
+ max_paths=100,
313
+ )
314
+ # Odin surfaces: "CPT_99285 + CPT_office_visit" pattern in 34 claims
315
+ ```
316
+
317
+ ### 2. Supply Chain Risk Analysis
318
+ **Scenario:** Identify cascading supplier dependencies
319
+
320
+ ```python
321
+ result = engine.retrieve(
322
+ seeds=["supplier/critical_vendor"],
323
+ hop_limit=5, # Deep supply chain exploration
324
+ )
325
+ # Discovers: Tier-3 supplier affects 47 downstream products
326
+ ```
327
+
328
+ ### 3. Regulatory Compliance Checks
329
+ **Scenario:** Validate entity relationships against compliance rules
330
+
331
+ ```python
332
+ score = engine.score_edge(
333
+ head="entity/investment_fund",
334
+ relation="managed_by",
335
+ tail="entity/sanctioned_entity"
336
+ )
337
+ if score > 0.5:
338
+     compliance_agent.flag_for_review()
339
+ ```
340
+
341
+ ---
342
+
343
+ ## Documentation
344
+
345
+ | Document | Description |
346
+ |----------|-------------|
347
+ | [**Architecture**](docs/ARCHITECTURE.md) | Complete technical design (873 lines) |
348
+ | [**Agent Integration Guide**](docs/AGENT_INTEGRATION_GUIDE.md) | How to integrate with AI agents (189 lines) |
349
+ | [**API Reference**](#api-reference) | Full method signatures and parameters |
350
+
351
+ ---
352
+
353
+ ## API Reference
354
+
355
+ ### OdinEngine
356
+
357
+ ```python
358
+ from odin import OdinEngine
359
+
360
+ engine = OdinEngine(
361
+     db=db,                    # StandardDatabase: an existing ArangoDB connection
362
+     community_id="global",    # str, default shown: scope for exploration
363
+     cache_size=5000,          # int, default shown: LRU cache size
364
+     auto_train=True,          # bool, default shown: auto-train NPLL if needed
365
+     community_mode="none",    # str, default shown: "none" | "mapping"
366
+ )
367
+ ```
368
+
369
+ **Methods:**
370
+
371
+ #### `retrieve(seeds, max_paths, hop_limit, **kwargs)`
372
+ Find and score paths from seed entities.
373
+
374
+ **Parameters:**
375
+ - `seeds: List[str]` - Starting entity IDs (e.g., `["entity/123"]`)
376
+ - `max_paths: int = 50` - Maximum paths to return
377
+ - `hop_limit: int = 3` - Maximum hops from seeds
378
+ - `beam_width: int = 10` - Top-K paths to explore at each hop
379
+
380
+ **Returns:**
381
+ ```python
382
+ {
383
+ "paths": [
384
+ {
385
+ "nodes": ["entity/A", "entity/B", "entity/C"],
386
+ "edges": [{"relation": "relates_to", "score": 0.89}],
387
+ "score": 0.94
388
+ }
389
+ ],
390
+ "triage": {"score": 87, "confidence": "high"},
391
+ "aggregates": {
392
+ "motifs": [{"pattern": "A→B→C", "count": 12}],
393
+ "node_frequencies": {"entity/A": 47}
394
+ }
395
+ }
396
+ ```
397
+
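As a rough illustration of consuming this return value, the sketch below filters and formats paths from a dictionary shaped like the example above. The `summarize_paths` helper and its `min_score` threshold are hypothetical and not part of the Odin API. The sample data copies the documented structure.

```python
from typing import Any, Dict, List


def summarize_paths(result: Dict[str, Any], min_score: float = 0.5) -> List[str]:
    """Return one readable line per path scoring at least min_score, best first."""
    ranked = sorted(result.get("paths", []), key=lambda p: p["score"], reverse=True)
    lines = []
    for path in ranked:
        if path["score"] < min_score:
            continue
        chain = " -> ".join(path["nodes"])
        lines.append(f"{chain} (score={path['score']:.2f})")
    return lines


# Sample shaped like the documented return value (values are illustrative).
sample_result = {
    "paths": [
        {
            "nodes": ["entity/A", "entity/B", "entity/C"],
            "edges": [{"relation": "relates_to", "score": 0.89}],
            "score": 0.94,
        }
    ],
    "triage": {"score": 87, "confidence": "high"},
    "aggregates": {
        "motifs": [{"pattern": "A→B→C", "count": 12}],
        "node_frequencies": {"entity/A": 47},
    },
}

summary = summarize_paths(sample_result)
```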
398
+ #### `score_edge(head, relation, tail)`
399
+ Score plausibility of a single edge.
400
+
401
+ **Parameters:**
402
+ - `head: str` - Source entity ID
403
+ - `relation: str` - Edge type
404
+ - `tail: str` - Target entity ID
405
+
406
+ **Returns:** `float` between 0.0 and 1.0; higher values indicate a more plausible edge
407
+
408
+ #### `find_anchors(seeds, topn)`
409
+ Return the top-N most important nodes, ranked by personalized PageRank (PPR) relative to the seeds.
410
+
411
+ **Parameters:**
412
+ - `seeds: List[str]` - Seed entities for personalized PageRank
413
+ - `topn: int = 20` - Number of top nodes to return
414
+
415
+ **Returns:** `List[Tuple[str, float]]` - (entity_id, ppr_score)
416
+
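For readers unfamiliar with personalized PageRank, the following is a minimal power-iteration sketch on a toy directed graph. It shows the general technique only and is not Odin's PPR engine. The function names, the restart probability `alpha`, and the fixed iteration count are assumptions.

```python
from typing import Dict, List, Tuple


def personalized_pagerank(
    graph: Dict[str, List[str]],
    seeds: List[str],
    alpha: float = 0.15,  # assumed restart probability
    iters: int = 50,      # assumed fixed iteration budget
) -> Dict[str, float]:
    """Power iteration in which restarts jump back to the seed set only."""
    seed_set = set(seeds)
    nodes = set(graph) | {m for nbrs in graph.values() for m in nbrs} | seed_set
    restart = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: alpha * restart[n] for n in nodes}
        for n in nodes:
            out = graph.get(n, [])
            if out:
                share = (1.0 - alpha) * rank[n] / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: send its mass back to the seeds.
                for s in seed_set:
                    nxt[s] += (1.0 - alpha) * rank[n] / len(seed_set)
        rank = nxt
    return rank


def top_anchors(rank: Dict[str, float], topn: int = 20) -> List[Tuple[str, float]]:
    """Return (entity_id, ppr_score) pairs, highest score first."""
    return sorted(rank.items(), key=lambda kv: kv[1], reverse=True)[:topn]


toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = personalized_pagerank(toy_graph, seeds=["A"])
anchors = top_anchors(ranks, topn=2)
```

The output shape mirrors the documented `List[Tuple[str, float]]` return type. Scores sum to 1 because every step redistributes the full probability mass.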
417
+ #### `retrain_model(force_retrain=True)`
418
+ Force NPLL model retraining (use after major graph updates).
419
+
420
+ ---
421
+
422
+ ## Performance
423
+
424
+ | Metric | Value | Notes |
425
+ |--------|-------|-------|
426
+ | **Graph Scale** | 10K-5M entities | Tested on healthcare, insurance, supply chain |
427
+ | **Typical Latency** | 300-800ms | For 50-path retrieval with 3-hop limit |
428
+ | **Cache Hit Rate** | >80% | After warm-up period |
429
+ | **NPLL Training** | 2-5 minutes | First run, then cached |
430
+ | **NPLL Loading** | ~30 seconds | Subsequent runs |
431
+ | **Memory** | ~500MB-2GB | Depends on cache size and graph density |
432
+
433
+ **Tested Deployments:**
434
+ - Healthcare KG: 2.3M entities, 8.7M edges (avg 450ms retrieval)
435
+ - Insurance Claims: 850K entities, 4.1M edges (avg 320ms retrieval)
436
+ - Supply Chain: 450K entities, 2.3M edges (avg 280ms retrieval)
437
+
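The cache hit rate in the table above can be measured in a few lines. The sketch below uses Python's standard `functools.lru_cache` around a stand-in fetch function. The `fetch_neighbors` name and the in-memory `FAKE_DB` are hypothetical, and Odin's own graph accessor cache may be implemented differently.

```python
from functools import lru_cache

# Stand-in for the graph database; real lookups would hit ArangoDB.
FAKE_DB = {
    "entity/A": ["entity/B"],
    "entity/B": ["entity/C"],
    "entity/C": [],
}


@lru_cache(maxsize=5000)  # mirrors the documented default cache_size
def fetch_neighbors(node_id: str) -> tuple:
    """Return neighbors as a tuple so the cached value is immutable."""
    return tuple(FAKE_DB.get(node_id, ()))


# Simulated access pattern with repeated lookups.
for node in ["entity/A", "entity/B", "entity/A", "entity/A", "entity/C", "entity/B"]:
    fetch_neighbors(node)

info = fetch_neighbors.cache_info()
hit_rate = info.hits / (info.hits + info.misses)
```

In this toy run three of six lookups are served from the cache. Real workloads that revisit hub nodes across hops would be expected to reach higher rates after warm-up.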
438
+ ---
439
+
440
+ ## Testing
441
+
442
+ ```bash
443
+ # Run all tests (62 passing)
444
+ pytest tests/ -v
445
+
446
+ # Unit tests only
447
+ pytest tests/unit/ -v
448
+
449
+ # Integration tests
450
+ pytest tests/integration/ -v
451
+
452
+ # With coverage report
453
+ pytest tests/ --cov=odin --cov=npll --cov=retrieval --cov-report=html
454
+ ```
455
+
456
+ ---
457
+
458
+ ## Contributing
459
+
460
+ We welcome contributions! Areas of interest:
461
+
462
+ - **Scalability**: Optimizations for graphs >10M entities
463
+ - **Algorithms**: Alternative PPR implementations, new aggregators
464
+ - **Database Support**: Neo4j, Neptune adapters
465
+ - **Benchmarks**: Academic dataset comparisons
466
+
467
+ ---
468
+
469
+ ## License
470
+
471
+ MIT License - see [LICENSE](LICENSE) for details.
472
+
473
+ ---
474
+
475
+ ## Authors
476
+
477
+ **Prescott Data**
478
+ - Muyukani Kizito - Lead Engineer
479
+ - Elizabeth Nyambere - NPLL & GNN Research
480
+
481
+ ---
482
+
483
+ ## Citation
484
+
485
+ If you use Odin in academic work, please cite:
486
+
487
+ ```bibtex
488
+ @software{odin_kg_engine,
489
+ title={Odin: Graph Intelligence for Autonomous AI Agents},
490
+ author={Prescott Data},
491
+ year={2026},
492
+ url={https://github.com/prescott-data/odin-kg-engine}
493
+ }
494
+ ```
495
+
496
+ ---
497
+
498
+ ## Links
499
+
500
+ - [PyPI Package](https://pypi.org/project/odin-kg-engine/)
501
+ - [Documentation](docs/)
502
+ - [GitHub Repository](https://github.com/prescott-data/odin-kg-engine)
503
+ - [Prescott Data](https://prescottdata.io)