rust-kgdb 0.6.81 → 0.6.83
- package/README.md +357 -32
- package/examples/rpc-catalog-dprod-demo.js +339 -0
- package/examples/rpc-federation-sql-demo.js +273 -0
- package/examples/rpc-virtual-tables-demo.js +268 -0
- package/hypermind-agent.js +626 -0
- package/index.d.ts +304 -0
- package/index.js +9 -0
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -4,7 +4,36 @@
|
|
|
4
4
|
[](https://opensource.org/licenses/Apache-2.0)
|
|
5
5
|
[](https://www.w3.org/TR/sparql11-query/)
|
|
6
6
|
|
|
7
|
-
> **
|
|
7
|
+
> **Your knowledge is scattered. Your claims live in Snowflake. Your customer graph sits in Neo4j. Your risk models run on BigQuery. Your compliance docs are in SharePoint. And your AI? It hallucinates because it can't see the full picture.**
|
|
8
|
+
>
|
|
9
|
+
> rust-kgdb unifies scattered enterprise knowledge into a single queryable graph—with native embeddings, cross-database federation, and AI that generates queries instead of fabricating answers. No hallucinations. Full audit trails. One query across everything.
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## What's New in v0.7.0
|
|
14
|
+
|
|
15
|
+
| Feature | Description | Performance |
|
|
16
|
+
|---------|-------------|-------------|
|
|
17
|
+
| **HyperFederate** | Cross-database SQL: KGDB + Snowflake + BigQuery | Single query, 890ms 3-way federation |
|
|
18
|
+
| **RpcFederationProxy** | WASM RPC proxy for federated queries | 7 UDFs + 9 Table Functions |
|
|
19
|
+
| **Virtual Tables** | Session-bound query materialization | No ETL, real-time results |
|
|
20
|
+
| **DCAT DPROD Catalog** | W3C-aligned data product registry | Self-describing RDF storage |
|
|
21
|
+
| **Federation ProofDAG** | Full provenance for federated results | SHA-256 audit trail |
|
|
22
|
+
|
|
23
|
+
```javascript
|
|
24
|
+
const { GraphDB, RpcFederationProxy, FEDERATION_TOOLS } = require('rust-kgdb')
|
|
25
|
+
|
|
26
|
+
// Query across KGDB + Snowflake + BigQuery in single SQL
|
|
27
|
+
const federation = new RpcFederationProxy({ endpoint: 'http://localhost:30180' })
|
|
28
|
+
const result = await federation.query(`
|
|
29
|
+
SELECT kg.*, sf.C_NAME, bq.name_popularity
|
|
30
|
+
FROM graph_search('SELECT ?person WHERE { ?person a :Customer }') kg
|
|
31
|
+
JOIN snowflake.CUSTOMER sf ON kg.custKey = sf.C_CUSTKEY
|
|
32
|
+
LEFT JOIN bigquery.usa_names bq ON sf.C_NAME = bq.name
|
|
33
|
+
`)
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
*See [HyperFederate: Cross-Database Federation](#hyperfederate-cross-database-federation) for complete documentation.*
|
|
8
37
|
|
|
9
38
|
---
|
|
10
39
|
|
|
@@ -27,21 +56,35 @@ const { GraphDB, Rdf2VecEngine, EmbeddingService } = require('rust-kgdb')
|
|
|
27
56
|
|
|
28
57
|
## The Problem With AI Today
|
|
29
58
|
|
|
30
|
-
|
|
59
|
+
**Here's what actually happens in every enterprise AI project:**
|
|
31
60
|
|
|
32
|
-
|
|
61
|
+
Your fraud analyst asks a simple question: *"Show me high-risk customers with large account balances who've had claims in the past 6 months."*
|
|
33
62
|
|
|
34
|
-
|
|
63
|
+
Sounds simple. It's not.
|
|
35
64
|
|
|
36
|
-
The
|
|
65
|
+
The **customer data** lives in Snowflake. The **risk scores** are computed in your knowledge graph. The **claims history** sits in BigQuery. The **policy details** are in a legacy Oracle database. And **nobody can write a query that spans all four**.
|
|
37
66
|
|
|
38
|
-
|
|
67
|
+
So the analyst does what everyone does:
|
|
68
|
+
1. Export customers from Snowflake to CSV
|
|
69
|
+
2. Run a separate risk query in the graph database
|
|
70
|
+
3. Pull claims from BigQuery into another spreadsheet
|
|
71
|
+
4. Spend 3 hours in Excel doing VLOOKUP joins
|
|
72
|
+
5. Present "findings" that are already 6 hours stale
|
|
39
73
|
|
|
40
|
-
|
|
41
|
-
|
|
74
|
+
**This is the reality of enterprise data in 2025.** Knowledge is scattered across dozens of systems. Every "simple" question requires a data engineering project. And when you finally get your answer, you can't trace how it was derived.
|
|
75
|
+
|
|
76
|
+
Now add AI to this mess.
|
|
77
|
+
|
|
78
|
+
Your analyst asks ChatGPT the same question. It responds confidently: *"Customer #4521 is high-risk with $847,000 in account balance and 3 recent claims."*
|
|
79
|
+
|
|
80
|
+
The analyst opens an investigation. Two weeks later, legal discovers Customer #4521 doesn't exist. **The AI made up everything—the customer ID, the balance, the claims.** The AI had no access to your data. It just generated plausible-sounding text.
|
|
81
|
+
|
|
82
|
+
This keeps happening:
|
|
83
|
+
- A lawyer cites "Smith v. Johnson (2019)" in court. **That case doesn't exist.**
|
|
84
|
+
- A doctor avoids prescribing "Nexapril" for cardiac patients. **Nexapril isn't a real drug.**
|
|
42
85
|
- A fraud analyst flags Account #7842 for money laundering. **It belongs to a children's charity.**
|
|
43
86
|
|
|
44
|
-
Every time, the same pattern:
|
|
87
|
+
Every time, the same pattern: Data is scattered. AI can't see it. AI fabricates. People get hurt.
|
|
45
88
|
|
|
46
89
|
---
|
|
47
90
|
|
|
@@ -64,29 +107,46 @@ A real solution requires a different architecture. One built on solid engineerin
|
|
|
64
107
|
|
|
65
108
|
## The Solution: Query Generation, Not Answer Generation
|
|
66
109
|
|
|
67
|
-
What if
|
|
110
|
+
What if we're thinking about AI wrong?
|
|
111
|
+
|
|
112
|
+
Every enterprise wants the same thing: ask a question in plain English, get an accurate answer from their data. But we've been trying to make the AI *know* the answer. That's backwards.
|
|
68
113
|
|
|
69
|
-
|
|
70
|
-
- Your database knows the facts (claims, providers, transactions)
|
|
71
|
-
- AI understands language (can parse "find suspicious patterns")
|
|
72
|
-
- You need both working together
|
|
114
|
+
**The AI doesn't need to know anything. It just needs to know how to ask.**
|
|
73
115
|
|
|
74
|
-
|
|
116
|
+
Think about what's actually happening when a fraud analyst asks: *"Show me high-risk customers with large balances."*
|
|
117
|
+
|
|
118
|
+
The analyst already has everything needed to answer this question:
|
|
119
|
+
- Customer data in Snowflake
|
|
120
|
+
- Risk scores in the knowledge graph
|
|
121
|
+
- Account balances in the core banking system
|
|
122
|
+
- Complete audit logs of every transaction
|
|
123
|
+
|
|
124
|
+
The problem isn't missing data. It's that **no human can write a query that spans all these systems**. SQL doesn't work on graphs. SPARQL doesn't work on Snowflake. And nobody has 4 hours to manually join CSVs.
|
|
125
|
+
|
|
126
|
+
**The breakthrough**: What if AI generated the query instead of the answer?
|
|
75
127
|
|
|
76
128
|
```
|
|
77
|
-
|
|
78
|
-
Human: "
|
|
79
|
-
AI: "
|
|
129
|
+
The Old Way (Dangerous):
|
|
130
|
+
Human: "Show me high-risk customers with large balances"
|
|
131
|
+
AI: "Customer #4521 has $847K and high risk score" <-- FABRICATED
|
|
80
132
|
|
|
81
|
-
|
|
82
|
-
Human: "
|
|
83
|
-
AI: Generates
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
133
|
+
The New Way (Verifiable):
|
|
134
|
+
Human: "Show me high-risk customers with large balances"
|
|
135
|
+
AI: Understands intent → Generates federated SQL:
|
|
136
|
+
|
|
137
|
+
SELECT kg.customer, kg.risk_score, sf.balance
|
|
138
|
+
FROM graph_search('...risk assessment...') kg
|
|
139
|
+
JOIN snowflake.ACCOUNTS sf ON kg.customer_id = sf.id
|
|
140
|
+
WHERE kg.risk_score > 0.8 AND sf.balance > 100000
|
|
141
|
+
|
|
142
|
+
Database: Executes across KGDB + Snowflake + BigQuery
|
|
143
|
+
Result: Real customers. Real balances. Real risk scores.
|
|
144
|
+
With SHA-256 proof hash for audit trail. <-- VERIFIABLE
|
|
87
145
|
```
|
|
88
146
|
|
|
89
|
-
|
|
147
|
+
The AI never touches your data. It translates human language into precise queries. The database executes against real systems. Every answer traces back to actual records.
|
|
148
|
+
|
|
149
|
+
**rust-kgdb is not an AI that knows answers. It's an AI that knows how to ask the right questions—across every system where your knowledge lives.**
|
|
90
150
|
|
|
91
151
|
---
|
|
92
152
|
|
|
@@ -122,18 +182,29 @@ The math matters. When your fraud detection runs 35x faster, you catch fraud bef
|
|
|
122
182
|
|
|
123
183
|
## Why rust-kgdb and HyperMind?
|
|
124
184
|
|
|
125
|
-
|
|
185
|
+
**The question isn't "Can AI answer my question?" It's "Can I trust the answer?"**
|
|
126
186
|
|
|
127
|
-
|
|
187
|
+
Every AI framework makes the same mistake: they treat the LLM as the source of truth. LangChain. LlamaIndex. AutoGPT. They all assume the model knows things. It doesn't. It generates plausible text. There's a difference.
|
|
128
188
|
|
|
129
|
-
|
|
130
|
-
|
|
131
|
-
|
|
189
|
+
We built rust-kgdb on a contrarian principle: **Never trust the AI. Verify everything.**
|
|
190
|
+
|
|
191
|
+
The LLM proposes a query. The type system validates it against your actual schema. The sandbox executes it in isolation. The database returns only facts that exist. The proof DAG creates a cryptographic audit trail.
|
|
192
|
+
|
|
193
|
+
At no point does the AI "know" anything. It's a translator—from human intent to precise queries—with four layers of verification before anything touches your data.
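
The propose, validate, execute flow described above can be sketched as a small gate in front of the database. This is a minimal illustration under stated assumptions, not the rust-kgdb implementation: `schema`, `validateProposal`, `runVerified`, and the shape of `proposal` are all hypothetical names invented for this example, and `execute` stands in for whatever actually runs the query.

```javascript
// Hypothetical sketch: these names are illustrative, not the rust-kgdb API.

// Assumed schema shape: alias -> columns known to exist in that source.
const schema = {
  kg: ['customer', 'risk_score'],
  sf: ['id', 'balance']
}

// Validation gate: list every "alias.column" reference the schema lacks.
function validateProposal(proposal, knownSchema) {
  const unknown = proposal.columns.filter((ref) => {
    const [alias, column] = ref.split('.')
    const columns = knownSchema[alias]
    return !(Array.isArray(columns) && columns.includes(column))
  })
  return { ok: unknown.length === 0, unknown }
}

// Execute only proposals that pass validation; `execute` stands in for the database.
function runVerified(proposal, knownSchema, execute) {
  const check = validateProposal(proposal, knownSchema)
  if (!check.ok) {
    throw new Error(`Unknown columns: ${check.unknown.join(', ')}`)
  }
  return execute(proposal.sql)
}

// A proposal as an LLM might emit it: the SQL plus the columns it references.
const proposal = {
  sql: 'SELECT kg.customer, sf.balance FROM kg JOIN sf ON kg.customer = sf.id',
  columns: ['kg.customer', 'sf.balance']
}

// The stub executor returns one fixed row so the sketch runs without a server.
const rows = runVerified(proposal, schema, () => [{ customer: 'c-1', balance: 120000 }])
console.log(`Validated and executed, ${rows.length} row(s) returned`)
```

The point of the design is that a proposal referencing a column the schema does not contain fails before any query is sent, so a fabricated field name surfaces as an error rather than as a plausible-looking answer.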
|
|
194
|
+
|
|
195
|
+
**This is the difference between an AI that sounds right and an AI that is right.**
|
|
196
|
+
|
|
197
|
+
### The Engineering Foundation
|
|
198
|
+
|
|
199
|
+
| Layer | Component | What It Does |
|
|
200
|
+
|-------|-----------|--------------|
|
|
201
|
+
| **Database** | GraphDB | W3C SPARQL 1.1 compliant RDF store, 449ns lookups, 35x faster than RDFox |
|
|
132
202
|
| **Database** | Distributed SPARQL | HDRF partitioning across Kubernetes executors |
|
|
133
|
-
| **
|
|
203
|
+
| **Federation** | HyperFederate | Cross-database SQL: KGDB + Snowflake + BigQuery in single query |
|
|
204
|
+
| **Embeddings** | Rdf2VecEngine | Train 384-dim vectors from graph random walks, 68µs lookup |
|
|
134
205
|
| **Embeddings** | EmbeddingService | Multi-provider composite vectors with RRF fusion |
|
|
135
206
|
| **Embeddings** | HNSW Index | Approximate nearest neighbor search in 303µs |
|
|
136
|
-
| **Analytics** | GraphFrames | PageRank, connected components, motif matching |
|
|
207
|
+
| **Analytics** | GraphFrames | PageRank, connected components, triangle count, motif matching |
|
|
137
208
|
| **Analytics** | Pregel API | Bulk synchronous parallel graph algorithms |
|
|
138
209
|
| **Reasoning** | Datalog Engine | Recursive rule evaluation with fixpoint semantics |
|
|
139
210
|
| **AI Agent** | HyperMindAgent | Schema-aware SPARQL generation from natural language |
|
|
@@ -1579,11 +1650,265 @@ const agent = new AgentBuilder('scoped-agent')
|
|
|
1579
1650
|
| **Joins** | WCOJ | Worst-case optimal join algorithm |
|
|
1580
1651
|
| **Distribution** | HDRF | Streaming graph partitioning |
|
|
1581
1652
|
| **Distribution** | Raft | Consensus for coordination |
|
|
1653
|
+
| **Federation** | HyperFederate | Cross-database SQL: KGDB + Snowflake + BigQuery |
|
|
1654
|
+
| **Federation** | Virtual Tables | Session-bound query materialization |
|
|
1655
|
+
| **Federation** | DCAT Catalog | W3C DPROD data product registry |
|
|
1582
1656
|
| **Mobile** | iOS/Android | Swift and Kotlin bindings via UniFFI |
|
|
1583
1657
|
| **Storage** | InMemory/RocksDB/LMDB | Three backend options |
|
|
1584
1658
|
|
|
1585
1659
|
---
|
|
1586
1660
|
|
|
1661
|
+
## HyperFederate: Cross-Database Federation
|
|
1662
|
+
|
|
1663
|
+
### The Real Problem: Your Knowledge Lives Everywhere
|
|
1664
|
+
|
|
1665
|
+
Here's what actually happens in enterprise AI projects:
|
|
1666
|
+
|
|
1667
|
+
A fraud analyst asks: *"Show me high-risk customers with large account balances and unusual name patterns."*
|
|
1668
|
+
|
|
1669
|
+
To answer this, they need:
|
|
1670
|
+
- **Risk scores** from the Knowledge Graph (semantic relationships, fraud patterns)
|
|
1671
|
+
- **Account balances** from Snowflake (transaction history, customer master)
|
|
1672
|
+
- **Name demographics** from BigQuery (population statistics, anomaly detection)
|
|
1673
|
+
|
|
1674
|
+
Today's reality? Three separate queries. Manual data exports. Excel joins. Python scripts. Data engineers on standby. Days of work for a single question.
|
|
1675
|
+
|
|
1676
|
+
**This is insane.**
|
|
1677
|
+
|
|
1678
|
+
Your knowledge isn't siloed because you want it to be. It's siloed because no tool could query across systems... until now.
|
|
1679
|
+
|
|
1680
|
+
### One Query. Three Sources. Real Answers.
|
|
1681
|
+
|
|
1682
|
+
| Query Type | Before (Painful) | With HyperFederate |
|
|
1683
|
+
|------------|------------------|---------------------|
|
|
1684
|
+
| **KG Risk + Snowflake Accounts** | 2 queries + Python join | `JOIN snowflake.CUSTOMER ON kg.custKey = sf.C_CUSTKEY` |
|
|
1685
|
+
| **Snowflake + BigQuery Demographics** | ETL pipeline, 4-6 hours | `LEFT JOIN bigquery.usa_names ON sf.C_NAME = bq.name` |
|
|
1686
|
+
| **Three-Way: KG + SF + BQ** | "Not possible without data warehouse" | **Single SQL statement, 890ms** |
|
|
1687
|
+
|
|
1688
|
+
```sql
|
|
1689
|
+
-- The query that would take days... now takes 890ms
|
|
1690
|
+
SELECT
|
|
1691
|
+
kg.person AS entity,
|
|
1692
|
+
kg.riskScore,
|
|
1693
|
+
entity_type(kg.person) AS types, -- Semantic UDF
|
|
1694
|
+
similar_to(kg.person, 0.6) AS related, -- AI-powered similarity
|
|
1695
|
+
sf.C_NAME AS customer_name,
|
|
1696
|
+
sf.C_ACCTBAL AS account_balance,
|
|
1697
|
+
bq.name AS popular_name,
|
|
1698
|
+
bq.number AS name_popularity
|
|
1699
|
+
FROM graph_search('SELECT ?person ?riskScore WHERE { ?person :riskScore ?riskScore }') kg
|
|
1700
|
+
JOIN snowflake_tpch.CUSTOMER sf ON CAST(kg.custKey AS INT) = sf.C_CUSTKEY
|
|
1701
|
+
LEFT JOIN bigquery_public.usa_names bq ON LOWER(sf.C_NAME) = LOWER(bq.name)
|
|
1702
|
+
WHERE kg.riskScore > 0.7
|
|
1703
|
+
LIMIT 10
|
|
1704
|
+
```
|
|
1705
|
+
|
|
1706
|
+
**The analyst gets their answer in under a second.** No data engineers. No ETL. No waiting.
|
|
1707
|
+
|
|
1708
|
+
### How It Works: Heavy Lifting in Rust Core
|
|
1709
|
+
|
|
1710
|
+
The TypeScript SDK is intentionally thin: it is only an RPC proxy. All the hard work happens in Rust:
|
|
1711
|
+
|
|
1712
|
+
```
|
|
1713
|
+
┌─────────────────────────────────────────────────────────────────────────────────┐
|
|
1714
|
+
│ TypeScript SDK (Thin RPC Proxy) │
|
|
1715
|
+
│ RpcFederationProxy: query(), createVirtualTable(), listCatalog(), ... │
|
|
1716
|
+
└─────────────────────────────────────────────────────────────────────────────────┘
|
|
1717
|
+
│ HTTP/RPC
|
|
1718
|
+
▼
|
|
1719
|
+
┌─────────────────────────────────────────────────────────────────────────────────┐
|
|
1720
|
+
│ Rust HyperFederate Core │
|
|
1721
|
+
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
|
1722
|
+
│ │ Apache Arrow │ │ Memory │ │ HDRF │ │ Category │ │
|
|
1723
|
+
│ │ / Flight │ │ Acceleration │ │ Partitioner │ │ Theory │ │
|
|
1724
|
+
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
|
|
1725
|
+
│ │
|
|
1726
|
+
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
|
|
1727
|
+
│ │ Connector Registry (5+ Sources) │ │
|
|
1728
|
+
│ │ KGDB (graph_search) │ Snowflake │ BigQuery │ PostgreSQL │ MySQL │ │
|
|
1729
|
+
│ └─────────────────────────────────────────────────────────────────────────┘ │
|
|
1730
|
+
└─────────────────────────────────────────────────────────────────────────────────┘
|
|
1731
|
+
```
|
|
1732
|
+
|
|
1733
|
+
- **Apache Arrow/Flight**: Columnar in-memory format (Arrow) and RPC transport (Flight) used by the Rust SQL engine
|
|
1734
|
+
- **Memory Acceleration**: Zero-copy data transfer for sub-second queries
|
|
1735
|
+
- **HDRF**: Subject-anchored partitioning for distributed execution
|
|
1736
|
+
- **Category Theory**: Tools as typed morphisms with provable correctness
|
|
1737
|
+
|
|
1738
|
+
### Why This Matters
|
|
1739
|
+
|
|
1740
|
+
| Capability | rust-kgdb + HyperFederate | Competitors |
|
|
1741
|
+
|------------|---------------------------|-------------|
|
|
1742
|
+
| **Cross-DB SQL** | ✅ JOIN across 5+ sources | ❌ Single source only |
|
|
1743
|
+
| **KG Integration** | ✅ SPARQL in SQL | ❌ Separate systems |
|
|
1744
|
+
| **Semantic UDFs** | ✅ 7 AI-powered functions | ❌ None |
|
|
1745
|
+
| **Table Functions** | ✅ 9 graph analytics | ❌ Basic aggregates |
|
|
1746
|
+
| **Virtual Tables** | ✅ Session-bound materialization | ❌ ETL required |
|
|
1747
|
+
| **Data Catalog** | ✅ DCAT DPROD ontology | ❌ Proprietary |
|
|
1748
|
+
| **Proof/Lineage** | ✅ Full provenance (W3C PROV) | ❌ None |
|
|
1749
|
+
|
|
1750
|
+
### Using RpcFederationProxy
|
|
1751
|
+
|
|
1752
|
+
```javascript
|
|
1753
|
+
const { RpcFederationProxy, ProofDAG } = require('rust-kgdb')
|
|
1754
|
+
|
|
1755
|
+
const federation = new RpcFederationProxy({
|
|
1756
|
+
endpoint: 'http://localhost:30180',
|
|
1757
|
+
identityId: 'risk-analyst-001'
|
|
1758
|
+
})
|
|
1759
|
+
|
|
1760
|
+
// Query across KGDB + Snowflake + BigQuery in single SQL
|
|
1761
|
+
const result = await federation.query(`
|
|
1762
|
+
WITH kg_risk AS (
|
|
1763
|
+
SELECT * FROM graph_search('
|
|
1764
|
+
PREFIX finance: <https://gonnect.ai/domains/finance#>
|
|
1765
|
+
SELECT ?person ?riskScore WHERE {
|
|
1766
|
+
?person finance:riskScore ?riskScore .
|
|
1767
|
+
FILTER(?riskScore > 0.7)
|
|
1768
|
+
}
|
|
1769
|
+
')
|
|
1770
|
+
)
|
|
1771
|
+
SELECT
|
|
1772
|
+
kg.person AS entity,
|
|
1773
|
+
kg.riskScore,
|
|
1774
|
+
-- Semantic UDFs on KG entities
|
|
1775
|
+
entity_type(kg.person) AS types,
|
|
1776
|
+
similar_to(kg.person, 0.6) AS similar_entities,
|
|
1777
|
+
-- Snowflake customer data
|
|
1778
|
+
sf.C_NAME AS customer_name,
|
|
1779
|
+
sf.C_ACCTBAL AS account_balance,
|
|
1780
|
+
-- BigQuery demographics
|
|
1781
|
+
bq.name AS popular_name,
|
|
1782
|
+
bq.number AS name_popularity
|
|
1783
|
+
FROM kg_risk kg
|
|
1784
|
+
JOIN snowflake_tpch.CUSTOMER sf ON CAST(kg.custKey AS INT) = sf.C_CUSTKEY
|
|
1785
|
+
LEFT JOIN bigquery_public.usa_names bq ON LOWER(sf.C_NAME) = LOWER(bq.name)
|
|
1786
|
+
LIMIT 10
|
|
1787
|
+
`)
|
|
1788
|
+
|
|
1789
|
+
console.log(`Returned ${result.rowCount} rows in ${result.duration}ms`)
|
|
1790
|
+
console.log(`Sources: ${result.metadata.sources.join(', ')}`)
|
|
1791
|
+
```
|
|
1792
|
+
|
|
1793
|
+
### Semantic UDFs (7 AI-Powered Functions)
|
|
1794
|
+
|
|
1795
|
+
| UDF | Signature | Description |
|
|
1796
|
+
|-----|-----------|-------------|
|
|
1797
|
+
| `similar_to` | `(entity, threshold)` | Find semantically similar entities via RDF2Vec |
|
|
1798
|
+
| `text_search` | `(query, limit)` | Semantic text search |
|
|
1799
|
+
| `neighbors` | `(entity, hops)` | N-hop graph traversal |
|
|
1800
|
+
| `graph_pattern` | `(s, p, o)` | Triple pattern matching |
|
|
1801
|
+
| `sparql_query` | `(sparql)` | Inline SPARQL execution |
|
|
1802
|
+
| `entity_type` | `(entity)` | Get RDF types |
|
|
1803
|
+
| `entity_properties` | `(entity)` | Get all properties |
|
|
1804
|
+
|
|
1805
|
+
### Table Functions (9 Graph Analytics)
|
|
1806
|
+
|
|
1807
|
+
| Function | Description |
|
|
1808
|
+
|----------|-------------|
|
|
1809
|
+
| `graph_search(sparql)` | SPARQL → SQL bridge |
|
|
1810
|
+
| `vector_search(text, k, threshold)` | Semantic similarity search |
|
|
1811
|
+
| `pagerank(sparql, damping, iterations)` | PageRank centrality |
|
|
1812
|
+
| `connected_components(sparql)` | Connected components (disjoint subgraphs) |
|
|
1813
|
+
| `shortest_paths(src, dst, max_hops)` | Path finding |
|
|
1814
|
+
| `triangle_count(sparql)` | Graph density measure |
|
|
1815
|
+
| `label_propagation(sparql, iterations)` | Community detection |
|
|
1816
|
+
| `datalog_reason(rules)` | Datalog inference |
|
|
1817
|
+
| `motif_search(pattern)` | Graph pattern matching |
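
Because `graph_search` takes SPARQL as a SQL string argument, any single quote inside the SPARQL has to be escaped before it is embedded. The helper below is a sketch under one assumption that the README does not confirm: that the federation SQL dialect follows the standard SQL rule of doubling single quotes inside a string literal. `sqlLiteral` and `graphSearch` are hypothetical helper names, not part of rust-kgdb.

```javascript
// Assumption: standard SQL string literals, where a single quote inside
// the literal is written as two single quotes.
function sqlLiteral(text) {
  return `'${String(text).replace(/'/g, "''")}'`
}

// Build a graph_search(...) table-function call with an alias (hypothetical helper).
function graphSearch(sparql, alias) {
  return `graph_search(${sqlLiteral(sparql)}) ${alias}`
}

// SPARQL containing a quoted literal, which would otherwise end the SQL string early.
const sparql = "SELECT ?p WHERE { ?p :status 'active' }"
const fromClause = graphSearch(sparql, 'kg')

console.log(`SELECT kg.* FROM ${fromClause}`)
```

Centralizing the escaping in one helper keeps hand-written template strings from silently producing malformed SQL when a SPARQL filter contains a quoted value.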
|
|
1818
|
+
|
|
1819
|
+
### Virtual Tables (Session-Bound Materialization)
|
|
1820
|
+
|
|
1821
|
+
```javascript
|
|
1822
|
+
// Create virtual table from federation query
|
|
1823
|
+
const vt = await federation.createVirtualTable('high_risk_customers', `
|
|
1824
|
+
SELECT kg.*, sf.C_ACCTBAL
|
|
1825
|
+
FROM graph_search('SELECT ?person ?riskScore WHERE {...}') kg
|
|
1826
|
+
JOIN snowflake.CUSTOMER sf ON ...
|
|
1827
|
+
WHERE kg.riskScore > 0.8
|
|
1828
|
+
`, {
|
|
1829
|
+
refreshPolicy: 'on_demand', // or 'ttl', 'on_source_change'
|
|
1830
|
+
ttlSeconds: 3600,
|
|
1831
|
+
sharedWith: ['risk-analyst-002'],
|
|
1832
|
+
sharedWithGroups: ['team-risk-analytics']
|
|
1833
|
+
})
|
|
1834
|
+
|
|
1835
|
+
// Query without re-execution (materialized)
|
|
1836
|
+
const filtered = await federation.queryVirtualTable(
|
|
1837
|
+
'high_risk_customers',
|
|
1838
|
+
'C_ACCTBAL > 100000'
|
|
1839
|
+
)
|
|
1840
|
+
```
|
|
1841
|
+
|
|
1842
|
+
**Virtual Table Features**:
|
|
1843
|
+
- Session isolation (each user sees only their tables)
|
|
1844
|
+
- Access control via `sharedWith` and `sharedWithGroups`
|
|
1845
|
+
- Stored as RDF triples in KGDB (self-describing)
|
|
1846
|
+
- Queryable via SPARQL for metadata
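
To make the refresh policies and sharing rules concrete, here is a small sketch of how such decisions could be evaluated. The option names (`refreshPolicy`, `ttlSeconds`, `sharedWith`, `sharedWithGroups`) come from the example above; everything else, including `needsRefresh`, `canRead`, the `owner` and `materializedAtMs` fields, and the exact semantics of each policy, is an assumption made for illustration and may differ from what the server does.

```javascript
// Hypothetical sketch of refresh and access decisions for a virtual table.

// Decide whether a materialized table should be recomputed.
function needsRefresh(table, nowMs, sourceChanged = false) {
  switch (table.refreshPolicy) {
    case 'ttl':
      // Stale once the configured time-to-live has elapsed.
      return nowMs - table.materializedAtMs >= table.ttlSeconds * 1000
    case 'on_source_change':
      return sourceChanged
    case 'on_demand':
      // Assumed meaning: refreshed only when the caller asks explicitly.
      return false
    default:
      throw new Error(`Unknown refresh policy: ${table.refreshPolicy}`)
  }
}

// Decide whether a user may read the table: owner, direct share, or group share.
function canRead(table, user) {
  return (
    table.owner === user.id ||
    table.sharedWith.includes(user.id) ||
    user.groups.some((group) => table.sharedWithGroups.includes(group))
  )
}

const table = {
  owner: 'risk-analyst-001',
  refreshPolicy: 'ttl',
  ttlSeconds: 3600,
  materializedAtMs: 0,
  sharedWith: ['risk-analyst-002'],
  sharedWithGroups: ['team-risk-analytics']
}

console.log(needsRefresh(table, 30 * 60 * 1000)) // half the TTL has elapsed
console.log(canRead(table, { id: 'risk-analyst-002', groups: [] }))
```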
|
|
1847
|
+
|
|
1848
|
+
### DCAT DPROD Catalog
|
|
1849
|
+
|
|
1850
|
+
```javascript
|
|
1851
|
+
// Register data product in catalog
|
|
1852
|
+
const product = await federation.registerDataProduct({
|
|
1853
|
+
name: 'High Risk Customer Analysis',
|
|
1854
|
+
description: 'Cross-domain risk scoring combining KG + transactional data',
|
|
1855
|
+
sources: ['kgdb', 'snowflake', 'bigquery'],
|
|
1856
|
+
outputPort: '/api/v1/products/high-risk/query',
|
|
1857
|
+
schema: {
|
|
1858
|
+
columns: [
|
|
1859
|
+
{ name: 'entity', type: 'STRING' },
|
|
1860
|
+
{ name: 'riskScore', type: 'FLOAT64' },
|
|
1861
|
+
{ name: 'accountBalance', type: 'DECIMAL(15,2)' }
|
|
1862
|
+
]
|
|
1863
|
+
},
|
|
1864
|
+
quality: {
|
|
1865
|
+
completeness: 0.98,
|
|
1866
|
+
accuracy: 0.95,
|
|
1867
|
+
timeliness: 0.99
|
|
1868
|
+
},
|
|
1869
|
+
owner: 'team-risk-analytics'
|
|
1870
|
+
})
|
|
1871
|
+
|
|
1872
|
+
// List catalog entries
|
|
1873
|
+
const catalog = await federation.listCatalog({ owner: 'team-risk-analytics' })
|
|
1874
|
+
```
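
Before registering a data product, it can help to check the descriptor locally. The sketch below assumes, based only on the example values above, that quality metrics are fractions between 0 and 1 and that `name` and `sources` are required; the README does not state these rules, and `validateDataProduct` is a hypothetical helper rather than part of the SDK.

```javascript
// Hypothetical pre-flight check for a data product descriptor.
// Returns a list of human-readable problems; an empty list means it looks valid.
function validateDataProduct(product) {
  const errors = []
  if (typeof product.name !== 'string' || product.name.trim() === '') {
    errors.push('name is required')
  }
  if (!Array.isArray(product.sources) || product.sources.length === 0) {
    errors.push('at least one source is required')
  }
  // Assumption: each quality metric is a finite number in the range [0, 1].
  for (const [metric, score] of Object.entries(product.quality || {})) {
    if (!Number.isFinite(score) || score < 0 || score > 1) {
      errors.push(`quality.${metric} must be between 0 and 1`)
    }
  }
  return errors
}

const descriptor = {
  name: 'High Risk Customer Analysis',
  sources: ['kgdb', 'snowflake', 'bigquery'],
  quality: { completeness: 0.98, accuracy: 0.95, timeliness: 0.99 }
}

console.log(validateDataProduct(descriptor))
```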
|
|
1875
|
+
|
|
1876
|
+
### ProofDAG with Federation Evidence
|
|
1877
|
+
|
|
1878
|
+
```javascript
|
|
1879
|
+
const proof = new ProofDAG('High-risk customers identified across 3 data sources')
|
|
1880
|
+
|
|
1881
|
+
// Add federation evidence to the proof
|
|
1882
|
+
const fedNode = proof.addFederationEvidence(
|
|
1883
|
+
proof.rootId,
|
|
1884
|
+
threeWayQuery, // SQL query string, defined elsewhere (e.g. the three-way query above)
|
|
1885
|
+
['kgdb', 'snowflake', 'bigquery'], // sources
|
|
1886
|
+
42, // rowCount
|
|
1887
|
+
890, // duration (ms)
|
|
1888
|
+
{ planHash: 'abc123', cached: false }
|
|
1889
|
+
)
|
|
1890
|
+
|
|
1891
|
+
console.log(`Proof hash: ${proof.computeHash()}`) // SHA-256 audit trail
|
|
1892
|
+
console.log(`Verification: ${JSON.stringify(proof.verify())}`)
|
|
1893
|
+
```
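
For intuition about what a SHA-256 audit hash over federation evidence involves, the following sketch hashes an evidence record with Node's built-in `crypto` module. It is not how `ProofDAG.computeHash()` is implemented, which the README does not describe; `canonicalize` and `hashEvidence` are hypothetical helpers, and the evidence fields simply mirror the arguments in the example above.

```javascript
const crypto = require('node:crypto')

// Serialize with sorted object keys so that equal evidence always
// produces the same string, regardless of property insertion order.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(',')}]`
  }
  if (value !== null && typeof value === 'object') {
    const body = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(value[key])}`)
      .join(',')
    return `{${body}}`
  }
  return JSON.stringify(value)
}

// SHA-256 over the canonical form, returned as a 64-character hex string.
function hashEvidence(evidence) {
  return crypto.createHash('sha256').update(canonicalize(evidence)).digest('hex')
}

const evidence = {
  sql: 'SELECT ...', // placeholder for the federated query text
  sources: ['kgdb', 'snowflake', 'bigquery'],
  rowCount: 42,
  durationMs: 890
}

console.log(`Evidence hash: ${hashEvidence(evidence)}`)
```

Canonicalizing before hashing matters for audits: two records with the same content but different key order would otherwise hash differently and look like tampering.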
|
|
1894
|
+
|
|
1895
|
+
### Category Theory Foundation
|
|
1896
|
+
|
|
1897
|
+
HyperFederate tools are typed morphisms following category theory:
|
|
1898
|
+
|
|
1899
|
+
```javascript
|
|
1900
|
+
const { FEDERATION_TOOLS } = require('rust-kgdb')
|
|
1901
|
+
|
|
1902
|
+
// Each tool has Input → Output type signature
|
|
1903
|
+
console.log(FEDERATION_TOOLS['federation.sql.query'])
|
|
1904
|
+
// { input: 'FederatedQuery', output: 'RecordBatch', domain: 'federation' }
|
|
1905
|
+
|
|
1906
|
+
console.log(FEDERATION_TOOLS['federation.udf.call'])
|
|
1907
|
+
// { input: 'UdfCall', output: 'UdfResult', udfs: ['similar_to', 'neighbors', ...] }
|
|
1908
|
+
```
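
One practical consequence of treating tools as typed morphisms is that a pipeline can be checked before it runs: two tools compose only when the first tool's output type equals the second tool's input type. The sketch below illustrates that check. Only the `federation.sql.query` signature appears in the README; `example.batch.summarize` and the `compose` helper are invented here for illustration.

```javascript
// Tool descriptors as { input, output } pairs.
const tools = {
  // Signature taken from the README example.
  'federation.sql.query': { input: 'FederatedQuery', output: 'RecordBatch' },
  // Hypothetical tool, invented for this sketch.
  'example.batch.summarize': { input: 'RecordBatch', output: 'Summary' }
}

// Compose two morphisms: valid only when the types line up.
function compose(first, second) {
  if (first.output !== second.input) {
    throw new TypeError(
      `Cannot compose: output ${first.output} does not match input ${second.input}`
    )
  }
  return { input: first.input, output: second.output }
}

const pipeline = compose(
  tools['federation.sql.query'],
  tools['example.batch.summarize']
)

console.log(`${pipeline.input} -> ${pipeline.output}`)
```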
|
|
1909
|
+
|
|
1910
|
+
---
|
|
1911
|
+
|
|
1587
1912
|
## Installation
|
|
1588
1913
|
|
|
1589
1914
|
```bash
|