npm - rust-kgdb - Versions diffs - 0.6.81 → 0.6.83 - Mend

rust-kgdb 0.6.81 → 0.6.83

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/README.md +357 -32
package/examples/rpc-catalog-dprod-demo.js +339 -0
package/examples/rpc-federation-sql-demo.js +273 -0
package/examples/rpc-virtual-tables-demo.js +268 -0
package/hypermind-agent.js +626 -0
package/index.d.ts +304 -0
package/index.js +9 -0
package/package.json +1 -1

package/README.md CHANGED

@@ -4,7 +4,36 @@
 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 [![W3C](https://img.shields.io/badge/W3C-SPARQL%201.1%20%7C%20RDF%201.2-blue)](https://www.w3.org/TR/sparql11-query/)
-> **Enterprise Knowledge Graph with Native Graph Embeddings**: A production-grade RDF database featuring built-in RDF2Vec, multi-vector composite search, and distributed SPARQL execution—engineered for teams who need verifiable AI at scale.
+> **Your knowledge is scattered. Your claims live in Snowflake. Your customer graph sits in Neo4j. Your risk models run on BigQuery. Your compliance docs are in SharePoint. And your AI? It hallucinates because it can't see the full picture.**
+>
+> rust-kgdb unifies scattered enterprise knowledge into a single queryable graph—with native embeddings, cross-database federation, and AI that generates queries instead of fabricating answers. No hallucinations. Full audit trails. One query across everything.
+---
+## What's New in v0.7.0
+| Feature | Description | Performance |
+|---------|-------------|-------------|
+| **HyperFederate** | Cross-database SQL: KGDB + Snowflake + BigQuery | Single query, 890ms 3-way federation |
+| **RpcFederationProxy** | WASM RPC proxy for federated queries | 7 UDFs + 9 Table Functions |
+| **Virtual Tables** | Session-bound query materialization | No ETL, real-time results |
+| **DCAT DPROD Catalog** | W3C-aligned data product registry | Self-describing RDF storage |
+| **Federation ProofDAG** | Full provenance for federated results | SHA-256 audit trail |
+```javascript
+const { GraphDB, RpcFederationProxy, FEDERATION_TOOLS } = require('rust-kgdb')
+// Query across KGDB + Snowflake + BigQuery in single SQL
+const federation = new RpcFederationProxy({ endpoint: 'http://localhost:30180' })
+const result = await federation.query(`
+  SELECT kg.*, sf.C_NAME, bq.name_popularity
+  FROM graph_search('SELECT ?person WHERE { ?person a :Customer }') kg
+  JOIN snowflake.CUSTOMER sf ON kg.custKey = sf.C_CUSTKEY
+  LEFT JOIN bigquery.usa_names bq ON sf.C_NAME = bq.name
+`)
+```
+*See [HyperFederate: Cross-Database Federation](#hyperfederate-cross-database-federation) for complete documentation.*
 ---
@@ -27,21 +56,35 @@ const { GraphDB, Rdf2VecEngine, EmbeddingService } = require('rust-kgdb')
 ## The Problem With AI Today
-Enterprise AI projects keep failing. Not because the technology is bad, but because organizations use it wrong.
+**Here's what actually happens in every enterprise AI project:**
-A claims investigator asks ChatGPT: *"Has Provider #4521 shown suspicious billing patterns?"*
+Your fraud analyst asks a simple question: *"Show me high-risk customers with large account balances who've had claims in the past 6 months."*
-The AI responds confidently: *"Yes, Provider #4521 has a history of duplicate billing and upcoding."*
+Sounds simple. It's not.
-The investigator opens a case. Weeks later, legal discovers Provider #4521 has a perfect record. **The AI made it up.** Lawsuit incoming.
+The **customer data** lives in Snowflake. The **risk scores** are computed in your knowledge graph. The **claims history** sits in BigQuery. The **policy details** are in a legacy Oracle database. And **nobody can write a query that spans all four**.
-This keeps happening:
+So the analyst does what everyone does:
+1. Export customers from Snowflake to CSV
+2. Run a separate risk query in the graph database
+3. Pull claims from BigQuery into another spreadsheet
+4. Spend 3 hours in Excel doing VLOOKUP joins
+5. Present "findings" that are already 6 hours stale
-- A lawyer cites "Smith v. Johnson (2019)" in court. The judge is confused. **That case doesn't exist.**
-- A doctor avoids prescribing "Nexapril" due to cardiac interactions. **Nexapril isn't a real drug.**
+**This is the reality of enterprise data in 2025.** Knowledge is scattered across dozens of systems. Every "simple" question requires a data engineering project. And when you finally get your answer, you can't trace how it was derived.
+Now add AI to this mess.
+Your analyst asks ChatGPT the same question. It responds confidently: *"Customer #4521 is high-risk with $847,000 in account balance and 3 recent claims."*
+The analyst opens an investigation. Two weeks later, legal discovers Customer #4521 doesn't exist. **The AI made up everything—the customer ID, the balance, the claims.** The AI had no access to your data. It just generated plausible-sounding text.
+This keeps happening:
+- A lawyer cites "Smith v. Johnson (2019)" in court. **That case doesn't exist.**
+- A doctor avoids prescribing "Nexapril" for cardiac patients. **Nexapril isn't a real drug.**
 - A fraud analyst flags Account #7842 for money laundering. **It belongs to a children's charity.**
-Every time, the same pattern: The AI sounds confident. The AI is wrong. People get hurt.
+Every time, the same pattern: Data is scattered. AI can't see it. AI fabricates. People get hurt.
 ---
@@ -64,29 +107,46 @@ A real solution requires a different architecture. One built on solid engineerin
 ## The Solution: Query Generation, Not Answer Generation
-What if AI stopped providing answers and started **generating queries**?
+What if we're thinking about AI wrong?
+Every enterprise wants the same thing: ask a question in plain English, get an accurate answer from their data. But we've been trying to make the AI *know* the answer. That's backwards.
-Think about it:
-- Your database knows the facts (claims, providers, transactions)
-- AI understands language (can parse "find suspicious patterns")
-- You need both working together
+**The AI doesn't need to know anything. It just needs to know how to ask.**
-**The AI translates intent into queries. The database finds facts. The AI never makes up data.**
+Think about what's actually happening when a fraud analyst asks: *"Show me high-risk customers with large balances."*
+The analyst already has everything needed to answer this question:
+- Customer data in Snowflake
+- Risk scores in the knowledge graph
+- Account balances in the core banking system
+- Complete audit logs of every transaction
+The problem isn't missing data. It's that **no human can write a query that spans all these systems**. SQL doesn't work on graphs. SPARQL doesn't work on Snowflake. And nobody has 4 hours to manually join CSVs.
+**The breakthrough**: What if AI generated the query instead of the answer?
 ```
-Before (Dangerous):
-  Human: "Is Provider #4521 suspicious?"
-  AI: "Yes, they have billing anomalies"      <-- FABRICATED
+The Old Way (Dangerous):
+  Human: "Show me high-risk customers with large balances"
+  AI: "Customer #4521 has $847K and high risk score"     <-- FABRICATED
-After (Safe):
-  Human: "Is Provider #4521 suspicious?"
-  AI: Generates SPARQL query
-  AI: Executes against YOUR database
-  Database: Returns actual facts about Provider #4521
-  Result: Real data with audit trail          <-- VERIFIABLE
+The New Way (Verifiable):
+  Human: "Show me high-risk customers with large balances"
+  AI: Understands intent → Generates federated SQL:
+      SELECT kg.customer, kg.risk_score, sf.balance
+      FROM graph_search('...risk assessment...') kg
+      JOIN snowflake.ACCOUNTS sf ON kg.customer_id = sf.id
+      WHERE kg.risk_score > 0.8 AND sf.balance > 100000
+  Database: Executes across KGDB + Snowflake + BigQuery
+  Result: Real customers. Real balances. Real risk scores.
+          With SHA-256 proof hash for audit trail.          <-- VERIFIABLE
 ```
-rust-kgdb is a knowledge graph database with an AI layer that **cannot hallucinate** because it only returns data from your actual systems.
+The AI never touches your data. It translates human language into precise queries. The database executes against real systems. Every answer traces back to actual records.
+**rust-kgdb is not an AI that knows answers. It's an AI that knows how to ask the right questions—across every system where your knowledge lives.**
 ---
@@ -122,18 +182,29 @@ The math matters. When your fraud detection runs 35x faster, you catch fraud bef
 ## Why rust-kgdb and HyperMind?
-Most AI frameworks trust the LLM. We don't.
+**The question isn't "Can AI answer my question?" It's "Can I trust the answer?"**
-### Core Capabilities
+Every AI framework makes the same mistake: they treat the LLM as the source of truth. LangChain. LlamaIndex. AutoGPT. They all assume the model knows things. It doesn't. It generates plausible text. There's a difference.
-| Layer | Feature | What It Does |
-|-------|---------|--------------|
-| **Database** | GraphDB | W3C SPARQL 1.1 compliant RDF store with 449ns lookups |
+We built rust-kgdb on a contrarian principle: **Never trust the AI. Verify everything.**
+The LLM proposes a query. The type system validates it against your actual schema. The sandbox executes it in isolation. The database returns only facts that exist. The proof DAG creates a cryptographic audit trail.
+At no point does the AI "know" anything. It's a translator—from human intent to precise queries—with four layers of verification before anything touches your data.
+**This is the difference between an AI that sounds right and an AI that is right.**
+### The Engineering Foundation
+| Layer | Component | What It Does |
+|-------|-----------|--------------|
+| **Database** | GraphDB | W3C SPARQL 1.1 compliant RDF store, 449ns lookups, 35x faster than RDFox |
 | **Database** | Distributed SPARQL | HDRF partitioning across Kubernetes executors |
-| **Embeddings** | Rdf2VecEngine | Train 384-dim vectors from graph random walks |
+| **Federation** | HyperFederate | Cross-database SQL: KGDB + Snowflake + BigQuery in single query |
+| **Embeddings** | Rdf2VecEngine | Train 384-dim vectors from graph random walks, 68µs lookup |
 | **Embeddings** | EmbeddingService | Multi-provider composite vectors with RRF fusion |
 | **Embeddings** | HNSW Index | Approximate nearest neighbor search in 303µs |
-| **Analytics** | GraphFrames | PageRank, connected components, motif matching |
+| **Analytics** | GraphFrames | PageRank, connected components, triangle count, motif matching |
 | **Analytics** | Pregel API | Bulk synchronous parallel graph algorithms |
 | **Reasoning** | Datalog Engine | Recursive rule evaluation with fixpoint semantics |
 | **AI Agent** | HyperMindAgent | Schema-aware SPARQL generation from natural language |
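
The table above lists "RRF fusion" for EmbeddingService without saying how rankings are combined. The sketch below shows the general Reciprocal Rank Fusion technique, assuming each provider returns an ordered list of entity IDs. The function name `rrfFuse`, the sample IDs, and the constant `k = 60` (a common default in the literature) are illustrative assumptions, not rust-kgdb's API or settings.

```javascript
// Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank) per item.
// k = 60 is a commonly used default, not a value taken from rust-kgdb.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map()
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1 // ranks are 1-based
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank))
    })
  }
  // Sort by fused score, highest first
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }))
}

// Hypothetical rankings from two embedding providers
const fused = rrfFuse([
  ['alice', 'bob', 'carol'],
  ['bob', 'dave', 'alice']
])
console.log(fused.map(r => r.id)) // [ 'bob', 'alice', 'dave', 'carol' ]
```

Items that rank well in several lists accumulate score from each one, which is why `bob` (ranks 2 and 1) edges out `alice` (ranks 1 and 3).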
@@ -1579,11 +1650,265 @@ const agent = new AgentBuilder('scoped-agent')
 | **Joins** | WCOJ | Worst-case optimal join algorithm |
 | **Distribution** | HDRF | Streaming graph partitioning |
 | **Distribution** | Raft | Consensus for coordination |
+| **Federation** | HyperFederate | Cross-database SQL: KGDB + Snowflake + BigQuery |
+| **Federation** | Virtual Tables | Session-bound query materialization |
+| **Federation** | DCAT Catalog | W3C DPROD data product registry |
 | **Mobile** | iOS/Android | Swift and Kotlin bindings via UniFFI |
 | **Storage** | InMemory/RocksDB/LMDB | Three backend options |
 ---
+## HyperFederate: Cross-Database Federation
+### The Real Problem: Your Knowledge Lives Everywhere
+Here's what actually happens in enterprise AI projects:
+A fraud analyst asks: *"Show me high-risk customers with large account balances and unusual name patterns."*
+To answer this, they need:
+- **Risk scores** from the Knowledge Graph (semantic relationships, fraud patterns)
+- **Account balances** from Snowflake (transaction history, customer master)
+- **Name demographics** from BigQuery (population statistics, anomaly detection)
+Today's reality? Three separate queries. Manual data exports. Excel joins. Python scripts. Data engineers on standby. Days of work for a single question.
+**This is insane.**
+Your knowledge isn't siloed because you want it to be. It's siloed because no tool could query across systems... until now.
+### One Query. Three Sources. Real Answers.
+| Query Type | Before (Painful) | With HyperFederate |
+|------------|------------------|---------------------|
+| **KG Risk + Snowflake Accounts** | 2 queries + Python join | `JOIN snowflake.CUSTOMER ON kg.custKey = sf.C_CUSTKEY` |
+| **Snowflake + BigQuery Demographics** | ETL pipeline, 4-6 hours | `LEFT JOIN bigquery.usa_names ON sf.C_NAME = bq.name` |
+| **Three-Way: KG + SF + BQ** | "Not possible without data warehouse" | **Single SQL statement, 890ms** |
+```sql
+-- The query that would take days... now takes 890ms
+SELECT
+  kg.person AS entity,
+  kg.riskScore,
+  entity_type(kg.person) AS types,           -- Semantic UDF
+  similar_to(kg.person, 0.6) AS related,     -- AI-powered similarity
+  sf.C_NAME AS customer_name,
+  sf.C_ACCTBAL AS account_balance,
+  bq.name AS popular_name,
+  bq.number AS name_popularity
+FROM graph_search('SELECT ?person ?riskScore WHERE { ?person :riskScore ?riskScore }') kg
+JOIN snowflake_tpch.CUSTOMER sf ON CAST(kg.custKey AS INT) = sf.C_CUSTKEY
+LEFT JOIN bigquery_public.usa_names bq ON LOWER(sf.C_NAME) = LOWER(bq.name)
+WHERE kg.riskScore > 0.7
+LIMIT 10
+```
+**The analyst gets their answer in under a second.** No data engineers. No ETL. No waiting.
+### How It Works: Heavy Lifting in Rust Core
+The TypeScript SDK is intentionally a thin RPC proxy. All the hard work happens in Rust:
+```
+┌─────────────────────────────────────────────────────────────────────────────────┐
+│                        TypeScript SDK (Thin RPC Proxy)                          │
+│  RpcFederationProxy: query(), createVirtualTable(), listCatalog(), ...          │
+└─────────────────────────────────────────────────────────────────────────────────┘
+                                      │ HTTP/RPC
+                                      ▼
+┌─────────────────────────────────────────────────────────────────────────────────┐
+│                          Rust HyperFederate Core                                │
+│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
+│  │ Apache Arrow │  │   Memory     │  │    HDRF      │  │   Category   │        │
+│  │   / Flight   │  │ Acceleration │  │ Partitioner  │  │    Theory    │        │
+│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘        │
+│                                                                                 │
+│  ┌─────────────────────────────────────────────────────────────────────────┐   │
+│  │                    Connector Registry (5+ Sources)                       │   │
+│  │  KGDB (graph_search) │ Snowflake │ BigQuery │ PostgreSQL │ MySQL        │   │
+│  └─────────────────────────────────────────────────────────────────────────┘   │
+└─────────────────────────────────────────────────────────────────────────────────┘
+```
+- **Apache Arrow/Flight**: High-performance columnar data format and RPC transport (Rust)
+- **Memory Acceleration**: Zero-copy data transfer for sub-second queries
+- **HDRF**: Subject-anchored partitioning for distributed execution
+- **Category Theory**: Tools as typed morphisms with provable correctness
+### Why This Matters
+| Capability | rust-kgdb + HyperFederate | Competitors |
+|------------|---------------------------|-------------|
+| **Cross-DB SQL** | ✅ JOIN across 5+ sources | ❌ Single source only |
+| **KG Integration** | ✅ SPARQL in SQL | ❌ Separate systems |
+| **Semantic UDFs** | ✅ 7 AI-powered functions | ❌ None |
+| **Table Functions** | ✅ 9 graph analytics | ❌ Basic aggregates |
+| **Virtual Tables** | ✅ Session-bound materialization | ❌ ETL required |
+| **Data Catalog** | ✅ DCAT DPROD ontology | ❌ Proprietary |
+| **Proof/Lineage** | ✅ Full provenance (W3C PROV) | ❌ None |
+### Using RpcFederationProxy
+```javascript
+const { RpcFederationProxy, ProofDAG } = require('rust-kgdb')
+const federation = new RpcFederationProxy({
+  endpoint: 'http://localhost:30180',
+  identityId: 'risk-analyst-001'
+})
+// Query across KGDB + Snowflake + BigQuery in single SQL
+const result = await federation.query(`
+  WITH kg_risk AS (
+    SELECT * FROM graph_search('
+      PREFIX finance: <https://gonnect.ai/domains/finance#>
+      SELECT ?person ?riskScore WHERE {
+        ?person finance:riskScore ?riskScore .
+        FILTER(?riskScore > 0.7)
+      }
+    ')
+  )
+  SELECT
+    kg.person AS entity,
+    kg.riskScore,
+    -- Semantic UDFs on KG entities
+    entity_type(kg.person) AS types,
+    similar_to(kg.person, 0.6) AS similar_entities,
+    -- Snowflake customer data
+    sf.C_NAME AS customer_name,
+    sf.C_ACCTBAL AS account_balance,
+    -- BigQuery demographics
+    bq.name AS popular_name,
+    bq.number AS name_popularity
+  FROM kg_risk kg
+  JOIN snowflake_tpch.CUSTOMER sf ON CAST(kg.custKey AS INT) = sf.C_CUSTKEY
+  LEFT JOIN bigquery_public.usa_names bq ON LOWER(sf.C_NAME) = LOWER(bq.name)
+  LIMIT 10
+`)
+console.log(`Returned ${result.rowCount} rows in ${result.duration}ms`)
+console.log(`Sources: ${result.metadata.sources.join(', ')}`)
+```
+### Semantic UDFs (7 AI-Powered Functions)
+| UDF | Signature | Description |
+|-----|-----------|-------------|
+| `similar_to` | `(entity, threshold)` | Find semantically similar entities via RDF2Vec |
+| `text_search` | `(query, limit)` | Semantic text search |
+| `neighbors` | `(entity, hops)` | N-hop graph traversal |
+| `graph_pattern` | `(s, p, o)` | Triple pattern matching |
+| `sparql_query` | `(sparql)` | Inline SPARQL execution |
+| `entity_type` | `(entity)` | Get RDF types |
+| `entity_properties` | `(entity)` | Get all properties |
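
To make the `neighbors(entity, hops)` row above concrete, here is a minimal breadth-first sketch of N-hop traversal over an in-memory list of triples. It only illustrates the idea and is not how the UDF is implemented in the Rust core. The function name `nHopNeighbors`, the array-of-triples input, and the choice to follow outgoing edges only are assumptions made for the example.

```javascript
// Breadth-first N-hop traversal over [subject, predicate, object] triples.
// Follows outgoing edges only (an assumption for this sketch).
function nHopNeighbors(triples, start, hops) {
  const adjacency = new Map()
  for (const [s, , o] of triples) {
    if (!adjacency.has(s)) adjacency.set(s, [])
    adjacency.get(s).push(o)
  }
  const seen = new Set([start])
  let frontier = [start]
  for (let depth = 0; depth < hops; depth++) {
    const next = []
    for (const node of frontier) {
      for (const neighbor of adjacency.get(node) || []) {
        if (!seen.has(neighbor)) {
          seen.add(neighbor)
          next.push(neighbor)
        }
      }
    }
    frontier = next // advance one hop
  }
  seen.delete(start) // the start entity is not its own neighbor
  return [...seen]
}

// Hypothetical sample data
const sampleTriples = [
  ['a', 'knows', 'b'],
  ['b', 'knows', 'c'],
  ['c', 'knows', 'd']
]
console.log(nHopNeighbors(sampleTriples, 'a', 2)) // [ 'b', 'c' ]
```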
+### Table Functions (9 Graph Analytics)
+| Function | Description |
+|----------|-------------|
+| `graph_search(sparql)` | SPARQL → SQL bridge |
+| `vector_search(text, k, threshold)` | Semantic similarity search |
+| `pagerank(sparql, damping, iterations)` | PageRank centrality |
+| `connected_components(sparql)` | Connected component detection |
+| `shortest_paths(src, dst, max_hops)` | Path finding |
+| `triangle_count(sparql)` | Graph density measure |
+| `label_propagation(sparql, iterations)` | Community detection |
+| `datalog_reason(rules)` | Datalog inference |
+| `motif_search(pattern)` | Graph pattern matching |
+### Virtual Tables (Session-Bound Materialization)
+```javascript
+// Create virtual table from federation query
+const vt = await federation.createVirtualTable('high_risk_customers', `
+  SELECT kg.*, sf.C_ACCTBAL
+  FROM graph_search('SELECT ?person ?riskScore WHERE {...}') kg
+  JOIN snowflake.CUSTOMER sf ON ...
+  WHERE kg.riskScore > 0.8
+`, {
+  refreshPolicy: 'on_demand',    // or 'ttl', 'on_source_change'
+  ttlSeconds: 3600,
+  sharedWith: ['risk-analyst-002'],
+  sharedWithGroups: ['team-risk-analytics']
+})
+// Query without re-execution (materialized)
+const filtered = await federation.queryVirtualTable(
+  'high_risk_customers',
+  'C_ACCTBAL > 100000'
+)
+```
+**Virtual Table Features**:
+- Session isolation (each user sees only their tables)
+- Access control via `sharedWith` and `sharedWithGroups`
+- Stored as RDF triples in KGDB (self-describing)
+- Queryable via SPARQL for metadata
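
The access-control and refresh options listed above can be read as two simple checks: who may see a table, and whether a TTL-refreshed table is out of date. The sketch below shows one plausible reading. The `owner` and `materializedAtMs` fields and both function names are hypothetical, and the real checks happen server-side in the Rust core.

```javascript
// Hypothetical metadata record; option names mirror the createVirtualTable example,
// while `owner` and `materializedAtMs` are assumed for illustration.
function canRead(table, identityId, groups = []) {
  if (table.owner === identityId) return true
  if ((table.sharedWith || []).includes(identityId)) return true
  return (table.sharedWithGroups || []).some(g => groups.includes(g))
}

// A table is stale only under the 'ttl' policy, once ttlSeconds have elapsed.
function isStale(table, nowMs) {
  if (table.refreshPolicy !== 'ttl') return false
  return nowMs - table.materializedAtMs > table.ttlSeconds * 1000
}

const table = {
  owner: 'risk-analyst-001',
  sharedWith: ['risk-analyst-002'],
  sharedWithGroups: ['team-risk-analytics'],
  refreshPolicy: 'ttl',
  ttlSeconds: 3600,
  materializedAtMs: 0
}
console.log(canRead(table, 'risk-analyst-002'))           // true
console.log(canRead(table, 'intern-007', ['team-sales'])) // false
console.log(isStale(table, 3600 * 1000 + 1))              // true
```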
+### DCAT DPROD Catalog
+```javascript
+// Register data product in catalog
+const product = await federation.registerDataProduct({
+  name: 'High Risk Customer Analysis',
+  description: 'Cross-domain risk scoring combining KG + transactional data',
+  sources: ['kgdb', 'snowflake', 'bigquery'],
+  outputPort: '/api/v1/products/high-risk/query',
+  schema: {
+    columns: [
+      { name: 'entity', type: 'STRING' },
+      { name: 'riskScore', type: 'FLOAT64' },
+      { name: 'accountBalance', type: 'DECIMAL(15,2)' }
+    ]
+  },
+  quality: {
+    completeness: 0.98,
+    accuracy: 0.95,
+    timeliness: 0.99
+  },
+  owner: 'team-risk-analytics'
+})
+// List catalog entries
+const catalog = await federation.listCatalog({ owner: 'team-risk-analytics' })
+```
+### ProofDAG with Federation Evidence
+```javascript
+const proof = new ProofDAG('High-risk customers identified across 3 data sources')
+// Add federation evidence to the proof
+const fedNode = proof.addFederationEvidence(
+  proof.rootId,
+  threeWayQuery,                     // SQL query
+  ['kgdb', 'snowflake', 'bigquery'], // sources
+  42,                                // rowCount
+  890,                               // duration (ms)
+  { planHash: 'abc123', cached: false }
+)
+console.log(`Proof hash: ${proof.computeHash()}`)  // SHA-256 audit trail
+console.log(`Verification: ${JSON.stringify(proof.verify())}`)
+```
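
The README does not show how `computeHash()` derives its SHA-256 value. As a rough illustration of why such a hash works as an audit trail, the sketch below hashes a canonical (key-sorted) serialization of an evidence record with Node's built-in `crypto` module. The `canonicalize` and `evidenceHash` helpers and the record's field names are assumptions for the example, not the library's internals.

```javascript
const crypto = require('crypto')

// Serialize with sorted keys so the same evidence always yields the same bytes.
function canonicalize(value) {
  if (Array.isArray(value)) return '[' + value.map(canonicalize).join(',') + ']'
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort()
    return '{' + keys.map(k => JSON.stringify(k) + ':' + canonicalize(value[k])).join(',') + '}'
  }
  return JSON.stringify(value)
}

// Hex-encoded SHA-256 digest of the canonical form
function evidenceHash(evidence) {
  return crypto.createHash('sha256').update(canonicalize(evidence)).digest('hex')
}

// Same evidence, different key order: the hashes match
const a = evidenceHash({ sources: ['kgdb', 'snowflake', 'bigquery'], rowCount: 42, durationMs: 890 })
const b = evidenceHash({ durationMs: 890, rowCount: 42, sources: ['kgdb', 'snowflake', 'bigquery'] })
console.log(a === b) // true
```

Any change to the recorded evidence, such as a different row count, produces a different digest, which is what lets an auditor detect tampering.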
+### Category Theory Foundation
+HyperFederate tools are typed morphisms following category theory:
+```javascript
+const { FEDERATION_TOOLS } = require('rust-kgdb')
+// Each tool has Input → Output type signature
+console.log(FEDERATION_TOOLS['federation.sql.query'])
+// { input: 'FederatedQuery', output: 'RecordBatch', domain: 'federation' }
+console.log(FEDERATION_TOOLS['federation.udf.call'])
+// { input: 'UdfCall', output: 'UdfResult', udfs: ['similar_to', 'neighbors', ...] }
+```
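
One practical consequence of treating tools as typed morphisms is that a pipeline can be rejected before it runs when the types do not line up. The sketch below assumes descriptors shaped like the `FEDERATION_TOOLS` entries shown above. The `compose` helper and the `example.batch.summarize` tool are hypothetical and exist only to demonstrate the check.

```javascript
// Hypothetical registry; only 'federation.sql.query' mirrors the shape shown above.
const tools = {
  'federation.sql.query': { input: 'FederatedQuery', output: 'RecordBatch' },
  'example.batch.summarize': { input: 'RecordBatch', output: 'Summary' }
}

// Running `first` then `second` is valid only when first.output equals second.input.
function compose(registry, firstName, secondName) {
  const first = registry[firstName]
  const second = registry[secondName]
  if (!first || !second) throw new Error('Unknown tool')
  if (first.output !== second.input) {
    throw new TypeError(`Cannot compose: ${first.output} is not ${second.input}`)
  }
  return { input: first.input, output: second.output }
}

console.log(compose(tools, 'federation.sql.query', 'example.batch.summarize'))
// { input: 'FederatedQuery', output: 'Summary' }
```

Reversing the order throws a `TypeError`, because a `Summary` cannot be fed to a tool that expects a `FederatedQuery`.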
+---
 ## Installation
 ```bash