rust-kgdb 0.6.1 → 0.6.3

Files changed (2)
  1. package/README.md +36 -17
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -12,16 +12,16 @@
 
  We asked GPT-4 to write a simple SPARQL query: *"Find all professors."*
 
- It returned this:
+ It returned this broken output:
 
- ```sparql
- ```sparql
- SELECT ?professor WHERE { ?professor a ub:Faculty . }
- ```
- This query retrieves faculty members from the knowledge graph.
+ ```text
+ ```sparql
+ SELECT ?professor WHERE { ?professor a ub:Faculty . }
+ ```
+ This query retrieves faculty members from the knowledge graph.
  ```
 
- Three problems: markdown code fences break the parser, `ub:Faculty` doesn't exist in the schema (it's `ub:Professor`), and the explanation text is mixed with the query. **Result: Parser error. Zero results.**
+ Three problems: (1) markdown code fences break the parser, (2) `ub:Faculty` doesn't exist in the schema (it's `ub:Professor`), and (3) the explanation text is mixed with the query. **Result: Parser error. Zero results.**
 
  This isn't a cherry-picked failure. When we ran the standard LUBM benchmark (14 queries, 3,272 triples), vanilla LLMs produced valid, correct SPARQL **0% of the time**.
 
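To make the three problems in the hunk above concrete, here is a minimal JavaScript sketch that pulls the query out of a fenced LLM reply and flags class names missing from a schema. It is not part of rust-kgdb: the helper names `extractSparql` and `unknownClasses` and the one-entry schema set are assumptions for illustration, and the regular expressions only handle simple `?x a prefix:Class` patterns.

```javascript
// Hypothetical helpers, not rust-kgdb API.

// Return the body of the first fenced block, or the trimmed input if no fence is present.
// `{3} matches three consecutive backticks without writing a literal fence here.
function extractSparql(raw) {
  const match = raw.match(/`{3}[a-z]*\s*\n([\s\S]*?)\n\s*`{3}/i);
  return (match ? match[1] : raw).trim();
}

// Collect the objects of `a` (rdf:type) patterns and keep those not in the schema.
function unknownClasses(query, knownClasses) {
  const found = [...query.matchAll(/\sa\s+([A-Za-z][\w-]*:[\w-]+)/g)].map((m) => m[1]);
  return found.filter((cls) => !knownClasses.has(cls));
}

// Assumed schema excerpt: only the class named in the README text.
const KNOWN_CLASSES = new Set(["ub:Professor"]);
```

Running `extractSparql` on the reply shown in the README drops both the fence and the trailing explanation, and `unknownClasses` then reports `ub:Faculty` as absent from the assumed schema set.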
@@ -261,23 +261,42 @@ Token limits are real. rust-kgdb uses a **rolling time window strategy** to find
  └─────────────────────────────────────────────────────────────────────────────────┘
  ```
 
- ### Idempotent Responses via Query Cache
+ ### Idempotent Responses via Semantic Hashing
 
- Same question = Same answer. Critical for compliance.
+ Same question = Same answer. Even with **different wording**. Critical for compliance.
 
  ```javascript
- // First call: Compute answer, cache result
+ // First call: Compute answer, cache with semantic hash
  const result1 = await agent.call("Analyze claims from Provider P001")
- // Hash: sha256:9f86d081...b0f00a08
+ // Semantic Hash: semhash:fraud-provider-p001-claims-analysis
+
+ // Second call (different wording, same intent): Cache HIT!
+ const result2 = await agent.call("Show me P001's claim patterns")
+ // Cache HIT - same semantic hash: semhash:fraud-provider-p001-claims-analysis
 
- // Second call (10 minutes later): Return cached result
- const result2 = await agent.call("Analyze claims from Provider P001")
- // Cache HIT - same hash: sha256:9f86d081...b0f00a08
+ // Third call (exact same): Also cache hit
+ const result3 = await agent.call("Analyze claims from Provider P001")
+ // Cache HIT - same semantic hash: semhash:fraud-provider-p001-claims-analysis
 
  // Compliance officer: "Why are these identical?"
- // You: "Idempotent responses - same input, same output, cryptographic proof."
+ // You: "Semantic hashing - same meaning, same output, regardless of phrasing."
  ```
 
+ **How it works**: Query embeddings are hashed via **Locality-Sensitive Hashing (LSH)** with random hyperplane projections. Semantically similar queries map to the same bucket.
+
+ **Research Foundation**:
+ - **SimHash** (Charikar, 2002) - Random hyperplane projections for cosine similarity
+ - **Semantic Hashing** (Salakhutdinov & Hinton, 2009) - Deep autoencoders for binary codes
+ - **Learning to Hash** (Wang et al., 2018) - Survey of neural hashing methods
+
+ **Implementation**: 384-dim embeddings → LSH with 64 hyperplanes → 64-bit semantic hash
+
+ **Benefits**:
+ - **Semantic deduplication** - "Find fraud" and "Detect fraudulent activity" hit same cache
+ - **Cost reduction** - Avoid redundant LLM calls for paraphrased questions
+ - **Consistency** - Same answer for same intent, audit-ready
+ - **Sub-linear lookup** - O(1) hash lookup vs O(n) embedding comparison
+
  ---
 
  ## What This Is
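As a rough illustration of the scheme the added README text describes (384-dimensional embeddings, 64 random hyperplanes, a 64-bit hash), here is a minimal SimHash sketch in JavaScript. It is not taken from rust-kgdb: the PRNG, the seed, and the names `makeHyperplanes`, `simhash`, and `hammingDistance` are assumptions for illustration. Under SimHash, two vectors at angle θ agree on any one bit with probability 1 − θ/π, so paraphrases land on the identical 64-bit hash only when their embeddings are very close; comparing hashes by Hamming distance is one common way to relax that.

```javascript
// Illustrative SimHash (random hyperplane LSH); names and seed are hypothetical.
const DIM = 384; // embedding size stated in the README
const BITS = 64; // one bit per hyperplane

// Small deterministic PRNG (mulberry32 variant) so hyperplanes are reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal sample via the Box-Muller transform.
function gaussian(rand) {
  const u = 1 - rand(); // in (0, 1], avoids log(0)
  const v = rand();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// BITS random hyperplanes, each a DIM-dimensional normal vector.
function makeHyperplanes(seed = 42) {
  const rand = mulberry32(seed);
  return Array.from({ length: BITS }, () =>
    Float64Array.from({ length: DIM }, () => gaussian(rand))
  );
}

// Bit i is 1 when the embedding lies on the non-negative side of hyperplane i.
function simhash(embedding, hyperplanes) {
  let hash = 0n;
  for (let i = 0; i < hyperplanes.length; i++) {
    const plane = hyperplanes[i];
    let dot = 0;
    for (let j = 0; j < embedding.length; j++) dot += plane[j] * embedding[j];
    if (dot >= 0) hash |= 1n << BigInt(i);
  }
  return hash; // BigInt holding up to 64 bits
}

// Number of differing bits between two hashes.
function hammingDistance(a, b) {
  let x = a ^ b;
  let count = 0;
  while (x > 0n) {
    count += Number(x & 1n);
    x >>= 1n;
  }
  return count;
}
```

A cache built on this could key a `Map` by `simhash(embedding, planes).toString(16)`; the hash depends only on the direction of the embedding, so rescaling a vector by a positive factor leaves it unchanged.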
@@ -2501,7 +2520,7 @@ that appear to be related. There might be collusion with provider PROV001."
  "output": "collusion(P001,P002,PROV001)",
  "derivation": "claim(CLM001,P001,PROV001) ∧ claim(CLM002,P002,PROV001) ∧ related(P001,P002) → collusion(P001,P002,PROV001)",
  "timestamp": "2024-12-14T10:30:00Z",
- "hash": "sha256:9f86d081...b0f00a08"
+ "semanticHash": "semhash:collusion-p001-p002-prov001"
  }
  }
  ```
@@ -2521,7 +2540,7 @@ that appear to be related. There might be collusion with provider PROV001."
  5. Unification: `?P1=P001, ?P2=P002, ?Prov=PROV001`
  6. Conclusion: `collusion(P001, P002, PROV001)` - QED
 
- Here's the SHA-256 hash of this execution: `sha256:9f86d081...b0f00a08`"
+ Here's the semantic hash: `semhash:collusion-p001-p002-prov001` - same query intent will always return this exact result."
 
  **Result:** HyperMind passes audit. DSPy gets you a follow-up meeting with legal.
 
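The derivation quoted in the two hunks above applies one rule to three facts. A minimal JavaScript sketch of that step follows. It is not rust-kgdb's reasoner: the fact layout, the field name `party`, and the function `deriveCollusion` are assumptions for illustration.

```javascript
// Facts from the "derivation" field; the object layout is hypothetical.
const claims = [
  { id: "CLM001", party: "P001", provider: "PROV001" },
  { id: "CLM002", party: "P002", provider: "PROV001" },
];
const related = [["P001", "P002"]];

// Rule: claim(_, P1, Prov) ∧ claim(_, P2, Prov) ∧ related(P1, P2) → collusion(P1, P2, Prov)
function deriveCollusion(claims, related) {
  const derived = [];
  for (const [p1, p2] of related) {
    for (const c1 of claims) {
      if (c1.party !== p1) continue; // bind ?P1
      for (const c2 of claims) {
        // bind ?P2 and require the same ?Prov in both claims
        if (c2.party === p2 && c2.provider === c1.provider) {
          derived.push(`collusion(${p1},${p2},${c1.provider})`);
        }
      }
    }
  }
  return derived;
}
```

With the facts listed here, `deriveCollusion(claims, related)` yields the single conclusion recorded in the `"output"` field, `collusion(P001,P002,PROV001)`.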
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "rust-kgdb",
- "version": "0.6.1",
+ "version": "0.6.3",
  "description": "Production-grade Neuro-Symbolic AI Framework with Memory Hypergraph: +86.4% accuracy improvement over vanilla LLMs. High-performance knowledge graph (2.78µs lookups, 35x faster than RDFox). Features Memory Hypergraph (temporal scoring, rolling context window, idempotent responses), fraud detection, underwriting agents, WASM sandbox, type/category/proof theory, and W3C SPARQL 1.1 compliance.",
  "main": "index.js",
  "types": "index.d.ts",