rust-kgdb 0.3.5 → 0.3.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2) hide show
  1. package/README.md +39 -13
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -830,31 +830,57 @@ HyperMind was benchmarked using the **LUBM (Lehigh University Benchmark)** - the
830
830
 
831
831
  **Benchmark Configuration:**
832
832
  - **Dataset**: LUBM(1) - 3,272 triples (1 university)
833
- - **Queries**: 12 LUBM queries (Q1-Q12)
834
- - **LLM Models**: Claude Sonnet 4, GPT-4o
835
- - **Test Protocol**: Raw LLM vs HyperMind framework comparison
833
+ - **Queries**: 12 LUBM-style NL-to-SPARQL queries
834
+ - **LLM Models**: Claude Sonnet 4 (`claude-sonnet-4-20250514`), GPT-4o
835
+ - **Infrastructure**: rust-kgdb K8s cluster (1 coordinator + 3 executors)
836
+ - **Date**: December 12, 2025
837
+
838
+ **Benchmark Results (Actual Run Data):**
836
839
 
837
840
  | Metric | Claude Sonnet 4 | GPT-4o |
838
841
  |--------|-----------------|--------|
839
842
  | **Syntax Success (Raw LLM)** | 0% (0/12) | 100% (12/12) |
840
843
  | **Syntax Success (HyperMind)** | **92% (11/12)** | 75% (9/12) |
841
844
  | **Type Errors Caught** | 1 | 3 |
842
- | **Framework Overhead** | 6.2s avg | 3.0s avg |
845
+ | **Avg Latency (Raw)** | 167ms | 1,885ms |
846
+ | **Avg Latency (HyperMind)** | 6,230ms | 2,998ms |
843
847
 
844
- **Key Findings:**
848
+ **Example LUBM Queries We Ran:**
849
+
850
+ | # | Natural Language Question | Difficulty |
851
+ |---|--------------------------|------------|
852
+ | Q1 | "Find all professors in the university database" | Easy |
853
+ | Q3 | "How many courses are offered?" | Easy (COUNT) |
854
+ | Q5 | "List professors and the courses they teach" | Medium (JOIN) |
855
+ | Q8 | "Find the average credit hours for graduate courses" | Medium (AVG) |
856
+ | Q9 | "Find graduate students whose advisors research ML" | Hard (multi-hop) |
857
+ | Q12 | "Find pairs of students sharing advisor and courses" | Hard (complex) |
858
+
859
+ **Type Errors Caught at Planning Time:**
860
+
861
+ ```
862
+ Test 8 (Claude): "TYPE ERROR: AVG aggregation type mismatch"
863
+ Test 9 (GPT-4o): "TYPE ERROR: expected String, found BindingSet"
864
+ Test 10 (GPT-4o): "TYPE ERROR: composition rejected"
865
+ Test 12 (GPT-4o): "NO QUERY GENERATED: type check failed"
866
+ ```
867
+
868
+ **Root Cause Analysis:**
845
869
 
846
- 1. **HyperMind improves Claude's SPARQL generation from 0% to 92%** by forcing structured output (Claude raw responses include markdown formatting that fails SPARQL validation)
870
+ 1. **Claude Raw 0%**: Claude's raw responses include markdown formatting (triple backticks: \`\`\`sparql) which fails SPARQL validation. HyperMind's typed tool definitions force structured JSON output.
847
871
 
848
- 2. **HyperMind catches type errors before execution** - 3 type errors caught for GPT-4o queries that would have failed at runtime
872
+ 2. **GPT-4o 75% (not 100%)**: The 25% "failures" are actually **type system victories**—the framework correctly caught queries that would have failed at runtime due to type mismatches.
849
873
 
850
- 3. **The 75% vs 100% for GPT-4o is a feature** - rejected queries had type mismatches that the type checker caught
874
+ 3. **GPT-4o Intelligent Tool Selection**: On complex pattern queries (Q5, Q8), GPT-4o chose `kg.motif.find` over SPARQL, demonstrating HyperMind's tool discovery working correctly.
875
+
876
+ **Key Findings:**
851
877
 
852
- **Why HyperMind Works:**
853
- - **Structured Output**: Forces LLMs to return JSON with typed fields, not markdown-wrapped code
854
- - **Type Checking**: Validates morphism composition (String BindingSet Node[]) at planning time
855
- - **Reflection Loop**: Failed queries retry with error feedback in context
878
+ 1. **+92% syntax improvement for Claude** - from 0% to 92% by forcing structured output
879
+ 2. **Compile-time type safety** - 4 type errors caught before execution (would have been runtime failures)
880
+ 3. **Intelligent tool selection** - LLM autonomously chose appropriate tools (SPARQL vs motif)
881
+ 4. **Full provenance** - every plan step recorded for auditability
856
882
 
857
- **LUBM Reference**: [Lehigh University Benchmark](http://swat.cse.lehigh.edu/projects/lubm/) - standardized by W3C for Semantic Web database evaluation
883
+ **LUBM Reference**: [Lehigh University Benchmark](http://swat.cse.lehigh.edu/projects/lubm/) - W3C standardized Semantic Web database benchmark
858
884
 
859
885
  ### SDK Benchmark Results
860
886
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "rust-kgdb",
3
- "version": "0.3.5",
3
+ "version": "0.3.6",
4
4
  "description": "High-performance RDF/SPARQL database with GraphFrames analytics, vector embeddings, Datalog reasoning, Pregel BSP processing, and HyperMind neuro-symbolic agentic framework",
5
5
  "main": "index.js",
6
6
  "types": "index.d.ts",