amalfa 1.0.2 → 1.0.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55)
  1. package/package.json +1 -1
  2. package/src/cli.ts +1 -1
  3. package/docs/AGENT-METADATA-PATTERNS.md +0 -1021
  4. package/docs/AGENT_PROTOCOLS.md +0 -28
  5. package/docs/ARCHITECTURAL_OVERVIEW.md +0 -123
  6. package/docs/BENTO_BOXING_DEPRECATION.md +0 -281
  7. package/docs/Bun-SQLite.html +0 -464
  8. package/docs/COMMIT_GUIDELINES.md +0 -367
  9. package/docs/CONFIG_E2E_VALIDATION.md +0 -147
  10. package/docs/CONFIG_UNIFICATION.md +0 -187
  11. package/docs/CONFIG_VALIDATION.md +0 -103
  12. package/docs/DEVELOPER_ONBOARDING.md +0 -36
  13. package/docs/Graph and Vector Database Best Practices.md +0 -214
  14. package/docs/LEGACY_DEPRECATION.md +0 -174
  15. package/docs/MCP_SETUP.md +0 -317
  16. package/docs/PERFORMANCE_BASELINES.md +0 -88
  17. package/docs/QUICK_START_MCP.md +0 -168
  18. package/docs/REPOSITORY_CLEANUP_SUMMARY.md +0 -261
  19. package/docs/SESSION-2026-01-06-METADATA-PATTERNS.md +0 -346
  20. package/docs/SETUP.md +0 -464
  21. package/docs/SETUP_COMPLETE.md +0 -464
  22. package/docs/VISION-AGENT-LEARNING.md +0 -1242
  23. package/docs/_current-config-status.md +0 -93
  24. package/docs/edge-generation-methods.md +0 -57
  25. package/docs/elevator-pitch.md +0 -118
  26. package/docs/graph-and-vector-database-playbook.html +0 -480
  27. package/docs/hardened-sqlite.md +0 -85
  28. package/docs/headless-knowledge-management.md +0 -79
  29. package/docs/john-kaye-flux-prompt.md +0 -46
  30. package/docs/keyboard-shortcuts.md +0 -80
  31. package/docs/opinion-proceed-pattern.md +0 -29
  32. package/docs/polyvis-nodes-edges-schema.md +0 -77
  33. package/docs/protocols/lab-protocol.md +0 -30
  34. package/docs/reaction-iquest-loop-coder.md +0 -46
  35. package/docs/services.md +0 -60
  36. package/docs/sqlite-wal-readonly-trap.md +0 -228
  37. package/docs/strategy/css-architecture.md +0 -40
  38. package/docs/test-document-cycle.md +0 -83
  39. package/docs/test_lifecycle_E2E.md +0 -4
  40. package/docs/the-bicameral-graph.md +0 -83
  41. package/docs/user-guide.md +0 -70
  42. package/docs/vision-helper.md +0 -53
  43. package/polyvis.settings.json.bak +0 -38
  44. package/src/EnlightenedTriad.ts +0 -146
  45. package/src/JIT_Triad.ts +0 -137
  46. package/src/data/experience/test_doc_1.md +0 -2
  47. package/src/data/experience/test_doc_2.md +0 -2
  48. package/src/demo-triad.ts +0 -45
  49. package/src/gardeners/BaseGardener.ts +0 -55
  50. package/src/llm/EnlightenedProvider.ts +0 -95
  51. package/src/services/README.md +0 -56
  52. package/src/services/llama.ts +0 -59
  53. package/src/services/llamauv.ts +0 -56
  54. package/src/services/olmo3.ts +0 -58
  55. package/src/services/phi.ts +0 -52
@@ -1,103 +0,0 @@
1
- # Configuration Validation Results
2
-
3
- **Date**: 2026-01-07
4
- **Status**: ✅ No critical conflicts detected
5
- **Command**: `bun run validate-config`
6
-
7
- ## Summary
8
-
9
- The validation script checked for conflicts between `amalfa.config.json` (user-facing) and `polyvis.settings.json` (legacy/deprecated). Found **1 warning** and **2 info items** - no blocking issues.
10
-
11
- ## Findings
12
-
13
- ### ⚠️ Warning: Database Path Mismatch
14
-
15
- **Issue**: Two different database paths are configured:
16
- - `amalfa.config.json`: `.amalfa/multi-source-test.db`
17
- - `polyvis.settings.json`: `public/resonance.db`
18
-
19
- **Impact**: Medium - The MCP server and daemon use `amalfa.config.json`, so they operate on `.amalfa/multi-source-test.db`. Legacy Resonance utilities may still reference `public/resonance.db`.
20
-
21
- **Resolution**: This is **intentional** for the current testing setup:
22
- - `.amalfa/multi-source-test.db` is the active database used by MCP server
23
- - `public/resonance.db` is legacy and only used by deprecated Resonance pipelines
24
- - No conflict exists because only one database is actively written to
25
-
26
- **Action Required**: None for now. When migrating to v2.0, remove `polyvis.settings.json` entirely.
27
-
28
- ---
29
-
30
- ### ℹ️ Info: Embedding Configuration
31
-
32
- **Finding**: Embedding model controlled by `amalfa.config.json`:
33
- - Model: `BAAI/bge-small-en-v1.5`
34
- - Dimensions: 384
35
-
36
- **Note**: `polyvis.settings.json` does not define embeddings - this is correct. All embedding configuration lives in `amalfa.config.json`.
37
-
38
- ---
39
-
40
- ### ℹ️ Info: Source Directory Configuration
41
-
42
- **Finding**: Different source directories defined:
43
- - `amalfa.config.json` (MCP ingestion): `../polyvis/docs`, `../polyvis/playbooks`
44
- - `polyvis.settings.json` (legacy): `debriefs`, `playbooks`, `briefs`, `docs`
45
-
46
- **Explanation**: These serve different purposes:
47
- - **MCP server** ingests from external Polyvis project docs (cross-project knowledge sharing)
48
- - **Legacy Resonance** processes local project artifacts
49
-
50
- **Status**: Working as intended - no conflict.
51
-
52
- ---
53
-
54
- ## Validation Script Details
55
-
56
- **Location**: `scripts/validate-config.ts`
57
-
58
- **Checks**:
59
- 1. ✅ Database path conflicts
60
- 2. ✅ Multiple database file detection
61
- 3. ✅ Embedding model consistency
62
- 4. ✅ Source directory existence
63
- 5. ✅ Deprecation notice presence
64
-
65
- **Usage**:
66
- ```bash
67
- bun run validate-config
68
- ```
69
-
70
- **Exit Codes**:
71
- - `0`: No errors (warnings/info allowed)
72
- - `1`: Critical errors detected
73
-
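-
- The exact body of `scripts/validate-config.ts` is not reproduced here; the following is a minimal sketch of the kind of path-conflict check it performs. The `db_path` field assumed for `polyvis.settings.json` is a hypothetical placeholder, not the real schema.
-
- ```typescript
- import { readFileSync } from "node:fs";
-
- interface Finding {
-   level: "info" | "warning" | "error";
-   message: string;
- }
-
- // Compare the database paths declared by the unified and legacy configs.
- function checkDatabasePaths(): Finding[] {
-   const amalfa = JSON.parse(readFileSync("amalfa.config.json", "utf8"));
-   const legacy = JSON.parse(readFileSync("polyvis.settings.json", "utf8"));
-
-   const findings: Finding[] = [];
-   if (legacy.db_path && legacy.db_path !== amalfa.database) {
-     findings.push({
-       level: "warning",
-       message: `Database path mismatch: ${amalfa.database} vs ${legacy.db_path}`,
-     });
-   }
-   return findings;
- }
-
- const findings = checkDatabasePaths();
- for (const f of findings) console.error(`[${f.level}] ${f.message}`);
- // Exit 1 only on critical errors; warnings and info items are allowed.
- process.exit(findings.some((f) => f.level === "error") ? 1 : 0);
- ```
-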
74
- ---
75
-
76
- ## Configuration Ownership
77
-
78
- | Setting | Config File | Status |
79
- |---------|------------|--------|
80
- | Database path | `amalfa.config.json` | ✅ Active |
81
- | Embedding model | `amalfa.config.json` | ✅ Active |
82
- | MCP source dirs | `amalfa.config.json` | ✅ Active |
83
- | Watch settings | `amalfa.config.json` | ✅ Active |
84
- | Legacy Resonance paths | `polyvis.settings.json` | ⚠️ Deprecated |
85
- | Graph tuning params | `polyvis.settings.json` | ⚠️ Needs migration |
86
-
87
- ---
88
-
89
- ## Next Steps
90
-
91
- 1. ✅ **Completed**: Validation script created
92
- 2. ✅ **Completed**: Documentation of findings
93
- 3. 🔄 **Pending**: Add deprecation warning when `polyvis.settings.json` is loaded
94
- 4. 🔄 **Pending**: Migrate graph tuning parameters to `amalfa.config.json`
95
- 5. 🔄 **Future (v2.0)**: Remove `polyvis.settings.json` entirely
96
-
97
- ---
98
-
99
- ## Related Documentation
100
-
101
- - [Configuration Unification Strategy](./CONFIG_UNIFICATION.md)
102
- - [Current Config Status](./docs/_current-config-status.md)
103
- - [MCP Setup Guide](./MCP_SETUP.md)
@@ -1,36 +0,0 @@
1
- # Developer Onboarding: PolyVis "Resonance"
2
-
3
- **Welcome to the PolyVis Codebase.**
4
- This project is an **Automated Investigating Engine**. It treats code as a rigorous discipline aimed at extracting truth from chaos.
5
-
6
- ## 1. The Core Philosophy
7
- We do not write "features"; we write **Protocols**.
8
- * **Zero Magic:** If you can't explain how it works (e.g., using a regex instead of an LLM), use the simple version.
9
- * **Harden and Flense:** Standard Output (`stdout`) is for data. Standard Error (`stderr`) is for logs. If your code pollutes `stdout`, it breaks the machine (see the sketch after this list).
10
- * *Read this first:* [Harden and Flense Protocol](../playbooks/harden-and-flense-protocol.md)
11
-
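-
- To make the "Harden and Flense" rule concrete, here is a minimal sketch (not project code) of the split between the two streams:
-
- ```typescript
- // Data goes to stdout so downstream tools can parse it.
- console.log(JSON.stringify({ nodes: 42, edges: 117 }));
-
- // Logs, progress, and warnings go to stderr so they never pollute the data stream.
- console.error("[ingest] processed 42 documents");
- ```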
12
- ## 2. The Architecture (The "Resonance" Engine)
13
- The system is a clear pipeline:
14
- 1. **Ingestion:** Raw Markdown -> `src/resonance/ingest.ts` -> SQLite (`resonance.db`).
15
- 2. **Intelligence:** The `VectorEngine` adds semantic meaning (embeddings) to the Graph.
16
- 3. **Interface:**
17
- - **Web:** A Sigma.js Graph explorer.
18
- - **MCP:** An agentic API that lets AI "talk" to the database.
19
-
20
- ## 3. The Toolchain (Bun)
21
- We use `bun` for everything.
22
- * `bun run dev`: Starts the Web UI (Port 3000).
23
- * `bun run mcp`: Starts the Agent Server (JSON-RPC over Stdio).
24
- * `bun run build:data`: Rebuilds the database from source files.
25
-
26
- ## 4. Coding Standards (How to not break things)
27
- * **Database Access:** NEVER connect to the DB manually. Use `DatabaseFactory.connectToResonance()` or `ResonanceDB.init()`. This prevents "Database Locked" errors (see the sketch after this list).
28
- * **Scripts:** All scripts live in `scripts/`. They must use the standard `ServiceLifecycle` pattern if they run as daemons.
29
- * *Reference:* [Scripts Playbook](../playbooks/scripts-playbook.md)
30
-
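-
- A rough illustration of the database rule (the exact factory API lives in `src/resonance/DatabaseFactory.ts`; the import path and return shape here are assumptions):
-
- ```typescript
- import { DatabaseFactory } from "./resonance/DatabaseFactory";
-
- // Good: one shared, correctly configured connection managed by the factory.
- const db = DatabaseFactory.connectToResonance();
-
- // Bad: an ad-hoc second handle invites "Database Locked" errors.
- // import { Database } from "bun:sqlite";
- // const rogue = new Database("public/resonance.db");
- ```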
31
- ## 5. First Task
32
- 1. Run `bun run build:data` to seed your local database.
33
- 2. Run `bun run verify` to test your vector search.
34
- 3. Start the UI with `bun run dev` and explore the graph.
35
-
36
- *Welcome to the team.*
@@ -1,214 +0,0 @@
1
- # **The Unified Semantic Layer: Advanced Methodologies for Knowledge Graph and Vector Database Integration**
2
-
3
- ## **I. Executive Summary: The Convergence of Structure and Semantics**
4
-
5
- The architecture of enterprise Artificial Intelligence (AI) systems is undergoing a significant transformation, driven by the necessity for verifiable, explainable, and contextually rich generative outcomes. The initial reliance on basic Retrieval Augmented Generation (RAG) pipelines, which primarily utilize vector-only retrieval, has demonstrated critical shortcomings in factual grounding, fidelity, and multi-hop reasoning capabilities. These systems often struggle with poor data quality resulting from fragmented ingestion and susceptibility to LLM hallucination.
6
- The current trajectory is toward hybrid systems that synthesize the rapid retrieval capabilities of vector databases with the structured, relational integrity of knowledge graphs (KGs). This synthesis establishes what is best described as a **Unified Semantic Layer**. This paradigm shift is driven by the realization that AI agents, operating in complex enterprise workflows, require grounded knowledge that ensures "truth, transparency, and trust" in automated decisions.
7
- This report details four highly specialized, novel methodologies that form the technical pillars of this next-generation architecture. These methodologies address critical quality challenges across the entire data lifecycle:
8
-
9
- 1. **Ingestion Precision:** Achieved through the **Bento Box Protocol** for structure-aware parsing using Abstract Syntax Trees (AST).
10
- 2. **Graph Integrity:** Enforced by the **Louvain-Constrained Graph Quality** framework, which uses modularity metrics as prescriptive structural constraints.
11
- 3. **Relational Breadth:** Automated via **Vector-Driven Semantic Edge Generation**, which uses geometric embedding analysis for multi-class link prediction.
12
- 4. **Output Reliability:** Guaranteed through **Agent-Based Validation**, where autonomous agents use the KG as the ground truth to verify generated facts.
13
-
14
- The analysis concludes that while the underlying algorithms are established, their novel integration into an end-to-end quality control playbook constitutes a unique and necessary standardization for high-fidelity generative systems.
15
-
16
- ## **II. The Current Landscape of Semantic Data Stores (Graph and Vector)**
17
-
18
- ### **A. Graph Database Market Dynamics: Distributed Architectures and RAG Integration**
19
-
20
- The graph database market is characterized by mature solutions focusing on specialized architectural trade-offs determined by production workload requirements. The foundational necessity of knowledge graphs stems from their ability to model complex relationships across diverse data sources, ensuring adaptability to changing requirements or regulations without requiring full redesigns.
21
-
22
- #### **Architectural Diversity**
23
-
24
- Key players offer distinct scaling philosophies tailored for different applications. **Neo4j**, for instance, utilizes index-free adjacency and the Cypher ecosystem, optimizing it for transactional queries and localized graph traversals. This architecture is typically preferred for entity-centric workflows and medium-sized knowledge graphs where localized transaction integrity is paramount.
25
- In contrast, platforms like **TigerGraph** employ massively parallel processing (MPP) architectures with GSQL, designed specifically for analytical workloads and deep-link queries across web-scale data. TigerGraph distributes computation and data across clusters, enabling the processing of deeper analytical tasks that exceed the capacity of systems optimized for localized hops. The competitive landscape also includes multi-model databases and graph overlays (like PostgreSQL extensions), offering graph analytical capabilities directly on existing infrastructure, potentially eliminating the need for separate graph database deployments.
26
-
27
- #### **GraphRAG and Agentic Systems Nexus**
28
-
29
- The integration of graph databases with Retrieval Augmented Generation (RAG), known as GraphRAG, has become paramount. GraphRAG moves beyond retrieving simple, similar text chunks by surfacing relevant entities, facts, and relationships, enabling multi-hop reasoning that is unavailable in traditional RAG. The graph is increasingly leveraged to store and retrieve data, enabling AI to generate context-rich responses.
30
- This structured knowledge is essential for the reliability and debuggability of the next generation of enterprise AI applications: **Agentic AI**. Agents, which are designed to reason, orchestrate, and act autonomously, require the graph to serve as their critical memory and context layer. For example, systems can use an architecture graph (mapping application interactions) or multiple business graphs (sales, finance) to ground a multi-component-driven LLM, allowing complex, natural-language questions to yield graph-grounded, actionable answers.
31
-
32
- ### **B. Vector Database Landscape: Performance, Indexing Techniques, and Trade-offs**
33
-
34
- Vector databases specialize in storing and querying high-dimensional vector embeddings, which capture the semantic meaning of text, images, or other content. These systems utilize advanced indexing methods, most notably Hierarchical Navigable Small World (HNSW), which creates a graph-like structure of nodes connected by edges between the most similar vectors. This structure enables rapid navigation through the vector space to identify nearest neighbors quickly.
35
-
36
- #### **Purpose-Built vs. Extension Debate**
37
-
38
- The market offers a broad selection, including dedicated, purpose-built databases such as Pinecone, Milvus, Qdrant, and Weaviate, which are optimized solely for high-dimensional vector search. Concurrently, there is significant development in powerful database extensions, such as pgvector (for PostgreSQL) and k-NN capabilities within search engines like Elasticsearch/OpenSearch.
39
- The decision framework for enterprises operating at moderate scale (under 100 million vectors) is fundamentally shifting. While specialized systems often claim superior performance, recent benchmarks demonstrate that well-engineered extensions are closing the gap significantly. For instance, pgvectorscale has shown capabilities achieving 471 Queries Per Second (QPS) at 99% recall on 50 million vectors, substantially outperforming some purpose-built systems at the same metric. This performance parity at moderate scale suggests that the overriding factor for many enterprise architects is moving away from pure throughput metrics toward **operational consolidation** and the simultaneous need for vectors alongside traditional relational or NoSQL data. If an organization already uses PostgreSQL, Elasticsearch, or MongoDB, leveraging an extension avoids the high operational complexity and integration cost of introducing a new, separate vector service for non-primary vector workloads.
40
-
41
- ### **C. Hybrid System Architecture: Defining the Synergy**
42
-
43
- The necessity for high-fidelity RAG systems mandates a tightly integrated hybrid architecture. This architecture leverages the comparative advantages of both data stores.
44
- The operational workflow involves a dual approach: First, the vector database quickly executes approximate nearest neighbor search (ANN) by embedding the user query and retrieving semantically relevant data candidates, effectively "casting the net." Second, the knowledge graph receives these candidates, links them to established entities and relationships, applies structural constraints, and performs multi-hop pathfinding to assemble facts from disparate sources—the "traverse and assemble" stage. This combination yields a synergy of speed (from vectors) and grounded reasoning (from graphs).
45
- A crucial architectural requirement for this integration is robust data consistency. The system must employ continuous, transactionally-safe synchronization mechanisms between the knowledge graph and the vector index. This ensures that any new or updated data in the knowledge graph is immediately reflected and available for retrieval by the LLM, maintaining real-time knowledge alignment.
46
- Comparative Analysis of Graph Database Architectures (RAG Context)
47
-
48
- | Platform | Primary Architecture | Query Model Focus | Scalability Paradigm | Typical RAG Use Case |
49
- | :---- | :---- | :---- | :---- | :---- |
50
- | Neo4j | Index-Free Adjacency (Pointer-based) | Localized Traversals (Transactional) | Scale-up (Optimized for deep, localized hops) | Contextual Augmentation, Small/Medium KG |
51
- | TigerGraph | Massively Parallel Processing (MPP) | Deep-Link Analytics (Analytical) | Scale-out (Distributed computation) | Web-Scale KG, GNN Training, Large-scale Provenance Tracking |
52
- | Multi-Model/Extension | Relational/NoSQL Base | Hybrid (Graph Overlay) | Dependent on base system | Coexistence with existing data, Data Lineage |
53
-
54
- ## **III. Data Ingestion Precision: The 'Bento Box Protocol' for Structured Chunking**
55
-
56
- ### **A. Limitations of Text-Based Chunking**
57
-
58
- The performance of any RAG system is critically dependent on the quality of its retrieval component; this adheres strictly to the "garbage in, garbage out" principle. Traditional chunking strategies, such as fixed-size token splitting or heuristic character breaks, are fundamentally structure-agnostic. These methods frequently compromise RAG effectiveness by fragmenting crucial semantic units or logical context. For instance, a function definition, a procedure, or a major section of a document may be split arbitrarily across multiple chunks, breaking important conceptual relationships and leading to retrieval failures and suboptimal results.
59
-
60
- ### **B. Structure-Aware Segmentation via AST Parsing**
61
-
62
- To overcome the limitations of naive text segmentation, advanced methodologies leverage Abstract Syntax Tree (AST) parsing. This technique, traditionally reserved for code analysis, treats structured documents—including source code and technical documentation in formats like Markdown—not as plain text, but as hierarchical, semantically rich data structures. Libraries such as Tree-Sitter enable the parsing of source material into an AST, where logical constructs (e.g., Markdown headers, lists, code blocks, or software classes) are encoded as distinct nodes.
63
- The approach known as Chunking via Abstract Syntax Trees (cAST) recursively processes these structured representations. It uses a split-then-merge algorithm, traversing the tree top-down to fit large, syntactically defined nodes into a single chunk whenever possible. If a node must be split due to size constraints, a subsequent greedy merging step combines adjacent small sibling nodes. This methodology ensures that each resulting chunk is a self-contained, semantically coherent unit, dramatically improving the contextual integrity compared to line-based heuristics.
64
-
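-
- A minimal sketch of the split-then-merge idea, using a generic node type rather than a real Tree-Sitter AST (sizes and the character budget are illustrative):
-
- ```typescript
- interface AstNode {
-   text: string;        // source text covered by this node
-   children: AstNode[]; // syntactic children (sections, code blocks, list items, ...)
- }
-
- // Split phase: keep a node whole if it fits the budget, otherwise recurse into its children.
- function split(node: AstNode, maxChars: number): string[] {
-   if (node.text.length <= maxChars) return [node.text];
-   if (node.children.length === 0) {
-     // Oversized leaf: fall back to a hard split.
-     const parts: string[] = [];
-     for (let i = 0; i < node.text.length; i += maxChars) {
-       parts.push(node.text.slice(i, i + maxChars));
-     }
-     return parts;
-   }
-   return node.children.flatMap((child) => split(child, maxChars));
- }
-
- // Merge phase: greedily combine adjacent small siblings so chunks stay close to the budget.
- function merge(pieces: string[], maxChars: number): string[] {
-   const chunks: string[] = [];
-   let current = "";
-   for (const piece of pieces) {
-     if (current && current.length + piece.length > maxChars) {
-       chunks.push(current);
-       current = piece;
-     } else {
-       current = current ? `${current}\n${piece}` : piece;
-     }
-   }
-   if (current) chunks.push(current);
-   return chunks;
- }
-
- const chunkDocument = (root: AstNode, maxChars = 2000) => merge(split(root, maxChars), maxChars);
- ```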
65
- ### **C. The 'Bento Box Protocol' using ast-grep**
66
-
67
- The 'Bento Box Protocol' formalizes the application of AST-based structural search and parsing for documentation segmentation, specifically utilizing tools like ast-grep.
68
-
69
- #### **Protocol Definition and Function**
70
-
71
- The protocol defines each chunk—the "bento box"—as a syntactically sound, complete segment, such as an entire section demarcated by a Markdown header structure. This structural precision is enabled by ast-grep, a tool that acts as a syntax-aware grep/sed utility, matching Abstract Syntax Tree nodes using code patterns rather than simple text. ast-grep allows for the definition of patterns (e.g., matching a complete function or a specific Markdown section) to locate and extract these meaningful structural units.
72
- This rigorous structural definition maximizes contextual information density. Critically, this strategy transforms data ingestion from a simple utilitarian step into a powerful **pre-processing step for semantic transformation**. Guaranteeing the retrieval of self-contained nodes drastically simplifies the subsequent tasks of Named Entity Recognition (NER) and Relationship Extraction required for graph construction. If an input chunk is guaranteed to contain a coherent logical unit, the Large Language Model (LLM) or extraction pipeline performing the graph transformation can extract facts deterministically, improving the fidelity of the resulting knowledge graph and reducing reliance on imprecise LLM inference during graph creation.
73
- The structural search capabilities provided by ast-grep are being standardized within contexts like the Model Context Protocol (MCP), enabling AI assistants to perform precise structural analysis of their source material, moving beyond text-based matching.
74
- Structured Ingestion: AST Chunking vs. Traditional Methods
75
-
76
- | Chunking Strategy | Mechanism | Boundary Determination | RAG Retrieval Quality | Suitability for Graph Node Creation |
77
- | :---- | :---- | :---- | :---- | :---- |
78
- | Character/Token Split | Heuristic (Fixed size/Overlap) | Arbitrary text breaks | Low Contextual Recall (Fragmented facts) | Poor (Breaks semantic triples) |
79
- | Markdown Header Split | Rule-based (Pattern matching) | Predefined header levels | Moderate (Contextual but lacks structural detail) | Fair (Requires heavy post-processing) |
80
- | AST-Based (Bento Box Protocol) | Structural Parsing (Tree-Sitter/ast-grep) | Syntax nodes (functions, classes, logic blocks) | High Contextual Coherence (Semantic units) | Excellent (Self-contained, verifiable nodes) |
81
-
82
- ## **IV. Graph Construction Integrity: Louvain-Constrained Graph Quality**
83
-
84
- The structural integrity of a knowledge graph determines its usefulness for multi-hop reasoning and explainable AI. The presence of overly dense, chaotic, or semantically misaligned clusters can degrade performance and traversal efficiency. The **Louvain-Constrained Graph Quality** methodology leverages established community detection algorithms not merely for analysis, but as a proactive, prescriptive quality control mechanism.
85
-
86
- ### **A. The Role of Modularity in Network Structure**
87
-
88
- The Louvain method is a prominent, hierarchical, and greedy optimization algorithm designed to identify non-overlapping communities in large networks. Its objective is to maximize the modularity score (Q), a quantitative metric that assesses the quality of a network partition. Modularity quantifies how much more densely connected the nodes within a community are compared to the expected connections in a randomized network of the same size and degree distribution. Networks with high modularity are structurally well-formed, possessing groups with a high density of internal connections and a low density of links between groups.
89
-
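-
- For reference, the modularity score that Louvain maximizes is:
-
- ```latex
- Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
- ```
-
- where A_ij is the edge weight between nodes i and j, k_i is the weighted degree of node i, m is the total edge weight, and the Kronecker delta is 1 only when i and j share a community.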
90
- ### **B. The Novelty: Louvain as a Prescriptive Constraint**
91
-
92
- Traditionally, community detection is a descriptive process, used after a graph has been built to understand its structure. The novel methodology applies Louvain as a **prescriptive constraint** throughout the ingestion and edge-creation phases.
93
- In applications such as knowledge graph partitioning (e.g., LouvainSplit), the algorithm's capability to enforce optimal partitioning is leveraged to ensure graph data is efficiently distributed across clusters, which is vital for the parallel training of Knowledge Graph Embedding (KGE) models.
94
-
95
- #### **Structural Integrity Check (ΔQ)**
96
-
97
- When a new entity node or an inferred edge (e.g., from the vector-driven process detailed in Section V) is proposed, the system evaluates the potential impact on the overall graph topology. The mechanism computes the change in modularity (ΔQ) caused by the addition of that specific element. If adding the edge or node significantly reduces the global modularity score, or disrupts the internal compactness of existing, high-Q communities, the element is flagged or rejected. This process ensures the graph structure remains robust, logically sound, and partitionable.
98
- This constrained approach addresses inherent limitations of modularity optimization, such as its tendency to favor large communities and potentially fail to resolve small, yet important, dense modules (the resolution limit). By using Louvain for iterative structural assurance, the system guarantees both internal compactness and inter-community differentiation within the network.
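-
- A minimal sketch of this gate, assuming a community assignment has already been produced by a Louvain pass (a production implementation would update Q incrementally rather than recomputing it):
-
- ```typescript
- type NodeId = string;
- interface Edge { source: NodeId; target: NodeId; weight: number }
-
- // Modularity over a weighted, undirected edge list and a node -> community map:
- // Q = sum over communities of [ sigma_in / 2m - (sigma_tot / 2m)^2 ].
- function modularity(edges: Edge[], community: Map<NodeId, number>): number {
-   let m = 0;
-   const degree = new Map<NodeId, number>();
-   const sigmaIn = new Map<number, number>(); // community -> 2 * internal edge weight
-   for (const e of edges) {
-     m += e.weight;
-     degree.set(e.source, (degree.get(e.source) ?? 0) + e.weight);
-     degree.set(e.target, (degree.get(e.target) ?? 0) + e.weight);
-     const c = community.get(e.source);
-     if (c !== undefined && c === community.get(e.target)) {
-       sigmaIn.set(c, (sigmaIn.get(c) ?? 0) + 2 * e.weight);
-     }
-   }
-   if (m === 0) return 0;
-   const sigmaTot = new Map<number, number>(); // community -> sum of member degrees
-   for (const [node, k] of degree) {
-     const c = community.get(node);
-     if (c !== undefined) sigmaTot.set(c, (sigmaTot.get(c) ?? 0) + k);
-   }
-   let q = 0;
-   for (const value of sigmaIn.values()) q += value / (2 * m);
-   for (const value of sigmaTot.values()) q -= (value / (2 * m)) ** 2;
-   return q;
- }
-
- // Prescriptive use: only materialize an inferred edge if it does not erode modularity too much.
- function acceptEdge(edges: Edge[], community: Map<NodeId, number>, candidate: Edge, maxDrop = 0.01): boolean {
-   const deltaQ = modularity([...edges, candidate], community) - modularity(edges, community);
-   return deltaQ >= -maxDrop;
- }
- ```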
99
- #### **Topological Defense Against Semantic Noise**
100
- The application of Louvain modularity as an architectural constraint provides a topological defense mechanism against the inevitable structural degradation caused by high-volume, automated semantic inference. Vector similarity processes (Section V) are excellent at detecting *relatedness* but often struggle to enforce *structural relevance*. If highly related chunks across disparate domains are linked based purely on semantic similarity, the graph can quickly become an overly dense structure—a "hairball"—that experiences performance bottlenecks due to quadratic complexity during link loading. The constraint prevents this structural entropy by enforcing rules that prioritize edges that maintain or enhance modularity, thereby ensuring efficient pathfinding and optimized traversals during retrieval.
101
- The Louvain Modularity Constraint in KG Engineering
102
-
103
- | KG Lifecycle Phase | Constraint Type | Louvain Objective | Quantitative Metric | Impact on System Quality |
104
- | :---- | :---- | :---- | :---- | :---- |
105
- | Ingestion/Partitioning | Structural Optimization | Maximize Q for cohesive partitions | Modularity Score (Q) | Enables efficient, parallel KGE training and distributed query routing |
106
- | Edge Generation | Edge Filtering/Penalization | Prevent large reductions in global Q | ΔQ (Change in Modularity) | Dampens semantic noise and prevents quadratic complexity caused by overly dense clusters |
107
- | Validation/Testing | Community Mapping | Confirm entity group membership | Community ID Assignment | Ensures factual provenance aligns with structural domains |
108
-
109
- ## **V. Relational Inference: Vector-Driven Semantic Edge Generation**
110
-
111
- The synergy between graph and vector databases is most powerfully expressed in the automated generation of new, typed relationships. This process, **Vector-Driven Semantic Edge Generation**, moves beyond simple similarity scoring to inferring the specific semantic nature of the connection between entities.
112
-
113
- ### **A. Foundation in Knowledge Graph Embeddings (KGE)**
114
-
115
- The process is grounded in Knowledge Graph Embedding (KGE) models. These models transform graph components—entities (nodes) and relationships (edges)—into a low-dimensional continuous vector space. KGE ensures that the resulting vectors preserve the various structural and semantic properties of the graph, allowing computational methods to measure plausibility. The proximity or geometric positioning of these embedding vectors enables tasks such as link prediction, where missing or inferred relationships can be identified.
116
-
117
- ### **B. The Challenge of Relation Typing**
118
-
119
- Standard vector databases excel at measuring semantic *similarity* (e.g., cosine distance) between documents or terms. However, a simple distance threshold can only confirm *relatedness*. A robust knowledge graph requires edges to convey specific meaning—semantic richness—distinguishing between relations such as *Is-A*, *Caused-By*, or *Used-To-Treat*. Similarity-based schema matching alone is insufficient because it typically identifies only equivalent relationships while ignoring vital taxonomic or causal relationships necessary for meaningful reasoning.
120
-
121
- ### **C. Algorithmic Framework for Multi-Class Edge Inference**
122
-
123
- The specialized methodology implemented here addresses the multi-class link prediction challenge by treating the relation type as a classification problem, leveraging the geometry of the embedding space.
124
-
125
- 1. **High-Fidelity Embedding Generation:** Entities and relationships are first processed to create robust embeddings, often utilizing large language models (such as RoBERTa) on their labels and descriptions. These vectors are typically normalized to unit vectors to enhance cosine similarity calculations. Textual content chunks (derived from the Bento Box Protocol) are also vectorized, often by calculating the centroid over all word embeddings within the document.
126
- 2. **Distance and Classification:** The vector distance (or scoring function) is computed between two entities to determine their relatedness. Instead of simply applying a threshold for existence, the relative positioning of the candidate entity pair is analyzed by a classification model. Supervised approaches, such as a 1D-Convolutional Neural Network (1D-CNN) or a Fully Connected Neural Network (FC-NN), are trained on known relation types to predict the specific class of relationship that exists between the two entities.
127
- 3. **Consistency in Embedding Strategy:** For the architecture to function optimally, it is critical that the underlying embedding strategy used for the vector database's fast retrieval layer is consistent with the methodology used for the graph system's semantic inference layer. This ensures that the semantically "close" chunks retrieved by the vector index are also optimally aligned within the KGE model's relational plane, resulting in successful and accurate relational classification.
128
-
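-
- The shape of this pipeline can be sketched as follows; relation names and thresholds are illustrative, and `classify` is a stand-in for the trained 1D-CNN or FC-NN described above:
-
- ```typescript
- type RelationType = "IS_A" | "CAUSED_BY" | "USED_TO_TREAT" | "RELATED_TO";
-
- interface Entity { id: string; embedding: number[] } // unit-normalized vectors
-
- function cosine(a: number[], b: number[]): number {
-   // For unit vectors the dot product already equals cosine similarity.
-   return a.reduce((sum, v, i) => sum + v * b[i], 0);
- }
-
- function proposeTypedEdges(
-   entities: Entity[],
-   classify: (a: Entity, b: Entity) => { type: RelationType; confidence: number },
-   minSimilarity = 0.75,
-   minConfidence = 0.9,
- ) {
-   const edges: { source: string; target: string; type: RelationType }[] = [];
-   for (let i = 0; i < entities.length; i++) {
-     for (let j = i + 1; j < entities.length; j++) {
-       // Step 1: cheap geometric shortlist by cosine similarity.
-       if (cosine(entities[i].embedding, entities[j].embedding) < minSimilarity) continue;
-       // Step 2: multi-class relation typing, materialized only at high confidence.
-       const { type, confidence } = classify(entities[i], entities[j]);
-       if (confidence >= minConfidence) {
-         edges.push({ source: entities[i].id, target: entities[j].id, type });
-       }
-     }
-   }
-   return edges; // candidates remain subject to the Louvain ΔQ gate from Section IV
- }
- ```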
129
- ### **D. Scaling and Density Management**
130
-
131
- High-volume semantic link inference inevitably creates a challenge regarding graph density. Uncontrolled linking, such as connecting all nodes with common keywords, results in quadratic growth in edge count, leading to processing complexity. The solution requires prioritizing the materialization of high-confidence, typed edges and, crucially, integrating the Louvain Constraint (Section IV) as a topological filter to manage density. This prevents the generation of structurally redundant or counterproductive links, thereby enabling faster traversals.
132
- The integrated application of advanced KGE link prediction techniques for **multi-class edge typing** on structurally coherent document segments (Bento Box chunks), constrained by modularity metrics, differentiates this approach significantly from simple vector similarity linking.
133
-
134
- ## **VI. Trust and Fidelity: Agent-Based Validation using Knowledge Graphs**
135
-
136
- As enterprise AI moves from descriptive responses to autonomous action, the reliability of the generated output—especially its factual accuracy—becomes a non-negotiable requirement. The final step in the quality-controlled pipeline is **Agent-Based Validation**, which formally establishes the knowledge graph as the authoritative ground truth.
137
-
138
- ### **A. The Agentic AI Shift**
139
-
140
- Agentic AI systems are designed for goal-orientation, autonomy, and context-awareness, capable of orchestrating complex workflows. For these systems to operate reliably, they must be grounded in facts. By combining the fluency of LLMs with the validated facts of knowledge graphs, agents can deliver both language intelligence and fact-grounded decisions, overcoming the inherent risks of LLM hallucination.
141
- ### **B. Establishing the Knowledge Graph as Ground Truth**
142
- The knowledge graph is utilized not just as a source for retrieval but as an active verification layer. Specialized validation frameworks, such as KGARevion or GraphEval, are employed as autonomous agents.
143
- The validation process is systematic and multi-step:
144
-
145
- 1. **Triplet Generation:** The agent first generates relevant facts (triplets) from the LLM’s final answer or intermediate reasoning steps.
146
- 2. **Verification Against KG:** It then systematically cross-references these generated triplets against the grounded, constrained knowledge graph.
147
- 3. **Fact Filtering:** Only accurate, contextually relevant information verified against the graph structure is retained for the final output, strengthening reasoning and eliminating errors.
148
-
149
- This verification provides granular insight into the LLM’s factual accuracy, addressing the deficiencies of prior methods that could not systematically check all facts or explain where errors occurred.
150
-
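-
- In code, the filtering step reduces to a membership check against the graph. A minimal sketch (the lookup is a stand-in for a real graph query):
-
- ```typescript
- interface Triplet { subject: string; predicate: string; object: string }
-
- function verifyAgainstKG(generated: Triplet[], kgContains: (t: Triplet) => boolean) {
-   const verified: Triplet[] = [];
-   const knowledgeGaps: Triplet[] = [];
-   for (const t of generated) {
-     (kgContains(t) ? verified : knowledgeGaps).push(t);
-   }
-   // Verified facts feed the final answer; unverified ones become enrichment signals (Section VI.D).
-   return { verified, knowledgeGaps };
- }
- ```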
151
- ### **C. Quantitative Metrics for Trust and Fidelity**
152
-
153
- The effectiveness of agent-based validation is measured using advanced quantitative metrics specific to agentic systems. These metrics include:
154
-
155
- * **Faithfulness:** A RAG metric that evaluates whether the LLM's generated output factually aligns *only* with the information provided in the retrieval context (i.e., the validated knowledge graph facts).
156
- * **Correctness and Hallucination:** These metrics determine if the output is factually correct based on the established ground truth of the KG. The structured nature of the KG allows for systemic checking of every piece of information.
157
- * **Tool Correctness and Argument Correctness:** Essential for autonomous agents, these metrics assess the agent's ability to correctly identify and use external tools, ensuring that the execution of tasks (e.g., calling an API) is accurately grounded in the facts retrieved and validated by the graph.
158
-
159
- ### **D. Continuous Learning Feedback Loop**
160
-
161
- A crucial consequence of this high-fidelity validation process is the creation of a structured feedback loop for continuous system improvement. When a validation agent flags a fact as unverified, it signifies a **knowledge gap** in the knowledge graph, despite the fact being fluently generated by the LLM.
162
- Because validation frameworks provide explainable decisions identifying the specific failing triplet, the system receives a structured deficiency signal far superior to a generalized hallucination score. This structured signal triggers automated processes for knowledge discovery, data sourcing, and real-time graph enrichment. This mechanism ensures that the AI system continuously learns and aligns with the most current, verified enterprise knowledge, accelerating the time-to-accuracy.
163
- The use of the knowledge graph as a **quantitative, closed-loop verification layer**—rather than merely a retrieval source—is highly unique and is necessary for deploying compliant and safe autonomous systems in high-stakes regulatory environments.
164
-
165
- ## **VII. Synthesis, Architectural Playbook, and Conclusion**
166
-
167
- ### **A. Interdependence and Synergy of the Four Methodologies**
168
-
169
- The four specialized methodologies analyzed—Bento Box Protocol, Louvain Constraint, Vector-Driven Semantic Edges, and Agent-Based Validation—are not isolated techniques but interconnected components of a single, structurally sound semantic pipeline. They function as a comprehensive system of checks and balances designed to enforce structural and factual quality end-to-end.
170
-
171
- * The **Bento Box Protocol** (structural ingestion) feeds semantically coherent nodes to the KG.
172
- * This structural integrity simplifies subsequent **Vector-Driven Semantic Edge Generation** (relational inference).
173
- * The inferred edges are then subjected to the **Louvain Constraint** (structural topology control), preventing the formation of chaotic, high-density clusters.
174
- * The resultant high-integrity graph forms the immutable **Ground Truth** used by the **Agent-Based Validation** (factual verification) layer.
175
-
176
- This architecture replaces unreliable, linear RAG pipelines with a resilient, layered system that minimizes data entropy at every stage.
177
-
178
- ### **B. Architectural Recommendations: A Unified Hybrid Data Playbook**
179
-
180
- For organizations seeking to move beyond heuristic RAG implementations toward engineered semantic AI, the following architectural playbook formalizes the critical stages necessary for high-fidelity GraphRAG deployment. The core requirement is the tight coupling of vector and graph engines, often through integrated cloud solutions or dedicated hybrid systems.
181
- Best Practices Playbook: Hybrid RAG Pipeline Stages
182
-
183
- | Stage | Methodology Adopted | Core Technology/Constraint | Optimization Target |
184
- | :---- | :---- | :---- | :---- |
185
- | 1. Structural Ingestion | Bento Box Protocol (AST Chunking) | ast-grep / Tree-Sitter based on syntax nodes | Maximizing contextual recall and semantic completeness of nodes |
186
- | 2. Graph Construction | Vector-Driven Semantic Edges | KGE Models + Classification/Link Prediction | Automating multi-class link prediction while managing graph density and complexity |
187
- | 3. Quality Assurance | Louvain Constraint Optimization | Modularity Score (Q) and Community Differentiation | Enforcing structural integrity and partition well-formedness for distributed processing |
188
- | 4. Retrieval & Validation | Agent-Based Verification | GraphEval / KGARevion (Knowledge Graph as Ground Truth) | Ensuring factual correctness and reducing LLM hallucination rate (Faithfulness) |
189
-
190
- ### **C. Final Assessment of Uniqueness and Standardization Potential**
191
-
192
- #### **Uniqueness Assessment**
193
-
194
- The methodologies discussed are deemed unique not because they utilize unknown algorithms, but due to their integrated and constrained application within the GraphRAG pipeline. AST parsing (Bento Box) is common in code analysis, and Louvain is a well-known community detection algorithm. However, the originality lies in two specific applications:
195
-
196
- 1. **The Prescriptive Use of Graph Metrics:** Utilizing the Louvain modularity score (Q) as an active, structural constraint during automated ingestion and edge creation is highly novel. It shifts the metric from a post-analysis indicator to a real-time, topological defense mechanism against data chaos.
197
- 2. **Structural Ingestion for KG Node Creation:** The deliberate, syntactically-aware transformation of documentation (Bento Box Protocol) specifically to create structurally sound, deterministic knowledge graph nodes eliminates the upstream quality issues that plague traditional text-based RAG.
198
-
199
- #### **Standardization Potential**
200
-
201
- These methodologies represent a necessary transition from empirical, heuristic RAG design to a highly engineered, principled architectural framework. By providing measurable constraints and structural integrity checks, they solve core enterprise AI challenges—hallucination, explainability, and ingestion bottlenecks.
202
- Given the increasing demand for high-fidelity, explainable AI, particularly in regulated industries (e.g., finance, biomedical Q&A), these integrated techniques are highly likely to become the **de facto architectural playbook** for advanced enterprise AI applications within the foreseeable future.
203
-
204
- ### **D. Recommendations for Future Development**
205
-
206
- Based on the required complexity and interdependence of these methods, future research and development should prioritize:
207
-
208
- 1. **Standardized Parsers:** Efforts should focus on creating standardized, open-source Markdown and structured document-to-AST parsing libraries optimized for knowledge graph extraction templates, ensuring cross-platform and language invariance.
209
- 2. **Dynamic Constraint Frameworks:** Developing frameworks that allow dynamic tuning of the Louvain constraint threshold (ΔQ) based on the semantic domain or regulatory requirement of the ingested data will enhance adaptability.
210
- 3. **High-Throughput Orchestration:** Investing in scalable, high-throughput agent orchestration layers is essential to minimize the latency introduced by the structured, multi-step validation and verification process. This will enable the integration of high-fidelity validation into real-time, production-critical workflows.
211
-
@@ -1,174 +0,0 @@
1
- # Legacy Configuration Deprecation Plan
2
-
3
- **Target**: Remove `polyvis.settings.json` and consolidate to `amalfa.config.json`
4
- **Timeline**: v1.0 (deprecation warnings) → v2.0 (removal)
5
- **Status**: 🔄 In Progress
6
-
7
- ## Current State
8
-
9
- ### Files Using `polyvis.settings.json`
10
-
11
- **Active (needs migration)**:
12
- 1. `src/resonance/config.ts` - ResonanceConfig loader (✅ deprecation warning added)
13
- 2. `src/resonance/db.ts` - ResonanceDB uses legacy paths
14
- 3. `src/resonance/DatabaseFactory.ts` - Factory pattern for DB instances
15
- 4. `src/core/VectorEngine.ts` - Vector search initialization
16
- 5. `src/utils/EnvironmentVerifier.ts` - Environment validation
17
- 6. `src/pipeline/HarvesterPipeline.ts` - Tag harvesting (soft dependency)
18
-
19
- **Legacy scripts** (low priority):
20
- - `scripts/verify/*` - Debug/diagnostic scripts (15+ files)
21
- - `scripts/cli/harvest.ts` - Concept harvesting CLI
22
- - Various test/lab utilities
23
-
24
- ### Configuration Comparison
25
-
26
- | Feature | polyvis.settings.json | amalfa.config.json |
27
- |---------|----------------------|-------------------|
28
- | Database path | `public/resonance.db` | `.amalfa/multi-source-test.db` |
29
- | Source directories | `debriefs`, `playbooks`, `briefs`, `docs` | `../polyvis/docs`, `../polyvis/playbooks` |
30
- | Embeddings | ❌ Not defined | ✅ BAAI/bge-small-en-v1.5 (384d) |
31
- | Watch config | ❌ Not defined | ✅ Enabled, 1000ms debounce |
32
- | Graph tuning | ✅ Louvain params | ❌ Needs migration |
33
- | Persona fixtures | ✅ Lexicon, CDA | ❌ Needs migration |
34
-
35
- ---
36
-
37
- ## Migration Strategy
38
-
39
- ### Phase 1: Expand amalfa.config.json (Current)
40
-
41
- Add missing features from `polyvis.settings.json` to `amalfa.config.json`:
42
-
43
- ```json
44
- {
45
- "sources": ["../polyvis/docs", "../polyvis/playbooks"],
46
- "database": ".amalfa/multi-source-test.db",
47
- "embeddings": {
48
- "model": "BAAI/bge-small-en-v1.5",
49
- "dimensions": 384
50
- },
51
- "watch": {
52
- "enabled": true,
53
- "debounce": 1000
54
- },
55
- "excludePatterns": ["node_modules", ".git", ".amalfa"],
56
-
57
- // NEW: Migrate from polyvis.settings.json
58
- "graph": {
59
- "tuning": {
60
- "louvain": {
61
- "persona": 0.3,
62
- "experience": 0.25
63
- }
64
- }
65
- },
66
- "fixtures": {
67
- "lexicon": "scripts/fixtures/conceptual-lexicon-ref-v1.79.json",
68
- "cda": "scripts/fixtures/cda-ref-v63.json"
69
- }
70
- }
71
- ```
72
-
73
- ### Phase 2: Update Code to Use amalfa.config.json
74
-
75
- 1. **Modify `src/resonance/config.ts`** (see the sketch after this list):
76
- - Try loading `amalfa.config.json` first
77
- - Fall back to `polyvis.settings.json` with deprecation warning
78
- - Map unified config structure to ResonanceConfig
79
-
80
- 2. **Update ResonanceDB initialization**:
81
- - Accept config object in constructor
82
- - Use `amalfa.config.json` paths
83
-
84
- 3. **Migrate graph tuning parameters**:
85
- - Add `graph.tuning` to AmalfaConfig interface
86
- - Use in Louvain clustering algorithms
87
-
88
- 4. **Handle persona fixtures**:
89
- - Add `fixtures` section to AmalfaConfig
90
- - Update HarvesterPipeline to read from new config
91
-
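-
- A minimal sketch of the fallback behavior described in step 1 (the mapping into ResonanceConfig is simplified, and the legacy field names `db_path` / `source_dirs` are placeholders for whatever `polyvis.settings.json` actually uses):
-
- ```typescript
- import { existsSync, readFileSync } from "node:fs";
-
- export function loadUnifiedConfig() {
-   // Prefer the unified config.
-   if (existsSync("amalfa.config.json")) {
-     const unified = JSON.parse(readFileSync("amalfa.config.json", "utf8"));
-     return { dbPath: unified.database, sources: unified.sources };
-   }
-   // Fall back to the legacy file, but warn loudly on stderr.
-   console.error(
-     "[deprecation] polyvis.settings.json is deprecated; migrate to amalfa.config.json",
-   );
-   const legacy = JSON.parse(readFileSync("polyvis.settings.json", "utf8"));
-   return { dbPath: legacy.db_path, sources: legacy.source_dirs };
- }
- ```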
92
- ### Phase 3: Update Scripts
93
-
94
- - Mark all `scripts/verify/*` as legacy (add README note)
95
- - Update `scripts/cli/harvest.ts` to use unified config
96
- - Create `scripts/migrate-config.ts` tool to auto-convert
97
-
98
- ### Phase 4: Remove Legacy (v2.0)
99
-
100
- - Delete `polyvis.settings.json`
101
- - Remove `src/resonance/config.ts` fallback logic
102
- - Remove EnvironmentVerifier (replaced by modern config validation)
103
- - Archive legacy scripts to `scripts/legacy/`
104
-
105
- ---
106
-
107
- ## Migration Checklist
108
-
109
- ### v1.0 (Current Release)
110
- - [x] Add deprecation warning to `loadConfig()` (src/resonance/config.ts)
111
- - [x] Document current config conflicts (docs/CONFIG_VALIDATION.md)
112
- - [ ] Extend `amalfa.config.json` schema with graph tuning
113
- - [ ] Extend `amalfa.config.json` schema with fixtures
114
- - [ ] Update `src/config/defaults.ts` with new fields
115
- - [ ] Create config migration helper script
116
- - [ ] Test all MCP commands with unified config
117
-
118
- ### v1.1 (Transition Period)
119
- - [ ] Update ResonanceDB to accept unified config
120
- - [ ] Update VectorEngine to use unified config
121
- - [ ] Make polyvis.settings.json fully optional
122
- - [ ] Add config auto-migration on first run
123
- - [ ] Update all documentation to reference amalfa.config.json only
124
-
125
- ### v2.0 (Legacy Removal)
126
- - [ ] Remove polyvis.settings.json from repo
127
- - [ ] Remove ResonanceConfig loader fallback
128
- - [ ] Archive legacy verify scripts
129
- - [ ] Remove EnvironmentVerifier
130
- - [ ] Update CHANGELOG with breaking changes
131
-
132
- ---
133
-
134
- ## Testing Strategy
135
-
136
- 1. **Backward compatibility** (v1.0-1.1):
137
- - Both configs work simultaneously
138
- - Deprecation warnings guide users
139
- - No breaking changes
140
-
141
- 2. **Migration path validation**:
142
- - Test with only `amalfa.config.json`
143
- - Test with only `polyvis.settings.json` (warnings)
144
- - Test with both (unified config takes precedence)
145
-
146
- 3. **Removal readiness** (v2.0):
147
- - All tests pass without polyvis.settings.json
148
- - No imports of `src/resonance/config.ts` legacy loader
149
- - Documentation references only unified config
150
-
151
- ---
152
-
153
- ## Naming Review (Post-Migration)
154
-
155
- Once we have a single config file, review these names:
156
-
157
- **Consider renaming**:
158
- - `.amalfa/` → Could stay, or rename to `.amalfa-data/`?
159
- - `ResonanceDB` → `AmalfaDB`? (breaks API)
160
- - `sources` → `ingest` or `watch_dirs`?
161
- - `embeddings.model` → More descriptive field names?
162
-
163
- **Keep as-is**:
164
- - `database` - Clear
165
- - `watch` - Standard terminology
166
- - `excludePatterns` - Matches .gitignore convention
167
-
168
- ---
169
-
170
- ## Related Documentation
171
-
172
- - [Configuration Unification Strategy](./CONFIG_UNIFICATION.md)
173
- - [Configuration Validation](./CONFIG_VALIDATION.md)
174
- - [MCP Setup Guide](./MCP_SETUP.md)