amalfa 1.0.2 → 1.0.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (55)
  1. package/package.json +1 -1
  2. package/src/cli.ts +1 -1
  3. package/docs/AGENT-METADATA-PATTERNS.md +0 -1021
  4. package/docs/AGENT_PROTOCOLS.md +0 -28
  5. package/docs/ARCHITECTURAL_OVERVIEW.md +0 -123
  6. package/docs/BENTO_BOXING_DEPRECATION.md +0 -281
  7. package/docs/Bun-SQLite.html +0 -464
  8. package/docs/COMMIT_GUIDELINES.md +0 -367
  9. package/docs/CONFIG_E2E_VALIDATION.md +0 -147
  10. package/docs/CONFIG_UNIFICATION.md +0 -187
  11. package/docs/CONFIG_VALIDATION.md +0 -103
  12. package/docs/DEVELOPER_ONBOARDING.md +0 -36
  13. package/docs/Graph and Vector Database Best Practices.md +0 -214
  14. package/docs/LEGACY_DEPRECATION.md +0 -174
  15. package/docs/MCP_SETUP.md +0 -317
  16. package/docs/PERFORMANCE_BASELINES.md +0 -88
  17. package/docs/QUICK_START_MCP.md +0 -168
  18. package/docs/REPOSITORY_CLEANUP_SUMMARY.md +0 -261
  19. package/docs/SESSION-2026-01-06-METADATA-PATTERNS.md +0 -346
  20. package/docs/SETUP.md +0 -464
  21. package/docs/SETUP_COMPLETE.md +0 -464
  22. package/docs/VISION-AGENT-LEARNING.md +0 -1242
  23. package/docs/_current-config-status.md +0 -93
  24. package/docs/edge-generation-methods.md +0 -57
  25. package/docs/elevator-pitch.md +0 -118
  26. package/docs/graph-and-vector-database-playbook.html +0 -480
  27. package/docs/hardened-sqlite.md +0 -85
  28. package/docs/headless-knowledge-management.md +0 -79
  29. package/docs/john-kaye-flux-prompt.md +0 -46
  30. package/docs/keyboard-shortcuts.md +0 -80
  31. package/docs/opinion-proceed-pattern.md +0 -29
  32. package/docs/polyvis-nodes-edges-schema.md +0 -77
  33. package/docs/protocols/lab-protocol.md +0 -30
  34. package/docs/reaction-iquest-loop-coder.md +0 -46
  35. package/docs/services.md +0 -60
  36. package/docs/sqlite-wal-readonly-trap.md +0 -228
  37. package/docs/strategy/css-architecture.md +0 -40
  38. package/docs/test-document-cycle.md +0 -83
  39. package/docs/test_lifecycle_E2E.md +0 -4
  40. package/docs/the-bicameral-graph.md +0 -83
  41. package/docs/user-guide.md +0 -70
  42. package/docs/vision-helper.md +0 -53
  43. package/polyvis.settings.json.bak +0 -38
  44. package/src/EnlightenedTriad.ts +0 -146
  45. package/src/JIT_Triad.ts +0 -137
  46. package/src/data/experience/test_doc_1.md +0 -2
  47. package/src/data/experience/test_doc_2.md +0 -2
  48. package/src/demo-triad.ts +0 -45
  49. package/src/gardeners/BaseGardener.ts +0 -55
  50. package/src/llm/EnlightenedProvider.ts +0 -95
  51. package/src/services/README.md +0 -56
  52. package/src/services/llama.ts +0 -59
  53. package/src/services/llamauv.ts +0 -56
  54. package/src/services/olmo3.ts +0 -58
  55. package/src/services/phi.ts +0 -52
@@ -1,28 +0,0 @@
- # Agent Protocols: The "Graph First" Directive
-
- **Target Audience:** AI Agents (Coding Assistants, MCP Clients)
- **Status:** Active Protocol
-
- ## 1. The Prime Directive: Verify, Then Act
- You are operating within the **PolyVis Resonance System**, a knowledge graph that tracks not just code, but the *intent* and *history* behind it.
-
- **Rule:** Before you modify any complex system (Ingestion, Vector Engine, Zombie Defense), you must **Ground Yourself** in the existing context.
-
- ## 2. The Trigger: When to use the MCP Server?
- You possess a tool called `search_documents` (or similar). You must use it when:
- 1. **Ambiguity:** "I don't know why this code is here." -> Query: *"Why does ZombieDefense exist?"*
- 2. **Architecture:** "Where does the data flow?" -> Query: *"Explain the ingestion pipeline data flow."*
- 3. **Standards:** "How should I log errors?" -> Query: *"What is the Harden and Flense protocol?"*
-
- ## 3. The "Graph First" Workflow
- Do not rely on your training data or random file searches. The codebase evolves faster than your context window.
- 1. **Ask the Graph:** `mcp.search_documents("concept")`
- 2. **Read the Referenced Docs:** (The search results will give you file paths.)
- 3. **Implement:** Only then do you write code.
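The three-step workflow above can be sketched as a small helper. This is an illustrative sketch only: the `McpClient` shape, the `score` field, and the 0.5 cutoff are assumptions, not the real MCP SDK surface or the project's actual types.

```typescript
// Hypothetical sketch of the "Graph First" workflow.
// McpClient and SearchHit are invented shapes for illustration.
type SearchHit = { path: string; score: number };

interface McpClient {
  search_documents(query: string): SearchHit[];
}

// Step 1: ask the graph. Step 2: collect the referenced doc paths to read.
// Step 3 (writing code) happens only after those docs have been read.
function groundYourself(mcp: McpClient, concept: string): string[] {
  return mcp
    .search_documents(concept)
    .filter((hit) => hit.score > 0.5) // keep only confident matches (assumed threshold)
    .map((hit) => hit.path);
}

// Stub client standing in for a real MCP connection.
const stub: McpClient = {
  search_documents: () => [
    { path: "docs/zombie-defense.md", score: 0.91 },
    { path: "docs/unrelated.md", score: 0.12 },
  ],
};

const paths = groundYourself(stub, "Why does ZombieDefense exist?");
console.log(paths); // ["docs/zombie-defense.md"]
```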
-
- ## 4. Known Pitfalls (Do Not Do This)
- * **Do not hallucinate file paths.** Use `list_directory_structure` or `search_documents` to find where things live.
- * **Do not ignore `stderr`.** If a tool fails, read the error.
- * **Do not create "Zombie" processes.** Respect the `ServiceLifecycle`.
-
- *Trust the Graph. It remembers what you forget.*
@@ -1,123 +0,0 @@
- # PolyVis: Architectural Overview & Executive Summary
-
- **Version:** 1.0 (Hollow Node / Logging Verified)
- **Date:** 2025-12-29
-
- ## 1. Executive Summary
-
- PolyVis is a high-performance, local-first **Knowledge Graph & Agentic Substrate**. Unlike traditional web applications that prioritize cosmetic UI rendering, PolyVis prioritizes **data sovereignty, raw speed, and machine interpretability**.
-
- It replaces the "glue code" of modern stacks (React, Redux, REST) with a direct-to-metal approach using **Bun**, **SQLite**, and **Canvas-based visualization**. This architecture, dubbed **"Hollow Node"**, allows it to visualize and reason about knowledge graphs orders of magnitude larger than typical DOM-based tools, while interacting seamlessly with AI Agents via the **Model Context Protocol (MCP)**.
-
- ---
-
- ## 2. Technology Stack
-
- PolyVis uses a bi-modal stack designed for zero-latency interaction.
-
- ### Client (The Visor)
- - **Runtime:** Vanilla JavaScript (ES Modules).
- - **Rendering:** `sigma.js` (Canvas/WebGL) for graph rendering; direct DOM for simple UI panels.
- - **Interactivity:** `alpine.js` for lightweight reactivity (no Virtual DOM overhead).
- - **Styling:** `basecoat-css` + Vanilla CSS Layers; **no Tailwind** (except for utility generation), preserving pure CSS maintainability.
- - **Transport:** Standard `fetch` / `WebSocket` (if needed); no complex client-side routers.
-
- ### Server (The Substrate)
- - **Runtime:** **Bun** (Zig-based JS runtime), providing 3x-10x faster startup than Node.js.
- - **Database:** `bun:sqlite` (FFI) – direct in-process database access.
- - **Performance:** Reads at C-speed, bypassing network serialization.
- - **ORM:** Drizzle ORM (for schema definitions and migrations only; raw SQL is used for hot loops).
- - **Vector Engine:** `fastembed` (running locally via ONNX) for semantic search.
- - **Agent Interface:** `@modelcontextprotocol/sdk` (MCP) – exposes the graph as tools (`search_documents`, `read_node`) to LLMs.
-
- ---
-
- ## 3. Products & Services
-
- PolyVis is composed of four distinct, interoperable subsystems:
-
- ### A. Resonance (The Backend)
- The beating heart of the system.
- - **ResonanceDB:** A wrapper around SQLite that handles graph topology (Nodes/Edges) and Vector embeddings in a single file (`resonance.db`).
- - **Hollow Node Pattern:** Nodes store light metadata; heavy content is read JIT from the filesystem (`read_node_content`). Result: ~60% DB size reduction.
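As a rough illustration of the Hollow Node idea (the `HollowNode` type, file names, and `readNodeContent` helper here are invented for the sketch; the real schema lives in ResonanceDB): the row keeps only light metadata plus a path, and the heavy content is read just-in-time from disk.

```typescript
import { mkdtempSync, writeFileSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical hollow node: light metadata only, no content blob in the DB row.
type HollowNode = { id: string; title: string; path: string };

// JIT content read, analogous to what a read_node_content tool would do.
function readNodeContent(node: HollowNode): string {
  return readFileSync(node.path, "utf8");
}

// Demo: the "heavy" markdown lives on disk, not in the database.
const dir = mkdtempSync(join(tmpdir(), "hollow-"));
const notePath = join(dir, "note.md");
writeFileSync(notePath, "# Heavy content stays on disk\n");

const node: HollowNode = { id: "n1", title: "note", path: notePath };
const content = readNodeContent(node);
console.log(content.startsWith("# Heavy content")); // true
```

The size win follows directly: the row shrinks to a few short strings, and the filesystem (already storing the source files) is not duplicated inside the database.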
-
- ### B. The Pipeline (Ingestion)
- Automatically converts raw files into the Knowledge Graph.
- - **Ingestor:** Watches the filesystem (`watch`), parses Markdown/frontmatter, embeds chunks, and upserts to the DB.
- - **Semantic Harvester:** A Python bridge (`src/pipeline/SemanticHarvester.ts`) that runs complex NLP (Sieve+Net) to extract semantic triples (`Entity -> Relation -> Entity`).
- - **Gardeners:** Autonomous background agents (e.g., `AutoTagger`) that refine the graph (maintenance).
-
- ### C. The Visor (The UI)
- A minimal, heavy-duty visualization tool.
- - **Sigma Explorer:** Interactive graph exploration.
- - **Quick Look:** Markdown rendering of selected nodes.
-
- ### D. MCP Server (The API)
- The interface for AI Agents (Cursor, Claude, etc.).
- - **Tools:** `search_documents`, `read_node_content`, `explore_links`.
- - **Protocol Safety:** Strict `stdout` (JSON-RPC) vs `stderr` (logs) separation.
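The stdout/stderr split can be sketched in a few lines. This is a minimal illustration of the convention, not the actual server code, and `frameResponse` is an invented helper:

```typescript
// JSON-RPC frames go to stdout; human-readable logs go to stderr.
// Mixing the two corrupts the protocol stream that the MCP client parses.
function frameResponse(id: number, result: unknown): string {
  return JSON.stringify({ jsonrpc: "2.0", id, result });
}

const frame = frameResponse(1, { ok: true });
process.stdout.write(frame + "\n");               // protocol channel
process.stderr.write("[mcp] served request 1\n"); // log channel
```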
-
- ---
-
- ## 4. Potential Uses
-
- | Use Case | Description |
- | :--- | :--- |
- | **Agentic RAG** | Providing LLMs with a structured, navigable map of a codebase or knowledge base, reducing hallucination via graph traversal. |
- | **Codebase Cartography** | Visualizing complex dependencies in legacy software projects to aid refactoring. |
- | **Personal Knowledge Graph** | A local-first "Second Brain" that connects Obsidian-style markdown files semantically. |
- | **Forensic Analysis** | Ingesting logs or timeline data to visualize causal chains in incident responses. |
-
- ---
-
- ## 5. Stats & Benchmarks
-
- ### "Hollow Node" Efficiency
- By completely removing the Full-Text Search (FTS) engine and huge content blobs from the DB, PolyVis achieves extreme efficiency:
- - **Database Size:** Reduced from **5.9MB** to **~2.3MB** (a 61% reduction).
- - **Search Speed:** <20ms for vector similarity search.
- - **Ingestion Speed:** 450+ files processed and woven in seconds.
-
- ### Ingestion Throughput
- - **Bun:** Near-zero startup time allows the daemon to restart instantly.
- - **Vectorization:** Local `fastembed` avoids API latency (OpenAI/Azure), enabling offline ingestion of thousands of chunks.
-
- ---
-
- ## 6. Strategic Radar Analysis
-
- Comparing **PolyVis** against a standard **"Modern Enterprise Stack"** (Next.js / Python backend / Neo4j / cloud vector DB).
-
- ![architectural_overview](architectural_overview.png)
-
- ### Key Factors for Success (Axes)
- 1. **Velocity:** Speed of runtime execution and development iteration.
- 2. **Scalability (Local):** Ability to handle node count on a single machine without lag.
- 3. **Simplicity:** Absence of "black box" frameworks; ease of audit.
- 4. **Local-First:** Functionality without internet/cloud dependencies.
- 5. **Agentic Readiness:** Native support for tool-calling/MCP.
- 6. **Payload Efficiency:** Small memory/disk footprint.
- 7. **Visual Density:** Ability to render thousands of data points at once.
-
- ### The Radar Data
-
- | Metric (0-10) | **PolyVis** (Bun/SQLite/Sigma) | **React / Next.js Stack** | **Enterprise Graph (Neo4j)** |
- | :--- | :---: | :---: | :---: |
- | **Velocity** | **10** (Zig/C++) | 6 (Node/V8) | 5 (Java JVM) |
- | **Scalability (Local)** | **9** (Canvas/WebGL) | 3 (DOM limits) | 8 (Backend-only) |
- | **Simplicity** | **9** (Raw SQL/JS) | 4 (Hydration/SSR complexities) | 3 (Admin overhead) |
- | **Local-First** | **10** (`bun:sqlite`) | 5 (API dependent) | 6 (Server dependent) |
- | **Agentic Readiness** | **10** (Native MCP) | 6 (Requires integration) | 5 (JDBC/Bolt drivers) |
- | **Payload Efficiency** | **9** (Hollow Node) | 4 (Large bundles) | 4 (JVM overhead) |
- | **Visual Density** | **9** (WebGL) | 2 (HTML elements) | N/A (Backend) |
-
- ### Interpretation
- * **The "DOM Wall":** React/Next.js stacks fail at *Visual Density* and *Local Scalability* because the DOM cannot handle >3,000 nodes efficiently. PolyVis (Canvas) handles 50,000+.
- * **The "Cloud Tax":** Enterprise stacks score low on *Velocity* and *Simplicity* due to setup overhead and cloud latency. PolyVis scores high by keeping everything in-process or over IPC.
- * **The "Agent Gap":** Most apps are built for humans (HTML). PolyVis is built for Agents (MCP) first, humans second, ensuring high *Agentic Readiness*.
-
- ---
-
- ## 7. Conclusion
-
- PolyVis is not just a graph visualizer; it is a **reference architecture for the Agentic Age**. By rejecting the bloat of the Browser Wars (React/Virtual DOM) and embracing the speed of modern runtimes (Bun) and established protocols (MCP), it delivers a tool that is simultaneously lighter, faster, and smarter than its enterprise counterparts.
@@ -1,281 +0,0 @@
- # Bento Boxing Deprecation
-
- **Date:** January 5, 2026
- **Status:** ❌ Deprecated and Removed
- **Replaced By:** Whole-document vector embeddings
-
- ---
-
- ## What Was Bento Boxing?
-
- **Bento Boxing** was a markdown chunking system designed to fragment large documents into smaller, semantically meaningful pieces ("bentos") for better vector search precision.
-
- ### Components
-
- **Code (Removed):**
- - `src/core/BentoBoxer.ts` - Chunking logic (split by H1-H4 headings)
- - `src/data/LocusLedger.ts` - Content deduplication (hash → UUID mapping)
- - `src/index.ts` - CLI tool for processing markdown files
- - `tests/bento_ast.test.ts` - Unit tests
-
- **Database (Removed):**
- - `bento_ledger.sqlite` - Deduplication ledger (343 entries)
-
- **Playbooks/Briefs (Removed):**
- - `briefs/archive/1-brief-polyvis-bento-implementation.md`
- - `briefs/archive/2-bento-box-core-logic.md`
- - `playbooks/bento-box-playbook-2.md`
-
- ---
-
- ## Why It Was Deprecated
-
- ### 1. Never Integrated with Vector Search
-
- **Critical issue:** Bento Boxing was an orphaned CLI tool, not integrated into the main ingestion pipeline.
-
- - ✅ Code existed and worked
- - ❌ Never used by `src/pipeline/Ingestor.ts`
- - ❌ Never used by `src/resonance/db.ts`
- - ❌ Not connected to `public/resonance.db`
-
- **Result:** Documents were ingested whole, not chunked. Vector search operated on complete documents.
-
- ---
-
- ### 2. Whole-Document Embeddings Work Excellently
-
- **Testing revealed chunking was unnecessary:**
-
- | Metric | Value | Assessment |
- |--------|-------|------------|
- | Average best match | 85.2% | Excellent |
- | Average spread | 21.1% | Good differentiation |
- | Corpus size | 489 docs | Manageable |
- | Average doc size | 2.7 KB (~550 words) | Already chunk-sized |
-
- **Key insight:** 80% of documents are <5KB. They're already "chunk-sized" for embedding models.
-
- ---
-
- ### 3. Document Size Distribution
-
- **Analysis of 489 documents:**
-
- ```
- Size Range     | Percentage | Chunking Benefit
- ---------------|------------|------------------
- < 5KB          | ~80%       | None (already small)
- 5-20KB         | ~15%       | Minimal
- > 20KB         | ~5%        | Potential (but not critical)
- ```
-
- **Largest document:** 47KB (~9,500 words)
- - Still within LLM context windows (100K+ tokens)
- - Embedding captures main themes well
- - Can use grep for exact phrase search
-
- ---
-
- ### 4. Complexity vs Benefit
-
- **Costs of chunking:**
- - ❌ Chunk logic (where to split?)
- - ❌ Chunk→document mapping
- - ❌ Context loss (chunks lose surrounding context)
- - ❌ Storage overhead (10x nodes for chunked docs)
- - ❌ Search complexity (multiple chunks from the same doc in results)
- - ❌ UI complexity (show chunk vs full doc?)
-
- **Benefits in this corpus:**
- - ⚠️ Slightly better precision for the 5% of large docs
- - ⚠️ Granular retrieval (already achievable with grep)
-
- **Verdict:** Costs > Benefits
-
- ---
-
- ## Search Architecture (Post-Deprecation)
-
- ### Two-Tier Search System
-
- **1. Vector Search (Primary)**
- - Purpose: Semantic similarity, concept discovery
- - Accuracy: 85.2% average best match
- - Speed: <10ms per query
- - Handles: "Find documents about CSS patterns"
-
- **2. Grep/Ripgrep (Secondary)**
- - Purpose: Exact phrase matches
- - Accuracy: 100% (literal text)
- - Speed: <1ms
- - Handles: "Find exact phrase 'function fooBar'"
-
- **No chunking needed:** This two-tier approach covers all search use cases.
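The two tiers can be sketched with toy data. Cosine similarity stands in for tier 1 and a literal substring match for tier 2; the documents, paths, and two-dimensional vectors below are invented for illustration (the real system embeds documents with `fastembed`):

```typescript
// Tier 1 primitive: cosine similarity between embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Toy corpus: invented paths, text, and 2-D "embeddings".
const docs = [
  { path: "docs/css.md", text: "CSS layer patterns", vec: [0.9, 0.1] },
  { path: "docs/db.md", text: "function fooBar in SQLite", vec: [0.1, 0.9] },
];

// Tier 1 (semantic): "find documents about CSS patterns"
const queryVec = [0.85, 0.2];
const best = [...docs].sort(
  (x, y) => cosine(queryVec, y.vec) - cosine(queryVec, x.vec)
)[0];
console.log(best.path); // docs/css.md

// Tier 2 (exact phrase, grep-style): "find exact phrase 'function fooBar'"
const exact = docs.filter((d) => d.text.includes("function fooBar"));
console.log(exact[0].path); // docs/db.md
```

Each tier answers the query type the other handles poorly, which is why no third (chunked) tier was needed.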
-
- ---
-
- ## Decision Criteria
-
- ### When Chunking IS NOT Needed
-
- ✅ **Keep whole-document embeddings if:**
- - Average doc size <5KB (most docs already chunk-sized)
- - Vector search accuracy >70% (yours is 85%)
- - Documents are well-structured (markdown with headers)
- - Search is semantic (not keyword BM25)
- - Source files are easily searchable with grep
-
- **Polyvis meets ALL these criteria.**
-
- ---
-
- ### When Chunking WOULD Be Needed
-
- Consider adding chunking if/when:
-
- **1. External large documents**
- - Research papers (30-50 pages)
- - Books, manuals (100+ pages)
- - API documentation (needs endpoint-level chunks)
-
- **2. Accuracy degradation**
- - Vector search drops below 70%
- - Users report irrelevant results
- - Long documents dominate search results
-
- **3. Specific requirements**
- - RAG system needs paragraph-level context
- - Need to cite specific sections, not whole docs
- - Document structure doesn't match search granularity
-
- ---
-
- ## Migration Notes
-
- ### What Changed
-
- **Removed:**
- - All Bento Boxing source code
- - `bento_ledger.sqlite` database
- - CLI tool (`bun run src/index.ts box`)
- - Related briefs and playbooks
-
- **Unchanged:**
- - Vector search pipeline (always used whole docs)
- - Ingestion pipeline
- - Database schema
- - Search accuracy (still 85%)
-
- **No migration required:** Bento Boxing was never in production.
-
- ---
-
- ## Historical Context
-
- ### Development Timeline
-
- **December 2025:**
- - Bento Boxing designed and implemented
- - CLI tool created for markdown chunking
- - Deduplication ledger built (343 entries)
- - Playbooks and briefs written
-
- **January 2026:**
- - Vector search testing revealed 85% accuracy without chunking
- - Discovered Bento Boxing was never integrated with the main pipeline
- - Analysis showed 80% of docs are already chunk-sized
- - Decision: Deprecate and remove
-
- **Lesson:** Test effectiveness before building infrastructure.
-
- ---
-
- ## Future Considerations
-
- ### Recommended Approach: File Splitting (Not Runtime Chunking)
-
- **If large documents (>15-20KB) become problematic, use simple file splitting:**
-
- **Strategy:**
- 1. Parse document structure with `ast-grep` or `marked`
- 2. Split at natural boundaries (H1/H2 headers)
- 3. Create multiple markdown files (e.g., `agents-part-1.md`, `agents-part-2.md`)
- 4. Add metadata: `<!-- Part 1 of 3 -->`
- 5. Optional: Keep the parent file as a TOC with links to the parts
- 6. Commit split files to version control
-
- **Advantages:**
- - ✅ **Simple:** No infrastructure, just split files once
- - ✅ **Git-native:** Diffs are meaningful, history is granular
- - ✅ **Transparent:** Files are the chunks (source of truth)
- - ✅ **Reversible:** Reconstruct with `cat part-*.md > full.md`
- - ✅ **Lazy:** Only split the 5% of docs that need it
-
- **When to Split:**
-
- | Document Size | Action |
- |---------------|--------|
- | <10KB | Leave as-is |
- | 10-20KB | Consider if natural boundaries exist |
- | >20KB | Strong candidate for splitting |
-
- **Example: Splitting AGENTS.md (47KB)**
- ```bash
- # Parse and split at H1 boundaries
- AGENTS.md → agents-tier1.md (protocols 1-6)
-           → agents-tier2.md (protocols 7-18)
-           → agents-tier3.md (playbooks index)
-
- # Keep parent as TOC
- AGENTS.md → "# Agent Protocols\n\nSee:\n- [Tier 1](agents-tier1.md)..."
- ```
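The split step itself is small enough to sketch with plain string handling. This sketch assumes splits happen only at `# ` headings and deliberately ignores fenced code blocks (a real implementation would parse with `marked`, as the strategy above suggests); the sample document is invented:

```typescript
// Split a markdown document at H1 boundaries. Assumption: "# " at the start
// of a line is always a heading (fenced code blocks are not handled here).
function splitAtH1(markdown: string): string[] {
  const parts: string[] = [];
  let current: string[] = [];
  for (const line of markdown.split("\n")) {
    if (line.startsWith("# ") && current.length > 0) {
      parts.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) parts.push(current.join("\n"));
  return parts;
}

// Toy document standing in for a large file like AGENTS.md.
const doc = "# Tier 1\nprotocols 1-6\n# Tier 2\nprotocols 7-18";
const parts = splitAtH1(doc);
console.log(parts.length); // 2
console.log(parts.join("\n") === doc); // true — reconstruction is trivial
```

The round-trip check at the end mirrors the "Reversible" advantage above: joining the parts reproduces the original file exactly.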
-
- **Anti-Patterns to Avoid:**
- - ❌ Premature splitting ("what if it grows?")
- - ❌ Runtime chunking infrastructure
- - ❌ Artificial boundaries (mid-paragraph splits)
- - ❌ Complex deduplication/mapping systems
-
- **Why This Works:**
- - Documents remain markdown files in git
- - Vector search ingests each part as a separate node
- - Search results link to specific part files
- - Humans edit parts independently
- - Reconstruction is trivial when needed
-
- ---
-
- ## References
-
- ### Effectiveness Testing
-
- See `scripts/test-embeddings.ts` for validation:
- - 85.2% average best match
- - 21.1% spread
- - Tested across 5 query types (CSS, database, graph, debugging, tooling)
-
- ### Related Documentation
-
- - `src/resonance/README.md` - Search Architecture section
- - `.legacy-databases-README.md` - `bento_ledger.sqlite` removal
- - `playbooks/README.md` - Updated to remove Bento Boxing references
-
- ---
-
- ## Summary
-
- **Bento Boxing was well-designed but unnecessary:**
- - Never integrated into the production pipeline
- - Whole-document embeddings achieve excellent results (85%)
- - Most documents are already chunk-sized (<5KB)
- - Two-tier search (vector + grep) covers all use cases
-
- **Decision:** Remove to simplify the codebase. Revisit chunking only if:
- - Adding large external documents (books, long PDFs)
- - Vector search accuracy drops significantly
- - A specific use case emerges that requires granular retrieval
-
- ---
-
- **Last updated:** 2026-01-05