amalfa 1.0.0 → 1.0.1

@@ -0,0 +1,43 @@
1
+ # Changelog
2
+
3
+ All notable changes to the **PolyVis** project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [Unreleased] - 2025-12-31
9
+ ### Added
10
+ - **UI:** Implemented "Terminal Brutalist" design system (High-Contrast / Low-Noise).
11
+ - **UI:** Added "Vision Helper" (`window.__AGENT_THEME__`) for programmatic theme detection by agents.
12
+ - **UI:** Added "Style Auditor" (`window.runStyleAudit()`) for runtime CSS integrity checks.
13
+ - **UI:** Added "Hollow" vs "Full" node visualization states in `sigma.js` renderer.
14
+ - **UI:** Added "Agent Activity" indicator color (`--ansi-orange` / `#FF8C00`).
15
+ - **Arch:** Added "FAFCAS" Protocol (Feature Alignment / Frequency Correction / Amplitude Scaling) for normalized embeddings.
16
+ - **Docs:** Added `CHANGELOG.md` as a primary context source.
17
+
18
+ ### Changed
19
+ - **UI:** Replaced generic color palette with strict **ANSI Standard** variables (`basecoat-css`).
20
+ - **UI:** Enforced `border-radius: 0px` global reset.
21
+ - **UI:** Refactored Home Page to "Vertical Monolith" layout (5:8 Aspect Ratio).
22
+ - **UI:** Updated Navbar Brand to use `--ansi-cyan` (System Identity).
23
+ - **UI:** Implemented "Semantic Inversion" for hover states (High Contrast).
24
+ - **Arch:** Initiated migration from `fastembed` to `model2vec` (Pending Benchmark results).
25
+ - **Arch:** Deprecated "Context Engineering" in favor of "Constraint Stacking" for Agent prompts.
26
+
27
+ ### Fixed
28
+ - **Code:** Resolved all Biome linting issues (`noExplicitAny`, `noStaticOnlyClass`).
29
+ - **Code:** Eliminated strict TypeScript errors across the codebase.
30
+ - **Code:** Refactored static-only classes to `export const` objects for better tree-shaking and simplicity.
31
+ - **Code:** Added strong typing for database query results (removed `any` casting).
32
+
33
+ ### Removed
34
+ - **UI:** Removed all shadows, gradients, and non-monospace fonts.
35
+ - **UI:** Removed "Soft" interaction states (transitions/fades) in favor of "Hard" inversions.
36
+
37
+ ## [1.0.0] - 2025-12-29
38
+ ### Added
39
+ - **Core:** Initial release of the "Hollow Node" architecture.
40
+ - **Runtime:** Validated **Bun** + **SQLite** (`bun:sqlite`) substrate.
41
+ - **Visor:** Canvas-based Graph rendering via `sigma.js`.
42
+ - **Agent:** MCP Server implementation with `search_documents` and `read_node` tools.
43
+ - **Pipeline:** "Semantic Harvester" python bridge for initial ingestion.
package/README.md CHANGED
@@ -51,7 +51,7 @@ Create `amalfa.config.json` in your project root:
51
51
 
52
52
  ```json
53
53
  {
54
- "source": "./docs",
54
+ "sources": ["./docs", "./notes"],
55
55
  "database": ".amalfa/resonance.db",
56
56
  "embeddings": {
57
57
  "model": "BAAI/bge-small-en-v1.5",
@@ -65,6 +65,8 @@ Create `amalfa.config.json` in your project root:
65
65
  }
66
66
  ```
67
67
 
68
+ **New in v1.0.1:** Multi-source support. Use the `sources` array to scan multiple directories. A single `source` string still works and is auto-migrated.
69
+
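The auto-migration mentioned above could work roughly like the sketch below. This is an illustration only: `RawConfig` and `normalizeSources` are hypothetical names, not AMALFA's published API, and the precedence of `sources` over `source` is an assumption.

```typescript
// Hypothetical config shape: either the legacy `source` string or the new `sources` array.
interface RawConfig {
  source?: string;
  sources?: string[];
}

// Returns a de-duplicated list of source directories. The `sources` array wins
// when present (assumed precedence); otherwise the legacy `source` string is wrapped.
function normalizeSources(config: RawConfig): string[] {
  const merged: string[] = [];
  if (Array.isArray(config.sources)) {
    merged.push(...config.sources);
  } else if (typeof config.source === "string") {
    merged.push(config.source);
  }
  // Keep only the first occurrence of each directory.
  return merged.filter((dir, index) => merged.indexOf(dir) === index);
}
```

With this shape, `{ "source": "./docs" }` and `{ "sources": ["./docs"] }` resolve to the same list, so downstream scanning code only has to handle arrays.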
68
70
  Or use TypeScript:
69
71
 
70
72
  ```typescript
@@ -95,20 +97,29 @@ Restart Claude Desktop, and you'll see AMALFA tools available in the conversatio
95
97
 
96
98
  ## CLI Commands
97
99
 
98
- ### `amalfa init`
100
+ ### `amalfa init [--force]`
99
101
 
100
- Initialize knowledge graph from markdown files.
102
+ Initialize knowledge graph from markdown files with pre-flight validation.
101
103
 
102
104
  ```bash
103
- amalfa init
105
+ amalfa init # With validation
106
+ amalfa init --force # Override warnings (use with caution)
104
107
  ```
105
108
 
106
109
  **What it does:**
107
- - Scans your source directory for `.md` files
110
+ - **Pre-flight validation** (v1.0.1): Checks for large files, symlinks, circular references
111
+ - Scans your source directories for `.md` files
108
112
  - Generates vector embeddings (384 dimensions)
109
113
  - Extracts WikiLinks (`[[links]]`) and semantic tags
110
114
  - Creates edges between related documents
111
- - Stores everything in SQLite with WAL mode
115
+ - Stores metadata and embeddings in SQLite; content stays on the filesystem ("hollow nodes")
116
+
117
+ **Pre-Flight Protection** (v1.0.1):
118
+ - Blocks files >10MB (prevents memory issues)
119
+ - Detects symlink loops (prevents infinite recursion)
120
+ - Warns about small files (<50 bytes) and large corpora (10K+ files)
121
+ - Generates `.amalfa-pre-flight.log` with actionable recommendations
122
+ - Use `--force` to override warnings (errors still block)
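The size and corpus rules listed above can be sketched as a small classification function. The thresholds come from the list; everything else (`preFlight`, `FileInfo`, `PreFlightReport`, and the reading that warnings block unless `--force` is given) is an assumption for illustration. Symlink-loop detection is left out of this sketch.

```typescript
// Thresholds taken from the README text; the constant names are hypothetical.
const MAX_FILE_BYTES = 10 * 1024 * 1024; // files above this are blocked (error)
const MIN_FILE_BYTES = 50;               // files below this trigger a warning
const LARGE_CORPUS_FILES = 10000;        // corpora at or above this trigger a warning

interface FileInfo { path: string; bytes: number; }
interface PreFlightReport { errors: string[]; warnings: string[]; canProceed: boolean; }

function preFlight(files: FileInfo[], force: boolean): PreFlightReport {
  const errors: string[] = [];
  const warnings: string[] = [];
  for (const file of files) {
    if (file.bytes > MAX_FILE_BYTES) {
      errors.push(`${file.path}: larger than 10MB`);
    } else if (file.bytes < MIN_FILE_BYTES) {
      warnings.push(`${file.path}: smaller than 50 bytes`);
    }
  }
  if (files.length >= LARGE_CORPUS_FILES) {
    warnings.push(`large corpus: ${files.length} files`);
  }
  // Errors always block. Warnings block only when force is false (assumed behaviour).
  const canProceed = errors.length === 0 && (warnings.length === 0 || force);
  return { errors, warnings, canProceed };
}
```

Keeping errors and warnings in separate lists makes the `--force` rule a one-line check rather than a per-rule special case.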
112
123
 
113
124
  **Output:**
114
125
  ```
@@ -183,11 +194,14 @@ AMALFA implements the **"Hollow Nodes"** pattern:
183
194
  - **Markdown files** = Source of truth (version controlled, human-readable)
184
195
  - **SQLite database** = Ephemeral cache (can be regenerated anytime)
185
196
 
197
+ **v1.0.1 Enhancement:** Schema v6 fully implements hollow nodes - content is never stored in the database, only metadata and embeddings. This reduces database size dramatically (~350MB saved for 70K documents) and maintains the filesystem as the single source of truth.
198
+
186
199
  This means:
187
200
  - ✅ You can delete `.amalfa/` and rebuild with `amalfa init`
188
201
  - ✅ Your markdown files remain the canonical source
189
202
  - ✅ Database changes are never written back to files
190
203
  - ✅ No lock-in, no vendor formats
204
+ - ✅ Smaller databases, faster writes (v1.0.1)
191
205
 
192
206
  ### Technology Stack
193
207
 
@@ -207,9 +221,11 @@ your-project/
207
221
  │ ├── architecture.md
208
222
  │ └── ...
209
223
  ├── .amalfa/ # AMALFA data (gitignored)
210
- │ └── resonance.db # SQLite database (2-5 MB typical)
224
+ │ └── resonance.db # SQLite database (schema v6 - hollow nodes)
211
225
  ├── amalfa.config.json # Configuration (optional)
212
- └── .amalfa-daemon.pid # Daemon process ID (if running)
226
+ ├── .amalfa-daemon.pid # Daemon process ID (if running)
227
+ ├── .amalfa-daemon.log # Daemon logs
228
+ └── .amalfa-pre-flight.log # Validation report (generated by init)
213
229
  ```
214
230
 
215
231
  ## Features
package/ROADMAP.md ADDED
@@ -0,0 +1,316 @@
1
+ # AMALFA Roadmap
2
+
3
+ This document outlines the planned features and improvements for future versions of AMALFA.
4
+
5
+ ## Version 1.1 (Q1 2026) - Graph Analytics & Performance
6
+
7
+ ### Theme: "Hollow Nodes + Graph Intelligence"
8
+
9
+ Version 1.1 focuses on leveraging the hollow node architecture to enable powerful graph analytics with minimal memory overhead.
10
+
11
+ ---
12
+
13
+ ### Graphology Integration
14
+
15
+ **Status**: Planned
16
+ **Priority**: High
17
+ **Complexity**: Medium
18
+
19
+ #### Overview
20
+ Integrate Graphology for in-memory graph analytics using the hollow node pattern. The graph holds only structure (nodes as IDs + paths, edges with weights), while content remains on the filesystem.
21
+
22
+ #### Memory Footprint
23
+ - 70,000 nodes × 20 bytes (ID + path) = **~1.4MB**
24
+ - 100,000 edges × 50 bytes (source + target + weight + type) = **~5MB**
25
+ - **Total: ~6.4MB, rounded up to ~7MB for a 70K node graph** (vs 490MB if we stored content + embeddings)
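The arithmetic behind these figures can be checked with a short calculation. The per-node and per-edge byte sizes are the roadmap's own estimates; `estimateGraphBytes` is a hypothetical helper name.

```typescript
// Estimated in-memory size of a hollow graph: structure only, no content.
function estimateGraphBytes(
  nodes: number,
  edges: number,
  bytesPerNode: number = 20, // ID + path (roadmap estimate)
  bytesPerEdge: number = 50  // source + target + weight + type (roadmap estimate)
): number {
  return nodes * bytesPerNode + edges * bytesPerEdge;
}

// 70,000 * 20 = 1,400,000 bytes for nodes; 100,000 * 50 = 5,000,000 bytes for edges.
const totalBytes = estimateGraphBytes(70000, 100000);
// 6.4 when expressed in decimal megabytes, which the roadmap rounds up to ~7MB.
const totalMB = totalBytes / 1e6;
```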
26
+
27
+ #### New Components
28
+ - `src/core/GraphEngine.ts` - Lazy-loading graph builder
29
+ - Hollow pattern: `graph.addNode(id, { path: "docs/file.md" })`
30
+ - Content fetched on-demand from filesystem
31
+
32
+ #### Benefits
33
+ - **Fast traversal**: Pure memory operations, no I/O
34
+ - **Graph algorithms**: Centrality, clustering, path finding
35
+ - **Hybrid search**: Vector similarity + graph structure
36
+ - **Scalable**: 100K+ nodes easily
37
+
38
+ ---
39
+
40
+ ### New MCP Tools
41
+
42
+ **Status**: Planned
43
+ **Priority**: High
44
+ **Complexity**: Low-Medium
45
+
46
+ #### 1. `find_related_documents(node_id, depth, include_content)`
47
+ - **Purpose**: Find documents connected via graph structure
48
+ - **Parameters**:
49
+ - `node_id`: Starting document ID
50
+ - `depth`: Traversal depth (default: 2)
51
+ - `include_content`: Return full content or just paths (default: false)
52
+ - **Returns**: Array of related document IDs/paths
53
+ - **Use Case**: Agent explores document relationships without vector search
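A depth-limited traversal of this kind can be sketched as a breadth-first search over a plain adjacency map. The sketch does not use Graphology; `Adjacency` and `findRelated` are hypothetical names, and edges are followed in the direction they are stored.

```typescript
// Hollow graph: node IDs map to neighbour IDs only; no content is held in memory.
type Adjacency = Record<string, string[]>;

// Returns every node reachable from startId within `depth` hops, in discovery order.
function findRelated(graph: Adjacency, startId: string, depth: number = 2): string[] {
  const seen = new Set<string>();
  seen.add(startId);
  let frontier: string[] = [startId];
  const related: string[] = [];
  for (let level = 0; level < depth; level++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbour of graph[id] ?? []) {
        if (!seen.has(neighbour)) {
          seen.add(neighbour);
          related.push(neighbour);
          next.push(neighbour);
        }
      }
    }
    frontier = next; // move one hop further out
  }
  return related;
}
```

Because only IDs are visited, content would be read from disk afterwards and only for the nodes actually returned.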
54
+
55
+ #### 2. `discover_clusters()`
56
+ - **Purpose**: Detect topic communities using the Louvain algorithm
57
+ - **Parameters**:
58
+ - `min_cluster_size`: Minimum documents per cluster (default: 3)
59
+ - **Returns**: Array of clusters, each with document IDs
60
+ - **Use Case**: "What are the main topics in this knowledge base?"
61
+
62
+ #### 3. `find_connection_path(from_id, to_id)`
63
+ - **Purpose**: Find shortest path between two documents
64
+ - **Parameters**:
65
+ - `from_id`: Source document
66
+ - `to_id`: Target document
67
+ - **Returns**: Array of document IDs forming the path
68
+ - **Use Case**: "How is the API documentation related to the database schema?"
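For an unweighted graph, the shortest path can be found with a breadth-first search that records each node's predecessor. This is a minimal sketch under that assumption (edge weights are ignored); `Links` and `findConnectionPath` are hypothetical names, not the planned implementation.

```typescript
type Links = Record<string, string[]>;

// Returns the node IDs along a shortest path from fromId to toId, or null if unreachable.
function findConnectionPath(graph: Links, fromId: string, toId: string): string[] | null {
  const parent = new Map<string, string | null>();
  parent.set(fromId, null);
  const queue: string[] = [fromId];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    if (current === toId) {
      // Walk the predecessor chain back to the start to rebuild the path.
      const path: string[] = [];
      let step: string | null = current;
      while (step !== null) {
        path.unshift(step);
        step = parent.get(step) ?? null;
      }
      return path;
    }
    for (const neighbour of graph[current] ?? []) {
      if (!parent.has(neighbour)) {
        parent.set(neighbour, current);
        queue.push(neighbour);
      }
    }
  }
  return null;
}
```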
69
+
70
+ #### 4. `get_document_importance(node_id)`
71
+ - **Purpose**: Return centrality metrics for a document
72
+ - **Parameters**:
73
+ - `node_id`: Document to analyze
74
+ - **Returns**: Object with PageRank, betweenness, degree centrality
75
+ - **Use Case**: "Is this a hub document?"
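Of the three metrics, degree centrality is the simplest to illustrate: the share of other nodes that a document is directly linked to. The sketch below covers only that metric (PageRank and betweenness are not shown), treats edges as undirected, and uses hypothetical names.

```typescript
type EdgeList = Array<[string, string]>;

// Normalized degree centrality: distinct neighbours divided by (nodeCount - 1).
function degreeCentrality(edges: EdgeList, nodeId: string, nodeCount: number): number {
  const neighbours = new Set<string>();
  for (const [a, b] of edges) {
    if (a === nodeId && b !== nodeId) neighbours.add(b);
    if (b === nodeId && a !== nodeId) neighbours.add(a);
  }
  return nodeCount > 1 ? neighbours.size / (nodeCount - 1) : 0;
}
```

A value close to 1 suggests a hub document that links to, or is linked from, most of the corpus.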
76
+
77
+ #### 5. Enhanced `search_knowledge(query, limit, use_graph_ranking)`
78
+ - **Enhancement**: Add optional graph-based reranking
79
+ - **Parameters**:
80
+ - `query`: Search query (existing)
81
+ - `limit`: Result count (existing)
82
+ - `use_graph_ranking`: Rerank by centrality (new, default: false)
83
+ - **Use Case**: Find relevant AND important documents
84
+
85
+ ---
86
+
87
+ ### VectorEngine Refactor
88
+
89
+ **Status**: Required for v1.1
90
+ **Priority**: High
91
+ **Complexity**: Low
92
+
93
+ #### Problem
94
+ The current VectorEngine reads content from the database `content` column, which is now NULL in schema v6.
95
+
96
+ #### Solution
97
+ Update `searchByVector()` to read content from filesystem:
98
+
99
+ ```typescript
100
+ // Current (broken in v6)
101
+ const row = this.db.query("SELECT title, content FROM nodes WHERE id = ?").get(id);
102
+
103
+ // New (filesystem-backed)
104
+ const row = this.db.query("SELECT title, meta FROM nodes WHERE id = ?").get(id);
105
+ const meta = JSON.parse(row.meta);
106
+ const content = readFileSync(meta.source, 'utf8'); // needs: import { readFileSync } from "node:fs"
107
+ ```
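The snippet above assumes the row exists and that `meta` always parses to an object with a `source` path. A more defensive variant might look like the sketch below. `NodeRow` and `loadContent` are hypothetical names, and the file reader is injected as a parameter so the sketch does not depend on any particular runtime API.

```typescript
// Hypothetical row shape for the `nodes` table in schema v6.
interface NodeRow { title: string; meta: string | null; }

// Returns the document content, or null when the row or its source path is unusable.
function loadContent(
  row: NodeRow | null,
  readFile: (path: string) => string
): string | null {
  if (row === null || row.meta === null) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(row.meta);
  } catch {
    return null; // malformed metadata: treat as missing content
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const source = (parsed as { source?: unknown }).source;
  return typeof source === "string" ? readFile(source) : null;
}
```

In a real call site the second argument could be something like `(p) => readFileSync(p, "utf8")`; a missing file would still throw there and would need its own handling.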
108
+
109
+ #### Benefits
110
+ - Works with hollow nodes (schema v6)
111
+ - Single source of truth (filesystem)
112
+ - Enables schema v7 (remove content column entirely)
113
+
114
+ ---
115
+
116
+ ### Schema v7: Remove Content Column
117
+
118
+ **Status**: Planned after VectorEngine refactor
119
+ **Priority**: Medium
120
+ **Complexity**: Low
121
+
122
+ #### Changes
123
+ - Drop `content` column from `nodes` table completely
124
+ - Rebuild table without deprecated column
125
+ - All code must use filesystem reads
126
+
127
+ #### Prerequisites
128
+ - VectorEngine refactor complete
129
+ - All legacy code updated
130
+ - Test suite validates filesystem reads
131
+
132
+ #### Benefits
133
+ - Cleaner schema
134
+ - Removes technical debt
135
+ - ~350MB saved for 70K corpus
136
+
137
+ ---
138
+
139
+ ### Automatic File Splitting
140
+
141
+ **Status**: Planned
142
+ **Priority**: Medium
143
+ **Complexity**: High
144
+
145
+ #### Problem
146
+ Large files (>10MB) are currently blocked by pre-flight validation, so users must split them manually.
147
+
148
+ #### Solution
149
+ Automatic chunking strategy:
150
+
151
+ 1. **Detection**: Files > 10MB trigger auto-split
152
+ 2. **Strategy Priority**:
153
+ - Markdown headers (H1/H2) - Most natural
154
+ - Token count (~2000 tokens per chunk) - Fallback
155
+ - Character count (~8000 chars) - Last resort
156
+ 3. **Virtual Nodes**: Create chunks with naming:
157
+ - `docs/api-reference.md#introduction`
158
+ - `docs/api-reference.md#authentication`
159
+ - `docs/api-reference.md#endpoints`
160
+ 4. **Graph Links**: Connect chunks:
161
+ - Container node: `api-reference.md` (type: `container`)
162
+ - Chunk nodes: `api-reference.md#section` (type: `chunk`)
163
+ - Edges: `chunk --part_of--> container`
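The header-based strategy and the `path#section` naming above can be sketched as follows. This is not the planned `MarkdownSplitter`: `Chunk`, `slugify`, and `splitByHeaders` are hypothetical names, text before the first header is dropped, and headers inside fenced code blocks are not special-cased.

```typescript
interface Chunk { id: string; text: string; }

// Turns a heading into a URL-style fragment, e.g. "Getting Started" becomes "getting-started".
function slugify(heading: string): string {
  return heading
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Splits a markdown document at H1/H2 headers into chunks named `path#slug`.
function splitByHeaders(path: string, markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Chunk | null = null;
  for (const line of markdown.split("\n")) {
    const match = /^#{1,2}\s+(.*)$/.exec(line);
    if (match) {
      current = { id: `${path}#${slugify(match[1])}`, text: line };
      chunks.push(current);
    } else if (current) {
      current.text += "\n" + line;
    }
    // Lines before the first header are skipped in this sketch.
  }
  return chunks;
}
```

Each returned chunk would then become a `chunk` node with a `part_of` edge to the container node for `path`.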
164
+
165
+ #### Components
166
+ - `src/pipeline/MarkdownSplitter.ts` - Splitting logic
167
+ - Update `PreFlightAnalyzer` to suggest auto-split
168
+ - Schema v8: Add `chunk_index` column to nodes
169
+ - Update MCP tools to reassemble chunks on retrieval
170
+
171
+ #### Configuration
172
+ ```jsonc
173
+ {
174
+ "maxFileSizeKB": 10240,
175
+ "autoSplit": true,
176
+ "splitStrategy": "headers" // or "tokens" or "characters"
177
+ }
178
+ ```
179
+
180
+ ---
181
+
182
+ ### Performance Enhancements
183
+
184
+ **Status**: Ongoing
185
+ **Priority**: Medium
186
+ **Complexity**: Varies
187
+
188
+ #### Planned Improvements
189
+
190
+ 1. **Batch Embedding Generation**
191
+ - Current: One file at a time
192
+ - Proposed: Batch FastEmbed calls (5-10 files)
193
+ - Expected: 2-3x faster ingestion
194
+
195
+ 2. **Parallel File Discovery**
196
+ - Current: Sequential directory scan
197
+ - Proposed: Parallel glob with worker threads
198
+ - Expected: Faster for large corpora (10K+ files)
199
+
200
+ 3. **Incremental Edge Reweaving**
201
+ - Current: Full graph rebuild on changes
202
+ - Proposed: Update only affected edges
203
+ - Expected: Faster daemon updates
204
+
205
+ 4. **Graph Cache**
206
+ - Current: Build graph from SQLite on each MCP session
207
+ - Proposed: Serialize to `.amalfa/graph.bin`, load in ~50ms
208
+ - Expected: Faster MCP server startup
209
+
210
+ ---
211
+
212
+ ## Version 1.2+ (Future) - Advanced Features
213
+
214
+ ### Multi-Language Support
215
+ - Embeddings for non-English content
216
+ - Language-specific tokenization
217
+ - Configurable embedding models per language
218
+
219
+ ### Custom Embedding Models
220
+ - Support for user-provided models
221
+ - Model switching without re-ingestion
222
+ - Embedding dimension compatibility checks
223
+
224
+ ### Graph Visualization Export
225
+ - Export to Graphviz DOT format
226
+ - Export to Sigma.js JSON
227
+ - Interactive web-based explorer
228
+
229
+ ### Backup & Restore
230
+ - `amalfa backup` command
231
+ - Compressed archive with database + source files
232
+ - `amalfa restore` with validation
233
+
234
+ ### Advanced Search
235
+ - Boolean operators (AND/OR/NOT)
236
+ - Filtered search by metadata
237
+ - Date range queries
238
+ - Fuzzy matching
239
+
240
+ ### API Server Mode
241
+ - RESTful API alongside MCP
242
+ - WebSocket for real-time updates
243
+ - Multi-client support
244
+
245
+ ---
246
+
247
+ ## Feature Requests
248
+
249
+ We welcome feature requests! Please open an issue on GitHub with:
250
+ - **Use case**: What problem does this solve?
251
+ - **Priority**: How important is this to you?
252
+ - **Alternatives**: What workarounds exist?
253
+
254
+ ---
255
+
256
+ ## Development Priorities
257
+
258
+ ### High Priority (v1.1)
259
+ 1. Graphology integration
260
+ 2. New MCP tools
261
+ 3. VectorEngine refactor
262
+ 4. Schema v7
263
+
264
+ ### Medium Priority (v1.1 or v1.2)
265
+ 1. Automatic file splitting
266
+ 2. Performance optimizations
267
+ 3. Graph cache
268
+
269
+ ### Low Priority (v1.2+)
270
+ 1. Multi-language support
271
+ 2. Custom embedding models
272
+ 3. Graph visualization
273
+ 4. API server mode
274
+
275
+ ---
276
+
277
+ ## Breaking Changes
278
+
279
+ ### v1.1
280
+ - Schema v7 removes `content` column (after VectorEngine refactor)
281
+ - Existing databases auto-migrate from v6 → v7
282
+ - **Action required**: Ensure all nodes have `meta.source` paths before upgrading
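One way to perform that check before upgrading is to scan the stored rows for metadata without a usable `source` path. The sketch below works on rows already loaded into memory; `StoredNode` and `findNodesMissingSource` are hypothetical names, and the assumption that `meta` is a JSON string follows the VectorEngine example earlier in this roadmap.

```typescript
interface StoredNode { id: string; meta: string | null; }

// Returns the IDs of nodes whose metadata has no non-empty string `source` field.
function findNodesMissingSource(nodes: StoredNode[]): string[] {
  return nodes
    .filter((node) => {
      if (node.meta === null) return true;
      try {
        const parsed: unknown = JSON.parse(node.meta);
        if (typeof parsed !== "object" || parsed === null) return true;
        const source = (parsed as { source?: unknown }).source;
        return !(typeof source === "string" && source.length > 0);
      } catch {
        return true; // unparseable metadata counts as missing
      }
    })
    .map((node) => node.id);
}
```

An empty result would indicate that every node can be re-read from the filesystem once the `content` column is gone.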
283
+
284
+ ### v2.0 (If needed)
285
+ - Major API changes (TBD)
286
+ - New MCP protocol version
287
+ - Configuration format changes
288
+
289
+ ---
290
+
291
+ ## Timeline
292
+
293
+ | Version | Target Date | Status | Features |
294
+ |---------|------------|--------|----------|
295
+ | v1.0.0 | 2026-01-06 | ✅ Released | Initial release, MCP server, vector search |
296
+ | v1.0.1 | 2026-01-06 | ✅ Released | Pre-flight validation, multi-source, schema v6 |
297
+ | v1.1.0 | Q1 2026 | 🚧 In Progress | Graphology, new MCP tools, schema v7 |
298
+ | v1.2.0 | Q2 2026 | 📋 Planned | File splitting, performance, advanced features |
299
+ | v2.0.0 | TBD | 💭 Future | Major enhancements, breaking changes |
300
+
301
+ ---
302
+
303
+ ## Contributing
304
+
305
+ We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
306
+
307
+ **Priority areas for contributors:**
308
+ - Graphology integration
309
+ - Test coverage improvements
310
+ - Documentation enhancements
311
+ - Performance benchmarks
312
+
313
+ ---
314
+
315
+ **Last Updated**: 2026-01-06
316
+ **Version**: 1.0.1