amalfa 1.0.0 → 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

package/CHANGELOG.md.old +43 -0
package/README.md +24 -8
package/ROADMAP.md +316 -0
package/TEST_PLAN.md +561 -0
package/amalfa.config.example.ts +8 -6
package/package.json +1 -1
package/src/cli.ts +57 -11
package/src/config/defaults.ts +22 -4
package/src/daemon/index.ts +12 -8
package/src/pipeline/AmalfaIngestor.ts +22 -12
package/src/pipeline/PreFlightAnalyzer.ts +434 -0
package/src/resonance/db.ts +4 -2
package/src/resonance/schema.ts +19 -1

package/CHANGELOG.md.old ADDED

@@ -0,0 +1,43 @@
+# Changelog
+All notable changes to the **PolyVis** project will be documented in this file.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [Unreleased] - 2025-12-31
+### Added
+- **UI:** Implemented "Terminal Brutalist" design system (High-Contrast / Low-Noise).
+- **UI:** Added "Vision Helper" (`window.__AGENT_THEME__`) for programmatic theme detection by agents.
+- **UI:** Added "Style Auditor" (`window.runStyleAudit()`) for runtime CSS integrity checks.
+- **UI:** Added "Hollow" vs "Full" node visualization states in `sigma.js` renderer.
+- **UI:** Added "Agent Activity" indicator color (`--ansi-orange` / `#FF8C00`).
+- **Arch:** Added "FAFCAS" Protocol (Feature Alignment / Frequency Correction / Amplitude Scaling) for normalized embeddings.
+- **Docs:** Added `CHANGELOG.md` as a primary context source.
+### Changed
+- **UI:** Replaced generic color palette with strict **ANSI Standard** variables (`basecoat-css`).
+- **UI:** Enforced `border-radius: 0px` global reset.
+- **UI:** Refactored Home Page to "Vertical Monolith" layout (5:8 Aspect Ratio).
+- **UI:** Updated Navbar Brand to use `--ansi-cyan` (System Identity).
+- **UI:** Implemented "Semantic Inversion" for hover states (High Contrast).
+- **Arch:** Initiated migration from `fastembed` to `model2vec` (Pending Benchmark results).
+- **Arch:** Deprecated "Context Engineering" in favor of "Constraint Stacking" for Agent prompts.
+### Fixed
+- **Code:** Resolved all Biome linting issues (`noExplicitAny`, `noStaticOnlyClass`).
+- **Code:** Eliminated strict TypeScript errors across the codebase.
+- **Code:** Refactored static-only classes to `export const` objects for better tree-shaking and simplicity.
+- **Code:** Strong typing for Database Query results (removed `any` casting).
+### Removed
+- **UI:** Removed all shadows, gradients, and non-monospace fonts.
+- **UI:** Removed "Soft" interaction states (transitions/fades) in favor of "Hard" inversions.
+## [1.0.0] - 2025-12-29
+### Added
+- **Core:** Initial release of the "Hollow Node" architecture.
+- **Runtime:** Validated **Bun** + **SQLite** (`bun:sqlite`) substrate.
+- **Visor:** Canvas-based Graph rendering via `sigma.js`.
+- **Agent:** MCP Server implementation with `search_documents` and `read_node` tools.
+- **Pipeline:** "Semantic Harvester" python bridge for initial ingestion.

package/README.md CHANGED

@@ -51,7 +51,7 @@ Create `amalfa.config.json` in your project root:
 ```json
 {
-  "source": "./docs",
+  "sources": ["./docs", "./notes"],
   "database": ".amalfa/resonance.db",
   "embeddings": {
     "model": "BAAI/bge-small-en-v1.5",
@@ -65,6 +65,8 @@ Create `amalfa.config.json` in your project root:
 }
 ```
+**New in v1.0.1:** Multi-source support! Use `sources` array to scan multiple directories. Single `source` string still works (auto-migrates).
 Or use TypeScript:
 ```typescript
@@ -95,20 +97,29 @@ Restart Claude Desktop, and you'll see AMALFA tools available in the conversatio
 ## CLI Commands
-### `amalfa init`
+### `amalfa init [--force]`
-Initialize knowledge graph from markdown files.
+Initialize knowledge graph from markdown files with pre-flight validation.
 ```bash
-amalfa init
+amalfa init          # With validation
+amalfa init --force  # Override warnings (use with caution)
 ```
 **What it does:**
-- Scans your source directory for `.md` files
+- **Pre-flight validation** (v1.0.1): Checks for large files, symlinks, circular references
+- Scans your source directories for `.md` files
 - Generates vector embeddings (384 dimensions)
 - Extracts WikiLinks (`[[links]]`) and semantic tags
 - Creates edges between related documents
-- Stores everything in SQLite with WAL mode
+- Stores metadata in SQLite (content in filesystem - "hollow nodes")
+**Pre-Flight Protection** (v1.0.1):
+- Blocks files >10MB (prevents memory issues)
+- Detects symlink loops (prevents infinite recursion)
+- Warns about small files (<50 bytes) and large corpora (10K+ files)
+- Generates `.amalfa-pre-flight.log` with actionable recommendations
+- Use `--force` to override warnings (errors still block)
 **Output:**
 ```
@@ -183,11 +194,14 @@ AMALFA implements the **"Hollow Nodes"** pattern:
 - **Markdown files** = Source of truth (version controlled, human-readable)
 - **SQLite database** = Ephemeral cache (can be regenerated anytime)
+**v1.0.1 Enhancement:** Schema v6 fully implements hollow nodes - content is never stored in the database, only metadata and embeddings. This reduces database size dramatically (~350MB saved for 70K documents) and maintains the filesystem as the single source of truth.
 This means:
 - ✅ You can delete `.amalfa/` and rebuild with `amalfa init`
 - ✅ Your markdown files remain the canonical source
 - ✅ Database changes are never written back to files
 - ✅ No lock-in, no vendor formats
+- ✅ Smaller databases, faster writes (v1.0.1)
 ### Technology Stack
@@ -207,9 +221,11 @@ your-project/
 │   ├── architecture.md
 │   └── ...
 ├── .amalfa/                 # AMALFA data (gitignored)
-│   └── resonance.db         # SQLite database (2-5 MB typical)
+│   └── resonance.db         # SQLite database (schema v6 - hollow nodes)
 ├── amalfa.config.json       # Configuration (optional)
-└── .amalfa-daemon.pid       # Daemon process ID (if running)
+├── .amalfa-daemon.pid       # Daemon process ID (if running)
+├── .amalfa-daemon.log       # Daemon logs
+└── .amalfa-pre-flight.log   # Validation report (generated by init)
 ```
 ## Features
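
The pre-flight rules described in the README diff above (block files over 10MB, warn on files under 50 bytes and on corpora of 10K+ files, let `--force` override warnings but never errors) can be sketched as a pure function. This is an illustration only, not the contents of `PreFlightAnalyzer.ts`: the names `FileInfo`, `Issue`, `analyzeFiles`, and `canProceed` are hypothetical, symlink-loop detection is left out, and the behavior that warnings stop `init` unless `--force` is given is an assumption read from the README wording.

```typescript
// Hypothetical types for illustration; not taken from the amalfa source.
type Severity = "error" | "warning";

interface FileInfo {
  path: string;
  sizeBytes: number;
}

interface Issue {
  severity: Severity;
  path?: string;
  message: string;
}

// Thresholds as stated in the README diff.
const MAX_FILE_BYTES = 10 * 1024 * 1024; // files larger than 10MB are blocked
const MIN_FILE_BYTES = 50;               // files smaller than 50 bytes get a warning
const LARGE_CORPUS_FILES = 10_000;       // corpora of 10K+ files get a warning

function analyzeFiles(files: FileInfo[]): Issue[] {
  const issues: Issue[] = [];
  for (const file of files) {
    if (file.sizeBytes > MAX_FILE_BYTES) {
      issues.push({ severity: "error", path: file.path, message: "file exceeds 10MB" });
    } else if (file.sizeBytes < MIN_FILE_BYTES) {
      issues.push({ severity: "warning", path: file.path, message: "file is smaller than 50 bytes" });
    }
  }
  if (files.length >= LARGE_CORPUS_FILES) {
    issues.push({ severity: "warning", message: `large corpus: ${files.length} files` });
  }
  return issues;
}

// Assumption: errors always block; warnings block unless `force` is set.
function canProceed(issues: Issue[], force: boolean): boolean {
  if (issues.some((issue) => issue.severity === "error")) return false;
  const hasWarnings = issues.some((issue) => issue.severity === "warning");
  return !hasWarnings || force;
}
```

Keeping the analysis separate from the proceed/abort decision means the same issue list could feed both the CLI exit path and a report such as `.amalfa-pre-flight.log`.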

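The README diff above says a single `source` string still works and is auto-migrated to the new `sources` array. A minimal sketch of that normalization follows, assuming a hypothetical `RawConfig` shape and a hypothetical fallback of `["./docs"]` when neither key is present; the package's real defaults live in `src/config/defaults.ts`, which is not shown in this section.

```typescript
// Hypothetical config shape; only the `source` and `sources` keys come from the README diff.
interface RawConfig {
  source?: string;
  sources?: string[];
}

// Assumed fallback for illustration; the actual default is not shown in this diff.
const FALLBACK_SOURCES: string[] = ["./docs"];

function normalizeSources(raw: RawConfig): string[] {
  // Prefer the new array form when it is present and non-empty.
  if (Array.isArray(raw.sources) && raw.sources.length > 0) {
    return [...raw.sources];
  }
  // Migrate the legacy single-string form to a one-element array.
  if (typeof raw.source === "string" && raw.source.length > 0) {
    return [raw.source];
  }
  return [...FALLBACK_SOURCES];
}
```

With this shape, `normalizeSources({ source: "./docs" })` and `normalizeSources({ sources: ["./docs"] })` yield the same array, so downstream code only has to handle the array form.
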
package/ROADMAP.md ADDED

@@ -0,0 +1,316 @@
+# AMALFA Roadmap
+This document outlines the planned features and improvements for future versions of AMALFA.
+## Version 1.1 (Q1 2026) - Graph Analytics & Performance
+### Theme: "Hollow Nodes + Graph Intelligence"
+Version 1.1 focuses on leveraging the hollow node architecture to enable powerful graph analytics with minimal memory overhead.
+---
+### Graphology Integration
+**Status**: Planned
+**Priority**: High
+**Complexity**: Medium
+#### Overview
+Integrate Graphology.js for in-memory graph analytics using the hollow node pattern. Graph contains only structure (nodes as IDs + paths, edges with weights), while content remains in filesystem/database.
+#### Memory Footprint
+- 70,000 nodes × 20 bytes (ID + path) = **~1.4MB**
+- 100,000 edges × 50 bytes (source + target + weight + type) = **~5MB**
+- **Total: ~7MB for 70K node graph** (vs 490MB if we stored content + embeddings)
+#### New Components
+- `src/core/GraphEngine.ts` - Lazy-loading graph builder
+- Hollow pattern: `graph.addNode(id, { path: "docs/file.md" })`
+- Content fetched on-demand from filesystem
+#### Benefits
+- **Fast traversal**: Pure memory operations, no I/O
+- **Graph algorithms**: Centrality, clustering, path finding
+- **Hybrid search**: Vector similarity + graph structure
+- **Scalable**: 100K+ nodes easily
+---
+### New MCP Tools
+**Status**: Planned
+**Priority**: High
+**Complexity**: Low-Medium
+#### 1. `find_related_documents(node_id, depth)`
+- **Purpose**: Find documents connected via graph structure
+- **Parameters**:
+  - `node_id`: Starting document ID
+  - `depth`: Traversal depth (default: 2)
+  - `include_content`: Return full content or just paths (default: false)
+- **Returns**: Array of related document IDs/paths
+- **Use Case**: Agent explores document relationships without vector search
+#### 2. `discover_clusters()`
+- **Purpose**: Detect topic communities using Louvain algorithm
+- **Parameters**:
+  - `min_cluster_size`: Minimum documents per cluster (default: 3)
+- **Returns**: Array of clusters, each with document IDs
+- **Use Case**: "What are the main topics in this knowledge base?"
+#### 3. `find_connection_path(from_id, to_id)`
+- **Purpose**: Find shortest path between two documents
+- **Parameters**:
+  - `from_id`: Source document
+  - `to_id`: Target document
+- **Returns**: Array of document IDs forming the path
+- **Use Case**: "How is the API documentation related to the database schema?"
+#### 4. `get_document_importance(node_id)`
+- **Purpose**: Return centrality metrics for a document
+- **Parameters**:
+  - `node_id`: Document to analyze
+- **Returns**: Object with PageRank, betweenness, degree centrality
+- **Use Case**: "Is this a hub document?"
+#### 5. Enhanced `search_knowledge(query, use_graph_ranking)`
+- **Enhancement**: Add optional graph-based reranking
+- **Parameters**:
+  - `query`: Search query (existing)
+  - `limit`: Result count (existing)
+  - `use_graph_ranking`: Rerank by centrality (new, default: false)
+- **Use Case**: Find relevant AND important documents
+---
+### VectorEngine Refactor
+**Status**: Required for v1.1
+**Priority**: High
+**Complexity**: Low
+#### Problem
+Current VectorEngine reads content from database `content` column (now NULL in schema v6).
+#### Solution
+Update `searchByVector()` to read content from filesystem:
+```typescript
+// Current (broken in v6)
+const row = this.db.query("SELECT title, content FROM nodes WHERE id = ?").get(id);
+// New (filesystem-backed)
+const row = this.db.query("SELECT title, meta FROM nodes WHERE id = ?").get(id);
+const meta = JSON.parse(row.meta);
+const content = readFileSync(meta.source, 'utf8');
+```
+#### Benefits
+- Works with hollow nodes (schema v6)
+- Single source of truth (filesystem)
+- Enables schema v7 (remove content column entirely)
+---
+### Schema v7: Remove Content Column
+**Status**: Planned after VectorEngine refactor
+**Priority**: Medium
+**Complexity**: Low
+#### Changes
+- Drop `content` column from `nodes` table completely
+- Rebuild table without deprecated column
+- All code must use filesystem reads
+#### Prerequisites
+- VectorEngine refactor complete
+- All legacy code updated
+- Test suite validates filesystem reads
+#### Benefits
+- Cleaner schema
+- Removes technical debt
+- ~350MB saved for 70K corpus
+---
+### Automatic File Splitting
+**Status**: Planned
+**Priority**: Medium
+**Complexity**: High
+#### Problem
+Large files (>10MB) currently blocked by pre-flight validation. Users must manually split files.
+#### Solution
+Automatic chunking strategy:
+1. **Detection**: Files > 10MB trigger auto-split
+2. **Strategy Priority**:
+   - Markdown headers (H1/H2) - Most natural
+   - Token count (~2000 tokens per chunk) - Fallback
+   - Character count (~8000 chars) - Last resort
+3. **Virtual Nodes**: Create chunks with naming:
+   - `docs/api-reference.md#introduction`
+   - `docs/api-reference.md#authentication`
+   - `docs/api-reference.md#endpoints`
+4. **Graph Links**: Connect chunks:
+   - Container node: `api-reference.md` (type: `container`)
+   - Chunk nodes: `api-reference.md#section` (type: `chunk`)
+   - Edges: `chunk --part_of--> container`
+#### Components
+- `src/pipeline/MarkdownSplitter.ts` - Splitting logic
+- Update `PreFlightAnalyzer` to suggest auto-split
+- Schema v8: Add `chunk_index` column to nodes
+- Update MCP tools to reassemble chunks on retrieval
+#### Configuration
+```json
+{
+  "maxFileSizeKB": 10240,
+  "autoSplit": true,
+  "splitStrategy": "headers" // or "tokens" or "characters"
+}
+```
+---
+### Performance Enhancements
+**Status**: Ongoing
+**Priority**: Medium
+**Complexity**: Varies
+#### Planned Improvements
+1. **Batch Embedding Generation**
+   - Current: One file at a time
+   - Proposed: Batch FastEmbed calls (5-10 files)
+   - Expected: 2-3x faster ingestion
+2. **Parallel File Discovery**
+   - Current: Sequential directory scan
+   - Proposed: Parallel glob with worker threads
+   - Expected: Faster for large corpora (10K+ files)
+3. **Incremental Edge Reweaving**
+   - Current: Full graph rebuild on changes
+   - Proposed: Update only affected edges
+   - Expected: Faster daemon updates
+4. **Graph Cache**
+   - Current: Build graph from SQLite on each MCP session
+   - Proposed: Serialize to `.amalfa/graph.bin`, load in ~50ms
+   - Expected: Faster MCP server startup
+---
+## Version 1.2+ (Future) - Advanced Features
+### Multi-Language Support
+- Embeddings for non-English content
+- Language-specific tokenization
+- Configurable embedding models per language
+### Custom Embedding Models
+- Support for user-provided models
+- Model switching without re-ingestion
+- Embedding dimension compatibility checks
+### Graph Visualization Export
+- Export to Graphviz DOT format
+- Export to Sigma.js JSON
+- Interactive web-based explorer
+### Backup & Restore
+- `amalfa backup` command
+- Compressed archive with database + source files
+- `amalfa restore` with validation
+### Advanced Search
+- Boolean operators (AND/OR/NOT)
+- Filtered search by metadata
+- Date range queries
+- Fuzzy matching
+### API Server Mode
+- RESTful API alongside MCP
+- WebSocket for real-time updates
+- Multi-client support
+---
+## Feature Requests
+We welcome feature requests! Please open an issue on GitHub with:
+- **Use case**: What problem does this solve?
+- **Priority**: How important is this to you?
+- **Alternatives**: What workarounds exist?
+---
+## Development Priorities
+### High Priority (v1.1)
+1. Graphology integration
+2. New MCP tools
+3. VectorEngine refactor
+4. Schema v7
+### Medium Priority (v1.1 or v1.2)
+1. Automatic file splitting
+2. Performance optimizations
+3. Graph cache
+### Low Priority (v1.2+)
+1. Multi-language support
+2. Custom embedding models
+3. Graph visualization
+4. API server mode
+---
+## Breaking Changes
+### v1.1
+- Schema v7 removes `content` column (after VectorEngine refactor)
+- Existing databases auto-migrate from v6 → v7
+- **Action required**: Ensure all nodes have `meta.source` paths before upgrading
+### v2.0 (If needed)
+- Major API changes (TBD)
+- New MCP protocol version
+- Configuration format changes
+---
+## Timeline
+| Version | Target Date | Status | Features |
+|---------|------------|--------|----------|
+| v1.0.0 | 2026-01-06 | ✅ Released | Initial release, MCP server, vector search |
+| v1.0.1 | 2026-01-06 | ✅ Released | Pre-flight validation, multi-source, schema v6 |
+| v1.1.0 | Q1 2026 | 🚧 In Progress | Graphology, new MCP tools, schema v7 |
+| v1.2.0 | Q2 2026 | 📋 Planned | File splitting, performance, advanced features |
+| v2.0.0 | TBD | 💭 Future | Major enhancements, breaking changes |
+---
+## Contributing
+We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
+**Priority areas for contributors:**
+- Graphology integration
+- Test coverage improvements
+- Documentation enhancements
+- Performance benchmarks
+---
+**Last Updated**: 2026-01-06
+**Version**: 1.0.1
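
The roadmap's planned `find_related_documents(node_id, depth)` tool amounts to a depth-limited traversal over a graph that holds only IDs and edges. The sketch below shows one way to do that with a breadth-first search over a plain adjacency map. It is not the planned implementation: it does not use Graphology, and the `Adjacency` type and `findRelated` function are hypothetical names chosen for illustration.

```typescript
// Hypothetical hollow graph: node ID -> IDs of directly linked nodes. No content is stored.
type Adjacency = Map<string, string[]>;

// Breadth-first traversal up to `depth` hops from `start` (default 2, as in the roadmap).
// Returns the related node IDs in discovery order, excluding the start node.
function findRelated(adjacency: Adjacency, start: string, depth = 2): string[] {
  const seen = new Set<string>([start]);
  const related: string[] = [];
  let frontier: string[] = [start];

  for (let level = 0; level < depth && frontier.length > 0; level++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbor of adjacency.get(id) ?? []) {
        if (!seen.has(neighbor)) {
          seen.add(neighbor); // guards against cycles and duplicate visits
          related.push(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return related;
}
```

Because the traversal touches only IDs, content for the returned nodes can be read from the filesystem afterwards, which matches the roadmap's `include_content` default of `false`.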