npm - moflo - Versions diffs - 4.8.80-rc.7 → 4.8.81 - Mend

moflo 4.8.80-rc.7 → 4.8.81

This diff shows the content of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (101)

package/.claude/agents/sparc/pseudocode.md CHANGED

@@ -58,7 +58,7 @@ BEGIN
     END IF
     // Verify password
-    isValid ← PasswordHasher.verify(password, user.passwordHash)
+    isValid ← CredentialStore.verify(password, user.passwordHash)
     IF NOT isValid THEN
         // Log failed attempt
@@ -225,7 +225,7 @@ ANALYSIS: User Authentication Flow
 Time Complexity:
     - Email validation: O(1)
     - Database lookup: O(log n) with index
-    - Password verification: O(1) - fixed bcrypt rounds
+    - Password verification: O(1) - fixed-cost hash compare
     - Session creation: O(1)
     - Total: O(log n)

package/.claude/guidance/shipped/moflo-core-guidance.md CHANGED

@@ -149,7 +149,7 @@ This gives Claude access to 200+ MCP tools (`mcp__moflo__memory_*`, `mcp__moflo_
 | `init`      | 4           | Project initialization with wizard, presets, skills, hooks               |
 | `agent`     | 8           | Agent lifecycle (spawn, list, status, stop, metrics, pool, health, logs) |
 | `swarm`     | 6           | Multi-agent swarm coordination and orchestration                         |
-| `memory`    | 11          | AgentDB memory with vector search (150x-12,500x faster)                  |
+| `memory`    | 11          | sql.js + HNSW vector search, 150x-12,500x faster                         |
 | `mcp`       | 9           | MCP server management and tool execution                                 |
 | `task`      | 6           | Task creation, assignment, and lifecycle                                 |
 | `session`   | 7           | Session state management and persistence                                 |
@@ -535,7 +535,7 @@ auto_index:
 # Memory backend
 memory:
-  backend: sql.js                 # sql.js (WASM) | agentdb | json
+  backend: sql.js                 # sql.js (WASM) | json
   embedding_model: Xenova/all-MiniLM-L6-v2   # 384-dim neural embeddings
   namespace: default              # Default namespace for memory operations
@@ -582,7 +582,7 @@ status_line:
   show_mcp: true                 # MCP server count
   show_security: true            # CVE/security status (dashboard only)
   show_adrs: true                # ADR compliance (dashboard only)
-  show_agentdb: true             # AgentDB vectors/size (dashboard only)
+  show_agentdb: true             # MofloDb vectors/size (dashboard only)
   show_tests: true               # Test file count (dashboard only)
 # Spell step sandboxing (OS-level process isolation for bash steps)

package/.claude/skills/memory-optimization/SKILL.md ADDED

@@ -0,0 +1,121 @@
+---
+name: "memory-optimization"
+description: "Tune moflo's memory stack for speed, RAM, and index quality. Covers HNSW parameters (M, efConstruction, ef), vector quantization, batch operations, and common bottlenecks. Use when scaling past ~100k entries or when search latency regresses."
+---
+# MoFlo Memory Optimization
+When the default `@moflo/memory` settings stop being enough — past ~100k entries, or when p95 search latency climbs — these are the levers.
+## HNSW Parameters
+HNSW has three knobs. They trade build time, query time, memory, and recall.
+```typescript
+import { HNSWIndex } from '@moflo/memory';
+const index = new HNSWIndex({
+  dimensions: 1536,        // must match your embedding model
+  maxElements: 1_000_000,  // pre-allocated capacity
+  M: 16,                   // graph connectivity (default 16)
+  efConstruction: 200,     // build-time search width (default 200)
+  metric: 'cosine',        // 'cosine' | 'l2' | 'ip'
+});
+```
+| Knob | Higher | Lower | When to change |
+|------|--------|-------|----------------|
+| `M` | better recall, more RAM (~2×M pointers per point) | less RAM, worse recall | Bump to 32–64 if recall@10 < 0.95; drop to 8 if memory-bound |
+| `efConstruction` | better index quality, slower build | faster build, worse queries | 200–400 is sweet spot; only lower in test fixtures |
+| `ef` (search-time, passed to `search()`) | better recall, slower queries | faster queries, worse recall | Start at 2×k, raise until recall plateaus |
+Rule of thumb: `M` and `efConstruction` are set once. `ef` is the runtime dial.
+## Quantization
+`@moflo/memory` supports scalar quantization (Float32 → Int8) for a ~4× memory reduction with a ~1-2% recall hit. Turn it on when the index doesn't fit comfortably in RAM.
+```typescript
+const index = new HNSWIndex({
+  dimensions: 1536,
+  maxElements: 5_000_000,
+  quantization: {
+    enabled: true,
+    type: 'scalar',   // scalar (Int8) is the supported path
+    rebuildThreshold: 10_000,
+  },
+});
+```
+Measure recall before/after on your own query distribution — public benchmarks don't predict your domain.
+## Batch Operations
+Single-entry writes pay the HNSW insert cost per call. For bulk ingest, batch:
+```typescript
+const entries: Array<[string, Float32Array]> = buildCorpus();
+// Parallelise at the adapter level; don't await sequentially.
+await Promise.all(
+  entries.map(([id, vec]) => index.addPoint(id, vec))
+);
+// Or via MCP for moflo-native batch into .swarm/memory.db:
+await mcp.memory_store(/* … */);  // upsert: true + Promise.all is fine
+```
+For >10k entries, prefer `bin/build-embeddings.mjs` / `bin/index-all.mjs` — they stream in batches with a progress bar and skip unchanged chunks via a hash file.
+## Caching
+`MofloDbAdapter` has a built-in LRU cache (default 10k entries, 5-min TTL):
+```typescript
+import { MofloDbAdapter } from '@moflo/memory';
+const store = new MofloDbAdapter({
+  cacheEnabled: true,
+  cacheSize: 50_000,      // scale with working set, not total corpus
+  cacheTtl: 10 * 60_000,  // 10 minutes
+});
+```
+Cache hits on exact keys bypass HNSW entirely. If your workload is read-heavy and hits a narrow keyspace, this is the cheapest win.
+## Measuring
+```bash
+npx vitest bench src/modules/memory/benchmarks/vector-search.bench.ts
+```
+The bench prints linear vs HNSW times for 1k and 10k vectors. Run it before and after any parameter change — "it felt faster" is not a benchmark.
+For production memory stats:
+```typescript
+const stats = await mcp.memory_stats({});
+// { entryCount, indexSize, cacheHitRate, avgSearchMs, … }
+```
+## Common Bottlenecks
+| Symptom | Likely cause | Fix |
+|---------|--------------|-----|
+| Cold-start of 5s on first search | HNSW loading from disk | Share a single instance via `beforeAll` in tests; keep the adapter resident in long-running processes |
+| Search latency climbs linearly with `limit` | Over-fetching and re-ranking on the hot path | Lower `limit`; raise `threshold` to prune |
+| Inserts slow past ~100k entries | `maxElements` too close to entry count → reallocation | Set `maxElements` to 2× expected corpus |
+| High RSS on a small corpus | Vector dimension mismatch with index | Confirm `dimensions` matches embedder output (OpenAI = 1536, local models vary) |
+## Anti-Patterns
+- **Don't rebuild the index on every test.** Use a module-level singleton + `beforeAll`. HNSW cold-boot is ~5s.
+- **Don't raise `ef` globally.** Raise it on the specific queries that need recall. Default is fine for 90% of calls.
+- **Don't quantize a small corpus.** Below ~500k vectors the RAM saving doesn't justify the recall cost.
+- **Don't measure in dev mode.** sql.js WASM behaves differently under `NODE_ENV=production`; benches should match the target.
+## See Also
+- `memory-patterns` skill — API usage and namespace design
+- `vector-search` skill — RAG-specific patterns on top of the optimized index
+- `src/modules/memory/benchmarks/` — runnable benches for every knob above
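
Review note on the skill above: its advice to measure recall before and after quantization on your own query distribution can be sketched without any moflo dependency. The block below is a minimal illustration under stated assumptions, not the `@moflo/memory` implementation. Every function name (`makeRng`, `quantize`, `dequantize`, `cosineSimilarity`, `topK`, `recallAtK`) is hypothetical, the quantizer uses one scale factor per vector (the package may do this differently), and search is brute force rather than HNSW so that only the quantization effect is measured.

```typescript
// Hypothetical helpers for illustration; none of these names come from @moflo/memory.

/** Deterministic pseudo-random generator (LCG) so runs are repeatable. */
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 0x100000000;
  };
}

/** Scalar-quantize Float32 -> Int8 with a single scale factor per vector (assumed scheme). */
function quantize(vec: Float32Array): { data: Int8Array; scale: number } {
  let maxAbs = 0;
  for (let i = 0; i < vec.length; i++) maxAbs = Math.max(maxAbs, Math.abs(vec[i]));
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const data = new Int8Array(vec.length);
  for (let i = 0; i < vec.length; i++) data[i] = Math.round(vec[i] / scale);
  return { data, scale };
}

/** Lossy inverse of quantize. */
function dequantize(q: { data: Int8Array; scale: number }): Float32Array {
  const out = new Float32Array(q.data.length);
  for (let i = 0; i < out.length; i++) out[i] = q.data[i] * q.scale;
  return out;
}

function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

/** Brute-force top-k by cosine similarity; returns corpus indices. */
function topK(query: Float32Array, corpus: Float32Array[], k: number): number[] {
  return corpus
    .map((vec, id) => ({ id, score: cosineSimilarity(query, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((hit) => hit.id);
}

/** Fraction of the ground-truth ids that the candidate list also contains. */
function recallAtK(truth: number[], candidate: number[]): number {
  if (truth.length === 0) return 1;
  const expected = new Set(truth);
  return candidate.filter((id) => expected.has(id)).length / truth.length;
}

function randomVec(rng: () => number, dims: number): Float32Array {
  const v = new Float32Array(dims);
  for (let i = 0; i < dims; i++) v[i] = rng() * 2 - 1;
  return v;
}

// Synthetic data only; swap in real embeddings and real queries for a meaningful number.
const rng = makeRng(42);
const dims = 32;
const corpus: Float32Array[] = [];
for (let i = 0; i < 200; i++) corpus.push(randomVec(rng, dims));
const quantized = corpus.map(quantize);
const lossyCorpus = quantized.map(dequantize);

let total = 0;
const queryCount = 20;
for (let i = 0; i < queryCount; i++) {
  const query = randomVec(rng, dims);
  total += recallAtK(topK(query, corpus, 10), topK(query, lossyCorpus, 10));
}
const meanRecall = total / queryCount;
const floatBytes = corpus[0].byteLength;
const int8Bytes = quantized[0].data.byteLength;

console.log(`bytes per vector: ${floatBytes} -> ${int8Bytes}, recall@10: ${meanRecall.toFixed(3)}`);
```

The byte counts show where the roughly 4× saving comes from (four bytes per Float32 component versus one per Int8). The recall figure on synthetic uniform vectors says little about a real corpus, which is the point the skill makes about public benchmarks.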

package/.claude/skills/memory-patterns/SKILL.md ADDED

@@ -0,0 +1,136 @@
+---
+name: "memory-patterns"
+description: "Persistent memory patterns for moflo agents — session memory, long-term knowledge, pattern learning, and cross-session context via moflo's sql.js + HNSW vector store. Use when building stateful agents or assistants that need to remember across runs."
+---
+# MoFlo Memory Patterns
+Persistent, semantically-searchable memory for moflo-enabled projects. Backed by `.swarm/memory.db` (sql.js + HNSW vector index) and exposed through MCP tools.
+## Core API
+Three things to know:
+1. **MCP tools** — call from Claude Code sessions:
+   - `mcp__moflo__memory_store { key, value, namespace?, tags?, ttl?, upsert? }`
+   - `mcp__moflo__memory_search { query, namespace?, limit?, threshold? }` — semantic, HNSW-backed
+   - `mcp__moflo__memory_retrieve { key, namespace? }` — exact key lookup
+   - `mcp__moflo__memory_list { namespace?, limit? }`
+   - `mcp__moflo__memory_delete { key, namespace? }`
+   - `mcp__moflo__memory_stats {}`
+2. **CLI** — `flo-search "<query>" --namespace <ns>` for quick semantic lookup from the shell.
+3. **Namespaces** — namespace + key is the unique identity. Default namespaces shipped by moflo: `knowledge`, `patterns`, `guidance`, `code-map`. Create your own for application memory (e.g. `app:sessions`, `app:users`).
+## Pattern 1: Session Memory
+Rolling per-conversation memory with TTL so old sessions expire:
+```typescript
+// Store a turn
+await mcp.memory_store({
+  namespace: 'app:sessions',
+  key: `${sessionId}:msg:${turnIndex}`,
+  value: { role, content, ts: Date.now() },
+  ttl: 60 * 60 * 24 * 7, // 7 days
+});
+// Recall the session — keys sort lexicographically, so prefixing with
+// sessionId groups a conversation.
+const all = await mcp.memory_list({ namespace: 'app:sessions' });
+const current = all.filter(t => t.key.startsWith(`${sessionId}:`));
+```
+Why namespace-per-use: search scope stays small and delete-by-namespace becomes `memory_list` + `memory_delete` in a loop.
+## Pattern 2: Long-Term Knowledge
+Facts that should survive any session and be findable by meaning, not exact key:
+```typescript
+await mcp.memory_store({
+  namespace: 'knowledge',
+  key: 'auth:session-token-rotation',
+  value: 'Session tokens are rotated every 15 minutes by the auth middleware. Refresh happens transparently on the client.',
+  tags: ['auth', 'security'],
+  upsert: true,
+});
+// Retrieval: semantic, not keyword
+const hits = await mcp.memory_search({
+  namespace: 'knowledge',
+  query: 'how do auth tokens refresh?',
+  limit: 5,
+  threshold: 0.4,
+});
+```
+`upsert: true` is the norm — you're updating your own knowledge, not guarding against collisions.
+## Pattern 3: Pattern Learning (store + promote)
+Capture what worked, then let the next run find it:
+```typescript
+// After a successful task
+await mcp.memory_store({
+  namespace: 'patterns',
+  key: `${patternType}:${shortHash(signature)}`,
+  value: {
+    signature,          // what triggered this
+    approach,           // what you did
+    outcome: 'success', // how it landed
+    occurrences: 1,
+  },
+  tags: [patternType],
+  upsert: true,
+});
+// Before starting a similar task
+const similar = await mcp.memory_search({
+  namespace: 'patterns',
+  query: currentTaskDescription,
+  limit: 3,
+  threshold: 0.5,
+});
+```
+On repeated hits, read the existing entry, increment `occurrences`, and `upsert`.
+## Pattern 4: Context Recall at Prompt Start
+moflo's gate hooks enforce "search memory before exploring files." Mirror that in your own agents:
+```typescript
+async function beforeTask(description: string) {
+  const [guidance, patterns, codeMap] = await Promise.all([
+    mcp.memory_search({ namespace: 'guidance', query: description, limit: 5 }),
+    mcp.memory_search({ namespace: 'patterns',  query: description, limit: 5 }),
+    mcp.memory_search({ namespace: 'code-map',  query: description, limit: 8 }),
+  ]);
+  return { guidance, patterns, codeMap };
+}
+```
+This is the same fan-out the `/flo` spell does — cheap (HNSW, parallel) and replaces a lot of exploratory Glob/Grep.
+## Anti-Patterns
+- **Don't put large blobs in `value`.** Store pointers/keys — the embedding is built from the value string, and huge values bloat the index.
+- **Don't search without a namespace.** Cross-namespace search mixes `guidance` (prose) with `patterns` (structured) — signal collapses.
+- **Don't use sequential numeric keys** if you also want semantic search over them. Pick keys humans/agents would search for by meaning.
+- **Don't use `ttl` on knowledge you want long-lived.** TTL is for sessions, ephemeral cache, WIP notes.
+## Persistence & Indexing
+- File: `.swarm/memory.db` at project root (sql.js).
+- Embeddings: built by `@moflo/embeddings`; indexed with HNSW from `@moflo/memory`.
+- Cold-start cost: ~5 seconds to initialize HNSW. Tests should share a single instance (`beforeAll`, not `beforeEach`).
+- Namespace isolation: each namespace is a logical partition, but the HNSW index spans the table. Query time scales with `limit` and `threshold`, not total row count.
+## See Also
+- `vector-search` skill — RAG patterns over your own documents
+- `memory-optimization` skill — HNSW tuning, quantization, batch ops
+- `.claude/guidance/shipped/moflo-core-guidance.md` — CLI/MCP reference
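
Review note on Pattern 3 above: the skill says to read the existing entry, increment `occurrences`, and upsert on repeated hits, but shows no code for that step. A minimal sketch follows, assuming a key-value client with `retrieve` and `store` methods. `MemoryClient`, `mergePattern`, and `recordPattern` are hypothetical names for illustration; the real MCP tool signatures are the ones listed under Core API in the skill and may return a different shape.

```typescript
// All names here are hypothetical; this is not the moflo MCP client.
interface PatternValue {
  signature: string;
  approach: string;
  outcome: 'success' | 'failure';
  occurrences: number;
}

type NewPattern = Omit<PatternValue, 'occurrences'>;

/** Pure merge step: keep the newest fields, carry the hit count forward. */
function mergePattern(existing: PatternValue | undefined, fresh: NewPattern): PatternValue {
  const previous = existing ? existing.occurrences : 0;
  return { ...fresh, occurrences: previous + 1 };
}

/** Minimal shape of a key-value memory client (assumed, for illustration). */
interface MemoryClient {
  retrieve(args: { namespace: string; key: string }): Promise<PatternValue | undefined>;
  store(args: { namespace: string; key: string; value: PatternValue; upsert: boolean }): Promise<void>;
}

/** Read, merge, then upsert. Not atomic: concurrent writers can lose an increment. */
async function recordPattern(
  memory: MemoryClient,
  key: string,
  fresh: NewPattern,
): Promise<PatternValue> {
  const existing = await memory.retrieve({ namespace: 'patterns', key });
  const next = mergePattern(existing, fresh);
  await memory.store({ namespace: 'patterns', key, value: next, upsert: true });
  return next;
}

// Synchronous demonstration of the merge step on its own.
const firstHit = mergePattern(undefined, {
  signature: 'flaky test in CI',
  approach: 'pin the random seed',
  outcome: 'success',
});
const secondHit = mergePattern(firstHit, {
  signature: 'flaky test in CI',
  approach: 'pin the random seed',
  outcome: 'success',
});
console.log(firstHit.occurrences, secondHit.occurrences); // prints "1 2"
```

Keeping the merge as a pure function makes it testable without a database. The read-then-write in `recordPattern` is a plain sequence, so two agents recording the same pattern at once could overwrite each other's count.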

package/.claude/skills/reasoningbank-intelligence/SKILL.md CHANGED

@@ -1,201 +1,148 @@
 ---
 name: "reasoningbank-intelligence"
-description: "Implement adaptive learning with ReasoningBank for pattern recognition, strategy optimization, and continuous improvement. Use when building self-learning agents, optimizing workflows, or implementing meta-cognitive systems."
+description: "Adaptive learning for moflo agents via ReasoningBank: trajectory storage, verdict judgment, memory distillation, consolidation, and MMR retrieval. Use when building agents that should improve from experience across runs."
 ---
 # ReasoningBank Intelligence
-## What This Skill Does
-Implements ReasoningBank's adaptive learning system for AI agents to learn from experience, recognize patterns, and optimize strategies over time. Enables meta-cognitive capabilities and continuous improvement.
+Trajectory-based learning pipeline for moflo-enabled agents. Records what an agent did, judges the outcome, distills successful runs into reusable patterns, and retrieves relevant prior experience on the next task.
 ## Prerequisites
-- agentic-flow v1.5.11+
-- AgentDB v1.0.4+ (for persistence)
-- Node.js 18+
+- `@moflo/neural` (ships with moflo)
+- Moflo's memory DB at `.swarm/memory.db` (created on first run)
 ## Quick Start
 ```typescript
-import { ReasoningBank } from 'agentic-flow/reasoningbank';
-// Initialize ReasoningBank
-const rb = new ReasoningBank({
-  persist: true,
-  learningRate: 0.1,
-  adapter: 'agentdb' // Use AgentDB for storage
+import { createInitializedReasoningBank } from '@moflo/neural';
+const rb = await createInitializedReasoningBank({
+  namespace: 'reasoning-bank',
+  vectorDimension: 768,
+  retrievalK: 3,
+  mmrLambda: 0.7,              // 0=pure relevance, 1=pure diversity
+  distillationThreshold: 0.6,  // min verdict score to keep
+  dedupThreshold: 0.95,
 });
+```
-// Record task outcome
-await rb.recordExperience({
-  task: 'code_review',
-  approach: 'static_analysis_first',
-  outcome: {
-    success: true,
-    metrics: {
-      bugs_found: 5,
-      time_taken: 120,
-      false_positives: 1
-    }
-  },
-  context: {
-    language: 'typescript',
-    complexity: 'medium'
-  }
-});
+## The Pipeline
-// Get optimal strategy
-const strategy = await rb.recommendStrategy('code_review', {
-  language: 'typescript',
-  complexity: 'high'
-});
+Four stages. You typically call each once per task:
+```text
+1. Record trajectory  →  storeTrajectory({ id, input, actions, outcome, reward, ... })
+2. Judge              →  const verdict = await rb.judge(trajectory)
+3. Distill            →  const memory  = await rb.distill(trajectory)   // if verdict good enough
+4. Retrieve (next task) →  const hits  = await rb.retrieveByContent(query, k)
 ```
-## Core Features
+### 1. Record
-### 1. Pattern Recognition
 ```typescript
-// Learn patterns from data
-await rb.learnPattern({
-  pattern: 'api_errors_increase_after_deploy',
-  triggers: ['deployment', 'traffic_spike'],
-  actions: ['rollback', 'scale_up'],
-  confidence: 0.85
-});
-// Match patterns
-const matches = await rb.matchPatterns(currentSituation);
+const trajectory = {
+  id: taskId,
+  input: userRequest,
+  actions: ['read_file', 'edit_file', 'run_tests'],
+  outcome: 'success' as const,
+  reward: 1.0,                // 0..1
+  metadata: { toolCalls: 3, durationMs: 1800 },
+  timestamp: new Date(),
+};
+rb.storeTrajectory(trajectory);
 ```
-### 2. Strategy Optimization
+### 2. Judge
+The built-in judge scores on outcome + reward + action-step quality. No external LLM call.
 ```typescript
-// Compare strategies
-const comparison = await rb.compareStrategies('bug_fixing', [
-  'tdd_approach',
-  'debug_first',
-  'reproduce_then_fix'
-]);
-// Get best strategy
-const best = comparison.strategies[0];
-console.log(`Best: ${best.name} (score: ${best.score})`);
+const verdict = await rb.judge(trajectory);
+// { score: 0-1, outcome: 'success' | ..., reasoning: string }
 ```
-### 3. Continuous Learning
+Swap in your own judge by extending `ReasoningBank` if you want LLM-in-the-loop scoring — this was designed as a rule-based baseline so the hot path stays cheap.
+### 3. Distill
+Compresses a trajectory into a reusable `DistilledMemory` (signature, approach, outcome-tagged). Skips if the verdict is below `distillationThreshold`.
 ```typescript
-// Enable auto-learning from all tasks
-await rb.enableAutoLearning({
-  threshold: 0.7,        // Only learn from high-confidence outcomes
-  updateFrequency: 100   // Update models every 100 experiences
-});
+const memory = await rb.distill(trajectory);
+// null if verdict below threshold → not worth learning from
 ```
-## Advanced Usage
+For batch runs (nightly, offline replay):
-### Meta-Learning
 ```typescript
-// Learn about learning
-await rb.metaLearn({
-  observation: 'parallel_execution_faster_for_independent_tasks',
-  confidence: 0.95,
-  applicability: {
-    task_types: ['batch_processing', 'data_transformation'],
-    conditions: ['tasks_independent', 'io_bound']
-  }
-});
+const memories = await rb.distillBatch(trajectories);
 ```
-### Transfer Learning
+### 4. Retrieve
+Query by text (embedding generated for you) or by pre-computed vector:
 ```typescript
-// Apply knowledge from one domain to another
-await rb.transferKnowledge({
-  from: 'code_review_javascript',
-  to: 'code_review_typescript',
-  similarity: 0.8
-});
+const hits = await rb.retrieveByContent(newTaskDescription, 5);
+// [{ memory, relevanceScore, diversityScore, combinedScore }]
 ```
-### Adaptive Agents
+MMR (maximal marginal relevance) prevents the top-K from collapsing to "five slight variations of the same thing" — tune `mmrLambda` to push more diversity.
+## Consolidation
+Memory quality degrades over time without maintenance. Run consolidation periodically (once a week, once after N new entries, etc.):
 ```typescript
-// Create self-improving agent
-class AdaptiveAgent {
-  async execute(task: Task) {
-    // Get optimal strategy
-    const strategy = await rb.recommendStrategy(task.type, task.context);
-    // Execute with strategy
-    const result = await this.executeWithStrategy(task, strategy);
-    // Learn from outcome
-    await rb.recordExperience({
-      task: task.type,
-      approach: strategy.name,
-      outcome: result,
-      context: task.context
-    });
-    return result;
-  }
-}
+const result = await rb.consolidate();
+// { removedDuplicates, contradictionsDetected, prunedPatterns, mergedPatterns }
 ```
-## Integration with AgentDB
+Consolidation:
+- Removes duplicates above `dedupThreshold` similarity.
+- Detects contradictions (same signature, opposite outcomes) if `enableContradictionDetection`.
+- Prunes entries older than `maxPatternAgeDays`.
+- Merges semantically-adjacent patterns.
-```typescript
-// Persist ReasoningBank data
-await rb.configure({
-  storage: {
-    type: 'agentdb',
-    options: {
-      database: './reasoning-bank.db',
-      enableVectorSearch: true
-    }
-  }
-});
+## Persistence
-// Query learned patterns
-const patterns = await rb.query({
-  category: 'optimization',
-  minConfidence: 0.8,
-  timeRange: { last: '30d' }
-});
-```
+ReasoningBank persists through `MofloDbAdapter` to `.swarm/memory.db`. Set `enableMofloDb: false` for ephemeral in-memory use (tests).
-## Performance Metrics
+The `namespace` config isolates reasoning-bank entries from general memory. Default is `reasoning-bank`.
-```typescript
-// Track learning effectiveness
-const metrics = await rb.getMetrics();
-console.log(`
-  Total Experiences: ${metrics.totalExperiences}
-  Patterns Learned: ${metrics.patternsLearned}
-  Strategy Success Rate: ${metrics.strategySuccessRate}
-  Improvement Over Time: ${metrics.improvement}
-`);
-```
+## Anti-Patterns
+- **Don't record every step as a separate trajectory.** A trajectory = one task. Steps are a field inside the trajectory.
+- **Don't skip the judge.** Distilling every trajectory poisons the pool with failures — that's what `distillationThreshold` guards against.
+- **Don't run consolidation on the hot path.** It's a sweep — do it out-of-band.
+- **Don't share `namespace` between different agent roles.** Keep `reasoning-bank:reviewer`, `reasoning-bank:researcher`, etc. separate; signatures overlap otherwise.
+- **Don't raise `retrievalK` to tame bad retrieval.** Tune `mmrLambda` or the embedder instead — more K just dilutes the top results.
-## Best Practices
+## Integration with moflo's Hooks
-1. **Record consistently**: Log all task outcomes, not just successes
-2. **Provide context**: Rich context improves pattern matching
-3. **Set thresholds**: Filter low-confidence learnings
-4. **Review periodically**: Audit learned patterns for quality
-5. **Use vector search**: Enable semantic pattern matching
+Moflo's session/hook system already wires ReasoningBank into the `/flo` spell and the SubAgentStart hook. If you're building a custom agent that should participate, hook into `post-task`:
-## Troubleshooting
+```typescript
+await mcp.hooks_post_task({
+  trajectoryId: taskId,
+  outcome: 'success',
+  reward: 1.0,
+});
+```
-### Issue: Poor recommendations
-**Solution**: Ensure sufficient training data (100+ experiences per task type)
+That records, judges, and distills in one call.
-### Issue: Slow pattern matching
-**Solution**: Enable vector indexing in AgentDB
+## Performance
-### Issue: Memory growing large
-**Solution**: Set TTL for old experiences or enable pruning
+- Retrieve (k=5, corpus of 10k): ~3–8ms.
+- Distill: single call ≈ vector embed + a few similarity checks; ~10–50ms.
+- Consolidate: O(n²) in the namespace — run offline for corpora > 10k.
-## Learn More
+## See Also
-- ReasoningBank Guide: agentic-flow/src/reasoningbank/README.md
-- AgentDB Integration: packages/agentdb/docs/reasoningbank.md
-- Pattern Learning: docs/reasoning/patterns.md
+- `memory-patterns` skill — for non-trajectory memory (sessions, knowledge)
+- `memory-optimization` skill — HNSW tuning, quantization
+- `src/modules/neural/src/reasoning-bank.ts` — full API
+- `src/modules/neural/src/domain/services/learning-service.ts` — trajectory types
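
Review note on the Retrieve step above: the skill names MMR (maximal marginal relevance) and the `mmrLambda` knob without showing the selection loop. The sketch below is a generic greedy MMR under stated assumptions, not the `@moflo/neural` code. `Candidate`, `cosine`, and `mmrSelect` are hypothetical names, and the weighting follows the inline comment in the Quick Start (0 = pure relevance, 1 = pure diversity). The textbook formulation usually puts the weight on relevance instead, so the actual convention is worth confirming in `reasoning-bank.ts`.

```typescript
// Hypothetical types and functions for illustration only.
interface Candidate {
  id: string;
  relevance: number; // similarity to the query, assumed precomputed
  vector: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

/**
 * Greedy MMR: repeatedly pick the candidate with the best trade-off between
 * relevance and similarity to what is already selected.
 * lambda = 0 ranks by relevance alone; lambda = 1 looks only at redundancy.
 */
function mmrSelect(candidates: Candidate[], k: number, lambda: number): string[] {
  const remaining = candidates.slice();
  const selected: Candidate[] = [];
  while (selected.length < k && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      const candidate = remaining[i];
      // Redundancy penalty: closest match among already-selected items (floored at 0).
      let maxSim = 0;
      for (const chosen of selected) {
        maxSim = Math.max(maxSim, cosine(candidate.vector, chosen.vector));
      }
      const score = (1 - lambda) * candidate.relevance - lambda * maxSim;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    }
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected.map((c) => c.id);
}

const pool: Candidate[] = [
  { id: 'a', relevance: 0.9, vector: [1, 0] },
  { id: 'b', relevance: 0.85, vector: [1, 0.01] }, // near-duplicate of 'a'
  { id: 'c', relevance: 0.6, vector: [0, 1] },
];

const byRelevance = mmrSelect(pool, 2, 0);
const diversified = mmrSelect(pool, 2, 0.7);
console.log(byRelevance.join(','), diversified.join(',')); // prints "a,b a,c"
```

With `lambda = 0` the near-duplicate `b` takes the second slot; at `0.7` the penalty for resembling `a` pushes it below the less relevant but distinct `c`, which is the "five slight variations" problem the skill describes.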

package/.claude/skills/v3-security-overhaul/SKILL.md CHANGED

@@ -26,16 +26,6 @@ npm update @anthropic-ai/claude-code@^2.0.31
 npm audit --audit-level high
 ```
-### CVE-2: Weak Password Hashing
-```typescript
-// ❌ Old: SHA-256 with hardcoded salt
-const hash = crypto.createHash('sha256').update(password + salt).digest('hex');
-// ✅ New: bcrypt with 12 rounds
-import bcrypt from 'bcrypt';
-const hash = await bcrypt.hash(password, 12);
-```
 ### CVE-3: Hardcoded Credentials
 ```typescript
 // ✅ Generate secure random credentials