mindkeg-mcp 0.3.0 → 0.5.0

package/README.md CHANGED
@@ -32,6 +32,8 @@ Unlike traditional RAG systems that chunk large documents, Mind Keg stores **pre
  - API key authentication with per-repository access control
  - SQLite storage (zero dependencies, zero config)
  - Import/export for backup and migration
+ - **Smarter knowledge management**: auto-categorization (KNN voting), conflict detection, smart staleness scoring, access tracking with relevance decay, near-duplicate merging, typed learning relationships
+ - **Enterprise security**: encryption at rest, audit logging, TTL/data retention, Prometheus monitoring, rate limiting, content integrity verification
 
  ## Quick Start
 
@@ -170,6 +172,8 @@ Copy `templates/AGENTS.md` to the root of any repository where you want agents t
  | `deprecate_learning` | Mark a learning as deprecated |
  | `flag_stale` | Flag a learning as potentially outdated |
  | `delete_learning` | Permanently delete a learning |
+ | `merge_learnings` | Merge near-duplicate learnings into a canonical entry |
+ | `relate_learnings` | Create typed relationships between learnings |
  | `list_repositories` | List all repositories with learning counts |
  | `list_workspaces` | List all workspaces with learning counts |
 
@@ -206,6 +210,18 @@ mindkeg dedup-scan --dry-run
  # Backup and restore
  mindkeg export --output backup.json
  mindkeg import backup.json --regenerate-embeddings
+
+ # Data retention
+ mindkeg purge --older-than 90 # Purge learnings older than 90 days
+ mindkeg purge --repository /path/repo # Purge all learnings for a repo
+ mindkeg purge --all --confirm # Purge everything (requires --confirm)
+
+ # Encryption at rest
+ mindkeg encrypt-db # Encrypt existing database (requires MINDKEG_ENCRYPTION_KEY)
+ mindkeg decrypt-db # Decrypt existing database (requires MINDKEG_ENCRYPTION_KEY)
+
+ # Integrity backfill
+ mindkeg backfill-integrity # Compute SHA-256 hashes for legacy learnings
  ```
 
  ## Configuration
@@ -243,24 +259,134 @@ export MINDKEG_EMBEDDING_PROVIDER=none
 
  Disables semantic search and falls back to SQLite FTS5 full-text search — all other features work identically.
 
+ ## Enterprise Security
+
+ Mind Keg ships a suite of security features suitable for corporate and regulated environments.
+
+ ### Encryption at Rest
+
+ Encrypt `content` and `embedding` fields using AES-256-GCM. All other fields (category, tags, timestamps) remain plaintext.
+
+ ```bash
+ # Generate a 256-bit key
+ node -e "console.log(require('crypto').randomBytes(32).toString('base64'))"
+
+ export MINDKEG_ENCRYPTION_KEY=<your-base64-key>
+ mindkeg serve --stdio
+ ```
+
+ To encrypt an existing database in-place:
+
+ ```bash
+ MINDKEG_ENCRYPTION_KEY=<key> mindkeg encrypt-db
+ # Creates a backup automatically before operating
+ ```
+
+ > **Note**: FTS5 keyword search does not work when encryption is enabled. Use FastEmbed or OpenAI embedding providers for search.
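Reviewer's sketch (not part of the package diff): the field-level AES-256-GCM scheme described above could look roughly like the following. The helper names `encryptField` and `decryptField` and the `iv || tag || ciphertext` layout are assumptions made for illustration; the package's actual storage format is not shown in this diff.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical helpers, not mindkeg's actual API or on-disk format.
// Assumed layout of an encrypted field: base64(iv || authTag || ciphertext).
const IV_BYTES = 12; // 96-bit nonce, the size recommended for GCM
const TAG_BYTES = 16; // 128-bit authentication tag

function encryptField(plaintext: string, key: Buffer): string {
  // A fresh random nonce per field; a nonce must never repeat under the same key.
  const iv = randomBytes(IV_BYTES);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

function decryptField(encoded: string, key: Buffer): string {
  const raw = Buffer.from(encoded, "base64");
  const iv = raw.subarray(0, IV_BYTES);
  const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = raw.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the key is wrong or the ciphertext/tag was modified.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because every field gets its own random nonce, two encryptions of the same text produce different ciphertexts. That is consistent with the note above: an index built over stored text cannot match encrypted `content`.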
+
+ ### Audit Logging
+
+ All MCP tool invocations are written to a structured JSON lines audit log (SIEM-compatible).
+
+ ```bash
+ export MINDKEG_AUDIT_LOG=~/.mindkeg/audit.jsonl # default
+ # Or: MINDKEG_AUDIT_LOG=stderr (write to stderr alongside app logs)
+ # Or: MINDKEG_AUDIT_LOG=none (disable)
+ ```
+
+ Each audit entry contains: `timestamp` (ISO 8601), `action`, `actor` (API key prefix), `resource_id`, `result`, `client` transport metadata. Sensitive fields (`content`, `embedding`) are never logged.
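Reviewer's sketch (not part of the package diff): one way to picture an audit entry as a JSON line. The field names come from the list above; the value types, the `toAuditLine` helper, and the sample values are assumptions for illustration only.

```typescript
// Hypothetical shape: field names follow the README text, value types are guesses.
interface AuditEntry {
  timestamp: string; // ISO 8601
  action: string; // name of the invoked MCP tool
  actor: string; // API key prefix, never the full key
  resource_id: string | null;
  result: string;
  client: Record<string, unknown>; // transport metadata
}

// Fields that must never reach the log, per the README.
const SENSITIVE_FIELDS = ["content", "embedding"];

// Builds one JSON line: stamps the time, drops sensitive keys, ends with "\n".
function toAuditLine(
  fields: Omit<AuditEntry, "timestamp"> & Record<string, unknown>,
  now: Date = new Date(),
): string {
  const entry: Record<string, unknown> = { timestamp: now.toISOString(), ...fields };
  for (const key of SENSITIVE_FIELDS) {
    delete entry[key];
  }
  return JSON.stringify(entry) + "\n";
}
```

One object per line keeps the file appendable and lets a log shipper parse each line independently.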
+
+ ### TTL and Data Retention
+
+ Set a global default TTL or a per-learning TTL to automatically expire old entries.
+
+ ```bash
+ export MINDKEG_DEFAULT_TTL_DAYS=365 # Expire all learnings after 1 year by default
+ export MINDKEG_PURGE_INTERVAL_HOURS=24 # Run purge every 24 hours (default)
+ ```
+
+ Per-learning TTL overrides the global default:
+
+ ```json
+ { "content": "...", "ttl_days": 30 }
+ ```
+
+ Manual purge:
+
+ ```bash
+ mindkeg purge --older-than 180 --confirm
+ ```
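Reviewer's sketch (not part of the package diff): the override rule above, combined with the Data Model note that TTL expiry anchors to `updated_at`, can be expressed as a small expiry check. The function names are hypothetical, and treating "no TTL anywhere" as "never expires" is an assumption.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Only the fields this sketch needs; names match the Data Model table.
interface TtlFields {
  ttl_days: number | null;
  updated_at: string; // ISO 8601
}

// Returns the expiry instant, or null when no TTL applies (assumed: never expires).
function expiresAt(learning: TtlFields, defaultTtlDays: number | null): Date | null {
  // The per-learning TTL wins over the global default.
  const ttl = learning.ttl_days ?? defaultTtlDays;
  if (ttl === null) return null;
  return new Date(Date.parse(learning.updated_at) + ttl * MS_PER_DAY);
}

function isExpired(
  learning: TtlFields,
  defaultTtlDays: number | null,
  now: Date = new Date(),
): boolean {
  const expiry = expiresAt(learning, defaultTtlDays);
  return expiry !== null && now.getTime() >= expiry.getTime();
}
```

Under these assumptions, a learning with `ttl_days: 30` last updated on 2024-01-01 expires on 2024-01-31 even when the global default is 365 days.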
+
+ ### Monitoring
+
+ HTTP transport exposes Prometheus-compatible endpoints:
+
+ ```
+ GET /health → JSON: { status, version, uptime, database }
+ GET /metrics → Prometheus text format
+ ```
+
+ Both endpoints are unauthenticated by default. Set `MINDKEG_METRICS_AUTH=true` to require API key auth.
+
+ Metrics exposed: `mindkeg_learnings_total`, `mindkeg_tool_invocations_total`, `mindkeg_tool_duration_seconds`, `mindkeg_errors_total`, `mindkeg_uptime_seconds`, `mindkeg_search_latency_seconds`.
+
+ ### Rate Limiting
+
+ HTTP transport enforces per-API-key token bucket rate limits with separate write and read buckets.
+
+ ```bash
+ export MINDKEG_RATE_LIMIT_WRITE_RPM=100 # default: 100 write req/min per key
+ export MINDKEG_RATE_LIMIT_READ_RPM=300 # default: 300 read req/min per key
+ ```
+
+ Returns HTTP 429 with `Retry-After` header when exceeded. stdio transport is not rate-limited.
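Reviewer's sketch (not part of the package diff): a minimal token bucket with separate read and write buckets per key, using the default limits quoted above. The `TokenBucket` class, the `checkLimit` helper, and the choice to start each bucket full are assumptions; the package's own limiter may differ.

```typescript
// Hypothetical token bucket: capacity equals the per-minute limit and
// tokens refill continuously at limit / 60 000 per millisecond.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerMs: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(requestsPerMinute: number, nowMs: number) {
    this.capacity = requestsPerMinute;
    this.refillPerMs = requestsPerMinute / 60_000;
    this.tokens = requestsPerMinute; // assumed: buckets start full
    this.lastRefillMs = nowMs;
  }

  // Returns 0 when the request is allowed, otherwise the number of whole
  // seconds to wait (a candidate value for a Retry-After header).
  tryConsume(nowMs: number): number {
    const elapsed = nowMs - this.lastRefillMs;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    return Math.ceil((1 - this.tokens) / this.refillPerMs / 1000);
  }
}

// One bucket per (API key prefix, request kind) pair.
const buckets = new Map<string, TokenBucket>();

function checkLimit(apiKeyPrefix: string, kind: "read" | "write", nowMs: number): number {
  const id = `${apiKeyPrefix}:${kind}`;
  let bucket = buckets.get(id);
  if (bucket === undefined) {
    bucket = new TokenBucket(kind === "write" ? 100 : 300, nowMs);
    buckets.set(id, bucket);
  }
  return bucket.tryConsume(nowMs);
}
```

A caller would answer with HTTP 429 whenever `checkLimit` returns a non-zero value. Keeping reads and writes in separate buckets means a burst of writes does not starve searches made with the same key.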
+
+ ### Supply Chain Security
+
+ - npm packages published with `--provenance` (Sigstore attestation via GitHub Actions)
+ - CycloneDX SBOM generated and uploaded as a release asset on every GitHub release
+ - Cosign signatures for npm tarballs uploaded as release assets
+
+ ### Content Integrity
+
+ SHA-256 integrity hashes are computed and stored for every learning on write. Verify on demand:
+
+ ```json
+ { "query": "...", "verify_integrity": true }
+ ```
+
+ Each result includes `integrity_valid: true | false | null` (`null` for legacy learnings without a stored hash).
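Reviewer's sketch (not part of the package diff): the three-valued `integrity_valid` result could be computed as below. Which fields count as "canonical" is not stated in this diff, so hashing `content`, `category`, and sorted `tags` is purely an assumption, as are the function names.

```typescript
import { createHash } from "node:crypto";

// Only the fields this sketch needs; the set of hashed fields is an assumption.
interface HashedLearning {
  content: string;
  category: string;
  tags: string[];
  integrity_hash: string | null;
}

function computeIntegrityHash(
  learning: Pick<HashedLearning, "content" | "category" | "tags">,
): string {
  // Serialize in a fixed order and sort tags so equal learnings hash identically.
  const canonical = JSON.stringify([learning.content, learning.category, [...learning.tags].sort()]);
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}

// true = matches, false = tampered or corrupted, null = no stored hash (legacy row).
function verifyIntegrity(learning: HashedLearning): boolean | null {
  if (learning.integrity_hash === null) return null;
  return computeIntegrityHash(learning) === learning.integrity_hash;
}
```

Returning `null` instead of `false` for rows without a hash keeps legacy learnings from being reported as tampered until `mindkeg backfill-integrity` has run.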
+
+ Backfill integrity hashes for existing learnings:
+
+ ```bash
+ mindkeg backfill-integrity
+ ```
+
  ## Data Model
 
  Each learning contains:
 
- | Field | Type | Notes |
- |--------------|-------------------|------------------------------------------------|
- | `id` | UUID | Auto-generated |
- | `content` | string (max 500) | The atomic learning text |
- | `category` | enum | One of 6 categories |
- | `tags` | string[] | Free-form labels |
- | `repository` | string or null | Repo path; null = workspace or global |
- | `workspace` | string or null | Workspace path; null = repo-specific or global |
- | `group_id` | UUID or null | Link related learnings |
- | `source` | string | Who created this (e.g., "claude-code") |
- | `status` | enum | `active` or `deprecated` |
- | `stale_flag` | boolean | Agent-flagged as potentially outdated |
- | `created_at` | ISO 8601 | Auto-set on creation |
- | `updated_at` | ISO 8601 | Auto-updated on modification |
+ | Field | Type | Notes |
+ |-------------------|-------------------|-------------------------------------------------------------|
+ | `id` | UUID | Auto-generated |
+ | `content` | string (max 500) | The atomic learning text (sanitized on write) |
+ | `category` | enum | One of 6 categories |
+ | `tags` | string[] | Free-form labels |
+ | `repository` | string or null | Repo path; null = workspace or global |
+ | `workspace` | string or null | Workspace path; null = repo-specific or global |
+ | `group_id` | UUID or null | Link related learnings |
+ | `source` | string | Who created this (e.g., "claude-code") |
+ | `status` | enum | `active` or `deprecated` |
+ | `stale_flag` | boolean | Agent-flagged as potentially outdated |
+ | `ttl_days` | integer or null | Per-learning TTL; overrides global `MINDKEG_DEFAULT_TTL_DAYS` |
+ | `source_agent` | string or null | Agent name for provenance tracking |
+ | `integrity_hash` | string or null | SHA-256 hash of canonical fields for tamper detection |
+ | `access_count` | integer | Times returned by search/get_context (feeds ranking) |
+ | `last_accessed_at`| ISO 8601 or null | Last time returned by search/get_context |
+ | `staleness_score` | float 0.0–1.0 | Auto-computed from age, access recency, and conflicts |
+ | `created_at` | ISO 8601 | Auto-set on creation |
+ | `updated_at` | ISO 8601 | Auto-updated on modification; TTL expiry anchors to this |
 
  ## Scoping
 
@@ -326,14 +452,19 @@ Mind Keg works fully offline by default. FastEmbed provides free, local semantic
  ```
  CLI (Commander.js)
  └── init / stats / serve / api-key / migrate / export / import / dedup-scan
+ purge / encrypt-db / decrypt-db / backfill-integrity
 
  src/
  index.ts Entry point, stdio + HTTP transports
  server.ts MCP server + tool registration
  config.ts Config loading (env vars → defaults)
+ audit/ Structured JSON lines audit logger
  auth/ API key generation + validation middleware
- tools/ One file per MCP tool (9 tools)
- services/ LearningService + EmbeddingService
+ crypto/ AES-256-GCM field encryption
+ monitoring/ Prometheus metrics + /health endpoint
+ security/ Content sanitization, integrity hashing, rate limiter
+ tools/ One file per MCP tool (11 tools) + shared tool-utils
+ services/ LearningService + EmbeddingService + PurgeService + ConflictDetector + StalenessEngine
  storage/ StorageAdapter interface + SQLite impl
  models/ Zod schemas + TypeScript types
  utils/ Logger (pino → stderr) + error classes