engrm 0.1.0 → 0.2.0

Files changed (98)
  1. package/README.md +214 -73
  2. package/bin/build.mjs +97 -0
  3. package/bin/engrm.mjs +13 -0
  4. package/dist/cli.js +2712 -0
  5. package/dist/hooks/elicitation-result.js +1786 -0
  6. package/dist/hooks/post-tool-use.js +2357 -0
  7. package/dist/hooks/pre-compact.js +1321 -0
  8. package/dist/hooks/sentinel.js +1168 -0
  9. package/dist/hooks/session-start.js +1473 -0
  10. package/dist/hooks/stop.js +1834 -0
  11. package/dist/server.js +16628 -0
  12. package/package.json +29 -4
  13. package/packs/api-best-practices.json +182 -0
  14. package/packs/nextjs-patterns.json +68 -0
  15. package/packs/node-security.json +68 -0
  16. package/packs/python-django.json +68 -0
  17. package/packs/react-gotchas.json +182 -0
  18. package/packs/typescript-patterns.json +67 -0
  19. package/packs/web-security.json +182 -0
  20. package/.mcp.json +0 -9
  21. package/AUTH-DESIGN.md +0 -436
  22. package/BRIEF.md +0 -197
  23. package/CLAUDE.md +0 -44
  24. package/COMPETITIVE.md +0 -174
  25. package/CONTEXT-OPTIMIZATION.md +0 -305
  26. package/INFRASTRUCTURE.md +0 -252
  27. package/MARKET.md +0 -230
  28. package/PLAN.md +0 -278
  29. package/SENTINEL.md +0 -293
  30. package/SERVER-API-PLAN.md +0 -553
  31. package/SPEC.md +0 -843
  32. package/SWOT.md +0 -148
  33. package/SYNC-ARCHITECTURE.md +0 -294
  34. package/VIBE-CODER-STRATEGY.md +0 -250
  35. package/bun.lock +0 -375
  36. package/hooks/post-tool-use.ts +0 -144
  37. package/hooks/session-start.ts +0 -64
  38. package/hooks/stop.ts +0 -131
  39. package/mem-page.html +0 -1305
  40. package/src/capture/dedup.test.ts +0 -103
  41. package/src/capture/dedup.ts +0 -76
  42. package/src/capture/extractor.test.ts +0 -245
  43. package/src/capture/extractor.ts +0 -330
  44. package/src/capture/quality.test.ts +0 -168
  45. package/src/capture/quality.ts +0 -104
  46. package/src/capture/retrospective.test.ts +0 -115
  47. package/src/capture/retrospective.ts +0 -121
  48. package/src/capture/scanner.test.ts +0 -131
  49. package/src/capture/scanner.ts +0 -100
  50. package/src/capture/scrubber.test.ts +0 -144
  51. package/src/capture/scrubber.ts +0 -181
  52. package/src/cli.ts +0 -517
  53. package/src/config.ts +0 -238
  54. package/src/context/inject.test.ts +0 -940
  55. package/src/context/inject.ts +0 -382
  56. package/src/embeddings/backfill.ts +0 -50
  57. package/src/embeddings/embedder.test.ts +0 -76
  58. package/src/embeddings/embedder.ts +0 -139
  59. package/src/lifecycle/aging.test.ts +0 -103
  60. package/src/lifecycle/aging.ts +0 -36
  61. package/src/lifecycle/compaction.test.ts +0 -264
  62. package/src/lifecycle/compaction.ts +0 -190
  63. package/src/lifecycle/purge.test.ts +0 -100
  64. package/src/lifecycle/purge.ts +0 -37
  65. package/src/lifecycle/scheduler.test.ts +0 -120
  66. package/src/lifecycle/scheduler.ts +0 -101
  67. package/src/provisioning/browser-auth.ts +0 -172
  68. package/src/provisioning/provision.test.ts +0 -198
  69. package/src/provisioning/provision.ts +0 -94
  70. package/src/register.test.ts +0 -167
  71. package/src/register.ts +0 -178
  72. package/src/server.ts +0 -436
  73. package/src/storage/migrations.test.ts +0 -244
  74. package/src/storage/migrations.ts +0 -261
  75. package/src/storage/outbox.test.ts +0 -229
  76. package/src/storage/outbox.ts +0 -131
  77. package/src/storage/projects.test.ts +0 -137
  78. package/src/storage/projects.ts +0 -184
  79. package/src/storage/sqlite.test.ts +0 -798
  80. package/src/storage/sqlite.ts +0 -934
  81. package/src/storage/vec.test.ts +0 -198
  82. package/src/sync/auth.test.ts +0 -76
  83. package/src/sync/auth.ts +0 -68
  84. package/src/sync/client.ts +0 -183
  85. package/src/sync/engine.test.ts +0 -94
  86. package/src/sync/engine.ts +0 -127
  87. package/src/sync/pull.test.ts +0 -279
  88. package/src/sync/pull.ts +0 -170
  89. package/src/sync/push.test.ts +0 -117
  90. package/src/sync/push.ts +0 -230
  91. package/src/tools/get.ts +0 -34
  92. package/src/tools/pin.ts +0 -47
  93. package/src/tools/save.test.ts +0 -301
  94. package/src/tools/save.ts +0 -231
  95. package/src/tools/search.test.ts +0 -69
  96. package/src/tools/search.ts +0 -181
  97. package/src/tools/timeline.ts +0 -64
  98. package/tsconfig.json +0 -22
package/PLAN.md DELETED
@@ -1,278 +0,0 @@
1
- # Implementation Plan — Engrm
2
-
3
- ## Approach
4
-
5
- **Internal tooling first.** We're building this so our dev team can share project context across machines and developers. The public product comes later — first it needs to work for us.
6
-
7
- Built from scratch. claude-mem is a reference for how to hook into Claude Code (hooks, MCP registration, observation capture patterns) but no code is shared. This avoids AGPL licensing issues and lets us design the architecture around cross-device team memory from the start.
8
-
9
- ## Component Architecture
10
-
11
- ```
12
- engrm/
13
- ├── src/
14
- │ ├── server.ts # MCP protocol handler (entry point)
15
- │ ├── tools/ # MCP tool implementations
16
- │ │ ├── search.ts # search() — hybrid local + remote, project-scoped
17
- │ │ ├── timeline.ts # timeline() — chronological context
18
- │ │ ├── get.ts # get_observations() — fetch by ID
19
- │ │ ├── save.ts # save_observation() — manual save with quality scoring
20
- │ │ └── pin.ts # pin_observation() — prevent aging
21
- │ ├── capture/ # Observation extraction
22
- │ │ ├── extractor.ts # Extract observations from tool use
23
- │ │ ├── scrubber.ts # Secret/PII scrubbing
24
- │ │ ├── quality.ts # Quality scoring (0.0-1.0)
25
- │ │ └── dedup.ts # Near-duplicate detection (title similarity)
26
- │ ├── storage/ # Local storage layer
27
- │ │ ├── sqlite.ts # SQLite database (source of truth)
28
- │ │ ├── migrations.ts # Schema migrations
29
- │ │ ├── outbox.ts # Sync outbox queue
30
- │ │ └── projects.ts # Project identity (git remote → canonical ID)
31
- │ ├── lifecycle/ # Observation lifecycle management
32
- │ │ ├── aging.ts # Daily: active → aging after 30 days
33
- │ │ ├── compaction.ts # Weekly: aging → archived, generate digests
34
- │ │ └── purge.ts # Monthly: delete archived > 12 months
35
- │ ├── sync/ # Remote sync layer
36
- │ │ ├── client.ts # Candengo Vector REST client
37
- │ │ └── engine.ts # Sync engine (outbox flush, backfill, archival cleanup)
38
- │ ├── context/ # Context injection
39
- │ │ └── inject.ts # Session start context builder
40
- │ └── config.ts # Configuration management
41
-
42
- ├── hooks/ # Claude Code hooks
43
- │ ├── post-tool-use.sh # Observation capture
44
- │ └── stop.sh # Session summary + sync flush
45
-
46
- ├── package.json
47
- ├── tsconfig.json
48
- ├── BRIEF.md
49
- ├── SPEC.md
50
- ├── PLAN.md # This file
51
- └── CLAUDE.md
52
- ```
53
-
54
- ---
55
-
56
- ## Phase 1: Local MCP Server + Provisioning (Weeks 1-2)
57
-
58
- **Goal**: Working MCP server with local SQLite storage, and a self-service provisioning flow so any developer can go from zero to working memory in under 2 minutes.
59
-
60
- ### 1.1 MCP Server Core
61
-
62
- | Task | Description | Effort |
63
- |---|---|---|
64
- | Project scaffolding | TypeScript + Bun, MCP SDK, bun:sqlite | S |
65
- | SQLite schema + migrations | projects, observations, sessions, sync_outbox tables (see SPEC §1-2) | M |
66
- | Project identity detection | Auto-detect canonical project ID from git remote URL, normalise, store in projects table | M |
67
- | MCP tool: `save_observation` | Save to local SQLite with project FK, quality score, add to sync outbox | S |
68
- | MCP tool: `search` | Local SQLite FTS5 search, project-scoped by default, quality-weighted ranking | M |
69
- | MCP tool: `get_observations` | Fetch by IDs from local SQLite | S |
70
- | MCP tool: `timeline` | Chronological context around an observation | M |
71
- | MCP tool: `pin_observation` | Pin/unpin observations to prevent aging | XS |
72
- | Quality scoring | Score observations at capture time (0.0-1.0) based on type, content signals (see SPEC §2) | M |
73
- | Secret scrubber | Regex-based scrubbing of API keys, passwords, tokens before storage | M |
74
- | Relative file paths | Store file paths relative to project root, resolve at capture time | S |
75
- | Configuration | `~/.engrm/settings.json` — local paths, remote config | S |
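The "Project identity detection" row above turns a git remote URL into a canonical ID so the same project matches across machines. The plan does not spell out the normalisation rules, so the following is only a minimal sketch under assumed rules: strip the scheme and credentials, drop a trailing `.git`, and lowercase. The function name `canonicalProjectId` and the `host/path` output format are hypothetical.

```typescript
// Hypothetical normaliser; the canonical "host/path" format is an assumption,
// not taken from the deleted source files.
function canonicalProjectId(remoteUrl: string): string {
  let url = remoteUrl.trim();
  // scp-style SSH remote, e.g. git@github.com:org/repo.git
  const scp = url.match(/^[^@\/\s]+@([^:\/\s]+):(.+)$/);
  if (scp) {
    url = scp[1] + "/" + scp[2];
  } else {
    // URL-style remote: strip the scheme, then any user[:password]@ prefix
    url = url.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "").replace(/^[^@\/]+@/, "");
  }
  // Drop trailing slashes, then a trailing ".git", then lowercase
  return url.replace(/\/+$/, "").replace(/\.git$/i, "").toLowerCase();
}
```

With these rules, an SSH clone and an HTTPS clone of the same repository resolve to the same ID, which is the property the plan relies on.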
76
-
77
- ### 1.2 Self-Provisioning
78
-
79
- | Task | Description | Effort |
80
- |---|---|---|
81
- | Engrm landing page | `www.engrm.dev` — product page + signup + install instructions | M |
82
- | Account provisioning backend | Signup → create mem_accounts row, namespace, provision token | M |
83
- | Provision API endpoint | `POST /v1/mem/provision` — exchange token for permanent credentials | S |
84
- | `npx engrm init` | CLI command: redeem token, write settings, register MCP + hooks in Claude Code | M |
85
- | Team invite flow | Admin creates team → invite URL → member joins with team namespace pre-configured | M |
86
- | Self-hosted init path | `--url` flag for custom endpoints, `--manual` for air-gapped environments | S |
87
-
88
- ### 1.3 Provisioning Flow
89
-
90
- ```
91
- 1. Developer visits www.engrm.dev
92
- 2. Signs up (email or GitHub OAuth)
93
- 3. Backend provisions account + namespace
94
- 4. Page shows personalised install command:
95
- npx engrm init --token=cmt_abc123...
96
- 5. Developer runs command in terminal
97
- 6. Plugin exchanges token → gets API key, endpoint, namespace
98
- 7. Plugin writes settings.json, registers MCP server + hooks in Claude Code
99
- 8. Next Claude Code session has memory
100
- ```
101
-
102
- For teams: admin creates team at `www.engrm.dev/team`, shares invite link, team members get pre-configured for the shared namespace.
103
-
104
- **Deliverable**: A working MCP server that Claude Code can call, with self-service provisioning from candengo.com. Any developer can sign up and be running in under 2 minutes.
105
-
106
- ---
107
-
108
- ## Phase 2: Claude Code Hooks (Week 3)
109
-
110
- **Goal**: Automatic observation capture from Claude Code sessions.
111
-
112
- | Task | Description | Effort |
113
- |---|---|---|
114
- | PostToolUse hook | Shell script that extracts observations from tool results | L |
115
- | Stop hook | Session summary generation, sync flush | M |
116
- | MCP server registration | `.mcp.json` config for Claude Code | XS |
117
- | Hooks registration | `hooks.json` for Claude Code | XS |
118
- | Context injection | Inject relevant history on session start (via MCP tool call) | M |
119
- | Observation quality filtering | Skip trivial tool uses (ls, cat of small files), focus on meaningful work | M |
120
-
121
- **Deliverable**: Claude Code automatically captures observations as you work. Session summaries on exit. Relevant history injected on start.
122
-
123
- ### Observation Extraction Design
124
-
125
- This is the hardest problem. What makes a good observation?
126
-
127
- **Capture triggers** (PostToolUse):
128
- - File edits → what changed and why
129
- - Command execution with errors → what failed and how it was fixed
130
- - Multiple file reads in sequence → likely investigating something
131
- - Test runs → pass/fail context
132
-
133
- **Skip** (low signal):
134
- - Simple file reads (single `cat`)
135
- - `ls`, `pwd`, `git status` and similar navigation
136
- - Repeated identical tool calls
137
-
138
- **Extraction approach**: The hook sends the tool name + result summary to the MCP server. The server decides whether it's worth capturing based on the tool type and content. Quality score is assigned at capture time. Observations are batched per-session and deduplicated (title similarity > 0.8 against last 24h → merge into existing).
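The deduplication rule above (title similarity > 0.8 against the last 24h, then merge) can be sketched as follows. The plan does not name the similarity metric, so Jaccard overlap of lowercase word sets is used here purely as an assumption; `Obs`, `titleSimilarity`, and `findDuplicate` are hypothetical names.

```typescript
interface Obs {
  id: number;
  project: string;
  title: string;
  createdAtEpoch: number; // seconds since the Unix epoch
}

// Unique lowercase alphanumeric words in a title
function tokens(s: string): string[] {
  const words = s.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
  return words.filter((w, i) => words.indexOf(w) === i);
}

// Jaccard similarity: shared words / all distinct words (assumed metric)
function titleSimilarity(a: string, b: string): number {
  const ta = tokens(a);
  const tb = tokens(b);
  if (ta.length === 0 && tb.length === 0) return 1;
  const shared = ta.filter((w) => tb.indexOf(w) !== -1).length;
  return shared / (ta.length + tb.length - shared);
}

const DAY_SECONDS = 24 * 60 * 60;

// Returns the first existing observation to merge into, or undefined if the candidate is new
function findDuplicate(
  candidate: { project: string; title: string },
  recent: Obs[],
  nowEpoch: number,
  threshold: number = 0.8
): Obs | undefined {
  for (const existing of recent) {
    const sameProject = existing.project === candidate.project;
    const withinDay = nowEpoch - existing.createdAtEpoch <= DAY_SECONDS;
    if (sameProject && withinDay && titleSimilarity(existing.title, candidate.title) > threshold) {
      return existing;
    }
  }
  return undefined;
}
```

A word-set metric is cheap and needs no embeddings, but it ignores word order; an edit-distance or trigram metric would be an equally plausible reading of "title similarity".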
139
-
140
- ### Observation Lifecycle + Deduplication
141
-
142
- | Task | Description | Effort |
143
- |---|---|---|
144
- | Deduplication on save | Check title similarity against last 24h for same project, merge if > 0.8 | M |
145
- | Aging job | Daily: move active observations older than 30 days to aging (0.7x search weight) | S |
146
- | Archival + compaction | Weekly: observations > 90 days grouped by session, summarised into digest | L |
147
- | Purge job | Monthly: delete archived observations > 12 months (keep digests + pinned) | S |
148
- | FTS5 index maintenance | Remove archived observations from FTS5 index during compaction | S |
149
- | Quota check | Count active+aging observations for free tier enforcement | S |
150
-
151
- **Why this matters now**: Without lifecycle management, a developer generating ~100 observations/day hits 10K in ~3 months. Search results degrade as old, irrelevant observations pollute rankings. Compaction turns 25 old observations from a debugging session into one useful digest. Aging reduces the weight of stale knowledge. The free tier stays usable because only active+aging observations count toward the 10K limit — compacted observations are free.
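The lifecycle rules in the table above can be condensed into a small state function. This is a sketch under stated assumptions: age is measured from creation, "12 months" is taken as 365 days, pinned observations never leave `active`, and archived observations get a search weight of 0 because they are removed from the FTS5 index. All names are hypothetical.

```typescript
type Lifecycle = "active" | "aging" | "archived" | "purged";

// Thresholds from the plan: 30 days, 90 days, 12 months (assumed to mean 365 days)
function lifecycleFor(ageDays: number, pinned: boolean): Lifecycle {
  if (pinned) return "active"; // pin_observation prevents aging
  if (ageDays > 365) return "purged";
  if (ageDays > 90) return "archived";
  if (ageDays > 30) return "aging";
  return "active";
}

// Aging observations are down-weighted to 0.7x; archived ones are assumed unsearchable
function searchWeight(state: Lifecycle): number {
  if (state === "active") return 1.0;
  if (state === "aging") return 0.7;
  return 0;
}

// Only active + aging observations count toward the free-tier quota
function countsTowardQuota(state: Lifecycle): boolean {
  return state === "active" || state === "aging";
}
```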
152
-
153
- ---
154
-
155
- ## Phase 3: Cross-Device Sync + Team Memory (Weeks 4-6)
156
-
157
- **Goal**: Offline-first sync to Candengo Vector with team support from day one. Work on laptop, continue on desktop. Other developers' observations are searchable too.
158
-
159
- Team memory isn't a separate phase — it's the reason we're building this. User identity, attribution, and shared namespaces are built into the sync layer from the start.
160
-
161
- ### 3.1 Candengo Vector API Prep
162
-
163
- | Task | Description | Effort |
164
- |---|---|---|
165
- | Metadata filtering on search API | `metadata_filters` param on `/v1/search` — filter by `project_canonical`, `user_id`, etc. | S |
166
- | Document listing by source_type | `GET /v1/documents?source_type=X` with pagination | S |
167
- | Document deletion by source_id | `DELETE /v1/documents/{source_id}` — needed for archival/compaction cleanup | S |
168
- | Device/user ID tracking in metadata | Accept `device_id`, `user_id` in metadata | XS |
169
-
170
- ### 3.2 Sync Engine
171
-
172
- | Task | Description | Effort |
173
- |---|---|---|
174
- | Candengo Vector REST client | TypeScript HTTP client for `/v1/ingest`, `/v1/search`, `/v1/ingest/batch`, `/v1/documents/{id}` | M |
175
- | Fire-and-forget sync | On observation save → attempt immediate push | S |
176
- | Background sync timer | Every 30s → flush pending outbox items (batch of 50) | S |
177
- | Startup backfill | On boot → sync observations saved while offline (high-water-mark) | M |
178
- | Connectivity detection | Skip sync when offline, resume when connected | S |
179
- | Retry with exponential backoff | Failed syncs retry 30s, 60s, 120s, max 5min | S |
180
- | Observation → Candengo mapping | Map to ingest format with `project_canonical` in metadata, source_id = `{user}-{device}-obs-{id}` | M |
181
- | Archival sync | When compaction runs: delete archived source_ids from Vector, ingest digest | M |
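The retry row above lists delays of 30s, 60s, 120s with a 5-minute cap. A minimal sketch of that schedule, assuming the delay doubles on each consecutive failure (the function name is hypothetical):

```typescript
// Delay before the next sync attempt, given how many attempts have failed so far.
// Doubles from a 30s base and is capped at 300s (5 minutes).
function retryDelaySeconds(failedAttempts: number): number {
  const baseSeconds = 30;
  const capSeconds = 300;
  const exponent = Math.max(0, failedAttempts - 1);
  return Math.min(capSeconds, baseSeconds * Math.pow(2, exponent));
}

// retryDelaySeconds(1) → 30, (2) → 60, (3) → 120, (4) → 240, (5) → 300
```

The cap is applied after the multiplication, so very large attempt counts still resolve to 300 seconds.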
182
-
183
- ### 3.3 Team + Hybrid Search
184
-
185
- | Task | Description | Effort |
186
- |---|---|---|
187
- | Hybrid search orchestrator | Query local FTS5 + Candengo `/v1/search` in parallel, scoped by `project_canonical` | M |
188
- | Result merging + deduplication | Merge by source_id, weighted scoring (semantic × quality × lifecycle) | M |
189
- | Graceful degradation | Candengo unreachable → local-only search (transparent) | S |
190
- | Device ID generation | Auto-generate stable device ID on first run | XS |
191
- | User identity + attribution | `user_id` in all observations, "david/laptop" in results | S |
192
- | Source ID namespacing | `{user_id}-{device_id}-obs-{local_id}` prevents all collisions | S |
193
- | Visibility controls | `shared` / `personal` / `secret` flags | M |
194
- | Team search scope | Search own + team observations by default, filtered by `project_canonical` | M |
195
- | Cross-project search | Support `project: "*"` to search across all projects | S |
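The merge step in the table above (merge by source_id, score as semantic × quality × lifecycle) might look like the sketch below. The `Hit` shape and the choice to keep the higher-scoring copy of a duplicate are assumptions; the 0.7 aging weight and the source ID format come from the plan.

```typescript
interface Hit {
  sourceId: string;
  semantic: number; // relevance from FTS5 or vector search, assumed normalised to 0..1
  quality: number; // quality score assigned at capture time, 0..1
  lifecycle: "active" | "aging";
}

// Source ID format from the plan: {user_id}-{device_id}-obs-{local_id}
function sourceId(userId: string, deviceId: string, localId: number): string {
  return userId + "-" + deviceId + "-obs-" + localId;
}

function score(hit: Hit): number {
  const lifecycleWeight = hit.lifecycle === "aging" ? 0.7 : 1.0;
  return hit.semantic * hit.quality * lifecycleWeight;
}

// Merge local and remote hits, keeping one entry per source ID, best score first
function mergeHits(local: Hit[], remote: Hit[]): Hit[] {
  const best: Record<string, Hit> = {};
  for (const hit of local.concat(remote)) {
    const seen = best[hit.sourceId];
    if (seen === undefined || score(hit) > score(seen)) {
      best[hit.sourceId] = hit;
    }
  }
  const merged: Hit[] = [];
  for (const id of Object.keys(best)) {
    merged.push(best[id]);
  }
  return merged.sort((a, b) => score(b) - score(a));
}
```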
196
-
197
- **Deliverable**: Full cross-device team sync. Observations from any team member appear on any device within 30 seconds. Works offline, syncs when reconnected. Projects are matched across machines by git remote URL. New developer installs, connects to the shared namespace, and their agent has the full team knowledge base.
198
-
199
- ### Backfill Strategy
200
-
201
- Instead of diffing all IDs on every startup (expensive at scale), use a high-water-mark:
202
-
203
- ```
204
- 1. Store last_synced_epoch locally
205
- 2. On startup: SELECT * FROM observations WHERE created_at_epoch > last_synced_epoch
206
- 3. Batch push missing observations
207
- 4. Update last_synced_epoch
208
- ```
209
-
210
- Simple, efficient, scales to any observation count.
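The four backfill steps above can be sketched as a pure planning function. The in-memory array stands in for the SQLite query, and the batch size of 50 is borrowed from the outbox flush row rather than stated for backfill; `planBackfill` and its types are hypothetical.

```typescript
interface StoredObservation {
  id: number;
  createdAtEpoch: number;
}

interface BackfillPlan {
  batches: StoredObservation[][];
  lastSyncedEpoch: number; // new high-water mark, to persist only after the pushes succeed
}

function planBackfill(
  all: StoredObservation[],
  lastSyncedEpoch: number,
  batchSize: number = 50
): BackfillPlan {
  // Step 2: equivalent of WHERE created_at_epoch > last_synced_epoch, oldest first
  const pending = all
    .filter((o) => o.createdAtEpoch > lastSyncedEpoch)
    .sort((a, b) => a.createdAtEpoch - b.createdAtEpoch);

  // Step 3: split into batches for pushing
  const batches: StoredObservation[][] = [];
  for (let i = 0; i < pending.length; i += batchSize) {
    batches.push(pending.slice(i, i + batchSize));
  }

  // Step 4: the mark advances to the newest pending observation, or stays put
  const newMark =
    pending.length > 0 ? pending[pending.length - 1].createdAtEpoch : lastSyncedEpoch;
  return { batches, lastSyncedEpoch: newMark };
}
```

One caveat of any high-water-mark scheme: observations that share the exact epoch of the mark are skipped by the strict `>` comparison, so a real implementation would want sufficiently fine-grained timestamps.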
211
-
212
- ---
213
-
214
- ## Phase 4: Dogfood (Weeks 7-8)
215
-
216
- **Goal**: Run it internally on our projects (Candengo, Alchemy, AIMY). Fix what hurts.
217
-
218
- | Task | Description | Effort |
219
- |---|---|---|
220
- | Team onboarding | Install on all dev machines, shared Candengo Vector namespace | S |
221
- | Observation quality tuning | Adjust capture filters based on real usage — too noisy? too quiet? | M |
222
- | Search relevance tuning | Adjust scoring weights based on real queries | M |
223
- | Bug fixes from dogfooding | Whatever breaks | M |
224
- | Automated testing | Unit tests, sync integration tests | L |
225
- | Performance benchmarking | <50ms local search, <200ms remote search | M |
226
-
227
- ---
228
-
229
- ## Phase 5: Public Launch (Weeks 9-10)
230
-
231
- | Task | Description | Effort |
232
- |---|---|---|
233
- | One-line installer | `npx engrm install` or similar | M |
234
- | CLI tool | `engrm status`, `search`, `sync` commands | M |
235
- | Documentation | Installation, configuration, usage guide | M |
236
- | GitHub repo (FSL-1.1-ALv2 license) | README, examples, contributing guide, LICENSE file | M |
237
- | Free tier limits enforcement | Observation count, device count checks against account tier | M |
238
- | Upgrade flow | In-plugin nudge when approaching limits, link to engrm.dev/upgrade | S |
239
-
240
- **Licensing**: Core client released under FSL-1.1-ALv2 (Functional Source License, Fair Source). Source-available — developers can read, modify, and self-host freely. The restriction: nobody can fork it and offer a competing hosted service. Each version converts to Apache 2.0 after 2 years. Sentinel (real-time AI audit) is proprietary, delivered from a separate private repo to paying customers only.
241
-
242
- ---
243
-
244
- ## Effort Key
245
-
246
- | Size | Estimated Effort | Description |
247
- |---|---|---|
248
- | XS | < 2 hours | Trivial change, config, or wrapper |
249
- | S | 2-4 hours | Straightforward implementation |
250
- | M | 4-8 hours | Moderate complexity, some design decisions |
251
- | L | 1-2 days | Significant feature, requires careful design |
252
-
253
- ---
254
-
255
- ## Dependencies & Critical Path
256
-
257
- ```
258
- Phase 1 (Local MCP) ──→ Phase 2 (Hooks) ──→ Phase 3 (Sync + Team) ──→ Phase 4 (Dogfood) ──→ Phase 5 (Launch)
259
- ```
260
-
261
- **Phase 1+2 are usable standalone** — local-only memory is already valuable.
262
- **Phase 3 is the whole point** — cross-device team sync is why we're building this.
263
- **Phase 4 is essential** — dogfooding on our own projects before releasing externally.
264
-
265
- ---
266
-
267
- ## Risk Register
268
-
269
- | Risk | Impact | Likelihood | Mitigation |
270
- |---|---|---|---|
271
- | Observation quality too noisy | High | Medium | Quality scoring (0.0-1.0), skip below 0.1, deduplication on save, compaction at 90 days |
272
- | Observation volume exceeds quota | Medium | High | Lifecycle management: aging → archival → purge. Compaction summarises old sessions into digests. Only active+aging counts toward quota |
273
- | Project identity mismatch across machines | High | Medium | Canonical ID from normalised git remote URL. Fallback: `.engrm.json` in project root |
274
- | Search relevance degrades over time | High | Medium | Quality-weighted ranking, lifecycle scoring (aging=0.7x), project scoping, compaction removes noise |
275
- | Source ID collisions across devices | Medium | High | Source ID = `{user_id}-{device_id}-obs-{local_id}` — unique across all dimensions |
276
- | MCP protocol breaking changes | High | Low | Pin MCP SDK version, abstract protocol layer |
277
- | Secret leakage in observations | Critical | Medium | Multi-layer scrubbing, sensitivity classification, relative file paths only |
278
- | Sync conflicts | Medium | Low | Source ID namespacing — structurally impossible for two users to overwrite each other |
package/SENTINEL.md DELETED
@@ -1,293 +0,0 @@
1
- # Engrm Sentinel — Real-Time AI Audit for Coding Agents
2
-
3
- **Status**: Planned (Phase 5)
4
- **Target Launch**: April 22, 2026 (6 weeks from 2026-03-11)
5
- **Tier**: Pro + Team (paid upsell feature)
6
-
7
- ## Executive Summary
8
-
9
- Sentinel is a real-time code validation layer that intercepts AI agent tool calls (file writes, edits) **before execution**, retrieves team-specific coding standards from Engrm's vector memory, and routes the diff through a configurable audit LLM for judgment. If the code violates team standards, Sentinel blocks the write and tells the agent exactly what to fix — the agent self-corrects automatically.
10
-
11
- No competitor offers this. Every existing AI code review tool (CodeRabbit, Qodo, Greptile, Ellipsis) operates at PR level — **after** code is written. Sentinel operates at the pre-execution level, preventing mistakes before they happen.
12
-
13
- ## Market Gap
14
-
15
- ```
16
- Static Standards Dynamic RAG Standards
17
- ───────────────── ──────────────────────
18
- PR-Level Review │ CodeRabbit ($24) │ Qodo ($30)
19
- │ Ellipsis ($20) │ Greptile ($30)
20
- │ Sourcery ($12) │
21
- │ Copilot ($19-39) │
22
- ─────────────────┼───────────────────────┼─────────────────────────
23
- Real-Time │ decider/claude-hooks │
24
- Pre-Execution │ trailofbits config │ ← ENGRM SENTINEL
25
- Interception │ (local-only, no RAG) │ (UNOCCUPIED)
26
- ```
27
-
28
- ### Competitive Research (March 2026)
29
-
30
- | Tool | Pricing | Timing | Custom Standards | RAG/Vector | Agent Hooks |
31
- |------|---------|--------|-----------------|------------|-------------|
32
- | CodeRabbit | $0-24/dev/mo | PR-level + IDE inline | Yes (.coderabbit.yml) | No | No |
33
- | Qodo | $0-30/dev/mo | PR-level + IDE | Yes (auto-generated) | Yes (proprietary) | No |
34
- | Greptile | $30/dev/mo | PR-level | Learns from PRs | Yes (AST + vector) | No |
35
- | Ellipsis | ~$20/dev/mo | PR-level | Yes (natural language) | No | No |
36
- | Cursor Bugbot | Included ($20-40/mo) | PR-level (background) | .cursor/rules | Proprietary | Cursor-only |
37
- | Copilot Review | $19-39/user/mo | PR-level | Repository rules | Proprietary | Copilot-only |
38
- | **Engrm Sentinel** | **$15-25/dev/mo** | **Pre-execution** | **Dynamic RAG** | **Hybrid FTS5+vec** | **Any MCP agent** |
39
-
40
- ### GitHub Reference Implementations
41
-
42
- | Repo | Stars | Pattern | What We Learn |
43
- |------|-------|---------|--------------|
44
- | disler/claude-code-hooks-mastery | 3.3k | Builder/Validator agents, PostToolUse linting | Builder/Validator separation pattern |
45
- | trailofbits/claude-code-config | 1.6k | Security blocking hooks, anti-rationalization gate | Stop hook prompt checking for incomplete work |
46
- | qodo-ai/pr-agent | 10.5k | PR review tools, AGPL | PR compression for large diffs |
47
- | ChrisWiles/claude-code-showcase | 5.5k | Skills, agents, GitHub Actions | Skill evaluation hook as pattern |
48
- | decider/claude-hooks | 67 | Static rule enforcement | Hierarchical config (root + dir overrides) |
49
- | praneybehl/code-review-mcp | 29 | MCP server for multi-provider review | Stateless — no memory, no team sharing |
50
-
51
- **Key insight**: Every existing implementation is stateless. None retrieves project-specific standards from vector memory. None syncs findings across a team.
52
-
53
- ## Architecture
54
-
55
- ### Flow
56
-
57
- ```
58
- Developer working with Claude Code...
59
-
60
- Claude tries to write a file
61
-
62
-
63
- PreToolUse(Write|Edit) fires → hooks/sentinel.ts
64
-
65
- ├─ 1. SKIP CHECK
66
- │ Is sentinel enabled? Is this file in skip_patterns?
67
-
68
- ├─ 2. RETRIEVE STANDARDS
69
- │ engrm search("auth middleware security")
70
- │ → "Decision: all auth must use bcrypt, not MD5"
71
- │ → "Bugfix: session tokens were stored unencrypted"
72
- │ → "Standard: never log auth credentials"
73
-
74
- ├─ 3. AUDIT LLM CALL
75
- │ POST base_url/chat/completions
76
- │ { model, messages: [system + standards + diff] }
77
- │ temperature: 0, max_tokens: 150
78
-
79
- ├─ 4a. PASS → exit 0 (Claude proceeds)
80
- ├─ 4b. WARN → exit 0 + log observation
81
- └─ 4c. BLOCK → exit 2 + stderr reason
82
- (Claude receives error, self-corrects, retries)
83
-
84
-
85
- Finding saved as observation → syncs to team → future audits are smarter
86
- ```
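Step 4 of the flow above maps the audit verdict to a hook exit code. A minimal sketch of that mapping follows, assuming that in `advisory` mode a BLOCK verdict is downgraded to a logged warning (the config schema describes advisory as "WARN-only") and that blocked writes are also logged as findings. `outcomeFor` and `HookOutcome` are hypothetical names.

```typescript
type Verdict = "PASS" | "WARN" | "BLOCK";
type Mode = "advisory" | "blocking";

interface HookOutcome {
  exitCode: 0 | 2; // 0 lets the tool call proceed; 2 blocks it
  stderr: string; // reason shown to the agent when blocked
  logObservation: boolean; // whether the finding is saved as an observation
}

function outcomeFor(verdict: Verdict, mode: Mode, reason: string): HookOutcome {
  if (verdict === "BLOCK" && mode === "blocking") {
    return { exitCode: 2, stderr: reason, logObservation: true };
  }
  if (verdict === "PASS") {
    return { exitCode: 0, stderr: "", logObservation: false };
  }
  // WARN, or BLOCK downgraded by advisory mode: allow the write but keep the finding
  return { exitCode: 0, stderr: "", logObservation: true };
}
```

A hook script would call this after the LLM responds, write `stderr` to standard error, and exit with `exitCode`.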
87
-
88
- ### Dashboard → Server → Client Config Push
89
-
90
- ```
91
- Dashboard (engrm.dev/sentinel)
92
- │ POST /v1/mem/sentinel/config
93
-
94
- Candengo Vector (sync_events, record_type="sentinel_config")
95
- │ GET /v1/sync/changes (existing pull loop, every 60s)
96
-
97
- Client (~/.engrm/settings.json → sentinel config merged)
98
-
99
-
100
- PreToolUse hook reads config on each invocation
101
- ```
102
-
103
- ### Provider Agnostic (OpenAI-Compatible API)
104
-
105
- All major LLM providers speak the same `POST /v1/chat/completions` format:
106
-
107
- | Provider | Base URL | Models | Cost/1K audits |
108
- |----------|----------|--------|---------------|
109
- | OpenAI | api.openai.com/v1 | gpt-4o-mini | ~$0.40 |
110
- | xAI/Grok | api.x.ai/v1 | grok-3-mini | ~$0.30 |
111
- | Mistral | api.mistral.ai/v1 | mistral-small | ~$0.20 |
112
- | Anthropic | via proxy | haiku-4.5 | ~$0.50 |
113
- | Local vLLM | 192.168.5.5:8000/v1 | devstral-24b | $0 |
114
- | Ollama | localhost:11434/v1 | llama3-8b | $0 |
115
-
116
- One client function, ~40 lines. No provider-specific code.
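To illustrate why a single client can cover every provider in the table, here is a sketch of the request such a client would build. The `temperature: 0` and `max_tokens: 150` values come from the flow diagram; the system prompt wording, the `Authorization: Bearer` header (the usual OpenAI-style convention), and the name `buildAuditRequest` are assumptions.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface AuditRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildAuditRequest(
  baseUrl: string,
  model: string,
  apiKey: string,
  standards: string[],
  diff: string
): AuditRequest {
  // Hypothetical prompt; the real wording would live in src/sentinel/prompts.ts
  const system =
    "You audit code changes against team standards. Reply PASS, WARN, or BLOCK with a one-line reason.";
  const user =
    "Standards:\n" + standards.map((s) => "- " + s).join("\n") + "\n\nDiff:\n" + diff;
  const messages: ChatMessage[] = [
    { role: "system", content: system },
    { role: "user", content: user },
  ];
  return {
    // Only the base URL and model differ between providers
    url: baseUrl.replace(/\/+$/, "") + "/chat/completions",
    headers: { "Content-Type": "application/json", Authorization: "Bearer " + apiKey },
    body: JSON.stringify({ model: model, messages: messages, temperature: 0, max_tokens: 150 }),
  };
}
```

The result can be passed to any HTTP client, for example `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`. Switching from OpenAI to a local vLLM server would change only the `baseUrl` and `model` arguments.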
117
-
118
- ## Config Schema
119
-
120
- ```typescript
121
- interface SentinelConfig {
122
- enabled: boolean;
123
- mode: "advisory" | "blocking"; // WARN-only vs BLOCK+WARN
124
- provider: "openai" | "xai" | "mistral" | "anthropic" | "custom";
125
- base_url: string; // OpenAI-compatible endpoint
126
- model: string; // e.g. "gpt-4o-mini"
127
- api_key_env?: string; // Client-side env var name
128
- encrypted_api_key?: string; // Server-pushed, decrypted client-side
129
- match_tools: string[]; // ["Write", "Edit"] default
130
- timeout_ms: number; // Max wait (default 8000)
131
- skip_patterns: string[]; // e.g. ["*.test.ts", "*.md"]
132
- max_diff_lines: number; // Truncate large diffs (default 200)
133
- }
134
- ```
135
-
136
- Stored in `~/.engrm/settings.json` under `sentinel` key. Pushed from dashboard via sync_events.
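The `match_tools`, `skip_patterns`, and `max_diff_lines` fields suggest a pre-filter that runs before any LLM call. The sketch below assumes `*` is the only wildcard, that patterns are matched against the file's basename, and that oversized diffs are cut to their first N lines; the schema does not specify any of these, and the helper names are hypothetical.

```typescript
interface SentinelFilterConfig {
  match_tools: string[];
  skip_patterns: string[];
  max_diff_lines: number;
}

// match_tools and max_diff_lines use the defaults noted in the schema comments;
// skip_patterns reuses the schema's example values, not a documented default.
const FILTER_DEFAULTS: SentinelFilterConfig = {
  match_tools: ["Write", "Edit"],
  skip_patterns: ["*.test.ts", "*.md"],
  max_diff_lines: 200,
};

// Convert a simple glob ("*" only) into an anchored regular expression
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp("^" + escaped + "$");
}

function shouldAudit(tool: string, filePath: string, cfg: SentinelFilterConfig): boolean {
  if (cfg.match_tools.indexOf(tool) === -1) return false;
  const baseName = filePath.split("/").pop() || filePath;
  for (const pattern of cfg.skip_patterns) {
    if (globToRegExp(pattern).test(baseName)) return false;
  }
  return true;
}

// Keep only the first maxLines lines of a diff
function truncateDiff(diff: string, maxLines: number): string {
  const lines = diff.split("\n");
  return lines.length <= maxLines ? diff : lines.slice(0, maxLines).join("\n");
}
```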
137
-
138
- ## Standards
139
-
140
- Standards are **observations tagged as audit-relevant**. No separate schema needed.
141
-
142
- - Add `"standard"` to the observation type enum
143
- - Tag with `sentinel-standard` in concepts
144
- - Standards sync through the existing push/pull pipeline
145
- - Dashboard provides UI for creating/managing them
146
- - Every past decision, bugfix, and pattern is a potential standard — just tag it
147
-
148
- ## Graceful Degradation
149
-
150
- Following the sqlite-vec precedent:
151
-
152
- | Failure | Behavior |
153
- |---------|----------|
154
- | LLM API down/timeout | exit 0 (allow), log warning |
155
- | No standards found | Skip audit, exit 0 |
156
- | Config not synced yet | Sentinel disabled by default |
157
- | API key missing | Skip audit, log once |
158
- | Free tier user | Sentinel hooks not registered |
159
-
160
- ## Feedback Loop (The Moat)
161
-
162
- ```
163
- Sentinel blocks a write
164
- → Claude self-corrects
165
- → Corrected code passes
166
- → Block + correction saved as observation
167
- → Observation syncs to all team members
168
- → Future audits retrieve it as context
169
- → Sentinel gets smarter over time
170
- ```
171
-
172
- Static rules don't learn. Sentinel does. This is the competitive moat.
173
-
174
- ## Pricing
175
-
176
- | Tier | Sentinel | Observations | Price |
177
- |------|----------|-------------|-------|
178
- | Free | Not available | 10K, 2 devices | $0 |
179
- | Pro | Advisory mode, 100 audits/day, own API keys | 50K, unlimited devices | $15/dev/mo |
180
- | Team | Full blocking + advisory, unlimited, dashboard config push, shared standards, audit heatmap | 100K, unlimited devices | $25/dev/mo |
181
-
182
- Users bring their own LLM API keys. Engrm's marginal cost per audit is near-zero (one vector search + config lookup).
183
-
184
- ## Implementation Plan
185
-
186
- ### Phase 1: Core Hook + Local Audit (Week 1)
187
-
188
- | Task | File | Effort |
189
- |------|------|--------|
190
- | Add `SentinelConfig` to config interface | `src/config.ts` | 1h |
191
- | Add `"standard"` to observation type enum | `src/types.ts` | 30m |
192
- | Create `src/sentinel/types.ts` | New | 30m |
193
- | Create `src/sentinel/llm-client.ts` (OpenAI-compatible) | New (~40 lines) | 1h |
194
- | Create `src/sentinel/prompts.ts` | New | 2h |
195
- | Create `src/sentinel/audit.ts` (orchestrator) | New | 3h |
196
- | Create `hooks/sentinel.ts` (PreToolUse hook) | New | 2h |
197
- | Register sentinel hook in `registerHooks()` | `src/register.ts` | 30m |
198
- | Tests | `src/sentinel/*.test.ts` | 3h |
199
- | Integration test (hook → local LLM) | Manual | 2h |
200
-
201
- ### Phase 2: Dashboard Config Push (Week 2)
202
-
203
- | Task | Location | Effort |
204
- |------|----------|--------|
205
- | `POST /v1/mem/sentinel/config` endpoint | candengo-vector | 3h |
206
- | `GET /v1/mem/sentinel/config` endpoint | candengo-vector | 1h |
207
- | Store config in sync_events (record_type="sentinel_config") | candengo-vector | 2h |
208
- | Handle sentinel_config in pull loop | `src/sync/pull.ts` | 2h |
209
- | Dashboard: LLM provider config page | website/mem/sentinel.html | 4h |
210
- | Dashboard: standards manager (CRUD) | website/mem/sentinel.html | 4h |
211
- | API key encryption (server→client) | Both | 3h |
212
-
213
- ### Phase 3: Standards Library + Feedback Loop (Week 3)
214
-
215
- | Task | Location | Effort |
216
- |------|----------|--------|
217
- | `POST /v1/mem/sentinel/standards` CRUD | candengo-vector | 3h |
218
- | Standards sync via pull loop | `src/sync/pull.ts` | 2h |
219
- | Save audit findings as observations | `src/sentinel/audit.ts` | 2h |
220
- | Dashboard: audit results + heatmap | candengo-vector website | 4h |
221
- | `engrm sentinel status` CLI | `src/cli.ts` | 1h |
222
- | `engrm sentinel test` CLI | `src/cli.ts` | 2h |
223
-
224
- ### Phase 4: Polish + Tier Enforcement (Week 4)
225
-
226
- | Task | Effort |
227
- |------|--------|
228
- | Rate limiting (100/day pro, unlimited team) | 2h |
229
- | Tier check on hook registration | 1h |
230
- | Anti-rationalization gate (Stop hook, from TrailOfBits) | 3h |
231
- | Skip patterns, file-type filtering | 2h |
232
- | Docs + onboarding in dashboard | 3h |
233
- | Performance profiling (target: <3s/audit) | 2h |
234
-
235
- ### Phase 5: Beta + Launch (Weeks 5-6)
236
-
237
- | Task | Effort |
238
- |------|--------|
239
- | Internal dogfood (Unimpossible team) | 1 week |
240
- | Bug fixes from dogfood | Variable |
241
- | Launch blog post + HN announcement | 1 day |
242
- | Waitlist conversion emails | 1 day |
243
-
244
- ## Files to Create
245
-
246
- ```
247
- src/sentinel/
248
- ├── types.ts # SentinelConfig, AuditResult, etc.
249
- ├── llm-client.ts # OpenAI-compatible API client (~40 lines)
250
- ├── prompts.ts # System prompt, audit request formatter, response parser
251
- ├── audit.ts # Orchestrator: search → LLM → decision → save finding
252
- └── *.test.ts # Tests
253
-
254
- hooks/
255
- └── sentinel.ts # PreToolUse hook (follows post-tool-use.ts pattern)
256
- ```
257
-
258
- ## Files to Modify
259
-
260
- ```
261
- src/config.ts # Add SentinelConfig to Config interface + defaults
262
- src/register.ts # Register PreToolUse sentinel hook
263
- src/sync/pull.ts # Handle record_type="sentinel_config" | "sentinel_standard"
264
- src/cli.ts # Add `engrm sentinel status|test` commands
265
- ```
266
-
267
- ## Key Design Patterns to Follow
268
-
269
- | Pattern | Source | Application |
270
- |---------|--------|-------------|
271
- | Silent error handling | hooks/post-tool-use.ts | Never crash; exit 0 on any error |
272
- | Config merging | src/config.ts | Defaults → disk → sync override |
273
- | Reentrancy guards | src/sync/engine.ts | Prevent concurrent audits |
274
- | Graceful degradation | sqlite-vec integration | If unavailable, skip silently |
275
- | Observation pipeline | src/capture/ | Findings go through same scrub→quality→dedup→save flow |
276
-
277
- ## References
278
-
279
- ### Competitors
280
- - [CodeRabbit](https://www.coderabbit.ai/) — PR-level, $0-24/dev/mo
281
- - [Qodo](https://www.qodo.ai/) — Best RAG, PR-level, $0-30/dev/mo
282
- - [Greptile](https://www.greptile.com/) — Learns from PRs, $30/dev/mo
283
- - [Ellipsis](https://www.ellipsis.dev/) — PR-level, ~$20/dev/mo
284
-
285
- ### GitHub Repos
286
- - [disler/claude-code-hooks-mastery](https://github.com/disler/claude-code-hooks-mastery) (3.3k★)
287
- - [trailofbits/claude-code-config](https://github.com/trailofbits/claude-code-config) (1.6k★)
288
- - [qodo-ai/pr-agent](https://github.com/qodo-ai/pr-agent) (10.5k★)
289
- - [praneybehl/code-review-mcp](https://github.com/praneybehl/code-review-mcp)
290
-
291
- ### Claude Code Docs
292
- - [Hooks Reference](https://code.claude.com/docs/en/hooks)
293
- - [Hooks Guide](https://code.claude.com/docs/en/hooks-guide)