dravix-agent 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (208)
  1. package/.claude/settings.example.json +30 -0
  2. package/ARCHITECTURE.md +410 -0
  3. package/LICENSE +21 -0
  4. package/README.md +153 -0
  5. package/ROADMAP.md +117 -0
  6. package/data/vulnkb.json +666 -0
  7. package/dist/bin/aegis.d.ts +3 -0
  8. package/dist/bin/aegis.d.ts.map +1 -0
  9. package/dist/bin/aegis.js +489 -0
  10. package/dist/bin/aegis.js.map +1 -0
  11. package/dist/cache.d.ts +9 -0
  12. package/dist/cache.d.ts.map +1 -0
  13. package/dist/cache.js +146 -0
  14. package/dist/cache.js.map +1 -0
  15. package/dist/engines/ai-sinks.d.ts +52 -0
  16. package/dist/engines/ai-sinks.d.ts.map +1 -0
  17. package/dist/engines/ai-sinks.js +204 -0
  18. package/dist/engines/ai-sinks.js.map +1 -0
  19. package/dist/engines/eslint.d.ts +9 -0
  20. package/dist/engines/eslint.d.ts.map +1 -0
  21. package/dist/engines/eslint.js +245 -0
  22. package/dist/engines/eslint.js.map +1 -0
  23. package/dist/engines/joern.d.ts +3 -0
  24. package/dist/engines/joern.d.ts.map +1 -0
  25. package/dist/engines/joern.js +98 -0
  26. package/dist/engines/joern.js.map +1 -0
  27. package/dist/engines/js-sinks.d.ts +70 -0
  28. package/dist/engines/js-sinks.d.ts.map +1 -0
  29. package/dist/engines/js-sinks.js +370 -0
  30. package/dist/engines/js-sinks.js.map +1 -0
  31. package/dist/engines/llm-critic.d.ts +130 -0
  32. package/dist/engines/llm-critic.d.ts.map +1 -0
  33. package/dist/engines/llm-critic.js +551 -0
  34. package/dist/engines/llm-critic.js.map +1 -0
  35. package/dist/engines/pragma.d.ts +20 -0
  36. package/dist/engines/pragma.d.ts.map +1 -0
  37. package/dist/engines/pragma.js +83 -0
  38. package/dist/engines/pragma.js.map +1 -0
  39. package/dist/engines/property-test.d.ts +3 -0
  40. package/dist/engines/property-test.d.ts.map +1 -0
  41. package/dist/engines/property-test.js +134 -0
  42. package/dist/engines/property-test.js.map +1 -0
  43. package/dist/engines/pyright.d.ts +10 -0
  44. package/dist/engines/pyright.d.ts.map +1 -0
  45. package/dist/engines/pyright.js +143 -0
  46. package/dist/engines/pyright.js.map +1 -0
  47. package/dist/engines/pysa.d.ts +3 -0
  48. package/dist/engines/pysa.d.ts.map +1 -0
  49. package/dist/engines/pysa.js +83 -0
  50. package/dist/engines/pysa.js.map +1 -0
  51. package/dist/engines/python-sinks.d.ts +82 -0
  52. package/dist/engines/python-sinks.d.ts.map +1 -0
  53. package/dist/engines/python-sinks.js +459 -0
  54. package/dist/engines/python-sinks.js.map +1 -0
  55. package/dist/engines/registry.d.ts +26 -0
  56. package/dist/engines/registry.d.ts.map +1 -0
  57. package/dist/engines/registry.js +70 -0
  58. package/dist/engines/registry.js.map +1 -0
  59. package/dist/engines/secret-scan.d.ts +22 -0
  60. package/dist/engines/secret-scan.d.ts.map +1 -0
  61. package/dist/engines/secret-scan.js +179 -0
  62. package/dist/engines/secret-scan.js.map +1 -0
  63. package/dist/engines/semgrep.d.ts +10 -0
  64. package/dist/engines/semgrep.d.ts.map +1 -0
  65. package/dist/engines/semgrep.js +200 -0
  66. package/dist/engines/semgrep.js.map +1 -0
  67. package/dist/engines/treesitter.d.ts +18 -0
  68. package/dist/engines/treesitter.d.ts.map +1 -0
  69. package/dist/engines/treesitter.js +135 -0
  70. package/dist/engines/treesitter.js.map +1 -0
  71. package/dist/engines/tsc.d.ts +10 -0
  72. package/dist/engines/tsc.d.ts.map +1 -0
  73. package/dist/engines/tsc.js +142 -0
  74. package/dist/engines/tsc.js.map +1 -0
  75. package/dist/engines/types.d.ts +47 -0
  76. package/dist/engines/types.d.ts.map +1 -0
  77. package/dist/engines/types.js +27 -0
  78. package/dist/engines/types.js.map +1 -0
  79. package/dist/findings.d.ts +121 -0
  80. package/dist/findings.d.ts.map +1 -0
  81. package/dist/findings.js +98 -0
  82. package/dist/findings.js.map +1 -0
  83. package/dist/hooks/claude-code.d.ts +3 -0
  84. package/dist/hooks/claude-code.d.ts.map +1 -0
  85. package/dist/hooks/claude-code.js +187 -0
  86. package/dist/hooks/claude-code.js.map +1 -0
  87. package/dist/index/context.d.ts +127 -0
  88. package/dist/index/context.d.ts.map +1 -0
  89. package/dist/index/context.js +267 -0
  90. package/dist/index/context.js.map +1 -0
  91. package/dist/index/embeddings.d.ts +68 -0
  92. package/dist/index/embeddings.d.ts.map +1 -0
  93. package/dist/index/embeddings.js +570 -0
  94. package/dist/index/embeddings.js.map +1 -0
  95. package/dist/index/graph_routing.d.ts +36 -0
  96. package/dist/index/graph_routing.d.ts.map +1 -0
  97. package/dist/index/graph_routing.js +170 -0
  98. package/dist/index/graph_routing.js.map +1 -0
  99. package/dist/index/joern.d.ts +76 -0
  100. package/dist/index/joern.d.ts.map +1 -0
  101. package/dist/index/joern.js +782 -0
  102. package/dist/index/joern.js.map +1 -0
  103. package/dist/index/property-test.d.ts +88 -0
  104. package/dist/index/property-test.d.ts.map +1 -0
  105. package/dist/index/property-test.js +466 -0
  106. package/dist/index/property-test.js.map +1 -0
  107. package/dist/index/proto/scip.proto +897 -0
  108. package/dist/index/pysa.d.ts +91 -0
  109. package/dist/index/pysa.d.ts.map +1 -0
  110. package/dist/index/pysa.js +617 -0
  111. package/dist/index/pysa.js.map +1 -0
  112. package/dist/index/scip.d.ts +76 -0
  113. package/dist/index/scip.d.ts.map +1 -0
  114. package/dist/index/scip.js +541 -0
  115. package/dist/index/scip.js.map +1 -0
  116. package/dist/index/vulrag.d.ts +86 -0
  117. package/dist/index/vulrag.d.ts.map +1 -0
  118. package/dist/index/vulrag.js +242 -0
  119. package/dist/index/vulrag.js.map +1 -0
  120. package/dist/index.d.ts +9 -0
  121. package/dist/index.d.ts.map +1 -0
  122. package/dist/index.js +8 -0
  123. package/dist/index.js.map +1 -0
  124. package/dist/install/claude-code.d.ts +31 -0
  125. package/dist/install/claude-code.d.ts.map +1 -0
  126. package/dist/install/claude-code.js +447 -0
  127. package/dist/install/claude-code.js.map +1 -0
  128. package/dist/lang.d.ts +5 -0
  129. package/dist/lang.d.ts.map +1 -0
  130. package/dist/lang.js +52 -0
  131. package/dist/lang.js.map +1 -0
  132. package/dist/learning/suppressions.d.ts +70 -0
  133. package/dist/learning/suppressions.d.ts.map +1 -0
  134. package/dist/learning/suppressions.js +179 -0
  135. package/dist/learning/suppressions.js.map +1 -0
  136. package/dist/mcp/server.d.ts +2 -0
  137. package/dist/mcp/server.d.ts.map +1 -0
  138. package/dist/mcp/server.js +187 -0
  139. package/dist/mcp/server.js.map +1 -0
  140. package/dist/mcp/tools/explain.d.ts +58 -0
  141. package/dist/mcp/tools/explain.d.ts.map +1 -0
  142. package/dist/mcp/tools/explain.js +60 -0
  143. package/dist/mcp/tools/explain.js.map +1 -0
  144. package/dist/mcp/tools/precheck.d.ts +29 -0
  145. package/dist/mcp/tools/precheck.d.ts.map +1 -0
  146. package/dist/mcp/tools/precheck.js +42 -0
  147. package/dist/mcp/tools/precheck.js.map +1 -0
  148. package/dist/mcp/tools/validate.d.ts +73 -0
  149. package/dist/mcp/tools/validate.d.ts.map +1 -0
  150. package/dist/mcp/tools/validate.js +66 -0
  151. package/dist/mcp/tools/validate.js.map +1 -0
  152. package/dist/mcp/warm.d.ts +88 -0
  153. package/dist/mcp/warm.d.ts.map +1 -0
  154. package/dist/mcp/warm.js +331 -0
  155. package/dist/mcp/warm.js.map +1 -0
  156. package/dist/orchestrator.d.ts +46 -0
  157. package/dist/orchestrator.d.ts.map +1 -0
  158. package/dist/orchestrator.js +596 -0
  159. package/dist/orchestrator.js.map +1 -0
  160. package/dist/policy.d.ts +51 -0
  161. package/dist/policy.d.ts.map +1 -0
  162. package/dist/policy.js +201 -0
  163. package/dist/policy.js.map +1 -0
  164. package/dist/risk.d.ts +31 -0
  165. package/dist/risk.d.ts.map +1 -0
  166. package/dist/risk.js +92 -0
  167. package/dist/risk.js.map +1 -0
  168. package/dist/stats.d.ts +72 -0
  169. package/dist/stats.d.ts.map +1 -0
  170. package/dist/stats.js +217 -0
  171. package/dist/stats.js.map +1 -0
  172. package/dist/telemetry/collector.d.ts +10 -0
  173. package/dist/telemetry/collector.d.ts.map +1 -0
  174. package/dist/telemetry/collector.js +75 -0
  175. package/dist/telemetry/collector.js.map +1 -0
  176. package/dist/telemetry/consent.d.ts +9 -0
  177. package/dist/telemetry/consent.d.ts.map +1 -0
  178. package/dist/telemetry/consent.js +42 -0
  179. package/dist/telemetry/consent.js.map +1 -0
  180. package/dist/telemetry/installation.d.ts +2 -0
  181. package/dist/telemetry/installation.d.ts.map +1 -0
  182. package/dist/telemetry/installation.js +32 -0
  183. package/dist/telemetry/installation.js.map +1 -0
  184. package/dist/telemetry/sanitizer.d.ts +5 -0
  185. package/dist/telemetry/sanitizer.d.ts.map +1 -0
  186. package/dist/telemetry/sanitizer.js +60 -0
  187. package/dist/telemetry/sanitizer.js.map +1 -0
  188. package/dist/telemetry/types.d.ts +39 -0
  189. package/dist/telemetry/types.d.ts.map +1 -0
  190. package/dist/telemetry/types.js +4 -0
  191. package/dist/telemetry/types.js.map +1 -0
  192. package/dist/telemetry/uploader.d.ts +12 -0
  193. package/dist/telemetry/uploader.d.ts.map +1 -0
  194. package/dist/telemetry/uploader.js +92 -0
  195. package/dist/telemetry/uploader.js.map +1 -0
  196. package/dist/util/logger.d.ts +19 -0
  197. package/dist/util/logger.d.ts.map +1 -0
  198. package/dist/util/logger.js +58 -0
  199. package/dist/util/logger.js.map +1 -0
  200. package/dist/util/safe-paths.d.ts +8 -0
  201. package/dist/util/safe-paths.d.ts.map +1 -0
  202. package/dist/util/safe-paths.js +102 -0
  203. package/dist/util/safe-paths.js.map +1 -0
  204. package/dist/util/subprocess.d.ts +32 -0
  205. package/dist/util/subprocess.d.ts.map +1 -0
  206. package/dist/util/subprocess.js +137 -0
  207. package/dist/util/subprocess.js.map +1 -0
  208. package/package.json +93 -0
package/.claude/settings.example.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "//": "Aegis-v2 hook configuration for Claude Code. Copy to your project's .claude/settings.json (or merge into the existing file).",
3
+ "//install": "Requires Aegis-v2 installed globally (npm i -g @aegis/aegis-v2) or local path to dist/bin/aegis.js.",
4
+ "hooks": {
5
+ "PostToolUse": [
6
+ {
7
+ "matcher": "Write|Edit",
8
+ "hooks": [
9
+ {
10
+ "type": "command",
11
+ "command": "aegis hook",
12
+ "timeout": 10
13
+ }
14
+ ]
15
+ }
16
+ ],
17
+ "PreToolUse": [
18
+ {
19
+ "matcher": "Write|Edit",
20
+ "hooks": [
21
+ {
22
+ "type": "command",
23
+ "command": "aegis hook",
24
+ "timeout": 3
25
+ }
26
+ ]
27
+ }
28
+ ]
29
+ }
30
+ }
package/ARCHITECTURE.md ADDED
@@ -0,0 +1,410 @@
1
+ # Aegis-v2 — Architecture
2
+
3
+ > Real-time MCP code-write gate for AI agents. Built **OSS-only**, **TypeScript-native**,
4
+ > **fully owned** — no commercial-product lock-in.
5
+
6
+ ---
7
+
8
+ ## 0. North Star
9
+
10
+ A `PreToolUse` / `PostToolUse` hook for Claude Code / Cursor / Codex / opencode / Hermes
11
+ that intercepts every file write the agent is about to make, runs a multi-layer hybrid
12
+ analysis in **p50 ≤ 400 ms**, and either ALLOWS, WARNS, or BLOCKS with a remediation
13
+ prompt fed back to the agent.
14
+
15
+ **Two product properties that nothing else combines:**
16
+
17
+ 1. **The gate is in the WRITE LOOP** — not a post-merge scan dashboard. Bad code
18
+ is blocked at write time, before the agent builds on it.
19
+ 2. **The agent receives a structured remediation prompt** when blocked — it can
20
+ regenerate without human intervention.
21
+
22
+ ---
23
+
24
+ ## 1. Honest baselines we have to beat
25
+
26
+ Latest published numbers (2024-2026) on real bug-detection benchmarks. **No tool
27
+ or approach listed here exceeds 50% on hard datasets.**
28
+
29
+ | Tool / approach | Benchmark | Reported result |
30
+ |---|---|---|
31
+ | Single 7B LLM (state-of-the-art) | PrimeVul | **3.09% F1** [Ding 2024](https://arxiv.org/abs/2403.18624) |
32
+ | GPT-3.5 / GPT-4 (zero-shot) | PrimeVul stringent | "akin to random guessing" |
33
+ | Best commercial SAST (CodeQL) | EASE-2024 Java curated | 18.4% |
34
+ | Semgrep Pro | EASE-2024 | 14.3% |
35
+ | Snyk DeepCode | EASE-2024 | 11.2% |
36
+ | **Macroscope** (2026 SOTA) | 118 OSS runtime bugs | **48%** |
37
+ | CodeRabbit | same | 46% |
38
+ | Cursor BugBot | same | 42% |
39
+ | **Vul-RAG** (LLM + knowledge retrieval) | LinuxVul | +16-24% over pure LLM ([Du 2024](https://arxiv.org/abs/2406.11147)) |
40
+
41
+ **Implication:** Aegis-v2 must combine static engines + retrieved vuln knowledge +
42
+ project graph + LLM critic with structured evidence trails. **No single layer is
43
+ enough.** A 150M model alone classifies at near-chance on logic bugs (PrimeVul).
44
+ The 150M's role is **router**, not detector.
45
+
46
+ ---
47
+
48
+ ## 2. Architecture overview (one picture)
49
+
50
+ ```
51
+ ┌─────────────────────────────────────────────────────────────────────┐
52
+ │ AI agent (Claude Code / Cursor / Codex CLI / opencode / Hermes) │
53
+ │ │ │
54
+ │ ├── PreToolUse hook ───► aegis precheck_change() │
55
+ │ │ p50 ≤ 50 ms; symbol resolution + tree-sitter sanity │
56
+ │ │ │
57
+ │ ├── Write / Edit tool ───► (file written to disk) │
58
+ │ │ │
59
+ │ └── PostToolUse hook ───► aegis validate_edit() │
60
+ │ p50 ≤ 400 ms / p95 ≤ 4 s on deep │
61
+ │ exit 0 = allow ─ exit 2 = BLOCK + remediation prompt │
62
+ │ │
63
+ │ ┌─────────────────────────────────────────────────────────────┐ │
64
+ │ │ LAYER 1 — Fast Deterministic ≤ 200 ms (parallel) │ │
65
+ │ │ • tree-sitter incremental parse │ │
66
+ │ │ • Semgrep OSS (community rules: p/default, p/security) │ │
67
+ │ │ • Pyright strict (Python) / tsc strict (TS) / mypy │ │
68
+ │ │ • ESLint + @typescript-eslint/no-floating-promises, │ │
69
+ │ │ no-misused-promises, require-await │ │
70
+ │ │ • Secret scan (gitleaks rules embedded) │ │
71
+ │ └─────────────────┬───────────────────────────────────────────┘ │
72
+ │ │ findings + AST + symbols │
73
+ │ ┌─────────────────▼───────────────────────────────────────────┐ │
74
+ │ │ LAYER 2 — Project Context ≤ 300 ms (cached) │ │
75
+ │ │ • SCIP index lookup (scip-typescript / scip-python) │ │
76
+ │ │ • Joern CPG diff query (data-flow paths, callers) │ │
77
+ │ │ • Pysa pre-computed taint summaries (Python) │ │
78
+ │ │ • Local code embedding (CodeT5+ 110M) → FAISS top-K │ │
79
+ │ └─────────────────┬───────────────────────────────────────────┘ │
80
+ │ │ enriched evidence trail │
81
+ │ ┌─────────────────▼───────────────────────────────────────────┐ │
82
+ │ │ LAYER 3 — 150M Router ≤ 100 ms (on-device) │ │
83
+ │ │ • Classifies: clean / style / logic / security / race │ │
84
+ │ │ • Predicts CWE bucket (89, 79, 362, 476, 285, 918, ...) │ │
85
+ │ │ • Action: pass / warn / deep_review │ │
86
+ │ │ • If deep_review → escalate to Layer 4 │ │
87
+ │ └─────────────────┬───────────────────────────────────────────┘ │
88
+ │ │ score + category │
89
+ │ ┌─────────────────▼───────────────────────────────────────────┐ │
90
+ │ │ LAYER 4 — LLM Critic 1-3 s (~10% of edits, async) │ │
91
+ │ │ • Vul-RAG retrieve k=3 CVE knowledge by predicted CWE │ │
92
+ │ │ • Claude / GPT critic prompt: │ │
93
+ │ │ diff + data-flow path + callers + Vul-RAG knowledge │ │
94
+ │ │ • Structured JSON: {verdict, cwe, evidence, fix, conf} │ │
95
+ │ │ • Ensemble vote (second-opinion: different model vendor) │ │
96
+ │ └─────────────────┬───────────────────────────────────────────┘ │
97
+ └─────────────────────│───────────────────────────────────────────────┘
98
+
99
+ exit 2 + remediation JSON → agent regenerates
100
+ exit 0 → write proceeds
101
+ ```
102
+
103
+ ### Layer responsibilities
104
+
105
+ | Layer | Owns | Latency budget | Fail-open? |
106
+ |---|---|---|---|
107
+ | 1 Fast Deterministic | Pattern bugs, type errors, secrets, async footguns | ≤ 200 ms | **No** for the layer as a whole; if one engine crashes, log it and skip only that engine |
108
+ | 2 Project Context | Cross-file symbols, data flow, callers, embeddings | ≤ 300 ms | Yes — empty context is acceptable |
109
+ | 3 150M Router | Classify edit + suspected CWE; decide if Layer 4 needed | ≤ 100 ms | Yes — fall back to threshold-based router |
110
+ | 4 LLM Critic | Final verdict on hard cases, with full evidence | 1-3 s (async) | Yes — Layer 1+2 decision stands |
111
+
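The Layer 1 row above (engines run in parallel; a crashed engine is logged and skipped on its own) can be sketched as follows. This is a minimal illustration, not the package's orchestrator: `SketchEngine`, `SketchFinding`, and `runLayer1` are hypothetical names, and the real `Engine` interface in `src/engines/types.ts` may look different.

```typescript
// Hypothetical shapes for illustration only; not the package's real types.
interface SketchFinding {
  engine: string;
  message: string;
}

interface SketchEngine {
  name: string;
  run(content: string): Promise<SketchFinding[]>;
}

// Run every engine concurrently. An engine that rejects (or throws) is
// recorded in `skipped`; findings from the engines that succeeded are kept.
async function runLayer1(
  engines: SketchEngine[],
  content: string,
): Promise<{ findings: SketchFinding[]; skipped: string[] }> {
  // The async wrapper turns a synchronous throw inside run() into a rejection.
  const settled = await Promise.allSettled(
    engines.map(async (engine) => engine.run(content)),
  );
  const findings: SketchFinding[] = [];
  const skipped: string[] = [];
  settled.forEach((result, i) => {
    if (result.status === "fulfilled") {
      findings.push(...result.value);
    } else {
      skipped.push(engines[i]?.name ?? "unknown");
    }
  });
  return { findings, skipped };
}
```

`Promise.allSettled` is used instead of `Promise.all` so that one rejected engine does not discard the results of the others.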
112
+ ---
113
+
114
+ ## 3. MCP tool surface
115
+
116
+ Three tools exposed via the MCP protocol (stdio transport):
117
+
118
+ ### `precheck_change`
119
+ **When:** `PreToolUse` hook, before the agent calls `Write` / `Edit`.
120
+ **Input:** `{ file_path, proposed_content }`.
121
+ **Latency:** ≤ 50 ms hard cap.
122
+ **Output:** quick advisory only (`{ allow|warn, reasons[] }`) — never blocks.
123
+ **Layers used:** 1 (tree-sitter sanity only — no full engine run).
124
+
125
+ ### `validate_edit`
126
+ **When:** `PostToolUse` hook, after the agent writes the file.
127
+ **Input:** `{ file_path, content, diff?, project_root? }`.
128
+ **Latency:** p50 ≤ 400 ms, p95 ≤ 4 s.
129
+ **Output:** `{ verdict: allow|warn|block, findings: [...], remediation_prompt? }`.
130
+ **Exit code mapping:** allow → 0, warn → 0 (with stderr message), block → 2.
131
+ **Layers used:** 1 + 2 always; 3 always; 4 only when 3 says `deep_review`.
132
+
133
+ ### `explain_risk`
134
+ **When:** agent or human asks "why was this blocked?".
135
+ **Input:** `{ finding_id }` or `{ file_path, line }`.
136
+ **Output:** detailed evidence trail (engine path, data-flow nodes, Vul-RAG citations).
137
+ **Layers used:** read-only retrieval from the finding store.
138
+
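The exit code mapping stated for `validate_edit` (allow → 0, warn → 0 with a stderr message, block → 2) is small enough to write down directly. The sketch below is an assumption about how a hook wrapper might encode it; `toHookResult` and `HookResult` are hypothetical names, not identifiers from the package.

```typescript
type Verdict = "allow" | "warn" | "block";

// Hypothetical result shape: what the hook process would exit with and print.
interface HookResult {
  exitCode: 0 | 2;
  stderr: string;
}

// Mapping taken from the text: allow -> 0, warn -> 0, block -> 2.
const EXIT_CODE: Record<Verdict, 0 | 2> = { allow: 0, warn: 0, block: 2 };

function toHookResult(verdict: Verdict, message = ""): HookResult {
  // "allow" stays silent; "warn" and "block" carry the message on stderr.
  const stderr = verdict === "allow" ? "" : message;
  return { exitCode: EXIT_CODE[verdict], stderr };
}
```

A caller would write `stderr` to the process's standard error and then exit with `exitCode`; for a block, the message would be the remediation prompt.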
139
+ ---
140
+
141
+ ## 4. Unified Finding schema (zod, single source of truth)
142
+
143
+ ```ts
144
+ const Finding = z.object({
145
+ id: z.string(), // sha256(engine + file + line + rule)[:16]
146
+ engine: z.enum([
147
+ "semgrep", "pyright", "eslint", "treesitter", "secret",
148
+ "joern", "pysa", "codeql", "infer-racerd",
149
+ "router-150m", "llm-critic",
150
+ ]),
151
+ file: z.string(),
152
+ line: z.number().int().positive().optional(),
153
+ col: z.number().int().positive().optional(),
154
+ rule_id: z.string(),
155
+ cwe: z.string().regex(/^CWE-\d+$/).optional(),
156
+ severity: z.enum(["info", "low", "medium", "high", "critical"]),
157
+ message: z.string().max(500),
158
+ evidence: z.object({
159
+ snippet: z.string().max(2000).optional(),
160
+ dataflow: z.array(z.object({
161
+ file: z.string(), line: z.number(), label: z.string(),
162
+ })).max(20).optional(),
163
+ callers: z.array(z.string()).max(10).optional(),
164
+ related_cves: z.array(z.string()).max(5).optional(),
165
+ }).optional(),
166
+ confidence: z.number().min(0).max(1), // 0-1, source-stamped
167
+ source: z.enum(["pattern", "dataflow", "taint", "router", "critic", "ensemble"]),
168
+ remediation: z.string().max(2000).optional(),
169
+ });
170
+ ```
171
+
172
+ Every layer produces `Finding[]` in this exact shape. **No engine-specific output
173
+ escapes the orchestrator.** That single contract is why we can swap engines without
174
+ touching downstream logic.
175
+
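The schema comment describes `id` as `sha256(engine + file + line + rule)[:16]`. A minimal sketch of that derivation, using Node's built-in `crypto` module, might look like the following. The function name `findingId`, the NUL separator between fields, and the choice of hex encoding are assumptions made here; the source does not say how the fields are joined or encoded.

```typescript
import { createHash } from "node:crypto";

// Derive a stable 16-character identifier from the fields named in the schema
// comment. The "\u0000" separator is an assumption; it keeps ("ab", "c") and
// ("a", "bc") from hashing to the same value.
function findingId(
  engine: string,
  file: string,
  line: number | undefined,
  ruleId: string,
): string {
  const key = [engine, file, line === undefined ? "" : String(line), ruleId].join("\u0000");
  return createHash("sha256").update(key).digest("hex").slice(0, 16);
}
```

Because the id depends only on engine, file, line, and rule, the same finding reported on two consecutive runs gets the same id, which is what lets `explain_risk` look it up later by `finding_id`.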
176
+ ---
177
+
178
+ ## 5. Risk scoring → action mapping
179
+
180
+ ```
181
+ confidence × severity_weight → action
182
+
183
+ ≥ 0.85 × {high, critical} → BLOCK (exit 2, remediation prompt)
184
+ 0.6-0.85 × {medium+} → WARN (exit 0, stderr message)
185
+ any × {low, info} → ALLOW (silent)
186
+
187
+ any single critical secret → BLOCK (regardless of confidence)
188
+ any taint sink reached → BLOCK (Pysa/Joern with confidence > 0.7)
189
+ ```
190
+
191
+ The thresholds live in `src/risk.ts` as a single function (`scoreToAction`) so we
192
+ can A/B-tune them without touching engines.
193
+
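A minimal sketch of the mapping in the table above is shown below. It is not the package's `scoreToAction`; the `ScoredFinding` shape is hypothetical, and two cases the table leaves unspecified are filled with labeled assumptions: a `medium` finding at confidence ≥ 0.85 is treated as WARN, and anything below 0.6 is treated as ALLOW. The taint override is assumed to apply to findings whose `source` is `"taint"`.

```typescript
type Severity = "info" | "low" | "medium" | "high" | "critical";
type Action = "allow" | "warn" | "block";

// Hypothetical subset of the Finding fields that scoring needs.
interface ScoredFinding {
  engine: string;
  severity: Severity;
  confidence: number; // 0-1
  source: string;
}

function scoreToAction(f: ScoredFinding): Action {
  // Overrides from the table: critical secrets and reached taint sinks block.
  if (f.engine === "secret" && f.severity === "critical") return "block";
  if (f.source === "taint" && f.confidence > 0.7) return "block";

  // Low and info findings are always allowed silently.
  if (f.severity === "low" || f.severity === "info") return "allow";

  const isHighOrCritical = f.severity === "high" || f.severity === "critical";
  if (isHighOrCritical && f.confidence >= 0.85) return "block";

  // Assumption: medium at >= 0.85 also warns; the table does not cover it.
  if (f.confidence >= 0.6) return "warn";

  // Assumption: below 0.6 nothing is surfaced.
  return "allow";
}
```

Keeping the overrides at the top makes the precedence explicit: a critical secret blocks even at low confidence, before any threshold is consulted.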
194
+ ---
195
+
196
+ ## 6. Latency budgets (production targets)
197
+
198
+ | Stage | p50 | p95 | Worst |
199
+ |---|---|---|---|
200
+ | PreToolUse precheck | 30 ms | 80 ms | 100 ms |
201
+ | Layer 1 (parallel, all engines) | 150 ms | 400 ms | 500 ms |
202
+ | Layer 2 cached query | 50 ms | 200 ms | 300 ms |
203
+ | Layer 3 router inference | 80 ms | 120 ms | 150 ms |
204
+ | Layer 4 LLM critic (10% of edits) | 1.5 s | 3 s | 5 s |
205
+ | **Total p50 (no critic)** | **~310 ms** | | |
206
+ | **Total p95 (with critic)** | | **~4 s** | |
207
+
208
+ **Hard limit:** `validate_edit` returns within 5 s or the gate FAILS-OPEN with a
209
+ warning. Better to miss a bug than to break the agent's flow.
210
+
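The 5 s hard limit with fail-open behavior can be sketched with `Promise.race`. This is an illustration under assumptions, not the package's implementation: `withFailOpen` and `GateResult` are hypothetical names, and the sketch only covers the timeout case (an error thrown by the validator still propagates to the caller).

```typescript
type Verdict = "allow" | "warn" | "block";

// Hypothetical result shape for illustration.
interface GateResult {
  verdict: Verdict;
  reason?: string;
}

// Resolve with the validator's result, or with a "warn" verdict if the
// validator has not finished within limitMs.
async function withFailOpen(
  validate: () => Promise<GateResult>,
  limitMs = 5000,
): Promise<GateResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<GateResult>((resolve) => {
    timer = setTimeout(() => {
      resolve({ verdict: "warn", reason: `validation exceeded ${limitMs} ms; failing open` });
    }, limitMs);
  });
  try {
    return await Promise.race([validate(), timeout]);
  } finally {
    // Clear the timer so a fast validation does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Note that the race only stops waiting for the slow validation; it does not cancel it. Any subprocesses behind it would need their own timeout or kill signal.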
211
+ ---
212
+
213
+ ## 7. Caching strategy
214
+
215
+ | Cache | Key | TTL | Backend |
216
+ |---|---|---|---|
217
+ | Tree-sitter AST | sha256(content) | session | in-memory LRU |
218
+ | Semgrep findings per file | sha256(content + rules_version) | session | in-memory |
219
+ | SCIP index | per-file mtime + content hash | manual invalidation | LMDB / RocksDB |
220
+ | Joern CPG | per-file mtime + content hash | manual invalidation | OverflowDB |
221
+ | Vul-RAG retrieval | sha256(query_embedding) | 24 h | LMDB |
222
+ | LLM critic verdict | sha256(diff + context + cwe) | 7 d | LMDB |
223
+
224
+ `.aegis/` per project, gitignored.
225
+
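The first two rows of the table (content-hash keys, in-memory LRU, session lifetime) can be sketched with a plain `Map`, which preserves insertion order. The class name `ContentLru` and the way the rules version is mixed into the key are assumptions for illustration; the package's `cache.ts` may be organized differently.

```typescript
import { createHash } from "node:crypto";

// Small in-memory LRU keyed by a content hash. A Map iterates in insertion
// order, so the first key is always the least recently used entry.
class ContentLru<V> {
  private readonly entries = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // sha256 over the content plus an optional version tag (for example a
  // rules version), separated by NUL so the two parts cannot run together.
  static keyFor(content: string, version = ""): string {
    return createHash("sha256").update(content).update("\u0000").update(version).digest("hex");
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```

Keying on the content hash rather than the file path means an edit that is reverted hits the cache again, and two files with identical content share one entry.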
226
+ ---
227
+
228
+ ## 8. Threat model (MCP attack surface)
229
+
230
+ **Why this matters:** 2025-2026 supply-chain attacks (Shai-Hulud,
231
+ SANDWORM_MODE, Mini Shai-Hulud) specifically targeted MCP servers via
232
+ prompt injection and registry poisoning. Live CVEs we must defend against:
233
+
234
+ - **CVE-2025-49596** — MCP Inspector RCE
235
+ - **CVE-2025-6514** — mcp-remote command injection (437K+ downloads)
236
+ - **CVE-2025-53967** — Figma/Framelink
237
+ - **CVE-2025-54136** — Cursor zero-click prompt injection
238
+ - **CVE-2025-54994** — typosquat package
239
+
240
+ **Aegis-v2 invariants** (must hold in every release):
241
+
242
+ 1. **No registry-based auto-update.** Version pinned in `package.json` + checksum.
243
+ 2. **No shell-out to user-controlled paths.** All subprocess args go through
244
+ `escapeArg()`; file paths validated against `path.resolve()` + allowlist.
245
+ 3. **Findings are treated as DATA in any LLM prompt**, never as instructions —
246
+ wrapped in nonce-marked fences (same pattern as our argus cascade).
247
+ 4. **No outbound network in Layer 1-3.** Layer 4 LLM critic is the ONLY outbound
248
+ call, and it goes only to the user-configured endpoint (Anthropic / OpenAI).
249
+ 5. **Minimal-permission install.** The hook reads files in the project root; never
250
+ `~/.ssh`, `~/.aws`, `~/Documents`, etc. Explicit denylist in `src/util/safe-paths.ts`.
251
+ 6. **No tool result is auto-executed.** Remediation prompts are TEXT for the agent,
252
+ never code we run ourselves.
253
+ 7. **Audit log** of every gate decision, locally at `.aegis/audit.jsonl`, append-only.
254
+
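Invariants 2 and 5 (paths resolved with `path.resolve()`, confined to the project root, with an explicit denylist) can be sketched as below. This is a hedged illustration rather than the contents of `src/util/safe-paths.ts`: `isPathAllowed`, `isInside`, and the three-entry `DENIED_HOME_DIRS` list are hypothetical, and the sketch does not resolve symlinks, which a production check would likely need to do.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Illustrative denylist taken from the directories named in invariant 5.
const DENIED_HOME_DIRS = [".ssh", ".aws", "Documents"];

// True when `child` is `parent` itself or lies underneath it.
function isInside(parent: string, child: string): boolean {
  const rel = path.relative(parent, child);
  return rel === "" || (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel));
}

// Accept a path only if it resolves inside the project root and does not
// fall under a denied directory in the user's home.
function isPathAllowed(projectRoot: string, candidate: string, home: string = os.homedir()): boolean {
  const root = path.resolve(projectRoot);
  const target = path.resolve(root, candidate);
  if (!isInside(root, target)) return false;
  return !DENIED_HOME_DIRS.some((dir) => isInside(path.join(home, dir), target));
}
```

The denylist is checked even for paths inside the project root, which covers the case where the project root is the home directory itself.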
255
+ ---
256
+
257
+ ## 9. Engines (Phase-by-phase rollout)
258
+
259
+ Following the research's effort/impact ranking:
260
+
261
+ | Phase | Engines | Why |
262
+ |---|---|---|
263
+ | **0 (Week 1)** | Semgrep OSS + Pyright + tsc + ESLint + tree-sitter + secret-scan | 60% of catches "free"; covers SQLi/XSS/cmd-inj/secret patterns + type errors + async footguns |
264
+ | **1 (Weeks 2-4)** | + SCIP indexer + Joern CPG + CodeT5+ embeddings + FAISS | Cross-file context — answers "who calls this?", "what's the type of x?", "are there similar functions?" |
265
+ | **2 (Weeks 5-8)** | + Pysa taint (Python) + RacerD-on-Joern queries (TS/JS/Python) + Vul-RAG KB + LLM critic + 150M router | Deep semantic — taint reachability, races, CVE knowledge, project-aware verdict |
266
+ | **3 (Weeks 9-12)** | + suppression learning + telemetry + property-based testing gate + policy-as-code | Production hardening |
267
+
268
+ ### Phase 0 engine choices (what ships first)
269
+
270
+ | Engine | What it catches | Latency | Setup |
271
+ |---|---|---|---|
272
+ | **tree-sitter** | Syntax errors, basic structure | 5-20 ms | `tree-sitter` + `tree-sitter-python` etc. |
273
+ | **Semgrep OSS** | Pattern bugs (`p/default` + `p/security-audit` + custom) | 100-500 ms | `pysemgrep` subprocess |
274
+ | **Pyright** | Python type errors, unresolved imports | 100-300 ms | `pyright` CLI subprocess |
275
+ | **tsc** | TS/JS type errors (with `--noEmit`) | 200-500 ms | `tsc` subprocess |
276
+ | **ESLint** | JS/TS lint rules — `no-floating-promises`, `no-misused-promises`, `require-await` (catches ~60% of async bugs) | 100-300 ms | `eslint --rulesdir` |
277
+ | **Secret scan** | Hardcoded keys, tokens | 20-50 ms | Embedded regex set (gitleaks rules) |
278
+
279
+ ---
280
+
281
+ ## 10. Comparison to existing tools
282
+
283
+ | Tool | Type | Real-time gate? | Cross-file? | LLM critic? | OSS? |
284
+ |---|---|---|---|---|---|
285
+ | **Semgrep MCP** (`semgrep/mcp`) | MCP wrapper | Yes (post-write) | Limited | No | Yes |
286
+ | **Codacy MCP** | MCP wrapper | Yes | Yes (paid) | No | Yes (free tier) |
287
+ | **Snyk MCP** | Commercial | Yes (CLI tier) | Yes | Yes (DeepCode) | No |
288
+ | **Mobb Vibe Shield** | Commercial | Yes | Limited | Yes (fix author) | OpenGrep core |
289
+ | **CodeRabbit MCP** | Commercial | No (PR-time) | Yes | Yes | No |
290
+ | **Aegis-v2** | OSS, ours | **Yes (PRE+POST hook)** | **Yes (SCIP+Joern)** | **Yes (Vul-RAG)** | **Yes** |
291
+
292
+ **Differentiators:**
293
+ 1. Phase 0 alone matches Semgrep MCP on the free tier.
294
+ 2. Phases 1-2 add what Codacy charges for, plus an LLM critic with Vul-RAG (no
295
+ competitor offers this combination as open source).
296
+ 3. Phase 3 adds a property-based testing gate (no competitor offers one).
297
+
298
+ ---
299
+
300
+ ## 11. Project layout (committed today)
301
+
302
+ ```
303
+ aegis-v2/
304
+ ├── README.md
305
+ ├── ARCHITECTURE.md ← this file
306
+ ├── ROADMAP.md ← per-phase tasks, deliverables, exit criteria
307
+ ├── LICENSE ← MIT
308
+ ├── package.json
309
+ ├── tsconfig.json
310
+ ├── vitest.config.ts
311
+ ├── .gitignore
312
+ ├── .claude/
313
+ │ └── settings.example.json ← copy-paste hook config for users
314
+
315
+ ├── src/
316
+ │ ├── index.ts ← package entry; re-exports
317
+ │ ├── mcp/
318
+ │ │ ├── server.ts ← MCP stdio server bootstrap
319
+ │ │ ├── transport.ts
320
+ │ │ └── tools/
321
+ │ │ ├── precheck.ts ← precheck_change
322
+ │ │ ├── validate.ts ← validate_edit
323
+ │ │ └── explain.ts ← explain_risk
324
+ │ ├── engines/
325
+ │ │ ├── types.ts ← Engine interface
326
+ │ │ ├── registry.ts ← engine registration
327
+ │ │ ├── treesitter.ts ← Phase 0
328
+ │ │ ├── semgrep.ts ← Phase 0
329
+ │ │ ├── pyright.ts ← Phase 0
330
+ │ │ ├── tsc.ts ← Phase 0
331
+ │ │ ├── eslint.ts ← Phase 0
332
+ │ │ ├── secret-scan.ts ← Phase 0
333
+ │ │ ├── scip.ts ← Phase 1 (stub)
334
+ │ │ ├── joern.ts ← Phase 1 (stub)
335
+ │ │ ├── embeddings.ts ← Phase 1 (stub)
336
+ │ │ ├── pysa.ts ← Phase 2 (stub)
337
+ │ │ ├── racerd.ts ← Phase 2 (stub)
338
+ │ │ ├── vulrag.ts ← Phase 2 (stub)
339
+ │ │ ├── router-150m.ts ← Phase 2 (stub)
340
+ │ │ └── llm-critic.ts ← Phase 2 (stub)
341
+ │ ├── orchestrator.ts ← run engines in parallel, merge findings
342
+ │ ├── findings.ts ← unified Finding zod schema
343
+ │ ├── risk.ts ← scoring + action mapping
344
+ │ ├── lang.ts ← language detection from file ext
345
+ │ ├── cache.ts ← LMDB wrapper
346
+ │ ├── hooks/
347
+ │ │ └── claude-code.ts ← stdin/stdout protocol for Claude Code hooks
348
+ │ └── util/
349
+ │ ├── subprocess.ts ← safe spawn with timeouts
350
+ │ ├── safe-paths.ts ← path validation + denylist
351
+ │ └── logger.ts
352
+
353
+ ├── tests/
354
+ │ ├── fixtures/ ← good + bad sample files per language
355
+ │ ├── engines/ ← per-engine unit tests
356
+ │ ├── orchestrator.test.ts
357
+ │ ├── risk.test.ts
358
+ │ └── mcp.test.ts ← MCP protocol roundtrip
359
+
360
+ └── docs/
361
+ ├── hook-setup.md ← step-by-step user install
362
+ ├── phase-1-project-context.md ← extended design for SCIP+Joern+embeddings
363
+ ├── phase-2-deep-logic.md ← Pysa, RacerD, Vul-RAG, 150M router, LLM critic
364
+ ├── phase-3-production.md ← telemetry, suppression learning, property tests
365
+ └── benchmarks.md ← Macroscope + internal eval methodology
366
+ ```
367
+
368
+ ---
369
+
370
+ ## 12. Open decisions (logged here for posterity)
371
+
372
+ | Decision | Choice | Why | Reversible? |
373
+ |---|---|---|---|
374
+ | Language | TypeScript / Node 18+ | MCP SDK is first-class TS; CodeGraph / SCIP-typescript / scip-python all bundle TS clients; aligns with research recommendation | Hard — would require full rewrite |
375
+ | Stack philosophy | OSS-only (free engines + free Claude/GPT subscription) | No vendor lock; full control; can be self-hosted | Easy — can add commercial adapters later |
376
+ | MCP transport | stdio | Lowest latency; no auth surface; matches every agent's expected wiring | Easy — Streamable HTTP can be added later |
377
+ | Findings schema | Unified `Finding` (zod) | Single contract = engine swap without downstream changes | Hard — would need migration |
378
+ | Cache backend | LMDB | Embedded, zero-deps, mmap-fast | Easy — can swap for RocksDB |
379
+ | Phase 0 engines | Semgrep OSS + Pyright + tsc + ESLint + tree-sitter + secret-scan | Cheapest highest-coverage starting set per research | Easy — engines are pluggable |
380
+ | Hook strategy | Both PreToolUse (advisory) + PostToolUse (blocking) | Pre is cheap; post is authoritative. Together = user sees warnings during editing AND can never write blocked code. | Easy |
381
+ | Local LLM | Defer 150M router to Phase 2 | Phase 0 doesn't need it; ship value sooner | Easy |
382
+ | Critic model | Claude (default) + OpenAI GPT (ensemble) | Different vendors avoid common-mode bias | Easy |
383
+ | Audit log | `.aegis/audit.jsonl` append-only | Local, simple, gitignored | Easy |
384
+
385
+ ---
386
+
387
+ ## 13. Citations
388
+
389
+ Research that shaped this design:
390
+
391
+ - Ding et al., **PrimeVul** (arxiv 2403.18624, 2024) — vulnerability detection benchmark; LLMs at ~3% F1 in stringent settings.
392
+ - Du et al., **Vul-RAG** (arxiv 2406.11147, 2024) — knowledge-augmented retrieval; +16-24% over pure LLM.
393
+ - Blackshear et al., **RacerD** (OOPSLA 2018) — compositional inter-procedural race detection; 2500+ races fixed in production at Meta.
394
+ - Macroscope **Code-Review Benchmark** (Feb 2026) — 118 bugs across 45 OSS repos; 48% top, traditional SAST < 20%.
395
+ - aiXcoder **COLA** (arxiv 2503.15301, 2025) — LLMs ignore unfocused context; targeted retrieval wins.
396
+ - **RepoGraph** (ICLR 2025, arxiv 2410.14684) — AST-based def/ref/invoke graph; +2-3 pp on SWE-bench.
397
+ - **IRIS** (ICLR 2025, arxiv 2405.17238) — LLM-augmented CodeQL; doubles recall on Java curated set.
398
+
399
+ OSS engines:
400
+
401
+ - [Semgrep OSS](https://github.com/semgrep/semgrep) — Apache 2.0
402
+ - [Pyright](https://github.com/microsoft/pyright) — MIT
403
+ - [ESLint](https://github.com/eslint/eslint) — MIT
404
+ - [tree-sitter](https://github.com/tree-sitter/tree-sitter) — MIT
405
+ - [Joern](https://github.com/joernio/joern) — Apache 2.0
406
+ - [Infer / RacerD](https://github.com/facebook/infer) — MIT
407
+ - [Pysa / Pyre](https://github.com/facebook/pyre-check) — MIT
408
+ - [SCIP](https://github.com/sourcegraph/scip) + [scip-typescript](https://github.com/sourcegraph/scip-typescript) + [scip-python](https://github.com/sourcegraph/scip-python) — Apache 2.0
409
+ - [LMDB-js](https://github.com/kriszyp/lmdb-js) — MIT
410
+ - [@modelcontextprotocol/sdk](https://github.com/modelcontextprotocol/typescript-sdk) — MIT
package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Aegis-v2 contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,153 @@
1
+ # Aegis-v2
2
+
3
+ **Real-time MCP code-write gate for AI agents.**
4
+
5
+ When Claude Code / Cursor / Codex / opencode / Hermes is about to write a file,
6
+ Aegis runs an in-process multi-engine analysis and returns ALLOW / WARN / BLOCK,
7
+ with a median latency of ≤ 400 ms. Blocked writes return a structured remediation
8
+ prompt the agent can act on automatically.
9
+
10
+ OSS-only. TypeScript-native. Fully self-hosted.
11
+
12
+ ## Status
13
+
14
+ **Phase 0** — Shipped. See `ARCHITECTURE.md` for the full 4-phase plan.
15
+
16
+ Phase 0 capability surface:
17
+
18
+ | Engine | Catches | Languages |
19
+ |---|---|---|
20
+ | `secret-scan` (built-in) | Hardcoded secrets (AWS / GitHub / OpenAI / Slack / JWT / PEM keys) | all |
21
+ | `treesitter` | Parse errors / "agent wrote half a file" | py, js, jsx, ts, tsx |
22
+ | `eslint` | `no-floating-promises`, `no-misused-promises`, `no-eval`, ~25 lint rules | js, ts, jsx, tsx |
23
+ | `pyright` (if installed) | Python type errors | py |
24
+ | `tsc` (if installed) | TS type errors | ts, tsx, js, jsx (`checkJs`) |
25
+ | `semgrep` (if installed) | Pattern-based security rules from `p/default` | all major langs |
26
+
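As a rough illustration of the `secret-scan` row above, a line-by-line regex pass might look like the sketch below. The pattern table, the `scanForSecrets` name, and the finding shape are assumptions made for this example; the engine's real rules are not shown in this README.

```typescript
// Hypothetical pattern table; not the package's actual rule set.
interface SecretPattern {
  id: string;
  regex: RegExp;
}

const PATTERNS: SecretPattern[] = [
  // AWS access key IDs: "AKIA" followed by 16 uppercase alphanumerics.
  { id: "aws-access-key-id", regex: /\bAKIA[0-9A-Z]{16}\b/ },
  // GitHub classic personal access tokens: "ghp_" plus 36 alphanumerics.
  { id: "github-pat", regex: /\bghp_[A-Za-z0-9]{36}\b/ },
  // PEM private key header, with an optional algorithm label such as "RSA".
  { id: "pem-private-key", regex: /-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----/ },
];

interface SecretFinding {
  ruleId: string;
  line: number; // 1-based line number within the scanned content
}

// Test every line against every pattern and record each match.
function scanForSecrets(content: string): SecretFinding[] {
  const findings: SecretFinding[] = [];
  content.split("\n").forEach((text, index) => {
    for (const { id, regex } of PATTERNS) {
      if (regex.test(text)) {
        findings.push({ ruleId: id, line: index + 1 });
      }
    }
  });
  return findings;
}
```

The regexes carry no `g` flag, so `test` keeps no state between lines.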
27
+ ## Install
28
+
29
+ ```bash
30
+ cd aegis-v2
31
+ npm install
32
+ npm run build
33
+ # global install (optional)
34
+ npm link
35
+ ```
36
+
37
+ External engines that the gate auto-detects (install where you want them):
38
+
39
+ ```bash
40
+ # Python type checker
41
+ pip install --user pyright
42
+
43
+ # Multi-language pattern engine
44
+ pip install --user semgrep
45
+
46
+ # TypeScript compiler — likely already in your projects
47
+ npm i -g typescript
48
+ ```
49
+
50
+ Aegis runs without these; they're additive. A missing engine means reduced
51
+ coverage, not a crash.
52
+
53
+ Run `aegis doctor` to see which engines are live on your machine.
54
+
55
+ ## Wire to Claude Code (the hook)
56
+
57
+ Copy `.claude/settings.example.json` to your project's `.claude/settings.json`
58
+ (or merge into the existing file):
59
+
60
+ ```json
61
+ {
62
+ "hooks": {
63
+ "PostToolUse": [
64
+ { "matcher": "Write|Edit", "hooks": [{ "type": "command", "command": "aegis hook", "timeout": 10 }] }
65
+ ]
66
+ }
67
+ }
68
+ ```
69
+
70
+ Restart Claude Code. From the next session, every `Write` / `Edit` is checked as
71
+ soon as the tool call completes (`PostToolUse`). A BLOCK verdict shows the remediation prompt and Claude regenerates.
72
+
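To make the hook flow more concrete, here is a minimal sketch of how a hook command could read its stdin payload and choose an exit code. It assumes the payload is a JSON object with `tool_name` and `tool_input.file_path` fields, and that exit code 2 is how a BLOCK is signalled (mirroring the `exit 0/2` convention of `aegis scan`). `parseHookTarget` and `exitCodeFor` are hypothetical names, not functions from this package.

```typescript
// Assumed payload shape; only the fields used in this sketch are modeled.
interface HookPayload {
  tool_name?: unknown;
  tool_input?: { file_path?: unknown };
}

type Verdict = "ALLOW" | "WARN" | "BLOCK";

// Extract the tool name and target file from a raw hook payload.
// Returns null for malformed JSON or for calls other than Write / Edit.
function parseHookTarget(raw: string): { tool: string; filePath: string } | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const payload = parsed as HookPayload;
  const tool = payload.tool_name;
  const filePath = payload.tool_input?.file_path;
  if (tool !== "Write" && tool !== "Edit") return null;
  if (typeof filePath !== "string" || filePath.length === 0) return null;
  return { tool, filePath };
}

// Assumed mapping: only BLOCK produces a non-zero exit code.
function exitCodeFor(verdict: Verdict): number {
  return verdict === "BLOCK" ? 2 : 0;
}
```

Returning null on anything unexpected lets the caller decide whether to fail open or closed; which of the two Aegis does is not stated here.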
73
+ ## Use as an MCP server
74
+
75
+ Add the following to `~/.claude.json` (Claude Code), but **prefer the hook** for blocking;
76
+ the MCP tools let the agent opt in to a check before it writes:
77
+
78
+ ```json
79
+ {
80
+ "mcpServers": {
81
+ "aegis": {
82
+ "type": "stdio",
83
+ "command": "aegis",
84
+ "args": ["mcp"]
85
+ }
86
+ }
87
+ }
88
+ ```
89
+
90
+ The MCP server exposes three tools:
91
+
92
+ - `precheck_change(file_path, content)` — fast advisory, ≤ 200 ms, never blocks.
93
+ - `validate_edit(file_path, content)` — authoritative, returns verdict + remediation.
94
+ - `explain_risk(finding_id|file)` — look up the prior gate decision from `.aegis/audit.jsonl`.
95
+
96
+ ## CLI
97
+
98
+ ```bash
99
+ aegis doctor # list engines + availability
100
+ aegis scan path/to/file.py # print verdict + findings as JSON; exit 0/2
101
+ aegis mcp # start stdio MCP server
102
+ aegis hook # consume Claude-Code hook payload on stdin
103
+ ```
104
+
105
+ ## Environment
106
+
107
+ | Variable | Default | Purpose |
108
+ |---|---|---|
109
+ | `AEGIS_LOG_LEVEL` | `info` | `debug` \| `info` \| `warn` \| `error` |
110
+ | `AEGIS_TOTAL_TIMEOUT_MS` | `5000` | Overall gate budget |
111
+ | `AEGIS_SEMGREP_CONFIG` | `p/default` | Semgrep ruleset |
112
+ | `AEGIS_SEMGREP_BIN` | — | Override semgrep binary path |
113
+ | `AEGIS_PYRIGHT_BIN` | — | Override pyright binary path |
114
+ | `AEGIS_TSC_BIN` | — | Override tsc binary path |
115
+
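The table above could be resolved into a typed config roughly as follows. This is a sketch under assumptions: the `AegisConfig` shape and `loadConfig` name are invented for illustration, and falling back to the default when a value is unset or invalid is a guess at the behavior, not something the README states.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// Hypothetical config shape derived from the environment table.
interface AegisConfig {
  logLevel: LogLevel;
  totalTimeoutMs: number;
  semgrepConfig: string;
  semgrepBin: string | undefined;
  pyrightBin: string | undefined;
  tscBin: string | undefined;
}

const LOG_LEVELS: readonly LogLevel[] = ["debug", "info", "warn", "error"];

// Build a config from an environment map such as `process.env`.
function loadConfig(env: Record<string, string | undefined>): AegisConfig {
  const rawLevel = env["AEGIS_LOG_LEVEL"];
  // Unknown levels fall back to "info" (assumed behavior).
  const logLevel = LOG_LEVELS.find((level) => level === rawLevel) ?? "info";

  const parsedTimeout = Number(env["AEGIS_TOTAL_TIMEOUT_MS"]);
  // Missing, non-numeric, or non-positive budgets fall back to 5000 ms.
  const totalTimeoutMs =
    Number.isFinite(parsedTimeout) && parsedTimeout > 0 ? parsedTimeout : 5000;

  return {
    logLevel,
    totalTimeoutMs,
    semgrepConfig: env["AEGIS_SEMGREP_CONFIG"] ?? "p/default",
    semgrepBin: env["AEGIS_SEMGREP_BIN"],
    pyrightBin: env["AEGIS_PYRIGHT_BIN"],
    tscBin: env["AEGIS_TSC_BIN"],
  };
}
```

Taking the environment as a parameter rather than reading `process.env` directly keeps the function easy to unit-test.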
116
+ ## Tests
117
+
118
+ ```bash
119
+ npm test
120
+ ```
121
+
122
+ Unit-tested today: findings schema, lang detection, risk scoring, secret-scan
123
+ engine, tree-sitter engine, ESLint engine, orchestrator end-to-end with
124
+ path-safety + timeout.
125
+
126
+ ## Audit log
127
+
128
+ Every gate decision (allow / warn / block) appends a line to
129
+ `.aegis/audit.jsonl` under your project root. The log is append-only and holds
130
+ no PII: each entry records only the file path, action, engine, rule_id, severity, and confidence.
131
+
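One way to picture an audit entry is the sketch below, which serializes a record to a single JSONL line and parses a log back into records. The field names follow the list above, but the exact key spelling, the value types, and the helper names `toAuditLine` / `parseAuditLog` are assumptions for illustration.

```typescript
// Hypothetical record shape built from the fields the README lists.
interface AuditRecord {
  file_path: string;
  action: "allow" | "warn" | "block";
  engine: string;
  rule_id: string;
  severity: string;
  confidence: number;
}

// Serialize one record as a single newline-terminated JSON line.
function toAuditLine(record: AuditRecord): string {
  return JSON.stringify(record) + "\n";
}

// Parse JSONL text back into records, skipping blank lines.
function parseAuditLog(text: string): AuditRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as AuditRecord);
}
```

In Node.js, appending could then be a single `fs.appendFileSync(".aegis/audit.jsonl", toAuditLine(record))` call, and a lookup such as `explain_risk` could filter the parsed records by `file_path` or `rule_id`.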
132
+ ## Security posture
133
+
134
+ See `ARCHITECTURE.md §8` (Threat model). Key invariants:
135
+
136
+ 1. No registry-based auto-update.
137
+ 2. No shell expansion on user paths.
138
+ 3. Findings + file content are treated as **data** in any LLM prompt
139
+ (Phase 2+), never as instructions.
140
+ 4. Phase 0-1 makes no outbound network calls.
141
+ 5. Path safety denylist covers `~/.ssh`, `~/.aws`, `~/.gnupg`, `~/.kube`,
142
+ `~/.docker`, `~/.npmrc`, `~/.pgpass`, `/etc`, `/root`, `/proc`, `/sys`,
143
+ `C:\Windows`, `C:\Program Files`.
144
+ 6. Audit log local-only, append-only, gitignored.
145
+
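Invariant 5 can be sketched as a prefix check against the denylist. The version below is an assumption-laden illustration: it covers only the POSIX entries (the Windows paths would need case-insensitive, backslash-aware handling), does not resolve symlinks, and uses hypothetical helper names `normalizePosix` and `isDeniedPath`.

```typescript
// Denylist entries taken from the README; home-relative and absolute groups.
const HOME_DENY = [".ssh", ".aws", ".gnupg", ".kube", ".docker", ".npmrc", ".pgpass"];
const ROOT_DENY = ["/etc", "/root", "/proc", "/sys"];

// Collapse ".", ".." and repeated slashes in an absolute POSIX path.
function normalizePosix(absPath: string): string {
  const parts: string[] = [];
  for (const part of absPath.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") parts.pop();
    else parts.push(part);
  }
  return "/" + parts.join("/");
}

// True when the path is a denied location or lies inside one.
function isDeniedPath(absPath: string, home: string): boolean {
  const target = normalizePosix(absPath);
  const denied = [
    ...ROOT_DENY,
    ...HOME_DENY.map((name) => normalizePosix(`${home}/${name}`)),
  ];
  // Compare on whole path segments so "/etcetera" does not match "/etc".
  return denied.some(
    (prefix) => target === prefix || target.startsWith(prefix + "/"),
  );
}
```

Normalizing before comparing matters because an agent-supplied path such as `project/../.aws/credentials` would otherwise slip past a naive string prefix test.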
146
+ ## License
147
+
148
+ MIT — see `LICENSE`.
149
+
150
+ ## Roadmap
151
+
152
+ See `ROADMAP.md` for the full Phase 0 → 3 plan and per-phase exit criteria.
153
+ Phase 1 (project-aware context via SCIP + Joern + embeddings) is next.