@codragraph/cli 1.6.4 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (61)
  1. package/README.md +34 -0
  2. package/dist/cli/analyze.d.ts +22 -0
  3. package/dist/cli/analyze.js +107 -4
  4. package/dist/cli/compress-stats.d.ts +29 -0
  5. package/dist/cli/compress-stats.js +97 -0
  6. package/dist/cli/graphstore.d.ts +6 -2
  7. package/dist/cli/graphstore.js +24 -2
  8. package/dist/cli/index.js +16 -2
  9. package/dist/cli/profile-heap.d.ts +35 -0
  10. package/dist/cli/profile-heap.js +126 -0
  11. package/dist/cli/setup.d.ts +13 -0
  12. package/dist/cli/setup.js +22 -11
  13. package/dist/cli/skill-gen.d.ts +14 -2
  14. package/dist/cli/skill-gen.js +52 -19
  15. package/dist/cli/tool.js +4 -0
  16. package/dist/core/embeddings/embedding-pipeline.js +24 -7
  17. package/dist/core/group/bridge-db.js +111 -24
  18. package/dist/core/lbug/content-read.d.ts +46 -0
  19. package/dist/core/lbug/content-read.js +64 -0
  20. package/dist/core/lbug/csv-generator.d.ts +2 -6
  21. package/dist/core/lbug/csv-generator.js +45 -12
  22. package/dist/core/lbug/lbug-adapter.d.ts +4 -1
  23. package/dist/core/lbug/lbug-adapter.js +153 -21
  24. package/dist/core/lbug/schema.d.ts +7 -7
  25. package/dist/core/lbug/schema.js +18 -0
  26. package/dist/core/run-analyze.d.ts +13 -0
  27. package/dist/core/run-analyze.js +91 -4
  28. package/dist/core/search/bm25-index.js +67 -15
  29. package/dist/mcp/local/local-backend.js +22 -5
  30. package/dist/server/api.js +4 -3
  31. package/dist/storage/repo-manager.d.ts +39 -0
  32. package/dist/storage/repo-manager.js +19 -0
  33. package/hooks/claude/codragraph-hook.cjs +95 -2
  34. package/package.json +4 -4
  35. package/scripts/build-tree-sitter-proto.cjs +15 -3
  36. package/scripts/patch-tree-sitter-swift.cjs +17 -4
  37. package/skills/codragraph-api-surface.md +110 -0
  38. package/skills/codragraph-config-audit.md +146 -0
  39. package/skills/codragraph-cross-repo-impact.md +135 -0
  40. package/skills/codragraph-data-lineage.md +137 -0
  41. package/skills/codragraph-dead-code.md +119 -0
  42. package/skills/codragraph-gh-actions-debug.md +162 -0
  43. package/skills/codragraph-gh-issue-workflow.md +178 -0
  44. package/skills/codragraph-gh-pr-workflow.md +176 -0
  45. package/skills/codragraph-gh-release-workflow.md +187 -0
  46. package/skills/codragraph-git-bisect.md +176 -0
  47. package/skills/codragraph-git-force-push.md +147 -0
  48. package/skills/codragraph-git-history-rewrite.md +174 -0
  49. package/skills/codragraph-git-rebase-vs-merge.md +138 -0
  50. package/skills/codragraph-git-recovery.md +181 -0
  51. package/skills/codragraph-git-worktree.md +145 -0
  52. package/skills/codragraph-migration-tracking.md +130 -0
  53. package/skills/codragraph-notebook-context.md +136 -0
  54. package/skills/codragraph-observability-coverage.md +125 -0
  55. package/skills/codragraph-onboarding.md +129 -0
  56. package/skills/codragraph-perf-hotspots.md +132 -0
  57. package/skills/codragraph-project-switcher.md +116 -0
  58. package/skills/codragraph-security-audit.md +144 -0
  59. package/skills/codragraph-sql-tracing.md +122 -0
  60. package/skills/codragraph-supply-chain-audit.md +153 -0
  61. package/skills/codragraph-test-coverage.md +97 -0
@@ -0,0 +1,125 @@
+ ---
+ name: codragraph-observability-coverage
+ description: "Use to audit observability coverage — which functions / processes have logs, metrics, or distributed-trace spans, and which don't. Find the dark corners where you're flying blind. Examples: \"observability coverage\", \"unlogged code\", \"missing traces\", \"telemetry audit\", \"where are we flying blind\""
+ ---
+
+ # Observability Coverage Audit with CodraGraph
+
+ ## When to Use
+
+ - "Which functions have NO logs / metrics / traces?"
+ - "Audit telemetry coverage on the request path."
+ - "Find dark spots in my observability."
+ - "Are all my critical processes instrumented?"
+ - Post-incident review: "did we have visibility into X?"
+
+ ## Why CodraGraph helps here
+
+ Telemetry calls are just function calls — `logger.info(...)`,
+ `tracer.startSpan(...)`, `metrics.histogram(...)`. CodraGraph's call
+ graph shows you exactly which symbols invoke them. Subtract those from
+ your full symbol set: the difference is your dark zone.
+
+ ## Workflow
+
+ ```
+ 1. Identify your telemetry surface area:
+    codragraph_query({query: "logger trace span metric histogram counter"})
+    → list of telemetry-emitting helpers (logger.info, span.end, metrics.timing, ...)
+
+ 2. For each telemetry helper, find its callers:
+    codragraph_cypher({query: `
+      MATCH (caller)-[:CALLS]->(t {name: '<telemetry-fn>'})
+      RETURN DISTINCT caller.id, caller.name
+    `})
+    → "instrumented" set: every function that emits telemetry
+
+ 3. Map your critical surface (processes / entry points):
+    READ codragraph://repo/{name}/processes
+    → request-path / job-path flows
+
+ 4. Subtract: which entry points never reach the instrumented set
+    within a few hops?
+    codragraph_cypher({query: `
+      MATCH (n {label: 'Function'})
+      WHERE n.isEntryPoint = true
+        AND NOT EXISTS {
+          MATCH (n)-[:CALLS*1..3]->(t)
+          WHERE t.name STARTS WITH 'logger.'
+             OR t.name STARTS WITH 'tracer.'
+             OR t.name STARTS WITH 'metrics.'
+        }
+      RETURN n.name, n.filePath
+    `})
+    → entry points with NO telemetry within 3 hops = dark zones
+
+ 5. For each dark zone, run codragraph_context to confirm the gap, then
+    propose minimum-viable instrumentation (one log line + one span)
+ ```
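Steps 2 to 4 can also be checked offline once the CALLS edges are exported. The TypeScript below is a minimal sketch, not CodraGraph code: it assumes a hypothetical `CallGraph` map from caller name to callee names, and `darkZones` / `reachesTelemetry` are illustrative helpers. It applies the same `logger.` / `tracer.` / `metrics.` prefix test and 3-hop limit as the step 4 query, using a breadth-first search so each function is expanded at most once.

```typescript
// Hypothetical in-memory call graph exported from CodraGraph: caller name -> callee names.
type CallGraph = Map<string, string[]>;

const TELEMETRY_PREFIXES = ["logger.", "tracer.", "metrics."];

// Same test as the STARTS WITH filters in the step 4 query.
function isTelemetry(name: string): boolean {
  return TELEMETRY_PREFIXES.some((p) => name.startsWith(p));
}

// Breadth-first search from `start`, following at most `maxHops` CALLS edges.
function reachesTelemetry(graph: CallGraph, start: string, maxHops: number): boolean {
  const seen = new Set<string>([start]);
  let frontier = [start];
  for (let hop = 1; hop <= maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const fn of frontier) {
      for (const callee of graph.get(fn) ?? []) {
        if (isTelemetry(callee)) return true; // telemetry found at distance `hop`
        if (!seen.has(callee)) {
          seen.add(callee);
          next.push(callee);
        }
      }
    }
    frontier = next;
  }
  return false;
}

// Entry points with no telemetry call within `maxHops` hops: the dark zones.
function darkZones(graph: CallGraph, entryPoints: string[], maxHops = 3): string[] {
  return entryPoints.filter((ep) => !reachesTelemetry(graph, ep, maxHops));
}

const graph: CallGraph = new Map([
  ["handleCheckout", ["validateCart", "logger.info"]],
  ["handleShip", ["notifyShip"]],
  ["notifyShip", ["sendEmail"]],
]);
console.log(darkZones(graph, ["handleCheckout", "handleShip"])); // [ 'handleShip' ]
```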
+
+ ## Coverage tiers
+
+ | Tier | What's covered | What it tells you |
+ |---|---|---|
+ | **None** | No telemetry within 3 hops of entry point | Flying blind under load |
+ | **Logs only** | `logger.*` reachable but no `tracer.*` / `metrics.*` | Can debug post-hoc; no aggregate view of prod behavior |
+ | **Logs + metrics** | Counters / histograms emitted | Dashboards possible |
+ | **Logs + metrics + traces** | Spans tied to request flow | Full observability |
+ | **Structured + correlated** | All three with a request_id propagated | Best — can chase one user through everything |
+
+ ## Checklist
+
+ ```
+ - [ ] Listed telemetry-emitting helpers (logger / tracer / metrics)
+ - [ ] Resolved their direct callers (instrumented set)
+ - [ ] Listed critical processes / entry points
+ - [ ] Subtracted: which entry points have no telemetry within 3 hops?
+ - [ ] For each gap, propose minimum-viable instrumentation
+ - [ ] Tier-rate each critical flow (None / Logs / Metrics / Traces / Correlated)
+ ```
+
+ ## Example: "Audit observability on our checkout flow"
+
+ ```
+ 1. codragraph_query({query: "checkout payment process"})
+    → CheckoutFlow process: 7 steps (validateCart → reservePayment →
+      captureFunds → createOrder → notifyShip → emitReceipt → done)
+
+ 2. Telemetry helpers:
+    - logger.info, logger.warn, logger.error
+    - tracer.startSpan, span.end
+    - metrics.histogram, metrics.counter
+
+ 3. For each step in CheckoutFlow:
+    codragraph_context({name: "<step>"})
+    → check whether its callees include any telemetry helper
+
+    - validateCart   → logger.info ✓, tracer ✓, metrics ✗
+    - reservePayment → logger.info ✓, tracer ✓, metrics ✗
+    - captureFunds   → logger.info ✓, tracer ✗, metrics ✗ ⚠
+    - createOrder    → logger.info ✓, tracer ✓, metrics ✓
+    - notifyShip     → ⚠ NOTHING (dark zone)
+    - emitReceipt    → logger.info ✓
+    - done           → logger.info ✓
+
+ 4. Gaps:
+    - notifyShip: entirely unobserved. Add tracer.startSpan + a counter on
+      success/failure. Cheapest fix to close the gap.
+    - captureFunds: missing tracer span around the actual capture call.
+      Add one for distributed-trace correlation with the payment provider.
+    - validateCart, reservePayment, captureFunds: missing latency histograms.
+      Add a metrics.histogram for each.
+
+ Tier rating: Logs partial ⚠ (every step except notifyShip), Metrics partial ⚠,
+ Traces partial ⚠, Correlated ✓ (request_id is propagated end-to-end where
+ instrumentation exists).
+ ```
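The per-step ✓/✗ list in step 3 rolls up into the per-signal rating mechanically. A minimal sketch, assuming the step signals are transcribed by hand into a hypothetical `checkoutFlow` record (steps that only list `logger.info` are treated as logs-only). It uses a strict rule, ✓ only when every step emits the signal, so a single dark step such as notifyShip downgrades all three signals to partial.

```typescript
type Signal = "logs" | "traces" | "metrics";
type Rating = "✓" | "partial ⚠" | "✗";

// Signals observed per CheckoutFlow step, transcribed from step 3 (illustrative data).
// Steps that only listed logger.info are assumed to have logs only.
const checkoutFlow: Record<string, Signal[]> = {
  validateCart: ["logs", "traces"],
  reservePayment: ["logs", "traces"],
  captureFunds: ["logs"],
  createOrder: ["logs", "traces", "metrics"],
  notifyShip: [],
  emitReceipt: ["logs"],
  done: ["logs"],
};

// A signal rates ✓ only if every step emits it, ✗ if none do, partial otherwise.
function rateFlow(steps: Record<string, Signal[]>): Record<Signal, Rating> {
  const names = Object.keys(steps);
  const rate = (signal: Signal): Rating => {
    const hits = names.filter((n) => steps[n].includes(signal)).length;
    if (hits === names.length) return "✓";
    return hits === 0 ? "✗" : "partial ⚠";
  };
  return { logs: rate("logs"), traces: rate("traces"), metrics: rate("metrics") };
}

// Steps missing a given signal: the raw material for the "Gaps" list.
function missing(steps: Record<string, Signal[]>, signal: Signal): string[] {
  return Object.keys(steps).filter((n) => !steps[n].includes(signal));
}

console.log(rateFlow(checkoutFlow)); // { logs: 'partial ⚠', traces: 'partial ⚠', metrics: 'partial ⚠' }
console.log(missing(checkoutFlow, "traces")); // [ 'captureFunds', 'notifyShip', 'emitReceipt', 'done' ]
```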
+
+ ## Pitfalls
+
+ | Pitfall | Symptom | Fix |
+ |---|---|---|
+ | Telemetry behind a façade | The only direct caller of `logger.info` is your `obs.log()` wrapper, so real call sites look dark | Add the wrapper to the telemetry-helper list |
+ | Conditional logging only on errors | "Looks instrumented" but emits nothing on the happy path | Audit success paths separately |
+ | Telemetry in middleware, missing in handler | Edge instrumentation doesn't show handler-internal state | Check both layers |
+ | Excessive logging in hot loops | Coverage looks great, dashboards drown in noise | Pair with codragraph-perf-hotspots; sample logs in hot paths |
@@ -0,0 +1,129 @@
+ ---
+ name: codragraph-onboarding
+ description: "Use when a developer is new to a codebase and needs a guided walkthrough — entry points, functional areas, key flows, where to start contributing. Examples: \"I'm new to this repo\", \"where do I start\", \"give me a tour\", \"onboard me to this codebase\", \"what does this project do\""
+ ---
+
+ # Codebase Onboarding with CodraGraph
+
+ ## When to Use
+
+ - "I'm new to this codebase. Where do I start?"
+ - "Give me a tour of this project."
+ - "What does each part of this repo do?"
+ - "I want to fix a bug in `<area>` — what should I read first?"
+ - "I'm picking this project back up after 6 months."
+
+ ## Why CodraGraph helps here
+
+ A README tells you what the project *does*. Reading source top-down tells
+ you little for the first hour. CodraGraph already grouped your code into
+ **Leiden communities** during `analyze`; these are the natural functional
+ areas of the codebase, derived from the call graph rather than directory
+ structure. Pair them with the detected execution flows (Processes) and you
+ get a guided tour: each cluster is a "module", each process is a "story
+ running through it."
+
+ ## Workflow
+
+ ```
+ 1. READ codragraph://repo/{name}/context
+    → repo-level overview: file count, language mix, last index time
+
+ 2. READ codragraph://repo/{name}/clusters
+    → all functional areas (auth, payments, ingestion, …) with cohesion %
+      and dominant directories
+
+ 3. For each of the top 3 clusters (by symbol count):
+    READ .claude/skills/generated/<cluster-kebab-name>/SKILL.md (if --skills was run)
+    OR
+    codragraph_query({query: "<cluster label>"})
+    → entry points, key files, member symbols
+
+ 4. READ codragraph://repo/{name}/processes
+    → all detected execution flows (named processes that span the graph)
+
+ 5. For the top 2-3 processes:
+    READ codragraph://repo/{name}/process/<processName>
+    → step-by-step trace: which symbol calls which next
+
+ 6. Pick a cluster the user wants to dig into:
+    codragraph_context({name: "<entry point of that cluster>"})
+    → callers + callees, full picture of the entry point
+ ```
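Choosing the top clusters in step 3 and locating their generated guides is easy to script. A minimal sketch; the `Cluster` shape and the kebab-casing rule behind `<cluster-kebab-name>` are assumptions, not documented CodraGraph behavior, and the sample counts come from the CodraGraph example in this skill.

```typescript
// Hypothetical shape of one entry from the clusters resource; field names are assumptions.
interface Cluster {
  label: string;
  symbolCount: number;
}

// Step 3: the N largest clusters by symbol count (ties broken alphabetically).
function topClusters(clusters: Cluster[], n = 3): Cluster[] {
  return [...clusters]
    .sort((a, b) => b.symbolCount - a.symbolCount || a.label.localeCompare(b.label))
    .slice(0, n);
}

// Kebab-cases a cluster label into the generated-skill path (naming rule assumed).
function skillPath(label: string): string {
  const kebab = label
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `.claude/skills/generated/${kebab}/SKILL.md`;
}

const clusters: Cluster[] = [
  { label: "cli", symbolCount: 290 },
  { label: "ingestion", symbolCount: 1240 },
  { label: "mcp", symbolCount: 220 },
  { label: "graphstore", symbolCount: 340 },
  { label: "languages", symbolCount: 180 },
];
for (const c of topClusters(clusters)) console.log(skillPath(c.label));
// .claude/skills/generated/ingestion/SKILL.md
// .claude/skills/generated/graphstore/SKILL.md
// .claude/skills/generated/cli/SKILL.md
```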
+
+ > If `.claude/skills/generated/` is empty, run `codragraph analyze --skills`
+ > first to materialize per-community guides. They make onboarding
+ > dramatically faster.
+
+ ## Checklist
+
+ ```
+ - [ ] Repo overview (context resource)
+ - [ ] List clusters (clusters resource)
+ - [ ] Read top 3-5 cluster skills or query each cluster label
+ - [ ] List processes
+ - [ ] Walk top 2-3 processes step-by-step
+ - [ ] Pick one entry point and run context for the deep dive
+ - [ ] Summarize: "Here's the map. Start at <X> for <task>."
+ ```
+
+ ## Tour Structure
+
+ | Stage | Tool | Output |
+ | --- | --- | --- |
+ | Map | `clusters` resource | "10 functional areas, dominant: auth, ingestion, web" |
+ | Themes | per-community SKILL.md | Each area's purpose, key files, entry points |
+ | Stories | `processes` resource | "5 flows: SignupFlow, IngestPipeline, …" |
+ | Trace | `process/{name}` resource | Step-by-step call sequence |
+ | Deep dive | `context` | Pick one symbol, see all sides |
+
+ ## Example: "I'm new to CodraGraph itself, where do I start?"
+
+ ```
+ 1. READ codragraph://repo/CodraGraph/context
+    → 4325 symbols, 10556 relationships, 300 flows. TypeScript primary.
+
+ 2. READ codragraph://repo/CodraGraph/clusters
+    → Top clusters: ingestion (1240 symbols), graphstore (340), cli (290),
+      mcp (220), languages (180)
+
+ 3. READ .claude/skills/generated/ingestion/SKILL.md
+    → Entry points: runFullAnalysis, IngestionPipeline.run
+    → Key files: codragraph/src/core/ingestion/
+
+ 4. READ codragraph://repo/CodraGraph/processes
+    → Top flows: AnalyzeFlow, McpQueryFlow, GraphstoreCommitFlow
+
+ 5. READ codragraph://repo/CodraGraph/process/AnalyzeFlow
+    → 12 steps from CLI invocation through Phase 4 snapshot
+
+ 6. codragraph_context({name: "runFullAnalysis"})
+    → orchestrator that takes (repoPath, options, hooks) and runs the pipeline.
+    → Called by: analyzeCommand (CLI), eval-server, augment hook
+
+ Tour result: "Start at runFullAnalysis (codragraph/src/core/run-analyze.ts).
+ That's the orchestrator. The 12-stage pipeline lives under
+ src/core/ingestion/. Phase 4 graphstore is in src/core/graphstore/."
+ ```
+
+ ## Output Format
+
+ ```markdown
+ ## Codebase Tour: <repo>
+
+ ### Project shape
+ - N symbols across M files. Primary languages: …
+ - K functional areas (clusters), P execution flows.
+
+ ### Functional areas
+ 1. **<cluster>** — <symbolCount> symbols, dominant `src/<dir>/`. Purpose: …
+ 2. ...
+
+ ### Key flows
+ - **AnalyzeFlow** — 12 steps. Entry: `<symbol>`.
+ - ...
+
+ ### Recommended starting point for "<task>"
+ Read `<file>:<line>` (`<symbol>`). It's the orchestrator for <area>.
+ Once you understand it, walk the <flow> to see the whole story.
+ ```
@@ -0,0 +1,132 @@
+ ---
+ name: codragraph-perf-hotspots
+ description: "Use to identify likely performance hot paths from the call graph — top-N callees of entry points, fan-out functions, recursive cycles, and where to focus profiling effort. NOT a profiler — a structural pre-screen that narrows where to actually run a profiler. Examples: \"find perf hotspots\", \"hot paths\", \"top callees\", \"where should I profile\", \"functions called by every handler\""
+ ---
+
+ # Performance Hotspot Pre-Screen with CodraGraph
+
+ ## When to Use
+
+ - "Where should I focus my profiler?"
+ - "Which functions are on every request path?"
+ - "Find high fan-in points: functions called from many places."
+ - "Are there recursive call cycles?"
+ - "List top N callees of the API request handler."
+
+ ## What this skill IS and ISN'T
+
+ CodraGraph builds a **static call graph** — it knows who *can* call
+ whom, not who *did* call whom in production. So this skill identifies
+ **structural hot path candidates**, not measured hotspots.
+
+ Use it as a **pre-screen** for actual profiling: "given my structural
+ hot path candidates, the profiler should focus here first." If you have
+ profiler data (pprof, flamegraphs, OpenTelemetry traces), CodraGraph
+ turns the names from that data into actionable call-graph context.
+
+ ## Workflow
+
+ ```
+ 1. Identify entry points:
+    codragraph_query({query: "request handler endpoint route main"})
+    → top entry-point candidates
+
+ 2. Rank every function by depth-1 fan-in (distinct direct callers):
+    codragraph_cypher({query: `
+      MATCH (caller)-[:CALLS]->(callee)
+      WHERE callee.label = 'Function'
+      RETURN callee.name, count(DISTINCT caller) AS in_degree
+      ORDER BY in_degree DESC
+      LIMIT 20
+    `})
+    → callees called from many places = on many paths = candidate hot spots
+
+ 3. Cross-cut with processes:
+    READ codragraph://repo/{name}/processes
+    → functions appearing in MANY processes sit on many execution flows,
+      which makes them strong hot-path candidates
+
+ 4. Recursive cycle detection:
+    codragraph_cypher({query: `
+      MATCH path = (n)-[:CALLS*2..6]->(n)
+      RETURN n.name, length(path) AS cycle_len
+      ORDER BY cycle_len ASC
+      LIMIT 10
+    `})
+    → call cycles (mutual recursion); if the recursion depth is driven by
+      input, it is a potential perf cliff under specific inputs
+
+ 5. With profiler output, translate names back to context:
+    For each top-N name from your flamegraph/pprof:
+      codragraph_context({name: "<sym>"})
+      → "this function is on N execution flows; called by M sites"
+ ```
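For offline analysis, steps 2 and 3 reduce to two counts. A minimal sketch assuming hypothetical exports: CALLS edges as `[caller, callee]` pairs already restricted to Function nodes, and processes as a map from process name to its steps. `inDegreeRanking` mirrors `count(DISTINCT caller)`, so repeated calls from the same caller count once.

```typescript
// Hypothetical export of CALLS edges, already filtered to Function nodes.
type Edge = [caller: string, callee: string];

// Step 2 offline: DISTINCT callers per callee, highest first (ties by name).
function inDegreeRanking(edges: Edge[], limit = 20): { name: string; inDegree: number }[] {
  const callers = new Map<string, Set<string>>();
  for (const [caller, callee] of edges) {
    const set = callers.get(callee) ?? new Set<string>();
    set.add(caller);
    callers.set(callee, set);
  }
  return [...callers]
    .map(([name, set]) => ({ name, inDegree: set.size }))
    .sort((a, b) => b.inDegree - a.inDegree || a.name.localeCompare(b.name))
    .slice(0, limit);
}

// Step 3 offline: number of processes each function appears in (process name -> steps).
function processMembership(processes: Record<string, string[]>): Map<string, number> {
  const counts = new Map<string, number>();
  for (const steps of Object.values(processes)) {
    // A function listed twice in one process still counts once for that process.
    for (const fn of new Set(steps)) counts.set(fn, (counts.get(fn) ?? 0) + 1);
  }
  return counts;
}

const edges: Edge[] = [
  ["getOrders", "logRequest"],
  ["getUser", "logRequest"],
  ["getUser", "logRequest"], // duplicate call site counts once
  ["getUser", "db.query"],
];
console.log(inDegreeRanking(edges)); // logRequest (2 distinct callers), then db.query (1)
```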
+
+ ## Hot-path heuristics
+
+ | Signal | Meaning |
+ |---|---|
+ | Function called from > 10 distinct callers | High fan-in → optimize once, win everywhere |
+ | Function appearing in > 5 processes | On many request paths → request-time critical |
+ | Cycle of length 2-3 in CALLS edges | Mutual recursion — may overflow the stack at large depth |
+ | `await` chain of length > 8 in one process | Sequential I/O — candidate for parallelism |
+ | Function under cluster `database` / `network` | I/O-bound; profile network and DB calls separately |
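Step 4's bounded Cypher pattern lists individual cycles of 2 to 6 hops. As a swapped-in alternative for offline use, Tarjan's strongly-connected-components algorithm reports each recursive group once, whatever its length, and also catches direct self-recursion (which the `*2..6` pattern skips). A minimal sketch over a hypothetical caller-to-callees map; it recurses, so very deep graphs may need an iterative version.

```typescript
// Hypothetical caller -> callees map exported from the CALLS edges.
type CallGraph = Map<string, string[]>;

// Tarjan's strongly-connected-components algorithm. A component with 2+ functions
// is a mutual-recursion group; a single function only counts if it calls itself.
function recursiveGroups(graph: CallGraph): string[][] {
  let counter = 0;
  const index = new Map<string, number>();
  const lowlink = new Map<string, number>();
  const stack: string[] = [];
  const onStack = new Set<string>();
  const groups: string[][] = [];

  // Include functions that only ever appear as callees.
  const nodes = new Set<string>(graph.keys());
  for (const callees of graph.values()) for (const c of callees) nodes.add(c);

  const visit = (v: string): void => {
    index.set(v, counter);
    lowlink.set(v, counter);
    counter++;
    stack.push(v);
    onStack.add(v);
    for (const w of graph.get(v) ?? []) {
      if (!index.has(w)) {
        visit(w);
        lowlink.set(v, Math.min(lowlink.get(v)!, lowlink.get(w)!));
      } else if (onStack.has(w)) {
        lowlink.set(v, Math.min(lowlink.get(v)!, index.get(w)!));
      }
    }
    if (lowlink.get(v) === index.get(v)) {
      // v is the root of a component: pop the whole component off the stack.
      const component: string[] = [];
      let w = "";
      do {
        w = stack.pop()!;
        onStack.delete(w);
        component.push(w);
      } while (w !== v);
      const selfCall = (graph.get(v) ?? []).includes(v);
      if (component.length > 1 || selfCall) groups.push(component.sort());
    }
  };

  for (const v of nodes) if (!index.has(v)) visit(v);
  return groups;
}

const graph: CallGraph = new Map([
  ["main", ["processStep"]],
  ["processStep", ["enqueueRetry"]],
  ["enqueueRetry", ["processStep"]],
  ["fact", ["fact"]],
]);
console.log(recursiveGroups(graph)); // [ [ 'enqueueRetry', 'processStep' ], [ 'fact' ] ]
```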
+
+ ## Checklist
+
+ ```
+ - [ ] List entry points (codragraph_query for handlers/routes/main)
+ - [ ] Top-N callees by in-degree (Cypher)
+ - [ ] Cross-reference with processes (functions in many flows = hot-path candidates)
+ - [ ] Cycle detection
+ - [ ] If profiler data exists, run codragraph_context on each top hot symbol
+ - [ ] Prioritize: request-path + high in-degree + I/O-bound = first to optimize
+ ```
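The checklist's last item can be made mechanical. A minimal sketch, assuming a hypothetical `Candidate` record per function: "on a request path" is approximated as appearing in at least one process, "high in-degree" reuses the > 10 callers threshold from the heuristics table, and `ioBound` is whatever you concluded from cluster membership. The sample values are taken from the HTTP-handler example in this skill.

```typescript
// Hypothetical per-function facts collected in workflow steps 2 and 3.
interface Candidate {
  name: string;
  inDegree: number;
  processCount: number;
  ioBound: boolean;
}

// Number of checklist criteria met: on a request path, high in-degree, I/O-bound.
function criteriaMet(c: Candidate, highInDegree = 10): number {
  return (c.processCount > 0 ? 1 : 0) + (c.inDegree > highInDegree ? 1 : 0) + (c.ioBound ? 1 : 0);
}

// Most criteria first; ties broken by in-degree (optimize once, win everywhere).
function prioritize(cands: Candidate[]): Candidate[] {
  return [...cands].sort((a, b) => criteriaMet(b) - criteriaMet(a) || b.inDegree - a.inDegree);
}

const ranked = prioritize([
  { name: "serializeJson", inDegree: 28, processCount: 5, ioBound: false },
  { name: "db.query", inDegree: 18, processCount: 5, ioBound: true },
  { name: "getCurrentUser", inDegree: 22, processCount: 4, ioBound: true },
]);
console.log(ranked.map((c) => c.name)); // [ 'getCurrentUser', 'db.query', 'serializeJson' ]
```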
+
+ ## Example: "Find the hot paths in our HTTP handler chain"
+
+ ```
+ 1. codragraph_query({query: "express router handler"})
+    → 28 handler functions
+
+ 2. codragraph_cypher({query: `
+      MATCH (caller)-[:CALLS]->(callee)
+      WHERE callee.label = 'Function'
+      RETURN callee.name, count(DISTINCT caller) AS in_degree
+      ORDER BY in_degree DESC LIMIT 10
+    `})
+    → top in-degree callees:
+      - logRequest (28)      ← every handler calls this
+      - serializeJson (28)   ← every handler calls this
+      - getCurrentUser (22)
+      - db.query (18)        ← I/O bound
+      - cache.get (15)
+
+ 3. READ codragraph://repo/{name}/processes
+    → 5 processes; getCurrentUser appears in 4 of 5
+
+ 4. codragraph_context({name: "getCurrentUser"})
+    → 22 callers, calls db.query (cache miss path) and cache.get
+    → STRONGLY recommend: profile getCurrentUser first.
+    → Expected payoff of optimizing it scales with
+      (calls across its 22 call sites) × cache-miss rate × DB latency.
+ ```
+
+ ## Output Format
+
+ ```markdown
+ ## Perf Pre-Screen: <scope>
+
+ ### Top hot-path candidates (structural)
+ | Function | In-degree | Processes | I/O type | Note |
+ |---|--:|--:|---|---|
+ | getCurrentUser | 22 | 4 | DB + cache | profile first |
+ | db.query | 18 | 5 | DB | hot for write paths |
+ | serializeJson | 28 | 5 | CPU | every handler — micro-opt territory |
+
+ ### Cycles detected
+ - `processStep ↔ enqueueRetry` — length-2 cycle, unbounded under failure conditions
+
+ ### Profiler integration plan
+ 1. Collect pprof / flamegraph / OTEL trace under representative load
+ 2. Top-N hottest functions from profile → run codragraph_context on each
+ 3. Cross-reference with this static pre-screen
+ 4. Optimize where structural & measured hot paths overlap
+ ```
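Steps 2 to 4 of the profiler integration plan amount to a filtered intersection. A minimal sketch; `ProfileEntry` is a hypothetical, already-normalized profiler summary (real pprof or flamegraph names are often module-qualified and need mapping to CodraGraph symbol names first), and the share values are invented for illustration.

```typescript
// Hypothetical profiler summary: function name plus its share of sampled time (0..1).
interface ProfileEntry {
  name: string;
  share: number;
}

// Take the profiler's top N, keep those the structural pre-screen also flagged,
// and preserve the measured ordering (measured cost outranks static guesses).
function overlap(structural: string[], profile: ProfileEntry[], topN = 20): ProfileEntry[] {
  const flagged = new Set(structural);
  return [...profile]
    .sort((a, b) => b.share - a.share)
    .slice(0, topN)
    .filter((e) => flagged.has(e.name));
}

const hits = overlap(
  ["getCurrentUser", "db.query", "serializeJson"],
  [
    { name: "serializeJson", share: 0.05 },
    { name: "getCurrentUser", share: 0.3 },
    { name: "parseHeaders", share: 0.2 },
  ],
);
console.log(hits.map((e) => e.name)); // [ 'getCurrentUser', 'serializeJson' ]
```

Functions that are hot in the profile but absent from the pre-screen (here `parseHeaders`) are worth a `codragraph_context` call too; they often point at a path the static analysis under-weighted.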
@@ -0,0 +1,116 @@
+ ---
+ name: codragraph-project-switcher
+ description: "Use when the user works across many parallel projects and needs to switch context, find which repo a symbol is in, list all indexed projects, or run a query against a specific repo without ambiguity. Examples: \"what projects am I working on\", \"switch to repo X\", \"which of my repos has function Y\", \"list my repositories\""
+ ---
+
+ # Multi-Project / Vibecoding Context Switcher
+
+ ## When to Use
+
+ - "What projects do I have indexed?"
+ - "Switch to my `<project>` repo for this query."
+ - "Which of my repos has the `<symbol>` function?"
+ - "Run this query across all my repos."
+ - Solo dev juggling 4+ side projects with different agents
+ - Picking up a project after weeks away
+
+ ## Why CodraGraph helps here
+
+ CodraGraph maintains a global registry of every indexed repo at
+ `~/.codragraph/registry.json`. Every MCP tool accepts a `repo` parameter
+ to disambiguate. So switching context isn't "open a new editor / cd into
+ the project / re-orient your agent" — it's just passing `repo: "<name>"`
+ to the next call. Combine with **groups** (multiple repos that share
+ contracts) and you get cross-repo queries with one call.
+
+ ## Workflow
+
+ ```
+ 1. List every indexed repo:
+    codragraph_list_repos({})
+    → name, path, file count, last analyze time
+
+ 2. (Optional) List groups (sets of related repos):
+    codragraph_group_list({})
+    → group name + member repos
+
+ 3. Query against a specific repo:
+    codragraph_query({repo: "<name>", query: "<concept>"})
+    → answers come from that repo only
+
+ 4. Find which repo has a symbol you remember:
+    For each repo from list_repos:
+      codragraph_query({repo: "<name>", query: "<remembered name>"})
+    → every repo with a matching symbol (the same name may exist in several)
+
+ 5. For cross-repo questions (group-mode):
+    codragraph_query({repo: "@<group>", query: "<concept>"})
+    → fans out across every group member, RRF-merges results
+ ```
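Step 5 says group mode RRF-merges per-repo results. Reciprocal Rank Fusion scores each result by summing 1 / (k + rank) over every ranked list it appears in, so items ranked high in several lists rise to the top. A minimal sketch of the technique, not CodraGraph's implementation; `k = 60` is the value commonly used in the RRF literature and an assumption here, and ids are assumed to be unique keys (qualify them with the repo name if the same symbol name can occur in several repos).

```typescript
// Reciprocal Rank Fusion: score(d) = sum over result lists of 1 / (k + rank(d)), ranks from 1.
function rrfMerge(resultsByRepo: Record<string, string[]>, k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranked of Object.values(resultsByRepo)) {
    ranked.forEach((id, i) => scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1)));
  }
  return [...scores]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

const merged = rrfMerge({ repoA: ["a", "b"], repoB: ["c", "a"] });
// a: 1/61 + 1/62, c: 1/61, b: 1/62
console.log(merged.map((r) => r.id)); // [ 'a', 'c', 'b' ]
```

RRF only uses ranks, never raw scores, which is why it suits merging results from repos whose indexes score on different scales.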
+
+ > If `list_repos` returns nothing, the user has no indexed projects yet.
+ > Run `codragraph analyze` in each project once to register them.
+
+ ## Checklist
+
+ ```
+ - [ ] list_repos to see what's indexed
+ - [ ] group_list to see related-repo groups
+ - [ ] Pick the right repo (or group) for the question
+ - [ ] Pass repo: "<name>" or repo: "@<group>" to subsequent calls
+ - [ ] If a project isn't indexed yet, suggest the user run analyze in it
+ - [ ] Mention staleness ("repo X last indexed 12 days ago — re-analyze?")
+ ```
+
+ ## Multi-Project Patterns
+
+ | Situation | What to do |
+ | --- | --- |
+ | Switching from one solo project to another | `list_repos` → pick → all subsequent tools take `repo: "<name>"` |
+ | "Did I solve this in another repo?" | `query` over each repo, look for matching symbols |
+ | Shared library used by multiple repos | Define a group (group.yaml); use `repo: "@<group>"` |
+ | Resuming a project after weeks | Check staleness (`list_repos` last-indexed timestamps); re-analyze if old |
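The staleness check in the last row can be scripted against `list_repos` output. A minimal sketch; the `RepoEntry` field names and the 7-day threshold are assumptions, and the sample dates mirror the ages in the example below.

```typescript
// Hypothetical shape of one codragraph_list_repos entry; field names are assumptions.
interface RepoEntry {
  name: string;
  lastIndexed: string; // ISO-8601 timestamp
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Repos whose last index is older than `maxAgeDays` (7 is an arbitrary default), oldest first.
function staleRepos(repos: RepoEntry[], now: Date, maxAgeDays = 7): { name: string; ageDays: number }[] {
  return repos
    .map((r) => ({ name: r.name, ageDays: Math.floor((now.getTime() - Date.parse(r.lastIndexed)) / DAY_MS) }))
    .filter((r) => r.ageDays > maxAgeDays)
    .sort((a, b) => b.ageDays - a.ageDays);
}

const now = new Date("2024-06-01T00:00:00Z");
const stale = staleRepos(
  [
    { name: "codragraph", lastIndexed: "2024-05-31T22:00:00Z" },
    { name: "my-saas", lastIndexed: "2024-05-29T00:00:00Z" },
    { name: "portfolio", lastIndexed: "2024-05-18T00:00:00Z" },
    { name: "data-eda", lastIndexed: "2024-05-01T00:00:00Z" },
  ],
  now,
);
for (const r of stale) console.log(`${r.name} last indexed ${r.ageDays} days ago; re-analyze?`);
// data-eda last indexed 31 days ago; re-analyze?
// portfolio last indexed 14 days ago; re-analyze?
```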
+
+ ## Example: "Switch me to my SaaS side project and find the auth code"
+
+ ```
+ 1. codragraph_list_repos({})
+    → 4 indexed repos:
+      - codragraph (~/code/codragraph, 4325 symbols, indexed 2 hours ago)
+      - my-saas (~/projects/my-saas, 1180 symbols, indexed 3 days ago)
+      - portfolio (~/code/portfolio, 240 symbols, indexed 2 weeks ago)
+      - data-eda (~/notebooks/data-eda, 95 symbols, indexed 1 month ago)
+
+ 2. codragraph_query({repo: "my-saas", query: "authentication login session"})
+    → top 5 matching symbols, all from my-saas
+
+ 3. codragraph_context({repo: "my-saas", name: "validateSession"})
+    → callers: requireAuth, refreshToken (both in my-saas)
+
+ 4. (Optional) Reminder: my-saas was indexed 3 days ago — fine for navigation,
+    but if I just made commits, run `cd ~/projects/my-saas && codragraph analyze`
+    to refresh.
+
+ Switched. Subsequent calls in this conversation pass repo: "my-saas".
+ ```
+
+ ## Output Format
+
+ ```markdown
+ ## Project Switch: → `<repo>`
+
+ ### Available projects
+ 1. **<repo-1>** — N symbols, last indexed Xh ago
+ 2. **<repo-2>** — M symbols, last indexed Yd ago (stale?)
+ 3. ...
+
+ ### Active for this conversation
+ `<chosen-repo>` (path: `<path>`).
+
+ ### Staleness note
+ Last indexed `<duration>` ago. Re-analyze if recent commits matter.
+
+ ### Quick links
+ - Pass `repo: "<chosen-repo>"` to every subsequent tool call (e.g. `query`)
+ - For cross-repo: pass `repo: "@<group>"` instead
+ ```
@@ -0,0 +1,144 @@
+ ---
+ name: codragraph-security-audit
+ description: "Use for security-focused codebase audits — finding auth bypass paths, missing input validation, secrets in code, untrusted-input flow, and routes that skip auth middleware. Examples: \"security audit\", \"find auth bypass\", \"unvalidated input\", \"untrusted data flow\", \"missing authentication\""
+ ---
+
+ # Security Audit with CodraGraph
+
+ ## When to Use
+
+ - "Audit auth coverage — which routes skip the auth middleware?"
+ - "Find input that flows from request to SQL/template/exec without validation."
+ - "Find hardcoded secrets / credentials in source."
+ - "Trace untrusted-input flow for `<endpoint>`."
+ - Pre-release security pass on a PR-heavy week.
+
+ ## Why CodraGraph helps here
+
+ Static-analysis tools find pattern matches; CodraGraph adds the
+ **call graph**, so you can answer "*which* request handlers reach this
+ unsafe sink?" rather than just "this sink is unsafe somewhere." Combined
+ with `query` for sensitive identifier strings and `cypher` for structural
+ filters, you can build an audit that's far more targeted than grep.
+
+ ## Workflow
+
+ ```
+ 1. Identify the boundary symbols:
+    codragraph_query({query: "request handler route controller"})
+    → all entry points where untrusted input arrives
+
+ 2. Identify the dangerous sinks:
+    codragraph_query({query: "exec spawn eval query raw_sql innerHTML"})
+    → places where untrusted input becomes harmful
+
+ 3. For each sink, walk the call graph upstream:
+    codragraph_impact({target: "<sink>", direction: "upstream"})
+    → which handlers REACH this sink? (each one is a handler → sink pair)
+
+ 4. For each path, look for a validator on the way:
+    codragraph_context({name: "<handler>"})
+    → is `validate / sanitize / escape / parse` called between handler and sink?
+    → if not: candidate vulnerability
+
+ 5. Scan for hardcoded secrets:
+    codragraph_cypher({query: "MATCH (n) WHERE n.body =~ '(?is).*(api[_-]?key|secret|password|token)[ ]*=[ ]*[\\'\"][a-zA-Z0-9]{16,}.*' RETURN n"})
+    → suspicious literals that look like real keys
+    → (?is): case-insensitive, and `.` also matches newlines in multi-line bodies
+ ```
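Before running step 5 against the graph, the pattern can be exercised locally on sample strings. A minimal sketch of the same regex as a JavaScript `RegExp`; `looksLikeHardcodedSecret` is a hypothetical helper and the sample bodies are made up.

```typescript
// Step 5's pattern as a JavaScript RegExp. RegExp.prototype.test() searches anywhere in
// the string, so the leading/trailing `.*` needed for a full-string `=~` match are dropped,
// and with no `.` left in the pattern the `s` flag is unnecessary; `i` = case-insensitive.
const SECRET_ASSIGNMENT = /(api[_-]?key|secret|password|token)[ ]*=[ ]*['"][a-zA-Z0-9]{16,}/i;

function looksLikeHardcodedSecret(body: string): boolean {
  return SECRET_ASSIGNMENT.test(body);
}

const bodies = [
  "function init() {\n  const API_KEY = 'abcd1234abcd1234abcd';\n}",
  "const token = process.env.TOKEN;",
  'password = "hunter2"',
];
console.log(bodies.map(looksLikeHardcodedSecret)); // [ true, false, false ]
```

The second sample shows the intended negative (value read from the environment); the third shows the 16-character floor trading missed short secrets for fewer false positives.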
+
+ ## Audit Patterns
+
+ | Pattern | What to look for | CodraGraph approach |
+ |---|---|---|
+ | Auth bypass | Routes not wrapped by `requireAuth` middleware | `query` for routes; check `context` for middleware in callers |
+ | SQL injection | Raw query string built from request input | `query` SQL literals → `impact` upstream → flag handlers |
+ | XSS | Untrusted input rendered without escape | `query` `innerHTML` / `dangerouslySetInnerHTML` → impact upstream |
+ | Command injection | `exec` / `spawn` with concatenated input | `query` exec/spawn → impact upstream → check for shell escape |
+ | Open redirect | Redirect URL from request | `query` `redirect` / `Location:` → trace input source |
+ | Hardcoded secrets | API keys in source | `cypher` regex over `n.body` |
+ | Missing CSRF | State-changing routes without CSRF middleware | `query` POST / PUT / DELETE handlers → check middleware chain |
+
+ ## Why "missing validator" is a great query
+
+ The graph shows the call path explicitly: `handler → … → sink`. Validators
+ either appear in that chain or they don't. If `validateInput` / `sanitize` /
+ `escape` is NOT in the path between a handler and a sink, that's a
+ concrete, checkable gap rather than a guess (confirm it in the code, since
+ validation under a different name won't match the prefixes).
+
+ ```
+ codragraph_cypher({query: `
+   // Find handler→sink paths that don't pass through ANY validator
+   MATCH path = (handler {label: 'Function'})-[:CALLS*1..6]->(sink {label: 'Function'})
+   WHERE handler.isEntryPoint = true
+     AND sink.name IN ['exec', 'query', 'innerHTML', 'eval']
+     AND NONE(n IN nodes(path) WHERE n.name STARTS WITH 'validate'
+                                 OR n.name STARTS WITH 'sanitize'
+                                 OR n.name STARTS WITH 'escape')
+   RETURN handler.name, sink.name, length(path) AS hops
+   ORDER BY hops ASC
+ `})
+ ```
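The same validator-free path search can be run offline, which helps when tuning the validator name list. A minimal sketch over a hypothetical caller-to-callees map; `unvalidatedPaths` is illustrative, and it keeps paths simple (no repeated function), which may differ from how your graph store expands `*1..6`.

```typescript
// Hypothetical caller -> callees map exported from the CALLS edges.
type CallGraph = Map<string, string[]>;

const VALIDATOR_PREFIXES = ["validate", "sanitize", "escape"];
const isValidator = (name: string): boolean => VALIDATOR_PREFIXES.some((p) => name.startsWith(p));

// Depth-first enumeration of handler -> ... -> sink paths of 1..maxHops edges that
// never touch a validator (the NONE(...) filter above). Paths are kept simple
// (no repeated function) so recursion in the graph cannot loop forever.
function unvalidatedPaths(graph: CallGraph, handler: string, sinks: Set<string>, maxHops = 6): string[][] {
  const found: string[][] = [];
  const path = [handler];
  const walk = (node: string): void => {
    if (path.length - 1 >= maxHops) return; // path.length - 1 = edges so far
    for (const next of graph.get(node) ?? []) {
      if (path.includes(next) || isValidator(next)) continue;
      path.push(next);
      if (sinks.has(next)) found.push([...path]);
      walk(next);
      path.pop();
    }
  };
  if (!isValidator(handler)) walk(handler);
  return found;
}

const graph: CallGraph = new Map([
  ["createJob", ["sanitizeArgs", "runJob"]],
  ["sanitizeArgs", ["exec"]],
  ["runJob", ["exec"]],
]);
console.log(unvalidatedPaths(graph, "createJob", new Set(["exec"]))); // [ [ 'createJob', 'runJob', 'exec' ] ]
```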
+
+ ## Checklist
+
+ ```
+ - [ ] Listed entry-point handlers (query for "handler"/"route"/"controller")
+ - [ ] Listed dangerous sinks (exec/query/eval/innerHTML/raw)
+ - [ ] Built handler→sink table; flagged paths missing validators
+ - [ ] Cypher scan for hardcoded-secret literal patterns
+ - [ ] codragraph_context on each flagged handler — confirm coverage / propose fix
+ - [ ] Document findings as a severity-tagged report
+ ```
+
+ ## Example: "Audit which routes skip our requireAuth middleware"
+
+ ```
+ 1. codragraph_query({query: "Express router get post put delete handler"})
+    → 28 route handlers across 6 router files
+
+ 2. For each handler, codragraph_context({name: "<handler>"}):
+    → check that callers include `requireAuth` (a known middleware function)
+
+ 3. codragraph_cypher({
+      query: `MATCH (handler {isEntryPoint: true})
+              WHERE NOT EXISTS {
+                MATCH (handler)<-[:CALLS]-(mw {name: 'requireAuth'})
+              }
+              RETURN DISTINCT handler.name, handler.filePath`
+    })
+    → 4 handlers have no requireAuth caller:
+      - publicHealthCheck (intentional ✓)
+      - signup, login (intentional ✓ — these CREATE auth)
+      - debugDumpUser ⚠ (NOT intentional — leaks user data)
+
+ 4. Findings report:
+    - HIGH: debugDumpUser at src/routes/debug.ts:42
+      reaches db.query → returns full user record. No auth.
+      Fix: wrap with requireAuth or remove from production builds.
+ ```
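Step 3's check can be reproduced over exported edges, with an explicit allowlist for routes you have reviewed as intentionally public. A minimal sketch; `Edge`, `callersIndex`, and `handlersMissingAuth` are hypothetical, `getProfile` is an invented protected route, and the approach only works if the graph records `requireAuth` as a direct caller of each protected handler (router-wired middleware may not show up that way).

```typescript
// Hypothetical CALLS edges as [caller, callee] pairs.
type Edge = [caller: string, callee: string];

// Reverse index: callee -> set of its direct callers.
function callersIndex(edges: Edge[]): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const [caller, callee] of edges) {
    const set = index.get(callee) ?? new Set<string>();
    set.add(caller);
    index.set(callee, set);
  }
  return index;
}

// Entry points with no direct `guard` caller, minus routes already reviewed as public.
function handlersMissingAuth(
  entryPoints: string[],
  edges: Edge[],
  guard = "requireAuth",
  publicRoutes: Set<string> = new Set(),
): string[] {
  const callers = callersIndex(edges);
  return entryPoints.filter((h) => !callers.get(h)?.has(guard) && !publicRoutes.has(h));
}

const flagged = handlersMissingAuth(
  ["publicHealthCheck", "signup", "login", "debugDumpUser", "getProfile"],
  [["requireAuth", "getProfile"]],
  "requireAuth",
  new Set(["publicHealthCheck", "signup", "login"]),
);
console.log(flagged); // [ 'debugDumpUser' ]
```

Keeping the allowlist in code makes intentional exceptions such as signup and login reviewable, instead of re-triaging them on every audit.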
+
+ ## Output Format
+
+ ```markdown
+ ## Security Audit: <scope>
+
+ ### Summary
+ - N handlers audited
+ - M handler→sink paths inspected
+ - K validator-missing paths flagged
+ - Severity: 0 CRITICAL, 1 HIGH, 2 MEDIUM
+
+ ### Findings
+
+ #### HIGH — debugDumpUser at src/routes/debug.ts:42
+ - Reachable without `requireAuth`
+ - Calls `db.query` with no input validation
+ - Returns full user record
+ - **Fix:** wrap with auth middleware OR strip from production builds
+
+ #### MEDIUM — …
+
+ ### Hardcoded-secret scan
+ - 0 high-confidence matches
+ - 1 false-positive: example token in test fixture
+ ```