@codragraph/cli 1.6.4 → 2.0.0
- package/README.md +34 -0
- package/dist/cli/analyze.d.ts +22 -0
- package/dist/cli/analyze.js +107 -4
- package/dist/cli/compress-stats.d.ts +29 -0
- package/dist/cli/compress-stats.js +97 -0
- package/dist/cli/graphstore.d.ts +6 -2
- package/dist/cli/graphstore.js +24 -2
- package/dist/cli/index.js +16 -2
- package/dist/cli/profile-heap.d.ts +35 -0
- package/dist/cli/profile-heap.js +126 -0
- package/dist/cli/setup.d.ts +13 -0
- package/dist/cli/setup.js +22 -11
- package/dist/cli/skill-gen.d.ts +14 -2
- package/dist/cli/skill-gen.js +52 -19
- package/dist/cli/tool.js +4 -0
- package/dist/core/embeddings/embedding-pipeline.js +24 -7
- package/dist/core/group/bridge-db.js +111 -24
- package/dist/core/lbug/content-read.d.ts +46 -0
- package/dist/core/lbug/content-read.js +64 -0
- package/dist/core/lbug/csv-generator.d.ts +2 -6
- package/dist/core/lbug/csv-generator.js +45 -12
- package/dist/core/lbug/lbug-adapter.d.ts +4 -1
- package/dist/core/lbug/lbug-adapter.js +153 -21
- package/dist/core/lbug/schema.d.ts +7 -7
- package/dist/core/lbug/schema.js +18 -0
- package/dist/core/run-analyze.d.ts +13 -0
- package/dist/core/run-analyze.js +91 -4
- package/dist/core/search/bm25-index.js +67 -15
- package/dist/mcp/local/local-backend.js +22 -5
- package/dist/server/api.js +4 -3
- package/dist/storage/repo-manager.d.ts +39 -0
- package/dist/storage/repo-manager.js +19 -0
- package/hooks/claude/codragraph-hook.cjs +95 -2
- package/package.json +4 -4
- package/scripts/build-tree-sitter-proto.cjs +15 -3
- package/scripts/patch-tree-sitter-swift.cjs +17 -4
- package/skills/codragraph-api-surface.md +110 -0
- package/skills/codragraph-config-audit.md +146 -0
- package/skills/codragraph-cross-repo-impact.md +135 -0
- package/skills/codragraph-data-lineage.md +137 -0
- package/skills/codragraph-dead-code.md +119 -0
- package/skills/codragraph-gh-actions-debug.md +162 -0
- package/skills/codragraph-gh-issue-workflow.md +178 -0
- package/skills/codragraph-gh-pr-workflow.md +176 -0
- package/skills/codragraph-gh-release-workflow.md +187 -0
- package/skills/codragraph-git-bisect.md +176 -0
- package/skills/codragraph-git-force-push.md +147 -0
- package/skills/codragraph-git-history-rewrite.md +174 -0
- package/skills/codragraph-git-rebase-vs-merge.md +138 -0
- package/skills/codragraph-git-recovery.md +181 -0
- package/skills/codragraph-git-worktree.md +145 -0
- package/skills/codragraph-migration-tracking.md +130 -0
- package/skills/codragraph-notebook-context.md +136 -0
- package/skills/codragraph-observability-coverage.md +125 -0
- package/skills/codragraph-onboarding.md +129 -0
- package/skills/codragraph-perf-hotspots.md +132 -0
- package/skills/codragraph-project-switcher.md +116 -0
- package/skills/codragraph-security-audit.md +144 -0
- package/skills/codragraph-sql-tracing.md +122 -0
- package/skills/codragraph-supply-chain-audit.md +153 -0
- package/skills/codragraph-test-coverage.md +97 -0
@@ -0,0 +1,110 @@
+---
+name: codragraph-api-surface
+description: "Use when the user wants to enumerate the public API of a package or codebase, understand what's exported, audit breaking change risk, or compare API shapes across versions. Examples: \"what's our public API\", \"list exports\", \"API surface\", \"what would break if I remove X\", \"document the public interface\""
+---
+
+# API Surface Audit with CodraGraph
+
+## When to Use
+
+- "What's the public API of this package?"
+- "List every exported function / class / type"
+- "What would break if I remove or rename `<symbol>`?"
+- Pre-release API freeze audit
+- Generating API documentation from the graph
+- Comparing API surface across versions (with `codragraph diff --semantic`)
+
+## Why CodraGraph helps here
+
+Reading every `index.ts` / `__init__.py` / `mod.rs` by hand misses re-exports
+and framework-magic exports (Next.js page routes, decorators, registered
+plugins). CodraGraph's `isExported` property is computed by language-aware
+export detection — covers default exports, named re-exports, `__all__`,
+`pub use`, etc., consistently across all 16 supported languages.
+
+## Workflow
+
+```
+1. codragraph_cypher({query: `
+     MATCH (n) WHERE n.isExported = true
+     RETURN labels(n)[0] AS table, n.name, n.filePath, n.id
+     ORDER BY table, n.filePath, n.name
+   `})
+   → every exported symbol, grouped by table
+
+2. For each high-traffic export:
+   codragraph_impact({target: "<name>", direction: "upstream"})
+   → who depends on it (within this repo)
+
+3. For cross-repo audits (multi-repo group):
+   codragraph_impact({repo: "@<group>", target: "<name>", direction: "upstream"})
+   → blast radius across every group member
+
+4. Compare across versions:
+   codragraph diff <baseline> <head> --semantic --json
+   → addedAPIs / removedAPIs / classifiedModifications
+   → produces a versioned changelog of what your public surface gained / lost
+```
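The rows returned by step 1 of the workflow above can be post-processed outside the graph. The following is a minimal sketch, not part of CodraGraph: it assumes each row is a dict with `table`, `name`, and `filePath` keys (mirroring the `RETURN` clause) and that caller counts were collected separately from the `impact` calls. The function name `summarize_exports`, the `max_callers` threshold, and the sample rows are all hypothetical.

```python
from collections import Counter


def summarize_exports(rows, caller_counts, max_callers=1):
    """Tally exported symbols per table and list likely over-exported ones.

    rows: dicts with 'table', 'name', 'filePath' (assumed row shape).
    caller_counts: symbol name -> number of upstream callers (assumed input).
    max_callers: exports with this many callers or fewer are flagged.
    """
    per_table = Counter(row["table"] for row in rows)
    over_exported = sorted(
        row["name"]
        for row in rows
        if caller_counts.get(row["name"], 0) <= max_callers
    )
    return dict(per_table), over_exported


# Illustrative sample data only; not real query output.
rows = [
    {"table": "Function", "name": "createClient", "filePath": "src/index.ts"},
    {"table": "Function", "name": "validate", "filePath": "src/utils.ts"},
    {"table": "Class", "name": "EventBus", "filePath": "src/events.ts"},
]
counts = {"createClient": 14, "validate": 1, "EventBus": 3}
per_table, over_exported = summarize_exports(rows, counts)
```

Keeping the threshold as a parameter lets a reviewer decide what "over-exported" means for a given package instead of hard-coding a single-caller rule.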
+
+> Pair with the `codragraph-pr-review` skill when reviewing a PR that touches
+> exported symbols — the impact-across-group check is the difference between
+> "breaks our consumers" and "internal refactor."
+
+## Checklist
+
+```
+- [ ] Cypher query for n.isExported = true
+- [ ] Group by file or by community (Leiden cluster)
+- [ ] For each non-trivial export, run impact upstream
+- [ ] If the package is in a group, run impact with repo: "@group" too
+- [ ] Compare with previous release: codragraph diff <prev-tag> HEAD --semantic
+- [ ] Flag exports with no documented consumers — candidates for visibility
+      reduction (export → internal)
+```
+
+## Example: "What's our public API?"
+
+```
+1. codragraph_cypher({
+     query: `MATCH (n) WHERE n.isExported = true
+             RETURN labels(n)[0] AS table, n.name, n.filePath`
+   })
+   → 47 exports: 22 Function, 12 Class, 8 Interface, 5 Constant
+
+2. Top-level functions:
+   - createClient (src/index.ts) ← 14 callers
+   - fetchUser (src/api.ts) ← 6 callers
+   - validate (src/utils.ts) ← 1 internal caller only ⚠ over-exported
+
+3. codragraph_impact({target: "validate", direction: "upstream"})
+   → d=1: only formatPayload (same package). No external consumers.
+   → Recommend: drop the `export` keyword. Internal-only.
+
+4. Compare with v1.5.3 release:
+   codragraph diff v1.5.3 HEAD --semantic
+   → +3 added APIs, -1 removed API (mappings.toCamelCase), ~2 modified
+   → Removing an API requires a SemVer major bump.
+```
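The last step of the example above turns a diff summary into a release decision. A minimal sketch of that mapping follows, using the Semantic Versioning rule that a backward-incompatible change requires a major bump and a backward-compatible addition requires a minor bump. The function `recommend_bump` and its parameters are hypothetical; the counts are assumed to have been read from the `--semantic --json` output, whose exact shape is not specified here.

```python
def recommend_bump(added, removed, modified_breaking=0):
    """Map an API diff summary to a SemVer bump level.

    added / removed: counts of added and removed public APIs.
    modified_breaking: modified APIs judged incompatible (assumed input).
    """
    if removed > 0 or modified_breaking > 0:
        return "major"  # something consumers rely on went away or changed
    if added > 0:
        return "minor"  # new, backward-compatible surface
    return "patch"      # no change to the public surface


# Counts taken from the example: +3 added, -1 removed.
bump = recommend_bump(added=3, removed=1)
```

Whether a modification counts as breaking is a judgment call, which is why it is passed in explicitly instead of being inferred.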
+
+## Output Format
+
+```markdown
+## API Surface: <package>
+
+### Exports (47 total)
+| Symbol | Table | File | Callers (internal) | Notes |
+|--------|-------|------|-------------------:|-------|
+| createClient | Function | src/index.ts | 14 | core entry |
+| validate | Function | src/utils.ts | 1 | over-exported, suggest internal |
+| ...
+
+### Diff vs <previous-tag>
+- **Added (3):** `subscribe`, `unsubscribe`, `EventBus`
+- **Removed (1):** `toCamelCase` ⚠ SemVer major
+- **Modified (2):** `createClient` (param 3→4), `fetchUser` (return type)
+
+### Recommendations
+- Reduce visibility on 4 over-exported internals
+- Document the 3 new APIs in the release notes
+- The removed `toCamelCase` requires a major version bump
+```
@@ -0,0 +1,146 @@
+---
+name: codragraph-config-audit
+description: "Use to audit how environment variables, config files, and feature flags are read and used across the codebase — find unused config, missing defaults, undocumented env vars, secrets read into logs. Examples: \"audit env vars\", \"unused config\", \"who reads FOO_BAR env\", \"feature flag usage\", \"config sprawl\""
+---
+
+# Configuration Audit with CodraGraph
+
+## When to Use
+
+- "Which env vars do we actually read?"
+- "Which env vars are read but never set in deploy configs?"
+- "Find the unused feature flags I can delete."
+- "Who reads `STRIPE_SECRET_KEY`?"
+- "Is `<config>` ever logged or sent to telemetry?"
+- "Audit config sprawl before consolidating."
+
+## Why CodraGraph helps here
+
+Configuration enters your code through a small set of helpers:
+`process.env.X`, `os.getenv("X")`, `config.get("foo.bar")`,
+`featureFlags.isEnabled("flag")`. CodraGraph indexes the calls to those
+helpers and the literal arguments — so a `query` for the helper plus a
+`context` of each call site produces a complete picture of which keys
+are read where.
+
+## Workflow
+
+```
+1. Identify the config helpers (per-language patterns):
+   codragraph_query({query: "process.env getenv ConfigService featureFlags"})
+   → list of config-read helpers
+
+2. For each helper, find every call site and its key argument:
+   codragraph_cypher({query: `
+     MATCH (caller)-[:CALLS]->(helper {name: 'getenv'})
+     RETURN caller.name, caller.filePath
+   `})
+   → For richer key-extraction, read the bodies via context:
+     codragraph_context({name: "<caller>", content: true})
+     → look for the literal string passed to getenv()
+
+3. Cross-check with deploy configs:
+   - Read .env / .env.example / docker-compose.yml / k8s ConfigMaps
+   - Build the SET of keys actually defined
+   - For each key your code reads that isn't defined: undocumented env var
+   - For each key that is defined but never read by code: dead config, delete
+
+4. Feature-flag specific audit:
+   codragraph_query({query: "featureFlags.isEnabled flag.evaluate"})
+   → For each flag-read site: codragraph_impact upstream
+   → Flags with no callers can be removed
+   → Flags with one branch always returning true / false are stale
+
+5. Secret-leakage check:
+   codragraph_query({query: "STRIPE_SECRET DATABASE_URL API_KEY"})
+   → For each match: codragraph_context to confirm the value is not
+     piped to logger / tracer / metrics
+```
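Step 3 of the workflow above is a pair of set differences between the keys the code reads and the keys the deploy configs define. A minimal sketch follows, assuming the read-set has already been extracted from call sites and that the defined-set comes from a `.env`-style file of `KEY=value` lines. The helper names `parse_env_keys` and `diff_config` and the sample file contents are hypothetical; other formats such as docker-compose or Kubernetes ConfigMaps would need their own parsers.

```python
def parse_env_keys(text):
    """Collect KEY names from KEY=value lines, skipping blanks and comments."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.startswith("export "):
            # Tolerate shell-style "export KEY=value" lines.
            key = key[len("export "):].strip()
        keys.add(key)
    return keys


def diff_config(read_keys, defined_keys):
    """Return (undocumented, dead): read-but-undefined and defined-but-unread."""
    undocumented = sorted(read_keys - defined_keys)
    dead = sorted(defined_keys - read_keys)
    return undocumented, dead


# Illustrative sample input only.
env_example = """
# sample .env.example
DATABASE_URL=postgres://localhost/app
LEGACY_CACHE_TTL=60
"""
read_keys = {"DATABASE_URL", "EXPERIMENTAL_FOO"}
undocumented, dead = diff_config(read_keys, parse_env_keys(env_example))
```

A key reported as dead may still be consumed by infrastructure outside the indexed code, so the output is a list of candidates to review before deleting anything.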
+
+## Audit dimensions
+
+| Dimension | Question | CodraGraph approach |
+|---|---|---|
+| **Used** | Is this env var read anywhere? | `query` for the literal key |
+| **Documented** | Is the key in `.env.example` / docs? | grep deploy files; subtract from used set |
+| **Defaulted** | Does the read have a default? | `context` shows the surrounding code |
+| **Validated** | Is the value parsed / type-checked? | `context` for `parseInt` / `URL` / Zod schema in the caller |
+| **Logged** | Does the value flow to telemetry? | `impact` downstream from the read site → check telemetry helpers |
+| **Stale flag** | Is the flag still toggled in production? | combine with deploy-config check |
+
+## Feature flag lifecycle audit
+
+```
+codragraph_cypher({query: `
+  MATCH (caller)-[:CALLS]->(ff {name: 'isEnabled'})
+  RETURN caller.name, caller.filePath, count(*) AS uses
+  ORDER BY uses DESC
+`})
+→ for each call site, codragraph_context to extract the flag NAME literal
+
+# Then:
+- Flag name read by 0 callers → remove
+- Flag name with both branches identical → stale (always-true or always-false)
+- Flag still wired in code, but config has it pinned `true` for >90 days → graduate
+```
+
+## Checklist
+
+```
+- [ ] Listed config helpers (env / config / featureFlag readers)
+- [ ] Built the read-set: { key: [ call sites ] }
+- [ ] Built the defined-set from deploy configs
+- [ ] Diff: undocumented (in code, not in config) + dead (in config, not in code)
+- [ ] Spot-check defaults / validation / secret leakage on critical keys
+- [ ] Feature-flag staleness check
+- [ ] Output: read map + recommended deletions / required deploy changes
+```
+
+## Example: "Audit our feature flags"
+
+```
+1. codragraph_query({query: "featureFlags.isEnabled"})
+   → 47 call sites in 23 files
+
+2. For each call site, extract the flag string (codragraph_context):
+   - 'new_checkout' (12 sites)
+   - 'experimental_search' (4 sites)
+   - 'use_new_pricing' (8 sites)
+   - 'kill_legacy_admin' (1 site)
+   - 'canary_v3' (0 sites; defined in code but never read)
+
+3. Cross-check deploys:
+   - 'new_checkout' set to TRUE for 100% prod since 2026-01 (graduate it)
+   - 'experimental_search' set to TRUE for 5% prod (active experiment, keep)
+   - 'use_new_pricing' set to TRUE for 100% prod since 2026-03 (graduate)
+   - 'kill_legacy_admin' set to TRUE for 100% prod since 2026-02 (graduate)
+   - 'canary_v3' not configured anywhere (truly dead)
+
+4. Findings:
+   - DELETE: 'canary_v3' (dead code, no callers, no config)
+   - GRADUATE: 'new_checkout', 'use_new_pricing', 'kill_legacy_admin' →
+     remove the flag check; keep the new behavior unconditionally
+   - KEEP: 'experimental_search'
+   - Codebase loses: 21 call sites, 1 unused flag definition
+```
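The DELETE / GRADUATE / KEEP decision in the example above follows a small rule set that can be written down directly. This is a minimal sketch under stated assumptions: call-site counts and rollout data are supplied by the caller, the 90-day threshold comes from the lifecycle rules in this skill, and the audit date and day-of-month values are invented for illustration because the example only gives months. `classify_flag` is a hypothetical helper, not a CodraGraph API.

```python
from datetime import date


def classify_flag(call_sites, rollout_percent, pinned_since, today,
                  graduate_after_days=90):
    """Classify a feature flag as DELETE, GRADUATE, or KEEP.

    call_sites: number of code sites reading the flag.
    rollout_percent: share of production with the flag on (0-100).
    pinned_since: date the flag reached its current rollout, or None.
    """
    if call_sites == 0:
        return "DELETE"  # nothing reads the flag
    fully_on = rollout_percent == 100 and pinned_since is not None
    if fully_on and (today - pinned_since).days > graduate_after_days:
        return "GRADUATE"  # always on long enough to drop the check
    return "KEEP"


today = date(2026, 7, 1)  # assumed audit date
results = {
    "new_checkout": classify_flag(12, 100, date(2026, 1, 1), today),
    "experimental_search": classify_flag(4, 5, None, today),
    "canary_v3": classify_flag(0, 0, None, today),
}
```

A flag that was pinned only recently stays in KEEP until it crosses the threshold, which avoids removing a check that might still need to be rolled back.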
+
+## Output Format
+
+```markdown
+## Config Audit: <scope>
+
+### Env vars / config keys
+| Key | Read sites | Defined? | Default? | Validated? | Notes |
+|---|--:|---|---|---|---|
+| DATABASE_URL | 4 | ✓ | ✗ | ✗ | add Zod parse |
+| EXPERIMENTAL_FOO | 1 | ✗ | ✓ ('false') | ✓ | undocumented; either document or delete |
+| ... | ... | ... | ... | ... | ... |
+
+### Feature flags
+- DELETE (no callers): canary_v3, legacy_dashboard_b
+- GRADUATE (100% production for >90 days): new_checkout, kill_legacy_admin
+- KEEP (active experiment): experimental_search, ai_summarize_v2
+
+### Secret-leak check
+- 0 paths from secret reads to logger/metrics/tracer found ✓
+```
@@ -0,0 +1,135 @@
+---
+name: codragraph-cross-repo-impact
+description: "Use when assessing the blast radius of a change that crosses repository boundaries — a shared library used by multiple services, a contract / protobuf / OpenAPI schema consumed by N consumers, a microservices change. Examples: \"what services consume X\", \"cross-repo blast radius\", \"will this break the consumers\", \"who depends on this contract\""
+---
+
+# Cross-Repo Impact Analysis with CodraGraph
+
+## When to Use
+
+- "What other repos consume `<symbol>` from this one?"
+- "If I change this gRPC method / OpenAPI route / protobuf message, what breaks?"
+- "Cross-repo blast radius for `<change>`"
+- Microservices architecture: assessing a contract change
+- Shared-library author: deciding if a function is safe to remove
+
+## Why CodraGraph helps here
+
+CodraGraph's **groups** (sets of related repos sharing a `group.yaml`)
+maintain a Contract Registry — provider/consumer rows for every cross-repo
+reference (gRPC service / method, OpenAPI route, protobuf message). The
+group-mode `impact` walks both the local call graph AND the contract
+bridges, so a single call returns the blast radius across every member
+repo. You don't need to re-run impact in each consumer separately.
+
+## Workflow
+
+```
+1. Identify the group:
+   codragraph_group_list({})
+   → list of groups + their member repos
+
+2. Confirm the symbol exists in the producer repo:
+   codragraph_context({repo: "<producerRepo>", name: "<symbol>"})
+
+3. Run group-mode impact:
+   codragraph_impact({repo: "@<group>", target: "<symbol>", direction: "upstream"})
+   → d=1 callers spanning every group member
+     (in-repo callers + contract-bridge consumers)
+
+4. Inspect the Contract Registry to see provider/consumer rows directly:
+   READ codragraph://group/<groupName>/contracts
+   → list of contracts touching the symbol or its API
+
+5. Check group-status / staleness:
+   READ codragraph://group/<groupName>/status
+   → which member repos haven't been re-indexed recently
+     (stale members produce stale impact results)
+
+6. Surface the worst-case consumer:
+   any consumer not updated since the schema change = potential breakage
+```
+
+> If any member repo's index is stale, group-mode impact may underreport.
+> Re-analyze stale members before relying on the results.
+
+## Checklist
+
+```
+- [ ] group_list to confirm the group exists and the producer is a member
+- [ ] context on the symbol in the producer repo
+- [ ] Group-mode impact upstream
+- [ ] Inspect Contract Registry for provider/consumer rows
+- [ ] Check group/status for stale members; re-analyze if needed
+- [ ] List affected consumer repos by impact depth
+- [ ] Recommend coordinated PRs across consumers (if breaking)
+```
+
+## When to Use Which Tool
+
+| Question | Tool |
+| --- | --- |
+| "Which repos are in my group?" | `group_list` |
+| "What contracts cross between member A and member B?" | Contract Registry resource |
+| "If I change this provider method, what breaks?" | Group-mode `impact` |
+| "Are all consumers up to date with the latest provider commit?" | `group/<name>/status` resource |
+| "What's the structural diff between last release and now in repo X?" | Per-repo `diff --semantic` |
+
+## Example: "Will renaming `getUserProfile` break my microservices?"
+
+```
+1. codragraph_group_list({})
+   → group "platform": [user-service, web-app, mobile-bff, admin-portal]
+
+2. codragraph_context({repo: "user-service", name: "getUserProfile"})
+   → exported gRPC method in user.proto, defined in user-service
+
+3. codragraph_impact({repo: "@platform", target: "getUserProfile", direction: "upstream"})
+   → d=1 callers (across the group):
+     - web-app/src/api/userClient.ts (CALLS via grpc-web)
+     - mobile-bff/internal/user.go (CALLS via grpc.NewClient)
+     - admin-portal/src/services/users.tsx (CALLS via grpc-web)
+   → 3 consumer repos depend on this method by exact name.
+
+4. READ codragraph://group/platform/contracts
+   → user.UserService.getUserProfile: provider=user-service,
+     consumers=[web-app, mobile-bff, admin-portal]
+
+5. READ codragraph://group/platform/status
+   → web-app last indexed 2 hours ago ✓
+   → mobile-bff last indexed 3 days ago ⚠ (might miss recent callers)
+   → admin-portal last indexed 1 month ago ⚠⚠ (re-analyze first!)
+
+6. Recommendation:
+   - HIGH-RISK rename. 3 consumer repos must change in lockstep.
+   - Re-index admin-portal before trusting the d=1 list.
+   - Coordinated PR sequence:
+     1. Add new method (getUserProfileV2) in user-service
+     2. Migrate web-app, mobile-bff, admin-portal to V2
+     3. Remove getUserProfile in user-service after all consumers ship
+   - Alternative: keep both, deprecate old, drop in next major.
+```
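The staleness check in step 5 of the example above amounts to comparing each member's last-indexed time against an age limit. A minimal sketch follows, assuming the timestamps have already been extracted from the status resource (its payload shape is not specified here). The one-day limit, the fixed `now` value, and the helper name `stale_members` are illustrative assumptions.

```python
from datetime import datetime, timedelta


def stale_members(last_indexed, now, max_age=timedelta(days=1)):
    """Return member repos whose index is older than max_age, oldest first."""
    stale = [
        (now - indexed_at, repo)
        for repo, indexed_at in last_indexed.items()
        if now - indexed_at > max_age
    ]
    # Largest age first, so the riskiest member is re-analyzed first.
    return [repo for _, repo in sorted(stale, reverse=True)]


now = datetime(2026, 1, 31, 12, 0)  # assumed "current" time for the sketch
last_indexed = {
    "web-app": now - timedelta(hours=2),
    "mobile-bff": now - timedelta(days=3),
    "admin-portal": now - timedelta(days=30),
}
stale = stale_members(last_indexed, now)
```

Ordering by age puts the member most likely to hide missing callers at the front of the re-analysis queue.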
+
+## Output Format
+
+```markdown
+## Cross-Repo Impact: `<symbol>` in `<producer-repo>` (group `@<group>`)
+
+### Consumers (d=1)
+| Repo | Caller | Path | Notes |
+| --- | --- | --- | --- |
+| web-app | userClient.ts | grpc-web | active |
+| mobile-bff | internal/user.go | grpc native | active |
+| admin-portal | services/users.tsx | grpc-web | last indexed 1mo ago ⚠ |
+
+### Contracts touching this symbol
+- `user.UserService.getUserProfile` (provider: user-service)
+
+### Staleness
+Re-analyze `admin-portal` before trusting these results.
+
+### Recommended migration sequence
+1. Add `getUserProfileV2` alongside the old method
+2. Migrate consumers to V2 (separate PRs per repo)
+3. Remove old method after all consumers ship
+```
@@ -0,0 +1,137 @@
+---
+name: codragraph-data-lineage
+description: "Use when tracing data flow through an ETL pipeline, finding where a column or table is read/written, mapping data dependencies in a notebook-heavy or data-engineering project. Examples: \"where does this data come from\", \"trace this column\", \"data lineage for X\", \"who reads from this table\", \"what's downstream of this query\""
+---
+
+# Data Lineage with CodraGraph
+
+## When to Use
+
+- "Where does the `user_events` table get written?"
+- "Trace the data flow that produces `daily_revenue.csv`"
+- "What's downstream of this Snowflake query?"
+- "Which notebook cells transform this DataFrame?"
+- Auditing a data pipeline before changing a schema
+- Understanding an unfamiliar ETL project
+
+## Why CodraGraph helps here
+
+Data pipelines often look like a graph of small functions: `extract_*`,
+`transform_*`, `load_*`, `enrich_*`. Their connections are *function calls*
+plus *string-literal table names* and *file paths*. CodraGraph already
+captures the call graph; combining it with `query` over identifier strings
+gives you data lineage at the *symbol* level, regardless of whether your
+pipeline is in vanilla Python, Airflow, dbt, Pandas, PySpark, or notebooks.
+
+## Workflow
+
+```
+1. codragraph_query({query: "<table_or_column_name>"})
+   → find every symbol that mentions the data identifier
+
+2. For each candidate symbol:
+   codragraph_context({name: "<symbol>"})
+   → see callers (who triggers this read/write) and callees (what it depends on)
+
+3. Walk the producer side (downstream → upstream):
+   codragraph_impact({target: "<load function>", direction: "upstream"})
+   → trace back to where the data originates
+
+4. Walk the consumer side (upstream → downstream):
+   codragraph_impact({target: "<extract function>", direction: "downstream"})
+   → trace forward to every transform / sink that depends on it
+
+5. READ codragraph://repo/{name}/process/<pipeline-flow>
+   → the canonical step-by-step flow if CodraGraph detected this as a process
+
+6. Build the lineage diagram: source → transform stages → sink
+```
+
+> CodraGraph is graph-aware, not SQL-aware: it sees a string literal that
+> *looks* like a table name, but doesn't parse SQL semantics. For mature SQL
+> lineage tooling (column-level resolution), pair with the `codragraph-sql-tracing`
+> skill and a SQL parser like sqlglot.
+
+## Checklist
+
+```
+- [ ] query for the table/column/file identifier
+- [ ] context on each candidate to map producers vs consumers
+- [ ] impact upstream on the load/sink function
+- [ ] impact downstream on the extract/source function
+- [ ] Cross-reference with processes for canonical pipeline flows
+- [ ] Render the lineage as: source → transform_1 → transform_2 → sink
+```
+
+## Identifier Patterns to Search
+
+| Layer | Search hints |
+| --- | --- |
+| File-based source | filename, path fragments (`raw_events.parquet`) |
+| Database table | bare table name + `FROM table_name` |
+| Column / field | column name in conjunction with the table name |
+| API endpoint | URL path or function name (`fetchUserEvents`) |
+| Event topic / queue | topic name (`user.signup.v2`) |
+
+## Example: "Where does daily_revenue.csv come from?"
+
+```
+1. codragraph_query({query: "daily_revenue"})
+   → 4 symbols:
+     - write_daily_revenue (src/etl/daily.py)
+     - read_daily_revenue (src/dashboards/finance.py)
+     - DailyRevenueRow (src/schemas/types.py)
+     - daily_revenue_dag (airflow/dags/finance_etl.py)
+
+2. codragraph_context({name: "write_daily_revenue"})
+   → callers: daily_revenue_dag (Airflow task)
+   → callees: aggregate_orders, attach_currency_rates, format_csv_row
+
+3. codragraph_impact({target: "aggregate_orders", direction: "upstream"})
+   → reads from: orders_raw, returns_raw (both tables)
+
+4. codragraph_impact({target: "read_daily_revenue", direction: "downstream"})
+   → consumed by: finance_dashboard.render(), revenue_alerts.check()
+
+5. READ codragraph://repo/CodraGraph/process/DailyRevenueETL
+   → 6 steps:
+     fetch_orders → fetch_returns → aggregate_orders →
+     attach_currency_rates → format_csv_row → write_daily_revenue
+
+Lineage:
+  orders_raw, returns_raw (DB tables)
+    ↓
+  fetch_orders + fetch_returns (extract)
+    ↓
+  aggregate_orders → attach_currency_rates → format_csv_row (transform)
+    ↓
+  daily_revenue.csv (sink)
+    ↓
+  finance_dashboard, revenue_alerts (consumers)
+```
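The caller and callee lists in step 2 of the example above are the raw material for a lineage walk. The sketch below shows one way to traverse such edges once they have been exported from the graph. It is a plain breadth-first search over `(caller, callee)` pairs, not CodraGraph's own traversal, and the function name `reachable` and its `direction` values are hypothetical.

```python
from collections import deque


def reachable(edges, start, direction):
    """Breadth-first walk over (caller, callee) edges from start.

    direction="callees" follows caller -> callee edges;
    direction="callers" follows the same edges in reverse.
    """
    adjacency = {}
    for caller, callee in edges:
        src, dst = (caller, callee) if direction == "callees" else (callee, caller)
        adjacency.setdefault(src, []).append(dst)

    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:  # guard against cycles and repeats
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Edges taken from step 2 of the example.
edges = [
    ("daily_revenue_dag", "write_daily_revenue"),
    ("write_daily_revenue", "aggregate_orders"),
    ("write_daily_revenue", "attach_currency_rates"),
    ("write_daily_revenue", "format_csv_row"),
]
callees = reachable(edges, "write_daily_revenue", "callees")
callers = reachable(edges, "write_daily_revenue", "callers")
```

Naming the directions after callers and callees keeps the call-graph direction separate from the data-flow direction, which do not always point the same way.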
+
+## Output Format
+
+```markdown
+## Data Lineage: <data-asset>
+
+### Sources
+- `orders_raw` (DB)
+- `returns_raw` (DB)
+
+### Pipeline (DailyRevenueETL flow, 6 steps)
+1. `fetch_orders` — reads `orders_raw`
+2. `fetch_returns` — reads `returns_raw`
+3. `aggregate_orders` — joins, sums by day
+4. `attach_currency_rates` — enriches with FX
+5. `format_csv_row` — schema-conforming serialization
+6. `write_daily_revenue` — writes `daily_revenue.csv`
+
+### Consumers
+- `finance_dashboard` (renders chart)
+- `revenue_alerts` (threshold checks)
+
+### Risk if `<schema/source>` changes
+- 6 transform stages depend on it
+- 2 consumers depend on the output schema
+```
@@ -0,0 +1,119 @@
+---
+name: codragraph-dead-code
+description: "Use when the user wants to find unused code, orphan functions/classes, dead code, or symbols safe to delete. Examples: \"what's unused\", \"find dead code\", \"can I delete this function\", \"orphan symbols\", \"clean up unused exports\""
+---
+
+# Dead Code Detection with CodraGraph
+
+## When to Use
+
+- "What's unused in this codebase?"
+- "Find dead code / orphan functions / unused exports"
+- "Can I safely delete `<symbol>`?"
+- Periodic cleanup before a release
+- Identifying candidates for the next refactor
+
+## Why CodraGraph helps here
+
+Linters can find unreachable code in *one file*. CodraGraph walks the
+*whole-repo* call graph, including dynamic dispatch, framework entry
+points, and exports — so it can tell you a symbol is genuinely unreachable
+rather than just "the linter couldn't see the caller."
+
+The trick is to combine three signals:
+
+1. **No incoming references** — `cypher` query for symbols with zero
+   `<-[:CALLS|REFERENCES]-` edges.
+2. **Not exported** — internal-only symbols are stronger candidates than
+   `isExported = true` ones (which may be public API).
+3. **Not in any process** — execution flows are the canonical "this is
+   used" signal; symbols outside every process are extra suspect.
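The three signals above can be combined into a simple ranking once the relevant graph data has been exported. This is a minimal sketch under stated assumptions: nodes are dicts with `name` and `isExported`, edges are `(source, target)` pairs covering both `CALLS` and `REFERENCES`, and process membership is a plain set of names. The scoring scheme, the helper `orphan_candidates`, and the sample symbols other than `formatLegacyDate` are hypothetical.

```python
def orphan_candidates(nodes, edges, process_members):
    """Rank unreferenced symbols by how many dead-code signals they match.

    Signal 1 (no incoming edges) is required; signals 2 (not exported)
    and 3 (not in any process) each add one point to the score.
    """
    referenced = {target for _, target in edges}
    candidates = []
    for node in nodes:
        name = node["name"]
        if name in referenced:
            continue  # something calls or references it: not an orphan
        score = 1
        if not node.get("isExported", False):
            score += 1
        if name not in process_members:
            score += 1
        candidates.append((score, name))
    # Highest score first; ties broken alphabetically for stable output.
    return [name for score, name in sorted(candidates, key=lambda c: (-c[0], c[1]))]


# Illustrative sample data only.
nodes = [
    {"name": "formatLegacyDate", "isExported": False},
    {"name": "toSlug", "isExported": True},
    {"name": "loadConfig", "isExported": False},
]
edges = [("main", "loadConfig")]
process_members = {"loadConfig"}
ranked = orphan_candidates(nodes, edges, process_members)
```

Ranking instead of hard filtering keeps exported orphans visible, so they can be checked for cross-repo callers instead of being silently dropped from the report.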
+
+## Workflow
+
+```
+1. codragraph_cypher({query: `
+     MATCH (n)
+     WHERE NOT (n)<-[:CALLS|REFERENCES]-()
+       AND NOT (n.isExported = true)
+     RETURN n.id, n.name, labels(n)[0] AS label, n.filePath
+     LIMIT 200
+   `})
+   → list of orphan candidates
+
+2. For each candidate:
+   codragraph_context({name: "<candidate>"})
+   → confirm: 0 callers, 0 callees that matter, not in any process
+
+3. Cross-check against processes:
+   READ codragraph://repo/{name}/processes
+   → if the symbol appears in ANY process, it's not actually dead
+
+4. For exported orphans (potentially public API):
+   codragraph_impact({target: "<symbol>", direction: "upstream"})
+   → if d=1 has external callers (in another indexed repo group), keep it
+
+5. Group by file/cluster, prioritize by the amount of dead code per file
+```
+
+> If "Index is stale" → run `npx @codragraph/cli analyze` first. Stale
+> indexes produce false-positive dead-code reports because new callers
+> aren't visible.
+
+## Checklist
+
+```
+- [ ] Cypher query for orphan symbols (no incoming edges, not exported)
+- [ ] context check on each candidate (confirm 0 callers)
+- [ ] Cross-reference with processes (symbols in flows are not dead)
+- [ ] For exported orphans, impact across groups (cross-repo callers?)
+- [ ] Group findings by file → suggest which files can lose the most code
+- [ ] Flag any candidate that's a framework convention (e.g., default export
+      of a Next.js page route) — those LOOK orphan but aren't
+```
+
+## Pitfalls
+
+| Pitfall | What to do |
+| --- | --- |
+| Framework conventions (Next.js pages, Astro routes, Django URLs) | Check `isEntryPoint` on the node — these often score high |
+| Test-only symbols | Filter `filePath CONTAINS '/test'` separately |
+| Re-exported symbols | A re-export creates `REFERENCES` edges; a true orphan has none |
+| Dynamic dispatch (factories, plugin systems) | Cross-check with `query` for the registration string |
+
+## Example: "Clean up unused code in src/utils/"
+
+```
+1. codragraph_cypher({
+     query: `MATCH (n) WHERE n.filePath STARTS WITH 'src/utils/'
+               AND NOT (n)<-[:CALLS|REFERENCES]-()
+               AND NOT (n.isExported = true)
+             RETURN n.id, n.name, n.filePath`
+   })
+   → 7 candidates
+
+2. codragraph_context({name: "formatLegacyDate"})
+   → 0 callers, 0 callees, not in any process. Truly dead.
+
+3. codragraph_context({name: "DEBUG_TIMER"})
+   → 0 callers but called dynamically via process.env injection.
+   → Keep it.
+
+4. Final: 6 of 7 candidates safe to delete. Total: 142 LoC across 4 files.
+```
+
+## Output Format
+
+```markdown
+## Dead Code Audit: <scope>
+
+### High-confidence (0 callers, 0 callees, not in any process)
+- `formatLegacyDate` — `src/utils/date.ts:42` (12 LoC)
+- ...
+
+### Possibly dead (verify dynamic dispatch first)
+- `DEBUG_TIMER` — used via env-driven hook?
+
+### Total cleanup potential
+N functions, M LoC, X files.
+```