@codragraph/cli 1.6.4 → 2.0.0

Files changed (61)
  1. package/README.md +34 -0
  2. package/dist/cli/analyze.d.ts +22 -0
  3. package/dist/cli/analyze.js +107 -4
  4. package/dist/cli/compress-stats.d.ts +29 -0
  5. package/dist/cli/compress-stats.js +97 -0
  6. package/dist/cli/graphstore.d.ts +6 -2
  7. package/dist/cli/graphstore.js +24 -2
  8. package/dist/cli/index.js +16 -2
  9. package/dist/cli/profile-heap.d.ts +35 -0
  10. package/dist/cli/profile-heap.js +126 -0
  11. package/dist/cli/setup.d.ts +13 -0
  12. package/dist/cli/setup.js +22 -11
  13. package/dist/cli/skill-gen.d.ts +14 -2
  14. package/dist/cli/skill-gen.js +52 -19
  15. package/dist/cli/tool.js +4 -0
  16. package/dist/core/embeddings/embedding-pipeline.js +24 -7
  17. package/dist/core/group/bridge-db.js +111 -24
  18. package/dist/core/lbug/content-read.d.ts +46 -0
  19. package/dist/core/lbug/content-read.js +64 -0
  20. package/dist/core/lbug/csv-generator.d.ts +2 -6
  21. package/dist/core/lbug/csv-generator.js +45 -12
  22. package/dist/core/lbug/lbug-adapter.d.ts +4 -1
  23. package/dist/core/lbug/lbug-adapter.js +153 -21
  24. package/dist/core/lbug/schema.d.ts +7 -7
  25. package/dist/core/lbug/schema.js +18 -0
  26. package/dist/core/run-analyze.d.ts +13 -0
  27. package/dist/core/run-analyze.js +91 -4
  28. package/dist/core/search/bm25-index.js +67 -15
  29. package/dist/mcp/local/local-backend.js +22 -5
  30. package/dist/server/api.js +4 -3
  31. package/dist/storage/repo-manager.d.ts +39 -0
  32. package/dist/storage/repo-manager.js +19 -0
  33. package/hooks/claude/codragraph-hook.cjs +95 -2
  34. package/package.json +4 -4
  35. package/scripts/build-tree-sitter-proto.cjs +15 -3
  36. package/scripts/patch-tree-sitter-swift.cjs +17 -4
  37. package/skills/codragraph-api-surface.md +110 -0
  38. package/skills/codragraph-config-audit.md +146 -0
  39. package/skills/codragraph-cross-repo-impact.md +135 -0
  40. package/skills/codragraph-data-lineage.md +137 -0
  41. package/skills/codragraph-dead-code.md +119 -0
  42. package/skills/codragraph-gh-actions-debug.md +162 -0
  43. package/skills/codragraph-gh-issue-workflow.md +178 -0
  44. package/skills/codragraph-gh-pr-workflow.md +176 -0
  45. package/skills/codragraph-gh-release-workflow.md +187 -0
  46. package/skills/codragraph-git-bisect.md +176 -0
  47. package/skills/codragraph-git-force-push.md +147 -0
  48. package/skills/codragraph-git-history-rewrite.md +174 -0
  49. package/skills/codragraph-git-rebase-vs-merge.md +138 -0
  50. package/skills/codragraph-git-recovery.md +181 -0
  51. package/skills/codragraph-git-worktree.md +145 -0
  52. package/skills/codragraph-migration-tracking.md +130 -0
  53. package/skills/codragraph-notebook-context.md +136 -0
  54. package/skills/codragraph-observability-coverage.md +125 -0
  55. package/skills/codragraph-onboarding.md +129 -0
  56. package/skills/codragraph-perf-hotspots.md +132 -0
  57. package/skills/codragraph-project-switcher.md +116 -0
  58. package/skills/codragraph-security-audit.md +144 -0
  59. package/skills/codragraph-sql-tracing.md +122 -0
  60. package/skills/codragraph-supply-chain-audit.md +153 -0
  61. package/skills/codragraph-test-coverage.md +97 -0
@@ -0,0 +1,181 @@
1
+ ---
2
+ name: codragraph-git-recovery
3
+ description: "Use when work appears lost — bad reset, dropped commit, blown-away stash, deleted branch, broken merge, accidental rm in tracked files. Covers reflog spelunking, stash recovery, dangling-commit revival, and what's actually unrecoverable. Examples: \"I lost my commits\", \"deleted my branch\", \"reset --hard by mistake\", \"recover work\", \"reflog\""
4
+ ---
5
+
6
+ # Lost-Work Recovery Playbook
7
+
8
+ ## When to Use
9
+
10
+ - "I ran `git reset --hard` and lost my work"
11
+ - "I deleted a branch — can I get it back?"
12
+ - "I dropped a stash by accident"
13
+ - "I rm'd a file and committed before I noticed"
14
+ - "Help, I think I lost everything."
15
+
16
+ ## The first-aid rules
17
+
18
+ 1. **STOP committing.** Every new commit pushes the lost ones further into
19
+ the reflog. Don't `git gc`, don't clone fresh.
20
+ 2. **Don't close the terminal yet.** Some recovery paths (e.g.,
21
+ `ORIG_HEAD`) live until you run another mutating command.
22
+ 3. **Reflog is your friend.** Default expiry is 90 days for reachable
23
+ commits, 30 days for unreachable. You almost always have time.
24
+ 4. **Committed work survives almost everything** except explicit `git gc
25
+ --prune=now`, reflog expiry, and disk loss. Even `reset --hard` leaves the
26
+ old SHAs in the reflog; only uncommitted changes are gone for good.
27
+
28
+ ## Recovery paths by symptom
29
+
30
+ ### "I ran `git reset --hard` and lost commits"
31
+
32
+ ```bash
33
+ git reflog # newest first
34
+ # → look for "HEAD@{1}: commit: <message>" — that was you, before reset
35
+ git reset --hard HEAD@{1} # OR the explicit SHA
36
+ ```
37
+
38
+ `ORIG_HEAD` is the convenience alias for "what HEAD was before the most
39
+ recent destructive operation":
40
+
41
+ ```bash
42
+ git reset --hard ORIG_HEAD # undoes the last reset / merge / rebase
43
+ ```
44
+
45
+ ### "I deleted a branch (`git branch -D`) and want it back"
46
+
47
+ ```bash
48
+ git reflog # the branch's own reflog is deleted with it, but HEAD's is not
49
+ # → look for "checkout: moving from <branch> to ..."; the entry just below it shows the branch's last tip
50
+ git branch <branch> <sha> # recreates the branch at the old tip
51
+ ```
52
+
53
+ If HEAD's reflog no longer shows the branch (for example, you never
54
+ checked it out in this clone), search dangling commits:
55
+
56
+ ```bash
57
+ git fsck --lost-found
58
+ # → "dangling commit <sha>" lines
59
+ git show <sha> # inspect each candidate
60
+ git branch <branch> <sha> # restore the one you want
61
+ ```
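When `git fsck` prints many lines, it can help to filter the candidates mechanically before inspecting each one. The following is a minimal sketch, assuming the usual `dangling <type> <sha>` / `unreachable <type> <sha>` line format; the function name `candidate_commits` and the sample SHAs are hypothetical.

```python
import re
from typing import List

# Matches lines such as "dangling commit 9b1c..." or "unreachable commit 0a1b...".
_FSCK_LINE = re.compile(r"^(dangling|unreachable) (commit|blob|tree|tag) ([0-9a-f]{7,64})$")


def candidate_commits(fsck_output: str) -> List[str]:
    """Return the SHAs of dangling/unreachable commits, in the order git printed them."""
    shas = []
    for line in fsck_output.splitlines():
        match = _FSCK_LINE.match(line.strip())
        if match and match.group(2) == "commit":
            shas.append(match.group(3))
    return shas


# Hypothetical sample output (made-up SHAs); blobs and progress lines are ignored.
sample = """\
Checking object directories: 100% (256/256), done.
dangling blob 3f2a9c1d0e4b5a6f7c8d9e0f1a2b3c4d5e6f7a8b
dangling commit 9b1c2d3e4f5a6b7c8d9e0f1a2b3c4d5e6f7a8b9c
unreachable commit 0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b
"""

for sha in candidate_commits(sample):
    print("inspect with: git show", sha)
```

Each printed SHA is still only a candidate; `git show <sha>` remains the step that confirms which one holds the lost work.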
62
+
63
+ ### "I dropped a stash with `stash drop` (or by accident)"
64
+
65
+ ```bash
66
+ git fsck --unreachable | grep commit
67
+ # → unreachable commits include dropped stashes
68
+ git show <sha> # confirm it's the right stash
69
+ git stash apply <sha> # apply it back (or stash branch <name> <sha>)
70
+ ```
71
+
72
+ ### "I committed a file I shouldn't have" (no remote push yet)
73
+
74
+ ```bash
75
+ git reset --soft HEAD~1 # undo the commit, keep the changes staged
76
+ git restore --staged <file> # unstage the offending file
77
+ echo "<file>" >> .gitignore
78
+ git commit -m "<original message>"
79
+ ```
80
+
81
+ If you already pushed:
82
+
83
+ ```bash
84
+ git reset --soft HEAD~1
85
+ git restore --staged <file>
86
+ echo "<file>" >> .gitignore
87
+ git commit -m "<original message>"
88
+ git push --force-with-lease # SEE codragraph-git-force-push skill
89
+ ```
90
+
91
+ If the file contained secrets, `--force-with-lease` is NOT enough —
92
+ rotate the secret first, then use `git filter-repo` or BFG to scrub
93
+ history (see codragraph-git-history-rewrite).
94
+
95
+ ### "I deleted a file with `rm` (NOT git rm) and haven't committed"
96
+
97
+ ```bash
98
+ git restore <file> # restores from index
99
+ # OR if it was never tracked, you're out of luck — check editor backups / Time Machine
100
+ ```
101
+
102
+ ### "I have a broken merge in progress"
103
+
104
+ ```bash
105
+ git merge --abort # cleanly back out
106
+ git rebase --abort # for in-progress rebase
107
+ git cherry-pick --abort # for in-progress cherry-pick
108
+ ```
109
+
110
+ If `--abort` doesn't work (rare):
111
+
112
+ ```bash
113
+ git reset --merge ORIG_HEAD
114
+ ```
115
+
116
+ ## When recovery WON'T work
117
+
118
+ | Situation | Recoverable? |
119
+ |---|---|
120
+ | Reset --hard within the last 30 days (default `gc.reflogExpireUnreachable`; no `gc --prune=now`) | ✓ via reflog |
121
+ | Branch deleted within last 30 days | ✓ via fsck or branch reflog |
122
+ | Stash dropped within last 30 days | ✓ via fsck |
123
+ | Untracked files or unstaged edits deleted with `rm` | ✗ git never saw them |
124
+ | `.git/` dir deleted | ✗ catastrophic; restore from backup or remote |
125
+ | `git gc --prune=now` after the disaster | ✗ unreachable objects gone |
126
+ | Force-pushed-over commits AND local clone gone | ✗ unless GitHub Support recovers |
127
+
128
+ ## Recovery diagnostics
129
+
130
+ ```bash
131
+ # What does my reflog look like?
132
+ git reflog --all --date=iso | head -50
133
+
134
+ # What dangling commits exist?
135
+ git fsck --lost-found
136
+
137
+ # What was HEAD recently?
138
+ git reflog show HEAD --date=iso | head -10
139
+
140
+ # What was BRANCH recently?
141
+ git reflog show <branch> --date=iso | head -10
142
+
143
+ # Where did this commit live? (by sha)
144
+ git branch --contains <sha>
145
+ git tag --contains <sha>
146
+ ```
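To judge how much time is left before a reflog entry expires, the `--date=iso` output above can be parsed directly. This is a rough sketch, assuming git's documented defaults (`gc.reflogExpire` = 90 days, `gc.reflogExpireUnreachable` = 30 days) and the `<sha> <ref>@{<iso date>}: <subject>` line shape; `parse_entry` and `days_left` are hypothetical helper names, and a repository with custom expiry settings would need different constants.

```python
import re
from datetime import datetime, timedelta

# Assumed defaults; check `git config gc.reflogExpire` in your own repository.
REACHABLE_DAYS = 90
UNREACHABLE_DAYS = 30

# Matches "<sha> HEAD@{2024-05-01 10:00:00 +0200}: <subject>".
_ENTRY = re.compile(r"^([0-9a-f]+) (\S+)@\{([^}]+)\}: (.*)$")


def parse_entry(line):
    """Split one `git reflog --date=iso` line into (sha, ref, timestamp, subject)."""
    match = _ENTRY.match(line.strip())
    if not match:
        return None
    sha, ref, stamp, subject = match.groups()
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S %z")
    return sha, ref, when, subject


def days_left(when, now, reachable_from_tip):
    """Whole days until the entry passes its expiry window."""
    window = REACHABLE_DAYS if reachable_from_tip else UNREACHABLE_DAYS
    return (when + timedelta(days=window) - now).days


# Hypothetical entry written by a `reset --hard`.
entry = parse_entry("abc1234 HEAD@{2024-05-01 10:00:00 +0200}: reset: moving to HEAD~3")
ten_days_later = entry[2] + timedelta(days=10)
print(days_left(entry[2], ten_days_later, reachable_from_tip=False))
```

Commits dropped by a reset are no longer reachable from the branch tip, so the shorter window is the one to plan around.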
147
+
148
+ ## Why CodraGraph helps after recovery
149
+
150
+ Recovery is mechanical (reflog → reset). The real question is "did I
151
+ actually get back what I lost?" CodraGraph's diff lets you compare:
152
+
153
+ ```bash
154
+ # Compare your recovered tree to the suspected lost-work SHA:
155
+ codragraph diff <recovered-head> <reflog-candidate-sha> --semantic
156
+
157
+ # If you see addedAPIs in the candidate that aren't in your recovered HEAD,
158
+ # the recovery is incomplete — try a different reflog entry.
159
+ ```
160
+
161
+ For deleted branches with no clear "right" tip:
162
+
163
+ ```bash
164
+ git fsck --lost-found
165
+ # For each candidate dangling-commit SHA:
166
+ codragraph diff main <dangling-sha> --semantic
167
+ # Pick the one whose diff matches the work you remember doing.
168
+ ```
169
+
170
+ ## Checklist when in panic mode
171
+
172
+ ```
173
+ - [ ] STOP committing / running git gc / reinstalling
174
+ - [ ] git reflog | head -30 — see what's there
175
+ - [ ] Identify the candidate SHA for the lost work
176
+ - [ ] git show <sha> to verify it's the right thing
177
+ - [ ] git reset --hard / git branch <name> <sha> / git stash apply <sha>
178
+ - [ ] codragraph diff to verify the recovery captured the real work
179
+ - [ ] Consider a longer reflog expiry (git config --global gc.reflogExpire
180
+ and gc.reflogExpireUnreachable) for the future
181
+ ```
@@ -0,0 +1,145 @@
1
+ ---
2
+ name: codragraph-git-worktree
3
+ description: "Use when working on multiple branches in parallel without re-cloning, juggling a long-running branch alongside hotfixes, or running CI/build on one branch while editing another. Covers `git worktree` setup, conventions, and gotchas. Examples: \"work on two branches at once\", \"git worktree\", \"hotfix without losing my context\", \"avoid stash-juggling\""
4
+ ---
5
+
6
+ # Parallel Branches with `git worktree`
7
+
8
+ ## When to Use
9
+
10
+ - "I'm in the middle of something and need to do a hotfix on main."
11
+ - "I want CI to run on one branch while I edit another."
12
+ - "I have N agents working on N branches simultaneously."
13
+ - "I'm sick of `git stash` between branch switches."
14
+
15
+ ## The idea
16
+
17
+ A *worktree* is a separate working directory backed by the same git
18
+ repository. One `.git/`, many checkouts. Switching between branches no
19
+ longer requires re-running build / re-installing deps / re-warming editor
20
+ indexes — you just `cd` to a different directory.
21
+
22
+ ## Setup
23
+
24
+ ```bash
25
+ # From inside your existing repo (e.g. ~/code/myrepo):
26
+ git worktree add ../myrepo-hotfix main # check out 'main' at ../myrepo-hotfix
27
+ git worktree add ../myrepo-feat-x feat/x # check out feat/x at ../myrepo-feat-x
28
+
29
+ # List all worktrees:
30
+ git worktree list
31
+ # → /home/anit/code/myrepo abc1234 [feature/foo]
32
+ # → /home/anit/code/myrepo-hotfix def5678 [main]
33
+ # → /home/anit/code/myrepo-feat-x fed9876 [feat/x]
34
+
35
+ # Each is a real working directory. cd into it; git commands operate on its
36
+ # checked-out branch automatically.
37
+ cd ../myrepo-hotfix
38
+ git status # → on branch 'main'
39
+ ```
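Scripts that manage several worktrees usually read `git worktree list --porcelain` rather than the human-readable listing. Below is a minimal sketch of a parser for that format (blank-line-separated records of `key value` lines, with bare flags such as `detached`); the function names are hypothetical and the sample SHAs are shortened for readability.

```python
from typing import Dict, List


def parse_worktrees(porcelain: str) -> List[Dict[str, str]]:
    """Turn `git worktree list --porcelain` text into one dict per worktree."""
    worktrees = []
    current = {}
    for line in porcelain.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                worktrees.append(current)
                current = {}
            continue
        key, _, value = line.partition(" ")
        current[key] = value  # bare flags such as "detached" map to ""
    if current:
        worktrees.append(current)
    return worktrees


def branches_in_use(worktrees: List[Dict[str, str]]) -> Dict[str, str]:
    """Map each checked-out branch name to the worktree path that holds it."""
    prefix = "refs/heads/"
    result = {}
    for wt in worktrees:
        ref = wt.get("branch")
        if ref is None:
            continue  # detached HEAD worktrees hold no branch
        name = ref[len(prefix):] if ref.startswith(prefix) else ref
        result[name] = wt["worktree"]
    return result


# Sample mirroring the listing above, plus one hypothetical detached worktree.
sample = """\
worktree /home/anit/code/myrepo
HEAD abc1234
branch refs/heads/feature/foo

worktree /home/anit/code/myrepo-hotfix
HEAD def5678
branch refs/heads/main

worktree /home/anit/code/myrepo-old
HEAD fed9876
detached
"""

print(branches_in_use(parse_worktrees(sample)))
```

Checking `branches_in_use` before `git worktree add` avoids the "branch already checked out" refusal described under Pitfalls.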
40
+
41
+ ## Conventions that scale
42
+
43
+ ```
44
+ ~/code/
45
+ ├── myrepo/ # main worktree (your "primary" branch's checkout)
46
+ ├── myrepo-hotfix/ # for emergency fixes off main
47
+ ├── myrepo-pr-143/ # parking a PR for review
48
+ └── myrepo-bisect/ # dedicated to long-running bisects
49
+ ```
50
+
51
+ Or co-locate them inside the primary repo under `.worktrees/`:
52
+
53
+ ```bash
54
+ mkdir -p .worktrees
55
+ git worktree add .worktrees/hotfix main
56
+ echo ".worktrees/" >> .gitignore # don't index worktrees as files
57
+ ```
58
+
59
+ ## Removing worktrees
60
+
61
+ ```bash
62
+ git worktree remove ../myrepo-hotfix
63
+ # or, if the directory was already deleted manually:
64
+ git worktree prune
65
+ ```
66
+
67
+ ## Why CodraGraph helps here
68
+
69
+ Each worktree gets its own working directory but shares the `.git/`. This
70
+ matters for indexing: CodraGraph stores its index under `.codragraph/` in
71
+ the working directory, so each worktree gets its OWN index. That's actually useful:
72
+
73
+ ```bash
74
+ # Two worktrees, two independent CodraGraph indexes.
75
+ cd ~/code/myrepo # primary, on feature/foo
76
+ codragraph analyze # index of feature/foo
77
+
78
+ cd ~/code/myrepo-hotfix # worktree, on main
79
+ codragraph analyze # index of main
80
+
81
+ # Now you can query each index separately (MCP tool calls, not shell commands):
82
+ codragraph_query({repo: "myrepo", query: "auth"}) # primary index
83
+ codragraph_query({repo: "myrepo-hotfix", query: "auth"}) # main's index
84
+ ```
85
+
86
+ This is especially handy for **migration tracking** (compare a
87
+ long-running migration branch's structural state to current main without
88
+ checking out and re-indexing repeatedly).
89
+
90
+ ## Pitfalls
91
+
92
+ | Pitfall | What happens | Fix |
93
+ |---|---|---|
94
+ | Same branch checked out in two worktrees | git refuses; only one worktree per branch | Use a different branch in the second worktree (or `--detach`) |
95
+ | Worktree dir deleted manually (no `worktree remove`) | git still tracks it as live | `git worktree prune` |
96
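+ | Hooks / config differ across worktrees | Worktrees share `.git/`, hooks and config included; per-worktree config only exists once `extensions.worktreeConfig` is enabled | Run `git config extensions.worktreeConfig true`, then use `git config --worktree` for per-worktree settings |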
97
+ | Editor's "open repo" features confused | VS Code sometimes treats each worktree as its own repo | Open each worktree as a separate workspace; this is actually correct |
98
+ | `npm install` in every worktree | Slow, duplicated `node_modules` | Use pnpm with shared store, or yarn berry's PnP |
99
+
100
+ ## Variants
101
+
102
+ ```bash
103
+ # Detached HEAD worktree (no branch — useful for inspecting old commits):
104
+ git worktree add --detach ../myrepo-old <commit-sha>
105
+
106
+ # Lock a worktree (prevents `worktree prune` from cleaning it):
107
+ git worktree lock ../myrepo-old
108
+
109
+ # Move a worktree:
110
+ git worktree move ../myrepo-old ../myrepo-archive
111
+ ```
112
+
113
+ ## Checklist
114
+
115
+ ```
116
+ - [ ] Created worktree with a clear directory name and the right branch
117
+ - [ ] Confirmed `git worktree list` shows all expected entries
118
+ - [ ] Per-worktree CodraGraph index where relevant (codragraph analyze in each)
119
+ - [ ] Editor opened per worktree (separate workspace each)
120
+ - [ ] When done: git worktree remove <path> (or rm + git worktree prune)
121
+ ```
122
+
123
+ ## Example: "I need to ship a hotfix without losing my WIP"
124
+
125
+ ```bash
126
+ # Currently on feature/big-refactor, 8 uncommitted files, half-broken.
127
+
128
+ # 1. Create a worktree for the hotfix off main
129
+ git worktree add ../myrepo-hotfix main
130
+
131
+ # 2. cd over and do the work
132
+ cd ../myrepo-hotfix
133
+ git switch -c hotfix/null-pointer
134
+ # ... edit, test, commit ...
135
+ git push -u origin hotfix/null-pointer
136
+ gh pr create --title "..." --body "..."
137
+ gh pr merge --squash --delete-branch
138
+
139
+ # 3. Clean up the worktree
140
+ cd ../myrepo
141
+ git worktree remove ../myrepo-hotfix
142
+
143
+ # 4. Continue your refactor as if nothing happened
144
+ git status # still 8 uncommitted files, intact
145
+ ```
@@ -0,0 +1,130 @@
1
+ ---
2
+ name: codragraph-migration-tracking
3
+ description: "Use when tracking the progress of a phased refactor or migration (renaming an API, swapping a library, moving from class- to functional-components, deprecating a flag). Examples: \"how far is the migration\", \"what's left to migrate\", \"track this refactor\", \"are we done with the move from X to Y\", \"is the migration done\""
4
+ ---
5
+
6
+ # Migration Progress Tracking with CodraGraph
7
+
8
+ ## When to Use
9
+
10
+ - "How far along is the migration from `<old>` to `<new>`?"
11
+ - "What's left to migrate / refactor / deprecate?"
12
+ - "Are we done with `<old API>`?"
13
+ - Coordinating a phased refactor across many PRs
14
+ - Reporting migration status to stakeholders
15
+
16
+ ## Why CodraGraph helps here
17
+
18
+ Migrations span dozens of PRs and weeks. Without a structural index, the
19
+ question "are we done?" reduces to grep-and-eyeball. CodraGraph's versioned
20
+ graphstore lets you snapshot the codebase at the start of the migration,
21
+ then diff against today to see exactly what's converted and what isn't.
22
+ Pair with `cypher` to count remaining instances of the old pattern.
23
+
24
+ ## Workflow
25
+
26
+ ```
27
+ 1. Establish baseline (once, at migration start):
28
+ codragraph commit -m "migration baseline: pre-X-removal"
29
+ codragraph branch create migration-baseline
30
+ → captures the structural state for later comparison
31
+
32
+ 2. Count remaining old-API call sites today:
33
+ codragraph_cypher({query: `
34
+ MATCH (n)-[:CALLS]->(target)
35
+ WHERE target.name = '<oldFunction>'
36
+ RETURN n.filePath, count(n) AS callers
37
+ ORDER BY callers DESC
38
+ `})
39
+ → "27 callers in 14 files still using <oldFunction>"
40
+
41
+ 3. Diff against the baseline to see structural progress:
42
+ codragraph diff migration-baseline HEAD --semantic --json
43
+ → look at removedAPIs (old surface gone), addedAPIs (new surface added),
44
+ classifiedModifications (signatures swapped)
45
+
46
+ 4. Assess flows still touching the old API:
47
+ codragraph_impact({target: "<oldFunction>", direction: "upstream"})
48
+ → list of remaining callers grouped by depth
49
+
50
+ 5. Suggest the next batch of files to migrate (highest caller-count first)
51
+ ```
52
+
53
+ > If `migration-baseline` doesn't exist, you skipped step 1 — fall back to
54
+ > the earliest commit in `codragraph log` as a baseline (less precise but
55
+ > usable).
56
+
57
+ ## Checklist
58
+
59
+ ```
60
+ - [ ] Establish a baseline (branch / tagged commit) at migration start
61
+ - [ ] Cypher count of remaining old-API references
62
+ - [ ] codragraph diff baseline HEAD --semantic for structural progress
63
+ - [ ] impact upstream on the old API → list of remaining callers
64
+ - [ ] Group remaining work by file → suggest next batch
65
+ - [ ] Report: "<N>% of <total> call sites converted. <K> files remaining."
66
+ ```
67
+
68
+ ## Migration Patterns This Catches
69
+
70
+ | Pattern | Cypher hint |
71
+ | --- | --- |
72
+ | API rename (foo → bar) | `MATCH (c)-[:CALLS]->(n) WHERE n.name = 'foo' RETURN c.filePath, count(*)` |
73
+ | Library swap (lodash → native) | Filter on `filePath` for files still importing the old library |
74
+ | Class → functional component | Match by `n.label = 'Class'` in the relevant directory |
75
+ | Feature flag removal | Cypher for string literals matching the flag name |
76
+ | Type-system migration (any → typed) | `MATCH (n) WHERE n.returnType = 'any' OR n.returnType IS NULL` |
77
+
78
+ ## Example: "Track our migration from `validatePaymentV1` to `validatePaymentV2`"
79
+
80
+ ```
81
+ 1. (Baseline established 3 months ago: codragraph branch create migration-v2-start)
82
+
83
+ 2. codragraph_cypher({query: `
84
+ MATCH (caller)-[:CALLS]->(target)
85
+ WHERE target.name STARTS WITH 'validatePayment'
86
+ RETURN target.name, count(caller) AS callers
87
+ `})
88
+ → validatePaymentV1: 8 callers
89
+ → validatePaymentV2: 31 callers
90
+
91
+ 3. codragraph diff migration-v2-start HEAD --semantic
92
+ → addedAPIs: validatePaymentV2 (and 4 helpers)
93
+ → classifiedModifications: 23 functions migrated from V1 to V2
94
+ → removedAPIs: 0 (V1 still exported)
95
+
96
+ 4. codragraph_impact({target: "validatePaymentV1", direction: "upstream"})
97
+ → d=1 callers (still on V1):
98
+ - legacyCheckout (src/legacy/checkout.ts)
99
+ - webhookV1 (src/webhooks/v1.ts)
100
+ - … 6 more
101
+
102
+ Report: 79% migrated (31 / 39 callers). 8 callers in 3 files remaining.
103
+ Next batch: src/legacy/checkout.ts (5 callers in one file).
104
+ ```
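The arithmetic behind the report line can be reproduced from the two caller counts. This is a small sketch under stated assumptions: `migration_progress` and `next_batch` are hypothetical helpers, not part of CodraGraph, and the per-file counts include only the two files the example names (the third file is elided in the source and is left out here).

```python
from typing import Dict, Tuple


def migration_progress(old_callers: int, new_callers: int) -> Tuple[int, int, int]:
    """Return (converted, total, percent converted), rounding to a whole percent."""
    total = old_callers + new_callers
    percent = round(100 * new_callers / total) if total else 100
    return new_callers, total, percent


def next_batch(old_callers_by_file: Dict[str, int]) -> str:
    """Pick the file with the most remaining old-API callers."""
    return max(old_callers_by_file, key=old_callers_by_file.get)


# Counts taken from the example: 8 callers still on V1, 31 already on V2.
converted, total, percent = migration_progress(old_callers=8, new_callers=31)

# Only the files named in the example.
remaining = {
    "src/legacy/checkout.ts": 5,
    "src/webhooks/v1.ts": 2,
}

print(f"{percent}% migrated ({converted} / {total} callers). Next batch: {next_batch(remaining)}")
# → 79% migrated (31 / 39 callers). Next batch: src/legacy/checkout.ts
```

Sorting by remaining caller count is only one possible ordering; a team might instead batch by ownership or by risk.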
105
+
106
+ ## Output Format
107
+
108
+ ```markdown
109
+ ## Migration Progress: <name>
110
+
111
+ ### Baseline
112
+ `migration-baseline` (3 months ago, before refactor started)
113
+
114
+ ### Current state
115
+ - **Converted:** 31 / 39 call sites (79%)
116
+ - **Remaining:** 8 callers in 3 files
117
+ - **Old API surface:** still exported (cannot remove yet)
118
+ - **New API surface:** stable (4 helpers added)
119
+
120
+ ### Remaining work
121
+ | File | Old-API callers | Notes |
122
+ | --- | --- | --- |
123
+ | `src/legacy/checkout.ts` | 5 | one batch |
124
+ | `src/webhooks/v1.ts` | 2 | tied to legacy webhook contract |
125
+ | ... | ... | ... |
126
+
127
+ ### Done criteria
128
+ - 0 remaining callers
129
+ - removedAPIs in `codragraph diff` includes `validatePaymentV1`
130
+ ```
@@ -0,0 +1,136 @@
1
+ ---
2
+ name: codragraph-notebook-context
3
+ description: "Use when working with notebook-heavy projects (Jupyter, Databricks, Colab, Marimo) where each notebook contains long pipelines of cells, and the user needs to navigate, summarize, or refactor across them. Examples: \"what do these notebooks do\", \"summarize this analysis pipeline\", \"refactor cells from this notebook into modules\", \"data analysis project tour\""
4
+ ---
5
+
6
+ # Notebook-Heavy Project Navigation with CodraGraph
7
+
8
+ ## When to Use
9
+
10
+ - "What's in these notebooks?"
11
+ - "Summarize the analysis pipeline across `<notebook>.ipynb`"
12
+ - "Help me refactor a notebook into a proper module"
13
+ - "What functions defined in notebooks does the production code call?"
14
+ - "Audit the data analyses in this project"
15
+
16
+ ## Why CodraGraph helps here
17
+
18
+ Data-science / analytics projects often have 80% of the logic inside
19
+ `.ipynb` files: top-level imports, helper functions, ad-hoc transforms.
20
+ CodraGraph indexes Python (the most common notebook language) and treats
21
+ notebook-derived code as first-class graph content. That means `query`,
22
+ `context`, and `impact` work on notebook-defined symbols just like
23
+ production-code symbols, so you can navigate a 30-cell notebook the same
24
+ way you'd navigate a typical module.
25
+
26
+ ## Workflow
27
+
28
+ ```
29
+ 1. List notebook-derived symbols:
30
+ codragraph_cypher({query: `
31
+ MATCH (n)
32
+ WHERE n.filePath ENDS WITH '.ipynb'
33
+ RETURN n.filePath, n.name, labels(n)[0] AS label
34
+ ORDER BY n.filePath, n.startLine
35
+ `})
36
+ → every function/class defined inside any notebook
37
+
38
+ 2. For each notebook of interest:
39
+ codragraph_query({query: "<notebook concept, e.g. 'monthly retention'>"})
40
+ → top-ranked symbols across notebooks (process-grouped)
41
+
42
+ 3. codragraph_context({name: "<notebook function>"})
43
+ → callers (other notebooks? production?) and callees (libraries used)
44
+
45
+ 4. Cross-notebook reuse check:
46
+ codragraph_impact({target: "<helper>", direction: "upstream"})
47
+ → if multiple notebooks call the same helper, that's a refactor candidate
48
+ (extract into a shared module)
49
+
50
+ 5. Production / notebook bridge:
51
+ codragraph_cypher({query: `
52
+ MATCH (caller)-[:CALLS]->(target)
53
+ WHERE NOT caller.filePath ENDS WITH '.ipynb'
54
+ AND target.filePath ENDS WITH '.ipynb'
55
+ RETURN caller.filePath, target.filePath, target.name
56
+ `})
57
+ → production code calling into notebooks (usually a bug — flag it)
58
+ ```
59
+
60
+ > Notebooks can be exported to plain scripts with nbconvert / papermill /
61
+ > databricks-cli, but no export step is needed here: CodraGraph parses the
62
+ > `.ipynb` JSON and treats each code cell as part of the file's symbol space.
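To make "each code cell as part of the file's symbol space" concrete, here is an illustrative sketch that pulls top-level function and class names out of an `.ipynb` file using only the standard library. It is not CodraGraph's parser; it assumes the nbformat v4 layout (`cells[*].cell_type` and `cells[*].source`), and the class name `ChurnModel` in the sample is hypothetical.

```python
import ast
import json
from typing import List


def notebook_symbols(ipynb_text: str) -> List[str]:
    """Return top-level function/class names defined in a notebook's code cells."""
    notebook = json.loads(ipynb_text)
    names = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat stores source as a list of lines
            source = "".join(source)
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # cells with IPython magics (e.g. %matplotlib) are skipped
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                names.append(node.name)
    return names


# A tiny in-memory notebook: one markdown cell, one magic cell, two code cells.
sample_notebook = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "source": ["# Churn features\n"]},
        {"cell_type": "code", "source": ["%matplotlib inline\n"]},
        {"cell_type": "code", "source": [
            "def compute_cohort_retention(df):\n",
            "    return df\n",
        ]},
        {"cell_type": "code", "source": "class ChurnModel:\n    pass\n"},
    ],
})

print(notebook_symbols(sample_notebook))
```

Skipping cells that fail to parse keeps the sketch short; a real indexer would more likely strip the magic lines and keep the rest of the cell.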
63
+
64
+ ## Checklist
65
+
66
+ ```
67
+ - [ ] Cypher query for all symbols with .ipynb file paths
68
+ - [ ] Group by notebook → see total symbol count per notebook
69
+ - [ ] query for the analysis topic to find top-ranked symbols
70
+ - [ ] context on key notebook helpers
71
+ - [ ] impact upstream on cross-notebook helpers → refactor candidates
72
+ - [ ] Cypher for production-code → notebook calls (bridge audit)
73
+ ```
74
+
75
+ ## Refactor Signals
76
+
77
+ | Signal | What to do |
78
+ | --- | --- |
79
+ | Same helper defined in 3+ notebooks | Extract into a shared `.py` module |
80
+ | Notebook function called from production code | Move to production module; notebook should re-import |
81
+ | Notebook with > 30 distinct symbols | Likely needs to be split (or graduated to a module) |
82
+ | Notebook calling another notebook | Strong refactor signal — extract the shared part |
83
+
84
+ ## Example: "Summarize the customer-churn analyses in notebooks/"
85
+
86
+ ```
87
+ 1. codragraph_cypher({
88
+ query: `MATCH (n) WHERE n.filePath STARTS WITH 'notebooks/'
89
+ AND n.filePath ENDS WITH '.ipynb'
90
+ RETURN n.filePath, count(n) AS symbols
91
+ ORDER BY symbols DESC`
92
+ })
93
+ → 4 notebooks: churn_baseline.ipynb (28 symbols),
94
+ churn_features.ipynb (35 symbols),
95
+ churn_model.ipynb (22 symbols),
96
+ churn_eval.ipynb (12 symbols)
97
+
98
+ 2. codragraph_query({query: "customer churn cohort"})
99
+ → top symbols across the 4 notebooks, grouped by detected processes
100
+
101
+ 3. codragraph_context({name: "compute_cohort_retention"})
102
+ → defined in: churn_features.ipynb
103
+ → called by: churn_model.ipynb, churn_eval.ipynb (TWO notebooks)
104
+ → REFACTOR CANDIDATE — extract into src/churn/cohort.py
105
+
106
+ 4. codragraph_cypher for production → notebook calls
107
+ → 1 hit: scripts/daily_churn_report.py imports from churn_eval.ipynb ⚠
108
+ → Production should not depend on a notebook. Extract.
109
+
110
+ Findings:
111
+ - 97 total symbols across 4 notebooks
112
+ - 1 multi-notebook helper (compute_cohort_retention) → extract to module
113
+ - 1 production → notebook bridge (daily_churn_report) → flag as tech debt
114
+ ```
115
+
116
+ ## Output Format
117
+
118
+ ```markdown
119
+ ## Notebook Tour: `notebooks/`
120
+
121
+ ### Notebooks
122
+ | Notebook | Symbols | Purpose (top-3 symbols) |
123
+ | --- | --- | --- |
124
+ | churn_baseline.ipynb | 28 | baseline_churn_rate, … |
125
+ | churn_features.ipynb | 35 | compute_cohort_retention, … |
126
+ | churn_model.ipynb | 22 | train_churn_classifier, … |
127
+ | churn_eval.ipynb | 12 | evaluate_churn_model, … |
128
+
129
+ ### Refactor candidates
130
+ 1. `compute_cohort_retention` — used in 2 notebooks → extract to `src/churn/cohort.py`
131
+ 2. ...
132
+
133
+ ### Bridge audits
134
+ - ⚠ `scripts/daily_churn_report.py` imports from `churn_eval.ipynb` —
135
+ production should not depend on a notebook.
136
+ ```