@oriro/orirocli 0.1.9 → 0.1.12

Files changed (166)
  1. package/README.md +16 -18
  2. package/dist/cli.js +4776 -2964
  3. package/package.json +2 -2
  4. package/skills/craft/ai-engineering/SKILL.md +2 -2
  5. package/skills/graphify/SKILL.md +0 -619
  6. package/skills/graphify/__init__.py +0 -28
  7. package/skills/graphify/__main__.py +0 -4582
  8. package/skills/graphify/affected.py +0 -154
  9. package/skills/graphify/always_on/agents-md.md +0 -12
  10. package/skills/graphify/always_on/antigravity-rules.md +0 -14
  11. package/skills/graphify/always_on/claude-md.md +0 -9
  12. package/skills/graphify/always_on/gemini-md.md +0 -9
  13. package/skills/graphify/always_on/kiro-steering.md +0 -5
  14. package/skills/graphify/always_on/vscode-instructions.md +0 -17
  15. package/skills/graphify/analyze.py +0 -724
  16. package/skills/graphify/benchmark.py +0 -155
  17. package/skills/graphify/build.py +0 -487
  18. package/skills/graphify/cache.py +0 -417
  19. package/skills/graphify/callflow_html.py +0 -2020
  20. package/skills/graphify/cluster.py +0 -272
  21. package/skills/graphify/command-kilo.md +0 -15
  22. package/skills/graphify/dedup.py +0 -429
  23. package/skills/graphify/detect.py +0 -1379
  24. package/skills/graphify/diagnostics.py +0 -390
  25. package/skills/graphify/export.py +0 -1408
  26. package/skills/graphify/extract.py +0 -11570
  27. package/skills/graphify/global_graph.py +0 -159
  28. package/skills/graphify/google_workspace.py +0 -223
  29. package/skills/graphify/hooks.py +0 -457
  30. package/skills/graphify/ingest.py +0 -331
  31. package/skills/graphify/llm.py +0 -1896
  32. package/skills/graphify/manifest.py +0 -4
  33. package/skills/graphify/mcp_ingest.py +0 -392
  34. package/skills/graphify/multigraph_compat.py +0 -212
  35. package/skills/graphify/pg_introspect.py +0 -142
  36. package/skills/graphify/prs.py +0 -748
  37. package/skills/graphify/querylog.py +0 -70
  38. package/skills/graphify/report.py +0 -218
  39. package/skills/graphify/scip_ingest.py +0 -363
  40. package/skills/graphify/security.py +0 -336
  41. package/skills/graphify/semantic_cleanup.py +0 -319
  42. package/skills/graphify/serve.py +0 -1309
  43. package/skills/graphify/skill-aider.md +0 -1246
  44. package/skills/graphify/skill-amp.md +0 -613
  45. package/skills/graphify/skill-claw.md +0 -616
  46. package/skills/graphify/skill-codex.md +0 -613
  47. package/skills/graphify/skill-copilot.md +0 -616
  48. package/skills/graphify/skill-devin.md +0 -1372
  49. package/skills/graphify/skill-droid.md +0 -613
  50. package/skills/graphify/skill-kilo.md +0 -625
  51. package/skills/graphify/skill-kiro.md +0 -615
  52. package/skills/graphify/skill-opencode.md +0 -608
  53. package/skills/graphify/skill-pi.md +0 -615
  54. package/skills/graphify/skill-trae.md +0 -614
  55. package/skills/graphify/skill-vscode.md +0 -612
  56. package/skills/graphify/skill-windows.md +0 -651
  57. package/skills/graphify/skills/amp/references/add-watch.md +0 -56
  58. package/skills/graphify/skills/amp/references/exports.md +0 -71
  59. package/skills/graphify/skills/amp/references/extraction-spec.md +0 -68
  60. package/skills/graphify/skills/amp/references/github-and-merge.md +0 -46
  61. package/skills/graphify/skills/amp/references/hooks.md +0 -33
  62. package/skills/graphify/skills/amp/references/query.md +0 -249
  63. package/skills/graphify/skills/amp/references/transcribe.md +0 -48
  64. package/skills/graphify/skills/amp/references/update.md +0 -179
  65. package/skills/graphify/skills/claude/references/add-watch.md +0 -56
  66. package/skills/graphify/skills/claude/references/exports.md +0 -71
  67. package/skills/graphify/skills/claude/references/extraction-spec.md +0 -68
  68. package/skills/graphify/skills/claude/references/github-and-merge.md +0 -46
  69. package/skills/graphify/skills/claude/references/hooks.md +0 -33
  70. package/skills/graphify/skills/claude/references/query.md +0 -103
  71. package/skills/graphify/skills/claude/references/transcribe.md +0 -48
  72. package/skills/graphify/skills/claude/references/update.md +0 -179
  73. package/skills/graphify/skills/claw/references/add-watch.md +0 -56
  74. package/skills/graphify/skills/claw/references/exports.md +0 -71
  75. package/skills/graphify/skills/claw/references/extraction-spec.md +0 -29
  76. package/skills/graphify/skills/claw/references/github-and-merge.md +0 -46
  77. package/skills/graphify/skills/claw/references/hooks.md +0 -33
  78. package/skills/graphify/skills/claw/references/query.md +0 -249
  79. package/skills/graphify/skills/claw/references/transcribe.md +0 -48
  80. package/skills/graphify/skills/claw/references/update.md +0 -179
  81. package/skills/graphify/skills/codex/references/add-watch.md +0 -56
  82. package/skills/graphify/skills/codex/references/exports.md +0 -71
  83. package/skills/graphify/skills/codex/references/extraction-spec.md +0 -29
  84. package/skills/graphify/skills/codex/references/github-and-merge.md +0 -46
  85. package/skills/graphify/skills/codex/references/hooks.md +0 -33
  86. package/skills/graphify/skills/codex/references/query.md +0 -249
  87. package/skills/graphify/skills/codex/references/transcribe.md +0 -48
  88. package/skills/graphify/skills/codex/references/update.md +0 -179
  89. package/skills/graphify/skills/copilot/references/add-watch.md +0 -56
  90. package/skills/graphify/skills/copilot/references/exports.md +0 -71
  91. package/skills/graphify/skills/copilot/references/extraction-spec.md +0 -68
  92. package/skills/graphify/skills/copilot/references/github-and-merge.md +0 -46
  93. package/skills/graphify/skills/copilot/references/hooks.md +0 -33
  94. package/skills/graphify/skills/copilot/references/query.md +0 -249
  95. package/skills/graphify/skills/copilot/references/transcribe.md +0 -48
  96. package/skills/graphify/skills/copilot/references/update.md +0 -179
  97. package/skills/graphify/skills/droid/references/add-watch.md +0 -56
  98. package/skills/graphify/skills/droid/references/exports.md +0 -71
  99. package/skills/graphify/skills/droid/references/extraction-spec.md +0 -68
  100. package/skills/graphify/skills/droid/references/github-and-merge.md +0 -46
  101. package/skills/graphify/skills/droid/references/hooks.md +0 -33
  102. package/skills/graphify/skills/droid/references/query.md +0 -249
  103. package/skills/graphify/skills/droid/references/transcribe.md +0 -48
  104. package/skills/graphify/skills/droid/references/update.md +0 -179
  105. package/skills/graphify/skills/kilo/references/add-watch.md +0 -56
  106. package/skills/graphify/skills/kilo/references/exports.md +0 -71
  107. package/skills/graphify/skills/kilo/references/extraction-spec.md +0 -68
  108. package/skills/graphify/skills/kilo/references/github-and-merge.md +0 -46
  109. package/skills/graphify/skills/kilo/references/hooks.md +0 -33
  110. package/skills/graphify/skills/kilo/references/query.md +0 -249
  111. package/skills/graphify/skills/kilo/references/transcribe.md +0 -48
  112. package/skills/graphify/skills/kilo/references/update.md +0 -179
  113. package/skills/graphify/skills/kiro/references/add-watch.md +0 -56
  114. package/skills/graphify/skills/kiro/references/exports.md +0 -71
  115. package/skills/graphify/skills/kiro/references/extraction-spec.md +0 -29
  116. package/skills/graphify/skills/kiro/references/github-and-merge.md +0 -46
  117. package/skills/graphify/skills/kiro/references/hooks.md +0 -33
  118. package/skills/graphify/skills/kiro/references/query.md +0 -249
  119. package/skills/graphify/skills/kiro/references/transcribe.md +0 -48
  120. package/skills/graphify/skills/kiro/references/update.md +0 -179
  121. package/skills/graphify/skills/opencode/references/add-watch.md +0 -56
  122. package/skills/graphify/skills/opencode/references/exports.md +0 -71
  123. package/skills/graphify/skills/opencode/references/extraction-spec.md +0 -68
  124. package/skills/graphify/skills/opencode/references/github-and-merge.md +0 -46
  125. package/skills/graphify/skills/opencode/references/hooks.md +0 -33
  126. package/skills/graphify/skills/opencode/references/query.md +0 -249
  127. package/skills/graphify/skills/opencode/references/transcribe.md +0 -48
  128. package/skills/graphify/skills/opencode/references/update.md +0 -179
  129. package/skills/graphify/skills/pi/references/add-watch.md +0 -56
  130. package/skills/graphify/skills/pi/references/exports.md +0 -71
  131. package/skills/graphify/skills/pi/references/extraction-spec.md +0 -29
  132. package/skills/graphify/skills/pi/references/github-and-merge.md +0 -46
  133. package/skills/graphify/skills/pi/references/hooks.md +0 -33
  134. package/skills/graphify/skills/pi/references/query.md +0 -249
  135. package/skills/graphify/skills/pi/references/transcribe.md +0 -48
  136. package/skills/graphify/skills/pi/references/update.md +0 -179
  137. package/skills/graphify/skills/trae/references/add-watch.md +0 -56
  138. package/skills/graphify/skills/trae/references/exports.md +0 -71
  139. package/skills/graphify/skills/trae/references/extraction-spec.md +0 -68
  140. package/skills/graphify/skills/trae/references/github-and-merge.md +0 -46
  141. package/skills/graphify/skills/trae/references/hooks.md +0 -35
  142. package/skills/graphify/skills/trae/references/query.md +0 -249
  143. package/skills/graphify/skills/trae/references/transcribe.md +0 -48
  144. package/skills/graphify/skills/trae/references/update.md +0 -179
  145. package/skills/graphify/skills/vscode/references/add-watch.md +0 -56
  146. package/skills/graphify/skills/vscode/references/exports.md +0 -71
  147. package/skills/graphify/skills/vscode/references/extraction-spec.md +0 -68
  148. package/skills/graphify/skills/vscode/references/github-and-merge.md +0 -46
  149. package/skills/graphify/skills/vscode/references/hooks.md +0 -33
  150. package/skills/graphify/skills/vscode/references/query.md +0 -249
  151. package/skills/graphify/skills/vscode/references/transcribe.md +0 -48
  152. package/skills/graphify/skills/vscode/references/update.md +0 -179
  153. package/skills/graphify/skills/windows/references/add-watch.md +0 -56
  154. package/skills/graphify/skills/windows/references/exports.md +0 -71
  155. package/skills/graphify/skills/windows/references/extraction-spec.md +0 -68
  156. package/skills/graphify/skills/windows/references/github-and-merge.md +0 -46
  157. package/skills/graphify/skills/windows/references/hooks.md +0 -33
  158. package/skills/graphify/skills/windows/references/query.md +0 -249
  159. package/skills/graphify/skills/windows/references/transcribe.md +0 -48
  160. package/skills/graphify/skills/windows/references/update.md +0 -179
  161. package/skills/graphify/symbol_resolution.py +0 -538
  162. package/skills/graphify/transcribe.py +0 -184
  163. package/skills/graphify/tree_html.py +0 -582
  164. package/skills/graphify/validate.py +0 -72
  165. package/skills/graphify/watch.py +0 -898
  166. package/skills/graphify/wiki.py +0 -282
@@ -1,615 +0,0 @@
1
- ---
2
- name: graphify
3
- description: "Use for any question about a codebase, its architecture, file relationships, or project content — especially when graphify-out/ exists, where the question should be treated as a graphify query first. Turns any input (code, docs, papers, images, videos) into a persistent knowledge graph with god nodes, community detection, and query/path/explain tools."
4
- ---
5
-
6
- # /graphify
7
-
8
- Turn any folder of files into a navigable knowledge graph with community detection, an honest audit trail, and three outputs: interactive HTML, GraphRAG-ready JSON, and a plain-language GRAPH_REPORT.md.
9
-
10
- ## Usage
11
-
12
- ```
13
- /graphify # full pipeline on current directory → Obsidian vault
14
- /graphify <path> # full pipeline on specific path
15
- /graphify https://github.com/<owner>/<repo> # clone repo then run full pipeline on it
16
- /graphify https://github.com/<owner>/<repo> --branch <branch> # clone a specific branch
17
- /graphify <url1> <url2> ... # clone multiple repos, build each, merge into one cross-repo graph
18
- /graphify <path> --mode deep # thorough extraction, richer INFERRED edges
19
- /graphify <path> --update # incremental - re-extract only new/changed files
20
- /graphify <path> --directed # build directed graph (preserves edge direction: source→target)
21
- /graphify <path> --whisper-model medium # use a larger Whisper model for better transcription accuracy
22
- /graphify <path> --cluster-only # rerun clustering on existing graph
23
- /graphify <path> --no-viz # skip visualization, just report + JSON
24
- /graphify <path> --html # (HTML is generated by default - this flag is a no-op)
25
- /graphify <path> --svg # also export graph.svg (embeds in Notion, GitHub)
26
- /graphify <path> --graphml # export graph.graphml (Gephi, yEd)
27
- /graphify <path> --neo4j # generate graphify-out/cypher.txt for Neo4j
28
- /graphify <path> --neo4j-push bolt://localhost:7687 # push directly to Neo4j
29
- /graphify <path> --mcp # start MCP stdio server for agent access
30
- /graphify <path> --watch # watch folder, auto-rebuild on code changes (no LLM needed)
31
- /graphify <path> --wiki # build agent-crawlable wiki (index.md + one article per community)
32
- /graphify <path> --obsidian --obsidian-dir ~/vaults/my-project # write vault to custom path (e.g. existing vault)
33
- /graphify add <url> # fetch URL, save to ./raw, update graph
34
- /graphify add <url> --author "Name" # tag who wrote it
35
- /graphify add <url> --contributor "Name" # tag who added it to the corpus
36
- /graphify query "<question>" # BFS traversal - broad context
37
- /graphify query "<question>" --dfs # DFS - trace a specific path
38
- /graphify query "<question>" --budget 1500 # cap answer at N tokens
39
- /graphify path "AuthModule" "Database" # shortest path between two concepts
40
- /graphify explain "SwinTransformer" # plain-language explanation of a node
41
- ```
42
-
43
- ## What graphify is for
44
-
45
- Drop any folder of code, docs, papers, images, or video into graphify and get a queryable knowledge graph. Persistent across sessions, honest audit trail (EXTRACTED/INFERRED/AMBIGUOUS), community detection surfaces cross-document connections you wouldn't think to ask about.
46
-
47
- ## What You Must Do When Invoked
48
-
49
- If the user invoked `/graphify --help` or `/graphify -h` (with no other arguments), print the contents of the `## Usage` section above verbatim and stop. Do not run any commands, do not detect files, do not default the path to `.`. Just print the Usage block and return.
50
-
51
- **Fast path — existing graph:** Before doing anything else, check whether `graphify-out/graph.json` exists. The expected location is `graphify-out/graph.json` relative to the **current working directory** (i.e. the project root where you are running commands). If it exists AND the user's request is a natural-language question about the codebase (e.g. "How does X work?", "What calls Y?", "Trace the data flow through Z") and NOT an explicit rebuild command (`--update`, `--cluster-only`, or a bare path/URL that implies fresh extraction): **skip Steps 1–5 entirely and jump straight to `## For /graphify query`.** Run `graphify query "<question>"` immediately. Do not run detect. Do not check corpus size. Do not ask the user to narrow. The graph is already built — use it.
52
-
53
- If no path was given, use `.` (current directory). Do not ask the user for a path.
54
-
55
- If the path argument starts with `https://github.com/` or `http://github.com/`, treat it as a GitHub URL - run Step 0 before anything else, then continue with the resolved local path.
56
-
57
- Follow these steps in order. Do not skip steps.
58
-
59
- ### Step 0 - GitHub repos and multi-path merge (only if a URL or several paths)
60
-
61
- Only when the path is one or more `https://github.com/...` URLs, or several local subfolders to merge. See `references/github-and-merge.md` for the clone, cross-repo merge, and monorepo flow, then continue with the resolved local path. A plain local path skips this step.
62
-
63
- ### Step 1 - Ensure graphify is installed
64
-
65
- ```bash
- # Detect the correct Python interpreter (handles uv tool, pipx, venv, system installs)
- PYTHON=""
- GRAPHIFY_BIN=$(which graphify 2>/dev/null)
- # 1. uv tool installs — most reliable on modern Mac/Linux
- if [ -z "$PYTHON" ] && command -v uv >/dev/null 2>&1; then
-     _UV_PY=$(uv tool run graphifyy python -c "import sys; print(sys.executable)" 2>/dev/null)
-     if [ -n "$_UV_PY" ]; then PYTHON="$_UV_PY"; fi
- fi
- # 2. Read shebang from graphify binary (pipx and direct pip installs)
- if [ -z "$PYTHON" ] && [ -n "$GRAPHIFY_BIN" ]; then
-     _SHEBANG=$(head -1 "$GRAPHIFY_BIN" | tr -d '#!')
-     case "$_SHEBANG" in
-         *[!a-zA-Z0-9/_.-]*) ;;
-         *) "$_SHEBANG" -c "import graphify" 2>/dev/null && PYTHON="$_SHEBANG" ;;
-     esac
- fi
- # 3. Fall back to python3
- if [ -z "$PYTHON" ]; then PYTHON="python3"; fi
- if ! "$PYTHON" -c "import graphify" 2>/dev/null; then
-     if command -v uv >/dev/null 2>&1; then
-         uv tool install --upgrade graphifyy -q 2>&1 | tail -3
-         _UV_PY=$(uv tool run graphifyy python -c "import sys; print(sys.executable)" 2>/dev/null)
-         if [ -n "$_UV_PY" ]; then PYTHON="$_UV_PY"; fi
-     else
-         "$PYTHON" -m pip install graphifyy -q 2>/dev/null \
-             || "$PYTHON" -m pip install graphifyy -q --break-system-packages 2>&1 | tail -3
-     fi
- fi
- # Write interpreter path for all subsequent steps (persists across invocations)
- mkdir -p graphify-out
- "$PYTHON" -c "import sys; open('graphify-out/.graphify_python', 'w', encoding='utf-8').write(sys.executable)"
- # Save scan root so `graphify update` (no args) knows where to look next time
- echo "$(cd INPUT_PATH && pwd)" > graphify-out/.graphify_root
- ```
100
-
101
- If the import succeeds, print nothing and move straight to Step 2.
102
-
103
- **In every subsequent bash block, replace `python3` with `$(cat graphify-out/.graphify_python)` to use the correct interpreter.**
104
-
105
- ### Step 2 - Detect files
106
-
107
- ```bash
108
- $(cat graphify-out/.graphify_python) -c "
109
- import json
110
- from graphify.detect import detect
111
- from pathlib import Path
112
- result = detect(Path('INPUT_PATH'))
113
- print(json.dumps(result, ensure_ascii=False))
114
- " > graphify-out/.graphify_detect.json
115
- ```
116
-
117
- Replace INPUT_PATH with the actual path the user provided. Do NOT cat or print the JSON - read it silently and present a clean summary instead:
118
-
119
- ```
120
- Corpus: X files · ~Y words
121
- code: N files (.py .ts .go ...)
122
- docs: N files (.md .txt ...)
123
- papers: N files (.pdf ...)
124
- images: N files
125
- video: N files (.mp4 .mp3 ...)
126
- ```
127
-
128
- Omit any category with 0 files from the summary.
129
-
130
- Then act on it:
131
- - If `total_files` is 0: stop with "No supported files found in [path]."
132
- - If `skipped_sensitive` is non-empty: mention file count skipped, not the file names.
133
- - If `total_words` > 2,000,000 OR `total_files` > 500: show the warning. Then compute the top 5 first-level subdirectories by file count:
134
-   - Read `scan_root` from the detect JSON (always an absolute path to the resolved INPUT_PATH).
-   - Concatenate all file lists across all types (`code`, `document`, `paper`, `image`, `video`).
-   - Filter out any path that starts with `scan_root + "/graphify-out/"` to exclude converted sidecars.
-   - For each file, strip the `scan_root` prefix and take the first path component. Files directly in `scan_root` with no subdirectory count as `(root)`.
-   - If all files are in `(root)` with no subdirectories, do not ask to narrow — no subfolders exist. Instead suggest `--no-cluster` to skip the expensive clustering step and proceed.
-   - Otherwise rank by count, show the top 5 with file counts, then ask which subfolder to run on. Wait for the user's answer before proceeding.
140
- - Otherwise: proceed directly to Step 2.5 if video files were detected, or Step 3 if not.
141
-
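The subdirectory ranking described in the removed Step 2 can be sketched in a few lines of Python. This is only an illustration, not code from the package: the function name `top_subdirs` is hypothetical, and it assumes the detect JSON stores absolute POSIX-style paths under `scan_root`, keyed by file type.

```python
from collections import Counter
from pathlib import PurePosixPath


def top_subdirs(scan_root, files_by_type, limit=5):
    """Rank first-level subdirectories of scan_root by file count (hypothetical helper)."""
    root = PurePosixPath(scan_root)
    sidecar_prefix = scan_root.rstrip("/") + "/graphify-out/"
    counts = Counter()
    for paths in files_by_type.values():
        for p in paths:
            if p.startswith(sidecar_prefix):
                continue  # converted sidecars are excluded from the ranking
            rel = PurePosixPath(p).relative_to(root)
            # Files sitting directly in scan_root are counted under "(root)"
            counts[rel.parts[0] if len(rel.parts) > 1 else "(root)"] += 1
    return counts.most_common(limit)


detect = {
    "scan_root": "/repo",
    "files": {
        "code": ["/repo/src/a.py", "/repo/src/b.py", "/repo/main.py",
                 "/repo/graphify-out/x.md"],
        "document": ["/repo/docs/r.md"],
    },
}
ranked = top_subdirs(detect["scan_root"], detect["files"])
```

If every key in the result is `(root)`, there are no subfolders to narrow to, which is the case where the step suggests `--no-cluster` instead of asking.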
142
- ### Step 2.5 - Video and audio (only if video files detected)
143
-
144
- Skip this step entirely if `detect` returned zero `video` files. When the corpus has video or audio, see `references/transcribe.md` to transcribe them to text first, then treat the transcripts as doc files in Step 3.
145
-
146
- ### Step 3 - Extract entities and relationships
147
-
148
- **Before starting:** note whether `--mode deep` was given. You must pass `DEEP_MODE=true` to every subagent in Step B2 if it was. Track this from the original invocation - do not lose it.
149
-
150
- This step has two parts: **structural extraction** (deterministic, free) and **semantic extraction** (LLM, costs tokens).
151
-
152
- **Before dispatching subagents:** check whether `GEMINI_API_KEY` or `GOOGLE_API_KEY` is set. If neither is set, print this one-liner to the user:
153
- > Tip: set `GEMINI_API_KEY` or `GOOGLE_API_KEY` to use Gemini for semantic extraction (`pip install 'graphifyy[gemini]'`).
154
-
155
- Print it once, then continue. If `GEMINI_API_KEY` or `GOOGLE_API_KEY` IS set, use `graphify.llm.extract_corpus_parallel(files, backend="gemini")` for semantic extraction instead of dispatching Claude subagents. The default Gemini model is `gemini-3-flash-preview`; set `GRAPHIFY_GEMINI_MODEL` or pass `--model` in headless CLI flows to override it.
156
-
157
- > **No other API keys are read.** If `GEMINI_API_KEY`/`GOOGLE_API_KEY` are unset, fall straight through to Claude Code subagent dispatch (Part B below) — the host session itself is the LLM. graphify does **not** read `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or any other provider key from the environment. If a host agent prompts the user for `ANTHROPIC_API_KEY` to run extraction, that prompt is a misread of this skill — ignore it and dispatch subagents as written.
158
-
159
- **Run Part A (AST) and Part B (semantic) in parallel. Dispatch all semantic subagents AND start AST extraction in the same message. Both can run simultaneously since they operate on different file types. Merge results in Part C as before.**
160
-
161
- Note: Parallelizing AST + semantic saves 5-15s on large corpora. AST is deterministic and fast; start it while subagents are processing docs/papers.
162
-
163
- #### Part A - Structural extraction for code files
164
-
165
- For any code files detected, run AST extraction in parallel with Part B subagents:
166
-
167
- ```bash
- $(cat graphify-out/.graphify_python) -c "
- import sys, json
- from graphify.extract import collect_files, extract
- from pathlib import Path
- import json
-
- code_files = []
- detect = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
- for f in detect.get('files', {}).get('code', []):
-     code_files.extend(collect_files(Path(f)) if Path(f).is_dir() else [Path(f)])
-
- if code_files:
-     result = extract(code_files, cache_root=Path('.'))
-     Path('graphify-out/.graphify_ast.json').write_text(json.dumps(result, indent=2, ensure_ascii=False), encoding=\"utf-8\")
-     print(f'AST: {len(result[\"nodes\"])} nodes, {len(result[\"edges\"])} edges')
- else:
-     Path('graphify-out/.graphify_ast.json').write_text(json.dumps({'nodes':[],'edges':[],'input_tokens':0,'output_tokens':0}, ensure_ascii=False), encoding=\"utf-8\")
-     print('No code files - skipping AST extraction')
- "
- ```
188
-
189
- #### Part B - Semantic extraction (parallel subagents)
190
-
191
- **Fast path:** If detection found zero docs, papers, and images (code-only corpus), skip Part B entirely and go straight to Part C. AST handles code - there is nothing for semantic subagents to do.
192
-
193
- **MANDATORY: You MUST use the Agent tool here. Reading files yourself one-by-one is forbidden - it is 5-10x slower. If you do not use the Agent tool you are doing this wrong.**
194
-
195
- Before dispatching subagents, print a timing estimate:
196
- - Load `total_words` and file counts from `graphify-out/.graphify_detect.json`
197
- - Estimate agents needed: `ceil(uncached_non_code_files / 22)` (chunk size is 20-25)
198
- - Estimate time: ~45s per agent batch (they run in parallel, so total ≈ 45s × ceil(agents/parallel_limit))
199
- - Print: "Semantic extraction: ~N files → X agents, estimated ~Ys"
200
-
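The timing estimate above is plain arithmetic and can be sketched as follows. The helper name `estimate_semantic_run` is hypothetical, and `parallel_limit` is an assumed input: the removed text names it but does not say what value the host uses.

```python
import math


def estimate_semantic_run(uncached_non_code_files, parallel_limit,
                          files_per_agent=22, seconds_per_batch=45):
    """Return (agents, estimated_seconds) using the rule of thumb from the text."""
    agents = math.ceil(uncached_non_code_files / files_per_agent)
    # Agents run in parallel, so time scales with the number of batches, not agents
    batches = math.ceil(agents / parallel_limit)
    return agents, seconds_per_batch * batches


files = 100
agents, seconds = estimate_semantic_run(files, parallel_limit=4)  # 4 is an assumed limit
message = f"Semantic extraction: ~{files} files → {agents} agents, estimated ~{seconds}s"
print(message)
```

With 100 uncached files and an assumed limit of 4 parallel agents, this gives 5 agents in 2 batches, so roughly 90 seconds.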
201
- **Step B0 - Check extraction cache first**
202
-
203
- Before dispatching any subagents, check which files already have cached extraction results:
204
-
205
- ```bash
- $(cat graphify-out/.graphify_python) -c "
- import json
- from graphify.cache import check_semantic_cache
- from pathlib import Path
-
- detect = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
- all_files = [f for files in detect['files'].values() for f in files]
-
- cached_nodes, cached_edges, cached_hyperedges, uncached = check_semantic_cache(all_files)
-
- if cached_nodes or cached_edges or cached_hyperedges:
-     Path('graphify-out/.graphify_cached.json').write_text(json.dumps({'nodes': cached_nodes, 'edges': cached_edges, 'hyperedges': cached_hyperedges}, ensure_ascii=False), encoding=\"utf-8\")
- Path('graphify-out/.graphify_uncached.txt').write_text('\n'.join(uncached), encoding=\"utf-8\")
- print(f'Cache: {len(all_files)-len(uncached)} files hit, {len(uncached)} files need extraction')
- "
- ```
222
-
223
- Only dispatch subagents for files listed in `graphify-out/.graphify_uncached.txt`. If all files are cached, skip to Part C directly.
224
-
225
- **Step B1 - Split into chunks**
226
-
227
- Load files from `graphify-out/.graphify_uncached.txt`. Split into chunks of 20-25 files each. Each image gets its own chunk (vision needs separate context). When splitting, group files from the same directory together so related artifacts land in the same chunk and cross-file relationships are more likely to be extracted.
228
-
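One way to read the chunking rule in Step B1 is sketched below. It is an assumption-laden illustration rather than the package's logic: `split_into_chunks` and `IMAGE_SUFFIXES` are hypothetical names, the suffix list is a guess at what counts as an image, and the fixed size of 22 is just a value inside the stated 20-25 range.

```python
from pathlib import PurePosixPath

# Assumed set of image extensions; the removed text does not enumerate them
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def split_into_chunks(files, chunk_size=22):
    """Group non-image files by directory into fixed-size chunks; one chunk per image."""
    images, others = [], []
    for f in files:
        is_image = PurePosixPath(f).suffix.lower() in IMAGE_SUFFIXES
        (images if is_image else others).append(f)
    # Sorting by parent directory keeps files from the same folder adjacent,
    # so they tend to land in the same chunk
    others.sort(key=lambda f: (str(PurePosixPath(f).parent), f))
    chunks = [others[i:i + chunk_size] for i in range(0, len(others), chunk_size)]
    chunks.extend([img] for img in images)  # each image gets its own chunk
    return chunks


files = [f"docs/{i}.md" for i in range(30)] + ["img/a.png", "notes/x.txt"]
chunks = split_into_chunks(files)
```

The last non-image chunk can end up smaller than 20 files; a fuller version might rebalance the tail or avoid splitting a directory across two chunks.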
229
- **Step B2 - Dispatch ALL subagents in a single message**
230
-
231
- Call the Agent tool multiple times IN THE SAME RESPONSE - one call per chunk. This is the only way they run in parallel. If you make one Agent call, wait, then make another, you are doing it sequentially and defeating the purpose.
232
-
233
- **IMPORTANT - subagent type:** Always use `subagent_type="general-purpose"`. Do NOT use `Explore` - it is read-only and cannot write chunk files to disk, which silently drops extraction results. General-purpose has Write and Bash access which the subagent needs.
234
-
235
- Concrete example for 3 chunks:
236
- ```
237
- [Agent tool call 1: files 1-15, subagent_type="general-purpose"]
238
- [Agent tool call 2: files 16-30, subagent_type="general-purpose"]
239
- [Agent tool call 3: files 31-45, subagent_type="general-purpose"]
240
- ```
241
- All three in one message. Not three separate messages.
242
-
243
- Each subagent receives this exact prompt (substitute FILE_LIST, CHUNK_NUM, TOTAL_CHUNKS, DEEP_MODE, and CHUNK_PATH).
244
-
245
- CHUNK_PATH must be an **absolute** path — derive it before dispatching:
246
- ```bash
247
- PROJECT_ROOT=$(cat graphify-out/.graphify_root)
248
- # Then for chunk N: CHUNK_PATH="${PROJECT_ROOT}/graphify-out/.graphify_chunk_0N.json"
249
- ```
250
-
251
- Subagent prompt template:
252
-
253
- See `references/extraction-spec.md` for the exact subagent prompt (JSON schema, node-ID rules, confidence rubric, frontmatter, hyperedge, and vision rules). Load it only here, only when at least one chunk holds a doc, paper, or image; a pure-code corpus has skipped Part B and never reads it. Pass each subagent that prompt verbatim with FILE_LIST, CHUNK_NUM, TOTAL_CHUNKS, DEEP_MODE, and CHUNK_PATH substituted, and have it write the result to CHUNK_PATH.
254
-
255
- **Step B3 - Collect, cache, and merge**
256
-
257
- Wait for all subagents. For each result:
258
- - Check that `graphify-out/.graphify_chunk_NN.json` exists on disk — this is the success signal
259
- - If the file exists and contains valid JSON with `nodes` and `edges`, include it and save to cache
260
- - If the file is missing, the subagent was likely dispatched as read-only (Explore type) — print a warning: "chunk N missing from disk — subagent may have been read-only. Re-run with general-purpose agent." Do not silently skip.
261
- - If a subagent failed or returned invalid JSON, print a warning and skip that chunk - do not abort
262
-
263
- If more than half the chunks failed or are missing, stop and tell the user to re-run and ensure `subagent_type="general-purpose"` is used.
264
-
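The collection rules in Step B3 amount to a per-chunk validity check plus a majority threshold. The sketch below assumes chunk files are numbered from 1 with two-digit zero padding, which matches the `.graphify_chunk_0N.json` pattern shown earlier; the function name `collect_chunks` and its return shape are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def collect_chunks(out_dir, expected):
    """Return (valid_payloads, warnings, should_stop) for chunks 1..expected."""
    valid, warnings = [], []
    for n in range(1, expected + 1):
        path = Path(out_dir) / f".graphify_chunk_{n:02d}.json"
        if not path.exists():
            warnings.append(f"chunk {n} missing from disk")
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            warnings.append(f"chunk {n} returned invalid JSON")
            continue
        if not isinstance(data, dict) or "nodes" not in data or "edges" not in data:
            warnings.append(f"chunk {n} lacks nodes/edges")
            continue
        valid.append(data)
    # Stop only when more than half of the expected chunks failed or are missing
    return valid, warnings, len(warnings) > expected / 2


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / ".graphify_chunk_01.json").write_text(
        '{"nodes": [], "edges": []}', encoding="utf-8")
    (Path(tmp) / ".graphify_chunk_02.json").write_text("not json", encoding="utf-8")
    valid, warnings, should_stop = collect_chunks(tmp, expected=3)
```

Here chunk 1 is accepted, chunk 2 is skipped with a warning, and chunk 3 is reported missing, so two of three failed and the run should stop rather than merge.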
265
- Merge all chunk files into `.graphify_semantic_new.json`. **After each Agent call completes, read the real token counts from the Agent tool result's `usage` field and write them back into the chunk JSON before merging** — the chunk JSON itself always has placeholder zeros. Then run:
266
- ```bash
- $(cat graphify-out/.graphify_python) -c "
- import json, glob
- from pathlib import Path
-
- chunks = sorted(glob.glob('graphify-out/.graphify_chunk_*.json'))
- all_nodes, all_edges, all_hyperedges = [], [], []
- total_in, total_out = 0, 0
- for c in chunks:
-     d = json.loads(Path(c).read_text(encoding=\"utf-8\"))
-     all_nodes += d.get('nodes', [])
-     all_edges += d.get('edges', [])
-     all_hyperedges += d.get('hyperedges', [])
-     total_in += d.get('input_tokens', 0)
-     total_out += d.get('output_tokens', 0)
- Path('graphify-out/.graphify_semantic_new.json').write_text(json.dumps({
-     'nodes': all_nodes, 'edges': all_edges, 'hyperedges': all_hyperedges,
-     'input_tokens': total_in, 'output_tokens': total_out,
- }, indent=2, ensure_ascii=False), encoding=\"utf-8\")
- print(f'Merged {len(chunks)} chunks: {total_in:,} in / {total_out:,} out tokens')
- "
- ```
288
-
289
- Save new results to cache:
290
- ```bash
291
- $(cat graphify-out/.graphify_python) -c "
292
- import json
293
- from graphify.cache import save_semantic_cache
294
- from pathlib import Path
295
-
296
- new = json.loads(Path('graphify-out/.graphify_semantic_new.json').read_text(encoding=\"utf-8\")) if Path('graphify-out/.graphify_semantic_new.json').exists() else {'nodes':[],'edges':[],'hyperedges':[]}
297
- saved = save_semantic_cache(new.get('nodes', []), new.get('edges', []), new.get('hyperedges', []))
298
- print(f'Cached {saved} files')
299
- "
300
- ```
301
-
302
- Merge cached + new results into `graphify-out/.graphify_semantic.json`:
303
- ```bash
- $(cat graphify-out/.graphify_python) -c "
- import json
- from pathlib import Path
-
- cached = json.loads(Path('graphify-out/.graphify_cached.json').read_text(encoding=\"utf-8\")) if Path('graphify-out/.graphify_cached.json').exists() else {'nodes':[],'edges':[],'hyperedges':[]}
- new = json.loads(Path('graphify-out/.graphify_semantic_new.json').read_text(encoding=\"utf-8\")) if Path('graphify-out/.graphify_semantic_new.json').exists() else {'nodes':[],'edges':[],'hyperedges':[]}
-
- all_nodes = cached['nodes'] + new.get('nodes', [])
- all_edges = cached['edges'] + new.get('edges', [])
- all_hyperedges = cached.get('hyperedges', []) + new.get('hyperedges', [])
- seen = set()
- deduped = []
- for n in all_nodes:
-     if n['id'] not in seen:
-         seen.add(n['id'])
-         deduped.append(n)
-
- merged = {
-     'nodes': deduped,
-     'edges': all_edges,
-     'hyperedges': all_hyperedges,
-     'input_tokens': new.get('input_tokens', 0),
-     'output_tokens': new.get('output_tokens', 0),
- }
- Path('graphify-out/.graphify_semantic.json').write_text(json.dumps(merged, indent=2, ensure_ascii=False), encoding=\"utf-8\")
- print(f'Extraction complete - {len(deduped)} nodes, {len(all_edges)} edges ({len(cached[\"nodes\"])} from cache, {len(new.get(\"nodes\",[]))} new)')
- "
- ```
332
- Clean up temp files: `rm -f graphify-out/.graphify_cached.json graphify-out/.graphify_uncached.txt graphify-out/.graphify_semantic_new.json`
-
- #### Part C - Merge AST + semantic into final extraction
-
- ```bash
- $(cat graphify-out/.graphify_python) -c "
- import sys, json
- from pathlib import Path
-
- ast = json.loads(Path('graphify-out/.graphify_ast.json').read_text(encoding=\"utf-8\"))
- sem = json.loads(Path('graphify-out/.graphify_semantic.json').read_text(encoding=\"utf-8\"))
-
- # Merge: AST nodes first, semantic nodes deduplicated by id
- seen = {n['id'] for n in ast['nodes']}
- merged_nodes = list(ast['nodes'])
- for n in sem['nodes']:
-     if n['id'] not in seen:
-         merged_nodes.append(n)
-         seen.add(n['id'])
-
- merged_edges = ast['edges'] + sem['edges']
- merged_hyperedges = sem.get('hyperedges', [])
- merged = {
-     'nodes': merged_nodes,
-     'edges': merged_edges,
-     'hyperedges': merged_hyperedges,
-     'input_tokens': sem.get('input_tokens', 0),
-     'output_tokens': sem.get('output_tokens', 0),
- }
- Path('graphify-out/.graphify_extract.json').write_text(json.dumps(merged, indent=2, ensure_ascii=False), encoding=\"utf-8\")
- total = len(merged_nodes)
- edges = len(merged_edges)
- print(f'Merged: {total} nodes, {edges} edges ({len(ast[\"nodes\"])} AST + {len(sem[\"nodes\"])} semantic)')
- "
- ```
-
- ### Step 4 - Build graph, cluster, analyze, generate outputs
-
- **Before starting:** note whether `--directed` was given. If so, pass `directed=True` to `build_from_json()` in the code block below. This builds a `DiGraph` that preserves edge direction (source→target) instead of the default undirected `Graph`.
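
As a rough illustration of what the flag changes, here is a plain-Python sketch that does not use graphify's `build_from_json()` or NetworkX. The `build_adjacency` helper and the sample edges are hypothetical. An undirected build stores each edge in both directions, while a directed build keeps only source→target.

```python
from collections import defaultdict

def build_adjacency(edges, directed=False):
    """Return {node: set(neighbours)} for a list of (source, target) pairs."""
    adj = defaultdict(set)
    for source, target in edges:
        adj[source].add(target)
        if not directed:
            # Undirected: the edge is also reachable from the target side.
            adj[target].add(source)
    return dict(adj)

# Hypothetical edges, e.g. "parse calls tokenize".
edges = [("parse", "tokenize"), ("compile", "parse")]
undirected = build_adjacency(edges)
directed = build_adjacency(edges, directed=True)

print("parse" in undirected["tokenize"])           # True
print("parse" in directed.get("tokenize", set()))  # False
```

With a directed build, a question such as "what does `tokenize` depend on" and "what depends on `tokenize`" give different answers, which is the information the undirected default discards.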
-
- ```bash
- mkdir -p graphify-out
- $(cat graphify-out/.graphify_python) -c "
- import sys, json
- from graphify.build import build_from_json
- from graphify.cluster import cluster, score_all
- from graphify.analyze import god_nodes, surprising_connections, suggest_questions
- from graphify.report import generate
- from graphify.export import to_json
- from pathlib import Path
-
- extraction = json.loads(Path('graphify-out/.graphify_extract.json').read_text(encoding=\"utf-8\"))
- detection = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
-
- G = build_from_json(extraction)
- communities = cluster(G)
- cohesion = score_all(G, communities)
- tokens = {'input': extraction.get('input_tokens', 0), 'output': extraction.get('output_tokens', 0)}
- gods = god_nodes(G)
- surprises = surprising_connections(G, communities)
- labels = {cid: 'Community ' + str(cid) for cid in communities}
- # Placeholder questions - regenerated with real labels in Step 5
- questions = suggest_questions(G, communities, labels)
-
- report = generate(G, communities, cohesion, labels, gods, surprises, detection, tokens, '.', suggested_questions=questions)
- Path('graphify-out/GRAPH_REPORT.md').write_text(report, encoding=\"utf-8\")
- to_json(G, communities, 'graphify-out/graph.json')
-
- analysis = {
-     'communities': {str(k): v for k, v in communities.items()},
-     'cohesion': {str(k): v for k, v in cohesion.items()},
-     'gods': gods,
-     'surprises': surprises,
-     'questions': questions,
- }
- Path('graphify-out/.graphify_analysis.json').write_text(json.dumps(analysis, indent=2, ensure_ascii=False), encoding=\"utf-8\")
- if G.number_of_nodes() == 0:
-     print('ERROR: Graph is empty - extraction produced no nodes.')
-     print('Possible causes: all files were skipped, binary-only corpus, or extraction failed.')
-     raise SystemExit(1)
- print(f'Graph: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges, {len(communities)} communities')
- "
- ```
-
- If this step prints `ERROR: Graph is empty`, stop and tell the user what happened - do not proceed to labeling or visualization.
-
- Replace INPUT_PATH with the actual path.
-
- ### Step 5 - Label communities
-
- Read `graphify-out/.graphify_analysis.json`. For each community key, look at its node labels and write a 2-5 word plain-language name (e.g. "Attention Mechanism", "Training Pipeline", "Data Loading").
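
Before regenerating the report, it can help to check that every community received a name in the 2-5 word range. The following is a minimal sketch under stated assumptions: `validate_labels` is a hypothetical helper, not part of graphify, and the sample communities are made up. Keys are shown as integers to match the `LABELS_DICT` shape used in the next block, whereas `.graphify_analysis.json` stores them as strings.

```python
def validate_labels(communities, labels):
    """Return a list of problems; an empty list means every community has a 2-5 word label."""
    problems = []
    for cid in communities:
        name = labels.get(cid, "")
        words = len(name.split())
        if not 2 <= words <= 5:
            problems.append(f"community {cid}: {name!r} has {words} words")
    return problems

# Hypothetical community membership and a first attempt at labels.
communities = {0: ["attn_forward", "softmax"], 1: ["train_loop", "optimizer_step"]}
labels = {0: "Attention Mechanism", 1: "Training"}

print(validate_labels(communities, labels))
# ["community 1: 'Training' has 1 words"]
```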
-
- Then regenerate the report and save the labels for the visualizer:
-
- ```bash
- $(cat graphify-out/.graphify_python) -c "
- import sys, json
- from graphify.build import build_from_json
- from graphify.cluster import score_all
- from graphify.analyze import god_nodes, surprising_connections, suggest_questions
- from graphify.report import generate
- from pathlib import Path
-
- extraction = json.loads(Path('graphify-out/.graphify_extract.json').read_text(encoding=\"utf-8\"))
- detection = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
- analysis = json.loads(Path('graphify-out/.graphify_analysis.json').read_text(encoding=\"utf-8\"))
-
- G = build_from_json(extraction)
- communities = {int(k): v for k, v in analysis['communities'].items()}
- cohesion = {int(k): v for k, v in analysis['cohesion'].items()}
- tokens = {'input': extraction.get('input_tokens', 0), 'output': extraction.get('output_tokens', 0)}
-
- # LABELS - replace these with the names you chose above
- labels = LABELS_DICT
-
- # Regenerate questions with real community labels (labels affect question phrasing)
- questions = suggest_questions(G, communities, labels)
-
- report = generate(G, communities, cohesion, labels, analysis['gods'], analysis['surprises'], detection, tokens, '.', suggested_questions=questions)
- Path('graphify-out/GRAPH_REPORT.md').write_text(report, encoding=\"utf-8\")
- Path('graphify-out/.graphify_labels.json').write_text(json.dumps({str(k): v for k, v in labels.items()}, ensure_ascii=False), encoding=\"utf-8\")
- print('Report updated with community labels')
- "
- ```
-
- Replace `LABELS_DICT` with the actual dict you constructed (e.g. `{0: "Attention Mechanism", 1: "Training Pipeline"}`).
- Replace INPUT_PATH with the actual path.
-
- ### Step 6 - Generate Obsidian vault (opt-in) + HTML
-
- **Generate HTML always** (unless `--no-viz`). **Obsidian vault only if `--obsidian` was explicitly given** - skip it otherwise, it generates one file per node.
-
- If `--obsidian` was given:
-
- - If `--obsidian-dir <path>` was also given, pass it via `--dir`. Otherwise defaults to `graphify-out/obsidian`.
-
- ```bash
- graphify export obsidian
- # or with custom dir: graphify export obsidian --dir ~/vaults/my-project
- ```
-
- Generate the HTML graph (always, unless `--no-viz`):
-
- ```bash
- graphify export html # auto-aggregates to community view if graph > 5000 nodes
- # or: graphify export html --no-viz
- ```
-
- ### Steps 6b-8 - Wiki, Neo4j, SVG, GraphML, MCP, benchmark (only on their flags)
-
- These run only when their flag is present (`--wiki`, `--neo4j`/`--neo4j-push`, `--svg`, `--graphml`, `--mcp`) or, for the token-reduction benchmark, when `total_words` exceeds 5,000. A default run with no export flags skips all of them. See `references/exports.md` for each one. Run any `--wiki` export before Step 9 cleanup so `.graphify_labels.json` is still available.
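
The gating rule above can be sketched as a small decision function. This is only an illustration: `optional_steps` and the step names it returns are hypothetical, and the real export commands are the ones described in `references/exports.md`.

```python
def optional_steps(flags, total_words):
    """Return the optional steps to run, given the CLI flags and the corpus word count."""
    steps = []
    if "--wiki" in flags:
        steps.append("wiki")
    if "--neo4j" in flags or "--neo4j-push" in flags:
        steps.append("neo4j")
    if "--svg" in flags:
        steps.append("svg")
    if "--graphml" in flags:
        steps.append("graphml")
    if "--mcp" in flags:
        steps.append("mcp")
    # The benchmark is gated on corpus size rather than on a flag.
    if total_words > 5000:
        steps.append("benchmark")
    return steps

print(optional_steps(set(), 1200))               # []
print(optional_steps({"--wiki", "--svg"}, 8000)) # ['wiki', 'svg', 'benchmark']
```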
-
- ---
-
- ### Step 9 - Save manifest, update cost tracker, clean up, and report
-
- ```bash
- $(cat graphify-out/.graphify_python) -c "
- import json
- from pathlib import Path
- from datetime import datetime, timezone
- from graphify.detect import save_manifest
-
- # Save manifest for --update
- detect = json.loads(Path('graphify-out/.graphify_detect.json').read_text(encoding=\"utf-8\"))
- # In --update mode, 'all_files' carries the full corpus; 'files' is the changed
- # subset. Full-rebuild mode populates only 'files', so the fallback handles that.
- save_manifest(detect.get('all_files') or detect['files'])
-
- # Update cumulative cost tracker
- extract = json.loads(Path('graphify-out/.graphify_extract.json').read_text(encoding=\"utf-8\"))
- input_tok = extract.get('input_tokens', 0)
- output_tok = extract.get('output_tokens', 0)
-
- cost_path = Path('graphify-out/cost.json')
- if cost_path.exists():
-     cost = json.loads(cost_path.read_text(encoding=\"utf-8\"))
- else:
-     cost = {'runs': [], 'total_input_tokens': 0, 'total_output_tokens': 0}
-
- cost['runs'].append({
-     'date': datetime.now(timezone.utc).isoformat(),
-     'input_tokens': input_tok,
-     'output_tokens': output_tok,
-     'files': detect.get('total_files', 0),
- })
- cost['total_input_tokens'] += input_tok
- cost['total_output_tokens'] += output_tok
- cost_path.write_text(json.dumps(cost, indent=2, ensure_ascii=False), encoding=\"utf-8\")
-
- print(f'This run: {input_tok:,} input tokens, {output_tok:,} output tokens')
- print(f'All time: {cost[\"total_input_tokens\"]:,} input, {cost[\"total_output_tokens\"]:,} output ({len(cost[\"runs\"])} runs)')
- "
- rm -f graphify-out/.graphify_detect.json graphify-out/.graphify_extract.json graphify-out/.graphify_ast.json graphify-out/.graphify_semantic.json graphify-out/.graphify_analysis.json
- find graphify-out -maxdepth 1 -name '.graphify_chunk_*.json' -delete 2>/dev/null
- rm -f graphify-out/.needs_update 2>/dev/null || true
- ```
-
- Tell the user (omit the obsidian line unless --obsidian was given):
- ```
- Graph complete. Outputs in PATH_TO_DIR/graphify-out/
-
- graph.html - interactive graph, open in browser
- GRAPH_REPORT.md - audit report
- graph.json - raw graph data
- obsidian/ - Obsidian vault (only if --obsidian was given)
- ```
-
- If graphify saved you time, consider supporting it: https://github.com/sponsors/safishamsi
-
- Replace PATH_TO_DIR with the actual absolute path of the directory that was processed.
-
- Then paste these sections from GRAPH_REPORT.md directly into the chat:
- - God Nodes
- - Surprising Connections
- - Suggested Questions
-
- Do NOT paste the full report - just those three sections. Keep it concise.
-
- Then immediately offer to explore. Pick the single most interesting suggested question from the report - the one that crosses the most community boundaries or has the most surprising bridge node - and ask:
-
- > "The most interesting question this graph can answer: **[question]**. Want me to trace it?"
-
- If the user says yes, run `/graphify query "[question]"` on the graph and walk them through the answer using the graph structure - which nodes connect, which community boundaries get crossed, what the path reveals. Keep going as long as they want to explore. Each answer should end with a natural follow-up ("this connects to X - want to go deeper?") so the session feels like navigation, not a one-shot report.
-
- The graph is the map. Your job after the pipeline is to be the guide.
-
- ---
-
- ## Interpreter guard for subcommands
-
- Before running any subcommand below (`--update`, `--cluster-only`, `query`, `path`, `explain`, `add`), check that `.graphify_python` exists. If it's missing (e.g. user deleted `graphify-out/`), re-resolve the interpreter first:
-
- ```bash
- if [ ! -f graphify-out/.graphify_python ]; then
-     GRAPHIFY_BIN=$(which graphify 2>/dev/null)
-     if [ -n "$GRAPHIFY_BIN" ]; then
-         PYTHON=$(head -1 "$GRAPHIFY_BIN" | tr -d '#!')
-         case "$PYTHON" in *[!a-zA-Z0-9/_.-]*) PYTHON="python3" ;; esac
-     else
-         PYTHON="python3"
-     fi
-     mkdir -p graphify-out
-     "$PYTHON" -c "import sys; open('graphify-out/.graphify_python', 'w', encoding='utf-8').write(sys.executable)"
- fi
- ```
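
To make the `case` pattern in the guard easier to follow, here is a hedged Python sketch of the same allow-list check. `interpreter_from_shebang` is a hypothetical name used only for illustration. One deliberate difference is noted in the comments: an empty candidate also falls back here, which the shell version does not do.

```python
import re

# Same allow-list as the shell pattern: letters, digits, '/', '_', '.', '-'.
SAFE_PATH = re.compile(r"[a-zA-Z0-9/_.-]+")

def interpreter_from_shebang(first_line, fallback="python3"):
    """Derive an interpreter path from a script's first line, or return the fallback."""
    # Mirrors `tr -d '#!'`: drop every '#' and '!' character, then the trailing newline.
    candidate = first_line.replace("#", "").replace("!", "").rstrip("\n")
    # Any character outside the allow-list (a space, ';', '$', ...) triggers the fallback.
    # Unlike the shell guard, an empty candidate also falls back.
    if SAFE_PATH.fullmatch(candidate):
        return candidate
    return fallback

print(interpreter_from_shebang("#!/usr/local/bin/python3.12\n"))  # /usr/local/bin/python3.12
print(interpreter_from_shebang("#!/usr/bin/env python3\n"))       # python3
```

Note that an `env`-style shebang contains a space, so it is rejected by the allow-list and resolves to the plain `python3` fallback.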
-
- ## For --update and --cluster-only
-
- Both are non-default subcommands. `--update` re-extracts only new or changed files; `--cluster-only` reruns clustering on the existing graph. See `references/update.md` for both flows.
-
- ---
-
- ## For /graphify query
-
- When `graphify-out/graph.json` already exists and the user asks a question about the corpus, answer from the graph rather than rebuilding it:
-
- ```bash
- graphify query "<question>"
- ```
-
- If the `graphify query` CLI is unavailable, fall back to an inline NetworkX traversal of `graphify-out/graph.json`. Answer using only what the graph output contains, and quote `source_location` when citing a specific fact. For the BFS/DFS traversal modes, the `--budget` cap, the NetworkX fallback, `save-result` feedback, and the `/graphify path` and `/graphify explain` flows, see `references/query.md`.
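
The general shape of a capped breadth-first traversal can be sketched without any dependencies. This is not the fallback from `references/query.md`: it swaps NetworkX for plain dictionaries, and it assumes edges are stored under an `edges` key with `source` and `target` fields, which may not match the real `graph.json` layout. The `max_nodes` parameter is a hypothetical stand-in and does not claim to reproduce the semantics of `--budget`.

```python
from collections import deque

def bfs_neighborhood(graph, start, max_nodes=10):
    """Return up to `max_nodes` node ids reachable from `start`, in BFS order."""
    # Build an undirected adjacency map from the (assumed) edge records.
    adj = {}
    for edge in graph.get("edges", []):
        adj.setdefault(edge["source"], []).append(edge["target"])
        adj.setdefault(edge["target"], []).append(edge["source"])

    seen, order, queue = {start}, [], deque([start])
    while queue and len(order) < max_nodes:
        node = queue.popleft()
        order.append(node)
        for neighbour in adj.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order

# Hypothetical graph data in the assumed shape.
sample = {
    "nodes": [{"id": "a"}, {"id": "b"}, {"id": "c"}, {"id": "d"}],
    "edges": [
        {"source": "a", "target": "b"},
        {"source": "b", "target": "c"},
        {"source": "a", "target": "d"},
    ],
}
print(bfs_neighborhood(sample, "a", max_nodes=3))  # ['a', 'b', 'd']
```

Capping the number of visited nodes keeps the answer focused on the immediate neighbourhood of the starting node instead of dumping the whole graph into the conversation.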
-
- ---
-
- ## For /graphify add and --watch
-
- Neither is part of the default build. When the user runs `/graphify add <url>` to fetch a URL into the corpus, or passes `--watch` to auto-rebuild on file changes, see `references/add-watch.md`.
-
- ---
-
- ## For the commit hook and native CLAUDE.md integration
-
- When the user asks to install the post-commit auto-rebuild hook or wire graphify into a project's CLAUDE.md, see `references/hooks.md`.
-
- ---
-
- ## Honesty Rules
-
- - Never invent an edge. If unsure, use AMBIGUOUS.
- - Never skip the corpus check warning.
- - Always show token cost in the report.
- - Never hide cohesion scores behind symbols - show the raw number.
- - Never run HTML viz on a graph with more than 5,000 nodes without warning the user.