@plusplusoneplusplus/deep-wiki 1.0.0

package/README.md ADDED
@@ -0,0 +1,220 @@
1
+ # Deep Wiki Generator
2
+
3
+ A standalone CLI tool that auto-generates a comprehensive, browsable wiki for any codebase. Uses the Copilot SDK with MCP tools (grep, glob, view) to discover, analyze, and document repository structure, modules, and dependencies.
4
+
5
+ All code stays local — nothing leaves your machine.
6
+
7
+ ## Installation
8
+
9
+ ```bash
10
+ # From the monorepo root
11
+ npm install
12
+
13
+ # Build
14
+ cd packages/deep-wiki
15
+ npm run build
16
+ ```
17
+
18
+ ## Usage
19
+
20
+ ### Generate Full Wiki
21
+
22
+ ```bash
23
+ # Basic — runs all 5 phases (discover → consolidate → analyze → write → website)
24
+ deep-wiki generate /path/to/repo --output ./wiki
25
+
26
+ # With options
27
+ deep-wiki generate /path/to/repo \
28
+ --output ./wiki \
29
+ --model claude-sonnet \
30
+ --concurrency 5 \
31
+ --depth normal \
32
+ --focus "src/" \
33
+ --timeout 300 \
34
+ --verbose
35
+
36
+ # Resume from Phase 3 (reuse cached discovery + consolidation)
37
+ deep-wiki generate /path/to/repo --output ./wiki --phase 3
38
+
39
+ # Resume from Phase 4 (reuse cached discovery + consolidation + analysis)
40
+ deep-wiki generate /path/to/repo --output ./wiki --phase 4
41
+
42
+ # Force full regeneration (ignore all caches)
43
+ deep-wiki generate /path/to/repo --output ./wiki --force
44
+ ```
45
+
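The resume behavior shown in the commands above can be sketched as a small helper. This is a minimal illustration, not the package's code: the phase names come from the comment in the basic example, while `phasesToRun` and its `skipWebsite` parameter are hypothetical names chosen here.

```typescript
// Phase names as listed in this README, in execution order.
const PHASES = ["discover", "consolidate", "analyze", "write", "website"] as const;
type PhaseName = (typeof PHASES)[number];

// Hypothetical helper: returns the phases that still execute when resuming
// from `startPhase` (1-based). Earlier phases are assumed to load from cache.
function phasesToRun(startPhase: number, skipWebsite = false): PhaseName[] {
  if (!Number.isInteger(startPhase) || startPhase < 1 || startPhase > 4) {
    throw new RangeError(`--phase must be 1, 2, 3, or 4 (got ${startPhase})`);
  }
  return PHASES.slice(startPhase - 1).filter(
    (phase) => !(skipWebsite && phase === "website"),
  );
}
```

Under these assumptions, `phasesToRun(3)` covers analysis, writing, and the website, which matches the "reuse cached discovery + consolidation" comment above.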
46
+ ### Discover Module Graph Only (Phase 1)
47
+
48
+ ```bash
49
+ deep-wiki discover /path/to/repo --output ./wiki --verbose
50
+ ```
51
+
52
+ ### Generate Options
53
+
54
+ | Option | Description | Default |
55
+ |--------|-------------|---------|
56
+ | `-o, --output <path>` | Output directory for wiki | `./wiki` |
57
+ | `-m, --model <model>` | AI model to use | SDK default |
58
+ | `-c, --concurrency <n>` | Parallel AI sessions | `5` |
59
+ | `-t, --timeout <seconds>` | Timeout per phase | `300` (5 min) |
60
+ | `--depth <level>` | Article detail: `shallow`, `normal`, `deep` | `normal` |
61
+ | `--focus <path>` | Focus on a specific subtree | Full repo |
62
+ | `--phase <n>` | Resume from phase N (1, 2, 3, or 4) | `1` |
63
+ | `--force` | Ignore all caches, regenerate everything | `false` |
64
+ | `-v, --verbose` | Verbose logging | `false` |
65
+ | `--no-color` | Disable colored output | Colors on |
66
+
67
+ ### Output Structure
68
+
69
+ **Small repos** (flat layout):
70
+ ```
71
+ wiki/
72
+ ├── index.md # Project overview + categorized table of contents
73
+ ├── architecture.md # High-level architecture with Mermaid diagrams
74
+ ├── getting-started.md # Prerequisites, setup, build, run instructions
75
+ ├── module-graph.json # Raw Phase 1 discovery output
76
+ └── modules/
77
+     ├── auth.md # Per-module article
78
+     ├── database.md
79
+     └── ...
80
+ ```
81
+
82
+ **Large repos** (3-level hierarchical layout — automatic for repos with 3000+ files):
83
+ ```
84
+ wiki/
85
+ ├── index.md # Project-level index (links to areas)
86
+ ├── architecture.md # Project-level architecture
87
+ ├── getting-started.md # Project-level getting started
88
+ ├── module-graph.json # Raw Phase 1 discovery output
89
+ ├── areas/
90
+ │   ├── packages-core/
91
+ │   │   ├── index.md # Area index (links to its modules)
92
+ │   │   ├── architecture.md # Area-level architecture diagram
93
+ │   │   └── modules/
94
+ │   │       ├── auth.md
95
+ │   │       ├── database.md
96
+ │   │       └── ...
97
+ │   ├── packages-api/
98
+ │   │   ├── index.md
99
+ │   │   ├── architecture.md
100
+ │   │   └── modules/
101
+ │   │       ├── routes.md
102
+ │   │       └── ...
103
+ │   └── ...
104
+ └── modules/ # (empty — modules live under their area)
105
+ ```
106
+
107
+ The hierarchical layout activates automatically when Phase 1 discovers top-level areas, which happens for repos with 3000+ files. No additional CLI flags are needed.
108
+
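The two layouts above differ only in where module articles land. The sketch below shows one way to express that choice, assuming the discovery output exposes an optional `areas` list; `DiscoveryResult`, `chooseLayout`, and `moduleArticlePath` are hypothetical names, not the package's API.

```typescript
type WikiLayout = "flat" | "hierarchical";

// Assumption: Phase 1 reports top-level areas only for large repos.
interface DiscoveryResult {
  areas?: { id: string }[];
}

// Hierarchical output is used whenever at least one area was discovered.
function chooseLayout(graph: DiscoveryResult): WikiLayout {
  return graph.areas !== undefined && graph.areas.length > 0
    ? "hierarchical"
    : "flat";
}

// Path of a module article relative to the wiki root, following the
// directory trees shown above.
function moduleArticlePath(moduleId: string, areaId?: string): string {
  return areaId
    ? `areas/${areaId}/modules/${moduleId}.md`
    : `modules/${moduleId}.md`;
}
```

For example, `moduleArticlePath("auth", "packages-core")` yields the `areas/packages-core/modules/auth.md` location from the large-repo tree.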
109
+ ## Five-Phase Pipeline
110
+
111
+ ### Phase 1: Discovery (~1-3 min)
112
+
113
+ A single AI session with MCP tools scans the repo and produces a `ModuleGraph` JSON:
114
+ - Project info (name, language, build system, entry points)
115
+ - Modules (id, name, path, purpose, key files, dependencies, complexity, category)
116
+ - Categories and architecture notes
117
+
118
+ Large repos (3000+ files) use multi-round discovery automatically.
119
+
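To make the field list above concrete, here is a hedged sketch of what a `ModuleGraph` could look like in TypeScript. The property names and types are inferred from the bullets and are assumptions; the real definitions live in `src/types.ts` and may differ. The `danglingDependencies` helper is hypothetical and assumes dependencies are expressed as module ids.

```typescript
// Assumed shapes, inferred from the Phase 1 field list.
interface ProjectInfo {
  name: string;
  language: string;
  buildSystem: string;
  entryPoints: string[];
}

interface ModuleInfo {
  id: string;
  name: string;
  path: string;
  purpose: string;
  keyFiles: string[];
  dependencies: string[]; // assumed to hold ids of other modules
  complexity: string;
  category: string;
}

interface ModuleGraph {
  project: ProjectInfo;
  modules: ModuleInfo[];
  categories: string[];
  architectureNotes: string;
}

// Returns dependency ids that do not match any module in the graph,
// a cheap sanity check before later phases consume the graph.
function danglingDependencies(graph: ModuleGraph): string[] {
  const known = new Set(graph.modules.map((m) => m.id));
  const missing = new Set<string>();
  for (const mod of graph.modules) {
    for (const dep of mod.dependencies) {
      if (!known.has(dep)) missing.add(dep);
    }
  }
  return Array.from(missing).sort();
}
```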
120
+ ### Phase 2: Consolidation
121
+
122
+ Consolidates and refines the module graph from Phase 1 before analysis.
123
+
124
+ ### Phase 3: Deep Analysis (~2-10 min)
125
+
126
+ Parallel AI sessions (each with read-only MCP tools) analyze every module:
127
+ - Public API, internal architecture, data flow
128
+ - Design patterns, error handling, code examples
129
+ - Internal and external dependency mapping
130
+ - Suggested Mermaid diagrams
131
+
132
+ Three depth levels control investigation thoroughness:
133
+ - **shallow** — overview + public API only (fastest)
134
+ - **normal** — 7-step investigation (default)
135
+ - **deep** — 10-step exhaustive analysis with performance and edge cases
136
+
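The depth levels above map naturally onto a small validated setting. The following is only a sketch: `parseDepth` and `INVESTIGATION_STEPS` are hypothetical names, and the step counts are the ones stated in the list (none is given for `shallow`, so it is left out).

```typescript
const DEPTHS = ["shallow", "normal", "deep"] as const;
type Depth = (typeof DEPTHS)[number];

// Step counts as documented above; `shallow` has no documented count.
const INVESTIGATION_STEPS: Partial<Record<Depth, number>> = {
  normal: 7,
  deep: 10,
};

// Validates a raw `--depth` value, falling back to the documented default.
function parseDepth(value: string | undefined): Depth {
  if (value === undefined) return "normal";
  const match = DEPTHS.find((depth) => depth === value);
  if (match === undefined) {
    throw new Error(`Unknown depth "${value}"; expected one of ${DEPTHS.join(", ")}`);
  }
  return match;
}
```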
137
+ ### Phase 4: Article Generation (~2-5 min)
138
+
139
+ Parallel AI sessions (session pool, no tools needed) write markdown articles:
140
+ - **Map phase** — one article per module with cross-links between modules
141
+ - **Reduce phase** — AI generates index, architecture, and getting-started pages
142
+
143
+ For large repos with areas, Phase 4 uses a 2-tier reduce:
144
+ 1. **Per-area reduce** — generates area index + area architecture (10-30 modules per area)
145
+ 2. **Project-level reduce** — receives area summaries → generates project index + architecture + getting-started
146
+
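The 2-tier reduce described above can be illustrated without any AI calls: group module articles by area, condense each group, then hand the condensed summaries to the project-level step. All names below (`ModuleArticle`, `AreaSummary`, `perAreaReduce`, `projectReduceInput`) are hypothetical, and the real pipeline presumably sends these inputs to an AI session rather than formatting strings.

```typescript
interface ModuleArticle {
  moduleId: string;
  areaId: string;
  title: string;
}

interface AreaSummary {
  areaId: string;
  moduleCount: number;
  moduleTitles: string[];
}

// Tier 1: one summary per area, built from that area's module articles.
function perAreaReduce(articles: ModuleArticle[]): AreaSummary[] {
  const byArea = new Map<string, ModuleArticle[]>();
  for (const article of articles) {
    const bucket = byArea.get(article.areaId) ?? [];
    bucket.push(article);
    byArea.set(article.areaId, bucket);
  }
  return Array.from(byArea.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([areaId, items]) => ({
      areaId,
      moduleCount: items.length,
      moduleTitles: items.map((item) => item.title),
    }));
}

// Tier 2 input: the project-level step sees only the area summaries,
// which keeps its prompt small even for very large repos.
function projectReduceInput(areas: AreaSummary[]): string {
  return areas
    .map((a) => `- ${a.areaId} (${a.moduleCount} modules): ${a.moduleTitles.join(", ")}`)
    .join("\n");
}
```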
147
+ ### Phase 5: Website
148
+
149
+ Creates an optional static HTML website with navigation and themes (light/dark/auto). Use `--skip-website` to omit it.
150
+
151
+ ## Incremental Rebuilds
152
+
153
+ Subsequent runs are faster thanks to per-module caching:
154
+
155
+ 1. Git diff detects changed files since last analysis
156
+ 2. Changed files are mapped to affected modules
157
+ 3. Only affected modules are re-analyzed (unchanged modules load from cache)
158
+ 4. Phase 4 always re-runs, since it is cheap and cross-links may need updating
159
+
160
+ Cache is stored in `<output>/.wiki-cache/`. Use `--force` to bypass. Article cache supports area-scoped storage: `articles/{area-id}/{module-id}.json`.
161
+
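Steps 2 and 3 above amount to matching changed file paths against module directories. A minimal sketch follows, assuming each module records the directory it covers; `CachedModule`, `affectedModules`, and `areaArticleCachePath` are hypothetical names. Only the area-scoped cache path pattern is taken from this README.

```typescript
interface CachedModule {
  id: string;
  path: string; // directory (or file) the module covers, relative to the repo root
}

// Maps changed files to the ids of modules whose path contains them.
function affectedModules(changedFiles: string[], modules: CachedModule[]): string[] {
  const affected = new Set<string>();
  for (const file of changedFiles) {
    for (const mod of modules) {
      // Compare against "dir/" so that "src/auth" does not match "src/authz/...".
      const prefix = mod.path.endsWith("/") ? mod.path : `${mod.path}/`;
      if (file === mod.path || file.startsWith(prefix)) {
        affected.add(mod.id);
      }
    }
  }
  return Array.from(affected).sort();
}

// Area-scoped article cache location, relative to `<output>/.wiki-cache/`.
function areaArticleCachePath(areaId: string, moduleId: string): string {
  return `articles/${areaId}/${moduleId}.json`;
}
```

Modules not returned by `affectedModules` would be the ones loaded from cache in step 3.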
162
+ ## Testing
163
+
164
+ ```bash
165
+ # Run all tests
166
+ npm run test:run
167
+
168
+ # Watch mode
169
+ npm test
170
+ ```
171
+
172
+ 451 tests across 21 test files cover all phases, including types, schemas, AI invoker, prompt generation, response parsing, map-reduce orchestration, file writing, caching (with incremental rebuild), CLI parsing, command integration, hierarchical output, area tagging, and area-scoped article caching.
173
+
174
+ ## Architecture
175
+
176
+ ```
177
+ src/
178
+ ├── index.ts # CLI entry point
179
+ ├── cli.ts # Commander program (discover + generate)
180
+ ├── types.ts # All shared types (Phase 1+3+4)
181
+ ├── schemas.ts # JSON schemas + validation helpers
182
+ ├── logger.ts # Colored CLI output + spinner
183
+ ├── ai-invoker.ts # Analysis + writing invoker factories
184
+ ├── commands/
185
+ │   ├── discover.ts # deep-wiki discover <repo>
186
+ │   └── generate.ts # deep-wiki generate <repo> (5-phase orchestration)
187
+ ├── discovery/
188
+ │   ├── index.ts # discoverModuleGraph()
189
+ │   ├── prompts.ts # Discovery prompt templates
190
+ │   ├── discovery-session.ts # SDK session orchestration
191
+ │   ├── response-parser.ts # JSON extraction + validation
192
+ │   └── large-repo-handler.ts # Multi-round for big repos
193
+ ├── consolidation/
194
+ │   ├── index.ts # consolidateModules()
195
+ │   ├── consolidator.ts # Hybrid orchestration
196
+ │   ├── rule-based-consolidator.ts
197
+ │   └── ai-consolidator.ts
198
+ ├── analysis/
199
+ │   ├── index.ts # analyzeModules()
200
+ │   ├── prompts.ts # Analysis prompt templates (3 depths)
201
+ │   ├── analysis-executor.ts # MapReduceExecutor orchestration
202
+ │   └── response-parser.ts # ModuleAnalysis JSON parsing + Mermaid validation
203
+ ├── writing/
204
+ │   ├── index.ts # generateArticles()
205
+ │   ├── prompts.ts # Module article prompt templates
206
+ │   ├── reduce-prompts.ts # Index/architecture/getting-started prompts
207
+ │   ├── article-executor.ts # MapReduceExecutor orchestration
208
+ │   └── file-writer.ts # Write markdown to disk (flat + hierarchical layouts)
209
+ └── cache/
210
+     ├── index.ts # Cache manager (graph + consolidation + analyses + area-scoped articles)
211
+     └── git-utils.ts # Git hash + change detection
212
+ ```
213
+
214
+ ## Dependencies
215
+
216
+ | Package | Purpose |
217
+ |---------|---------|
218
+ | `@plusplusoneplusplus/pipeline-core` | AI SDK, MapReduceExecutor, JSON extraction |
219
+ | `commander` | CLI argument parsing |
220
+ | `js-yaml` | YAML handling |