codesummary 1.2.1 → 1.2.2

package/README.md CHANGED
@@ -5,21 +5,22 @@
5
5
  [![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
6
6
  [![Cross-Platform](https://img.shields.io/badge/platform-Windows%20%7C%20macOS%20%7C%20Linux-lightgrey)](#)
7
7
 
8
- A **cross-platform CLI tool** that scans your project's source code and generates three types of output:
8
+ A **cross-platform CLI tool** that scans your project's source code and generates repository documentation optimized for both human review and Large Language Models (LLMs).
9
9
 
10
- - **PDF** — clean, professional A4 documentation for code reviews and audits
11
- - **RAG JSON** — semantic chunks with byte offsets and token estimates, ready for vector databases and LLM pipelines
12
- - **LLM Markdown** — token-optimised single file to paste directly into any chat-based LLM
10
+ - **LLM Markdown** — Token-optimized single file designed for direct consumption by any chat-based AI.
11
+ - **PDF Documentation** — Clean, professional A4 reports for code reviews, audits, and archival.
12
+ - **Structured Context** — Emits `.llmsummary.json` for downstream agentic workflows.
13
13
 
14
14
  ## 🚀 Key Features
15
15
 
16
- - **Three output formats**: PDF, RAG JSON, LLM Markdown; pick what you need
17
- - **Intelligent scanning**: recursive traversal with configurable whitelist filtering
18
- - **Extensive language support**: 50+ file types out of the box
19
- - **Versioned output files**: `-v1`, `-v2` suffixes instead of overwriting or timestamps
20
- - **Non-interactive mode**: `--no-interactive` for CI/CD and scripted use
21
- - **Smart config migration**: new defaults merge into existing config without overwriting customisations
22
- - **Cross-platform**: identical behaviour on Windows, macOS, and Linux
16
+ - **LLM-First Output**: Optimized Markdown with lossless compression (whitespace stripping, JSON compaction) to maximize context window efficiency.
17
+ - **Deep Dependency Analysis**: Internal file-level graph engine with centrality metrics and hub/node identification.
18
+ - **Semantic Clustering**: Heuristic-based capability grouping (CLI, logic, utilities, etc.) to help AI understand project structure.
19
+ - **Language-Aware Extraction**: Specialized adapters for JavaScript/TypeScript and Python, with a universal fallback for 50+ other languages.
20
+ - **Professional PDF Reports**: High-quality snapshots including file trees, AI-assisted executive briefs, and syntax-highlighted source code.
21
+ - **Intelligent Scanning**: Recursive traversal with configurable filters and robust `.csignore` support (gitignore-style syntax).
22
+ - **Versioned Output**: Automatic `-v1`, `-v2` suffixes prevent overwriting previous reports.
23
+ - **CI/CD Ready**: `--no-interactive` mode for automated documentation pipelines.
23
24
 
24
25
  ## 📦 Installation
25
26
 
@@ -33,451 +34,116 @@ npm install -g codesummary
33
34
 
34
35
  ## 🎯 Output Modes
35
36
 
36
- ### 📄 PDF Documentation & audits
37
+ ### 💬 LLM Direct Chat Context
37
38
 
38
- Generate a professional A4 PDF with file structure and complete source content:
39
-
40
- ```bash
41
- codesummary
42
- # or explicitly:
43
- codesummary --format pdf
44
- # Output: MYPROJECT_code.pdf
45
- ```
46
-
47
- Best for: code reviews, client handovers, compliance audits, archival snapshots.
48
-
49
- ---
50
-
51
- ### 🤖 RAG — Vector databases & AI pipelines
52
-
53
- Generate a structured JSON file built for embedding and retrieval:
54
-
55
- ```bash
56
- codesummary --format rag
57
- # Output: MYPROJECT_rag.json
58
- ```
59
-
60
- The JSON contains semantic chunks (split by function, class, or logical block) with:
61
- - Byte-accurate content offsets for fast seeking
62
- - SHA-256 file hashes for deduplication
63
- - Token estimates for context budget planning
64
- - Import/call extraction for graph-based retrieval
65
- - Full statistics for monitoring
66
-
67
- Best for: building RAG systems, loading code into a vector database (Pinecone, Qdrant, Chroma, etc.), or programmatic LLM integration where you control chunking and retrieval.
68
-
69
- ---
70
-
71
- ### 💬 LLM — Direct chat with any AI assistant
72
-
73
- Generate a single, token-optimised Markdown file to paste directly into any LLM:
39
+ Generate a single, token-optimized Markdown file to paste into ChatGPT, Claude, or any other LLM:
74
40
 
75
41
  ```bash
76
42
  codesummary --format llm
77
- # Output: MYPROJECT_llm.md
43
+ # Output: PROJECT_llm.md
78
44
  ```
79
45
 
80
- The file contains a project header, a complete file tree, and each file's content in a fenced code block with syntax highlighting hints. These lossless optimisations are applied automatically to reduce token count:
46
+ The file includes a project overview, a suggested reading order, dependency graphs, and full file contents with language hints. Lossless optimizations (normalizing line endings, stripping trailing whitespace, compacting JSON) are applied automatically.
81
47
 
82
- - Line endings normalised to `\n`
83
- - Trailing whitespace stripped per line
84
- - Leading/trailing blank lines removed per file
85
- - JSON files compacted (re-serialised without indentation)
86
- - Markdown files: max 2 consecutive blank lines preserved
87
- - All other files: max 1 consecutive blank line
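
The optimizations listed above (line-ending normalization, trailing-whitespace stripping, blank-line trimming and capping, JSON compaction) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not CodeSummary's actual implementation: the function name `optimizeForLlm` and its `(content, extension)` signature are hypothetical, and the blank-line limits are taken from the 1.2.1 README list.

```javascript
// Hypothetical helper; an illustrative sketch, not CodeSummary's real code.
function optimizeForLlm(content, extension) {
  // Normalize CRLF and lone CR line endings to \n.
  let text = content.replace(/\r\n?/g, '\n');

  if (extension === '.json') {
    try {
      // Re-serialize without indentation.
      return JSON.stringify(JSON.parse(text));
    } catch {
      // Invalid JSON: fall through and treat it as plain text.
    }
  }

  // Strip trailing spaces and tabs on every line.
  text = text.replace(/[ \t]+$/gm, '');

  // Remove leading and trailing blank lines for the file as a whole.
  text = text.replace(/^\n+/, '').replace(/\n+$/, '');

  // Cap runs of blank lines: 2 for Markdown, 1 for everything else.
  // N blank lines correspond to N + 1 consecutive newline characters.
  const maxBlank = extension === '.md' ? 2 : 1;
  const tooManyNewlines = new RegExp(`\\n{${maxBlank + 2},}`, 'g');
  return text.replace(tooManyNewlines, '\n'.repeat(maxBlank + 1));
}

console.log(optimizeForLlm('{\n  "a": 1\n}\n', '.json')); // {"a":1}
```

Re-serializing JSON through `JSON.parse` preserves the data for typical files, though values such as very large integers may not round-trip exactly, so a real implementation might want to guard against that.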
48
+ ### 📄 PDF Audits & Handovers
88
49
 
89
- Best for: asking any LLM chat interface to review, explain, or work with your codebase in a single paste; much more token-efficient than a PDF.
90
-
91
- ---
92
-
93
- ### 🔄 Both — PDF + RAG together
50
+ Generate a professional A4 PDF for human consumption:
94
51
 
95
52
  ```bash
96
- codesummary --format both
97
- # Output: MYPROJECT_code.pdf + MYPROJECT_rag.json
53
+ codesummary --format pdf
54
+ # Output: PROJECT_code.pdf
98
55
  ```
99
56
 
100
- Uses a single scan pass. If one format fails, the other still completes.
57
+ Best for: code reviews, client handovers, compliance audits, and physical archiving.
101
58
 
102
59
  ---
103
60
 
104
61
  ## 📖 Usage
105
62
 
106
- ### Quick start
63
+ ### Quick Start
107
64
 
108
65
  ```bash
109
- # First run: interactive setup wizard
66
+ # First run: Interactive setup wizard
110
67
  codesummary
111
68
 
112
69
  # Generate LLM Markdown for the current project
113
70
  codesummary --format llm
114
71
 
115
- # Generate RAG JSON and save to a specific directory
116
- codesummary --format rag --output ./ai-data
117
-
118
- # Skip all prompts (CI-friendly)
119
- codesummary --format llm --no-interactive
120
-
121
- # Generate everything at once
72
+ # Generate both formats in one run
122
73
  codesummary --format both
123
- ```
124
-
125
- ### Interactive workflow
126
-
127
- #### 1. First-run setup
128
74
 
75
+ # Generate a task-focused context pack (prioritizes relevant files)
76
+ codesummary --format llm --focus "authentication flow" --max-tokens 12000
129
77
  ```
130
- Welcome to CodeSummary!
131
- No configuration found. Starting setup...
132
-
133
- Where should the output be saved by default?
134
- > [ ] Current working directory (relative mode)
135
- > [x] Fixed folder (absolute mode)
136
78
 
137
- Enter absolute path for fixed folder:
138
- > ~/Desktop/CodeSummaries
139
- ```
79
+ ### Interactive Workflow
140
80
 
141
- #### 2. Extension selection
81
+ 1. **First-Run Setup**: Detects missing configuration and launches a wizard to set your default output paths.
82
+ 2. **Extension Selection**: Choose which file types to include (e.g., `.ts`, `.py`, `.md`).
83
+ 3. **Generation**: Files are scanned, analyzed, and rendered to your chosen format.
142
84
 
143
- ```
144
- Scan Summary:
145
- Extensions found: .js, .ts, .md, .json
146
- Total files: 127 — Total size: 2.4 MB
147
-
148
- Select file extensions to include:
149
- [x] .js → JavaScript (42 files)
150
- [x] .ts → TypeScript (28 files)
151
- [x] .md → Markdown (5 files)
152
- [ ] .json → JSON (52 files)
153
- ```
85
+ ### Optional AI Enrichment
154
86
 
155
- #### 3. Output
87
+ Enhance your reports with AI-generated architecture insights:
156
88
 
157
- ```
158
- SUCCESS: LLM-optimised Markdown generated successfully!
89
+ ```bash
90
+ # Using local Ollama
91
+ codesummary --format llm --ai-semantic --provider ollama --model llama3.1
159
92
 
160
- Output: ~/Desktop/CodeSummaries/MYPROJECT_llm.md
161
- Extensions: .js, .ts, .md
162
- Total files: 75
163
- File size: 1.1 MB
164
- Ready to paste into any LLM chat interface
93
+ # Using an OpenAI-compatible endpoint
94
+ codesummary --format llm --ai-semantic --provider openai-compatible --api-key YOUR_KEY
165
95
  ```
166
96
 
167
- ### Command reference
168
-
169
- | Command | Description |
170
- | ------- | ----------- |
171
- | `codesummary` | Scan and generate PDF (default) |
172
- | `codesummary --format pdf` | Generate PDF documentation |
173
- | `codesummary --format rag` | Generate RAG-optimised JSON |
174
- | `codesummary --format llm` | Generate LLM-optimised Markdown |
175
- | `codesummary --format both` | Generate PDF + RAG JSON |
176
- | `codesummary config` | Edit configuration interactively |
177
- | `codesummary --show-config` | Display current configuration |
178
- | `codesummary --reset-config` | Reset configuration to defaults |
179
-
180
- ### Options
181
-
182
- | Option | Short | Description |
183
- | ------ | ----- | ----------- |
184
- | `--format <format>` | `-f` | Output format: `pdf` (default), `rag`, `llm`, or `both` |
185
- | `--output <path>` | `-o` | Override output directory for this run |
186
- | `--no-interactive` | | Skip all prompts; auto-select all extensions |
187
- | `--show-config` | | Display current configuration |
188
- | `--reset-config` | | Reset configuration to defaults |
189
- | `--help` | `-h` | Show help |
190
- | `--version` | `-v` | Show version |
97
+ *Note: AI enrichment is optional. If the provider is unavailable, CodeSummary falls back to heuristic analysis.*
191
98
 
192
99
  ---
193
100
 
194
101
  ## ⚙️ Configuration
195
102
 
196
103
  Configuration is stored globally at:
197
-
198
- - **Linux/macOS**: `~/.codesummary/config.json`
199
104
  - **Windows**: `%APPDATA%\CodeSummary\config.json`
105
+ - **Linux/macOS**: `~/.codesummary/config.json`
200
106
 
201
- Existing configuration is never overwritten on upgrade — new defaults are merged in automatically.
202
-
203
- ### Default configuration
204
-
205
- ```json
206
- {
207
- "output": {
208
- "mode": "fixed",
209
- "fixedPath": "~/Desktop/CodeSummaries"
210
- },
211
- "allowedExtensions": [
212
- ".js", ".jsx", ".ts", ".tsx", ".json", ".html", ".css", ".scss",
213
- ".md", ".txt", ".py", ".java", ".cs", ".cpp", ".c", ".h",
214
- ".xml", ".yaml", ".yml", ".sh", ".bat", ".ps1",
215
- ".cfg", ".conf", ".env", ".local", ".service", ".timer",
216
- ".ino", ".j2", ".csv", ".tsv", ".crt", ".sql",
217
- ".toml", ".ini", ".properties", ".tf", ".tfvars", ".proto", ".prisma",
218
- ".dart", ".lua", ".r", ".ex", ".exs", ".pl", ".mk", ".cmake",
219
- ".mdx", ".astro", ".graphql", ".gql"
220
- ],
221
- "excludeDirs": [
222
- "node_modules", ".git", ".vscode", "dist", "build", "coverage",
223
- "out", "__pycache__", ".next", ".nuxt",
224
- ".idea", "target", ".gradle", "venv", ".venv",
225
- ".pytest_cache", ".mypy_cache", ".tox", ".terraform", ".turbo",
226
- ".angular", ".svelte-kit", ".yarn", ".pnpm-store",
227
- ".expo", ".dart_tool", "storybook-static", "htmlcov"
228
- ],
229
- "excludeFiles": [
230
- "*-lock.json", "*.lock", "*.min.js", "*.min.css", "*.map",
231
- ".DS_Store", "Thumbs.db", "desktop.ini", "ehthumbs.db",
232
- "*.pyc", "*.pyo", "*.class", "*.log", "*.tmp", "*.temp",
233
- "*.swp", "*.bak", "*.orig"
234
- ],
235
- "settings": {
236
- "documentTitle": "Project Code Summary",
237
- "maxFilesBeforePrompt": 500
238
- }
239
- }
240
- ```
241
-
242
- ---
243
-
244
- ## 📋 PDF structure
245
-
246
- Generated PDFs are A4 with three sections:
247
-
248
- 1. **Project overview** — title, project name, generation timestamp, included file types
249
- 2. **File structure** — complete sorted file listing
250
- 3. **File content** — full source of every selected file, monospace font, no truncation
251
-
252
- ---
253
-
254
- ## 🤖 RAG JSON structure
255
-
256
- ```json
257
- {
258
- "metadata": {
259
- "projectName": "MyProject",
260
- "generatedAt": "2025-07-31T08:00:00.000Z",
261
- "version": "3.1.0"
262
- },
263
- "files": [
264
- {
265
- "id": "abc123def456",
266
- "path": "src/component.js",
267
- "language": "JavaScript",
268
- "hash": "sha256-...",
269
- "chunks": [
270
- {
271
- "id": "chunk_abc123def456_0",
272
- "content": "function myFunction() { ... }",
273
- "tokenEstimate": 45,
274
- "lineStart": 1,
275
- "lineEnd": 15,
276
- "chunkingMethod": "semantic-function",
277
- "context": "function_myFunction",
278
- "imports": ["lodash", "react"],
279
- "calls": ["useState", "useEffect"]
280
- }
281
- ]
282
- }
283
- ],
284
- "index": {
285
- "chunkOffsets": {
286
- "chunk_abc123def456_0": {
287
- "contentStart": 12123,
288
- "contentEnd": 12356,
289
- "filePath": "src/component.js"
290
- }
291
- },
292
- "statistics": { "processingTimeMs": 245, "chunksWithValidOffsets": 387 }
293
- }
294
- }
295
- ```
296
-
297
- ### RAG integration example
298
-
299
- ```javascript
300
- const ragData = JSON.parse(fs.readFileSync('project_rag.json'));
301
-
302
- // Extract all chunks for embedding
303
- const chunks = ragData.files.flatMap(file =>
304
- file.chunks.map(chunk => ({
305
- id: chunk.id,
306
- content: chunk.content,
307
- metadata: { filePath: file.path, language: file.language }
308
- }))
309
- );
310
-
311
- // Store in your vector database
312
- for (const chunk of chunks) {
313
- const embedding = await embed(chunk.content);
314
- await vectorDB.upsert(chunk.id, embedding, chunk.metadata);
315
- }
316
- ```
317
-
318
- ---
319
-
320
- ## 💬 LLM Markdown structure
321
-
322
- ```markdown
323
- # MyProject — Code Summary
324
-
325
- **Generated:** 2026-04-05 | **Files:** 42 | **Total size:** 1.2 MB
326
-
327
- ---
328
-
329
- ## File Tree
330
-
331
- ```
332
- src/cli.js
333
- src/scanner.js
334
- ...
335
- ```
336
-
337
- ---
338
-
339
- ## src/cli.js
340
-
341
- ```js
342
- import chalk from 'chalk';
343
- ...
344
- ```
345
- ```
346
-
347
- Paste the `.md` file directly into any LLM chat interface. No further processing needed.
348
-
349
- ---
350
-
351
- ## 🔧 Advanced features
352
-
353
- ### Versioned output filenames
354
-
355
- When the target file already exists, CodeSummary creates a versioned copy instead of overwriting:
356
-
357
- ```
358
- MYPROJECT_llm.md ← exists
359
- MYPROJECT_llm-v1.md ← created
360
- MYPROJECT_llm-v1.md ← exists on next run
361
- MYPROJECT_llm-v2.md ← created
362
- ```
363
-
364
- This applies to all three output formats (PDF, RAG JSON, LLM Markdown).
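
The naming scheme shown above can be approximated with a small helper. This is a sketch under assumptions rather than the package's own code: `nextVersionedPath` is a hypothetical name, and the injectable `exists` predicate is added here only so the logic can be tried without touching the disk.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper; returns the first path that does not exist yet,
// trying NAME.ext, then NAME-v1.ext, NAME-v2.ext, and so on.
function nextVersionedPath(target, exists = fs.existsSync) {
  if (!exists(target)) return target;

  const { dir, name, ext } = path.parse(target);
  for (let version = 1; ; version += 1) {
    const candidate = path.join(dir, `${name}-v${version}${ext}`);
    if (!exists(candidate)) return candidate;
  }
}

// Simulate a directory that already holds the base file and -v1.
const taken = new Set(['MYPROJECT_llm.md', 'MYPROJECT_llm-v1.md']);
console.log(nextVersionedPath('MYPROJECT_llm.md', (p) => taken.has(p)));
// MYPROJECT_llm-v2.md
```

A check-then-write approach like this can race if two runs start at the same moment; opening the file with an exclusive-create flag would be one way to close that gap.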
365
-
366
- ### Non-interactive mode
367
-
368
- Skip all prompts and auto-select all detected extensions:
369
-
370
- ```bash
371
- codesummary --format llm --no-interactive
372
- ```
373
-
374
- Useful for CI pipelines or scripted documentation generation.
375
-
376
- ---
377
-
378
- ## 🎨 Supported file types
379
-
380
- | Extension | Type | Extension | Type |
381
- | --------- | ---- | --------- | ---- |
382
- | `.js` `.jsx` | JavaScript | `.ts` `.tsx` | TypeScript |
383
- | `.py` | Python | `.java` | Java |
384
- | `.cs` | C# | `.cpp` `.c` `.h` | C/C++ |
385
- | `.go` | Go | `.rs` | Rust |
386
- | `.swift` | Swift | `.kt` | Kotlin |
387
- | `.rb` | Ruby | `.php` | PHP |
388
- | `.dart` | Dart | `.lua` | Lua |
389
- | `.r` | R | `.ex` `.exs` | Elixir |
390
- | `.pl` | Perl | `.scala` | Scala |
391
- | `.html` | HTML | `.css` `.scss` | CSS |
392
- | `.vue` | Vue.js | `.svelte` | Svelte |
393
- | `.astro` | Astro | `.mdx` | MDX |
394
- | `.json` | JSON | `.yaml` `.yml` | YAML |
395
- | `.toml` | TOML | `.xml` | XML |
396
- | `.ini` | INI | `.properties` | Java Properties |
397
- | `.tf` `.tfvars` | Terraform | `.proto` | Protobuf |
398
- | `.prisma` | Prisma | `.graphql` `.gql` | GraphQL |
399
- | `.sql` | SQL | `.md` `.txt` | Docs |
400
- | `.sh` `.bash` | Shell | `.bat` | Batch |
401
- | `.ps1` | PowerShell | `.mk` `.cmake` | Build |
402
- | `.cfg` `.conf` | Config | `.env` `.local` | Environment |
403
- | `.service` `.timer` | Systemd | `.ino` | Arduino |
404
- | `.j2` | Jinja2 | `.csv` `.tsv` | Data |
405
- | `.crt` | Certificate | `.dockerfile` | Docker |
406
-
407
- ---
408
-
409
- ## 🛠️ Project structure
107
+ Existing settings are preserved during upgrades; new defaults are merged automatically.
410
108
 
411
- ```
412
- codesummary/
413
- ├── bin/
414
- │ └── codesummary.js # Entry point
415
- ├── src/
416
- │ ├── cli.js # Argument parsing, orchestration
417
- │ ├── scanner.js # Recursive directory scanning
418
- │ ├── pdfGenerator.js # PDF generation (PDFKit)
419
- │ ├── ragGenerator.js # RAG JSON generation with semantic chunking
420
- │ ├── llmGenerator.js # LLM Markdown generation with optimisations
421
- │ ├── configManager.js # Global config storage and migration
422
- │ ├── ragConfig.js # RAG-specific configuration and YAML loading
423
- │ ├── errorHandler.js # Centralised error handling and path validation
424
- │ └── utils.js # Shared utilities (formatFileSize, etc.)
425
- ├── rag-schema.json
426
- ├── raggen.config.yaml
427
- └── package.json
428
- ```
109
+ ### `.csignore` Support
110
+ Use a `.csignore` file in your project root to exclude sensitive paths (keys, tokens, private data) using standard `.gitignore` syntax.
429
111
 
430
112
  ---
431
113
 
432
- ## 🔍 Troubleshooting
114
+ ## 🛠️ Project Structure
433
115
 
434
- **No files found after scan**
435
- - Check `allowedExtensions` in your config (`codesummary --show-config`)
436
- - Verify the directory is not listed in `excludeDirs`
437
-
438
- **Output file not generated**
439
- - Check write permissions on the output directory
440
- - Try `--output ./` to write to the current directory
441
-
442
- **Non-ASCII characters in paths cause issues**
443
- - Update to v1.2.0+ which fixes Windows path handling for accented characters
444
-
445
- **CI pipeline hangs**
446
- - Add `--no-interactive` to skip all prompts
116
+ - `bin/codesummary.js`: CLI Entry point.
117
+ - `src/cli.js`: Argument parsing and flow orchestration.
118
+ - `src/scanner.js`: Recursive file system traversal and filtering.
119
+ - `src/graph/`: Dependency graph engine and language adapters.
120
+ - `src/llmGenerator.js`: Token-optimized Markdown generation.
121
+ - `src/pdfGenerator.js`: High-fidelity PDF generation (PDFKit).
122
+ - `src/configManager.js`: Global configuration and migration logic.
447
123
 
448
124
  ---
449
125
 
450
126
  ## 🤝 Contributing
451
127
 
452
- 1. Fork the repository
453
- 2. Clone: `git clone https://github.com/skamoll/CodeSummary.git`
454
- 3. Install: `npm install`
455
- 4. Test: `node bin/codesummary.js --help`
456
- 5. Submit a pull request
128
+ 1. Fork the repository.
129
+ 2. Clone: `git clone https://github.com/skamoll/CodeSummary.git`
130
+ 3. Install: `npm install`
131
+ 4. Submit a pull request.
457
132
 
458
133
  ---
459
134
 
460
135
  ## 📄 License
461
136
 
462
- GNU General Public License v3.0; see [LICENSE](LICENSE) for details.
137
+ Distributed under the **GNU General Public License v3.0**. See `LICENSE` for details.
463
138
 
464
139
  ---
465
140
 
466
141
  ## 📊 Roadmap
467
142
 
468
- - [ ] Syntax highlighting in PDF output
469
- - [ ] Clickable table of contents in PDF
470
- - [x] LLM-optimised Markdown output (`--format llm`)
471
- - [x] Versioned output filenames (`-v1`, `-v2`)
472
- - [x] Non-interactive mode (`--no-interactive`)
473
- - [x] RAG JSON with semantic chunking
474
- - [ ] `--format all` (PDF + RAG + LLM in one pass)
475
- - [ ] Git integration (document only changed files)
476
- - [ ] CI/CD plugin for automated documentation
477
-
478
- ---
479
-
480
- ## 📞 Support
143
+ - [x] LLM-optimized Markdown output.
144
+ - [x] Versioned output filenames.
145
+ - [x] Deep dependency graph analysis.
146
+ - [x] VS Code Extension (MVP).
147
+ - [ ] Automated regression test suite.
481
148
 
482
- - Report bugs: [GitHub Issues](https://github.com/skamoll/CodeSummary/issues)
483
- - Questions: [GitHub Discussions](https://github.com/skamoll/CodeSummary/discussions)
149
+ See [ROADMAP.md](ROADMAP.md) for detailed product direction.