codesummary 1.1.1 → 1.2.0
- package/CHANGELOG.md +234 -190
- package/LICENSE +674 -674
- package/README.md +483 -607
- package/bin/codesummary.js +12 -12
- package/features.md +418 -502
- package/package.json +95 -95
- package/rag-schema.json +113 -113
- package/src/cli.js +599 -540
- package/src/configManager.js +880 -827
- package/src/errorHandler.js +474 -477
- package/src/index.js +25 -25
- package/src/llmGenerator.js +189 -0
- package/src/pdfGenerator.js +408 -475
- package/src/ragConfig.js +369 -373
- package/src/ragGenerator.js +1739 -1757
- package/src/scanner.js +386 -467
- package/src/utils.js +139 -0
package/features.md
CHANGED
|
@@ -1,502 +1,418 @@
|
|
|
1
|
-
# CodeSummary – Detailed Features and Functional Specification
|
|
2
|
-
|
|
3
|
-
## 1. Overview
|
|
4
|
-
|
|
5
|
-
**CodeSummary** is a **Node.js-based, cross-platform CLI tool** (distributed via **npm**) that automatically scans a project's source code and generates
|
|
419
|
-
- Unicode and internationalization support
|
|
420
|
-
|
|
421
|
-
---
|
|
422
|
-
|
|
423
|
-
## 4. Quality Assurance
|
|
424
|
-
|
|
425
|
-
### 4.1 Testing Strategy
|
|
426
|
-
|
|
427
|
-
- **Cross-platform testing** on Windows, macOS, and Linux
|
|
428
|
-
- **Large project stress testing** with thousands of files
|
|
429
|
-
- **Memory usage profiling** for optimization
|
|
430
|
-
- **Terminal compatibility verification** across different environments
|
|
431
|
-
- **File conflict scenario testing** with various edge cases
|
|
432
|
-
|
|
433
|
-
### 4.2 Security Considerations
|
|
434
|
-
|
|
435
|
-
- **Path traversal prevention** with input validation
|
|
436
|
-
- **Permission-based access control** respecting system security
|
|
437
|
-
- **No external network dependencies** for complete offline operation
|
|
438
|
-
- **Safe file handling** with proper error boundaries
|
|
439
|
-
- **Configuration validation** to prevent malicious settings
|
|
440
|
-
|
|
441
|
-
### 4.3 Documentation Standards
|
|
442
|
-
|
|
443
|
-
- **Comprehensive README** with usage examples
|
|
444
|
-
- **Detailed feature specification** (this document)
|
|
445
|
-
- **Inline code documentation** with JSDoc standards
|
|
446
|
-
- **Error message clarity** with actionable guidance
|
|
447
|
-
- **Contributing guidelines** for open-source collaboration
|
|
448
|
-
|
|
449
|
-
---
|
|
450
|
-
|
|
451
|
-
## 5. Future Enhancements
|
|
452
|
-
|
|
453
|
-
### 5.1 Planned Features
|
|
454
|
-
|
|
455
|
-
- **Syntax highlighting** in PDF output for better code readability
|
|
456
|
-
- **Clickable table of contents** with bookmarks for navigation
|
|
457
|
-
- **Multiple output formats** (HTML, JSON, Markdown)
|
|
458
|
-
- **Project metrics and statistics** (line counts, complexity analysis)
|
|
459
|
-
- **CI/CD integration mode** for automated documentation pipelines
|
|
460
|
-
- **Custom PDF themes** and styling options
|
|
461
|
-
- **Plugin system** for custom file processors
|
|
462
|
-
|
|
463
|
-
### 5.2 Advanced Capabilities
|
|
464
|
-
|
|
465
|
-
- **Incremental updates** for changed files only
|
|
466
|
-
- **Git integration** for commit-specific documentation
|
|
467
|
-
- **Code annotation** system for additional context
|
|
468
|
-
- **Multi-language support** for international users
|
|
469
|
-
- **Web-based configuration** interface for easier setup
|
|
470
|
-
- **Integration APIs** for third-party tools
|
|
471
|
-
|
|
472
|
-
---
|
|
473
|
-
|
|
474
|
-
## 6. Success Metrics
|
|
475
|
-
|
|
476
|
-
### 6.1 Performance Targets
|
|
477
|
-
|
|
478
|
-
- **Scan speed**: >1000 files per second on modern hardware
|
|
479
|
-
- **Memory usage**: <200MB for projects with 10,000+ files
|
|
480
|
-
- **PDF generation**: <30 seconds for typical projects (100 files)
|
|
481
|
-
- **Cross-platform consistency**: 100% feature parity across platforms
|
|
482
|
-
|
|
483
|
-
### 6.2 Quality Targets
|
|
484
|
-
|
|
485
|
-
- **Zero data loss**: All file content included without truncation
|
|
486
|
-
- **Error rate**: <0.1% failure rate on valid projects
|
|
487
|
-
- **User satisfaction**: Clear, actionable error messages for all failure cases
|
|
488
|
-
- **Compatibility**: Works on 99%+ of supported platform/terminal combinations
|
|
489
|
-
|
|
490
|
-
---
|
|
491
|
-
|
|
492
|
-
## 7. Conclusion
|
|
493
|
-
|
|
494
|
-
CodeSummary represents a comprehensive solution for automated code documentation, combining professional-grade PDF output with intelligent file processing and cross-platform compatibility. Its focus on complete content inclusion, smart conflict handling, and terminal compatibility makes it suitable for both individual developers and enterprise environments.
|
|
495
|
-
|
|
496
|
-
The tool's architecture supports unlimited scalability while maintaining efficient resource usage, ensuring it can handle projects of any size. With its extensive language support and intelligent filtering, CodeSummary serves as a valuable tool for code reviews, audits, documentation, and archival purposes.
|
|
497
|
-
|
|
498
|
-
---
|
|
499
|
-
|
|
500
|
-
**Document Version**: 2.0
|
|
501
|
-
**Last Updated**: January 2025
|
|
502
|
-
**Status**: Implementation Complete - Ready for Release
|
|
1
|
+
# CodeSummary – Detailed Features and Functional Specification
|
|
2
|
+
|
|
3
|
+
## 1. Overview
|
|
4
|
+
|
|
5
|
+
**CodeSummary** is a **Node.js-based, cross-platform CLI tool** (distributed via **npm**) that automatically scans a project's source code and generates output in three formats:
|
|
6
|
+
|
|
7
|
+
- **PDF**: clean, professional A4 documentation for code reviews, audits, and archival snapshots
|
|
8
|
+
- **RAG JSON**: structured output with semantic chunks, byte offsets, and token estimates — built for vector databases and programmatic LLM integration
|
|
9
|
+
- **LLM Markdown**: a single token-optimised Markdown file for pasting directly into any chat-based LLM
|
|
10
|
+
|
|
11
|
+
> **Repository**: [https://github.com/skamoll/CodeSummary](https://github.com/skamoll/CodeSummary)
|
|
12
|
+
> **npm Package Name**: `codesummary`
|
|
13
|
+
|
|
14
|
+
---
|
|
15
|
+
|
|
16
|
+
### 1.1 Target Audience
|
|
17
|
+
|
|
18
|
+
- **Developers** who need quick, complete overviews of large projects
|
|
19
|
+
- **Auditors / Consultants** requiring traceable documentation snapshots
|
|
20
|
+
- **Educators / Students** preparing comprehensive code handovers
|
|
21
|
+
- **AI Engineers** building RAG pipelines or feeding code into vector databases
|
|
22
|
+
- **Anyone** who wants to work efficiently with their codebase inside a chat-based LLM
|
|
23
|
+
|
|
24
|
+
---
|
|
25
|
+
|
|
26
|
+
### 1.2 Core Objectives
|
|
27
|
+
|
|
28
|
+
1. **Three output modes** — PDF for humans, RAG JSON for machines, LLM Markdown for chat
|
|
29
|
+
2. **Cross-platform reliability** — identical behaviour on Windows, macOS, and Linux
|
|
30
|
+
3. **Lossless content optimisation** — reduce token count without altering code meaning
|
|
31
|
+
4. **Smart config migration** — new defaults merge into existing config without data loss
|
|
32
|
+
5. **Versioned output** — `-v1`, `-v2` suffixes prevent overwrites and timestamp clutter
|
|
33
|
+
6. **Non-interactive operation** — `--no-interactive` for CI/CD pipelines
|
|
34
|
+
|
|
35
|
+
---
|
|
36
|
+
|
|
37
|
+
### 1.3 Technology Stack
|
|
38
|
+
|
|
39
|
+
- **Node.js** ≥ 18 (native ES modules)
|
|
40
|
+
- **PDFKit** for PDF generation with streaming support
|
|
41
|
+
- **Inquirer.js** for interactive prompts
|
|
42
|
+
- **Chalk** for terminal styling
|
|
43
|
+
- **Ora** for progress indicators
|
|
44
|
+
- **fs-extra** for enhanced file system operations
|
|
45
|
+
- **js-yaml** for YAML config loading
|
|
46
|
+
- **ajv** for JSON schema validation
|
|
47
|
+
|
|
48
|
+
---
|
|
49
|
+
|
|
50
|
+
## 2. Functional Requirements
|
|
51
|
+
|
|
52
|
+
### 2.1 Command-Line Interface
|
|
53
|
+
|
|
54
|
+
#### 2.1.1 Primary Commands
|
|
55
|
+
|
|
56
|
+
| Command | Description |
|
|
57
|
+
|---------|-------------|
|
|
58
|
+
| `codesummary` | Scan current directory, generate PDF |
|
|
59
|
+
| `codesummary --format rag` | Generate RAG-optimised JSON |
|
|
60
|
+
| `codesummary --format llm` | Generate LLM-optimised Markdown |
|
|
61
|
+
| `codesummary --format both` | Generate PDF + RAG JSON (single scan) |
|
|
62
|
+
| `codesummary config` | Launch interactive configuration editor |
|
|
63
|
+
| `codesummary --show-config` | Display current configuration |
|
|
64
|
+
| `codesummary --reset-config` | Reset to defaults and run setup wizard |
|
|
65
|
+
| `codesummary --help` | Show help |
|
|
66
|
+
| `codesummary --version` | Show version |
|
|
67
|
+
|
|
68
|
+
#### 2.1.2 Command-Line Options
|
|
69
|
+
|
|
70
|
+
| Option | Short | Description |
|
|
71
|
+
|--------|-------|-------------|
|
|
72
|
+
| `--format <format>` | `-f` | `pdf` (default), `rag`, `llm`, or `both` |
|
|
73
|
+
| `--output <path>` | `-o` | Override output directory for this run |
|
|
74
|
+
| `--no-interactive` | | Skip all prompts; auto-select all extensions |
|
|
75
|
+
| `--show-config` | | Display current configuration |
|
|
76
|
+
| `--reset-config` | | Reset configuration to defaults |
|
|
77
|
+
| `--help` | `-h` | Show help message |
|
|
78
|
+
| `--version` | `-v` | Show version |
|
|
79
|
+
|
|
80
|
+
#### 2.1.3 Interactive Workflow
|
|
81
|
+
|
|
82
|
+
1. **First-run setup** — detects missing config, launches setup wizard, creates output directory
|
|
83
|
+
2. **Directory scanning** — recursive scan with whitelist filtering and exclusion rules
|
|
84
|
+
3. **Extension selection** — checkbox prompt with file counts; skipped with `--no-interactive`
|
|
85
|
+
4. **Generation** — selected format(s) generated, versioned filenames used if target exists
|
|
86
|
+
|
|
87
|
+
---
|
|
88
|
+
|
|
89
|
+
### 2.2 Output Formats
|
|
90
|
+
|
|
91
|
+
#### 2.2.1 PDF (`--format pdf`)
|
|
92
|
+
|
|
93
|
+
Generates a professional A4 PDF with three sections:
|
|
94
|
+
|
|
95
|
+
1. **Project overview**: title, project name, timestamp, included file types
|
|
96
|
+
2. **File structure**: sorted complete file listing
|
|
97
|
+
3. **File content**: full source of every selected file, monospace font, no truncation
|
|
98
|
+
|
|
99
|
+
File naming: `PROJECTNAME_code.pdf` → `PROJECTNAME_code-v1.pdf` → `PROJECTNAME_code-v2.pdf` ...
|
|
100
|
+
|
|
101
|
+
#### 2.2.2 RAG JSON (`--format rag`)
|
|
102
|
+
|
|
103
|
+
Generates a structured JSON file built for embedding and retrieval in AI/ML pipelines.
|
|
104
|
+
|
|
105
|
+
**When to use RAG:**
|
|
106
|
+
- Loading code into a vector database (Pinecone, Qdrant, Chroma, etc.)
|
|
107
|
+
- Building a retrieval-augmented generation pipeline
|
|
108
|
+
- Programmatic LLM integration where you control chunking and retrieval
|
|
109
|
+
- Rapid chunk seeking via byte offsets without re-parsing the full JSON
|
|
110
|
+
|
|
111
|
+
**JSON structure:**
|
|
112
|
+
```json
|
|
113
|
+
{
|
|
114
|
+
"metadata": { "projectName": "...", "generatedAt": "...", "version": "..." },
|
|
115
|
+
"files": [
|
|
116
|
+
{
|
|
117
|
+
"id": "abc123",
|
|
118
|
+
"path": "src/component.js",
|
|
119
|
+
"language": "JavaScript",
|
|
120
|
+
"hash": "sha256-...",
|
|
121
|
+
"chunks": [
|
|
122
|
+
{
|
|
123
|
+
"id": "chunk_abc123_0",
|
|
124
|
+
"content": "function myFn() { ... }",
|
|
125
|
+
"tokenEstimate": 45,
|
|
126
|
+
"lineStart": 1,
|
|
127
|
+
"lineEnd": 15,
|
|
128
|
+
"chunkingMethod": "semantic-function",
|
|
129
|
+
"context": "function_myFn",
|
|
130
|
+
"imports": ["react"],
|
|
131
|
+
"calls": ["useState"]
|
|
132
|
+
}
|
|
133
|
+
]
|
|
134
|
+
}
|
|
135
|
+
],
|
|
136
|
+
"index": {
|
|
137
|
+
"chunkOffsets": {
|
|
138
|
+
"chunk_abc123_0": { "contentStart": 12123, "contentEnd": 12356 }
|
|
139
|
+
},
|
|
140
|
+
"statistics": { "processingTimeMs": 245, "chunksWithValidOffsets": 387 }
|
|
141
|
+
}
|
|
142
|
+
}
|
|
143
|
+
```
|
|
144
|
+
|
|
145
|
+
**Key RAG features:**
|
|
146
|
+
- Semantic chunking by function, class, or logical block
|
|
147
|
+
- Byte-accurate content offsets for fast random access
|
|
148
|
+
- SHA-256 file hashes for deduplication
|
|
149
|
+
- Language-aware token estimation (accurate to within about ±20%)
|
|
150
|
+
- Import and call graph extraction
|
|
151
|
+
- YAML-configurable via `raggen.config.yaml`
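
The byte-offset feature listed above can be illustrated with a minimal sketch. It assumes that `contentStart` and `contentEnd` are byte positions in the UTF-8 encoded output file and that they delimit the JSON-escaped text of a chunk's `content` string (without the surrounding quotes); the source does not state this precisely. The function name `readChunkByOffset` is hypothetical.

```javascript
// Hypothetical helper: read one chunk's content from the raw bytes of the
// RAG JSON file without parsing the whole document.
// Assumption: the offsets delimit the JSON-escaped string body, quotes excluded.
function readChunkByOffset(fileBytes, offsets) {
  // Slice the byte range, then decode it as UTF-8 text.
  const escaped = fileBytes
    .subarray(offsets.contentStart, offsets.contentEnd)
    .toString('utf8');
  // Re-wrap in quotes so JSON.parse can undo escapes such as \n and \".
  return JSON.parse(`"${escaped}"`);
}
```

In a real pipeline the byte range would more likely be read from disk with a positioned read, so that only the requested chunk is loaded into memory.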
|
|
152
|
+
|
|
153
|
+
File naming: `PROJECTNAME_rag.json` → `PROJECTNAME_rag-v1.json` → ...
|
|
154
|
+
|
|
155
|
+
#### 2.2.3 LLM Markdown (`--format llm`)
|
|
156
|
+
|
|
157
|
+
Generates a single Markdown file optimised for direct consumption by chat-based LLMs.
|
|
158
|
+
|
|
159
|
+
**When to use LLM Markdown:**
|
|
160
|
+
- Asking any LLM chat interface to review or explain your codebase
|
|
161
|
+
- One-off questions that don't justify setting up a RAG pipeline
|
|
162
|
+
- Sharing project context in a conversation without a file upload feature
|
|
163
|
+
|
|
164
|
+
**File structure:**
|
|
165
|
+
```markdown
|
|
166
|
+
# ProjectName — Code Summary
|
|
167
|
+
|
|
168
|
+
**Generated:** 2026-04-05 | **Files:** 42 | **Total size:** 1.2 MB
|
|
169
|
+
|
|
170
|
+
---
|
|
171
|
+
|
|
172
|
+
## File Tree
|
|
173
|
+
|
|
174
|
+
```
|
|
175
|
+
src/cli.js
|
|
176
|
+
src/scanner.js
|
|
177
|
+
...
|
|
178
|
+
```
|
|
179
|
+
|
|
180
|
+
---
|
|
181
|
+
|
|
182
|
+
## src/cli.js
|
|
183
|
+
|
|
184
|
+
```js
|
|
185
|
+
// full file content
|
|
186
|
+
```
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
**Lossless optimisations applied automatically:**
|
|
190
|
+
|
|
191
|
+
| Optimisation | Applies to | Notes |
|
|
192
|
+
|---|---|---|
|
|
193
|
+
| Normalise line endings (`\r\n` → `\n`) | All files | Safe for all languages |
|
|
194
|
+
| Strip trailing whitespace per line | All files | Never has semantic meaning |
|
|
195
|
+
| Remove leading/trailing blank lines | All files | Per-file trimming |
|
|
196
|
+
| Compact JSON | `.json` files | Re-serialise without indentation |
|
|
197
|
+
| Max 2 consecutive blank lines | `.md` / `.mdx` | Preserves paragraph semantics |
|
|
198
|
+
| Max 1 consecutive blank line | All other files | Removes filler blank lines without touching indentation |
|
|
199
|
+
|
|
200
|
+
**What is never modified:**
|
|
201
|
+
- Indentation (critical for Python, YAML, Makefiles)
|
|
202
|
+
- Multiple spaces within a line (may be in string literals)
|
|
203
|
+
- Comments
|
|
204
|
+
- Code logic
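
The optimisation rules and the "never modified" guarantees above can be sketched as a single pure function. This is an illustration under assumptions, not the code in `llmGenerator.js`: the name `optimiseForLlm`, the order of the steps, and the fallback for invalid JSON are all hypothetical.

```javascript
// Hypothetical sketch of the lossless optimisations described in the table.
function optimiseForLlm(text, extension) {
  // 1. Normalise line endings.
  let out = text.replace(/\r\n/g, '\n');

  // 2. Compact JSON by re-serialising without indentation.
  //    Assumption: invalid JSON falls through to the generic rules.
  if (extension === '.json') {
    try {
      return JSON.stringify(JSON.parse(out));
    } catch {
      // not valid JSON, treat as plain text
    }
  }

  // 3. Strip trailing whitespace per line; leading indentation is untouched.
  out = out
    .split('\n')
    .map((line) => line.replace(/[ \t]+$/, ''))
    .join('\n');

  // 4. Remove leading and trailing blank lines.
  out = out.replace(/^\n+/, '').replace(/\n+$/, '');

  // 5. Cap consecutive blank lines: 2 for Markdown, 1 for everything else.
  //    N blank lines correspond to N + 1 consecutive newline characters.
  const maxBlank = extension === '.md' || extension === '.mdx' ? 2 : 1;
  const longRun = new RegExp(`\\n{${maxBlank + 2},}`, 'g');
  return out.replace(longRun, '\n'.repeat(maxBlank + 1));
}
```

Spaces inside a line and comments are never touched by any step, which matches the guarantees listed above.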
|
|
205
|
+
|
|
206
|
+
File naming: `PROJECTNAME_llm.md` → `PROJECTNAME_llm-v1.md` → ...
|
|
207
|
+
|
|
208
|
+
#### 2.2.4 Both (`--format both`)
|
|
209
|
+
|
|
210
|
+
Runs PDF and RAG generation in sequence from a single scan pass. Generation is continue-on-error: if one format fails, the other is still attempted. The process exits with code 1 if either format failed.
|
|
211
|
+
|
|
212
|
+
---
|
|
213
|
+
|
|
214
|
+
### 2.3 Configuration Management
|
|
215
|
+
|
|
216
|
+
#### 2.3.1 Storage Locations
|
|
217
|
+
|
|
218
|
+
- **Linux/macOS**: `~/.codesummary/config.json`
|
|
219
|
+
- **Windows**: `%APPDATA%\CodeSummary\config.json`
|
|
220
|
+
|
|
221
|
+
#### 2.3.2 Configuration Structure
|
|
222
|
+
|
|
223
|
+
```json
|
|
224
|
+
{
|
|
225
|
+
"configVersion": "1.1.0",
|
|
226
|
+
"output": {
|
|
227
|
+
"mode": "fixed | relative",
|
|
228
|
+
"fixedPath": "absolute path"
|
|
229
|
+
},
|
|
230
|
+
"allowedExtensions": ["array of extensions"],
|
|
231
|
+
"excludeDirs": ["array of directory names"],
|
|
232
|
+
"excludeFiles": ["array of glob patterns"],
|
|
233
|
+
"styles": { "colors": {}, "layout": {}, "fonts": {} },
|
|
234
|
+
"settings": {
|
|
235
|
+
"documentTitle": "Project Code Summary",
|
|
236
|
+
"maxFilesBeforePrompt": 500
|
|
237
|
+
}
|
|
238
|
+
}
|
|
239
|
+
```
|
|
240
|
+
|
|
241
|
+
#### 2.3.3 Smart Migration
|
|
242
|
+
|
|
243
|
+
On every run, new defaults are merged into the existing config using `smartMergeArrays`:
|
|
244
|
+
- Items already present are kept in place
|
|
245
|
+
- New items are appended at the end
|
|
246
|
+
- User removals are respected (removed items are not re-added)
|
|
247
|
+
- Changes are saved automatically and the user is notified
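
The merge rules above can be sketched as follows. The real signature of `smartMergeArrays` is not shown in this document, so the parameters here are assumptions. In particular, the sketch assumes the tool can tell which defaults shipped previously (`previousDefaults`), which is one plausible way to distinguish "removed by the user" from "new in this release".

```javascript
// Hypothetical sketch: signature and removal detection are assumptions.
function smartMergeArrays(userItems, newDefaults, previousDefaults = []) {
  const present = new Set(userItems);
  const shippedBefore = new Set(previousDefaults);
  const merged = [...userItems]; // existing items keep their position
  const added = [];

  for (const item of newDefaults) {
    if (present.has(item)) continue; // already in the user's config
    if (shippedBefore.has(item)) continue; // shipped earlier, so the user removed it
    merged.push(item); // genuinely new default: append at the end
    present.add(item);
    added.push(item);
  }

  // `added` lets the caller decide whether to save and notify the user.
  return { merged, added };
}
```

For example, with a user list of `['.js', '.py']`, previous defaults `['.js', '.ts', '.py']`, and new defaults that add `.astro`, only `.astro` is appended; `.ts` stays removed.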
|
|
248
|
+
|
|
249
|
+
#### 2.3.4 Interactive Editor
|
|
250
|
+
|
|
251
|
+
Sections available via `codesummary config`:
|
|
252
|
+
- Output settings (mode, fixed path)
|
|
253
|
+
- Allowed extensions
|
|
254
|
+
- Excluded directories
|
|
255
|
+
- Excluded file patterns
|
|
256
|
+
- General settings (document title, file warning threshold)
|
|
257
|
+
|
|
258
|
+
---
|
|
259
|
+
|
|
260
|
+
### 2.4 File System Scanning
|
|
261
|
+
|
|
262
|
+
#### 2.4.1 Algorithm
|
|
263
|
+
|
|
264
|
+
1. Recursive directory traversal from `process.cwd()`
|
|
265
|
+
2. Whitelist filtering by allowed extensions
|
|
266
|
+
3. Directory exclusion by exact name match + built-in common-skip list
|
|
267
|
+
4. File exclusion by glob pattern matching
|
|
268
|
+
5. Symlink detection (skipped to avoid loops)
|
|
269
|
+
6. File size limit: 100 MB per file
|
|
270
|
+
7. Duplicate detection via absolute path tracking
|
|
271
|
+
|
|
272
|
+
#### 2.4.2 Supported Extensions (defaults)
|
|
273
|
+
|
|
274
|
+
**Web & JavaScript ecosystem:**
|
|
275
|
+
`.js`, `.jsx`, `.ts`, `.tsx`, `.vue`, `.svelte`, `.astro`, `.mdx`
|
|
276
|
+
|
|
277
|
+
**Backend languages:**
|
|
278
|
+
`.py`, `.java`, `.cs`, `.cpp`, `.c`, `.h`, `.go`, `.rs`, `.swift`, `.kt`, `.scala`, `.rb`, `.php`, `.dart`, `.lua`, `.r`, `.ex`, `.exs`, `.pl`
|
|
279
|
+
|
|
280
|
+
**Web & markup:**
|
|
281
|
+
`.html`, `.css`, `.scss`, `.xml`
|
|
282
|
+
|
|
283
|
+
**Data & config:**
|
|
284
|
+
`.json`, `.yaml`, `.yml`, `.toml`, `.ini`, `.properties`, `.tf`, `.tfvars`, `.env`, `.local`, `.cfg`, `.conf`
|
|
285
|
+
|
|
286
|
+
**Schema & query:**
|
|
287
|
+
`.sql`, `.graphql`, `.gql`, `.proto`, `.prisma`
|
|
288
|
+
|
|
289
|
+
**Scripts:**
|
|
290
|
+
`.sh`, `.bat`, `.ps1`, `.mk`, `.cmake`
|
|
291
|
+
|
|
292
|
+
**Documentation:**
|
|
293
|
+
`.md`, `.mdx`, `.txt`
|
|
294
|
+
|
|
295
|
+
**Specialised:**
|
|
296
|
+
`.ino` (Arduino), `.j2` (Jinja2), `.service`, `.timer` (systemd), `.crt` (certificates), `.csv`, `.tsv`
|
|
297
|
+
|
|
298
|
+
#### 2.4.3 Default Excluded Directories
|
|
299
|
+
|
|
300
|
+
Build output: `dist`, `build`, `out`, `target`
|
|
301
|
+
Dependencies: `node_modules`, `vendor`, `bower_components`
|
|
302
|
+
Caches: `.cache`, `.turbo`, `.gradle`, `.yarn`, `.pnpm-store`, `.pytest_cache`, `.mypy_cache`, `.tox`, `htmlcov`
|
|
303
|
+
VCS / IDE: `.git`, `.vscode`, `.idea`
|
|
304
|
+
Framework: `.next`, `.nuxt`, `.angular`, `.svelte-kit`, `.expo`, `.dart_tool`, `storybook-static`
|
|
305
|
+
Python: `__pycache__`, `venv`, `.venv`
|
|
306
|
+
Infrastructure: `.terraform`
|
|
307
|
+
|
|
308
|
+
#### 2.4.4 Default Excluded File Patterns
|
|
309
|
+
|
|
310
|
+
Lock files: `*-lock.json`, `*.lock`, `composer.lock`, `Pipfile.lock`, `*-lock.yaml`
|
|
311
|
+
Minified: `*.min.js`, `*.min.css`, `*.map`
|
|
312
|
+
Compiled: `*.pyc`, `*.pyo`, `*.class`
|
|
313
|
+
Temporary: `*.log`, `*.tmp`, `*.temp`, `*.swp`, `*.bak`, `*.orig`
|
|
314
|
+
OS metadata: `.DS_Store`, `Thumbs.db`, `desktop.ini`, `ehthumbs.db`
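
Patterns such as `*.min.js` or `*-lock.json` can be matched with a small glob-to-regex conversion. The sketch below is a stand-in for the `matchesGlobPattern` helper listed in `utils.js`, whose actual implementation is not shown here. It assumes that only `*` is a wildcard and that patterns are tested against the file's base name.

```javascript
// Hypothetical stand-in for a glob matcher; only `*` is treated as a wildcard.
function simpleGlobMatch(fileName, pattern) {
  // Escape every regex metacharacter except `*`.
  const escaped = pattern.replace(/[.+^${}()|[\]\\?]/g, '\\$&');
  // `*` matches any run of characters, including none.
  const regex = new RegExp(`^${escaped.replace(/\*/g, '.*')}$`);
  return regex.test(fileName);
}
```

Anchoring the expression with `^` and `$` prevents a pattern like `.DS_Store` from matching longer names that merely contain it.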
|
|
315
|
+
|
|
316
|
+
---
|
|
317
|
+
|
|
318
|
+
### 2.5 Versioned Output Files
|
|
319
|
+
|
|
320
|
+
When a target file already exists, a `-vN` suffix is added instead of overwriting:
|
|
321
|
+
|
|
322
|
+
```
|
|
323
|
+
PROJECTNAME_llm.md ← exists
|
|
324
|
+
PROJECTNAME_llm-v1.md ← created
|
|
325
|
+
← next run: v1 exists
|
|
326
|
+
PROJECTNAME_llm-v2.md ← created
|
|
327
|
+
```
|
|
328
|
+
|
|
329
|
+
Applies to all three formats. Existing `-vN` suffixes are stripped before re-versioning to avoid `name-v1-v1.md`.
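
The versioning rule can be sketched as a pure function. This is not the `resolveVersionedPath` implementation from `utils.js`; the name `nextVersionedName` and the injected `exists` predicate are assumptions made so the logic can be shown without touching the file system.

```javascript
// Hypothetical sketch: pick the first free name in the sequence
// name.ext, name-v1.ext, name-v2.ext, ...
function nextVersionedName(fileName, exists) {
  const dot = fileName.lastIndexOf('.');
  const ext = dot > 0 ? fileName.slice(dot) : '';
  // Strip an existing -vN suffix so "name-v1.md" never becomes "name-v1-v1.md".
  const stem = (dot > 0 ? fileName.slice(0, dot) : fileName).replace(/-v\d+$/, '');

  let candidate = `${stem}${ext}`;
  for (let n = 1; exists(candidate); n += 1) {
    candidate = `${stem}-v${n}${ext}`;
  }
  return candidate;
}
```

In practice `exists` would wrap a file-system check against the output directory; passing it in keeps the naming logic easy to test.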
|
|
330
|
+
|
|
331
|
+
---
|
|
332
|
+
|
|
333
|
+
### 2.6 Non-Interactive Mode
|
|
334
|
+
|
|
335
|
+
`--no-interactive` (or non-TTY stdin) skips:
|
|
336
|
+
- Extension selection checkbox → all detected extensions selected
|
|
337
|
+
- File count confirmation prompt → proceeds automatically
|
|
338
|
+
|
|
339
|
+
Designed for use in CI/CD pipelines and scripted environments.
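
The decision described above can be sketched as follows. The function name and option names are hypothetical; the only real API assumed is Node's `process.stdin.isTTY`, which is `undefined` when stdin is not a terminal.

```javascript
// Hypothetical sketch: decide whether to prompt or auto-select everything.
function resolveSelection({ noInteractive, stdinIsTTY, detectedExtensions }) {
  // Prompts run only when the flag is absent AND stdin is a terminal.
  const interactive = !noInteractive && Boolean(stdinIsTTY);
  if (interactive) {
    return { prompt: true, extensions: null }; // caller shows the checkbox prompt
  }
  // Non-interactive: every detected extension is selected automatically.
  return { prompt: false, extensions: [...detectedExtensions] };
}
```

A caller would typically pass `stdinIsTTY: process.stdin.isTTY`, so that a piped or CI invocation behaves like `--no-interactive` even without the flag.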
|
|
340
|
+
|
|
341
|
+
---
|
|
342
|
+
|
|
343
|
+
### 2.7 Error Handling
|
|
344
|
+
|
|
345
|
+
- **Path traversal prevention**: blocks `..`, null bytes, and Windows reserved names
|
|
346
|
+
- **Non-ASCII path support**: Unicode characters in paths (e.g. `C:\Users\Andrés\...`) are preserved correctly
|
|
347
|
+
- **Graceful scan errors**: permission denied and missing files are logged but don't abort the scan
|
|
348
|
+
- **PDF stream errors**: file-in-use (EBUSY/EACCES) triggers versioned filename fallback
|
|
349
|
+
- **LLM/RAG errors**: unreadable files emit a warning block in output instead of crashing
|
|
350
|
+
- **`--format both` failures**: continue-on-error; both outputs attempted, all errors reported together
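
The path checks in the first two bullets can be sketched as below. This is an illustration, not the logic in `errorHandler.js`: the function name, the return values, and the choice to inspect path segments one by one are assumptions.

```javascript
// Hypothetical sketch of path validation: returns a reason string or null.
const WINDOWS_RESERVED = /^(con|prn|aux|nul|com[1-9]|lpt[1-9])$/i;

function findPathProblem(inputPath) {
  if (inputPath.includes('\0')) return 'null byte';

  // Split on both separators so Windows and POSIX paths are handled alike.
  const segments = inputPath.split(/[\\/]+/).filter(Boolean);
  for (const segment of segments) {
    if (segment === '..') return 'path traversal';
    // Reserved device names are invalid with or without an extension.
    const stem = segment.split('.')[0];
    if (WINDOWS_RESERVED.test(stem)) return 'reserved name';
  }
  return null; // non-ASCII characters pass through untouched
}
```

Because nothing here filters by character range, a path such as `C:\Users\Andrés\docs` is accepted unchanged.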
|
|
351
|
+
|
|
352
|
+
---
|
|
353
|
+
|
|
354
|
+
## 3. Technical Architecture
|
|
355
|
+
|
|
356
|
+
### 3.1 Module Structure
|
|
357
|
+
|
|
358
|
+
```
|
|
359
|
+
src/
|
|
360
|
+
├── cli.js # Argument parsing, orchestration, user interaction
|
|
361
|
+
├── scanner.js # Recursive directory scanning and filtering
|
|
362
|
+
├── pdfGenerator.js # PDF generation (PDFKit, streaming)
|
|
363
|
+
├── ragGenerator.js # RAG JSON generation with semantic chunking
|
|
364
|
+
├── llmGenerator.js # LLM Markdown generation with content optimisations
|
|
365
|
+
├── configManager.js # Global config load, save, migrate, edit
|
|
366
|
+
├── ragConfig.js # RAG-specific config (YAML loading, defaults)
|
|
367
|
+
├── errorHandler.js # Path validation, sanitisation, global error handlers
|
|
368
|
+
└── utils.js # Shared: formatFileSize, getExtensionDescription,
|
|
369
|
+
# matchesGlobPattern, resolveVersionedPath
|
|
370
|
+
```
|
|
371
|
+
|
|
372
|
+
### 3.2 Data Flow
|
|
373
|
+
|
|
374
|
+
```
|
|
375
|
+
bin/codesummary.js
|
|
376
|
+
└─ src/index.js (bootstrap)
|
|
377
|
+
└─ src/cli.js (parse args → executeMainFlow)
|
|
378
|
+
├─ scanner.js (scan → filesByExtension)
|
|
379
|
+
├─ pdfGenerator.js (format: pdf)
|
|
380
|
+
├─ ragGenerator.js (format: rag) ← uses ragConfig.js
|
|
381
|
+
└─ llmGenerator.js (format: llm)
|
|
382
|
+
```
|
|
383
|
+
|
|
384
|
+
### 3.3 Key Design Decisions
|
|
385
|
+
|
|
386
|
+
- **ESM modules** throughout (`"type": "module"`)
|
|
387
|
+
- **No singleton exports** — all modules export classes, instantiated at call site
|
|
388
|
+
- **Shared utilities** in `utils.js` — single source of truth, no duplication
|
|
389
|
+
- **Streaming writes** for PDF and LLM output — memory-efficient on large projects
|
|
390
|
+
- **Static imports** only — dynamic `import()` avoided for consistency
|
|
391
|
+
|
|
392
|
+
---
|
|
393
|
+
|
|
394
|
+
## 4. Security
|
|
395
|
+
|
|
396
|
+
- Path traversal (`..`) blocked via pattern matching before any file operation
|
|
397
|
+
- User-supplied paths sanitised: control characters and injection sequences removed
|
|
398
|
+
- Unicode characters in paths preserved (non-ASCII allowed)
|
|
399
|
+
- Windows reserved device names (CON, NUL, COM1, etc.) rejected
|
|
400
|
+
- No external network calls at runtime — fully offline operation
|
|
401
|
+
- Config validated against schema before use; corrupt config prompts reset
|
|
402
|
+
|
|
403
|
+
---
|
|
404
|
+
|
|
405
|
+
## 5. Future Enhancements
|
|
406
|
+
|
|
407
|
+
- `--format all`: PDF + RAG + LLM in a single pass
|
|
408
|
+
- Syntax highlighting in PDF output
|
|
409
|
+
- Clickable table of contents with bookmarks in PDF
|
|
410
|
+
- Git integration: document only changed files since last commit
|
|
411
|
+
- CI/CD plugin for automated documentation on push
|
|
412
|
+
- Custom PDF themes and styling
|
|
413
|
+
|
|
414
|
+
---
|
|
415
|
+
|
|
416
|
+
**Document Version**: 3.0
|
|
417
|
+
**Last Updated**: 2026-04-05
|
|
418
|
+
**Status**: Implementation Complete
|