codesummary 1.1.1 → 1.2.0
- package/CHANGELOG.md +234 -190
- package/LICENSE +674 -674
- package/README.md +483 -607
- package/bin/codesummary.js +12 -12
- package/features.md +418 -502
- package/package.json +95 -95
- package/rag-schema.json +113 -113
- package/src/cli.js +599 -540
- package/src/configManager.js +880 -827
- package/src/errorHandler.js +474 -477
- package/src/index.js +25 -25
- package/src/llmGenerator.js +189 -0
- package/src/pdfGenerator.js +408 -475
- package/src/ragConfig.js +369 -373
- package/src/ragGenerator.js +1739 -1757
- package/src/scanner.js +386 -467
- package/src/utils.js +139 -0
package/CHANGELOG.md
CHANGED
@@ -1,191 +1,235 @@
-# Changelog
-
-All notable changes to this project will be documented in this file.
-
-The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
-and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
-
[removed lines 8-190 of the previous CHANGELOG.md: text not recoverable from this extract]
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [1.2.0] - 2026-04-05
+
+### ✨ New Features
+
+#### LLM Markdown output (`--format llm`)
+- New output format that generates a single Markdown file optimised for direct use with any chat-based LLM
+- Includes project header, file tree, and full file contents in fenced code blocks with language hints
+- Lossless optimisations applied automatically: line ending normalisation, trailing whitespace removal, blank line collapsing, JSON compaction
+- File naming follows the same versioning scheme as other formats
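Reviewer note: the four optimisations listed for `--format llm` could look roughly like the sketch below. The function name `optimiseForLlm` and its `isJson` option are hypothetical; the package's own code in `src/llmGenerator.js` is not visible in this hunk and may differ.

```javascript
// Hypothetical helper illustrating the optimisations named in the changelog.
// Not taken from the package source.
function optimiseForLlm(text, { isJson = false } = {}) {
  if (isJson) {
    try {
      // JSON compaction: re-serialise without insignificant whitespace.
      return JSON.stringify(JSON.parse(text));
    } catch {
      // Not valid JSON: fall through to the generic text clean-up.
    }
  }
  return text
    .replace(/\r\n?/g, "\n") // normalise CRLF and lone CR to LF
    .replace(/[ \t]+$/gm, "") // strip trailing spaces and tabs on each line
    .replace(/\n{3,}/g, "\n\n"); // collapse runs of blank lines to one blank line
}

console.log(JSON.stringify(optimiseForLlm("a  \r\nb\r\n\r\n\r\n\r\nc\t\n")));
```

The JSON branch only touches whitespace between tokens for typical inputs, but a round trip through `JSON.parse` can renormalise number literals, so treat it as an approximation of "lossless".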
+
+#### Versioned output filenames
+- All output formats (PDF, RAG JSON, LLM Markdown) now use `-v1`, `-v2` suffixes when the target file already exists
+- Replaces the previous timestamp-based fallback for PDF; now consistent across all formats
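Reviewer note: a minimal sketch of the `-v1`, `-v2` suffix scheme described above. The changelog names a `resolveVersionedPath` utility, but its signature is not shown here, so the function below uses a different, hypothetical name (`nextVersionedPath`) and an injectable `exists` check to keep it testable.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical sketch; not the package's actual resolveVersionedPath.
// Returns targetPath if it is free, otherwise the first free "-vN" variant.
function nextVersionedPath(targetPath, exists = fs.existsSync) {
  if (!exists(targetPath)) return targetPath;
  const { dir, name, ext } = path.parse(targetPath);
  for (let n = 1; ; n += 1) {
    const candidate = path.join(dir, `${name}-v${n}${ext}`);
    if (!exists(candidate)) return candidate;
  }
}
```

Checking for existence and then writing is not atomic, so two concurrent runs could still pick the same name; a stricter variant would open the file with the `wx` flag and retry on `EEXIST`.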
+
+#### Non-interactive mode fixed
+- `--no-interactive` flag now correctly skips the extension selection prompt and the large-project confirmation
+- Also activates automatically when stdin is not a TTY (CI/CD environments)
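Reviewer note: the two conditions above can be combined in a few lines. `process.stdin.isTTY` is a standard Node.js property (it is `undefined` when stdin is a pipe or redirected file); the helper name `isNonInteractive` is hypothetical and the package's own check may be structured differently.

```javascript
// Hypothetical helper: decide whether interactive prompts should be skipped.
function isNonInteractive(argv = process.argv.slice(2), stdin = process.stdin) {
  // Explicit opt-out via the flag named in the changelog.
  if (argv.includes("--no-interactive")) return true;
  // isTTY is true only when stdin is attached to a terminal.
  return !stdin.isTTY;
}
```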
+
+### 🔧 Improvements
+
+#### Architecture
+- Extracted shared utilities (`formatFileSize`, `getExtensionDescription`, `matchesGlobPattern`, `resolveVersionedPath`) into `src/utils.js` — no more duplicated code across modules
+- `ragConfig.js` now exports the class instead of a singleton instance, eliminating shared state between runs
+- `ragGenerator.js` imports switched to static; `RagConfigManager` instantiated locally
+- `RagGenerator` constructor no longer accepts an unused `config` parameter
+- `cli.js` uses `createRequire` for version reading — fixes fragile Windows path hack
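Reviewer note: for readers unfamiliar with `createRequire`, the sketch below shows how it can load `package.json` relative to a module file without hand-built paths. It is written in CommonJS with a throwaway directory so it runs standalone; in an ES module such as `src/cli.js` the argument would typically be `import.meta.url`. The helper name `readPackageVersion` and the demo layout are assumptions, not the package's code.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");
const { createRequire } = require("node:module");

// Hypothetical helper: read "version" from the package.json one level above
// a module that lives in src/.
function readPackageVersion(moduleFile) {
  // createRequire takes an absolute path or file: URL and returns a require()
  // that resolves relative paths from that location on any platform.
  const requireFrom = createRequire(moduleFile);
  return requireFrom("../package.json").version;
}

// Demo with a temporary package layout: <root>/package.json and <root>/src/.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "pkg-"));
fs.mkdirSync(path.join(root, "src"));
fs.writeFileSync(
  path.join(root, "package.json"),
  JSON.stringify({ name: "demo", version: "1.2.0" })
);
console.log(readPackageVersion(path.join(root, "src", "cli.js")));
```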
+
+#### Bug fixes
+- `validatePath`: removed false positive that blocked valid absolute Windows paths (e.g. `C:\Users\Name\...`) when passed as `--output`
+- `sanitizeInput` with `allowPath: true`: now preserves non-ASCII characters in paths (e.g. accented letters in user profile directories on Windows)
+- Two additional `sanitizeInput` call sites in `cli.js` updated to pass `allowPath: true`
+- `migrateConfig` side-channel (`_pendingNotification`) replaced with explicit return value
+
+#### Extended defaults
+- 19 new `allowedExtensions`: `.toml`, `.ini`, `.properties`, `.tf`, `.tfvars`, `.proto`, `.prisma`, `.dart`, `.lua`, `.r`, `.ex`, `.exs`, `.pl`, `.mk`, `.cmake`, `.mdx`, `.astro`, `.graphql`, `.gql`
+- 13 previously user-only extensions promoted to defaults: `.ps1`, `.cfg`, `.conf`, `.env`, `.local`, `.service`, `.timer`, `.ino`, `.j2`, `.csv`, `.tsv`, `.crt`, `.sql`
+- 18 new `excludeDirs`: `.idea`, `target`, `.gradle`, `venv`, `.venv`, `.pytest_cache`, `.mypy_cache`, `.tox`, `.terraform`, `.turbo`, `.angular`, `.svelte-kit`, `.yarn`, `.pnpm-store`, `.expo`, `.dart_tool`, `storybook-static`, `htmlcov`
+- 11 new `excludeFiles`: `*.pyc`, `*.pyo`, `*.class`, `*.log`, `*.tmp`, `*.temp`, `*.swp`, `*.bak`, `*.orig`, `desktop.ini`, `ehthumbs.db`
+
+### 📋 Migration Notes
+- No breaking changes to existing configuration or CLI flags
+- Existing config files are migrated automatically on first run — new extensions, dirs, and file patterns are appended; customisations are preserved
+- `--format pdf` (explicit) and bare `codesummary` behaviour unchanged
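Reviewer note: "appended; customisations are preserved" suggests an append-only merge of the three list settings. The sketch below illustrates that idea under the assumption that `allowedExtensions`, `excludeDirs`, and `excludeFiles` are top-level arrays in the config object; the real layout and the real `migrateConfig` logic are not shown in this hunk, and both function names here are hypothetical.

```javascript
// Hypothetical sketch: keep the user's entries and order, then append any
// default that is not already present.
function appendMissing(userValues, defaultValues) {
  const seen = new Set(userValues);
  return [...userValues, ...defaultValues.filter((value) => !seen.has(value))];
}

function migrateLists(userConfig, defaults) {
  const migrated = { ...userConfig }; // unrelated settings pass through untouched
  for (const key of ["allowedExtensions", "excludeDirs", "excludeFiles"]) {
    migrated[key] = appendMissing(userConfig[key] ?? [], defaults[key] ?? []);
  }
  return migrated;
}
```

An append-only merge never removes an entry, so a user who deliberately deleted a default would see it return; whether the package guards against that is not stated.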
+
+## [1.1.1] - 2025-07-31
+
+### 🔧 **Fixes & Improvements**
+
+#### **CLI Enhancements**
+- **Added Version Flag**: New `--version` and `-v` flags to display current version
+- **Cross-Platform Compatibility**: Fixed Windows path resolution for version detection
+- **Help Documentation**: Updated help text to include version option
+
+#### **Dependency Cleanup**
+- **Removed Deprecated Crypto**: Eliminated `crypto@1.0.1` dependency (now uses built-in Node.js crypto)
+- **Security Improvement**: No more npm warnings about deprecated packages
+- **Cleaner Dependencies**: Reduced package footprint
+
+#### **Bug Fixes**
+- **Merge Conflicts**: Resolved conflicts between main and develop branches
+- **CLI Argument Parsing**: Fixed unknown option error for `--version` flag
+
+### 📋 **Migration Notes**
+- No breaking changes
+- Existing installations will benefit from cleaner dependencies
+- New `--version` flag available immediately after update
+
+---
+
+## [1.1.0] - 2025-07-31
+
+### 🎉 Major Features Added
+
+#### 🔧 **Complete RAG System Refactoring**
+- **Atomic JSON Generation**: Eliminated streaming-based approach that caused JSON corruption
+- **100% Thread-Safe Processing**: All files processed in memory before writing
+- **Robust Error Handling**: No more duplicate keys or malformed JSON output
+- **Performance Boost**: ~107 more chunks generated with improved stability
+
+#### 📊 **Precision Offset Index System**
+- **Complete fileOffsets**: Format `fileId -> [start, end]` for rapid file seeking
+- **Detailed chunkOffsets**: Individual chunk positions with `jsonStart`, `jsonEnd`, `contentStart`, `contentEnd`
+- **99.8% Precision**: 509/510 chunks with valid byte-accurate offsets
+- **RAG-Optimized**: Enables high-performance vector database operations
+
+#### 🧠 **Enhanced Token Estimation Engine**
+- **Multi-Heuristic Algorithm**: Replaces simple `ceil(length/4)` with sophisticated analysis
+- **Language-Aware Processing**: Specialized calculations for JavaScript, Python, Java, C++, etc.
+- **Syntax Analysis**: Accounts for brackets, operators, and language-specific tokens
+- **20% More Accurate**: Example: 100 chars JavaScript goes from 25 → 30 tokens
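Reviewer note: the only formula the changelog states is the old baseline, `ceil(length/4)`. The sketch below shows that baseline next to a deliberately simplified stand-in for a language-aware estimate. The weight table is hypothetical and was chosen only so that the changelog's own example (100 JavaScript characters, 25 then 30 tokens) comes out; the package's multi-heuristic algorithm is not shown in this hunk and is certainly more involved.

```javascript
// Baseline named in the changelog: one token per four characters, rounded up.
function estimateTokensBaseline(text) {
  return Math.ceil(text.length / 4);
}

// Hypothetical per-language weights in percent; illustrative values only.
const LANGUAGE_WEIGHT_PERCENT = { javascript: 120 };

// Simplified stand-in for a language-aware estimate.
function estimateTokensWeighted(text, language) {
  const weight = LANGUAGE_WEIGHT_PERCENT[language] ?? 100;
  // Integer arithmetic before the division avoids floating-point drift.
  return Math.ceil((text.length * weight) / 400);
}

const sample = "x".repeat(100);
console.log(estimateTokensBaseline(sample), estimateTokensWeighted(sample, "javascript"));
```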
+
+#### 📈 **Complete Processing Statistics**
+- **Real-Time Metrics**: Processing time, throughput, bytes written
+- **Quality Assurance**: Empty files count, chunks with valid offsets
+- **Performance Tracking**: `bytesPerSecond`, `avgFileSize`, `avgChunksPerFile`
+- **Error Collection**: Detailed error tracking and reporting
+
+#### 🔄 **Future-Proof Schema System**
+- **Schema Versioning**: `schemaVersion: "1.0"` for migration management
+- **Method Tracking**: `tokenEstimationMethod: "enhanced_heuristic_v1.0"`
+- **Schema URL**: Links to official schema definition for validation
+- **Backward Compatibility**: Maintains compatibility with existing consumers
+
+### 🛠️ **Technical Improvements**
+
+#### **Code Quality & Architecture**
+- Eliminated 5+ problematic streaming methods (`streamingGeneration`, `writeMainBody`, etc.)
+- Consolidated to single `generate()` method for clarity
+- Removed global state variables that caused race conditions
+- Enhanced function detection regex for better semantic chunking
+
+#### **Performance Optimizations**
+- **Processing Speed**: 510 chunks generated in 56ms (vs previous inconsistent timing)
+- **Memory Efficiency**: 18.4 MB/s throughput with atomic processing
+- **Output Size**: Optimized JSON structure - 1.03 MB for comprehensive indexing
+- **Validation**: Built-in JSON structure validation with detailed reporting
+
+#### **Enhanced ScriptHandler**
+- Improved regex patterns for TypeScript interfaces, enums, class methods
+- Better support for `const enum`, `implements`, access modifiers
+- Enhanced arrow function detection with `let`, `var` support
+- More precise function boundary detection with brace matching
+
+### 🐛 **Bugs Fixed**
+
+#### **Critical JSON Corruption Issues**
+- ❌ **Fixed**: Duplicate `index` sections in output JSON
+- ❌ **Fixed**: Negative `processingTimeMs` values
+- ❌ **Fixed**: Inconsistent chunk counts between sections
+- ❌ **Fixed**: Missing or incorrect byte offsets
+- ❌ **Fixed**: Malformed JSON due to concurrent writes
+- ❌ **Fixed**: Stream truncation issues with large files
+
+#### **Data Integrity Issues**
+- ❌ **Fixed**: Inconsistent statistics across different JSON sections
+- ❌ **Fixed**: Incorrect `totalBytes` calculations
+- ❌ **Fixed**: Missing `chunkOffsets` for seek operations
+- ❌ **Fixed**: Race conditions in multi-file processing
+
+### 📊 **Performance Metrics (Before vs After)**
+
+| Metric | v1.0.2 | v1.1.0 | Improvement |
+|--------|--------|--------|-------------|
+| JSON Validity | ❌ Corrupted | ✅ 100% Valid | +100% |
+| Chunk Generation | ~400 chunks | 510 chunks | +27% |
+| Processing Time | Inconsistent | 56ms stable | Consistent |
+| Offset Precision | ~60% valid | 99.8% valid | +66% |
+| Memory Safety | Race conditions | Thread-safe | Stable |
+| Output Size | Bloated/corrupt | 1.03 MB optimized | Efficient |
+
+### 🔍 **API Changes**
+
+#### **New JSON Structure Fields**
+```json
+{
+  "metadata": {
+    "schemaVersion": "1.0",
+    "schemaUrl": "https://github.com/skamoll/CodeSummary/schemas/rag-output.json",
+    "config": {
+      "tokenEstimationMethod": "enhanced_heuristic_v1.0"
+    }
+  },
+  "index": {
+    "chunkOffsets": {
+      "chunk_id": {
+        "jsonStart": 1234,
+        "jsonEnd": 5678,
+        "contentStart": 2000,
+        "contentEnd": 4000,
+        "filePath": "src/file.js"
+      }
+    },
+    "fileOffsets": {
+      "file_id": [startByte, endByte]
+    },
+    "statistics": {
+      "processingTimeMs": 56,
+      "bytesPerSecond": 18404786,
+      "chunksWithValidOffsets": 509,
+      "emptyFiles": 0
+    }
+  }
+}
+```
+
+### 🎯 **Use Cases Enabled**
+
+#### **RAG/Vector Database Applications**
+- **Rapid Content Retrieval**: Use `chunkOffsets` for instant chunk access
+- **Efficient File Processing**: `fileOffsets` enable selective file loading
+- **Quality Metrics**: Statistics help optimize chunk size and processing
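Reviewer note: byte offsets such as `jsonStart` and `jsonEnd` let a consumer pull one chunk out of a large output file without parsing the whole document. The sketch below demonstrates the idea on a small demo file it writes itself. It assumes offsets are UTF-8 byte positions with an exclusive end; the changelog does not say whether the package's end offsets are inclusive or exclusive, so check that against real output before relying on it. The helper name `readByteRange` is hypothetical.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Read bytes [start, end) of a file without loading the rest of it.
function readByteRange(filePath, start, end) {
  const length = end - start;
  const buffer = Buffer.alloc(length);
  const fd = fs.openSync(filePath, "r");
  try {
    const bytesRead = fs.readSync(fd, buffer, 0, length, start);
    return buffer.subarray(0, bytesRead).toString("utf8");
  } finally {
    fs.closeSync(fd);
  }
}

// Demo: write a tiny JSON file and record where one chunk object sits in it.
const chunk = JSON.stringify({ id: "chunk_1", content: "héllo" });
const prefix = '{"chunks":[';
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "rag-"));
const demoPath = path.join(demoDir, "out.json");
fs.writeFileSync(demoPath, `${prefix}${chunk}]}`);

// Offsets are counted in bytes, not characters ("é" is two bytes in UTF-8).
const jsonStart = Buffer.byteLength(prefix);
const jsonEnd = jsonStart + Buffer.byteLength(chunk);
const parsed = JSON.parse(readByteRange(demoPath, jsonStart, jsonEnd));
console.log(parsed.id);
```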
+
+#### **Code Analysis Tools**
+- **Semantic Navigation**: Enhanced function detection for better code understanding
+- **Token Budget Planning**: Accurate token estimation for LLM interactions
+- **Processing Monitoring**: Detailed metrics for pipeline optimization
+
+### 🔗 **Migration Guide**
+
+#### **From v1.0.x to v1.1.0**
+1. **JSON Structure**: New `index` section with detailed offsets - update parsers
+2. **Token Estimates**: Values may be ~20% higher due to improved accuracy
+3. **Statistics**: New fields available in `index.statistics`
+4. **Schema**: Check `metadata.schemaVersion` for compatibility
+
+#### **Backward Compatibility**
+- ✅ All existing `metadata` and `files` sections unchanged
+- ✅ Chunk structure remains the same
+- ✅ CLI interface identical
+- ⚠️ New `index` section - consumers should handle gracefully
+
+---
+
+## [1.0.2] - 2025-07-29
+### Fixed
+- Bug fixes and stability improvements
+- Enhanced cross-platform compatibility
+
+## [1.0.1] - 2025-07-28
+### Added
+- Initial RAG functionality
+- Basic PDF generation
+
+## [1.0.0] - 2025-07-27
+### Added
+- Initial release
+- Core PDF generation functionality
 - Multi-language support