codesummary 1.1.1 → 1.2.0

package/CHANGELOG.md CHANGED
@@ -1,191 +1,235 @@
1
- # Changelog
2
-
3
- All notable changes to this project will be documented in this file.
4
-
5
- The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
- and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
-
8
- ## [1.1.1] - 2025-07-31
9
-
10
- ### 🔧 **Fixes & Improvements**
11
-
12
- #### **CLI Enhancements**
13
- - **Added Version Flag**: New `--version` and `-v` flags to display current version
14
- - **Cross-Platform Compatibility**: Fixed Windows path resolution for version detection
15
- - **Help Documentation**: Updated help text to include version option
16
-
17
- #### **Dependency Cleanup**
18
- - **Removed Deprecated Crypto**: Eliminated `crypto@1.0.1` dependency (now uses built-in Node.js crypto)
19
- - **Security Improvement**: No more npm warnings about deprecated packages
20
- - **Cleaner Dependencies**: Reduced package footprint
21
-
22
- #### **Bug Fixes**
23
- - **Merge Conflicts**: Resolved conflicts between main and develop branches
24
- - **CLI Argument Parsing**: Fixed unknown option error for `--version` flag
25
-
26
- ### 📋 **Migration Notes**
27
- - No breaking changes
28
- - Existing installations will benefit from cleaner dependencies
29
- - New `--version` flag available immediately after update
30
-
31
- ---
32
-
33
- ## [1.1.0] - 2025-07-31
34
-
35
- ### 🎉 Major Features Added
36
-
37
- #### 🔧 **Complete RAG System Refactoring**
38
- - **Atomic JSON Generation**: Eliminated streaming-based approach that caused JSON corruption
39
- - **100% Thread-Safe Processing**: All files processed in memory before writing
40
- - **Robust Error Handling**: No more duplicate keys or malformed JSON output
41
- - **Performance Boost**: ~107 more chunks generated with improved stability
42
-
43
- #### 📊 **Precision Offset Index System**
44
- - **Complete fileOffsets**: Format `fileId -> [start, end]` for rapid file seeking
45
- - **Detailed chunkOffsets**: Individual chunk positions with `jsonStart`, `jsonEnd`, `contentStart`, `contentEnd`
46
- - **99.8% Precision**: 509/510 chunks with valid byte-accurate offsets
47
- - **RAG-Optimized**: Enables high-performance vector database operations
48
-
49
- #### 🧠 **Enhanced Token Estimation Engine**
50
- - **Multi-Heuristic Algorithm**: Replaces simple `ceil(length/4)` with sophisticated analysis
51
- - **Language-Aware Processing**: Specialized calculations for JavaScript, Python, Java, C++, etc.
52
- - **Syntax Analysis**: Accounts for brackets, operators, and language-specific tokens
53
- - **20% More Accurate**: Example: 100 chars JavaScript goes from 25 → 30 tokens
54
-
55
- #### 📈 **Complete Processing Statistics**
56
- - **Real-Time Metrics**: Processing time, throughput, bytes written
57
- - **Quality Assurance**: Empty files count, chunks with valid offsets
58
- - **Performance Tracking**: `bytesPerSecond`, `avgFileSize`, `avgChunksPerFile`
59
- - **Error Collection**: Detailed error tracking and reporting
60
-
61
- #### 🔄 **Future-Proof Schema System**
62
- - **Schema Versioning**: `schemaVersion: "1.0"` for migration management
63
- - **Method Tracking**: `tokenEstimationMethod: "enhanced_heuristic_v1.0"`
64
- - **Schema URL**: Links to official schema definition for validation
65
- - **Backward Compatibility**: Maintains compatibility with existing consumers
66
-
67
- ### 🛠️ **Technical Improvements**
68
-
69
- #### **Code Quality & Architecture**
70
- - Eliminated 5+ problematic streaming methods (`streamingGeneration`, `writeMainBody`, etc.)
71
- - Consolidated to single `generate()` method for clarity
72
- - Removed global state variables that caused race conditions
73
- - Enhanced function detection regex for better semantic chunking
74
-
75
- #### **Performance Optimizations**
76
- - **Processing Speed**: 510 chunks generated in 56ms (vs previous inconsistent timing)
77
- - **Memory Efficiency**: 18.4 MB/s throughput with atomic processing
78
- - **Output Size**: Optimized JSON structure - 1.03 MB for comprehensive indexing
79
- - **Validation**: Built-in JSON structure validation with detailed reporting
80
-
81
- #### **Enhanced ScriptHandler**
82
- - Improved regex patterns for TypeScript interfaces, enums, class methods
83
- - Better support for `const enum`, `implements`, access modifiers
84
- - Enhanced arrow function detection with `let`, `var` support
85
- - More precise function boundary detection with brace matching
86
-
87
- ### 🐛 **Bugs Fixed**
88
-
89
- #### **Critical JSON Corruption Issues**
90
- - ❌ **Fixed**: Duplicate `index` sections in output JSON
91
- - ❌ **Fixed**: Negative `processingTimeMs` values
92
- - ❌ **Fixed**: Inconsistent chunk counts between sections
93
- - **Fixed**: Missing or incorrect byte offsets
94
- - ❌ **Fixed**: Malformed JSON due to concurrent writes
95
- - ❌ **Fixed**: Stream truncation issues with large files
96
-
97
- #### **Data Integrity Issues**
98
- - ❌ **Fixed**: Inconsistent statistics across different JSON sections
99
- - ❌ **Fixed**: Incorrect `totalBytes` calculations
100
- - ❌ **Fixed**: Missing `chunkOffsets` for seek operations
101
- - ❌ **Fixed**: Race conditions in multi-file processing
102
-
103
- ### 📊 **Performance Metrics (Before vs After)**
104
-
105
- | Metric | v1.0.2 | v1.1.0 | Improvement |
106
- |--------|--------|--------|-------------|
107
- | JSON Validity | ❌ Corrupted | ✅ 100% Valid | +100% |
108
- | Chunk Generation | ~400 chunks | 510 chunks | +27% |
109
- | Processing Time | Inconsistent | 56ms stable | Consistent |
110
- | Offset Precision | ~60% valid | 99.8% valid | +66% |
111
- | Memory Safety | Race conditions | Thread-safe | Stable |
112
- | Output Size | Bloated/corrupt | 1.03 MB optimized | Efficient |
113
-
114
- ### 🔍 **API Changes**
115
-
116
- #### **New JSON Structure Fields**
117
- ```json
118
- {
119
- "metadata": {
120
- "schemaVersion": "1.0",
121
- "schemaUrl": "https://github.com/skamoll/CodeSummary/schemas/rag-output.json",
122
- "config": {
123
- "tokenEstimationMethod": "enhanced_heuristic_v1.0"
124
- }
125
- },
126
- "index": {
127
- "chunkOffsets": {
128
- "chunk_id": {
129
- "jsonStart": 1234,
130
- "jsonEnd": 5678,
131
- "contentStart": 2000,
132
- "contentEnd": 4000,
133
- "filePath": "src/file.js"
134
- }
135
- },
136
- "fileOffsets": {
137
- "file_id": [startByte, endByte]
138
- },
139
- "statistics": {
140
- "processingTimeMs": 56,
141
- "bytesPerSecond": 18404786,
142
- "chunksWithValidOffsets": 509,
143
- "emptyFiles": 0
144
- }
145
- }
146
- }
147
- ```
148
-
149
- ### 🎯 **Use Cases Enabled**
150
-
151
- #### **RAG/Vector Database Applications**
152
- - **Rapid Content Retrieval**: Use `chunkOffsets` for instant chunk access
153
- - **Efficient File Processing**: `fileOffsets` enable selective file loading
154
- - **Quality Metrics**: Statistics help optimize chunk size and processing
155
-
156
- #### **Code Analysis Tools**
157
- - **Semantic Navigation**: Enhanced function detection for better code understanding
158
- - **Token Budget Planning**: Accurate token estimation for LLM interactions
159
- - **Processing Monitoring**: Detailed metrics for pipeline optimization
160
-
161
- ### 🔗 **Migration Guide**
162
-
163
- #### **From v1.0.x to v1.1.0**
164
- 1. **JSON Structure**: New `index` section with detailed offsets - update parsers
165
- 2. **Token Estimates**: Values may be ~20% higher due to improved accuracy
166
- 3. **Statistics**: New fields available in `index.statistics`
167
- 4. **Schema**: Check `metadata.schemaVersion` for compatibility
168
-
169
- #### **Backward Compatibility**
170
- - ✅ All existing `metadata` and `files` sections unchanged
171
- - ✅ Chunk structure remains the same
172
- - ✅ CLI interface identical
173
- - ⚠️ New `index` section - consumers should handle gracefully
174
-
175
- ---
176
-
177
- ## [1.0.2] - 2025-07-29
178
- ### Fixed
179
- - Bug fixes and stability improvements
180
- - Enhanced cross-platform compatibility
181
-
182
- ## [1.0.1] - 2025-07-28
183
- ### Added
184
- - Initial RAG functionality
185
- - Basic PDF generation
186
-
187
- ## [1.0.0] - 2025-07-27
188
- ### Added
189
- - Initial release
190
- - Core PDF generation functionality
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [1.2.0] - 2026-04-05
9
+
10
+ ### New Features
11
+
12
+ #### LLM Markdown output (`--format llm`)
13
+ - New output format that generates a single Markdown file optimised for direct use with any chat-based LLM
14
+ - Includes project header, file tree, and full file contents in fenced code blocks with language hints
15
+ - Lossless optimisations applied automatically: line ending normalisation, trailing whitespace removal, blank line collapsing, JSON compaction
16
+ - File naming follows the same versioning scheme as other formats
17
+
18
+ #### Versioned output filenames
19
+ - All output formats (PDF, RAG JSON, LLM Markdown) now use `-v1`, `-v2` suffixes when the target file already exists
20
+ - Replaces the previous timestamp-based fallback for PDF; now consistent across all formats
21
+
22
+ #### Non-interactive mode fixed
23
+ - `--no-interactive` flag now correctly skips the extension selection prompt and the large-project confirmation
24
+ - Also activates automatically when stdin is not a TTY (CI/CD environments)
25
+
26
+ ### 🔧 Improvements
27
+
28
+ #### Architecture
29
+ - Extracted shared utilities (`formatFileSize`, `getExtensionDescription`, `matchesGlobPattern`, `resolveVersionedPath`) into `src/utils.js` — no more duplicated code across modules
30
+ - `ragConfig.js` now exports the class instead of a singleton instance, eliminating shared state between runs
31
+ - `ragGenerator.js` imports switched to static; `RagConfigManager` instantiated locally
32
+ - `RagGenerator` constructor no longer accepts an unused `config` parameter
33
+ - `cli.js` uses `createRequire` for version reading — fixes fragile Windows path hack
34
+
35
+ #### Bug fixes
36
+ - `validatePath`: removed false positive that blocked valid absolute Windows paths (e.g. `C:\Users\Name\...`) when passed as `--output`
37
+ - `sanitizeInput` with `allowPath: true`: now preserves non-ASCII characters in paths (e.g. accented letters in user profile directories on Windows)
38
+ - Two additional `sanitizeInput` call sites in `cli.js` updated to pass `allowPath: true`
39
+ - `migrateConfig` side-channel (`_pendingNotification`) replaced with explicit return value
40
+
41
+ #### Extended defaults
42
+ - 19 new `allowedExtensions`: `.toml`, `.ini`, `.properties`, `.tf`, `.tfvars`, `.proto`, `.prisma`, `.dart`, `.lua`, `.r`, `.ex`, `.exs`, `.pl`, `.mk`, `.cmake`, `.mdx`, `.astro`, `.graphql`, `.gql`
43
+ - 13 previously user-only extensions promoted to defaults: `.ps1`, `.cfg`, `.conf`, `.env`, `.local`, `.service`, `.timer`, `.ino`, `.j2`, `.csv`, `.tsv`, `.crt`, `.sql`
44
+ - 18 new `excludeDirs`: `.idea`, `target`, `.gradle`, `venv`, `.venv`, `.pytest_cache`, `.mypy_cache`, `.tox`, `.terraform`, `.turbo`, `.angular`, `.svelte-kit`, `.yarn`, `.pnpm-store`, `.expo`, `.dart_tool`, `storybook-static`, `htmlcov`
45
+ - 11 new `excludeFiles`: `*.pyc`, `*.pyo`, `*.class`, `*.log`, `*.tmp`, `*.temp`, `*.swp`, `*.bak`, `*.orig`, `desktop.ini`, `ehthumbs.db`
46
+
47
+ ### 📋 Migration Notes
48
+ - No breaking changes to existing configuration or CLI flags
49
+ - Existing config files are migrated automatically on first run — new extensions, dirs, and file patterns are appended; customisations are preserved
50
+ - `--format pdf` (explicit) and bare `codesummary` behaviour unchanged
51
+
52
+ ## [1.1.1] - 2025-07-31
53
+
54
+ ### 🔧 **Fixes & Improvements**
55
+
56
+ #### **CLI Enhancements**
57
+ - **Added Version Flag**: New `--version` and `-v` flags to display current version
58
+ - **Cross-Platform Compatibility**: Fixed Windows path resolution for version detection
59
+ - **Help Documentation**: Updated help text to include version option
60
+
61
+ #### **Dependency Cleanup**
62
+ - **Removed Deprecated Crypto**: Eliminated `crypto@1.0.1` dependency (now uses built-in Node.js crypto)
63
+ - **Security Improvement**: No more npm warnings about deprecated packages
64
+ - **Cleaner Dependencies**: Reduced package footprint
65
+
66
+ #### **Bug Fixes**
67
+ - **Merge Conflicts**: Resolved conflicts between main and develop branches
68
+ - **CLI Argument Parsing**: Fixed unknown option error for `--version` flag
69
+
70
+ ### 📋 **Migration Notes**
71
+ - No breaking changes
72
+ - Existing installations will benefit from cleaner dependencies
73
+ - New `--version` flag available immediately after update
74
+
75
+ ---
76
+
77
+ ## [1.1.0] - 2025-07-31
78
+
79
+ ### 🎉 Major Features Added
80
+
81
+ #### 🔧 **Complete RAG System Refactoring**
82
+ - **Atomic JSON Generation**: Eliminated streaming-based approach that caused JSON corruption
83
+ - **100% Thread-Safe Processing**: All files processed in memory before writing
84
+ - **Robust Error Handling**: No more duplicate keys or malformed JSON output
85
+ - **Performance Boost**: ~107 more chunks generated with improved stability
86
+
87
+ #### 📊 **Precision Offset Index System**
88
+ - **Complete fileOffsets**: Format `fileId -> [start, end]` for rapid file seeking
89
+ - **Detailed chunkOffsets**: Individual chunk positions with `jsonStart`, `jsonEnd`, `contentStart`, `contentEnd`
90
+ - **99.8% Precision**: 509/510 chunks with valid byte-accurate offsets
91
+ - **RAG-Optimized**: Enables high-performance vector database operations
92
+
93
+ #### 🧠 **Enhanced Token Estimation Engine**
94
+ - **Multi-Heuristic Algorithm**: Replaces simple `ceil(length/4)` with sophisticated analysis
95
+ - **Language-Aware Processing**: Specialized calculations for JavaScript, Python, Java, C++, etc.
96
+ - **Syntax Analysis**: Accounts for brackets, operators, and language-specific tokens
97
+ - **20% More Accurate**: Example: 100 chars JavaScript goes from 25 → 30 tokens
98
+
99
+ #### 📈 **Complete Processing Statistics**
100
+ - **Real-Time Metrics**: Processing time, throughput, bytes written
101
+ - **Quality Assurance**: Empty files count, chunks with valid offsets
102
+ - **Performance Tracking**: `bytesPerSecond`, `avgFileSize`, `avgChunksPerFile`
103
+ - **Error Collection**: Detailed error tracking and reporting
104
+
105
+ #### 🔄 **Future-Proof Schema System**
106
+ - **Schema Versioning**: `schemaVersion: "1.0"` for migration management
107
+ - **Method Tracking**: `tokenEstimationMethod: "enhanced_heuristic_v1.0"`
108
+ - **Schema URL**: Links to official schema definition for validation
109
+ - **Backward Compatibility**: Maintains compatibility with existing consumers
110
+
111
+ ### 🛠️ **Technical Improvements**
112
+
113
+ #### **Code Quality & Architecture**
114
+ - Eliminated 5+ problematic streaming methods (`streamingGeneration`, `writeMainBody`, etc.)
115
+ - Consolidated to single `generate()` method for clarity
116
+ - Removed global state variables that caused race conditions
117
+ - Enhanced function detection regex for better semantic chunking
118
+
119
+ #### **Performance Optimizations**
120
+ - **Processing Speed**: 510 chunks generated in 56ms (vs previous inconsistent timing)
121
+ - **Memory Efficiency**: 18.4 MB/s throughput with atomic processing
122
+ - **Output Size**: Optimized JSON structure - 1.03 MB for comprehensive indexing
123
+ - **Validation**: Built-in JSON structure validation with detailed reporting
124
+
125
+ #### **Enhanced ScriptHandler**
126
+ - Improved regex patterns for TypeScript interfaces, enums, class methods
127
+ - Better support for `const enum`, `implements`, access modifiers
128
+ - Enhanced arrow function detection with `let`, `var` support
129
+ - More precise function boundary detection with brace matching
130
+
131
+ ### 🐛 **Bugs Fixed**
132
+
133
+ #### **Critical JSON Corruption Issues**
134
+ - ❌ **Fixed**: Duplicate `index` sections in output JSON
135
+ - ❌ **Fixed**: Negative `processingTimeMs` values
136
+ - ❌ **Fixed**: Inconsistent chunk counts between sections
137
+ - **Fixed**: Missing or incorrect byte offsets
138
+ - ❌ **Fixed**: Malformed JSON due to concurrent writes
139
+ - ❌ **Fixed**: Stream truncation issues with large files
140
+
141
+ #### **Data Integrity Issues**
142
+ - ❌ **Fixed**: Inconsistent statistics across different JSON sections
143
+ - ❌ **Fixed**: Incorrect `totalBytes` calculations
144
+ - ❌ **Fixed**: Missing `chunkOffsets` for seek operations
145
+ - ❌ **Fixed**: Race conditions in multi-file processing
146
+
147
+ ### 📊 **Performance Metrics (Before vs After)**
148
+
149
+ | Metric | v1.0.2 | v1.1.0 | Improvement |
150
+ |--------|--------|--------|-------------|
151
+ | JSON Validity | ❌ Corrupted | ✅ 100% Valid | +100% |
152
+ | Chunk Generation | ~400 chunks | 510 chunks | +27% |
153
+ | Processing Time | Inconsistent | 56ms stable | Consistent |
154
+ | Offset Precision | ~60% valid | 99.8% valid | +66% |
155
+ | Memory Safety | Race conditions | Thread-safe | Stable |
156
+ | Output Size | Bloated/corrupt | 1.03 MB optimized | Efficient |
157
+
158
+ ### 🔍 **API Changes**
159
+
160
+ #### **New JSON Structure Fields**
161
+ ```json
162
+ {
163
+ "metadata": {
164
+ "schemaVersion": "1.0",
165
+ "schemaUrl": "https://github.com/skamoll/CodeSummary/schemas/rag-output.json",
166
+ "config": {
167
+ "tokenEstimationMethod": "enhanced_heuristic_v1.0"
168
+ }
169
+ },
170
+ "index": {
171
+ "chunkOffsets": {
172
+ "chunk_id": {
173
+ "jsonStart": 1234,
174
+ "jsonEnd": 5678,
175
+ "contentStart": 2000,
176
+ "contentEnd": 4000,
177
+ "filePath": "src/file.js"
178
+ }
179
+ },
180
+ "fileOffsets": {
181
+ "file_id": [startByte, endByte]
182
+ },
183
+ "statistics": {
184
+ "processingTimeMs": 56,
185
+ "bytesPerSecond": 18404786,
186
+ "chunksWithValidOffsets": 509,
187
+ "emptyFiles": 0
188
+ }
189
+ }
190
+ }
191
+ ```
192
+
193
+ ### 🎯 **Use Cases Enabled**
194
+
195
+ #### **RAG/Vector Database Applications**
196
+ - **Rapid Content Retrieval**: Use `chunkOffsets` for instant chunk access
197
+ - **Efficient File Processing**: `fileOffsets` enable selective file loading
198
+ - **Quality Metrics**: Statistics help optimize chunk size and processing
199
+
200
+ #### **Code Analysis Tools**
201
+ - **Semantic Navigation**: Enhanced function detection for better code understanding
202
+ - **Token Budget Planning**: Accurate token estimation for LLM interactions
203
+ - **Processing Monitoring**: Detailed metrics for pipeline optimization
204
+
205
+ ### 🔗 **Migration Guide**
206
+
207
+ #### **From v1.0.x to v1.1.0**
208
+ 1. **JSON Structure**: New `index` section with detailed offsets - update parsers
209
+ 2. **Token Estimates**: Values may be ~20% higher due to improved accuracy
210
+ 3. **Statistics**: New fields available in `index.statistics`
211
+ 4. **Schema**: Check `metadata.schemaVersion` for compatibility
212
+
213
+ #### **Backward Compatibility**
214
+ - ✅ All existing `metadata` and `files` sections unchanged
215
+ - ✅ Chunk structure remains the same
216
+ - ✅ CLI interface identical
217
+ - ⚠️ New `index` section - consumers should handle gracefully
218
+
219
+ ---
220
+
221
+ ## [1.0.2] - 2025-07-29
222
+ ### Fixed
223
+ - Bug fixes and stability improvements
224
+ - Enhanced cross-platform compatibility
225
+
226
+ ## [1.0.1] - 2025-07-28
227
+ ### Added
228
+ - Initial RAG functionality
229
+ - Basic PDF generation
230
+
231
+ ## [1.0.0] - 2025-07-27
232
+ ### Added
233
+ - Initial release
234
+ - Core PDF generation functionality
191
235
  - Multi-language support
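
The 1.2.0 entry above says output files receive `-v1`, `-v2` suffixes when the target file already exists. A minimal sketch of that rule follows. It is an illustration only: the changelog names a `resolveVersionedPath` utility in `src/utils.js` but does not show its signature or behaviour, so the helper name `nextVersionedPath`, its `exists` parameter, and the placement of the suffix before the file extension are all assumptions.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper, NOT the package's actual resolveVersionedPath.
// Returns targetPath unchanged when nothing exists there; otherwise returns
// the first free "<name>-vN<ext>" candidate in the same directory, N = 1, 2, ...
// The `exists` parameter is an assumption added so the check can be stubbed.
function nextVersionedPath(targetPath, exists = fs.existsSync) {
  if (!exists(targetPath)) return targetPath;

  const { dir, name, ext } = path.parse(targetPath);
  for (let n = 1; ; n += 1) {
    const candidate = path.join(dir, `${name}-v${n}${ext}`);
    if (!exists(candidate)) return candidate;
  }
}
```

A check followed by a later write can race with another process writing to the same directory. If that matters, opening the chosen path with the `wx` file system flag makes the write fail when the file has appeared in the meantime.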