codesummary 1.2.1 → 1.2.2
- package/CHANGELOG.md +26 -213
- package/README.md +61 -395
- package/features.md +25 -386
- package/package.json +13 -17
- package/src/ai/errors.js +85 -0
- package/src/ai/featureFlags.js +8 -0
- package/src/ai/promptTemplates.js +337 -0
- package/src/ai/providerClient.js +81 -0
- package/src/ai/providers/ollama.js +92 -0
- package/src/ai/providers/openaiCompatible.js +96 -0
- package/src/analysis/repositorySignals.js +196 -0
- package/src/cli.js +819 -77
- package/src/configManager.js +21 -0
- package/src/graph/adapters/baseAdapter.js +24 -0
- package/src/graph/adapters/javascriptAdapter.js +53 -0
- package/src/graph/adapters/pythonAdapter.js +77 -0
- package/src/graph/graphEngine.js +151 -0
- package/src/graph/graphMetrics.js +79 -0
- package/src/graph/graphSchema.js +30 -0
- package/src/graph/universalExtractor.js +29 -0
- package/src/llmGenerator.js +723 -8
- package/src/pdfGenerator.js +1189 -275
- package/src/renderers/llmSummaryRenderer.js +14 -0
- package/src/renderers/pdfThemeRenderer.js +685 -0
- package/src/scanner.js +115 -8
- package/rag-schema.json +0 -114
- package/src/ragConfig.js +0 -369
- package/src/ragGenerator.js +0 -1740
package/CHANGELOG.md
CHANGED
@@ -5,231 +5,44 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [1.2.
+## [1.2.2] - 2026-04-18
 
 ### ✨ New Features
-
-
--
--
--
--
-
-#### Versioned output filenames
-- All output formats (PDF, RAG JSON, LLM Markdown) now use `-v1`, `-v2` suffixes when the target file already exists
-- Replaces the previous timestamp-based fallback for PDF; now consistent across all formats
-
-#### Non-interactive mode fixed
-- `--no-interactive` flag now correctly skips the extension selection prompt and the large-project confirmation
-- Also activates automatically when stdin is not a TTY (CI/CD environments)
+- **Deep Dependency Analysis**: Advanced graph engine with centrality and hub detection.
+- **Language Adapters**: Deep extraction for JS/TS and Python; regex-based fallback for others.
+- **Semantic Clustering**: Heuristic grouping of project files by capability (API, Logic, UI, etc.).
+- **Focused Context Packs**: `--focus` and `--max-tokens` flags for task-specific documentation.
+- **AI-Assisted Briefs**: Optional semantic enrichment via Ollama or OpenAI-compatible providers.
+- **Structured Artifacts**: Emission of `.llmsummary.json` for agentic tool integration.
+- **VS Code Extension (MVP)**: Integrated commands for VS Code users.
 
 ### 🔧 Improvements
+- **Security**: Implemented robust `.csignore` support with nested inheritance rules.
+- **CLI UX**: Added explicit risk confirmation when enabling AI enrichment without `.csignore`.
+- **PDF Layout**: Refined cover design, zebra tables, and syntax-highlighted headers.
+- **Token Efficiency**: Enhanced lossless compression in LLM Markdown output.
 
-
--
-- `ragConfig.js` now exports the class instead of a singleton instance, eliminating shared state between runs
-- `ragGenerator.js` imports switched to static; `RagConfigManager` instantiated locally
-- `RagGenerator` constructor no longer accepts an unused `config` parameter
-- `cli.js` uses `createRequire` for version reading — fixes fragile Windows path hack
-
-#### Bug fixes
-- `validatePath`: removed false positive that blocked valid absolute Windows paths (e.g. `C:\Users\Name\...`) when passed as `--output`
-- `sanitizeInput` with `allowPath: true`: now preserves non-ASCII characters in paths (e.g. accented letters in user profile directories on Windows)
-- Two additional `sanitizeInput` call sites in `cli.js` updated to pass `allowPath: true`
-- `migrateConfig` side-channel (`_pendingNotification`) replaced with explicit return value
-
-#### Extended defaults
-- 19 new `allowedExtensions`: `.toml`, `.ini`, `.properties`, `.tf`, `.tfvars`, `.proto`, `.prisma`, `.dart`, `.lua`, `.r`, `.ex`, `.exs`, `.pl`, `.mk`, `.cmake`, `.mdx`, `.astro`, `.graphql`, `.gql`
-- 13 previously user-only extensions promoted to defaults: `.ps1`, `.cfg`, `.conf`, `.env`, `.local`, `.service`, `.timer`, `.ino`, `.j2`, `.csv`, `.tsv`, `.crt`, `.sql`
-- 18 new `excludeDirs`: `.idea`, `target`, `.gradle`, `venv`, `.venv`, `.pytest_cache`, `.mypy_cache`, `.tox`, `.terraform`, `.turbo`, `.angular`, `.svelte-kit`, `.yarn`, `.pnpm-store`, `.expo`, `.dart_tool`, `storybook-static`, `htmlcov`
-- 11 new `excludeFiles`: `*.pyc`, `*.pyo`, `*.class`, `*.log`, `*.tmp`, `*.temp`, `*.swp`, `*.bak`, `*.orig`, `desktop.ini`, `ehthumbs.db`
-
-### 📋 Migration Notes
-- No breaking changes to existing configuration or CLI flags
-- Existing config files are migrated automatically on first run — new extensions, dirs, and file patterns are appended; customisations are preserved
-- `--format pdf` (explicit) and bare `codesummary` behaviour unchanged
-
-## [1.1.1] - 2025-07-31
-
-### 🔧 **Fixes & Improvements**
-
-#### **CLI Enhancements**
-- **Added Version Flag**: New `--version` and `-v` flags to display current version
-- **Cross-Platform Compatibility**: Fixed Windows path resolution for version detection
-- **Help Documentation**: Updated help text to include version option
-
-#### **Dependency Cleanup**
-- **Removed Deprecated Crypto**: Eliminated `crypto@1.0.1` dependency (now uses built-in Node.js crypto)
-- **Security Improvement**: No more npm warnings about deprecated packages
-- **Cleaner Dependencies**: Reduced package footprint
-
-#### **Bug Fixes**
-- **Merge Conflicts**: Resolved conflicts between main and develop branches
-- **CLI Argument Parsing**: Fixed unknown option error for `--version` flag
-
-### 📋 **Migration Notes**
-- No breaking changes
-- Existing installations will benefit from cleaner dependencies
-- New `--version` flag available immediately after update
+### 🗑️ Removed
+- **RAG Legacy**: Fully retired old RAG-specific modules and configurations.
 
 ---
 
-## [1.
-
-### 🎉 Major Features Added
-
-#### 🔧 **Complete RAG System Refactoring**
-- **Atomic JSON Generation**: Eliminated streaming-based approach that caused JSON corruption
-- **100% Thread-Safe Processing**: All files processed in memory before writing
-- **Robust Error Handling**: No more duplicate keys or malformed JSON output
-- **Performance Boost**: ~107 more chunks generated with improved stability
-
-#### 📊 **Precision Offset Index System**
-- **Complete fileOffsets**: Format `fileId -> [start, end]` for rapid file seeking
-- **Detailed chunkOffsets**: Individual chunk positions with `jsonStart`, `jsonEnd`, `contentStart`, `contentEnd`
-- **99.8% Precision**: 509/510 chunks with valid byte-accurate offsets
-- **RAG-Optimized**: Enables high-performance vector database operations
-
-#### 🧠 **Enhanced Token Estimation Engine**
-- **Multi-Heuristic Algorithm**: Replaces simple `ceil(length/4)` with sophisticated analysis
-- **Language-Aware Processing**: Specialized calculations for JavaScript, Python, Java, C++, etc.
-- **Syntax Analysis**: Accounts for brackets, operators, and language-specific tokens
-- **20% More Accurate**: Example: 100 chars JavaScript goes from 25 → 30 tokens
-
-#### 📈 **Complete Processing Statistics**
-- **Real-Time Metrics**: Processing time, throughput, bytes written
-- **Quality Assurance**: Empty files count, chunks with valid offsets
-- **Performance Tracking**: `bytesPerSecond`, `avgFileSize`, `avgChunksPerFile`
-- **Error Collection**: Detailed error tracking and reporting
-
-#### 🔄 **Future-Proof Schema System**
-- **Schema Versioning**: `schemaVersion: "1.0"` for migration management
-- **Method Tracking**: `tokenEstimationMethod: "enhanced_heuristic_v1.0"`
-- **Schema URL**: Links to official schema definition for validation
-- **Backward Compatibility**: Maintains compatibility with existing consumers
-
-### 🛠️ **Technical Improvements**
-
-#### **Code Quality & Architecture**
-- Eliminated 5+ problematic streaming methods (`streamingGeneration`, `writeMainBody`, etc.)
-- Consolidated to single `generate()` method for clarity
-- Removed global state variables that caused race conditions
-- Enhanced function detection regex for better semantic chunking
-
-#### **Performance Optimizations**
-- **Processing Speed**: 510 chunks generated in 56ms (vs previous inconsistent timing)
-- **Memory Efficiency**: 18.4 MB/s throughput with atomic processing
-- **Output Size**: Optimized JSON structure - 1.03 MB for comprehensive indexing
-- **Validation**: Built-in JSON structure validation with detailed reporting
-
-#### **Enhanced ScriptHandler**
-- Improved regex patterns for TypeScript interfaces, enums, class methods
-- Better support for `const enum`, `implements`, access modifiers
-- Enhanced arrow function detection with `let`, `var` support
-- More precise function boundary detection with brace matching
-
-### 🐛 **Bugs Fixed**
-
-#### **Critical JSON Corruption Issues**
-- ❌ **Fixed**: Duplicate `index` sections in output JSON
-- ❌ **Fixed**: Negative `processingTimeMs` values
-- ❌ **Fixed**: Inconsistent chunk counts between sections
-- ❌ **Fixed**: Missing or incorrect byte offsets
-- ❌ **Fixed**: Malformed JSON due to concurrent writes
-- ❌ **Fixed**: Stream truncation issues with large files
-
-#### **Data Integrity Issues**
-- ❌ **Fixed**: Inconsistent statistics across different JSON sections
-- ❌ **Fixed**: Incorrect `totalBytes` calculations
-- ❌ **Fixed**: Missing `chunkOffsets` for seek operations
-- ❌ **Fixed**: Race conditions in multi-file processing
-
-### 📊 **Performance Metrics (Before vs After)**
-
-| Metric | v1.0.2 | v1.1.0 | Improvement |
-|--------|--------|--------|-------------|
-| JSON Validity | ❌ Corrupted | ✅ 100% Valid | +100% |
-| Chunk Generation | ~400 chunks | 510 chunks | +27% |
-| Processing Time | Inconsistent | 56ms stable | Consistent |
-| Offset Precision | ~60% valid | 99.8% valid | +66% |
-| Memory Safety | Race conditions | Thread-safe | Stable |
-| Output Size | Bloated/corrupt | 1.03 MB optimized | Efficient |
-
-### 🔍 **API Changes**
-
-#### **New JSON Structure Fields**
-```json
-{
-  "metadata": {
-    "schemaVersion": "1.0",
-    "schemaUrl": "https://github.com/skamoll/CodeSummary/schemas/rag-output.json",
-    "config": {
-      "tokenEstimationMethod": "enhanced_heuristic_v1.0"
-    }
-  },
-  "index": {
-    "chunkOffsets": {
-      "chunk_id": {
-        "jsonStart": 1234,
-        "jsonEnd": 5678,
-        "contentStart": 2000,
-        "contentEnd": 4000,
-        "filePath": "src/file.js"
-      }
-    },
-    "fileOffsets": {
-      "file_id": [startByte, endByte]
-    },
-    "statistics": {
-      "processingTimeMs": 56,
-      "bytesPerSecond": 18404786,
-      "chunksWithValidOffsets": 509,
-      "emptyFiles": 0
-    }
-  }
-}
-```
-
-### 🎯 **Use Cases Enabled**
-
-#### **RAG/Vector Database Applications**
-- **Rapid Content Retrieval**: Use `chunkOffsets` for instant chunk access
-- **Efficient File Processing**: `fileOffsets` enable selective file loading
-- **Quality Metrics**: Statistics help optimize chunk size and processing
-
-#### **Code Analysis Tools**
-- **Semantic Navigation**: Enhanced function detection for better code understanding
-- **Token Budget Planning**: Accurate token estimation for LLM interactions
-- **Processing Monitoring**: Detailed metrics for pipeline optimization
-
-### 🔗 **Migration Guide**
+## [1.2.0] - 2026-04-05
 
-
-
-
-
-4. **Schema**: Check `metadata.schemaVersion` for compatibility
+### ✨ New Features
+- **LLM Markdown Output**: Introduced `--format llm` for token-optimized AI context.
+- **Automatic Versioning**: Replaced timestamped fallbacks with `-v1`, `-v2` suffix logic.
+- **Headless Support**: Improved `--no-interactive` behavior for CI/CD pipelines.
 
-
--
--
--
-- ⚠️ New `index` section - consumers should handle gracefully
+### 🔧 Improvements
+- **Refactoring**: Consolidated core logic into `src/utils.js`.
+- **Windows Support**: Fixed non-ASCII path handling and absolute path validation.
+- **Default Extensions**: Expanded support for 30+ new file types including TOML, Terraform, and Protobuf.
 
 ---
 
-## [1.0
-
-- Bug fixes and stability improvements
-- Enhanced cross-platform compatibility
-
-## [1.0.1] - 2025-07-28
-### Added
-- Initial RAG functionality
-- Basic PDF generation
+## [1.1.0] - 2025-07-31
+- Initial architecture improvements and core CLI stability fixes.
 
 ## [1.0.0] - 2025-07-27
-
-- Initial release
-- Core PDF generation functionality
-- Multi-language support
+- Initial release with core PDF generation functionality.
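
The changelog entries on versioned output filenames (and "Automatic Versioning" under 1.2.0) describe appending `-v1`, `-v2` suffixes when the target file already exists. A minimal sketch of that collision rule follows. It assumes the first collision yields `-v1` and that the suffix is placed before the file extension; neither detail is confirmed by the diff. The name `nextVersionedPath` and its injectable `exists` parameter are hypothetical and are not taken from the codesummary source.

```javascript
// Hypothetical helper illustrating the "-v1, -v2" suffix rule from the changelog.
// It is not the package's actual implementation.
const fs = require('node:fs');
const path = require('node:path');

/**
 * Return `target` unchanged if nothing exists at that path; otherwise return
 * the first `<name>-v<N><ext>` candidate (N = 1, 2, ...) that is still free.
 * `exists` is injectable so the rule can be exercised without touching disk.
 */
function nextVersionedPath(target, exists = fs.existsSync) {
  if (!exists(target)) return target;

  const { dir, name, ext } = path.parse(target);
  for (let n = 1; ; n += 1) {
    const candidate = path.join(dir, `${name}-v${n}${ext}`);
    if (!exists(candidate)) return candidate;
  }
}
```

Passing a predicate as `exists` keeps the naming rule separate from filesystem access. Note that a check followed by a later write is not atomic, so a real tool writing from several processes would still need to handle a race between the two steps.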