npm - codesummary - Versions diffs - 1.1.1 → 1.2.1 - Mend

codesummary 1.1.1 → 1.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17)

package/README.md CHANGED

@@ -1,607 +1,483 @@
-# CodeSummary
-[![npm version](https://badge.fury.io/js/codesummary.svg)](https://badge.fury.io/js/codesummary)
-[![Node.js Version](https://img.shields.io/badge/node-%3E%3D18.0.0-brightgreen.svg)](https://nodejs.org/)
-[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
-[![Cross-Platform](https://img.shields.io/badge/platform-Windows%20%7C%20macOS%20%7C%20Linux-lightgrey)](#)
-A **cross-platform CLI tool** that automatically scans project source code and generates both **clean, professional PDF documentation** and **RAG-optimized JSON outputs** for AI/ML applications. Perfect for code reviews, audits, project documentation, archival snapshots, and feeding code into vector databases or LLM systems.
-## 🚀 Key Features
-### 📄 **PDF Generation**
-- **🔍 Intelligent Scanning**: Recursively scans project directories with configurable file type filtering
-- **📄 Clean PDF Output**: Generates well-structured A4 PDFs with optimized formatting and complete content flow
-- **📝 Complete Content**: Includes ALL file content without truncation - no size limits
-### 🤖 **RAG & AI Integration** *(New in v1.1.0)*
-- **📊 RAG-Optimized JSON**: Purpose-built output format for vector databases and LLM applications
-- **🎯 Semantic Chunking**: Intelligent code segmentation by functions, classes, and logical blocks
-- **📈 Precision Offsets**: Byte-accurate indexing for rapid content retrieval (99.8% precision)
-- **🧠 Smart Token Estimation**: Language-aware token counting with 20% improved accuracy
-- **⚡ High-Performance Seeking**: Complete offset index for instant chunk access in RAG pipelines
-- **🔄 Schema Versioning**: Future-proof JSON structure with migration support
-- **⚙️ Global Configuration**: One-time setup with persistent cross-platform user preferences
-- **🎯 Interactive Selection**: Choose which file types to include via intuitive checkbox prompts
-- **🛡️ Safe & Smart**: Whitelist-driven approach prevents binary files, with intelligent fallbacks
-- **🌍 Cross-Platform**: Works identically on Windows, macOS, and Linux with terminal compatibility
-- **📊 Smart Filtering**: Automatically excludes build directories, dependencies, and temporary files
-- **⚡ Performance Optimized**: Efficient memory usage and streaming for large projects
-- **🔄 File Conflict Handling**: Automatic timestamped filenames when original files are in use
-## 📦 Installation
-```bash
-npm install -g codesummary
-```
-**Requirements**: Node.js ≥ 18.0.0
-## 🎯 Dual Output Modes
-### 📄 PDF Mode (Default)
-Generate clean, professional PDF documentation:
-```bash
-codesummary
-# Creates: PROJECT_code.pdf
-```
-### 🤖 RAG Mode *(New!)*
-Generate RAG-optimized JSON for AI applications:
-```bash
-codesummary --rag
-# Creates: PROJECT_rag.json with semantic chunks and precise offsets
-```
-### 🔄 Both Modes
-Generate both PDF and RAG outputs:
-```bash
-codesummary --both
-# Creates: PROJECT_code.pdf + PROJECT_rag.json
-```
-## 🎯 Quick Start
-### 📄 **PDF Generation**
-1. **First-time setup** (interactive wizard):
-   ```bash
-   codesummary
-   ```
-2. **Generate PDF for current project**:
-   ```bash
-   cd /path/to/your/project
-   codesummary
-   ```
-### 🤖 **RAG/AI Integration**
-1. **Generate RAG JSON** for vector databases:
-   ```bash
-   codesummary --rag
-   ```
-2. **Use in your AI pipeline**:
-   ```javascript
-   // Example: Loading and using RAG output
-   const ragData = JSON.parse(fs.readFileSync('project_rag.json'));
-   // Access semantic chunks
-   const chunks = ragData.files.flatMap(f => f.chunks);
-   // Use precise offsets for rapid seeking
-   const chunkId = 'chunk_abc123_0';
-   const offset = ragData.index.chunkOffsets[chunkId];
-   // Seek to offset.contentStart → offset.contentEnd for exact content
-   ```
-3. **Override output location**:
-   ```bash
-   codesummary --rag --output ./ai-data
-   ```
-## 📖 Usage
-### Interactive Workflow
-#### 1. First Run Setup
-```bash
-$ codesummary
-Welcome to CodeSummary!
-No configuration found. Starting setup...
-Where should the PDF be generated by default?
-> [ ] Current working directory (relative mode)
-> [x] Fixed folder (absolute mode)
-Enter absolute path for fixed folder:
-> ~/Desktop/CodeSummaries
-```
-#### 2. Extension Selection
-```bash
-Scanning directory: /path/to/project
-Scan Summary:
-   Extensions found: .js, .ts, .md, .json
-   Total files: 127
-   Total size: 2.4 MB
-Select file extensions to include:
-[x] .js → JavaScript (42 files)
-[x] .ts → TypeScript (28 files)
-[x] .md → Markdown (5 files)
-[ ] .json → JSON (52 files)
-```
-#### 3. Generation Complete
-```bash
-SUCCESS: PDF generation completed successfully!
-Summary:
-   Output: ~/Desktop/CodeSummaries/MYPROJECT_code.pdf
-   Extensions: .js, .ts, .md
-   Total files: 75
-   PDF size: 2.3 MB
-```
-### Command Reference
-| Command                      | Description                             |
-| ---------------------------- | --------------------------------------- |
-| `codesummary`                | Generate PDF documentation (default)    |
-| `codesummary --rag`          | Generate RAG-optimized JSON output     |
-| `codesummary --both`         | Generate both PDF and RAG outputs      |
-| `codesummary config`         | Edit configuration settings             |
-| `codesummary --show-config`  | Display current configuration           |
-| `codesummary --reset-config` | Reset configuration to defaults         |
-| `codesummary --help`         | Show help information                   |
-### Command Line Options
-| Option                | Description                              |
-| --------------------- | ---------------------------------------- |
-| `-o, --output <path>` | Override output directory for this run   |
-| `--rag`               | Generate RAG-optimized JSON output      |
-| `--both`              | Generate both PDF and RAG outputs       |
-| `--show-config`       | Display current configuration            |
-| `--reset-config`      | Reset configuration and run setup wizard |
-| `-h, --help`          | Show help message                        |
-### Examples
-```bash
-# Generate PDF with default settings
-codesummary
-# Generate RAG JSON for AI/ML applications
-codesummary --rag
-# Generate both PDF and RAG outputs
-codesummary --both
-# Save outputs to specific directory
-codesummary --both --output ~/Documents/AIData
-# Edit configuration
-codesummary config
-# View current settings
-codesummary --show-config
-```
-## ⚙️ Configuration
-CodeSummary stores global configuration in:
-- **Linux/macOS**: `~/.codesummary/config.json`
-- **Windows**: `%APPDATA%\\CodeSummary\\config.json`
-### Default Configuration
-```json
-{
-  "output": {
-    "mode": "fixed",
-    "fixedPath": "~/Desktop/CodeSummaries"
-  },
-  "allowedExtensions": [
-    ".json", ".ts", ".js", ".jsx", ".tsx", ".xml", ".html",
-    ".css", ".scss", ".md", ".txt", ".py", ".java", ".cs",
-    ".cpp", ".c", ".h", ".yaml", ".yml", ".sh", ".bat",
-    ".ps1", ".php", ".rb", ".go", ".rs", ".swift", ".kt",
-    ".scala", ".vue", ".svelte", ".dockerfile", ".sql", ".graphql"
-  ],
-  "excludeDirs": [
-    "node_modules", ".git", ".vscode", "dist", "build",
-    "coverage", "out", "__pycache__", ".next", ".nuxt"
-  ],
-  "styles": {
-    "colors": {
-      "title": "#333353",
-      "section": "#00FFB9",
-      "text": "#333333",
-      "error": "#FF4D4D",
-      "footer": "#666666"
-    },
-    "layout": {
-      "marginLeft": 40,
-      "marginTop": 40,
-      "marginRight": 40,
-      "footerHeight": 20
-    }
-  },
-  "settings": {
-    "documentTitle": "Project Code Summary",
-    "maxFilesBeforePrompt": 500
-  }
-}
-```
-## 📋 PDF Structure
-Generated PDFs use **A4 format** with optimized margins and contain three main sections:
-### 1. Project Overview
-- Document title and project name
-- Generation timestamp
-- List of included file types with descriptions
-### 2. File Structure
-- Complete hierarchical listing of all included files
-- Organized by relative paths from project root
-- Sorted alphabetically for easy navigation
-### 3. File Content
-- **Complete source code** for each file (no truncation)
-- Proper formatting with monospace fonts for code
-- Intelligent text wrapping without overlap
-- Natural page breaks when needed
-- Error handling for unreadable files
-## 🤖 RAG JSON Structure *(New in v1.1.0)*
-The RAG-optimized JSON output is purpose-built for AI/ML applications, vector databases, and LLM integration:
-### 📊 **Complete JSON Schema**
-```json
-{
-  "metadata": {
-    "projectName": "MyProject",
-    "generatedAt": "2025-07-31T08:00:00.000Z",
-    "version": "3.1.0",
-    "schemaVersion": "1.0",
-    "schemaUrl": "https://github.com/skamoll/CodeSummary/schemas/rag-output.json",
-    "config": {
-      "maxTokensPerChunk": 1000,
-      "tokenEstimationMethod": "enhanced_heuristic_v1.0"
-    }
-  },
-  "files": [
-    {
-      "id": "abc123def456",
-      "path": "src/component.js",
-      "language": "JavaScript",
-      "size": 2048,
-      "hash": "sha256-...",
-      "chunks": [
-        {
-          "id": "chunk_abc123def456_0",
-          "content": "function myFunction() { ... }",
-          "tokenEstimate": 45,
-          "lineStart": 1,
-          "lineEnd": 15,
-          "chunkingMethod": "semantic-function",
-          "context": "function_myFunction",
-          "imports": ["lodash", "react"],
-          "calls": ["useState", "useEffect"]
-        }
-      ]
-    }
-  ],
-  "index": {
-    "summary": {
-      "fileCount": 42,
-      "chunkCount": 387,
-      "totalBytes": 1048576,
-      "languages": ["JavaScript", "TypeScript"],
-      "extensions": [".js", ".ts"]
-    },
-    "chunkOffsets": {
-      "chunk_abc123def456_0": {
-        "jsonStart": 12045,
-        "jsonEnd": 12389,
-        "contentStart": 12123,
-        "contentEnd": 12356,
-        "filePath": "src/component.js"
-      }
-    },
-    "fileOffsets": {
-      "abc123def456": [8192, 16384]
-    },
-    "statistics": {
-      "processingTimeMs": 245,
-      "bytesPerSecond": 4278190,
-      "chunksWithValidOffsets": 387
-    }
-  }
-}
-```
-### 🎯 **Key RAG Features**
-#### **1. Semantic Chunking**
-- **Function-based segmentation**: Each function, class, or logical block becomes a chunk
-- **Context preservation**: Maintains relationships between code elements
-- **Smart boundaries**: Respects language syntax and structure
-- **Metadata enrichment**: Includes imports, function calls, and context tags
-#### **2. Precision Offsets (99.8% accuracy)**
-- **Byte-accurate positioning**: Exact start/end positions for rapid seeking
-- **Dual offset system**: Both JSON structure and content offsets
-- **Instant retrieval**: No need to parse entire file to access specific chunks
-- **Vector DB optimized**: Perfect for embedding-based retrieval systems
-#### **3. Enhanced Token Estimation**
-- **Language-aware calculation**: JavaScript gets different treatment than Python
-- **Syntax consideration**: Accounts for operators, brackets, and language-specific tokens
-- **20% more accurate**: Better LLM context planning and token budget management
-- **Multiple heuristics**: Character count, word count, and syntax analysis combined
-#### **4. Complete Statistics & Monitoring**
-- **Processing metrics**: Time, throughput, success rates
-- **Quality indicators**: Valid offsets, empty files, error tracking
-- **Project insights**: Language distribution, file sizes, chunk density
-### 🚀 **RAG Integration Examples**
-#### **Vector Database Integration**
-```javascript
-// Load RAG output
-const ragData = JSON.parse(fs.readFileSync('project_rag.json'));
-// Extract chunks for embedding
-const chunks = ragData.files.flatMap(file =>
-  file.chunks.map(chunk => ({
-    id: chunk.id,
-    content: chunk.content,
-    metadata: {
-      filePath: file.path,
-      language: file.language,
-      tokenEstimate: chunk.tokenEstimate,
-      context: chunk.context
-    }
-  }))
-);
-// Create embeddings and store in vector DB
-for (const chunk of chunks) {
-  const embedding = await createEmbedding(chunk.content);
-  await vectorDB.store(chunk.id, embedding, chunk.metadata);
-}
-```
-#### **Rapid Content Retrieval**
-```javascript
-// Fast chunk access using offsets
-const chunkId = 'chunk_abc123def456_15';
-const offset = ragData.index.chunkOffsets[chunkId];
-// Direct file seeking (no JSON parsing needed)
-const fd = fs.openSync('project_rag.json', 'r');
-const buffer = Buffer.alloc(offset.contentEnd - offset.contentStart);
-fs.readSync(fd, buffer, 0, buffer.length, offset.contentStart);
-const chunkContent = buffer.toString();
-```
-#### **LLM Context Building**
-```javascript
-// Smart context assembly
-function buildContext(relevantChunkIds, maxTokens = 4000) {
-  let context = '';
-  let tokenCount = 0;
-  for (const chunkId of relevantChunkIds) {
-    const chunk = findChunkById(chunkId);
-    if (tokenCount + chunk.tokenEstimate <= maxTokens) {
-      context += `// File: ${chunk.filePath}\n${chunk.content}\n\n`;
-      tokenCount += chunk.tokenEstimate;
-    }
-  }
-  return { context, tokenCount };
-}
-```
-### 📈 **Performance Benefits**
-| Operation | Traditional Parsing | RAG Offsets | Speedup |
-|-----------|-------------------|-------------|----------|
-| Single chunk access | ~50ms | ~0.1ms | **500x** |
-| Multiple chunk retrieval | ~200ms | ~0.5ms | **400x** |
-| File-based filtering | ~100ms | ~0.2ms | **500x** |
-| Context assembly | ~300ms | ~1ms | **300x** |
-## 🔧 Advanced Features
-### Smart File Conflict Handling
-When the target PDF file is in use (e.g., open in a PDF viewer), CodeSummary automatically creates a timestamped version:
-```bash
-# Original filename
-MYPROJECT_code.pdf
-# If file is in use, creates:
-MYPROJECT_code_20250729_141602.pdf
-```
-### Large File Processing
-- **No file size limits**: Processes files of any size completely
-- **Progress indicators**: Shows processing status for large files
-- **Memory efficient**: Uses streaming for optimal performance
-- **Smart warnings**: Informs about large files being processed
-### Terminal Compatibility
-- **Universal compatibility**: Works with all terminal types and operating systems
-- **No special characters**: Uses standard ASCII text for maximum compatibility
-- **Clear output**: Color-coded messages with fallback text indicators
-## 🎨 Supported File Types
-CodeSummary supports an extensive range of text-based file formats:
-| Extension | Language/Type  | Extension    | Language/Type |
-| --------- | -------------- | ------------ | ------------- |
-| `.js`     | JavaScript     | `.py`        | Python        |
-| `.ts`     | TypeScript     | `.java`      | Java          |
-| `.jsx`    | React JSX      | `.cs`        | C#            |
-| `.tsx`    | TypeScript JSX | `.cpp`       | C++           |
-| `.json`   | JSON           | `.c`         | C             |
-| `.xml`    | XML            | `.h`         | Header        |
-| `.html`   | HTML           | `.yaml/.yml` | YAML          |
-| `.css`    | CSS            | `.sh`        | Shell Script  |
-| `.scss`   | SCSS           | `.bat`       | Batch File    |
-| `.md`     | Markdown       | `.ps1`       | PowerShell    |
-| `.txt`    | Plain Text     | `.php`       | PHP           |
-| `.go`     | Go             | `.rb`        | Ruby          |
-| `.rs`     | Rust           | `.swift`     | Swift         |
-| `.kt`     | Kotlin         | `.scala`     | Scala         |
-| `.vue`    | Vue.js         | `.svelte`    | Svelte        |
-| `.sql`    | SQL            | `.graphql`   | GraphQL       |
-## 🛠️ Development
-### Project Structure
-```
-codesummary/
-├── bin/
-│   └── codesummary.js      # Global executable entry point
-├── src/
-│   ├── cli.js              # Command line interface
-│   ├── configManager.js    # Global configuration management
-│   ├── scanner.js          # File system scanning and filtering
-│   ├── pdfGenerator.js     # PDF creation and formatting
-│   └── errorHandler.js     # Comprehensive error handling
-├── package.json
-├── README.md
-└── features.md
-```
-### Building from Source
-```bash
-# Clone repository
-git clone https://github.com/skamoll/CodeSummary.git
-cd CodeSummary
-# Install dependencies
-npm install
-# Test the CLI
-node bin/codesummary.js --help
-# Run locally without global install
-node bin/codesummary.js
-```
-## 🔍 Troubleshooting
-### Common Issues
-**Configuration not found**
-- Run `codesummary` to trigger first-time setup
-- Check file permissions in config directory
-**PDF generation fails**
-- Verify output directory permissions
-- Ensure Node.js version ≥18.0.0
-- Close any open PDF viewers on the target file
-**Files not showing up**
-- Check that file extensions are in `allowedExtensions`
-- Verify directories aren't in `excludeDirs` list
-- Ensure files are text-based (not binary)
-**Large project performance**
-- Adjust `maxFilesBeforePrompt` in configuration
-- Use extension filtering to reduce file count
-- CodeSummary handles large files efficiently with streaming
-### Getting Help
-1. Run `codesummary --help` for usage information
-2. Check configuration with `codesummary --show-config`
-3. Reset configuration with `codesummary --reset-config`
-4. Open an issue on [GitHub](https://github.com/skamoll/CodeSummary/issues)
-## 🤝 Contributing
-We welcome contributions! Please see our [Contributing Guidelines](CONTRIBUTING.md) for details.
-### Development Setup
-1. Fork the repository
-2. Clone your fork: `git clone https://github.com/yourusername/CodeSummary.git`
-3. Install dependencies: `npm install`
-4. Create a feature branch: `git checkout -b feature-name`
-5. Make your changes and test thoroughly
-6. Submit a pull request
-## 📄 License
-This project is licensed under the GNU General Public License v3.0 - see the [LICENSE](LICENSE) file for details.
-### License Summary
-- ✅ Commercial use permitted
-- ✅ Modification allowed
-- ✅ Distribution allowed
-- ✅ Private use allowed
-- ❗ Copyleft: derivative works must use GPL-3.0
-- ❗ Must include license and copyright notice
-## 🙏 Acknowledgments
-- Built with [PDFKit](https://pdfkit.org/) for PDF generation
-- Uses [Inquirer.js](https://github.com/SBoudrias/Inquirer.js) for interactive prompts
-- Styled with [Chalk](https://github.com/chalk/chalk) for colorful console output
-- Uses [Ora](https://github.com/sindresorhus/ora) for progress indicators
-## 📊 Roadmap
-### Future Enhancements
-- [ ] Syntax highlighting in PDF output
-- [ ] Clickable table of contents with bookmarks
-- [ ] Multiple output formats (HTML, JSON, Markdown)
-- [ ] Project metrics and code statistics
-- [ ] CI/CD integration mode for automated documentation
-- [ ] Custom PDF themes and styling options
-- [ ] Plugin system for custom processors
-## 📞 Support
-- 📧 Report bugs: [GitHub Issues](https://github.com/skamoll/CodeSummary/issues)
-- 💬 Ask questions: [GitHub Discussions](https://github.com/skamoll/CodeSummary/discussions)
-- 📖 Documentation: [Wiki](https://github.com/skamoll/CodeSummary/wiki)
----
-**Made with ❤️ for developers worldwide**
+# CodeSummary
+[![npm version](https://badge.fury.io/js/codesummary.svg)](https://badge.fury.io/js/codesummary)
+[![Node.js Version](https://img.shields.io/badge/node-%3E%3D18.0.0-brightgreen.svg)](https://nodejs.org/)
+[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
+[![Cross-Platform](https://img.shields.io/badge/platform-Windows%20%7C%20macOS%20%7C%20Linux-lightgrey)](#)
+A **cross-platform CLI tool** that scans your project's source code and generates three types of output:
+- **PDF** — clean, professional A4 documentation for code reviews and audits
+- **RAG JSON** — semantic chunks with byte offsets and token estimates, ready for vector databases and LLM pipelines
+- **LLM Markdown** — token-optimised single file to paste directly into any chat-based LLM
+## 🚀 Key Features
+- **Three output formats**: PDF, RAG JSON, LLM Markdown — pick what you need
+- **Intelligent scanning**: recursive traversal with configurable whitelist filtering
+- **Extensive language support**: 50+ file types out of the box
+- **Versioned output files**: `-v1`, `-v2` suffixes instead of overwriting or timestamps
+- **Non-interactive mode**: `--no-interactive` for CI/CD and scripted use
+- **Smart config migration**: new defaults merge into existing config without overwriting customisations
+- **Cross-platform**: identical behaviour on Windows, macOS, and Linux
+## 📦 Installation
+```bash
+npm install -g codesummary
+```
+**Requirements**: Node.js ≥ 18.0.0
+---
+## 🎯 Output Modes
+### 📄 PDF — Documentation & audits
+Generate a professional A4 PDF with file structure and complete source content:
+```bash
+codesummary
+# or explicitly:
+codesummary --format pdf
+# Output: MYPROJECT_code.pdf
+```
+Best for: code reviews, client handovers, compliance audits, archival snapshots.
+---
+### 🤖 RAG — Vector databases & AI pipelines
+Generate a structured JSON file built for embedding and retrieval:
+```bash
+codesummary --format rag
+# Output: MYPROJECT_rag.json
+```
+The JSON contains semantic chunks (split by function, class, or logical block) with:
+- Byte-accurate content offsets for fast seeking
+- SHA-256 file hashes for deduplication
+- Token estimates for context budget planning
+- Import/call extraction for graph-based retrieval
+- Full statistics for monitoring
+Best for: building RAG systems, loading code into a vector database (Pinecone, Qdrant, Chroma, etc.), or programmatic LLM integration where you control chunking and retrieval.
+---
+### 💬 LLM — Direct chat with any AI assistant
+Generate a single, token-optimised Markdown file to paste directly into any LLM:
+```bash
+codesummary --format llm
+# Output: MYPROJECT_llm.md
+```
+The file contains a project header, a complete file tree, and each file's content in a fenced code block with syntax highlighting hints. These lossless optimisations are applied automatically to reduce token count:
+- Line endings normalised to `\n`
+- Trailing whitespace stripped per line
+- Leading/trailing blank lines removed per file
+- JSON files compacted (re-serialised without indentation)
+- Markdown files: max 2 consecutive blank lines preserved
+- All other files: max 1 consecutive blank line
+Best for: asking any LLM chat interface to review, explain, or work with your codebase in a single paste — much more token-efficient than a PDF.
+---
+### 🔄 Both — PDF + RAG together
+```bash
+codesummary --format both
+# Output: MYPROJECT_code.pdf + MYPROJECT_rag.json
+```
+Uses a single scan pass. If one format fails, the other still completes.
+---
+## 📖 Usage
+### Quick start
+```bash
+# First run: interactive setup wizard
+codesummary
+# Generate LLM Markdown for the current project
+codesummary --format llm
+# Generate RAG JSON and save to a specific directory
+codesummary --format rag --output ./ai-data
+# Skip all prompts (CI-friendly)
+codesummary --format llm --no-interactive
+# Generate everything at once
+codesummary --format both
+```
+### Interactive workflow
+#### 1. First-run setup
+```
+Welcome to CodeSummary!
+No configuration found. Starting setup...
+Where should the output be saved by default?
+> [ ] Current working directory (relative mode)
+> [x] Fixed folder (absolute mode)
+Enter absolute path for fixed folder:
+> ~/Desktop/CodeSummaries
+```
+#### 2. Extension selection
+```
+Scan Summary:
+   Extensions found: .js, .ts, .md, .json
+   Total files: 127  —  Total size: 2.4 MB
+Select file extensions to include:
+[x] .js → JavaScript (42 files)
+[x] .ts → TypeScript (28 files)
+[x] .md → Markdown (5 files)
+[ ] .json → JSON (52 files)
+```
+#### 3. Output
+```
+SUCCESS: LLM-optimised Markdown generated successfully!
+   Output: ~/Desktop/CodeSummaries/MYPROJECT_llm.md
+   Extensions: .js, .ts, .md
+   Total files: 75
+   File size: 1.1 MB
+   Ready to paste into any LLM chat interface
+```
+### Command reference
+| Command | Description |
+| ------- | ----------- |
+| `codesummary` | Scan and generate PDF (default) |
+| `codesummary --format pdf` | Generate PDF documentation |
+| `codesummary --format rag` | Generate RAG-optimised JSON |
+| `codesummary --format llm` | Generate LLM-optimised Markdown |
+| `codesummary --format both` | Generate PDF + RAG JSON |
+| `codesummary config` | Edit configuration interactively |
+| `codesummary --show-config` | Display current configuration |
+| `codesummary --reset-config` | Reset configuration to defaults |
+### Options
+| Option | Short | Description |
+| ------ | ----- | ----------- |
+| `--format <format>` | `-f` | Output format: `pdf` (default), `rag`, `llm`, or `both` |
+| `--output <path>` | `-o` | Override output directory for this run |
+| `--no-interactive` | | Skip all prompts; auto-select all extensions |
+| `--show-config` | | Display current configuration |
+| `--reset-config` | | Reset configuration to defaults |
+| `--help` | `-h` | Show help |
+| `--version` | `-v` | Show version |
+---
+## ⚙️ Configuration
+Configuration is stored globally at:
+- **Linux/macOS**: `~/.codesummary/config.json`
+- **Windows**: `%APPDATA%\CodeSummary\config.json`
+Existing configuration is never overwritten on upgrade — new defaults are merged in automatically.
+### Default configuration
+```json
+{
+  "output": {
+    "mode": "fixed",
+    "fixedPath": "~/Desktop/CodeSummaries"
+  },
+  "allowedExtensions": [
+    ".js", ".jsx", ".ts", ".tsx", ".json", ".html", ".css", ".scss",
+    ".md", ".txt", ".py", ".java", ".cs", ".cpp", ".c", ".h",
+    ".xml", ".yaml", ".yml", ".sh", ".bat", ".ps1",
+    ".cfg", ".conf", ".env", ".local", ".service", ".timer",
+    ".ino", ".j2", ".csv", ".tsv", ".crt", ".sql",
+    ".toml", ".ini", ".properties", ".tf", ".tfvars", ".proto", ".prisma",
+    ".dart", ".lua", ".r", ".ex", ".exs", ".pl", ".mk", ".cmake",
+    ".mdx", ".astro", ".graphql", ".gql"
+  ],
+  "excludeDirs": [
+    "node_modules", ".git", ".vscode", "dist", "build", "coverage",
+    "out", "__pycache__", ".next", ".nuxt",
+    ".idea", "target", ".gradle", "venv", ".venv",
+    ".pytest_cache", ".mypy_cache", ".tox", ".terraform", ".turbo",
+    ".angular", ".svelte-kit", ".yarn", ".pnpm-store",
+    ".expo", ".dart_tool", "storybook-static", "htmlcov"
+  ],
+  "excludeFiles": [
+    "*-lock.json", "*.lock", "*.min.js", "*.min.css", "*.map",
+    ".DS_Store", "Thumbs.db", "desktop.ini", "ehthumbs.db",
+    "*.pyc", "*.pyo", "*.class", "*.log", "*.tmp", "*.temp",
+    "*.swp", "*.bak", "*.orig"
+  ],
+  "settings": {
+    "documentTitle": "Project Code Summary",
+    "maxFilesBeforePrompt": 500
+  }
+}
+```
+---
+## 📋 PDF structure
+Generated PDFs are A4 with three sections:
+1. **Project overview** — title, project name, generation timestamp, included file types
+2. **File structure** — complete sorted file listing
+3. **File content** — full source of every selected file, monospace font, no truncation
+---
+## 🤖 RAG JSON structure
+```json
+{
+  "metadata": {
+    "projectName": "MyProject",
+    "generatedAt": "2025-07-31T08:00:00.000Z",
+    "version": "3.1.0"
+  },
+  "files": [
+    {
+      "id": "abc123def456",
+      "path": "src/component.js",
+      "language": "JavaScript",
+      "hash": "sha256-...",
+      "chunks": [
+        {
+          "id": "chunk_abc123def456_0",
+          "content": "function myFunction() { ... }",
+          "tokenEstimate": 45,
+          "lineStart": 1,
+          "lineEnd": 15,
+          "chunkingMethod": "semantic-function",
+          "context": "function_myFunction",
+          "imports": ["lodash", "react"],
+          "calls": ["useState", "useEffect"]
+        }
+      ]
+    }
+  ],
+  "index": {
+    "chunkOffsets": {
+      "chunk_abc123def456_0": {
+        "contentStart": 12123,
+        "contentEnd": 12356,
+        "filePath": "src/component.js"
+      }
+    },
+    "statistics": { "processingTimeMs": 245, "chunksWithValidOffsets": 387 }
+  }
+}
+```
+### RAG integration example
+```javascript
+const ragData = JSON.parse(fs.readFileSync('project_rag.json'));
+// Extract all chunks for embedding
+const chunks = ragData.files.flatMap(file =>
+  file.chunks.map(chunk => ({
+    id: chunk.id,
+    content: chunk.content,
+    metadata: { filePath: file.path, language: file.language }
+  }))
+);
+// Store in your vector database
+for (const chunk of chunks) {
+  const embedding = await embed(chunk.content);
+  await vectorDB.upsert(chunk.id, embedding, chunk.metadata);
+}
+```
+---
+## 💬 LLM Markdown structure
+```markdown
+# MyProject — Code Summary
+**Generated:** 2026-04-05 | **Files:** 42 | **Total size:** 1.2 MB
+---
+## File Tree
+```
+  src/cli.js
+  src/scanner.js
+  ...
+```
+---
+## src/cli.js
+```js
+import chalk from 'chalk';
+...
+```
+```
+Paste the `.md` file directly into any LLM chat interface. No further processing needed.
+---
+## 🔧 Advanced features
+### Versioned output filenames
+When the target file already exists, CodeSummary creates a versioned copy instead of overwriting:
+```
+MYPROJECT_llm.md        ← exists
+MYPROJECT_llm-v1.md     ← created
+MYPROJECT_llm-v1.md     ← exists on next run
+MYPROJECT_llm-v2.md     ← created
+```
+This applies to all three output formats (PDF, RAG JSON, LLM Markdown).
+### Non-interactive mode
+Skip all prompts and auto-select all detected extensions:
+```bash
+codesummary --format llm --no-interactive
+```
+Useful for CI pipelines or scripted documentation generation.
+---
+## 🎨 Supported file types
+| Extension | Type | Extension | Type |
+| --------- | ---- | --------- | ---- |
+| `.js` `.jsx` | JavaScript | `.ts` `.tsx` | TypeScript |
+| `.py` | Python | `.java` | Java |
+| `.cs` | C# | `.cpp` `.c` `.h` | C/C++ |
+| `.go` | Go | `.rs` | Rust |
+| `.swift` | Swift | `.kt` | Kotlin |
+| `.rb` | Ruby | `.php` | PHP |
+| `.dart` | Dart | `.lua` | Lua |
+| `.r` | R | `.ex` `.exs` | Elixir |
+| `.pl` | Perl | `.scala` | Scala |
+| `.html` | HTML | `.css` `.scss` | CSS |
+| `.vue` | Vue.js | `.svelte` | Svelte |
+| `.astro` | Astro | `.mdx` | MDX |
+| `.json` | JSON | `.yaml` `.yml` | YAML |
+| `.toml` | TOML | `.xml` | XML |
+| `.ini` | INI | `.properties` | Java Properties |
+| `.tf` `.tfvars` | Terraform | `.proto` | Protobuf |
+| `.prisma` | Prisma | `.graphql` `.gql` | GraphQL |
+| `.sql` | SQL | `.md` `.txt` | Docs |
+| `.sh` `.bash` | Shell | `.bat` | Batch |
+| `.ps1` | PowerShell | `.mk` `.cmake` | Build |
+| `.cfg` `.conf` | Config | `.env` `.local` | Environment |
+| `.service` `.timer` | Systemd | `.ino` | Arduino |
+| `.j2` | Jinja2 | `.csv` `.tsv` | Data |
+| `.crt` | Certificate | `.dockerfile` | Docker |
+---
+## 🛠️ Project structure
+```
+codesummary/
+├── bin/
+│   └── codesummary.js      # Entry point
+├── src/
+│   ├── cli.js              # Argument parsing, orchestration
+│   ├── scanner.js          # Recursive directory scanning
+│   ├── pdfGenerator.js     # PDF generation (PDFKit)
+│   ├── ragGenerator.js     # RAG JSON generation with semantic chunking
+│   ├── llmGenerator.js     # LLM Markdown generation with optimisations
+│   ├── configManager.js    # Global config storage and migration
+│   ├── ragConfig.js        # RAG-specific configuration and YAML loading
+│   ├── errorHandler.js     # Centralised error handling and path validation
+│   └── utils.js            # Shared utilities (formatFileSize, etc.)
+├── rag-schema.json
+├── raggen.config.yaml
+└── package.json
+```
+---
+## 🔍 Troubleshooting
+**No files found after scan**
+- Check `allowedExtensions` in your config (`codesummary --show-config`)
+- Verify the directory is not listed in `excludeDirs`
+**Output file not generated**
+- Check write permissions on the output directory
+- Try `--output ./` to write to the current directory
+**Non-ASCII characters in paths cause issues**
+- Update to v1.2.0+ which fixes Windows path handling for accented characters
+**CI pipeline hangs**
+- Add `--no-interactive` to skip all prompts
+---
+## 🤝 Contributing
+1. Fork the repository
+2. Clone: `git clone https://github.com/skamoll/CodeSummary.git`
+3. Install: `npm install`
+4. Test: `node bin/codesummary.js --help`
+5. Submit a pull request
+---
+## 📄 License
+GNU General Public License v3.0 — see [LICENSE](LICENSE) for details.
+---
+## 📊 Roadmap
+- [ ] Syntax highlighting in PDF output
+- [ ] Clickable table of contents in PDF
+- [x] LLM-optimised Markdown output (`--format llm`)
+- [x] Versioned output filenames (`-v1`, `-v2`)
+- [x] Non-interactive mode (`--no-interactive`)
+- [x] RAG JSON with semantic chunking
+- [ ] `--format all` (PDF + RAG + LLM in one pass)
+- [ ] Git integration (document only changed files)
+- [ ] CI/CD plugin for automated documentation
+---
+## 📞 Support
+- Report bugs: [GitHub Issues](https://github.com/skamoll/CodeSummary/issues)
+- Questions: [GitHub Discussions](https://github.com/skamoll/CodeSummary/discussions)
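
The new README's LLM section lists six whitespace and JSON optimisations applied before writing `MYPROJECT_llm.md`. The sketch below shows one way those rules could be implemented. The function name `optimiseForLlm` and the `extension` parameter are assumptions for illustration, and the code is not taken from the package's `llmGenerator.js`.

```javascript
// Hypothetical helper illustrating the README's listed optimisations.
// Not the package's actual implementation.
function optimiseForLlm(content, extension) {
  // Normalise CRLF and lone CR line endings to "\n".
  let text = content.replace(/\r\n?/g, '\n');

  // Compact JSON by re-serialising without indentation.
  // Invalid JSON falls through to the generic clean-up below.
  if (extension === '.json') {
    try {
      return JSON.stringify(JSON.parse(text));
    } catch {
      // Not parseable: treat it like any other text file.
    }
  }

  // Strip trailing spaces and tabs from every line.
  text = text.replace(/[ \t]+$/gm, '');

  // Remove leading and trailing blank lines.
  text = text.replace(/^\n+/, '').replace(/\n+$/, '');

  // Cap consecutive blank lines: 2 for Markdown, 1 for everything else.
  // N blank lines correspond to N + 1 consecutive newline characters.
  const maxBlank = extension === '.md' ? 2 : 1;
  const longRun = new RegExp(`\\n{${maxBlank + 2},}`, 'g');
  return text.replace(longRun, '\n'.repeat(maxBlank + 1));
}
```

Trailing whitespace is stripped before blank-line runs are collapsed, so lines that contain only spaces count as blank. Re-serialising JSON preserves the parsed data but not necessarily the original number formatting or duplicate keys.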
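
The "Versioned output filenames" section added in this diff describes `-v1`, `-v2` suffixes replacing the old timestamped names. A minimal sketch of that naming scheme follows. `nextVersionedPath` and its injectable `exists` parameter are hypothetical names, and the package's real logic may differ.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: return targetPath if it is free, otherwise the
// first "<name>-vN<ext>" path that does not exist yet.
// `exists` is injectable so the function can be exercised without disk access.
function nextVersionedPath(targetPath, exists = fs.existsSync) {
  if (!exists(targetPath)) return targetPath;

  const { dir, name, ext } = path.parse(targetPath);
  for (let version = 1; ; version += 1) {
    const candidate = path.join(dir, `${name}-v${version}${ext}`);
    if (!exists(candidate)) return candidate;
  }
}
```

Checking for existence and then writing is not atomic. A tool that must be safe under concurrent runs could instead open the candidate with the `wx` flag and retry on `EEXIST`.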
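
The new Configuration section states that existing configuration is never overwritten on upgrade and that new defaults are merged in. The README does not say how arrays such as `allowedExtensions` are merged, so the sketch below assumes a union that keeps the user's entries first. `mergeDefaults` and `isPlainObject` are hypothetical names, not the package's `configManager.js` API.

```javascript
// Hypothetical helper: true for non-null, non-array objects.
function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

// Hypothetical merge: user values always win, missing keys are filled from
// defaults, nested objects merge recursively, and arrays are unioned
// (an assumption; the README does not specify array behaviour).
function mergeDefaults(userConfig, defaults) {
  const merged = { ...userConfig };
  for (const [key, defaultValue] of Object.entries(defaults)) {
    const userValue = merged[key];
    if (userValue === undefined) {
      // Key is new in this version: adopt a copy of the default.
      merged[key] = structuredClone(defaultValue);
    } else if (Array.isArray(userValue) && Array.isArray(defaultValue)) {
      // Keep the user's entries and append defaults they do not have yet.
      merged[key] = [...new Set([...userValue, ...defaultValue])];
    } else if (isPlainObject(userValue) && isPlainObject(defaultValue)) {
      merged[key] = mergeDefaults(userValue, defaultValue);
    }
    // Otherwise the user's scalar customisation is kept as-is.
  }
  return merged;
}
```

A union re-adds default entries that a user removed deliberately. Keeping the user's array untouched avoids that, at the cost of never surfacing newly supported extensions.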