npm - codesummary - Versions diffs - 1.2.0 → 1.2.2 - Mend

codesummary 1.2.0 → 1.2.2

Files changed (28)

package/CHANGELOG.md +26 -213
package/README.md +61 -395
package/features.md +25 -386
package/package.json +13 -17
package/src/ai/errors.js +85 -0
package/src/ai/featureFlags.js +8 -0
package/src/ai/promptTemplates.js +337 -0
package/src/ai/providerClient.js +81 -0
package/src/ai/providers/ollama.js +92 -0
package/src/ai/providers/openaiCompatible.js +96 -0
package/src/analysis/repositorySignals.js +196 -0
package/src/cli.js +819 -77
package/src/configManager.js +21 -0
package/src/graph/adapters/baseAdapter.js +24 -0
package/src/graph/adapters/javascriptAdapter.js +53 -0
package/src/graph/adapters/pythonAdapter.js +77 -0
package/src/graph/graphEngine.js +151 -0
package/src/graph/graphMetrics.js +79 -0
package/src/graph/graphSchema.js +30 -0
package/src/graph/universalExtractor.js +29 -0
package/src/llmGenerator.js +723 -8
package/src/pdfGenerator.js +1189 -275
package/src/renderers/llmSummaryRenderer.js +14 -0
package/src/renderers/pdfThemeRenderer.js +685 -0
package/src/scanner.js +115 -8
package/rag-schema.json +0 -114
package/src/ragConfig.js +0 -369
package/src/ragGenerator.js +0 -1740

package/features.md CHANGED

@@ -2,48 +2,13 @@
 ## 1. Overview
-**CodeSummary** is a **Node.js-based, cross-platform CLI tool** (distributed via **npm**) that automatically scans a project's source code and generates output in three formats:
+**CodeSummary** is a Node.js-based, cross-platform CLI tool that automatically scans project source code to generate documentation optimized for both Large Language Models (LLMs) and human auditors.
-- **PDF**: clean, professional A4 documentation for code reviews, audits, and archival snapshots
-- **RAG JSON**: structured output with semantic chunks, byte offsets, and token estimates — built for vector databases and programmatic LLM integration
-- **LLM Markdown**: a single token-optimised Markdown file for pasting directly into any chat-based LLM (any LLM chat interface)
-> **Repository**: [https://github.com/skamoll/CodeSummary](https://github.com/skamoll/CodeSummary)
-> **npm Package Name**: `codesummary`
----
-### 1.1 Target Audience
-- **Developers** who need quick, complete overviews of large projects
-- **Auditors / Consultants** requiring traceable documentation snapshots
-- **Educators / Students** preparing comprehensive code handovers
-- **AI Engineers** building RAG pipelines or feeding code into vector databases
-- **Anyone** who wants to work with their codebase inside a chat-based LLM efficiently
----
-### 1.2 Core Objectives
-1. **Three output modes** — PDF for humans, RAG JSON for machines, LLM Markdown for chat
-2. **Cross-platform reliability** — identical behaviour on Windows, macOS, and Linux
-3. **Lossless content optimisation** — reduce token count without altering code meaning
-4. **Smart config migration** — new defaults merge into existing config without data loss
-5. **Versioned output** — `-v1`, `-v2` suffixes prevent overwrites and timestamp clutter
-6. **Non-interactive operation** — `--no-interactive` for CI/CD pipelines
----
-### 1.3 Technology Stack
-- **Node.js** ≥ 18 (native ES modules)
-- **PDFKit** for PDF generation with streaming support
-- **Inquirer.js** for interactive prompts
-- **Chalk** for terminal styling
-- **Ora** for progress indicators
-- **fs-extra** for enhanced file system operations
-- **js-yaml** for YAML config loading
-- **ajv** for JSON schema validation
+### 1.1 Core Objectives
+1.  **LLM-First Context**: Minimize token usage while preserving semantic meaning for AI consumption.
+2.  **Professional Documentation**: Provide high-fidelity PDF snapshots for audits and code reviews.
+3.  **Cross-Platform Reliability**: Consistent behavior across Windows, macOS, and Linux.
+4.  **Quality Guaranteed**: Built-in regression testing suite to ensure reliable code analysis.
 ---
@@ -52,367 +17,41 @@
 ### 2.1 Command-Line Interface
 #### 2.1.1 Primary Commands
 | Command | Description |
-|---------|-------------|
-| `codesummary` | Scan current directory, generate PDF |
-| `codesummary --format rag` | Generate RAG-optimised JSON |
-| `codesummary --format llm` | Generate LLM-optimised Markdown |
-| `codesummary --format both` | Generate PDF + RAG JSON (single scan) |
-| `codesummary config` | Launch interactive configuration editor |
-| `codesummary --show-config` | Display current configuration |
-| `codesummary --reset-config` | Reset to defaults and run setup wizard |
-| `codesummary --help` | Show help |
-| `codesummary --version` | Show version |
-#### 2.1.2 Command-Line Options
-| Option | Short | Description |
-|--------|-------|-------------|
-| `--format <format>` | `-f` | `pdf` (default), `rag`, `llm`, or `both` |
-| `--output <path>` | `-o` | Override output directory for this run |
-| `--no-interactive` | | Skip all prompts; auto-select all extensions |
-| `--show-config` | | Display current configuration |
-| `--reset-config` | | Reset configuration to defaults |
-| `--help` | `-h` | Show help message |
-| `--version` | `-v` | Show version |
-#### 2.1.3 Interactive Workflow
-1. **First-run setup** — detects missing config, launches setup wizard, creates output directory
-2. **Directory scanning** — recursive scan with whitelist filtering and exclusion rules
-3. **Extension selection** — checkbox prompt with file counts; skipped with `--no-interactive`
-4. **Generation** — selected format(s) generated, versioned filenames used if target exists
+| :--- | :--- |
+| `codesummary` | Scan current directory and generate PDF (default). |
+| `codesummary --format llm` | Generate LLM-optimized Markdown. |
+| `codesummary --format both` | Generate both PDF and LLM Markdown. |
+| `codesummary config` | Launch the interactive configuration editor. |
+| `npm test` | Execute the automated regression test suite. |
 ---
 ### 2.2 Output Formats
-#### 2.2.1 PDF (`--format pdf`)
-Generates a professional A4 PDF with three sections:
-1. **Project overview**: title, project name, timestamp, included file types
-2. **File structure**: sorted complete file listing
-3. **File content**: full source of every selected file, monospace font, no truncation
-File naming: `PROJECTNAME_code.pdf` → `PROJECTNAME_code-v1.pdf` → `PROJECTNAME_code-v2.pdf` ...
-#### 2.2.2 RAG JSON (`--format rag`)
-Generates a structured JSON file built for embedding and retrieval in AI/ML pipelines.
-**When to use RAG:**
-- Loading code into a vector database (Pinecone, Qdrant, Chroma, etc.)
-- Building a retrieval-augmented generation pipeline
-- Programmatic LLM integration where you control chunking and retrieval
-- Rapid chunk seeking via byte offsets without re-parsing the full JSON
-**JSON structure:**
-```json
-{
-  "metadata": { "projectName": "...", "generatedAt": "...", "version": "..." },
-  "files": [
-    {
-      "id": "abc123",
-      "path": "src/component.js",
-      "language": "JavaScript",
-      "hash": "sha256-...",
-      "chunks": [
-        {
-          "id": "chunk_abc123_0",
-          "content": "function myFn() { ... }",
-          "tokenEstimate": 45,
-          "lineStart": 1,
-          "lineEnd": 15,
-          "chunkingMethod": "semantic-function",
-          "context": "function_myFn",
-          "imports": ["react"],
-          "calls": ["useState"]
-        }
-      ]
-    }
-  ],
-  "index": {
-    "chunkOffsets": {
-      "chunk_abc123_0": { "contentStart": 12123, "contentEnd": 12356 }
-    },
-    "statistics": { "processingTimeMs": 245, "chunksWithValidOffsets": 387 }
-  }
-}
-```
-**Key RAG features:**
-- Semantic chunking by function, class, or logical block
-- Byte-accurate content offsets for fast random access
-- SHA-256 file hashes for deduplication
-- Language-aware token estimation (±20% accuracy)
-- Import and call graph extraction
-- YAML-configurable via `raggen.config.yaml`
-File naming: `PROJECTNAME_rag.json` → `PROJECTNAME_rag-v1.json` → ...
-#### 2.2.3 LLM Markdown (`--format llm`)
-Generates a single Markdown file optimised for direct consumption by chat-based LLMs.
-**When to use LLM Markdown:**
-- Asking any LLM chat interface to review or explain your codebase
-- One-off questions that don't justify setting up a RAG pipeline
-- Sharing project context in a conversation without a file upload feature
-**File structure:**
-```markdown
-# ProjectName — Code Summary
-**Generated:** 2026-04-05 | **Files:** 42 | **Total size:** 1.2 MB
+... (unchanged) ...
 ---
-## File Tree
+### 2.3 Quality Control
-```
-  src/cli.js
-  src/scanner.js
-  ...
-```
----
-## src/cli.js
-```js
-// full file content
-```
-```
-**Lossless optimisations applied automatically:**
-| Optimisation | Applies to | Notes |
-|---|---|---|
-| Normalise line endings (`\r\n` → `\n`) | All files | Safe for all languages |
-| Strip trailing whitespace per line | All files | Never has semantic meaning |
-| Remove leading/trailing blank lines | All files | Per-file trimming |
-| Compact JSON | `.json` files | Re-serialise without indentation |
-| Max 2 consecutive blank lines | `.md` / `.mdx` | Preserves paragraph semantics |
-| Max 1 consecutive blank line | All other files | Removes filler without touching indentation |
-**What is never modified:**
-- Indentation (critical for Python, YAML, Makefiles)
-- Multiple spaces within a line (may be in string literals)
-- Comments
-- Code logic
-File naming: `PROJECTNAME_llm.md` → `PROJECTNAME_llm-v1.md` → ...
-#### 2.2.4 Both (`--format both`)
-Runs PDF and RAG generation in sequence using a single scan pass. Uses continue-on-error: if one format fails, the other still completes. Exit code 1 if either failed.
----
-### 2.3 Configuration Management
-#### 2.3.1 Storage Locations
-- **Linux/macOS**: `~/.codesummary/config.json`
-- **Windows**: `%APPDATA%\CodeSummary\config.json`
-#### 2.3.2 Configuration Structure
-```json
-{
-  "configVersion": "1.1.0",
-  "output": {
-    "mode": "fixed | relative",
-    "fixedPath": "absolute path"
-  },
-  "allowedExtensions": ["array of extensions"],
-  "excludeDirs": ["array of directory names"],
-  "excludeFiles": ["array of glob patterns"],
-  "styles": { "colors": {}, "layout": {}, "fonts": {} },
-  "settings": {
-    "documentTitle": "Project Code Summary",
-    "maxFilesBeforePrompt": 500
-  }
-}
-```
-#### 2.3.3 Smart Migration
-On every run, new defaults are merged into the existing config using `smartMergeArrays`:
-- Items already present are kept in place
-- New items are appended at the end
-- User removals are respected (removed items are not re-added)
-- Changes are saved automatically and the user is notified
-#### 2.3.4 Interactive Editor
-Sections available via `codesummary config`:
-- Output settings (mode, fixed path)
-- Allowed extensions
-- Excluded directories
-- Excluded file patterns
-- General settings (document title, file warning threshold)
+#### 2.3.1 Automated Testing
+The project includes a comprehensive suite of automated tests using the native Node.js test runner:
+-   **Scanner Validation**: Ensures `.csignore` rules and directory exclusions are applied correctly.
+-   **Graph Engine Accuracy**: Verifies that file-to-file dependencies are correctly resolved across multiple languages.
+-   **Config Integrity**: Tests the stability of the global configuration manager and its migration paths.
 ---
 ### 2.4 File System Scanning
-#### 2.4.1 Algorithm
-1. Recursive directory traversal from `process.cwd()`
-2. Whitelist filtering by allowed extensions
-3. Directory exclusion by exact name match + built-in common-skip list
-4. File exclusion by glob pattern matching
-5. Symlink detection (skipped to avoid loops)
-6. File size limit: 100 MB per file
-7. Duplicate detection via absolute path tracking
-#### 2.4.2 Supported Extensions (defaults)
-**Web & JavaScript ecosystem:**
-`.js`, `.jsx`, `.ts`, `.tsx`, `.vue`, `.svelte`, `.astro`, `.mdx`
-**Backend languages:**
-`.py`, `.java`, `.cs`, `.cpp`, `.c`, `.h`, `.go`, `.rs`, `.swift`, `.kt`, `.scala`, `.rb`, `.php`, `.dart`, `.lua`, `.r`, `.ex`, `.exs`, `.pl`
-**Web & markup:**
-`.html`, `.css`, `.scss`, `.xml`
-**Data & config:**
-`.json`, `.yaml`, `.yml`, `.toml`, `.ini`, `.properties`, `.tf`, `.tfvars`, `.env`, `.local`, `.cfg`, `.conf`
-**Schema & query:**
-`.sql`, `.graphql`, `.gql`, `.proto`, `.prisma`
-**Scripts:**
-`.sh`, `.bat`, `.ps1`, `.mk`, `.cmake`
-**Documentation:**
-`.md`, `.mdx`, `.txt`
-**Specialised:**
-`.ino` (Arduino), `.j2` (Jinja2), `.service`, `.timer` (systemd), `.crt` (certificates), `.csv`, `.tsv`
-#### 2.4.3 Default Excluded Directories
-Build output: `dist`, `build`, `out`, `target`
-Dependencies: `node_modules`, `vendor`, `bower_components`
-Caches: `.cache`, `.turbo`, `.gradle`, `.yarn`, `.pnpm-store`, `.pytest_cache`, `.mypy_cache`, `.tox`, `htmlcov`
-IDE: `.git`, `.vscode`, `.idea`
-Framework: `.next`, `.nuxt`, `.angular`, `.svelte-kit`, `.expo`, `.dart_tool`, `storybook-static`
-Python: `__pycache__`, `venv`, `.venv`
-Infrastructure: `.terraform`
-#### 2.4.4 Default Excluded File Patterns
-Lock files: `*-lock.json`, `*.lock`, `composer.lock`, `Pipfile.lock`, `*-lock.yaml`
-Minified: `*.min.js`, `*.min.css`, `*.map`
-Compiled: `*.pyc`, `*.pyo`, `*.class`
-Temporary: `*.log`, `*.tmp`, `*.temp`, `*.swp`, `*.bak`, `*.orig`
-OS metadata: `.DS_Store`, `Thumbs.db`, `desktop.ini`, `ehthumbs.db`
----
-### 2.5 Versioned Output Files
-When a target file already exists, a `-vN` suffix is added instead of overwriting:
-```
-PROJECTNAME_llm.md       ← exists
-PROJECTNAME_llm-v1.md    ← created
-                         ← next run: v1 exists
-PROJECTNAME_llm-v2.md    ← created
-```
-Applies to all three formats. Existing `-vN` suffixes are stripped before re-versioning to avoid `name-v1-v1.md`.
----
-### 2.6 Non-Interactive Mode
-`--no-interactive` (or non-TTY stdin) skips:
-- Extension selection checkbox → all detected extensions selected
-- File count confirmation prompt → proceeds automatically
-Designed for use in CI/CD pipelines and scripted environments.
----
-### 2.7 Error Handling
-- **Path traversal prevention**: blocks `..`, null bytes, and Windows reserved names
-- **Non-ASCII path support**: Unicode characters in paths (e.g. `C:\Users\Andrés\...`) are preserved correctly
-- **Graceful scan errors**: permission denied and missing files are logged but don't abort the scan
-- **PDF stream errors**: file-in-use (EBUSY/EACCES) triggers versioned filename fallback
-- **LLM/RAG errors**: unreadable files emit a warning block in output instead of crashing
-- **`--format both` failures**: continue-on-error; both outputs attempted, all errors reported together
+-   **Recursive Traversal**: Deep scan of the current directory.
+-   **Smart Filtering**: Respects `allowedExtensions` and `excludeDirs` from global config.
+-   **.csignore Support**: High-priority exclusion rules using standard gitignore syntax.
 ---
 ## 3. Technical Architecture
-### 3.1 Module Structure
-```
-src/
-├── cli.js              # Argument parsing, orchestration, user interaction
-├── scanner.js          # Recursive directory scanning and filtering
-├── pdfGenerator.js     # PDF generation (PDFKit, streaming)
-├── ragGenerator.js     # RAG JSON generation with semantic chunking
-├── llmGenerator.js     # LLM Markdown generation with content optimisations
-├── configManager.js    # Global config load, save, migrate, edit
-├── ragConfig.js        # RAG-specific config (YAML loading, defaults)
-├── errorHandler.js     # Path validation, sanitisation, global error handlers
-└── utils.js            # Shared: formatFileSize, getExtensionDescription,
-                        #         matchesGlobPattern, resolveVersionedPath
-```
-### 3.2 Data Flow
-```
-bin/codesummary.js
-  └─ src/index.js          (bootstrap)
-        └─ src/cli.js       (parse args → executeMainFlow)
-              ├─ scanner.js  (scan → filesByExtension)
-              ├─ pdfGenerator.js    (format: pdf)
-              ├─ ragGenerator.js    (format: rag)  ← uses ragConfig.js
-              └─ llmGenerator.js   (format: llm)
-```
-### 3.3 Key Design Decisions
-- **ESM modules** throughout (`"type": "module"`)
-- **No singleton exports** — all modules export classes, instantiated at call site
-- **Shared utilities** in `utils.js` — single source of truth, no duplication
-- **Streaming writes** for PDF and LLM output — memory-efficient on large projects
-- **Static imports** only — dynamic `import()` avoided for consistency
----
-## 4. Security
-- Path traversal (`..`) blocked via pattern matching before any file operation
-- User-supplied paths sanitised: control characters and injection sequences removed
-- Unicode characters in paths preserved (non-ASCII allowed)
-- Windows reserved device names (CON, NUL, COM1, etc.) rejected
-- No external network calls at runtime — fully offline operation
-- Config validated against schema before use; corrupt config prompts reset
----
-## 5. Future Enhancements
-- `--format all`: PDF + RAG + LLM in a single pass
-- Syntax highlighting in PDF output
-- Clickable table of contents with bookmarks in PDF
-- Git integration: document only changed files since last commit
-- CI/CD plugin for automated documentation on push
-- Custom PDF themes and styling
+... (unchanged) ...
 ---
-**Document Version**: 3.0
-**Last Updated**: 2026-04-05
-**Status**: Implementation Complete
+**Status**: Feature Complete (v1.2.1)
+**Last Updated**: 2026-04-18

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "codesummary",
-  "version": "1.2.0",
-  "description": "Cross-platform CLI tool that generates PDF documentation, RAG-optimized JSON for vector databases, and LLM-optimized Markdown for direct use with any chat-based LLM. Perfect for code reviews, audits, RAG pipelines, and AI-assisted development.",
+  "version": "1.2.2",
+  "description": "Cross-platform CLI tool that generates PDF documentation and LLM-optimized Markdown for direct use with any chat-based LLM. Perfect for code reviews, audits, and AI-assisted development.",
   "main": "src/index.js",
   "type": "module",
   "bin": {
@@ -10,7 +10,7 @@
   "scripts": {
     "start": "node bin/codesummary.js",
     "dev": "node bin/codesummary.js",
-    "test": "node bin/codesummary.js --help",
+    "test": "node --test test/**/*.test.js",
     "lint": "eslint src/ bin/",
     "format": "prettier --write .",
     "prepublishOnly": "npm run test"
@@ -18,10 +18,7 @@
   "keywords": [
     "code-documentation",
     "pdf-generator",
-    "rag",
-    "vector-database",
     "ai-ml",
-    "semantic-chunking",
     "source-code",
     "cli-tool",
     "code-review",
@@ -30,7 +27,6 @@
     "cross-platform",
     "documentation-generator",
     "pdf-export",
-    "json-export",
     "code-analysis",
     "file-scanner",
     "project-summary",
@@ -40,10 +36,7 @@
     "automatic-documentation",
     "professional-pdf",
     "terminal-compatible",
-    "llm-integration",
-    "token-estimation",
-    "byte-offsets",
-    "precision-indexing"
+    "llm-integration"
   ],
   "author": {
     "name": "CodeSummary Contributors",
@@ -68,21 +61,24 @@
     "README.md",
     "LICENSE",
     "CHANGELOG.md",
-    "rag-schema.json",
     "features.md"
   ],
   "preferGlobal": true,
   "dependencies": {
-    "pdfkit": "^0.15.0",
-    "inquirer": "^9.2.15",
-    "fs-extra": "^11.2.0",
+    "ajv": "^8.12.0",
     "chalk": "^5.3.0",
-    "ora": "^8.0.1",
+    "fs-extra": "^11.2.0",
+    "ignore": "^5.3.2",
+    "inquirer": "^9.2.15",
     "js-yaml": "^4.1.0",
-    "ajv": "^8.12.0"
+    "ora": "^8.0.1",
+    "pdfkit": "^0.15.0",
+    "pdfkit-table": "^0.1.99"
   },
   "devDependencies": {
+    "@playwright/test": "^1.59.1",
     "eslint": "^8.57.0",
+    "pdfjs-dist": "^5.6.205",
     "prettier": "^3.2.5"
   },
   "funding": {

package/src/ai/errors.js ADDED

@@ -0,0 +1,85 @@
+export class AiProviderError extends Error {
+  constructor(message, options = {}) {
+    super(message);
+    this.name = 'AiProviderError';
+    this.code = options.code || 'ai_error';
+    this.status = options.status ?? null;
+    this.retryable = Boolean(options.retryable);
+    this.provider = options.provider || null;
+    this.scope = options.scope || 'provider';
+    this.details = options.details || null;
+    this.cause = options.cause || null;
+  }
+}
+function statusToCode(status) {
+  if (status === 400) return 'bad_request';
+  if (status === 401 || status === 403) return 'auth_error';
+  if (status === 404) return 'not_found';
+  if (status === 408) return 'timeout';
+  if (status === 409) return 'conflict';
+  if (status === 413) return 'payload_too_large';
+  if (status === 422) return 'unprocessable';
+  if (status === 425) return 'too_early';
+  if (status === 429) return 'rate_limited';
+  if (status >= 500) return 'server_error';
+  return 'http_error';
+}
+export function isRetryableStatus(status) {
+  return [408, 409, 425, 429, 500, 502, 503, 504].includes(status);
+}
+export function normalizeAiError(error, context = {}) {
+  if (error instanceof AiProviderError) {
+    return error;
+  }
+  const provider = context.provider || null;
+  const scope = context.scope || 'provider';
+  const message = error?.message || 'Unknown AI error';
+  const lower = message.toLowerCase();
+  const isTimeout = lower.includes('aborted') || lower.includes('timeout');
+  const isNetwork = lower.includes('fetch failed') || lower.includes('econnrefused') || lower.includes('enotfound');
+  if (isTimeout) {
+    return new AiProviderError('AI request timed out', {
+      code: 'timeout',
+      retryable: true,
+      provider,
+      scope,
+      cause: error
+    });
+  }
+  if (isNetwork) {
+    return new AiProviderError('AI provider network unavailable', {
+      code: 'network_unavailable',
+      retryable: true,
+      provider,
+      scope,
+      cause: error
+    });
+  }
+  return new AiProviderError(message, {
+    code: 'ai_error',
+    retryable: false,
+    provider,
+    scope,
+    cause: error
+  });
+}
+export function createHttpAiError(provider, status, text = '') {
+  const code = statusToCode(status);
+  const retryable = isRetryableStatus(status);
+  const suffix = text ? `: ${text}` : '';
+  return new AiProviderError(`${provider} request failed (${status})${suffix}`, {
+    code,
+    status,
+    retryable,
+    provider,
+    scope: 'provider'
+  });
+}
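
Note on the `retryable` flag: the diff shows `isRetryableStatus` and the `retryable` property on `AiProviderError`, but not the code that consumes them. The following is a minimal sketch of how a caller might act on that flag, not the package's actual retry logic. `withRetry`, `attempts`, and `baseDelayMs` are hypothetical names, and a plain `Error` stands in for `AiProviderError` so the snippet runs on its own.

```javascript
// Copied from src/ai/errors.js in the diff above (export keyword dropped).
function isRetryableStatus(status) {
  return [408, 409, 425, 429, 500, 502, 503, 504].includes(status);
}

// Hypothetical helper, not part of the package: re-runs an async call while
// the thrown error reports `retryable === true`, up to `attempts` tries.
async function withRetry(fn, { attempts = 3, baseDelayMs = 100 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      lastError = error;
      // Stop on non-retryable errors or once the attempt budget is spent.
      if (!error.retryable || attempt === attempts) break;
      // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

// Usage: simulate two 503 responses followed by a success.
let calls = 0;
withRetry(
  async () => {
    calls += 1;
    if (calls < 3) {
      const error = new Error('simulated 503');
      error.retryable = isRetryableStatus(503); // true
      throw error;
    }
    return 'ok';
  },
  { baseDelayMs: 1 }
).then((value) => console.log(value, calls)); // logs: ok 3
```

Under these assumptions a 404 or 401 would surface immediately, since `isRetryableStatus` returns `false` for both, while 408, 409, 425, 429 and the listed 5xx codes would be retried.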

package/src/ai/featureFlags.js ADDED

@@ -0,0 +1,8 @@
+export function isAiSemanticEnabled(aiOptions = {}) {
+  return Boolean(aiOptions.enabled && aiOptions.semantic);
+}
+export default {
+  isAiSemanticEnabled
+};
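
Note on the flag helper: `isAiSemanticEnabled` returns `true` only when both `enabled` and `semantic` are truthy. A short sketch of that behavior follows. The function body is copied from the diff; the option objects passed to it are made-up inputs for illustration.

```javascript
// Copied from src/ai/featureFlags.js in the diff above (export keyword
// dropped so the snippet runs as a standalone script).
function isAiSemanticEnabled(aiOptions = {}) {
  return Boolean(aiOptions.enabled && aiOptions.semantic);
}

// Hypothetical option objects; only `enabled` and `semantic` are read.
console.log(isAiSemanticEnabled());                                   // false
console.log(isAiSemanticEnabled({ enabled: true }));                  // false
console.log(isAiSemanticEnabled({ enabled: false, semantic: true })); // false
console.log(isAiSemanticEnabled({ enabled: true, semantic: true }));  // true
console.log(isAiSemanticEnabled({ enabled: 1, semantic: 'yes' }));    // true
```

The default parameter `{}` applies only when the argument is `undefined`. Passing `null` would throw a `TypeError` on the property access, so a caller that may hold `null` options would need to guard before calling.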