PyPI - code2llm - Versions diffs - 0.5.18__tar.gz → 0.5.19__tar.gz - Mend

code2llm 0.5.18__tar.gz → 0.5.19__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (90)

{code2llm-0.5.18 → code2llm-0.5.19}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: code2llm
-Version: 0.5.18
+Version: 0.5.19
 Summary: High-performance Python code flow analysis with optimized TOON format - CFG, DFG, call graphs, and intelligent code queries
 Home-page: https://github.com/wronai/stts
 Author: STTS Project
@@ -96,46 +96,63 @@ code2llm ./ -f all --max-memory 500
 code2llm ./ -f all --no-png
 ```
-### Large Repository Analysis (Chunking)
-For repositories >100 files, automatic chunking splits analysis into smaller subprojects:
+### Large Repository Analysis (Hierarchical Chunking)
+For large repositories, automatic hierarchical chunking ensures each output file stays under 256KB:
 ```bash
-# Auto-chunking when >100 files detected
+# Auto-chunking when estimated output >256KB
 code2llm ./ -f toon,evolution,code2logic --verbose
 # Force chunking with custom size limit
 code2llm ./ -f toon --chunk --chunk-size 256
-# Analyze only specific subproject
+# Analyze only specific subproject (matches level-1 or level-2 names)
 code2llm ./ -f toon --only-subproject src
+code2llm ./ -f toon --only-subproject src.core
-# Skip tests and examples
-code2llm ./ -f toon --skip-subprojects tests examples
+# Skip specific directories
+code2llm ./ -f toon --skip-subprojects tests examples docs
-# Customize file limit per chunk
-code2llm ./ -f toon --chunk --max-files-per-chunk 50
+# Customize chunking parameters
+code2llm ./ -f toon --chunk --max-files-per-chunk 50 --chunk-size 512
 ```
-**Chunking Benefits:**
-- Each subproject analyzed separately (examples/, tests/, src/, etc.)
-- Output limited to ~256KB per file (configurable)
-- Parallel processing of chunks possible
-- Reduced memory usage for large repos
+**Hierarchical Splitting Strategy:**
+1. **Level 0**: Entire project (if small enough, <256KB)
+2. **Level 1**: Top-level directories (src/, tests/, examples/)
+3. **Level 2**: Subdirectories if parent >256KB (src.core/, src.utils/)
+4. **Level 3**: File chunks if still too large
-**Output Structure:**
+**Example Output Structure:**
 ```
 ./project/
-  ├── src/                    # Core code analysis
-  │   ├── analysis.toon
+  ├── src/                    # Level 1: src/ fits in 256KB
+  │   ├── analysis.toon       # (~200KB)
   │   └── evolution.toon
-  ├── tests/                 # Test code analysis
+  ├── src_core/               # Level 2: src/core/ was too big
+  │   ├── analysis.toon       # (~180KB)
+  │   └── evolution.toon
+  ├── src_utils_part1/        # Level 3: split by file count
+  │   └── analysis.toon         # (~150KB)
+  ├── tests/                  # Level 1: tests/
   │   └── analysis.toon
-  ├── examples/              # Examples analysis
+  ├── examples/               # Level 1: examples/
   │   └── analysis.toon
-  ├── analysis.toon          # Merged summary
-  └── evolution.toon         # Full refactoring queue
+  ├── analysis.toon           # Merged summary (all levels)
+  └── evolution.toon          # Full refactoring queue
 ```
+**Size Estimation:**
+- ~3KB per Python file in TOON format
+- Auto-detect chunking when: `file_count × 3KB > 256KB`
+- Example: 100 files ≈ 300KB → triggers chunking
+**Benefits:**
+- Each output file <256KB (easy for LLMs to process)
+- Natural code boundaries (module/submodule level)
+- Incremental analysis possible
+- Parallel processing ready
 ### Refactoring Focus
 ```bash
 # Get refactoring recommendations

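The README's size estimate (about 3KB of TOON output per Python file, chunking once the total passes 256KB) can be sketched as follows. This is a hypothetical helper written for illustration only, assuming exactly those two figures; it is not code2llm's `should_use_chunking`, whose body this diff does not show.

```python
# Hypothetical helpers mirroring the README's stated heuristic; not code2llm's
# actual should_use_chunking() implementation.
KB_PER_FILE = 3       # README: ~3KB of TOON output per Python file
SIZE_LIMIT_KB = 256   # README: per-file output budget


def estimate_output_kb(file_count: int, kb_per_file: int = KB_PER_FILE) -> int:
    """Estimate TOON output size (KB) from the number of Python files."""
    return file_count * kb_per_file


def needs_chunking(file_count: int, limit_kb: int = SIZE_LIMIT_KB) -> bool:
    """Return True when the estimated output exceeds the size limit."""
    return estimate_output_kb(file_count) > limit_kb


if __name__ == "__main__":
    for n in (50, 85, 86, 100):
        print(n, estimate_output_kb(n), needs_chunking(n))
```

Under this estimate the cutoff sits at 86 files (86 × 3 = 258KB > 256KB), so the README's example of 100 files (≈300KB) triggers chunking while 85 files (255KB) does not.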
{code2llm-0.5.18 → code2llm-0.5.19}/README.md RENAMED

@@ -46,46 +46,63 @@ code2llm ./ -f all --max-memory 500
 code2llm ./ -f all --no-png
 ```
-### Large Repository Analysis (Chunking)
-For repositories >100 files, automatic chunking splits analysis into smaller subprojects:
+### Large Repository Analysis (Hierarchical Chunking)
+For large repositories, automatic hierarchical chunking ensures each output file stays under 256KB:
 ```bash
-# Auto-chunking when >100 files detected
+# Auto-chunking when estimated output >256KB
 code2llm ./ -f toon,evolution,code2logic --verbose
 # Force chunking with custom size limit
 code2llm ./ -f toon --chunk --chunk-size 256
-# Analyze only specific subproject
+# Analyze only specific subproject (matches level-1 or level-2 names)
 code2llm ./ -f toon --only-subproject src
+code2llm ./ -f toon --only-subproject src.core
-# Skip tests and examples
-code2llm ./ -f toon --skip-subprojects tests examples
+# Skip specific directories
+code2llm ./ -f toon --skip-subprojects tests examples docs
-# Customize file limit per chunk
-code2llm ./ -f toon --chunk --max-files-per-chunk 50
+# Customize chunking parameters
+code2llm ./ -f toon --chunk --max-files-per-chunk 50 --chunk-size 512
 ```
-**Chunking Benefits:**
-- Each subproject analyzed separately (examples/, tests/, src/, etc.)
-- Output limited to ~256KB per file (configurable)
-- Parallel processing of chunks possible
-- Reduced memory usage for large repos
+**Hierarchical Splitting Strategy:**
+1. **Level 0**: Entire project (if small enough, <256KB)
+2. **Level 1**: Top-level directories (src/, tests/, examples/)
+3. **Level 2**: Subdirectories if parent >256KB (src.core/, src.utils/)
+4. **Level 3**: File chunks if still too large
-**Output Structure:**
+**Example Output Structure:**
 ```
 ./project/
-  ├── src/                    # Core code analysis
-  │   ├── analysis.toon
+  ├── src/                    # Level 1: src/ fits in 256KB
+  │   ├── analysis.toon       # (~200KB)
   │   └── evolution.toon
-  ├── tests/                 # Test code analysis
+  ├── src_core/               # Level 2: src/core/ was too big
+  │   ├── analysis.toon       # (~180KB)
+  │   └── evolution.toon
+  ├── src_utils_part1/        # Level 3: split by file count
+  │   └── analysis.toon         # (~150KB)
+  ├── tests/                  # Level 1: tests/
   │   └── analysis.toon
-  ├── examples/              # Examples analysis
+  ├── examples/               # Level 1: examples/
   │   └── analysis.toon
-  ├── analysis.toon          # Merged summary
-  └── evolution.toon         # Full refactoring queue
+  ├── analysis.toon           # Merged summary (all levels)
+  └── evolution.toon          # Full refactoring queue
 ```
+**Size Estimation:**
+- ~3KB per Python file in TOON format
+- Auto-detect chunking when: `file_count × 3KB > 256KB`
+- Example: 100 files ≈ 300KB → triggers chunking
+**Benefits:**
+- Each output file <256KB (easy for LLMs to process)
+- Natural code boundaries (module/submodule level)
+- Incremental analysis possible
+- Parallel processing ready
 ### Refactoring Focus
 ```bash
 # Get refactoring recommendations

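The four-level splitting strategy in the README can be sketched as a top-down split over a flat list of `/`-separated file paths. This is a hypothetical illustration, not `HierarchicalRepoSplitter`: it reuses the README's ~3KB-per-file estimate, and the `Chunk` record, the `root` and `_toplevel` names, and the `_partN` suffix are assumptions chosen to resemble the example output tree.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

KB_PER_FILE = 3  # README estimate: ~3KB of TOON output per Python file


@dataclass
class Chunk:
    """Hypothetical stand-in for code2llm's subproject records."""
    name: str
    level: int
    files: List[str]

    @property
    def file_count(self) -> int:
        return len(self.files)

    @property
    def estimated_size_kb(self) -> int:
        return self.file_count * KB_PER_FILE


def _group(files: List[str], depth: int) -> Dict[str, List[str]]:
    """Group paths by their first `depth` directory components, joined with '.'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in files:
        dirs = path.split("/")[:-1]  # drop the file name
        key = ".".join(dirs[:depth]) or "_toplevel"
        groups[key].append(path)
    return groups


def plan(files: List[str], limit_kb: int = 256, max_files_per_chunk: int = 50) -> List[Chunk]:
    """Split top-down (level 0 -> 1 -> 2 -> 3) until every chunk fits the estimate."""
    if len(files) * KB_PER_FILE <= limit_kb:
        return [Chunk("root", 0, list(files))]                # Level 0: whole project
    # Level-3 parts are capped so each one also fits the size limit.
    per_part = max(1, min(max_files_per_chunk, limit_kb // KB_PER_FILE))
    chunks: List[Chunk] = []
    for name, group in sorted(_group(files, 1).items()):
        if len(group) * KB_PER_FILE <= limit_kb:
            chunks.append(Chunk(name, 1, group))              # Level 1: top-level dir
            continue
        for sub, subgroup in sorted(_group(group, 2).items()):
            if len(subgroup) * KB_PER_FILE <= limit_kb:
                chunks.append(Chunk(sub, 2, subgroup))        # Level 2: subdirectory
                continue
            for i in range(0, len(subgroup), per_part):       # Level 3: file chunks
                part = subgroup[i:i + per_part]
                chunks.append(Chunk(f"{sub}_part{i // per_part + 1}", 3, part))
    return chunks


if __name__ == "__main__":
    repo = ([f"src/core/m{i}.py" for i in range(80)]
            + [f"src/utils/u{i}.py" for i in range(120)]
            + [f"tests/t{i}.py" for i in range(30)])
    for c in plan(repo):
        print(c.level, c.name, c.file_count, c.estimated_size_kb)
```

With the sample repository, `src/core/` (80 files, ~240KB) fits at level 2, `src/utils/` (120 files, ~360KB) falls through to three level-3 parts, and `tests/` stays at level 1. Passing each name through `replace('.', '_')`, as the diff's `cli.py` does, yields directory names such as `src_core` and `src_utils_part1`, in line with the README's example tree.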
{code2llm-0.5.18 → code2llm-0.5.19}/code2llm/__init__.py RENAMED

@@ -8,7 +8,7 @@ Includes NLP Processing Pipeline for query normalization, intent matching,
 and entity resolution with multilingual support.
 """
-__version__ = "0.5.18"
+__version__ = "0.5.19"
 __author__ = "STTS Project"
 # Core analysis components

{code2llm-0.5.18 → code2llm-0.5.19}/code2llm/cli.py RENAMED

@@ -303,7 +303,9 @@ def _run_analysis(args, source_path: Path, output_dir: Path):
     Returns AnalysisResult or exits on error.
     For large repos, may analyze in chunks and merge results.
     """
-    from .core.large_repo import LargeRepoSplitter, should_use_chunking
+    from .core.large_repo import (
+        HierarchicalRepoSplitter, should_use_chunking, get_analysis_plan
+    )
     # Check if we should use chunked analysis
     use_chunking = (
@@ -347,39 +349,57 @@ def _run_analysis(args, source_path: Path, output_dir: Path):
 def _run_chunked_analysis(args, source_path: Path, output_dir: Path):
-    """Analyze large repository in chunks by subproject."""
-    from .core.large_repo import LargeRepoSplitter, SubProject
+    """Analyze large repository using hierarchical chunking.
+    Strategy:
+    1. Level 1 folders first
+    2. If >256KB, split to level 2 subfolders
+    3. If still too big, use file chunking
+    """
+    from .core.large_repo import HierarchicalRepoSplitter
-    splitter = LargeRepoSplitter(
+    splitter = HierarchicalRepoSplitter(
         size_limit_kb=args.chunk_size,
         max_files_per_chunk=args.max_files_per_chunk
     )
-    # Get analysis plan
+    # Get hierarchical analysis plan
     subprojects = splitter.get_analysis_plan(source_path)
     if args.verbose:
-        print(f"Repository split into {len(subprojects)} subprojects:")
+        print(f"Hierarchical analysis plan ({len(subprojects)} chunks):")
+        level_counts = {}
+        for sp in subprojects:
+            level_counts[sp.level] = level_counts.get(sp.level, 0) + 1
+        for level in sorted(level_counts.keys()):
+            level_name = {0: 'root', 1: 'level-1', 2: 'level-2', 3: 'file-chunks'}.get(level, f'level-{level}')
+            print(f"  {level_name}: {level_counts[level]} chunks")
+        print("\nChunks:")
         for sp in subprojects:
-            print(f"  - {sp.name}: {sp.file_count} files (~{sp.estimated_size_kb}KB)")
+            level_indicator = "  " * sp.level
+            size_info = f"~{sp.estimated_size_kb}KB"
+            print(f"{level_indicator}{sp.name}: {sp.file_count} files ({size_info})")
     # Filter subprojects if requested
     if args.only_subproject:
-        subprojects = [sp for sp in subprojects if sp.name == args.only_subproject]
+        subprojects = [sp for sp in subprojects if sp.name == args.only_subproject or sp.name.startswith(args.only_subproject + '.')]
         if not subprojects:
             print(f"Error: Subproject '{args.only_subproject}' not found", file=sys.stderr)
             sys.exit(1)
     if args.skip_subprojects:
-        subprojects = [sp for sp in subprojects if sp.name not in args.skip_subprojects]
+        subprojects = [sp for sp in subprojects if not any(sp.name.startswith(skip) for skip in args.skip_subprojects)]
     # Analyze each subproject
     all_results = []
     for i, subproject in enumerate(subprojects, 1):
         if args.verbose:
-            print(f"\n[{i}/{len(subprojects)}] Analyzing: {subproject.name}")
+            level_name = {0: 'root', 1: 'L1', 2: 'L2', 3: 'chunk'}.get(subproject.level, f'L{subproject.level}')
+            print(f"\n[{i}/{len(subprojects)}] Analyzing [{level_name}]: {subproject.name}")
-        sp_output_dir = output_dir / subproject.name
+        sp_output_dir = output_dir / subproject.name.replace('.', '_')
         sp_output_dir.mkdir(parents=True, exist_ok=True)
         result = _analyze_subproject(args, subproject, sp_output_dir)
@@ -391,7 +411,7 @@ def _run_chunked_analysis(args, source_path: Path, output_dir: Path):
     if args.verbose:
         print(f"\nChunked analysis complete:")
-        print(f"  - Subprojects analyzed: {len(all_results)}")
+        print(f"  - Chunks analyzed: {len(all_results)}")
         print(f"  - Total functions: {len(merged_result.functions)}")
         print(f"  - Total classes: {len(merged_result.classes)}")

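In the `cli.py` hunk, `--only-subproject` keeps a chunk when its name equals the selector or starts with the selector plus `.`, while `--skip-subprojects` uses a bare `sp.name.startswith(skip)`, so skipping `tests` would also drop a chunk named, for example, `testsuite` (a hypothetical name). The sketch below is a hypothetical helper, not part of code2llm, that applies the exact-or-dotted-prefix rule to both filters and reproduces the diff's `name.replace('.', '_')` mapping from chunk names to output directories.

```python
from typing import Iterable, List


def matches(name: str, selector: str) -> bool:
    """True if `name` is `selector` or nested under it (e.g. 'src.core' under 'src')."""
    return name == selector or name.startswith(selector + ".")


def filter_chunks(names: Iterable[str], only: str = "", skip: Iterable[str] = ()) -> List[str]:
    """Apply an optional --only-subproject selector, then drop any --skip-subprojects matches."""
    selected = [n for n in names if not only or matches(n, only)]
    skip = list(skip)
    return [n for n in selected if not any(matches(n, s) for s in skip)]


def output_dir_name(name: str) -> str:
    """Mirror the diff's mapping of dotted chunk names to directory names."""
    return name.replace(".", "_")


if __name__ == "__main__":
    names = ["src.core", "src.utils_part1", "tests", "testsuite", "docs"]
    print(filter_chunks(names, only="src"))
    print(filter_chunks(names, skip=["tests", "docs"]))
    print([output_dir_name(n) for n in names])
```

Here skipping `tests` removes only `tests` itself (and anything under `tests.`), leaving `testsuite` in place; the diff's plain `startswith` check would remove both.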
{code2llm-0.5.18 → code2llm-0.5.19}/code2llm/cli_exports.py RENAMED

@@ -1,10 +1,11 @@
 """Export functions for CLI - extracted from cli.py to reduce module complexity."""
+import os
 import shutil
 import subprocess
 import sys
 from pathlib import Path
-from typing import Optional
+from typing import List, Optional
 from .exporters import (
     YAMLExporter, JSONExporter, MermaidExporter,
@@ -71,10 +72,16 @@ def _export_code2logic(args, source_path: Path, output_dir: Path, formats: list[
     found = _find_code2logic_output(output_dir, res)
     target = output_dir / 'project.toon'
-    _normalize_code2logic_output(found, target)
+    final_files = _normalize_code2logic_output(found, target, args)
     if args.verbose:
-        print(f"  - CODE2LOGIC (project logic): {target}")
+        if len(final_files) == 1:
+            print(f"  - CODE2LOGIC (project logic): {final_files[0]}")
+        else:
+            print(f"  - CODE2LOGIC (project logic): {len(final_files)} parts")
+            for f in final_files:
+                size_kb = os.path.getsize(f) / 1024
+                print(f"    → {f.name}: {size_kb:.1f}KB")
 def _should_run_code2logic(formats: list[str]) -> bool:
@@ -155,11 +162,22 @@ def _find_code2logic_output(output_dir: Path, res) -> Path:
     return found
-def _normalize_code2logic_output(found: Path, target: Path) -> None:
-    """Normalize output location to target path."""
+def _normalize_code2logic_output(found: Path, target: Path, args) -> List[Path]:
+    """Normalize output location to target path and check size limits."""
     if found != target:
         target.parent.mkdir(parents=True, exist_ok=True)
         shutil.copyfile(found, target)
+        found = target
+    # Check and split if exceeds 256KB limit
+    from .core.toon_size_manager import manage_toon_size
+    return manage_toon_size(
+        found,
+        target.parent,
+        max_kb=256,
+        prefix="project",
+        verbose=getattr(args, 'verbose', False)
+    )
 def _export_prompt_txt(args, output_dir: Path, formats: list[str], source_path: Optional[Path] = None) -> None:
@@ -383,29 +401,32 @@ def _export_single_project(args, result, output_dir: Path, formats: list, source
 def _export_chunked_results(args, result, output_dir: Path, source_path: Path, formats: list):
     """Export chunked analysis results to subproject directories."""
-    from .core.large_repo import LargeRepoSplitter
+    from .core.large_repo import HierarchicalRepoSplitter
-    splitter = LargeRepoSplitter()
-    subprojects = splitter.detect_subprojects(source_path)
+    splitter = HierarchicalRepoSplitter(size_limit_kb=args.chunk_size)
+    subprojects = splitter.get_analysis_plan(source_path)
     # Filter subprojects same as in analysis
     if hasattr(args, 'only_subproject') and args.only_subproject:
-        subprojects = [sp for sp in subprojects if sp.name == args.only_subproject]
+        subprojects = [sp for sp in subprojects if sp.name == args.only_subproject or sp.name.startswith(args.only_subproject + '.')]
     if hasattr(args, 'skip_subprojects') and args.skip_subprojects:
-        subprojects = [sp for sp in subprojects if sp.name not in args.skip_subprojects]
+        subprojects = [sp for sp in subprojects if not any(sp.name.startswith(skip) for skip in args.skip_subprojects)]
     # Export each subproject to its own directory
     for sp in subprojects:
-        sp_output_dir = output_dir / sp.name
+        sp_output_dir = output_dir / sp.name.replace('.', '_')
         if not sp_output_dir.exists():
-            continue  # Skip if subproject wasn't analyzed
+            continue
-        # Check for subproject result file
-        sp_result_file = sp_output_dir / 'analysis.yaml'
-        if sp_result_file.exists():
-            if args.verbose:
-                print(f"  - Exported {sp.name} to {sp_output_dir}")
+        # Check for subproject result files
+        for ext in ['.toon', '.yaml', '.json']:
+            result_file = sp_output_dir / f'analysis{ext}'
+            if result_file.exists():
+                if args.verbose:
+                    level_name = {0: 'root', 1: 'L1', 2: 'L2'}.get(sp.level, f'L{sp.level}')
+                    print(f"  - Exported [{level_name}] {sp.name}")
+                break
     # Also create merged summary in root output dir
     _export_simple_formats(args, result, output_dir, ['toon', 'context'])

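`_normalize_code2logic_output` now hands the output to `manage_toon_size(found, target.parent, max_kb=256, prefix="project", ...)`, which returns a list of paths; its body is not part of this diff. A minimal sketch of one way such a splitter could work, assuming it breaks the file on line boundaries: `split_by_size` and the `{prefix}_part{n}.toon` naming are hypothetical, and the real function may use a different strategy or naming. Sizes use 1KB = 1024 bytes, matching the `os.path.getsize(f) / 1024` report in the diff.

```python
from pathlib import Path
from typing import List


def split_by_size(src: Path, out_dir: Path, max_kb: int = 256, prefix: str = "project") -> List[Path]:
    """Hypothetical sketch: split `src` on line boundaries into parts of at most `max_kb` KB."""
    limit = max_kb * 1024
    data = src.read_bytes()
    if len(data) <= limit:
        return [src]  # already small enough: keep the single file as-is
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: List[Path] = []
    buf = bytearray()

    def flush() -> None:
        # Write the buffered lines to the next numbered part file.
        path = out_dir / f"{prefix}_part{len(parts) + 1}.toon"
        path.write_bytes(bytes(buf))
        parts.append(path)
        buf.clear()

    for line in data.splitlines(keepends=True):
        if buf and len(buf) + len(line) > limit:
            flush()
        buf.extend(line)  # a single over-long line still ends up in its own (oversize) part
    if buf:
        flush()
    return parts  # the original file is left in place
```

Splitting only at line ends keeps each TOON record intact; the trade-off is that a single line longer than `max_kb` cannot be brought under the limit.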