PyPI - codeclone - Versions diffs - 1.4.2__tar.gz → 1.4.4__tar.gz - Mend

codeclone 1.4.2.tar.gz → 1.4.4.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (66)

{codeclone-1.4.2 → codeclone-1.4.4}/LICENSE RENAMED

@@ -18,4 +18,4 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
+SOFTWARE.

{codeclone-1.4.2 → codeclone-1.4.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: codeclone
-Version: 1.4.2
+Version: 1.4.4
 Summary: AST and CFG-based code clone detector for Python focused on architectural duplication
 Author-email: Den Rozhnovskiy <pytelemonbot@mail.ru>
 Maintainer-email: Den Rozhnovskiy <pytelemonbot@mail.ru>
@@ -49,7 +49,7 @@ Dynamic: license-file
 ![Baseline](https://img.shields.io/badge/baseline-versioned-green?style=flat-square)
 [![License](https://img.shields.io/pypi/l/codeclone.svg?style=flat-square)](LICENSE)
-**CodeClone** is a Python code clone detector based on **normalized AST and Control Flow Graphs (CFG)**.
+**CodeClone** is a Python code clone detector based on **normalized AST and Control Flow Graphs (CFG)**.
 It discovers architectural duplication and prevents new copy-paste from entering your codebase via CI.
 ---
@@ -75,13 +75,13 @@ Unlike token-based tools, CodeClone compares **structure and control flow**, mak
 **Three Detection Levels:**
-1. **Function clones (CFG fingerprint)**
+1. **Function clones (CFG fingerprint)**
    Strong structural signal for cross-layer duplication
-2. **Block clones (statement windows)**
+2. **Block clones (statement windows)**
    Detects repeated local logic patterns
-3. **Segment clones (report-only)**
+3. **Segment clones (report-only)**
    Internal function repetition for explainability; not used for baseline gating
 **CI-Ready Features:**
@@ -158,12 +158,12 @@ Full contract details: [`docs/book/06-baseline.md`](docs/book/06-baseline.md)
 CodeClone uses a deterministic exit code contract:
-| Code | Meaning                                                                     |
-|------|-----------------------------------------------------------------------------|
-| `0`  | Success — run completed without gating failures                             |
+| Code | Meaning                                                                                                                             |
+|------|-------------------------------------------------------------------------------------------------------------------------------------|
+| `0`  | Success — run completed without gating failures                                                                                     |
 | `2`  | Contract error — baseline missing/untrusted, invalid output extensions, incompatible versions, unreadable source files in CI/gating |
-| `3`  | Gating failure — new clones detected or threshold exceeded                  |
-| `5`  | Internal error — unexpected exception                                       |
+| `3`  | Gating failure — new clones detected or threshold exceeded                                                                          |
+| `5`  | Internal error — unexpected exception                                                                                               |
 **Priority:** Contract errors (`2`) override gating failures (`3`) when both occur.
@@ -223,7 +223,7 @@ Canonical report contract: [`docs/book/08-report.md`](docs/book/08-report.md)
     "cache_path": "/path/to/.cache/codeclone/cache.json",
     "cache_used": true,
     "cache_status": "ok",
-    "cache_schema_version": "1.2",
+    "cache_schema_version": "1.3",
     "files_skipped_source_io": 0,
     "groups_counts": {
       "functions": {
@@ -304,7 +304,8 @@ Canonical report contract: [`docs/book/08-report.md`](docs/book/08-report.md)
 Cache is an optimization layer only and is never a source of truth.
 - Default path: `<root>/.cache/codeclone/cache.json`
-- Schema version: **v1.2**
+- Schema version: **v1.3**
+- Compatibility includes analysis profile (`min_loc`, `min_stmt`)
 - Invalid or oversized cache is ignored with warning and rebuilt (fail-open)
 Full contract details: [`docs/book/07-cache.md`](docs/book/07-cache.md)

{codeclone-1.4.2 → codeclone-1.4.4}/README.md RENAMED

@@ -8,7 +8,7 @@
 ![Baseline](https://img.shields.io/badge/baseline-versioned-green?style=flat-square)
 [![License](https://img.shields.io/pypi/l/codeclone.svg?style=flat-square)](LICENSE)
-**CodeClone** is a Python code clone detector based on **normalized AST and Control Flow Graphs (CFG)**.
+**CodeClone** is a Python code clone detector based on **normalized AST and Control Flow Graphs (CFG)**.
 It discovers architectural duplication and prevents new copy-paste from entering your codebase via CI.
 ---
@@ -34,13 +34,13 @@ Unlike token-based tools, CodeClone compares **structure and control flow**, mak
 **Three Detection Levels:**
-1. **Function clones (CFG fingerprint)**
+1. **Function clones (CFG fingerprint)**
    Strong structural signal for cross-layer duplication
-2. **Block clones (statement windows)**
+2. **Block clones (statement windows)**
    Detects repeated local logic patterns
-3. **Segment clones (report-only)**
+3. **Segment clones (report-only)**
    Internal function repetition for explainability; not used for baseline gating
 **CI-Ready Features:**
@@ -117,12 +117,12 @@ Full contract details: [`docs/book/06-baseline.md`](docs/book/06-baseline.md)
 CodeClone uses a deterministic exit code contract:
-| Code | Meaning                                                                     |
-|------|-----------------------------------------------------------------------------|
-| `0`  | Success — run completed without gating failures                             |
+| Code | Meaning                                                                                                                             |
+|------|-------------------------------------------------------------------------------------------------------------------------------------|
+| `0`  | Success — run completed without gating failures                                                                                     |
 | `2`  | Contract error — baseline missing/untrusted, invalid output extensions, incompatible versions, unreadable source files in CI/gating |
-| `3`  | Gating failure — new clones detected or threshold exceeded                  |
-| `5`  | Internal error — unexpected exception                                       |
+| `3`  | Gating failure — new clones detected or threshold exceeded                                                                          |
+| `5`  | Internal error — unexpected exception                                                                                               |
 **Priority:** Contract errors (`2`) override gating failures (`3`) when both occur.
@@ -182,7 +182,7 @@ Canonical report contract: [`docs/book/08-report.md`](docs/book/08-report.md)
     "cache_path": "/path/to/.cache/codeclone/cache.json",
     "cache_used": true,
     "cache_status": "ok",
-    "cache_schema_version": "1.2",
+    "cache_schema_version": "1.3",
     "files_skipped_source_io": 0,
     "groups_counts": {
       "functions": {
@@ -263,7 +263,8 @@ Canonical report contract: [`docs/book/08-report.md`](docs/book/08-report.md)
 Cache is an optimization layer only and is never a source of truth.
 - Default path: `<root>/.cache/codeclone/cache.json`
-- Schema version: **v1.2**
+- Schema version: **v1.3**
+- Compatibility includes analysis profile (`min_loc`, `min_stmt`)
 - Invalid or oversized cache is ignored with warning and rebuilt (fail-open)
 Full contract details: [`docs/book/07-cache.md`](docs/book/07-cache.md)
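
The exit code contract in the README hunk above (codes `0`, `2`, `3`, `5`, with `2` taking priority over `3`) can be sketched as a small resolver. This is an illustrative sketch only: `EXIT_MEANINGS` and `resolve_exit_code` are hypothetical names, not part of the package, and code `5` (unexpected exception) is listed for reference but not produced by the resolver.

```python
# Meanings taken from the README table; the names below are hypothetical.
EXIT_MEANINGS = {
    0: "success",
    2: "contract error",
    3: "gating failure",
    5: "internal error",
}


def resolve_exit_code(*, contract_error: bool, gating_failure: bool) -> int:
    """Pick one exit code when several conditions hold at once."""
    # Contract errors (2) override gating failures (3), per the stated priority.
    if contract_error:
        return 2
    if gating_failure:
        return 3
    return 0
```

A CI wrapper could branch on the resolved code, for example treating `3` as "new clones found" and `2` as "fix the baseline or configuration first".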

{codeclone-1.4.2 → codeclone-1.4.4}/codeclone/_html_snippets.py RENAMED

@@ -14,6 +14,7 @@ import itertools
 from collections.abc import Iterable
 from dataclasses import dataclass
 from functools import lru_cache
+from types import ModuleType
 from typing import NamedTuple, cast
 from .errors import FileProcessingError
@@ -34,33 +35,19 @@ class _Snippet:
 class _FileCache:
-    __slots__ = ("_get_lines_impl", "maxsize")
+    __slots__ = ("_get_file_lines_impl", "maxsize")
     def __init__(self, maxsize: int = 128) -> None:
         self.maxsize = maxsize
-        self._get_lines_impl = lru_cache(maxsize=maxsize)(self._read_file_range)
+        self._get_file_lines_impl = lru_cache(maxsize=maxsize)(self._read_file_lines)
     @staticmethod
-    def _read_file_range(
-        filepath: str, start_line: int, end_line: int
-    ) -> tuple[str, ...]:
-        if start_line < 1:
-            start_line = 1
-        if end_line < start_line:
-            return ()
+    def _read_file_lines(filepath: str) -> tuple[str, ...]:
         try:
             def _read_with_errors(errors: str) -> tuple[str, ...]:
-                lines: list[str] = []
                 with open(filepath, encoding="utf-8", errors=errors) as f:
-                    for lineno, line in enumerate(f, start=1):
-                        if lineno < start_line:
-                            continue
-                        if lineno > end_line:
-                            break
-                        lines.append(line.rstrip("\n"))
-                return tuple(lines)
+                    return tuple(line.rstrip("\n") for line in f)
             try:
                 return _read_with_errors("strict")
@@ -72,7 +59,16 @@ class _FileCache:
     def get_lines_range(
         self, filepath: str, start_line: int, end_line: int
     ) -> tuple[str, ...]:
-        return self._get_lines_impl(filepath, start_line, end_line)
+        if start_line < 1:
+            start_line = 1
+        if end_line < start_line:
+            return ()
+        lines = self._get_file_lines_impl(filepath)
+        start_index = start_line - 1
+        if start_index >= len(lines):
+            return ()
+        end_index = min(len(lines), end_line)
+        return lines[start_index:end_index]
     class _CacheInfo(NamedTuple):
         hits: int
@@ -81,10 +77,30 @@ class _FileCache:
         currsize: int
     def cache_info(self) -> _CacheInfo:
-        return cast(_FileCache._CacheInfo, self._get_lines_impl.cache_info())
+        return cast(_FileCache._CacheInfo, self._get_file_lines_impl.cache_info())
-def _try_pygments(code: str) -> str | None:
+_PYGMENTS_IMPORTER_ID: int | None = None
+_PYGMENTS_API: tuple[ModuleType, ModuleType, ModuleType] | None = None
+def _load_pygments_api() -> tuple[ModuleType, ModuleType, ModuleType] | None:
+    """
+    Load pygments modules once per import-function identity.
+    Tests monkeypatch `importlib.import_module`; tracking importer identity keeps
+    behavior deterministic and allows import-error branches to stay testable.
+    """
+    global _PYGMENTS_IMPORTER_ID
+    global _PYGMENTS_API
+    importer_id = id(importlib.import_module)
+    if importer_id != _PYGMENTS_IMPORTER_ID:
+        _PYGMENTS_IMPORTER_ID = importer_id
+        _PYGMENTS_API = None
+    if _PYGMENTS_API is not None:
+        return _PYGMENTS_API
     try:
         pygments = importlib.import_module("pygments")
         formatters = importlib.import_module("pygments.formatters")
@@ -92,6 +108,16 @@ def _try_pygments(code: str) -> str | None:
     except ImportError:
         return None
+    _PYGMENTS_API = (pygments, formatters, lexers)
+    return _PYGMENTS_API
+def _try_pygments(code: str) -> str | None:
+    pygments_api = _load_pygments_api()
+    if pygments_api is None:
+        return None
+    pygments, formatters, lexers = pygments_api
     highlight = pygments.highlight
     formatter_cls = formatters.HtmlFormatter
     lexer_cls = lexers.PythonLexer
@@ -104,10 +130,10 @@ def _pygments_css(style_name: str) -> str:
     Returns CSS for pygments tokens. Scoped to `.codebox` to avoid leaking styles.
     If Pygments is not available or style missing, returns "".
     """
-    try:
-        formatters = importlib.import_module("pygments.formatters")
-    except ImportError:
+    pygments_api = _load_pygments_api()
+    if pygments_api is None:
         return ""
+    _, formatters, _ = pygments_api
     try:
         formatter_cls = formatters.HtmlFormatter
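
The `_FileCache` change above replaces a cache keyed by `(filepath, start_line, end_line)` with one keyed by `filepath` alone, slicing the requested range from the cached lines. A minimal standalone sketch of that idea follows. It assumes a hypothetical `LineCache` class and `demo` helper (neither is the package's code) and omits the encoding-error fallback shown in the diff.

```python
from __future__ import annotations

import tempfile
from functools import lru_cache
from pathlib import Path


class LineCache:
    """Hypothetical stand-in for the diff's cache: one cached read per file."""

    def __init__(self, maxsize: int = 128) -> None:
        # Keyed by path only, so every range request for a file shares one read.
        self._read = lru_cache(maxsize=maxsize)(self._read_lines)

    @staticmethod
    def _read_lines(filepath: str) -> tuple[str, ...]:
        with open(filepath, encoding="utf-8") as f:
            return tuple(line.rstrip("\n") for line in f)

    def get_lines_range(
        self, filepath: str, start_line: int, end_line: int
    ) -> tuple[str, ...]:
        # Line numbers are 1-based and inclusive; an inverted range is empty.
        start_line = max(start_line, 1)
        if end_line < start_line:
            return ()
        # Slicing past the end of a tuple is safe and simply truncates.
        return self._read(filepath)[start_line - 1 : end_line]

    def cache_info(self):
        return self._read.cache_info()


def demo() -> tuple[tuple[str, ...], tuple[str, ...], int, int]:
    """Request two overlapping ranges and report cache hits and misses."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.txt"
        path.write_text("a\nb\nc\nd\n", encoding="utf-8")
        cache = LineCache()
        first = cache.get_lines_range(str(path), 2, 3)
        second = cache.get_lines_range(str(path), 1, 10)
        info = cache.cache_info()
        return first, second, info.hits, info.misses
```

In `demo()`, the file is read once; the second, overlapping request is served from the cache, which is the benefit of keying by path rather than by range.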

{codeclone-1.4.2 → codeclone-1.4.4}/codeclone/_report_explain.py RENAMED

@@ -9,6 +9,8 @@ Licensed under the MIT License.
 from __future__ import annotations
 import ast
+from bisect import bisect_left, bisect_right
+from dataclasses import dataclass
 from pathlib import Path
 from ._report_explain_contract import (
@@ -23,6 +25,19 @@ from ._report_explain_contract import (
 from ._report_types import GroupItem, GroupMap
+@dataclass(frozen=True, slots=True)
+class _StatementRecord:
+    node: ast.stmt
+    start_line: int
+    end_line: int
+    start_col: int
+    end_col: int
+    type_name: str
+_StatementIndex = tuple[tuple[_StatementRecord, ...], tuple[int, ...]]
 def _signature_parts(group_key: str) -> list[str]:
     return [part for part in group_key.split("|") if part]
@@ -42,6 +57,53 @@ def _parsed_file_tree(
     return tree
+def _build_statement_index(tree: ast.AST) -> _StatementIndex:
+    records = tuple(
+        sorted(
+            (
+                _StatementRecord(
+                    node=node,
+                    start_line=int(getattr(node, "lineno", 0)),
+                    end_line=int(getattr(node, "end_lineno", 0)),
+                    start_col=int(getattr(node, "col_offset", 0)),
+                    end_col=int(getattr(node, "end_col_offset", 0)),
+                    type_name=type(node).__name__,
+                )
+                for node in ast.walk(tree)
+                if isinstance(node, ast.stmt)
+            ),
+            key=lambda record: (
+                record.start_line,
+                record.end_line,
+                record.start_col,
+                record.end_col,
+                record.type_name,
+            ),
+        )
+    )
+    start_lines = tuple(record.start_line for record in records)
+    return records, start_lines
+def _parsed_statement_index(
+    filepath: str,
+    *,
+    ast_cache: dict[str, ast.AST | None],
+    stmt_index_cache: dict[str, _StatementIndex | None],
+) -> _StatementIndex | None:
+    if filepath in stmt_index_cache:
+        return stmt_index_cache[filepath]
+    tree = _parsed_file_tree(filepath, ast_cache=ast_cache)
+    if tree is None:
+        stmt_index_cache[filepath] = None
+        return None
+    index = _build_statement_index(tree)
+    stmt_index_cache[filepath] = index
+    return index
 def _is_assert_like_stmt(stmt: ast.stmt) -> bool:
     if isinstance(stmt, ast.Assert):
         return True
@@ -64,45 +126,42 @@ def _assert_range_stats(
     start_line: int,
     end_line: int,
     ast_cache: dict[str, ast.AST | None],
+    stmt_index_cache: dict[str, _StatementIndex | None],
     range_cache: dict[tuple[str, int, int], tuple[int, int, int]],
 ) -> tuple[int, int, int]:
     cache_key = (filepath, start_line, end_line)
     if cache_key in range_cache:
         return range_cache[cache_key]
-    tree = _parsed_file_tree(filepath, ast_cache=ast_cache)
-    if tree is None:
+    statement_index = _parsed_statement_index(
+        filepath,
+        ast_cache=ast_cache,
+        stmt_index_cache=stmt_index_cache,
+    )
+    if statement_index is None:
         range_cache[cache_key] = (0, 0, 0)
         return 0, 0, 0
-    stmts = [
-        node
-        for node in ast.walk(tree)
-        if isinstance(node, ast.stmt)
-        and int(getattr(node, "lineno", 0)) >= start_line
-        and int(getattr(node, "end_lineno", 0)) <= end_line
-    ]
-    if not stmts:
+    records, start_lines = statement_index
+    if not records:
         range_cache[cache_key] = (0, 0, 0)
         return 0, 0, 0
-    ordered_stmts = sorted(
-        stmts,
-        key=lambda stmt: (
-            int(getattr(stmt, "lineno", 0)),
-            int(getattr(stmt, "end_lineno", 0)),
-            int(getattr(stmt, "col_offset", 0)),
-            int(getattr(stmt, "end_col_offset", 0)),
-            type(stmt).__name__,
-        ),
-    )
+    left = bisect_left(start_lines, start_line)
+    right = bisect_right(start_lines, end_line)
+    if left >= right:
+        range_cache[cache_key] = (0, 0, 0)
+        return 0, 0, 0
-    total = len(ordered_stmts)
+    total = 0
     assert_like = 0
     max_consecutive = 0
     current_consecutive = 0
-    for stmt in ordered_stmts:
-        if _is_assert_like_stmt(stmt):
+    for record in records[left:right]:
+        if record.end_line > end_line:
+            continue
+        total += 1
+        if _is_assert_like_stmt(record.node):
             assert_like += 1
             current_consecutive += 1
             if current_consecutive > max_consecutive:
@@ -110,6 +169,10 @@ def _assert_range_stats(
         else:
             current_consecutive = 0
+    if total == 0:
+        range_cache[cache_key] = (0, 0, 0)
+        return 0, 0, 0
     stats = (total, assert_like, max_consecutive)
     range_cache[cache_key] = stats
     return stats
@@ -121,6 +184,7 @@ def _is_assert_only_range(
     start_line: int,
     end_line: int,
     ast_cache: dict[str, ast.AST | None],
+    stmt_index_cache: dict[str, _StatementIndex | None],
     range_cache: dict[tuple[str, int, int], tuple[int, int, int]],
 ) -> bool:
     total, assert_like, _ = _assert_range_stats(
@@ -128,6 +192,7 @@ def _is_assert_only_range(
         start_line=start_line,
         end_line=end_line,
         ast_cache=ast_cache,
+        stmt_index_cache=stmt_index_cache,
         range_cache=range_cache,
     )
     return total > 0 and total == assert_like
@@ -157,6 +222,7 @@ def _enrich_with_assert_facts(
     facts: dict[str, str],
     items: list[GroupItem],
     ast_cache: dict[str, ast.AST | None],
+    stmt_index_cache: dict[str, _StatementIndex | None],
     range_cache: dict[tuple[str, int, int], tuple[int, int, int]],
 ) -> None:
     assert_only = True
@@ -181,6 +247,7 @@ def _enrich_with_assert_facts(
                 start_line=start_line,
                 end_line=end_line,
                 ast_cache=ast_cache,
+                stmt_index_cache=stmt_index_cache,
                 range_cache=range_cache,
             )
             total_statements += range_total
@@ -198,6 +265,7 @@ def _enrich_with_assert_facts(
                 start_line=start_line,
                 end_line=end_line,
                 ast_cache=ast_cache,
+                stmt_index_cache=stmt_index_cache,
                 range_cache=range_cache,
             )
         ):
@@ -223,6 +291,7 @@ def build_block_group_facts(block_groups: GroupMap) -> dict[str, dict[str, str]]
     Renderers (HTML/TXT/JSON) should only display these facts.
     """
     ast_cache: dict[str, ast.AST | None] = {}
+    stmt_index_cache: dict[str, _StatementIndex | None] = {}
     range_cache: dict[tuple[str, int, int], tuple[int, int, int]] = {}
     facts_by_group: dict[str, dict[str, str]] = {}
@@ -232,6 +301,7 @@ def build_block_group_facts(block_groups: GroupMap) -> dict[str, dict[str, str]]
             facts=facts,
             items=items,
             ast_cache=ast_cache,
+            stmt_index_cache=stmt_index_cache,
             range_cache=range_cache,
         )
         group_arity = len(items)
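
The `_report_explain.py` change above builds a sorted statement index once per file and then uses `bisect` on the start lines, instead of walking the whole AST for every range. A reduced sketch of that lookup is shown below. The names `build_index` and `count_statements` are hypothetical, and only plain `ast.Assert` nodes are counted here, whereas the package's `_is_assert_like_stmt` may accept more.

```python
from __future__ import annotations

import ast
from bisect import bisect_left, bisect_right


def build_index(source: str) -> tuple[list[ast.stmt], list[int]]:
    """Collect all statements sorted by position, plus a parallel list of start lines."""
    stmts = sorted(
        (n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.stmt)),
        key=lambda n: (n.lineno, n.end_lineno or 0, n.col_offset),
    )
    return stmts, [s.lineno for s in stmts]


def count_statements(
    index: tuple[list[ast.stmt], list[int]], start_line: int, end_line: int
) -> tuple[int, int]:
    """Return (total, asserts) for statements lying entirely inside the range."""
    stmts, starts = index
    # Binary search narrows to statements whose start line is in the range.
    left = bisect_left(starts, start_line)
    right = bisect_right(starts, end_line)
    total = asserts = 0
    for stmt in stmts[left:right]:
        if (stmt.end_lineno or 0) > end_line:
            continue  # starts inside the range but ends after it
        total += 1
        asserts += isinstance(stmt, ast.Assert)
    return total, asserts


SOURCE = """\
def check(x):
    assert x > 0
    assert x < 10
    return x
"""
INDEX = build_index(SOURCE)
```

For example, `count_statements(INDEX, 1, 3)` skips the enclosing `def` because it ends on line 4, which mirrors the `record.end_line > end_line` check in the diff.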

{codeclone-1.4.2 → codeclone-1.4.4}/codeclone/cache.py RENAMED

@@ -39,6 +39,7 @@ class CacheStatus(str, Enum):
     VERSION_MISMATCH = "version_mismatch"
     PYTHON_TAG_MISMATCH = "python_tag_mismatch"
     FINGERPRINT_MISMATCH = "mismatch_fingerprint_version"
+    ANALYSIS_PROFILE_MISMATCH = "analysis_profile_mismatch"
     INTEGRITY_FAILED = "integrity_failed"
@@ -84,15 +85,22 @@ class CacheEntry(TypedDict):
     segments: list[SegmentDict]
+class AnalysisProfile(TypedDict):
+    min_loc: int
+    min_stmt: int
 class CacheData(TypedDict):
     version: str
     python_tag: str
     fingerprint_version: str
+    analysis_profile: AnalysisProfile
     files: dict[str, CacheEntry]
 class Cache:
     __slots__ = (
+        "analysis_profile",
         "cache_schema_version",
         "data",
         "fingerprint_version",
@@ -112,14 +120,21 @@ class Cache:
         *,
         root: str | Path | None = None,
         max_size_bytes: int | None = None,
+        min_loc: int = 15,
+        min_stmt: int = 6,
     ):
         self.path = Path(path)
         self.root = _resolve_root(root)
         self.fingerprint_version = BASELINE_FINGERPRINT_VERSION
+        self.analysis_profile: AnalysisProfile = {
+            "min_loc": min_loc,
+            "min_stmt": min_stmt,
+        }
         self.data: CacheData = _empty_cache_data(
             version=self._CACHE_VERSION,
             python_tag=current_python_tag(),
             fingerprint_version=self.fingerprint_version,
+            analysis_profile=self.analysis_profile,
         )
         self.legacy_secret_warning = self._detect_legacy_secret_warning()
         self.cache_schema_version: str | None = None
@@ -164,6 +179,7 @@ class Cache:
             version=self._CACHE_VERSION,
             python_tag=current_python_tag(),
             fingerprint_version=self.fingerprint_version,
+            analysis_profile=self.analysis_profile,
         )
     def _sign_data(self, data: Mapping[str, object]) -> str:
@@ -309,6 +325,28 @@ class Cache:
             )
             return None
+        analysis_profile = _as_analysis_profile(payload.get("ap"))
+        if analysis_profile is None:
+            self._ignore_cache(
+                "Cache format invalid; ignoring cache.",
+                status=CacheStatus.INVALID_TYPE,
+                schema_version=version,
+            )
+            return None
+        if analysis_profile != self.analysis_profile:
+            self._ignore_cache(
+                "Cache analysis profile mismatch "
+                f"(found min_loc={analysis_profile['min_loc']}, "
+                f"min_stmt={analysis_profile['min_stmt']}; "
+                f"expected min_loc={self.analysis_profile['min_loc']}, "
+                f"min_stmt={self.analysis_profile['min_stmt']}); "
+                "ignoring cache.",
+                status=CacheStatus.ANALYSIS_PROFILE_MISMATCH,
+                schema_version=version,
+            )
+            return None
         files_obj = payload.get("files")
         files_dict = _as_str_dict(files_obj)
         if files_dict is None:
@@ -337,6 +375,7 @@ class Cache:
             "version": self._CACHE_VERSION,
             "python_tag": runtime_tag,
             "fingerprint_version": self.fingerprint_version,
+            "analysis_profile": self.analysis_profile,
             "files": parsed_files,
         }
@@ -356,6 +395,7 @@ class Cache:
             payload: dict[str, object] = {
                 "py": current_python_tag(),
                 "fp": self.fingerprint_version,
+                "ap": self.analysis_profile,
                 "files": wire_files,
             }
             signed_doc = {
@@ -371,6 +411,7 @@ class Cache:
             self.data["version"] = self._CACHE_VERSION
             self.data["python_tag"] = current_python_tag()
             self.data["fingerprint_version"] = self.fingerprint_version
+            self.data["analysis_profile"] = self.analysis_profile
         except OSError as e:
             raise CacheError(f"Failed to save cache: {e}") from e
@@ -508,11 +549,13 @@ def _empty_cache_data(
     version: str,
     python_tag: str,
     fingerprint_version: str,
+    analysis_profile: AnalysisProfile,
 ) -> CacheData:
     return {
         "version": version,
         "python_tag": python_tag,
         "fingerprint_version": fingerprint_version,
+        "analysis_profile": analysis_profile,
         "files": {},
     }
@@ -542,6 +585,22 @@ def _as_str_dict(value: object) -> dict[str, object] | None:
     return value
+def _as_analysis_profile(value: object) -> AnalysisProfile | None:
+    obj = _as_str_dict(value)
+    if obj is None:
+        return None
+    if set(obj.keys()) != {"min_loc", "min_stmt"}:
+        return None
+    min_loc = _as_int(obj.get("min_loc"))
+    min_stmt = _as_int(obj.get("min_stmt"))
+    if min_loc is None or min_stmt is None:
+        return None
+    return {"min_loc": min_loc, "min_stmt": min_stmt}
 def _decode_wire_file_entry(value: object, filepath: str) -> CacheEntry | None:
     obj = _as_str_dict(value)
     if obj is None:
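
The `cache.py` change above makes the cache valid only when the stored analysis profile (`min_loc`, `min_stmt`) matches the current run. A self-contained sketch of that check is given below. `parse_profile` and `cache_is_compatible` are hypothetical names; rejecting `bool` values is a choice made in this sketch, not something the diff confirms about `_as_int`.

```python
from __future__ import annotations

from typing import TypedDict


class AnalysisProfile(TypedDict):
    min_loc: int
    min_stmt: int


def parse_profile(value: object) -> AnalysisProfile | None:
    """Accept only a dict with exactly the two expected integer keys."""
    if not isinstance(value, dict) or set(value) != {"min_loc", "min_stmt"}:
        return None
    min_loc, min_stmt = value["min_loc"], value["min_stmt"]
    for item in (min_loc, min_stmt):
        # Assumption of this sketch: bool is rejected even though it subclasses int.
        if isinstance(item, bool) or not isinstance(item, int):
            return None
    return {"min_loc": min_loc, "min_stmt": min_stmt}


def cache_is_compatible(stored: object, expected: AnalysisProfile) -> bool:
    """A cache is reusable only if its profile parses and equals the current one."""
    profile = parse_profile(stored)
    return profile is not None and profile == expected
```

With the defaults from the diff (`min_loc=15`, `min_stmt=6`), a cache written under `min_loc=10` would be treated as incompatible and rebuilt rather than reused.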

{codeclone-1.4.2 → codeclone-1.4.4}/codeclone/cli.py RENAMED

@@ -310,6 +310,8 @@ def _main_impl() -> None:
         cache_path,
         root=root_path,
         max_size_bytes=args.max_cache_size_mb * 1024 * 1024,
+        min_loc=args.min_loc,
+        min_stmt=args.min_stmt,
     )
     cache.load()
     if cache.load_warning:

{codeclone-1.4.2 → codeclone-1.4.4}/codeclone/contracts.py RENAMED

@@ -14,7 +14,7 @@ from typing import Final
 BASELINE_SCHEMA_VERSION: Final = "1.0"
 BASELINE_FINGERPRINT_VERSION: Final = "1"
-CACHE_VERSION: Final = "1.2"
+CACHE_VERSION: Final = "1.3"
 REPORT_SCHEMA_VERSION: Final = "1.1"

{codeclone-1.4.2 → codeclone-1.4.4}/codeclone.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: codeclone
-Version: 1.4.2
+Version: 1.4.4
 Summary: AST and CFG-based code clone detector for Python focused on architectural duplication
 Author-email: Den Rozhnovskiy <pytelemonbot@mail.ru>
 Maintainer-email: Den Rozhnovskiy <pytelemonbot@mail.ru>
@@ -49,7 +49,7 @@ Dynamic: license-file
 ![Baseline](https://img.shields.io/badge/baseline-versioned-green?style=flat-square)
 [![License](https://img.shields.io/pypi/l/codeclone.svg?style=flat-square)](LICENSE)
-**CodeClone** is a Python code clone detector based on **normalized AST and Control Flow Graphs (CFG)**.
+**CodeClone** is a Python code clone detector based on **normalized AST and Control Flow Graphs (CFG)**.
 It discovers architectural duplication and prevents new copy-paste from entering your codebase via CI.
 ---
@@ -75,13 +75,13 @@ Unlike token-based tools, CodeClone compares **structure and control flow**, mak
 **Three Detection Levels:**
-1. **Function clones (CFG fingerprint)**
+1. **Function clones (CFG fingerprint)**
    Strong structural signal for cross-layer duplication
-2. **Block clones (statement windows)**
+2. **Block clones (statement windows)**
    Detects repeated local logic patterns
-3. **Segment clones (report-only)**
+3. **Segment clones (report-only)**
    Internal function repetition for explainability; not used for baseline gating
 **CI-Ready Features:**
@@ -158,12 +158,12 @@ Full contract details: [`docs/book/06-baseline.md`](docs/book/06-baseline.md)
 CodeClone uses a deterministic exit code contract:
-| Code | Meaning                                                                     |
-|------|-----------------------------------------------------------------------------|
-| `0`  | Success — run completed without gating failures                             |
+| Code | Meaning                                                                                                                             |
+|------|-------------------------------------------------------------------------------------------------------------------------------------|
+| `0`  | Success — run completed without gating failures                                                                                     |
 | `2`  | Contract error — baseline missing/untrusted, invalid output extensions, incompatible versions, unreadable source files in CI/gating |
-| `3`  | Gating failure — new clones detected or threshold exceeded                  |
-| `5`  | Internal error — unexpected exception                                       |
+| `3`  | Gating failure — new clones detected or threshold exceeded                                                                          |
+| `5`  | Internal error — unexpected exception                                                                                               |
 **Priority:** Contract errors (`2`) override gating failures (`3`) when both occur.
@@ -223,7 +223,7 @@ Canonical report contract: [`docs/book/08-report.md`](docs/book/08-report.md)
     "cache_path": "/path/to/.cache/codeclone/cache.json",
     "cache_used": true,
     "cache_status": "ok",
-    "cache_schema_version": "1.2",
+    "cache_schema_version": "1.3",
     "files_skipped_source_io": 0,
     "groups_counts": {
       "functions": {
@@ -304,7 +304,8 @@ Canonical report contract: [`docs/book/08-report.md`](docs/book/08-report.md)
 Cache is an optimization layer only and is never a source of truth.
 - Default path: `<root>/.cache/codeclone/cache.json`
-- Schema version: **v1.2**
+- Schema version: **v1.3**
+- Compatibility includes analysis profile (`min_loc`, `min_stmt`)
 - Invalid or oversized cache is ignored with warning and rebuilt (fail-open)
 Full contract details: [`docs/book/07-cache.md`](docs/book/07-cache.md)

{codeclone-1.4.2 → codeclone-1.4.4}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "codeclone"
-version = "1.4.2"
+version = "1.4.4"
 description = "AST and CFG-based code clone detector for Python focused on architectural duplication"
 readme = { file = "README.md", content-type = "text/markdown" }
 license = { text = "MIT" }

{codeclone-1.4.2 → codeclone-1.4.4}/tests/test_cache.py RENAMED

@@ -50,6 +50,15 @@ def _make_segment(filepath: str) -> SegmentUnit:
     )
+def _analysis_payload(cache: Cache, *, files: object) -> dict[str, object]:
+    return {
+        "py": cache.data["python_tag"],
+        "fp": cache.data["fingerprint_version"],
+        "ap": cache.data["analysis_profile"],
+        "files": files,
+    }
 def test_cache_roundtrip(tmp_path: Path) -> None:
     cache_path = tmp_path / "cache.json"
     cache = Cache(cache_path)
@@ -97,7 +106,7 @@ def test_get_file_entry_missing_after_fallback_returns_none(tmp_path: Path) -> N
     assert cache.get_file_entry(str(root / "pkg" / "missing.py")) is None
-def test_cache_v12_uses_relpaths_when_root_set(tmp_path: Path) -> None:
+def test_cache_v13_uses_relpaths_when_root_set(tmp_path: Path) -> None:
     project_root = tmp_path / "project"
     target = project_root / "pkg" / "module.py"
     target.parent.mkdir(parents=True, exist_ok=True)
@@ -121,14 +130,10 @@ def test_cache_v12_uses_relpaths_when_root_set(tmp_path: Path) -> None:
     assert str(target) not in files
-def test_cache_v12_missing_optional_sections_default_empty(tmp_path: Path) -> None:
+def test_cache_v13_missing_optional_sections_default_empty(tmp_path: Path) -> None:
     cache_path = tmp_path / "cache.json"
     cache = Cache(cache_path)
-    payload = {
-        "py": cache.data["python_tag"],
-        "fp": cache.data["fingerprint_version"],
-        "files": {"x.py": {"st": [1, 2]}},
-    }
+    payload = _analysis_payload(cache, files={"x.py": {"st": [1, 2]}})
     signature = cache._sign_data(payload)
     cache_path.write_text(
         json.dumps({"v": cache._CACHE_VERSION, "payload": payload, "sig": signature}),
@@ -201,11 +206,7 @@ def test_cache_version_mismatch_warns(tmp_path: Path) -> None:
 def test_cache_v_field_version_mismatch_warns(tmp_path: Path) -> None:
     cache_path = tmp_path / "cache.json"
     cache = Cache(cache_path)
-    payload = {
-        "py": cache.data["python_tag"],
-        "fp": cache.data["fingerprint_version"],
-        "files": {},
-    }
+    payload = _analysis_payload(cache, files={})
     signature = cache._sign_data(payload)
     cache_path.write_text(
         json.dumps({"v": "0.0", "payload": payload, "sig": signature}), "utf-8"
@@ -527,11 +528,7 @@ def test_cache_load_unreadable_read_graceful_ignore(
 def test_cache_load_invalid_files_type(tmp_path: Path) -> None:
     cache_path = tmp_path / "cache.json"
     cache = Cache(cache_path)
-    payload = {
-        "py": cache.data["python_tag"],
-        "fp": cache.data["fingerprint_version"],
-        "files": [],
-    }
+    payload = _analysis_payload(cache, files=[])
     signature = cache._sign_data(payload)
     cache_path.write_text(
         json.dumps({"v": cache._CACHE_VERSION, "payload": payload, "sig": signature}),
@@ -644,11 +641,7 @@ def test_cache_load_invalid_top_level_type(tmp_path: Path) -> None:
 def test_cache_load_missing_v_field(tmp_path: Path) -> None:
     cache_path = tmp_path / "cache.json"
     cache = Cache(cache_path)
-    payload = {
-        "py": cache.data["python_tag"],
-        "fp": cache.data["fingerprint_version"],
-        "files": {},
-    }
+    payload = _analysis_payload(cache, files={})
     sig = cache._sign_data(payload)
     cache_path.write_text(json.dumps({"payload": payload, "sig": sig}), "utf-8")
     cache.load()
@@ -683,7 +676,12 @@ def test_cache_load_missing_python_tag_in_payload(tmp_path: Path) -> None:
 def test_cache_load_python_tag_mismatch(tmp_path: Path) -> None:
     cache_path = tmp_path / "cache.json"
     cache = Cache(cache_path)
-    payload = {"py": "cp999", "fp": cache.data["fingerprint_version"], "files": {}}
+    payload = {
+        "py": "cp999",
+        "fp": cache.data["fingerprint_version"],
+        "ap": cache.data["analysis_profile"],
+        "files": {},
+    }
     sig = cache._sign_data(payload)
     cache_path.write_text(
         json.dumps({"v": cache._CACHE_VERSION, "payload": payload, "sig": sig}), "utf-8"
@@ -709,7 +707,12 @@ def test_cache_load_missing_fingerprint_version(tmp_path: Path) -> None:
 def test_cache_load_fingerprint_version_mismatch(tmp_path: Path) -> None:
     cache_path = tmp_path / "cache.json"
     cache = Cache(cache_path)
-    payload = {"py": cache.data["python_tag"], "fp": "old", "files": {}}
+    payload = {
+        "py": cache.data["python_tag"],
+        "fp": "old",
+        "ap": cache.data["analysis_profile"],
+        "files": {},
+    }
     sig = cache._sign_data(payload)
     cache_path.write_text(
         json.dumps({"v": cache._CACHE_VERSION, "payload": payload, "sig": sig}), "utf-8"
@@ -719,18 +722,82 @@ def test_cache_load_fingerprint_version_mismatch(tmp_path: Path) -> None:
     assert "fingerprint version mismatch" in cache.load_warning
-def test_cache_load_invalid_wire_file_entry(tmp_path: Path) -> None:
+def test_cache_load_analysis_profile_mismatch(tmp_path: Path) -> None:
+    cache_path = tmp_path / "cache.json"
+    cache = Cache(cache_path, min_loc=1, min_stmt=1)
+    cache.put_file_entry("x.py", {"mtime_ns": 1, "size": 10}, [], [], [])
+    cache.save()
+    loaded = Cache(cache_path, min_loc=15, min_stmt=6)
+    loaded.load()
+    assert loaded.load_warning is not None
+    assert "analysis profile mismatch" in loaded.load_warning
+    assert loaded.data["files"] == {}
+    assert loaded.load_status == CacheStatus.ANALYSIS_PROFILE_MISMATCH
+    assert loaded.cache_schema_version == Cache._CACHE_VERSION
+def test_cache_load_missing_analysis_profile_in_payload(tmp_path: Path) -> None:
+    cache_path = tmp_path / "cache.json"
+    cache = Cache(cache_path)
+    payload = {
+        "py": cache.data["python_tag"],
+        "fp": cache.data["fingerprint_version"],
+        "files": {},
+    }
+    sig = cache._sign_data(payload)
+    cache_path.write_text(
+        json.dumps({"v": cache._CACHE_VERSION, "payload": payload, "sig": sig}), "utf-8"
+    )
+    cache.load()
+    assert cache.load_warning is not None
+    assert "format invalid" in cache.load_warning
+    assert cache.load_status == CacheStatus.INVALID_TYPE
+    assert cache.cache_schema_version == Cache._CACHE_VERSION
+    assert cache.data["files"] == {}
+@pytest.mark.parametrize(
+    "bad_analysis_profile",
+    [
+        {"min_loc": 15},
+        {"min_loc": "15", "min_stmt": 6},
+    ],
+)
+def test_cache_load_invalid_analysis_profile_payload(
+    tmp_path: Path, bad_analysis_profile: object
+) -> None:
     cache_path = tmp_path / "cache.json"
     cache = Cache(cache_path)
     payload = {
         "py": cache.data["python_tag"],
         "fp": cache.data["fingerprint_version"],
-        "files": {"x.py": {"st": "bad"}},
+        "ap": bad_analysis_profile,
+        "files": {},
     }
     sig = cache._sign_data(payload)
     cache_path.write_text(
         json.dumps({"v": cache._CACHE_VERSION, "payload": payload, "sig": sig}), "utf-8"
     )
+    cache.load()
+    assert cache.load_warning is not None
+    assert "format invalid" in cache.load_warning
+    assert cache.load_status == CacheStatus.INVALID_TYPE
+    assert cache.cache_schema_version == Cache._CACHE_VERSION
+    assert cache.data["files"] == {}
+def test_cache_load_invalid_wire_file_entry(tmp_path: Path) -> None:
+    cache_path = tmp_path / "cache.json"
+    cache = Cache(cache_path)
+    payload = _analysis_payload(cache, files={"x.py": {"st": "bad"}})
+    sig = cache._sign_data(payload)
+    cache_path.write_text(
+        json.dumps({"v": cache._CACHE_VERSION, "payload": payload, "sig": sig}), "utf-8"
+    )
     cache.load()
     assert cache.load_warning is not None
     assert "format invalid" in cache.load_warning
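
The new tests above exercise three outcomes for the `ap` (analysis profile) payload field: a matching profile loads normally, a differing profile discards cached entries with an "analysis profile mismatch" warning, and a missing or malformed profile is treated as "format invalid". The sketch below is a minimal illustration of that behaviour, not codeclone's implementation. `TinyCache` and its attributes are hypothetical names, signing and versioning are left out, and the rule that both fields must be plain integers is an assumption inferred from the parametrized bad payloads.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class TinyCache:
    """Hypothetical stand-in for a profile-aware cache (not codeclone's Cache)."""

    def __init__(self, path: Path, *, min_loc: int = 15, min_stmt: int = 6) -> None:
        self.path = path
        # The thresholds the current run analyses with.
        self.profile = {"min_loc": min_loc, "min_stmt": min_stmt}
        self.files: dict = {}
        self.load_warning: Optional[str] = None

    def save(self) -> None:
        # Persist the profile next to the per-file entries.
        payload = {"ap": self.profile, "files": self.files}
        self.path.write_text(json.dumps(payload), "utf-8")

    def load(self) -> None:
        raw = json.loads(self.path.read_text("utf-8"))
        ap = raw.get("ap")
        # Assumed validity rule: exactly two integer fields.
        valid = (
            isinstance(ap, dict)
            and set(ap) == {"min_loc", "min_stmt"}
            and all(type(value) is int for value in ap.values())
        )
        if not valid:
            self.load_warning = "format invalid"
        elif ap != self.profile:
            self.load_warning = "analysis profile mismatch"
        else:
            self.files = raw.get("files", {})
            return
        # Any warning means the cached entries are not trusted.
        self.files = {}


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "cache.json"
    writer = TinyCache(cache_path, min_loc=1, min_stmt=1)
    writer.files["x.py"] = {"size": 10}
    writer.save()

    reader = TinyCache(cache_path, min_loc=15, min_stmt=6)
    reader.load()
    print(reader.load_warning, reader.files)
```

Storing the profile inside the payload means a cache written under one pair of thresholds is never silently reused under another.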

{codeclone-1.4.2 → codeclone-1.4.4}/tests/test_cli_inprocess.py RENAMED

@@ -708,7 +708,7 @@ def test_cli_cache_status_string_fallback(
         def __init__(self, _path: Path, **_kwargs: object) -> None:
             self.load_warning = load_warning
             self.load_status = "not-a-cache-status"
-            self.cache_schema_version = "1.2"
+            self.cache_schema_version = CACHE_VERSION
         def load(self) -> None:
             return None
@@ -1716,6 +1716,122 @@ def test_cli_reports_cache_meta_when_cache_missing(
     assert meta["cache_schema_version"] is None
+@pytest.mark.parametrize(
+    (
+        "first_min_loc",
+        "first_min_stmt",
+        "second_min_loc",
+        "second_min_stmt",
+        "expected_cache_used",
+        "expected_cache_status",
+        "expected_functions_total",
+        "expected_warning",
+    ),
+    [
+        (
+            1,
+            1,
+            15,
+            6,
+            False,
+            "analysis_profile_mismatch",
+            0,
+            "analysis profile mismatch",
+        ),
+        (
+            15,
+            6,
+            1,
+            1,
+            False,
+            "analysis_profile_mismatch",
+            1,
+            "analysis profile mismatch",
+        ),
+        (1, 1, 1, 1, True, "ok", 1, None),
+    ],
+)
+def test_cli_cache_analysis_profile_compatibility(
+    tmp_path: Path,
+    monkeypatch: pytest.MonkeyPatch,
+    capsys: pytest.CaptureFixture[str],
+    first_min_loc: int,
+    first_min_stmt: int,
+    second_min_loc: int,
+    second_min_stmt: int,
+    expected_cache_used: bool,
+    expected_cache_status: str,
+    expected_functions_total: int,
+    expected_warning: str | None,
+) -> None:
+    src = tmp_path / "a.py"
+    src.write_text(
+        """
+def f1():
+    x = 1
+    return x
+def f2():
+    y = 1
+    return y
+""",
+        "utf-8",
+    )
+    baseline_path = _write_baseline(
+        tmp_path / "baseline.json",
+        python_version=f"{sys.version_info.major}.{sys.version_info.minor}",
+    )
+    cache_path = tmp_path / "cache.json"
+    json_first = tmp_path / "report-first.json"
+    json_second = tmp_path / "report-second.json"
+    _patch_parallel(monkeypatch)
+    _run_main(
+        monkeypatch,
+        [
+            str(tmp_path),
+            "--baseline",
+            str(baseline_path),
+            "--cache-path",
+            str(cache_path),
+            "--json",
+            str(json_first),
+            "--min-loc",
+            str(first_min_loc),
+            "--min-stmt",
+            str(first_min_stmt),
+            "--no-progress",
+        ],
+    )
+    capsys.readouterr()
+    _run_main(
+        monkeypatch,
+        [
+            str(tmp_path),
+            "--baseline",
+            str(baseline_path),
+            "--cache-path",
+            str(cache_path),
+            "--json",
+            str(json_second),
+            "--min-loc",
+            str(second_min_loc),
+            "--min-stmt",
+            str(second_min_stmt),
+            "--no-progress",
+        ],
+    )
+    out = capsys.readouterr().out
+    payload = json.loads(json_second.read_text("utf-8"))
+    meta = payload["meta"]
+    if expected_warning is not None:
+        assert expected_warning in out
+    assert meta["cache_used"] is expected_cache_used
+    assert meta["cache_status"] == expected_cache_status
+    assert meta["groups_counts"]["functions"]["total"] == expected_functions_total
 @pytest.mark.parametrize(
     ("flag", "bad_name", "label", "expected"),
     [
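
The parametrized cases in `test_cli_cache_analysis_profile_compatibility` expect different function-group totals for the same source file depending on `--min-loc` and `--min-stmt`, which is why a cache built under one pair of thresholds cannot be reused under another. The sketch below illustrates that dependency only. `Unit` and `eligible` are hypothetical names, and treating both thresholds as inclusive lower bounds is an assumption, not something the diff confirms.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Unit:
    """Hypothetical summary of one function: name, line count, statement count."""

    name: str
    loc: int
    stmt: int


def eligible(units: List[Unit], *, min_loc: int, min_stmt: int) -> List[str]:
    # Assumed rule: a unit is analysed only if it meets both lower bounds.
    return [u.name for u in units if u.loc >= min_loc and u.stmt >= min_stmt]


# f1 and f2 from the test source: three lines and two statements each.
units = [Unit("f1", loc=3, stmt=2), Unit("f2", loc=3, stmt=2)]

print(eligible(units, min_loc=1, min_stmt=1))
print(eligible(units, min_loc=15, min_stmt=6))
```

Under the permissive thresholds both functions remain candidates and can form one clone group. Under the stricter pair neither does, which is consistent with the expected totals of 1 and 0 in the test cases.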

{codeclone-1.4.2 → codeclone-1.4.4}/tests/test_html_report.py RENAMED

@@ -6,7 +6,7 @@ from typing import Any
 import pytest
-from codeclone.contracts import DOCS_URL, ISSUES_URL, REPOSITORY_URL
+from codeclone.contracts import CACHE_VERSION, DOCS_URL, ISSUES_URL, REPOSITORY_URL
 from codeclone.errors import FileProcessingError
 from codeclone.html_report import (
     _FileCache,
@@ -507,7 +507,7 @@ def test_html_report_includes_provenance_metadata(
         'data-cache-used="true"',
         "Cache schema",
         "Cache status",
-        'data-cache-schema-version="1.2"',
+        f'data-cache-schema-version="{CACHE_VERSION}"',
         'data-cache-status="ok"',
         'data-files-skipped-source-io="0"',
         "Source IO skipped",

{codeclone-1.4.2 → codeclone-1.4.4}/tests/test_report.py RENAMED

@@ -7,7 +7,7 @@ from typing import cast
 import pytest
 import codeclone.report as report_mod
-from codeclone.contracts import REPORT_SCHEMA_VERSION
+from codeclone.contracts import CACHE_VERSION, REPORT_SCHEMA_VERSION
 from codeclone.report import (
     GroupMap,
     build_block_group_facts,
@@ -276,7 +276,7 @@ def test_report_output_formats(
         '"baseline_schema_version": 1',
         f'"baseline_payload_sha256": "{"a" * 64}"',
         '"baseline_payload_sha256_verified": true',
-        '"cache_schema_version": "1.2"',
+        f'"cache_schema_version": "{CACHE_VERSION}"',
         '"cache_status": "ok"',
         '"files_skipped_source_io": 0',
     ]
@@ -288,7 +288,7 @@ def test_report_output_formats(
         "Baseline generator name: codeclone",
         f"Baseline payload sha256: {'a' * 64}",
         "Baseline payload verified: true",
-        "Cache schema version: 1.2",
+        f"Cache schema version: {CACHE_VERSION}",
         "Cache status: ok",
         "Source IO skipped: 0",
         "FUNCTION CLONES (NEW) (groups=2)",