
codeclone (PyPI): 1.4.3.tar.gz → 1.4.4.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (66)

{codeclone-1.4.3 → codeclone-1.4.4}/LICENSE RENAMED

@@ -18,4 +18,4 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
+SOFTWARE.

{codeclone-1.4.3 → codeclone-1.4.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: codeclone
-Version: 1.4.3
+Version: 1.4.4
 Summary: AST and CFG-based code clone detector for Python focused on architectural duplication
 Author-email: Den Rozhnovskiy <pytelemonbot@mail.ru>
 Maintainer-email: Den Rozhnovskiy <pytelemonbot@mail.ru>
@@ -49,7 +49,7 @@ Dynamic: license-file
 ![Baseline](https://img.shields.io/badge/baseline-versioned-green?style=flat-square)
 [![License](https://img.shields.io/pypi/l/codeclone.svg?style=flat-square)](LICENSE)
-**CodeClone** is a Python code clone detector based on **normalized AST and Control Flow Graphs (CFG)**.
+**CodeClone** is a Python code clone detector based on **normalized AST and Control Flow Graphs (CFG)**.
 It discovers architectural duplication and prevents new copy-paste from entering your codebase via CI.
 ---
@@ -75,13 +75,13 @@ Unlike token-based tools, CodeClone compares **structure and control flow**, mak
 **Three Detection Levels:**
-1. **Function clones (CFG fingerprint)**
+1. **Function clones (CFG fingerprint)**
    Strong structural signal for cross-layer duplication
-2. **Block clones (statement windows)**
+2. **Block clones (statement windows)**
    Detects repeated local logic patterns
-3. **Segment clones (report-only)**
+3. **Segment clones (report-only)**
    Internal function repetition for explainability; not used for baseline gating
 **CI-Ready Features:**

{codeclone-1.4.3 → codeclone-1.4.4}/README.md RENAMED

@@ -8,7 +8,7 @@
 ![Baseline](https://img.shields.io/badge/baseline-versioned-green?style=flat-square)
 [![License](https://img.shields.io/pypi/l/codeclone.svg?style=flat-square)](LICENSE)
-**CodeClone** is a Python code clone detector based on **normalized AST and Control Flow Graphs (CFG)**.
+**CodeClone** is a Python code clone detector based on **normalized AST and Control Flow Graphs (CFG)**.
 It discovers architectural duplication and prevents new copy-paste from entering your codebase via CI.
 ---
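
Reviewer note: the README line above describes detection on a "normalized AST". As a rough illustration only, the sketch below shows one generic way to normalize a Python AST so that two functions differing only in names and literal values produce the same key. This is not CodeClone's implementation; `_Normalizer` and `structural_key` are hypothetical names, and the real tool also uses control flow graphs, which this sketch ignores.

```python
import ast


class _Normalizer(ast.NodeTransformer):
    """Hypothetical normalizer: erase identifiers and literal values."""

    def visit_Name(self, node: ast.Name) -> ast.AST:
        # Every variable reference becomes the same placeholder name.
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node: ast.arg) -> ast.AST:
        # Parameter names and annotations are dropped.
        node.arg = "_"
        node.annotation = None
        return node

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        node.name = "_"
        self.generic_visit(node)
        return node

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        # Keep only the literal's type, not its value.
        placeholder = ast.Constant(value=type(node.value).__name__)
        return ast.copy_location(placeholder, node)


def structural_key(source: str) -> str:
    """Return a string that depends only on the normalized tree shape."""
    tree = _Normalizer().visit(ast.parse(source))
    return ast.dump(tree, include_attributes=False)


FUNC_A = "def total(items):\n    acc = 0\n    for item in items:\n        acc += item\n    return acc\n"
FUNC_B = "def sum_all(values):\n    result = 10\n    for v in values:\n        result += v\n    return result\n"
FUNC_C = "def total(items):\n    return sum(items)\n"
```

Under these assumptions, `structural_key(FUNC_A)` and `structural_key(FUNC_B)` compare equal while `structural_key(FUNC_C)` differs, which is the kind of rename-insensitive comparison a token-based tool would miss.
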
@@ -34,13 +34,13 @@ Unlike token-based tools, CodeClone compares **structure and control flow**, mak
 **Three Detection Levels:**
-1. **Function clones (CFG fingerprint)**
+1. **Function clones (CFG fingerprint)**
    Strong structural signal for cross-layer duplication
-2. **Block clones (statement windows)**
+2. **Block clones (statement windows)**
    Detects repeated local logic patterns
-3. **Segment clones (report-only)**
+3. **Segment clones (report-only)**
    Internal function repetition for explainability; not used for baseline gating
 **CI-Ready Features:**

{codeclone-1.4.3 → codeclone-1.4.4}/codeclone/_html_snippets.py RENAMED

@@ -14,6 +14,7 @@ import itertools
 from collections.abc import Iterable
 from dataclasses import dataclass
 from functools import lru_cache
+from types import ModuleType
 from typing import NamedTuple, cast
 from .errors import FileProcessingError
@@ -34,33 +35,19 @@ class _Snippet:
 class _FileCache:
-    __slots__ = ("_get_lines_impl", "maxsize")
+    __slots__ = ("_get_file_lines_impl", "maxsize")
     def __init__(self, maxsize: int = 128) -> None:
         self.maxsize = maxsize
-        self._get_lines_impl = lru_cache(maxsize=maxsize)(self._read_file_range)
+        self._get_file_lines_impl = lru_cache(maxsize=maxsize)(self._read_file_lines)
     @staticmethod
-    def _read_file_range(
-        filepath: str, start_line: int, end_line: int
-    ) -> tuple[str, ...]:
-        if start_line < 1:
-            start_line = 1
-        if end_line < start_line:
-            return ()
+    def _read_file_lines(filepath: str) -> tuple[str, ...]:
         try:
             def _read_with_errors(errors: str) -> tuple[str, ...]:
-                lines: list[str] = []
                 with open(filepath, encoding="utf-8", errors=errors) as f:
-                    for lineno, line in enumerate(f, start=1):
-                        if lineno < start_line:
-                            continue
-                        if lineno > end_line:
-                            break
-                        lines.append(line.rstrip("\n"))
-                return tuple(lines)
+                    return tuple(line.rstrip("\n") for line in f)
             try:
                 return _read_with_errors("strict")
@@ -72,7 +59,16 @@ class _FileCache:
     def get_lines_range(
         self, filepath: str, start_line: int, end_line: int
     ) -> tuple[str, ...]:
-        return self._get_lines_impl(filepath, start_line, end_line)
+        if start_line < 1:
+            start_line = 1
+        if end_line < start_line:
+            return ()
+        lines = self._get_file_lines_impl(filepath)
+        start_index = start_line - 1
+        if start_index >= len(lines):
+            return ()
+        end_index = min(len(lines), end_line)
+        return lines[start_index:end_index]
     class _CacheInfo(NamedTuple):
         hits: int
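
Reviewer note: the hunks above move the cache key from `(filepath, start_line, end_line)` to `filepath` alone, so overlapping ranges of one file share a single read. A minimal standalone sketch of that pattern follows. The function names are illustrative, and `errors="replace"` is an assumption standing in for the strict-then-fallback decoding whose fallback branch is not visible in this diff.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def read_file_lines(filepath: str) -> tuple[str, ...]:
    """Read the whole file once; the cache is keyed by path only."""
    # Assumption: lenient decoding instead of the original strict/fallback pair.
    with open(filepath, encoding="utf-8", errors="replace") as f:
        return tuple(line.rstrip("\n") for line in f)


def get_lines_range(
    filepath: str, start_line: int, end_line: int
) -> tuple[str, ...]:
    """Return the 1-based, inclusive line range, clamped to the file."""
    if start_line < 1:
        start_line = 1
    if end_line < start_line:
        return ()
    lines = read_file_lines(filepath)
    start_index = start_line - 1
    if start_index >= len(lines):
        return ()
    # Slicing a cached tuple replaces re-reading the file for each range.
    return lines[start_index : min(len(lines), end_line)]
```

The trade-off is memory: each cache entry now holds a full file rather than a short window, which is likely acceptable when many snippets come from the same few files.
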
@@ -81,10 +77,30 @@ class _FileCache:
         currsize: int
     def cache_info(self) -> _CacheInfo:
-        return cast(_FileCache._CacheInfo, self._get_lines_impl.cache_info())
+        return cast(_FileCache._CacheInfo, self._get_file_lines_impl.cache_info())
-def _try_pygments(code: str) -> str | None:
+_PYGMENTS_IMPORTER_ID: int | None = None
+_PYGMENTS_API: tuple[ModuleType, ModuleType, ModuleType] | None = None
+def _load_pygments_api() -> tuple[ModuleType, ModuleType, ModuleType] | None:
+    """
+    Load pygments modules once per import-function identity.
+    Tests monkeypatch `importlib.import_module`; tracking importer identity keeps
+    behavior deterministic and allows import-error branches to stay testable.
+    """
+    global _PYGMENTS_IMPORTER_ID
+    global _PYGMENTS_API
+    importer_id = id(importlib.import_module)
+    if importer_id != _PYGMENTS_IMPORTER_ID:
+        _PYGMENTS_IMPORTER_ID = importer_id
+        _PYGMENTS_API = None
+    if _PYGMENTS_API is not None:
+        return _PYGMENTS_API
     try:
         pygments = importlib.import_module("pygments")
         formatters = importlib.import_module("pygments.formatters")
@@ -92,6 +108,16 @@ def _try_pygments(code: str) -> str | None:
     except ImportError:
         return None
+    _PYGMENTS_API = (pygments, formatters, lexers)
+    return _PYGMENTS_API
+def _try_pygments(code: str) -> str | None:
+    pygments_api = _load_pygments_api()
+    if pygments_api is None:
+        return None
+    pygments, formatters, lexers = pygments_api
     highlight = pygments.highlight
     formatter_cls = formatters.HtmlFormatter
     lexer_cls = lexers.PythonLexer
@@ -104,10 +130,10 @@ def _pygments_css(style_name: str) -> str:
     Returns CSS for pygments tokens. Scoped to `.codebox` to avoid leaking styles.
     If Pygments is not available or style missing, returns "".
     """
-    try:
-        formatters = importlib.import_module("pygments.formatters")
-    except ImportError:
+    pygments_api = _load_pygments_api()
+    if pygments_api is None:
         return ""
+    _, formatters, _ = pygments_api
     try:
         formatter_cls = formatters.HtmlFormatter
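
Reviewer note: `_load_pygments_api` memoizes the imported modules but invalidates the memo whenever `importlib.import_module` itself is replaced, which keeps monkeypatched tests meaningful. The sketch below isolates that idea for an arbitrary module name. `load_optional_module` and its module-level state are hypothetical names, not part of codeclone.

```python
from __future__ import annotations

import importlib
from types import ModuleType

_importer_id: int | None = None
_module_cache: dict[str, ModuleType] = {}


def load_optional_module(name: str) -> ModuleType | None:
    """Import `name` at most once per identity of importlib.import_module."""
    global _importer_id
    current_id = id(importlib.import_module)
    if current_id != _importer_id:
        # The import function was swapped (e.g. by a test): drop cached modules.
        _importer_id = current_id
        _module_cache.clear()
    cached = _module_cache.get(name)
    if cached is not None:
        return cached
    try:
        module = importlib.import_module(name)
    except ImportError:
        # Failures are not cached, so a later call retries the import.
        return None
    _module_cache[name] = module
    return module
```

One caveat of keying on `id()`: CPython may reuse an id after the original object is garbage collected, so this check is a practical heuristic for test isolation rather than a strict guarantee.
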

{codeclone-1.4.3 → codeclone-1.4.4}/codeclone/_report_explain.py RENAMED

@@ -9,6 +9,8 @@ Licensed under the MIT License.
 from __future__ import annotations
 import ast
+from bisect import bisect_left, bisect_right
+from dataclasses import dataclass
 from pathlib import Path
 from ._report_explain_contract import (
@@ -23,6 +25,19 @@ from ._report_explain_contract import (
 from ._report_types import GroupItem, GroupMap
+@dataclass(frozen=True, slots=True)
+class _StatementRecord:
+    node: ast.stmt
+    start_line: int
+    end_line: int
+    start_col: int
+    end_col: int
+    type_name: str
+_StatementIndex = tuple[tuple[_StatementRecord, ...], tuple[int, ...]]
 def _signature_parts(group_key: str) -> list[str]:
     return [part for part in group_key.split("|") if part]
@@ -42,6 +57,53 @@ def _parsed_file_tree(
     return tree
+def _build_statement_index(tree: ast.AST) -> _StatementIndex:
+    records = tuple(
+        sorted(
+            (
+                _StatementRecord(
+                    node=node,
+                    start_line=int(getattr(node, "lineno", 0)),
+                    end_line=int(getattr(node, "end_lineno", 0)),
+                    start_col=int(getattr(node, "col_offset", 0)),
+                    end_col=int(getattr(node, "end_col_offset", 0)),
+                    type_name=type(node).__name__,
+                )
+                for node in ast.walk(tree)
+                if isinstance(node, ast.stmt)
+            ),
+            key=lambda record: (
+                record.start_line,
+                record.end_line,
+                record.start_col,
+                record.end_col,
+                record.type_name,
+            ),
+        )
+    )
+    start_lines = tuple(record.start_line for record in records)
+    return records, start_lines
+def _parsed_statement_index(
+    filepath: str,
+    *,
+    ast_cache: dict[str, ast.AST | None],
+    stmt_index_cache: dict[str, _StatementIndex | None],
+) -> _StatementIndex | None:
+    if filepath in stmt_index_cache:
+        return stmt_index_cache[filepath]
+    tree = _parsed_file_tree(filepath, ast_cache=ast_cache)
+    if tree is None:
+        stmt_index_cache[filepath] = None
+        return None
+    index = _build_statement_index(tree)
+    stmt_index_cache[filepath] = index
+    return index
 def _is_assert_like_stmt(stmt: ast.stmt) -> bool:
     if isinstance(stmt, ast.Assert):
         return True
@@ -64,45 +126,42 @@ def _assert_range_stats(
     start_line: int,
     end_line: int,
     ast_cache: dict[str, ast.AST | None],
+    stmt_index_cache: dict[str, _StatementIndex | None],
     range_cache: dict[tuple[str, int, int], tuple[int, int, int]],
 ) -> tuple[int, int, int]:
     cache_key = (filepath, start_line, end_line)
     if cache_key in range_cache:
         return range_cache[cache_key]
-    tree = _parsed_file_tree(filepath, ast_cache=ast_cache)
-    if tree is None:
+    statement_index = _parsed_statement_index(
+        filepath,
+        ast_cache=ast_cache,
+        stmt_index_cache=stmt_index_cache,
+    )
+    if statement_index is None:
         range_cache[cache_key] = (0, 0, 0)
         return 0, 0, 0
-    stmts = [
-        node
-        for node in ast.walk(tree)
-        if isinstance(node, ast.stmt)
-        and int(getattr(node, "lineno", 0)) >= start_line
-        and int(getattr(node, "end_lineno", 0)) <= end_line
-    ]
-    if not stmts:
+    records, start_lines = statement_index
+    if not records:
         range_cache[cache_key] = (0, 0, 0)
         return 0, 0, 0
-    ordered_stmts = sorted(
-        stmts,
-        key=lambda stmt: (
-            int(getattr(stmt, "lineno", 0)),
-            int(getattr(stmt, "end_lineno", 0)),
-            int(getattr(stmt, "col_offset", 0)),
-            int(getattr(stmt, "end_col_offset", 0)),
-            type(stmt).__name__,
-        ),
-    )
+    left = bisect_left(start_lines, start_line)
+    right = bisect_right(start_lines, end_line)
+    if left >= right:
+        range_cache[cache_key] = (0, 0, 0)
+        return 0, 0, 0
-    total = len(ordered_stmts)
+    total = 0
     assert_like = 0
     max_consecutive = 0
     current_consecutive = 0
-    for stmt in ordered_stmts:
-        if _is_assert_like_stmt(stmt):
+    for record in records[left:right]:
+        if record.end_line > end_line:
+            continue
+        total += 1
+        if _is_assert_like_stmt(record.node):
             assert_like += 1
             current_consecutive += 1
             if current_consecutive > max_consecutive:
@@ -110,6 +169,10 @@ def _assert_range_stats(
         else:
             current_consecutive = 0
+    if total == 0:
+        range_cache[cache_key] = (0, 0, 0)
+        return 0, 0, 0
     stats = (total, assert_like, max_consecutive)
     range_cache[cache_key] = stats
     return stats
@@ -121,6 +184,7 @@ def _is_assert_only_range(
     start_line: int,
     end_line: int,
     ast_cache: dict[str, ast.AST | None],
+    stmt_index_cache: dict[str, _StatementIndex | None],
     range_cache: dict[tuple[str, int, int], tuple[int, int, int]],
 ) -> bool:
     total, assert_like, _ = _assert_range_stats(
@@ -128,6 +192,7 @@ def _is_assert_only_range(
         start_line=start_line,
         end_line=end_line,
         ast_cache=ast_cache,
+        stmt_index_cache=stmt_index_cache,
         range_cache=range_cache,
     )
     return total > 0 and total == assert_like
@@ -157,6 +222,7 @@ def _enrich_with_assert_facts(
     facts: dict[str, str],
     items: list[GroupItem],
     ast_cache: dict[str, ast.AST | None],
+    stmt_index_cache: dict[str, _StatementIndex | None],
     range_cache: dict[tuple[str, int, int], tuple[int, int, int]],
 ) -> None:
     assert_only = True
@@ -181,6 +247,7 @@ def _enrich_with_assert_facts(
                 start_line=start_line,
                 end_line=end_line,
                 ast_cache=ast_cache,
+                stmt_index_cache=stmt_index_cache,
                 range_cache=range_cache,
             )
             total_statements += range_total
@@ -198,6 +265,7 @@ def _enrich_with_assert_facts(
                 start_line=start_line,
                 end_line=end_line,
                 ast_cache=ast_cache,
+                stmt_index_cache=stmt_index_cache,
                 range_cache=range_cache,
             )
         ):
@@ -223,6 +291,7 @@ def build_block_group_facts(block_groups: GroupMap) -> dict[str, dict[str, str]]
     Renderers (HTML/TXT/JSON) should only display these facts.
     """
     ast_cache: dict[str, ast.AST | None] = {}
+    stmt_index_cache: dict[str, _StatementIndex | None] = {}
     range_cache: dict[tuple[str, int, int], tuple[int, int, int]] = {}
     facts_by_group: dict[str, dict[str, str]] = {}
@@ -232,6 +301,7 @@ def build_block_group_facts(block_groups: GroupMap) -> dict[str, dict[str, str]]
             facts=facts,
             items=items,
             ast_cache=ast_cache,
+            stmt_index_cache=stmt_index_cache,
             range_cache=range_cache,
         )
         group_arity = len(items)
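
Reviewer note: the `_report_explain.py` change replaces a full `ast.walk` plus sort per queried range with a statement index built once per file and queried with `bisect`. The sketch below reproduces that lookup in isolation. `build_index` and `statements_in_range` are illustrative names, and the sort key is shortened compared with the five-field key in the diff.

```python
import ast
from bisect import bisect_left, bisect_right

SOURCE = """\
x = 1
assert x == 1
assert x > 0
if x:
    y = 2
assert y == 2
"""


def build_index(source: str) -> tuple[list[ast.stmt], list[int]]:
    """Collect every statement once, sorted by position, plus its start lines."""
    tree = ast.parse(source)
    stmts = sorted(
        (node for node in ast.walk(tree) if isinstance(node, ast.stmt)),
        key=lambda node: (node.lineno, node.end_lineno, node.col_offset),
    )
    start_lines = [node.lineno for node in stmts]
    return stmts, start_lines


def statements_in_range(
    index: tuple[list[ast.stmt], list[int]], start_line: int, end_line: int
) -> list[ast.stmt]:
    """Return statements lying fully inside [start_line, end_line]."""
    stmts, start_lines = index
    # Binary search narrows candidates to statements that start in the range.
    left = bisect_left(start_lines, start_line)
    right = bisect_right(start_lines, end_line)
    # A statement may start inside the range but end after it, so filter again.
    return [stmt for stmt in stmts[left:right] if stmt.end_lineno <= end_line]


INDEX = build_index(SOURCE)
```

With this index, a query for lines 2 to 4 returns the two `assert` statements only: the `if` starts on line 4 but ends on line 5, so the second filter drops it, which mirrors the `record.end_line > end_line` check in the diff.
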

{codeclone-1.4.3 → codeclone-1.4.4}/codeclone.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: codeclone
-Version: 1.4.3
+Version: 1.4.4
 Summary: AST and CFG-based code clone detector for Python focused on architectural duplication
 Author-email: Den Rozhnovskiy <pytelemonbot@mail.ru>
 Maintainer-email: Den Rozhnovskiy <pytelemonbot@mail.ru>
@@ -49,7 +49,7 @@ Dynamic: license-file
 ![Baseline](https://img.shields.io/badge/baseline-versioned-green?style=flat-square)
 [![License](https://img.shields.io/pypi/l/codeclone.svg?style=flat-square)](LICENSE)
-**CodeClone** is a Python code clone detector based on **normalized AST and Control Flow Graphs (CFG)**.
+**CodeClone** is a Python code clone detector based on **normalized AST and Control Flow Graphs (CFG)**.
 It discovers architectural duplication and prevents new copy-paste from entering your codebase via CI.
 ---
@@ -75,13 +75,13 @@ Unlike token-based tools, CodeClone compares **structure and control flow**, mak
 **Three Detection Levels:**
-1. **Function clones (CFG fingerprint)**
+1. **Function clones (CFG fingerprint)**
    Strong structural signal for cross-layer duplication
-2. **Block clones (statement windows)**
+2. **Block clones (statement windows)**
    Detects repeated local logic patterns
-3. **Segment clones (report-only)**
+3. **Segment clones (report-only)**
    Internal function repetition for explainability; not used for baseline gating
 **CI-Ready Features:**

{codeclone-1.4.3 → codeclone-1.4.4}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "codeclone"
-version = "1.4.3"
+version = "1.4.4"
 description = "AST and CFG-based code clone detector for Python focused on architectural duplication"
 readme = { file = "README.md", content-type = "text/markdown" }
 license = { text = "MIT" }