PyPI - logicdiff - Versions diffs - 0.1.0__tar.gz - Mend

logicdiff 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16)

logicdiff-0.1.0/LICENSE +21 -0
logicdiff-0.1.0/PKG-INFO +116 -0
logicdiff-0.1.0/README.md +93 -0
logicdiff-0.1.0/pyproject.toml +38 -0
logicdiff-0.1.0/setup.cfg +4 -0
logicdiff-0.1.0/src/logicdiff/__init__.py +3 -0
logicdiff-0.1.0/src/logicdiff/__main__.py +6 -0
logicdiff-0.1.0/src/logicdiff/cli.py +167 -0
logicdiff-0.1.0/src/logicdiff/core.py +225 -0
logicdiff-0.1.0/src/logicdiff.egg-info/PKG-INFO +116 -0
logicdiff-0.1.0/src/logicdiff.egg-info/SOURCES.txt +14 -0
logicdiff-0.1.0/src/logicdiff.egg-info/dependency_links.txt +1 -0
logicdiff-0.1.0/src/logicdiff.egg-info/entry_points.txt +2 -0
logicdiff-0.1.0/src/logicdiff.egg-info/top_level.txt +1 -0
logicdiff-0.1.0/tests/test_cli.py +31 -0
logicdiff-0.1.0/tests/test_core.py +111 -0

logicdiff-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 logicdiff contributors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

logicdiff-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,116 @@
+Metadata-Version: 2.4
+Name: logicdiff
+Version: 0.1.0
+Summary: A whitespace- and reflow-blind diff: folds respacing AND line re-wrapping that git diff -w can't, and tells you if a change is logical or just formatting. Zero dependencies.
+Author: yyfjj
+License: MIT
+Project-URL: Homepage, https://github.com/jjdoor/logicdiff-py
+Project-URL: Repository, https://github.com/jjdoor/logicdiff-py
+Project-URL: Issues, https://github.com/jjdoor/logicdiff-py/issues
+Keywords: diff,whitespace,reflow,format,git,code-review,cli,ci,devtools
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Topic :: Software Development :: Version Control
+Classifier: Topic :: Utilities
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Dynamic: license-file
+# logicdiff
+**A whitespace- and reflow-blind diff.** A pull request reindents a file and
+rewraps a few long lines, and now `git diff` shows 80 changed lines — but did
+anything *actually* change? `logicdiff` answers that: it folds away pure
+formatting (respacing **and** line reflow) and shows only the logical changes.
+```bash
+logicdiff old.js new.js
+#   only formatting differs - no logical change (a line diff would show 80 changed lines)
+logicdiff a.js b.js
+#   --- a.js
+#   +++ b.js
+#   -42:   const total = price * qty;
+#   +51:   const total = price + qty;
+#
+#   1 token removed, 1 added across 2 logical lines (78 lines folded as reflow/whitespace)
+```
+Exit `0` when the change is formatting-only (or identical), `1` when there's a
+real logical change — so CI can ask "is this PR just a reformat?" Zero
+dependencies, language-agnostic, also on npm (`npx logicdiff`) — the two
+builds produce **byte-for-byte identical** output.
+## Why not `git diff -w`?
+`git diff -w` (ignore-all-space) folds *respacing* — but it is still
+line-anchored, so it **cannot fold reflow**. Re-wrap a function signature across
+three lines and `git diff -w` still shows 1 removed + 3 added, even though not a
+single token changed. That exact gap is [GitHub discussion #20610]
+("Ignore Format Changes in Diff"), open and unanswered for years.
+`difftastic` solves it beautifully with per-language tree-sitter parsing — but
+it's a multi-megabyte binary, needs a grammar for each language (config/log/DSL
+files fall back to text), and it's a *display* tool with no "is this
+formatting-only?" exit code.
+`logicdiff` is the lightweight middle ground: **zero-config, zero-dependency,
+language-agnostic** (works on any text — code, YAML, logs, DSLs), folds *both*
+whitespace and reflow, and gives a one-shot CLI answer plus a CI exit code.
+## How it works
+It tokenizes each file into a sequence of tokens — a token is a run of
+`[A-Za-z0-9_]` or a single punctuation character, and **whitespace is dropped**.
+So `a+b`, `a + b`, and `a +\n  b` all become the same token stream `[a, +, b]`:
+respacing and line breaks become invisible. It then runs the canonical
+[Myers diff] on the token streams. If the streams are equal, the change is
+formatting-only. If not, the changed tokens are mapped back to their line
+numbers and shown.
+Because it has no language parser, whitespace **inside string literals** is also
+ignored — `x = "a b"` and `x = "a  b"` are "formatting only", exactly like
+`git diff -w`. That's a deliberate, documented limitation, not a bug.
+## Usage
+```bash
+logicdiff old new            # human diff (or "only formatting differs")
+logicdiff old new --stat     # just the counts, machine-friendly key=value
+logicdiff old new --json     # structured output (byte-identical both builds)
+logicdiff old new -q         # no output, exit code only (the CI gate)
+cat new | logicdiff old -     # - reads stdin
+```
+Other options: `--color=auto|always|never` and `--max-tokens N` (exit `2` when the two
+inputs together exceed N tokens; default 2,000,000). Inputs with too many differences
+for their size also bail with exit `2` rather than exhaust the heap, because the diff's
+memory grows with edit distance times input size. logicdiff is for spotting a real
+change inside a reformat, not for diffing unrelated files.
+Exit codes: `0` identical or formatting-only · `1` logical changes · `2` error.
+```yaml
+# CI: warn when a PR is more than a reformat
+- run: logicdiff "$BASE" "$HEAD" -q || echo "::warning::real code change, review carefully"
+```
+## Install
+```bash
+pip install logicdiff # or pipx run logicdiff
+npm i -g logicdiff    # Node build, identical behaviour
+```
+Python ≥ 3.8 or Node ≥ 18. No dependencies.
+
+[GitHub discussion #20610]: https://github.com/orgs/community/discussions/20610
+[Myers diff]: https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/
+## License
+MIT
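The exit-code contract in the Usage section (`0` identical or formatting-only, `1` logical changes, `2` error) is what makes the tool scriptable. As a sketch, assuming a Python-based CI step instead of the shell one-liner shown, a wrapper could map those codes to verdicts as below. `classify` and `VERDICTS` are hypothetical names, and the stand-in command only simulates an exit status so the sketch runs without logicdiff installed.

```python
import subprocess
import sys

# Exit codes as documented in the logicdiff README.
VERDICTS = {0: "formatting-only", 1: "logical", 2: "error"}

def classify(cmd):
    """Run a diff command with its output discarded and map the exit status to a verdict."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        return "error"  # the executable is not installed or not on PATH
    # Any undocumented status (e.g. a negative code after a signal) counts as an error.
    return VERDICTS.get(result.returncode, "error")

# Real use (assumes logicdiff is on PATH; the paths are placeholders):
#   classify(["logicdiff", "base/app.js", "head/app.js", "-q"])
# Stand-in that exits with status 1, so this sketch runs without logicdiff:
simulated = [sys.executable, "-c", "import sys; sys.exit(1)"]
print(classify(simulated))  # logical
```

Mapping every unknown status to `"error"` keeps such a gate fail-safe: a crash is never mistaken for a clean reformat.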

logicdiff-0.1.0/README.md ADDED

@@ -0,0 +1,93 @@
+# logicdiff
+**A whitespace- and reflow-blind diff.** A pull request reindents a file and
+rewraps a few long lines, and now `git diff` shows 80 changed lines — but did
+anything *actually* change? `logicdiff` answers that: it folds away pure
+formatting (respacing **and** line reflow) and shows only the logical changes.
+```bash
+logicdiff old.js new.js
+#   only formatting differs - no logical change (a line diff would show 80 changed lines)
+logicdiff a.js b.js
+#   --- a.js
+#   +++ b.js
+#   -42:   const total = price * qty;
+#   +51:   const total = price + qty;
+#
+#   1 token removed, 1 added across 2 logical lines (78 lines folded as reflow/whitespace)
+```
+Exit `0` when the change is formatting-only (or identical), `1` when there's a
+real logical change — so CI can ask "is this PR just a reformat?" Zero
+dependencies, language-agnostic, also on npm (`npx logicdiff`) — the two
+builds produce **byte-for-byte identical** output.
+## Why not `git diff -w`?
+`git diff -w` (ignore-all-space) folds *respacing* — but it is still
+line-anchored, so it **cannot fold reflow**. Re-wrap a function signature across
+three lines and `git diff -w` still shows 1 removed + 3 added, even though not a
+single token changed. That exact gap is [GitHub discussion #20610]
+("Ignore Format Changes in Diff"), open and unanswered for years.
+`difftastic` solves it beautifully with per-language tree-sitter parsing — but
+it's a multi-megabyte binary, needs a grammar for each language (config/log/DSL
+files fall back to text), and it's a *display* tool with no "is this
+formatting-only?" exit code.
+`logicdiff` is the lightweight middle ground: **zero-config, zero-dependency,
+language-agnostic** (works on any text — code, YAML, logs, DSLs), folds *both*
+whitespace and reflow, and gives a one-shot CLI answer plus a CI exit code.
+## How it works
+It tokenizes each file into a sequence of tokens — a token is a run of
+`[A-Za-z0-9_]` or a single punctuation character, and **whitespace is dropped**.
+So `a+b`, `a + b`, and `a +\n  b` all become the same token stream `[a, +, b]`:
+respacing and line breaks become invisible. It then runs the canonical
+[Myers diff] on the token streams. If the streams are equal, the change is
+formatting-only. If not, the changed tokens are mapped back to their line
+numbers and shown.
+Because it has no language parser, whitespace **inside string literals** is also
+ignored — `x = "a b"` and `x = "a  b"` are "formatting only", exactly like
+`git diff -w`. That's a deliberate, documented limitation, not a bug.
+## Usage
+```bash
+logicdiff old new            # human diff (or "only formatting differs")
+logicdiff old new --stat     # just the counts, machine-friendly key=value
+logicdiff old new --json     # structured output (byte-identical both builds)
+logicdiff old new -q         # no output, exit code only (the CI gate)
+cat new | logicdiff old -     # - reads stdin
+```
+Other options: `--color=auto|always|never` and `--max-tokens N` (exit `2` when the two
+inputs together exceed N tokens; default 2,000,000). Inputs with too many differences
+for their size also bail with exit `2` rather than exhaust the heap, because the diff's
+memory grows with edit distance times input size. logicdiff is for spotting a real
+change inside a reformat, not for diffing unrelated files.
+Exit codes: `0` identical or formatting-only · `1` logical changes · `2` error.
+```yaml
+# CI: warn when a PR is more than a reformat
+- run: logicdiff "$BASE" "$HEAD" -q || echo "::warning::real code change, review carefully"
+```
+## Install
+```bash
+pip install logicdiff # or pipx run logicdiff
+npm i -g logicdiff    # Node build, identical behaviour
+```
+Python ≥ 3.8 or Node ≥ 18. No dependencies.
+
+[GitHub discussion #20610]: https://github.com/orgs/community/discussions/20610
+[Myers diff]: https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/
+## License
+MIT
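The README's "How it works" section can be illustrated with a few lines of Python. This is a hypothetical re-implementation, not logicdiff's own code: `TOKEN`, `tokens` and `formatting_only` are illustrative names. It uses a regex where the package uses a character loop, and it drops the per-token line numbers the real tool keeps for reporting. Like the package, it decodes bytes as latin-1 (one character per byte) and spells whitespace out as an explicit ASCII class instead of `\s`.

```python
import re

# A token is a run of [A-Za-z0-9_] or any single character that is not ASCII
# whitespace (space, tab, LF, CR, FF, VT). \s is avoided on purpose: in Python it
# would also match U+0085 and U+00A0, which can appear in latin-1-decoded text.
TOKEN = re.compile(r"[A-Za-z0-9_]+|[^ \t\n\r\f\v]")

def tokens(data: bytes) -> list:
    # latin-1 maps each of the 256 byte values to exactly one character (lossless).
    return TOKEN.findall(data.decode("latin-1"))

def formatting_only(old: bytes, new: bytes) -> bool:
    # Equal token streams: the inputs differ only in spacing and line breaks.
    return tokens(old) == tokens(new)

print(tokens(b"a +\n  b"))                              # ['a', '+', 'b']
print(formatting_only(b"f(a, b)", b"f(a,\n  b)"))       # True: a pure reflow
print(formatting_only(b'x = "a b"', b'x = "a  b"'))     # True: the string-literal limitation
print(formatting_only(b"price * qty", b"price + qty"))  # False: a logical change
```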

logicdiff-0.1.0/pyproject.toml ADDED

@@ -0,0 +1,38 @@
+[build-system]
+requires = ["setuptools>=68"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "logicdiff"
+version = "0.1.0"
+description = "A whitespace- and reflow-blind diff: folds respacing AND line re-wrapping that git diff -w can't, and tells you if a change is logical or just formatting. Zero dependencies."
+readme = "README.md"
+requires-python = ">=3.8"
+license = { text = "MIT" }
+authors = [{ name = "yyfjj" }]
+keywords = ["diff", "whitespace", "reflow", "format", "git", "code-review", "cli", "ci", "devtools"]
+classifiers = [
+    "Development Status :: 4 - Beta",
+    "Environment :: Console",
+    "Intended Audience :: Developers",
+    "License :: OSI Approved :: MIT License",
+    "Operating System :: OS Independent",
+    "Programming Language :: Python :: 3",
+    "Topic :: Software Development :: Version Control",
+    "Topic :: Utilities",
+]
+dependencies = []
+[project.urls]
+Homepage = "https://github.com/jjdoor/logicdiff-py"
+Repository = "https://github.com/jjdoor/logicdiff-py"
+Issues = "https://github.com/jjdoor/logicdiff-py/issues"
+[project.scripts]
+logicdiff = "logicdiff.cli:main"
+[tool.setuptools]
+package-dir = { "" = "src" }
+[tool.setuptools.packages.find]
+where = ["src"]
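The `[project.scripts]` table maps the `logicdiff` command to `logicdiff.cli:main`. Installers such as pip generate a small wrapper that imports that module, looks up the attribute and passes its return value to `sys.exit`, which is why `main()` returns an integer exit code (`__main__.py` does the same for `python -m logicdiff`). The sketch below resolves a `module:attr` target the same way. `load_entry_point` is a hypothetical helper, not pip's actual code, and it is demonstrated on stdlib targets so it runs without logicdiff installed.

```python
import importlib
import json

def load_entry_point(spec):
    """Resolve a console-script target such as "logicdiff.cli:main" to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    if not attr_path:
        return obj  # a bare module name with no ":attr" part
    for part in attr_path.split("."):  # the attribute may be dotted, e.g. "mod:Cls.method"
        obj = getattr(obj, part)
    return obj

dumps = load_entry_point("json:dumps")
print(dumps is json.dumps)  # True
```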

logicdiff-0.1.0/setup.cfg ADDED

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0

logicdiff-0.1.0/src/logicdiff/__init__.py ADDED

@@ -0,0 +1,3 @@
+"""logicdiff — a whitespace- and reflow-blind diff. Zero dependencies."""
+__version__ = "0.1.0"

logicdiff-0.1.0/src/logicdiff/__main__.py ADDED

@@ -0,0 +1,6 @@
+import sys
+from .cli import main
+if __name__ == "__main__":
+    sys.exit(main())

logicdiff-0.1.0/src/logicdiff/cli.py ADDED

@@ -0,0 +1,167 @@
+import json
+import os
+import re
+import signal
+import sys
+from . import __version__
+from . import core
+DEFAULT_MAX_TOKENS = 2000000
+def _color_on(mode):
+    if mode == "always":
+        return True
+    if mode == "never":
+        return False
+    return sys.stdout.isatty() and not os.environ.get("NO_COLOR")
+def _mk_paint(on):
+    def col(c, s):
+        return f"\x1b[{c}m{s}\x1b[0m" if on else s
+    return {
+        "red": lambda s: col("31", s), "green": lambda s: col("32", s),
+        "yellow": lambda s: col("33", s), "dim": lambda s: col("2", s),
+        "bold": lambda s: col("1", s), "cyan": lambda s: col("36", s),
+    }
+def _help():
+    b = _mk_paint(_color_on("auto"))["bold"]
+    return (
+        f"{b('logicdiff')} — a whitespace- and reflow-blind diff. Zero dependencies.\n"
+        "\n"
+        "Diffs two files but folds away pure formatting — respacing AND line reflow — and\n"
+        "shows only the logical changes. Where `git diff -w` still flags a re-wrapped block,\n"
+        "logicdiff says \"only formatting differs\". Language-agnostic, no parser.\n"
+        "\n"
+        f"{b('Usage')}\n"
+        "  logicdiff <old> <new>      Compare two files (use - for stdin)\n"
+        "\n"
+        f"{b('Options')}\n"
+        "  --stat             Print only the counts (machine-friendly key=value)\n"
+        "  --json             Machine-readable output (byte-identical across Node and Python)\n"
+        "  -q, --quiet        No output; exit code only\n"
+        "  --color=auto|always|never   Colorize (default auto)\n"
+        f"  --max-tokens N     Bail (exit 2) over N tokens total (default {DEFAULT_MAX_TOKENS})\n"
+        "  --help | --version\n"
+        "\n"
+        f"{b('Exit')}  0 identical or formatting-only · 1 logical changes · 2 error\n"
+    )
+class _Exit(Exception):
+    def __init__(self, code):
+        self.code = code
+def _out(s):
+    sys.stdout.buffer.write(s.encode("latin-1"))
+def main(argv=None):
+    try:
+        signal.signal(signal.SIGPIPE, signal.SIG_DFL)
+    except (AttributeError, ValueError):
+        pass
+    argv = list(sys.argv[1:] if argv is None else argv)
+    def die(msg):
+        sys.stderr.write(_mk_paint(_color_on("auto"))["red"](f"logicdiff: {msg}\n"))
+        raise _Exit(2)
+    try:
+        dd_idx = argv.index("--") if "--" in argv else len(argv)
+        pre = argv[:dd_idx]
+        if "-h" in pre or "--help" in pre:
+            sys.stdout.write(_help())
+            return 0
+        if "-v" in pre or "--version" in pre:
+            sys.stdout.write(__version__ + "\n")
+            return 0
+        as_stat = as_json = quiet = False
+        color = "auto"
+        max_tokens = DEFAULT_MAX_TOKENS
+        files = []
+        dd = False
+        i = 0
+        while i < len(argv):
+            a = argv[i]
+            if dd:
+                files.append(a); i += 1; continue
+            if a == "--":
+                dd = True; i += 1; continue
+            eq = a.find("=") if a.startswith("--") else -1
+            flag = a[:eq] if eq != -1 else a
+            inline = a[eq + 1:] if eq != -1 else None
+            if flag == "--stat":
+                as_stat = True
+            elif flag == "--json":
+                as_json = True
+            elif a == "-q" or flag == "--quiet":
+                quiet = True
+            elif flag == "--color":
+                color = inline if inline is not None else (argv[i + 1] if i + 1 < len(argv) else "")
+                if inline is None:
+                    i += 1
+                if color not in ("auto", "always", "never"):
+                    die("--color must be auto, always or never")
+            elif flag == "--max-tokens":
+                val = inline if inline is not None else (argv[i + 1] if i + 1 < len(argv) else "")
+                if inline is None:
+                    i += 1
+                # Strict integer grammar (optional leading +, decimal digits only) so this
+                # parses byte-identically to the Node build. fullmatch (not $) avoids Python's
+                # trailing-newline match; int() alone would accept PEP-515 '1_000'.
+                if not re.fullmatch(r"\+?[0-9]+", val):
+                    die("--max-tokens must be a positive integer")
+                max_tokens = int(val)
+                if max_tokens <= 0:
+                    die("--max-tokens must be a positive integer")
+            elif a in ("-h", "--help", "-v", "--version"):
+                pass
+            elif a.startswith("-") and a != "-":
+                die(f"unknown option: {a} (use -- to end options)")
+            else:
+                files.append(a)
+            i += 1
+        if len(files) != 2:
+            die(f"expected exactly 2 files, got {len(files)} (usage: logicdiff <old> <new>)")
+        def read_input(p):
+            try:
+                if p == "-":
+                    buf = sys.stdin.buffer.read()
+                else:
+                    with open(p, "rb") as fh:  # close the file handle promptly
+                        buf = fh.read()
+            except OSError:
+                die(f"cannot read '{p}'")  # OS-specific detail omitted for Node/Python parity
+            if b"\x00" in buf:
+                return None
+            return buf.decode("latin-1")
+        a_text = read_input(files[0])
+        b_text = read_input(files[1])
+        if a_text is None or b_text is None:
+            if not quiet:
+                sys.stderr.write("logicdiff: binary file, not diffed\n")
+            return 2
+        try:
+            result = core.compare(a_text, b_text, {"maxTokens": max_tokens})
+        except ValueError as e:
+            die(str(e))
+        paint = _mk_paint(_color_on(color) and not as_json and not as_stat)
+        if as_json:
+            _out(json.dumps(core.to_json(result, files[0], files[1]), indent=2, ensure_ascii=False) + "\n")
+        elif as_stat:
+            _out(core.stat_line(result) + "\n")
+        elif not quiet:
+            _out(core.format_human(result, files[0], files[1], paint) + "\n")
+        return 0 if (result["identical"] or result["formattingOnly"]) else 1
+    except _Exit as ex:
+        return ex.code
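The comment in the `--max-tokens` branch explains why the value is checked against a strict pattern before `int()` is called. A standalone sketch of the same rule makes the pitfalls concrete. `parse_positive_int` is a hypothetical name, and the `\d` pitfall noted in the comments is an additional observation that the source comment does not mention.

```python
import re

def parse_positive_int(val):
    """Accept an optional '+' followed by ASCII digits only, with a value greater than zero."""
    # fullmatch anchors both ends; a pattern ending in "$" would also accept "5\n".
    # [0-9] rather than \d, because \d also matches non-ASCII digits such as "\u0663".
    if not re.fullmatch(r"\+?[0-9]+", val):
        raise ValueError(f"not a positive integer: {val!r}")
    n = int(val)
    if n <= 0:
        raise ValueError(f"not a positive integer: {val!r}")
    return n

# What int() and a "$" anchor would let through on their own:
assert int("1_000") == 1000                        # PEP 515 underscores
assert int(" 7\n") == 7                            # surrounding whitespace
assert int("\u0663") == 3                          # ARABIC-INDIC DIGIT THREE
assert re.match(r"\+?[0-9]+$", "5\n") is not None  # "$" matches before a final newline
print(parse_positive_int("+42"))  # 42
```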

logicdiff-0.1.0/src/logicdiff/core.py ADDED

@@ -0,0 +1,225 @@
+"""logicdiff core — pure whitespace-and-reflow-blind diff. No fs, no clock.
+The trick: diff the two files as sequences of TOKENS (words + single punctuation
+chars), with whitespace dropped, so respacing AND line-reflow are invisible. If the
+token sequences are equal, the files differ only in formatting. Token comparison is
+on TEXT ONLY (line numbers are metadata) — that's what makes reflow fold.
+Byte-for-byte identical to the Node build: everything runs on a latin-1
+(byte-faithful) string, the tokenizer uses explicit ASCII classes (never \\w/\\s), and
+the diff is the canonical Myers O(ND) algorithm with a pinned tie-break.
+"""
+WS = "\x20\x09\x0a\x0d\x0c\x0b"  # space, tab, LF, CR, FF, VT
+def is_word(code):
+    return (65 <= code <= 90) or (97 <= code <= 122) or (48 <= code <= 57) or code == 95
+def tokenize(s):
+    """[{text,line}]; word=[A-Za-z0-9_] run, else single non-ws char; ws dropped;
+    only '\\n' increments the 1-based line (lone '\\r' does not -> CRLF == LF)."""
+    toks = []
+    line = 1
+    i = 0
+    n = len(s)
+    while i < n:
+        ch = s[i]
+        if ch == "\n":
+            line += 1
+            i += 1
+            continue
+        if ch in WS:
+            i += 1
+            continue
+        if is_word(ord(ch)):
+            j = i + 1
+            while j < n and is_word(ord(s[j])):
+                j += 1
+            toks.append({"text": s[i:j], "line": line})
+            i = j
+        else:
+            toks.append({"text": ch, "line": line})
+            i += 1
+    return toks
+def split_lines(s):
+    return s.split("\n")
+# ---- canonical Myers O(ND) diff over string arrays -------------------------
+# Coglan reference variant. Tie-break: prefer "down" (insertion) only when
+# k == -d OR (k != d AND V[k-1] < V[k+1]) with strict <. Snapshot V before each
+# d-round. Implemented identically in the Node build.
+# Bail before the O(D*(n+m)) trace exhausts memory (D = edit distance). ~1.2e8 ints keeps
+# the Node build well under its default heap cap; this build bails at the same point.
+MAX_TRACE_ELEMENTS = 120000000
+def shortest_edit(a, b, max_trace=None):
+    n, m = len(a), len(b)
+    mx = n + m
+    cap = max_trace or MAX_TRACE_ELEMENTS
+    trace = []
+    if mx == 0:
+        return trace, n, m
+    v = [0] * (2 * mx + 1)
+    for d in range(0, mx + 1):
+        # The trace holds one full V-snapshot (length 2*mx+1) per d-round, so worst-case
+        # memory is O(D*(n+m)); bail deterministically (identically in both builds) before it
+        # exhausts the heap. Triggered only by very dissimilar inputs (large edit distance).
+        if (d + 1) * (2 * mx + 1) > cap:
+            raise ValueError("input too large to diff (too many differences between the inputs)")
+        trace.append(v[:])
+        for k in range(-d, d + 1, 2):
+            if k == -d or (k != d and v[k - 1 + mx] < v[k + 1 + mx]):
+                x = v[k + 1 + mx]
+            else:
+                x = v[k - 1 + mx] + 1
+            y = x - k
+            while x < n and y < m and a[x] == b[y]:
+                x += 1
+                y += 1
+            v[k + mx] = x
+            if x >= n and y >= m:
+                return trace, n, m
+    return trace, n, m
+def diff_seq(a, b, max_trace=None):
+    n, m = len(a), len(b)
+    mx = n + m
+    trace, _, _ = shortest_edit(a, b, max_trace)
+    x, y = n, m
+    moves = []
+    for d in range(len(trace) - 1, -1, -1):
+        v = trace[d]
+        k = x - y
+        if k == -d or (k != d and v[k - 1 + mx] < v[k + 1 + mx]):
+            prev_k = k + 1
+        else:
+            prev_k = k - 1
+        prev_x = v[prev_k + mx]
+        prev_y = prev_x - prev_k
+        while x > prev_x and y > prev_y:
+            moves.append({"op": "keep", "a": x - 1, "b": y - 1})
+            x -= 1
+            y -= 1
+        if d > 0:
+            if x == prev_x:
+                moves.append({"op": "ins", "b": y - 1})
+            else:
+                moves.append({"op": "del", "a": x - 1})
+            x, y = prev_x, prev_y
+    moves.reverse()
+    return moves
+# ---- top-level comparison --------------------------------------------------
+def compare(a_str, b_str, opts=None):
+    opts = opts or {}
+    max_trace = opts.get("maxTrace")
+    a_toks, b_toks = tokenize(a_str), tokenize(b_str)
+    max_tokens = opts.get("maxTokens")
+    if max_tokens and len(a_toks) + len(b_toks) > max_tokens:
+        raise ValueError(f"input too large ({len(a_toks) + len(b_toks)} tokens > --max-tokens {max_tokens})")
+    moves = diff_seq([t["text"] for t in a_toks], [t["text"] for t in b_toks], max_trace)
+    tokens_added = tokens_removed = 0
+    del_lines, ins_lines = set(), set()
+    for mv in moves:
+        if mv["op"] == "del":
+            tokens_removed += 1
+            del_lines.add(a_toks[mv["a"]]["line"])
+        elif mv["op"] == "ins":
+            tokens_added += 1
+            ins_lines.add(b_toks[mv["b"]]["line"])
+    del_line_arr = sorted(del_lines)
+    ins_line_arr = sorted(ins_lines)
+    a_lines, b_lines = split_lines(a_str), split_lines(b_str)
+    git_would_show = sum(1 for mv in diff_seq(a_lines, b_lines, max_trace) if mv["op"] != "keep")
+    logical_lines_changed = len(del_line_arr) + len(ins_line_arr)
+    return {
+        "identical": a_str == b_str,
+        "formattingOnly": tokens_added == 0 and tokens_removed == 0,
+        "stats": {
+            "tokensAdded": tokens_added, "tokensRemoved": tokens_removed,
+            "logicalLinesChanged": logical_lines_changed,
+            "linesReflowed": max(0, git_would_show - logical_lines_changed),
+            "gitWouldShow": git_would_show,
+        },
+        "delLines": del_line_arr, "insLines": ins_line_arr,
+        "aLines": a_lines, "bLines": b_lines,
+    }
+# ---- rendering -------------------------------------------------------------
+PLAIN = {"red": lambda s: s, "green": lambda s: s, "yellow": lambda s: s,
+         "dim": lambda s: s, "bold": lambda s: s, "cyan": lambda s: s}
+def strip_cr(s):
+    return s[:-1] if s and s[-1] == "\r" else s
+def format_human(r, file_a, file_b, paint=None):
+    p = paint or PLAIN
+    # Output is written as latin-1 bytes (to round-trip file content faithfully),
+    # so all decoration here is ASCII-only.
+    if r["identical"]:
+        return p["dim"]("identical")
+    if r["formattingOnly"]:
+        n = r["stats"]["gitWouldShow"]
+        return p["green"]("only formatting differs") + p["dim"](
+            f" - no logical change (a line diff would show {n} changed line{'' if n == 1 else 's'})")
+    lines = [p["bold"]("--- " + file_a), p["bold"]("+++ " + file_b)]
+    for L in r["delLines"]:
+        lines.append(p["red"]("-" + str(L) + ": " + strip_cr(r["aLines"][L - 1])))
+    for L in r["insLines"]:
+        lines.append(p["green"]("+" + str(L) + ": " + strip_cr(r["bLines"][L - 1])))
+    lines.append("")
+    s = r["stats"]
+    summary = f"{s['tokensRemoved']} token{'' if s['tokensRemoved'] == 1 else 's'} removed, {s['tokensAdded']} added"
+    summary += f" across {s['logicalLinesChanged']} logical line{'' if s['logicalLinesChanged'] == 1 else 's'}"
+    if s["linesReflowed"] > 0:
+        summary += f" ({s['linesReflowed']} line{'' if s['linesReflowed'] == 1 else 's'} folded as reflow/whitespace)"
+    lines.append(p["red"](summary))
+    return "\n".join(lines)
+def stat_line(r):
+    s = r["stats"]
+    return "\n".join([
+        "formatting_only=" + ("1" if r["formattingOnly"] else "0"),
+        "tokens_added=" + str(s["tokensAdded"]),
+        "tokens_removed=" + str(s["tokensRemoved"]),
+        "logical_lines_changed=" + str(s["logicalLinesChanged"]),
+        "lines_reflowed=" + str(s["linesReflowed"]),
+        "git_would_show_lines=" + str(s["gitWouldShow"]),
+    ])
+def to_json(r, file_a, file_b):
+    s = r["stats"]
+    return {
+        "fileA": file_a, "fileB": file_b,
+        "identical": r["identical"],
+        "formatting_only": r["formattingOnly"],
+        "exit_code": 0 if (r["identical"] or r["formattingOnly"]) else 1,
+        "stats": {
+            "tokens_added": s["tokensAdded"],
+            "tokens_removed": s["tokensRemoved"],
+            "logical_lines_changed": s["logicalLinesChanged"],
+            "lines_reflowed": s["linesReflowed"],
+            "git_would_show_lines": s["gitWouldShow"],
+        },
+        "removed": [{"line": L, "text": strip_cr(r["aLines"][L - 1])} for L in r["delLines"]],
+        "added": [{"line": L, "text": strip_cr(r["bLines"][L - 1])} for L in r["insLines"]],
+    }
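`core.py` describes its diff as the canonical Myers O(ND) algorithm (Coglan reference variant) with a pinned tie-break. The sketch below implements only the forward pass. It returns the edit distance D (insertions plus deletions) and discards the per-round `V` snapshots that `diff_seq` keeps in order to backtrack into an edit script, so it needs only O(n+m) memory but cannot report which tokens changed. `myers_distance` is a hypothetical name, not part of the package.

```python
def myers_distance(a, b):
    """Shortest edit distance (insertions + deletions) between sequences a and b."""
    n, m = len(a), len(b)
    mx = n + m
    if mx == 0:
        return 0
    # v[k + mx] holds the furthest x reached so far on diagonal k = x - y.
    v = [0] * (2 * mx + 1)
    for d in range(mx + 1):
        for k in range(-d, d + 1, 2):
            # Same tie-break as core.py: take x from diagonal k+1 (a move down, i.e. an
            # insertion) when k == -d, or when diagonal k+1 reached strictly further than k-1.
            if k == -d or (k != d and v[k - 1 + mx] < v[k + 1 + mx]):
                x = v[k + 1 + mx]
            else:
                x = v[k - 1 + mx] + 1  # move right: a deletion from a
            y = x - k
            while x < n and y < m and a[x] == b[y]:  # follow the run of matches ("snake")
                x += 1
                y += 1
            v[k + mx] = x
            if x >= n and y >= m:
                return d
    return mx

print(myers_distance("ABCABBA", "CBABAC"))            # 5
print(myers_distance(["f(a, b)"], ["f(a,", "  b)"]))  # 3: one line removed, two added
```

Keeping one snapshot of length 2(n+m)+1 per round is why `shortest_edit` checks `(d + 1) * (2 * mx + 1) > MAX_TRACE_ELEMENTS` before each round. For two inputs totalling the default limit of 2,000,000 tokens, each snapshot holds 4,000,001 integers, so the 120,000,000-element cap is exceeded at round d = 29 (30 × 4,000,001 = 120,000,030): at that size only diffs with D ≤ 28 complete. The README summary follows from the `stats` arithmetic in `compare`: `linesReflowed = gitWouldShow - logicalLinesChanged`, so 80 - 2 = 78 in its example.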

logicdiff-0.1.0/src/logicdiff.egg-info/PKG-INFO ADDED

@@ -0,0 +1,116 @@
+Metadata-Version: 2.4
+Name: logicdiff
+Version: 0.1.0
+Summary: A whitespace- and reflow-blind diff: folds respacing AND line re-wrapping that git diff -w can't, and tells you if a change is logical or just formatting. Zero dependencies.
+Author: yyfjj
+License: MIT
+Project-URL: Homepage, https://github.com/jjdoor/logicdiff-py
+Project-URL: Repository, https://github.com/jjdoor/logicdiff-py
+Project-URL: Issues, https://github.com/jjdoor/logicdiff-py/issues
+Keywords: diff,whitespace,reflow,format,git,code-review,cli,ci,devtools
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Topic :: Software Development :: Version Control
+Classifier: Topic :: Utilities
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Dynamic: license-file
+# logicdiff
+**A whitespace- and reflow-blind diff.** A pull request reindents a file and
+rewraps a few long lines, and now `git diff` shows 80 changed lines — but did
+anything *actually* change? `logicdiff` answers that: it folds away pure
+formatting (respacing **and** line reflow) and shows only the logical changes.
+```bash
+logicdiff old.js new.js
+#   only formatting differs - no logical change (a line diff would show 80 changed lines)
+logicdiff a.js b.js
+#   --- a.js
+#   +++ b.js
+#   -42:   const total = price * qty;
+#   +51:   const total = price + qty;
+#
+#   1 token removed, 1 added across 2 logical lines (78 lines folded as reflow/whitespace)
+```
+Exit `0` when the change is formatting-only (or identical), `1` when there's a
+real logical change — so CI can ask "is this PR just a reformat?" Zero
+dependencies, language-agnostic, also on npm (`npx logicdiff`) — the two
+builds produce **byte-for-byte identical** output.
+## Why not `git diff -w`?
+`git diff -w` (ignore-all-space) folds *respacing* — but it is still
+line-anchored, so it **cannot fold reflow**. Re-wrap a function signature across
+three lines and `git diff -w` still shows 1 removed + 3 added, even though not a
+single token changed. That exact gap is [GitHub discussion #20610]
+("Ignore Format Changes in Diff"), open and unanswered for years.
+`difftastic` solves it beautifully with per-language tree-sitter parsing — but
+it's a multi-megabyte binary, needs a grammar for each language (config/log/DSL
+files fall back to text), and it's a *display* tool with no "is this
+formatting-only?" exit code.
+`logicdiff` is the lightweight middle ground: **zero-config, zero-dependency,
+language-agnostic** (works on any text — code, YAML, logs, DSLs), folds *both*
+whitespace and reflow, and gives a one-shot CLI answer plus a CI exit code.
+## How it works
+It tokenizes each file into a sequence of tokens — a token is a run of
+`[A-Za-z0-9_]` or a single punctuation character, and **whitespace is dropped**.
+So `a+b`, `a + b`, and `a +\n  b` all become the same token stream `[a, +, b]`:
+respacing and line breaks become invisible. It then runs the canonical
+[Myers diff] on the token streams. If the streams are equal, the change is
+formatting-only. If not, the changed tokens are mapped back to their line
+numbers and shown.
+Because it has no language parser, whitespace **inside string literals** is also
+ignored — `x = "a b"` and `x = "a  b"` are "formatting only", exactly like
+`git diff -w`. That's a deliberate, documented limitation, not a bug.
+## Usage
+```bash
+logicdiff old new            # human diff (or "only formatting differs")
+logicdiff old new --stat     # just the counts, machine-friendly key=value
+logicdiff old new --json     # structured output (byte-identical both builds)
+logicdiff old new -q         # no output, exit code only (the CI gate)
+cat new | logicdiff old -     # - reads stdin
+```
+Other options: `--color=auto|always|never` and `--max-tokens N` (exit `2` when the two
+inputs together exceed N tokens; default 2,000,000). Inputs with too many differences
+for their size also bail with exit `2` rather than exhaust the heap, because the diff's
+memory grows with edit distance times input size. logicdiff is for spotting a real
+change inside a reformat, not for diffing unrelated files.
+Exit codes: `0` identical or formatting-only · `1` logical changes · `2` error.
+```yaml
+# CI: warn when a PR is more than a reformat
+- run: logicdiff "$BASE" "$HEAD" -q || echo "::warning::real code change, review carefully"
+```
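The `||` in that step fires on exit `1` *and* on exit `2`, so an error (a missing file, a `--max-tokens` bail) also reads as "real code change". A hypothetical Python gate that keeps the three documented exit codes apart might look like this. `classify` and its verdict strings are illustrative only; the command is passed in so it can be pointed at `logicdiff "$BASE" "$HEAD" -q`.

```python
import subprocess
from typing import List

# Documented logicdiff exit codes (0 / 1 / 2); anything else is unexpected.
VERDICTS = {
    0: "identical or formatting-only",
    1: "logical change",
    2: "error (bad input, or too large / too dissimilar)",
}


def classify(cmd: List[str]) -> str:
    """Run cmd (e.g. ["logicdiff", base, head, "-q"]) and name its exit code."""
    code = subprocess.run(cmd).returncode
    return VERDICTS.get(code, f"unexpected exit code {code}")
```

A CI script can then warn on `"logical change"` and fail the job outright on an error, instead of treating both the same way.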
+## Install
+```bash
+pip install logicdiff   # or: pipx run logicdiff
+npm i -g logicdiff      # Node build, identical behaviour
+```
+Python ≥ 3.8 or Node ≥ 18. No dependencies.
+[GitHub discussion #20610]: https://github.com/orgs/community/discussions/20610
+[Myers diff]: https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/
+## License
+MIT

logicdiff-0.1.0/src/logicdiff.egg-info/SOURCES.txt ADDED

@@ -0,0 +1,14 @@
+LICENSE
+README.md
+pyproject.toml
+src/logicdiff/__init__.py
+src/logicdiff/__main__.py
+src/logicdiff/cli.py
+src/logicdiff/core.py
+src/logicdiff.egg-info/PKG-INFO
+src/logicdiff.egg-info/SOURCES.txt
+src/logicdiff.egg-info/dependency_links.txt
+src/logicdiff.egg-info/entry_points.txt
+src/logicdiff.egg-info/top_level.txt
+tests/test_cli.py
+tests/test_core.py

logicdiff-0.1.0/src/logicdiff.egg-info/dependency_links.txt ADDED

@@ -0,0 +1 @@
+

logicdiff-0.1.0/src/logicdiff.egg-info/entry_points.txt ADDED

@@ -0,0 +1,2 @@
+[console_scripts]
+logicdiff = logicdiff.cli:main

logicdiff-0.1.0/src/logicdiff.egg-info/top_level.txt ADDED

@@ -0,0 +1 @@
+logicdiff

logicdiff-0.1.0/tests/test_cli.py ADDED

@@ -0,0 +1,31 @@
+import pytest
+from logicdiff import cli
+def run(args, capsys):
+    # cli.main() catches its own _Exit and returns the code, so no process is killed.
+    code = cli.main(list(args))
+    cap = capsys.readouterr()
+    return code, cap.out, cap.err
+# int() would leniently accept '1_000' (PEP 515); the Node build's parseInt would accept
+# '12abc'/'5.9'/'1e3'. Both builds must reject every one of these identically.
+@pytest.mark.parametrize("bad", ["12abc", "5.9", "1e3", "1_000", "-5", "0", "", "0x10"])
+def test_max_tokens_rejects(bad, capsys):
+    # The arg loop validates --max-tokens before reading files, so dummy names are fine.
+    code, out, err = run(["a", "b", "--max-tokens", bad], capsys)
+    assert code == 2
+    assert out == ""
+    assert "--max-tokens must be a positive integer" in err
+def test_max_tokens_accepts_plain_integer(tmp_path, capsys):
+    a = tmp_path / "a"
+    b = tmp_path / "b"
+    a.write_text("x = 1\n")
+    b.write_text("x = 2\n")
+    code, out, err = run([str(a), str(b), "--max-tokens", "100"], capsys)
+    assert code == 1  # a logical change
+    assert err == ""
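The rule `test_max_tokens_rejects` enforces can be met in Python without relying on `int()`, which is too lenient here. Below is a sketch that accepts only plain ASCII decimal digits with a value above zero. `parse_max_tokens` is a hypothetical name, not logicdiff's `cli` code.

```python
import re


def parse_max_tokens(raw: str) -> int:
    """Accept only plain ASCII decimal digits with a value > 0."""
    # fullmatch anchors both ends (no trailing newline allowed, unlike `$`), so
    # '12abc', '5.9', '1e3', '1_000', '-5', '', '0x10' and ' 12' are all rejected.
    if not re.fullmatch(r"[0-9]+", raw) or int(raw) == 0:
        raise ValueError("--max-tokens must be a positive integer")
    return int(raw)
```

`[0-9]` is used instead of `\d` because in Python 3 `\d` also matches non-ASCII decimal digits such as `١٢`, which `int()` would happily convert.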

logicdiff-0.1.0/tests/test_core.py ADDED

@@ -0,0 +1,111 @@
+import pytest
+from logicdiff import core
+def toks(s):
+    return [t["text"] for t in core.tokenize(s)]
+def test_tokenize_words_punct_ws_dropped():
+    assert toks("a+b") == ["a", "+", "b"]
+    assert toks("a  +  b") == ["a", "+", "b"]
+    assert toks("foo.bar(x)") == ["foo", ".", "bar", "(", "x", ")"]
+def test_tokenize_reflow_identical():
+    assert toks("foo(a,\n    b)") == toks("foo(a, b)")
+    assert toks("foo(a, b)") == ["foo", "(", "a", ",", "b", ")"]
+def test_tokenize_line_numbers_crlf_eq_lf():
+    t = core.tokenize("a\nb\r\nc\rd")
+    assert [(x["text"], x["line"]) for x in t] == [("a", 1), ("b", 2), ("c", 3), ("d", 3)]
+def test_tokenize_ascii_word_only():
+    assert toks("café") == ["caf", "é"]
+    assert toks("a_b1") == ["a_b1"]
+    assert toks("1_000") == ["1_000"]
+def test_diff_seq_roundtrip_ambiguous():
+    a = ["a", "b", "c", "a", "b", "b", "a"]
+    b = ["c", "b", "a", "b", "a", "c"]
+    moves = core.diff_seq(a, b)
+    assert sum(1 for m in moves if m["op"] == "keep") == 4
+    out = []
+    for m in moves:
+        if m["op"] == "keep":
+            out.append(a[m["a"]])
+        elif m["op"] == "ins":
+            out.append(b[m["b"]])
+    assert out == b
+def test_compare_whitespace_only():
+    r = core.compare("x=a+b\n", "x = a  +  b\n")
+    assert r["formattingOnly"] is True
+    assert r["identical"] is False
+def test_compare_reflow():
+    a = "function foo(a, b) {\n  return a + b;\n}\n"
+    b = "function foo(\n  a,\n  b\n) {\n  return a + b;\n}\n"
+    r = core.compare(a, b)
+    assert r["formattingOnly"] is True
+    assert r["stats"]["gitWouldShow"] > 0
+def test_compare_identical():
+    r = core.compare("a b c\n", "a b c\n")
+    assert r["identical"] is True
+    assert r["formattingOnly"] is True
+def test_compare_logical_change_buried():
+    a = "function foo(a, b) {\n  return a + b;\n}\n"
+    b = "function foo(\n  a,\n  b\n) {\n  return a - b;\n}\n"
+    r = core.compare(a, b)
+    assert r["formattingOnly"] is False
+    assert r["delLines"] == [2]
+    assert r["insLines"] == [5]
+    assert r["stats"]["linesReflowed"] > 0
+def test_compare_dedup_line():
+    r = core.compare('const x = "ab";\n', 'const x = "cd";\n')
+    assert r["delLines"] == [1]
+    assert r["insLines"] == [1]
+def test_compare_empty():
+    assert core.compare("", "a\n")["formattingOnly"] is False
+    assert core.compare("", "")["identical"] is True
+def test_crlf_lf_only():
+    assert core.compare("a\nb\n", "a\r\nb\r\n")["formattingOnly"] is True
+def test_max_tokens_guard():
+    with pytest.raises(ValueError, match="too large"):
+        core.compare("a b c d e", "x y z", {"maxTokens": 3})
+def test_trace_guard():
+    # maxTrace lowers the heap-protection cap so a tiny input trips it (prod default ~1.2e8).
+    with pytest.raises(ValueError, match="too many differences between the inputs"):
+        core.compare("a b c d e f", "u v w x y z", {"maxTrace": 50})
+def test_json_stat_shapes():
+    r = core.compare("a + b\n", "a - b\n")
+    j = core.to_json(r, "a.txt", "b.txt")
+    assert j["formatting_only"] is False
+    assert j["exit_code"] == 1
+    assert j["stats"]["tokens_added"] == 1
+    assert j["stats"]["tokens_removed"] == 1
+    assert j["removed"][0]["text"] == "a + b"
+    assert j["added"][0]["text"] == "a - b"
+    assert "formatting_only=0" in core.stat_line(r)
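`test_json_stat_shapes` pins down part of the JSON report schema: `formatting_only`, `exit_code`, `stats.tokens_added`/`tokens_removed`, and `removed`/`added` entries carrying a `text` field. The tests only exercise `core.to_json`, so treat it as an assumption that `logicdiff --json` prints that same object. Under that assumption, a consumer could summarize the report as shown here. `summarize` is hypothetical and reads no fields beyond the ones the test checks.

```python
import json
from typing import Any, Dict


def summarize(report_json: str) -> str:
    """One-screen summary of a logicdiff-style JSON report."""
    r: Dict[str, Any] = json.loads(report_json)
    if r["formatting_only"]:
        return "formatting only"
    s = r["stats"]
    lines = [f"+{s['tokens_added']} / -{s['tokens_removed']} tokens"]
    lines += [f"- {e['text']}" for e in r["removed"]]  # old-side lines
    lines += [f"+ {e['text']}" for e in r["added"]]    # new-side lines
    return "\n".join(lines)
```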