logicdiff 0.1.0__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 logicdiff contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,116 @@
1
+ Metadata-Version: 2.4
2
+ Name: logicdiff
3
+ Version: 0.1.0
4
+ Summary: A whitespace- and reflow-blind diff: folds respacing AND line re-wrapping that git diff -w can't, and tells you if a change is logical or just formatting. Zero dependencies.
5
+ Author: yyfjj
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/jjdoor/logicdiff-py
8
+ Project-URL: Repository, https://github.com/jjdoor/logicdiff-py
9
+ Project-URL: Issues, https://github.com/jjdoor/logicdiff-py/issues
10
+ Keywords: diff,whitespace,reflow,format,git,code-review,cli,ci,devtools
11
+ Classifier: Development Status :: 4 - Beta
12
+ Classifier: Environment :: Console
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: License :: OSI Approved :: MIT License
15
+ Classifier: Operating System :: OS Independent
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Topic :: Software Development :: Version Control
18
+ Classifier: Topic :: Utilities
19
+ Requires-Python: >=3.8
20
+ Description-Content-Type: text/markdown
21
+ License-File: LICENSE
22
+ Dynamic: license-file
23
+
24
+ # logicdiff
25
+
26
+ **A whitespace- and reflow-blind diff.** A pull request reindents a file and
27
+ rewraps a few long lines, and now `git diff` shows 80 changed lines — but did
28
+ anything *actually* change? `logicdiff` answers that: it folds away pure
29
+ formatting (respacing **and** line reflow) and shows only the logical changes.
30
+
31
+ ```bash
32
+ logicdiff old.js new.js
33
+ # only formatting differs - no logical change (a line diff would show 80 changed lines)
34
+
35
+ logicdiff a.js b.js
36
+ # --- a.js
37
+ # +++ b.js
38
+ # -42: const total = price * qty;
39
+ # +51: const total = price + qty;
40
+ #
41
+ # 1 token removed, 1 added across 2 logical lines (78 lines folded as reflow/whitespace)
42
+ ```
43
+
44
+ Exit `0` when the change is formatting-only (or identical), `1` when there's a
45
+ real logical change — so CI can ask "is this PR just a reformat?" Zero
46
+ dependencies, language-agnostic, also on npm (`npx logicdiff`) — the two
47
+ builds produce **byte-for-byte identical** output.
48
+
49
+ ## Why not `git diff -w`?
50
+
51
+ `git diff -w` (ignore-all-space) folds *respacing* — but it is still
52
+ line-anchored, so it **cannot fold reflow**. Re-wrap a function signature across
53
+ three lines and `git diff -w` still shows 1 removed + 3 added, even though not a
54
+ single token changed. That exact gap is [GitHub discussion #20610]
55
+ ("Ignore Format Changes in Diff"), open and unanswered for years.
56
+
57
+ `difftastic` solves it beautifully with per-language tree-sitter parsing — but
58
+ it's a multi-megabyte binary, needs a grammar for each language (config/log/DSL
59
+ files fall back to text), and it's a *display* tool with no "is this
60
+ formatting-only?" exit code.
61
+
62
+ `logicdiff` is the lightweight middle ground: **zero-config, zero-dependency,
63
+ language-agnostic** (works on any text — code, YAML, logs, DSLs), folds *both*
64
+ whitespace and reflow, and gives a one-shot CLI answer plus a CI exit code.
65
+
66
+ ## How it works
67
+
68
+ It tokenizes each file into a sequence of tokens — a token is a run of
69
+ `[A-Za-z0-9_]` or a single punctuation character, and **whitespace is dropped**.
70
+ So `a+b`, `a + b`, and `a +\n b` all become the same token stream `[a, +, b]`:
71
+ respacing and line breaks become invisible. It then runs the canonical
72
+ [Myers diff] on the token streams. If the streams are equal, the change is
73
+ formatting-only. If not, the changed tokens are mapped back to their line
74
+ numbers and shown.
75
+
76
+ Because it has no language parser, whitespace **inside string literals** is also
77
+ ignored: `x = "a b"` and `x = "a  b"` are "formatting only", exactly like
78
+ `git diff -w`. That's a deliberate, documented limitation, not a bug.
79
+
80
+ ## Usage
81
+
82
+ ```bash
83
+ logicdiff old new # human diff (or "only formatting differs")
84
+ logicdiff old new --stat # just the counts, machine-friendly key=value
85
+ logicdiff old new --json # structured output (byte-identical both builds)
86
+ logicdiff old new -q # no output, exit code only (the CI gate)
87
+ cat new | logicdiff old - # - reads stdin
88
+ ```
89
+
90
+ `--color=auto|always|never`, `--max-tokens N` (bail over N tokens, default 2,000,000).
91
+ Two wildly dissimilar inputs (a huge edit distance) also bail with exit `2` instead of
92
+ risking the heap — logicdiff is for spotting a real change inside a reformat, not for
93
+ diffing unrelated files.
94
+
95
+ Exit codes: `0` identical or formatting-only · `1` logical changes · `2` error.
96
+
97
+ ```yaml
98
+ # CI: warn when a PR is more than a reformat
99
+ - run: logicdiff "$BASE" "$HEAD" -q || echo "::warning::real code change, review carefully"
100
+ ```
101
+
102
+ ## Install
103
+
104
+ ```bash
105
+ pip install logicdiff # or pipx run logicdiff
106
+ npm i -g logicdiff # Node build, identical behaviour
107
+ ```
108
+
109
+ Python ≥ 3.8 or Node ≥ 18. No dependencies.
110
+
111
+ [GitHub discussion #20610]: https://github.com/orgs/community/discussions/20610
112
+ [Myers diff]: https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/
113
+
114
+ ## License
115
+
116
+ MIT
@@ -0,0 +1,93 @@
1
+ # logicdiff
2
+
3
+ **A whitespace- and reflow-blind diff.** A pull request reindents a file and
4
+ rewraps a few long lines, and now `git diff` shows 80 changed lines — but did
5
+ anything *actually* change? `logicdiff` answers that: it folds away pure
6
+ formatting (respacing **and** line reflow) and shows only the logical changes.
7
+
8
+ ```bash
9
+ logicdiff old.js new.js
10
+ # only formatting differs - no logical change (a line diff would show 80 changed lines)
11
+
12
+ logicdiff a.js b.js
13
+ # --- a.js
14
+ # +++ b.js
15
+ # -42: const total = price * qty;
16
+ # +51: const total = price + qty;
17
+ #
18
+ # 1 token removed, 1 added across 2 logical lines (78 lines folded as reflow/whitespace)
19
+ ```
20
+
21
+ Exit `0` when the change is formatting-only (or identical), `1` when there's a
22
+ real logical change — so CI can ask "is this PR just a reformat?" Zero
23
+ dependencies, language-agnostic, also on npm (`npx logicdiff`) — the two
24
+ builds produce **byte-for-byte identical** output.
25
+
26
+ ## Why not `git diff -w`?
27
+
28
+ `git diff -w` (ignore-all-space) folds *respacing* — but it is still
29
+ line-anchored, so it **cannot fold reflow**. Re-wrap a function signature across
30
+ three lines and `git diff -w` still shows 1 removed + 3 added, even though not a
31
+ single token changed. That exact gap is [GitHub discussion #20610]
32
+ ("Ignore Format Changes in Diff"), open and unanswered for years.
33
+
34
+ `difftastic` solves it beautifully with per-language tree-sitter parsing — but
35
+ it's a multi-megabyte binary, needs a grammar for each language (config/log/DSL
36
+ files fall back to text), and it's a *display* tool with no "is this
37
+ formatting-only?" exit code.
38
+
39
+ `logicdiff` is the lightweight middle ground: **zero-config, zero-dependency,
40
+ language-agnostic** (works on any text — code, YAML, logs, DSLs), folds *both*
41
+ whitespace and reflow, and gives a one-shot CLI answer plus a CI exit code.
42
+
43
+ ## How it works
44
+
45
+ It tokenizes each file into a sequence of tokens — a token is a run of
46
+ `[A-Za-z0-9_]` or a single punctuation character, and **whitespace is dropped**.
47
+ So `a+b`, `a + b`, and `a +\n b` all become the same token stream `[a, +, b]`:
48
+ respacing and line breaks become invisible. It then runs the canonical
49
+ [Myers diff] on the token streams. If the streams are equal, the change is
50
+ formatting-only. If not, the changed tokens are mapped back to their line
51
+ numbers and shown.
52
+
53
+ Because it has no language parser, whitespace **inside string literals** is also
54
+ ignored: `x = "a b"` and `x = "a  b"` are "formatting only", exactly like
55
+ `git diff -w`. That's a deliberate, documented limitation, not a bug.
56
+
57
+ ## Usage
58
+
59
+ ```bash
60
+ logicdiff old new # human diff (or "only formatting differs")
61
+ logicdiff old new --stat # just the counts, machine-friendly key=value
62
+ logicdiff old new --json # structured output (byte-identical both builds)
63
+ logicdiff old new -q # no output, exit code only (the CI gate)
64
+ cat new | logicdiff old - # - reads stdin
65
+ ```
66
+
67
+ `--color=auto|always|never`, `--max-tokens N` (bail over N tokens, default 2,000,000).
68
+ Two wildly dissimilar inputs (a huge edit distance) also bail with exit `2` instead of
69
+ risking the heap — logicdiff is for spotting a real change inside a reformat, not for
70
+ diffing unrelated files.
71
+
72
+ Exit codes: `0` identical or formatting-only · `1` logical changes · `2` error.
73
+
74
+ ```yaml
75
+ # CI: warn when a PR is more than a reformat
76
+ - run: logicdiff "$BASE" "$HEAD" -q || echo "::warning::real code change, review carefully"
77
+ ```
78
+
79
+ ## Install
80
+
81
+ ```bash
82
+ pip install logicdiff # or pipx run logicdiff
83
+ npm i -g logicdiff # Node build, identical behaviour
84
+ ```
85
+
86
+ Python ≥ 3.8 or Node ≥ 18. No dependencies.
87
+
88
+ [GitHub discussion #20610]: https://github.com/orgs/community/discussions/20610
89
+ [Myers diff]: https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/
90
+
91
+ ## License
92
+
93
+ MIT
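
Review note on the README's "How it works" section: the token rule it describes can be tried out in a few lines. This is only a minimal sketch, assuming the rule exactly as the README states it (a run of `[A-Za-z0-9_]`, otherwise one non-whitespace character). The names `sketch_tokens` and `formatting_only` are hypothetical and not part of the package; the package's own `tokenize` in `core.py` also records line numbers, which this sketch leaves out.

```python
import re

# Hypothetical helper, not the package's API. A word is a run of
# [A-Za-z0-9_]; anything else that is not ASCII whitespace
# (space, tab, LF, CR, FF, VT) is a one-character token.
_TOKEN = re.compile(r"[A-Za-z0-9_]+|[^\x20\x09\x0a\x0d\x0c\x0b]")


def sketch_tokens(text):
    """Return the token texts of `text`, with all whitespace dropped."""
    return _TOKEN.findall(text)


def formatting_only(old, new):
    """True when both texts produce the same token stream."""
    return sketch_tokens(old) == sketch_tokens(new)


print(sketch_tokens("a +\n b"))                           # ['a', '+', 'b']
print(formatting_only("foo(a,\n    b)", "foo(a, b)"))     # True
print(formatting_only("price * qty", "price + qty"))      # False
```

Because line breaks never reach the token stream, a re-wrapped call compares equal to its one-line form, which is the case a line-anchored diff cannot fold.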
@@ -0,0 +1,38 @@
1
+ [build-system]
2
+ requires = ["setuptools>=68"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "logicdiff"
7
+ version = "0.1.0"
8
+ description = "A whitespace- and reflow-blind diff: folds respacing AND line re-wrapping that git diff -w can't, and tells you if a change is logical or just formatting. Zero dependencies."
9
+ readme = "README.md"
10
+ requires-python = ">=3.8"
11
+ license = { text = "MIT" }
12
+ authors = [{ name = "yyfjj" }]
13
+ keywords = ["diff", "whitespace", "reflow", "format", "git", "code-review", "cli", "ci", "devtools"]
14
+ classifiers = [
15
+ "Development Status :: 4 - Beta",
16
+ "Environment :: Console",
17
+ "Intended Audience :: Developers",
18
+ "License :: OSI Approved :: MIT License",
19
+ "Operating System :: OS Independent",
20
+ "Programming Language :: Python :: 3",
21
+ "Topic :: Software Development :: Version Control",
22
+ "Topic :: Utilities",
23
+ ]
24
+ dependencies = []
25
+
26
+ [project.urls]
27
+ Homepage = "https://github.com/jjdoor/logicdiff-py"
28
+ Repository = "https://github.com/jjdoor/logicdiff-py"
29
+ Issues = "https://github.com/jjdoor/logicdiff-py/issues"
30
+
31
+ [project.scripts]
32
+ logicdiff = "logicdiff.cli:main"
33
+
34
+ [tool.setuptools]
35
+ package-dir = { "" = "src" }
36
+
37
+ [tool.setuptools.packages.find]
38
+ where = ["src"]
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,3 @@
1
+ """logicdiff — a whitespace- and reflow-blind diff. Zero dependencies."""
2
+
3
+ __version__ = "0.1.0"
@@ -0,0 +1,6 @@
1
+ import sys
2
+
3
+ from .cli import main
4
+
5
+ if __name__ == "__main__":
6
+ sys.exit(main())
@@ -0,0 +1,167 @@
1
+ import json
2
+ import os
3
+ import re
4
+ import signal
5
+ import sys
6
+
7
+ from . import __version__
8
+ from . import core
9
+
10
+ DEFAULT_MAX_TOKENS = 2000000
11
+
12
+
13
+ def _color_on(mode):
14
+ if mode == "always":
15
+ return True
16
+ if mode == "never":
17
+ return False
18
+ return sys.stdout.isatty() and not os.environ.get("NO_COLOR")
19
+
20
+
21
+ def _mk_paint(on):
22
+ def col(c, s):
23
+ return f"\x1b[{c}m{s}\x1b[0m" if on else s
24
+ return {
25
+ "red": lambda s: col("31", s), "green": lambda s: col("32", s),
26
+ "yellow": lambda s: col("33", s), "dim": lambda s: col("2", s),
27
+ "bold": lambda s: col("1", s), "cyan": lambda s: col("36", s),
28
+ }
29
+
30
+
31
+ def _help():
32
+ b = _mk_paint(_color_on("auto"))["bold"]
33
+ return (
34
+ f"{b('logicdiff')} — a whitespace- and reflow-blind diff. Zero dependencies.\n"
35
+ "\n"
36
+ "Diffs two files but folds away pure formatting — respacing AND line reflow — and\n"
37
+ "shows only the logical changes. Where `git diff -w` still flags a re-wrapped block,\n"
38
+ "logicdiff says \"only formatting differs\". Language-agnostic, no parser.\n"
39
+ "\n"
40
+ f"{b('Usage')}\n"
41
+ " logicdiff <old> <new> Compare two files (use - for stdin)\n"
42
+ "\n"
43
+ f"{b('Options')}\n"
44
+ " --stat Print only the counts (machine-friendly key=value)\n"
45
+ " --json Machine-readable output (byte-identical across Node and Python)\n"
46
+ " -q, --quiet No output; exit code only\n"
47
+ " --color=auto|always|never Colorize (default auto)\n"
48
+ f" --max-tokens N Bail (exit 2) over N tokens total (default {DEFAULT_MAX_TOKENS})\n"
49
+ " --help | --version\n"
50
+ "\n"
51
+ f"{b('Exit')} 0 identical or formatting-only · 1 logical changes · 2 error\n"
52
+ )
53
+
54
+
55
+ class _Exit(Exception):
56
+ def __init__(self, code):
57
+ self.code = code
58
+
59
+
60
+ def _out(s):
61
+ sys.stdout.buffer.write(s.encode("latin-1"))
62
+
63
+
64
+ def main(argv=None):
65
+ try:
66
+ signal.signal(signal.SIGPIPE, signal.SIG_DFL)
67
+ except (AttributeError, ValueError):
68
+ pass
69
+
70
+ argv = list(sys.argv[1:] if argv is None else argv)
71
+
72
+ def die(msg):
73
+ sys.stderr.write(_mk_paint(_color_on("auto"))["red"](f"logicdiff: {msg}\n"))
74
+ raise _Exit(2)
75
+
76
+ try:
77
+ dd_idx = argv.index("--") if "--" in argv else len(argv)
78
+ pre = argv[:dd_idx]
79
+ if "-h" in pre or "--help" in pre:
80
+ sys.stdout.write(_help())
81
+ return 0
82
+ if "-v" in pre or "--version" in pre:
83
+ sys.stdout.write(__version__ + "\n")
84
+ return 0
85
+
86
+ as_stat = as_json = quiet = False
87
+ color = "auto"
88
+ max_tokens = DEFAULT_MAX_TOKENS
89
+ files = []
90
+ dd = False
91
+ i = 0
92
+ while i < len(argv):
93
+ a = argv[i]
94
+ if dd:
95
+ files.append(a); i += 1; continue
96
+ if a == "--":
97
+ dd = True; i += 1; continue
98
+ eq = a.find("=") if a.startswith("--") else -1
99
+ flag = a[:eq] if eq != -1 else a
100
+ inline = a[eq + 1:] if eq != -1 else None
101
+ if flag == "--stat":
102
+ as_stat = True
103
+ elif flag == "--json":
104
+ as_json = True
105
+ elif a == "-q" or flag == "--quiet":
106
+ quiet = True
107
+ elif flag == "--color":
108
+ color = inline if inline is not None else (argv[i + 1] if i + 1 < len(argv) else "")
109
+ if inline is None:
110
+ i += 1
111
+ if color not in ("auto", "always", "never"):
112
+ die("--color must be auto, always or never")
113
+ elif flag == "--max-tokens":
114
+ val = inline if inline is not None else (argv[i + 1] if i + 1 < len(argv) else "")
115
+ if inline is None:
116
+ i += 1
117
+ # Strict integer grammar (optional leading +, decimal digits only) so this
118
+ # parses byte-identically to the Node build. fullmatch (not $) avoids Python's
119
+ # trailing-newline match; int() alone would accept PEP-515 '1_000'.
120
+ if not re.fullmatch(r"\+?[0-9]+", val):
121
+ die("--max-tokens must be a positive integer")
122
+ max_tokens = int(val)
123
+ if max_tokens <= 0:
124
+ die("--max-tokens must be a positive integer")
125
+ elif a in ("-h", "--help", "-v", "--version"):
126
+ pass
127
+ elif a.startswith("-") and a != "-":
128
+ die(f"unknown option: {a} (use -- to end options)")
129
+ else:
130
+ files.append(a)
131
+ i += 1
132
+
133
+ if len(files) != 2:
134
+ die(f"expected exactly 2 files, got {len(files)} (usage: logicdiff <old> <new>)")
135
+
136
+ def read_input(p):
137
+ try:
138
+ buf = sys.stdin.buffer.read() if p == "-" else open(p, "rb").read()
139
+ except OSError:
140
+ die(f"cannot read '{p}'") # OS-specific detail omitted for Node/Python parity
141
+ if b"\x00" in buf:
142
+ return None
143
+ return buf.decode("latin-1")
144
+
145
+ a_text = read_input(files[0])
146
+ b_text = read_input(files[1])
147
+ if a_text is None or b_text is None:
148
+ if not quiet:
149
+ sys.stderr.write("logicdiff: binary file, not diffed\n")
150
+ return 2
151
+
152
+ try:
153
+ result = core.compare(a_text, b_text, {"maxTokens": max_tokens})
154
+ except ValueError as e:
155
+ die(str(e))
156
+
157
+ paint = _mk_paint(_color_on(color) and not as_json and not as_stat)
158
+ if as_json:
159
+ _out(json.dumps(core.to_json(result, files[0], files[1]), indent=2, ensure_ascii=False) + "\n")
160
+ elif as_stat:
161
+ _out(core.stat_line(result) + "\n")
162
+ elif not quiet:
163
+ _out(core.format_human(result, files[0], files[1], paint) + "\n")
164
+
165
+ return 0 if (result["identical"] or result["formattingOnly"]) else 1
166
+ except _Exit as ex:
167
+ return ex.code
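
Review note on the `--max-tokens` check in the file above: its comment says `int()` alone would be too lenient and that `fullmatch` is used instead of `$`. The sketch below isolates that check to show both effects. `parse_max_tokens` is a hypothetical stand-alone helper that returns `None` where the CLI would call `die(...)`; it is not a function in the package.

```python
import re


def parse_max_tokens(val):
    """Return a positive int for a strictly decimal string, else None.

    Hypothetical stand-alone version of the check in cli.py: optional
    leading '+', then ASCII digits only, and the value must be > 0.
    """
    if not re.fullmatch(r"\+?[0-9]+", val):
        return None
    n = int(val)
    return n if n > 0 else None


# int() accepts PEP 515 underscores, so it cannot be the only validation.
print(int("1_000"))                                # 1000
print(parse_max_tokens("1_000"))                   # None

# '$' also matches just before a trailing newline; fullmatch does not.
print(bool(re.match(r"\+?[0-9]+$", "100\n")))      # True
print(parse_max_tokens("100\n"))                   # None
```

The inputs rejected here are the same ones `tests/test_cli.py` parametrizes over, so the sketch can be read alongside that test.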
@@ -0,0 +1,225 @@
1
+ """logicdiff core — pure whitespace-and-reflow-blind diff. No fs, no clock.
2
+
3
+ The trick: diff the two files as sequences of TOKENS (words + single punctuation
4
+ chars), with whitespace dropped, so respacing AND line-reflow are invisible. If the
5
+ token sequences are equal, the files differ only in formatting. Token comparison is
6
+ on TEXT ONLY (line numbers are metadata) — that's what makes reflow fold.
7
+
8
+ Byte-for-byte identical to the Node build: everything runs on a latin-1
9
+ (byte-faithful) string, the tokenizer uses explicit ASCII classes (never \\w/\\s), and
10
+ the diff is the canonical Myers O(ND) algorithm with a pinned tie-break.
11
+ """
12
+
13
+ WS = "\x20\x09\x0a\x0d\x0c\x0b" # space, tab, LF, CR, FF, VT
14
+
15
+
16
+ def is_word(code):
17
+ return (65 <= code <= 90) or (97 <= code <= 122) or (48 <= code <= 57) or code == 95
18
+
19
+
20
+ def tokenize(s):
21
+ """[{text,line}]; word=[A-Za-z0-9_] run, else single non-ws char; ws dropped;
22
+ only '\\n' increments the 1-based line (lone '\\r' does not -> CRLF == LF)."""
23
+ toks = []
24
+ line = 1
25
+ i = 0
26
+ n = len(s)
27
+ while i < n:
28
+ ch = s[i]
29
+ if ch == "\n":
30
+ line += 1
31
+ i += 1
32
+ continue
33
+ if ch in WS:
34
+ i += 1
35
+ continue
36
+ if is_word(ord(ch)):
37
+ j = i + 1
38
+ while j < n and is_word(ord(s[j])):
39
+ j += 1
40
+ toks.append({"text": s[i:j], "line": line})
41
+ i = j
42
+ else:
43
+ toks.append({"text": ch, "line": line})
44
+ i += 1
45
+ return toks
46
+
47
+
48
+ def split_lines(s):
49
+ return s.split("\n")
50
+
51
+
52
+ # ---- canonical Myers O(ND) diff over string arrays -------------------------
53
+ # Coglan reference variant. Tie-break: prefer "down" (insertion) only when
54
+ # k == -d OR (k != d AND V[k-1] < V[k+1]) with strict <. Snapshot V before each
55
+ # d-round. Implemented identically in the Node build.
56
+
57
+ # Bail before the O(D*(n+m)) trace exhausts memory (D = edit distance). ~1.2e8 ints keeps
58
+ # the Node build well under its default heap cap; this build bails at the same point.
59
+ MAX_TRACE_ELEMENTS = 120000000
60
+
61
+
62
+ def shortest_edit(a, b, max_trace=None):
63
+ n, m = len(a), len(b)
64
+ mx = n + m
65
+ cap = max_trace or MAX_TRACE_ELEMENTS
66
+ trace = []
67
+ if mx == 0:
68
+ return trace, n, m
69
+ v = [0] * (2 * mx + 1)
70
+ for d in range(0, mx + 1):
71
+ # The trace holds one full V-snapshot (length 2*mx+1) per d-round, so worst-case
72
+ # memory is O(D*(n+m)); bail deterministically (identically in both builds) before it
73
+ # exhausts the heap. Triggered only by very dissimilar inputs (large edit distance).
74
+ if (d + 1) * (2 * mx + 1) > cap:
75
+ raise ValueError("input too large to diff (too many differences between the inputs)")
76
+ trace.append(v[:])
77
+ for k in range(-d, d + 1, 2):
78
+ if k == -d or (k != d and v[k - 1 + mx] < v[k + 1 + mx]):
79
+ x = v[k + 1 + mx]
80
+ else:
81
+ x = v[k - 1 + mx] + 1
82
+ y = x - k
83
+ while x < n and y < m and a[x] == b[y]:
84
+ x += 1
85
+ y += 1
86
+ v[k + mx] = x
87
+ if x >= n and y >= m:
88
+ return trace, n, m
89
+ return trace, n, m
90
+
91
+
92
+ def diff_seq(a, b, max_trace=None):
93
+ n, m = len(a), len(b)
94
+ mx = n + m
95
+ trace, _, _ = shortest_edit(a, b, max_trace)
96
+ x, y = n, m
97
+ moves = []
98
+ for d in range(len(trace) - 1, -1, -1):
99
+ v = trace[d]
100
+ k = x - y
101
+ if k == -d or (k != d and v[k - 1 + mx] < v[k + 1 + mx]):
102
+ prev_k = k + 1
103
+ else:
104
+ prev_k = k - 1
105
+ prev_x = v[prev_k + mx]
106
+ prev_y = prev_x - prev_k
107
+ while x > prev_x and y > prev_y:
108
+ moves.append({"op": "keep", "a": x - 1, "b": y - 1})
109
+ x -= 1
110
+ y -= 1
111
+ if d > 0:
112
+ if x == prev_x:
113
+ moves.append({"op": "ins", "b": y - 1})
114
+ else:
115
+ moves.append({"op": "del", "a": x - 1})
116
+ x, y = prev_x, prev_y
117
+ moves.reverse()
118
+ return moves
119
+
120
+
121
+ # ---- top-level comparison --------------------------------------------------
122
+
123
+ def compare(a_str, b_str, opts=None):
124
+ opts = opts or {}
125
+ max_trace = opts.get("maxTrace")
126
+ a_toks, b_toks = tokenize(a_str), tokenize(b_str)
127
+ max_tokens = opts.get("maxTokens")
128
+ if max_tokens and len(a_toks) + len(b_toks) > max_tokens:
129
+ raise ValueError(f"input too large ({len(a_toks) + len(b_toks)} tokens > --max-tokens {max_tokens})")
130
+ moves = diff_seq([t["text"] for t in a_toks], [t["text"] for t in b_toks], max_trace)
131
+
132
+ tokens_added = tokens_removed = 0
133
+ del_lines, ins_lines = set(), set()
134
+ for mv in moves:
135
+ if mv["op"] == "del":
136
+ tokens_removed += 1
137
+ del_lines.add(a_toks[mv["a"]]["line"])
138
+ elif mv["op"] == "ins":
139
+ tokens_added += 1
140
+ ins_lines.add(b_toks[mv["b"]]["line"])
141
+ del_line_arr = sorted(del_lines)
142
+ ins_line_arr = sorted(ins_lines)
143
+
144
+ a_lines, b_lines = split_lines(a_str), split_lines(b_str)
145
+ git_would_show = sum(1 for mv in diff_seq(a_lines, b_lines, max_trace) if mv["op"] != "keep")
146
+ logical_lines_changed = len(del_line_arr) + len(ins_line_arr)
147
+
148
+ return {
149
+ "identical": a_str == b_str,
150
+ "formattingOnly": tokens_added == 0 and tokens_removed == 0,
151
+ "stats": {
152
+ "tokensAdded": tokens_added, "tokensRemoved": tokens_removed,
153
+ "logicalLinesChanged": logical_lines_changed,
154
+ "linesReflowed": max(0, git_would_show - logical_lines_changed),
155
+ "gitWouldShow": git_would_show,
156
+ },
157
+ "delLines": del_line_arr, "insLines": ins_line_arr,
158
+ "aLines": a_lines, "bLines": b_lines,
159
+ }
160
+
161
+
162
+ # ---- rendering -------------------------------------------------------------
163
+
164
+ PLAIN = {"red": lambda s: s, "green": lambda s: s, "yellow": lambda s: s,
165
+ "dim": lambda s: s, "bold": lambda s: s, "cyan": lambda s: s}
166
+
167
+
168
+ def strip_cr(s):
169
+ return s[:-1] if s and s[-1] == "\r" else s
170
+
171
+
172
+ def format_human(r, file_a, file_b, paint=None):
173
+ p = paint or PLAIN
174
+ # Output is written as latin-1 bytes (to round-trip file content faithfully),
175
+ # so all decoration here is ASCII-only.
176
+ if r["identical"]:
177
+ return p["dim"]("identical")
178
+ if r["formattingOnly"]:
179
+ n = r["stats"]["gitWouldShow"]
180
+ return p["green"]("only formatting differs") + p["dim"](
181
+ f" - no logical change (a line diff would show {n} changed line{'' if n == 1 else 's'})")
182
+ lines = [p["bold"]("--- " + file_a), p["bold"]("+++ " + file_b)]
183
+ for L in r["delLines"]:
184
+ lines.append(p["red"]("-" + str(L) + ": " + strip_cr(r["aLines"][L - 1])))
185
+ for L in r["insLines"]:
186
+ lines.append(p["green"]("+" + str(L) + ": " + strip_cr(r["bLines"][L - 1])))
187
+ lines.append("")
188
+ s = r["stats"]
189
+ summary = f"{s['tokensRemoved']} token{'' if s['tokensRemoved'] == 1 else 's'} removed, {s['tokensAdded']} added"
190
+ summary += f" across {s['logicalLinesChanged']} logical line{'' if s['logicalLinesChanged'] == 1 else 's'}"
191
+ if s["linesReflowed"] > 0:
192
+ summary += f" ({s['linesReflowed']} line{'' if s['linesReflowed'] == 1 else 's'} folded as reflow/whitespace)"
193
+ lines.append(p["red"](summary))
194
+ return "\n".join(lines)
195
+
196
+
197
+ def stat_line(r):
198
+ s = r["stats"]
199
+ return "\n".join([
200
+ "formatting_only=" + ("1" if r["formattingOnly"] else "0"),
201
+ "tokens_added=" + str(s["tokensAdded"]),
202
+ "tokens_removed=" + str(s["tokensRemoved"]),
203
+ "logical_lines_changed=" + str(s["logicalLinesChanged"]),
204
+ "lines_reflowed=" + str(s["linesReflowed"]),
205
+ "git_would_show_lines=" + str(s["gitWouldShow"]),
206
+ ])
207
+
208
+
209
+ def to_json(r, file_a, file_b):
210
+ s = r["stats"]
211
+ return {
212
+ "fileA": file_a, "fileB": file_b,
213
+ "identical": r["identical"],
214
+ "formatting_only": r["formattingOnly"],
215
+ "exit_code": 0 if (r["identical"] or r["formattingOnly"]) else 1,
216
+ "stats": {
217
+ "tokens_added": s["tokensAdded"],
218
+ "tokens_removed": s["tokensRemoved"],
219
+ "logical_lines_changed": s["logicalLinesChanged"],
220
+ "lines_reflowed": s["linesReflowed"],
221
+ "git_would_show_lines": s["gitWouldShow"],
222
+ },
223
+ "removed": [{"line": L, "text": strip_cr(r["aLines"][L - 1])} for L in r["delLines"]],
224
+ "added": [{"line": L, "text": strip_cr(r["bLines"][L - 1])} for L in r["insLines"]],
225
+ }
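
Review note on the trace cap in `shortest_edit` above: the check `(d + 1) * (2 * mx + 1) > cap` fixes how many d-rounds can run for a given input size, and so the largest edit distance that can complete. The arithmetic below is a sketch derived only from that condition and the `MAX_TRACE_ELEMENTS` value shown in the file. `max_completed_round` is a hypothetical helper, not part of the package.

```python
MAX_TRACE_ELEMENTS = 120_000_000  # same value as in core.py


def max_completed_round(total_tokens, cap=MAX_TRACE_ELEMENTS):
    """Largest d-round that may run before shortest_edit raises.

    total_tokens is mx = len(a) + len(b). Round d is allowed while
    (d + 1) * (2 * mx + 1) <= cap, and d never exceeds mx.
    Returns -1 when even round 0 would exceed the cap.
    """
    width = 2 * total_tokens + 1          # length of one V snapshot
    return min(total_tokens, cap // width - 1)


# At the default --max-tokens limit only a small edit distance fits.
print(max_completed_round(2_000_000))   # 28
# Small inputs are never cut off: every round up to mx is allowed.
print(max_completed_round(1_000))       # 1000
```

Under these assumptions, two inputs totalling 2,000,000 tokens can differ by at most 28 token edits before the diff bails, which matches the README's statement that very dissimilar inputs exit with `2`.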
@@ -0,0 +1,116 @@
1
+ Metadata-Version: 2.4
2
+ Name: logicdiff
3
+ Version: 0.1.0
4
+ Summary: A whitespace- and reflow-blind diff: folds respacing AND line re-wrapping that git diff -w can't, and tells you if a change is logical or just formatting. Zero dependencies.
5
+ Author: yyfjj
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/jjdoor/logicdiff-py
8
+ Project-URL: Repository, https://github.com/jjdoor/logicdiff-py
9
+ Project-URL: Issues, https://github.com/jjdoor/logicdiff-py/issues
10
+ Keywords: diff,whitespace,reflow,format,git,code-review,cli,ci,devtools
11
+ Classifier: Development Status :: 4 - Beta
12
+ Classifier: Environment :: Console
13
+ Classifier: Intended Audience :: Developers
14
+ Classifier: License :: OSI Approved :: MIT License
15
+ Classifier: Operating System :: OS Independent
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Topic :: Software Development :: Version Control
18
+ Classifier: Topic :: Utilities
19
+ Requires-Python: >=3.8
20
+ Description-Content-Type: text/markdown
21
+ License-File: LICENSE
22
+ Dynamic: license-file
23
+
24
+ # logicdiff
25
+
26
+ **A whitespace- and reflow-blind diff.** A pull request reindents a file and
27
+ rewraps a few long lines, and now `git diff` shows 80 changed lines — but did
28
+ anything *actually* change? `logicdiff` answers that: it folds away pure
29
+ formatting (respacing **and** line reflow) and shows only the logical changes.
30
+
31
+ ```bash
32
+ logicdiff old.js new.js
33
+ # only formatting differs - no logical change (a line diff would show 80 changed lines)
34
+
35
+ logicdiff a.js b.js
36
+ # --- a.js
37
+ # +++ b.js
38
+ # -42: const total = price * qty;
39
+ # +51: const total = price + qty;
40
+ #
41
+ # 1 token removed, 1 added across 2 logical lines (78 lines folded as reflow/whitespace)
42
+ ```
43
+
44
+ Exit `0` when the change is formatting-only (or identical), `1` when there's a
45
+ real logical change — so CI can ask "is this PR just a reformat?" Zero
46
+ dependencies, language-agnostic, also on npm (`npx logicdiff`) — the two
47
+ builds produce **byte-for-byte identical** output.
48
+
49
+ ## Why not `git diff -w`?
50
+
51
+ `git diff -w` (ignore-all-space) folds *respacing* — but it is still
52
+ line-anchored, so it **cannot fold reflow**. Re-wrap a function signature across
53
+ three lines and `git diff -w` still shows 1 removed + 3 added, even though not a
54
+ single token changed. That exact gap is [GitHub discussion #20610]
55
+ ("Ignore Format Changes in Diff"), open and unanswered for years.
56
+
57
+ `difftastic` solves it beautifully with per-language tree-sitter parsing — but
58
+ it's a multi-megabyte binary, needs a grammar for each language (config/log/DSL
59
+ files fall back to text), and it's a *display* tool with no "is this
60
+ formatting-only?" exit code.
61
+
62
+ `logicdiff` is the lightweight middle ground: **zero-config, zero-dependency,
63
+ language-agnostic** (works on any text — code, YAML, logs, DSLs), folds *both*
64
+ whitespace and reflow, and gives a one-shot CLI answer plus a CI exit code.
65
+
66
+ ## How it works
67
+
68
+ It tokenizes each file into a sequence of tokens — a token is a run of
69
+ `[A-Za-z0-9_]` or a single punctuation character, and **whitespace is dropped**.
70
+ So `a+b`, `a + b`, and `a +\n b` all become the same token stream `[a, +, b]`:
71
+ respacing and line breaks become invisible. It then runs the canonical
72
+ [Myers diff] on the token streams. If the streams are equal, the change is
73
+ formatting-only. If not, the changed tokens are mapped back to their line
74
+ numbers and shown.
75
+
76
+ Because it has no language parser, whitespace **inside string literals** is also
77
+ ignored: `x = "a b"` and `x = "a  b"` are "formatting only", exactly like
78
+ `git diff -w`. That's a deliberate, documented limitation, not a bug.
79
+
80
+ ## Usage
81
+
82
+ ```bash
83
+ logicdiff old new # human diff (or "only formatting differs")
84
+ logicdiff old new --stat # just the counts, machine-friendly key=value
85
+ logicdiff old new --json # structured output (byte-identical both builds)
86
+ logicdiff old new -q # no output, exit code only (the CI gate)
87
+ cat new | logicdiff old - # - reads stdin
88
+ ```
89
+
90
+ `--color=auto|always|never`, `--max-tokens N` (bail over N tokens, default 2,000,000).
91
+ Two wildly dissimilar inputs (a huge edit distance) also bail with exit `2` instead of
92
+ risking the heap — logicdiff is for spotting a real change inside a reformat, not for
93
+ diffing unrelated files.
94
+
95
+ Exit codes: `0` identical or formatting-only · `1` logical changes · `2` error.
96
+
97
+ ```yaml
98
+ # CI: warn when a PR is more than a reformat
99
+ - run: logicdiff "$BASE" "$HEAD" -q || echo "::warning::real code change, review carefully"
100
+ ```
101
+
102
+ ## Install
103
+
104
+ ```bash
105
+ pip install logicdiff # or pipx run logicdiff
106
+ npm i -g logicdiff # Node build, identical behaviour
107
+ ```
108
+
109
+ Python ≥ 3.8 or Node ≥ 18. No dependencies.
110
+
111
+ [GitHub discussion #20610]: https://github.com/orgs/community/discussions/20610
112
+ [Myers diff]: https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/
113
+
114
+ ## License
115
+
116
+ MIT
@@ -0,0 +1,14 @@
1
+ LICENSE
2
+ README.md
3
+ pyproject.toml
4
+ src/logicdiff/__init__.py
5
+ src/logicdiff/__main__.py
6
+ src/logicdiff/cli.py
7
+ src/logicdiff/core.py
8
+ src/logicdiff.egg-info/PKG-INFO
9
+ src/logicdiff.egg-info/SOURCES.txt
10
+ src/logicdiff.egg-info/dependency_links.txt
11
+ src/logicdiff.egg-info/entry_points.txt
12
+ src/logicdiff.egg-info/top_level.txt
13
+ tests/test_cli.py
14
+ tests/test_core.py
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ logicdiff = logicdiff.cli:main
@@ -0,0 +1 @@
1
+ logicdiff
@@ -0,0 +1,31 @@
1
+ import pytest
2
+
3
+ from logicdiff import cli
4
+
5
+
6
+ def run(args, capsys):
7
+ # cli.main() catches its own _Exit and returns the code, so no process is killed.
8
+ code = cli.main(list(args))
9
+ cap = capsys.readouterr()
10
+ return code, cap.out, cap.err
11
+
12
+
13
+ # int() would leniently accept '1_000' (PEP 515); the Node build's parseInt would accept
14
+ # '12abc'/'5.9'/'1e3'. Both builds must reject every one of these identically.
15
+ @pytest.mark.parametrize("bad", ["12abc", "5.9", "1e3", "1_000", "-5", "0", "", "0x10"])
16
+ def test_max_tokens_rejects(bad, capsys):
17
+ # The arg loop validates --max-tokens before reading files, so dummy names are fine.
18
+ code, out, err = run(["a", "b", "--max-tokens", bad], capsys)
19
+ assert code == 2
20
+ assert out == ""
21
+ assert "--max-tokens must be a positive integer" in err
22
+
23
+
24
+ def test_max_tokens_accepts_plain_integer(tmp_path, capsys):
25
+ a = tmp_path / "a"
26
+ b = tmp_path / "b"
27
+ a.write_text("x = 1\n")
28
+ b.write_text("x = 2\n")
29
+ code, out, err = run([str(a), str(b), "--max-tokens", "100"], capsys)
30
+ assert code == 1 # a logical change
31
+ assert err == ""
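The parametrized cases above imply a stricter rule than `int()` alone. A minimal sketch of a validator that would satisfy them follows; `parse_positive_int` is a hypothetical name, and this is not necessarily how `cli.py` implements the check. Whether the real CLI accepts leading zeros such as `007` is not covered by these tests; this sketch accepts them.

```python
import re

# ASCII digits only: rejects signs, underscores, decimals, exponents and hex prefixes.
_PLAIN_DIGITS = re.compile(r"[0-9]+")

_MESSAGE = "--max-tokens must be a positive integer"


def parse_positive_int(value):
    """Return `value` as an int greater than zero, or raise ValueError."""
    if not _PLAIN_DIGITS.fullmatch(value):
        raise ValueError(_MESSAGE)
    number = int(value)
    if number <= 0:
        raise ValueError(_MESSAGE)
    return number
```

Matching the whole string against `[0-9]+` before calling `int()` is what closes the `1_000` gap, since `int()` on its own accepts PEP 515 underscores.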
@@ -0,0 +1,111 @@
1
+ import pytest
2
+
3
+ from logicdiff import core
4
+
5
+
6
+ def toks(s):
7
+ return [t["text"] for t in core.tokenize(s)]
8
+
9
+
10
+ def test_tokenize_words_punct_ws_dropped():
11
+ assert toks("a+b") == ["a", "+", "b"]
12
+ assert toks("a + b") == ["a", "+", "b"]
13
+ assert toks("foo.bar(x)") == ["foo", ".", "bar", "(", "x", ")"]
14
+
15
+
16
+ def test_tokenize_reflow_identical():
17
+ assert toks("foo(a,\n b)") == toks("foo(a, b)")
18
+ assert toks("foo(a, b)") == ["foo", "(", "a", ",", "b", ")"]
19
+
20
+
21
+ def test_tokenize_line_numbers_crlf_eq_lf():
22
+ t = core.tokenize("a\nb\r\nc\rd")
23
+ assert [(x["text"], x["line"]) for x in t] == [("a", 1), ("b", 2), ("c", 3), ("d", 3)]
24
+
25
+
26
+ def test_tokenize_ascii_word_only():
27
+ assert toks("café") == ["caf", "é"]
28
+ assert toks("a_b1") == ["a_b1"]
29
+ assert toks("1_000") == ["1_000"]
30
+
31
+
32
+ def test_diff_seq_roundtrip_ambiguous():
33
+ a = ["a", "b", "c", "a", "b", "b", "a"]
34
+ b = ["c", "b", "a", "b", "a", "c"]
35
+ moves = core.diff_seq(a, b)
36
+ assert sum(1 for m in moves if m["op"] == "keep") == 4
37
+ out = []
38
+ for m in moves:
39
+ if m["op"] == "keep":
40
+ out.append(a[m["a"]])
41
+ elif m["op"] == "ins":
42
+ out.append(b[m["b"]])
43
+ assert out == b
44
+
45
+
46
+ def test_compare_whitespace_only():
47
+ r = core.compare("x=a+b\n", "x = a + b\n")
48
+ assert r["formattingOnly"] is True
49
+ assert r["identical"] is False
50
+
51
+
52
+ def test_compare_reflow():
53
+ a = "function foo(a, b) {\n return a + b;\n}\n"
54
+ b = "function foo(\n a,\n b\n) {\n return a + b;\n}\n"
55
+ r = core.compare(a, b)
56
+ assert r["formattingOnly"] is True
57
+ assert r["stats"]["gitWouldShow"] > 0
58
+
59
+
60
+ def test_compare_identical():
61
+ r = core.compare("a b c\n", "a b c\n")
62
+ assert r["identical"] is True
63
+ assert r["formattingOnly"] is True
64
+
65
+
66
+ def test_compare_logical_change_buried():
67
+ a = "function foo(a, b) {\n return a + b;\n}\n"
68
+ b = "function foo(\n a,\n b\n) {\n return a - b;\n}\n"
69
+ r = core.compare(a, b)
70
+ assert r["formattingOnly"] is False
71
+ assert r["delLines"] == [2]
72
+ assert r["insLines"] == [5]
73
+ assert r["stats"]["linesReflowed"] > 0
74
+
75
+
76
+ def test_compare_dedup_line():
77
+ r = core.compare('const x = "ab";\n', 'const x = "cd";\n')
78
+ assert r["delLines"] == [1]
79
+ assert r["insLines"] == [1]
80
+
81
+
82
+ def test_compare_empty():
83
+ assert core.compare("", "a\n")["formattingOnly"] is False
84
+ assert core.compare("", "")["identical"] is True
85
+
86
+
87
+ def test_crlf_lf_only():
88
+ assert core.compare("a\nb\n", "a\r\nb\r\n")["formattingOnly"] is True
89
+
90
+
91
+ def test_max_tokens_guard():
92
+ with pytest.raises(ValueError, match="too large"):
93
+ core.compare("a b c d e", "x y z", {"maxTokens": 3})
94
+
95
+
96
+ def test_trace_guard():
97
+ # maxTrace lowers the heap-protection cap so a tiny input trips it (prod default ~1.2e8).
98
+ with pytest.raises(ValueError, match="too many differences between the inputs"):
99
+ core.compare("a b c d e f", "u v w x y z", {"maxTrace": 50})
100
+
101
+
102
+ def test_json_stat_shapes():
103
+ r = core.compare("a + b\n", "a - b\n")
104
+ j = core.to_json(r, "a.txt", "b.txt")
105
+ assert j["formatting_only"] is False
106
+ assert j["exit_code"] == 1
107
+ assert j["stats"]["tokens_added"] == 1
108
+ assert j["stats"]["tokens_removed"] == 1
109
+ assert j["removed"][0]["text"] == "a + b"
110
+ assert j["added"][0]["text"] == "a - b"
111
+ assert "formatting_only=0" in core.stat_line(r)
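The round-trip property checked in `test_diff_seq_roundtrip_ambiguous` can be reproduced with a much simpler algorithm than Myers. The sketch below uses a plain longest-common-subsequence dynamic program, not the Myers diff the package describes, and needs O(n·m) memory. `lcs_diff` is a hypothetical name, and the `"del"` op label is an assumption: the tests only show `"keep"` and `"ins"`.

```python
def lcs_diff(a, b):
    """Return moves that turn sequence a into b.

    "keep" moves carry indices "a" and "b", "del" carries "a", "ins" carries "b".
    """
    n, m = len(a), len(b)
    # table[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    moves = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            moves.append({"op": "keep", "a": i, "b": j})
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping a[i] loses nothing from the best remaining match.
            moves.append({"op": "del", "a": i})
            i += 1
        else:
            moves.append({"op": "ins", "b": j})
            j += 1
    # Whatever is left on either side is pure deletion or insertion.
    moves.extend({"op": "del", "a": k} for k in range(i, n))
    moves.extend({"op": "ins", "b": k} for k in range(j, m))
    return moves
```

Replaying the `keep` and `ins` moves in order rebuilds `b`, which is the same invariant the test asserts for `core.diff_seq`.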