exhash 0.3.2__tar.gz → 0.3.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (28)
  1. {exhash-0.3.2 → exhash-0.3.3}/Cargo.lock +3 -3
  2. {exhash-0.3.2 → exhash-0.3.3}/Cargo.toml +1 -1
  3. {exhash-0.3.2 → exhash-0.3.3}/DEV.md +8 -2
  4. {exhash-0.3.2 → exhash-0.3.3}/PKG-INFO +51 -8
  5. {exhash-0.3.2 → exhash-0.3.3}/README.md +50 -7
  6. {exhash-0.3.2 → exhash-0.3.3}/pyproject.toml +3 -1
  7. exhash-0.3.3/python/exhash/__init__.py +337 -0
  8. exhash-0.3.3/python/exhash/skill.py +44 -0
  9. {exhash-0.3.2 → exhash-0.3.3}/src/bin/exhash.rs +50 -30
  10. {exhash-0.3.2 → exhash-0.3.3}/src/engine.rs +67 -44
  11. {exhash-0.3.2 → exhash-0.3.3}/src/lib.rs +6 -2
  12. {exhash-0.3.2 → exhash-0.3.3}/src/lnhash.rs +42 -10
  13. {exhash-0.3.2 → exhash-0.3.3}/src/parse.rs +77 -39
  14. {exhash-0.3.2 → exhash-0.3.3}/src/python.rs +31 -12
  15. {exhash-0.3.2 → exhash-0.3.3}/tests/cli.rs +67 -9
  16. {exhash-0.3.2 → exhash-0.3.3}/tests/test_exhash.py +111 -0
  17. exhash-0.3.2/python/exhash/__init__.py +0 -99
  18. {exhash-0.3.2 → exhash-0.3.3}/.github/workflows/ci.yml +0 -0
  19. {exhash-0.3.2 → exhash-0.3.3}/.gitignore +0 -0
  20. {exhash-0.3.2 → exhash-0.3.3}/_config.yml +0 -0
  21. {exhash-0.3.2 → exhash-0.3.3}/_layouts/default.html +0 -0
  22. {exhash-0.3.2 → exhash-0.3.3}/python/exhash.data/scripts/.gitkeep +0 -0
  23. {exhash-0.3.2 → exhash-0.3.3}/src/bin/lnhashview.rs +0 -0
  24. {exhash-0.3.2 → exhash-0.3.3}/tools/build.sh +0 -0
  25. {exhash-0.3.2 → exhash-0.3.3}/tools/bump.sh +0 -0
  26. {exhash-0.3.2 → exhash-0.3.3}/tools/bump2.sh +0 -0
  27. {exhash-0.3.2 → exhash-0.3.3}/tools/release.sh +0 -0
  28. {exhash-0.3.2 → exhash-0.3.3}/tools/test.sh +0 -0
@@ -25,7 +25,7 @@ checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801"
25
25
 
26
26
  [[package]]
27
27
  name = "exhash"
28
- version = "0.3.2"
28
+ version = "0.3.3"
29
29
  dependencies = [
30
30
  "pyo3",
31
31
  "regex",
@@ -48,9 +48,9 @@ dependencies = [
48
48
 
49
49
  [[package]]
50
50
  name = "libc"
51
- version = "0.2.185"
51
+ version = "0.2.186"
52
52
  source = "registry+https://github.com/rust-lang/crates.io-index"
53
- checksum = "52ff2c0fe9bc6cb6b14a0592c2ff4fa9ceb83eea9db979b0487cd054946a2b8f"
53
+ checksum = "68ab91017fe16c622486840e4c83c9a37afeff978bd239b5293d61ece587de66"
54
54
 
55
55
  [[package]]
56
56
  name = "memchr"
@@ -1,6 +1,6 @@
1
1
  [package]
2
2
  name = "exhash"
3
- version = "0.3.2"
3
+ version = "0.3.3"
4
4
  edition = "2021"
5
5
  license = "MIT OR Apache-2.0"
6
6
  description = "Verified line-addressed file editor using lnhash addresses"
@@ -18,7 +18,8 @@ src/
18
18
  bin/exhash.rs CLI editor (atomic in-place edit, dry-run, stdin mode)
19
19
  bin/lnhashview.rs CLI viewer
20
20
  python/exhash/
21
- __init__.py Python wrapper functions with typed/docstring API (+ exhash_result helper)
21
+ __init__.py Python wrapper functions plus file-aware exhash_file orchestration
22
+ skill.py pyskills entry point exposing exhash APIs for LLM tools
22
23
  python/exhash.data/scripts/
23
24
  exhash native binary (built, not checked in)
24
25
  lnhashview native binary (built, not checked in)
@@ -50,6 +51,9 @@ cargo test && pytest -q
50
51
  `edit_text` verifies lnhashes command-by-command against the current in-memory buffer, immediately before each command executes (not all upfront). If an earlier command shifts or rewrites a later target line, that later command will fail with a stale-hash error unless you recompute addresses.
51
52
  The `$` (last line) and `%` (whole file) address forms are resolved against the current buffer and do not require hashes.
52
53
  `edit_text_with_sw` exposes configurable shift width for `<` and `>`; `edit_text` defaults to `sw=4`.
54
+ In CLI and Python file-helper flows, a missing file is treated as empty input only when the parsed command set is valid against an empty buffer (for example `0|0000|a`); otherwise the original file-not-found error is preserved.
55
+ Python `exhash_file` adds the file-qualified orchestration layer. It parses optional `path:` prefixes, applies each command to the current in-memory buffer for that file, rejects cross-file source ranges, and writes changed files only after every command succeeds.
56
+ `lnhashview` range requests clamp `end` past EOF to the last available line, while invalid `start` values still error.
53
57
 
54
58
  ## Release
55
59
 
@@ -86,9 +90,11 @@ Maturin's `data` option in `pyproject.toml` points to `python/exhash.data/`. Fil
86
90
 
87
91
  The Rust core has three parsing functions:
88
92
 
89
- - `parse_commands_from_strs(&[&str])` — for the Python API; each string is one command, text blocks are the remaining lines (no `.` terminator; a trailing `.` line is literal text and the Python binding warns about this common mistake)
93
+ - `parse_commands_from_strs(&[&str])` — for the Python API; each string is one command, and multiline `a/i/c` text blocks must be in that same string using newlines, e.g. `["12|abcd|c\nnew line 1\nnew line 2"]`. Do not use `.` terminators or split the inserted text into separate command entries; a trailing `.` line is literal text and the Python binding warns about this common mistake.
90
94
  - `parse_commands_from_script(&str)` — for script strings; commands separated by newlines, text blocks terminated by `.`
91
95
  - `parse_commands_from_args(&[String], &mut BufRead)` — for the CLI; each arg is a command, text blocks read from stdin terminated by `.`
92
96
 
97
+ File-qualified addresses are parsed by the Python `exhash_file` wrapper; the Rust parser and CLI remain single-buffer.
98
+
93
99
  Substitute parsing keeps Rust regex escapes intact (`\d`, `\w`, etc.) while still allowing escaped command delimiters (`\/`) in pattern and replacement.
94
100
  Transliteration uses `y/src/dst/` and validates equal character counts at parse time.
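The file-qualified form described above (a `path:` prefix in front of an address) can be illustrated with a small standalone sketch. It follows the prefix-splitting logic visible in the 0.3.3 `python/exhash/__init__.py` diff, but the names `split_file_prefix` and `unescape_path` are illustrative and path normalization is omitted, so treat it as an approximation rather than the package's implementation.

```python
import re

# An address is `$`, `%`, or `lineno|hhhh|` (4 hex digits), as in the package diff.
ADDR_RE = re.compile(r'(?:\$|%|\d+\|[0-9a-fA-F]{4}\|)')

def unescape_path(path: str) -> str:
    r"""Turn `\:` into `:` and `\\` into `\`; keep any other backslash as-is."""
    out, escaped = [], False
    for ch in path:
        if escaped:
            if ch not in ':\\': out.append('\\')
            out.append(ch)
            escaped = False
        elif ch == '\\': escaped = True
        else: out.append(ch)
    if escaped: out.append('\\')
    return ''.join(out)

def split_file_prefix(cmd: str):
    """Return (path or None, remainder) for a possibly file-qualified command."""
    # A command that already starts with an address has no file prefix.
    if ADDR_RE.match(cmd): return None, cmd
    escaped = False
    for i, ch in enumerate(cmd):
        if escaped:
            escaped = False
            continue
        if ch == '\\':
            escaped = True
            continue
        # Only an unescaped colon directly followed by an address ends the prefix.
        if ch == ':' and ADDR_RE.match(cmd[i + 1:]):
            path = unescape_path(cmd[:i])
            if not path: raise ValueError('empty filename prefix')
            return path, cmd[i + 1:]
    return None, cmd
```

For instance, `split_file_prefix("src/a.py:5|91aa|d")` separates the path `src/a.py` from the single-buffer command `5|91aa|d`, while an unqualified command such as `12|abcd|d` comes back with `None` as its path so the caller can fall back to the default file.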
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: exhash
3
- Version: 0.3.2
3
+ Version: 0.3.3
4
4
  Classifier: Programming Language :: Rust
5
5
  Classifier: Programming Language :: Python :: Implementation :: CPython
6
6
  Summary: Verified line-addressed file editor using lnhash addresses
@@ -52,6 +52,8 @@ lnhashview path/to/file.txt
52
52
  lnhashview path/to/file.txt 10 20
53
53
  ```
54
54
 
55
+ If `end` is past EOF, `lnhashview` returns through the last available line instead of failing.
56
+
55
57
  ### Edit
56
58
 
57
59
  ```bash
@@ -80,6 +82,12 @@ exhash file.txt '%j'
80
82
 
81
83
  # Move a line to EOF using $ as the destination
82
84
  exhash file.txt '12|abcd|m$'
85
+
86
+ # Create a missing file by treating it as empty input
87
+ exhash new.txt '0|0000|a' <<'EOF'
88
+ first line
89
+ .
90
+ EOF
83
91
  ```
84
92
 
85
93
  Substitute uses Rust regex syntax:
@@ -99,6 +107,8 @@ For `a/i/c` commands, provide the text block on stdin:
99
107
  printf "new line 1\nnew line 2\n.\n" | exhash file.txt "2|beef|a"
100
108
  ```
101
109
 
110
+ If the file does not exist and the command set is valid on empty input, exhash treats it as an empty file and writes the result. For example, `0|0000|a` can create a new file.
111
+
102
112
  ### Stdin filter mode
103
113
 
104
114
  ```bash
@@ -117,13 +127,13 @@ from exhash import exhash, exhash_file, lnhash, lnhashview, lnhashview_file, lin
117
127
 
118
128
  ```py
119
129
  text = "foo\nbar\n"
120
- view = lnhashview(text) # ["1|a1b2| foo", "2|c3d4| bar"]
121
- view = lnhashview_file("f.py") # same but reads from file
130
+ view = lnhashview(text) # ["1|a1b2| foo", "2|c3d4| bar"]
131
+ view = lnhashview_file("f.py", start=1, end=260) # end past EOF is clamped
122
132
  ```
123
133
 
124
134
  ### Editing
125
135
 
126
- `exhash(text, cmds, sw=4)` takes the text and a required iterable of command strings (use `[]` for no-op). `sw` controls how far `<` and `>` shift. For `a`/`i`/`c` commands, lines after the command are the text block. Do not include an ex-style trailing `.` line here: unlike CLI/script mode, `exhash(text, cmds)` does not use one, and a final `.` line is inserted literally.
136
+ `exhash(text, cmds, sw=4)` takes the text and a required iterable of command strings (use `[]` for no-op). `sw` controls how far `<` and `>` shift. For multiline `a`/`i`/`c` commands, include the inserted text in the same command string using newline characters, e.g. `["12|abcd|c\nnew line 1\nnew line 2"]`. Do not use `.` terminators, and do not split the text block into separate `cmds` entries. If you include a final `.` line, it is inserted literally and exhash emits a warning.
127
137
 
128
138
  ```py
129
139
  addr = lnhash(1, "foo") # "1|a1b2|"
@@ -138,12 +148,15 @@ res = exhash(text, [f"{a1}s/foo/FOO/", f"{a2}s/bar/BAR/"])
138
148
  # Hashes are checked just-in-time per command.
139
149
  # If earlier commands change/shift a later target line, recompute lnhash first.
140
150
 
141
- # Append multiline text (no dot terminator)
151
+ # Append multiline text in the same command string (no dot terminator)
142
152
  res = exhash(text, [f"{addr}a\nnew line 1\nnew line 2"])
143
153
 
144
154
  # Wrong for the Python API: the trailing "." would be inserted literally
145
155
  # res = exhash(text, [f"{addr}a\nnew line 1\nnew line 2\n."])
146
156
 
157
+ # Also wrong: do not split the inserted text into separate cmds entries
158
+ # res = exhash(text, [f"{addr}a", "new line 1", "new line 2"])
159
+
147
160
  # Change shift width for < and >
148
161
  res = exhash(text, [f"{addr}>1"], sw=2)
149
162
 
@@ -157,18 +170,46 @@ res = exhash("foo\nbar\n", [f"{a1},{a2}s/foo\nbar/replaced/"])
157
170
 
158
171
  ### File helpers
159
172
 
160
- `exhash_file` and `lnhashview_file` read directly from a file path:
173
+ `lnhashview_file` reads directly from one file path. `exhash_file(path, cmds, sw=4, inplace=False)` uses `path` as the default file context for unqualified addresses, and also accepts file-qualified source and `m`/`t` destination addresses:
161
174
 
162
175
  ```py
163
176
  view = lnhashview_file("file.py")
164
177
 
165
- # Returns EditResult, file unchanged
178
+ # Returns FileSetEditResult, files unchanged
166
179
  res = exhash_file("file.py", [f"{addr}s/foo/bar/"])
180
+ print(res.changed) # ["file.py"]
181
+ print(res["file.py"].lines)
182
+ print(res.format_diff()) # includes --- file.py / +++ file.py headers
167
183
 
168
- # With inplace=True, writes back on success and returns diff string
184
+ # With inplace=True, writes changed files after every command succeeds
185
+ # and returns the combined diff string.
169
186
  diff = exhash_file("file.py", [f"{addr}s/foo/bar/"], inplace=True)
187
+
188
+ # Missing files are treated as empty only when the command is valid on empty input.
189
+ diff = exhash_file("new.py", ["0|0000|a\nprint('hi')"], inplace=True)
190
+
191
+ # File-qualified addresses can edit or transfer lines across files.
192
+ cmds = [
193
+ "src/a.py:24|8f12|,38|c0de|m src/b.py:$",
194
+ r"src/a.py:5|91aa|s/from \.b import old/from \.b import helper/",
195
+ ]
196
+ diff = exhash_file("src/a.py", cmds, inplace=True)
170
197
  ```
171
198
 
199
+ A file prefix is separated from the address with `:`. Escape literal colons in filenames as `\:` and literal backslashes as `\\`.
200
+
201
+ `exhash_file(..., inplace=False)` returns a `FileSetEditResult`:
202
+
203
+ - `res.files` — dict of path to `FileEditResult`
204
+ - `res.changed` — changed paths, in first-touch order
205
+ - `res.default_path` — the default path passed to `exhash_file`
206
+ - `res[path]` — shorthand for `res.files[path]`
207
+ - `res.format_diff(context=1)` — combined diff with `--- path` / `+++ path` headers
208
+
209
+ ### Pyskill
210
+
211
+ The package registers `exhash.skill` as a pyskill exposing the primary Python APIs with LLM-oriented workflow docs. Use `doc(exhash.skill)` after importing it through a pyskills host.
212
+
172
213
  ### EditResult
173
214
 
174
215
  `exhash()` returns an `EditResult` with attributes (also accessible via `res["key"]`):
@@ -184,6 +225,8 @@ diff = exhash_file("file.py", [f"{addr}s/foo/bar/"], inplace=True)
184
225
  ```py
185
226
  res = exhash(text, [f"{addr}s/foo/baz/"])
186
227
  print(res.format_diff())
228
+ # --- original
229
+ # +++ modified
187
230
  # -1|a1b2| foo
188
231
  # +1|c3d4| baz
189
232
  # 2|e5f6| bar
@@ -37,6 +37,8 @@ lnhashview path/to/file.txt
37
37
  lnhashview path/to/file.txt 10 20
38
38
  ```
39
39
 
40
+ If `end` is past EOF, `lnhashview` returns through the last available line instead of failing.
41
+
40
42
  ### Edit
41
43
 
42
44
  ```bash
@@ -65,6 +67,12 @@ exhash file.txt '%j'
65
67
 
66
68
  # Move a line to EOF using $ as the destination
67
69
  exhash file.txt '12|abcd|m$'
70
+
71
+ # Create a missing file by treating it as empty input
72
+ exhash new.txt '0|0000|a' <<'EOF'
73
+ first line
74
+ .
75
+ EOF
68
76
  ```
69
77
 
70
78
  Substitute uses Rust regex syntax:
@@ -84,6 +92,8 @@ For `a/i/c` commands, provide the text block on stdin:
84
92
  printf "new line 1\nnew line 2\n.\n" | exhash file.txt "2|beef|a"
85
93
  ```
86
94
 
95
+ If the file does not exist and the command set is valid on empty input, exhash treats it as an empty file and writes the result. For example, `0|0000|a` can create a new file.
96
+
87
97
  ### Stdin filter mode
88
98
 
89
99
  ```bash
@@ -102,13 +112,13 @@ from exhash import exhash, exhash_file, lnhash, lnhashview, lnhashview_file, lin
102
112
 
103
113
  ```py
104
114
  text = "foo\nbar\n"
105
- view = lnhashview(text) # ["1|a1b2| foo", "2|c3d4| bar"]
106
- view = lnhashview_file("f.py") # same but reads from file
115
+ view = lnhashview(text) # ["1|a1b2| foo", "2|c3d4| bar"]
116
+ view = lnhashview_file("f.py", start=1, end=260) # end past EOF is clamped
107
117
  ```
108
118
 
109
119
  ### Editing
110
120
 
111
- `exhash(text, cmds, sw=4)` takes the text and a required iterable of command strings (use `[]` for no-op). `sw` controls how far `<` and `>` shift. For `a`/`i`/`c` commands, lines after the command are the text block. Do not include an ex-style trailing `.` line here: unlike CLI/script mode, `exhash(text, cmds)` does not use one, and a final `.` line is inserted literally.
121
+ `exhash(text, cmds, sw=4)` takes the text and a required iterable of command strings (use `[]` for no-op). `sw` controls how far `<` and `>` shift. For multiline `a`/`i`/`c` commands, include the inserted text in the same command string using newline characters, e.g. `["12|abcd|c\nnew line 1\nnew line 2"]`. Do not use `.` terminators, and do not split the text block into separate `cmds` entries. If you include a final `.` line, it is inserted literally and exhash emits a warning.
112
122
 
113
123
  ```py
114
124
  addr = lnhash(1, "foo") # "1|a1b2|"
@@ -123,12 +133,15 @@ res = exhash(text, [f"{a1}s/foo/FOO/", f"{a2}s/bar/BAR/"])
123
133
  # Hashes are checked just-in-time per command.
124
134
  # If earlier commands change/shift a later target line, recompute lnhash first.
125
135
 
126
- # Append multiline text (no dot terminator)
136
+ # Append multiline text in the same command string (no dot terminator)
127
137
  res = exhash(text, [f"{addr}a\nnew line 1\nnew line 2"])
128
138
 
129
139
  # Wrong for the Python API: the trailing "." would be inserted literally
130
140
  # res = exhash(text, [f"{addr}a\nnew line 1\nnew line 2\n."])
131
141
 
142
+ # Also wrong: do not split the inserted text into separate cmds entries
143
+ # res = exhash(text, [f"{addr}a", "new line 1", "new line 2"])
144
+
132
145
  # Change shift width for < and >
133
146
  res = exhash(text, [f"{addr}>1"], sw=2)
134
147
 
@@ -142,18 +155,46 @@ res = exhash("foo\nbar\n", [f"{a1},{a2}s/foo\nbar/replaced/"])
142
155
 
143
156
  ### File helpers
144
157
 
145
- `exhash_file` and `lnhashview_file` read directly from a file path:
158
+ `lnhashview_file` reads directly from one file path. `exhash_file(path, cmds, sw=4, inplace=False)` uses `path` as the default file context for unqualified addresses, and also accepts file-qualified source and `m`/`t` destination addresses:
146
159
 
147
160
  ```py
148
161
  view = lnhashview_file("file.py")
149
162
 
150
- # Returns EditResult, file unchanged
163
+ # Returns FileSetEditResult, files unchanged
151
164
  res = exhash_file("file.py", [f"{addr}s/foo/bar/"])
165
+ print(res.changed) # ["file.py"]
166
+ print(res["file.py"].lines)
167
+ print(res.format_diff()) # includes --- file.py / +++ file.py headers
152
168
 
153
- # With inplace=True, writes back on success and returns diff string
169
+ # With inplace=True, writes changed files after every command succeeds
170
+ # and returns the combined diff string.
154
171
  diff = exhash_file("file.py", [f"{addr}s/foo/bar/"], inplace=True)
172
+
173
+ # Missing files are treated as empty only when the command is valid on empty input.
174
+ diff = exhash_file("new.py", ["0|0000|a\nprint('hi')"], inplace=True)
175
+
176
+ # File-qualified addresses can edit or transfer lines across files.
177
+ cmds = [
178
+ "src/a.py:24|8f12|,38|c0de|m src/b.py:$",
179
+ r"src/a.py:5|91aa|s/from \.b import old/from \.b import helper/",
180
+ ]
181
+ diff = exhash_file("src/a.py", cmds, inplace=True)
155
182
  ```
156
183
 
184
+ A file prefix is separated from the address with `:`. Escape literal colons in filenames as `\:` and literal backslashes as `\\`.
185
+
186
+ `exhash_file(..., inplace=False)` returns a `FileSetEditResult`:
187
+
188
+ - `res.files` — dict of path to `FileEditResult`
189
+ - `res.changed` — changed paths, in first-touch order
190
+ - `res.default_path` — the default path passed to `exhash_file`
191
+ - `res[path]` — shorthand for `res.files[path]`
192
+ - `res.format_diff(context=1)` — combined diff with `--- path` / `+++ path` headers
193
+
194
+ ### Pyskill
195
+
196
+ The package registers `exhash.skill` as a pyskill exposing the primary Python APIs with LLM-oriented workflow docs. Use `doc(exhash.skill)` after importing it through a pyskills host.
197
+
157
198
  ### EditResult
158
199
 
159
200
  `exhash()` returns an `EditResult` with attributes (also accessible via `res["key"]`):
@@ -169,6 +210,8 @@ diff = exhash_file("file.py", [f"{addr}s/foo/bar/"], inplace=True)
169
210
  ```py
170
211
  res = exhash(text, [f"{addr}s/foo/baz/"])
171
212
  print(res.format_diff())
213
+ # --- original
214
+ # +++ modified
172
215
  # -1|a1b2| foo
173
216
  # +1|c3d4| baz
174
217
  # 2|e5f6| bar
@@ -4,7 +4,7 @@ build-backend = "maturin"
4
4
 
5
5
  [project]
6
6
  name = "exhash"
7
- version = "0.3.2"
7
+ version = "0.3.3"
8
8
  description = "Verified line-addressed file editor using lnhash addresses"
9
9
  license = {text = "MIT OR Apache-2.0"}
10
10
  requires-python = ">=3.10"
@@ -19,6 +19,8 @@ classifiers = [
19
19
  Homepage = "https://github.com/AnswerDotAI/exhash"
20
20
  Repository = "https://github.com/AnswerDotAI/exhash"
21
21
  Issues = "https://github.com/AnswerDotAI/exhash/issues"
22
+ [project.entry-points.pyskills]
23
+ exhash = "exhash.skill"
22
24
 
23
25
  [tool.maturin]
24
26
  features = ["extension-module"]
@@ -0,0 +1,337 @@
1
+ import re
2
+ from difflib import SequenceMatcher
3
+ from pathlib import Path
4
+ from .exhash import line_hash as _line_hash, lnhash as _lnhash, lnhashview as _lnhashview, exhash as _exhash
5
+
6
+ def line_hash(line:str) -> str:
7
+ 'Return a 4-char lowercase hex hash for a single line of text.'
8
+ return _line_hash(line)
9
+
10
+
11
+ def lnhash(lineno:int, line:str) -> str:
12
+ 'Return an lnhash address ``lineno|hash|`` for ``line`` at 1-based ``lineno``.'
13
+ return _lnhash(lineno, line)
14
+
15
+
16
+ def lnhashview(text:str, start:int=None, end:int=None) -> list[str]:
17
+ 'Return lines formatted as ``lineno|hash| content``. Optional 1-based ``start``/``end`` filter the range; ``end`` past EOF is clamped.'
18
+ return _lnhashview(text, start, end)
19
+
20
+
21
+ def lnhashview_file(path:str, start:int=None, end:int=None) -> list[str]:
22
+ 'Return lines formatted as ``lineno|hash| content`` for file at ``path``. Optional 1-based ``start``/``end`` filter the range; ``end`` past EOF is clamped.'
23
+ return _lnhashview(Path(path).read_text(), start, end)
24
+
25
+
26
+ def exhash(text:str, cmds:list[str], sw:int=4):
27
+ """Verified line-addressed editor. Apply commands to `text`, return an EditResult.
28
+
29
+ Commands primarily use lnhash addresses: ``lineno|hash|cmd`` where hash is
30
+ a 4-char hex content hash. Use ``lnhashview(text)`` or
31
+ ``lnhash(lineno, line)`` to get hashed addresses.
32
+ Each command's hashes are verified against current text immediately before
33
+ that command executes.
34
+
35
+ Addressing:
36
+ Single: ``12|a3f2|cmd``
37
+ Range: ``12|a3f2|,15|b1c3|cmd``
38
+ Last: ``$cmd`` (last line)
39
+ Whole: ``%cmd`` (whole file, same as ``1,$``)
40
+ Special: ``0|0000|`` targets before line 1 (only with a or i)
41
+
42
+ Commands:
43
+ s/pat/rep/[flags] Substitute using Rust regex syntax.
44
+ Replacement supports $1, $0, ${name}. Flags: g=all, i=case-insensitive
45
+ Any non-alphanumeric delimiter works: s@pat@rep@, s|pat|rep|g
46
+ Literal newlines in pat/rep are supported (joins/splits lines)
47
+ y/src/dst/ Transliterate chars in-place (also supports custom delimiters;
48
+ source and destination lengths must match)
49
+ d Delete line(s)
50
+ a Append text after line
51
+ i Insert text before line
52
+ c Change/replace line(s)
53
+ j Join with next line; with range, joins all
54
+ m dest Move line(s) after dest address
55
+ t dest Copy line(s) after dest address
56
+ >[n] Indent n levels (default 1, `sw` spaces each)
57
+ <[n] Dedent n levels (default 1, `sw` spaces each)
58
+ sort Sort lines alphabetically
59
+ p Print (include in output without changing)
60
+ g/pat/cmd Global: run cmd on matching lines (custom delimiters ok: g@pat@cmd)
61
+ g!/pat/cmd Inverted global (also v/pat/cmd; custom delimiters ok)
62
+
63
+ `sw` controls shift width for `<` and `>` and defaults to 4.
64
+
65
+ For multiline a/i/c commands, include the inserted text in the same command
66
+ string using newline characters, e.g. ``["12|abcd|c\nnew line 1\nnew line 2"]``.
67
+ Do not use ``.`` terminators, and do not split the text block into separate
68
+ ``cmds`` entries. If you include a final ``.`` line, it is inserted literally
69
+ and exhash emits a warning.
70
+
71
+ Returns an EditResult with attributes (also accessible as dict keys):
72
+ lines list of output lines
73
+ hashes lnhash for each output line
74
+ modified 1-based line numbers of modified/added lines
75
+ deleted 1-based line numbers of removed lines (in original)
76
+ origins for each output line, the 1-based original line number (None if inserted)
77
+
78
+ Call ``res.format_diff(context=1)`` for a unified-diff-style summary.
79
+ Non-empty diffs start with ``--- original`` and ``+++ modified`` headers.
80
+
81
+ Examples::
82
+
83
+ from exhash import exhash, lnhash, lnhashview
84
+ text = "foo\\nbar\\n"
85
+ addr = lnhash(1, "foo") # "1|a1b2|"
86
+ res = exhash(text, [f"{addr}s/foo/baz/"])
87
+ print(res["lines"]) # ["baz", "bar"]
88
+ print(res.format_diff()) # unified-diff-style summary
89
+ """
90
+ return _exhash(text, *cmds, sw=sw)
91
+
92
+
93
+ class FileEditResult:
94
+ 'Edited state for one file.'
95
+ def __init__(self, path, original_lines, lines):
96
+ self.path = _norm_path(path)
97
+ self.original_lines = list(original_lines)
98
+ self.lines = list(lines)
99
+ self.hashes = [lnhash(i + 1, line) for i, line in enumerate(self.lines)]
100
+
101
+ @property
102
+ def changed(self): return self.original_lines != self.lines
103
+
104
+ def __getitem__(self, key):
105
+ if key in {"lines", "hashes", "original_lines"}: return getattr(self, key)
106
+ raise KeyError(key)
107
+
108
+ def format_diff(self, context=1): return _format_file_diff(self.path, self.original_lines, self.lines, context)
109
+
110
+
111
+ class FileSetEditResult:
112
+ 'Edited state for an exhash_file command set.'
113
+ def __init__(self, files, default_path):
114
+ self.files = files
115
+ self.default_path = default_path
116
+ self.changed = [path for path, result in files.items() if result.changed]
117
+
118
+ def __getitem__(self, path): return self.files[_norm_path(path)]
119
+
120
+ def format_diff(self, context=1): return ''.join(self.files[path].format_diff(context) for path in self.changed)
121
+
122
+
123
+ _ADDR_RE = re.compile(r'(?:\$|%|\d+\|[0-9a-fA-F]{4}\|)')
124
+ _LNHASH_RE = re.compile(r'(\d+)\|([0-9a-fA-F]{4})\|')
125
+
126
+
127
+ def _norm_path(path): return str(Path(path))
128
+
129
+
130
+ def _text_from_lines(lines): return '\n'.join(lines) + ('\n' if lines else '')
131
+
132
+
133
+ def _write_lines(path, lines): Path(path).write_text(_text_from_lines(lines))
134
+
135
+
136
+ def _format_file_diff(path, old_lines, new_lines, context=1):
137
+ if old_lines == new_lines: return ''
138
+ events = []
139
+ for tag, i1, i2, j1, j2 in SequenceMatcher(a=old_lines, b=new_lines, autojunk=False).get_opcodes():
140
+ if tag == 'equal': events += [(' ', j + 1, new_lines[j]) for j in range(j1, j2)]
141
+ elif tag == 'delete': events += [('-', i + 1, old_lines[i]) for i in range(i1, i2)]
142
+ elif tag == 'insert': events += [('+', j + 1, new_lines[j]) for j in range(j1, j2)]
143
+ elif tag == 'replace':
144
+ events += [('-', i + 1, old_lines[i]) for i in range(i1, i2)]
145
+ events += [('+', j + 1, new_lines[j]) for j in range(j1, j2)]
146
+ interesting = set()
147
+ for i, (tag, _, _) in enumerate(events):
148
+ if tag != ' ': interesting.update(range(max(0, i - context), min(len(events), i + context + 1)))
149
+ out, last = [f'--- {path}', f'+++ {path}'], None
150
+ for i in sorted(interesting):
151
+ if last is not None and i > last + 1: out.append('---')
152
+ tag, lineno, line = events[i]
153
+ out.append(f'{tag}{lnhash(lineno, line)} {line}')
154
+ last = i
155
+ return '\n'.join(out) + '\n'
156
+
157
+
158
+ def _unescape_path(path):
159
+ out, escaped = [], False
160
+ for ch in path:
161
+ if escaped:
162
+ if ch not in ':\\': out.append('\\')
163
+ out.append(ch)
164
+ escaped = False
165
+ elif ch == '\\': escaped = True
166
+ else: out.append(ch)
167
+ if escaped: out.append('\\')
168
+ return ''.join(out)
169
+
170
+
171
+ def _split_file_prefix(s):
172
+ if _ADDR_RE.match(s): return None, s
173
+ escaped = False
174
+ for i, ch in enumerate(s):
175
+ if escaped:
176
+ escaped = False
177
+ continue
178
+ if ch == '\\':
179
+ escaped = True
180
+ continue
181
+ if ch == ':' and _ADDR_RE.match(s[i + 1:]):
182
+ path = _unescape_path(s[:i])
183
+ if not path: raise ValueError('empty filename prefix')
184
+ return _norm_path(path), s[i + 1:]
185
+ return None, s
186
+
187
+
188
+ def _parse_fileaddr(s, default_path):
189
+ path, rest = _split_file_prefix(s)
190
+ path = path or default_path
191
+ m = _ADDR_RE.match(rest)
192
+ if not m: raise ValueError(f'expected exhash address near {s[:40]!r}')
193
+ return path, m.group(0), rest[m.end():]
194
+
195
+
196
+ def _parse_file_command(raw, default_path):
197
+ if not raw.strip(): return None
198
+ src, addr1, rest = _parse_fileaddr(raw.lstrip(), default_path)
199
+ has_comma, addr2, local = False, None, addr1
200
+ if rest.startswith(','):
201
+ has_comma = True
202
+ src2, addr2, rest = _parse_fileaddr(rest[1:], src)
203
+ if src2 != src: raise ValueError('cross-file ranges are invalid')
204
+ local += ',' + addr2
205
+ local += rest
206
+ body = rest.lstrip()
207
+ op = body[:1] if body[:1] in {'m', 't'} else None
208
+ dest = dest_addr = None
209
+ if op:
210
+ dest, dest_addr, tail = _parse_fileaddr(body[1:].strip(), src)
211
+ if tail.strip(): raise ValueError(f'unexpected trailing characters after destination: {tail!r}')
212
+ return dict(src=src, addr1=addr1, addr2=addr2, has_comma=has_comma, rest=rest, local=local, op=op, dest=dest, dest_addr=dest_addr)
213
+
214
+
215
+ def _load_buffer(buffers, path, missing_ok=False):
216
+ if path in buffers: return buffers[path]
217
+ p = Path(path)
218
+ try: lines = p.read_text().splitlines()
219
+ except FileNotFoundError:
220
+ if not missing_ok: raise
221
+ if not p.parent.exists(): raise
222
+ lines = []
223
+ buffers[path] = dict(path=path, original=list(lines), lines=list(lines))
224
+ return buffers[path]
225
+
226
+
227
+ def _can_create_missing(parsed): return parsed['addr1'] == '0|0000|' and parsed['rest'].lstrip()[:1] in {'a', 'i'}
228
+
229
+
230
+ def _split_lnhash_addr(addr):
231
+ m = _LNHASH_RE.fullmatch(addr)
232
+ if not m: raise ValueError(f'expected lnhash address, got {addr!r}')
233
+ return int(m.group(1)), m.group(2).lower()
234
+
235
+
236
+ def _line_no(lines, addr, allow_zero=False):
237
+ if addr == '$':
238
+ if not lines: raise ValueError("address '$' out of range on empty file")
239
+ return len(lines)
240
+ if addr == '%': raise ValueError('% is only allowed as a source range')
241
+ lineno, expected = _split_lnhash_addr(addr)
242
+ if lineno == 0:
243
+ if expected != '0000': raise ValueError('0|0000| must have hash 0000')
244
+ if allow_zero: return 0
245
+ raise ValueError('address 0 is not allowed here')
246
+ if lineno > len(lines): raise ValueError(f'address out of range: {lineno} > {len(lines)}')
247
+ actual = line_hash(lines[lineno - 1])
248
+ if actual != expected: raise ValueError(f'stale lnhash at line {lineno}: expected {expected}, got {actual}')
249
+ return lineno
250
+
251
+
252
+ def _source_indexes(lines, parsed):
253
+ if parsed['addr1'] == '%':
254
+ if parsed['has_comma'] or parsed['addr2'] is not None: raise ValueError('% is already a whole-file range')
255
+ return (0, len(lines) - 1) if lines else (0, -1)
256
+ start = _line_no(lines, parsed['addr1'])
257
+ end = _line_no(lines, parsed['addr2']) if parsed['addr2'] is not None else start
258
+ if start > end: raise ValueError(f'invalid range: {start}..{end}')
259
+ return start - 1, end - 1
260
+
261
+
262
+ def _dest_index(lines, addr):
263
+ if addr == '%': raise ValueError('destination % is not allowed')
264
+ return _line_no(lines, addr, allow_zero=True)
265
+
266
+
267
+ def _apply_transfer(buffers, parsed):
268
+ src = _load_buffer(buffers, parsed['src'])
269
+ dst = _load_buffer(buffers, parsed['dest'], missing_ok=parsed['dest_addr'] == '0|0000|')
270
+ s, e = _source_indexes(src['lines'], parsed)
271
+ dest = _dest_index(dst['lines'], parsed['dest_addr'])
272
+ segment = src['lines'][s:e + 1] if s <= e else []
273
+ if parsed['op'] == 't':
274
+ dst['lines'][dest:dest] = list(segment)
275
+ return
276
+ if src is dst:
277
+ if s <= e and s < dest <= e + 1: raise ValueError('destination is within moved range')
278
+ del src['lines'][s:e + 1]
279
+ insert_at = dest if dest <= s else dest - len(segment)
280
+ src['lines'][insert_at:insert_at] = segment
281
+ else:
282
+ del src['lines'][s:e + 1]
283
+ dst['lines'][dest:dest] = segment
284
+
285
+
286
+ def _apply_file_command(buffers, parsed, sw):
287
+ if parsed['op']:
288
+ _apply_transfer(buffers, parsed)
289
+ return
290
+ buf = _load_buffer(buffers, parsed['src'], missing_ok=_can_create_missing(parsed))
291
+ res = exhash(_text_from_lines(buf['lines']), [parsed['local']], sw=sw)
292
+ buf['lines'] = list(res['lines'])
293
+
294
+
295
+ def exhash_file(path:str, cmds:list[str], sw:int=4, inplace:bool=False):
296
+ r'''Read files, apply file-aware exhash commands, and return per-file results or a combined diff.
297
+
298
+ Core command syntax is the same as ``exhash(text, cmds, sw=sw)``; run
299
+ ``doc(exhash)`` for the full command reference. Use ``path`` as the default
300
+ file context for unqualified addresses. Prefix any source address, and any
301
+ ``m``/``t`` destination, with ``path:`` to target another file::
302
+
303
+ src/a.py:12|a3f2|s/foo/bar/
304
+ src/a.py:10|aaaa|,20|bbbb|m src/b.py:$
305
+ src/a.py:10|aaaa|t new.py:0|0000|
306
+
307
+ A range must stay within one file. The second address may omit the filename
308
+ and inherit it from the first address. Cross-file ranges are invalid. Escape
309
+ literal colons in filenames as ``\:`` and literal backslashes as ``\\\\``.
310
+
311
+ For multiline ``a``/``i``/``c`` commands, include the inserted text in the
312
+ same command string using newline characters. Do not use ``.`` terminators,
313
+ and do not split the text block into separate ``cmds`` entries.
314
+
315
+ Missing files are treated as empty only when the command is valid against an
316
+ empty buffer, such as ``0|0000|a``/``0|0000|i`` or an ``m``/``t`` destination
317
+ of ``0|0000|``.
318
+
319
+ With ``inplace=False``, return a ``FileSetEditResult`` with ``files``,
320
+ ``changed``, ``default_path``, ``res[path]``, and
321
+ ``res.format_diff(context=1)``. With ``inplace=True``, write changed files
322
+ only after every command succeeds and return the combined diff string. If
323
+ any command fails, write nothing.
324
+ '''
325
+ default_path, buffers = _norm_path(path), {}
326
+ for raw in cmds:
327
+ parsed = _parse_file_command(raw, default_path)
328
+ if parsed is not None: _apply_file_command(buffers, parsed, sw)
329
+ if not buffers: _load_buffer(buffers, default_path)
330
+ files = {path: FileEditResult(path, buf['original'], buf['lines']) for path, buf in buffers.items()}
331
+ result = FileSetEditResult(files, default_path)
332
+ if inplace:
333
+ for path in result.changed: _write_lines(path, result[path].lines)
334
+ return result.format_diff()
335
+ return result
336
+
337
+
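The same-file branch of `_apply_transfer` above adjusts the insertion index after deleting the moved segment. A minimal standalone sketch of that arithmetic follows, taking `s`/`e` as 0-based inclusive source indexes and `dest` as the 1-based line to insert after (0 means before line 1), which is how the diffed code appears to use them. The function name `move_lines` is illustrative, and lnhash verification is left out.

```python
def move_lines(lines: list[str], s: int, e: int, dest: int) -> list[str]:
    """Move lines[s..e] (0-based, inclusive) to just after 1-based line `dest`."""
    if s < dest <= e + 1:
        # The destination is one of the lines being moved.
        raise ValueError('destination is within moved range')
    out = list(lines)
    segment = out[s:e + 1]
    del out[s:e + 1]
    # Lines below the segment shift up by len(segment) once it is removed,
    # so a destination below the segment is reduced by that amount.
    insert_at = dest if dest <= s else dest - len(segment)
    out[insert_at:insert_at] = segment
    return out

if __name__ == "__main__":
    print(move_lines(["a", "b", "c", "d"], 0, 0, 4))  # first line moved to EOF
    print(move_lines(["a", "b", "c", "d"], 1, 2, 0))  # lines 2-3 moved to the top
```

The sketch copies the list instead of editing it in place, which makes it easier to compare the input and output. The diffed code edits the in-memory buffer directly and defers writing until every command has succeeded.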