llm-commit-helper 0.1.0__tar.gz

Files changed (27)
  1. llm_commit_helper-0.1.0/PKG-INFO +287 -0
  2. llm_commit_helper-0.1.0/README.md +268 -0
  3. llm_commit_helper-0.1.0/llm_commit_helper/__init__.py +7 -0
  4. llm_commit_helper-0.1.0/llm_commit_helper/__main__.py +10 -0
  5. llm_commit_helper-0.1.0/llm_commit_helper/cli.py +214 -0
  6. llm_commit_helper-0.1.0/llm_commit_helper/config.py +132 -0
  7. llm_commit_helper-0.1.0/llm_commit_helper/diff_engine.py +91 -0
  8. llm_commit_helper-0.1.0/llm_commit_helper/formatters/__init__.py +35 -0
  9. llm_commit_helper-0.1.0/llm_commit_helper/formatters/generic_fmt.py +47 -0
  10. llm_commit_helper-0.1.0/llm_commit_helper/formatters/python_fmt.py +81 -0
  11. llm_commit_helper-0.1.0/llm_commit_helper/formatters/verilog_fmt.py +116 -0
  12. llm_commit_helper-0.1.0/llm_commit_helper/git_staged.py +244 -0
  13. llm_commit_helper-0.1.0/llm_commit_helper/output.py +83 -0
  14. llm_commit_helper-0.1.0/llm_commit_helper/submodule.py +96 -0
  15. llm_commit_helper-0.1.0/llm_commit_helper/utils.py +124 -0
  16. llm_commit_helper-0.1.0/llm_commit_helper.egg-info/PKG-INFO +287 -0
  17. llm_commit_helper-0.1.0/llm_commit_helper.egg-info/SOURCES.txt +25 -0
  18. llm_commit_helper-0.1.0/llm_commit_helper.egg-info/dependency_links.txt +1 -0
  19. llm_commit_helper-0.1.0/llm_commit_helper.egg-info/entry_points.txt +2 -0
  20. llm_commit_helper-0.1.0/llm_commit_helper.egg-info/top_level.txt +1 -0
  21. llm_commit_helper-0.1.0/pyproject.toml +36 -0
  22. llm_commit_helper-0.1.0/setup.cfg +4 -0
  23. llm_commit_helper-0.1.0/tests/test_config.py +74 -0
  24. llm_commit_helper-0.1.0/tests/test_formatters.py +88 -0
  25. llm_commit_helper-0.1.0/tests/test_git_staged.py +102 -0
  26. llm_commit_helper-0.1.0/tests/test_output.py +56 -0
  27. llm_commit_helper-0.1.0/tests/test_submodule.py +66 -0
@@ -0,0 +1,287 @@
+ Metadata-Version: 2.4
+ Name: llm-commit-helper
+ Version: 0.1.0
+ Summary: LLM-friendly replacement for git diff --staged
+ Author-email: Ronan Barzic <rbarzic@gmail.com>
+ License-Expression: MIT
+ Project-URL: Homepage, https://github.com/rbarzic/llm-commit-helper
+ Project-URL: Repository, https://github.com/rbarzic/llm-commit-helper
+ Keywords: git,llm,commit,diff,ai
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Environment :: Console
+ Classifier: Intended Audience :: Developers
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Software Development :: Version Control :: Git
+ Requires-Python: >=3.11
+ Description-Content-Type: text/markdown
+
+ # llm-commit-helper
+
+ A smarter replacement for `git diff --staged`, designed to feed LLMs a clean,
+ size-bounded summary of staged changes rather than raw diff noise.
+
+ The core problem with `git diff --staged` for LLM commit message generation:
+
+ - Large files (netlists, SVD files) flood the context window
+ - Reformatted Python files produce massive diffs with zero logic change
+ - Verilog AUTO-expanded sections (`AUTOWIRE`, `AUTOINST`, …) produce long,
+ order-dependent diffs that obscure real changes
+ - Submodule updates show only a hash pair — no indication of what actually changed
+
+ `llm-commit-helper` handles all of these.
+
+ ---
+
+ ## Requirements
+
+ - Python 3.11+
+ - `git` in `PATH`
+ - Optional: `black` (for Python formatting isolation)
+ - Optional: `emacs` with `verilog-mode` (for Verilog AUTO stripping)
+
+ No extra Python packages are required. Run directly from the source tree.
+
+ ---
+
+ ## Installation
+
+ ```sh
+ pip install -e /path/to/llm-commit-helper
+ ```
+
+ Or from inside the project directory:
+
+ ```sh
+ pip install -e .
+ # or
+ make install
+ ```
+
+ ## Quick start
+
+ ```sh
+ # From inside a git repository, with staged changes:
+ llm-commit-helper
+ ```
+
+ Pipe the output to an LLM to generate a commit message:
+
+ ```sh
+ llm-commit-helper | llm "Write a commit message"
+ ```
+
+ ---
+
+ ## Options
+
+ | Flag | Description |
+ |---|---|
+ | `--config PATH` | Use a specific config file instead of searching the hierarchy |
+ | `--max-total-size SIZE` | Override the output size limit (e.g. `500`, `10KB`, `1MB`) |
+ | `-v`, `--verbose` | Print diagnostics to stderr (git root, config file used, budget) |
+
+ ---
+
+ ## Output format
+
+ ```
+ === Staged Changes Summary ===
+ Files: 8 total (4 modified, 2 added, 1 excluded, 1 submodule)
+ Config: /home/user/project/config.jsonc
+
+ --- File: src/foo.py [modified] ---
+ @@ -10,6 +10,8 @@
+ ...
+
+ --- File: src/bar.py [modified] [formatting-only] ---
+ [no logic changes - formatting only]
+
+ --- File: chip.netlist.v [excluded] ---
+ [changed - excluded by rule]
+
+ --- File: data/blob.bin [binary] ---
+ [binary file changed]
+
+ --- File: include/new_header.h [added] ---
+ [new file - contents not shown]
+
+ --- Submodule: support/imported/socbuilder ---
+ Updated: b2ae0f8 -> 153b625
+ 153b625 Update create_svd.py
+ bfee9c1 feat(latex): Add LaTeX table generation
+
+ === End of Staged Changes (3842 chars) ===
+ ```
+
+ Diagnostic messages (config warnings, fallback notices) go to **stderr**.
+ The staged-changes summary goes to **stdout**, so it can be piped cleanly.
+
+ ---
+
+ ## File handling
+
+ ### Added files
+ New files are reported as `[added]` with no content. Adding file content to a
+ commit message prompt is rarely useful and wastes context budget.
+
+ ### Deleted files
+ Reported as `[deleted]` with no diff shown.
+
+ ### Binary files
+ Detected via `git diff --numstat` (shows `- -` for binary). Reported as
+ `[binary file changed]`.
+
+ ### Excluded files
+ Files matching an `exclude` pattern in the config are reported as
+ `[changed - excluded by rule]`. Useful for generated files, netlists, or
+ anything too noisy to be useful in a commit message.
+
+ ### Files exceeding max_file_size
+ Reported as `[changed - file too large]`. Default threshold: 200 MB.
+
+ ### Submodules
+ For each updated submodule, the output includes a `git log --oneline` of the
+ commits between the old and new hash. If the submodule is not initialized on
+ disk, a warning is printed to stderr and the section is skipped.
+
+ ---
+
+ ## Smart formatters
+
+ ### Python (`.py`)
+ Runs `black --quiet` on temporary copies of both the old and new versions of
+ the file, then diffs the formatted results. If the formatted versions are
+ identical, the file is marked `[formatting-only]` and no diff is shown. This
+ suppresses the large diffs produced by `black` reformatting passes.
+
+ Falls back to generic formatting if `black` is not installed.
+
+ ### Verilog (`.v`, `.sv`)
+ Detects files using AUTO macros (`AUTOARG`, `AUTOINPUT`, `AUTOOUTPUT`,
+ `AUTOINST`, `AUTOWIRE`, `AUTOREG`, …). When found, runs
+ `emacs --batch -f verilog-batch-delete-auto -f save-buffer` on temporary copies
+ of both versions to **delete** the AUTO-generated sections before diffing.
+
+ This is the correct approach: diffing the hand-written source (before AUTO
+ expansion) rather than the expanded output, which is order-dependent and
+ produces spurious diffs even when nothing real changed.
+
+ Falls back to generic formatting if `emacs` is not installed or if no AUTO
+ macros are detected.
+
+ ### Generic (all other files)
+ Per-hunk whitespace normalization: strips leading/trailing whitespace and
+ collapses internal runs of spaces/tabs, then checks if normalized removed and
+ added lines are equal. Hunks that are whitespace-only are annotated
+ `[formatting-only]` inline in the diff.
+
+ ---
+
+ ## Configuration
+
+ `llm-commit-helper` searches for `config.jsonc` starting from the current
+ working directory, walking up to the git root, then falling back to a global
+ location. The first file found wins.
+
+ **Search order:**
+
+ 1. `<cwd>/config.jsonc`
+ 2. `<cwd>/.llm-commit-helper/config.jsonc`
+ 3. Same two patterns repeated for each parent directory up to the git root
+ 4. `~/.config/llm-commit-helper/config.jsonc`
+
+ To skip the search and use a specific file:
+
+ ```sh
+ python -m llm_commit_helper --config /path/to/my-config.jsonc
+ ```
+
+ ### Config file format
+
+ The file is JSONC — standard JSON with `//` line comments and trailing commas
+ allowed.
+
+ ```jsonc
+ {
+   "version": 1,
+   "rules": {
+     // Glob patterns for files to suppress (report as 'changed' only)
+     "exclude": [
+       "sim/firmware_ctests/**",
+       "atpg/from_genus_1d-comp-sdf/chip.test_netlist.v",
+       "*.netlist.v"
+     ],
+
+     // Files larger than this are suppressed (supports B, KB, MB, GB)
+     "max_file_size": "200MB",
+
+     // Total output budget; truncates with a summary when exceeded
+     "max_total_size": 20000,
+   }
+ }
+ ```
+
+ ### Defaults (no config file)
+
+ | Setting | Default |
+ |---|---|
+ | `exclude` | `[]` (nothing excluded) |
+ | `max_file_size` | `200MB` |
+ | `max_total_size` | `20000` (chars) |
+
+ ### Size values
+
+ `max_file_size` and `max_total_size` accept:
+
+ - Plain integers: `20000`
+ - Suffixed strings: `200MB`, `20KB`, `1GB`, `4096B`
+ - `max_total_size` on the CLI (`--max-total-size`) accepts the same formats
+
+ ### Output truncation
+
+ When the accumulated output exceeds `max_total_size`, remaining files are
+ listed by name only, followed by an `[OUTPUT TRUNCATED]` notice. The footer
+ always shows the actual character count of the output produced.
+
+ ---
+
+ ## Running tests
+
+ ```sh
+ cd support/local/llm-commit-helper
+ python -m pytest tests/ -v
+ # or
+ make test
+ ```
+
+ Tests use `pytest` with `unittest.mock` for all subprocess calls — no git
+ repository or external tools required.
+
+ ---
+
+ ## Project layout
+
+ ```
+ llm_commit_helper/
+ ├── __main__.py # python -m llm_commit_helper entry point
+ ├── cli.py # argument parsing and main pipeline
+ ├── config.py # JSONC loading and hierarchical config search
+ ├── git_staged.py # staged file listing and classification
+ ├── submodule.py # submodule log retrieval and formatting
+ ├── diff_engine.py # difflib wrapper with formatting-only annotation
+ ├── output.py # size-budgeted output assembly
+ ├── utils.py # subprocess, size parsing, glob matching
+ └── formatters/
+     ├── __init__.py # extension-based dispatcher
+     ├── generic_fmt.py # whitespace normalization
+     ├── python_fmt.py # black-based logic/formatting separation
+     └── verilog_fmt.py # emacs AUTO deletion
+ tests/
+ ├── test_config.py
+ ├── test_formatters.py
+ ├── test_git_staged.py
+ ├── test_output.py
+ └── test_submodule.py
+ ```
@@ -0,0 +1,268 @@
+ # llm-commit-helper
+
+ A smarter replacement for `git diff --staged`, designed to feed LLMs a clean,
+ size-bounded summary of staged changes rather than raw diff noise.
+
+ The core problem with `git diff --staged` for LLM commit message generation:
+
+ - Large files (netlists, SVD files) flood the context window
+ - Reformatted Python files produce massive diffs with zero logic change
+ - Verilog AUTO-expanded sections (`AUTOWIRE`, `AUTOINST`, …) produce long,
+ order-dependent diffs that obscure real changes
+ - Submodule updates show only a hash pair — no indication of what actually changed
+
+ `llm-commit-helper` handles all of these.
+
+ ---
+
+ ## Requirements
+
+ - Python 3.11+
+ - `git` in `PATH`
+ - Optional: `black` (for Python formatting isolation)
+ - Optional: `emacs` with `verilog-mode` (for Verilog AUTO stripping)
+
+ No extra Python packages are required. Run directly from the source tree.
+
+ ---
+
+ ## Installation
+
+ ```sh
+ pip install -e /path/to/llm-commit-helper
+ ```
+
+ Or from inside the project directory:
+
+ ```sh
+ pip install -e .
+ # or
+ make install
+ ```
+
+ ## Quick start
+
+ ```sh
+ # From inside a git repository, with staged changes:
+ llm-commit-helper
+ ```
+
+ Pipe the output to an LLM to generate a commit message:
+
+ ```sh
+ llm-commit-helper | llm "Write a commit message"
+ ```
+
+ ---
+
+ ## Options
+
+ | Flag | Description |
+ |---|---|
+ | `--config PATH` | Use a specific config file instead of searching the hierarchy |
+ | `--max-total-size SIZE` | Override the output size limit (e.g. `500`, `10KB`, `1MB`) |
+ | `-v`, `--verbose` | Print diagnostics to stderr (git root, config file used, budget) |
+
+ ---
+
+ ## Output format
+
+ ```
+ === Staged Changes Summary ===
+ Files: 8 total (4 modified, 2 added, 1 excluded, 1 submodule)
+ Config: /home/user/project/config.jsonc
+
+ --- File: src/foo.py [modified] ---
+ @@ -10,6 +10,8 @@
+ ...
+
+ --- File: src/bar.py [modified] [formatting-only] ---
+ [no logic changes - formatting only]
+
+ --- File: chip.netlist.v [excluded] ---
+ [changed - excluded by rule]
+
+ --- File: data/blob.bin [binary] ---
+ [binary file changed]
+
+ --- File: include/new_header.h [added] ---
+ [new file - contents not shown]
+
+ --- Submodule: support/imported/socbuilder ---
+ Updated: b2ae0f8 -> 153b625
+ 153b625 Update create_svd.py
+ bfee9c1 feat(latex): Add LaTeX table generation
+
+ === End of Staged Changes (3842 chars) ===
+ ```
+
+ Diagnostic messages (config warnings, fallback notices) go to **stderr**.
+ The staged-changes summary goes to **stdout**, so it can be piped cleanly.
+
+ ---
+
+ ## File handling
+
+ ### Added files
+ New files are reported as `[added]` with no content. Adding file content to a
+ commit message prompt is rarely useful and wastes context budget.
+
+ ### Deleted files
+ Reported as `[deleted]` with no diff shown.
+
+ ### Binary files
+ Detected via `git diff --numstat` (shows `- -` for binary). Reported as
+ `[binary file changed]`.
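
A minimal sketch of the `--numstat` check described above, assuming the raw text of `git diff --staged --numstat` is already available as a string. The function name `binary_paths` is hypothetical and not taken from the package. The only fact it relies on is that git prints `-` in both count columns for binary files, with the three fields separated by tabs.

```python
def binary_paths(numstat_output: str) -> set[str]:
    """Collect paths that git reports as binary in --numstat output.

    Each line has the form "<added>\t<deleted>\t<path>"; binary files
    carry "-" in both count columns instead of line counts.
    """
    paths = set()
    for line in numstat_output.splitlines():
        fields = line.split("\t", 2)
        if len(fields) == 3 and fields[0] == "-" and fields[1] == "-":
            paths.add(fields[2])
    return paths


sample = "3\t1\tsrc/foo.py\n-\t-\tdata/blob.bin\n"
print(binary_paths(sample))
```

Renames and paths with unusual characters are not handled here; a real implementation would likely need `-z` output or extra parsing for those cases.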
+
+ ### Excluded files
+ Files matching an `exclude` pattern in the config are reported as
+ `[changed - excluded by rule]`. Useful for generated files, netlists, or
+ anything too noisy to be useful in a commit message.
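
One way exclude matching could work is sketched below with the standard library's `fnmatch.fnmatchcase`. This is an assumption for illustration: the package's own glob rules live in `utils.py` and are not shown in this diff, and `is_excluded` is a hypothetical name. Note that `fnmatchcase` lets `*` match `/`, so `**` behaves the same as `*` in this sketch.

```python
from fnmatch import fnmatchcase


def is_excluded(path: str, patterns: list[str]) -> bool:
    """Return True if the repo-relative path matches any exclude pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


patterns = ["sim/firmware_ctests/**", "*.netlist.v"]
for candidate in ("sim/firmware_ctests/t1/main.c", "build/chip.netlist.v", "src/foo.py"):
    print(candidate, is_excluded(candidate, patterns))
```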
+
+ ### Files exceeding max_file_size
+ Reported as `[changed - file too large]`. Default threshold: 200 MB.
+
+ ### Submodules
+ For each updated submodule, the output includes a `git log --oneline` of the
+ commits between the old and new hash. If the submodule is not initialized on
+ disk, a warning is printed to stderr and the section is skipped.
+
+ ---
+
+ ## Smart formatters
+
+ ### Python (`.py`)
+ Runs `black --quiet` on temporary copies of both the old and new versions of
+ the file, then diffs the formatted results. If the formatted versions are
+ identical, the file is marked `[formatting-only]` and no diff is shown. This
+ suppresses the large diffs produced by `black` reformatting passes.
+
+ Falls back to generic formatting if `black` is not installed.
+
+ ### Verilog (`.v`, `.sv`)
+ Detects files using AUTO macros (`AUTOARG`, `AUTOINPUT`, `AUTOOUTPUT`,
+ `AUTOINST`, `AUTOWIRE`, `AUTOREG`, …). When found, runs
+ `emacs --batch -f verilog-batch-delete-auto -f save-buffer` on temporary copies
+ of both versions to **delete** the AUTO-generated sections before diffing.
+
+ This is the correct approach: diffing the hand-written source (before AUTO
+ expansion) rather than the expanded output, which is order-dependent and
+ produces spurious diffs even when nothing real changed.
+
+ Falls back to generic formatting if `emacs` is not installed or if no AUTO
+ macros are detected.
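
The detection step could be sketched roughly as follows, assuming AUTO macros appear in the source as verilog-mode block comments such as `/*AUTOARG*/`. The regular expression and the name `uses_auto_macros` are illustrative assumptions, not the package's actual code in `verilog_fmt.py`.

```python
import re

# Assumption: verilog-mode markers look like /*AUTOARG*/, /*AUTOINST*/,
# /*AUTOWIRE*/ and so on, i.e. a block comment starting with "AUTO".
AUTO_MACRO = re.compile(r"/\*\s*AUTO[A-Z_]+")


def uses_auto_macros(source: str) -> bool:
    """Return True if the Verilog source contains at least one AUTO marker."""
    return AUTO_MACRO.search(source) is not None


print(uses_auto_macros("module top (/*AUTOARG*/);\nendmodule\n"))
print(uses_auto_macros("module top (input clk);\nendmodule\n"))
```

A check like this only decides whether the emacs pass is worth running; the deletion itself would still be left to `verilog-batch-delete-auto`.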
+
+ ### Generic (all other files)
+ Per-hunk whitespace normalization: strips leading/trailing whitespace and
+ collapses internal runs of spaces/tabs, then checks if normalized removed and
+ added lines are equal. Hunks that are whitespace-only are annotated
+ `[formatting-only]` inline in the diff.
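
The per-hunk check described above can be sketched as below. The function names are hypothetical, and the sketch assumes that removed and added lines are compared pairwise in order, which the README does not spell out.

```python
import re


def normalize(line: str) -> str:
    """Strip the ends and collapse internal runs of spaces/tabs to one space."""
    return re.sub(r"[ \t]+", " ", line.strip())


def is_formatting_only(removed: list[str], added: list[str]) -> bool:
    """True if a hunk's removed and added lines are equal after normalization."""
    return [normalize(l) for l in removed] == [normalize(l) for l in added]


print(is_formatting_only(["x  =  1"], ["\tx = 1"]))
print(is_formatting_only(["x = 1"], ["x = 2"]))
```

Because the comparison is line by line, a change that re-wraps one statement across several lines would still count as a real change under this sketch.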
+
+ ---
+
+ ## Configuration
+
+ `llm-commit-helper` searches for `config.jsonc` starting from the current
+ working directory, walking up to the git root, then falling back to a global
+ location. The first file found wins.
+
+ **Search order:**
+
+ 1. `<cwd>/config.jsonc`
+ 2. `<cwd>/.llm-commit-helper/config.jsonc`
+ 3. Same two patterns repeated for each parent directory up to the git root
+ 4. `~/.config/llm-commit-helper/config.jsonc`
+
+ To skip the search and use a specific file:
+
+ ```sh
+ python -m llm_commit_helper --config /path/to/my-config.jsonc
+ ```
+
+ ### Config file format
+
+ The file is JSONC — standard JSON with `//` line comments and trailing commas
+ allowed.
+
+ ```jsonc
+ {
+   "version": 1,
+   "rules": {
+     // Glob patterns for files to suppress (report as 'changed' only)
+     "exclude": [
+       "sim/firmware_ctests/**",
+       "atpg/from_genus_1d-comp-sdf/chip.test_netlist.v",
+       "*.netlist.v"
+     ],
+
+     // Files larger than this are suppressed (supports B, KB, MB, GB)
+     "max_file_size": "200MB",
+
+     // Total output budget; truncates with a summary when exceeded
+     "max_total_size": 20000,
+   }
+ }
+ ```
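
Since the standard `json` module rejects both comments and trailing commas, a JSONC file has to be cleaned before parsing. The sketch below shows one possible approach using only the standard library; it is not the package's `config.py`, and all function names are hypothetical. Both passes track whether the scanner is inside a string, so a value such as `"a//b"` is left untouched.

```python
import json


def _strip_line_comments(text: str) -> str:
    """Remove // comments that are not inside a JSON string."""
    out, i, in_string = [], 0, False
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                i += 1
                out.append(text[i])  # keep the escaped character as-is
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def _strip_trailing_commas(text: str) -> str:
    """Drop commas whose next non-whitespace character is } or ]."""
    out, i, in_string = [], 0, False
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                i += 1
                out.append(text[i])
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "," and text[i + 1:].lstrip()[:1] in ("}", "]"):
            pass  # trailing comma: omit it
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def load_jsonc(text: str):
    """Parse JSONC text (// comments, trailing commas) into Python objects."""
    return json.loads(_strip_trailing_commas(_strip_line_comments(text)))


print(load_jsonc('{"rules": {"exclude": ["*.netlist.v",], // noisy\n "max_total_size": 20000,},}'))
```

Comments are removed first so that a comma followed by a comment and then a closing bracket is still recognized as trailing.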
+
+ ### Defaults (no config file)
+
+ | Setting | Default |
+ |---|---|
+ | `exclude` | `[]` (nothing excluded) |
+ | `max_file_size` | `200MB` |
+ | `max_total_size` | `20000` (chars) |
+
+ ### Size values
+
+ `max_file_size` and `max_total_size` accept:
+
+ - Plain integers: `20000`
+ - Suffixed strings: `200MB`, `20KB`, `1GB`, `4096B`
+ - `max_total_size` on the CLI (`--max-total-size`) accepts the same formats
+
+ ### Output truncation
+
+ When the accumulated output exceeds `max_total_size`, remaining files are
+ listed by name only, followed by an `[OUTPUT TRUNCATED]` notice. The footer
+ always shows the actual character count of the output produced.
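
The budgeting behaviour can be illustrated with the sketch below. Several details are assumptions: the sketch checks the budget before adding each section, it stops emitting content once the first section fails to fit, and the footer counts the body only. The name `assemble` is hypothetical; the real logic lives in `output.py`, which this diff excerpt does not show.

```python
def assemble(sections: list[tuple[str, str]], max_total_size: int) -> str:
    """Join (name, text) sections until the character budget is used up.

    Once a section does not fit, it and all later sections are listed
    by name only, followed by a truncation notice.
    """
    parts: list[str] = []
    skipped: list[str] = []
    used = 0
    for name, text in sections:
        if skipped or used + len(text) > max_total_size:
            skipped.append(name)
            continue
        parts.append(text)
        used += len(text)
    if skipped:
        parts.extend(f"--- File: {name} ---" for name in skipped)
        parts.append("[OUTPUT TRUNCATED]")
    body = "\n".join(parts)
    return f"{body}\n=== End of Staged Changes ({len(body)} chars) ==="


print(assemble([("a.py", "x" * 10), ("b.py", "y" * 10), ("c.py", "z")], 15))
```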
+
+ ---
+
+ ## Running tests
+
+ ```sh
+ cd support/local/llm-commit-helper
+ python -m pytest tests/ -v
+ # or
+ make test
+ ```
+
+ Tests use `pytest` with `unittest.mock` for all subprocess calls — no git
+ repository or external tools required.
+
+ ---
+
+ ## Project layout
+
+ ```
+ llm_commit_helper/
+ ├── __main__.py # python -m llm_commit_helper entry point
+ ├── cli.py # argument parsing and main pipeline
+ ├── config.py # JSONC loading and hierarchical config search
+ ├── git_staged.py # staged file listing and classification
+ ├── submodule.py # submodule log retrieval and formatting
+ ├── diff_engine.py # difflib wrapper with formatting-only annotation
+ ├── output.py # size-budgeted output assembly
+ ├── utils.py # subprocess, size parsing, glob matching
+ └── formatters/
+     ├── __init__.py # extension-based dispatcher
+     ├── generic_fmt.py # whitespace normalization
+     ├── python_fmt.py # black-based logic/formatting separation
+     └── verilog_fmt.py # emacs AUTO deletion
+ tests/
+ ├── test_config.py
+ ├── test_formatters.py
+ ├── test_git_staged.py
+ ├── test_output.py
+ └── test_submodule.py
+ ```
@@ -0,0 +1,7 @@
+ """llm-commit-helper: LLM-friendly replacement for git diff --staged."""
+
+ __version__ = "0.1.0"
+
+ # Local Variables:
+ # eval: (blacken-mode)
+ # End:
@@ -0,0 +1,10 @@
+ """Entry point: python -m llm_commit_helper"""
+
+ from llm_commit_helper.cli import main
+
+ if __name__ == "__main__":
+     main()
+
+ # Local Variables:
+ # eval: (blacken-mode)
+ # End: