git2xml 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- git2xml-0.1.0/LICENSE +21 -0
- git2xml-0.1.0/PKG-INFO +349 -0
- git2xml-0.1.0/README.md +318 -0
- git2xml-0.1.0/pyproject.toml +76 -0
- git2xml-0.1.0/setup.cfg +4 -0
- git2xml-0.1.0/src/git2xml/__init__.py +43 -0
- git2xml-0.1.0/src/git2xml/__main__.py +8 -0
- git2xml-0.1.0/src/git2xml/api.py +95 -0
- git2xml-0.1.0/src/git2xml/cli.py +158 -0
- git2xml-0.1.0/src/git2xml/constants.py +31 -0
- git2xml-0.1.0/src/git2xml/core.py +633 -0
- git2xml-0.1.0/src/git2xml/git_scanner.py +708 -0
- git2xml-0.1.0/src/git2xml/models.py +273 -0
- git2xml-0.1.0/src/git2xml/py.typed +0 -0
- git2xml-0.1.0/src/git2xml/utils.py +251 -0
- git2xml-0.1.0/src/git2xml.egg-info/PKG-INFO +349 -0
- git2xml-0.1.0/src/git2xml.egg-info/SOURCES.txt +26 -0
- git2xml-0.1.0/src/git2xml.egg-info/dependency_links.txt +1 -0
- git2xml-0.1.0/src/git2xml.egg-info/entry_points.txt +2 -0
- git2xml-0.1.0/src/git2xml.egg-info/requires.txt +5 -0
- git2xml-0.1.0/src/git2xml.egg-info/top_level.txt +1 -0
- git2xml-0.1.0/tests/test_api.py +254 -0
- git2xml-0.1.0/tests/test_cli.py +315 -0
- git2xml-0.1.0/tests/test_core.py +1015 -0
- git2xml-0.1.0/tests/test_git_scanner.py +858 -0
- git2xml-0.1.0/tests/test_integration.py +1183 -0
- git2xml-0.1.0/tests/test_models.py +9 -0
- git2xml-0.1.0/tests/test_utils.py +428 -0
git2xml-0.1.0/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Amit Ben-Ari
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
git2xml-0.1.0/PKG-INFO
ADDED
|
@@ -0,0 +1,349 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: git2xml
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: Generate contextual XML briefs for Git Commits and PRs.
|
|
5
|
+
Author-email: Amit Ben-Ari <amit@hivetrail.com>
|
|
6
|
+
License-Expression: MIT
|
|
7
|
+
Project-URL: Homepage, https://github.com/a-benari/git2xml
|
|
8
|
+
Project-URL: Repository, https://github.com/a-benari/git2xml
|
|
9
|
+
Project-URL: Issues, https://github.com/a-benari/git2xml/issues
|
|
10
|
+
Project-URL: Changelog, https://github.com/a-benari/git2xml/blob/main/CHANGELOG.md
|
|
11
|
+
Keywords: git,xml,llm,context,diff,pull-request,cli
|
|
12
|
+
Classifier: Development Status :: 4 - Beta
|
|
13
|
+
Classifier: Environment :: Console
|
|
14
|
+
Classifier: Intended Audience :: Developers
|
|
15
|
+
Classifier: Operating System :: OS Independent
|
|
16
|
+
Classifier: Programming Language :: Python :: 3
|
|
17
|
+
Classifier: Programming Language :: Python :: 3.9
|
|
18
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
19
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
20
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
21
|
+
Classifier: Programming Language :: Python :: 3.13
|
|
22
|
+
Classifier: Topic :: Software Development :: Version Control :: Git
|
|
23
|
+
Requires-Python: >=3.9
|
|
24
|
+
Description-Content-Type: text/markdown
|
|
25
|
+
License-File: LICENSE
|
|
26
|
+
Provides-Extra: dev
|
|
27
|
+
Requires-Dist: pytest>=7.0.0; extra == "dev"
|
|
28
|
+
Requires-Dist: pyright>=1.1.350; extra == "dev"
|
|
29
|
+
Requires-Dist: ruff>=0.6; extra == "dev"
|
|
30
|
+
Dynamic: license-file
|
|
31
|
+
|
|
32
|
+
# git2xml
|
|
33
|
+
|
|
34
|
+
A zero-dependency CLI that generates structured XML briefs of your Git commits and pull requests - ready to paste into Claude, ChatGPT, or any LLM that benefits from clean context.
|
|
35
|
+
|
|
36
|
+
## Why this exists
|
|
37
|
+
|
|
38
|
+
LLMs work better with structured context than with raw blobs of text. When you ask Claude to write a PR description, pasting `git diff` output is workable but lossy - it strips staging information, mixes binary and text files into a mess, and doesn't separate file contents from their diffs. `git2xml` solves this by formatting your git state into XML that LLMs can parse cleanly, with explicit file paths, statuses, diffs, and content sections.
|
|
39
|
+
|
|
40
|
+
The result: better-quality output from your AI assistant with less prompt engineering on your end.
|
|
41
|
+
|
|
42
|
+
## Features
|
|
43
|
+
|
|
44
|
+
- **AI-ready output**: Produces XML structured specifically for LLM consumption, with explicit file paths, change statuses, diffs, and content sections that models parse reliably.
|
|
45
|
+
- **One command per use case**: `git2xml commit` for current changes, `git2xml pr` for branch-vs-base - no flag-juggling for common workflows.
|
|
46
|
+
- **Zero dependencies**: Built entirely on the Python standard library. No supply-chain surface beyond Python itself.
|
|
47
|
+
- **Robust binary detection**: Automatically excludes binary files using BOM detection and statistical character analysis to prevent XML corruption.
|
|
48
|
+
- **Smart XML escaping**: Safely wraps code containing CDATA terminators using dynamic Markdown fencing.
|
|
49
|
+
- **Staging-aware**: Differentiates between staged, unstaged, and untracked files for accurate commit briefs.
|
|
50
|
+
- **Context-budget controls**: Per-file content and diff size caps (`--max-size`, `--max-diff-size`) keep oversized files and runaway diffs out of your prompt while still recording that the change happened.
|
|
51
|
+
- **Usable as a library**: A small typed Python API returns the brief as a string (sync or async) for use inside scripts, agents, and LLM pipelines - not just the CLI.
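The binary-detection bullet above names two signals, a BOM check and a statistical look at the characters. The sketch below shows one way such a heuristic can be put together. It is an illustration only: the function name `looks_binary`, the set of "text" bytes, and the 30% threshold are assumptions, not git2xml's actual implementation.

```python
import codecs

# Byte-order marks that declare a text encoding (hypothetical selection).
_BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)

# Bytes treated as "texty": common whitespace/control codes, printable ASCII,
# and high bytes (which may be part of UTF-8 or Latin-1 text).
_TEXT_BYTES = bytes({7, 8, 9, 10, 12, 13, 27} | set(range(0x20, 0x7F)) | set(range(0x80, 0x100)))


def looks_binary(sample: bytes, threshold: float = 0.30) -> bool:
    """Guess whether a byte sample is binary. Assumed heuristic, not git2xml's."""
    if not sample:
        return False
    if sample.startswith(_BOMS):
        return False  # a BOM announces text, even if NUL bytes follow (UTF-16/32)
    if b"\x00" in sample:
        return True  # NUL bytes almost never occur in plain text
    # Delete every "texty" byte and measure what is left over.
    non_text = sample.translate(None, _TEXT_BYTES)
    return len(non_text) / len(sample) > threshold
```

Checking the BOM before the NUL test matters: UTF-16 text is full of NUL bytes and would otherwise be misclassified.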
|
|
52
|
+
|
|
53
|
+
## Requirements
|
|
54
|
+
|
|
55
|
+
Python 3.9 or higher. No other dependencies.
|
|
56
|
+
|
|
57
|
+
## Installation
|
|
58
|
+
|
|
59
|
+
```bash
|
|
60
|
+
pip install git2xml
|
|
61
|
+
```
|
|
62
|
+
|
|
63
|
+
## Usage
|
|
64
|
+
|
|
65
|
+
Run from inside any local Git repository, or target one with `--repo PATH`.
|
|
66
|
+
|
|
67
|
+
### Generate a commit brief
|
|
68
|
+
|
|
69
|
+
Summarizes your currently modified files (or staged files if using the `--staged` flag) against `HEAD`.
|
|
70
|
+
|
|
71
|
+
```bash
|
|
72
|
+
git2xml commit
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
Outputs to `commit_brief.xml` by default, written to the directory you ran the command from.
|
|
76
|
+
|
|
77
|
+
### Generate a pull request brief
|
|
78
|
+
|
|
79
|
+
Summarizes all changes on your current branch against a base branch (defaults to `main`).
|
|
80
|
+
|
|
81
|
+
```bash
|
|
82
|
+
git2xml pr --base main --output my_pr_summary.xml
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
### Content-control flags
|
|
86
|
+
|
|
87
|
+
Five optional flags let you shape what ends up in the brief:
|
|
88
|
+
|
|
89
|
+
| Flag | Description |
|
|
90
|
+
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
|
91
|
+
| `--no-untracked` | Exclude untracked (new, un-`git add`ed) files from a commit brief. No-op when `--staged` is set (staged mode already excludes them) or in PR mode (no untracked files exist there). |
|
|
92
|
+
| `--max-size N` | Override the per-file **content** size threshold (in bytes), above which file _content_ is omitted and replaced with a reason string. Does not apply to diffs - that is `--max-diff-size` (see "Size limits: content vs. diffs"). The file's `<diff>` is still emitted, so the change stays visible. Defaults to 5 MiB (`5242880`). Must be a positive integer; `--max-size 0` or a negative value exits with an error. |
|
|
93
|
+
| `--max-diff-size N` | Override the per-file **diff** size threshold (in bytes, UTF-8). A diff larger than this is dropped from the output - its `<diff>` slot renders `status="omitted"` with a reason while the `<content>` stays. Unlike `--max-size`, this is output-shaping, not a pre-fetch guard (a diff has no size git can report before computing it). Defaults to 1 MiB (`1048576`). Must be `>= 0`; `--max-diff-size 0` disables the cap (diffs are always included in full). |
|
|
94
|
+
| `--no-content` | Produce a **diff-only** brief - all `<content>` bodies are suppressed and every file is represented by its `<diff>`. For newly added and untracked files (which have no prior version to diff against), the diff _is_ the full file content shown as added (`+`) lines - so a diff-only brief still captures new files completely. |
|
|
95
|
+
| `--strict-xml`      | Generate strict XML 1.0 output: escape control characters and split CDATA terminators. When the flag is not set (default), git2xml prioritizes exact file fidelity and falls back to Markdown fencing when a CDATA terminator is present. See the **XML Compliance vs. File Fidelity** section below for details. |
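The `--strict-xml` row describes two transformations: escaping control characters and splitting CDATA terminators. A minimal sketch of both follows. The helper names are hypothetical, and the split sequence `]]]]><![CDATA[>` is the conventional XML technique, assumed here rather than confirmed from git2xml's source.

```python
import re
import xml.etree.ElementTree as ET

# C0 control characters that XML 1.0 forbids (tab, LF and CR are allowed).
_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def escape_controls(text: str) -> str:
    """Replace each forbidden control character with a visible form such as \\x1b."""
    return _CONTROL.sub(lambda m: f"\\x{ord(m.group()):02x}", text)


def strict_cdata(text: str) -> str:
    """Wrap text in CDATA, ending and reopening the section inside every ']]>'."""
    safe = escape_controls(text).replace("]]>", "]]]]><![CDATA[>")
    return f"<![CDATA[{safe}]]>"


if __name__ == "__main__":
    element = ET.fromstring(f"<content>{strict_cdata('a]]>b')}</content>")
    print(element.text)  # a strict parser sees the original text again
```

The split keeps the document well-formed for parsers, at the cost of changing the raw characters an LLM would read, which is the trade-off the default mode avoids.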
|
|
96
|
+
|
|
97
|
+
These flags compose freely with each other and with `--staged`:
|
|
98
|
+
|
|
99
|
+
```bash
|
|
100
|
+
git2xml commit --no-untracked # omit untracked files
|
|
101
|
+
git2xml commit --max-size 102400 # cap content at 100 KiB
|
|
102
|
+
git2xml commit --max-diff-size 262144 # drop any single diff over 256 KiB
|
|
103
|
+
git2xml commit --no-content # diffs only, no file bodies
|
|
104
|
+
git2xml commit --no-untracked --no-content # combine: drop untracked, diffs only
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
> **Note - new files under `--no-content`:** Normally a brand-new file's change is
|
|
108
|
+
> carried by its `<content>`. Because `--no-content` suppresses content, git2xml
|
|
109
|
+
> instead emits the file's **add-diff** (every line shown as an added `+` line), so
|
|
110
|
+
> the file's contents are still present in the brief - just rendered as a diff rather
|
|
111
|
+
> than a content block. Untracked files (not yet `git add`ed) are diffed against an
|
|
112
|
+
> empty file to produce the same result. This applies only under `--no-content`; in
|
|
113
|
+
> the default mode new files render as normal `<content>`.
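To make the add-diff idea in the note concrete: diffing an empty "before" against the new file yields a hunk in which every line is an addition. The sketch below uses the standard library's `difflib` purely as an illustration; git2xml obtains its diffs from git itself, and the helper name `add_diff` is hypothetical.

```python
import difflib


def add_diff(path: str, text: str) -> str:
    """Render a new file's content as a unified diff against an empty file."""
    new_lines = text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        [],                      # the "before" side is empty
        new_lines,
        fromfile="/dev/null",
        tofile=f"b/{path}",
    )
    return "".join(diff)


if __name__ == "__main__":
    print(add_diff("src/new_module.py", "import os\nprint(os.getcwd())\n"))
```

Every body line of the result starts with `+`, so a diff-only brief still carries the complete file.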
|
|
114
|
+
|
|
115
|
+
### Size limits: content vs. diffs
|
|
116
|
+
|
|
117
|
+
git2xml caps two things independently - file **content** (`--max-size`) and a
|
|
118
|
+
single file's **diff** (`--max-diff-size`). Same unit (bytes), different mechanics.
|
|
119
|
+
|
|
120
|
+
`--max-size` caps **file content**. Content size is read from git's metadata
|
|
121
|
+
(`ls-tree` / `cat-file`) or the filesystem _before_ the file is loaded, so an
|
|
122
|
+
oversized file is detected and skipped without ever being read into memory - the
|
|
123
|
+
guard prevents the work. The file's `<file>` element and `<diff>` are still
|
|
124
|
+
emitted, so the change stays visible.
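The pre-fetch guard described above can be sketched as follows. `git cat-file -s` is a real git command that prints an object's size without its content; everything else (function names, how the result is used) is an assumption for illustration, and only the reason string is copied from the example output in this README.

```python
import os
import subprocess
from typing import Optional

DEFAULT_MAX_SIZE = 5 * 1024 * 1024  # 5 MiB, the documented default


def blob_size(repo: str, rev: str, path: str) -> int:
    """Ask git for a blob's size in bytes without loading the blob."""
    result = subprocess.run(
        ["git", "-C", repo, "cat-file", "-s", f"{rev}:{path}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def omission_reason(size: int, max_size: int = DEFAULT_MAX_SIZE) -> Optional[str]:
    """Return a reason string when content should be skipped, else None."""
    if size > max_size:
        return f"omitted - file exceeds {max_size} bytes"
    return None


def worktree_reason(path: str, max_size: int = DEFAULT_MAX_SIZE) -> Optional[str]:
    """Same check for a working-tree file, using filesystem metadata only."""
    return omission_reason(os.path.getsize(path), max_size)
```

Because the decision is made from metadata alone, an oversized file never has to be opened.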
|
|
125
|
+
|
|
126
|
+
`--max-diff-size` caps a single file's **diff**. Unlike content, a diff has no
|
|
127
|
+
size git can report in advance - it exists only once git computes it - so the cap
|
|
128
|
+
can't prevent the work the way `--max-size` does. Instead the diff is streamed and
|
|
129
|
+
abandoned once it crosses the limit (git2xml stops reading rather than buffering
|
|
130
|
+
the whole thing), then dropped from the output: its `<diff>` slot renders
|
|
131
|
+
`status="omitted"` with a reason while the `<content>` stays. This keeps a runaway
|
|
132
|
+
diff - a big generated or vendored file, or a large _deleted_ file whose only
|
|
133
|
+
payload is its diff - out of your context budget. Defaults to 1 MiB; pass
|
|
134
|
+
`--max-diff-size 0` to disable it and always include diffs in full.
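The "stream and abandon" behaviour can be illustrated with a capped reader. This is a sketch under assumptions: the helper `read_capped`, its chunk size, and the use of a plain binary stream are hypothetical, and git2xml's own reader works on asyncio subprocess output.

```python
import io
from typing import BinaryIO, List, Optional

DEFAULT_MAX_DIFF_SIZE = 1024 * 1024  # 1 MiB, the documented default


def read_capped(
    stream: BinaryIO,
    limit: int = DEFAULT_MAX_DIFF_SIZE,
    chunk_size: int = 65536,
) -> Optional[bytes]:
    """Read a stream fully, or give up once more than `limit` bytes were seen.

    A limit of 0 disables the cap. Returns None when the diff is abandoned.
    """
    chunks: List[bytes] = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return b"".join(chunks)
        total += len(chunk)
        if limit and total > limit:
            return None  # stop reading instead of buffering the rest
        chunks.append(chunk)


if __name__ == "__main__":
    oversized = io.BytesIO(b"+" * 5000)
    print(read_capped(oversized, limit=1000, chunk_size=256))
```

The caller can turn a `None` result into the `status="omitted"` diff slot while leaving `<content>` untouched.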
|
|
135
|
+
|
|
136
|
+
### Execution options
|
|
137
|
+
|
|
138
|
+
| Flag | Description |
|
|
139
|
+
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
|
140
|
+
| `--git-timeout N` | Per-git-command timeout in seconds. Raise it for very large repos where a single diff/log can take a while. Default: 30. |
|
|
141
|
+
| `--diff-semaphore-limit N` | Max number of diffs fetched concurrently. Default: 20. Lower it to reduce load; raise it for more parallelism on fast disks. |
|
|
142
|
+
| `--verbose`/`-v` | Verbose logging. Logs per-file and per-commit progress, as well as debug log messages. |
|
|
143
|
+
| `--hide-repo-path` | Emit only the repository's directory name in the root `<{commit,pr}_brief repo="...">` attribute instead of its absolute local path. Use when sharing briefs externally. Individual file `path` attributes are always repo-relative and unaffected. Default: off (the absolute path is emitted). |
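For readers curious how a concurrency cap like `--diff-semaphore-limit` is typically enforced, here is a minimal `asyncio.Semaphore` sketch. The helper `gather_bounded` and the job-factory shape are assumptions for illustration, not git2xml internals.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def gather_bounded(
    jobs: Iterable[Callable[[], Awaitable[Any]]],
    limit: int = 20,
) -> List[Any]:
    """Run coroutine factories with at most `limit` of them in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run(job: Callable[[], Awaitable[Any]]) -> Any:
        async with semaphore:  # waits here while `limit` jobs are already running
            return await job()

    # gather preserves the order of the input jobs in its result list
    return await asyncio.gather(*(run(job) for job in jobs))
```

Lowering the limit trades throughput for less simultaneous load on the disk and on git.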
|
|
144
|
+
|
|
145
|
+
## Output location
|
|
146
|
+
|
|
147
|
+
The brief is written to the directory you ran the command from, using the name from
|
|
148
|
+
`--output` (or the `{command}_brief.xml` default). A relative `--output` is resolved
|
|
149
|
+
against your current directory; an absolute path is honored as given. Note this is
|
|
150
|
+
independent of `--repo`: pointing `--repo` at another repository still writes the
|
|
151
|
+
brief to where you invoked the command, not into that repository.
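The resolution rule above maps directly onto `pathlib` semantics. A small sketch, assuming a hypothetical helper `resolve_output` (the CLI's real code is not shown in this README):

```python
from pathlib import Path
from typing import Optional


def resolve_output(
    command: str,
    output: Optional[str] = None,
    cwd: Optional[Path] = None,
) -> Path:
    """Resolve where a brief would be written.

    `command` is "commit" or "pr"; `output` mirrors the --output flag.
    The repository location deliberately plays no part in the result.
    """
    base = Path.cwd() if cwd is None else cwd
    name = output if output else f"{command}_brief.xml"
    # Joining with an absolute path discards `base`, so absolute outputs are honored.
    return base / name
```

Since `--repo` is not an input, pointing it elsewhere cannot change the destination.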
|
|
152
|
+
|
|
153
|
+
## Use as a Python library
|
|
154
|
+
|
|
155
|
+
Beyond the CLI, `git2xml` exposes a small programmatic API that **returns the brief
|
|
156
|
+
as a string** (nothing is written to disk), so you can feed it straight into an LLM
|
|
157
|
+
call, an agent pipeline, or any tool that assembles context.
|
|
158
|
+
|
|
159
|
+
```python
|
|
160
|
+
import git2xml
|
|
161
|
+
from git2xml import Git2xmlConfig
|
|
162
|
+
|
|
163
|
+
# Synchronous - for plain scripts
|
|
164
|
+
xml = git2xml.generate_commit_brief_sync(Git2xmlConfig(repo="/path/to/repo"))
|
|
165
|
+
|
|
166
|
+
# A PR brief against a base branch
|
|
167
|
+
xml = git2xml.generate_pr_brief_sync(Git2xmlConfig(repo=".", base="develop"))
|
|
168
|
+
```
|
|
169
|
+
|
|
170
|
+
The engine is `asyncio`-based, so async callers (agents, web handlers) can await the
|
|
171
|
+
native coroutines directly instead of blocking their event loop:
|
|
172
|
+
|
|
173
|
+
```python
|
|
174
|
+
import asyncio
|
|
175
|
+
import git2xml
|
|
176
|
+
from git2xml import Git2xmlConfig
|
|
177
|
+
|
|
178
|
+
async def main():
|
|
179
|
+
xml = await git2xml.generate_commit_brief(Git2xmlConfig(repo=".", staged=True))
|
|
180
|
+
# ... hand `xml` to your model / agent ...
|
|
181
|
+
|
|
182
|
+
asyncio.run(main())
|
|
183
|
+
```
|
|
184
|
+
|
|
185
|
+
> **Windows note:** the async functions spawn `git` via asyncio subprocesses,
|
|
186
|
+
> which on Windows require the `ProactorEventLoop`. `asyncio.run(...)` (above)
|
|
187
|
+
> selects it for you, so the normal case needs no action. Only if you supply
|
|
188
|
+
> your own event loop on Windows must it be a `ProactorEventLoop` - the
|
|
189
|
+
> `SelectorEventLoop` cannot create subprocesses and the call will fail. The
|
|
190
|
+
> sync wrappers and the CLI are unaffected.
|
|
191
|
+
|
|
192
|
+
All options live on the typed `Git2xmlConfig` object - the same settings the CLI
|
|
193
|
+
flags map to (`repo`, `base`, `staged`, `strict_xml`, `no_untracked`, `max_size`,
|
|
194
|
+
`max_diff_size`, `no_content`, `git_timeout`, `diff_semaphore_limit`, `hide_repo_path`). The **function name selects the
|
|
195
|
+
mode**, so you never set `command` yourself:
|
|
196
|
+
|
|
197
|
+
```python
|
|
198
|
+
config = Git2xmlConfig(repo=".", base="main", strict_xml=True, max_size=100_000)
|
|
199
|
+
xml = git2xml.generate_pr_brief_sync(config)
|
|
200
|
+
```
|
|
201
|
+
|
|
202
|
+
### API reference
|
|
203
|
+
|
|
204
|
+
| Function | Sync/Async | Returns |
|
|
205
|
+
| ------------------------------------ | ---------- | ---------- |
|
|
206
|
+
| `generate_commit_brief(config)` | async | XML string |
|
|
207
|
+
| `generate_pr_brief(config)` | async | XML string |
|
|
208
|
+
| `generate_commit_brief_sync(config)` | sync | XML string |
|
|
209
|
+
| `generate_pr_brief_sync(config)` | sync | XML string |
|
|
210
|
+
|
|
211
|
+
- Each returns the brief as a string, or an **empty string `""`** when there is
|
|
212
|
+
nothing to summarize (a clean working tree, or no commits between the branch and
|
|
213
|
+
its base).
|
|
214
|
+
- Failures raise `git2xml.Git2xmlError`, or a more specific subclass:
|
|
215
|
+
`NotAGitRepositoryError`, `GitNotInstalledError`, `GitCommandError`.
|
|
216
|
+
- The `*_sync` helpers cannot be called from inside a running event loop (e.g. a
|
|
217
|
+
Jupyter cell or an async handler) and raise a clear `RuntimeError` if misused;
|
|
218
|
+
use the async variants there.
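The running-loop restriction in the last bullet follows from how `asyncio.run` works. The sketch below shows one common way a sync wrapper can detect misuse; `run_sync` is a hypothetical name and this is not necessarily how git2xml's `*_sync` helpers are written.

```python
import asyncio
from typing import Any, Awaitable, Callable


def run_sync(make_coro: Callable[[], Awaitable[Any]]) -> Any:
    """Run a coroutine to completion, refusing to nest inside a running loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so starting one is safe.
        return asyncio.run(make_coro())
    raise RuntimeError(
        "sync helper called from a running event loop; await the async variant instead"
    )
```

Taking a factory rather than a coroutine object means no coroutine is created, and left un-awaited, on the error path.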
|
|
219
|
+
|
|
220
|
+
## Example output
|
|
221
|
+
|
|
222
|
+
A commit brief covering a new file, a modified file, a symlink, and a binary file looks like this. Content and
|
|
223
|
+
diffs are wrapped in `CDATA` so source is embedded verbatim; the repository name is
|
|
224
|
+
emitted as a `<name>` element, and added files carry their full contents as `<content>`
|
|
225
|
+
(no diff is needed since the content is the whole change):
|
|
226
|
+
|
|
227
|
+
```xml
|
|
228
|
+
<commit_brief repo="/Users/dev/myapp">
|
|
229
|
+
<name>myapp</name>
|
|
230
|
+
<file path="src/tests/test_auth.py" status="added">
|
|
231
|
+
<content format="cdata"><![CDATA[# New file contents
|
|
232
|
+
]]></content>
|
|
233
|
+
</file>
|
|
234
|
+
<file path="src/auth.py" status="modified">
|
|
235
|
+
<content format="cdata"><![CDATA[def verify_token(token):
|
|
236
|
+
return token in VALID_TOKENS and not is_expired(token)
|
|
237
|
+
]]></content>
|
|
238
|
+
<diff format="cdata"><![CDATA[@@ -1,2 +1,2 @@
|
|
239
|
+
def verify_token(token):
|
|
240
|
+
- return token in VALID_TOKENS
|
|
241
|
+
+ return token in VALID_TOKENS and not is_expired(token)]]></diff>
|
|
242
|
+
</file>
|
|
243
|
+
<file path="src/config_loader.py" status="added">
|
|
244
|
+
<content format="cdata"><![CDATA[Symlink pointing to: ../shared/config_loader.py]]></content>
|
|
245
|
+
</file>
|
|
246
|
+
<file path="assets/logo.png" status="modified" reason="omitted - binary file detected" />
|
|
247
|
+
</commit_brief>
|
|
248
|
+
```
|
|
249
|
+
|
|
250
|
+
> The `repo` attribute shows the absolute path by default; run with
|
|
251
|
+
> `--hide-repo-path` to emit just the directory name (`repo="myapp"`) when
|
|
252
|
+
> sharing the brief externally.
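If a script rather than an LLM consumes the brief, the elements shown above can be read back with the standard library. The sketch below parses a trimmed sample modeled on the example; element and attribute names come from that example, while `summarize` is a hypothetical helper. It assumes a well-formed brief, which in practice means generating it with `--strict-xml`.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

SAMPLE = """<commit_brief repo="myapp">
  <name>myapp</name>
  <file path="src/auth.py" status="modified">
    <content format="cdata"><![CDATA[def verify_token(token):
    return True
]]></content>
    <diff status="omitted" reason="diff exceeds 1048576 bytes" />
  </file>
  <file path="assets/logo.png" status="modified" reason="omitted - binary file detected" />
</commit_brief>"""


def summarize(xml_text: str) -> List[Tuple[str, str, bool, Optional[str]]]:
    """List (path, status, has_content, reason) for each <file> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for file_el in root.iter("file"):
        rows.append(
            (
                file_el.get("path"),
                file_el.get("status"),
                file_el.find("content") is not None,
                file_el.get("reason"),
            )
        )
    return rows


if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row)
```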
|
|
253
|
+
|
|
254
|
+
A file whose content is omitted by `--max-size` still carries its `<diff>` and an
|
|
255
|
+
explanatory `reason`:
|
|
256
|
+
|
|
257
|
+
```xml
|
|
258
|
+
<file path="data/big.csv" status="modified" reason="omitted - file exceeds 5242880 bytes">
|
|
259
|
+
<diff format="cdata"><![CDATA[@@ ... @@]]></diff>
|
|
260
|
+
</file>
|
|
261
|
+
```
|
|
262
|
+
|
|
263
|
+
A file whose diff is dropped by `--max-diff-size` keeps its `<content>` and marks
|
|
264
|
+
the omission on the diff slot:
|
|
265
|
+
|
|
266
|
+
```xml
|
|
267
|
+
<file path="vendor/bundle.js" status="modified">
|
|
268
|
+
<content format="cdata"><![CDATA[/* ... file contents ... */]]></content>
|
|
269
|
+
<diff status="omitted" reason="diff exceeds 1048576 bytes" />
|
|
270
|
+
</file>
|
|
271
|
+
```
|
|
272
|
+
|
|
273
|
+
PR mode wraps the same `<file>` elements and additionally emits a `<commit_log>` of the
|
|
274
|
+
branch's commits.
|
|
275
|
+
|
|
276
|
+
## XML Compliance vs. File Fidelity
|
|
277
|
+
|
|
278
|
+
By default, `git2xml` prioritizes **exact file fidelity** over strict XML 1.0 compliance. AI models (like Claude) read raw token streams and do not use strict XML parsers.
|
|
279
|
+
|
|
280
|
+
- **Control Characters:** Literal control bytes (e.g., `0x00–0x08`, `0x0B`, `0x0C`, `0x0E–0x1F`) in your source code are passed through exactly as they appear in `<content>` and `<diff>` bodies. This also applies to control bytes inside _attribute_ values (a file path or commit author): in default mode they pass through unescaped, so a control byte there (such as a stray newline in a path) can break attribute well-formedness. `--strict-xml` escapes control characters in attributes too.
|
|
281
|
+
- **CDATA Terminators:** If a file contains the literal string `]]>`, `git2xml` avoids splitting the tag (which alters the raw text the LLM sees) and instead falls back to dynamic Markdown fencing (`format="fenced"`).
|
|
282
|
+
- **Invalid UTF-8:** Text is decoded as UTF-8 on a best-effort basis. Bytes that aren't valid UTF-8 are replaced with the Unicode replacement character (U+FFFD, `�`) rather than causing an error. Files git detects as binary are omitted entirely, so this affects only text files containing occasional malformed bytes.
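The CDATA-terminator fallback in the list above can be sketched like this. Only the `format="cdata"` and `format="fenced"` attribute values come from this README; the helper name `wrap_body`, the exact layout of the fenced form, and the rule for picking the fence length are assumptions.

```python
import re


def wrap_body(tag: str, text: str) -> str:
    """Wrap text in CDATA, or in a Markdown fence when it contains ']]>'."""
    if "]]>" not in text:
        return f'<{tag} format="cdata"><![CDATA[{text}]]></{tag}>'
    # "Dynamic" fencing: make the fence longer than any backtick run in the
    # text, so the text itself can never close the fence early.
    longest_run = max((len(run) for run in re.findall(r"`+", text)), default=0)
    fence = "`" * max(3, longest_run + 1)
    return f'<{tag} format="fenced">\n{fence}\n{text}\n{fence}\n</{tag}>'
```

The text is left byte-for-byte intact in both branches, which matches the fidelity-first default; the fenced branch in this sketch makes no attempt at XML well-formedness.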
|
|
283
|
+
|
|
284
|
+
If you are piping this output into a strict automated XML parser (like `xml.etree` or a CI/CD pipeline) rather than an LLM, you can use the `--strict-xml` flag. This forces strict XML 1.0 compliance by replacing control characters with their string representations (e.g., `\x1b`) and safely splitting CDATA terminators (`]]]]><![CDATA[>`).
[...]
... - a desktop app that assembles structured LLM context from multiple sources (Notion, GitHub Issues, local files, git repos). The git-handling component turned out to be useful as a standalone CLI, so I extracted it under the MIT license.
|
|
340
|
+
|
|
341
|
+
If you find this tool helpful and want the same approach applied across your full developer context - not just git - check out HiveTrail Mesh.
|
|
342
|
+
|
|
343
|
+
## Contributing
|
|
344
|
+
|
|
345
|
+
Issues and PRs welcome. This is a small utility, so expect light maintenance - but reasonable bug reports and improvements will be reviewed and merged.
|
|
346
|
+
|
|
347
|
+
## License
|
|
348
|
+
|
|
349
|
+
MIT - see [LICENSE](./LICENSE) for details.
|