headson 0.8.0__cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

headson/__init__.py ADDED
@@ -0,0 +1,6 @@
1
+ from __future__ import annotations
2
+
3
+ # Re-export the compiled extension API directly.
4
+ from .headson import summarize # type: ignore
5
+
6
+ __all__ = ["summarize"]
Binary file
@@ -0,0 +1,296 @@
1
+ Metadata-Version: 2.4
2
+ Name: headson
3
+ Version: 0.8.0
4
+ Classifier: Programming Language :: Python
5
+ Classifier: Programming Language :: Python :: 3
6
+ Classifier: Programming Language :: Rust
7
+ Classifier: Operating System :: OS Independent
8
+ Requires-Dist: pytest>=8 ; extra == 'test'
9
+ Provides-Extra: test
10
+ License-File: LICENSE
11
+ Summary: Budget‑constrained JSON preview renderer (Python bindings)
12
+ Keywords: json,preview,summarize,cli,bindings
13
+ Requires-Python: >=3.10
14
+ Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
15
+
16
+ <h1 align="center">
17
+ <img src="https://raw.githubusercontent.com/kantord/headson/main/docs/assets/logo.svg" alt="headson" width="221" />
18
+ </h1>
19
+ <p align="center">
20
+ <img src="https://raw.githubusercontent.com/kantord/headson/main/docs/assets/tapes/demo.gif" alt="Terminal demo" width="1560" height="900" />
21
+ <br/>
22
+ </p>
23
+
24
+ `head`/`tail` for JSON and YAML, but structure‑aware. Get a compact preview that shows both the shape and representative values of your data, all within a strict size budget. (Like `head`/`tail`, `hson` also works with unstructured text files.)
25
+
26
+ Available as:
27
+ - CLI (see [Usage](#usage))
28
+ - Python library (see [Python Bindings](#python-bindings))
29
+
30
+ ![Codecov](https://img.shields.io/codecov/c/github/kantord/headson?style=flat-square) ![Crates.io Version](https://img.shields.io/crates/v/headson?style=flat-square) ![PyPI - Version](https://img.shields.io/pypi/v/headson?style=flat-square)
31
+
32
+
33
+ ## Install
34
+
35
+
36
+ Using Cargo:
37
+
38
+ cargo install headson
39
+
40
+
41
+ > Note: the CLI installs as `hson`. All examples below use `hson ...`.
42
+
43
+
44
+ From source:
45
+
46
+ cargo build --release
47
+ target/release/hson --help
48
+
49
+
50
+
51
+
52
+
53
+ ## Features
54
+
55
+ - Budgeted output: cap the preview by bytes, characters, or lines
56
+ - Output formats: `auto | json | yaml | text`
57
+ - Styles: `strict | default | detailed`
58
+ - JSON family: `strict` → strict JSON, `default` → human‑friendly Pseudo, `detailed` → JS with inline comments
59
+ - YAML: always YAML; `strict` has no comments, `default` uses “# …”, `detailed` uses “# N more …”
60
+ - Text: prints raw lines. In `default` style, omissions are shown as a single line `…`; in `detailed`, as `… N more lines …`; `strict` prints no omission line.
61
+ - Multiple inputs: preview many files at once with a shared or per‑file budget
62
+ - Fast: processes gigabyte‑scale files in seconds (mostly disk‑bound)
63
+ - Available as a CLI app and as a Python library
64
+
65
+ ## Fits into command line workflows
66
+
67
+ If you’re comfortable with tools like `head` and `tail`, use `hson` when you want a quick, structured peek into a JSON file without dumping the entire thing.
68
+
69
+ - `head`/`tail` operate on bytes/lines, so their output is not aware of tree structure
70
+ - `jq`: you need to craft filters to preview large JSON files
71
+ - `hson`: head/tail for trees; zero‑config by default, and `-i text` forces raw lines when you want them
72
+
73
+ ## Usage
74
+
75
+ hson [FLAGS] [INPUT...]
76
+
77
+ - INPUT (optional, repeatable): file path(s). If omitted, reads from stdin.
78
+ - Prints the preview to stdout. On parse errors, exits non‑zero and prints an error to stderr.
79
+
80
+ Common flags:
81
+
82
+ - `-c, --bytes <BYTES>`: per‑file output budget (bytes). For multiple inputs, default total budget is `<BYTES> * number_of_inputs`.
83
+ - `-u, --chars <CHARS>`: per‑file output budget (Unicode code points). Behaves like `--bytes` but counts characters instead of bytes.
84
+ - `-C, --global-bytes <BYTES>`: total output budget across all inputs. With `--bytes`, the effective total is the smaller of the two.
85
+ - `-f, --format <auto|json|yaml|text>`: output format (default: `auto`).
86
+ - Auto: stdin → JSON family; filesets → per‑file based on extension (`.json` → JSON family, `.yaml`/`.yml` → YAML, unknown → Text).
87
+ - `-t, --template <strict|default|detailed>`: output style (default: `default`).
88
+ - JSON family: `strict` → strict JSON; `default` → Pseudo; `detailed` → JS with inline comments.
89
+ - YAML: always YAML; style only affects comments (`strict` none, `default` “# …”, `detailed` “# N more …”).
90
+ - `-i, --input-format <json|yaml|text>`: ingestion format (default: `json`). For filesets in `auto` format, ingestion is chosen by file extension.
91
+ - `-m, --compact`: no indentation, no spaces, no newlines
92
+ - `--no-newline`: single line output
93
+ - `--no-header`: suppress fileset section headers (useful when embedding output in scripts)
94
+ - `--no-space`: no space after `:` in objects
95
+ - `--indent <STR>`: indentation unit (default: two spaces)
96
+ - `--string-cap <N>`: max graphemes to consider per string (default: 500)
97
+ - `--grep <REGEX>`: guarantee inclusion of values/keys/lines matching the regex (ripgrep‑style). Matches and their ancestors are always kept; the budget governs everything else. If the matches use up all the headroom, only the must‑keep paths are shown.
98
+ - `--head`: prefer the beginning of arrays when truncating (keep first N). Strings are unaffected. Display styles place omission markers accordingly; strict JSON remains unannotated. Mutually exclusive with `--tail`.
99
+ - `--tail`: prefer the end of arrays when truncating (keep last N). Strings are unaffected. Display styles place omission markers accordingly; strict JSON remains unannotated. Mutually exclusive with `--head`.
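To make the `--grep` rule concrete, here is a minimal Python sketch of the "matches plus ancestors" set on an already-parsed JSON value. It is not headson's implementation: the helper name `must_keep_paths` and the tuple-of-keys path representation are assumptions for illustration, and matching uses Python's `re` rather than ripgrep's regex engine.

```python
import json
import re
from typing import Any, Set, Tuple

Path = Tuple[Any, ...]  # keys (str) and array indices (int) from the root


def must_keep_paths(tree: Any, pattern: str) -> Set[Path]:
    """Return every path a --grep-style rule would force into the preview.

    Hypothetical helper: a key or scalar value matching `pattern` keeps its
    own path plus every ancestor prefix, so the match stays reachable.
    """
    regex = re.compile(pattern)
    keep: Set[Path] = set()

    def mark(path: Path) -> None:
        for i in range(len(path) + 1):
            keep.add(path[:i])  # the path itself and all of its ancestors

    def walk(node: Any, path: Path) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                child = path + (key,)
                if regex.search(key):
                    mark(child)
                walk(value, child)
        elif isinstance(node, list):
            for index, value in enumerate(node):
                walk(value, path + (index,))
        else:
            # Scalars are matched against their JSON text (strings without quotes).
            text = node if isinstance(node, str) else json.dumps(node)
            if regex.search(text):
                mark(path)

    walk(tree, ())
    return keep


doc = {"users": [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bo"}], "meta": {"count": 2}}
print(sorted(must_keep_paths(doc, r"^Bo$"), key=len))
# [(), ('users',), ('users', 1), ('users', 1, 'name')]
```

Everything outside the returned set would then compete for whatever budget remains.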
100
+
101
+ Notes:
102
+
103
+ - Multiple inputs:
104
+ - With newlines enabled, file sections are rendered with human‑readable headers (pass `--no-header` to suppress them). In compact/single‑line modes, headers are omitted.
105
+ - Order: inputs are sorted by git frecency (via frecenfile) when available, then by mtime; pass `--no-sort` to keep the original input order without repo scanning.
106
+ - In `--format auto`, each file uses its own best format: JSON family for `.json`, YAML for `.yaml`/`.yml`.
107
+ - Unknown extensions are treated as Text (raw lines) — safe for logs and `.txt` files.
108
+ - `--global-bytes` may truncate or omit entire files to respect the total budget.
109
+ - The tool finds the largest preview that fits the budget; even when the budget is extremely tight, you still get a minimal, valid preview.
110
+ - Directories and binary files are ignored, and a notice is printed to stderr for each. Stdin is read as‑is.
111
+ - Head vs tail sampling: these options bias which part of each array is kept before rendering. Display styles may still insert internal gap markers to honor very small budgets; strict JSON stays unannotated.
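The effect of `--head`, `--tail`, and the default balanced mode can be pictured with the rough Python model below. It is not headson's scoring (the footnotes describe a priority order plus a budget search). `pick_indices` and `render_with_gaps` are hypothetical helpers: the first picks a fixed number `k` of array positions, the second prints `…` wherever kept items are non-contiguous, similar to how display styles mark internal gaps using original indices.

```python
from typing import List, Sequence


def pick_indices(n: int, k: int, skew: str = "balanced") -> List[int]:
    """Choose up to k of n array positions (a rough model, not headson's scorer)."""
    if k <= 0:
        return []
    if k >= n:
        return list(range(n))
    if skew == "head":
        return list(range(k))  # --head: keep the first k
    if skew == "tail":
        return list(range(n - k, n))  # --tail: keep the last k
    if k == 1:
        return [0]
    # balanced: spread picks evenly over the head, middle, and tail
    step = (n - 1) / (k - 1)
    return sorted({round(i * step) for i in range(k)})


def render_with_gaps(items: Sequence[object], kept: List[int]) -> str:
    """Print kept items in order, inserting … wherever original indices skip."""
    parts: List[str] = []
    prev = -1
    for idx in kept:
        if idx > prev + 1:
            parts.append("…")  # leading or internal gap
        parts.append(str(items[idx]))
        prev = idx
    if prev < len(items) - 1:
        parts.append("…")  # trailing gap
    return "[" + ", ".join(parts) + "]"


items = list(range(10))
for skew in ("head", "tail", "balanced"):
    print(skew, render_with_gaps(items, pick_indices(len(items), 3, skew)))
```

Because the kept indices are original positions, the renderer can tell where the gaps are without storing the omitted items.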
112
+
113
+ ### Working with multiple files
114
+
115
+ - Budgets: per-file caps (`--bytes`/`--chars`/`--lines`) apply to each input; global caps (`--global-*`) constrain the combined output. Default byte budget scales by input count when no globals are set.
116
+ - Sorting: inputs are pre-sorted by git frecency (frecenfile) with last-modified-time fallback so recently touched files appear first. Pass `--no-sort` to preserve the order you provided and skip repo scanning.
117
+ - Headers: fileset sections get `==>` headers when newlines are enabled; hide them with `--no-header`. Compact and single-line modes omit headers automatically.
118
+ - Formats: in `--format auto`, each file picks JSON/YAML/Text based on extension; unknowns fall back to Text so mixed filesets “just work.”
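The `--format auto` mapping and the frecency-then-mtime ordering described above fit in a short Python sketch. The extension rules come from this section; the `frecency` scores are a stand-in for whatever frecenfile reports (assumed here to be a plain dict where higher means more recently or frequently touched), and the tie-breaking order is an assumption, not documented behavior.

```python
from pathlib import PurePath
from typing import Dict, List, Optional


def auto_format(path: Optional[str]) -> str:
    """Pick an output family the way `--format auto` is described (None = stdin)."""
    if path is None:
        return "json"  # stdin defaults to the JSON family
    suffix = PurePath(path).suffix
    if suffix == ".json":
        return "json"
    if suffix in (".yaml", ".yml"):
        return "yaml"
    return "text"  # unknown extensions fall back to raw lines


def order_inputs(paths: List[str], frecency: Dict[str, float], mtime: Dict[str, float]) -> List[str]:
    """Hypothetical ordering: higher frecency first, newer mtime breaks ties."""
    return sorted(paths, key=lambda p: (-frecency.get(p, 0.0), -mtime[p]))


files = ["config.yml", "data.json", "app.log"]
print([auto_format(f) for f in files])  # ['yaml', 'json', 'text']
print(order_inputs(files, {"data.json": 3.0}, {"config.yml": 20.0, "data.json": 5.0, "app.log": 10.0}))
# ['data.json', 'config.yml', 'app.log']
```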
119
+
120
+ ## Budget Modes
121
+
122
+ - Bytes (`-c/--bytes`, `-C/--global-bytes`)
123
+ - Measures UTF‑8 bytes in the output.
124
+ - Default per‑file budget is 500 bytes when neither `--lines` nor `--chars` is provided.
125
+ - Multiple inputs: total default budget is `<BYTES> * number_of_inputs`; `--global-bytes` caps the total.
126
+
127
+ - Characters (`-u/--chars`)
128
+ - Measures Unicode code points (not grapheme clusters).
129
+
130
+ - Lines (`-n/--lines`, `-N/--global-lines`)
131
+ - Caps the number of lines in the output.
132
+ - Incompatible with `--no-newline`.
133
+ - Multiple inputs: defaults to `<LINES> * number_of_inputs`; `--global-lines` caps the total.
134
+ - Fileset headers, blank separators, and summary lines do not count toward the line cap by default; only actual content lines are considered. Pass `-H/--count-headers` to include headers/summaries in the line budget.
135
+
136
+ - Interactions and precedence
137
+ - All active budgets are enforced simultaneously. The render must satisfy all of: bytes (if set), chars (if set), and lines (if set). The strictest cap wins.
138
+ - When only lines are specified, no implicit byte cap applies. When neither lines nor chars are specified, a 500‑byte default applies.
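The budget arithmetic above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not headson's code: `total_byte_cap` and `fits` are hypothetical names; an explicit `--global-bytes` without `--bytes` is assumed to replace the 500-byte-per-file default (following "default byte budget scales by input count when no globals are set"); and the line check counts every line, ignoring the header exclusion that `-H/--count-headers` controls.

```python
from typing import Optional

DEFAULT_BYTES_PER_FILE = 500  # the documented default per-file byte budget


def total_byte_cap(
    per_file_bytes: Optional[int],
    global_bytes: Optional[int],
    n_inputs: int,
    lines_or_chars_given: bool,
) -> Optional[int]:
    """Effective byte cap for the whole run, or None when no byte cap applies."""
    if per_file_bytes is None:
        if global_bytes is not None:
            return global_bytes  # assumption: an explicit global cap replaces the default
        if lines_or_chars_given:
            return None  # only lines/chars requested: no implicit byte cap
        per_file_bytes = DEFAULT_BYTES_PER_FILE
    total = per_file_bytes * n_inputs  # per-file budget scales with input count
    return total if global_bytes is None else min(total, global_bytes)


def fits(
    text: str,
    byte_cap: Optional[int] = None,
    char_cap: Optional[int] = None,
    line_cap: Optional[int] = None,
) -> bool:
    """True only if every active cap holds, so the strictest cap decides."""
    if byte_cap is not None and len(text.encode("utf-8")) > byte_cap:
        return False
    if char_cap is not None and len(text) > char_cap:  # len() counts code points
        return False
    if line_cap is not None and len(text.splitlines()) > line_cap:
        return False
    return True


print(total_byte_cap(400, 1000, 3, False))  # 1000, i.e. min(400 * 3, 1000)
print(fits("h\u00e9llo", byte_cap=5), fits("h\u00e9llo", char_cap=5))  # False True
```

The last line shows why `--bytes` and `--chars` differ: `é` is one code point but two UTF‑8 bytes.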
139
+
140
+ Quick one‑liners:
141
+
142
+ - Peek a big JSON stream (keeps structure):
143
+
144
+ zstdcat huge.json.zst | hson -c 800 -f json -t default
145
+
146
+ - Many files with a fixed overall size:
147
+
148
+ hson -C 1200 -f json -t strict logs/*.json
149
+
150
+ - Glance at a file, JavaScript‑style comments for omissions:
151
+
152
+ hson -c 400 -f json -t detailed data.json
153
+
154
+ - YAML with detailed comments:
155
+
156
+ hson -c 400 -f yaml -t detailed config.yaml
157
+
158
+ ### Text mode
159
+
160
+ - Single file (auto):
161
+
162
+ hson -c 200 notes.txt
163
+
164
+ - Force Text ingest/output (useful when mixing with other extensions, or when the extension suggests JSON/YAML):
165
+
166
+ hson -c 200 -i text -f text notes.txt
167
+ # Force text ingest even if the file looks like JSON
168
+ hson -i text notes.json
169
+
170
+ - Styles on Text:
171
+ - default: omission as a standalone `…` line.
172
+ - detailed: omission as `… N more lines …`.
173
+ - strict: no array‑level omission line (individual long lines may still truncate with `…`).
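The three Text styles differ only in how the omission is reported. The Python sketch below shows the shapes, assuming for simplicity that the first `keep` lines survive; headson's real sampling may keep lines from elsewhere (see `--head`/`--tail`), and `preview_lines` is a hypothetical helper.

```python
from typing import List


def preview_lines(lines: List[str], keep: int, style: str = "default") -> List[str]:
    """Keep the first `keep` lines and report the rest in the given style."""
    if len(lines) <= keep:
        return list(lines)  # nothing omitted, so no marker in any style
    kept = lines[:keep]
    omitted = len(lines) - keep
    if style == "detailed":
        return kept + [f"… {omitted} more lines …"]
    if style == "default":
        return kept + ["…"]
    return kept  # "strict": no omission line at all


log = ["boot", "load config", "connect db", "serve", "shutdown"]
for style in ("strict", "default", "detailed"):
    print(style, preview_lines(log, 2, style))
```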
174
+
175
+ > **Note:** Filesets always render with per-file auto templates. When you need to preview a directory of mixed formats, skip `-f text` and let `-f auto` pick the right renderer for each entry.
176
+
177
+ Show help:
178
+
179
+ hson --help
180
+
181
+ Note: flags follow head/tail conventions where possible (`-c/--bytes`, `-n/--lines`); the uppercase forms (`-C/--global-bytes`, `-N/--global-lines`) set global budgets.
182
+
183
+ ## Examples: head vs hson
184
+
185
+ Input:
186
+
187
+ ```json
188
+ {"users":[{"id":1,"name":"Ana","roles":["admin","dev"]},{"id":2,"name":"Bo"}],"meta":{"count":2,"source":"db"}}
189
+ ```
190
+
191
+ Naive cut (can break mid‑token):
192
+
193
+ ```bash
194
+ jq -c . users.json | head -c 80
195
+ # {"users":[{"id":1,"name":"Ana","roles":["admin","dev"]},{"id":2,"name":"Bo"}],"m
196
+ ```
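The same failure is easy to reproduce in Python: serializing the input compactly (the same text `jq -c` prints, since both keep key order) and slicing at 80 characters yields text that no longer parses as JSON.

```python
import json

data = {
    "users": [{"id": 1, "name": "Ana", "roles": ["admin", "dev"]}, {"id": 2, "name": "Bo"}],
    "meta": {"count": 2, "source": "db"},
}
compact = json.dumps(data, separators=(",", ":"))  # same text as `jq -c .`
cut = compact[:80]  # ASCII-only input, so 80 characters == 80 bytes, like `head -c 80`

try:
    json.loads(cut)
    parses = True
except json.JSONDecodeError:
    parses = False

print(cut)
print(parses)  # False: the cut lands inside the "meta" key
```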
197
+
198
+ Structured preview with hson (JSON family, default style → Pseudo):
199
+
200
+ ```bash
201
+ hson -c 120 -f json -t default users.json
202
+ # {
203
+ # users: [
204
+ # { id: 1, name: "Ana", roles: [ "admin", … ] },
205
+ # …
206
+ # ]
207
+ # meta: { count: 2, … }
208
+ # }
209
+ ```
210
+
211
+ Machine‑readable preview (JSON family, strict style → strict JSON):
212
+
213
+ ```bash
214
+ hson -c 120 -f json -t strict users.json
215
+ # {"users":[{"id":1,"name":"Ana","roles":["admin"]}],"meta":{"count":2}}
216
+ ```
217
+
218
+ ## Terminal Demos
219
+
220
+ Regenerate locally:
221
+
222
+ - Place tapes under docs/tapes (e.g., docs/tapes/demo.tape)
223
+ - Run: cargo make tapes
224
+ - Outputs are written to docs/assets/tapes
225
+
226
+
227
+ ## Python Bindings
228
+
229
+ A thin Python extension module is available on PyPI as `headson`.
230
+
231
+ - Install: `pip install headson` (ABI3 wheels for Python 3.10+ on Linux/macOS/Windows).
232
+ - API:
233
+ - `headson.summarize(text: str, *, format: str = "auto", style: str = "default", input_format: str = "json", byte_budget: int | None = None, skew: str = "balanced") -> str`
234
+ - `format`: `"auto" | "json" | "yaml"` (auto maps to JSON family for single inputs)
235
+ - `style`: `"strict" | "default" | "detailed"`
236
+ - `input_format`: `"json" | "yaml"` (ingestion)
237
+ - `byte_budget`: maximum output size in bytes (default: 500)
238
+ - `skew`: `"balanced" | "head" | "tail"` (affects display styles; strict JSON remains unannotated)
239
+
240
+ Examples:
241
+
242
+ ```python
243
+ import json
244
+ import headson
245
+
246
+ data = {"foo": [1, 2, 3], "bar": {"x": "y"}}
247
+ preview = headson.summarize(json.dumps(data), format="json", style="strict", byte_budget=200)
248
+ print(preview)
249
+
250
+ # Prefer the tail of arrays (annotations show with style="default"/"detailed")
251
+ print(
252
+ headson.summarize(
253
+ json.dumps(list(range(100))),
254
+ format="json",
255
+ style="detailed",
256
+ byte_budget=80,
257
+ skew="tail",
258
+ )
259
+ )
260
+
261
+ # YAML support
262
+ doc = "root:\n items: [1,2,3,4,5,6,7,8,9,10]\n"
263
+ print(headson.summarize(doc, format="yaml", style="default", input_format="yaml", byte_budget=60))
264
+ ```
265
+
266
+ ## Source Code Support
267
+
268
+ Source code support is a challenging area. While `headson`'s algorithm and code structure would allow completely accurate parsing
269
+ with language-specific `tree-sitter` parsers, this would increase the complexity
270
+ of the application and its number of dependencies.
271
+
272
+ Instead of attempting a deep parse of source code files, we convert them into nested arrays based on a heuristic that
273
+ understands indentation patterns in the file.
274
+
275
+ When `headson` detects a code-like file, it uses a set of additional heuristics:
276
+ - **Atomic line ingest**: each line is treated as an atomic string so omission markers never split a code line.
277
+ - **Depth-aware sampling**:
278
+ - We try to include more of the top level of the source code to give a good overview of top-level classes, functions, and constants.
279
+ - In nested blocks (function bodies, loops), lines are preferably omitted from the middle, preserving the natural start and end of each block.
280
+ - **Header priority**: lines that introduce a nested block (e.g., `def foo():`) get a small priority boost to ensure they survive tight budgets.
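A minimal Python sketch of the indentation heuristic follows. Each non-blank line becomes an atomic string, and lines indented deeper than the line before them are collected into a nested array placed right after that header line. `nest_by_indent` and `keep_ends` are hypothetical helpers; the real heuristic may treat tabs, blank lines, and uneven dedents differently. `keep_ends` illustrates the "omit from the middle" preference for nested blocks.

```python
from typing import List, Tuple, Union

Node = Union[str, list]


def nest_by_indent(source: str) -> List[Node]:
    """Turn indented lines into nested arrays: a header line is followed by a list of its body."""
    root: List[Node] = []
    # Stack of (indentation of the lines stored in a list, that list).
    stack: List[Tuple[int, List[Node]]] = [(0, root)]
    for line in source.splitlines():
        if not line.strip():
            continue  # this sketch skips blank lines
        indent = len(line) - len(line.lstrip(" "))  # counts leading spaces only
        if indent > stack[-1][0]:
            # Deeper than the current block: open a nested array right after the header.
            child: List[Node] = []
            stack[-1][1].append(child)
            stack.append((indent, child))
        else:
            # Dedent: close blocks until we reach one at this depth or shallower.
            while len(stack) > 1 and indent < stack[-1][0]:
                stack.pop()
        stack[-1][1].append(line.strip())  # each line stays one atomic string
    return root


def keep_ends(block: List[Node], keep: int) -> List[Node]:
    """Omit items from the middle of a block, keeping its first and last lines."""
    if len(block) <= keep:
        return block
    head = (keep + 1) // 2
    tail = keep - head
    return block[:head] + ["…"] + block[len(block) - tail:]


code = "def foo():\n    x = 1\n    if x:\n        return x\nCONST = 2\n"
print(nest_by_indent(code))
# ['def foo():', ['x = 1', 'if x:', ['return x']], 'CONST = 2']
print(keep_ends(["a", "b", "c", "d", "e", "f", "g"], 3))  # ['a', 'b', '…', 'g']
```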
281
+
282
+ # Algorithm
283
+
284
+ ![Algorithm overview](https://raw.githubusercontent.com/kantord/headson/main/docs/assets/algorithm.svg)
285
+
286
+ ## Footnotes
287
+ - <sup><b>[1]</b></sup> <b>Optimized tree representation</b>: An arena‑style tree stored in flat, contiguous buffers. Each node records its kind and value plus index ranges into shared child and key arrays. Arrays are ingested in a single pass and may be deterministically pre‑sampled: the first element is always kept; additional elements are selected via a fixed per‑index inclusion test; for kept elements, original indices are stored and full lengths are counted. This enables accurate omission info and internal gap markers later, while minimizing pointer chasing.
288
+ - <sup><b>[2]</b></sup> <b>Priority order</b>: Nodes are scored so previews surface representative structure and values first. Arrays can favor head/mid/tail coverage (default) or strictly the head; tail preference flips head/tail when configured. Object properties are ordered by key, and strings expand by grapheme with early characters prioritized over very deep expansions.
289
+ - <sup><b>[3]</b></sup> <b>Choose top N nodes (binary search)</b>: Binary-searches for the largest N whose rendered preview fits within the output budget, alternating between “choose N” and a render attempt to converge quickly.
290
+ - <sup><b>[4]</b></sup> <b>Render attempt</b>: Serializes the currently included nodes using the selected template. Omission summaries and per-file section headers appear in display templates (pseudo/js); json remains strict. For arrays, display templates may insert internal gap markers between non‑contiguous kept items using original indices.
291
+ - <sup><b>[5]</b></sup> <b>Diagram source</b>: The Algorithm diagram is generated from `docs/diagrams/algorithm.mmd`. Regenerate the SVG with `cargo make diagrams` before releasing.
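The "choose top N nodes" step from footnote [3] can be sketched in Python as a binary search over N, assuming that including more nodes never makes the rendered output shorter. `largest_fitting_render` and the toy `render` (compact JSON of the first N items) are hypothetical stand-ins for headson's priority order and templates.

```python
import json
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def largest_fitting_render(
    nodes: Sequence[T], render: Callable[[Sequence[T]], str], byte_budget: int
) -> str:
    """Binary-search the largest N whose render(nodes[:N]) fits in byte_budget.

    Assumes output size grows monotonically with N. If even N = 0 does not fit,
    the empty render is returned anyway, echoing the "minimal preview" rule.
    """
    lo, hi = 0, len(nodes)  # candidate N values; nodes are already in priority order
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if len(render(nodes[:mid]).encode("utf-8")) <= byte_budget:
            lo = mid  # fits: try to include more
        else:
            hi = mid - 1  # too big: include fewer
    return render(nodes[:lo])


def render(items: Sequence[int]) -> str:
    return json.dumps(list(items), separators=(",", ":"))


print(largest_fitting_render(list(range(100)), render, 20))  # [0,1,2,3,4,5,6,7,8]
```

Each probe costs one render, so the search needs only about log2(len(nodes)) render attempts rather than one per candidate N.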
292
+
293
+ ## License
294
+
295
+ MIT
296
+
@@ -0,0 +1,6 @@
1
+ headson-0.8.0.dist-info/METADATA,sha256=4Xv_oF4owlMo3itZexZaeWfFEPT8ENg-d8IEQkeGflA,14792
2
+ headson-0.8.0.dist-info/WHEEL,sha256=X_Cce7mIV5wGTHIl0EkW0q8IpVi433Ij5ILty3Nzqfk,145
3
+ headson-0.8.0.dist-info/licenses/LICENSE,sha256=GZ9row3L2LsnOSbEuGMQZ0zKOIEd5tHr76cZHpg4KK8,1072
4
+ headson/__init__.py,sha256=18MQhYgSfuL6tr0UNF1pDNAaiH2cUkVtrZ_u9uNoM_Q,157
5
+ headson/headson.abi3.so,sha256=ix3RjB6lza3J4wio-qsEXHfpY8Toz1m196Aj63bQZWM,2027128
6
+ headson-0.8.0.dist-info/RECORD,,
@@ -0,0 +1,5 @@
1
+ Wheel-Version: 1.0
2
+ Generator: maturin (1.10.2)
3
+ Root-Is-Purelib: false
4
+ Tag: cp310-abi3-manylinux_2_17_x86_64
5
+ Tag: cp310-abi3-manylinux2014_x86_64
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Dániel Kántor
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.