PyPI - gcf-python - Versions diffs - 0.4.0__tar.gz → 0.5.1__tar.gz - Mend

gcf-python 0.4.0.tar.gz → 0.5.1.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (29)

{gcf_python-0.4.0 → gcf_python-0.5.1}/CHANGELOG.md RENAMED

@@ -1,5 +1,11 @@
 # Changelog
+## v0.5.0 (2026-06-06)
+- `GenericStreamEncoder`: zero-buffering tabular streaming encode (begin_array/write_row/end_array/write_kv/write_section/write_inline_array)
+- `decode_generic`: decode any GCF text (tabular or graph) back to Python objects
+- `StreamEncoder`: zero-buffering streaming encode (added in v0.4.0)
 ## v0.3.0 (2026-06-05)
 - `encode_generic`: primitive arrays inlined as `name[N]: val1,val2,val3`

{gcf_python-0.4.0 → gcf_python-0.5.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: gcf-python
-Version: 0.4.0
+Version: 0.5.1
 Summary: Python implementation of GCF (Graph Compact Format): token-optimized wire format for LLM tool responses
 Project-URL: Homepage, https://github.com/blackwell-systems/gcf-python
 Project-URL: Documentation, https://blackwell-systems.github.io/gcf/
@@ -30,9 +30,9 @@ Description-Content-Type: text/markdown
 # gcf-python
-Python implementation of [GCF (Graph Compact Format)](https://gcformat.com/) — the most token-efficient wire format for LLMs. A drop-in alternative to JSON and TOON for any structured data.
+Python implementation of [GCF](https://gcformat.com/) — the most token-efficient wire format for LLMs. A drop-in alternative to JSON and TOON for any structured data.
-**79% fewer input tokens than JSON. 75% fewer output tokens. 52% smaller than TOON. 100% LLM comprehension at 500 symbols, where JSON scores 76.9% and TOON scores 92.3%.**
+**79% fewer input tokens than JSON. 63% fewer output tokens. 90.5% average comprehension accuracy across 10 models and 3 providers (four models hit 100%). 1,300+ LLM evaluations. Zero training.**
 Docs: [gcformat.com](https://gcformat.com/) · [Playground](https://gcformat.com/playground.html) · [GCF vs TOON](https://gcformat.com/guide/vs-toon.html)
@@ -66,33 +66,21 @@ Payload: 50 symbols, 20 edges
 ### Quick Start
 ```python
-from gcf import encode, Payload, Symbol, Edge
+from gcf import encode_generic
-p = Payload(
-    tool="context_for_task",
-    token_budget=5000,
-    tokens_used=1847,
-    symbols=[
-        Symbol(qualified_name="pkg.AuthMiddleware", kind="function", score=0.78, provenance="lsp_resolved", distance=0),
-        Symbol(qualified_name="pkg.NewServer", kind="function", score=0.54, provenance="lsp_resolved", distance=1),
-    ],
-    edges=[
-        Edge(source="pkg.NewServer", target="pkg.AuthMiddleware", edge_type="calls"),
+output = encode_generic({
+    "employees": [
+        {"id": 1, "name": "Alice", "department": "Engineering", "salary": 95000},
+        {"id": 2, "name": "Bob", "department": "Sales", "salary": 72000},
     ],
-)
-output = encode(p)
+})
 ```
 Output:
 ```
-GCF tool=context_for_task budget=5000 tokens=1847 symbols=2 edges=1
-## targets
-@0 fn pkg.AuthMiddleware 0.78 lsp_resolved
-## related
-@1 fn pkg.NewServer 0.54 lsp_resolved
-## edges [1]
-@0<@1 calls
+## employees [2]{id,name,department,salary}
+1|Alice|Engineering|95000
+2|Bob|Sales|72000
 ```
 ## Decode
@@ -216,33 +204,18 @@ Works on dicts, lists, and primitives. Lists of uniform dicts get tabular rows.
 | `Session` | Thread-safe tracker for multi-call deduplication |
 | `KIND_ABBREV` / `KIND_EXPAND` | Bidirectional kind abbreviation dicts |
-## Comprehension Eval
-Rigorous 3-way benchmark (GCF vs TOON vs JSON) at 500 symbols, 200 edges. 13 structured extraction questions sent to an LLM with zero format instructions:
-| Format | Accuracy | Tokens | vs JSON |
-|--------|----------|--------|---------|
-| **GCF** | **100%** (13/13) | **11,090** | **79% fewer** |
-| TOON | 92.3% (12/13) | 16,378 | 69% fewer |
-| JSON | 76.9% (10/13) | 53,341 | baseline |
-GCF is the only format with perfect accuracy at scale, at 32% fewer tokens than TOON.
-Reproduce: `git clone https://github.com/blackwell-systems/gcf-go && cd gcf-go/eval && GOWORK=off go test -run TestComprehension -v -timeout 0`
-## Token Efficiency (TOON's Own Benchmark)
-Running [TOON's benchmark harness](https://github.com/blackwell-systems/toon/tree/gcf-comparison) with GCF inserted (their datasets, their tokenizer):
+## Benchmarks
-| Track | GCF | TOON | Result |
-|-------|-----|------|--------|
-| Mixed-structure (nested, semi-uniform) | 170,367 | 227,896 | **GCF 34% smaller** |
-| Flat-only (tabular) | 66,029 | 67,837 | **GCF 3% smaller** |
-| Semi-uniform event logs | 108,158 | 154,032 | **GCF 42% smaller** |
+1,300+ LLM evaluations across 10 models, 3 providers, and 51 independent test runs.
-GCF wins all 6 datasets. On semi-uniform data (the most common real-world pattern), GCF uses 42% fewer tokens than TOON.
+| | GCF | TOON | JSON |
+|---|---|---|---|
+| **Comprehension** (23 runs, 10 models) | **90.5%** | 68.5% | 53.6% |
+| **Generation** (28 runs, 9 models) | **5/5** | 1.0/5 | 5.0/5 |
+| **Input tokens** (500 symbols) | **11,090** | 16,378 | 53,341 |
+| **Output tokens** (100 symbols) | **5,976** | 8,937 | 16,121 |
-Reproduce: `git clone https://github.com/blackwell-systems/toon && cd toon && git checkout gcf-comparison && cd benchmarks && pnpm install && pnpm benchmark:tokens`
+GCF wins all 6 datasets on [TOON's own benchmark](https://github.com/blackwell-systems/toon/tree/gcf-comparison). Full results: [gcformat.com/guide/benchmarks](https://gcformat.com/guide/benchmarks.html)
 ## Links
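The Quick Start output in this hunk maps a uniform list of dicts to a `## name [N]{fields}` header followed by one pipe-separated row per dict, with values in header order. A minimal sketch of that mapping, assuming every dict has the same keys as the first and that every value is a plain int or a string with no `|` or newline (so no quoting is needed). `to_tabular` is a hypothetical helper for illustration, not the library's `encode_generic`:

```python
from typing import Any


def to_tabular(name: str, rows: list[dict[str, Any]]) -> str:
    """Hypothetical helper: render a uniform list of dicts in the tabular
    layout shown in the Quick Start output.

    Assumes every row has the same keys as the first row and that values
    are plain ints or strings containing no '|' or newline.
    """
    fields = list(rows[0].keys()) if rows else []
    # Header: section name, row count, and the shared field list.
    lines = [f"## {name} [{len(rows)}]{{{','.join(fields)}}}"]
    for row in rows:
        # One line per dict, values in header order, pipe-separated.
        lines.append("|".join(str(row[f]) for f in fields))
    return "\n".join(lines)


employees = [
    {"id": 1, "name": "Alice", "department": "Engineering", "salary": 95000},
    {"id": 2, "name": "Bob", "department": "Sales", "salary": 72000},
]
print(to_tabular("employees", employees))
```

The field names appear once in the header instead of once per row, which is where the tabular layout saves tokens relative to a JSON array of objects.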

{gcf_python-0.4.0 → gcf_python-0.5.1}/README.md RENAMED

@@ -5,9 +5,9 @@
 # gcf-python
-Python implementation of [GCF (Graph Compact Format)](https://gcformat.com/) — the most token-efficient wire format for LLMs. A drop-in alternative to JSON and TOON for any structured data.
+Python implementation of [GCF](https://gcformat.com/) — the most token-efficient wire format for LLMs. A drop-in alternative to JSON and TOON for any structured data.
-**79% fewer input tokens than JSON. 75% fewer output tokens. 52% smaller than TOON. 100% LLM comprehension at 500 symbols, where JSON scores 76.9% and TOON scores 92.3%.**
+**79% fewer input tokens than JSON. 63% fewer output tokens. 90.5% average comprehension accuracy across 10 models and 3 providers (four models hit 100%). 1,300+ LLM evaluations. Zero training.**
 Docs: [gcformat.com](https://gcformat.com/) · [Playground](https://gcformat.com/playground.html) · [GCF vs TOON](https://gcformat.com/guide/vs-toon.html)
@@ -41,33 +41,21 @@ Payload: 50 symbols, 20 edges
 ### Quick Start
 ```python
-from gcf import encode, Payload, Symbol, Edge
+from gcf import encode_generic
-p = Payload(
-    tool="context_for_task",
-    token_budget=5000,
-    tokens_used=1847,
-    symbols=[
-        Symbol(qualified_name="pkg.AuthMiddleware", kind="function", score=0.78, provenance="lsp_resolved", distance=0),
-        Symbol(qualified_name="pkg.NewServer", kind="function", score=0.54, provenance="lsp_resolved", distance=1),
-    ],
-    edges=[
-        Edge(source="pkg.NewServer", target="pkg.AuthMiddleware", edge_type="calls"),
+output = encode_generic({
+    "employees": [
+        {"id": 1, "name": "Alice", "department": "Engineering", "salary": 95000},
+        {"id": 2, "name": "Bob", "department": "Sales", "salary": 72000},
     ],
-)
-output = encode(p)
+})
 ```
 Output:
 ```
-GCF tool=context_for_task budget=5000 tokens=1847 symbols=2 edges=1
-## targets
-@0 fn pkg.AuthMiddleware 0.78 lsp_resolved
-## related
-@1 fn pkg.NewServer 0.54 lsp_resolved
-## edges [1]
-@0<@1 calls
+## employees [2]{id,name,department,salary}
+1|Alice|Engineering|95000
+2|Bob|Sales|72000
 ```
 ## Decode
@@ -191,33 +179,18 @@ Works on dicts, lists, and primitives. Lists of uniform dicts get tabular rows.
 | `Session` | Thread-safe tracker for multi-call deduplication |
 | `KIND_ABBREV` / `KIND_EXPAND` | Bidirectional kind abbreviation dicts |
-## Comprehension Eval
-Rigorous 3-way benchmark (GCF vs TOON vs JSON) at 500 symbols, 200 edges. 13 structured extraction questions sent to an LLM with zero format instructions:
-| Format | Accuracy | Tokens | vs JSON |
-|--------|----------|--------|---------|
-| **GCF** | **100%** (13/13) | **11,090** | **79% fewer** |
-| TOON | 92.3% (12/13) | 16,378 | 69% fewer |
-| JSON | 76.9% (10/13) | 53,341 | baseline |
-GCF is the only format with perfect accuracy at scale, at 32% fewer tokens than TOON.
-Reproduce: `git clone https://github.com/blackwell-systems/gcf-go && cd gcf-go/eval && GOWORK=off go test -run TestComprehension -v -timeout 0`
-## Token Efficiency (TOON's Own Benchmark)
-Running [TOON's benchmark harness](https://github.com/blackwell-systems/toon/tree/gcf-comparison) with GCF inserted (their datasets, their tokenizer):
+## Benchmarks
-| Track | GCF | TOON | Result |
-|-------|-----|------|--------|
-| Mixed-structure (nested, semi-uniform) | 170,367 | 227,896 | **GCF 34% smaller** |
-| Flat-only (tabular) | 66,029 | 67,837 | **GCF 3% smaller** |
-| Semi-uniform event logs | 108,158 | 154,032 | **GCF 42% smaller** |
+1,300+ LLM evaluations across 10 models, 3 providers, and 51 independent test runs.
-GCF wins all 6 datasets. On semi-uniform data (the most common real-world pattern), GCF uses 42% fewer tokens than TOON.
+| | GCF | TOON | JSON |
+|---|---|---|---|
+| **Comprehension** (23 runs, 10 models) | **90.5%** | 68.5% | 53.6% |
+| **Generation** (28 runs, 9 models) | **5/5** | 1.0/5 | 5.0/5 |
+| **Input tokens** (500 symbols) | **11,090** | 16,378 | 53,341 |
+| **Output tokens** (100 symbols) | **5,976** | 8,937 | 16,121 |
-Reproduce: `git clone https://github.com/blackwell-systems/toon && cd toon && git checkout gcf-comparison && cd benchmarks && pnpm install && pnpm benchmark:tokens`
+GCF wins all 6 datasets on [TOON's own benchmark](https://github.com/blackwell-systems/toon/tree/gcf-comparison). Full results: [gcformat.com/guide/benchmarks](https://gcformat.com/guide/benchmarks.html)
 ## Links
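The headline percentages can be checked against the Benchmarks table, where a reduction is `1 - gcf_tokens / baseline_tokens`. A short worked calculation that uses only the token counts from that table:

```python
def reduction(gcf_tokens: int, baseline_tokens: int) -> float:
    """Fractional token reduction of GCF relative to a baseline format."""
    return 1 - gcf_tokens / baseline_tokens


# Token counts copied from the Benchmarks table.
input_vs_json = reduction(11_090, 53_341)   # input tokens, 500 symbols
output_vs_json = reduction(5_976, 16_121)   # output tokens, 100 symbols
input_vs_toon = reduction(11_090, 16_378)
output_vs_toon = reduction(5_976, 8_937)

for label, value in [
    ("input vs JSON", input_vs_json),
    ("output vs JSON", output_vs_json),
    ("input vs TOON", input_vs_toon),
    ("output vs TOON", output_vs_toon),
]:
    print(f"{label}: {value:.1%} fewer tokens")
```

Rounded to whole percentages, the JSON comparisons give the 79% (input) and 63% (output) figures in the summary line; the unrounded values are about 79.2% and 62.9%. Against TOON, the same counts give roughly 32% fewer input tokens and 33% fewer output tokens.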

{gcf_python-0.4.0 → gcf_python-0.5.1}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "gcf-python"
-version = "0.4.0"
+version = "0.5.1"
 description = "Python implementation of GCF (Graph Compact Format): token-optimized wire format for LLM tool responses"
 readme = "README.md"
 license = {text = "MIT"}

{gcf_python-0.4.0 → gcf_python-0.5.1}/src/gcf/__init__.py RENAMED

@@ -40,13 +40,16 @@ from .delta import encode_delta
 from .encode import encode
 from .generic import encode_generic
 from .session import Session, encode_with_session
+from .decode_generic import decode_generic
 from .stream import StreamEncoder
+from .stream_generic import GenericStreamEncoder
 from .types import Components, DeltaPayload, Edge, Payload, Symbol
 __all__ = [
     "Components",
     "DecodeError",
     "DeltaPayload",
+    "GenericStreamEncoder",
     "Edge",
     "KIND_ABBREV",
     "KIND_EXPAND",
@@ -55,6 +58,7 @@ __all__ = [
     "StreamEncoder",
     "Symbol",
     "decode",
+    "decode_generic",
     "encode",
     "encode_delta",
     "encode_generic",

gcf_python-0.5.1/src/gcf/decode_generic.py ADDED

@@ -0,0 +1,255 @@
+"""GCF generic decoder: parses any GCF text (tabular or graph) back to Python objects."""
+from __future__ import annotations
+from typing import Any
+from .decode import decode
+def decode_generic(input_text: str) -> Any:
+    """Decode any GCF text back into Python objects.
+    Handles tabular arrays, key-value pairs, nested sections, inline
+    primitive arrays, and graph profile payloads.
+    Returns dicts, lists, and primitives matching the original structure.
+    """
+    input_text = input_text.rstrip("\n\r")
+    if not input_text:
+        return None
+    lines = input_text.split("\n")
+    # Graph profile fallback.
+    if lines[0].startswith("GCF "):
+        p = decode(input_text)
+        return {
+            "tool": p.tool,
+            "tokenBudget": p.token_budget,
+            "tokensUsed": p.tokens_used,
+            "packRoot": p.pack_root,
+            "symbols": [
+                {
+                    "qualifiedName": s.qualified_name,
+                    "kind": s.kind,
+                    "score": s.score,
+                    "provenance": s.provenance,
+                    "distance": s.distance,
+                }
+                for s in p.symbols
+            ],
+            "edges": [
+                {
+                    "source": e.source,
+                    "target": e.target,
+                    "edgeType": e.edge_type,
+                    **({"status": e.status} if e.status else {}),
+                }
+                for e in p.edges
+            ],
+        }
+    result: dict[str, Any] = {}
+    _parse_object(lines, 0, 0, result)
+    return result
+def _parse_object(lines: list[str], start: int, depth: int, out: dict[str, Any]) -> int:
+    indent = "  " * depth
+    i = start
+    while i < len(lines):
+        raw = lines[i].rstrip("\r")
+        if raw == "" or raw.startswith("# "):
+            i += 1
+            continue
+        if depth > 0 and not raw.startswith(indent):
+            break
+        content = raw[len(indent):] if depth > 0 else raw
+        if content.startswith("## _summary"):
+            i += 1
+            continue
+        if content.startswith("## "):
+            header = content[3:]
+            bracket_idx = header.find(" [")
+            if bracket_idx >= 0:
+                name = header[:bracket_idx]
+                rest = header[bracket_idx + 2:]
+                close_bracket = rest.find("]")
+                if close_bracket >= 0:
+                    after_bracket = rest[close_bracket + 1:]
+                    if after_bracket.startswith("{"):
+                        field_end = after_bracket.find("}")
+                        if field_end >= 0:
+                            fields = after_bracket[1:field_end].split(",")
+                            i += 1
+                            rows, consumed = _parse_tabular_rows(lines, i, depth, fields)
+                            out[name] = rows
+                            i += consumed
+                            continue
+                    else:
+                        count_str = rest[:close_bracket]
+                        if count_str == "0":
+                            out[name] = []
+                            i += 1
+                            continue
+                        i += 1
+                        items, consumed = _parse_non_uniform_array(lines, i, depth)
+                        out[name] = items
+                        i += consumed
+                        continue
+            name = header
+            bi = name.find(" [")
+            if bi >= 0:
+                name = name[:bi]
+            i += 1
+            nested: dict[str, Any] = {}
+            consumed = _parse_object(lines, i, depth + 1, nested)
+            out[name] = nested
+            i += consumed
+            continue
+        # Inline primitive array.
+        bracket_idx = content.find("[")
+        if bracket_idx > 0:
+            colon_idx = content.find("]: ")
+            if colon_idx > bracket_idx:
+                name = content[:bracket_idx]
+                vals_str = content[colon_idx + 3:]
+                out[name] = [_parse_value(v.strip()) for v in vals_str.split(",")]
+                i += 1
+                continue
+        # Key=value.
+        eq_idx = content.find("=")
+        if eq_idx > 0:
+            key = content[:eq_idx]
+            val = content[eq_idx + 1:]
+            out[key] = _parse_value(val)
+            i += 1
+            continue
+        i += 1
+    return i - start
+def _parse_tabular_rows(
+    lines: list[str], start: int, depth: int, fields: list[str]
+) -> tuple[list[Any], int]:
+    indent = "  " * depth
+    rows: list[Any] = []
+    i = start
+    while i < len(lines):
+        raw = lines[i].rstrip("\r")
+        if raw == "":
+            i += 1
+            continue
+        if depth > 0 and not raw.startswith(indent):
+            break
+        content = raw[len(indent):] if depth > 0 else raw
+        if content.startswith("## "):
+            break
+        if content.startswith("# "):
+            i += 1
+            continue
+        row_data = content
+        has_nested = False
+        if row_data.startswith("@"):
+            sp = row_data.find(" ")
+            if sp > 0:
+                row_data = row_data[sp + 1:]
+                has_nested = True
+        vals = row_data.split("|")
+        row: dict[str, Any] = {}
+        for j, f in enumerate(fields):
+            row[f] = _parse_value(vals[j]) if j < len(vals) else None
+        i += 1
+        if has_nested:
+            nested_indent = indent + "  "
+            while i < len(lines):
+                nl = lines[i].rstrip("\r")
+                if not nl.startswith(nested_indent):
+                    break
+                nc = nl[len(nested_indent):]
+                if nc.startswith("."):
+                    field_name = nc[1:]
+                    i += 1
+                    nested: dict[str, Any] = {}
+                    consumed = _parse_object(lines, i, depth + 2, nested)
+                    row[field_name] = nested
+                    i += consumed
+                else:
+                    break
+        rows.append(row)
+    return rows, i - start
+def _parse_non_uniform_array(
+    lines: list[str], start: int, depth: int
+) -> tuple[list[Any], int]:
+    indent = "  " * depth
+    items: list[Any] = []
+    i = start
+    while i < len(lines):
+        raw = lines[i].rstrip("\r")
+        if raw == "":
+            i += 1
+            continue
+        if depth > 0 and not raw.startswith(indent):
+            break
+        content = raw[len(indent):] if depth > 0 else raw
+        if content.startswith("## "):
+            break
+        if content.startswith("@"):
+            sp = content.find(" ")
+            if sp > 0:
+                items.append(_parse_value(content[sp + 1:]))
+            i += 1
+        else:
+            break
+    return items, i - start
+def _parse_value(s: str) -> Any:
+    if s == "-":
+        return None
+    if s == "true":
+        return True
+    if s == "false":
+        return False
+    if s == '""':
+        return ""
+    if len(s) >= 2 and s[0] == '"' and s[-1] == '"':
+        return s[1:-1].replace('\\"', '"').replace("\\\\", "\\")
+    try:
+        return int(s)
+    except ValueError:
+        pass
+    try:
+        return float(s)
+    except ValueError:
+        pass
+    return s
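`_parse_value` infers each cell's type from its bare text, while the encoder's `_format_value` (in `stream_generic.py`) quotes a string only when it is empty or contains `|` or a newline. The sketch below copies the non-float rules of both functions from this diff into standalone `format_value` / `parse_value` helpers (the names without underscores are for illustration only) and round-trips a few single values through them:

```python
from typing import Any


def format_value(v: Any) -> str:
    """Copy of _format_value from stream_generic.py, with the float branch omitted."""
    if v is None:
        return "-"
    if isinstance(v, bool):
        return "true" if v else "false"
    if isinstance(v, str):
        if v == "":
            return '""'
        if "|" in v or "\n" in v:
            return '"' + v.replace('"', '\\"') + '"'
        return v
    return str(v)


def parse_value(s: str) -> Any:
    """Copy of _parse_value from decode_generic.py."""
    if s == "-":
        return None
    if s == "true":
        return True
    if s == "false":
        return False
    if s == '""':
        return ""
    if len(s) >= 2 and s[0] == '"' and s[-1] == '"':
        return s[1:-1].replace('\\"', '"').replace("\\\\", "\\")
    try:
        return int(s)
    except ValueError:
        pass
    try:
        return float(s)
    except ValueError:
        pass
    return s


# Round-trip single values through the encode and decode rules.
for original in [42, None, True, "", "has|pipe", "Alice", "42", "-", "true"]:
    decoded = parse_value(format_value(original))
    print(f"{original!r:>12} -> {decoded!r}")
```

With these rules, the strings `"42"`, `"-"`, and `"true"` come back as `42`, `None`, and `True`, because they are written unquoted. The sketch only exercises single values: as written, `_parse_tabular_rows` splits each row on every `|` before calling `_parse_value`, so a quoted cell containing `|` is split during a full row decode.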

gcf_python-0.5.1/src/gcf/stream_generic.py ADDED

@@ -0,0 +1,111 @@
+"""GCF generic streaming encoder: zero-buffering tabular encode to any writable."""
+from __future__ import annotations
+import threading
+from typing import Any, Sequence
+class GenericStreamEncoder:
+    """Writes GCF tabular output incrementally as rows arrive.
+    Zero buffering: each row is written immediately. A trailer summary is
+    emitted on close() with the final counts.
+    Example::
+        enc = GenericStreamEncoder(sys.stdout)
+        enc.begin_array("employees", ["id", "name", "department", "salary"])
+        enc.write_row([1, "Alice", "Engineering", 95000])
+        enc.write_row([2, "Bob", "Sales", 72000])
+        enc.end_array()
+        enc.close()
+    """
+    def __init__(self, writer: Any) -> None:
+        self._w = writer
+        self._lock = threading.Lock()
+        self._sections: list[tuple[str, int]] = []
+        self._current: dict[str, Any] | None = None
+    def begin_array(self, name: str, fields: Sequence[str]) -> None:
+        """Start a tabular array section with deferred count [?]."""
+        with self._lock:
+            if self._current is not None:
+                self._end_array_locked()
+            self._w.write(f"## {name} [?]{{{','.join(fields)}}}\n")
+            self._current = {"name": name, "fields": list(fields), "count": 0}
+    def write_row(self, values: Sequence[Any]) -> None:
+        """Emit a single pipe-separated row immediately."""
+        with self._lock:
+            if self._current is None:
+                return
+            parts = [_format_value(v) for v in values]
+            self._w.write("|".join(parts) + "\n")
+            self._current["count"] += 1
+    def end_array(self) -> None:
+        """Close the current array section and record its count."""
+        with self._lock:
+            self._end_array_locked()
+    def write_kv(self, key: str, value: Any) -> None:
+        """Emit a key=value line immediately."""
+        with self._lock:
+            self._w.write(f"{key}={_format_value(value)}\n")
+    def write_section(self, name: str) -> None:
+        """Start a nested object section (## key)."""
+        with self._lock:
+            if self._current is not None:
+                self._end_array_locked()
+            self._w.write(f"## {name}\n")
+    def write_inline_array(self, name: str, values: Sequence[Any]) -> None:
+        """Emit a primitive array inline: name[N]: val1,val2,val3"""
+        with self._lock:
+            parts = [_format_value(v) for v in values]
+            self._w.write(f"{name}[{len(values)}]: {','.join(parts)}\n")
+    def close(self) -> None:
+        """Emit the ## _summary trailer with final counts."""
+        with self._lock:
+            if self._current is not None:
+                self._end_array_locked()
+            if not self._sections:
+                return
+            total_rows = 0
+            section_parts: list[str] = []
+            for name, count in self._sections:
+                section_parts.append(f"{name}:{count}")
+                total_rows += count
+            self._w.write(
+                f"## _summary rows={total_rows} sections={','.join(section_parts)}\n"
+            )
+    def _end_array_locked(self) -> None:
+        if self._current is None:
+            return
+        self._sections.append((self._current["name"], self._current["count"]))
+        self._current = None
+def _format_value(v: Any) -> str:
+    if v is None:
+        return "-"
+    if isinstance(v, bool):
+        return "true" if v else "false"
+    if isinstance(v, int):
+        return str(v)
+    if isinstance(v, float):
+        # Match Go's %g formatting
+        s = f"{v:g}"
+        return s
+    if isinstance(v, str):
+        if v == "":
+            return '""'
+        if "|" in v or "\n" in v:
+            return '"' + v.replace('"', '\\"') + '"'
+        return v
+    return str(v)

gcf_python-0.5.1/tests/test_stream_generic.py ADDED

@@ -0,0 +1,126 @@
+"""Tests for the GenericStreamEncoder."""
+import io
+from gcf import GenericStreamEncoder
+def test_tabular():
+    buf = io.StringIO()
+    enc = GenericStreamEncoder(buf)
+    enc.begin_array("employees", ["id", "name", "department", "salary"])
+    enc.write_row([1, "Alice", "Engineering", 95000])
+    enc.write_row([2, "Bob", "Sales", 72000])
+    enc.write_row([3, "Carol", "Marketing", 85000])
+    enc.end_array()
+    enc.close()
+    out = buf.getvalue()
+    assert "## employees [?]{id,name,department,salary}" in out
+    assert "1|Alice|Engineering|95000" in out
+    assert "## _summary rows=3 sections=employees:3" in out
+def test_kv_and_inline_array():
+    buf = io.StringIO()
+    enc = GenericStreamEncoder(buf)
+    enc.write_kv("name", "my-service")
+    enc.write_kv("version", "2.1.0")
+    enc.write_inline_array("tags", ["production", "us-east-1", "critical"])
+    enc.close()
+    out = buf.getvalue()
+    assert "name=my-service" in out
+    assert "tags[3]: production,us-east-1,critical" in out
+def test_incremental():
+    buf = io.StringIO()
+    enc = GenericStreamEncoder(buf)
+    enc.begin_array("data", ["id", "val"])
+    assert len(buf.getvalue()) > 0, "header should be written immediately"
+    header_len = len(buf.getvalue())
+    enc.write_row([1, "a"])
+    assert len(buf.getvalue()) > header_len, "row should be written immediately"
+    enc.end_array()
+    enc.close()
+def test_multiple_arrays():
+    buf = io.StringIO()
+    enc = GenericStreamEncoder(buf)
+    enc.begin_array("users", ["id", "name"])
+    enc.write_row([1, "Alice"])
+    enc.write_row([2, "Bob"])
+    enc.end_array()
+    enc.begin_array("roles", ["name", "level"])
+    enc.write_row(["admin", 10])
+    enc.end_array()
+    enc.close()
+    out = buf.getvalue()
+    assert "sections=users:2,roles:1" in out
+def test_null_and_bool():
+    buf = io.StringIO()
+    enc = GenericStreamEncoder(buf)
+    enc.begin_array("data", ["a", "b", "c"])
+    enc.write_row([None, True, False])
+    enc.end_array()
+    enc.close()
+    out = buf.getvalue()
+    assert "-|true|false" in out
+def test_empty_string_and_pipe():
+    buf = io.StringIO()
+    enc = GenericStreamEncoder(buf)
+    enc.begin_array("data", ["a", "b"])
+    enc.write_row(["", "has|pipe"])
+    enc.end_array()
+    enc.close()
+    out = buf.getvalue()
+    assert '""|"has|pipe"' in out
+def test_auto_close_on_begin_array():
+    buf = io.StringIO()
+    enc = GenericStreamEncoder(buf)
+    enc.begin_array("first", ["a"])
+    enc.write_row([1])
+    enc.begin_array("second", ["b"])
+    enc.write_row([2])
+    enc.end_array()
+    enc.close()
+    out = buf.getvalue()
+    assert "sections=first:1,second:1" in out
+def test_write_section():
+    buf = io.StringIO()
+    enc = GenericStreamEncoder(buf)
+    enc.begin_array("items", ["id"])
+    enc.write_row([1])
+    enc.write_section("metadata")
+    enc.write_kv("count", 1)
+    enc.close()
+    out = buf.getvalue()
+    assert "## metadata" in out
+    assert "## _summary rows=1 sections=items:1" in out
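These tests fix the trailer format as `## _summary rows=N sections=name:count,...`. Because `begin_array` writes a deferred `[?]` count and `decode_generic` skips `## _summary` lines, a consumer that wants the final counts has to read the trailer itself. A minimal sketch of such a reader follows. `parse_summary` is a hypothetical helper, and it assumes section names contain no spaces or commas:

```python
from __future__ import annotations


def parse_summary(line: str) -> tuple[int, dict[str, int]]:
    """Hypothetical helper: parse a '## _summary' trailer line into
    (total_rows, {section_name: row_count})."""
    prefix = "## _summary "
    if not line.startswith(prefix):
        raise ValueError(f"not a summary trailer: {line!r}")
    # Space-separated key=value pairs, e.g. "rows=3 sections=employees:3".
    fields = dict(part.split("=", 1) for part in line[len(prefix):].split())
    sections: dict[str, int] = {}
    for item in fields.get("sections", "").split(","):
        if item:
            # rsplit keeps any ':' inside the section name intact.
            name, count = item.rsplit(":", 1)
            sections[name] = int(count)
    total = int(fields["rows"])
    # close() computes rows= as the sum of the per-section counts.
    if total != sum(sections.values()):
        raise ValueError("rows= does not match the per-section counts")
    return total, sections


total, sections = parse_summary("## _summary rows=3 sections=users:2,roles:1")
print(total, sections)
```

Checking `rows=` against the per-section counts catches a truncated or hand-edited trailer. This works because `close()` derives both values from the same `_sections` list.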