@clickhouse/client 1.22.0 → 1.23.0-head.70ad405.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +1342 -0
- package/README.md +20 -7
- package/dist/client.d.ts +2 -2
- package/dist/client.js +11 -4
- package/dist/client.js.map +1 -1
- package/dist/common/clickhouse_types.d.ts +98 -0
- package/dist/common/clickhouse_types.js +30 -0
- package/dist/common/clickhouse_types.js.map +1 -0
- package/dist/common/client.d.ts +233 -0
- package/dist/common/client.js +414 -0
- package/dist/common/client.js.map +1 -0
- package/dist/common/config.d.ts +234 -0
- package/dist/common/config.js +364 -0
- package/dist/common/config.js.map +1 -0
- package/dist/common/connection.d.ts +124 -0
- package/dist/common/connection.js +3 -0
- package/dist/common/connection.js.map +1 -0
- package/dist/common/data_formatter/format_query_params.d.ts +11 -0
- package/dist/common/data_formatter/format_query_params.js +128 -0
- package/dist/common/data_formatter/format_query_params.js.map +1 -0
- package/dist/common/data_formatter/format_query_settings.d.ts +2 -0
- package/dist/common/data_formatter/format_query_settings.js +20 -0
- package/dist/common/data_formatter/format_query_settings.js.map +1 -0
- package/dist/common/data_formatter/formatter.d.ts +41 -0
- package/dist/common/data_formatter/formatter.js +78 -0
- package/dist/common/data_formatter/formatter.js.map +1 -0
- package/dist/common/data_formatter/index.d.ts +3 -0
- package/dist/common/data_formatter/index.js +24 -0
- package/dist/common/data_formatter/index.js.map +1 -0
- package/dist/common/error/error.d.ts +20 -0
- package/dist/common/error/error.js +73 -0
- package/dist/common/error/error.js.map +1 -0
- package/dist/common/error/index.d.ts +1 -0
- package/dist/common/error/index.js +18 -0
- package/dist/common/error/index.js.map +1 -0
- package/dist/common/index.d.ts +67 -0
- package/dist/common/index.js +97 -0
- package/dist/common/index.js.map +1 -0
- package/dist/common/logger.d.ts +80 -0
- package/dist/common/logger.js +154 -0
- package/dist/common/logger.js.map +1 -0
- package/dist/common/parse/column_types.d.ts +155 -0
- package/dist/common/parse/column_types.js +594 -0
- package/dist/common/parse/column_types.js.map +1 -0
- package/dist/common/parse/index.d.ts +2 -0
- package/dist/common/parse/index.js +19 -0
- package/dist/common/parse/index.js.map +1 -0
- package/dist/common/parse/json_handling.d.ts +19 -0
- package/dist/common/parse/json_handling.js +8 -0
- package/dist/common/parse/json_handling.js.map +1 -0
- package/dist/common/result.d.ts +90 -0
- package/dist/common/result.js +3 -0
- package/dist/common/result.js.map +1 -0
- package/dist/common/settings.d.ts +2007 -0
- package/dist/common/settings.js +19 -0
- package/dist/common/settings.js.map +1 -0
- package/dist/common/tracing.d.ts +146 -0
- package/dist/common/tracing.js +76 -0
- package/dist/common/tracing.js.map +1 -0
- package/dist/common/ts_utils.d.ts +4 -0
- package/dist/common/ts_utils.js +3 -0
- package/dist/common/ts_utils.js.map +1 -0
- package/dist/common/utils/connection.d.ts +21 -0
- package/dist/common/utils/connection.js +43 -0
- package/dist/common/utils/connection.js.map +1 -0
- package/dist/common/utils/index.d.ts +5 -0
- package/dist/common/utils/index.js +22 -0
- package/dist/common/utils/index.js.map +1 -0
- package/dist/common/utils/multipart.d.ts +34 -0
- package/dist/common/utils/multipart.js +81 -0
- package/dist/common/utils/multipart.js.map +1 -0
- package/dist/common/utils/sleep.d.ts +4 -0
- package/dist/common/utils/sleep.js +12 -0
- package/dist/common/utils/sleep.js.map +1 -0
- package/dist/common/utils/stream.d.ts +15 -0
- package/dist/common/utils/stream.js +50 -0
- package/dist/common/utils/stream.js.map +1 -0
- package/dist/common/utils/url.d.ts +20 -0
- package/dist/common/utils/url.js +67 -0
- package/dist/common/utils/url.js.map +1 -0
- package/dist/common/version.d.ts +2 -0
- package/dist/common/version.js +4 -0
- package/dist/common/version.js.map +1 -0
- package/dist/config.d.ts +22 -2
- package/dist/config.js +2 -2
- package/dist/config.js.map +1 -1
- package/dist/connection/compression.d.ts +2 -2
- package/dist/connection/compression.js +4 -4
- package/dist/connection/compression.js.map +1 -1
- package/dist/connection/create_connection.d.ts +1 -1
- package/dist/connection/node_base_connection.d.ts +3 -3
- package/dist/connection/node_base_connection.js +22 -22
- package/dist/connection/node_base_connection.js.map +1 -1
- package/dist/connection/node_custom_agent_connection.js +2 -2
- package/dist/connection/node_custom_agent_connection.js.map +1 -1
- package/dist/connection/node_http_connection.js +2 -2
- package/dist/connection/node_http_connection.js.map +1 -1
- package/dist/connection/node_https_connection.d.ts +1 -1
- package/dist/connection/node_https_connection.js +3 -3
- package/dist/connection/node_https_connection.js.map +1 -1
- package/dist/connection/socket_pool.d.ts +1 -1
- package/dist/connection/socket_pool.js +30 -30
- package/dist/connection/socket_pool.js.map +1 -1
- package/dist/connection/stream.d.ts +1 -1
- package/dist/connection/stream.js +9 -9
- package/dist/connection/stream.js.map +1 -1
- package/dist/index.d.ts +9 -7
- package/dist/index.js +26 -24
- package/dist/index.js.map +1 -1
- package/dist/result_set.d.ts +1 -1
- package/dist/result_set.js +10 -10
- package/dist/result_set.js.map +1 -1
- package/dist/utils/encoder.d.ts +1 -1
- package/dist/utils/encoder.js +5 -5
- package/dist/utils/encoder.js.map +1 -1
- package/dist/version.d.ts +1 -1
- package/dist/version.js +1 -1
- package/dist/version.js.map +1 -1
- package/package.json +10 -7
- package/skills/AGENTS.md +8 -0
- package/skills/clickhouse-js-node-rowbinary/AGENTS.md +44 -0
- package/skills/clickhouse-js-node-rowbinary/CHANGELOG.md +49 -0
- package/skills/clickhouse-js-node-rowbinary/EXAMPLES.md +48 -0
- package/skills/clickhouse-js-node-rowbinary/README.md +319 -0
- package/skills/clickhouse-js-node-rowbinary/SKILL.md +111 -0
- package/skills/clickhouse-js-node-rowbinary/case-studies/iot-rowbinary-vs-json.md +83 -0
- package/skills/clickhouse-js-node-rowbinary/case-studies/ledger-rowbinary-vs-json.md +103 -0
- package/skills/clickhouse-js-node-rowbinary/case-studies/logs-json-wins.md +86 -0
- package/skills/clickhouse-js-node-rowbinary/case-studies/wasm-vs-js.md +172 -0
- package/skills/clickhouse-js-node-rowbinary/reader.md +126 -0
- package/skills/clickhouse-js-node-rowbinary/src/examples/carts.ts +75 -0
- package/skills/clickhouse-js-node-rowbinary/src/examples/events.ts +51 -0
- package/skills/clickhouse-js-node-rowbinary/src/examples/iot.ts +158 -0
- package/skills/clickhouse-js-node-rowbinary/src/examples/ledger.ts +98 -0
- package/skills/clickhouse-js-node-rowbinary/src/examples/logs.ts +73 -0
- package/skills/clickhouse-js-node-rowbinary/src/examples/observability.ts +141 -0
- package/skills/clickhouse-js-node-rowbinary/src/examples/orders.ts +66 -0
- package/skills/clickhouse-js-node-rowbinary/src/examples/profiles.ts +60 -0
- package/skills/clickhouse-js-node-rowbinary/src/examples/telemetry.ts +102 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/aggregateFunction.ts +34 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/bool.ts +10 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/columnar.ts +125 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/compile.ts +328 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/composite.ts +181 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/core.ts +77 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/datetime.ts +113 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/decimals.ts +57 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/dynamic.ts +332 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/enums.ts +40 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/floats.ts +32 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/geo.ts +109 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/header.ts +29 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/integers.ts +95 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/interval.ts +54 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/ip.ts +93 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/json.ts +33 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/lowCardinality.ts +18 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/nested.ts +23 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/nothing.ts +29 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/reader.ts +68 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/rowBinaryWithNamesAndTypes.ts +155 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/rows.ts +58 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/simpleAggregateFunction.ts +20 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/stream.ts +276 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/strings.ts +55 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/time.ts +61 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/uuid.ts +153 -0
- package/skills/clickhouse-js-node-rowbinary/src/readers/varint.ts +70 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/aggregateFunction.ts +18 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/bool.ts +10 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/composite.ts +140 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/core.ts +92 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/datetime.ts +123 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/decimals.ts +51 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/enums.ts +18 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/floats.ts +40 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/geo.ts +125 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/integers.ts +90 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/interval.ts +11 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/ip.ts +121 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/lowCardinality.ts +12 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/nested.ts +17 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/nothing.ts +21 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/rows.ts +144 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/simpleAggregateFunction.ts +12 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/strings.ts +77 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/time.ts +54 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/uuid.ts +60 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/varint.ts +64 -0
- package/skills/clickhouse-js-node-rowbinary/src/writers/writer.ts +101 -0
- package/skills/clickhouse-js-node-rowbinary/writer.md +96 -0
|
@@ -0,0 +1,319 @@
|
|
|
1
|
+
# ClickHouse Node.js RowBinary Codec Generator
|
|
2
|
+
|
|
3
|
+
**If JS had a -O3 compiler flag, this skill would be it.** (for RowBinary read & write)
|
|
4
|
+
|
|
5
|
+
A skill and a library that let a coding agent generate bespoke RowBinary codecs on the first pass from the column type definitions of a ClickHouse response. See [the spirit](#the-spirit) behind the approach.
|
|
6
|
+
|
|
7
|
+
**Reads and writes.** Both directions are covered: readers (decode bytes → values) and writers (encode values → bytes), split under `src/readers/` and `src/writers/`. The reader path is the more mature one — the writers mirror it type-for-type, with a few decode-only paths (`Dynamic`, `JSON`, the runtime header/compile path, and the columnar typed-array path) not yet mirrored.
|
|
8
|
+
|
|
9
|
+
## Status
|
|
10
|
+
|
|
11
|
+
- ✅ Sonnet 4.6: 60.4% -> 94.0% pass rate
- ✅ Opus 4.8: 71.5% -> 94.7% pass rate
|
|
13
|
+
- ✅ Haiku 4.5: 52% -> 86.0% pass rate
|
|
14
|
+
- ✅ Composer 2.5 Fast: 3x parser performance
|
|
15
|
+
- ✅ 724/724 tests (readers + writers)
|
|
16
|
+
- ✅ type-checked
|
|
17
|
+
- ✅ benchmarked
|
|
18
|
+
|
|
19
|
+
## Example
|
|
20
|
+
|
|
21
|
+
Take a small orders result:
|
|
22
|
+
|
|
23
|
+
```sql
SELECT id, uid, price, status FROM orders
-- id UInt8
-- uid UUID
-- price Decimal64(2)
-- status Enum8('new' = 1, 'shipped' = 2, 'done' = 3)
```
|
|
30
|
+
|
|
31
|
+
**The API-only reader** — what you write by composing the library's combinators. Correct, clear, and a fine default:
|
|
32
|
+
|
|
33
|
+
```ts
const readOrderRow: Reader<OrderRow> = (s) => ({
  id: readUInt8(s),
  uid: formatUUID(readUUID(s)),
  price: readDecimal64(2)(s),
  status: readInt8(s), // raw enum int; `readEnum8(map)` resolves it to the name
});
```
|
|
41
|
+
|
|
42
|
+
**The optimized reader the skill generates** — same row, monomorphized to
|
|
43
|
+
straight-line code. The whole row is fixed-width (1 + 16 + 8 + 1 = 26 bytes), so
|
|
44
|
+
the four separate bounds checks coalesce into one `advance(s, 26)` and every leaf
|
|
45
|
+
read happens at a constant offset; the per-field combinators are gone:
|
|
46
|
+
|
|
47
|
+
```ts
const readOrderRowFast: Reader<OrderRow> = (s) => {
  const { buf, view } = s;
  const o = advance(s, 26); // one bounds check for the whole 26-byte row
  const id = buf[o]!;
  const uid = formatUUIDTable(buf.subarray(o + 1, o + 17));
  const price: DecimalValue = [view.getBigInt64(o + 17, true), 2];
  const status = view.getInt8(o + 25);
  return { id, uid, price, status };
};
```
|
|
58
|
+
|
|
59
|
+
Same values, same streaming-safety — **~3.4x** faster.
|
|
60
|
+
|
|
61
|
+
## How to use
|
|
62
|
+
|
|
63
|
+
As a library (comes with the skill):
|
|
64
|
+
|
|
65
|
+
```bash
npm install @clickhouse/rowbinary
npx skills-npm setup
```
|
|
69
|
+
|
|
70
|
+
As a skill only:
|
|
71
|
+
|
|
72
|
+
```bash
npx skills add ClickHouse/clickhouse-js/skills/clickhouse-js-node-rowbinary
```
|
|
75
|
+
|
|
76
|
+
```console
> Hey, Claude, tell me what the rowbinary skill can do for me.
> A lot! It generates custom, high-performance RowBinary readers and writers…
> Super, generate a reader for the queries in app/src/model.ts.
< Reading skill clickhouse-js-node-rowbinary…
```
|
|
82
|
+
|
|
83
|
+
## Using it with the ClickHouse JS client
|
|
84
|
+
|
|
85
|
+
This library only **decodes** the bytes — it doesn't open connections. Pair it
|
|
86
|
+
with the official client to fetch a `RowBinary` response and feed the byte chunks
|
|
87
|
+
into `streamRowBatches(chunks, readRow)`.
|
|
88
|
+
|
|
89
|
+
`RowBinary` isn't one of the formats the client decodes itself, so don't use
|
|
90
|
+
`client.query({ format: ... })` for it. Instead use `client.exec({ query })` with
|
|
91
|
+
the `FORMAT RowBinary` clause written into the SQL yourself — `exec` hands back the
|
|
92
|
+
**raw, undecoded byte stream** of the response, which is exactly what this library
|
|
93
|
+
consumes. (Use plain `RowBinary`, not `RowBinaryWithNamesAndTypes`, unless your
|
|
94
|
+
reader also skips the leading names/types header.)
|
|
95
|
+
|
|
96
|
+
The row reader below is the `orders` example from [EXAMPLES.md](EXAMPLES.md); swap
|
|
97
|
+
in the reader the skill generates for your own columns.
|
|
98
|
+
|
|
99
|
+
```ts
import {
  type Reader,
  readUInt8,
  readInt8,
  readUUID,
  formatUUID,
  readDecimal64,
  type DecimalValue,
  streamRowBatches,
} from "@clickhouse/rowbinary";
import { createClient } from "@clickhouse/client";

type OrderRow = {
  id: number;
  uid: string;
  price: DecimalValue;
  status: number;
};

const readOrderRow: Reader<OrderRow> = (s) => ({
  id: readUInt8(s),
  uid: formatUUID(readUUID(s)),
  price: readDecimal64(2)(s),
  status: readInt8(s), // raw enum int; `readEnum8(map)` resolves it to the name
});

// `exec` resolves to an object whose `stream` is a Node `Stream.Readable`.
// That stream is already an `AsyncIterable<Uint8Array>` (chunks are
// `Buffer`/`Uint8Array`, which `streamRowBatches` normalizes), so pass
// `stream` straight in:

const client = createClient();

const { stream } = await client.exec({
  query: "SELECT id, uid, price, status FROM orders FORMAT RowBinary",
});

for await (const rows of streamRowBatches(stream, readOrderRow)) {
  for (const row of rows) console.log(row); // { id, uid, price: [unscaled, scale], status }
}

await client.close();
```
|
|
142
|
+
|
|
143
|
+
## Why it's worth it
|
|
144
|
+
|
|
145
|
+
Four pillars — speed, correctness, judgment, and lifting smaller models:
|
|
146
|
+
|
|
147
|
+
- **~2–3x faster code than the straightforward decoder.** The skill emits
|
|
148
|
+
monomorphized, flattened, straight-line code — inlined reads, bounds checks
|
|
149
|
+
coalesced across adjacent fixed-width columns, the right array layout — measured
|
|
150
|
+
at ~1.3–3.4x over the _same logic written with the plain combinator API_
|
|
151
|
+
(`npm run bench`). The gains come from:
  - inlined, JIT-friendly code
  - benchmarked hot paths
  - minimal allocations
  - V8- and Node.js-specific optimizations
|
|
156
|
+
- **Correct on the gotchas that otherwise quietly break.** UUID byte
|
|
157
|
+
order, `Variant`'s sort-by-type-name discriminant, `DateTime64` sub-second
|
|
158
|
+
precision, signed-high-word wide integers, faithful decimals, `Dynamic`/`JSON`
|
|
159
|
+
self-description, transparent wrappers, opaque `AggregateFunction` — each
|
|
160
|
+
encoded with a live, server-verified test ([details below](#correctness-on-the-gotcha-heavy-types)).
|
|
161
|
+
- **Judgment, not just code.** The skill carries the working knowledge to make
  the right call _before_ writing a line, so the agent neither over- nor
  under-engineers:
  - **Is RowBinary even right?** For string-heavy results read as text, a `JSON*`
    format + V8's native `JSON.parse` (plus `gzip`/`zstd`) can beat a JS RowBinary
    decoder; reach for RowBinary when the data is numeric / wide-integer /
    binary-blob heavy.
  - **Whole buffer or stream?** Drop the `advance()` bounds checks for a complete
    in-memory buffer (faster); keep them for a chunked HTTP response that must
    survive rows straddling chunk boundaries.
  - **Drop the portability scaffolding.** RowBinary is little-endian and the
    target is x86/ARM, so the skill steers away from big-endian / byte-swap
    "portability" code a cautious one-shot pass tends to add.
|
|
174
|
+
- **Improves smaller models' performance.** Because the skill hands over the
|
|
175
|
+
hard-won answers up front, it lifts a weaker model the most. In a 24-eval
|
|
176
|
+
with-skill vs no-skill benchmark, the skill [raised](eval_result_sonnet.md) **Sonnet 4.6** from 60.4% to
|
|
177
|
+
**94.0%** (+34pp) — bringing it level with skill-equipped **Opus 4.8** (94.7%),
|
|
178
|
+
which itself [gained](eval_result.md) +23pp (71.5% → 94.7%). Composer 2.5 Fast
[got](eval_result_composer.md) a 3x parser performance boost, and Haiku 4.5
[rose](eval_result_haiku.md) from 52% to 86%. In short, the skill closes
|
|
181
|
+
most of the model-capability gap on this task.
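
The "whole buffer or stream?" trade-off above can be illustrated with a minimal sketch. It is not the library's implementation: `Cursor`, `cursorOf`, `NEED_MORE_DATA`, `advanceSketch`, and `decodeAvailable` are hypothetical names, and the two-column row (`UInt8`, `UInt32`) is an assumption chosen for brevity. It shows one bounds check per fixed-width row and a rewind to the row start when a row straddles a chunk boundary.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not the library's API.
interface Cursor {
  buf: Uint8Array;
  view: DataView;
  pos: number;
}

function cursorOf(buf: Uint8Array): Cursor {
  return { buf, view: new DataView(buf.buffer, buf.byteOffset, buf.byteLength), pos: 0 };
}

// A preallocated sentinel avoids capturing a stack trace at every chunk boundary.
const NEED_MORE_DATA = { needMoreData: true } as const;

// Reserve `n` bytes and return the offset where they start,
// or signal that the current chunk ends before the value does.
function advanceSketch(c: Cursor, n: number): number {
  const start = c.pos;
  if (start + n > c.buf.length) throw NEED_MORE_DATA;
  c.pos = start + n;
  return start;
}

// A fixed-width row: UInt8 id followed by a little-endian UInt32 count (5 bytes).
type Row = { id: number; count: number };

function readRow(c: Cursor): Row {
  const o = advanceSketch(c, 5); // one bounds check for the whole row
  return { id: c.buf[o]!, count: c.view.getUint32(o + 1, true) };
}

// Decode every complete row; on a partial row, rewind to the row start and
// return the undecoded tail so the caller can prepend it to the next chunk.
function decodeAvailable(c: Cursor): { rows: Row[]; rest: Uint8Array } {
  const rows: Row[] = [];
  while (c.pos < c.buf.length) {
    const rowStart = c.pos;
    try {
      rows.push(readRow(c));
    } catch (e) {
      if (e !== NEED_MORE_DATA) throw e;
      c.pos = rowStart;
      break;
    }
  }
  return { rows, rest: c.buf.subarray(c.pos) };
}
```

For a complete in-memory buffer the `advanceSketch` check and the `try`/`catch` can be dropped entirely, which is the faster path the bullet describes.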
|
|
182
|
+
|
|
183
|
+
## What it does
|
|
184
|
+
|
|
185
|
+
Given the columns of a query result — their names and ClickHouse type
|
|
186
|
+
definitions (as returned by `RowBinaryWithNamesAndTypes`, or supplied by the
|
|
187
|
+
user) — the skill generates parser code tailored to exactly those types. Rather
|
|
188
|
+
than shipping a generic, runtime-driven decoder, it emits straight-line code
|
|
189
|
+
that reads each column in order, so the parser only contains the logic the
|
|
190
|
+
specific result shape needs.
|
|
191
|
+
|
|
192
|
+
**Schema only known at runtime?** `compileRowBinaryWithNamesAndTypes(cursor)`
|
|
193
|
+
reads the `RowBinaryWithNamesAndTypes` header and folds each column type into a
|
|
194
|
+
reader on the fly (type strings parsed by `@clickhouse/datatype-parser`),
|
|
195
|
+
returning a `readRows` driver for the rest of the stream — a generic, no-codegen
|
|
196
|
+
path for dynamic schemas. The specialized codegen above stays the fast path when
|
|
197
|
+
the types are fixed.
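
As a rough illustration of the runtime path, the sketch below parses a `RowBinaryWithNamesAndTypes` header (a LEB128 column count, then that many name strings, then that many type strings) and folds the types into a row reader. The names `readVarUInt`, `readString`, `readersByType`, and `compileHeaderSketch` are hypothetical, and the two-entry type table is an assumption standing in for a real type-string parser; this is not the library's `compileRowBinaryWithNamesAndTypes`.

```typescript
// Sketch only: helper names are hypothetical, not the library's exports.
interface Cursor {
  buf: Uint8Array;
  pos: number;
}

// Unsigned LEB128 varint, used for the column count and string lengths.
function readVarUInt(c: Cursor): number {
  let result = 0;
  let shift = 0;
  for (;;) {
    const byte = c.buf[c.pos++];
    if (byte === undefined) throw new RangeError("truncated varint");
    result += (byte & 0x7f) * 2 ** shift; // multiply instead of << to stay exact past 31 bits
    if ((byte & 0x80) === 0) return result;
    shift += 7;
  }
}

const utf8 = new TextDecoder();

// RowBinary String: varint byte length, then the bytes.
function readString(c: Cursor): string {
  const len = readVarUInt(c);
  const end = c.pos + len;
  if (end > c.buf.length) throw new RangeError("truncated string");
  const s = utf8.decode(c.buf.subarray(c.pos, end));
  c.pos = end;
  return s;
}

type ColumnReader = (c: Cursor) => unknown;

// A deliberately tiny type table; a real one covers the full type grammar.
const readersByType: Record<string, ColumnReader> = {
  UInt8: (c) => {
    const b = c.buf[c.pos++];
    if (b === undefined) throw new RangeError("truncated UInt8");
    return b;
  },
  String: readString,
};

function compileHeaderSketch(c: Cursor) {
  const n = readVarUInt(c);
  const names: string[] = [];
  for (let i = 0; i < n; i++) names.push(readString(c));
  const readers: ColumnReader[] = [];
  for (let i = 0; i < n; i++) {
    const type = readString(c);
    const reader = readersByType[type];
    if (!reader) throw new Error(`unsupported type in this sketch: ${type}`);
    readers.push(reader);
  }
  // The returned reader loops over the columns at runtime: generic, no codegen.
  const readRow = (cur: Cursor): Record<string, unknown> => {
    const row: Record<string, unknown> = {};
    for (let i = 0; i < n; i++) row[names[i]!] = readers[i]!(cur);
    return row;
  };
  return { names, readRow };
}
```

The per-row loop over `readers` is the generic cost that the specialized, straight-line readers avoid when the schema is known ahead of time.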
|
|
198
|
+
|
|
199
|
+
## Correctness on the gotcha-heavy types
|
|
200
|
+
|
|
201
|
+
For a plain `UInt64, String, DateTime` result a strong model already writes fast,
|
|
202
|
+
correct code on its own. The skill earns its keep on the **long tail of RowBinary
|
|
203
|
+
traps** — the encodings where a from-scratch decoder is quietly wrong — each one
|
|
204
|
+
captured here with a live, server-verified test:
|
|
205
|
+
|
|
206
|
+
- **UUID** — two little-endian `UInt64` halves, each byte-reversed vs. the text
|
|
207
|
+
form (not 16 bytes in order).
|
|
208
|
+
- **`Variant(...)`** — the 1-byte discriminant indexes the alternatives sorted by
|
|
209
|
+
**type name** (ClickHouse globally sorts them), NOT declaration order; `0xFF`
|
|
210
|
+
is NULL.
|
|
211
|
+
- **`DateTime64(P)`** — returned as `[Date, nanoseconds]` so the sub-second part
|
|
212
|
+
isn't lost to a `Date`'s millisecond resolution; `Time`/`Time64` are durations,
|
|
213
|
+
not instants.
|
|
214
|
+
- **Wide integers** — `Int128`/`Int256` compose from 64-bit words with the **high
|
|
215
|
+
word read signed**; 64-bit values stay `bigint`, never a lossy `number`.
|
|
216
|
+
- **Decimals** — kept as the exact `[unscaled, scale]` pair, not a lossy float.
|
|
217
|
+
- **`Dynamic` / `JSON`** — self-describing: a per-value binary type encoding, then
|
|
218
|
+
the value; declared typed `JSON` paths are written without a tag (need the
|
|
219
|
+
schema). Wrappers are erased (`Nullable`/`Variant` → concrete type).
|
|
220
|
+
- **Transparent wrappers** — `LowCardinality(T)` / `SimpleAggregateFunction(f, T)`
|
|
221
|
+
decode as the inner `T` (no dictionary layer in RowBinary); `Nested(...)` is
|
|
222
|
+
`Array(Tuple(...))` with no wire of its own.
|
|
223
|
+
- **`AggregateFunction(...)`** — opaque, unframed state: not decodable or even
|
|
224
|
+
skippable; finalize server-side instead.
|
|
225
|
+
- **`FixedString`** preserves trailing NUL padding; **`Enum`** decodes to the
|
|
226
|
+
underlying int (the name map is metadata); **`BFloat16`** is the top 16 bits of
|
|
227
|
+
a `Float32`.
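
Two of the traps above, wide integers and decimals, are easy to show in a few lines. The following is a minimal sketch under the layout described in the list; `readInt128LE` and `formatDecimal` are hypothetical helper names for illustration, not the library's exports.

```typescript
// Compose a signed 128-bit integer from two little-endian 64-bit words.
// The low word is read unsigned; the high word is read signed and carries the sign.
function readInt128LE(view: DataView, offset: number): bigint {
  const lo = view.getBigUint64(offset, true);
  const hi = view.getBigInt64(offset + 8, true);
  return (hi << 64n) | lo;
}

// A decimal kept exact as [unscaled integer, scale], e.g. [1999n, 2] is 19.99.
type DecimalValue = [bigint, number];

// Render the pair as text without ever going through a float.
function formatDecimal([unscaled, scale]: DecimalValue): string {
  if (scale === 0) return unscaled.toString();
  const negative = unscaled < 0n;
  const digits = (negative ? -unscaled : unscaled)
    .toString()
    .padStart(scale + 1, "0"); // guarantee at least one digit before the point
  const cut = digits.length - scale;
  return `${negative ? "-" : ""}${digits.slice(0, cut)}.${digits.slice(cut)}`;
}
```

Reading the high word with `getBigUint64` instead would turn every negative `Int128` into a large positive number, which is the kind of quiet error the list warns about.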
|
|
228
|
+
|
|
229
|
+
This is also where a raw model is most likely to go wrong. In a clean-room test
|
|
230
|
+
on a `Variant` / `UUID` / `DateTime64` / `LowCardinality` schema, a no-skill
|
|
231
|
+
Sonnet produced a **silently wrong UUID** (treated the bytes as plain, missing
|
|
232
|
+
the two-reversed-halves layout), and a no-skill Opus got it right only after
|
|
233
|
+
**three web searches**. The skill hands over these answers up front — correct by
|
|
234
|
+
construction, no lookups. See `baseline/README.md` for the full control.
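
To make the UUID trap concrete, here is a minimal sketch of a lookup-table formatter that applies the two-reversed-halves layout described above. `HEX`, `WIRE_ORDER`, and `formatUuidFromRowBinary` are hypothetical names for illustration; the library's own `formatUUID` / `formatUUIDTable` may be structured differently.

```typescript
// 256-entry table: byte value -> two lowercase hex digits.
const HEX: string[] = [];
for (let i = 0; i < 256; i++) HEX.push(i.toString(16).padStart(2, "0"));

// Text-order byte i comes from this wire offset: each 8-byte half is a
// little-endian UInt64, so each half is read back to front.
const WIRE_ORDER = [7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8];

// Format the 16 RowBinary bytes of a UUID that start at offset `o`.
function formatUuidFromRowBinary(b: Uint8Array, o = 0): string {
  let s = "";
  for (let i = 0; i < 16; i++) {
    // Dashes fall after text bytes 4, 6, 8 and 10 (the 8-4-4-4-12 grouping).
    if (i === 4 || i === 6 || i === 8 || i === 10) s += "-";
    s += HEX[b[o + WIRE_ORDER[i]!]!]!;
  }
  return s;
}
```

A decoder that formats the 16 bytes in wire order still produces a well-formed UUID string, just the wrong one, which is why this failure goes unnoticed without a server-verified test.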
|
|
235
|
+
|
|
236
|
+
And the failure isn't a one-off — it's a coin-flip. Running the same no-skill
|
|
237
|
+
Sonnet on the `orders` schema (`UInt8, UUID, Decimal64(2), Enum8`) **5 times in
|
|
238
|
+
isolation**, only **3 of 5** runs decoded correctly; both failures were the same
|
|
239
|
+
UUID byte-order scramble. Even the passing runs varied ~1.9x in generated-code
|
|
240
|
+
throughput. With the skill, every run is correct. So a single A/B undersells the
|
|
241
|
+
gap: from scratch the model is right roughly 60% of the time and silently wrong
|
|
242
|
+
the rest, while the skill makes correctness deterministic.
|
|
243
|
+
|
|
244
|
+
## Examples
|
|
245
|
+
|
|
246
|
+
Six end-to-end examples live in [EXAMPLES.md](EXAMPLES.md). Each ships both an API-combinator
|
|
247
|
+
reader and an optimized, monomorphized one, with a runnable round-trip test and
|
|
248
|
+
a benchmark — so the speedups below are measured, not claimed (Node 24 / V8;
|
|
249
|
+
`npm run bench` for your own numbers):
|
|
250
|
+
|
|
251
|
+
| Example           | Columns                                              | Optimized speedup    |
| ----------------- | ---------------------------------------------------- | -------------------- |
| **orders**        | `UUID`, `Decimal64`, `Enum8`                         | **~3.4x**            |
| **carts**         | nested `Array(Tuple(...))`, `Array(Nullable(...))`   | **~2.0x**            |
| **telemetry**     | `Map`, `Array(Float64)`, `Nullable`, named `Tuple`   | **~1.4x**            |
| **observability** | `Variant`, `DateTime64(3)`, `LowCardinality`, nested | **~1.4x**            |
| **profiles**      | `Array(String)`, `Nullable(Int32)`                   | **~1.3x**            |
| **events**        | `UInt64`, `String`, `DateTime` scalars               | **~1.05x (on par)**  |
|
|
259
|
+
|
|
260
|
+
Two axes drive the win. **Composite structure** is one: monomorphization pays in
|
|
261
|
+
proportion to how many per-row combinator closures it removes (`carts` /
|
|
262
|
+
`telemetry` / `observability`). **Per-row formatting** is the other, independent
|
|
263
|
+
of composites: `orders` is all-scalar yet the biggest win (~3.4x), almost
|
|
264
|
+
entirely from swapping the BigInt UUID formatter for the lookup-table
|
|
265
|
+
`formatUUIDTable`. The genuinely flat case — a scalar row with no hot formatter
|
|
266
|
+
(`events`) — is on par, so the simpler API reader is the right call there.
|
|
267
|
+
Measure, don't assume.
|
|
268
|
+
|
|
269
|
+
## Scope
|
|
270
|
+
|
|
271
|
+
- **In scope (reading):** `RowBinary`, `RowBinaryWithNames`, and
|
|
272
|
+
`RowBinaryWithNamesAndTypes` decoding for Node.js — full-buffer and streaming
|
|
273
|
+
(chunked) via `advance()`/`NeedMoreData`, `readRows()`, and the async
|
|
274
|
+
`streamRowBatches()` (with a built-in small-chunk warning and the optional
|
|
275
|
+
`coalesceChunks()` debounce filter).
|
|
276
|
+
- **In scope (writing):** the inverse encode path — a `writeX` mirroring every
|
|
277
|
+
`readX`, appending bytes to a `Sink`, plus `writeRows()`. Imported from
|
|
278
|
+
`@clickhouse/rowbinary/writer`. A handful of decode-only paths are not yet
|
|
279
|
+
mirrored: `Dynamic`, `JSON`, the runtime header/compile path, and the columnar
|
|
280
|
+
typed-array path.
|
|
281
|
+
- **Out of scope (for now):** browsers and Edge runtimes, non-RowBinary formats
|
|
282
|
+
(JSON / CSV / TSV / Parquet), and big-endian hosts.
|
|
283
|
+
|
|
284
|
+
## The spirit
|
|
285
|
+
|
|
286
|
+
A RowBinary codec generator is a narrow thing. But it's built as an instance of
|
|
287
|
+
a broader bet about what libraries become once a capable LLM is part of the
|
|
288
|
+
toolchain. Three shifts, each already visible in this repo:
|
|
289
|
+
|
|
290
|
+
- **Self-modifiable software.** The library deliberately ships _several_
|
|
291
|
+
equivalent decoders for the same type — `readUUID` / `readUUIDBigInt` /
|
|
292
|
+
`readUUIDHiLo`, `formatUUID` / `formatUUIDTable`, `new Array(n)` vs `[]`+push,
|
|
293
|
+
streaming vs whole-buffer — because the fastest one depends on the workload,
|
|
294
|
+
not the type. Today the agent picks at generation time from measured
|
|
295
|
+
benchmarks. The next step is to pair the skill with a tracing layer that runs
|
|
296
|
+
variant A against variant B _on the live workload_ and keeps whichever wins for
|
|
297
|
+
this data shape and access pattern — a parser that re-tunes itself as the
|
|
298
|
+
traffic drifts, instead of freezing one author's guess into a release.
|
|
299
|
+
|
|
300
|
+
- **Custom software.** The value here isn't a fixed high-level API; it's the
|
|
301
|
+
benchmarked building blocks plus the judgment to combine them. So the end user
|
|
302
|
+
doesn't bend their code to the authors' generic surface — they have the agent
|
|
303
|
+
assemble the high-level API _they_ actually want, shaped to their queries, row
|
|
304
|
+
shapes, and latency/memory budget. Two teams with different workloads grow two
|
|
305
|
+
different libraries from the same primitives, and neither inherits a design
|
|
306
|
+
decision that was only ever right for the original authors' use case.
|
|
307
|
+
|
|
308
|
+
- **Read-write libraries.** For either of the above to be safe, the source has to
|
|
309
|
+
be legible to an LLM, not merely runnable. So this repo is written _read-write_:
|
|
310
|
+
every tradeoff is commented where it's made — the per-column ClickHouse type
|
|
311
|
+
annotations, the `SAFE TO TOGGLE` markers on the fast variants, each reader's
|
|
312
|
+
doc comment carrying its exact monomorphized form. An LLM can
|
|
313
|
+
read _why_ a decision was made and change it in depth with confidence — not
|
|
314
|
+
just call the public functions, but safely rework the internals.
|
|
315
|
+
|
|
316
|
+
The through-line: the last mile is glue the LLM writes over stable, benchmarked
|
|
317
|
+
blocks, so the authors' job shrinks to exporting good primitives and documenting
|
|
318
|
+
their tradeoffs honestly — rather than trying to bake the right performance
|
|
319
|
+
constants for every possible workload into the library ahead of time.
|
|
@@ -0,0 +1,111 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: clickhouse-js-node-rowbinary
|
|
3
|
+
description: >
|
|
4
|
+
Generate TypeScript/JavaScript code that reads/decodes AND writes/encodes
|
|
5
|
+
ClickHouse RowBinary streams for the ClickHouse HTTP server.
|
|
6
|
+
Use this skill whenever a user wants to parse or produce `RowBinary`,
|
|
7
|
+
`RowBinaryWithNames`, or `RowBinaryWithNamesAndTypes`.
|
|
8
|
+
Node.js only; browsers are not covered.
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
# ClickHouse JS RowBinary Codec Generator for Node.js
|
|
12
|
+
|
|
13
|
+
This skill generates both directions of the wire format: **readers** (decode
|
|
14
|
+
bytes → values) and **writers** (encode values → bytes, the mirror). A given
|
|
15
|
+
task normally needs only one side. This file is the shared entry point — the
|
|
16
|
+
format gate plus the principles common to both directions; the per-direction
|
|
17
|
+
decisions, guidance, and the per-type reference tables live in two sibling files.
|
|
18
|
+
|
|
19
|
+
**Pick your side — read only the one you need:**
|
|
20
|
+
|
|
21
|
+
- **Decoding a `RowBinary*` response** from ClickHouse into JS values →
|
|
22
|
+
**[reader.md](reader.md)**. Streaming vs whole-buffer, row-objects vs columnar,
|
|
23
|
+
fixed vs runtime schema, and the per-type reader reference.
|
|
24
|
+
- **Encoding JS values into a `RowBinary` payload** to send to ClickHouse →
|
|
25
|
+
**[writer.md](writer.md)**. The `Sink`/`writeX` building blocks, `writeRows`
|
|
26
|
+
streaming, and the per-type writer reference.
|
|
27
|
+
|
|
28
|
+
The per-type code is real, split by direction under `src/readers/` and
|
|
29
|
+
`src/writers/`.
|
|
30
|
+
|
|
31
|
+
## First: is RowBinary even the right format?
|
|
32
|
+
|
|
33
|
+
RowBinary exists for throughput, but it is **not automatically the fastest
|
|
34
|
+
path** — match the format to the shape of the data before committing to a
|
|
35
|
+
bespoke parser.
|
|
36
|
+
|
|
37
|
+
**Prefer a `JSON*` format (e.g. `JSONEachRow`) when** the result is mostly
|
|
38
|
+
strings / JSON-like values that you consume wholesale — randomly accessing
|
|
39
|
+
essentially every field, running string/regexp methods on them, treating values
|
|
40
|
+
as text. V8's native `JSON.parse` is heavily optimized C++ and builds JS strings
|
|
41
|
+
and objects faster than a JS-level RowBinary decoder can; pair it with HTTP
|
|
42
|
+
response compression (`gzip` / `zstd`, which crushes JSON's repetitive keys) and
|
|
43
|
+
the wire cost shrinks too.
|
|
44
|
+
|
|
45
|
+
**RowBinary clearly wins when** the result is dominated by:
|
|
46
|
+
|
|
47
|
+
- **Wide numerics** — `Int128`/`Int256`/`UInt128`/`UInt256`,
|
|
48
|
+
`Decimal128`/`Decimal256`.
|
|
49
|
+
- **Binary / fixed-width blobs** — `IPv4`, `IPv6`, `UUID`, `FixedString`.
|
|
50
|
+
- **High-volume fixed-width numeric columns** generally, where each value is a
|
|
51
|
+
single `DataView` read.
|
|
52
|
+
|
|
53
|
+
**Prefer the `Native` format when** columnar load and client-side analytics are
|
|
54
|
+
the main goal (fold/scan/filter columns, feed typed arrays to a Worker or WASM).
|
|
55
|
+
`Native` is column-major, so it loads straight into one typed array per column
|
|
56
|
+
with no transpose.
|
|
57
|
+
|
|
58
|
+
For help choosing and consuming a `JSON*` format (or CSV / TSV) instead, use the
|
|
59
|
+
**`clickhouse-js-node-coding`** skill.
|
|
60
|
+
|
|
61
|
+
## Core guidance (both directions)
|
|
62
|
+
|
|
63
|
+
These principles apply whether you are generating a reader or a writer; the
|
|
64
|
+
side-specific operational guidance is in [reader.md](reader.md) /
|
|
65
|
+
[writer.md](writer.md).
|
|
66
|
+
|
|
67
|
+
- **Little-endian only.** RowBinary is little-endian; target x86/ARM. Read and
|
|
68
|
+
write every multi-byte number with `DataView` accessors passing a **literal**
|
|
69
|
+
`true` for the `littleEndian` flag.
|
|
70
|
+
|
|
71
|
+
- **Correct first, then optimize.** First emit a correct codec built from the
|
|
72
|
+
plain per-type API. Only after it's correct (and tested) specialize it. Don't
|
|
73
|
+
bake performance assumptions in before correctness.
|
|
74
|
+
|
|
75
|
+
- **Monomorphize generic/composite types.** Emit specialized, inlined code per
|
|
76
|
+
type combination instead of passing functions as arguments where the type is
|
|
77
|
+
known ahead of time.
|
|
78
|
+
|
|
79
|
+
- **Inline the leaf ops.** The per-type `readX`/`writeX` functions are the
|
|
80
|
+
correct, composable reference; the generated codec should INLINE their bodies,
|
|
81
|
+
not call them, so the row loop is straight-line with no per-field indirection
|
|
82
|
+
(and so the fixed-width coalescing can fold the offset arithmetic together).
|
|
83
|
+
|
|
84
|
+
- **Annotate the type per column.** Inlining erases the type structure, so put a
|
|
85
|
+
short comment above each column's encode/decode block naming the ClickHouse
|
|
86
|
+
type it handles.
|
|
87
|
+
|
|
88
|
+
- **Shared scratch is not reentrant.** Some hot methods reuse a module-level
|
|
89
|
+
scratch buffer as a write-then-read pair — correct only because the access is
|
|
90
|
+
fully synchronous. An `async`/`yield` boundary between populating and reading
|
|
91
|
+
it corrupts the value.
|
|
92
|
+
|
|
93
|
+
- **TypeScript by default.** Generate TypeScript code and helpers unless the user
|
|
94
|
+
explicitly asks for plain JavaScript.
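As a rough illustration of the points above, here is a minimal sketch for a hypothetical two-column row (`id UInt32`, `value Float64`). The names `Row`, `encodeRows`, and `decodeRows` are assumptions made for this example and are not the skill's actual API. The leaf operations are inlined into the row loop, each column carries a comment naming its ClickHouse type, and every multi-byte access passes a literal `true` for `littleEndian`.

```typescript
// Hypothetical fixed schema: id UInt32 (4 bytes) + value Float64 (8 bytes).
const ROW_BYTES = 12;

type Row = { id: number; value: number };

// Writer: encode rows into one contiguous RowBinary payload.
function encodeRows(rows: Row[]): Uint8Array {
  const out = new Uint8Array(rows.length * ROW_BYTES);
  const view = new DataView(out.buffer);
  let o = 0;
  for (const row of rows) {
    // UInt32
    view.setUint32(o, row.id, true);
    // Float64
    view.setFloat64(o + 4, row.value, true);
    o += ROW_BYTES;
  }
  return out;
}

// Reader: the mirror image, with the same constant offsets.
function decodeRows(bytes: Uint8Array): Row[] {
  if (bytes.byteLength % ROW_BYTES !== 0) {
    throw new RangeError("payload is not a whole number of rows");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const rows: Row[] = [];
  for (let o = 0; o < bytes.byteLength; o += ROW_BYTES) {
    // UInt32
    const id = view.getUint32(o, true);
    // Float64
    const value = view.getFloat64(o + 4, true);
    rows.push({ id, value });
  }
  return rows;
}
```

Nothing here is asynchronous, so no state is shared across an `await` or `yield` boundary; a streaming variant would need to keep that property for any scratch buffer it reuses.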
|
|
95
|
+
|
|
96
|
+
## Worked examples
|
|
97
|
+
|
|
98
|
+
Six end-to-end examples with measured speedups are catalogued in [EXAMPLES.md](EXAMPLES.md).
|
|
99
|
+
|
|
100
|
+
## Out of scope
|
|
101
|
+
|
|
102
|
+
- **JSON / CSV / TSV / Parquet parsing** → use `clickhouse-js-node-coding`.
|
|
103
|
+
- **Connection errors, hangs, type mismatches** → use
|
|
104
|
+
`clickhouse-js-node-troubleshooting`.
|
|
105
|
+
- **Browser / Web Worker / Edge** → `@clickhouse/client-web`.
|
|
106
|
+
|
|
107
|
+
## Still Stuck?
|
|
108
|
+
|
|
109
|
+
- [ClickHouse RowBinary format](https://clickhouse.com/docs/interfaces/formats#rowbinary)
|
|
110
|
+
- [ClickHouse data types](https://clickhouse.com/docs/sql-reference/data-types)
|
|
111
|
+
- [ClickHouse JS client docs](https://clickhouse.com/docs/integrations/javascript)
|
|
@@ -0,0 +1,83 @@
|
|
|
1
|
+
# Case study: RowBinary vs JSON on a table of IoT readings
|
|
2
|
+
|
|
3
|
+
**TL;DR** — On a dense fixed-width numeric row, the skill's optimized RowBinary
|
|
4
|
+
reader decodes **3.5x faster than the best JSON format** (`JSONCompactEachRow`)
|
|
5
|
+
and **5.4x faster than `JSONEachRow`**, over a wire that is **1.6–3.3x smaller**.
|
|
6
|
+
This is the workload shape the [SKILL's format-choice
|
|
7
|
+
guidance](../SKILL.md#first-is-rowbinary-even-the-right-format) points at
|
|
8
|
+
RowBinary for — and the numbers below are _measured_, not assumed.
|
|
9
|
+
|
|
10
|
+
Reproduce: `npx vitest bench --run tests/iot.bench.ts` (against a live
|
|
11
|
+
ClickHouse server). Source: [`tests/iot.bench.ts`](../tests/iot.bench.ts),
|
|
12
|
+
reader: [`src/examples/iot.ts`](../src/examples/iot.ts).
|
|
13
|
+
|
|
14
|
+
## The data
|
|
15
|
+
|
|
16
|
+
A table of IoT sensor readings — every column fixed-width, not a string in the
|
|
17
|
+
row, so the whole record is a flat 41-byte run:
|
|
18
|
+
|
|
19
|
+
```sql
|
|
20
|
+
sensor_id UInt32 -- 4 bytes
|
|
21
|
+
ts DateTime64(3) -- 8 bytes
|
|
22
|
+
temperature Float64 -- 8 bytes
|
|
23
|
+
humidity Float64 -- 8 bytes
|
|
24
|
+
pressure Float64 -- 8 bytes
|
|
25
|
+
battery Float32 -- 4 bytes
|
|
26
|
+
status UInt8 -- 1 byte
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
50,000 rows, fetched from a live server in three formats and decoded into
|
|
30
|
+
equivalent JS objects. A cross-format check asserts the RowBinary (binary
|
|
31
|
+
float) and JSON (decimal-text → float) decodes agree on every numeric column
|
|
32
|
+
before any timing is taken — so this measures the same work three ways, not
|
|
33
|
+
three different results.
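Because the row is a flat 41-byte run, a reader can check bounds once per row and read every field at a constant offset. The sketch below is an assumption-laden illustration of that idea, not the benchmarked reader in `src/examples/iot.ts`: the `Reading` type and `decodeReadings` name are hypothetical, and `ts` is returned as the raw `Int64` tick count that `DateTime64(3)` stores on the wire.

```typescript
// Hypothetical row shape for the schema above.
type Reading = {
  sensor_id: number;
  ts: bigint; // raw DateTime64(3) ticks (milliseconds since the Unix epoch)
  temperature: number;
  humidity: number;
  pressure: number;
  battery: number;
  status: number;
};

const IOT_ROW_BYTES = 41; // 4 + 8 + 8 + 8 + 8 + 4 + 1

function decodeReadings(bytes: Uint8Array): Reading[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const rows: Reading[] = [];
  let o = 0;
  while (o < bytes.byteLength) {
    // One bounds check covers all seven columns of the row.
    if (o + IOT_ROW_BYTES > bytes.byteLength) {
      throw new RangeError("truncated row at byte " + o);
    }
    rows.push({
      // UInt32
      sensor_id: view.getUint32(o, true),
      // DateTime64(3)
      ts: view.getBigInt64(o + 4, true),
      // Float64
      temperature: view.getFloat64(o + 12, true),
      // Float64
      humidity: view.getFloat64(o + 20, true),
      // Float64
      pressure: view.getFloat64(o + 28, true),
      // Float32
      battery: view.getFloat32(o + 36, true),
      // UInt8
      status: view.getUint8(o + 40),
    });
    o += IOT_ROW_BYTES;
  }
  return rows;
}
```

This version assumes the whole response is already in memory; a streaming reader would additionally have to carry a partial row across chunk boundaries.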
|
|
34
|
+
|
|
35
|
+
## What was compared
|
|
36
|
+
|
|
37
|
+
- **RowBinary — optimized.** The skill's monomorphized reader: the seven column
|
|
38
|
+
bounds checks coalesce into one `advance(s, 41)`, every field read at a
|
|
39
|
+
constant offset off that base.
|
|
40
|
+
- **RowBinary — API combinators.** The same logic written with the plain
|
|
41
|
+
per-type readers (`readUInt32`, `readFloat64`, …) — the clear default.
|
|
42
|
+
- **JSONCompactEachRow — `JSON.parse`.** Newline-delimited _arrays_ (no repeated
|
|
43
|
+
keys). The strongest JSON contender a knowledgeable user would pick.
|
|
44
|
+
- **JSONEachRow — `JSON.parse`.** Newline-delimited _objects_ (keys repeated
|
|
45
|
+
every row) — the naive idiomatic choice.
|
|
46
|
+
|
|
47
|
+
Both JSON paths use the fastest idiomatic decode: splice the rows into one
|
|
48
|
+
`[...]` document and hand it to V8's native `JSON.parse` in a single call.
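The single-call decode described above might look like the following sketch. The function name `parseJsonLines` is hypothetical, and the approach assumes each row occupies exactly one line, which holds as long as newlines inside string values arrive escaped as `\n`.

```typescript
// Join newline-delimited JSON rows (JSONEachRow or JSONCompactEachRow) into
// one array document and parse it with a single JSON.parse call.
function parseJsonLines(text: string): unknown[] {
  const body = text.trim();
  if (body.length === 0) return [];
  return JSON.parse("[" + body.split("\n").join(",") + "]");
}
```

For example, `parseJsonLines('[1,"a"]\n[2,"b"]\n')` yields two row arrays, and the same call handles `JSONEachRow` objects unchanged.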
|
|
49
|
+
|
|
50
|
+
## Wire size (HTTP response bytes)
|
|
51
|
+
|
|
52
|
+
| Format | Size | B/row | vs RowBinary |
|
|
53
|
+
| ------------------ | ------- | ----- | ------------ |
|
|
54
|
+
| RowBinary | 2.05 MB | 41.0 | 1.0x |
|
|
55
|
+
| JSONCompactEachRow | 3.38 MB | 67.6 | 1.6x |
|
|
56
|
+
| JSONEachRow | 6.68 MB | 133.6 | 3.3x |
|
|
57
|
+
|
|
58
|
+
## Decode throughput (full 50k-row decode; higher = faster)
|
|
59
|
+
|
|
60
|
+
| Decoder | ops/s | ms/decode | ≈ rows/s | speedup |
|
|
61
|
+
| --------------------------------- | ----- | --------- | -------- | -------- |
|
|
62
|
+
| **RowBinary — optimized** | 399 | 2.50 | ~20.0 M | **1.0x** |
|
|
63
|
+
| RowBinary — API combinators | 159 | 6.31 | ~7.9 M | 0.40x |
|
|
64
|
+
| JSONCompactEachRow — `JSON.parse` | 114 | 8.76 | ~5.7 M | 0.29x |
|
|
65
|
+
| JSONEachRow — `JSON.parse` | 74 | 13.47 | ~3.7 M | 0.19x |
|
|
66
|
+
|
|
67
|
+
_Node 24 / V8. Your numbers will vary; run `npm run bench` on your own hardware._
|
|
68
|
+
|
|
69
|
+
## Takeaways
|
|
70
|
+
|
|
71
|
+
- **This is the textbook RowBinary win.** High-volume fixed-width numerics where
|
|
72
|
+
each field is one `DataView` read and there is no text to tokenize or numbers
|
|
73
|
+
to parse from decimal strings. The monomorphization win (2.5x over the
|
|
74
|
+
combinator API) is unusually large here because the whole row coalesces into a
|
|
75
|
+
_single_ bounds check with constant-offset reads.
|
|
76
|
+
- **Format choice matters more than the optimization.** Even the plain
|
|
77
|
+
combinator-API RowBinary reader (~7.9 M rows/s) beats the best JSON option —
|
|
78
|
+
before any monomorphization.
|
|
79
|
+
- **The flip side still holds.** Had this been a string-heavy result (logs, JSON
|
|
80
|
+
blobs, text consumed wholesale), `JSON.parse`'s optimized C++ would likely
|
|
81
|
+
_win_, and the skill would steer you to `JSONEachRow` + compression instead.
|
|
82
|
+
For IoT telemetry, RowBinary is clearly right — match the format to the shape
|
|
83
|
+
of the data.
|
|
@@ -0,0 +1,103 @@
|
|
|
1
|
+
# Case study: RowBinary vs JSON on a financial ledger (wide ints & decimals)
|
|
2
|
+
|
|
3
|
+
**TL;DR** — When every column is wider than a JS `number` can hold (`UInt128`,
|
|
4
|
+
`Int64`, `Decimal128(18)`, `UInt256`), RowBinary wins _twice over_. Stock
|
|
5
|
+
`JSON.parse` is not merely slow here — it is **silently wrong**, rounding every
|
|
6
|
+
value to a float64. The only correct JSON path quotes the values server-side and
|
|
7
|
+
re-parses each string into a `bigint`/decimal pair by hand, which is **~5x
|
|
8
|
+
slower** than the optimized RowBinary reader over a **2.1–2.6x larger** wire.
|
|
9
|
+
RowBinary reads each value exactly, straight off the wire.
|
|
10
|
+
|
|
11
|
+
This is the workload the [SKILL's format-choice
|
|
12
|
+
guidance](../SKILL.md#first-is-rowbinary-even-the-right-format) calls out
|
|
13
|
+
explicitly: "RowBinary clearly wins when the result is dominated by **wide
|
|
14
|
+
numerics** — `Int128`/`Int256`/`UInt128`/`UInt256`, `Decimal128`/`Decimal256`."
|
|
15
|
+
|
|
16
|
+
Reproduce: `npx vitest bench --run tests/ledger.bench.ts` (against a live
|
|
17
|
+
ClickHouse server). Source: [`tests/ledger.bench.ts`](../tests/ledger.bench.ts),
|
|
18
|
+
reader: [`src/examples/ledger.ts`](../src/examples/ledger.ts).
|
|
19
|
+
|
|
20
|
+
## The data
|
|
21
|
+
|
|
22
|
+
A financial ledger in which every column can hold values beyond the 53-bit exact-integer range of an IEEE-754 double:
|
|
23
|
+
|
|
24
|
+
```sql
|
|
25
|
+
txn_id UInt128 -- 16 bytes
|
|
26
|
+
account Int64 -- 8 bytes (values past 2^53)
|
|
27
|
+
amount Decimal128(18) -- 16 bytes (~32 significant digits)
|
|
28
|
+
balance Decimal128(18) -- 16 bytes
|
|
29
|
+
fee Decimal64(4) -- 8 bytes
|
|
30
|
+
volume UInt256 -- 32 bytes
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
50,000 rows, fixed-width (96 bytes/row), fetched from a live server.
|
|
34
|
+
|
|
35
|
+
## The correctness trap
|
|
36
|
+
|
|
37
|
+
ClickHouse emits these types as **bare, unquoted JSON numbers**. So stock
|
|
38
|
+
`JSON.parse` parses them as float64 and silently corrupts every one — measured
|
|
39
|
+
on row 0 of the live result:
|
|
40
|
+
|
|
41
|
+
| Column    | Exact value (RowBinary)                   | `JSON.parse` of bare JSON                 | Error            |
|
|
42
|
+
| --------- | ----------------------------------------- | ----------------------------------------- | ---------------- |
|
|
43
|
+
| `txn_id` | `340282366920938463463374607431768200000` | `340282366920938463463374607431768211456` | ✗ off by 11 456 |
|
|
44
|
+
| `account` | `9007199254740993` | `9007199254740992` | ✗ off by 1 |
|
|
45
|
+
| `amount` | `98765432109876.123456789012345678` | `98765432109876.12` | ✗ lost 16 digits |
|
|
46
|
+
|
|
47
|
+
No exception, no warning — just wrong numbers. For money and IDs, that is a
|
|
48
|
+
correctness bug, not a performance footnote.
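The `account` row of the table can be reproduced without a server. This small sketch uses a hand-written JSON literal (an assumption standing in for the live response) to show the bare number rounding and the quoted form surviving intact.

```typescript
// A bare JSON number past 2^53 is rounded to the nearest float64 by JSON.parse.
const bare = JSON.parse('{"account":9007199254740993}') as { account: number };
const roundedAccount = String(bare.account); // "9007199254740992"

// The same value quoted as a string survives as text and converts exactly.
const quoted = JSON.parse('{"account":"9007199254740993"}') as { account: string };
const exactAccount = BigInt(quoted.account);

console.log(roundedAccount, exactAccount.toString());
// Number.isSafeInteger is a cheap guard that flags values at or past 2^53.
console.log(Number.isSafeInteger(bare.account)); // false
```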
|
|
49
|
+
|
|
50
|
+
### Making JSON correct costs extra work
|
|
51
|
+
|
|
52
|
+
The only way to get exact values through JSON is to **quote them server-side** so
|
|
53
|
+
they arrive as strings, then re-parse each one:
|
|
54
|
+
|
|
55
|
+
```sql
|
|
56
|
+
... SETTINGS output_format_json_quote_64bit_integers = 1,
|
|
57
|
+
output_format_json_quote_decimals = 1
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
```ts
|
|
61
|
+
txn_id: BigInt(r.txn_id), // string -> bigint
|
|
62
|
+
amount: parseDecimal(r.amount, 18), // string -> [unscaled, scale]
|
|
63
|
+
```
|
|
64
|
+
|
|
65
|
+
That per-field `BigInt(...)` / decimal parse is work RowBinary doesn't do — it
|
|
66
|
+
reads the exact `bigint` directly with two `DataView` reads — and it lands on
|
|
67
|
+
top of a larger wire (strings are longer than the binary words).
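The source names `parseDecimal` but does not show its body, so the following is only a guess at what such a helper could look like: it turns a quoted decimal string into an exact `[unscaled, scale]` pair using string slicing and one `BigInt` conversion. Input validation beyond the scale check is deliberately left out.

```typescript
// Hypothetical implementation of the parseDecimal helper referenced above.
// Returns the exact unscaled integer together with the column's scale.
function parseDecimal(text: string, scale: number): [bigint, number] {
  const negative = text.startsWith("-");
  const digits = negative ? text.slice(1) : text;
  const dot = digits.indexOf(".");
  const intPart = dot === -1 ? digits : digits.slice(0, dot);
  const fracPart = dot === -1 ? "" : digits.slice(dot + 1);
  if (fracPart.length > scale) {
    throw new RangeError("more fractional digits than scale " + scale);
  }
  // Pad the fraction to the full scale so the digits form the unscaled integer.
  const unscaled = BigInt(intPart + fracPart.padEnd(scale, "0"));
  return [negative ? -unscaled : unscaled, scale];
}
```

Even in this compact form, every field costs several string allocations plus a `BigInt` parse, which is the per-field overhead the paragraph above refers to.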
|
|
68
|
+
|
|
69
|
+
## Wire size (correct paths quote wide values as strings)
|
|
70
|
+
|
|
71
|
+
| Format | Size | vs RowBinary |
|
|
72
|
+
| --------------------------- | -------- | ------------ |
|
|
73
|
+
| RowBinary | 4.80 MB | 1.0x |
|
|
74
|
+
| JSONCompactEachRow (quoted) | 9.88 MB | 2.1x |
|
|
75
|
+
| JSONEachRow (quoted) | 12.28 MB | 2.6x |
|
|
76
|
+
|
|
77
|
+
## Decode throughput (full 50k-row decode; higher = faster)
|
|
78
|
+
|
|
79
|
+
| Decoder | ops/s | ms/decode | ≈ rows/s | speedup | correct? |
|
|
80
|
+
| -------------------------------------------------- | ----- | --------- | -------- | -------- | -------------- |
|
|
81
|
+
| **RowBinary — optimized** | 130 | 7.71 | ~6.5 M | **1.0x** | ✅ |
|
|
82
|
+
| RowBinary — API combinators | 80 | 12.50 | ~4.0 M | 0.62x | ✅ |
|
|
83
|
+
| JSONEachRow bare — `JSON.parse` only | 44 | 22.74 | ~2.2 M | 0.34x | ❌ **corrupt** |
|
|
84
|
+
| JSONCompactEachRow quoted — parse + BigInt/decimal | 26 | 37.78 | ~1.3 M | 0.20x | ✅ |
|
|
85
|
+
| JSONEachRow quoted — parse + BigInt/decimal | 25 | 40.70 | ~1.2 M | 0.19x | ✅ |
|
|
86
|
+
|
|
87
|
+
_Node 24 / V8. Your numbers will vary; run `npm run bench` on your own hardware._
|
|
88
|
+
|
|
89
|
+
## Takeaways
|
|
90
|
+
|
|
91
|
+
- **The fast JSON path is the wrong one.** Bare `JSON.parse` is JSON's quickest
|
|
92
|
+
option and it is still 2.95x slower than RowBinary — _and_ it silently
|
|
93
|
+
corrupts every wide value. There is no "fast and correct" JSON here.
|
|
94
|
+
- **The correct JSON path is ~5x slower.** Quote + per-field `BigInt`/decimal
|
|
95
|
+
parsing is the price of correctness, on top of a 2.1–2.6x larger wire.
|
|
96
|
+
- **RowBinary is correct by construction.** Each value is composed from 64-bit
|
|
97
|
+
words read at constant offsets (high word signed for the signed types),
|
|
98
|
+
yielding an exact `bigint` or `[unscaled, scale]` pair — no rounding, no
|
|
99
|
+
string re-parsing.
|
|
100
|
+
- **Contrast with the [IoT case study](iot-rowbinary-vs-json.md):** there the
|
|
101
|
+
numbers fit a float64 and the win was purely throughput (3.5x). Here the values
|
|
102
|
+
don't fit, so the win is _correctness first_, throughput second. Match the
|
|
103
|
+
format to the shape of the data.
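The "correct by construction" point can be sketched for the 128-bit case as two little-endian 64-bit reads combined into one `bigint`. The function names below are illustrative rather than the skill's actual reader API.

```typescript
// UInt128: low word first (little-endian), both words unsigned.
function readUInt128(view: DataView, offset: number): bigint {
  const lo = view.getBigUint64(offset, true);
  const hi = view.getBigUint64(offset + 8, true);
  return (hi << BigInt(64)) | lo;
}

// Int128: same layout, but the high word is read as signed so the
// sign carries into the combined value.
function readInt128(view: DataView, offset: number): bigint {
  const lo = view.getBigUint64(offset, true);
  const hi = view.getBigInt64(offset + 8, true);
  return (hi << BigInt(64)) | lo;
}

// Decimal128(18): the wire value is the unscaled Int128; the scale comes from the type.
function readDecimal128(view: DataView, offset: number, scale: number): [bigint, number] {
  return [readInt128(view, offset), scale];
}
```

A 256-bit value would follow the same pattern with four words, again treating only the highest word as signed for the signed types.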
|