@clickhouse/client 1.22.0 → 1.23.0-head.70ad405.1

Files changed (191)
  1. package/CHANGELOG.md +1342 -0
  2. package/README.md +20 -7
  3. package/dist/client.d.ts +2 -2
  4. package/dist/client.js +11 -4
  5. package/dist/client.js.map +1 -1
  6. package/dist/common/clickhouse_types.d.ts +98 -0
  7. package/dist/common/clickhouse_types.js +30 -0
  8. package/dist/common/clickhouse_types.js.map +1 -0
  9. package/dist/common/client.d.ts +233 -0
  10. package/dist/common/client.js +414 -0
  11. package/dist/common/client.js.map +1 -0
  12. package/dist/common/config.d.ts +234 -0
  13. package/dist/common/config.js +364 -0
  14. package/dist/common/config.js.map +1 -0
  15. package/dist/common/connection.d.ts +124 -0
  16. package/dist/common/connection.js +3 -0
  17. package/dist/common/connection.js.map +1 -0
  18. package/dist/common/data_formatter/format_query_params.d.ts +11 -0
  19. package/dist/common/data_formatter/format_query_params.js +128 -0
  20. package/dist/common/data_formatter/format_query_params.js.map +1 -0
  21. package/dist/common/data_formatter/format_query_settings.d.ts +2 -0
  22. package/dist/common/data_formatter/format_query_settings.js +20 -0
  23. package/dist/common/data_formatter/format_query_settings.js.map +1 -0
  24. package/dist/common/data_formatter/formatter.d.ts +41 -0
  25. package/dist/common/data_formatter/formatter.js +78 -0
  26. package/dist/common/data_formatter/formatter.js.map +1 -0
  27. package/dist/common/data_formatter/index.d.ts +3 -0
  28. package/dist/common/data_formatter/index.js +24 -0
  29. package/dist/common/data_formatter/index.js.map +1 -0
  30. package/dist/common/error/error.d.ts +20 -0
  31. package/dist/common/error/error.js +73 -0
  32. package/dist/common/error/error.js.map +1 -0
  33. package/dist/common/error/index.d.ts +1 -0
  34. package/dist/common/error/index.js +18 -0
  35. package/dist/common/error/index.js.map +1 -0
  36. package/dist/common/index.d.ts +67 -0
  37. package/dist/common/index.js +97 -0
  38. package/dist/common/index.js.map +1 -0
  39. package/dist/common/logger.d.ts +80 -0
  40. package/dist/common/logger.js +154 -0
  41. package/dist/common/logger.js.map +1 -0
  42. package/dist/common/parse/column_types.d.ts +155 -0
  43. package/dist/common/parse/column_types.js +594 -0
  44. package/dist/common/parse/column_types.js.map +1 -0
  45. package/dist/common/parse/index.d.ts +2 -0
  46. package/dist/common/parse/index.js +19 -0
  47. package/dist/common/parse/index.js.map +1 -0
  48. package/dist/common/parse/json_handling.d.ts +19 -0
  49. package/dist/common/parse/json_handling.js +8 -0
  50. package/dist/common/parse/json_handling.js.map +1 -0
  51. package/dist/common/result.d.ts +90 -0
  52. package/dist/common/result.js +3 -0
  53. package/dist/common/result.js.map +1 -0
  54. package/dist/common/settings.d.ts +2007 -0
  55. package/dist/common/settings.js +19 -0
  56. package/dist/common/settings.js.map +1 -0
  57. package/dist/common/tracing.d.ts +146 -0
  58. package/dist/common/tracing.js +76 -0
  59. package/dist/common/tracing.js.map +1 -0
  60. package/dist/common/ts_utils.d.ts +4 -0
  61. package/dist/common/ts_utils.js +3 -0
  62. package/dist/common/ts_utils.js.map +1 -0
  63. package/dist/common/utils/connection.d.ts +21 -0
  64. package/dist/common/utils/connection.js +43 -0
  65. package/dist/common/utils/connection.js.map +1 -0
  66. package/dist/common/utils/index.d.ts +5 -0
  67. package/dist/common/utils/index.js +22 -0
  68. package/dist/common/utils/index.js.map +1 -0
  69. package/dist/common/utils/multipart.d.ts +34 -0
  70. package/dist/common/utils/multipart.js +81 -0
  71. package/dist/common/utils/multipart.js.map +1 -0
  72. package/dist/common/utils/sleep.d.ts +4 -0
  73. package/dist/common/utils/sleep.js +12 -0
  74. package/dist/common/utils/sleep.js.map +1 -0
  75. package/dist/common/utils/stream.d.ts +15 -0
  76. package/dist/common/utils/stream.js +50 -0
  77. package/dist/common/utils/stream.js.map +1 -0
  78. package/dist/common/utils/url.d.ts +20 -0
  79. package/dist/common/utils/url.js +67 -0
  80. package/dist/common/utils/url.js.map +1 -0
  81. package/dist/common/version.d.ts +2 -0
  82. package/dist/common/version.js +4 -0
  83. package/dist/common/version.js.map +1 -0
  84. package/dist/config.d.ts +22 -2
  85. package/dist/config.js +2 -2
  86. package/dist/config.js.map +1 -1
  87. package/dist/connection/compression.d.ts +2 -2
  88. package/dist/connection/compression.js +4 -4
  89. package/dist/connection/compression.js.map +1 -1
  90. package/dist/connection/create_connection.d.ts +1 -1
  91. package/dist/connection/node_base_connection.d.ts +3 -3
  92. package/dist/connection/node_base_connection.js +22 -22
  93. package/dist/connection/node_base_connection.js.map +1 -1
  94. package/dist/connection/node_custom_agent_connection.js +2 -2
  95. package/dist/connection/node_custom_agent_connection.js.map +1 -1
  96. package/dist/connection/node_http_connection.js +2 -2
  97. package/dist/connection/node_http_connection.js.map +1 -1
  98. package/dist/connection/node_https_connection.d.ts +1 -1
  99. package/dist/connection/node_https_connection.js +3 -3
  100. package/dist/connection/node_https_connection.js.map +1 -1
  101. package/dist/connection/socket_pool.d.ts +1 -1
  102. package/dist/connection/socket_pool.js +30 -30
  103. package/dist/connection/socket_pool.js.map +1 -1
  104. package/dist/connection/stream.d.ts +1 -1
  105. package/dist/connection/stream.js +9 -9
  106. package/dist/connection/stream.js.map +1 -1
  107. package/dist/index.d.ts +9 -7
  108. package/dist/index.js +26 -24
  109. package/dist/index.js.map +1 -1
  110. package/dist/result_set.d.ts +1 -1
  111. package/dist/result_set.js +10 -10
  112. package/dist/result_set.js.map +1 -1
  113. package/dist/utils/encoder.d.ts +1 -1
  114. package/dist/utils/encoder.js +5 -5
  115. package/dist/utils/encoder.js.map +1 -1
  116. package/dist/version.d.ts +1 -1
  117. package/dist/version.js +1 -1
  118. package/dist/version.js.map +1 -1
  119. package/package.json +10 -7
  120. package/skills/AGENTS.md +8 -0
  121. package/skills/clickhouse-js-node-rowbinary/AGENTS.md +44 -0
  122. package/skills/clickhouse-js-node-rowbinary/CHANGELOG.md +49 -0
  123. package/skills/clickhouse-js-node-rowbinary/EXAMPLES.md +48 -0
  124. package/skills/clickhouse-js-node-rowbinary/README.md +319 -0
  125. package/skills/clickhouse-js-node-rowbinary/SKILL.md +111 -0
  126. package/skills/clickhouse-js-node-rowbinary/case-studies/iot-rowbinary-vs-json.md +83 -0
  127. package/skills/clickhouse-js-node-rowbinary/case-studies/ledger-rowbinary-vs-json.md +103 -0
  128. package/skills/clickhouse-js-node-rowbinary/case-studies/logs-json-wins.md +86 -0
  129. package/skills/clickhouse-js-node-rowbinary/case-studies/wasm-vs-js.md +172 -0
  130. package/skills/clickhouse-js-node-rowbinary/reader.md +126 -0
  131. package/skills/clickhouse-js-node-rowbinary/src/examples/carts.ts +75 -0
  132. package/skills/clickhouse-js-node-rowbinary/src/examples/events.ts +51 -0
  133. package/skills/clickhouse-js-node-rowbinary/src/examples/iot.ts +158 -0
  134. package/skills/clickhouse-js-node-rowbinary/src/examples/ledger.ts +98 -0
  135. package/skills/clickhouse-js-node-rowbinary/src/examples/logs.ts +73 -0
  136. package/skills/clickhouse-js-node-rowbinary/src/examples/observability.ts +141 -0
  137. package/skills/clickhouse-js-node-rowbinary/src/examples/orders.ts +66 -0
  138. package/skills/clickhouse-js-node-rowbinary/src/examples/profiles.ts +60 -0
  139. package/skills/clickhouse-js-node-rowbinary/src/examples/telemetry.ts +102 -0
  140. package/skills/clickhouse-js-node-rowbinary/src/readers/aggregateFunction.ts +34 -0
  141. package/skills/clickhouse-js-node-rowbinary/src/readers/bool.ts +10 -0
  142. package/skills/clickhouse-js-node-rowbinary/src/readers/columnar.ts +125 -0
  143. package/skills/clickhouse-js-node-rowbinary/src/readers/compile.ts +328 -0
  144. package/skills/clickhouse-js-node-rowbinary/src/readers/composite.ts +181 -0
  145. package/skills/clickhouse-js-node-rowbinary/src/readers/core.ts +77 -0
  146. package/skills/clickhouse-js-node-rowbinary/src/readers/datetime.ts +113 -0
  147. package/skills/clickhouse-js-node-rowbinary/src/readers/decimals.ts +57 -0
  148. package/skills/clickhouse-js-node-rowbinary/src/readers/dynamic.ts +332 -0
  149. package/skills/clickhouse-js-node-rowbinary/src/readers/enums.ts +40 -0
  150. package/skills/clickhouse-js-node-rowbinary/src/readers/floats.ts +32 -0
  151. package/skills/clickhouse-js-node-rowbinary/src/readers/geo.ts +109 -0
  152. package/skills/clickhouse-js-node-rowbinary/src/readers/header.ts +29 -0
  153. package/skills/clickhouse-js-node-rowbinary/src/readers/integers.ts +95 -0
  154. package/skills/clickhouse-js-node-rowbinary/src/readers/interval.ts +54 -0
  155. package/skills/clickhouse-js-node-rowbinary/src/readers/ip.ts +93 -0
  156. package/skills/clickhouse-js-node-rowbinary/src/readers/json.ts +33 -0
  157. package/skills/clickhouse-js-node-rowbinary/src/readers/lowCardinality.ts +18 -0
  158. package/skills/clickhouse-js-node-rowbinary/src/readers/nested.ts +23 -0
  159. package/skills/clickhouse-js-node-rowbinary/src/readers/nothing.ts +29 -0
  160. package/skills/clickhouse-js-node-rowbinary/src/readers/reader.ts +68 -0
  161. package/skills/clickhouse-js-node-rowbinary/src/readers/rowBinaryWithNamesAndTypes.ts +155 -0
  162. package/skills/clickhouse-js-node-rowbinary/src/readers/rows.ts +58 -0
  163. package/skills/clickhouse-js-node-rowbinary/src/readers/simpleAggregateFunction.ts +20 -0
  164. package/skills/clickhouse-js-node-rowbinary/src/readers/stream.ts +276 -0
  165. package/skills/clickhouse-js-node-rowbinary/src/readers/strings.ts +55 -0
  166. package/skills/clickhouse-js-node-rowbinary/src/readers/time.ts +61 -0
  167. package/skills/clickhouse-js-node-rowbinary/src/readers/uuid.ts +153 -0
  168. package/skills/clickhouse-js-node-rowbinary/src/readers/varint.ts +70 -0
  169. package/skills/clickhouse-js-node-rowbinary/src/writers/aggregateFunction.ts +18 -0
  170. package/skills/clickhouse-js-node-rowbinary/src/writers/bool.ts +10 -0
  171. package/skills/clickhouse-js-node-rowbinary/src/writers/composite.ts +140 -0
  172. package/skills/clickhouse-js-node-rowbinary/src/writers/core.ts +92 -0
  173. package/skills/clickhouse-js-node-rowbinary/src/writers/datetime.ts +123 -0
  174. package/skills/clickhouse-js-node-rowbinary/src/writers/decimals.ts +51 -0
  175. package/skills/clickhouse-js-node-rowbinary/src/writers/enums.ts +18 -0
  176. package/skills/clickhouse-js-node-rowbinary/src/writers/floats.ts +40 -0
  177. package/skills/clickhouse-js-node-rowbinary/src/writers/geo.ts +125 -0
  178. package/skills/clickhouse-js-node-rowbinary/src/writers/integers.ts +90 -0
  179. package/skills/clickhouse-js-node-rowbinary/src/writers/interval.ts +11 -0
  180. package/skills/clickhouse-js-node-rowbinary/src/writers/ip.ts +121 -0
  181. package/skills/clickhouse-js-node-rowbinary/src/writers/lowCardinality.ts +12 -0
  182. package/skills/clickhouse-js-node-rowbinary/src/writers/nested.ts +17 -0
  183. package/skills/clickhouse-js-node-rowbinary/src/writers/nothing.ts +21 -0
  184. package/skills/clickhouse-js-node-rowbinary/src/writers/rows.ts +144 -0
  185. package/skills/clickhouse-js-node-rowbinary/src/writers/simpleAggregateFunction.ts +12 -0
  186. package/skills/clickhouse-js-node-rowbinary/src/writers/strings.ts +77 -0
  187. package/skills/clickhouse-js-node-rowbinary/src/writers/time.ts +54 -0
  188. package/skills/clickhouse-js-node-rowbinary/src/writers/uuid.ts +60 -0
  189. package/skills/clickhouse-js-node-rowbinary/src/writers/varint.ts +64 -0
  190. package/skills/clickhouse-js-node-rowbinary/src/writers/writer.ts +101 -0
  191. package/skills/clickhouse-js-node-rowbinary/writer.md +96 -0
@@ -0,0 +1,319 @@
1
+ # ClickHouse Node.js RowBinary Codec Generator
2
+
3
+ **If JS had a -O3 compiler flag, this skill would be it.** (for RowBinary read & write)
4
+
5
+ A skill and a library that let a coding agent generate bespoke RowBinary codecs on the first pass from the column type definitions of a ClickHouse response. See [the spirit](#the-spirit) behind the approach.
6
+
7
+ **Reads and writes.** Both directions are covered: readers (decode bytes → values) and writers (encode values → bytes), split under `src/readers/` and `src/writers/`. The reader path is the more mature one — the writers mirror it type-for-type, with a few decode-only paths (`Dynamic`, `JSON`, the runtime header/compile path, and the columnar typed-array path) not yet mirrored.
8
+
9
+ ## Status
10
+
11
+ - ✅ Sonnet 4.6: 60% -> 94.0% pass rate
12
+ - ✅ Opus 4.8: 71% -> 94.7% pass rate
13
+ - ✅ Haiku 4.5: 52% -> 86.0% pass rate
14
+ - ✅ Composer 2.5 Fast: 3x parser performance
15
+ - ✅ 724/724 tests (readers + writers)
16
+ - ✅ type-checked
17
+ - ✅ benchmarked
18
+
19
+ ## Example
20
+
21
+ Take a small orders result:
22
+
23
+ ```sql
24
+ SELECT id, uid, price, status FROM orders
25
+ -- id UInt8
26
+ -- uid UUID
27
+ -- price Decimal64(2)
28
+ -- status Enum8('new' = 1, 'shipped' = 2, 'done' = 3)
29
+ ```
30
+
31
+ **The API-only reader** — what you write by composing the library's combinators. Correct, clear, and a fine default:
32
+
33
+ ```ts
34
+ const readOrderRow: Reader<OrderRow> = (s) => ({
35
+ id: readUInt8(s),
36
+ uid: formatUUID(readUUID(s)),
37
+ price: readDecimal64(2)(s),
38
+ status: readInt8(s), // raw enum int; `readEnum8(map)` resolves it to the name
39
+ });
40
+ ```
41
+
42
+ **The optimized reader the skill generates** — same row, monomorphized to
43
+ straight-line code. The whole row is fixed-width (1 + 16 + 8 + 1 = 26 bytes), so
44
+ the four separate bounds checks coalesce into one `advance(s, 26)` and every leaf
45
+ read happens at a constant offset; the per-field combinators are gone:
46
+
47
+ ```ts
48
+ const readOrderRowFast: Reader<OrderRow> = (s) => {
49
+ const { buf, view } = s;
50
+ const o = advance(s, 26); // one bounds check for the whole 26-byte row
51
+ const id = buf[o]!;
52
+ const uid = formatUUIDTable(buf.subarray(o + 1, o + 17));
53
+ const price: DecimalValue = [view.getBigInt64(o + 17, true), 2];
54
+ const status = view.getInt8(o + 25);
55
+ return { id, uid, price, status };
56
+ };
57
+ ```
58
+
59
+ Same values, same streaming-safety — **~3.4x** faster.
60
+
61
+ ## How to use
62
+
63
+ As a library (comes with the skill):
64
+
65
+ ```bash
66
+ npm install @clickhouse/rowbinary
67
+ npx skills-npm setup
68
+ ```
69
+
70
+ As a skill only:
71
+
72
+ ```bash
73
+ npx skills add ClickHouse/clickhouse-js/skills/clickhouse-js-node-rowbinary
74
+ ```
75
+
76
+ ```console
77
+ > Hey, Claude, tell me what the rowbinary skill can do for me.
78
+ > A lot! It generates custom, high-performance RowBinary readers and writers…
79
+ > Super, generate a reader for the queries in app/src/model.ts.
80
+ < Reading skill clickhouse-js-node-rowbinary…
81
+ ```
82
+
83
+ ## Using it with the ClickHouse JS client
84
+
85
+ This library only **decodes** the bytes — it doesn't open connections. Pair it
86
+ with the official client to fetch a `RowBinary` response and feed the byte chunks
87
+ into `streamRowBatches(chunks, readRow)`.
88
+
89
+ `RowBinary` isn't one of the formats the client decodes itself, so don't use
90
+ `client.query({ format: ... })` for it. Instead use `client.exec({ query })` with
91
+ the `FORMAT RowBinary` clause written into the SQL yourself — `exec` hands back the
92
+ **raw, undecoded byte stream** of the response, which is exactly what this library
93
+ consumes. (Use plain `RowBinary`, not `RowBinaryWithNamesAndTypes`, unless your
94
+ reader also skips the leading names/types header.)
95
+
96
+ The row reader below is the `orders` example from [EXAMPLES.md](EXAMPLES.md); swap
97
+ in the reader the skill generates for your own columns.
98
+
99
+ ```ts
100
+ import {
101
+ type Reader,
102
+ readUInt8,
103
+ readInt8,
104
+ readUUID,
105
+ formatUUID,
106
+ readDecimal64,
107
+ type DecimalValue,
108
+ streamRowBatches,
109
+ } from "@clickhouse/rowbinary";
110
+ import { createClient } from "@clickhouse/client";
111
+
112
+ type OrderRow = {
113
+ id: number;
114
+ uid: string;
115
+ price: DecimalValue;
116
+ status: number;
117
+ };
118
+
119
+ const readOrderRow: Reader<OrderRow> = (s) => ({
120
+ id: readUInt8(s),
121
+ uid: formatUUID(readUUID(s)),
122
+ price: readDecimal64(2)(s),
123
+ status: readInt8(s), // raw enum int; `readEnum8(map)` resolves it to the name
124
+ });
125
+
126
+ // `exec` resolves to a Node `Stream.Readable`. It is already an
127
+ // `AsyncIterable<Uint8Array>` (chunks are `Buffer`/`Uint8Array`, which
128
+ // `streamRowBatches` normalizes), so pass `stream` straight in:
129
+
130
+ const client = createClient();
131
+
132
+ const { stream } = await client.exec({
133
+ query: "SELECT id, uid, price, status FROM orders FORMAT RowBinary",
134
+ });
135
+
136
+ for await (const rows of streamRowBatches(stream, readOrderRow)) {
137
+ for (const row of rows) console.log(row); // { id, uid, price: [unscaled, scale], status }
138
+ }
139
+
140
+ await client.close();
141
+ ```
142
+
143
+ ## Why it's worth it
144
+
145
+ Four pillars — speed, correctness, judgment, and lifting smaller models:
146
+
147
+ - **~2–3x faster code than the straightforward decoder.** The skill emits
148
+ monomorphized, flattened, straight-line code — inlined reads, bounds checks
149
+ coalesced across adjacent fixed-width columns, the right array layout — measured
150
+ at ~1.3–3.4x over the _same logic written with the plain combinator API_
151
+ (`npm run bench`). The gains come from:
152
+ - inlined, JIT-friendly code
153
+ - benchmarked hot paths
154
+ - minimal allocations
155
+ - V8- and Node.js-specific optimizations
156
+ - **Correct on the gotchas that otherwise quietly break.** UUID byte
157
+ order, `Variant`'s sort-by-type-name discriminant, `DateTime64` sub-second
158
+ precision, signed-high-word wide integers, faithful decimals, `Dynamic`/`JSON`
159
+ self-description, transparent wrappers, opaque `AggregateFunction` — each
160
+ encoded with a live, server-verified test ([details below](#correctness-on-the-gotcha-heavy-types)).
161
+ - **Judgment, not just code.** The skill carries the working knowledge to make
162
+ the right call _before_ writing a line, so the agent neither over- nor
163
+ under-engineers:
164
+ - **Is RowBinary even right?** For string-heavy results read as text, a `JSON*`
165
+ format + V8's native `JSON.parse` (plus `gzip`/`zstd`) can beat a JS RowBinary
166
+ decoder — reach for RowBinary when the data is numeric / wide-integer /
167
+ binary-blob heavy.
168
+ - **Whole buffer or stream?** Drop the `advance()` bounds checks for a complete
169
+ in-memory buffer (faster); keep them for a chunked HTTP response that must
170
+ survive rows straddling chunk boundaries.
171
+ - **Drop the portability scaffolding.** RowBinary is little-endian and the
172
+ target is x86/ARM, so the skill steers away from big-endian / byte-swap
173
+ "portability" code a cautious one-shot pass tends to add.
174
+ - **Improves smaller models' performance.** Because the skill hands over the
175
+ hard-won answers up front, it lifts a weaker model the most. In a 24-eval
176
+ with-skill vs no-skill benchmark, the skill [raised](eval_result_sonnet.md) **Sonnet 4.6** from 60.4% to
177
+ **94.0%** (+34pp) — bringing it level with skill-equipped **Opus 4.8** (94.7%),
178
+ which itself [gained](eval_result.md) +23pp (71.5% → 94.7%). Composer 2.5 Fast
179
+ [got](eval_result_composer.md) a 3x parser performance boost, and Haiku 4.5
180
+ [rose](eval_result_haiku.md) from 52% to 86%. The skill closes
181
+ most of the model-capability gap on this task.
182
+
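The "whole buffer or stream?" choice above comes down to what happens when a row straddles two chunks. The sketch below illustrates the general technique only: `splitFixedRows` is a hypothetical helper, not part of `@clickhouse/rowbinary`, and it assumes fixed-width rows. The library's own `advance()`/`NeedMoreData` mechanism is not reproduced here.

```ts
// Hypothetical helper (not the library's API): yields complete fixed-width
// rows from a sequence of arbitrarily sized chunks.
function* splitFixedRows(
  chunks: Iterable<Uint8Array>,
  rowSize: number,
): Generator<Uint8Array> {
  // Bytes of a row whose remainder has not arrived yet.
  let pending: Uint8Array = new Uint8Array(0);
  for (const chunk of chunks) {
    // Prepend the leftover tail of the previous chunk, if any.
    let buf: Uint8Array = chunk;
    if (pending.length > 0) {
      buf = new Uint8Array(pending.length + chunk.length);
      buf.set(pending, 0);
      buf.set(chunk, pending.length);
    }
    let offset = 0;
    // The bounds check: emit a row only once all of its bytes are present.
    while (offset + rowSize <= buf.length) {
      yield buf.subarray(offset, offset + rowSize);
      offset += rowSize;
    }
    pending = buf.slice(offset); // copy the incomplete tail
  }
  if (pending.length > 0) {
    throw new Error(`truncated stream: ${pending.length} trailing byte(s)`);
  }
}

// Two 4-byte rows (UInt32 little-endian values 1 and 2) split over three chunks.
const chunks = [
  Uint8Array.of(1, 0, 0),
  Uint8Array.of(0, 2),
  Uint8Array.of(0, 0, 0),
];
const values: number[] = [];
for (const row of splitFixedRows(chunks, 4)) {
  const view = new DataView(row.buffer, row.byteOffset, row.byteLength);
  values.push(view.getUint32(0, true));
}
console.log(values); // [ 1, 2 ]
```

With a complete in-memory buffer the `pending` bookkeeping and the per-row check are unnecessary, which is the saving the bullet above refers to.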
183
+ ## What it does
184
+
185
+ Given the columns of a query result — their names and ClickHouse type
186
+ definitions (as returned by `RowBinaryWithNamesAndTypes`, or supplied by the
187
+ user) — the skill generates parser code tailored to exactly those types. Rather
188
+ than shipping a generic, runtime-driven decoder, it emits straight-line code
189
+ that reads each column in order, so the parser only contains the logic the
190
+ specific result shape needs.
191
+
192
+ **Schema only known at runtime?** `compileRowBinaryWithNamesAndTypes(cursor)`
193
+ reads the `RowBinaryWithNamesAndTypes` header and folds each column type into a
194
+ reader on the fly (type strings parsed by `@clickhouse/datatype-parser`),
195
+ returning a `readRows` driver for the rest of the stream — a generic, no-codegen
196
+ path for dynamic schemas. The specialized codegen above stays the fast path when
197
+ the types are fixed.
198
+
199
+ ## Correctness on the gotcha-heavy types
200
+
201
+ For a plain `UInt64, String, DateTime` result a strong model already writes fast,
202
+ correct code on its own. The skill earns its keep on the **long tail of RowBinary
203
+ traps** — the encodings where a from-scratch decoder is quietly wrong — each one
204
+ captured here with a live, server-verified test:
205
+
206
+ - **UUID** — two little-endian `UInt64` halves, each byte-reversed vs. the text
207
+ form (not 16 bytes in order).
208
+ - **`Variant(...)`** — the 1-byte discriminant indexes the alternatives sorted by
209
+ **type name** (ClickHouse globally sorts them), NOT declaration order; `0xFF`
210
+ is NULL.
211
+ - **`DateTime64(P)`** — returned as `[Date, nanoseconds]` so the sub-second part
212
+ isn't lost to a `Date`'s millisecond resolution; `Time`/`Time64` are durations,
213
+ not instants.
214
+ - **Wide integers** — `Int128`/`Int256` compose from 64-bit words with the **high
215
+ word read signed**; 64-bit values stay `bigint`, never a lossy `number`.
216
+ - **Decimals** — kept as the exact `[unscaled, scale]` pair, not a lossy float.
217
+ - **`Dynamic` / `JSON`** — self-describing: a per-value binary type encoding, then
218
+ the value; declared typed `JSON` paths are written without a tag (need the
219
+ schema). Wrappers are erased (`Nullable`/`Variant` → concrete type).
220
+ - **Transparent wrappers** — `LowCardinality(T)` / `SimpleAggregateFunction(f, T)`
221
+ decode as the inner `T` (no dictionary layer in RowBinary); `Nested(...)` is
222
+ `Array(Tuple(...))` with no wire of its own.
223
+ - **`AggregateFunction(...)`** — opaque, unframed state: not decodable or even
224
+ skippable; finalize server-side instead.
225
+ - **`FixedString`** preserves trailing NUL padding; **`Enum`** decodes to the
226
+ underlying int (the name map is metadata); **`BFloat16`** is the top 16 bits of
227
+ a `Float32`.
228
+
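As an illustration of the `Variant` rule in the list above, here is a minimal sketch of resolving a discriminant byte to its alternative. `resolveVariant` is a hypothetical name, and the plain string sort is an assumption about how the server orders type names; check it against a live server for unusual type names.

```ts
// Hypothetical sketch: map a Variant discriminant byte to the alternative's
// type name. Not taken from the skill's source.
function resolveVariant(
  declared: readonly string[],
  discriminant: number,
): string | null {
  if (discriminant === 0xff) return null; // 0xFF marks NULL
  // Assumption: a default code-unit sort matches the server's ordering
  // by type name for the names involved.
  const sorted = [...declared].sort();
  const typeName = sorted[discriminant];
  if (typeName === undefined) {
    throw new RangeError(`bad Variant discriminant ${discriminant}`);
  }
  return typeName;
}

// Declared as Variant(UInt64, String, Array(UInt8)).
const declared = ["UInt64", "String", "Array(UInt8)"];
const resolved = [0, 1, 2, 0xff].map((d) => resolveVariant(declared, d));
console.log(resolved); // [ 'Array(UInt8)', 'String', 'UInt64', null ]
```

Indexing `declared` directly would return `UInt64` for discriminant 0, which is the declaration-order mistake the bullet warns about.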
229
+ This is also where a raw model is most likely to go wrong. In a clean-room test
230
+ on a `Variant` / `UUID` / `DateTime64` / `LowCardinality` schema, a no-skill
231
+ Sonnet produced a **silently wrong UUID** (treated the bytes as plain, missing
232
+ the two-reversed-halves layout), and a no-skill Opus got it right only after
233
+ **three web searches**. The skill hands over these answers up front — correct by
234
+ construction, no lookups. See `baseline/README.md` for the full control.
235
+
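The UUID scramble described above, and the signed-high-word rule for wide integers, are both byte-layout questions that a few lines can make concrete. The helpers below are hypothetical illustrations written from the layout stated in this README; they are not the skill's `readUUID`/`formatUUID` implementations.

```ts
// Lookup table of two-digit hex strings, one per byte value.
const HEX: string[] = Array.from({ length: 256 }, (_, i) =>
  i.toString(16).padStart(2, "0"),
);

// RowBinary UUID: two little-endian UInt64 halves, so each 8-byte half is
// byte-reversed relative to the canonical text form.
function uuidFromRowBinary(buf: Uint8Array, o = 0): string {
  let hex = "";
  for (let i = 7; i >= 0; i--) hex += HEX[buf[o + i]!]; // first half, reversed
  for (let i = 15; i >= 8; i--) hex += HEX[buf[o + i]!]; // second half, reversed
  return (
    `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-` +
    `${hex.slice(16, 20)}-${hex.slice(20)}`
  );
}

// Int128: low 64-bit word first (read unsigned), then the high word read signed.
function int128FromRowBinary(view: DataView, o = 0): bigint {
  const lo = view.getBigUint64(o, true);
  const hi = view.getBigInt64(o + 8, true);
  return (hi << 64n) | lo;
}

const uuidBytes = Uint8Array.of(
  0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11, 0x00,
  0xff, 0xee, 0xdd, 0xcc, 0xbb, 0xaa, 0x99, 0x88,
);
const uuidText = uuidFromRowBinary(uuidBytes);
console.log(uuidText); // 00112233-4455-6677-8899-aabbccddeeff

const minusTwoBytes = new Uint8Array(16).fill(0xff);
minusTwoBytes[0] = 0xfe;
const minusTwo = int128FromRowBinary(new DataView(minusTwoBytes.buffer));
console.log(minusTwo); // -2n
```

Reading the 16 UUID bytes in order would instead print `77665544-3322-1100-ffee-ddccbbaa9988`, a well-formed but wrong UUID, which is why the failure is silent.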
236
+ And the failure isn't a one-off — it's a coin-flip. Running the same no-skill
237
+ Sonnet on the `orders` schema (`UInt8, UUID, Decimal64(2), Enum8`) **5 times in
238
+ isolation**, only **3 of 5** runs decoded correctly; both failures were the same
239
+ UUID byte-order scramble. Even the passing runs varied ~1.9x in generated-code
240
+ throughput. With the skill, every run is correct. So a single A/B undersells the
241
+ gap: from scratch the model is right roughly 60% of the time and silently wrong
242
+ the rest, while the skill makes correctness deterministic.
243
+
244
+ ## Examples
245
+
246
+ Six end-to-end examples live in [EXAMPLES.md](EXAMPLES.md). Each ships both an API-combinator
247
+ reader and an optimized, monomorphized one, with a runnable round-trip test and
248
+ a benchmark — so the speedups below are measured, not claimed (Node 24 / V8;
249
+ `npm run bench` for your own numbers):
250
+
251
+ | Example | Columns | Optimized speedup |
252
+ | ----------------- | ---------------------------------------------------- | ------------------- |
253
+ | **orders** | `UUID`, `Decimal64`, `Enum8` | **~3.4x** |
254
+ | **carts** | nested `Array(Tuple(...))`, `Array(Nullable(...))` | **~2.0x** |
255
+ | **telemetry** | `Map`, `Array(Float64)`, `Nullable`, named `Tuple` | **~1.4x** |
256
+ | **observability** | `Variant`, `DateTime64(3)`, `LowCardinality`, nested | **~1.4x** |
257
+ | **profiles** | `Array(String)`, `Nullable(Int32)` | **~1.3x** |
258
+ | **events** | `UInt64`, `String`, `DateTime` scalars | **~1.05x — on par** |
259
+
260
+ Two axes drive the win. **Composite structure** is one: monomorphization pays in
261
+ proportion to how many per-row combinator closures it removes (`carts` /
262
+ `telemetry` / `observability`). **Per-row formatting** is the other, independent
263
+ of composites: `orders` is all-scalar yet the biggest win (~3.4x), almost
264
+ entirely from swapping the BigInt UUID formatter for the lookup-table
265
+ `formatUUIDTable`. The genuinely flat case — a scalar row with no hot formatter
266
+ (`events`) — is on par, so the simpler API reader is the right call there.
267
+ Measure, don't assume.
268
+
269
+ ## Scope
270
+
271
+ - **In scope (reading):** `RowBinary`, `RowBinaryWithNames`, and
272
+ `RowBinaryWithNamesAndTypes` decoding for Node.js — full-buffer and streaming
273
+ (chunked) via `advance()`/`NeedMoreData`, `readRows()`, and the async
274
+ `streamRowBatches()` (with a built-in small-chunk warning and the optional
275
+ `coalesceChunks()` debounce filter).
276
+ - **In scope (writing):** the inverse encode path — a `writeX` mirroring every
277
+ `readX`, appending bytes to a `Sink`, plus `writeRows()`. Imported from
278
+ `@clickhouse/rowbinary/writer`. A handful of decode-only paths are not yet
279
+ mirrored: `Dynamic`, `JSON`, the runtime header/compile path, and the columnar
280
+ typed-array path.
281
+ - **Out of scope (for now):** browsers and Edge runtimes, non-RowBinary formats
282
+ (JSON / CSV / TSV / Parquet), and big-endian hosts.
283
+
284
+ ## The spirit
285
+
286
+ A RowBinary codec generator is a narrow thing. But it's built as an instance of
287
+ a broader bet about what libraries become once a capable LLM is part of the
288
+ toolchain. Three shifts, each already visible in this repo:
289
+
290
+ - **Self-modifiable software.** The library deliberately ships _several_
291
+ equivalent decoders for the same type — `readUUID` / `readUUIDBigInt` /
292
+ `readUUIDHiLo`, `formatUUID` / `formatUUIDTable`, `new Array(n)` vs `[]`+push,
293
+ streaming vs whole-buffer — because the fastest one depends on the workload,
294
+ not the type. Today the agent picks at generation time from measured
295
+ benchmarks. The next step is to pair the skill with a tracing layer that runs
296
+ variant A against variant B _on the live workload_ and keeps whichever wins for
297
+ this data shape and access pattern — a parser that re-tunes itself as the
298
+ traffic drifts, instead of freezing one author's guess into a release.
299
+
300
+ - **Custom software.** The value here isn't a fixed high-level API; it's the
301
+ benchmarked building blocks plus the judgment to combine them. So the end user
302
+ doesn't bend their code to the authors' generic surface — they have the agent
303
+ assemble the high-level API _they_ actually want, shaped to their queries, row
304
+ shapes, and latency/memory budget. Two teams with different workloads grow two
305
+ different libraries from the same primitives, and neither inherits a design
306
+ decision that was only ever right for the original authors' use case.
307
+
308
+ - **Read-write libraries.** For either of the above to be safe, the source has to
309
+ be legible to an LLM, not merely runnable. So this repo is written _read-write_:
310
+ every tradeoff is commented where it's made — the per-column ClickHouse type
311
+ annotations, the `SAFE TO TOGGLE` markers on the fast variants, each reader's
312
+ doc comment carrying its exact monomorphized form. An LLM can
313
+ read _why_ a decision was made and change it in depth with confidence — not
314
+ just call the public functions, but safely rework the internals.
315
+
316
+ The through-line: the last mile is glue the LLM writes over stable, benchmarked
317
+ blocks, so the authors' job shrinks to exporting good primitives and documenting
318
+ their tradeoffs honestly — rather than trying to bake the right performance
319
+ constants for every possible workload into the library ahead of time.
@@ -0,0 +1,111 @@
1
+ ---
2
+ name: clickhouse-js-node-rowbinary
3
+ description: >
4
+ Generate TypeScript/JavaScript code that reads/decodes AND writes/encodes
5
+ ClickHouse RowBinary streams for the ClickHouse HTTP server.
6
+ Use this skill whenever a user wants to parse or produce `RowBinary`,
7
+ `RowBinaryWithNames`, or `RowBinaryWithNamesAndTypes`.
8
+ Node.js only, doesn't cover browsers.
9
+ ---
10
+
11
+ # ClickHouse JS RowBinary Codec Generator for Node.js
12
+
13
+ This skill generates both directions of the wire format: **readers** (decode
14
+ bytes → values) and **writers** (encode values → bytes, the mirror). A given
15
+ task normally needs only one side. This file is the shared entry point — the
16
+ format gate plus the principles common to both directions; the per-direction
17
+ decisions, guidance, and the per-type reference tables live in two sibling files.
18
+
19
+ **Pick your side — read only the one you need:**
20
+
21
+ - **Decoding a `RowBinary*` response** from ClickHouse into JS values →
22
+ **[reader.md](reader.md)**. Streaming vs whole-buffer, row-objects vs columnar,
23
+ fixed vs runtime schema, and the per-type reader reference.
24
+ - **Encoding JS values into a `RowBinary` payload** to send to ClickHouse →
25
+ **[writer.md](writer.md)**. The `Sink`/`writeX` building blocks, `writeRows`
26
+ streaming, and the per-type writer reference.
27
+
28
+ The per-type code is real, split by direction under `src/readers/` and
29
+ `src/writers/`.
30
+
31
+ ## First: is RowBinary even the right format?
32
+
33
+ RowBinary exists for throughput, but it is **not automatically the fastest
34
+ path** — match the format to the shape of the data before committing to a
35
+ bespoke parser.
36
+
37
+ **Prefer a `JSON*` format (e.g. `JSONEachRow`) when** the result is mostly
38
+ strings / JSON-like values that you consume wholesale — randomly accessing
39
+ essentially every field, running string/regexp methods on them, treating values
40
+ as text. V8's native `JSON.parse` is heavily optimized C++ and builds JS strings
41
+ and objects faster than a JS-level RowBinary decoder can; pair it with HTTP
42
+ response compression (`gzip` / `zstd`, which crushes JSON's repetitive keys) and
43
+ the wire cost shrinks too.
44
+
45
+ **RowBinary clearly wins when** the result is dominated by:
46
+
47
+ - **Wide numerics** — `Int128`/`Int256`/`UInt128`/`UInt256`,
48
+ `Decimal128`/`Decimal256`.
49
+ - **Binary / fixed-width blobs** — `IPv4`, `IPv6`, `UUID`, `FixedString`.
50
+ - **High-volume fixed-width numeric columns** generally, where each value is a
51
+ single `DataView` read.
52
+
53
+ **Prefer the `Native` format when** columnar load and client-side analytics are
54
+ the main goal (fold/scan/filter columns, feed typed arrays to a Worker or WASM).
55
+ `Native` is column-major, so it loads straight into one typed array per column
56
+ with no transpose.
57
+
58
+ For help choosing and consuming a `JSON*` format (or CSV / TSV) instead, use the
59
+ **`clickhouse-js-node-coding`** skill.
60
+
61
+ ## Core guidance (both directions)
62
+
63
+ These principles apply whether you are generating a reader or a writer; the
64
+ side-specific operational guidance is in [reader.md](reader.md) /
65
+ [writer.md](writer.md).
66
+
67
+ - **Little-endian only.** RowBinary is little-endian; target x86/ARM. Read and
68
+ write every multi-byte number with `DataView` accessors passing a **literal**
69
+ `true` for the `littleEndian` flag.
70
+
71
+ - **Correct first, then optimize.** First emit a correct codec built from the
72
+ plain per-type API. Only after it's correct (and tested) specialize it. Don't
73
+ bake performance assumptions in before correctness.
74
+
75
+ - **Monomorphize generic/composite types.** Emit specialized, inlined code per
76
+ type combination instead of passing functions as arguments where the type is
77
+ known ahead of time.
78
+
79
+ - **Inline the leaf ops.** The per-type `readX`/`writeX` functions are the
80
+ correct, composable reference; the generated codec should INLINE their bodies,
81
+ not call them, so the row loop is straight-line with no per-field indirection
82
+ (and so the fixed-width coalescing can fold the offset arithmetic together).
83
+
84
+ - **Annotate the type per column.** Inlining erases the type structure, so put a
85
+ short comment above each column's encode/decode block naming the ClickHouse
86
+ type it handles.
87
+
88
+ - **Shared scratch is not reentrant.** Some hot methods reuse a module-level
89
+ scratch buffer as a write-then-read pair — correct only because the access is
90
+ fully synchronous. An `async`/`yield` boundary between populating and reading
91
+ it corrupts the value.
92
+
93
+ - **TypeScript by default.** Generate TypeScript code and helpers unless the user
94
+ explicitly asks for plain JavaScript.
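As an illustration of the "Monomorphize" and "Inline the leaf ops" bullets above, here is a minimal sketch for a `Nullable(UInt32)` column. The `Cursor` shape and every function name are hypothetical, not the skill's actual API. The assumed wire layout is RowBinary's one-byte null flag (1 means NULL) followed by the little-endian value when present. Bounds checks are omitted to keep the contrast visible.

```typescript
// Hypothetical cursor shape; the skill's real reader state may differ.
interface Cursor {
  view: DataView;
  pos: number;
}

// Generic combinator form: the inner reader is passed as a function argument.
function readNullable<T>(c: Cursor, readInner: (c: Cursor) => T): T | null {
  const isNull = c.view.getUint8(c.pos) === 1;
  c.pos += 1;
  return isNull ? null : readInner(c);
}

function readUInt32(c: Cursor): number {
  const v = c.view.getUint32(c.pos, true); // literal `true` = little-endian
  c.pos += 4;
  return v;
}

// Monomorphized form for a column known to be Nullable(UInt32):
// both leaf bodies are inlined and the offset arithmetic is folded together.
function readNullableUInt32(c: Cursor): number | null {
  // Nullable(UInt32)
  if (c.view.getUint8(c.pos) === 1) {
    c.pos += 1;
    return null;
  }
  const v = c.view.getUint32(c.pos + 1, true);
  c.pos += 5; // 1 flag byte + 4 value bytes
  return v;
}
```

Both forms decode the same bytes; the second has no per-field function indirection, and the type comment keeps the column's ClickHouse type visible after inlining.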
95
+
96
+ ## Worked examples
97
+
98
+ Six end-to-end examples with measured speedups are catalogued in [EXAMPLES.md](EXAMPLES.md).
99
+
100
+ ## Out of scope
101
+
102
+ - **JSON / CSV / TSV / Parquet parsing** → use `clickhouse-js-node-coding`.
103
+ - **Connection errors, hangs, type mismatches** → use
104
+ `clickhouse-js-node-troubleshooting`.
105
+ - **Browser / Web Worker / Edge** → `@clickhouse/client-web`.
106
+
107
+ ## Still Stuck?
108
+
109
+ - [ClickHouse RowBinary format](https://clickhouse.com/docs/interfaces/formats#rowbinary)
110
+ - [ClickHouse data types](https://clickhouse.com/docs/sql-reference/data-types)
111
+ - [ClickHouse JS client docs](https://clickhouse.com/docs/integrations/javascript)
@@ -0,0 +1,83 @@
1
+ # Case study: RowBinary vs JSON on a table of IoT readings
2
+
3
+ **TL;DR** — On a dense fixed-width numeric row, the skill's optimized RowBinary
4
+ reader decodes **3.5x faster than the best JSON format** (`JSONCompactEachRow`)
5
+ and **5.4x faster than `JSONEachRow`**, over a wire that is **1.6–3.3x smaller**.
6
+ This is the workload shape the [SKILL's format-choice
7
+ guidance](../SKILL.md#first-is-rowbinary-even-the-right-format) points at
8
+ RowBinary for — and the numbers below are _measured_, not assumed.
9
+
10
+ Reproduce: `npx vitest bench --run tests/iot.bench.ts` (against a live
11
+ ClickHouse server). Source: [`tests/iot.bench.ts`](../tests/iot.bench.ts),
12
+ reader: [`src/examples/iot.ts`](../src/examples/iot.ts).
13
+
14
+ ## The data
15
+
16
+ A table of IoT sensor readings — every column fixed-width, not a string in the
17
+ row, so the whole record is a flat 41-byte run:
18
+
19
+ ```sql
20
+ sensor_id UInt32 -- 4 bytes
21
+ ts DateTime64(3) -- 8 bytes
22
+ temperature Float64 -- 8 bytes
23
+ humidity Float64 -- 8 bytes
24
+ pressure Float64 -- 8 bytes
25
+ battery Float32 -- 4 bytes
26
+ status UInt8 -- 1 byte
27
+ ```
28
+
29
+ 50,000 rows, fetched from a live server in three formats and decoded into
30
+ equivalent JS objects. A cross-format check asserts the RowBinary (binary
31
+ float) and JSON (decimal-text → float) decodes agree on every numeric column
32
+ before any timing is taken — so this measures the same work three ways, not
33
+ three different results.
34
+
35
+ ## What was compared
36
+
37
+ - **RowBinary — optimized.** The skill's monomorphized reader: the seven column
38
+ bounds checks coalesce into one `advance(s, 41)`, every field read at a
39
+ constant offset off that base.
40
+ - **RowBinary — API combinators.** The same logic written with the plain
41
+ per-type readers (`readUInt32`, `readFloat64`, …) — the clear default.
42
+ - **JSONCompactEachRow — `JSON.parse`.** Newline-delimited _arrays_ (no repeated
43
+ keys). The strongest JSON contender a knowledgeable user would pick.
44
+ - **JSONEachRow — `JSON.parse`.** Newline-delimited _objects_ (keys repeated
45
+ every row) — the naive idiomatic choice.
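The "optimized" shape described in the first bullet above can be sketched roughly as follows. This is an assumption-laden illustration, not the code in `src/examples/iot.ts`: the `IotRow` field names and `decodeIotRows` are hypothetical, an inline length test stands in for `advance(s, 41)`, and `ts` is left as the raw `Int64` tick count rather than converted to a `Date`.

```typescript
// Hypothetical row shape for the seven columns listed in "The data".
interface IotRow {
  sensorId: number;
  ts: bigint;
  temperature: number;
  humidity: number;
  pressure: number;
  battery: number;
  status: number;
}

const IOT_ROW_BYTES = 41; // 4 + 8 + 8 + 8 + 8 + 4 + 1

function decodeIotRows(bytes: Uint8Array): IotRow[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const rows: IotRow[] = [];
  let pos = 0;
  while (pos < view.byteLength) {
    // One coalesced bounds check covers all seven columns of the row.
    if (pos + IOT_ROW_BYTES > view.byteLength) {
      throw new RangeError(`truncated row at byte ${pos}`);
    }
    const base = pos;
    pos += IOT_ROW_BYTES;
    rows.push({
      // UInt32
      sensorId: view.getUint32(base, true),
      // DateTime64(3): raw Int64 tick count (milliseconds at scale 3)
      ts: view.getBigInt64(base + 4, true),
      // Float64
      temperature: view.getFloat64(base + 12, true),
      // Float64
      humidity: view.getFloat64(base + 20, true),
      // Float64
      pressure: view.getFloat64(base + 28, true),
      // Float32
      battery: view.getFloat32(base + 36, true),
      // UInt8
      status: view.getUint8(base + 40),
    });
  }
  return rows;
}
```

Every read uses a constant offset from `base` and a literal `true` for the little-endian flag, so the loop body is straight-line code.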
46
+
47
+ Both JSON paths use the fastest idiomatic decode: splice the rows into one
48
+ `[...]` document and hand it to V8's native `JSON.parse` in a single call.
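One way to do the splice described above is sketched here. The helper name `parseJsonLines` is hypothetical and the benchmark's own code may differ. The sketch assumes the response is already a decoded string and relies on the fact that valid JSON escapes newlines inside string values, so a raw `\n` only ever separates rows.

```typescript
// Turn newline-delimited JSON rows into one array document and parse it
// with a single JSON.parse call.
function parseJsonLines<T>(text: string): T[] {
  const body = text.trim(); // drop the trailing newline after the last row
  if (body.length === 0) return [];
  return JSON.parse("[" + body.replace(/\n/g, ",") + "]") as T[];
}
```

The same helper covers both `JSONEachRow` (rows are objects) and `JSONCompactEachRow` (rows are arrays); only the type argument changes.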
49
+
50
+ ## Wire size (HTTP response bytes)
51
+
52
+ | Format | Size | B/row | vs RowBinary |
53
+ | ------------------ | ------- | ----- | ------------ |
54
+ | RowBinary | 2.05 MB | 41.0 | 1.0x |
55
+ | JSONCompactEachRow | 3.38 MB | 67.6 | 1.6x |
56
+ | JSONEachRow | 6.68 MB | 133.6 | 3.3x |
57
+
58
+ ## Decode throughput (full 50k-row decode; higher = faster)
59
+
60
+ | Decoder | ops/s | ms/decode | ≈ rows/s | speedup |
61
+ | --------------------------------- | ----- | --------- | -------- | -------- |
62
+ | **RowBinary — optimized** | 399 | 2.50 | ~20.0 M | **1.0x** |
63
+ | RowBinary — API combinators | 159 | 6.31 | ~7.9 M | 0.40x |
64
+ | JSONCompactEachRow — `JSON.parse` | 114 | 8.76 | ~5.7 M | 0.29x |
65
+ | JSONEachRow — `JSON.parse` | 74 | 13.47 | ~3.7 M | 0.19x |
66
+
67
+ _Node 24 / V8. Your numbers will vary; run `npm run bench` on your own hardware._
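The derived columns in the throughput table follow from `ops/s` and the 50,000-row batch size. A small sketch of that arithmetic, with hypothetical helper names; the table's `ms/decode` figures come from the benchmark's unrounded timings, so recomputing them from the rounded `ops/s` column can differ in the last digit.

```typescript
const ROWS_PER_DECODE = 50_000;

// ms per full decode and rows per second, derived from operations per second.
function deriveThroughput(opsPerSec: number): { msPerDecode: number; rowsPerSec: number } {
  return {
    msPerDecode: 1000 / opsPerSec,
    rowsPerSec: opsPerSec * ROWS_PER_DECODE,
  };
}

// Speed relative to a baseline decoder (the "speedup" column uses the
// optimized RowBinary reader as the 1.0x baseline).
function relativeSpeed(opsPerSec: number, baselineOpsPerSec: number): number {
  return opsPerSec / baselineOpsPerSec;
}
```

For the optimized reader, 399 ops/s times 50,000 rows gives 19.95 million rows/s, which the table rounds to ~20.0 M.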
68
+
69
+ ## Takeaways
70
+
71
+ - **This is the textbook RowBinary win.** High-volume fixed-width numerics where
72
+ each field is one `DataView` read and there is no text to tokenize or numbers
73
+ to parse from decimal strings. The monomorphization win (2.5x over the
74
+ combinator API) is unusually large here because the whole row coalesces into a
75
+ _single_ bounds check with constant-offset reads.
76
+ - **Format choice matters more than the optimization.** Even the plain
77
+ combinator-API RowBinary reader (~7.9 M rows/s) beats the best JSON option —
78
+ before any monomorphization.
79
+ - **The flip side still holds.** Had this been a string-heavy result (logs, JSON
80
+ blobs, text consumed wholesale), `JSON.parse`'s optimized C++ would likely
81
+ _win_, and the skill would steer you to `JSONEachRow` + compression instead.
82
+ For IoT telemetry, RowBinary is clearly right — match the format to the shape
83
+ of the data.
@@ -0,0 +1,103 @@
1
+ # Case study: RowBinary vs JSON on a financial ledger (wide ints & decimals)
2
+
3
+ **TL;DR** — When every column is wider than a JS `number` can hold (`UInt128`,
4
+ `Int64`, `Decimal128(18)`, `UInt256`), RowBinary wins _twice over_. Stock
5
+ `JSON.parse` is not merely slow here — it is **silently wrong**, rounding every
6
+ value to a float64. The only correct JSON path quotes the values server-side and
7
+ re-parses each string into a `bigint`/decimal pair by hand, which is **~5x
8
+ slower** than the optimized RowBinary reader over a **2.1–2.6x larger** wire.
9
+ RowBinary reads each value exactly, straight off the wire.
10
+
11
+ This is the workload the [SKILL's format-choice
12
+ guidance](../SKILL.md#first-is-rowbinary-even-the-right-format) calls out
13
+ explicitly: "RowBinary clearly wins when the result is dominated by **wide
14
+ numerics** — `Int128`/`Int256`/`UInt128`/`UInt256`, `Decimal128`/`Decimal256`."
15
+
16
+ Reproduce: `npx vitest bench --run tests/ledger.bench.ts` (against a live
17
+ ClickHouse server). Source: [`tests/ledger.bench.ts`](../tests/ledger.bench.ts),
18
+ reader: [`src/examples/ledger.ts`](../src/examples/ledger.ts).
19
+
20
+ ## The data
21
+
22
+ A financial ledger — every column exceeds IEEE-754 double's 53-bit exact range:
23
+
24
+ ```sql
25
+ txn_id UInt128 -- 16 bytes
26
+ account Int64 -- 8 bytes (values past 2^53)
27
+ amount Decimal128(18) -- 16 bytes (~32 significant digits)
28
+ balance Decimal128(18) -- 16 bytes
29
+ fee Decimal64(4) -- 8 bytes
30
+ volume UInt256 -- 32 bytes
31
+ ```
32
+
33
+ 50,000 rows, fixed-width (96 bytes/row), fetched from a live server.
34
+
35
+ ## The correctness trap
36
+
37
+ ClickHouse emits these types as **bare, unquoted JSON numbers**. So stock
38
+ `JSON.parse` parses them as float64 and silently corrupts every one — measured
39
+ on row 0 of the live result:
40
+
41
+ | Column    | Exact value (RowBinary)                   | `JSON.parse` of bare JSON                 | Error            |
42
+ | --------- | ----------------------------------------- | ----------------------------------------- | ---------------- |
43
+ | `txn_id` | `340282366920938463463374607431768200000` | `340282366920938463463374607431768211456` | ✗ off by 11 456 |
44
+ | `account` | `9007199254740993` | `9007199254740992` | ✗ off by 1 |
45
+ | `amount` | `98765432109876.123456789012345678` | `98765432109876.12` | ✗ lost 16 digits |
46
+
47
+ No exception, no warning — just wrong numbers. For money and IDs, that is a
48
+ correctness bug, not a performance footnote.
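The rounding is easy to reproduce without a server. The sketch below feeds the `account` and `txn_id` values from the table through `JSON.parse` as bare numbers and as quoted strings; the two helper names are hypothetical.

```typescript
// Bare JSON number: JSON.parse yields a float64 `number`; BigInt(...) then
// exposes the exact integer that float actually holds.
function parseBareInteger(jsonText: string): bigint {
  return BigInt(JSON.parse(jsonText) as number);
}

// Quoted JSON string: the digits survive intact and convert exactly.
function parseQuotedInteger(jsonText: string): bigint {
  return BigInt(JSON.parse(jsonText) as string);
}

const bareAccount = parseBareInteger("9007199254740993");
// -> 9007199254740992n (off by 1, no error thrown)

const quotedAccount = parseQuotedInteger('"9007199254740993"');
// -> 9007199254740993n (exact)

const bareTxnId = parseBareInteger("340282366920938463463374607431768200000");
// -> 340282366920938463463374607431768211456n, i.e. 2n ** 128n
```

The bare `txn_id` rounds to a value one past the largest possible `UInt128`, which shows that the result cannot be a value the server sent.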
49
+
50
+ ### Making JSON correct costs extra work
51
+
52
+ The only way to get exact values through JSON is to **quote them server-side** so
53
+ they arrive as strings, then re-parse each one:
54
+
55
+ ```sql
56
+ ... SETTINGS output_format_json_quote_64bit_integers = 1,
57
+ output_format_json_quote_decimals = 1
58
+ ```
59
+
60
+ ```ts
61
+ txn_id: BigInt(r.txn_id), // string -> bigint
62
+ amount: parseDecimal(r.amount, 18), // string -> [unscaled, scale]
63
+ ```
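The snippet above calls `parseDecimal` without showing it. A minimal sketch of what such a helper could look like is given here; this is a hypothetical implementation, not the one in `tests/ledger.bench.ts`. It assumes the server sends a plain decimal string (optional leading `-`, optional fractional part, no exponent) with at most `scale` fractional digits.

```typescript
// Hypothetical helper: decimal string -> [unscaled bigint, scale].
// "1.5" at scale 4 becomes [15000n, 4].
function parseDecimal(text: string, scale: number): [bigint, number] {
  const negative = text.startsWith("-");
  const digits = negative ? text.slice(1) : text;
  const [intPart = "", fracPart = ""] = digits.split(".");
  if (fracPart.length > scale) {
    throw new RangeError(`"${text}" has more than ${scale} fractional digits`);
  }
  // Right-pad the fraction to exactly `scale` digits, then parse once.
  const unscaled = BigInt(intPart + fracPart.padEnd(scale, "0"));
  return [negative ? -unscaled : unscaled, scale];
}
```

The string handling and the `BigInt(...)` call per field are the extra work the quoted JSON path pays for exactness.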
64
+
65
+ That per-field `BigInt(...)` / decimal parse is work RowBinary doesn't do (it
66
+ composes each exact `bigint` from 64-bit `DataView` reads, two for a 128-bit value), and it lands on
67
+ top of a larger wire (strings are longer than the binary words).
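For comparison, the RowBinary side of a 128-bit column can be sketched as two little-endian 64-bit reads combined into one `bigint`. The function names are hypothetical; the skill's own readers may be structured differently. The signed variant reads the high word as signed, which is also how the unscaled value of a `Decimal128` would be obtained.

```typescript
// UInt128: low word first (little-endian), both words unsigned.
function readUInt128LE(view: DataView, offset: number): bigint {
  const lo = view.getBigUint64(offset, true);
  const hi = view.getBigUint64(offset + 8, true);
  return (hi << 64n) | lo;
}

// Int128 / Decimal128 unscaled value: high word read as signed so the
// sign carries into the combined bigint.
function readInt128LE(view: DataView, offset: number): bigint {
  const lo = view.getBigUint64(offset, true);
  const hi = view.getBigInt64(offset + 8, true);
  return (hi << 64n) | lo;
}
```

A `UInt256` would follow the same pattern with four words instead of two.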
68
+
69
+ ## Wire size (correct paths quote wide values as strings)
70
+
71
+ | Format | Size | vs RowBinary |
72
+ | --------------------------- | -------- | ------------ |
73
+ | RowBinary | 4.80 MB | 1.0x |
74
+ | JSONCompactEachRow (quoted) | 9.88 MB | 2.1x |
75
+ | JSONEachRow (quoted) | 12.28 MB | 2.6x |
76
+
77
+ ## Decode throughput (full 50k-row decode; higher = faster)
78
+
79
+ | Decoder | ops/s | ms/decode | ≈ rows/s | speedup | correct? |
80
+ | -------------------------------------------------- | ----- | --------- | -------- | -------- | -------------- |
81
+ | **RowBinary — optimized** | 130 | 7.71 | ~6.5 M | **1.0x** | ✅ |
82
+ | RowBinary — API combinators | 80 | 12.50 | ~4.0 M | 0.62x | ✅ |
83
+ | JSONEachRow bare — `JSON.parse` only | 44 | 22.74 | ~2.2 M | 0.34x | ❌ **corrupt** |
84
+ | JSONCompactEachRow quoted — parse + BigInt/decimal | 26 | 37.78 | ~1.3 M | 0.20x | ✅ |
85
+ | JSONEachRow quoted — parse + BigInt/decimal | 25 | 40.70 | ~1.2 M | 0.19x | ✅ |
86
+
87
+ _Node 24 / V8. Your numbers will vary; run `npm run bench` on your own hardware._
88
+
89
+ ## Takeaways
90
+
91
+ - **The fast JSON path is the wrong one.** Bare `JSON.parse` is JSON's quickest
92
+ option and it is still 2.95x slower than RowBinary — _and_ it silently
93
+ corrupts every wide value. There is no "fast and correct" JSON here.
94
+ - **The correct JSON path is ~5x slower.** Quote + per-field `BigInt`/decimal
95
+ parsing is the price of correctness, on top of a 2.1–2.6x larger wire.
96
+ - **RowBinary is correct by construction.** Each value is composed from 64-bit
97
+ words read at constant offsets (high word signed for the signed types),
98
+ yielding an exact `bigint` or `[unscaled, scale]` pair — no rounding, no
99
+ string re-parsing.
100
+ - **Contrast with the [IoT case study](iot-rowbinary-vs-json.md):** there the
101
+ numbers fit a float64 and the win was purely throughput (3.5x). Here the values
102
+ don't fit, so the win is _correctness first_, throughput second. Match the
103
+ format to the shape of the data.