@chrismo/superkit 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (74)
  1. package/LICENSE.txt +29 -0
  2. package/README.md +26 -0
  3. package/dist/cli/pager.d.ts +6 -0
  4. package/dist/cli/pager.d.ts.map +1 -0
  5. package/dist/cli/pager.js +21 -0
  6. package/dist/cli/pager.js.map +1 -0
  7. package/dist/cli/skdoc.d.ts +3 -0
  8. package/dist/cli/skdoc.d.ts.map +1 -0
  9. package/dist/cli/skdoc.js +42 -0
  10. package/dist/cli/skdoc.js.map +1 -0
  11. package/dist/cli/skgrok.d.ts +3 -0
  12. package/dist/cli/skgrok.d.ts.map +1 -0
  13. package/dist/cli/skgrok.js +21 -0
  14. package/dist/cli/skgrok.js.map +1 -0
  15. package/dist/cli/skops.d.ts +3 -0
  16. package/dist/cli/skops.d.ts.map +1 -0
  17. package/dist/cli/skops.js +32 -0
  18. package/dist/cli/skops.js.map +1 -0
  19. package/dist/index.d.ts +10 -0
  20. package/dist/index.d.ts.map +1 -0
  21. package/dist/index.js +11 -0
  22. package/dist/index.js.map +1 -0
  23. package/dist/lib/docs.d.ts +11 -0
  24. package/dist/lib/docs.d.ts.map +1 -0
  25. package/dist/lib/docs.js +29 -0
  26. package/dist/lib/docs.js.map +1 -0
  27. package/dist/lib/expert-sections.d.ts +32 -0
  28. package/dist/lib/expert-sections.d.ts.map +1 -0
  29. package/dist/lib/expert-sections.js +130 -0
  30. package/dist/lib/expert-sections.js.map +1 -0
  31. package/dist/lib/grok.d.ts +15 -0
  32. package/dist/lib/grok.d.ts.map +1 -0
  33. package/dist/lib/grok.js +57 -0
  34. package/dist/lib/grok.js.map +1 -0
  35. package/dist/lib/help.d.ts +20 -0
  36. package/dist/lib/help.d.ts.map +1 -0
  37. package/dist/lib/help.js +163 -0
  38. package/dist/lib/help.js.map +1 -0
  39. package/dist/lib/recipes.d.ts +29 -0
  40. package/dist/lib/recipes.d.ts.map +1 -0
  41. package/dist/lib/recipes.js +133 -0
  42. package/dist/lib/recipes.js.map +1 -0
  43. package/dist/superkit.tar.gz +0 -0
  44. package/docs/grok-patterns.sup +89 -0
  45. package/docs/recipes/array.md +66 -0
  46. package/docs/recipes/array.spq +31 -0
  47. package/docs/recipes/character.md +110 -0
  48. package/docs/recipes/character.spq +57 -0
  49. package/docs/recipes/escape.md +159 -0
  50. package/docs/recipes/escape.spq +102 -0
  51. package/docs/recipes/format.md +51 -0
  52. package/docs/recipes/format.spq +24 -0
  53. package/docs/recipes/index.md +23 -0
  54. package/docs/recipes/integer.md +101 -0
  55. package/docs/recipes/integer.spq +53 -0
  56. package/docs/recipes/records.md +84 -0
  57. package/docs/recipes/records.spq +61 -0
  58. package/docs/recipes/string.md +177 -0
  59. package/docs/recipes/string.spq +105 -0
  60. package/docs/superdb-expert.md +929 -0
  61. package/docs/tutorials/bash_to_sup.md +123 -0
  62. package/docs/tutorials/chess-tiebreaks.md +233 -0
  63. package/docs/tutorials/debug.md +439 -0
  64. package/docs/tutorials/fork_for_window.md +296 -0
  65. package/docs/tutorials/grok.md +166 -0
  66. package/docs/tutorials/index.md +10 -0
  67. package/docs/tutorials/joins.md +79 -0
  68. package/docs/tutorials/moar_subqueries.md +35 -0
  69. package/docs/tutorials/subqueries.md +236 -0
  70. package/docs/tutorials/sup_to_bash.md +164 -0
  71. package/docs/tutorials/super_db_update.md +34 -0
  72. package/docs/tutorials/unnest.md +113 -0
  73. package/docs/zq-to-super-upgrades.md +549 -0
  74. package/package.json +46 -0
@@ -0,0 +1,929 @@
1
+ ---
2
+ title: "Expert Guide"
3
+ name: superdb-expert
4
+ description: "Expert guide for SuperDB queries and data transformations. Covers syntax, patterns, and best practices."
5
+ layout: default
6
+ nav_order: 2
7
+ superdb_version: "0.3.0"
8
+ last_updated: "2026-03-27"
9
+ ---
10
+
11
+ # SuperDB Expert Guide
12
+
13
+ SuperDB is a command-line tool and query engine that puts JSON and relational
14
+ tables on equal footing with a super-structured data model. It uses a
15
+ Unix-inspired pipeline syntax similar to jq but with its own distinct language
16
+ and semantics. This guide covers SuperDB's query syntax, patterns, and best
17
+ practices.
18
+
19
+ ## Note on Zed/zq Compatibility
20
+
21
+ **SuperDB replaces the older Zed/zq toolchain and has breaking changes.**
22
+ The Zed language documentation at zed.brimdata.io is outdated and incompatible
23
+ with current SuperDB syntax. Use SuperDB documentation at superdb.org and
24
+ GitHub examples instead. When in doubt, test syntax with the actual `super`
25
+ binary rather than relying on old Zed/zq examples.
26
+
27
+ ## Core Knowledge
28
+
29
+ ### SuperDB Binary
30
+
31
+ - The binary is `super` (not `superdb`)
32
+ - Common flags:
33
+ - `-c` for command/query
34
+ - `-j` for JSON output
35
+ - `-J` for pretty JSON
36
+ - `-s` for SUP format
37
+ - `-S` for pretty SUP
38
+ - `-f` for output format (sup, json, bsup, csup, arrows, parquet, csv, etc.)
39
+ - `-i` for input format
40
+ - `-f line` for clean number formatting without type decorators
41
+
42
+ #### Removed switches
43
+
44
+ - `-z` (compact output in the deprecated ZSON format). **Removed** — no longer accepted.
45
+ - `-Z` (pretty-printed output in the deprecated ZSON format). **Removed** — no longer accepted.
46
+
47
+ ### Critical Rules
48
+
49
+ 1. **Trailing dash**: ONLY use `-` at the end of a super command when piping
50
+ stdin. Never use it without stdin, or `super` returns empty output.
51
+
52
+ - Bad: `super -j -c "values {token: \"$token\"}" -` (no stdin)
53
+ - Good: `super -j -c "values {token: \"$token\"}"` (no stdin, no dash)
54
+ - Good: `echo "$data" | super -j -c "query" -` (has stdin, has dash)
55
+
56
+ 2. **Syntax differences from JavaScript**:
57
+
58
+ - Use `values` instead of `yield`
59
+ - Use `unnest` instead of `over`
60
+ - Type casting: `cast(myvar, <int64>)` may require either `-s` or `-f line` for clean output.
61
+
62
+ ## Language Syntax Reference
63
+
64
+ ### Pipeline Pattern
65
+
66
+ SuperDB uses Unix-inspired pipeline syntax:
67
+
68
+ ```
69
+ command | command | command | ...
70
+ ```
71
+
72
+ ### Fork Operations (Parallel Processing)
73
+
74
+ SuperDB supports fork operations for parallel data processing:
75
+
76
+ ```
77
+ from source
78
+ | fork
79
+ ( operator | filter | transform )
80
+ ( operator | different_filter | transform )
81
+ | join on condition
82
+ ```
83
+
84
+ - Each branch runs in parallel using parentheses syntax
85
+ - Branches can be combined, merged, or joined
86
+ - Without explicit join/merge, an implied "combine" operator forwards values
87
+ - **Note:** The `=>` fat arrow syntax from the old Zed language is not supported.
88
+
89
+ ## PostgreSQL Compatibility & Traditional SQL
90
+
91
+ SuperDB is rapidly evolving toward full PostgreSQL compatibility while maintaining
92
+ its unique pipe-style syntax. You can use either traditional SQL or pipe syntax.
93
+
94
+ ### SQL Compatibility Features
95
+
96
+ - **Backward compatible**: Any SQL query is also a SuperSQL query
97
+ - **Embedded SQL**: SQL queries can appear as pipe operators anywhere
98
+ - **Mixed syntax**: Combine pipe and SQL syntax in the same query
99
+ - **SQL scoping**: Traditional SQL scoping rules apply inside SQL operators
100
+
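+ As a sketch of the embedded-SQL idea (the `events` table and `severity`/`type` fields here are hypothetical), SQL clauses can sit mid-pipeline as ordinary operators:
+
+ ```sql
+ from events
+ | where severity > 3
+ | select type, severity
+ | sort severity desc
+ ```
+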
101
+ ### Common Table Expressions (CTEs)
102
+
103
+ SuperDB supports CTEs using standard WITH clause syntax:
104
+
105
+ ```sql
106
+ with user_stats as (select user_id, count(*) as total_actions
107
+ from events
108
+ where date >= '2024-01-01'
109
+ group by user_id),
110
+ active_users as (select user_id
111
+ from user_stats
112
+ where total_actions > 10)
113
+ select *
114
+ from active_users;
115
+ ```
116
+
117
+ ### Traditional SQL Syntax
118
+
119
+ Standard SQL operations work alongside pipe operations:
120
+
121
+ ```sql
122
+ -- Basic SELECT
123
+ select id, name, email
124
+ from users
125
+ where active = true;
126
+
127
+ -- JOINs
128
+ select u.name, p.title
129
+ from users u
130
+ join projects p on u.id = p.owner_id;
131
+
132
+ -- Subqueries
133
+ select name
134
+ from users
135
+ where id in (select user_id from projects where status = 'active');
136
+ ```
137
+
138
+ ### SQL + Pipe Hybrid Queries
139
+
140
+ Combine SQL and pipe syntax for maximum flexibility:
141
+
142
+ ```sql
143
+ select union(type) as kinds, network_of(srcip) as net
144
+ from ( from source | ? "example.com" and "urgent")
145
+ where message_length > 100
146
+ group by net;
147
+ ```
148
+
149
+ ### PostgreSQL-Compatible Features
150
+
151
+ - Window functions are **not yet implemented** (planned post-GA, see [#5921](https://github.com/brimdata/super/issues/5921))
152
+ - Advanced JOIN types (LEFT, RIGHT, FULL OUTER, CROSS)
153
+ - Aggregate functions (COUNT, SUM, AVG, MIN, MAX, STRING_AGG)
154
+ - CASE expressions and conditional logic
155
+ - Date/time functions and operations
156
+ - Array and JSON operations
157
+ - Regular expressions (SIMILAR TO, regexp functions)
158
+
159
+ **Note**: PostgreSQL compatibility is actively being developed. Some features
160
+ may have subtle differences from pure PostgreSQL behavior.
161
+
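+ For instance, a CASE expression (hypothetical `students` table) reads as in PostgreSQL:
+
+ ```sql
+ select name,
+        case
+          when score >= 90 then 'A'
+          when score >= 70 then 'B'
+          else 'C'
+        end as grade
+ from students;
+ ```
+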
162
+ ### Core Operators
163
+
164
+ #### unnest
165
+
166
+ Flattens arrays into individual elements:
167
+
168
+ ```
169
+ # Input: [1,2,3]
170
+ # Query: unnest this
171
+ # Output: 1, 2, 3
172
+ ```
173
+
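+ Concretely, unnesting a literal array turns one value into a stream of its elements:
+
+ ```
+ values [1,2,3] | unnest this
+ # => 1
+ # => 2
+ # => 3
+ ```
+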
174
+ #### switch
175
+
176
+ Conditional processing with cases:
177
+
178
+ ```
179
+ switch
180
+ case a == 2 ( put v:='two' )
181
+ case a == 1 ( put v:='one' )
182
+ case a == 3 ( values null )
183
+ case true ( put a:=-1 )
184
+ ```
185
+
186
+ **Adding fields with switch:**
187
+ Use `put field:='value'` to add new fields to records:
188
+
189
+ ```
190
+ | switch
191
+ case period=='today' ( put prefix:='Daily milestone' )
192
+ case period=='week' ( put prefix:='Weekly milestone' )
193
+ case true ( put prefix:='All time milestone' )
194
+ ```
195
+
196
+ #### cut
197
+
198
+ Select specific fields (like SQL SELECT):
199
+
200
+ ```
201
+ cut field1, field2, nested.field, new_name:=old_name
202
+ ```
203
+
204
+ NOTE: `cut` also REORDERS the output; fields appear in the order you list them.
205
+
206
+ #### drop
207
+
208
+ Remove specific fields:
209
+
210
+ ```
211
+ drop unwanted_field, nested.unwanted
212
+ ```
213
+
214
+ #### put
215
+
216
+ Add or modify fields:
217
+
218
+ ```
219
+ put new_field:=value, computed:=field1+field2
220
+ ```
221
+
222
+ #### join
223
+
224
+ Combine data streams:
225
+
226
+ ```
227
+ join on key=key other_stream
228
+ ```
229
+
230
+ #### search
231
+
232
+ Pattern matching:
233
+
234
+ ```
235
+ search "keyword"
236
+ search /regex_pattern/
237
+ ? "keyword" # shorthand for search
238
+ ```
239
+
240
+ #### where
241
+
242
+ Filter records:
243
+
244
+ ```
245
+ where field > 100 AND status == "active"
246
+ ```
247
+
248
+ #### aggregate/summarize
249
+
250
+ Group and aggregate data:
251
+
252
+ ```
253
+ aggregate count:=count(), sum:=sum(amount) by category
254
+ summarize avg(value), max(value) by group
255
+ ```
256
+
257
+ #### sort
258
+
259
+ Order results:
260
+
261
+ ```
262
+ sort field
263
+ sort -r field # reverse
264
+ sort field1, -field2 # multi-field
265
+ ```
266
+
267
+ #### head/tail
268
+
269
+ Limit results:
270
+
271
+ ```
272
+ head 10
273
+ tail 5
274
+ ```
275
+
276
+ #### uniq
277
+
278
+ Remove duplicates:
279
+
280
+ ```
281
+ uniq
282
+ uniq -c # with count
283
+ ```
284
+
285
+ ## Practical Query Patterns
286
+
287
+ ### Basic Transformations
288
+
289
+ ```bash
290
+ # Convert JSON to SUP
291
+ super -s data.json
292
+
293
+ # Filter and transform
294
+ echo '{"a":1,"b":2}' | super -s -c "put c:=a+b | drop a" -
295
+
296
+ # Type conversion with clean output
297
+ super -f line -c "values 123.45::int64"
298
+ ```
299
+
300
+ ### Complex Pipelines
301
+
302
+ ```bash
303
+ # Search, filter, and aggregate - return JSON
304
+ super -j -c '
305
+ search "error"
306
+ | where severity > 3
307
+ | aggregate count:=count() by type
308
+ | sort -count
309
+ ' logs.json
310
+
311
+ # Fork operation with parallel branches - return SuperJSON text
312
+ super -s -c '
313
+ from data.json
314
+ | fork
315
+ ( where type=="A" | put tag:="alpha" )
316
+ ( where type=="B" | put tag:="beta" )
317
+ | sort timestamp
318
+ '
319
+ ```
320
+
321
+ ### Data Type Handling
322
+
323
+ ```bash
324
+ # Mixed-type arrays - return pretty-printed JSON
325
+ echo '[1, "foo", 2.3, true]' | super -J -c "unnest this" -
326
+
327
+ # Type switching - return pretty-printed SuperJSON
328
+ super -S -c '
329
+ switch
330
+ case typeof(value) == <int64> ( put category:="number" )
331
+ case typeof(value) == <string> ( put category:="text" )
332
+ case true ( put category:="other" )
333
+ ' mixed.json
334
+ ```
335
+
336
+ ### SQL Syntax Examples
337
+
338
+ Traditional SQL syntax works seamlessly with SuperDB:
339
+
340
+ #### Traditional SELECT queries
341
+ ```bash
342
+ super -s -c "SELECT * FROM users WHERE age > 21" users.json
343
+ ```
344
+
345
+ #### CTEs (Common Table Expressions)
346
+ ```bash
347
+ super -s -c "
348
+ WITH recent_orders AS (
349
+ SELECT customer_id, order_date, total
350
+ FROM orders
351
+ WHERE order_date >= '2024-01-01'
352
+ ),
353
+ customer_totals AS (
354
+ SELECT customer_id, SUM(total) as yearly_total
355
+ FROM recent_orders
356
+ GROUP BY customer_id
357
+ )
358
+ SELECT c.name, ct.yearly_total
359
+ FROM customers c
360
+ JOIN customer_totals ct ON c.id = ct.customer_id
361
+ WHERE ct.yearly_total > 1000;
362
+ " orders.json
363
+ ```
364
+
365
+ #### Mixed SQL and pipe syntax
366
+ ```bash
367
+ super -s -c "
368
+ SELECT name, processed_date
369
+ FROM ( from logs.json | ? 'error' | put processed_date:=now() )
370
+ WHERE processed_date IS NOT NULL
371
+ ORDER BY processed_date DESC;
372
+ " logs.json
373
+ ```
374
+
375
+ #### Joins
376
+ ```bash
377
+ echo '{"id":1,"name":"foo"}
378
+ {"id":2,"name":"bar"}' > people.json
379
+
380
+ echo '{id:1,person_id:1,exercise:"tango"}
381
+ {id:2,person_id:1,exercise:"typing"}
382
+ {id:3,person_id:2,exercise:"jogging"}
383
+ {id:4,person_id:2,exercise:"cooking"}' > exercises.sup
384
+
385
+ # joins supported: left, right, inner, full outer, anti
386
+ super -c "
387
+ select * from people.json people
388
+ join exercises.sup exercises
389
+ on people.id=exercises.person_id
390
+ "
391
+
392
+ # "where ... is null" is not supported yet
393
+ # (unless coalesce is used in the select clause), so use is_error() instead:
394
+ super -c "
395
+ select * from people.json people
396
+ left join exercises.sup exercises
397
+ on people.id=exercises.person_id
398
+ where is_error(exercises.exercise)
399
+ "
400
+ ```
401
+
402
+ #### WHERE Clause Tips
403
+
404
+ ##### Negation
405
+
406
+ `where !(this in $json)` is invalid!
407
+
408
+ `where not (this in $json)` is valid!
409
+
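+ A minimal sketch of the valid form, with an inline array standing in for `$json`:
+
+ ```
+ values 1,2,3,4 | where not (this in [2,3])
+ # => 1
+ # => 4
+ ```
+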
410
+ ### Tips
411
+
412
+ - Merge multiple `super` calls into a single invocation whenever you can.
413
+
414
+ **Not as Good**
415
+
416
+ ```bash
417
+ _current_tasks "| where done==true" | super -s -c "count()" -
418
+ ```
419
+
420
+ **Better**
421
+
422
+ ```bash
423
+ _current_tasks | super -s -c "where done==true | count()" -
424
+ ```
425
+
426
+ ## Advanced SuperDB Features
427
+
428
+ ### Type System
429
+
430
+ - Strongly typed with dynamic flexibility
431
+ - Algebraic types (sum and product types)
432
+ - First-class type values
433
+ - Type representation: `<[int64|string]>` for mixed types
434
+
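+ Because types are first-class values, `typeof()` returns a type you can compare or emit directly:
+
+ ```
+ values typeof(1), typeof("a")
+ # => <int64>
+ # => <string>
+ ```
+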
435
+ #### Optional fields
436
+
437
+ Record fields can be marked optional with `?` syntax. Optional fields may be
438
+ absent (null) without changing the record's type:
439
+
440
+ ```
441
+ {x:1, y?:null} # y is optional and absent
442
+ {x:2, y?:3} # y is optional and present
443
+ ```
444
+
445
+ #### Fusion types
446
+
447
+ `fuse` merges heterogeneous record types into a common schema using union types
448
+ and optional fields. `defuse` reverses this:
449
+
450
+ ```
451
+ values {a:1}, {a:2, b:"hello"} | fuse
452
+ # => fusion({a:1,b?:_::string},<{a:int64}>)
453
+ # => fusion({a:2,b?:"hello"},<{a:int64,b:string}>)
454
+
455
+ values {a:1}, {a:2, b:"hello"} | fuse | defuse(this)
456
+ # => {a:1}
457
+ # => {a:2,b:"hello"}
458
+ ```
459
+
460
+ Similarly, `blend` creates a common type using optional fields and unions, and
461
+ `unblend` reverses it.
462
+
463
+ ### Nested Field Access
464
+
465
+ ```
466
+ # Access nested fields
467
+ cut user.profile.name, user.settings.theme
468
+
469
+ # Conditional nested access
470
+ put display_name:=user?.profile?.name ?? "Anonymous"
471
+ ```
472
+
473
+ ### Time Operations
474
+
475
+ **Type representation:**
476
+
477
+ - `time`: signed 64-bit integer as nanoseconds from epoch
478
+ - `duration`: signed 64-bit integer as nanoseconds
479
+
480
+ ```
481
+ # Current time
482
+ ts:=now()
483
+
484
+ # Time comparisons
485
+ where ts > 2024-01-01T00:00:00Z
486
+
487
+ # Time formatting
488
+ put formatted:=strftime("%Y-%m-%d", ts)
489
+ ```
490
+
491
+ ### Grok Pattern Parsing
492
+
493
+ Parse unstructured strings into structured records using predefined grok patterns:
494
+
495
+ ```bash
496
+ # Parse log line with predefined patterns
497
+ grok("%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}", log_line)
498
+
499
+ # Common pattern examples
500
+ grok("%{IP:client_ip} %{WORD:method} %{URIPATH:path}", access_log)
501
+ grok("%{NUMBER:duration:float} %{WORD:unit}", "123.45 seconds")
502
+
503
+ # With custom pattern definitions (third argument)
504
+ grok("%{CUSTOM:field}", input_string, "CUSTOM \\d{3}-\\d{4}")
505
+ ```
506
+
507
+ Returns a record with named fields extracted from the input string.
508
+
509
+ **Using with raw text files:**
510
+
511
+ ```bash
512
+ # Parse log file line-by-line
513
+ super -i line -s -c 'put parsed:=grok("%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}", this)' app.log
514
+
515
+ # Filter parsed results
516
+ super -i line -j -c 'grok("%{IP:ip} %{NUMBER:status:int} %{NUMBER:bytes:int}", this) | where status >= 400' access.log
517
+ ```
518
+
519
+ **Using with structured data:**
520
+
521
+ ```bash
522
+ # Parse string field from JSON records (no -i line needed)
523
+ echo '{"raw_log":"2024-01-15 ERROR Database connection failed"}' |
524
+ super -j -c 'put parsed:=grok("%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}", raw_log)' -
525
+ ```
526
+
527
+ ### Array and Record Concatenation
528
+
529
+ Use the spread operator.
530
+
531
+ ```bash
532
+ super -s -c "{a:[], b:[]} | [...a, ...b]" # => []
533
+ super -s -c "{a:[1], b:[]} | [...a, ...b]" # => [1]
534
+ super -s -c "{a:[1], b:[2,3]} | [...a, ...b]" # => [1,2,3]
535
+ ```
536
+
537
+ ```bash
538
+ super -s -c "{a:{}, b:{}} | {...a, ...b}" # => {}
539
+ super -s -c "{a:{c:1}, b:{}} | {...a, ...b}" # => {c:1}
540
+ super -s -c "{a:{c:1}, b:{d:'foo'}} | {...a, ...b}" # => {c:1, d:'foo'}
541
+ ```
542
+
543
+ ## Debugging Tips
544
+
545
+ ### Common Issues and Solutions
546
+
547
+ 1. **Empty Results**
548
+
549
+ - Check for a trailing `-` without stdin
550
+ - Check for a missing trailing `-` when piping stdin (you may still get output, but it is usually wrong!)
551
+ - Verify field names match exactly (case-sensitive)
552
+ - Check type mismatches in comparisons
553
+
554
+ 2. **Type Errors**
555
+
556
+ - Use `typeof()` to inspect types
557
+ - Cast explicitly with `::int64`, `::string`, `::float64`
558
+ - Use `-f line` for clean numeric output
559
+
560
+ 3. **Performance Issues**
561
+
562
+ - Use `head` early in pipeline to limit data
563
+ - Aggregate before sorting when possible
564
+ - Use vectorized operations (vector: true in tests)
565
+
566
+ 4. **Complex Queries**
567
+
568
+ - Break into smaller pipelines for debugging
569
+ - Use `super -s -c "values this"` to inspect intermediate data
570
+ - Add `| head 5` to preview results during development
571
+
572
+ ### The `debug` operator
573
+
574
+ The `debug` operator taps into a pipeline to emit values to stderr without
575
+ affecting the main output:
576
+
577
+ ```
578
+ debug [ <expr> ] [ filter ( <pred> ) ]
579
+ ```
580
+
581
+ - With no args, echoes every value to stderr
582
+ - `<expr>` transforms what gets emitted (e.g., `debug f'val={this}'`)
583
+ - `filter (<pred>)` controls which values trigger debug output
584
+ - Output is always SUP format on stderr, regardless of `-f` flag
585
+
586
+ ```bash
587
+ # Flag failing scores on stderr while writing results to a file
588
+ super -s -c "
589
+ from scores.json
590
+ | put pass:=score >= 70
591
+ | debug f'FAIL: {name} ({score})' filter (pass==false)
592
+ | sort name
593
+ " > results.sup
594
+ ```
595
+
596
+ See `tutorial:debug` for advanced patterns including `fork`, subqueries, and
597
+ `fn`/`op` usage with debug.
598
+
599
+ ### Debugging Commands
600
+
601
+ ```bash
602
+ # Inspect data structure
603
+ echo "$data" | super -S -c "head 1" -
604
+
605
+ # Check field types
606
+ echo "$data" | super -s -c "put types:=typeof(this)" -
607
+
608
+ # Count records at each stage
609
+ super -s -c "query | aggregate count:=count()" data.json
610
+ super -s -c "query | filter | aggregate count:=count()" data.json
611
+
612
+ # Validate syntax without execution
613
+ super -s -c "your query" -n
614
+ ```
615
+
616
+ ## Format Conversions
617
+
618
+ ### Input/Output Formats
619
+
620
+ ```bash
621
+ # JSON to Parquet
622
+ super -f parquet data.json >data.parquet
623
+
624
+ # CSV to JSON with pretty print
625
+ super -J data.csv
626
+
627
+ # Multiple formats to Arrow
628
+ super -f arrows file1.json file2.parquet file3.csv >combined.arrows
629
+
630
+ # SUP format (self-describing)
631
+ super -s mixed-data.json >structured.sup
632
+ ```
633
+
634
+ ## Key Differences from SQL
635
+
636
+ 1. **Pipe syntax** instead of nested queries
637
+ 2. **Polymorphic operators** work across types
638
+ 3. **First-class arrays** and nested data
639
+ 4. **First-class error values** - errors and missing fields complement null
640
+ 5. **Type-aware operations** with automatic handling
641
+ 6. **Streaming architecture** for large datasets
642
+
643
+ ### Date and Time
644
+
645
+ `date_trunc` is a valid PostgreSQL function, but it is not yet supported in
646
+ SuperDB. Use `bucket(now(), 1d)` instead of `date_trunc('day', now())` for
647
+ the time being.
648
+
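+ A sketch of the substitution (`ts` is a hypothetical time field):
+
+ ```
+ # Instead of date_trunc('day', now()):
+ values bucket(now(), 1d)
+
+ # Truncate an existing time field to the hour:
+ put hour:=bucket(ts, 1h)
+ ```
+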
649
+ ### Duration Type Conversions
650
+
651
+ Converting numeric values (like milliseconds) to duration types uses f-string interpolation and type casting:
652
+
653
+ **Basic patterns:**
654
+
655
+ ```bash
656
+ # Convert milliseconds to duration
657
+ super -c "values 993958 | values f'{this}ms'::duration"
658
+
659
+ # Convert to seconds first, then duration
660
+ super -c "values 993958 / 1000 | values f'{this}s'::duration"
661
+
662
+ # Round duration to buckets (e.g., 15 minute chunks)
663
+ super -c "values 993958 / 1000 | values f'{this}s'::duration | bucket(this, 15m)"
664
+ ```
665
+
666
+ **Key points:**
667
+
668
+ - Use f-string interpolation: `f'{this}ms'` or `f'{this}s'`
669
+ - Cast to duration with `::duration` suffix
670
+ - Common units: `ms` (milliseconds), `s` (seconds), `m` (minutes), `h` (hours), `d` (days), `w` (weeks), `y` (years)
671
+ - **MONTH IS NOT A SUPPORTED UNIT.**
672
+ - **WEEKS ARE STRANGE:** You can use `w` in input (e.g., `'1w'::duration`, `bucket(this, 2w)`), but output always shows
673
+ days instead of weeks (e.g., `'1w'::duration` outputs `7d`)
674
+ - Use `bucket()` function to round durations into time chunks
675
+ - Duration values can be formatted and compared like other types
676
+
677
+ ### Type Casting
678
+
679
+ SuperDB uses `::type` syntax for type conversions (not function calls):
680
+
681
+ ```bash
682
+ # Integer conversion (truncates decimals)
683
+ super -c "values 1234.56::int64" # outputs: 1234
684
+
685
+ # String conversion
686
+ super -c "values 42::string" # outputs: "42"
687
+
688
+ # Float conversion
689
+ super -c "values 100::float64" # outputs: 100.0
690
+
691
+ # Chaining casts
692
+ super -c "values (123.45::int64)::string" # outputs: "123"
693
+ ```
694
+
695
+ **Important:**
696
+
697
+ - Use `::type` syntax, NOT function calls like `int64(value)`, `string(value)`, etc.
698
+ - **Historical note:** Earlier SuperDB pre-releases supported function-style casting like `int64(123.45)`, but this
699
+ syntax has been removed. Always use `::type` syntax instead.
700
+
701
+ ### Type Inference with `infer`
702
+
703
+ The `infer` operator samples input and auto-detects native types for string
704
+ values, casting them automatically. Useful for cleaning up CSV or other
705
+ string-heavy data:
706
+
707
+ ```bash
708
+ # Strings become ints, bools, timestamps, IPs where detected
709
+ echo '{"port":"80","active":"true","ts":"Jun 1, 2025"}' |
710
+ super -s -c "infer" -
711
+ # => {port:80,active:true,ts:2025-06-01T00:00:00Z}
712
+ ```
713
+
714
+ Candidate types: `int64`, `float64`, `ip`, `net`, `time`, `bool`. Default
715
+ sample size is 100 records; `infer 0` analyzes all input.
716
+
717
+ ### `defuse` and `unblend` functions
718
+
719
+ These reverse the effects of `fuse` and `blend`:
720
+
721
+ - `defuse(val)` — reverses `fuse` by converting fusion types back to original
722
+ types, downcasting union values to their subtype equivalent
723
+ - `unblend(val)` — reverses `blend` by removing union types and eliminating
724
+ optional fields without values
725
+
726
+ ### Rounding Numbers
727
+
728
+ SuperDB has a `round()` function that rounds to the nearest integer:
729
+
730
+ ```bash
731
+ # Round to nearest integer (single argument only)
732
+ super -c "values round(3.14)" # outputs: 3.0
733
+ super -c "values round(-1.5)" # outputs: -2.0
734
+ super -c "values round(1234.567)" # outputs: 1235.0
735
+
736
+ # For rounding to specific decimal places, use the multiply-cast-divide pattern
737
+ super -c "values ((1234.567 * 100)::int64 / 100.0)" # outputs: 1234.56 (2 decimals)
738
+ super -c "values ((1234.567 * 10)::int64 / 10.0)" # outputs: 1234.5 (1 decimal)
739
+ ```
740
+
741
+ **Key points:**
742
+
743
+ - `round(value)` only rounds to nearest integer, no decimal places parameter
744
+ - For rounding to N decimals: multiply by 10^N, cast to int64, divide by 10^N
745
+ - Cast to `::int64` truncates decimals (doesn't round)
746
+
747
+ ### String Interpolation and F-strings
748
+
749
+ SuperDB supports f-string interpolation for formatting output:
750
+
751
+ ```
752
+ # Basic f-string with variable interpolation
753
+ | values f'Message: {field_name}'
754
+
755
+ # Type conversion needed for numbers
756
+ | values f'Count: {count::string} items'
757
+
758
+ # Multiple fields
759
+ | values f'{prefix}: {count::string} {grid_type} wins!'
760
+ ```
761
+
762
+ **Important:**
763
+
764
+ - Numbers must be converted to strings using `::string` casting
765
+ - F-strings use single quotes with `f'...'` prefix
766
+ - Variables are referenced with `{variable_name}` syntax
767
+ - `null` values are silently ignored in f-strings (as of v0.3.0):
768
+ `f'hello {null} world'` produces `"hello world"`
769
+
770
+ ### Avoid jq syntax
771
+
772
+ There's very little jq syntax that is valid in SuperDB.
773
+
774
+ - Do not use `// 0` - jq's alternative operator is not valid in SuperDB; use `coalesce(field, 0)` instead.
775
+
776
+ - SuperDB uses **0-based indexing** by default. Use `pragma index_base = 1` to switch to 1-based indexing within a scope:
777
+
778
+ ```
779
+ -- Default: 0-based
780
+ values [10,20,30][0] -- 10
781
+
782
+ -- Switch to 1-based in a scope
783
+ pragma index_base = 1
784
+ values [10,20,30][1] -- 10
785
+
786
+ -- Pragmas are lexically scoped
787
+ pragma index_base = 1
788
+ values {
789
+ a: this[2:3], -- 1-based: [20]
790
+ b: (
791
+ pragma index_base = 0
792
+ values this[0] -- 0-based: 10
793
+ )
794
+ }
795
+ ```
796
+
797
+ ## Pragmas
798
+
799
+ Pragmas control language features and appear in declaration blocks with lexical scoping:
800
+
801
+ ```
802
+ pragma <id> [ = <expr> ]
803
+ ```
804
+
805
+ If `<expr>` is omitted, it defaults to `true`. Available pragmas:
806
+
807
+ - **`index_base`** — `0` (default) for zero-based indexing, `1` for one-based indexing
808
+ - **`pg`** — `false` (default, Google SQL semantics) or `true` (PostgreSQL semantics for GROUP BY identifier resolution)
809
+
810
+ ## SuperDB Quoting Rules (Bash Integration)
811
+
812
+ Follow these quoting rules when calling `super` from bash:
813
+
814
+ - Use double quotes for the `-c` parameter: `super -s -c "..."`
815
+ - Use single quotes inside SuperDB queries: `{type:10, content:'$variable'}`
816
+ - Avoid escaping double quotes inside SuperDB — use single quotes instead
817
+ - This allows bash interpolation while avoiding quote escaping issues
818
+
819
+ **Examples:**
820
+
821
+ ```bash
822
+ # CORRECT: Double quotes for -c, single quotes inside
823
+ super -j -c "values {type:10, content:'$message'}"
824
+
825
+ # WRONG: Single quotes for -c prevents bash interpolation
826
+ super -j -c 'values {type:10, content:"$message"}'
827
+
828
+ # WRONG: Escaping double quotes inside is error-prone
829
+ super -j -c "values {type:10, content:\"$message\"}"
830
+ ```
831
+
832
+ ## SuperDB Array Filtering (Critical Pattern)
833
+
834
+ **`where` operates on streams, not arrays directly**. To filter elements from an array:
835
+
836
+ **Correct pattern:**
837
+
838
+ ```bash
839
+ # Filter nulls from an array
840
+ super -j -c "
841
+ [array_elements]
842
+ | unnest this
843
+ | where this is not null
844
+ | collect(this)"
845
+ ```
846
+
847
+ **Key points:**
848
+
849
+ - `unnest this` - converts array to stream of elements
850
+ - `where this is not null` - filters elements (note: use `is not null`, not `!= null`)
851
+ - `collect(this)` - reassembles stream back into array
852
+
853
+ **Wrong approaches:**
854
+
855
+ ```bash
856
+ # WRONG: where doesn't work directly on arrays
857
+ super -s -c "[1,null,2] | where this != null"
858
+
859
+ # WRONG: incorrect null comparison syntax
860
+ super -s -c "unnest this | where this != null"
861
+ ```
862
+
863
+ ## Aggregate Functions
864
+
865
+ Aggregate functions (`count()`, `sum()`, `avg()`, `min()`, `max()`, `collect()`,
866
+ etc.) can **only** be used inside `aggregate`/`summarize` operators. Using them
867
+ in expression context (e.g., `put row:=count()`) is an error:
868
+
869
+ ```
870
+ call to aggregate function in non-aggregate context
871
+ ```
872
+
873
+ This was changed in [PR #6355](https://github.com/brimdata/super/pull/6355).
874
+ Earlier versions of SuperDB/Zed allowed "streaming aggregations" in expression
875
+ context, but this was removed for SQL compatibility and parallelization.
876
+
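+ The error and its fix, side by side:
+
+ ```
+ # Error: call to aggregate function in non-aggregate context
+ values 1,2,3 | put row:=count()
+
+ # OK: aggregate context
+ values 1,2,3 | aggregate n:=count(), total:=sum(this)
+ ```
+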
877
+ ### Aggregate / Summarize: Summary Output
878
+
879
+ Use `aggregate` (or its synonym `summarize`) to produce summary output; it can be parallelized.
880
+
881
+ ```bash
882
+ # Single summary across all records
883
+ echo '{"x":1}
884
+ {"x":2}
885
+ {"x":3}' |
886
+ super -j -c "aggregate total:=count(), sum_x:=sum(x)" -
887
+
888
+ # Output:
889
+ # {"total":3,"sum_x":6}
890
+
891
+ # Group by category
892
+ echo '{"category":"A","amount":10}
893
+ {"category":"B","amount":20}
894
+ {"category":"A","amount":15}' |
895
+ super -j -c "aggregate total:=sum(amount) by category | sort category" -
896
+
897
+ # Output:
898
+ # {"category":"A","total":25}
899
+ # {"category":"B","total":20}
900
+ ```
901
+
902
+ ### The `count` Operator (Row Numbering)
903
+
904
+ For sequential row numbering — the most common former use of expression-context
905
+ `count()` — use the **`count` operator** ([PR #6344](https://github.com/brimdata/super/pull/6344)):
906
+
907
+ ```bash
908
+ # Default: wraps input in "that" field, adds "count" field
909
+ super -s -c "values 1,2,3 | count"
910
+ # {that:1,count:1}
911
+ # {that:2,count:2}
912
+ # {that:3,count:3}
913
+
914
+ # Custom record expression with count alias
915
+ super -s -c "values 1,2,3 | count {value:this, c}"
916
+ # {value:1,c:1}
917
+ # {value:2,c:2}
918
+ # {value:3,c:3}
919
+
920
+ # Spread input fields alongside the count
921
+ super -s -c "values {a:1},{b:2},{c:3} | count | {row:count,...that}"
922
+ # {row:1,a:1}
923
+ # {row:2,b:2}
924
+ # {row:3,c:3}
925
+ ```
926
+
927
+ **No replacement exists** for other streaming patterns (`sum`, `avg`, `min`,
928
+ `max`, progressive `collect`). Window functions are planned post-GA
929
+ ([#5921](https://github.com/brimdata/super/issues/5921)).