superdb-mcp 0.51231.0
- package/LICENSE +28 -0
- package/README.md +160 -0
- package/dist/index.d.ts +3 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +438 -0
- package/dist/index.js.map +1 -0
- package/dist/lib/lsp-client.d.ts +86 -0
- package/dist/lib/lsp-client.d.ts.map +1 -0
- package/dist/lib/lsp-client.js +267 -0
- package/dist/lib/lsp-client.js.map +1 -0
- package/dist/lib/super.d.ts +26 -0
- package/dist/lib/super.d.ts.map +1 -0
- package/dist/lib/super.js +63 -0
- package/dist/lib/super.js.map +1 -0
- package/dist/lib/version.d.ts +57 -0
- package/dist/lib/version.d.ts.map +1 -0
- package/dist/lib/version.js +200 -0
- package/dist/lib/version.js.map +1 -0
- package/dist/tools/db.d.ts +56 -0
- package/dist/tools/db.d.ts.map +1 -0
- package/dist/tools/db.js +159 -0
- package/dist/tools/db.js.map +1 -0
- package/dist/tools/info.d.ts +63 -0
- package/dist/tools/info.d.ts.map +1 -0
- package/dist/tools/info.js +220 -0
- package/dist/tools/info.js.map +1 -0
- package/dist/tools/lsp.d.ts +37 -0
- package/dist/tools/lsp.d.ts.map +1 -0
- package/dist/tools/lsp.js +131 -0
- package/dist/tools/lsp.js.map +1 -0
- package/dist/tools/query.d.ts +51 -0
- package/dist/tools/query.d.ts.map +1 -0
- package/dist/tools/query.js +164 -0
- package/dist/tools/query.js.map +1 -0
- package/docs/superdb-expert.md +814 -0
- package/docs/zq-to-super-upgrades.md +408 -0
- package/package.json +43 -0
@@ -0,0 +1,814 @@
---
name: superdb-expert
description: "Expert guide for SuperDB queries and data transformations. Covers syntax, patterns, and best practices."
superdb_version: "0.51231"
last_updated: "2026-01-05"
source: "https://github.com/chrismo/superkit/blob/main/doc/superdb-expert.md"
---

# SuperDB Query Specialist

You are a SuperDB expert specializing in the unique SuperDB query language.

SuperDB has piping like jq, but IS NOT JQ.

SuperDB is NOT JavaScript; it has its own syntax and semantics. SuperDB puts
JSON and relational tables on equal footing with a super-structured data model.

## CRITICAL WARNING ABOUT ZED/ZQ LANGUAGE

**DO NOT REFERENCE zed.brimdata.io OR ZQ LANGUAGE DOCUMENTATION!**

- Zed and zq are OUTDATED languages that SuperDB is REPLACING
- SuperDB supports SOME legacy zq syntax but has made BREAKING CHANGES
- The old Zed language documentation at zed.brimdata.io is INCOMPATIBLE
- Only use SuperDB documentation at superdb.org and GitHub examples
- When in doubt, test syntax with actual SuperDB binary, not old examples

**ALWAYS use current SuperDB syntax, never assume Zed/zq patterns work!**

## Core Knowledge

### SuperDB Binary

- The binary is `super` (not `superdb`)
- Common flags:
  - `-c` for command/query
  - `-j` for JSON output
  - `-J` for pretty JSON
  - `-s` for SUP format
  - `-S` for pretty SUP
  - `-f` for output format (sup, json, bsup, csup, arrows, parquet, csv, etc.)
  - `-i` for input format
  - `-f line` for clean number formatting without type decorators

#### Old switches that are now ILLEGAL

- `-z` for deprecated ZSON name. Illegal - DO NOT USE
- `-Z` for deprecated ZSON name. Illegal - DO NOT USE

### Critical Rules

1. **Trailing dash**: ONLY use `-` at the end of a super command when piping
   stdin. Never use it without stdin or super returns empty.

   - Bad: `super -j -c "values {token: \"$token\"}" -` (no stdin)
   - Good: `super -j -c "values {token: \"$token\"}"` (no stdin, no dash)
   - Good: `echo "$data" | super -j -c "query" -` (has stdin, has dash)

2. **Syntax differences from legacy Zed/zq**:

   - Use `values` instead of `yield`
   - Use `unnest` instead of `over`
   - Type casting: `cast(myvar, <int64>)` may require either `-s` or `-f line` for clean output.
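
The trailing-dash rule is easy to get wrong when `super` is launched from a wrapper script. As a minimal sketch in Python (the helper name `build_super_argv` and its parameters are hypothetical, not part of SuperDB or this package), the argument list could be assembled so that `-` is appended only when data will be piped on stdin:

```python
from typing import List


def build_super_argv(query: str, has_stdin: bool, output_flag: str = "-j") -> List[str]:
    """Build an argv list for the `super` binary (hypothetical helper).

    Applies the rule above: the trailing "-" is added only when the
    caller will actually pipe data to stdin.
    """
    argv = ["super", output_flag, "-c", query]
    if has_stdin:
        argv.append("-")  # tell super to read its input from stdin
    return argv


# No stdin: no trailing dash.
print(build_super_argv("values {token: 'abc'}", has_stdin=False))
# With stdin: trailing dash present.
print(build_super_argv("where done==true | count()", has_stdin=True))
```

Passing such a list to `subprocess.run` (rather than a single shell string) also sidesteps shell quoting, since each element reaches the program as one argument.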

## Language Syntax Reference

### Pipeline Pattern

SuperDB uses Unix-inspired pipeline syntax:

```
command | command | command | ...
```

### Fork Operations (Parallel Processing)

SuperDB supports fork operations for parallel data processing:

```
from source
| fork
  ( operator | filter | transform )
  ( operator | different_filter | transform )
| join on condition
```

- Each branch runs in parallel using parentheses syntax
- Branches can be combined, merged, or joined
- Without explicit join/merge, an implied "combine" operator forwards values
- **NEVER use `=>` fat arrow syntax - that's from old Zed language!**

## PostgreSQL Compatibility & Traditional SQL

SuperDB is rapidly evolving toward full PostgreSQL compatibility while maintaining
its unique pipe-style syntax. You can use either traditional SQL or pipe syntax.

### SQL Compatibility Features

- **Backward compatible**: Any SQL query is also a SuperSQL query
- **Embedded SQL**: SQL queries can appear as pipe operators anywhere
- **Mixed syntax**: Combine pipe and SQL syntax in the same query
- **SQL scoping**: Traditional SQL scoping rules apply inside SQL operators

### Common Table Expressions (CTEs)

SuperDB supports CTEs using standard WITH clause syntax:

```sql
with user_stats as (select user_id, count(*) as total_actions
                    from events
                    where date >= '2024-01-01'
                    group by user_id),
     active_users as (select user_id
                      from user_stats
                      where total_actions > 10)
select *
from active_users;
```
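
To see what the two chained CTEs compute, the same query can be run against a small in-memory table. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in SQL engine (it is not SuperDB, and the `events` rows are made-up sample data), so it illustrates the standard SQL semantics only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table events (user_id integer, date text)")

# Hypothetical sample data: user 1 has 11 actions in 2024 (plus one older
# action that the date filter drops), user 2 has only 3.
rows = [(1, "2024-02-01")] * 11 + [(2, "2024-03-01")] * 3 + [(1, "2023-12-31")]
conn.executemany("insert into events values (?, ?)", rows)

query = """
with user_stats as (select user_id, count(*) as total_actions
                    from events
                    where date >= '2024-01-01'
                    group by user_id),
     active_users as (select user_id
                      from user_stats
                      where total_actions > 10)
select *
from active_users;
"""

active = conn.execute(query).fetchall()
print(active)  # [(1,)]
```

The first CTE aggregates per user, and the second filters on that aggregate, so only user 1 is returned.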

### Traditional SQL Syntax

Standard SQL operations work alongside pipe operations:

```sql
-- Basic SELECT
select id, name, email
from users
where active = true;

-- JOINs
select u.name, p.title
from users u
join projects p on u.id = p.owner_id;

-- Subqueries
select name
from users
where id in (select user_id from projects where status = 'active');
```

### SQL + Pipe Hybrid Queries

Combine SQL and pipe syntax for maximum flexibility:

```sql
select union(type) as kinds, network_of(srcip) as net
from ( from source | ? "example.com" and "urgent")
where message_length > 100
group by net;
```

### PostgreSQL-Compatible Features

- Window functions (e.g., ROW_NUMBER(), RANK(), LAG(), LEAD())
- Advanced JOIN types (LEFT, RIGHT, FULL OUTER, CROSS)
- Aggregate functions (COUNT, SUM, AVG, MIN, MAX, STRING_AGG)
- CASE expressions and conditional logic
- Date/time functions and operations
- Array and JSON operations
- Regular expressions (SIMILAR TO, regexp functions)

**Note**: PostgreSQL compatibility is actively being developed. Some features
may have subtle differences from pure PostgreSQL behavior.

### Core Operators

#### unnest

Flattens arrays into individual elements:

```
# Input: [1,2,3]
# Query: unnest this
# Output: 1, 2, 3
```

#### switch

Conditional processing with cases:

```
switch
  case a == 2 ( put v:='two' )
  case a == 1 ( put v:='one' )
  case a == 3 ( values null )
  case true ( put a:=-1, count:=count() )
```

**Adding fields with switch:**
Use `put field:='value'` to add new fields to records:

```
| switch
  case period=='today' ( put prefix:='Daily milestone' )
  case period=='week' ( put prefix:='Weekly milestone' )
  case true ( put prefix:='All time milestone' )
```

#### cut

Select specific fields (like SQL SELECT):

```
cut field1, field2, nested.field, new_name:=old_name
```

NOTE: you can REORDER the output with cut as well.

#### drop

Remove specific fields:

```
drop unwanted_field, nested.unwanted
```

#### put

Add or modify fields:

```
put new_field:=value, computed:=field1+field2
```

#### join

Combine data streams:

```
join on key=key other_stream
```

#### search

Pattern matching:

```
search "keyword"
search /regex_pattern/
? "keyword" # shorthand for search
```

#### where

Filter records:

```
where field > 100 AND status == "active"
```

#### aggregate/summarize

Group and aggregate data:

```
aggregate count:=count(), sum:=sum(amount) by category
summarize avg(value), max(value) by group
```

#### sort

Order results:

```
sort field
sort -r field # reverse
sort field1, -field2 # multi-field
```

#### head/tail

Limit results:

```
head 10
tail 5
```

#### uniq

Remove duplicates:

```
uniq
uniq -c # with count
```

## Practical Query Patterns

### Basic Transformations

```bash
# Convert JSON to SUP
super -s data.json

# Filter and transform
echo '{"a":1,"b":2}' | super -s -c "put c:=a+b | drop a" -

# Type conversion with clean output (use ::type casts, not int64(...))
super -f line -c "values 123.45::int64"
```

### Complex Pipelines

```bash
# Search, filter, and aggregate - return JSON
super -j -c '
  search "error"
  | where severity > 3
  | aggregate count:=count() by type
  | sort -count
' logs.json

# Fork operation with parallel branches - return SUP text
super -s -c '
  from data.json
  | fork
    ( where type=="A" | put tag:="alpha" )
    ( where type=="B" | put tag:="beta" )
  | sort timestamp
'
```

### Data Type Handling

```bash
# Mixed-type arrays - return pretty-printed JSON
echo '[1, "foo", 2.3, true]' | super -J -c "unnest this" -

# Type switching - return pretty-printed SUP
# typeof() returns a type value, so compare against <int64>, not the string "int64"
super -S -c '
  switch
    case typeof(value) == <int64> ( put category:="number" )
    case typeof(value) == <string> ( put category:="text" )
    case true ( put category:="other" )
' mixed.json
```

### SQL Syntax Examples

Traditional SQL syntax works seamlessly with SuperDB:

#### Traditional SELECT queries
```bash
super -s -c "SELECT * FROM users WHERE age > 21" users.json
```

#### CTEs (Common Table Expressions)
```bash
super -s -c "
  WITH recent_orders AS (
    SELECT customer_id, order_date, total
    FROM orders
    WHERE order_date >= '2024-01-01'
  ),
  customer_totals AS (
    SELECT customer_id, SUM(total) as yearly_total
    FROM recent_orders
    GROUP BY customer_id
  )
  SELECT c.name, ct.yearly_total
  FROM customers c
  JOIN customer_totals ct ON c.id = ct.customer_id
  WHERE ct.yearly_total > 1000;
" orders.json
```

#### Window functions
```bash
super -s -c "
  SELECT
    name,
    salary,
    RANK() OVER (ORDER BY salary DESC) as salary_rank,
    LAG(salary) OVER (ORDER BY salary) as prev_salary
  FROM employees
" employees.json
```

#### Mixed SQL and pipe syntax
```bash
super -s -c "
  SELECT name, processed_date
  FROM ( from logs.json | ? 'error' | put processed_date:=now() )
  WHERE processed_date IS NOT NULL
  ORDER BY processed_date DESC;
" logs.json
```

#### Joins
```bash
echo '{"id":1,"name":"foo"}
{"id":2,"name":"bar"}' > people.json

echo '{id:1,person_id:1,exercise:"tango"}
{id:2,person_id:1,exercise:"typing"}
{id:3,person_id:2,exercise:"jogging"}
{id:4,person_id:2,exercise:"cooking"}' > exercises.sup

# joins supported: left, right, inner, full outer, anti
super -c "
  select * from people.json people
  join exercises.sup exercises
  on people.id=exercises.person_id
"

# where ... is null not supported yet
# unless coalesce used in the select clause
super -c "
  select * from people.json people
  left join exercises.sup exercises
  on people.id=exercises.person_id
  where is_error(exercises.exercise)
"
```

#### WHERE Clause Tips

##### Negation

`where !(this in $json)` is invalid!

`where not (this in $json)` is valid!

### Tips

- Merge together `super` calls whenever you can.

**Not as Good**

```bash
_current_tasks "| where done==true" | super -s -c "count()" -
```

**Better**

```bash
_current_tasks | super -s -c "where done==true | count()" -
```

## Advanced SuperDB Features

### Type System

- Strongly typed with dynamic flexibility
- Algebraic types (sum and product types)
- First-class type values
- Type representation: `<[int64|string]>` for mixed types

### Nested Field Access

```
# Access nested fields
cut user.profile.name, user.settings.theme

# Nested access with a fallback (use coalesce; JavaScript-style ?. and ?? are not SuperDB syntax)
put display_name:=coalesce(user.profile.name, "Anonymous")
```

### Time Operations

**Type representation:**

- `time`: signed 64-bit integer as nanoseconds from epoch
- `duration`: signed 64-bit integer as nanoseconds

```
# Current time
ts:=now()

# Time comparisons
where ts > 2024-01-01T00:00:00Z

# Time formatting
put formatted:=strftime("%Y-%m-%d", ts)
```

### Grok Pattern Parsing

Parse unstructured strings into structured records using predefined grok patterns:

```bash
# Parse log line with predefined patterns
grok("%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}", log_line)

# Common pattern examples
grok("%{IP:client_ip} %{WORD:method} %{URIPATH:path}", access_log)
grok("%{NUMBER:duration:float} %{WORD:unit}", "123.45 seconds")

# With custom pattern definitions (third argument)
grok("%{CUSTOM:field}", input_string, "CUSTOM \\d{3}-\\d{4}")
```

Returns a record with named fields extracted from the input string.

**Using with raw text files:**

```bash
# Parse log file line-by-line
super -i line -s -c 'put parsed:=grok("%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}", this)' app.log

# Filter parsed results
super -i line -j -c 'grok("%{IP:ip} %{NUMBER:status:int} %{NUMBER:bytes:int}", this) | where status >= 400' access.log
```

**Using with structured data:**

```bash
# Parse string field from JSON records (no -i line needed)
echo '{"raw_log":"2024-01-15 ERROR Database connection failed"}' |
  super -j -c 'put parsed:=grok("%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}", raw_log)' -
```
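
For readers more familiar with regular expressions, a grok pattern behaves roughly like a regex with named capture groups plus optional type conversion. The Python sketch below mimics the access-log example above; the regex fragments are simplified stand-ins written for this illustration (real grok definitions of `IP` and `NUMBER` are more permissive), and returning `None` for a non-matching line is this sketch's choice, not a statement about how `grok()` reports failures:

```python
import re
from typing import Optional

# Simplified stand-ins for the grok patterns IP, NUMBER and NUMBER.
ACCESS_RE = re.compile(
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) (?P<status>\d+) (?P<bytes>\d+)"
)


def parse_access_line(line: str) -> Optional[dict]:
    """Return a record of named fields, or None when the line does not match."""
    m = ACCESS_RE.match(line)
    if m is None:
        return None
    # The ":int" suffix in the grok pattern corresponds to these conversions.
    return {"ip": m["ip"], "status": int(m["status"]), "bytes": int(m["bytes"])}


lines = ["10.0.0.1 200 512", "10.0.0.2 404 128", "not a log line"]
parsed = [rec for rec in map(parse_access_line, lines) if rec is not None]

# Equivalent in spirit to the `| where status >= 400` stage.
errors = [rec for rec in parsed if rec["status"] >= 400]
print(errors)
```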

### Array and Record Concatenation

Use the spread operator (`...`): inside `[...]` to concatenate arrays, inside `{...}` to merge records.

```bash
super -s -c "{a:[], b:[]} | [...a, ...b]" # => []
super -s -c "{a:[1], b:[]} | [...a, ...b]" # => [1]
super -s -c "{a:[1], b:[2,3]} | [...a, ...b]" # => [1,2,3]
```

```bash
super -s -c "{a:{}, b:{}} | {...a, ...b}" # => {}
super -s -c "{a:{c:1}, b:{}} | {...a, ...b}" # => {c:1}
super -s -c "{a:{c:1}, b:{d:'foo'}} | {...a, ...b}" # => {c:1, d:'foo'}
```

## Debugging Tips

### Common Issues and Solutions

1. **Empty Results**

   - Check for a trailing `-` without stdin
   - Check for no trailing `-` with stdin (sometimes you get output anyway but this is usually wrong!)
   - Verify field names match exactly (case-sensitive)
   - Check type mismatches in comparisons

2. **Type Errors**

   - Use `typeof()` to inspect types
   - Cast explicitly with `::type` syntax: `::int64`, `::string`, `::float64`
   - Use `-f line` for clean numeric output

3. **Performance Issues**

   - Use `head` early in pipeline to limit data
   - Aggregate before sorting when possible
   - Use vectorized operations (vector: true in tests)

4. **Complex Queries**

   - Break into smaller pipelines for debugging
   - Use `super -s -c "values this"` to inspect intermediate data
   - Add `| head 5` to preview results during development

### Debugging Commands

```bash
# Inspect data structure
echo "$data" | super -S -c "head 1" -

# Check field types
echo "$data" | super -s -c "put types:=typeof(this)" -

# Count records at each stage
super -s -c "query | put stage1:=count()" data.json
super -s -c "query | filter | put stage2:=count()" data.json

# Validate syntax without execution
super -s -c "your query" -n
```

## Format Conversions

### Input/Output Formats

```bash
# JSON to Parquet
super -f parquet data.json >data.parquet

# CSV to JSON with pretty print
super -J data.csv

# Multiple formats to Arrow
super -f arrows file1.json file2.parquet file3.csv >combined.arrows

# SUP format (self-describing)
super -s mixed-data.json >structured.sup
```

## Key Differences from SQL

1. **Pipe syntax** instead of nested queries
2. **Polymorphic operators** work across types
3. **First-class arrays** and nested data
4. **No NULL** - use error values or missing fields
5. **Type-aware operations** with automatic handling
6. **Streaming architecture** for large datasets

### Date and Time

`date_trunc` is a valid PostgreSQL function, but it is not yet supported in
SuperDB. For the time being, use `bucket(now(), 1d)` instead of
`date_trunc('day', now())`.

### Duration Type Conversions

Converting numeric values (like milliseconds) to duration types uses f-string interpolation and type casting:

**Basic patterns:**

```bash
# Convert milliseconds to duration
super -c "values 993958 | values f'{this}ms'::duration"

# Convert to seconds first, then duration
super -c "values 993958 / 1000 | values f'{this}s'::duration"

# Round duration to buckets (e.g., 15 minute chunks)
super -c "values 993958 / 1000 | values f'{this}s'::duration | bucket(this, 15m)"
```

**Key points:**

- Use f-string interpolation: `f'{this}ms'` or `f'{this}s'`
- Cast to duration with `::duration` suffix
- Common units: `ms` (milliseconds), `s` (seconds), `m` (minutes), `h` (hours), `d` (days), `w` (weeks), `y` (years)
- **MONTH IS NOT A SUPPORTED UNIT.**
- **WEEKS ARE STRANGE:** You can use `w` in input (e.g., `'1w'::duration`, `bucket(this, 2w)`), but output always shows
  days instead of weeks (e.g., `'1w'::duration` outputs `7d`)
- Use `bucket()` function to round durations into time chunks
- Duration values can be formatted and compared like other types
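
Because a duration is stored as a signed 64-bit count of nanoseconds, the conversions above reduce to integer arithmetic. The Python sketch below models that arithmetic only; the function names are hypothetical, and it assumes `bucket()` rounds down to a multiple of the span, which should be confirmed against the `super` binary:

```python
NS_PER_MS = 1_000_000
NS_PER_S = 1_000_000_000
NS_PER_MIN = 60 * NS_PER_S
NS_PER_DAY = 24 * 60 * NS_PER_MIN


def ms_to_duration_ns(ms: int) -> int:
    """Model of f'{ms}ms'::duration: a duration held as nanoseconds."""
    return ms * NS_PER_MS


def bucket_ns(duration_ns: int, span_ns: int) -> int:
    """Model of bucket(duration, span), assuming it rounds down to a span multiple."""
    return duration_ns - duration_ns % span_ns


elapsed = ms_to_duration_ns(993958)            # 16m33.958s as nanoseconds
bucketed = bucket_ns(elapsed, 15 * NS_PER_MIN)
print(bucketed // NS_PER_MIN)                  # 15

# One week is just seven days' worth of nanoseconds, which is consistent
# with '1w' being displayed as 7d.
one_week = 7 * NS_PER_DAY
print(one_week // NS_PER_DAY)                  # 7
```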

### Type Casting

SuperDB uses `::type` syntax for type conversions (not function calls):

```bash
# Integer conversion (truncates decimals)
super -c "values 1234.56::int64" # outputs: 1234

# String conversion
super -c "values 42::string" # outputs: "42"

# Float conversion
super -c "values 100::float64" # outputs: 100.0

# Chaining casts
super -c "values (123.45::int64)::string" # outputs: "123"
```

**Important:**

- Use `::type` syntax, NOT function calls like `int64(value)`, `string(value)`, etc.
- **Historical note:** Earlier SuperDB pre-releases supported function-style casting like `int64(123.45)`, but this
  syntax has been removed. Always use `::type` syntax instead.

### Rounding Numbers

SuperDB has a `round()` function that rounds to the nearest integer:

```bash
# Round to nearest integer (single argument only)
super -c "values round(3.14)" # outputs: 3.0
super -c "values round(-1.5)" # outputs: -2.0
super -c "values round(1234.567)" # outputs: 1235.0

# To keep a fixed number of decimal places, use the multiply-cast-divide pattern
# (the ::int64 cast truncates, so 1234.567 becomes 1234.56, not 1234.57)
super -c "values ((1234.567 * 100)::int64 / 100.0)" # outputs: 1234.56 (2 decimals)
super -c "values ((1234.567 * 10)::int64 / 10.0)" # outputs: 1234.5 (1 decimal)
```

**Key points:**

- `round(value)` only rounds to nearest integer, no decimal places parameter
- To keep N decimals: multiply by 10^N, cast to int64, divide by 10^N (this truncates rather than rounds)
- Cast to `::int64` truncates decimals (doesn't round)
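
The difference between truncating and rounding at a given decimal place is easy to miss. The Python sketch below models the multiply-cast-divide arithmetic (Python's `int()` truncates toward zero, standing in for the `::int64` cast described above) and contrasts it with a round-half-up variant; both function names are hypothetical:

```python
import math


def truncate_decimals(value: float, places: int) -> float:
    """Model of (value * 10^N)::int64 / 10^N: extra digits are dropped."""
    factor = 10 ** places
    return int(value * factor) / factor


def round_decimals(value: float, places: int) -> float:
    """Round-half-up variant for non-negative values, shown for comparison."""
    factor = 10 ** places
    return math.floor(value * factor + 0.5) / factor


print(truncate_decimals(1234.567, 2))  # 1234.56
print(truncate_decimals(1234.567, 1))  # 1234.5
print(round_decimals(1234.567, 2))     # 1234.57
```

If true rounding is needed, applying `round()` to the scaled value before dividing is the natural counterpart of the second function.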

### String Interpolation and F-strings

SuperDB supports f-string interpolation for formatting output:

```
# Basic f-string with variable interpolation
| values f'Message: {field_name}'

# Type conversion needed for numbers
| values f'Count: {count::string} items'

# Multiple fields
| values f'{prefix}: {count::string} {grid_type} wins!'
```

**Important:**

- Numbers must be converted to strings using `::string` casting
- F-strings use single quotes with `f'...'` prefix
- Variables are referenced with `{variable_name}` syntax

### Avoid jq syntax

There's very little jq syntax that is valid in SuperDB.

- Do not use ` // 0 ` - this is only valid in jq, not in SuperDB. You can use coalesce instead.

- SuperDB, like PostgreSQL, uses 1-based indexing. NEVER use `this[0]` in SuperDB, it won't work.
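
Both points can be pictured with a small Python model. It is only an analogy: `coalesce` here returns the first argument that is not `None` (SuperDB's own function may also skip error values), and `index_1_based` is a hypothetical helper showing how a 1-based position maps onto a 0-based language:

```python
from typing import Any, Sequence


def coalesce(*values: Any) -> Any:
    """Return the first argument that is not None, or None if all are."""
    for v in values:
        if v is not None:
            return v
    return None


def index_1_based(items: Sequence[Any], position: int) -> Any:
    """Look up an element by 1-based position, as in this[1] for the first item."""
    if position < 1:
        raise IndexError("positions start at 1")
    return items[position - 1]


print(coalesce(None, 0))                # 0, the role jq's `// 0` plays
print(index_1_based(["a", "b", "c"], 1))  # a
```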

## SuperDB Quoting Rules (Critical for Bash Integration)

**ALWAYS follow these quoting rules when SuperDB is called from bash:**

- **ALWAYS use double quotes for the `-c` parameter**: `super -s -c "..."`
- **ALWAYS use single quotes inside SuperDB queries**: `{type:10, content:'$variable'}`
- **NEVER escape double quotes inside SuperDB** - use single quotes instead
- This allows bash interpolation while avoiding quote escaping issues

**Examples:**

```bash
# CORRECT: Double quotes for -c, single quotes inside
super -j -c "values {type:10, content:'$message'}"

# WRONG: Single quotes for -c prevents bash interpolation
super -j -c 'values {type:10, content:"$message"}'

# WRONG: Escaping double quotes inside is error-prone
super -j -c "values {type:10, content:\"$message\"}"
```

## SuperDB Array Filtering (Critical Pattern)

**`where` operates on streams, not arrays directly**. To filter elements from an array:

**Correct pattern:**

```bash
# Filter nulls from an array
super -j -c "
  [array_elements]
  | unnest this
  | where this is not null
  | collect(this)"
```

**Key points:**

- `unnest this` - converts array to stream of elements
- `where this is not null` - filters elements (note: use `is not null`, not `!= null`)
- `collect(this)` - reassembles stream back into array

**Wrong approaches:**

```bash
# WRONG: where doesn't work directly on arrays
super -s -c "[1,null,2] | where this != null"

# WRONG: incorrect null comparison syntax
super -s -c "unnest this | where this != null"
```
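
The unnest / where / collect pattern is the familiar "explode, filter, regroup" idiom. As a rough Python analogy (not SuperDB code; the function name is hypothetical), each stage maps onto one step of a generator pipeline:

```python
from typing import Any, Iterable, List


def filter_nulls(array: Iterable[Any]) -> List[Any]:
    """Mirror the three-stage pattern above on a Python list."""
    stream = iter(array)                          # unnest this
    kept = (v for v in stream if v is not None)   # where this is not null
    return list(kept)                             # collect(this)


print(filter_nulls([1, None, 2]))  # [1, 2]
```

The point of the analogy is that the filter sees one element at a time; applying it to the array as a whole tests the array value itself, not its elements.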

## Aggregate Functions: Expression vs Operator Context

Aggregate functions in SuperDB work in two fundamentally different ways.
**Expression context** produces output for each input (incremental), while
**operator context** produces a single summary.

Reference: https://superdb.org/book/super-sql/expressions/aggregates.html

### Expression Context: Incremental Output

Produces one output per input, maintaining state across the stream. Use for
running totals, sequential IDs, or accumulating values.

```bash
# Running sum (accumulates with each input)
echo '{"amount":10}
{"amount":20}
{"amount":30}' |
  super -j -c "put running_total:=sum(amount)" -

# Output:
{"amount":10,"running_total":10}
{"amount":20,"running_total":30}
{"amount":30,"running_total":60}
```

### Aggregate Operator Context: Summary Output

With **`aggregate`** (or `summarize`), produces a single output summarizing all
inputs. Better performance, can be parallelized.

```bash
# Single summary across all records
echo '{"x":1}
{"x":2}
{"x":3}' |
  super -j -c "aggregate total:=count(), sum_x:=sum(x)" -

# Output:
{"total":3,"sum_x":6}

# Group by category
echo '{"category":"A","amount":10}
{"category":"B","amount":20}
{"category":"A","amount":15}' |
  super -j -c "aggregate total:=sum(amount) by category | sort category" -

# Output:
{"category":"A","total":25}
{"category":"B","total":20}
```
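
The two contexts correspond to a running accumulation versus a final reduction. The Python sketch below reproduces the three outputs shown above using only the standard library; it is a model of the semantics, not of how SuperDB evaluates the queries:

```python
from collections import defaultdict
from itertools import accumulate

# Expression context: one output per input, carrying the running state.
amounts = [10, 20, 30]
running = [
    {"amount": a, "running_total": t}
    for a, t in zip(amounts, accumulate(amounts))
]
print(running)

# Operator context: a single summary over all inputs.
xs = [1, 2, 3]
summary = {"total": len(xs), "sum_x": sum(xs)}
print(summary)

# Operator context with grouping: one summary per category, then sorted.
records = [
    {"category": "A", "amount": 10},
    {"category": "B", "amount": 20},
    {"category": "A", "amount": 15},
]
totals = defaultdict(int)
for rec in records:
    totals[rec["category"]] += rec["amount"]
grouped = [{"category": c, "total": totals[c]} for c in sorted(totals)]
print(grouped)
```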