@chrismo/superkit 1.0.0
- package/LICENSE.txt +29 -0
- package/README.md +26 -0
- package/dist/cli/pager.d.ts +6 -0
- package/dist/cli/pager.d.ts.map +1 -0
- package/dist/cli/pager.js +21 -0
- package/dist/cli/pager.js.map +1 -0
- package/dist/cli/skdoc.d.ts +3 -0
- package/dist/cli/skdoc.d.ts.map +1 -0
- package/dist/cli/skdoc.js +42 -0
- package/dist/cli/skdoc.js.map +1 -0
- package/dist/cli/skgrok.d.ts +3 -0
- package/dist/cli/skgrok.d.ts.map +1 -0
- package/dist/cli/skgrok.js +21 -0
- package/dist/cli/skgrok.js.map +1 -0
- package/dist/cli/skops.d.ts +3 -0
- package/dist/cli/skops.d.ts.map +1 -0
- package/dist/cli/skops.js +32 -0
- package/dist/cli/skops.js.map +1 -0
- package/dist/index.d.ts +10 -0
- package/dist/index.d.ts.map +1 -0
- package/dist/index.js +11 -0
- package/dist/index.js.map +1 -0
- package/dist/lib/docs.d.ts +11 -0
- package/dist/lib/docs.d.ts.map +1 -0
- package/dist/lib/docs.js +29 -0
- package/dist/lib/docs.js.map +1 -0
- package/dist/lib/expert-sections.d.ts +32 -0
- package/dist/lib/expert-sections.d.ts.map +1 -0
- package/dist/lib/expert-sections.js +130 -0
- package/dist/lib/expert-sections.js.map +1 -0
- package/dist/lib/grok.d.ts +15 -0
- package/dist/lib/grok.d.ts.map +1 -0
- package/dist/lib/grok.js +57 -0
- package/dist/lib/grok.js.map +1 -0
- package/dist/lib/help.d.ts +20 -0
- package/dist/lib/help.d.ts.map +1 -0
- package/dist/lib/help.js +163 -0
- package/dist/lib/help.js.map +1 -0
- package/dist/lib/recipes.d.ts +29 -0
- package/dist/lib/recipes.d.ts.map +1 -0
- package/dist/lib/recipes.js +133 -0
- package/dist/lib/recipes.js.map +1 -0
- package/dist/superkit.tar.gz +0 -0
- package/docs/grok-patterns.sup +89 -0
- package/docs/recipes/array.md +66 -0
- package/docs/recipes/array.spq +31 -0
- package/docs/recipes/character.md +110 -0
- package/docs/recipes/character.spq +57 -0
- package/docs/recipes/escape.md +159 -0
- package/docs/recipes/escape.spq +102 -0
- package/docs/recipes/format.md +51 -0
- package/docs/recipes/format.spq +24 -0
- package/docs/recipes/index.md +23 -0
- package/docs/recipes/integer.md +101 -0
- package/docs/recipes/integer.spq +53 -0
- package/docs/recipes/records.md +84 -0
- package/docs/recipes/records.spq +61 -0
- package/docs/recipes/string.md +177 -0
- package/docs/recipes/string.spq +105 -0
- package/docs/superdb-expert.md +929 -0
- package/docs/tutorials/bash_to_sup.md +123 -0
- package/docs/tutorials/chess-tiebreaks.md +233 -0
- package/docs/tutorials/debug.md +439 -0
- package/docs/tutorials/fork_for_window.md +296 -0
- package/docs/tutorials/grok.md +166 -0
- package/docs/tutorials/index.md +10 -0
- package/docs/tutorials/joins.md +79 -0
- package/docs/tutorials/moar_subqueries.md +35 -0
- package/docs/tutorials/subqueries.md +236 -0
- package/docs/tutorials/sup_to_bash.md +164 -0
- package/docs/tutorials/super_db_update.md +34 -0
- package/docs/tutorials/unnest.md +113 -0
- package/docs/zq-to-super-upgrades.md +549 -0
- package/package.json +46 -0
@@ -0,0 +1,236 @@
---
title: "Subqueries"
name: subqueries
description: "Examples of correlated subqueries and derived table patterns in SuperDB."
layout: default
nav_order: 7
parent: Tutorials
superdb_version: "0.3.0"
last_updated: "2026-02-15"
---

# Subqueries

There are many different types of subqueries; so far, this document highlights
some common scenarios that may not have obvious implementations in SuperDB.

## Correlated Subqueries

[//]: # (TODO: file versions - phil's versions from Slack - NOT versions - issue #54)

Let's start with this simple dataset:

```json lines
{"id":1, "date":"2025-02-27", "foo": 3}
{"id":2, "date":"2025-02-27", "foo": 2}
{"id":3, "date":"2025-02-28", "foo": 5}
{"id":4, "date":"2025-02-28", "foo": 9}
```

We want to select the entries with the largest `foo` for each date.

One way to do this in SQL looks like this:

```sql
select id, date, foo
from data
where (date, foo) in
  (select date, max(foo) as max_foo
   from data
   group by date);
```

Another way, by joining a derived table:

```sql
select *
from data d
join
  (select date, max(foo) as max_foo
   from data
   group by date) max_foo
on d.date = max_foo.date and
   d.foo = max_foo.max_foo;
```

In `super` this can be done with piped operators and a lateral subquery:

```mdtest-input data.json
{"id":1, "date":"2025-02-27", "foo": 3}
{"id":2, "date":"2025-02-27", "foo": 2}
{"id":3, "date":"2025-02-28", "foo": 5}
{"id":4, "date":"2025-02-28", "foo": 9}
```

Here's an example using the piped `where` operator with `unnest ... into`:
```mdtest-command
super -s -c '
collect(this)
| {data: this}
| maxes:=[unnest this.data | foo:=max(foo) by date | values {date,foo}]
| unnest {this.maxes, this.data} into (
  where {this.data.date, this.data.foo} in this.maxes
  | values this.data
)' data.json
```
```mdtest-output
{id:1,date:"2025-02-27",foo:3}
{id:4,date:"2025-02-28",foo:9}
```

And a simpler example using the piped `join` operator with `from`:
```mdtest-command
super -s -c '
from data.json
| inner join (from data.json
  | foo:=max(foo) by date
  | values {date,foo})
  on {left.date,left.foo}={right.date,right.foo}
| values left
| sort id'
```
```mdtest-output
{id:1,date:"2025-02-27",foo:3}
{id:4,date:"2025-02-28",foo:9}
```
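As a point of reference, the same greatest-per-date selection can be sketched with ordinary Unix tools. This is only a rough analogy (whitespace-delimited fields, no types, and the column layout here is made up for illustration), not a SuperDB feature:

```bash
# Sort by date, then by foo descending, and keep the first row seen per date.
printf '%s\n' \
  '1 2025-02-27 3' \
  '2 2025-02-27 2' \
  '3 2025-02-28 5' \
  '4 2025-02-28 9' |
  sort -k2,2 -k3,3nr |
  awk '!seen[$2]++'
# => 1 2025-02-27 3
# => 4 2025-02-28 9
```

The `awk '!seen[$2]++'` idiom prints only the first line for each distinct value of field 2 (the date), which after the sort is the row with the largest `foo`.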

`super` also supports SQL syntax, and these subqueries work[^1]:

```mdtest-command
super -s -c '
select *
from "data.json"
where foo in (select max(foo), date
              from "data.json"
              group by date)'
```
```mdtest-output
{id:1,date:"2025-02-27",foo:3}
{id:4,date:"2025-02-28",foo:9}
```

If we save off the max data to a file first, then we can start to see how this
could look:
```mdtest-command
super -s -c '
select max(foo) as max_foo
from "data.json"
group by date' > max.sup

super -s -c '
select l.id, l.date, l.foo
from "data.json" l
join "max.sup" r
on l.foo==r.max_foo'
```
```mdtest-output
{id:1,date:"2025-02-27",foo:3}
{id:4,date:"2025-02-28",foo:9}
```

## Subquery with Related Data Join

A more realistic scenario: find the records with the top `score` per date, and
also pull in user information from a related table.

```mdtest-input scores.json
{"id":1, "date":"2025-02-27", "score": 3, "user_id": 101}
{"id":2, "date":"2025-02-27", "score": 2, "user_id": 102}
{"id":3, "date":"2025-02-28", "score": 5, "user_id": 101}
{"id":4, "date":"2025-02-28", "score": 9, "user_id": 103}
```

```mdtest-input users.json
{"user_id": 101, "name": "Moxie"}
{"user_id": 102, "name": "Ziggy"}
{"user_id": 103, "name": "Sprocket"}
```

First, the basic join returns all records with user names:
```mdtest-command
super -s -c '
select s.id, s.date, s.score, s.user_id, u.name
from "scores.json" s
join "users.json" u on s.user_id = u.user_id
order by s.id'
```
```mdtest-output
{id:1,date:"2025-02-27",score:3,user_id:101,name:"Moxie"}
{id:2,date:"2025-02-27",score:2,user_id:102,name:"Ziggy"}
{id:3,date:"2025-02-28",score:5,user_id:101,name:"Moxie"}
{id:4,date:"2025-02-28",score:9,user_id:103,name:"Sprocket"}
```

Filtering to top scores per date using a subquery:
```mdtest-command
super -s -c '
select *
from "scores.json"
where score in (select max(score), date
                from "scores.json"
                group by date)'
```
```mdtest-output
{id:1,date:"2025-02-27",score:3,user_id:101}
{id:4,date:"2025-02-28",score:9,user_id:103}
```

The "obvious" SQL approach with tuple comparison returns an empty result; this
is a known issue ([#6326](https://github.com/brimdata/super/issues/6326)):
```mdtest-command
super -s -c '
select s.id, s.date, s.score, s.user_id, u.name
from "scores.json" s
join "users.json" u on s.user_id = u.user_id
where (s.date, s.score) in (
  select date, max(score)
  from "scores.json"
  group by date)'
```
```mdtest-output
```

A derived table approach (subquery in FROM) does work:
```mdtest-command
super -s -c '
select s.id, s.date, s.score, s.user_id, u.name
from "scores.json" s
join (
  select date, max(score) as max_score
  from "scores.json"
  group by date
) m on s.date = m.date and s.score = m.max_score
join "users.json" u on s.user_id = u.user_id
order by s.id'
```
```mdtest-output
{id:1,date:"2025-02-27",score:3,user_id:101,name:"Moxie"}
{id:4,date:"2025-02-28",score:9,user_id:103,name:"Sprocket"}
```

The piped approach also works: filter first, then join to get usernames:
```mdtest-command
super -s -c '
from "scores.json"
| where score in (select max(score), date from "scores.json" group by date)
| inner join (from "users.json") on left.user_id=right.user_id
| select left.id, left.date, left.score, right.name
| sort id'
```
```mdtest-output
{id:1,date:"2025-02-27",score:3,name:"Moxie"}
{id:4,date:"2025-02-28",score:9,name:"Sprocket"}
```

[^1]: SQL subqueries that reference files re-read the file for each subquery,
which increases CPU usage and wall time compared to the piped approach.

## as of versions

```mdtest-command
super --version
```
```mdtest-output
Version: v0.2.0
```
@@ -0,0 +1,164 @@
---
title: "Optimizing Sup Values into Bash Variables"
name: sup-to-bash
description: "Optimizing SuperDB output into Bash variables efficiently."
layout: default
nav_order: 8
parent: Tutorials
superdb_version: "0.3.0"
last_updated: "2026-02-15"
---

# Optimizing Sup Values into Bash Variables

[//]: # (TODO: THIS NEEDS THE FULL DOCUMENTATION TREATMENT)

While you should try to push as much logic as possible into SuperDB commands,
inevitably you'll need these values in variables in your language of choice.

One of the simplest such languages is Bash, where a common pattern looks
something like this:

```bash
echo '
{"InstanceId":"i-05b132aa000f0afa0",
"InstanceType":"t4g.teeny",
"LaunchTime":"2025-04-01T12:34:56+00:00",
"PrivateIpAddress":"10.0.1.2"}
' > ec2s.json

instance_id=$(super -f line -c "values InstanceId" ec2s.json)
instance_type=$(super -f line -c "values InstanceType" ec2s.json)
launch_time=$(super -f line -c "values LaunchTime" ec2s.json)
private_ip=$(super -f line -c "values PrivateIpAddress" ec2s.json)

echo "$instance_id" : "$instance_type" : "$launch_time" : "$private_ip"
```

But that just seems ... slow. And repetitive. How can we make this better?

If we can trust that there are no spaces in the data, we can do this:

```bash
read -r instance_id instance_type launch_time private_ip <<<"$(
  echo '
  {"InstanceId":"i-05b132aa000f0afa0",
  "InstanceType":"t4g.teeny",
  "LaunchTime":"2025-04-01T12:34:56+00:00",
  "PrivateIpAddress":"10.0.1.2"}
  ' |
    super -f line -c "
      [InstanceId,InstanceType,LaunchTime,PrivateIpAddress]
      | join(this, ' ')" -
)"

echo "$instance_id" : "$instance_type" : "$launch_time" : "$private_ip"
```

The IFS env var controls how Bash splits strings; it defaults to space, tab, and
newline. If we need tab-delimited output from super to support spaces in values,
we can do this:

```bash
IFS=$'\t' read -r name title <<<"$(
  echo '{"Name":"David Lloyd George","Title":"Prime Minister"}' |
    super -f line -c "[Name,Title] | join(this, '\t')" -
)"

echo "$title" : "$name"
```
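To see why the tab matters, here's the same split in pure Bash with no `super` in the loop. With the default IFS, the spaces inside the name also split the fields:

```bash
# Default IFS splits on spaces AND tabs, so the name breaks apart:
read -r name title <<<$'David Lloyd George\tPrime Minister'
echo "$name"    # => David

# With IFS set to tab only, the embedded spaces survive:
IFS=$'\t' read -r name title <<<$'David Lloyd George\tPrime Minister'
echo "$name"    # => David Lloyd George
```

Note that prefixing the assignment (`IFS=$'\t' read ...`) scopes the IFS change to that one `read`, so the rest of the script keeps the default splitting.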

If we want a simpler SuperDB command that just outputs each value as a separate
string on its own line, we'll need to handle that with `mapfile` into a Bash
array. Accessing the Bash array is still fast, and this approach takes roughly
the same amount of time.

This version is more verbose on the Bash side of things, and is probably not
worth the simpler SuperDB command.

```bash
mapfile -t values <<<"$(
  echo '
  {"InstanceId":"i-05b132aa000f0afa0",
  "InstanceType":"t4g.teeny",
  "LaunchTime":"2025-04-01T12:34:56+00:00",
  "PrivateIpAddress":"10.0.1.2"}
  ' |
    super -f line -c "values InstanceId,InstanceType,LaunchTime,PrivateIpAddress" -
)"

instance_id="${values[0]}"
instance_type="${values[1]}"
launch_time="${values[2]}"
private_ip="${values[3]}"

echo "$instance_id" : "$instance_type" : "$launch_time" : "$private_ip"
```
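A pure-Bash aside: unlike whitespace splitting, `mapfile` keeps blank lines as empty array elements, so the line-per-value approach can preserve empty fields:

```bash
# mapfile preserves blank lines as empty elements:
mapfile -t vals < <(printf 'x\n\nz\n')
echo "${#vals[@]}"                            # => 3
echo "${vals[0]} : ${vals[1]} : ${vals[2]}"   # => x :  : z
```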

Except ...

### Empty Fields and IFS Whitespace

There's a flaw in the space- and tab-delimited options. If a field being
returned is *empty*, the adjacent delimiters will be collapsed together,
causing `read` to assign the remaining values to the wrong variables.

```bash
IFS=$'\t' read -r a b c <<<"$(
  echo '{"a":"x","b":"","c":"z"}' |
    super -f line -c "
      values [this.a, this.b, this.c]
      | join(this, '\t')
    " -
)"

echo "$a" : "$b" : "$c"
# => x : z :
```

> "If the value of IFS consists solely of IFS whitespace, any sequence of IFS
> whitespace characters delimits a field, so a field consists of characters that
> are not unquoted IFS whitespace, and null fields result only from quoting.
>
> If IFS contains a non-whitespace character, then any character in the value of
> IFS that is not IFS whitespace, along with any adjacent IFS whitespace
> characters, delimits a field. This means that adjacent non-IFS-whitespace
> delimiters produce a null field. A sequence of IFS whitespace characters also
> delimits a field.
>
> Explicit null arguments ("" or '') are retained and passed to commands as empty
> strings. Unquoted implicit null arguments, resulting from the expansion of
> parameters that have no values, are removed. Expanding a parameter with no value
> within double quotes produces a null field, which is retained and passed to a
> command as an empty string."

-- https://www.gnu.org/software/bash/manual/bash.html#Word-Splitting-1
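The whitespace vs. non-whitespace behavior can be seen side by side in pure Bash, without `super` involved:

```bash
# Tab is IFS whitespace: adjacent tabs collapse, and the empty field is lost.
IFS=$'\t' read -r a b c <<<$'x\t\tz'
echo "$a : $b : $c"   # => x : z :

# '|' is not IFS whitespace: adjacent delimiters produce a retained null field.
IFS='|' read -r a b c <<<'x||z'
echo "$a : $b : $c"   # => x :  : z
```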

So, maybe we can put quotes around our values? While that does retain the empty
field, the quotes aren't stripped out of the values, which makes sense, but
isn't helpful here.

```bash
IFS=$'\t' read -r a b c <<<"$(
  echo '{"a":"x","b":"","c":"z"}' |
    super -f line -c "
      values [f'\"{this.a}\"', f'\"{this.b}\"', f'\"{this.c}\"']
      | join(this, '\t')
    " -
)"

echo "$a" : "$b" : "$c"
# => "x" : "" : "z"
```

We should then be able to use a non-whitespace IFS to split the values and
retain empty fields:

```bash
IFS='|' read -r a b c <<<"$(
  echo '{"a":"x","b":"","c":"z"}' |
    super -f line -c "values [a, b, c] | join(this, '|')" -
)"

echo "$a" : "$b" : "$c"
# => x :  : z
```
@@ -0,0 +1,34 @@
---
title: "Updating Data in a Lake"
name: super-db-update
description: "Workarounds for updating data in a SuperDB lake."
layout: default
nav_order: 11
parent: Tutorials
superdb_version: "0.2.0"
last_updated: "2026-02-15"
---

# Updating Data in a Lake

There are plans to eventually support this, captured in [GitHub Issue
#4024](https://github.com/brimdata/super/issues/4024). But for now, we'll have
to fudge it.

All we can do for now are separate `delete` and `load` actions. It's safer to do
a load-then-delete: if the delete fails, we'll at least have duplicated data,
whereas a failure during a delete-then-load could leave us with no data at all.

Since we have unstructured data, we can attempt to track the ...

load: {id:4,foo:1,ts:time('2025-02-18T01:00:00')}
load: {id:4,foo:2,ts:time('2025-02-18T02:00:00')}
delete: -where 'id==4 and ts < 2025-02-18T02:00:00'

if it's typed data

delete: -where 'is(<foo>) ...'

if we need to double-check duplicate data:

'count() by is(<foo>)'
@@ -0,0 +1,113 @@
---
title: "unnest"
name: unnest
description: "Guide to the unnest operator including nested unnest...into patterns."
layout: default
nav_order: 9
parent: Tutorials
superdb_version: "0.3.0"
last_updated: "2026-02-15"
---

# unnest

The `unnest` operator has this signature in the super docs:

```
unnest <expr> [ into ( <query> ) ]
```

The simple form of `unnest` is straightforward: it loops over each value in the
`<expr>` array. In this simple example, we `collect` the values output by
`seq 1 3` into an array, and then `unnest` each value back out again.

```mdtest-command
seq 1 3 | super -s -c "
collect(this)
| unnest this
" -
```
```mdtest-output
1
2
3
```

## unnest ... into examples

`unnest ... into` is a bit more complicated, especially when `<expr>` is a
two-field record. The docs explain it like this:

> If `<expr>` is a record, it must have two fields of the form:
>
> `{<first>: <any>, <second>:<array>}`
>
> where `<first>` and `<second>` are arbitrary field names, `<any>` is any
> SuperSQL value, and `<array>` is an array value. In this case, the derived
> sequence has the form:
> ```
> {<first>: <any>, <second>:<elem0>}
> {<first>: <any>, <second>:<elem1>}
> ...
> ```

Let's expand on our previous example a bit. We collect the values 1, 2, 3 into
an array as before, then package that up as the value of both fields in a new
record.

```mdtest-command
seq 1 3 | super -s -c "
collect(this)
| {foo:this, bar:this}
| unnest {foo, bar} into (
  -- for each value in the bar array, pass that value plus the whole foo array
  values f'foo is {this.foo}, bar is {this.bar}'
)
" -
```
```mdtest-output
"foo is [1,2,3], bar is 1"
"foo is [1,2,3], bar is 2"
"foo is [1,2,3], bar is 3"
```

You can see here that `foo` is the `<any>` value and so is passed in as-is,
while the inner `bar` value is each individual element of the array.

But this means we can actually do a pair of nested `unnest` loops, like so:

```mdtest-command
seq 1 3 | super -s -c "
collect(this)
| {foo:this, bar:this}
| unnest {foo, bar} into (
  -- for each value in the bar array, pass that value plus the whole foo array
  unnest {this.bar, this.foo} into (
    -- now, for each value in the foo array, pass that value plus the single
    -- value from the outer unnest
    values f'foo is {this.foo}, bar is {this.bar}'
  )
)
" -
```
```mdtest-output
"foo is 1, bar is 1"
"foo is 2, bar is 1"
"foo is 3, bar is 1"
"foo is 1, bar is 2"
"foo is 2, bar is 2"
"foo is 3, bar is 2"
"foo is 1, bar is 3"
"foo is 2, bar is 3"
"foo is 3, bar is 3"
```
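In other words, the nested `unnest ... into` iterates like a pair of nested loops. A rough Bash analogy (this only illustrates the iteration order, not how SuperDB actually executes):

```bash
# Outer loop plays the role of the first unnest (over the bar array),
# the inner loop plays the role of the nested unnest (over the foo array):
for bar in 1 2 3; do
  for foo in 1 2 3; do
    echo "foo is $foo, bar is $bar"
  done
done
# First line:  foo is 1, bar is 1
# Last line:   foo is 3, bar is 3
```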

## as of versions

```mdtest-command
super --version
```
```mdtest-output
Version: v0.2.0
```