@xcitedbs/client 0.2.14 → 0.2.16
- package/dist/client.d.ts +7 -1
- package/dist/client.js +12 -0
- package/dist/index.d.ts +1 -1
- package/dist/types.d.ts +14 -0
- package/llms-full.txt +2 -0
- package/llms.txt +1 -1
- package/package.json +4 -2
- package/unquery-ai-guide.md +1139 -0
- package/unquery-grammar.md +414 -0
@@ -0,0 +1,414 @@
# Unquery: grammar and parse tree (parser-faithful)

This document is the **ground truth** for Unquery syntax as implemented in the XCiteDB engine. It is derived from `TParser` in [`XCiteDB2/XCiteDB/src/TemplateParser.cpp`](../XCiteDB2/XCiteDB/src/TemplateParser.cpp) and the AST in [`XCiteDB2/XCiteDB/include/TemplateQuery.h`](../XCiteDB2/XCiteDB/include/TemplateQuery.h). Use it for **syntax checking** and **conservative static typing** of expressions inside JSON templates.

Human-oriented prose and examples: [Unquery reference manual](../web/src/docs/content/unquery-language-reference.html) (Asciidoc source: `web/src/docs/unquery-source/unquery-language-reference.adoc`).

---

## 1. Two-stage parsing

### 1.1 Stage 1: JSON template (`JSONToTQ`)

The outer query is a **JSON value** (object, array, or leaf). The host parses JSON; Unquery then walks it.

| JSON kind | AST / node | Notes |
|-----------|------------|--------|
| `array` | `TQArray` | Each element is a child `TemplateQuery`. |
| `object` | `TQObject` | Each member: parse **key string** then **value**. |
| `string` (top-level fragment) | `TQValue` / `TQValueWithCond` | Entire string passed to `TParser::value()`. |
| `number` | `TQValue` + `TExprIntConst` / `TExprDoubleConst` | Not parsed as Unquery text. |
| `true` / `false` | `TQValue` + `TExprBoolConst` | Not parsed as Unquery text. |
| `null` / missing | `TQValue` + `TExprJSONConst(null)` | Fallback when `JSONToTQ` gets an unsupported JSON kind. |

**Object member processing** (`JSONToTQ`, object branch):

1. `name = m.name.GetString()`: the JSON **key** is a string.
2. `TParser parse_name(name, st)`: same lexer/parser as values.
3. `key = parse_name.key()` then `context_mod = parse_name.context_mod(true, ContextModMode::Start)`.
4. If `m.value` is a **string**: parse with `TParser parse_val(...)` according to `key->getKeyType()` (see §7).
5. If `m.value` is not a string: `JSONToTQ(m.value)` recursively; optional `context_mod->replace(val)`.
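
The stage 1 dispatch in the table above can be pictured with a small Python sketch. It is only an illustration, not the engine's C++ `JSONToTQ`: the function name `classify_json_node` is hypothetical, and it assumes the integer/double split follows whatever the host JSON parser reports (here Python's `json` module).

```python
import json


def classify_json_node(value):
    """Return the node description the stage 1 table gives for a parsed JSON value."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "TQValue + TExprBoolConst"
    if isinstance(value, list):
        return "TQArray"
    if isinstance(value, dict):
        return "TQObject"
    if isinstance(value, str):
        # Strings are the only leaves handed to the stage 2 parser.
        return "TQValue / TQValueWithCond via TParser::value()"
    if isinstance(value, int):
        return "TQValue + TExprIntConst"
    if isinstance(value, float):
        return "TQValue + TExprDoubleConst"
    # null (None) and anything unsupported fall back to a JSON null constant.
    return "TQValue + TExprJSONConst(null)"


template = json.loads(
    '{"name": "Field1", "n": 3, "ratio": 0.5, "flag": true, "items": ["a"], "none": null}'
)
kinds = {key: classify_json_node(val) for key, val in template.items()}
```

Only the `"name"` member would go on to stage 2 string parsing; every other leaf becomes a constant without being read as Unquery text.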

**Symbol table**: `TSymTable` holds `funcs: map<string,int>` (name → parameter count). Populated when a `#func` key is parsed; used when parsing `$userfunc(...)` in expressions.
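
As a rough model of that symbol table, the Python sketch below stores name → parameter count and checks call arity. The class and method names (`SymTable`, `register`, `check_call`) are hypothetical, the error messages are invented for illustration, and the duplicate-definition check reflects checklist item 9 in §10 rather than code quoted from the parser.

```python
class SymTable:
    """Toy stand-in for TSymTable: maps a function name to its parameter count."""

    def __init__(self):
        self.funcs = {}

    def register(self, name, params):
        # Called when a `#func name(a, b, ...)` key is parsed.
        if name in self.funcs:
            raise ValueError(f"duplicate #func definition: {name}")
        self.funcs[name] = len(params)

    def check_call(self, name, args):
        # Called when `$name(arg, ...)` is parsed inside an expression.
        if name not in self.funcs:
            raise ValueError(f"unknown user function: ${name}")
        if len(args) != self.funcs[name]:
            raise ValueError(
                f"${name} expects {self.funcs[name]} argument(s), got {len(args)}"
            )


table = SymTable()
table.register("f", ["a", "b"])
table.check_call("f", ["Field1", "2"])  # arity matches, no error
```

Because the table is filled while keys are parsed, a call is only valid if its `#func` definition has already been seen in the same pass.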

### 1.2 Stage 2: string sublanguages

| Sublanguage | Entry point | Typical use |
|-------------|---------------|-------------|
| Object **key** (before context modifiers) | `TParser::key()` | JSON object member name. |
| Context modifier chain after key | `TParser::context_mod()` | Suffix after key in the same string. |
| Object **value** (string leaf) or directive body | `TParser::value()` | Expression + optional `?` + optional constraint + optional `@sort`. |
| `#if` value only | `TParser::condition()` | No `value()`, so no sort suffix on the condition string. |

**Principle (evaluation)**: result shape mirrors template shape, except where the manual documents exceptions (`#return`, `#returnif`, dynamic keys, `||` context-or, `**`, etc.); see §9.

---

## 2. Lexical grammar (`TParser::nextToken`)

Whitespace (`isspace`) is skipped between tokens. After a token is consumed, trailing spaces are skipped (`_pos` advanced).

### 2.1 Tokens

**Numbers** (first char is a digit):

- Consume digits and at most one `.` in the numeric part.
- If the next char is alphanumeric or `_`, extend the token as an **identifier-like** run (so tokens like `123abc` exist as a single token).

**Identifiers** (first char is `isalnum(c) || c=='_'` OR `$` OR `%`):

- Consume while `isalnum_` (`[A-Za-z0-9_]`).

**Double-quoted string** (starts with `"`):

- Run until closing `"` that is not preceded by `\` (backslash escapes the quote).
- Error if EOF before closing quote.

**Single-quoted string** (starts with `'`):

- Run until next `'` (no escape handling in lexer).
- Error if EOF.

**Backtick string** (starts with `` ` ``):

- Run until next `` ` ``.
- Error if EOF.

**Single-character punctuation** (each is one token unless merged below):

`()[]{};,.#?!:$%`

Special: if char is `.` and next is `.`, emit token `..` (two dots).

**Two-character operators** (only these merge; otherwise adjacent operators are separate tokens):

| Token | Condition |
|-------|-----------|
| `<=` | `<` followed by `=` |
| `>=` | `>` followed by `=` |
| `<<` | `<` followed by `<` |
| `->` | `-` followed by `>` |
| `\|\|` | `\|` followed by `\|` |
| `**` | `*` followed by `*` |

**Important**: `<-` is **not** one token; it lexes as `<` then `-` (comment in parser: avoids invalid `<-`).

Any other punctuation: single character only.
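
The token rules above are compact enough to sketch as a toy lexer. The Python below is an approximation written from this section only, not a port of `TParser::nextToken`: the name `tokenize` is hypothetical, quotes are kept in the returned tokens, and the treatment of a `.` directly after digits follows the literal wording ("at most one `.`"), which may differ from the engine in edge cases.

```python
DIGITS = "0123456789"
TWO_CHAR = {"<=", ">=", "<<", "->", "||", "**", ".."}


def is_word(c):
    # ASCII-only equivalent of the isalnum_ class [A-Za-z0-9_].
    return c.isascii() and (c.isalnum() or c == "_")


def tokenize(text):
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c.isspace():
            i += 1
            continue
        start = i
        if c in DIGITS:
            seen_dot = False
            while i < n and (text[i] in DIGITS or (text[i] == "." and not seen_dot)):
                seen_dot = seen_dot or text[i] == "."
                i += 1
            while i < n and is_word(text[i]):  # identifier-like tail, e.g. 123abc
                i += 1
        elif is_word(c) or c in "$%":
            i += 1
            while i < n and is_word(text[i]):
                i += 1
        elif c in "\"'`":
            i += 1
            # Only a double quote can be escaped by a preceding backslash.
            while i < n and not (text[i] == c and not (c == '"' and text[i - 1] == "\\")):
                i += 1
            if i >= n:
                raise ValueError("unterminated string literal")
            i += 1
        elif text[i:i + 2] in TWO_CHAR:
            i += 2
        else:
            i += 1
        tokens.append(text[start:i])
    return tokens
```

For instance, `tokenize("a<-1")` keeps `<` and `-` apart, while `tokenize("x || y")` merges the two bars into one `||` token.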

---

## 3. String value grammar (`TParser::value`)

Parser entry: read one **expression**, then optional pieces, then **must** be at end of string (`eos()`).

```ebnf
value       ::= expression ( '?' condition )? constraint? sortSpec? EOF
constraint  ::= baseCondition   (* only if next token is not '@' and not EOF; see below *)
sortSpec    ::= '@' ( 'ascending' | 'descending' | 'unique_ascending' | 'unique_descending' )
                ( '(' INTEGER ')' )?
```

**Constraint coupling** (implementation detail, affects syntax):

- After parsing `expression` and optional `?condition`, if the input is **not** at EOF and the next token is **not** `@`, the parser calls `baseCondition(exp)` where `exp` is the already-parsed expression (first operand of the comparison / test).
- So forms like `"Field1?predicate Field1=2"` are parsed; the constraint shares the leading expression as LHS.
- AST: outer `TQValue` may be wrapped in `TQValueWithCond` for the `?` part; the **constraint** from `value()`’s `second` member is attached separately in `JSONToTQ` (object field vs `KeyType::Values`, see §7).

**Sort order** (`@`): must be one of the four keywords above; optional `( n )` where `n` is a single integer token (`stoll`). Unknown sort keyword → parse error.
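
A minimal Python sketch of the `sortSpec` rule follows, assuming the caller has already isolated the text from `@` to the end of the value string. The function name `parse_sort_spec` and the error messages are hypothetical; only the four keywords and the optional integer in parentheses come from the grammar above.

```python
import re

SORT_KEYWORDS = {"ascending", "descending", "unique_ascending", "unique_descending"}
# '@', a keyword, then an optional '(' INTEGER ')'; whitespace between tokens is skipped.
SORT_RE = re.compile(r"@\s*(\w+)\s*(?:\(\s*(\d+)\s*\))?")


def parse_sort_spec(spec):
    """Return (keyword, n) where n is None when no '(n)' is given."""
    match = SORT_RE.fullmatch(spec.strip())
    if match is None:
        raise ValueError("malformed sort suffix")
    keyword, count = match.group(1), match.group(2)
    if keyword not in SORT_KEYWORDS:
        raise ValueError(f"unknown sort keyword: {keyword}")
    return keyword, int(count) if count is not None else None


limited = parse_sort_spec("@descending(10)")
plain = parse_sort_spec("@unique_ascending")
```

Using `fullmatch` mirrors the requirement that nothing may follow the sort suffix in a value string.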

---

## 4. Expression grammar (`expression`, `baseExpression`)

### 4.1 Precedence (`expression(int prec)`)

Climbing parser levels:

| `prec` | Operators | Associates |
|--------|-----------|------------|
| 0 | `+`, `-` | Left via loop: each new rhs parsed with `expression(1)`. |
| 1 | `*`, `/`, `mod` | Left; rhs parsed with `expression(2)`. |
| 3 | Postfix `.token` and `[ expr ]` | Postfix chain on current `res`. |

Parentheses: `(` `expression(0)` `)` as primary.
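
The climbing scheme in the table can be tried out on plain integers. The Python sketch below is a simplified illustration, not the engine's parser: `evaluate` is a hypothetical name, operands are restricted to integer literals and parentheses, postfix access (level 3) is left out, and `/` uses Python's true division because the engine's division semantics are not described here.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "mod": operator.mod}


def evaluate(text):
    tokens = re.findall(r"\d+|mod|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def primary():
        tok = take()
        if tok == "(":
            inner = expression(0)  # parentheses restart at the lowest level
            if take() != ")":
                raise ValueError("expected ')'")
            return inner
        return int(tok)

    def expression(prec):
        left = primary()
        while True:
            op = peek()
            if prec <= 1 and op in ("*", "/", "mod"):
                take()
                left = OPS[op](left, expression(2))  # rhs parsed one level up
            elif prec <= 0 and op in ("+", "-"):
                take()
                left = OPS[op](left, expression(1))
            else:
                return left

    result = expression(0)
    if pos != len(tokens):
        raise ValueError("Could not parse text at the end")
    return result
```

Parsing each right-hand side at the next level up is what makes the loop left-associative: `10-4-3` groups as `(10-4)-3`.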

### 4.2 EBNF sketch

```ebnf
expression    ::= '(' expression ')'
                | baseExpression ( ( '+' | '-' ) expression(1)
                                 | ( '*' | '/' | 'mod' ) expression(2) )*
                | baseExpression ( '.' fieldSegment | '[' expression ']' )*

fieldSegment  ::= IDENT | BACKTICK_STR | '$' IDENT ...   (* lexer token after '.'; special case: if token starts with '$', subfield parse backs up; see pathWithBrackets *)
```

**Postfix** (prec ≤ 3): after `consume()` of `.` or `[`, if `[` then parse `expression(0)` unless next is `]`; require `]`; build `TExprSubfield(res, expr_or_empty, is_index)`.

### 4.3 `baseExpression`: alternatives (first token disambiguation)

Roughly in parse order:

| Prefix / form | AST | Notes |
|-----------------|-----|--------|
| `[` optional `]` | `TExprSubfield(TExprField("."), expr?, is_index)` | `[]` = whole array on current `.`; `[e]` = index. |
| `$` `(` `expression` `)` | `TExprField(expr)` | Evaluate / path from dynamic name (`$()`). |
| `$if` `(` `condition` `,` `expression` `,` `expression` `)` | `TExprITE` | |
| `$call` `(` IDENT `)` | `TExprCall(name)` | Zero-arg user function by name. |
| `$var` `(` IDENT `)` | `TExprVar` | |
| `$file` `(` `expression` `)` | `TExprFile` | |
| `$csv` `(` `expression` ( `,` delim? `,` header? )? `)` | `TExprCSV` | Optional delim: next token strip quotes; optional header: `boolean()`. |
| `$prev` `(` `expression` `)` | `TExprPrev` | Aggregate. |
| `$lower` / `$upper` `(` `expression` `)` | `TExprChangeCase` | |
| `$path` | `TExprPath` | |
| `$index` | `TExprIndex` | |
| `$key` | `TExprKey` | |
| `$reskey` | `TExprReskey` | |
| `$filename` | `TExprFilename` | |
| `$env` `(` `expression` `)` | `TExprEnv` | |
| `$identifier` ( `(` INT `)` )? | `TExprIdentifier(parts)` | |
| `$xml` / `$xml_no_children` | `TExprXML` | |
| `$node` | `TExprNode` | |
| `$attr` `(` QUOTED `)` | `TExprAttr` | Param must be quoted then stripped. |
| `$child` `(` QUOTED `)` | `TExprChild` | |
| `$xpath` / `$lxpath` `(` QUOTED `)` | `TExprXPath` | |
| `$in_filter` `(` QUOTED `)` | `TExprInFilter` | |
| `$text` `(` `expression` `)` | `TExprText` | |
| `$size` / `$length` `(` `expression` `)` | `TExprSize` / `TExprLength` | |
| `$to_time` / `$time_to_str` `(` `expression` ( `,` QUOTED )? `)` | | **Bug in parser**: inner `fmt` is declared twice in block; format may be ignored. Treat it as an optional second arg. |
| `$now` | `TExprIntConst(epoch)` | Evaluated at parse time in parser. |
| `$string` / `$number` / `$int` / `$float` / `$bool` `(` `expression` `)` | `TExprTypeCast` | |
| `$node_date` | `TExprLastChange(false,false)` | |
| `$data_date` ( `(` `expression` `)` )? | `TExprLastChange(true,false,arg)` | No parens → arg is `TExprField(".")`. |
| `$count` | `TExprCount` | Aggregate, no args. |
| `$sum` / `$avg` `(` `expression` `)` | `TExprSum` / `TExprAvg` | Aggregates. |
| `$min` / `$max` `(` `expression` `)` | `TExprMinmax` | Aggregate. |
| `$substr` `(` `expression` `,` `expression(0)` ( `,` `expression(0)` )? `)` | `TExprSubstr` | |
| `$find` / `$ifind` `(` `expression` `,` `expression(0)` `)` | `TExprFind` | |
| `$replace` `(` expr `,` expr `,` expr ( `,` boolean )? `)` | `TExprReplace` | |
| `$split` `(` expr `,` expr `)` | `TExprSplit` | |
| `$join` `(` expr ( `,` expr )? `)` | `TExprJoin` | If no second arg, delimiter is empty string const. |
| `$D` QUOTED | `TExprIntConst` | Date literal → epoch via `stringToTime`. |
| `%` IDENT | `TExprVar` | Same as `$var(name)` when name is rest of token. |
| `$` IDENT `(` arg ( `,` arg )* `)` | `TExprCall` | **User-defined** function: name must exist in `sym_table->funcs`; arity must match. |
| `"..."` / `'...'` | `TExprStringConst` | Strip outer quotes. |
| number | `TExprIntConst` / `TExprDoubleConst` | If token contains `.` → double. |
| `true` / `false` | `TExprBoolConst` | |
| `/` `baseExpression` | `TExprChangepath(..., ROOT)` | |
| `..` ( `/` `baseExpression` )? | `TExprChangepath(..., UP)` | Without `/`, inner arg defaults to `TExprField(".")`. |
| `<<` `baseExpression` | `TExprChangepath(..., PREVID)` | |
| IDENT … | `TExprField(path)` | Via `pathWithBrackets(firstToken)`. |
| `` `...` `` | `TExprField` | Strip + `escape_field_name`. |
| `.` | `TExprField(".")` | |
| `-` number | `TExprIntConst(-n)` | **Only** if next token is a number token. |

**Backtick in postfix position**: if the next token after an infix position starts with `` ` ``, lexer returns it; `expression()` treats it as `op` and runs `escape_field_name(stripQuotes_(op))` for the field segment string.

---

## 5. Path segments (`pathId`, `pathWithBrackets`)

```ebnf
pathId      ::= IDENT | BACKTICK_STR | '"' ... '"' | '\'' ... '\''
pathMore    ::= '.' ( IDENT | BACKTICK_STR | '$' ... )   (* if next token after '.' starts with '$', pathWithBrackets stops; Field.$x is NOT continued as path *)
              | '[' INTEGER ']'
```

**`pathWithBrackets`**: uses save/restore; only consumes `.field` or `[n]` when syntactically valid; numeric index must be integer token followed by `]`.

**Note**: Dynamic index in **path text** at primary position uses postfix `[` `expression` `]` in `expression()`, not `pathWithBrackets` (which only accepts **integer literals** in bracket segments).

### 5.1 Identifier expressions (after `->`)

Parsed when the text after `->` is unambiguously an identifier path (see §8.2 dispatch). Builds a **`TQIdExpr`**: a list of `TQIdStep` operations (`IdStepKind`) evaluated at runtime to produce **zero or more** identifier strings (set-valued, like JSON-path iteration). A path that collapses to a single identifier is the trivial one-element case.

```ebnf
identifierExpression ::= idAnchor idSuffix
                       | '[' ']' idSuffix                      (* anchor-less: same as `->./[]` … *)
                       | '**' idSuffix                         (* anchor-less: same as `->./**` … *)
                       | idSegment ( '/' idSegment )+ idSuffix (* unanchored under current id *)

idAnchor  ::= '/' | '.' '/' | ( '..' '/' )+

idSuffix  ::= ( '/' idSegment | '[' ']' | '**' )*

idSegment ::= IDENT | BACKTICK_STR | QUOTED_STR
            | '$' '(' expression ')'
            | '%' IDENT
            | '.'                    (* no-op segment *)
            | '[' ']'                (* children of preceding identifier *)
            | '**'                   (* self + all descendants of preceding *)
```

**Not allowed:** mid-path `..`, whether as an anchor or as a segment (only the leading `../` anchor run is accepted); `[` `N` `]` / `[` anything other than `]` (no index or pattern brackets); `[]` / `**` immediately inside `->$each(...)` (each argument is a JSON expression, not an identifier path). Going “above” `/` with `../` yields an empty identifier at runtime (no match, no exception).

**`[]` semantics:** expand to every **direct child** identifier under the path built so far (same as legacy `->$children` from that node). When the `id_hier` LMDB sub-db is present on overlay and base, evaluation uses `XMLReader::listChildrenFromHierMerged` for efficiency; otherwise it falls back to the same `SimpleQuery` + `match_start` scan as before.

**`**` semantics:** expand to the identifier built so far **plus every descendant** identifier under it (same shape as JSON-path `**` under the current path; includes self).
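
To make the two expansions concrete, here is a small Python sketch over an in-memory set of identifiers. It assumes identifiers are `/`-separated hierarchical strings, which matches the `/` separators in the grammar but is otherwise an assumption; the function names are hypothetical and nothing here reflects how the engine scans LMDB.

```python
def children(ident, all_ids):
    """`[]` step: direct children of `ident` only."""
    prefix = ident.rstrip("/") + "/"
    return sorted(
        i for i in all_ids
        if i.startswith(prefix) and "/" not in i[len(prefix):]
    )


def self_and_descendants(ident, all_ids):
    """`**` step: `ident` itself (if present) plus everything below it."""
    prefix = ident.rstrip("/") + "/"
    return sorted(i for i in all_ids if i == ident or i.startswith(prefix))


ids = {"/doc", "/doc/a", "/doc/a/x", "/doc/b", "/other"}
direct = children("/doc", ids)
subtree = self_and_descendants("/doc", ids)
```

In this model `direct` holds `/doc/a` and `/doc/b`, while `subtree` additionally contains `/doc` and `/doc/a/x`, which is the "includes self" behaviour described for `**`.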

**`->$each(expr)`** is a separate arrow form (`ArrowOp::Each`): `expr.getJSON(ctx)` must be a string or array of strings; each string is used as an identifier. **`$each(...)` must not be followed by `/` in the same arrow modifier** (parse error); chain another `->` or put path logic inside the expression.

---

## 6. Conditions (`condition`, `baseCondition`)

### 6.1 Boolean condition grammar

```ebnf
condition   ::= condOr
condOr      ::= condAnd ( '|' condAnd )*        (* prec <= 1 *)
condAnd     ::= condNot ( '&' condNot )*        (* prec <= 2 *)
condNot     ::= '!' condNot                     (* prec <= 3 *)
              | condAtom
condAtom    ::= '(' disambig ')'
disambig    ::= try parse expression(0); if next ')' then ')' and baseCondition(expr)
              | else condition(0) ')'           (* parenthesized sub-condition *)
              | baseCondition()
```

**`baseCondition`** (after optional lhs from caller):

1. If no `lhs` and next is `(`, parse condition inside parens (retry path on parse error; see source).
2. Else if no `lhs`, `lhs = expression()`.
3. `op = nextToken()`.
4. If `op == "!"` and next is `=`, consume `=` → op `!=`.
5. Else if `op == "!"` alone → **`TQExistsTest(lhs)`**, with no RHS expression.
6. Else if `op` is one of `is_array`, `is_object`, `is_literal`, `is_string`, `is_number`, `is_int`, `is_float`, `is_bool` → **`TQTypeTest`**, no RHS.
7. Else `rhs = expression()`; then dispatch on `op` for compare / string / `in` / `not_in`.
8. **Static rule**: if `(lhs->isAggregate(0) && !rhs->isLiteral()) || (rhs->isAggregate(0) && !lhs->isLiteral())` → **parse error** `"Aggregates can only be compared with literals"`.

Compare ops: `=`, `!=`, `<`, `>`, `<=`, `>=`.
String ops: `contains`, `starts_with`, `ends_with`, `matches`.
Set ops: `in`, `not_in`.

**Note**: condition OR is **single** `|`, not `||` (`||` is lexer merge for **context modifiers** only).
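
The static rule in step 8 is easy to restate as a standalone check. The Python sketch below models operands with two flags standing in for `isAggregate(0)` and `isLiteral()`; the `Operand` class and `check_comparison` function are hypothetical names, and only the error message is taken from this document.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    text: str
    is_aggregate: bool = False  # stands in for isAggregate(0)
    is_literal: bool = False    # stands in for isLiteral()


def check_comparison(lhs, rhs):
    """Raise if an aggregate is compared with anything other than a literal."""
    if (lhs.is_aggregate and not rhs.is_literal) or (
        rhs.is_aggregate and not lhs.is_literal
    ):
        raise ValueError("Aggregates can only be compared with literals")


count = Operand("$count", is_aggregate=True)
three = Operand("3", is_literal=True)
field = Operand("Field1")

check_comparison(count, three)  # accepted: aggregate against a literal
check_comparison(field, three)  # accepted: no aggregate involved
```

A comparison such as `$count > Field1` would be rejected by this check, which is the case a static checker can flag before the query is sent.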

---

## 7. Keys (`TParser::key`) and `JSONToTQ` string dispatch

### 7.1 Key grammar

```ebnf
key         ::= '$' '(' expression ')'           (* dynamic key → TQParamKey *)
              | '$' IDENT ... | '%' IDENT ...     (* whole thing parsed as expression() → TQParamKey *)
              | '{' '}'                           (* TQRegexKey empty regex *)
              | '{' QUOTED '}'                    (* TQRegexKey *)
              | '#' directive
              | pathId                            (* TQSimpleKey, name may include quoted segments from pathId *)

directive   ::= 'if'
              | 'func' IDENT ( '(' IDENT ( ',' IDENT )* ')' )?   (* register arity; TQFuncDefinition *)
              | 'var' IDENT
              | 'assign' IDENT
              | 'exists' | 'notexists' | 'return' | 'returnif'
```

Unknown `#name` → error.

### 7.2 `KeyType` and how string **values** are parsed (`JSONToTQ`)

For object members whose JSON **value** is a string, `JSONToTQ` chooses the parser by `key->getKeyType()`:

| Key kind | `getKeyType()` | String value parsed with | `obj->add(key, q, condq)` behavior |
|----------|----------------|----------------------------|-------------------------------------|
| `#if` | `Cond` | `condition()` only → `TQConditionP` stored in local `cond` | `q` stays empty. If `cond` set, `condq = TQCondWrapper(cond)` (with optional `context_mod` on `condq`). `add(#if, nullptr, condq)`: `TQObject::add` inserts a synthetic `KeyType::Cond` field holding the wrapper so `processData` tests it before other members. |
| `TQSimpleKey`, `TQParamKey`, `TQRegexKey` | **`Values`** (inherits default `TQKey::getKeyType`) | `value()` → `vc.first` = template value, `vc.second` = optional **constraint** from same string | If `vc.second`: `condq = TQCondWrapper` + context mod; `q = vc.first` + context mod. Constraint is **object-level** (wrapper), not inside `TQValueWithCond`. |
| `#var`, `#assign`, `#exists`, `#notexists`, `#return`, `#returnif` | `Variable`, `Assign`, `Exists`, `Notexists`, `Return`, `ReturnIf` | `else` branch: `value()` | If `vc.second`, entire `q` is `TQValueWithCond(vc.first, vc.second)`; the constraint is **on the value**. |
| `#func` | `Func` | `else` branch: `value()` (body is a full template subtree when JSON is object/array) | Same wrapping as other directives; registration uses `TQFuncDefinition` + `sym_table`. |

**Summary**: Trailing **constraint** after the main expression in a field value is either a separate **`TQCondWrapper`** (normal / dynamic keys) or embedded in **`TQValueWithCond`** (directives). `#if` has no `value()`; its string is only a `condition()`.
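
The dispatch summarised above can be written down as a tiny lookup. The Python sketch below is an illustration for tooling authors: `string_value_route` is a hypothetical helper, it recognises directives only by their leading `#name`, and it returns labels rather than building any AST.

```python
import re

# Directives whose string value goes through value() (the "else" branch).
VALUE_DIRECTIVES = {"var", "assign", "exists", "notexists", "return", "returnif", "func"}


def string_value_route(key):
    """Return (parser entry point, node that carries a trailing constraint)."""
    match = re.match(r"\s*#\s*([A-Za-z_]+)", key)
    if match is None:
        # TQSimpleKey, TQParamKey and TQRegexKey all report KeyType::Values.
        return ("value()", "TQCondWrapper")
    directive = match.group(1)
    if directive == "if":
        # The #if string is a bare condition: no expression, no sort suffix.
        return ("condition()", None)
    if directive in VALUE_DIRECTIVES:
        return ("value()", "TQValueWithCond")
    raise ValueError(f"unknown directive: #{directive}")


routes = {key: string_value_route(key) for key in ("total", "#if", "#return", "$(name)")}
```

A static checker could use such a table to decide whether an `@` sort suffix or a trailing constraint is even legal for a given member before parsing its value string.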

---

## 8. Context modifiers (`TParser::context_mod`)

Parsed **after** `key()` from the **same** JSON key string. Mode parameter `ContextModMode` controls whether leading `:` / `->` / `[` / `{` / `.` is required or forbidden (start vs continuation vs `||` branch).

### 8.1 Modifier atoms (after optional leading `:`)

| Form | `ContextMode` | `arrow` / payload |
|------|---------------|-------------------|
| `{` `}` or `{` `regex` `}` | `Regex` | String context; empty regex = all keys |
| `**` | `AllPaths` | |
| `$` `(` `expression` `)` | `Eval` | `expr` |
| `$…` or `%…` expression | `Eval` | `expr` from `expression()` |
| `->` … | `Arrow` | See §8.2 (`identifierExpression`, builtins, legacy JSON-field lookup, `->$each`); often `new_frame = true` |
| `[` `]` | `Array` | traversal |
| `[` INT `]` | *(string context)* | literal index path segment |
| `.` | *(path)* | context string `"."` |
| IDENT… `[` `]`? | `Array` if trailing `[]`, else path | Empty path + no `[` → **`Reskey`** mode |
| Optional `?` `condition` after each atom | *(none)* | Wraps inner mod chain in `TQValueWithCond` |

Recursion: `context_mod(...)` parses the **rest** of the chain; result wrapped in `TQContextMod` (expr or string form).

### 8.2 Arrow (`->`) targets

Dispatch (after `->`):

| Next shape | Behaviour |
|------------|-----------|
| `$self`, `$parent`, `$children`, `$descendants`, `$descendants_and_self`, `$ancestors`, `$ancestors_and_self`, `$all` | **Parser sugar:** lowered to a constant `TQIdExpr` and stored as **`ArrowOp::IdExpr`** (same runtime as path syntax). |
| `$date(…)`, `$branch(…)`, `$file(…)`, `$csv(…)` | Built-ins unchanged (`getArrowOp` + parsing of args). |
| `$var` `(` … `)` | `ArrowOp::Var` with `context` from `pathWithBrackets`. |
| `%ident` (standalone) | `ArrowOp::Var`. |
| `$each` `(` `expression` `)` | **`ArrowOp::Each`**: iterate string/array of identifier strings from `expr.getJSON(ctx)`. Must not be followed by `/` in the same modifier. |
| `/`, `./`, `../`, `[` `]`, `**`, anchors + `identifierExpression` triggers (see `startsIdentifierExpression`) | **`identifierExpression`** → **`ArrowOp::IdExpr`** + `TQIdExpr`. |
| Single quoted path segment with no following `/` | **`identifierExpression`** (one `Literal` step). |
| `IDENT` / `pathWithBrackets` with **no** `/` that triggered `identifierExpression`; bare `$(expr)` with no following `/` | **Legacy** JSON-field lookup: `ctx.getJSON(name)` where `name` is field name or `expr` string (`ArrowOp::None`). **Deprecation:** parser sets `legacy_warn` on the `TQContextMod`; runtime prints a one-time-per-node warning to `stderr`. |

**Chaining**: `||` after a **complete** modifier started with `:` (see source): builds `TQContextModOr` with `TQShared` placeholder so alternatives share downstream value.

---

## 9. Result shape (static intuition)

- **Objects**: one result object per template object; keys may be repeated dynamically (`$(...)` / `{}` keys).
- **Arrays**: appends one element per evaluation pass as per engine rules.
- **`#return`**: first `#return` in object wins; flattens shape (manual §8.6).
- **`#returnif`**: first non-empty wins among returnifs (manual §8.7).
- **Context traversal** (`:[]`, `:{}`, `**`, arrow iterators): may run sub-template many times → many contributions to arrays / dynamic keys.
- **`||`**: union of contexts for the **same** value subtree.

---

## 10. Static type-checking checklist (for AIs)

1. **End of string**: `value()` must consume entire string; trailing garbage → `"Could not parse text at the end"`.
2. **Aggregates in conditions**: one side must be **literal** if the other is aggregate (`isAggregate(0)`).
3. **`#if` value**: parse as **condition**, not `value()`, so no `@sort` on that string.
4. **`#func f(a,b)`**: register arity; every `$f(...)` must supply exactly that many comma-separated `expression` arguments.
5. **Built-in arity**: match §4.3 table; `$join` allows 1–2 args; `$replace` allows 3–4; `$csv` allows 1–3; `$identifier` allows 0–1 (parens with int).
6. **`<-`**: not a token; write `<` and unary `-` separately. A fused `<-` is almost certainly wrong for “less than minus”.
7. **`..`**: token is two dots; field access after parent must use `../` (`..` then `/` then `baseExpression`).
8. **Sort**: the `@` sort suffix must be last in a **value** string. A trailing constraint, if any, must come before it, because `@` ends value parsing and nothing is parsed as a constraint after it.
9. **Keys**: `#func` name must not duplicate earlier definition in same template symbol table pass.
10. **`%x`**: same as `$var(x)` for expression typing.
11. **Path literals in keys**: `pathId` allows quoted strings as path segments (stored as part of key name string); unusual but legal lexer-wise.
12. **Condition OR**: use single `|`, not `||`.
13. **`is_string`**: implemented in parser / `Operator` enum; the reference manual may not list it, but the engine supports it.
14. **Identifier expressions**: `[` `N` `]` and `[` non-`]` forms are rejected inside `idSegment` and immediately after a completed `identifierExpression` (no `[` before `?` / `||` / end) except the allowed `[]` children step.
15. **`../` past root**: evaluates to empty identifier → no match (no throw).
16. **`[]` / `**`**: the only bracket / star multi-value forms in identifier expressions; `[N]`, `[$(x)]`, mid-path `..`, and `[]` / `**` inside `$each(...)` are parse errors.
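
Two of the checklist items (6 and 12) lend themselves to a quick textual lint. The Python sketch below is a rough heuristic, not a parser: `lint_condition` is a hypothetical helper, the messages are invented, and it simply removes quoted and backtick strings before searching so that operators inside literals are not flagged.

```python
import re

# Double-quoted (with backslash escapes), single-quoted, and backtick strings.
_QUOTED = re.compile(r'''"(?:\\.|[^"\\])*"|'[^']*'|`[^`]*`''')


def lint_condition(text):
    """Return a list of warnings for a condition string."""
    bare = _QUOTED.sub("", text)
    problems = []
    if "||" in bare:
        problems.append("use single '|' for OR; '||' is reserved for context modifiers")
    if "<-" in bare:
        problems.append("'<-' is not a token; it lexes as '<' then '-'")
    return problems


clean = lint_condition('a = "x||y" | b = 2')
flagged = lint_condition("a = 1 || b <-1")
```

Here `clean` is empty because the only `||` sits inside a string literal, while `flagged` reports both issues; a real checker would work on the token stream instead of raw text.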

---

## 11. Appendix: AST classes (quick index)

**Template / structure**: `TemplateQuery`, `TQPlaceholder`, `TQObject`, `TQArray`, `TQValue`, `TQValueWithCond`, `TQCondWrapper`, `TQContextMod`, `TQContextModOr`, `TQShared`, `TQInnerValue`, …

**Keys**: `TQKey`, `TQSimpleKey`, `TQParamKey`, `TQRegexKey`, `TQDirectiveKey`, `TQFuncDefinition`

**Expressions**: `TExpression`, `TExprField`, `TExprSubfield`, `TExprBinaryOp`, `TExprChangepath`, `TExprITE`, literals (`TExprStringConst`, `TExprIntConst`, `TExprDoubleConst`, `TExprBoolConst`, `TExprJSONConst`), `TExprCall`, `TExprVar`, `TExprFile`, `TExprCSV`, `TExprPrev`, `TExprCount`, `TExprSum`, `TExprAvg`, `TExprMinmax`, string/XML helpers (`TExprPath`, `TExprIndex`, `TExprKey`, …), `TExprTypeCast`, `TExprLastChange`, `TExprIdentifier`, `TExprIdentifierUp`, …

**Context modifiers**: `TQContextMod` carries optional `legacy_warn` (deprecated legacy `->field` / bare `->$(expr)` JSON-field branch). **`ArrowOp`**: `None`, `Date`, `Branch`, `Var`, `File`, `Other`, **`IdExpr`** (with `TQIdExprP id_expr`), **`Each`**. Structural `->$self` / `$children` / … are not distinct enum values; they parse to `IdExpr`.

**Conditions**: `TQCondition`, `TQCondBool`, `TQCompareTest`, `TQStringTest`, `TQJSONTest`, `TQExistsTest`, `TQTypeTest`

**Operators** (shared enum): `Operator` in `TemplateQuery.h`; includes arithmetic, compare, `ROOT`, `UP`, `PREVID`, etc.

---

## Document history

- Introduced as canonical grammar reference for tooling and AI-assisted editing; keep in sync when `TemplateParser.cpp` changes.