@plurnk/plurnk-grammar 0.1.1 → 0.2.0
- package/README.md +3 -3
- package/bin/plurnk.js +2 -2
- package/package.json +9 -4
- package/plurnk.md +4 -4
- package/schema/Agent.json +18 -0
- package/schema/ChannelContent.json +14 -0
- package/schema/Entry.json +51 -0
- package/schema/LineMarker.json +13 -0
- package/schema/LogEntry.json +100 -0
- package/schema/Loop.json +20 -0
- package/schema/MatcherBody.json +60 -0
- package/schema/Packet.json +64 -0
- package/schema/Params.json +13 -0
- package/schema/ParsedPath.json +51 -0
- package/schema/PlurnkStatement.json +183 -0
- package/schema/Position.json +13 -0
- package/schema/ProviderDeclaration.json +16 -0
- package/schema/Run.json +21 -0
- package/schema/SchemeRegistration.json +31 -0
- package/schema/SendBody.json +13 -0
- package/schema/Session.json +20 -0
- package/schema/Turn.json +30 -0
- package/schema/Visibility.json +17 -0
- package/src/AstBuilder.ts +372 -0
- package/src/PlurnkErrorStrategy.ts +139 -0
- package/src/{errors.ts → PlurnkParseError.ts} +1 -2
- package/src/PlurnkParser.ts +92 -0
- package/src/RecordingListener.ts +34 -0
- package/src/Validator.ts +94 -0
- package/src/generated/plurnkLexer.ts +224 -176
- package/src/generated/plurnkParser.ts +1461 -195
- package/src/generated/plurnkParserVisitor.ts +97 -6
- package/src/index.ts +29 -142
- package/src/types.generated.ts +491 -0
- package/src/types.ts +30 -0
- package/SPEC.md +0 -625
- package/src/ast.ts +0 -348
- package/src/error-strategy.ts +0 -140
package/SPEC.md
DELETED
@@ -1,625 +0,0 @@

# Plurnk Grammar Specification

## 1. Overview

Plurnk extends HEREDOC formatting into a state-machine grammar for LLM
agents. Every plurnk statement is a single self-contained operation: a
canonical open tag, an optional payload, and a colon-fenced opaque body
terminated by a matching close tag. Statements are flat: there is no
composition or substitution. Documents may contain arbitrary
interstatement text, which the parser captures verbatim and surfaces
to consumers without imposing meaning on it.

The parser produces a typed AST (per OP discriminated union) plus a
list of structured errors. Both are JSON-serializable. Errors are
per-statement; the parser recovers at statement boundaries when it
can, and surfaces an `unparsedTail` when a boundary-destroying error
prevents further recovery. See §12 for the consumer contract.

Note: SEND status codes (§9) are a *protocol-level* convention for
SEND statements emitted by the model and runtime. They are unrelated
to parse-time `PlurnkParseError` objects produced by this package (§12).

## 1.1 Domain Boundary

The grammar is purely syntactic. A rule belongs in the grammar if and
only if it can be expressed as a shape constraint on character
sequences. A rule has crossed into runtime as soon as it requires any
of:

- **Variables**: state held across statements, named bindings, references to prior values.
- **Magic numbers**: values that carry semantic weight (`410` means "Gone," `200` means "OK"); the grammar accepts the digit string, not its meaning.
- **Embedded code**: executing language fragments to determine well-formedness (compiling a regex, validating an xpath, resolving a URI).

Anything that fits inside that constraint belongs in `.g4`. Anything
that needs interpretation belongs in the runtime resolver.

**Concretely in domain, parser-managed (lexer + parser):**

- Statement structure: open tag, slots, body, close tag.
- Lexical tokens: `<<`, OP keywords, `[…]`, `(…)`, `<N>`, `:`, body, close tag, interstatement TEXT.
- Slot *shape* constraints: URI shape (scheme grammar + path character class), line-marker integer form, suffix character class, CSV form of `[signal]`.
- HEREDOC discipline: open/close tag character match, body opacity between `:body:` fences, nesting via suffix.
- Whitespace rules (§11).
- Hard constraint: `:OPsuffix` close tag must character-match the open tag's `OPsuffix`.

**Concretely in domain, Visitor-managed (typed AST construction):**

- Extracting `op`, `suffix`, `signal` (split on comma), `path` (raw),
  `lineMarker` (parsed `<N>` or `<N-M>` integer form), and `body` (raw)
  from the parse tree into a typed discriminated union.
- Native-JS validation of slot contents where useful (e.g., `new URL()`
  for path, `new RegExp()` for regex bodies). This is preferred over
  ANTLR sub-grammars for URI/regex/xpath/jsonpath; Node's built-ins are
  authoritative, well-tested, and zero-cost to invoke.

**Concretely out of domain, runtime:**

- URI resolution: what `known://`, `unknown://`, `file://` actually point at; what bare paths resolve to.
- Tag-matching combination (AND/OR), tag-set semantics.
- Line-marker arithmetic, out-of-range handling, result-set ordering for pagination.
- Status code *meanings*: any digit string is grammatically valid in `[signal]`; whether `[410]` means "Gone" or any code carries privileged semantics on any OP is runtime convention.
- Empty-body semantics (e.g., empty EDIT clears the entry).
- EXEC body execution: runtime selection, sandboxing, permissions.
- Filter composition (how SHOW/HIDE combine path × tag × body filters).
- Output shape returned to the model after a statement executes. The §4 Per-OP Output table documents convention, not grammar rules.

## 2. Canonical Statement Form

```
<<OPsuffix [signal]? (path)? <L>? : body? :OPsuffix
```

The `:` characters fence the body. Everything between the opening `:`
and the closing `:OPsuffix` literal is body, verbatim. This is what
makes plurnk solve grammatical enclosure: body content is fully opaque
to OP keywords, modifier-like characters, and the protocol's own
syntax.

Optionality:

| Element     | Status        |
|-------------|---------------|
| `<<`        | required      |
| `OP`        | required      |
| `suffix`    | optional; used for nesting and `:OPkeyword` escape (see §8) |
| `[signal]`  | optional, OP-dependent contents |
| `(path)`    | required for all OPs except SEND |
| `<L>`       | optional; single position or range (see §7) |
| `:`         | required (header → body delimiter) |
| `body`      | optional, OP-dependent meaning |
| `:OPsuffix` | required (close tag: `:` + open tag's OP and suffix, character-matching) |

Hard constraints:

- Close-tag `:OPsuffix` must character-match the open tag's `OPsuffix`.
- Header elements appear in the order shown above (signal, then path, then `<L>`, then `:`).

All other restrictions are runtime concerns, not grammar concerns.

## 3. Lexical Elements

- `<<`: open delimiter.
- `OP`: exactly one of `FIND`, `READ`, `EDIT`, `COPY`, `MOVE`, `SHOW`, `HIDE`, `SEND`, `EXEC`.
- `suffix`: `[A-Za-z0-9_]*` immediately concatenated to `OP`, no separator.
- `[` … `]`: signal slot; contents are OP-dependent (see §4).
- `(` … `)`: path slot; contents are a URI (see §5).
- `<L>`: line marker. Shape: `<` `-?[0-9]+` (`-` `-?[0-9]+`)? `>`. A single signed integer denotes a position; two signed integers separated by `-` denote an inclusive range.
- `:`: body delimiter. Appears between header and body, and (with the OP+suffix following) at the close.
- `body`: opaque byte stream between the opening `:` and the matching close tag `:OPsuffix`.
- `:OPsuffix` close: `:` immediately followed by the open tag's `OP` and `suffix` (character-matching, no whitespace).

## 4. Per-OP Semantics

| OP     | `[signal]`        | `(path)` | `body`                  | `<L>`         |
|--------|-------------------|----------|-------------------------|---------------|
| FIND   | tag filter (CSV)  | required | pattern matcher         | result-set pagination |
| READ   | tag filter (CSV)  | required | pattern matcher         | per-entry lines |
| EDIT   | tags (CSV)        | required | content (empty body clears the entry) | entry lines |
| COPY   | tags to apply (CSV) | required | destination URI       | entry lines |
| MOVE   | tags to apply (CSV) | required | destination URI       | entry lines |
| SHOW   | tag filter (CSV)  | required | optional pattern matcher | result-set pagination |
| HIDE   | tag filter (CSV)  | required | optional pattern matcher | result-set pagination |
| SEND   | HTTP status code (single integer) | optional | message payload (JSON by convention for structured responses) | not applicable |
| EXEC   | runtime tag (single string; `sh` default, `node`, `python`, …) | required | command or code snippet | not applicable |

The `<L>` slot is optional. Its referent shifts by OP (per the column
above) but the syntax is uniform: a single integer denotes one
position, an integer range `<N-M>` selects items at positions `N..M`
inclusive of whatever sequence the OP operates on or produces.

EDIT line-marker semantics (single source of authority):

- No `<L>` + body present: replace entire entry contents with body.
- No `<L>` + no body: clear entry contents (empty replacement).
- `<N>` (single position) + body: replace the single line at `N` with body.
- `<N-M>` (range) + body: replace lines `N..M` inclusive with body.
- `<0>` + body: prepend body before line 1.
- `<-1>` + body: append body after the last line.
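
The EDIT rules above can be read as a small function over an entry's lines. The following is a minimal sketch and not part of this package: `applyEdit` is a hypothetical runtime helper, it assumes body lines are separated by `\n`, and it assumes that markers outside `1..lines.length` (other than the `0` and `-1` anchors) are rejected, a decision the spec leaves to the runtime.

```typescript
// Hypothetical runtime helper; the spec assigns these semantics to the runtime.
interface LineMarker { first: number; last: number | null; }

function applyEdit(
  lines: string[],
  marker: LineMarker | null,
  body: string | null,
): string[] {
  // Assumption: an absent or empty body contributes no lines.
  const bodyLines = body === null || body === "" ? [] : body.split("\n");

  // No <L>: replace the whole entry (an empty body clears it).
  if (marker === null) return bodyLines;

  const { first, last } = marker;
  if (last === null && first === 0) return [...bodyLines, ...lines];  // <0>: prepend
  if (last === null && first === -1) return [...lines, ...bodyLines]; // <-1>: append

  // <N> or <N-M>: replace lines N..M inclusive (1-indexed).
  const end = last ?? first;
  if (first < 1 || end < first || end > lines.length) {
    // Assumption: out-of-range and inverted markers are rejected.
    throw new RangeError(`line marker out of range: ${first}..${end}`);
  }
  return [...lines.slice(0, first - 1), ...bodyLines, ...lines.slice(end)];
}
```

For example, `applyEdit(["a", "b", "c"], { first: 2, last: null }, "B")` yields `["a", "B", "c"]`, and the same entry with `{ first: 0, last: null }` and body `"z"` yields `["z", "a", "b", "c"]`.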

SHOW and HIDE filters are AND-combined: an entry is selected when its
path matches `(path)`, its tags satisfy `[signal]` (if present), and its
content matches `body` (if present).

### Per-OP Output (what each OP produces)

| OP   | Produces |
|------|----------|
| FIND | list of matching paths |
| READ | content of matched entries (or matched substrings if `body` is a pattern) |
| EDIT | status; resulting entry content on success |
| COPY | status; destination path on success |
| MOVE | status; destination path on success |
| SHOW | status; list of paths moved into the Index |
| HIDE | status; list of paths moved into the Archive |
| SEND | status; recipient ack if applicable |
| EXEC | exit code, stdout, stderr |

Output is delivered to the model in the next turn. The shape of "status"
is a SEND-style status code (see §9) so that errors are uniform across
all OPs.

## 5. Path Grammar

Paths are URI-shaped, drawn from RFC 3986 in spirit but not strictly.
Two RFC concessions justify the relaxation:

1. RFC 3986 lists `)` as a sub-delim, a valid path character. Plurnk
   reserves `)` to close the path slot. Strict compliance would
   require an escape mechanism; plurnk does not provide one.
2. Bulk Pattern Matching extends path segments with glob
   metacharacters (`*`, `**`, `?`, `[…]`) that fall outside the RFC
   character set.

Lexer-enforced shape:

- Optional scheme: `[a-z][a-z0-9+.-]*` followed by `://`.
- Path content: any character except `)` and newline.
- Glob metacharacters in path segments are permitted.

Runtime-enforced semantics:

- Bare paths (no scheme) resolve as `file://` at runtime.
- Conventional schemes include `known://`, `unknown://`, `log://`,
  `file://`, `http://`, `https://`. Any scheme matching the lexer
  shape is grammatically valid; resolution is a runtime concern.
- Percent-encoding, authority structure, port range, and other RFC
  3986 finer points are validated by the runtime URI resolver, not
  the parser.
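
As a rough illustration of the lexer-enforced shape, the sketch below classifies a raw path slot as scheme-qualified or bare. `classifyPath` and `PathKind` are hypothetical names, and returning `null` for a slot containing `)` or a newline is a simplification of whatever error the lexer would actually report.

```typescript
type PathKind = "url" | "local";

// Optional scheme, as given in the spec: [a-z][a-z0-9+.-]* followed by "://".
const SCHEME_PREFIX = /^[a-z][a-z0-9+.-]*:\/\//;

function classifyPath(raw: string): PathKind | null {
  // Path content may be any character except ")" and newline.
  if (raw.includes(")") || raw.includes("\n")) return null;
  // Glob metacharacters are permitted, so nothing else is checked here.
  return SCHEME_PREFIX.test(raw) ? "url" : "local";
}
```

Under these assumptions `classifyPath("known://entries/foo")` returns `"url"`, while a bare glob such as `src/**/*.ts` returns `"local"` and would be resolved as `file://` by the runtime.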

## 6. Bulk Pattern Matching

For FIND, READ, SHOW, and HIDE, `body` is an optional pattern matcher.
The lexer captures the body opaquely (between the `:body:` fences);
dialect dispatch is not a lexer concern. Dialect is determined by the
body's leading characters, and validated by the Visitor using native
JS facilities (`new RegExp()` etc.) where applicable:

| Leading prefix | Dialect   | Canonical form            | Validation         |
|----------------|-----------|---------------------------|--------------------|
| `//`           | xpath     | `//…`                     | runtime (xpath lib) |
| `/`            | regex     | `/pattern/flags` (trailing `/` required, flags `[a-z]*`) | `new RegExp()` in Visitor |
| `$`            | jsonpath  | `$…`                      | runtime (jsonpath lib) |
| otherwise      | glob      | `…` (literal substring if no metacharacters) | runtime (glob library) |

Dialect conventions (the Visitor uses these to construct typed AST
body fields; the lexer is unaware):

- Xpath body begins with `//` (descendant-or-self axis). Absolute-root
  `/foo` is unreachable (collides with regex prefix); rework as `//foo`.
- Regex body is a delimited literal: opens with `/`, ends with `/`
  before the close fence, with optional flag chars `[a-z]*` between
  the closing `/` and the close fence. Literal `/` inside the pattern
  must be escaped `\/`.
- Regex anchors `^` and `$` go inside the slashes: `/^foo$/`.
- Flag semantics (`i` case-insensitive, `m` multiline, `s` dotall,
  etc.) follow ECMAScript regex.
- Glob is the catch-all and includes the literal-substring case when
  no metacharacters are present.
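
The prefix table translates into a short dispatch function. This is a sketch only: `detectDialect` and `Dialect` are hypothetical names, and the function assumes the body is inspected verbatim, so a body with leading whitespace falls through to glob.

```typescript
type Dialect = "xpath" | "regex" | "jsonpath" | "glob";

function detectDialect(body: string): Dialect {
  // "//" must be tested before "/", otherwise every xpath body would be
  // misread as a regex.
  if (body.startsWith("//")) return "xpath";
  if (body.startsWith("/")) return "regex";
  if (body.startsWith("$")) return "jsonpath";
  return "glob"; // catch-all, including plain literal substrings
}
```

The ordering of the first two checks is the only subtle point; it is also why an absolute-root xpath such as `/foo` cannot be expressed.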

**Implemented validation in the Visitor (Node-native):**

- **Path**: the Visitor distinguishes local paths from URLs by the
  presence of a scheme prefix (`[a-z][a-z0-9+.-]*://`). Local paths
  (filesystem-style, no scheme) are stored as `{ kind: "local", raw }`
  without further parsing: `new URL()` is not invoked, so no URL
  conventions are imposed on what was clearly intended as a local
  reference. URLs are parsed by `new URL(raw)` and decomposed into
  components (`scheme`, `username`, `password`, `hostname`, `port`,
  `pathname`, `search`, `fragment`). Genuine URL-protocol violations
  (malformed authority, unterminated IPv6 brackets, invalid port, etc.)
  produce a `PlurnkParseError` with source `"visitor"`.
- **Regex body** (matcher-body OPs only, leading `/` and not `//`):
  the Visitor extracts `pattern` and `flags` (respecting `\/` escapes)
  and calls `new RegExp(pattern, flags)`. On failure (missing closing
  `/`, unterminated character class, invalid flag, etc.), a
  `PlurnkParseError` with source `"visitor"` is emitted.
- **XPath body** (matcher-body OPs only, leading `//`): the Visitor
  calls `xpath.parse()` from the `xpath` npm package (XPath 1.0
  parser-only, no DOM execution). On failure (unterminated predicate,
  invalid operator, etc.), a `PlurnkParseError` with source `"visitor"`
  is emitted.
- **JsonPath body** (matcher-body OPs only, leading `$`): the Visitor
  calls `JSONPath({ path: body, json: {} })` from the `jsonpath-plus`
  npm package. The empty `{}` ensures syntax parsing happens without
  document evaluation. Syntax errors (unclosed parens, malformed
  filter expressions, etc.) throw and become `PlurnkParseError` with
  source `"visitor"`.
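
The regex-body step can be sketched as follows. `splitRegexBody` is a hypothetical name, not an export of this package. The sketch assumes the body carries no surrounding whitespace, treats the first unescaped `/` after the opener as the end of the pattern (per the `\/` escaping rule), and throws a plain `SyntaxError` where the Visitor would emit a `PlurnkParseError`.

```typescript
interface RegexParts { pattern: string; flags: string; regexp: RegExp; }

function splitRegexBody(raw: string): RegexParts {
  if (!raw.startsWith("/") || raw.startsWith("//")) {
    throw new SyntaxError("not a regex body");
  }
  // Walk the pattern; a backslash escapes the character after it.
  let i = 1;
  while (i < raw.length && raw[i] !== "/") {
    i += raw[i] === "\\" ? 2 : 1;
  }
  if (i >= raw.length) throw new SyntaxError("missing closing /");

  const pattern = raw.slice(1, i);
  const flags = raw.slice(i + 1);
  if (!/^[a-z]*$/.test(flags)) throw new SyntaxError("flags must match [a-z]*");

  // new RegExp() is the authority on pattern and flag validity; it throws
  // SyntaxError for an unterminated class, an unknown flag, and so on.
  return { pattern, flags, regexp: new RegExp(pattern, flags) };
}
```

With this sketch, `/^foo$/i` splits into pattern `^foo$` and flags `i`, while `/abc` (no closing slash) and `/a/z` (unknown flag) both throw.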

**Deferred validation:**

- **Glob** bodies: pass through as raw; runtime applies whatever
  glob matcher is appropriate.

**Why not ANTLR sub-grammars for any of these?** Node's `new URL()`
and `new RegExp()` are authoritative, well-tested, and zero-cost to
invoke; `xpath` and `jsonpath-plus` are the de facto Node parsers for
their respective dialects. ANTLR sub-grammars for any of these would
add hundreds of lines of generated parser code with no validation
benefit over the native or library facilities.

## 7. Line Markers

A line marker selects a position or range from the sequence an OP
operates on or produces. The sequence type is OP-specific (see §4
per-OP table): entry lines for EDIT/COPY/MOVE, matched content lines
for READ, positions in the matched-paths list for FIND/SHOW/HIDE.

**Token shape:** `<` `-?[0-9]+` (`-` `-?[0-9]+`)? `>`.

| Form     | Meaning                              |
|----------|--------------------------------------|
| `<N>`    | single position N                    |
| `<N-M>`  | inclusive range N..M                 |
| `<0>`    | prepend anchor (before position 1)   |
| `<-1>`   | append anchor (after last position)  |

Examples involving negative integers:

- `<-1-5>`: range from -1 to 5
- `<0--5>`: range from 0 to -5
- `<-3--1>`: range from -3 to -1

**Parsing rule:** greedy. The first signed integer consumes leading
`-` and digits maximally; the optional `-` range separator follows; the
optional second signed integer consumes its own optional `-` and
digits. So `<-1-5>` parses as first=`-1`, separator=`-`, second=`5`.
This falls out of standard ANTLR longest-match.
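
The token shape maps directly onto a regular expression, which makes the greedy reading easy to check outside ANTLR. The sketch below is illustrative only: `parseLineMarker` is a hypothetical helper, and the `LineMarker` shape mirrors the `{ first, last }` interface listed in §12.

```typescript
interface LineMarker { first: number; last: number | null; }

// "<" signed-int ( "-" signed-int )? ">"
const LINE_MARKER = /^<(-?[0-9]+)(?:-(-?[0-9]+))?>$/;

function parseLineMarker(token: string): LineMarker | null {
  const m = LINE_MARKER.exec(token);
  if (m === null) return null; // not a line-marker token
  return {
    first: Number(m[1]),
    // The second integer is absent for the single-position form <N>.
    last: m[2] === undefined ? null : Number(m[2]),
  };
}
```

Here `<-1-5>` parses to `{ first: -1, last: 5 }` and `<0--5>` to `{ first: 0, last: -5 }`; whether such values are meaningful remains a per-OP runtime decision.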

**Runtime concerns** (not enforced by the parser):

- `N ≥ 1`: 1-indexed position.
- Validity of any specific value (out-of-range, inverted range where
  `N > M`, sentinel meanings beyond the canonical `0`/`-1`) is decided
  per-OP at runtime.

**Result-set ordering** (FIND, SHOW, HIDE): the runtime must produce a
deterministic order so that `<N-M>` pagination is reproducible.
Lexicographic ascending order over the matched path strings is the
canonical ordering. Runtime guarantee, not a parser concern.

## 8. Suffix Discipline

The `:body:` fencing handles the vast majority of grammatical-enclosure
concerns: body content is fully opaque to OP keywords and modifier-like
characters. The suffix is reserved for the residual edge case where
body content literally contains the close-tag pattern `:OPkeyword`.
That happens in two scenarios:

1. **Nesting plurnk statements inside a body** (recording a plurnk
   transcript, storing examples, etc.). The inner statement's close
   `:OP` would prematurely terminate the outer's body.
2. **Body content contains `:OPkeyword` as literal text** (e.g., a
   stored JSON object with a value mentioning plurnk syntax).

Suffix rules:

- `suffix` is `[A-Za-z0-9_]*`, concatenated to `OP` with no separator, on both open and close.
- Open `<<OPsuffix` and close `:OPsuffix` must character-match.
- A non-empty suffix on the outer statement ensures its close tag
  (`:OPsuffix`) is distinct from any `:OP` substring that may appear in
  body content (whether as nested plurnk or as literal text).
- The body of a statement cannot contain its own exact close-tag
  literal; choose a suffix that does not collide.
- Empty suffix is the default. Most statements need no suffix.

Example, a nested EDIT inside an outer EDITa:

```
<<EDITa(known://demo):
The following is a quoted plurnk operation, preserved verbatim:
<<EDIT(known://inner):hello world:EDIT
:EDITa
```

The inner's `:EDIT` close does not terminate the outer because the
outer's close tag is `:EDITa`.
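
A producer that emits statements programmatically needs some way to choose a non-colliding suffix. One possible approach is sketched below; `pickSuffix` and `wrap` are hypothetical helpers, the `_1`, `_2`, … numbering is an arbitrary choice within `[A-Za-z0-9_]*`, and the collision test is a deliberately conservative substring check.

```typescript
// Return the first suffix whose close tag ":OPsuffix" does not occur
// anywhere in the body. The empty suffix is tried first (the default).
function pickSuffix(op: string, body: string): string {
  for (let n = 0; ; n++) {
    const suffix = n === 0 ? "" : `_${n}`;
    if (!body.includes(`:${op}${suffix}`)) return suffix;
  }
}

// Assemble a statement with a path and body; assumes `path` contains
// no ")" or newline, which the path slot cannot carry.
function wrap(op: string, path: string, body: string): string {
  const tag = op + pickSuffix(op, body);
  return `<<${tag}(${path}):${body}:${tag}`;
}
```

For a body that quotes `<<EDIT(known://inner):hello world:EDIT`, `pickSuffix("EDIT", body)` returns `_1`, so the outer statement opens with `<<EDIT_1` and closes with `:EDIT_1`.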

## 9. SEND Status Codes

SEND status codes align with HTTP semantics so that model training
transfers directly:

- `1xx` Informational: continuation; `102 Processing` is the canonical loop-continuation code.
- `2xx` Success: terminal delivery; `200 OK` is the canonical final-answer code.
- `3xx` Redirection: handoff to another agent or address.
- `4xx` Client Error: model-side failure (malformed plurnk, missing path, contract violation).
- `5xx` Server Error: runtime or infrastructure failure (network, permission, tool unavailable).

SEND with no `(path)` broadcasts to the default control channel. SEND
with `(path)` directs the message at a specific recipient URI.
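
A runtime that dispatches on the status class might group codes as below. This is a sketch of runtime convention rather than grammar: `statusClass` is a hypothetical helper, and treating anything outside 100 to 599 as unclassified is an assumption.

```typescript
type StatusClass =
  | "informational" | "success" | "redirection"
  | "client-error" | "server-error";

function statusClass(code: number): StatusClass | null {
  if (!Number.isInteger(code)) return null;
  switch (Math.floor(code / 100)) {
    case 1: return "informational"; // continuation, e.g. 102
    case 2: return "success";       // terminal delivery, e.g. 200
    case 3: return "redirection";   // handoff
    case 4: return "client-error";  // model-side failure
    case 5: return "server-error";  // runtime or infrastructure failure
    default: return null;           // assumption: outside 100..599 is unclassified
  }
}
```

The grammar itself accepts any digit string in `[signal]`; a mapping like this belongs entirely to the consumer.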

### Response Body Convention

Structured responses (errors, query results, multi-field acknowledgments)
are emitted as **JSON in the SEND body**, so the model can consume them
with the same jsonpath dialect it uses for matching:

```
<<SEND[400](err://lex):
{"reason":"unexpected token","position":{"line":47,"column":12},"expected":[")"],"got":"["}
:SEND
```

The model retrieves a field with `<<READ(err://lex):$.reason:READ` or
similar. Plain-text bodies remain valid for simple terminal answers
(`<<SEND[200]:Paris:SEND`). The JSON convention is runtime policy; the
grammar treats body as opaque.

## 10. Implementation Notes

- ANTLR4 split follows standard convention: `plurnkLexer.g4` defines
  tokens; `plurnkParser.g4` defines statement structure. Generated
  using `antlr-ng` targeting the `antlr4ng` runtime.
- The body is fenced by `:` on the header side and `:OPsuffix` on the
  close side. The lexer enters body mode when it consumes the opening
  `:` after the last header element. In body mode, the close-tag rule
  uses a semantic predicate (`atColonCloseTag()`) that fires when the
  next characters match `:OPsuffix` exactly. The open tag
  (`OP + suffix`) is captured at statement start and held on the lexer
  instance.
- The body is uniformly opaque at the lexer level (a sequence of
  `BODY_TEXT` tokens). The Visitor reconstructs body content as a
  single string and, per OP semantics, interprets it as content,
  destination URI, payload, command, or matcher.
- Header mode hierarchy: the state machine
  `DEFAULT → OPENED → SIGNAL → POST_SIGNAL → PATH → POST_PATH → POST_L → BODY`
  tracks which header elements remain valid at each position (after
  signal, signal is no longer valid; after path, neither signal nor
  path; after `<L>`, only the `:` body delimiter is valid). Each header
  mode requires the `:` to transition to BODY; there is no fallback.
- PATH and SIGNAL content reject `<<` (single `<` is permitted inside
  them, double `<<` is the statement-opener prefix and must not appear).
  This prevents a malformed path or signal from silently swallowing the
  next statement.
- Interstatement content (between statements) is captured as `TEXT`
  tokens. The lexer's `TEXT` rule matches any chars that aren't a
  recognized statement opener; a `<<` followed by a non-OP sequence is
  rolled into `TEXT` rather than producing an error.
- Error model: the parser uses ANTLR's `DefaultErrorStrategy` for
  cross-statement recovery (sync to next statement opener on error).
  An error listener records every syntax error as a `PlurnkParseError`
  (line, column, source: `"lexer" | "parser"`, message). The Visitor's
  caller correlates errors to statement positions and emits them in
  the result's `items` array in order.
- Boundary-destroying errors (lexer ends in a non-DEFAULT mode at EOF,
  typically meaning a statement was never closed) surface as
  `unparsedTail` on the parse result. The agent's consumer treats this
  as "the document past this point is unparseable; do not execute
  anything after the last successful item."
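
The body-mode behaviour can be approximated outside ANTLR with a plain string scan. The sketch below is not the generated lexer: `scanBody` is a hypothetical function, and it assumes the close tag is the first literal occurrence of `:` plus the captured open tag, without inspecting the characters that follow (the notes above do not say whether `atColonCloseTag()` does).

```typescript
interface ScannedBody { body: string; end: number; }

// `bodyStart` is the index just past the header's opening ":";
// `openTag` is the captured OP + suffix, e.g. "EDITa".
function scanBody(
  input: string,
  bodyStart: number,
  openTag: string,
): ScannedBody | null {
  const closeTag = `:${openTag}`;
  const at = input.indexOf(closeTag, bodyStart);
  // No close tag before EOF: the statement was never closed, which the
  // real parser reports as an unparsedTail.
  if (at === -1) return null;
  return { body: input.slice(bodyStart, at), end: at + closeTag.length };
}
```

Run against the nested example from §8, scanning with `EDITa` returns the whole quoted inner statement as body, whereas scanning with `EDIT` stops at the inner `:EDIT`, which is exactly the premature termination the suffix exists to prevent.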

## 11. Whitespace and Comments

Plurnk is HEREDOC-disciplined and LLM-tolerant: forgiving where
forgiveness is safe, strict where laxity would corrupt content.

- **Between header elements** (`OPsuffix`, `[signal]`, `(path)`,
  `<L>`, the body-delimiter `:`): whitespace (spaces, tabs, newlines)
  is optional and non-significant.
- **Inside header elements** (between the brackets/parens/angles
  themselves, e.g., inside `[…]`, `(…)`, `<…>`, between `OP` and
  `suffix`): whitespace is forbidden. These are strict tokens.
- **Body interior**: whitespace is preserved verbatim. Body content
  begins at the character immediately after the opening `:` and ends
  immediately before the closing `:OPsuffix`. Leading and trailing
  newlines in body content (common for multi-line bodies written by
  the model) are part of the body; runtime consumers may normalize
  them.
- **Close tag** (`:OPsuffix`): the `:` and the `OPsuffix` must be
  character-adjacent; no whitespace is permitted between them. Whitespace
  *before* the close `:` (i.e., trailing whitespace in body) is body
  content, preserved verbatim.

Comments: plurnk has no comment syntax. The protocol is wire-shaped,
not source-shaped. To leave a self-documenting breadcrumb, use
`<<EDIT(known://notes/…):…:EDIT` (model-visible) or
`<<SEND[1xx](…):…:SEND` (orchestrator-visible).
|
|
436
|
-
|
|
437
|
-
## 12. Public API
|
|
438
|
-
|
|
439
|
-
This package exports a single entry point `parse(input: string): ParseResult` and the AST type union. The full surface area:
|
|
440
|
-
|
|
441
|
-
```typescript
|
|
442
|
-
parse(input: string): ParseResult
|
|
443
|
-
|
|
444
|
-
type ParseResult = {
|
|
445
|
-
items: ParseItem[];
|
|
446
|
-
unparsedTail?: { from: Position; reason: string };
|
|
447
|
-
};
|
|
448
|
-
|
|
449
|
-
type ParseItem =
|
|
450
|
-
| { kind: "statement"; statement: PlurnkStatement }
|
|
451
|
-
| { kind: "error"; error: PlurnkParseError }
|
|
452
|
-
| { kind: "text"; text: string; position: Position };
|
|
453
|
-
|
|
454
|
-
type Position = { line: number; column: number };
|
|
455
|
-
|
|
456
|
-
type PlurnkOp = "FIND" | "READ" | "EDIT" | "COPY" | "MOVE" | "SHOW" | "HIDE" | "SEND" | "EXEC";
|
|
457
|
-
|
|
458
|
-
type PlurnkStatement =
|
|
459
|
-
| FindStatement | ReadStatement | EditStatement
|
|
460
|
-
| CopyStatement | MoveStatement
|
|
461
|
-
| ShowStatement | HideStatement
|
|
462
|
-
| SendStatement | ExecStatement;
|
|
463
|
-
|
|
464
|
-
interface StatementBase<S> {
|
|
465
|
-
suffix: string; // empty string if no suffix
|
|
466
|
-
signal: S | null; // null = no [signal] slot; type S varies per OP (see below)
|
|
467
|
-
path: ParsedPath | null; // typed parse of (path); null if no slot or empty
|
|
468
|
-
lineMarker: LineMarker | null;
|
|
469
|
-
position: Position;
|
|
470
|
-
// body type varies per OP — declared on each concrete statement (below).
|
|
471
|
-
}
|
|
472
|
-
|
|
473
|
-
interface LineMarker { first: number; last: number | null; }
|
|
474
|
-
|
|
475
|
-
// Path is local (no scheme) or URL (has scheme). The Visitor decides by
|
|
476
|
-
// matching the leading [a-z][a-z0-9+.-]*:// pattern; only URLs are passed
|
|
477
|
-
// through `new URL()` for component breakdown.
|
|
478
|
-
type ParsedPath = LocalPath | UrlPath;
|
|
479
|
-
|
|
480
|
-
interface LocalPath {
|
|
481
|
-
kind: "local";
|
|
482
|
-
raw: string; // filesystem path or other non-URL identifier
|
|
483
|
-
}
|
|
484
|
-
|
|
485
|
-
interface UrlPath {
|
|
486
|
-
kind: "url";
|
|
487
|
-
raw: string;
|
|
488
|
-
scheme: string; // protocol without trailing ':'
|
|
489
|
-
username: string | null;
|
|
490
|
-
password: string | null;
|
|
491
|
-
hostname: string | null; // first authority segment; for custom schemes like
|
|
492
|
-
// `known://entries/foo`, hostname = "entries"
|
|
493
|
-
port: number | null;
|
|
494
|
-
pathname: string; // path component, may be empty
|
|
495
|
-
search: Record<string, string | string[]>;
|
|
496
|
-
fragment: string | null;
|
|
497
|
-
}
|
|
498
|
-
|
|
499
|
-
// Typed body for FIND/READ/SHOW/HIDE — dialect dispatch with compiled regex.
|
|
500
|
-
type MatcherBody =
|
|
501
|
-
| { dialect: "xpath"; raw: string }
|
|
502
|
-
| { dialect: "regex"; raw: string; pattern: string; flags: string; regexp: RegExp }
|
|
503
|
-
| { dialect: "jsonpath"; raw: string }
|
|
504
|
-
| { dialect: "glob"; raw: string };
|
|
505
|
-
|
|
506
|
-
// Typed body for SEND — best-effort JSON parse alongside raw.
|
|
507
|
-
interface SendBody {
|
|
508
|
-
raw: string;
|
|
509
|
-
json: unknown | null; // parsed value if body is valid JSON, else null
|
|
510
|
-
}
|
|
511
|
-
|
|
512
|
-
// Each variant declares its own body type. Tag-bearing OPs share signal=string[];
|
|
513
|
-
// SEND uses number; EXEC uses string.
|
|
514
|
-
|
|
515
|
-
// Matcher OPs — body is a typed pattern matcher.
|
|
516
|
-
interface FindStatement extends StatementBase<string[]> { op: "FIND"; body: MatcherBody | null; }
|
|
517
|
-
interface ReadStatement extends StatementBase<string[]> { op: "READ"; body: MatcherBody | null; }
|
|
518
|
-
interface ShowStatement extends StatementBase<string[]> { op: "SHOW"; body: MatcherBody | null; }
|
|
519
|
-
interface HideStatement extends StatementBase<string[]> { op: "HIDE"; body: MatcherBody | null; }
|
|
520
|
-
|
|
521
|
-
// EDIT — body is arbitrary content (markdown, code, prose). Raw.
|
|
522
|
-
interface EditStatement extends StatementBase<string[]> { op: "EDIT"; body: string | null; }
|
|
523
|
-
|
|
524
|
-
// COPY/MOVE — body is the destination URI, parsed identically to the path slot.
|
|
525
|
-
interface CopyStatement extends StatementBase<string[]> { op: "COPY"; body: ParsedPath | null; }
|
|
526
|
-
interface MoveStatement extends StatementBase<string[]> { op: "MOVE"; body: ParsedPath | null; }
|
|
527
|
-
|
|
528
|
-
// SEND — body is raw + best-effort JSON.
|
|
529
|
-
interface SendStatement extends StatementBase<number> { op: "SEND"; body: SendBody | null; }
|
|
530
|
-
|
|
531
|
-
// EXEC — body is a command or code snippet. Raw.
|
|
532
|
-
interface ExecStatement extends StatementBase<string> { op: "EXEC"; body: string | null; }
|
|
533
|
-
```
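As a rough illustration of how the `UrlPath` and `SendBody` shapes in the listing above might be populated, the sketch below leans on the standard WHATWG `URL` class and `JSON.parse`. The helper names `toUrlPath` and `toSendBody` are hypothetical, and the package's own parser may split URIs differently (for example in how it treats the leading slash of `pathname`).

```typescript
// Local copies of two shapes from the listing, so the sketch compiles alone.
interface UrlPath {
  kind: "url";
  raw: string;
  scheme: string;
  username: string | null;
  password: string | null;
  hostname: string | null;
  port: number | null;
  pathname: string;
  search: Record<string, string | string[]>;
  fragment: string | null;
}
interface SendBody { raw: string; json: unknown | null; }

// Assumption: the WHATWG URL class does the splitting; the real parser may differ.
function toUrlPath(raw: string): UrlPath {
  const u = new URL(raw);
  // Null-prototype object so keys such as "constructor" are not mistaken for hits.
  const search: Record<string, string | string[]> = Object.create(null);
  u.searchParams.forEach((value, key) => {
    const prev = search[key];
    if (prev === undefined) search[key] = value;        // first occurrence: plain string
    else if (Array.isArray(prev)) prev.push(value);     // third or later: extend the array
    else search[key] = [prev, value];                   // second occurrence: promote to array
  });
  return {
    kind: "url",
    raw,
    scheme: u.protocol.replace(/:$/, ""),               // protocol without trailing ':'
    username: u.username || null,
    password: u.password || null,
    hostname: u.hostname || null,
    port: u.port ? Number(u.port) : null,
    pathname: u.pathname,
    search,
    fragment: u.hash ? u.hash.slice(1) : null,          // drop the leading '#'
  };
}

// Best-effort JSON: keep the raw text and attach the parsed value when it parses.
function toSendBody(raw: string): SendBody {
  try {
    return { raw, json: JSON.parse(raw) as unknown };
  } catch {
    return { raw, json: null };
  }
}
```

One consequence of the `SendBody` shape is worth noting: a body consisting of the JSON literal `null` yields `json: null`, which is indistinguishable from a body that failed to parse, so consumers that care should inspect `raw` as well.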
|
|
534
|
-
|
|
535
|
-
The `op` field is the discriminator. TypeScript narrows the statement
|
|
536
|
-
type per-branch: `switch (s.op) { case "EDIT": /* s is EditStatement */ }`.
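A minimal sketch of that narrowing follows. The interfaces are simplified stand-ins rather than the package's real declarations: it is assumed here that the type parameter of `StatementBase` surfaces as a `signal` field, and the `describe` helper is hypothetical.

```typescript
// Simplified stand-ins for three statement variants (assumption: `signal`
// carries the StatementBase type parameter; other base fields are omitted).
interface EditStatement { op: "EDIT"; signal: string[]; body: string | null; }
interface SendStatement { op: "SEND"; signal: number; body: { raw: string; json: unknown } | null; }
interface ExecStatement { op: "EXEC"; signal: string; body: string | null; }
type Statement = EditStatement | SendStatement | ExecStatement;

function describe(s: Statement): string {
  switch (s.op) {
    case "EDIT":
      // s is EditStatement here, so signal is string[]
      return `EDIT with ${s.signal.length} tag(s)`;
    case "SEND":
      // s is SendStatement here, so signal is number
      return `SEND with signal ${s.signal.toFixed(0)}`;
    case "EXEC":
      // s is ExecStatement here, so signal is string
      return `EXEC with signal ${s.signal.toUpperCase()}`;
    default: {
      // Exhaustiveness check: compilation fails if a variant is left unhandled.
      const unreachable: never = s;
      return unreachable;
    }
  }
}
```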
|
|
537
|
-
|
|
538
|
-
**Items are ordered.** The agent consumer iterates in order: execute on
|
|
539
|
-
`statement`, halt on `error`, surface or ignore `text` per policy.
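The ordered-consumption rule can be sketched as a single loop. The item field names (`kind`, `op`, `message`, `text`), the `RunLog` record, and the `surfaceText` policy flag are all assumptions made for illustration; only the three item kinds come from the text above.

```typescript
// Hypothetical item shape: the kinds `statement`, `error`, and `text` are
// from the spec; the field names are assumptions for this sketch.
type Item =
  | { kind: "statement"; op: string }
  | { kind: "error"; message: string }
  | { kind: "text"; text: string };

interface RunLog {
  executed: string[];      // ops that were "executed", in order
  surfaced: string[];      // text items passed through per policy
  haltedOn: string | null; // message of the error that stopped the run, if any
}

function consume(items: Item[], surfaceText: boolean): RunLog {
  const log: RunLog = { executed: [], surfaced: [], haltedOn: null };
  for (const item of items) {
    switch (item.kind) {
      case "statement":
        log.executed.push(item.op); // stand-in for real execution
        break;
      case "error":
        log.haltedOn = item.message;
        return log;                 // halt: later items are never processed
      case "text":
        if (surfaceText) log.surfaced.push(item.text);
        break;
    }
  }
  return log;
}
```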
|
|
540
|
-
|
|
541
|
-
**ANTLR types do not leak.** All `antlr4ng` types are internal to this
|
|
542
|
-
package; consumers receive only the types listed above.
|
|
543
|
-
|
|
544
|
-
### CLI
|
|
545
|
-
|
|
546
|
-
The package also exposes a `plurnk` CLI for local development and tooling:
|
|
547
|
-
|
|
548
|
-
```
|
|
549
|
-
plurnk [file] Parse plurnk source from a file (or stdin if omitted or '-')
|
|
550
|
-
and print the parse result as JSON.
|
|
551
|
-
plurnk --help Show usage.
|
|
552
|
-
```
|
|
553
|
-
|
|
554
|
-
Exit codes: `0` for a clean parse (no error items, no `unparsedTail`),
|
|
555
|
-
`1` otherwise. `RegExp` values inside `MatcherBody` serialize as their
|
|
556
|
-
`/pattern/flags` string form; `PlurnkParseError` instances serialize via
|
|
557
|
-
their `toJSON()` method to `{ line, column, source, message }`.
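The serialization and exit-code rules might be implemented along these lines. This is a sketch, not the CLI's actual source: `ParseResultSketch`, `toCliJson`, and `exitCode` are hypothetical names, and the result shape is reduced to the two fields the exit-code rule mentions.

```typescript
// Hypothetical result shape; only the fields the exit-code rule refers to.
interface ParseResultSketch {
  items: Array<{ kind: string; [key: string]: unknown }>;
  unparsedTail: unknown | null;
}

// RegExp has no toJSON(), so JSON.stringify would emit {} for it. A replacer
// swaps each RegExp for its /pattern/flags string form instead.
function toCliJson(result: ParseResultSketch): string {
  return JSON.stringify(
    result,
    (_key, value) => (value instanceof RegExp ? value.toString() : value),
    2,
  );
}

// 0 only for a clean parse: no error items and no unparsedTail.
function exitCode(result: ParseResultSketch): 0 | 1 {
  const hasError = result.items.some((item) => item.kind === "error");
  return !hasError && result.unparsedTail == null ? 0 : 1;
}
```

Objects that define `toJSON()`, as `PlurnkParseError` is described as doing, need no special handling here because `JSON.stringify` calls that method before the replacer sees the value.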
|
|
558
|
-
|
|
559
|
-
## 13. Error Format
|
|
560
|
-
|
|
561
|
-
`PlurnkParseError` is a JSON-serializable Error subclass:
|
|
562
|
-
|
|
563
|
-
```typescript
|
|
564
|
-
type ErrorSource = "lexer" | "parser" | "visitor";
|
|
565
|
-
|
|
566
|
-
class PlurnkParseError extends Error {
|
|
567
|
-
readonly line: number;
|
|
568
|
-
readonly column: number;
|
|
569
|
-
readonly source: ErrorSource;
|
|
570
|
-
// .message is "Plurnk <source> error at <line>:<column> — <message>"
|
|
571
|
-
}
|
|
572
|
-
```
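For orientation, a class with this shape could be written as below. It is named `ParseErrorSketch` to make clear it is not the package's implementation. The `detail` field is hypothetical, and the sketch assumes that `toJSON()` reports the bare message rather than the prefixed `.message` string.

```typescript
type ErrorSource = "lexer" | "parser" | "visitor";

class ParseErrorSketch extends Error {
  readonly line: number;
  readonly column: number;
  readonly source: ErrorSource;
  readonly detail: string; // hypothetical: the bare message, without the prefix

  constructor(source: ErrorSource, line: number, column: number, detail: string) {
    // Matches the documented format; \u2014 is the dash used as the separator.
    super(`Plurnk ${source} error at ${line}:${column} \u2014 ${detail}`);
    this.name = "ParseErrorSketch";
    this.line = line;
    this.column = column;
    this.source = source;
    this.detail = detail;
  }

  // Called automatically by JSON.stringify; plain Error fields are otherwise
  // non-enumerable and would be dropped from the output.
  toJSON(): { line: number; column: number; source: ErrorSource; message: string } {
    return { line: this.line, column: this.column, source: this.source, message: this.detail };
  }
}
```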
|
|
573
|
-
|
|
574
|
-
The three sources identify the stage at which the failure occurred:
|
|
575
|
-
|
|
576
|
-
- **`"lexer"`** — token-level failures (unrecognized character, malformed integer in `<L>`, etc.).
|
|
577
|
-
- **`"parser"`** — structural failures at parse-tree level (missing close tag, wrong token order, etc.).
|
|
578
|
-
- **`"visitor"`** — semantic failures during AST construction (SEND signal not an integer, EXEC signal with multiple values, etc.).
|
|
579
|
-
|
|
580
|
-
Serialization convention for transmission to the model (the agent
|
|
581
|
-
runtime constructs this; the parser provides the fields):
|
|
582
|
-
|
|
583
|
-
```json
|
|
584
|
-
{
|
|
585
|
-
"line": 1,
|
|
586
|
-
"column": 12,
|
|
587
|
-
"source": "parser",
|
|
588
|
-
"message": "expected close tag; got end of input"
|
|
589
|
-
}
|
|
590
|
-
```
|
|
591
|
-
|
|
592
|
-
**Message style rules** (enforced by `PlurnkErrorStrategy` and the
|
|
593
|
-
lexer message translator):
|
|
594
|
-
|
|
595
|
-
- **Protocol vocabulary only.** Messages refer to plurnk concepts (open
|
|
596
|
-
tag, close tag, signal, path, line marker, body, statement header,
|
|
597
|
-
between statements) — never ANTLR or parser-internal terms (no
|
|
598
|
-
"token recognition error," "extraneous input," "RPAREN," "no viable
|
|
599
|
-
alternative," "<EOF>", etc.).
|
|
600
|
-
- **Terse but complete.** One short sentence naming what was wrong.
|
|
601
|
-
No suggestions, no recovery hints, no extra context. The model
|
|
602
|
-
receives `line`/`column` separately and doesn't need duplication.
|
|
603
|
-
- **Slot/feature references**, not rule references. "in path" rather
|
|
604
|
-
than "in rule path"; "expected close tag" rather than "expected
|
|
605
|
-
CLOSE_TAG."
|
|
606
|
-
|
|
607
|
-
Examples of canonical messages:
|
|
608
|
-
|
|
609
|
-
- `unrecognized character '<<' in path`
|
|
610
|
-
- `unrecognized character ':' in signal`
|
|
611
|
-
- `unrecognized character 'X' in statement header`
|
|
612
|
-
- `expected close tag; got end of input`
|
|
613
|
-
- `expected ')'; got ':'`
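One way to keep parser-internal names out of messages is a lookup from token names to protocol vocabulary, sketched below. The table is an assumption: `CLOSE_TAG` and `RPAREN` appear in the rules above, while `OPEN_TAG`, `COLON`, and `EOF` as key names, along with the helper names, are made up for illustration and may not match the generated lexer.

```typescript
// Hypothetical token-name -> protocol-vocabulary table. A Map avoids
// accidental hits on inherited object properties such as "constructor".
const VOCABULARY = new Map<string, string>([
  ["CLOSE_TAG", "close tag"],
  ["OPEN_TAG", "open tag"],
  ["RPAREN", "')'"],
  ["COLON", "':'"],
  ["EOF", "end of input"],
]);

// Unknown names fall back to a neutral phrase so that a raw token name
// never leaks into a message.
function describeToken(name: string): string {
  return VOCABULARY.get(name) ?? "unexpected input";
}

// One short sentence, no suggestions and no position (line and column are
// reported separately).
function expectedMessage(expected: string, got: string): string {
  return `expected ${describeToken(expected)}; got ${describeToken(got)}`;
}
```

With this table, `expectedMessage("CLOSE_TAG", "EOF")` produces the canonical `expected close tag; got end of input`.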
|
|
614
|
-
|
|
615
|
-
**Per-statement semantics.** A single statement produces at most one
|
|
616
|
-
error (fail-hard within a statement; first error wins, no cascading
|
|
617
|
-
within-statement reports). Across statements, the parser recovers and
|
|
618
|
-
continues — independent malformations in different statements each
|
|
619
|
-
get their own error item in the result.
|
|
620
|
-
|
|
621
|
-
**Boundary-destroying errors.** When the lexer cannot determine where
|
|
622
|
-
a malformed statement ends (e.g., a body that never finds its close
|
|
623
|
-
tag), the parser cannot reliably parse content after that point. The
|
|
624
|
-
result's `unparsedTail` field marks the position from which parsing
|
|
625
|
-
gave up. Consumers must treat anything past that point as undefined.
|