papagaio 0.7.9 → 0.32.9
- package/README.md +239 -499
- package/bin/cli.js +51 -0
- package/dist/wasm/papagaio.js +97 -0
- package/dist/wasm/papagaio_wasm.js +0 -0
- package/package.json +12 -7
- package/bin/cli.mjs +0 -66
- package/index.html +0 -209
- package/papagaio.js +0 -257
- package/tests/test.js +0 -101
- package/tests/tests.json +0 -214
package/README.md
CHANGED
|
@@ -1,585 +1,325 @@
|
|
|
1
1
|
# Papagaio
|
|
2
|
-
Minimal yet powerful text preprocessor.
|
|
3
2
|
|
|
4
|
-
-
|
|
5
|
-
- **It's small!** papagaio is about 250 lines and 10 KB.
|
|
6
|
-
- **It's easy!** papagaio doesn't have any complicated stuff: one class and one method do everything!
|
|
7
|
-
- **It's flexible!** Do papagaio's sigil and delimiters conflict with whatever you want to process? Then simply change them! papagaio lets you modify ANY of its keywords and symbols.
|
|
8
|
-
- **It's powerful!** Although inspired by the m4 preprocessor and meant to be a preprocessor, papagaio is still a fully featured programming language, because it can evaluate any valid JavaScript code using `$eval`.
|
|
3
|
+
Papagaio is a C-first, embeddable text processing engine. It is designed to be highly modular and **script-agnostic**, allowing core pattern matching to be used alone or extended via WebAssembly (Wasm) plugins.
|
|
9
4
|
|
|
10
|
-
##
|
|
11
|
-
```javascript
|
|
12
|
-
import { Papagaio } from './papagaio.js';
|
|
13
|
-
const papagaio = new Papagaio();
|
|
14
|
-
const result = papagaio.process(input);
|
|
15
|
-
```
|
|
16
|
-
|
|
17
|
-
## Configuration
|
|
18
|
-
```javascript
|
|
19
|
-
papagaio.symbols = {
|
|
20
|
-
pattern: "pattern", // pattern keyword
|
|
21
|
-
open: "{", // opening delimiter (multi-char supported)
|
|
22
|
-
close: "}", // closing delimiter (multi-char supported)
|
|
23
|
-
sigil: "$", // variable marker
|
|
24
|
-
eval: "eval", // eval keyword
|
|
25
|
-
block: "recursive", // block keyword (recursive nesting)
|
|
26
|
-
regex: "regex", // regex keyword
|
|
27
|
-
blockseq: "sequential" // blockseq keyword (sequential blocks)
|
|
28
|
-
};
|
|
29
|
-
```
|
|
5
|
+
## Key Features
|
|
30
6
|
|
|
31
|
-
|
|
7
|
+
- **Lightweight Core**: Efficient C engine for pattern matching and transformation.
|
|
8
|
+
- **Pattern-Matching**: Powerful capture system with built-in and custom modifiers.
|
|
9
|
+
- **WebAssembly Plugins**: Highly secure, zero-dependency plugin architecture via an embedded `wasm3` runtime.
|
|
10
|
+
- **Configurable Delimiters**: Redefine sigils, delimiters, and markers at runtime.
|
|
11
|
+
- **Language Bindings**: Native usage in C and Node.js/WebAssembly.
|
|
32
12
|
|
|
33
|
-
##
|
|
13
|
+
## Quick Start
|
|
34
14
|
|
|
35
|
-
###
|
|
36
|
-
```
|
|
37
|
-
|
|
38
|
-
hello
|
|
39
|
-
```
|
|
40
|
-
Output: `hello`
|
|
41
|
-
|
|
42
|
-
### 2. Multiple Variables
|
|
43
|
-
```
|
|
44
|
-
$pattern {$x $y $z} {$z, $y, $x}
|
|
45
|
-
apple banana cherry
|
|
15
|
+
### Command Line Interface (CLI)
|
|
16
|
+
```sh
|
|
17
|
+
# Process with patterns defined in the file or passed via -p
|
|
18
|
+
papagaio -e '$pattern {hello $w} {Hi $w}' input.txt
|
|
46
19
|
```
|
|
47
|
-
Output: `cherry, banana, apple`
|
|
48
20
|
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
|
|
53
|
-
Papagaio provides flexible variable capture with automatic context-aware behavior.
|
|
54
|
-
|
|
55
|
-
### `$x` - Smart Variable
|
|
56
|
-
Automatically adapts based on context:
|
|
57
|
-
- **Before a block**: Captures everything until the block's opening delimiter
|
|
58
|
-
- **Before a literal**: Captures everything until that literal appears
|
|
59
|
-
- **Otherwise**: Captures a single word (non-whitespace token)
|
|
60
|
-
|
|
61
|
-
```
|
|
62
|
-
$pattern {$x} {[$x]}
|
|
63
|
-
hello world
|
|
64
|
-
```
|
|
65
|
-
Output: `[hello] [world]`
|
|
21
|
+
### C API
|
|
22
|
+
```c
|
|
23
|
+
Papagaio *ctx = papagaio_open();
|
|
66
24
|
|
|
25
|
+
char *out = papagaio_process_text(ctx, input_text, strlen(input_text));
|
|
26
|
+
printf("%s", out);
|
|
27
|
+
free(out);
|
|
28
|
+
papagaio_close(ctx);
|
|
67
29
|
```
|
|
68
|
-
$pattern {$name ${(}{)}content} {$name: $content}
|
|
69
|
-
greeting (hello world)
|
|
70
|
-
```
|
|
71
|
-
Output: `greeting: hello world`
|
|
72
30
|
|
|
73
|
-
|
|
74
|
-
|
|
75
|
-
|
|
76
|
-
```
|
|
77
|
-
Output: `value-key`
|
|
78
|
-
|
|
79
|
-
### `$x?` - Optional Variable
|
|
80
|
-
Same behavior as `$x`, but won't fail if empty or not found.
|
|
31
|
+
### JavaScript / WASM (Node.js)
|
|
32
|
+
```javascript
|
|
33
|
+
import Papagaio from './papagaio.js';
|
|
81
34
|
|
|
82
|
-
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
```
|
|
86
|
-
Output: `<>`
|
|
35
|
+
const p = new Papagaio();
|
|
36
|
+
await p.init();
|
|
37
|
+
p.registerCommand("mycmd", (name, ...args) => `Result: ${args[0]}`);
|
|
87
38
|
|
|
39
|
+
console.log(p.process('$mycmd{Hello}')); // Output: Result: Hello
|
|
88
40
|
```
|
|
89
|
-
$pattern {$greeting? $name} {Hello $name$greeting}
|
|
90
|
-
Hi John
|
|
91
|
-
```
|
|
92
|
-
Output: `Hello JohnHi`
|
|
93
41
|
|
|
94
42
|
---
|
|
95
43
|
|
|
96
|
-
##
|
|
97
|
-
|
|
98
|
-
|
|
99
|
-
|
|
100
|
-
|
|
101
|
-
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
|
|
105
|
-
###
|
|
106
|
-
|
|
107
|
-
$
|
|
108
|
-
|
|
109
|
-
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
|
|
113
|
-
|
|
114
|
-
|
|
115
|
-
|
|
44
|
+
## Pattern Syntax
|
|
45
|
+
|
|
46
|
+
Patterns are composed of whitespace-separated tokens. The engine uses a "flex-matching" strategy that automatically skips horizontal whitespace between variables.
|
|
47
|
+
|
|
48
|
+
- **Literal**: Matches exact text.
|
|
49
|
+
- **Variable**: `$name` (captures a sequence up to the next pattern match).
|
|
50
|
+
- **Optional**: `$name?` or `literal?` (marker is configurable, e.g., `MAYBE`, via `$changesymbols`).
|
|
51
|
+
- **Escaping**: Use `$$` to match a literal `$`.
|
|
52
|
+
|
|
53
|
+
### Modifiers
|
|
54
|
+
Modifiers specify the data type or constraints of a match:
|
|
55
|
+
- **Numbers**: `$var$int`, `$var$float`, `$var$number`
|
|
56
|
+
- **Casing**: `$var$upper`, `$var$lower`, `$var$capitalized`
|
|
57
|
+
- **Formats**: `$var$word`, `$var$identifier`, `$var$hex`, `$var$path`, `$var$binary`, `$var$percent`
|
|
58
|
+
- **Regex**: `$id$regex{[0-9]+}`
|
|
59
|
+
- **Block**: `$item$block{[}{]}` (captures everything between delimiters)
|
|
60
|
+
- **Aliases**: `$kind$aliases{cat}{dog}{bird}` (multi-block syntax).
|
|
61
|
+
- **Substrings**: `$var$starts{foo}`, `$var$ends{bar}`, `$var$prefix{p}`, `$var$suffix{s}`, `$var$infix{i}`, `$var$includes{x}`
|
|
62
|
+
- **Grouping**: `$item$group{subpattern}` (recursive grouping, matches as one unit)
|
|
63
|
+
- **Optionality**: any token (literal, variable, or group) can be made optional by adding `?` (or a custom marker like `MAYBE`).
|
|
64
|
+
- **Trailing Sigil (whitespace collapse)**: appending a bare `$` (or the current sigil) directly after any variable or literal causes the matcher to **consume all following whitespace** in the input — making the adjacent `TOK_WS` optional. This is useful when the number of spaces between tokens is variable:
|
|
65
|
+
```text
|
|
66
|
+
$pattern {$a$ $b} {$a/$b}
|
|
67
|
+
hello world → hello/world
|
|
68
|
+
```
|
|
69
|
+
The trailing `$` after `$a` collapses any run of spaces/tabs/newlines between `$a` and `$b`.
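
As a rough analogy only (plain JavaScript, not Papagaio's matcher), the effect of the trailing sigil resembles using `\s+` as the separator in a regular expression instead of a single space. The helper name `joinPair` is hypothetical, and `\S+` is only an approximation of how `$a` and `$b` capture.

```javascript
// Regex analogy for `$pattern {$a$ $b} {$a/$b}`: two non-whitespace tokens
// separated by any run of whitespace (spaces, tabs, newlines).
// `joinPair` is a hypothetical helper, not part of Papagaio.
function joinPair(input) {
  return input.replace(/(\S+)\s+(\S+)/, "$1/$2");
}

console.log(joinPair("hello   \t world")); // hello/world
```

The number of separating characters does not matter, which is the property the trailing sigil is described as providing.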
|
|
70
|
+
|
|
71
|
+
### Braced Variables
|
|
72
|
+
|
|
73
|
+
When a captured variable name needs to be immediately followed by literal text (e.g., a suffix), wrap the name in `${...}` to prevent ambiguity:
|
|
74
|
+
|
|
75
|
+
```text
|
|
76
|
+
$pattern {$id$word} {${id}x}
|
|
77
|
+
foo
|
|
78
|
+
```
|
|
79
|
+
*Output: `foox`* — without braces, `$idx` would be parsed as a single variable named `idx`.
|
|
80
|
+
|
|
81
|
+
Braced syntax can be used in any replacement string:
|
|
82
|
+
```text
|
|
83
|
+
$pattern {$first $last} {Hello, ${first}!
|
|
84
|
+
}
|
|
85
|
+
John Doe
|
|
116
86
|
```
|
|
117
|
-
Output: `
|
|
87
|
+
*Output: `Hello, John!`*
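
The naming rule above can be illustrated with a small JavaScript sketch. It is not Papagaio's implementation: `expandReplacement` is a hypothetical helper, the identifier syntax is an assumption, and leaving unknown names untouched is also an assumption made for the sake of the example.

```javascript
// Expand `$name` and `${name}` placeholders from a map of captures.
// Unknown names are left untouched (an assumption of this sketch).
// `expandReplacement` is hypothetical and only mimics the naming rule.
function expandReplacement(template, captures) {
  return template.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)/g,
    (whole, braced, bare) => {
      const name = braced !== undefined ? braced : bare;
      return Object.prototype.hasOwnProperty.call(captures, name)
        ? captures[name]
        : whole;
    }
  );
}

const captures = { id: "foo" };
console.log(expandReplacement("${id}x", captures)); // foox
console.log(expandReplacement("$idx", captures));   // $idx (read as a variable named "idx")
```

The braces mark where the variable name ends, so the trailing `x` is treated as literal text rather than as part of the name.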
|
|
118
88
|
|
|
119
|
-
###
|
|
89
|
+
### Nesting
|
|
90
|
+
Modifiers support full recursive nesting:
|
|
91
|
+
```text
|
|
92
|
+
$pattern {$n$aliases{$x$int}{abc}} {VALUE: $n}
|
|
120
93
|
```
|
|
121
|
-
$pattern {$regex year {[0-9]{4}}-$regex month {[0-9]{2}}} {Month $month in $year}
|
|
122
|
-
2024-03
|
|
123
|
-
```
|
|
124
|
-
Output: `Month 03 in 2024`
|
|
125
94
|
|
|
126
95
|
---
|
|
127
96
|
|
|
128
|
-
##
|
|
129
|
-
|
|
130
|
-
Papagaio
|
|
131
|
-
|
|
132
|
-
###
|
|
133
|
-
|
|
134
|
-
|
|
135
|
-
|
|
136
|
-
|
|
137
|
-
|
|
138
|
-
|
|
139
|
-
|
|
140
|
-
|
|
141
|
-
|
|
142
|
-
|
|
143
|
-
|
|
144
|
-
```
|
|
145
|
-
|
|
146
|
-
|
|
147
|
-
```
|
|
148
|
-
|
|
149
|
-
|
|
150
|
-
|
|
151
|
-
|
|
152
|
-
|
|
153
|
-
|
|
154
|
-
```
|
|
155
|
-
Output: `DATA: json stuff`
|
|
156
|
-
|
|
157
|
-
**Multi-Character Delimiters:**
|
|
158
|
-
```
|
|
159
|
-
$pattern {${```}{```}code} {<pre>$code</pre>}
|
|
160
|
-
```markdown
|
|
161
|
-
# Title
|
|
162
|
-
```
|
|
163
|
-
Output: `<pre># Title</pre>`
|
|
164
|
-
|
|
165
|
-
**Default Delimiters (empty blocks):**
|
|
166
|
-
```
|
|
167
|
-
$pattern {${}{}data} {[$data]}
|
|
168
|
-
{hello world}
|
|
169
|
-
```
|
|
170
|
-
Output: `[hello world]`
|
|
171
|
-
*(Uses default `{` and `}` when delimiters are empty)*
|
|
97
|
+
## Extensibility (Wasm Plugins)
|
|
98
|
+
|
|
99
|
+
Papagaio follows a Wasm-first plugin architecture. Core features are limited to pattern matching and transformation, while custom text processing capabilities are provided by WebAssembly plugins.
|
|
100
|
+
|
|
101
|
+
### Built-in Operators
|
|
102
|
+
- **`$document`**: Injects the current state of the document (alias for `$document$current`).
|
|
103
|
+
- **`$document$original`**: Injects the initial, unprocessed input text. Useful for referencing the source even after multiple transformations.
|
|
104
|
+
- **`$document$current`**: Injects the current state of the document during the pre-processing pass.
|
|
105
|
+
- **`$wasm{path}`**: Loads a WebAssembly plugin from the file system (CLI only).
|
|
106
|
+
- **`$file{path}`**: Injects the content of a file from the file system (CLI only).
|
|
107
|
+
- **`$wat{source}`**: Compiles a WebAssembly Text Format (WAT) source string inline and registers all exported `papagaio_*` functions as commands. Useful for embedding lightweight plugins without an external `.wasm` file.
|
|
108
|
+
- **`$NAME$from{value}`**: Dynamically assigns a processed `value` to `$NAME`. The assignment itself is suppressed from the output, and the variable becomes available for exact-match replacement in the remaining document.
|
|
109
|
+
|
|
110
|
+
```text
|
|
111
|
+
$NAME$from{Alice}
|
|
112
|
+
Hello, $NAME!
|
|
113
|
+
```
|
|
114
|
+
*Output: `Hello, Alice!`*
|
|
115
|
+
|
|
116
|
+
```text
|
|
117
|
+
$wat{
|
|
118
|
+
(module
|
|
119
|
+
(func (export "papagaio_hello") (result i32)
|
|
120
|
+
i32.const 42))
|
|
121
|
+
}
|
|
122
|
+
$hello
|
|
123
|
+
```
|
|
172
124
|
|
|
173
|
-
|
|
174
|
-
```
|
|
175
|
-
$pattern {${(}{)}outer} {[$outer]}
|
|
176
|
-
(outer (inner (deep)))
|
|
177
|
-
```
|
|
178
|
-
Output: `[outer (inner (deep))]`
|
|
125
|
+
---
|
|
179
126
|
|
|
180
|
-
|
|
127
|
+
## CLI Argument Expansion
|
|
181
128
|
|
|
182
|
-
|
|
129
|
+
Papagaio can resolve command-line arguments directly within your source files. This is useful for passing configuration, flags, or metadata into the processing pipeline.
|
|
183
130
|
|
|
184
|
-
|
|
185
|
-
|
|
186
|
-
$${opening_delimiter}{closing_delimiter}varName
|
|
187
|
-
```
|
|
131
|
+
### Positional Arguments
|
|
132
|
+
The `argv` array maps as follows (where `argv[0]` is the binary name, invisible to Papagaio):
|
|
188
133
|
|
|
189
|
-
|
|
134
|
+
| Variable | Value |
|
|
135
|
+
|---|---|
|
|
136
|
+
| `$args$0` | `argv[1]` — the input file/script name |
|
|
137
|
+
| `$args$1`, `$args$2`, … | Subsequent positional arguments |
|
|
138
|
+
| `$args$count` | Total number of arguments (excludes the binary name, `argv[0]`) |
|
|
139
|
+
| `$args$all` | All extra arguments from index 1 onwards (after the script), joined with spaces |
|
|
190
140
|
|
|
191
|
-
**
|
|
192
|
-
```
|
|
193
|
-
$pattern {$${[}{]}items} {Items: $items}
|
|
194
|
-
[first][second][third]
|
|
195
|
-
```
|
|
196
|
-
Output: `Items: first second third`
|
|
141
|
+
If a `$args$NAME` variable is not found, it is emitted **literally** (e.g. `$args$missing` stays as-is).
|
|
197
142
|
|
|
198
|
-
|
|
199
|
-
|
|
200
|
-
|
|
201
|
-
|
|
202
|
-
```
|
|
203
|
-
Output: `Tags: html body div`
|
|
143
|
+
### Named Variables (Overrides)
|
|
144
|
+
Arguments in the format `key=value` are automatically parsed and can be accessed in two ways:
|
|
145
|
+
1. **Explicit**: `$args$key`
|
|
146
|
+
2. **Direct**: `$key` (shorthand for `$args$key`)
|
|
204
147
|
|
|
205
|
-
|
|
206
|
-
```
|
|
207
|
-
$pattern {$${}{}data} {Result: $data}
|
|
208
|
-
{a}{b}{c}
|
|
209
|
-
```
|
|
210
|
-
Output: `Result: a b c`
|
|
148
|
+
Direct access (`$key`) will only resolve if `key` does not conflict with a registered command (like `$wasm`) or a built-in directive.
|
|
211
149
|
|
|
212
|
-
|
|
150
|
+
#### Example:
|
|
151
|
+
```sh
|
|
152
|
+
./papagaiocc input.c version=1.2.3 target=wasm -O3
|
|
213
153
|
```
|
|
214
|
-
|
|
215
|
-
|
|
154
|
+
Inside `input.c`:
|
|
155
|
+
```c
|
|
156
|
+
const char *v = "$version"; // "1.2.3"
|
|
157
|
+
const char *t = "$target"; // "wasm"
|
|
158
|
+
const char *f = "$args$1"; // "-O3"
|
|
216
159
|
```
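
One way to picture the lookup behind this example is the JavaScript sketch below. It is hypothetical: `buildArgs` is not a Papagaio function, and it assumes that `key=value` arguments do not take a positional index, which is what the example implies (`$args$1` resolves to `-O3`). `$args$count` and `$args$all` are left out because their exact treatment of named arguments is not specified here.

```javascript
// Hypothetical sketch: build the lookup table behind `$args$N` and `$key`.
// `argv` here excludes the binary name (the README's argv[0]).
// Assumption: `key=value` arguments do not take a positional index.
function buildArgs(argv) {
  const table = new Map();
  let index = 0;
  for (const arg of argv) {
    const eq = arg.indexOf("=");
    if (eq > 0) {
      table.set(arg.slice(0, eq), arg.slice(eq + 1)); // named: version=1.2.3
    } else {
      table.set(String(index), arg); // positional: $args$0, $args$1, ...
      index += 1;
    }
  }
  return table;
}

const table = buildArgs(["input.c", "version=1.2.3", "target=wasm", "-O3"]);
console.log(table.get("version")); // 1.2.3
console.log(table.get("target"));  // wasm
console.log(table.get("0"));       // input.c
console.log(table.get("1"));       // -O3
```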
|
|
217
|
-
Output: `Nested: a (b c) | Seq: x y z`
|
|
218
|
-
|
|
219
|
-
### Block Comparison
|
|
220
|
-
|
|
221
|
-
| Feature | Nested `${}{}var` | Adjacent `$${}{}var` |
|
|
222
|
-
|---------|---------------------|------------------------|
|
|
223
|
-
| Purpose | Capture nested content | Capture adjacent blocks |
|
|
224
|
-
| Input | `[a [b [c]]]` | `[a][b][c]` |
|
|
225
|
-
| Output | `a [b [c]]` | `a b c` |
|
|
226
|
-
| Nesting | Handled recursively | Not nested, sequential |
|
|
227
|
-
| Spacing | Preserves internal structure | Joins with spaces |
|
|
228
160
|
|
|
229
161
|
---
|
|
230
162
|
|
|
231
|
-
##
|
|
163
|
+
## Dynamic Customization
|
|
232
164
|
|
|
233
|
-
|
|
165
|
+
You can redefine the engine's syntax symbols at runtime using the atomic **`$changesymbols`** directive.
|
|
234
166
|
|
|
235
|
-
###
|
|
236
|
-
|
|
237
|
-
$pattern {hello} {world}
|
|
238
|
-
hello
|
|
239
|
-
```
|
|
240
|
-
Output: `world`
|
|
167
|
+
### `$changesymbols{sigil}{open}{close}{optional}`
|
|
168
|
+
Default: `$changesymbols{$}{{} }{}}{?}`
|
|
241
169
|
|
|
242
|
-
|
|
243
|
-
|
|
244
|
-
|
|
245
|
-
* Patterns do not persist between `process()` calls
|
|
246
|
-
* Perfect for hierarchical transformations
|
|
247
|
-
|
|
248
|
-
### Nested Patterns with Inheritance
|
|
249
|
-
```
|
|
250
|
-
$pattern {outer $x} {
|
|
251
|
-
$pattern {inner $y} {[$y from $x]}
|
|
252
|
-
inner $x
|
|
253
|
-
}
|
|
254
|
-
outer hello
|
|
170
|
+
Example:
|
|
171
|
+
```text
|
|
172
|
+
$changesymbols{@}{<}{>}{!} @pattern <@n!> <ID: @n> [x] [y]
|
|
255
173
|
```
|
|
256
|
-
|
|
257
|
-
|
|
258
|
-
The inner pattern has access to `$x` from the outer pattern's capture.
|
|
259
|
-
|
|
260
|
-
### Deep Nesting
|
|
261
|
-
```
|
|
262
|
-
$pattern {level1 $a} {
|
|
263
|
-
$pattern {level2 $b} {
|
|
264
|
-
$pattern {level3 $c} {$a > $b > $c}
|
|
265
|
-
level3 $b
|
|
266
|
-
}
|
|
267
|
-
level2 $a
|
|
268
|
-
}
|
|
269
|
-
level1 ROOT
|
|
270
|
-
```
|
|
271
|
-
Output: `ROOT > ROOT > ROOT`
|
|
272
|
-
|
|
273
|
-
Each nested level inherits all patterns from parent scopes.
|
|
274
|
-
|
|
275
|
-
### Sibling Scopes Don't Share
|
|
276
|
-
```
|
|
277
|
-
$pattern {branch1} {
|
|
278
|
-
$pattern {x} {A}
|
|
279
|
-
x
|
|
280
|
-
}
|
|
281
|
-
$pattern {branch2} {
|
|
282
|
-
x
|
|
283
|
-
}
|
|
284
|
-
branch1
|
|
285
|
-
branch2
|
|
286
|
-
```
|
|
287
|
-
Output:
|
|
288
|
-
```
|
|
289
|
-
A
|
|
290
|
-
x
|
|
291
|
-
```
|
|
292
|
-
|
|
293
|
-
Patterns in `branch1` are not available in `branch2` (they are siblings, not parent-child).
|
|
174
|
+
This changes the sigil to `@`, delimiters to `< >`, and the optional marker to `!`. Preprocessor directives (like `$changesymbols` itself) always use the stable `$` and `{}` to remain functional.
|
|
294
175
|
|
|
295
176
|
---
|
|
296
177
|
|
|
297
|
-
##
|
|
178
|
+
## Recursive Priority System
|
|
298
179
|
|
|
299
|
-
|
|
300
|
-
Executes JavaScript code with access to the Papagaio instance.
|
|
180
|
+
Papagaio allows you to control the order of execution and side-effects (such as pattern definitions or WASM loading) using the **`$priority$N`** directive.
|
|
301
181
|
|
|
302
|
-
|
|
303
|
-
$
|
|
304
|
-
|
|
305
|
-
|
|
306
|
-
|
|
307
|
-
|
|
308
|
-
**Accessing Papagaio Instance:**
|
|
309
|
-
```
|
|
310
|
-
$pattern {info} {$eval{
|
|
311
|
-
return `Content length: ${papagaio.content.length}`;
|
|
312
|
-
}}
|
|
313
|
-
info
|
|
314
|
-
```
|
|
182
|
+
- **`$priority$0{...}`**: Maximum priority.
|
|
183
|
+
- **`$priority$max{...}`**: Alias for `INT_MIN + 1` (Absolute highest priority).
|
|
184
|
+
- **`$priority$min{...}`**: Alias for `INT_MAX - 1` (Absolute lowest priority).
|
|
185
|
+
- **`$priority$1`, `$priority$2`, ...**: Successively lower priorities.
|
|
186
|
+
- **Recursive Evaluation**: Blocks with higher priority (a lower numeric value) are fully processed, including their own nested patterns and commands, before any lower-priority blocks, regardless of their physical position in the file.
|
|
187
|
+
- **Unspecified Priority**: Any text not wrapped in a `$priority` block is treated as priority `INT_MAX - 1` (processed last).
|
|
315
188
|
|
|
316
|
-
|
|
189
|
+
#### Example:
|
|
190
|
+
```text
|
|
191
|
+
$priority$1{ Result: A }
|
|
192
|
+
$priority$max{ $pattern{A}{OK} }
|
|
317
193
|
```
|
|
318
|
-
$pattern
|
|
319
|
-
5
|
|
320
|
-
```
|
|
321
|
-
Output: `10`
|
|
194
|
+
*Output: `Result: OK`* — even though `A` is used before being defined in the source, the `$priority$max` block ensures the pattern definition happens first.
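
The ordering rule can be sketched in JavaScript as a sort over numeric priorities. This models only the evaluation order, not how the output is assembled. The names `priorityValue` and `evaluationOrder` are hypothetical, and keeping source order for blocks of equal priority is an assumption of the sketch.

```javascript
// Hypothetical sketch of the evaluation order only (not the output layout).
const INT_MAX = 2147483647;
const INT_MIN = -2147483648;

// Map a priority label to a number; lower numbers are evaluated first.
function priorityValue(label) {
  if (label === undefined) return INT_MAX - 1; // text outside any $priority block
  if (label === "max") return INT_MIN + 1;
  if (label === "min") return INT_MAX - 1;
  return Number(label);
}

// Return the blocks in the order they would be evaluated.
// Array.prototype.sort is stable, so equal priorities keep source order.
function evaluationOrder(blocks) {
  return [...blocks].sort(
    (a, b) => priorityValue(a.priority) - priorityValue(b.priority)
  );
}

const blocks = [
  { priority: "1", text: "Result: A" },
  { priority: "max", text: "$pattern{A}{OK}" },
];
console.log(evaluationOrder(blocks).map((b) => b.text).join(" -> "));
// $pattern{A}{OK} -> Result: A
```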
|
|
322
195
|
|
|
323
196
|
---
|
|
324
197
|
|
|
325
|
-
##
|
|
326
|
-
|
|
327
|
-
### Variable Matching
|
|
328
|
-
* `$x` = smart capture (context-aware: word, until literal, or until block)
|
|
329
|
-
* `$x?` = optional version of `$x` (won't fail if empty)
|
|
330
|
-
* `$regex name {pattern}` = regex-based capture
|
|
331
|
-
* Variables automatically skip leading whitespace
|
|
332
|
-
* Trailing whitespace is trimmed when variables appear before literals
|
|
333
|
-
|
|
334
|
-
### Block Matching
|
|
335
|
-
* `${open}{close}name` = nested block capture
|
|
336
|
-
* `$${open}{close}name` = adjacent block capture (captures adjacent blocks)
|
|
337
|
-
* Supports multi-character delimiters of any length
|
|
338
|
-
* Empty delimiters `${}{}name` or `$${}{}name` use defaults from `symbols.open` and `symbols.close`
|
|
339
|
-
* Sequential blocks are joined with spaces in the captured variable
|
|
198
|
+
## Dynamic Variable Assignment ($from)
|
|
340
199
|
|
|
341
|
-
|
|
342
|
-
* `$pattern {match} {replace}` = pattern scoped to current context
|
|
343
|
-
* Patterns inherit from parent scopes hierarchically
|
|
344
|
-
* Each `process()` call starts with a clean slate (no persistence)
|
|
345
|
-
|
|
346
|
-
---
|
|
200
|
+
The **`$from`** operator allows you to capture processed content and assign it to a variable at runtime. This turns Papagaio into a stateful processor where variables can be defined, redefined, and chained.
|
|
347
201
|
|
|
348
|
-
|
|
202
|
+
### Syntax
|
|
203
|
+
`$NAME$from{...content...}`
|
|
349
204
|
|
|
350
|
-
|
|
205
|
+
1. **Recursive Processing**: The `content` is fully processed (patterns, WASM commands, other assignments) before being stored.
|
|
206
|
+
2. **Immediate Registration**: The variable is registered as an exact-match rule as soon as it is parsed. This allows for **chained assignments**.
|
|
207
|
+
3. **Output Suppression**: The entire `$from` directive is removed from the output text.
|
|
351
208
|
|
|
352
|
-
###
|
|
353
|
-
```javascript
|
|
354
|
-
const p = new Papagaio('$', '<<<', '>>>');
|
|
355
|
-
```
|
|
209
|
+
### Examples
|
|
356
210
|
|
|
357
|
-
|
|
211
|
+
#### Chained Assignments
|
|
212
|
+
Variables can depend on previously defined variables within the same document:
|
|
213
|
+
```text
|
|
214
|
+
$A$from{1}
|
|
215
|
+
$B$from{$A$A}
|
|
216
|
+
Count: $B
|
|
358
217
|
```
|
|
359
|
-
|
|
360
|
-
hello
|
|
361
|
-
```
|
|
362
|
-
Output: `[hello]`
|
|
218
|
+
*Output: `Count: 11`*
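
The chained behaviour can be imitated with a few lines of JavaScript. This is a simplified sketch, not the engine's algorithm: `processFrom` is a hypothetical helper that handles only single-line, non-nested `$NAME$from{...}` directives, and the identifier syntax it accepts is an assumption.

```javascript
// Simplified model of `$NAME$from{value}` for single-line directives.
// Each value is expanded with the variables defined so far, the directive
// line is dropped from the output, and later text sees the new variable.
function processFrom(source) {
  const vars = new Map();
  const substitute = (text) =>
    text.replace(/\$([A-Za-z_][A-Za-z0-9_]*)/g, (whole, name) =>
      vars.has(name) ? vars.get(name) : whole
    );

  const out = [];
  for (const line of source.split("\n")) {
    const m = /^\$([A-Za-z_][A-Za-z0-9_]*)\$from\{(.*)\}$/.exec(line);
    if (m) {
      vars.set(m[1], substitute(m[2])); // process the content, then register
      continue;                         // suppress the directive itself
    }
    out.push(substitute(line));
  }
  return out.join("\n");
}

console.log(processFrom("$A$from{1}\n$B$from{$A$A}\nCount: $B")); // Count: 11
```

Because `$A` is registered before the second directive is read, `$B` is stored as `11` rather than as the unexpanded text `$A$A`.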
|
|
363
219
|
|
|
364
|
-
|
|
365
|
-
|
|
366
|
-
|
|
367
|
-
|
|
220
|
+
#### Nested Assignments
|
|
221
|
+
You can define internal variables while defining a larger block:
|
|
222
|
+
```text
|
|
223
|
+
$GREET$from{
|
|
224
|
+
$USER$from{Alice}
|
|
225
|
+
Hello, $USER!
|
|
226
|
+
}
|
|
227
|
+
$GREET
|
|
368
228
|
```
|
|
369
|
-
Output: `
|
|
229
|
+
*Output: `Hello, Alice!`*
|
|
370
230
|
|
|
371
|
-
|
|
372
|
-
|
|
373
|
-
|
|
374
|
-
|
|
231
|
+
#### Interaction with Patterns
|
|
232
|
+
Assignments can be used to dynamically generate pattern keys or replacements:
|
|
233
|
+
```text
|
|
234
|
+
$KEY$from{FOO}
|
|
235
|
+
$pattern{$KEY}{BAR}
|
|
236
|
+
Result: FOO
|
|
375
237
|
```
|
|
376
|
-
Output: `
|
|
238
|
+
*Output: `Result: BAR`*
|
|
377
239
|
|
|
378
240
|
---
|
|
379
241
|
|
|
380
|
-
##
|
|
242
|
+
## Plugin Development
|
|
381
243
|
|
|
382
|
-
|
|
383
|
-
```javascript
|
|
384
|
-
const p = new Papagaio();
|
|
385
|
-
const template = `
|
|
386
|
-
$pattern {# $title} {<h1>$title</h1>}
|
|
387
|
-
$pattern {## $title} {<h2>$title</h2>}
|
|
388
|
-
$pattern {**$text**} {<strong>$text</strong>}
|
|
389
|
-
|
|
390
|
-
# Hello World
|
|
391
|
-
## Subtitle
|
|
392
|
-
**bold text**
|
|
393
|
-
`;
|
|
394
|
-
|
|
395
|
-
p.process(template);
|
|
396
|
-
// Output:
|
|
397
|
-
// <h1>Hello World</h1>
|
|
398
|
-
// <h2>Subtitle</h2>
|
|
399
|
-
// <strong>bold text</strong>
|
|
400
|
-
```
|
|
244
|
+
Papagaio features a modern, frictionless Wasm plugin system. With the **`papagaiocc`** standalone compiler, you can write plugins in standard C using simple naming conventions and compile them into zero-dependency WebAssembly modules.
|
|
401
245
|
|
|
402
|
-
###
|
|
403
|
-
|
|
404
|
-
|
|
405
|
-
|
|
406
|
-
|
|
407
|
-
|
|
408
|
-
|
|
409
|
-
return
|
|
410
|
-
}
|
|
246
|
+
### 1. Write your plugin
|
|
247
|
+
Create a file named `greet.c`:
|
|
248
|
+
```c
|
|
249
|
+
// Functions starting with 'papagaio_' are automatically registered as commands
|
|
250
|
+
char* papagaio_greet(int argc, char **argv)
|
|
251
|
+
{
|
|
252
|
+
if (argc < 1) return "Hello, Stranger!";
|
|
253
|
+
return argv[0]; // return first argument
|
|
411
254
|
}
|
|
412
|
-
|
|
413
|
-
[apple][banana][cherry]
|
|
414
|
-
`;
|
|
415
|
-
|
|
416
|
-
p.process(template);
|
|
417
|
-
// Output:
|
|
418
|
-
// 1. apple
|
|
419
|
-
// 2. banana
|
|
420
|
-
// 3. cherry
|
|
421
255
|
```
|
|
422
256
|
|
|
423
|
-
|
|
424
|
-
```
|
|
425
|
-
|
|
426
|
-
p.vars = {}; // Custom property for storing variables
|
|
427
|
-
|
|
428
|
-
const template = `
|
|
429
|
-
$pattern {var $name = $value} {$eval{
|
|
430
|
-
papagaio.vars['$name'] = '$value';
|
|
431
|
-
return '';
|
|
432
|
-
}}
|
|
433
|
-
$pattern {get $name} {$eval{
|
|
434
|
-
return papagaio.vars['$name'] || 'undefined';
|
|
435
|
-
}}
|
|
436
|
-
|
|
437
|
-
var title = My Page
|
|
438
|
-
var author = John Doe
|
|
439
|
-
Title: get title
|
|
440
|
-
Author: get author
|
|
441
|
-
`;
|
|
442
|
-
|
|
443
|
-
p.process(template);
|
|
444
|
-
// Output:
|
|
445
|
-
// Title: My Page
|
|
446
|
-
// Author: John Doe
|
|
447
|
-
```
|
|
257
|
+
To use the Papagaio Wasm SDK (`lib.c`), copy it from `examples/lib.c` into your project and include it explicitly:
|
|
258
|
+
```c
|
|
259
|
+
#include "lib.c"
|
|
448
260
|
|
|
449
|
-
|
|
450
|
-
|
|
451
|
-
|
|
452
|
-
|
|
453
|
-
|
|
454
|
-
|
|
455
|
-
|
|
456
|
-
|
|
457
|
-
|
|
261
|
+
char* papagaio_greet(int argc, char **argv)
|
|
262
|
+
{
|
|
263
|
+
if (argc < 1) return "Hello, Stranger!";
|
|
264
|
+
|
|
265
|
+
// lib.c provides standard C functions like malloc and sprintf
|
|
266
|
+
char *res = (char*)malloc(strlen(argv[0]) + 16);
|
|
267
|
+
sprintf(res, "Hello, %s!", argv[0]);
|
|
268
|
+
|
|
269
|
+
return res;
|
|
458
270
|
}
|
|
459
|
-
|
|
460
|
-
if (true) then [yes branch] else <no branch>
|
|
461
|
-
if (false) then [yes branch] else <no branch>
|
|
462
|
-
`;
|
|
463
|
-
|
|
464
|
-
p.process(template);
|
|
465
|
-
// Output:
|
|
466
|
-
// yes branch
|
|
467
|
-
// no branch
|
|
468
271
|
```
|
|
469
272
|
|
|
470
|
-
###
|
|
471
|
-
|
|
472
|
-
|
|
473
|
-
|
|
474
|
-
$pattern {double $x} {$eval{return parseInt('$x') * 2}}
|
|
475
|
-
$pattern {add $x $y} {$eval{return parseInt('$x') + parseInt('$y')}}
|
|
476
|
-
|
|
477
|
-
double 5
|
|
478
|
-
add 3 7
|
|
479
|
-
add (double 4) 10
|
|
480
|
-
`;
|
|
481
|
-
|
|
482
|
-
p.process(template);
|
|
483
|
-
// Output:
|
|
484
|
-
// 10
|
|
485
|
-
// 10
|
|
486
|
-
// 18
|
|
487
|
-
```
|
|
488
|
-
|
|
489
|
-
### Sequential Block Processing
|
|
490
|
-
```javascript
|
|
491
|
-
const p = new Papagaio();
|
|
492
|
-
const template = `
|
|
493
|
-
$pattern {sum $${[}{]}nums} {
|
|
494
|
-
$eval{
|
|
495
|
-
const numbers = '$nums'.split(' ').map(x => parseInt(x));
|
|
496
|
-
return numbers.reduce((a, b) => a + b, 0);
|
|
497
|
-
}
|
|
498
|
-
}
|
|
499
|
-
|
|
500
|
-
sum [10][20][30][40]
|
|
501
|
-
`;
|
|
502
|
-
|
|
503
|
-
p.process(template);
|
|
504
|
-
// Output: 100
|
|
273
|
+
### 2. Compile with `papagaiocc`
|
|
274
|
+
The `papagaiocc` tool is a self-contained compiler driver. Run it with your source file:
|
|
275
|
+
```sh
|
|
276
|
+
./papagaiocc greet.c
|
|
505
277
|
```
|
|
278
|
+
This generates `greet.wasm`.
|
|
506
279
|
|
|
507
|
-
|
|
508
|
-
|
|
509
|
-
|
|
510
|
-
|
|
511
|
-
| Problem | Solution |
|
|
512
|
-
|---------|----------|
|
|
513
|
-
| Variable not captured | Check context: use `$x?` for optional, or verify literals/blocks exist |
|
|
514
|
-
| Block mismatch | Verify opening and closing delimiters match the declaration |
|
|
515
|
-
| Infinite recursion | Pattern creates circular transformation; redesign pattern logic |
|
|
516
|
-
| Pattern not matching | Verify whitespace between tokens, check if variable should be optional |
|
|
517
|
-
| Pattern not available | Check scope hierarchy; patterns only inherit from parents, not siblings |
|
|
518
|
-
| Nested blocks fail | Ensure delimiters are properly balanced |
|
|
519
|
-
| Multi-char delimiters broken | Check delimiters don't conflict; use escaping if needed |
|
|
520
|
-
| Regex not matching | Test regex pattern separately; ensure it matches at the exact position |
|
|
521
|
-
| Empty delimiter behavior | `${}{}x` uses defaults; explicitly set if you need different behavior |
|
|
522
|
-
|
|
523
|
-
---
|
|
524
|
-
|
|
525
|
-
## Syntax Reference
|
|
526
|
-
|
|
280
|
+
If your plugin uses `lib.c`, pass the directory containing it via `-I`:
|
|
281
|
+
```sh
|
|
282
|
+
./papagaiocc greet.c -I /path/to/lib
|
|
527
283
|
```
|
|
528
|
-
|
|
529
|
-
|
|
530
|
-
|
|
531
|
-
|
|
532
|
-
|
|
533
|
-
$pattern {${}{}n} {$n} # block with default delimiters
|
|
534
|
-
$eval{code} # JavaScript evaluation
|
|
284
|
+
Or simply place `lib.c` in the same directory as `greet.c`:
|
|
285
|
+
```sh
|
|
286
|
+
# Copy the SDK alongside your source
|
|
287
|
+
cp examples/lib.c .
|
|
288
|
+
./papagaiocc greet.c
|
|
535
289
|
```
|
|
536
290
|
|
|
537
|
-
|
|
538
|
-
|
|
539
|
-
|
|
540
|
-
|
|
541
|
-
|
|
542
|
-
```javascript
|
|
543
|
-
new Papagaio(sigil, open, close, pattern, evalKw, blockKw, regexKw, blockseqKw)
|
|
291
|
+
### 3. Use in Papagaio
|
|
292
|
+
Loading the Wasm file automatically registers all exported commands.
|
|
293
|
+
```text
|
|
294
|
+
$wasm{greet.wasm}
|
|
295
|
+
$greet{Papagaio}
|
|
544
296
|
```
|
|
297
|
+
*Output: `Hello, Papagaio!`*
|
|
545
298
|
|
|
546
|
-
|
|
547
|
-
|
|
548
|
-
- `
|
|
549
|
-
-
|
|
550
|
-
- `
|
|
551
|
-
-
|
|
552
|
-
- `regexKw` (default: `'regex'`) - Regex keyword
|
|
553
|
-
|
|
554
|
-
### Properties
|
|
555
|
-
- `papagaio.content` - Last processed output
|
|
556
|
-
- `papagaio.match` - Last matched substring (available in replacements)
|
|
557
|
-
- `papagaio.symbols` - Configuration object
|
|
558
|
-
- `papagaio.exit` - Optional hook function called after processing
|
|
559
|
-
|
|
560
|
-
### Methods
|
|
561
|
-
- `papagaio.process(input)` - Process input text and return transformed output
|
|
562
|
-
|
|
563
|
-
### Exit Hook
|
|
564
|
-
```javascript
|
|
565
|
-
const p = new Papagaio();
|
|
566
|
-
p.exit = function() {
|
|
567
|
-
console.log('Processing complete:', this.content);
|
|
568
|
-
};
|
|
569
|
-
p.process('$pattern {x} {y}\nx');
|
|
570
|
-
```
|
|
299
|
+
### Wasm SDK (lib.c)
|
|
300
|
+
The Wasm SDK lives at `examples/lib.c` inside the repository. It is **not** automatically embedded into `papagaiocc` — you supply it to your plugin's build as needed. It provides a curated, zero-dependency C standard library for WebAssembly, including:
|
|
301
|
+
- **Memory Management**: `malloc`, `free`, `realloc`
|
|
302
|
+
- **String Processing**: `strlen`, `strcpy`, `sprintf`, `strrev`, etc.
|
|
303
|
+
- **Formatted I/O**: `printf`, `snprintf`, `sscanf`
|
|
304
|
+
- **Standard Math**: `sin`, `cos`, `pow`, etc.
|
|
571
305
|
|
|
572
306
|
---
|
|
573
307
|
|
|
574
|
-
##
|
|
308
|
+
## Building
|
|
575
309
|
|
|
576
|
-
|
|
577
|
-
|
|
578
|
-
|
|
579
|
-
|
|
580
|
-
|
|
581
|
-
|
|
310
|
+
```sh
|
|
311
|
+
make # Core & CLI
|
|
312
|
+
make papagaiocc # Standalone plugin compiler
|
|
313
|
+
make wasm # WebAssembly build (Papagaio in the browser/node)
|
|
314
|
+
make test # Run comprehensive test suite
|
|
315
|
+
```
|
|
582
316
|
|
|
583
|
-
|
|
317
|
+
## References
|
|
584
318
|
|
|
585
|
-
|
|
319
|
+
- [cpp](https://en.wikipedia.org/wiki/C_preprocessor)
|
|
320
|
+
- [m4](https://www.gnu.org/software/m4/)
|
|
321
|
+
- [libregexp](https://bellard.org/quickjs/)
|
|
322
|
+
- [quickjs](https://bellard.org/quickjs/)
|
|
323
|
+
- [tcc](https://bellard.org/tcc/)
|
|
324
|
+
- [wasm3](https://github.com/wasm3/wasm3)
|
|
325
|
+
- [watr](https://github.com/dy/watr)
|