papagaio 0.32.5 → 0.37.3

package/README.md CHANGED
@@ -1,12 +1,11 @@
1
1
  # Papagaio
2
2
 
3
- Papagaio is a C-first, embeddable text processing engine. It is designed to be highly modular and **script-agnostic**, allowing core pattern matching to be used alone or extended via WebAssembly (Wasm) plugins.
3
+ Papagaio is an embeddable text processing engine.
4
4
 
5
5
  ## Key Features
6
6
 
7
7
  - **Lightweight Core**: Efficient C engine for pattern matching and transformation.
8
8
  - **Pattern-Matching**: Powerful capture system with built-in and custom modifiers.
9
- - **WebAssembly Plugins**: Highly secure, zero-dependency plugin architecture via an embedded `wasm3` runtime.
10
9
  - **Configurable Delimiters**: Redefine sigils, delimiters, and markers at runtime.
11
10
  - **Language Bindings**: Native usage in C and Node.js/WebAssembly.
12
11
 
@@ -28,17 +27,6 @@ free(out);
28
27
  papagaio_close(ctx);
29
28
  ```
30
29
 
31
- ### JavaScript / WASM (Node.js)
32
- ```javascript
33
- import Papagaio from './papagaio.js';
34
-
35
- const p = new Papagaio();
36
- await p.init();
37
- p.registerCommand("mycmd", (name, ...args) => `Result: ${args[0]}`);
38
-
39
- console.log(p.process('$mycmd{Hello}')); // Output: Result: Hello
40
- ```
41
-
42
30
  ---
43
31
 
44
32
  ## Pattern Syntax
@@ -55,7 +43,6 @@ Modifiers specify the data type or constraints of a match:
55
43
  - **Numbers**: `$var$int`, `$var$float`, `$var$number`
56
44
  - **Casing**: `$var$upper`, `$var$lower`, `$var$capitalized`
57
45
  - **Formats**: `$var$word`, `$var$identifier`, `$var$hex`, `$var$path`, `$var$binary`, `$var$percent`
58
- - **Regex**: `$id$regex{[0-9]+}`
59
46
  - **Block**: `$item$block{[}{]}` (captures everything between delimiters)
60
47
  - **Aliases**: `$kind$aliases{cat}{dog}{bird}` (multi-block syntax).
61
48
  - **Substrings**: `$var$starts{foo}`, `$var$ends{bar}`, `$var$prefix{p}`, `$var$suffix{s}`, `$var$infix{i}`, `$var$includes{x}`
@@ -94,17 +81,11 @@ $pattern {$n$aliases{$x$int}{abc}} {VALUE: $n}
94
81
 
95
82
  ---
96
83
 
97
- ## Extensibility (Wasm Plugins)
98
-
99
- Papagaio follows a Wasm-first plugin architecture. Core features are limited to pattern matching and transformation, while custom text processing capabilities are provided by WebAssembly plugins.
100
-
101
84
  ### Built-in Operators
102
85
  - **`$document`**: Injects the current state of the document (alias for `$document$current`).
103
86
  - **`$document$original`**: Injects the initial, unprocessed input text. Useful for referencing the source even after multiple transformations.
104
87
  - **`$document$current`**: Injects the current state of the document during the pre-processing pass.
105
- - **`$wasm{path}`**: Loads a WebAssembly plugin from the file system (CLI only).
106
- - **$file{path}**: Injects the content of a file from the file system (CLI only).
107
- - **`$wat{source}`**: Compiles a WebAssembly Text Format (WAT) source string inline and registers all exported `papagaio_*` functions as commands. Useful for embedding lightweight plugins without an external `.wasm` file.
88
+ - **`$file{path}`**: Injects the content of a file from the file system.
108
89
  - **`$NAME$from{value}`**: Dynamically assigns a processed `value` to `$NAME`. The assignment itself is suppressed from the output, and the variable becomes available for exact-match replacement in the remaining document.
109
90
 
110
91
  ```text
@@ -113,15 +94,6 @@ Papagaio follows a Wasm-first plugin architecture. Core features are limited to
113
94
  ```
114
95
  *Output: `Hello, Alice!`*
115
96
 
116
- ```text
117
- $wat{
118
- (module
119
- (func (export "papagaio_hello") (result i32)
120
- i32.const 42))
121
- }
122
- $hello
123
- ```
124
-
125
97
  ---
126
98
 
127
99
  ## CLI Argument Expansion
@@ -145,11 +117,11 @@ Arguments in the format `key=value` are automatically parsed and can be accessed
145
117
  1. **Explicit**: `$args$key`
146
118
  2. **Direct**: `$key` (shorthand for `$args$key`)
147
119
 
148
- Direct access (`$key`) will only resolve if `key` does not conflict with a registered command (like `$wasm`) or a built-in directive.
120
+ Direct access (`$key`) will only resolve if `key` does not conflict with a registered command or a built-in directive.
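The resolution rule above can be modeled in a few lines. This is only a sketch of the documented behavior, not Papagaio's C implementation: `parseArgs`, `resolveDirect`, and the `registered` set are hypothetical names, and splitting on the first `=` while ignoring arguments without one is an assumption.

```javascript
// Hypothetical model of `key=value` argument expansion.
// `registered` stands in for registered commands and built-in directives.
function parseArgs(argv) {
  const args = new Map();
  for (const arg of argv) {
    const eq = arg.indexOf("=");
    // Assumption: split on the first "=", skip arguments without a key.
    if (eq > 0) args.set(arg.slice(0, eq), arg.slice(eq + 1));
  }
  return args;
}

function resolveDirect(key, args, registered) {
  // `$key` is shorthand for `$args$key`, unless the name is already taken.
  if (registered.has(key)) return undefined;
  return args.get(key);
}

const args = parseArgs(["input.c", "version=1.2.3", "document=x", "-O3"]);
const registered = new Set(["document", "priority"]);
console.log(resolveDirect("version", args, registered));
console.log(resolveDirect("document", args, registered));
```

In this model `$version` resolves to the argument value, while `$document` stays bound to the built-in and the argument remains reachable only through the explicit `$args$document` form.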
149
121
 
150
122
  #### Example:
151
123
  ```sh
152
- ./papagaiocc input.c version=1.2.3 target=wasm -O3
124
+ # Then compile ready.c with clang
153
125
  ```
154
126
  Inside `input.c`:
155
127
  ```c
@@ -177,7 +149,7 @@ This changes the sigil to `@`, delimiters to `< >`, and the optional marker to `
177
149
 
178
150
  ## Recursive Priority System
179
151
 
180
- Papagaio allows you to control the order of execution and side-effects (such as pattern definitions or WASM loading) using the **`$priority$N`** directive.
152
+ Papagaio allows you to control the order of execution and side-effects (such as pattern definitions) using the **`$priority$N`** directive.
181
153
 
182
154
  - **`$priority$0{...}`**: Maximum priority.
183
155
  - **`$priority$max{...}`**: Alias for `INT_MIN + 1` (Absolute highest priority).
@@ -202,7 +174,7 @@ The **`$from`** operator allows you to capture processed content and assign it t
202
174
  ### Syntax
203
175
  `$NAME$from{...content...}`
204
176
 
205
- 1. **Recursive Processing**: The `content` is fully processed (patterns, WASM commands, other assignments) before being stored.
177
+ 1. **Recursive Processing**: The `content` is fully processed (patterns, other assignments) before being stored.
206
178
  2. **Immediate Registration**: The variable is registered as an exact-match rule as soon as it is parsed. This allows for **chained assignments**.
207
179
  3. **Output Suppression**: The entire `$from` directive is removed from the output text.
208
180
 
@@ -239,69 +211,130 @@ Result: FOO
239
211
 
240
212
  ---
241
213
 
242
- ## Plugin Development
214
+ ## List Operations (`$list`)
243
215
 
244
- Papagaio features a modern, frictionless Wasm plugin system. With the **`papagaiocc`** standalone compiler, you can write plugins in standard C using simple naming conventions and compile them into zero-dependency WebAssembly modules.
216
+ Any variable can be treated as a list by accessing it through the `$list` modifier chain. The **separator** can be any string (single character or multi-character) and is itself **processed** before use, allowing dynamic separators.
245
217
 
246
- ### 1. Write your plugin
247
- Create a file named `greet.c`:
248
- ```c
249
- // Functions starting with 'papagaio_' are automatically registered as commands
250
- char* papagaio_greet(int argc, char **argv)
251
- {
252
- if (argc < 1) return "Hello, Stranger!";
253
- return argv[0]; // return first argument
254
- }
218
+ ### Syntax
219
+
220
+ ```
221
+ $VARNAME$list$OPERATION{separator}{...arguments}
255
222
  ```
256
223
 
257
- To use the Papagaio Wasm SDK (`lib.c`), copy it from `lib/lib.c` into your project and include it explicitly:
258
- ```c
259
- #include "lib.c"
260
-
261
- char* papagaio_greet(int argc, char **argv)
262
- {
263
- if (argc < 1) return "Hello, Stranger!";
264
-
265
- // lib.c provides standard C functions like malloc and sprintf
266
- char *res = (char*)malloc(strlen(argv[0]) + 16);
267
- sprintf(res, "Hello, %s!", argv[0]);
268
-
269
- return res;
270
- }
224
+ ### Operations
225
+
226
+ | Operation | Signature | Emits | Mutates |
227
+ |---|---|---|---|
228
+ | `get` | `$V$list$get{sep}{idx}` | Element at index | No |
229
+ | `set` | `$V$list$set{sep}{idx}{content}` | Nothing | Yes |
230
+ | `push` | `$V$list$push{sep}{content}` | Nothing | Yes |
231
+ | `pop` | `$V$list$pop{sep}` | Last element | Yes |
232
+ | `shift` | `$V$list$shift{sep}` | First element | Yes |
233
+ | `unshift` | `$V$list$unshift{sep}{content}` | Nothing | Yes |
234
+ | `insert` | `$V$list$insert{sep}{idx}{content}` | Nothing | Yes |
235
+ | `remove` | `$V$list$remove{sep}{idx}` | Nothing | Yes |
236
+ | `swap` | `$V$list$swap{sep}{idx_a}{idx_b}` | Nothing | Yes |
237
+ | `reverse` | `$V$list$reverse{sep}` | Nothing | Yes |
238
+ | `count` | `$V$list$count{sep}` | Number of elements | No |
239
+ | `join` | `$V$list$join{sep_orig}{sep_new}` | List with new separator | No |
240
+
241
+ **Index rules**: zero-based; negative indices count from the end (`-1` = last); out-of-range access emits `""` silently.
242
+
243
+ ### Examples
244
+
245
+ ```text
246
+ $FRUITS$from{apple,banana,orange}
247
+
248
+ $FRUITS$list$get{,}{0} → apple
249
+ $FRUITS$list$get{,}{-1} → orange
250
+ $FRUITS$list$count{,} → 3
271
251
  ```
272
252
 
273
- ### 2. Compile with `papagaiocc`
274
- The `papagaiocc` tool is a self-contained compiler driver. Run it with your source file:
275
- ```sh
276
- ./papagaiocc greet.c
253
+ ```text
254
+ $L$from{a,b,c}
255
+ $L$list$push{,}{d}
256
+ $L$list$set{,}{1}{B}
257
+ $L → a,B,c,d
277
258
  ```
278
- This generates `greet.wasm`.
279
259
 
280
- If your plugin uses `lib.c`, pass the directory containing it via `-I`:
281
- ```sh
282
- ./papagaiocc greet.c -I /path/to/lib
260
+ ```text
261
+ $STACK$from{x,y,z}
262
+ Popped: $STACK$list$pop{,}
263
+ Rest: $STACK → Popped: z / Rest: x,y
283
264
  ```
284
- Or simply place `lib.c` in the same directory as `greet.c`:
285
- ```sh
286
- # Copy the SDK alongside your source
287
- cp lib/lib.c .
288
- ./papagaiocc greet.c
265
+
266
+ ```text
267
+ $CSV$from{one,two,three}
268
+ $CSV$list$join{,}{ | } → one | two | three
289
269
  ```
290
270
 
291
- ### 3. Use in Papagaio
292
- Loading the Wasm file automatically registers all exported commands.
293
271
  ```text
294
- $wasm{greet.wasm}
295
- $greet{Papagaio}
272
+ $PATH$from{/usr/local/bin}
273
+ $PATH$list$get{/}{-1} → bin
296
274
  ```
297
- *Output: Hello, Papagaio!*
298
275
 
299
- ### Wasm SDK (lib.c)
300
- The Wasm SDK lives at `lib/lib.c` inside the repository. It is **not** automatically embedded into `papagaiocc` — you supply it to your plugin's build as needed. It provides a curated, zero-dependency C standard library for WebAssembly, including:
301
- - **Memory Management**: `malloc`, `free`, `realloc`
302
- - **String Processing**: `strlen`, `strcpy`, `sprintf`, `strrev`, etc.
303
- - **Formatted I/O**: `printf`, `snprintf`, `sscanf`
304
- - **Standard Math**: `sin`, `cos`, `pow`, etc.
276
+ ```text
277
+ /* Dynamic separator from variable */
278
+ $SEP$from{,}
279
+ $L$from{x,y,z}
280
+ $L$list$get{$SEP}{2} → z
281
+ ```
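The index rules and a few of the operations above can be approximated with ordinary string splitting. The sketch below is a model of the documented semantics only; the function names are hypothetical, and the handling of empty variables or empty separators is not covered because the README does not specify it.

```javascript
// Hypothetical helpers modeling `$V$list$...` semantics on plain strings.
function normalizeIndex(items, idx) {
  // Negative indices count from the end (-1 = last element).
  return idx < 0 ? items.length + idx : idx;
}

function listGet(value, sep, idx) {
  const items = value.split(sep);
  const i = normalizeIndex(items, idx);
  // Out-of-range access emits "" silently.
  return i >= 0 && i < items.length ? items[i] : "";
}

function listCount(value, sep) {
  return value.split(sep).length;
}

function listJoin(value, sep, newSep) {
  // Emits the list with a new separator; the variable itself is not mutated.
  return value.split(sep).join(newSep);
}

function listPop(value, sep) {
  // Emits the last element and returns the mutated list alongside it.
  const items = value.split(sep);
  const emitted = items.pop();
  return { emitted, rest: items.join(sep) };
}

console.log(listGet("apple,banana,orange", ",", -1));
console.log(listJoin("one,two,three", ",", " | "));
console.log(listPop("x,y,z", ","));
```

Because the separator is just a string, the `/usr/local/bin` example falls out of the same logic: splitting on `/` yields a leading empty element, and index `-1` still selects `bin`.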
282
+
283
+ ---
284
+
285
+ ## Flow Control Operators
286
+
287
+ Papagaio provides operators for conditional logic and value chaining. These are treated as **suffix modifiers** that can be appended to any variable, list operation, or expression.
288
+
289
+ ### Syntax
290
+
291
+ ```
292
+ $VAL$compare{target}
293
+ $VAL$then{content}
294
+ $VAL$else{content}
295
+ ```
296
+
297
+ | Operator | Signature | Behavior |
298
+ |---|---|---|
299
+ | **`compare`** | `$VAL$compare{target}` | If `$VAL` matches `target`, emits `$VAL`. Otherwise, emits `""`. |
300
+ | **`then`** | `$VAL$then{content}` | If `$VAL` is **not empty**, processes and emits `content`. Otherwise, emits `""`. |
301
+ | **`else`** | `$VAL$else{content}` | If `$VAL` **is empty**, processes and emits `content`. Otherwise, passes `$VAL` through. |
302
+ | **`repeat`** | `$repeat{N}{code}` | Executes `code` N times. Emits nothing; used for side effects. |
303
+ | **`while`** | `$while{pat}{code}` | Executes `code` while its result matches `pat`. Emits the last successful result. |
304
+ | **`until`** | `$until{pat}{code}` | Executes `code` until its result matches `pat`. Emits the match that caused the break. |
305
+ | **`byte`** | `$byte{code}` | Appends a byte (0-255) to the variable or current stream. |
306
+
307
+ ### Chaining (If-Then-Else)
308
+
309
+ Operators can be chained to create complex conditional logic. The output of one operator becomes the input for the next.
310
+
311
+ #### Basic If-Then:
312
+ ```text
313
+ $A$from{hello}
314
+ $A$compare{hello}$then{Matched!} → Matched!
315
+ $A$compare{world}$then{Matched!} → (empty)
316
+ ```
317
+
318
+ #### If-Then-Else Pattern:
319
+ ```text
320
+ $A$from{abc}
321
+ $A$compare{abc}$then{YES}$else{NO} → YES
322
+ $A$compare{xyz}$then{YES}$else{NO} → NO
323
+ ```
324
+
325
+ ### Standalone and Braced Usage
326
+
327
+ - **Standalone**: If used without a preceding variable (e.g., `$else{default}`), the input is assumed to be an empty string.
328
+ - **Braced**: You can pipe arbitrary braced expressions into flow operators: `${some content}$then{has content!}`.
329
+
330
+ #### Example:
331
+ ```text
332
+ $L$from{a,b,c}
333
+ $R$from{$L$list$get{,}{0}}
334
+ $R$compare{a}$then{Is A}$else{Not A} → Is A
335
+ ```
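The chaining rule (each operator consumes the previous operator's output) can be modeled as three small functions. This is a sketch under stated assumptions: `compareOp`, `thenOp`, and `elseOp` are hypothetical names, "matches" is treated as exact string equality, and the recursive processing of `content` is omitted.

```javascript
// Hypothetical model of the suffix operators: each takes the incoming
// value and returns the string it would emit.
const compareOp = (val, target) => (val === target ? val : ""); // assumption: exact match
const thenOp = (val, content) => (val !== "" ? content : "");
const elseOp = (val, content) => (val === "" ? content : val);

// Models `$A$compare{target}$then{YES}$else{NO}`.
function ifThenElse(a, target) {
  return elseOp(thenOp(compareOp(a, target), "YES"), "NO");
}

console.log(ifThenElse("abc", "abc"));
console.log(ifThenElse("abc", "xyz"));

// A standalone `$else{default}` starts from an empty input.
console.log(elseOp("", "default"));
```

The model also shows why the order matters: `else` passes a non-empty value through unchanged, so placing it after `then` lets the `YES` branch survive while an empty result falls back to `NO`.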
336
+
337
+ ---
305
338
 
306
339
  ---
307
340
 
@@ -309,17 +342,23 @@ The Wasm SDK lives at `lib/lib.c` inside the repository. It is **not** automatic
309
342
 
310
343
  ```sh
311
344
  make # Core & CLI
312
- make papagaiocc # Standalone plugin compiler
313
- make wasm # WebAssembly build (Papagaio in the browser/node)
345
+ make wasm # WebAssembly build (via Emscripten)
314
346
  make test # Run comprehensive test suite
315
347
  ```
316
348
 
349
+ ---
350
+
351
+ ## System Limits
352
+
353
+ | Feature | Limit | Rationale / Detail |
354
+ |---|---|---|
355
+ | **Symbol Length** | 15 characters | Sigils, delimiters (`{`, `}`), and markers are stored in fixed 16-byte buffers. |
356
+ | **String Size** | Unlimited | All internal buffers (`StrBuf`) use dynamic `realloc`. Limited only by available RAM. |
357
+ | **Pattern Count** | Unlimited | Registered rules are stored in a dynamic array. |
358
+ | **Priority Range** | `INT_MIN` to `INT_MAX` | Priorities are handled as standard signed integers. |
359
+ | **Recursion Depth** | Stack-limited | Deeply nested patterns or priority blocks are processed recursively. |
360
+
317
361
  ## References
318
362
 
319
363
  - [cpp](https://en.wikipedia.org/wiki/C_preprocessor)
320
- - [m4](https://www.gnu.org/software/m4/)
321
- - [libregexp](https://bellard.org/quickjs/)
322
- - [quickjs](https://bellard.org/quickjs/)
323
- - [tcc](https://bellard.org/tcc/)
324
- - [wasm3](https://github.com/wasm3/wasm3)
325
- - [watr](https://github.com/dy/watr)
364
+ - [m4](https://www.gnu.org/software/m4/)
@@ -1,89 +1,62 @@
1
- import createWasmModule from './papagaio_wasm.js';
1
+ import ModuleFactory from './papagaio_wasm.js';
2
2
 
3
3
  class Papagaio {
4
4
  constructor() {
5
5
  this._module = null;
6
+ this._ctx = null;
6
7
  this._initialized = false;
7
- this._initPromise = this.init();
8
+ this._args = [];
8
9
  }
9
10
 
10
11
  async init() {
11
- if (this._initialized) return;
12
- this._module = await createWasmModule();
12
+ if (this._initialized) return this;
13
+
14
+ // Load WASM module
15
+ this._module = await ModuleFactory();
13
16
  this._ctx = this._module._papagaio_open();
14
17
  this._initialized = true;
15
18
  return this;
16
19
  }
17
20
 
18
21
  registerCommand(name, handler) {
19
- if (!this._initialized) throw new Error("Not initialized");
20
-
21
- // name=i, argc=ii, argv=iii, argl=iiii, userdata=iiiii
22
- const wrapper = (ctx, namePtr, argc, argvPtr, arglPtr, userdata) => {
23
- const cmdName = this._module.UTF8ToString(namePtr);
24
- const args = [];
25
- for (let i = 0; i < argc; i++) {
26
- const ptr = this._module.getValue(argvPtr + (i * 4), "i32");
27
- const len = this._module.getValue(arglPtr + (i * 4), "i32");
28
- args.push(this._module.UTF8ToString(ptr, len));
29
- }
30
- const result = handler(cmdName, ...args);
31
-
32
- if (result === null || result === undefined) return 0;
33
- const resStr = String(result);
34
- const resLen = this._module.lengthBytesUTF8(resStr) + 1;
35
- const resPtr = this._module._malloc(resLen);
36
- this._module.stringToUTF8(resStr, resPtr, resLen);
37
- return resPtr;
38
- };
39
-
40
- const funcPtr = this._module.addFunction(wrapper, 'iiiiiii');
41
- this._module.ccall(
42
- "papagaio_register_command",
43
- null,
44
- ["number", "string", "number", "number"],
45
- [this._ctx, name, funcPtr, 0]
46
- );
22
+ // Note: Implementing JS->C callbacks requires addFunction and extra glue.
23
+ // For now, we warn that this is not yet supported in the pure WASM wrapper.
24
+ console.warn(`[WASM] Warning: registerCommand('${name}') is not supported in JS wrapper yet.`);
47
25
  }
48
26
 
49
27
  setArgs(argv) {
50
- if (!this._initialized) throw new Error("Not initialized");
51
- if (!Array.isArray(argv)) throw new Error("argv must be an array");
52
-
53
- /* Create a null-terminated array of string pointers in Wasm memory */
54
- const ptrs = argv.map(str => {
55
- const len = this._module.lengthBytesUTF8(str) + 1;
56
- const ptr = this._module._malloc(len);
57
- this._module.stringToUTF8(str, ptr, len);
58
- return ptr;
59
- });
28
+ if (!this._initialized) throw new Error("Papagaio not initialized. Call init() first.");
29
+ this._args = argv;
60
30
 
61
31
  const argc = argv.length;
62
- const argvPtr = this._module._malloc(argc * 4);
32
+ const argvPtr = this._module._malloc(argc * 4);
63
33
  for (let i = 0; i < argc; i++) {
64
- this._module.setValue(argvPtr + (i * 4), ptrs[i], "i32");
34
+ const str = argv[i];
35
+ const strLen = this._module.lengthBytesUTF8(str) + 1;
36
+ const strPtr = this._module._malloc(strLen);
37
+ this._module.stringToUTF8(str, strPtr, strLen);
38
+ this._module.setValue(argvPtr + (i * 4), strPtr, 'i32');
65
39
  }
66
40
 
67
41
  this._module._papagaio_set_args(this._ctx, argc, argvPtr);
68
- /* We don't free ptrs/argvPtr immediately as Papagaio might need them during processing.
69
- In a production-ready class, we should track these for cleanup. */
42
+ // We don't free argvPtr/strPtr here as Papagaio context might use them.
43
+ // They will be "leaked" until destroy(), which is fine for a processing session.
70
44
  }
71
45
 
72
46
  process(text) {
73
- if (!this._initialized) {
74
- throw new Error("Papagaio wasm not initialized. Use 'await papagaio.init()'");
75
- }
76
- const ptr = this._module.ccall(
77
- "papagaio_process_text",
78
- "number",
79
- ["number", "string", "number"],
80
- [this._ctx, text, text.length]
81
- );
47
+ if (!this._initialized) throw new Error("Papagaio not initialized. Call init() first.");
48
+
49
+ const textLen = this._module.lengthBytesUTF8(text);
50
+ const textPtr = this._module._malloc(textLen + 1);
51
+ this._module.stringToUTF8(text, textPtr, textLen + 1);
52
+
53
+ const outPtr = this._module._papagaio_process_text(this._ctx, textPtr, textLen);
54
+ const output = this._module.UTF8ToString(outPtr);
55
+
56
+ this._module._free(textPtr);
57
+ this._module._free(outPtr);
82
58
 
83
- if (ptr === 0) return "";
84
- const result = this._module.UTF8ToString(ptr);
85
- this._module._free(ptr);
86
- return result;
59
+ return output;
87
60
  }
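One visible change in `process` is that the length passed to `papagaio_process_text` now comes from `lengthBytesUTF8(text)` rather than `text.length`. The two differ for any non-ASCII input, as the sketch below shows; `utf8ByteLength` is a hypothetical helper that uses the standard `TextEncoder` in place of the Emscripten function.

```javascript
// String.prototype.length counts UTF-16 code units, while the C side
// receives UTF-8 bytes, so the two lengths diverge for non-ASCII text.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

for (const sample of ["abc", "olá", "🦜"]) {
  console.log(sample, sample.length, utf8ByteLength(sample));
}
```

With the older `text.length` argument, input such as `olá` would presumably be reported as one byte shorter than its encoded form, so the byte count is the safer value to hand to C.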
88
61
 
89
62
  destroy() {
Binary file
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "papagaio",
3
- "version": "0.32.5",
3
+ "version": "0.37.3",
4
4
  "description": "easy yet powerful preprocessor",
5
5
  "main": "dist/wasm/papagaio.js",
6
6
  "bin": {