tree-sitter-ucode 0.3.0 → 0.5.0

Files changed (53)
  1. package/README.md +32 -81
  2. package/grammar.js +221 -35
  3. package/markup/grammar.js +1057 -0
  4. package/markup/queries/folds.scm +20 -0
  5. package/markup/queries/highlights.scm +38 -0
  6. package/markup/queries/indents.scm +51 -0
  7. package/markup/queries/injections.scm +40 -0
  8. package/markup/queries/locals.scm +107 -0
  9. package/markup/queries/tags.scm +65 -0
  10. package/markup/queries/textobjects.scm +56 -0
  11. package/markup/src/grammar.json +5814 -0
  12. package/markup/src/node-types.json +3224 -0
  13. package/markup/src/parser.c +134512 -0
  14. package/markup/src/scanner.c +22 -0
  15. package/package.json +9 -5
  16. package/prebuilds/darwin-arm64/tree-sitter-ucode.node +0 -0
  17. package/prebuilds/linux-arm64/tree-sitter-ucode.node +0 -0
  18. package/prebuilds/linux-x64/tree-sitter-ucode.node +0 -0
  19. package/prebuilds/win32-x64/tree-sitter-ucode.node +0 -0
  20. package/queries/locals.scm +8 -0
  21. package/queries/tags.scm +15 -2
  22. package/scripts/generate-markup-grammar.js +93 -0
  23. package/src/grammar.json +1104 -233
  24. package/src/node-types.json +736 -69
  25. package/src/parser.c +106697 -25362
  26. package/src/scanner.c +16 -193
  27. package/src/scanner_impl.h +494 -0
  28. package/tree-sitter-ucode.wasm +0 -0
  29. package/tree-sitter-ucode_markup.wasm +0 -0
  30. package/tree-sitter.json +46 -22
  31. package/ucdocs/grammar.js +284 -0
  32. package/ucdocs/queries/highlights.scm +25 -0
  33. package/ucdocs/queries/tags.scm +22 -0
  34. package/ucdocs/src/grammar.json +1437 -0
  35. package/ucdocs/src/node-types.json +1347 -0
  36. package/ucdocs/src/parser.c +6387 -0
  37. package/ucdocs/src/tree_sitter/alloc.h +54 -0
  38. package/ucdocs/src/tree_sitter/array.h +330 -0
  39. package/ucdocs/src/tree_sitter/parser.h +286 -0
  40. package/tmpl/grammar.js +0 -68
  41. package/tmpl/queries/folds.scm +0 -4
  42. package/tmpl/queries/highlights.scm +0 -23
  43. package/tmpl/queries/indents.scm +0 -5
  44. package/tmpl/queries/injections.scm +0 -8
  45. package/tmpl/queries/locals.scm +0 -3
  46. package/tmpl/src/grammar.json +0 -251
  47. package/tmpl/src/node-types.json +0 -238
  48. package/tmpl/src/parser.c +0 -724
  49. package/tmpl/src/scanner.c +0 -174
  50. package/tree-sitter-ucode_tmpl.wasm +0 -0
  51. /package/{tmpl → markup}/src/tree_sitter/alloc.h +0 -0
  52. /package/{tmpl → markup}/src/tree_sitter/array.h +0 -0
  53. /package/{tmpl → markup}/src/tree_sitter/parser.h +0 -0
package/README.md CHANGED
@@ -6,13 +6,13 @@ Two grammars are provided:
 
  | Grammar | Scope | File types |
  |---------|-------|------------|
- | `ucode` | `source.uc` | `.uc` |
- | `ucode_tmpl` | `source.uc.tmpl` | `.uc` (template files — detected by content) |
+ | `ucode` | `source.uc` | `.uc`, `.ucode`, `.ut` |
+ | `ucode_markup` | `source.ucode.markup` | `.uc`, `.ucode`, `.ut`, `.uc.tmpl` (template files — detected by content) |
 
- Both grammars use the `.uc` extension. Template files are distinguished from plain
- code files by content: any `.uc` file whose first tag opener (`{%`, `{{`, or `{#`)
- appears at the start of a line is automatically parsed by `ucode_tmpl`. Plain code
- files fall back to `ucode`. See [File-type detection](#file-type-detection) below.
+ Both grammars share the same file extensions. Template files are distinguished from plain
+ code files by content: any file whose first tag opener (`{%`, `{{`, or `{#`) appears at
+ the start of a line is automatically parsed by `ucode_markup`. Plain code files fall back
+ to `ucode`. See [File-type detection](#file-type-detection) below.
 
  ## Ucode vs JavaScript
 
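Reviewer note: the content-based routing rule stated in the README hunk above can be sketched in a few lines of JavaScript. This is only an illustration. `detectUcodeGrammar` is a hypothetical helper that does not exist in the package, the actual `content-regex` in `tree-sitter.json` is not shown in this diff, and the sketch assumes "start of a line" means column 0 with no leading whitespace.

```javascript
// Hypothetical helper: not part of tree-sitter-ucode or the tree-sitter CLI.
// Returns the grammar name a file would be routed to under the README rule:
// the position of the first tag opener ({%, {{ or {#) decides.
function detectUcodeGrammar(content) {
  // Locate the first tag opener anywhere in the file.
  const first = content.search(/\{[%{#]/);
  if (first === -1) return 'ucode'; // no tag opener at all: plain code

  // Assumption: "start of a line" means column 0, no leading whitespace.
  const atLineStart = first === 0 || content[first - 1] === '\n';
  return atLineStart ? 'ucode_markup' : 'ucode';
}
```

Under these assumptions, a file that begins with `{% ... %}` is routed to `ucode_markup`, while a plain code file whose first `{{` sits inside a string literal in the middle of a line stays with `ucode`.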
@@ -47,79 +47,37 @@ To regenerate parsers after editing a grammar file:
  # ucode grammar
  npx tree-sitter generate
 
- # ucode_tmpl grammar
- npx tree-sitter generate tmpl/grammar.js --output tmpl/src
+ # ucode_markup grammar (generated from grammar.js — do not edit markup/grammar.js directly)
+ node scripts/generate-markup-grammar.js
+ cd markup && npx tree-sitter generate
  ```
 
  ## Test
 
  ```sh
- npm test # runs tree-sitter test for ucode and ucode_tmpl
+ npm test # runs tree-sitter test for ucode and ucode_markup
  ```
 
  To filter by corpus file name:
 
  ```sh
- npx tree-sitter test --lib-path ucode.so --lang-name ucode --file-name control_flow
- npx tree-sitter test -p tmpl --file-name template
+ npx tree-sitter test --file-name control_flow
+ cd markup && npx tree-sitter test --file-name markup
  ```
 
  ## File-type detection
 
- Both grammars claim the `.uc` extension. Tools that respect `content-regex` in
+ Both grammars claim the same file extensions. Tools that respect `content-regex` in
  `tree-sitter.json` (including the tree-sitter CLI ≥ 0.24) automatically route
- template files to `ucode_tmpl` when a tag opener appears at the start of a line.
+ template files to `ucode_markup` when a tag opener appears at the start of a line.
  Editors that manage their own filetype dispatch (Neovim, Helix) need an explicit
  rule — see the editor sections below.
 
- ## Use in Neovim (nvim-treesitter)
-
- Add to your nvim-treesitter config (e.g. `~/.config/nvim/lua/plugins/treesitter.lua`):
-
- ```lua
- local parser_config = require("nvim-treesitter.parsers").get_parser_configs()
-
- parser_config.ucode = {
-   install_info = {
-     url = "https://github.com/m00qek/tree-sitter-ucode",
-     files = { "src/parser.c", "src/scanner.c" },
-     branch = "main",
-   },
-   filetype = "ucode",
- }
-
- parser_config.ucode_tmpl = {
-   install_info = {
-     url = "https://github.com/m00qek/tree-sitter-ucode",
-     files = { "tmpl/src/parser.c", "tmpl/src/scanner.c" },
-     branch = "main",
-   },
-   filetype = "ucode_tmpl",
- }
- ```
+ ## Use in Neovim
 
- Associate `.uc` files with the right filetype. Since nvim-treesitter does not
- use `content-regex`, the mapping must be done with a `BufRead` autocmd or a
- filetype function that inspects the file content:
-
- ```lua
- vim.filetype.add({
-   extension = {
-     uc = function(path)
-       -- Files whose first tag opener is at the start of a line are templates
-       local f = io.open(path, "r")
-       if f then
-         local content = f:read("*a")
-         f:close()
-         if content:find("^%s*{[%%{#]", 1, false) or content:find("\n%s*{[%%{#]") then
-           return "ucode_tmpl"
-         end
-       end
-       return "ucode"
-     end,
-   },
- })
- ```
+ The easiest way to install this grammar in Neovim is with
+ [tree-sitter-manager.nvim](https://github.com/m00qek/tree-sitter-manager.nvim),
+ which handles parser registration, filetype detection, and query setup automatically.
 
  ## Use in Helix
 
@@ -129,37 +87,37 @@ Add to `~/.config/helix/languages.toml`:
  [[language]]
  name = "ucode"
  scope = "source.uc"
- file-types = [{ glob = "*.uc" }]
+ file-types = [{ glob = "*.uc" }, { glob = "*.ucode" }, { glob = "*.ut" }]
  comment-token = "//"
  indent = { tab-width = 2, unit = " " }
  grammar = "ucode"
 
  [[language]]
- name = "ucode-tmpl"
- scope = "source.uc.tmpl"
+ name = "ucode-markup"
+ scope = "source.ucode.markup"
+ file-types = [{ glob = "*.uc.tmpl" }]
  comment-token = "{#"
  indent = { tab-width = 2, unit = " " }
- grammar = "ucode_tmpl"
+ grammar = "ucode_markup"
 
  [[grammar]]
  name = "ucode"
- source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "v0.3.0" }
+ source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "v0.5.0" }
 
  [[grammar]]
- name = "ucode_tmpl"
- source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "v0.3.0", subpath = "tmpl" }
+ name = "ucode_markup"
+ source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "v0.5.0", subpath = "markup" }
  ```
 
- Helix does not support content-based filetype detection. To open a `.uc`
- template file as `ucode-tmpl`, use `:set-language ucode-tmpl` in command mode,
- or configure a file-specific override via a `.helix/languages.toml` in your
- project.
+ Helix does not support content-based filetype detection for shared extensions. For
+ `.uc` files that are templates, use `:set-language ucode-markup` in command mode,
+ or configure a file-specific override via a `.helix/languages.toml` in your project.
 
  ## Template files
 
- Template files mix raw text with code tags. The `ucode_tmpl` grammar produces a
- `document` tree; editors use language injection to apply ucode highlighting
- inside the code and expression tags.
+ Template files mix raw text with code tags. The `ucode_markup` grammar produces a
+ `markup` tree; editors use language injection to apply ucode highlighting inside the
+ code and expression tags.
 
  | Tag | Purpose |
  |-----|---------|
@@ -176,13 +134,6 @@ with any closer variant. `{%-` / `{{-` / `{#-` strip the preceding raw text;
  `-%}` / `-}}` / `-#}` strip the following raw text. `{%+` suppresses
  `lstrip_blocks` stripping and may be combined with `-%}`.
 
- **EOF-implicit close**: a `{%` statement tag that reaches end-of-file without
- an explicit `%}` is treated as implicitly closed. This supports the common
- OpenWrt pattern of a file that opens one `{%` block at the top and contains
- only ucode code, with no closing `%}`. The parse tree shows an `(eof_close)`
- node in the `close` field. Expression (`{{`) and comment (`{#`) tags still
- require explicit closers.
-
  Example:
 
  ```
package/grammar.js CHANGED
@@ -14,9 +14,19 @@ module.exports = grammar({
    name: 'ucode',
 
    externals: $ => [
-     $._automatic_semicolon,
-     $._template_chars,
-     $._ternary_qmark,
+     $._automatic_semicolon, // 0
+     $._template_chars, // 1
+     $._ternary_qmark, // 2
+     $.raw_text, // 3 literal text outside tags
+     $.statement_tag_open, // 4 {%
+     $.statement_tag_trim_open, // 5 {%-
+     $.statement_tag_lstrip_open, // 6 {%+
+     $.statement_tag_close, // 7 %}
+     $.statement_tag_trim_close, // 8 -%}
+     $.expression_tag_open, // 9 {{
+     $.expression_tag_trim_open, // 10 {{-
+     $.expression_tag_close, // 11 }}
+     $.expression_tag_trim_close, // 12 -}}
    ],
 
    extras: $ => [
@@ -73,6 +83,12 @@ module.exports = grammar({
      $._identifier,
      $._reserved_identifier,
      $._lhs_expression,
+     $._markup_node,
+     $._if_markup_node,
+     $._stmt_open,
+     $._stmt_close,
+     $._expr_open,
+     $._expr_close,
    ],
 
    precedences: $ => [
@@ -120,6 +136,63 @@ module.exports = grammar({
        repeat($.statement),
      ),
 
+     //
+     // Markup entry point
+     //
+     // A .uc.tmpl document is a flat sequence of markup nodes: raw
+     // text, comment tags, expression tags, statement tags, and the
+     // alt-syntax constructs that span multiple tags.
+     //
+     // Statement tags that contain only simple (non-spanning) code are
+     // wrapped in `statement_tag`. Alt-syntax constructs that span tag
+     // boundaries appear directly as markup nodes with explicit tag-open /
+     // tag-close fields, giving a pristine tree with no empty-statement noise.
+     //
+     markup: $ => seq(
+       optional($.hash_bang_line),
+       repeat($._markup_node),
+     ),
+
+     _markup_node: $ => choice(
+       $.raw_text,
+       $.expression_tag,
+       $.comment_tag,
+       $.statement_tag,
+       // Alt-syntax constructs that span tag boundaries:
+       $.if_alt_statement,
+       $.for_alt_statement,
+       $.for_in_alt_statement,
+       $.while_alt_statement,
+     ),
+
+     // -----------------------------------------------------------------------
+     // Simple tag wrappers
+     // -----------------------------------------------------------------------
+
+     // A statement_tag wraps non-spanning code: {% stmt; stmt; %}
+     statement_tag: $ => seq(
+       field('open', $._stmt_open),
+       repeat($.statement),
+       field('close', $._stmt_close),
+     ),
+
+     // {{ expr }} or {{- expr -}}
+     expression_tag: $ => seq(
+       field('open', $._expr_open),
+       optional($._expressions),
+       field('close', $._expr_close),
+     ),
+
+     // {# ... #} with optional whitespace-stripping markers
+     comment_tag: $ => seq(
+       field('open', choice('{#-', '{#')),
+       optional(field('content', $.comment_content)),
+       field('close', choice('-#}', '#}')),
+     ),
+
+     // Matches everything up to but not including #} or -#}
+     comment_content: _ => /([^#-]|#[^}]|-(?:[^#]|#[^}]))+/,
+
      hash_bang_line: _ => /#!.*/,
 
      //
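Reviewer note: the behavior of the `comment_content` pattern added in the hunk above can be explored with a short JavaScript sketch. The regular expression is copied from the diff. Everything else is an assumption made for illustration: `commentBody` is a hypothetical helper, and because tree-sitter compiles the pattern into its own lexer, JavaScript's `RegExp` engine only approximates what the generated parser does.

```javascript
// Pattern copied from the comment_content rule; the sticky flag (y) is added
// here so that matching must begin exactly at lastIndex.
const COMMENT_CONTENT = /([^#-]|#[^}]|-(?:[^#]|#[^}]))+/y;

// Hypothetical helper: returns the text the pattern consumes when matching
// starts at the first character after the `{#` opener.
function commentBody(afterOpen) {
  COMMENT_CONTENT.lastIndex = 0; // always start at the beginning of the input
  const m = COMMENT_CONTENT.exec(afterOpen);
  return m ? m[0] : ''; // no match means an empty comment body
}
```

In this approximation a lone `#` or `-` inside the comment is consumed as ordinary content, and matching stops just before the first `#}` or `-#}`, which is consistent with the comment on the rule.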
@@ -132,7 +205,7 @@ module.exports = grammar({
      export_statement: $ => choice(
        seq(
          'export',
-         $.export_clause,
+         field('clause', $.export_clause),
          $._semicolon,
        ),
        seq(
@@ -184,22 +257,22 @@ module.exports = grammar({
      import_statement: $ => seq(
        'import',
        choice(
-         seq($.import_clause, 'from', field('source', $.string)),
+         seq(field('clause', $.import_clause), 'from', field('source', $.string)),
          field('source', $.string),
        ),
        $._semicolon,
      ),
 
      import_clause: $ => choice(
-       $.namespace_import,
-       $.named_imports,
+       field('namespace', $.namespace_import),
+       field('named', $.named_imports),
        seq(
-         $.identifier,
+         field('default', $.identifier),
          optional(seq(
            ',',
            choice(
-             $.namespace_import,
-             $.named_imports,
+             field('namespace', $.namespace_import),
+             field('named', $.named_imports),
            ),
          )),
        ),
@@ -285,22 +358,71 @@ module.exports = grammar({
        optional(field('alternative', $.else_clause)),
      )),
 
-     // Alternative colon/endif syntax
-     if_alt_statement: $ => seq(
-       'if',
+     // Alternative colon/endif syntax — two forms:
+     // code form: if (cond): stmts … endif (used in program / statement_tag)
+     // markup form: {% if (cond): %} … {% endif %} (spans tag boundaries in markup)
+     //
+     // The markup form uses a flat content repeat (_if_markup_node) rather than
+     // nested elif/else bodies. elif_clause_tag and else_alt_clause_tag are pure
+     // header tags that appear as regular items inside that repeat. This avoids
+     // the shift/reduce conflict that arises when a nested repeat($._markup_node)
+     // can't decide whether statement_tag_open starts another body node or the
+     // enclosing end tag.
+     if_alt_statement: $ => choice(
+       seq(
+         'if',
+         field('condition', $.parenthesized_expression),
+         ':',
+         field('body', repeat($.statement)),
+         repeat(field('elif_clause', $.elif_clause)),
+         optional(field('else_body', $.else_alt_clause)),
+         'endif',
+       ),
+       seq(
+         field('open', $._stmt_open),
+         'if',
+         field('condition', $.parenthesized_expression),
+         ':',
+         repeat($.statement),
+         field('close', $._stmt_close),
+         repeat($._if_markup_node),
+         field('end_open', $._stmt_open),
+         'endif',
+         field('end_close', $._stmt_close),
+       ),
+     ),
+
+     // Flat content node for if_alt_statement markup bodies.
+     // elif_clause_tag and else_alt_clause_tag are plain header tags here;
+     // the actual body content between them is expressed as sibling nodes.
+     _if_markup_node: $ => choice(
+       $._markup_node,
+       $.elif_clause_tag,
+       $.else_alt_clause_tag,
+     ),
+
+     // Inline wrappers for tag delimiter tokens.
+     // Each groups all variants (plain / trim / lstrip) so grammar rules stay
+     // concise while still surfacing distinct node types for highlight queries.
+     _stmt_open: $ => choice($.statement_tag_open, $.statement_tag_trim_open, $.statement_tag_lstrip_open),
+     _stmt_close: $ => choice($.statement_tag_close, $.statement_tag_trim_close),
+     _expr_open: $ => choice($.expression_tag_open, $.expression_tag_trim_open),
+     _expr_close: $ => choice($.expression_tag_close, $.expression_tag_trim_close),
+
+     elif_clause: $ => seq(
+       'elif',
        field('condition', $.parenthesized_expression),
        ':',
        field('body', repeat($.statement)),
-       repeat(field('elif_clause', $.elif_clause)),
-       optional(field('else_body', $.else_alt_clause)),
-       'endif',
      ),
 
-     elif_clause: $ => seq(
+     // Markup form of elif: just the header tag; body is sibling _if_markup_nodes
+     elif_clause_tag: $ => seq(
+       field('open', $._stmt_open),
        'elif',
        field('condition', $.parenthesized_expression),
        ':',
-       field('body', repeat($.statement)),
+       field('close', $._stmt_close),
      ),
 
      else_alt_clause: $ => seq(
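Reviewer note: because the markup form of `if_alt_statement` keeps `elif_clause_tag` and `else_alt_clause_tag` as flat siblings of the body nodes, a consumer that wants one entry per branch has to regroup them. The sketch below shows one possible way to do that. It is not part of the package: `groupIfBranches` is a hypothetical helper, nodes are modelled as plain `{ type }` objects rather than real tree-sitter nodes, and the input is assumed to be only the children matched by `repeat($._if_markup_node)`.

```javascript
// Hypothetical post-processing step: regroup the flat children of a
// markup-form if_alt_statement into one entry per branch.
function groupIfBranches(bodyNodes) {
  // The first entry collects the body of the leading `if` itself.
  const branches = [{ header: null, body: [] }];
  for (const node of bodyNodes) {
    if (node.type === 'elif_clause_tag' || node.type === 'else_alt_clause_tag') {
      // A header tag opens a new branch; later siblings belong to it.
      branches.push({ header: node, body: [] });
    } else {
      branches[branches.length - 1].body.push(node);
    }
  }
  return branches;
}
```

For a flat sequence such as `raw_text, elif_clause_tag, expression_tag, raw_text, else_alt_clause_tag, raw_text`, this yields three branches: the `if` body with one node, the `elif` branch with two, and the `else` branch with one.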
@@ -308,6 +430,13 @@ module.exports = grammar({
        field('body', repeat($.statement)),
      ),
 
+     // Markup form of else: just the header tag; body is sibling _if_markup_nodes
+     else_alt_clause_tag: $ => seq(
+       field('open', $._stmt_open),
+       'else',
+       field('close', $._stmt_close),
+     ),
+
      switch_statement: $ => seq(
        'switch',
        field('value', $.parenthesized_expression),
@@ -319,11 +448,24 @@ module.exports = grammar({
        field('body', $.statement),
      ),
 
-     for_alt_statement: $ => seq(
-       forHeader($),
-       ':',
-       field('body', repeat($.statement)),
-       'endfor',
+     for_alt_statement: $ => choice(
+       seq(
+         forHeader($),
+         ':',
+         field('body', repeat($.statement)),
+         'endfor',
+       ),
+       seq(
+         field('open', $._stmt_open),
+         forHeader($),
+         ':',
+         repeat($.statement),
+         field('close', $._stmt_close),
+         field('body', repeat($._markup_node)),
+         field('end_open', $._stmt_open),
+         'endfor',
+         field('end_close', $._stmt_close),
+       ),
      ),
 
      for_in_statement: $ => seq(
@@ -332,12 +474,42 @@ module.exports = grammar({
        field('body', $.statement),
      ),
 
-     for_in_alt_statement: $ => seq(
-       'for',
-       $._for_header,
-       ':',
-       field('body', repeat($.statement)),
-       'endfor',
+     for_in_alt_statement: $ => choice(
+       seq(
+         'for',
+         $._for_header,
+         ':',
+         field('body', repeat($.statement)),
+         'endfor',
+       ),
+       seq(
+         field('open', $._stmt_open),
+         'for',
+         $._for_header,
+         ':',
+         repeat($.statement),
+         field('close', $._stmt_close),
+         field('body', repeat($._markup_node)),
+         field('end_open', $._stmt_open),
+         'endfor',
+         field('end_close', $._stmt_close),
+       ),
+       // Compact double-nested form: {% for (outer): for (inner): %} body {% endfor; endfor %}
+       // Both iterables contribute `right` fields; both loop vars contribute `left` fields.
+       seq(
+         field('open', $._stmt_open),
+         'for',
+         $._for_header,
+         ':',
+         'for',
+         $._for_header,
+         ':',
+         field('close', $._stmt_close),
+         field('body', repeat($._markup_node)),
+         field('end_open', $._stmt_open),
+         'endfor', ';', 'endfor',
+         field('end_close', $._stmt_close),
+       ),
      ),
 
      // Supports both `for (k in obj)` and `for (k, v in obj)` (ucode two-variable form)
@@ -365,12 +537,26 @@ module.exports = grammar({
        field('body', $.statement),
      ),
 
-     while_alt_statement: $ => seq(
-       'while',
-       field('condition', $.parenthesized_expression),
-       ':',
-       field('body', repeat($.statement)),
-       'endwhile',
+     while_alt_statement: $ => choice(
+       seq(
+         'while',
+         field('condition', $.parenthesized_expression),
+         ':',
+         field('body', repeat($.statement)),
+         'endwhile',
+       ),
+       seq(
+         field('open', $._stmt_open),
+         'while',
+         field('condition', $.parenthesized_expression),
+         ':',
+         repeat($.statement),
+         field('close', $._stmt_close),
+         field('body', repeat($._markup_node)),
+         field('end_open', $._stmt_open),
+         'endwhile',
+         field('end_close', $._stmt_close),
+       ),
      ),
 
      try_statement: $ => seq(