tree-sitter-ucode 0.2.0 → 0.4.0

Files changed (45)
  1. package/README.md +49 -58
  2. package/grammar.js +214 -28
  3. package/markup/grammar.js +1057 -0
  4. package/markup/queries/folds.scm +20 -0
  5. package/markup/queries/highlights.scm +38 -0
  6. package/markup/queries/indents.scm +51 -0
  7. package/markup/queries/injections.scm +40 -0
  8. package/markup/queries/locals.scm +107 -0
  9. package/markup/queries/tags.scm +65 -0
  10. package/markup/queries/textobjects.scm +56 -0
  11. package/markup/src/grammar.json +5786 -0
  12. package/markup/src/node-types.json +3211 -0
  13. package/markup/src/parser.c +134461 -0
  14. package/markup/src/scanner.c +22 -0
  15. package/package.json +8 -7
  16. package/prebuilds/darwin-arm64/tree-sitter-ucode.node +0 -0
  17. package/prebuilds/linux-arm64/tree-sitter-ucode.node +0 -0
  18. package/prebuilds/linux-x64/tree-sitter-ucode.node +0 -0
  19. package/prebuilds/win32-x64/tree-sitter-ucode.node +0 -0
  20. package/queries/folds.scm +38 -0
  21. package/queries/highlights.scm +6 -0
  22. package/queries/indents.scm +63 -0
  23. package/queries/locals.scm +1 -0
  24. package/queries/textobjects.scm +84 -0
  25. package/scripts/generate-markup-grammar.js +93 -0
  26. package/src/grammar.json +1069 -226
  27. package/src/node-types.json +662 -8
  28. package/src/parser.c +106401 -25117
  29. package/src/scanner.c +16 -193
  30. package/src/scanner_impl.h +494 -0
  31. package/tree-sitter-ucode.wasm +0 -0
  32. package/tree-sitter-ucode_markup.wasm +0 -0
  33. package/tree-sitter.json +24 -12
  34. package/tmpl/grammar.js +0 -67
  35. package/tmpl/queries/highlights.scm +0 -23
  36. package/tmpl/queries/injections.scm +0 -8
  37. package/tmpl/queries/locals.scm +0 -3
  38. package/tmpl/src/grammar.json +0 -243
  39. package/tmpl/src/node-types.json +0 -230
  40. package/tmpl/src/parser.c +0 -707
  41. package/tmpl/src/scanner.c +0 -169
  42. package/tree-sitter-ucode_tmpl.wasm +0 -0
  43. /package/{tmpl → markup}/src/tree_sitter/alloc.h +0 -0
  44. /package/{tmpl → markup}/src/tree_sitter/array.h +0 -0
  45. /package/{tmpl → markup}/src/tree_sitter/parser.h +0 -0
package/README.md CHANGED
@@ -6,8 +6,13 @@ Two grammars are provided:
 
  | Grammar | Scope | File types |
  |---------|-------|------------|
- | `ucode` | `source.uc` | `.uc` |
- | `ucode_tmpl` | `source.uc.tmpl` | `.uc.tmpl`, `.utpl` |
+ | `ucode` | `source.uc` | `.uc`, `.ucode`, `.ut` |
+ | `ucode_markup` | `source.ucode.markup` | `.uc`, `.ucode`, `.ut`, `.uc.tmpl` (template files — detected by content) |
+
+ Both grammars share the same file extensions. Template files are distinguished from plain
+ code files by content: any file whose first tag opener (`{%`, `{{`, or `{#`) appears at
+ the start of a line is automatically parsed by `ucode_markup`. Plain code files fall back
+ to `ucode`. See [File-type detection](#file-type-detection) below.
 
  ## Ucode vs JavaScript
 
@@ -42,60 +47,37 @@ To regenerate parsers after editing a grammar file:
  # ucode grammar
  npx tree-sitter generate
 
- # ucode_tmpl grammar
- npx tree-sitter generate tmpl/grammar.js --output tmpl/src
+ # ucode_markup grammar (generated from grammar.js — do not edit markup/grammar.js directly)
+ node scripts/generate-markup-grammar.js
+ cd markup && npx tree-sitter generate
  ```
 
  ## Test
 
  ```sh
- npm test # runs tree-sitter test for ucode and ucode_tmpl
+ npm test # runs tree-sitter test for ucode and ucode_markup
  ```
 
  To filter by corpus file name:
 
  ```sh
  npx tree-sitter test --file-name control_flow
- npx tree-sitter test -p tmpl --file-name template
+ cd markup && npx tree-sitter test --file-name markup
  ```
 
- ## Use in Neovim (nvim-treesitter)
-
- Add to your nvim-treesitter config (e.g. `~/.config/nvim/lua/plugins/treesitter.lua`):
-
- ```lua
- local parser_config = require("nvim-treesitter.parsers").get_parser_configs()
-
- parser_config.ucode = {
- install_info = {
- url = "https://github.com/m00qek/tree-sitter-ucode",
- files = { "src/parser.c", "src/scanner.c" },
- branch = "main",
- },
- filetype = "ucode",
- }
-
- parser_config.ucode_tmpl = {
- install_info = {
- url = "https://github.com/m00qek/tree-sitter-ucode",
- files = { "tmpl/src/parser.c", "tmpl/src/scanner.c" },
- branch = "main",
- },
- filetype = "ucode_tmpl",
- }
- ```
+ ## File-type detection
 
- Associate `.uc` and `.uc.tmpl` files with the right filetypes:
+ Both grammars claim the same file extensions. Tools that respect `content-regex` in
+ `tree-sitter.json` (including the tree-sitter CLI ≥ 0.24) automatically route
+ template files to `ucode_markup` when a tag opener appears at the start of a line.
+ Editors that manage their own filetype dispatch (Neovim, Helix) need an explicit
+ rule — see the editor sections below.
 
- ```lua
- vim.filetype.add({
- extension = {
- uc = "ucode",
- utpl = "ucode_tmpl",
- },
- pattern = { [".*%.uc%.tmpl"] = "ucode_tmpl" },
- })
- ```
+ ## Use in Neovim
+
+ The easiest way to install this grammar in Neovim is with
+ [tree-sitter-manager.nvim](https://github.com/m00qek/tree-sitter-manager.nvim),
+ which handles parser registration, filetype detection, and query setup automatically.
 
  ## Use in Helix
 
@@ -103,33 +85,39 @@ Add to `~/.config/helix/languages.toml`:
 
  ```toml
  [[language]]
- name = "ucode"
- scope = "source.uc"
- file-types = ["uc"]
+ name = "ucode"
+ scope = "source.uc"
+ file-types = [{ glob = "*.uc" }, { glob = "*.ucode" }, { glob = "*.ut" }]
  comment-token = "//"
- indent = { tab-width = 2, unit = " " }
- grammar = "ucode"
+ indent = { tab-width = 2, unit = " " }
+ grammar = "ucode"
 
  [[language]]
- name = "ucode-tmpl"
- scope = "source.uc.tmpl"
- file-types = ["uc.tmpl", "utpl"]
- grammar = "ucode_tmpl"
+ name = "ucode-markup"
+ scope = "source.ucode.markup"
+ file-types = [{ glob = "*.uc.tmpl" }]
+ comment-token = "{#"
+ indent = { tab-width = 2, unit = " " }
+ grammar = "ucode_markup"
 
  [[grammar]]
  name = "ucode"
- source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "main" }
+ source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "v0.4.0" }
 
  [[grammar]]
- name = "ucode_tmpl"
- source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "main", subpath = "tmpl" }
+ name = "ucode_markup"
+ source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "v0.4.0", subpath = "markup" }
  ```
 
- ## Template files (.uc.tmpl / .utpl)
+ Helix does not support content-based filetype detection for shared extensions. For
+ `.uc` files that are templates, use `:set-language ucode-markup` in command mode,
+ or configure a file-specific override via a `.helix/languages.toml` in your project.
+
+ ## Template files
 
- Template files mix raw text with code tags. The `ucode_tmpl` grammar produces
- a document tree; editors use language injection to apply ucode highlighting
- inside the code and expression tags.
+ Template files mix raw text with code tags. The `ucode_markup` grammar produces a
+ `markup` tree; editors use language injection to apply ucode highlighting inside the
+ code and expression tags.
 
  | Tag | Purpose |
  |-----|---------|
@@ -141,7 +129,10 @@ inside the code and expression tags.
  | `{%+ ... %}` | Statement block — suppress `lstrip_blocks` stripping |
  | `{#- ... -#}` | Comment — strip whitespace on both sides |
 
- Opener and closer markers are independent: any opener variant may be combined with any closer variant. `{%-` / `{{-` / `{#-` strip the preceding raw text; `-%}` / `-}}` / `-#}` strip the following raw text. `{%+` suppresses `lstrip_blocks` stripping and may be combined with `-%}` (e.g. `{%+ ... -%}`).
+ Opener and closer markers are independent: any opener variant may be combined
+ with any closer variant. `{%-` / `{{-` / `{#-` strip the preceding raw text;
+ `-%}` / `-}}` / `-#}` strip the following raw text. `{%+` suppresses
+ `lstrip_blocks` stripping and may be combined with `-%}`.
 
  Example:
 
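Reviewer note: the new README text says a file is routed to `ucode_markup` when a tag opener (`{%`, `{{`, or `{#`) appears at the start of a line. The sketch below approximates that rule for illustration only. The actual `content-regex` lives in `tree-sitter.json`, which this diff does not show, so the pattern `TAG_AT_LINE_START` and the helper `pickGrammar` are assumptions, not the package's implementation.

```javascript
// Hypothetical approximation of the README's content-based routing rule.
// Assumption: "tag opener at the start of a line" means a line that begins
// with `{%`, `{{`, or `{#` (no leading whitespace).
const TAG_AT_LINE_START = /^\{[%{#]/m;

// Hypothetical helper: choose a grammar name for a file's contents.
function pickGrammar(source) {
  return TAG_AT_LINE_START.test(source) ? 'ucode_markup' : 'ucode';
}

console.log(pickGrammar('{% let x = 1 %}\nHello'));        // ucode_markup
console.log(pickGrammar('<p>\n{{ name }}\n</p>'));          // ucode_markup
console.log(pickGrammar('let x = { a: 1 };\nprint(x);'));   // ucode
```

Under this reading, a plain `.uc` script is left to the `ucode` grammar unless one of its lines happens to begin with a tag opener.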
package/grammar.js CHANGED
@@ -14,9 +14,19 @@ module.exports = grammar({
  name: 'ucode',
 
  externals: $ => [
- $._automatic_semicolon,
- $._template_chars,
- $._ternary_qmark,
+ $._automatic_semicolon, // 0
+ $._template_chars, // 1
+ $._ternary_qmark, // 2
+ $.raw_text, // 3 literal text outside tags
+ $.statement_tag_open, // 4 {%
+ $.statement_tag_trim_open, // 5 {%-
+ $.statement_tag_lstrip_open, // 6 {%+
+ $.statement_tag_close, // 7 %}
+ $.statement_tag_trim_close, // 8 -%}
+ $.expression_tag_open, // 9 {{
+ $.expression_tag_trim_open, // 10 {{-
+ $.expression_tag_close, // 11 }}
+ $.expression_tag_trim_close, // 12 -}}
  ],
 
  extras: $ => [
@@ -73,6 +83,12 @@ module.exports = grammar({
  $._identifier,
  $._reserved_identifier,
  $._lhs_expression,
+ $._markup_node,
+ $._if_markup_node,
+ $._stmt_open,
+ $._stmt_close,
+ $._expr_open,
+ $._expr_close,
  ],
 
  precedences: $ => [
@@ -120,6 +136,63 @@ module.exports = grammar({
  repeat($.statement),
  ),
 
+ //
+ // Markup entry point
+ //
+ // A .uc.tmpl document is a flat sequence of markup nodes: raw
+ // text, comment tags, expression tags, statement tags, and the
+ // alt-syntax constructs that span multiple tags.
+ //
+ // Statement tags that contain only simple (non-spanning) code are
+ // wrapped in `statement_tag`. Alt-syntax constructs that span tag
+ // boundaries appear directly as markup nodes with explicit tag-open /
+ // tag-close fields, giving a pristine tree with no empty-statement noise.
+ //
+ markup: $ => seq(
+ optional($.hash_bang_line),
+ repeat($._markup_node),
+ ),
+
+ _markup_node: $ => choice(
+ $.raw_text,
+ $.expression_tag,
+ $.comment_tag,
+ $.statement_tag,
+ // Alt-syntax constructs that span tag boundaries:
+ $.if_alt_statement,
+ $.for_alt_statement,
+ $.for_in_alt_statement,
+ $.while_alt_statement,
+ ),
+
+ // -----------------------------------------------------------------------
+ // Simple tag wrappers
+ // -----------------------------------------------------------------------
+
+ // A statement_tag wraps non-spanning code: {% stmt; stmt; %}
+ statement_tag: $ => seq(
+ field('open', $._stmt_open),
+ repeat($.statement),
+ field('close', $._stmt_close),
+ ),
+
+ // {{ expr }} or {{- expr -}}
+ expression_tag: $ => seq(
+ field('open', $._expr_open),
+ optional($._expressions),
+ field('close', $._expr_close),
+ ),
+
+ // {# ... #} with optional whitespace-stripping markers
+ comment_tag: $ => seq(
+ field('open', choice('{#-', '{#')),
+ optional(field('content', $.comment_content)),
+ field('close', choice('-#}', '#}')),
+ ),
+
+ // Matches everything up to but not including #} or -#}
+ comment_content: _ => /([^#-]|#[^}]|-(?:[^#]|#[^}]))+/,
+
  hash_bang_line: _ => /#!.*/,
 
  //
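Reviewer note: the `comment_content` pattern added in the hunk above can be tried outside tree-sitter to see where it stops. The sketch below runs the same regular expression in plain JavaScript with the sticky flag. The helper `commentBody` is hypothetical and exists only for this illustration, and JavaScript's regex engine is not tree-sitter's lexer, so treat the result as an approximation of the token boundaries.

```javascript
// Same pattern as `comment_content` in the diff; the `y` (sticky) flag makes
// matching start exactly at `lastIndex`.
const COMMENT_CONTENT = /([^#-]|#[^}]|-(?:[^#]|#[^}]))+/y;

// Hypothetical helper: return the text the pattern consumes after a
// `{#` or `{#-` opener, or '' when the comment is empty.
function commentBody(tag) {
  const start = tag.startsWith('{#-') ? 3 : 2;
  COMMENT_CONTENT.lastIndex = start;
  const match = COMMENT_CONTENT.exec(tag);
  return match ? match[0] : '';
}

console.log(JSON.stringify(commentBody('{# note #}')));        // " note "
console.log(JSON.stringify(commentBody('{#- a # b - c -#}'))); // " a # b - c "
console.log(JSON.stringify(commentBody('{##}')));              // ""
```

A lone `#` or `-` inside the comment is consumed together with the character that follows it, so the match ends only where `#}` or `-#}` begins.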
@@ -285,22 +358,71 @@ module.exports = grammar({
  optional(field('alternative', $.else_clause)),
  )),
 
- // Alternative colon/endif syntax
- if_alt_statement: $ => seq(
- 'if',
+ // Alternative colon/endif syntax — two forms:
+ // code form: if (cond): stmts … endif (used in program / statement_tag)
+ // markup form: {% if (cond): %} … {% endif %} (spans tag boundaries in markup)
+ //
+ // The markup form uses a flat content repeat (_if_markup_node) rather than
+ // nested elif/else bodies. elif_clause_tag and else_alt_clause_tag are pure
+ // header tags that appear as regular items inside that repeat. This avoids
+ // the shift/reduce conflict that arises when a nested repeat($._markup_node)
+ // can't decide whether statement_tag_open starts another body node or the
+ // enclosing end tag.
+ if_alt_statement: $ => choice(
+ seq(
+ 'if',
+ field('condition', $.parenthesized_expression),
+ ':',
+ field('body', repeat($.statement)),
+ repeat(field('elif_clause', $.elif_clause)),
+ optional(field('else_body', $.else_alt_clause)),
+ 'endif',
+ ),
+ seq(
+ field('open', $._stmt_open),
+ 'if',
+ field('condition', $.parenthesized_expression),
+ ':',
+ repeat($.statement),
+ field('close', $._stmt_close),
+ repeat($._if_markup_node),
+ field('end_open', $._stmt_open),
+ 'endif',
+ field('end_close', $._stmt_close),
+ ),
+ ),
+
+ // Flat content node for if_alt_statement markup bodies.
+ // elif_clause_tag and else_alt_clause_tag are plain header tags here;
+ // the actual body content between them is expressed as sibling nodes.
+ _if_markup_node: $ => choice(
+ $._markup_node,
+ $.elif_clause_tag,
+ $.else_alt_clause_tag,
+ ),
+
+ // Inline wrappers for tag delimiter tokens.
+ // Each groups all variants (plain / trim / lstrip) so grammar rules stay
+ // concise while still surfacing distinct node types for highlight queries.
+ _stmt_open: $ => choice($.statement_tag_open, $.statement_tag_trim_open, $.statement_tag_lstrip_open),
+ _stmt_close: $ => choice($.statement_tag_close, $.statement_tag_trim_close),
+ _expr_open: $ => choice($.expression_tag_open, $.expression_tag_trim_open),
+ _expr_close: $ => choice($.expression_tag_close, $.expression_tag_trim_close),
+
+ elif_clause: $ => seq(
+ 'elif',
  field('condition', $.parenthesized_expression),
  ':',
  field('body', repeat($.statement)),
- repeat(field('elif_clause', $.elif_clause)),
- optional(field('else_body', $.else_alt_clause)),
- 'endif',
  ),
 
- elif_clause: $ => seq(
+ // Markup form of elif: just the header tag; body is sibling _if_markup_nodes
+ elif_clause_tag: $ => seq(
+ field('open', $._stmt_open),
  'elif',
  field('condition', $.parenthesized_expression),
  ':',
- field('body', repeat($.statement)),
+ field('close', $._stmt_close),
  ),
 
  else_alt_clause: $ => seq(
@@ -308,6 +430,13 @@ module.exports = grammar({
  field('body', repeat($.statement)),
  ),
 
+ // Markup form of else: just the header tag; body is sibling _if_markup_nodes
+ else_alt_clause_tag: $ => seq(
+ field('open', $._stmt_open),
+ 'else',
+ field('close', $._stmt_close),
+ ),
+
  switch_statement: $ => seq(
  'switch',
  field('value', $.parenthesized_expression),
@@ -319,11 +448,24 @@ module.exports = grammar({
  field('body', $.statement),
  ),
 
- for_alt_statement: $ => seq(
- forHeader($),
- ':',
- field('body', repeat($.statement)),
- 'endfor',
+ for_alt_statement: $ => choice(
+ seq(
+ forHeader($),
+ ':',
+ field('body', repeat($.statement)),
+ 'endfor',
+ ),
+ seq(
+ field('open', $._stmt_open),
+ forHeader($),
+ ':',
+ repeat($.statement),
+ field('close', $._stmt_close),
+ field('body', repeat($._markup_node)),
+ field('end_open', $._stmt_open),
+ 'endfor',
+ field('end_close', $._stmt_close),
+ ),
  ),
 
  for_in_statement: $ => seq(
@@ -332,12 +474,42 @@ module.exports = grammar({
  field('body', $.statement),
  ),
 
- for_in_alt_statement: $ => seq(
- 'for',
- $._for_header,
- ':',
- field('body', repeat($.statement)),
- 'endfor',
+ for_in_alt_statement: $ => choice(
+ seq(
+ 'for',
+ $._for_header,
+ ':',
+ field('body', repeat($.statement)),
+ 'endfor',
+ ),
+ seq(
+ field('open', $._stmt_open),
+ 'for',
+ $._for_header,
+ ':',
+ repeat($.statement),
+ field('close', $._stmt_close),
+ field('body', repeat($._markup_node)),
+ field('end_open', $._stmt_open),
+ 'endfor',
+ field('end_close', $._stmt_close),
+ ),
+ // Compact double-nested form: {% for (outer): for (inner): %} body {% endfor; endfor %}
+ // Both iterables contribute `right` fields; both loop vars contribute `left` fields.
+ seq(
+ field('open', $._stmt_open),
+ 'for',
+ $._for_header,
+ ':',
+ 'for',
+ $._for_header,
+ ':',
+ field('close', $._stmt_close),
+ field('body', repeat($._markup_node)),
+ field('end_open', $._stmt_open),
+ 'endfor', ';', 'endfor',
+ field('end_close', $._stmt_close),
+ ),
  ),
 
  // Supports both `for (k in obj)` and `for (k, v in obj)` (ucode two-variable form)
@@ -365,12 +537,26 @@ module.exports = grammar({
  field('body', $.statement),
  ),
 
- while_alt_statement: $ => seq(
- 'while',
- field('condition', $.parenthesized_expression),
- ':',
- field('body', repeat($.statement)),
- 'endwhile',
+ while_alt_statement: $ => choice(
+ seq(
+ 'while',
+ field('condition', $.parenthesized_expression),
+ ':',
+ field('body', repeat($.statement)),
+ 'endwhile',
+ ),
+ seq(
+ field('open', $._stmt_open),
+ 'while',
+ field('condition', $.parenthesized_expression),
+ ':',
+ repeat($.statement),
+ field('close', $._stmt_close),
+ field('body', repeat($._markup_node)),
+ field('end_open', $._stmt_open),
+ 'endwhile',
+ field('end_close', $._stmt_close),
+ ),
  ),
 
  try_statement: $ => seq(