tree-sitter-ucode 0.6.0 → 0.7.0

package/README.md CHANGED
@@ -2,17 +2,24 @@
 
  Tree-sitter grammar for [ucode](https://github.com/jow-/ucode), the ECMAScript-like scripting language used in OpenWrt.
 
- Two grammars are provided:
+ Three grammars are provided:
 
- | Grammar | Scope | File types |
- |---------|-------|------------|
- | `ucode` | `source.uc` | `.uc`, `.ucode`, `.ut` |
- | `ucode_markup` | `source.ucode.markup` | `.uc`, `.ucode`, `.ut`, `.uc.tmpl` (template files — detected by content) |
+ | Grammar | Scope | File types | Purpose |
+ |---------|-------|------------|---------|
+ | `ucode` | `source.uc` | `.uc`, `.ucode`, `.ut` | Plain ucode source files |
+ | `ucode_markup` | `source.ucode.markup` | `.uc`, `.ucode`, `.ut` (template files — detected by content) | Ucode template files mixing raw text and code tags |
+ | `ucdocs` | — | injected | JSDoc-style `/** */` doc comment blocks |
+
+ `ucode` and `ucode_markup` share file extensions. Template files are distinguished from plain
+ code files by content: any file containing a tag opener (`{%`, `{{`, or `{#`) at the start
+ of a line (with optional leading whitespace) is automatically parsed by `ucode_markup`. Plain
+ code files fall back to `ucode`. See [File-type detection](#file-type-detection) below.
 
- Both grammars share the same file extensions. Template files are distinguished from plain
- code files by content: any file whose first tag opener (`{%`, `{{`, or `{#`) appears at
- the start of a line is automatically parsed by `ucode_markup`. Plain code files fall back
- to `ucode`. See [File-type detection](#file-type-detection) below.
+ `ucdocs` is not a standalone file grammar; it is automatically injected by the `ucode` and
+ `ucode_markup` grammars into every `/** */` doc comment block. Tools that load grammars
+ directly from `tree-sitter.json` (including the tree-sitter CLI) handle this automatically.
+ Editor plugins may require registering the `ucdocs` grammar separately; see the editor
+ sections below.
 
  ## Ucode vs JavaScript
 
@@ -26,9 +33,45 @@ Ucode is an ECMAScript subset with OpenWrt-specific extensions. Key differences:
  | Removed features | Destructuring, `for...of`, `do-while`, generators, forward declarations, dynamic `import()` | All supported |
  | Added number literals | `0177` (C octal), `0x1.8` (hex float), `0B`/`0O` prefixes | Standard only |
  | Added escape sequences | `\e` (ESC), `\a` (BEL), octal `\177` | Standard only |
+ | String unicode escapes | `\uXXXX` only (no `\u{…}`); no `\u` escapes in identifiers | `\uXXXX` and `\u{…}` |
  | Regex flags | `g`, `i`, `s` only | Full set |
  | Module system | Static `import`/`export` only; no `from` on re-exports | Full ES modules |
 
+ The grammar tracks ucode's parser closely, with one deliberate exception: **automatic
+ semicolon insertion is kept ECMAScript-style (more lenient than the compiler).** ucode
+ only lets you drop a statement's `;` before `}`, end-of-file, a template tag close, or an
+ alt-syntax end keyword (`endif`/`endfor`/`endwhile`/`endfunction`/`elif`/`else`), whereas
+ the grammar also tolerates a bare newline between statements so that in-progress edits are
+ not flagged as errors.
+
+ ## Doc comment grammar (ucdocs)
+
+ `/** */` blocks are parsed by the `ucdocs` grammar and injected into the host parse tree.
+ The grammar understands the following tags:
+
+ | Tag | Syntax |
+ |-----|--------|
+ | `@param` | `@param {Type} name description` |
+ | `@returns` / `@return` | `@returns {Type} description` |
+ | `@throws` / `@throw` | `@throws {Type} description` |
+ | `@type` | `@type {Type}` |
+ | `@typedef` | `@typedef {Type} TypeName` |
+ | `@template` | `@template T, U` |
+ | `@function` | `@function module:path#member` |
+ | `@module` | `@module name` |
+ | `@deprecated` | `@deprecated description` |
+ | `@since` | `@since version` |
+ | `@see` | `@see reference` |
+ | `@example` | `@example code` |
+ | `@default` | `@default value` |
+
+ Type expressions support: primitives (`int`, `float`, `string`, `boolean`, `null`, `void`,
+ `function`), `*`/`any`, `list<T>`, `dict<T>`, record types (`{field: T}`), named types
+ (`TypeName`, `TypeName<T, U>`), cross-module refs (`module:path.To.Type`), named function
+ types `(name: T) => U`, anonymous function types `function(T): U`, union `T | U`, nullable
+ `?T`, and array postfix `T[]`. Inline `{@link ...}` tags and optional params `[name=default]`
+ are also supported.
+
  ## Requirements
 
  - [tree-sitter CLI](https://github.com/tree-sitter/tree-sitter) ≥ 0.24
@@ -38,10 +81,10 @@ Ucode is an ECMAScript subset with OpenWrt-specific extensions. Key differences:
 
  ```sh
  npm install
- npm run build # generate + compile Node.js bindings
+ npm run build # generate + compile Node.js bindings (ucode and ucode_markup only)
  ```
 
- To regenerate parsers after editing a grammar file:
+ To regenerate parsers after editing a grammar file (run from the repo root):
 
  ```sh
  # ucode grammar
@@ -49,27 +92,33 @@ npx tree-sitter generate
 
  # ucode_markup grammar (generated from grammar.js — do not edit markup/grammar.js directly)
  node scripts/generate-markup-grammar.js
- cd markup && npx tree-sitter generate
+ npx tree-sitter generate markup/grammar.js --output markup/src
+
+ # ucdocs grammar (not included in npm run build — must be regenerated manually)
+ npx tree-sitter generate ucdocs/grammar.js --output ucdocs/src
  ```
 
  ## Test
 
  ```sh
- npm test # runs tree-sitter test for ucode and ucode_markup
+ npm test # builds and tests all three grammars (ucode, ucode_markup, ucdocs)
  ```
 
- To filter by corpus file name:
+ To filter by corpus file name (run from the repo root):
 
  ```sh
  npx tree-sitter test --file-name control_flow
- cd markup && npx tree-sitter test --file-name markup
+ (cd markup && npx tree-sitter test --file-name markup)
+ (cd ucdocs && npx tree-sitter test --file-name tags)
+ (cd ucdocs && npx tree-sitter test --file-name types)
  ```
 
  ## File-type detection
 
- Both grammars claim the same file extensions. Tools that respect `content-regex` in
+ Both `ucode` and `ucode_markup` claim the same file extensions. Tools that respect `content-regex` in
  `tree-sitter.json` (including the tree-sitter CLI ≥ 0.24) automatically route
- template files to `ucode_markup` when a tag opener appears at the start of a line.
+ template files to `ucode_markup` when a tag opener (`{%`, `{{`, or `{#`) appears at
+ the start of a line (with optional leading whitespace).
  Editors that manage their own filetype dispatch (Neovim, Helix) need an explicit
  rule — see the editor sections below.
 
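The routing rule described in this hunk is simple enough to sketch directly. The helpers below are hypothetical (`looksLikeUcodeTemplate` and `pickUcodeGrammar` are not part of the package) and assume the rule is exactly "a `{%`, `{{` or `{#` opener at the start of a line, after optional spaces or tabs"; the real `content-regex` in `tree-sitter.json` is not shown in this diff and may differ in detail.

```javascript
// Hypothetical approximation of the content-based routing rule described in
// the README. It is NOT the content-regex shipped in tree-sitter.json.
// The `m` flag makes `^` match at the start of every line, and `[ \t]*`
// allows leading spaces or tabs before the tag opener.
const TAG_OPENER_AT_LINE_START = /^[ \t]*\{[%{#]/m;

function looksLikeUcodeTemplate(source) {
  return TAG_OPENER_AT_LINE_START.test(source);
}

// Returns the grammar name the README says such a file would be routed to.
function pickUcodeGrammar(source) {
  return looksLikeUcodeTemplate(source) ? "ucode_markup" : "ucode";
}

console.log(pickUcodeGrammar("let x = 1;\nprint(x);\n"));       // ucode
console.log(pickUcodeGrammar("<ul>\n  {% for (i in l): %}\n")); // ucode_markup
console.log(pickUcodeGrammar("let s = 'a {{ b }}';\n"));        // ucode
```

The last call shows why the line-start anchor matters: a plain script that merely contains `{{` inside a string literal stays on the `ucode` grammar.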
@@ -102,11 +151,15 @@ grammar = "ucode_markup"
 
  [[grammar]]
  name = "ucode"
- source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "v0.5.0" }
+ source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "v0.6.0" }
 
  [[grammar]]
  name = "ucode_markup"
- source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "v0.5.0", subpath = "markup" }
+ source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "v0.6.0", subpath = "markup" }
+
+ [[grammar]]
+ name = "ucdocs"
+ source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "v0.6.0", subpath = "ucdocs" }
  ```
 
  Helix does not support content-based filetype detection for shared extensions. For
package/grammar.js CHANGED
@@ -77,7 +77,6 @@ module.exports = grammar({
 
  inline: $ => [
  $._call_signature,
- $._formal_parameter,
  $._expressions,
  $._semicolon,
  $._identifier,
@@ -125,7 +124,7 @@ module.exports = grammar({
  [$.primary_expression, $._for_header],
  [$.variable_declarator, $._for_header],
  [$.assignment_expression, $.pattern],
- [$.labeled_statement, $._property_name],
+ [$.primary_expression, $.delete_expression],
  ],
 
  word: $ => $.identifier,
@@ -208,18 +207,22 @@ module.exports = grammar({
  field('clause', $.export_clause),
  $._semicolon,
  ),
+ // `export let/const …;` already carries its own terminating semicolon.
  seq(
  'export',
- choice(
- field('declaration', $.declaration),
- seq(
- 'default',
- seq(
- field('value', $.expression),
- ';',
- ),
- ),
- ),
+ field('declaration', $.lexical_declaration),
+ ),
+ // `export function f() {}` requires an explicit trailing `;` in ucode.
+ seq(
+ 'export',
+ field('declaration', $.function_declaration),
+ ';',
+ ),
+ seq(
+ 'export',
+ 'default',
+ field('value', $.expression),
+ ';',
  ),
  ),
 
@@ -322,7 +325,6 @@ module.exports = grammar({
  $.continue_statement,
  $.return_statement,
  $.empty_statement,
- $.labeled_statement,
  ),
 
  expression_statement: $ => seq(
@@ -330,10 +332,19 @@ module.exports = grammar({
  $._semicolon,
  ),
 
- lexical_declaration: $ => seq(
- field('kind', choice('let', 'const')),
- commaSep1($.variable_declarator),
- $._semicolon,
+ // `let` declarators may omit the initializer; `const` requires one.
+ // ucode rejects `const a;` ("Expecting initializer expression").
+ lexical_declaration: $ => choice(
+ seq(
+ field('kind', 'let'),
+ commaSep1($.variable_declarator),
+ $._semicolon,
+ ),
+ seq(
+ field('kind', 'const'),
+ commaSep1(alias($._const_declarator, $.variable_declarator)),
+ $._semicolon,
+ ),
  ),
 
  variable_declarator: $ => seq(
@@ -341,6 +352,11 @@ module.exports = grammar({
  optional($._initializer),
  ),
 
+ _const_declarator: $ => seq(
+ field('name', $.identifier),
+ $._initializer,
+ ),
+
  statement_block: $ => prec.right(seq(
  '{',
  repeat($.statement),
@@ -512,17 +528,19 @@ module.exports = grammar({
  ),
  ),
 
- // Supports both `for (k in obj)` and `for (k, v in obj)` (ucode two-variable form)
+ // Supports both `for (k in obj)` and `for (k, v in obj)` (ucode two-variable form).
+ // The loop target must be a plain identifier: ucode rejects member/subscript
+ // targets (`for (a.x in o)`) and only `let` (never `const`) may declare it.
  _for_header: $ => seq(
  '(',
  choice(
  seq(
- field('kind', choice('let', 'const')),
+ field('kind', 'let'),
  field('left', $.identifier),
  optional(seq(',', field('value', $.identifier))),
  ),
  seq(
- field('left', $._lhs_expression),
+ field('left', $.identifier),
  optional(seq(',', field('value', $.identifier))),
  ),
  ),
@@ -565,15 +583,14 @@ module.exports = grammar({
  optional(field('handler', $.catch_clause)),
  ),
 
+ // ucode has no labeled statements, so `break`/`continue` take no label.
  break_statement: $ => seq(
  'break',
- field('label', optional(alias($.identifier, $.statement_identifier))),
  $._semicolon,
  ),
 
  continue_statement: $ => seq(
  'continue',
- field('label', optional(alias($.identifier, $.statement_identifier))),
  $._semicolon,
  ),
 
@@ -585,12 +602,6 @@ module.exports = grammar({
 
  empty_statement: _ => ';',
 
- labeled_statement: $ => prec.dynamic(-1, seq(
- field('label', alias(choice($.identifier, $._reserved_identifier), $.statement_identifier)),
- ':',
- field('body', $.statement),
- )),
-
  //
  // Statement components
  //
@@ -640,6 +651,7 @@ module.exports = grammar({
  $.assignment_expression,
  $.augmented_assignment_expression,
  $.unary_expression,
+ $.delete_expression,
  $.binary_expression,
  $.ternary_expression,
  $.update_expression,
@@ -666,25 +678,33 @@ module.exports = grammar({
  $.call_expression,
  ),
 
+ // ucode allows a trailing comma but NOT interior elision (`{a:1,,b:2}`).
  object: $ => prec('object', seq(
  '{',
- commaSep(optional(choice(
- $.pair,
- $.spread_element,
- alias(
- choice($.identifier, $._reserved_identifier),
- $.shorthand_property_identifier,
- ),
- ))),
+ optional(seq(
+ commaSep1(choice(
+ $.pair,
+ $.spread_element,
+ alias(
+ choice($.identifier, $._reserved_identifier),
+ $.shorthand_property_identifier,
+ ),
+ )),
+ optional(','),
+ )),
  '}',
  )),
 
+ // ucode allows a trailing comma but NOT interior elision (`[1,,2]`).
  array: $ => seq(
  '[',
- commaSep(optional(choice(
- $.expression,
- $.spread_element,
- ))),
+ optional(seq(
+ commaSep1(choice(
+ $.expression,
+ $.spread_element,
+ )),
+ optional(','),
+ )),
  ']',
  ),
 
@@ -795,10 +815,17 @@ module.exports = grammar({
  ),
 
  unary_expression: $ => prec.left('unary_void', seq(
- field('operator', choice('!', '~', '-', '+', 'delete')),
+ field('operator', choice('!', '~', '-', '+')),
  field('argument', $.expression),
  )),
 
+ // ucode's `delete` only accepts a property access (member or subscript)
+ // expression: `delete o.k` / `delete o["k"]`. `delete x` is a syntax error.
+ delete_expression: $ => prec.left('unary_void', seq(
+ field('operator', 'delete'),
+ field('argument', choice($.member_expression, $.subscript_expression)),
+ )),
+
  update_expression: $ => prec.left(choice(
  seq(
  field('argument', $.expression),
@@ -851,13 +878,16 @@ module.exports = grammar({
  ),
 
  _call_signature: $ => field('parameters', $.formal_parameters),
- _formal_parameter: $ => choice($.identifier, $.rest_element),
 
+ // A rest element (`...name`) must be the final parameter, and there can be
+ // only one. ucode rejects a rest param followed by anything — including a
+ // trailing comma. A trailing comma is allowed only after plain parameters.
  formal_parameters: $ => seq(
  '(',
- optional(seq(
- commaSep1($._formal_parameter),
- optional(','),
+ optional(choice(
+ $.rest_element,
+ seq(commaSep1($.identifier), ',', $.rest_element),
+ seq(commaSep1($.identifier), optional(',')),
  )),
  ')',
  ),
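The three alternatives of the new `formal_parameters` rule can be restated as a plain validity check, which makes the rest-element restrictions easier to see. This is a sketch, not the grammar itself: `isValidUcodeParamList` is a hypothetical helper that takes the text between the parentheses, assumes there is no nested syntax inside the list, and approximates identifiers with an ASCII pattern.

```javascript
// Hypothetical restatement of the three `formal_parameters` alternatives:
//   ()                      no parameters
//   (...rest)               a lone rest element
//   (a, b, ...rest)         identifiers followed by one final rest element
//   (a, b) or (a, b,)       identifiers, optionally with a trailing comma
// Identifiers are simplified to an ASCII pattern for this sketch.
const IDENT = /^[A-Za-z_$][A-Za-z0-9_$]*$/;

// `inner` is the text between the parentheses, e.g. "a, b, ...rest".
function isValidUcodeParamList(inner) {
  const parts = inner.split(",").map((p) => p.trim());
  if (parts.length === 1 && parts[0] === "") return true; // "()"

  // An empty final entry means the list ended with a trailing comma.
  const trailingComma = parts.length > 1 && parts[parts.length - 1] === "";
  const items = trailingComma ? parts.slice(0, -1) : parts;

  const last = items[items.length - 1];
  const hasRest = last.startsWith("...");
  const plain = hasRest ? items.slice(0, -1) : items;

  // Everything before the (optional) rest element must be a plain identifier:
  // this rejects empty slots and any second or misplaced rest element.
  if (!plain.every((p) => IDENT.test(p))) return false;

  if (hasRest) {
    // `...name` is valid only as the very last thing, with no trailing comma.
    return IDENT.test(last.slice(3).trim()) && !trailingComma;
  }
  return true;
}

console.log(isValidUcodeParamList("a, b,"));       // true
console.log(isValidUcodeParamList("a, ...rest"));  // true
console.log(isValidUcodeParamList("a, ...rest,")); // false
console.log(isValidUcodeParamList("...rest, a"));  // false
```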
@@ -892,7 +922,8 @@ module.exports = grammar({
  unescaped_double_string_fragment: _ => token.immediate(prec(1, /[^"\\\r\n]+/)),
  unescaped_single_string_fragment: _ => token.immediate(prec(1, /[^'\\\r\n]+/)),
 
- // Ucode extends JS escapes with \e (ESC), \a (BEL), and octal sequences
+ // Ucode extends JS escapes with \e (ESC), \a (BEL), and octal sequences.
+ // Unlike JS, ucode supports only the 4-hex `\uXXXX` form, not `\u{...}`.
  escape_sequence: _ => token.immediate(seq(
  '\\',
  choice(
@@ -900,7 +931,6 @@ module.exports = grammar({
  /[0-7]{1,3}/,
  /x[0-9a-fA-F]{2}/,
  /u[0-9a-fA-F]{4}/,
- /u\{[0-9a-fA-F]+\}/,
  /\r[\n\u2028\u2029]/,
  ),
  )),
@@ -948,27 +978,30 @@ module.exports = grammar({
  // - C-style legacy octal: 0177
  // - Hex float: 0x1.8
  // - Uppercase prefixes: 0O, 0B (already in JS grammar)
+ //
+ // Unlike JS, ucode does NOT support:
+ // - numeric underscore separators (1_000)
+ // - leading-dot floats (.5) — a digit is required before the dot
  number: _ => {
- const hexDigits = /[\da-fA-F](_?[\da-fA-F])*/;
+ const hexDigits = /[\da-fA-F]+/;
  const hexLiteral = seq(choice('0x', '0X'), hexDigits);
  const hexFloat = seq(choice('0x', '0X'), hexDigits, '.', optional(hexDigits));
 
- const decimalDigits = /\d(_?\d)*/;
+ const decimalDigits = /\d+/;
  const signedInteger = seq(optional(choice('-', '+')), decimalDigits);
  const exponentPart = seq(choice('e', 'E'), signedInteger);
 
- const binaryLiteral = seq(choice('0b', '0B'), /[0-1](_?[0-1])*/);
- const octalLiteral = seq(choice('0o', '0O'), /[0-7](_?[0-7])*/);
+ const binaryLiteral = seq(choice('0b', '0B'), /[0-1]+/);
+ const octalLiteral = seq(choice('0o', '0O'), /[0-7]+/);
  const legacyOctalLiteral = seq('0', /[0-7]+/);
 
  const decimalIntegerLiteral = choice(
  '0',
- seq(optional('0'), /[1-9]/, optional(seq(optional('_'), decimalDigits))),
+ seq(optional('0'), /[1-9]/, optional(decimalDigits)),
  );
 
  const decimalLiteral = choice(
  seq(decimalIntegerLiteral, '.', optional(decimalDigits), optional(exponentPart)),
- seq('.', decimalDigits, optional(exponentPart)),
  seq(decimalIntegerLiteral, exponentPart),
  decimalDigits,
  );
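Transcribing the constants from this hunk into a single JavaScript regular expression shows which spellings the revised `number` rule accepts. It is a hand translation under two assumptions: JavaScript regex semantics stand in for tree-sitter's, and the part of the rule outside this hunk is taken to be a plain alternation of `hexFloat`, `hexLiteral`, `binaryLiteral`, `octalLiteral`, `legacyOctalLiteral` and `decimalLiteral`. `UCODE_NUMBER` is a hypothetical name.

```javascript
// Hypothetical hand translation of the `number` constants into one anchored
// JavaScript RegExp. Assumes the rule returns a plain choice of these forms.
const UCODE_NUMBER = new RegExp(
  "^(?:" +
    [
      "0[xX][\\da-fA-F]+\\.[\\da-fA-F]*",           // hexFloat: 0x1.8, 0x1.
      "0[xX][\\da-fA-F]+",                          // hexLiteral
      "0[bB][01]+",                                 // binaryLiteral
      "0[oO][0-7]+",                                // octalLiteral
      "0[0-7]+",                                    // legacyOctalLiteral: 0177
      "(?:0|0?[1-9]\\d*)\\.\\d*(?:[eE][-+]?\\d+)?", // decimal with a dot
      "(?:0|0?[1-9]\\d*)[eE][-+]?\\d+",             // integer with exponent
      "\\d+",                                       // bare decimal digits
    ].join("|") +
    ")$"
);

for (const s of ["0177", "0x1.8", "1e-3", ".5", "1_000"]) {
  console.log(s, UCODE_NUMBER.test(s));
}
// 0177 true, 0x1.8 true, 1e-3 true, .5 false, 1_000 false
```

Because the pattern is anchored at both ends, the order of the alternatives changes only how much backtracking happens, not which strings match.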
@@ -985,9 +1018,12 @@ module.exports = grammar({
 
  _identifier: $ => $.identifier,
 
+ // ucode does NOT support unicode escape sequences in identifiers (unlike JS);
+ // a literal `\u...` is an "Unexpected character". Non-ASCII letters are still
+ // allowed directly via the negated character class.
  identifier: _ => {
- const alpha = /[^\x00-\x1F\s\p{Zs}0-9:;`"'@#.,|^&<=>+\-*/\\%?!~()\[\]{}\uFEFF\u2060\u200B\u2028\u2029]|\\u[0-9a-fA-F]{4}|\\u\{[0-9a-fA-F]+\}/;
- const alphanumeric = /[^\x00-\x1F\s\p{Zs}:;`"'@#.,|^&<=>+\-*/\\%?!~()\[\]{}\uFEFF\u2060\u200B\u2028\u2029]|\\u[0-9a-fA-F]{4}|\\u\{[0-9a-fA-F]+\}/;
+ const alpha = /[^\x00-\x1F\s\p{Zs}0-9:;`"'@#.,|^&<=>+\-*/\\%?!~()\[\]{}\uFEFF\u2060\u200B\u2028\u2029]/;
+ const alphanumeric = /[^\x00-\x1F\s\p{Zs}:;`"'@#.,|^&<=>+\-*/\\%?!~()\[\]{}\uFEFF\u2060\u200B\u2028\u2029]/;
  return token(seq(alpha, repeat(alphanumeric)));
  },
 
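The effect of dropping the `\u` alternatives can be checked by reusing the two character classes from the new `identifier` rule. This is an approximation: the classes are copied verbatim, but they are compiled by JavaScript with the `u` flag (needed for `\p{Zs}`) rather than by tree-sitter's own regex engine, and `isUcodeIdentifier` is a hypothetical helper.

```javascript
// Character classes copied from the `identifier` rule above, compiled as
// JavaScript RegExps with the `u` flag. Tree-sitter uses its own regex
// engine, so treat this as an approximation of the token.
const ALPHA = /[^\x00-\x1F\s\p{Zs}0-9:;`"'@#.,|^&<=>+\-*/\\%?!~()\[\]{}\uFEFF\u2060\u200B\u2028\u2029]/u;
const ALPHANUMERIC = /[^\x00-\x1F\s\p{Zs}:;`"'@#.,|^&<=>+\-*/\\%?!~()\[\]{}\uFEFF\u2060\u200B\u2028\u2029]/u;

// Hypothetical helper: true when `text` matches seq(alpha, repeat(alphanumeric)).
function isUcodeIdentifier(text) {
  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  if (chars.length === 0 || !ALPHA.test(chars[0])) return false;
  return chars.slice(1).every((c) => ALPHANUMERIC.test(c));
}

console.log(isUcodeIdentifier("größe"));     // true: non-ASCII letters are allowed
console.log(isUcodeIdentifier("\\u0061bc")); // false: `\u` escapes are rejected
console.log(isUcodeIdentifier("2fast"));     // false: cannot start with a digit
```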
@@ -996,9 +1032,11 @@ module.exports = grammar({
  false: _ => 'false',
  null: _ => 'null',
 
+ // Unlike arrays/objects, ucode call arguments allow neither interior
+ // elision (`f(1,,2)`) nor a trailing comma (`f(1,2,)`).
  arguments: $ => seq(
  '(',
- commaSep(optional(choice($.expression, $.spread_element))),
+ commaSep(choice($.expression, $.spread_element)),
  ')',
  ),
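Taken together, the `object`, `array` and `arguments` changes leave two comma policies: list literals accept one trailing comma but no empty slots, and call arguments accept neither. The sketch below encodes both policies; `checkCommas` and its `"list"`/`"call"` mode names are hypothetical, and it assumes the elements themselves contain no commas (no nesting), so a plain split is enough.

```javascript
// Hypothetical checker for the two comma policies described above:
//   "list": arrays/objects, e.g. [1, 2,] accepted, [1,,2] rejected
//   "call": call arguments, e.g. f(1, 2) accepted, f(1, 2,) and f(1,,2) rejected
// `inner` is the text between the brackets; elements are assumed not to
// contain commas themselves.
function checkCommas(inner, mode) {
  const parts = inner.split(",").map((p) => p.trim());
  if (parts.length === 1 && parts[0] === "") return true; // [] or f()

  const last = parts.length - 1;
  for (let i = 0; i < parts.length; i++) {
    if (parts[i] !== "") continue;
    // The only empty slot a list may have is the one after a trailing comma.
    const isTrailing = i === last && i > 0;
    if (!(mode === "list" && isTrailing)) return false;
  }
  return true;
}

console.log(checkCommas("1, 2,", "list")); // true
console.log(checkCommas("1,,2", "list"));  // false
console.log(checkCommas("1, 2,", "call")); // false
console.log(checkCommas("1, 2", "call"));  // true
```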