tree-sitter-ucode (npm) 0.6.0 → 0.7.0


Files changed (23)

package/README.md +72 -19
package/grammar.js +94 -56
package/markup/grammar.js +94 -56
package/markup/src/grammar.json +326 -398
package/markup/src/node-types.json +32 -78
package/markup/src/parser.c +85950 -85389
package/package.json +1 -1
package/prebuilds/darwin-arm64/tree-sitter-ucode.node +0 -0
package/prebuilds/linux-arm64/tree-sitter-ucode.node +0 -0
package/prebuilds/linux-x64/tree-sitter-ucode.node +0 -0
package/prebuilds/win32-x64/tree-sitter-ucode.node +0 -0
package/queries/highlights.scm +0 -1
package/src/grammar.json +326 -398
package/src/node-types.json +32 -78
package/src/parser.c +71124 -70595
package/src/scanner_impl.h +8 -0
package/tree-sitter-ucode.wasm +0 -0
package/tree-sitter-ucode_markup.wasm +0 -0
package/tree-sitter.json +1 -1
package/ucdocs/grammar.js +7 -6
package/ucdocs/src/grammar.json +24 -6
package/ucdocs/src/node-types.json +8 -0
package/ucdocs/src/parser.c +2278 -2178

package/README.md CHANGED

@@ -2,17 +2,24 @@
 Tree-sitter grammar for [ucode](https://github.com/jow-/ucode), the ECMAScript-like scripting language used in OpenWrt.
-Two grammars are provided:
+Three grammars are provided:
-| Grammar | Scope | File types |
-|---------|-------|------------|
-| `ucode` | `source.uc` | `.uc`, `.ucode`, `.ut` |
-| `ucode_markup` | `source.ucode.markup` | `.uc`, `.ucode`, `.ut`, `.uc.tmpl` (template files — detected by content) |
+| Grammar | Scope | File types | Purpose |
+|---------|-------|------------|---------|
+| `ucode` | `source.uc` | `.uc`, `.ucode`, `.ut` | Plain ucode source files |
+| `ucode_markup` | `source.ucode.markup` | `.uc`, `.ucode`, `.ut` (template files — detected by content) | Ucode template files mixing raw text and code tags |
+| `ucdocs` | — | injected | JSDoc-style `/** */` doc comment blocks |
+`ucode` and `ucode_markup` share file extensions. Template files are distinguished from plain
+code files by content: any file containing a tag opener (`{%`, `{{`, or `{#`) at the start
+of a line (with optional leading whitespace) is automatically parsed by `ucode_markup`. Plain
+code files fall back to `ucode`. See [File-type detection](#file-type-detection) below.
-Both grammars share the same file extensions. Template files are distinguished from plain
-code files by content: any file whose first tag opener (`{%`, `{{`, or `{#`) appears at
-the start of a line is automatically parsed by `ucode_markup`. Plain code files fall back
-to `ucode`. See [File-type detection](#file-type-detection) below.
+`ucdocs` is not a standalone file grammar — it is automatically injected by the `ucode` and
+`ucode_markup` grammars into every `/** */` doc comment block. Tools that load grammars
+directly from `tree-sitter.json` (including the tree-sitter CLI) handle this automatically.
+Editor plugins may require registering the `ucdocs` grammar separately — see the editor
+sections below.
 ## Ucode vs JavaScript
@@ -26,9 +33,45 @@ Ucode is an ECMAScript subset with OpenWrt-specific extensions. Key differences:
 | Removed features | Destructuring, `for...of`, `do-while`, generators, forward declarations, dynamic `import()` | All supported |
 | Added number literals | `0177` (C octal), `0x1.8` (hex float), `0B`/`0O` prefixes | Standard only |
 | Added escape sequences | `\e` (ESC), `\a` (BEL), octal `\177` | Standard only |
+| String unicode escapes | `\uXXXX` only (no `\u{…}`); no `\u` escapes in identifiers | `\uXXXX` and `\u{…}` |
 | Regex flags | `g`, `i`, `s` only | Full set |
 | Module system | Static `import`/`export` only; no `from` on re-exports | Full ES modules |
+The grammar tracks ucode's parser closely, with one deliberate exception: **automatic
+semicolon insertion is kept ECMAScript-style (more lenient than the compiler).** ucode
+only lets you drop a statement's `;` before `}`, end-of-file, a template tag close, or an
+alt-syntax end keyword (`endif`/`endfor`/`endwhile`/`endfunction`/`elif`/`else`), whereas
+the grammar also tolerates a bare newline between statements so that in-progress edits are
+not flagged as errors.
+## Doc comment grammar (ucdocs)
+`/** */` blocks are parsed by the `ucdocs` grammar and injected into the host parse tree.
+The grammar understands the following tags:
+| Tag | Syntax |
+|-----|--------|
+| `@param` | `@param {Type} name description` |
+| `@returns` / `@return` | `@returns {Type} description` |
+| `@throws` / `@throw` | `@throws {Type} description` |
+| `@type` | `@type {Type}` |
+| `@typedef` | `@typedef {Type} TypeName` |
+| `@template` | `@template T, U` |
+| `@function` | `@function module:path#member` |
+| `@module` | `@module name` |
+| `@deprecated` | `@deprecated description` |
+| `@since` | `@since version` |
+| `@see` | `@see reference` |
+| `@example` | `@example code` |
+| `@default` | `@default value` |
+Type expressions support: primitives (`int`, `float`, `string`, `boolean`, `null`, `void`,
+`function`), `*`/`any`, `list<T>`, `dict<T>`, record types (`{field: T}`), named types
+(`TypeName`, `TypeName<T, U>`), cross-module refs (`module:path.To.Type`), named function
+types `(name: T) => U`, anonymous function types `function(T): U`, union `T | U`, nullable
+`?T`, and array postfix `T[]`. Inline `{@link ...}` tags and optional params `[name=default]`
+are also supported.
 ## Requirements
 - [tree-sitter CLI](https://github.com/tree-sitter/tree-sitter) ≥ 0.24
@@ -38,10 +81,10 @@ Ucode is an ECMAScript subset with OpenWrt-specific extensions. Key differences:
 ```sh
 npm install
-npm run build        # generate + compile Node.js bindings
+npm run build        # generate + compile Node.js bindings (ucode and ucode_markup only)
 ```
-To regenerate parsers after editing a grammar file:
+To regenerate parsers after editing a grammar file (run from the repo root):
 ```sh
 # ucode grammar
@@ -49,27 +92,33 @@ npx tree-sitter generate
 # ucode_markup grammar (generated from grammar.js — do not edit markup/grammar.js directly)
 node scripts/generate-markup-grammar.js
-cd markup && npx tree-sitter generate
+npx tree-sitter generate markup/grammar.js --output markup/src
+# ucdocs grammar (not included in npm run build — must be regenerated manually)
+npx tree-sitter generate ucdocs/grammar.js --output ucdocs/src
 ```
 ## Test
 ```sh
-npm test             # runs tree-sitter test for ucode and ucode_markup
+npm test             # builds and tests all three grammars (ucode, ucode_markup, ucdocs)
 ```
-To filter by corpus file name:
+To filter by corpus file name (run from the repo root):
 ```sh
 npx tree-sitter test --file-name control_flow
-cd markup && npx tree-sitter test --file-name markup
+(cd markup && npx tree-sitter test --file-name markup)
+(cd ucdocs && npx tree-sitter test --file-name tags)
+(cd ucdocs && npx tree-sitter test --file-name types)
 ```
 ## File-type detection
-Both grammars claim the same file extensions. Tools that respect `content-regex` in
+Both `ucode` and `ucode_markup` claim the same file extensions. Tools that respect `content-regex` in
 `tree-sitter.json` (including the tree-sitter CLI ≥ 0.24) automatically route
-template files to `ucode_markup` when a tag opener appears at the start of a line.
+template files to `ucode_markup` when a tag opener (`{%`, `{{`, or `{#`) appears at
+the start of a line (with optional leading whitespace).
 Editors that manage their own filetype dispatch (Neovim, Helix) need an explicit
 rule — see the editor sections below.
@@ -102,11 +151,15 @@ grammar       = "ucode_markup"
 [[grammar]]
 name   = "ucode"
-source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "v0.5.0" }
+source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "v0.6.0" }
 [[grammar]]
 name   = "ucode_markup"
-source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "v0.5.0", subpath = "markup" }
+source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "v0.6.0", subpath = "markup" }
+[[grammar]]
+name   = "ucdocs"
+source = { git = "https://github.com/m00qek/tree-sitter-ucode", rev = "v0.6.0", subpath = "ucdocs" }
 ```
 Helix does not support content-based filetype detection for shared extensions. For
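
Note on the README's file-type detection rule: the 0.7.0 text says a file is routed to `ucode_markup` when a tag opener (`{%`, `{{`, or `{#`) appears at the start of a line, with optional leading whitespace. The sketch below is one way to express that rule in plain JavaScript. The regular expression and the `pickGrammar` helper are assumptions made for illustration; the actual `content-regex` value in `tree-sitter.json` is not shown in this diff and may differ.

```javascript
// Hypothetical approximation of the content-based routing described in the README.
// Matches `{%`, `{{` or `{#` at the start of any line, allowing leading spaces/tabs.
const TAG_OPENER_AT_LINE_START = /^[ \t]*\{[%{#]/m;

// Hypothetical helper: returns the name of the grammar the README says should apply.
function pickGrammar(source) {
  return TAG_OPENER_AT_LINE_START.test(source) ? 'ucode_markup' : 'ucode';
}

console.log(pickGrammar('let x = 1;\nprint(x);\n'));
console.log(pickGrammar('<ul>\n  {% for (let i in items): %}\n</ul>\n'));
```

Under this reading, a tag opener that only appears mid-line (for example `print("{{")`) does not trigger the markup grammar.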

package/grammar.js CHANGED

@@ -77,7 +77,6 @@ module.exports = grammar({
   inline: $ => [
     $._call_signature,
-    $._formal_parameter,
     $._expressions,
     $._semicolon,
     $._identifier,
@@ -125,7 +124,7 @@ module.exports = grammar({
     [$.primary_expression, $._for_header],
     [$.variable_declarator, $._for_header],
     [$.assignment_expression, $.pattern],
-    [$.labeled_statement, $._property_name],
+    [$.primary_expression, $.delete_expression],
   ],
   word: $ => $.identifier,
@@ -208,18 +207,22 @@ module.exports = grammar({
         field('clause', $.export_clause),
         $._semicolon,
       ),
+      // `export let/const …;` already carries its own terminating semicolon.
       seq(
         'export',
-        choice(
-          field('declaration', $.declaration),
-          seq(
-            'default',
-            seq(
-              field('value', $.expression),
-              ';',
-            ),
-          ),
-        ),
+        field('declaration', $.lexical_declaration),
+      ),
+      // `export function f() {}` requires an explicit trailing `;` in ucode.
+      seq(
+        'export',
+        field('declaration', $.function_declaration),
+        ';',
+      ),
+      seq(
+        'export',
+        'default',
+        field('value', $.expression),
+        ';',
       ),
     ),
@@ -322,7 +325,6 @@ module.exports = grammar({
       $.continue_statement,
       $.return_statement,
       $.empty_statement,
-      $.labeled_statement,
     ),
     expression_statement: $ => seq(
@@ -330,10 +332,19 @@ module.exports = grammar({
       $._semicolon,
     ),
-    lexical_declaration: $ => seq(
-      field('kind', choice('let', 'const')),
-      commaSep1($.variable_declarator),
-      $._semicolon,
+    // `let` declarators may omit the initializer; `const` requires one.
+    // ucode rejects `const a;` ("Expecting initializer expression").
+    lexical_declaration: $ => choice(
+      seq(
+        field('kind', 'let'),
+        commaSep1($.variable_declarator),
+        $._semicolon,
+      ),
+      seq(
+        field('kind', 'const'),
+        commaSep1(alias($._const_declarator, $.variable_declarator)),
+        $._semicolon,
+      ),
     ),
     variable_declarator: $ => seq(
@@ -341,6 +352,11 @@ module.exports = grammar({
       optional($._initializer),
     ),
+    _const_declarator: $ => seq(
+      field('name', $.identifier),
+      $._initializer,
+    ),
     statement_block: $ => prec.right(seq(
       '{',
       repeat($.statement),
@@ -512,17 +528,19 @@ module.exports = grammar({
       ),
     ),
-    // Supports both `for (k in obj)` and `for (k, v in obj)` (ucode two-variable form)
+    // Supports both `for (k in obj)` and `for (k, v in obj)` (ucode two-variable form).
+    // The loop target must be a plain identifier: ucode rejects member/subscript
+    // targets (`for (a.x in o)`) and only `let` (never `const`) may declare it.
     _for_header: $ => seq(
       '(',
       choice(
         seq(
-          field('kind', choice('let', 'const')),
+          field('kind', 'let'),
           field('left', $.identifier),
           optional(seq(',', field('value', $.identifier))),
         ),
         seq(
-          field('left', $._lhs_expression),
+          field('left', $.identifier),
           optional(seq(',', field('value', $.identifier))),
         ),
       ),
@@ -565,15 +583,14 @@ module.exports = grammar({
       optional(field('handler', $.catch_clause)),
     ),
+    // ucode has no labeled statements, so `break`/`continue` take no label.
     break_statement: $ => seq(
       'break',
-      field('label', optional(alias($.identifier, $.statement_identifier))),
       $._semicolon,
     ),
     continue_statement: $ => seq(
       'continue',
-      field('label', optional(alias($.identifier, $.statement_identifier))),
       $._semicolon,
     ),
@@ -585,12 +602,6 @@ module.exports = grammar({
     empty_statement: _ => ';',
-    labeled_statement: $ => prec.dynamic(-1, seq(
-      field('label', alias(choice($.identifier, $._reserved_identifier), $.statement_identifier)),
-      ':',
-      field('body', $.statement),
-    )),
     //
     // Statement components
     //
@@ -640,6 +651,7 @@ module.exports = grammar({
       $.assignment_expression,
       $.augmented_assignment_expression,
       $.unary_expression,
+      $.delete_expression,
       $.binary_expression,
       $.ternary_expression,
       $.update_expression,
@@ -666,25 +678,33 @@ module.exports = grammar({
       $.call_expression,
     ),
+    // ucode allows a trailing comma but NOT interior elision (`{a:1,,b:2}`).
     object: $ => prec('object', seq(
       '{',
-      commaSep(optional(choice(
-        $.pair,
-        $.spread_element,
-        alias(
-          choice($.identifier, $._reserved_identifier),
-          $.shorthand_property_identifier,
-        ),
-      ))),
+      optional(seq(
+        commaSep1(choice(
+          $.pair,
+          $.spread_element,
+          alias(
+            choice($.identifier, $._reserved_identifier),
+            $.shorthand_property_identifier,
+          ),
+        )),
+        optional(','),
+      )),
       '}',
     )),
+    // ucode allows a trailing comma but NOT interior elision (`[1,,2]`).
     array: $ => seq(
       '[',
-      commaSep(optional(choice(
-        $.expression,
-        $.spread_element,
-      ))),
+      optional(seq(
+        commaSep1(choice(
+          $.expression,
+          $.spread_element,
+        )),
+        optional(','),
+      )),
       ']',
     ),
@@ -795,10 +815,17 @@ module.exports = grammar({
     ),
     unary_expression: $ => prec.left('unary_void', seq(
-      field('operator', choice('!', '~', '-', '+', 'delete')),
+      field('operator', choice('!', '~', '-', '+')),
       field('argument', $.expression),
     )),
+    // ucode's `delete` only accepts a property access (member or subscript)
+    // expression: `delete o.k` / `delete o["k"]`. `delete x` is a syntax error.
+    delete_expression: $ => prec.left('unary_void', seq(
+      field('operator', 'delete'),
+      field('argument', choice($.member_expression, $.subscript_expression)),
+    )),
     update_expression: $ => prec.left(choice(
       seq(
         field('argument', $.expression),
@@ -851,13 +878,16 @@ module.exports = grammar({
     ),
     _call_signature: $ => field('parameters', $.formal_parameters),
-    _formal_parameter: $ => choice($.identifier, $.rest_element),
+    // A rest element (`...name`) must be the final parameter, and there can be
+    // only one.  ucode rejects a rest param followed by anything — including a
+    // trailing comma.  A trailing comma is allowed only after plain parameters.
     formal_parameters: $ => seq(
       '(',
-      optional(seq(
-        commaSep1($._formal_parameter),
-        optional(','),
+      optional(choice(
+        $.rest_element,
+        seq(commaSep1($.identifier), ',', $.rest_element),
+        seq(commaSep1($.identifier), optional(',')),
       )),
       ')',
     ),
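
Note on the `formal_parameters` hunk above: the three alternatives (rest only, plain parameters followed by a rest element, plain parameters with an optional trailing comma) can be approximated as a regular expression to see which parameter lists the new rule accepts. This is a rough sketch, not the grammar itself: identifiers are simplified to ASCII, `rest_element` is assumed to be `...identifier`, and `acceptsParameters` is a hypothetical helper.

```javascript
// Simplified stand-ins for the grammar's tokens (assumptions, ASCII identifiers only).
const ID = '[A-Za-z_][A-Za-z0-9_]*';
const REST = `\\.\\.\\.\\s*${ID}`;           // assumed shape of rest_element
const PLAIN = `${ID}(?:\\s*,\\s*${ID})*`;    // commaSep1($.identifier)

// Mirrors the three alternatives of the new formal_parameters rule.
const FORMAL_PARAMETERS = new RegExp(
  `^\\(\\s*(?:${REST}|${PLAIN}\\s*,\\s*${REST}|${PLAIN}\\s*,?)?\\s*\\)$`
);

// Hypothetical helper: true when the parenthesised list fits the 0.7.0 rule.
function acceptsParameters(text) {
  return FORMAL_PARAMETERS.test(text);
}

for (const sample of ['()', '(a, b,)', '(a, ...rest)', '(...rest,)', '(...a, b)']) {
  console.log(sample, acceptsParameters(sample));
}
```

The last two samples are the cases the new comment calls out: a rest parameter followed by a trailing comma or by another parameter.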
@@ -892,7 +922,8 @@ module.exports = grammar({
     unescaped_double_string_fragment: _ => token.immediate(prec(1, /[^"\\\r\n]+/)),
     unescaped_single_string_fragment: _ => token.immediate(prec(1, /[^'\\\r\n]+/)),
-    // Ucode extends JS escapes with \e (ESC), \a (BEL), and octal sequences
+    // Ucode extends JS escapes with \e (ESC), \a (BEL), and octal sequences.
+    // Unlike JS, ucode supports only the 4-hex `\uXXXX` form, not `\u{...}`.
     escape_sequence: _ => token.immediate(seq(
       '\\',
       choice(
@@ -900,7 +931,6 @@ module.exports = grammar({
         /[0-7]{1,3}/,
         /x[0-9a-fA-F]{2}/,
         /u[0-9a-fA-F]{4}/,
-        /u\{[0-9a-fA-F]+\}/,
         /\r[\n\u2028\u2029]/,
       ),
     )),
@@ -948,27 +978,30 @@ module.exports = grammar({
     // - C-style legacy octal: 0177
     // - Hex float: 0x1.8
     // - Uppercase prefixes: 0O, 0B (already in JS grammar)
+    //
+    // Unlike JS, ucode does NOT support:
+    // - numeric underscore separators (1_000)
+    // - leading-dot floats (.5) — a digit is required before the dot
     number: _ => {
-      const hexDigits = /[\da-fA-F](_?[\da-fA-F])*/;
+      const hexDigits = /[\da-fA-F]+/;
       const hexLiteral = seq(choice('0x', '0X'), hexDigits);
       const hexFloat = seq(choice('0x', '0X'), hexDigits, '.', optional(hexDigits));
-      const decimalDigits = /\d(_?\d)*/;
+      const decimalDigits = /\d+/;
       const signedInteger = seq(optional(choice('-', '+')), decimalDigits);
       const exponentPart = seq(choice('e', 'E'), signedInteger);
-      const binaryLiteral = seq(choice('0b', '0B'), /[0-1](_?[0-1])*/);
-      const octalLiteral = seq(choice('0o', '0O'), /[0-7](_?[0-7])*/);
+      const binaryLiteral = seq(choice('0b', '0B'), /[0-1]+/);
+      const octalLiteral = seq(choice('0o', '0O'), /[0-7]+/);
       const legacyOctalLiteral = seq('0', /[0-7]+/);
       const decimalIntegerLiteral = choice(
         '0',
-        seq(optional('0'), /[1-9]/, optional(seq(optional('_'), decimalDigits))),
+        seq(optional('0'), /[1-9]/, optional(decimalDigits)),
       );
       const decimalLiteral = choice(
         seq(decimalIntegerLiteral, '.', optional(decimalDigits), optional(exponentPart)),
-        seq('.', decimalDigits, optional(exponentPart)),
         seq(decimalIntegerLiteral, exponentPart),
         decimalDigits,
       );
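
Note on the `number` hunk above: the pieces visible in the diff can be reassembled into an ordinary regular expression to check which literals the 0.7.0 token admits. The sketch below is an approximation under two assumptions: the final `token(choice(...))` that combines the pieces is not shown in this hunk, so the list of alternatives is a guess based on the constants that are defined, and `isUcodeNumber` is a hypothetical helper name.

```javascript
// Building blocks copied from the new side of the hunk, as regex source strings.
const hexDigits = '[0-9a-fA-F]+';
const decimalDigits = '\\d+';
const exponentPart = `[eE][-+]?${decimalDigits}`;
const decimalInteger = `(?:0|0?[1-9](?:${decimalDigits})?)`;

// Assumed set of alternatives (the combining choice() is outside the hunk).
const alternatives = [
  `0[xX]${hexDigits}\\.(?:${hexDigits})?`,                          // hexFloat
  `0[xX]${hexDigits}`,                                              // hexLiteral
  '0[bB][01]+',                                                     // binaryLiteral
  '0[oO][0-7]+',                                                    // octalLiteral
  '0[0-7]+',                                                        // legacyOctalLiteral
  `${decimalInteger}\\.(?:${decimalDigits})?(?:${exponentPart})?`,  // 1.5, 1., 1.5e3
  `${decimalInteger}${exponentPart}`,                               // 1e3
  decimalDigits,                                                    // plain digits
];
const NUMBER = new RegExp(`^(?:${alternatives.join('|')})$`);

// Hypothetical helper: true when the whole string is a single number literal.
function isUcodeNumber(text) {
  return NUMBER.test(text);
}

for (const sample of ['0177', '0x1.8', '0B101', '1e10', '1_000', '.5']) {
  console.log(sample, isUcodeNumber(sample));
}
```

The last two samples correspond to the two removals documented in the new comment: underscore separators and leading-dot floats.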
@@ -985,9 +1018,12 @@ module.exports = grammar({
     _identifier: $ => $.identifier,
+    // ucode does NOT support unicode escape sequences in identifiers (unlike JS);
+    // a literal `\u...` is an "Unexpected character".  Non-ASCII letters are still
+    // allowed directly via the negated character class.
     identifier: _ => {
-      const alpha = /[^\x00-\x1F\s\p{Zs}0-9:;`"'@#.,|^&<=>+\-*/\\%?!~()\[\]{}\uFEFF\u2060\u200B\u2028\u2029]|\\u[0-9a-fA-F]{4}|\\u\{[0-9a-fA-F]+\}/;
-      const alphanumeric = /[^\x00-\x1F\s\p{Zs}:;`"'@#.,|^&<=>+\-*/\\%?!~()\[\]{}\uFEFF\u2060\u200B\u2028\u2029]|\\u[0-9a-fA-F]{4}|\\u\{[0-9a-fA-F]+\}/;
+      const alpha = /[^\x00-\x1F\s\p{Zs}0-9:;`"'@#.,|^&<=>+\-*/\\%?!~()\[\]{}\uFEFF\u2060\u200B\u2028\u2029]/;
+      const alphanumeric = /[^\x00-\x1F\s\p{Zs}:;`"'@#.,|^&<=>+\-*/\\%?!~()\[\]{}\uFEFF\u2060\u200B\u2028\u2029]/;
       return token(seq(alpha, repeat(alphanumeric)));
     },
@@ -996,9 +1032,11 @@ module.exports = grammar({
     false: _ => 'false',
     null: _ => 'null',
+    // Unlike arrays/objects, ucode call arguments allow neither interior
+    // elision (`f(1,,2)`) nor a trailing comma (`f(1,2,)`).
     arguments: $ => seq(
       '(',
-      commaSep(optional(choice($.expression, $.spread_element))),
+      commaSep(choice($.expression, $.spread_element)),
       ')',
     ),
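
Note on the comma rules introduced in this release: per the new comments, `object` and `array` literals accept a single trailing comma but no interior elision, while call `arguments` accept neither. The sketch below contrasts the two policies on the text between the delimiters. It is a simplification for illustration only: it splits naively on commas, so it does not handle nested brackets or commas inside strings, and `isValidCommaList` is a hypothetical helper, not part of the package.

```javascript
// Hypothetical checker for a flat, comma-separated list body such as "1, 2,".
// allowTrailingComma: true models arrays/objects, false models call arguments.
function isValidCommaList(inner, { allowTrailingComma }) {
  const text = inner.trim();
  if (text === '') return true; // `[]`, `{}` and `f()` are always fine

  const items = text.split(',').map((item) => item.trim());
  const last = items.length - 1;
  return items.every((item, index) => {
    if (item !== '') return true;
    // An empty slot is only acceptable as one trailing comma after a real element.
    return allowTrailingComma && index === last && index > 0;
  });
}

const asArray = { allowTrailingComma: true };
const asArguments = { allowTrailingComma: false };

console.log(isValidCommaList('1, 2,', asArray));      // array with trailing comma
console.log(isValidCommaList('1,, 2', asArray));      // interior elision
console.log(isValidCommaList('1, 2,', asArguments));  // call with trailing comma
```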