@markuplint/parser-utils 4.8.10 → 5.0.0-alpha.0

@@ -0,0 +1,208 @@
1
+ # @markuplint/parser-utils
2
+
3
+ ## Overview
4
+
5
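+ `@markuplint/parser-utils` is the shared foundation for all markuplint parsers. It provides the abstract `Parser` class, which implements the full parsing pipeline, and a set of utility modules for tokenization, error handling, debugging, and AST manipulation. Every markup language parser (HTML, JSX, Vue, Svelte, Astro, Pug) extends this package's `Parser` class to convert language-specific AST nodes into the unified markuplint AST format defined by `@markuplint/ml-ast`.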
6
+
7
+ ## Directory Structure
8
+
9
+ ```
10
+ src/
11
+ ├── index.ts — Re-exports all public API
11
+ ├── parser.ts — abstract class Parser<Node, State> (~1825 lines, core)
12
+ ├── types.ts — ParserOptions, ParseOptions, Token, ChildToken, IgnoreTag, etc.
13
+ ├── enums.ts — TagState, AttrState state machines
14
+ ├── attr-tokenizer.ts — Attribute tokenizer (uses AttrState)
15
+ ├── script-parser.ts — JavaScript parsing via espree
16
+ ├── ignore-block.ts — Template expression masking and restoration
17
+ ├── ignore-front-matter.ts — YAML front matter detection and masking
18
+ ├── detect-element-type.ts — Element type classification (html/web-component/authored)
19
+ ├── idl-attributes.ts — IDL ↔ content attribute name mapping (React-compatible)
20
+ ├── debugger.ts — Debug and test utilities
21
+ ├── parser-error.ts — ParserError, TargetParserError, ConfigParserError
22
+ ├── sort-nodes.ts — Node position sorting
23
+ ├── const.ts — MASK_CHAR, SVG element list, defaultSpaces
24
+ ├── get-location.ts — Line/column/offset calculation utilities
25
+ └── decision.ts — Custom element name detection
27
+ ```
28
+
29
+ ## アーキテクチャ図
30
+
31
+ ```mermaid
32
+ flowchart TD
33
+ subgraph core ["Core Module"]
34
+ parser["parser.ts\nabstract Parser"]
35
+ end
36
+
37
+ subgraph utils ["Utility Modules"]
38
+ attrTokenizer["attr-tokenizer.ts\nAttribute tokenizer"]
39
+ ignoreBlock["ignore-block.ts\nMasking and restoration"]
40
+ ignoreFM["ignore-front-matter.ts\nFront matter"]
41
+ detectType["detect-element-type.ts\nElement type detection"]
42
+ scriptParser["script-parser.ts\nJS parsing"]
43
+ idlAttrs["idl-attributes.ts\nIDL mapping"]
44
+ end
45
+
46
+ subgraph support ["Support Modules"]
47
+ enums["enums.ts\nTagState, AttrState"]
48
+ errors["parser-error.ts\nError classes"]
49
+ debugger["debugger.ts\nDebugging"]
50
+ location["get-location.ts\nPosition calculation"]
51
+ sort["sort-nodes.ts\nSorting"]
52
+ consts["const.ts\nConstants"]
53
+ decision["decision.ts\nCustom element names"]
54
+ end
55
+
56
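+ subgraph deps ["External Dependencies"]
57
+ mlAst["@markuplint/ml-ast\nType definitions"]
58
+ mlSpec["@markuplint/ml-spec\nVoid elements"]
59
+ espree["espree\nJS tokenizer"]
60
+ cryptoLib["node:crypto\nNode IDs"]
+ mlTypes["@markuplint/types\nCustom element name validation"]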
61
+ end
62
+
63
+ parser --> attrTokenizer
64
+ parser --> ignoreBlock
65
+ parser --> ignoreFM
66
+ parser --> detectType
67
+ parser --> enums
68
+ parser --> errors
69
+ parser --> location
70
+ parser --> sort
71
+ parser --> consts
72
+
73
+ attrTokenizer --> enums
74
+ attrTokenizer --> scriptParser
75
+ scriptParser --> espree
76
+ detectType --> decision
77
+ ignoreBlock --> location
78
+ ignoreBlock --> errors
79
+
80
+ parser --> mlAst
81
+ parser --> mlSpec
82
+ parser --> cryptoLib
83
+ decision --> mlTypes["@markuplint/types\nCustom element name validation"]
84
+ ```
85
+
86
+ ## Module Responsibilities
87
+
88
+ | Module                   | Responsibility                                | Key Exports |
89
+ | ------------------------ | ------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------- |
90
+ | `parser.ts`              | Core parse pipeline; abstract `Parser` class  | `Parser` |
91
+ | `types.ts`               | Type definitions                              | `ParserOptions`, `ParseOptions`, `Token`, `ChildToken`, `IgnoreTag`, `IgnoreBlock`, `QuoteSet`, `ValueType`, `SelfCloseType`, `Tokenized` |
92
+ | `enums.ts`               | State machine enumerations                    | `TagState`, `AttrState` |
93
+ | `attr-tokenizer.ts`      | Attribute string tokenization                 | `attrTokenizer` |
94
+ | `script-parser.ts`       | JavaScript parsing of embedded scripts        | `scriptParser`, `safeScriptParser` |
95
+ | `ignore-block.ts`        | Template expression masking and restoration   | `ignoreBlock`, `restoreNode` |
96
+ | `ignore-front-matter.ts` | YAML front matter handling                    | `ignoreFrontMatter` |
97
+ | `detect-element-type.ts` | Element classification                        | `detectElementType` |
98
+ | `idl-attributes.ts`      | IDL ↔ content attribute name mapping          | `searchIDLAttribute` |
99
+ | `debugger.ts`            | Test and debug utilities                      | `nodeListToDebugMaps`, `attributesToDebugMaps`, `nodeTreeDebugView` |
100
+ | `parser-error.ts`        | Error classes                                 | `ParserError`, `TargetParserError`, `ConfigParserError` |
101
+ | `sort-nodes.ts`          | Node position sorting                         | `sortNodes` |
102
+ | `const.ts`               | Constants                                     | `MASK_CHAR`, `svgElementList`, `defaultSpaces` |
103
+ | `get-location.ts`        | Position calculation                          | `getPosition`, `getEndLine`, `getEndCol`, `getEndPosition`, `getOffsetsFromCode` |
104
+ | `decision.ts`            | Custom element name detection                 | `isPotentialCustomElementName`, `isSVGElement` |
105
+
106
+ ## Parse Pipeline Overview
107
+
108
+ The `parse()` method of the `Parser` class runs the following pipeline:
109
+
110
+ ```mermaid
111
+ flowchart LR
112
+ A["beforeParse"] --> B["frontMatter"]
113
+ B --> C["ignoreBlock"]
114
+ C --> D["tokenize"]
115
+ D --> E["traverse"]
116
+ E --> F["afterTraverse"]
117
+ F --> G["flatten"]
118
+ G --> H["afterFlatten"]
119
+ H --> I["restore"]
120
+ I --> J["afterParse"]
121
+ ```
122
+
123
+ See the [Parser Class Reference](docs/parser-class.ja.md) for details.
124
+
125
+ ## State Machines Overview
126
+
127
+ The package has two state machines:
128
+
129
+ - **TagState** -- Used for tag-level parsing (detect `<` → tag name → attributes → detect `>`)
130
+ - **AttrState** -- Used for attribute-level parsing (name → `=` → value)
131
+
132
+ See the [Parser Class Reference](docs/parser-class.ja.md#ステートマシン) for detailed state transition diagrams.
133
+
134
+ ## Error Handling
135
+
136
+ ```
137
+ ParserError (base class)
138
+ ├── line, col, raw — source position information
139
+ ├── TargetParserError — error tied to a specific element (includes nodeName)
140
+ └── ConfigParserError — configuration file error (includes filePath)
141
+ ```
142
+
143
+ ## Debug Utilities
144
+
145
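+ - **`nodeListToDebugMaps`** -- Converts an AST node list into human-readable debug strings. Used for snapshot testing
146
+ - **`attributesToDebugMaps`** -- Breaks attributes down into their name, equal sign, value, and quote components
147
+ - **`nodeTreeDebugView`** -- Visualizes the tree structure, showing depth, parent-child relationships, and pair nodes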
148
+
149
+ ## IDL Attribute Mapping
150
+
151
+ `searchIDLAttribute` provides a bidirectional mapping between React-style IDL attribute names and HTML content attribute names (e.g., `className` → `class`, `htmlFor` → `for`). It is called from the `MLAttr` constructor in `@markuplint/ml-core` when the spec sets `useIDLAttributeNames: true`.
152
+
153
+ ## External Dependencies
154
+
155
+ | Dependency            | Purpose                             |
156
+ | --------------------- | ---------------------------------- |
157
+ | `@markuplint/ml-ast`  | AST type definitions                |
158
+ | `@markuplint/ml-spec` | Void element detection              |
159
+ | `@markuplint/types`   | Custom element name validation      |
160
+ | `node:crypto`         | AST node UUID generation            |
161
+ | `debug`               | Performance timing and logging      |
162
+ | `espree`              | JavaScript tokenization and parsing |
163
+ | `type-fest`           | TypeScript utility types            |
164
+
165
+ ## Integration Points
166
+
167
+ ```mermaid
168
+ flowchart TD
169
+ subgraph upstream ["Upstream"]
170
+ mlAst["@markuplint/ml-ast\nType definitions"]
171
+ mlSpec["@markuplint/ml-spec\nVoid element detection"]
172
+ end
173
+
174
+ subgraph pkg ["@markuplint/parser-utils"]
175
+ parser["abstract Parser"]
176
+ end
177
+
178
+ subgraph downstream ["Downstream (parsers)"]
179
+ html["@markuplint/html-parser"]
180
+ jsx["@markuplint/jsx-parser"]
181
+ vue["@markuplint/vue-parser"]
182
+ svelte["@markuplint/svelte-parser"]
183
+ astro["@markuplint/astro-parser"]
184
+ pug["@markuplint/pug-parser"]
185
+ end
186
+
187
+ subgraph indirect ["Indirect"]
188
+ mlCore["@markuplint/ml-core"]
189
+ end
190
+
191
+ upstream -->|"types, utilities"| pkg
192
+ pkg -->|"extends Parser class"| downstream
193
+ downstream -->|"produces MLASTDocument"| mlCore
194
+ ```
195
+
196
+ ### Upstream
197
+
198
+ - **`@markuplint/ml-ast`** -- All AST type definitions (`MLASTElement`, `MLASTText`, etc.)
199
+ - **`@markuplint/ml-spec`** -- Self-closing tag detection via `isVoidElement`
200
+
201
+ ### Downstream
202
+
203
+ Six parser packages extend the Parser class: html-parser, jsx-parser, vue-parser, svelte-parser, astro-parser, pug-parser
204
+
205
+ ## Documentation Map
206
+
207
+ - [Parser Class Reference](docs/parser-class.ja.md) -- Complete reference for the Parser class
208
+ - [Maintenance Guide](docs/maintenance.ja.md) -- Commands, recipes, and troubleshooting
@@ -0,0 +1,251 @@
1
+ # @markuplint/parser-utils
2
+
3
+ ## Overview
4
+
5
+ `@markuplint/parser-utils` is the shared foundation for all markuplint parsers. It provides the abstract `Parser` class that implements the full parsing pipeline and a set of utility modules for tokenization, error handling, debugging, and AST manipulation. Every markup language parser (HTML, JSX, Vue, Svelte, Astro, Pug) extends this package's `Parser` class to convert language-specific AST nodes into the unified markuplint AST format defined by `@markuplint/ml-ast`.
6
+
7
+ ## Directory Structure
8
+
9
+ ```
10
+ src/
11
+ ├── index.ts — Re-exports all public API
12
+ ├── parser.ts — Abstract class Parser<Node, State> (~1825 lines, core)
13
+ ├── types.ts — ParserOptions, ParseOptions, Token, ChildToken, IgnoreTag, etc.
14
+ ├── enums.ts — TagState, AttrState state machines
15
+ ├── attr-tokenizer.ts — Attribute tokenizer (uses AttrState)
16
+ ├── script-parser.ts — JavaScript parsing via espree
17
+ ├── ignore-block.ts — Template expression masking and restoration
18
+ ├── ignore-front-matter.ts — YAML front matter detection and masking
19
+ ├── detect-element-type.ts — Element type classification (html/web-component/authored)
20
+ ├── idl-attributes.ts — IDL <-> content attribute name mapping (React-compatible)
21
+ ├── debugger.ts — Debug and test utilities (nodeListToDebugMaps, etc.)
22
+ ├── debug.ts — Performance timer and debug logging via `debug` package
23
+ ├── parser-error.ts — ParserError, TargetParserError, ConfigParserError
24
+ ├── sort-nodes.ts — Node position sorting
25
+ ├── const.ts — MASK_CHAR, SVG element list, defaultSpaces
26
+ ├── get-location.ts — Line/column/offset calculation utilities
27
+ └── decision.ts — Custom element name detection
28
+ ```
29
+
30
+ ## Architecture Diagram
31
+
32
+ ```mermaid
33
+ flowchart TD
34
+ subgraph core ["Core"]
35
+ parser["parser.ts\nAbstract Parser Class"]
36
+ types["types.ts\nParserOptions, Token, etc."]
37
+ enums["enums.ts\nTagState, AttrState"]
38
+ end
39
+
40
+ subgraph tokenizer ["Tokenizer Utilities"]
41
+ attrTok["attr-tokenizer.ts\nattrTokenizer()"]
42
+ scriptParser["script-parser.ts\nscriptParser(), safeScriptParser()"]
43
+ end
44
+
45
+ subgraph masking ["Masking & Preprocessing"]
46
+ ignoreBlock["ignore-block.ts\nignoreBlock(), restoreNode()"]
47
+ ignoreFM["ignore-front-matter.ts\nignoreFrontMatter()"]
48
+ end
49
+
50
+ subgraph classification ["Classification"]
51
+ detectType["detect-element-type.ts\ndetectElementType()"]
52
+ decision["decision.ts\nisPotentialCustomElementName()"]
53
+ idlAttrs["idl-attributes.ts\nsearchIDLAttribute()"]
54
+ end
55
+
56
+ subgraph utilities ["Utilities"]
57
+ debugger["debugger.ts\nnodeListToDebugMaps()"]
58
+ debugMod["debug.ts\nPerformanceTimer, domLog()"]
59
+ parserError["parser-error.ts\nParserError hierarchy"]
60
+ sortNodes["sort-nodes.ts\nsortNodes()"]
61
+ getLoc["get-location.ts\ngetPosition(), getEndLine()"]
62
+ constMod["const.ts\nMASK_CHAR, svgElementList"]
63
+ end
64
+
65
+ subgraph external ["External Dependencies"]
66
+ mlAst["@markuplint/ml-ast\n(AST types)"]
67
+ mlSpec["@markuplint/ml-spec\n(void element detection)"]
68
+ mlTypes["@markuplint/types\n(custom element validation)"]
69
+ espree["espree\n(JS tokenization)"]
70
+ cryptoLib["node:crypto\n(node ID generation)"]
71
+ debugLib["debug\n(logging)"]
72
+ end
73
+
74
+ parser --> types
75
+ parser --> enums
76
+ parser --> attrTok
77
+ parser --> ignoreBlock
78
+ parser --> ignoreFM
79
+ parser --> detectType
80
+ parser --> sortNodes
81
+ parser --> getLoc
82
+ parser --> constMod
83
+ parser --> parserError
84
+ parser --> debugMod
85
+ attrTok --> enums
86
+ attrTok --> scriptParser
87
+ detectType --> decision
88
+ ignoreBlock --> constMod
89
+ ignoreBlock --> getLoc
90
+ decision --> mlTypes
91
+ parser --> mlAst
92
+ parser --> mlSpec
93
+ parser --> cryptoLib
94
+ scriptParser --> espree
95
+ debugMod --> debugLib
96
+ ```
97
+
98
+ ## Module Responsibilities
99
+
100
+ | Module | Responsibility | Key Exports |
101
+ | ------------------------ | ------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
102
+ | `parser.ts` | Core parsing pipeline; abstract `Parser` class that all language parsers extend | `Parser` |
103
+ | `types.ts` | Type definitions for parser configuration and tokenization | `ParserOptions`, `ParseOptions`, `Token`, `ChildToken`, `IgnoreTag`, `IgnoreBlock`, `QuoteSet`, `ValueType`, `SelfCloseType`, `Tokenized` |
104
+ | `enums.ts` | State machine enumerations for tag and attribute parsing | `TagState`, `AttrState` |
105
+ | `attr-tokenizer.ts` | Attribute string tokenization using `AttrState` state machine | `attrTokenizer` |
106
+ | `script-parser.ts` | JavaScript parsing for embedded scripts using espree | `scriptParser`, `safeScriptParser` |
107
+ | `ignore-block.ts` | Template expression masking before parsing and restoration after | `ignoreBlock`, `restoreNode` |
108
+ | `ignore-front-matter.ts` | YAML front matter detection and masking | `ignoreFrontMatter` |
109
+ | `detect-element-type.ts` | Element classification into `html`, `web-component`, or `authored` | `detectElementType` |
110
+ | `idl-attributes.ts` | IDL-to-content attribute name mapping (React-compatible) | `searchIDLAttribute` |
111
+ | `debugger.ts` | Test and debug snapshot utilities | `nodeListToDebugMaps`, `attributesToDebugMaps`, `nodeTreeDebugView` |
112
+ | `debug.ts` | Performance timing and debug logging | `PerformanceTimer`, `domLog`, `log` |
113
+ | `parser-error.ts` | Error classes with positional information | `ParserError`, `TargetParserError`, `ConfigParserError` |
114
+ | `sort-nodes.ts` | Node position sorting by offset | `sortNodes` |
115
+ | `const.ts` | Constants used across the package | `MASK_CHAR`, `svgElementList`, `defaultSpaces` |
116
+ | `get-location.ts` | Line/column/offset position calculations | `getPosition`, `getEndLine`, `getEndCol`, `getEndPosition`, `getOffsetsFromCode` |
117
+ | `decision.ts` | Custom element name detection and SVG element lookup | `isPotentialCustomElementName`, `isSVGElement` |
118
+
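The `get-location.ts` row above lists helpers that turn raw source text into line, column, and offset data. The following is a minimal sketch of that arithmetic. It assumes 1-based lines and columns and an end column that points just past the last character. `getPositionSketch` and `getEndPositionSketch` are hypothetical stand-ins, not the package's `getPosition` or `getEndPosition` signatures.

```typescript
// Hypothetical stand-ins for the helpers in get-location.ts; not the package API.
// Assumes 1-based lines/columns and an exclusive end column.

interface PositionSketch {
  line: number;
  col: number;
}

interface TokenStartSketch extends PositionSketch {
  offset: number; // 0-based offset into the source
  raw: string; // the token's source text
}

interface EndPositionSketch {
  endOffset: number;
  endLine: number;
  endCol: number;
}

// Convert a 0-based offset into a 1-based line/column pair.
function getPositionSketch(code: string, offset: number): PositionSketch {
  const lines = code.slice(0, offset).split(/\r?\n/);
  const lastLine = lines[lines.length - 1] ?? '';
  return { line: lines.length, col: lastLine.length + 1 };
}

// Derive a token's end from its start position and raw text.
function getEndPositionSketch(token: TokenStartSketch): EndPositionSketch {
  const lines = token.raw.split(/\r?\n/);
  const lastLine = lines[lines.length - 1] ?? '';
  const endLine = token.line + lines.length - 1;
  // Single-line token: advance from the start column.
  // Multi-line token: the last line restarts at column 1.
  const endCol = lines.length === 1 ? token.col + token.raw.length : lastLine.length + 1;
  return { endOffset: token.offset + token.raw.length, endLine, endCol };
}

const source = '<div>\n  text\n</div>';
const start = getPositionSketch(source, 6); // { line: 2, col: 1 }
const end = getEndPositionSketch({ offset: 6, ...start, raw: '  text\n' });
// end -> { endOffset: 13, endLine: 3, endCol: 1 }
```

This sketch computes the end from `raw` on demand instead of storing it on every token. That matches the 5.0.0-alpha.0 changelog, which describes tokens carrying only `offset`, `line`, and `col`.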
119
+ ## Parse Pipeline Overview
120
+
121
+ The `Parser` class implements the complete parse pipeline. Each language-specific parser extends `Parser` and provides a `tokenize()` method that produces language-specific AST nodes, plus a `nodeize()` method that converts those nodes into markuplint AST nodes. See [Parser Class Reference](docs/parser-class.md) for the full pipeline documentation.
122
+
123
+ ```mermaid
124
+ flowchart TD
125
+ source["Source Code"]
126
+ fm["ignoreFrontMatter()\nMask YAML front matter"]
127
+ ib["ignoreBlock()\nMask template expressions"]
128
+ tokenize["tokenize()\nLanguage-specific tokenization"]
129
+ traverse["traverse()\nWalk language AST"]
130
+ nodeize["nodeize()\nConvert to MLASTNode"]
131
+ flatten["flattenNodes()\nBuild flat node list"]
132
+ walk["walk()\nParent/child linking, depth"]
133
+ restore["restoreNode()\nRestore masked content"]
134
+ doc["MLASTDocument"]
135
+
136
+ source --> fm --> ib --> tokenize --> traverse --> nodeize --> flatten --> walk --> restore --> doc
137
+ ```
138
+
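In the division of labor described above, the base class drives the pipeline and subclasses supply `tokenize()` and `nodeize()`. This is the template-method pattern. The sketch below assumes a drastically simplified node shape and models only the tokenize, nodeize, and flatten order. `SketchParser`, `ToyParser`, and `SketchNode` are hypothetical and do not mirror the real `Parser` signatures.

```typescript
// Hypothetical, heavily simplified stand-in for the abstract Parser.
// Masking, walk(), and restoreNode() are deliberately left out.

interface SketchNode {
  nodeName: string;
  raw: string;
  offset: number;
}

abstract class SketchParser<LangNode> {
  parse(rawCode: string): SketchNode[] {
    const nodes: SketchNode[] = [];
    // traverse: hand every language node to the subclass's nodeize()
    for (const langNode of this.tokenize(rawCode)) {
      nodes.push(...this.nodeize(langNode));
    }
    // flatten: one list ordered by source offset
    return nodes.sort((a, b) => a.offset - b.offset);
  }

  // Language-specific hooks supplied by each subclass.
  protected abstract tokenize(rawCode: string): LangNode[];
  protected abstract nodeize(langNode: LangNode): SketchNode[];
}

// A toy "language" whose AST is just a list of tag and text chunks.
interface ToyChunk {
  raw: string;
  index: number;
}

class ToyParser extends SketchParser<ToyChunk> {
  protected tokenize(rawCode: string): ToyChunk[] {
    const chunks: ToyChunk[] = [];
    const re = /<[^>]*>|[^<]+/g;
    let match: RegExpExecArray | null;
    while ((match = re.exec(rawCode)) !== null) {
      chunks.push({ raw: match[0], index: match.index });
    }
    return chunks;
  }

  protected nodeize(chunk: ToyChunk): SketchNode[] {
    if (chunk.raw.charAt(0) !== '<') {
      return [{ nodeName: '#text', raw: chunk.raw, offset: chunk.index }];
    }
    // "<p>" and "</p>" both map to "p" in this toy model.
    const tag = /^<\/?([^\s/>]+)/.exec(chunk.raw);
    const nodeName = (tag && tag[1]) || '#unknown';
    return [{ nodeName, raw: chunk.raw, offset: chunk.index }];
  }
}

const toyNodes = new ToyParser().parse('<p>hi</p>');
// toyNodes.map((n) => n.nodeName) -> ['p', '#text', 'p']
```

Because the base class fixes the stage order, every subclass gets the same ordering and only describes its own syntax.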
139
+ ## State Machines Overview
140
+
141
+ The parser uses two state machine enumerations to drive character-by-character tokenization:
142
+
143
+ **TagState** controls the tag-level parse loop:
144
+
145
+ `BeforeOpenTag` -> `FirstCharOfTagName` -> `TagName` -> `Attrs` -> `AfterAttrs` -> `AfterOpenTag`
146
+
147
+ **AttrState** controls the attribute-level parse loop:
148
+
149
+ `BeforeName` -> `Name` -> `Equal` -> `BeforeValue` -> `Value` -> `AfterValue`
150
+
151
+ See [Parser Class Reference](docs/parser-class.md) for detailed state transition diagrams.
152
+
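The sketch below shows how a state machine like `AttrState` walks an attribute string one character at a time. It reuses the six state names listed above as a string union and handles a single attribute only. The `AttrPartsSketch` output shape is an assumption, not the real `attrTokenizer()` return type.

```typescript
// Hypothetical single-attribute tokenizer; state names follow AttrState,
// but the transitions and output shape are simplified assumptions.
type AttrStateSketch = 'BeforeName' | 'Name' | 'Equal' | 'BeforeValue' | 'Value' | 'AfterValue';

interface AttrPartsSketch {
  spacesBeforeName: string;
  name: string;
  spacesBeforeEqual: string;
  equal: string;
  spacesAfterEqual: string;
  startQuote: string;
  value: string;
  endQuote: string;
}

function tokenizeAttrSketch(raw: string): AttrPartsSketch {
  const parts: AttrPartsSketch = {
    spacesBeforeName: '', name: '', spacesBeforeEqual: '', equal: '',
    spacesAfterEqual: '', startQuote: '', value: '', endQuote: '',
  };
  const isSpace = (c: string): boolean => /\s/.test(c);
  let state = 'BeforeName' as AttrStateSketch;

  for (let i = 0; i < raw.length; i++) {
    const c = raw.charAt(i);
    switch (state) {
      case 'BeforeName':
        if (isSpace(c)) { parts.spacesBeforeName += c; break; }
        parts.name += c;
        state = 'Name';
        break;
      case 'Name':
        if (c === '=') { parts.equal = c; state = 'BeforeValue'; break; }
        if (isSpace(c)) { parts.spacesBeforeEqual += c; state = 'Equal'; break; }
        parts.name += c;
        break;
      case 'Equal':
        if (c === '=') { parts.equal = c; state = 'BeforeValue'; break; }
        if (isSpace(c)) { parts.spacesBeforeEqual += c; break; }
        // Anything else starts the next attribute, which this sketch ignores.
        state = 'AfterValue';
        break;
      case 'BeforeValue':
        if (isSpace(c)) { parts.spacesAfterEqual += c; break; }
        if (c === '"' || c === "'") { parts.startQuote = c; } else { parts.value += c; }
        state = 'Value';
        break;
      case 'Value': {
        // Quoted values end at the matching quote, unquoted ones at whitespace.
        const ends = parts.startQuote !== '' ? c === parts.startQuote : isSpace(c);
        if (!ends) { parts.value += c; break; }
        if (parts.startQuote !== '') parts.endQuote = c;
        state = 'AfterValue';
        break;
      }
      case 'AfterValue':
        // Trailing characters belong to the next attribute; ignored here.
        break;
    }
  }
  return parts;
}

const classAttr = tokenizeAttrSketch(' class="a b"');
// classAttr.name === 'class', classAttr.startQuote === '"', classAttr.value === 'a b'
```

A full tokenizer must also handle several attributes in one string and custom quote sets. This sketch stops after the first attribute to keep the transitions readable.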
153
+ ## Error Handling
154
+
155
+ The package provides an error class hierarchy of three classes, a base `ParserError` and two subclasses:
156
+
157
+ | Class | Extends | Additional Fields | Purpose |
158
+ | ------------------- | ------------- | -------------------- | ---------------------------------------------------------------------------- |
159
+ | `ParserError` | `Error` | `line`, `col`, `raw` | Base parser error with source position |
160
+ | `TargetParserError` | `ParserError` | `nodeName` | Error tied to a specific element, includes the element name in the message |
161
+ | `ConfigParserError` | `ParserError` | `filePath` | Error from configuration file parsing, includes the file path in the message |
162
+
163
+ All error classes automatically format their messages with positional information (e.g., `(line:col)`).
164
+
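The following is a minimal sketch of such a hierarchy, assuming the `(line:col)` message suffix described above. The classes carry a `Sketch` suffix because they are hypothetical stand-ins. The message prefixes the subclasses add (`<nodeName>:` and `filePath:`) are invented for illustration.

```typescript
// Hypothetical stand-ins for ParserError, TargetParserError, and ConfigParserError.
interface SourcePosition {
  line: number;
  col: number;
  raw: string;
}

class ParserErrorSketch extends Error {
  readonly line: number;
  readonly col: number;
  readonly raw: string;

  constructor(message: string, pos: SourcePosition) {
    // Append the source position, e.g. "Unexpected token (3:7)".
    super(`${message} (${pos.line}:${pos.col})`);
    // Keep `instanceof` working when compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
    this.line = pos.line;
    this.col = pos.col;
    this.raw = pos.raw;
  }
}

class TargetParserErrorSketch extends ParserErrorSketch {
  readonly nodeName: string;

  constructor(message: string, pos: SourcePosition & { nodeName: string }) {
    // Illustrative prefix naming the offending element.
    super(`<${pos.nodeName}>: ${message}`, pos);
    this.nodeName = pos.nodeName;
  }
}

class ConfigParserErrorSketch extends ParserErrorSketch {
  readonly filePath: string;

  constructor(message: string, pos: SourcePosition & { filePath: string }) {
    // Illustrative prefix naming the configuration file.
    super(`${pos.filePath}: ${message}`, pos);
    this.filePath = pos.filePath;
  }
}

const unclosed = new TargetParserErrorSketch('Unclosed element', { line: 3, col: 7, raw: '<div', nodeName: 'div' });
// unclosed.message === '<div>: Unclosed element (3:7)'
// unclosed instanceof ParserErrorSketch === true
```

Callers can catch the base class and read `line`, `col`, and `raw` regardless of which subclass was thrown.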
165
+ ## Debug Utilities
166
+
167
+ The package provides three debug functions for testing and visualization:
168
+
169
+ - **`nodeListToDebugMaps`** -- Converts a flat AST node list into human-readable debug strings showing each node's position, type, and raw content. The primary tool for snapshot testing in parser tests.
170
+ - **`attributesToDebugMaps`** -- Converts attributes into detailed debug strings showing all attribute components (name, equal sign, value, quotes) with positional information and metadata flags (`isDirective`, `isDynamicValue`).
171
+ - **`nodeTreeDebugView`** -- Produces an indented tree view of the AST showing depth, parent-child relationships, pair node links, and ghost/bogus markers. Useful for visual inspection of parse results.
172
+
173
+ Additionally, `debug.ts` provides `PerformanceTimer` for measuring parse phase durations and `domLog` for structured logging via the `debug` package (enabled with `DEBUG=ml-parser`).
174
+
175
+ ## IDL Attribute Mapping
176
+
177
+ `searchIDLAttribute` maps between React-style IDL attribute names and HTML content attribute names. It is called by `@markuplint/ml-core`'s `MLAttr` constructor when the spec sets `useIDLAttributeNames: true`. It maintains a comprehensive mapping table derived from React's `possibleStandardNames.js`, covering:
178
+
179
+ - HTML attributes (e.g., `className` -> `class`, `htmlFor` -> `for`, `tabIndex` -> `tabindex`)
180
+ - SVG attributes (e.g., `strokeWidth` -> `stroke-width`, `clipPath` -> `clip-path`)
181
+ - Event handler attributes (e.g., `onClick` -> `onclick`)
182
+
183
+ The lookup handles camelCase IDL names, lowercase content attribute names, and hyphenated variants.
184
+
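The following is a minimal sketch of a lookup that ignores case and hyphens, using only the example pairs listed above. The table, the name `searchIDLAttributeSketch`, and the `{ idlPropName, contentAttrName }` result shape are assumptions for illustration. The real table is far larger.

```typescript
// Hypothetical excerpt of an IDL -> content attribute table (pairs from the list above).
const idlToContentSketch: ReadonlyArray<readonly [string, string]> = [
  ['className', 'class'],
  ['htmlFor', 'for'],
  ['tabIndex', 'tabindex'],
  ['strokeWidth', 'stroke-width'],
  ['clipPath', 'clip-path'],
  ['onClick', 'onclick'],
];

interface IDLMatchSketch {
  idlPropName: string;
  contentAttrName: string;
}

// Lowercase and drop hyphens so "strokeWidth" and "stroke-width" compare equal.
const normalizeAttrName = (name: string): string => name.toLowerCase().replace(/-/g, '');

function searchIDLAttributeSketch(name: string): IDLMatchSketch | undefined {
  const key = normalizeAttrName(name);
  for (const [idlPropName, contentAttrName] of idlToContentSketch) {
    // Match in either direction: IDL name or content attribute name.
    if (normalizeAttrName(idlPropName) === key || normalizeAttrName(contentAttrName) === key) {
      return { idlPropName, contentAttrName };
    }
  }
  return undefined;
}

const classMatch = searchIDLAttributeSketch('class');
// classMatch -> { idlPropName: 'className', contentAttrName: 'class' }
```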
185
+ ## External Dependencies
186
+
187
+ | Dependency | Purpose |
188
+ | --------------------- | -------------------------------------------------------------------------- |
189
+ | `@markuplint/ml-ast` | AST type definitions (`MLASTDocument`, `MLASTElement`, `MLASTToken`, etc.) |
190
+ | `@markuplint/ml-spec` | Void element detection (`isVoidElement`) for self-closing tag handling |
191
+ | `@markuplint/types` | Custom element name validation (`isCustomElementName`) |
192
+ | `node:crypto` | AST node UUID generation via `crypto.randomUUID()` |
193
+ | `debug` | Performance timing and structured logging |
194
+ | `espree` | JavaScript tokenization and parsing for embedded script content |
195
+ | `type-fest` | TypeScript utility types |
196
+
197
+ ## Integration Points
198
+
199
+ ```mermaid
200
+ flowchart TD
201
+ subgraph upstream ["Upstream"]
202
+ mlAst["@markuplint/ml-ast\n(AST type definitions)"]
203
+ mlSpec["@markuplint/ml-spec\n(void element detection)"]
204
+ mlTypes["@markuplint/types\n(custom element validation)"]
205
+ end
206
+
207
+ subgraph pkg ["@markuplint/parser-utils"]
208
+ parser["Abstract Parser Class\n+ Utility Modules"]
209
+ end
210
+
211
+ subgraph downstream ["Downstream Parsers"]
212
+ htmlParser["@markuplint/html-parser"]
213
+ jsxParser["@markuplint/jsx-parser"]
214
+ vueParser["@markuplint/vue-parser"]
215
+ svelteParser["@markuplint/svelte-parser"]
216
+ astroParser["@markuplint/astro-parser"]
217
+ pugParser["@markuplint/pug-parser"]
218
+ end
219
+
220
+ subgraph indirect ["Indirect Downstream"]
221
+ mlCore["@markuplint/ml-core\n(MLASTDocument -> MLDOM)"]
222
+ end
223
+
224
+ upstream -->|"types, specs"| parser
225
+ parser -->|"extends Parser"| downstream
226
+ downstream -->|"produces MLASTDocument"| mlCore
227
+ ```
228
+
229
+ ### Upstream
230
+
231
+ - **`@markuplint/ml-ast`** -- All AST type definitions (`MLASTDocument`, `MLASTElement`, `MLASTAttr`, `MLASTToken`, etc.) that the Parser class produces.
232
+ - **`@markuplint/ml-spec`** -- `isVoidElement` function used to determine self-closing behavior for HTML void elements.
233
+ - **`@markuplint/types`** -- `isCustomElementName` function used by `decision.ts` to validate custom element names per the HTML spec.
234
+
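The following is a minimal sketch of a name check modeled on the `PotentialCustomElementName` grammar from the HTML specification, plus the specification's list of reserved hyphenated names. `isCustomElementNameSketch` is hypothetical and is not the `@markuplint/types` implementation. It also omits the supplementary-plane range (U+10000 to U+EFFFF) so that the regular expression works without the `u` flag.

```typescript
// Hypothetical, simplified check based on the PotentialCustomElementName grammar:
//   [a-z] (PCENChar)* "-" (PCENChar)*
// The supplementary range U+10000-U+EFFFF is omitted in this sketch.
const potentialCustomElementNameSketch =
  /^[a-z][-._0-9a-z\u00B7\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u037D\u037F-\u1FFF\u200C\u200D\u203F\u2040\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]*$/;

// Hyphenated names that the specification reserves.
const reservedNamesSketch = [
  'annotation-xml', 'color-profile', 'font-face', 'font-face-src',
  'font-face-uri', 'font-face-format', 'font-face-name', 'missing-glyph',
];

function isCustomElementNameSketch(name: string): boolean {
  return (
    potentialCustomElementNameSketch.test(name) &&
    name.indexOf('-') !== -1 && // at least one hyphen is required
    reservedNamesSketch.indexOf(name) === -1
  );
}

const looksCustom = isCustomElementNameSketch('my-element'); // true
```

Uppercase ASCII letters are not in the allowed character set, so `My-element` is rejected. A leading hyphen fails the `[a-z]` first-character rule.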
235
+ ### Downstream
236
+
237
+ Six parser packages extend the abstract `Parser` class:
238
+
239
+ - **`@markuplint/html-parser`** -- Standard HTML parsing
240
+ - **`@markuplint/jsx-parser`** -- JSX/TSX parsing (extends html-parser)
241
+ - **`@markuplint/vue-parser`** -- Vue Single File Component parsing
242
+ - **`@markuplint/svelte-parser`** -- Svelte component parsing
243
+ - **`@markuplint/astro-parser`** -- Astro component parsing
244
+ - **`@markuplint/pug-parser`** -- Pug template parsing
245
+
246
+ Each downstream parser overrides the `tokenize()` and `nodeize()` methods to convert its language-specific AST into the unified markuplint AST format.
247
+
248
+ ## Documentation Map
249
+
250
+ - [Parser Class Reference](docs/parser-class.md) -- Detailed documentation of the Parser class, its methods, and parse pipeline
251
+ - [Maintenance Guide](docs/maintenance.md) -- Commands, recipes, and troubleshooting
package/CHANGELOG.md CHANGED
@@ -3,13 +3,44 @@
3
3
  All notable changes to this project will be documented in this file.
4
4
  See [Conventional Commits](https://conventionalcommits.org) for commit guidelines.
5
5
 
6
- ## [4.8.10](https://github.com/markuplint/markuplint/compare/@markuplint/parser-utils@4.8.9...@markuplint/parser-utils@4.8.10) (2025-11-05)
6
+ # [5.0.0-alpha.0](https://github.com/markuplint/markuplint/compare/v4.14.1...v5.0.0-alpha.0) (2026-02-20)
7
7
 
8
- **Note:** Version bump only for package @markuplint/parser-utils
8
+ ### Bug Fixes
9
+
10
+ - **ml-core:** improve detection of namespace ([5b507ad](https://github.com/markuplint/markuplint/commit/5b507ad7c19c5015b8ce587845d901e31dfa6518))
11
+ - resolve additional eslint-plugin-unicorn v63 errors ([e58a72c](https://github.com/markuplint/markuplint/commit/e58a72c17c97bbec522f9513b99777fac6904d64))
12
+ - use explicit `export type` for type-only re-exports ([7c77c05](https://github.com/markuplint/markuplint/commit/7c77c05619518c8d18a183132040f5b2cd0ab6ec))
13
+
14
+ - feat(parser-utils)!: adapt to simplified MLASTToken properties ([5cbbc9c](https://github.com/markuplint/markuplint/commit/5cbbc9ca8f77a71d99bffa14b193c79b26c1c415))
15
+
16
+ ### BREAKING CHANGES
17
+
18
+ - Update Token type and parser internals for
19
+ simplified AST token properties.
20
+
21
+ Token type property renames:
9
22
 
23
+ - startOffset -> offset
24
+ - startLine -> line
25
+ - startCol -> col
10
26
 
27
+ Parser changes:
11
28
 
29
+ - createToken() no longer produces endOffset/endLine/endCol
30
+ - visitPsBlock() parameter: conditionalType -> blockBehavior
31
+ - visitElement() accepts blockBehavior option
32
+ - Remove selfClosingSolidus token generation
33
+ - Add getEndPosition() helper to get-location.ts
12
34
 
35
+ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
36
+
37
+ ## [4.8.11](https://github.com/markuplint/markuplint/compare/@markuplint/parser-utils@4.8.10...@markuplint/parser-utils@4.8.11) (2026-02-10)
38
+
39
+ **Note:** Version bump only for package @markuplint/parser-utils
40
+
41
+ ## [4.8.10](https://github.com/markuplint/markuplint/compare/@markuplint/parser-utils@4.8.9...@markuplint/parser-utils@4.8.10) (2025-11-05)
42
+
43
+ **Note:** Version bump only for package @markuplint/parser-utils
13
44
 
14
45
  ## [4.8.9](https://github.com/markuplint/markuplint/compare/@markuplint/parser-utils@4.8.8...@markuplint/parser-utils@4.8.9) (2025-08-24)
15
46
 
package/README.md CHANGED
@@ -16,3 +16,9 @@ $ yarn add @markuplint/parser-utils
16
16
  ```
17
17
 
18
18
  </details>
19
+
20
+ ## Documentation
21
+
22
+ - [Architecture](ARCHITECTURE.md) ([日本語](ARCHITECTURE.ja.md)) — Package overview, module relationships, and integration points
23
+ - [Parser Class Reference](docs/parser-class.md) ([日本語](docs/parser-class.ja.md)) — Complete reference for the abstract `Parser` class
24
+ - [Maintenance Guide](docs/maintenance.md) ([日本語](docs/maintenance.ja.md)) — Commands, recipes, and troubleshooting
package/SKILL.md ADDED
@@ -0,0 +1,126 @@
1
+ ---
2
+ description: Perform maintenance tasks for @markuplint/parser-utils
3
+ ---
4
+
5
+ # parser-utils-maintenance
6
+
7
+ Perform maintenance tasks for `@markuplint/parser-utils`: create new parsers,
8
+ add ignore tag patterns, add IDL attribute mappings, and customize attribute parsing.
9
+
10
+ ## Input
11
+
12
+ `$ARGUMENTS` specifies the task. Supported tasks:
13
+
14
+ | Task | Description |
15
+ | -------------------------- | ------------------------------------------ |
16
+ | `create-parser` | Create a new parser extending Parser class |
17
+ | `add-ignore-tag <type>` | Add an IgnoreTag pattern |
18
+ | `add-idl-attribute <name>` | Add an IDL attribute mapping |
19
+ | `customize-attr-parsing` | Customize attribute parsing behavior |
20
+
21
+ If omitted, defaults to `create-parser`.
22
+
23
+ ## Reference
24
+
25
+ Before executing any task, read `docs/maintenance.md` (or `docs/maintenance.ja.md`)
26
+ for the full guide. The recipes there are the source of truth for procedures.
27
+
28
+ Also read:
29
+
30
+ - `docs/parser-class.md` -- Complete Parser class reference with override patterns
31
+ - `ARCHITECTURE.md` -- Package overview, module relationships, and integration points
32
+
33
+ ## Task: create-parser
34
+
35
+ Create a new parser extending the abstract Parser class. Follow recipe #1 in `docs/maintenance.md`.
36
+
37
+ ### Step 1: Set up the package
38
+
39
+ 1. Create a new package under `packages/@markuplint/`
40
+ 2. Add `@markuplint/parser-utils` as a dependency
41
+ 3. Create the main parser file extending `Parser<YourNode, YourState>`
42
+
43
+ ### Step 2: Implement required methods
44
+
45
+ 1. Read `docs/parser-class.md` for the full override pattern reference
46
+ 2. Implement `tokenize()` -- invoke the language-specific tokenizer on `this.rawCode`
47
+ 3. Implement `nodeize()` -- convert each AST node using visitor methods
48
+ 4. Set constructor options (`endTagType`, `tagNameCaseSensitive`, `ignoreTags`, etc.)
49
+
50
+ ### Step 3: Export the parser module
51
+
52
+ 1. Export as `MLParserModule`: `export default { parser: new MyParser() }`
53
+ 2. Build: `yarn build --scope @markuplint/<package-name>`
54
+ 3. Test with `nodeListToDebugMaps` snapshot assertions
55
+
56
+ ## Task: add-ignore-tag
57
+
58
+ Add an IgnoreTag pattern for masking template expressions. Follow recipe #3 in `docs/maintenance.md`.
59
+
60
+ ### Step 1: Define the pattern
61
+
62
+ 1. Identify the start and end delimiters of the template expression
63
+ 2. Choose a `type` name (becomes the `#ps:` node name prefix)
64
+ 3. Start and end can be strings or RegExp patterns
65
+
66
+ ### Step 2: Add to constructor
67
+
68
+ 1. Add the `IgnoreTag` entry to the `ignoreTags` array in the parser constructor
69
+ 2. Consider whether a custom `maskChar` is needed (default is `\uE000`)
70
+
71
+ ### Step 3: Verify
72
+
73
+ 1. Build: `yarn build --scope @markuplint/<package-name>`
74
+ 2. Test that the template expression is correctly masked and restored
75
+ 3. Verify the restored node has the expected `#ps:<type>` name
76
+
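The sketch below can help check the mask-and-restore round trip described in these steps. It makes several assumptions:

- It accepts string delimiters only, whereas the real `IgnoreTag` also accepts RegExp patterns.
- It uses the `\uE000` default mask character named above.
- It leaves line breaks unmasked so line and column numbers stay valid. This is a choice of the sketch, not necessarily the package's behavior.

All names are hypothetical.

```typescript
// Hypothetical stand-ins; not the ignoreBlock()/restoreNode() API.
const MASK_CHAR_SKETCH = '\uE000';

interface IgnoreTagSketch {
  type: string; // e.g. "mustache" -> node name "#ps:mustache"
  start: string;
  end: string;
}

interface MaskedBlockSketch {
  nodeName: string;
  offset: number;
  raw: string;
}

function maskSketch(code: string, tag: IgnoreTagSketch): { masked: string; blocks: MaskedBlockSketch[] } {
  const blocks: MaskedBlockSketch[] = [];
  let masked = '';
  let cursor = 0;
  while (cursor < code.length) {
    const open = code.indexOf(tag.start, cursor);
    if (open === -1) break;
    const close = code.indexOf(tag.end, open + tag.start.length);
    if (close === -1) break; // unterminated expression: leave it untouched
    const endIndex = close + tag.end.length;
    const raw = code.slice(open, endIndex);
    // Same length as the original, so every later offset is unchanged.
    masked += code.slice(cursor, open) + raw.replace(/[^\r\n]/g, MASK_CHAR_SKETCH);
    blocks.push({ nodeName: `#ps:${tag.type}`, offset: open, raw });
    cursor = endIndex;
  }
  return { masked: masked + code.slice(cursor), blocks };
}

function restoreSketch(masked: string, blocks: readonly MaskedBlockSketch[]): string {
  let restored = masked;
  for (const block of blocks) {
    restored = restored.slice(0, block.offset) + block.raw + restored.slice(block.offset + block.raw.length);
  }
  return restored;
}

const mustacheTag: IgnoreTagSketch = { type: 'mustache', start: '{{', end: '}}' };
const maskedDemo = maskSketch('<p>{{ name }}</p>', mustacheTag);
// maskedDemo.blocks[0] -> { nodeName: '#ps:mustache', offset: 3, raw: '{{ name }}' }
```

Because masking preserves length, the markup parser sees the same offsets before and after masking. That is what lets the restored node keep its original position.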
77
+ ## Task: add-idl-attribute
78
+
79
+ Add an IDL attribute mapping. Follow recipe #5 in `docs/maintenance.md`.
80
+
81
+ ### Step 1: Add the mapping
82
+
83
+ 1. Read `src/idl-attributes.ts` and find the `idlContentMap` object
84
+ 2. Add a new entry: `idlPropName: 'content-attr-name'`
85
+ 3. Follow naming conventions: key is camelCase IDL name, value is lowercase content name
86
+
87
+ ### Step 2: Verify
88
+
89
+ 1. Build: `yarn build --scope @markuplint/parser-utils`
90
+ 2. Test with `searchIDLAttribute()` to confirm the mapping resolves correctly
91
+
92
+ ## Task: customize-attr-parsing
93
+
94
+ Customize attribute parsing behavior for a specific parser. Follow recipe #4 in `docs/maintenance.md`.
95
+
96
+ ### Step 1: Override visitAttr
97
+
98
+ 1. Read `docs/parser-class.md` for the `visitAttr()` documentation
99
+ 2. Override `visitAttr()` in your parser subclass
100
+ 3. Call `super.visitAttr(token, options)` with custom options:
101
+ - `quoteSet` -- custom quote delimiters (e.g., `{` `}` for JSX)
102
+ - `startState` -- initial AttrState (usually `BeforeName`)
103
+ - `noQuoteValueType` -- value type for unquoted values
104
+
105
+ ### Step 2: Post-process
106
+
107
+ 1. After calling `super`, use `this.updateAttr()` to set metadata:
108
+ - `isDirective` for framework directives
109
+ - `isDynamicValue` for dynamic bindings
110
+ - `potentialName` for directive-to-attribute name resolution
111
+ 2. Use `searchIDLAttribute()` to resolve IDL property names
112
+
113
+ ### Step 3: Verify
114
+
115
+ 1. Build the parser package
116
+ 2. Test attribute parsing with `attributesToDebugMaps` for snapshot assertions
117
+ 3. Verify all framework-specific directive patterns are recognized
118
+
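The override pattern from these steps can be sketched with a stand-in base class. `BaseParserSketch`, its naive `visitAttr()`, and its `updateAttr()` are hypothetical simplifications; the real methods take different parameters. The Vue-style `:`, `@`, and `v-` prefixes serve only as an example of framework directives.

```typescript
// Hypothetical stand-ins; the real Parser.visitAttr()/updateAttr() differ.
interface AttrSketch {
  name: string;
  value: string;
  isDirective?: boolean;
  isDynamicValue?: boolean;
  potentialName?: string;
}

class BaseParserSketch {
  // Naive decomposition of `name="value"`; stands in for the base visitAttr().
  visitAttr(raw: string): AttrSketch {
    const eq = raw.indexOf('=');
    if (eq === -1) return { name: raw.trim(), value: '' };
    const name = raw.slice(0, eq).trim();
    // Strip one pair of matching quotes around the value.
    const value = raw.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, '$2');
    return { name, value };
  }

  // Returns a copy of the attribute with metadata merged in.
  protected updateAttr(attr: AttrSketch, patch: Partial<AttrSketch>): AttrSketch {
    return { ...attr, ...patch };
  }
}

class VueLikeParserSketch extends BaseParserSketch {
  override visitAttr(raw: string): AttrSketch {
    // Let the base class decompose the token first (see the Rules section).
    const attr = super.visitAttr(raw);
    // `:href` binds `href` to an expression, so it resolves to a potential name.
    if (attr.name.charAt(0) === ':') {
      return this.updateAttr(attr, { isDynamicValue: true, potentialName: attr.name.slice(1) });
    }
    // Event shorthands and other `v-` attributes are treated as directives.
    if (attr.name.charAt(0) === '@' || attr.name.indexOf('v-') === 0) {
      return this.updateAttr(attr, { isDirective: true });
    }
    return attr;
  }
}

const vueLike = new VueLikeParserSketch();
const boundHref = vueLike.visitAttr(':href="url"');
// boundHref -> { name: ':href', value: 'url', isDynamicValue: true, potentialName: 'href' }
```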
119
+ ## Rules
120
+
121
+ 1. **Always call `super.visitAttr()`** when overriding. It handles token decomposition.
122
+ 2. **Always call `super.detectElementType()`** when overriding. Pass framework-specific patterns as the `defaultPattern` argument.
123
+ 3. **Never call `super.tokenize()` or `super.nodeize()`** -- the defaults return empty arrays.
124
+ 4. **Always call `super.beforeParse()` and `super.afterParse()`** -- they handle offset spaces.
125
+ 5. **Test across all downstream parsers** when modifying the Parser class.
126
+ 6. **Add JSDoc comments** to all new public methods and properties.