@markuplint/parser-utils 4.8.10 → 5.0.0-alpha.0
- package/ARCHITECTURE.ja.md +208 -0
- package/ARCHITECTURE.md +251 -0
- package/CHANGELOG.md +33 -2
- package/README.md +6 -0
- package/SKILL.md +126 -0
- package/docs/maintenance.ja.md +176 -0
- package/docs/maintenance.md +176 -0
- package/docs/parser-class.ja.md +655 -0
- package/docs/parser-class.md +655 -0
- package/lib/debug.js +8 -24
- package/lib/debugger.d.ts +25 -0
- package/lib/debugger.js +34 -4
- package/lib/enums.d.ts +10 -0
- package/lib/enums.js +10 -0
- package/lib/get-location.d.ts +31 -0
- package/lib/get-location.js +33 -0
- package/lib/get-namespace.d.ts +11 -0
- package/lib/get-namespace.js +38 -0
- package/lib/idl-attributes.d.ts +9 -0
- package/lib/idl-attributes.js +9 -0
- package/lib/ignore-block.js +15 -14
- package/lib/index.d.ts +2 -1
- package/lib/index.js +1 -1
- package/lib/parser-error.d.ts +16 -0
- package/lib/parser-error.js +20 -3
- package/lib/parser.d.ts +285 -7
- package/lib/parser.js +763 -551
- package/lib/script-parser.d.ts +21 -0
- package/lib/script-parser.js +17 -0
- package/lib/sort-nodes.d.ts +8 -0
- package/lib/sort-nodes.js +11 -3
- package/lib/types.d.ts +60 -3
- package/package.json +11 -10
package/ARCHITECTURE.ja.md
ADDED
|
@@ -0,0 +1,208 @@
|
|
|
1
|
+
# @markuplint/parser-utils
|
|
2
|
+
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
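`@markuplint/parser-utils` is the shared foundation package for all markuplint parsers. It provides the abstract `Parser` class, which implements the complete parse pipeline, and a set of utility modules for tokenization, error handling, debugging, and AST manipulation. Every markup language parser (HTML, JSX, Vue, Svelte, Astro, Pug) extends this package's `Parser` class to convert language-specific AST nodes into the unified markuplint AST format defined by `@markuplint/ml-ast`.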
|
|
6
|
+
|
|
7
|
+
## Directory Structure
|
|
8
|
+
|
|
9
|
+
```
|
|
10
|
+
src/
|
|
11
|
+
├── index.ts — Re-exports the entire public API
|
|
12
|
+
├── parser.ts — abstract class Parser<Node, State> (about 1825 lines, core)
|
|
13
|
+
├── types.ts — ParserOptions, ParseOptions, Token, ChildToken, IgnoreTag, etc.
|
|
14
|
+
├── enums.ts — TagState, AttrState state machines
|
|
15
|
+
├── attr-tokenizer.ts — Attribute tokenizer (uses AttrState)
|
|
16
|
+
├── script-parser.ts — JavaScript parsing via espree
|
|
17
|
+
├── ignore-block.ts — Template expression masking and restoration
|
|
18
|
+
├── ignore-front-matter.ts — YAML front matter detection and masking
|
|
19
|
+
├── detect-element-type.ts — Element type detection (html/web-component/authored)
|
|
20
|
+
├── idl-attributes.ts — IDL ↔ content attribute name mapping (React-compatible)
|
|
21
|
+
├── debugger.ts — Debug and test utilities
|
|
22
|
+
├── parser-error.ts — ParserError, TargetParserError, ConfigParserError
|
|
23
|
+
├── sort-nodes.ts — Node position sorting
|
|
24
|
+
├── const.ts — MASK_CHAR, SVG element list, defaultSpaces
|
|
25
|
+
├── get-location.ts — Line/column/offset calculation utilities
|
|
26
|
+
└── decision.ts — Custom element name detection
|
|
27
|
+
```
|
|
28
|
+
|
|
29
|
+
## Architecture Diagram
|
|
30
|
+
|
|
31
|
+
```mermaid
|
|
32
|
+
flowchart TD
|
|
33
|
+
subgraph core ["Core Modules"]
|
|
34
|
+
parser["parser.ts\nabstract Parser"]
|
|
35
|
+
end
|
|
36
|
+
|
|
37
|
+
subgraph utils ["Utility Modules"]
|
|
38
|
+
attrTokenizer["attr-tokenizer.ts\nAttribute tokenizer"]
|
|
39
|
+
ignoreBlock["ignore-block.ts\nMask and restore"]
|
|
40
|
+
ignoreFM["ignore-front-matter.ts\nFront matter"]
|
|
41
|
+
detectType["detect-element-type.ts\nElement type detection"]
|
|
42
|
+
scriptParser["script-parser.ts\nJS parsing"]
|
|
43
|
+
idlAttrs["idl-attributes.ts\nIDL mapping"]
|
|
44
|
+
end
|
|
45
|
+
|
|
46
|
+
subgraph support ["Support Modules"]
|
|
47
|
+
enums["enums.ts\nTagState, AttrState"]
|
|
48
|
+
errors["parser-error.ts\nError classes"]
|
|
49
|
+
debugger["debugger.ts\nDebugging"]
|
|
50
|
+
location["get-location.ts\nPosition calculation"]
|
|
51
|
+
sort["sort-nodes.ts\nSorting"]
|
|
52
|
+
consts["const.ts\nConstants"]
|
|
53
|
+
decision["decision.ts\nCustom element names"]
|
|
54
|
+
end
|
|
55
|
+
|
|
56
|
+
subgraph deps ["External Dependencies"]
|
|
57
|
+
mlAst["@markuplint/ml-ast\nType definitions"]
|
|
58
|
+
mlSpec["@markuplint/ml-spec\nVoid elements"]
|
|
59
|
+
espree["espree\nJS tokenizer"]
|
|
60
|
+
cryptoLib["node:crypto\nNode IDs"]
|
|
61
|
+
end
|
|
62
|
+
|
|
63
|
+
parser --> attrTokenizer
|
|
64
|
+
parser --> ignoreBlock
|
|
65
|
+
parser --> ignoreFM
|
|
66
|
+
parser --> detectType
|
|
67
|
+
parser --> enums
|
|
68
|
+
parser --> errors
|
|
69
|
+
parser --> location
|
|
70
|
+
parser --> sort
|
|
71
|
+
parser --> consts
|
|
72
|
+
|
|
73
|
+
attrTokenizer --> enums
|
|
74
|
+
attrTokenizer --> scriptParser
|
|
75
|
+
scriptParser --> espree
|
|
76
|
+
detectType --> decision
|
|
77
|
+
ignoreBlock --> location
|
|
78
|
+
ignoreBlock --> errors
|
|
79
|
+
|
|
80
|
+
parser --> mlAst
|
|
81
|
+
parser --> mlSpec
|
|
82
|
+
parser --> cryptoLib
|
|
83
|
+
decision --> mlAst
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
## Module Responsibilities
|
|
87
|
+
|
|
88
|
+
| Module | Responsibility | Key Exports |
|
|
89
|
+
| ------------------------ | ------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------- |
|
|
90
|
+
| `parser.ts` | Core parse pipeline, abstract Parser class | `Parser` |
|
|
91
|
+
| `types.ts` | Type definitions | `ParserOptions`, `ParseOptions`, `Token`, `ChildToken`, `IgnoreTag`, `IgnoreBlock`, `QuoteSet`, `ValueType`, `SelfCloseType`, `Tokenized` |
|
|
92
|
+
| `enums.ts` | State machine enumerations | `TagState`, `AttrState` |
|
|
93
|
+
| `attr-tokenizer.ts` | Attribute string tokenization | `attrTokenizer` |
|
|
94
|
+
| `script-parser.ts` | JavaScript parsing of embedded scripts | `scriptParser`, `safeScriptParser` |
|
|
95
|
+
| `ignore-block.ts` | Template expression masking and restoration | `ignoreBlock`, `restoreNode` |
|
|
96
|
+
| `ignore-front-matter.ts` | YAML front matter handling | `ignoreFrontMatter` |
|
|
97
|
+
| `detect-element-type.ts` | Element classification | `detectElementType` |
|
|
98
|
+
| `idl-attributes.ts` | IDL ↔ content attribute name mapping | `searchIDLAttribute` |
|
|
99
|
+
| `debugger.ts` | Test and debug utilities | `nodeListToDebugMaps`, `attributesToDebugMaps`, `nodeTreeDebugView` |
|
|
100
|
+
| `parser-error.ts` | Error classes | `ParserError`, `TargetParserError`, `ConfigParserError` |
|
|
101
|
+
| `sort-nodes.ts` | Node position sorting | `sortNodes` |
|
|
102
|
+
| `const.ts` | Constants | `MASK_CHAR`, `svgElementList`, `defaultSpaces` |
|
|
103
|
+
| `get-location.ts` | Position calculation | `getPosition`, `getEndLine`, `getEndCol`, `getEndPosition`, `getOffsetsFromCode` |
|
|
104
|
+
| `decision.ts` | Custom element name detection | `isPotentialCustomElementName`, `isSVGElement` |
|
|
105
|
+
|
|
106
|
+
## Parse Pipeline Overview
|
|
107
|
+
|
|
108
|
+
The `parse()` method of the Parser class runs an 11-step pipeline:
|
|
109
|
+
|
|
110
|
+
```mermaid
|
|
111
|
+
flowchart LR
|
|
112
|
+
A["beforeParse"] --> B["frontMatter"]
|
|
113
|
+
B --> C["ignoreBlock"]
|
|
114
|
+
C --> D["tokenize"]
|
|
115
|
+
D --> E["traverse"]
|
|
116
|
+
E --> F["afterTraverse"]
|
|
117
|
+
F --> G["flatten"]
|
|
118
|
+
G --> H["afterFlatten"]
|
|
119
|
+
H --> I["restore"]
|
|
120
|
+
I --> J["afterParse"]
|
|
121
|
+
```
|
|
122
|
+
|
|
123
|
+
See the [Parser Class Reference](docs/parser-class.ja.md) for details.
|
|
124
|
+
|
|
125
|
+
## State Machines Overview
|
|
126
|
+
|
|
127
|
+
The package has two state machines:
|
|
128
|
+
|
|
129
|
+
- **TagState** -- Used for tag-level parsing (detect `<` → tag name → attributes → detect `>`)
|
|
130
|
+
- **AttrState** -- Used for attribute-level parsing (name → `=` → value)
|
|
131
|
+
|
|
132
|
+
See the [Parser Class Reference](docs/parser-class.ja.md#ステートマシン) for detailed state transition diagrams.
|
|
133
|
+
|
|
134
|
+
## Error Handling
|
|
135
|
+
|
|
136
|
+
```
|
|
137
|
+
ParserError (base class)
|
|
138
|
+
├── line, col, raw — Source position information
|
|
139
|
+
├── TargetParserError — Error related to a specific element (includes nodeName)
|
|
140
|
+
└── ConfigParserError — Configuration file error (includes filePath)
|
|
141
|
+
```
|
|
142
|
+
|
|
143
|
+
## Debug Utilities
|
|
144
|
+
|
|
145
|
+
- **`nodeListToDebugMaps`** -- Converts an AST node list into human-readable debug strings. Used for snapshot testing
|
|
146
|
+
- **`attributesToDebugMaps`** -- Breaks attributes down into their parts (name, equal sign, value, quotes) for display
|
|
147
|
+
- **`nodeTreeDebugView`** -- Tree structure visualization. Shows depth, parent-child relationships, and pair nodes
|
|
148
|
+
|
|
149
|
+
## IDL Attribute Mapping
|
|
150
|
+
|
|
151
|
+
`searchIDLAttribute` provides a bidirectional mapping between React-style IDL attribute names and HTML content attribute names (e.g., `className` → `class`, `htmlFor` → `for`). It is called from the `MLAttr` constructor in `@markuplint/ml-core` when the spec sets `useIDLAttributeNames: true`.
|
|
152
|
+
|
|
153
|
+
## External Dependencies
|
|
154
|
+
|
|
155
|
+
| Dependency | Purpose |
|
|
156
|
+
| --------------------- | ---------------------------------- |
|
|
157
|
+
| `@markuplint/ml-ast` | AST type definitions |
|
|
158
|
+
| `@markuplint/ml-spec` | Void element detection |
|
|
159
|
+
| `@markuplint/types` | Custom element name validation |
|
|
160
|
+
| `node:crypto` | AST node UUID generation |
|
|
161
|
+
| `debug` | Performance timing and logging |
|
|
162
|
+
| `espree` | JavaScript tokenization and parsing |
|
|
163
|
+
| `type-fest` | TypeScript utility types |
|
|
164
|
+
|
|
165
|
+
## Integration Points
|
|
166
|
+
|
|
167
|
+
```mermaid
|
|
168
|
+
flowchart TD
|
|
169
|
+
subgraph upstream ["Upstream"]
|
|
170
|
+
mlAst["@markuplint/ml-ast\nType definitions"]
|
|
171
|
+
mlSpec["@markuplint/ml-spec\nVoid element detection"]
|
|
172
|
+
end
|
|
173
|
+
|
|
174
|
+
subgraph pkg ["@markuplint/parser-utils"]
|
|
175
|
+
parser["abstract Parser"]
|
|
176
|
+
end
|
|
177
|
+
|
|
178
|
+
subgraph downstream ["Downstream (parsers)"]
|
|
179
|
+
html["@markuplint/html-parser"]
|
|
180
|
+
jsx["@markuplint/jsx-parser"]
|
|
181
|
+
vue["@markuplint/vue-parser"]
|
|
182
|
+
svelte["@markuplint/svelte-parser"]
|
|
183
|
+
astro["@markuplint/astro-parser"]
|
|
184
|
+
pug["@markuplint/pug-parser"]
|
|
185
|
+
end
|
|
186
|
+
|
|
187
|
+
subgraph indirect ["Indirect"]
|
|
188
|
+
mlCore["@markuplint/ml-core"]
|
|
189
|
+
end
|
|
190
|
+
|
|
191
|
+
upstream -->|"Types and utilities"| pkg
|
|
192
|
+
pkg -->|"Extend the Parser class"| downstream
|
|
193
|
+
downstream -->|"Produce MLASTDocument"| mlCore
|
|
194
|
+
```
|
|
195
|
+
|
|
196
|
+
### Upstream
|
|
197
|
+
|
|
198
|
+
- **`@markuplint/ml-ast`** -- All AST type definitions (`MLASTElement`, `MLASTText`, etc.)
|
|
199
|
+
- **`@markuplint/ml-spec`** -- Self-closing tag detection via `isVoidElement`
|
|
200
|
+
|
|
201
|
+
### Downstream
|
|
202
|
+
|
|
203
|
+
Six parser packages extend the Parser class: html-parser, jsx-parser, vue-parser, svelte-parser, astro-parser, pug-parser
|
|
204
|
+
|
|
205
|
+
## Documentation Map
|
|
206
|
+
|
|
207
|
+
- [Parser Class Reference](docs/parser-class.ja.md) -- Complete reference for the Parser class
|
|
208
|
+
- [Maintenance Guide](docs/maintenance.ja.md) -- Commands, recipes, and troubleshooting
|
package/ARCHITECTURE.md
ADDED
|
@@ -0,0 +1,251 @@
|
|
|
1
|
+
# @markuplint/parser-utils
|
|
2
|
+
|
|
3
|
+
## Overview
|
|
4
|
+
|
|
5
|
+
`@markuplint/parser-utils` is the shared foundation for all markuplint parsers. It provides the abstract `Parser` class that implements the full parsing pipeline and a set of utility modules for tokenization, error handling, debugging, and AST manipulation. Every markup language parser (HTML, JSX, Vue, Svelte, Astro, Pug) extends this package's `Parser` class to convert language-specific AST nodes into the unified markuplint AST format defined by `@markuplint/ml-ast`.
|
|
6
|
+
|
|
7
|
+
## Directory Structure
|
|
8
|
+
|
|
9
|
+
```
|
|
10
|
+
src/
|
|
11
|
+
├── index.ts — Re-exports all public API
|
|
12
|
+
├── parser.ts — Abstract class Parser<Node, State> (~1825 lines, core)
|
|
13
|
+
├── types.ts — ParserOptions, ParseOptions, Token, ChildToken, IgnoreTag, etc.
|
|
14
|
+
├── enums.ts — TagState, AttrState state machines
|
|
15
|
+
├── attr-tokenizer.ts — Attribute tokenizer (uses AttrState)
|
|
16
|
+
├── script-parser.ts — JavaScript parsing via espree
|
|
17
|
+
├── ignore-block.ts — Template expression masking and restoration
|
|
18
|
+
├── ignore-front-matter.ts — YAML front matter detection and masking
|
|
19
|
+
├── detect-element-type.ts — Element type classification (html/web-component/authored)
|
|
20
|
+
├── idl-attributes.ts — IDL <-> content attribute name mapping (React-compatible)
|
|
21
|
+
├── debugger.ts — Debug and test utilities (nodeListToDebugMaps, etc.)
|
|
22
|
+
├── debug.ts — Performance timer and debug logging via `debug` package
|
|
23
|
+
├── parser-error.ts — ParserError, TargetParserError, ConfigParserError
|
|
24
|
+
├── sort-nodes.ts — Node position sorting
|
|
25
|
+
├── const.ts — MASK_CHAR, SVG element list, defaultSpaces
|
|
26
|
+
├── get-location.ts — Line/column/offset calculation utilities
|
|
27
|
+
└── decision.ts — Custom element name detection
|
|
28
|
+
```
|
|
29
|
+
|
|
30
|
+
## Architecture Diagram
|
|
31
|
+
|
|
32
|
+
```mermaid
|
|
33
|
+
flowchart TD
|
|
34
|
+
subgraph core ["Core"]
|
|
35
|
+
parser["parser.ts\nAbstract Parser Class"]
|
|
36
|
+
types["types.ts\nParserOptions, Token, etc."]
|
|
37
|
+
enums["enums.ts\nTagState, AttrState"]
|
|
38
|
+
end
|
|
39
|
+
|
|
40
|
+
subgraph tokenizer ["Tokenizer Utilities"]
|
|
41
|
+
attrTok["attr-tokenizer.ts\nattrTokenizer()"]
|
|
42
|
+
scriptParser["script-parser.ts\nscriptParser(), safeScriptParser()"]
|
|
43
|
+
end
|
|
44
|
+
|
|
45
|
+
subgraph masking ["Masking & Preprocessing"]
|
|
46
|
+
ignoreBlock["ignore-block.ts\nignoreBlock(), restoreNode()"]
|
|
47
|
+
ignoreFM["ignore-front-matter.ts\nignoreFrontMatter()"]
|
|
48
|
+
end
|
|
49
|
+
|
|
50
|
+
subgraph classification ["Classification"]
|
|
51
|
+
detectType["detect-element-type.ts\ndetectElementType()"]
|
|
52
|
+
decision["decision.ts\nisPotentialCustomElementName()"]
|
|
53
|
+
idlAttrs["idl-attributes.ts\nsearchIDLAttribute()"]
|
|
54
|
+
end
|
|
55
|
+
|
|
56
|
+
subgraph utilities ["Utilities"]
|
|
57
|
+
debugger["debugger.ts\nnodeListToDebugMaps()"]
|
|
58
|
+
debugMod["debug.ts\nPerformanceTimer, domLog()"]
|
|
59
|
+
parserError["parser-error.ts\nParserError hierarchy"]
|
|
60
|
+
sortNodes["sort-nodes.ts\nsortNodes()"]
|
|
61
|
+
getLoc["get-location.ts\ngetPosition(), getEndLine()"]
|
|
62
|
+
constMod["const.ts\nMASK_CHAR, svgElementList"]
|
|
63
|
+
end
|
|
64
|
+
|
|
65
|
+
subgraph external ["External Dependencies"]
|
|
66
|
+
mlAst["@markuplint/ml-ast\n(AST types)"]
|
|
67
|
+
mlSpec["@markuplint/ml-spec\n(void element detection)"]
|
|
68
|
+
mlTypes["@markuplint/types\n(custom element validation)"]
|
|
69
|
+
espree["espree\n(JS tokenization)"]
|
|
70
|
+
cryptoLib["node:crypto\n(node ID generation)"]
|
|
71
|
+
debugLib["debug\n(logging)"]
|
|
72
|
+
end
|
|
73
|
+
|
|
74
|
+
parser --> types
|
|
75
|
+
parser --> enums
|
|
76
|
+
parser --> attrTok
|
|
77
|
+
parser --> ignoreBlock
|
|
78
|
+
parser --> ignoreFM
|
|
79
|
+
parser --> detectType
|
|
80
|
+
parser --> sortNodes
|
|
81
|
+
parser --> getLoc
|
|
82
|
+
parser --> constMod
|
|
83
|
+
parser --> parserError
|
|
84
|
+
parser --> debugMod
|
|
85
|
+
attrTok --> enums
|
|
86
|
+
attrTok --> scriptParser
|
|
87
|
+
detectType --> decision
|
|
88
|
+
ignoreBlock --> constMod
|
|
89
|
+
ignoreBlock --> getLoc
|
|
90
|
+
decision --> mlTypes
|
|
91
|
+
parser --> mlAst
|
|
92
|
+
parser --> mlSpec
|
|
93
|
+
parser --> cryptoLib
|
|
94
|
+
scriptParser --> espree
|
|
95
|
+
debugMod --> debugLib
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
## Module Responsibilities
|
|
99
|
+
|
|
100
|
+
| Module | Responsibility | Key Exports |
|
|
101
|
+
| ------------------------ | ------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
|
|
102
|
+
| `parser.ts` | Core parsing pipeline; abstract `Parser` class that all language parsers extend | `Parser` |
|
|
103
|
+
| `types.ts` | Type definitions for parser configuration and tokenization | `ParserOptions`, `ParseOptions`, `Token`, `ChildToken`, `IgnoreTag`, `IgnoreBlock`, `QuoteSet`, `ValueType`, `SelfCloseType`, `Tokenized` |
|
|
104
|
+
| `enums.ts` | State machine enumerations for tag and attribute parsing | `TagState`, `AttrState` |
|
|
105
|
+
| `attr-tokenizer.ts` | Attribute string tokenization using `AttrState` state machine | `attrTokenizer` |
|
|
106
|
+
| `script-parser.ts` | JavaScript parsing for embedded scripts using espree | `scriptParser`, `safeScriptParser` |
|
|
107
|
+
| `ignore-block.ts` | Template expression masking before parsing and restoration after | `ignoreBlock`, `restoreNode` |
|
|
108
|
+
| `ignore-front-matter.ts` | YAML front matter detection and masking | `ignoreFrontMatter` |
|
|
109
|
+
| `detect-element-type.ts` | Element classification into `html`, `web-component`, or `authored` | `detectElementType` |
|
|
110
|
+
| `idl-attributes.ts` | IDL-to-content attribute name mapping (React-compatible) | `searchIDLAttribute` |
|
|
111
|
+
| `debugger.ts` | Test and debug snapshot utilities | `nodeListToDebugMaps`, `attributesToDebugMaps`, `nodeTreeDebugView` |
|
|
112
|
+
| `debug.ts` | Performance timing and debug logging | `PerformanceTimer`, `domLog`, `log` |
|
|
113
|
+
| `parser-error.ts` | Error classes with positional information | `ParserError`, `TargetParserError`, `ConfigParserError` |
|
|
114
|
+
| `sort-nodes.ts` | Node position sorting by offset | `sortNodes` |
|
|
115
|
+
| `const.ts` | Constants used across the package | `MASK_CHAR`, `svgElementList`, `defaultSpaces` |
|
|
116
|
+
| `get-location.ts` | Line/column/offset position calculations | `getPosition`, `getEndLine`, `getEndCol`, `getEndPosition`, `getOffsetsFromCode` |
|
|
117
|
+
| `decision.ts` | Custom element name detection and SVG element lookup | `isPotentialCustomElementName`, `isSVGElement` |
|
|
118
|
+
|
|
119
|
+
## Parse Pipeline Overview
|
|
120
|
+
|
|
121
|
+
The `Parser` class implements the complete parse pipeline. Each language-specific parser extends `Parser` and provides a `tokenize()` method that produces language-specific AST nodes, plus a `nodeize()` method that converts those nodes into markuplint AST nodes. See [Parser Class Reference](docs/parser-class.md) for the full pipeline documentation.
|
|
122
|
+
|
|
123
|
+
```mermaid
|
|
124
|
+
flowchart TD
|
|
125
|
+
source["Source Code"]
|
|
126
|
+
fm["ignoreFrontMatter()\nMask YAML front matter"]
|
|
127
|
+
ib["ignoreBlock()\nMask template expressions"]
|
|
128
|
+
tokenize["tokenize()\nLanguage-specific tokenization"]
|
|
129
|
+
traverse["traverse()\nWalk language AST"]
|
|
130
|
+
nodeize["nodeize()\nConvert to MLASTNode"]
|
|
131
|
+
flatten["flattenNodes()\nBuild flat node list"]
|
|
132
|
+
walk["walk()\nParent/child linking, depth"]
|
|
133
|
+
restore["restoreNode()\nRestore masked content"]
|
|
134
|
+
doc["MLASTDocument"]
|
|
135
|
+
|
|
136
|
+
source --> fm --> ib --> tokenize --> traverse --> nodeize --> flatten --> walk --> restore --> doc
|
|
137
|
+
```
|
|
138
|
+
|
|
139
|
+
## State Machines Overview
|
|
140
|
+
|
|
141
|
+
The parser uses two state machine enumerations to drive character-by-character tokenization:
|
|
142
|
+
|
|
143
|
+
**TagState** controls the tag-level parse loop:
|
|
144
|
+
|
|
145
|
+
`BeforeOpenTag` -> `FirstCharOfTagName` -> `TagName` -> `Attrs` -> `AfterAttrs` -> `AfterOpenTag`
|
|
146
|
+
|
|
147
|
+
**AttrState** controls the attribute-level parse loop:
|
|
148
|
+
|
|
149
|
+
`BeforeName` -> `Name` -> `Equal` -> `BeforeValue` -> `Value` -> `AfterValue`
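The AttrState sequence above can be pictured with a small, self-contained sketch. Only the six state names come from the text; `splitAttr`, the `AttrParts` shape, and the quoting rules are hypothetical and much simpler than the package's `attrTokenizer`.

```typescript
// Hypothetical sketch; NOT the attrTokenizer exported by the package.
// State names follow the AttrState sequence listed above.
const AttrState = {
  BeforeName: 'BeforeName',
  Name: 'Name',
  Equal: 'Equal',
  BeforeValue: 'BeforeValue',
  Value: 'Value',
  AfterValue: 'AfterValue',
} as const;
type AttrState = (typeof AttrState)[keyof typeof AttrState];

interface AttrParts {
  name: string;
  equal: string;
  quote: string;
  value: string;
}

const isSpace = (ch: string): boolean => /\s/.test(ch);

function splitAttr(raw: string): AttrParts {
  const parts: AttrParts = { name: '', equal: '', quote: '', value: '' };
  let state: AttrState = AttrState.BeforeName;

  for (let i = 0; i < raw.length; i++) {
    const ch = raw.charAt(i);
    if (state === AttrState.BeforeName) {
      // Skip leading whitespace, then start collecting the name.
      if (!isSpace(ch)) {
        parts.name += ch;
        state = AttrState.Name;
      }
    } else if (state === AttrState.Name) {
      if (ch === '=' || isSpace(ch)) {
        // The name has ended; re-read this character in the Equal state.
        state = AttrState.Equal;
        i--;
      } else {
        parts.name += ch;
      }
    } else if (state === AttrState.Equal) {
      if (ch === '=') {
        parts.equal = ch;
        state = AttrState.BeforeValue;
      } else if (!isSpace(ch)) {
        // No "=" follows: treat the attribute as valueless.
        state = AttrState.AfterValue;
      }
    } else if (state === AttrState.BeforeValue) {
      if (ch === '"' || ch === "'") {
        parts.quote = ch;
        state = AttrState.Value;
      } else if (!isSpace(ch)) {
        parts.value += ch;
        state = AttrState.Value;
      }
    } else if (state === AttrState.Value) {
      // A quoted value ends at the matching quote, an unquoted one at whitespace.
      const closes = parts.quote === '' ? isSpace(ch) : ch === parts.quote;
      if (closes) {
        state = AttrState.AfterValue;
      } else {
        parts.value += ch;
      }
    }
    // AfterValue: remaining characters are ignored in this sketch.
  }
  return parts;
}
```

Keeping one branch per state makes each transition visible at a glance, which is presumably why the package models tokenization with explicit state enumerations.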
|
|
150
|
+
|
|
151
|
+
See [Parser Class Reference](docs/parser-class.md) for detailed state transition diagrams.
|
|
152
|
+
|
|
153
|
+
## Error Handling
|
|
154
|
+
|
|
155
|
+
The package provides a three-level error class hierarchy for parser errors:
|
|
156
|
+
|
|
157
|
+
| Class | Extends | Additional Fields | Purpose |
|
|
158
|
+
| ------------------- | ------------- | -------------------- | ---------------------------------------------------------------------------- |
|
|
159
|
+
| `ParserError` | `Error` | `line`, `col`, `raw` | Base parser error with source position |
|
|
160
|
+
| `TargetParserError` | `ParserError` | `nodeName` | Error tied to a specific element, includes the element name in the message |
|
|
161
|
+
| `ConfigParserError` | `ParserError` | `filePath` | Error from configuration file parsing, includes the file path in the message |
|
|
162
|
+
|
|
163
|
+
All error classes automatically format their messages with positional information (e.g., `(line:col)`).
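As a rough illustration of the hierarchy in the table, the sketch below uses stand-in classes. The `Sketch*` names, the constructor signatures, and the exact message wording are assumptions; only the field names (`line`, `col`, `raw`, `nodeName`, `filePath`) and the `(line:col)` suffix come from the text.

```typescript
// Hypothetical stand-ins for the ParserError hierarchy; the real classes live in parser-error.ts.
interface SourcePosition {
  line: number;
  col: number;
  raw: string;
}

class SketchParserError extends Error {
  readonly line: number;
  readonly col: number;
  readonly raw: string;

  constructor(message: string, pos: SourcePosition) {
    // Assumed format: append "(line:col)" so the message points at the source position.
    super(`${message} (${pos.line}:${pos.col})`);
    this.name = 'SketchParserError';
    this.line = pos.line;
    this.col = pos.col;
    this.raw = pos.raw;
  }
}

class SketchTargetParserError extends SketchParserError {
  readonly nodeName: string;

  constructor(message: string, pos: SourcePosition, nodeName: string) {
    // Assumed wording: include the element name in the message.
    super(`${message} in <${nodeName}>`, pos);
    this.name = 'SketchTargetParserError';
    this.nodeName = nodeName;
  }
}

class SketchConfigParserError extends SketchParserError {
  readonly filePath: string;

  constructor(message: string, pos: SourcePosition, filePath: string) {
    // Assumed wording: include the configuration file path in the message.
    super(`${message} in ${filePath}`, pos);
    this.name = 'SketchConfigParserError';
    this.filePath = filePath;
  }
}

const targetError = new SketchTargetParserError(
  'Unexpected attribute',
  { line: 3, col: 7, raw: '<div =>' },
  'div',
);
const configError = new SketchConfigParserError(
  'Invalid JSON',
  { line: 1, col: 1, raw: '{' },
  '.markuplintrc',
);
```

Callers can then branch on the extra field (`nodeName` or `filePath`) to decide whether the failure belongs to the linted document or to its configuration.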
|
|
164
|
+
|
|
165
|
+
## Debug Utilities
|
|
166
|
+
|
|
167
|
+
The package provides three debug functions for testing and visualization:
|
|
168
|
+
|
|
169
|
+
- **`nodeListToDebugMaps`** -- Converts a flat AST node list into human-readable debug strings showing each node's position, type, and raw content. The primary tool for snapshot testing in parser tests.
|
|
170
|
+
- **`attributesToDebugMaps`** -- Converts attributes into detailed debug strings showing all attribute components (name, equal sign, value, quotes) with positional information and metadata flags (`isDirective`, `isDynamicValue`).
|
|
171
|
+
- **`nodeTreeDebugView`** -- Produces an indented tree view of the AST showing depth, parent-child relationships, pair node links, and ghost/bogus markers. Useful for visual inspection of parse results.
|
|
172
|
+
|
|
173
|
+
Additionally, `debug.ts` provides `PerformanceTimer` for measuring parse phase durations and `domLog` for structured logging via the `debug` package (enabled with `DEBUG=ml-parser`).
|
|
174
|
+
|
|
175
|
+
## IDL Attribute Mapping
|
|
176
|
+
|
|
177
|
+
`searchIDLAttribute` maps between React-style IDL attribute names and HTML content attribute names. It is called by `@markuplint/ml-core`'s `MLAttr` constructor when the spec sets `useIDLAttributeNames: true`. It maintains a comprehensive mapping table derived from React's `possibleStandardNames.js`, covering:
|
|
178
|
+
|
|
179
|
+
- HTML attributes (e.g., `className` -> `class`, `htmlFor` -> `for`, `tabIndex` -> `tabindex`)
|
|
180
|
+
- SVG attributes (e.g., `strokeWidth` -> `stroke-width`, `clipPath` -> `clip-path`)
|
|
181
|
+
- Event handler attributes (e.g., `onClick` -> `onclick`)
|
|
182
|
+
|
|
183
|
+
The lookup handles camelCase IDL names, lowercase content attribute names, and hyphenated variants.
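A minimal sketch of this kind of lookup, assuming a tiny table built from the examples above. `lookupIdl`, `idlTable`, and the `{ idlPropName, contentAttrName }` result shape are illustrative assumptions, not the actual signature of `searchIDLAttribute`.

```typescript
// Tiny illustrative table; the entries are the examples listed above.
// The package's real table is far larger.
const idlTable: ReadonlyArray<readonly [string, string]> = [
  ['className', 'class'],
  ['htmlFor', 'for'],
  ['tabIndex', 'tabindex'],
  ['strokeWidth', 'stroke-width'],
  ['clipPath', 'clip-path'],
  ['onClick', 'onclick'],
];

// Assumed result shape for this sketch.
interface IdlLookup {
  idlPropName: string | undefined;
  contentAttrName: string | undefined;
}

// Normalize a name so camelCase, lowercase, and hyphenated spellings compare equal.
const squash = (name: string): string => name.toLowerCase().replace(/-/g, '');

function lookupIdl(name: string): IdlLookup {
  const key = squash(name);
  for (const [idl, content] of idlTable) {
    if (squash(idl) === key || squash(content) === key) {
      return { idlPropName: idl, contentAttrName: content };
    }
  }
  return { idlPropName: undefined, contentAttrName: undefined };
}
```

With this normalization, `lookupIdl('className')`, `lookupIdl('class')`, and `lookupIdl('stroke-width')` all resolve to a matching pair, while an unknown name yields `undefined` in both fields.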
|
|
184
|
+
|
|
185
|
+
## External Dependencies
|
|
186
|
+
|
|
187
|
+
| Dependency | Purpose |
|
|
188
|
+
| --------------------- | -------------------------------------------------------------------------- |
|
|
189
|
+
| `@markuplint/ml-ast` | AST type definitions (`MLASTDocument`, `MLASTElement`, `MLASTToken`, etc.) |
|
|
190
|
+
| `@markuplint/ml-spec` | Void element detection (`isVoidElement`) for self-closing tag handling |
|
|
191
|
+
| `@markuplint/types` | Custom element name validation (`isCustomElementName`) |
|
|
192
|
+
| `node:crypto` | AST node UUID generation via `crypto.randomUUID()` |
|
|
193
|
+
| `debug` | Performance timing and structured logging |
|
|
194
|
+
| `espree` | JavaScript tokenization and parsing for embedded script content |
|
|
195
|
+
| `type-fest` | TypeScript utility types |
|
|
196
|
+
|
|
197
|
+
## Integration Points
|
|
198
|
+
|
|
199
|
+
```mermaid
|
|
200
|
+
flowchart TD
|
|
201
|
+
subgraph upstream ["Upstream"]
|
|
202
|
+
mlAst["@markuplint/ml-ast\n(AST type definitions)"]
|
|
203
|
+
mlSpec["@markuplint/ml-spec\n(void element detection)"]
|
|
204
|
+
mlTypes["@markuplint/types\n(custom element validation)"]
|
|
205
|
+
end
|
|
206
|
+
|
|
207
|
+
subgraph pkg ["@markuplint/parser-utils"]
|
|
208
|
+
parser["Abstract Parser Class\n+ Utility Modules"]
|
|
209
|
+
end
|
|
210
|
+
|
|
211
|
+
subgraph downstream ["Downstream Parsers"]
|
|
212
|
+
htmlParser["@markuplint/html-parser"]
|
|
213
|
+
jsxParser["@markuplint/jsx-parser"]
|
|
214
|
+
vueParser["@markuplint/vue-parser"]
|
|
215
|
+
svelteParser["@markuplint/svelte-parser"]
|
|
216
|
+
astroParser["@markuplint/astro-parser"]
|
|
217
|
+
pugParser["@markuplint/pug-parser"]
|
|
218
|
+
end
|
|
219
|
+
|
|
220
|
+
subgraph indirect ["Indirect Downstream"]
|
|
221
|
+
mlCore["@markuplint/ml-core\n(MLASTDocument -> MLDOM)"]
|
|
222
|
+
end
|
|
223
|
+
|
|
224
|
+
upstream -->|"types, specs"| parser
|
|
225
|
+
parser -->|"extends Parser"| downstream
|
|
226
|
+
downstream -->|"produces MLASTDocument"| mlCore
|
|
227
|
+
```
|
|
228
|
+
|
|
229
|
+
### Upstream
|
|
230
|
+
|
|
231
|
+
- **`@markuplint/ml-ast`** -- All AST type definitions (`MLASTDocument`, `MLASTElement`, `MLASTAttr`, `MLASTToken`, etc.) that the Parser class produces.
|
|
232
|
+
- **`@markuplint/ml-spec`** -- `isVoidElement` function used to determine self-closing behavior for HTML void elements.
|
|
233
|
+
- **`@markuplint/types`** -- `isCustomElementName` function used by `decision.ts` to validate custom element names per the HTML spec.
|
|
234
|
+
|
|
235
|
+
### Downstream
|
|
236
|
+
|
|
237
|
+
Six parser packages extend the abstract `Parser` class:
|
|
238
|
+
|
|
239
|
+
- **`@markuplint/html-parser`** -- Standard HTML parsing
|
|
240
|
+
- **`@markuplint/jsx-parser`** -- JSX/TSX parsing (extends html-parser)
|
|
241
|
+
- **`@markuplint/vue-parser`** -- Vue Single File Component parsing
|
|
242
|
+
- **`@markuplint/svelte-parser`** -- Svelte component parsing
|
|
243
|
+
- **`@markuplint/astro-parser`** -- Astro component parsing
|
|
244
|
+
- **`@markuplint/pug-parser`** -- Pug template parsing
|
|
245
|
+
|
|
246
|
+
Each downstream parser implements the `tokenize()` and `nodeize()` abstract methods to convert its language-specific AST into the unified markuplint AST format.
|
|
247
|
+
|
|
248
|
+
## Documentation Map
|
|
249
|
+
|
|
250
|
+
- [Parser Class Reference](docs/parser-class.md) -- Detailed documentation of the Parser class, its methods, and parse pipeline
|
|
251
|
+
- [Maintenance Guide](docs/maintenance.md) -- Commands, recipes, and troubleshooting
|
package/CHANGELOG.md
CHANGED
|
@@ -3,13 +3,44 @@
|
|
|
3
3
|
All notable changes to this project will be documented in this file.
|
|
4
4
|
See [Conventional Commits](https://conventionalcommits.org) for commit guidelines.
|
|
5
5
|
|
|
6
|
-
|
|
6
|
+
# [5.0.0-alpha.0](https://github.com/markuplint/markuplint/compare/v4.14.1...v5.0.0-alpha.0) (2026-02-20)
|
|
7
7
|
|
|
8
|
-
|
|
8
|
+
### Bug Fixes
|
|
9
|
+
|
|
10
|
+
- **ml-core:** improve detection of namespace ([5b507ad](https://github.com/markuplint/markuplint/commit/5b507ad7c19c5015b8ce587845d901e31dfa6518))
|
|
11
|
+
- resolve additional eslint-plugin-unicorn v63 errors ([e58a72c](https://github.com/markuplint/markuplint/commit/e58a72c17c97bbec522f9513b99777fac6904d64))
|
|
12
|
+
- use explicit `export type` for type-only re-exports ([7c77c05](https://github.com/markuplint/markuplint/commit/7c77c05619518c8d18a183132040f5b2cd0ab6ec))
|
|
13
|
+
|
|
14
|
+
- feat(parser-utils)!: adapt to simplified MLASTToken properties ([5cbbc9c](https://github.com/markuplint/markuplint/commit/5cbbc9ca8f77a71d99bffa14b193c79b26c1c415))
|
|
15
|
+
|
|
16
|
+
### BREAKING CHANGES
|
|
17
|
+
|
|
18
|
+
- Update Token type and parser internals for
|
|
19
|
+
simplified AST token properties.
|
|
20
|
+
|
|
21
|
+
Token type property renames:
|
|
9
22
|
|
|
23
|
+
- startOffset -> offset
|
|
24
|
+
- startLine -> line
|
|
25
|
+
- startCol -> col
|
|
10
26
|
|
|
27
|
+
Parser changes:
|
|
11
28
|
|
|
29
|
+
- createToken() no longer produces endOffset/endLine/endCol
|
|
30
|
+
- visitPsBlock() parameter: conditionalType -> blockBehavior
|
|
31
|
+
- visitElement() accepts blockBehavior option
|
|
32
|
+
- Remove selfClosingSolidus token generation
|
|
33
|
+
- Add getEndPosition() helper to get-location.ts
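For code that still reads the old property names, a migration along these lines may help. It is a sketch under assumptions: the `LegacyToken` and `SimplifiedToken` interfaces only model the fields named in this changelog, and `endPositionOf` is a hypothetical helper (columns are assumed to be 1-based), not the `getEndPosition()` added to get-location.ts.

```typescript
// Assumed shape of a 4.x token, limited to the fields this changelog mentions.
interface LegacyToken {
  raw: string;
  startOffset: number;
  endOffset: number;
  startLine: number;
  endLine: number;
  startCol: number;
  endCol: number;
}

// Assumed shape of a 5.x token after the renames.
interface SimplifiedToken {
  raw: string;
  offset: number;
  line: number;
  col: number;
}

// Apply the renames: startOffset -> offset, startLine -> line, startCol -> col.
function migrateToken(token: LegacyToken): SimplifiedToken {
  return {
    raw: token.raw,
    offset: token.startOffset,
    line: token.startLine,
    col: token.startCol,
  };
}

// Hypothetical helper: derive the end position from `raw`, since end fields are no longer stored.
function endPositionOf(token: SimplifiedToken): { endOffset: number; endLine: number; endCol: number } {
  const lines = token.raw.split('\n');
  const lastLine = lines[lines.length - 1] ?? '';
  return {
    endOffset: token.offset + token.raw.length,
    endLine: token.line + lines.length - 1,
    // On a single line the column advances from the start column;
    // after a line break it restarts at column 1.
    endCol: lines.length === 1 ? token.col + lastLine.length : lastLine.length + 1,
  };
}
```

Deriving the end position on demand keeps the stored token small, which matches the stated goal of simplified token properties.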
|
|
12
34
|
|
|
35
|
+
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
36
|
+
|
|
37
|
+
## [4.8.11](https://github.com/markuplint/markuplint/compare/@markuplint/parser-utils@4.8.10...@markuplint/parser-utils@4.8.11) (2026-02-10)
|
|
38
|
+
|
|
39
|
+
**Note:** Version bump only for package @markuplint/parser-utils
|
|
40
|
+
|
|
41
|
+
## [4.8.10](https://github.com/markuplint/markuplint/compare/@markuplint/parser-utils@4.8.9...@markuplint/parser-utils@4.8.10) (2025-11-05)
|
|
42
|
+
|
|
43
|
+
**Note:** Version bump only for package @markuplint/parser-utils
|
|
13
44
|
|
|
14
45
|
## [4.8.9](https://github.com/markuplint/markuplint/compare/@markuplint/parser-utils@4.8.8...@markuplint/parser-utils@4.8.9) (2025-08-24)
|
|
15
46
|
|
package/README.md
CHANGED
|
@@ -16,3 +16,9 @@ $ yarn add @markuplint/parser-utils
|
|
|
16
16
|
```
|
|
17
17
|
|
|
18
18
|
</details>
|
|
19
|
+
|
|
20
|
+
## Documentation
|
|
21
|
+
|
|
22
|
+
- [Architecture](ARCHITECTURE.md) ([日本語](ARCHITECTURE.ja.md)) — Package overview, module relationships, and integration points
|
|
23
|
+
- [Parser Class Reference](docs/parser-class.md) ([日本語](docs/parser-class.ja.md)) — Complete reference for the abstract `Parser` class
|
|
24
|
+
- [Maintenance Guide](docs/maintenance.md) ([日本語](docs/maintenance.ja.md)) — Commands, recipes, and troubleshooting
|
package/SKILL.md
ADDED
|
@@ -0,0 +1,126 @@
|
|
|
1
|
+
---
|
|
2
|
+
description: Perform maintenance tasks for @markuplint/parser-utils
|
|
3
|
+
---
|
|
4
|
+
|
|
5
|
+
# parser-utils-maintenance
|
|
6
|
+
|
|
7
|
+
Perform maintenance tasks for `@markuplint/parser-utils`: create new parsers,
|
|
8
|
+
add ignore tag patterns, add IDL attribute mappings, and customize attribute parsing.
|
|
9
|
+
|
|
10
|
+
## Input
|
|
11
|
+
|
|
12
|
+
`$ARGUMENTS` specifies the task. Supported tasks:
|
|
13
|
+
|
|
14
|
+
| Task | Description |
|
|
15
|
+
| -------------------------- | ------------------------------------------ |
|
|
16
|
+
| `create-parser` | Create a new parser extending Parser class |
|
|
17
|
+
| `add-ignore-tag <type>` | Add an IgnoreTag pattern |
|
|
18
|
+
| `add-idl-attribute <name>` | Add an IDL attribute mapping |
|
|
19
|
+
| `customize-attr-parsing` | Customize attribute parsing behavior |
|
|
20
|
+
|
|
21
|
+
If omitted, defaults to `create-parser`.
|
|
22
|
+
|
|
23
|
+
## Reference
|
|
24
|
+
|
|
25
|
+
Before executing any task, read `docs/maintenance.md` (or `docs/maintenance.ja.md`)
|
|
26
|
+
for the full guide. The recipes there are the source of truth for procedures.
|
|
27
|
+
|
|
28
|
+
Also read:
|
|
29
|
+
|
|
30
|
+
- `docs/parser-class.md` -- Complete Parser class reference with override patterns
|
|
31
|
+
- `ARCHITECTURE.md` -- Package overview, module relationships, and integration points
|
|
32
|
+
|
|
33
|
+
## Task: create-parser
|
|
34
|
+
|
|
35
|
+
Create a new parser extending the abstract Parser class. Follow recipe #1 in `docs/maintenance.md`.
|
|
36
|
+
|
|
37
|
+
### Step 1: Set up the package
|
|
38
|
+
|
|
39
|
+
1. Create a new package under `packages/@markuplint/`
|
|
40
|
+
2. Add `@markuplint/parser-utils` as a dependency
|
|
41
|
+
3. Create the main parser file extending `Parser<YourNode, YourState>`
|
|
42
|
+
|
|
43
|
+
### Step 2: Implement required methods
|
|
44
|
+
|
|
45
|
+
1. Read `docs/parser-class.md` for the full override pattern reference
|
|
46
|
+
2. Implement `tokenize()` -- invoke the language-specific tokenizer on `this.rawCode`
|
|
47
|
+
3. Implement `nodeize()` -- convert each AST node using visitor methods
|
|
48
|
+
4. Set constructor options (`endTagType`, `tagNameCaseSensitive`, `ignoreTags`, etc.)
|
|
49
|
+
|
|
50
|
+
### Step 3: Export the parser module
|
|
51
|
+
|
|
52
|
+
1. Export as `MLParserModule`: `export default { parser: new MyParser() }`
|
|
53
|
+
2. Build: `yarn build --scope @markuplint/<package-name>`
|
|
54
|
+
3. Test with `nodeListToDebugMaps` snapshot assertions
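The three steps above can be sketched with a stand-in base class. Everything here is hypothetical: `MiniParser`, `SketchNode`, and `LineParser` only model the override pattern (replace `tokenize()` and `nodeize()`, never call the `super` versions) and are not the API of `@markuplint/parser-utils`, whose real method signatures are documented in `docs/parser-class.md`.

```typescript
// Simplified, hypothetical model of the abstract Parser; NOT the package's class.
interface SketchNode {
  nodeName: string;
  raw: string;
}

class MiniParser<Node> {
  protected rawCode = '';

  // The defaults return empty arrays, so subclasses replace them instead of calling super.
  protected tokenize(): Node[] {
    return [];
  }

  protected nodeize(_node: Node): SketchNode[] {
    return [];
  }

  // Fixed pipeline: store the source, tokenize it, then convert each node.
  parse(code: string): SketchNode[] {
    this.rawCode = code;
    const result: SketchNode[] = [];
    for (const node of this.tokenize()) {
      for (const converted of this.nodeize(node)) {
        result.push(converted);
      }
    }
    return result;
  }
}

// A toy "language" in which every non-empty line is one node.
class LineParser extends MiniParser<string> {
  protected override tokenize(): string[] {
    return this.rawCode.split('\n').filter(line => line.trim() !== '');
  }

  protected override nodeize(line: string): SketchNode[] {
    return [{ nodeName: '#text', raw: line }];
  }
}

// Mirrors the export shape from Step 3: an object with a `parser` instance.
const parserModule = { parser: new LineParser() };
```

In a real parser package the object would be the module's default export, and the test would compare `nodeListToDebugMaps` output against a snapshot rather than inspecting nodes by hand.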
|
|
55
|
+
|
|
56
|
+
## Task: add-ignore-tag
|
|
57
|
+
|
|
58
|
+
Add an IgnoreTag pattern for masking template expressions. Follow recipe #3 in `docs/maintenance.md`.
|
|
59
|
+
|
|
60
|
+
### Step 1: Define the pattern
|
|
61
|
+
|
|
62
|
+
1. Identify the start and end delimiters of the template expression
|
|
63
|
+
2. Choose a `type` name (becomes the `#ps:` node name prefix)
|
|
64
|
+
3. Start and end can be strings or RegExp patterns
|
|
65
|
+
|
|
66
|
+
### Step 2: Add to constructor
|
|
67
|
+
|
|
68
|
+
1. Add the `IgnoreTag` entry to the `ignoreTags` array in the parser constructor
|
|
69
|
+
2. Consider whether a custom `maskChar` is needed (default is `\uE000`)
|
|
70
|
+
|
|
71
|
+
### Step 3: Verify
|
|
72
|
+
|
|
73
|
+
1. Build: `yarn build --scope @markuplint/<package-name>`
|
|
74
|
+
2. Test that the template expression is correctly masked and restored
|
|
75
|
+
3. Verify the restored node has the expected `#ps:<type>` name
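To make the mask-and-restore idea concrete, here is a minimal sketch that supports string delimiters only. `IgnoreTagSketch`, `maskBlocks`, and `restoreBlocks` are hypothetical names; the package's `ignoreBlock` and `restoreNode` also accept RegExp delimiters and operate on AST nodes, which this example does not attempt.

```typescript
// Hypothetical, string-only model of an ignore tag.
interface IgnoreTagSketch {
  type: string;
  start: string;
  end: string;
}

interface MaskedBlock {
  type: string;
  offset: number;
  raw: string;
}

// Default mask character named in the steps above.
const MASK_CHAR = '\uE000';

// Build a run of mask characters with the same length as the masked text.
const maskOf = (length: number): string => new Array(length + 1).join(MASK_CHAR);

function maskBlocks(
  source: string,
  tags: ReadonlyArray<IgnoreTagSketch>,
): { masked: string; blocks: MaskedBlock[] } {
  let masked = source;
  const blocks: MaskedBlock[] = [];
  for (const tag of tags) {
    let from = 0;
    while (true) {
      const startAt = masked.indexOf(tag.start, from);
      if (startAt === -1) break;
      const endAt = masked.indexOf(tag.end, startAt + tag.start.length);
      if (endAt === -1) break;
      const stop = endAt + tag.end.length;
      const raw = masked.slice(startAt, stop);
      blocks.push({ type: tag.type, offset: startAt, raw });
      // Same-length replacement keeps every later offset, line, and column unchanged.
      masked = masked.slice(0, startAt) + maskOf(raw.length) + masked.slice(stop);
      from = stop;
    }
  }
  return { masked, blocks };
}

function restoreBlocks(masked: string, blocks: ReadonlyArray<MaskedBlock>): string {
  let restored = masked;
  for (const block of blocks) {
    restored =
      restored.slice(0, block.offset) + block.raw + restored.slice(block.offset + block.raw.length);
  }
  return restored;
}

// The restored node name combines the "#ps:" prefix with the tag type.
const psNodeName = (block: MaskedBlock): string => `#ps:${block.type}`;
```

Because the mask has the same length as the text it replaces, positions computed on the masked source remain valid after restoration, which is the property Step 3 asks you to verify.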
|
|
76
|
+
|
|
77
|
+
## Task: add-idl-attribute
|
|
78
|
+
|
|
79
|
+
Add an IDL attribute mapping. Follow recipe #5 in `docs/maintenance.md`.
|
|
80
|
+
|
|
81
|
+
### Step 1: Add the mapping
|
|
82
|
+
|
|
83
|
+
1. Read `src/idl-attributes.ts` and find the `idlContentMap` object
|
|
84
|
+
2. Add a new entry: `idlPropName: 'content-attr-name'`
|
|
85
|
+
3. Follow naming conventions: key is camelCase IDL name, value is lowercase content name
|
|
86
|
+
|
|
87
|
+
### Step 2: Verify
|
|
88
|
+
|
|
89
|
+
1. Build: `yarn build --scope @markuplint/parser-utils`
|
|
90
|
+
2. Test with `searchIDLAttribute()` to confirm the mapping resolves correctly
|
|
91
|
+
|
|
92
|
+
## Task: customize-attr-parsing
|
|
93
|
+
|
|
94
|
+
Customize attribute parsing behavior for a specific parser. Follow recipe #4 in `docs/maintenance.md`.
|
|
95
|
+
|
|
96
|
+
### Step 1: Override visitAttr
|
|
97
|
+
|
|
98
|
+
1. Read `docs/parser-class.md` for the `visitAttr()` documentation
|
|
99
|
+
2. Override `visitAttr()` in your parser subclass
|
|
100
|
+
3. Call `super.visitAttr(token, options)` with custom options:
|
|
101
|
+
- `quoteSet` -- custom quote delimiters (e.g., `{` `}` for JSX)
|
|
102
|
+
- `startState` -- initial AttrState (usually `BeforeName`)
|
|
103
|
+
- `noQuoteValueType` -- value type for unquoted values
|
|
104
|
+
|
|
105
|
+
### Step 2: Post-process
|
|
106
|
+
|
|
107
|
+
1. After calling `super`, use `this.updateAttr()` to set metadata:
|
|
108
|
+
- `isDirective` for framework directives
|
|
109
|
+
- `isDynamicValue` for dynamic bindings
|
|
110
|
+
- `potentialName` for directive-to-attribute name resolution
|
|
111
|
+
2. Use `searchIDLAttribute()` to resolve IDL property names
|
|
112
|
+
|
|
113
|
+
### Step 3: Verify
|
|
114
|
+
|
|
115
|
+
1. Build the parser package
|
|
116
|
+
2. Test attribute parsing with `attributesToDebugMaps` for snapshot assertions
|
|
117
|
+
3. Verify all framework-specific directive patterns are recognized
|
|
118
|
+
|
|
119
|
+
## Rules
|
|
120
|
+
|
|
121
|
+
1. **Always call `super.visitAttr()`** when overriding. It handles token decomposition.
|
|
122
|
+
2. **Always call `super.detectElementType()`** when overriding. Pass framework-specific patterns as the `defaultPattern` argument.
|
|
123
|
+
3. **Never call `super.tokenize()` or `super.nodeize()`** -- the defaults return empty arrays.
|
|
124
|
+
4. **Always call `super.beforeParse()` and `super.afterParse()`** -- they handle offset spaces.
|
|
125
|
+
5. **Test across all downstream parsers** when modifying the Parser class.
|
|
126
|
+
6. **Add JSDoc comments** to all new public methods and properties.
|