carve-grammars 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2024 php-collective
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,206 @@
+ # Carve Grammars
+
+ Grammars for the [Carve](https://github.com/markup-carve/carve) markup language:
+
+ - a **Tiptap** integration (editor kit + serializer) that turns a Tiptap/ProseMirror document into Carve markup;
+ - **Prism** and **highlight.js** syntax-highlighting grammars for rendering Carve source on the web.
+
+ Modeled on [djot-grammars](https://github.com/php-collective/djot-grammars), adapted to Carve's syntax. The Tiptap mark mapping mirrors `carve-php`'s `HtmlToCarve` converter; the highlighting grammars mirror the canonical token set in [`carve/resources/grammar.ebnf`](https://github.com/markup-carve/carve) and the TextMate grammar in [vscode-carve](https://github.com/markup-carve/vscode-carve).
+
+ > **Status:** Tiptap integration, plus Prism and highlight.js grammars.
+ > Sibling editor grammars live in their own repos: **TextMate** in
+ > [vscode-carve](https://github.com/markup-carve/vscode-carve) and
+ > [intellij-carve](https://github.com/markup-carve/intellij-carve);
+ > **Tree-sitter** in [tree-sitter-carve](https://github.com/markup-carve/tree-sitter-carve)
+ > and [zed-carve](https://github.com/markup-carve/zed-carve).
+
+ ## Install
+
+ ```bash
+ npm install carve-grammars
+ ```
+
+ All peer dependencies are optional - install only what you use:
+ `@tiptap/core` + `@tiptap/starter-kit` (v2) for the editor, `prismjs` (v1) for
+ Prism, `highlight.js` (v11) for highlight.js.
+
+ `CarveKit` also pulls in several standalone Tiptap marks/extensions (highlight,
+ subscript, superscript, underline, link, image, table, task-list); install the
+ `@tiptap/extension-*` packages you use, or disable them via
+ `CarveKit.configure({ underline: false, ... })`.
+
+ ## Usage
+
+ ```js
+ import { Editor } from '@tiptap/core'
+ import { CarveKit, serializeToCarve } from 'carve-grammars/tiptap'
+
+ const editor = new Editor({
+   element: document.getElementById('editor'),
+   extensions: [CarveKit],
+   onUpdate: ({ editor }) => {
+     const carve = serializeToCarve(editor.getJSON())
+     console.log(carve)
+   },
+ })
+ ```
+
+ ### Individual extensions
+
+ ```js
+ import { Editor } from '@tiptap/core'
+ import StarterKit from '@tiptap/starter-kit'
+ import { CarveInsert, CarveDelete, CarveDiv, serializeToCarve } from 'carve-grammars/tiptap'
+
+ const editor = new Editor({
+   extensions: [StarterKit, CarveInsert, CarveDelete, CarveDiv],
+ })
+ ```
+
+ ## Mark mapping
+
+ | Tiptap mark | Carve token | Renders as |
+ |-------------|-------------|------------|
+ | bold | `*text*` / `{*text*}` | `<strong>` |
+ | italic | `/text/` / `{/text/}` | `<em>` |
+ | underline | `_text_` / `{_text_}` | `<u>` |
+ | code | `` `text` `` | `<code>` |
+ | highlight | `=text=` / `{=text=}` | `<mark>` |
+ | strike | `~text~` / `{~text~}` | `<s>` |
+ | subscript | `,text,` / `{,text,}` | `<sub>` |
+ | superscript | `^text^` / `{^text^}` | `<sup>` |
+ | insert | `{+text+}` | `<ins>` |
+ | delete | `{-text-}` | `<del>` |
+ | link | `[text](url)` / `[text](url "title")` | `<a>` |
+ | image | `![alt](src)` / `![alt](src "title")` | `<img>` |
+ | span | `[text]{.class}` | `<span class>` |
+ | abbreviation | `[text]{abbr="..."}` | `<abbr title>` \*\*\* |
+
+ \*\*\* `[text]{abbr="..."}` renders a real `<abbr title>` only when carve's
+ `SemanticSpanExtension` is enabled (the same opt-in extension also maps `{kbd}`
+ -> `<kbd>`, `{dfn}` -> `<dfn>`, `{samp}` -> `<samp>`, `{var}` -> `<var>`).
+ Without it, the attribute stays literal: `<span abbr="...">`. The mark's
+ `parseHTML` reads back the `<abbr title>` form.
+
+ The tokens target carve-php's **parser** (the contract: serialized Carve must parse
+ back to the same elements). Carve's inline syntax differs notably from Djot's:
+ emphasis is `/text/` (Djot uses `_`), `_text_` is underline, `~text~` is
+ strikethrough, subscript is `,text,`, and highlight is `=text=` (single-char
+ delimiters since carve #108).
+
+ Each single-char delimiter has two equivalent forms: a **bare** form
+ (`=text=`) and a **forced brace** form (`{=text=}`) that also works intraword;
+ both parse to the same element. The Carve token column above lists them as
+ bare / forced. `serializeToCarve` emits the bare form for `* / _ ~` and the
+ forced `{…}` form for `= , ^` (round-trip-safe, since those delimiters are
+ likelier to be inert in bare form); `{+…+}` / `{-…-}` (insert / delete) have
+ only the brace form, since `+` / `-` are not emphasis delimiters.
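
The bare / forced choice described in this section can be sketched as a small lookup table. This is a hypothetical illustration, not the package's serializer: `MARK_DELIMITERS` and `wrapMark` are names invented here, and the sketch ignores escaping, nesting and attributes.

```js
// Hypothetical sketch of the delimiter choice described in the README.
// MARK_DELIMITERS and wrapMark are illustrative names, not carve-grammars exports.
const MARK_DELIMITERS = new Map([
  ['bold', { char: '*', forced: false }],
  ['italic', { char: '/', forced: false }],
  ['underline', { char: '_', forced: false }],
  ['strike', { char: '~', forced: false }],
  ['highlight', { char: '=', forced: true }],
  ['subscript', { char: ',', forced: true }],
  ['superscript', { char: '^', forced: true }],
  ['insert', { char: '+', forced: true }], // brace form is the only form
  ['delete', { char: '-', forced: true }], // brace form is the only form
]);

function wrapMark(markName, text) {
  const spec = MARK_DELIMITERS.get(markName);
  if (!spec) return text; // unknown marks pass through unchanged
  const inner = spec.char + text + spec.char;
  return spec.forced ? '{' + inner + '}' : inner;
}

console.log(wrapMark('bold', 'hi'));      // *hi*
console.log(wrapMark('highlight', 'hi')); // {=hi=}
console.log(wrapMark('insert', 'hi'));    // {+hi+}
```

Keeping the choice in one table makes it easy to compare against the mark-mapping table row by row.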
+
+ ### Escaping
+
+ To honor that round-trip contract, `serializeToCarve` escapes literal Carve
+ syntax in plain text so it parses back as text rather than markup - inline code,
+ links, footnotes, CriticMarkup, mentions/tags/emoji, and an emphasis delimiter
+ appearing inside its own span. Escaping is **contextual**: Carve's flanking rules
+ already make most lone delimiters inert (`price * 2`, intraword `x_1`,
+ `comma,, two`, `C:\path`, `a@b.com`), so those stay clean. The same logic is
+ exposed as `escapeCarve(text)`.
+
+ ## Block elements
+
+ Headings (`#`), bullet / ordered / task lists, blockquotes (`>`), fenced code
+ blocks (`` ``` lang ``), horizontal rules (`---`), tables (with `|=` header
+ cells and `^` / `<` row / column spans), container divs (`::: class`), and
+ definition lists.
+
+ ## Syntax highlighting
+
+ Render Carve source as highlighted HTML on the web. Both grammars cover the
+ Carve token set: headings, lists, tables, blockquotes, fenced/raw blocks,
+ container divs and comments, plus inline emphasis
+ (`*bold*` `/italic/` `_underline_` `~strike~` `=highlight=` `^sup^` `,sub,`),
+ code, links, images, spans, attributes, footnotes, math (`` $`x`$ ``),
+ CriticMarkup (`{+ins+}` `{-del-}`), mentions, tags and emoji. Front matter is
+ the one exception: the Prism grammar highlights it, while the highlight.js
+ grammar intentionally skips it because highlight.js has no document-start
+ anchor (see the note in `highlightjs/carve.js`).
+
+ ### Prism
+
+ The grammar registers itself against the global `Prism`, so `Prism` must be
+ global before the grammar module runs. Because static `import` statements are
+ hoisted (they all evaluate before any top-level assignment), load the grammar
+ with a dynamic `import` after assigning `globalThis.Prism`:
+
+ ```js
+ import Prism from 'prismjs'
+
+ globalThis.Prism = Prism // grammar reads the global Prism
+ await import('carve-grammars/prism/carve.js') // registers Prism.languages.carve
+
+ const html = Prism.highlight(source, Prism.languages.carve, 'carve')
+ ```
+
+ In the browser, load `prismjs` first (it sets the global `Prism`), then load
+ `carve-grammars/prism/carve.js`.
+
+ ### highlight.js
+
+ ```js
+ import hljs from 'highlight.js'
+ import carve from 'carve-grammars/highlightjs/carve.js'
+
+ hljs.registerLanguage('carve', carve)
+ const { value } = hljs.highlight(source, { language: 'carve' })
+ ```
+
+ Loaded as a classic `<script>` after highlight.js, it self-registers against
+ the global `hljs`:
+
+ ```html
+ <script src="highlight.min.js"></script>
+ <script src="node_modules/carve-grammars/highlightjs/carve.js"></script>
+ <script>hljs.highlightAll();</script>
+ ```
+
+ ## API
+
+ - `serializeToCarve(doc)` - serialize an `editor.getJSON()` document to Carve markup.
+ - `escapeCarve(text)` - contextually escape literal Carve syntax in a plain-text run so it round-trips as text (used internally by `serializeToCarve`).
+ - `CarveKit` - the bundled Tiptap extension set.
+ - Individual extensions: `CarveInsert`, `CarveDelete`, `CarveDiv`, `CarveSpan`, `CarveFootnote`, `CarveFootnoteDefinition`, `CarveMath`, `CarveEmbed`, `CarveAbbreviation`, `CarveDefinitionList`.
+
+ ## Attributes, math and footnotes
+
+ - **Attributes** - spans, headings and images serialize an `id` and `class`
+   (and any extra non-structural attrs) as a `{#id .class key="val"}` block, e.g.
+   `[text]{#me .note}`, `![alt](src){.wide}`. Inline attrs trail their target;
+   block attrs (headings) sit on the **preceding** line (strict djot), e.g.
+   `{#slug}` then `# Title`.
+ - **Math** - `CarveMath` (inline atom) serializes to `` $`x`$ `` and, with
+   `display: true`, `` $$`x`$$ ``.
+ - **Footnotes** - `CarveFootnote` is the inline `[^label]` reference;
+   `CarveFootnoteDefinition` is the matching body block, serialized as
+   `[^label]: body`.
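
The output shapes listed above can be sketched with a few string helpers. This is a minimal sketch, not the package's code: `formatAttrs`, `formatMath`, `formatHeading` and `formatFootnoteDefinition` are hypothetical names, and attribute values are assumed to contain no double quotes (no escaping is attempted).

```js
// Hypothetical helpers; none of these names are exported by carve-grammars.
// Assumption: attribute values contain no double quotes.
function formatAttrs({ id, classes = [], extra = {} } = {}) {
  const parts = [];
  if (id) parts.push('#' + id);
  for (const cls of classes) parts.push('.' + cls);
  for (const [key, value] of Object.entries(extra)) {
    parts.push(key + '="' + value + '"');
  }
  return parts.length ? '{' + parts.join(' ') + '}' : '';
}

// Inline math uses one dollar sign per side, display math uses two.
function formatMath(source, display = false) {
  const fence = display ? '$$' : '$';
  return fence + '`' + source + '`' + fence;
}

// Block attributes sit on the line before the heading.
function formatHeading(level, title, attrs) {
  const line = '#'.repeat(level) + ' ' + title;
  const block = formatAttrs(attrs);
  return block ? block + '\n' + line : line;
}

function formatFootnoteDefinition(label, body) {
  return '[^' + label + ']: ' + body;
}

console.log('[text]' + formatAttrs({ id: 'me', classes: ['note'] })); // [text]{#me .note}
console.log(formatMath('x', true));                                   // $$`x`$$
console.log(formatHeading(1, 'Title', { id: 'slug' }));
console.log(formatFootnoteDefinition('n1', 'body'));                  // [^n1]: body
```

Inline attributes are appended to their target, while `formatHeading` places the block on its own preceding line, matching the two placements described in the list.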
+
+ ## Tests
+
+ ```bash
+ npm test
+ ```
+
+ The suite holds all three grammars to one source of truth: the shared corpus
+ from the [`markup-carve/carve`](https://github.com/markup-carve/carve) spec,
+ vendored as the `spec/` git submodule (`git submodule update --init`).
+
+ - `npm run test:coverage` - the coverage matrix. Each grammar (prism,
+   highlightjs, tiptap) declares a covered-category set and a skip set (with a
+   reason per skip); the test fails if the two do not partition every corpus
+   category, so a new spec category forces a deliberate decision.
+ - `npm run test:snapshot` - golden token snapshots. Each covered `.crv` is
+   tokenized with Prism's and highlight.js's own tokenizers and the token stream
+   (type + text) is compared against a committed golden in `tests/snapshots/`.
+   Refresh intended changes with `npm run snapshots:update`.
+ - `npm run test:roundtrip` - the Tiptap serializer round-trip. Each covered
+   `.crv` runs `parse -> ProseMirror JSON -> serializeToCarve -> parse` and the
+   two parsed ASTs must be identical, catching serializer drift. Categories the
+   serializer cannot represent are skipped with a reason.
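
The partition rule behind the coverage matrix can be sketched as follows. This is a hypothetical check, not the repository's test code: `checkPartition` is an invented name, and the category names in the usage lines are made up for illustration.

```js
// Hypothetical partition check: every corpus category must be in exactly one
// of "covered" or "skipped", and every skip needs a reason.
// `skipped` maps category name -> reason string.
function checkPartition(categories, covered, skipped) {
  const problems = [];
  const coveredSet = new Set(covered);
  const skippedNames = new Set(Object.keys(skipped));

  for (const name of categories) {
    const inCovered = coveredSet.has(name);
    const inSkipped = skippedNames.has(name);
    if (inCovered && inSkipped) problems.push(name + ': both covered and skipped');
    if (!inCovered && !inSkipped) problems.push(name + ': undecided');
  }

  // Entries that name no corpus category are stale declarations.
  const known = new Set(categories);
  for (const name of [...coveredSet, ...skippedNames]) {
    if (!known.has(name)) problems.push(name + ': not a corpus category');
  }

  for (const [name, reason] of Object.entries(skipped)) {
    if (!reason) problems.push(name + ': skip without a reason');
  }
  return problems;
}

// Made-up category names, for illustration only.
const categories = ['headings', 'tables', 'citations'];
console.log(checkPartition(categories, ['headings', 'tables'], {}));
// [ 'citations: undecided' ]
console.log(checkPartition(categories, ['headings', 'tables'], { citations: 'no editor node' }));
// []
```

Returning the list of problems instead of throwing on the first one lets a test report every undecided category at once.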
+
+ `npm test` runs all of the above plus the structural grammar and serializer
+ unit tests. CI runs the same on Node 18, 20 and 22.
package/highlightjs/carve.js ADDED
@@ -0,0 +1,456 @@
+ /**
+  * Carve language definition for highlight.js
+  *
+  * Carve is a Djot-derived markup language with distinct inline delimiters:
+  * emphasis is /text/ (not _text_), underline is _text_, strikethrough is
+  * ~text~ (Djot uses ~ for subscript), subscript is ,text, and highlight is
+  * =text= (Djot uses {=text=}). Strong (*text*), superscript (^text^),
+  * insert ({+text+}) and delete ({-text-}) match Djot.
+  *
+  * This file is a UMD module so it works in every documented integration:
+  *
+  * - ESM: `import carve from 'carve-grammars/highlightjs/carve.js'` (resolved to
+  *   the carve.mjs shim via the package `exports` map), then
+  *   `hljs.registerLanguage('carve', carve)`.
+  * - Classic `<script src=".../highlightjs/carve.js">` after highlight.js: it
+  *   self-registers against the global `hljs` (and exposes `globalThis.carveHljs`).
+  * - CommonJS contexts that load this file as CommonJS get the factory on
+  *   `module.exports`.
+  *
+  * A top-level `export default` is intentionally NOT used: that would be a
+  * syntax error when the file is loaded as a classic browser script.
+  *
+  * @see https://github.com/markup-carve/carve for the Carve specification
+  */
+ (function (root, factory) {
+   var carve = factory();
+   if (typeof module === 'object' && module.exports) {
+     module.exports = carve;
+   }
+   if (root) {
+     // Exposed for the ESM shim (carve.mjs) and for classic <script> use.
+     root.carveHljs = carve;
+     if (root.hljs && typeof root.hljs.registerLanguage === 'function') {
+       root.hljs.registerLanguage('carve', carve);
+     }
+   }
+ }(typeof globalThis !== 'undefined' ? globalThis : this, function () {
+   'use strict';
+   /**
+    * @param {object} [hljs] - the highlight.js instance (unused, kept for the
+    *   standard language-definition signature).
+    * @returns {object} a highlight.js language definition.
+    */
+   return function carve(hljs) {
+     // Block attributes: {.class #id key=value} or boolean {reversed}
+     // Excludes special inline syntax like {= {+ {- {%
+     const ATTRIBUTE = {
+       className: 'attr',
+       begin: /\{(?![=+\-%])[^}]+\}/,
+       relevance: 5,
+     };
+
+     // Headings: # to ######
+     const HEADING = {
+       className: 'section',
+       begin: /^#{1,6}\s/,
+       end: /$/,
+       relevance: 10,
+     };
+
+     // Emphasis (Carve): /text/ - the begin guard avoids URLs and paths
+     // (a/b, ://); the end is a closing slash not followed by word char/slash.
+     const EMPHASIS = {
+       className: 'emphasis',
+       begin: /(?<![\w:/])\/(?=\S)/,
+       end: /\/(?![\w/])/,
+       relevance: 0,
+     };
+
+     // Underline (Carve): _text_ - not in the middle of words
+     const UNDERLINE = {
+       className: 'emphasis',
+       begin: /(?<!\w)_(?!\s)/,
+       end: /_(?!\w)/,
+       relevance: 0,
+     };
+
+     // Strong: *text* - not in the middle of words, can contain emphasis.
+     // Excludes *[ which is abbreviation-definition syntax.
+     const STRONG = {
+       className: 'strong',
+       begin: /(?<!\w)\*(?![\s\[])/,
+       end: /\*(?!\w)/,
+       relevance: 0,
+       contains: [EMPHASIS, UNDERLINE],
+     };
+
+     // Highlight (Carve): =text= (single-char; intraword as {=text=})
+     const HIGHLIGHT = {
+       className: 'addition',
+       begin: /(?<![=\w])=(?=\S)/,
+       end: /=(?![=\w])/,
+       relevance: 3,
+     };
+
+     // Insert: {+text+}
+     const INSERT = {
+       className: 'addition',
+       begin: /\{\+/,
+       end: /\+\}/,
+       relevance: 5,
+     };
+
+     // Delete: {-text-}
+     const DELETE = {
+       className: 'deletion',
+       begin: /\{-/,
+       end: /-\}/,
+       relevance: 5,
+     };
+
+     // Strikethrough (Carve): ~text~ (Djot uses ~ for subscript instead)
+     const STRIKETHROUGH = {
+       className: 'deletion',
+       begin: /(?<!\w)~(?=\S)/,
+       end: /~(?!\w)/,
+       relevance: 2,
+     };
+
+     // Subscript (Carve): ,text, (single-char; intraword as {,text,})
+     const SUBSCRIPT = {
+       className: 'built_in',
+       begin: /(?<!\w),(?=\S)/,
+       end: /,(?!\w)/,
+       relevance: 3,
+     };
+
+     // Superscript: ^text^
+     const SUPERSCRIPT = {
+       className: 'built_in',
+       begin: /\^(?!\s)/,
+       end: /\^/,
+       relevance: 2,
+     };
+
+     // Math: $$`...`$$ (display) and $`...`$ (inline). Must precede the inline
+     // code modes - the leading $ keeps them from matching, but order is clearer.
+     const MATH_DISPLAY = {
+       className: 'string',
+       begin: /\$\$`+/,
+       end: /`+\$\$/,
+       relevance: 5,
+     };
+     const MATH_INLINE = {
+       className: 'string',
+       begin: /\$`+/,
+       end: /`+\$/,
+       relevance: 5,
+     };
+
+     // Inline code: `code` or ``code``. highlight.js has no begin->end
+     // backreference to match fence widths, so handle the two common widths
+     // explicitly - double backticks first, so an embedded single backtick
+     // (``a ` b``) does not close the span early.
+     const INLINE_CODE_DOUBLE = {
+       className: 'code',
+       begin: /``/,
+       end: /``/,
+       relevance: 0,
+     };
+     const INLINE_CODE_SINGLE = {
+       className: 'code',
+       begin: /`/,
+       end: /`/,
+       relevance: 0,
+     };
+
+     // Inline links: [text](url) with optional trailing attributes
+     const LINK = {
+       className: 'link',
+       begin: /\[[^\]]*\]\([^)]*\)(\{[^}]+\})?/,
+       relevance: 5,
+     };
+
+     // Autolinks: <https://...> or <mailto:...>
+     const AUTOLINK = {
+       className: 'link',
+       begin: /<(?:https?:\/\/|mailto:)[^>]+>/,
+       relevance: 5,
+     };
+
+     // Email autolinks: <user@example.com>
+     const EMAIL_AUTOLINK = {
+       className: 'link',
+       begin: /<[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}>/,
+       relevance: 5,
+     };
+
+     // Images: ![alt](url) with optional trailing attributes
+     const IMAGE = {
+       className: 'link',
+       begin: /!\[[^\]]*\]\([^)]*\)(\{[^}]+\})?/,
+       relevance: 5,
+     };
+
+     // Reference links: [text][ref] with optional trailing attributes
+     const REFERENCE_LINK = {
+       className: 'link',
+       begin: /\[[^\]]+\]\[[^\]]*\](\{[^}]+\})?/,
+       relevance: 5,
+     };
+
+     // Spans with attributes: [text]{.class} or [text]{#id}
+     const SPAN = {
+       className: 'string',
+       begin: /\[[^\]]+\]\{[^}]+\}/,
+       relevance: 5,
+     };
+
+     // Reference definitions: [ref]: url
+     const REFERENCE_DEF = {
+       className: 'symbol',
+       begin: /^\[[^\]^\]]+\]:/,
+       end: /$/,
+       relevance: 10,
+     };
+
+     // Footnote references: [^note]
+     const FOOTNOTE_REF = {
+       className: 'symbol',
+       begin: /\[\^[^\]]+\]/,
+       relevance: 5,
+     };
+
+     // Citations (Tier-2 §22): [@key], [+@key], [@key, p.10], [@a; @b]
+     // A bracket whose content holds at least one `@key` with no trailing
+     // `(url)`, `[ref]`, or `{attrs}` suffix. The trailing negative lookahead
+     // rejects those suffixes; SPAN and REFERENCE_LINK also come first in the
+     // contains array to claim the suffixed forms.
+     const CITATION = {
+       className: 'symbol',
+       begin: /\[\+?(?:[^\]@]*@[A-Za-z0-9_][A-Za-z0-9_.:#$%&+?<>~\/-]*[^\]]*)\](?!\(|\[|\{)/,
+       relevance: 8,
+     };
+
+     // Code callouts (Tier-2 §10): <n> markers trailing a code-fence line or
+     // leading a callout-list item.
+     const CODE_CALLOUT = {
+       className: 'symbol',
+       begin: /<\d+>/,
+       relevance: 5,
+     };
+
+     // Footnote definitions: [^note]: content
+     const FOOTNOTE_DEF = {
+       className: 'symbol',
+       begin: /^\[\^[^\]]+\]:/,
+       end: /$/,
+       relevance: 10,
+     };
+
+     // Abbreviation definitions: *[ABBR]: text
+     const ABBREVIATION_DEF = {
+       className: 'symbol',
+       begin: /^\*\[[^\]]+\]:/,
+       end: /$/,
+       relevance: 10,
+     };
+
+     // Blockquotes: > text
+     const BLOCKQUOTE = {
+       className: 'quote',
+       begin: /^>/,
+       end: /$/,
+       relevance: 0,
+     };
+
+     // Horizontal rules: --- or *** or ___
+     const HORIZONTAL_RULE = {
+       className: 'meta',
+       begin: /^(-{3,}|\*{3,}|_{3,})$/,
+       relevance: 10,
+     };
+
+     // Bullet list items: - or *
+     const LIST_BULLET = {
+       className: 'bullet',
+       begin: /^[ \t]*[-*](?=\s)/,
+       relevance: 0,
+     };
+
+     // Numbered list items: decimal (1.), alpha (a. A.), roman (i. I.)
+     const LIST_NUMBER = {
+       className: 'bullet',
+       begin: /^[ \t]*(\d+[.)]|[a-zA-Z][.)]|[ivxlcdmIVXLCDM]+[.)])(?=\s)/,
+       relevance: 0,
+     };
+
+     // Task list items: - [ ] or - [x]
+     const TASK_LIST = {
+       className: 'bullet',
+       begin: /^[ \t]*[-*]\s\[[ xX_]\]/,
+       relevance: 5,
+     };
+
+     // Definition list terms: : term
+     const DEFINITION_TERM = {
+       className: 'title',
+       begin: /^: /,
+       end: /$/,
+       relevance: 5,
+     };
+
+     // Code fence opening: ``` or ~~~ with optional language
+     const CODE_FENCE_START = {
+       className: 'keyword',
+       begin: /^[`~]{3,}\s*[a-zA-Z]*$/,
+       relevance: 10,
+     };
+
+     // Code fence closing: ``` or ~~~
+     const CODE_FENCE_END = {
+       className: 'keyword',
+       begin: /^[`~]{3,}$/,
+       relevance: 10,
+     };
+
+     // Div block opening: ::: with optional class
+     const DIV_BLOCK_START = {
+       className: 'keyword',
+       begin: /^:{3,}\s*\w*$/,
+       relevance: 10,
+     };
+
+     // Div block closing: :::
+     const DIV_BLOCK_END = {
+       className: 'keyword',
+       begin: /^:{3,}$/,
+       relevance: 10,
+     };
+
+     // Inline comments: {% comment %}
+     const INLINE_COMMENT = {
+       className: 'comment',
+       begin: /\{%/,
+       end: /%\}/,
+       relevance: 5,
+     };
+
+     // Table separator: |---|---|
+     const TABLE_SEPARATOR = {
+       className: 'meta',
+       begin: /^\|[-:| ]+\|$/,
+       relevance: 5,
+     };
+
+     // Line blocks: | text (for poetry) - must precede TABLE_ROW
+     const LINE_BLOCK = {
+       className: 'string',
+       begin: /^\| /,
+       end: /$/,
+       relevance: 3,
+     };
+
+     // Table rows: | cell | cell |
+     const TABLE_ROW = {
+       className: 'string',
+       begin: /^\|/,
+       end: /\|(\{[^}]*\})?$/,
+       relevance: 2,
+     };
+
+     // Captions: ^ caption text
+     const CAPTION = {
+       className: 'title',
+       begin: /^\^ /,
+       end: /$/,
+       relevance: 5,
+     };
+
+     // Raw format marker: {=html} or {=latex}
+     const RAW_FORMAT = {
+       className: 'meta',
+       begin: /\{=[a-zA-Z]+\}/,
+       relevance: 5,
+     };
+
+     // Escaped characters: \* \[ etc
+     const ESCAPE = {
+       className: 'symbol',
+       begin: /\\[!"#$%&'()*+,.\/:;<=>?@\[\\\]^_`{|}~-]/,
+       relevance: 0,
+     };
+
+     // Hard line break: \ at end of line
+     const HARD_BREAK = {
+       className: 'meta',
+       begin: /\\$/,
+       relevance: 2,
+     };
+
+     return {
+       name: 'Carve',
+       aliases: ['carve'],
+       case_insensitive: false,
+       contains: [
+         // NOTE: front matter is intentionally NOT highlighted. It is valid
+         // only at the very top of the document, but highlight.js has no
+         // document-start anchor, so a `^---$` begin would also match a bare
+         // `---` horizontal rule mid-document and swallow everything up to
+         // the next `---`. The horizontal-rule rule below handles `---`
+         // lines instead. (Prism anchors front matter via `^` with no `m`
+         // flag; see prism/carve.js.)
+
+         // Block-level elements (order matters - more specific first)
+         HEADING,
+         CODE_FENCE_START,
+         CODE_FENCE_END,
+         DIV_BLOCK_START,
+         DIV_BLOCK_END,
+         HORIZONTAL_RULE,
+         TABLE_SEPARATOR,
+         LINE_BLOCK, // Must be before TABLE_ROW (both start with |)
+         TABLE_ROW,
+         BLOCKQUOTE,
+         CAPTION,
+         TASK_LIST, // Must be before LIST_BULLET
+         LIST_BULLET,
+         LIST_NUMBER,
+         DEFINITION_TERM,
+         FOOTNOTE_DEF, // Must be before REFERENCE_DEF
+         ABBREVIATION_DEF, // Must be before REFERENCE_DEF (*[ABBR]: vs [ref]:)
+         REFERENCE_DEF,
+
+         // Inline elements (order matters - more specific first)
+         FOOTNOTE_REF,
+         IMAGE, // Must be before LINK (starts with !)
+         SPAN, // Must be before LINK ([text]{attr} vs [text](url))
+         REFERENCE_LINK, // Must be before LINK ([text][ref] vs [text](url))
+         CITATION, // Must be after SPAN/REF_LINK (no (url)/[ref]/{attr} tail)
+         CODE_CALLOUT, // <n> callout markers
+         LINK,
+         AUTOLINK,
+         EMAIL_AUTOLINK,
+         RAW_FORMAT, // {=html} - must be before INSERT/DELETE braces
+         INSERT, // {+text+}
+         DELETE, // {-text-}
+         INLINE_COMMENT, // {% %} - must be before ATTRIBUTE
+         HIGHLIGHT, // =text=
+         SUBSCRIPT, // ,text,
+         SUPERSCRIPT, // ^text^
+         STRONG,
+         EMPHASIS, // /text/
+         UNDERLINE, // _text_
+         STRIKETHROUGH, // ~text~
+         MATH_DISPLAY, // $$`...`$$ - before inline code (leading $)
+         MATH_INLINE, // $`...`$
+         INLINE_CODE_DOUBLE, // ``code`` - before single
+         INLINE_CODE_SINGLE, // `code`
+         ATTRIBUTE,
+         ESCAPE,
+         HARD_BREAK,
+       ],
+     };
+   };
+ }));
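
The begin guards in the grammar above can be exercised on their own to see which delimiters they treat as openers. The sketch below copies three `begin` patterns from the file and runs them through plain `RegExp.prototype.test`; it does not load highlight.js, the `opens` helper and `GUARDS` table are names invented here, and these guards are the highlighter's heuristics rather than the Carve parser's flanking rules.

```js
// Begin patterns copied from the EMPHASIS, UNDERLINE and STRONG modes above.
// GUARDS and opens() are illustrative names, not part of the grammar file.
const GUARDS = {
  emphasis: /(?<![\w:/])\/(?=\S)/,
  underline: /(?<!\w)_(?!\s)/,
  strong: /(?<!\w)\*(?![\s\[])/,
};

// True when the text contains at least one position the mode could open at.
function opens(mode, text) {
  return GUARDS[mode].test(text);
}

console.log(opens('emphasis', '/italic/'));            // true
console.log(opens('emphasis', 'a/b'));                 // false: slash follows a word char
console.log(opens('emphasis', 'https://example.com')); // false: slashes follow ':' or '/'
console.log(opens('strong', '*bold*'));                // true
console.log(opens('strong', 'price * 2'));             // false: star is followed by a space
console.log(opens('strong', '*[HTML]: text'));         // false: '*[' is an abbreviation definition
console.log(opens('underline', 'x_1'));                // false: intraword underscore
```

The patterns carry no `g` flag, so repeated `test` calls do not depend on `lastIndex`. The lookbehind assertions need a JavaScript engine with ES2018 regular-expression support.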