red_quilt 0.7.1 → 0.7.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 97ab1d8ff3dcb3278403b6f85fba5c49bfef8fa9fa2aceac979873fd580260ba
4
- data.tar.gz: b604fecab6bf8f3e3ab06a768cc3625e7adbcca3f256334efcf7c9c2e1c4fcd8
3
+ metadata.gz: 689739c1f6cf971cbeaacdbb65b50fcf7600cecef38bc25aa607ab5e87a559e5
4
+ data.tar.gz: 2d57a5a5993bfb8352c18ec8c6da5dc06169d59198e72870399c85d4cc1e1769
5
5
  SHA512:
6
- metadata.gz: 2709412545b3b9c28752f6da004781bcf628874db68eea0753231951f3c73a89a5bdadf9250e7940db77f1e8d4d5dca89bd4ed04c428c4f764fcdb7d278baf85
7
- data.tar.gz: 27568989536531184a37814fd84b81961887ddfc4de9de3cc05de6135930833a06578cb36c8ffff3f9f95abd6dd7c4ad6894281b7f72a2f8e33ef65fb583e44c
6
+ metadata.gz: b3f3b90749d7307db9121d9f1af968a72e94cad5aa06a3f9cf54505ff047e482246f5aa2c0fa14f1b74d9b3d03b516629b3f1c4e57e5ecb6cf5d171d4bc2b6f6
7
+ data.tar.gz: f8a8929d8075e0c9e3ec32145daac57ca4565ff299bed3f119faee01f5253727d44acea4dacac579ee676715138e4f03f16e0555544b4b25ee1e30c779db1de5
data/CHANGELOG.md CHANGED
@@ -5,6 +5,25 @@ All notable changes to this project are documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [Unreleased]
9
+
10
+ ### Added
11
+
12
+ - Opt-in Mermaid diagram support via the `mermaid:` option on `render_html` /
13
+ `Document#to_html` and the `--mermaid` CLI flag (off by default). Fenced
14
+ ` ```mermaid ` code blocks render as `<pre class="mermaid">` containers; in
15
+ standalone output the mermaid.js runtime is loaded from a CDN and each
16
+ diagram is made interactive (wheel zoom, drag pan, +/-/reset controls) with
17
+ svg-pan-zoom.
18
+
19
+ ## [0.7.1] - 2026-06-06
20
+
21
+ ### Added
22
+
23
+ - `--open` CLI flag: render the Markdown to a standalone HTML file and open it
24
+ in the default browser (forces `--standalone`; writes under `Dir.tmpdir`
25
+ when `-o` is not given).
26
+
8
27
  ## [0.7.0] - 2026-05-29
9
28
 
10
29
  ### Added
data/README.md CHANGED
@@ -75,14 +75,31 @@ doc.diagnostics.first.severity # => :warning
75
75
  ### Heading anchors (opt-in)
76
76
 
77
77
  `render_html` / `to_html` accept `heading_ids:` to give every heading a
78
- slugified `id` for anchor links. Slugs follow GitHub's scheme but keep Unicode
79
- intact, so Japanese headings stay readable; duplicates get `-1`, `-2` suffixes.
78
+ slugified `id` for anchor links.
80
79
 
81
80
  ```ruby
82
81
  RedQuilt.render_html("# Hello World\n\n## はじめに", heading_ids: true)
83
82
  # => "<h1 id=\"hello-world\">Hello World</h1>\n<h2 id=\"はじめに\">はじめに</h2>\n"
84
83
  ```
85
84
 
85
+ ### Mermaid diagrams (opt-in)
86
+
87
+ `render_html` / `to_html` accept `mermaid:` to render ` ```mermaid ` fenced
88
+ code blocks for [mermaid.js](https://mermaid.js.org/).
89
+ In standalone output the mermaid.js runtime is also loaded from a CDN.
90
+
91
+ ```ruby
92
+ RedQuilt.render_html("```mermaid\ngraph LR\n A --> B\n```", mermaid: true)
93
+ # => "<pre class=\"mermaid\">graph LR\n A --&gt; B\n</pre>\n"
94
+
95
+ # Full page that renders the diagram in a browser (CDN script included):
96
+ RedQuilt.parse("```mermaid\ngraph LR\n A --> B\n```")
97
+ .to_html(standalone: true, mermaid: true)
98
+ ```
99
+
100
+ In standalone output each diagram is made interactive with
101
+ [svg-pan-zoom](https://github.com/bumbu/svg-pan-zoom) (loaded from a CDN).
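The README example above shows that a `mermaid` fence becomes a `<pre class="mermaid">` container with its body HTML-escaped. A minimal sketch of that transformation, assuming a hypothetical `render_fenced_block` helper (it is not RedQuilt's renderer, and the non-mermaid branch is only a plausible fallback):

```ruby
# Hypothetical helper for illustration; RedQuilt's real renderer is not shown in this diff.
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Render the body of a fenced code block. `info` is the fence's info string.
def render_fenced_block(info, body, mermaid: false)
  escaped = body.gsub(/[&<>"]/, HTML_ESCAPES)
  if mermaid && info == "mermaid"
    # mermaid.js looks for elements with class "mermaid" and reads their text.
    %(<pre class="mermaid">#{escaped}</pre>\n)
  else
    # Assumed fallback: an ordinary highlighted code block.
    %(<pre><code class="language-#{info}">#{escaped}</code></pre>\n)
  end
end

print render_fenced_block("mermaid", "graph LR\n A --> B\n", mermaid: true)
# prints:
# <pre class="mermaid">graph LR
#  A --&gt; B
# </pre>
```

Escaping the body matters because diagram source routinely contains `>` and `<` in arrows; the browser decodes the entities again before mermaid.js reads the text.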
102
+
86
103
  ### Tilt integration
87
104
 
88
105
  RedQuilt ships a [Tilt](https://github.com/jeremyevans/tilt) adapter.
@@ -100,13 +117,13 @@ Native options (`allow_html:`, `footnotes:`, …) pass straight through; Tilt's
100
117
  ## Documentation
101
118
 
102
119
  - [API reference](docs/api.md) — `Document` / `NodeRef` / `SourceSpan`, supported syntax, and usage examples
103
- - [Architecture overview](docs/architecture.ja.md) (日本語)
104
- - [Arena usage guide](docs/arena-usage.ja.md) (日本語)
105
- - [CommonMark conformance notes](docs/commonmark-conformance.ja.md) (日本語)
120
+ - [Architecture overview](docs/architecture.md) ([日本語](docs/architecture.ja.md))
121
+ - [Arena usage guide](docs/arena-usage.md) ([日本語](docs/arena-usage.ja.md))
122
+ - [CommonMark conformance notes](docs/commonmark-conformance.md) ([日本語](docs/commonmark-conformance.ja.md))
106
123
 
107
124
  ## CommonMark Compatibility
108
125
 
109
- RedQuilt achieves 100% compliance with the CommonMark v0.31.2 specification.
126
+ RedQuilt achieves 100% compliance with the CommonMark v0.31.2 test cases.
110
127
  See the [conformance notes](docs/commonmark-conformance.ja.md) for GFM
111
128
  extensions and intentional deviations.
112
129
 
@@ -143,6 +160,9 @@ redquilt -o output.html input.md
143
160
 
144
161
  # Render and open the result in the default browser
145
162
  redquilt --open input.md
163
+
164
+ # Render mermaid code blocks as diagrams (loads mermaid.js from a CDN)
165
+ redquilt --mermaid --open input.md
146
166
  ```
147
167
 
148
168
  ### Options
@@ -164,6 +184,8 @@ redquilt --open input.md
164
184
  --open Write HTML to a file and open it in the default
165
185
  browser (forces --standalone; uses a file under
166
186
  Dir.tmpdir when -o is not given)
187
+ --mermaid Render `mermaid` code blocks as diagrams (loads
188
+ mermaid.js from a CDN in standalone output)
167
189
  --diagnostics Print diagnostics to stderr
168
190
  --diagnostics-only Print diagnostics only (suppress output)
169
191
  -h, --help Show help
@@ -216,7 +238,9 @@ RedQuilt.render_html(user_markdown, allow_html: true)
216
238
  bundle exec rake spec
217
239
  ```
218
240
 
219
- Runs 70+ CommonMark compatibility and feature tests.
241
+ Runs the full CommonMark 0.31.2 conformance suite (all 652 official examples,
242
+ parsed directly from `spec/fixtures/cmark_spec-0.31.2.md`) plus RedQuilt's own
243
+ feature tests — 1000 examples in total.
220
244
 
221
245
  ### Benchmark
222
246
 
data/Rakefile CHANGED
@@ -6,3 +6,34 @@ require "rspec/core/rake_task"
6
6
  RSpec::Core::RakeTask.new(:spec)
7
7
 
8
8
  task default: :spec
9
+
10
+ desc "Report CommonMark spec conformance, broken down by section"
11
+ task :conformance do
12
+ require_relative "lib/red_quilt"
13
+ require_relative "spec/support/commonmark_spec_loader"
14
+
15
+ examples = CommonMarkSpecLoader.examples
16
+ sections = examples.each_with_object({}) do |example, acc|
17
+ stats = acc[example[:section]] ||= { pass: 0, total: 0 }
18
+ stats[:total] += 1
19
+ actual = RedQuilt.parse(example[:markdown], allow_html: true).to_html
20
+ stats[:pass] += 1 if actual == example[:html]
21
+ end
22
+
23
+ total = { pass: sections.values.sum { |s| s[:pass] }, total: examples.size }
24
+ width = sections.keys.map(&:length).max
25
+ divider = " #{'-' * width} ---------- -------"
26
+
27
+ puts "CommonMark #{CommonMarkSpecLoader::VERSION} conformance", ""
28
+ puts format(" %-#{width}s %-10s %s", "Section", "Pass/Total", "Rate")
29
+ puts divider
30
+ sections.each do |name, stats|
31
+ rate = stats[:pass].fdiv(stats[:total]) * 100
32
+ puts format(" %-#{width}s %4d / %4d %5.1f%%", name, stats[:pass], stats[:total], rate)
33
+ end
34
+ puts divider
35
+ rate = total[:pass].fdiv(total[:total]) * 100
36
+ puts format(" %-#{width}s %4d / %4d %5.1f%%", "TOTAL", total[:pass], total[:total], rate)
37
+
38
+ abort("\nConformance is below 100%.") unless total[:pass] == total[:total]
39
+ end
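The `conformance` task above tallies pass/total per spec section and prints one formatted row per section. The same tallying and formatting can be tried without the gem; the sketch below uses made-up results and hypothetical helper names (`tally`, `report_lines`), and its column spacing is an assumption rather than the task's exact layout:

```ruby
# Made-up results. The real task compares rendered HTML with the spec's expected HTML.
results = [
  { section: "Tabs", pass: true },
  { section: "Tabs", pass: true },
  { section: "Emphasis and strong emphasis", pass: true },
  { section: "Emphasis and strong emphasis", pass: false },
]

# Group results by section, counting passes and totals.
def tally(results)
  results.each_with_object({}) do |result, acc|
    stats = acc[result[:section]] ||= { pass: 0, total: 0 }
    stats[:total] += 1
    stats[:pass] += 1 if result[:pass]
  end
end

# One row per section, left-aligned to the longest section name.
def report_lines(sections)
  width = sections.keys.map(&:length).max
  sections.map do |name, stats|
    rate = stats[:pass].fdiv(stats[:total]) * 100
    format("%-#{width}s %4d / %4d %5.1f%%", name, stats[:pass], stats[:total], rate)
  end
end

puts report_lines(tally(results))
```

`Integer#fdiv` avoids integer division, so a section with 1 of 2 passing reports 50.0 rather than 0.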
data/docs/api.md CHANGED
@@ -23,6 +23,10 @@ doc.disallow_raw_html? # Check GFM disallowed-raw-HTML filtering setting
23
23
  doc.to_html(standalone: true, theme: :default, title: "My Doc", lang: "en")
24
24
  # theme: :default (compact, dark-mode-aware stylesheet) or :none (bare).
25
25
  # css: "style.css" links an external stylesheet instead.
26
+
27
+ # Render `mermaid` code blocks as <pre class="mermaid"> diagrams; in
28
+ # standalone mode the mermaid.js runtime is loaded from a CDN too.
29
+ doc.to_html(standalone: true, mermaid: true)
26
30
  ```
27
31
 
28
32
  ## NodeRef (AST node wrapper)
data/docs/architecture.ja.md CHANGED
@@ -35,7 +35,7 @@ Source (Markdown String)
35
35
  ## 各ステージの責務
36
36
 
37
37
  ### `RedQuilt.normalize_input`
38
- CommonMark§2.3/2.4の最小前処理。`\r\n`/`\r`→`\n`の行末正規化と、NUL→U+FFFDの置換だけを行う。
38
+ - CommonMark§2.3/2.4の最小前処理。`\r\n`/`\r`→`\n`の行末正規化と、NUL→U+FFFDの置換だけを行う。
39
39
 
40
40
  ### BlockParser
41
41
  - 行分割: sourceを`Line` Struct配列へ。各行はbyte spanで保持する。
data/docs/architecture.md ADDED
@@ -0,0 +1,99 @@
1
+ # RedQuilt Architecture Overview
2
+
3
+ This document gives a high-level view of how RedQuilt is structured.
4
+
5
+ ## Pipeline
6
+
7
+ ```
8
+ Source (Markdown String)
9
+
10
+ ▼ RedQuilt.normalize_input (lib/red_quilt.rb)
11
+
12
+ ▼ BlockParser (lib/red_quilt/block_parser.rb)
13
+ │ dispatch / container parsers / build_lines
14
+ │ (list.rb, blockquote.rb, reference_definition.rb)
15
+
16
+ ▼ Arena (raw inline spans)
17
+ │ The body of each paragraph / heading / table cell is kept
18
+ │ as a byte span or a str1 literal.
19
+
20
+ ▼ InlinePass (lib/red_quilt/inline_pass.rb)
21
+ │ ├─ Inline::Lexer (lib/red_quilt/inline/lexer.rb)
22
+ │ │ byte scan -> Tokens (parallel array)
23
+ │ └─ Inline::Builder (lib/red_quilt/inline/builder.rb)
24
+ │ linear pass -> process_emphasis (CommonMark §6.2)
25
+
26
+ ▼ Arena (inline resolved)
27
+
28
+ ▼ (option) FootnotePass (footnotes: true)
29
+ ▼ (option) ExtendedAutolinkPass (extended_autolinks: true)
30
+ ▼ (option) LintPass (lint: true)
31
+
32
+ ▼ Renderer::HTML (lib/red_quilt/renderer/html.rb)
33
+ walk the arena and append to a mutable String
34
+ ```
35
+
36
+ ## Responsibility of each stage
37
+
38
+ ### `RedQuilt.normalize_input`
39
+ - Minimal preprocessing required by CommonMark §2.3 / §2.4. It only normalizes
40
+ line endings (`\r\n` / `\r` -> `\n`) and replaces NUL with U+FFFD.
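The two normalizations described above are small enough to sketch directly. This is an illustrative stand-in under the stated rules (the function name `normalize_markdown_input` is hypothetical), not the body of `RedQuilt.normalize_input`:

```ruby
# Hypothetical stand-in for the preprocessing step; not RedQuilt's source.
def normalize_markdown_input(source)
  source
    .gsub(/\r\n?/, "\n")       # CRLF and a lone CR both become LF
    .gsub("\u0000", "\uFFFD")  # NUL becomes U+FFFD REPLACEMENT CHARACTER
end

p normalize_markdown_input("a\r\nb\rc\u0000d") == "a\nb\nc\uFFFDd"  # prints true
```

Matching `\r\n?` in one pass keeps a CRLF pair from turning into two line feeds.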
41
+
42
+ ### BlockParser
43
+ - Line splitting: turn the source into an array of `Line` structs. Each line is
44
+ kept as a byte span.
45
+ - Dispatch: decide the block kind from the first byte of the line
46
+ (`paragraph_only_line?` quickly routes non-block lines).
47
+ - Container delegation: lists and blockquotes are delegated to `List::Parser`
48
+ and `Blockquote::Parser`, which call `parse_lines` recursively.
49
+ - Collecting and excluding definitions: link reference definitions (the
50
+ reference table) and opt-in footnote definitions (`FootnoteRegistry`) are
51
+ pulled out of the body flow and gathered in dedicated collectors.
52
+ - Column calculation: indentation that includes tab expansion is delegated to
53
+ `Indentation`.
54
+ - Output: build block nodes in the Arena, with inline content still unresolved.
55
+
56
+ ### InlinePass / Inline::Lexer / Inline::Builder
57
+ - Target selection: scan and process each inline target (paragraph / heading /
58
+ table cell).
59
+ - Lexer: scan the target's byte span, or the range of a str1 literal, into
60
+ Tokens (a parallel array).
61
+ - Builder, step 1 (linear_pass): resolve code spans, links, images, autolinks,
62
+ and simple inlines.
63
+ - Builder, step 2 (process_emphasis): collapse the delimiter stack to finalize
64
+ emphasis / strong / strikethrough (CommonMark §6.2; strikethrough is a GFM
65
+ extension).
66
+ - Footnote references: resolve `[^label]` through `FootnoteRegistry`, number
67
+ them in first-reference order, and create a `FOOTNOTE_REFERENCE`.
68
+
69
+ ### FootnotePass (`footnotes: true`)
70
+ - Reordering: sort the definitions under `FOOTNOTES_SECTION` (at the end of the
71
+ root) into first-reference order.
72
+ - Pruning: detach unreferenced definitions.
73
+ - Section removal: if there are no references at all, remove the section itself.
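The three FootnotePass steps above (reorder, prune, remove) amount to filtering the definitions by the order in which labels are first referenced. A minimal sketch over plain label strings, assuming a hypothetical `order_footnotes` helper rather than the real arena-based pass:

```ruby
# definitions: footnote labels in source order.
# references:  labels in the order `[^label]` references appear in the body.
# Returns the referenced definitions in first-reference order.
def order_footnotes(definitions, references)
  references.uniq.select { |label| definitions.include?(label) }
end

ordered = order_footnotes(%w[a b c], %w[c a c])
p ordered           # prints ["c", "a"]; "b" is pruned because nothing references it
p ordered.empty?    # prints false; an empty result would mean "remove the section"
```

In this sketch a footnote's number is simply its position in the result plus one, which matches the first-reference numbering the Builder section describes.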
74
+
75
+ ### Renderer::HTML
76
+ - Walk: walk the arena recursively and append directly with `<<` to a mutable
77
+ String opened with `+""`.
78
+ - Raw HTML: `allow_html` switches between passing HTML through and escaping it;
79
+ `disallow_raw_html` filters HTML using GFM "Disallowed Raw HTML".
80
+ - Footnotes: render `FOOTNOTE_REFERENCE` as a sup link, and the trailing
81
+ `FOOTNOTES_SECTION` as `<section class="footnotes">` with backrefs.
82
+
83
+ ## Where the main subsystems live
84
+
85
+ | Area | Files |
86
+ |---|---|
87
+ | Entry point / input normalization | `lib/red_quilt.rb` |
88
+ | Public API | `lib/red_quilt/document.rb`, `node_ref.rb` |
89
+ | Arena | `lib/red_quilt/arena.rb` |
90
+ | Block parsing | `block_parser.rb`, `list.rb`, `blockquote.rb`, `indentation.rb` |
91
+ | Reference definitions | `reference_definition.rb` |
92
+ | Footnotes (opt-in) | `footnote_definition.rb`, `footnote_registry.rb`, `footnote_pass.rb` |
93
+ | Inline parsing | `inline.rb`, `inline/lexer.rb`, `inline/tokens.rb`, `inline/flanking.rb`, `inline/builder.rb`, `inline/link_scanner.rb` |
94
+ | Inline entities | `inline/html_entities.rb` |
95
+ | HTML / MDAST output | `renderer/html.rb`, `renderer/mdast.rb` |
96
+ | Extension passes | `inline_pass.rb`, `footnote_pass.rb`, `extended_autolink_pass.rb`, `lint_pass.rb` |
97
+ | Source positions | `source_span.rb`, `source_map.rb` |
98
+ | Diagnostics | `diagnostic.rb` |
99
+ | CLI | `cli.rb`, `exe/redquilt` |
data/docs/arena-usage.ja.md CHANGED
@@ -92,7 +92,7 @@ ArenaはASTを「オブジェクトのツリー」ではなく[parallel array](h
92
92
  - メモリ局所性が良く、GC圧が小さい
93
93
  - ノードを「軽い」値として扱えるのでRenderer / Builderをinline化しやすい
94
94
 
95
- ###列(column)一覧
95
+ #### 列(column)一覧
96
96
 
97
97
  |列名|用途|
98
98
  |------|------|
@@ -183,7 +183,7 @@ Arenaの公開メソッドは以下の3レイヤーに分けて読むと意図
183
183
 
184
184
  各NodeTypeがどのint / strスロットを使うかは規約で決まっています。以下が現在の規約です。
185
185
 
186
- ### Blockノード
186
+ #### Blockノード
187
187
 
188
188
  | NodeType | int1 | int2 | int3 | str1 | str2 |
189
189
  |----------|------|------|------|------|------|
@@ -202,7 +202,7 @@ Arenaの公開メソッドは以下の3レイヤーに分けて読むと意図
202
202
  | `FOOTNOTE_DEFINITION` | - | - | - | 正規化済みlabel | - |
203
203
  | `FOOTNOTES_SECTION` | - | - | - | - | - |
204
204
 
205
- ### Inlineノード
205
+ #### Inlineノード
206
206
 
207
207
  | NodeType | int1 | int2 | int3 | str1 | str2 |
208
208
  |----------|------|------|------|------|------|
@@ -219,7 +219,7 @@ Arenaの公開メソッドは以下の3レイヤーに分けて読むと意図
219
219
 
220
220
  > footnoteは`footnotes: true`時のみ生成されます。`FOOTNOTES_SECTION`はroot直下の最後の子として置かれ(span-less、`source_start: -1`)、参照された`FOOTNOTE_DEFINITION`を初回参照順に保持します。backrefの個数はfootnote番号とlabelからrender時に算出します。
221
221
 
222
- ### Source spanの慣習
222
+ #### Source spanの慣習
223
223
 
224
224
  - `source_start` / `source_len`: 元documentのbytes (絶対byte offset)
225
225
  - `source_start < 0`: spanなし。leafノードでは内容を`str1`にliteralとして持つことが多いが、container inlineは子ノードだけを持つ場合がある。
@@ -334,7 +334,7 @@ arena.update_span(text_id, 0, 12)
334
334
 
335
335
  ## 6. パフォーマンス上の注意
336
336
 
337
- ####ホットパスでは`each_child`を使う
337
+ #### ホットパスでは`each_child`を使う
338
338
 
339
339
  ブロック直yieldでEnumerator allocationを避ける。`child_ids`は外部API用
340
340
 
data/docs/arena-usage.md ADDED
@@ -0,0 +1,423 @@
1
+ # How to use the Arena class
2
+
3
+ `RedQuilt::Arena` is the low-level storage class that holds the actual AST of
4
+ RedQuilt. This document describes its API and assumptions for people who touch
5
+ the Arena directly: block parsers, inline builders, renderers, custom
6
+ transformers, and any other code under `lib/red_quilt`.
7
+
8
+ > If you only need to work with the AST as an external API, the standard path is
9
+ > to go through `RedQuilt::Document` and `RedQuilt::NodeRef`. The Arena is a more
10
+ > internal layer; it is the very data structure that `NodeRef` is built on.
11
+
12
+ ---
13
+
14
+ ## 0. A quick example
15
+
16
+ The code below works if you copy and paste it as is. It should give you a feel
17
+ for how the Arena "builds a tree from a source string using various IDs".
18
+
19
+ ```ruby
20
+ require "red_quilt"
21
+
22
+ source = "Hello *world*"
23
+ arena = RedQuilt::Arena.new(source)
24
+
25
+ # (a) Create nodes. The return value is a node id (Integer).
26
+ para_id = arena.add_node(RedQuilt::NodeType::PARAGRAPH,
27
+ source_start: 0, source_len: source.bytesize)
28
+ text_id = arena.add_node(RedQuilt::NodeType::TEXT,
29
+ source_start: 0, source_len: 6) # "Hello "
30
+ em_id = arena.add_node(RedQuilt::NodeType::EMPHASIS,
31
+ source_start: 6, source_len: 7) # "*world*"
32
+ inner_id = arena.add_node(RedQuilt::NodeType::TEXT,
33
+ source_start: 7, source_len: 5) # "world"
34
+
35
+ # (b) Build parent/child relationships
36
+ arena.append_child(para_id, text_id)
37
+ arena.append_child(para_id, em_id)
38
+ arena.append_child(em_id, inner_id)
39
+
40
+ # (c) Read content back out
41
+ puts "type: #{arena.type_name(para_id)}"
42
+ puts "text: #{arena.text(text_id).inspect}"
43
+ puts "inner: #{arena.text(inner_id).inspect}"
44
+ puts "span: #{arena.source_span(em_id).inspect}"
45
+
46
+ # (d) Iterate over children (block form, no Enumerator)
47
+ puts "children of paragraph:"
48
+ arena.each_child(para_id) do |child_id|
49
+ puts " #{arena.type_name(child_id)}: #{arena.text(child_id).inspect}"
50
+ end
51
+ ```
52
+
53
+ The output looks like this:
54
+
55
+ ```
56
+ type: paragraph
57
+ text: "Hello "
58
+ inner: "world"
59
+ span: #<RedQuilt::SourceSpan:0x... @start_byte=6, @end_byte=13>
60
+ children of paragraph:
61
+ text: "Hello "
62
+ emphasis: "*world*"
63
+ ```
64
+
65
+ The AST that this sample builds looks like this:
66
+
67
+ ```
68
+ PARAGRAPH [0, 13) "Hello *world*"
69
+ ├─ TEXT [0, 6) "Hello "
70
+ └─ EMPHASIS [6, 13) "*world*"
71
+ └─ TEXT [7, 12) "world"
72
+ ```
73
+
74
+ The key points when working with the Arena are:
75
+
76
+ - `add_node` returns a node ID (Integer). Every later API call uses this ID as
77
+ the key.
78
+ - `source_start` / `source_len` specify a range over the original source string
79
+ in bytes, not characters. The Arena does not keep a copy of the string itself.
80
+ - `text(id)` returns str1 if it exists; otherwise it byteslices `source`.
81
+ - `each_child(id)` is the basic traversal API and is used on the hot path.
82
+
83
+ Keep these in mind, and the later sections will read as "what does this API
84
+ actually guarantee, and how should I use it?".
85
+
86
+ ---
87
+
88
+ ## 1. Design highlights
89
+
90
+ The Arena represents the AST not as a "tree of objects" but as a
91
+ [parallel array](https://en.wikipedia.org/wiki/Parallel_array).
92
+
93
+ - Nodes are identified by an integer ID (`node_id`).
94
+ - The attributes for each ID (parent / source span / payload) are kept as
95
+ columns in separate Arrays.
96
+ - Adding a node is just an append to the end of each Array; no new Ruby object is
97
+ created at all.
98
+
99
+ As a result, the Arena has the following properties:
100
+
101
+ - On the hot path you only pass around Integers (IDs).
102
+ - Memory locality is good and GC pressure is low.
103
+ - Nodes can be treated as "lightweight" values, which makes it easy to inline the
104
+ Renderer and Builder.
105
+
106
+ #### List of columns
107
+
108
+ | Column | Purpose |
109
+ |------|------|
110
+ | `@type` | NodeType (an Integer constant) |
111
+ | `@parent` / `@first_child` / `@last_child` / `@next_sibling` / `@prev_sibling` | Parent / child / sibling links. The value is a node id (`NO_NODE` means "none"). |
112
+ | `@source_start` / `@source_len` | Byte range within the document source. `source_start < 0` means "no span". |
113
+ | `@int1` / `@int2` / `@int3` | Integer slots whose meaning depends on the NodeType (default `0`). |
114
+ | `@str1` / `@str2` | String slots whose meaning depends on the NodeType (default `nil`). |
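To make the parallel-array layout concrete, here is a toy arena with a subset of the columns listed above. It is a sketch for illustration only: `TinyArena` is a hypothetical class, far smaller than `RedQuilt::Arena`, and its method bodies are assumptions about how such a structure can work, not the gem's code.

```ruby
# Illustrative only: a toy arena using one Array per column.
class TinyArena
  NO_NODE = -1

  def initialize(source)
    @source = source
    @type = []
    @parent = []
    @first_child = []
    @last_child = []
    @next_sibling = []
    @source_start = []
    @source_len = []
    @str1 = []
  end

  # Append one entry to every column; the new id is the old column length.
  def add_node(type, source_start: -1, source_len: 0, str1: nil)
    id = @type.length
    @type << type
    @parent << NO_NODE
    @first_child << NO_NODE
    @last_child << NO_NODE
    @next_sibling << NO_NODE
    @source_start << source_start
    @source_len << source_len
    @str1 << str1
    id
  end

  # Link child_id as the last child of parent_id.
  def append_child(parent_id, child_id)
    last = @last_child[parent_id]
    if last == NO_NODE
      @first_child[parent_id] = child_id
    else
      @next_sibling[last] = child_id
    end
    @last_child[parent_id] = child_id
    @parent[child_id] = parent_id
  end

  # Yield child ids directly, without building an Enumerator.
  def each_child(id)
    child = @first_child[id]
    while child != NO_NODE
      yield child
      child = @next_sibling[child]
    end
  end

  def type(id)
    @type[id]
  end

  # Prefer the literal in str1; otherwise slice the source by bytes.
  def text(id)
    return @str1[id] if @str1[id]
    return nil if @source_start[id] < 0

    @source.byteslice(@source_start[id], @source_len[id])
  end
end

source = "Hello *world*"
arena = TinyArena.new(source)
para = arena.add_node(:paragraph, source_start: 0, source_len: source.bytesize)
text = arena.add_node(:text, source_start: 0, source_len: 6)
em = arena.add_node(:emphasis, source_start: 6, source_len: 7)
arena.append_child(para, text)
arena.append_child(para, em)
arena.each_child(para) { |id| puts "#{arena.type(id)}: #{arena.text(id).inspect}" }
```

Every node is just an index into these Arrays, so traversal passes Integers around and never allocates a per-node object.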
115
+
116
+ ---
117
+
118
+ ## 2. Invariants
119
+
120
+ These assumptions always hold when you work with the Arena.
121
+
122
+ 1. Node IDs increase monotonically.
123
+ The ID handed out by `add_node` is the current `@type.length`, so it increases by 1
124
+ each time you add a node. IDs are never reused.
125
+ 2. Detached nodes stay in the columns.
126
+ `detach` only resets the parent and sibling links to `NO_NODE`; the column
127
+ record itself stays in the arena. A later `add_node` never reuses that slot.
128
+ This is a deliberate choice that keeps allocation simple.
129
+ 3. Treat `@source` as immutable.
130
+ You must not rewrite the source after the Arena is built. `source_start` /
131
+ `source_len` point directly at byte ranges, so if the source changes, the
132
+ return values of `text` / `source_span` break.
133
+ 4. `NO_NODE` = -1.
134
+ This is the sentinel meaning "no parent or sibling exists". You can reference
135
+ it as the `Arena::NO_NODE` constant.
136
+ 5. `source_start < 0` means "no span".
137
+ In this case the content of a leaf node is often held as a literal in `@str1`
138
+ (for example, a paragraph after a blockquote is removed, or a TEXT node after
139
+ entity decoding). However, some NodeTypes, like container inlines, have no
140
+ span and also do not use `str1`; they build their content from child nodes.
141
+
142
+ ---
143
+
144
+ ## 3. API layers
145
+
146
+ The public methods of the Arena are easier to understand if you read them in
147
+ the layers below (3.1 to 3.5).
148
+
149
+ ### 3.1 Structure mutation (mutators)
150
+
151
+ APIs for building and editing the tree. They assume you pass a valid id and do
152
+ minimal safety checking.
153
+
154
+ | Method | Summary |
155
+ |----------|------|
156
+ | `add_node(type, **fields)` | Append a new node at the end and return its ID. It starts detached. |
157
+ | `append_child(parent_id, child_id)` | Append to the end of the parent's child list. |
158
+ | `insert_before(parent_id, ref_id, new_id)` | Insert immediately before `ref_id`. |
159
+ | `detach(child_id)` | Detach from the parent. The node itself remains. |
160
+ | `reparent(new_parent_id, first_id, last_id)` | Move the sibling range `first_id..last_id` to a new parent. |
161
+ | `update_span(id, start_byte, end_byte)` | Reset the source span. |
162
+ | `update_str1(id, value)` / `update_int3(id, value)` | Overwrite an individual slot. |
163
+
164
+ ### 3.2 Structure access (raw id accessors)
165
+
166
+ These return the raw column value, which may be `NO_NODE`. The naming convention
167
+ `raw_X_id` means "the return value is a node id and may be -1 (`NO_NODE`)".
168
+
169
+ | Method | Return value |
170
+ |----------|--------|
171
+ | `raw_parent_id(id)` | Parent id, or `NO_NODE`. |
172
+ | `raw_first_child_id(id)` / `raw_last_child_id(id)` | Child id, or `NO_NODE`. |
173
+ | `raw_next_sibling_id(id)` / `raw_prev_sibling_id(id)` | Sibling id, or `NO_NODE`. |
174
+
175
+ ### 3.3 Payload access (column accessors)
176
+
177
+ These return each column as raw data. Check the return type to see whether a
178
+ sentinel value can come back.
179
+
180
+ | Method | Return value |
181
+ |----------|--------|
182
+ | `type(id)` | NodeType constant (Integer). |
183
+ | `type_name(id)` | Symbol (for example, `:paragraph`). |
184
+ | `source_start(id)` / `source_len(id)` | Byte offset / byte length. `source_start < 0` means no span. |
185
+ | `int1(id)` / `int2(id)` / `int3(id)` | Integer (default 0). |
186
+ | `str1(id)` / `str2(id)` | String or `nil`. |
187
+
188
+ ### 3.4 Semantic accessors
189
+
190
+ These interpret the low-level columns and return an "easy to use" value. They can
191
+ return `nil` to explicitly express "none".
192
+
193
+ | Method | Return value |
194
+ |----------|--------|
195
+ | `source_span(id)` | `SourceSpan`, or `nil` if there is no span. |
196
+ | `text(id)` | str1 if present; otherwise `source.byteslice(...)`. `nil` if neither exists. |
197
+
198
+ ### 3.5 Traversal
199
+
200
+ | Method | Purpose |
201
+ |----------|------|
202
+ | `each_child(id) { |child_id| ... }` | Block form. Recommended on the hot path (no Enumerator). |
203
+ | `child_ids(id)` | Returns an `Enumerator`, for chaining `map` / `select`, etc. |
204
+
205
+ ---
206
+
207
+ ## 4. Slot usage per NodeType
208
+
209
+ Which int / str slots each NodeType uses is fixed by convention. The current
210
+ conventions are below.
211
+
212
+ #### Block nodes
213
+
214
+ | NodeType | int1 | int2 | int3 | str1 | str2 |
215
+ |----------|------|------|------|------|------|
216
+ | `DOCUMENT` | - | - | - | - | - |
217
+ | `PARAGRAPH` | - | - | - | A joined literal when needed (when transformed, or when leading indent is removed, etc.) | - |
218
+ | `HEADING` | level (1-6) | - | - | An inline literal when needed (when transformed, setext heading, etc.) | - |
219
+ | `THEMATIC_BREAK` | - | - | - | - | - |
220
+ | `BLOCKQUOTE` | - | - | - | - | - |
221
+ | `LIST` | ordered? (0/1) | start_number | tight? (1=tight) | marker (`-`/`*`/`+`/`.`/`)`) | - |
222
+ | `LIST_ITEM` | - | - | - | - | - |
223
+ | `CODE_BLOCK` | - | - | - | code content (literal) | info string (fenced only) |
224
+ | `HTML_BLOCK` | - | - | - | HTML content (literal) | - |
225
+ | `TABLE` | - | - | - | - | - |
226
+ | `TABLE_ROW` | header? (1/0) | - | - | - | - |
227
+ | `TABLE_CELL` | header? (1/0) | - | - | stripped cell text | - |
228
+ | `FOOTNOTE_DEFINITION` | - | - | - | normalized label | - |
229
+ | `FOOTNOTES_SECTION` | - | - | - | - | - |
230
+
231
+ #### Inline nodes
232
+
233
+ | NodeType | int1 | int2 | int3 | str1 | str2 |
234
+ |----------|------|------|------|------|------|
235
+ | `TEXT` | - | - | - | literal (after entity decode, etc.) or `nil` (span-based) | - |
236
+ | `SOFTBREAK` / `HARDBREAK` | - | - | - | `"\n"` | - |
237
+ | `EMPHASIS` / `STRONG` / `STRIKETHROUGH` | - | - | - | - | - |
238
+ | `CODE_SPAN` | - | - | - | normalized content (literal) | - |
239
+ | `LINK` | - | - | - | sanitized destination | title (or `nil`) |
240
+ | `IMAGE` | - | - | - | sanitized destination | title (or `nil`) |
241
+ | `HTML_INLINE` | - | - | - | matched HTML literal | - |
242
+ | `FOOTNOTE_REFERENCE` | footnote number | occurrence count (the Nth one for the same label) | - | normalized label | - |
243
+
244
+ > `-` means "not used" (left at the default `0` / `nil`).
245
+
246
+ > Footnotes are only generated when `footnotes: true`. `FOOTNOTES_SECTION` is
247
+ > placed as the last child directly under the root (span-less,
248
+ > `source_start: -1`), and holds the referenced `FOOTNOTE_DEFINITION`s in
249
+ > first-reference order. The number of backrefs is computed at render time from
250
+ > the footnote number and label.
251
+
252
+ #### Source span conventions
253
+
254
+ - `source_start` / `source_len`: bytes of the original document (absolute byte
255
+ offset).
256
+ - `source_start < 0`: no span. A leaf node often holds its content as a literal
257
+ in `str1`, but a container inline may have only child nodes.
258
+ - The span of a block node serves two different purposes depending on use.
259
+ - For inline targets (paragraph / heading / table cell), the span is also the
260
+ byte range that the InlinePass tokenizes, so it points at the inline body
261
+ with `#` or other prefixes removed.
262
+ - For everything else (list / blockquote / table / code / html block, etc.)
263
+ the span is not used for tokenizing and only carries
264
+ structural / line-level position information.
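Because spans are byte offsets, they diverge from character indexes as soon as the source contains multibyte text. The sketch below shows the difference with plain `String` methods; `span_text` is a hypothetical helper that mirrors the "byteslice the source" behavior described for `text(id)`:

```ruby
# Return the text a byte span covers. Offsets are bytes, as in the conventions above.
def span_text(source, source_start, source_len)
  source.byteslice(source_start, source_len)
end

source = "はじめに *world*"
em_start = source.b.index("*")   # byte offset 13: four 3-byte characters plus a space

puts source.length                   # 12 (characters)
puts source.bytesize                 # 20 (bytes)
puts span_text(source, em_start, 7)  # *world*
p source[em_start, 7]                # nil, because character index 13 is past the end
```

Indexing with `String#[]` uses character positions, so feeding it a byte offset either returns the wrong slice or, as here, nothing at all.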
265
+
266
+ ---
267
+
268
+ ## 5. Typical usage
269
+
270
+ ### 5.1 Create an Arena and build a small AST
271
+
272
+ ```ruby
273
+ source = "Hello *world*"
274
+ arena = RedQuilt::Arena.new(source)
275
+
276
+ doc_id = arena.add_node(RedQuilt::NodeType::DOCUMENT,
277
+ source_start: 0, source_len: source.bytesize)
278
+
279
+ para_id = arena.add_node(RedQuilt::NodeType::PARAGRAPH,
280
+ source_start: 0, source_len: source.bytesize)
281
+ arena.append_child(doc_id, para_id)
282
+
283
+ text_id = arena.add_node(RedQuilt::NodeType::TEXT,
284
+ source_start: 0, source_len: 6) # "Hello "
285
+ arena.append_child(para_id, text_id)
286
+
287
+ em_id = arena.add_node(RedQuilt::NodeType::EMPHASIS,
288
+ source_start: 6, source_len: 7) # "*world*"
289
+ arena.append_child(para_id, em_id)
290
+
291
+ inner_id = arena.add_node(RedQuilt::NodeType::TEXT,
292
+ source_start: 7, source_len: 5) # "world"
293
+ arena.append_child(em_id, inner_id)
294
+
295
+ arena.text(text_id) # => "Hello "
296
+ arena.text(inner_id) # => "world"
297
+ arena.source_span(em_id) # => #<SourceSpan @start_byte=6 @end_byte=13>
298
+ ```
299
+
300
+ ### 5.2 Loop over siblings (hot path)
301
+
302
+ ```ruby
303
+ arena.each_child(para_id) do |child_id|
304
+ case arena.type(child_id)
305
+ when RedQuilt::NodeType::TEXT
306
+ output << arena.text(child_id)
307
+ when RedQuilt::NodeType::EMPHASIS
308
+ output << "<em>"
309
+ render_children(child_id)
310
+ output << "</em>"
311
+ end
312
+ end
313
+ ```
314
+
315
+ If you want to chain over an `Enumerator` (for example in NodeRef), do this:
316
+
317
+ ```ruby
318
+ arena.child_ids(para_id).map { |id| arena.type_name(id) }
319
+ # => [:text, :emphasis]
320
+ ```
321
+
322
+ ### 5.3 Move a node to a different parent
323
+
324
+ `reparent` is an API that replaces the children of the destination node, so the
325
+ destination should normally be a newly created empty node.
326
+
327
+ ```ruby
328
+ # Move the children of `em_id` under a new strong_id
329
+ strong_id = arena.add_node(RedQuilt::NodeType::STRONG,
330
+ source_start: arena.source_start(em_id),
331
+ source_len: arena.source_len(em_id))
332
+ arena.insert_before(arena.raw_parent_id(em_id), em_id, strong_id)
333
+
334
+ first = arena.raw_first_child_id(em_id)
335
+ last = arena.raw_last_child_id(em_id)
336
+ arena.reparent(strong_id, first, last) if first != RedQuilt::Arena::NO_NODE
337
+
338
+ # Detach em_id while it is empty. strong_id stays where em_id was.
339
+ arena.detach(em_id)
340
+ ```
341
+
342
+ ### 5.4 Replace a node
343
+
344
+ ```ruby
345
+ # Replace em_id with strong_id (keep the contents)
346
+ strong_id = arena.add_node(RedQuilt::NodeType::STRONG,
347
+ source_start: arena.source_start(em_id),
348
+ source_len: arena.source_len(em_id))
349
+ arena.insert_before(arena.raw_parent_id(em_id), em_id, strong_id)
350
+
351
+ first = arena.raw_first_child_id(em_id)
352
+ last = arena.raw_last_child_id(em_id)
353
+ arena.reparent(strong_id, first, last) if first != RedQuilt::Arena::NO_NODE
354
+
355
+ arena.detach(em_id)
356
+ ```
357
+
358
+ ### 5.5 Update column values directly
359
+
360
+ ```ruby
361
+ # A heading's level is in int1, but there is no dedicated setter for it,
362
+ # so add one if needed. Currently only str1 / int3 / span have public setters:
363
+ arena.update_str1(text_id, "Hello, world!")
364
+ arena.update_int3(list_id, 1) # make it tight
365
+ arena.update_span(text_id, 0, 12)
366
+ ```
367
+
368
+ Note that there are currently no setters for int1 / int2 / str2. The plan is to
369
+ add `update_int1` and similar when the need arises.
370
+
371
+ ---
372
+
373
+ ## 6. Performance notes
374
+
375
+ #### Use `each_child` on the hot path
376
+
377
+ Yielding directly to a block avoids Enumerator allocation. `child_ids` is for the
378
+ external API.
379
+
380
+ #### `text(id)` prefers str1
381
+
382
+ To avoid an extra `byteslice`, content that can be reconstructed from the source
383
+ should leave `str1` as `nil`. However, use `str1` in cases where a literal is
384
+ required for correctness: TEXT after entity decode, code/html literals, table
385
+ cells, transformed/literal inline targets, and so on.
386
+
387
+ #### `source_span(id)` allocates a `SourceSpan` every time
388
+
389
+ If you use it on the hot path, it is better to read `source_start` /
390
+ `source_len` directly.
391
+
392
+ #### Detached nodes cannot be reclaimed
393
+
394
+ Repeatedly detaching many nodes keeps growing the arena's columns. That growth
395
+ is acceptable while parsing a single document, but it does not suit a long-lived
396
+ arena.
397
+
398
+ ---
399
+
400
+ ## 7. Pitfalls
401
+
402
+ #### When you use a `raw_*_id` return value directly as a foreign key
403
+
404
+ Do not forget the `NO_NODE` (-1) check. Using it with `Array#[-1]` reads the last
405
+ element of the array and corrupts the tree.
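The failure mode is easy to reproduce outside the arena. The sketch below uses hypothetical column arrays (`FIRST_CHILD`, `KINDS`) and a hypothetical helper, not RedQuilt's actual storage, to show why the sentinel must be checked before it is used as an index.

```ruby
# Hypothetical columns for illustration; the real Arena keeps its own storage.
NO_NODE = -1
FIRST_CHILD = [1, NO_NODE, NO_NODE].freeze # node 0 -> child 1; nodes 1 and 2 are leaves
KINDS = %i[paragraph text emphasis].freeze

# Guarded lookup: returns nil instead of indexing with the sentinel.
def first_child_kind(id)
  child = FIRST_CHILD[id]
  return nil if child == NO_NODE

  KINDS[child]
end

# Node 1 has no children, so FIRST_CHILD[1] is -1. Ruby treats a negative
# index as "count from the end", so this silently reads the last element.
unguarded = KINDS[FIRST_CHILD[1]]
guarded = first_child_kind(1)
```

Here `unguarded` is `:emphasis`, the kind of an unrelated node, while `guarded` is `nil`. No exception is raised in the unguarded case, which is what makes the mistake hard to spot.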
406
+
407
+ #### Preconditions of `reparent`
408
+
409
+ You must be able to reach `last_id` by following `next_sibling` from `first_id`.
410
+ Passing nodes with different parents, or a `last_id` that cannot be reached from
411
+ `first_id`, can cause an infinite loop (the builder actually hit this in the
412
+ past).
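One way to defend against this is to verify reachability before calling `reparent`. The helper below is a hypothetical sketch, not part of RedQuilt; it assumes a `next_sibling` column indexed by node id with `-1` as the sentinel, and bounds the walk by the column size so that a corrupted (cyclic) chain cannot loop forever.

```ruby
# Hypothetical guard: true when last_id is reachable from first_id by
# following next_sibling links. The step bound protects against cycles.
def reachable_sibling?(next_sibling, first_id, last_id, no_node: -1)
  id = first_id
  steps = 0
  while id != no_node && steps <= next_sibling.size
    return true if id == last_id

    id = next_sibling[id]
    steps += 1
  end
  false
end

chain = [1, 2, -1]  # 0 -> 1 -> 2
cyclic = [1, 0]     # 0 -> 1 -> 0 (corrupted)

forward = reachable_sibling?(chain, 0, 2)
backward = reachable_sibling?(chain, 2, 0)
looped = reachable_sibling?(cyclic, 0, 5)
```

`forward` is true, while `backward` and `looped` are false. A check like this costs one extra walk over the range, so it is better suited to debug builds or tests than to the hot path.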
413
+
414
+ #### The meaning of `source_start < 0`
415
+
416
+ A negative value means "literal mode, with position information discarded". The user-facing APIs
417
+ (`SourceMap`, `node.source_location`, etc.) treat it as having no span. Keep this
418
+ in mind so that a missing position seen in the debugger is not mistaken for a
419
+ bug.
420
+
421
+ #### Do not change `@source` afterward
422
+
423
+ If you do, the return values of `text` / `source_span` break silently.
@@ -0,0 +1,316 @@
1
+ # RedQuilt CommonMark Conformance
2
+
3
+ ## 1. Scope of this document
4
+
5
+ This document describes how RedQuilt differs from the CommonMark / GFM spec.
6
+ For behavior that follows the spec, refer directly to the spec documents
7
+ (<https://spec.commonmark.org/0.31.2/>, <https://github.github.com/gfm/>); this
8
+ document does not repeat them.
9
+
10
+ #### What this document covers
11
+
12
+ - Places where the implementation **narrows** what the spec allows (interpreting
13
+ or tightening ambiguous areas).
14
+ - Features outside the spec (security, diagnostics, option flags).
15
+ - The extensions that are **enabled** (GFM, etc.) and their opt-in conditions.
16
+ - Unsupported features and known limitations.
17
+
18
+ #### What this document does not cover
19
+
20
+ - Descriptions of standard behavior that matches the spec.
21
+ - Design background or data structure choices.
22
+
23
+ ### 1.1 Target versions
24
+
25
+ - CommonMark: 0.31.2
26
+ - GitHub Flavored Markdown: 0.29-gfm
27
+
28
+ ### 1.2 Implementation assumptions
29
+
30
+ - Input is a UTF-8 string. Preprocessing such as `force_encoding(Encoding::UTF_8)`
31
+ is the caller's responsibility.
32
+ - The normalization required by spec §2.3 / §2.4 (NUL -> U+FFFD,
33
+ `\r\n` / `\r` -> `\n`) and limiting the blank-line definition to space/tab are
34
+ all implemented. These follow the spec, so this document does not list them
35
+ individually.
36
+
37
+ ### 1.3 Format of each item
38
+
39
+ ```
40
+ ### N.N <Title>
41
+
42
+ **Spec**: the relevant section and the spec rule (or ambiguity)
43
+ **RedQuilt behavior**: how the implementation behaves / where it narrows or extends
44
+ **Implementation**: file:line / main symbols
45
+ **Test**: spec file / example number
46
+ ```
47
+
48
+ ## 2. Points where the spec is tightened
49
+
50
+ Where the spec wording allows more than one interpretation, or where a "must" is
51
+ left ambiguous, the implementation chooses the stricter side.
52
+
53
+ ### 2.1 URI autolink rejects U+007F (DEL)
54
+
55
+ **Spec**: §6.5 — a URI autolink does not contain "ASCII control characters,
56
+ space, `<`, `>`". Whether the range of "ASCII control characters" is only
57
+ U+0000–U+001F or also includes U+007F is not stated.
58
+
59
+ **RedQuilt behavior**: also rejects U+007F.
60
+
61
+ **Implementation**: `lib/red_quilt/inline/lexer.rb` — `URI_AUTOLINK_RE`
62
+
63
+ **Test**: `spec/whitespace_strictness_spec.rb` — "URI autolink (CommonMark 6.5)"
64
+
65
+ ### 2.2 Raw HTML tag separators limited to space/tab/CR/LF
66
+
67
+ **Spec**: §6.6 — defines the separators between attributes and around `=` as
68
+ "whitespace". In the spec's terminology (§2.1), the "whitespace" set is broad and
69
+ includes space / tab / newline / line tabulation (U+000B) / form feed (U+000C) /
70
+ carriage return.
71
+
72
+ **RedQuilt behavior**: within the tag grammar, only `[ \t\r\n]` is allowed as a
73
+ separator. FF (U+000C) / VT (U+000B) are not included. The same constraint
74
+ applies to inline raw HTML and to HTML block types 1 / 6 / 7.
75
+
76
+ **Implementation**:
77
+ - Inline: `lib/red_quilt/inline/lexer.rb` — `HTML_OPEN_TAG_RE` /
78
+ `HTML_CLOSING_TAG_RE`
79
+ - Block: `lib/red_quilt/block_parser.rb` — `HTML_TYPE_7_OPEN_TAG_RE` /
80
+ `HTML_TYPE_7_CLOSING_TAG_RE` / `HTML_BLOCK_TYPE_6_RE` / type 1 regex
81
+
82
+ **Test**: `spec/whitespace_strictness_spec.rb` — "raw HTML tag whitespace
83
+ (CommonMark 6.6)"
84
+
85
+ ### 2.3 Inline link tail separators limited to space/tab/at most 1 LF
86
+
87
+ **Spec**: §6.3 — the link tail (inside `(dest "title")`) is separated by "spaces,
88
+ tabs, and up to one line ending". FF / VT are not mentioned.
89
+
90
+ **RedQuilt behavior**: within the link tail, only space / tab are separators, and
91
+ a line ending is counted separately, up to one. If FF / VT appears, it does not
92
+ form a link (it is treated as normal paragraph text).
93
+
94
+ **Implementation**: `lib/red_quilt/inline/link_scanner.rb` —
95
+ `link_tail_whitespace_byte?`, `skip_link_whitespace`, `inline_link`,
96
+ `parse_link_title`
97
+
98
+ **Test**: `spec/whitespace_strictness_spec.rb` — "inline link tail whitespace
99
+ (CommonMark 6.3)"
100
+
101
+ ### 2.4 Reference definition raw destination validated the same as inline links
102
+
103
+ **Spec**: §6.3 — the raw form of a link destination is "a nonempty sequence of
104
+ characters that does not start with `<`, does not include ASCII control
105
+ characters or space character, and includes parentheses only if (a) they are
106
+ backslash-escaped or (b) they are part of a balanced pair of unescaped
107
+ parentheses".
108
+
109
+ **RedQuilt behavior**: validates all of the above for the raw destination of a
110
+ reference definition too. Specifically, it rejects ASCII control
111
+ (U+0000–U+001F) / U+007F (DEL) / space, and tracks the depth of unescaped
112
+ parens, invalidating the definition if they are unbalanced.
113
+
114
+ **Past behavior**: it accepted destinations with a simple `/\A(\S+)(.*)\z/`, so
115
+ `[x]: foo(bar` or `[x]: foo\bbar` were also accepted as definitions.
116
+
117
+ **Implementation**: `lib/red_quilt/reference_definition.rb` —
118
+ `parse_raw_destination`, `RAW_DEST_FORBIDDEN_RE`
119
+
120
+ **Test**: `spec/link_validation_spec.rb` — "reference definition raw destination
121
+ validation"
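The validation described above can be sketched as follows. This is an illustrative approximation, not the code in `parse_raw_destination`: the method name is hypothetical, and it simplifies escaping by letting a backslash escape any following character.

```ruby
# Hypothetical sketch of raw-destination validation.
def valid_raw_destination?(dest)
  return false if dest.empty? || dest.start_with?("<")
  # ASCII control characters (U+0000-U+001F), space, and DEL are rejected.
  return false if dest.match?(/[\x00-\x20\x7F]/)

  depth = 0
  escaped = false
  dest.each_char do |ch|
    if escaped
      escaped = false          # the escaped character is never counted
    elsif ch == "\\"
      escaped = true
    elsif ch == "("
      depth += 1
    elsif ch == ")"
      depth -= 1
      return false if depth.negative? # closing paren without an opener
    end
  end
  depth.zero?                  # every opener must have been closed
end

unbalanced = valid_raw_destination?("foo(bar")
balanced = valid_raw_destination?("foo(bar)")
escaped_paren = valid_raw_destination?("foo\\(bar")
with_control = valid_raw_destination?("foo\bbar") # "\b" is U+0008 in a Ruby string
```

With these inputs, only `balanced` and `escaped_paren` are true; the other two correspond to the definitions that the earlier `\S+` pattern used to accept.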
122
+
123
+ ### 2.5 Apply the 999-character link label limit on all paths
124
+
125
+ **Spec**: §6.3 — "A link label can have at most 999 characters inside the square
126
+ brackets."
127
+
128
+ **RedQuilt behavior**: rejects more than 999 characters on both the reference
129
+ definition side and the reference link usage side (shortcut / collapsed / full,
130
+ all of them).
131
+
132
+ **Implementation**:
133
+ - Constant: `lib/red_quilt/reference_definition.rb` —
134
+ `LABEL_MAX_LENGTH = 999`, the `label_too_long?` helper
135
+ - Definition side: `match_label` (decides for both single-line and multi-line)
136
+ - Usage side: `lib/red_quilt/inline/builder.rb` — `try_reference_link`,
137
+ `lib/red_quilt/inline/link_scanner.rb` — `reference_label`
138
+
139
+ **Test**: `spec/link_validation_spec.rb` — "link label length limit (999
140
+ characters)"
141
+
142
+ ### 2.6 NCR digit limits and U+FFFD replacement of invalid codepoints
143
+
144
+ **Spec**: §2.5 — a decimal NCR is 1–7 digits, a hex NCR is 1–6 digits. If the
145
+ decode result is U+0000, a surrogate (U+D800–U+DFFF), or out of the Unicode range
146
+ (> U+10FFFF), it is replaced with U+FFFD.
147
+
148
+ **RedQuilt behavior**: implements all of the above.
149
+
150
+ **Past behavior**: it delegated to `CGI.unescapeHTML`, so an 8-digit decimal like
151
+ `&#00000065;` or a surrogate like `&#xD800;` would decode to "A" or raise a
152
+ `RangeError`, respectively.
153
+
154
+ **Implementation**: `lib/red_quilt/inline/html_entities.rb` —
155
+ `Inline.decode_entity`, `Inline::ENTITY_RE`, `decode_numeric_codepoint`. The
156
+ `SURROGATE_RANGE` and `MAX_UNICODE_CODEPOINT` constants.
157
+
158
+ **Test**: `spec/numeric_character_reference_spec.rb`
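The digit limits and the replacement rule can be sketched in a few lines. The names below (`NCR_RE`, `decode_ncr`) are hypothetical and do not mirror `Inline.decode_entity`; the sketch also assumes that an over-long reference is simply left as literal text.

```ruby
REPLACEMENT = "\u{FFFD}"
# 1-7 decimal digits, or x/X followed by 1-6 hex digits.
NCR_RE = /\A&#(?:([0-9]{1,7})|[xX]([0-9a-fA-F]{1,6}));\z/

# Hypothetical decoder for a single numeric character reference.
def decode_ncr(text)
  m = NCR_RE.match(text)
  return text unless m # not a valid NCR (e.g. too many digits): keep literal

  cp = m[1] ? m[1].to_i : m[2].to_i(16)
  # U+0000, surrogates, and values above U+10FFFF become U+FFFD.
  return REPLACEMENT if cp.zero? || (0xD800..0xDFFF).cover?(cp) || cp > 0x10FFFF

  cp.chr(Encoding::UTF_8)
end

plain = decode_ncr("&#65;")
too_long = decode_ncr("&#00000065;")
surrogate = decode_ncr("&#xD800;")
```

`plain` is `"A"`, `too_long` comes back unchanged because eight digits exceed the decimal limit, and `surrogate` is U+FFFD rather than an exception.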
159
+
160
+ ### 2.7 GFM table header / delimiter cell-count match requirement
161
+
162
+ **Spec (GFM §4.10)**: "The header row must match the delimiter row in the number
163
+ of cells. If not, a table will not be recognized."
164
+
165
+ **RedQuilt behavior**: if the cell count of the header and the delimiter do not
166
+ match, it is not recognized as a table and is treated as a paragraph.
167
+
168
+ **Implementation**: `lib/red_quilt/block_parser.rb` — `table_start?`
169
+
170
+ **Test**: `spec/red_quilt_spec.rb` — "table separator validation (GFM spec)"
171
+
172
+ ### 2.8 GFM extended autolink domain underscore constraint
173
+
174
+ **Spec (GFM §6.9)**: "If the domain name contains an underscore (`_`) in its last
175
+ two segments, it is invalid."
176
+
177
+ **RedQuilt behavior**: when extended autolinks are enabled, a URL / email whose
178
+ domain has `_` in its last two segments is not linkified.
179
+
180
+ **Implementation**: `lib/red_quilt/extended_autolink_pass.rb` — `valid_domain?` /
181
+ `extract_domain`
182
+
183
+ **Test**: `spec/extended_autolink_spec.rb` — "domain validation (GFM spec)"
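As a rough illustration of the rule, the check only needs to look at the last two dot-separated segments. The method name below is hypothetical and the sketch is not the code in `valid_domain?`; it assumes the domain has already been extracted from the URL or email.

```ruby
# Hypothetical sketch: a domain is rejected when either of its last two
# segments contains an underscore. Earlier segments may contain one.
def underscore_free_tail?(domain)
  domain.split(".").last(2).none? { |segment| segment.include?("_") }
end

ok_plain = underscore_free_tail?("example.com")
ok_subdomain = underscore_free_tail?("my_host.example.com")
bad_second_level = underscore_free_tail?("www.my_site.com")
```

`ok_plain` and `ok_subdomain` are true, while `bad_second_level` is false, so a string such as `www.my_site.com` would stay plain text under this rule.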
184
+
185
+ ## 3. Features outside the spec
186
+
187
+ Features not defined in the spec that RedQuilt provides for safety and
188
+ convenience.
189
+
190
+ ### 3.1 Sanitizing unsafe URL schemes
191
+
192
+ **RedQuilt behavior**: if the scheme of a link / image destination is not in the
193
+ safe list below, it outputs `href` / `src` as an empty string. At the same time
194
+ it emits an `:unsafe_url` diagnostic as a warning. For CommonMark autolinks
195
+ (`<scheme:...>`), to stay spec-conformant, a denylist is used instead of a safe
196
+ list, and only schemes that could lead to script execution get an empty href.
197
+
198
+ **Safe schemes**: `http`, `https`, `mailto`, `ftp`, `tel`, `ssh`
199
+
200
+ **Schemes blocked in autolinks**: `javascript`, `vbscript`, `data`
201
+
202
+ **Implementation**: `lib/red_quilt/inline/builder.rb` — `SAFE_SCHEMES`,
203
+ `UNSAFE_AUTOLINK_SCHEMES`, `sanitize_destination`, `block_unsafe_autolink`
204
+
205
+ **Test**: `spec/red_quilt_spec.rb` — "sanitizes unsafe URL schemes"
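The two policies (safe list for links and images, denylist for autolinks) can be contrasted in a small sketch. The helper names are hypothetical and this is not the code in `sanitize_destination` / `block_unsafe_autolink`; in particular, passing scheme-less (relative) destinations through unchanged is an assumption made here for illustration.

```ruby
SAFE_LINK_SCHEMES = %w[http https mailto ftp tel ssh].freeze
BLOCKED_AUTOLINK_SCHEMES = %w[javascript vbscript data].freeze

# Returns the lowercased scheme, or nil when the URL has none.
def scheme_of(url)
  m = /\A([A-Za-z][A-Za-z0-9+.\-]*):/.match(url)
  m && m[1].downcase
end

# Safe-list policy for link / image destinations.
# Assumption: destinations without a scheme are left untouched.
def sanitize_link(url)
  scheme = scheme_of(url)
  return url if scheme.nil? || SAFE_LINK_SCHEMES.include?(scheme)

  ""
end

# Denylist policy for <scheme:...> autolinks.
def sanitize_autolink(url)
  BLOCKED_AUTOLINK_SCHEMES.include?(scheme_of(url)) ? "" : url
end

link_js = sanitize_link("JavaScript:alert(1)")
link_irc = sanitize_link("irc://example.org/chan")
auto_irc = sanitize_autolink("irc://example.org/chan")
```

An `irc:` destination shows the difference: `link_irc` is emptied because `irc` is not on the safe list, whereas `auto_irc` is kept because `irc` is not on the denylist.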
206
+
207
+ ### 3.2 Diagnostics
208
+
209
+ **RedQuilt behavior**: suspicious syntax, missing references, and potential
210
+ security events detected during parse / render are accumulated in
211
+ `Document#diagnostics` as `RedQuilt::Diagnostic` objects. Processing is never
212
+ interrupted (a tree and HTML are always returned).
213
+
214
+ **Rules currently emitted**:
215
+
216
+ | Rule | Severity | Description |
217
+ |---|---|---|
218
+ | `:missing_reference` | warning | A full reference link `[text][ref]` has no definition. |
219
+ | `:duplicate_reference` | warning | There were multiple reference definitions with the same label (the first one is used). |
220
+ | `:duplicate_footnote` | warning | There were multiple footnote definitions with the same label (the first one is used; only when `footnotes: true`). |
221
+ | `:unsafe_url` | warning | An unsafe URL was replaced with an empty `href` / `src`. |
222
+ | `:empty_link` | warning | The link destination is empty (only when `lint: true`). |
223
+ | `:missing_alt` | info | An image's alt text is empty (only when `lint: true`). |
224
+ | `:heading_level_skip` | info | A heading level jumped by more than one (only when `lint: true`). |
225
+
226
+ **Implementation**: `lib/red_quilt/diagnostic.rb` (value object),
227
+ `lib/red_quilt/block_parser.rb` (duplicate reference),
228
+ `lib/red_quilt/footnote_definition.rb` (duplicate footnote),
229
+ `lib/red_quilt/inline/builder.rb` (missing / unsafe),
230
+ `lib/red_quilt/lint_pass.rb` (lint rules)
231
+
232
+ ### 3.3 `allow_html` / `disallow_raw_html` flags
233
+
234
+ **RedQuilt behavior**:
235
+
236
+ | Flag | Default | Effect |
237
+ |---|---|---|
238
+ | `allow_html` | `false` | When false, raw HTML is fully escaped (turned into `&lt;`). When true, HTML blocks and inline raw HTML are output as-is. |
239
+ | `disallow_raw_html` | `false` | The GFM "Disallowed Raw HTML" extension, enabled under `allow_html: true`. It rewrites `<` to `&lt;` for the specified tags. |
240
+
241
+ The disallowed tag set defined by GFM: `title`, `textarea`, `style`, `xmp`,
242
+ `iframe`, `noembed`, `noframes`, `script`, `plaintext`
243
+
244
+ **Implementation**: `lib/red_quilt/document.rb` — `allow_html?` /
245
+ `disallow_raw_html?`
246
+ **Implementation (filter)**: `lib/red_quilt/renderer/html.rb` —
247
+ `DISALLOWED_RAW_TAGS` / `DISALLOWED_RAW_TAG_RE` / `filter_disallowed_raw`
248
+
249
+ ## 4. Enabled extensions
250
+
251
+ ### 4.1 GFM Table
252
+
253
+ Always enabled. In addition to the spec, the column-count match requirement from
254
+ 2.7 is applied.
255
+
256
+ **Implementation**: `lib/red_quilt/block_parser.rb` — `table_start?` /
257
+ `parse_table`
258
+
259
+ ### 4.2 GFM Strikethrough
260
+
261
+ Always enabled. Only the double tilde `~~text~~` is supported (matching GFM
262
+ behavior). A single tilde `~text~` is treated as normal text.
263
+
264
+ **Implementation**: `lib/red_quilt/inline/lexer.rb` (handling of `~` in
265
+ `SPECIAL_BYTES` and `scan_delim_run`), `lib/red_quilt/inline/builder.rb`
266
+ (generating `STRIKETHROUGH` in `process_emphasis`)
267
+
268
+ ### 4.3 GFM Disallowed Raw HTML
269
+
270
+ Opt-in. It only works when `allow_html: true, disallow_raw_html: true` are used
271
+ together (under `allow_html: false` all HTML is escaped, so it has no effect).
272
+ See 3.3 for details.
273
+
274
+ ### 4.4 GFM Extended Autolink
275
+
276
+ Opt-in. Specifying `extended_autolinks: true` runs `ExtendedAutolinkPass` as a
277
+ pass that linkifies bare URLs / emails / `www.`-prefixed strings that are not
278
+ wrapped in `<...>`.
279
+
280
+ **Additional constraint**: implements the domain underscore check from 2.8.
281
+
282
+ **Implementation**: `lib/red_quilt/extended_autolink_pass.rb`
283
+
284
+ ### 4.5 GFM Footnotes
285
+
286
+ Opt-in. Specifying `footnotes: true` removes `[^label]: ...` definitions from the
287
+ body flow and converts `[^label]` references into sup links. Only the referenced
288
+ definitions are kept, ordered by first reference, and output as a
289
+ `FOOTNOTES_SECTION` at the end of the root. Unreferenced definitions are not
290
+ output.
291
+
292
+ **Implementation**: `lib/red_quilt/footnote_definition.rb`,
293
+ `lib/red_quilt/footnote_registry.rb`, `lib/red_quilt/footnote_pass.rb`
294
+
295
+ ## 5. Unsupported / known limitations
296
+
297
+ - GFM Task List Items (`- [ ]` / `- [x]`) are not supported. They are parsed as
298
+ normal list items.
299
+
300
+ ## 6. Correspondence with tests
301
+
302
+ This section lists the spec files that verify each of the differences described above.
303
+
304
+ | Aspect | Spec file |
305
+ |---|---|
306
+ | Passing the official CommonMark examples | `spec/commonmark_compat_spec.rb` |
307
+ | Input normalization (line endings / NUL / blank line) | `spec/input_normalization_spec.rb` |
308
+ | Whitespace strictness (autolink / raw HTML / link tail) | `spec/whitespace_strictness_spec.rb` |
309
+ | Link / reference validation (label cap / raw dest) | `spec/link_validation_spec.rb` |
310
+ | NCR digit limits and invalid codepoints | `spec/numeric_character_reference_spec.rb` |
311
+ | GFM table column-count match | `spec/red_quilt_spec.rb` — "table separator validation" |
312
+ | GFM extended autolink domain validation | `spec/extended_autolink_spec.rb` |
313
+ | GFM footnotes | `spec/footnotes_spec.rb` |
314
+ | URL scheme sanitization | `spec/red_quilt_spec.rb` — "sanitizes unsafe URL schemes" |
315
+ | Diagnostics / lint diagnostics | `spec/diagnostic_spec.rb` |
316
+ | Disallowed Raw HTML | `spec/red_quilt_spec.rb` — disallow_raw_html cases |
data/lib/red_quilt/cli.rb CHANGED
@@ -38,6 +38,7 @@ module RedQuilt
38
38
  theme: :default,
39
39
  output: nil,
40
40
  open: false,
41
+ mermaid: false,
41
42
  }.freeze
42
43
 
43
44
  THEMES = %i[none default].freeze
@@ -154,6 +155,10 @@ module RedQuilt
154
155
  "Write HTML to a file and open it in the default browser (forces --standalone)") do
155
156
  options[:open] = true
156
157
  end
158
+ opts.on("--mermaid",
159
+ "Render `mermaid` code blocks as diagrams (loads mermaid.js from a CDN in standalone output)") do
160
+ options[:mermaid] = true
161
+ end
157
162
  opts.on("--diagnostics", "Also print diagnostics to stderr") do
158
163
  options[:diagnostics] = true
159
164
  end
@@ -206,6 +211,7 @@ module RedQuilt
206
211
  lang: options[:lang],
207
212
  css: options[:css],
208
213
  theme: options[:theme],
214
+ mermaid: options[:mermaid],
209
215
  )
210
216
  end
211
217
 
@@ -47,11 +47,15 @@ module RedQuilt
47
47
  # (an external stylesheet link) is independent and may be combined.
48
48
  # heading_ids: when true, every heading gets a slugified `id` (Unicode
49
49
  # preserving, deduplicated within the document) for anchor links.
50
- def to_html(standalone: false, title: nil, lang: "en", css: nil, theme: :none, heading_ids: false)
51
- body = Renderer::HTML.new(self, heading_ids: heading_ids).render
50
+ # mermaid: when true, fenced code blocks tagged `mermaid` render as
51
+ # `<pre class="mermaid">` containers instead of `<pre><code>`. In
52
+ # standalone mode the mermaid.js runtime is also loaded from a CDN so
53
+ # the diagrams render in the browser without further setup.
54
+ def to_html(standalone: false, title: nil, lang: "en", css: nil, theme: :none, heading_ids: false, mermaid: false)
55
+ body = Renderer::HTML.new(self, heading_ids: heading_ids, mermaid: mermaid).render
52
56
  return body unless standalone
53
57
 
54
- wrap_standalone_html(body, title: title.to_s, lang: lang.to_s, css: css, theme: Theme.css(theme))
58
+ wrap_standalone_html(body, title: title.to_s, lang: lang.to_s, css: css, theme: Theme.css(theme), mermaid: mermaid)
55
59
  end
56
60
 
57
61
  def to_ast
@@ -87,7 +91,68 @@ module RedQuilt
87
91
 
88
92
  private
89
93
 
90
- def wrap_standalone_html(body, title:, lang:, css:, theme:)
94
+ # Self-contained assets embedded in standalone output when mermaid
95
+ # support is enabled. Loads the mermaid.js runtime from a CDN as an ES
96
+ # module, renders every `<pre class="mermaid">` container, then makes
97
+ # each diagram interactive with svg-pan-zoom (also from a CDN): mouse
98
+ # wheel zooms, drag pans, and a small control panel offers +/-/reset.
99
+ MERMAID_SCRIPT = <<~HTML
100
+ <style>
101
+ .rq-mermaid-pz {
102
+ /* Break out of the body's max-width column so the viewport isn't a
103
+ narrow peephole: span most of the viewport width, centered. */
104
+ width: 80vw;
105
+ margin-left: calc(50% - 40vw);
106
+ height: 80vh;
107
+ border: 1px solid #d0d7de;
108
+ border-radius: 6px;
109
+ overflow: hidden;
110
+ }
111
+ .rq-mermaid-pz svg {
112
+ width: 100%;
113
+ height: 100%;
114
+ max-width: none;
115
+ display: block;
116
+ cursor: grab;
117
+ }
118
+ @media (prefers-color-scheme: dark) {
119
+ .rq-mermaid-pz { border-color: #30363d; }
120
+ }
121
+ </style>
122
+ <script type="module">
123
+ import mermaid from "https://cdn.jsdelivr.net/npm/mermaid/dist/mermaid.esm.min.mjs";
124
+ import svgPanZoom from "https://cdn.jsdelivr.net/npm/svg-pan-zoom@3.6.1/+esm";
125
+ mermaid.initialize({ startOnLoad: false });
126
+ await mermaid.run();
127
+
128
+ for (const pre of document.querySelectorAll("pre.mermaid")) {
129
+ const svg = pre.querySelector("svg");
130
+ if (!svg) continue;
131
+ // Drop mermaid's inline max-width and let the SVG fill a sized box so
132
+ // svg-pan-zoom has room to zoom/pan. The whole viewBox scales as one,
133
+ // so every element stays aligned.
134
+ svg.removeAttribute("style");
135
+ svg.setAttribute("width", "100%");
136
+ svg.setAttribute("height", "100%");
137
+ const box = document.createElement("div");
138
+ box.className = "rq-mermaid-pz";
139
+ pre.replaceWith(box);
140
+ box.appendChild(svg);
141
+ svgPanZoom(svg, {
142
+ zoomEnabled: true,
143
+ controlIconsEnabled: true,
144
+ fit: true,
145
+ center: true,
146
+ zoomScaleSensitivity: 0.3,
147
+ minZoom: 0.2,
148
+ maxZoom: 20,
149
+ });
150
+ }
151
+ </script>
152
+ HTML
153
+ private_constant :MERMAID_SCRIPT
154
+
155
+ def wrap_standalone_html(body, title:, lang:, css:, theme:, mermaid: false)
91
156
  out = +"<!DOCTYPE html>\n"
92
157
  out << %(<html lang="#{html_escape_attr(lang)}">\n)
93
158
  out << "<head>\n"
@@ -97,6 +162,7 @@ module RedQuilt
97
162
  out << "<style>\n#{theme}</style>\n" if theme
98
163
  out << "</head>\n<body>\n"
99
164
  out << body
165
+ out << MERMAID_SCRIPT if mermaid
100
166
  out << "</body>\n</html>\n"
101
167
  out
102
168
  end
@@ -3,11 +3,12 @@
3
3
  module RedQuilt
4
4
  module Renderer
5
5
  class HTML
6
- def initialize(document, heading_ids: false)
6
+ def initialize(document, heading_ids: false, mermaid: false)
7
7
  @document = document
8
8
  @arena = document.arena
9
9
  @out = +""
10
10
  @slugger = Slug::Counter.new if heading_ids
11
+ @mermaid = mermaid
11
12
  end
12
13
 
13
14
  def render
@@ -73,12 +74,21 @@ module RedQuilt
73
74
  render_list_item(node_id)
74
75
  @out << "</li>\n"
75
76
  when NodeType::CODE_BLOCK
76
- @out << "<pre><code"
77
77
  info_word = @arena.str2(node_id).to_s.split.first.to_s
78
- @out << %( class="language-#{escape_html(info_word)}") unless info_word.empty?
79
- @out << ">"
80
- @out << escape_html(@arena.text(node_id).to_s)
81
- @out << "</code></pre>\n"
78
+ if @mermaid && info_word == "mermaid"
79
+ # Emit a container mermaid.js recognizes via class="mermaid".
80
+ # The diagram source is still HTML-escaped; the browser decodes
81
+ # the entities back into textContent, which is what mermaid reads.
82
+ @out << %(<pre class="mermaid">)
83
+ @out << escape_html(@arena.text(node_id).to_s)
84
+ @out << "</pre>\n"
85
+ else
86
+ @out << "<pre><code"
87
+ @out << %( class="language-#{escape_html(info_word)}") unless info_word.empty?
88
+ @out << ">"
89
+ @out << escape_html(@arena.text(node_id).to_s)
90
+ @out << "</code></pre>\n"
91
+ end
82
92
  when NodeType::HTML_BLOCK
83
93
  render_raw_html(@arena.text(node_id).to_s, block: true)
84
94
  when NodeType::TABLE
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module RedQuilt
4
- VERSION = "0.7.1"
4
+ VERSION = "0.7.2"
5
5
  end
data/lib/red_quilt.rb CHANGED
@@ -57,13 +57,13 @@ module RedQuilt
57
57
  document
58
58
  end
59
59
 
60
- def render_html(source, allow_html: false, disallow_raw_html: false, extended_autolinks: false, footnotes: false, lint: false, heading_ids: false)
60
+ def render_html(source, allow_html: false, disallow_raw_html: false, extended_autolinks: false, footnotes: false, lint: false, heading_ids: false, mermaid: false)
61
61
  parse(source,
62
62
  allow_html: allow_html,
63
63
  disallow_raw_html: disallow_raw_html,
64
64
  extended_autolinks: extended_autolinks,
65
65
  footnotes: footnotes,
66
- lint: lint).to_html(heading_ids: heading_ids)
66
+ lint: lint).to_html(heading_ids: heading_ids, mermaid: mermaid)
67
67
  end
68
68
 
69
69
  private
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: red_quilt
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.7.1
4
+ version: 0.7.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - takahashim
@@ -26,8 +26,11 @@ files:
26
26
  - Rakefile
27
27
  - docs/api.md
28
28
  - docs/architecture.ja.md
29
+ - docs/architecture.md
29
30
  - docs/arena-usage.ja.md
31
+ - docs/arena-usage.md
30
32
  - docs/commonmark-conformance.ja.md
33
+ - docs/commonmark-conformance.md
31
34
  - exe/redquilt
32
35
  - lib/red_quilt.rb
33
36
  - lib/red_quilt/arena.rb
@@ -91,7 +94,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
91
94
  - !ruby/object:Gem::Version
92
95
  version: '0'
93
96
  requirements: []
94
- rubygems_version: 4.0.10
97
+ rubygems_version: 3.6.9
95
98
  specification_version: 4
96
99
  summary: CommonMark-based Markdown processor written in pure Ruby
97
100
  test_files: []