red_quilt 0.7.1 → 0.8.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +42 -0
- data/README.md +69 -7
- data/Rakefile +31 -0
- data/docs/api.md +15 -0
- data/docs/architecture.ja.md +1 -1
- data/docs/architecture.md +99 -0
- data/docs/arena-usage.ja.md +5 -5
- data/docs/arena-usage.md +423 -0
- data/docs/commonmark-conformance.md +316 -0
- data/lib/red_quilt/arena.rb +96 -0
- data/lib/red_quilt/block_parser.rb +22 -384
- data/lib/red_quilt/cli.rb +14 -2
- data/lib/red_quilt/code_block.rb +139 -0
- data/lib/red_quilt/document.rb +86 -6
- data/lib/red_quilt/footnote_anchors.rb +24 -0
- data/lib/red_quilt/footnote_pass.rb +6 -2
- data/lib/red_quilt/frontmatter.rb +54 -0
- data/lib/red_quilt/html_block.rb +161 -0
- data/lib/red_quilt/indentation.rb +35 -0
- data/lib/red_quilt/inline/builder.rb +9 -186
- data/lib/red_quilt/inline/emphasis_resolver.rb +184 -0
- data/lib/red_quilt/inline/url_sanitizer.rb +64 -0
- data/lib/red_quilt/line.rb +6 -1
- data/lib/red_quilt/lint_pass.rb +2 -2
- data/lib/red_quilt/node_ref.rb +20 -11
- data/lib/red_quilt/renderer/html.rb +48 -26
- data/lib/red_quilt/renderer/mdast.rb +11 -11
- data/lib/red_quilt/table.rb +97 -0
- data/lib/red_quilt/version.rb +1 -1
- data/lib/red_quilt.rb +19 -4
- data/sig/red_quilt.rbs +18 -0
- metadata +11 -1
data/docs/arena-usage.md
ADDED
@@ -0,0 +1,423 @@

# How to use the Arena class

`RedQuilt::Arena` is the low-level storage class that holds the actual AST of
RedQuilt. This document describes its API and assumptions for people who touch
the Arena directly: block parsers, inline builders, renderers, custom
transformers, and any other code under `lib/red_quilt`.

> If you only need to work with the AST as an external API, the standard path is
> to go through `RedQuilt::Document` and `RedQuilt::NodeRef`. The Arena is a more
> internal layer; it is the very data structure that `NodeRef` is built on.

---

## 0. A quick example

The code below works if you copy and paste it as is. It should give you a feel
for how the Arena "builds a tree from a source string using various IDs".

```ruby
require "red_quilt"

source = "Hello *world*"
arena = RedQuilt::Arena.new(source)

# (a) Create nodes. The return value is a node id (Integer).
para_id = arena.add_node(RedQuilt::NodeType::PARAGRAPH,
                         source_start: 0, source_len: source.bytesize)
text_id = arena.add_node(RedQuilt::NodeType::TEXT,
                         source_start: 0, source_len: 6) # "Hello "
em_id = arena.add_node(RedQuilt::NodeType::EMPHASIS,
                       source_start: 6, source_len: 7) # "*world*"
inner_id = arena.add_node(RedQuilt::NodeType::TEXT,
                          source_start: 7, source_len: 5) # "world"

# (b) Build parent/child relationships
arena.append_child(para_id, text_id)
arena.append_child(para_id, em_id)
arena.append_child(em_id, inner_id)

# (c) Read content back out
puts "type: #{arena.type_name(para_id)}"
puts "text: #{arena.text(text_id).inspect}"
puts "inner: #{arena.text(inner_id).inspect}"
puts "span: #{arena.source_span(em_id).inspect}"

# (d) Iterate over children (block form, no Enumerator)
puts "children of paragraph:"
arena.each_child(para_id) do |child_id|
  puts " #{arena.type_name(child_id)}: #{arena.text(child_id).inspect}"
end
```

The output looks like this:

```
type: paragraph
text: "Hello "
inner: "world"
span: #<RedQuilt::SourceSpan:0x... @start_byte=6, @end_byte=13>
children of paragraph:
 text: "Hello "
 emphasis: "*world*"
```

The AST that this sample builds looks like this:

```
PARAGRAPH [0, 13) "Hello *world*"
├─ TEXT [0, 6) "Hello "
└─ EMPHASIS [6, 13) "*world*"
   └─ TEXT [7, 12) "world"
```

The key points when working with the Arena are:

- `add_node` returns a node ID (Integer). Every later API call uses this ID as
  the key.
- `source_start` / `source_len` specify a range over the original source string
  in bytes, not characters. The Arena does not keep a copy of the string itself.
- `text(id)` returns `str1` if it exists; otherwise it byteslices `source`.
- `each_child(id)` is the basic traversal API and is used on the hot path.

Keep these in mind, and the later sections will read as "what does this API
actually guarantee, and how should I use it?".
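
The byte-versus-character point matters as soon as the source contains multibyte text. The snippet below is a minimal sketch in plain Ruby: it does not call RedQuilt at all, and the sample string is made up for illustration. It shows how the byte offset and the character offset of the same position diverge, and why `byteslice` is the read operation that matches a byte span.

```ruby
# Plain-Ruby illustration; the sample string is hypothetical.
source = "caf\u00E9 *au lait*" # the accented "e" occupies 2 bytes in UTF-8

char_offset = source.index("*")               # position counted in characters
byte_offset = source[0, char_offset].bytesize # same position counted in bytes

by_bytes = source.byteslice(byte_offset, 9)   # span measured in bytes
by_chars = source[byte_offset, 9]             # byte offset misused as a character offset

puts "chars=#{source.length} bytes=#{source.bytesize}"
puts "char_offset=#{char_offset} byte_offset=#{byte_offset}"
puts "byteslice:  #{by_bytes.inspect}"
puts "char slice: #{by_chars.inspect}"
```
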

---

## 1. Design highlights

The Arena represents the AST not as a "tree of objects" but as a
[parallel array](https://en.wikipedia.org/wiki/Parallel_array).

- Nodes are identified by an integer ID (`node_id`).
- The attributes for each ID (parent / source span / payload) are kept as
  columns in separate Arrays.
- Adding a node is just an append to the end of each Array; no new Ruby object is
  created at all.

As a result, the Arena has the following properties:

- On the hot path you only pass around Integers (IDs).
- Memory locality is good and GC pressure is low.
- Nodes can be treated as "lightweight" values, which makes it easy to inline the
  Renderer and Builder.

#### List of columns

| Column | Purpose |
|------|------|
| `@type` | NodeType (an Integer constant) |
| `@parent` / `@first_child` / `@last_child` / `@next_sibling` / `@prev_sibling` | Parent / child / sibling links. The value is a node id (`NO_NODE` means "none"). |
| `@source_start` / `@source_len` | Byte range within the document source. `source_start < 0` means "no span". |
| `@int1` / `@int2` / `@int3` | Integer slots whose meaning depends on the NodeType (default `0`). |
| `@str1` / `@str2` | String slots whose meaning depends on the NodeType (default `nil`). |
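
To make the parallel-array idea concrete, here is a toy sketch. `MiniArena` is a hypothetical class written for this illustration only; it is not RedQuilt's implementation and keeps just a few of the columns listed above. It shows that creating a node is an append to every column, and that linking nodes only writes integers into existing slots.

```ruby
# Hypothetical MiniArena: a toy parallel-array tree, NOT RedQuilt's implementation.
class MiniArena
  NO_NODE = -1

  def initialize
    @type = []
    @parent = []
    @first_child = []
    @last_child = []
    @next_sibling = []
  end

  # Appending one entry to every column creates the node; its ID is the old length.
  def add_node(type)
    id = @type.length
    @type << type
    @parent << NO_NODE
    @first_child << NO_NODE
    @last_child << NO_NODE
    @next_sibling << NO_NODE
    id
  end

  # Link child_id as the last child of parent_id by updating integer columns only.
  def append_child(parent_id, child_id)
    @parent[child_id] = parent_id
    last = @last_child[parent_id]
    if last == NO_NODE
      @first_child[parent_id] = child_id
    else
      @next_sibling[last] = child_id
    end
    @last_child[parent_id] = child_id
  end

  # Collect the types of the direct children by following the sibling chain.
  def child_types(id)
    types = []
    child = @first_child[id]
    while child != NO_NODE
      types << @type[child]
      child = @next_sibling[child]
    end
    types
  end
end

mini = MiniArena.new
para = mini.add_node(:paragraph)
text = mini.add_node(:text)
em = mini.add_node(:emphasis)
mini.append_child(para, text)
mini.append_child(para, em)

ids = [para, text, em]
kinds = mini.child_types(para)
puts "ids: #{ids.inspect}"
puts "children of paragraph: #{kinds.inspect}"
```
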

---

## 2. Invariants

These assumptions always hold when you work with the Arena.

1. Node IDs increase monotonically.
   The ID handed out by `add_node` is the value of `@type.length` at the time of
   the call, so it increases by 1 each time you add a node. IDs are never reused.
2. Detached nodes stay in the columns.
   `detach` only resets the parent and sibling links to `NO_NODE`; the column
   record itself stays in the arena. A later `add_node` never reuses that slot.
   This is a deliberate choice that keeps allocation simple.
3. Treat `@source` as immutable.
   You must not rewrite the source after the Arena is built. `source_start` /
   `source_len` point directly at byte ranges, so if the source changes, the
   return values of `text` / `source_span` break.
4. `NO_NODE` = -1.
   This is the sentinel meaning "no parent or sibling exists". You can reference
   it as the `Arena::NO_NODE` constant.
5. `source_start < 0` means "no span".
   In this case the content of a leaf node is often held as a literal in `@str1`
   (for example, a paragraph after a blockquote is removed, or a TEXT node after
   entity decoding). However, some NodeTypes, like container inlines, have no
   span and also do not use `str1`; they build their content from child nodes.
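
Invariant 3 is easy to violate by accident, so a small demonstration may help. The sketch below uses plain Ruby strings only and no RedQuilt API. It shows a recorded byte span going stale after the source is mutated, and one possible safeguard on the caller's side: freezing the string. Freezing is a suggestion made here, not something this document says the Arena does.

```ruby
# Plain-Ruby illustration; no RedQuilt API is used here.
source = +"Hello *world*"   # unary plus: an unfrozen, mutable String
start_byte = 6              # the span recorded for "*world*"
byte_len = 7

before = source.byteslice(start_byte, byte_len)
source.prepend("> ")        # mutate the source after the span was recorded
after = source.byteslice(start_byte, byte_len)

puts "before: #{before.inspect}"
puts "after:  #{after.inspect}"

# Freezing the source turns the silent breakage into an immediate error.
frozen = "Hello *world*".freeze
rejected = false
begin
  frozen.prepend("> ")
rescue FrozenError
  rejected = true
end
puts "mutation rejected: #{rejected}"
```
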

---

## 3. API layers

The public methods of the Arena are easier to understand if you read them in
the following layers.

### 3.1 Structure mutation (mutators)

APIs for building and editing the tree. They assume you pass a valid id and do
minimal safety checking.

| Method | Summary |
|----------|------|
| `add_node(type, **fields)` | Append a new node at the end and return its ID. It starts detached. |
| `append_child(parent_id, child_id)` | Append to the end of the parent's child list. |
| `insert_before(parent_id, ref_id, new_id)` | Insert immediately before `ref_id`. |
| `detach(child_id)` | Detach from the parent. The node itself remains. |
| `reparent(new_parent_id, first_id, last_id)` | Move the sibling range `first_id..last_id` to a new parent. |
| `update_span(id, start_byte, end_byte)` | Reset the source span. |
| `update_str1(id, value)` / `update_int3(id, value)` | Overwrite an individual slot. |

### 3.2 Structure access (raw id accessors)

These return the raw column value, which may be `NO_NODE`. The naming convention
`raw_X_id` means "the return value is a node id and may be -1 (`NO_NODE`)".

| Method | Return value |
|----------|--------|
| `raw_parent_id(id)` | Parent id, or `NO_NODE`. |
| `raw_first_child_id(id)` / `raw_last_child_id(id)` | Child id, or `NO_NODE`. |
| `raw_next_sibling_id(id)` / `raw_prev_sibling_id(id)` | Sibling id, or `NO_NODE`. |

### 3.3 Payload access (column accessors)

These return each column as raw data. Check each method's return type to see
whether a sentinel can come back.

| Method | Return value |
|----------|--------|
| `type(id)` | NodeType constant (Integer). |
| `type_name(id)` | Symbol (for example, `:paragraph`). |
| `source_start(id)` / `source_len(id)` | Byte offset / byte length. `source_start < 0` means no span. |
| `int1(id)` / `int2(id)` / `int3(id)` | Integer (default 0). |
| `str1(id)` / `str2(id)` | String or `nil`. |

### 3.4 Semantic accessors

These interpret the low-level columns and return an "easy to use" value. They can
return `nil` to explicitly express "none".

| Method | Return value |
|----------|--------|
| `source_span(id)` | `SourceSpan`, or `nil` if there is no span. |
| `text(id)` | `str1` if present; otherwise `source.byteslice(...)`. `nil` if neither exists. |
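
The lookup order described for `text(id)` can be sketched as a standalone function. `resolve_text` is a hypothetical helper written for this illustration; it mirrors the documented order (literal first, then span, then `nil`) but is not RedQuilt's code, and it assumes that "present" means "not `nil`".

```ruby
# Hypothetical helper mirroring the documented lookup order; not RedQuilt's code.
def resolve_text(source, str1, source_start, source_len)
  return str1 unless str1.nil?          # a literal wins when present
  return nil if source_start.negative?  # no literal and no span
  source.byteslice(source_start, source_len)
end

source = "Hello *world*"
from_span = resolve_text(source, nil, 7, 5)      # span-based node
from_literal = resolve_text(source, "&", -1, 0)  # literal-only node, e.g. after entity decoding
neither = resolve_text(source, nil, -1, 0)       # container with neither span nor literal

puts "from span:    #{from_span.inspect}"
puts "from literal: #{from_literal.inspect}"
puts "neither:      #{neither.inspect}"
```
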

### 3.5 Traversal

| Method | Purpose |
|----------|------|
| `each_child(id) { \|child_id\| ... }` | Block form. Recommended on the hot path (no Enumerator). |
| `child_ids(id)` | Returns an `Enumerator`, for chaining `map` / `select`, etc. |
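
The difference between the two traversal styles can be sketched with a toy class. `TinyTree` is hypothetical and exists only for this illustration; how RedQuilt implements `each_child` and `child_ids` internally is an assumption here. The block form walks the sibling chain and yields integers, while the Enumerator form wraps that same walk in an extra object so that callers can chain `map` / `select`.

```ruby
# Hypothetical TinyTree: illustrates the two traversal styles; not RedQuilt's code.
class TinyTree
  NO_NODE = -1

  def initialize(first_child, next_sibling)
    @first_child = first_child
    @next_sibling = next_sibling
  end

  # Block form: walks the sibling chain and yields each ID; no Enumerator is built.
  def each_child(id)
    child = @first_child[id]
    while child != NO_NODE
      yield child
      child = @next_sibling[child]
    end
  end

  # Enumerator form: wraps the same walk so callers can chain map / select.
  def child_ids(id)
    enum_for(:each_child, id)
  end
end

# Node 0 has children 1, 2, 3; the other nodes are leaves.
tree = TinyTree.new([1, -1, -1, -1], [-1, 2, 3, -1])

collected = []
tree.each_child(0) { |child_id| collected << child_id }
odd_ids = tree.child_ids(0).select(&:odd?)

puts "each_child: #{collected.inspect}"
puts "child_ids.select(&:odd?): #{odd_ids.inspect}"
```
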

---

## 4. Slot usage per NodeType

Which int / str slots each NodeType uses is fixed by convention. The current
conventions are below.

#### Block nodes

| NodeType | int1 | int2 | int3 | str1 | str2 |
|----------|------|------|------|------|------|
| `DOCUMENT` | - | - | - | - | - |
| `PARAGRAPH` | - | - | - | A joined literal when needed (when transformed, or when leading indent is removed, etc.) | - |
| `HEADING` | level (1-6) | - | - | An inline literal when needed (when transformed, setext heading, etc.) | - |
| `THEMATIC_BREAK` | - | - | - | - | - |
| `BLOCKQUOTE` | - | - | - | - | - |
| `LIST` | ordered? (0/1) | start_number | tight? (1=tight) | marker (`-`/`*`/`+`/`.`/`)`) | - |
| `LIST_ITEM` | - | - | - | - | - |
| `CODE_BLOCK` | - | - | - | code content (literal) | info string (fenced only) |
| `HTML_BLOCK` | - | - | - | HTML content (literal) | - |
| `TABLE` | - | - | - | - | - |
| `TABLE_ROW` | header? (1/0) | - | - | - | - |
| `TABLE_CELL` | header? (1/0) | - | - | stripped cell text | - |
| `FOOTNOTE_DEFINITION` | - | - | - | normalized label | - |
| `FOOTNOTES_SECTION` | - | - | - | - | - |
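
As an illustration of reading slots by convention, the sketch below decodes the `LIST` row of the table above. `describe_list` is a hypothetical helper that relies only on the `int1` / `int2` / `int3` / `str1` accessors listed in section 3.3. `StubArena` is a stand-in invented here so that the snippet runs without RedQuilt; with a real arena you would pass the arena and a LIST node id instead.

```ruby
# Hypothetical helper; it relies only on the int1/int2/int3/str1 accessors.
def describe_list(arena, list_id)
  {
    ordered: arena.int1(list_id) == 1,  # int1: ordered? (0/1)
    start_number: arena.int2(list_id),  # int2: start_number
    tight: arena.int3(list_id) == 1,    # int3: tight? (1=tight)
    marker: arena.str1(list_id)         # str1: list marker
  }
end

# Stand-in for a real arena (an assumption for this example): one column per slot.
StubArena = Struct.new(:ints1, :ints2, :ints3, :strs1) do
  def int1(id)
    ints1[id]
  end

  def int2(id)
    ints2[id]
  end

  def int3(id)
    ints3[id]
  end

  def str1(id)
    strs1[id]
  end
end

# Node 0 plays the role of an ordered, loose list that starts at 3 and uses "." markers.
stub = StubArena.new([1], [3], [0], ["."])
info = describe_list(stub, 0)
puts info.inspect
```
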

#### Inline nodes

| NodeType | int1 | int2 | int3 | str1 | str2 |
|----------|------|------|------|------|------|
| `TEXT` | - | - | - | literal (after entity decode, etc.) or `nil` (span-based) | - |
| `SOFTBREAK` / `HARDBREAK` | - | - | - | `"\n"` | - |
| `EMPHASIS` / `STRONG` / `STRIKETHROUGH` | - | - | - | - | - |
| `CODE_SPAN` | - | - | - | normalized content (literal) | - |
| `LINK` | - | - | - | sanitized destination | title (or `nil`) |
| `IMAGE` | - | - | - | sanitized destination | title (or `nil`) |
| `HTML_INLINE` | - | - | - | matched HTML literal | - |
| `FOOTNOTE_REFERENCE` | footnote number | occurrence count (the Nth one for the same label) | - | normalized label | - |

> `-` means "not used" (left at the default `0` / `nil`).

> Footnotes are only generated when `footnotes: true`. `FOOTNOTES_SECTION` is
> placed as the last child directly under the root (span-less,
> `source_start: -1`), and holds the referenced `FOOTNOTE_DEFINITION`s in
> first-reference order. The number of backrefs is computed at render time from
> the footnote number and label.

#### Source span conventions

- `source_start` / `source_len`: bytes of the original document (absolute byte
  offset).
- `source_start < 0`: no span. A leaf node often holds its content as a literal
  in `str1`, but a container inline may have only child nodes.
- The span of a block node serves two different purposes depending on use.
  - For inline targets (paragraph / heading / table cell), the span is also the
    byte range that the InlinePass tokenizes, so it points at the inline body
    with `#` or other prefixes removed.
  - For everything else (list / blockquote / table / code / html block, etc.)
    the span is not used for tokenizing and only carries
    structural / line-level position information.

---

## 5. Typical usage

### 5.1 Create an Arena and build a small AST

```ruby
require "red_quilt"

source = "Hello *world*"
arena = RedQuilt::Arena.new(source)

doc_id = arena.add_node(RedQuilt::NodeType::DOCUMENT,
                        source_start: 0, source_len: source.bytesize)

para_id = arena.add_node(RedQuilt::NodeType::PARAGRAPH,
                         source_start: 0, source_len: source.bytesize)
arena.append_child(doc_id, para_id)

text_id = arena.add_node(RedQuilt::NodeType::TEXT,
                         source_start: 0, source_len: 6) # "Hello "
arena.append_child(para_id, text_id)

em_id = arena.add_node(RedQuilt::NodeType::EMPHASIS,
                       source_start: 6, source_len: 7) # "*world*"
arena.append_child(para_id, em_id)

inner_id = arena.add_node(RedQuilt::NodeType::TEXT,
                          source_start: 7, source_len: 5) # "world"
arena.append_child(em_id, inner_id)

arena.text(text_id) # => "Hello "
arena.text(inner_id) # => "world"
arena.source_span(em_id) # => #<SourceSpan @start_byte=6 @end_byte=13>
```

### 5.2 Loop over siblings (hot path)

```ruby
# `output` (a String buffer) and `render_children` are assumed to be defined
# by the surrounding renderer.
arena.each_child(para_id) do |child_id|
  case arena.type(child_id)
  when RedQuilt::NodeType::TEXT
    output << arena.text(child_id)
  when RedQuilt::NodeType::EMPHASIS
    output << "<em>"
    render_children(child_id)
    output << "</em>"
  end
end
```

If you want to chain over an `Enumerator` (for example in NodeRef), do this:

```ruby
arena.child_ids(para_id).map { |id| arena.type_name(id) }
# => [:text, :emphasis]
```

### 5.3 Move a node to a different parent

`reparent` is an API that replaces the children of the destination node, so the
destination should normally be a newly created empty node.

```ruby
# Move the children of `em_id` under a new strong_id
strong_id = arena.add_node(RedQuilt::NodeType::STRONG,
                           source_start: arena.source_start(em_id),
                           source_len: arena.source_len(em_id))
arena.insert_before(arena.raw_parent_id(em_id), em_id, strong_id)

first = arena.raw_first_child_id(em_id)
last = arena.raw_last_child_id(em_id)
arena.reparent(strong_id, first, last) if first != RedQuilt::Arena::NO_NODE

# Detach em_id while it is empty. strong_id stays where em_id was.
arena.detach(em_id)
```

### 5.4 Replace a node

```ruby
# Replace em_id with strong_id (keep the contents)
strong_id = arena.add_node(RedQuilt::NodeType::STRONG,
                           source_start: arena.source_start(em_id),
                           source_len: arena.source_len(em_id))
arena.insert_before(arena.raw_parent_id(em_id), em_id, strong_id)

first = arena.raw_first_child_id(em_id)
last = arena.raw_last_child_id(em_id)
arena.reparent(strong_id, first, last) if first != RedQuilt::Arena::NO_NODE

arena.detach(em_id)
```

### 5.5 Update column values directly

```ruby
# A heading's level is in int1, but there is no dedicated setter for it,
# so add one if needed. Currently only str1 / int3 / span have public setters.
# (`list_id` below stands for the id of an existing LIST node.)
arena.update_str1(text_id, "Hello, world!")
arena.update_int3(list_id, 1) # make it tight
arena.update_span(text_id, 0, 12)
```

Note that there are currently no setters for int1 / int2 / str2. The plan is to
add `update_int1` and similar when the need arises.

---

## 6. Performance notes

#### Use `each_child` on the hot path

Yielding directly to a block avoids Enumerator allocation. `child_ids` is for the
external API.

#### `text(id)` prefers str1

To avoid an extra `byteslice`, content that can be reconstructed from the source
should leave `str1` as `nil`. However, use `str1` in cases where a literal is
required for correctness: TEXT after entity decode, code/html literals, table
cells, transformed/literal inline targets, and so on.

#### `source_span(id)` allocates a `SourceSpan` every time

If you use it on the hot path, it is better to read `source_start` /
`source_len` directly.

#### Detached nodes cannot be reclaimed

Repeatedly detaching many nodes keeps growing the arena's columns. This growth
is fine within the parse of a single document, but it makes the design
unsuitable for a long-lived arena.

---

## 7. Pitfalls

#### When you use a `raw_*_id` return value directly as a foreign key

Do not forget the `NO_NODE` (-1) check. Indexing an Array with -1 does not fail;
it reads the last element of the array, which can silently corrupt the tree.
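
The failure mode is easy to reproduce with plain Ruby arrays. The columns below are made up for this illustration and no RedQuilt API is involved. The point is that looking up the root's parent without a check does not raise; it quietly returns data that belongs to another node.

```ruby
# Plain-Ruby illustration with made-up columns; no RedQuilt API is used here.
no_node = -1
types = [:document, :paragraph, :text]
parent = [no_node, 0, 1]        # node 0 is the root, so it has no parent

root_parent = parent[0]         # the sentinel, -1
unchecked = types[root_parent]  # Array#[] treats -1 as "last element"
checked = root_parent == no_node ? nil : types[root_parent]

puts "unchecked lookup: #{unchecked.inspect}"
puts "checked lookup:   #{checked.inspect}"
```
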

#### Preconditions of `reparent`

You must be able to reach `last_id` by following `next_sibling` from `first_id`.
Passing nodes with a different parent, or a `last_id` that is unreachable behind
`first_id`, can cause an infinite loop (the builder actually hit this in the
past).
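
One way to guard against this is to verify the precondition before calling `reparent`. The sketch below is a suggestion, not part of RedQuilt: `sibling_range_reachable?` is a hypothetical helper that calls only `raw_next_sibling_id` from section 3.2, and `StubSiblings` is a stand-in invented here so that the snippet runs without the gem. With a real arena you would pass `no_node: RedQuilt::Arena::NO_NODE`. The check assumes that the sibling chain itself is well formed (no cycles).

```ruby
# Hypothetical guard; it walks forward from first_id until it meets last_id
# or runs off the end of the sibling chain.
def sibling_range_reachable?(arena, first_id, last_id, no_node: -1)
  id = first_id
  while id != no_node
    return true if id == last_id
    id = arena.raw_next_sibling_id(id)
  end
  false
end

# Stand-in for a real arena (an assumption for this example).
StubSiblings = Struct.new(:next_sibling) do
  def raw_next_sibling_id(id)
    next_sibling[id]
  end
end

# Sibling chain 1 -> 2 -> 3; node 4 belongs to a different parent.
stub = StubSiblings.new([-1, 2, 3, -1, -1])

forward = sibling_range_reachable?(stub, 1, 3)   # valid range
backward = sibling_range_reachable?(stub, 3, 1)  # last_id sits before first_id
foreign = sibling_range_reachable?(stub, 1, 4)   # last_id is under another parent

puts "forward=#{forward} backward=#{backward} foreign=#{foreign}"
```
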

#### The meaning of `source_start < 0`

It is "literal mode, with position information discarded". The user-facing APIs
(`SourceMap`, `node.source_location`, etc.) treat it as having no span. Keep this
in mind so that a missing position in the debugger does not come as a surprise.

#### Do not change `@source` afterward

If you do, the return values of `text` / `source_span` break silently.