mathpix-markdown-it 2.0.39 → 2.0.40
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +2 -0
- package/doc/changelog.md +27 -0
- package/es5/browser/auto-render.js +1 -1
- package/es5/bundle.js +4 -4
- package/es5/index.js +1 -1
- package/lib/components/mathpix-markdown/index.js +2 -1
- package/lib/components/mathpix-markdown/index.js.map +1 -1
- package/lib/markdown/common/consts.d.ts +4 -0
- package/lib/markdown/common/consts.js +15 -5
- package/lib/markdown/common/consts.js.map +1 -1
- package/lib/markdown/index.js +2 -1
- package/lib/markdown/index.js.map +1 -1
- package/lib/markdown/mathpix-markdown-plugins.js +2 -1
- package/lib/markdown/mathpix-markdown-plugins.js.map +1 -1
- package/lib/markdown/md-block-rule/begin-tabular/common.d.ts +15 -1
- package/lib/markdown/md-block-rule/begin-tabular/common.js +55 -9
- package/lib/markdown/md-block-rule/begin-tabular/common.js.map +1 -1
- package/lib/markdown/md-block-rule/begin-tabular/index.d.ts +3 -0
- package/lib/markdown/md-block-rule/begin-tabular/index.js +6 -5
- package/lib/markdown/md-block-rule/begin-tabular/index.js.map +1 -1
- package/lib/markdown/md-block-rule/begin-tabular/multi-column-row.d.ts +2 -0
- package/lib/markdown/md-block-rule/begin-tabular/multi-column-row.js +6 -3
- package/lib/markdown/md-block-rule/begin-tabular/multi-column-row.js.map +1 -1
- package/lib/markdown/md-block-rule/begin-tabular/parse-tabular.d.ts +2 -1
- package/lib/markdown/md-block-rule/begin-tabular/parse-tabular.js +77 -50
- package/lib/markdown/md-block-rule/begin-tabular/parse-tabular.js.map +1 -1
- package/lib/markdown/md-block-rule/begin-tabular/sub-tabular.d.ts +3 -1
- package/lib/markdown/md-block-rule/begin-tabular/sub-tabular.js +19 -5
- package/lib/markdown/md-block-rule/begin-tabular/sub-tabular.js.map +1 -1
- package/lib/markdown/md-block-rule/begin-tabular/tabular-td.d.ts +3 -3
- package/lib/markdown/md-block-rule/begin-tabular/tabular-td.js +10 -4
- package/lib/markdown/md-block-rule/begin-tabular/tabular-td.js.map +1 -1
- package/lib/markdown/md-core-rules/set-positions.js +90 -15
- package/lib/markdown/md-core-rules/set-positions.js.map +1 -1
- package/lib/markdown/md-inline-rule/tabular.js +2 -1
- package/lib/markdown/md-inline-rule/tabular.js.map +1 -1
- package/lib/markdown/md-latex-footnotes/block-rule.js +72 -3
- package/lib/markdown/md-latex-footnotes/block-rule.js.map +1 -1
- package/lib/markdown/md-renderer-rules/render-tabular.js +3 -2
- package/lib/markdown/md-renderer-rules/render-tabular.js.map +1 -1
- package/lib/markdown/md-theorem/block-rule.js +10 -6
- package/lib/markdown/md-theorem/block-rule.js.map +1 -1
- package/lib/mathpix-markdown-model/index.d.ts +2 -0
- package/lib/mathpix-markdown-model/index.js +2 -1
- package/lib/mathpix-markdown-model/index.js.map +1 -1
- package/package.json +1 -1
- package/pr-specs/2026-05-footnote-perf-and-parser-invariants.md +246 -0
- package/pr-specs/2026-05-tabular-vertical-align-bracket.md +270 -0
|
@@ -0,0 +1,246 @@
|
|
|
1
|
+
# PR: Footnote block-rule performance + tabular/theorem parser invariants
|
|
2
|
+
|
|
3
|
+
Status: Implemented
|
|
4
|
+
Owner: @OlgaRedozubova
|
|
5
|
+
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Context
|
|
9
|
+
|
|
10
|
+
Three independent issues in the block-parsing layer, surfaced together while profiling `markdownToHTMLSegments` on large inputs:
|
|
11
|
+
|
|
12
|
+
### 1. `latex_footnote_block` / `latex_footnotetext_block` are O(N × M) per block
|
|
13
|
+
|
|
14
|
+
Both rules are registered as block rules and therefore invoked at the start of every block in the document. Their forward-scanning loop appends each line to a growing `fullContent` buffer and re-runs the opening-tag regex on the **whole** buffer after every line. For a paragraph of `N` lines totaling `M` characters this is O(N × M) per invocation. Across thousands of blocks in documents that contain few or no `\footnote` / `\footnotetext` directives, this turns into the dominant parse cost — the rules do no useful work but pay their full scanning cost for every block start.
|
|
15
|
+
|
|
16
|
+
V8 CPU profiling on a 2.5 MB / 43,607-line MMD input (706 `\begin{tabular}`, 1,585 `\section*`, **3 `\footnote{}`, 3 `\footnotetext{}`**) measured wall-clock at 93.5 s and attributed:
|
|
17
|
+
|
|
18
|
+
| Frame | Self time |
|
|
19
|
+
|-------|----------:|
|
|
20
|
+
| `latex_footnote_block` | 60.3 s (64%) |
|
|
21
|
+
| `RegExp \\footnote\s{0,}\[...\]\s{0,}{\|...` | 23.4 s (25%) |
|
|
22
|
+
| GC | 8.0 s |
|
|
23
|
+
| Everything else | 1.8 s |
|
|
24
|
+
|
|
25
|
+
Together the rule and its regex consumed ~89% of total time on a document with only 3 footnotes.
|
|
26
|
+
|
|
27
|
+
### 2. `setChildrenPositions` traversal does not match the existing tabular skip
|
|
28
|
+
|
|
29
|
+
The top-level `setPositions` already excludes `tabular`-typed tokens from child traversal, since tabular subtrees carry parser-private structure (notably the shared, frozen close-token singletons emitted during cell construction). The recursive `setChildrenPositions` does not skip them, and the inline tabular variant `tabular_inline` (used for subtables embedded in paragraphs) is missing from the existing skip list. When such a token appears as a child of a non-tabular parent and the caller passes `addPositionsToTokens: true`, the recursive walk reaches the frozen singletons and assignment of `.positions` throws:
|
|
30
|
+
|
|
31
|
+
```
|
|
32
|
+
TypeError: Cannot add property positions, object is not extensible
|
|
33
|
+
at setChildrenPositions (md-core-rules/set-positions.ts:84)
|
|
34
|
+
at setChildrenPositions (md-core-rules/set-positions.ts:134) ← recursive call
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
`markdownToHTMLSegments` returns `null` on any document that contains an inline subtable when this option is set.
|
|
38
|
+
|
|
39
|
+
### 3. `BeginTheorem` emits open tokens before validating the environment name
|
|
40
|
+
|
|
41
|
+
`BeginTheorem` matches any `\begin{NAME}…\end{NAME}` whose NAME is not in `latexEnvironments` / `mathEnvironments`. The rule currently:
|
|
42
|
+
|
|
43
|
+
1. Pushes a `paragraph_open` token (level +1) with `class="theorem_block"`.
|
|
44
|
+
2. Optionally pushes an `inline` token for content before the `\begin`.
|
|
45
|
+
3. Looks up `getTheoremEnvironmentIndex(envName)` — `-1` for unregistered environments.
|
|
46
|
+
4. Returns `false`.
|
|
47
|
+
|
|
48
|
+
markdown-it has no rollback on `return false`: the open tokens stay in `state.tokens`, and the renderer emits an unclosed `<div class="theorem_block">`. On documents that contain `\begin{NAME}…\end{NAME}` blocks for names not registered via `\newtheorem` (e.g. `tikzpicture`, `lemma`, `example`), HTML output accumulates unmatched opening divs that nest around subsequent content.
|
|
49
|
+
|
|
50
|
+
---
|
|
51
|
+
|
|
52
|
+
## Goal
|
|
53
|
+
|
|
54
|
+
1. Make `latex_footnote_block` and `latex_footnotetext_block` O(|src|) on first invocation per parse and O(1) per subsequent block-start when no relevant directive exists at or after the current source position. Preserve the three existing input shapes:
|
|
55
|
+
- `\footnote{…}` placed mid-line.
|
|
56
|
+
- `\footnote{…}` whose opening tag spans line breaks (e.g. `\footnote\n{`, `\footnote[1]\n\n{`).
|
|
57
|
+
- Block-level constructs nested inside `\footnote{…}` content (lists, paragraphs, tables, etc.).
|
|
58
|
+
2. Extend the existing tabular skip in `setPositions` / `setChildrenPositions` so the inline tabular variant is also opt-out of source-position assignment, restoring `markdownToHTMLSegments` correctness when subtables are present and `addPositionsToTokens: true`.
|
|
59
|
+
3. Hoist the environment-name validation in `BeginTheorem` above the first `state.push`, so unregistered environments cannot leave half-built token sequences in the stream.
|
|
60
|
+
|
|
61
|
+
---
|
|
62
|
+
|
|
63
|
+
## Non-Goals
|
|
64
|
+
|
|
65
|
+
- Rewriting the footnote rule's tokenization pipeline (Phase 2 / Phase 3 / nested `state.md.block.parse`) — left untouched to minimize regression risk.
|
|
66
|
+
- Tightening the `reOpenTagFootnoteG` / `reOpenTagFootnotetextG` patterns — they already match correctly; the issue is *how often* they run, not their cost per match.
|
|
67
|
+
- Rewriting `findOpenCloseTags(fullContent, …)` calls in Phase 3 — they only execute for actual footnote blocks (≤ number of footnotes in doc) and are not in the profile.
|
|
68
|
+
- Un-freezing the shared close-token singletons. The freeze is the parser's contract that close markers carry no per-token state. The `setPositions` change is on the consumer side.
|
|
69
|
+
- Unifying the Phase 1 terminator asymmetry — `latex_footnote_block` only checks `fence`, while `latex_footnotetext_block` runs the full terminator-rules list. Pre-existing behaviour, separate refactor.
|
|
70
|
+
- Direct unit tests for `getCachedSrcPositions` — helper is module-private; exporting for testability is an antipattern. Cache behavior is exercised through integration fixtures (notably `\blfootnotetext`-only doc, which specifically triggers the pre-gate path). Direct unit-test coverage is a separate refactor.
|
|
71
|
+
|
|
72
|
+
---
|
|
73
|
+
|
|
74
|
+
## Current Behavior (before)
|
|
75
|
+
|
|
76
|
+
### Footnote rules
|
|
77
|
+
|
|
78
|
+
`latex_footnote_block` is registered as a markdown-it block rule (`before('lheading')`), invoked at every block-start during parse. On every invocation:
|
|
79
|
+
|
|
80
|
+
1. Reads the first line of the candidate block.
|
|
81
|
+
2. If the regex `reOpenTagFootnoteG` does not match the first line, enters a forward-scanning loop that:
|
|
82
|
+
- Appends each subsequent non-empty line to `fullContent`.
|
|
83
|
+
- Runs `reOpenTagFootnoteG.test(fullContent)` on the **growing** string after every line.
|
|
84
|
+
3. Bails out only when an empty line, fence, or `endLine` is reached.
|
|
85
|
+
|
|
86
|
+
For a paragraph of `N` lines totaling `M` characters, step 2 is O(N × M) per invocation. `latex_footnotetext_block` follows the identical structure with `reOpenTagFootnotetextG`.
|
|
87
|
+
|
|
88
|
+
### `setPositions`
|
|
89
|
+
|
|
90
|
+
Top-level `setPositions` short-circuits before recursing into `tabular`-typed tokens. The recursive `setChildrenPositions` does not — when invoked recursively on a non-tabular parent whose subtree contains a `tabular_inline` child, it walks into the children of that subtable and tries to assign `.positions` onto frozen close-token singletons, throwing.
|
|
91
|
+
|
|
92
|
+
### `BeginTheorem`
|
|
93
|
+
|
|
94
|
+
Validation of `envName` against the registered theorem environments runs after the `paragraph_open` and optional `inline` token are already pushed. Returning `false` from validation does not roll back the pushed tokens.
|
|
95
|
+
|
|
96
|
+
---
|
|
97
|
+
|
|
98
|
+
## Desired Behavior (after)
|
|
99
|
+
|
|
100
|
+
1. **Whole-document early exit (footnotes).** On its first invocation per parse, each rule sweeps `state.src` once with `RegExp.exec` (lastIndex reset before scan) and caches the offset of the LAST occurrence on the parser state. Subsequent calls return that cached offset in O(1); when it is before the current block's start, the rule returns `false` without entering any loop.
|
|
101
|
+
|
|
102
|
+
2. **Per-line token guard inside the accumulation loop (footnotes).** The expensive `reOpenTagFootnoteG.test(fullContent)` runs only after a line that contains the literal token has been seen. Sound by construction: the regex always begins with `\\footnote` (resp. `\\footnotetext` / `\\blfootnotetext`), and those literals are contiguous and cannot span a line break. Guard regex `/\\footnote(?![a-zA-Z])/` (resp. `/\\(?:bl)?footnotetext(?![a-zA-Z])/`) excludes `\footnotemark` / `\footnotesize` etc. so the guard does not hold the line on prefix-only matches.
|
|
103
|
+
|
|
104
|
+
3. **`setChildrenPositions` early-return for `tabular` and `tabular_inline` parents.** Mirrors the existing top-level skip and prevents recursion from reaching the frozen close-token singletons. The top-level skip list is also broadened to `['tabular', 'tabular_inline']` for symmetry.
|
|
105
|
+
|
|
106
|
+
4. **`BeginTheorem` validates `envName` before the first push.** The `getTheoremEnvironmentIndex(envName)` check is moved above the `state.push('paragraph_open', …)` calls, so an unregistered environment returns `false` without leaving any token in the stream.
|
|
107
|
+
|
|
108
|
+
5. Tokenization, output, and silent-mode semantics are unchanged. All existing footnote, tabular, and theorem test cases produce byte-identical HTML.
|
|
109
|
+
|
|
110
|
+
---
|
|
111
|
+
|
|
112
|
+
## Constraints / Invariants
|
|
113
|
+
|
|
114
|
+
- The keyword-position cache is pinned to `state` (not `state.env`). Nested `state.md.block.parse(content, …)` calls construct a new `StateBlock` with their own `src`, so each scope computes its own positions; an outer cache cannot leak through to nested parses with different source strings.
|
|
115
|
+
- The cache holds a single `lastPos: number` per rule (-1 if no match). Constant size, released with the parser state via normal GC.
|
|
116
|
+
- Soundness of the token guard rests on a single property: the literal token `\footnote` (resp. `\footnotetext`, `\blfootnotetext`) is a contiguous run of characters that cannot be split by `\n`. If no line in `fullContent` contains the literal followed by a non-letter (the guard's lookahead anchor), the rule's regex — which has the literal as a required prefix in every alternative followed by `\s*`/`[`/`{` — cannot match.
|
|
117
|
+
- `\footnotemark` / `\footnotesize` and other `\footnote*` longer forms are excluded by the `(?![a-zA-Z])` lookahead, so the guard correctly skips lines that only contain those prefixes.
|
|
118
|
+
- The footnote rule's silent-mode contract (advance `state.line` only when not silent) is preserved — the new early-exit returns `false`, which is the same as the pre-change path on a non-matching block.
|
|
119
|
+
- `tabular` and `tabular_inline` token children are parser-private (frozen close-token singletons + per-cell tokens with non-source content). Skipping them in `setChildrenPositions` matches the existing semantics — source-position metadata on those nodes is not consumed downstream because the tabular renderer reconstructs cell content from `token.content` rather than from `state.src` slices.
|
|
120
|
+
- `BeginTheorem`'s behavior for registered environments is unchanged — the lookup is the same, downstream tokens are the same, only the order of operations changes.
|
|
121
|
+
|
|
122
|
+
---
|
|
123
|
+
|
|
124
|
+
## Known limitations
|
|
125
|
+
|
|
126
|
+
### Pre-existing, not introduced here
|
|
127
|
+
|
|
128
|
+
**Unregistered theorem env body drop.** For `\begin{NAME}…\end{NAME}` without a matching `\newtheorem{NAME}`, the body text inside the environment is consumed by an unrelated math-block fallback rule and dropped from the rendered HTML — the rule emits an empty `<span class="math-block equation-number">` placeholder rather than the body text. This was already the behavior on master prior to this PR; this PR only stops emitting the unmatched leading `<div class="theorem_block">` wrapper, which made the output look like a real theorem despite the dropped body. A real fix is out of scope (would require either auto-registering common environments or falling back to plain-paragraph rendering for unrecognised `\begin{NAME}…\end{NAME}` blocks).
|
|
129
|
+
|
|
130
|
+
### Introduced by this PR
|
|
131
|
+
|
|
132
|
+
**Footnote position cache does not exclude code-fenced regions.** `getCachedSrcPositions` sweeps `state.src` once with the token-guard regex; matches inside fenced code blocks, indented code blocks, or inline code count toward the cache and the rule still runs Phase 1 for blocks at or before them. The block-rule itself bails on the `findOpenCloseTags` step for code-protected matches, so output is unchanged — the cost is "rule didn't early-exit when it could have" for documents where the only `\footnote*` literals live inside code. Acceptable because (1) documents using `\footnote{}` literally inside code are rare, (2) the worst-case is one full Phase 1 walk per parse instead of zero, not a return to O(N×M) per block.
|
|
133
|
+
|
|
134
|
+
**Symbol-keyed cache mutations on `StateBlock` are invisible to `JSON.stringify` / `Object.keys` / `for…in`.** Intentional — Symbols don't appear in those enumerations, so the cache cannot leak into serialised state or `getOwnPropertyNames`-based introspection. Anyone debugging the StateBlock instance must use `Object.getOwnPropertySymbols(state)` to see the cache slot. Standard for module-private state on objects you don't own; called out so future contributors don't go looking for `state.__mmd_*` strings.
|
|
135
|
+
|
|
136
|
+
**Empty `<span class="mmd-highlight">` wrappers around markup-only inner tokens in span-fallback links.** When a highlight overlaps `[**bold**](url)` the fallback positions strong_open/strong_close (markup tokens with empty rendered content) and the highlight renderer emits `<span class="mmd-highlight" style="…"></span>` wrappers. Rendered output is visually correct (empty span shows nothing) but consumers depending on `.mmd-highlight` having content may need to filter empty matches. Pinned in `tests/_data/_highlights/_data.js`.
|
|
137
|
+
|
|
138
|
+
---
|
|
139
|
+
|
|
140
|
+
## Done When
|
|
141
|
+
|
|
142
|
+
- [x] `latex_footnote_block` short-circuits in O(1) per block-start (after one O(|src|) sweep per parse) when no `\footnote` keyword exists at or after the current block.
|
|
143
|
+
- [x] `latex_footnotetext_block` short-circuits in O(1) per block-start (after one O(|src|) sweep per parse) when no `\footnotetext` / `\blfootnotetext` keyword exists at or after the current block.
|
|
144
|
+
- [x] Per-iteration substring guard prevents the O(`fullContent.length`) regex from running on lines that cannot complete the pattern.
|
|
145
|
+
- [x] `setChildrenPositions` mirrors top-level skip for `tabular`/`tabular_inline` — positions the token by `content.length` but doesn't recurse into cell children. Per-child `Object.isExtensible` guard remains as a safety net for any other frozen tokens.
|
|
146
|
+
- [x] `markdownToHTMLSegments({ addPositionsToTokens: true })` returns a non-null result on documents with inline subtables.
|
|
147
|
+
- [x] `BeginTheorem` (non-silent) bails for unregistered environments before `endTag()` / forward scan / any `state.push`; silent-mode terminator probes preserve the original close-tag-based answer so the `\newtheorem` ↔ `\begin{NAME}` adjacent-line handshake keeps working.
|
|
148
|
+
- [x] All existing tests pass; new fixtures cover each change.
|
|
149
|
+
- [x] Output of `markdownToHTMLSegments` on the footnote-perf benchmark input is byte-identical before and after the change (same content, same segment map).
|
|
150
|
+
- [x] Status is updated to `Implemented`.
|
|
151
|
+
|
|
152
|
+
---
|
|
153
|
+
|
|
154
|
+
## Architecture
|
|
155
|
+
|
|
156
|
+
### Footnote rules: per-state position cache + per-line token guard
|
|
157
|
+
|
|
158
|
+
Module-private helper `getCachedSrcPositions(state, key, patternG)` in `md-latex-footnotes/block-rule.ts`. Cache key is a per-module `Symbol` on the `StateBlock` instance — Symbol uniqueness rules out collisions with `StateBlock` fields or other plugins' state property writes. Cache entry is `{ src, lastPos }` (single integer, -1 if no match); hit is gated on `cached.src === state.src` so a `StateBlock` reused with a swapped src recomputes instead of returning stale offsets. The `/g` pattern is helper-owned (named `*_SWEEP_G`) and `lastIndex` is reset before and after each scan so the shared regex is safe across calls.
|
|
159
|
+
|
|
160
|
+
Both rules early-exit when `lastPos < state.bMarks[startLine]` — O(1) per block-start once the cache is warm. The cache is **state**-keyed not `state.env`-keyed because `state.md.block.parse(content, …)` builds a fresh `StateBlock` per nested parse — env-keying would alias outer/nested src strings.
|
|
161
|
+
|
|
162
|
+
Inside Phase 1 (multi-line opening tag detection) the heavy `reOpenTagFootnoteG.test(fullContent)` is gated by two cheap per-line checks: (1) the literal token `\footnote` (or `\footnotetext` / `\blfootnotetext`) appears on this line, anchored by `(?![a-zA-Z])` to exclude `\footnotemark` / `\footnotesize`; (2) the line contains `{`. Either gate's `continue` shortcuts the regex. The token-guard regexes live in `common/consts.ts` so the soundness test pins the same patterns the rule consumes — single source of truth.
|
|
163
|
+
|
|
164
|
+
**Soundness.** Every alternative of the open-tag regex starts with the literal `\footnote*` followed by `\s*` (which permits `\n`) then `[`/`{`. The character right after the literal is never a letter, so the guard's `(?![a-zA-Z])` lookahead admits exactly the same prefix positions as the full regex. `fullContent` is built by joining lines with `\n` and `\n` is not in `\footnote*`, so the literal cannot straddle a join — if no line contains it, the full regex cannot match.
|
|
165
|
+
|
|
166
|
+
### `setChildrenPositions`: per-child Object.isExtensible guard + link_open span fallback
|
|
167
|
+
|
|
168
|
+
**Tabular skip + frozen-token guard.** A top-of-loop skip in `setChildrenPositions` mirrors the existing top-level skip for `tabular`/`tabular_inline`: the token gets `.positions` and `content_test_str` from `content.length`, `pos` advances so siblings stay correctly placed, and recursion into cells is skipped. Cell children (td_open, inline math, includegraphics) don't get `.positions` — same as master pre-PR; cell content is reconstructed from `token.content`, so `state.src` slicing on cell descendants is meaningless. A per-child `Object.isExtensible` check remains downstream as a safety net for any other frozen tokens that could appear in unrelated subtrees.
|
|
169
|
+
|
|
170
|
+
**link_open: strict triple + span fallback.** The legacy branch unconditionally treated `(link_open, i+1, i+2)` as `(link_open, text, link_close)` and produced `NaN` / off-by-N positions for fancy contents (`[**bold**](url)`, `` [`code`](url) ``, `[](url)`). Replaced with two branches:
|
|
171
|
+
|
|
172
|
+
- **Strict triple** for `[text](url)`: explicit type checks (`i+1.type === 'text' && i+2.type === 'link_close' && typeof i+2.nextPos === 'number'`) preserve the legacy math + `highlightAll` cascade exactly. Snapshots in `tests/_data/_tokenPositions/_data.js` pin the byte-identical positions.
|
|
173
|
+
- **Span fallback** for any other link_open layout: scans forward for the matching `link_close` (depth-tracking), sets `link_open.positions` to the full `[…](…)` span via `start_content + link_close.nextPos`, advances `pos = startPos + 1` if `state.src[startPos] === '['` (else `pos = startPos` — defensive for non-bracket emitters), and lets the per-child loop position each inner child by its own `nextPos`/`inlinePos`/`content`/`markup`. When the loop reaches `link_close`, its `nextPos` jumps `pos` to the link end — no `i +=` skip needed.
|
|
174
|
+
|
|
175
|
+
The `highlightAll` cascade is intentionally omitted from the span branch — implementing it would require a post-hoc pass over the inner range, and the legacy code didn't compute it correctly for fancy links either (NaN positions short-circuited highlight resolution).
|
|
176
|
+
|
|
177
|
+
### `BeginTheorem`: validate envName before push
|
|
178
|
+
|
|
179
|
+
`getTheoremEnvironmentIndex(envName)` is now hoisted above `endTag()` / forward-scan / `state.push`, but only in non-silent mode. Silent-mode invocations keep the original close-tag-based answer — required by `newTheoremBlock`'s terminator pass, which calls `BeginTheorem(silent=true)` on the `\begin{NAME}` line that immediately follows `\newtheorem{NAME}{…}` *before* NAME is registered. Non-silent is the path that mutates `state.tokens`, so the registration check is load-bearing there. Unregistered environments now bail in O(1) without touching `state.tokens` and the renderer no longer emits unmatched `<div class="theorem_block">` wrappers.
|
|
180
|
+
|
|
181
|
+
---
|
|
182
|
+
|
|
183
|
+
## Performance impact
|
|
184
|
+
|
|
185
|
+
Profiled on a 2.45 MB / 43,608-line MMD input (706 `\begin{tabular}`, 1,065 `\section*`, 0 `\footnote{}`, 3 `\footnotetext{}`). The pathological structure here is **706 long tabular blocks without empty-line separators** — Phase 1 forward-scan terminates only on `fence`/empty-line/EOF, so each block-start near a tabular pays O(table-size) to re-test the growing `fullContent` against `reOpenTagFootnoteG`.
|
|
186
|
+
|
|
187
|
+
Median of 5 runs (1 warmup + 5 measure via `performance.now()`); single-run on master because of the multi-minute parse time:
|
|
188
|
+
|
|
189
|
+
| Stage | Master | PR | Speedup | Output bytes |
|
|
190
|
+
|---|---:|---:|---:|---:|
|
|
191
|
+
| `markdownToHTML` | 178,078 ms | 1,525 ms | **117×** | 22,564,133 (byte-identical) |
|
|
192
|
+
| `markdownToHTMLSegments` | 193,023 ms | 1,498 ms | **129×** | 22,564,121 / 9,082 segments (byte-identical) |
|
|
193
|
+
| `markdownToHTMLSegments({ addPositionsToTokens: true })` | 207,115 ms | 1,531 ms | **135×** | 22,564,121 / 9,082 segments (byte-identical) |
|
|
194
|
+
|
|
195
|
+
Output verified byte-identical between master and PR via diff (modulo random per-parse IDs from the smiles plugin, which is non-determinism unrelated to this PR).
|
|
196
|
+
|
|
197
|
+
Note: this pathological case is structural, not universal. Smaller documents (~30 KB) and a 1.1 MB TikZ-heavy input parse fast on master too — the quadratic blow-up requires (a) >1 MB src and (b) many long tabular blocks without empty-line separators between rows.
|
|
198
|
+
|
|
199
|
+
A second TikZ-heavy benchmark (44 `\begin{tabular}`, 482 `\begin{tikzpicture}`, 726 sections) covers the `setPositions` and `BeginTheorem` paths:
|
|
200
|
+
|
|
201
|
+
| Stage | Before | After |
|
|
202
|
+
|---|---|---:|
|
|
203
|
+
| `markdownToHTMLSegments({ addPositionsToTokens: true })` | TypeError, returns `null` | 741 ms |
|
|
204
|
+
| `<div class="theorem_block">` count in HTML output | 75 (unmatched) | 0 |
|
|
205
|
+
| `markdownToHTMLSegments({ addPositionsToTokens: false })` | 581 ms | 581 ms (unchanged) |
|
|
206
|
+
|
|
207
|
+
The post-change segment count on this input rises because the segment renderer in `markdownToHtmlPipelineSegments` previously coalesced adjacent blocks under each unclosed `<div class="theorem_block">` wrapper (the segment delimiter logic waits for matching close tags). Removing the unmatched opens lets segments break at their natural boundaries.
|
|
208
|
+
|
|
209
|
+
---
|
|
210
|
+
|
|
211
|
+
## Files Changed
|
|
212
|
+
|
|
213
|
+
| File | Change |
|
|
214
|
+
|------|--------|
|
|
215
|
+
| `src/markdown/common/consts.ts` | Add `reFootnoteToken` / `reFootnotetextToken` token-guard exports. |
|
|
216
|
+
| `src/markdown/md-latex-footnotes/block-rule.ts` | Position cache + early-exit + Phase 1 token/`{` gates. |
|
|
217
|
+
| `src/markdown/md-core-rules/set-positions.ts` | `Object.isExtensible` guard, `tabular_inline` skip, link_open strict-triple + span fallback. |
|
|
218
|
+
| `src/markdown/md-theorem/block-rule.ts` | Hoist envName validation in non-silent `BeginTheorem`. |
|
|
219
|
+
| `tests/_data/_footnotes_latex/_data-footnote.js` | Negative + boundary + multi-line opening tag fixtures. |
|
|
220
|
+
| `tests/_data/_footnotes_latex/_data-footnotetext.js` | `\blfootnotetext`-only + mixed `\footnotetext`+`\blfootnotetext` fixtures. |
|
|
221
|
+
| `tests/_data/_footnotes_latex/_data_known_quirks_footnote.js` | Pre-existing quirks pinned (mixed mark+real, nested numbering). |
|
|
222
|
+
| `tests/_data/_theorem/_data.js` + `_data_known_quirks.js` | Adjacent-line handshake fixture; quirk fixtures under "TO BE FIXED". |
|
|
223
|
+
| `tests/_data/_tokenPositions/_data.js` | Strict-triple + span-fallback link snapshots; tabular_inline sibling-positions regression. |
|
|
224
|
+
| `tests/_data/_highlights/_data.js` | Span-fallback highlight pin (3 cases). |
|
|
225
|
+
| `tests/_html-segments.js` | inline-tabular, fancy-link, tikzpicture (segment count) regression cases. |
|
|
226
|
+
| `tests/_footnotes_latex.js` | Soundness canary + ratio-based perf regression. |
|
|
227
|
+
|
|
228
|
+
No public API surface, no exported names, no option flags introduced.
|
|
229
|
+
|
|
230
|
+
---
|
|
231
|
+
|
|
232
|
+
## Testing
|
|
233
|
+
|
|
234
|
+
Full suite passes (3,397 tests). New coverage:
|
|
235
|
+
|
|
236
|
+
- Footnote correctness: negative fixtures for `\footnotemark` / `\footnotesize`, mixed mark+real (in quirks file), end-of-source / tShift / nested-parse boundaries, multi-line opening tag (`\footnote\n[1]\n{body}`).
|
|
237
|
+
- Footnotetext correctness: `\blfootnotetext`-only doc (pre-gate regression), mixed `\footnotetext` + `\blfootnotetext` in one doc.
|
|
238
|
+
- Token-guard soundness: structural canary (every regex alternative starts with `\footnote*` literal and ends with `{`) + per-alternative samples + forbidden samples (letter-continuation rejection) + no-match samples.
|
|
239
|
+
- Position invariant: canary that `link_open.positions` carries no `start_content`/`end_content` in either branch (strict-triple `[text](url)` and span fallback `[**bold**](url)`).
|
|
240
|
+
- Perf regression: parse-only ratio-based scaling test (200 vs 2000-line input, median of 5 via `performance.now()`, runs `md.parse()` directly to bypass MathJax/render overhead). Synthetic worst case — long unbroken paragraph with the literal `\footnote*` substring inline on every line (no `{`/`[` after), so master's regex backtracks against growing `fullContent` per iteration. 4 cases cover both rules × both paths: `latex_footnote_block` and `latex_footnotetext_block`, each with literal at the start (cache early-exit path) and at the end (per-line gate path). Linear scaling gives ratio ~10; master's quadratic gives ~67-68×, limit at 60× catches the regression in 3 of 4 cases on master while passing all on PR.
|
|
241
|
+
- Tabular: `markdownToHTMLSegments({ addPositionsToTokens: true })` regression for inline-tabular children; sibling-position snapshot for `\begin{tabular}…\end{tabular}` between paragraphs.
|
|
242
|
+
- Theorem: HTML pin for `\newtheorem` + adjacent `\begin{NAME}` (silent-mode handshake); pre-existing body-drop quirk pinned in `_data_known_quirks.js`. Tikzpicture segment-count regression: 3 segments (Before/body/After) vs master's 1.
|
|
243
|
+
- Link guard: strict-triple `[text](url)` and span fallback for `[**bold**](url)`, `` [`code`](url) ``, `[](url)`, plus nested-bracket `[outer [inner](b)](a)` defensive case.
|
|
244
|
+
- Highlight: 3 fancy-link snapshots (bold/code/image) pinning empty `<span class="mmd-highlight">` wrappers.
|
|
245
|
+
|
|
246
|
+
Output equivalence verified on the footnote-perf benchmark (byte-identical HTML, identical segment map) and on a TikZ-heavy benchmark (no more unmatched `<div class="theorem_block">`).
|
|
@@ -0,0 +1,270 @@
|
|
|
1
|
+
# PR: Parse `[t]/[c]/[b]` vertical-align bracket on `\begin{tabular}`
|
|
2
|
+
|
|
3
|
+
Status: Active
|
|
4
|
+
Owner: @OlgaRedozubova
|
|
5
|
+
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
## Context
|
|
9
|
+
|
|
10
|
+
Standard LaTeX `\begin{tabular}` accepts an optional positional argument `[t]/[c]/[b]` that controls how the tabular box is aligned vertically with surrounding context. mathpix-markdown-it currently silently drops this bracket — the parser regex matches only `\begin{tabular}\s*\{...\}` and ignores anything between `\begin{tabular}` and `{`.
|
|
11
|
+
|
|
12
|
+
This PR adds bracket parsing and uses the parsed value as the default vertical alignment for the `l/c/r/S` columns of that table. It also adds an opt-in renderer option that lets consumers flip the absent-bracket default to `top` (or `bottom`) without modifying the source MMD.
|
|
13
|
+
|
|
14
|
+
The motivating case is tables that contain mixed-height cells — for example, one cell holds a long stacked list (often via a nested `\begin{tabular}{l}`), while siblings carry short text. Today every cell renders vertically centered, so the short cells visually float in the middle of the row instead of starting at the top of the row's content. With this change, a document author (or generator) can write `\begin{tabular}[t]{|l|l|l|}` and get top-aligned `<td>` cells, matching the standard LaTeX `[t]` semantics.
|
|
15
|
+
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
## Goal
|
|
19
|
+
|
|
20
|
+
- Parse the optional `[t]/[c]/[b]` bracket on `\begin{tabular}` and propagate it through the tabular pipeline.
|
|
21
|
+
- Use the parsed bracket as the default vertical alignment for `l/c/r/S` columns of that table.
|
|
22
|
+
- Add an opt-in `defaultCellVerticalAlign` option that flips the absent-bracket default in both HTML output and `forLatex` export.
|
|
23
|
+
- Preserve existing behavior when no bracket is present and the option is not set.
|
|
24
|
+
|
|
25
|
+
---
|
|
26
|
+
|
|
27
|
+
## Non-Goals
|
|
28
|
+
|
|
29
|
+
- Adding `\makecell` parsing (future, complementary path: per-cell vertical alignment without a row-level bracket).
|
|
30
|
+
- New non-standard column-spec letters or width inference.
|
|
31
|
+
- Per-cell vertical-align values driven by anything other than the existing column-spec mechanism (`m`/`p`/`b`) or the new row-level bracket.
|
|
32
|
+
- Auto-injecting `[t]` on outer tabulars when an inner tabular carries `[t]` (would be a non-LaTeX heuristic).
|
|
33
|
+
- Any change to math, list, or non-tabular rendering paths.
|
|
34
|
+
|
|
35
|
+
---
|
|
36
|
+
|
|
37
|
+
## Current Behavior
|
|
38
|
+
|
|
39
|
+
- `\begin{tabular}[t]{|l|l|l|}` — bracket is silently dropped at the regex in `parse-tabular.ts`. Rendered identically to `\begin{tabular}{|l|l|l|}`.
|
|
40
|
+
- `getVerticallyColumnAlign` (`common.ts`) hard-codes `vAlign = 'middle'` for `l/c/r/S` columns. Only `m`/`p`/`b` produce non-middle vAlign.
|
|
41
|
+
- `forLatex` export emits the column spec back into the `latex` payload but does not preserve any bracket (it never received one).
|
|
42
|
+
- HTML `<td>` style omits `vertical-align` unless an explicit `m`/`p`/`b` column type is set; the browser default (`middle`) applies.
|
|
43
|
+
|
|
44
|
+
---
|
|
45
|
+
|
|
46
|
+
## Desired Behavior
|
|
47
|
+
|
|
48
|
+
### Bracket parsing
|
|
49
|
+
|
|
50
|
+
- `\begin{tabular}[t]{|l|l|l|}` → all `l/c/r/S` columns of that table default to `vAlign: 'top'`.
|
|
51
|
+
- `\begin{tabular}[c]{|l|l|l|}` → defaults to `'middle'` (matches existing behavior).
|
|
52
|
+
- `\begin{tabular}[b]{|l|l|l|}` → defaults to `'bottom'`.
|
|
53
|
+
- `\begin{tabular}{|l|l|l|}` (absent bracket) → defaults to `'middle'` (unchanged) unless `defaultCellVerticalAlign` option overrides it.
|
|
54
|
+
- Any other bracket value (whitespace, unknown letter, multi-char) → ignored, treated as absent.
|
|
55
|
+
- Per-column `m`/`p`/`b` always overrides the row-level bracket default.
|
|
56
|
+
- The bracket on a table affects only that table's cells. It does not propagate into nested tabulars (each nested tabular is parsed with its own bracket).
|
|
57
|
+
|
|
58
|
+
### Cell-level inference (nested bracket → outer td)
|
|
59
|
+
|
|
60
|
+
- When an outer cell's content contains a nested `\begin{tabular}[t/c/b]`, the outer `<td>` inherits that vertical-align (matching LaTeX baseline semantics — `[t]` on the inner tabular sits at the top of the row baseline, effectively top-aligning the outer cell's content).
|
|
61
|
+
- Cell-level inference overrides the row-level bracket for that single cell. Siblings in the same row are not affected.
|
|
62
|
+
- Per-column `m`/`p`/`b` on the outer column still wins (most-specific rule).
|
|
63
|
+
- If the cell contains a bare nested `\begin{tabular}{...}` without bracket, no inference fires; the outer cell uses the row-level default.
|
|
64
|
+
|
|
65
|
+
### `forLatex` export
|
|
66
|
+
|
|
67
|
+
- When the source had a bracket, the bracket is preserved verbatim in `tableOpen.meta.bracket`.
|
|
68
|
+
- When `defaultCellVerticalAlign: 'top'` (or `'bottom'`) and the source had no explicit bracket, the option's value is injected as `'t'` (or `'b'`) into `tableOpen.meta.bracket` so the consumer can serialize `\begin{tabular}[pos]{...}` and keep HTML and exported LaTeX consistent. **Top-level only** — nested absent-bracket tabulars stay bracket-less to preserve round-trip.
|
|
69
|
+
- When `defaultCellVerticalAlign: 'middle'` or unset, no `meta.bracket` is set on absent-bracket tables (preserves round-trip).
|
|
70
|
+
- Every `td_open` of a tabular with an effective bracket carries `meta.parentBracket` set to that bracket. Consumers iterating forLatex tokens see parent context directly on each cell. `AddTd` and `AddTdSubTable` accept an optional `meta?: TTdMeta` parameter; the multicol path sets `parentBracket` alongside its existing meta fields. New `TTdMeta` type in `common.ts` captures the known shape of `td_open.meta` for forLatex (parentBracket, multi, colCount, colSpecs, currentColIndex, isSubTabular, forceMultiFixedWidth).
|
|
71
|
+
|
|
72
|
+
### `defaultCellVerticalAlign` option
|
|
73
|
+
|
|
74
|
+
- New top-level option: `defaultCellVerticalAlign?: 'top' | 'middle' | 'bottom'`.
|
|
75
|
+
- Default unset → no override; existing defaults apply.
|
|
76
|
+
- Affects all `\begin{tabular}` blocks in the document where the bracket is absent.
|
|
77
|
+
- A document with an explicit bracket always wins over the option.
|
|
78
|
+
- Propagates into `\multicolumn` / `\multirow` cells the same way the row-level bracket does.
|
|
79
|
+
|
|
80
|
+
---
|
|
81
|
+
|
|
82
|
+
## Constraints / Invariants
|
|
83
|
+
|
|
84
|
+
- **No-op on existing MMD**: documents without the bracket and without the option set must produce byte-identical HTML output.
|
|
85
|
+
- **LaTeX semantics for the bracket**: `[t]/[c]/[b]` is a row-level default. Per-column `m`/`p`/`b` must continue to win.
|
|
86
|
+
- **Round-trip safety**: a source with `\begin{tabular}[t]{|l|l|}` must serialize back to `\begin{tabular}[t]{|l|l|}` in `forLatex` mode (bracket preserved).
|
|
87
|
+
- **Scope rules**: bracket on a table is row-level for that table's own cells. Bracket on a nested tabular additionally propagates to the outer cell containing it (cell-level inference, matches LaTeX baseline semantics). Bracket does not propagate into deeper nested tabulars.
|
|
88
|
+
- **Unknown bracket value**: silently treated as absent; never throws or produces malformed output.
|
|
89
|
+
- **Existing performance optimizations are not regressed**: `columnStyleCache`, `cellAttrsCache`, shared close-tokens, `colsToFixWidth` Set, and per-parse interning all continue to work. The new `vAlign` value (one of `'top'`/`'middle'`/`'bottom'`) participates in style key generation as before.
|
|
90
|
+
- **Test surface**: all existing tests must pass.
|
|
91
|
+
- **Frozen shared `td_open.meta`**: the per-bracket `TD_META_BY_BRACKET` singletons are `Object.freeze`'d; extend via spread (`{...meta, extra}`), never in-place. No clone-on-write marker (asymmetric with `attrs` / `attrsSharedMarker`) because the codebase has no in-place meta mutators — add a `metaSharedMarker` only when a legit mutator appears.
|
|
92
|
+
|
|
93
|
+
---
|
|
94
|
+
|
|
95
|
+
## Public API changes
|
|
96
|
+
|
|
97
|
+
| Option | Type | Default | Effect |
|
|
98
|
+
|--------|------|--------:|--------|
|
|
99
|
+
| `defaultCellVerticalAlign` | `'top' \| 'middle' \| 'bottom' \| undefined` | `undefined` | Vertical-align fallback for `\begin{tabular}` blocks without an explicit `[pos]` bracket. Affects `<td>` HTML style. Propagates into `\multicolumn`/`\multirow` cells only for `'top'`/`'bottom'` (option `'middle'` stays no-op on multicol to preserve legacy). Per-column `m`/`p`/`b` and any explicit `[t]/[c]/[b]` source bracket always override. Unset → byte-identical to legacy. See `forLatex export` for round-trip behavior. |
|
|
100
|
+
|
|
101
|
+
No other options introduced.
|
|
102
|
+
|
|
103
|
+
---
|
|
104
|
+
|
|
105
|
+
## Architecture
|
|
106
|
+
|
|
107
|
+
### Bracket parsing (parse-tabular.ts)
|
|
108
|
+
|
|
109
|
+
The current regex is:
|
|
110
|
+
|
|
111
|
+
```ts
|
|
112
|
+
/(?:\\begin{tabular}\s{0,}\{([^}]*)\})/
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
Extend to capture an optional bracket:
|
|
116
|
+
|
|
117
|
+
```ts
|
|
118
|
+
/\\begin{tabular}\s*(?:\[([^\]]*)\])?\s*\{([^}]*)\}/
|
|
119
|
+
```
|
|
120
|
+
|
|
121
|
+
The captured group `[1]` is the raw bracket value (or `undefined`); group `[2]` is the column spec.
|
|
122
|
+
|
|
123
|
+
Normalize the bracket to one of `'t' | 'c' | 'b' | undefined`:
|
|
124
|
+
- Trim whitespace.
|
|
125
|
+
- Single-character match against `'t' | 'c' | 'b'`.
|
|
126
|
+
- Anything else → `undefined` (absent).
|
|
127
|
+
|
|
128
|
+
Audit `getParams` and the recursive sub-tabular splice path so the bracket is recognized regardless of which branch parses the tabular.
|
|
129
|
+
|
|
130
|
+
### Threading the bracket value
|
|
131
|
+
|
|
132
|
+
The captured bracket position needs to reach `getVerticallyColumnAlign` and the `forLatex` payload builder. Two existing call sites:
|
|
133
|
+
|
|
134
|
+
- `setTokensTabular` in `parse-tabular.ts` — this is where `getVerticallyColumnAlign` is invoked. Add a `bracketDefault?: 't' | 'c' | 'b'` parameter threaded through `setTokensTabular → getVerticallyColumnAlign`.
|
|
135
|
+
- `table_open` token construction — `latex` payload field. When `forLatex`, emit the bracket into the serialized `\begin{tabular}` open. Source bracket preserved as-is; option-derived bracket injected only if source had none.
|
|
136
|
+
|
|
137
|
+
The bracket value also enters the per-table state for `multi-column-row.ts` (`getMultiColumnMultiRow`) only if multi-row/multi-column cells inherit row-level vAlign — verify whether they currently inherit `vAlign` from the column or use their own. If they use their own, no thread-through needed.
|
|
138
|
+
|
|
139
|
+
### `getVerticallyColumnAlign` (common.ts)
|
|
140
|
+
|
|
141
|
+
Extend signature:
|
|
142
|
+
|
|
143
|
+
```ts
|
|
144
|
+
getVerticallyColumnAlign(
|
|
145
|
+
align: string,
|
|
146
|
+
numCol: number,
|
|
147
|
+
bracketDefault?: 't' | 'c' | 'b',
|
|
148
|
+
): TAlignData
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
For `l/c/r/S` switch branches, replace `vAlign.push('middle')` with a helper that maps `bracketDefault` → `'top' | 'middle' | 'bottom'`, defaulting to `'middle'` when `bracketDefault` is undefined.
|
|
152
|
+
|
|
153
|
+
`m`/`p`/`b` branches are not modified — they already set `vAlign` explicitly and that always wins.
|
|
154
|
+
|
|
155
|
+
The trailing `arrayFillDef(vAlign, defaultV, numCol)` fallback uses the same `defaultV` for symmetry — extra columns past the column-spec length get the row-level default rather than hardcoded `'middle'`.
|
|
156
|
+
|
|
157
|
+
### `defaultCellVerticalAlign` option threading
|
|
158
|
+
|
|
159
|
+
Read from `state.md.options.defaultCellVerticalAlign` at the parsing entry. When the source had no bracket and the option is set to `'top'` or `'bottom'`, treat the option's value as if it were an implicit bracket — both for `getVerticallyColumnAlign` and for the `forLatex` payload.
|
|
160
|
+
|
|
161
|
+
Document-level option, not per-call. Same option applies to every tabular in the parse.
|
|
162
|
+
|
|
163
|
+
### `forLatex` round-trip
|
|
164
|
+
|
|
165
|
+
The `latex` payload for `table_open` currently emits only the column spec. Extend so that when:
|
|
166
|
+
|
|
167
|
+
- Source bracket present → serialize as `\begin{tabular}[<src-bracket>]{...}`.
|
|
168
|
+
- Source bracket absent + `defaultCellVerticalAlign` set to `'top'`/`'bottom'` → set `tableOpen.meta.bracket` to `'t'`/`'b'` accordingly.
|
|
169
|
+
- Source bracket absent + no option → emit as today (no bracket).
|
|
170
|
+
|
|
171
|
+
The `latex` field on `table_open` is consumed by the LaTeX-emitting render path. Verify that consumer accepts the bracket-augmented payload without further modification.
|
|
172
|
+
|
|
173
|
+
### HTML `<td>` style
|
|
174
|
+
|
|
175
|
+
`composeCellStyle` in `tabular-td.ts` emits `vertical-align: ${v}` whenever `aligns.v` is non-empty. For regular `l/c/r/S` columns `bracketToVAlign` maps `'t' → 'top'`, `'b' → 'bottom'`, and everything else (`'c'`, `undefined`) → `'middle'` — so `vertical-align: middle` is always present for these cells. This matches master, where the legacy code pushed `'middle'` unconditionally; existing snapshots already include `vertical-align: middle` and remain byte-identical.
|
|
176
|
+
|
|
177
|
+
The `\multicolumn` / `\multirow` path uses a different guard: it emits `vertical-align` only when an effective bracket (`'t'`/`'c'`/`'b'`) is set, and stays no-CSS when the bracket is absent and the option is `'middle'` or unset. This preserves the legacy no-`vertical-align` output on multicol/multirow cells in absent-bracket tabulars.
|
|
178
|
+
|
|
179
|
+
---
|
|
180
|
+
|
|
181
|
+
## Edge Cases
|
|
182
|
+
|
|
183
|
+
- **Whitespace**: `\begin{tabular} [t] {|l|}` — extended regex must allow whitespace between `tabular`, bracket, and `{...}`.
|
|
184
|
+
- **Multiple tabulars in one document**: each tabular parses its own bracket independently.
|
|
185
|
+
- **Nested tabulars**: outer and inner each parse their own bracket. Outer bracket does not propagate into inner; inner bracket does not propagate outward.
|
|
186
|
+
- **Multiple nested tabulars in one cell**: only the first nested `\begin{tabular}[...]` contributes its bracket to the outer cell. Subsequent nested tabulars in the same cell render with their own brackets but do not override the outer cell's vertical-align.
|
|
187
|
+
- **Unknown bracket value**: `\begin{tabular}[x]{|l|}` or `\begin{tabular}[tt]{|l|}` — bracket ignored, treated as absent.
|
|
188
|
+
- **Empty bracket**: `\begin{tabular}[]{|l|}` — treated as absent.
|
|
189
|
+
- **Per-column override**: `\begin{tabular}[t]{|l|m{2cm}|}` — column 0 = top (from bracket), column 1 = middle (from `m{}`).
|
|
190
|
+
- **`forMD` export**: the visual gating already skips `<td>` style under `forMD`. No new behavior needed for MD export — vAlign is HTML/visual only.
|
|
191
|
+
- **`forDocx`/`forPptx`**: vAlign currently propagates via the cell metadata for these exporters; verify the new vAlign values (`'top'`/`'bottom'`) are recognized. If not, existing behavior is preserved (only `'middle'` was emitted before).
|
|
192
|
+
- **`multicolumn` / `multirow`**: explicit source bracket `'t'`/`'c'`/`'b'` propagates; option `'top'`/`'bottom'` propagates; option `'middle'` does NOT (preserves legacy no-vertical-align on multicol). Explicit `\multirow[…]` always wins. Plain `\multicolumn{}`/`\multirow{}` in an absent-bracket tabular emits no `vertical-align`.
|
|
193
|
+
- **Diagbox cells**: render-tabular always emits `vertical-align: middle` for cells containing `\diagbox`/`\slashbox`/`\backslashbox`. Parser detects via `getSubTabular`'s `hasDiagbox` flag and skips its own vertical-align emit so the result has a single `vertical-align: middle` (no duplication). Outer tabular's bracket does not override this — the diagonal split visual always centers content.
|
|
194
|
+
|
|
195
|
+
---
|
|
196
|
+
|
|
197
|
+
## Done When
|
|
198
|
+
|
|
199
|
+
- [x] `parse-tabular.ts` regex captures `[t]/[c]/[b]` on `\begin{tabular}` and threads it to `getVerticallyColumnAlign`
|
|
200
|
+
- [x] All `\begin{tabular}` parsing sites in the file audited so bracket is not dropped on the recursive sub-tabular path
|
|
201
|
+
- [x] `getVerticallyColumnAlign` accepts a `bracketDefault` argument; `l/c/r/S` columns honor it; `m`/`p`/`b` columns continue to win
|
|
202
|
+
- [x] `defaultCellVerticalAlign` option threaded from `state.md.options` to the parser; treated as fallback when source bracket is absent
|
|
203
|
+
- [x] `forLatex` `tableOpen.meta.bracket` carries the source bracket; option-derived `'t'`/`'b'` is injected when the source has no bracket; `'c'`/unset preserves round-trip (no `meta.bracket`)
|
|
204
|
+
- [x] HTML output: `<td>` gains `vertical-align: top` (or `bottom`) only when bracket is present or option is set; no-op for existing MMD
|
|
205
|
+
- [x] `\multicolumn` / `\multirow` cells inherit explicit source `'t'`/`'c'`/`'b'` and option `'top'`/`'bottom'`; option `'middle'` and unset stay no-op (preserves legacy no-CSS path on multicol)
|
|
206
|
+
- [x] Explicit `\multirow[t]`/`\multirow[c]`/`\multirow[b]` always wins over the row-level default and emits explicit `vertical-align`; `[c]` no longer leaks `[t]`/`[b]` from the outer tabular
|
|
207
|
+
- [x] All existing tests pass; two `\multirow[c]` snapshots in `tests/_data/_tabular/_data_digbox.js` updated to include the now-explicit `vertical-align: middle` (intentional behavior change — see "Multirow vpos handling" notes)
|
|
208
|
+
- [x] New unit tests cover the cases listed under Testing
|
|
209
|
+
- [x] Changelog entry added
|
|
210
|
+
- [ ] `Status` updated to `Implemented` after merge
|
|
211
|
+
|
|
212
|
+
---
|
|
213
|
+
|
|
214
|
+
## Testing
|
|
215
|
+
|
|
216
|
+
### Unit tests (new file under `tests/`)
|
|
217
|
+
|
|
218
|
+
Cases:
|
|
219
|
+
|
|
220
|
+
- `\begin{tabular}[t]{|l|l|}` → `vAlign = ['top', 'top']`, HTML emits `vertical-align: top` on both `<td>`.
|
|
221
|
+
- `\begin{tabular}[b]{|l|l|}` → `vAlign = ['bottom', 'bottom']`, HTML emits `vertical-align: bottom`.
|
|
222
|
+
- `\begin{tabular}[c]{|l|l|}` → `vAlign = ['middle', 'middle']`, HTML emits `vertical-align: middle` (explicit centering).
|
|
223
|
+
- `\begin{tabular}{|l|l|}` (no bracket, no option) → `vAlign = ['middle', 'middle']`, HTML emits `vertical-align: middle` (matches master — legacy code pushed `'middle'` unconditionally; snapshots include it).
|
|
224
|
+
- `\begin{tabular}[t]{|l|m{2cm}|}` → `vAlign = ['top', 'middle']` (per-column `m{}` wins).
|
|
225
|
+
- `\begin{tabular}[t]{|l|p{2cm}|b{2cm}|}` → `vAlign = ['top', 'top', 'bottom']` (`p` is already top; `b{}` overrides bracket).
|
|
226
|
+
- `\begin{tabular}[x]{|l|l|}` (unknown bracket) → treated as absent.
|
|
227
|
+
- `\begin{tabular}[ ]{|l|l|}` (empty/whitespace bracket) → treated as absent.
|
|
228
|
+
- `\begin{tabular} [t] {|l|l|}` (whitespace around bracket) → `vAlign = ['top', 'top']`.
|
|
229
|
+
- Nested: outer `\begin{tabular}{|l|l|}` + inner `\begin{tabular}[t]{l}` — outer cells stay middle, inner cells become top.
|
|
230
|
+
- `defaultCellVerticalAlign: 'top'` set, source no bracket → vAlign top, `tableOpen.meta.bracket = 't'`.
|
|
231
|
+
- `defaultCellVerticalAlign: 'top'` set, source explicitly `[c]` → vAlign middle, `tableOpen.meta.bracket = 'c'` (source bracket wins over option).
|
|
232
|
+
- `defaultCellVerticalAlign: 'middle'` set, source no bracket → vAlign middle, `tableOpen.meta.bracket` undefined (round-trip preserved).
|
|
233
|
+
- `defaultCellVerticalAlign` unset, source no bracket → no change (regression guard).
|
|
234
|
+
|
|
235
|
+
### Snapshot tests
|
|
236
|
+
|
|
237
|
+
- Run full snapshot suite. No existing snapshots should change — confirm by running `npm test` before and after the implementation and diffing.
|
|
238
|
+
- Add a new snapshot fixture: a table with `\begin{tabular}[t]{|l|l|l|}` containing nested-tabular cells of unequal lengths to verify end-to-end HTML output.
|
|
239
|
+
|
|
240
|
+
### Manual verification
|
|
241
|
+
|
|
242
|
+
- Render a sample MMD with `\begin{tabular}[t]{|l|l|l|}` containing nested-tabular cells of unequal lengths. Confirm short cells in HTML preview now align to the top.
|
|
243
|
+
- Render same document via `forLatex` export, confirm bracket is preserved in the emitted LaTeX source.
|
|
244
|
+
- Render same document with `defaultCellVerticalAlign: 'top'` and **without** the source bracket; confirm both HTML emits `vertical-align: top` and `tableOpen.meta.bracket === 't'`.
|
|
245
|
+
|
|
246
|
+
### Commands
|
|
247
|
+
|
|
248
|
+
```bash
|
|
249
|
+
npm test
|
|
250
|
+
npm run build
|
|
251
|
+
```
|
|
252
|
+
|
|
253
|
+
---
|
|
254
|
+
|
|
255
|
+
## Risk / Rollback
|
|
256
|
+
|
|
257
|
+
**Risk**: Low
|
|
258
|
+
|
|
259
|
+
- Pure additive change. Default behavior for all existing documents (no bracket + no option) is unchanged.
|
|
260
|
+
- Option is opt-in. Absent → identical to current behavior.
|
|
261
|
+
- Bracket parsing is scoped to a single regex extension and one new parameter through one helper.
|
|
262
|
+
- No changes to math, list, or non-tabular rendering paths.
|
|
263
|
+
|
|
264
|
+
**Risk areas to watch**:
|
|
265
|
+
|
|
266
|
+
- Multi-column / multi-row cells — confirm row-level bracket propagates as expected.
|
|
267
|
+
- `forLatex` payload — confirm bracket round-trip does not break downstream LaTeX consumers.
|
|
268
|
+
- Existing snapshot tests — confirm none change.
|
|
269
|
+
|
|
270
|
+
**Rollback**: revert PR.
|