markdown-merge 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- checksums.yaml.gz.sig +0 -0
- data/CHANGELOG.md +251 -0
- data/CITATION.cff +20 -0
- data/CODE_OF_CONDUCT.md +134 -0
- data/CONTRIBUTING.md +227 -0
- data/FUNDING.md +74 -0
- data/LICENSE.txt +21 -0
- data/README.md +1087 -0
- data/REEK +0 -0
- data/RUBOCOP.md +71 -0
- data/SECURITY.md +21 -0
- data/lib/markdown/merge/cleanse/block_spacing.rb +253 -0
- data/lib/markdown/merge/cleanse/code_fence_spacing.rb +294 -0
- data/lib/markdown/merge/cleanse/condensed_link_refs.rb +405 -0
- data/lib/markdown/merge/cleanse.rb +42 -0
- data/lib/markdown/merge/code_block_merger.rb +300 -0
- data/lib/markdown/merge/conflict_resolver.rb +128 -0
- data/lib/markdown/merge/debug_logger.rb +26 -0
- data/lib/markdown/merge/document_problems.rb +190 -0
- data/lib/markdown/merge/file_aligner.rb +196 -0
- data/lib/markdown/merge/file_analysis.rb +353 -0
- data/lib/markdown/merge/file_analysis_base.rb +629 -0
- data/lib/markdown/merge/freeze_node.rb +93 -0
- data/lib/markdown/merge/gap_line_node.rb +136 -0
- data/lib/markdown/merge/link_definition_formatter.rb +49 -0
- data/lib/markdown/merge/link_definition_node.rb +157 -0
- data/lib/markdown/merge/link_parser.rb +421 -0
- data/lib/markdown/merge/link_reference_rehydrator.rb +320 -0
- data/lib/markdown/merge/markdown_structure.rb +123 -0
- data/lib/markdown/merge/merge_result.rb +166 -0
- data/lib/markdown/merge/node_type_normalizer.rb +126 -0
- data/lib/markdown/merge/output_builder.rb +166 -0
- data/lib/markdown/merge/partial_template_merger.rb +334 -0
- data/lib/markdown/merge/smart_merger.rb +221 -0
- data/lib/markdown/merge/smart_merger_base.rb +621 -0
- data/lib/markdown/merge/table_match_algorithm.rb +504 -0
- data/lib/markdown/merge/table_match_refiner.rb +136 -0
- data/lib/markdown/merge/version.rb +12 -0
- data/lib/markdown/merge/whitespace_normalizer.rb +251 -0
- data/lib/markdown/merge.rb +149 -0
- data/lib/markdown-merge.rb +4 -0
- data/sig/markdown/merge.rbs +341 -0
- data.tar.gz.sig +0 -0
- metadata +365 -0
- metadata.gz.sig +0 -0
data/REEK
ADDED
File without changes
data/RUBOCOP.md
ADDED
@@ -0,0 +1,71 @@
+# RuboCop Usage Guide
+
+## Overview
+
+A tale of two RuboCop plugin gems.
+
+### RuboCop Gradual
+
+This project uses `rubocop_gradual` instead of vanilla RuboCop for code style checking. The `rubocop_gradual` tool allows for gradual adoption of RuboCop rules by tracking violations in a lock file.
+
+### RuboCop LTS
+
+This project uses `rubocop-lts` to ensure, on a best-effort basis, compatibility with Ruby >= 1.9.2.
+RuboCop rules are meticulously configured by the `rubocop-lts` family of gems to ensure that a project is compatible with a specific version of Ruby. See: https://rubocop-lts.gitlab.io for more.
+
+## Checking RuboCop Violations
+
+To check for RuboCop violations in this project, always use:
+
+```bash
+bundle exec rake rubocop_gradual:check
+```
+
+**Do not use** the standard RuboCop commands like:
+- `bundle exec rubocop`
+- `rubocop`
+
+## Understanding the Lock File
+
+The `.rubocop_gradual.lock` file tracks all current RuboCop violations in the project. This allows the team to:
+
+1. Prevent new violations while gradually fixing existing ones
+2. Track progress on code style improvements
+3. Ensure CI builds don't fail due to pre-existing violations
+
+## Common Commands
+
+- **Check violations**
+    - `bundle exec rake rubocop_gradual`
+    - `bundle exec rake rubocop_gradual:check`
+- **(Safe) Autocorrect violations, and update lockfile if no new violations**
+    - `bundle exec rake rubocop_gradual:autocorrect`
+- **Force update the lock file (w/o autocorrect) to match violations present in code**
+    - `bundle exec rake rubocop_gradual:force_update`
+
+## Workflow
+
+1. Before submitting a PR, run `bundle exec rake rubocop_gradual:autocorrect`
+    a. or just the default `bundle exec rake`, as autocorrection is a pre-requisite of the default task.
+2. If there are new violations, either:
+    - Fix them in your code
+    - Run `bundle exec rake rubocop_gradual:force_update` to update the lock file (only for violations you can't fix immediately)
+3. Commit the updated `.rubocop_gradual.lock` file along with your changes
+
+## Never add inline RuboCop disables
+
+Do not add inline `rubocop:disable` / `rubocop:enable` comments anywhere in the codebase (including specs, except when following the few existing `rubocop:disable` patterns for a rule already being disabled elsewhere in the code). We handle exceptions in two supported ways:
+
+- Permanent/structural exceptions: prefer adjusting the RuboCop configuration (e.g., in `.rubocop.yml`) to exclude a rule for a path or file pattern when it makes sense project-wide.
+- Temporary exceptions while improving code: record the current violations in `.rubocop_gradual.lock` via the gradual workflow:
+    - `bundle exec rake rubocop_gradual:autocorrect` (preferred; will autocorrect what it can and update the lock only if no new violations were introduced)
+    - If needed, `bundle exec rake rubocop_gradual:force_update` (as a last resort when you cannot fix the newly reported violations immediately)
+
+In general, treat the rules as guidance to follow; fix violations rather than ignore them. For example, RSpec conventions in this project expect `described_class` to be used in specs that target a specific class under test.
+
+## Benefits of rubocop_gradual
+
+- Allows incremental adoption of code style rules
+- Prevents CI failures due to pre-existing violations
+- Provides a clear record of code style debt
+- Enables focused efforts on improving code quality over time
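The lock-file idea described in `RUBOCOP.md` (tolerate recorded violations, reject new ones) can be illustrated with a small baseline comparison. This is a hypothetical sketch only: `BaselineSketch`, its methods, and the `[file, cop, message]` tuples are invented for illustration and do not reflect how `rubocop_gradual` actually stores or compares violations in `.rubocop_gradual.lock`.

```ruby
require "set"

# Hypothetical illustration of a violation baseline; NOT rubocop_gradual's
# real data model or lock-file format.
module BaselineSketch
  # Each violation is modelled as a [file, cop_name, message] array.
  def self.compare(baseline, current)
    baseline_set = baseline.to_set
    current_set = current.to_set
    {
      new: (current_set - baseline_set).to_a,       # would fail a check
      fixed: (baseline_set - current_set).to_a,     # baseline can shrink
      tolerated: (current_set & baseline_set).to_a, # pre-existing debt
    }
  end

  # A check passes only when nothing new appeared relative to the baseline.
  def self.passes?(baseline, current)
    compare(baseline, current)[:new].empty?
  end
end

baseline = [
  ["lib/a.rb", "Style/StringLiterals", "Prefer double quotes"],
  ["lib/b.rb", "Layout/LineLength", "Line is too long"],
]
current = [
  ["lib/b.rb", "Layout/LineLength", "Line is too long"],
  ["lib/c.rb", "Lint/UselessAssignment", "Useless assignment"],
]

report = BaselineSketch.compare(baseline, current)
puts "new: #{report[:new].size}, fixed: #{report[:fixed].size}, tolerated: #{report[:tolerated].size}"
```

Under this model, fixing `lib/c.rb` would make the check pass again, and the entry for `lib/a.rb` could then be dropped from the baseline.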
data/SECURITY.md
ADDED
@@ -0,0 +1,21 @@
+# Security Policy
+
+## Supported Versions
+
+| Version | Supported |
+|----------|-----------|
+| 1.latest | ✅ |
+
+## Security contact information
+
+To report a security vulnerability, please use the
+[Tidelift security contact](https://tidelift.com/security).
+Tidelift will coordinate the fix and disclosure.
+
+## Additional Support
+
+If you are interested in support for versions older than the latest release,
+please consider sponsoring the project / maintainer @ https://liberapay.com/pboling/donate,
+or find other sponsorship links in the [README].
+
+[README]: README.md
data/lib/markdown/merge/cleanse/block_spacing.rb
ADDED
@@ -0,0 +1,253 @@
+# frozen_string_literal: true
+
+require "parslet"
+
+module Markdown
+  module Merge
+    module Cleanse
+      # Fixes missing blank lines between block elements in Markdown.
+      #
+      # Markdown best practices require blank lines between:
+      # - List items and headings
+      # - Thematic breaks (---) and following content
+      # - HTML blocks (like </details>) and following markdown
+      # - Nested list items and headings
+      #
+      # This class detects and fixes these issues without using a full
+      # markdown parser, making it safe to use on documents that might
+      # have syntax issues.
+      #
+      # @example Basic usage
+      #   fixer = Markdown::Merge::Cleanse::BlockSpacing.new(content)
+      #   if fixer.malformed?
+      #     fixed_content = fixer.fix
+      #   end
+      #
+      # @example Check specific issues
+      #   fixer = Markdown::Merge::Cleanse::BlockSpacing.new(content)
+      #   fixer.issues.each do |issue|
+      #     puts "Line #{issue[:line]}: #{issue[:type]}"
+      #   end
+      #
+      class BlockSpacing
+        # Patterns for block elements that should have blank lines after them
+        THEMATIC_BREAK = /\A\s*(?:---+|\*\*\*+|___+)\s*\z/
+        HEADING = /\A\s*\#{1,6}\s+/
+        LIST_ITEM = /\A\s*(?:[-*+]|\d+\.)\s+/
+        HTML_CLOSE_TAG = /\A\s*<\/[a-zA-Z][a-zA-Z0-9]*>\s*\z/
+        HTML_OPEN_TAG = /\A\s*<[a-zA-Z][a-zA-Z0-9]*(?:\s|>)/
+        HTML_ANY_TAG = /\A\s*<\/?[a-zA-Z]/
+        LINK_REF_DEF = /\A\s*\[[^\]]+\]:\s*/
+
+        # Block-level HTML elements that can span multiple lines
+        # These create a context where we shouldn't insert blank lines
+        HTML_BLOCK_ELEMENTS = %w[
+          ul
+          ol
+          li
+          dl
+          dt
+          dd
+          div
+          table
+          thead
+          tbody
+          tfoot
+          tr
+          th
+          td
+          blockquote
+          pre
+          figure
+          figcaption
+          details
+          summary
+          section
+          article
+          aside
+          nav
+          header
+          footer
+          main
+          address
+          form
+          fieldset
+        ].freeze
+
+        # Pattern to match opening block-level HTML tags
+        HTML_BLOCK_OPEN = /\A\s*<(#{HTML_BLOCK_ELEMENTS.join("|")})(?:\s|>)/i
+
+        # Pattern to match closing block-level HTML tags
+        HTML_BLOCK_CLOSE = /\A\s*<\/(#{HTML_BLOCK_ELEMENTS.join("|")})>/i
+
+        # HTML elements that contain markdown content (not HTML content)
+        # These should have blank lines before their closing tags
+        MARKDOWN_CONTAINER_ELEMENTS = %w[details].freeze
+
+        # Pattern to match closing tags for markdown containers
+        MARKDOWN_CONTAINER_CLOSE = /\A\s*<\/(#{MARKDOWN_CONTAINER_ELEMENTS.join("|")})>/i
+
+        # Markdown content: anything that's not blank, not HTML, and not a link ref def
+        MARKDOWN_CONTENT = ->(line) {
+          stripped = line.strip
+          return false if stripped.empty?
+          return false if stripped.start_with?("<")
+          return false if line.match?(LINK_REF_DEF)
+          true
+        }
+
+        # @return [String] The original content
+        attr_reader :source
+
+        # @return [Array<Hash>] Issues found
+        attr_reader :issues
+
+        # Initialize a new BlockSpacing fixer.
+        #
+        # @param source [String] The markdown content to analyze
+        def initialize(source)
+          @source = source
+          @issues = []
+          analyze
+        end
+
+        # Check if the content has block spacing issues.
+        #
+        # @return [Boolean] true if issues were found
+        def malformed?
+          @issues.any?
+        end
+
+        # Get the count of issues found.
+        #
+        # @return [Integer] number of issues
+        def issue_count
+          @issues.size
+        end
+
+        # Fix the block spacing issues.
+        #
+        # @return [String] Content with blank lines added where needed
+        def fix
+          return source unless malformed?
+
+          lines = source.lines
+          result = []
+          insertions = @issues.map { |i| i[:line] }.to_set
+
+          lines.each_with_index do |line, idx|
+            result << line
+            # If this line needs a blank line after it, add one
+            if insertions.include?(idx + 1) # issues use 1-based line numbers
+              result << "\n" unless line.strip.empty?
+            end
+          end
+
+          result.join
+        end
+
+        private
+
+        def analyze
+          lines = source.lines
+          return if lines.empty?
+
+          # Track depth of block-level HTML elements
+          # When depth > 0, we're inside an HTML block and shouldn't add blank lines
+          html_block_depth = 0
+
+          lines.each_with_index do |line, idx|
+            next_line = lines[idx + 1]
+            prev_line = (idx > 0) ? lines[idx - 1] : nil
+
+            # Special case: closing tags for markdown containers like </details>
+            # These contain markdown content, so we need blank lines before them
+            # even when inside an HTML block
+            is_markdown_container_close = line.match?(MARKDOWN_CONTAINER_CLOSE)
+
+            # Check for issues BEFORE updating depth
+            if html_block_depth <= 0
+              # Check for issues that need blank line AFTER current line
+              if next_line && !next_line.strip.empty?
+                check_thematic_break(line, next_line, idx)
+                check_list_before_heading(line, next_line, idx)
+                check_html_close_before_markdown(line, next_line, idx)
+              end
+
+              # Check for issues that need blank line BEFORE current line
+              if prev_line && !prev_line.strip.empty?
+                check_markdown_before_html(prev_line, line, idx)
+              end
+            end
+
+            # Special case: always check for blank line before </details> etc.
+            # because they contain markdown content
+            if is_markdown_container_close && prev_line && !prev_line.strip.empty?
+              check_markdown_before_html(prev_line, line, idx)
+            end
+
+            # Update HTML block depth AFTER checking for issues
+            # Count opening block-level tags
+            if line.match?(HTML_BLOCK_OPEN)
+              html_block_depth += 1
+            end
+
+            # Check for closing block-level tags
+            line.scan(HTML_BLOCK_CLOSE) do
+              html_block_depth -= 1 if html_block_depth > 0
+            end
+          end
+        end
+
+        def check_thematic_break(line, next_line, idx)
+          return unless line.match?(THEMATIC_BREAK)
+          return if next_line.strip.empty?
+
+          @issues << {
+            type: :thematic_break_needs_blank,
+            line: idx + 1,
+            description: "Thematic break should be followed by blank line",
+          }
+        end
+
+        def check_list_before_heading(line, next_line, idx)
+          return unless line.match?(LIST_ITEM)
+          return unless next_line.match?(HEADING)
+
+          @issues << {
+            type: :list_before_heading,
+            line: idx + 1,
+            description: "List item should be followed by blank line before heading",
+          }
+        end
+
+        def check_html_close_before_markdown(line, next_line, idx)
+          return unless line.match?(HTML_CLOSE_TAG)
+          # Next line is markdown (heading, list, paragraph start, etc.)
+          # but not HTML or blank
+          return if next_line.match?(/\A\s*</)
+          return if next_line.match?(LINK_REF_DEF)
+
+          @issues << {
+            type: :html_before_markdown,
+            line: idx + 1,
+            description: "HTML close tag should be followed by blank line before markdown",
+          }
+        end
+
+        def check_markdown_before_html(prev_line, line, idx)
+          # Current line is HTML (open or close tag)
+          return unless line.match?(HTML_ANY_TAG)
+          # Previous line is markdown content (not HTML, not blank, not link ref)
+          return unless MARKDOWN_CONTENT.call(prev_line)
+
+          @issues << {
+            type: :markdown_before_html,
+            line: idx, # Insert blank line BEFORE this line (so after prev_line)
+            description: "Markdown content should be followed by blank line before HTML",
+          }
+        end
+      end
+    end
+  end
+end
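The line-based approach in `block_spacing.rb` can be seen in miniature below. This is a simplified, standalone sketch rather than the gem's implementation: it reuses three of the regular expressions from the file above, covers only the thematic-break and list-before-heading checks, and ignores HTML block depth entirely. The module name `BlankLineSketch` and its method names are hypothetical.

```ruby
# Simplified sketch of line-based blank-line repair; not the gem's code.
module BlankLineSketch
  THEMATIC_BREAK = /\A\s*(?:---+|\*\*\*+|___+)\s*\z/
  HEADING = /\A\s*\#{1,6}\s+/
  LIST_ITEM = /\A\s*(?:[-*+]|\d+\.)\s+/

  # Returns the 1-based numbers of lines that need a blank line after them.
  def self.lines_needing_blank_after(source)
    lines = source.lines
    lines.each_index.select do |idx|
      line = lines[idx]
      next_line = lines[idx + 1]
      # Nothing to do at end of input or when a blank line already follows.
      next false if next_line.nil? || next_line.strip.empty?

      line.match?(THEMATIC_BREAK) ||
        (line.match?(LIST_ITEM) && next_line.match?(HEADING))
    end.map { |idx| idx + 1 }
  end

  # Rebuilds the text, adding one blank line after each flagged line.
  def self.fix(source)
    targets = lines_needing_blank_after(source)
    source.lines.each_with_index.flat_map do |line, idx|
      targets.include?(idx + 1) ? [line, "\n"] : [line]
    end.join
  end
end

sample = "- item\n# Heading\n---\ntext\n"
puts BlankLineSketch.fix(sample)
```

Because a flagged line is only reported when the following line is non-blank, running the sketch on its own output leaves the text unchanged.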
data/lib/markdown/merge/cleanse/code_fence_spacing.rb
ADDED
@@ -0,0 +1,294 @@
+# frozen_string_literal: true
+
+require "parslet"
+
+module Markdown
+  module Merge
+    module Cleanse
+      # Parslet-based parser for fixing malformed fenced code blocks in Markdown.
+      #
+      # == The Problem
+      #
+      # This class fixes **improperly formatted fenced code blocks** where there is
+      # unwanted whitespace between the fence markers (``` or ~~~) and the language
+      # identifier.
+      #
+      # A bug in ast-merge (or its dependencies) caused fenced code blocks to be
+      # rendered with a space between the fence markers and the language identifier.
+      #
+      # == Bug Pattern
+      #
+      # The conventional form has NO space between fence and language (CommonMark tolerates one, but some processors do not):
+      # - **Correct:** ` ```ruby` or ` ~~~python`
+      # - **Incorrect:** ` ``` ruby` or ` ~~~ python` (extra space)
+      #
+      # In stricter processors the extra space can cause:
+      # - Syntax highlighting to fail
+      # - The language identifier to be ignored
+      # - Rendering issues in various Markdown processors
+      #
+      # @example Malformed (buggy) input
+      #   "``` console\nsome code\n```"
+      #
+      # @example Fixed output
+      #   "```console\nsome code\n```"
+      #
+      # == Scope
+      #
+      # This fixer handles:
+      # - **Any indentation level** (0+ spaces before fence)
+      #   - Top-level: ` ```ruby`
+      #   - In lists: `    ```python` (4 spaces)
+      # - **Both fence types:** backticks (```) and tildes (~~~)
+      # - **Any fence length:** 3+ markers (````, ~~~~~, etc.)
+      #
+      # == How It Works
+      #
+      # The parser uses a **PEG grammar** (via Parslet) to:
+      # - Detect fence opening lines with optional indentation
+      # - Identify spacing between fence and language identifier
+      # - Track opening/closing fence pairs to avoid false positives
+      # - Reconstruct fences with proper formatting (no space)
+      #
+      # **Why PEG?** The previous regex-based implementation used patterns like
+      # `([ \t]*)` which can cause polynomial backtracking (ReDoS vulnerability)
+      # when processing malicious input with many tabs/spaces. PEG repetition is
+      # greedy and never backtracks, so this grammar parses in linear time.
+      #
+      # @example Basic usage
+      #   parser = Markdown::Merge::Cleanse::CodeFenceSpacing.new(content)
+      #   fixed_content = parser.fix
+      #
+      # @example Check if content has malformed fences
+      #   parser = Markdown::Merge::Cleanse::CodeFenceSpacing.new(content)
+      #   parser.malformed? # => true/false
+      #
+      # @example Process a file
+      #   content = File.read("README.md")
+      #   parser = Markdown::Merge::Cleanse::CodeFenceSpacing.new(content)
+      #   if parser.malformed?
+      #     File.write("README.md", parser.fix)
+      #   end
+      #
+      # @example Get details about code blocks
+      #   parser = Markdown::Merge::Cleanse::CodeFenceSpacing.new(content)
+      #   parser.code_blocks.each do |block|
+      #     puts "#{block[:fence]}#{block[:language]}: malformed=#{block[:malformed]}"
+      #   end
+      #
+      # @api public
+      class CodeFenceSpacing
+        # Grammar for parsing fenced code blocks with PEG parser.
+        #
+        # Recognizes:
+        # - Any amount of indentation (handles nested lists)
+        # - Backtick fences (```) and tilde fences (~~~)
+        # - Optional info string (language identifier)
+        # - Properly handles spacing issues
+        #
+        # This PEG grammar is linear-time and cannot have polynomial backtracking,
+        # eliminating ReDoS vulnerabilities.
+        #
+        # @api private
+        class CodeFenceGrammar < Parslet::Parser
+          # Any amount of indentation (handles code blocks in lists)
+          # Captured as a string (Parslet yields [] when nothing matches)
+          rule(:indent) { match("[ ]").repeat }
+
+          # Fence markers - 3+ backticks or tildes
+          rule(:backtick) { str("`") }
+          rule(:tilde) { str("~") }
+          rule(:backtick_fence) { backtick.repeat(3, nil) }
+          rule(:tilde_fence) { tilde.repeat(3, nil) }
+          rule(:fence) { backtick_fence | tilde_fence }
+
+          # Whitespace after fence (the bug we're fixing)
+          rule(:space) { match('[ \t]') }
+          rule(:spaces) { space.repeat(1) }
+          rule(:spaces?) { space.repeat }
+
+          # Info string (language identifier + optional attributes)
+          # Excludes backticks and tildes (stricter than CommonMark, which only bans backticks after a backtick fence)
+          rule(:info_char) { match('[^\r\n`~]') }
+          rule(:info_string) { info_char.repeat(1) }
+
+          # Line ending
+          rule(:line_end) { str("\r").maybe >> str("\n").maybe >> any.absent? }
+
+          # Fence line with optional indentation, optional spacing, optional info
+          # Capture: indent (raw), fence (as :fence), spacing (as :spacing), info (as :info)
+          rule(:fence_line) {
+            indent.as(:indent) >> fence.as(:fence) >> spaces?.as(:spacing) >> info_string.maybe.as(:info) >> line_end
+          }
+
+          root(:fence_line)
+        end
+
+        # @return [String] the input text to parse
+        attr_reader :source
+
+        # Create a new parser for the given text.
+        #
+        # @param source [String] the text that may contain malformed code fences
+        def initialize(source)
+          @source = source.to_s
+          @grammar = CodeFenceGrammar.new
+          @code_blocks = nil
+        end
+
+        # Check if the source contains malformed fenced code blocks.
+        #
+        # Detects the pattern where there's whitespace between the fence
+        # markers and the language identifier.
+        #
+        # @return [Boolean] true if malformed fences are detected
+        def malformed?
+          code_blocks.any? { |block| block[:malformed] }
+        end
+
+        # Parse and return information about all fenced code blocks.
+        #
+        # Only returns opening fences (not closing fences).
+        #
+        # @return [Array<Hash>] Array of code block info
+        #   - :indent [String] The indentation before the fence
+        #   - :fence [String] The fence markers (e.g., "```" or "~~~")
+        #   - :language [String, nil] The language identifier
+        #   - :spacing [String] Any spacing between fence and language
+        #   - :malformed [Boolean] Whether this block has improper spacing
+        #   - :line_number [Integer] Line number where block starts (1-based)
+        #   - :original [String] The original opening fence line
+        def code_blocks
+          return @code_blocks if @code_blocks
+
+          @code_blocks = []
+          line_number = 0
+          in_code_block = false
+          current_fence_char = nil
+
+          source.each_line do |line|
+            line_number += 1
+
+            # Try to parse as fence line using PEG grammar
+            parsed = parse_fence_line(line)
+            next unless parsed
+
+            fence = parsed[:fence]
+            fence_char = fence[0]
+            spacing = parsed[:spacing] || ""
+            info = parsed[:info] || ""
+            indent = parsed[:indent] || ""
+
+            # Closing fence: matches current fence type and has no info
+            if in_code_block && fence_char == current_fence_char && info.empty?
+              in_code_block = false
+              current_fence_char = nil
+              next
+            end
+
+            # Opening fence
+            in_code_block = true
+            current_fence_char = fence_char
+
+            # Extract just the language (first word of info string)
+            language = info.strip.split(/\s+/).first
+            language = nil if language&.empty?
+
+            @code_blocks << {
+              indent: indent,
+              fence: fence,
+              language: language,
+              info_string: info.strip,
+              spacing: spacing,
+              malformed: !spacing.empty? && !language.nil?,
+              line_number: line_number,
+              original: line.chomp,
+            }
+          end
+
+          @code_blocks
+        end
+
+        # Fix malformed fenced code blocks by removing improper spacing.
+        #
+        # @return [String] the source with code fences fixed
+        def fix
+          return source unless malformed?
+
+          result = source.dup
+
+          # Process line by line, fixing malformed fences
+          lines = result.lines
+          fixed_lines = lines.map do |line|
+            fix_fence_line(line)
+          end
+
+          fixed_lines.join
+        end
+
+        # Count the number of malformed code blocks.
+        #
+        # @return [Integer] number of malformed fences found
+        def malformed_count
+          code_blocks.count { |block| block[:malformed] }
+        end
+
+        # Count the total number of code blocks.
+        #
+        # @return [Integer] total number of fenced code blocks
+        def count
+          code_blocks.size
+        end
+
+        private
+
+        # Parse a single line as a fence using PEG grammar.
+        #
+        # @param line [String] the line to parse
+        # @return [Hash, nil] parsed fence data or nil if not a fence
+        def parse_fence_line(line)
+          tree = @grammar.parse(line)
+
+          # Convert Parslet tree to simple hash
+          # Note: Parslet returns [] for empty repeats, we convert to empty string
+          indent_val = tree[:indent]
+          indent_str = indent_val.is_a?(Array) ? indent_val.join : indent_val.to_s
+
+          spacing_val = tree[:spacing]
+          spacing_str = spacing_val.is_a?(Array) ? spacing_val.join : spacing_val.to_s
+
+          info_val = tree[:info]
+          info_str = if info_val.is_a?(Array)
+            info_val.join
+          else
+            (info_val ? info_val.to_s : "")
+          end
+
+          {
+            indent: indent_str,
+            fence: tree[:fence].to_s,
+            spacing: spacing_str,
+            info: info_str,
+          }
+        rescue Parslet::ParseFailed
+          nil
+        end
+
+        # Fix a single line if it's a malformed fence.
+        #
+        # @param line [String] the line to potentially fix
+        # @return [String] the fixed line (or original if not malformed)
+        def fix_fence_line(line)
+          parsed = parse_fence_line(line)
+          return line unless parsed
+
+          # Only fix if there's spacing AND info string
+          return line if parsed[:spacing].empty? || parsed[:info].empty?
+
+          # Reconstruct: indent + fence + info (no spacing), keeping the original line ending
+          "#{parsed[:indent]}#{parsed[:fence]}#{parsed[:info]}#{line[/\r?\n\z/]}"
+        end
+      end
+    end
+  end
+end
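The fence-line decomposition in `code_fence_spacing.rb` (indent, fence, spacing, info string, line ending) can be approximated without Parslet. The sketch below swaps in `StringScanner` from Ruby's standard library, so it is not the gem's PEG grammar; `FenceSketch` and its method names are hypothetical. Each `scan` call is anchored at the current position and consumes one character class, so the scanner moves forward through the line without revisiting earlier input.

```ruby
require "strscan"

# Standalone approximation of the fence-line fix using StringScanner
# instead of the gem's Parslet grammar.
module FenceSketch
  # Splits one line into fence parts, or returns nil if it is not a fence line.
  def self.parse_fence_line(line)
    scanner = StringScanner.new(line)
    indent = scanner.scan(/ */)
    fence = scanner.scan(/`{3,}|~{3,}/)
    return nil unless fence

    spacing = scanner.scan(/[ \t]*/)
    info = scanner.scan(/[^\r\n`~]*/)
    ending = scanner.scan(/\r?\n?/)
    # Anything left over (e.g. a stray backtick) means this is not a fence line.
    return nil unless scanner.eos?

    {indent: indent, fence: fence, spacing: spacing, info: info, ending: ending}
  end

  # Removes the spacing between fence and info string, keeping the line ending.
  def self.fix_line(line)
    parts = parse_fence_line(line)
    return line if parts.nil? || parts[:spacing].empty? || parts[:info].empty?

    "#{parts[:indent]}#{parts[:fence]}#{parts[:info]}#{parts[:ending]}"
  end

  def self.fix(source)
    source.each_line.map { |line| fix_line(line) }.join
  end
end

puts FenceSketch.fix("``` console\nsome code\n```")
```

Closing fences and fences that already have no spacing pass through untouched, as do ordinary lines, and CRLF or missing trailing newlines are preserved.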