markdown-merge 1.0.0

Files changed (46)
  1. checksums.yaml +7 -0
  2. checksums.yaml.gz.sig +0 -0
  3. data/CHANGELOG.md +251 -0
  4. data/CITATION.cff +20 -0
  5. data/CODE_OF_CONDUCT.md +134 -0
  6. data/CONTRIBUTING.md +227 -0
  7. data/FUNDING.md +74 -0
  8. data/LICENSE.txt +21 -0
  9. data/README.md +1087 -0
  10. data/REEK +0 -0
  11. data/RUBOCOP.md +71 -0
  12. data/SECURITY.md +21 -0
  13. data/lib/markdown/merge/cleanse/block_spacing.rb +253 -0
  14. data/lib/markdown/merge/cleanse/code_fence_spacing.rb +294 -0
  15. data/lib/markdown/merge/cleanse/condensed_link_refs.rb +405 -0
  16. data/lib/markdown/merge/cleanse.rb +42 -0
  17. data/lib/markdown/merge/code_block_merger.rb +300 -0
  18. data/lib/markdown/merge/conflict_resolver.rb +128 -0
  19. data/lib/markdown/merge/debug_logger.rb +26 -0
  20. data/lib/markdown/merge/document_problems.rb +190 -0
  21. data/lib/markdown/merge/file_aligner.rb +196 -0
  22. data/lib/markdown/merge/file_analysis.rb +353 -0
  23. data/lib/markdown/merge/file_analysis_base.rb +629 -0
  24. data/lib/markdown/merge/freeze_node.rb +93 -0
  25. data/lib/markdown/merge/gap_line_node.rb +136 -0
  26. data/lib/markdown/merge/link_definition_formatter.rb +49 -0
  27. data/lib/markdown/merge/link_definition_node.rb +157 -0
  28. data/lib/markdown/merge/link_parser.rb +421 -0
  29. data/lib/markdown/merge/link_reference_rehydrator.rb +320 -0
  30. data/lib/markdown/merge/markdown_structure.rb +123 -0
  31. data/lib/markdown/merge/merge_result.rb +166 -0
  32. data/lib/markdown/merge/node_type_normalizer.rb +126 -0
  33. data/lib/markdown/merge/output_builder.rb +166 -0
  34. data/lib/markdown/merge/partial_template_merger.rb +334 -0
  35. data/lib/markdown/merge/smart_merger.rb +221 -0
  36. data/lib/markdown/merge/smart_merger_base.rb +621 -0
  37. data/lib/markdown/merge/table_match_algorithm.rb +504 -0
  38. data/lib/markdown/merge/table_match_refiner.rb +136 -0
  39. data/lib/markdown/merge/version.rb +12 -0
  40. data/lib/markdown/merge/whitespace_normalizer.rb +251 -0
  41. data/lib/markdown/merge.rb +149 -0
  42. data/lib/markdown-merge.rb +4 -0
  43. data/sig/markdown/merge.rbs +341 -0
  44. data.tar.gz.sig +0 -0
  45. metadata +365 -0
  46. metadata.gz.sig +0 -0
data/REEK ADDED
File without changes
data/RUBOCOP.md ADDED
@@ -0,0 +1,71 @@
1
+ # RuboCop Usage Guide
2
+
3
+ ## Overview
4
+
5
+ A tale of two RuboCop plugin gems.
6
+
7
+ ### RuboCop Gradual
8
+
9
+ This project uses `rubocop_gradual` instead of vanilla RuboCop for code style checking. The `rubocop_gradual` tool allows for gradual adoption of RuboCop rules by tracking violations in a lock file.
10
+
11
+ ### RuboCop LTS
12
+
13
+ This project uses `rubocop-lts` to ensure, on a best-effort basis, compatibility with Ruby >= 1.9.2.
14
+ RuboCop rules are meticulously configured by the `rubocop-lts` family of gems to ensure that a project is compatible with a specific version of Ruby. See: https://rubocop-lts.gitlab.io for more.
15
+
16
+ ## Checking RuboCop Violations
17
+
18
+ To check for RuboCop violations in this project, always use:
19
+
20
+ ```bash
21
+ bundle exec rake rubocop_gradual:check
22
+ ```
23
+
24
+ **Do not use** the standard RuboCop commands like:
25
+ - `bundle exec rubocop`
26
+ - `rubocop`
27
+
28
+ ## Understanding the Lock File
29
+
30
+ The `.rubocop_gradual.lock` file tracks all current RuboCop violations in the project. This allows the team to:
31
+
32
+ 1. Prevent new violations while gradually fixing existing ones
33
+ 2. Track progress on code style improvements
34
+ 3. Ensure CI builds don't fail due to pre-existing violations
35
+
36
+ ## Common Commands
37
+
38
+ - **Check violations**
39
+ - `bundle exec rake rubocop_gradual`
40
+ - `bundle exec rake rubocop_gradual:check`
41
+ - **(Safe) Autocorrect violations, and update lockfile if no new violations**
42
+ - `bundle exec rake rubocop_gradual:autocorrect`
43
+ - **Force update the lock file (w/o autocorrect) to match violations present in code**
44
+ - `bundle exec rake rubocop_gradual:force_update`
45
+
46
+ ## Workflow
47
+
48
+ 1. Before submitting a PR, run `bundle exec rake rubocop_gradual:autocorrect`
49
+ a. or just the default `bundle exec rake`, as autocorrection is a prerequisite of the default task.
50
+ 2. If there are new violations, either:
51
+ - Fix them in your code
52
+ - Run `bundle exec rake rubocop_gradual:force_update` to update the lock file (only for violations you can't fix immediately)
53
+ 3. Commit the updated `.rubocop_gradual.lock` file along with your changes
54
+
55
+ ## Never add inline RuboCop disables
56
+
57
+ Do not add inline `rubocop:disable` / `rubocop:enable` comments anywhere in the codebase (including specs, except when following the few existing `rubocop:disable` patterns for a rule already being disabled elsewhere in the code). We handle exceptions in two supported ways:
58
+
59
+ - Permanent/structural exceptions: prefer adjusting the RuboCop configuration (e.g., in `.rubocop.yml`) to exclude a rule for a path or file pattern when it makes sense project-wide.
60
+ - Temporary exceptions while improving code: record the current violations in `.rubocop_gradual.lock` via the gradual workflow:
61
+ - `bundle exec rake rubocop_gradual:autocorrect` (preferred; will autocorrect what it can and update the lock only if no new violations were introduced)
62
+ - If needed, `bundle exec rake rubocop_gradual:force_update` (as a last resort when you cannot fix the newly reported violations immediately)
63
+
64
+ In general, treat the rules as guidance to follow; fix violations rather than ignore them. For example, RSpec conventions in this project expect `described_class` to be used in specs that target a specific class under test.
65
+
66
+ ## Benefits of rubocop_gradual
67
+
68
+ - Allows incremental adoption of code style rules
69
+ - Prevents CI failures due to pre-existing violations
70
+ - Provides a clear record of code style debt
71
+ - Enables focused efforts on improving code quality over time
data/SECURITY.md ADDED
@@ -0,0 +1,21 @@
1
+ # Security Policy
2
+
3
+ ## Supported Versions
4
+
5
+ | Version | Supported |
6
+ |----------|-----------|
7
+ | 1.latest | ✅ |
8
+
9
+ ## Security contact information
10
+
11
+ To report a security vulnerability, please use the
12
+ [Tidelift security contact](https://tidelift.com/security).
13
+ Tidelift will coordinate the fix and disclosure.
14
+
15
+ ## Additional Support
16
+
17
+ If you are interested in support for versions older than the latest release,
18
+ please consider sponsoring the project / maintainer @ https://liberapay.com/pboling/donate,
19
+ or find other sponsorship links in the [README].
20
+
21
+ [README]: README.md
data/lib/markdown/merge/cleanse/block_spacing.rb ADDED
@@ -0,0 +1,253 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "set"
4
+
5
+ module Markdown
6
+ module Merge
7
+ module Cleanse
8
+ # Fixes missing blank lines between block elements in Markdown.
9
+ #
10
+ # Markdown best practices require blank lines between:
11
+ # - List items and headings
12
+ # - Thematic breaks (---) and following content
13
+ # - HTML blocks (like </details>) and following markdown
14
+ # - Nested list items and headings
15
+ #
16
+ # This class detects and fixes these issues without using a full
17
+ # markdown parser, making it safe to use on documents that might
18
+ # have syntax issues.
19
+ #
20
+ # @example Basic usage
21
+ # fixer = Markdown::Merge::Cleanse::BlockSpacing.new(content)
22
+ # if fixer.malformed?
23
+ # fixed_content = fixer.fix
24
+ # end
25
+ #
26
+ # @example Check specific issues
27
+ # fixer = Markdown::Merge::Cleanse::BlockSpacing.new(content)
28
+ # fixer.issues.each do |issue|
29
+ # puts "Line #{issue[:line]}: #{issue[:type]}"
30
+ # end
31
+ #
32
+ class BlockSpacing
33
+ # Patterns for block elements that should have blank lines after them
34
+ THEMATIC_BREAK = /\A\s*(?:---+|\*\*\*+|___+)\s*\z/
35
+ HEADING = /\A\s*\#{1,6}\s+/
36
+ LIST_ITEM = /\A\s*(?:[-*+]|\d+\.)\s+/
37
+ HTML_CLOSE_TAG = /\A\s*<\/[a-zA-Z][a-zA-Z0-9]*>\s*\z/
38
+ HTML_OPEN_TAG = /\A\s*<[a-zA-Z][a-zA-Z0-9]*(?:\s|>)/
39
+ HTML_ANY_TAG = /\A\s*<\/?[a-zA-Z]/
40
+ LINK_REF_DEF = /\A\s*\[[^\]]+\]:\s*/
41
+
42
+ # Block-level HTML elements that can span multiple lines
43
+ # These create a context where we shouldn't insert blank lines
44
+ HTML_BLOCK_ELEMENTS = %w[
45
+ ul
46
+ ol
47
+ li
48
+ dl
49
+ dt
50
+ dd
51
+ div
52
+ table
53
+ thead
54
+ tbody
55
+ tfoot
56
+ tr
57
+ th
58
+ td
59
+ blockquote
60
+ pre
61
+ figure
62
+ figcaption
63
+ details
64
+ summary
65
+ section
66
+ article
67
+ aside
68
+ nav
69
+ header
70
+ footer
71
+ main
72
+ address
73
+ form
74
+ fieldset
75
+ ].freeze
76
+
77
+ # Pattern to match opening block-level HTML tags
78
+ HTML_BLOCK_OPEN = /\A\s*<(#{HTML_BLOCK_ELEMENTS.join("|")})(?:\s|>)/i
79
+
80
+ # Pattern to match closing block-level HTML tags
81
+ HTML_BLOCK_CLOSE = /\A\s*<\/(#{HTML_BLOCK_ELEMENTS.join("|")})>/i
82
+
83
+ # HTML elements that contain markdown content (not HTML content)
84
+ # These should have blank lines before their closing tags
85
+ MARKDOWN_CONTAINER_ELEMENTS = %w[details].freeze
86
+
87
+ # Pattern to match closing tags for markdown containers
88
+ MARKDOWN_CONTAINER_CLOSE = /\A\s*<\/(#{MARKDOWN_CONTAINER_ELEMENTS.join("|")})>/i
89
+
90
+ # Markdown content: anything that's not blank, not HTML, and not a link ref def
91
+ MARKDOWN_CONTENT = ->(line) {
92
+ stripped = line.strip
93
+ return false if stripped.empty?
94
+ return false if stripped.start_with?("<")
95
+ return false if line.match?(LINK_REF_DEF)
96
+ true
97
+ }
98
+
99
+ # @return [String] The original content
100
+ attr_reader :source
101
+
102
+ # @return [Array<Hash>] Issues found
103
+ attr_reader :issues
104
+
105
+ # Initialize a new BlockSpacing fixer.
106
+ #
107
+ # @param source [String] The markdown content to analyze
108
+ def initialize(source)
109
+ @source = source
110
+ @issues = []
111
+ analyze
112
+ end
113
+
114
+ # Check if the content has block spacing issues.
115
+ #
116
+ # @return [Boolean] true if issues were found
117
+ def malformed?
118
+ @issues.any?
119
+ end
120
+
121
+ # Get the count of issues found.
122
+ #
123
+ # @return [Integer] number of issues
124
+ def issue_count
125
+ @issues.size
126
+ end
127
+
128
+ # Fix the block spacing issues.
129
+ #
130
+ # @return [String] Content with blank lines added where needed
131
+ def fix
132
+ return source unless malformed?
133
+
134
+ lines = source.lines
135
+ result = []
136
+ insertions = @issues.map { |i| i[:line] }.to_set
137
+
138
+ lines.each_with_index do |line, idx|
139
+ result << line
140
+ # If this line needs a blank line after it, add one
141
+ if insertions.include?(idx + 1) # issues use 1-based line numbers
142
+ result << "\n" unless line.strip.empty?
143
+ end
144
+ end
145
+
146
+ result.join
147
+ end
148
+
149
+ private
150
+
151
+ def analyze
152
+ lines = source.lines
153
+ return if lines.empty?
154
+
155
+ # Track depth of block-level HTML elements
156
+ # When depth > 0, we're inside an HTML block and shouldn't add blank lines
157
+ html_block_depth = 0
158
+
159
+ lines.each_with_index do |line, idx|
160
+ next_line = lines[idx + 1]
161
+ prev_line = (idx > 0) ? lines[idx - 1] : nil
162
+
163
+ # Special case: closing tags for markdown containers like </details>
164
+ # These contain markdown content, so we need blank lines before them
165
+ # even when inside an HTML block
166
+ is_markdown_container_close = line.match?(MARKDOWN_CONTAINER_CLOSE)
167
+
168
+ # Check for issues BEFORE updating depth
169
+ if html_block_depth <= 0
170
+ # Check for issues that need blank line AFTER current line
171
+ if next_line && !next_line.strip.empty?
172
+ check_thematic_break(line, next_line, idx)
173
+ check_list_before_heading(line, next_line, idx)
174
+ check_html_close_before_markdown(line, next_line, idx)
175
+ end
176
+
177
+ # Check for issues that need blank line BEFORE current line
178
+ if prev_line && !prev_line.strip.empty?
179
+ check_markdown_before_html(prev_line, line, idx)
180
+ end
181
+ end
182
+
183
+ # Special case: inside an HTML block, still check for a blank line before
184
+ # </details> etc., because they contain markdown content
185
+ if is_markdown_container_close && html_block_depth > 0 && prev_line && !prev_line.strip.empty?
186
+ check_markdown_before_html(prev_line, line, idx)
187
+ end
188
+
189
+ # Update HTML block depth AFTER checking for issues
190
+ # Count opening block-level tags
191
+ if line.match?(HTML_BLOCK_OPEN)
192
+ html_block_depth += 1
193
+ end
194
+
195
+ # Check for a closing block-level tag (HTML_BLOCK_CLOSE is anchored at line start, so at most one match per line)
196
+ line.scan(HTML_BLOCK_CLOSE) do
197
+ html_block_depth -= 1 if html_block_depth > 0
198
+ end
199
+ end
200
+ end
201
+
202
+ def check_thematic_break(line, next_line, idx)
203
+ return unless line.match?(THEMATIC_BREAK)
204
+ return if next_line.strip.empty?
205
+
206
+ @issues << {
207
+ type: :thematic_break_needs_blank,
208
+ line: idx + 1,
209
+ description: "Thematic break should be followed by blank line",
210
+ }
211
+ end
212
+
213
+ def check_list_before_heading(line, next_line, idx)
214
+ return unless line.match?(LIST_ITEM)
215
+ return unless next_line.match?(HEADING)
216
+
217
+ @issues << {
218
+ type: :list_before_heading,
219
+ line: idx + 1,
220
+ description: "List item should be followed by blank line before heading",
221
+ }
222
+ end
223
+
224
+ def check_html_close_before_markdown(line, next_line, idx)
225
+ return unless line.match?(HTML_CLOSE_TAG)
226
+ # Next line is markdown (heading, list, paragraph start, etc.)
227
+ # but not HTML or blank
228
+ return if next_line.match?(/\A\s*</)
229
+ return if next_line.match?(LINK_REF_DEF)
230
+
231
+ @issues << {
232
+ type: :html_before_markdown,
233
+ line: idx + 1,
234
+ description: "HTML close tag should be followed by blank line before markdown",
235
+ }
236
+ end
237
+
238
+ def check_markdown_before_html(prev_line, line, idx)
239
+ # Current line is HTML (open or close tag)
240
+ return unless line.match?(HTML_ANY_TAG)
241
+ # Previous line is markdown content (not HTML, not blank, not link ref)
242
+ return unless MARKDOWN_CONTENT.call(prev_line)
243
+
244
+ @issues << {
245
+ type: :markdown_before_html,
246
+ line: idx, # Insert blank line BEFORE this line (so after prev_line)
247
+ description: "Markdown content should be followed by blank line before HTML",
248
+ }
249
+ end
250
+ end
251
+ end
252
+ end
253
+ end
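The detect-then-insert approach used by `BlockSpacing` can be sketched in isolation. The following is a minimal, simplified sketch, not the gem's implementation: the module name `BlankLineSketch` and its method names are hypothetical, it reuses three of the patterns shown above, and it covers only the thematic-break and list-before-heading checks, with no HTML block tracking.

```ruby
require "set"

# Hypothetical stand-in for BlockSpacing covering only two of its checks.
module BlankLineSketch
  THEMATIC_BREAK = /\A\s*(?:---+|\*\*\*+|___+)\s*\z/
  HEADING = /\A\s*\#{1,6}\s+/
  LIST_ITEM = /\A\s*(?:[-*+]|\d+\.)\s+/

  module_function

  # Returns the 1-based numbers of lines that need a blank line after them.
  def lines_needing_blank_after(source)
    flagged = Set.new
    source.lines.each_cons(2).with_index do |(line, next_line), idx|
      next if next_line.strip.empty? # already separated by a blank line

      flagged << idx + 1 if line.match?(THEMATIC_BREAK)
      flagged << idx + 1 if line.match?(LIST_ITEM) && next_line.match?(HEADING)
    end
    flagged
  end

  # Rebuilds the document, appending a blank line after each flagged line.
  def fix(source)
    flagged = lines_needing_blank_after(source)
    fixed = source.lines.each_with_index.flat_map do |line, idx|
      flagged.include?(idx + 1) ? [line, "\n"] : [line]
    end
    fixed.join
  end
end

puts BlankLineSketch.fix("- item\n# Heading\n")
```

Collecting line numbers in a set first, then rebuilding the text in a second pass, means that two checks flagging the same line still insert only one blank line.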
data/lib/markdown/merge/cleanse/code_fence_spacing.rb ADDED
@@ -0,0 +1,294 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "parslet"
4
+
5
+ module Markdown
6
+ module Merge
7
+ module Cleanse
8
+ # Parslet-based parser for fixing malformed fenced code blocks in Markdown.
9
+ #
10
+ # == The Problem
11
+ #
12
+ # This class fixes **improperly formatted fenced code blocks** where there is
13
+ # unwanted whitespace between the fence markers (``` or ~~~) and the language
14
+ # identifier.
15
+ #
16
+ # A bug in ast-merge (or its dependencies) caused fenced code blocks to be
17
+ # rendered with a space between the fence markers and the language identifier.
18
+ #
19
+ # == Bug Pattern
20
+ #
21
+ # CommonMark and most Markdown parsers expect NO space between fence and language:
22
+ # - **Correct:** ` ```ruby` or ` ~~~python`
23
+ # - **Incorrect:** ` ``` ruby` or ` ~~~ python` (extra space)
24
+ #
25
+ # The extra space can cause:
26
+ # - Syntax highlighting to fail
27
+ # - The language identifier to be ignored
28
+ # - Rendering issues in various Markdown processors
29
+ #
30
+ # @example Malformed (buggy) input
31
+ # "``` console\nsome code\n```"
32
+ #
33
+ # @example Fixed output
34
+ # "```console\nsome code\n```"
35
+ #
36
+ # == Scope
37
+ #
38
+ # This fixer handles:
39
+ # - **Any indentation level** (0+ spaces before fence)
40
+ # - Top-level: ` ```ruby`
41
+ # - In lists: ` ```python` (4 spaces)
42
+ # - **Both fence types:** backticks (```) and tildes (~~~)
43
+ # - **Any fence length:** 3+ markers (````, ~~~~~, etc.)
44
+ #
45
+ # == How It Works
46
+ #
47
+ # The parser uses a **PEG grammar** (via Parslet) to:
48
+ # - Detect fence opening lines with optional indentation
49
+ # - Identify spacing between fence and language identifier
50
+ # - Track opening/closing fence pairs to avoid false positives
51
+ # - Reconstruct fences with proper formatting (no space)
52
+ #
53
+ # **Why PEG?** The previous regex-based implementation used patterns like
54
+ # `([ \t]*)` which can cause polynomial backtracking (ReDoS vulnerability)
55
+ # when processing malicious input with many tabs/spaces. This grammar has no
56
+ # nested or ambiguous repetition, so it parses each line in linear time.
57
+ #
58
+ # @example Basic usage
59
+ # parser = Markdown::Merge::Cleanse::CodeFenceSpacing.new(content)
60
+ # fixed_content = parser.fix
61
+ #
62
+ # @example Check if content has malformed fences
63
+ # parser = Markdown::Merge::Cleanse::CodeFenceSpacing.new(content)
64
+ # parser.malformed? # => true/false
65
+ #
66
+ # @example Process a file
67
+ # content = File.read("README.md")
68
+ # parser = Markdown::Merge::Cleanse::CodeFenceSpacing.new(content)
69
+ # if parser.malformed?
70
+ # File.write("README.md", parser.fix)
71
+ # end
72
+ #
73
+ # @example Get details about code blocks
74
+ # parser = Markdown::Merge::Cleanse::CodeFenceSpacing.new(content)
75
+ # parser.code_blocks.each do |block|
76
+ # puts "#{block[:fence]}#{block[:language]}: malformed=#{block[:malformed]}"
77
+ # end
78
+ #
79
+ # @api public
80
+ class CodeFenceSpacing
81
+ # Grammar for parsing fenced code blocks with PEG parser.
82
+ #
83
+ # Recognizes:
84
+ # - Any amount of indentation (handles nested lists)
85
+ # - Backtick fences (```) and tilde fences (~~~)
86
+ # - Optional info string (language identifier)
87
+ # - Properly handles spacing issues
88
+ #
89
+ # This PEG grammar is linear-time and cannot have polynomial backtracking,
90
+ # eliminating ReDoS vulnerabilities.
91
+ #
92
+ # @api private
93
+ class CodeFenceGrammar < Parslet::Parser
94
+ # Any amount of indentation (handles code blocks in lists)
95
+ # Captured as string, not array
96
+ rule(:indent) { match("[ ]").repeat }
97
+
98
+ # Fence markers - 3+ backticks or tildes
99
+ rule(:backtick) { str("`") }
100
+ rule(:tilde) { str("~") }
101
+ rule(:backtick_fence) { backtick.repeat(3, nil) }
102
+ rule(:tilde_fence) { tilde.repeat(3, nil) }
103
+ rule(:fence) { backtick_fence | tilde_fence }
104
+
105
+ # Whitespace after fence (the bug we're fixing)
106
+ rule(:space) { match('[ \t]') }
107
+ rule(:spaces) { space.repeat(1) }
108
+ rule(:spaces?) { space.repeat }
109
+
110
+ # Info string (language identifier + optional attributes)
111
+ # Excludes backticks and tildes (CommonMark itself only forbids backticks, and only for backtick fences)
112
+ rule(:info_char) { match('[^\r\n`~]') }
113
+ rule(:info_string) { info_char.repeat(1) }
114
+
115
+ # Line ending
116
+ rule(:line_end) { str("\r").maybe >> str("\n").maybe >> any.absent? }
117
+
118
+ # Fence line with optional indentation, optional spacing, optional info
119
+ # Capture: indent (raw), fence (as :fence), spacing (as :spacing), info (as :info)
120
+ rule(:fence_line) {
121
+ indent.as(:indent) >> fence.as(:fence) >> spaces?.as(:spacing) >> info_string.maybe.as(:info) >> line_end
122
+ }
123
+
124
+ root(:fence_line)
125
+ end
126
+
127
+ # @return [String] the input text to parse
128
+ attr_reader :source
129
+
130
+ # Create a new parser for the given text.
131
+ #
132
+ # @param source [String] the text that may contain malformed code fences
133
+ def initialize(source)
134
+ @source = source.to_s
135
+ @grammar = CodeFenceGrammar.new
136
+ @code_blocks = nil
137
+ end
138
+
139
+ # Check if the source contains malformed fenced code blocks.
140
+ #
141
+ # Detects the pattern where there's whitespace between the fence
142
+ # markers and the language identifier.
143
+ #
144
+ # @return [Boolean] true if malformed fences are detected
145
+ def malformed?
146
+ code_blocks.any? { |block| block[:malformed] }
147
+ end
148
+
149
+ # Parse and return information about all fenced code blocks.
150
+ #
151
+ # Only returns opening fences (not closing fences).
152
+ #
153
+ # @return [Array<Hash>] Array of code block info
154
+ # - :indent [String] The indentation before the fence
155
+ # - :fence [String] The fence markers (e.g., "```" or "~~~")
156
+ # - :language [String, nil] The language identifier (first word of :info_string, the stripped info string)
157
+ # - :spacing [String] Any spacing between fence and language
158
+ # - :malformed [Boolean] Whether this block has improper spacing
159
+ # - :line_number [Integer] Line number where block starts (1-based)
160
+ # - :original [String] The original opening fence line
161
+ def code_blocks
162
+ return @code_blocks if @code_blocks
163
+
164
+ @code_blocks = []
165
+ line_number = 0
166
+ in_code_block = false
167
+ current_fence_char = nil
168
+
169
+ source.each_line do |line|
170
+ line_number += 1
171
+
172
+ # Try to parse as fence line using PEG grammar
173
+ parsed = parse_fence_line(line)
174
+ next unless parsed
175
+
176
+ fence = parsed[:fence]
177
+ fence_char = fence[0]
178
+ spacing = parsed[:spacing] || ""
179
+ info = parsed[:info] || ""
180
+ indent = parsed[:indent] || ""
181
+
182
+ # Closing fence: matches current fence type and has no info
183
+ if in_code_block && fence_char == current_fence_char && info.empty?
184
+ in_code_block = false
185
+ current_fence_char = nil
186
+ next
187
+ end
188
+
189
+ # Opening fence
190
+ in_code_block = true
191
+ current_fence_char = fence_char
192
+
193
+ # Extract just the language (first word of info string)
194
+ language = info.strip.split(/\s+/).first
195
+ language = nil if language&.empty?
196
+
197
+ @code_blocks << {
198
+ indent: indent,
199
+ fence: fence,
200
+ language: language,
201
+ info_string: info.strip,
202
+ spacing: spacing,
203
+ malformed: !spacing.empty? && !language.nil?,
204
+ line_number: line_number,
205
+ original: line.chomp,
206
+ }
207
+ end
208
+
209
+ @code_blocks
210
+ end
211
+
212
+ # Fix malformed fenced code blocks by removing improper spacing.
213
+ #
214
+ # @return [String] the source with code fences fixed
215
+ def fix
216
+ return source unless malformed?
217
+
218
+ result = source.dup
219
+
220
+ # Process line by line, fixing malformed fences
221
+ lines = result.lines
222
+ fixed_lines = lines.map do |line|
223
+ fix_fence_line(line)
224
+ end
225
+
226
+ fixed_lines.join
227
+ end
228
+
229
+ # Count the number of malformed code blocks.
230
+ #
231
+ # @return [Integer] number of malformed fences found
232
+ def malformed_count
233
+ code_blocks.count { |block| block[:malformed] }
234
+ end
235
+
236
+ # Count the total number of code blocks.
237
+ #
238
+ # @return [Integer] total number of fenced code blocks
239
+ def count
240
+ code_blocks.size
241
+ end
242
+
243
+ private
244
+
245
+ # Parse a single line as a fence using PEG grammar.
246
+ #
247
+ # @param line [String] the line to parse
248
+ # @return [Hash, nil] parsed fence data or nil if not a fence
249
+ def parse_fence_line(line)
250
+ tree = @grammar.parse(line)
251
+
252
+ # Convert Parslet tree to simple hash
253
+ # Note: Parslet returns [] for empty repeats, we convert to empty string
254
+ indent_val = tree[:indent]
255
+ indent_str = indent_val.is_a?(Array) ? indent_val.join : indent_val.to_s
256
+
257
+ spacing_val = tree[:spacing]
258
+ spacing_str = spacing_val.is_a?(Array) ? spacing_val.join : spacing_val.to_s
259
+
260
+ info_val = tree[:info]
261
+ info_str = if info_val.is_a?(Array)
262
+ info_val.join
263
+ else
264
+ (info_val ? info_val.to_s : "")
265
+ end
266
+
267
+ {
268
+ indent: indent_str,
269
+ fence: tree[:fence].to_s,
270
+ spacing: spacing_str,
271
+ info: info_str,
272
+ }
273
+ rescue Parslet::ParseFailed
274
+ nil
275
+ end
276
+
277
+ # Fix a single line if it's a malformed fence.
278
+ #
279
+ # @param line [String] the line to potentially fix
280
+ # @return [String] the fixed line (or original if not malformed)
281
+ def fix_fence_line(line)
282
+ parsed = parse_fence_line(line)
283
+ return line unless parsed
284
+
285
+ # Only fix if there's spacing AND info string
286
+ return line if parsed[:spacing].empty? || parsed[:info].empty?
287
+
288
+ # Reconstruct: indent + fence + info (no spacing), keeping the original line ending
289
+ "#{parsed[:indent]}#{parsed[:fence]}#{parsed[:info]}#{line[/\r?\n\z/]}"
290
+ end
291
+ end
292
+ end
293
+ end
294
+ end
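The per-line split into indent, fence, spacing, and info string can be illustrated without Parslet. The sketch below is an assumption-laden stand-in, not the gem's code: it swaps the PEG grammar for a hand-written single-pass scanner built on Ruby's standard `StringScanner`, and the module name `FenceLineSketch` and its methods are hypothetical. Each `scan` call consumes one simple character class, so no step revisits input it has already consumed.

```ruby
require "strscan"

# Hypothetical stand-in for the fence-line grammar, using StringScanner
# from the standard library instead of Parslet.
module FenceLineSketch
  module_function

  # Splits a line into the parts the grammar captures, plus its line ending.
  # Returns nil when the line is not a fence line.
  def split(line)
    scanner = StringScanner.new(line)
    indent = scanner.scan(/ */)
    fence = scanner.scan(/`{3,}|~{3,}/)
    return nil unless fence

    spacing = scanner.scan(/[ \t]*/)
    info = scanner.scan(/[^\r\n`~]*/)
    ending = scanner.scan(/\r?\n?/)
    return nil unless scanner.eos? # leftover characters: not a fence line

    {indent: indent, fence: fence, spacing: spacing, info: info, ending: ending}
  end

  # Removes the spacing between fence and info string, if both are present.
  def normalize(line)
    parts = split(line)
    return line if parts.nil? || parts[:spacing].empty? || parts[:info].empty?

    "#{parts[:indent]}#{parts[:fence]}#{parts[:info]}#{parts[:ending]}"
  end
end

puts FenceLineSketch.normalize("``` console\n")
```

Carrying the line ending through as its own part keeps CRLF input and a final line without a newline byte-for-byte intact apart from the removed spacing.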