syntax_search 0.1.0 → 0.1.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,178 +1,43 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module SyntaxErrorSearch
4
- # This class is responsible for generating, storing, and sorting code blocks
4
+ # The main function of the frontier is to hold the edges of our search and to
5
+ # evaluate when we can stop searching.
5
6
  #
6
- # The search algorithm for finding our syntax errors isn't in this class, but
7
- # this is class holds the bulk of the logic for generating, storing, detecting
8
- # and filtering invalid code.
7
+ # ## Knowing where we've been
9
8
  #
10
- # This is loosely based on the idea of a "frontier" for searching for a path
11
- # example: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
9
+ # Once a code block is generated it is added onto the frontier where it will be
10
+ # sorted and then the frontier can be filtered. Large blocks that totally contain a
11
+ # smaller block will cause the smaller block to be evicted.
12
12
  #
13
- # In this case our path is going from code with a syntax error to code without a
14
- # syntax error. We're currently doing that by evaluating individual lines
15
- # with respect to indentation and other whitespace (empty lines). As represented
16
- # by individual "code blocks".
13
+ # CodeFrontier#<<
14
+ # CodeFrontier#pop
17
15
  #
18
- # This class does not just store the frontier that we're searching, but is responsible
19
- # for generating new code blocks as well. This is not ideal, but the state of generating
20
- # and evaluating paths i.e. codeblocks is very tightly coupled.
16
+ # ## Knowing where we can go
21
17
  #
22
- # ## Creation
18
+ # Internally it keeps track of an "indent hash" which is exposed via `next_indent_line`
19
+ # when called this will return a line of code with the most indentation.
23
20
  #
24
- # This example code is re-used in the other sections
21
+ # This line of code can be used to build a CodeBlock via and then when that code block
22
+ # is added back to the frontier, then the lines in the code block are removed from the
23
+ # indent hash so we don't double-create the same block.
25
24
  #
26
- # Example:
25
+ # CodeFrontier#next_indent_line
26
+ # CodeFrontier#register_indent_block
27
27
  #
28
- # code_lines = [
29
- # CodeLine.new(line: "def cinco\n", index: 0)
30
- # CodeLine.new(line: " def dog\n", index: 1) # Syntax error 1
31
- # CodeLine.new(line: " def cat\n", index: 2) # Syntax error 2
32
- # CodeLine.new(line: "end\n", index: 3)
33
- # ]
28
+ # ## Knowing when to stop
34
29
  #
35
- # frontier = CodeFrontier.new(code_lines: code_lines)
30
+ # The frontier holds the syntax error when removing all code blocks from the original
31
+ # source document allows it to be parsed as syntatically valid:
36
32
  #
37
- # frontier << frontier.next_block if frontier.next_block?
38
- # frontier << frontier.next_block if frontier.next_block?
33
+ # CodeFrontier#holds_all_syntax_errors?
39
34
  #
40
- # frontier.holds_all_syntax_errors? # => true
41
- # block = frontier.pop
42
- # frontier.holds_all_syntax_errors? # => false
43
- # frontier << block
44
- # frontier.holds_all_syntax_errors? # => true
35
+ # ## Filtering false positives
45
36
  #
46
- # frontier.detect_invalid_blocks.map(&:to_s) # =>
47
- # [
48
- # "def dog\n",
49
- # "def cat\n"
50
- # ]
37
+ # Once the search is completed, the frontier will have many blocks that do not contain
38
+ # the syntax error. To filter to the smallest subset that does call:
51
39
  #
52
- # ## Block Generation
53
- #
54
- # Currently code blocks are generated based off of indentation. With the idea that blocks are,
55
- # well, indented. Once a code block is added to the frontier or it is expanded, or it is generated
56
- # then we also need to remove those lines from our generation code so we don't generate the same block
57
- # twice by accident.
58
- #
59
- # This is block generation is currently done via the "indent_hash" internally by starting at the outer
60
- # most indentation.
61
- #
62
- # Example:
63
- #
64
- # ```
65
- # def river
66
- # puts "lol" # <=== Start looking here and expand outwards
67
- # end
68
- # ```
69
- #
70
- # Generating new code blocks is a little verbose but looks like this:
71
- #
72
- # frontier << frontier.next_block if frontier.next_block?
73
- #
74
- # Once a block is in the frontier, it can be popped off:
75
- #
76
- # frontier.pop
77
- # # => <# CodeBlock >
78
- #
79
- # ## Block (frontier) storage, ordering and retrieval
80
- #
81
- # Once a block is generated it is stored internally in a frontier array. This is very similar to a search algorithm.
82
- # The array is sorted by indentation order, so that when a block is popped off the array, the one with
83
- # the largest current indentation is evaluated first.
84
- #
85
- # For example, if we have these two blocks in the frontier:
86
- #
87
- # ```
88
- # # Block A - 0 spaces for indentation
89
- #
90
- # def cinco
91
- # puts "lol"
92
- # end
93
- # ```
94
- #
95
- # ```
96
- # # Block B - 2 spaces for indentation
97
- #
98
- # def river
99
- # puts "hehe"
100
- # end
101
- # ```
102
- #
103
- # The "Block B" has more current indentation, so it would be evaluated first.
104
- #
105
- # ## Frontier evaluation (Find the syntax error)
106
- #
107
- # Another key difference between this and a normal search "frontier" is that we're not checking if
108
- # an individual code block meets the goal (turning invalid code to valid code) since there can
109
- # be multiple syntax errors and this will require multiple code blocks. To handle this, we're
110
- # evaluating all the contents of the frontier at the same time to see if the solution exists in any
111
- # of our search blocks.
112
- #
113
- # # Using the previously generated frontier
114
- #
115
- # frontier << Block.new(lines: code_lines[1], code_lines: code_lines)
116
- # frontier.holds_all_syntax_errors? # => false
117
- #
118
- # frontier << Block.new(lines: code_lines[2], code_lines: code_lines)
119
- # frontier.holds_all_syntax_errors? # => true
120
- #
121
- # ## Detect invalid blocks (Filter for smallest solution)
122
- #
123
- # After we prove that a solution exists and we've found it to be in our frontier, we can start stop searching.
124
- # Once we've done this, we need to search through the existing frontier code blocks to find the minimum combination
125
- # of blocks that hold the solution. This is done in: `detect_invalid_blocks`.
126
- #
127
- # # Using the previously generated frontier
128
- #
129
- # frontier << CodeBlock.new(lines: code_lines[0], code_lines: code_lines)
130
- # frontier << CodeBlock.new(lines: code_lines[1], code_lines: code_lines)
131
- # frontier << CodeBlock.new(lines: code_lines[2], code_lines: code_lines)
132
- # frontier << CodeBlock.new(lines: code_lines[3], code_lines: code_lines)
133
- #
134
- # frontier.count # => 4
135
- # frontier.detect_invalid_blocks.length => 2
136
- # frontier.detect_invalid_blocks.map(&:to_s) # =>
137
- # [
138
- # "def dog\n",
139
- # "def cat\n"
140
- # ]
141
- #
142
- # Once invalid blocks are found and filtered, then they can be passed to a formatter.
143
- #
144
- #
145
- #
146
-
147
- class IndentScan
148
- attr_reader :code_lines
149
-
150
- def initialize(code_lines: )
151
- @code_lines = code_lines
152
- end
153
-
154
- def neighbors_from_top(top_line)
155
- code_lines
156
- .select {|l| l.index >= top_line.index }
157
- .select {|l| l.not_empty? }
158
- .select {|l| l.visible? }
159
- .take_while {|l| l.indent >= top_line.indent }
160
- end
161
-
162
- def each_neighbor_block(top_line)
163
- neighbors = neighbors_from_top(top_line)
164
-
165
- until neighbors.empty?
166
- lines = [neighbors.pop]
167
- while (block = CodeBlock.new(lines: lines, code_lines: code_lines)) && block.invalid? && neighbors.any?
168
- lines.prepend neighbors.pop
169
- end
170
-
171
- yield block if block
172
- end
173
- end
174
- end
175
-
40
+ # CodeFrontier#detect_invalid_blocks
176
41
  class CodeFrontier
177
42
  def initialize(code_lines: )
178
43
  @code_lines = code_lines
@@ -207,16 +72,9 @@ module SyntaxErrorSearch
207
72
 
208
73
  # Returns a code block with the largest indentation possible
209
74
  def pop
210
- return nil if empty?
211
-
212
75
  return @frontier.pop
213
76
  end
214
77
 
215
- def next_block?
216
- !@indent_hash.empty?
217
- end
218
-
219
-
220
78
  def indent_hash_indent
221
79
  @indent_hash.keys.sort.last
222
80
  end
@@ -226,40 +84,25 @@ module SyntaxErrorSearch
226
84
  @indent_hash[indent]&.first
227
85
  end
228
86
 
229
- def generate_blocks
230
- end
231
-
232
- def next_block
233
- indent = @indent_hash.keys.sort.last
234
- lines = @indent_hash[indent].first
235
-
236
- block = CodeBlock.new(
237
- lines: lines,
238
- code_lines: @code_lines
239
- ).expand_until_neighbors
240
-
241
- register(block)
242
- block
243
- end
244
-
245
87
  def expand?
246
88
  return false if @frontier.empty?
247
89
  return true if @indent_hash.empty?
248
90
 
249
- @frontier.last.current_indent >= @indent_hash.keys.sort.last
250
- end
91
+ frontier_indent = @frontier.last.current_indent
92
+ hash_indent = @indent_hash.keys.sort.last
251
93
 
252
- # This method is responsible for determining if a new code
253
- # block should be generated instead of evaluating an already
254
- # existing block in the frontier
255
- def generate_new_block?
256
- return false if @indent_hash.empty?
257
- return true if @frontier.empty?
94
+ if ENV["DEBUG"]
95
+ puts "```"
96
+ puts @frontier.last.to_s
97
+ puts "```"
98
+ puts " @frontier indent: #{frontier_indent}"
99
+ puts " @hash indent: #{hash_indent}"
100
+ end
258
101
 
259
- @frontier.last.current_indent <= @indent_hash.keys.sort.last
102
+ frontier_indent >= hash_indent
260
103
  end
261
104
 
262
- def register(block)
105
+ def register_indent_block(block)
263
106
  block.lines.each do |line|
264
107
  @indent_hash[line.indent]&.delete(line)
265
108
  end
@@ -273,22 +116,18 @@ module SyntaxErrorSearch
273
116
  # and that each code block's lines are removed from the indentation hash so we
274
117
  # don't re-evaluate the same line multiple times.
275
118
  def <<(block)
276
- register(block)
119
+ register_indent_block(block)
277
120
 
121
+ # Make sure we don't double expand, if a code block fully engulfs another code block, keep the bigger one
122
+ @frontier.reject! {|b|
123
+ b.starts_at >= block.starts_at && b.ends_at <= block.ends_at
124
+ }
278
125
  @frontier << block
279
126
  @frontier.sort!
280
127
 
281
128
  self
282
129
  end
283
130
 
284
- def any?
285
- !empty?
286
- end
287
-
288
- def empty?
289
- @frontier.empty? && @indent_hash.empty?
290
- end
291
-
292
131
  # Example:
293
132
  #
294
133
  # combination([:a, :b, :c, :d])
@@ -3,15 +3,16 @@
3
3
  module SyntaxErrorSearch
4
4
  # Searches code for a syntax error
5
5
  #
6
- # The bulk of the heavy lifting is done by the CodeFrontier
6
+ # The bulk of the heavy lifting is done in:
7
7
  #
8
- # The flow looks like this:
8
+ # - CodeFrontier (Holds information for generating blocks and determining if we can stop searching)
9
+ # - ParseBlocksFromLine (Creates blocks into the frontier)
10
+ # - BlockExpand (Expands existing blocks to search more code
9
11
  #
10
12
  # ## Syntax error detection
11
13
  #
12
14
  # When the frontier holds the syntax error, we can stop searching
13
15
  #
14
- #
15
16
  # search = CodeSearch.new(<<~EOM)
16
17
  # def dog
17
18
  # def lol
@@ -23,42 +24,51 @@ module SyntaxErrorSearch
23
24
  # search.invalid_blocks.map(&:to_s) # =>
24
25
  # # => ["def lol\n"]
25
26
  #
26
- #
27
27
  class CodeSearch
28
28
  private; attr_reader :frontier; public
29
29
  public; attr_reader :invalid_blocks, :record_dir, :code_lines
30
30
 
31
- def initialize(string, record_dir: ENV["SYNTAX_SEARCH_RECORD_DIR"])
31
+ def initialize(source, record_dir: ENV["SYNTAX_SEARCH_RECORD_DIR"])
32
+ @source = source
32
33
  if record_dir
33
34
  @time = Time.now.strftime('%Y-%m-%d-%H-%M-%s-%N')
34
35
  @record_dir = Pathname(record_dir).join(@time).tap {|p| p.mkpath }
35
36
  @write_count = 0
36
37
  end
37
- @code_lines = string.lines.map.with_index do |line, i|
38
+ @code_lines = source.lines.map.with_index do |line, i|
38
39
  CodeLine.new(line: line, index: i)
39
40
  end
40
41
  @frontier = CodeFrontier.new(code_lines: @code_lines)
41
42
  @invalid_blocks = []
42
43
  @name_tick = Hash.new {|hash, k| hash[k] = 0 }
43
44
  @tick = 0
44
- @scan = IndentScan.new(code_lines: @code_lines)
45
+ @block_expand = BlockExpand.new(code_lines: code_lines)
46
+ @parse_blocks_from_indent_line = ParseBlocksFromIndentLine.new(code_lines: @code_lines)
45
47
  end
46
48
 
49
+ # Used for debugging
47
50
  def record(block:, name: "record")
48
51
  return if !@record_dir
49
52
  @name_tick[name] += 1
50
53
  filename = "#{@write_count += 1}-#{name}-#{@name_tick[name]}.txt"
54
+ if ENV["DEBUG"]
55
+ puts "\n\n==== #{filename} ===="
56
+ puts "\n```#{block.starts_at}:#{block.ends_at}"
57
+ puts "#{block.to_s}"
58
+ puts "```"
59
+ puts " block indent: #{block.current_indent}"
60
+ end
51
61
  @record_dir.join(filename).open(mode: "a") do |f|
52
62
  display = DisplayInvalidBlocks.new(
53
63
  blocks: block,
54
- terminal: false
64
+ terminal: false,
65
+ code_lines: @code_lines,
55
66
  )
56
67
  f.write(display.indent display.code_with_lines)
57
68
  end
58
69
  end
59
70
 
60
- def push_if_invalid(block, name: )
61
- frontier.register(block)
71
+ def push(block, name: )
62
72
  record(block: block, name: name)
63
73
 
64
74
  if block.valid?
@@ -69,33 +79,48 @@ module SyntaxErrorSearch
69
79
  end
70
80
  end
71
81
 
82
+ # Parses the most indented lines into blocks that are marked
83
+ # and added to the frontier
72
84
  def add_invalid_blocks
73
85
  max_indent = frontier.next_indent_line&.indent
74
86
 
75
87
  while (line = frontier.next_indent_line) && (line.indent == max_indent)
76
- neighbors = @scan.neighbors_from_top(frontier.next_indent_line)
77
88
 
78
- @scan.each_neighbor_block(frontier.next_indent_line) do |block|
89
+ @parse_blocks_from_indent_line.each_neighbor_block(frontier.next_indent_line) do |block|
79
90
  record(block: block, name: "add")
80
- if block.valid?
81
- block.lines.each(&:mark_invisible)
82
- end
83
- end
84
91
 
85
- block = CodeBlock.new(lines: neighbors, code_lines: @code_lines)
86
- push_if_invalid(block, name: "add")
92
+ block.mark_invisible if block.valid?
93
+ push(block, name: "add")
94
+ end
87
95
  end
88
96
  end
89
97
 
98
+ # Given an already existing block in the frontier, expand it to see
99
+ # if it contains our invalid syntax
90
100
  def expand_invalid_block
91
101
  block = frontier.pop
92
102
  return unless block
93
103
 
94
- block.expand_until_next_boundry
95
- push_if_invalid(block, name: "expand")
104
+ record(block: block, name: "pop")
105
+
106
+ # block = block.expand_until_next_boundry
107
+ block = @block_expand.call(block)
108
+ push(block, name: "expand")
109
+ end
110
+
111
+
112
+ def sweep_heredocs
113
+ HeredocBlockParse.new(
114
+ source: @source,
115
+ code_lines: @code_lines
116
+ ).call.each do |block|
117
+ push(block, name: "heredoc")
118
+ end
96
119
  end
97
120
 
121
+ # Main search loop
98
122
  def call
123
+ sweep_heredocs
99
124
  until frontier.holds_all_syntax_errors?
100
125
  @tick += 1
101
126
 
@@ -5,21 +5,22 @@ module SyntaxErrorSearch
5
5
  class DisplayInvalidBlocks
6
6
  attr_reader :filename
7
7
 
8
- def initialize(blocks:, io: $stderr, filename: nil, terminal: false)
8
+ def initialize(code_lines: ,blocks:, io: $stderr, filename: nil, terminal: false, invalid_type: :unmatched_end)
9
9
  @terminal = terminal
10
10
  @filename = filename
11
11
  @io = io
12
12
 
13
13
  @blocks = Array(blocks)
14
14
  @lines = @blocks.map(&:lines).flatten
15
- @code_lines = @blocks.first&.code_lines || []
15
+ @code_lines = code_lines
16
16
  @digit_count = @code_lines.last&.line_number.to_s.length
17
17
 
18
18
  @invalid_line_hash = @lines.each_with_object({}) {|line, h| h[line] = true }
19
+ @invalid_type = invalid_type
19
20
  end
20
21
 
21
22
  def call
22
- if @blocks.any?
23
+ if @blocks.any? { |b| !b.hidden? }
23
24
  found_invalid_blocks
24
25
  else
25
26
  @io.puts "Syntax OK"
@@ -33,15 +34,28 @@ module SyntaxErrorSearch
33
34
  end
34
35
 
35
36
  private def found_invalid_blocks
36
- @io.puts <<~EOM
37
+ case @invalid_type
38
+ when :missing_end
39
+ @io.puts <<~EOM
37
40
 
38
- SyntaxErrorSearch: A syntax error was detected
41
+ SyntaxSearch: Missing `end` detected
39
42
 
40
- This code has an unmatched `end` this is caused by either
41
- missing a syntax keyword (`def`, `do`, etc.) or inclusion
42
- of an extra `end` line
43
+ This code has a missing `end`. Ensure that all
44
+ syntax keywords (`def`, `do`, etc.) have a matching `end`.
45
+
46
+ EOM
47
+ when :unmatched_end
48
+ @io.puts <<~EOM
49
+
50
+ SyntaxSearch: Unmatched `end` detected
51
+
52
+ This code has an unmatched `end`. Ensure that all `end` lines
53
+ in your code have a matching syntax keyword (`def`, `do`, etc.)
54
+ and that you don't have any extra `end` lines.
55
+
56
+ EOM
57
+ end
43
58
 
44
- EOM
45
59
  @io.puts("file: #{filename}") if filename
46
60
  @io.puts <<~EOM
47
61
  simplified:
@@ -50,16 +64,13 @@ module SyntaxErrorSearch
50
64
  EOM
51
65
  end
52
66
 
53
- def indent(string, with: " ")
67
+ def indent(string, with: " ")
54
68
  string.each_line.map {|l| with + l }.join
55
69
  end
56
70
 
57
71
  def code_block
58
72
  string = String.new("")
59
- string << "```\n"
60
- # string << "#".rjust(@digit_count) + " filename: #{filename}\n\n" if filename
61
73
  string << code_with_lines
62
- string << "```\n"
63
74
  string
64
75
  end
65
76