ffast 0.0.2 → 0.0.3

data/docs/index.md ADDED
@@ -0,0 +1,356 @@
1
+ # Fast
2
+
3
+ Fast is a "Find AST" tool to help you search in the code abstract syntax tree.
4
+
5
+ Ruby allows us to do the same thing in several ways, which makes it hard to check
6
+ how the code is written.
7
+
8
+ Using the AST is easier than trying to cover the multiple ways we can write
9
+ the same code.
10
+
11
+ You can define a string like `%||` or `''` or `""` but they will have the same
12
+ AST representation.
13
+
14
+ ## AST representation
15
+
16
+ Each detail of the Ruby syntax has an equivalent identifier and some
17
+ content. The content can be another expression or a final value.
18
+
19
+ Fast uses the parser gem behind the scenes to parse the code into nodes.
20
+
21
+ First, get familiar with the parser gem and understand how Ruby code is represented.
22
+
23
+ When you install the parser gem, you will have access to `ruby-parse` and you can
24
+ use it with `-e` to parse an expression directly from the command line.
25
+
26
+ Example:
27
+
28
+ ```
29
+ ruby-parse -e 1
30
+ ```
31
+
32
+ It will print the following output:
33
+
34
+ ```
35
+ (int 1)
36
+ ```
37
+
38
+ And trying a number with decimals:
39
+
40
+ ```
41
+ ruby-parse -e 1.1
42
+ (float 1.1)
43
+ ```
44
+
45
+ Building a regex that matches decimals and integers looks easy,
46
+ and with Fast you use a node pattern that resembles the syntax of regular
47
+ expressions.
48
+
49
+ ## Syntax for finding in the AST
50
+
51
+ The current version covers the following elements:
52
+
53
+ - `()` to represent a **node** search
54
+ - `{}` is for **any** matches like **union** conditions with **or** operator
55
+ - `[]` is for **all** matches like **intersect** conditions with **and** operator
56
+ - `$` is for **capture** current expression
57
+ - `_` is **something** not nil
58
+ - `nil` matches exactly **nil**
59
+ - `...` is a **node** with children
60
+ - `^` is to get the **parent node** of an expression
61
+ - `?` is for **maybe**
62
+ - `\1` to use the first **previous captured** element
63
+ - `""` surround the value with double quotes to match literal strings
64
+
65
+ Jump to [Syntax](syntax.md).
66
+
67
+ ## Fast.match?
68
+
69
+ `match?` is the most granular function that tries to compare a node with an
70
+ expression. It returns true or false, or the captured nodes when it finds
71
+ something to capture.
72
+
73
+ Let's start with a simple integer in Ruby:
74
+
75
+ ```ruby
76
+ 1
77
+ ```
78
+
79
+ The AST can be represented with the following expression:
80
+
81
+ ```
82
+ (int 1)
83
+ ```
84
+
85
+ The AST representation holds the node `type` and `children`.
86
+
87
+ Let's build a method `s` to represent `Parser::AST::Node` with a `#type` and `#children`.
88
+
89
+ ```ruby
90
+ def s(type, *children)
91
+   Parser::AST::Node.new(type, children)
92
+ end
93
+ ```
94
+
95
+ A local variable assignment:
96
+
97
+ ```ruby
98
+ value = 42
99
+ ```
100
+
101
+ Can be represented with:
102
+
103
+ ```ruby
104
+ ast = s(:lvasgn, :value, s(:int, 42))
105
+ ```
106
+
107
+ Now, let's find a local variable named `value` with the value `42`:
108
+
109
+ ```ruby
110
+ Fast.match?(ast, '(lvasgn value (int 42))') # true
111
+ ```
112
+
113
+ Let's abstract a bit and allow any integer value using `_` as a shortcut:
114
+
115
+ ```ruby
116
+ Fast.match?(ast, '(lvasgn value (int _))') # true
117
+ ```
118
+
119
+ Let's abstract more and allow float or integer:
120
+
121
+ ```ruby
122
+ Fast.match?(ast, '(lvasgn value ({float int} _))') # true
123
+ ```
124
+
125
+ Or combine multiple assertions using `[]` to join conditions:
126
+
127
+ ```ruby
128
+ Fast.match?(ast, '(lvasgn value ([!str !hash !array] _))') # true
129
+ ```
130
+
131
+ This matches any local variable assignment whose value is not a string **and** not a hash **and** not an array.
132
+
133
+ We can match "a node with children" using `...`:
134
+
135
+ ```ruby
136
+ Fast.match?(ast, '(lvasgn value ...)') # true
137
+ ```
138
+
139
+ You can use `$` to capture a node:
140
+
141
+ ```ruby
142
+ Fast.match?(ast, '(lvasgn value $...)') # => [s(:int, 42)]
143
+ ```
144
+
145
+ Or match any local variable assignment by combining both `_` and `...`:
146
+
147
+ ```ruby
148
+ Fast.match?(ast, '(lvasgn _ ...)') # true
149
+ ```
150
+
151
+ You can also use captures at any level you want:
152
+
153
+ ```ruby
154
+ Fast.match?(ast, '(lvasgn $_ $...)') # => [:value, s(:int, 42)]
155
+ ```
156
+
157
+ Keep in mind that `_` means something not nil and `...` means a node with
158
+ children.
159
+
160
+ Then, if you have a method declared:
161
+
162
+ ```ruby
163
+ def my_method
164
+   call_other_method
165
+ end
166
+ ```
167
+ It will be represented with the following structure:
168
+
169
+ ```ruby
170
+ ast =
171
+   s(:def, :my_method,
172
+     s(:args),
173
+     s(:send, nil, :call_other_method))
174
+ ```
175
+
176
+ Keep an eye on the node `(args)`.
177
+
178
+ Since it has no children, you can't use `...` there, but you can use `(_)` to match
179
+ such a case.
180
+
181
+ Let's test a few other examples. You can go deep with arrays. Suppose we have a chained call to
182
+ `a.b.c.d` and the following AST represents it:
183
+
184
+ ```ruby
185
+ ast =
186
+   s(:send,
187
+     s(:send,
188
+       s(:send,
189
+         s(:send, nil, :a),
190
+         :b),
191
+       :c),
192
+     :d)
193
+ ```
194
+
195
+ You can search using sub-arrays with **pure values**, or **shortcuts** or
196
+ **procs**:
197
+
198
+ ```ruby
199
+ Fast.match?(ast, [:send, [:send, '...'], :d]) # => true
200
+ Fast.match?(ast, [:send, [:send, '...'], :c]) # => false
201
+ Fast.match?(ast, [:send, [:send, [:send, '...'], :c], :d]) # => true
202
+ ```
203
+
204
+ Shortcuts like `...` and `_` are just literals for procs, so you can use
205
+ procs directly too:
206
+
207
+ ```ruby
208
+ Fast.match?(ast, [:send, [ -> (node) { node.type == :send }, [:send, '...'], :c], :d]) # => true
209
+ ```
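To make the idea of "shortcuts are procs" concrete, here is a minimal sketch of an array matcher. It is not Fast's implementation: `Node` is a hypothetical stand-in for `Parser::AST::Node`, and `SHORTCUTS` and this `match?` are illustrative names that only cover pure values, the `_` and `...` shortcuts, and procs.

```ruby
# Hypothetical stand-in for Parser::AST::Node: only #type and #children.
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

# Assumption: each shortcut is modeled as a proc that receives the value.
SHORTCUTS = {
  '_'   => ->(value) { !value.nil? },                             # something not nil
  '...' => ->(value) { value.is_a?(Node) && value.children.any? } # a node with children
}.freeze

def match?(node, pattern)
  case pattern
  when Proc   then pattern.call(node)
  when String then SHORTCUTS.fetch(pattern).call(node)
  when Array
    return false unless node.is_a?(Node)

    head, *tail = pattern
    # The head is checked against the node itself: a proc gets the node,
    # a plain value is compared with the node type.
    head_ok = head.is_a?(Proc) ? head.call(node) : node.type == head
    # Only the children named in the pattern are compared, position by position.
    head_ok && tail.each_with_index.all? { |sub, i| match?(node.children[i], sub) }
  else
    node == pattern
  end
end

ast = s(:send, s(:send, s(:send, s(:send, nil, :a), :b), :c), :d)

match?(ast, [:send, [:send, '...'], :d]) # => true
match?(ast, [:send, [:send, '...'], :c]) # => false
match?(ast, [:send, [->(node) { node.type == :send }, [:send, '...'], :c], :d]) # => true
```

Because a sub-pattern only checks as many children as it lists, `[:send, '...']` matches any `send` node whose first child is itself a node with children.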
210
+
211
+ It also works with expressions:
212
+
213
+ ```ruby
214
+ Fast.match?(
215
+   ast,
216
+   '(send (send (send (send nil $_) $_) $_) $_)'
217
+ ) # => [:a, :b, :c, :d]
218
+ ```
219
+
220
+ If something does not work you can debug with a block:
221
+
222
+ ```ruby
223
+ Fast.debug { Fast.match?(s(:int, 1), [:int, 1]) }
224
+ ```
225
+
226
+ It will output each comparison to stdout:
227
+
228
+ ```
229
+ int == (int 1) # => true
230
+ 1 == 1 # => true
231
+ ```
232
+
233
+ ## Use previous captures in search
234
+
235
+ Imagine you're looking for a method that is just delegating something to
236
+ another method, like:
237
+
238
+ ```ruby
239
+ def name
240
+   person.name
241
+ end
242
+ ```
243
+
244
+ This can be represented as the following AST:
245
+
246
+ ```
247
+ (def :name
248
+   (args)
249
+   (send
250
+     (send nil :person) :name))
251
+ ```
252
+
253
+ Then, let's build a search for methods that call an attribute with the same
254
+ name:
255
+
256
+ ```ruby
257
+ Fast.match?(ast,'(def $_ ... (send (send nil _) \1))') # => [:name]
258
+ ```
259
+
260
+ ## Fast.search
261
+
262
+ Search allows you to go deep into the AST, collecting the nodes that match
263
+ the expression. It also returns captures if they exist.
264
+
265
+ ```ruby
266
+ Fast.search(code('a = 1'), '(int _)') # => s(:int, 1)
267
+ ```
268
+
269
+ If you use captures, it returns the node and the captures respectively:
270
+
271
+ ```ruby
272
+ Fast.search(code('a = 1'), '(int $_)') # => [s(:int, 1), 1]
273
+ ```
274
+
275
+ ## Fast.capture
276
+
277
+ To pick just the captures and ignore the nodes, use `Fast.capture`:
278
+
279
+ ```ruby
280
+ Fast.capture(code('a = 1'), '(int $_)') # => 1
281
+ ```
282
+ ## Fast.replace
283
+
284
+ If you want to refactor the code to use `delegate <attribute>, to: <object>`, try `replace`:
285
+
286
+ ```ruby
287
+ Fast.replace ast,
288
+   '(def $_ ... (send (send nil $_) \1))',
289
+   -> (node, captures) {
290
+     attribute, object = captures
291
+     replace(
292
+       node.location.expression,
293
+       "delegate :#{attribute}, to: :#{object}"
294
+     )
295
+   }
296
+ ```
297
+
298
+ ## Fast.replace_file
299
+
300
+ Now let's imagine we have real files like `sample.rb` with the following code:
301
+
302
+ ```ruby
303
+ def good_bye
304
+   message = ["good", "bye"]
305
+   puts message.join(' ')
306
+ end
307
+ ```
308
+
309
+ And we decide to remove the `message` variable and put it inline with the `puts`.
310
+
311
+ Basically, we need to find the local variable assignment and store its value in
312
+ memory, then remove the assignment expression and use the value where the variable
313
+ is being used.
314
+
315
+ ```ruby
316
+ assignment = nil
317
+ Fast.replace_file('sample.rb', '({ lvasgn lvar } message )',
318
+   -> (node, _) {
319
+     if node.type == :lvasgn
320
+       assignment = node.children.last
321
+       remove(node.location.expression)
322
+     elsif node.type == :lvar
323
+       replace(node.location.expression, assignment.location.expression.source)
324
+     end
325
+   }
326
+ )
327
+ ```
328
+
329
+ ## Fast.ast_from_file(file)
330
+
331
+ This method parses the code and loads it into an AST representation.
332
+
333
+ ```ruby
334
+ Fast.ast_from_file('sample.rb')
335
+ ```
336
+
337
+ ## Fast.search_file
338
+
339
+ You can use `search_file` and pass the path to search for expressions inside
340
+ files.
341
+
342
+ ```ruby
343
+ Fast.search_file(expression, 'file.rb')
344
+ ```
345
+
346
+ It's a simple combination of `Fast.ast_from_file` with `Fast.search`.
347
+
348
+ ## Fast.ruby_files_from(arguments)
349
+
350
+ You'll probably be looking through multiple Ruby files, so this method fetches
351
+ all the `.rb` files inside the given paths.
352
+
353
+ ```ruby
354
+ Fast.ruby_files_from(['lib']) # => ["lib/fast.rb"]
355
+ ```
356
+
@@ -0,0 +1,174 @@
1
+ # Research for code similarity
2
+
3
+ This is a small tutorial to explore code similarity.
4
+
5
+ The main idea is to register all expression styles and see if we can find some
6
+ similarity between the structures.
7
+
8
+ First we need to create a function that can analyze AST nodes and extract a
9
+ pattern from the expression.
10
+
11
+ The expression needs to generalize final node values and recursively build a
12
+ pattern that can be used as a search expression.
13
+
14
+ ```ruby
15
+ def expression_from(node)
16
+   case node
17
+   when Parser::AST::Node
18
+     if node.children.any?
19
+       children_expression = node.children
20
+         .map(&method(:expression_from))
21
+         .join(' ')
22
+       "(#{node.type} #{children_expression})"
23
+     else
24
+       "(#{node.type})"
25
+     end
26
+   when nil, 'nil'
27
+     'nil'
28
+   when Symbol, String, Integer
29
+     '_'
30
+   when Array, Hash
31
+     '...'
32
+   else
33
+     node
34
+   end
35
+ end
36
+ ```
37
+
38
+ The generated pattern only makes the search more flexible, allowing us to group similar nodes.
39
+
40
+ Example:
41
+
42
+ ```ruby
43
+ expression_from(code['1']) # =>'(int _)'
44
+ expression_from(code['nil']) # =>'(nil)'
45
+ expression_from(code['a = 1']) # =>'(lvasgn _ (int _))'
46
+ expression_from(code['def name; person.name end']) # =>'(def _ (args) (send (send nil _) _))'
47
+ ```
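To try the generalization without installing the parser gem, the sketch below repeats the same recursion over a hypothetical `Node` struct that stands in for `Parser::AST::Node`. The struct and the `s` helper are assumptions made for illustration. Only the shape of the recursion comes from the method above.

```ruby
# Hypothetical stand-in for Parser::AST::Node: only #type and #children.
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

# Same idea as expression_from above: keep node types, generalize final values.
def expression_from(node)
  case node
  when Node
    if node.children.any?
      children_expression = node.children.map { |child| expression_from(child) }.join(' ')
      "(#{node.type} #{children_expression})"
    else
      "(#{node.type})"
    end
  when nil
    'nil'
  when Symbol, String, Integer
    '_'
  else
    node
  end
end

assignment = s(:lvasgn, :a, s(:int, 1))
delegation = s(:def, :name, s(:args), s(:send, s(:send, nil, :person), :name))

puts expression_from(assignment) # prints "(lvasgn _ (int _))"
puts expression_from(delegation) # prints "(def _ (args) (send (send nil _) _))"
```

Two assignments with different names and integer values produce the same string, which is what makes the pattern usable as a grouping key.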
48
+
49
+ The current method can translate all kinds of expressions, and the next step is
50
+ to observe some specific node types and try to group the similarities
51
+ using the pattern generated.
52
+
53
+ ```ruby
54
+ Fast.search_file('class', 'lib/fast.rb')
55
+ ```
56
+ Capturing the constant name and filtering only for symbols is easy and we can
57
+ see that we have a few classes defined in the same file.
58
+
59
+ ```ruby
60
+ Fast.search_file('(class (const nil $_))','lib/fast.rb').grep(Symbol)
61
+ => [:Rewriter,
62
+ :ExpressionParser,
63
+ :Find,
64
+ :FindString,
65
+ :FindWithCapture,
66
+ :Capture,
67
+ :Parent,
68
+ :Any,
69
+ :All,
70
+ :Not,
71
+ :Maybe,
72
+ :Matcher,
73
+ :Experiment,
74
+ :ExperimentFile]
75
+ ```
76
+
77
+ The idea of this inspection is to build a proof of concept to show the similarity
78
+ of matcher classes because they only define a `match?` method.
79
+
80
+ ```ruby
81
+ patterns = Fast.search_file('class','lib/fast.rb').map{|n|Fast.expression_from(n)}
82
+ ```
83
+
84
+ A simple comparison of the patterns size against `.uniq.size` can prove whether
85
+ the idea will work.
86
+
87
+ ```ruby
88
+ patterns.size == patterns.uniq.size
89
+ ```
90
+
91
+ It does not work for the matcher cases but we can go deeper and analyze all
92
+ files required by bundler.
93
+
94
+ ```ruby
95
+ similarities = {}
96
+ Gem.find_files('*.rb').each do |file|
97
+   Fast.search_file('',file).map do |n|
98
+     key = Fast.expression_from(n)
99
+     similarities[key] ||= Set.new
100
+     similarities[key] << file
101
+   end
102
+ end
103
+ similarities.delete_if {|k,v|v.size < 2}
104
+ ```
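The grouping step can be tried in isolation. The sketch below uses hypothetical file names and hand-written patterns in place of real search results, and it requires `set` explicitly, which older Ruby versions need before `Set` is available.

```ruby
require 'set'

# Hypothetical (file, pattern) pairs standing in for real search results.
found = [
  ['a.rb', '(class (const nil _) (const nil _) nil)'],
  ['b.rb', '(class (const nil _) (const nil _) nil)'],
  ['b.rb', '(class (const nil _) nil nil)']
]

# Group the files by pattern; a Set avoids listing the same file twice.
similarities = Hash.new { |hash, key| hash[key] = Set.new }
found.each { |file, pattern| similarities[pattern] << file }

# Keep only the patterns that appear in at least two different files.
similarities.delete_if { |_pattern, files| files.size < 2 }

similarities.each do |pattern, files|
  puts "#{pattern}: #{files.to_a.join(', ')}"
end
# prints "(class (const nil _) (const nil _) nil): a.rb, b.rb"
```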
105
+ The similarities found are the following:
106
+
107
+ ```ruby
108
+ {"(class (const nil _) (const nil _) nil)"=>
109
+ #<Set: {"/Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/parallel-1.12.1/lib/parallel.rb",
110
+ "/Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/method_source-0.9.0/lib/method_source.rb",
111
+ "/Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/rdoc.rb",
112
+ "/Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/irb.rb",
113
+ "/Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/tsort.rb"}>,
114
+ "(class (const nil _) nil nil)"=>#<Set: {"/Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/ripper.rb", "/Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/cgi.rb"}>}
115
+ ```
116
+
117
+ And now we can test the expression using the command line tool through the files
118
+ and observe the similarity:
119
+
120
+ ```
121
+ ⋊> ~ fast "(class (const nil _) (const nil _) nil)" /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/parallel-1.12.1/lib/parallel.rb /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/method_source-0.9.0/lib/method_source.rb /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/rdoc.rb /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/irb.rb /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/tsort.rb
122
+ ```
123
+
124
+ Output:
125
+
126
+ ```ruby
127
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/parallel-1.12.1/lib/parallel.rb:8
128
+ class DeadWorker < StandardError
129
+ end
130
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/parallel-1.12.1/lib/parallel.rb:11
131
+ class Break < StandardError
132
+ end
133
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/parallel-1.12.1/lib/parallel.rb:14
134
+ class Kill < StandardError
135
+ end
136
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/method_source-0.9.0/lib/method_source.rb:16
137
+ class SourceNotFoundError < StandardError; end
138
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/rdoc.rb:63
139
+ class Error < RuntimeError; end
140
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/irb.rb:338
141
+ class Abort < Exception;end
142
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/tsort.rb:125
143
+ class Cyclic < StandardError
144
+ end
145
+ ```
146
+
147
+ It works and now we can create a method to do what the command line tool did,
148
+ grouping the patterns and inspecting the occurrences.
149
+
150
+ ```ruby
151
+ def similarities.show pattern
152
+   files = self[pattern]
153
+   files.each do |file|
154
+     nodes = Fast.search_file(pattern, file)
155
+     nodes.each do |result|
156
+       Fast.report(result, file: file)
157
+     end
158
+   end
159
+ end
160
+ ```
161
+
162
+ Calling the method to explore some `if` similarities prints the following
163
+ results:
164
+
165
+ ```ruby
166
+ similarities.show "(if (send (const nil _) _ (lvar _)) nil (return (false)))"
167
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/resolv.rb:1248
168
+ return false unless Name === other
169
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/fileutils.rb:138
170
+ return false unless File.exist?(new)
171
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/matrix.rb:1862
172
+ return false unless Vector === other
173
+ ```
174
+