ffast 0.0.2 → 0.0.3

data/docs/index.md ADDED
@@ -0,0 +1,356 @@
1
+ # Fast
2
+
3
+ Fast is a "Find AST" tool to help you search in the code abstract syntax tree.
4
+
5
+ Ruby allows us to do the same thing in several ways, so it's hard to check
6
+ how the code is written.
7
+
8
+ Using the AST is easier than trying to cover the multiple ways we can write
9
+ the same code.
10
+
11
+ You can define a string like `%||` or `''` or `""` but they will have the same
12
+ AST representation.
13
+
14
+ ## AST representation
15
+
16
+ Each detail of the Ruby syntax has an equivalent identifier and some
17
+ content. The content can be another expression or a final value.
18
+
19
+ Fast uses the parser gem behind the scenes to parse the code into nodes.
20
+
21
+ First, get familiar with the parser gem and understand how Ruby code is represented.
22
+
23
+ When you install the parser gem, you will have access to `ruby-parse` and you can
24
+ use it with `-e` to parse an expression directly from the command line.
25
+
26
+ Example:
27
+
28
+ ```
29
+ ruby-parse -e 1
30
+ ```
31
+
32
+ It will print the following output:
33
+
34
+ ```
35
+ (int 1)
36
+ ```
37
+
38
+ And trying a number with decimals:
39
+
40
+ ```
41
+ ruby-parse -e 1.1
42
+ (float 1.1)
43
+ ```
44
+
45
+ Building a regex that matches both decimals and integers looks easy,
46
+ and with Fast you use a node pattern that resembles the syntax of regular
47
+ expressions.
48
+
49
+ ## Syntax for find in AST
50
+
51
+ The current version covers the following elements:
52
+
53
+ - `()` to represent a **node** search
54
+ - `{}` is for **any** matches like **union** conditions with **or** operator
55
+ - `[]` is for **all** matches like **intersect** conditions with **and** operator
56
+ - `$` is to **capture** the current expression
57
+ - `_` is **something** not nil
58
+ - `nil` matches exactly **nil**
59
+ - `...` is a **node** with children
60
+ - `^` is to get the **parent node** of an expression
61
+ - `?` is for **maybe**
62
+ - `\1` to use the first **previous captured** element
63
+ - `""` surround the value with double quotes to match literal strings
64
+
65
+ Jump to [Syntax](syntax.md).
66
+
67
+ ## Fast.match?
68
+
69
+ `match?` is the most granular function: it tries to compare a node with an
70
+ expression. It returns true or false, or the captured nodes in case it finds
71
+ something.
72
+
73
+ Let's start with a simple integer in Ruby:
74
+
75
+ ```ruby
76
+ 1
77
+ ```
78
+
79
+ The AST can be represented with the following expression:
80
+
81
+ ```
82
+ (int 1)
83
+ ```
84
+
85
+ The ast representation holds node `type` and `children`.
86
+
87
+ Let's build a method `s` to represent `Parser::AST::Node` with a `#type` and `#children`.
88
+
89
+ ```ruby
90
+ def s(type, *children)
91
+   Parser::AST::Node.new(type, children)
92
+ end
93
+ ```
94
+
95
+ A local variable assignment:
96
+
97
+ ```ruby
98
+ value = 42
99
+ ```
100
+
101
+ Can be represented with:
102
+
103
+ ```ruby
104
+ ast = s(:lvasgn, :value, s(:int, 42))
105
+ ```
106
+
107
+ Now, let's find a local variable named `value` with the value `42`:
108
+
109
+ ```ruby
110
+ Fast.match?(ast, '(lvasgn value (int 42))') # true
111
+ ```
112
+
113
+ Let's abstract a bit and allow any integer value using `_` as a shortcut:
114
+
115
+ ```ruby
116
+ Fast.match?(ast, '(lvasgn value (int _))') # true
117
+ ```
118
+
119
+ Let's abstract more and allow float or integer:
120
+
121
+ ```ruby
122
+ Fast.match?(ast, '(lvasgn value ({float int} _))') # true
123
+ ```
124
+
125
+ Or combine multiple assertions using `[]` to join conditions:
126
+
127
+ ```ruby
128
+ Fast.match?(ast, '(lvasgn value ([!str !hash !array] _))') # true
129
+ ```
130
+
131
+ This matches any local variable assignment whose value is not a string **and** not a hash **and** not an array.
132
+
133
+ We can match "a node with children" using `...`:
134
+
135
+ ```ruby
136
+ Fast.match?(ast, '(lvasgn value ...)') # true
137
+ ```
138
+
139
+ You can use `$` to capture a node:
140
+
141
+ ```ruby
142
+ Fast.match?(ast, '(lvasgn value $...)') # => [s(:int, 42)]
143
+ ```
144
+
145
+ Or match whatever local variable assignment combining both `_` and `...`:
146
+
147
+ ```ruby
148
+ Fast.match?(ast, '(lvasgn _ ...)') # true
149
+ ```
150
+
151
+ You can also use captures at any level you want:
152
+
153
+ ```ruby
154
+ Fast.match?(ast, '(lvasgn $_ $...)') # [:value, s(:int, 42)]
155
+ ```
156
+
157
+ Keep in mind that `_` means something not nil and `...` means a node with
158
+ children.
159
+
160
+ Then, if you have a method declared:
161
+
162
+ ```ruby
163
+ def my_method
164
+   call_other_method
165
+ end
166
+ ```
167
+ It will be represented with the following structure:
168
+
169
+ ```ruby
170
+ ast =
171
+   s(:def, :my_method,
172
+     s(:args),
173
+     s(:send, nil, :call_other_method))
174
+ ```
175
+
176
+ Keep an eye on the node `(args)`: it has no children.
177
+
178
+ That means you can't use `...` to match it, but you can use `(_)` to match
179
+ such a case.
180
+
181
+ Let's test a few other examples. You can also nest arrays as deeply as you need. Let's suppose we have a chained call to
182
+ `a.b.c.d` and the following AST represents it:
183
+
184
+ ```ruby
185
+ ast =
186
+   s(:send,
187
+     s(:send,
188
+       s(:send,
189
+         s(:send, nil, :a),
190
+         :b),
191
+       :c),
192
+     :d)
193
+ ```
194
+
195
+ You can search using sub-arrays with **pure values**, or **shortcuts** or
196
+ **procs**:
197
+
198
+ ```ruby
199
+ Fast.match?(ast, [:send, [:send, '...'], :d]) # => true
200
+ Fast.match?(ast, [:send, [:send, '...'], :c]) # => false
201
+ Fast.match?(ast, [:send, [:send, [:send, '...'], :c], :d]) # => true
202
+ ```
203
+
204
+ Shortcuts like `...` and `_` are just literals for procs, so you can use
205
+ procs directly too:
206
+
207
+ ```ruby
208
+ Fast.match?(ast, [:send, [ -> (node) { node.type == :send }, [:send, '...'], :c], :d]) # => true
209
+ ```
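To make the idea of shortcuts as procs more concrete, here is a minimal sketch of an array matcher over a stand-in node type. It is not Fast's implementation: `Node`, `SHORTCUTS`, and `toy_match?` are hypothetical names, and the sketch assumes that pattern elements are compared with children position by position, ignoring any extra children.

```ruby
# Stand-in for Parser::AST::Node (assumption: only #type and #children matter here).
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

# Hypothetical table mapping shortcut literals to procs.
SHORTCUTS = {
  '_'   => ->(node) { !node.nil? },
  '...' => ->(node) { node.respond_to?(:children) && node.children.any? }
}.freeze

# Toy matcher: the first pattern element checks the node itself,
# the remaining elements check the children in order.
def toy_match?(node, pattern)
  case pattern
  when Proc   then pattern.call(node)
  when String then SHORTCUTS.fetch(pattern).call(node)
  when Array
    return false unless node.is_a?(Node)
    head, *tail = pattern
    head_ok = head.is_a?(Proc) ? head.call(node) : node.type == head
    head_ok && tail.each_with_index.all? { |sub, i| toy_match?(node.children[i], sub) }
  else
    pattern == node
  end
end

# The `a.b.c.d` call chain from the example above.
ast = s(:send, s(:send, s(:send, s(:send, nil, :a), :b), :c), :d)

puts toy_match?(ast, [:send, [:send, '...'], :d])
puts toy_match?(ast, [:send, [:send, '...'], :c])
puts toy_match?(ast, [:send, [->(node) { node.type == :send }, [:send, '...'], :c], :d])
```

Because a shortcut string is only a lookup key for a proc, passing the proc directly takes the same code path as passing the string.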
210
+
211
+ It also works with expressions:
212
+
213
+ ```ruby
214
+ Fast.match?(
215
+   ast,
216
+   '(send (send (send (send nil $_) $_) $_) $_)'
217
+ ) # => [:a, :b, :c, :d]
218
+ ```
219
+
220
+ If something does not work you can debug with a block:
221
+
222
+ ```ruby
223
+ Fast.debug { Fast.match?(s(:int, 1), [:int, 1]) }
224
+ ```
225
+
226
+ It will output each comparison to stdout:
227
+
228
+ ```
229
+ int == (int 1) # => true
230
+ 1 == 1 # => true
231
+ ```
232
+
233
+ ## Use previous captures in search
234
+
235
+ Imagine you're looking for a method that is just delegating something to
236
+ another method, like:
237
+
238
+ ```ruby
239
+ def name
240
+   person.name
241
+ end
242
+ ```
243
+
244
+ This can be represented as the following AST:
245
+
246
+ ```
247
+ (def :name
248
+   (args)
249
+   (send
250
+     (send nil :person) :name))
251
+ ```
252
+
253
+ Then, let's build a search for methods that call an attribute with the same
254
+ name:
255
+
256
+ ```ruby
257
+ Fast.match?(ast,'(def $_ ... (send (send nil _) \1))') # => [:name]
258
+ ```
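As a rough illustration of what the `\1` back-reference checks, the same test can be written by hand against a stand-in node type. This is only a sketch under assumptions: `Node`, `s`, and `delegated_name` are hypothetical helpers, not part of Fast.

```ruby
# Stand-in for Parser::AST::Node (assumption for this sketch).
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

# Hand-written equivalent of '(def $_ ... (send (send nil _) \1))':
# returns [method_name] when the body calls a method with the same
# name on another receiver, and nil otherwise.
def delegated_name(node)
  return nil unless node.is_a?(Node) && node.type == :def

  method_name, _args, body = node.children
  return nil unless body.is_a?(Node) && body.type == :send

  receiver, called = body.children
  return nil unless receiver.is_a?(Node) && receiver.type == :send
  return nil unless receiver.children.first.nil?

  # This comparison plays the role of `\1`: the called method must
  # equal the first capture (the name of the defined method).
  called == method_name ? [method_name] : nil
end

delegating = s(:def, :name, s(:args), s(:send, s(:send, nil, :person), :name))
other      = s(:def, :name, s(:args), s(:send, s(:send, nil, :person), :age))

p delegated_name(delegating)
p delegated_name(other)
```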
259
+
260
+ ## Fast.search
261
+
262
+ Search allows you to go deep into the AST, collecting nodes that match
263
+ the expression. It also returns captures if they exist.
264
+
265
+ ```ruby
266
+ Fast.search(code('a = 1'), '(int _)') # => s(:int, 1)
267
+ ```
268
+
269
+ If you use captures, it returns the node and the captures respectively:
270
+
271
+ ```ruby
272
+ Fast.search(code('a = 1'), '(int $_)') # => [s(:int, 1), 1]
273
+ ```
274
+
275
+ ## Fast.capture
276
+
277
+ To pick just the captures and ignore the nodes, use `Fast.capture`:
278
+
279
+ ```ruby
280
+ Fast.capture(code('a = 1'), '(int $_)') # => 1
281
+ ```
282
+ ## Fast.replace
283
+
284
+ If you want to refactor the code to use `delegate <attribute>, to: <object>`, try `replace`:
285
+
286
+ ```ruby
287
+ Fast.replace ast,
288
+   '(def $_ ... (send (send nil $_) \1))',
289
+   -> (node, captures) {
290
+     attribute, object = captures
291
+     replace(
292
+       node.location.expression,
293
+       "delegate :#{attribute}, to: :#{object}"
294
+     )
295
+   }
296
+ ```
297
+
298
+ ## Fast.replace_file
299
+
300
+ Now let's imagine we have real files like `sample.rb` with the following code:
301
+
302
+ ```ruby
303
+ def good_bye
304
+   message = ["good", "bye"]
305
+   puts message.join(' ')
306
+ end
307
+ ```
308
+
309
+ And we decide to remove the `message` variable and put it inline with the `puts`.
310
+
311
+ Basically, we need to find the local variable assignment and store its value in
312
+ memory, then remove the assignment expression and use the value where the variable
313
+ is being referenced.
314
+
315
+ ```ruby
316
+ assignment = nil
317
+ Fast.replace_file('sample.rb', '({ lvasgn lvar } message )',
318
+   -> (node, _) {
319
+     if node.type == :lvasgn
320
+       assignment = node.children.last
321
+       remove(node.location.expression)
322
+     elsif node.type == :lvar
323
+       replace(node.location.expression, assignment.location.expression.source)
324
+     end
325
+   }
326
+ )
327
+ ```
328
+
329
+ ## Fast.ast_from_file(file)
330
+
331
+ This method parses the code and loads it into an AST representation.
332
+
333
+ ```ruby
334
+ Fast.ast_from_file('sample.rb')
335
+ ```
336
+
337
+ ## Fast.search_file
338
+
339
+ You can use `search_file` and pass the path to search for expressions inside
340
+ files.
341
+
342
+ ```ruby
343
+ Fast.search_file('file.rb', expression)
344
+ ```
345
+
346
+ It's a simple combination of `Fast.ast_from_file` with `Fast.search`.
347
+
348
+ ## Fast.ruby_files_from(arguments)
349
+
350
+ You'll probably be looking through multiple Ruby files, so this method fetches
351
+ all `.rb` files inside the given paths.
352
+
353
+ ```ruby
354
+ Fast.ruby_files_from(['lib']) # => ["lib/fast.rb"]
355
+ ```
356
+
@@ -0,0 +1,174 @@
1
+ # Research for code similarity
2
+
3
+ This is a small tutorial to explore code similarity.
4
+
5
+ The main idea is to register all expression styles and see if we can find some
6
+ similarity between the structures.
7
+
8
+ First we need to create a function that can analyze AST nodes and extract a
9
+ pattern from the expression.
10
+
11
+ The expression needs to generalize final node values and recursively build a
12
+ pattern that can be used as a search expression.
13
+
14
+ ```ruby
15
+ def expression_from(node)
16
+   case node
17
+   when Parser::AST::Node
18
+     if node.children.any?
19
+       children_expression = node.children
20
+         .map(&method(:expression_from))
21
+         .join(' ')
22
+       "(#{node.type} #{children_expression})"
23
+     else
24
+       "(#{node.type})"
25
+     end
26
+   when nil, 'nil'
27
+     'nil'
28
+   when Symbol, String, Integer
29
+     '_'
30
+   when Array, Hash
31
+     '...'
32
+   else
33
+     node
34
+   end
35
+ end
36
+ ```
37
+
38
+ The generated pattern only makes the search more flexible, allowing us to group similar nodes.
39
+
40
+ Example:
41
+
42
+ ```ruby
43
+ expression_from(code['1']) # =>'(int _)'
44
+ expression_from(code['nil']) # =>'(nil)'
45
+ expression_from(code['a = 1']) # =>'(lvasgn _ (int _))'
46
+ expression_from(code['def name; person.name end']) # =>'(def _ (args) (send (send nil _) _))'
47
+ ```
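The `code` helper used above is not defined in this tutorial. To try the generalization without the parser gem, here is a self-contained sketch that swaps `Parser::AST::Node` for a stand-in `Node` struct and builds the nodes by hand; `Node` and `s` are assumptions made only for this illustration.

```ruby
# Stand-in for Parser::AST::Node (assumption: same #type / #children shape).
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

# Same logic as the method above, with the stand-in node class.
def expression_from(node)
  case node
  when Node
    if node.children.any?
      inner = node.children.map { |child| expression_from(child) }.join(' ')
      "(#{node.type} #{inner})"
    else
      "(#{node.type})"
    end
  when nil, 'nil'
    'nil'
  when Symbol, String, Integer
    '_'        # final values are generalized
  when Array, Hash
    '...'
  else
    node
  end
end

assignment = s(:lvasgn, :a, s(:int, 1))
delegation = s(:def, :name, s(:args), s(:send, s(:send, nil, :person), :name))

puts expression_from(assignment) # prints "(lvasgn _ (int _))"
puts expression_from(delegation) # prints "(def _ (args) (send (send nil _) _))"
```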
48
+
49
+ The current method can translate all kinds of expressions, and the next step is to
50
+ observe some specific node types and try to group the similarities
51
+ using the pattern generated.
52
+
53
+ ```ruby
54
+ Fast.search_file('class', 'lib/fast.rb')
55
+ ```
56
+ Capturing the constant name and filtering only for symbols is easy, and we can
57
+ see that we have a few classes defined in the same file.
58
+
59
+ ```ruby
60
+ Fast.search_file('(class (const nil $_))','lib/fast.rb').grep(Symbol)
61
+ => [:Rewriter,
62
+ :ExpressionParser,
63
+ :Find,
64
+ :FindString,
65
+ :FindWithCapture,
66
+ :Capture,
67
+ :Parent,
68
+ :Any,
69
+ :All,
70
+ :Not,
71
+ :Maybe,
72
+ :Matcher,
73
+ :Experiment,
74
+ :ExperimentFile]
75
+ ```
76
+
77
+ The idea of this inspection is to build a proof of concept to show the similarity
78
+ of matcher classes because they only define a `match?` method.
79
+
80
+ ```ruby
81
+ patterns = Fast.search_file('class','lib/fast.rb').map{|n|Fast.expression_from(n)}
82
+ ```
83
+
84
+ A simple comparison between the patterns size and `.uniq.size` can prove whether
85
+ the idea will work.
86
+
87
+ ```ruby
88
+ patterns.size == patterns.uniq.size
89
+ ```
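The comparison returns `true` when every pattern is unique, that is, when no similarity was found. A small sketch with made-up pattern strings (the values below are illustrative, not output from `lib/fast.rb`) shows the check and how the repeated patterns could be listed:

```ruby
# Hypothetical patterns, as if produced by expression_from.
patterns = ['(int _)', '(str _)', '(int _)']

# true means all patterns differ; false means at least one repeats.
all_unique = patterns.size == patterns.uniq.size

# List the patterns that occur more than once.
repeated = patterns
  .group_by(&:itself)
  .select { |_pattern, group| group.size > 1 }
  .keys

p all_unique # prints false
p repeated   # prints ["(int _)"]
```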
90
+
91
+ It does not work for the matcher cases but we can go deeper and analyze all
92
+ files required by bundler.
93
+
94
+ ```ruby
95
+ similarities = {}
96
+ Gem.find_files('*.rb').each do |file|
97
+   Fast.search_file('',file).map do |n|
98
+     key = Fast.expression_from(n)
99
+     similarities[key] ||= Set.new
100
+     similarities[key] << file
101
+   end
102
+ end
103
+ similarities.delete_if {|k,v|v.size < 2}
104
+ ```
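The grouping step can be tried in isolation. The sketch below uses hand-written `(pattern, file)` pairs instead of real search results, so the file names and observations are assumptions for illustration only; the bookkeeping with `Set` is the same.

```ruby
require 'set'

# Hypothetical observations: which pattern was seen in which file.
observations = [
  ['(class (const nil _) (const nil _) nil)', 'parallel.rb'],
  ['(class (const nil _) (const nil _) nil)', 'parallel.rb'],
  ['(class (const nil _) (const nil _) nil)', 'tsort.rb'],
  ['(class (const nil _) nil nil)',           'ripper.rb']
]

# Each pattern maps to the set of files where it appears;
# the Set ignores repeated hits from the same file.
similarities = Hash.new { |hash, key| hash[key] = Set.new }
observations.each { |pattern, file| similarities[pattern] << file }

# Keep only patterns shared by at least two files.
similarities.delete_if { |_pattern, files| files.size < 2 }

similarities.each do |pattern, files|
  puts "#{pattern} => #{files.to_a.join(', ')}"
end
```

Using a `Set` rather than an array means a pattern that repeats many times inside one file still counts as a single file, so only cross-file similarities survive the final filter.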
105
+ The similarities found are the following:
106
+
107
+ ```ruby
108
+ {"(class (const nil _) (const nil _) nil)"=>
109
+ #<Set: {"/Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/parallel-1.12.1/lib/parallel.rb",
110
+ "/Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/method_source-0.9.0/lib/method_source.rb",
111
+ "/Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/rdoc.rb",
112
+ "/Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/irb.rb",
113
+ "/Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/tsort.rb"}>,
114
+ "(class (const nil _) nil nil)"=>#<Set: {"/Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/ripper.rb", "/Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/cgi.rb"}>}
115
+ ```
116
+
117
+ And now we can test the expression using the command line tool through the files
118
+ and observe the similarity:
119
+
120
+ ```
121
+ ⋊> ~ fast "(class (const nil _) (const nil _) nil)" /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/parallel-1.12.1/lib/parallel.rb /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/method_source-0.9.0/lib/method_source.rb /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/rdoc.rb /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/irb.rb /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/tsort.rb
122
+ ```
123
+
124
+ Output:
125
+
126
+ ```ruby
127
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/parallel-1.12.1/lib/parallel.rb:8
128
+ class DeadWorker < StandardError
129
+ end
130
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/parallel-1.12.1/lib/parallel.rb:11
131
+ class Break < StandardError
132
+ end
133
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/parallel-1.12.1/lib/parallel.rb:14
134
+ class Kill < StandardError
135
+ end
136
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/gems/2.5.0/gems/method_source-0.9.0/lib/method_source.rb:16
137
+ class SourceNotFoundError < StandardError; end
138
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/rdoc.rb:63
139
+ class Error < RuntimeError; end
140
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/irb.rb:338
141
+ class Abort < Exception;end
142
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/tsort.rb:125
143
+ class Cyclic < StandardError
144
+ end
145
+ ```
146
+
147
+ It works and now we can create a method to do what the command line tool did,
148
+ grouping the patterns and inspecting the occurrences.
149
+
150
+ ```ruby
151
+ def similarities.show pattern
152
+   files = self[pattern]
152
+   files.each do |file|
153
+     nodes = Fast.search_file(pattern, file)
154
+     nodes.each do |result|
155
+       Fast.report(result, file: file)
156
+     end
157
+   end
159
+ end
160
+ ```
161
+
162
+ And calling the method to explore some "if" similarities prints the following
163
+ results:
164
+
165
+ ```ruby
166
+ similarities.show "(if (send (const nil _) _ (lvar _)) nil (return (false)))"
167
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/resolv.rb:1248
168
+ return false unless Name === other
169
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/fileutils.rb:138
170
+ return false unless File.exist?(new)
171
+ # /Users/jonatasdp/.rbenv/versions/2.5.1/lib/ruby/2.5.0/matrix.rb:1862
172
+ return false unless Vector === other
173
+ ```
174
+