ast_ast 0.0.0 → 0.1.0

data/LICENSE CHANGED
@@ -1,4 +1,4 @@
- Copyright (c) 2009 Joshua Hawxwell
+ Copyright (c) 2009-2010 Joshua Hawxwell
 
  Permission is hereby granted, free of charge, to any person obtaining
  a copy of this software and associated documentation files (the
data/README.md CHANGED
@@ -1,50 +1,120 @@
  # AstAst
 
 
- sSSSSs
- saaAAA Tttttts
- sa tT t TT tt
- saaaaaaA t tT t TT Ts
- sa tt T tT t TT Ts
- AaaaaaaAaaaaAAt TsssssTs
- tT t tSTSsssSTt tt
- t tt t tt
- /t tt /t tt
- ( t tt ( t tt
- \t tt \t tt
- t tt t tt
- t tt\ t tts
- S tS.`. S tS ss
- tsssst-' tsssstSSS
+ sSSSSs
+ saaAAA Tttttts
+ sa tT t TT tt
+ saaaaaaA t tT t TT Ts
+ sa tt T tT t TT Ts
+ - - AaaaaaaAaaaaAAt TsssssTs
+ tT t tSTSsssSTt tt
+ t tt t tt
+ st tt st tt
+ S t tt S t tt
+ st tt st tt
+ t tt t tt
+ t tts t tts
+ S tS s S tS ss
+ tsssstss tsssstSSS
+
 
 
+ ## How To
+ ### String -> Ast::Tokens
 
+ So you have a string, e.g.:
 
- __VERY IMPORTANT:__ it is probably a very bad idea to use this in something that relies on it. It will change without warning!
+ an example String, lorem!
+
+ And you want to turn it into a set of tokens, for some reason, but can't be bothered messing around with `strscan`, so instead you use `Ast::Tokeniser`:
+
+     string = "an example String, lorem!"
+
+     class StringTokens < Ast::Tokeniser
+
+       # A rule uses a regular expression to match against the given string;
+       # if it matches, a token is created with the name given, e.g. +:article+.
+       rule :article, /an|a|the/
+       rule :word, /[a-z]+/
+       rule :punct, /,|\.|!/
+
+       # A rule can be passed a block that modifies the match and returns
+       # something new in its place; here we are removing the capital.
+       rule :pronoun, /[A-Z][a-z]+/ do |i|
+         i.downcase
+       end
+     end
+
+     StringTokens.tokenise(string)
+     #=> #< [0] <:article, "an">, <:word, "example">, <:pronoun, "string">, <:punct, ",">, <:word, "lorem">, <:punct, "!"> >
+
+
+ ### Ast::Tokens -> Ast::Tree
+
+ Later.
 
  ## Goals/Ideas
 
- Crazy simple string -> token converting, using regular expression rules and optional blocks. Some of the finer points of this still need working out, mainly should you be able to affect the name of the token within the block.
+ Now that it is possible to take a string and turn it into a set of tokens, I want to be able to take the tokens and turn them into a tree structure. This should be easy to write using a similar DSL to Tokeniser. See below for an idea of how this might be done, though of course when I start writing it, it will change _a lot_.
+
+ ### Ast::Ast
+
+ Imagine we have a string:
 
- class MyTokeniser < Ast::Tokeniser
-   rule :long, /--[a-zA-Z0-9]+/
-   rule :short, /-[a-zA-Z0-9]+/
-   rule :word, /[a-zA-Z0-9]+/
+     string = <<EOS
+     def method
+       print 'hi'
  end
- input = "--along -sh aword"
- MyTokeniser.tokenise(input)
- #=> #<Ast::Tokens [[:long, "--along"], [:short, "-sh"], [:word, "aword"]]>
+     EOS
+
+ Which becomes these tokens:
+
+     tokens #=> [:defn], [:id, 'method'], [:id, 'print'], [:string, 'hi'], [:end]
+
+ We're looking for a tree like this:
+
+     tree #=> [:defn, 'method', [
+                [:id, 'print', [
+                  [:string, 'hi']
+                ]]
+              ]]
+
+ Then the class could look something like this ("something" being the key word):
+
+     class MyAst < Ast::Ast
 
- # Use blocks to change results, passes matches
- class MyTokeniser < Ast::Tokeniser
-   rule :long, /--([a-zA-Z0-9]+)/ {|i| i[1]}
-   rule :short, /-([a-zA-Z0-9]+)/ {|i| i[1].split} # creates an array so splits into multiple tokens
-   rule :word, /[a-zA-Z0-9]+/
+       # create a defn token
+       token :defn do
+         [
+           # start with :defn
+           :defn,
+
+           # get the name of the method by reading the next :id
+           read_next(:id), # if not :id, throw an error
+
+           # read the rest of the block, until the matching :end
+           [read_until(:end)]
+         ]
+       end
+
+       # allows you to use the name given in place of a list of token names
+       group :literal, [:string, :integer, :float]
+       # really just creates an array which responds to the name given
+       group :defined, [:print, :puts, :putc, :gets, :getc]
+
+       token :id do |t, v|
+         case v
+         when 'method'
+           v
+         when :defined
+           [:call, v, [read_next(:literal)]]
+         else
+           [t, v]
+         end
+       end
+
+       token(:string) {|i| i } # not really necessary
  end
- input = "--along -sh aword"
- MyTokeniser.tokenise(input)
- #=> #<Ast::Tokens [[:long, "along"], [:short, "s"], [:short, "h"], [:word, "aword"]]>
-
 
  ## Copyright
 
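The rule-based tokenising the README describes can be approximated with `strscan` from Ruby's standard library. The following is only a sketch of the idea (the rule table, `tokenise` helper, and lambda transforms are illustrative stand-ins, not the gem's actual `Ast::Tokeniser` implementation):

```ruby
require 'strscan'

# Hypothetical stand-in for Ast::Tokeniser: each rule pairs a token name
# with a regex; an optional lambda plays the role of a rule's block.
RULES = [
  [:article, /an|a|the/],
  [:pronoun, /[A-Z][a-z]+/, ->(m) { m.downcase }], # block-style transform
  [:word,    /[a-z]+/],
  [:punct,   /,|\.|!/]
]

def tokenise(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    scanner.skip(/\s+/) and next          # ignore whitespace between tokens
    matched = RULES.any? do |name, regex, transform|
      if (m = scanner.scan(regex))        # try each rule at the current position
        tokens << [name, transform ? transform.call(m) : m]
      end
    end
    scanner.getch unless matched          # skip characters no rule matches
  end
  tokens
end

p tokenise("an example String, lorem!")
#=> [[:article, "an"], [:word, "example"], [:pronoun, "string"], [:punct, ","], [:word, "lorem"], [:punct, "!"]]
```

Note that rule order matters here, just as in the README example: `:article` must be tried before `:word`, or "an" would be consumed as a word.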
data/Rakefile CHANGED
@@ -1,53 +1,10 @@
- require 'rubygems'
  require 'rake'
 
- begin
-   require 'jeweler'
-   Jeweler::Tasks.new do |gem|
-     gem.name = "ast_ast"
-     gem.summary = %Q{String -> Tokens -> AST}
-     gem.description = %Q{Easily convert strings to Tokens and then on to an Abstract Syntax Tree easily. (Very far from finished!)}
-     gem.email = "m@hawx.me"
-     gem.homepage = "http://github.com/hawx/ast_ast"
-     gem.authors = ["Joshua Hawxwell"]
-     gem.add_development_dependency "thoughtbot-shoulda", ">= 0"
-     gem.add_development_dependency "yard", ">= 0"
-     # gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
-   end
-   Jeweler::GemcutterTasks.new
- rescue LoadError
-   puts "Jeweler (or a dependency) not available. Install it with: gem install jeweler"
- end
-
- require 'rake/testtask'
- Rake::TestTask.new(:test) do |test|
-   test.libs << 'lib' << 'test'
-   test.pattern = 'test/**/test_*.rb'
-   test.verbose = true
- end
-
- begin
-   require 'rcov/rcovtask'
-   Rcov::RcovTask.new do |test|
-     test.libs << 'test'
-     test.pattern = 'test/**/test_*.rb'
-     test.verbose = true
-   end
- rescue LoadError
-   task :rcov do
-     abort "RCov is not available. In order to run rcov, you must: sudo gem install spicycode-rcov"
-   end
- end
-
- task :test => :check_dependencies
-
- task :default => :test
+ require File.expand_path('../lib/ast_ast/version', __FILE__)
 
- begin
-   require 'yard'
-   YARD::Rake::YardocTask.new
- rescue LoadError
-   task :yardoc do
-     abort "YARD is not available. In order to run yardoc, you must: sudo gem install yard"
+ namespace :release do
+   task :tag do
+     system("git tag v#{Ast::VERSION}")
+     system('git push origin --tags')
    end
  end
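The new Rakefile requires `lib/ast_ast/version`, which this diff does not show; presumably it only defines the constant used by `release:tag`. A hypothetical sketch of that file and the tag name it produces:

```ruby
# Hypothetical contents of lib/ast_ast/version.rb — the diff only shows
# it being required, so the exact file is an assumption.
module Ast
  VERSION = '0.1.0'
end

# The release:tag task builds its git tag name from the constant:
tag = "v#{Ast::VERSION}"
puts tag  # prints v0.1.0
```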
data/lib/ast_ast.rb CHANGED
@@ -1,10 +1,10 @@
- # Loads everything
- $:.unshift File.dirname(__FILE__)
+ $: << File.dirname(__FILE__)
 
  require 'strscan'
 
- require 'ast_ast/ast'
- require 'ast_ast/tree'
+ # require 'ast_ast/ast'
+ # require 'ast_ast/tree'
+
  require 'ast_ast/tokeniser'
- require 'ast_ast/tokens'
- require 'ast_ast/token'
+ require 'ast_ast/token'
+ require 'ast_ast/tokens'
data/lib/ast_ast/ast.rb CHANGED
@@ -1,5 +1,166 @@
  module Ast
    class Ast
+     attr_accessor :tokens, :token_descs, :block_descs, :groups
+
+     # @see Ast::Ast#token
+     class TokenDesc
+       attr_accessor :name, :block
+
+       def initialize(name, &block)
+         @name = name
+         @block = block || Proc.new {|t| t}
+       end
+     end
+
+     # @see Ast::Ast#group
+     class Group
+       attr_accessor :name, :items
+
+       def initialize(name, items)
+         @name = name
+         @items = items
+       end
+
+       # @see Array#include?
+       def include?(arg)
+         @items.include?(arg)
+       end
+     end
+
+     # @see Ast::Ast#block
+     class BlockDesc
+       attr_accessor :open, :close, :block
+
+       def initialize(open, close, &block)
+         @open, @close = open, close
+         @block = block || Proc.new {|b| b}
+       end
+     end
+
+     # Creates a new token within the subclass. The block is executed
+     # when the token is found during the execution of #astify.
+     #
+     # @example
+     #
+     #   class TestAst < Ast::Ast
+     #     token :test do
+     #       p 'test'
+     #     end
+     #   end
+     #
+     def self.token(name, &block)
+       @token_descs ||= []
+       @token_descs << TokenDesc.new(name, &block)
+     end
+
+     # Creates a new group of token types; this allows you to refer
+     # to multiple tokens easily.
+     #
+     # @example
+     #
+     #   group :names, [:john, :dave, :josh]
+     #
+     def self.group(name, items)
+       @groups ||= []
+       @groups << Group.new(name, items)
+     end
+
+     # Creates a block which begins with a certain token and ends with
+     # a different token.
+     #
+     # @example
+     #
+     #   block :begin => :end do |r|
+     #     ...
+     #   end
+     #
+     def self.block(t, &block)
+       @block_descs ||= []
+       @block_descs << BlockDesc.new(t.keys[0], t.values[0], &block)
+     end
+
+
+     def self.astify(tokens)
+       @tokens = tokens
+       t = find_block
+       t = run_tokens(t, @token_descs)
+     end
+
+     def self.run_tokens(tok, descs)
+       r = []
+       @curr_tree = tok
+
+       until tok.eot?
+         i = tok.scan
+         case i
+         when Token
+           # run the token
+           _desc = descs.find_all {|j| j.name == i.type}[0]
+           if _desc
+             r << _desc.block.call(i)
+           else
+             r << i
+           end
+         when Tree
+           # run the whole branch
+           r << run_tokens(i, descs)
+         end
+       end
+
+       r
+     end
+
+     def self.find_block(curr_desc=nil)
+       body = Tree.new
+
+       until @tokens.eot?
+         # Check if closes current search
+         if curr_desc && curr_desc.close == @tokens.curr_item.type
+           @tokens.inc
+           return body
+
+         # Check if close token in wrong place
+         elsif @block_descs.map(&:close).include?(@tokens.curr_item.type)
+           raise "close found before open: #{@tokens.curr_item}"
+
+         # Check if open token
+         elsif @block_descs.map(&:open).include?(@tokens.curr_item.type)
+           _desc = @block_descs.find_all {|i| i.open == @tokens.curr_item.type }[0]
+           @tokens.inc
+           found = find_block(_desc)
+           body << Tree.new(_desc.block.call(found))
+
+         # Otherwise add to body, and start with next token
+         else
+           body << @tokens.curr_item
+           @tokens.inc
+         end
+       end
+
+       body
+     end
+
+     # @group For #token block
+
+     # @see Tokens#scan
+     def self.scan(type=nil)
+       @curr_tree.scan(type)
+     end
+
+     # @see Tokens#check
+     def self.check(type=nil)
+       @curr_tree.check(type)
+     end
+
+     # @see Tokens#scan_until
+     def self.scan_until(type)
+       @curr_tree.scan_until(type)
+     end
+
+     # @return [Ast::Tree] current tree being read
+     def self.curr_tree
+       @curr_tree
+     end
 
    end
- end
+ end
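The heart of the new `find_block` is a recursive pairing of open and close tokens into nested trees. A minimal self-contained sketch of the same idea, using plain arrays and a hard-coded `:begin`/`:end` pair in place of the gem's `Tokens`/`Tree` classes and registered block descriptions:

```ruby
# Sketch of find_block's recursion: an open token starts a nested body,
# the matching close token returns it, and everything else is appended.
def build_tree(tokens, open = :begin, close = :end)
  body = []
  until tokens.empty?
    tok = tokens.shift
    case tok
    when close then return body                               # closes the current search
    when open  then body << build_tree(tokens, open, close)   # recurse into nested block
    else            body << tok                               # plain token, add to body
    end
  end
  body
end

p build_tree([:a, :begin, :b, :begin, :c, :end, :end, :d])
#=> [:a, [:b, [:c]], :d]
```

The real method additionally raises when a close token appears before any open, which this sketch omits.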
@@ -0,0 +1,187 @@
+ $: << File.dirname(__FILE__)
+ require 'token'
+
+ module Ast
+
+   # Allows you to describe the tree using BNF-style syntax.
+   #
+   # In normal BNF you would write something like:
+   #
+   #   <LETTER> ::= a|b|c|d|...|X|Y|Z
+   #   <WORD>   ::= <WORD><LETTER>|<LETTER>
+   #   <QUOTE>  ::= '
+   #   <STRING> ::= <QUOTE><WORD><QUOTE>
+   #
+   # With Ast::BNF, assuming you have the correct tokens, it would
+   # become:
+   #
+   #   define "Word", ["Word", :letter], :letter
+   #   define "String", [:quote, "Word", :quote]
+   #
+   class BNF
+     attr_accessor :tokens, :defs
+
+     class Definition
+       attr_accessor :name, :rules
+
+       def initialize(name, rules, klass)
+         @name = name
+         @rules = rules.map {|i| i.is_a?(Array) ? i : [i] }
+         @klass = klass
+       end
+
+       # Gets the order of the Definition; this does require
+       # access to the other definitions. Here's why:
+
+       # The order of a definition is basically how many (max)
+       # times you would have to loop through to get to a
+       # terminal rule. So from the example below,
+       #
+       #   <LETTER> ::= a|b|c|d|...|X|Y|Z       #=> terminal
+       #   <WORD>   ::= <WORD><LETTER>|<LETTER> #=> 1st order
+       #   <STRING> ::= '<WORD>'                #=> 2nd order
+       #
+       # Here it is easy to see that <LETTER> is terminal; no
+       # other rule has to be looked at to determine whether
+       # something is a <LETTER>. For a <WORD> you have to look
+       # at the <LETTER> definition, so this is 1st order. And
+       # for <STRING>, you need to look at <WORD>, which in turn
+       # looks at <LETTER>, so you are going back 2 steps.
+       #
+       # @return [Integer] order of definition
+       #
+       def order
+         if terminal?
+           0
+         elsif self_referential?
+           1
+         else
+           r = 0
+           @rules.each do |rule|
+             # Only interested in rules with recursion
+             if rule.size > 1
+               rule.each do |elem|
+                 # Only interested in references
+                 if elem.is_a? String
+                   b = @klass.defs.find_all {|i| i.name == elem}[0].order + 1
+                   r = b if b > r # swap if higher
+                 end
+               end
+             end
+           end
+           r
+         end
+       end
+
+       # A terminal definition does not reference any other
+       # definitions. This is largely irrelevant, as Ast::Tokeniser
+       # should take care of it, but it may be useful in some
+       # cases.
+       #
+       # @return [Boolean] whether it contains just terminal elements
+       #
+       def terminal?
+         @rules.each do |r|
+           if r.is_a? Array
+             r.each do |i|
+               return false if i.is_a? String
+             end
+           end
+         end
+         true
+       end
+
+       # A Definition is self-referential if the only reference to
+       # another rule is to itself, or if the other references are
+       # to terminal rules.
+       #
+       # This is not a perfect definition of what "self-referential"
+       # really means, but it does help when finding the order!
+       #
+       # @return [Boolean] whether the definition is self-referential
+       #
+       def self_referential?
+         r = false
+         @rules.each do |rule|
+           rule.each do |elem|
+             if elem == @name
+               r = true
+             else
+               k = @klass.defs.find_all {|i| i.name == elem}[0]
+               if k && k.terminal?
+                 r = true
+               else
+                 return false
+               end
+             end
+           end
+         end
+         r
+       end
+
+       def inspect; "#<Ast::BNF::Definition #{@name}>"; end
+
+     end
+
+     def initialize(name, &block)
+       @block = block
+     end
+
+     def to_tree(tokens)
+       self.instance_eval(&@block)
+
+       # group the defs by their order
+       defs_orders = @defs.collect {|i| [i.order, i]}
+       ordered_defs = []
+       defs_orders.each do |i|
+         ordered_defs[i[0]] ||= []
+         ordered_defs[i[0]] << i[1]
+       end
+
+       result = []
+       ordered_defs.each do |order|
+
+         order.each do |definition|
+           c = tokens.scan
+
+           definition.rules.each do |rule|
+             list = tokens.peek(rule.size)
+
+             res = []
+             rule.zip(list) do |(a, b)|
+               next if b.nil?
+               if a == b.type
+                 res << b.value
+               end
+             end
+             next if res.size != rule.size
+             p [definition.name, res.join('')]
+           end
+         end
+       end
+
+       tokens
+     end
+
+     def define(name, *args)
+       @defs ||= []
+       @defs << Definition.new(name, args, self)
+     end
+
+   end
+ end
+
+
+ # This is here for testing only! A better name is required.
+ def bnf_definition(name, &block)
+   Ast::BNF.new(name, &block)
+ end
+
+ test = bnf_definition('hello') do
+   define "Digit", :number
+   define "Letter", :letter
+   define "Number", ["Number", "Digit"], "Digit"
+   define "Word", ["Word", "Letter"], "Letter"
+   define "String", [:quote, "Word", :quote]
+ end
+ p test.to_tree Ast::Tokens.new([[:letter, 'a'], [:letter, 'b'], [:number, '5'], [:number, '9']])
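The "order" idea documented in `Ast::BNF::Definition#order` can be shown with a self-contained sketch. The `DEFS` hash and `order` helper below are illustrative assumptions, not the gem's code: strings reference other definitions, symbols are terminal token types, and a definition's order is one more than the deepest definition it references.

```ruby
# Hypothetical rule table mirroring the bnf_definition example above:
# each value is a list of rules; strings refer to other definitions.
DEFS = {
  "Digit"  => [[:number]],
  "Letter" => [[:letter]],
  "Number" => [["Number", "Digit"], ["Digit"]],
  "Word"   => [["Word", "Letter"], ["Letter"]],
  "String" => [[:quote, "Word", :quote]]
}

def order(name, defs = DEFS)
  # references to other definitions, ignoring self-references
  refs = defs[name].flatten.select {|e| e.is_a?(String) && e != name }
  return 0 if refs.empty?                 # terminal definition
  1 + refs.map {|r| order(r, defs) }.max  # one step past the deepest reference
end

p order("Letter")  #=> 0  (terminal)
p order("Word")    #=> 1  (looks at Letter)
p order("String")  #=> 2  (looks at Word, which looks at Letter)
```

This matches the `<LETTER>`/`<WORD>`/`<STRING>` walkthrough in the docstring, though the gem's own algorithm handles self-referential rules slightly differently.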