yaparc 0.0.7 → 0.0.8

Files changed (5)
  1. data/README +172 -8
  2. data/lib/yaparc.rb +402 -69
  3. data/tests/test_calc.rb +42 -105
  4. data/tests/test_parser.rb +304 -67
  5. metadata +2 -2
data/README CHANGED
@@ -1,10 +1,11 @@
1
1
  = Synopsis
2
2
 
3
- This is a yet another simple combinator parser library in ruby.
3
+ There are several implementations of parser combinators in Ruby. This is yet another simple combinator parser library in Ruby.
4
4
 
5
5
  = Requirements
6
6
 
7
7
  * Ruby (http://www.ruby-lang.org/)
8
+ * RubyGems (http://rubyforge.org/projects/rubygems/)
8
9
 
9
10
  = Install
10
11
 
@@ -12,19 +13,182 @@ This is a yet another simple combinator parser library in ruby.
12
13
 
13
14
  = Usage
14
15
 
15
- require 'rubygems'
16
- require_gem 'yaparc'
16
+ In a combinator parser, each parser is constructed as a function that takes an input string as its argument. Larger parsers are built from smaller parsers. Although combinators are higher-order functions in ordinary functional languages, they are implemented as classes in yaparc, because Ruby is more object-oriented than functional.
17
17
 
18
+ Every parser has a 'parse' method, each of which takes the input string as its argument, except SatisfyParser. All of them return an array of arrays as their result: the empty array [] denotes failure, and a singleton array [[v, xs]] indicates success, with value v and unconsumed input xs as a String instance.
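The result convention described above can be read with two small helpers. This is a minimal sketch in plain Ruby; `succeeded?` and `unpack` are hypothetical names introduced here for illustration and are not part of yaparc.

```ruby
# Hypothetical helpers (not part of yaparc) for reading a result that
# follows the convention: [] on failure, [[v, xs]] on success.
def succeeded?(result)
  !result.empty?
end

# Split a successful result into its value and the unconsumed input.
def unpack(result)
  value, rest = result.first
  { value: value, rest: rest }
end

puts succeeded?([[1, "blah"]])  # a success carrying value 1
puts succeeded?([])             # a failure
p unpack([[1, "blah"]])
```

Returning an array rather than a single value or nil keeps failure and success in the same shape, so callers only need to test for emptiness.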
18
19
 
19
- Please look at unit test files.
20
+ == Primitive Parsers
20
21
 
22
+ * SucceedParser
23
+ * FailParser
24
+ * ItemParser
25
+ * SatisfyParser
21
26
 
22
- == Basic Parsers
27
+ === SucceedParser class
23
28
 
24
- == Combination Parsers
29
+ The parser SucceedParser always succeeds with the result value, without consuming any of the input string.
30
+ In the following example, SucceedParser#parse takes an input string "blah, blah, blah" and returns the singleton array [[1, "blah, blah, blah"]].
25
31
 
26
- === Sequence Parser
27
- === Alternate Parser
32
+ parser = SucceedParser.new(1)
33
+ parser.parse("blah, blah, blah")
34
+ => [[1, "blah, blah, blah"]]
28
35
 
36
+ === FailParser class
37
+
38
+ The parser FailParser always fails, regardless of the contents of the input string.
39
+
40
+ parser = FailParser.new
41
+ parser.parse("abc")
42
+ => []
43
+
44
+ === ItemParser class
45
+
46
+ The parser ItemParser fails if the input string is empty, and succeeds with the first character as the result value otherwise.
47
+
48
+ parser = ::Yaparc::ItemParser.new
49
+ parser.parse("abc")
50
+ => [["a", "bc"]]
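To make the behaviour of the first three primitives concrete, here is a plain-Ruby sketch that models them as lambdas. The constants `SUCCEED`, `FAILURE` and `ITEM` are hypothetical names for this illustration only; this is not how yaparc implements its classes.

```ruby
# Sketch only: the three primitives modelled as lambdas, not yaparc's code.

# Always succeeds with the given value and leaves the input untouched.
SUCCEED = ->(value) { ->(input) { [[value, input]] } }

# Always fails, whatever the input is.
FAILURE = ->(_input) { [] }

# Fails on empty input, otherwise yields the first character.
ITEM = ->(input) { input.empty? ? [] : [[input[0], input[1..-1]]] }

p SUCCEED.call(1).call("blah, blah, blah")
p FAILURE.call("abc")
p ITEM.call("abc")
p ITEM.call("")
```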
51
+
52
+ === SatisfyParser class
53
+
54
+ The parser SatisfyParser recognizes a single character of input using a predicate, which decides whether that character is acceptable.
55
+
56
+ is_integer = lambda do |i|
57
+ begin
58
+ Integer(i)
59
+ true
60
+ rescue
61
+ false
62
+ end
63
+ end
64
+ parser = SatisfyParser.new(is_integer)
65
+ parser.parse("123")
66
+ => [["1", "23"]]
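The predicate-based primitive can be sketched in a few lines of plain Ruby. The helper `satisfy` and the constant `IS_DIGIT` are hypothetical names; the sketch assumes, as the example above suggests, that the predicate is applied to the first character only.

```ruby
# Sketch only (not yaparc's code): succeed on the first character
# when the predicate accepts it, fail otherwise.
def satisfy(predicate)
  lambda do |input|
    head = input[0]               # nil when the input is empty
    if head && predicate.call(head)
      [[head, input[1..-1]]]
    else
      []
    end
  end
end

IS_DIGIT = ->(c) { c >= "0" && c <= "9" }

p satisfy(IS_DIGIT).call("123")
p satisfy(IS_DIGIT).call("abc")
p satisfy(IS_DIGIT).call("")
```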
67
+
68
+
69
+ == Combining Parsers
70
+
71
+ * AltParser
72
+ * SeqParser
73
+ * ManyParser
74
+ * ManyOneParser
75
+
76
+
77
+
78
+ === Sequencing parser
79
+
80
+ The SeqParser corresponds to sequencing in BNF. The following parser recognizes whatever Symbol.new('+') and Natural.new would recognize when applied in succession.
81
+
82
+ parser = SeqParser.new(Symbol.new('+'), Natural.new)
83
+ parser.parse("+321")
84
+ => [[321,""]]
85
+
86
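+ If a block is given to SeqParser, the results of the component parsers are passed to the block, and the block's return value becomes the result. This lets you build the logical structure of the input.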
87
+
88
+ parser = SeqParser.new(Symbol.new('+'), Natural.new) do | plus, nat|
89
+ nat
90
+ end
91
+ parser.parse("+1234")
92
+ => [[1234,""]]
93
+
94
+ In this way a parser can produce a parse tree that reflects the semantic structure of the program.
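Sequencing with a block can be sketched as follows. The helpers `seq` and `char` and the constant `NATURAL` are hypothetical stand-ins written for this illustration; returning the last value when no block is given is an assumption that mirrors the first example above, not a statement about yaparc's internals.

```ruby
# Sketch only: run parsers one after another, threading the unconsumed
# input through, then hand the collected values to the block.
def seq(*parsers, &block)
  lambda do |input|
    values = []
    rest = input
    ok = parsers.all? do |parser|
      result = parser.call(rest)
      next false if result.empty?   # one failure fails the whole sequence
      value, rest = result.first
      values << value
      true
    end
    combined = block ? block.call(*values) : values.last
    ok ? [[combined, rest]] : []
  end
end

# Hypothetical stand-ins for Symbol.new('+') and Natural.new.
def char(c)
  ->(input) { input.start_with?(c) ? [[c, input[c.length..-1]]] : [] }
end

NATURAL = lambda do |input|
  digits = input[/\A\d+/]
  digits ? [[digits.to_i, input[digits.length..-1]]] : []
end

PLUS_NAT = seq(char("+"), NATURAL) { |_plus, nat| nat }
p PLUS_NAT.call("+1234")
p PLUS_NAT.call("1234")
```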
95
+
96
+ === Alternation parser
97
+
98
+ The AltParser class is an alternation parser: it returns the result of the first parser that succeeds, and fails if none does.
99
+
100
+
101
+ parser = AltParser.new(
102
+ SeqParser.new(Symbol.new('+'), Natural.new) do | _, nat|
103
+ nat
104
+ end,
105
+ Natural.new
106
+ )
107
+ parser.parse("1234")
108
+ => [[1234,""]]
109
+ parser.parse("-1234")
110
+ => []
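Alternation amounts to trying each parser on the same input and keeping the first non-empty result. The sketch below uses hypothetical names (`alt`, `NAT`, `PLUS_NATURAL`, `SIGNED`) and is not yaparc's implementation.

```ruby
# Sketch only: return the first non-empty result, or [] if all fail.
def alt(*parsers)
  lambda do |input|
    results = parsers.lazy.map { |parser| parser.call(input) }
    results.find { |result| !result.empty? } || []
  end
end

# Hypothetical stand-in for Natural.new.
NAT = lambda do |input|
  digits = input[/\A\d+/]
  digits ? [[digits.to_i, input[digits.length..-1]]] : []
end

# Hypothetical stand-in for the '+' followed by a natural number.
PLUS_NATURAL = lambda do |input|
  input.start_with?("+") ? NAT.call(input[1..-1]) : []
end

SIGNED = alt(PLUS_NATURAL, NAT)
p SIGNED.call("1234")
p SIGNED.call("+1234")
p SIGNED.call("-1234")
```

Using a lazy enumerator means later alternatives are not run once an earlier one has succeeded.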
111
+
112
+
113
+ === ManyParser
114
+
115
+ ManyParser applies the given parser zero or more times.
116
+
117
+ parser = ManyParser.new(SatisfyParser.new(lambda {|i| i >= '0' && i <= '9'}))
118
+ parser.parse("123abc")
119
+ => [["123", "abc"]]
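Repetition can be sketched as a loop that applies a parser until it fails. The names `many` and `DIGIT` are hypothetical; joining the collected values into a string mirrors the example above, and returning an empty string for zero matches is an assumption of this sketch rather than documented yaparc behaviour.

```ruby
# Sketch only: apply the parser repeatedly and collect what it yields.
def many(parser)
  lambda do |input|
    pieces = []
    rest = input
    loop do
      result = parser.call(rest)
      break if result.empty?
      value, remaining = result.first
      # Stop if nothing was consumed, otherwise the loop would never end.
      break if remaining.length == rest.length
      pieces << value
      rest = remaining
    end
    [[pieces.join, rest]]           # zero matches still count as success
  end
end

# Hypothetical single-digit parser.
DIGIT = lambda do |input|
  c = input[0]
  (c && c >= "0" && c <= "9") ? [[c, input[1..-1]]] : []
end

p many(DIGIT).call("123abc")
p many(DIGIT).call("abc")
```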
120
+
121
+
122
+ === ManyOneParser
123
+
124
+ The ManyOneParser requires at least one successful application of the parser.
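The difference from zero-or-more repetition is that the first application must succeed. A minimal sketch, using the hypothetical names `many_one` and `ONE_DIGIT` (not yaparc's code):

```ruby
# Sketch only: one mandatory application, then as many more as match.
def many_one(parser)
  lambda do |input|
    first = parser.call(input)
    if first.empty?
      []                            # no match at all: fail
    else
      value, rest = first.first
      pieces = [value]
      loop do
        result = parser.call(rest)
        break if result.empty?
        value, rest = result.first
        pieces << value
      end
      [[pieces.join, rest]]
    end
  end
end

# Hypothetical single-digit parser; it always consumes one character
# on success, so the loop above terminates.
ONE_DIGIT = lambda do |input|
  c = input[0]
  (c && c >= "0" && c <= "9") ? [[c, input[1..-1]]] : []
end

p many_one(ONE_DIGIT).call("123abc")
p many_one(ONE_DIGIT).call("abc")
```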
125
+
126
+
127
+ == Tokenized parser
128
+
129
+ * Identifier
130
+
131
+ Parser for identifier
132
+
133
+ * Natural
134
+
135
+ Parser for natural number
136
+
137
+ * Symbol
138
+
139
+
140
+ == Define your own parser
141
+
142
+
143
+ There are two ways to construct a parser. One is to inherit from the Yaparc::ParserBase class.
144
+
145
+ class StringMatch < Yaparc::ParserBase
146
+
147
+ def initialize(literal)
148
+ @parser = Token.new(StringParser.new(literal))
149
+ end
150
+ end
151
+
152
+ The other is to inherit from the Yaparc::AbstractParser class.
153
+
154
+ class Identifier < Yaparc::AbstractParser
155
+ def initialize
156
+ @parser = lambda do
157
+ Token.new(Ident.new)
158
+ end
159
+ end
160
+ end
161
+
162
+ If the parser definition refers to its own class recursively, you must use this approach.
163
+ In the following example, note that the Expr class is instantiated inside the Expr#initialize method.
164
+
165
+ class Expr < Yaparc::AbstractParser
166
+ def initialize
167
+ @parser = lambda do
168
+ Yaparc::AltParser.new(
169
+ Yaparc::SeqParser.new(Term.new,
170
+ Yaparc::Symbol.new('+'),
171
+ Expr.new) do |term, _, expr|
172
+ ['+', term,expr]
173
+ end,
174
+ Term.new
175
+ )
176
+ end
177
+ end
+ end
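The reason for wrapping the definition in a lambda can be shown without yaparc at all. In the sketch below, `EagerExpr` and `LazyExpr` are hypothetical classes: the first builds its nested instance immediately and so never finishes constructing, while the second defers construction until the lambda is called. The counter and the `raise` exist only so the demonstration stops cleanly.

```ruby
# Sketch only: eager versus deferred construction of a recursive parser.
class EagerExpr
  @built = 0
  class << self
    attr_accessor :built
  end

  def initialize
    EagerExpr.built += 1
    # Guard added only so this demo stops instead of overflowing the stack.
    raise "runaway construction" if EagerExpr.built > 50
    @parser = EagerExpr.new         # recursion starts before parsing does
  end
end

class LazyExpr
  def initialize
    @parser = lambda { LazyExpr.new }  # nothing is built until called
  end

  def nested
    @parser.call
  end
end

begin
  EagerExpr.new
rescue RuntimeError => e
  puts "eager: #{e.message}"
end

p LazyExpr.new.nested.class
```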
178
+
179
+ When constructing your parsers, note that left recursion leads to non-termination of the parser.
180
+
181
+ == Avoiding left-recursion
182
+
183
+ A ::= A B | C
184
+
185
+ is equivalent to
186
+
187
+ A ::= C B*
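As a worked instance of this rewrite, take the hypothetical grammar expr ::= expr '-' nat | nat. Applying the rule gives expr ::= nat ('-' nat)*, which the plain-Ruby sketch below parses with a loop and folds to the left so that subtraction stays left-associative. `NAT_PARSER` and `EXPR` are names invented for this example.

```ruby
# Sketch only. Left-recursive form:  expr ::= expr '-' nat | nat
# Rewritten form:                    expr ::= nat ('-' nat)*
NAT_PARSER = lambda do |input|
  digits = input[/\A\d+/]
  digits ? [[digits.to_i, input[digits.length..-1]]] : []
end

EXPR = lambda do |input|
  first = NAT_PARSER.call(input)    # the C part
  if first.empty?
    []
  else
    value, rest = first.first
    # The B* part: consume "- nat" while it matches, folding leftwards.
    while rest.start_with?("-")
      step = NAT_PARSER.call(rest[1..-1])
      break if step.empty?
      operand, rest = step.first
      value -= operand
    end
    [[value, rest]]
  end
end

p EXPR.call("10-3-2")
p EXPR.call("7-")
```

Because every iteration consumes at least the '-' and one digit, the loop always terminates, which is exactly what the left-recursive form cannot guarantee.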
188
+
189
+
190
+ == Tokenization
191
+
192
+ To tokenize the input stream, use the Token class.
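One common way to build such a wrapper is to discard whitespace before and after the wrapped parser. The sketch below assumes that behaviour; `token` and `NUMBER` are hypothetical names, and the README does not state exactly what yaparc's Token class skips.

```ruby
# Sketch only, assuming a token ignores surrounding whitespace.
def token(parser)
  lambda do |input|
    result = parser.call(input.lstrip)   # skip leading whitespace
    if result.empty?
      []
    else
      value, rest = result.first
      [[value, rest.lstrip]]             # skip trailing whitespace
    end
  end
end

# Hypothetical natural-number parser.
NUMBER = lambda do |input|
  digits = input[/\A\d+/]
  digits ? [[digits.to_i, input[digits.length..-1]]] : []
end

p token(NUMBER).call("  42  + 1")
p token(NUMBER).call("  x")
```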
29
193
 
30
194