parslet 0.9.0 → 0.10.1
- data/Gemfile +4 -0
- data/HISTORY.txt +24 -1
- data/README +23 -66
- data/Rakefile +10 -6
- data/lib/parslet.rb +50 -137
- data/lib/parslet/atoms.rb +12 -479
- data/lib/parslet/atoms/alternative.rb +40 -0
- data/lib/parslet/atoms/base.rb +196 -0
- data/lib/parslet/atoms/entity.rb +48 -0
- data/lib/parslet/atoms/lookahead.rb +57 -0
- data/lib/parslet/atoms/named.rb +31 -0
- data/lib/parslet/atoms/re.rb +28 -0
- data/lib/parslet/atoms/repetition.rb +58 -0
- data/lib/parslet/atoms/sequence.rb +37 -0
- data/lib/parslet/atoms/str.rb +26 -0
- data/lib/parslet/error_tree.rb +2 -2
- data/lib/parslet/expression.rb +41 -0
- data/lib/parslet/expression/treetop.rb +53 -0
- data/lib/parslet/parser.rb +17 -0
- data/lib/parslet/pattern.rb +22 -12
- data/lib/parslet/pattern/binding.rb +25 -16
- data/lib/parslet/pattern/context.rb +24 -0
- data/lib/parslet/transform.rb +70 -25
- metadata +37 -8
data/Gemfile
CHANGED
data/HISTORY.txt
CHANGED
@@ -1,4 +1,27 @@
-= 0.
+= 0.10.1 / ???
+
+  + Allow match['a-z'], shortcut for match('[a-z]')
+
+  ! Fixed output inconsistencies (behaviour in connection to 'maybe')
+
+= 0.10.0 / 22Nov2010
+
+  + Parslet::Transform now takes a block on initialisation, wherein you can
+    define all the rules directly.
+
+  + Parslet::Transform now only passes a hash to the block during transform
+    when its arity is 1. Otherwise all hash contents as bound as local
+    variables.
+
+  + Both inline and other documentation have been improved.
+
+  + You can now use 'subtree(:x)' to bind any subtree to x during tree pattern
+    matching.
+
+  + Transform classes can now include rules into class definition. This makes
+    Parser and Transformer behave the same.
+
+= 0.9.0 / 28Oct2010
   * More of everything: Examples, documentation, etc...
 
   * Breaking change: Ruby's binary or ('|') is now used for alternatives,
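The 0.10.0 entry about block arity can be pictured with a small stand-alone sketch. This is not parslet's implementation: `BindingContext` and `call_rule_block` are hypothetical names invented for illustration, and exposing the bindings as singleton methods is only an assumption about one way to make hash contents readable like local variables.

```ruby
# Hypothetical helper: exposes each binding as a reader method so a block
# evaluated against it can refer to bindings by bare name.
class BindingContext
  def initialize(bindings)
    bindings.each do |name, value|
      define_singleton_method(name) { value }
    end
  end
end

# Hypothetical dispatcher: a block of arity 1 receives the whole hash,
# any other block is evaluated with the bindings available by name.
def call_rule_block(bindings, &block)
  if block.arity == 1
    block.call(bindings)
  else
    BindingContext.new(bindings).instance_eval(&block)
  end
end

with_hash   = call_rule_block({ :x => "foo" }) { |dict| dict[:x].upcase }
with_locals = call_rule_block({ :x => "foo" }) { x.upcase }
```

Both calls produce the same value; the only difference is how the block reaches the binding `x`.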
data/README
CHANGED
@@ -1,48 +1,17 @@
 INTRODUCTION
 
-
-
-
-
-
-
-
-
-
-
-
-
-My goal here was to see how a parser/parser generator should be constructed to
-allow clean AST construction and good error handling. It seems to me that most
-often, parser generators only handle the success-case and forget about
-debugging and error generation.
-
-More specifically, this library is motivated by one of my compiler projects. I
-started out using 'treetop' (see the link above), but found it unusable. It
-was lacking in
-
-  * error reporting: Hard to see where a grammar fails.
-
-  * stability of generated trees: Intermediary trees were dictated by the
-    grammar. It was hard to define invariants in that system - what was
-    convenient when writing the grammar often wasn't in subsequent stages.
-
-  * clarity of parser code: The parser code is generated and is very hard
-    to read. Add that to the first point to understand my pain.
-
-So parslet tries to be different. It doesn't generate the parser, but instead
-defines it in a DSL which is very close to what you find in [2]. A successful
-parse then generates a parser tree consisting entirely of hashes and arrays
-and strings (read: instable). This parser tree can then be converted to a real
-AST (read: stable) using a pattern matcher that is also part of this library.
-
-Error reporting is another area where parslet excels: It is able to print not
-only the error you are used to seeing ('Parse failed because of REASON at line
-1 and char 2'), but also prints what led to that failure in the form of a
-tree (#error_tree method).
-
-[1] http://en.wikipedia.org/wiki/Parsing_expression_grammar
-[2] http://pdos.csail.mit.edu/~baford/packrat/popl04/peg-popl04.pdf
+Parslet makes developing complex parsers easy. It does so by
+
+  * providing the best *error reporting* possible
+  * *not generating* reams of code for you to debug
+
+Parslet takes the long way around to make *your job* easier. It allows for
+incremental language construction. Often, you start out small, implementing
+the atoms of your language first; _parslet_ takes pride in making this
+possible.
+
+Eager to try this out? Please see the associated web site:
+http://kschiess.github.com/parslet
 
 SYNOPSIS
 
@@ -56,7 +25,7 @@ SYNOPSIS
               str('"').absnt? >> any
             ).repeat.as(:string) >>
             str('"')
-
+
   # Parse the string and capture parts of the interpretation (:string above)
   tree = parser.parse(%Q{
     "This is a \\"String\\" in which you can escape stuff"
@@ -64,36 +33,24 @@ SYNOPSIS
 
   tree # => {:string=>"This is a \\\"String\\\" in which you can escape stuff"}
 
-  # Here's how you can grab results from that tree:
+  # Here's how you can grab results from that tree, two methods:
+
+  # 1)
   Pattern.new(:string => simple(:x)).each_match(tree) do |dictionary|
-    puts "String contents: #{dictionary[:x]}"
+    puts "String contents (method 1): #{dictionary[:x]}"
   end
-
-  # Here's how to transform that tree into something else ----------------------
 
-  #
-
-
-
-
-  transform.
-
-  # Transforms the tree
-  transform.apply(tree)
-
-  # => #<struct StringLiteral text="This is a \\\"String\\\" ... escape stuff">
+  # 2)
+  transform = Parslet::Transform.new do
+    rule(:string => simple(:x)) {
+      puts "String contents (method 2): #{x}" }
+  end
+  transform.apply(tree)
 
 COMPATIBILITY
 
 This library should work with both ruby 1.8 and ruby 1.9.
 
-AUTHORS
-
-My gigantous thanks go to the following cool guys and gals that help make this
-rock:
-
-  Florian Hanke <florian.hanke@gmail.com>
-
 STATUS
 
 On the road to 1.0; improving documentation, packaging and upgrading to rspec2.
data/Rakefile
CHANGED
@@ -18,7 +18,7 @@ spec = Gem::Specification.new do |s|
 
   # Change these as appropriate
   s.name = "parslet"
-  s.version = "0.
+  s.version = "0.10.1"
   s.summary = "Parser construction library with great error reporting in Ruby."
   s.author = "Kaspar Schiess"
   s.email = "kaspar.schiess@absurd.li"
@@ -34,7 +34,7 @@ spec = Gem::Specification.new do |s|
 
   # If you want to depend on other gems, add them here, along with any
   # relevant versions
-
+  s.add_dependency("blankslate", "~> 2.1.2.3")
 
   # If your tests use any gems, include them here
   s.add_development_dependency("rspec")
@@ -60,11 +60,15 @@ end
 
 task :package => :gemspec
 
+require 'sdoc'
+
 # Generate documentation
-Rake::RDocTask.new do |
-
-
-
+Rake::RDocTask.new do |rdoc|
+  rdoc.options << '--fmt' << 'shtml' # explictly set shtml generator
+  rdoc.template = 'direct' # lighter template used on railsapi.com
+  rdoc.main = "README"
+  rdoc.rdoc_files.include("README", "lib/**/*.rb")
+  rdoc.rdoc_dir = "rdoc"
 end
 
 desc 'Clear out RDoc and generated packages'
data/lib/parslet.rb
CHANGED
@@ -4,14 +4,9 @@ require 'stringio'
 #
 # require 'parslet'
 #
-# class MyParser
-# include Parslet
-#
+# class MyParser < Parslet::Parser
 # rule(:a) { str('a').repeat }
-#
-# def parse(str)
-# a.parse(str)
-# end
+# root(:a)
 # end
 #
 # pp MyParser.new.parse('aaaa') # => 'aaaa'
@@ -36,138 +31,19 @@ require 'stringio'
 # and use the second stage to isolate the rest of your code from the changes
 # you've effected.
 #
-# = Language Atoms
-#
-# PEG-style grammars build on a very small number of atoms, or parslets. In
-# fact, only three types of parslets exist. Here's how to match a string:
-#
-# str('a string')
-#
-# This matches the string 'a string' literally and nothing else. If your input
-# doesn't contain the string, it will fail. Here's how to match a character
-# set:
-#
-# match('[abc]')
-#
-# This matches 'a', 'b' or 'c'. The string matched will always have a length
-# of 1; to match longer strings, please see the title below. The last parslet
-# of the three is 'any':
-#
-# any
-#
-# 'any' functions like the dot in regular expressions - it matches any single
-# character.
-#
-# = Combination and Repetition
-#
-# Parslets only get useful when combined to grammars. To combine one parslet
-# with the other, you have 4 kinds of methods available: repeat and maybe, >>
-# (sequence), | (alternation), absnt? and prsnt?.
-#
-# str('a').repeat # any number of 'a's, including 0
-# str('a').maybe # maybe there'll be an 'a', maybe not
-#
-# Parslets can be joined using >>. This means: Match the left parslet, then
-# match the right parslet.
-#
-# str('a') >> str('b') # would match 'ab'
-#
-# Keep in mind that all combination and repetition operators themselves return
-# a parslet. You can combine the result again:
-#
-# ( str('a') >> str('b') ) >> str('c') # would match 'abc'
-#
-# The slash ('|') indicates alternatives:
-#
-# str('a') | str('b') # would match 'a' OR 'b'
-#
-# The left side of an alternative is matched first; if it matches, the right
-# side is never looked at.
-#
-# The absnt? and prsnt? qualifiers allow looking at input without consuming
-# it:
-#
-# str('a').absnt? # will match if at the current position there is an 'a'.
-# str('a').absnt? >> str('b') # check for 'a' then match 'b'
-#
-# This means that the second example will not match any input; when the second
-# part is parsed, the first part has asserted the presence of 'a', and thus
-# str('b') cannot match. The prsnt? method is the opposite of absnt?, it
-# asserts presence.
-#
-# More documentation on these methods can be found in Parslets::Atoms::Base.
-#
-# = Intermediary Parse Trees
-#
-# As you have probably seen above, you can hand input (strings or StringIOs) to
-# your parslets like this:
-#
-# parslet.parse(str)
-#
-# This returns an intermediary parse tree or raises an exception
-# (Parslet::ParseFailed) when the input is not well formed.
-#
-# Intermediary parse trees are essentially just Plain Old Ruby Objects. (PORO
-# technology as we call it.) Parslets try very hard to return sensible stuff;
-# it is quite easy to use the results for the later stages of your program.
-#
-# Here a few examples and what their intermediary tree looks like:
-#
-# str('foo').parse('foo') # => 'foo'
-# (str('f') >> str('o') >> str('o')).parse('foo') # => 'foo'
-#
-# Naming parslets
-#
-# Construction of lambda blocks
-#
-# = Intermediary Tree transformation
-#
-# The intermediary parse tree by itself is most often not very useful. Its
-# form is volatile; changing your parser in the slightest might produce
-# profound changes in the generated trees.
-#
-# Generally you will want to construct a more stable tree using your own
-# carefully crafted representation of the domain. Parslet provides you with
-# an elegant way of transmogrifying your intermediary tree into the output
-# format you choose. This is achieved by transformation rules such as this
-# one:
-#
-# transform.rule(:literal => {:string => :_x}) { |d|
-# StringLit.new(*d.values) }
-#
-# The above rule will transform a subtree looking like this:
-#
-# :literal
-# |
-# :string
-# |
-# "somestring"
-#
-# into just this:
-#
-# StringLit
-# value: "somestring"
-#
-#
-# = Further documentation
-#
-# Please see the examples subdirectory of the distribution for more examples.
-# Check out 'rooc' (github.com/kschiess/rooc) as well - it uses parslet for
-# compiler construction.
-#
 module Parslet
   def self.included(base)
     base.extend(ClassMethods)
   end
 
-  #
-  #
-  #
+  # Raised when the parse failed to match or to consume all its input. It
+  # contains the message that should be presented to the user. If you want to
+  # display more error explanation, you can print the #error_tree that is
   # stored in the parslet. This is a graphical representation of what went
   # wrong.
   #
   # Example:
-  #
+  #
   # begin
   # parslet.parse(str)
   # rescue Parslet::ParseFailed => failure
@@ -181,6 +57,7 @@ module Parslet
     # Define the parsers #root function. This is the place where you start
     # parsing; if you have a rule for 'file' that describes what should be
     # in a file, this would be your root declaration:
+    #
     # class Parser
     # root :file
     # rule(:file) { ... }
@@ -205,9 +82,9 @@ module Parslet
       end
     end
 
-    # Define an entity for the parser. This generates a method of the same
-    # that can be used as part of other patterns. Those methods can be
-    # mixed in your parser class with real ruby methods.
+    # Define an entity for the parser. This generates a method of the same
+    # name that can be used as part of other patterns. Those methods can be
+    # freely mixed in your parser class with real ruby methods.
     #
     # Example:
     #
@@ -233,6 +110,14 @@ module Parslet
     end
   end
 
+  # Allows for delayed construction of #match.
+  #
+  class DelayedMatchConstructor
+    def [](str)
+      Atoms::Re.new("[" + str + "]")
+    end
+  end
+
   # Returns an atom matching a character class. This is essentially a regular
   # expression, but you should only match a single character.
   #
@@ -241,8 +126,10 @@ module Parslet
   # match('[ab]') # will match either 'a' or 'b'
   # match('[\n\s]') # will match newlines and spaces
   #
-  def match(
-
+  def match(str=nil)
+    return DelayedMatchConstructor.new unless str
+
+    return Atoms::Re.new(str)
   end
   module_function :match
 
@@ -263,7 +150,19 @@ module Parslet
     Atoms::Re.new('.')
   end
   module_function :any
-
+
+  # A special kind of atom that allows embedding whole treetop expressions
+  # into parslet construction.
+  #
+  # Example:
+  #
+  # exp(%Q("a" "b"?)) # => returns the same as str('a') >> str('b').maybe
+  #
+  def exp(str)
+    Parslet::Expression.new(str).to_parslet
+  end
+  module_function :exp
+
   # Returns a placeholder for a tree transformation that will only match a
   # sequence of elements. The +symbol+ you specify will be the key for the
   # matched sequence in the returned dictionary.
@@ -292,10 +191,24 @@ module Parslet
     Pattern::SimpleBind.new(symbol)
   end
   module_function :simple
+
+  # Returns a placeholder for tree transformation patterns that will match
+  # any kind of subtree.
+  #
+  # Example:
+  #
+  # { :expression => subtree(:exp) }
+  #
+  def subtree(symbol)
+    Pattern::SubtreeBind.new(symbol)
+  end
+
+  autoload :Expression, 'parslet/expression'
 end
 
 require 'parslet/error_tree'
 require 'parslet/atoms'
 require 'parslet/pattern'
 require 'parslet/pattern/binding'
-require 'parslet/transform'
+require 'parslet/transform'
+require 'parslet/parser'
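The `match(str=nil)` change above relies on a small trick: called without an argument, the method hands back an object whose `[]` finishes the construction, so `match['a-z']` reads like an index expression. A minimal stand-alone sketch of that idea follows. `CharClassAtom` and `DelayedMatchSketch` are hypothetical stand-ins for `Atoms::Re` and `DelayedMatchConstructor`, chosen so the snippet runs without the gem.

```ruby
# Hypothetical stand-in for Atoms::Re: just remembers the pattern string.
CharClassAtom = Struct.new(:pattern)

# Hypothetical stand-in for DelayedMatchConstructor: wraps the argument of
# [] in brackets, as the diff above does.
class DelayedMatchSketch
  def [](str)
    CharClassAtom.new("[" + str + "]")
  end
end

# Without an argument, return the delayed constructor; with one, build the
# atom directly.
def match(str = nil)
  return DelayedMatchSketch.new unless str

  CharClassAtom.new(str)
end

short_form = match['a-z']    # parsed as match()['a-z']
long_form  = match('[a-z]')
```

Under these assumptions both forms build an atom with the same pattern string, which is what the HISTORY entry for 0.10.1 describes as a shortcut.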
data/lib/parslet/atoms.rb
CHANGED
@@ -1,4 +1,7 @@
 module Parslet::Atoms
+  # The precedence module controls parenthesis during the #inspect printing
+  # of parslets. It is not relevant to other aspects of the parsing.
+  #
   module Precedence
     prec = 0
     BASE = (prec+=1) # everything else
@@ -9,484 +12,14 @@ module Parslet::Atoms
     OUTER = (prec+=1) # printing is done here.
   end
 
-
-
-
-
-
-
-
-
-
-      result = apply(io)
-
-      # If we haven't consumed the input, then the pattern doesn't match. Try
-      # to provide a good error message (even asking down below)
-      unless io.eof?
-        # Do we know why we stopped matching input? If yes, that's a good
-        # error to fail with. Otherwise just report that we cannot consume the
-        # input.
-        if cause
-          raise Parslet::ParseFailed, "Unconsumed input, maybe because of this: #{cause}"
-        else
-          error(io, "Don't know what to do with #{io.string[io.pos,100]}")
-        end
-      end
-
-      return flatten(result)
-    end
-
-    def apply(io)
-      # p [:start, self, io.string[io.pos, 10]]
-
-      old_pos = io.pos
-
-      # p [:try, self, io.string[io.pos, 20]]
-      begin
-        r = try(io)
-        # p [:return_from, self, flatten(r)]
-        @last_cause = nil
-        return r
-      rescue Parslet::ParseFailed => ex
-        # p [:failing, self, io.string[io.pos, 20]]
-        io.pos = old_pos; raise ex
-      end
-    end
-
-    def repeat(min=0, max=nil)
-      Repetition.new(self, min, max)
-    end
-    def maybe
-      Repetition.new(self, 0, 1, :maybe)
-    end
-    def >>(parslet)
-      Sequence.new(self, parslet)
-    end
-    def |(parslet)
-      Alternative.new(self, parslet)
-    end
-    def absnt?
-      Lookahead.new(self, false)
-    end
-    def prsnt?
-      Lookahead.new(self, true)
-    end
-    def as(name)
-      Named.new(self, name)
-    end
-
-    def flatten(value)
-      # Passes through everything that isn't an array of things
-      return value unless value.instance_of? Array
-
-      # Extracts the s-expression tag
-      tag, *tail = value
-
-      # Merges arrays:
-      result = tail.
-        map { |e| flatten(e) } # first flatten each element
-
-      case tag
-        when :sequence
-          return flatten_sequence(result)
-        when :maybe
-          return result.first
-        when :repetition
-          return flatten_repetition(result)
-      end
-
-      fail "BUG: Unknown tag #{tag.inspect}."
-    end
-    def flatten_sequence(list)
-      list.inject('') { |r, e| # and then merge flat elements
-        case [r, e].map { |o| o.class }
-          when [Hash, Hash] # two keyed subtrees: make one
-            warn_about_duplicate_keys(r, e)
-            r.merge(e)
-          # a keyed tree and an array (push down)
-          when [Hash, Array]
-            [r] + e
-          when [Array, Hash]
-            r + [e]
-          when [String, String]
-            r << e
-        else
-          if r.instance_of? Hash
-            r # Ignore e, since its not a hash we can merge
-          else
-            e # Whatever e is at this point, we keep it
-          end
-        end
-      }
-    end
-    def flatten_repetition(list)
-      if list.any? { |e| e.instance_of?(Hash) }
-        # If keyed subtrees are in the array, we'll want to discard all
-        # strings inbetween. To keep them, name them.
-        return list.select { |e| e.instance_of?(Hash) }
-      end
-
-      if list.any? { |e| e.instance_of?(Array) }
-        # If any arrays are nested in this array, flatten all arrays to this
-        # level.
-        return list.
-          select { |e| e.instance_of?(Array) }.
-          flatten(1)
-      end
-
-      # If there are only strings, concatenate them and return that.
-      list.inject('') { |s,e| s<<(e||'') }
-    end
-
-    def self.precedence(prec)
-      define_method(:precedence) { prec }
-    end
-    precedence Precedence::BASE
-    def to_s(outer_prec)
-      if outer_prec < precedence
-        "("+to_s_inner(precedence)+")"
-      else
-        to_s_inner(precedence)
-      end
-    end
-    def inspect
-      to_s(Precedence::OUTER)
-    end
-
-    # Cause should return the current best approximation of this parslet
-    # of what went wrong with the parse. Not relevant if the parse succeeds,
-    # but needed for clever error reports.
-    #
-    def cause
-      @last_cause
-    end
-
-    # Error tree returns what went wrong here plus what went wrong inside
-    # subexpressions as a tree. The error stored for this node will be equal
-    # with #cause.
-    #
-    def error_tree
-      Parslet::ErrorTree.new(self) if cause?
-    end
-    def cause?
-      not @last_cause.nil?
-    end
-  private
-    # Report/raise a parse error with the given message, printing the current
-    # position as well. Appends 'at line X char Y.' to the message you give.
-    # If +pos+ is given, it is used as the real position the error happened,
-    # correcting the io's current position.
-    #
-    def error(io, str, pos=nil)
-      pre = io.string[0..(pos||io.pos)]
-      lines = Array(pre.lines)
-
-      if lines.empty?
-        formatted_cause = str
-      else
-        pos = lines.last.length
-        formatted_cause = "#{str} at line #{lines.count} char #{pos}."
-      end
-
-      @last_cause = formatted_cause
-
-      raise Parslet::ParseFailed, formatted_cause, nil
-    end
-    def warn_about_duplicate_keys(h1, h2)
-      d = h1.keys & h2.keys
-      unless d.empty?
-        warn "Duplicate subtrees while merging result of \n #{self.inspect}\nonly the values"+
-          " of the latter will be kept. (keys: #{d.inspect})"
-      end
-    end
-  end
-
-  class Named < Base
-    attr_reader :parslet, :name
-    def initialize(parslet, name)
-      @parslet, @name = parslet, name
-    end
-
-    def apply(io)
-      value = parslet.apply(io)
-
-      produce_return_value value
-    end
-
-    def to_s_inner(prec)
-      "#{name}:#{parslet.to_s(prec)}"
-    end
-
-    def error_tree
-      parslet.error_tree
-    end
-  private
-    def produce_return_value(val)
-      { name => flatten(val) }
-    end
-  end
-
-  class Lookahead < Base
-    attr_reader :positive
-    attr_reader :bound_parslet
-
-    def initialize(bound_parslet, positive=true)
-      # Model positive and negative lookahead by testing this flag.
-      @positive = positive
-      @bound_parslet = bound_parslet
-    end
-
-    def try(io)
-      pos = io.pos
-      begin
-        bound_parslet.apply(io)
-      rescue Parslet::ParseFailed
-        return fail(io)
-      ensure
-        io.pos = pos
-      end
-      return success(io)
-    end
-
-    def fail(io)
-      if positive
-        error(io, "lookahead: #{bound_parslet.inspect} didn't match, but should have")
-      else
-        # TODO: Squash this down to nothing? Return value handling here...
-        return nil
-      end
-    end
-    def success(io)
-      if positive
-        return nil # see above, TODO
-      else
-        error(
-          io,
-          "negative lookahead: #{bound_parslet.inspect} matched, but shouldn't have")
-      end
-    end
-
-    precedence Precedence::LOOKAHEAD
-    def to_s_inner(prec)
-      char = positive ? '&' : '!'
-
-      "#{char}#{bound_parslet.to_s(prec)}"
-    end
-
-    def error_tree
-      bound_parslet.error_tree
-    end
-  end
-
-  class Alternative < Base
-    attr_reader :alternatives
-    def initialize(*alternatives)
-      @alternatives = alternatives
-    end
-
-    def |(parslet)
-      @alternatives << parslet
-      self
-    end
-
-    def try(io)
-      alternatives.each { |a|
-        begin
-          return a.apply(io)
-        rescue Parslet::ParseFailed => ex
-        end
-      }
-      # If we reach this point, all alternatives have failed.
-      error(io, "Expected one of #{alternatives.inspect}.")
-    end
-
-    precedence Precedence::ALTERNATE
-    def to_s_inner(prec)
-      alternatives.map { |a| a.to_s(prec) }.join(' | ')
-    end
-
-    def error_tree
-      Parslet::ErrorTree.new(self, *alternatives.
-        map { |child| child.error_tree })
-    end
-  end
-
-  # A sequence of parslets, matched from left to right. Denoted by '>>'
-  #
-  class Sequence < Base
-    attr_reader :parslets
-    def initialize(*parslets)
-      @parslets = parslets
-    end
-
-    def >>(parslet)
-      @parslets << parslet
-      self
-    end
-
-    def try(io)
-      [:sequence]+parslets.map { |p|
-        # Save each parslet as potentially offending (raising an error).
-        @offending_parslet = p
-        p.apply(io)
-      }
-    rescue Parslet::ParseFailed
-      error(io, "Failed to match sequence (#{self.inspect})")
-    end
-
-    precedence Precedence::SEQUENCE
-    def to_s_inner(prec)
-      parslets.map { |p| p.to_s(prec) }.join(' ')
-    end
-
-    def error_tree
-      Parslet::ErrorTree.new(self).tap { |t|
-        t.children << @offending_parslet.error_tree if @offending_parslet }
-    end
-  end
-
-  class Repetition < Base
-    attr_reader :min, :max, :parslet
-    def initialize(parslet, min, max, tag=:repetition)
-      @parslet = parslet
-      @min, @max = min, max
-      @tag = tag
-    end
-
-    def try(io)
-      occ = 0
-      result = [@tag] # initialize the result array with the tag (for flattening)
-      loop do
-        begin
-          result << parslet.apply(io)
-          occ += 1
-
-          # If we're not greedy (max is defined), check if that has been
-          # reached.
-          return result if max && occ>=max
-        rescue Parslet::ParseFailed => ex
-          # Greedy matcher has produced a failure. Check if occ (which will
-          # contain the number of sucesses) is in {min, max}.
-          # p [:repetition, occ, min, max]
-          error(io, "Expected at least #{min} of #{parslet.inspect}") if occ < min
-          return result
-        end
-      end
-    end
-
-    precedence Precedence::REPETITION
-    def to_s_inner(prec)
-      minmax = "{#{min}, #{max}}"
-      minmax = '?' if min == 0 && max == 1
-
-      parslet.to_s(prec) + minmax
-    end
-
-    def cause
-      # Either the repetition failed or the parslet inside failed to repeat.
-      super || parslet.cause
-    end
-    def error_tree
-      if cause?
-        Parslet::ErrorTree.new(self, parslet.error_tree)
-      else
-        parslet.error_tree
-      end
-    end
-  end
-
-  # Matches a special kind of regular expression that only ever matches one
-  # character at a time. Useful members of this family are: character ranges,
-  # \w, \d, \r, \n, ...
-  #
-  class Re < Base
-    attr_reader :match
-    def initialize(match)
-      @match = match
-    end
-
-    def try(io)
-      r = Regexp.new(match, Regexp::MULTILINE)
-      s = io.read(1)
-      error(io, "Premature end of input") unless s
-      error(io, "Failed to match #{match.inspect[1..-2]}") unless s.match(r)
-      return s
-    end
-
-    def to_s_inner(prec)
-      match.inspect[1..-2]
-    end
-  end
-
-  # Matches a string of characters.
-  #
-  class Str < Base
-    attr_reader :str
-    def initialize(str)
-      @str = str
-    end
-
-    def try(io)
-      old_pos = io.pos
-      s = io.read(str.size)
-      error(io, "Premature end of input") unless s && s.size==str.size
-      error(io, "Expected #{str.inspect}, but got #{s.inspect}", old_pos) \
-        unless s==str
-      return s
-    end
-
-    def to_s_inner(prec)
-      "'#{str}'"
-    end
-  end
-
-  # This wraps pieces of parslet definition and gives them a name. The wrapped
-  # piece is lazily evaluated and cached. This has two purposes:
-  #
-  # a) Avoid infinite recursion during evaluation of the definition
-  #
-  # b) Be able to print things by their name, not by their sometimes
-  # complicated content.
-  #
-  # You don't normally use this directly, instead you should generated it by
-  # using the structuring method Parslet#rule.
-  #
-  class Entity < Base
-    attr_reader :name, :context, :block
-    def initialize(name, context, block)
-      super()
-
-      @name = name
-      @context = context
-      @block = block
-    end
-
-    def try(io)
-      parslet.apply(io)
-    end
-
-    def parslet
-      @parslet ||= context.instance_eval(&block).tap { |p|
-        raise_not_implemented unless p
-      }
-    end
-
-    def to_s_inner(prec)
-      name.to_s.upcase
-    end
-
-    def error_tree
-      parslet.error_tree
-    end
-
-  private
-    def raise_not_implemented
-      trace = caller.reject {|l| l =~ %r{#{Regexp.escape(__FILE__)}}} # blatantly stolen from dependencies.rb in activesupport
-      exception = NotImplementedError.new("rule(#{name.inspect}) { ... } returns nil. Still not implemented, but already used?")
-      exception.set_backtrace(trace)
-
-      raise exception
-    end
-  end
+  autoload :Base, 'parslet/atoms/base'
+  autoload :Named, 'parslet/atoms/named'
+  autoload :Lookahead, 'parslet/atoms/lookahead'
+  autoload :Alternative, 'parslet/atoms/alternative'
+  autoload :Sequence, 'parslet/atoms/sequence'
+  autoload :Repetition, 'parslet/atoms/repetition'
+  autoload :Re, 'parslet/atoms/re'
+  autoload :Str, 'parslet/atoms/str'
+  autoload :Entity, 'parslet/atoms/entity'
 end
 
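The comment added to the Precedence module says it only controls parentheses during #inspect printing. The removed `to_s`/`to_s_inner` pair shows the rule: a node wraps itself in parentheses when the precedence handed down by its parent is lower than its own. Below is a stand-alone sketch of that rule. The module and class names (`PrecSketch`, `Node`, `Lit`, `Seq`, `Alt`) are hypothetical, and the ordering of the constants between BASE and OUTER is an assumption, since the diff does not show those lines.

```ruby
# Assumed ordering: only BASE (first) and OUTER (last) are visible in the diff.
module PrecSketch
  prec = 0
  BASE      = (prec += 1)
  SEQUENCE  = (prec += 1)
  ALTERNATE = (prec += 1)
  OUTER     = (prec += 1)
end

class Node
  # Parenthesize when the enclosing node's precedence is lower than ours.
  def to_s(outer_prec)
    inner = to_s_inner(precedence)
    outer_prec < precedence ? "(" + inner + ")" : inner
  end

  def inspect
    to_s(PrecSketch::OUTER)
  end
end

class Lit < Node
  def initialize(str); @str = str; end
  def precedence; PrecSketch::BASE; end
  def to_s_inner(_prec); "'#{@str}'"; end
end

class Seq < Node
  def initialize(*parts); @parts = parts; end
  def precedence; PrecSketch::SEQUENCE; end
  def to_s_inner(prec); @parts.map { |p| p.to_s(prec) }.join(' '); end
end

class Alt < Node
  def initialize(*parts); @parts = parts; end
  def precedence; PrecSketch::ALTERNATE; end
  def to_s_inner(prec); @parts.map { |p| p.to_s(prec) }.join(' | '); end
end

seq_of_alt = Seq.new(Alt.new(Lit.new('a'), Lit.new('b')), Lit.new('c'))
alt_of_seq = Alt.new(Seq.new(Lit.new('a'), Lit.new('b')), Lit.new('c'))

puts seq_of_alt.inspect  # ('a' | 'b') 'c'
puts alt_of_seq.inspect  # 'a' 'b' | 'c'
```

An alternative nested inside a sequence needs parentheses to print unambiguously, while a sequence nested inside an alternative does not; parsing behaviour is unaffected either way.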