parslet 0.9.0 → 0.10.1
- data/Gemfile +4 -0
- data/HISTORY.txt +24 -1
- data/README +23 -66
- data/Rakefile +10 -6
- data/lib/parslet.rb +50 -137
- data/lib/parslet/atoms.rb +12 -479
- data/lib/parslet/atoms/alternative.rb +40 -0
- data/lib/parslet/atoms/base.rb +196 -0
- data/lib/parslet/atoms/entity.rb +48 -0
- data/lib/parslet/atoms/lookahead.rb +57 -0
- data/lib/parslet/atoms/named.rb +31 -0
- data/lib/parslet/atoms/re.rb +28 -0
- data/lib/parslet/atoms/repetition.rb +58 -0
- data/lib/parslet/atoms/sequence.rb +37 -0
- data/lib/parslet/atoms/str.rb +26 -0
- data/lib/parslet/error_tree.rb +2 -2
- data/lib/parslet/expression.rb +41 -0
- data/lib/parslet/expression/treetop.rb +53 -0
- data/lib/parslet/parser.rb +17 -0
- data/lib/parslet/pattern.rb +22 -12
- data/lib/parslet/pattern/binding.rb +25 -16
- data/lib/parslet/pattern/context.rb +24 -0
- data/lib/parslet/transform.rb +70 -25
- metadata +37 -8
data/Gemfile
CHANGED
data/HISTORY.txt
CHANGED
@@ -1,4 +1,27 @@
|
|
1
|
-
= 0.
|
1
|
+
= 0.10.1 / ???
|
2
|
+
|
3
|
+
+ Allow match['a-z'], shortcut for match('[a-z]')
|
4
|
+
|
5
|
+
! Fixed output inconsistencies (behaviour in connection to 'maybe')
|
6
|
+
|
7
|
+
= 0.10.0 / 22Nov2010
|
8
|
+
|
9
|
+
+ Parslet::Transform now takes a block on initialisation, wherein you can
|
10
|
+
define all the rules directly.
|
11
|
+
|
12
|
+
+ Parslet::Transform now only passes a hash to the block during transform
|
13
|
+
when its arity is 1. Otherwise all hash contents as bound as local
|
14
|
+
variables.
|
15
|
+
|
16
|
+
+ Both inline and other documentation have been improved.
|
17
|
+
|
18
|
+
+ You can now use 'subtree(:x)' to bind any subtree to x during tree pattern
|
19
|
+
matching.
|
20
|
+
|
21
|
+
+ Transform classes can now include rules into class definition. This makes
|
22
|
+
Parser and Transformer behave the same.
|
23
|
+
|
24
|
+
= 0.9.0 / 28Oct2010
|
2
25
|
* More of everything: Examples, documentation, etc...
|
3
26
|
|
4
27
|
* Breaking change: Ruby's binary or ('|') is now used for alternatives,
|
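Note on the 0.10.0 entry above: the changelog says a transform block receives the bindings as a hash only when its arity is 1, and otherwise sees them as local variables. The following is a minimal, self-contained sketch of how such arity-based dispatch could work. It does not use parslet; `BindingContext` and `call_rule_block` are hypothetical names chosen for illustration and are not taken from the gem.

```ruby
# Hypothetical sketch of arity-based dispatch; this is NOT parslet's code.

# Exposes each binding as a reader method, so a block evaluated against
# an instance can refer to the bound names as if they were locals.
class BindingContext
  def initialize(bindings)
    bindings.each do |name, value|
      define_singleton_method(name) { value }
    end
  end
end

# Calls +block+ with the bindings hash when it takes exactly one
# argument; otherwise evaluates it with the bindings visible by name.
def call_rule_block(bindings, &block)
  if block.arity == 1
    block.call(bindings)
  else
    BindingContext.new(bindings).instance_eval(&block)
  end
end

with_hash   = call_rule_block(:x => 'foo') { |d| d[:x].upcase }
with_locals = call_rule_block(:x => 'foo') { x.upcase }
```

Both calls produce the same result; only the way the block reaches `:x` differs. One consequence of the `instance_eval` approach, if the gem works along these lines, is that `self` inside a zero-arity block is no longer the object that defined the block.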
data/README
CHANGED
@@ -1,48 +1,17 @@
|
|
1
1
|
INTRODUCTION
|
2
2
|
|
3
|
-
|
4
|
-
|
5
|
-
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
My goal here was to see how a parser/parser generator should be constructed to
|
16
|
-
allow clean AST construction and good error handling. It seems to me that most
|
17
|
-
often, parser generators only handle the success-case and forget about
|
18
|
-
debugging and error generation.
|
19
|
-
|
20
|
-
More specifically, this library is motivated by one of my compiler projects. I
|
21
|
-
started out using 'treetop' (see the link above), but found it unusable. It
|
22
|
-
was lacking in
|
23
|
-
|
24
|
-
* error reporting: Hard to see where a grammar fails.
|
25
|
-
|
26
|
-
* stability of generated trees: Intermediary trees were dictated by the
|
27
|
-
grammar. It was hard to define invariants in that system - what was
|
28
|
-
convenient when writing the grammar often wasn't in subsequent stages.
|
29
|
-
|
30
|
-
* clarity of parser code: The parser code is generated and is very hard
|
31
|
-
to read. Add that to the first point to understand my pain.
|
32
|
-
|
33
|
-
So parslet tries to be different. It doesn't generate the parser, but instead
|
34
|
-
defines it in a DSL which is very close to what you find in [2]. A successful
|
35
|
-
parse then generates a parser tree consisting entirely of hashes and arrays
|
36
|
-
and strings (read: instable). This parser tree can then be converted to a real
|
37
|
-
AST (read: stable) using a pattern matcher that is also part of this library.
|
38
|
-
|
39
|
-
Error reporting is another area where parslet excels: It is able to print not
|
40
|
-
only the error you are used to seeing ('Parse failed because of REASON at line
|
41
|
-
1 and char 2'), but also prints what led to that failure in the form of a
|
42
|
-
tree (#error_tree method).
|
43
|
-
|
44
|
-
[1] http://en.wikipedia.org/wiki/Parsing_expression_grammar
|
45
|
-
[2] http://pdos.csail.mit.edu/~baford/packrat/popl04/peg-popl04.pdf
|
3
|
+
Parslet makes developing complex parsers easy. It does so by
|
4
|
+
|
5
|
+
* providing the best *error reporting* possible
|
6
|
+
* *not generating* reams of code for you to debug
|
7
|
+
|
8
|
+
Parslet takes the long way around to make *your job* easier. It allows for
|
9
|
+
incremental language construction. Often, you start out small, implementing
|
10
|
+
the atoms of your language first; _parslet_ takes pride in making this
|
11
|
+
possible.
|
12
|
+
|
13
|
+
Eager to try this out? Please see the associated web site:
|
14
|
+
http://kschiess.github.com/parslet
|
46
15
|
|
47
16
|
SYNOPSIS
|
48
17
|
|
@@ -56,7 +25,7 @@ SYNOPSIS
|
|
56
25
|
str('"').absnt? >> any
|
57
26
|
).repeat.as(:string) >>
|
58
27
|
str('"')
|
59
|
-
|
28
|
+
|
60
29
|
# Parse the string and capture parts of the interpretation (:string above)
|
61
30
|
tree = parser.parse(%Q{
|
62
31
|
"This is a \\"String\\" in which you can escape stuff"
|
@@ -64,36 +33,24 @@ SYNOPSIS
|
|
64
33
|
|
65
34
|
tree # => {:string=>"This is a \\\"String\\\" in which you can escape stuff"}
|
66
35
|
|
67
|
-
# Here's how you can grab results from that tree:
|
36
|
+
# Here's how you can grab results from that tree, two methods:
|
37
|
+
|
38
|
+
# 1)
|
68
39
|
Pattern.new(:string => simple(:x)).each_match(tree) do |dictionary|
|
69
|
-
puts "String contents: #{dictionary[:x]}"
|
40
|
+
puts "String contents (method 1): #{dictionary[:x]}"
|
70
41
|
end
|
71
|
-
|
72
|
-
# Here's how to transform that tree into something else ----------------------
|
73
42
|
|
74
|
-
#
|
75
|
-
|
76
|
-
|
77
|
-
|
78
|
-
|
79
|
-
transform.
|
80
|
-
|
81
|
-
# Transforms the tree
|
82
|
-
transform.apply(tree)
|
83
|
-
|
84
|
-
# => #<struct StringLiteral text="This is a \\\"String\\\" ... escape stuff">
|
43
|
+
# 2)
|
44
|
+
transform = Parslet::Transform.new do
|
45
|
+
rule(:string => simple(:x)) {
|
46
|
+
puts "String contents (method 2): #{x}" }
|
47
|
+
end
|
48
|
+
transform.apply(tree)
|
85
49
|
|
86
50
|
COMPATIBILITY
|
87
51
|
|
88
52
|
This library should work with both ruby 1.8 and ruby 1.9.
|
89
53
|
|
90
|
-
AUTHORS
|
91
|
-
|
92
|
-
My gigantous thanks go to the following cool guys and gals that help make this
|
93
|
-
rock:
|
94
|
-
|
95
|
-
Florian Hanke <florian.hanke@gmail.com>
|
96
|
-
|
97
54
|
STATUS
|
98
55
|
|
99
56
|
On the road to 1.0; improving documentation, packaging and upgrading to rspec2.
|
data/Rakefile
CHANGED
@@ -18,7 +18,7 @@ spec = Gem::Specification.new do |s|
|
|
18
18
|
|
19
19
|
# Change these as appropriate
|
20
20
|
s.name = "parslet"
|
21
|
-
s.version = "0.
|
21
|
+
s.version = "0.10.1"
|
22
22
|
s.summary = "Parser construction library with great error reporting in Ruby."
|
23
23
|
s.author = "Kaspar Schiess"
|
24
24
|
s.email = "kaspar.schiess@absurd.li"
|
@@ -34,7 +34,7 @@ spec = Gem::Specification.new do |s|
|
|
34
34
|
|
35
35
|
# If you want to depend on other gems, add them here, along with any
|
36
36
|
# relevant versions
|
37
|
-
|
37
|
+
s.add_dependency("blankslate", "~> 2.1.2.3")
|
38
38
|
|
39
39
|
# If your tests use any gems, include them here
|
40
40
|
s.add_development_dependency("rspec")
|
@@ -60,11 +60,15 @@ end
|
|
60
60
|
|
61
61
|
task :package => :gemspec
|
62
62
|
|
63
|
+
require 'sdoc'
|
64
|
+
|
63
65
|
# Generate documentation
|
64
|
-
Rake::RDocTask.new do |
|
65
|
-
|
66
|
-
|
67
|
-
|
66
|
+
Rake::RDocTask.new do |rdoc|
|
67
|
+
rdoc.options << '--fmt' << 'shtml' # explictly set shtml generator
|
68
|
+
rdoc.template = 'direct' # lighter template used on railsapi.com
|
69
|
+
rdoc.main = "README"
|
70
|
+
rdoc.rdoc_files.include("README", "lib/**/*.rb")
|
71
|
+
rdoc.rdoc_dir = "rdoc"
|
68
72
|
end
|
69
73
|
|
70
74
|
desc 'Clear out RDoc and generated packages'
|
data/lib/parslet.rb
CHANGED
@@ -4,14 +4,9 @@ require 'stringio'
|
|
4
4
|
#
|
5
5
|
# require 'parslet'
|
6
6
|
#
|
7
|
-
# class MyParser
|
8
|
-
# include Parslet
|
9
|
-
#
|
7
|
+
# class MyParser < Parslet::Parser
|
10
8
|
# rule(:a) { str('a').repeat }
|
11
|
-
#
|
12
|
-
# def parse(str)
|
13
|
-
# a.parse(str)
|
14
|
-
# end
|
9
|
+
# root(:a)
|
15
10
|
# end
|
16
11
|
#
|
17
12
|
# pp MyParser.new.parse('aaaa') # => 'aaaa'
|
@@ -36,138 +31,19 @@ require 'stringio'
|
|
36
31
|
# and use the second stage to isolate the rest of your code from the changes
|
37
32
|
# you've effected.
|
38
33
|
#
|
39
|
-
# = Language Atoms
|
40
|
-
#
|
41
|
-
# PEG-style grammars build on a very small number of atoms, or parslets. In
|
42
|
-
# fact, only three types of parslets exist. Here's how to match a string:
|
43
|
-
#
|
44
|
-
# str('a string')
|
45
|
-
#
|
46
|
-
# This matches the string 'a string' literally and nothing else. If your input
|
47
|
-
# doesn't contain the string, it will fail. Here's how to match a character
|
48
|
-
# set:
|
49
|
-
#
|
50
|
-
# match('[abc]')
|
51
|
-
#
|
52
|
-
# This matches 'a', 'b' or 'c'. The string matched will always have a length
|
53
|
-
# of 1; to match longer strings, please see the title below. The last parslet
|
54
|
-
# of the three is 'any':
|
55
|
-
#
|
56
|
-
# any
|
57
|
-
#
|
58
|
-
# 'any' functions like the dot in regular expressions - it matches any single
|
59
|
-
# character.
|
60
|
-
#
|
61
|
-
# = Combination and Repetition
|
62
|
-
#
|
63
|
-
# Parslets only get useful when combined to grammars. To combine one parslet
|
64
|
-
# with the other, you have 4 kinds of methods available: repeat and maybe, >>
|
65
|
-
# (sequence), | (alternation), absnt? and prsnt?.
|
66
|
-
#
|
67
|
-
# str('a').repeat # any number of 'a's, including 0
|
68
|
-
# str('a').maybe # maybe there'll be an 'a', maybe not
|
69
|
-
#
|
70
|
-
# Parslets can be joined using >>. This means: Match the left parslet, then
|
71
|
-
# match the right parslet.
|
72
|
-
#
|
73
|
-
# str('a') >> str('b') # would match 'ab'
|
74
|
-
#
|
75
|
-
# Keep in mind that all combination and repetition operators themselves return
|
76
|
-
# a parslet. You can combine the result again:
|
77
|
-
#
|
78
|
-
# ( str('a') >> str('b') ) >> str('c') # would match 'abc'
|
79
|
-
#
|
80
|
-
# The slash ('|') indicates alternatives:
|
81
|
-
#
|
82
|
-
# str('a') | str('b') # would match 'a' OR 'b'
|
83
|
-
#
|
84
|
-
# The left side of an alternative is matched first; if it matches, the right
|
85
|
-
# side is never looked at.
|
86
|
-
#
|
87
|
-
# The absnt? and prsnt? qualifiers allow looking at input without consuming
|
88
|
-
# it:
|
89
|
-
#
|
90
|
-
# str('a').absnt? # will match if at the current position there is an 'a'.
|
91
|
-
# str('a').absnt? >> str('b') # check for 'a' then match 'b'
|
92
|
-
#
|
93
|
-
# This means that the second example will not match any input; when the second
|
94
|
-
# part is parsed, the first part has asserted the presence of 'a', and thus
|
95
|
-
# str('b') cannot match. The prsnt? method is the opposite of absnt?, it
|
96
|
-
# asserts presence.
|
97
|
-
#
|
98
|
-
# More documentation on these methods can be found in Parslets::Atoms::Base.
|
99
|
-
#
|
100
|
-
# = Intermediary Parse Trees
|
101
|
-
#
|
102
|
-
# As you have probably seen above, you can hand input (strings or StringIOs) to
|
103
|
-
# your parslets like this:
|
104
|
-
#
|
105
|
-
# parslet.parse(str)
|
106
|
-
#
|
107
|
-
# This returns an intermediary parse tree or raises an exception
|
108
|
-
# (Parslet::ParseFailed) when the input is not well formed.
|
109
|
-
#
|
110
|
-
# Intermediary parse trees are essentially just Plain Old Ruby Objects. (PORO
|
111
|
-
# technology as we call it.) Parslets try very hard to return sensible stuff;
|
112
|
-
# it is quite easy to use the results for the later stages of your program.
|
113
|
-
#
|
114
|
-
# Here a few examples and what their intermediary tree looks like:
|
115
|
-
#
|
116
|
-
# str('foo').parse('foo') # => 'foo'
|
117
|
-
# (str('f') >> str('o') >> str('o')).parse('foo') # => 'foo'
|
118
|
-
#
|
119
|
-
# Naming parslets
|
120
|
-
#
|
121
|
-
# Construction of lambda blocks
|
122
|
-
#
|
123
|
-
# = Intermediary Tree transformation
|
124
|
-
#
|
125
|
-
# The intermediary parse tree by itself is most often not very useful. Its
|
126
|
-
# form is volatile; changing your parser in the slightest might produce
|
127
|
-
# profound changes in the generated trees.
|
128
|
-
#
|
129
|
-
# Generally you will want to construct a more stable tree using your own
|
130
|
-
# carefully crafted representation of the domain. Parslet provides you with
|
131
|
-
# an elegant way of transmogrifying your intermediary tree into the output
|
132
|
-
# format you choose. This is achieved by transformation rules such as this
|
133
|
-
# one:
|
134
|
-
#
|
135
|
-
# transform.rule(:literal => {:string => :_x}) { |d|
|
136
|
-
# StringLit.new(*d.values) }
|
137
|
-
#
|
138
|
-
# The above rule will transform a subtree looking like this:
|
139
|
-
#
|
140
|
-
# :literal
|
141
|
-
# |
|
142
|
-
# :string
|
143
|
-
# |
|
144
|
-
# "somestring"
|
145
|
-
#
|
146
|
-
# into just this:
|
147
|
-
#
|
148
|
-
# StringLit
|
149
|
-
# value: "somestring"
|
150
|
-
#
|
151
|
-
#
|
152
|
-
# = Further documentation
|
153
|
-
#
|
154
|
-
# Please see the examples subdirectory of the distribution for more examples.
|
155
|
-
# Check out 'rooc' (github.com/kschiess/rooc) as well - it uses parslet for
|
156
|
-
# compiler construction.
|
157
|
-
#
|
158
34
|
module Parslet
|
159
35
|
def self.included(base)
|
160
36
|
base.extend(ClassMethods)
|
161
37
|
end
|
162
38
|
|
163
|
-
#
|
164
|
-
#
|
165
|
-
#
|
39
|
+
# Raised when the parse failed to match or to consume all its input. It
|
40
|
+
# contains the message that should be presented to the user. If you want to
|
41
|
+
# display more error explanation, you can print the #error_tree that is
|
166
42
|
# stored in the parslet. This is a graphical representation of what went
|
167
43
|
# wrong.
|
168
44
|
#
|
169
45
|
# Example:
|
170
|
-
#
|
46
|
+
#
|
171
47
|
# begin
|
172
48
|
# parslet.parse(str)
|
173
49
|
# rescue Parslet::ParseFailed => failure
|
@@ -181,6 +57,7 @@ module Parslet
|
|
181
57
|
# Define the parsers #root function. This is the place where you start
|
182
58
|
# parsing; if you have a rule for 'file' that describes what should be
|
183
59
|
# in a file, this would be your root declaration:
|
60
|
+
#
|
184
61
|
# class Parser
|
185
62
|
# root :file
|
186
63
|
# rule(:file) { ... }
|
@@ -205,9 +82,9 @@ module Parslet
|
|
205
82
|
end
|
206
83
|
end
|
207
84
|
|
208
|
-
# Define an entity for the parser. This generates a method of the same
|
209
|
-
# that can be used as part of other patterns. Those methods can be
|
210
|
-
# mixed in your parser class with real ruby methods.
|
85
|
+
# Define an entity for the parser. This generates a method of the same
|
86
|
+
# name that can be used as part of other patterns. Those methods can be
|
87
|
+
# freely mixed in your parser class with real ruby methods.
|
211
88
|
#
|
212
89
|
# Example:
|
213
90
|
#
|
@@ -233,6 +110,14 @@ module Parslet
|
|
233
110
|
end
|
234
111
|
end
|
235
112
|
|
113
|
+
# Allows for delayed construction of #match.
|
114
|
+
#
|
115
|
+
class DelayedMatchConstructor
|
116
|
+
def [](str)
|
117
|
+
Atoms::Re.new("[" + str + "]")
|
118
|
+
end
|
119
|
+
end
|
120
|
+
|
236
121
|
# Returns an atom matching a character class. This is essentially a regular
|
237
122
|
# expression, but you should only match a single character.
|
238
123
|
#
|
@@ -241,8 +126,10 @@ module Parslet
|
|
241
126
|
# match('[ab]') # will match either 'a' or 'b'
|
242
127
|
# match('[\n\s]') # will match newlines and spaces
|
243
128
|
#
|
244
|
-
def match(
|
245
|
-
|
129
|
+
def match(str=nil)
|
130
|
+
return DelayedMatchConstructor.new unless str
|
131
|
+
|
132
|
+
return Atoms::Re.new(str)
|
246
133
|
end
|
247
134
|
module_function :match
|
248
135
|
|
@@ -263,7 +150,19 @@ module Parslet
|
|
263
150
|
Atoms::Re.new('.')
|
264
151
|
end
|
265
152
|
module_function :any
|
266
|
-
|
153
|
+
|
154
|
+
# A special kind of atom that allows embedding whole treetop expressions
|
155
|
+
# into parslet construction.
|
156
|
+
#
|
157
|
+
# Example:
|
158
|
+
#
|
159
|
+
# exp(%Q("a" "b"?)) # => returns the same as str('a') >> str('b').maybe
|
160
|
+
#
|
161
|
+
def exp(str)
|
162
|
+
Parslet::Expression.new(str).to_parslet
|
163
|
+
end
|
164
|
+
module_function :exp
|
165
|
+
|
267
166
|
# Returns a placeholder for a tree transformation that will only match a
|
268
167
|
# sequence of elements. The +symbol+ you specify will be the key for the
|
269
168
|
# matched sequence in the returned dictionary.
|
@@ -292,10 +191,24 @@ module Parslet
|
|
292
191
|
Pattern::SimpleBind.new(symbol)
|
293
192
|
end
|
294
193
|
module_function :simple
|
194
|
+
|
195
|
+
# Returns a placeholder for tree transformation patterns that will match
|
196
|
+
# any kind of subtree.
|
197
|
+
#
|
198
|
+
# Example:
|
199
|
+
#
|
200
|
+
# { :expression => subtree(:exp) }
|
201
|
+
#
|
202
|
+
def subtree(symbol)
|
203
|
+
Pattern::SubtreeBind.new(symbol)
|
204
|
+
end
|
205
|
+
|
206
|
+
autoload :Expression, 'parslet/expression'
|
295
207
|
end
|
296
208
|
|
297
209
|
require 'parslet/error_tree'
|
298
210
|
require 'parslet/atoms'
|
299
211
|
require 'parslet/pattern'
|
300
212
|
require 'parslet/pattern/binding'
|
301
|
-
require 'parslet/transform'
|
213
|
+
require 'parslet/transform'
|
214
|
+
require 'parslet/parser'
|
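Note on the `match` change in lib/parslet.rb above: calling `match` without an argument returns a `DelayedMatchConstructor`, whose `[]` method wraps its argument in brackets. That is what makes `match['a-z']` equivalent to `match('[a-z]')`. The sketch below reproduces the idiom without requiring parslet. `ReSketch` is a hypothetical stand-in for `Parslet::Atoms::Re`, and its `matches?` helper is an assumption added for the demonstration, not part of the gem.

```ruby
# Stand-in for Parslet::Atoms::Re, reduced to what this sketch needs.
class ReSketch
  attr_reader :match

  def initialize(match)
    @match = match
  end

  # Hypothetical helper: true if the single character +char+ matches.
  def matches?(char)
    char.size == 1 && !!(char =~ Regexp.new(@match, Regexp::MULTILINE))
  end
end

# Mirrors the class added in the diff: defers construction until [] is called.
class DelayedMatchConstructor
  def [](str)
    ReSketch.new("[" + str + "]")
  end
end

# With no argument, hand back the delayed constructor so that
# match['a-z'] parses as match()['a-z'].
def match(str = nil)
  return DelayedMatchConstructor.new unless str

  ReSketch.new(str)
end

short = match['a-z']    # shortcut form
long  = match('[a-z]')  # explicit form
```

Both forms end up with the same pattern string. Note that the shortcut relies on Ruby parsing `match['a-z']` (no space) as an index operation on the result of `match`; writing `match ['a-z']` with a space would pass an array as the argument instead.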
data/lib/parslet/atoms.rb
CHANGED
@@ -1,4 +1,7 @@
|
|
1
1
|
module Parslet::Atoms
|
2
|
+
# The precedence module controls parenthesis during the #inspect printing
|
3
|
+
# of parslets. It is not relevant to other aspects of the parsing.
|
4
|
+
#
|
2
5
|
module Precedence
|
3
6
|
prec = 0
|
4
7
|
BASE = (prec+=1) # everything else
|
@@ -9,484 +12,14 @@ module Parslet::Atoms
|
|
9
12
|
OUTER = (prec+=1) # printing is done here.
|
10
13
|
end
|
11
14
|
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
result = apply(io)
|
22
|
-
|
23
|
-
# If we haven't consumed the input, then the pattern doesn't match. Try
|
24
|
-
# to provide a good error message (even asking down below)
|
25
|
-
unless io.eof?
|
26
|
-
# Do we know why we stopped matching input? If yes, that's a good
|
27
|
-
# error to fail with. Otherwise just report that we cannot consume the
|
28
|
-
# input.
|
29
|
-
if cause
|
30
|
-
raise Parslet::ParseFailed, "Unconsumed input, maybe because of this: #{cause}"
|
31
|
-
else
|
32
|
-
error(io, "Don't know what to do with #{io.string[io.pos,100]}")
|
33
|
-
end
|
34
|
-
end
|
35
|
-
|
36
|
-
return flatten(result)
|
37
|
-
end
|
38
|
-
|
39
|
-
def apply(io)
|
40
|
-
# p [:start, self, io.string[io.pos, 10]]
|
41
|
-
|
42
|
-
old_pos = io.pos
|
43
|
-
|
44
|
-
# p [:try, self, io.string[io.pos, 20]]
|
45
|
-
begin
|
46
|
-
r = try(io)
|
47
|
-
# p [:return_from, self, flatten(r)]
|
48
|
-
@last_cause = nil
|
49
|
-
return r
|
50
|
-
rescue Parslet::ParseFailed => ex
|
51
|
-
# p [:failing, self, io.string[io.pos, 20]]
|
52
|
-
io.pos = old_pos; raise ex
|
53
|
-
end
|
54
|
-
end
|
55
|
-
|
56
|
-
def repeat(min=0, max=nil)
|
57
|
-
Repetition.new(self, min, max)
|
58
|
-
end
|
59
|
-
def maybe
|
60
|
-
Repetition.new(self, 0, 1, :maybe)
|
61
|
-
end
|
62
|
-
def >>(parslet)
|
63
|
-
Sequence.new(self, parslet)
|
64
|
-
end
|
65
|
-
def |(parslet)
|
66
|
-
Alternative.new(self, parslet)
|
67
|
-
end
|
68
|
-
def absnt?
|
69
|
-
Lookahead.new(self, false)
|
70
|
-
end
|
71
|
-
def prsnt?
|
72
|
-
Lookahead.new(self, true)
|
73
|
-
end
|
74
|
-
def as(name)
|
75
|
-
Named.new(self, name)
|
76
|
-
end
|
77
|
-
|
78
|
-
def flatten(value)
|
79
|
-
# Passes through everything that isn't an array of things
|
80
|
-
return value unless value.instance_of? Array
|
81
|
-
|
82
|
-
# Extracts the s-expression tag
|
83
|
-
tag, *tail = value
|
84
|
-
|
85
|
-
# Merges arrays:
|
86
|
-
result = tail.
|
87
|
-
map { |e| flatten(e) } # first flatten each element
|
88
|
-
|
89
|
-
case tag
|
90
|
-
when :sequence
|
91
|
-
return flatten_sequence(result)
|
92
|
-
when :maybe
|
93
|
-
return result.first
|
94
|
-
when :repetition
|
95
|
-
return flatten_repetition(result)
|
96
|
-
end
|
97
|
-
|
98
|
-
fail "BUG: Unknown tag #{tag.inspect}."
|
99
|
-
end
|
100
|
-
def flatten_sequence(list)
|
101
|
-
list.inject('') { |r, e| # and then merge flat elements
|
102
|
-
case [r, e].map { |o| o.class }
|
103
|
-
when [Hash, Hash] # two keyed subtrees: make one
|
104
|
-
warn_about_duplicate_keys(r, e)
|
105
|
-
r.merge(e)
|
106
|
-
# a keyed tree and an array (push down)
|
107
|
-
when [Hash, Array]
|
108
|
-
[r] + e
|
109
|
-
when [Array, Hash]
|
110
|
-
r + [e]
|
111
|
-
when [String, String]
|
112
|
-
r << e
|
113
|
-
else
|
114
|
-
if r.instance_of? Hash
|
115
|
-
r # Ignore e, since its not a hash we can merge
|
116
|
-
else
|
117
|
-
e # Whatever e is at this point, we keep it
|
118
|
-
end
|
119
|
-
end
|
120
|
-
}
|
121
|
-
end
|
122
|
-
def flatten_repetition(list)
|
123
|
-
if list.any? { |e| e.instance_of?(Hash) }
|
124
|
-
# If keyed subtrees are in the array, we'll want to discard all
|
125
|
-
# strings inbetween. To keep them, name them.
|
126
|
-
return list.select { |e| e.instance_of?(Hash) }
|
127
|
-
end
|
128
|
-
|
129
|
-
if list.any? { |e| e.instance_of?(Array) }
|
130
|
-
# If any arrays are nested in this array, flatten all arrays to this
|
131
|
-
# level.
|
132
|
-
return list.
|
133
|
-
select { |e| e.instance_of?(Array) }.
|
134
|
-
flatten(1)
|
135
|
-
end
|
136
|
-
|
137
|
-
# If there are only strings, concatenate them and return that.
|
138
|
-
list.inject('') { |s,e| s<<(e||'') }
|
139
|
-
end
|
140
|
-
|
141
|
-
def self.precedence(prec)
|
142
|
-
define_method(:precedence) { prec }
|
143
|
-
end
|
144
|
-
precedence Precedence::BASE
|
145
|
-
def to_s(outer_prec)
|
146
|
-
if outer_prec < precedence
|
147
|
-
"("+to_s_inner(precedence)+")"
|
148
|
-
else
|
149
|
-
to_s_inner(precedence)
|
150
|
-
end
|
151
|
-
end
|
152
|
-
def inspect
|
153
|
-
to_s(Precedence::OUTER)
|
154
|
-
end
|
155
|
-
|
156
|
-
# Cause should return the current best approximation of this parslet
|
157
|
-
# of what went wrong with the parse. Not relevant if the parse succeeds,
|
158
|
-
# but needed for clever error reports.
|
159
|
-
#
|
160
|
-
def cause
|
161
|
-
@last_cause
|
162
|
-
end
|
163
|
-
|
164
|
-
# Error tree returns what went wrong here plus what went wrong inside
|
165
|
-
# subexpressions as a tree. The error stored for this node will be equal
|
166
|
-
# with #cause.
|
167
|
-
#
|
168
|
-
def error_tree
|
169
|
-
Parslet::ErrorTree.new(self) if cause?
|
170
|
-
end
|
171
|
-
def cause?
|
172
|
-
not @last_cause.nil?
|
173
|
-
end
|
174
|
-
private
|
175
|
-
# Report/raise a parse error with the given message, printing the current
|
176
|
-
# position as well. Appends 'at line X char Y.' to the message you give.
|
177
|
-
# If +pos+ is given, it is used as the real position the error happened,
|
178
|
-
# correcting the io's current position.
|
179
|
-
#
|
180
|
-
def error(io, str, pos=nil)
|
181
|
-
pre = io.string[0..(pos||io.pos)]
|
182
|
-
lines = Array(pre.lines)
|
183
|
-
|
184
|
-
if lines.empty?
|
185
|
-
formatted_cause = str
|
186
|
-
else
|
187
|
-
pos = lines.last.length
|
188
|
-
formatted_cause = "#{str} at line #{lines.count} char #{pos}."
|
189
|
-
end
|
190
|
-
|
191
|
-
@last_cause = formatted_cause
|
192
|
-
|
193
|
-
raise Parslet::ParseFailed, formatted_cause, nil
|
194
|
-
end
|
195
|
-
def warn_about_duplicate_keys(h1, h2)
|
196
|
-
d = h1.keys & h2.keys
|
197
|
-
unless d.empty?
|
198
|
-
warn "Duplicate subtrees while merging result of \n #{self.inspect}\nonly the values"+
|
199
|
-
" of the latter will be kept. (keys: #{d.inspect})"
|
200
|
-
end
|
201
|
-
end
|
202
|
-
end
|
203
|
-
|
204
|
-
class Named < Base
|
205
|
-
attr_reader :parslet, :name
|
206
|
-
def initialize(parslet, name)
|
207
|
-
@parslet, @name = parslet, name
|
208
|
-
end
|
209
|
-
|
210
|
-
def apply(io)
|
211
|
-
value = parslet.apply(io)
|
212
|
-
|
213
|
-
produce_return_value value
|
214
|
-
end
|
215
|
-
|
216
|
-
def to_s_inner(prec)
|
217
|
-
"#{name}:#{parslet.to_s(prec)}"
|
218
|
-
end
|
219
|
-
|
220
|
-
def error_tree
|
221
|
-
parslet.error_tree
|
222
|
-
end
|
223
|
-
private
|
224
|
-
def produce_return_value(val)
|
225
|
-
{ name => flatten(val) }
|
226
|
-
end
|
227
|
-
end
|
228
|
-
|
229
|
-
class Lookahead < Base
|
230
|
-
attr_reader :positive
|
231
|
-
attr_reader :bound_parslet
|
232
|
-
|
233
|
-
def initialize(bound_parslet, positive=true)
|
234
|
-
# Model positive and negative lookahead by testing this flag.
|
235
|
-
@positive = positive
|
236
|
-
@bound_parslet = bound_parslet
|
237
|
-
end
|
238
|
-
|
239
|
-
def try(io)
|
240
|
-
pos = io.pos
|
241
|
-
begin
|
242
|
-
bound_parslet.apply(io)
|
243
|
-
rescue Parslet::ParseFailed
|
244
|
-
return fail(io)
|
245
|
-
ensure
|
246
|
-
io.pos = pos
|
247
|
-
end
|
248
|
-
return success(io)
|
249
|
-
end
|
250
|
-
|
251
|
-
def fail(io)
|
252
|
-
if positive
|
253
|
-
error(io, "lookahead: #{bound_parslet.inspect} didn't match, but should have")
|
254
|
-
else
|
255
|
-
# TODO: Squash this down to nothing? Return value handling here...
|
256
|
-
return nil
|
257
|
-
end
|
258
|
-
end
|
259
|
-
def success(io)
|
260
|
-
if positive
|
261
|
-
return nil # see above, TODO
|
262
|
-
else
|
263
|
-
error(
|
264
|
-
io,
|
265
|
-
"negative lookahead: #{bound_parslet.inspect} matched, but shouldn't have")
|
266
|
-
end
|
267
|
-
end
|
268
|
-
|
269
|
-
precedence Precedence::LOOKAHEAD
|
270
|
-
def to_s_inner(prec)
|
271
|
-
char = positive ? '&' : '!'
|
272
|
-
|
273
|
-
"#{char}#{bound_parslet.to_s(prec)}"
|
274
|
-
end
|
275
|
-
|
276
|
-
def error_tree
|
277
|
-
bound_parslet.error_tree
|
278
|
-
end
|
279
|
-
end
|
280
|
-
|
281
|
-
class Alternative < Base
|
282
|
-
attr_reader :alternatives
|
283
|
-
def initialize(*alternatives)
|
284
|
-
@alternatives = alternatives
|
285
|
-
end
|
286
|
-
|
287
|
-
def |(parslet)
|
288
|
-
@alternatives << parslet
|
289
|
-
self
|
290
|
-
end
|
291
|
-
|
292
|
-
def try(io)
|
293
|
-
alternatives.each { |a|
|
294
|
-
begin
|
295
|
-
return a.apply(io)
|
296
|
-
rescue Parslet::ParseFailed => ex
|
297
|
-
end
|
298
|
-
}
|
299
|
-
# If we reach this point, all alternatives have failed.
|
300
|
-
error(io, "Expected one of #{alternatives.inspect}.")
|
301
|
-
end
|
302
|
-
|
303
|
-
precedence Precedence::ALTERNATE
|
304
|
-
def to_s_inner(prec)
|
305
|
-
alternatives.map { |a| a.to_s(prec) }.join(' | ')
|
306
|
-
end
|
307
|
-
|
308
|
-
def error_tree
|
309
|
-
Parslet::ErrorTree.new(self, *alternatives.
|
310
|
-
map { |child| child.error_tree })
|
311
|
-
end
|
312
|
-
end
|
313
|
-
|
314
|
-
# A sequence of parslets, matched from left to right. Denoted by '>>'
|
315
|
-
#
|
316
|
-
class Sequence < Base
|
317
|
-
attr_reader :parslets
|
318
|
-
def initialize(*parslets)
|
319
|
-
@parslets = parslets
|
320
|
-
end
|
321
|
-
|
322
|
-
def >>(parslet)
|
323
|
-
@parslets << parslet
|
324
|
-
self
|
325
|
-
end
|
326
|
-
|
327
|
-
def try(io)
|
328
|
-
[:sequence]+parslets.map { |p|
|
329
|
-
# Save each parslet as potentially offending (raising an error).
|
330
|
-
@offending_parslet = p
|
331
|
-
p.apply(io)
|
332
|
-
}
|
333
|
-
rescue Parslet::ParseFailed
|
334
|
-
error(io, "Failed to match sequence (#{self.inspect})")
|
335
|
-
end
|
336
|
-
|
337
|
-
precedence Precedence::SEQUENCE
|
338
|
-
def to_s_inner(prec)
|
339
|
-
parslets.map { |p| p.to_s(prec) }.join(' ')
|
340
|
-
end
|
341
|
-
|
342
|
-
def error_tree
|
343
|
-
Parslet::ErrorTree.new(self).tap { |t|
|
344
|
-
t.children << @offending_parslet.error_tree if @offending_parslet }
|
345
|
-
end
|
346
|
-
end
|
347
|
-
|
348
|
-
class Repetition < Base
|
349
|
-
attr_reader :min, :max, :parslet
|
350
|
-
def initialize(parslet, min, max, tag=:repetition)
|
351
|
-
@parslet = parslet
|
352
|
-
@min, @max = min, max
|
353
|
-
@tag = tag
|
354
|
-
end
|
355
|
-
|
356
|
-
def try(io)
|
357
|
-
occ = 0
|
358
|
-
result = [@tag] # initialize the result array with the tag (for flattening)
|
359
|
-
loop do
|
360
|
-
begin
|
361
|
-
result << parslet.apply(io)
|
362
|
-
occ += 1
|
363
|
-
|
364
|
-
# If we're not greedy (max is defined), check if that has been
|
365
|
-
# reached.
|
366
|
-
return result if max && occ>=max
|
367
|
-
rescue Parslet::ParseFailed => ex
|
368
|
-
# Greedy matcher has produced a failure. Check if occ (which will
|
369
|
-
# contain the number of sucesses) is in {min, max}.
|
370
|
-
# p [:repetition, occ, min, max]
|
371
|
-
error(io, "Expected at least #{min} of #{parslet.inspect}") if occ < min
|
372
|
-
return result
|
373
|
-
end
|
374
|
-
end
|
375
|
-
end
|
376
|
-
|
377
|
-
precedence Precedence::REPETITION
|
378
|
-
def to_s_inner(prec)
|
379
|
-
minmax = "{#{min}, #{max}}"
|
380
|
-
minmax = '?' if min == 0 && max == 1
|
381
|
-
|
382
|
-
parslet.to_s(prec) + minmax
|
383
|
-
end
|
384
|
-
|
385
|
-
def cause
|
386
|
-
# Either the repetition failed or the parslet inside failed to repeat.
|
387
|
-
super || parslet.cause
|
388
|
-
end
|
389
|
-
def error_tree
|
390
|
-
if cause?
|
391
|
-
Parslet::ErrorTree.new(self, parslet.error_tree)
|
392
|
-
else
|
393
|
-
parslet.error_tree
|
394
|
-
end
|
395
|
-
end
|
396
|
-
end
|
397
|
-
|
398
|
-
# Matches a special kind of regular expression that only ever matches one
|
399
|
-
# character at a time. Useful members of this family are: character ranges,
|
400
|
-
# \w, \d, \r, \n, ...
|
401
|
-
#
|
402
|
-
class Re < Base
|
403
|
-
attr_reader :match
|
404
|
-
def initialize(match)
|
405
|
-
@match = match
|
406
|
-
end
|
407
|
-
|
408
|
-
def try(io)
|
409
|
-
r = Regexp.new(match, Regexp::MULTILINE)
|
410
|
-
s = io.read(1)
|
411
|
-
error(io, "Premature end of input") unless s
|
412
|
-
error(io, "Failed to match #{match.inspect[1..-2]}") unless s.match(r)
|
413
|
-
return s
|
414
|
-
end
|
415
|
-
|
416
|
-
def to_s_inner(prec)
|
417
|
-
match.inspect[1..-2]
|
418
|
-
end
|
419
|
-
end
|
420
|
-
|
421
|
-
# Matches a string of characters.
|
422
|
-
#
|
423
|
-
class Str < Base
|
424
|
-
attr_reader :str
|
425
|
-
def initialize(str)
|
426
|
-
@str = str
|
427
|
-
end
|
428
|
-
|
429
|
-
def try(io)
|
430
|
-
old_pos = io.pos
|
431
|
-
s = io.read(str.size)
|
432
|
-
error(io, "Premature end of input") unless s && s.size==str.size
|
433
|
-
error(io, "Expected #{str.inspect}, but got #{s.inspect}", old_pos) \
|
434
|
-
unless s==str
|
435
|
-
return s
|
436
|
-
end
|
437
|
-
|
438
|
-
def to_s_inner(prec)
|
439
|
-
"'#{str}'"
|
440
|
-
end
|
441
|
-
end
|
442
|
-
|
443
|
-
# This wraps pieces of parslet definition and gives them a name. The wrapped
|
444
|
-
# piece is lazily evaluated and cached. This has two purposes:
|
445
|
-
#
|
446
|
-
# a) Avoid infinite recursion during evaluation of the definition
|
447
|
-
#
|
448
|
-
# b) Be able to print things by their name, not by their sometimes
|
449
|
-
# complicated content.
|
450
|
-
#
|
451
|
-
# You don't normally use this directly, instead you should generated it by
|
452
|
-
# using the structuring method Parslet#rule.
|
453
|
-
#
|
454
|
-
class Entity < Base
|
455
|
-
attr_reader :name, :context, :block
|
456
|
-
def initialize(name, context, block)
|
457
|
-
super()
|
458
|
-
|
459
|
-
@name = name
|
460
|
-
@context = context
|
461
|
-
@block = block
|
462
|
-
end
|
463
|
-
|
464
|
-
def try(io)
|
465
|
-
parslet.apply(io)
|
466
|
-
end
|
467
|
-
|
468
|
-
def parslet
|
469
|
-
@parslet ||= context.instance_eval(&block).tap { |p|
|
470
|
-
raise_not_implemented unless p
|
471
|
-
}
|
472
|
-
end
|
473
|
-
|
474
|
-
def to_s_inner(prec)
|
475
|
-
name.to_s.upcase
|
476
|
-
end
|
477
|
-
|
478
|
-
def error_tree
|
479
|
-
parslet.error_tree
|
480
|
-
end
|
481
|
-
|
482
|
-
private
|
483
|
-
def raise_not_implemented
|
484
|
-
trace = caller.reject {|l| l =~ %r{#{Regexp.escape(__FILE__)}}} # blatantly stolen from dependencies.rb in activesupport
|
485
|
-
exception = NotImplementedError.new("rule(#{name.inspect}) { ... } returns nil. Still not implemented, but already used?")
|
486
|
-
exception.set_backtrace(trace)
|
487
|
-
|
488
|
-
raise exception
|
489
|
-
end
|
490
|
-
end
|
15
|
+
autoload :Base, 'parslet/atoms/base'
|
16
|
+
autoload :Named, 'parslet/atoms/named'
|
17
|
+
autoload :Lookahead, 'parslet/atoms/lookahead'
|
18
|
+
autoload :Alternative, 'parslet/atoms/alternative'
|
19
|
+
autoload :Sequence, 'parslet/atoms/sequence'
|
20
|
+
autoload :Repetition, 'parslet/atoms/repetition'
|
21
|
+
autoload :Re, 'parslet/atoms/re'
|
22
|
+
autoload :Str, 'parslet/atoms/str'
|
23
|
+
autoload :Entity, 'parslet/atoms/entity'
|
491
24
|
end
|
492
25
|
|
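Note on the Precedence comment added to lib/parslet/atoms.rb above: the removed `to_s` code wraps an atom in parentheses whenever the enclosing precedence is lower than the atom's own. The sketch below isolates that rule in a self-contained form. The module and class names are hypothetical, and the numeric ordering of the constants is an assumption, since the diff only shows `BASE` as the first value and `OUTER` as the last.

```ruby
# Hypothetical miniature of precedence-aware printing; not parslet's code.
module PrecedenceSketch
  BASE      = 1 # plain atoms
  SEQUENCE  = 2 # assumed ordering
  ALTERNATE = 3 # assumed ordering
  OUTER     = 4 # printing starts here

  class Node
    # Parenthesise only when the enclosing context binds tighter.
    def to_s(outer_prec)
      inner = to_s_inner(precedence)
      outer_prec < precedence ? "(#{inner})" : inner
    end

    def inspect
      to_s(OUTER)
    end
  end

  class Str < Node
    def initialize(str)
      @str = str
    end

    def precedence
      BASE
    end

    def to_s_inner(_prec)
      "'#{@str}'"
    end
  end

  class Sequence < Node
    def initialize(*parts)
      @parts = parts
    end

    def precedence
      SEQUENCE
    end

    def to_s_inner(prec)
      @parts.map { |p| p.to_s(prec) }.join(' ')
    end
  end

  class Alternative < Node
    def initialize(*alternatives)
      @alternatives = alternatives
    end

    def precedence
      ALTERNATE
    end

    def to_s_inner(prec)
      @alternatives.map { |a| a.to_s(prec) }.join(' | ')
    end
  end
end

include PrecedenceSketch
seq_of_alt = Sequence.new(Str.new('a'), Alternative.new(Str.new('b'), Str.new('c')))
alt_of_seq = Alternative.new(Sequence.new(Str.new('a'), Str.new('b')), Str.new('c'))
```

Under these assumptions an alternative nested inside a sequence is printed with parentheses, while a sequence nested inside an alternative is not, which is consistent with the comment that precedence affects only `#inspect` output and not parsing.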