rdf-turtle 0.1.2 → 0.3.0

@@ -1,4 +1,4 @@
- # RDF::Turtle reader/writer
+ # RDF::Turtle reader/writer [![Build Status](https://secure.travis-ci.org/ruby-rdf/rdf-turtle.png?branch=master)](http://travis-ci.org/ruby-rdf/rdf-turtle)
  [Turtle][] reader/writer for [RDF.rb][RDF.rb].
 
  ## Description
@@ -48,7 +48,7 @@ In some cases, the specification is unclear on certain issues:
  cannot if the IRI contains any characters that might need escaping. This implementation currently abides
  by this restriction. Presumably, this would affect both PNAME\_NS and PNAME\_LN terminals.
  (This is being tracked as issue [67](http://www.w3.org/2011/rdf-wg/track/issues/67)).
- * The EBNF definition of IRI_REF seems malformed, and has no provision for \^, as discussed elsewhere in the spec.
+ * The EBNF definition of IRIREF seems malformed, and has no provision for \^, as discussed elsewhere in the spec.
  We presume that [#0000- ] is intended to be [#0000-#0020].
  * The list example in section 6 uses a list on its own, without a predicate or object, which is not allowed
  by the grammar (neither is a blankNodePropertyList). Either the EBNF should be updated to allow for these
@@ -128,9 +128,9 @@ see <http://unlicense.org/> or the accompanying {file:UNLICENSE} file.
  [YARD]: http://yardoc.org/
  [YARD-GS]: http://rubydoc.info/docs/yard/file/docs/GettingStarted.md
  [PDD]: http://lists.w3.org/Archives/Public/public-rdf-ruby/2010May/0013.html
- [RDF.rb]: http://rubydoc.info/github/gkellogg/rdf/master/frames
+ [RDF.rb]: http://rubydoc.info/github/ruby-rdf/rdf/master/frames
  [Backports]: http://rubygems.org/gems/backports
  [N-Triples]: http://www.w3.org/TR/rdf-testcases/#ntriples
- [Turtle]: http://www.w3.org/TR/2011/WD-turtle-20110809/
+ [Turtle]: http://www.w3.org/TR/2012/WD-turtle-20120710/
  [Turtle doc]: http://rubydoc.info/github/ruby-rdf/rdf-turtle/master/file/README.markdown
- [Turtle EBNF]: http://www.w3.org/TR/2011/WD-turtle-20110809/turtle.bnf
+ [Turtle EBNF]: http://dvcs.w3.org/hg/rdf/file/8610b8f58685/rdf-turtle/turtle.bnf
data/VERSION CHANGED
@@ -1 +1 @@
- 0.1.2
+ 0.3.0
@@ -0,0 +1,620 @@
+ require 'strscan'
+
+ # Extended Backus-Naur Form (EBNF), in the W3C variation, is
+ # originally defined in the
+ # [W3C XML 1.0 Spec](http://www.w3.org/TR/REC-xml/#sec-notation).
+ #
+ # This version attempts to be less strict than that definition,
+ # to allow for colloquial variations (such as in the Turtle syntax).
+ #
+ # A rule takes the following form:
+ #     [1] symbol ::= expression
+ #
+ # Comments include the content between '/*' and '*/'.
+ #
+ # @see http://www.w3.org/2000/10/swap/grammar/ebnf2turtle.py
+ # @see http://www.w3.org/2000/10/swap/grammar/ebnf2bnf.n3
+ #
+ # Based on bnf2turtle by Dan Connolly.
+ #
+ # Motivation
+ # ----------
+ #
+ # Many specifications include grammars that look formal but are not
+ # actually checked, by machine, against test data sets. Debugging the
+ # grammar in the XML specification has been a long, tedious manual
+ # process. Only when the loop is closed between a fully formal grammar
+ # and a large test data set can we be confident that we have an accurate
+ # specification of a language [#]_.
+ #
+ # The grammar in the `N3 design note`_ has evolved based on the original
+ # manual transcription into a python recursive-descent parser and
+ # subsequent development of test cases. Rather than maintain the grammar
+ # and the parser independently, our goal_ is to formalize the language
+ # syntax sufficiently to replace the manual implementation with one
+ # derived mechanically from the specification.
+ #
+ # .. [#] and even then, only the syntax of the language.
+ # .. _N3 design note: http://www.w3.org/DesignIssues/Notation3
+ #
+ # Related Work
+ # ------------
+ #
+ # Sean Palmer's `n3p announcement`_ demonstrated the feasibility of the
+ # approach, though that work did not cover some aspects of N3.
+ #
+ # In development of the `SPARQL specification`_, Eric Prud'hommeaux
+ # developed Yacker_, which converts EBNF syntax to Perl, C, and C++
+ # yacc grammars. It includes an interactive facility for checking
+ # strings against the resulting grammars.
+ # Yosi Scharf used it in `cwm Release 1.1.0rc1`_, which includes
+ # a SPARQL parser that is *almost* completely mechanically generated.
+ #
+ # The N3/turtle output from yacker is lower level than the EBNF notation
+ # from the XML specification; it has the ?, +, and * operators compiled
+ # down to pure context-free rules, obscuring the grammar
+ # structure. Since that transformation is straightforwardly expressed in
+ # semantic web rules (see bnf-rules.n3_), it seems best to keep the RDF
+ # expression of the grammar in terms of the higher level EBNF
+ # constructs.
+ #
+ # .. _goal: http://www.w3.org/2002/02/mid/1086902566.21030.1479.camel@dirk;list=public-cwm-bugs
+ # .. _n3p announcement: http://lists.w3.org/Archives/Public/public-cwm-talk/2004OctDec/0029.html
+ # .. _Yacker: http://www.w3.org/1999/02/26-modules/User/Yacker
+ # .. _SPARQL specification: http://www.w3.org/TR/rdf-sparql-query/
+ # .. _cwm Release 1.1.0rc1: http://lists.w3.org/Archives/Public/public-cwm-announce/2005JulSep/0000.html
+ # .. _bnf-rules.n3: http://www.w3.org/2000/10/swap/grammar/bnf-rules.n3
+ #
+ # Open Issues and Future Work
+ # ---------------------------
+ #
+ # The yacker output also has the terminals compiled to elaborate regular
+ # expressions. The best strategy for dealing with lexical tokens is not
+ # yet clear. Many tokens in SPARQL are case insensitive; this is not yet
+ # captured formally.
+ #
+ # The schema for the EBNF vocabulary used here (``g:seq``, ``g:alt``, ...)
+ # is not yet published; it should be aligned with `swap/grammar/bnf`_
+ # and the bnf2html.n3_ rules (and/or the style of linked XHTML grammar
+ # in the SPARQL and XML specifications).
+ #
+ # It would be interesting to corroborate the claim in the SPARQL spec
+ # that the grammar is LL(1) with a mechanical proof based on N3 rules.
+ #
+ # .. _swap/grammar/bnf: http://www.w3.org/2000/10/swap/grammar/bnf
+ # .. _bnf2html.n3: http://www.w3.org/2000/10/swap/grammar/bnf2html.n3
+ #
+ # Background
+ # ----------
+ #
+ # The `N3 Primer`_ by Tim Berners-Lee introduces RDF and the Semantic
+ # web using N3, a teaching and scribbling language. Turtle is a subset
+ # of N3 that maps directly to (and from) the standard XML syntax for
+ # RDF.
+ #
+ # .. _N3 Primer: http://www.w3.org/2000/10/swap/Primer.html
+ #
+ # @author Gregg Kellogg
+ class EBNF
+   class Rule
+     # @attr [Symbol] sym
+     attr_reader :sym
+     # @attr [String] id
+     attr_reader :id
+     # @attr [Symbol] kind one of :rule, :token, or :pass
+     attr_accessor :kind
+     # @attr [Array] expr
+     attr_reader :expr
+     # @attr [String] orig
+     attr_accessor :orig
+
+     # @param [Integer] id
+     # @param [Symbol] sym
+     # @param [Array] expr
+     # @param [EBNF] ebnf
+     def initialize(id, sym, expr, ebnf)
+       @id, @sym, @expr, @ebnf = id, sym, expr, ebnf
+     end
+
+     def to_sxp
+       [id, sym, kind, expr].to_sxp
+     end
+
+     def to_ttl
+       @ebnf.debug("to_ttl") {inspect}
+       comment = orig.strip.
+         gsub(/"""/, '\"\"\"').
+         gsub("\\", "\\\\").
+         sub(/^\"/, '\"').
+         sub(/\"$/m, '\"')
+       statements = [
+         %{:#{id} rdfs:label "#{id}"; rdf:value "#{sym}";},
+         %{ rdfs:comment #{comment.inspect};},
+       ]
+
+       statements += ttl_expr(expr, kind == :token ? "re" : "g", 1, false)
+       "\n" + statements.join("\n")
+     end
+
+     def inspect
+       {:sym => sym, :id => id, :kind => kind, :expr => expr}.inspect
+     end
+
+     private
+     def ttl_expr(expr, pfx, depth, is_obj = true)
+       indent = ' ' * depth
+       @ebnf.debug("ttl_expr", :depth => depth) {expr.inspect}
+       op = expr.shift if expr.is_a?(Array)
+       statements = []
+
+       if is_obj
+         bra, ket = "[ ", " ]"
+       else
+         bra = ket = ''
+       end
+
+       case op
+       when :seq, :alt, :diff
+         statements << %{#{indent}#{bra}#{pfx}:#{op} (}
+         expr.each {|a| statements += ttl_expr(a, pfx, depth + 1)}
+         statements << %{#{indent} )#{ket}}
+       when :opt, :plus, :star
+         statements << %{#{indent}#{bra}#{pfx}:#{op} }
+         statements += ttl_expr(expr.first, pfx, depth + 1)
+         statements << %{#{indent} #{ket}} unless ket.empty?
+       when :"'"
+         statements << %{#{indent}"#{esc(expr)}"}
+       when :range
+         statements << %{#{indent}#{bra} re:matches #{cclass(expr.first).inspect} #{ket}}
+       when :hex
+         raise "didn't expect \" in expr" if expr.include?(:'"')
+         statements << %{#{indent}#{bra} re:matches #{cclass(expr.first).inspect} #{ket}}
+       else
+         if is_obj
+           statements << %{#{indent}#{expr.inspect}}
+         else
+           statements << %{#{indent}g:seq ( #{expr.inspect} )}
+         end
+       end
+
+       statements.last << " ." unless is_obj
+       @ebnf.debug("statements", :depth => depth) {statements.join("\n")}
+       statements
+     end
+
+     ##
+     # Turn an XML BNF character class into an N3 literal for that
+     # character class (less the outer quote marks).
+     #
+     #     >>> cclass("^<>'{}|^`")
+     #     "[^<>'{}|^`]"
+     #     >>> cclass("#x0300-#x036F")
+     #     "[\\u0300-\\u036F]"
+     #     >>> cclass("#xC0-#xD6")
+     #     "[\\u00C0-\\u00D6]"
+     #     >>> cclass("#x370-#x37D")
+     #     "[\\u0370-\\u037D]"
+     #
+     # as in: ECHAR ::= '\' [tbnrf\"']
+     #     >>> cclass("tbnrf\\\"'")
+     #     'tbnrf\\\\\\"\''
+     #
+     #     >>> cclass("^#x22#x5C#x0A#x0D")
+     #     '^\\u0022\\\\\\u005C\\u000A\\u000D'
+     def cclass(txt)
+       '[' +
+         txt.gsub(/\#x[0-9a-fA-F]+/) do |hx|
+           hx = hx[2..-1]
+           if hx.length <= 4
+             "\\u#{'0' * (4 - hx.length)}#{hx}"
+           elsif hx.length <= 8
+             "\\U#{'0' * (8 - hx.length)}#{hx}"
+           end
+         end +
+       ']'
+     end
+   end
+
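As a standalone sketch (not part of the diff itself), the character-class translation performed by `cclass` above can be exercised outside the gem; the helper below mirrors its `#xNN`-to-`\uNNNN` rewriting:

```ruby
# Standalone sketch of the cclass translation above: rewrite each '#xNN'
# hex escape in an XML-style character class as a '\uNNNN' (or
# '\UNNNNNNNN') regex escape, and wrap the result in brackets.
def cclass(txt)
  '[' +
    txt.gsub(/\#x[0-9a-fA-F]+/) do |hx|
      hx = hx[2..-1]                        # strip the leading '#x'
      if hx.length <= 4
        "\\u#{'0' * (4 - hx.length)}#{hx}"  # pad to 4 hex digits
      else
        "\\U#{'0' * (8 - hx.length)}#{hx}"  # pad to 8 hex digits
      end
    end +
  ']'
end

puts cclass("#x0300-#x036F")  # => [\u0300-\u036F]
puts cclass("^<>'{}|^`")      # => [^<>'{}|^`]
```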
+   # Abstract syntax tree from parse
+   attr_reader :ast
+
+   # Parse the string or file input, generating an abstract syntax tree
+   # in S-Expressions (similar to SPARQL SSE)
+   #
+   # @param [#read, #to_s] input
+   def initialize(input, options = {})
+     @options = options
+     @lineno, @depth = 1, 0
+     token = false
+     @ast = []
+
+     input = input.respond_to?(:read) ? input.read : input.to_s
+     scanner = StringScanner.new(input)
+
+     eachRule(scanner) do |r|
+       debug("rule string") {r.inspect}
+       case r
+       when /^@terminals/
+         # Switch mode to parsing tokens
+         token = true
+       when /^@pass\s*(.*)$/m
+         rule = depth {ruleParts("[0] " + r)}
+         rule.kind = :pass
+         rule.orig = r
+         @ast << rule
+       else
+         rule = depth {ruleParts(r)}
+
+         # All-caps symbols are tokens. Once a token is seen
+         # we don't go back.
+         token ||= !!(rule.sym.to_s =~ /^[A-Z_]+$/)
+         rule.kind = token ? :token : :rule
+         rule.orig = r
+         @ast << rule
+       end
+     end
+   end
+
+   ##
+   # Write out parsed syntax string as an S-Expression
+   def to_sxp
+     begin
+       require 'sxp'
+       SXP::Generator.string(ast)
+     rescue LoadError
+       ast.to_sxp
+     end
+   end
+
+   ##
+   # Write out syntax tree as Turtle
+   # @param [String] prefix for language
+   # @return [String]
+   def to_ttl(prefix, ns)
+     (ast.empty? ? [] : [
+       "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.",
+       "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.",
+       "@prefix #{prefix}: <#{ns}>.",
+       "@prefix : <#{ns}>.",
+       "@prefix re: <http://www.w3.org/2000/10/swap/grammar/regex#>.",
+       "@prefix g: <http://www.w3.org/2000/10/swap/grammar/ebnf#>.",
+       "",
+       ":language rdfs:isDefinedBy <>; g:start :#{ast.first.id}.",
+       "",
+     ]).join("\n") +
+
+     ast.
+       select {|a| [:rule, :token].include?(a.kind)}.
+       map(&:to_ttl).
+       join("\n")
+   end
+
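The prologue that `to_ttl` emits can be sketched on its own; the helper below is hypothetical (the `"ttl"` prefix, example namespace, and `turtleDoc` start symbol are illustrative arguments, not part of the gem):

```ruby
# Sketch of the Turtle prologue built by to_ttl above: the standard RDF
# prefixes, the caller-supplied grammar prefix/namespace, and a g:start
# declaration pointing at the first rule.
def ttl_prologue(prefix, ns, start_id)
  [
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.",
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.",
    "@prefix #{prefix}: <#{ns}>.",
    "@prefix : <#{ns}>.",
    "@prefix re: <http://www.w3.org/2000/10/swap/grammar/regex#>.",
    "@prefix g: <http://www.w3.org/2000/10/swap/grammar/ebnf#>.",
    "",
    ":language rdfs:isDefinedBy <>; g:start :#{start_id}.",
    "",
  ].join("\n")
end

puts ttl_prologue("ttl", "http://example.org/ns#", "turtleDoc")
```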
+   ##
+   # Iterate over rule strings.
+   # A line that starts with '[' or '@' starts a new rule.
+   #
+   # @param [StringScanner] scanner
+   # @yield rule_string
+   # @yieldparam [String] rule_string
+   def eachRule(scanner)
+     cur_lineno = 1
+     r = ''
+     until scanner.eos?
+       case
+       when s = scanner.scan(%r(\s+)m)
+         # Eat whitespace
+         cur_lineno += s.count("\n")
+         #debug("eachRule(ws)") { "[#{cur_lineno}] #{s.inspect}" }
+       when s = scanner.scan(%r(/\*([^\*]|\*[^\/])*\*/)m)
+         # Eat comments
+         cur_lineno += s.count("\n")
+         debug("eachRule(comment)") { "[#{cur_lineno}] #{s.inspect}" }
+       when s = scanner.scan(%r(^@terminals))
+         #debug("eachRule(@terminals)") { "[#{cur_lineno}] #{s.inspect}" }
+         yield(r) unless r.empty?
+         @lineno = cur_lineno
+         yield(s)
+         r = ''
+       when s = scanner.scan(/@pass/)
+         # Found rule start; if we've already collected a rule, yield it
+         #debug("eachRule(@pass)") { "[#{cur_lineno}] #{s.inspect}" }
+         yield r unless r.empty?
+         @lineno = cur_lineno
+         r = s
+       when s = scanner.scan(/\[(?=\w+\])/)
+         # Found rule start; if we've already collected a rule, yield it
+         yield r unless r.empty?
+         #debug("eachRule(rule)") { "[#{cur_lineno}] #{s.inspect}" }
+         @lineno = cur_lineno
+         r = s
+       else
+         # Collect until end of line, or start of comment
+         s = scanner.scan_until(%r((?:/\*)|$)m)
+         cur_lineno += s.count("\n")
+         #debug("eachRule(rest)") { "[#{cur_lineno}] #{s.inspect}" }
+         r += s
+       end
+     end
+     yield r unless r.empty?
+   end
+
+   ##
+   # Parse a rule into a rule number, a symbol, and an expression
+   #
+   # @param [String] rule
+   # @return [Rule]
+   def ruleParts(rule)
+     num_sym, expr = rule.split('::=', 2).map(&:strip)
+     num, sym = num_sym.split(']', 2).map(&:strip)
+     num = num[1..-1]
+     r = Rule.new(sym && sym.to_sym, num, ebnf(expr).first, self)
+     debug("ruleParts") { r.inspect }
+     r
+   end
+
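The string splitting that `ruleParts` performs can be tried standalone; `rule_parts` below is a hypothetical helper that returns just the three pieces of a rule line:

```ruby
# Sketch of the splitting in ruleParts above: a rule line like
# "[1] symbol ::= expression" is split into its number, symbol,
# and expression parts.
def rule_parts(rule)
  num_sym, expr = rule.split('::=', 2).map(&:strip)
  num, sym = num_sym.split(']', 2).map(&:strip)
  num = num[1..-1]               # drop the leading '['
  [num, sym, expr]
end

p rule_parts("[2] statement ::= directive | triples '.'")
# => ["2", "statement", "directive | triples '.'"]
```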
+   ##
+   # Parse a string into an expression tree and a remaining string
+   #
+   # @example
+   #     >>> ebnf("a b c")
+   #     ((seq, [('id', 'a'), ('id', 'b'), ('id', 'c')]), '')
+   #
+   #     >>> ebnf("a? b+ c*")
+   #     ((seq, [(opt, ('id', 'a')), (plus, ('id', 'b')), ('*', ('id', 'c'))]), '')
+   #
+   #     >>> ebnf(" | x xlist")
+   #     ((alt, [(seq, []), (seq, [('id', 'x'), ('id', 'xlist')])]), '')
+   #
+   #     >>> ebnf("a | (b - c)")
+   #     ((alt, [('id', 'a'), (diff, [('id', 'b'), ('id', 'c')])]), '')
+   #
+   #     >>> ebnf("a b | c d")
+   #     ((alt, [(seq, [('id', 'a'), ('id', 'b')]), (seq, [('id', 'c'), ('id', 'd')])]), '')
+   #
+   #     >>> ebnf("a | b | c")
+   #     ((alt, [('id', 'a'), ('id', 'b'), ('id', 'c')]), '')
+   #
+   #     >>> ebnf("a) b c")
+   #     (('id', 'a'), ' b c')
+   #
+   #     >>> ebnf("BaseDecl? PrefixDecl*")
+   #     ((seq, [(opt, ('id', 'BaseDecl')), ('*', ('id', 'PrefixDecl'))]), '')
+   #
+   #     >>> ebnf("NCCHAR1 | diff | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]")
+   #     ((alt, [('id', 'NCCHAR1'), ("'", diff), (range, '0-9'), (hex, '#x00B7'), (range, '#x0300-#x036F'), (range, '#x203F-#x2040')]), '')
+   #
+   # @param [String] s
+   # @return [Array]
+   def ebnf(s)
+     debug("ebnf") {"(#{s.inspect})"}
+     e, s = depth {alt(s)}
+     debug {"=> alt returned #{[e, s].inspect}"}
+     unless s.empty?
+       t, ss = depth {token(s)}
+       debug {"=> token returned #{[t, ss].inspect}"}
+       return [e, ss] if t.is_a?(Array) && t.first == :")"
+     end
+     [e, s]
+   end
+
+   ##
+   # Parse alt
+   #     >>> alt("a | b | c")
+   #     ((alt, [('id', 'a'), ('id', 'b'), ('id', 'c')]), '')
+   # @param [String] s
+   # @return [Array]
+   def alt(s)
+     debug("alt") {"(#{s.inspect})"}
+     args = []
+     while !s.empty?
+       e, s = depth {seq(s)}
+       debug {"=> seq returned #{[e, s].inspect}"}
+       if e.to_s.empty?
+         break unless args.empty?
+         e = [:seq, []] # empty sequence
+       end
+       args << e
+       unless s.empty?
+         t, ss = depth {token(s)}
+         break unless t[0] == :alt
+         s = ss
+       end
+     end
+     args.length > 1 ? [args.unshift(:alt), s] : [e, s]
+   end
+
+   ##
+   # Parse seq
+   #
+   #     >>> seq("a b c")
+   #     ((seq, [('id', 'a'), ('id', 'b'), ('id', 'c')]), '')
+   #
+   #     >>> seq("a b? c")
+   #     ((seq, [('id', 'a'), (opt, ('id', 'b')), ('id', 'c')]), '')
+   def seq(s)
+     debug("seq") {"(#{s.inspect})"}
+     args = []
+     while !s.empty?
+       e, ss = depth {diff(s)}
+       debug {"=> diff returned #{[e, ss].inspect}"}
+       unless e.to_s.empty?
+         args << e
+         s = ss
+       else
+         break
+       end
+     end
+     if args.length > 1
+       [args.unshift(:seq), s]
+     elsif args.length == 1
+       args + [s]
+     else
+       ["", s]
+     end
+   end
+
+   ##
+   # Parse diff
+   #
+   #     >>> diff("a - b")
+   #     ((diff, [('id', 'a'), ('id', 'b')]), '')
+   def diff(s)
+     debug("diff") {"(#{s.inspect})"}
+     e1, s = depth {postfix(s)}
+     debug {"=> postfix returned #{[e1, s].inspect}"}
+     unless e1.to_s.empty?
+       unless s.empty?
+         t, ss = depth {token(s)}
+         debug {"diff #{[t, ss].inspect}"}
+         if t.is_a?(Array) && t.first == :diff
+           s = ss
+           e2, s = primary(s)
+           unless e2.to_s.empty?
+             return [[:diff, e1, e2], s]
+           else
+             raise "Syntax Error"
+           end
+         end
+       end
+     end
+     [e1, s]
+   end
+
+   ##
+   # Parse postfix
+   #
+   #     >>> postfix("a b c")
+   #     (('id', 'a'), ' b c')
+   #
+   #     >>> postfix("a? b c")
+   #     ((opt, ('id', 'a')), ' b c')
+   def postfix(s)
+     debug("postfix") {"(#{s.inspect})"}
+     e, s = depth {primary(s)}
+     debug {"=> primary returned #{[e, s].inspect}"}
+     return ["", s] if e.to_s.empty?
+     if !s.empty?
+       t, ss = depth {token(s)}
+       debug {"=> #{[t, ss].inspect}"}
+       if t.is_a?(Array) && [:opt, :star, :plus].include?(t.first)
+         return [[t.first, e], ss]
+       end
+     end
+     [e, s]
+   end
+
+   ##
+   # Parse primary
+   #
+   #     >>> primary("a b c")
+   #     (('id', 'a'), ' b c')
+   def primary(s)
+     debug("primary") {"(#{s.inspect})"}
+     t, s = depth {token(s)}
+     debug {"=> token returned #{[t, s].inspect}"}
+     if t.is_a?(Symbol) || t.is_a?(String)
+       [t, s]
+     elsif %w(range hex).map(&:to_sym).include?(t.first)
+       [t, s]
+     elsif t.first == :"("
+       e, s = depth {ebnf(s)}
+       debug {"=> ebnf returned #{[e, s].inspect}"}
+       [e, s]
+     else
+       ["", s]
+     end
+   end
+
+   ##
+   # Parse one token; return the token and the remaining string
+   #
+   # A token is represented as a tuple whose 1st item gives the type;
+   # some types have additional info in the tuple.
+   #
+   # @example
+   #     >>> token("'abc' def")
+   #     (("'", 'abc'), ' def')
+   #
+   #     >>> token("[0-9]")
+   #     ((range, '0-9'), '')
+   #     >>> token("#x00B7")
+   #     ((hex, '#x00B7'), '')
+   #     >>> token("[#x0300-#x036F]")
+   #     ((range, '#x0300-#x036F'), '')
+   #     >>> token("[^<>'{}|^`]-[#x00-#x20]")
+   #     ((range, "^<>'{}|^`"), '-[#x00-#x20]')
+   def token(s)
+     s = s.strip
+     case m = s[0,1]
+     when '"', "'"
+       l, s = s[1..-1].split(m, 2)
+       [l, s]
+     when '['
+       l, s = s[1..-1].split(']', 2)
+       [[:range, l], s]
+     when '#'
+       s.match(/(#\w+)(.*)$/)
+       l, s = $1, $2
+       [[:hex, l], s]
+     when /[[:alpha:]]/
+       s.match(/(\w+)(.*)$/)
+       l, s = $1, $2
+       [l.to_sym, s]
+     when '@'
+       s.match(/@(\w+)(.*)$/)
+       l, s = $1, $2
+       [[:"@", l], s]
+     when '-'
+       [[:diff], s[1..-1]]
+     when '?'
+       [[:opt], s[1..-1]]
+     when '|'
+       [[:alt], s[1..-1]]
+     when '+'
+       [[:plus], s[1..-1]]
+     when '*'
+       [[:star], s[1..-1]]
+     when /[\(\)]/
+       [[m.to_sym], s[1..-1]]
+     else
+       raise "unrecognized token: #{s.inspect}"
+     end
+   end
+
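The first-character dispatch in `token` can be sketched standalone; the reduced version below is hypothetical and keeps only three of the branches for illustration:

```ruby
# Sketch of the first-character dispatch in token above: look at the
# first character of the (stripped) input and return a [token, rest]
# pair.
def token(s)
  s = s.strip
  case m = s[0, 1]
  when '"', "'"             # quoted literal: read up to the closing quote
    l, s = s[1..-1].split(m, 2)
    [l, s]
  when '['                  # character range, e.g. [0-9]
    l, s = s[1..-1].split(']', 2)
    [[:range, l], s]
  when '?'                  # postfix operator
    [[:opt], s[1..-1]]
  else
    raise "unrecognized token: #{s.inspect}"
  end
end

p token("[0-9]")      # => [[:range, "0-9"], ""]
p token("'abc' def")  # => ["abc", " def"]
```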
+   def depth
+     @depth += 1
+     ret = yield
+     @depth -= 1
+     ret
+   end
+
+   ##
+   # Progress output when debugging
+   # @param [String] node relative location in input
+   # @param [String] message ("")
+   # @yieldreturn [String] added to message
+   def debug(*args)
+     return unless @options[:debug]
+     options = args.last.is_a?(Hash) ? args.pop : {}
+     depth = options[:depth] || @depth
+     message = args.pop
+     message = message.call if message.is_a?(Proc)
+     args << message if message
+     args << yield if block_given?
+     message = "#{args.join(': ')}"
+     str = "[#{@lineno}]#{' ' * depth}#{message}"
+     @options[:debug] << str if @options[:debug].is_a?(Array)
+     $stderr.puts(str) if @options[:debug] == true
+   end
+ end