rdf-turtle 0.1.2 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,4 +1,4 @@
1
- # RDF::Turtle reader/writer
1
+ # RDF::Turtle reader/writer [![Build Status](https://secure.travis-ci.org/ruby-rdf/rdf-turtle.png?branch=master)](http://travis-ci.org/ruby-rdf/rdf-turtle)
2
2
  [Turtle][] reader/writer for [RDF.rb][RDF.rb] .
3
3
 
4
4
  ## Description
@@ -48,7 +48,7 @@ In some cases, the specification is unclear on certain issues:
48
48
  cannot if the IRI contains any characters that might need escaping. This implementation currently abides
49
49
  by this restriction. Presumably, this would affect both PNAME\_NS and PNAME\_LN terminals.
50
50
  (This is being tracked as issue [67](http://www.w3.org/2011/rdf-wg/track/issues/67)).
51
- * The EBNF definition of IRI_REF seems malformed, and has no provision for \^, as discussed elsewhere in the spec.
51
+ * The EBNF definition of IRIREF seems malformed, and has no provision for \^, as discussed elsewhere in the spec.
52
52
  We presume that [#0000- ] is intended to be [#0000-#0020].
53
53
  * The list example in section 6 uses a list on its own, without a predicate or object, which is not allowed
54
54
  by the grammar (neither is a blankNodePropertyList). Either the EBNF should be updated to allow for these
@@ -128,9 +128,9 @@ see <http://unlicense.org/> or the accompanying {file:UNLICENSE} file.
128
128
  [YARD]: http://yardoc.org/
129
129
  [YARD-GS]: http://rubydoc.info/docs/yard/file/docs/GettingStarted.md
130
130
  [PDD]: http://lists.w3.org/Archives/Public/public-rdf-ruby/2010May/0013.html
131
- [RDF.rb]: http://rubydoc.info/github/gkellogg/rdf/master/frames
131
+ [RDF.rb]: http://rubydoc.info/github/ruby-rdf/rdf/master/frames
132
132
  [Backports]: http://rubygems.org/gems/backports
133
133
  [N-Triples]: http://www.w3.org/TR/rdf-testcases/#ntriples
134
- [Turtle]: http://www.w3.org/TR/2011/WD-turtle-20110809/
134
+ [Turtle]: http://www.w3.org/TR/2012/WD-turtle-20120710/
135
135
  [Turtle doc]: http://rubydoc.info/github/ruby-rdf/rdf-turtle/master/file/README.markdown
136
- [Turtle EBNF]: http://www.w3.org/TR/2011/WD-turtle-20110809/turtle.bnf
136
+ [Turtle EBNF]: http://dvcs.w3.org/hg/rdf/file/8610b8f58685/rdf-turtle/turtle.bnf
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.1.2
1
+ 0.3.0
@@ -0,0 +1,620 @@
1
+ require 'strscan'
2
+
3
+ # Extended Backus-Naur Form (EBNF), in the W3C variation, is
4
+ # originally defined in the
5
+ # [W3C XML 1.0 Spec](http://www.w3.org/TR/REC-xml/#sec-notation).
6
+ #
7
+ # This version attempts to be less strict than the formal definition
8
+ # to allow for colloquial variations (such as in the Turtle syntax).
9
+ #
10
+ # A rule takes the following form:
11
+ # [1] symbol ::= expression
12
+ #
13
+ # Comments include the content between '/*' and '*/'
14
+ #
15
+ # @see http://www.w3.org/2000/10/swap/grammar/ebnf2turtle.py
16
+ # @see http://www.w3.org/2000/10/swap/grammar/ebnf2bnf.n3
17
+ #
18
+ # Based on bnf2turtle by Dan Connolly.
19
+ #
20
+ # Motivation
21
+ # ----------
22
+ #
23
+ # Many specifications include grammars that look formal but are not
24
+ # actually checked, by machine, against test data sets. Debugging the
25
+ # grammar in the XML specification has been a long, tedious manual
26
+ # process. Only when the loop is closed between a fully formal grammar
27
+ # and a large test data set can we be confident that we have an accurate
28
+ # specification of a language [#]_.
29
+ #
30
+ #
31
+ # The grammar in the `N3 design note`_ has evolved based on the original
32
+ # manual transcription into a python recursive-descent parser and
33
+ # subsequent development of test cases. Rather than maintain the grammar
34
+ # and the parser independently, our goal_ is to formalize the language
35
+ # syntax sufficiently to replace the manual implementation with one
36
+ # derived mechanically from the specification.
37
+ #
38
+ #
39
+ # .. [#] and even then, only the syntax of the language.
40
+ # .. _N3 design note: http://www.w3.org/DesignIssues/Notation3
41
+ #
42
+ # Related Work
43
+ # ------------
44
+ #
45
+ # Sean Palmer's `n3p announcement`_ demonstrated the feasibility of the
46
+ # approach, though that work did not cover some aspects of N3.
47
+ #
48
+ # In development of the `SPARQL specification`_, Eric Prud'hommeaux
49
+ # developed Yacker_, which converts EBNF syntax to perl and C and C++
50
+ # yacc grammars. It includes an interactive facility for checking
51
+ # strings against the resulting grammars.
52
+ # Yosi Scharf used it in `cwm Release 1.1.0rc1`_, which includes
53
+ # a SPARQL parser that is *almost* completely mechanically generated.
54
+ #
55
+ # The N3/turtle output from yacker is lower level than the EBNF notation
56
+ # from the XML specification; it has the ?, +, and * operators compiled
57
+ # down to pure context-free rules, obscuring the grammar
58
+ # structure. Since that transformation is straightforwardly expressed in
59
+ # semantic web rules (see bnf-rules.n3_), it seems best to keep the RDF
60
+ # expression of the grammar in terms of the higher level EBNF
61
+ # constructs.
62
+ #
63
+ # .. _goal: http://www.w3.org/2002/02/mid/1086902566.21030.1479.camel@dirk;list=public-cwm-bugs
64
+ # .. _n3p announcement: http://lists.w3.org/Archives/Public/public-cwm-talk/2004OctDec/0029.html
65
+ # .. _Yacker: http://www.w3.org/1999/02/26-modules/User/Yacker
66
+ # .. _SPARQL specification: http://www.w3.org/TR/rdf-sparql-query/
67
+ # .. _Cwm Release 1.1.0rc1: http://lists.w3.org/Archives/Public/public-cwm-announce/2005JulSep/0000.html
68
+ # .. _bnf-rules.n3: http://www.w3.org/2000/10/swap/grammar/bnf-rules.n3
69
+ #
70
+ # Open Issues and Future Work
71
+ # ---------------------------
72
+ #
73
+ # The yacker output also has the terminals compiled to elaborate regular
74
+ # expressions. The best strategy for dealing with lexical tokens is not
75
+ # yet clear. Many tokens in SPARQL are case insensitive; this is not yet
76
+ # captured formally.
77
+ #
78
+ # The schema for the EBNF vocabulary used here (``g:seq``, ``g:alt``, ...)
79
+ # is not yet published; it should be aligned with `swap/grammar/bnf`_
80
+ # and the bnf2html.n3_ rules (and/or the style of linked XHTML grammar
81
+ # in the SPARQL and XML specifications).
82
+ #
83
+ # It would be interesting to corroborate the claim in the SPARQL spec
84
+ # that the grammar is LL(1) with a mechanical proof based on N3 rules.
85
+ #
86
+ # .. _swap/grammar/bnf: http://www.w3.org/2000/10/swap/grammar/bnf
87
+ # .. _bnf2html.n3: http://www.w3.org/2000/10/swap/grammar/bnf2html.n3
88
+ #
89
+ #
90
+ #
91
+ # Background
92
+ # ----------
93
+ #
94
+ # The `N3 Primer`_ by Tim Berners-Lee introduces RDF and the Semantic
95
+ # web using N3, a teaching and scribbling language. Turtle is a subset
96
+ # of N3 that maps directly to (and from) the standard XML syntax for
97
+ # RDF.
98
+ #
99
+ #
100
+ #
101
+ # .. _N3 Primer: http://www.w3.org/2000/10/swap/Primer.html
102
+ #
103
+ # @author Gregg Kellogg
104
+ class EBNF
105
+ class Rule
106
+ # @attr [Symbol] sym
107
+ attr_reader :sym
108
+ # @attr [String] id
109
+ attr_reader :id
110
+ # @attr [Symbol] kind one of :rule, :token, or :pass
111
+ attr_accessor :kind
112
+ # @attr [Array] expr
113
+ attr_reader :expr
114
+ # @attr [String] orig
115
+ attr_accessor :orig
116
+
117
+ # @param [Integer] id
118
+ # @param [Symbol] sym
119
+ # @param [Array] expr
121
+ # @param [EBNF] ebnf
122
+ def initialize(id, sym, expr, ebnf)
123
+ @id, @sym, @expr, @ebnf = id, sym, expr, ebnf
124
+ end
125
+
126
+ def to_sxp
127
+ [id, sym, kind, expr].to_sxp
128
+ end
129
+
130
+ def to_ttl
131
+ @ebnf.debug("to_ttl") {inspect}
132
+ comment = orig.strip.
133
+ gsub(/"""/, '\"\"\"').
134
+ gsub("\\", "\\\\").
135
+ sub(/^\"/, '\"').
136
+ sub(/\"$/m, '\"')
137
+ statements = [
138
+ %{:#{id} rdfs:label "#{id}"; rdf:value "#{sym}";},
139
+ %{ rdfs:comment #{comment.inspect};},
140
+ ]
141
+
142
+ statements += ttl_expr(expr, kind == :token ? "re" : "g", 1, false)
143
+ "\n" + statements.join("\n")
144
+ end
145
+
146
+ def inspect
147
+ {:sym => sym, :id => id, :kind => kind, :expr => expr}.inspect
148
+ end
149
+
150
+ private
151
+ def ttl_expr(expr, pfx, depth, is_obj = true)
152
+ indent = ' ' * depth
153
+ @ebnf.debug("ttl_expr", :depth => depth) {expr.inspect}
154
+ op = expr.shift if expr.is_a?(Array)
155
+ statements = []
156
+
157
+ if is_obj
158
+ bra, ket = "[ ", " ]"
159
+ else
160
+ bra = ket = ''
161
+ end
162
+
163
+ case op
164
+ when :seq, :alt, :diff
165
+ statements << %{#{indent}#{bra}#{pfx}:#{op} (}
166
+ expr.each {|a| statements += ttl_expr(a, pfx, depth + 1)}
167
+ statements << %{#{indent} )#{ket}}
168
+ when :opt, :plus, :star
169
+ statements << %{#{indent}#{bra}#{pfx}:#{op} }
170
+ statements += ttl_expr(expr.first, pfx, depth + 1)
171
+ statements << %{#{indent} #{ket}} unless ket.empty?
172
+ when :"'"
173
+ statements << %{#{indent}"#{esc(expr)}"}
174
+ when :range
175
+ statements << %{#{indent}#{bra} re:matches #{cclass(expr.first).inspect} #{ket}}
176
+ when :hex
177
+ raise "didn't expect \" in expr" if expr.include?(:'"')
178
+ statements << %{#{indent}#{bra} re:matches #{cclass(expr.first).inspect} #{ket}}
179
+ else
180
+ if is_obj
181
+ statements << %{#{indent}#{expr.inspect}}
182
+ else
183
+ statements << %{#{indent}g:seq ( #{expr.inspect} )}
184
+ end
185
+ end
186
+
187
+ statements.last << " ." unless is_obj
188
+ @ebnf.debug("statements", :depth => depth) {statements.join("\n")}
189
+ statements
190
+ end
191
+
192
+ ##
193
+ # turn an XML BNF character class into an N3 literal for that
194
+ # character class (less the outer quote marks)
195
+ #
196
+ # >>> cclass("^<>'{}|^`")
197
+ # "[^<>'{}|^`]"
198
+ # >>> cclass("#x0300-#x036F")
199
+ # "[\\u0300-\\u036F]"
200
+ # >>> cclass("#xC0-#xD6")
201
+ # "[\\u00C0-\\u00D6]"
202
+ # >>> cclass("#x370-#x37D")
203
+ # "[\\u0370-\\u037D]"
204
+ #
205
+ # as in: ECHAR ::= '\' [tbnrf\"']
206
+ # >>> cclass("tbnrf\\\"'")
207
+ # 'tbnrf\\\\\\"\''
208
+ #
209
+ # >>> cclass("^#x22#x5C#x0A#x0D")
210
+ # '^\\u0022\\\\\\u005C\\u000A\\u000D'
211
+ def cclass(txt)
212
+ '[' +
213
+ txt.gsub(/\#x[0-9a-fA-F]+/) do |hx|
214
+ hx = hx[2..-1]
215
+ if hx.length <= 4
216
+ "\\u#{'0' * (4 - hx.length)}#{hx}"
217
+ elsif hx.length <= 8
218
+ "\\U#{'0' * (8 - hx.length)}#{hx}"
219
+ end
220
+ end +
221
+ ']'
222
+ end
223
+ end
224
+
225
+ # Abstract syntax tree from parse
226
+ attr_reader :ast
227
+
228
+ # Parse the string or file input generating an abstract syntax tree
229
+ # in S-Expressions (similar to SPARQL SSE)
230
+ #
231
+ # @param [#read, #to_s] input
232
+ def initialize(input, options = {})
233
+ @options = options
234
+ @lineno, @depth = 1, 0
235
+ token = false
236
+ @ast = []
237
+
238
+ input = input.respond_to?(:read) ? input.read : input.to_s
239
+ scanner = StringScanner.new(input)
240
+
241
+ eachRule(scanner) do |r|
242
+ debug("rule string") {r.inspect}
243
+ case r
244
+ when /^@terminals/
245
+ # Switch mode to parsing tokens
246
+ token = true
247
+ when /^@pass\s*(.*)$/m
248
+ rule = depth {ruleParts("[0] " + r)}
249
+ rule.kind = :pass
250
+ rule.orig = r
251
+ @ast << rule
252
+ else
253
+ rule = depth {ruleParts(r)}
254
+
255
+ # all caps symbols are tokens. Once a token is seen
256
+ # we don't go back
257
+ token ||= !!(rule.sym.to_s =~ /^[A-Z_]+$/)
258
+ rule.kind = token ? :token : :rule
259
+ rule.orig = r
260
+ @ast << rule
261
+ end
262
+ end
263
+ end
264
+
265
+ ##
266
+ # Write out parsed syntax string as an S-Expression
267
+ def to_sxp
268
+ begin
269
+ require 'sxp'
270
+ SXP::Generator.string(ast)
271
+ rescue LoadError
272
+ ast.to_sxp
273
+ end
274
+ end
275
+
276
+ ##
277
+ # Write out syntax tree as Turtle
278
+ # @param [String] prefix for language
279
+ # @return [String]
280
+ def to_ttl(prefix, ns)
281
+ token = false
282
+
283
+ unless ast.empty?
284
+ [
285
+ "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.",
286
+ "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.",
287
+ "@prefix #{prefix}: <#{ns}>.",
288
+ "@prefix : <#{ns}>.",
289
+ "@prefix re: <http://www.w3.org/2000/10/swap/grammar/regex#>.",
290
+ "@prefix g: <http://www.w3.org/2000/10/swap/grammar/ebnf#>.",
291
+ "",
292
+ ":language rdfs:isDefinedBy <>; g:start :#{ast.first.id}.",
293
+ "",
294
+ ]
295
+ end.join("\n") +
296
+
297
+ ast.
298
+ select {|a| [:rule, :token].include?(a.kind)}.
299
+ map(&:to_ttl).
300
+ join("\n")
301
+ end
302
+
303
+ ##
304
+ # Iterate over rule strings.
305
+ # a line that starts with '[' or '@' starts a new rule
306
+ #
307
+ # @param [StringScanner] scanner
308
+ # @yield rule_string
309
+ # @yieldparam [String] rule_string
310
+ def eachRule(scanner)
311
+ cur_lineno = 1
312
+ r = ''
313
+ until scanner.eos?
314
+ case
315
+ when s = scanner.scan(%r(\s+)m)
316
+ # Eat whitespace
317
+ cur_lineno += s.count("\n")
318
+ #debug("eachRule(ws)") { "[#{cur_lineno}] #{s.inspect}" }
319
+ when s = scanner.scan(%r(/\*([^\*]|\*[^\/])*\*/)m)
320
+ # Eat comments
321
+ cur_lineno += s.count("\n")
322
+ debug("eachRule(comment)") { "[#{cur_lineno}] #{s.inspect}" }
323
+ when s = scanner.scan(%r(^@terminals))
324
+ #debug("eachRule(@terminals)") { "[#{cur_lineno}] #{s.inspect}" }
325
+ yield(r) unless r.empty?
326
+ @lineno = cur_lineno
327
+ yield(s)
328
+ r = ''
329
+ when s = scanner.scan(/@pass/)
330
+ # Found rule start, if we've already collected a rule, yield it
331
+ #debug("eachRule(@pass)") { "[#{cur_lineno}] #{s.inspect}" }
332
+ yield r unless r.empty?
333
+ @lineno = cur_lineno
334
+ r = s
335
+ when s = scanner.scan(/\[(?=\w+\])/)
336
+ # Found rule start, if we've already collected a rule, yield it
337
+ yield r unless r.empty?
338
+ #debug("eachRule(rule)") { "[#{cur_lineno}] #{s.inspect}" }
339
+ @lineno = cur_lineno
340
+ r = s
341
+ else
342
+ # Collect until end of line, or start of comment
343
+ s = scanner.scan_until(%r((?:/\*)|$)m)
344
+ cur_lineno += s.count("\n")
345
+ #debug("eachRule(rest)") { "[#{cur_lineno}] #{s.inspect}" }
346
+ r += s
347
+ end
348
+ end
349
+ yield r unless r.empty?
350
+ end
351
+
352
+ ##
353
+ # Parse a rule into a rule number, a symbol and an expression
354
+ #
355
+ # @param [String] rule
356
+ # @return [Rule]
357
+ def ruleParts(rule)
358
+ num_sym, expr = rule.split('::=', 2).map(&:strip)
359
+ num, sym = num_sym.split(']', 2).map(&:strip)
360
+ num = num[1..-1]
361
+ r = Rule.new(sym && sym.to_sym, num, ebnf(expr).first, self)
362
+ debug("ruleParts") { r.inspect }
363
+ r
364
+ end
365
+
366
+ ##
367
+ # Parse a string into an expression tree and a remaining string
368
+ #
369
+ # @example
370
+ # >>> ebnf("a b c")
371
+ # ((seq, [('id', 'a'), ('id', 'b'), ('id', 'c')]), '')
372
+ #
373
+ # >>> ebnf("a? b+ c*")
374
+ # ((seq, [(opt, ('id', 'a')), (plus, ('id', 'b')), ('*', ('id', 'c'))]), '')
375
+ #
376
+ # >>> ebnf(" | x xlist")
377
+ # ((alt, [(seq, []), (seq, [('id', 'x'), ('id', 'xlist')])]), '')
378
+ #
379
+ # >>> ebnf("a | (b - c)")
380
+ # ((alt, [('id', 'a'), (diff, [('id', 'b'), ('id', 'c')])]), '')
381
+ #
382
+ # >>> ebnf("a b | c d")
383
+ # ((alt, [(seq, [('id', 'a'), ('id', 'b')]), (seq, [('id', 'c'), ('id', 'd')])]), '')
384
+ #
385
+ # >>> ebnf("a | b | c")
386
+ # ((alt, [('id', 'a'), ('id', 'b'), ('id', 'c')]), '')
387
+ #
388
+ # >>> ebnf("a) b c")
389
+ # (('id', 'a'), ' b c')
390
+ #
391
+ # >>> ebnf("BaseDecl? PrefixDecl*")
392
+ # ((seq, [(opt, ('id', 'BaseDecl')), ('*', ('id', 'PrefixDecl'))]), '')
393
+ #
394
+ # >>> ebnf("NCCHAR1 | diff | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]")
395
+ # ((alt, [('id', 'NCCHAR1'), ("'", diff), (range, '0-9'), (hex, '#x00B7'), (range, '#x0300-#x036F'), (range, '#x203F-#x2040')]), '')
396
+ #
397
+ # @param [String] s
398
+ # @return [Array]
399
+ def ebnf(s)
400
+ debug("ebnf") {"(#{s.inspect})"}
401
+ e, s = depth {alt(s)}
402
+ debug {"=> alt returned #{[e, s].inspect}"}
403
+ unless s.empty?
404
+ t, ss = depth {token(s)}
405
+ debug {"=> token returned #{[t, ss].inspect}"}
406
+ return [e, ss] if t.is_a?(Array) && t.first == :")"
407
+ end
408
+ [e, s]
409
+ end
410
+
411
+ ##
412
+ # Parse alt
413
+ # >>> alt("a | b | c")
414
+ # ((alt, [('id', 'a'), ('id', 'b'), ('id', 'c')]), '')
415
+ # @param [String] s
416
+ # @return [Array]
417
+ def alt(s)
418
+ debug("alt") {"(#{s.inspect})"}
419
+ args = []
420
+ while !s.empty?
421
+ e, s = depth {seq(s)}
422
+ debug {"=> seq returned #{[e, s].inspect}"}
423
+ if e.to_s.empty?
424
+ break unless args.empty?
425
+ e = [:seq, []] # empty sequence
426
+ end
427
+ args << e
428
+ unless s.empty?
429
+ t, ss = depth {token(s)}
430
+ break unless t[0] == :alt
431
+ s = ss
432
+ end
433
+ end
434
+ args.length > 1 ? [args.unshift(:alt), s] : [e, s]
435
+ end
436
+
437
+ ##
438
+ # parse seq
439
+ #
440
+ # >>> seq("a b c")
441
+ # ((seq, [('id', 'a'), ('id', 'b'), ('id', 'c')]), '')
442
+ #
443
+ # >>> seq("a b? c")
444
+ # ((seq, [('id', 'a'), (opt, ('id', 'b')), ('id', 'c')]), '')
445
+ def seq(s)
446
+ debug("seq") {"(#{s.inspect})"}
447
+ args = []
448
+ while !s.empty?
449
+ e, ss = depth {diff(s)}
450
+ debug {"=> diff returned #{[e, ss].inspect}"}
451
+ unless e.to_s.empty?
452
+ args << e
453
+ s = ss
454
+ else
455
+ break
456
+ end
457
+ end
458
+ if args.length > 1
459
+ [args.unshift(:seq), s]
460
+ elsif args.length == 1
461
+ args + [s]
462
+ else
463
+ ["", s]
464
+ end
465
+ end
466
+
467
+ ##
468
+ # parse diff
469
+ #
470
+ # >>> diff("a - b")
471
+ # ((diff, [('id', 'a'), ('id', 'b')]), '')
472
+ def diff(s)
473
+ debug("diff") {"(#{s.inspect})"}
474
+ e1, s = depth {postfix(s)}
475
+ debug {"=> postfix returned #{[e1, s].inspect}"}
476
+ unless e1.to_s.empty?
477
+ unless s.empty?
478
+ t, ss = depth {token(s)}
479
+ debug {"diff #{[t, ss].inspect}"}
480
+ if t.is_a?(Array) && t.first == :diff
481
+ s = ss
482
+ e2, s = primary(s)
483
+ unless e2.to_s.empty?
484
+ return [[:diff, e1, e2], s]
485
+ else
486
+ raise "Syntax Error"
487
+ end
488
+ end
489
+ end
490
+ end
491
+ [e1, s]
492
+ end
493
+
494
+ ##
495
+ # parse postfix
496
+ #
497
+ # >>> postfix("a b c")
498
+ # (('id', 'a'), ' b c')
499
+ #
500
+ # >>> postfix("a? b c")
501
+ # ((opt, ('id', 'a')), ' b c')
502
+ def postfix(s)
503
+ debug("postfix") {"(#{s.inspect})"}
504
+ e, s = depth {primary(s)}
505
+ debug {"=> primary returned #{[e, s].inspect}"}
506
+ return ["", s] if e.to_s.empty?
507
+ if !s.empty?
508
+ t, ss = depth {token(s)}
509
+ debug {"=> #{[t, ss].inspect}"}
510
+ if t.is_a?(Array) && [:opt, :star, :plus].include?(t.first)
511
+ return [[t.first, e], ss]
512
+ end
513
+ end
514
+ [e, s]
515
+ end
516
+
517
+ ##
518
+ # parse primary
519
+ #
520
+ # >>> primary("a b c")
521
+ # (('id', 'a'), ' b c')
522
+ def primary(s)
523
+ debug("primary") {"(#{s.inspect})"}
524
+ t, s = depth {token(s)}
525
+ debug {"=> token returned #{[t, s].inspect}"}
526
+ if t.is_a?(Symbol) || t.is_a?(String)
527
+ [t, s]
528
+ elsif %w(range hex).map(&:to_sym).include?(t.first)
529
+ [t, s]
530
+ elsif t.first == :"("
531
+ e, s = depth {ebnf(s)}
532
+ debug {"=> ebnf returned #{[e, s].inspect}"}
533
+ [e, s]
534
+ else
535
+ ["", s]
536
+ end
537
+ end
538
+
539
+ ##
540
+ # parse one token; return the token and the remaining string
541
+ #
542
+ # A token is represented as a tuple whose 1st item gives the type;
543
+ # some types have additional info in the tuple.
544
+ #
545
+ # @example
546
+ # >>> token("'abc' def")
547
+ # (("'", 'abc'), ' def')
548
+ #
549
+ # >>> token("[0-9]")
550
+ # ((range, '0-9'), '')
551
+ # >>> token("#x00B7")
552
+ # ((hex, '#x00B7'), '')
553
+ # >>> token ("[#x0300-#x036F]")
554
+ # ((range, '#x0300-#x036F'), '')
555
+ # >>> token("[^<>'{}|^`]-[#x00-#x20]")
556
+ # ((range, "^<>'{}|^`"), '-[#x00-#x20]')
557
+ def token(s)
558
+ s = s.strip
559
+ case m = s[0,1]
560
+ when '"', "'"
561
+ l, s = s[1..-1].split(m, 2)
562
+ [l, s]
563
+ when '['
564
+ l, s = s[1..-1].split(']', 2)
565
+ [[:range, l], s]
566
+ when '#'
567
+ s.match(/(#\w+)(.*)$/)
568
+ l, s = $1, $2
569
+ [[:hex, l], s]
570
+ when /[[:alpha:]]/
571
+ s.match(/(\w+)(.*)$/)
572
+ l, s = $1, $2
573
+ [l.to_sym, s]
574
+ when '@'
575
+ s.match(/@(#\w+)(.*)$/)
576
+ l, s = $1, $2
577
+ [[:"@", l], s]
578
+ when '-'
579
+ [[:diff], s[1..-1]]
580
+ when '?'
581
+ [[:opt], s[1..-1]]
582
+ when '|'
583
+ [[:alt], s[1..-1]]
584
+ when '+'
585
+ [[:plus], s[1..-1]]
586
+ when '*'
587
+ [[:star], s[1..-1]]
588
+ when /[\(\)]/
589
+ [[m.to_sym], s[1..-1]]
590
+ else
591
+ raise "unrecognized token: #{s.inspect}"
592
+ end
593
+ end
594
+
595
+ def depth
596
+ @depth += 1
597
+ ret = yield
598
+ @depth -= 1
599
+ ret
600
+ end
601
+
602
+ ##
603
+ # Progress output when debugging
604
+ # @param [String] node relative location in input
604
+ # @param [String] message ("")
605
+ # @yieldreturn [String] added to message
607
+ def debug(*args)
608
+ return unless @options[:debug]
609
+ options = args.last.is_a?(Hash) ? args.pop : {}
610
+ depth = options[:depth] || @depth
611
+ message = args.pop
612
+ message = message.call if message.is_a?(Proc)
613
+ args << message if message
614
+ args << yield if block_given?
615
+ message = "#{args.join(': ')}"
616
+ str = "[#{@lineno}]#{' ' * depth}#{message}"
617
+ @options[:debug] << str if @options[:debug].is_a?(Array)
618
+ $stderr.puts(str) if @options[:debug] == true
619
+ end
620
+ end
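To make the new EBNF file easier to follow, here is a standalone sketch of the character-class translation that `cclass` performs: XML-BNF hex references such as `#x0300` become `\u`/`\U` escapes inside a bracketed class. This is a simplified re-implementation for illustration, not the released code.

```ruby
# Translate an XML-BNF character class body into a regex-style class:
# "#xHHHH" references become \uHHHH (or \UHHHHHHHH) escapes.
def cclass(txt)
  '[' +
    txt.gsub(/#x[0-9a-fA-F]+/) do |hx|
      digits = hx[2..-1]                 # drop the leading "#x"
      if digits.length <= 4
        "\\u#{digits.rjust(4, '0')}"     # 4-digit \u escape
      else
        "\\U#{digits.rjust(8, '0')}"     # 8-digit \U escape
      end
    end +
    ']'
end

puts cclass("#x0300-#x036F")   # prints [\u0300-\u036F]
puts cclass("^<>'{}|^`")       # prints [^<>'{}|^`]
```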
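The `eachRule` loop above chunks EBNF source into rule strings with a `StringScanner`: whitespace and `/* ... */` comments are skipped, and a `[` opening a `[n]` marker starts a new rule. A minimal sketch of that technique (`each_rule` is a hypothetical helper, simplified from the method in the diff; it does not handle `@terminals`/`@pass` or multi-line rules):

```ruby
require 'strscan'

# Split EBNF source into one string per rule.
def each_rule(src)
  rules = []
  scanner = StringScanner.new(src)
  r = ''
  until scanner.eos?
    if scanner.scan(/\s+/)
      # eat whitespace between rules
    elsif scanner.scan(%r{/\*([^*]|\*[^/])*\*/}m)
      # eat /* ... */ comments
    elsif s = scanner.scan(/\[(?=\w+\])/)
      rules << r unless r.empty?         # a "[n]" marker starts a new rule
      r = s
    else
      r += scanner.scan_until(/$/).to_s  # collect to end of line
      scanner.scan(/\n/)
    end
  end
  rules << r unless r.empty?
  rules
end

src = "/* intro */\n[1] turtleDoc ::= statement*\n[2] statement ::= directive '.'"
p each_rule(src)
# => ["[1] turtleDoc ::= statement*", "[2] statement ::= directive '.'"]
```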
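`ruleParts` then splits each rule string on `::=` and `]`. A sketch of just that splitting step, returning a plain array instead of a `Rule` object (expression parsing, the `ebnf`/`alt`/`seq` chain, is omitted):

```ruby
# Split "[n] symbol ::= expression" into number, symbol, and expression.
def rule_parts(rule)
  num_sym, expr = rule.split('::=', 2).map(&:strip)
  num, sym = num_sym.split(']', 2).map(&:strip)
  [num[1..-1], sym, expr]   # drop the leading "[" from the number
end

p rule_parts("[2] statement ::= directive '.'")
# => ["2", "statement", "directive '.'"]
```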
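The recursive-descent methods (`alt`, `seq`, `diff`, `postfix`, `primary`) all bottom out in `token`, which classifies the next token by its first character and returns the token plus the remaining string. A cut-down sketch of that dispatch (`next_token` is a hypothetical name; the `#xNN`, `@`, `-`, and parenthesis token types from the original are omitted here):

```ruby
# Classify the next token by its first character; return [token, rest].
def next_token(s)
  s = s.strip
  case s[0, 1]
  when '"', "'"                       # quoted literal
    quote = s[0, 1]
    lit, rest = s[1..-1].split(quote, 2)
    [lit, rest]
  when '['                            # character range
    cls, rest = s[1..-1].split(']', 2)
    [[:range, cls], rest]
  when '?' then [[:opt],  s[1..-1]]
  when '*' then [[:star], s[1..-1]]
  when '+' then [[:plus], s[1..-1]]
  when '|' then [[:alt],  s[1..-1]]
  else                                # bare symbol name
    m = s.match(/\A(\w+)(.*)\z/m)
    [m[1].to_sym, m[2]]
  end
end

p next_token("'abc' def")    # => ["abc", " def"]
p next_token("[0-9] rest")   # => [[:range, "0-9"], " rest"]
```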