rdf-turtle 1.1.7 → 1.1.8

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 56d0d3cc2cf4292f416aa291e67d4d058cd332d7
- data.tar.gz: d39c9e316044e0ebea6e8d1780a9ccf136a39e2e
+ metadata.gz: 3eb247803e3c16fd5a6338c2175664deaeeea53c
+ data.tar.gz: cf0ea0c919bfc0421e91eb0dbd7337fd61dfe881
  SHA512:
- metadata.gz: 15d74683135486f8c3f107c71f422d27f1b9e8d13c55decb6a75d629a0d923c804e9ece23a0c0ffb339ff42baea972f8f7e6adf9831bea4d556e808b63abcb7c
- data.tar.gz: 2733ddfba4a8a54c84f997a31c37df08a3b2c820829c8c9f55899d2b67af769a4e2d36bd289b79f2077e6bc5f0138bb236600f49508d30b3f645ca3836b08f8c
+ metadata.gz: 0a461d97b2f6c0fb5a8e63cf0f07f427cbea96dbf38952b0da5dac2064464ff999237067a53d7ac18ddeaaff809a74db273f93095c6007b7d2a13f0e826a15e4
+ data.tar.gz: d145feb24bbda56cbdb7992becc7fe68c501efebac7e020bdcb6e87015e09189951f77b0516e2ad8db84f7197e4a055b4fb7564b603b5da1809eb7cffb1b0f7d
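The entries in checksums.yaml are SHA1 and SHA512 digests of the gem's `metadata.gz` and `data.tar.gz` members. As a minimal sketch, assuming those two members have already been extracted to local paths, digests of this kind could be recomputed with Ruby's standard `digest` library. The helper name `file_digests` is hypothetical and is not part of RubyGems or of this gem.

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Hypothetical helper: returns the hex SHA1 and SHA512 digests of a file,
# keyed by the algorithm names that appear in checksums.yaml.
def file_digests(path)
  {
    'SHA1'   => Digest::SHA1.file(path).hexdigest,
    'SHA512' => Digest::SHA512.file(path).hexdigest
  }
end
```

Comparing the returned strings with the `+` lines for the same member is one way to check that a downloaded archive matches the published 1.1.8 release.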
data/README.md CHANGED
@@ -4,6 +4,8 @@

  [![Gem Version](https://badge.fury.io/rb/rdf-turtle.png)](http://badge.fury.io/rb/rdf-turtle)
  [![Build Status](https://travis-ci.org/ruby-rdf/rdf-turtle.png?branch=master)](http://travis-ci.org/ruby-rdf/rdf-turtle)
+ [![Coverage Status](https://coveralls.io/repos/ruby-rdf/rdf-turtle/badge.svg)](https://coveralls.io/r/ruby-rdf/rdf-turtle)
+ [![Dependency Status](https://gemnasium.com/ruby-rdf/rdf-turtle.png)](https://gemnasium.com/ruby-rdf/rdf-turtle)

  ## Description
  This is a [Ruby][] implementation of a [Turtle][] parser for [RDF.rb][].
@@ -46,10 +48,7 @@ Full documentation available on [Rubydoc.info][Turtle doc]
  ### Variations from the spec
  In some cases, the specification is unclear on certain issues:

- * The LC version of the [Turtle][] specification separates rules for `@base` and `@prefix` with
- closing '.' from the
- SPARQL-like `BASE` and `PREFIX` without closing '.'. This version implements a more flexible
- syntax where the `@` and closing `.` are optional and `base/prefix` are matched case independently.
+ * The LC version of the [Turtle][] specification separates rules for `@base` and `@prefix` with closing '.' from the SPARQL-like `BASE` and `PREFIX` without closing '.'. This version implements a more flexible syntax where the `@` and closing `.` are optional and `base/prefix` are matched case independently.
  * Additionally, both `a` and `A` match `rdf:type`.

  ### Freebase-specific Reader
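The README text in the hunk above describes directive keywords where the leading `@` is optional and `base`/`prefix` are matched regardless of case. The following is only an illustrative sketch of that idea using a plain regular expression. The pattern `DIRECTIVE` and the helper `directive_kind` are hypothetical names, and the gem's real terminal definitions (in `RDF::Turtle::Terminals`) are not shown in this diff and may differ.

```ruby
# Illustrative pattern only (not taken from the gem): an optional '@'
# followed by 'base' or 'prefix', matched case-insensitively.
# Limitation of this toy: a prefixed name such as `base:x` would also match.
DIRECTIVE = /\A(@)?(base|prefix)\b/i

# Hypothetical helper: classify the start of a line as a directive.
# Returns [keyword, at_form] or nil when the line is not a directive.
def directive_kind(line)
  m = DIRECTIVE.match(line.lstrip)
  m && [m[2].downcase.to_sym, !m[1].nil?]
end

directive_kind('@prefix ex: <http://example.org/> .')  # [:prefix, true]
directive_kind('PREFIX ex: <http://example.org/>')     # [:prefix, false]
directive_kind('Base <http://example.org/>')           # [:base, false]
directive_kind('ex:s ex:p ex:o .')                     # nil
```

The second element of the result records whether the `@` form was used, which is the piece of information a reader would need in order to decide whether a closing `.` is expected.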
@@ -83,19 +82,13 @@ An example of reading Freebase dumps:
  r.each_statement {|stmt| puts stmt.to_ntriples}
  end
  ## Implementation Notes
- The reader uses the [EBNF][] gem to generate first, follow and branch tables, and uses
- the `Parser` and `Lexer` modules to implement the Turtle parser.
-
- The parser takes branch and follow tables generated from the original [Turtle
- EBNF Grammar][Turtle EBNF] described in the [specification][Turtle]. Branch and
- Follow tables are specified in {RDF::Turtle::Meta}, which is in turn generated
- using the [EBNF][] gem.
+ This version uses a hand-written parser using the Lexer from the [EBNF][] gem instead of a general [EBNF][] LL(1) parser for faster performance.

  ## Dependencies

- * [Ruby](http://ruby-lang.org/) (>= 1.9.2)
- * [RDF.rb](http://rubygems.org/gems/rdf) (>= 1.1)
- * [EBNF][] (>= 0.3.0)
+ * [Ruby](http://ruby-lang.org/) (>= 1.9.3)
+ * [RDF.rb](http://rubygems.org/gems/rdf) (~> 1.1)
+ * [EBNF][] (~> 0.3)

  ## Installation

@@ -136,7 +129,7 @@ A copy of the [Turtle EBNF][] and derived parser files are included in the repos
  [YARD]: http://yardoc.org/
  [YARD-GS]: http://rubydoc.info/docs/yard/file/docs/GettingStarted.md
  [PDD]: http://lists.w3.org/Archives/Public/public-rdf-ruby/2010May/0013.html
- [RDF.rb]: http://rubydoc.info/github/ruby-rdf/rdf/master/frames
+ [RDF.rb]: http://rubydoc.info/github/ruby-rdf/rdf
  [EBNF]: http://rubygems.org/gems/ebnf
  [Backports]: http://rubygems.org/gems/backports
  [N-Triples]: http://www.w3.org/TR/rdf-testcases/#ntriples
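The Dependencies list in the README changes the RDF.rb and EBNF constraints from `>=` to the pessimistic operator `~>`. A short sketch of what that difference means in practice, using the `Gem::Requirement` and `Gem::Version` classes that ship with RubyGems. The helper name `allowed?` is hypothetical and exists only for this illustration.

```ruby
require 'rubygems'

# Hypothetical helper: does `version` satisfy the requirement string?
def allowed?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

allowed?('>= 1.1', '2.0.0')  # true:  open-ended, any later release matches
allowed?('~> 1.1', '1.9.0')  # true:  still within the 1.x series
allowed?('~> 1.1', '2.0.0')  # false: '~> 1.1' means >= 1.1 and < 2.0
allowed?('~> 0.3', '0.4.2')  # true:  '~> 0.3' means >= 0.3 and < 1.0
allowed?('~> 0.3', '1.0.0')  # false
```

In other words, the new constraints still accept later minor releases but stop at the next major version of each dependency.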
data/VERSION CHANGED
@@ -1 +1 @@
- 1.1.7
+ 1.1.8
@@ -1,203 +1,44 @@
- require 'rdf/turtle/meta'
- require 'ebnf/ll1/parser'
+ # coding: utf-8
+ require 'ebnf/ll1/lexer'

  module RDF::Turtle
  ##
  # A parser for the Turtle 2
  class Reader < RDF::Reader
  format Format
- include RDF::Turtle::Meta
  include EBNF::LL1::Parser
  include RDF::Turtle::Terminals

  # Terminals passed to lexer. Order matters!
- terminal(:ANON, ANON) do |prod, token, input|
- input[:resource] = self.bnode
- end
- terminal(:BLANK_NODE_LABEL, BLANK_NODE_LABEL) do |prod, token, input|
- input[:resource] = self.bnode(token.value[2..-1])
- end
- terminal(:IRIREF, IRIREF, unescape: true) do |prod, token, input|
- input[:resource] = process_iri(token.value[1..-2])
- end
- terminal(:DOUBLE, DOUBLE) do |prod, token, input|
- # Note that a Turtle Double may begin with a '.[eE]', so tack on a leading
- # zero if necessary
- value = token.value.sub(/\.([eE])/, '.0\1')
- input[:resource] = literal(value, datatype: RDF::XSD.double)
- end
- terminal(:DECIMAL, DECIMAL) do |prod, token, input|
- # Note that a Turtle Decimal may begin with a '.', so tack on a leading
- # zero if necessary
- value = token.value
- value = "0#{token.value}" if token.value[0,1] == "."
- input[:resource] = literal(value, datatype: RDF::XSD.decimal)
- end
- terminal(:INTEGER, INTEGER) do |prod, token, input|
- input[:resource] = literal(token.value, datatype: RDF::XSD.integer)
- end
- # Spec confusion: spec says : "Literals , prefixed names and IRIs may also contain escape sequences"
- terminal(:PNAME_LN, PNAME_LN, unescape: true) do |prod, token, input|
- prefix, suffix = token.value.split(":", 2)
- input[:resource] = pname(prefix, suffix)
- end
- # Spec confusion: spec says : "Literals , prefixed names and IRIs may also contain escape sequences"
- terminal(:PNAME_NS, PNAME_NS) do |prod, token, input|
- prefix = token.value[0..-2]
-
- # Two contexts, one when prefix is being defined, the other when being used
- case prod
- when :prefixID, :sparqlPrefix
- input[:prefix] = prefix
- else
- input[:resource] = pname(prefix, '')
- end
- end
- terminal(:STRING_LITERAL_LONG_SINGLE_QUOTE, STRING_LITERAL_LONG_SINGLE_QUOTE, unescape: true) do |prod, token, input|
- input[:string_value] = token.value[3..-4]
- end
- terminal(:STRING_LITERAL_LONG_QUOTE, STRING_LITERAL_LONG_QUOTE, unescape: true) do |prod, token, input|
- input[:string_value] = token.value[3..-4]
- end
- terminal(:STRING_LITERAL_QUOTE, STRING_LITERAL_QUOTE, unescape: true) do |prod, token, input|
- input[:string_value] = token.value[1..-2]
- end
- terminal(:STRING_LITERAL_SINGLE_QUOTE, STRING_LITERAL_SINGLE_QUOTE, unescape: true) do |prod, token, input|
- input[:string_value] = token.value[1..-2]
- end
+ terminal(:ANON, ANON)
+ terminal(:BLANK_NODE_LABEL, BLANK_NODE_LABEL)
+ terminal(:IRIREF, IRIREF, unescape: true)
+ terminal(:DOUBLE, DOUBLE)
+ terminal(:DECIMAL, DECIMAL)
+ terminal(:INTEGER, INTEGER)
+ terminal(:PNAME_LN, PNAME_LN, unescape: true)
+ terminal(:PNAME_NS, PNAME_NS)
+ terminal(:STRING_LITERAL_LONG_SINGLE_QUOTE, STRING_LITERAL_LONG_SINGLE_QUOTE, unescape: true)
+ terminal(:STRING_LITERAL_LONG_QUOTE, STRING_LITERAL_LONG_QUOTE, unescape: true)
+ terminal(:STRING_LITERAL_QUOTE, STRING_LITERAL_QUOTE, unescape: true)
+ terminal(:STRING_LITERAL_SINGLE_QUOTE, STRING_LITERAL_SINGLE_QUOTE, unescape: true)

  # String terminals
- terminal(nil, %r([\(\),.;\[\]Aa]|\^\^|true|false)) do |prod, token, input|
- case token.value
- when 'A', 'a' then input[:resource] = RDF.type
- when 'true', 'false' then input[:resource] = RDF::Literal::Boolean.new(token.value)
- when '@base', '@prefix' then input[:lang] = token.value[1..-1]
- when '.' then input[:terminated] = true
- else input[:string] = token.value
- end
- end
-
- terminal(:PREFIX, PREFIX) do |prod, token, input|
- input[:string_value] = token.value
- end
- terminal(:BASE, BASE) do |prod, token, input|
- input[:string_value] = token.value
- end
-
- terminal(:LANGTAG, LANGTAG) do |prod, token, input|
- input[:lang] = token.value[1..-1]
- end
+ terminal(nil, %r([\(\),.;\[\]Aa]|\^\^|true|false))

- # Productions
- # [4] prefixID defines a prefix mapping
- production(:prefixID) do |input, current, callback|
- prefix = current[:prefix]
- iri = current[:resource]
- lexical = current[:string_value]
- terminated = current[:terminated]
- debug("prefixID") {"Defined prefix #{prefix.inspect} mapping to #{iri.inspect}"}
- if lexical.start_with?('@') && lexical != '@prefix'
- error(:prefixID, "should be downcased")
- elsif lexical == '@prefix'
- error(:prefixID, "directive not terminated") unless terminated
- else
- error(:prefixID, "directive should not be terminated") if terminated
- end
- prefix(prefix, iri)
- end
-
- # [5] base set base_uri
- production(:base) do |input, current, callback|
- iri = current[:resource]
- lexical = current[:string_value]
- terminated = current[:terminated]
- debug("base") {"Defined base as #{iri}"}
- if lexical.start_with?('@') && lexical != '@base'
- error(:base, "should be downcased")
- elsif lexical == '@base'
- error(:base, "directive not terminated") unless terminated
- else
- error(:base, "directive should not be terminated") if terminated
- end
- options[:base_uri] = iri
- end
-
- # [6] triples
- start_production(:triples) do |input, current, callback|
- # Note production as triples for blankNodePropertyList
- # to set :subject instead of :resource
- current[:triples] = true
- end
- production(:triples) do |input, current, callback|
- # Note production as triples for blankNodePropertyList
- # to set :subject instead of :resource
- current[:triples] = true
- end
+ terminal(:PREFIX, PREFIX)
+ terminal(:BASE, BASE)
+ terminal(:LANGTAG, LANGTAG)

- # [9] verb ::= predicate | "a"
- production(:verb) do |input, current, callback|
- input[:predicate] = current[:resource]
- end
-
- # [10] subject ::= IRIref | BlankNode | collection
- start_production(:subject) do |input, current, callback|
- current[:triples] = nil
- end
-
- production(:subject) do |input, current, callback|
- input[:subject] = current[:resource]
- end
-
- # [12] object ::= iri | BlankNode | collection | blankNodePropertyList | literal
- production(:object) do |input, current, callback|
- if input[:object_list]
- # Part of an rdf:List collection
- input[:object_list] << current[:resource]
- else
- debug("object") {"current: #{current.inspect}"}
- callback.call(:statement, "object", input[:subject], input[:predicate], current[:resource])
- end
- end
-
- # [14] blankNodePropertyList ::= "[" predicateObjectList "]"
- start_production(:blankNodePropertyList) do |input, current, callback|
- current[:subject] = self.bnode
- end
-
- production(:blankNodePropertyList) do |input, current, callback|
- if input[:triples]
- input[:subject] = current[:subject]
- else
- input[:resource] = current[:subject]
- end
- end
-
- # [15] collection ::= "(" object* ")"
- start_production(:collection) do |input, current, callback|
- # Tells the object production to collect and not generate statements
- current[:object_list] = []
- end
-
- production(:collection) do |input, current, callback|
- # Create an RDF list
- objects = current[:object_list]
- list = RDF::List[*objects]
- list.each_statement do |statement|
- next if statement.predicate == RDF.type && statement.object == RDF.List
- callback.call(:statement, "collection", statement.subject, statement.predicate, statement.object)
- end
+ ##
+ # Accumulated errors found during processing
+ # @return [Array<String>]
+ attr_reader :errors

- # Return bnode as resource
- input[:resource] = list.subject
- end
-
- # [16] RDFLiteral ::= String ( LanguageTag | ( "^^" IRIref ) )?
- production(:RDFLiteral) do |input, current, callback|
- opts = {}
- opts[:datatype] = current[:resource] if current[:resource]
- opts[:language] = current[:lang] if current[:lang]
- input[:resource] = literal(current[:string_value], opts)
- end
+ ##
+ # Accumulated warnings found during processing
+ # @return [Array<String>]
+ attr_reader :warnings

  ##
  # Redirect for Freebase Reader
@@ -229,13 +70,13 @@ module RDF::Turtle
  # the base URI to use when resolving relative URIs (for accessing intermediate parser productions)
  # @option options [#to_s] :anon_base ("b0")
  # Basis for generating anonymous Nodes
- # @option options [Boolean] :resolve_uris (false)
- # Resolve prefix and relative IRIs, otherwise, when serializing the parsed SSE
- # as S-Expressions, use the original prefixed and relative URIs along with `base` and `prefix`
- # definitions.
  # @option options [Boolean] :validate (false)
  # whether to validate the parsed statements and values. If not validating,
  # the parser will attempt to recover from errors.
+ # @option options [Array] :errors
+ # array for placing errors found when parsing
+ # @option options [Array] :warnings
+ # array for placing warnings found when parsing
  # @option options [Boolean] :progress
  # Show progress of parser productions
  # @option options [Boolean, Integer, Array] :debug
@@ -255,6 +96,11 @@ module RDF::Turtle
  whitespace: WS,
  }.merge(options)
  @options = {prefixes: {nil => ""}}.merge(@options) unless @options[:validate]
+ @errors = @options[:errors] || []
+ @warnings = @options[:warnings] || []
+ @depth = 0
+ @prod_stack = []
+
  @options[:debug] ||= case
  when RDF::Turtle.debug? then true
  when @options[:progress] then 2
@@ -268,6 +114,8 @@ module RDF::Turtle
  debug("canonicalize") {canonicalize?.inspect}
  debug("intern") {intern?.inspect}

+ @lexer = EBNF::LL1::Lexer.new(input, self.class.patterns, @options)
+
  if block_given?
  case block.arity
  when 0 then instance_eval(&block)
@@ -289,41 +137,28 @@ module RDF::Turtle
  # @return [void]
  def each_statement(&block)
  if block_given?
+ @recovering = false
  @callback = block

- parse(@input, START.to_sym, @options.merge(branch: BRANCH,
- first: FIRST,
- follow: FOLLOW,
- reset_on_start: true)
- ) do |context, *data|
- case context
- when :statement
- loc = data.shift
- s = RDF::Statement.from(data, lineno: lineno)
- add_statement(loc, s) unless !s.valid? && validate?
- when :trace
- level, lineno, depth, *args = data
- message = "#{args.join(': ')}"
- d_str = depth > 100 ? ' ' * 100 + '+' : ' ' * depth
- str = "[#{lineno}](#{level})#{d_str}#{message}"
- case @options[:debug]
- when Array
- @options[:debug] << str
- when TrueClass
- $stderr.puts str
- when Integer
- $stderr.puts(str) if level <= @options[:debug]
- end
+ begin
+ while (@lexer.first rescue true)
+ read_statement
+ end
+ rescue EBNF::LL1::Lexer::Error, SyntaxError, EOFError, Recovery
+ # Terminate loop if EOF found while recovering
+ end
+
+ if validate?
+ if !warnings.empty? && !@options[:warnings]
+ $stderr.puts "Warnings: #{warnings.join("\n")}"
+ end
+ if !errors.empty?
+ $stderr.puts "Errors: #{errors.join("\n")}" unless @options[:errors]
+ raise RDF::ReaderError, "Errors found during processing"
  end
  end
  end
  enum_for(:each_statement)
- rescue EBNF::LL1::Parser::Error, EBNF::LL1::Lexer::Error => e
- if validate?
- raise RDF::ReaderError.new(e.message, lineno: e.lineno, token: e.token)
- else
- $stderr.puts e.message
- end
  end

  ##
@@ -345,13 +180,12 @@ module RDF::Turtle

  # add a statement, object can be literal or URI or bnode
  #
- # @param [Nokogiri::XML::Node, any] node XML Node or string for showing context
+ # @param [Symbol] production
  # @param [RDF::Statement] statement the subject of the statement
  # @return [RDF::Statement] Added statement
  # @raise [RDF::ReaderError] Checks parameter types and raises if they are incorrect if parsing mode is _validate_.
- def add_statement(node, statement)
- error(node, "Statement is invalid: #{statement.inspect.inspect}") if validate? && statement.invalid?
- progress(node) {"generate statement: #{statement.to_ntriples}"}
+ def add_statement(production, statement)
+ error("Statement is invalid: #{statement.inspect.inspect}", production: production) if validate? && statement.invalid?
  @callback.call(statement) if statement.subject &&
  statement.predicate &&
  statement.object &&
@@ -360,11 +194,15 @@ module RDF::Turtle

  # Process a URI against base
  def process_iri(iri)
- value = base_uri.join(iri)
+ iri = iri.value[1..-2] if iri === :IRIREF
+ value = RDF::URI(iri)
+ value = base_uri.join(value) if value.relative?
  value.validate! if validate?
  value.canonicalize! if canonicalize?
  value = RDF::URI.intern(value) if intern?
  value
+ rescue ArgumentError => e
+ error("process_iri", e)
  end

  # Create a literal
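The `process_iri` change in the hunk above parses the IRI first and joins it against the base only when it is relative. The following sketch shows the same shape using Ruby's standard `uri` library as a stand-in. This is an assumption made for the sake of a self-contained example: the gem uses `RDF::URI`, not the standard library's `URI`, which handles URIs rather than full IRIs. The helper name `resolve_iri` is hypothetical.

```ruby
require 'uri'

# Hypothetical helper mirroring the shape of the change: parse first,
# then join against the base only when the reference is relative.
def resolve_iri(base, ref)
  value = URI.parse(ref)
  value = URI.join(base, ref) if value.relative?
  value
end

resolve_iri('http://example.org/a/b', 'c').to_s
# "http://example.org/a/c"
resolve_iri('http://example.org/a/b', 'http://other.example/x').to_s
# "http://other.example/x"
```

Absolute references pass through without touching the base, so the join is skipped whenever the reference is already absolute.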
@@ -376,6 +214,8 @@ module RDF::Turtle
  "c14n?: #{canonicalize?.inspect}"
  end
  RDF::Literal.new(value, options.merge(validate: validate?, canonicalize: canonicalize?))
+ rescue ArgumentError => e
+ error("Argument Error #{e.message}", production: :literal, token: @lexer.first)
  end

  ##
@@ -397,7 +237,7 @@ module RDF::Turtle
  if prefix(prefix)
  base = prefix(prefix).to_s
  elsif !prefix(prefix)
- error("pname", "undefined prefix #{prefix.inspect}")
+ error("undefined prefix", production: :pname, token: prefix)
  base = ''
  end
  suffix = suffix.to_s.sub(/^\#/, "") if base.index("#")
@@ -411,5 +251,443 @@ module RDF::Turtle
  @bnode_cache ||= {}
  @bnode_cache[value.to_s] ||= RDF::Node.new(value)
  end
+
+ protected
+ # @return [void]
+ def read_statement
+ prod(:statement, %w{.}) do
+ error("read_statement", "Unexpected end of file") unless token = @lexer.first
+ case token.type
+ when :BASE, :PREFIX
+ read_directive || error("Failed to parse directive", production: :directive, token: token)
+ else
+ read_triples || error("Expected token", production: :statement, token: token)
+ if !@recovering || @lexer.first === '.'
+ # If recovering, we will have eaten the closing '.'
+ token = @lexer.shift
+ unless token && token.value == '.'
+ error("Expected '.' following triple", production: :statement, token: token)
+ end
+ end
+ end
+ end
+ end
+
+ # @return [void]
+ def read_directive
+ prod(:directive, %w{.}) do
+ token = @lexer.first
+ case token.type
+ when :BASE
+ prod(:base) do
+ @lexer.shift
+ terminated = token.value == '@base'
+ iri = @lexer.shift
+ error("Expected IRIREF", :production => :base, token: iri) unless iri === :IRIREF
+ @options[:base_uri] = process_iri(iri)
+ error("base", "#{token} should be downcased") if token.value.start_with?('@') && token.value != '@base'
+
+ if terminated
+ error("base", "Expected #{token} to be terminated") unless @lexer.first === '.'
+ @lexer.shift
+ elsif @lexer.first === '.'
+ error("base", "Expected #{token} not to be terminated")
+ else
+ true
+ end
+ end
+ when :PREFIX
+ prod(:prefixID, %w{.}) do
+ @lexer.shift
+ pfx, iri = @lexer.shift, @lexer.shift
+ terminated = token.value == '@prefix'
+ error("Expected PNAME_NS", :production => :prefix, token: pfx) unless pfx === :PNAME_NS
+ error("Expected IRIREF", :production => :prefix, token: iri) unless iri === :IRIREF
+ debug("prefixID") {"Defined prefix #{pfx.inspect} mapping to #{iri.inspect}"}
+ prefix(pfx.value[0..-2], process_iri(iri))
+ error("prefixId", "#{token} should be downcased") if token.value.start_with?('@') && token.value != '@prefix'
+
+ if terminated
+ error("prefixID", "Expected #{token} to be terminated") unless @lexer.first === '.'
+ @lexer.shift
+ elsif @lexer.first === '.'
+ error("prefixID", "Expected #{token} not to be terminated")
+ else
+ true
+ end
+ end
+ end
+ end
+ end
+
+ # @return [Object] returns the last verb matched, or subject BNode on predicateObjectList?
+ def read_triples
+ prod(:triples, %w{.}) do
+ error("read_triples", "Unexpected end of file") unless token = @lexer.first
+ case token.type || token.value
+ when '['
+ # blankNodePropertyList predicateObjectList?
+ subject = read_blankNodePropertyList || error("Failed to parse blankNodePropertyList", production: :triples, token: @lexer.first)
+ read_predicateObjectList(subject) || subject
+ else
+ # subject predicateObjectList
+ subject = read_subject || error("Failed to parse subject", production: :triples, token: @lexer.first)
+ read_predicateObjectList(subject) || error("Expected predicateObjectList", production: :triples, token: @lexer.first)
+ end
+ end
+ end
+
+ # @param [RDF::Resource] subject
+ # @return [RDF::URI] the last matched verb
+ def read_predicateObjectList(subject)
+ prod(:predicateObjectList, %{;}) do
+ last_verb = nil
+ while verb = read_verb
+ last_verb = verb
+ prod(:_predicateObjectList_5) do
+ read_objectList(subject, verb) || error("Expected objectList", production: :predicateObjectList, token: @lexer.first)
+ end
+ break unless @lexer.first === ';'
+ @lexer.shift while @lexer.first === ';'
+ end
+ last_verb
+ end
+ end
+
+ # @return [RDF::Term] the last matched subject
+ def read_objectList(subject, predicate)
+ prod(:objectList, %{,}) do
+ last_object = nil
+ while object = prod(:_objectList_2) {read_object(subject, predicate)}
+ last_object = object
+ break unless @lexer.first === ','
+ @lexer.shift while @lexer.first === ','
+ end
+ last_object
+ end
+ end
+
+ # @return [RDF::URI]
+ def read_verb
+ error("read_verb", "Unexpected end of file") unless token = @lexer.first
+ case token.type || token.value
+ when 'a' then prod(:verb) {@lexer.shift && RDF.type}
+ else prod(:verb) {read_iri}
+ end
+ end
+
+ # @return [RDF::Resource]
+ def read_subject
+ prod(:subject) do
+ read_iri ||
+ read_BlankNode ||
+ read_collection ||
+ error( "Expected subject", production: :subject, token: @lexer.first)
+ end
+ end
+
+ # @return [void]
+ def read_object(subject = nil, predicate = nil)
+ prod(:object) do
+ if object = read_iri ||
+ read_BlankNode ||
+ read_collection ||
+ read_blankNodePropertyList ||
+ read_literal
+
+ add_statement(:object, RDF::Statement(subject, predicate, object)) if subject && predicate
+ object
+ end
+ end
+ end
+
+ # @return [RDF::Literal]
+ def read_literal
+ error("Unexpected end of file", production: :literal) unless token = @lexer.first
+ case token.type || token.value
+ when :INTEGER then prod(:literal) {literal(@lexer.shift.value, datatype: RDF::XSD.integer)}
+ when :DECIMAL
+ prod(:literal) do
+ value = @lexer.shift.value
+ value = "0#{value}" if value.start_with?(".")
+ literal(value, datatype: RDF::XSD.decimal)
+ end
+ when :DOUBLE then prod(:literal) {literal(@lexer.shift.value.sub(/\.([eE])/, '.0\1'), datatype: RDF::XSD.double)}
+ when "true", "false" then prod(:literal) {literal(@lexer.shift.value, datatype: RDF::XSD.boolean)}
+ when :STRING_LITERAL_QUOTE, :STRING_LITERAL_SINGLE_QUOTE
+ prod(:literal) do
+ value = @lexer.shift.value[1..-2]
+ error("read_literal", "Unexpected end of file") unless token = @lexer.first
+ case token.type || token.value
+ when :LANGTAG
+ literal(value, language: @lexer.shift.value[1..-1].to_sym)
+ when '^^'
+ @lexer.shift
+ literal(value, datatype: read_iri)
+ else
+ literal(value)
+ end
+ end
+ when :STRING_LITERAL_LONG_QUOTE, :STRING_LITERAL_LONG_SINGLE_QUOTE
+ prod(:literal) do
+ value = @lexer.shift.value[3..-4]
+ error("read_literal", "Unexpected end of file") unless token = @lexer.first
+ case token.type || token.value
+ when :LANGTAG
+ literal(value, language: @lexer.shift.value[1..-1].to_sym)
+ when '^^'
+ @lexer.shift
+ literal(value, datatype: read_iri)
+ else
+ literal(value)
+ end
+ end
+ end
+ end
+
+ # @return [RDF::Node]
+ def read_blankNodePropertyList
+ token = @lexer.first
+ if token === '['
+ prod(:blankNodePropertyList, %{]}) do
+ @lexer.shift
+ progress("blankNodePropertyList") {"token: #{token.inspect}"}
+ node = bnode
+ read_predicateObjectList(node)
+ error("blankNodePropertyList", "Expected closing ']'") unless @lexer.first === ']'
+ @lexer.shift
+ node
+ end
+ end
+ end
+
+ # @return [RDF::Node]
+ def read_collection
+ if @lexer.first === '('
+ prod(:collection, %{)}) do
+ @lexer.shift
+ token = @lexer.first
+ progress("collection") {"token: #{token.inspect}"}
+ objects = []
+ while object = read_object
+ objects << object
+ end
+ list = RDF::List.new(nil, nil, objects)
+ list.each_statement do |statement|
+ add_statement("collection", statement)
+ end
+ error("collection", "Expected closing ')'") unless @lexer.first === ')'
+ @lexer.shift
+ list.subject
+ end
+ end
+ end
+
+ # @return [RDF::URI]
+ def read_iri
+ token = @lexer.first
+ case token && token.type
+ when :IRIREF then prod(:iri) {process_iri(@lexer.shift)}
+ when :PNAME_LN, :PNAME_NS then prod(:iri) {pname(*@lexer.shift.value.split(':', 2))}
+ end
+ end
+
+ # @return [RDF::Node]
+ def read_BlankNode
+ token = @lexer.first
+ case token && token.type
+ when :BLANK_NODE_LABEL then prod(:BlankNode) {bnode(@lexer.shift.value[2..-1])}
+ when :ANON then @lexer.shift && prod(:BlankNode) {bnode}
+ end
+ end
+
+ def prod(production, recover_to = [])
+ @prod_stack << {prod: production, recover_to: recover_to}
+ @depth += 1
+ @recovering = false
+ progress("#{production}(start)") {"token: #{@lexer.first.inspect}"}
+ yield
+ rescue EBNF::LL1::Lexer::Error, SyntaxError, Recovery => e
+ # Lexer encountered an illegal token or the parser encountered
+ # a terminal which is inappropriate for the current production.
+ # Perform error recovery to find a reasonable terminal based
+ # on the follow sets of the relevant productions. This includes
+ # remaining terms from the current production and the stacked
+ # productions
+ case e
+ when EBNF::LL1::Lexer::Error
+ @lexer.recover
+ begin
+ error("Lexer error", "With input '#{e.input}': #{e.message}",
+ production: production,
+ token: e.token)
+ rescue SyntaxError
+ end
+ end
+ raise EOFError, "End of input found when recovering" if @lexer.first.nil?
+ debug("recovery", "current token: #{@lexer.first.inspect}", :level => 4)
+
+ unless e.is_a?(Recovery)
+ # Get the list of follows for this sequence, this production and the stacked productions.
+ debug("recovery", "stack follows:", :level => 4)
+ @prod_stack.reverse.each do |prod|
+ debug("recovery", :level => 4) {" #{prod[:prod]}: #{prod[:recover_to].inspect}"}
+ end
+ end
+
+ # Find all follows to the top of the stack
+ follows = @prod_stack.map {|prod| Array(prod[:recover_to])}.flatten.compact.uniq
+
+ # Skip tokens until one is found in follows
+ while (token = (@lexer.first rescue @lexer.recover)) && follows.none? {|t| token === t}
+ skipped = @lexer.shift
+ progress("recovery") {"skip #{skipped.inspect}"}
+ end
+ debug("recovery") {"found #{token.inspect} in follows"}
+
+ # Re-raise the error unless token is a follows of this production
+ raise Recovery unless Array(recover_to).any? {|t| token === t}
+
+ # Skip that token to get something reasonable to start the next production with
+ @lexer.shift
+ ensure
+ progress("#{production}(finish)")
+ @depth -= 1
+ @prod_stack.pop
+ end
+
+ ##
+ # Warning information, used as level `1` debug messages.
+ #
+ # @param [String] node Relevant location associated with message
+ # @param [String] message Error string
+ # @param [Hash] options
+ # @option options [URI, #to_s] :production
+ # @option options [Token] :token
+ # @see {#debug}
+ def warn(node, message, options = {})
+ m = "WARNING "
+ m += "[line: #{@lineno}] " if @lineno
+ m += message
+ m += " (found #{options[:token].inspect})" if options[:token]
+ m += ", production = #{options[:production].inspect}" if options[:production]
+ @warnings << m unless @recovering
+ debug(node, m, options.merge(:level => 1))
+ end
+
+ ##
+ # Error information, used as level `0` debug messages.
+ #
+ # @overload debug(node, message, options)
+ # @param [String] node Relevant location associated with message
+ # @param [String] message Error string
+ # @param [Hash] options
+ # @option options [URI, #to_s] :production
+ # @option options [Token] :token
+ # @see {#debug}
+ def error(*args)
+ return if @recovering
+ options = args.last.is_a?(Hash) ? args.pop : {}
+ lineno = @lineno || (options[:token].lineno if options[:token].respond_to?(:lineno))
+ message = "#{args.join(': ')}"
+ m = "ERROR "
+ m += "[line: #{lineno}] " if lineno
+ m += message
+ m += " (found #{options[:token].inspect})" if options[:token]
+ m += ", production = #{options[:production].inspect}" if options[:production]
+ @recovering = true
+ @errors << m
+ debug(m, options.merge(level: 0))
+ raise SyntaxError.new(m, lineno: lineno, token: options[:token], production: options[:production])
+ end
+
+ ##
+ # Progress output when debugging.
+ #
+ # The call is ignored, unless `@options[:debug]` is set, in which
+ # case it records tracing information as indicated. Additionally,
+ # if `@options[:debug]` is an Integer, the call is aborted if the
+ # `:level` option is greater than `@options[:debug]`.
+ #
+ # @overload debug(node, message, options)
+ # @param [Array<String>] args Relevant location associated with message
+ # @param [Hash] options
+ # @option options [Integer] :depth
+ # Recursion depth for indenting output
+ # @option options [Integer] :level
+ # Level assigned to message, by convention, level `0` is for
+ # errors, level `1` is for warnings, level `2` is for parser
+ # progress information, and anything higher is for various levels
+ # of debug information.
+ #
+ # @yieldparam [:trace] trace
+ # @yieldparam [Integer] level
+ # @yieldparam [Integer] lineno
+ # @yieldparam [Integer] depth Recursive depth of productions
+ # @yieldparam [Array<String>] args
+ # @yieldreturn [String] added to message
+ def debug(*args)
+ return unless @options[:debug]
+ options = args.last.is_a?(Hash) ? args.pop : {}
+ debug_level = options.fetch(:level, 3)
+ return if @options[:debug].is_a?(Integer) && debug_level > @options[:debug]
+
+ depth = options[:depth] || @depth
+ args << yield if block_given?
+
+ message = "#{args.join(': ')}"
+ d_str = depth > 100 ? ' ' * 100 + '+' : ' ' * depth
+ str = "[#{lineno}](#{debug_level})#{d_str}#{message}"
+ case @options[:debug]
+ when Array
+ @options[:debug] << str
+ when TrueClass
+ $stderr.puts str
+ when Integer
+ case debug_level
+ when 0 then return if @options[:errors]
+ when 1 then return if @options[:warnings]
+ end
+ $stderr.puts(str) if debug_level <= @options[:debug]
+ end
+ end
+
+ # Used for internal error recovery
+ class Recovery < StandardError; end
+
+ class SyntaxError < RDF::ReaderError
+ ##
+ # The current production.
+ #
+ # @return [Symbol]
+ attr_reader :production
+
+ ##
+ # The invalid token which triggered the error.
+ #
+ # @return [String]
+ attr_reader :token
+
+ ##
+ # The line number where the error occurred.
+ #
+ # @return [Integer]
+ attr_reader :lineno
+
+ ##
+ # Initializes a new syntax error instance.
+ #
+ # @param [String, #to_s] message
+ # @param [Hash{Symbol => Object}] options
+ # @option options [Symbol] :production (nil)
+ # @option options [String] :token (nil)
+ # @option options [Integer] :lineno (nil)
+ def initialize(message, options = {})
+ @production = options[:production]
+ @token = options[:token]
+ @lexer = nil if false
+ @lineno = options[:lineno] || (@token.lineno if @token.respond_to?(:lineno))
+ super(message.to_s)
+ end
+ end
  end # class Reader
  end # module RDF::Turtle
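The new reader replaces the table-driven LL(1) parser with hand-written `read_*` methods, and its `prod` wrapper recovers from errors by skipping tokens until it finds one that can follow the current production. The toy below is a deliberately simplified sketch of that panic-mode idea and is not the gem's implementation. It assumes whitespace-separated tokens and a single statement shape, `subject predicate object .`, and the names `ToyParser`, `ParseError`, and `recover_to` are hypothetical.

```ruby
# Toy parser: each statement is `subject predicate object .` over
# whitespace-separated tokens. On a malformed statement it records an
# error, skips ahead past the next '.', and carries on parsing.
class ToyParser
  class ParseError < StandardError; end

  # Accumulated error messages, in the order they were found.
  attr_reader :errors

  def initialize(text)
    @tokens = text.split
    @errors = []
  end

  # Returns the list of well-formed [subject, predicate, object] triples.
  def parse
    triples = []
    until @tokens.empty?
      begin
        triples << read_statement
      rescue ParseError => e
        @errors << e.message
        recover_to('.')
      end
    end
    triples
  end

  private

  def read_statement
    triple = %w[subject predicate object].map { |role| read_term(role) }
    token = @tokens.shift
    raise ParseError, "expected '.', found #{token.inspect}" unless token == '.'
    triple
  end

  def read_term(role)
    token = @tokens.first
    if token.nil? || token == '.'
      raise ParseError, "expected #{role}, found #{token.inspect}"
    end
    @tokens.shift
  end

  # Panic-mode recovery: drop tokens up to and including the next `stop`.
  def recover_to(stop)
    @tokens.shift until @tokens.empty? || @tokens.first == stop
    @tokens.shift
  end
end

parser = ToyParser.new('a b c . d e . f g h .')
parser.parse        # [["a", "b", "c"], ["f", "g", "h"]]
parser.errors.size  # 1
```

The malformed middle statement costs one error message but does not stop the well-formed statements on either side from being returned, which is the behaviour the `:errors` option and the recovery logic in the diff are aiming for.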