rdf-turtle 0.0.2
- data/AUTHORS +1 -0
- data/History +9 -0
- data/README.markdown +142 -0
- data/UNLICENSE +24 -0
- data/VERSION +1 -0
- data/lib/rdf/ll1/lexer.rb +458 -0
- data/lib/rdf/ll1/parser.rb +462 -0
- data/lib/rdf/ll1/scanner.rb +100 -0
- data/lib/rdf/turtle.rb +35 -0
- data/lib/rdf/turtle/format.rb +41 -0
- data/lib/rdf/turtle/meta.rb +1748 -0
- data/lib/rdf/turtle/patches.rb +38 -0
- data/lib/rdf/turtle/reader.rb +362 -0
- data/lib/rdf/turtle/terminals.rb +88 -0
- data/lib/rdf/turtle/writer.rb +562 -0
- metadata +115 -0
data/AUTHORS ADDED
@@ -0,0 +1 @@
+* Gregg Kellogg <gregg@kellogg-assoc.com>
data/History ADDED
@@ -0,0 +1,9 @@
+### 0.0.3
+* Completed RDF 1.1 Turtle based on http://www.w3.org/TR/2011/WD-turtle-20110809/
+* Reader
+* Writer
+* Issues:
+  * IRI lexical representations
+  * PNAMEs are not unescaped; should they be?
+  * Assume an undefined empty prefix is a synonym for base
+  * Can a list be used on its own? Used in a Turtle example.
data/README.markdown ADDED
@@ -0,0 +1,142 @@
+# RDF::Turtle reader/writer
+[Turtle][] reader/writer for [RDF.rb][RDF.rb].
+
+## Description
+This is a [Ruby][] implementation of a [Turtle][] parser for [RDF.rb][].
+
+## Features
+RDF::Turtle parses [Turtle][Turtle] and [N-Triples][N-Triples] into statements or triples. It also serializes to Turtle.
+
+Install with `gem install rdf-turtle`
+
+* 100% free and unencumbered [public domain](http://unlicense.org/) software.
+* Implements a complete parser for [Turtle][].
+* Compatible with Ruby 1.8.7+, Ruby 1.9.x, and JRuby 1.4/1.5.
+
+## Usage
+Instantiate a reader from a local file:
+
+    RDF::Turtle::Reader.open("etc/foaf.ttl") do |reader|
+      reader.each_statement do |statement|
+        puts statement.inspect
+      end
+    end
+
+or
+
+    graph = RDF::Graph.load("etc/foaf.ttl", :format => :ttl)
+
+Define `@base` and `@prefix` definitions, and use them for serialization via the `:base_uri` and `:prefixes` options.
+
+Write a graph to a file:
+
+    RDF::Turtle::Writer.open("etc/test.ttl") do |writer|
+      writer << graph
+    end
+
+## Documentation
+Full documentation is available on [Rubydoc.info][Turtle doc].
+
+### Principal Classes
+* {RDF::Turtle::Format}
+* {RDF::Turtle::TTL}
+  Asserts the :ttl format, text/turtle MIME type, and .ttl file extension.
+* {RDF::Turtle::Reader}
+* {RDF::Turtle::Writer}
+
+### Variations from the spec
+In some cases, the specification is unclear on certain issues:
+
+* In section 2.1, the [spec][Turtle] indicates that "Literals, prefixed names and IRIs may also
+  contain escapes to encode surrounding syntax ...", however the description in 5.2 indicates
+  that only IRI\_REF and the various STRING\_LITERAL terms are subject to unescaping. This means
+  that an IRI which might otherwise be representable using a PNAME cannot be if the IRI contains
+  any characters that might need escaping. This implementation currently abides by this
+  restriction. Presumably, this would affect both PNAME\_NS and PNAME\_LN terminals.
+* The empty prefix ':' does not have a default definition. In Notation3, this definition was
+  '<#>', which is specifically not intended to be used in Turtle. However, example markup using
+  the empty prefix is common in examples. This implementation defines the empty prefix as an
+  alias for the current base IRI (either defined using `@base`, or based on the document's origin).
+* The EBNF definition of IRI_REF seems malformed, and has no provision for \^, as discussed
+  elsewhere in the spec. We presume that [#0000- ] is intended to be [#0000-#0020].
+* The list example in section 6 uses a list on its own, without a predicate or object, which is
+  not allowed by the grammar (neither is a blankNodePropertyList). Either the EBNF should be
+  updated to allow for these forms, or the examples should be changed so that ( ... ) and
+  [ ... ] are used only in the context of being a subject or object. This implementation will
+  generate triples, but an error will be raised if the parser is run in validation mode.
+
+## Implementation Notes
+The reader uses a generic LL(1) parser {RDF::LL1::Parser} and lexer {RDF::LL1::Lexer}. The parser
+takes branch and follow tables generated from the original [Turtle EBNF grammar][Turtle EBNF]
+described in the [specification][Turtle]. Branch and follow tables are specified in
+{RDF::Turtle::Meta}, which is in turn generated using etc/gramLL1.
+
+The branch rules indicate the productions to be taken based on the current production. Terminals
+are denoted by a set of regular expressions used to match each type of terminal, described in
+{RDF::Turtle::Terminals}.
+
+etc/turtle.bnf is used to generate a Notation3 representation of the grammar, a transformed LL(1)
+representation, and ultimately {RDF::Turtle::Meta}.
+
+Using the SWAP utilities, this is done as follows:
+
+    python http://www.w3.org/2000/10/swap/grammar/ebnf2turtle.py \
+      etc/turtle.bnf \
+      ttl language \
+      'http://www.w3.org/2000/10/swap/grammar/turtle#' | \
+      sed -e 's/^  ".*"$/  g:seq (&)/' > etc/turtle.n3
+
+    python http://www.w3.org/2000/10/swap/cwm.py etc/turtle.n3 \
+      http://www.w3.org/2000/10/swap/grammar/ebnf2bnf.n3 \
+      http://www.w3.org/2000/10/swap/grammar/first_follow.n3 \
+      --think --data > etc/turtle-bnf.n3
+
+    script/gramLL1 \
+      --grammar etc/turtle-ll1.n3 \
+      --lang 'http://www.w3.org/2000/10/swap/grammar/turtle#language' \
+      --output lib/rdf/turtle/meta.rb
+
+## Dependencies
+* [Ruby](http://ruby-lang.org/) (>= 1.8.7) or (>= 1.8.1 with [Backports][])
+* [RDF.rb](http://rubygems.org/gems/rdf) (>= 0.3.0)
+
+## Installation
+The recommended installation method is via [RubyGems](http://rubygems.org/).
+To install the latest official release of the `RDF::Turtle` gem, do:
+
+    % [sudo] gem install rdf-turtle
+
+## Mailing List
+* <http://lists.w3.org/Archives/Public/public-rdf-ruby/>
+
+## Author
+* [Gregg Kellogg](http://github.com/gkellogg) - <http://kellogg-assoc.com/>
+
+## Contributing
+* Do your best to adhere to the existing coding conventions and idioms.
+* Don't use hard tabs, and don't leave trailing whitespace on any line.
+* Do document every method you add using [YARD][] annotations. Read the
+  [tutorial][YARD-GS] or just look at the existing code for examples.
+* Don't touch the `.gemspec`, `VERSION` or `AUTHORS` files. If you need to
+  change them, do so on your private branch only.
+* Do feel free to add yourself to the `CREDITS` file and the corresponding
+  list in the `README`. Alphabetical order applies.
+* Do note that in order for us to merge any non-trivial changes (as a rule
+  of thumb, additions larger than about 15 lines of code), we need an
+  explicit [public domain dedication][PDD] on record from you.
+
+## License
+This is free and unencumbered public domain software. For more information,
+see <http://unlicense.org/> or the accompanying {file:UNLICENSE} file.
+
+[Ruby]:        http://ruby-lang.org/
+[RDF]:         http://www.w3.org/RDF/
+[YARD]:        http://yardoc.org/
+[YARD-GS]:     http://rubydoc.info/docs/yard/file/docs/GettingStarted.md
+[PDD]:         http://lists.w3.org/Archives/Public/public-rdf-ruby/2010May/0013.html
+[RDF.rb]:      http://rdf.rubyforge.org/
+[Backports]:   http://rubygems.org/gems/backports
+[Turtle]:      http://www.w3.org/TR/2011/WD-turtle-20110809/
+[Turtle doc]:  http://rubydoc.info/github/gkellogg/rdf-turtle/master/file/README.markdown
+[Turtle EBNF]: http://www.w3.org/2000/10/swap/grammar/turtle.bnf
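The table-driven LL(1) approach described in the Implementation Notes above can be sketched with a toy grammar. This is an illustration of the technique only: the grammar, `BRANCH` table, and `parse` helper are made up for the example and are not the gem's actual tables or API.

```ruby
# Toy LL(1) parse: a branch table maps [production, lookahead] to the
# symbols that production expands to; an empty expansion encodes the
# FOLLOW-based "take the empty production" case.
# Grammar: list -> '(' members ')' ; members -> WORD members | (empty)
BRANCH = {
  [:list,    '(']   => ['(', :members, ')'],
  [:members, :WORD] => [:WORD, :members],
  [:members, ')']   => []                    # empty expansion, from FOLLOW(members)
}

def parse(tokens)
  stack = [:list]                            # start symbol
  until stack.empty?
    top    = stack.shift
    la     = tokens.first
    la_key = la =~ /\A\w+\z/ ? :WORD : la    # classify the lookahead token
    if top.is_a?(Symbol) && top != :WORD
      # Non-terminal: replace it with its table-driven expansion
      expansion = BRANCH[[top, la_key]] or raise "no rule for #{top} on #{la.inspect}"
      stack = expansion + stack
    else
      # Terminal: must match the lookahead, which is then consumed
      raise "expected #{top.inspect}, got #{la.inspect}" unless top == la_key || top == la
      tokens.shift
    end
  end
  true
end

parse(%w[( a b )])  # => true
```

The gem's real tables are far larger (see {RDF::Turtle::Meta}), but the control loop is the same shape: expand non-terminals via the branch table, consume terminals against the lookahead.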
data/UNLICENSE ADDED
@@ -0,0 +1,24 @@
+This is free and unencumbered software released into the public domain.
+
+Anyone is free to copy, modify, publish, use, compile, sell, or
+distribute this software, either in source code form or as a compiled
+binary, for any purpose, commercial or non-commercial, and by any
+means.
+
+In jurisdictions that recognize copyright laws, the author or authors
+of this software dedicate any and all copyright interest in the
+software to the public domain. We make this dedication for the benefit
+of the public at large and to the detriment of our heirs and
+successors. We intend this dedication to be an overt act of
+relinquishment in perpetuity of all present and future rights to this
+software under copyright law.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.
+
+For more information, please refer to <http://unlicense.org/>
data/VERSION ADDED
@@ -0,0 +1 @@
+0.0.2
data/lib/rdf/ll1/lexer.rb ADDED
@@ -0,0 +1,458 @@
+module RDF::LL1
+  require 'rdf/ll1/scanner' unless defined?(Scanner)
+
+  ##
+  # A lexical analyzer
+  #
+  # @example Tokenizing a Turtle string
+  #   terminals = [
+  #     [:BLANK_NODE_LABEL, %r(_:(#{PN_LOCAL}))],
+  #     ...
+  #   ]
+  #   ttl = "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> ."
+  #   lexer = RDF::LL1::Lexer.tokenize(ttl, terminals)
+  #   lexer.each_token do |token|
+  #     puts token.inspect
+  #   end
+  #
+  # @example Tokenizing and returning a token stream
+  #   lexer = RDF::LL1::Lexer.tokenize(...)
+  #   while :some_condition
+  #     token = lexer.first  # Get the current token
+  #     token = lexer.shift  # Get the current token and shift to the next
+  #   end
+  #
+  # @example Handling error conditions
+  #   begin
+  #     RDF::LL1::Lexer.tokenize(query)
+  #   rescue RDF::LL1::Lexer::Error => error
+  #     warn error.inspect
+  #   end
+  #
+  # @see http://en.wikipedia.org/wiki/Lexical_analysis
+  class Lexer
+    include Enumerable
+
+    ESCAPE_CHARS = {
+      '\\t'  => "\t",  # \u0009 (tab)
+      '\\n'  => "\n",  # \u000A (line feed)
+      '\\r'  => "\r",  # \u000D (carriage return)
+      '\\b'  => "\b",  # \u0008 (backspace)
+      '\\f'  => "\f",  # \u000C (form feed)
+      '\\"'  => '"',   # \u0022 (quotation mark, double quote mark)
+      "\\'"  => '\'',  # \u0027 (apostrophe-quote, single quote mark)
+      '\\\\' => '\\'   # \u005C (backslash)
+    }
+    ESCAPE_CHAR4 = /\\u(?:[0-9A-Fa-f]{4,4})/  # \uXXXX
+    ESCAPE_CHAR8 = /\\U(?:[0-9A-Fa-f]{8,8})/  # \UXXXXXXXX
+    ECHAR        = /\\[tbnrf\\"']/            # [91s]
+    UCHAR        = /#{ESCAPE_CHAR4}|#{ESCAPE_CHAR8}/
+    COMMENT      = /#.*/
+    WS           = / |\t|\r|\n/m
+
+    ML_START     = /\'\'\'|\"\"\"/            # Beginning of terminals that may span lines
+
+    ##
+    # @attr [Regexp] defines whitespace, defaults to WS
+    attr_reader :whitespace
+
+    ##
+    # @attr [Regexp] defines single-line comment, defaults to COMMENT
+    attr_reader :comment
+
+    ##
+    # Returns a copy of the given `string` with all `\uXXXX` and
+    # `\UXXXXXXXX` Unicode codepoint escape sequences replaced with their
+    # unescaped UTF-8 character counterparts.
+    #
+    # @param  [String] string
+    # @return [String]
+    # @see http://www.w3.org/TR/rdf-sparql-query/#codepointEscape
+    def self.unescape_codepoints(string)
+      # Decode \uXXXX and \UXXXXXXXX code points:
+      string = string.gsub(UCHAR) do |c|
+        s = [(c[2..-1]).hex].pack('U*')
+        s.respond_to?(:force_encoding) ? s.force_encoding(Encoding::ASCII_8BIT) : s
+      end
+
+      string.force_encoding(Encoding::UTF_8) if string.respond_to?(:force_encoding)  # Ruby 1.9+
+      string
+    end
+
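The codepoint unescaping above can be exercised on its own. The following is a plain-Ruby sketch of the same transformation, with the `UCHAR` pattern reproduced from the constants above and the Ruby 1.8 encoding guards dropped; it is an illustration, not the class itself:

```ruby
# Standalone sketch of Lexer.unescape_codepoints: replace \uXXXX and
# \UXXXXXXXX escape sequences with the characters they encode.
UCHAR = /\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}/

def unescape_codepoints(string)
  # c is e.g. "\\u00E9"; c[2..-1] holds the hex digits
  string.gsub(UCHAR) { |c| [c[2..-1].hex].pack('U') }
end

unescape_codepoints('caf\u00E9')   # => "café"
unescape_codepoints('\U0001F600')  # => "😀"
```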
+    ##
+    # Returns a copy of the given `input` string with all string escape
+    # sequences (e.g. `\n` and `\t`) replaced with their unescaped UTF-8
+    # character counterparts.
+    #
+    # @param  [String] input
+    # @return [String]
+    # @see http://www.w3.org/TR/rdf-sparql-query/#grammarEscapes
+    def self.unescape_string(input)
+      input.gsub(ECHAR) { |escaped| ESCAPE_CHARS[escaped] }
+    end
+
+    ##
+    # Tokenizes the given `input` string or stream.
+    #
+    # @param  [String, #to_s]                input
+    # @param  [Array<Array<Symbol, Regexp>>] terminals
+    #   Array of symbol, regexp pairs used to match terminals.
+    #   If the symbol is nil, it defines a Regexp to match string terminals.
+    # @param  [Hash{Symbol => Object}] options
+    # @yield  [lexer]
+    # @yieldparam [Lexer] lexer
+    # @return [Lexer]
+    # @raise  [Lexer::Error] on invalid input
+    def self.tokenize(input, terminals, options = {}, &block)
+      lexer = self.new(input, terminals, options)
+      block_given? ? block.call(lexer) : lexer
+    end
+
+    ##
+    # Initializes a new lexer instance.
+    #
+    # @param  [String, #to_s]                input
+    # @param  [Array<Array<Symbol, Regexp>>] terminals
+    #   Array of symbol, regexp pairs used to match terminals.
+    #   If the symbol is nil, it defines a Regexp to match string terminals.
+    # @param  [Hash{Symbol => Object}] options
+    # @option options [Regexp] :whitespace (WS)
+    # @option options [Regexp] :comment (COMMENT)
+    # @option options [Array<Symbol>] :unescape_terms ([])
+    #   Terminals whose matched values are to be unescaped
+    def initialize(input = nil, terminals = nil, options = {})
+      @options        = options.dup
+      @whitespace     = @options[:whitespace] || WS
+      @comment        = @options[:comment] || COMMENT
+      @unescape_terms = @options[:unescape_terms] || []
+      @terminals      = terminals
+
+      raise Error, "Terminal patterns not defined" unless @terminals && @terminals.length > 0
+
+      @lineno  = 1
+      @scanner = Scanner.new(input) do |string|
+        string.force_encoding(Encoding::UTF_8) if string.respond_to?(:force_encoding)  # Ruby 1.9+
+        string
+      end
+    end
+
+    ##
+    # Any additional options for the lexer.
+    #
+    # @return [Hash]
+    attr_reader :options
+
+    ##
+    # The current input string being processed.
+    #
+    # @return [String]
+    attr_accessor :input
+
+    ##
+    # The current line number (one-based).
+    #
+    # @return [Integer]
+    attr_reader :lineno
+
+    ##
+    # Returns `true` if the input string is lexically valid.
+    #
+    # To be considered valid, the input string must contain more than zero
+    # terminals, and must not contain any invalid terminals.
+    #
+    # @return [Boolean]
+    def valid?
+      begin
+        !count.zero?
+      rescue Error
+        false
+      end
+    end
+
+    ##
+    # Enumerates each token in the input string.
+    #
+    # @yield  [token]
+    # @yieldparam [Token] token
+    # @return [Enumerator]
+    def each_token(&block)
+      if block_given?
+        while token = shift
+          yield token
+        end
+      end
+      enum_for(:each_token)
+    end
+    alias_method :each, :each_token
+
+    ##
+    # Returns the first token in the input stream
+    #
+    # @return [Token]
+    def first
+      return nil unless scanner
+
+      @first ||= begin
+        {} while !scanner.eos? && skip_whitespace
+        return @scanner = nil if scanner.eos?
+
+        token = match_token
+
+        if token.nil?
+          lexme = (scanner.rest.split(/#{@whitespace}|#{@comment}/).first rescue nil) || scanner.rest
+          raise Error.new("Invalid token #{lexme.inspect} on line #{lineno}",
+            :input => scanner.rest[0..100], :token => lexme, :lineno => lineno)
+        end
+
+        token
+      end
+    end
+
+    ##
+    # Returns the first token and shifts to the next
+    #
+    # @return [Token]
+    def shift
+      cur = first
+      @first = nil
+      cur
+    end
+
+    ##
+    # Skip input until a token is matched
+    #
+    # @return [Token]
+    def recover
+      scanner.skip(/./)
+      until scanner.eos? do
+        begin
+          return first
+        rescue Error
+          # Ignore errors until something scans, or EOS.
+          scanner.skip(/./)
+        end
+      end
+    end
+
+    protected
+
+    # @return [StringScanner]
+    attr_reader :scanner
+
+    # Perform string and codepoint unescaping
+    # @param [String] string
+    # @return [String]
+    def unescape(string)
+      self.class.unescape_string(self.class.unescape_codepoints(string))
+    end
+
+    ##
+    # Skip whitespace or comments, as defined through input options or defaults
+    def skip_whitespace
+      # skip all white space, but keep track of the current line number
+      while !scanner.eos?
+        if matched = scanner.scan(@whitespace)
+          @lineno += matched.count("\n")
+        elsif (com = scanner.scan(@comment))
+        else
+          return
+        end
+      end
+    end
+
+    ##
+    # Return the matched token
+    #
+    # @return [Token]
+    def match_token
+      @terminals.each do |(term, regexp)|
+        #STDERR.puts "match[#{term}] #{scanner.rest[0..100].inspect} against #{regexp.inspect}" if term == :STRING_LITERAL2
+        if matched = scanner.scan(regexp)
+          matched = unescape(matched) if @unescape_terms.include?(term)
+          #STDERR.puts "  unescape? #{@unescape_terms.include?(term).inspect}"
+          #STDERR.puts "  matched #{term.inspect}: #{matched.inspect}"
+          return token(term, matched)
+        end
+      end
+      nil
+    end
+
+    protected
+
+    ##
+    # Constructs a new token object annotated with the current line number.
+    #
+    # The parser relies on the type being a symbolized URI and the value being
+    # a string, if there is no type. If there is a type, then the value takes
+    # on the native representation appropriate for that type.
+    #
+    # @param  [Symbol] type
+    # @param  [String] value
+    # @return [Token]
+    def token(type, value)
+      Token.new(type, value, :lineno => lineno)
+    end
+
+    ##
+    # Represents a lexer token.
+    #
+    # @example Creating a new token
+    #   token = RDF::LL1::Lexer::Token.new(:LANGTAG, "en")
+    #   token.type   #=> :LANGTAG
+    #   token.value  #=> "en"
+    #
+    # @see http://en.wikipedia.org/wiki/Lexical_analysis#Token
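The first-match loop in `match_token` above can be illustrated with Ruby's stdlib `StringScanner` and a cut-down terminal list. The `TERMINALS` regexes and `tokenize` helper here are simplified stand-ins for illustration, not the gem's actual {RDF::Turtle::Terminals} patterns:

```ruby
require 'strscan'

# Sketch of match_token's strategy: try each (name, regexp) terminal in
# order at the current scan position; the first regexp that matches
# produces the token.
TERMINALS = [
  [:IRIREF, /<[^<>"\s]*>/],
  [:PNAME,  /\w+:\w*/],
  [nil,     /[.;,]/]       # punctuation "string terminals"
]

def tokenize(input)
  scanner = StringScanner.new(input)
  tokens  = []
  until scanner.eos?
    next if scanner.skip(/\s+/)  # whitespace separates tokens
    matched = TERMINALS.find do |name, regexp|
      (value = scanner.scan(regexp)) && tokens << [name, value]
    end
    raise "invalid token at #{scanner.rest.inspect}" unless matched
  end
  tokens
end

tokenize("<http://example/> rdf:type .")
# => [[:IRIREF, "<http://example/>"], [:PNAME, "rdf:type"], [nil, "."]]
```

Because the first matching terminal wins, ordering the terminal list from most to least specific matters, which is why the real lexer takes `terminals` as an ordered array rather than a hash.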
+    class Token
+      ##
+      # Initializes a new token instance.
+      #
+      # @param  [Symbol] type
+      # @param  [String] value
+      # @param  [Hash{Symbol => Object}] options
+      # @option options [Integer] :lineno (nil)
+      def initialize(type, value, options = {})
+        @type, @value = (type ? type.to_s.to_sym : nil), value
+        @options = options.dup
+        @lineno  = @options.delete(:lineno)
+      end
+
+      ##
+      # The token's symbol type.
+      #
+      # @return [Symbol]
+      attr_reader :type
+
+      ##
+      # The token's value.
+      #
+      # @return [String]
+      attr_reader :value
+
+      ##
+      # The line number where the token was encountered.
+      #
+      # @return [Integer]
+      attr_reader :lineno
+
+      ##
+      # Any additional options for the token.
+      #
+      # @return [Hash]
+      attr_reader :options
+
+      ##
+      # Returns the attribute named by `key`.
+      #
+      # @param  [Symbol] key
+      # @return [Object]
+      def [](key)
+        key = key.to_s.to_sym unless key.is_a?(Integer) || key.is_a?(Symbol)
+        case key
+          when 0, :type  then @type
+          when 1, :value then @value
+          else nil
+        end
+      end
+
+      ##
+      # Returns `true` if the given `value` matches either the type or value
+      # of this token.
+      #
+      # @example Matching using the symbolic type
+      #   RDF::LL1::Lexer::Token.new(:NIL) === :NIL     #=> true
+      #
+      # @example Matching using the string value
+      #   RDF::LL1::Lexer::Token.new(nil, "{") === "{"  #=> true
+      #
+      # @param  [Symbol, String] value
+      # @return [Boolean]
+      def ===(value)
+        case value
+          when Symbol   then value == @type
+          when ::String then value.to_s == @value.to_s
+          else value == @value
+        end
+      end
+
+      ##
+      # Returns a hash table representation of this token.
+      #
+      # @return [Hash]
+      def to_hash
+        {:type => @type, :value => @value}
+      end
+
+      ##
+      # Readable version of token
+      def to_s
+        @type ? @type.inspect : @value
+      end
+
+      ##
+      # Returns type, if not nil, otherwise value
+      def representation
+        @type ? @type : @value
+      end
+
+      ##
+      # Returns an array representation of this token.
+      #
+      # @return [Array]
+      def to_a
+        [@type, @value]
+      end
+
+      ##
+      # Returns a developer-friendly representation of this token.
+      #
+      # @return [String]
+      def inspect
+        to_hash.inspect
+      end
+    end # class Token
+
+    ##
+    # Raised for errors during lexical analysis.
+    #
+    # @example Raising a lexer error
+    #   raise RDF::LL1::Lexer::Error.new(
+    #     "invalid token '%' on line 10",
+    #     :input => query, :token => '%', :lineno => 9)
+    #
+    # @see http://ruby-doc.org/core/classes/StandardError.html
+    class Error < StandardError
+      ##
+      # The input string associated with the error.
+      #
+      # @return [String]
+      attr_reader :input
+
+      ##
+      # The invalid token which triggered the error.
+      #
+      # @return [String]
+      attr_reader :token
+
+      ##
+      # The line number where the error occurred.
+      #
+      # @return [Integer]
+      attr_reader :lineno
+
+      ##
+      # Initializes a new lexer error instance.
+      #
+      # @param  [String, #to_s]          message
+      # @param  [Hash{Symbol => Object}] options
+      # @option options [String]  :input  (nil)
+      # @option options [String]  :token  (nil)
+      # @option options [Integer] :lineno (nil)
+      def initialize(message, options = {})
+        @input  = options[:input]
+        @token  = options[:token]
+        @lineno = options[:lineno]
+        super(message.to_s)
+      end
+    end # class Error
+  end # class Lexer
+end # module RDF::LL1