rdf-turtle 0.0.2
- data/AUTHORS +1 -0
- data/History +9 -0
- data/README.markdown +142 -0
- data/UNLICENSE +24 -0
- data/VERSION +1 -0
- data/lib/rdf/ll1/lexer.rb +458 -0
- data/lib/rdf/ll1/parser.rb +462 -0
- data/lib/rdf/ll1/scanner.rb +100 -0
- data/lib/rdf/turtle.rb +35 -0
- data/lib/rdf/turtle/format.rb +41 -0
- data/lib/rdf/turtle/meta.rb +1748 -0
- data/lib/rdf/turtle/patches.rb +38 -0
- data/lib/rdf/turtle/reader.rb +362 -0
- data/lib/rdf/turtle/terminals.rb +88 -0
- data/lib/rdf/turtle/writer.rb +562 -0
- metadata +115 -0
data/AUTHORS
ADDED
@@ -0,0 +1 @@
* Gregg Kellogg <gregg@kellogg-assoc.com>
data/History
ADDED
@@ -0,0 +1,9 @@
### 0.0.3
* Completed RDF 1.1 Turtle based on http://www.w3.org/TR/2011/WD-turtle-20110809/
  * Reader
  * Writer
* Issues:
  * IRI lexical representations
  * PNAMES not unescaped, should they be?
  * Assume undefined empty prefix is synonym for base
  * Can a list be used on its own? Used in Turtle example.
data/README.markdown
ADDED
@@ -0,0 +1,142 @@
# RDF::Turtle reader/writer
[Turtle][] reader/writer for [RDF.rb][RDF.rb].

## Description
This is a [Ruby][] implementation of a [Turtle][] parser for [RDF.rb][].

## Features
RDF::Turtle parses [Turtle][Turtle] and [N-Triples][N-Triples] into statements or triples. It also serializes to Turtle.

Install with `gem install rdf-turtle`

* 100% free and unencumbered [public domain](http://unlicense.org/) software.
* Implements a complete parser for [Turtle][].
* Compatible with Ruby 1.8.7+, Ruby 1.9.x, and JRuby 1.4/1.5.

## Usage
Instantiate a reader from a local file:

    RDF::Turtle::Reader.open("etc/foaf.ttl") do |reader|
      reader.each_statement do |statement|
        puts statement.inspect
      end
    end

or

    graph = RDF::Graph.load("etc/foaf.ttl", :format => :ttl)

Define `@base` and `@prefix` definitions, and use them for serialization via the `:base_uri` and `:prefixes` options.

Write a graph to a file:

    RDF::Turtle::Writer.open("etc/test.ttl") do |writer|
      writer << graph
    end
37
|
+
|
38
|
+
## Documentation
|
39
|
+
Full documentation available on [Rubydoc.info][Turtle doc]
|
40
|
+
|
41
|
+
### Principle Classes
|
42
|
+
* {RDF::Turtle::Format}
|
43
|
+
* {RDF::Turtle::TTL}
|
44
|
+
Asserts :ttl format, text/turtle mime-type and .ttl file extension.
|
45
|
+
* {RDF::Turtle::Reader}
|
46
|
+
* {RDF::Turtle::Writer}
|
47
|
+
|
48
|
+
### Variations from the spec
|
49
|
+
In some cases, the specification is unclear on certain issues:
|
50
|
+
|
51
|
+
* In section 2.1, the [spec][Turtle] indicates that "Literals ,
|
52
|
+
prefixed names and IRIs may also contain escapes to encode surrounding syntax ...",
|
53
|
+
however the description in 5.2 indicates that only IRI\_REF and the various STRING\_LITERAL terms
|
54
|
+
are subject to unescaping. This means that an IRI which might otherwise be representable using a PNAME
|
55
|
+
cannot if the IRI contains any characters that might need escaping. This implementation currently abides
|
56
|
+
by this restriction. Presumably, this would affect both PNAME\_NS and PNAME\_LN terminals.
|
57
|
+
* The empty prefix ':' does not have a default definition. In Notation, this definition was '<#>' which is specifically
|
58
|
+
not intended to be used in Turtle. However, example markup using the empty prefix is common among examples. This
|
59
|
+
implementation defines the empty prefix as an alias for the current base IRI (either defined using `@base`,
|
60
|
+
or based on the document's origin).
|
61
|
+
* The EBNF definition of IRI_REF seems malformed, and has no provision for \^, as discussed elsewhere in the spec.
|
62
|
+
We presume that [#0000- ] is intended to be [#0000-#0020].
|
63
|
+
* The list example in section 6 uses a list on it's own, without a predicate or object, which is not allowed
|
64
|
+
by the grammar (neither is a blankNodeProperyList). Either the EBNF should be updated to allow for these
|
65
|
+
forms, or the examples should be changed such that ( ... ) and [ ... ] are used only in the context of being
|
66
|
+
a subject or object. This implementation will generate triples, however an error will be generated if the
|
67
|
+
parser is run in validation mode.
|
68
|
+
|
69
|
+
## Implementation Notes
|
70
|
+
The reader uses a generic LL1 parser {RDF::LL1::Parser} and lexer {RDF::LL1::Lexer}. The parser takes branch and follow
|
71
|
+
tables generated from the original [Turtle EBNF Grammar][Turtle EBNF] described in the [specification][Turtle]. Branch and Follow tables are specified in {RDF::Turtle::Meta}, which is in turn
|
72
|
+
generated using etc/gramLL1.
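The branch-table idea can be illustrated with a toy grammar. This is only an illustrative sketch of table-driven predictive (LL(1)) parsing; the grammar, `BRANCH` table, and `ll1_parse` helper below are invented for the example and do not reflect the actual tables in {RDF::Turtle::Meta}:

```ruby
# Toy predictive (LL(1)) parser driven by a branch table: for each
# (production, lookahead) pair, the table gives the symbols to expand to.
# Toy grammar: S -> "(" S ")" | "a"
BRANCH = {
  [:S, "("] => ["(", :S, ")"],
  [:S, "a"] => ["a"]
}

def ll1_parse(tokens)
  stack = [:S]
  until stack.empty?
    top = stack.shift
    if top.is_a?(Symbol)
      # Non-terminal: look up the production for the current lookahead token
      production = BRANCH[[top, tokens.first]] or return false
      stack = production + stack
    else
      # Terminal: must match the next input token exactly
      return false unless tokens.shift == top
    end
  end
  tokens.empty?
end

ll1_parse(%w{( ( a ) )})  # => true
ll1_parse(%w{( a})        # => false
```

The generated tables work the same way in spirit, but are keyed by the Turtle productions and terminal types matched by the lexer.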

The branch rules indicate productions to be taken based on the current production. Terminals are denoted through a set of regular expressions used to match each type of terminal, described in {RDF::Turtle::Terminals}.

etc/turtle.bnf is used to generate a Notation3 representation of the grammar, a transformed LL1 representation, and ultimately {RDF::Turtle::Meta}.

Using SWAP utilities, this is done as follows:

    python http://www.w3.org/2000/10/swap/grammar/ebnf2turtle.py \
      etc/turtle.bnf \
      ttl language \
      'http://www.w3.org/2000/10/swap/grammar/turtle#' | \
      sed -e 's/^ ".*"$/ g:seq (&)/' > etc/turtle.n3

    python http://www.w3.org/2000/10/swap/cwm.py etc/turtle.n3 \
      http://www.w3.org/2000/10/swap/grammar/ebnf2bnf.n3 \
      http://www.w3.org/2000/10/swap/grammar/first_follow.n3 \
      --think --data > etc/turtle-bnf.n3

    script/gramLL1 \
      --grammar etc/turtle-ll1.n3 \
      --lang 'http://www.w3.org/2000/10/swap/grammar/turtle#language' \
      --output lib/rdf/turtle/meta.rb

## Dependencies

* [Ruby](http://ruby-lang.org/) (>= 1.8.7) or (>= 1.8.1 with [Backports][])
* [RDF.rb](http://rubygems.org/gems/rdf) (>= 0.3.0)

## Installation

The recommended installation method is via [RubyGems](http://rubygems.org/).
To install the latest official release of the `rdf-turtle` gem, do:

    % [sudo] gem install rdf-turtle

## Mailing List
* <http://lists.w3.org/Archives/Public/public-rdf-ruby/>

## Author
* [Gregg Kellogg](http://github.com/gkellogg) - <http://kellogg-assoc.com/>

## Contributing
* Do your best to adhere to the existing coding conventions and idioms.
* Don't use hard tabs, and don't leave trailing whitespace on any line.
* Do document every method you add using [YARD][] annotations. Read the
  [tutorial][YARD-GS] or just look at the existing code for examples.
* Don't touch the `.gemspec`, `VERSION` or `AUTHORS` files. If you need to
  change them, do so on your private branch only.
* Do feel free to add yourself to the `CREDITS` file and the corresponding
  list in the `README`. Alphabetical order applies.
* Do note that in order for us to merge any non-trivial changes (as a rule
  of thumb, additions larger than about 15 lines of code), we need an
  explicit [public domain dedication][PDD] on record from you.

## License
This is free and unencumbered public domain software. For more information,
see <http://unlicense.org/> or the accompanying {file:UNLICENSE} file.

[Ruby]:        http://ruby-lang.org/
[RDF]:         http://www.w3.org/RDF/
[YARD]:        http://yardoc.org/
[YARD-GS]:     http://rubydoc.info/docs/yard/file/docs/GettingStarted.md
[PDD]:         http://lists.w3.org/Archives/Public/public-rdf-ruby/2010May/0013.html
[RDF.rb]:      http://rdf.rubyforge.org/
[Backports]:   http://rubygems.org/gems/backports
[Turtle]:      http://www.w3.org/TR/2011/WD-turtle-20110809/
[Turtle doc]:  http://rubydoc.info/github/gkellogg/rdf-turtle/master/file/README.markdown
[Turtle EBNF]: http://www.w3.org/2000/10/swap/grammar/turtle.bnf
data/UNLICENSE
ADDED
@@ -0,0 +1,24 @@
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <http://unlicense.org/>
data/VERSION
ADDED
@@ -0,0 +1 @@
0.0.2
data/lib/rdf/ll1/lexer.rb
ADDED
@@ -0,0 +1,458 @@
module RDF::LL1
  require 'rdf/ll1/scanner' unless defined?(Scanner)

  ##
  # A lexical analyzer
  #
  # @example Tokenizing a Turtle string
  #   terminals = [
  #     [:BLANK_NODE_LABEL, %r(_:(#{PN_LOCAL}))],
  #     ...
  #   ]
  #   ttl = "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> ."
  #   lexer = RDF::LL1::Lexer.tokenize(ttl, terminals)
  #   lexer.each_token do |token|
  #     puts token.inspect
  #   end
  #
  # @example Tokenizing and returning a token stream
  #   lexer = RDF::LL1::Lexer.tokenize(...)
  #   while :some-condition
  #     token = lexer.first # Get the current token
  #     token = lexer.shift # Get the current token and shift to the next
  #   end
  #
  # @example Handling error conditions
  #   begin
  #     RDF::Turtle::Lexer.tokenize(query)
  #   rescue RDF::Turtle::Lexer::Error => error
  #     warn error.inspect
  #   end
  #
  # @see http://en.wikipedia.org/wiki/Lexical_analysis
  class Lexer
    include Enumerable

    ESCAPE_CHARS = {
      '\\t'  => "\t", # \u0009 (tab)
      '\\n'  => "\n", # \u000A (line feed)
      '\\r'  => "\r", # \u000D (carriage return)
      '\\b'  => "\b", # \u0008 (backspace)
      '\\f'  => "\f", # \u000C (form feed)
      '\\"'  => '"',  # \u0022 (quotation mark, double quote mark)
      "\\'"  => '\'', # \u0027 (apostrophe-quote, single quote mark)
      '\\\\' => '\\'  # \u005C (backslash)
    }
    ESCAPE_CHAR4 = /\\u(?:[0-9A-Fa-f]{4,4})/ # \uXXXX
    ESCAPE_CHAR8 = /\\U(?:[0-9A-Fa-f]{8,8})/ # \UXXXXXXXX
    ECHAR        = /\\[tbnrf\\"']/           # [91s]
    UCHAR        = /#{ESCAPE_CHAR4}|#{ESCAPE_CHAR8}/
    COMMENT      = /#.*/
    WS           = / |\t|\r|\n/m

    ML_START     = /\'\'\'|\"\"\"/ # Beginning of terminals that may span lines

    ##
    # @attr [Regexp] defines whitespace, defaults to WS
    attr_reader :whitespace

    ##
    # @attr [Regexp] defines single-line comment, defaults to COMMENT
    attr_reader :comment

    ##
    # Returns a copy of the given `input` string with all `\uXXXX` and
    # `\UXXXXXXXX` Unicode codepoint escape sequences replaced with their
    # unescaped UTF-8 character counterparts.
    #
    # @param  [String] input
    # @return [String]
    # @see    http://www.w3.org/TR/rdf-sparql-query/#codepointEscape
    def self.unescape_codepoints(string)
      # Decode \uXXXX and \UXXXXXXXX code points:
      string = string.gsub(UCHAR) do |c|
        s = [(c[2..-1]).hex].pack('U*')
        s.respond_to?(:force_encoding) ? s.force_encoding(Encoding::ASCII_8BIT) : s
      end

      string.force_encoding(Encoding::UTF_8) if string.respond_to?(:force_encoding) # Ruby 1.9+
      string
    end

    ##
    # Returns a copy of the given `input` string with all string escape
    # sequences (e.g. `\n` and `\t`) replaced with their unescaped UTF-8
    # character counterparts.
    #
    # @param  [String] input
    # @return [String]
    # @see    http://www.w3.org/TR/rdf-sparql-query/#grammarEscapes
    def self.unescape_string(input)
      input.gsub(ECHAR) { |escaped| ESCAPE_CHARS[escaped] }
    end

    ##
    # Tokenizes the given `input` string or stream.
    #
    # @param  [String, #to_s] input
    # @param  [Array<Array<Symbol, Regexp>>] terminals
    #   Array of symbol, regexp pairs used to match terminals.
    #   If the symbol is nil, it defines a Regexp to match string terminals.
    # @param  [Hash{Symbol => Object}] options
    # @yield  [lexer]
    # @yieldparam [Lexer] lexer
    # @return [Lexer]
    # @raise  [Lexer::Error] on invalid input
    def self.tokenize(input, terminals, options = {}, &block)
      lexer = self.new(input, terminals, options)
      block_given? ? block.call(lexer) : lexer
    end

    ##
    # Initializes a new lexer instance.
    #
    # @param  [String, #to_s] input
    # @param  [Array<Array<Symbol, Regexp>>] terminals
    #   Array of symbol, regexp pairs used to match terminals.
    #   If the symbol is nil, it defines a Regexp to match string terminals.
    # @param  [Hash{Symbol => Object}] options
    # @option options [Regexp] :whitespace (WS)
    # @option options [Regexp] :comment (COMMENT)
    # @option options [Array<Symbol>] :unescape_terms ([])
    #   Terminals whose matched values should have escape sequences unescaped
    def initialize(input = nil, terminals = nil, options = {})
      @options        = options.dup
      @whitespace     = @options[:whitespace] || WS
      @comment        = @options[:comment] || COMMENT
      @unescape_terms = @options[:unescape_terms] || []
      @terminals      = terminals

      raise Error, "Terminal patterns not defined" unless @terminals && @terminals.length > 0

      @lineno = 1
      @scanner = Scanner.new(input) do |string|
        string.force_encoding(Encoding::UTF_8) if string.respond_to?(:force_encoding) # Ruby 1.9+
        string
      end
    end

    ##
    # Any additional options for the lexer.
    #
    # @return [Hash]
    attr_reader :options

    ##
    # The current input string being processed.
    #
    # @return [String]
    attr_accessor :input

    ##
    # The current line number (zero-based).
    #
    # @return [Integer]
    attr_reader :lineno

    ##
    # Returns `true` if the input string is lexically valid.
    #
    # To be considered valid, the input string must contain more than zero
    # terminals, and must not contain any invalid terminals.
    #
    # @return [Boolean]
    def valid?
      begin
        !count.zero?
      rescue Error
        false
      end
    end

    ##
    # Enumerates each token in the input string.
    #
    # @yield  [token]
    # @yieldparam [Token] token
    # @return [Enumerator]
    def each_token(&block)
      if block_given?
        while token = shift
          yield token
        end
      end
      enum_for(:each_token)
    end
    alias_method :each, :each_token

    ##
    # Returns first token in input stream
    #
    # @return [Token]
    def first
      return nil unless scanner

      @first ||= begin
        {} while !scanner.eos? && skip_whitespace
        return @scanner = nil if scanner.eos?

        token = match_token

        if token.nil?
          lexeme = (scanner.rest.split(/#{@whitespace}|#{@comment}/).first rescue nil) || scanner.rest
          raise Error.new("Invalid token #{lexeme.inspect} on line #{lineno + 1}",
            :input => scanner.rest[0..100], :token => lexeme, :lineno => lineno)
        end

        token
      end
    end

    ##
    # Returns first token and shifts to next
    #
    # @return [Token]
    def shift
      cur = first
      @first = nil
      cur
    end

    ##
    # Skip input until a token is matched
    #
    # @return [Token]
    def recover
      scanner.skip(/./)
      until scanner.eos? do
        begin
          return first
        rescue Error
          # Ignore errors until something scans, or EOS.
          scanner.skip(/./)
        end
      end
    end

  protected

    # @return [StringScanner]
    attr_reader :scanner

    # Perform string and codepoint unescaping
    # @param [String] string
    # @return [String]
    def unescape(string)
      self.class.unescape_string(self.class.unescape_codepoints(string))
    end

    ##
    # Skip whitespace or comments, as defined through input options or defaults
    def skip_whitespace
      # skip all white space, but keep track of the current line number
      while !scanner.eos?
        if matched = scanner.scan(@whitespace)
          @lineno += matched.count("\n")
        elsif scanner.scan(@comment)
        else
          return
        end
      end
    end

    ##
    # Return the matched token
    #
    # @return [Token]
    def match_token
      @terminals.each do |(term, regexp)|
        #STDERR.puts "match[#{term}] #{scanner.rest[0..100].inspect} against #{regexp.inspect}" if term == :STRING_LITERAL2
        if matched = scanner.scan(regexp)
          matched = unescape(matched) if @unescape_terms.include?(term)
          #STDERR.puts "  unescape? #{@unescape_terms.include?(term).inspect}"
          #STDERR.puts "  matched #{term.inspect}: #{matched.inspect}"
          return token(term, matched)
        end
      end
      nil
    end

  protected

    ##
    # Constructs a new token object annotated with the current line number.
    #
    # The parser relies on the type being a symbolized URI and the value being
    # a string, if there is no type. If there is a type, then the value takes
    # on the native representation appropriate for that type.
    #
    # @param  [Symbol] type
    # @param  [String] value
    #   Scanner instance with access to matched groups
    # @return [Token]
    def token(type, value)
      Token.new(type, value, :lineno => lineno)
    end

    ##
    # Represents a lexer token.
    #
    # @example Creating a new token
    #   token = RDF::LL1::Lexer::Token.new(:LANGTAG, "en")
    #   token.type  #=> :LANGTAG
    #   token.value #=> "en"
    #
    # @see http://en.wikipedia.org/wiki/Lexical_analysis#Token
    class Token
      ##
      # Initializes a new token instance.
      #
      # @param  [Symbol] type
      # @param  [String] value
      # @param  [Hash{Symbol => Object}] options
      # @option options [Integer] :lineno (nil)
      def initialize(type, value, options = {})
        @type, @value = (type ? type.to_s.to_sym : nil), value
        @options = options.dup
        @lineno  = @options.delete(:lineno)
      end

      ##
      # The token's symbol type.
      #
      # @return [Symbol]
      attr_reader :type

      ##
      # The token's value.
      #
      # @return [String]
      attr_reader :value

      ##
      # The line number where the token was encountered.
      #
      # @return [Integer]
      attr_reader :lineno

      ##
      # Any additional options for the token.
      #
      # @return [Hash]
      attr_reader :options

      ##
      # Returns the attribute named by `key`.
      #
      # @param  [Symbol] key
      # @return [Object]
      def [](key)
        key = key.to_s.to_sym unless key.is_a?(Integer) || key.is_a?(Symbol)
        case key
          when 0, :type  then @type
          when 1, :value then @value
          else nil
        end
      end

      ##
      # Returns `true` if the given `value` matches either the type or value
      # of this token.
      #
      # @example Matching using the symbolic type
      #   SPARQL::Grammar::Lexer::Token.new(:NIL) === :NIL     #=> true
      #
      # @example Matching using the string value
      #   SPARQL::Grammar::Lexer::Token.new(nil, "{") === "{"  #=> true
      #
      # @param  [Symbol, String] value
      # @return [Boolean]
      def ===(value)
        case value
          when Symbol   then value == @type
          when ::String then value.to_s == @value.to_s
          else value == @value
        end
      end

      ##
      # Returns a hash table representation of this token.
      #
      # @return [Hash]
      def to_hash
        {:type => @type, :value => @value}
      end

      ##
      # Readable version of token
      def to_s
        @type ? @type.inspect : @value
      end

      ##
      # Returns type, if not nil, otherwise value
      def representation
        @type ? @type : @value
      end

      ##
      # Returns an array representation of this token.
      #
      # @return [Array]
      def to_a
        [@type, @value]
      end

      ##
      # Returns a developer-friendly representation of this token.
      #
      # @return [String]
      def inspect
        to_hash.inspect
      end
    end # class Token

    ##
    # Raised for errors during lexical analysis.
    #
    # @example Raising a lexer error
    #   raise SPARQL::Grammar::Lexer::Error.new(
    #     "invalid token '%' on line 10",
    #     :input => query, :token => '%', :lineno => 9)
    #
    # @see http://ruby-doc.org/core/classes/StandardError.html
    class Error < StandardError
      ##
      # The input string associated with the error.
      #
      # @return [String]
      attr_reader :input

      ##
      # The invalid token which triggered the error.
      #
      # @return [String]
      attr_reader :token

      ##
      # The line number where the error occurred.
      #
      # @return [Integer]
      attr_reader :lineno

      ##
      # Initializes a new lexer error instance.
      #
      # @param  [String, #to_s]          message
      # @param  [Hash{Symbol => Object}] options
      # @option options [String]  :input  (nil)
      # @option options [String]  :token  (nil)
      # @option options [Integer] :lineno (nil)
      def initialize(message, options = {})
        @input  = options[:input]
        @token  = options[:token]
        @lineno = options[:lineno]
        super(message.to_s)
      end
    end # class Error
  end # class Lexer
end # module RDF::LL1