rley 0.3.12 → 0.4.00
- checksums.yaml +4 -4
- data/CHANGELOG.md +7 -0
- data/README.md +69 -5
- data/examples/NLP/mini_en_demo.rb +5 -1
- data/examples/data_formats/JSON/JSON_demo.rb +1 -0
- data/examples/general/calc/calc_demo.rb +2 -1
- data/lib/rley/constants.rb +1 -1
- data/lib/rley/parser/dotted_item.rb +1 -1
- data/lib/rley/parser/error_reason.rb +106 -0
- data/lib/rley/parser/gfg_chart.rb +1 -24
- data/lib/rley/parser/gfg_earley_parser.rb +28 -57
- data/lib/rley/parser/gfg_parsing.rb +54 -30
- data/lib/rley/ptree/token_range.rb +0 -5
- data/lib/rley/rley_error.rb +10 -0
- data/lib/rley/sppf/parse_forest.rb +7 -9
- data/spec/rley/parser/error_reason_spec.rb +120 -0
- data/spec/rley/parser/gfg_chart_spec.rb +3 -54
- data/spec/rley/parser/gfg_earley_parser_spec.rb +74 -63
- data/spec/rley/parser/gfg_parsing_spec.rb +2 -3
- data/spec/rley/support/grammar_pb_helper.rb +48 -0
- metadata +7 -2
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: eb2c26370206f6c6eca059858ee0c8adedd32810
+  data.tar.gz: 77a42b3da998a2e8b073ec3a811287b71e6b3a3f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b16495b26269ee208ed3151f820a296d801ed7ca01ea9c98cf29b554da4ceba55719d67a7a7e15dc4fee9b70b54b1f08881ae0dc499b217f47db493b873af4eb
+  data.tar.gz: e463f9697c3cf8b012c8bc8c7736e675d6d355d3f81197bac7fb23529bb0c9e66c791d45ad833f2d6fadeb7eb2adb1a5eed6b3415292bb31fe8a02a43d2fed94
data/CHANGELOG.md CHANGED
@@ -1,3 +1,10 @@
+### 0.4.00 / 2016-12-17
+* [CHANGE] Error reporting is vastly changed. Syntax errors don't raise exceptions.
+  A parse error can be retrieved via an `ErrorReason` object. Such an object is returned
+  by the `GFGParsing#failure_reason` method.
+* [CHANGE] File `README.md` updated to reflect the new error reporting.
+* [CHANGE] Examples updated to reflect the new error reporting.
+
 ### 0.3.12 / 2016-12-08
 * [NEW] Directory `examples\general\calc`. A simple arithmetic expression demo parser.
data/README.md CHANGED
@@ -64,7 +64,7 @@ Installing the latest stable version is simple:

 ## A whirlwind tour of Rley
 The purpose of this section is show how to create a parser for a minimalistic
-English language subset.
+English language subset.
 The tour is organized into the following steps:
 1. [Defining the language grammar](#defining-the-language-grammar)
 2. [Creating a lexicon](#creating-a-lexicon)
@@ -73,7 +73,7 @@ The tour is organized into the following steps:
 5. [Parsing some input](#parsing-some-input)
 6. [Generating the parse forest](#generating-the-parse-forest)

-The complete source code of the tour can be found in the
+The complete source code of the example used in this tour can be found in the
 [examples](https://github.com/famished-tiger/Rley/tree/master/examples/NLP/mini_en_demo.rb)
 directory
@@ -86,7 +86,7 @@ The subset of English grammar is based on an example from the NLTK book.
   # Instantiate a builder object that will build the grammar for us
   builder = Rley::Syntax::GrammarBuilder.new do
     # Terminal symbols (= word categories in lexicon)
-    add_terminals('Noun', 'Proper-Noun', 'Verb')
+    add_terminals('Noun', 'Proper-Noun', 'Verb')
     add_terminals('Determiner', 'Preposition')

     # Here we define the productions (= grammar rules)
@@ -97,7 +97,7 @@ The subset of English grammar is based on an example from the NLTK book.
     rule 'VP' => %w[Verb NP]
     rule 'VP' => %w[Verb NP PP]
     rule 'PP' => %w[Preposition NP]
-  end
+  end
   # And now, let's build the grammar...
   grammar = builder.grammar
 ```
@@ -178,11 +178,75 @@ creating a lexicon and tokenizer from scratch. Here are a few Ruby Part-of-Speech
 pforest = result.parse_forest
 ```

+## Error reporting
+__Rley__ is a non-violent parser, that is, it won't throw an exception when it
+detects a syntax error. Instead, the parse result will be marked as
+non-successful. The parse error can then be identified by calling the
+`GFGParsing#failure_reason` method. This method returns an error reason object
+which can help to produce an error message.
+
+Consider the example from the [Parsing some input](#parsing-some-input) section
+above and, as an error, let's delete the verb `saw` from the sentence to parse.
+
+```ruby
+# Verb has been removed from the sentence on next line
+input_to_parse = 'John Mary with a telescope'
+# Convert input text into a sequence of token objects...
+tokens = tokenizer(input_to_parse, grammar)
+result = parser.parse(tokens)
+
+puts "Parsing successful? #{result.success?}" # => Parsing successful? false
+exit(1)
+```
+
+As expected, the parse now fails.
+To get an error message, one just needs to retrieve the error reason and
+ask it to generate a message.
+```ruby
+# Show error message if parse fails...
+puts result.failure_reason.message unless result.success?
+```
+
+Re-running the example with the error results in the error message:
+```
+Syntax error at or near token 2 >>>Mary<<<
+Expected one 'Verb', found a 'Proper-Noun' instead.
+```
+
+The standard __Rley__ message not only informs about the location of
+the mistake, it also provides a hint by disclosing its expectations.
+
+Let's experiment again with the original sentence but without the word
+`telescope`.
+
+```ruby
+# Last word has been removed from the sentence on next line
+input_to_parse = 'John saw Mary with a '
+# Convert input text into a sequence of token objects...
+tokens = tokenizer(input_to_parse, grammar)
+result = parser.parse(tokens)

+puts "Parsing successful? #{result.success?}" # => Parsing successful? false
+unless result.success?
+  puts result.failure_reason.message
+  exit(1)
+end
+```
+
+This time, the following output is displayed:
+```
+Parsing successful? false
+Premature end of input after 'a' at position 5
+Expected one 'Noun'.
+```
+Again, the resulting error message is user-friendly.
+Remark: currently, Rley reports an error position as the index of the
+input token at which the error was detected.


 ## Examples

-The project source directory contains several example scripts that demonstrate
+The project source directory contains several example scripts that demonstrate
 how grammars are to be constructed and used.
data/examples/NLP/mini_en_demo.rb CHANGED
@@ -83,7 +83,11 @@ input_to_parse = 'John saw Mary with a telescope'
 tokens = tokenizer(input_to_parse, grammar)
 result = parser.parse(tokens)

-puts "Parsing successful? #{result.success?}"
+puts "Parsing successful? #{result.success?}"
+unless result.success?
+  puts result.failure_reason.message
+  exit(1)
+end

 ########################################
 # Step 6. Generating the parse forest
data/examples/general/calc/calc_demo.rb CHANGED
@@ -22,7 +22,8 @@ result = parser.parse_expression(ARGV[0])

 unless result.success?
   # Stop if the parse failed...
-  puts "Parsing of '#{
+  puts "Parsing of '#{ARGV[0]}' failed"
+  puts "Reason: #{result.failure_reason.message}"
   exit(1)
 end
data/lib/rley/constants.rb CHANGED
data/lib/rley/parser/dotted_item.rb CHANGED
@@ -115,7 +115,7 @@ module Rley # This module is used as a namespace

     private

-    # Return the given after its validation.
+    # Return the given position after its validation.
     def valid_position(aPosition)
       rhs_size = production.rhs.size
       if aPosition < 0 || aPosition > rhs_size
data/lib/rley/parser/error_reason.rb ADDED
@@ -0,0 +1,106 @@
+module Rley # Module used as a namespace
+  module Parser # This module is used as a namespace
+    # Abstract class. An instance represents an explanation describing
+    # the likely cause of a parse error
+    # detected by Rley.
+    class ErrorReason
+      # The position of the offending input token
+      attr_reader(:position)
+
+      # The failing production
+      attr_reader(:production)
+
+      def initialize(aPosition)
+        @position = aPosition
+      end
+
+      # Returns the result of invoking reason.to_s.
+      def message()
+        return self.to_s
+      end
+
+      # Return this reason's class name and message
+      def inspect
+        "#{self.class.name}: #{message}"
+      end
+    end # class
+
+
+    # This parse error occurs when no input for parsing was provided
+    # while the grammar requires some non-empty input.
+    class NoInput < ErrorReason
+      def initialize()
+        super(0)
+      end
+
+      # Returns the reason's message.
+      def to_s
+        'Input cannot be empty.'
+      end
+    end # class
+
+    # Abstract class and subclass of ErrorReason.
+    # This specialization represents errors in which the input
+    # didn't match one of the expected tokens.
+    class ExpectationNotMet < ErrorReason
+      # The last input token read when the error was detected
+      attr_reader(:last_token)
+
+      # The terminal symbols expected when the error occurred
+      attr_reader(:expected_terminals)
+
+      def initialize(aPosition, lastToken, expectedTerminals)
+        super(aPosition)
+        @last_token = lastToken.dup
+        @expected_terminals = expectedTerminals.dup
+      end
+
+      protected
+
+      # Emit a text explaining the expected terminal symbols
+      def expectations
+        term_names = expected_terminals.map(&:name)
+        explain = 'Expected one '
+        explain << if expected_terminals.size > 1
+                     "of: ['#{term_names.join("', '")}']"
+                   else
+                     "'#{term_names[0]}'"
+                   end
+        return explain
+      end
+    end # class
+
+
+    # This parse error occurs when the current token from the input
+    # is unexpected according to the grammar rules.
+    class UnexpectedToken < ExpectationNotMet
+      # Returns the reason's message.
+      def to_s
+        err_msg = "Syntax error at or near token #{position + 1} "
+        err_msg << ">>>#{last_token.lexeme}<<<\n"
+        err_msg << expectations
+        err_msg << ", found a '#{last_token.terminal.name}' instead."
+
+        return err_msg
+      end
+    end # class
+
+
+    # This parse error occurs when all input tokens were consumed
+    # but the parser still expected one or more tokens from the input.
+    class PrematureInputEnd < ExpectationNotMet
+      # Returns the reason's message.
+      def to_s
+        err_msg = "Premature end of input after '#{last_token.lexeme}'"
+        err_msg << " at position #{position + 1}\n"
+        err_msg << "#{expectations}."
+
+        return err_msg
+      end
+    end # class
+  end # module
+end # module
+
+# End of file
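To see what the message-building logic of the new `ErrorReason` subclasses amounts to, here is a standalone sketch (plain Ruby, not the gem itself) that reproduces the `expectations` pluralization and the `UnexpectedToken#to_s` assembly shown in the diff above. `Terminal` and `Token` are minimal stand-ins for Rley's own classes, defined here only for illustration.

```ruby
# Minimal stand-ins for Rley's Syntax::Terminal and Parser::Token.
Terminal = Struct.new(:name)
Token = Struct.new(:lexeme, :terminal)

# Mirrors ExpectationNotMet#expectations: use "one of: [...]" when several
# terminals would be acceptable, a single quoted name otherwise.
def expectations(expected_terminals)
  term_names = expected_terminals.map(&:name)
  explain = 'Expected one '
  explain << if expected_terminals.size > 1
               "of: ['#{term_names.join("', '")}']"
             else
               "'#{term_names[0]}'"
             end
  explain
end

# Mirrors UnexpectedToken#to_s: position is zero-based internally,
# reported one-based to the user.
def unexpected_token_message(position, last_token, expected)
  msg = "Syntax error at or near token #{position + 1} "
  msg << ">>>#{last_token.lexeme}<<<\n"
  msg << expectations(expected)
  msg << ", found a '#{last_token.terminal.name}' instead."
  msg
end

token = Token.new('-', Terminal.new('MINUS'))
expected = [Terminal.new('PLUS'), Terminal.new('LPAREN')]
puts unexpected_token_message(3, token, expected)
# => Syntax error at or near token 4 >>>-<<<
#    Expected one of: ['PLUS', 'LPAREN'], found a 'MINUS' instead.
```

This matches the expected message asserted in the `error_reason_spec.rb` file added later in this diff.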
data/lib/rley/parser/gfg_chart.rb CHANGED
@@ -12,17 +12,8 @@ module Rley # This module is used as a namespace
     # An array of entry sets (one per input token + 1)
     attr_reader(:sets)

-    # The level of trace details reported on stdout during the parse.
-    # The possible values are:
-    # 0: No trace output (default case)
-    # 1: Show trace of scanning and completion rules
-    # 2: Same as of 1 with the addition of the prediction rules
-    attr_reader(:tracer)
-
     # @param tokenCount [Fixnum] The number of lexemes in the input to parse.
-
-    def initialize(tokenCount, aGFGraph, aTracer)
-      @tracer = aTracer
+    def initialize(tokenCount, aGFGraph)
       @sets = Array.new(tokenCount + 1) { |_| ParseEntrySet.new }
       push_entry(aGFGraph.start_vertex, 0, 0, :start_rule)
     end
@@ -53,20 +44,6 @@ module Rley # This module is used as a namespace
     def push_entry(aVertex, anOrigin, anIndex, aReason)
       new_entry = ParseEntry.new(aVertex, anOrigin)
       pushed = self[anIndex].push_entry(new_entry)
-      if pushed == new_entry && tracer.level > 0
-        case aReason
-        when :start_rule, :prediction
-          tracer.trace_prediction(anIndex, new_entry)
-
-        when :scanning
-          tracer.trace_scanning(anIndex, new_entry)
-
-        when :completion
-          tracer.trace_completion(anIndex, new_entry)
-        else
-          raise NotImplementedError, "Unknown push_entry mode #{aReason}"
-        end
-      end

       return pushed
     end
data/lib/rley/parser/gfg_earley_parser.rb CHANGED
@@ -17,33 +17,34 @@ module Rley # This module is used as a namespace
     # Parse a sequence of input tokens.
     # @param aTokenSequence [Array] Array of Tokens objects returned by a
     # tokenizer/scanner/lexer.
-    # @param aTraceLevel [Fixnum] The specified trace level.
-    # The possible values are:
-    # 0: No trace output (default case)
-    # 1: Show trace of scanning and completion rules
-    # 2: Same as of 1 with the addition of the prediction rules
     # @return [Parsing] an object that embeds the parse results.
-    def parse(aTokenSequence
-
-      result = GFGParsing.new(gf_graph, aTokenSequence, tracer)
+    def parse(aTokenSequence)
+      result = GFGParsing.new(gf_graph, aTokenSequence)
       last_token_index = aTokenSequence.size
+      if last_token_index == 0 && !grammar.start_symbol.nullable?
+        return unexpected_empty_input(result)
+      end
+
       (0..last_token_index).each do |i|
-        handle_error(result) if result.chart[i].empty?
         result.chart[i].each do |entry|
           # Is entry of the form? [A => alpha . B beta, k]...
           next_symbol = entry.next_symbol
           if next_symbol && next_symbol.kind_of?(Syntax::NonTerminal)
             # ...apply the Call rule
-            call_rule(result, entry, i
+            call_rule(result, entry, i)
           end

-          exit_rule(result, entry, i
-          start_rule(result, entry, i
-          end_rule(result, entry, i
+          exit_rule(result, entry, i) if entry.exit_entry?
+          start_rule(result, entry, i) if entry.start_entry?
+          end_rule(result, entry, i) if entry.end_entry?
+        end
+        if i < last_token_index
+          scan_success = scan_rule(result, i)
+          break unless scan_success
         end
-        scan_rule(result, i, tracer) if i < last_token_index
       end
-
+
+      result.done # End of parsing process
       return result
     end

@@ -55,10 +56,7 @@ module Rley # This module is used as a namespace
     # Then the entry [.B, i] is added to the current sigma set.
     # Gist: when an entry expects the non-terminal symbol B, then
     # add an entry with start vertex .B
-    def call_rule(aParsing, anEntry, aPosition
-      if aTracer.level > 1
-        puts "Chart[#{aPosition}] Call rule applied upon #{anEntry}:"
-      end
+    def call_rule(aParsing, anEntry, aPosition)
       aParsing.call_rule(anEntry, aPosition)
     end

@@ -69,10 +67,7 @@ module Rley # This module is used as a namespace
     # is added to the current sigma set.
     # Gist: for an entry corresponding to a start vertex, add an entry
     # for each entry edge in the graph.
-    def start_rule(aParsing, anEntry, aPosition
-      if aTracer.level > 1
-        puts "Chart[#{aPosition}] Start rule applied upon #{anEntry}:"
-      end
+    def start_rule(aParsing, anEntry, aPosition)
       aParsing.start_rule(anEntry, aPosition)
     end

@@ -81,10 +76,7 @@ module Rley # This module is used as a namespace
     # production. Then entry [B., k] is added to the current entry set.
     # Gist: for an entry corresponding to a reduced production, add an entry
     # for each exit edge in the graph.
-    def exit_rule(aParsing, anEntry, aPosition
-      if aTracer.level > 1
-        puts "Chart[#{aPosition}] Exit rule applied upon #{anEntry}:"
-      end
+    def exit_rule(aParsing, anEntry, aPosition)
       aParsing.exit_rule(anEntry, aPosition)
     end

@@ -92,10 +84,7 @@ module Rley # This module is used as a namespace
     # is added to a parse entry set with index j.
     # then for every entry of the form [A => α . B γ, i] in the kth sigma set
     # the entry [A => α B . γ, i] is added to the jth sigma set.
-    def end_rule(aParsing, anEntry, aPosition
-      if aTracer.level > 1
-        puts "Chart[#{aPosition}] End rule applied upon #{anEntry}:"
-      end
+    def end_rule(aParsing, anEntry, aPosition)
       aParsing.end_rule(anEntry, aPosition)
     end

@@ -105,35 +94,17 @@ module Rley # This module is used as a namespace
     # and allow them to cross the edge, adding the node on the back side
     # of the edge as an entry to the next sigma set:
     # add an entry to the next sigma set [A => α t . γ, i]
-    def scan_rule(aParsing, aPosition
-      if aTracer.level > 1
-        prefix = "Chart[#{aPosition}] Scan rule applied upon "
-        puts prefix + aParsing.tokens[aPosition].to_s
-      end
+    def scan_rule(aParsing, aPosition)
       aParsing.scan_rule(aPosition)
     end
+
+    # Parse error detected: no input tokens provided while the grammar
+    # forbids this.
+    def unexpected_empty_input(aParsing)
+      aParsing.faulty(NoInput.new)
+      return aParsing
+    end

-    # Raise an exception to indicate a syntax error.
-    def handle_error(aParsing)
-      # Retrieve the first empty state set
-      pos = aParsing.chart.sets.find_index(&:empty?)
-      lexeme_at_pos = aParsing.tokens[pos - 1].lexeme
-      puts "chart index: #{pos - 1}"
-      terminals = aParsing.chart.sets[pos - 1].expected_terminals
-      puts "count expected terminals #{terminals.size}"
-      entries = aParsing.chart.sets[pos - 1].entries.map(&:to_s).join("\n")
-      puts "Items #{entries}"
-      term_names = terminals.map(&:name)
-      err_msg = "Syntax error at or near token #{pos}"
-      err_msg << ">>>#{lexeme_at_pos}<<<:\nExpected "
-      err_msg << if terminals.size > 1
-                   "one of: ['#{term_names.join("', '")}'],"
-                 else
-                   ": #{term_names[0]},"
-                 end
-      err_msg << " found a '#{aParsing.tokens[pos - 1].terminal.name}'"
-      raise StandardError, err_msg + ' instead.'
-    end
   end # class
 end # module
 end # module
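The essence of the change above is the control-flow shift: instead of raising `StandardError` from `handle_error`, the parse loop now stops at the first failed scan and records a reason on the result object. Here is a generic sketch of that pattern (plain Ruby, not Rley code; all names are illustrative) showing why callers then branch on `success?` rather than rescuing exceptions.

```ruby
# Illustrative result object: holds an optional failure reason
# instead of signalling errors by raising.
class ParseResult
  attr_reader :failure_reason

  def success?
    failure_reason.nil?
  end

  # Record the cause of failure (akin to GFGParsing#faulty).
  def faulty(reason)
    @failure_reason = reason
  end
end

# Scan each token; on the first mismatch, record the reason and stop,
# mirroring the `break unless scan_success` introduced in the parse loop.
def scan_all(result, tokens, vocabulary)
  tokens.each_with_index do |tok, i|
    unless vocabulary.include?(tok)
      result.faulty("unexpected token #{tok.inspect} at position #{i + 1}")
      return result
    end
  end
  result
end

res = scan_all(ParseResult.new, %w[John saw Mary], %w[John saw])
puts res.success?          # false
puts res.failure_reason    # unexpected token "Mary" at position 3
```

The caller pattern is then exactly the one the updated examples use: `puts result.failure_reason.message unless result.success?`.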
data/lib/rley/parser/gfg_parsing.rb CHANGED
@@ -1,4 +1,5 @@
 require_relative 'gfg_chart'
+require_relative 'error_reason'
 require_relative 'parse_entry_tracker'
 require_relative 'parse_forest_factory'

@@ -15,22 +16,21 @@ module Rley # This module is used as a namespace
     # The sequence of input tokens to parse
     attr_reader(:tokens)

-    # A Hash with pairs of the form:
+    # A Hash with pairs of the form:
     # parse entry => [ antecedent parse entries ]
     # It associates to every parse entry its antecedent(s), that is,
-    # the parse entry/ies that cause the key parse entry to be created
+    # the parse entry/ies that cause the key parse entry to be created
     # with one of the gfg rules
     attr_reader(:antecedence)

-    #
-
-
-
-
-    def initialize(theGFG, theTokens, aTracer)
+    # The reason of a parse failure
+    attr_reader(:failure_reason)
+
+
+    def initialize(theGFG, theTokens)
       @gf_graph = theGFG
       @tokens = theTokens.dup
-      @chart = GFGChart.new(tokens.size, gf_graph
+      @chart = GFGChart.new(tokens.size, gf_graph)
       @antecedence = Hash.new { |hash, key| hash[key] = [] }
       antecedence[chart[0].first]
     end
@@ -45,7 +45,7 @@ module Rley # This module is used as a namespace
       next_symbol = anEntry.next_symbol
       start_vertex = gf_graph.start_vertex_for[next_symbol]
       pos = aPosition
-      apply_rule(anEntry, start_vertex, pos, pos, :call_rule)
+      apply_rule(anEntry, start_vertex, pos, pos, :call_rule)
     end

     # Let the current sigma set be the ith parse entry set.
@@ -65,7 +65,7 @@ module Rley # This module is used as a namespace
     end

     # This method must be invoked when an entry is added to a parse entry set
-    # and is of the form [B => γ ., k] (the dot is at the end of the
+    # and is of the form [B => γ ., k] (the dot is at the end of the
     # production. Then entry [B., k] is added to the current entry set.
     # Gist: for an entry corresponding to a reduced production, add an entry
     # for each exit edge in the graph.
@@ -96,11 +96,12 @@ module Rley # This module is used as a namespace
     end

     # Given that the terminal t is at the specified position,
-    # Locate all entries in the current sigma set that expect t:
+    # locate all entries in the current sigma set that expect t:
     # [A => α . t γ, i]
     # and allow them to cross the edge, adding the node on the back side
     # of the edge as an entry to the next sigma set:
     # add an entry to the next sigma set [A => α t . γ, i]
+    # Returns true if the next token matches the expectations, false otherwise.
     def scan_rule(aPosition)
       terminal = tokens[aPosition].terminal

@@ -108,7 +109,10 @@ module Rley # This module is used as a namespace
       expecting_term = chart[aPosition].entries4term(terminal)

       # ... if the terminal isn't expected then we have an error
-
+      if expecting_term.empty?
+        unexpected_token(aPosition)
+        return false
+      end

       expecting_term.each do |ntry|
         # Get the vertices after the expected terminal
@@ -119,6 +123,8 @@ module Rley # This module is used as a namespace
         apply_rule(ntry, vertex_after_terminal, origin, pos, :scan_rule)
       end
       end
+
+      return true
     end

@@ -136,7 +142,7 @@ module Rley # This module is used as a namespace
     end

     # Factory method. Builds a ParseForest from the parse result.
-    # @return [ParseForest]
+    # @return [ParseForest]
     def parse_forest()
       factory = ParseForestFactory.new(self)

@@ -148,7 +154,7 @@ module Rley # This module is used as a namespace
     # with origin equal to zero.
     def initial_entry()
       return chart.initial_entry
-    end
+    end

     # Retrieve the accepting parse entry that represents
     # a complete, successful parse
@@ -158,25 +164,43 @@ module Rley # This module is used as a namespace
       return chart.accepting_entry
     end

+    # Mark the parse as erroneous
+    def faulty(aReason)
+      @failure_reason = aReason
+    end
+
+    # A notification that the parsing reached an end
+    def done
+      unless self.success? || self.failure_reason
+        # Parse not successful and no reason identified
+        # Assuming that parse failed because of a premature end
+        premature_end
+      end
+    end
+
     private

-    #
-
-
-
-
+    # Parse error detected: all input tokens were consumed,
+    # the parser didn't detect a syntax error meanwhile but
+    # could not reach the accepting state.
+    def premature_end
+      token_pos = tokens.size # One-based!
+      last_token = tokens[-1]
+      entry_set = chart.sets[tokens.size]
+      expected = entry_set.expected_terminals
+
+      reason = PrematureInputEnd.new(token_pos - 1, last_token, expected)
+      faulty(reason)
+    end

+    # Parse error detected: input token doesn't match
+    # the expectations set by grammar rules
+    def unexpected_token(aPosition)
+      unexpected = tokens[aPosition]
       expected = chart.sets[aPosition].expected_terminals
-
-
-
-      err_msg << if expected.size > 1
-                   "one of: ['#{term_names.join("', '")}'],"
-                 else
-                   ": #{term_names[0]},"
-                 end
-      err_msg << " found a '#{actual.name}'"
-      raise StandardError, err_msg + ' instead.'
+
+      reason = UnexpectedToken.new(aPosition, unexpected, expected)
+      faulty(reason)
     end

     def apply_rule(antecedentEntry, aVertex, anOrigin, aPosition, aRuleId)
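The `done`/`premature_end` additions above encode a simple post-parse classification: if a scan mismatch was found mid-stream, `unexpected_token` has already recorded a reason; otherwise, an unsuccessful parse with no recorded reason can only mean the input ended too early. A plain-Ruby sketch of that fallback logic (not the gem's classes; names are illustrative):

```ruby
# Illustrative outcome object reproducing the GFGParsing#done fallback:
# diagnose "premature end" only when no other reason was recorded.
class ParseOutcome
  attr_reader :failure_reason

  def initialize(accepted)
    @accepted = accepted
  end

  def success?
    @accepted
  end

  def faulty(reason)
    @failure_reason = reason
  end

  # Mirrors GFGParsing#done: called once parsing reached an end.
  def done
    faulty('Premature end of input') unless success? || failure_reason
  end
end

outcome = ParseOutcome.new(false)
outcome.done
puts outcome.failure_reason  # Premature end of input
```

Note that a reason recorded earlier (e.g. by an unexpected-token scan) is never overwritten by `done`.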
data/lib/rley/ptree/token_range.rb CHANGED
@@ -68,11 +68,6 @@ module Rley # This module is used as a namespace
       return "[#{low_text}, #{high_text}]"
     end

-    # Generate a String that represents a value-based identifier
-    def keystr()
-      return "#{low.object_id}-#{high.object_id}"
-    end
-
     private

     def assign_low(aRange)
data/lib/rley/sppf/parse_forest.rb CHANGED
@@ -4,15 +4,13 @@ require_relative 'alternative_node'

 module Rley # This module is used as a namespace
   module SPPF # This module is used as a namespace
-    #
-    # A parse
-    #
-    #
-    #
-    #
-    #
-    # during the parse.
-    # The root node corresponds to the main/start symbol of the grammar.
+    # In an ambiguous grammar there are valid inputs that can result in multiple
+    # parse trees. A set of parse trees is commonly referred to as a parse
+    # forest. More specifically a parse forest is a graph data
+    # structure designed to represent a set of equally syntactically correct
+    # parse trees. Parse forests generated by Rley are so-called Shared Packed
+    # Parse Forests (SPPF). SPPFs allow very compact representation of parse
+    # trees by sharing common sub-trees amongst the parse trees.
     class ParseForest
       # The root node of the forest
       attr_reader(:root)
data/spec/rley/parser/error_reason_spec.rb ADDED
@@ -0,0 +1,120 @@
+require_relative '../../spec_helper'
+require_relative '../../../lib/rley/parser/token'
+
+# Load the class under test
+require_relative '../../../lib/rley/parser/error_reason'
+module Rley # Open this namespace to avoid module qualifier prefixes
+  module Parser # Open this namespace to avoid module qualifier prefixes
+    describe NoInput do
+      context 'Initialization:' do
+        # Default instantiation rule
+        subject { NoInput.new }
+
+        it 'should be created without argument' do
+          expect { NoInput.new }.not_to raise_error
+        end
+
+        it 'should know the error position' do
+          expect(subject.position).to eq(0)
+        end
+      end # context
+
+      context 'Provided services:' do
+        it 'should emit a standard message' do
+          text = 'Input cannot be empty.'
+          expect(subject.to_s).to eq(text)
+          expect(subject.message).to eq(text)
+        end
+
+        it 'should give a clear inspection text' do
+          text = 'Rley::Parser::NoInput: Input cannot be empty.'
+          expect(subject.inspect).to eq(text)
+        end
+      end # context
+    end # describe
+
+    describe ExpectationNotMet do
+      let(:err_token) { double('fake-token') }
+      let(:terminals) do
+        ['PLUS', 'LPAREN'].map { |name| Syntax::Terminal.new(name) }
+      end
+
+      # Default instantiation rule
+      subject { ExpectationNotMet.new(3, err_token, terminals) }
+
+      context 'Initialization:' do
+        it 'should be created with arguments' do
+          expect { ExpectationNotMet.new(3, err_token, terminals) }.not_to raise_error
+        end
+
+        it 'should know the error position' do
+          expect(subject.position).to eq(3)
+        end
+
+        it 'should know the expected terminals' do
+          expect(subject.expected_terminals).to eq(terminals)
+        end
+      end # context
+    end # describe
+
+
+    describe UnexpectedToken do
+      let(:err_lexeme) { '-' }
+      let(:err_terminal) { Syntax::Terminal.new('MINUS') }
+      let(:err_token) { Token.new(err_lexeme, err_terminal) }
+      let(:terminals) do
+        ['PLUS', 'LPAREN'].map { |name| Syntax::Terminal.new(name) }
+      end
+
+      # Default instantiation rule
+      subject { UnexpectedToken.new(3, err_token, terminals) }
+
+      context 'Initialization:' do
+        it 'should be created with arguments' do
+          expect { UnexpectedToken.new(3, err_token, terminals) }.not_to raise_error
+        end
+      end # context
+
+      context 'Provided services:' do
+        it 'should emit a message' do
+          text = <<MSG_END
+Syntax error at or near token 4 >>>-<<<
+Expected one of: ['PLUS', 'LPAREN'], found a 'MINUS' instead.
+MSG_END
+          expect(subject.to_s).to eq(text.chomp)
+          expect(subject.message).to eq(text.chomp)
+        end
+      end # context
+    end # describe
+
+    describe PrematureInputEnd do
+      let(:err_lexeme) { '+' }
+      let(:err_terminal) { Syntax::Terminal.new('PLUS') }
+      let(:err_token) { Token.new(err_lexeme, err_terminal) }
+      let(:terminals) do
+        ['INT', 'LPAREN'].map { |name| Syntax::Terminal.new(name) }
+      end
+
+      # Default instantiation rule
+      subject { PrematureInputEnd.new(3, err_token, terminals) }
+
+      context 'Initialization:' do
+        it 'should be created with arguments' do
+          expect { PrematureInputEnd.new(3, err_token, terminals) }.not_to raise_error
+        end
+      end # context
+
+      context 'Provided services:' do
+        it 'should emit a message' do
+          text = <<MSG_END
+Premature end of input after '+' at position 4
+Expected one of: ['INT', 'LPAREN'].
+MSG_END
+          expect(subject.to_s).to eq(text.chomp)
+          expect(subject.message).to eq(text.chomp)
+        end
+      end # context
+    end # describe
+  end # module
+end # module
+# End of file
data/spec/rley/parser/gfg_chart_spec.rb
CHANGED
@@ -46,17 +46,16 @@ module Rley # Open this namespace to avoid module qualifier prefixes
       # from the abc grammar
       let(:items_from_grammar) { build_items_for_grammar(grammar_abc) }
       let(:sample_gfg) { GFG::GrmFlowGraph.new(items_from_grammar) }
-      let(:sample_tracer) { ParseTracer.new(0, output, token_seq) }
       let(:sample_start_symbol) { sample_gfg.start_vertex.non_terminal }
 
 
       # Default instantiation rule
-      subject { GFGChart.new(count_token, sample_gfg, sample_tracer) }
+      subject { GFGChart.new(count_token, sample_gfg) }
 
 
       context 'Initialization:' do
-        it 'should be created with start vertex, token count, tracer' do
-          expect { GFGChart.new(count_token, sample_gfg, sample_tracer) }
+        it 'should be created with start vertex, token count' do
+          expect { GFGChart.new(count_token, sample_gfg) }
             .not_to raise_error
         end
 
@@ -64,10 +63,6 @@ module Rley # Open this namespace to avoid module qualifier prefixes
           expect(subject.sets.size).to eq(count_token + 1)
         end
 
-        it 'should reference a tracer' do
-          expect(subject.tracer).to eq(sample_tracer)
-        end
-
         it 'should know the start symbol' do
           expect(subject.start_symbol).to eq(sample_start_symbol)
         end
@@ -83,52 +78,6 @@ module Rley # Open this namespace to avoid module qualifier prefixes
         end
 
 
-=end
-      end # context
-
-      context 'Provided services:' do
-=begin
-        let(:t_a) { Syntax::Terminal.new('a') }
-        let(:t_b) { Syntax::Terminal.new('b') }
-        let(:t_c) { Syntax::Terminal.new('c') }
-        let(:nt_sentence) { Syntax::NonTerminal.new('sentence') }
-
-        let(:sample_prod) do
-          Syntax::Production.new(nt_sentence, [t_a, t_b, t_c])
-        end
-
-        let(:origin_val) { 3 }
-        let(:dotted_rule) { DottedItem.new(sample_prod, 2) }
-        let(:complete_rule) { DottedItem.new(sample_prod, 3) }
-        let(:sample_parse_state) { ParseState.new(dotted_rule, origin_val) }
-        let(:sample_tracer) { ParseTracer.new(1, output, token_seq) }
-
-        # Factory method.
-        def parse_state(origin, aDottedRule)
-          ParseState.new(aDottedRule, origin)
-        end
-
-
-        it 'should trace its initialization' do
-          subject[0] # Force constructor call here
-          expectation = <<-SNIPPET
- ['I', 'saw', 'John', 'with', 'a', 'dog']
- |. I . saw . John . with . a . dog .|
- |> . . . . . .| [0:0] sentence => A B . C
-SNIPPET
-          expect(output.string).to eq(expectation)
-        end
-
-        it 'should trace parse state pushing' do
-          subject[0] # Force constructor call here
-          output.string = ''
-
-          subject.push_state(dotted_rule, 3, 5, :prediction)
-          expectation = <<-SNIPPET
- |. . . > .| [3:5] sentence => A B . C
-SNIPPET
-          expect(output.string).to eq(expectation)
-        end
 =end
       end # context
     end # describe
data/spec/rley/parser/gfg_earley_parser_spec.rb
CHANGED
@@ -7,8 +7,11 @@ require_relative '../../../lib/rley/syntax/grammar_builder'
 require_relative '../../../lib/rley/parser/token'
 require_relative '../../../lib/rley/parser/dotted_item'
 require_relative '../../../lib/rley/parser/gfg_parsing'
+
+# Load builders and lexers for sample grammars
 require_relative '../support/grammar_abc_helper'
 require_relative '../support/ambiguous_grammar_helper'
+require_relative '../support/grammar_pb_helper'
 require_relative '../support/grammar_helper'
 require_relative '../support/expectation_helper'
 
@@ -68,10 +71,10 @@ module Rley # Open this namespace to avoid module qualifier prefixes
     # for the language specified by grammar_expr
     def grm2_tokens()
       input_sequence = [
-        { '2' => 'integer' },
-        '+',
+        { '2' => 'integer' },
+        '+',
         { '3' => 'integer' },
-        '*',
+        '*',
         { '4' => 'integer' }
       ]
       return build_token_sequence(input_sequence, grammar_expr)
@@ -178,39 +181,6 @@ module Rley # Open this namespace to avoid module qualifier prefixes
       expect(entry_set_5.entries.size).to eq(4)
       compare_entry_texts(entry_set_5, expected)
     end
-=begin
-    it 'should trace a parse with level 1' do
-      # Substitute temporarily $stdout by a StringIO
-      prev_ostream = $stdout
-      $stdout = StringIO.new('', 'w')
-
-      trace_level = 1
-      subject.parse(grm1_tokens, trace_level)
-      expectations = <<-SNIPPET
- ['a', 'a', 'b', 'c', 'c']
- |. a . a . b . c . c .|
- |> . . . . .| [0:0] S => . A
- |> . . . . .| [0:0] A => . 'a' A 'c'
- |> . . . . .| [0:0] A => . 'b'
- |[---] . . . .| [0:1] A => 'a' . A 'c'
- |. > . . . .| [1:1] A => . 'a' A 'c'
- |. > . . . .| [1:1] A => . 'b'
- |. [---] . . .| [1:2] A => 'a' . A 'c'
- |. . > . . .| [2:2] A => . 'a' A 'c'
- |. . > . . .| [2:2] A => . 'b'
- |. . [---] . .| [2:3] A => 'b' .
- |. [-------> . .| [1:3] A => 'a' A . 'c'
- |. . . [---] .| [3:4] A => 'a' A 'c' .
- |[---------------> .| [0:4] A => 'a' A . 'c'
- |. . . . [---]| [4:5] A => 'a' A 'c' .
- |[===================]| [0:5] S => A .
-SNIPPET
-      expect($stdout.string).to eq(expectations)
-
-      # Restore standard ouput stream
-      $stdout = prev_ostream
-    end
-=end
 
     it 'should parse a valid simple expression' do
       instance = GFGEarleyParser.new(grammar_expr)
@@ -586,40 +556,81 @@ SNIPPET
     it 'should parse an invalid simple input' do
       # Parse an erroneous input (b is missing)
       wrong = build_token_sequence(%w(a a c c), grammar_abc)
-
+      parse_result = subject.parse(wrong)
+      expect(parse_result.success?).to eq(false)
       err_msg = <<-MSG
-Syntax error at or near token 3>>>c
+Syntax error at or near token 3 >>>c<<<
 Expected one of: ['a', 'b'], found a 'c' instead.
 MSG
-
-      expect { subject.parse(wrong) }
-        .to raise_error(err, err_msg.chomp)
+      expect(parse_result.failure_reason.message).to eq(err_msg.chomp)
     end
 
-    it 'should
-
-
-
-
-
-
+    it 'should report error when no input provided but was required' do
+      helper = GrammarPBHelper.new
+      grammar = helper.grammar
+      instance = GFGEarleyParser.new(grammar)
+      tokens = helper.tokenize('')
+      parse_result = instance.parse(tokens)
+      expect(parse_result.success?).to eq(false)
+      err_msg = 'Input cannot be empty.'
+      expect(parse_result.failure_reason.message).to eq(err_msg)
+    end
 
-
-
-
-
-
-
-
-
-
-
-
-      '
-
+    it 'should report error when input ends prematurely' do
+      helper = GrammarPBHelper.new
+      grammar = helper.grammar
+      instance = GFGEarleyParser.new(grammar)
+      tokens = helper.tokenize('1 +')
+      parse_result = instance.parse(tokens)
+      expect(parse_result.success?).to eq(false)
+      ###################### S(0) == . 1 +
+      # Expectation chart[0]:
+      expected = [
+        '.S | 0',                     # initialization
+        'S => . E | 0',               # start rule
+        '.E | 0',                     # call rule
+        'E => . int | 0',             # start rule
+        "E => . '(' E '+' E ')' | 0", # start rule
+        "E => . E '+' E | 0"          # start rule
       ]
-
-
+      compare_entry_texts(parse_result.chart[0], expected)
+
+      ###################### S(1) == 1 . +
+      # Expectation chart[1]:
+      expected = [
+        'E => int . | 0',     # scan '1'
+        'E. | 0',             # exit rule
+        'S => E . | 0',       # end rule
+        "E => E . '+' E | 0", # end rule
+        'S. | 0'              # exit rule
+      ]
+      compare_entry_texts(parse_result.chart[1], expected)
+
+      ###################### S(2) == 1 + .
+      # Expectation chart[2]:
+      expected = [
+        "E => E '+' . E | 0",         # scan '+'
+        '.E | 2',                     # exit rule
+        'E => . int | 2',             # start rule
+        "E => . '(' E '+' E ')' | 2", # start rule
+        "E => . E '+' E | 2"          # start rule
+      ]
+      compare_entry_texts(parse_result.chart[2], expected)
+
+      err_msg = "Premature end of input after '+' at position 2"
+      err_msg << "\nExpected one of: ['int', '(']."
+      expect(parse_result.failure_reason.message).to eq(err_msg)
+    end
+
+
+    it 'should parse a common sample' do
+      # Use grammar based on example found in paper of
+      # K. Pingali and G. Bilardi:
+      # "A Graphical Model for Context-Free Grammar Parsing"
+      helper = GrammarPBHelper.new
+      grammar = helper.grammar
+      instance = GFGEarleyParser.new(grammar)
+      tokens = helper.tokenize('7 + 8 + 9')
       parse_result = instance.parse(tokens)
       expect(parse_result.success?).to eq(true)
       ###################### S(0) == . 7 + 8 + 9
data/spec/rley/parser/gfg_parsing_spec.rb
CHANGED
@@ -53,16 +53,15 @@ module Rley # Open this namespace to avoid module qualifier prefixes
       let(:sample_gfg) { GFG::GrmFlowGraph.new(items_from_grammar) }
 
       let(:output) { StringIO.new('', 'w') }
-      let(:sample_tracer) { ParseTracer.new(0, output, grm1_tokens) }
 
       # Default instantiation rule
       subject do
-        GFGParsing.new(sample_gfg, grm1_tokens, sample_tracer)
+        GFGParsing.new(sample_gfg, grm1_tokens)
       end
 
       context 'Initialization:' do
         it 'should be created with a GFG, tokens, trace' do
-          expect { GFGParsing.new(sample_gfg, grm1_tokens, sample_tracer) }
+          expect { GFGParsing.new(sample_gfg, grm1_tokens) }
             .not_to raise_error
         end
 
data/spec/rley/support/grammar_pb_helper.rb
@@ -0,0 +1,48 @@
+# Load the builder class
+require_relative '../../../lib/rley/syntax/grammar_builder'
+require_relative '../../../lib/rley/parser/token'
+
+
+# Utility class.
+class GrammarPBHelper
+
+  # Factory method. Creates a grammar for a basic arithmetic
+  # expression based on example found in paper of
+  # K. Pingali and G. Bilardi:
+  # "A Graphical Model for Context-Free Grammar Parsing"
+  def grammar()
+    @grammar ||= begin
+      builder = Rley::Syntax::GrammarBuilder.new do
+        t_int = Rley::Syntax::Literal.new('int', /[-+]?\d+/)
+        t_plus = Rley::Syntax::VerbatimSymbol.new('+')
+        t_lparen = Rley::Syntax::VerbatimSymbol.new('(')
+        t_rparen = Rley::Syntax::VerbatimSymbol.new(')')
+        add_terminals(t_int, t_plus, t_lparen, t_rparen)
+        rule 'S' => 'E'
+        rule 'E' => 'int'
+        rule 'E' => %w(( E + E ))
+        rule 'E' => %w(E + E)
+      end
+      builder.grammar
+    end
+  end
+
+  # Basic expression tokenizer
+  def tokenize(aText)
+    tokens = aText.scan(/\S+/).map do |lexeme|
+      case lexeme
+      when '+', '(', ')'
+        terminal = @grammar.name2symbol[lexeme]
+      when /^[-+]?\d+$/
+        terminal = @grammar.name2symbol['int']
+      else
+        msg = "Unknown input text '#{lexeme}'"
+        raise StandardError, msg
+      end
+      Rley::Parser::Token.new(lexeme, terminal)
+    end
+
+    return tokens
+  end
+end # module
+# End of file
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rley
 version: !ruby/object:Gem::Version
-  version: 0.3.12
+  version: 0.4.00
 platform: ruby
 authors:
 - Dimitri Geshef
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2016-12-
+date: 2016-12-17 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -161,6 +161,7 @@ files:
 - lib/rley/parser/chart.rb
 - lib/rley/parser/dotted_item.rb
 - lib/rley/parser/earley_parser.rb
+- lib/rley/parser/error_reason.rb
 - lib/rley/parser/gfg_chart.rb
 - lib/rley/parser/gfg_earley_parser.rb
 - lib/rley/parser/gfg_parsing.rb
@@ -183,6 +184,7 @@ files:
 - lib/rley/ptree/parse_tree_node.rb
 - lib/rley/ptree/terminal_node.rb
 - lib/rley/ptree/token_range.rb
+- lib/rley/rley_error.rb
 - lib/rley/sppf/alternative_node.rb
 - lib/rley/sppf/composite_node.rb
 - lib/rley/sppf/epsilon_node.rb
@@ -220,6 +222,7 @@ files:
 - spec/rley/parser/chart_spec.rb
 - spec/rley/parser/dotted_item_spec.rb
 - spec/rley/parser/earley_parser_spec.rb
+- spec/rley/parser/error_reason_spec.rb
 - spec/rley/parser/gfg_chart_spec.rb
 - spec/rley/parser/gfg_earley_parser_spec.rb
 - spec/rley/parser/gfg_parsing_spec.rb
@@ -250,6 +253,7 @@ files:
 - spec/rley/support/grammar_b_expr_helper.rb
 - spec/rley/support/grammar_helper.rb
 - spec/rley/support/grammar_l0_helper.rb
+- spec/rley/support/grammar_pb_helper.rb
 - spec/rley/support/grammar_sppf_helper.rb
 - spec/rley/syntax/grammar_builder_spec.rb
 - spec/rley/syntax/grammar_spec.rb
@@ -308,6 +312,7 @@ test_files:
 - spec/rley/parser/chart_spec.rb
 - spec/rley/parser/dotted_item_spec.rb
 - spec/rley/parser/earley_parser_spec.rb
+- spec/rley/parser/error_reason_spec.rb
 - spec/rley/parser/gfg_chart_spec.rb
 - spec/rley/parser/gfg_earley_parser_spec.rb
 - spec/rley/parser/gfg_parsing_spec.rb